Job Description:
We are looking for a Spark/Scala/PySpark developer who knows how to fully exploit the potential of our Spark cluster: someone who can clean, transform, and analyze vast amounts of raw data from various systems using Spark to provide ready-to-use data.
Responsibilities:
- Create Scala/Spark/PySpark jobs for data transformation and aggregation (see the sketch after this list)
- Produce unit tests for Spark transformations and helper methods
- Write Scaladoc-style documentation for all code
- Design data processing pipelines
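For a sense of the working style expected, here is a minimal sketch in Scala of such a job: a Scaladoc-documented, unit-testable DataFrame transformation with a thin main entry point. All names (DailyTotalsJob, aggregateDailyTotals, the S3 paths, and the column names) are illustrative assumptions, not taken from this posting.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DailyTotalsJob {

  /** Aggregates raw events into per-user daily totals.
    *
    * Kept as a pure DataFrame => DataFrame function so it can be
    * unit-tested against a local SparkSession without a cluster.
    *
    * @param events raw events with columns `userId`, `eventDate`, `amount`
    * @return one row per (userId, eventDate) with the summed `amount`
    */
  def aggregateDailyTotals(events: DataFrame): DataFrame =
    events
      .filter(col("amount").isNotNull)
      .groupBy(col("userId"), col("eventDate"))
      .agg(sum("amount").as("totalAmount"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("daily-totals").getOrCreate()
    // Input/output paths and the event schema are hypothetical placeholders.
    val raw = spark.read.parquet("s3://some-bucket/raw/events/")
    aggregateDailyTotals(raw)
      .write.mode("overwrite")
      .parquet("s3://some-bucket/curated/daily_totals/")
    spark.stop()
  }
}

Because aggregateDailyTotals is a pure function of its input DataFrame, it can be exercised in a unit test with a handful of rows built on a local SparkSession.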
Skills:
- PySpark
- Scala (with a focus on the functional programming paradigm)
- Apache Spark 2.x/3.x, including:
- Apache Spark RDD API
- Apache Spark SQL DataFrame API
- Apache Spark Streaming API
- Spark query tuning and performance optimization (a streaming example with a tuning note follows this list)
- SQL database integration (Postgres and/or MySQL)
- Experience working with HDFS and AWS (S3, Redshift, EMR, IAM, Policies, Routing)
- CI/CD pipelines: Jenkins, GitLab/Bitbucket
- Deep understanding of distributed systems (e.g. partitioning, replication, consistency, and consensus)
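As a rough illustration of the Streaming API and query-tuning items above, a minimal Structured Streaming sketch in Scala that windows a Kafka topic of click events. The broker address, topic name, and JSON field are assumptions for illustration, and running it requires the spark-sql-kafka connector on the classpath.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickStreamCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("click-stream-counts").getOrCreate()
    import spark.implicits._

    // Read a stream of JSON click events from Kafka; the broker address
    // and topic name are made-up placeholders.
    val clicks = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "clicks")
      .load()
      .selectExpr("CAST(value AS STRING) AS json", "timestamp")

    // Count page views per 1-minute event-time window. The watermark bounds
    // streaming state size, one of the tuning levers the posting alludes to;
    // broadcast() hints on small dimension tables are another.
    val counts = clicks
      .select(get_json_object($"json", "$.page").as("page"), $"timestamp")
      .withWatermark("timestamp", "5 minutes")
      .groupBy(window($"timestamp", "1 minute"), $"page")
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}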
Employment Category:
Employment Type: Full time
Industry: IT - Software
Role Category: IT Operations / EDP / MIS
Functional Area: Not Applicable
Role/Responsibilities: Data Streaming
Contact Details:
Company: Integrated Personnel
Location(s): Multi-City, India
Keyskills:
Scala
Spark
PySpark