Role & Responsibilities:
Build highly scalable, available, fault-tolerant distributed data processing systems (batch and streaming) that process hundreds of terabytes of data ingested every day and support a petabyte-sized data warehouse and Elasticsearch cluster.
Build quality data solutions and refine existing diverse datasets into simplified models that encourage self-service.
Build data pipelines that optimize for data quality and are resilient to poor-quality data sources.
Own the data mapping, business logic, transformations, and data quality.
Perform low-level systems debugging, performance measurement, and optimization on large production clusters.
Participate in architecture discussions, influence the product roadmap, and take ownership of and responsibility for new projects.
Maintain and support existing platforms and evolve them to newer technology stacks.

Candidate:
Proficiency in Python and PySpark.
Deep understanding of Apache Spark, Spark tuning, creating RDDs, and building DataFrames.
Experience with big data technologies such as HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, Presto, etc.
Experience in building distributed environments using any of Kafka, Spark, Hive, Hadoop, etc.
Good understanding of the architecture and functioning of distributed database systems.
Experience working with various file formats such as Parquet, Avro, etc., for large volumes of data.
Experience with one or more NoSQL databases.
Experience with AWS, GCP.
5+ years of professional experience as a data or software engineer.
(ref:hirist.tech)
Employment Category:
Employment Type: Full time
Industry: IT Services & Consulting
Role Category: Not Specified
Functional Area: Not Specified
Role/Responsibilities: Lead Data Engineer - Python Job in Tech
Contact Details:
Company: Tech Recruitz
Location(s): Other Karnataka