GCP PySpark Data Engineer

Full Time | PAN India | India

Industry: Information Technology and Services
Experience: 5 - 8 years
Compensation: 1,000,000 - 2,000,000
Openings: 2

Role Overview

We are looking for a GCP PySpark Data Engineer with 5 - 8 years of experience to design, build, and tune distributed data pipelines on cloud platforms.

Core Technical Skills

· Python:

o Data processing and transformation using Pandas, NumPy

o Writing modular, reusable code for ETL workflows

o Automation and scripting for data operations
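As a flavor of the Python work involved, here is a minimal sketch of a reusable transformation step built on Pandas and NumPy (the column names and the outlier rule are illustrative, not part of a specific stack):

```python
import numpy as np
import pandas as pd

def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Reusable ETL step: normalize columns, coerce types, flag outliers."""
    out = df.copy()
    out.columns = [c.strip().lower() for c in out.columns]
    out["order_date"] = pd.to_datetime(out["order_date"], errors="coerce")
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Flag amounts more than 3 standard deviations from the mean.
    z = (out["amount"] - out["amount"].mean()) / out["amount"].std()
    out["is_outlier"] = np.abs(z) > 3
    return out
```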

· PySpark:

o Building distributed data pipelines

o Spark SQL, DataFrame APIs, and RDDs

o Performance tuning (partitioning, caching, shuffle optimization)
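For instance, a pipeline exercising the DataFrame API with explicit partitioning and caching might look like the sketch below (the bucket paths and schema are hypothetical):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

# Repartition on the join key up front to spread the shuffle evenly.
orders = spark.read.parquet("s3://example-bucket/raw/orders/").repartition(200, "customer_id")
customers = spark.read.parquet("s3://example-bucket/raw/customers/")

# Cache the joined DataFrame because several aggregations reuse it.
enriched = orders.join(customers, "customer_id", "left").cache()

daily_revenue = (
    enriched
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(
        F.sum("amount").alias("revenue"),
        F.countDistinct("customer_id").alias("unique_buyers"),
    )
)
daily_revenue.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_revenue/")
```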

· SQL:

o Complex queries, joins, aggregations, and window functions

o Query optimization for large datasets
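A representative window-function query, run here through Spark SQL against an illustrative orders view:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window-demo").getOrCreate()
spark.read.parquet("/lake/curated/orders/").createOrReplaceTempView("orders")

# Per-customer lifetime total plus the top three orders per customer.
top_orders = spark.sql("""
    SELECT customer_id,
           order_id,
           amount,
           SUM(amount)  OVER (PARTITION BY customer_id) AS customer_total,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rn
    FROM orders
""").where("rn <= 3")
top_orders.show()
```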

· Data Modeling & ETL:

o Designing schemas for analytics and operational systems

o Implementing ETL/ELT pipelines with orchestration tools (Airflow, Databricks Jobs)
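A minimal sketch of the orchestration side, assuming Airflow 2.x (the DAG id, scripts, and schedule are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Daily extract -> transform sequence; task commands are placeholders.
with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract_orders.py")
    transform = BashOperator(task_id="transform", bash_command="spark-submit transform_orders.py")
    extract >> transform
```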

· Big Data & Cloud Platforms:

o Experience with AWS, Azure, or GCP

o Familiarity with data lakes and Delta Lake patterns
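On the lakehouse side, a typical Delta Lake upsert (merge) pattern, assuming the delta-spark package is on the classpath (paths and the key column are illustrative):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-upsert")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

updates = spark.read.parquet("/lake/staging/customers/")
target = DeltaTable.forPath(spark, "/lake/curated/customers/")

# Upsert: update rows whose key already exists, insert the rest.
(
    target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```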

· File Formats & Storage:

o Parquet, ORC, Avro for efficient storage

o Understanding of partitioning strategies
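For example, writing a date-partitioned Parquet dataset so that date-filtered reads prune files instead of scanning everything (the source path and column names are illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("write-partitioned").getOrCreate()
events = spark.read.json("/lake/raw/events/")  # illustrative source

(
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .write
    .partitionBy("event_date")  # one directory per date -> partition pruning on date filters
    .mode("overwrite")
    .parquet("/lake/curated/events/")
)
```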

· Testing & CI/CD:

o Unit and integration testing for data pipelines

o Git-based workflows and automated deployments
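A self-contained pytest example for a PySpark transformation, runnable locally on local[2]; the transformation and its threshold are hypothetical:

```python
import pytest
from pyspark.sql import DataFrame, SparkSession, functions as F

def add_is_large(df: DataFrame) -> DataFrame:
    # Transformation under test: flag amounts above an illustrative threshold.
    return df.withColumn("is_large", F.col("amount") > 1000)

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("pipeline-tests").getOrCreate()

def test_add_is_large_flags_big_amounts(spark):
    df = spark.createDataFrame([(1, 50.0), (2, 5000.0)], ["id", "amount"])
    rows = {r["id"]: r["is_large"] for r in add_is_large(df).collect()}
    assert rows == {1: False, 2: True}
```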

Skill Set

PySpark, Azure, AWS, GCP, Data Engineer
Application

Apply for this role

Accepted resume formats: PDF, DOC, or DOCX.