GCP PySpark Data Engineer
Full Time | PAN India
Role Overview
Core Technical Skills
· Python:
o Data processing and transformation using pandas and NumPy
o Writing modular, reusable code for ETL workflows
o Automation and scripting for data operations
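To illustrate the kind of modular, reusable transformation code this role calls for, here is a minimal sketch using pandas and NumPy; the table and column names (`order_id`, `amount`) are hypothetical:

```python
import pandas as pd
import numpy as np

def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate orders, fill missing amounts, add a derived column."""
    out = df.drop_duplicates(subset=["order_id"]).copy()
    out["amount"] = out["amount"].fillna(0.0)
    # log1p handles zero amounts without raising on log(0)
    out["amount_log"] = np.log1p(out["amount"])
    return out

orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "amount": [10.0, 10.0, None, 5.0],
})
cleaned = clean_orders(orders)
print(cleaned)
```

Keeping each step in a small, side-effect-free function like this is what makes ETL code reusable and straightforward to unit-test.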
· PySpark:
o Building distributed data pipelines
o Spark SQL, DataFrame APIs, and RDDs
o Performance tuning (partitioning, caching, shuffle optimization)
· SQL:
o Complex queries, joins, aggregations, and window functions
o Query optimization for large datasets
· Data Modeling & ETL:
o Designing schemas for analytics and operational systems
o Implementing ETL/ELT pipelines with orchestration tools (Airflow, Databricks Jobs)
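Orchestrators such as Airflow model a pipeline as a DAG of tasks executed in dependency order. A toy illustration of that idea, using only the standard library (no Airflow dependency; the task names are hypothetical):

```python
from graphlib import TopologicalSorter

run_log = []

def extract():   run_log.append("extract")
def transform(): run_log.append("transform")
def load():      run_log.append("load")

# Edges point from a task to the tasks it depends on:
# transform needs extract; load needs transform.
dag = {"transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": extract, "transform": transform, "load": load}

# static_order() yields tasks only after their dependencies.
for name in TopologicalSorter(dag).static_order():
    tasks[name]()

print(run_log)
```

Real orchestrators add scheduling, retries, and backfills on top, but the core contract is the same: declare dependencies, let the scheduler derive execution order.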
· Big Data & Cloud Platforms:
o Hands-on experience with GCP; exposure to AWS or Azure is a plus
o Familiarity with data lakes and Delta Lake patterns
· File Formats & Storage:
o Parquet, ORC, Avro for efficient storage
o Understanding of partitioning strategies
· Testing & CI/CD:
o Unit and integration testing for data pipelines
o Git-based workflows and automated deployments
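Unit testing a pipeline is easiest when transformations are pure functions, isolated from I/O. A minimal pytest-style sketch (function and field names are hypothetical):

```python
def aggregate_amounts(records):
    """Sum amounts per id; pure function, so it needs no Spark or DB to test."""
    totals = {}
    for rec in records:
        totals[rec["id"]] = totals.get(rec["id"], 0) + rec["amount"]
    return totals

def test_aggregate_amounts():
    rows = [
        {"id": 1, "amount": 5},
        {"id": 1, "amount": 5},
        {"id": 2, "amount": 3},
    ]
    assert aggregate_amounts(rows) == {1: 10, 2: 3}

test_aggregate_amounts()
print("ok")
```

In a Git-based workflow, tests like this run in CI on every pull request, and the deployment step only fires once they pass.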