Senior Data Engineer
Job Details
About the Company
Headquartered in San Francisco, US, with operational hubs across Europe, Asia, and LATAM, the company employs over 1,000 skilled professionals. Operating in more than 20 countries, ALLSTARSIT offers a diverse range of expertise across verticals including AI, cybersecurity, healthcare, fintech, telecom, and media.
About the Project
A global SaaS company that collects, processes, and analyzes large-scale web data to create consumer insights for leading international brands. Its tools and platforms help clients manage e-commerce, social media, and product development activities efficiently. The company's data solutions combine cutting-edge technologies, analytics, and AI-based components, enabling businesses to understand markets, trends, and customer behavior more deeply.
About the Role
We are seeking an experienced Senior Data Engineer to design, build, and optimize scalable data pipelines using AWS and the Databricks platform. As part of the Data Team, you will be responsible for ensuring data reliability, integrity, and efficiency, while contributing to the implementation of Machine Learning (ML), MLflow, and Large Language Model (LLM) based solutions. You will collaborate closely with R&D, Product, and Delivery teams to validate new features, resolve technical issues, and ensure that high-quality, client-ready data solutions are delivered on time.
Required Skills
- Minimum 5 years of experience as a Data Engineer.
- Strong hands-on experience with PySpark, Python, and SQL.
- Proven background with AWS (S3, Glue, Lambda, Redshift, or similar).
- Experience with the Databricks (DBX) platform.
- Background in automated testing and data pipeline QA.
- Strong English communication skills and ability to work cross-functionally.
Nice to Have
- Familiarity with MLflow, LLM-based solutions, or data lake architectures.
- Experience with CI/CD pipelines and DevOps practices.
- Understanding of data governance and monitoring frameworks.
Scope of Work
- Design, develop, and maintain robust data pipelines using PySpark and AWS services.
- Deliver production-grade data outputs with a strong focus on accuracy, reliability, and performance.
- Develop and maintain automated testing systems to ensure data quality and integrity.
- Integrate ML, MLflow, and LLM-based workflows into production data systems.
- Troubleshoot complex issues across data pipelines and optimize performance.
- Collaborate with cross-functional teams to define release readiness and quality standards.
- Promote best practices in data engineering, testing, documentation, and continuous improvement.