Middle Data Engineer
Job Details
About the Company
Headquartered in San Francisco, US, with operational hubs across Europe, Asia, and LATAM, ALLSTARSIT employs more than 1,000 skilled professionals. Operating in over 20 countries, the company offers expertise across a diverse range of verticals, including AI, cybersecurity, healthcare, fintech, telecom, and media.
About the Project
A global SaaS company that collects, processes, and analyzes large-scale web data to create consumer insights for leading international brands. Its tools and platforms help clients manage e-commerce, social media, and product development activities efficiently. The company's data solutions combine cutting-edge technologies, analytics, and AI-based components, enabling businesses to understand markets, trends, and customer behavior more deeply.
About the Role
We are seeking a Mid-Level Data Engineer to design, build, and optimize scalable data pipelines using AWS and the Databricks platform. As a member of the Data Engineering team, you'll ensure data reliability, integrity, and efficiency, and contribute to implementing machine learning (ML), MLflow, and LLM-based solutions that power intelligent data products. You will collaborate closely with R&D, Product, and Delivery teams to validate features, improve system performance, and deliver high-quality, client-ready data outputs.
Required skills:
- 3+ years of professional experience as a Data Engineer.
- Strong hands-on experience with PySpark, Python, and SQL.
- Proven experience with AWS services (e.g., S3, Glue, Redshift, Lambda).
- Knowledge of automated testing, data QA, and performance optimization.
- Excellent communication skills in English and ability to work cross-functionally in global teams.
Nice to have:
- Experience with Databricks (DBX).
- Familiarity with big data and data lake architectures.
- Understanding of CI/CD pipelines and DevOps practices.
- Experience with MLflow or exposure to LLM-based solutions.
- Knowledge of data governance and monitoring frameworks.
Scope of work:
- Design, build, and maintain data pipelines using PySpark and AWS services.
- Deliver production-grade data solutions that are accurate, efficient, and reliable.
- Develop automated testing and QA systems to ensure data quality.
- Integrate ML, MLflow, and LLM-based workflows into existing data pipelines.
- Troubleshoot and resolve complex pipeline and performance issues.
- Work closely with Product Managers and Analysts to define release quality and readiness standards.
- Follow and promote best practices in data engineering, QA, and documentation.