Senior Data Engineer / NDA

Job Details

Posted on: January 22, 2026

About the Company

ALLSTARSIT was founded in 2004 with a clear vision: to improve the landscape of global IT employment by bridging the gap between companies and skilled professionals. Its core belief was that building a team shouldn't be limited by geography. Today, ALLSTARSIT is an international outstaffing service provider committed to changing the way businesses recruit, compensate, and manage top talent worldwide.

With operational hubs across Europe, Asia, and LATAM and its headquarters in San Francisco, US, the company has a workforce of over 1,000 professionals. Spanning more than 20 countries, ALLSTARSIT offers skilled employees across a range of verticals, including AI, cybersecurity, healthcare, fintech, telecom, media, and more.

About the Project

Our client is a legal tech startup focused on AI and machine learning, specifically building chatbots that answer legal questions for lawyers. They are looking for a Senior Data Engineer for a high-impact project: digitizing law in Morocco and across Africa and creating the first AI-queryable legal knowledge base.

Their ambition is to build a platform capable of answering legal questions in a reliable, well-sourced, and traceable way, based on a massive corpus of heterogeneous legal documents.

🚀 Why this project is different
You will join a true “knowledge infrastructure” mission:

  • Contribute to making the law more accessible
  • Build a durable asset: a structured database of Moroccan law (in French), extensible to Africa
  • Work on a concrete and deep technical challenge: transforming unstructured data into exploitable, reliable, and maintainable data at scale

Required skills:

  • 3+ years of experience in Data Engineering and/or applied Document AI / NLP
  • Strong proficiency in Python
  • Hands-on experience with unstructured documents: PDF parsing, OCR, cleaning, structuring
  • Experience delivering to production: robust pipelines, observability, quality, performance

🛠 Stack/skills (indicative)

  • Storage: AWS
  • Document processing: OCR/parsing tools, text preprocessing pipelines
  • Testing & quality: metrics, sampling, automated validation

Nice to have

  • Experience with legal / regulatory corpora or high-precision content
  • Familiarity with multilingual issues and encoding
  • Basic knowledge of downstream needs (vector DBs, retrieval, citation)

Scope of work:

You will be responsible for the “documents → structured data” pipeline that will feed our AI (RAG) engine.

At the core of the role (technical focus)
Build a structured database of Moroccan law in French from highly heterogeneous data:

  • PDFs (text-based and scanned), Word files, images, text files, sometimes noisy or incomplete
  • Text extraction (parsing, plus OCR when needed) and cleaning
  • Structuring: detection of titles/chapters/sections/articles, hierarchy, normalization
  • Intelligent chunking (based on legal structure rather than arbitrary size), with traceability (source, page, identifiers)
  • Metadata: date, type of text (law/decree/circular/case law, etc.), source, version, article numbers, etc.
  • Deduplication & versioning: redundant documents, amendments, consolidated versions
  • Industrialization: orchestration, logs, retries, idempotence, monitoring, quality tests
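The structure-aware chunking, traceability, and deduplication steps listed above can be illustrated with a minimal Python sketch. It assumes page texts have already been extracted (parsing/OCR) and that article headings start a line as "Article premier", "Article 1er", "Article 12", or "Article 2 bis". The regex, the `Chunk` fields, and the names `chunk_by_article` and `deduplicate` are hypothetical illustrations, not the client's actual pipeline. Each chunk is one article (not a fixed size), carries its source and start page for citation, and exact duplicates are dropped by hashing the normalized text.

```python
import hashlib
import re
import unicodedata
from dataclasses import dataclass

# Hypothetical heading pattern for French legal texts: "Article premier",
# "Article 1er", "Article 12", "Article 2 bis". Real corpora need more cases.
ARTICLE_RE = re.compile(
    r"^\s*Article\s+(premier|\d+(?:er)?(?:\s+(?:bis|ter|quater))?)\b",
    re.IGNORECASE,
)


@dataclass
class Chunk:
    source: str        # document identifier, kept for citation
    page: int          # page on which the article starts
    article: str       # article number as written in the text
    text: str          # normalized article text
    content_hash: str  # SHA-256 of the normalized text, used for deduplication


def normalize(text: str) -> str:
    """Apply Unicode NFC (canonical accents) and collapse all whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_by_article(source: str, pages: list[str]) -> list[Chunk]:
    """Emit one chunk per article; text before the first heading is skipped."""
    chunks: list[Chunk] = []
    article, start_page, lines = None, 0, []

    def flush() -> None:
        # Close the article currently being accumulated, if any.
        if article is not None:
            text = normalize("\n".join(lines))
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            chunks.append(Chunk(source, start_page, article, text, digest))

    for page_no, page_text in enumerate(pages, start=1):
        for line in page_text.splitlines():
            match = ARTICLE_RE.match(line)
            if match:
                flush()
                article, start_page, lines = match.group(1), page_no, [line]
            elif article is not None:
                lines.append(line)  # article body may continue across pages
    flush()
    return chunks


def deduplicate(chunks: list[Chunk]) -> list[Chunk]:
    """Drop chunks whose normalized text was already seen (exact duplicates only)."""
    seen: set[str] = set()
    unique: list[Chunk] = []
    for chunk in chunks:
        if chunk.content_hash not in seen:
            seen.add(chunk.content_hash)
            unique.append(chunk)
    return unique


if __name__ == "__main__":
    # Hypothetical two-page document, already extracted to text.
    pages = [
        "Exemple hypothétique\nArticle premier\nLe présent texte s'applique.\n"
        "Article 2\nDéfinitions.",
        "Suite de l'article 2.\nArticle 2 bis\nDisposition ajoutée.",
    ]
    for c in deduplicate(chunk_by_article("exemple.pdf", pages)):
        print(c.source, c.page, c.article, "|", c.text)
```

A line-start regex is a naive heuristic: body text that happens to begin with "Article 5 du ..." would be misread as a heading, and amendments or consolidated versions need versioning logic beyond exact-hash deduplication.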
