SRE Team Lead (Hands-on)

Level

Team Lead

Department

Other IT Positions

Type

Full Time

Locations:

Poland

Romania

Ukraine

Bulgaria

Serbia

Job Details

Posted on:

January 22, 2026

About the Company

Established in 2004, ALLSTARSIT was founded with a clear vision: to improve global IT employment by bridging the gap between companies and skilled professionals. The core belief was that assembling a team shouldn't be hindered by geographical constraints. Today, ALLSTARSIT is an international outstaffing service provider committed to changing the way businesses recruit, compensate, and oversee top talent worldwide.

With operational hubs across Europe, Asia, and LATAM, and headquarters in San Francisco, US, the company has a workforce of over 1,000 professionals. Operating in more than 20 countries, ALLSTARSIT provides skilled employees across various verticals, including AI, cybersecurity, healthcare, fintech, telecom, and media.

About the Project

We are looking for a hands-on SRE Team Lead to own the reliability, scalability, and operational excellence of a cloud-native fintech platform built on microservices.
This role combines technical leadership, architecture ownership, and deep hands-on execution.

You will lead a small SRE team while remaining actively involved in design, coding, incident response, and reliability engineering.

Required skills:

8+ years of experience in SRE / Platform / DevOps engineering
Strong hands-on experience with:
- AWS (EKS, EC2, RDS, IAM, CloudWatch, ALB)
- Kubernetes & Docker
- Microservices architectures
Strong programming background in Java and/or Node.js
Deep understanding of:
- Distributed systems
- Production debugging
- Capacity planning
Experience in fintech or regulated environments is a strong plus

Nice to Have

Experience with chaos engineering tools
Security & compliance exposure (PCI-DSS, SOC 2, ISO)
Prior experience building or scaling SRE teams

Scope of work:

Reliability & Architecture

Own platform availability, latency, scalability, and resilience across environments
Define and enforce SLOs, SLIs, error budgets, and operational KPIs
Design and review resilience patterns: circuit breakers, retries, rate limiting, graceful degradation
Drive chaos engineering, fault-injection, and disaster-recovery readiness

Hands-on Engineering

Actively contribute code (Java / Node.js) for:
- Reliability tooling
- Platform automation
- Observability integrations
Review microservice architecture with engineering teams to eliminate single points of failure

Cloud & DevOps Leadership

Own AWS architecture (VPCs, IAM, EKS, RDS, ALB/NLB, autoscaling)
Drive Kubernetes best practices (resource tuning, HPA, pod disruption budgets)
Improve CI/CD pipelines for reliability, speed, and safety

Incident & Operations

Lead production incident response, root cause analysis (RCA), and postmortems
Establish a blameless postmortem culture
Reduce MTTR through automation and better observability
Participate in escalation/on-call strategy (not firefighting 24×7)

People & Process