AI Engineer (Contract)
Remote | Full-time preferred | 4-6 months, with potential to extend
We’re looking for an AI engineer to build intelligent assessment systems that transform how skills and knowledge are evaluated. You’ll work across backend systems, evaluation frameworks, and LLM workflows to create scalable, accurate AI features that help awarding bodies and training providers automate assessment, generate user insights, and enable adaptive learning experiences.
We’re a small, pragmatic team where processes are informal and everyone pitches in to improve how we work. You should be comfortable working independently, making technical decisions without everything being fully specced, and building systems that can adapt as requirements evolve. We’re building our AI assessment capabilities, and your expertise will help shape our approach to automated evaluation.
About us
sAInaptic is a UK-based AI company reinventing workforce certification. We work with awarding bodies and training providers to automate the evaluation of knowledge and skills, helping clients save time and money while improving the quality of feedback and the consistency of marking. Our vision is to get more skilled professionals into the future workforce.
What you’ll do
- Design evaluation pipelines and metrics aligned with marking criteria; maintain golden datasets and regression tests
- Experiment with, tune, and benchmark LLMs; optimise latency and cost
- Prototype and productionise agent-based workflows with tool calling to reduce manual setup
- Build and maintain APIs and services for AI-driven assessment features
- Develop multimodal workflows that process images and video alongside text, applying semantic analysis for accurate scoring
- Build systems that generate clear, actionable feedback aligned to assessment criteria, with confidence signals or brief rationales, to support adaptive learning
- Instrument LLM workflows with tracing and online evaluation (e.g., Langfuse); design guardrails and fallbacks (confidence thresholds, human review); document evaluation methods and results for reproducibility
- Collaborate with the dev team (4-5 developers) and assessment experts; participate in code reviews; contribute to architecture decisions
What we’re looking for
- 4+ years of backend or AI-focused engineering experience
- Hands-on experience working with LLMs, agentic workflows, and related frameworks (such as LangChain or LangGraph)
- Strong Python programming skills; experience building APIs
- Proven track record designing evaluation frameworks and tuning model performance
- Experience shipping AI features that use at least one of speech-to-text (audio/video) or computer vision (images/video)
- Experience with experiment tracking, production observability, and safeguards (MLflow/W&B/DVC, Langfuse, confidence calibration, human-in-the-loop)
- Experience with cloud platforms (AWS preferred, or Azure) for model deployment, evaluation, and scalable AI workloads
- Understanding of data privacy, fairness, auditability, system integrity, and scalable architecture
- Proficiency with Git, GitHub, and collaborative development workflows
- Comfortable working without close supervision and contributing to shared architecture
Bonus points for
- Experience with assessment technology, EdTech, or automated evaluation systems
- Familiarity with ML frameworks (PyTorch, Scikit-learn)
- Experience with semantic search, retrieval pipelines, or document analysis
- Interest in prompt engineering, user intent modelling, or AI agent design
- Background in natural language processing or automated scoring
- Experience with both speech-to-text and computer vision, and with multimodal models that combine images/video and text
Working hours & location
This is a fully remote position. We’re UK-based and prefer significant overlap with UK hours. Full-time availability is ideal, though part-time may be considered. Competitive rates offered.