
AI Engineer · LLM Systems · Python Backend

I build AI systems that ship and hold under pressure.

Agentic architectures (LangGraph, MCP), production RAG pipelines, and observable Python backends — instrumented end-to-end so failures are never silent.

Ahmed Guebsi
  • 120K+ documents processed via RAG
  • 8 production endpoints shipped
  • 99.5% uptime under concurrent load

About

Engineering trust into AI systems.

I work at the intersection of agentic architecture, RAG pipelines, and production Python backend engineering. At Avaxia, I deployed multi-step LLM workflows using MCP and LangGraph across enterprise clients, and instrumented them end-to-end with Langfuse and LangSmith — because in production, failures should never be silent.

Earlier, at RegimLab, I built EEG-based clinical AI that reached 92% ADHD classification accuracy, layered with Explainable AI (SHAP, LIME, Integrated Gradients) and validated by domain clinicians. In healthcare, a model that can't explain itself is a model that can't be trusted.

I care about systems that are observable, testable, and honest about their failure modes.

What I work with

Skills & Stack

LLM & Agentic Systems

The core of my work — building reliable agentic pipelines that ship to production, not just demos.

  • LangGraph
  • MCP
  • A2A
  • LangChain
  • LlamaIndex
  • OpenAI API
  • Claude API
  • Ollama
  • Prompt Engineering
  • Tool Use
  • Context Management

RAG & Retrieval

  • RAG Pipelines
  • Milvus
  • Semantic Chunking
  • Embeddings
  • Hybrid Search
  • Re-ranking
  • LangSmith
  • Langfuse
  • RAGAS
  • n8n

Backend & APIs

  • FastAPI
  • Flask
  • asyncio
  • Celery
  • Redis
  • REST
  • PostgreSQL
  • MongoDB
  • Pydantic
  • WebSockets

MLOps & Observability

  • Docker
  • Kubernetes
  • ArgoCD
  • MLflow
  • GitHub Actions
  • Azure
  • AWS
  • Selenium
  • CI/CD for ML
  • Model Versioning

Research & XAI

  • SHAP
  • LIME
  • Integrated Gradients
  • PyTorch
  • EEGNet
  • CNN / LSTM
  • Time-Series
  • Hugging Face
  • LoRA / PEFT
  • Quantization

Code

  • Python
  • Java
  • C++
  • SQL
  • Bash

Languages

  • Arabic (Native)
  • English (Fluent)
  • French (Fluent)

Where I've worked

Experience

AI Backend Developer

Avaxia Group  ·  Tunis, Tunisia  ·  May 2024 – Nov 2025

  • Architected and shipped production-grade agentic AI systems using LangGraph and MCP, orchestrating multi-step tool-calling workflows across 2 enterprise clients and processing 120K+ documents through semantic RAG pipelines — lifting retrieval precision from 61% → 84% (RAGAS context precision).
  • Engineered 8 production RESTful endpoints with FastAPI and async Python exposing LLM inference, RAG retrieval, and conversational AI services — sustaining 99.5% uptime and a 38ms median response time under concurrent load.
  • Cut API p99 latency by 76% (2.8s → 670ms) via cProfile-driven optimization, asyncio refactoring, and Redis TTL tuning — scaling throughput from 120 to 520 RPM without infrastructure changes.
  • Owned end-to-end delivery of 4 production AI features across 8 months, translating requirements from 2 cross-functional teams into deployable services on Azure with ArgoCD-managed GitOps pipelines.
  • Drove code quality through structured PR reviews and a pytest + Selenium suite over 40+ critical paths — reaching 78% coverage and cutting production incidents by 35%.
  • Python
  • FastAPI
  • LangGraph
  • MCP
  • LangChain
  • LlamaIndex
  • OpenAI
  • Claude
  • Mixtral
  • Ollama
  • Redis
  • Celery
  • Milvus
  • Langfuse
  • LangSmith
  • n8n
  • Azure
  • ArgoCD
  • Docker
  • Selenium

Machine Learning Researcher · Intern

RegimLab  ·  Sfax, Tunisia  ·  May 2023 – Oct 2023

  • Designed a clinical-grade ADHD detection system on EEG time-series from 180 subjects, combining CNN, LSTM, and EEGNet with Explainable AI overlays — the lab's first XAI-augmented diagnostic pipeline validated by medical reviewers.
  • Engineered a signal processing pipeline (VMD + ICA) to clean 64-channel EEG recordings — compressing raw feature space from 320+ dimensions to 28 informative biomarkers while preserving clinically relevant frequency bands.
  • Reached 92% accuracy (F1: 0.91), 6 pp above the prior lab baseline, via a CNN/LSTM/EEGNet ensemble with PCA + RFE feature selection on multi-band EEG spectrograms.
  • Elevated clinical trust through SHAP, LIME, and Integrated Gradients overlays generating per-prediction biomarker attribution maps — reducing physician review time by 22%, validated by 2 domain clinicians.
  • Python
  • PyTorch
  • EEGNet
  • CNN
  • LSTM
  • MNE
  • SHAP
  • LIME
  • Integrated Gradients
  • XGBoost
  • PCA
  • VMD
  • ICA

Data Science Intern

Datasphera  ·  Tunis, Tunisia  ·  Jun 2022 – Aug 2022

  • Delivered an internal NLP chatbot handling 60+ daily queries with integrated sentiment analysis — reaching 84% intent classification accuracy on a 12K-sample domain corpus via fine-tuned BERT.
  • Fine-tuned a BERT-based model with LoRA and compressed it from 340MB to 87MB with INT8 quantization, yielding a 2.2× inference speedup with under 1% accuracy loss and enabling deployment on memory-constrained instances.
  • Containerized and deployed the full NLP inference stack on Kubernetes with zero-downtime rolling releases, sustaining 99.5% availability over 3 months, and tracked 22 experiment runs in MLflow.
  • Cut end-to-end chatbot response time by 30% (1.4s → 980ms) via Flask request batching, model-layer caching, and INT8 quantization — lifting post-launch user satisfaction by 14%.
  • Python
  • Flask
  • BERT
  • PyTorch
  • Hugging Face
  • LoRA
  • INT8 Quantization
  • Docker
  • Kubernetes
  • GitHub Actions
  • MLflow
  • AWS

Software Engineer Intern

Innovup  ·  Tunis, Tunisia  ·  Jul 2021 – Sep 2021

  • Launched a full-stack project tracking platform (MEAN stack) adopted by 3 internal teams and 22 users within 2 weeks — integrating OAuth 2.0 / JWT and real-time webhook notifications with zero security incidents post-launch.
  • Designed a behavioral data pipeline capturing 8+ interaction event types at roughly 4K events/day — powering analytics dashboards and personalization models with Pandas and SQL.
  • Built content-based and collaborative-filtering recommenders in Scikit-learn — a 12% lift in task click-through and 9% improvement in collaborator discovery over the non-personalized baseline.
  • Ran 3 A/B experiments (p < 0.05) demonstrating an 8% reduction in project completion time — informing the product team's full rollout decision.
  • MEAN Stack
  • JWT
  • OAuth 2.0
  • Redux
  • Python
  • Scikit-learn
  • Pandas
  • SQL
  • A/B Testing
  • Recommenders

Credentials

Education & Certifications

  • 2020 – 2023

    Engineering Degree · Computer Science & Applied Mathematics

    National Engineering School of Sfax (ENIS) · Sfax, Tunisia

  • 2017 – 2020

    Pre-engineering Cycle · Mathematics & Physics

    Preparatory Institute of Engineering Studies, El Manar · Tunis, Tunisia

  • 2016

    Baccalaureate in Mathematics

    High School · Tunis, Tunisia

Certifications

  • Huawei HCIA-AI · Jun 2022 – Jun 2025

Get in touch

Let's build something together.

Open to AI Engineer / LLM Systems / Python Backend roles and consulting. I usually reply within 24 hours.