ESIA

School of Artificial Intelligence

Master’s Program

Data Scientist 4.0 & Industrial Data Analytics

End-to-end ML + GenAI, from data to production systems

Overview

Data Scientist 4.0 is a modern, applied track built for people who want to ship real AI products. You’ll learn the full lifecycle: data preparation, modeling (ML and deep learning), GenAI patterns, evaluation, deployment, monitoring, and iteration under production constraints.

Key outcomes

  • Build reliable ML pipelines with strong data quality foundations
  • Train, evaluate, and debug models using robust methodology
  • Apply GenAI patterns (RAG, agents, guardrails) with measurable KPIs
  • Deploy production services with FastAPI, containerize them with Docker, and apply CI/CD basics
  • Deliver a portfolio of end-to-end projects with documentation

Format

  • Flexible: 4-week intensive or 10-week part-time format
  • Weekly deliverables + checkpoints
  • Portfolio-based evaluation
  • Remote-friendly (worldwide cohorts)

Tools

  • Python, Pandas, NumPy
  • scikit-learn, PyTorch or TensorFlow
  • FastAPI, Docker
  • Git/GitHub, CI/CD fundamentals
  • MLflow basics (tracking & versioning)
  • Vector DB concepts (RAG)

Detailed program

A production-first curriculum aligned with today’s hiring standards: ML systems, deployment, monitoring, and GenAI (RAG) patterns used in real products.

Module 1 — Data Foundations & Quality

Build clean, reliable datasets and prevent downstream quality issues.

2–3 weeks

What you’ll learn

  • Advanced pandas patterns (joins, reshaping, performance basics)
  • Data quality checks (missingness, outliers, duplicates, schema validation)
  • SQL for analytics (CTEs, window functions, query design basics)
  • Leakage detection and reproducible dataset building
  • Project structure: src/, configs, notebook policy, reproducibility habits
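
To make the data quality checks listed above concrete, here is a minimal sketch using only the Python standard library. The `SCHEMA` dictionary, the `quality_report` helper, and the sample rows are hypothetical, invented for illustration; a real project would more likely run equivalent checks on a pandas DataFrame.

```python
from collections import Counter

# Hypothetical schema for illustration: column name -> expected Python type.
SCHEMA = {"id": int, "age": int, "country": str}


def quality_report(rows, schema):
    """Count missing values, type mismatches, and exact duplicate rows."""
    missing = Counter()
    bad_type = Counter()
    for row in rows:
        for col, expected in schema.items():
            value = row.get(col)
            if value is None:
                missing[col] += 1
            elif not isinstance(value, expected):
                bad_type[col] += 1
    # A row counts as a duplicate when all schema columns match an earlier row.
    keys = Counter(tuple(row.get(col) for col in schema) for row in rows)
    duplicates = sum(count - 1 for count in keys.values())
    return {
        "missing": dict(missing),
        "bad_type": dict(bad_type),
        "duplicates": duplicates,
    }


# Made-up sample data: one missing age, one age stored as text, one repeated row.
rows = [
    {"id": 1, "age": 34, "country": "FR"},
    {"id": 2, "age": None, "country": "DE"},
    {"id": 3, "age": "41", "country": "US"},
    {"id": 1, "age": 34, "country": "FR"},
]
report = quality_report(rows, SCHEMA)
print(report)
```

Running a report like this before any modeling step is one way to catch quality problems while they are still cheap to fix.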

Skills you’ll gain

  • Data cleaning at scale
  • SQL analytics foundations
  • Data quality & validation mindset
  • Leakage prevention
  • Reproducible project structure

Module 2 — Statistics, Experimentation & Decision Metrics

Make decisions with correct metrics and avoid common statistical traps.

2 weeks

What you’ll learn

  • EDA framework: distributions, correlations, target leakage checklist
  • Practical statistics: confidence intervals, hypothesis testing, effect size
  • A/B testing basics and pitfalls (power, bias, peeking, seasonality)
  • Business metrics vs model metrics: aligning success criteria
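
As an illustration of the hypothesis testing and confidence interval topics above, the sketch below runs a two-proportion z-test on an A/B experiment. The function name, the conversion counts, and the choice of a normal approximation are assumptions made for this example, not a prescribed course method.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return (difference, confidence interval, two-sided p-value) for B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Test statistic: pooled standard error under the null hypothesis p_a == p_b.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Interval: unpooled standard error, since the null is not assumed here.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(0.5 + confidence / 2) * se_diff
    return diff, (diff - margin, diff + margin), p_value


# Made-up counts: 100/1000 conversions in control, 130/1000 in the variant.
diff, ci, p_value = two_proportion_ztest(100, 1000, 130, 1000)
print(f"lift={diff:.3f}  ci=({ci[0]:.4f}, {ci[1]:.4f})  p={p_value:.4f}")
```

The test is only valid if the sample size was fixed in advance. Re-running it every day and stopping at the first significant result is the "peeking" pitfall named above.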

Skills you’ll gain

  • Statistical reasoning
  • Experiment design basics
  • Metric selection & trade-offs
  • Stakeholder-ready insights

Module 3 — Supervised Machine Learning (Production Mindset)

Train strong baselines, optimize, and explain models reliably.

4 weeks

What you’ll learn

  • Regression & classification pipelines (preprocessing, CV, baselines)
  • Evaluation beyond accuracy: ROC-AUC, PR-AUC, calibration, thresholding
  • Error analysis: slices, segments, stability checks
  • Feature engineering: categorical, time-based, aggregation features
  • Explainability basics: SHAP intuition + communicating model behavior
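
To show why thresholding matters when you evaluate beyond accuracy, here is a small sketch that computes precision and recall at several decision thresholds. The labels, scores, and the `precision_recall` helper are made up for illustration; in practice scikit-learn offers equivalent utilities.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting positive for scores >= threshold."""
    preds = [score >= threshold for score in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels and model scores, invented for this example.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, raising the threshold improves precision while recall drops, which is why the threshold should follow from the business cost of each error type rather than a default of 0.5.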

Skills you’ll gain

  • End-to-end ML pipeline design
  • Robust evaluation methodology
  • Error analysis & debugging
  • Feature engineering
  • Explainability storytelling

Module 4 — Deep Learning Essentials

Understand neural networks enough to use them correctly in real projects.

2–3 weeks

What you’ll learn

  • Neural network fundamentals: forward pass and backpropagation intuition
  • Training strategy: learning rate, batch size, regularization, early stopping
  • Overfitting control and practical troubleshooting
  • When to use deep learning vs classical ML (cost/benefit)
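
Early stopping, one of the training strategies listed above, can be sketched independently of any framework. The function below only looks at a list of validation losses. The names `patience` and `min_delta` follow common usage in deep learning libraries but are assumptions here, and the loss values are invented.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran through the last epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation curve: improves until epoch 2, then gets worse.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.64]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)
```

With `patience=2` the loop stops at epoch 4 and never sees the better loss at epoch 5. That is the trade-off behind the patience setting: lower values save compute but can stop on a temporary plateau.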

Skills you’ll gain

  • Deep learning fundamentals
  • Training & tuning discipline
  • Overfitting control
  • Model selection judgment

Module 5 — GenAI in Production (RAG + Evaluation + Guardrails)

Build ‘chat-with-your-data’ systems with evaluation and reliability.

2–3 weeks

What you’ll learn

  • Prompting patterns and structured outputs (schemas, constraints)
  • Embeddings + vector search fundamentals (RAG architecture)
  • Chunking, retrieval strategy, citations, freshness considerations
  • Evaluation basics: groundedness, hallucinations, test sets, regression tests
  • Safety basics: guardrails mindset, policy constraints, failure modes
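
The retrieval step of a RAG architecture can be illustrated without any external service. In the sketch below, a toy bag-of-words `embed` function stands in for a real embedding model, and a sorted list stands in for a vector database. Both are simplifying assumptions, and the sample chunks are invented.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy embedding: word counts. A stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


# Invented knowledge-base chunks for illustration.
chunks = [
    "refunds are processed within five business days",
    "shipping is free for orders above fifty euros",
    "contact support to change a delivery address",
]
top = retrieve("how long do refunds take", chunks, k=1)
print(top)
```

The retrieved chunks would then be placed in the prompt together with the question, so the answer can cite them. Word-count matching misses synonyms and paraphrases, which is one reason real systems use learned embeddings.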

Skills you’ll gain

  • RAG architecture understanding
  • Vector search reasoning
  • LLM evaluation mindset
  • Reliability & guardrails basics

Module 6 — Deployment & MLOps Basics

Ship models as real services: versioning, CI/CD basics, monitoring signals.

4–5 weeks

What you’ll learn

  • Serving: FastAPI inference endpoints, input validation, payload schemas
  • Docker basics for ML services (images, env configs, reproducibility)
  • MLflow patterns: tracking runs, artifacts, model versioning basics
  • CI/CD fundamentals: lint/tests, basic pipeline concepts, release discipline
  • Monitoring signals: performance degradation, drift awareness, feedback loop
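
As one example of a drift signal, the sketch below computes the Population Stability Index (PSI) between a reference sample and a live sample of a single feature. The program outline does not name a specific drift metric, so PSI is an illustrative choice. The bin edges, the `eps` floor, and the sample values are assumptions, and both samples are assumed to be non-empty.

```python
from math import log


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two non-empty samples.

    `edges` are ascending bin boundaries. Values below the first edge fall
    in bin 0, values at or above the last edge fall in the final bin.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for value in values:
            counts[sum(value >= edge for edge in edges)] += 1
        # Floor each share at eps so empty bins do not cause log(0).
        return [max(count / len(values), eps) for count in counts]

    expected_shares = shares(expected)
    actual_shares = shares(actual)
    return sum((a - e) * log(a / e) for e, a in zip(expected_shares, actual_shares))


# Made-up feature values: the live data has shifted toward higher values.
reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
live = [0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95]
edges = [0.25, 0.5, 0.75]

print(psi(reference, reference, edges))
print(psi(reference, live, edges))
```

Identical distributions give a PSI of zero, and the value grows as the live data moves away from the reference. Any alert threshold should be treated as a heuristic to calibrate on your own data.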

Skills you’ll gain

  • FastAPI model serving
  • Dockerized deployment
  • MLflow tracking basics
  • CI/CD discipline basics
  • Monitoring & drift awareness

Capstone — End-to-End AI Product

Deliver a portfolio-grade product with demo, docs, and deployment.

3–4 weeks

What you’ll learn

  • Problem framing: objectives, constraints, success metrics
  • Data pipeline + modeling + evaluation + error analysis
  • Deploy an API + containerize + basic release workflow
  • Documentation, demo storytelling, and portfolio packaging

Skills you’ll gain

  • End-to-end delivery
  • Product thinking
  • Production deployment workflow
  • Portfolio-ready communication