Instructor(s)
Ioannis Chatzigiannakis
Schedule
Afternoon session (13:30 to 16:30)
Capacity
To be defined
Language
English
Description

Machine learning (ML) often centralizes data for training, weakening data control and raising privacy, security, and efficiency concerns, especially on edge devices. Federated Learning (FL) trains models without moving raw data, improving privacy and reducing transfer costs. This course moves from theory to practice, mixing fundamentals with advanced topics (personalization, hierarchy, privacy, robustness) and hands-on labs. We cover the core FL variants (cross-device, cross-silo, hierarchical, personalized), the role of data heterogeneity (IID vs. non-IID), and its impact on convergence, fairness, and robustness. Flower is used to convert a centralized training loop into a federated one with minimal changes, and FedArtML to generate controllable non-IID datasets for rigorous evaluation. We also examine threats (poisoning, backdoors, Byzantine behavior, membership inference) and practical defenses (robust aggregation, sanitization, differential privacy, secure aggregation). Special attention goes to Internet of Things (IoT) scenarios, where devices with limited processing, storage, and energy join training through resource-aware strategies. Lectures, demos, and labs guide participants in implementing an end-to-end FL pipeline, from data generation to attack mitigation.
Objectives of the course
Understand the FL computation model and its motivations (privacy, regulation, efficiency).
Distinguish and apply variants: cross-device, cross-silo, hierarchical, and personalized FL.
Master IID vs. non-IID notions and quantify their effect on performance and stability.
Use Flower to transform a centralized (PyTorch/TensorFlow) training routine into a federated one.
Generate and control heterogeneity with FedArtML for reproducible experimentation.
Implement and compare optimization/aggregation algorithms (FedAvg, FedProx, FedOpt) under diverse scenarios.
Recognize relevant cyber-attacks (data/model poisoning, backdoors, Byzantine behavior, inference) and apply practical defenses (Krum/median/trimmed-mean, sanitization, DP, secure aggregation).
Design a mini-project: an FL pipeline under non-IID data with evaluation of a poisoning attack and mitigation.
Connection to last year's course: builds on Andrea's 2025 ECI course via its treatment of Differential Privacy and consensus/Byzantine-robust aggregation mechanisms.
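To preview the aggregation step the objectives refer to, the core of FedAvg (example-count-weighted averaging of client models) fits in a few lines of NumPy. This is an illustrative sketch, not course code; the function and variable names are our own:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate client models by example-count-weighted average (FedAvg).

    client_weights: one list of per-layer arrays per client.
    client_sizes: number of local training examples per client.
    """
    total = sum(client_sizes)
    coeffs = [n / total for n in client_sizes]
    return [
        sum(c * layers[i] for c, layers in zip(coeffs, client_weights))
        for i in range(len(client_weights[0]))
    ]

# Three clients with one "layer" each and 10, 30, and 60 local examples.
clients = [[np.array([0.0, 0.0])], [np.array([1.0, 1.0])], [np.array([2.0, 2.0])]]
agg = fedavg(clients, [10, 30, 60])  # larger clients pull the average harder
```

Clients with more data receive proportionally more weight, which is exactly what makes FedAvg sensitive to the non-IID partitions studied later in the course.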

Course program

Day 1 — Introduction & Foundations
Lecture: FL overview, use cases, system architecture, variants; introduction to differential privacy (DP) and how to protect exchanged model weights from revealing sensitive data (e.g., noise mechanisms, clipping; brief on secure aggregation).
Lab: Environment setup; review of a centralized model (PyTorch/TensorFlow); implement a simple DP mechanism on gradients/weights and observe privacy–utility trade-offs.
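The Day 1 lab's "simple DP mechanism on gradients/weights" can be sketched as clip-then-add-Gaussian-noise (the Gaussian mechanism). A minimal NumPy version, with names and default parameters chosen by us for illustration:

```python
import numpy as np

def dp_sanitize(grad, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip a gradient to L2 norm <= clip_norm, then add Gaussian noise
    with standard deviation noise_multiplier * clip_norm."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))  # bound sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

g = np.array([3.0, 4.0])  # L2 norm 5, so clipping rescales it to norm 1
private_g = dp_sanitize(g, clip_norm=1.0, noise_multiplier=0.5)
```

Raising noise_multiplier strengthens privacy but degrades utility, which is the trade-off the lab asks students to observe.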
Day 2 — Non-IID with FedArtML & Personalization
Lecture: Data heterogeneity (IID vs. non-IID), personalized and hierarchical FL; why partitioning datasets first enables meaningful federated training; definitions and intuition of Hellinger Distance, Jensen–Shannon Distance, and Earth Mover's (Wasserstein) Distance as non-IIDness metrics on client label distributions.
Lab: Synthesize non-IID datasets with FedArtML (tunable heterogeneity) and compute/report the above metrics per-client and aggregated; discuss how metric levels affect stability; optional personalization baseline.
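Two of the non-IIDness metrics from the Day 2 lecture can be computed directly on per-client label distributions. A sketch of Hellinger and Jensen-Shannon distance (Wasserstein is omitted for brevity); this is our own illustrative code, not FedArtML's implementation:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between discrete label distributions: 0 = identical, 1 = disjoint."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def js_distance(p, q):
    """Jensen-Shannon distance (sqrt of JS divergence, log base 2, range [0, 1])."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0  # 0 * log(0) is taken as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

uniform = np.array([0.5, 0.5])   # IID-like client
skewed = np.array([1.0, 0.0])    # client holding only one label
```

Both metrics reach their maximum when two clients hold disjoint labels, the regime where federated training is least stable.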
Day 3 — Algorithms & Flower
Lecture: FedAvg, FedProx, FedOpt; evaluation criteria including performance disparity (variance/IQR and worst-client accuracy), convergence time to a chosen threshold, and communication volume per round (bytes of updates/metadata).
Lab: Migrate the centralized model to Flower; configure strategy and training loop; train on Day-2 partitions while logging disparity, convergence time, and per-round communication; compare runs across strategies.
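The difference between FedAvg and FedProx local training is a single proximal term that pulls each client back toward the global model. A toy sketch under an assumed quadratic local loss (names and constants are ours, not course code):

```python
import numpy as np

def fedprox_step(w, grad_fn, w_global, mu=0.1, lr=0.05):
    """One local FedProx gradient step: mu * (w - w_global) penalizes client
    drift away from the global model; mu = 0 recovers plain FedAvg local SGD."""
    return w - lr * (grad_fn(w) + mu * (w - w_global))

# Hypothetical local loss 0.5 * ||w - target||^2, with gradient (w - target).
target = np.array([2.0, 2.0])
grad_fn = lambda w: w - target

w_global = np.zeros(2)
w = w_global.copy()
for _ in range(200):
    w = fedprox_step(w, grad_fn, w_global, mu=1.0, lr=0.05)
# With mu = 1 the client settles halfway between its own optimum and the global model.
```

Under strong heterogeneity, this pull-back is what reduces the client drift that destabilizes FedAvg on the Day-2 partitions.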
Day 4 — Fairness in Federated Learning
Lecture: Notions of fairness in FL; how to measure it (e.g., demographic parity difference, equalized odds/TPR-FPR gaps, worst-client fairness) and address it (re-weighting, fairness-aware objectives/constraints, personalization for protected groups).
Lab: Practical exercises on datasets with protected attributes (e.g., gender/age); implement fairness metrics and apply mitigation strategies within the FL pipeline; analyze fairness–accuracy trade-offs.
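Two of the fairness measures named above, demographic parity difference and worst-client accuracy, are simple to state in code. A sketch with made-up data for illustration:

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups (0 = parity)."""
    g = np.asarray(group, dtype=bool)
    y = np.asarray(y_pred)
    return abs(y[g].mean() - y[~g].mean())

def worst_client_accuracy(per_client_acc):
    """Worst-case client accuracy, a common FL fairness summary statistic."""
    return min(per_client_acc)

preds = np.array([1, 1, 0, 0, 1, 0, 0, 0])
groups = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = protected group (hypothetical)
dpd = demographic_parity_difference(preds, groups)  # 0.5 vs. 0.25 positive rates
```

Mitigations such as re-weighting typically lower dpd at some cost in overall accuracy, the trade-off the lab asks students to analyze.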
Day 5 — Threats & Defenses
Lecture: Poisoning, backdoors, Byzantine behavior, and membership inference; privacy and secure aggregation; definitions of adversarial robustness, accuracy under attack (absolute/relative drop vs. a clean baseline), and fault tolerance (degradation vs. client failures/malicious fraction; worst-case behavior and recovery time).
Lab: Simulate poisoning/backdoor scenarios; deploy defenses (Krum/median/trimmed-mean; sanitization); measure and report robustness, accuracy-under-attack, and fault-tolerance metrics; analyze trade-offs.
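The three robust aggregators deployed in the Day 5 lab can each be sketched in a few lines; these are simplified NumPy versions on flattened update vectors, not the lab's actual implementations:

```python
import numpy as np

def coordinate_median(updates):
    """Coordinate-wise median of client updates (robust to a minority of outliers)."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, trim=1):
    """Drop the `trim` largest and smallest values per coordinate, then average."""
    x = np.sort(np.stack(updates), axis=0)
    return x[trim: len(updates) - trim].mean(axis=0)

def krum(updates, f=1):
    """Krum: select the update with the smallest summed squared distance to its
    n - f - 2 nearest neighbors; tolerates up to f Byzantine clients."""
    x = np.stack(updates)
    n = len(x)
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    scores = [np.sort(d[i])[1: n - f - 1].sum() for i in range(n)]  # skip self (distance 0)
    return x[int(np.argmin(scores))]

honest = [np.array([1.0, 1.0]), np.array([1.1, 0.9]), np.array([0.9, 1.1])]
poisoned = honest + [np.array([100.0, -100.0])]  # one malicious client
```

A plain mean of the poisoned updates is dragged far off target, while all three defenses stay near the honest cluster, which is the robustness effect the lab measures.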

Brief Index
Motivation and FL architecture: cross-device vs. cross-silo; orchestration and aggregation.
Core and advanced algorithms: FedAvg, FedProx, FedOpt; evaluation criteria.
IID vs. non-IID data: taxonomy, heterogeneity metrics, and effects on convergence/generalization.
Flower: server, clients, strategies; migrating a centralized model (PyTorch/TensorFlow) to FL.
FedArtML: generating non-IID datasets (Dirichlet, label-skew, quantity-skew) and tuning heterogeneity.
Personalization and hierarchy: pFedMe, FedPer, meta-learning ideas; hierarchical FL.
Security & privacy: poisoning and backdoors; Byzantine threats; inference/leakage; secure aggregation; differential privacy.
Robust aggregation & sanitization: Krum, median, trimmed-mean; anomaly detection.
Engineering best practices: traceability, reproducibility, reporting.
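The Dirichlet label-skew partitioning listed under FedArtML can be illustrated with a short sketch. This is a simplified stand-in written by us; FedArtML's actual API differs:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, rng=None):
    """Label-skew partition: for each class, split its indices across clients
    with proportions drawn from Dirichlet(alpha). Smaller alpha -> stronger skew."""
    rng = rng or np.random.default_rng(42)
    labels = np.asarray(labels)
    parts = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet([alpha] * n_clients)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cuts)):
            parts[client].extend(chunk.tolist())
    return parts

labels = np.repeat([0, 1, 2], 100)  # 300 examples, 3 balanced classes
parts = dirichlet_partition(labels, n_clients=5, alpha=0.5)
```

The alpha knob is what makes heterogeneity controllable: alpha near 0 gives each client almost a single class, while large alpha approaches IID splits.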

Course requirements

Fundamentals of machine learning (regression/classification, neural networks).
Programming in Python and basic use of PyTorch or TensorFlow.
Probability/statistics and basic optimization.

Bibliography

Nakayama, Kiyoshi, and George Jeno. Federated Learning with Python: Design and implement a federated learning system and develop applications using existing frameworks. Packt Publishing Ltd, 2022.
Ludwig, Heiko, and Nathalie Baracaldo, eds. Federated learning: A comprehensive overview of methods and applications. Cham: Springer, 2022.
Jimenez-Gutierrez, D. M.; Falkouskaya, Y.; Hernandez-Ramos, J. L.; Anagnostopoulos, A.; Chatzigiannakis, I.; Vitaletti, A. On the Security and Privacy of Federated Learning: A Survey with Attacks, Defenses, Frameworks, Applications, and Future Directions, arXiv:2508.13730, 2025. doi: 10.48550/arXiv.2508.13730.
Jimenez G., D. M.; Solans, D.; Heikkila, M.; Vitaletti, A.; Kourtellis, N.; Anagnostopoulos, A.; Chatzigiannakis, I. Non-IID data in Federated Learning: A Survey with Taxonomy, Metrics, Methods, Frameworks and Future Directions, arXiv:2411.12377, 2024. doi: 10.48550/arXiv.2411.12377.
Gutierrez, D. M. J.; Anagnostopoulos, A.; Chatzigiannakis, I.; Vitaletti, A. FedArtML: A Tool to Facilitate the Generation of Non-IID Datasets in a Controlled Way to Support Federated Learning Research, IEEE Access, vol. 12, pp. 81004–81016, 2024. doi: 10.1109/ACCESS.2024.3410026.