
Federated Learning, Part 1: The Basics of Training Models Where the Data Lives

I first came across the concept of federated learning (FL) through a comic by Google in 2019. It was a brilliant piece and did a great job of explaining how products can improve without sending user data to the cloud. Lately, I have been wanting to understand the technical side of this field in more detail. Training data has become an important commodity because it is essential for building good models, yet a lot of it goes unused because it is fragmented, unstructured, or locked inside silos.

As I started exploring this field, I found the Flower framework to be the most straightforward and beginner-friendly way to get started with FL. It is open source, the documentation is clear, and the community around it is very active and helpful. It is one of the reasons for my renewed interest in this field.

This article is the first part of a series where I explore federated learning in more depth, covering what it is, how it is implemented, the open problems it faces, and why it matters in privacy-sensitive settings. In the next instalments, I will go deeper into practical implementation with the Flower framework, discuss privacy in federated learning and examine how these ideas extend to more advanced use cases.

When Centralised Machine Learning is not ideal

We know AI models depend on large amounts of data, yet much of the most useful data is sensitive, distributed, and hard to access. Think of data inside hospitals, phones, cars, sensors, and other edge systems. Privacy concerns, local rules, limited storage, and network limits make moving this data to a central place very difficult or even impossible. As a result, large amounts of valuable data remain unused. In healthcare, this problem is especially visible. Hospitals generate tens of petabytes of data every year, yet studies estimate that up to 97% of this data goes unused.

Traditional machine learning assumes that all training data can be collected in one place, usually on a centralised server or data centre. This works when data can be freely moved, but it breaks down when data is private or protected. In practice, centralised training also depends on stable connectivity, enough bandwidth, and low latency, which are difficult to guarantee in distributed or edge environments.

In such cases, two common choices appear. One option is to not use the data at all, which means valuable information remains locked inside silos.

The other option is to let each local entity train a model on its own data and share only what the model learns, while the raw data never leaves its original location. This second option forms the basis of federated learning, which allows models to learn from distributed data without moving it. A well-known example is Google Gboard on Android, where features like next-word prediction and Smart Compose run across hundreds of millions of devices. 

Federated Learning: Moving the Model to the Data

Federated learning can be thought of as a collaborative machine learning setup where training happens without collecting data in one central place. Before looking at how it works under the hood, let’s see a few real-world examples that show why this approach matters in high-risk settings, spanning domains from healthcare to security-sensitive environments.

Healthcare

In healthcare, federated learning enabled early COVID screening through Curial AI, a system trained across multiple NHS hospitals using routine vital signs and blood tests. Because patient data could not be shared across hospitals, training was done locally at each site and only model updates were exchanged. The resulting global model generalized better than models trained at individual hospitals, especially when evaluated on unseen sites.

Medical Imaging

A Nature-published study on retinal foundation models, highlighting how large-scale medical imaging models can be trained on sensitive eye data

Federated learning is also being explored in medical imaging. Researchers at UCL and Moorfields Eye Hospital are using it to fine-tune large vision foundation models on sensitive eye scans that cannot be centralized. 

Defense

Beyond healthcare, federated learning is also being applied in security-sensitive domains such as defense and aviation. Here, models are trained on distributed physiological and operational data that must remain local. 

Different types of Federated Learning

At a high level, federated learning can be grouped into a few common types based on who the clients are and how the data is split.

• Cross-Device vs Cross-Silo Federated Learning

Cross-device federated learning involves a very large number of clients, potentially millions, such as personal devices or phones, each with a small amount of local data and unreliable connectivity. Only a small fraction of these devices participate in any given round. Google Gboard is a typical example of this setup.

Cross-silo federated learning, on the other hand, involves a much smaller number of clients, usually organizations like hospitals or banks. Each client holds a large dataset and has stable compute and connectivity. Most real-world enterprise and healthcare use cases look like cross-silo federated learning.

• Horizontal vs Vertical Federated Learning

Visualization of Horizontal and Vertical Federated learning strategies

In horizontal federated learning, all clients share the same feature space, but each holds different samples. For example, multiple hospitals may record the same medical variables, but for different patients. This is the most common form of federated learning.

Vertical federated learning applies when clients share the same set of entities but hold different features. For example, a hospital and an insurance provider may both have data about the same individuals, but with different attributes. Training in this case requires secure coordination because the feature spaces differ, and this setup is less common than horizontal federated learning.
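
To make the two splits concrete, here is a toy sketch with entirely made-up records (hypothetical values, just to show the shape of the data at each client):

# Horizontal FL: clients share the same features (columns) but hold different samples (rows)
hospital_a = {"patient_id": [1, 2, 3], "age": [54, 61, 47], "blood_pressure": [130, 142, 118]}
hospital_b = {"patient_id": [4, 5],    "age": [70, 38],     "blood_pressure": [151, 122]}

# Vertical FL: clients describe the same samples (rows) but with different features (columns)
hospital = {"patient_id": [1, 2, 3], "age": [54, 61, 47], "blood_pressure": [130, 142, 118]}
insurer  = {"patient_id": [1, 2, 3], "num_claims": [0, 3, 1], "annual_premium": [120.0, 310.0, 180.0]}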

These categories are not mutually exclusive. A real system is often described using both axes, for example, a cross-silo, horizontal federated learning setup.

How Federated Learning works 

Federated learning follows a simple, repeated process coordinated by a central server and executed by multiple clients that hold data locally, as shown in the diagram below.

Visualizing a Federated learning loop

Training in federated learning proceeds through repeated federated learning rounds. In each round, the server selects a small random subset of clients, sends them the current model weights, and waits for updates. Each client trains the model locally using stochastic gradient descent, usually for several local epochs on its own batches, and returns only the updated weights. At a high level, the process follows these five steps:

1. Initialisation

A global model is created on the server, which acts as the coordinator. The model may be randomly initialized or start from a pretrained state.

2. Model distribution

In each round, the server selects a set of clients (based on random sampling or a predefined strategy) to take part in training and sends them the current global model weights. These clients can be phones, IoT devices, or individual hospitals.

3. Local training

Each selected client then trains the model locally using its own data. The data never leaves the client, and all computation happens on device or within an organization such as a hospital or a bank.

4. Model update communication

After local training, clients send only the updated model parameters (weights or gradients) back to the server, while raw data is never shared at any point.

5. Aggregation

The server aggregates the client updates to produce a new global model. While Federated Averaging (FedAvg) is a common approach for aggregation, other strategies are also used. The updated model is then sent back to clients, and the process repeats until convergence.

Federated learning is an iterative process and each pass through this loop is called a round. Training a federated model usually requires many rounds, sometimes hundreds, depending on factors such as model size, data distribution and the problem being solved.
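
To make the loop concrete, here is a minimal, self-contained sketch of the round structure with toy numbers. Note that local_train below is only a stand-in that perturbs the received weights; in a real system each client would run actual training (for example, a few epochs of SGD) on its own data.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: five clients, each with a made-up local dataset size
client_sizes = [50, 150, 100, 300, 400]
num_clients = len(client_sizes)

def local_train(global_weights):
    # Stand-in for real local training: perturb the received weights slightly
    return global_weights + 0.01 * rng.standard_normal(global_weights.shape)

# Step 1: initialise the global model on the server
global_weights = np.zeros(3)

num_rounds = 5
clients_per_round = 3

for t in range(num_rounds):
    # Step 2: select a subset of clients and send them the current weights
    selected = rng.choice(num_clients, size=clients_per_round, replace=False)

    # Steps 3 and 4: each selected client trains locally and returns only its weights
    updates = [(local_train(global_weights), client_sizes[k]) for k in selected]

    # Step 5: aggregate with a weighted average (FedAvg) to form the new global model
    m_t = sum(n_k for _, n_k in updates)
    global_weights = sum((n_k / m_t) * w_k for w_k, n_k in updates)

    print(f"round {t + 1}: global weights = {global_weights}")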

Mathematical Intuition behind Federated Averaging

The workflow described above can also be written more formally. The figure below shows the original Federated Averaging (FedAvg) algorithm from Google’s seminal paper. This formulation demonstrated that federated learning can work in practice and became the reference point for most federated learning systems today.

The original Federated Averaging algorithm, showing the server–client training loop and weighted aggregation of local models | Source: Communication-Efficient Learning of Deep Networks from Decentralized Data

At the core of Federated Averaging is the aggregation step, where the server updates the global model by taking a weighted average of the locally trained client models. This can be written as:

Mathematical representation of the Federated Averaging algorithm
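
In symbols, and matching the comment in the NumPy snippet below, the server computes

w_{t+1} = Σ_{k ∈ S_t} (n_k / m_t) · w_{t+1}^k,   where m_t = Σ_{k ∈ S_t} n_k

Here S_t is the set of clients selected in round t, n_k is the number of training samples held by client k, and w_{t+1}^k are the model weights returned by client k after local training.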

This equation makes it clear how each client contributes to the global model. Clients with more local data have a larger influence, while those with fewer samples contribute proportionally less. In practice, this simple idea is the reason why FedAvg became the default baseline for federated learning.

A simple NumPy implementation

Let’s look at a minimal example where five clients have been selected. For the sake of simplicity, we assume that each client has already finished local training and returned its updated model weights along with the number of samples it used. Using these values, the server computes a weighted sum that produces the new global model for the next round. This mirrors the FedAvg equation directly, without introducing training or client-side details.

import numpy as np

# Client models after local training (w_{t+1}^k)
client_weights = [
    np.array([1.0, 0.8, 0.5]),     # client 1
    np.array([1.2, 0.9, 0.6]),     # client 2
    np.array([0.9, 0.7, 0.4]),     # client 3
    np.array([1.1, 0.85, 0.55]),   # client 4
    np.array([1.3, 1.0, 0.65])     # client 5
]

# Number of samples at each client (n_k)
client_sizes = [50, 150, 100, 300, 4000]

# m_t = total number of samples across selected clients S_t
m_t = sum(client_sizes)  # 50 + 150 + 100 + 300 + 4000 = 4600

# Initialize global model w_{t+1}
w_t_plus_1 = np.zeros_like(client_weights[0])

# FedAvg aggregation:

# w_{t+1} = sum_{k in S_t} (n_k / m_t) * w_{t+1}^k
# i.e. (50/4600) * w_1 + (150/4600) * w_2 + ...

for w_k, n_k in zip(client_weights, client_sizes):
    w_t_plus_1 += (n_k / m_t) * w_k

print("Aggregated global model w_{t+1}:", w_t_plus_1)
-------------------------------------------------------------
Aggregated global model w_{t+1}: [1.27173913 0.97826087 0.63478261]

How the aggregation is computed

Just to put things into perspective, we can expand the aggregation step for just two clients and see how the numbers line up.
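
For instance, take just the first two clients from the snippet above, with n_1 = 50 and n_2 = 150, so m_t = 200:

w_{t+1} = (50/200) · [1.0, 0.8, 0.5] + (150/200) · [1.2, 0.9, 0.6]
        = 0.25 · [1.0, 0.8, 0.5] + 0.75 · [1.2, 0.9, 0.6]
        = [1.15, 0.875, 0.575]

Client 2 holds three times as much data as client 1, so the aggregated weights land much closer to client 2’s values.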

Challenges in Federated Learning Environments

Federated learning comes with its own set of challenges. A major issue in practice is that the data across clients is often non-IID (not independent and identically distributed). Different clients may see very different data distributions, which can slow training and make the global model less stable. For instance, hospitals in a federation can serve different populations whose data follow different patterns.
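
As a rough illustration (a hypothetical split, not taken from the examples above), a Dirichlet partition is a common way to simulate this kind of label skew in federated learning experiments; a small concentration parameter gives each client a very different class mix:

import numpy as np

rng = np.random.default_rng(42)
num_classes, num_clients = 3, 4

# Hypothetical label proportions per client; alpha = 0.3 produces heavily skewed splits
label_proportions = rng.dirichlet(alpha=[0.3] * num_classes, size=num_clients)

for k, proportions in enumerate(label_proportions):
    print(f"client {k}: class mix = {np.round(proportions, 2)}")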

Federated systems can involve anything from a few organizations to millions of devices, and managing participation, dropouts, and aggregation becomes more difficult as the system scales.

While federated learning keeps raw data local, it does not fully solve privacy on its own. Model updates can still leak private information if not protected, so extra privacy methods are often needed. Finally, communication can become a bottleneck, since networks may be slow or unreliable and sending frequent updates can be costly.

Conclusion and what’s next

In this article, we looked at how federated learning works at a high level and walked through a simple NumPy implementation of the aggregation step. However, instead of writing the core logic by hand, frameworks like Flower provide a simple and flexible way to build federated learning systems. In the next part, we’ll let Flower do the heavy lifting so that we can focus on the model and the data rather than the mechanics of federated learning. We’ll also have a look at federated LLMs, where model size, communication cost, and privacy constraints become even more important.


Note: All images, unless otherwise stated, are created by the author.


