One of the early innovations in the Siri application on the iPhone was the personalization of the “Hey Siri” wake phrase. More specifically, Siri was trained to respond only to you, and not to your friends or family. Apple created this personalized wake phrase because it wanted a way to keep every iPhone in a room from responding when one person uttered the phrase. You might think that Apple would need to collect a lot of your audio data in order to personalize Siri’s response to your voice and your voice alone. Surprisingly, it didn’t.

If Apple had used a traditional centralized architecture, your raw audio would have been sent to Apple’s central cloud, where engineers would have applied neural network technology similar to what was used in Stanford’s 2010 ImageNet competition. That approach, however, would have posed two significant challenges. First, having your voice in the Apple cloud runs straight into privacy concerns. Second, privacy aside, who would be willing to pay for the network bandwidth required to send your voice commands to the Apple cloud every time you wanted Siri to answer a question?

Instead, federated learning was used to improve Siri’s accuracy. Federated learning is a privacy-preserving machine-learning method first introduced by Google in 2017. It has allowed Apple to train individual copies of a speaker-recognition model across all of its users’ devices, using only the audio data available locally (i.e., on each individual device). Each device then sends just the updated model parameters (but not your audio) back to a central server, where they can be combined into a master model. In this way, the raw audio of users’ Siri requests never leaves their iPhones or iPads, while Siri is nevertheless able to continuously improve her accuracy.

So how does she do this?

Federated learning takes advantage of a decentralized architecture, which enables Siri to learn on millions of iPhones in parallel. Training on just your voice, Siri computes neural network weights. These weights are then sent to a central aggregation server, where the results of each of these parallel training sessions are combined to create a new consensus model. Each iteration of this process – parallel training, update aggregation, and distribution of new parameters – is called a federated learning round.
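To make a federated learning round concrete, here is a minimal sketch in Python with NumPy. The toy “local training” step, the simulated devices, and the simple unweighted average are illustrative assumptions for this article, not Apple’s actual implementation.

```python
import numpy as np

def local_training(global_weights, local_data):
    """Toy stand-in for on-device training: nudge the global weights
    toward statistics of this device's own (private) data."""
    update_direction = local_data.mean(axis=0) - global_weights  # illustrative update
    return global_weights + 0.1 * update_direction               # one small local step

def federated_round(global_weights, device_datasets):
    """One federated learning round: parallel local training, upload of
    weights only (never the data), and aggregation into a consensus model."""
    local_updates = [local_training(global_weights, data) for data in device_datasets]
    return np.mean(local_updates, axis=0)        # the aggregation server averages

# Three simulated devices, each with private data that never leaves "the device"
rng = np.random.default_rng(0)
device_datasets = [rng.normal(loc=i, size=(20, 4)) for i in range(3)]
weights = np.zeros(4)                            # initial global model parameters

for _ in range(5):                               # five federated learning rounds
    weights = federated_round(weights, device_datasets)
```

The only thing that crosses the “network” in this sketch is the weight vector returned by `local_training`; the arrays in `device_datasets` stay where they are.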

Federated learning is privacy-preserving because only the neural network weights, rather than the “raw data” that informs them, are shared. It is also network-preserving because what travels over the network is just a compact set of numbers, not the raw audio itself. Given these benefits, you might be left wondering why everyone isn’t switching from centralized to decentralized, federated learning. In the case of consumer applications, it’s because decentralized federated learning poses some specific challenges. Four important considerations in the application of decentralized federated learning include:

  1. Slow, intermittent communication

Federated networks are, by nature, composed of a large number of devices. In some instances (millions of smartphones, for example), this number can be massive, placing heavy demands on the network. As a result, communication in a federated network can be much slower than in classical data-center environments, and orders of magnitude slower than local computation. To account for this challenge, it is necessary to develop communication-efficient methods that iteratively send small messages or model updates as part of the training process, rather than waiting to send the dataset in its entirety over the network.
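One common communication-efficient idea (illustrative here; the sparsification scheme and parameter names are assumptions, not a description of any particular production system) is to upload only a small, compressed delta between the locally trained weights and the current global model:

```python
import numpy as np

def sparse_update(local_weights, global_weights, k=2):
    """Upload only the k largest-magnitude changes instead of the full model."""
    delta = local_weights - global_weights
    top_k = np.argsort(np.abs(delta))[-k:]            # indices of the biggest changes
    return {int(i): float(delta[i]) for i in top_k}   # a small message for the server

def apply_updates(global_weights, messages):
    """The server averages the sparse messages back into the global model."""
    new_weights = global_weights.copy()
    for message in messages:
        for i, d in message.items():
            new_weights[i] += d / len(messages)
    return new_weights

global_w = np.zeros(4)
local_w = np.array([0.5, -0.1, 0.02, 0.9])
message = sparse_update(local_w, global_w)            # e.g. {0: 0.5, 3: 0.9}
global_w = apply_updates(global_w, [message])
```

With a four-parameter toy model the savings are trivial, but the same idea applied to a model with millions of parameters is what keeps each round’s messages small.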

  2. Complicated power management

Consumer federated learning applications need to make sure they do not drain the phone’s battery, and they must also cope with users powering off their phones in the middle of training.
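In production systems (Google’s federated learning deployment is the best-documented example), the usual answer is to let a device participate only when training will not be noticed: when it is idle, charging, and on an unmetered connection. The sketch below is a hypothetical eligibility gate; the function and its inputs are illustrative, not a real iOS or Android API.

```python
def eligible_for_training(battery_level, is_charging, is_idle, on_unmetered_wifi):
    """Hypothetical eligibility check: only train when it won't hurt the user."""
    if not (is_charging or battery_level > 0.8):   # don't drain the battery
        return False
    return is_idle and on_unmetered_wifi           # don't disturb or cost the user

# A device that disappears mid-round (e.g. powered off) is simply treated as a
# dropout by the server; checkpointing local progress lets it resume later.
print(eligible_for_training(battery_level=0.4, is_charging=True,
                            is_idle=True, on_unmetered_wifi=True))   # True
```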

  3. Systems heterogeneity

The storage, computational, and communication capabilities of each device in federated networks may differ due to variability in hardware (CPU, memory), network connectivity (3G, 4G, 5G, WiFi), and power (battery level). Additionally, the network size and systems-related constraints on each device typically result in only a small fraction of the devices being active at once. For example, only hundreds of devices at any given time may be active in a million-device network. Each device may also be unreliable, and it is not uncommon for an active device to drop out at a given iteration. These system-level characteristics make issues such as stragglers and fault tolerance significantly more prevalent than in typical data center environments.
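A sketch of how a server copes with this in practice: sample only a small cohort of devices each round, and aggregate whatever comes back, treating stragglers and dropouts as missing. The sampling fraction, dropout rate, and toy training function below are illustrative assumptions, not a specific production design.

```python
import random
import numpy as np

def run_round(global_weights, device_ids, train_fn,
              sample_fraction=0.01, dropout_rate=0.3):
    """Sample a small cohort, tolerate devices that fail, average the rest."""
    cohort_size = max(1, int(len(device_ids) * sample_fraction))
    cohort = random.sample(device_ids, cohort_size)
    updates = []
    for device in cohort:
        if random.random() < dropout_rate:        # lost power or connectivity
            continue                              # straggler/failure: skip it
        updates.append(train_fn(device, global_weights))
    if not updates:                               # nobody reported back this round
        return global_weights
    return np.mean(updates, axis=0)

# Toy usage: 10,000 devices, each "training" by adding a small device-specific nudge
devices = list(range(10_000))
toy_train = lambda device, w: w + np.random.default_rng(device).normal(0, 0.01, w.shape)
weights = run_round(np.zeros(4), devices, toy_train)
```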

  4. Statistical heterogeneity

Devices frequently generate and collect data in a non-identically distributed, decentralized manner across the network. For example, mobile phone users may have varied use of language in the context of a next-word prediction task. Moreover, the number of data points across devices may vary significantly, and there may be an underlying structure present that captures the relationship among devices and their associated distributions. Simply put, in a consumer federated learning system you don’t have the luxury of each piece of data being generated and collected in the same, systematic way.
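One common mitigation for the varying amount of data per device (this is how federated-averaging-style aggregation typically weights its updates; it does not solve non-IID data by itself) is to weight each device’s contribution by the number of local examples behind it, so a phone with a handful of samples doesn’t pull the consensus model as hard as one with thousands. The numbers below are purely illustrative.

```python
import numpy as np

def weighted_aggregate(updates, example_counts):
    """Federated-averaging-style aggregation: weight each device's update
    by how many local examples produced it."""
    total = sum(example_counts)
    return sum(w * (n / total) for w, n in zip(updates, example_counts))

# Toy example: a device with 5,000 examples dominates one with only 50
updates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(weighted_aggregate(updates, example_counts=[5_000, 50]))   # ~[0.99, 0.01]
```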

Even given the challenges above, federated learning is generating significant enthusiasm because of its ability to provide privacy-preserving, network-preserving training for consumer AI applications.

AI tools and their deployment will be discussed at the in-person AIMed Global Summit scheduled for June 4-7, 2023 in San Diego, with the remainder of the week filled with exciting AI in medicine events like the Stanford AIMI Symposium on June 8th. Book your place now!

We believe in changing healthcare one connection at a time. If you are interested in the opinions in this piece, in connecting with the author, or the opportunity to submit an article, let us know. We love to help bring people together! [email protected]