In the race to build ever more sophisticated deep learning models, Facebook has a secret weapon: billions of images on Instagram.
In research that the company is presenting today at F8, Facebook details how it took billions of public Instagram photos that users had annotated with hashtags and used that data to train its own image recognition models. Hundreds of GPUs ran around the clock to parse the data, and the resulting deep learning models beat industry benchmarks; the best achieved 85.4 percent accuracy on ImageNet.
If you’ve ever put a few hashtags onto an Instagram photo, you’ll know doing so isn’t exactly a research-grade process. There is generally some method to why users tag an image with a specific hashtag; the challenge for Facebook was sorting out what was relevant across billions of images.
When you’re operating at this scale — the largest of the tests used 3.5 billion Instagram images spanning 17,000 hashtags — even Facebook doesn’t have the resources to closely supervise the data. While other image recognition benchmarks may rely on millions of photos that human beings have pored over and annotated by hand, Facebook had to find ways of cleaning up what users had submitted that would work at scale.
The “pre-training” research focused on developing systems for finding relevant hashtags: that meant discovering which hashtags were synonymous while also learning to prioritize more specific hashtags over more general ones. This ultimately led to what the research group calls its “large-scale hashtag prediction model.”
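Facebook hasn’t published the exact pipeline, but the basic idea — merge synonymous hashtags into one canonical label and keep the most specific tag on a photo — can be sketched against WordNet, which also supplies the label vocabulary. This is a minimal sketch assuming NLTK’s WordNet corpus; the function names and the depth heuristic are illustrative, not Facebook’s code:

```python
# Illustrative sketch, not Facebook's pipeline. Assumes NLTK plus its
# WordNet corpus (nltk.download("wordnet")) are installed.
from nltk.corpus import wordnet as wn

def canonical_label(hashtag):
    """Map a raw hashtag to a canonical WordNet synset, merging synonyms
    ("#dog" and "#domestic_dog" land on the same synset)."""
    synsets = wn.synsets(hashtag.lstrip("#").lower())
    return synsets[0] if synsets else None  # None for tags like "#lit"

def most_specific(hashtags):
    """Of the usable hashtags on one photo, keep the most specific one:
    deeper synsets in WordNet's hypernym hierarchy are more specific,
    so "#poodle" beats "#dog" beats "#animal"."""
    labeled = [(tag, canonical_label(tag)) for tag in hashtags]
    labeled = [(tag, syn) for tag, syn in labeled if syn is not None]
    if not labeled:
        return None
    return max(labeled, key=lambda pair: pair[1].max_depth())[0]

print(most_specific(["#animal", "#dog", "#poodle"]))  # -> "#poodle"
```

At Facebook’s scale the real system has to make these calls across billions of photos, but the sketch shows why WordNet is a natural backbone: it encodes both synonymy and the general-to-specific hierarchy the researchers needed.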
The privacy implications here are interesting. On one hand, Facebook is only using what amounts to public data (no private accounts), but when a user posts an Instagram photo, how aware are they that they’re also contributing to a database that’s training deep learning models for a tech mega-corp? These are the questions of 2018, but they’re also issues that Facebook is undoubtedly growing more sensitive to out of self-preservation.
It’s worth noting that these models are centered on object-focused image recognition. Facebook won’t be able to use this data to predict who your #mancrushmonday is, and it also isn’t using the database to finally understand what makes a photo #lit. It can tell apart dog breeds, plants, food and plenty of other object categories drawn from WordNet.
The accuracy gains from using this data aren’t necessarily the impressive part here. The increases in image recognition accuracy were only a couple of points in many of the tests; what’s fascinating are the pre-training processes that turned noisy data this vast into something effective with only weak supervision. The models this data trained will be pretty universally useful to Facebook, but improved image recognition could also bring users better search and accessibility tools, as well as strengthening the company’s efforts to combat abuse on its platform.
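The recipe itself — pretrain a convnet with hashtags as weak, multi-label targets, then swap the classifier head and fine-tune on the actual benchmark — is simple to sketch. Here’s a minimal PyTorch version assuming a ResNet-50 backbone and a binary cross-entropy loss; both are stand-ins, as the actual research used far larger ResNeXt models trained across hundreds of GPUs:

```python
# Minimal sketch of hashtag pretraining followed by fine-tuning (PyTorch).
# The backbone, loss and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_HASHTAGS = 17000  # label space in the largest experiment
NUM_CLASSES = 1000    # e.g. ImageNet, the downstream benchmark

# Stage 1: weakly supervised pretraining, hashtags as multi-label targets.
net = models.resnet50(weights=None)
net.fc = nn.Linear(net.fc.in_features, NUM_HASHTAGS)
criterion = nn.BCEWithLogitsLoss()  # one photo can carry several hashtags
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

def pretrain_step(images, hashtag_targets):
    """images: (B, 3, 224, 224); hashtag_targets: (B, NUM_HASHTAGS) in {0, 1}."""
    optimizer.zero_grad()
    loss = criterion(net(images), hashtag_targets.float())
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage 2: swap the classifier head, keep the pretrained features, and
# fine-tune on the real task with ordinary supervised labels.
net.fc = nn.Linear(net.fc.in_features, NUM_CLASSES)
finetune_criterion = nn.CrossEntropyLoss()
finetune_optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
```

The point of the two-stage split is that the noisy hashtag labels only ever have to be good enough to teach the network useful visual features; the clean, human-annotated labels are saved for the much cheaper fine-tuning stage.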