AI’s Next Great Challenge: Understanding the Nuances of Language

Illustration by James Yang

Language is a uniquely human capability and the manifestation of our intelligence. But through AI — specifically natural language processing (NLP) — we are providing machines with language capabilities, opening up a new realm of possibilities for how we’ll work with them.

Today you can walk into a darkened living room and ask Alexa to turn the smart lights up to a pleasant 75% brightness. Or, you can summon information about weather conditions on the other side of the world. The progress the industry has made was on display in Google’s recent demo of Duplex, in which an AI agent called businesses and booked appointments. What once seemed like science fiction is now reality, but to maintain a truly copacetic human-machine relationship, machines must be able to hold more intuitive, contextual, and natural conversations — something that remains a challenge. I’ve spent my career focusing on NLP, a research area nearly as old as AI itself, and we’re still in the beginning phase of this journey.

Language is how we share information and connect with those around us, but to make use of it, machines need to understand its intricacies and how we as humans communicate. Advances in sentiment analysis, question answering, and joint multi-task learning are making it possible for AI to better understand humans and the way we communicate.

Sentiment Analysis

Language is inherently difficult. It evolves constantly, it’s highly nuanced, and it takes the average person years to master. With sentiment analysis, we can use AI to understand certain things about a given statement, such as whether a brand mention or film review is positive, negative, or neutral. But we can also figure out things like the attitude and intentions of the speaker (Is she angry? Happy? Surprised? Ready to buy?). From customer service to online community moderation to algorithmic trading, it’s tremendously valuable to businesses to be able to understand public sentiment toward a brand by analyzing thousands of tweets or hundreds of product reviews instantly.

Sentiment analysis has been around for a while, but it hasn’t always been very accurate. That is changing with advances in NLP. At Salesforce, where I work as chief scientist, our Einstein AI services allow brands to get real-time analysis of sentiment in emails, social media, and chat text in order to provide better customer experiences. Accurate sentiment analysis gives service agents insight into, for instance, which dissatisfied customers to help first or whom to extend promotional offers to. It’s also possible to identify product deficiencies, measure overall product satisfaction, and even monitor the perception of the brand across social media channels. Other tech companies offer similar services.

We also need context. Suppose you have a soap business and somebody tweets “This soap is really great for babies.” That could be seen as a positive endorsement of the soap for children, or sarcastically imply that it’s terrible for kids. There’s so much context tied up in that statement — and it’s a pretty simple statement! Teaching AI to parse all the possible meanings of a sentence construction and understand which one a person intends in a given context is one of the great challenges in NLP research. It requires both labeled data to improve model training and new models that can learn context and share knowledge across many different kinds of tasks simultaneously.
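To make the context problem concrete, here is a minimal sketch of the simplest kind of sentiment analysis: counting words from a small lexicon. The word lists and the function name classify_sentiment are hypothetical and chosen only for illustration; this is not how Einstein or any production service works. Note that it labels the soap tweet as positive, because counting words cannot tell a sincere endorsement from a sarcastic one.

```python
import re

# Hypothetical, tiny sentiment lexicon for illustration only.
# Real systems learn from labeled data rather than fixed word lists.
POSITIVE = {"great", "good", "love", "excellent", "happy"}
NEGATIVE = {"terrible", "bad", "hate", "awful", "angry"}


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# The soap tweet scores as positive; sarcasm is invisible to a word count.
print(classify_sentiment("This soap is really great for babies."))
print(classify_sentiment("The delivery was awful and I hate the smell."))
print(classify_sentiment("The package arrived on Tuesday."))
```

Closing the gap between this kind of word counting and what the writer actually meant is where labeled data and context-aware models come in.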

Question Answering

As NLP gets better at parsing the meaning of text, the intelligence of digital assistants helping to manage our lives will improve. Applications like Siri and Google Assistant already provide pretty good answers to common questions and execute fairly simple commands. Ideally, though, we should be able to ask our computers arbitrary questions and still get good answers.

One way to provide better answers is to make sure the computer understands the question. If you ask “When will my plane arrive?” how does the computer know whether you’re talking about your flight or the woodworking tool you ordered from Amazon? Computers are getting better at guessing our meaning through a deeper understanding of semantics, plus a smarter use of contextual data. With NLP, we’re figuring out how to learn each of these layers of context so that AI can process all of it at once and not miss vital information.
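As a rough illustration of how contextual data can settle the meaning of “plane,” the sketch below picks the sense whose cue words overlap most with the question and the user’s recent activity. The cue lists, the function names, and the sample context strings are all assumptions made up for this example; modern systems learn such associations from data rather than from hand-written lists.

```python
import re
from typing import List, Set

# Hypothetical cue words for each sense of "plane"; illustrative only.
SENSE_CUES = {
    "flight": {"flight", "airport", "boarding", "airline", "gate", "ticket"},
    "woodworking tool": {"order", "ordered", "shipping", "woodworking", "tool", "blade"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def guess_sense(question: str, recent_context: List[str]) -> str:
    """Pick the sense whose cue words overlap most with the question plus context.

    With no overlapping cues at all, the first sense listed wins by default.
    """
    words = tokenize(question)
    for item in recent_context:
        words |= tokenize(item)
    return max(SENSE_CUES, key=lambda sense: len(SENSE_CUES[sense] & words))


question = "When will my plane arrive?"
print(guess_sense(question, ["Your boarding pass for the airline flight is ready"]))
print(guess_sense(question, ["You ordered a woodworking tool with free shipping"]))
```

The same question yields a different reading depending on what surrounds it, which is the point: the words of the question alone are not enough.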

For example, a dynamic coattention network can interpret a single document differently depending on the question it’s asked, such as “Which team represented the NFC in Super Bowl 50?” or “Who scored the touchdown in the fourth quarter?” With this conditional interpretation, the network can then iteratively hypothesize multiple answers in order to arrive at the best, most accurate result.

Joint Multi-Task Learning

The scientific community is good at building AI models that perform a single task really well. But more intuitive, conversational, and contextual interfaces will require an AI model that learns continuously — integrating new tasks with old tasks and learning to perform ever-more-complex ones in the process. This is true of AI in general, but particularly true when it comes to language, which requires flexibility.

The question “Who are my customers?” presents a simple enough task: Create a list of customers. But what about the question “Who are my best customers in the Pacific Northwest for a particular product?” Now we’ve added a layer of complexity that requires a number of integrated tasks to answer qualifying questions, such as: How is “best” defined? Where is each customer located? What factors contribute to one customer being interested in one product versus another? By adding one item to the query, the complexity of the question increases dramatically.
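The sub-questions above can be sketched as separate steps that a system would have to chain together. Everything in this example is hypothetical: the customer records, the choice of which states count as the Pacific Northwest, and above all the decision to define “best” as highest revenue, which is exactly the kind of assumption a real system would need to resolve with the user.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Customer:
    name: str
    state: str
    product: str
    revenue: float


# Assumption: "Pacific Northwest" is taken to mean these states.
PACIFIC_NORTHWEST = {"WA", "OR", "ID"}


def best_customers(customers: List[Customer], product: str, top_n: int = 2) -> List[Customer]:
    """Filter by region and product, then rank by revenue.

    Ranking by revenue is one assumed definition of "best" among many.
    """
    in_scope = [
        c for c in customers
        if c.state in PACIFIC_NORTHWEST and c.product == product
    ]
    return sorted(in_scope, key=lambda c: c.revenue, reverse=True)[:top_n]


# Made-up records for illustration.
customers = [
    Customer("Acme", "WA", "soap", 1200.0),
    Customer("Birch", "OR", "soap", 3400.0),
    Customer("Cedar", "CA", "soap", 9000.0),
    Customer("Dune", "ID", "lotion", 5000.0),
    Customer("Elm", "WA", "soap", 800.0),
]

print([c.name for c in best_customers(customers, "soap")])  # ['Birch', 'Acme']
```

Each step is trivial once it is spelled out in code. The hard part for an AI is getting from the one-line question to these steps without being told.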

Salesforce Research recently created the Natural Language Decathlon, a challenge that leverages the power of question answering to tackle 10 of NLP’s toughest tasks in a single model: question answering, machine translation, summarization, natural language inference, sentiment analysis, semantic role labeling, relation extraction, goal-oriented dialogue, database query generation, and pronoun resolution. Because a multitask question-answering model poses each task as a form of question answering, a single model can jointly learn and process different tasks without any task-specific parameters or modules. Not only does this mean data scientists no longer have to build, train, and optimize individual models for each task, but it also means the model will have zero-shot learning capability: in other words, it can tackle tasks that it has never seen before or been specifically trained to do.
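To show what posing each task as a form of question answering can look like, here is a minimal sketch that wraps different tasks in the same question, context, and answer shape. The question wordings, the QAExample type, and the helper as_question_answering are illustrative assumptions, not the exact format or code used in the Natural Language Decathlon.

```python
from typing import NamedTuple


class QAExample(NamedTuple):
    question: str  # names the task in plain language
    context: str   # the text the task operates on
    answer: str    # the desired output, always expressed as text


# Hypothetical question wordings, one per task; illustrative only.
TASK_QUESTIONS = {
    "sentiment analysis": "Is this review positive or negative?",
    "summarization": "What is the summary?",
    "machine translation": "What is the translation from English to German?",
}


def as_question_answering(task: str, text: str, target: str) -> QAExample:
    """Wrap a task-specific example in the shared question-answering shape."""
    return QAExample(TASK_QUESTIONS[task], text, target)


examples = [
    as_question_answering("sentiment analysis", "This soap is really great for babies.", "positive"),
    as_question_answering("machine translation", "Good morning.", "Guten Morgen."),
]

for ex in examples:
    print(ex)
```

Because every example has the same shape, one model can in principle be trained on all of them at once, and a new task can be attempted simply by asking a new question.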

As researchers continue to improve models like this one, we’ll see AI interfaces grow smarter as they take on more-complex tasks.

We’ve been working on NLP for a long time, but it’s still early days. The hope is that as NLP improves, it will allow AI to change everything about how we interact with our machines.