Cortana Everywhere: Speech-Driven Digital Assistants As The New Universal UI


When the internet was younger, the search box was promoted as a kind of universal user interface. That was the mantra and marketing message of the former enterprise search company Fast Search & Transfer, which Microsoft acquired in 2008.

With the proliferation of smartphones, smartwatches and, soon, numerous other connected devices (the so-called “Internet of Things”), the familiar search box, along with the mouse and keyboard, will necessarily give way to speech, or perhaps to digital assistants such as Cortana, as a successor UI.

Microsoft is seeking to enable this with its ambitious Project Oxford, a set of APIs and SDKs that let developers add a range of advanced capabilities to their applications.

These tools include facial recognition APIs (most recently demonstrated in a viral Microsoft demo), speech-to-text and text-to-speech processing in multiple languages, image recognition to identify objects and images, and what the company calls “Language Understanding Intelligent Service” (LUIS). Unlike the others, LUIS, which promises advanced natural language understanding and “intent detection,” is invitation-only for now.

[Image: Project Oxford capabilities. Source: Microsoft]

An article in Ars Technica goes into all this in some (technical) detail. But here’s the most important part of the article:

The longterm result could be that developers of all sorts of devices could build speech and computer vision into their products, delivering the equivalent of Cortana on everything from televisions to assembly line equipment to household automation systems. All such implementations would be customized to specific tasks and backed by cloud-based artificial intelligence.

This is what LUIS integration promises for developers:

Create models for your application to better understand intents like “turn on the lights”, or entities such as “start a new jog/walk/hike/bikeride”. Tune your model with in-depth performance visualizations . . .

Use the pre-built, world-class models to recognize entities like places, times, numbers, temperatures, and to also handle common requests like “set an alarm for 8 AM”. Immediately enable personal assistant functionalities by using a selection of Cortana understanding models.
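To make the distinction between an "intent" and an "entity" concrete, here is a minimal toy sketch in Python. It is not the LUIS API and does not reflect how Microsoft's models work; it uses hand-written regular expressions where LUIS would use trained models. The intent names (TurnOnLights, SetAlarm, StartActivity), the ParsedUtterance type and the parse function are all hypothetical, chosen only to mirror the example phrases quoted above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedUtterance:
    """Result of parsing one utterance: what the user wants, plus details."""
    intent: str
    entities: dict = field(default_factory=dict)


# Hypothetical intent names and patterns, for illustration only.
# A real service would rely on trained language models, not regexes.
INTENT_PATTERNS = [
    ("TurnOnLights", re.compile(r"\bturn on the lights\b", re.I)),
    (
        "SetAlarm",
        re.compile(
            r"\bset an alarm for (?P<time>\d{1,2}(?::\d{2})?\s*(?:AM|PM))\b",
            re.I,
        ),
    ),
    (
        "StartActivity",
        re.compile(r"\bstart a new (?P<activity>jog|walk|hike|bike ?ride)\b", re.I),
    ),
]


def parse(utterance: str) -> ParsedUtterance:
    """Return the first matching intent and any named entities it captured."""
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(utterance)
        if match:
            return ParsedUtterance(intent, match.groupdict())
    # No pattern matched: report an explicit "None" intent.
    return ParsedUtterance("None")


if __name__ == "__main__":
    for text in ("Set an alarm for 8 AM", "Please turn on the lights", "Hello"):
        print(text, "->", parse(text))
```

The intent is the action the user wants (set an alarm), while the entities are the details that action needs (the time). An app receiving this structured result can act on it without ever showing a search box.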

The bottom line is that LUIS essentially enables Bing’s “intelligence engine” and Cortana-like capabilities to extend to any app on any device.

At its Build developer conference earlier this month, Microsoft also announced deeper app integration into Cortana for task completion. Actions and tasks can thus be accomplished (e.g., calling an Uber car) without launching the app in question. Google is doing something similar by integrating action buttons (e.g., “buy”) and transactional capabilities into search and Google Now.

The larger point here is that what we call “search” today is going to change radically over the next decade. Its impact on marketers and disciplines such as SEO will be profound, though exactly how is not entirely clear at this point. We can project, however, that the familiar “query in a box” and its related SERP will become a less and less common way for people to retrieve and interact with content over time.

