In most cases it’s relatively easy to get some sense of how others are reacting or feeling in a live situation, but online or via video conference, such subtleties are much harder to detect. This week Microsoft released the public beta of an API and tool that let developers build emotion detection into their apps.
The new emotion API, debuted at Microsoft’s Future Decoded conference in London, was developed by the company’s Project Oxford team and demonstrated by Chris Bishop, head of Microsoft Research in Cambridge, U.K., during his keynote address. Microsoft revealed Project Oxford at its Build conference back in April.
Microsoft describes Project Oxford as a portfolio of REST-based APIs and SDKs that let developers add intelligence to their applications and services using machine learning technology from Microsoft Research. Among the APIs now in beta are face detection, speech and computer vision.
This week’s new tool, released to beta testers, is designed to let developers detect the eight most common emotional states: anger, contempt, fear, disgust, happiness, neutral, sadness and surprise. The states are detected by reading the kinds of facial expressions that typically convey those feelings. In a blog post, Microsoft described some scenarios where the APIs would be useful, such as developing systems that let marketers gauge reactions to a store display, movie or food, or creating apps that render options based on the emotions they recognize in a photo or video.
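To make the idea concrete, here is a minimal sketch of how an app might consume a response from such an emotion API. The JSON shape below (per-face entries with a `faceRectangle` and a `scores` map over the eight states) is an assumption for illustration, not Microsoft's documented schema:

```python
import json

# Hypothetical example of the JSON an emotion-detection REST API might return:
# one entry per detected face, with a confidence score for each emotional state.
# Field names and values here are assumptions, not Microsoft's documented schema.
sample_response = json.dumps([
    {
        "faceRectangle": {"left": 68, "top": 97, "width": 64, "height": 97},
        "scores": {
            "anger": 0.01, "contempt": 0.02, "disgust": 0.01, "fear": 0.01,
            "happiness": 0.84, "neutral": 0.08, "sadness": 0.02, "surprise": 0.01,
        },
    }
])

def dominant_emotions(response_body: str) -> list[str]:
    """Return the highest-scoring emotional state for each detected face."""
    faces = json.loads(response_body)
    return [max(face["scores"], key=face["scores"].get) for face in faces]

print(dominant_emotions(sample_response))  # ['happiness']
```

An app rendering options based on mood, as in Microsoft's photo scenario, would branch on the dominant emotion returned for each face.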
Microsoft also showcased a scenario tied to Movember, the facial-hair fundraising effort, for which the company released MyMoustache to rate facial hair. Microsoft also released a spell check API beta: a context-aware programmatic interface that can detect slang as well as proper word usage (such as when “four,” “for” or “fore” is correct). It also supports brand names and commonly used terms.
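The “four/for/fore” case shows why context matters: all three are correctly spelled words, so only the surrounding text reveals which one belongs. A real service would use a statistical language model; this toy sketch with hand-written rules (all of them assumptions, purely illustrative) conveys the idea:

```python
# Toy illustration of context-aware homophone selection, loosely inspired by
# the "four"/"for"/"fore" example. Real spell check services rely on language
# models trained on large corpora; these hand-written rules are assumptions.

def pick_homophone(prev_word: str, next_word: str) -> str:
    """Choose among four/for/fore from the neighboring words (toy rules)."""
    if next_word.isdigit() or prev_word in {"number", "chapter"}:
        return "four"  # numeric context, e.g. "chapter four"
    if prev_word == "the" and next_word == "":
        return "fore"  # e.g. "came to the fore"
    return "for"       # default: the common preposition

print(pick_homophone("thanks", "helping"))  # for
print(pick_homophone("chapter", "problems"))  # four
```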
By year’s end, Microsoft plans to release additional tools from the Project Oxford team, including a video API, based on some of the same technology found in Microsoft Hyperlapse, that can automatically clean up video. Also coming by year’s end are tools to recognize individual speakers and the Custom Recognition Intelligent Services (CRIS), which can recognize speech in noisy environments.