Microsoft releases MS MARCO dataset to train AI systems
Microsoft Corp. has made yet another big bet in its quest to help lead the development of artificial intelligence with the release of a new dataset containing 100,000 questions and answers.
Called MS MARCO, or Microsoft Machine Reading Comprehension, the dataset is being made available for researchers wishing to train their AI systems. The company says the anonymized data is based on real-world queries typed into its Bing search engine, and that the aim is to make AIs better able to understand questions in a conversational context than they are now.
Microsoft explains that while virtual assistants like Cortana and Siri are already quite adept at reciting facts and figures, such as the population of a given city or past World Series winners, they are far less comfortable with complex or ambiguous questions. For example, if someone asks a virtual assistant about the current state of the war in Syria, most will simply return a list of search engine results that the user then has to comb through to find the answer.
That simply isn’t good enough for Microsoft, which believes its dataset can be used by virtual assistants to provide more definitive answers to such questions. The idea is that instead of simply providing a page of search query results, AIs might be able to analyze those results themselves and come up with an actual answer to the question.
“In order to move toward artificial general intelligence, we need to take a step toward being able to read a document and understand it as well as a person,” said Rangan Majumder, a partner group program manager with Microsoft’s Bing search engine division who is leading the effort. “This is a step in that direction.”
Microsoft said the MS MARCO dataset contains questions that its researchers found “interesting.” The answers were drawn from existing web pages and verified for accuracy by human reviewers, with the goal of teaching AI systems to do the same thing themselves. Microsoft said the dataset is available to researchers free of charge.
The release of MS MARCO came at the end of a busy week on the AI front for Microsoft. Last Monday, the company made headlines with the announcement of a new fund for AI startups, which has already taken a startup called Element AI under its wing. Element AI, which is based in Montreal, is working to build commercial-grade AI systems and to support local startups trying to apply neural networks in new fields.
Also last week, Microsoft announced a preview of the Cortana Skills Kit and Devices SDK, which are designed for manufacturers that want to integrate Cortana into various smart hardware devices, from cars to home appliances.
With the Cortana Devices SDK, Microsoft is hoping to take on Amazon.com, Inc.’s Alexa-powered Dot and Echo devices, as well as Google Inc.’s smart home speaker, Google Home. To do so, Microsoft is collaborating with Harman Kardon, a brand under Harman International Industries Inc., to create an Amazon Echo-like device that’s integrated with Cortana’s AI capabilities.
Source: http://siliconangle.com/blog/2016/12/18/microsoft-releases-ms-marco-dataset-train-ai-systems/