What the future holds for semantics
This is a guest post from Nico Lavarini, Chief Scientist, who works in the R&D division of Expert System on new methods and technologies for delivering better knowledge management solutions to customers.
Today, more and more of the big players are diving into the field of semantics, but most of them are not yet exploiting it in their core products.
Most of the approaches you encounter actually rely on machine learning, a broad family of systems that learn from data. These systems can be set up quickly and perform well on the data they are built from. In real-world use, however, they suffer from drawbacks that make out-of-the-box functionality difficult and make reuse, or even fine-tuning, tricky if not impossible.
Instead, the analytical, deep-semantics approach we use delivers higher-quality ‘findability’. It requires strong, specific skills: grammar, morphology, advanced syntax, pragmatics, and the abstraction needed to generalize linguistic phenomena. It is a compromise between complete analysis and feasibility, and it allows quality to be tuned more freely.
Any approach carries a non-negligible error factor: after all, even humans sometimes misinterpret texts (not to mention voice commands). Moreover, on complex tasks such as syntax and sentiment analysis, human linguistic experts who manually evaluate the results of automatic analysis typically agree with one another only about 80% of the time. Because that level of agreement is effectively the ceiling against which a system can be judged, a machine that performs with 70% accuracy reaches roughly seven-eighths of it. It is doing nearly as well as humans, even though such accuracy may not sound impressive.
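To make the arithmetic explicit, the sketch below computes simple observed agreement between two annotators (the fraction of items given the same label) and then expresses a system's accuracy relative to that human ceiling. It assumes plain label-by-label agreement rather than a chance-corrected measure such as Cohen's kappa. The labels and function names are hypothetical, chosen only for illustration.

```python
def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators assigned the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


def relative_to_ceiling(system_accuracy, human_agreement):
    """Express a system's accuracy as a fraction of the human agreement ceiling."""
    return system_accuracy / human_agreement


# Hypothetical sentiment labels from two annotators on ten sentences.
annotator_1 = ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg", "neu"]
annotator_2 = ["pos", "neg", "pos", "pos", "pos", "neg", "neu", "neg", "neg", "neu"]

print(observed_agreement(annotator_1, annotator_2))  # 0.8 (8 of 10 labels match)
print(round(relative_to_ceiling(0.70, 0.80), 3))     # 0.875
```

Read this way, a system at 70% is not "30% wrong" in any absolute sense. It sits about 12.5% below what careful human judges can agree on.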
The nuances and ambiguities of human language make 100% accuracy difficult, if not impossible. Consider “Anne and Mary are mothers” and “Anne and Mary are sisters”. The two sentences have exactly the same structure. Yet in the first, each woman is a mother in her own right, while in the second the two are most naturally read as sisters of each other. A human sees the different roles at a glance, but an automatic engine would find the distinction very hard to work out. So what does the future of semantics look like?
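One way to make the contrast concrete is to write the most natural reading of each sentence in first-order logic. The predicate names `motherOf` and `sisterOf` are illustrative and not taken from any particular engine. The surface forms differ only in the noun, but the logical forms differ in shape. The first is distributive: each woman separately has some child. The second is reciprocal: the relation holds between the two women themselves.

```latex
\begin{aligned}
&\text{``Anne and Mary are mothers''}: && \exists x\,\mathrm{motherOf}(\mathit{Anne}, x) \land \exists y\,\mathrm{motherOf}(\mathit{Mary}, y)\\
&\text{``Anne and Mary are sisters''}: && \mathrm{sisterOf}(\mathit{Anne}, \mathit{Mary}) \land \mathrm{sisterOf}(\mathit{Mary}, \mathit{Anne})
\end{aligned}
```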
1. The best results will come from a combined approach. Rather than spending a lot of human effort on data analysis and knowledge management, we will let big data machines do the heavy lifting of processing large amounts of information. We will then use the extracted knowledge to drive accurate, high-quality semantic analysis. Being able to access information does not mean we understand its significance. We need knowledge (and therefore meaning), not just information. For that reason, analyzing large numbers of web documents is only the first step toward understanding what they are about, and this is where semantics is a huge differentiator.
2. The next stage of the voice assistant cycle. NLP will benefit from large-scale data processing that distills commonalities in information, identifies entities and relations, and extracts the most relevant features by drawing on large amounts of context. Combining data from large corpora with the analysis of language phenomena will improve the NLP stage of the voice assistant cycle (speech-to-text, text analysis, knowledge processing, response synthesis, text-to-speech). That workflow underlies any interaction-based system, from Siri to HAL 9000. We will also see open-domain NLP that handles question answering and sentiment analysis efficiently on unstructured information. In particular, open-domain NLP will shift industry approaches away from building specific solutions for specific cases. The shift will be toward general systems able to answer “almost any” question about “almost everything”.
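The five stages of the voice assistant cycle can be sketched as a chain of functions, each consuming the previous stage's output. Every stage below is a hypothetical stub: speech recognition and synthesis are faked with plain strings, text analysis just spots a question mark and capitalized words, and knowledge processing is a dictionary lookup. The point is only to show where the NLP stages sit in the flow, not how Siri or any real assistant is built.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Analysis:
    """Output of the (stub) text-analysis stage."""
    intent: str
    entities: List[str]


# Hypothetical toy knowledge base keyed by entity name.
KNOWLEDGE: Dict[str, str] = {"France": "the capital of France is Paris"}


def speech_to_text(audio: str) -> str:
    # Stub: pretend the "audio" is already a transcript.
    return audio.strip()


def analyze_text(text: str) -> Analysis:
    # Stub analysis: a trailing "?" marks a question; capitalized words
    # after the first token are treated as entities.
    tokens = text.rstrip("?").split()
    intent = "question" if text.endswith("?") else "statement"
    entities = [t for t in tokens[1:] if t[:1].isupper()]
    return Analysis(intent=intent, entities=entities)


def process_knowledge(analysis: Analysis) -> List[str]:
    # Stub knowledge processing: only answer questions, via dictionary lookup.
    if analysis.intent != "question":
        return []
    return [KNOWLEDGE[e] for e in analysis.entities if e in KNOWLEDGE]


def synthesize_response(facts: List[str]) -> str:
    # Turn the first retrieved fact into a sentence, or admit ignorance.
    if not facts:
        return "Sorry, I don't know."
    return facts[0][0].upper() + facts[0][1:] + "."


def text_to_speech(text: str) -> str:
    # Stub: mark the text as "spoken" instead of producing audio.
    return f"[spoken] {text}"


PIPELINE: List[Callable[[Any], Any]] = [
    speech_to_text, analyze_text, process_knowledge, synthesize_response, text_to_speech,
]


def run_assistant(audio: str) -> str:
    data: Any = audio
    for stage in PIPELINE:  # each stage feeds the next
        data = stage(data)
    return data


print(run_assistant("What is the capital of France?"))
# [spoken] The capital of France is Paris.
```

Keeping the stages as separate functions makes the argument above visible: improving the middle two stages (text analysis and knowledge processing) improves the whole cycle without touching speech recognition or synthesis.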
3. Semantic networks will go mainstream. The continuous improvement of semantic analysis engines, especially their deeper understanding of the nuances of language and of the relationships between concepts, will enable growing automation in the creation and management of semantic networks. A rich, deep semantic network is now recognized as important for improving the precision and recall of semantic engines. This creates a virtuous circle that favors a linguistics-based platform over legacy technologies: the platform becomes more intelligent and more effective over time. As a result, the network is easier to organize, enhance, expand and update, which makes it more valuable to any organization.
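At its core, a semantic network is a graph of concepts connected by typed relations. The minimal sketch below shows how automatically extracted (concept, relation, concept) triples could be merged into such a network and then queried, for example by walking up an `is_a` hierarchy. It assumes an in-memory adjacency map and illustrative relation names (`is_a`, `part_of`). The class, method, and relation names are hypothetical and do not describe Expert System's actual data model.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (source concept, relation, target concept)


class SemanticNetwork:
    """Hypothetical in-memory semantic network: concept -> relation -> targets."""

    def __init__(self) -> None:
        self._edges: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add_triples(self, triples: Iterable[Triple]) -> int:
        """Merge extracted triples into the network; return how many were new."""
        added = 0
        for source, relation, target in triples:
            targets = self._edges[source][relation]
            if target not in targets:
                targets.add(target)
                added += 1
        return added

    def related(self, concept: str, relation: str) -> Set[str]:
        # .get() avoids creating empty entries for unknown concepts.
        return set(self._edges.get(concept, {}).get(relation, set()))

    def ancestors(self, concept: str, relation: str = "is_a") -> List[str]:
        """Follow `relation` transitively, breadth-first, skipping cycles."""
        seen: List[str] = []
        frontier = deque([concept])
        while frontier:
            current = frontier.popleft()
            for parent in sorted(self.related(current, relation)):
                if parent not in seen and parent != concept:
                    seen.append(parent)
                    frontier.append(parent)
        return seen


net = SemanticNetwork()
# A first batch of (hypothetical) automatically extracted triples.
print(net.add_triples([("sparrow", "is_a", "bird"),
                       ("bird", "is_a", "animal"),
                       ("wing", "part_of", "bird")]))  # 3
# A later batch: the duplicate is ignored, only the new fact is added.
print(net.add_triples([("sparrow", "is_a", "bird"),
                       ("eagle", "is_a", "bird")]))    # 1
print(net.ancestors("sparrow"))                        # ['bird', 'animal']
```

The incremental `add_triples` step is where automation enters. Each new batch of extracted knowledge only adds what is missing, so the network can grow and be updated continuously rather than rebuilt by hand.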
For more of our predictions about where semantic technology will make the greatest impact in the coming year, read our report, 10 Trends for Semantic Technology in 2014.