Unstructured Content: The Unexploited Zone
Is unstructured data exploitation the oil & gas analytics world’s equivalent of the shale boom? After all, unstructured content has been a locked vault for years, and we finally have the technology to unlock it.
If you are in the oil and gas business, you know all too well that we as an industry are very good with structured data. It is the lifeblood of how decisions are made in this business; oil and gas is arguably the most data-driven vertical on the planet. The industry is constantly trying to answer difficult questions: Where will we drill our next well? What is my competition doing? Where is the next big play? Structured data helps oil and gas producers and operators answer questions like these every day.
Unstructured content, on the other hand, has always been a thorn in the side of data-hungry oil and gas operators. Every geologist, engineer, or landman has probably asked, at least once in their career, “If only I could turn this paragraph, or this paper, into consumable data… without hiring a sea of people to read it, understand it, and turn it into data.” Plain and simple, analyzing unstructured content requires manpower. Running that analysis in aggregate, at any significant scale, has been out of reach because the technology to extract content the way a human would has not existed.
The key to extracting meaning from unstructured text automatically is understanding and interpreting the content the way a human would. In particular, understanding words in context is what removes the ambiguity inherent in language.
Take, for example, the word “Yates”:
- Yates the well,
- Yates the field,
- Yates the formation,
- Yates the lease, and
- Yates the company.
The word is the same, but it takes on a different meaning in each case. A computer running a keyword- and statistics-based search cannot tell these uses of “Yates” apart.
A practical example from the oil and gas community: a geologist or engineer sits down with a morning coffee and needs to pull up a dataset and a few documents about a lease named Yates in Kansas. That search should not force them to weed through masses of extra results about Yates the company, Yates the formation, or, heaven forbid, Yates the field in West Texas. Keyword- and statistics-based search falls short here: you want to find information, not search for it. Deep linguistic search improves findability by increasing accuracy, saving time, and surfacing conceptually related and unexpected results.
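To make the contrast concrete, here is a toy Python sketch, assuming a small hand-made list of cue words for each sense of “Yates” (the cue lists and the names `keyword_search`, `tag_sense`, and `sense_search` are hypothetical). It is not how Expert System works; a real semantic engine relies on a full ontology and linguistic analysis rather than counting nearby words. It only shows why a plain keyword match returns every “Yates” document, while a sense-aware filter can return just the lease.

```python
import re

# Hypothetical cue words for each sense of "Yates". These lists are invented
# for illustration; a real semantic engine uses a full ontology instead.
SENSE_CUES = {
    "well": {"spud", "spudded", "drilled", "wellbore", "tubing", "completed"},
    "field": {"field", "reserves", "waterflood"},
    "formation": {"formation", "sandstone", "porosity", "permeability"},
    "lease": {"lease", "lessor", "royalty", "kansas"},
    "company": {"company", "inc", "announced", "executive"},
}


def keyword_search(docs, term):
    """Naive keyword match: every document that contains the term."""
    return [d for d in docs if term.lower() in d.lower()]


def tag_sense(doc, term="yates"):
    """Guess which sense of `term` a document uses by counting cue words."""
    words = set(re.findall(r"[a-z]+", doc.lower()))
    if term.lower() not in words:
        return None
    scores = {sense: len(cues & words) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)  # ties go to the first sense listed
    return best if scores[best] > 0 else None


def sense_search(docs, term, sense):
    """Return only the documents where `term` is used in the requested sense."""
    return [d for d in docs if tag_sense(d, term) == sense]


docs = [
    "Royalty payments on the Yates lease in Kansas were revised.",
    "Yates announced a new chief executive.",
    "Porosity in the Yates formation sandstone looks promising.",
    "The Yates field in West Texas added reserves.",
]

print(len(keyword_search(docs, "Yates")))    # every document mentions "Yates"
print(sense_search(docs, "Yates", "lease"))  # only the Kansas lease document
```

Counting cue words is the simplest possible stand-in for context; it breaks easily on short or mixed documents, which is exactly the gap a linguistic engine with a real ontology is meant to close.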
Expert System is changing the oil and gas content findability landscape with its rich ontology of the language (roughly 2,000,000 words and concepts, and about 8,000,000 relationships between them) and its enriched oil and gas ontology. Building this map of the language took Expert System more than 300 man-years of labor. With it, Expert System can read a document much as a human does, and oil and gas companies are using it to extract knowledge from previously untapped content.
If an oil and gas company’s content can be converted to text, Expert System can:
- Read it
- Extract meaning based on context
- Tag it
- Categorize it
- Enable you to act on it
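The five steps above can be pictured as a simple pipeline. The Python sketch below is a hypothetical illustration, not Expert System’s product or API: a bag-of-words cue list stands in for contextual analysis, and the category rules, the `TaggedDoc` record, and the step functions are all invented for this example. What it does show is the shape of the result: free text goes in, and a structured record with tags and a category comes out, ready to be indexed or joined to other data.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical category rules; the cue words are invented for illustration.
CATEGORY_CUES = {
    "Exploration": {"seismic", "prospect", "formation", "porosity"},
    "Competitive Intelligence": {"competitor", "announced", "permits", "acquired"},
    "Land": {"lease", "lessor", "royalty", "acreage"},
}


@dataclass
class TaggedDoc:
    doc_id: str
    text: str
    tags: set = field(default_factory=set)
    category: Optional[str] = None


def read(doc_id, raw_text):
    """Step 1: read the converted text, normalizing whitespace."""
    return TaggedDoc(doc_id, " ".join(raw_text.split()))


def extract_and_tag(doc):
    """Steps 2-3: a bag-of-words stand-in for contextual extraction and tagging."""
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    doc.tags = {w for cues in CATEGORY_CUES.values() for w in cues if w in words}
    return doc


def categorize(doc):
    """Step 4: assign the category whose cues overlap the tags the most."""
    scores = {cat: len(cues & doc.tags) for cat, cues in CATEGORY_CUES.items()}
    best = max(scores, key=scores.get)
    doc.category = best if scores[best] > 0 else None
    return doc


def act(docs):
    """Step 5: build a simple category index that other tools could query."""
    index = defaultdict(list)
    for doc in docs:
        index[doc.category or "Uncategorized"].append(doc.doc_id)
    return dict(index)


def run_pipeline(raw_docs):
    """Run every document through read -> tag -> categorize, then act."""
    docs = [categorize(extract_and_tag(read(i, t))) for i, t in raw_docs.items()]
    return docs, act(docs)


raw_docs = {
    "memo-1": "Royalty terms on the\n  Yates lease were revised.",
    "rpt-7": "Porosity in the Yates formation supports the prospect.",
    "news-3": "A competitor announced new permits.",
}
docs, index = run_pipeline(raw_docs)
print(index)
```

Keeping each step a separate function makes it easy to swap the naive tagging step for a real linguistic engine without touching the read, categorize, or act stages.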
Expert System has spent the last 10 years working with some of the most respected operators in the oil business to develop rich, scientifically specific oil and gas ontologies and taxonomies for extracting knowledge from seas of content on a massive scale. Because unstructured content is so broad by nature, the applications built around the extracted content have proven equally broad. Application categories among Expert System’s customers include:
- Exploration (Geological and Geophysical)
- Competitive Intelligence
- Geospatial (Geolocation tagging)
- Document metadata enhancement for internal knowledge sharing and searchability
- Linking to structured data
We must be careful not to overlook the potential significance of the content buried in text. A prime example from the internal operations of most oil and gas companies is the shared drive or server, sitting somewhere in the company, where people have been dumping documents for years. In some cases, an estimated 40% of mission-critical information has been unusable for aggregate analysis, simply because it is not structured. You can do the math: 40% of anything is a lot.
In conclusion, untapped unstructured content can contribute knowledge at the most micro level, helping the geophysicist choose the next well site, and at the most macro level, helping the scout gauge the competition’s intentions. It’s time for oil and gas companies to start paying attention to what unstructured content can do for them.