
Semantics & Folding Proteins – More in Common Than You Might Think

In science we have tackled great problems.  Only a few years ago we mapped the human genome.  Imagine unlocking the code of what makes us human.  More recently, scientists have been studying how proteins operate or, more precisely, how they fold.  It is in the folding that we learn what a protein is intended for and what job it is supposed to do.  Once we unlock this we will know how diseases form and replicate and, most importantly, how to beat them… all of them.

So what does the information science of semantics have to do with proteins?  Semantics fold too.  That’s what.

Scientists studying proteins that fold are discovering their most important and elemental attributes.
The same is true with semantics.  Boil a sentence down to its most elemental parts and you get what is called a triple: a subject, a predicate and an object.  So consider the sentence below:

“John works in the White House”.

Subject:  Who or what does the sentence describe?  Obviously, that would be “John”.
Predicate:  What is the property that describes or connects the subject to the rest of the sentence?  That would be the verb “works”.
Object:  What is the value of the property?  That would be “White House”.
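
To make the idea concrete, here is a minimal sketch of how such a triple could be held in a program.  The name `Triple` and its field names are my own choices for illustration, not part of any standard, and the values are simply the ones worked out by hand above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement taken from a sentence."""
    subject: str
    predicate: str
    obj: str  # named "obj" because "object" is already a Python builtin


# The triple worked out by hand above for "John works in the White House".
john = Triple(subject="John", predicate="works", obj="White House")

print(john.subject, "-", john.predicate, "-", john.obj)
```

Nothing here reads the sentence for us; it only shows the shape of the result once the three parts have been identified.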

So that example is pretty easy.  What about a longer sentence?  Something like this:

“John, a favorite of President Obama from his days in Chicago,
now works as public liaison in the White House”.

Now the job is tougher.   It is clear John is still the subject of the sentence.  It might be tempting to assign “favorite” as the predicate since it connects John to President Obama.  But the commas tell us that this is really a parenthetical description of John (an appositive) and not the central action of the sentence.  So we are left with “works” as the predicate.  But what does “works” connect to?  Is it “public liaison” or “White House”?  The stronger connection is “public liaison” since this describes the kind of work John does.  The White House is just the location of that work, so it is nothing more than a qualifier.

When we learned to read as children we were taught to reason through example sentences pretty much as I just described.  Of course you don’t think about it very deeply; the understanding of the sentence, the essence of it, comes naturally:  John – works – public liaison.  The rest just colors these most important facts.

Semantics is the information science of establishing meaning over text without human intervention, and this includes establishing the triple of any sentence.  This is also the idea behind what is called the Semantic Web or Web 3.0.  In a diagram, this basic notion is sometimes represented like this:

You will note this diagram looks much like cells or proteins linked together.  There is a reason for that.  Like proteins that fold and match up along their common edges in order to do their work, semantic triples link up along what they share.  Switching to a protein example now, let’s consider these two sentences:
1.    Protein X adds two molecules of zinc to the cell for each molecule of oxygen.
2.    Protein Y adds one molecule of copper to the cell for each molecule of iron.

Our diagram now looks like the following:

So what happened?  Each sentence has its own triple.  But they have a common predicate of “adds”.  So we can diagram two subjects and two objects but with a common predicate.

Just like proteins that fold and combine to make something new, we have done the same here in the science of semantics.  Because we boiled the sentences down to triples and stored them in a place that can be queried, we can ask for all predicates that match “add(s)”.
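
As a rough sketch of what "a place that can be queried" might look like, the snippet below keeps triples in a plain Python list and filters them by predicate.  The triples are written by hand, with each object simplified to the molecule being added, and the helper `match` is a hypothetical name of my own; a real system would more likely use a dedicated triple store.

```python
# Hand-built triples; the objects are simplified to the molecule being added.
store = [
    ("Protein X", "adds", "zinc"),
    ("Protein Y", "adds", "copper"),
    ("John", "works", "public liaison"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that was supplied."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Ask for everything whose predicate is "adds", whatever the subject or object.
adders = match(store, predicate="adds")
for s, p, o in adders:
    print(s, p, o)
```

The John triple is left out of the result because its predicate is "works", which is the point: the match is on the role a word plays, not just on its presence.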

Why is this important?  It gives scientists, researchers, business professionals and citizens a chance to tap into and glean true meaning from their documents, email or the web. This is far different from a Google-like keyword match. The word “add(s)” certainly matched, but it was the word’s role that also matched.
But what if the author of sentence (1) did not use the word “adds” but instead used the word “increases”?  A keyword match would fail here.  But semantics can also understand that “add” and “increase” are related, and so the query would result in the same scientific discovery of proteins that add/increase molecules.
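
One simple way to picture this is a lookup table that maps related words onto a single canonical predicate before comparing.  The table below is a hypothetical one written by hand purely for illustration; a real system would presumably draw on a thesaurus or ontology rather than a short dictionary.

```python
# Hypothetical, hand-written table: surface forms mapped to one canonical predicate.
CANONICAL = {
    "add": "add", "adds": "add", "added": "add",
    "increase": "add", "increases": "add", "increased": "add",
}


def normalize(predicate):
    """Map a predicate to its canonical form, or leave it as-is if unknown."""
    word = predicate.lower()
    return CANONICAL.get(word, word)


store = [
    ("Protein X", "increases", "zinc"),  # sentence (1), reworded
    ("Protein Y", "adds", "copper"),     # sentence (2), unchanged
]

# A plain keyword match only finds the triple that literally says "adds".
keyword_hits = [t for t in store if t[1] == "adds"]

# Comparing canonical forms finds both.
semantic_hits = [t for t in store if normalize(t[1]) == normalize("adds")]

print(len(keyword_hits), len(semantic_hits))  # 1 2
```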

Now let’s change the subject of sentence (2) from Protein Y to Protein X.  A more restrictive query on a store of triples, one that asks for both subject and predicate matches, would result in a diagram like the one below.

Again, why is this important?  Because now a scientist can rely on the smarts built into such a search index to deliver every statement in which Protein X adds/increases [some kind of] molecule in a cell.  The interesting thing for the scientist will be to group and sort the kinds of molecules that are added to the cell.
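
Here is a small sketch of that more restrictive query followed by the grouping and sorting step.  The first two triples come from the example sentences, simplified to the molecule being added; the last two are hypothetical extras I have added so there is something to group and something to exclude.

```python
from collections import Counter

store = [
    ("Protein X", "adds", "zinc"),
    ("Protein X", "adds", "copper"),
    ("Protein X", "adds", "zinc"),   # hypothetical: same fact from a second document
    ("Protein Z", "adds", "iron"),   # hypothetical: different subject, must not match
]

# Restrictive query: both the subject and the predicate have to match.
hits = [(s, p, o) for (s, p, o) in store if s == "Protein X" and p == "adds"]

# Group the matching triples by the molecule they mention, then list them sorted.
molecule_counts = Counter(o for _, _, o in hits)
for molecule in sorted(molecule_counts):
    print(molecule, molecule_counts[molecule])
```

Counting how often each molecule turns up is only one possible grouping; the same list of hits could just as easily be grouped by source document or date if the triples carried that information.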

This is real discovery in science.  It is semantics that gets language out of the way.  It is semantics that builds smarts into a system so the scientist can find, analyze and create new cures for diseases that have yet to be worked on effectively.  So… semantics and folding proteins do have a lot in common, more than you might have thought.
