The Patent

Methods for mapping data into lower dimensions.

Inventors:

Hemant Virkar (Potomac, MD, US); Karen Stark (Arlington, MA, US); Jacob Borgman (West Newbury, MA, US)

Class Name:

Data processing: structural design, modeling, simulation, and emulation modeling by mathematical expression

Publication Date:

Aug 19, 2014

Publication #:

Claims

The overall organization of the information in big data sets becomes intuitively visible, and novel insights can be made at a glance. The patent covers the use of the resulting images as a method for visual database indexing. Unlike a traditional relational database query, in which specific fields and their values or value ranges must be stated explicitly, these visual indexes compare samples across thousands of fields simultaneously. Automated, self-optimizing machine learning can also be used to create visual sub-indexes from only those fields that the machine learning determines to be important for the user’s chosen topic, resulting in a visual index for that specialized purpose. Selecting a point or region in the image gives access to the corresponding data sample(s).
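To make the idea of a visual index concrete, here is a minimal sketch in Python. It is not the patented method, whose details are not given here. As a stand-in it assumes a plain random linear projection to map each sample's fields to a 2D point, and every name in it (make_projection, build_visual_index, select_region) is hypothetical. It shows only the two steps described above: mapping many fields to image coordinates, and selecting in the image to get back the samples.

```python
import random

def make_projection(n_fields, seed=0):
    """Two random weight rows, one per image axis.
    Assumption: a random linear projection stands in for the real mapping."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_fields)] for _ in range(2)]

def project(sample, projection):
    """Map one sample (a list of numeric field values) to an (x, y) point."""
    x_row, y_row = projection
    x = sum(w * v for w, v in zip(x_row, sample))
    y = sum(w * v for w, v in zip(y_row, sample))
    return (x, y)

def build_visual_index(samples, projection):
    """Visual index: sample id -> 2D coordinates in the image."""
    return {sid: project(vec, projection) for sid, vec in samples.items()}

def select_region(index, x_range, y_range):
    """Return the ids of all samples whose point lies inside the selected box."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return sorted(
        sid for sid, (x, y) in index.items()
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    )

# Hypothetical data: three samples with 1,000 fields each.
samples = {
    "s1": [1.0] * 1000,
    "s2": [1.0] * 1000,   # identical to s1, so it lands on the same point
    "s3": [-1.0] * 1000,
}
index = build_visual_index(samples, make_projection(1000))
x, y = index["s1"]
print(select_region(index, (x, x), (y, y)))  # the samples drawn at s1's point
```

No field names or value ranges appear in the selection step: the query is a region of the image, and every field contributes to where a sample is drawn.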

Digital Infuzion intends to revolutionize the way that we interact with big data. Data mining for discovery belongs in the hands of the subject matter expert, not the computer specialist. Standard database querying is transformed through this flexible way of imaging big or complex data, which requires no specialized mathematical knowledge from the end user.

How We’ve Used It

Previously, to find patients similar to a current patient in a repository of big data, researchers would conduct highly structured queries over the thousands of medical records, gene test results and other pieces of data associated with every patient. Finding similar patients relied on the painstaking approach of stipulating exactly what ranges of difference in these data items were acceptable. Digital Infuzion’s patented technology allows researchers to visualize their patient alongside all of the other patients in a repository, so that similar patients become immediately apparent.
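The contrast with range-based queries can be sketched as follows. This is an illustration under assumptions, not the product's implementation: it assumes each patient has already been mapped to a 2D point, and it uses plain Euclidean distance in that image to rank similarity. The function name most_similar and the patient ids are hypothetical.

```python
import math

def most_similar(index, patient_id, k=3):
    """Rank the other patients by distance to the chosen patient in the 2D image.
    Assumption: closeness in the image reflects overall similarity of the records."""
    target = index[patient_id]
    ranked = sorted(
        (math.dist(target, point), pid)
        for pid, point in index.items()
        if pid != patient_id
    )
    return [pid for _, pid in ranked[:k]]

# Hypothetical 2D coordinates for four patients.
index = {
    "p1": (0.0, 0.0),
    "p2": (0.1, 0.0),
    "p3": (5.0, 5.0),
    "p4": (0.0, 0.2),
}
print(most_similar(index, "p1", k=2))  # ['p2', 'p4']
```

The caller never states an acceptable range for any individual data item; similarity is read off from position alone.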

These methods can also be used as a form of topic analysis, sorting unstructured text documents of any type into natural groupings based on the words they contain. This method has been applied to tweets, news articles, web pages and RSS feeds. Users can rapidly zero in on the subset of documents that interests them, and can also get a comprehensive overview of how many different types of subjects are found in any collection of documents.
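As a rough illustration of grouping documents by the words they contain, the sketch below uses a common stand-in technique: bag-of-words vectors compared by cosine similarity, with a simple greedy grouping pass. It is not the patented mapping. The function names and the 0.3 threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Count the lowercase words in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def group_documents(docs, threshold=0.3):
    """Greedy grouping: put each document in the first group whose combined
    word counts are similar enough, otherwise start a new group."""
    groups = []  # list of (combined word counts, [document indices])
    for i, text in enumerate(docs):
        vec = bag_of_words(text)
        for combined, members in groups:
            if cosine(vec, combined) >= threshold:
                members.append(i)
                combined.update(vec)  # fold this document's words into the group
                break
        else:
            groups.append((vec, [i]))
    return [members for _, members in groups]

# Hypothetical headlines.
docs = [
    "flu vaccine trial results",
    "new flu vaccine approved",
    "stock market falls",
    "market rally lifts stock prices",
]
print(group_documents(docs))  # [[0, 1], [2, 3]]
```

The number of groups returned gives the overview of how many kinds of subjects a collection contains, and each group's member list is the subset a user would zero in on.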