At the heart of our approach to analytics and precision medicine is what we call the “data lake,” a new way of storing and managing data. Data is no longer locked in limited, isolated databases. Instead, all the available data is consolidated into a single pool, or “lake.” It is both stored and analyzed in the cloud, using networks of computers.
What makes this new approach possible is the way the computer finds the data. With a relational database, each piece of data is assigned a location based on rows and columns, as with a spreadsheet. Because the data has to be painstakingly formatted to fit those rows and columns before it can be stored, this method works well only with relatively small amounts of data.
The data lake solves this problem by identifying the data in a way that doesn’t rely on rows and columns. Instead, as each piece of data is put into the data lake, it is “tagged” with accompanying details that can be used to locate it. For example, a piece of patient information, such as genomic data, can be tagged with other information about the patient, such as age, medical condition, medications, and income.
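To make the tagging idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular data lake product: the names `TaggedObject`, `DataLake`, `put`, and `find`, along with the sample file names and tag values, are all hypothetical. Each item is stored as-is alongside a set of tags, and items are located by matching tags rather than by row and column position.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TaggedObject:
    """One stored item: the raw payload plus its descriptive tags (hypothetical type)."""
    payload: Any
    tags: Dict[str, Any] = field(default_factory=dict)


class DataLake:
    """Toy in-memory stand-in for a data lake (an assumption for illustration)."""

    def __init__(self) -> None:
        self._objects: List[TaggedObject] = []

    def put(self, payload: Any, **tags: Any) -> None:
        # Store the payload unchanged; no fixed schema is required up front.
        self._objects.append(TaggedObject(payload, dict(tags)))

    def find(self, **criteria: Any) -> List[Any]:
        # Return payloads whose tags match every requested key/value pair.
        return [
            obj.payload
            for obj in self._objects
            if all(obj.tags.get(key) == value for key, value in criteria.items())
        ]


lake = DataLake()
# Items can carry different tags; nothing forces them into the same columns.
lake.put("genome-A.vcf", kind="genomic", age=54, condition="type 2 diabetes")
lake.put("genome-B.vcf", kind="genomic", age=37, condition="asthma")
lake.put("clinic-note.txt", kind="note", condition="asthma")

asthma_items = lake.find(condition="asthma")
genomic_asthma = lake.find(kind="genomic", condition="asthma")
```

In this sketch, searching by `condition="asthma"` returns both the genomic file and the clinic note, even though the two items were stored with different sets of tags. A real system would index the tags rather than scan every item, but the lookup principle is the same.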