Highlights

Data remain local
Data are standardized locally with retention of original format for both:
- Quality checks
- Recoding in future
Each organization retains control of patient level data
Local processing allows expansion

A Federated Network is

Databases that reside in multiple locations and organizations that are linked through an infrastructure so they can be searched as one large database.


Data Extraction

DARTNet is a federated network of electronic health record data and other clinical information from multiple organizations across the United States. A federated network is a collection of databases that reside in multiple member practices and that are linked through a secure Web-based system so they can be searched and queried as one large database while maintaining privacy and confidentiality of patient data.
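
The idea of searching many local databases "as one large database" can be illustrated with a minimal sketch. It assumes each site answers a query with an aggregate count only, so patient-level rows never leave the site. The class `GridNode`, the function `federated_count`, and the sample rows are hypothetical names and data chosen for illustration, not part of DARTNet's actual software.

```python
class GridNode:
    """Hypothetical stand-in for a node that holds one site's data locally."""

    def __init__(self, site, rows):
        self.site = site
        self._rows = rows  # patient-level rows stay inside the node

    def count(self, predicate):
        # Only an aggregate count leaves the site, never the rows themselves.
        return sum(1 for row in self._rows if predicate(row))


def federated_count(nodes, predicate):
    """Send the same query to every node and combine the per-site counts."""
    per_site = {node.site: node.count(predicate) for node in nodes}
    return per_site, sum(per_site.values())


# Illustrative data only; concept IDs here are placeholders.
nodes = [
    GridNode("site_a", [{"concept_id": 1001}, {"concept_id": 1002}]),
    GridNode("site_b", [{"concept_id": 1001}]),
]
per_site, total = federated_count(nodes, lambda row: row["concept_id"] == 1001)
```

In this sketch the caller sees one combined answer plus a per-site breakdown, while each organization keeps control of its own rows.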

Three Models

In the first model, data from each member organization's electronic health record (EHR) or clinical data warehouse are extracted into an XML file or a set of flat files that conform to a slightly modified version of the Observational Medical Outcomes Partnership (OMOP) V4 Common Data Model (CDM). The XML or flat files are loaded into the ROSITA system. The ROSITA system serves several functions:

ROSITA performs record linkage if data from the same set of patients are being loaded from multiple sources (e.g. EHR and claims),
it recodes the source values into standardized concept IDs (using OMOP V4 Vocabulary and local mapping),
it strips direct patient identifiers, and
it outputs a limited data set to the grid node housed at each site where the data are available for query.
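
The four functions listed above can be sketched as a small pipeline. This is an illustration under stated assumptions, not ROSITA's actual implementation: the linkage here is a simple exact match on a hypothetical `link_key` field, the `CONCEPT_MAP` values are placeholder integers rather than real OMOP concept IDs, and the `DIRECT_IDENTIFIERS` set is an invented example of which fields count as direct identifiers.

```python
# Hypothetical local mapping from source codes to standardized concept IDs.
# The integers are placeholders, not real OMOP concept IDs.
CONCEPT_MAP = {"ICD9:250.00": 1001, "ICD9:401.9": 1002}

# Hypothetical set of fields treated as direct patient identifiers.
DIRECT_IDENTIFIERS = {"name", "ssn", "address", "phone"}


def link_records(sources):
    """Group records from several sources (e.g. EHR and claims) by a shared key.

    Assumption: an exact match on 'link_key' stands in for real record linkage.
    """
    linked = {}
    for source_name, records in sources.items():
        for rec in records:
            linked.setdefault(rec["link_key"], []).append({**rec, "source": source_name})
    return linked


def recode(record):
    """Add a standardized concept ID while retaining the original source value."""
    out = dict(record)
    out["concept_id"] = CONCEPT_MAP.get(record["source_code"], 0)  # 0 marks unmapped here
    return out


def strip_identifiers(record):
    """Drop every field listed as a direct patient identifier."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def build_limited_data_set(sources):
    """Link, recode, and strip identifiers, then return rows ready for a grid node."""
    rows = []
    linked = sorted(link_records(sources).items())
    for person_index, (_link_key, records) in enumerate(linked, start=1):
        for rec in records:
            row = strip_identifiers(recode(rec))
            row.pop("link_key")  # replace the linkage key with a surrogate ID
            row["person_id"] = person_index
            rows.append(row)
    return rows
```

Keeping `source_code` next to `concept_id` mirrors the point made in the highlights: the original value is retained so that quality checks and future recoding remain possible.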

Data are queried via a secure web portal for use in research studies and quality improvement activities. Each practice must give permission every time its data are made available to DARTNet.

In the second model, clinics use third-party clinical decision support (CDS) vendors to handle data extraction and standardization. In this model, data from each member practice's EHR are captured, de-identified, coded, standardized, and stored in a database that resides at each individual practice. These vendors also include other important data sources, such as billing, lab, hospital, and prescription databases, in their secondary databases. To link the different databases, the member clinics authorize the third-party CDS vendors to generate the XML or flat files on their behalf for loading into ROSITA, or authorize them to transfer limited data sets for specific studies directly to the DI research team.

In the third model of data extraction, programmers at the clinic site collaborate directly with DI staff to pull a limited data set from the clinic's systems for specific studies. The extracted data can be in nearly any format as long as they can be transformed by DI analysts into the OMOP V4 CDM. In this model, DI analysts programmatically aggregate and standardize limited data sets from each clinic, essentially performing the functions of ROSITA and the web portal. This model provides a means of participation in DI studies for clinic sites that cannot extract data in the exact XML format or that do not wish to host a grid node.
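
As a rough illustration of the third model, the sketch below takes extracts that arrive in different column layouts and maps them into one shared layout before combining them. Everything named here is an assumption made for the example: the site names, the per-site column maps, and the target field names (loosely styled after OMOP condition fields) are illustrative and do not describe the analysts' actual tooling.

```python
import csv
import io

# Hypothetical per-site mappings from local column names to a shared layout.
SITE_COLUMN_MAPS = {
    "clinic_a": {
        "pt": "person_key",
        "dx": "condition_source_value",
        "dx_date": "condition_start_date",
    },
    "clinic_b": {
        "PatientRef": "person_key",
        "DiagnosisCode": "condition_source_value",
        "OnsetDate": "condition_start_date",
    },
}


def standardize(site, csv_text):
    """Rename one site's columns into the shared layout and tag each row with its site."""
    column_map = SITE_COLUMN_MAPS[site]
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {target: raw[source].strip() for source, target in column_map.items()}
        row["site"] = site
        rows.append(row)
    return rows


def aggregate(extracts):
    """Standardize every site's extract and combine the results into one list."""
    combined = []
    for site, csv_text in extracts.items():
        combined.extend(standardize(site, csv_text))
    return combined


# Illustrative extracts only; the values are made up.
extracts = {
    "clinic_a": "pt,dx,dx_date\n1,250.00,2012-01-05\n",
    "clinic_b": "PatientRef,DiagnosisCode,OnsetDate\nX9,401.9,2012-02-01\n",
}
combined = aggregate(extracts)
```

A per-site mapping table keeps the site-specific knowledge in one place, so adding a clinic with yet another format means adding a mapping rather than changing the transformation code.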

More Details

Observational Medical Outcomes Partnership (OMOP) - DI collaborates with this public-private partnership to identify the most reliable methods for analyzing large volumes of data drawn from heterogeneous sources.

Reusable OMOP and SAFTINet Interface Transformation Adaptor (ROSITA) - an innovation that transforms identified data into a limited data set that is available for query by researchers.