%0 Journal Article
%J Journal of Biomedical Informatics
%D 2016
%T Automated population of an i2b2 clinical data warehouse from an openEHR-based data repository
%A Haarbrandt, Birger
%A Tute, Erik
%A Marschollek, Michael
%K Archetypes
%K Clinical data repository
%K clinical information systems
%K Data warehouse
%K Detailed clinical models
%K Healthcare analytics
%K i2b2
%K openEHR
%K Secondary use
%X BACKGROUND: Detailed Clinical Model (DCM) approaches have recently seen wider adoption. More specifically, openEHR-based application systems are now used in production in several countries, serving diverse fields of application such as health information exchange, clinical registries and electronic medical record systems. However, approaches to efficiently provide openEHR data to researchers for secondary use have not yet been investigated or established. METHODS: We developed an approach to automatically load openEHR data instances into the open source clinical data warehouse i2b2. We evaluated query capabilities and the performance of this approach in the context of the Hanover Medical School Translational Research Framework (HaMSTR), an openEHR-based data repository. RESULTS: Automated creation of i2b2 ontologies from archetypes and templates and the integration of openEHR data instances from 903 patients of a paediatric intensive care unit were achieved. In total, it took an average of approximately 2527 s to create 2,311,624 facts from 141,917 XML documents. Using the imported data, we conducted sample queries to compare the performance with two openEHR systems and to investigate whether this representation of the data can support cohort identification and record-level data extraction. DISCUSSION: We found the automated population of an i2b2 clinical data warehouse to be a feasible approach to make openEHR data instances available for secondary use. Such an approach can facilitate timely provision of clinical data to researchers. It complements analytics based on the Archetype Query Language by allowing queries on both legacy clinical data sources and openEHR data instances at the same time and by providing an easy-to-use query interface. However, due to different levels of expressiveness in the data models, not all semantics could be preserved during the ETL process.
%B Journal of Biomedical Informatics
%V 63
%P 277–294
%G eng
%R 10.1016/j.jbi.2016.08.007