  • OHDSI — Data Standardization


    Data Standardization

    Data standardization is the critical process of bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies. Why is it so important?

    Healthcare data can vary greatly from one organization to the next. Data are collected for different purposes, such as provider reimbursement, clinical research, and direct patient care. These data may be stored in different formats using different database systems and information models. And despite the growing use of standard terminologies in healthcare, the same concept (e.g., blood glucose) may be represented in a variety of ways from one setting to the next.

    We at OHDSI are deeply involved in the evolution and adoption of a Common Data Model known as the OMOP Common Data Model. We provide resources to convert a wide variety of datasets into the CDM, as well as a plethora of tools to take advantage of your data once it is in CDM format.

    Most importantly, we have an active community that has done many data conversions (often called ETLs) with members who are eager to help you with your CDM conversion and maintenance.

    OMOP Common Data Model

    Why use the Common Data Model?

    What is the OMOP Common Data Model (CDM)?

    The OMOP Common Data Model allows for the systematic analysis of disparate observational databases. The concept behind this approach is to transform data contained within those databases into a common format (data model) as well as a common representation (terminologies, vocabularies, coding schemes), and then perform systematic analyses using a library of standard analytic routines that have been written based on the common format.
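To illustrate the idea of a common format, the sketch below converts two differently shaped source records (one EMR-like, one claims-like) into a single shared record layout. The field names are hypothetical and simplified; they are not the actual OMOP CDM schema.

```python
# Illustrative only: transform two differently shaped source records
# into one common format -- the core idea behind a CDM.
# Field names are hypothetical, not the real OMOP CDM schema.

def from_emr(rec):
    # EMR source stores gender as a word and birth date as an ISO string
    return {"person_id": rec["patient_id"],
            "gender": rec["sex"].upper()[0],        # "Female" -> "F"
            "birth_year": int(rec["dob"][:4])}

def from_claims(rec):
    # Claims source stores gender as a 1/2 code and the year directly
    return {"person_id": rec["member_no"],
            "gender": {1: "M", 2: "F"}[rec["gender_cd"]],
            "birth_year": rec["birth_yr"]}

emr_row = {"patient_id": 101, "sex": "Female", "dob": "1980-06-02"}
claims_row = {"member_no": 202, "gender_cd": 1, "birth_yr": 1975}

# Once both sources share one shape, one analytic routine serves both.
common = [from_emr(emr_row), from_claims(claims_row)]
print(common)
```

Any analysis written against the common shape now runs unchanged over both sources, which is exactly what a library of standard analytic routines relies on.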

    Why do we need a CDM?

    Observational databases differ in both purpose and design. Electronic Medical Records (EMR) are aimed at supporting clinical practice at the point of care, while administrative claims data are built for the insurance reimbursement processes. Each has been collected for a different purpose, resulting in different logical organizations and physical formats, and the terminologies used to describe the medicinal products and clinical conditions vary from source to source.

    The CDM can accommodate both administrative claims and EHR, allowing users to generate evidence from a wide variety of sources. It would also support collaborative research across data sources both within and outside the United States, in addition to being manageable for data owners and useful for data users.

    Why use the OMOP CDM?

    The Observational Medical Outcomes Partnership (OMOP) CDM, now in its version 5.0.1, offers a solution unlike any other. OMOP found that disparate coding systems can be harmonized—with minimal information loss—to a standardized vocabulary.

    Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools. We at OHDSI are currently developing Open Source tools for data quality and characterization, medical product safety surveillance, comparative effectiveness, quality of care, and patient-level predictive modeling, but there are also other sources of such tools, some of them commercial.

    For more information about the CDM, please read the documentation, download the DDL for various database dialects, and learn about the Standardized Vocabularies. If you have questions, post them at the OHDSI Forum.

    Vocabulary Resources
    The Standard Vocabulary is a foundational tool initially developed by some of us at OMOP that enables transparent and consistent content across disparate observational databases, and serves to support the OHDSI research community in conducting efficient and reproducible observational research.

    To download the standard vocabularies, please visit our Athena download site:
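To give a flavor of how the Standardized Vocabularies are used after download, the sketch below loads a tiny hand-made sample of the OMOP `concept` and `concept_relationship` tables into SQLite and resolves a source code to its standard concept via the "Maps to" relationship. The table and column names follow the OMOP vocabulary structure; the `concept_id` values are invented for illustration (real vocabularies arrive from Athena as tab-delimited files).

```python
import sqlite3

# Minimal sketch of a vocabulary lookup against a tiny hand-made sample
# of the OMOP concept / concept_relationship tables. concept_id values
# below are invented; real data comes from the Athena download.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE concept (
    concept_id INTEGER, concept_name TEXT,
    vocabulary_id TEXT, concept_code TEXT, standard_concept TEXT);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
INSERT INTO concept VALUES
    (1001, 'Essential hypertension', 'ICD10CM', 'I10', NULL),
    (2002, 'Essential hypertension', 'SNOMED', '59621000', 'S');
INSERT INTO concept_relationship VALUES (1001, 2002, 'Maps to');
""")

def to_standard(vocab, code):
    """Resolve a source (vocabulary, code) pair to its standard concept_id."""
    row = db.execute("""
        SELECT cr.concept_id_2
        FROM concept c
        JOIN concept_relationship cr ON cr.concept_id_1 = c.concept_id
        WHERE c.vocabulary_id = ? AND c.concept_code = ?
          AND cr.relationship_id = 'Maps to'""", (vocab, code)).fetchone()
    return row[0] if row else None

print(to_standard("ICD10CM", "I10"))
```

The same join pattern, scaled up to the full vocabulary tables, is how an ETL maps the varied source codes mentioned above into one standard representation.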

    Building your CDM
    Building your CDM is a process that necessitates proper planning and execution, and we are here to help. Successful use of an observational data network requires a collaborative, interdisciplinary approach that includes:

    • Local knowledge of the source data: underlying data capture process and its role in the healthcare system
    • Clinical understanding of medical products and disease
    • Domain expertise in the analytical use cases: epidemiology, pharmacovigilance, health economics and outcomes research
    • Command of advanced statistical techniques for large-scale modeling and exploratory analysis
    • Informatics experience with ontology management and leveraging standard terminologies for analysis
    • Technical/programming skills to implement design and develop a scalable solution

    Getting Started

    Ready to get started on the conversion (ETL) process? Here are some recommended steps for an effective process:

    1. Train on OMOP CDM and Vocabulary
    2. Discuss analysis opportunities (Why are we doing this? What do you want to be able to do once CDM is done?)
    3. Evaluate technology requirements and infrastructure
    4. Discuss data dictionary and documentation on raw database
    5. Perform a systematic scan of the raw database
    6. Draft Business Logic
      a.  Table level
      b. Variable level
      c. Value level (mapping)
      d. Capture what will not be captured (lost) in the transformation
    7. Create data sample to allow initial development
    8. DON’T START IMPLEMENTING UNTIL THE DESIGN IS COMPLETE
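Steps 6c and 6d above can be sketched as a small value-level mapping routine that also records every source value it could not map, so that information lost in the transformation is captured rather than silently dropped. The mapping table here is a toy; 8507 and 8532 are the OMOP standard concepts for Male and Female, and 0 is the OMOP convention for "no matching concept".

```python
# Sketch of value-level mapping (step 6c) that also captures what is
# lost in the transformation (step 6d). The mapping table is a toy;
# 8507/8532 are the OMOP gender concepts, 0 means "no matching concept".
SOURCE_TO_CONCEPT = {
    "M": 8507,   # Male
    "F": 8532,   # Female
}

unmapped = []  # step 6d: record values that will be lost

def map_value(value):
    concept_id = SOURCE_TO_CONCEPT.get(value)
    if concept_id is None:
        unmapped.append(value)
        return 0  # OMOP convention for "no matching concept"
    return concept_id

rows = ["M", "F", "U", "F"]
mapped = [map_value(v) for v in rows]
print(mapped)    # [8507, 8532, 0, 8532]
print(unmapped)  # ['U']
```

Reviewing the `unmapped` list after each run is one concrete way to document, per step 6d, exactly what the transformation cannot carry over.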

    Helpful Hints
    Having gone through the ETL process with several databases over the past few years, we know that there will be obstacles to overcome and challenges to solve. Here are some helpful hints and lessons learned from the OHDSI collaborative:

    • A successful ETL requires a village; don’t make one person try to be the hero and do it all themselves
      • Team design
      • Team implementation
      • Team testing
    • Document early and often, the more details the better
    • Data quality checking is required at every step of the process
    • Don’t make assumptions about source data based on documentation; verify by looking at the data
    • Good design and comprehensive specifications should save unnecessary iterations and thrash during implementation
    • ETL design/documentation/implementation is a living process. It will never be done and it can always be better. But don’t let the perfect be the enemy of the good
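The hint about data quality checking at every step can, in its simplest form, be a handful of checks run after each load. The sketch below follows the OMOP `person` table's column names; the sample rows and the specific thresholds are invented for illustration, not an official OHDSI tool.

```python
import datetime

# Minimal data-quality check sketch run after loading a CDM table.
# Column names follow the OMOP person table; rows and thresholds
# are invented for illustration.
persons = [
    {"person_id": 1, "year_of_birth": 1975, "gender_concept_id": 8507},
    {"person_id": 2, "year_of_birth": 2099, "gender_concept_id": 8532},  # bad year
    {"person_id": 3, "year_of_birth": 1960, "gender_concept_id": 0},     # unmapped
]

def check_persons(rows, current_year=None):
    current_year = current_year or datetime.date.today().year
    problems = []
    seen = set()
    for r in rows:
        if r["person_id"] in seen:
            problems.append((r["person_id"], "duplicate person_id"))
        seen.add(r["person_id"])
        if not (1900 <= r["year_of_birth"] <= current_year):
            problems.append((r["person_id"], "implausible year_of_birth"))
        if r["gender_concept_id"] == 0:
            problems.append((r["person_id"], "unmapped gender"))
    return problems

for pid, issue in check_persons(persons):
    print(pid, issue)
```

Running checks like these after every ETL iteration, and logging the findings, is one way to keep the "document early and often" and "verify by looking at the data" hints honest.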

    For more information, check out the documentation on our wiki page: www.ohdsi.org/web/wiki

    And remember, the OHDSI community is here to help! Contact us at contact@ohdsi.org.

    A 100k sample of CMS SynPUF data in CDM Version 5.2 is available to download on LTS Computing LLC’s download site:

  • Original source: https://www.cnblogs.com/quietwalk/p/9256949.html