From Chaos to Clarity: The Journey of Converting Raw Data into SDTM Datasets
Introduction
The Study Data Tabulation Model (SDTM) is a key component of the CDISC standards that offers a structured and uniform method for organizing and formatting clinical trial data for submission to regulatory agencies such as the FDA. It establishes a consistent format for presenting clinical data, including demographics, medications, adverse events, and laboratory results. SDTM's standardized format streamlines the review and analysis of clinical trial data, making it easier to compare and interpret data from various phases of a trial or across different trials. This standardization is crucial for the accurate evaluation of drug efficacy and safety, simplifying the regulatory review process and ensuring the prompt delivery of effective treatments to people in need. The SDTM process not only supports regulatory compliance but also promotes data integrity and transparency, which are essential for advancing medical science and improving patient outcomes.
The SDTM Process
Mapping – The first step in creating SDTM datasets is identifying which SDTM domains are needed. A mapping specification is then created to outline where the raw source data will go in the SDTM datasets. The mapping specification may contain both standard SDTM domains and study-specific custom domains. Controlled terminology is used wherever applicable to maintain Clinical Data Acquisition Standards Harmonization (CDASH) compliance. Once ready, the mapping specification is sent to programmers to create the SDTM datasets. The SDTM mapping is updated throughout the study to account for newly added data, fields, or domains, and programmers are alerted to any updates so that they can regenerate the final SDTM datasets.
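As a rough illustration of what this mapping step can look like, the sketch below uses Python and pandas to derive a simplified DM (Demographics) domain from a hypothetical raw demographics extract. The raw column names, study identifier, and controlled-terminology lookup are assumptions for illustration only; an actual mapping follows the study's mapping specification and the SDTM IG.

```python
import pandas as pd

# Hypothetical raw demographics extract (column names are assumed for illustration)
raw_dm = pd.DataFrame({
    "SUBJID": ["001", "002"],
    "BRTHDAT": ["1980-05-14", "1975-11-02"],
    "GENDER": ["Male", "Female"],
    "COUNTRY_RAW": ["USA", "CAN"],
})

STUDYID = "ABC-123"  # assumed study identifier

# Illustrative controlled-terminology lookup for SEX
sex_ct = {"Male": "M", "Female": "F", "Unknown": "U"}

dm = pd.DataFrame({
    "STUDYID": STUDYID,
    "DOMAIN": "DM",
    # USUBJID is typically built from the study and subject identifiers
    "USUBJID": STUDYID + "-" + raw_dm["SUBJID"],
    "SUBJID": raw_dm["SUBJID"],
    "BRTHDTC": raw_dm["BRTHDAT"],          # dates are stored as ISO 8601 text
    "SEX": raw_dm["GENDER"].map(sex_ct),   # apply controlled terminology
    "COUNTRY": raw_dm["COUNTRY_RAW"],
})

print(dm)
```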
Annotated CRF – The SDTM Annotated Case Report Form (aCRF) is typically prepared alongside the mapping specification to indicate where each domain and field on the CRF appears in the SDTM datasets. The annotations are added in accordance with the SDTM Implementation Guide (IG) and are formatted and color-coded to provide a visual representation of the SDTM domains and variables in which the data can be found. Fields that are not submitted are also annotated to indicate that they are not included in SDTM.
Define – The SDTM Define is an XML file that contains metadata for the SDTM datasets. The Define.xml is created using Pinnacle 21 Community and is produced in compliance with the Clinical Data Interchange Standards Consortium (CDISC) standards. This file includes metadata at the following levels: Dataset, Variable, ValueLevel, Codelists, Dictionaries, Methods, Comments, and Supplemental Documents.
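Define.xml itself is generated with Pinnacle 21 Community, but the metadata it carries is usually assembled beforehand in a specification workbook. The sketch below is a minimal, assumed example of collecting dataset-level and variable-level metadata into sheets named after two of the metadata levels listed above; the exact template layout used in practice depends on the tool and the study.

```python
import pandas as pd

# Dataset-level metadata (illustrative values)
datasets = pd.DataFrame([
    {"Dataset": "DM", "Label": "Demographics", "Class": "SPECIAL PURPOSE",
     "Structure": "One record per subject", "Key Variables": "STUDYID, USUBJID"},
    {"Dataset": "AE", "Label": "Adverse Events", "Class": "EVENTS",
     "Structure": "One record per adverse event per subject",
     "Key Variables": "STUDYID, USUBJID, AEDECOD, AESTDTC"},
])

# Variable-level metadata (illustrative subset)
variables = pd.DataFrame([
    {"Dataset": "DM", "Variable": "USUBJID", "Label": "Unique Subject Identifier",
     "Type": "text", "Origin": "Derived"},
    {"Dataset": "DM", "Variable": "SEX", "Label": "Sex",
     "Type": "text", "Origin": "CRF", "Codelist": "SEX"},
])

# One sheet per metadata level (hypothetical file name and layout)
with pd.ExcelWriter("define_spec.xlsx") as writer:
    datasets.to_excel(writer, sheet_name="Datasets", index=False)
    variables.to_excel(writer, sheet_name="Variables", index=False)
```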
Reviewer’s Guide – The SDTM Reviewer’s Guide is a document that provides additional information about the content of the SDTM datasets. It gives reviewers information about the SDTM domains and supplemental domains, lists the data included in trial domains, and includes a summary of issues and conformance findings for all SDTM datasets.
SDTM dataset output – Once the mapping process is complete, programmers use the final mapping specification to create and output the final SDTM datasets, which are delivered in SAS, Excel, and SAS Transport (XPT) formats.
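As a small illustration of this output step, the snippet below writes a finished domain to Excel and to SAS Transport (XPT) format from Python. Here pyreadstat is an assumed helper library for writing transport files; in many organizations this step is performed directly in SAS instead.

```python
import pandas as pd
import pyreadstat  # assumed third-party library for writing SAS transport files

# Assume 'dm' is a finalized DM domain
dm = pd.DataFrame({"STUDYID": ["ABC-123"], "DOMAIN": ["DM"],
                   "USUBJID": ["ABC-123-001"], "SEX": ["M"]})

# Excel copy for review
dm.to_excel("dm.xlsx", index=False)

# SAS Transport (XPT) version 5, the version commonly expected for regulatory submission
pyreadstat.write_xport(dm, "dm.xpt", table_name="DM", file_format_version=5)
```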
SDTM Validation and Quality Check
A crucial aspect of the SDTM process is ensuring that the collected data is reliable, accurate, and adheres to regulatory guidelines. It is essential that the data in the datasets remains consistent, coherent, and unchanged throughout the transfer, retrieval, and update processes to maintain its integrity. To achieve this, quality check (QC) programmers independently develop QC programs and datasets to compare against the production programs and datasets, ensuring that there are no discrepancies and that the data is accurately represented in the final SDTM datasets.
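A minimal sketch of this double-programming comparison is shown below, assuming the production and QC versions of a domain are both available as pandas DataFrames. In SAS environments this comparison is commonly done with PROC COMPARE; the Python version here is illustrative only.

```python
import pandas as pd

def compare_datasets(prod: pd.DataFrame, qc: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Return value-level differences between production and QC datasets."""
    # Align both datasets on their key variables so records are compared like for like
    prod_sorted = prod.sort_values(keys).reset_index(drop=True)
    qc_sorted = qc.sort_values(keys).reset_index(drop=True)

    if list(prod_sorted.columns) != list(qc_sorted.columns):
        raise ValueError("Production and QC datasets have different variables")
    if len(prod_sorted) != len(qc_sorted):
        raise ValueError("Production and QC datasets have different record counts")

    # An empty result means the two programs produced identical datasets
    return prod_sorted.compare(qc_sorted)

# Hypothetical example with a deliberate mismatch in SEX for subject 002
prod = pd.DataFrame({"USUBJID": ["001", "002"], "SEX": ["M", "F"]})
qc = pd.DataFrame({"USUBJID": ["001", "002"], "SEX": ["M", "M"]})
print(compare_datasets(prod, qc, keys=["USUBJID"]))
```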
Additionally, datasets are validated using Pinnacle 21 Community to identify and address any errors or issues in accordance with regulatory standards. Validation can be performed on datasets alone, Define.xml alone, or both datasets and Define.xml together. Any unresolved issues are documented in the Reviewer’s Guide, complete with explanations.
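Once Pinnacle 21 produces its validation report, the findings still need to be triaged and, where they remain unresolved, explained in the Reviewer's Guide. The sketch below assumes the report has been exported to an Excel workbook with an issue listing containing "Domain" and "Severity" columns (hypothetical sheet and column names); it simply tallies the findings so they can be reviewed and documented.

```python
import pandas as pd

# Hypothetical export of the validation report; sheet and column names are assumptions
report = pd.read_excel("p21_validation_report.xlsx", sheet_name="Issue Details")

# Count findings by domain and severity to prioritize review
summary = (
    report.groupby(["Domain", "Severity"])
          .size()
          .reset_index(name="Issue Count")
          .sort_values(["Severity", "Issue Count"], ascending=[True, False])
)
print(summary)
```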
Importance of Real-Time SDTM Conversion
Real-time SDTM conversion refers to reformatting clinical data into SDTM format as the data is collected, rather than waiting until the end of the study. Performing the SDTM process while the study is ongoing is beneficial for several reasons. To begin with, it allows for early detection of data issues or inconsistencies, which in turn reduces the risk of major issues at the end of the study. Early data review and validation streamline the data management process and help ensure that data is consistently organized and formatted according to regulatory requirements, which makes generating final submissions and reports more efficient. By preparing the data in a regulatory-compliant format early on, the risk of delays and adjustments is minimized during final submission preparation.
Since data is reviewed as it is collected, decisions can be made faster, enabling quicker interim analyses and helping to detect trends or issues early in the trial. Not only does this allow for quicker data analysis, but it also allows resources to be used more effectively. SDTM-formatted data also enables various teams to collaborate and communicate more effectively, leading to more seamless teamwork.
Applying SDTM processes in real time during the study ensures data is consistently formatted and integrated across different parts of the study, which is especially useful for complex longitudinal studies with a large number of datasets. Periodic data checks improve quality control and maintain high data quality throughout the study, allowing issues to be addressed early on rather than at the end. Preparing SDTM datasets in the early phases of the study can ultimately reduce the time and cost associated with managing and finalizing accurate and reliable data.
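As a minimal, assumed sketch of what converting data as it arrives can look like operationally, the example below re-applies a placeholder mapping and a few early-warning checks each time a new raw extract is received. The folder layout, file naming, mapping function, and checks are all hypothetical.

```python
from pathlib import Path
import pandas as pd

def map_dm(raw: pd.DataFrame) -> pd.DataFrame:
    """Placeholder mapping step; a real study applies the full mapping specification."""
    return pd.DataFrame({
        "STUDYID": "ABC-123",
        "DOMAIN": "DM",
        "USUBJID": "ABC-123-" + raw["SUBJID"].astype(str),
        "SEX": raw["GENDER"].map({"Male": "M", "Female": "F"}),
    })

def basic_checks(dm: pd.DataFrame) -> list:
    """Simple early-warning checks run on every data cut (illustrative only)."""
    issues = []
    if dm["USUBJID"].duplicated().any():
        issues.append("Duplicate USUBJID values found")
    if dm["SEX"].isna().any():
        issues.append("SEX values not covered by the controlled-terminology lookup")
    return issues

# Re-run the conversion for every raw extract received so far (hypothetical folder layout)
for extract in sorted(Path("raw_extracts").glob("dm_*.csv")):
    dm = map_dm(pd.read_csv(extract, dtype=str))
    for issue in basic_checks(dm):
        print(f"{extract.name}: {issue}")
```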
Conclusion
Understanding and applying CDISC standards, such as SDTM, is fundamental to the success of clinical research. These standards guarantee data consistency, reliability, and regulatory compliance, leading to a more efficient review process and faster access to new treatments for patients. As the pharmaceutical and biotech sectors advance, adherence to CDISC is essential for successful clinical trials and drug approvals. The implementation of these standards supports the integration of data across various studies, allowing for more comprehensive analysis and cross-study comparisons. This harmonization is critical for identifying trends, assessing treatment efficacy, and ensuring that findings are robust and reproducible. By continuously integrating and upholding these standards, clinical research organizations contribute to the advancement of medical science and the delivery of effective, timely treatments to those in need.
About the Authors
Mahsa Ameri, MPH
Mahsa Ameri is a Senior Data Administrator at Amarex Clinical Research, an NSF company, where she has been a member of the Data Operations team for over five years. Mahsa began her career in clinical research following the completion of her Bachelor's degree in Public Health Science from the University of Maryland, College Park. She developed a passion for improving health outcomes and advancing clinical research, inspiring her to earn a Master of Public Health (MPH) with a concentration in Biostatistics.
Mahsa specializes in data cleaning, validation, and SDTM conversion, ensuring compliance with CDISC standards. Her responsibilities include collaborating closely with data managers and programmers to ensure the accuracy and reliability of data for regulatory submissions. She plays a crucial role in preparing data and ensuring compliance with regulatory guidelines.
Christian Betre
Christian Betre is a Data Administrator in the Biometrics Team at Amarex Clinical Research, specializing in the standardization of datasets to SDTM and the management of external vendor data. With a proven track record in data conversion, validation, and regulatory submissions, Christian brings a comprehensive understanding of the clinical trial process, ensuring high-quality data that aligns with industry standards.
Christian has a strong interest and extensive experience in infectious disease, oncology, and medical device trials. She collaborates effectively with stakeholders, internal teams, and end users to translate program intent and data requirements into effective data solutions. Her attention to detail in reviewing clinical trial data for accuracy and compliance ensures the identification and correction of errors, discrepancies, and protocol violations, contributing to the integrity and completeness of databases.
References:
Clinical Data Interchange Standards Consortium. (2018). Study data tabulation model (SDTM) implementation guide: Version 3.4. Clinical Data Interchange Standards Consortium.
Cs Clinical. (2023, August 14). Understanding CDISC standards for clinical data management. LinkedIn. https://www.linkedin.com/pulse/understanding-cdisc-standards-clinical-data-management-csclinical-em0ze?utm_source=share&utm_medium=member_ios&utm_campaign=share_via
Quanticate. (2014, November 17). CDISC SDTM v3.1.2: Theory and application. Quanticate. https://www.quanticate.com/blog/bid/51830/cdisc-sdtm-v3-1-2-theory-and-application