Data Management Methods

Methods for Maintaining Longitudinal Population Health Studies


1. Basic Computing Environment 

  • Organize the computer and software
  • Prepare the Tools and Working (Project) Environments
  • Basic issues with Master and Confidential Environments

TOOLS_2023.ZIP contains the FHOP macros and related files introduced in this volume.

2. Standardizing Variables Over Time

  • Time variables
  • Demographic variables
  • Confidential data elements

3. Preparing Master Files

  • Setup activities
  • The RDYR macro
  • Check longitudinal consistency

4. Special Issues with Birth and Fetal Death Files

  • Steps to make master files
  • Check longitudinal consistency
  • Geographic classification
  • Data quality

BC_FORMATS_2021.ZIP contains FHOP's current format library for use with birth certificate and fetal death data (1989-2019).

11METH_WORK_2022.PDF describes the steps taken to clean work-related variables in the California 2010-2017 birth (mother and father) and 2014-2017 death (decedent) files and to develop formats that classify those variables.

WORK_FORMATS_2020.ZIP contains the resulting format library.

Industry and Occupation in California Birth Certificates (1998-2019): Reporting Disparities and Classification Codability. American Journal of Industrial Medicine (2023).

5. Special Issues with Death Files

  • Steps to make master files
  • Check longitudinal consistency
  • Cause of death
  • Geographic classification
  • Data quality

DT_FORMATS_2022.ZIP contains FHOP's current format library for use with death certificate and fetal death data (1980-2019).

SD_GEOCODE7.PDF summarizes work to evaluate the quality of address data in the California Death Statistical Master file in 2005 (before electronic death registration) and 2007 (after electronic death registration). It also compares the accuracy of two geocoding systems used in California at that time.

6. Maintaining Hospital Formats

  • Structure of formats program
  • OSHPD facility labels
  • Centers for Medicare and Medicaid Services
  • Clinical Classification System (CCS)
  • Injury Classification

OSH_FORMATS_2020.ZIP contains the SAS format library FHOP currently uses for OSHPD inpatient admissions (1983 to 2020) and emergency department and ambulatory care encounters (2005 to 2020). The files listed below are the source for the formats.

DXFH2018.ZIP contains the last cross-classified lists of ICD-9 diagnoses (1983 to Sep-2015). This Excel file is the source for formats that variously classify ICD-9 diagnosis codes.

DXTFH2023.ZIP contains the cross-classified lists of ICD-10 diagnoses (Oct-2015 to Dec-2023). This Excel file is the source for formats that variously classify ICD-10 diagnosis codes.

GEMI9I10.ZIP contains the longitudinal crosswalk between the ICD-9 and ICD-10 diagnosis codes. This Excel file is the source for formats that back-classify ICD-9 codes to be consistent with current ICD-10 groupings.
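
To illustrate what back-classifying means in practice, the following is a minimal Python sketch, not FHOP's SAS format code. It assumes a crosswalk stored as (ICD-9, ICD-10) pairs and an ICD-10 grouping stored as a lookup table. All codes, group labels, and function names are hypothetical placeholders and do not come from the GEMI9I10 file.

```python
# Hypothetical crosswalk rows: (ICD-9 code, ICD-10 code). Placeholder codes only.
CROSSWALK_ROWS = [
    ("9X01", "10A01"),
    ("9X02", "10A02"),
    ("9X02", "10B01"),  # one ICD-9 code can map to several ICD-10 codes
]

# Hypothetical ICD-10 grouping (ICD-10 code -> group label).
ICD10_GROUP = {
    "10A01": "Group A",
    "10A02": "Group A",
    "10B01": "Group B",
}


def build_back_classifier(crosswalk_rows, icd10_group):
    """Map each ICD-9 code to the set of ICD-10 groups it reaches."""
    groups = {}
    for icd9, icd10 in crosswalk_rows:
        label = icd10_group.get(icd10)
        if label is not None:
            groups.setdefault(icd9, set()).add(label)
    return groups


def classify_icd9(icd9, back_classifier):
    """Return the group label when it is unambiguous.

    Codes with no mapping return 'Unclassified'; codes that reach more
    than one group return 'Ambiguous' so they can be reviewed by hand.
    """
    labels = back_classifier.get(icd9, set())
    if not labels:
        return "Unclassified"
    if len(labels) > 1:
        return "Ambiguous"
    return next(iter(labels))


BACK = build_back_classifier(CROSSWALK_ROWS, ICD10_GROUP)
```

Flagging ambiguous codes instead of choosing a group silently is a design choice of this sketch: one-to-many mappings are where a crosswalk most needs manual validation.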

ICD10_CONVERSION_2020.PDF describes the work to validate the longitudinal GEMS crosswalk between the ICD-9 and the ICD-10 diagnosis codes with a focus on the Clinical Classification System, and particularly mental health (DXCH06) and conditions occurring during pregnancy, birth, and the puerperium (DXCH11).

PXAH2018.ZIP contains the last cross-classified lists of ICD-9 procedures (1983 to Sep-2015). This file is the source for the formats that variously classify ICD-9 procedure codes. CCS did not update ICD-9 procedure codes in 2015.

PXTFH2023.ZIP contains the cross-classified lists of ICD-10 procedures (Oct-2015 to Dec-2023). This file is the source for the formats that variously classify ICD-10 procedure codes.

7. Maintaining Geography Formats

  • The need for longitudinal geographic datasets
  • Standard administrative boundaries
  • Planning and policy geography
  • Data sets with geographic boundaries

GEOG_FORMATS_2020.ZIP contains the full set of California geography formats in current use.

8. Annual Hospital Disclosure Report

This document describes methods followed to prepare the Annual Hospital Disclosure Report distributed by the California Office of Statewide Health Planning and Development (OSHPD), now known as the Department of Healthcare Access and Information (HCAI).

9. Hospital Crosswalk

10. Population Master Files

  • Department of Finance
  • National Population Estimates
  • Intercensal Small Area Population Estimates

The file below is the current version of our longitudinal county-level population file (1980-2022). Before 2000, DOF race/ethnicity had five groups: White, Black, Hispanic, Asian/Pacific Islander (API), and American Indian/Alaska Native (AIAN). In 2000, DOF added Multirace, which changed the counts in the five basic groups. To calculate statistics longitudinally, we reallocate Multirace to the five groups using a process developed by the National Center for Health Statistics. The variable RACETHD includes Multirace from 2000 onward, and the variable POP is the original population. The variable RACETHB is the bridged race/ethnicity variable, and POPB is the bridged population. Note that POP = POPB for all years before 2000.
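
The arithmetic behind POP and POPB can be sketched as follows. This is a minimal Python illustration, not the NCHS bridging process itself. It assumes the reallocation can be expressed as a set of shares of the Multirace count assigned to each basic group. The share values, the counts, and the function name are hypothetical placeholders; real allocation fractions come from the NCHS method and are not reproduced here.

```python
BASIC_GROUPS = ["White", "Black", "Hispanic", "API", "AIAN"]


def bridge_population(pop_by_racethd, multirace_shares):
    """Reallocate the Multirace count to the five basic groups.

    pop_by_racethd: RACETHD label -> POP for one county and year.
    multirace_shares: basic group -> fraction of Multirace assigned to it
        (hypothetical inputs; they must sum to 1).
    Returns RACETHB label -> POPB.
    """
    if abs(sum(multirace_shares.values()) - 1.0) > 1e-9:
        raise ValueError("multirace shares must sum to 1")
    multirace = pop_by_racethd.get("Multirace", 0)
    return {
        group: pop_by_racethd.get(group, 0)
        + multirace * multirace_shares.get(group, 0.0)
        for group in BASIC_GROUPS
    }


# Hypothetical shares and counts, for illustration only.
SHARES = {"White": 0.5, "Black": 0.2, "Hispanic": 0.0, "API": 0.2, "AIAN": 0.1}
POP_2000 = {"White": 1000, "Black": 200, "Hispanic": 500,
            "API": 150, "AIAN": 50, "Multirace": 100}
POP_1999 = {"White": 1000, "Black": 200, "Hispanic": 500,
            "API": 150, "AIAN": 50}

POPB_2000 = bridge_population(POP_2000, SHARES)
POPB_1999 = bridge_population(POP_1999, SHARES)
```

Two properties are worth checking in any bridged file: the bridged total equals the original total in every year, and in years with no Multirace category the bridged counts equal the original counts.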


Issues and Decisions to be made on Collecting, Coding and Reporting Race and Ethnicity for Public Health Indicators

The “Race/Ethnicity Guidelines”, approved in 2003 by the California Directors of Public Health (CDPH) and Health and Human Services (CHHS) for use by all programs, explicitly did not address how to handle multi-race coding for trend analysis. Further, the National Center for Health Statistics (NCHS) had not yet provided guidance on what to do when the same groups are not available over time or when the groups in the numerator and denominator do not match. This document discusses issues related to developing a standardized approach to coding and reporting race and ethnicity for data sets maintained by CDPH. The focus is on using these data sets to explore race/ethnic differences in indicators of health status and outcomes over time. (September 2011)

Creating Longitudinal Hospital-Level Data Sets

Per California regulations, hospital licenses are based on a given physical location. When a hospital disappears from a data file, the explanation is not readily apparent. We must determine whether the facility closed, merged, converted to consolidated reporting, or moved and received a new license ID. Yet another possibility is that a new license ID was assigned to a facility at the same location. We developed a series of decision rules to resolve such issues in a longitudinally consistent manner. These included rules to handle changes in hospital identifiers, physical location, consolidated data reporting, ownership, organizational type, and structural capacity. This document provides a full discussion of the issues encountered in creating the hospital-level data sets, their resolution, and the creation of related analysis data sets and variables. (June 2004)
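
One piece of this problem, following a facility through successive license IDs, can be sketched in Python. This is an illustration only and does not reproduce the decision rules in the document. It assumes that ID changes have already been recorded as old-to-new pairs; the IDs and the function name are hypothetical.

```python
def resolve_id(license_id, successor):
    """Follow old -> new license ID links to the most recent ID.

    successor: dict mapping a retired license ID to the ID that replaced it.
    IDs with no recorded successor are returned unchanged.
    """
    seen = set()
    current = license_id
    while current in successor:
        if current in seen:
            # A loop means the crosswalk itself is inconsistent.
            raise ValueError(f"cycle in crosswalk at {current}")
        seen.add(current)
        current = successor[current]
    return current


# Hypothetical crosswalk: H001 was replaced by H002, then H002 by H003.
SUCCESSOR = {"H001": "H002", "H002": "H003"}
```

Keying every year of data on the resolved ID lets a facility that moved or was relicensed appear as one unit over time, while closures and mergers still need separate handling.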

Longitudinal Hospital Discharge and Vital Statistics Death Record Linkage

In this paper, we describe a method to link records over a 3-year period within a statewide hospital discharge dataset and then to a vital statistics death dataset. These methods were developed for the California Injury Hot Spots Project. The primary goal of the Hot Spots Project was to create small area maps identifying California communities with high incidence and rates of severe injury between 1995 and 1997 among children, adolescents, and young adults aged 1 day to 24 years who were California residents [1]. Population-based studies in this age group are rare in the United States because of the frequent absence of a unique identifier. (May 2001)
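
As a rough illustration of linkage without a unique identifier, the Python sketch below groups discharge records on a composite key and then attaches a death record when exactly one candidate shares that key. It is a simplified deterministic example, not the method described in the paper. The key fields (birth_date, sex, zip) and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def link_key(record):
    """Composite key used in place of a unique identifier (hypothetical fields)."""
    return (record["birth_date"], record["sex"], record["zip"])


def link_discharges_to_deaths(discharges, deaths):
    """Group discharges by key, then attach a death record when unambiguous."""
    deaths_by_key = defaultdict(list)
    for death in deaths:
        deaths_by_key[link_key(death)].append(death)

    discharges_by_key = defaultdict(list)
    for discharge in discharges:
        discharges_by_key[link_key(discharge)].append(discharge)

    linked = []
    for key, records in discharges_by_key.items():
        candidates = deaths_by_key.get(key, [])
        # Accept a death link only when exactly one candidate shares the key.
        death = candidates[0] if len(candidates) == 1 else None
        linked.append({"key": key, "discharges": records, "death": death})
    return linked


# Hypothetical records, for illustration only.
DISCHARGES = [
    {"birth_date": "1990-01-01", "sex": "F", "zip": "94110", "year": 1995},
    {"birth_date": "1990-01-01", "sex": "F", "zip": "94110", "year": 1996},
    {"birth_date": "1985-06-15", "sex": "M", "zip": "90001", "year": 1997},
]
DEATHS = [
    {"birth_date": "1985-06-15", "sex": "M", "zip": "90001", "year": 1997},
]

LINKED = link_discharges_to_deaths(DISCHARGES, DEATHS)
```

An exact-match key like this one will miss people whose ZIP code changes between records and can merge distinct people who share all three fields, so a real linkage needs additional fields and review of uncertain matches.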