In the recent article “Why Big Data Science and Data Analytics Projects Fail,” Data Science Process Alliance consultant Nick Hotz outlines common problems that all data project teams face and the questions that need to be answered for a data project to succeed. Since 2011, hc1 has been helping health systems and laboratories analyze their laboratory testing data to improve patient care. Our approach to resolving data project problems for our customers is rooted in our core values. I spoke with hc1 data scientist Alex Karr for more details about this approach.
Hire curious people and develop their talents
Hotz mentions the difficulty of finding people with the appropriate data skills in today’s competitive market. What he doesn’t say is that organizations can choose to invest in developing the skills of the people they already have. Karr is a great example of this kind of career growth. He started with hc1 as an intern writing training documentation for our business intelligence team. When he was hired full-time, he learned to write queries and develop reports as an analyst. He then used his deep knowledge of the hc1 database to move into a data engineering role with the data team to manage their ETL (extract, transform, load) processes. While continuing to work at hc1, Karr earned his master’s degree in data science and was promoted to his current position.
“For me, the natural progression was we’ve got all this data, how do we actually make insights out of it? Of course, one way is doing visualizations for a client, but what I was more interested in was machine learning, so, you know, how can we make predictions based on what we have,” Karr said.
Be accountable for data quality
As the Data Science Process Alliance article points out, not having the right data is the clearest reason for data project failure. “Not having the right data is pretty critical, because we can work with what we’ve got, but it’s always gonna have limitations. We have to make sure that we recognize those limitations and try to account for them,” Karr said. He explained some of the ways we do this:
- Integrating data from multiple sources. Lab testing data comes from systems that are designed to collect the information necessary for reimbursement, not operational or patient care improvement, so there are likely to be gaps in the information that can be gleaned from it. For example, we might need to look at medical claims data to get full diagnosis information or add financial data to present a complete picture of lab operations.
- Cleaning data. Our patient matching process de-duplicates patient records so that we can build data models from a more manageable, performant dataset. Standardizing the format of zip code and state data points, for example, is important in public and population health use cases that involve patient geography, such as evaluating the risk of COVID-19 spread or tracking changes in opioid overdoses.
Standardizing test names and abbreviations is just as important. We compare test codes, analytes, and units of measure to determine whether tests with similar names in fact refer to the same test, which helps ensure that the insights we provide for test utilization and order volume are accurate.
- Evaluating data for bias. For example, when compared with U.S. Census data, our dataset shows a slight overrepresentation of women. Knowing that fact allows our data scientists to adjust data models to reflect patient gender ratios that align with U.S. population numbers.
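For readers who think in code, the cleaning and bias-adjustment steps above can be sketched roughly as follows. This is an illustration only, not hc1's actual pipeline: the function names, the tiny state lookup table, and the 50/50 target shares are all hypothetical placeholders (they are not real Census figures), and post-stratification weighting is just one common way to bring a sample in line with a reference population.

```python
import re
from collections import Counter

# Hypothetical lookup; a real pipeline would cover every state and territory.
STATE_ABBREVIATIONS = {"indiana": "IN", "ohio": "OH", "kentucky": "KY"}


def normalize_zip(raw):
    """Return a 5-digit ZIP string from a string or int, or None if unrecoverable."""
    digits = re.sub(r"\D", "", str(raw))
    if len(digits) == 9:        # ZIP+4, written with or without a hyphen
        digits = digits[:5]
    if len(digits) in (3, 4):   # leading zeros lost when the ZIP was stored as a number
        digits = digits.zfill(5)
    return digits if len(digits) == 5 else None


def normalize_state(raw):
    """Return a 2-letter uppercase state code, or None if the value is unknown."""
    text = str(raw).strip()
    if len(text) == 2 and text.isalpha():
        return text.upper()
    return STATE_ABBREVIATIONS.get(text.lower())


def poststratification_weights(records, key, target_shares):
    """Weight each record so that weighted group shares match target_shares."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return [
        target_shares[record[key]] / (counts[record[key]] / total)
        for record in records
    ]


# Illustrative sample in which women are overrepresented (6 of 10 records).
patients = [{"gender": "F"}] * 6 + [{"gender": "M"}] * 4
# Placeholder target shares, NOT actual Census numbers.
weights = poststratification_weights(patients, "gender", {"F": 0.5, "M": 0.5})
```

With these weights, each record from the overrepresented group counts slightly less than one record and each record from the underrepresented group counts slightly more, so the weighted totals line up with the target shares.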
At this point in the interview, Karr started getting into concepts that reminded me that not all hc1 technical writers (namely me) are destined to become data scientists. He spoke about imputation and overfitting and data leakage. Mostly, I understood that a major part of building a machine learning model is figuring out which data points to include and which ones to exclude, a process called feature selection. He also stressed that it’s important to separate your training data from your testing data.
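To make the point about separating training and testing data a little more concrete, here is a minimal sketch of how imputation (filling in missing values) can be done without data leakage. It is an assumption-laden illustration rather than anything Karr described in detail: the helper names, the `glucose` feature, and the sample values are hypothetical. The key idea is that the fill value is calculated from the training rows only and then reused for the test rows, so nothing about the test set influences what the model learns.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train_rows, test_rows)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean_imputer(train_rows, feature):
    """Compute the fill value from the TRAINING rows only."""
    observed = [row[feature] for row in train_rows if row[feature] is not None]
    return mean(observed)


def impute(rows, feature, fill_value):
    """Return copies of rows with missing values replaced by fill_value."""
    return [
        {**row, feature: fill_value if row[feature] is None else row[feature]}
        for row in rows
    ]


# Hypothetical lab results; None marks a missing value.
values = [90, None, 110, 100, None, 95, 105, 120]
rows = [{"id": i, "glucose": v} for i, v in enumerate(values)]

train_rows, test_rows = train_test_split(rows)
fill_value = fit_mean_imputer(train_rows, "glucose")   # learned from training data
train_clean = impute(train_rows, "glucose", fill_value)
test_clean = impute(test_rows, "glucose", fill_value)  # reused, never refit on test data
```

Computing the mean over all rows before splitting would let information from the test set leak into training, which tends to make a model look more accurate in testing than it will be on new data.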
Use a collaborative project management process
The Hotz article lists ineffective project management processes as a reason for data project failure and recommends the Agile methodology, which is what the hc1 technology department uses, as a solution. As hc1 practices it, Agile means small, focused teams of product owners, software engineers, business and data analysts, and data scientists work together in two-week-long sprints to complete defined blocks of work. Breaking up the work in this way provides the flexibility to incorporate customer feedback and adjust to changing priorities. It also shortens the amount of time required to deliver value. Frequent communication and collaboration are essential to making the Agile process work, as is keeping the focus on the problem the customer is trying to solve. Although data science doesn’t often fall neatly into defined deliverables—many sprints are devoted to research—Karr finds the structure and clear goals of Agile to be helpful, especially when he splits his time among different teams and projects.
Handle data ethically
As a healthcare technology company, hc1 safeguards the protected health information in our systems in accordance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). We also have taken the extra step to become HITRUST Risk-based, 2-year (r2) Certified to demonstrate our commitment to managing information security risks and protecting customer data. The Health Information Trust Alliance Common Security Framework (HITRUST CSF) serves to unify security controls based on aspects of U.S. federal law (such as HIPAA and HITECH), certain state-specific laws, and other industry-standard compliance frameworks into a single comprehensive set of baseline security and privacy controls, built specifically for healthcare needs.
The r2 validated assessment is a tailored assessment that provides the highest level of assurance an organization can earn from HITRUST.
hc1 was founded on the belief that every patient should be treated as a unique individual and that if labs could organize every individual's information intelligently, they could personalize and improve care for all patients. Now, hc1 solutions optimize laboratory operations for thousands of locations and inform testing and treatment decisions for millions of patients. Learn more about the hc1 Platform, which has organized diagnostic data for over 200 million patients and processes more than 30 billion clinical transactions and 500 million test results per month.