Using artificial intelligence to screen for health-related social needs in clinical notes

Sep 03, 2025

Care management & redesign, Data analysis & integration, Data sharing, Quality improvement, SDOH & health equity

As a former Centers for Medicare & Medicaid Services (CMS) Accountable Health Communities (AHC) site, we understand the amount of time and effort required to do in-person social needs screening. We also know that many of the people we screened could have been identified and supported more proactively if existing health system data had been activated sooner. Seeing firsthand the gap between what was possible and what was practical has shaped how we at the Coalition think about emerging tools that can help close it.

Recent advances in artificial intelligence, particularly the rise of large language models (LLMs) like ChatGPT, Gemini, and Claude, are rapidly transforming our ability to process information. These tools can generate content, answer questions, and even hold conversations. One of their most promising capabilities is analyzing large amounts of text far faster than a human ever could.

In complex care, we know the value of rich, in-person interactions, and we recognize that narrative data — such as provider notes and discharge summaries — often capture critical details about a patient’s lived experiences that structured fields, like diagnosis codes or even screening data, may miss. With this in mind, the Camden Coalition’s Data and Quality Improvement team (DAQI) saw an opportunity to use LLMs to extract key information from these narrative clinical notes, unlocking information that would otherwise remain buried and inaccessible for both analysis and action.

Housing instability, one of the most significant health-related social needs, is a priority focus for many Camden Coalition programs. Individuals experiencing homelessness have mortality rates three to four times higher than housed populations and account for a disproportionate share of emergency department visits and hospital readmissions. While diagnosis codes historically focused on medical conditions, the introduction of SDOH “Z-codes” in ICD-10 represented a pivotal acknowledgment of the powerful role that social factors like housing, food insecurity, and transportation play in health outcomes. Unfortunately, these codes remain underused: studies show that fewer than 30% of homeless patients have a Z-code documenting their housing status. As a result, the Camden Coalition’s clinical teams often spend significant time manually reviewing records to identify periods of homelessness, work that could be streamlined through LLMs.

Using LLMs to surface this information from clinical notes could save valuable staff time, enable proactive outreach, connect patients to housing resources more quickly, and provide richer insights into the prevalence and impact of housing instability. This work is especially timely given the number of health efforts focused on housing, including New Jersey Medicaid’s recent 1115 waiver, which expands Medicaid’s ability to cover housing-related supports for eligible members. By creating a more reliable and scalable way to identify individuals experiencing housing instability, we can help ensure that these new benefits reach the people who need them most, quickly and efficiently. Without accurate identification, many eligible members may go unnoticed, missing opportunities for timely interventions that could improve health outcomes and reduce costly emergency utilization. Using LLMs to surface housing-related needs from clinical notes positions the Camden Coalition and its partners to put data to work more efficiently in support of whole-person care.

In February, the DAQI team joined the New Jersey Social Determinants of Health Hackathon, hosted by Cooper University Health Care’s Innovation Center. The event brought together stakeholders from across the state to explore innovative ways to use health IT to better engage at-risk populations and improve outcomes. Our team set out to test whether LLMs could help detect indicators of housing instability in clinical notes accessed through the Camden Coalition’s Health Information Exchange (HIE), which aggregates data from healthcare providers across South Jersey. If successful, this approach could save significant staff time on manual review, enable more proactive outreach, connect patients to housing resources more quickly, and provide richer population-level insights.

In less than two weeks, the DAQI team built a proof of concept. We extracted large volumes of clinical reports from the HIE, representing terabytes of data, designed a sampling approach, built secure HIPAA-compliant data pipelines, developed and refined prompts for the LLMs, and evaluated results. Using a sample of roughly 18,000 clinical reports, some with and some without housing-related Z-codes, we tested whether the models could flag housing instability. Chart reviews by program staff revealed that LLMs were able to accurately identify homelessness and housing instability even in cases where no Z-code was present, meaning many affected patients would have been missed if we relied only on structured data.
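For readers who want to picture the core comparison, here is a minimal Python sketch of the idea: ask a model to label each note, then keep the cases where the model sees a housing problem but the structured data carries no housing Z-code. It is an illustration only, not the DAQI team's pipeline. The Z-code prefixes, the prompt wording, the label set, and every function name are assumptions made for this example, and `keyword_stub` is a toy stand-in for a real LLM call so the sketch runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# ASSUMPTION: illustrative housing-related ICD-10-CM prefixes (homelessness,
# inadequate housing, housing instability). The post does not list the codes used.
HOUSING_Z_PREFIXES = ("Z59.0", "Z59.1", "Z59.81")

# ASSUMPTION: a simple three-way label set chosen for this sketch.
LABELS = ("homeless", "unstable", "none")


@dataclass
class Report:
    report_id: str
    text: str
    dx_codes: List[str]


def has_housing_z_code(report: Report) -> bool:
    """True if any diagnosis code on the report starts with a housing prefix."""
    return any(
        code.strip().upper().startswith(HOUSING_Z_PREFIXES)
        for code in report.dx_codes
    )


def build_prompt(note_text: str) -> str:
    """Hypothetical prompt; the team's actual prompts are not published here."""
    return (
        "Read the clinical note below and answer with exactly one word: "
        "'homeless', 'unstable', or 'none'.\n\n"
        f"Note:\n{note_text}"
    )


def parse_label(response: str) -> str:
    """Normalize a model response to a known label, else 'unclear'."""
    word = response.strip().lower().strip(".'\"")
    return word if word in LABELS else "unclear"


def flag_for_review(
    reports: List[Report], ask_llm: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Return (report_id, label) where the model flags housing need but no Z-code exists."""
    flagged = []
    for report in reports:
        label = parse_label(ask_llm(build_prompt(report.text)))
        if label in ("homeless", "unstable") and not has_housing_z_code(report):
            flagged.append((report.report_id, label))
    return flagged


def keyword_stub(prompt: str) -> str:
    """Toy stand-in for an LLM call: NOT a model, just a keyword check."""
    note = prompt.split("Note:\n", 1)[1].lower()
    return "homeless" if "shelter" in note else "none"
```

In practice `ask_llm` would wrap a call to a model hosted in a HIPAA-compliant environment. Passing it in as a function keeps the comparison logic separate from the choice of model, and the reports it returns are the ones worth sending to staff for chart review.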

The results were promising: in cases where the LLM flagged homelessness without a corresponding Z-code, over 95% were confirmed correct. The models excelled at identifying clear cases of homelessness but were less precise with more nuanced situations, such as individuals “doubled up” with friends or family or experiencing poor housing quality, which highlights an area for further refinement. Even so, these findings demonstrate the potential of LLMs to surface critical social needs information hidden in free-text notes, with meaningful implications for care coordination and population health monitoring.
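The "confirmed correct" share among flagged cases is what evaluators call positive predictive value (precision). The sketch below shows how that share, and a rough uncertainty range around it, could be computed from chart review counts. The function names are invented for this example, the Wilson score interval is a standard statistical choice rather than something the post says the team used, and no real counts from the project appear here.

```python
import math
from typing import Tuple


def positive_predictive_value(confirmed: int, flagged: int) -> float:
    """Share of LLM-flagged cases that chart review confirmed."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    if not 0 <= confirmed <= flagged:
        raise ValueError("confirmed must be between 0 and flagged")
    return confirmed / flagged


def wilson_interval(confirmed: int, flagged: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    p = positive_predictive_value(confirmed, flagged)
    n = flagged
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# HYPOTHETICAL counts for illustration only; these are not the project's results.
low, high = wilson_interval(confirmed=19, flagged=20)
```

With a small review sample the interval is wide, and it narrows as more charts are reviewed at the same confirmation rate, which is one reason larger-scale validation matters.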

Of course, the hackathon setting meant our methods were shaped by tight timelines and the need for rapid iteration, with validation conducted on a small scale. Our next steps include expanding to a broader range of note types, developing a robust multi-rater validation framework, and testing model performance across the full Camden HIE dataset. We are actively seeking funding and academic partnerships to move beyond this early prototype, refine our approach, and integrate these tools into our routine, real-time HIE data workflows. By building on these early successes, we can create solutions that not only improve our local interventions but also contribute to the growing national conversation on how technology can help better identify and address health-related social needs.

We’re especially excited to continue this conversation at our upcoming discussion on the challenges of using AI and large language models in complex care. This will be a chance to engage with thought leaders, practitioners, and researchers from across the country on how these emerging tools can enhance care coordination, surface critical health-related social needs, and ultimately improve outcomes for the populations we serve.