Stress Testing and Failure Analysis of Large Language Models
This line of research develops methods for stress testing large language models to systematically identify failure modes, including jailbreak vulnerabilities, unsafe outputs, and robustness breakdowns under distribution shift. The work emphasizes evaluation frameworks and adversarial testing approaches that characterize model behavior in high-risk or deployment-relevant settings.
Building an Actionable Framework for Responsible AI Integration in Clinical and Public Health Contexts
Building on the AI Ethics Box, this work adapts bioethical principles to healthcare settings and links them to concrete technical methods for evaluation, governance, and post-deployment monitoring. The framework addresses the gap between high-level ethical guidance and operational decision-making in clinical and public health contexts. It is refined through interdisciplinary collaboration and stakeholder co-design, supporting practical use by clinicians, health system leaders, and developers in real-world AI deployment.
RAI4MH: Responsible AI for Mental Health Initiative
RAI4MH is an international partnership developing evidence-based guidance for the responsible use of AI in mental health contexts. As the U.S. lead, I help coordinate interdisciplinary collaboration across computer science, mental health, ethics, and policy. Key outputs include white papers that synthesize international expert consensus and contribute directly to policy discussions, including a POSTnote for the UK Parliament on responsible AI in mental health.
This work examines how AI systems interact with suicide-related content, focusing on system behavior, responses, safeguards, and evaluation. Ongoing work includes (i) understanding how suicide is expressed through language and emotion, and (ii) identifying the medical and sociodemographic contexts that drive suicide risk. These findings provide evidence to inform policy and deployment decisions involving AI systems.