Research

What Counts as Human-AI Collaboration?

I keep coming back to a simple question: when I use an LLM, am I collaborating with it, or just using a tool? That distinction matters more than it first appears. If we call every AI interaction a collaboration, the word becomes too loose to be useful. But if we reserve collaboration for situations where the human and the system genuinely shape each other’s work, then the term becomes more precise — and more honest. ...

[Research] Optimizing Order Sets With a Large Language Model–Powered Multiagent System

Paper Overview

Title: Optimizing Order Sets With a Large Language Model–Powered Multiagent System
Authors: Liu S, Huang SS, McCoy AB, Wright AP, Horst S, Wright A
Journal: JAMA Network Open
Year: 2025
DOI: https://doi.org/10.1001/jamanetworkopen.2025.33277

Why This Paper?

I read this paper because it sits at the intersection of clinical pharmacy, healthcare workflow, and practical AI systems.

- Relevant to clinical decision support and order-set maintenance
- Uses a multiagent LLM design instead of a single-model prompt
- Shows the gap between factual correctness and actual clinical usefulness
- Offers a good example of expert alignment in a high-stakes domain

This article is a cleaned-up conversion of my original blog post into the site’s Notes format. ...

[Research] A Framework for Human Evaluation of Large Language Models in Healthcare Derived from Literature Review

Paper Overview

Title: A Framework for Human Evaluation of Large Language Models in Healthcare Derived from Literature Review
Authors: Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V. Stolyar, Katelyn Polanska, Karleigh R. McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E. Cacciamani, Cong Sun, Yifan Peng, and Yanshan Wang
Journal/Conference: npj Digital Medicine
Year: 2024
DOI/Link: https://doi.org/10.1038/s41746-024-01258-7

This scoping review analyzes 142 studies of human evaluation for healthcare LLMs and argues that current practice is inconsistent, under-specified, and often too weak for high-risk clinical use cases.

Selected Figures

Figure 1. Healthcare applications of LLMs
This figure shows where human evaluation has been used most often: clinical decision support, medical education, patient education, and question answering.

Figure 7. QUEST human evaluation framework
This is the most important figure in the paper because it turns the review findings into a practical evaluation workflow.

Figure 9. PRISMA flow diagram
This figure summarizes the literature search and screening process behind the 142 included studies. ...