Shin Li

The 7 Skills You Need to Build AI Agents

Wed, 13 May 2026 10:42:46 +0800

IBM Technology’s “The 7 Skills You Need to Build AI Agents” makes a point that feels increasingly true: once an agent can act in the real world, prompt writing is only the starting point.

[Dev] Following a Goal with Codex (/goal)

Tue, 12 May 2026 06:00:00 +0800

I have been looking for a clean way to explain what /goal really does in Codex.

The most useful mental model I found is simple: /goal is not a prettier prompt. It is a working contract for long-running agent work. You are telling the agent what success looks like, what the boundary is, and how to know when to stop.

That framing matters because the feature is built for work that outlives one turn. If the objective is durable enough, the agent can keep making progress, validate its own steps, and come back to you with a result instead of a half-finished thought.

[Dev] Learning from Matt Pocock’s Agent Skills

Thu, 07 May 2026 08:53:00 +0800

I recently read Matt Pocock’s article “5 Agent Skills I Use Every Day”. It resonated with my own experience working with coding agents powered by models such as Claude Sonnet and Claude Opus.

The article gave me clearer language for something I had been feeling: good agent work depends on good engineering process. We need better questions, written context, small slices, tests, and codebases that agents can understand.

[Dev] Trying Reflex (Python) for Web Apps

Wed, 06 May 2026 03:10:00 +0800

I’ve been using Streamlit for quick internal tools and dashboards, but a colleague introduced me to Reflex, so I’m trying it out as another way to build Python web apps.

What caught my attention is that Reflex is a full-stack Python framework: UI, state, backend logic, data models, and deployment all live in one codebase. That makes it especially appealing to Python backend developers who want to go beyond quick tools and build more scalable, production-ready web apps.

[Research] Optimizing Order Sets With a Large Language Model–Powered Multiagent System

Tue, 18 Nov 2025 09:42:00 +0800

Paper Overview

Title: Optimizing Order Sets With a Large Language Model–Powered Multiagent System

Authors: Liu S, Huang SS, McCoy AB, Wright AP, Horst S, Wright A

Journal: JAMA Network Open

Year: 2025

DOI: https://doi.org/10.1001/jamanetworkopen.2025.33277

Why This Paper?

I read this paper because it sits at the intersection of clinical pharmacy, healthcare workflow, and practical AI systems.

Relevant to clinical decision support and order-set maintenance
Uses a multiagent LLM design instead of a single-model prompt
Shows the gap between factual correctness and actual clinical usefulness
Offers a good example of expert alignment in a high-stakes domain

This article is a cleaned-up conversion of my original blog post into the site’s Notes format.

[Research] A Framework for Human Evaluation of Large Language Models in Healthcare Derived from Literature Review

Fri, 14 Nov 2025 14:47:00 +0800

Paper Overview

Title: A Framework for Human Evaluation of Large Language Models in Healthcare Derived from Literature Review

Authors: Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V. Stolyar, Katelyn Polanska, Karleigh R. McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E. Cacciamani, Cong Sun, Yifan Peng, and Yanshan Wang

Journal: npj Digital Medicine

Year: 2024

DOI: https://doi.org/10.1038/s41746-024-01258-7

This scoping review analyzes 142 studies of human evaluation for healthcare LLMs and argues that current practice is inconsistent, under-specified, and often too weak for high-risk clinical use cases.

Selected Figures

Figure 1. Healthcare applications of LLMs

This figure shows where human evaluation has been used most often: clinical decision support, medical education, patient education, and question answering.

Figure 7. QUEST human evaluation framework

This is the most important figure in the paper because it turns the review findings into a practical evaluation workflow.

Figure 9. PRISMA flow diagram

This figure summarizes the literature search and screening process behind the 142 included studies.

Markdown Syntax Guide

Sun, 01 Sep 2024 05:34:09 +0800

This article offers a sample of basic Markdown syntax that can be used in Hugo content files.

First Post

Sun, 01 Sep 2024 00:08:39 +0800

Hit the ground running

This is my first post.

About

I’m Shin Li, a pharmacist, engineer, healthcare AI researcher, educator, and lifelong learner based in Taipei.

I spend much of my time at the edges between domains: clinical pharmacy and software, healthcare workflows and AI systems, research and teaching, structure and creativity. I like making complex things easier to understand, and I care about tools that are not only technically interesting but also useful in real clinical and human contexts.

Now

What I am focused on at this point in life.

Projects

Things I’ve built, organized, studied, or kept returning to.