Applied AI Engineer × Published ML Researcher

Reliable AI
that ships.

I'm Shawn Liu. I build LLM agents and applied AI systems for high-stakes workflows, with deterministic validation, human approval, and production-grade failure handling, backed by published ML research at WACV, NeurIPS, and ISLPED.

CS · UC Irvine '26 → Incoming MSCS · Columbia · Open to Applied-AI / ML / SWE

See what I've shipped · View research · Get in touch

GitHub · LinkedIn · OpenReview · Email

Figure: cross-modal attention heatmap learned by the WACV 2026 event encoder, highlighting salient regions.

Cross-modal attention learned by our WACV 2026 event-encoder, one of six papers across top AI venues.

Published at WACV 2026 · NeurIPS 2025 · ISLPED 2026 · Frontiers in AI · IEEE TAI ×2 (under review)

6

Papers at top venues · 4 accepted, 2 under review

4×

Lower FHE bootstrapping overhead at >90% encrypted-inference accuracy

+15.2pts

Zero-shot gain on unseen N-ImageNet classes · WACV 2026

65%

Fewer simulated casualties · U.S. Navy UAV landing model

01 Flagship · LLM systems

Loop

Interview prep, scheduled around your real life. Loop takes a career goal from one of eight supported tracks, your weekly availability, and your progress signals, and turns them into a validated study plan grounded in a curated source corpus, then drafts it as a real week on your calendar. Nothing touches Google Calendar until you approve it, and every write is verified, with a rollback path if something looks off. I built it for my own daily use, and it's deployed live at loop-study.com.

Try Loop live ↗

LLMs propose. Deterministic infrastructure disposes.

Four LLM nodes write the plans and the prose. Everything that can actually touch your calendar is deterministic, validated, and waits for your approval.

Propose · LLM, one isolated package

Strategist: grounded syllabus with source-claim citations · Opus 4.8

↓

Planner: structured task plan · Sonnet 5

Reflection · Explanation: prose only, never parsed · Sonnet 5

The LLM SDK can't even be imported outside this package. import-linter fails the build if you try.

Dispose · deterministic

Validation layer: five checks (schema, graph, coverage, user-fit, scheduling). Failures go back to the LLM as typed repairs, twice at most.

↓

Greedy scheduler: draft-only · no write access

↓

Human approval gate: nothing lands on the calendar without your say-so

↓

Calendar Write Manager: the only code that writes. It rechecks the approval and payload hash, dry-runs, catches duplicates, verifies after writing, and offers rollback / retry / keep.

↓

Google Calendar

Feedback loop: telemetry → deterministic drift classifier → accountability (check-ins, recommitment) → replan. Every failure carries a typed reason code, and one supervisor state machine owns every transition. Calendar sync is reconciliation-based: your real calendar is authoritative for overlap, external daily load is advisory, valid external changes are adopted, and deleted events are remembered so they don't come back.
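The propose/dispose split above can be sketched in a few lines of Python. This is a minimal illustration, not Loop's actual code: the `Repair` type, the two toy checks, the reason codes, and the `propose` callable are all hypothetical stand-ins. Only the shape comes from the description: deterministic checks, typed repairs sent back to the LLM, and a hard bound of two repair rounds.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_REPAIRS = 2  # "twice at most", per the description above


@dataclass(frozen=True)
class Repair:
    """A typed repair request sent back to the LLM (hypothetical shape)."""
    code: str    # machine-readable reason code
    detail: str  # human-readable hint for the repair prompt


Plan = dict
Check = Callable[[Plan], Optional[Repair]]


def check_schema(plan: Plan) -> Optional[Repair]:
    if "tasks" not in plan:
        return Repair("SCHEMA_MISSING_FIELD", "plan has no 'tasks' list")
    return None


def check_coverage(plan: Plan) -> Optional[Repair]:
    if not plan.get("tasks"):
        return Repair("COVERAGE_EMPTY", "plan covers no topics")
    return None


# Toy subset standing in for the five checks named above.
CHECKS: list[Check] = [check_schema, check_coverage]


def validate(plan: Plan) -> list[Repair]:
    """Run every deterministic check and collect the typed failures."""
    repairs = []
    for check in CHECKS:
        repair = check(plan)
        if repair is not None:
            repairs.append(repair)
    return repairs


def propose_until_valid(propose: Callable[[list[Repair]], Plan]) -> Optional[Plan]:
    """Call the proposer, feeding typed repairs back, with a hard bound."""
    repairs: list[Repair] = []
    for _attempt in range(1 + MAX_REPAIRS):  # one initial try plus two repairs
        plan = propose(repairs)
        repairs = validate(plan)
        if not repairs:
            return plan
    return None  # the caller surfaces a typed failure instead of looping forever
```

The bound matters more than the checks themselves: a proposer that never converges costs at most three calls, and the failure that comes back is a reason code rather than free text.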

4,822

Backend tests, plus 313 on the frontend. All green in CI

Written axioms + 10 ADRs governing every design decision

4

LLM nodes. Everything else is deterministic

$1.70

Expected monthly cost per user, worked out in a written cost axiom. Hard cap: $8

242

Curated source documents in the grounding corpus, split into 7,776 retrieval chunks

8

Career tracks, a closed enum from SWE and MLE to quant dev and PM

8

Versioned eval sets. Prompt changes ship with before/after deltas

5

Deterministic validation checks between every LLM proposal and your calendar

Done

Loop engineering

I closed the dead-ends you actually feel in a tool like this. A failed calendar write now offers rollback, retry, or keep. A needed replan says so and offers recovery modes. Check-ins can be answered right in the app.
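A rough sketch of what a guarded write path like this can look like, under loud assumptions: `FakeCalendar` is an in-memory stand-in (no real Google Calendar calls are made), and `Approval`, `payload_hash`, `write_events`, `rollback`, and the status strings are hypothetical names chosen for illustration. The order of steps follows the description of the Calendar Write Manager: recheck approval and payload hash, dry-run for duplicates, write, verify, and leave the rollback decision to the caller.

```python
import hashlib
import json
from dataclasses import dataclass, field


def payload_hash(events: list[dict]) -> str:
    """Stable hash of the exact payload the user was shown."""
    canonical = json.dumps(events, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class Approval:
    approved: bool
    approved_hash: str  # hash of the payload at approval time


@dataclass
class FakeCalendar:
    """In-memory stand-in for a real calendar client (hypothetical)."""
    events: dict = field(default_factory=dict)

    def insert(self, event: dict) -> None:
        self.events[event["id"]] = event

    def delete(self, event_id: str) -> None:
        self.events.pop(event_id, None)


def write_events(cal: FakeCalendar, events: list[dict], approval: Approval) -> str:
    # 1. Recheck approval and payload hash right before writing.
    if not approval.approved or approval.approved_hash != payload_hash(events):
        return "REJECTED_APPROVAL_MISMATCH"
    # 2. Dry run: catch duplicates before anything is written.
    if any(e["id"] in cal.events for e in events):
        return "REJECTED_DUPLICATE"
    # 3. Write, then verify that every event actually landed unchanged.
    for e in events:
        cal.insert(e)
    if all(cal.events.get(e["id"]) == e for e in events):
        return "VERIFIED"
    # 4. Verification failed: the caller chooses rollback, retry, or keep.
    return "VERIFY_FAILED"


def rollback(cal: FakeCalendar, events: list[dict]) -> None:
    """Remove everything this write attempted to create."""
    for e in events:
        cal.delete(e["id"])
```

Hashing the payload at approval time and again at write time means an edit made after the user clicked approve is rejected rather than silently written.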

Done

Harness engineering

Timeouts, backoff, and a typed taxonomy for provider errors, plus a live-capture tool, a CI eval gate, and call-log readers. The real-prompt baseline is recorded, and CI now re-grades committed real-output recordings on every build.
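As an illustration of the backoff and typed-taxonomy pieces, here is a minimal sketch. The `ProviderErrorKind` members, the `ProviderError` class, and the default delays are assumptions made for the example; the real taxonomy is not shown here. The retry policy is ordinary capped exponential backoff with full jitter.

```python
import random
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class ProviderErrorKind(Enum):
    """Hypothetical taxonomy, for illustration only."""
    RATE_LIMITED = "rate_limited"        # retryable
    TIMEOUT = "timeout"                  # retryable
    INVALID_REQUEST = "invalid_request"  # not retryable


RETRYABLE = {ProviderErrorKind.RATE_LIMITED, ProviderErrorKind.TIMEOUT}


class ProviderError(Exception):
    def __init__(self, kind: ProviderErrorKind, message: str = "") -> None:
        super().__init__(message or kind.value)
        self.kind = kind


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry retryable errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ProviderError as err:
            last_attempt = attempt == max_attempts - 1
            if err.kind not in RETRYABLE or last_attempt:
                raise  # non-retryable, or out of attempts: surface the typed error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting, which fits a harness where live calls never run in CI.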

Done

Prompt engineering

Few-shot exemplars, unified repair messages, and voice specs for the prose you'd actually read. All shipped, each with recorded before/after eval deltas. The Strategist prompt is at v8.

Done, one step left

Context engineering

Prompt caching is live, source curation feeds the Strategist a grounded context slice, and mastery memory carries reflection history across plans. The one piece left: replans that regenerate only part of a syllabus.

The grounding layer

Syllabi aren't generated from vibes. A curated corpus of 242 source documents, split into 7,776 retrieval chunks, sits behind BM25-first retrieval with deterministic source-confidence scoring, so every claim in a plan can point back at where it came from. The bibliography is auto-generated from the same corpus, and retrieval is deterministic code: the LLM only consumes what it's handed.
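For readers unfamiliar with BM25, the following is a small pure-Python sketch of Okapi BM25 scoring over text chunks. It is not Loop's retrieval code: the whitespace tokenizer, the `k1` and `b` defaults, and the class and method names are assumptions for illustration, and the source-confidence scoring mentioned above is not modeled. It does show why this kind of retrieval is deterministic: the same corpus and query always give the same ranking.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


class BM25:
    def __init__(self, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        # Assumes a non-empty list of chunks.
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(c)) for c in chunks]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        n = len(self.docs)
        # Document frequency: in how many chunks each term appears.
        df = Counter(term for d in self.docs for term in d)
        # Smoothed IDF variant that stays non-negative.
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1.0) for t, c in df.items()}

    def score(self, query: str, i: int) -> float:
        doc, length = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = doc.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * length / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        # Sort by score descending, then index, so ties break deterministically.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return [(i, s) for i, s in scored[:k] if s > 0]
```

Each returned index maps back to a chunk, and each chunk to a source document, which is what lets a claim in a plan cite where it came from.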

Beyond the planner

Résumé intake stores your résumé and maps it through a closed, review-gated skill taxonomy; it is never free-parsed into control state. Career support spans eight tracks, from SWE and MLE to quant dev and PM. A pathway observatory renders the knowledge map with its community structure and drawer navigation, and mastery memory carries what you've already proven across plans into the Strategist's context, behind a deterministic review-output gate. The scheduler scores placement quality, and freebusy checks include Loop's dedicated calendar.

The eval harness

Every LLM call lands in a SQLite call log with tokens, cost, and latency. A capture tool records real model outputs into committed recordings, and CI re-grades them deterministically: schema validity, repair recovery, plan-quality metrics, plus an offline LLM judge for the prose. Prompt and model changes ship with before/after deltas in the commit message. Live API calls never run in CI, and prompt bytes are version-pinned by hash, so an unmeasured prompt change fails the build.

The first real-prompt baseline was recorded in July 2026, and committed recordings now cover few-shot, voice, and grounding before/after runs across eight versioned eval sets. One fixture still deliberately fails, so I know the gate actually catches regressions.
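The hash-pinning idea can be sketched as follows. This is an assumption-laden illustration, not the actual gate: the function names and the shape of the `pins` mapping are hypothetical. The point is that a pin is a SHA-256 over the exact prompt bytes, so any edit that was not re-measured and re-pinned fails a CI test.

```python
import hashlib


def prompt_digest(prompt: str) -> str:
    """SHA-256 over the exact UTF-8 bytes, so whitespace changes count too."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def unpinned_prompts(prompts: dict[str, str], pins: dict[str, str]) -> list[str]:
    """Return the names of prompts whose bytes no longer match their pinned hash."""
    return sorted(
        name for name, text in prompts.items()
        if pins.get(name) != prompt_digest(text)
    )


def assert_prompts_pinned(prompts: dict[str, str], pins: dict[str, str]) -> None:
    """Intended to run as a CI test: any drift fails the build."""
    drifted = unpinned_prompts(prompts, pins)
    if drifted:
        raise AssertionError(f"prompt bytes changed without a new pin: {drifted}")
```

In a setup like this, the pin would be updated in the same commit that records the new before/after eval deltas, so the two cannot drift apart.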

The deployment

The whole app ships as one Docker image on Fly.io. A two-stage build compiles the React SPA under Node 20, then a python:3.11-slim stage installs exact-pinned runtime dependencies with uv sync --frozen, so no Node and no dev tooling reach production. A single uvicorn process serves everything: the API, the built SPA, and the static landing pages. It's deliberately single-process on a single always-on machine, because SQLite in WAL mode allows one writer at a time and has to live on one host: no worker pools, no autoscaling to fight over the database. State lives on a persistent Fly volume, OAuth tokens are encrypted at rest, and every secret is injected at runtime; nothing sensitive is baked into the image.
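For context on the SQLite choice, this is roughly how a single-process store can be opened in WAL mode with Python's standard `sqlite3` module. The function name, the busy timeout value, and the extra pragmas are assumptions for the sketch, not Loop's actual configuration.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_store(path: Path) -> sqlite3.Connection:
    """Open a SQLite database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # journal_mode persists in the database file; the PRAGMA returns the mode in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"expected WAL, got {mode!r}")
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s on a locked database
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
```

WAL keeps its index in shared memory, so every connection to the database has to be on the same host. That is why a persistent volume attached to one machine is a natural fit, and why a database shared across autoscaled machines is not.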

Canonical domain is loop-study.com, with 301 redirects from the legacy hosts.

Python 3.11 · Pydantic v2 · FastAPI · SQLite + WAL · React + TS + Vite · Anthropic Messages API · Google Calendar OAuth · Docker · Fly.io

Privacy by design: Loop never stores raw calendar event titles or descriptions.

live site ↗

01 / Applied AI

Engineered to be trusted

Loop, my LLM-powered scheduler, plans real weeks on a real calendar. Every write sits behind deterministic validation, a human approval gate, and a recordings-based eval harness. It's the same discipline behind a live e-commerce platform and a U.S. Navy CV collaboration.

See Loop

02 / Research

Published & peer-reviewed

I work on secure ML inference (FHE/CKKS), neurosymbolic AI, and hyperdimensional computing. I'm lead author on work under review at IEEE TAI, with papers at WACV, NeurIPS, and ISLPED.

Read the papers

03 / Bio

Off the clock

I live with Coconut and Kumquat, listen to way too much D'Angelo, and spend my free time shooting hoops, snowboarding, or gaming. The Spotify feed is live (yes, it's mostly D'Angelo), and there's a wall of cat photos because I take way too many. Enjoy.

Meet the person

02 Selected research

Publications

Six papers across secure ML inference, neurosymbolic AI, and hyperdimensional computing: four accepted or published, two under review. The ones marked lead are where I'm first author. Open any of them for the full abstract, key results, figures, and exactly what I worked on.

Lead author · IEEE TAI 2026 · under review

Brain-Inspired Reasoning under Homomorphic Encryption

A privacy-preserving neurosymbolic framework that runs inference entirely under CKKS-FHE while keeping HDC-based reasoning robust. It holds >90% accuracy on encrypted graph inference with a 4× reduction in bootstrapping overhead, thanks to noise-adaptive scheduling.

FHE · HDC · Neurosymbolic AI · Privacy-Preserving ML

Lead & corresponding author · rebuttal completed

Figure: end-to-end neurosymbolic FHE pipeline with distributed bootstrapping and symbolic decoding.

WACV 2026

Cross-Modal Event Encoder: Bridging Image–Text Knowledge to Event Streams

Transfers CLIP's zero-shot and text-alignment capabilities to event cameras by aggregating events into an image-like representation and aligning a trainable event encoder to CLIP's frozen image–text space, delivering a +15.2-point zero-shot gain on unseen N-ImageNet classes plus plug-in ImageBind integration (sound, depth) without retraining.

Event-based Vision · CLIP · Cross-Modality · Zero-Shot

arXiv:2412.03093 ↗

Figure: attention heatmap from the cross-modal event encoder attending to salient regions.

NeurIPS 2025 · NeurReps

Geometric Priors for Generalizable World Models via VSA

Vector Symbolic Architecture builds generalizable world models with learned group structure, reaching 87.5% zero-shot accuracy and 4× noise robustness over an MLP baseline.

VSA · World Models · Generalization

OpenReview ↗

Figure: FHRR (VSA) state embeddings. Grid-like structured embeddings preserve spatial relationships.

Figure: MLP embeddings. Unstructured, with no clear geometric pattern.
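For readers new to FHRR (Fourier Holographic Reduced Representations), the VSA named in the figure above, here is a minimal sketch of its core operations in plain Python. It illustrates standard FHRR binding and unbinding only, not the paper's model or training setup; the function names and the dimension are chosen for the example.

```python
import cmath
import math
import random


def random_fhrr(dim: int, rng: random.Random) -> list[complex]:
    """Random FHRR hypervector: unit-magnitude complex phasors."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(dim)]


def bind(a: list[complex], b: list[complex]) -> list[complex]:
    """Binding is element-wise multiplication, which adds phases."""
    return [x * y for x, y in zip(a, b)]


def unbind(c: list[complex], a: list[complex]) -> list[complex]:
    """Unbinding multiplies by the complex conjugate, which subtracts phases."""
    return [x * y.conjugate() for x, y in zip(c, a)]


def similarity(a: list[complex], b: list[complex]) -> float:
    """Real part of the normalized Hermitian inner product, in [-1, 1]."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)
```

Binding a position vector with a value vector gives something nearly orthogonal to both, while unbinding with the position recovers the value. Because binding adds phases, it composes like a group operation, which is the kind of structure the grid-like embeddings reflect.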

Frontiers in AI

Optimal Hyperdimensional Representation for Learning & Cognitive Computation

The first universal HDC encoding that adapts between learning and cognition, reaching 95% learning accuracy with correlated encodings and 100% decoding under exclusive encodings.

HDC · Cognitive Computation · Neural-Symbolic

2026

Paper ↗

IEEE TAI 2026 · under review

HyperEncrypt: Homomorphic Hyperdimensional Computing for Efficient & Secure Learning

Positions HDC as an alternative to encrypted deep learning: shallow, noise-resilient algebra that fits FHE, using up to an order-of-magnitude fewer bootstrapping ops at near-clean accuracy.

HDC · Kernel Methods · CKKS · Privacy-Preserving ML

2026

ISLPED 2026

Integrating Symbolic & Neural Mechanisms for Adversarially Robust HDC

Fuses Vision-Transformer features with classical texture/shape descriptors through HDC for graceful degradation under FGSM and Genetic attacks, with +17–26 pp recovery from partial adversarial retraining.

HDC · Neurosymbolic AI · Adversarial Robustness · ViT

2026

DOI ↗

03 Engineering

Things I've shipped

Applied-ML systems up top: a U.S. Navy computer-vision collaboration, crash anticipation, and honest healthcare-ML evaluation. Below them, the production full-stack products real customers use every day. Real stacks, real outcomes.

Visuals restricted · CUI · U.S. Navy collaboration

Applied research · BiasLab @ UCI · U.S. Navy

Safe UAV Landing for the U.S. Navy

A custom pose-estimation + symbolic-reasoning system for autonomous UAV landing in adverse weather, replacing brittle fixed-pattern optical markers. The reasoning module holds the landing whenever crew or obstacles are detected on the deck.

Computer Vision · Pose Estimation · Symbolic Reasoning · PyTorch · CUI dataset

Safety layer that holds landings until the deck is clear, cutting casualty risk.

Computer vision · autonomous driving

Neurosymbolic Crash Anticipation

Online crash-risk prediction from a single dashcam: a distilled 2.95M-parameter student streams at 408 fps, while a symbolic layer (YOLO11n + ByteTrack and monocular collision-course geometry) grounds every alarm in a tracked vehicle and issues an auditable advisory. Next: ego-motion and metric bird's-eye-view state, the first rungs toward a world model.

VideoMAE · Knowledge Distillation · Neurosymbolic · Monocular Geometry

3.05 s mean warning before impact, streaming at 408 fps from a 2.95M-param student.

code ↗

Healthcare ML · manuscript in prep

Generalizable Arrhythmia Detection

Shows how beat-wise splits leak patient identity and inflate ECG-classification accuracy, then introduces an optimal patient-wise split search for honest, generalizable evaluation on MIT-BIH.

CNN · LSTM-AE · MIT-BIH · Patient-wise eval

Exposes leakage behind "SOTA-looking" numbers.

repo ↗
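The leakage described in this card is easy to show with a toy example. The sketch below is not the project's code: the beat dictionaries, function names, and split fraction are assumptions, and it shows only a plain patient-wise split, not the optimal split search. A beat-wise shuffle puts the same patient on both sides of the split, while a patient-wise split keeps each patient entirely in train or entirely in test.

```python
import random
from collections import defaultdict


def beat_wise_split(beats, test_frac=0.2, seed=0):
    """Leaky baseline: shuffles individual beats, so one patient can land on both sides."""
    rng = random.Random(seed)
    shuffled = beats[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


def patient_wise_split(beats, test_frac=0.2, seed=0):
    """Assign whole patients to one side, so no patient appears in both sets."""
    by_patient = defaultdict(list)
    for beat in beats:
        by_patient[beat["patient"]].append(beat)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_frac))
    test_ids = set(patients[:n_test])
    train = [b for b in beats if b["patient"] not in test_ids]
    test = [b for b in beats if b["patient"] in test_ids]
    return train, test


def shared_patients(train, test):
    """Patients that appear on both sides of a split (should be empty)."""
    return {b["patient"] for b in train} & {b["patient"] for b in test}
```

A classifier evaluated on the beat-wise split can score well partly by recognizing a patient's individual waveform, which is the inflation the patient-wise evaluation removes.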

Computational biology · ongoing

Structure-Aware Antimicrobial Peptide Prediction

An ML pipeline that combines biochemical descriptors with structure-aware features from ESMFold-predicted conformations, comparing SVM, MLP, and graph neural networks across QSAR, geometric, and residue-graph representations.

ESMFold · SVM / MLP / GNN · QSAR · BioIntelligence Lab

Ongoing, exploring the activity vs. hemolysis trade-off.

Crash Anticipation demo

Shipped products · full-stack, real people use them

E-commerce · solo build ● Live

AdamsFoods Wholesale

Full-stack wholesale platform: React and Node/Express, signed-URL media on S3, JWT auth with role-guarded admin routes. Serving real customers today.

live site ↗ code ↗

Nonprofit · CTC @ UCI

Feeding Pets of the Homeless

End-to-end donation-management platform for a national nonprofit: role-based access for coordinators, donors, and admins across regional chapters.

frontend ↗ backend ↗

Internal tool

AdamsFoods Inventory

Back-office inventory system: item CRUD, search and filters, low-stock alerts, CSV export, and a 3D map of warehouse storage rooms.

code ↗

04 Stack

Tools of the trade

LLM Engineering

Anthropic API
Eval harnesses · CI gates
Structured outputs
Bounded repair loops
Prompt versioning
LLM observability

Languages

Python
TypeScript / JavaScript
C / C++
SQL

ML / AI

PyTorch
CLIP · ViT · VideoMAE
Hyperdimensional Computing
CKKS-FHE (SEAL)
ESMFold · GNNs

Full-Stack

React / TypeScript
Node / Express
FastAPI · Pydantic v2
PostgreSQL · SQLite
Firebase

Infra & Tools

AWS (S3)
Fly.io · Vercel
Git · CI
JWT Auth · OAuth
Linux · CUDA

05 News

Recent updates

Jul 2026

Loop is deployed live at loop-study.com: an LLM-powered interview-prep scheduler with deterministic validation, human approval gates, and a recordings-based eval harness. I built it for my own daily use. [Case study]

May 22, 2026

Integrating Symbolic and Neural Mechanisms for Adversarially Robust Hyperdimensional Computing was accepted to ISLPED 2026.

Apr 5, 2026

Live Spotify Stats: feel free to stalk my recent listening history and see how our music tastes match up :)

Jan 19, 2026

Optimal Hyperdimensional Representation for Learning and Cognitive Computation: third author, accepted to Frontiers in Artificial Intelligence. [Paper]

Dec 15, 2025

Started a new position at BioIntelligence Lab with Dr. Haleh Alimohamadi, working on peptide research: building AMP vs. non-AMP classifiers from geometric features and QSAR descriptors, incorporating ESMFold-predicted structures.

Nov 10, 2025

Cross-Modal Event Encoder: Bridging Image–Text Knowledge to Event Streams was accepted to WACV 2026.

Sep 23, 2025

Geometric Priors for Generalizable World Models via Vector Symbolic Architecture was accepted to the NeurReps workshop at NeurIPS 2025.

06 The person

Beyond the résumé

I'm an undergraduate researcher in Computer Science at UC Irvine ('26) and an incoming M.S. student at Columbia. I work on neurosymbolic AI, brain-inspired learning (HDC), multimodal models, and secure inference (CKKS-FHE), and I'm grateful to research with Prof. Mohsen Imani and Dr. Haleh Alimohamadi.

Current labs: BiasLab @ UCI · BioIntelligence Lab @ UCI

Hobbies: Basketball, snowboarding, music, gaming

On repeat: D'Angelo · Dijon · Mkgee

Playing: Cyberpunk 2077

Reading: Designing Machine Learning Systems by Chip Huyen

Favorite albums & live listening