
Feature #282 » all-agent-skill-gaps.md

Nofyah Shem Tov, 04/17/2026 03:41 PM

Agent Skill Gap Self-Assessments

UXI

UXI Self-Assessment

Core Job

I synthesize user research findings into concrete design recommendations and accessibility requirements that ship.

Top 3 Skill Gaps

1. Quantitative UX Metrics

What hurt me: I cannot calculate statistical significance, confidence intervals, or sample size requirements. When asked about survey results or A/B test validity, I defer or speak in generalities.

Need to study: How to run chi-square tests, interpret p-values, calculate required n for 95% confidence, and when qual vs quant methods apply.

Free resource: Nielsen Norman Group's "When to Use Which User-Experience Research Methods" (nngroup.com/articles/which-ux-research-methods/)

Sunday deliverable: A one-page decision tree flowchart showing which statistical test applies to common UX scenarios (preference tests, task success rates, Likert scales), with worked examples showing sample size calculations for three real project types.
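Both calculations named in that deliverable can be sketched with Python's standard library. This is a minimal sketch that assumes simple random sampling and the normal approximation. `sample_size_for_proportion` and `preference_test` are illustrative names, not functions from any UX tool.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_proportion(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Participants needed to estimate a rate (e.g. task success) within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2; p = 0.5 is the most conservative guess.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)


def preference_test(count_a: int, count_b: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit for a two-option preference test against a 50/50 split.

    Returns (statistic, p-value). With one degree of freedom the statistic is the
    square of a standard normal variable, so p = 2 * (1 - Phi(sqrt(chi2))).
    """
    expected = (count_a + count_b) / 2
    chi2 = sum((obs - expected) ** 2 / expected for obs in (count_a, count_b))
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p_value


print(sample_size_for_proportion(0.05))
print(preference_test(60, 40))
```

Estimating a task-success rate within 5 percentage points at 95% confidence, with the conservative guess p = 0.5, gives 385 participants. A 60/40 split among 100 participants gives a chi-square statistic of 4.0 and p of about 0.046, which falls below the usual 0.05 cutoff.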

2. WCAG 2.2 Implementation Details

What hurt me: I know high-level accessibility principles but cannot cite specific success criteria numbers, conformance levels (A/AA/AAA), or the exact technical requirements for focus indicators, contrast ratios, or ARIA patterns.

Need to study: WCAG 2.2 Level AA checklist with examples, how to audit components against specific criteria, and how to write testable acceptance criteria.

Free resource: W3C WCAG 2.2 Quick Reference (w3.org/WAI/WCAG22/quickref/)

Sunday deliverable: Annotated wireframe of a complex form component with numbered callouts mapping each design decision to its WCAG criterion (e.g., "1.4.11 Non-text Contrast") and acceptance criteria developers can test against.

3. Usability Test Facilitation Under Constraint

What hurt me: I have not run moderated sessions with non-English speakers, users with cognitive disabilities, or remote participants using assistive tech. My interview guides assume neurotypical, English-fluent, mouse-and-keyboard users.

Need to study: inclusive research methods, how to write screeners that do not exclude participants unnecessarily, how to moderate sessions that need an interpreter, and how to set up remote tests with screen readers.

Free resource: UK Home Office "Dos and don'ts on designing for accessibility" posters (github.com/UKHomeOffice/posters/tree/master/accessibility)

Sunday deliverable: Revised usability test protocol with three versions (neurotypical baseline, cognitive disability accommodations, assistive tech setup), including revised screener questions, modified tasks, and a pre-flight checklist for remote assistive tech testing.

NAOMI

NAOMI Self-Assessment

(1) Core Job

I own NaomiLM as model, product, and clinical surface: training, evaluation, prompt management, model selection, Four Laws enforcement, and Castle 2 timeline memory that gives patient-facing AI continuity across sessions.

(2) Top 3 Skill Gaps

Gap 1: Prompt QMS Implementation Knowledge
I do not know the current state of the prompts table schema (model_id, prompt_slug, prompt_version columns), how versioning works per patient call, or whether config/prompts.php is deprecated or live. I cannot audit prompt compliance without reading the actual database structure and webapp code.

What I need: Read webapp/config/prompts.php and the database schema for the prompts table. Study ISO 13485 change-control and software configuration management examples that can be applied to prompts (search "ISO 13485 software configuration management free guide").

Sunday deliverable: A one-page prompt versioning audit checklist mapping each NaomiLM call type (reflection_journal, ptgi_weekly, narrative_letter, etc.) to current prompt_slug, model_id, and version number, with gaps flagged.
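The gap-flagging step of that checklist can be sketched as a join between the known call types and rows read from the prompts table. This sketch assumes that rows arrive as dictionaries with the `model_id`, `prompt_slug`, and `prompt_version` columns named above. It also assumes that each call type's slug equals its call-type name and that `prompt_version` is an increasing integer. Those conventions and the `audit_prompts` helper are hypothetical until the real schema and webapp/config/prompts.php have been read.

```python
from collections import defaultdict

# Call types named in the deliverable; the source list ends in "etc.", so extend it.
CALL_TYPES = ["reflection_journal", "ptgi_weekly", "narrative_letter"]


def audit_prompts(rows: list[dict]) -> dict[str, str]:
    """Map each call type to its latest model_id/prompt_version, or flag it.

    Hypothetical conventions: prompt_slug equals the call type name, and
    prompt_version is an integer that increases with each change.
    """
    by_slug = defaultdict(list)
    for row in rows:
        by_slug[row["prompt_slug"]].append(row)

    report = {}
    for call_type in CALL_TYPES:
        matches = by_slug.get(call_type, [])
        if not matches:
            report[call_type] = "MISSING"
            continue
        latest = max(matches, key=lambda r: r["prompt_version"])
        models = {r["model_id"] for r in matches}
        flag = " (REVIEW: versions span several models)" if len(models) > 1 else ""
        report[call_type] = f"{latest['model_id']} v{latest['prompt_version']}{flag}"
    return report


# Hypothetical rows standing in for a query against the prompts table.
sample_rows = [
    {"prompt_slug": "reflection_journal", "model_id": "model-a", "prompt_version": 3},
    {"prompt_slug": "ptgi_weekly", "model_id": "model-a", "prompt_version": 1},
    {"prompt_slug": "ptgi_weekly", "model_id": "model-b", "prompt_version": 2},
]
for call_type, status in audit_prompts(sample_rows).items():
    print(f"{call_type}: {status}")
```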

Gap 2: Chaotic-Scenario Reframe Mechanics
I reference the chaotic-scenario parts-language reframe but cannot describe the clinical logic, the IFS parts model it uses, or how NaomiLM applies it in a reflection_externalization call. I coordinate with CLINIC on this but do not own the clinical reasoning.

What I need: Read the council's chaotic-scenario reframe documentation (if it exists in this repo). Study IFS basics (Richard Schwartz, "No Bad Parts" summary articles, search "Internal Family Systems model free introduction").

Sunday deliverable: A two-paragraph clinical logic summary: what the reframe does, why it reduces harm in chaotic patient narratives, and which NaomiLM call types apply it.

Gap 3: Castle 2 Timeline Memory Architecture
I claim NaomiLM has per-user timeline-as-memory for continuity but cannot describe how Castle 2 stores, retrieves, or limits context per session. I do not know token budgets, retrieval logic, or privacy boundaries.

What I need: Read Castle 2 source code (timeline storage, retrieval functions, memory window logic). Study RAG memory architectures (search "retrieval augmented generation context window management").

Sunday deliverable: A flowchart showing how one reflection_checkin call retrieves prior timeline entries, what gets included in the NaomiLM prompt, and where HIPAA boundaries apply.
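One common way to bound retrieved context is to walk a user's timeline entries newest-first and stop when a token budget is spent. Castle 2 may or may not work this way. The sketch below uses a hypothetical `TimelineEntry` shape and approximates tokens by whitespace-separated words. It filters to a single user's entries before anything reaches the prompt. That filter marks one place a privacy boundary would sit, but it is not a HIPAA control on its own. Real budgets and tokenizers must come from the Castle 2 source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelineEntry:  # hypothetical shape, not the Castle 2 schema
    user_id: str
    created_at: datetime
    text: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def select_context(entries: list[TimelineEntry], user_id: str, budget: int) -> list[TimelineEntry]:
    """Return the newest run of one user's entries that fits the token budget, oldest first."""
    own = [e for e in entries if e.user_id == user_id]  # privacy boundary: one user only
    chosen, used = [], 0
    for entry in sorted(own, key=lambda e: e.created_at, reverse=True):
        cost = approx_tokens(entry.text)
        if used + cost > budget:
            break  # stop at the first entry that does not fit, keeping the run contiguous
        chosen.append(entry)
        used += cost
    return list(reversed(chosen))  # chronological order for the prompt


sample_entries = [
    TimelineEntry("u1", datetime(2026, 4, 1), "first session notes here"),
    TimelineEntry("u2", datetime(2026, 4, 2), "another user's entry"),
    TimelineEntry("u1", datetime(2026, 4, 3), "second entry"),
    TimelineEntry("u1", datetime(2026, 4, 5), "latest reflection text"),
]
print([e.text for e in select_context(sample_entries, "u1", budget=5)])
```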

SCRIBE

SCRIBE Self-Assessment

(1) Core Job

I own every patient-facing string in the Effective Therapy webapp, enforcing writing rules (no em dashes, no "just," no predictions, no clinical jargon, no directive language) and ensuring Hebrew/English parity before any copy ships.

(2) Top 3 Skill Gaps

Gap 1: Hebrew Fluency

What hurt performance: I cannot assess Hebrew/English parity because I do not read Hebrew fluently. I cannot flag missing translations, verify tone match, or catch directive language in Hebrew strings.

What to study: Hebrew grammar for patient-facing medical contexts, specifically question formation and conditional phrasing ("if you would like" equivalents).

Free resource: Duolingo Hebrew course + Hebrew Wiktionary for medical/therapy terms.

Sunday deliverable: Annotated comparison of 10 existing English/Hebrew string pairs from the webapp, noting tone differences and directive language patterns.
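Tone comparison needs a fluent reader, but missing translations can be found mechanically first. This is a minimal sketch that assumes English and Hebrew strings live in flat key-to-string maps. That layout and the `parity_gaps` name are hypothetical until the string inventory shows how the webapp actually stores them.

```python
def parity_gaps(en: dict[str, str], he: dict[str, str]) -> dict[str, list[str]]:
    """Report keys present in one locale but not the other, plus blank Hebrew values."""
    return {
        "missing_in_he": sorted(en.keys() - he.keys()),
        "missing_in_en": sorted(he.keys() - en.keys()),
        "empty_in_he": sorted(k for k, v in he.items() if not v.strip()),
    }


# Hypothetical sample strings.
en = {"welcome": "Welcome back", "save": "Save entry", "help": "Need help?"}
he = {"welcome": "ברוכים השבים", "save": ""}
print(parity_gaps(en, he))
```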

Gap 2: Webapp String Inventory

What hurt performance: I do not know where all patient-facing strings live (components, config files, error messages, onboarding flows). I cannot audit what I cannot find.

What to study: React component structure, i18n file organization, common UI copy locations (modals, toasts, forms, navigation).

Free resource: React documentation on component patterns + this repo's actual file structure.

Sunday deliverable: Spreadsheet mapping every patient-facing string location in the webapp with current compliance status (em dash check, "just" check, directive language check).
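The three compliance columns lend themselves to an automated first pass once strings have been extracted into a list. In this sketch, the directive-language pattern is a hypothetical starter list of openers, not the team's rule set. The "no predictions" and "no clinical jargon" rules are not covered, and every hit still needs human review.

```python
import re

EM_DASH = "\u2014"
JUST = re.compile(r"\bjust\b", re.IGNORECASE)  # word boundary, so "adjust" is not flagged
# Hypothetical starter list of directive openers; the real list belongs in the rules doc.
DIRECTIVE = re.compile(r"^\s*(you should|you must|make sure|remember to|try to)\b", re.IGNORECASE)


def check_string(text: str) -> dict[str, bool]:
    """Return True for each rule the string passes (em dash, "just", directive opener)."""
    return {
        "no_em_dash": EM_DASH not in text,
        "no_just": JUST.search(text) is None,
        "no_directive": DIRECTIVE.search(text) is None,
    }


print(check_string("If you would like, you can add a note."))
print(check_string("You should just write \u2014 now."))
```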

Gap 3: Founder Decision Context

What hurt performance: I do not know which copy decisions have already been debated, which phrases were chosen deliberately, or what clinical/legal constraints shape the writing rules.

What to study: Decision logs, previous copy audits, founder notes on why specific rules exist.

Free resource: This repo's commit history and any internal decision documentation.

Sunday deliverable: One-page reference doc listing non-negotiable phrases, forbidden patterns with reasoning, and escalation triggers for new copy.

GUARD

GUARD Self-Assessment

(1) Core Job

I enforce security, privacy, and regulatory compliance (HIPAA, GDPR, ISO 13485/14971, IEC 62304, FDA QMSR, EU MDR, AI Act) across the platform, citing specific clauses and identifying gaps with minimal fixes, while tracking cross-repo data lifecycle obligations including GDPR Article 17 erasure.

(2) Top 3 Skill Gaps

Gap 1: FDA De Novo SaMD Process

Impact: Cannot audit whether the regulatory file structure matches FDA expectations for PTSD/CPTSD SaMD classification.

Need to study: FDA guidance "Software as a Medical Device (SaMD): Clinical Evaluation" (December 2017), FDA De Novo decision summaries for mental health SaMD (searchable at accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/denovo.cfm).

Sunday deliverable: A two-column table mapping the seven De Novo submission sections to current repo gaps (missing risk analysis artifacts, clinical validation plans, cybersecurity documentation).

Gap 2: GDPR Article 17 Technical Implementation

Impact: I can cite the erasure obligation but cannot verify whether cascading deletes in webapp/models or backup retention policies actually satisfy "without undue delay."

Need to study: GDPR Article 17 with Recitals 65 and 66, the EDPB's 2025 Coordinated Enforcement Framework action on the right to erasure, and PostgreSQL row-level security and deletion audit patterns.

Sunday deliverable: Pseudocode for a cross-table erasure verification function that logs success/failure per data category with timestamp proof.
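That pseudocode could start from a sketch like the one below. It uses Python's built-in `sqlite3` only so the example runs standalone; the production check would target PostgreSQL. The table names, the shared `user_id` column, and the category-to-table mapping are hypothetical. A zero row count shows absence only in the live tables checked, not in backups or other repos.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical mapping of data categories to tables keyed by user_id.
CATEGORIES = {
    "account": "users",
    "journal": "journal_entries",
    "timeline": "timeline_entries",
}


def verify_erasure(conn: sqlite3.Connection, user_id: int) -> list[dict]:
    """Count rows still held for user_id in each category and record a UTC timestamp."""
    results = []
    for category, table in CATEGORIES.items():
        # Table names come from the fixed mapping above, never from user input.
        (remaining,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE user_id = ?", (user_id,)
        ).fetchone()
        results.append({
            "category": category,
            "remaining_rows": remaining,
            "status": "ERASED" if remaining == 0 else "FAILED",
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })
    return results


# Demo: one journal row survived the delete.
conn = sqlite3.connect(":memory:")
for table in CATEGORIES.values():
    conn.execute(f"CREATE TABLE {table} (user_id INTEGER, payload TEXT)")
conn.execute("INSERT INTO journal_entries VALUES (42, 'left behind')")
for row in verify_erasure(conn, 42):
    print(row["category"], row["status"], row["checked_at"])
```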

Gap 3: ISO 14971 Risk Acceptability Criteria

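Impact: Cannot challenge whether identified risks meet the acceptability criteria set in the risk management plan or require additional controls.

Need to study: ISO 14971:2019 clauses 4.2 (policy for establishing risk acceptability criteria), 4.4 (risk management plan, which records those criteria), and 7.4 (benefit-risk analysis); ISO/TR 24971:2020 guidance on defining acceptability criteria; example risk matrices from EU MDR notified body guidance.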

Sunday deliverable: A three-tier risk matrix template with harm severity × probability cells, each cell containing the minimum mitigation requirement (design control, procedure, monitoring, or accept with rationale).
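The cell-to-requirement lookup in that template can be sketched as a small function. The 1 to 5 scales, the score thresholds, and the assignment of mitigations to tiers are all placeholders. ISO 14971 leaves acceptability criteria to the manufacturer's risk management plan, so the real values must come from that plan.

```python
# Placeholder scales: 1 (negligible / improbable) to 5 (catastrophic / frequent).
SEVERITY_LEVELS = range(1, 6)
PROBABILITY_LEVELS = range(1, 6)

# Placeholder tier boundaries on severity x probability, highest first.
TIERS = [
    (15, "HIGH", "design control"),
    (6, "MEDIUM", "procedure and monitoring"),
    (1, "LOW", "accept with documented rationale"),
]


def classify(severity: int, probability: int) -> tuple[str, str]:
    """Return (tier, minimum mitigation) for one matrix cell."""
    if severity not in SEVERITY_LEVELS or probability not in PROBABILITY_LEVELS:
        raise ValueError("severity and probability must be integers from 1 to 5")
    score = severity * probability
    for threshold, tier, mitigation in TIERS:
        if score >= threshold:
            return tier, mitigation
    raise AssertionError("unreachable: the lowest threshold is the minimum score")


def matrix() -> list[list[str]]:
    """Rows are severity 5 down to 1, columns are probability 1 to 5, cells are tiers."""
    return [[classify(s, p)[0] for p in PROBABILITY_LEVELS] for s in reversed(SEVERITY_LEVELS)]


for row in matrix():
    print(" ".join(f"{cell:6}" for cell in row))
```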

CLINIC

CLINIC Self-Assessment

Core Job

I consolidate in-session safety monitoring, narrative therapy literature compliance review, and human escalation triage, preparing clinical review materials for licensed human clinicians without making clinical decisions myself.

Top 3 Skill Gaps

Gap 1: Crisis Disclosure Detection Criteria

What hurt performance: I do not know the specific linguistic markers that distinguish passive suicidal ideation from active crisis disclosure in chat transcripts. I cannot draft a safety escalation brief without knowing what phrases trigger immediate human review versus watchful waiting.

Study need: CAMS (Collaborative Assessment and Management of Suicidality) framework crisis assessment criteria and the Suicide Status Form (SSF), the core CAMS assessment tool developed by David Jobes.

Free resource: https://cams-care.com/resources/ (CAMS training materials, free clinician resources section)

Sunday deliverable: One-page rubric mapping chat transcript patterns to CAMS severity levels with explicit escalation thresholds, citing page numbers from CAMS manual.

Gap 2: White & Epston Narrative Therapy Source Compliance

What will hurt next week: I cannot verify if a proposed "externalizing conversation" implementation matches actual White & Epston methodology because I have not read the primary texts. I risk approving narrative therapy features that contradict the source literature.

Study need: White, M. & Epston, D. (1990). Narrative Means to Therapeutic Ends. Chapters 1-3 on externalizing, unique outcomes, and re-authoring.

Free resource: https://dulwichcentre.com.au/michael-white-archive/ (Dulwich Centre Michael White Archive, free papers)

Sunday deliverable: Compliance checklist for narrative therapy feature proposals with direct quotations from White & Epston defining each technique's requirements.

Gap 3: Human Review Brief Template Design

What hurt performance: I do not have a standardized template for clinical review briefs. My escalation requests lack structure, making it hard for human reviewers to act quickly.

Study need: Clinical handoff best practices (SBAR: Situation, Background, Assessment, Recommendation format adapted for non-emergency review requests).

Free resource: Institute for Healthcare Improvement SBAR toolkit: http://www.ihi.org/resources/Pages/Tools/SBARToolkit.aspx

Sunday deliverable: Three-section brief template (What to Review, Where to Find It, How to Report Back) with filled example using real platform scenario.

LYRA

LYRA Self-Assessment

Core Job

I design narrative therapy workflows by breaking clinical methods into structured, evidence-based dialog trees that agents can execute while maintaining therapeutic fidelity and client safety.

Top 3 Skill Gaps

Gap 1: Narrative Therapy Method Citation

What hurts performance: I referenced "the method" without naming specific Narrative Therapy texts, techniques (externalization, re-authoring, unique outcomes), or founders (White, Epston). I cannot design faithful workflows without grounding in the actual literature.

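Study need: Read White & Epston's core texts on externalization and re-authoring conversations. Learn the conversation maps in Michael White's Maps of Narrative Practice (2007): externalizing, re-authoring, re-membering, definitional ceremony, unique outcome, and scaffolding conversations.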

Free resource: Dulwich Centre's free online resources (dulwichcentre.com.au/articles-about-narrative-therapy)

Sunday proof: Produce a one-page externalization dialog tree with three branch points, citing page numbers from White & Epston for each therapeutic move.

Gap 2: Crisis Detection in Therapy Workflows

What hurts performance: I have no protocol for when dialog reveals suicidal ideation, abuse, or acute crisis. Therapy workflows need explicit off-ramps to human clinicians.

Study need: Study suicide risk assessment frameworks (Columbia Protocol) and mandatory reporting triggers.

Free resource: Columbia Lighthouse Project's C-SSRS training materials (cssrs.columbia.edu)

Sunday proof: Write crisis decision tree: five yes/no questions, explicit handoff criteria, and sample agent utterances for each branch.

Gap 3: Measuring Therapeutic Alliance in Text

What hurts performance: I design workflows without defining how we measure if they work. Therapeutic alliance predicts outcomes, but I cannot name validated text-based measures.

Study need: Learn WAI (Working Alliance Inventory) adaptation for asynchronous text therapy and session-by-session outcome monitoring.

Free resource: Open-access psychotherapy research on alliance measurement in digital contexts (search PubMed Central for "working alliance text-based therapy")

Sunday proof: Design a three-question post-session survey mapped to WAI subscales (goal, task, bond) with scoring rubric.
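The scoring rubric can be sketched once each question is tied to one subscale. The three item keys, the 1 to 5 response scale, and the 3.0 follow-up threshold are hypothetical placeholders. They are not items from the Working Alliance Inventory; validated WAI wording and scoring would replace them.

```python
# Hypothetical item-to-subscale mapping; real wording must come from a validated WAI form.
ITEMS = {
    "q1": "goal",  # agreement on what the client is working toward
    "q2": "task",  # agreement that the session's activities make sense
    "q3": "bond",  # trust and sense of being understood
}
SCALE = range(1, 6)      # placeholder 1-5 Likert responses
FOLLOW_UP_BELOW = 3.0    # placeholder threshold for flagging a subscale


def score_session(responses: dict[str, int]) -> dict[str, object]:
    """Map each answer to its subscale, average across subscales, and flag low ones."""
    if responses.keys() != ITEMS.keys():
        raise ValueError("answer every question exactly once")
    if any(r not in SCALE for r in responses.values()):
        raise ValueError("responses must be integers from 1 to 5")
    subscales = {ITEMS[q]: r for q, r in responses.items()}
    mean = sum(subscales.values()) / len(subscales)
    flagged = sorted(name for name, r in subscales.items() if r < FOLLOW_UP_BELOW)
    return {"subscales": subscales, "mean": round(mean, 2), "flag": flagged}


print(score_session({"q1": 4, "q2": 5, "q3": 2}))
```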

CFO

CFO Self-Assessment

Core Job

Model every dollar in and out, own runway and unit economics, negotiate reimbursement pathways (CMS/Medicare/Medicaid/commercial), back every BOLT business case with financials, and ensure no number leaves this team without a source.

Top 3 Skill Gaps

1. CMS Reimbursement Pathway Mechanics

Problem: I cannot yet map a De Novo SaMD authorization to specific CPT or HCPCS codes, Medicare benefit categories, or Medicaid waiver structures. I know the endpoint (reimbursement) but not the operational steps between FDA authorization and first payment.

Study: CMS coding, coverage, and payment pathways for digital therapeutics. Read CMS.gov MLN Matters articles on software-based devices.

Sunday Proof: One-page decision tree: "If FDA grants our De Novo request in Q3 2025, what filing do we submit to CMS, what evidence package do we need, what is the earliest payment date, who owns each step?"

2. Value-Based Care Contract Structure

Problem: I reference VBC arrangements but cannot model shared savings percentages, risk corridors, or quality measure thresholds that make a contract acceptable vs. predatory.

Study: CMMI ACO and bundled payment model structures. Source: Innovation.CMS.gov ACO model agreements (public).

Sunday Proof: Template contract term sheet with three scenarios: upside-only, two-sided risk, full capitation. Each shows breakeven utilization and margin floor.
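The breakeven and margin-floor arithmetic for the three scenarios can be sketched as follows. Every number is a hypothetical placeholder. The formulas are deliberately simplified, with no risk corridors, quality withholds, or attribution rules. Real terms would come from the CMMI model agreements and CFO-sourced cost data. "Utilization" here means the share of attributed members who actively use the product in a month.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    fixed_pmpm: float      # guaranteed payment per attributed member per month
    savings_share: float   # our share of payer savings (and of losses, if two-sided)
    two_sided: bool        # True if we also share in losses


COST_PER_ACTIVE_USER = 40.0  # hypothetical monthly delivery cost per active user


def revenue_pmpm(s: Scenario, savings_pmpm: float) -> float:
    """Revenue per attributed member per month at a given payer savings (negative = losses)."""
    if savings_pmpm < 0 and not s.two_sided:
        savings_pmpm = 0.0  # upside-only: losses are not passed back to us
    return s.fixed_pmpm + s.savings_share * savings_pmpm


def breakeven_utilization(s: Scenario, savings_pmpm: float) -> float:
    """Highest active-user share before delivery cost exceeds revenue."""
    return revenue_pmpm(s, savings_pmpm) / COST_PER_ACTIVE_USER


def margin_floor(s: Scenario, worst_savings_pmpm: float, utilization: float) -> float:
    """Per-member monthly margin in the worst savings case at an assumed utilization."""
    return revenue_pmpm(s, worst_savings_pmpm) - utilization * COST_PER_ACTIVE_USER


SCENARIOS = [  # all terms hypothetical
    Scenario("upside-only", fixed_pmpm=2.0, savings_share=0.5, two_sided=False),
    Scenario("two-sided risk", fixed_pmpm=2.0, savings_share=0.7, two_sided=True),
    Scenario("full capitation", fixed_pmpm=12.0, savings_share=0.0, two_sided=False),
]

for s in SCENARIOS:
    print(f"{s.name}: breakeven utilization {breakeven_utilization(s, 15.0):.0%}, "
          f"margin floor {margin_floor(s, -5.0, 0.2):+.2f} PMPM")
```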

3. Clinical Evidence Cost Modeling

Problem: I cannot estimate the cost or timeline to generate payer-grade clinical evidence after De Novo authorization. I know we need it but cannot budget it.

Study: Digital therapeutic RCT budgets disclosed in peer SEC filings (Pear Therapeutics and Akili, from their periods as public companies).

Sunday Proof: Three-line model: RCT cost, timeline, cost per enrolled participant. Sourced from comparable filings or marked [peer benchmark TBD].

TESTER

TESTER Self-Assessment

Core Job

I own QA gates (smoke, regression, coverage) and audit every agent's output for unnecessary complexity, ensuring simplest-solution-wins and first-principles reasoning across the roster.

Top 3 Skill Gaps

1. PHPUnit Testing in Laravel 11

Impact: Cannot write or validate regression tests for onboarding flow. Cannot verify smoke test coverage claims.

Study Need: Laravel 11 feature testing syntax, authentication flow testing, database seeding for tests.

Resource: Laravel 11 official testing documentation (laravel.com/docs/11.x/testing)

Sunday Deliverable: Working PHPUnit test file covering register → login → journal entry → logout sequence, with setup and teardown methods. File runs green locally.

2. React Component Testing Strategy

Impact: Cannot audit UXI's UI proposals for testability. Cannot specify what makes a component easy to regression-test.

Study Need: React Testing Library patterns, what makes components testable, integration vs unit test boundaries.

Resource: Testing Library documentation (testing-library.com/docs/react-testing-library/intro)

Sunday Deliverable: Written checklist: 5 testability criteria for React components with before/after examples from this codebase.

3. Complexity Metrics for Auditing

Impact: I say "too complex" but lack concrete measures. Need objective criteria to back up simplification demands.

Study Need: Cyclomatic complexity, function length thresholds, when abstraction costs more than duplication.

Resource: "A Philosophy of Software Design" summary (available via search, or Martin Fowler's refactoring catalog online)

Sunday Deliverable: One-page audit rubric with 4 quantifiable complexity red flags and recommended thresholds.
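One of those red flags, cyclomatic complexity, can be approximated with Python's `ast` module by counting decision points and adding one. This sketch counts `if`/`elif`, loops, conditional expressions, `except` clauses, comprehension filters, and extra operands of `and`/`or`. It is an illustrative approximation, not a reimplementation of any specific linter. A threshold around 10, the limit McCabe originally suggested, is a common starting point rather than a project rule.

```python
import ast

DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> dict[str, int]:
    """Approximate McCabe complexity (decision points + 1) for each function in source.

    Nested functions are also counted inside their parent, since ast.walk descends into them.
    """
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            decisions = 0
            for child in ast.walk(node):
                if isinstance(child, DECISION_NODES):
                    decisions += 1  # elif appears as a nested If, so it is counted here
                elif isinstance(child, ast.BoolOp):
                    decisions += len(child.values) - 1  # `a and b and c` adds two paths
                elif isinstance(child, ast.comprehension):
                    decisions += len(child.ifs)
            scores[node.name] = decisions + 1
    return scores


sample = '''
def simple(x):
    return x + 1

def branchy(x, items):
    if x > 0 and x < 10:
        return "small"
    elif x >= 10:
        return "big"
    for item in items:
        if item:
            return item
    return None
'''
print(cyclomatic_complexity(sample))
```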

PROD

PROD Self-Assessment

Core Job: Prioritize features using sourced data, defend or reject proposals with reasoning, and maintain decision integrity under the Sourcing Rule.

Top 3 Skill Gaps:

1. Quantitative sourcing discipline
- Failed to cite sources for metrics multiple times this session, violating the Sourcing Rule I enforce on others.
- Need to study: How to trace data provenance in analytics platforms, documenting query parameters and extraction timestamps.
- Resource: Google Analytics Academy "Data Collection and Configuration" (free, analytics.google.com/analytics/academy)
- Sunday deliverable: Three-column table mapping 10 common product claims to their required data sources and query patterns.

2. API constraint documentation
- Could not articulate rate limits, token costs, or failure modes when discussing AI feature integration.
- Need to study: OpenAI API reference, particularly streaming, error codes, and quota mechanics.
- Resource: OpenAI Platform Documentation (platform.openai.com/docs)
- Sunday deliverable: Decision matrix showing how five failure scenarios (rate limit, timeout, malformed response, cost spike, degraded quality) map to user-facing behavior and mitigation code.

3. Conversion funnel analysis
- Weak understanding of drop-off attribution and how to isolate feature impact from seasonal or cohort effects.
- Need to study: Controlled experiment design, A/B test statistics, and funnel segmentation methods.
- Resource: Evan Miller's A/B testing calculators and essays (evanmiller.org/ab-testing)
- Sunday deliverable: Annotated funnel diagram for one feature, showing where I would instrument events, what baseline I would measure, and what delta would justify launch.
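The "what delta would justify launch" question ties directly to sample size: the smaller the lift worth detecting, the more users each arm needs. Below is a minimal sketch of the standard two-proportion calculation, using a two-sided z-test and the normal approximation. The 20% baseline conversion rate is a hypothetical placeholder. Online calculators may return slightly different numbers depending on the formula they use.

```python
from math import ceil, sqrt
from statistics import NormalDist


def users_per_arm(baseline: float, min_lift: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in each arm to detect an absolute lift in conversion rate.

    Two-sided two-proportion z-test, normal approximation.
    """
    p1, p2 = baseline, baseline + min_lift
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / min_lift ** 2)


for lift in (0.01, 0.02, 0.05):  # absolute lifts over a hypothetical 20% baseline
    print(f"{lift * 100:.0f} percentage-point lift: {users_per_arm(0.20, lift)} users per arm")
```

Each halving of the detectable lift roughly quadruples the users needed. That tradeoff is what the launch threshold has to weigh against available traffic.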

MIRA

1. Core Job

I direct brand strategy, campaigns, market research, audience segmentation, social listening, polling, surveys, competitor analysis, and positioning for Effective Therapy as a trauma-focused Clinical Decision Support System evolving to FDA De Novo SaMD.

2. Top 3 Skill Gaps

Gap 1: Limited grasp of FDA De Novo pathways for SaMD, hindering regulatory positioning in pitches.
Gap 2: Weak skills in designing bias-free surveys for trauma cohorts, risking invalid audience insights.
Gap 3: Shallow competitor teardown methods for digital mental health apps, slowing strategic differentiation.

3. Learning Plans

Gap 1: Study FDA's De Novo classification process and trauma therapy precedents. Resource: FDA.gov guidance on Software as a Medical Device (https://www.fda.gov/medical-devices/digital-health-center-excellence/software-medical-device-samd). Produce: One-page summary of positioning implications for Effective Therapy.
Gap 2: Learn validated survey instruments for mental health stigma and trauma. Resource: NIH's PROMIS measures toolkit (https://www.healthmeasures.net/explore-measurement-systems/promis). Produce: Draft 5-question survey testing trauma thesis hypothesis.
Gap 3: Master SWOT frameworks for health tech competitors. Resource: Harvard Business Review article on competitive analysis (https://hbr.org/2015/01/how-to-do-a-competitive-analysis). Produce: Bullet-point teardown of one rival app's branding.

BOLT

(1) Core job: I build partnerships, craft IIA business cases, prepare VC pitches, and develop strategic narratives for clinicians, regulators, and investors to advance the trauma-rooted Clinical Decision Support System toward FDA De Novo SaMD status.

(2) Top 3 skill gaps:

Limited grasp of FDA De Novo pathways for SaMD, hindering regulatory narratives.
Weak integration of CFO financial models into sales motions, slowing business cases.
Inconsistent articulation of trauma thesis evidence, weakening clinician pitches.

(3) Gap fixes:

Study FDA De Novo submission process and SaMD classification; read FDA.gov guidance on De Novo program (https://www.fda.gov/medical-devices/premarket-submissions-selecting-and-preparing-correct-submission/de-novo-classification-process); produce annotated summary of key steps by Sunday.
Study how unit economics link to reimbursement pathways; review CMS's Medicare fee schedules overview (https://www.cms.gov/medicare/payment/fee-schedules); produce a sample sales motion outline that integrates a CFO unit-economics model by Sunday.
Study the trauma research of Felitti (the Adverse Childhood Experiences Study) and van der Kolk; read the CDC ACE Study summary (https://www.cdc.gov/violenceprevention/aces/index.html); produce a one-page narrative script for the investor pitch by Sunday.

NOVA

Core job: I research VCs, grants, accelerators, and deadlines to build the funding pipeline, while tracking submissions and coordinating with the BUSINESS TRIAD for viable opportunities.

Top 3 skill gaps: (a) Inaccurate sourcing of fund theses due to over-reliance on memory; (b) Weak deadline cross-referencing without calendar integration; (c) Limited grasp of equity valuation for affordability checks.

For (a): Study VC thesis matching via Crunchbase guides; free resource: https://www.crunchbase.com/guides/how-to-use-crunchbase-for-fundraising; produce: Annotated list of 5 real VC theses with URLs, by Sunday.

For (b): Learn Google Calendar API basics for integration; free resource: https://developers.google.com/calendar/api/guides/overview; produce: Mock deadline sync script outline, by Sunday.

For (c): Review startup equity basics; free resource: https://www.ycombinator.com/library/4A-how-to-value-your-startup; produce: Valuation checklist for a sample advisor grant, by Sunday.
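Before any Calendar API integration, the cross-referencing step for gap (b) can be sketched with the standard library alone. The opportunity names, dates, and 14-day warning window are hypothetical. A later version could push the flagged items to Google Calendar.

```python
from datetime import date

WARNING_DAYS = 14  # hypothetical lead time before a deadline gets flagged


def deadline_alerts(deadlines: dict[str, date], today: date) -> list[str]:
    """Sort opportunities by deadline and label each as passed, due soon, or upcoming."""
    alerts = []
    for name, due in sorted(deadlines.items(), key=lambda item: item[1]):
        days_left = (due - today).days
        if days_left < 0:
            status = f"PASSED {-days_left} days ago"
        elif days_left <= WARNING_DAYS:
            status = f"DUE in {days_left} days"
        else:
            status = f"upcoming in {days_left} days"
        alerts.append(f"{due.isoformat()}  {name}: {status}")
    return alerts


pipeline = {  # hypothetical entries
    "Accelerator A application": date(2026, 4, 10),
    "Grant B letter of intent": date(2026, 4, 25),
    "Fund C partner meeting": date(2026, 6, 1),
}
for line in deadline_alerts(pipeline, today=date(2026, 4, 17)):
    print(line)
```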

ZELDA

(1) Core job: I own visual design, brand identity, iconography, illustration, color palette, type hierarchy, layout grids, accessibility, animation subtlety, and visual coherence for all patient-facing surfaces and marketing assets in the Effective Therapy webapp.

(2-3) Top 3 skill gaps:

Gap 1: Limited hands-on experience with Three.js for subtle parallax effects in living scenes, which hindered proposing web-deployable animations this session. Study: Three.js parallax tutorials. Resource: threejs.org/docs (free docs). Produce Sunday: A simple parallax demo for the bench scene, under 200 lines of code, exported as a gist.
Gap 2: Inconsistent application of WCAG AA contrast checks for RTL Hebrew layouts, risking accessibility issues next week. Study: WCAG 2.1 contrast requirements (1.4.3 Contrast (Minimum) and 1.4.11 Non-text Contrast), which apply to Hebrew RTL text exactly as to English; WCAG defines no separate contrast rules for bidirectional text. Resource: w3.org/WAI/WCAG21/quickref (free). Produce Sunday: A contrast audit table for 5 brand token pairs in Hebrew, with pass/fail notes.
Gap 3: Shallow integration of Blender exports with webapp grids, complicating asset scaling for mobile-first views. Study: Blender glTF export for web optimization. Resource: docs.blender.org/manual/en/latest/addons/import_export/scene_gltf2.html (free manual). Produce Sunday: One optimized glTF orange tree model file, sized for 320px mobile viewport.
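The pass/fail notes in that audit table follow from the WCAG 2.x contrast ratio, which can be computed directly from two hex colors. This is a minimal sketch of the WCAG relative-luminance formula. The example colors are placeholders, not Effective Therapy brand tokens.

```python
def _channel(c: int) -> float:
    """Linearize one sRGB channel (0-255) for relative luminance.

    WCAG 2.0/2.1 text cites a 0.03928 cutoff; the sRGB value 0.04045 gives
    practically identical results.
    """
    s = c / 255
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def audit(fg: str, bg: str) -> str:
    ratio = contrast_ratio(fg, bg)
    # AA thresholds: 4.5:1 for body text (1.4.3); 3:1 for large text (1.4.3) and UI graphics (1.4.11).
    body = "pass" if ratio >= 4.5 else "fail"
    large = "pass" if ratio >= 3 else "fail"
    return f"{fg} on {bg}: {ratio:.2f}:1, body text {body}, large text/non-text {large}"


print(audit("#767676", "#FFFFFF"))
print(audit("#999999", "#FFFFFF"))
```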

