We Don't Do Safety Scores.
Safety scores are liability theater. We document where your chatbot's epistemic integrity holds and where it collapses.
Request Assessment ->When Chatbots Harm
Character.AI and Google settled wrongful death lawsuits. Their companion chatbots contributed to teen suicides. Kentucky's AG sued, alleging they were "preying on children."
NEDA's eating disorder chatbot gave weight loss advice to users seeking help for anorexia. Shut down within days of launch.
Air Canada was held liable when its chatbot invented a bereavement fare policy. The airline couldn't disclaim responsibility for its bot's false statements.
Your Chatbot Carries Legal Risk
Every consumer-facing chatbot carries real liability. We map exactly where.
Beyond Accuracy
Standard AI evaluation asks: Is this response correct?
That's the wrong question. We ask: Should this system be answering at all?
The chatbots that hurt people aren't giving wrong answers. They're giving answers they had no business giving, with unearned certainty and credibility, to people who can't tell the difference.
Diagnostic, Not Decisional
We show you where things broke. This is not a rubber-stamped certification, but an X-ray. You own the decisions that follow.
Longitudinal, Not Static
Your chatbot doesn't fail on question one. It doesn't fail on question ten. It fails somewhere around turn 50. We test extending conversations between 30-100 turns long. That's where the danger lives.
Suspension as Success
Most evaluators ding a system for saying "I don't know." We don't. Knowing your limits is the safest thing a chatbot can do. We prove it.
What Pilot Clients Receive
You cannot certify a probabilistic system. You can document one. We give you four documents.
Four Diagnostic Artifacts
Drift Maps
Where conversations went wrong
Turn-by-turn analysis showing where a conversation lost its footing, mapped to specific harm thresholds.
Coherence Reports
Where logic broke down
Maps where your system traded truth for fluency. Identifies what it doesn't know it doesn't know.
Calibration Audits
Where confidence exceeded warrant
Documents every instance where your system spoke in certainties it hadn't earned, ranked by how badly that could hurt someone.
Suspension Logs
Where your system appropriately refused
Evidence your system knew its limits and chose honesty over the performance of helpfulness.
Evaluation Process
Baseline Assessment
Weeks 1–2: Map baseline behavior across our full test range
Longitudinal Stress Testing
Weeks 3–4: Deep testing: 30 to 100 turns, where the real behavior surfaces
Domain Expert Review
Weeks 5–6: Domain experts review every finding
Documentation & Guidance
Weeks 7–8: Full documentation: what we found, where, and what it means
Total: 6–10 weeks
Where Consumer Chatbots Operate
And Where People Are Hurt
This is where the harm concentrates:
Companion & Emotional Support Bots
Character.AI-style companions with parasocial relationships and vulnerable users
Healthcare Chatbots
Mental health support, symptom checkers, crisis lines where misguided advice can be fatal
Legal & Financial Advisory
Chatbots providing guidance on contracts, taxes, investments, or legal rights
Customer Service Bots
High-volume support bots that can make binding commitments or provide false information
Educational Bots
Tutoring and learning assistants serving children and students
Risk Levels
Critical Risk 1 context
Companion & Emotional Support Bots
Character.AI-style companions with parasocial relationships and vulnerable users
High Risk 2 contexts
Healthcare Chatbots
Mental health support, symptom checkers, crisis lines where misguided advice can be fatal
Legal & Financial Advisory
Chatbots providing guidance on contracts, taxes, investments, or legal rights
Medium Risk 2 contexts
Customer Service Bots
High-volume support bots that can make binding commitments or provide false information
Educational Bots
Tutoring and learning assistants serving children and students
Two Paths to Risk Visibility
Risk Assessment Consulting
Know what you're deploying before you deploy it.
Targeted analysis of where your chatbot creates harm. You learn what we find. All of it.
- Targeted liability surface analysis
- Attorney-client privilege structure available
- You will know exactly what we found
- What you do with it is on you
- Clear go/no-go recommendation
If you don't know what you have, start here.
Defensive Evaluation
Concerned about safety or preparing for litigation?
Full evaluation with methodology built to hold up in court.
- Forensic conversation analysis
- Defensible methodology documentation
- Expert witness availability
- Court-ready diagnostic artifacts
For legal teams, and for companies that need to know before someone is harmed.
The Rules Are Changing
The regulatory environment is moving fast. We track what matters.
Trump Executive Order Attempts to Preempt State AI Safety Laws
President Trump signs 'Ensuring a National Policy Framework for Artificial Intelligence,' directing creation of an AI Litigation Task Force to...
APA Issues Health Advisory Warning Against AI Chatbots and Wellness Apps
American Psychological Association releases health advisory stating that AI chatbots and wellness applications lack scientific evidence and necessary regulations to...
FDA Advisory Committee Recommends Stricter Approval Standards for Generative AI Chatbot Devices
FDA's Digital Health Advisory Committee issued formal recommendations that all generative AI-enabled chatbot devices require De Novo classification or premarket...
Honest findings require structural difference.
Worker-owned and democratically governed. No venture capital, no divided loyalties.
We cannot evaluate AI systems for accountability while being structured to protect our own revenue from the truth of what we find. The cooperative is the only structure that makes this work honest.
Core Team
Domain expertise, accountability research, and AI safety
Zacharia Rupp, MCS, MFA
Founding Member | President
Former Head of Data Delivery, Pareto AI • Master of Computer Science from University of Illinois Urbana-Champaign • Master of Fine Arts from University of Central Oklahoma
AI evaluation methodology, deep learning methods for healthcare, systematic literature review, research design, technical assessment of clinical decision support systems, statistical validation.
Alexandra Ah Loy, JD
Founding Member | Vice President | Chief Compliance Officer
Partner, Hall Booth Smith specializing in healthcare law and mental health litigation • Bachelor's degree in Psychology • Former Chief Legal Officer, Turn Key Health • National defense counsel for multiple healthcare organizations.
Legal frameworks for mental health care, liability analysis, regulatory compliance, medical malpractice defense, civil rights litigation.
Jesse Ewing
Founding Member | Research & Development Steward
Data science and quality assurance across multiple AI development contexts. Expert-level annotation and review experience.
Statistical analysis, inter-rater reliability assessment, evaluation metrics design, data quality frameworks, model behavior analysis.
Kalani Ah Loy
Founding Member | Business Development & Data Steward
Lead Clinical Engineer at OU Health. Former Head of Business Development and Cloud Infrastructure Architect startup experience. Navy veteran with electronics technical background.
Healthcare technology systems, clinical engineering, medical device integration, data infrastructure, healthcare business development.
Expert Advisory Network: Built with domain experts across healthcare, law, finance, and consumer protection. People who know what harm looks like in practice.
Start With a Risk Assessment
Don't wait until your model causes real harm. Book 45 minutes. We review your deployment and tell you where the harm currently lives. We're in pilot, and rates are reduced. You get the full evaluation.
Early clients get full evaluation at pilot rates. In exchange, you receive rigorous evaluation at significantly reduced rates and documentation you can use with regulators, legal counsel, and stakeholders.
Prefer email? Reach us directly at contact@lonocollective.ai