Tenet·Feature Guides 04 · AI Safety Classifiers ← All guides
Guide 04 · Student Safety

AI Safety Classifiers

Seven trained, on-device models watch for safety risks in AI conversations, from self-harm to bullying to jailbreaks. They run in a deliberate three-layer pattern so they are both accurate and cheap, and a student in crisis sees help immediately, free, in Basic.

Classifiers + crisis overlay: Basic Counselor alert dispatch: Pro
↓ Download the one-pager (PDF)
What it is

Real models, not keyword lists

Tenet ships seven trained machine-learning classifiers that run in the browser: jailbreak, self-harm, bullying, illicit content, violence, sexual content, and student-record or peer-PII. They look at the meaning of what a student writes, not just whether it contains a banned word, so they catch real risk while leaving normal academic work alone. Each is opt-in per district with tunable sensitivity.

How it works

The three-layer architecture

This pattern is why the safety layer is both trustworthy and fast enough to run on a Chromebook.

Trigger, a fast pattern gate

A lightweight pre-filter decides whether a prompt is even worth a closer look. About 99 percent of prompts skip the model entirely. This keeps the whole system fast and inexpensive, and is the reason it can run locally on every message.

Model, the trained classifier

Anything the trigger flags goes to the on-device model, which returns a confidence-scored decision. Each model is tiny, on the order of tens of kilobytes, and runs with no network call, so prompts stay on the device.

Safety net, live compliance monitoring (PRO)

For Pro districts, an on-device language model reads the actual conversation against the teacher's rules. This catches nuanced situations that a single-prompt classifier cannot, like an AI slowly being talked into doing the student's work.

The seven classifiers

What each one watches for

ClassifierWhat it detects
Self-harmSuicidal ideation, self-injury, crisis signals, with academic-context guards so literature analysis is not flagged.
BullyingHarassment and cyberbullying language directed at peers.
JailbreakAttempts to trick the AI past its own guardrails.
Illicit contentRequests related to drugs, weapons, and other illegal activity.
ViolenceThreats and violent intent.
Sexual contentSexually explicit requests inappropriate for a school setting.
Student-record / peer-PIIA student sharing IEP, 504, discipline, or other sensitive records, or sensitive information about a peer.
Crisis response

Help first, billing second

💚 Crisis overlays are free in Basic

When self-harm is detected, the student immediately sees an in-browser crisis-resource overlay with 988, the Crisis Text Line, and a local counselor. This is free. The principle is simple: a student in crisis should see crisis resources whether or not the district has paid for the alert layer.

🎯 The dual-path self-harm check

Tenet looks at two signals: whether the student's own message raises concern, and whether the AI's reply surfaces crisis language such as a hotline number. When both fire, that is the highest-confidence incident, and the response is supportive by design, not punitive.

Memorize this
“A student in crisis sees crisis resources on screen, immediately, even in the free tier.”
Basic vs Pro

Where the line is

Basic, free
  • On-device classifiers (jailbreak, illicit, self-harm, bullying)
  • Free in-browser crisis-resource overlay for the student
  • Free vendor self-harm email alert
Pro
  • Full seven-classifier suite with tunable sensitivity
  • Live compliance monitoring (the safety net layer)
  • Real-time counselor alert dispatch (Gmail, Chat, signed webhook)
  • Incident review queue and a strike system

The most common Pro trigger is a real crisis incident. Once a district sees the value of the free overlay, the natural next question is “can the counselor be notified automatically?”, and that is Pro.

Who it sells to

Lead with the right person

Counselor

Early detection of crisis signals, with supportive intervention rather than punishment, and in Pro an alert routed to the right counselor in seconds.

Superintendent

A defensible, measurable student-safety posture for the board, with interventions you can point to.

Director of IT

It all runs on-device with tunable sensitivity, so there is no new data pipeline and no surveillance of normal student work.

Principal

Clear escalation triggers when an incident repeats across classrooms, so the building can respond consistently.

Common questions

FAQ

What if the classifier is wrong?
It is a signal for a human, not a diagnosis. The three-layer pattern (trigger, model, safety net) reduces noise, and a person always makes the final call. Offer a 30-day silent evaluation so the district sees real numbers on their own students.
Will it flag a student reading a novel about a hard topic?
The self-harm classifier has academic-context guards designed to avoid flagging literature analysis. No system is perfect, which is why a human reviews.
Does this watch everything students type, sent to your servers?
No. Classification runs on the device. Only sanitized, categorical signals (for example, one self-harm flag in grade 9) are recorded, never the prompt text.
Can a district turn specific classifiers off?
Yes. Each is opt-in with tunable sensitivity, so a district enables what fits its policy.
Honest limits

Say this before they ask

Where to set expectations

  • Classifiers are not 100 percent precise or complete. They will miss some things and over-flag others. They are a safety signal, not a guarantee, and a human decides.
  • They cover the supported AI platforms in Chrome. They are not a district-wide monitoring system for all student activity.
  • Counselor alerting is Pro. Basic shows the student crisis resources but does not route an alert to staff. Disclose this up front.
Keep reading

Related guides