How workthin's Protection Layer Keeps Your Secrets Safe

A deep dive into the 3-stage secret detection pipeline: regex, entropy, and AI classification.

When teams capture knowledge from real work, the raw material often contains sensitive data — API keys pasted into conversations, database connection strings from terminal output, tokens embedded in configuration snippets. workthin's Protection Layer ensures none of that ends up in your knowledge library.

Three stages

The pipeline runs before any content is stored.

Stage 1: Pattern matching. A regex scanner checks incoming text against 20+ known secret formats: AWS access keys, GitHub tokens, Stripe keys, JWTs, private key headers, and more. This catches the obvious cases with near-zero latency.
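A first-stage scanner of this kind can be sketched in a few lines. The pattern names and regexes below are illustrative stand-ins, not workthin's actual rule set:

```python
import re

# Hypothetical subset of the built-in formats; the real scanner
# checks 20+ patterns.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_secret_key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_patterns(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) for every known-format hit."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group()))
    return hits
```

Because every pattern is a precompiled regex over a single pass of the text, this stage adds essentially no latency.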

Stage 2: Entropy analysis. A Shannon entropy pass identifies high-entropy strings that don't match any known pattern but statistically resemble secrets. This catches custom tokens and one-off credentials that regex alone would miss.
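The entropy pass boils down to computing Shannon entropy per token and flagging long, high-entropy strings. The threshold and minimum length below are assumed values for the sketch, not workthin's tuned parameters:

```python
import math

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character over the string's symbol distribution."""
    if not s:
        return 0.0
    counts: dict[str, int] = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_secret(token: str, threshold: float = 4.0, min_len: int = 20) -> bool:
    # Assumed cutoff: random base64-style tokens usually exceed ~4 bits/char,
    # while English words sit around 2-3.
    return len(token) >= min_len and shannon_entropy(token) >= threshold
```

A repeated character scores 0 bits per character, while a freshly generated credential scores near the maximum for its alphabet, which is what lets this stage catch custom token formats no regex knows about.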

Stage 3: AI classification. For ambiguous cases — a string that could be a hash or a legitimate identifier — an AI model evaluates the surrounding context. It considers where the string appears, what words surround it, and whether the pattern matches safe formats like commit SHAs or UUIDs.
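Part of that contextual reasoning can be pre-computed cheaply before any model call: known-safe shapes like commit SHAs and UUIDs are ruled out structurally, and nearby credential vocabulary raises suspicion. The helper below is a hypothetical sketch of that pre-check, with the genuinely ambiguous remainder escalated to the model:

```python
import re

# Safe formats a classifier can rule out without a model call.
COMMIT_SHA = re.compile(r"[0-9a-f]{40}")
UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I
)

def classify_ambiguous(token: str, context: str) -> str:
    """Return 'safe', 'secret', or 'uncertain' for a high-entropy token."""
    if COMMIT_SHA.fullmatch(token) or UUID.fullmatch(token):
        return "safe"
    # Crude contextual signal: credential-adjacent words near the token.
    suspicious = {"key", "token", "secret", "password", "credential", "bearer"}
    window = context.lower()
    if any(word in window for word in suspicious):
        return "secret"
    return "uncertain"  # would be escalated to the AI model
```

Only the "uncertain" bucket needs the expensive call, which keeps the pipeline fast for the common cases.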

What happens when a secret is detected

Detected secrets are replaced with [MASKED:pattern_name] in the stored knowledge. The original value is never persisted. This means you can freely record knowledge from real debugging sessions without worrying about leaking credentials.
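The replacement step itself is straightforward; a minimal sketch, assuming detections arrive as (pattern_name, matched_value) pairs from the earlier stages:

```python
def mask_secrets(text: str, detections: list[tuple[str, str]]) -> str:
    """Replace each detected value with a [MASKED:pattern_name] placeholder.

    The original secret value exists only in this transient input and is
    never written to storage.
    """
    for pattern_name, value in detections:
        text = text.replace(value, f"[MASKED:{pattern_name}]")
    return text
```

The stored entry keeps the pattern name, so you can still tell from the placeholder what kind of secret was present, without the value itself.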

Custom rules

On the Pro plan, you can add project-specific detection rules. For example, if your organization uses a custom token format (MYCO_xxxxx), you can add a regex rule that catches it alongside the built-in patterns.

Rules support two actions: mask (replace with [MASKED:pattern_name]) and block (reject the entire knowledge entry). A ReDoS guard rejects patterns with nested quantifiers or excessive length.
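A rule validator along these lines could enforce both constraints at registration time. The nested-quantifier regex and length cap here are assumptions for illustration, not workthin's actual guard:

```python
import re

MAX_PATTERN_LENGTH = 200
# Crude detector for shapes like (a+)+ or (a*)*, the classic
# catastrophic-backtracking structure behind ReDoS.
NESTED_QUANTIFIER = re.compile(r"\([^)]*[+*][^)]*\)\s*[+*{]")

def add_custom_rule(rules: dict, name: str, pattern: str, action: str = "mask") -> None:
    """Register a project-specific rule, rejecting risky patterns up front."""
    if action not in ("mask", "block"):
        raise ValueError("action must be 'mask' or 'block'")
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern too long")
    if NESTED_QUANTIFIER.search(pattern):
        raise ValueError("nested quantifiers are rejected (ReDoS risk)")
    rules[name] = {"regex": re.compile(pattern), "action": action}
```

Rejecting dangerous patterns at registration, rather than at scan time, means a bad rule can never stall the pipeline on real content.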

Data handling

workthin uses the OpenAI API for the AI classification stage. OpenAI's policy states that API data is not used for model training. All content passes through the Protection Layer before being sent to any external service — so even the AI classifier only sees already-masked content if earlier stages caught something.

For the full security overview, see the Security documentation.