Purpose-Built Tool · The Sanitize Stage
Paydirt
The Machine Data Insights Pipeline
A free, open-source tool that scrubs CUI, PII, PHI, and credentials from Splunk or other log data exports so they can be safely shared and mined for value. Drop a file, get a sanitized version - no install, no network calls, nothing leaves your machine.
Want the details? The README on GitHub explains how Paydirt works and how to set up a custom scrubbing config.
There's Gold in That Data!®
Built-in scrubbing · zero configuration
What Paydirt redacts out of the box.
Network & identity
IPv4/IPv6 addresses, AWS ip-prefixed hostnames, emails, FQDNs, UNC paths, domain usernames, MAC addresses.
Credentials & tokens
PEM private keys, AWS keys, GitHub PATs, Slack & Stripe tokens, JWTs, Google API keys, Authorization headers, URL query-string secrets.
PII / PHI
SSNs (valid ranges), Luhn-validated credit cards, NPIs (45 CFR 162.406), US phone numbers, Windows user SIDs.
CUI markings · NIST SP 800-171
Banner, portion, and legacy markings (CUI, FOUO, SBU, NOFORN…) plus ITAR / EAR / DD 254 flags - full-value redaction with metadata-only placeholders.
Whatever else you tell it to
Text substitution, JSON field targeting at any depth, cloud tag structures, and random replacement pools via a simple CSV config.
Validated, not naive
Validators run inside the matchers, so ordinary order IDs, tracking numbers, and timestamps aren't mistaken for SSNs, cards, or NPIs.
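To illustrate what "validators inside the matchers" can mean in practice, here is a minimal Python sketch. It is not Paydirt's actual code: the function names, regular expressions, and the `[SSN]` / `[CARD]` placeholders are hypothetical, chosen only to show the idea. A candidate match is redacted only if it also passes a check, here the standard Luhn checksum for card numbers and the publicly documented SSN rules (area not 000, 666, or 900-999; group not 00; serial not 0000).

```python
import re

# Hypothetical patterns for illustration; Paydirt's real matchers may differ.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional separators


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def ssn_ok(area: str, group: str, serial: str) -> bool:
    """Reject SSN-shaped values that fall outside assignable ranges."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def scrub(line: str) -> str:
    """Redact only the candidates that pass validation; leave the rest as-is."""

    def _ssn(m: re.Match) -> str:
        return "[SSN]" if ssn_ok(*m.groups()) else m.group(0)

    def _card(m: re.Match) -> str:
        digits = re.sub(r"\D", "", m.group(0))
        return "[CARD]" if luhn_ok(digits) else m.group(0)

    line = SSN_RE.sub(_ssn, line)
    return CARD_RE.sub(_card, line)


if __name__ == "__main__":
    print(scrub("ssn=123-45-6789 card=4111 1111 1111 1111"))
    print(scrub("order=1234567890123456 ref=000-12-3456"))
```

In this sketch the first line is redacted in both places, while the second line passes through unchanged: the 16-digit order ID fails the Luhn check and the SSN-shaped reference uses the invalid area 000. Running the check inside the substitution callback means a failed validation simply returns the original text.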
Compliance-aware by design
Built for regulated data.
One scrubbing engine, two front ends
Run it however your environment allows.
Paydirt.html
Download the file, double-click, drop logs on it. Pure HTML/CSS/JS, runs entirely offline. Ideal for locked-down or air-gapped environments where you can't install Python.
log_scrubber.py
A CLI and importable library for automation, batch processing, and pipeline integration. Python 3.9+, standard library only.
Both use the same configuration format and produce identical output for identical input - verified by an automated parity test.
Free & open source. Yours to keep.
Paydirt is the sanitize stage of the Machine Data Insights pipeline - assess with CAT, sanitize with Paydirt, build with Data Refinery.