VaultSort 3.0.0

Tags: smart-categorize, ai, advanced-organize, privacy, major-release

VaultSort 3 introduces Smart Categorize — a private, on-device AI engine that reads your files and proposes categories derived from your real library. No cloud. No API key. Nothing leaves your Mac.

What's New

Smart Categorize (headline feature)

  • On-device semantic understanding. A small (~470 MB, downloaded once) Apple-Silicon-native embedding model reads PDFs, ePubs, Word, plain text, Markdown, code, and more — entirely on your Mac via the Apple Neural Engine.
  • Categories from your content, not a generic list. Smart Categorize clusters your real files and proposes names derived from the patterns it actually finds — Audits, Invoices, Officeholders, Presets — instead of forcing your library into pre-baked buckets.
  • Live wizard. Pick a folder, watch the analysis stream in (300-file reservoir sample, ~2 seconds on M-series), then accept, edit, merge, or drop each proposed category before any file moves.
  • Saved Category Sets. Curate once, reuse across Advanced Organize jobs.
  • Composable with the full Rule Builder. The new semanticCategory condition combines with filename, size, date, tag, and every other Advanced Organize predicate. Example: "if semanticCategory is Receipts AND size > 500 KB, move to /Receipts/Large."
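
To make the composition concrete, here is a minimal Python sketch of the example rule above. It is an illustration only, not VaultSort's rule format: `FileInfo`, `semantic_category_is`, `size_greater_than_kb`, `all_of`, and `destination` are hypothetical names, and the sketch assumes 1 KB = 1000 bytes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for the file attributes a rule can inspect."""
    name: str
    size_bytes: int
    semantic_category: str  # the category Smart Categorize assigned


Predicate = Callable[[FileInfo], bool]


def semantic_category_is(category: str) -> Predicate:
    # Models the semanticCategory condition as an exact-match predicate.
    return lambda f: f.semantic_category == category


def size_greater_than_kb(kb: int) -> Predicate:
    # Assumption: 1 KB = 1000 bytes.
    return lambda f: f.size_bytes > kb * 1000


def all_of(*predicates: Predicate) -> Predicate:
    # Logical AND over any number of predicates.
    return lambda f: all(p(f) for p in predicates)


large_receipt = all_of(semantic_category_is("Receipts"), size_greater_than_kb(500))


def destination(f: FileInfo) -> Optional[str]:
    # Returns the target folder when the rule matches, otherwise None.
    return "/Receipts/Large" if large_receipt(f) else None
```

Because each condition is an ordinary predicate, the semantic condition needs no special handling when it is combined with filename, date, or tag checks.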

Quality of life

  • Order-independent results. Re-running on the same folder set in any selection order now produces byte-identical proposals.
  • Tighter junk-drawer behaviour. Prose-heavy clusters require ≥ 0.7 cohesion to surface; loose groupings merge into Uncategorized rather than littering the proposal list.
  • Smarter naming. Original casing preserved for acronyms (USAH stays USAH, not Usahs); irregular plurals (Children, Knives, Mice) handled correctly; calendar tokens, copy markers, and filename modifiers (last, latest, prev) excluded from cluster names.
  • Filename-evidence wins on disagreement. When body content and filenames genuinely disagree about what a cluster is "about", the filename signal now wins — matching how users actually read their own folders.
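
One way to get order-independent results from a sampling step is to sort the inputs into a canonical order and seed the random generator with a fixed value before sampling. The Python sketch below shows that idea using classic reservoir sampling (Algorithm R) with the 300-file sample size mentioned earlier. How VaultSort achieves this internally is not stated in these notes; `reservoir_sample` and the fixed seed are assumptions made for illustration.

```python
import random

SAMPLE_SIZE = 300  # sample size quoted in these release notes


def reservoir_sample(paths, k=SAMPLE_SIZE, seed=0):
    """Return up to k paths, identical for any ordering of the same inputs."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility
    reservoir = []
    # Sorting first removes any dependence on the order folders were selected.
    for i, path in enumerate(sorted(paths)):
        if i < k:
            reservoir.append(path)
        else:
            # Algorithm R: item i replaces a random slot with probability k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = path
    return reservoir
```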

Privacy & telemetry

  • Embedding model integrity verified via SHA-256 checksum at install.
  • Embedding cache scoped per-user, never synced, never transmitted.
  • New smart_categorize_derive_completed and _prepass_completed events surface health metrics (silhouette, cache hit rate, sanity counts) — all aggregate, no file paths or content.
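
For readers curious what a checksum check of this kind looks like, here is a minimal Python sketch using the standard library. It only illustrates the general technique; `sha256_of_file` and `verify_model` are hypothetical names, and the expected digest would come from wherever the publisher distributes it.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a large model never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_hex):
    """Return True only if the file's SHA-256 matches the expected hex digest."""
    # compare_digest compares in constant time; lower() tolerates uppercase hex.
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```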

Improvements

  • Fully redesigned Advanced Organize interface to support the Smart Categorize wizard alongside the existing job and rule system.
  • Three-button save-collision dialog when accepting a Category Set whose name already exists (Replace / Save as copy / Cancel).
  • Schema-version stamp on every saved Category Set with forward-compatible migration shell, so older sets continue to load and newer sets are gracefully skipped on older builds.
  • L2-norm sanity guards in both deriver and prepass paths — degenerate embeddings can no longer crash a derive run.
  • Embedding-cache dimension validation on open and on each write.
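
The L2-norm guard and dimension check can be pictured with a short Python sketch. It is a simplified illustration under stated assumptions, not the shipped code: `sanitize_embeddings`, the `tolerance` value, and the `dropped_zero_norm` counter are hypothetical, while `dropped_non_finite` and `renormalized` mirror the telemetry counts named in these notes.

```python
import math


def sanitize_embeddings(vectors, expected_dim, tolerance=1e-3):
    """Drop degenerate vectors and renormalize the rest to unit L2 length."""
    kept = []
    stats = {"dropped_non_finite": 0, "dropped_zero_norm": 0, "renormalized": 0}
    for vec in vectors:
        # Dimension validation: a mismatch means the cache and model disagree.
        if len(vec) != expected_dim:
            raise ValueError(f"expected {expected_dim} dimensions, got {len(vec)}")
        if not all(math.isfinite(x) for x in vec):
            stats["dropped_non_finite"] += 1
            continue
        norm = math.sqrt(sum(x * x for x in vec))
        if not math.isfinite(norm):
            # Squares overflowed to infinity even though each entry was finite.
            stats["dropped_non_finite"] += 1
            continue
        if norm == 0.0:
            # A zero vector has no direction and cannot be normalized.
            stats["dropped_zero_norm"] += 1
            continue
        if abs(norm - 1.0) > tolerance:
            vec = [x / norm for x in vec]
            stats["renormalized"] += 1
        kept.append(list(vec))
    return kept, stats
```

Skipping bad vectors and counting them, rather than raising, is what lets a run finish while still reporting pipeline health.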

Bug Fixes

  • Exemplar deduplication: clusters no longer surface multiple files that share the same basename.
  • Fewer "Cluster N" fallbacks; singleton clusters now derive names from their own filenames when TF-IDF has no signal.
  • Prose-heavy junk drawers (loose .md / .txt / .docx groupings) no longer surface as named proposals at low cohesion.
  • Telemetry events now include dropped_non_finite and renormalized counts so embedding-pipeline health is monitorable.
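
The exemplar deduplication fix amounts to keeping the first file seen for each basename. A minimal Python sketch of that idea follows; `dedupe_exemplars` and its `limit` parameter are hypothetical, and the comparison is an exact, case-sensitive match, which may differ from the shipped behaviour.

```python
import posixpath


def dedupe_exemplars(paths, limit=3):
    """Keep the first path for each basename, up to `limit` exemplars."""
    seen = set()
    exemplars = []
    for path in paths:
        base = posixpath.basename(path)
        if base in seen:
            continue  # another file with this basename is already shown
        seen.add(base)
        exemplars.append(path)
        if len(exemplars) == limit:
            break
    return exemplars
```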

Compatibility

  • macOS 12 or later. Smart Categorize requires Apple Silicon; the rest of VaultSort 3 runs on Intel Macs as before.
  • Existing rules, jobs, and Category Sets from VaultSort 2.x load unchanged.

Pricing

  • VaultSort 3 ships with a price increase on new licenses. Existing license-holders receive the upgrade as part of their plan.
