Check your software.
Find the software you use. See if it’s approved.
Sysadmin published the list.
The full list is searchable.
Why this policy exists
Unapproved software increases organizational risk. Compromised endpoints affect everyone on the network. Removing off-policy installs is necessary to keep systems within IT control.
When things go wrong
2020: SolarWinds Orion was compromised at the source and ran with full network access across thousands of organizations.
2021: Log4j was embedded in thousands of products. Most organizations had no idea they were running it.
2023: Engineers leaked source code through ChatGPT prompts. No AI tool policy, no visibility into what was running.
Pipeline & methodology
Full audit pipeline — figures below are computed from the dataset this page loads.
group_key per product + publisher · merged rows · install counts retained
Step 1 — Ingestion & normalization
Source of truth: Commercial Software to be removed.xlsx — IT Operations export, — rows. Each row is a single install record (software name + manufacturer). Versions are embedded in the name string; there is no separate version column and no per-machine identifier — this dataset is scoped to what is installed, not where.
Normalization (deterministic, repeatable):
- Case-insensitive matching; internal whitespace collapsed
- Version and build tails stripped with explicit regex (trailing numeric patterns and parenthetical suffixes)
- Manufacturer legal suffixes normalized (Inc., Incorporated, Corp., Ltd., LLC, GmbH, etc.)
- group_key = normalized product name + || + normalized vendor key; the same key drives deduplication, enrichment cache, and merge
- UI displays canonical vendor labels for major publishers for scanability; the underlying key stays stable for engineering
Scoped tradeoff: Aggressive version stripping can fold distinct product lines into one group when the source name collapses to the same base (e.g. major language versions). That is intentional for an org-wide inventory view; line-item policy belongs in IT systems of record, not in this rollup.
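A minimal sketch of the normalization rules above, assuming simple regex patterns. The pipeline's actual expressions are not shown on this page, so the patterns and function names here are illustrative only.

```python
import re

# Hypothetical patterns: stand-ins for the pipeline's real regexes.
_PAREN_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")           # e.g. " (64-bit)"
_VERSION_TAIL = re.compile(r"\s+v?\d+(?:\.\d+)*\s*$")     # e.g. " 3.11.4"
_LEGAL_SUFFIX = re.compile(
    r"[,\s]+(?:inc|incorporated|corp|corporation|ltd|llc|gmbh)\.?\s*$"
)


def _collapse(text: str) -> str:
    """Collapse internal whitespace and fold case for case-insensitive matching."""
    return " ".join(text.split()).casefold()


def normalize_product(name: str) -> str:
    """Strip trailing parenthetical suffixes and numeric version tails."""
    out = _collapse(name)
    previous = None
    while out != previous:  # repeat until no further tail can be stripped
        previous = out
        out = _PAREN_SUFFIX.sub("", out)
        out = _VERSION_TAIL.sub("", out)
    return out


def normalize_vendor(vendor: str) -> str:
    """Drop legal suffixes such as Inc., Ltd., GmbH from the manufacturer."""
    out = _collapse(vendor)
    previous = None
    while out != previous:
        previous = out
        out = _LEGAL_SUFFIX.sub("", out)
    return out.rstrip(" ,.")


def group_key(name: str, vendor: str) -> str:
    """Normalized product name + '||' + normalized vendor key."""
    return f"{normalize_product(name)}||{normalize_vendor(vendor)}"
```

Under these assumed patterns, "Python 2.7" and "Python 3.11" from the same publisher produce the same key, which is the scoped tradeoff described above.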
Step 2 — Deduplication
All — rows roll up into groups on group_key. Colliding rows merge; install counts aggregate so prevalence is visible. Where the source showed multiple manufacturer strings for one product, the pipeline picks a consistent display string (longest form wins at ingest).
Shape of the data: — distinct product groups. Collapse: — of rows folded into an existing group (— rows merged). Largest single group: — install rows. Single-install products: — of — groups (—%) — typical long tail of one-off tools.
Quality bar: Duplicate keys from messy manufacturer strings are possible; spot-checks during build suggested a sub-1% phantom duplicate rate. If two products collide, they are usually near-duplicates in practice.
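The rollup can be sketched roughly as follows, assuming each input row already carries its group_key. The Group class, its field names, and the summary keys are hypothetical and chosen only to mirror the figures listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    key: str
    manufacturer: str = ""      # display string; longest form wins
    install_count: int = 0      # one source row == one install record
    source_names: list = field(default_factory=list)


def roll_up(rows):
    """rows: iterable of (group_key, product_name, manufacturer) tuples."""
    groups = {}
    for key, name, manufacturer in rows:
        group = groups.setdefault(key, Group(key=key))
        group.install_count += 1
        group.source_names.append(name)
        # Keep the longest manufacturer string seen for this group.
        if len(manufacturer) > len(group.manufacturer):
            group.manufacturer = manufacturer
    return groups


def summarize(groups, total_rows):
    """Shape-of-the-data figures derived from the rolled-up groups."""
    counts = [g.install_count for g in groups.values()]
    return {
        "groups": len(groups),
        "rows_merged": total_rows - len(groups),
        "largest_group": max(counts, default=0),
        "single_install_groups": sum(1 for c in counts if c == 1),
    }
```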
Step 3 — Identity enrichment (Pass 1)
— structured Gemini 2.0 Flash calls — one per product group — to fill a fixed identity schema the source spreadsheet does not carry:
- Canonical vendor: aligned to the real publisher, not the noisy string from the export
- License type: closed enum, Commercial / Open Source / Free / Unknown
- Category: 11-way taxonomy so the catalog is filterable and comparable
- Product URL: primary product page where a user can verify what the title refers to
Every response is keyed by group_key and written to data/enrichment_cache.jsonl (append-only). The cache is the audit trail: reruns are idempotent — completed groups are never re-sent unless you intentionally invalidate the cache.
Operational limits: Model knowledge has a cutoff; very new or renamed SKUs may be mislabeled until refreshed. URLs are not live-checked in batch. Categories are mutually exclusive — hybrid products get the closest fit. Those limits are ordinary for LLM-assisted enrichment; the UI still surfaces source row text for audit.
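One way an append-only, idempotent cache like this could work is sketched below. The enrich callable is a hypothetical stand-in for the model call, and the record layout beyond group_key is an assumption.

```python
import json
from pathlib import Path


def load_completed(cache_path: Path) -> set:
    """Return the set of group_key values already present in the cache."""
    if not cache_path.exists():
        return set()
    done = set()
    with cache_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                done.add(json.loads(line)["group_key"])
    return done


def enrich_all(group_keys, cache_path: Path, enrich) -> int:
    """Enrich only groups missing from the cache; return how many were sent.

    `enrich` is a hypothetical callable (group_key -> dict of identity fields).
    """
    done = load_completed(cache_path)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    sent = 0
    with cache_path.open("a", encoding="utf-8") as handle:  # append-only
        for key in group_keys:
            if key in done:
                continue  # completed groups are never re-sent
            record = {**enrich(key), "group_key": key}
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            handle.flush()  # keep the audit trail current if the run is interrupted
            done.add(key)
            sent += 1
    return sent
```

In this sketch, invalidating the cache means deleting or editing the JSONL file by hand; nothing in the loop rewrites existing lines.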
Step 4 — Policy enrichment (Pass 2)
Second structured pass: — calls on the same model stack, separate prompts — policy and risk are intentionally isolated from identity so the model cannot shortcut from a vendor name to a verdict.
- Policy status: approved / remove / license_required / use_web_version / review_with_sysadmin
- Risk tier: low through critical (misuse / compromise exposure framing)
- Description & recommended action: employee-readable, one screen each
- Data sensitivity: what classes of organizational data the product can touch
- Alternatives: Zoho or no-cost options where they exist
- pass2_confidence: model-reported certainty on the policy label (continuous 0–1), used only for disclosure bands, not as ground truth
Splitting passes is a deliberate architecture choice: combined identity+policy prompts empirically produced anchor bias on the first label. Two narrow contracts per group produce more stable policy output than one kitchen-sink prompt.
Where AI stops: This pass infers posture from public product knowledge — it does not read internal IT policy documents. risk_tier is judgment, not a formal risk register score. Treat pass2_confidence as self-assessment, not verification.
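Because Pass 2 returns closed enums and a bounded score, each response can be checked before it is accepted. The sketch below is illustrative: the policy statuses come from the list above, but the intermediate risk tiers ("medium", "high") are an assumption, since this page only names the endpoints low and critical.

```python
POLICY_STATUSES = {
    "approved",
    "remove",
    "license_required",
    "use_web_version",
    "review_with_sysadmin",
}

# Assumption: only "low" and "critical" are named in the source text.
RISK_TIERS = ("low", "medium", "high", "critical")


def validate_pass2(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if record.get("policy_status") not in POLICY_STATUSES:
        errors.append(f"unknown policy_status: {record.get('policy_status')!r}")
    if record.get("risk_tier") not in RISK_TIERS:
        errors.append(f"unknown risk_tier: {record.get('risk_tier')!r}")
    confidence = record.get("pass2_confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        errors.append(f"pass2_confidence out of range: {confidence!r}")
    return errors
```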
| Status | What it means | Count |
|---|---|---|
| Remove | Out of policy or high risk — or explicit sysadmin removal | — |
| License required | Commercial title — no evidence of a company license on record | — |
| Check sysadmin | Context-dependent — or model confidence below the disclosure threshold | — |
| Use web version | Browser / SaaS path acceptable; local install is not | — |
| Approved | Cleared by sysadmin ruling or high-confidence model pass | — |
Step 5 — Authoritative overrides
Sysadmin Community rulings (sourced from Kirijan J's authoritative post for this audit) are codified in hardcoded_rulings.json — pattern + explicit field values. They apply after both enrichment passes, on normalized product names (word-boundary safe).
Overrides win: any field set by a ruling replaces model output for that field. Affected rows carry data_source: sysadmin_ruling. Those rules expand to — rows in this dataset when multiple install names normalize to the same product.
Only — rows carry organizational authority from that channel. Everything else is model-assisted classification with the safeguards described above.
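A rough sketch of how pattern-based overrides might be applied. The layout of hardcoded_rulings.json is not shown on this page, so the "pattern" and "fields" keys and the normalized_name field are assumptions.

```python
import re


def compile_rulings(rulings):
    """Compile each ruling's pattern into a word-boundary-safe regex.

    Lookarounds are used instead of \\b so patterns that start or end with
    punctuation still match only whole tokens.
    """
    compiled = []
    for ruling in rulings:
        regex = re.compile(
            r"(?<!\w)" + re.escape(ruling["pattern"]) + r"(?!\w)",
            re.IGNORECASE,
        )
        compiled.append((regex, ruling["fields"]))
    return compiled


def apply_rulings(record: dict, compiled) -> dict:
    """Return a copy of the record with the first matching ruling applied."""
    for regex, fields in compiled:
        if regex.search(record["normalized_name"]):
            # Ruling fields replace model output for those fields only.
            return {**record, **fields, "data_source": "sysadmin_ruling"}
    return record
```

With a ruling for "zoom", a record named "zoom workplace" is overridden while "zoominfo" is left untouched, because the match must end on a word boundary.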
Step 6 — Merge, slim, ship
build_v2_data.py merges deduplicated groups, Pass 1, Pass 2, and overrides into data/software_audit_data_v2.json. Precedence is strict: sysadmin override > Pass 2 > Pass 1 > raw export. The client bundle is then slimmed (field provenance and heavy raw rows stripped) for fast load — full lineage stays in the repository.
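The strict precedence can be illustrated with a layered merge. This is a sketch of the ordering only, not the contents of build_v2_data.py; the function name and the rule that None values never overwrite are assumptions.

```python
def merge_group(raw: dict, pass1: dict, pass2: dict, override: dict) -> dict:
    """Merge layers so that sysadmin override > Pass 2 > Pass 1 > raw export."""
    merged = {}
    # Apply from lowest to highest precedence; later layers overwrite earlier ones.
    for layer in (raw, pass1, pass2, override):
        # Assumption: a None value means "not set" and never overwrites.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```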
Confidence & disclosure
Pass 2 asks the model for a calibrated pass2_confidence score (0.0–1.0) on its own policy_status label — explicit prompt, same structure every row. That score is used for transparency, not as a quality guarantee.
0.80 cutoff — sampled manual review on a random slice showed that below 0.80, human reviewers disagreed with the model more often; above it, agreement was high. The UI labels rows below the line ai_inferred so employees know to double-check — including all — such rows in this dataset. Nothing is hidden; the banding is disclosure.
Confidence is not verification. It does not prove correctness, policy alignment, or that the product still exists. It only measures how assertive the model was about its own label.
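The disclosure band reduces to a single comparison against the 0.80 cutoff. In the sketch below, exempting sysadmin_ruling rows and treating a missing score as below the line are assumptions, not behavior stated on this page.

```python
DISCLOSURE_THRESHOLD = 0.80


def is_ai_inferred(record: dict, threshold: float = DISCLOSURE_THRESHOLD) -> bool:
    """True when the row should carry the ai_inferred disclosure label."""
    # Assumption: rows backed by a sysadmin ruling are never labeled ai_inferred.
    if record.get("data_source") == "sysadmin_ruling":
        return False
    confidence = record.get("pass2_confidence")
    # Assumption: a missing score is disclosed rather than silently trusted.
    if confidence is None:
        return True
    return confidence < threshold  # strictly below the cutoff
```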
Decision support, not a policy system of record.
Classifications and recommendations are model-assisted unless a row is marked sysadmin_ruling.
Use this tool to understand what is on the network and to drive conversations — not as the sole basis for disciplinary or compliance decisions without human review.
When in doubt, ask Sysadmin. They own the authoritative interpretation of corporate policy.