diff --git a/bitwarden_license/bit-common/src/dirt/docs/access-intelligence/architecture/report-data-model-evolution.md b/bitwarden_license/bit-common/src/dirt/docs/access-intelligence/architecture/report-data-model-evolution.md new file mode 100644 index 00000000000..2e45d7fa9d8 --- /dev/null +++ b/bitwarden_license/bit-common/src/dirt/docs/access-intelligence/architecture/report-data-model-evolution.md @@ -0,0 +1,807 @@ +# Report Data Model Evolution + +> **Purpose**: Document the old report data model (what's stored today), the updated model +> from PR #17356 (merged, follows BW architecture), and the target model with the member +> registry optimization. This is a reference for understanding why the report was 450MB+ +> and how the member registry solves it. + +--- + +## Table of Contents + +1. [Current Storage Model](#1-current-storage-model-still-in-use--plain-interfaces-no-architecture) +2. [Proposed View Models — Following BW Architecture](#2-proposed-view-models--following-bw-architecture-what-should-be-implemented-next) +3. [Target Model — With Member Registry](#3-target-model--with-member-registry-what-were-building) +4. [Storage Structure Comparison](#4-storage-structure-comparison) +5. [Encryption Approaches (Current vs Future Options)](#5-encryption-approaches-current-vs-future-options) + +--- + +## 1. Current Storage Model (Still In Use) — Plain interfaces, no architecture + +**Status:** This is what's stored in the database today. These are simple TypeScript interfaces/types with no domain/data/view/api layers. No encryption support in the types themselves. Services do all the filtering and transformation. + +**Note:** While PR #17356 introduced architecture patterns, the actual storage structure still uses these plain types directly. The proposed view models (Section 2) describe the architecture we should migrate to next. 
+ +### ApplicationHealthReportDetail (the old report row) + +**Source:** `models/report-models.ts:78-88` (current implementation, still in use) + +**Current structure (with arrays):** + +```typescript +// This is the main report model — one record per application (grouped by URI hostname) +// Used directly by services and UI components +export type ApplicationHealthReportDetail = { + applicationName: string; // hostname (e.g. "google.com") + passwordCount: number; // total ciphers for this app + atRiskPasswordCount: number; // ciphers with weak/reused/exposed passwords + cipherIds: CipherId[]; // IDs of all ciphers in this app - ARRAY + atRiskCipherIds: CipherId[]; // IDs of at-risk ciphers - ARRAY (subset of cipherIds) + memberCount: number; // count of unique members (redundant, = memberDetails.length) + atRiskMemberCount: number; // count of at-risk members (redundant, = atRiskMemberDetails.length) + memberDetails: MemberDetails[]; // ⚠️ FULL member objects repeated per app - ARRAY + atRiskMemberDetails: MemberDetails[]; // ⚠️ FULL member objects for at-risk only (subset of memberDetails) - ARRAY + // Members are deduplicated within a single app but NOT across apps. 
+}; +``` + +**Proposed structure (with Records for consistency):** + +```typescript +export type ApplicationHealthReportDetail = { + applicationName: string; + passwordCount: number; + atRiskPasswordCount: number; + cipherRefs: Record<string, boolean>; // key = cipher ID; true = at-risk, false = not at-risk (combines cipherIds + atRiskCipherIds) + memberCount: number; // could be removed (= Object.keys(memberRefs).length) + atRiskMemberCount: number; // could be removed (= count of true values in memberRefs) + memberRefs: Record<string, boolean>; // key = member ID; true = at-risk, false = not at-risk (combines memberDetails + atRiskMemberDetails) +}; +``` + +**Benefits of Record pattern for ciphers:** + +- ✅ Combines `cipherIds` and `atRiskCipherIds` into a single structure +- ✅ No duplicate IDs (prevents data inconsistency) +- ✅ O(1) lookup to check if cipher is at-risk +- ✅ Consistent with `memberRefs` pattern +- ✅ Saves space (~50 bytes per duplicate cipher ID in large orgs) + +### MemberDetails (the old member model) + +**Source:** `models/report-models.ts:16-21` (current implementation, still in use) + +```typescript +// Repeated in EVERY ApplicationHealthReportDetail that a member has access to +// For a large org with 5,000 members accessing 200 apps → duplicated across apps +export type MemberDetails = { + userGuid: string; // Organization user ID (UUID) + userName: string | null; // Display name + email: string; // Email address + cipherId: string; // ⚠️ Meaningless after deduplication (first cipher processed) +}; +``` + +### RiskInsightsData (the storage container) + +**Source:** `models/report-models.ts:121-128` (current implementation, still in use) + +**Current structure (with arrays):** + +**Rename to:** RiskInsights + +```typescript +// The top-level container that is stored in the database +// Each field is encrypted separately as an EncString +export interface RiskInsightsData { + id: OrganizationReportId; // Report ID (generated by API) + creationDate: Date; // When report was generated + contentEncryptionKey: 
EncString; // Key used to encrypt report data + reportData: ApplicationHealthReportDetail[]; // ⚠️ Main payload - can be 700MB+ + summaryData: OrganizationReportSummary; // Pre-computed aggregates (~1KB) + applicationData: OrganizationReportApplication[]; // Per-app settings (~10KB) - ARRAY with O(n) lookup +} +``` + +**Proposed structure (with Records for O(1) lookup):** + +```typescript +export interface RiskInsights { + id: OrganizationReportId; + creationDate: Date; + contentEncryptionKey: EncString; + reportData: ApplicationHealthReportDetail[]; // Array is still needed here for iteration + summaryData: OrganizationReportSummary; + applicationData: Record<string, OrganizationReportApplication>; // Record keyed by applicationName for O(1) lookup +} +``` + +**Current encryption:** Each of `reportData`, `summaryData`, and `applicationData` is JSON.stringify'd and encrypted as a separate EncString. For large orgs, `reportData` is compressed before encryption to avoid WASM size limits. + +### OrganizationReportApplication (per-app user settings) + +**Source:** `models/report-models.ts:64-72` (current implementation, still in use) + +**Current (Array):** Stored as array with O(n) lookup (inefficient) + +**Rename to:** RiskInsightsApplication (If separate model is needed) + +```typescript +// User-defined settings per application (critical flag, review date) +// Stored in the report, carried over between report generations +export type OrganizationReportApplication = { + applicationName: string; // hostname (e.g. 
"google.com") + isCritical: boolean; // user-defined critical flag + reviewedDate: Date | null; // null = new/unreviewed application +}; +``` + +**Proposed (Record):** Should be stored as Record for O(1) lookup + +```typescript +// Key = applicationName (hostname) +type ApplicationDataRecord = Record; + +// Example: +applicationData: { + "google.com": { isCritical: true, reviewedDate: new Date("2026-01-15") }, + "github.com": { isCritical: false, reviewedDate: null }, // new/unreviewed + "slack.com": { isCritical: true, reviewedDate: new Date("2026-02-01") } +} +``` + +**Problem with current array structure:** + +```typescript +// Current inefficient O(n) lookup pattern found in code: +getCriticalApplications(): RiskInsightsReportView[] { + return this.report.filter((app) => { + const appMeta = this.applications.find((a) => a.hostname === app.applicationName); // O(n)! + return appMeta?.isCritical === true; + }); +} +``` + +**With Record (O(1) lookup):** + +```typescript +getCriticalApplications(): RiskInsightsReportView[] { + return this.report.filter((app) => { + return this.applicationData[app.applicationName]?.isCritical === true; // O(1)! 
+ }); +} +``` + +### OrganizationReportSummary (pre-computed aggregates) + +**Source:** `models/report-models.ts:49-58` (current implementation, still in use) + +**Rename to:** RiskInsightsSummary + +```typescript +// Pre-computed aggregates for summary cards and filtering +// Recomputed when critical application markings change +export type OrganizationReportSummary = { + totalMemberCount: number; // All members in org + totalApplicationCount: number; // All applications in report + totalAtRiskMemberCount: number; // Members with at-risk access + totalAtRiskApplicationCount: number; // Applications with at-risk ciphers + totalCriticalApplicationCount: number; // Applications marked critical + totalCriticalMemberCount: number; // Members with access to critical apps + totalCriticalAtRiskMemberCount: number; // Members with at-risk access to critical apps + totalCriticalAtRiskApplicationCount: number; // Critical apps with at-risk ciphers +}; +``` + +**Note:** When a user marks/unmarks an application as critical, the summary is recomputed. This is why `atRiskMemberDetails[]` is stored separately per application - it allows efficient recalculation of critical app summaries without reprocessing all cipher health data. + +### Why This Was 450MB+ + +The core problem: **`MemberDetails` objects were fully duplicated per application**. + +Example for a large org: + +- 5,000 org members +- 200 applications in the report +- Each member might have access to 50+ applications +- Each `MemberDetails` object ~200 bytes + +**Worst case**: 5,000 members × 50 apps × 200 bytes = ~50MB just for member data +in `memberDetails[]` arrays. With `atRiskMemberDetails[]` duplicated alongside, +plus cipher health data, this easily reached 450MB+. + +This caused: + +1. **WASM encryption panics** — the encrypted blob exceeded SDK size limits +2. **Database storage limits** — even compressed, the JSON was too large for DB fields +3. 
**Memory pressure** — holding this in a `BehaviorSubject` blocked the UI +4. **Slow report generation** — building all these duplicated member arrays was O(n²) + +--- + +## 2. Proposed View Models — Following BW Architecture (What Should Be Implemented Next) + +**Status:** PR #17356 laid groundwork for architecture patterns, but storage still uses plain types from Section 1. This section describes the view models that SHOULD be implemented to follow Bitwarden's 4-layer pattern: `Api → Data → Domain → View` + +**Important:** These models are NOT currently in use. They represent the target architecture we should migrate to, with query methods replacing facade/orchestrator filtering logic. + +### What's Stored (Current Implementation) + +The current implementation stores the **exact types from Section 1** above: + +- `ApplicationHealthReportDetail` - report rows (700MB+ for large orgs) - using ARRAYS +- `OrganizationReportApplication` - per-app settings (~10KB) - using ARRAY +- `OrganizationReportSummary` - aggregates (~1KB) + +These are stored in `RiskInsightsData` and encrypted as separate EncStrings: + +```typescript +// What gets stored in the database today (using arrays): +RiskInsightsData { + reportData: ApplicationHealthReportDetail[] // ← JSON.stringify → EncString + // Contains duplicate member objects across apps + // Contains duplicate cipher IDs (cipherIds + atRiskCipherIds) + summaryData: OrganizationReportSummary // ← JSON.stringify → EncString + applicationData: OrganizationReportApplication[] // ← JSON.stringify → EncString (array with O(n) lookup) + contentEncryptionKey: EncString + id: OrganizationReportId + creationDate: Date +} +``` + +**Encryption approach:** Each field is JSON.stringify'd, optionally compressed (for `reportData` only, to avoid WASM limits), then encrypted with the `contentEncryptionKey`. 
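
The per-field flow described above (serialize, optionally compress, encrypt) can be sketched as follows. This is a minimal illustration, not the actual service code: `EncryptFn`, `CompressFn`, `sealField`, and `sealReport` are hypothetical names, and the real encryption and compression calls are injected as parameters because their signatures are not shown in this document.

```typescript
// Hypothetical function types standing in for the real encryption/compression services.
type EncryptFn = (plaintext: string, key: string) => string;
type CompressFn = (json: string) => string;

interface SealedReport {
  reportData: string;
  summaryData: string;
  applicationData: string;
}

// Serialize one top-level field, optionally compress it, then encrypt it.
function sealField(
  value: unknown,
  key: string,
  encrypt: EncryptFn,
  compress?: CompressFn,
): string {
  const json = JSON.stringify(value);
  const payload = compress ? compress(json) : json;
  return encrypt(payload, key);
}

// Each field becomes its own encrypted string; only reportData is compressed.
function sealReport(
  report: { reportData: unknown; summaryData: unknown; applicationData: unknown },
  key: string,
  encrypt: EncryptFn,
  compress: CompressFn,
): SealedReport {
  return {
    reportData: sealField(report.reportData, key, encrypt, compress),
    summaryData: sealField(report.summaryData, key, encrypt),
    applicationData: sealField(report.applicationData, key, encrypt),
  };
}
```

Because each field is sealed independently, the small `summaryData` blob can be decrypted without touching the large `reportData` blob.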
+ +**Problems with current structure:** + +- Member objects duplicated across applications (576MB for 10K org) +- Cipher and member IDs duplicated in separate arrays (~61MB wasted) +- ApplicationData requires O(n) find operations for every lookup + +### Proposed View Models (For Query Logic) + +The new architecture will introduce domain/view models with query methods. These are **NOT stored** - they're runtime transformations of the stored data. + +#### RiskInsightsView (proposed - replaces facade logic) + +```typescript +class RiskInsightsView { + report: ApplicationHealthReportDetail[]; // Decrypted from storage + applications: OrganizationReportApplication[]; // Decrypted from storage + summary: OrganizationReportSummary; // Decrypted from storage + memberRegistry: MemberRegistry; // ← NEW: Built at load time + createdDate: Date; + + // Query methods (replace current facade/orchestrator filtering): + getAtRiskMembers(): MemberRegistryEntry[]; + getCriticalApplications(): ApplicationHealthReportDetail[]; + getApplicationByHostname(hostname: string): ApplicationHealthReportDetail | undefined; + getNewApplications(): ApplicationHealthReportDetail[]; // reviewedDate === null + getSummary(): OrganizationReportSummary; +} +``` + +**Note:** The view model will have query methods, but the underlying storage structure (Section 1) remains the same until we implement the member registry optimization (Section 3). + +--- + +## 3. Target Model — With Member Registry (What We're Building) + +**Key optimization:** Replace duplicated `MemberDetails[]` arrays with lightweight member ID references that point into a shared `MemberRegistry`. This reduces a 10K org report from ~786MB to ~150MB encrypted (81% reduction), matching the size breakdown later in this section. 
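
A minimal sketch of this conversion, assuming the legacy shapes from Section 1 and the `memberRefs` Record proposed there. `buildRegistry`, `LegacyApp`, `RegistryEntry`, and `CompactApp` are illustrative names, not existing code.

```typescript
// Legacy shapes, trimmed to the fields relevant for deduplication.
type MemberDetails = { userGuid: string; userName: string | null; email: string; cipherId: string };
type LegacyApp = {
  applicationName: string;
  memberDetails: MemberDetails[];
  atRiskMemberDetails: MemberDetails[];
};

// Hypothetical target shapes: members stored once, apps hold ID → at-risk flags.
type RegistryEntry = { id: string; userName: string | null; email: string };
type CompactApp = { applicationName: string; memberRefs: Record<string, boolean> };

function buildRegistry(apps: LegacyApp[]): {
  registry: Record<string, RegistryEntry>;
  apps: CompactApp[];
} {
  const registry: Record<string, RegistryEntry> = {};

  // Store the full member once; the per-cipher `cipherId` field is dropped.
  const register = (m: MemberDetails): void => {
    if (!(m.userGuid in registry)) {
      registry[m.userGuid] = { id: m.userGuid, userName: m.userName, email: m.email };
    }
  };

  const compact = apps.map((app) => {
    const memberRefs: Record<string, boolean> = {};
    for (const m of app.memberDetails) {
      register(m);
      memberRefs[m.userGuid] = false; // present, not known to be at-risk yet
    }
    for (const m of app.atRiskMemberDetails) {
      register(m);
      memberRefs[m.userGuid] = true; // at-risk overrides the default
    }
    return { applicationName: app.applicationName, memberRefs };
  });

  return { registry, apps: compact };
}
```

A member who appears in many applications contributes one registry entry plus one short key per application, instead of one full object per application.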
+ +**Storage changes:** + +- Store members ONCE in a registry (not per application) +- Store only member IDs (userGuids) in application records +- Remove meaningless `cipherId` field from member data +- Combine `memberDetails` and `atRiskMemberDetails` into a single Record with an at-risk flag (the design chosen in Section 4, rather than separate ID-only arrays) + +### MemberRegistry (new — deduplicated member lookup) + +```typescript +// Single source of truth for member data in a report +// Stored once, referenced by ID from every application that member appears in +class MemberRegistry { + // Map from org user ID → full member entry + private entries: Map<OrganizationUserId, MemberRegistryEntry>; + + get(id: OrganizationUserId): MemberRegistryEntry | undefined; + getAll(): MemberRegistryEntry[]; + size(): number; +} + +interface MemberRegistryEntry { + id: OrganizationUserId; + userName: string; + email: string; + // Any other member metadata needed by the UI +} +``` + +### Member References (new — Record with at-risk flag) + +Instead of duplicating full member objects per application, each application stores member IDs as a `Record<string, boolean>`, where: + +- **Key** = member ID (userGuid) +- **Value** = `true` if at-risk, `false` if not at-risk + +This provides: + +- **O(1) lookup** for checking membership and at-risk status +- **Automatic deduplication** (can't have duplicate keys) +- **Single source** for both member list and at-risk status +- **No duplicate IDs** (previously stored in both memberDetails and atRiskMemberDetails) + +```typescript +// Stored as a Record where value indicates at-risk status +type MemberRefs = Record<string, boolean>; + +// Example: +const memberRefs: MemberRefs = { + "abc-123": true, // at-risk member + "def-456": false, // not at-risk + "ghi-789": true // at-risk member +}; +``` + +### Updated RiskInsightsReportView (with registry references) + +```typescript +class RiskInsightsReportView { + applicationName: string; + passwordCount: number; + atRiskPasswordCount: number; + weakPasswordCount: number; + reusedPasswordCount: number; + exposedPasswordCount: 
number; + + // OLD: memberDetails: MemberDetails[] + atRiskMemberDetails: MemberDetails[] (duplicated arrays) + // NEW: Single Record with at-risk flag + memberRefs: Record<string, boolean>; // { "abc": true, "def": false, ... } + + // OLD: cipherIds: CipherId[] + atRiskCipherIds: CipherId[] (duplicated arrays) + // NEW: Single Record with at-risk flag + cipherRefs: Record<string, boolean>; // { "cipher-1": true, "cipher-2": false, ... } + + // The registry is held by the parent RiskInsightsView + // View model methods resolve refs → full entries on demand: + + getAllMembers(registry: MemberRegistry): MemberRegistryEntry[] { + return Object.keys(this.memberRefs) + .map((id) => registry.get(id as OrganizationUserId)) + .filter((m): m is MemberRegistryEntry => m !== undefined); // drop IDs missing from the registry + } + + getAtRiskMembers(registry: MemberRegistry): MemberRegistryEntry[] { + return Object.entries(this.memberRefs) + .filter(([_, isAtRisk]) => isAtRisk) + .map(([id]) => registry.get(id as OrganizationUserId)) + .filter((m): m is MemberRegistryEntry => m !== undefined); // drop IDs missing from the registry + } + + isAtRisk(): boolean { + return this.atRiskPasswordCount > 0; + } + + hasMember(memberId: OrganizationUserId): boolean { + return memberId in this.memberRefs; // O(1) lookup + } + + isMemberAtRisk(memberId: OrganizationUserId): boolean { + return this.memberRefs[memberId] === true; // O(1) lookup + } +} +``` + +### Updated RiskInsightsView (parent, holds registry) + +```typescript +class RiskInsightsView { + report: RiskInsightsReportView[]; + applicationData: Record<string, OrganizationReportApplication>; // keyed by applicationName + summary: RiskInsightsSummaryView; + memberRegistry: MemberRegistry; // ← shared, deduplicated + createdDate: Date; + + // Smart query methods — these replace facade/orchestrator filtering logic: + + getAtRiskMembers(): MemberRegistryEntry[] { + // Deduplicate across all at-risk apps + const ids = new Set<OrganizationUserId>(); + for (const app of this.report) { + if (app.isAtRisk()) { + // memberRefs is a Record, iterate entries and filter for at-risk (value === true) + Object.entries(app.memberRefs).forEach(([id, isAtRisk]) => { + if (isAtRisk) ids.add(id as OrganizationUserId); + }); + 
} + } + return [...ids].map((id) => this.memberRegistry.get(id)).filter(Boolean); + } + + getCriticalApplications(): RiskInsightsReportView[] { + // OLD (O(n)): this.applications.find((a) => a.hostname === app.applicationName) + // NEW (O(1)): this.applicationData[app.applicationName] + return this.report.filter((app) => { + return this.applicationData[app.applicationName]?.isCritical === true; + }); + } + + getApplicationByHostname(hostname: string): RiskInsightsReportView | undefined { + return this.report.find((app) => app.applicationName === hostname); + } + + getNewApplications(): RiskInsightsReportView[] { + // OLD (O(n)): this.applications.find((a) => a.hostname === app.applicationName) + // NEW (O(1)): this.applicationData[app.applicationName] + return this.report.filter((app) => { + return this.applicationData[app.applicationName]?.reviewedDate === null; + }); + } + + getSummary(): RiskInsightsSummaryView { + return this.summary; + } +} +``` + +### Size Impact: Current vs Target + +#### Current (700MB+ for large orgs) + +**10K member org:** + +- `memberDetails`: 400 apps × 5,000 members × 180 bytes = **360MB** +- `atRiskMemberDetails`: 400 apps × 3,000 members × 180 bytes = **216MB** +- Cipher IDs + metadata: **~15MB** +- **Total unencrypted: ~591MB** +- **After encryption + Base64: ~786MB** + +#### Target (With Registry + Record Pattern) + +**10K member org:** + +- **MemberRegistry**: 10,000 members × 140 bytes (no cipherId) = **1.4MB** (stored once) +- **memberRefs**: 400 apps × 5,000 refs × 50 bytes (Record entry: `"id": false/true`) = **100MB** + - No separate atRiskMemberRefs needed - at-risk status is the boolean value +- **cipherRefs**: 400 apps × 100 ciphers × 50 bytes (Record entry: `"id": false/true`) = **2MB** + - No separate atRiskCipherIds array needed - at-risk status is the boolean value +- **applicationData** (as Record): 400 apps × 100 bytes = **0.04MB** (negligible) +- Metadata (counts, applicationName): **~10MB** +- **Total unencrypted: 
~113MB** +- **After encryption + Base64: ~150MB** + +**Reduction: 786MB → 150MB = 81% smaller** 🎉 + +**Design Decision:** Use a single `Record<string, boolean>` each for members and ciphers, and a Record for `applicationData`: + +- **memberRefs:** No duplicate member IDs, ~60MB saved (vs a separate at-risk member ID array) +- **cipherRefs:** No duplicate cipher IDs, ~1MB saved (vs separate atRiskCipherIds array) +- **applicationData:** O(1) lookup, no meaningful size change but better performance +- **Trade-off:** Definitely worth it - saves ~61MB and prevents duplicate storage + +--- + +## 4. Storage Structure Comparison + +### Current Storage (What's in DB Today) + +```typescript +// Stored as RiskInsightsData in database +{ + id: OrganizationReportId, + creationDate: Date, + contentEncryptionKey: EncString, + + // ENCRYPTED FIELD 1: reportData (~700MB for large orgs) + reportData: [ + { + applicationName: "google.com", + cipherIds: ["cipher-id-1", "cipher-id-2", ...], // ~100 ciphers - ARRAY + atRiskCipherIds: ["cipher-id-1", ...], // ~50 at-risk - ARRAY (duplicates IDs from cipherIds) + memberDetails: [ // ~5,000 members - ARRAY + { userGuid: "abc", userName: "Alice", email: "alice@...", cipherId: "x" }, + { userGuid: "def", userName: "Bob", email: "bob@...", cipherId: "y" }, + // ... FULL member objects, deduplicated per app, duplicated across apps + ], + atRiskMemberDetails: [ // ~3,000 at-risk members - ARRAY (duplicates from memberDetails) + { userGuid: "abc", userName: "Alice", email: "alice@...", cipherId: "x" }, + // ... FULL member objects (subset of memberDetails) + ], + passwordCount: 100, + atRiskPasswordCount: 50, + memberCount: 5000, + atRiskMemberCount: 3000 + }, + // ... 400 applications + ], + + // ENCRYPTED FIELD 2: applicationData (~10KB) - ARRAY with O(n) lookup + applicationData: [ + { applicationName: "google.com", isCritical: true, reviewedDate: Date | null }, + // ... 
400 applications + ], + + // ENCRYPTED FIELD 3: summaryData (~1KB) + summaryData: { + totalMemberCount: 10000, + totalApplicationCount: 400, + totalAtRiskMemberCount: 6000, + totalAtRiskApplicationCount: 300, + totalCriticalApplicationCount: 50, + totalCriticalMemberCount: 8000, + totalCriticalAtRiskMemberCount: 4500, + totalCriticalAtRiskApplicationCount: 40 + } +} +``` + +**Total size:** ~786MB encrypted for 10K member org +**Problem:** Member data duplicated across applications (360MB + 216MB = 576MB just for members) + +--- + +### Target Storage (With Member Registry) + +```typescript +// Stored as RiskInsightsData in database +{ + id: OrganizationReportId, + creationDate: Date, + contentEncryptionKey: EncString, + + // NEW: ENCRYPTED FIELD 0: memberRegistry (~1.4MB for 10K members) + memberRegistry: { + "abc": { userGuid: "abc", userName: "Alice", email: "alice@..." }, + "def": { userGuid: "def", userName: "Bob", email: "bob@..." }, + // ... 10,000 members stored ONCE + }, + + // ENCRYPTED FIELD 1: reportData (~116MB for 10K org - 80% reduction!) + reportData: [ + { + applicationName: "google.com", + cipherRefs: { // ~100 cipher IDs with at-risk flag - RECORD + "cipher-id-1": true, // at-risk + "cipher-id-2": false, // not at-risk + "cipher-id-3": true, // at-risk + // ... (no separate atRiskCipherIds array needed) + }, + memberRefs: { // ~5,000 member IDs with at-risk flag - RECORD + "abc": true, // at-risk member + "def": false, // not at-risk + "ghi": true, // at-risk member + // ... (no separate atRiskMemberRefs array needed) + }, + passwordCount: 100, + atRiskPasswordCount: 50, + memberCount: 5000, + atRiskMemberCount: 3000 + }, + // ... 
400 applications + ], + + // ENCRYPTED FIELD 2: applicationData (~10KB) - RECORD with O(1) lookup + applicationData: { + "google.com": { isCritical: true, reviewedDate: new Date("2026-01-15") }, + "github.com": { isCritical: false, reviewedDate: null }, + "slack.com": { isCritical: true, reviewedDate: new Date("2026-02-01") } + // ... 400 applications as Record entries + }, + + // ENCRYPTED FIELD 3: summaryData (~1KB - unchanged) + summaryData: { + totalMemberCount: 10000, + totalApplicationCount: 400, + totalAtRiskMemberCount: 6000, + totalAtRiskApplicationCount: 300, + totalCriticalApplicationCount: 50, + totalCriticalMemberCount: 8000, + totalCriticalAtRiskMemberCount: 4500, + totalCriticalAtRiskApplicationCount: 40 + } +} +``` + +**Total size:** ~150MB encrypted for 10K member org (81% reduction) +**Benefits:** + +- Members stored once in registry, referenced by ID from applications +- Member and cipher IDs stored with at-risk flag (no duplicate arrays) +- ApplicationData as Record enables O(1) lookup instead of O(n) find operations + +--- + +### Design Decision: Single Record with Boolean Flag (for Members AND Ciphers) + +**Chosen approach:** Use single `Record` where the boolean indicates at-risk status for BOTH members and ciphers. 
+ +```typescript +{ + applicationName: "google.com", + memberRefs: { + "abc": true, // at-risk member + "def": false, // not at-risk + "ghi": true // at-risk member + }, + cipherRefs: { + "cipher-1": true, // at-risk cipher + "cipher-2": false, // not at-risk + "cipher-3": true // at-risk cipher + } +} +``` + +**Pros:** + +- ✅ **Members:** No duplicate IDs (previously stored in both memberDetails AND atRiskMemberDetails) +- ✅ **Ciphers:** No duplicate IDs (previously stored in both cipherIds AND atRiskCipherIds) +- ✅ O(1) lookup for both membership/presence and at-risk status +- ✅ Automatic deduplication (can't have duplicate keys) +- ✅ Saves ~60MB for members + ~1MB for ciphers = **~61MB saved** compared to separate arrays +- ✅ Clear intent - one ID, one entry, one flag +- ✅ Consistent pattern across both members and ciphers + +**Cons:** + +- ⚠️ Slightly more complex iteration (need to check boolean value when filtering at-risk) +- ⚠️ Summary recalculation requires iterating entries instead of just counting keys + +**Trade-off Analysis:** + +- **Member size savings:** ~60MB (400 apps × 3K at-risk IDs × 50 bytes per duplicate entry) +- **Cipher size savings:** ~1MB (400 apps × 50 at-risk IDs × 50 bytes per duplicate entry) +- **Total savings:** ~61MB for 10K org +- **Performance:** Negligible impact - `Object.entries().filter()` is still O(n) like array iteration +- **Correctness:** Better - impossible to have ID in at-risk array but not in main array + +**Verdict:** Single Record with boolean flag is the clear winner for both members AND ciphers. + +--- + +## 5. Encryption Approaches (Current vs Future Options) + +### Current: Encrypt Whole Objects (No Compression) + +**How it works:** + +1. `JSON.stringify(reportData)` → encrypt → EncString (~700MB for large orgs) +2. `JSON.stringify(summaryData)` → encrypt → EncString (~1KB) +3. 
`JSON.stringify(applicationData)` → encrypt → EncString (~10KB) + +**Stored structure:** + +```typescript +{ + reportData: EncString, // ← Entire reportData[] array as one encrypted blob + summaryData: EncString, // ← Entire summary object as one encrypted blob + applicationData: EncString // ← Entire applicationData[] array as one encrypted blob + contentEncryptionKey: EncString, + id: OrganizationReportId, + creationDate: Date +} +``` + +**Problems:** + +- For large orgs (700MB+), may approach or exceed WASM encryption limits +- Must decrypt entire report to access any application +- Can't do field-level encryption with this approach + +--- + +### Option 1: Encrypt Per Top-Level Field (Current Approach) + +Encrypt `reportData`, `summaryData`, `applicationData` as separate EncStrings. + +**Pros:** + +- ✅ Allows decrypting summary without decrypting full report +- ✅ Simple encryption logic +- ✅ Separates metadata (summary, applicationData) from payload (reportData) + +**Cons:** + +- ❌ Can't access individual applications without decrypting entire report +- ❌ Can't do field-level encryption +- ❌ May hit WASM limits for very large orgs (700MB+ unencrypted) + +**Status:** This is what we have today. + +--- + +### Option 2: True Field-Level Encryption (Ideal) + +**Each field** within each object is encrypted separately, preserving JSON structure: + +```typescript +{ + memberRegistry: { + "abc": { + userGuid: EncString("abc"), + userName: EncString("Alice"), + email: EncString("alice@...") + }, + "def": { /* ... */ } + }, + reportData: [ + { + applicationName: EncString("google.com"), + cipherIds: [EncString("id1"), EncString("id2"), ...], + memberRefs: [EncString("abc"), EncString("def"), ...], + atRiskMemberRefs: [EncString("abc"), ...], + passwordCount: EncString("100"), + atRiskPasswordCount: EncString("50"), + memberCount: EncString("5000"), + atRiskMemberCount: EncString("3000") + }, + // ... 
each application + ], + summaryData: { + totalMemberCount: EncString("10000"), + totalApplicationCount: EncString("400"), + // ... each field encrypted + }, + applicationData: [ + { + applicationName: EncString("google.com"), + isCritical: EncString("true"), + reviewedDate: EncString("2026-02-10") + }, + // ... each application + ] +} +``` + +**Pros:** + +- ✅ Can decrypt individual fields on-demand +- ✅ Can access single application without decrypting all +- ✅ Each field is small enough for SDK (no size limits) +- ✅ Better for partial updates (re-encrypt only changed fields) +- ✅ Aligns with Bitwarden's data model architecture + +**Cons:** + +- ❌ More complex encryption/decryption logic +- ❌ Slightly larger overhead (each EncString has IV + metadata ~20 bytes) +- ❌ Requires updating all encryption/decryption code paths + +**Status:** **Can be implemented alongside member registry.** The member registry will reduce report size (making this easier), but field-level encryption is not blocked by it. + +**Size estimate with field-level encryption:** + +- Member registry (10K members): ~1.4MB unencrypted → ~2MB encrypted (each field encrypted) +- Report data: ~116MB unencrypted → ~145MB encrypted (overhead from EncString metadata) +- **Total: ~147MB** (vs ~154MB with whole-object encryption) + +Field-level encryption adds ~10MB overhead but enables partial decryption and avoids WASM limits. + +--- + +### Option 3: Compress Then Encrypt (Draft PR, Want to Avoid) + +Compress `reportData` before encrypting. `summaryData` and `applicationData` remain uncompressed (TBD if `applicationData` needs compression). + +**How it would work:** + +1. `JSON.stringify(reportData)` → compress with pako → encrypt → EncString +2. `JSON.stringify(summaryData)` → encrypt → EncString (no compression) +3. 
`JSON.stringify(applicationData)` → encrypt → EncString (compression TBD) + +**Pros:** + +- ✅ Stored object is as small as we can get it (compression reduces size by ~70%) +- ✅ Works for very large orgs without hitting WASM limits + +**Cons:** + +- ❌ Can't decrypt summary without decompressing everything if whole object is compressed +- ❌ Makes field-level encryption impossible (can't decrypt individual fields from compressed blob) +- ❌ More complex decryption logic (decompress → decrypt) +- ❌ Not the direction we want to go architecturally + +**Decision:** **Avoid if possible.** This was explored in a draft PR as a workaround, but we'd prefer to implement member registry (reduces size without compression) and move toward field-level encryption (Option 2).
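
The preference for the member registry over compression rests on the size estimates in Section 3. As a rough check, that arithmetic can be reproduced with the sketch below. The per-entry byte sizes and fixed metadata figures are the estimates quoted in this document. The 4/3 Base64 expansion factor is an assumption used to approximate the "after encryption + Base64" numbers, and `OrgShape`, `currentBytes`, and `targetBytes` are illustrative names.

```typescript
// Shape of the example org used throughout this document.
interface OrgShape {
  apps: number;
  orgMembers: number;
  membersPerApp: number;
  atRiskMembersPerApp: number;
  ciphersPerApp: number;
}

const MB = 1000000;
const BASE64_FACTOR = 4 / 3; // assumption: Base64 inflates the payload by about one third

// Current layout: full ~180-byte member objects repeated per app, plus ~15MB of cipher data.
function currentBytes(o: OrgShape): number {
  const memberDetails = o.apps * o.membersPerApp * 180;
  const atRiskMemberDetails = o.apps * o.atRiskMembersPerApp * 180;
  return memberDetails + atRiskMemberDetails + 15 * MB;
}

// Target layout: one registry entry per member, ~50-byte Record entries per reference.
function targetBytes(o: OrgShape): number {
  const registry = o.orgMembers * 140;
  const memberRefs = o.apps * o.membersPerApp * 50;
  const cipherRefs = o.apps * o.ciphersPerApp * 50;
  const applicationData = o.apps * 100;
  return registry + memberRefs + cipherRefs + applicationData + 10 * MB;
}

const org: OrgShape = {
  apps: 400,
  orgMembers: 10000,
  membersPerApp: 5000,
  atRiskMembersPerApp: 3000,
  ciphersPerApp: 100,
};

const before = currentBytes(org);
const after = targetBytes(org);
const reductionPercent = Math.round((1 - after / before) * 100);

console.log(`current: ${before / MB}MB raw, ~${Math.round((before * BASE64_FACTOR) / MB)}MB encoded`);
console.log(`target: ${after / MB}MB raw, ~${Math.round((after * BASE64_FACTOR) / MB)}MB encoded`);
console.log(`reduction: ${reductionPercent}%`);
```

Changing the fields of `org` shows how the estimate scales: the `memberRefs` term still grows with apps × members per app, so the registry removes the per-object cost but not the per-reference cost.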