[PM-31939] Access Intelligence Documentation: Report Data Model Evolution (#18879)

* Add report-data-model-evolution document
* Change memberRefs to one record with flag for at risk or not
* Update model evolution doc
* Remove implementation section in favor of jira tracking
* Remove todo comment
* Add table of contents
2026-02-11 22:13:32 +00:00 · 2026-02-10 19:01:52 -06:00
parent 8f6cf67f8d
commit 7ccf1263a0
1 changed files with 807 additions and 0 deletions
--- a/bitwarden_license/bit-common/src/dirt/docs/access-intelligence/architecture/report-data-model-evolution.md
+++ b/bitwarden_license/bit-common/src/dirt/docs/access-intelligence/architecture/report-data-model-evolution.md
@@ -0,0 +1,807 @@
+# Report Data Model Evolution
+
+> **Purpose**: Document the old report data model (what's stored today), the updated model
+> from PR #17356 (merged, follows BW architecture), and the target model with the member
+> registry optimization. This is a reference for understanding why the report was 450MB+
+> and how the member registry solves it.
+
+---
+
+## Table of Contents
+
+1. [Current Storage Model](#1-current-storage-model-still-in-use--plain-interfaces-no-architecture)
+2. [Proposed View Models — Following BW Architecture](#2-proposed-view-models--following-bw-architecture-what-should-be-implemented-next)
+3. [Target Model — With Member Registry](#3-target-model--with-member-registry-what-were-building)
+4. [Storage Structure Comparison](#4-storage-structure-comparison)
+5. [Encryption Approaches (Current vs Future Options)](#5-encryption-approaches-current-vs-future-options)
+
+---
+
+## 1. Current Storage Model (Still In Use) — Plain interfaces, no architecture
+
+**Status:** This is what's stored in the database today. These are simple TypeScript interfaces/types with no domain/data/view/api layers. No encryption support in the types themselves. Services do all the filtering and transformation.
+
+**Note:** While PR #17356 introduced architecture patterns, the actual storage structure still uses these plain types directly. The proposed view models (Section 2) describe the architecture we should migrate to next.
+
+### ApplicationHealthReportDetail (the old report row)
+
+**Source:** `models/report-models.ts:78-88` (current implementation, still in use)
+
+**Current structure (with arrays):**
+
+```typescript
+// This is the main report model — one record per application (grouped by URI hostname)
+// Used directly by services and UI components
+export type ApplicationHealthReportDetail = {
+  applicationName: string; // hostname (e.g. "google.com")
+  passwordCount: number; // total ciphers for this app
+  atRiskPasswordCount: number; // ciphers with weak/reused/exposed passwords
+  cipherIds: CipherId[]; // IDs of all ciphers in this app - ARRAY
+  atRiskCipherIds: CipherId[]; // IDs of at-risk ciphers - ARRAY (subset of cipherIds)
+  memberCount: number; // count of unique members (redundant, = memberDetails.length)
+  atRiskMemberCount: number; // count of at-risk members (redundant, = atRiskMemberDetails.length)
+  memberDetails: MemberDetails[]; // ⚠️ FULL member objects repeated per app - ARRAY
+  atRiskMemberDetails: MemberDetails[]; // ⚠️ FULL member objects for at-risk only (subset of memberDetails) - ARRAY
+  // Members are deduplicated within a single app but NOT across apps.
+};
+```
+
+**Proposed structure (with Records for consistency):**
+
+```typescript
+export type ApplicationHealthReportDetail = {
+  applicationName: string;
+  passwordCount: number;
+  atRiskPasswordCount: number;
+  cipherRefs: Record<CipherId, boolean>; // true = at-risk, false = not at-risk (combines cipherIds + atRiskCipherIds)
+  memberCount: number; // could be removed (= Object.keys(memberRefs).length)
+  atRiskMemberCount: number; // could be removed (= count of true values in memberRefs)
+  memberRefs: Record<OrganizationUserId, boolean>; // true = at-risk, false = not at-risk (combines memberDetails + atRiskMemberDetails)
+};
+```
+
+**Benefits of Record pattern for ciphers:**
+
+- ✅ Combines `cipherIds` and `atRiskCipherIds` into single structure
+- ✅ No duplicate IDs (prevents data inconsistency)
+- ✅ O(1) lookup to check if cipher is at-risk
+- ✅ Consistent with `memberRefs` pattern
+- ✅ Saves space (~50 bytes per duplicate cipher ID in large orgs)
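
A minimal sketch of how the two legacy arrays could be folded into one `cipherRefs` record. The helper name `toCipherRefs` and the plain-string `CipherId` alias are assumptions for illustration only; the codebase's own `CipherId` type is not reproduced here.

```typescript
// Assumption: CipherId is treated as a plain string alias for this sketch.
type CipherId = string;

// Hypothetical helper: merges cipherIds and atRiskCipherIds into one record.
// true = at-risk, false = not at-risk.
function toCipherRefs(
  cipherIds: CipherId[],
  atRiskCipherIds: CipherId[],
): Record<CipherId, boolean> {
  const refs: Record<CipherId, boolean> = {};
  for (const id of cipherIds) {
    refs[id] = false;
  }
  // At-risk IDs overwrite the default. An ID that appears only in the
  // at-risk array is still recorded, so inconsistent input drops nothing.
  for (const id of atRiskCipherIds) {
    refs[id] = true;
  }
  return refs;
}

const cipherRefs = toCipherRefs(["c1", "c2", "c3"], ["c2"]);
const atRiskCount = Object.keys(cipherRefs).filter((id) => cipherRefs[id]).length;
```

Because each ID is a key, the at-risk check becomes a property read rather than an array scan, and the two counts can be derived from the record instead of being stored alongside it.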
+
+### MemberDetails (the old member model)
+
+**Source:** `models/report-models.ts:16-21` (current implementation, still in use)
+
+```typescript
+// Repeated in EVERY ApplicationHealthReportDetail that a member has access to
+// For a large org with 5,000 members accessing 200 apps → duplicated across apps
+export type MemberDetails = {
+  userGuid: string; // Organization user ID (UUID)
+  userName: string | null; // Display name
+  email: string; // Email address
+  cipherId: string; // ⚠️ Meaningless after deduplication (first cipher processed)
+};
+```
+
+### RiskInsightsData (the storage container)
+
+**Source:** `models/report-models.ts:121-128` (current implementation, still in use)
+
+**Current structure (with arrays):**
+
+**Rename to:** RiskInsights
+
+```typescript
+// The top-level container that is stored in the database
+// Each field is encrypted separately as an EncString
+export interface RiskInsightsData {
+  id: OrganizationReportId; // Report ID (generated by API)
+  creationDate: Date; // When report was generated
+  contentEncryptionKey: EncString; // Key used to encrypt report data
+  reportData: ApplicationHealthReportDetail[]; // ⚠️ Main payload - can be 700MB+
+  summaryData: OrganizationReportSummary; // Pre-computed aggregates (~1KB)
+  applicationData: OrganizationReportApplication[]; // Per-app settings (~10KB) - ARRAY with O(n) lookup
+}
+```
+
+**Proposed structure (with Records for O(1) lookup):**
+
+```typescript
+export interface RiskInsights {
+  id: OrganizationReportId;
+  creationDate: Date;
+  contentEncryptionKey: EncString;
+  reportData: ApplicationHealthReportDetail[]; // Array is still needed here for iteration
+  summaryData: OrganizationReportSummary;
+  applicationData: Record<string, { isCritical: boolean; reviewedDate: Date | null }>; // Record for O(1) lookup
+}
+```
+
+**Current encryption:** Each of `reportData`, `summaryData`, and `applicationData` is JSON.stringify'd and encrypted as a separate EncString. For large orgs, `reportData` is compressed before encryption to avoid WASM size limits.
+
+### OrganizationReportApplication (per-app user settings)
+
+**Source:** `models/report-models.ts:64-72` (current implementation, still in use)
+
+**Current (Array):** Stored as array with O(n) lookup (inefficient)
+
+**Rename to:** RiskInsightsApplication (If separate model is needed)
+
+```typescript
+// User-defined settings per application (critical flag, review date)
+// Stored in the report, carried over between report generations
+export type OrganizationReportApplication = {
+  applicationName: string; // hostname (e.g. "google.com")
+  isCritical: boolean; // user-defined critical flag
+  reviewedDate: Date | null; // null = new/unreviewed application
+};
+```
+
+**Proposed (Record):** Should be stored as Record for O(1) lookup
+
+```typescript
+// Key = applicationName (hostname)
+type ApplicationDataRecord = Record<string, {
+  isCritical: boolean;
+  reviewedDate: Date | null;
+}>;
+
+// Example:
+const applicationData: ApplicationDataRecord = {
+  "google.com": { isCritical: true, reviewedDate: new Date("2026-01-15") },
+  "github.com": { isCritical: false, reviewedDate: null }, // new/unreviewed
+  "slack.com": { isCritical: true, reviewedDate: new Date("2026-02-01") },
+};
+```
+
+**Problem with current array structure:**
+
+```typescript
+// Current inefficient O(n) lookup pattern found in code:
+getCriticalApplications(): RiskInsightsReportView[] {
+  return this.report.filter((app) => {
+    const appMeta = this.applications.find((a) => a.hostname === app.applicationName);  // O(n)!
+    return appMeta?.isCritical === true;
+  });
+}
+```
+
+**With Record (O(1) lookup):**
+
+```typescript
+getCriticalApplications(): RiskInsightsReportView[] {
+  return this.report.filter((app) => {
+    return this.applicationData[app.applicationName]?.isCritical === true;  // O(1)!
+  });
+}
+```
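
Moving from the array to the Record needs a one-time conversion of existing data. The sketch below shows one way that could look; `toApplicationDataRecord` is a hypothetical helper name, and the types are re-declared locally so the block stands alone.

```typescript
// Shapes mirror the types described in this document.
type OrganizationReportApplication = {
  applicationName: string;
  isCritical: boolean;
  reviewedDate: Date | null;
};

type ApplicationDataRecord = Record<
  string,
  { isCritical: boolean; reviewedDate: Date | null }
>;

// Hypothetical helper: converts the legacy array into a hostname-keyed record.
function toApplicationDataRecord(
  apps: OrganizationReportApplication[],
): ApplicationDataRecord {
  const record: ApplicationDataRecord = {};
  for (const app of apps) {
    // If the legacy array lists a hostname twice, the later entry wins.
    record[app.applicationName] = {
      isCritical: app.isCritical,
      reviewedDate: app.reviewedDate,
    };
  }
  return record;
}

const legacyApplications: OrganizationReportApplication[] = [
  { applicationName: "google.com", isCritical: true, reviewedDate: new Date("2026-01-15") },
  { applicationName: "github.com", isCritical: false, reviewedDate: null },
];
const convertedApplicationData = toApplicationDataRecord(legacyApplications);
const isGoogleCritical = convertedApplicationData["google.com"]?.isCritical === true;
```

Note that `JSON.stringify` serializes `Date` values as ISO strings, so any stored `reviewedDate` would need to be revived into a `Date` when the record is loaded.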
+
+### OrganizationReportSummary (pre-computed aggregates)
+
+**Source:** `models/report-models.ts:49-58` (current implementation, still in use)
+
+**Rename to:** RiskInsightsSummary
+
+```typescript
+// Pre-computed aggregates for summary cards and filtering
+// Recomputed when critical application markings change
+export type OrganizationReportSummary = {
+  totalMemberCount: number; // All members in org
+  totalApplicationCount: number; // All applications in report
+  totalAtRiskMemberCount: number; // Members with at-risk access
+  totalAtRiskApplicationCount: number; // Applications with at-risk ciphers
+  totalCriticalApplicationCount: number; // Applications marked critical
+  totalCriticalMemberCount: number; // Members with access to critical apps
+  totalCriticalAtRiskMemberCount: number; // Members with at-risk access to critical apps
+  totalCriticalAtRiskApplicationCount: number; // Critical apps with at-risk ciphers
+};
+```
+
+**Note:** When a user marks/unmarks an application as critical, the summary is recomputed. This is why `atRiskMemberDetails[]` is stored separately per application - it allows efficient recalculation of critical app summaries without reprocessing all cipher health data.
+
+### Why This Was 450MB+
+
+The core problem: **`MemberDetails` objects were fully duplicated per application**.
+
+Example for a large org:
+
+- 5,000 org members
+- 200 applications in the report
+- Each member might have access to 50+ applications
+- Each `MemberDetails` object ~200 bytes
+
+**Worst case**: 5,000 members × 50 apps × 200 bytes = ~50MB just for member data
+in `memberDetails[]` arrays. With `atRiskMemberDetails[]` duplicated alongside,
+plus cipher health data, this easily reached 450MB+.
+
+This caused:
+
+1. **WASM encryption panics** — the encrypted blob exceeded SDK size limits
+2. **Database storage limits** — even compressed, the JSON was too large for DB fields
+3. **Memory pressure** — holding this in a `BehaviorSubject` blocked the UI
+4. **Slow report generation** — building all these duplicated member arrays was O(n²)
+
+---
+
+## 2. Proposed View Models — Following BW Architecture (What Should Be Implemented Next)
+
+**Status:** PR #17356 laid groundwork for architecture patterns, but storage still uses plain types from Section 1. This section describes the view models that SHOULD be implemented to follow Bitwarden's 4-layer pattern: `Api → Data → Domain → View`
+
+**Important:** These models are NOT currently in use. They represent the target architecture we should migrate to, with query methods replacing facade/orchestrator filtering logic.
+
+### What's Stored (Current Implementation)
+
+The current implementation stores the **exact types from Section 1** above:
+
+- `ApplicationHealthReportDetail` - report rows (700MB+ for large orgs) - using ARRAYS
+- `OrganizationReportApplication` - per-app settings (~10KB) - using ARRAY
+- `OrganizationReportSummary` - aggregates (~1KB)
+
+These are stored in `RiskInsightsData` and encrypted as separate EncStrings:
+
+```typescript
+// What gets stored in the database today (using arrays):
+RiskInsightsData {
+  reportData: ApplicationHealthReportDetail[]    // ← JSON.stringify → EncString
+                                                  // Contains duplicate member objects across apps
+                                                  // Contains duplicate cipher IDs (cipherIds + atRiskCipherIds)
+  summaryData: OrganizationReportSummary         // ← JSON.stringify → EncString
+  applicationData: OrganizationReportApplication[] // ← JSON.stringify → EncString (array with O(n) lookup)
+  contentEncryptionKey: EncString
+  id: OrganizationReportId
+  creationDate: Date
+}
+```
+
+**Encryption approach:** Each field is JSON.stringify'd, optionally compressed (for `reportData` only, to avoid WASM limits), then encrypted with the `contentEncryptionKey`.
+
+**Problems with current structure:**
+
+- Member objects duplicated across applications (576MB for 10K org)
+- Cipher and member IDs duplicated in separate arrays (~61MB avoidable with a single Record per application)
+- ApplicationData requires O(n) find operations for every lookup
+
+### Proposed View Models (For Query Logic)
+
+The new architecture will introduce domain/view models with query methods. These are **NOT stored** - they're runtime transformations of the stored data.
+
+#### RiskInsightsView (proposed - replaces facade logic)
+
+```typescript
+class RiskInsightsView {
+  report: ApplicationHealthReportDetail[]; // Decrypted from storage
+  applications: OrganizationReportApplication[]; // Decrypted from storage
+  summary: OrganizationReportSummary; // Decrypted from storage
+  memberRegistry: MemberRegistry; // ← NEW: Built at load time
+  createdDate: Date;
+
+  // Query methods (replace current facade/orchestrator filtering):
+  getAtRiskMembers(): MemberRegistryEntry[];
+  getCriticalApplications(): ApplicationHealthReportDetail[];
+  getApplicationByHostname(hostname: string): ApplicationHealthReportDetail | undefined;
+  getNewApplications(): ApplicationHealthReportDetail[]; // reviewedDate === null
+  getSummary(): OrganizationReportSummary;
+}
+```
+
+**Note:** The view model will have query methods, but the underlying storage structure (Section 1) remains the same until we implement the member registry optimization (Section 3).
+
+---
+
+## 3. Target Model — With Member Registry (What We're Building)
+
+**Key optimization:** Replace duplicated `MemberDetails[]` arrays with lightweight member ID references that point into a shared `MemberRegistry`. By the estimates in "Size Impact" below, this reduces a 10K org report from ~786MB to ~150MB encrypted (81% reduction).
+
+**Storage changes:**
+
+- Store members ONCE in a registry (not per application)
+- Store only member IDs (userGuids) in application records
+- Remove meaningless `cipherId` field from member data
+- Combine `memberDetails` and `atRiskMemberDetails` into a single `Record` keyed by member ID, with a boolean at-risk flag
+
+### MemberRegistry (new — deduplicated member lookup)
+
+```typescript
+// Single source of truth for member data in a report
+// Stored once, referenced by member ID from every application that member appears in
+class MemberRegistry {
+  // Map from org user ID → full member entry
+  private entries: Map<OrganizationUserId, MemberRegistryEntry>;
+
+  get(id: OrganizationUserId): MemberRegistryEntry | undefined;
+  getAll(): MemberRegistryEntry[];
+  size(): number;
+}
+
+interface MemberRegistryEntry {
+  id: OrganizationUserId;
+  userName: string | null; // nullable, matching MemberDetails.userName
+  email: string;
+  // Any other member metadata needed by the UI
+}
+```
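
To make the deduplication concrete, here is a hedged sketch of converting legacy per-application member arrays into one shared registry plus per-application `memberRefs`. The function name `migrateMembers`, the local type names, and the plain-string `OrganizationUserId` alias are all assumptions for this example, not code from the repository.

```typescript
// Assumption: OrganizationUserId is a plain string alias in this sketch.
type OrganizationUserId = string;

type MemberDetails = {
  userGuid: string;
  userName: string | null;
  email: string;
  cipherId: string;
};

type LegacyApp = {
  applicationName: string;
  memberDetails: MemberDetails[];
  atRiskMemberDetails: MemberDetails[];
};

type RegistryEntry = { id: OrganizationUserId; userName: string | null; email: string };
type MigratedApp = { applicationName: string; memberRefs: Record<OrganizationUserId, boolean> };

// Hypothetical migration: members go into the registry once, apps keep only IDs.
function migrateMembers(apps: LegacyApp[]): {
  registry: Map<OrganizationUserId, RegistryEntry>;
  apps: MigratedApp[];
} {
  const registry = new Map<OrganizationUserId, RegistryEntry>();
  const register = (m: MemberDetails): void => {
    // cipherId is intentionally dropped; only identity fields are kept.
    if (!registry.has(m.userGuid)) {
      registry.set(m.userGuid, { id: m.userGuid, userName: m.userName, email: m.email });
    }
  };

  const migratedApps = apps.map((app) => {
    const memberRefs: Record<OrganizationUserId, boolean> = {};
    for (const m of app.memberDetails) {
      register(m);
      memberRefs[m.userGuid] = false;
    }
    for (const m of app.atRiskMemberDetails) {
      register(m);
      memberRefs[m.userGuid] = true; // at-risk flag overrides the default
    }
    return { applicationName: app.applicationName, memberRefs };
  });

  return { registry, apps: migratedApps };
}

const alice: MemberDetails = { userGuid: "abc", userName: "Alice", email: "alice@example.com", cipherId: "x" };
const bob: MemberDetails = { userGuid: "def", userName: null, email: "bob@example.com", cipherId: "y" };

const migrated = migrateMembers([
  { applicationName: "google.com", memberDetails: [alice, bob], atRiskMemberDetails: [alice] },
  { applicationName: "slack.com", memberDetails: [alice], atRiskMemberDetails: [] },
]);
```

In this example Alice appears in two applications but occupies a single registry entry; each application carries only her ID and a boolean.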
+
+### Member References (new — Record with at-risk flag)
+
+Instead of duplicating full member objects per application, each application stores member IDs as a `Record<string, boolean>`, where:
+
+- **Key** = member ID (userGuid)
+- **Value** = `true` if at-risk, `false` if not at-risk
+
+This provides:
+
+- **O(1) lookup** for checking membership and at-risk status
+- **Automatic deduplication** (can't have duplicate keys)
+- **Single source** for both member list and at-risk status
+- **No duplicate IDs** (previously stored in both memberDetails and atRiskMemberDetails)
+
+```typescript
+// Stored as a Record<string, boolean> where value indicates at-risk status
+type MemberRefs = Record<OrganizationUserId, boolean>;
+
+// Example:
+const memberRefs: MemberRefs = {
+  "abc-123": true, // at-risk member
+  "def-456": false, // not at-risk
+  "ghi-789": true, // at-risk member
+};
+```
+
+### Updated RiskInsightsReportView (with registry references)
+
+```typescript
+class RiskInsightsReportView {
+  applicationName: string;
+  passwordCount: number;
+  atRiskPasswordCount: number;
+  weakPasswordCount: number;
+  reusedPasswordCount: number;
+  exposedPasswordCount: number;
+
+  // OLD: memberDetails: MemberDetails[] + atRiskMemberDetails: MemberDetails[] (duplicated arrays)
+  // NEW: Single Record with at-risk flag
+  memberRefs: Record<OrganizationUserId, boolean>; // { "abc": true, "def": false, ... }
+
+  // OLD: cipherIds: CipherId[] + atRiskCipherIds: CipherId[] (duplicated arrays)
+  // NEW: Single Record with at-risk flag
+  cipherRefs: Record<CipherId, boolean>; // { "cipher-1": true, "cipher-2": false, ... }
+
+  // The registry is held by the parent RiskInsightsView
+  // View model methods resolve refs → full entries on demand:
+
+  getAllMembers(registry: MemberRegistry): MemberRegistryEntry[] {
+    return Object.keys(this.memberRefs)
+      .map((id) => registry.get(id as OrganizationUserId))
+      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
+  }
+
+  getAtRiskMembers(registry: MemberRegistry): MemberRegistryEntry[] {
+    return Object.entries(this.memberRefs)
+      .filter(([_, isAtRisk]) => isAtRisk)
+      .map(([id]) => registry.get(id as OrganizationUserId))
+      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
+  }
+
+  isAtRisk(): boolean {
+    return this.atRiskPasswordCount > 0;
+  }
+
+  hasMember(memberId: OrganizationUserId): boolean {
+    return memberId in this.memberRefs; // O(1) lookup
+  }
+
+  isMemberAtRisk(memberId: OrganizationUserId): boolean {
+    return this.memberRefs[memberId] === true; // O(1) lookup
+  }
+}
+```
+
+### Updated RiskInsightsView (parent, holds registry)
+
+```typescript
+class RiskInsightsView {
+  report: RiskInsightsReportView[];
+  applicationData: Record<string, { isCritical: boolean; reviewedDate: Date | null }>; // keyed by applicationName
+  summary: RiskInsightsSummaryView;
+  memberRegistry: MemberRegistry; // ← shared, deduplicated
+  createdDate: Date;
+
+  // Smart query methods — these replace facade/orchestrator filtering logic:
+
+  getAtRiskMembers(): MemberRegistryEntry[] {
+    // Deduplicate across all at-risk apps
+    const ids = new Set<OrganizationUserId>();
+    for (const app of this.report) {
+      if (app.isAtRisk()) {
+        // memberRefs is a Record, iterate entries and filter for at-risk (value === true)
+        Object.entries(app.memberRefs).forEach(([id, isAtRisk]) => {
+          if (isAtRisk) ids.add(id as OrganizationUserId);
+        });
+      }
+    }
+    return [...ids]
+      .map((id) => this.memberRegistry.get(id))
+      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
+  }
+
+  getCriticalApplications(): RiskInsightsReportView[] {
+    // OLD (O(n)): this.applications.find((a) => a.hostname === app.applicationName)
+    // NEW (O(1)): this.applicationData[app.applicationName]
+    return this.report.filter((app) => {
+      return this.applicationData[app.applicationName]?.isCritical === true;
+    });
+  }
+
+  getApplicationByHostname(hostname: string): RiskInsightsReportView | undefined {
+    return this.report.find((app) => app.applicationName === hostname);
+  }
+
+  getNewApplications(): RiskInsightsReportView[] {
+    // OLD (O(n)): this.applications.find((a) => a.hostname === app.applicationName)
+    // NEW (O(1)): this.applicationData[app.applicationName]
+    return this.report.filter((app) => {
+      return this.applicationData[app.applicationName]?.reviewedDate === null;
+    });
+  }
+
+  getSummary(): RiskInsightsSummaryView {
+    return this.summary;
+  }
+}
+```
+
+### Size Impact: Current vs Target
+
+#### Current (700MB+ for large orgs)
+
+**10K member org:**
+
+- `memberDetails`: 400 apps × 5,000 members × 180 bytes = **360MB**
+- `atRiskMemberDetails`: 400 apps × 3,000 members × 180 bytes = **216MB**
+- Cipher IDs + metadata: **~15MB**
+- **Total unencrypted: ~591MB**
+- **After encryption + Base64: ~786MB**
+
+#### Target (With Registry + Record Pattern)
+
+**10K member org:**
+
+- **MemberRegistry**: 10,000 members × 140 bytes (no cipherId) = **1.4MB** (stored once)
+- **memberRefs**: 400 apps × 5,000 refs × 50 bytes (Record entry: `"id": false/true`) = **100MB**
+  - No separate atRiskMemberRefs needed - at-risk status is the boolean value
+- **cipherRefs**: 400 apps × 100 ciphers × 50 bytes (Record entry: `"id": false/true`) = **2MB**
+  - No separate atRiskCipherIds array needed - at-risk status is the boolean value
+- **applicationData** (as Record): 400 apps × 100 bytes = **0.04MB** (negligible)
+- Metadata (counts, applicationName): **~10MB**
+- **Total unencrypted: ~113MB**
+- **After encryption + Base64: ~150MB**
+
+**Reduction: 786MB → 150MB = 81% smaller** 🎉
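
The arithmetic above can be reproduced in a few lines. Every input below is one of this document's own estimates for a 10K-member org; the `ENCODED_FACTOR` of 1.33 is an assumption inferred from the 591MB → 786MB figures rather than a measured value.

```typescript
// All inputs are the document's estimates for a 10K-member org.
const MB = 1_000_000;
const appCount = 400;

const currentBytes =
  appCount * 5_000 * 180 + // memberDetails
  appCount * 3_000 * 180 + // atRiskMemberDetails
  15 * MB; // cipher IDs + metadata

const targetBytes =
  10_000 * 140 + // MemberRegistry, stored once
  appCount * 5_000 * 50 + // memberRefs
  appCount * 100 * 50 + // cipherRefs
  appCount * 100 + // applicationData
  10 * MB; // metadata

// Assumption: ~1.33x growth from encryption + Base64, inferred from the text.
const ENCODED_FACTOR = 1.33;
const currentEncodedMB = Math.round((currentBytes * ENCODED_FACTOR) / MB);
const targetEncodedMB = Math.round((targetBytes * ENCODED_FACTOR) / MB);

// The encoding factor cancels out, so the ratio holds before and after encryption.
const reductionPercent = Math.round((1 - targetBytes / currentBytes) * 100);
```

With these inputs the target comes to about 113MB unencrypted and rounds to roughly 151MB encoded, in line with the ~150MB quoted above.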
+
+**Design Decision:** Use a single `Record<string, boolean>` each for members and ciphers, and a Record for applicationData:
+
+- **memberRefs:** No duplicate member IDs, ~60MB saved (vs a separate at-risk member ID list)
+- **cipherRefs:** No duplicate cipher IDs, ~1MB saved (vs separate atRiskCipherIds array: 400 apps × 50 at-risk IDs × 50 bytes)
+- **applicationData:** O(1) lookup, no functional size change but better performance
+- **Trade-off:** Definitely worth it - saves ~61MB and prevents duplicate storage
+
+---
+
+## 4. Storage Structure Comparison
+
+### Current Storage (What's in DB Today)
+
+```typescript
+// Stored as RiskInsightsData in database
+{
+  id: OrganizationReportId,
+  creationDate: Date,
+  contentEncryptionKey: EncString,
+
+  // ENCRYPTED FIELD 1: reportData (~700MB for large orgs)
+  reportData: [
+    {
+      applicationName: "google.com",
+      cipherIds: ["cipher-id-1", "cipher-id-2", ...],         // ~100 ciphers - ARRAY
+      atRiskCipherIds: ["cipher-id-1", ...],                  // ~50 at-risk - ARRAY (duplicates IDs from cipherIds)
+      memberDetails: [                                         // ~5,000 members - ARRAY
+        { userGuid: "abc", userName: "Alice", email: "alice@...", cipherId: "x" },
+        { userGuid: "def", userName: "Bob", email: "bob@...", cipherId: "y" },
+        // ... FULL member objects, deduplicated per app, duplicated across apps
+      ],
+      atRiskMemberDetails: [                                   // ~3,000 at-risk members - ARRAY (duplicates from memberDetails)
+        { userGuid: "abc", userName: "Alice", email: "alice@...", cipherId: "x" },
+        // ... FULL member objects (subset of memberDetails)
+      ],
+      passwordCount: 100,
+      atRiskPasswordCount: 50,
+      memberCount: 5000,
+      atRiskMemberCount: 3000
+    },
+    // ... 400 applications
+  ],
+
+  // ENCRYPTED FIELD 2: applicationData (~10KB) - ARRAY with O(n) lookup
+  applicationData: [
+    { applicationName: "google.com", isCritical: true, reviewedDate: Date | null },
+    // ... 400 applications
+  ],
+
+  // ENCRYPTED FIELD 3: summaryData (~1KB)
+  summaryData: {
+    totalMemberCount: 10000,
+    totalApplicationCount: 400,
+    totalAtRiskMemberCount: 6000,
+    totalAtRiskApplicationCount: 300,
+    totalCriticalApplicationCount: 50,
+    totalCriticalMemberCount: 8000,
+    totalCriticalAtRiskMemberCount: 4500,
+    totalCriticalAtRiskApplicationCount: 40
+  }
+}
+```
+
+**Total size:** ~786MB encrypted for 10K member org
+**Problem:** Member data duplicated across applications (360MB + 216MB = 576MB just for members)
+
+---
+
+### Target Storage (With Member Registry)
+
+```typescript
+// Stored as RiskInsightsData in database
+{
+  id: OrganizationReportId,
+  creationDate: Date,
+  contentEncryptionKey: EncString,
+
+  // NEW: ENCRYPTED FIELD 0: memberRegistry (~1.4MB for 10K members)
+  memberRegistry: {
+    "abc": { userGuid: "abc", userName: "Alice", email: "alice@..." },
+    "def": { userGuid: "def", userName: "Bob", email: "bob@..." },
+    // ... 10,000 members stored ONCE
+  },
+
+  // ENCRYPTED FIELD 1: reportData (~116MB for 10K org - 80% reduction!)
+  reportData: [
+    {
+      applicationName: "google.com",
+      cipherRefs: {                                            // ~100 cipher IDs with at-risk flag - RECORD
+        "cipher-id-1": true,   // at-risk
+        "cipher-id-2": false,  // not at-risk
+        "cipher-id-3": true,   // at-risk
+        // ... (no separate atRiskCipherIds array needed)
+      },
+      memberRefs: {                                            // ~5,000 member IDs with at-risk flag - RECORD
+        "abc": true,   // at-risk member
+        "def": false,  // not at-risk
+        "ghi": true,   // at-risk member
+        // ... (no separate atRiskMemberRefs array needed)
+      },
+      passwordCount: 100,
+      atRiskPasswordCount: 50,
+      memberCount: 5000,
+      atRiskMemberCount: 3000
+    },
+    // ... 400 applications
+  ],
+
+  // ENCRYPTED FIELD 2: applicationData (~10KB) - RECORD with O(1) lookup
+  applicationData: {
+    "google.com": { isCritical: true, reviewedDate: new Date("2026-01-15") },
+    "github.com": { isCritical: false, reviewedDate: null },
+    "slack.com": { isCritical: true, reviewedDate: new Date("2026-02-01") }
+    // ... 400 applications as Record entries
+  },
+
+  // ENCRYPTED FIELD 3: summaryData (~1KB - unchanged)
+  summaryData: {
+    totalMemberCount: 10000,
+    totalApplicationCount: 400,
+    totalAtRiskMemberCount: 6000,
+    totalAtRiskApplicationCount: 300,
+    totalCriticalApplicationCount: 50,
+    totalCriticalMemberCount: 8000,
+    totalCriticalAtRiskMemberCount: 4500,
+    totalCriticalAtRiskApplicationCount: 40
+  }
+}
+```
+
+**Total size:** ~150MB encrypted for 10K member org (81% reduction)
+**Benefits:**
+
+- Members stored once in registry, referenced by ID from applications
+- Member and cipher IDs stored with at-risk flag (no duplicate arrays)
+- ApplicationData as Record enables O(1) lookup instead of O(n) find operations
+
+---
+
+### Design Decision: Single Record with Boolean Flag (for Members AND Ciphers)
+
+**Chosen approach:** Use single `Record<string, boolean>` where the boolean indicates at-risk status for BOTH members and ciphers.
+
+```typescript
+{
+  applicationName: "google.com",
+  memberRefs: {
+    "abc": true,   // at-risk member
+    "def": false,  // not at-risk
+    "ghi": true    // at-risk member
+  },
+  cipherRefs: {
+    "cipher-1": true,   // at-risk cipher
+    "cipher-2": false,  // not at-risk
+    "cipher-3": true    // at-risk cipher
+  }
+}
+```
+
+**Pros:**
+
+- ✅ **Members:** No duplicate IDs (previously stored in both memberDetails AND atRiskMemberDetails)
+- ✅ **Ciphers:** No duplicate IDs (previously stored in both cipherIds AND atRiskCipherIds)
+- ✅ O(1) lookup for both membership/presence and at-risk status
+- ✅ Automatic deduplication (can't have duplicate keys)
+- ✅ Saves ~60MB for members + ~1MB for ciphers = **~61MB saved** compared to separate arrays
+- ✅ Clear intent - one ID, one entry, one flag
+- ✅ Consistent pattern across both members and ciphers
+
+**Cons:**
+
+- ⚠️ Slightly more complex iteration (need to check boolean value when filtering at-risk)
+- ⚠️ Summary recalculation requires iterating entries instead of just counting keys
+
+**Trade-off Analysis:**
+
+- **Member size savings:** ~60MB (400 apps × 3K at-risk IDs × 50 bytes per duplicate entry)
+- **Cipher size savings:** ~1MB (400 apps × 50 at-risk IDs × 50 bytes per duplicate entry)
+- **Total savings:** ~61MB for 10K org
+- **Performance:** Negligible - `Object.entries().filter()` is still O(n) like array iteration
+- **Correctness:** Better - impossible to have ID in at-risk array but not in main array
+
+**Verdict:** Single Record with boolean flag is the clear winner for both members AND ciphers.
+
+---
+
+## 5. Encryption Approaches (Current vs Future Options)
+
+### Current: Encrypt Whole Objects (No Compression)
+
+**How it works:**
+
+1. `JSON.stringify(reportData)` → encrypt → EncString (~700MB for large orgs)
+2. `JSON.stringify(summaryData)` → encrypt → EncString (~1KB)
+3. `JSON.stringify(applicationData)` → encrypt → EncString (~10KB)
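
The three steps above amount to the same stringify-then-encrypt call applied to each top-level field. The sketch below illustrates only that shape: `EncryptFn` is a hypothetical stand-in for the SDK's encryption call (which returns an `EncString`, not a plain string), and `fakeEncrypt` is a toy placeholder that performs no encryption at all.

```typescript
// Hypothetical stand-in for the real encryption call.
type EncryptFn = (plaintext: string) => string;

type PlainReport = {
  reportData: unknown[];
  summaryData: object;
  applicationData: unknown[];
};

type StoredReport = {
  reportData: string;
  summaryData: string;
  applicationData: string;
};

// Each top-level field is serialized and encrypted as its own blob.
function encryptReportFields(report: PlainReport, encrypt: EncryptFn): StoredReport {
  return {
    reportData: encrypt(JSON.stringify(report.reportData)),
    summaryData: encrypt(JSON.stringify(report.summaryData)),
    applicationData: encrypt(JSON.stringify(report.applicationData)),
  };
}

// Toy placeholder for demonstration only: this is NOT encryption.
const fakeEncrypt: EncryptFn = (plaintext) => `enc(${plaintext.length})`;

const storedReport = encryptReportFields(
  {
    reportData: [{ applicationName: "google.com" }],
    summaryData: { totalApplicationCount: 1 },
    applicationData: [],
  },
  fakeEncrypt,
);
```

Because the three blobs are independent, the small `summaryData` blob can be decrypted without touching `reportData`, but nothing inside `reportData` can be read without decrypting all of it.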
+
+**Stored structure:**
+
+```typescript
+{
+  reportData: EncString,      // ← Entire reportData[] array as one encrypted blob
+  summaryData: EncString,     // ← Entire summary object as one encrypted blob
+  applicationData: EncString  // ← Entire applicationData[] array as one encrypted blob
+  contentEncryptionKey: EncString,
+  id: OrganizationReportId,
+  creationDate: Date
+}
+```
+
+**Problems:**
+
+- For large orgs (700MB+), may approach or exceed WASM encryption limits
+- Must decrypt entire report to access any application
+- Can't do field-level encryption with this approach
+
+---
+
+### Option 1: Encrypt Per Top-Level Field (Current Approach)
+
+Encrypt `reportData`, `summaryData`, `applicationData` as separate EncStrings.
+
+**Pros:**
+
+- ✅ Allows decrypting summary without decrypting full report
+- ✅ Simple encryption logic
+- ✅ Separates metadata (summary, applicationData) from payload (reportData)
+
+**Cons:**
+
+- ❌ Can't access individual applications without decrypting entire report
+- ❌ Can't do field-level encryption
+- ❌ May hit WASM limits for very large orgs (700MB+ unencrypted)
+
+**Status:** This is what we have today.
+
+---
+
+### Option 2: True Field-Level Encryption (Ideal)
+
+**Each field** within each object is encrypted separately, preserving JSON structure:
+
+```typescript
+{
+  memberRegistry: {
+    "abc": {
+      userGuid: EncString("abc"),
+      userName: EncString("Alice"),
+      email: EncString("alice@...")
+    },
+    "def": { /* ... */ }
+  },
+  reportData: [
+    {
+      applicationName: EncString("google.com"),
+      cipherIds: [EncString("id1"), EncString("id2"), ...],
+      memberRefs: [EncString("abc"), EncString("def"), ...],
+      atRiskMemberRefs: [EncString("abc"), ...],
+      passwordCount: EncString("100"),
+      atRiskPasswordCount: EncString("50"),
+      memberCount: EncString("5000"),
+      atRiskMemberCount: EncString("3000")
+    },
+    // ... each application
+  ],
+  summaryData: {
+    totalMemberCount: EncString("10000"),
+    totalApplicationCount: EncString("400"),
+    // ... each field encrypted
+  },
+  applicationData: [
+    {
+      applicationName: EncString("google.com"),
+      isCritical: EncString("true"),
+      reviewedDate: EncString("2026-02-10")
+    },
+    // ... each application
+  ]
+}
+```
+
+**Pros:**
+
+- ✅ Can decrypt individual fields on-demand
+- ✅ Can access single application without decrypting all
+- ✅ Each field is small enough for SDK (no size limits)
+- ✅ Better for partial updates (re-encrypt only changed fields)
+- ✅ Aligns with Bitwarden's data model architecture
+
+**Cons:**
+
+- ❌ More complex encryption/decryption logic
+- ❌ Slightly larger overhead (each EncString has IV + metadata ~20 bytes)
+- ❌ Requires updating all encryption/decryption code paths
+
+**Status:** **Can be implemented alongside member registry.** The member registry will reduce report size (making this easier), but field-level encryption is not blocked by it.
+
+**Size estimate with field-level encryption:**
+
+- Member registry (10K members): ~1.4MB unencrypted → ~2MB encrypted (each field encrypted)
+- Report data: ~116MB unencrypted → ~145MB encrypted (overhead from EncString metadata)
+- **Total: ~147MB** (vs ~154MB with whole-object encryption)
+
+By these estimates the per-field EncString overhead is small relative to the total, and field-level encryption enables partial decryption and avoids WASM limits.
+
+---
+
+### Option 3: Compress Then Encrypt (Draft PR, Want to Avoid)
+
+Compress `reportData` before encrypting. `summaryData` and `applicationData` remain uncompressed (TBD if `applicationData` needs compression).
+
+**How it would work:**
+
+1. `JSON.stringify(reportData)` → compress with pako → encrypt → EncString
+2. `JSON.stringify(summaryData)` → encrypt → EncString (no compression)
+3. `JSON.stringify(applicationData)` → encrypt → EncString (compression TBD)
+
+**Pros:**
+
+- ✅ Stored object is as small as we can get it (compression reduces size by ~70%)
+- ✅ Works for very large orgs without hitting WASM limits
+
+**Cons:**
+
+- ❌ Can't decrypt summary without decompressing everything if whole object is compressed
+- ❌ Makes field-level encryption impossible (can't decrypt individual fields from compressed blob)
+- ❌ More complex decryption logic (decrypt → decompress)
+- ❌ Not the direction we want to go architecturally
+
+**Decision:** **Avoid if possible.** This was explored in a draft PR as a workaround, but we'd prefer to implement member registry (reduces size without compression) and move toward field-level encryption (Option 2).