mirror of https://github.com/bitwarden/browser synced 2026-02-11 22:13:32 +00:00

[PM-31939] Access Intelligence Documentation: Report Data Model Evolution (#18879)

* Add report-data-model-evolution document

* Change memberRefs to one record with flag for at risk or not

* Update model evolution doc

* Remove implementation section in favor of jira tracking

* Remove todo comment

* Add table of contents
Leslie Tilton
2026-02-10 19:01:52 -06:00
committed by GitHub
parent 8f6cf67f8d
commit 7ccf1263a0

# Report Data Model Evolution
> **Purpose**: Document the old report data model (what's stored today), the proposed view models
> that follow BW architecture (building on the groundwork merged in PR #17356), and the target model
> with the member registry optimization. This is a reference for understanding why the report grew
> to 450MB+ and how the member registry solves it.
---
## Table of Contents
1. [Current Storage Model](#1-current-storage-model-still-in-use--plain-interfaces-no-architecture)
2. [Proposed View Models — Following BW Architecture](#2-proposed-view-models--following-bw-architecture-what-should-be-implemented-next)
3. [Target Model — With Member Registry](#3-target-model--with-member-registry-what-were-building)
4. [Storage Structure Comparison](#4-storage-structure-comparison)
5. [Encryption Approaches (Current vs Future Options)](#5-encryption-approaches-current-vs-future-options)
---
## 1. Current Storage Model (Still In Use) — Plain interfaces, no architecture
**Status:** This is what's stored in the database today. These are simple TypeScript interfaces/types with no domain/data/view/api layers. No encryption support in the types themselves. Services do all the filtering and transformation.
**Note:** While PR #17356 introduced architecture patterns, the actual storage structure still uses these plain types directly. The proposed view models (Section 2) describe the architecture we should migrate to next.
### ApplicationHealthReportDetail (the old report row)
**Source:** `models/report-models.ts:78-88` (current implementation, still in use)
**Current structure (with arrays):**
```typescript
// This is the main report model — one record per application (grouped by URI hostname)
// Used directly by services and UI components
export type ApplicationHealthReportDetail = {
applicationName: string; // hostname (e.g. "google.com")
passwordCount: number; // total ciphers for this app
atRiskPasswordCount: number; // ciphers with weak/reused/exposed passwords
cipherIds: CipherId[]; // IDs of all ciphers in this app - ARRAY
atRiskCipherIds: CipherId[]; // IDs of at-risk ciphers - ARRAY (subset of cipherIds)
memberCount: number; // count of unique members (redundant, = memberDetails.length)
atRiskMemberCount: number; // count of at-risk members (redundant, = atRiskMemberDetails.length)
memberDetails: MemberDetails[]; // ⚠️ FULL member objects repeated per app - ARRAY
atRiskMemberDetails: MemberDetails[]; // ⚠️ FULL member objects for at-risk only (subset of memberDetails) - ARRAY
// Members are deduplicated within a single app but NOT across apps.
};
```
**Proposed structure (with Records for consistency):**
```typescript
export type ApplicationHealthReportDetail = {
applicationName: string;
passwordCount: number;
atRiskPasswordCount: number;
cipherRefs: Record<CipherId, boolean>; // true = at-risk, false = not at-risk (combines cipherIds + atRiskCipherIds)
memberCount: number; // could be removed (= Object.keys(memberRefs).length)
atRiskMemberCount: number; // could be removed (= count of true values in memberRefs)
memberRefs: Record<OrganizationUserId, boolean>; // true = at-risk, false = not at-risk (combines memberDetails + atRiskMemberDetails)
};
```
**Benefits of Record pattern for ciphers:**
- ✅ Combines `cipherIds` and `atRiskCipherIds` into single structure
- ✅ No duplicate IDs (prevents data inconsistency)
- ✅ O(1) lookup to check if cipher is at-risk
- ✅ Consistent with `memberRefs` pattern
- ✅ Saves space (~50 bytes per duplicate cipher ID in large orgs)
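A minimal sketch of how the two legacy arrays could be folded into one Record, and how the redundant count fields could be derived from it instead of being stored. `toRefs` and `countRefs` are hypothetical helper names, and plain strings stand in for Bitwarden's `CipherId`/`OrganizationUserId` types; the same helpers would apply to `memberRefs`.

```typescript
// Stand-in for Bitwarden's opaque ID types (assumption: plain strings here).
type Id = string;

// Hypothetical helper: fold an "all IDs" array and its "at-risk IDs" subset
// into one Record whose value flags at-risk status.
function toRefs(allIds: Id[], atRiskIds: Id[]): Record<Id, boolean> {
  const refs: Record<Id, boolean> = {};
  for (const id of allIds) {
    refs[id] = false; // duplicate IDs collapse onto a single key
  }
  for (const id of atRiskIds) {
    refs[id] = true; // applied second, so every at-risk ID is also a key
  }
  return refs;
}

// Hypothetical helper: recover the totals that the stored count fields hold today.
function countRefs(refs: Record<Id, boolean>): { total: number; atRisk: number } {
  const flags = Object.values(refs);
  return { total: flags.length, atRisk: flags.filter((isAtRisk) => isAtRisk).length };
}

const cipherRefs = toRefs(["c1", "c2", "c3", "c1"], ["c1", "c3"]);
console.log(cipherRefs); // { c1: true, c2: false, c3: true }
console.log(countRefs(cipherRefs)); // { total: 3, atRisk: 2 }
```

Applying the at-risk IDs after the full list means the Record can never flag an ID as at-risk without also listing it, which is the consistency guarantee described above.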
### MemberDetails (the old member model)
**Source:** `models/report-models.ts:16-21` (current implementation, still in use)
```typescript
// Repeated in EVERY ApplicationHealthReportDetail that a member has access to
// For a large org with 5,000 members accessing 200 apps → duplicated across apps
export type MemberDetails = {
userGuid: string; // Organization user ID (UUID)
userName: string | null; // Display name
email: string; // Email address
cipherId: string; // ⚠️ Meaningless after deduplication (first cipher processed)
};
```
### RiskInsightsData (the storage container)
**Source:** `models/report-models.ts:121-128` (current implementation, still in use)
**Rename to:** RiskInsights
**Current structure (with arrays):**
```typescript
// The top-level container that is stored in the database
// Each field is encrypted separately as an EncString
export interface RiskInsightsData {
id: OrganizationReportId; // Report ID (generated by API)
creationDate: Date; // When report was generated
contentEncryptionKey: EncString; // Key used to encrypt report data
reportData: ApplicationHealthReportDetail[]; // ⚠️ Main payload - can be 700MB+
summaryData: OrganizationReportSummary; // Pre-computed aggregates (~1KB)
applicationData: OrganizationReportApplication[]; // Per-app settings (~10KB) - ARRAY with O(n) lookup
}
```
**Proposed structure (with Records for O(1) lookup):**
```typescript
export interface RiskInsights {
id: OrganizationReportId;
creationDate: Date;
contentEncryptionKey: EncString;
reportData: ApplicationHealthReportDetail[]; // Array is still needed here for iteration
summaryData: OrganizationReportSummary;
applicationData: Record<string, { isCritical: boolean; reviewedDate: Date | null }>; // Record for O(1) lookup
}
```
**Current encryption:** Each of `reportData`, `summaryData`, and `applicationData` is `JSON.stringify`'d and encrypted as a separate EncString. Compressing `reportData` before encryption to avoid WASM size limits has been explored in a draft PR but is not the current approach (see Option 3 in Section 5).
### OrganizationReportApplication (per-app user settings)
**Source:** `models/report-models.ts:64-72` (current implementation, still in use)
**Current (Array):** Stored as array with O(n) lookup (inefficient)
**Rename to:** RiskInsightsApplication (if a separate model is needed)
```typescript
// User-defined settings per application (critical flag, review date)
// Stored in the report, carried over between report generations
export type OrganizationReportApplication = {
applicationName: string; // hostname (e.g. "google.com")
isCritical: boolean; // user-defined critical flag
reviewedDate: Date | null; // null = new/unreviewed application
};
```
**Proposed (Record):** Should be stored as Record for O(1) lookup
```typescript
// Key = applicationName (hostname)
type ApplicationDataRecord = Record<string, { isCritical: boolean; reviewedDate: Date | null }>;

// Example:
const applicationData: ApplicationDataRecord = {
  "google.com": { isCritical: true, reviewedDate: new Date("2026-01-15") },
  "github.com": { isCritical: false, reviewedDate: null }, // new/unreviewed
  "slack.com": { isCritical: true, reviewedDate: new Date("2026-02-01") },
};
```
**Problem with current array structure:**
```typescript
// Current inefficient O(n) lookup pattern found in code:
getCriticalApplications(): RiskInsightsReportView[] {
return this.report.filter((app) => {
const appMeta = this.applications.find((a) => a.hostname === app.applicationName); // O(n)!
return appMeta?.isCritical === true;
});
}
```
**With Record (O(1) lookup):**
```typescript
getCriticalApplications(): RiskInsightsReportView[] {
return this.report.filter((app) => {
return this.applicationData[app.applicationName]?.isCritical === true; // O(1)!
});
}
```
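The array-to-Record change can be done once, when the decrypted `applicationData` is loaded. A minimal sketch, assuming the `OrganizationReportApplication` shape above; `toApplicationDataRecord` is a hypothetical name:

```typescript
// Local copy of the legacy per-app settings shape shown above.
type OrganizationReportApplication = {
  applicationName: string;
  isCritical: boolean;
  reviewedDate: Date | null;
};
type ApplicationSettings = { isCritical: boolean; reviewedDate: Date | null };

// Hypothetical one-time conversion: array (O(n) find per lookup) → Record keyed by hostname.
// If the same hostname appears twice, the last entry wins.
function toApplicationDataRecord(
  apps: OrganizationReportApplication[],
): Record<string, ApplicationSettings> {
  const record: Record<string, ApplicationSettings> = {};
  for (const { applicationName, isCritical, reviewedDate } of apps) {
    record[applicationName] = { isCritical, reviewedDate };
  }
  return record;
}

const applicationData = toApplicationDataRecord([
  { applicationName: "google.com", isCritical: true, reviewedDate: new Date("2026-01-15") },
  { applicationName: "github.com", isCritical: false, reviewedDate: null },
]);

// Each lookup is now a property access instead of Array.prototype.find.
console.log(applicationData["google.com"]?.isCritical); // true
console.log(applicationData["slack.com"]?.isCritical); // undefined (not in the record)
```

One caveat when this Record is read back from storage: `JSON.parse` does not revive dates, so `reviewedDate` comes back as an ISO string and would need converting to a `Date` (or being kept as a string).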
### OrganizationReportSummary (pre-computed aggregates)
**Source:** `models/report-models.ts:49-58` (current implementation, still in use)
**Rename to:** RiskInsightsSummary
```typescript
// Pre-computed aggregates for summary cards and filtering
// Recomputed when critical application markings change
export type OrganizationReportSummary = {
totalMemberCount: number; // All members in org
totalApplicationCount: number; // All applications in report
totalAtRiskMemberCount: number; // Members with at-risk access
totalAtRiskApplicationCount: number; // Applications with at-risk ciphers
totalCriticalApplicationCount: number; // Applications marked critical
totalCriticalMemberCount: number; // Members with access to critical apps
totalCriticalAtRiskMemberCount: number; // Members with at-risk access to critical apps
totalCriticalAtRiskApplicationCount: number; // Critical apps with at-risk ciphers
};
```
**Note:** When a user marks/unmarks an application as critical, the summary is recomputed. This is why `atRiskMemberDetails[]` is stored separately per application - it allows efficient recalculation of critical app summaries without reprocessing all cipher health data.
### Why This Was 450MB+
The core problem: **`MemberDetails` objects were fully duplicated per application**.
Example for a large org:
- 5,000 org members
- 200 applications in the report
- Each member might have access to 50+ applications
- Each `MemberDetails` object ~200 bytes
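**At 50 apps per member**: 5,000 members × 50 apps × 200 bytes = ~50MB just for member data
in `memberDetails[]` arrays. **Worst case**, where every member can access all 200 applications:
5,000 × 200 × 200 bytes = ~200MB, and `atRiskMemberDetails[]` can duplicate up to the same
amount again. Add cipher health data and the report easily reached 450MB+.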
This caused:
1. **WASM encryption panics** — the encrypted blob exceeded SDK size limits
2. **Database storage limits** — even compressed, the JSON was too large for DB fields
3. **Memory pressure** — holding this in a `BehaviorSubject` blocked the UI
4. **Slow report generation** — building all these duplicated member arrays was O(n²)
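The member-duplication arithmetic above can be re-run for other org shapes with a one-line formula. A sketch using the document's ~200-byte `MemberDetails` estimate and decimal megabytes; `duplicatedMemberBytes` is a hypothetical helper:

```typescript
// Bytes spent on member objects when each one is copied into every app the member can access.
function duplicatedMemberBytes(members: number, appsPerMember: number, bytesPerMember: number): number {
  return members * appsPerMember * bytesPerMember;
}

const MB = 1_000_000; // decimal megabytes, matching the figures above

console.log(duplicatedMemberBytes(5_000, 50, 200) / MB); // 50  (50 apps per member)
console.log(duplicatedMemberBytes(5_000, 200, 200) / MB); // 200 (every member in all 200 apps)
```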
---
## 2. Proposed View Models — Following BW Architecture (What Should Be Implemented Next)
**Status:** PR #17356 laid groundwork for architecture patterns, but storage still uses plain types from Section 1. This section describes the view models that SHOULD be implemented to follow Bitwarden's 4-layer pattern: `Api → Data → Domain → View`
**Important:** These models are NOT currently in use. They represent the target architecture we should migrate to, with query methods replacing facade/orchestrator filtering logic.
### What's Stored (Current Implementation)
The current implementation stores the **exact types from Section 1** above:
- `ApplicationHealthReportDetail` - report rows (700MB+ for large orgs) - using ARRAYS
- `OrganizationReportApplication` - per-app settings (~10KB) - using ARRAY
- `OrganizationReportSummary` - aggregates (~1KB)
These are stored in `RiskInsightsData` and encrypted as separate EncStrings:
```typescript
// What gets stored in the database today (using arrays):
RiskInsightsData {
reportData: ApplicationHealthReportDetail[] // ← JSON.stringify → EncString
// Contains duplicate member objects across apps
// Contains duplicate cipher IDs (cipherIds + atRiskCipherIds)
summaryData: OrganizationReportSummary // ← JSON.stringify → EncString
applicationData: OrganizationReportApplication[] // ← JSON.stringify → EncString (array with O(n) lookup)
contentEncryptionKey: EncString
id: OrganizationReportId
creationDate: Date
}
```
**Encryption approach:** Each field is `JSON.stringify`'d and then encrypted with the `contentEncryptionKey`. Compressing `reportData` first (to stay under WASM limits) has only been explored in a draft PR; see Option 3 in Section 5.
**Problems with current structure:**
- Member objects duplicated across applications (576MB for 10K org)
- Cipher and member IDs duplicated in separate arrays (~61MB wasted)
- ApplicationData requires O(n) find operations for every lookup
### Proposed View Models (For Query Logic)
The new architecture will introduce domain/view models with query methods. These are **NOT stored** - they're runtime transformations of the stored data.
#### RiskInsightsView (proposed - replaces facade logic)
```typescript
class RiskInsightsView {
report: ApplicationHealthReportDetail[]; // Decrypted from storage
applications: OrganizationReportApplication[]; // Decrypted from storage
summary: OrganizationReportSummary; // Decrypted from storage
memberRegistry: MemberRegistry; // ← NEW: Built at load time
createdDate: Date;
// Query methods (replace current facade/orchestrator filtering):
getAtRiskMembers(): MemberRegistryEntry[];
getCriticalApplications(): ApplicationHealthReportDetail[];
getApplicationByHostname(hostname: string): ApplicationHealthReportDetail | undefined;
getNewApplications(): ApplicationHealthReportDetail[]; // reviewedDate === null
getSummary(): OrganizationReportSummary;
}
```
**Note:** The view model will have query methods, but the underlying storage structure (Section 1) remains the same until we implement the member registry optimization (Section 3).

---
## 3. Target Model — With Member Registry (What We're Building)
**Key optimization:** Replace duplicated `MemberDetails[]` arrays with lightweight member ID references that point into a shared `MemberRegistry`. This reduces a 10K org report from ~786MB to ~150MB (~81% reduction).
**Storage changes:**
- Store members ONCE in a registry (not per application)
- Store only member IDs (userGuids) in application records
- Remove meaningless `cipherId` field from member data
- Combine `memberDetails` and `atRiskMemberDetails` into a single `memberRefs` Record whose boolean value marks at-risk status (the same pattern replaces `cipherIds` and `atRiskCipherIds` with `cipherRefs`)
### MemberRegistry (new — deduplicated member lookup)
```typescript
// Single source of truth for member data in a report
// Stored once, referenced by ID from every application that member appears in
class MemberRegistry {
// Map from org user ID → full member entry
private entries: Map<OrganizationUserId, MemberRegistryEntry>;
get(id: OrganizationUserId): MemberRegistryEntry | undefined;
getAll(): MemberRegistryEntry[];
size(): number;
}
interface MemberRegistryEntry {
id: OrganizationUserId;
userName: string;
email: string;
// Any other member metadata needed by the UI
}
```
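The registry API above maps directly onto a `Map`. A minimal Map-backed sketch; the constructor and the `fromRecord` loader are assumptions not in the original design, and `userName` is widened to `string | null` to match `MemberDetails`:

```typescript
// Plain-string stand-in for Bitwarden's OrganizationUserId (assumption).
type OrganizationUserId = string;

interface MemberRegistryEntry {
  id: OrganizationUserId;
  userName: string | null; // null allowed, mirroring MemberDetails
  email: string;
}

class MemberRegistry {
  private readonly entries: Map<OrganizationUserId, MemberRegistryEntry>;

  // Hypothetical constructor: later duplicates overwrite earlier ones, so each ID appears once.
  constructor(members: MemberRegistryEntry[] = []) {
    this.entries = new Map(
      members.map((m): [OrganizationUserId, MemberRegistryEntry] => [m.id, m]),
    );
  }

  // Hypothetical loader for the stored, JSON-friendly Record form of the registry.
  static fromRecord(stored: Record<string, MemberRegistryEntry>): MemberRegistry {
    return new MemberRegistry(Object.values(stored));
  }

  get(id: OrganizationUserId): MemberRegistryEntry | undefined {
    return this.entries.get(id);
  }

  getAll(): MemberRegistryEntry[] {
    return Array.from(this.entries.values());
  }

  size(): number {
    return this.entries.size;
  }
}

const registry = MemberRegistry.fromRecord({
  abc: { id: "abc", userName: "Alice", email: "alice@example.com" },
  def: { id: "def", userName: null, email: "bob@example.com" },
});
console.log(registry.size()); // 2
console.log(registry.get("abc")?.email); // alice@example.com
console.log(registry.get("zzz")); // undefined
```

Keeping the stored form as a plain object keyed by ID (rather than serializing the `Map`) matters because `JSON.stringify(new Map(...))` produces `{}`.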
### Member References (new — Record with at-risk flag)
Instead of duplicating full member objects per application, each application stores member IDs as a `Record<string, boolean>`, where:
- **Key** = member ID (userGuid)
- **Value** = `true` if at-risk, `false` if not at-risk
This provides:
- **O(1) lookup** for checking membership and at-risk status
- **Automatic deduplication** (can't have duplicate keys)
- **Single source** for both member list and at-risk status
- **No duplicate IDs** (previously stored in both memberDetails and atRiskMemberDetails)
```typescript
// Stored as a Record<string, boolean> where value indicates at-risk status
type MemberRefs = Record<OrganizationUserId, boolean>;
// Example:
memberRefs: {
"abc-123": true, // at-risk member
"def-456": false, // not at-risk
"ghi-789": true // at-risk member
}
```
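Building the registry and the per-application `memberRefs` from today's rows can happen in one pass. A minimal migration sketch over the legacy Section 1 shapes (with `cipherId` left out, since the target model drops it); `toRegistryAndRefs` is a hypothetical name and IDs are plain strings:

```typescript
// Legacy per-app shapes from Section 1 (cipherId left out: the target model drops it).
type MemberDetails = { userGuid: string; userName: string | null; email: string };
type LegacyRow = {
  applicationName: string;
  memberDetails: MemberDetails[];
  atRiskMemberDetails: MemberDetails[];
};
type MemberRegistryEntry = { id: string; userName: string | null; email: string };

// Hypothetical one-pass migration: each member stored once, apps keep only ID → at-risk flag.
function toRegistryAndRefs(rows: LegacyRow[]) {
  const registry = new Map<string, MemberRegistryEntry>();
  const refsByApp: Record<string, Record<string, boolean>> = {};

  const register = (m: MemberDetails): void => {
    if (!registry.has(m.userGuid)) {
      registry.set(m.userGuid, { id: m.userGuid, userName: m.userName, email: m.email });
    }
  };

  for (const row of rows) {
    const refs: Record<string, boolean> = {};
    for (const m of row.memberDetails) {
      register(m);
      refs[m.userGuid] = false;
    }
    // Second pass flips the flag for at-risk members (a subset of memberDetails).
    for (const m of row.atRiskMemberDetails) {
      register(m);
      refs[m.userGuid] = true;
    }
    refsByApp[row.applicationName] = refs;
  }
  return { registry, refsByApp };
}

const alice = { userGuid: "abc", userName: "Alice", email: "alice@example.com" };
const bob = { userGuid: "def", userName: "Bob", email: "bob@example.com" };
const { registry, refsByApp } = toRegistryAndRefs([
  { applicationName: "google.com", memberDetails: [alice, bob], atRiskMemberDetails: [alice] },
  { applicationName: "github.com", memberDetails: [alice], atRiskMemberDetails: [] },
]);
console.log(registry.size); // 2 (Alice is stored once despite appearing in both apps)
console.log(JSON.stringify(refsByApp));
// {"google.com":{"abc":true,"def":false},"github.com":{"abc":false}}
```

The at-risk pass runs second so its `true` flags win, and registering members in both passes keeps the registry complete even if an at-risk member were ever missing from `memberDetails`.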
### Updated RiskInsightsReportView (with registry references)
```typescript
class RiskInsightsReportView {
  applicationName: string;
  passwordCount: number;
  atRiskPasswordCount: number;
  weakPasswordCount: number;
  reusedPasswordCount: number;
  exposedPasswordCount: number;

  // OLD: memberDetails: MemberDetails[] + atRiskMemberDetails: MemberDetails[] (duplicated arrays)
  // NEW: Single Record with at-risk flag
  memberRefs: Record<OrganizationUserId, boolean>; // { "abc": true, "def": false, ... }

  // OLD: cipherIds: CipherId[] + atRiskCipherIds: CipherId[] (duplicated arrays)
  // NEW: Single Record with at-risk flag
  cipherRefs: Record<CipherId, boolean>; // { "cipher-1": true, "cipher-2": false, ... }

  // The registry is held by the parent RiskInsightsView.
  // View model methods resolve refs → full entries on demand.
  // The type guard drops IDs missing from the registry and narrows away `undefined`
  // (a plain `.filter(Boolean)` does not narrow the array type in TypeScript).
  getAllMembers(registry: MemberRegistry): MemberRegistryEntry[] {
    return Object.keys(this.memberRefs)
      .map((id) => registry.get(id as OrganizationUserId))
      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
  }

  getAtRiskMembers(registry: MemberRegistry): MemberRegistryEntry[] {
    return Object.entries(this.memberRefs)
      .filter(([, isAtRisk]) => isAtRisk)
      .map(([id]) => registry.get(id as OrganizationUserId))
      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
  }

  isAtRisk(): boolean {
    return this.atRiskPasswordCount > 0;
  }

  hasMember(memberId: OrganizationUserId): boolean {
    return memberId in this.memberRefs; // O(1) lookup
  }

  isMemberAtRisk(memberId: OrganizationUserId): boolean {
    return this.memberRefs[memberId] === true; // O(1) lookup
  }
}
```
### Updated RiskInsightsView (parent, holds registry)
```typescript
class RiskInsightsView {
  report: RiskInsightsReportView[];
  applicationData: Record<string, { isCritical: boolean; reviewedDate: Date | null }>;
  summary: RiskInsightsSummaryView;
  memberRegistry: MemberRegistry; // ← shared, deduplicated
  createdDate: Date;

  // Smart query methods: these replace facade/orchestrator filtering logic
  getAtRiskMembers(): MemberRegistryEntry[] {
    // Deduplicate across all at-risk apps
    const ids = new Set<OrganizationUserId>();
    for (const app of this.report) {
      if (app.isAtRisk()) {
        // memberRefs is a Record: keep only the entries whose value is true (at-risk)
        Object.entries(app.memberRefs).forEach(([id, isAtRisk]) => {
          if (isAtRisk) ids.add(id as OrganizationUserId);
        });
      }
    }
    return [...ids]
      .map((id) => this.memberRegistry.get(id))
      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
  }

  getCriticalApplications(): RiskInsightsReportView[] {
    // OLD (O(n)): this.applications.find((a) => a.hostname === app.applicationName)
    // NEW (O(1)): this.applicationData[app.applicationName]
    return this.report.filter(
      (app) => this.applicationData[app.applicationName]?.isCritical === true,
    );
  }

  getApplicationByHostname(hostname: string): RiskInsightsReportView | undefined {
    return this.report.find((app) => app.applicationName === hostname);
  }

  getNewApplications(): RiskInsightsReportView[] {
    // OLD (O(n)): this.applications.find((a) => a.hostname === app.applicationName)
    // NEW (O(1)): this.applicationData[app.applicationName]
    return this.report.filter(
      (app) => this.applicationData[app.applicationName]?.reviewedDate === null,
    );
  }

  getSummary(): RiskInsightsSummaryView {
    return this.summary;
  }
}
```
### Size Impact: Current vs Target
#### Current (700MB+ for large orgs)
**10K member org:**
- `memberDetails`: 400 apps × 5,000 members × 180 bytes = **360MB**
- `atRiskMemberDetails`: 400 apps × 3,000 members × 180 bytes = **216MB**
- Cipher IDs + metadata: **~15MB**
- **Total unencrypted: ~591MB**
- **After encryption + Base64: ~786MB**
#### Target (With Registry + Record Pattern)
**10K member org:**
- **MemberRegistry**: 10,000 members × 140 bytes (no cipherId) = **1.4MB** (stored once)
- **memberRefs**: 400 apps × 5,000 refs × 50 bytes (Record entry: `"id": false/true`) = **100MB**
- No separate atRiskMemberRefs needed - at-risk status is the boolean value
- **cipherRefs**: 400 apps × 100 ciphers × 50 bytes (Record entry: `"id": false/true`) = **2MB**
- No separate atRiskCipherIds array needed - at-risk status is the boolean value
- **applicationData** (as Record): 400 apps × 100 bytes = **0.04MB** (negligible)
- Metadata (counts, applicationName): **~10MB**
- **Total unencrypted: ~113MB**
- **After encryption + Base64: ~150MB**
**Reduction: 786MB → 150MB = 81% smaller** 🎉
**Design Decision:** Use single `Record<string, boolean>` for members, ciphers, AND Record for applicationData:
- **memberRefs:** No duplicate member IDs, ~60MB saved (vs separate atRiskMemberDetails)
- **cipherRefs:** No duplicate cipher IDs, ~1MB saved (vs separate atRiskCipherIds array)
- **applicationData:** O(1) lookup, no functional size change but better performance
- **Trade-off:** Definitely worth it - saves ~61MB and prevents duplicate storage
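The estimates above can be reproduced with a few lines of arithmetic. This sketch uses the section's own per-entry byte assumptions and a 4/3 Base64 expansion factor, and it ignores per-EncString overhead, so it lands within a few MB of the quoted figures rather than matching them exactly:

```typescript
// Rough size model using this section's per-entry byte estimates (all assumptions).
const MB = 1_000_000; // decimal megabytes
const BASE64_FACTOR = 4 / 3; // Base64 turns every 3 bytes into 4 characters

const apps = 400;
const membersPerApp = 5_000;
const atRiskMembersPerApp = 3_000;
const ciphersPerApp = 100;

// Current model: full member objects (~180 bytes each) repeated per application.
const currentBytes =
  apps * membersPerApp * 180 + // memberDetails
  apps * atRiskMembersPerApp * 180 + // atRiskMemberDetails
  15 * MB; // cipher IDs + metadata

// Target model: registry stored once, plus ~50-byte Record entries per reference.
const targetBytes =
  10_000 * 140 + // MemberRegistry
  apps * membersPerApp * 50 + // memberRefs
  apps * ciphersPerApp * 50 + // cipherRefs
  apps * 100 + // applicationData as a Record
  10 * MB; // counts, applicationName, etc.

console.log(Math.round(currentBytes / MB), Math.round((currentBytes * BASE64_FACTOR) / MB)); // 591 788
console.log(Math.round(targetBytes / MB), Math.round((targetBytes * BASE64_FACTOR) / MB)); // 113 151
console.log(Math.round((1 - targetBytes / currentBytes) * 100)); // 81
```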
---
## 4. Storage Structure Comparison
### Current Storage (What's in DB Today)
```typescript
// Stored as RiskInsightsData in database
{
id: OrganizationReportId,
creationDate: Date,
contentEncryptionKey: EncString,
// ENCRYPTED FIELD 1: reportData (~700MB for large orgs)
reportData: [
{
applicationName: "google.com",
cipherIds: ["cipher-id-1", "cipher-id-2", ...], // ~100 ciphers - ARRAY
atRiskCipherIds: ["cipher-id-1", ...], // ~50 at-risk - ARRAY (duplicates IDs from cipherIds)
memberDetails: [ // ~5,000 members - ARRAY
{ userGuid: "abc", userName: "Alice", email: "alice@...", cipherId: "x" },
{ userGuid: "def", userName: "Bob", email: "bob@...", cipherId: "y" },
// ... FULL member objects, deduplicated per app, duplicated across apps
],
atRiskMemberDetails: [ // ~3,000 at-risk members - ARRAY (duplicates from memberDetails)
{ userGuid: "abc", userName: "Alice", email: "alice@...", cipherId: "x" },
// ... FULL member objects (subset of memberDetails)
],
passwordCount: 100,
atRiskPasswordCount: 50,
memberCount: 5000,
atRiskMemberCount: 3000
},
// ... 400 applications
],
// ENCRYPTED FIELD 2: applicationData (~10KB) - ARRAY with O(n) lookup
applicationData: [
{ applicationName: "google.com", isCritical: true, reviewedDate: Date | null },
// ... 400 applications
],
// ENCRYPTED FIELD 3: summaryData (~1KB)
summaryData: {
totalMemberCount: 10000,
totalApplicationCount: 400,
totalAtRiskMemberCount: 6000,
totalAtRiskApplicationCount: 300,
totalCriticalApplicationCount: 50,
totalCriticalMemberCount: 8000,
totalCriticalAtRiskMemberCount: 4500,
totalCriticalAtRiskApplicationCount: 40
}
}
```
**Total size:** ~786MB encrypted for 10K member org
**Problem:** Member data duplicated across applications (360MB + 216MB = 576MB just for members)

---
### Target Storage (With Member Registry)
```typescript
// Stored as RiskInsightsData in database
{
id: OrganizationReportId,
creationDate: Date,
contentEncryptionKey: EncString,
// NEW: ENCRYPTED FIELD 0: memberRegistry (~1.4MB for 10K members)
memberRegistry: {
"abc": { userGuid: "abc", userName: "Alice", email: "alice@..." },
"def": { userGuid: "def", userName: "Bob", email: "bob@..." },
// ... 10,000 members stored ONCE
},
// ENCRYPTED FIELD 1: reportData (~116MB for 10K org - 80% reduction!)
reportData: [
{
applicationName: "google.com",
cipherRefs: { // ~100 cipher IDs with at-risk flag - RECORD
"cipher-id-1": true, // at-risk
"cipher-id-2": false, // not at-risk
"cipher-id-3": true, // at-risk
// ... (no separate atRiskCipherIds array needed)
},
memberRefs: { // ~5,000 member IDs with at-risk flag - RECORD
"abc": true, // at-risk member
"def": false, // not at-risk
"ghi": true, // at-risk member
// ... (no separate atRiskMemberRefs array needed)
},
passwordCount: 100,
atRiskPasswordCount: 50,
memberCount: 5000,
atRiskMemberCount: 3000
},
// ... 400 applications
],
// ENCRYPTED FIELD 2: applicationData (~10KB) - RECORD with O(1) lookup
applicationData: {
"google.com": { isCritical: true, reviewedDate: new Date("2026-01-15") },
"github.com": { isCritical: false, reviewedDate: null },
"slack.com": { isCritical: true, reviewedDate: new Date("2026-02-01") }
// ... 400 applications as Record entries
},
// ENCRYPTED FIELD 3: summaryData (~1KB - unchanged)
summaryData: {
totalMemberCount: 10000,
totalApplicationCount: 400,
totalAtRiskMemberCount: 6000,
totalAtRiskApplicationCount: 300,
totalCriticalApplicationCount: 50,
totalCriticalMemberCount: 8000,
totalCriticalAtRiskMemberCount: 4500,
totalCriticalAtRiskApplicationCount: 40
}
}
```
**Total size:** ~150MB encrypted for 10K member org (81% reduction)
**Benefits:**
- Members stored once in registry, referenced by ID from applications
- Member and cipher IDs stored with at-risk flag (no duplicate arrays)
- ApplicationData as Record enables O(1) lookup instead of O(n) find operations
---
### Design Decision: Single Record with Boolean Flag (for Members AND Ciphers)
**Chosen approach:** Use single `Record<string, boolean>` where the boolean indicates at-risk status for BOTH members and ciphers.
```typescript
{
applicationName: "google.com",
memberRefs: {
"abc": true, // at-risk member
"def": false, // not at-risk
"ghi": true // at-risk member
},
cipherRefs: {
"cipher-1": true, // at-risk cipher
"cipher-2": false, // not at-risk
"cipher-3": true // at-risk cipher
}
}
```
**Pros:**
- ✅ **Members:** No duplicate IDs (previously stored in both memberDetails AND atRiskMemberDetails)
- ✅ **Ciphers:** No duplicate IDs (previously stored in both cipherIds AND atRiskCipherIds)
- ✅ O(1) lookup for both membership/presence and at-risk status
- ✅ Automatic deduplication (can't have duplicate keys)
- ✅ Saves ~60MB for members + ~1MB for ciphers = **~61MB saved** compared to separate arrays
- ✅ Clear intent - one ID, one entry, one flag
- ✅ Consistent pattern across both members and ciphers
**Cons:**
- ⚠️ Slightly more complex iteration (need to check boolean value when filtering at-risk)
- ⚠️ Summary recalculation requires iterating entries instead of just counting keys
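The summary-recalculation cost listed in the cons is easiest to see in code: recomputing the critical totals means reading every entry's flag, with `Set`s deduplicating members shared across applications. A minimal sketch, assuming a member counts toward the critical at-risk total when its ref is `true` in at least one critical application, and that an application is at-risk when `atRiskPasswordCount > 0` (mirroring `isAtRisk()`); `computeCriticalSummary` and the local types are hypothetical:

```typescript
// Minimal local shapes (assumptions): plain-string IDs and only the fields the summary needs.
type AppRow = {
  applicationName: string;
  atRiskPasswordCount: number;
  memberRefs: Record<string, boolean>; // true = at-risk member
};
type AppSettings = { isCritical: boolean; reviewedDate: Date | null };

// Hypothetical helper: recompute the critical-app totals after a critical flag changes.
function computeCriticalSummary(report: AppRow[], applicationData: Record<string, AppSettings>) {
  const criticalMembers = new Set<string>();
  const criticalAtRiskMembers = new Set<string>();
  let criticalApps = 0;
  let criticalAtRiskApps = 0;

  for (const app of report) {
    if (applicationData[app.applicationName]?.isCritical !== true) continue;
    criticalApps++;
    if (app.atRiskPasswordCount > 0) criticalAtRiskApps++;
    // Entries, not just keys: the boolean flag has to be read for every member.
    for (const [memberId, isAtRisk] of Object.entries(app.memberRefs)) {
      criticalMembers.add(memberId); // a Set deduplicates members shared across apps
      if (isAtRisk) criticalAtRiskMembers.add(memberId);
    }
  }

  return {
    totalCriticalApplicationCount: criticalApps,
    totalCriticalMemberCount: criticalMembers.size,
    totalCriticalAtRiskMemberCount: criticalAtRiskMembers.size,
    totalCriticalAtRiskApplicationCount: criticalAtRiskApps,
  };
}

const summary = computeCriticalSummary(
  [
    { applicationName: "google.com", atRiskPasswordCount: 2, memberRefs: { a: true, b: false } },
    { applicationName: "github.com", atRiskPasswordCount: 0, memberRefs: { b: false, c: false } },
    { applicationName: "slack.com", atRiskPasswordCount: 1, memberRefs: { c: true } },
  ],
  {
    "google.com": { isCritical: true, reviewedDate: null },
    "github.com": { isCritical: true, reviewedDate: null },
    "slack.com": { isCritical: false, reviewedDate: null },
  },
);
console.log(summary.totalCriticalMemberCount, summary.totalCriticalAtRiskMemberCount); // 3 1
```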
**Trade-off Analysis:**
- **Member size savings:** ~60MB (400 apps × 3K at-risk IDs × 50 bytes per duplicate entry)
- **Cipher size savings:** ~1MB (400 apps × 50 at-risk IDs × 50 bytes per duplicate entry)
- **Total savings:** ~61MB for 10K org
- **Performance:** Negligible - `Object.entries().filter()` is still O(n) like array iteration
- **Correctness:** Better - impossible to have ID in at-risk array but not in main array
**Verdict:** Single Record with boolean flag is the clear winner for both members AND ciphers.

---
## 5. Encryption Approaches (Current vs Future Options)
### Current: Encrypt Whole Objects (No Compression)
**How it works:**
1. `JSON.stringify(reportData)` → encrypt → EncString (~700MB for large orgs)
2. `JSON.stringify(summaryData)` → encrypt → EncString (~1KB)
3. `JSON.stringify(applicationData)` → encrypt → EncString (~10KB)
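The current pipeline boils down to one stringify-and-encrypt call per top-level field. A minimal sketch; the `encrypt` parameter is a hypothetical stand-in for encrypting with the report's `contentEncryptionKey`, and it is synchronous here only for brevity (real encryption calls are asynchronous):

```typescript
// Hypothetical stand-in for encrypting a string with the report's contentEncryptionKey.
type EncryptFn = (plaintext: string) => string;

interface ReportPayload {
  reportData: unknown[];
  summaryData: Record<string, number>;
  applicationData: unknown;
}

// One JSON.stringify + one encryption per top-level field, so each becomes its own blob.
function encryptTopLevelFields(payload: ReportPayload, encrypt: EncryptFn) {
  return {
    reportData: encrypt(JSON.stringify(payload.reportData)),
    summaryData: encrypt(JSON.stringify(payload.summaryData)),
    applicationData: encrypt(JSON.stringify(payload.applicationData)),
  };
}

// Toy "encryption" that only records the plaintext length (NOT real encryption).
const lengthOnly: EncryptFn = (plaintext) => `enc(${plaintext.length})`;

const blobs = encryptTopLevelFields(
  {
    reportData: Array.from({ length: 3 }, (_, i) => ({ applicationName: `app-${i}.example` })),
    summaryData: { totalMemberCount: 10000 },
    applicationData: {},
  },
  lengthOnly,
);
console.log(blobs.summaryData); // enc(26)
```

Because each field is its own blob, the summary can be decrypted without touching `reportData`, but reading a single application still means decrypting the whole `reportData` blob.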
**Stored structure:**
```typescript
{
reportData: EncString, // ← Entire reportData[] array as one encrypted blob
summaryData: EncString, // ← Entire summary object as one encrypted blob
applicationData: EncString, // ← Entire applicationData[] array as one encrypted blob
contentEncryptionKey: EncString,
id: OrganizationReportId,
creationDate: Date
}
```
**Problems:**
- For large orgs (700MB+), may approach or exceed WASM encryption limits
- Must decrypt entire report to access any application
- Can't do field-level encryption with this approach
---
### Option 1: Encrypt Per Top-Level Field (Current Approach)
Encrypt `reportData`, `summaryData`, `applicationData` as separate EncStrings.
**Pros:**
- ✅ Allows decrypting summary without decrypting full report
- ✅ Simple encryption logic
- ✅ Separates metadata (summary, applicationData) from payload (reportData)
**Cons:**
- ❌ Can't access individual applications without decrypting entire report
- ❌ Can't do field-level encryption
- ❌ May hit WASM limits for very large orgs (700MB+ unencrypted)
**Status:** This is what we have today.

---
### Option 2: True Field-Level Encryption (Ideal)
**Each field** within each object is encrypted separately, preserving JSON structure:
```typescript
{
memberRegistry: {
"abc": {
userGuid: EncString("abc"),
userName: EncString("Alice"),
email: EncString("alice@...")
},
"def": { /* ... */ }
},
reportData: [
{
applicationName: EncString("google.com"),
cipherIds: [EncString("id1"), EncString("id2"), ...],
memberRefs: [EncString("abc"), EncString("def"), ...],
atRiskMemberRefs: [EncString("abc"), ...],
passwordCount: EncString("100"),
atRiskPasswordCount: EncString("50"),
memberCount: EncString("5000"),
atRiskMemberCount: EncString("3000")
},
// ... each application
],
summaryData: {
totalMemberCount: EncString("10000"),
totalApplicationCount: EncString("400"),
// ... each field encrypted
},
applicationData: [
{
applicationName: EncString("google.com"),
isCritical: EncString("true"),
reviewedDate: EncString("2026-02-10")
},
// ... each application
]
}
```
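The core mechanic of field-level encryption is encrypting each value separately while the keys stay readable. A minimal sketch with a hypothetical injected `Cipher`; values are JSON-encoded before encryption so numbers, booleans, and `null` round-trip with their types (a small deviation from the `EncString("100")` notation above):

```typescript
// Hypothetical per-value cipher: in a real implementation each encrypt() yields one EncString.
type Cipher = {
  encrypt(plaintext: string): string;
  decrypt(encrypted: string): string;
};

type Primitive = string | number | boolean | null;

// Encrypt every value of a flat object separately; the keys stay readable.
function encryptFields<T extends Record<string, Primitive>>(obj: T, cipher: Cipher): Record<keyof T, string> {
  const out = {} as Record<keyof T, string>;
  for (const key of Object.keys(obj) as (keyof T)[]) {
    // JSON-encode first so 100, true, and null decrypt back to their original types.
    out[key] = cipher.encrypt(JSON.stringify(obj[key]));
  }
  return out;
}

// Decrypt a single field on demand without touching the others.
function decryptField<V>(encrypted: Record<string, string>, key: string, cipher: Cipher): V {
  return JSON.parse(cipher.decrypt(encrypted[key])) as V;
}

// Toy cipher that only tags values (NOT real encryption).
const toyCipher: Cipher = {
  encrypt: (p) => `enc:${p}`,
  decrypt: (e) => e.slice("enc:".length),
};

const encryptedApp = encryptFields(
  { applicationName: "google.com", passwordCount: 100, isCritical: true },
  toyCipher,
);
console.log(encryptedApp.passwordCount); // enc:100
console.log(decryptField<number>(encryptedApp, "passwordCount", toyCipher)); // 100
```

With real EncStrings, every `encrypt` call carries its own IV and MAC, which is where the per-field overhead listed in the cons comes from.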
**Pros:**
- ✅ Can decrypt individual fields on-demand
- ✅ Can access single application without decrypting all
- ✅ Each field is small enough for SDK (no size limits)
- ✅ Better for partial updates (re-encrypt only changed fields)
- ✅ Aligns with Bitwarden's data model architecture
**Cons:**
- ❌ More complex encryption/decryption logic
- ❌ More overhead per value: each EncString carries its own IV, MAC, and encoding metadata, which adds up when many small values are encrypted individually
- ❌ Requires updating all encryption/decryption code paths
**Status:** **Can be implemented alongside member registry.** The member registry will reduce report size (making this easier), but field-level encryption is not blocked by it.
**Size estimate with field-level encryption:**
- Member registry (10K members): ~1.4MB unencrypted → ~2MB encrypted (each field encrypted)
- Report data: ~116MB unencrypted → ~145MB encrypted (overhead from EncString metadata)
- **Total: ~147MB** (vs ~154MB with whole-object encryption)
Field-level encryption adds ~30MB over the unencrypted data (~117MB → ~147MB) but enables partial decryption and avoids WASM limits.

---
### Option 3: Compress Then Encrypt (Draft PR, Want to Avoid)
Compress `reportData` before encrypting. `summaryData` and `applicationData` remain uncompressed (TBD if `applicationData` needs compression).
**How it would work:**
1. `JSON.stringify(reportData)` → compress with pako → encrypt → EncString
2. `JSON.stringify(summaryData)` → encrypt → EncString (no compression)
3. `JSON.stringify(applicationData)` → encrypt → EncString (compression TBD)
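The ordering in that pipeline matters: compression has to happen before encryption, because ciphertext is effectively random and does not compress. A minimal sketch of the write and read paths, using Node's built-in `zlib` as a stand-in for pako and a hypothetical injected byte-level cipher:

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// Hypothetical byte-level cipher, injected so the sketch stays self-contained.
type BytesCipher = {
  encrypt(data: Uint8Array): Uint8Array;
  decrypt(data: Uint8Array): Uint8Array;
};

// Write path: stringify → compress → encrypt.
function packReportData(reportData: unknown, cipher: BytesCipher): Uint8Array {
  const json = Buffer.from(JSON.stringify(reportData), "utf8");
  return cipher.encrypt(deflateSync(json));
}

// Read path runs in reverse: decrypt → decompress → parse.
function unpackReportData<T>(blob: Uint8Array, cipher: BytesCipher): T {
  return JSON.parse(inflateSync(cipher.decrypt(blob)).toString("utf8")) as T;
}

// Identity "cipher" for demonstration only (NOT encryption).
const identity: BytesCipher = { encrypt: (d) => d, decrypt: (d) => d };

const rows = Array.from({ length: 1000 }, (_, i) => ({
  applicationName: `app-${i}.example`,
  memberRefs: { abc: true, def: false },
}));
const blob = packReportData(rows, identity);
console.log(blob.length < JSON.stringify(rows).length); // true
console.log(unpackReportData<typeof rows>(blob, identity).length); // 1000
```

The read path also shows the architectural cost noted in the cons: nothing inside the blob is reachable until the whole thing has been decrypted and decompressed.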
**Pros:**
- ✅ Stored object is as small as we can get it (compression reduces size by ~70%)
- ✅ Works for very large orgs without hitting WASM limits
**Cons:**
- ❌ Can't decrypt summary without decompressing everything if whole object is compressed
- ❌ Makes field-level encryption impossible (can't decrypt individual fields from compressed blob)
- ❌ More complex decryption logic (decompress → decrypt)
- ❌ Not the direction we want to go architecturally
**Decision:** **Avoid if possible.** This was explored in a draft PR as a workaround, but we'd prefer to implement member registry (reduces size without compression) and move toward field-level encryption (Option 2).