mirror of
https://github.com/bitwarden/browser
synced 2026-02-11 22:13:32 +00:00
[PM-31939] Access Intelligence Documentation: Report Data Model Evolution (#18879)
* Add report-data-model-evolution document
* Change memberRefs to one record with flag for at risk or not
* Update model evolution doc
* Remove implementation section in favor of jira tracking
* Remove todo comment
* Add table of contents
@@ -0,0 +1,807 @@
|
||||
# Report Data Model Evolution
|
||||
|
||||
> **Purpose**: Document the old report data model (what's stored today), the updated model
> from PR #17356 (merged, follows BW architecture), and the target model with the member
> registry optimization. This is a reference for understanding why the report was 450MB+
> and how the member registry solves it.
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Current Storage Model](#1-current-storage-model-still-in-use--plain-interfaces-no-architecture)
2. [Proposed View Models — Following BW Architecture](#2-proposed-view-models--following-bw-architecture-what-should-be-implemented-next)
3. [Target Model — With Member Registry](#3-target-model--with-member-registry-what-were-building)
4. [Storage Structure Comparison](#4-storage-structure-comparison)
5. [Encryption Approaches (Current vs Future Options)](#5-encryption-approaches-current-vs-future-options)
|
||||
|
||||
---
|
||||
|
||||
## 1. Current Storage Model (Still In Use) — Plain interfaces, no architecture
|
||||
|
||||
**Status:** This is what's stored in the database today. These are simple TypeScript interfaces/types with no domain/data/view/api layers. No encryption support in the types themselves. Services do all the filtering and transformation.
|
||||
|
||||
**Note:** While PR #17356 introduced architecture patterns, the actual storage structure still uses these plain types directly. The proposed view models (Section 2) describe the architecture we should migrate to next.
|
||||
|
||||
### ApplicationHealthReportDetail (the old report row)
|
||||
|
||||
**Source:** `models/report-models.ts:78-88` (current implementation, still in use)
|
||||
|
||||
**Current structure (with arrays):**
|
||||
|
||||
```typescript
|
||||
// This is the main report model — one record per application (grouped by URI hostname)
|
||||
// Used directly by services and UI components
|
||||
export type ApplicationHealthReportDetail = {
|
||||
applicationName: string; // hostname (e.g. "google.com")
|
||||
passwordCount: number; // total ciphers for this app
|
||||
atRiskPasswordCount: number; // ciphers with weak/reused/exposed passwords
|
||||
cipherIds: CipherId[]; // IDs of all ciphers in this app - ARRAY
|
||||
atRiskCipherIds: CipherId[]; // IDs of at-risk ciphers - ARRAY (subset of cipherIds)
|
||||
memberCount: number; // count of unique members (redundant, = memberDetails.length)
|
||||
atRiskMemberCount: number; // count of at-risk members (redundant, = atRiskMemberDetails.length)
|
||||
memberDetails: MemberDetails[]; // ⚠️ FULL member objects repeated per app - ARRAY
|
||||
atRiskMemberDetails: MemberDetails[]; // ⚠️ FULL member objects for at-risk only (subset of memberDetails) - ARRAY
|
||||
// Members are deduplicated within a single app but NOT across apps.
|
||||
};
|
||||
```
|
||||
|
||||
**Proposed structure (with Records for consistency):**
|
||||
|
||||
```typescript
|
||||
export type ApplicationHealthReportDetail = {
|
||||
applicationName: string;
|
||||
passwordCount: number;
|
||||
atRiskPasswordCount: number;
|
||||
cipherRefs: Record<CipherId, boolean>; // true = at-risk, false = not at-risk (combines cipherIds + atRiskCipherIds)
|
||||
memberCount: number; // could be removed (= Object.keys(memberRefs).length)
|
||||
atRiskMemberCount: number; // could be removed (= count of true values in memberRefs)
|
||||
memberRefs: Record<OrganizationUserId, boolean>; // true = at-risk, false = not at-risk (combines memberDetails + atRiskMemberDetails)
|
||||
};
|
||||
```
|
||||
|
||||
**Benefits of Record pattern for ciphers:**
|
||||
|
||||
- ✅ Combines `cipherIds` and `atRiskCipherIds` into single structure
|
||||
- ✅ No duplicate IDs (prevents data inconsistency)
|
||||
- ✅ O(1) lookup to check if cipher is at-risk
|
||||
- ✅ Consistent with `memberRefs` pattern
|
||||
- ✅ Saves space (~50 bytes per duplicate cipher ID in large orgs)
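
The merge of the two legacy arrays could look roughly like the sketch below. `toCipherRefs` is a hypothetical helper, and `CipherId` is treated as a plain string alias here (an assumption for illustration; the real type may be branded).

```typescript
// Assumption: CipherId is a plain string alias for this example.
type CipherId = string;

// Hypothetical helper: merge cipherIds + atRiskCipherIds into one Record.
// true = at-risk, false = not at-risk.
function toCipherRefs(
  cipherIds: CipherId[],
  atRiskCipherIds: CipherId[],
): Record<CipherId, boolean> {
  const refs: Record<CipherId, boolean> = {};
  // Every known cipher starts as "not at-risk".
  for (const id of cipherIds) {
    refs[id] = false;
  }
  // At-risk IDs overwrite the flag. An ID that appears only in the
  // at-risk array is still kept, so nothing is lost if the arrays disagree.
  for (const id of atRiskCipherIds) {
    refs[id] = true;
  }
  return refs;
}

// "c1" is listed twice on purpose to show that keys deduplicate.
const exampleCipherRefs = toCipherRefs(["c1", "c2", "c3", "c1"], ["c1", "c3"]);
```

Because object keys are unique, the repeated `"c1"` collapses into a single entry, which is the deduplication property listed above.
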
|
||||
|
||||
### MemberDetails (the old member model)
|
||||
|
||||
**Source:** `models/report-models.ts:16-21` (current implementation, still in use)
|
||||
|
||||
```typescript
|
||||
// Repeated in EVERY ApplicationHealthReportDetail that a member has access to
|
||||
// For a large org with 5,000 members accessing 200 apps → duplicated across apps
|
||||
export type MemberDetails = {
|
||||
userGuid: string; // Organization user ID (UUID)
|
||||
userName: string | null; // Display name
|
||||
email: string; // Email address
|
||||
cipherId: string; // ⚠️ Meaningless after deduplication (first cipher processed)
|
||||
};
|
||||
```
|
||||
|
||||
### RiskInsightsData (the storage container)
|
||||
|
||||
**Source:** `models/report-models.ts:121-128` (current implementation, still in use)
|
||||
|
||||
**Current structure (with arrays):**
|
||||
|
||||
**Rename to:** RiskInsights
|
||||
|
||||
```typescript
|
||||
// The top-level container that is stored in the database
|
||||
// Each field is encrypted separately as an EncString
|
||||
export interface RiskInsightsData {
|
||||
id: OrganizationReportId; // Report ID (generated by API)
|
||||
creationDate: Date; // When report was generated
|
||||
contentEncryptionKey: EncString; // Key used to encrypt report data
|
||||
reportData: ApplicationHealthReportDetail[]; // ⚠️ Main payload - can be 700MB+
|
||||
summaryData: OrganizationReportSummary; // Pre-computed aggregates (~1KB)
|
||||
applicationData: OrganizationReportApplication[]; // Per-app settings (~10KB) - ARRAY with O(n) lookup
|
||||
}
|
||||
```
|
||||
|
||||
**Proposed structure (with Records for O(1) lookup):**
|
||||
|
||||
```typescript
|
||||
export interface RiskInsights {
|
||||
id: OrganizationReportId;
|
||||
creationDate: Date;
|
||||
contentEncryptionKey: EncString;
|
||||
reportData: ApplicationHealthReportDetail[]; // Array is still needed here for iteration
|
||||
summaryData: OrganizationReportSummary;
|
||||
applicationData: Record<string, { isCritical: boolean; reviewedDate: Date | null }>; // Record for O(1) lookup
|
||||
}
|
||||
```
|
||||
|
||||
**Current encryption:** Each of `reportData`, `summaryData`, and `applicationData` is JSON.stringify'd and encrypted as a separate EncString. For large orgs, `reportData` is compressed before encryption to avoid WASM size limits.
|
||||
|
||||
### OrganizationReportApplication (per-app user settings)
|
||||
|
||||
**Source:** `models/report-models.ts:64-72` (current implementation, still in use)
|
||||
|
||||
**Current (Array):** Stored as array with O(n) lookup (inefficient)
|
||||
|
||||
**Rename to:** RiskInsightsApplication (If separate model is needed)
|
||||
|
||||
```typescript
|
||||
// User-defined settings per application (critical flag, review date)
|
||||
// Stored in the report, carried over between report generations
|
||||
export type OrganizationReportApplication = {
|
||||
applicationName: string; // hostname (e.g. "google.com")
|
||||
isCritical: boolean; // user-defined critical flag
|
||||
reviewedDate: Date | null; // null = new/unreviewed application
|
||||
};
|
||||
```
|
||||
|
||||
**Proposed (Record):** Should be stored as Record for O(1) lookup
|
||||
|
||||
```typescript
|
||||
// Key = applicationName (hostname)
|
||||
type ApplicationDataRecord = Record<string, {
|
||||
isCritical: boolean;
|
||||
reviewedDate: Date | null;
|
||||
}>;
|
||||
|
||||
// Example:
|
||||
applicationData: {
|
||||
"google.com": { isCritical: true, reviewedDate: new Date("2026-01-15") },
|
||||
"github.com": { isCritical: false, reviewedDate: null }, // new/unreviewed
|
||||
"slack.com": { isCritical: true, reviewedDate: new Date("2026-02-01") }
|
||||
}
|
||||
```
|
||||
|
||||
**Problem with current array structure:**
|
||||
|
||||
```typescript
|
||||
// Current inefficient O(n) lookup pattern found in code:
|
||||
getCriticalApplications(): RiskInsightsReportView[] {
|
||||
return this.report.filter((app) => {
|
||||
    const appMeta = this.applications.find((a) => a.applicationName === app.applicationName); // O(n) per row!
|
||||
return appMeta?.isCritical === true;
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
**With Record (O(1) lookup):**
|
||||
|
||||
```typescript
|
||||
getCriticalApplications(): RiskInsightsReportView[] {
|
||||
return this.report.filter((app) => {
|
||||
return this.applicationData[app.applicationName]?.isCritical === true; // O(1)!
|
||||
});
|
||||
}
|
||||
```
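
Converting the existing array into the proposed Record might look like the following sketch. `toApplicationDataRecord` and `ApplicationSettings` are hypothetical names introduced only for this example; the element shape is copied from `OrganizationReportApplication` above.

```typescript
// Element shape of the current array, as defined in this document.
type OrganizationReportApplication = {
  applicationName: string;
  isCritical: boolean;
  reviewedDate: Date | null;
};

// Hypothetical name for the Record value shape.
type ApplicationSettings = { isCritical: boolean; reviewedDate: Date | null };

// Hypothetical helper: key the per-app settings by applicationName.
function toApplicationDataRecord(
  apps: OrganizationReportApplication[],
): Record<string, ApplicationSettings> {
  const record: Record<string, ApplicationSettings> = {};
  for (const app of apps) {
    // If the same hostname appears twice in the array, the last entry wins.
    record[app.applicationName] = {
      isCritical: app.isCritical,
      reviewedDate: app.reviewedDate,
    };
  }
  return record;
}

const exampleApplicationData = toApplicationDataRecord([
  { applicationName: "google.com", isCritical: true, reviewedDate: new Date("2026-01-15") },
  { applicationName: "github.com", isCritical: false, reviewedDate: null },
]);
```

A lookup for a hostname that has no entry returns `undefined`, so the `?.isCritical === true` check in the filter above treats unknown applications as non-critical.
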
|
||||
|
||||
### OrganizationReportSummary (pre-computed aggregates)
|
||||
|
||||
**Source:** `models/report-models.ts:49-58` (current implementation, still in use)
|
||||
|
||||
**Rename to:** RiskInsightsSummary
|
||||
|
||||
```typescript
|
||||
// Pre-computed aggregates for summary cards and filtering
|
||||
// Recomputed when critical application markings change
|
||||
export type OrganizationReportSummary = {
|
||||
totalMemberCount: number; // All members in org
|
||||
totalApplicationCount: number; // All applications in report
|
||||
totalAtRiskMemberCount: number; // Members with at-risk access
|
||||
totalAtRiskApplicationCount: number; // Applications with at-risk ciphers
|
||||
totalCriticalApplicationCount: number; // Applications marked critical
|
||||
totalCriticalMemberCount: number; // Members with access to critical apps
|
||||
totalCriticalAtRiskMemberCount: number; // Members with at-risk access to critical apps
|
||||
totalCriticalAtRiskApplicationCount: number; // Critical apps with at-risk ciphers
|
||||
};
|
||||
```
|
||||
|
||||
**Note:** When a user marks/unmarks an application as critical, the summary is recomputed. This is why `atRiskMemberDetails[]` is stored separately per application - it allows efficient recalculation of critical app summaries without reprocessing all cipher health data.
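
As a rough illustration of that recalculation, the sketch below recomputes the two critical-member counts from the per-application member arrays alone. `recomputeCriticalMemberCounts` is a hypothetical helper, and the row shapes are trimmed to the fields the calculation needs.

```typescript
// Trimmed shapes for illustration; field names follow the types in this document.
type MemberDetails = { userGuid: string };
type ReportRow = {
  applicationName: string;
  memberDetails: MemberDetails[];
  atRiskMemberDetails: MemberDetails[];
};
type AppSetting = { applicationName: string; isCritical: boolean };

// Hypothetical helper: recompute critical-member counts after a critical flag
// changes, without reprocessing any cipher health data.
function recomputeCriticalMemberCounts(report: ReportRow[], applications: AppSetting[]) {
  const criticalNames = new Set<string>();
  for (const a of applications) {
    if (a.isCritical) criticalNames.add(a.applicationName);
  }

  // Sets deduplicate members who appear in more than one critical app.
  const criticalMembers = new Set<string>();
  const criticalAtRiskMembers = new Set<string>();
  for (const row of report) {
    if (!criticalNames.has(row.applicationName)) continue;
    for (const m of row.memberDetails) criticalMembers.add(m.userGuid);
    for (const m of row.atRiskMemberDetails) criticalAtRiskMembers.add(m.userGuid);
  }

  return {
    totalCriticalMemberCount: criticalMembers.size,
    totalCriticalAtRiskMemberCount: criticalAtRiskMembers.size,
  };
}

const exampleCounts = recomputeCriticalMemberCounts(
  [
    {
      applicationName: "google.com",
      memberDetails: [{ userGuid: "abc" }, { userGuid: "def" }],
      atRiskMemberDetails: [{ userGuid: "abc" }],
    },
    {
      applicationName: "github.com",
      memberDetails: [{ userGuid: "abc" }, { userGuid: "ghi" }],
      atRiskMemberDetails: [{ userGuid: "ghi" }],
    },
    {
      applicationName: "slack.com",
      memberDetails: [{ userGuid: "def" }, { userGuid: "xyz" }],
      atRiskMemberDetails: [],
    },
  ],
  [
    { applicationName: "google.com", isCritical: true },
    { applicationName: "github.com", isCritical: false },
    { applicationName: "slack.com", isCritical: true },
  ],
);
```
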
|
||||
|
||||
### Why This Was 450MB+
|
||||
|
||||
The core problem: **`MemberDetails` objects were fully duplicated per application**.
|
||||
|
||||
Example for a large org:
|
||||
|
||||
- 5,000 org members
|
||||
- 200 applications in the report
|
||||
- Each member might have access to 50+ applications
|
||||
- Each `MemberDetails` object ~200 bytes
|
||||
|
||||
**Worst case**: 5,000 members × 50 apps × 200 bytes = ~50MB just for member data in `memberDetails[]` arrays. With `atRiskMemberDetails[]` duplicated alongside, plus cipher health data, this easily reached 450MB+.
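
The estimate above can be reproduced with a few lines of arithmetic. The function name is hypothetical and the inputs are the example figures from this section, not measurements.

```typescript
// Rough size of all memberDetails[] arrays: one entry per (member, app) pair.
function estimateMemberDetailsBytes(
  members: number,
  appsPerMember: number,
  bytesPerEntry: number,
): number {
  return members * appsPerMember * bytesPerEntry;
}

// 5,000 members, 50 apps each, ~200 bytes per MemberDetails object.
const memberDetailsMB = estimateMemberDetailsBytes(5000, 50, 200) / 1_000_000;
```
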
|
||||
|
||||
This caused:
|
||||
|
||||
1. **WASM encryption panics** — the encrypted blob exceeded SDK size limits
|
||||
2. **Database storage limits** — even compressed, the JSON was too large for DB fields
|
||||
3. **Memory pressure** — holding this in a `BehaviorSubject` blocked the UI
|
||||
4. **Slow report generation** — building all these duplicated member arrays was O(n²)
|
||||
|
||||
---
|
||||
|
||||
## 2. Proposed View Models — Following BW Architecture (What Should Be Implemented Next)
|
||||
|
||||
**Status:** PR #17356 laid groundwork for architecture patterns, but storage still uses plain types from Section 1. This section describes the view models that SHOULD be implemented to follow Bitwarden's 4-layer pattern: `Api → Data → Domain → View`
|
||||
|
||||
**Important:** These models are NOT currently in use. They represent the target architecture we should migrate to, with query methods replacing facade/orchestrator filtering logic.
|
||||
|
||||
### What's Stored (Current Implementation)
|
||||
|
||||
The current implementation stores the **exact types from Section 1** above:
|
||||
|
||||
- `ApplicationHealthReportDetail` - report rows (700MB+ for large orgs) - using ARRAYS
|
||||
- `OrganizationReportApplication` - per-app settings (~10KB) - using ARRAY
|
||||
- `OrganizationReportSummary` - aggregates (~1KB)
|
||||
|
||||
These are stored in `RiskInsightsData` and encrypted as separate EncStrings:
|
||||
|
||||
```typescript
|
||||
// What gets stored in the database today (using arrays):
|
||||
RiskInsightsData {
|
||||
reportData: ApplicationHealthReportDetail[] // ← JSON.stringify → EncString
|
||||
// Contains duplicate member objects across apps
|
||||
// Contains duplicate cipher IDs (cipherIds + atRiskCipherIds)
|
||||
summaryData: OrganizationReportSummary // ← JSON.stringify → EncString
|
||||
applicationData: OrganizationReportApplication[] // ← JSON.stringify → EncString (array with O(n) lookup)
|
||||
contentEncryptionKey: EncString
|
||||
id: OrganizationReportId
|
||||
creationDate: Date
|
||||
}
|
||||
```
|
||||
|
||||
**Encryption approach:** Each field is JSON.stringify'd, optionally compressed (for `reportData` only, to avoid WASM limits), then encrypted with the `contentEncryptionKey`.
|
||||
|
||||
**Problems with current structure:**
|
||||
|
||||
- Member objects duplicated across applications (576MB for 10K org)
|
||||
- Cipher and member IDs duplicated in separate arrays (~61MB wasted: ~60MB for member IDs, ~1MB for cipher IDs)
|
||||
- ApplicationData requires O(n) find operations for every lookup
|
||||
|
||||
### Proposed View Models (For Query Logic)
|
||||
|
||||
The new architecture will introduce domain/view models with query methods. These are **NOT stored** - they're runtime transformations of the stored data.
|
||||
|
||||
#### RiskInsightsView (proposed - replaces facade logic)
|
||||
|
||||
```typescript
|
||||
class RiskInsightsView {
|
||||
report: ApplicationHealthReportDetail[]; // Decrypted from storage
|
||||
applications: OrganizationReportApplication[]; // Decrypted from storage
|
||||
summary: OrganizationReportSummary; // Decrypted from storage
|
||||
memberRegistry: MemberRegistry; // ← NEW: Built at load time
|
||||
createdDate: Date;
|
||||
|
||||
// Query methods (replace current facade/orchestrator filtering):
|
||||
getAtRiskMembers(): MemberRegistryEntry[];
|
||||
getCriticalApplications(): ApplicationHealthReportDetail[];
|
||||
getApplicationByHostname(hostname: string): ApplicationHealthReportDetail | undefined;
|
||||
getNewApplications(): ApplicationHealthReportDetail[]; // reviewedDate === null
|
||||
getSummary(): OrganizationReportSummary;
|
||||
}
|
||||
```
|
||||
|
||||
**Note:** The view model will have query methods, but the underlying storage structure (Section 1) remains the same until we implement the member registry optimization (Section 3).
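
Even while `applications` is still an array, a view model can avoid a `find` per report row by collecting the matching hostnames once per query. The class below is a minimal sketch under that assumption; `RiskInsightsViewSketch` and `namesWhere` are hypothetical names, and the row types are trimmed to what the queries need.

```typescript
type ReportRow = { applicationName: string };
type AppSetting = { applicationName: string; isCritical: boolean; reviewedDate: Date | null };

class RiskInsightsViewSketch {
  report: ReportRow[];
  applications: AppSetting[];

  constructor(report: ReportRow[], applications: AppSetting[]) {
    this.report = report;
    this.applications = applications;
  }

  // Collect the hostnames whose settings match a predicate (one pass over the array).
  private namesWhere(predicate: (a: AppSetting) => boolean): Set<string> {
    const names = new Set<string>();
    for (const a of this.applications) {
      if (predicate(a)) names.add(a.applicationName);
    }
    return names;
  }

  getCriticalApplications(): ReportRow[] {
    const critical = this.namesWhere((a) => a.isCritical);
    return this.report.filter((app) => critical.has(app.applicationName));
  }

  // "New" means the application has a settings entry with reviewedDate === null.
  getNewApplications(): ReportRow[] {
    const unreviewed = this.namesWhere((a) => a.reviewedDate === null);
    return this.report.filter((app) => unreviewed.has(app.applicationName));
  }
}

const exampleView = new RiskInsightsViewSketch(
  [{ applicationName: "google.com" }, { applicationName: "github.com" }],
  [
    { applicationName: "google.com", isCritical: true, reviewedDate: new Date("2026-01-15") },
    { applicationName: "github.com", isCritical: false, reviewedDate: null },
  ],
);
```
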
|
||||
|
||||
---
|
||||
|
||||
## 3. Target Model — With Member Registry (What We're Building)
|
||||
|
||||
**Key optimization:** Replace duplicated `MemberDetails[]` arrays with lightweight member ID references that point into a shared `MemberRegistry`. This reduces a 10K org report from ~786MB to ~150MB (81% reduction), per the size breakdown later in this section.
|
||||
|
||||
**Storage changes:**
|
||||
|
||||
- Store members ONCE in a registry (not per application)
|
||||
- Store only member IDs (userGuids) in application records
|
||||
- Remove meaningless `cipherId` field from member data
|
||||
- Combine `memberDetails` and `atRiskMemberDetails` into a single `memberRefs` Record keyed by member ID, with a boolean at-risk flag
|
||||
|
||||
### MemberRegistry (new — deduplicated member lookup)
|
||||
|
||||
```typescript
|
||||
// Single source of truth for member data in a report
|
||||
// Stored once, referenced by ID from every application that member appears in
|
||||
class MemberRegistry {
|
||||
// Map from org user ID → full member entry
|
||||
private entries: Map<OrganizationUserId, MemberRegistryEntry>;
|
||||
|
||||
get(id: OrganizationUserId): MemberRegistryEntry | undefined;
|
||||
getAll(): MemberRegistryEntry[];
|
||||
size(): number;
|
||||
}
|
||||
|
||||
interface MemberRegistryEntry {
|
||||
id: OrganizationUserId;
|
||||
userName: string;
|
||||
email: string;
|
||||
// Any other member metadata needed by the UI
|
||||
}
|
||||
```
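
One way the registry entries could be built from the legacy per-application arrays is sketched below. `buildMemberRegistry` is a hypothetical helper, `OrganizationUserId` is treated as a plain string alias, and `userName` is kept nullable in this sketch because `MemberDetails.userName` is `string | null`.

```typescript
// Assumption: OrganizationUserId is a plain string alias for this example.
type OrganizationUserId = string;

// Legacy member shape, as defined in Section 1.
type MemberDetails = {
  userGuid: string;
  userName: string | null;
  email: string;
  cipherId: string;
};
type LegacyRow = { memberDetails: MemberDetails[] };

// Entry shape for this sketch; cipherId is intentionally dropped.
interface RegistryEntry {
  id: OrganizationUserId;
  userName: string | null;
  email: string;
}

// Hypothetical helper: walk every application once and keep the first
// occurrence of each member.
function buildMemberRegistry(report: LegacyRow[]): Map<OrganizationUserId, RegistryEntry> {
  const entries = new Map<OrganizationUserId, RegistryEntry>();
  for (const row of report) {
    for (const m of row.memberDetails) {
      if (!entries.has(m.userGuid)) {
        entries.set(m.userGuid, { id: m.userGuid, userName: m.userName, email: m.email });
      }
    }
  }
  return entries;
}

const exampleRegistry = buildMemberRegistry([
  {
    memberDetails: [
      { userGuid: "abc", userName: "Alice", email: "alice@example.com", cipherId: "x" },
      { userGuid: "def", userName: null, email: "bob@example.com", cipherId: "y" },
    ],
  },
  {
    // Alice appears again under a second application.
    memberDetails: [
      { userGuid: "abc", userName: "Alice", email: "alice@example.com", cipherId: "z" },
    ],
  },
]);
```
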
|
||||
|
||||
### Member References (new — Record with at-risk flag)
|
||||
|
||||
Instead of duplicating full member objects per application, each application stores member IDs as a `Record<string, boolean>`, where:
|
||||
|
||||
- **Key** = member ID (userGuid)
|
||||
- **Value** = `true` if at-risk, `false` if not at-risk
|
||||
|
||||
This provides:
|
||||
|
||||
- **O(1) lookup** for checking membership and at-risk status
|
||||
- **Automatic deduplication** (can't have duplicate keys)
|
||||
- **Single source** for both member list and at-risk status
|
||||
- **No duplicate IDs** (previously stored in both memberDetails and atRiskMemberDetails)
|
||||
|
||||
```typescript
|
||||
// Stored as a Record<string, boolean> where value indicates at-risk status
|
||||
type MemberRefs = Record<OrganizationUserId, boolean>;
|
||||
|
||||
// Example:
|
||||
memberRefs: {
|
||||
"abc-123": true, // at-risk member
|
||||
"def-456": false, // not at-risk
|
||||
"ghi-789": true // at-risk member
|
||||
}
|
||||
```
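
The legacy `memberDetails` and `atRiskMemberDetails` arrays of one application could be collapsed into this Record roughly as follows. `toMemberRefs` and `countAtRisk` are hypothetical helpers, and `OrganizationUserId` is again assumed to be a plain string alias.

```typescript
// Assumption: OrganizationUserId is a plain string alias for this example.
type OrganizationUserId = string;
type MemberDetails = { userGuid: OrganizationUserId };

// Hypothetical helper: one entry per member, flag = at-risk status.
function toMemberRefs(
  memberDetails: MemberDetails[],
  atRiskMemberDetails: MemberDetails[],
): Record<OrganizationUserId, boolean> {
  const refs: Record<OrganizationUserId, boolean> = {};
  for (const m of memberDetails) refs[m.userGuid] = false;
  for (const m of atRiskMemberDetails) refs[m.userGuid] = true;
  return refs;
}

// atRiskMemberCount no longer needs to be stored; it can be derived.
function countAtRisk(refs: Record<OrganizationUserId, boolean>): number {
  return Object.keys(refs).filter((id) => refs[id] === true).length;
}

const exampleMemberRefs = toMemberRefs(
  [{ userGuid: "abc-123" }, { userGuid: "def-456" }, { userGuid: "ghi-789" }],
  [{ userGuid: "abc-123" }, { userGuid: "ghi-789" }],
);
const exampleAtRiskCount = countAtRisk(exampleMemberRefs);
```
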
|
||||
|
||||
### Updated RiskInsightsReportView (with registry references)
|
||||
|
||||
```typescript
|
||||
class RiskInsightsReportView {
|
||||
applicationName: string;
|
||||
passwordCount: number;
|
||||
atRiskPasswordCount: number;
|
||||
weakPasswordCount: number;
|
||||
reusedPasswordCount: number;
|
||||
exposedPasswordCount: number;
|
||||
|
||||
// OLD: memberDetails: MemberDetails[] + atRiskMemberDetails: MemberDetails[] (duplicated arrays)
|
||||
// NEW: Single Record with at-risk flag
|
||||
memberRefs: Record<OrganizationUserId, boolean>; // { "abc": true, "def": false, ... }
|
||||
|
||||
// OLD: cipherIds: CipherId[] + atRiskCipherIds: CipherId[] (duplicated arrays)
|
||||
// NEW: Single Record with at-risk flag
|
||||
cipherRefs: Record<CipherId, boolean>; // { "cipher-1": true, "cipher-2": false, ... }
|
||||
|
||||
// The registry is held by the parent RiskInsightsView
|
||||
// View model methods resolve refs → full entries on demand:
|
||||
|
||||
  getAllMembers(registry: MemberRegistry): MemberRegistryEntry[] {
    return Object.keys(this.memberRefs)
      .map((id) => registry.get(id as OrganizationUserId))
      // Type guard: narrows (MemberRegistryEntry | undefined)[] to MemberRegistryEntry[]
      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
  }

  getAtRiskMembers(registry: MemberRegistry): MemberRegistryEntry[] {
    return Object.entries(this.memberRefs)
      .filter(([, isAtRisk]) => isAtRisk)
      .map(([id]) => registry.get(id as OrganizationUserId))
      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
  }
|
||||
|
||||
isAtRisk(): boolean {
|
||||
return this.atRiskPasswordCount > 0;
|
||||
}
|
||||
|
||||
hasMember(memberId: OrganizationUserId): boolean {
|
||||
return memberId in this.memberRefs; // O(1) lookup
|
||||
}
|
||||
|
||||
isMemberAtRisk(memberId: OrganizationUserId): boolean {
|
||||
return this.memberRefs[memberId] === true; // O(1) lookup
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Updated RiskInsightsView (parent, holds registry)
|
||||
|
||||
```typescript
|
||||
class RiskInsightsView {
|
||||
report: RiskInsightsReportView[];
|
||||
applications: Record<string, { isCritical: boolean; reviewedDate: Date | null }>;
|
||||
summary: RiskInsightsSummaryView;
|
||||
memberRegistry: MemberRegistry; // ← shared, deduplicated
|
||||
createdDate: Date;
|
||||
|
||||
// Smart query methods — these replace facade/orchestrator filtering logic:
|
||||
|
||||
getAtRiskMembers(): MemberRegistryEntry[] {
|
||||
// Deduplicate across all at-risk apps
|
||||
const ids = new Set<OrganizationUserId>();
|
||||
for (const app of this.report) {
|
||||
if (app.isAtRisk()) {
|
||||
// memberRefs is a Record, iterate entries and filter for at-risk (value === true)
|
||||
Object.entries(app.memberRefs).forEach(([id, isAtRisk]) => {
|
||||
if (isAtRisk) ids.add(id as OrganizationUserId);
|
||||
});
|
||||
}
|
||||
}
|
||||
    return [...ids]
      .map((id) => this.memberRegistry.get(id))
      .filter((entry): entry is MemberRegistryEntry => entry !== undefined);
|
||||
}
|
||||
|
||||
  getCriticalApplications(): RiskInsightsReportView[] {
    // OLD (O(n) per row): this.applications.find((a) => a.applicationName === app.applicationName)
    // NEW (O(1) per row): this.applications[app.applicationName]
    return this.report.filter((app) => {
      return this.applications[app.applicationName]?.isCritical === true;
    });
  }
|
||||
|
||||
getApplicationByHostname(hostname: string): RiskInsightsReportView | undefined {
|
||||
return this.report.find((app) => app.applicationName === hostname);
|
||||
}
|
||||
|
||||
  getNewApplications(): RiskInsightsReportView[] {
    // OLD (O(n) per row): this.applications.find((a) => a.applicationName === app.applicationName)
    // NEW (O(1) per row): this.applications[app.applicationName]
    return this.report.filter((app) => {
      return this.applications[app.applicationName]?.reviewedDate === null;
    });
  }
|
||||
|
||||
getSummary(): RiskInsightsSummaryView {
|
||||
return this.summary;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Size Impact: Current vs Target
|
||||
|
||||
#### Current (700MB+ for large orgs)
|
||||
|
||||
**10K member org:**
|
||||
|
||||
- `memberDetails`: 400 apps × 5,000 members × 180 bytes = **360MB**
|
||||
- `atRiskMemberDetails`: 400 apps × 3,000 members × 180 bytes = **216MB**
|
||||
- Cipher IDs + metadata: **~15MB**
|
||||
- **Total unencrypted: ~591MB**
|
||||
- **After encryption + Base64: ~786MB**
|
||||
|
||||
#### Target (With Registry + Record Pattern)
|
||||
|
||||
**10K member org:**
|
||||
|
||||
- **MemberRegistry**: 10,000 members × 140 bytes (no cipherId) = **1.4MB** (stored once)
|
||||
- **memberRefs**: 400 apps × 5,000 refs × 50 bytes (Record entry: `"id": false/true`) = **100MB**
|
||||
- No separate atRiskMemberRefs needed - at-risk status is the boolean value
|
||||
- **cipherRefs**: 400 apps × 100 ciphers × 50 bytes (Record entry: `"id": false/true`) = **2MB**
|
||||
- No separate atRiskCipherIds array needed - at-risk status is the boolean value
|
||||
- **applicationData** (as Record): 400 apps × 100 bytes = **0.04MB** (negligible)
|
||||
- Metadata (counts, applicationName): **~10MB**
|
||||
- **Total unencrypted: ~113MB**
|
||||
- **After encryption + Base64: ~150MB**
|
||||
|
||||
**Reduction: 786MB → 150MB = 81% smaller** 🎉
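
The totals above can be re-derived from the per-item figures. In the sketch below the constant names are hypothetical, and the 1.33 growth factor is an assumption chosen to match this document's "encryption + Base64" numbers (786 / 591).

```typescript
const APPS = 400;

// Current model (bytes): full member objects repeated per application.
const currentBytes =
  APPS * 5000 * 180 + // memberDetails
  APPS * 3000 * 180 + // atRiskMemberDetails
  15_000_000; // cipher IDs + metadata

// Target model (bytes): registry stored once, Records of ID -> flag per application.
const targetBytes =
  10000 * 140 + // MemberRegistry
  APPS * 5000 * 50 + // memberRefs
  APPS * 100 * 50 + // cipherRefs
  APPS * 100 + // applicationData
  10_000_000; // metadata (counts, applicationName)

// Assumption: approximate growth from encryption + Base64 encoding.
const ENCRYPTED_GROWTH = 1.33;

const currentEncryptedMB = (currentBytes / 1_000_000) * ENCRYPTED_GROWTH;
const targetEncryptedMB = (targetBytes / 1_000_000) * ENCRYPTED_GROWTH;

// The growth factor cancels out, so the reduction is the same before and after encryption.
const reductionPercent = Math.round((1 - targetBytes / currentBytes) * 100);
```

With these inputs the target comes out at roughly 151MB encrypted, which the figures above round to ~150MB.
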
|
||||
|
||||
**Design Decision:** Use single `Record<string, boolean>` for members, ciphers, AND Record for applicationData:
|
||||
|
||||
- **memberRefs:** No duplicate member IDs, ~60MB saved (vs separate atRiskMemberDetails)
- **cipherRefs:** No duplicate cipher IDs, ~1MB saved (400 apps × 50 at-risk IDs × 50 bytes, vs separate atRiskCipherIds array)
- **applicationData:** O(1) lookup, no functional size change but better performance
- **Trade-off:** Definitely worth it - saves ~61MB and prevents duplicate storage
|
||||
|
||||
---
|
||||
|
||||
## 4. Storage Structure Comparison
|
||||
|
||||
### Current Storage (What's in DB Today)
|
||||
|
||||
```typescript
|
||||
// Stored as RiskInsightsData in database
|
||||
{
|
||||
id: OrganizationReportId,
|
||||
creationDate: Date,
|
||||
contentEncryptionKey: EncString,
|
||||
|
||||
// ENCRYPTED FIELD 1: reportData (~700MB for large orgs)
|
||||
reportData: [
|
||||
{
|
||||
applicationName: "google.com",
|
||||
cipherIds: ["cipher-id-1", "cipher-id-2", ...], // ~100 ciphers - ARRAY
|
||||
atRiskCipherIds: ["cipher-id-1", ...], // ~50 at-risk - ARRAY (duplicates IDs from cipherIds)
|
||||
memberDetails: [ // ~5,000 members - ARRAY
|
||||
{ userGuid: "abc", userName: "Alice", email: "alice@...", cipherId: "x" },
|
||||
{ userGuid: "def", userName: "Bob", email: "bob@...", cipherId: "y" },
|
||||
// ... FULL member objects, deduplicated per app, duplicated across apps
|
||||
],
|
||||
atRiskMemberDetails: [ // ~3,000 at-risk members - ARRAY (duplicates from memberDetails)
|
||||
{ userGuid: "abc", userName: "Alice", email: "alice@...", cipherId: "x" },
|
||||
// ... FULL member objects (subset of memberDetails)
|
||||
],
|
||||
passwordCount: 100,
|
||||
atRiskPasswordCount: 50,
|
||||
memberCount: 5000,
|
||||
atRiskMemberCount: 3000
|
||||
},
|
||||
// ... 400 applications
|
||||
],
|
||||
|
||||
// ENCRYPTED FIELD 2: applicationData (~10KB) - ARRAY with O(n) lookup
|
||||
applicationData: [
|
||||
{ applicationName: "google.com", isCritical: true, reviewedDate: Date | null },
|
||||
// ... 400 applications
|
||||
],
|
||||
|
||||
// ENCRYPTED FIELD 3: summaryData (~1KB)
|
||||
summaryData: {
|
||||
totalMemberCount: 10000,
|
||||
totalApplicationCount: 400,
|
||||
totalAtRiskMemberCount: 6000,
|
||||
totalAtRiskApplicationCount: 300,
|
||||
totalCriticalApplicationCount: 50,
|
||||
totalCriticalMemberCount: 8000,
|
||||
totalCriticalAtRiskMemberCount: 4500,
|
||||
totalCriticalAtRiskApplicationCount: 40
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Total size:** ~786MB encrypted for 10K member org
|
||||
**Problem:** Member data duplicated across applications (360MB + 216MB = 576MB just for members)
|
||||
|
||||
---
|
||||
|
||||
### Target Storage (With Member Registry)
|
||||
|
||||
```typescript
|
||||
// Stored as RiskInsightsData in database
|
||||
{
|
||||
id: OrganizationReportId,
|
||||
creationDate: Date,
|
||||
contentEncryptionKey: EncString,
|
||||
|
||||
// NEW: ENCRYPTED FIELD 0: memberRegistry (~1.4MB for 10K members)
|
||||
memberRegistry: {
|
||||
"abc": { userGuid: "abc", userName: "Alice", email: "alice@..." },
|
||||
"def": { userGuid: "def", userName: "Bob", email: "bob@..." },
|
||||
// ... 10,000 members stored ONCE
|
||||
},
|
||||
|
||||
// ENCRYPTED FIELD 1: reportData (~116MB for 10K org - 80% reduction!)
|
||||
reportData: [
|
||||
{
|
||||
applicationName: "google.com",
|
||||
cipherRefs: { // ~100 cipher IDs with at-risk flag - RECORD
|
||||
"cipher-id-1": true, // at-risk
|
||||
"cipher-id-2": false, // not at-risk
|
||||
"cipher-id-3": true, // at-risk
|
||||
// ... (no separate atRiskCipherIds array needed)
|
||||
},
|
||||
memberRefs: { // ~5,000 member IDs with at-risk flag - RECORD
|
||||
"abc": true, // at-risk member
|
||||
"def": false, // not at-risk
|
||||
"ghi": true, // at-risk member
|
||||
// ... (no separate atRiskMemberRefs array needed)
|
||||
},
|
||||
passwordCount: 100,
|
||||
atRiskPasswordCount: 50,
|
||||
memberCount: 5000,
|
||||
atRiskMemberCount: 3000
|
||||
},
|
||||
// ... 400 applications
|
||||
],
|
||||
|
||||
// ENCRYPTED FIELD 2: applicationData (~10KB) - RECORD with O(1) lookup
|
||||
applicationData: {
|
||||
"google.com": { isCritical: true, reviewedDate: new Date("2026-01-15") },
|
||||
"github.com": { isCritical: false, reviewedDate: null },
|
||||
"slack.com": { isCritical: true, reviewedDate: new Date("2026-02-01") }
|
||||
// ... 400 applications as Record entries
|
||||
},
|
||||
|
||||
// ENCRYPTED FIELD 3: summaryData (~1KB - unchanged)
|
||||
summaryData: {
|
||||
totalMemberCount: 10000,
|
||||
totalApplicationCount: 400,
|
||||
totalAtRiskMemberCount: 6000,
|
||||
totalAtRiskApplicationCount: 300,
|
||||
totalCriticalApplicationCount: 50,
|
||||
totalCriticalMemberCount: 8000,
|
||||
totalCriticalAtRiskMemberCount: 4500,
|
||||
totalCriticalAtRiskApplicationCount: 40
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Total size:** ~150MB encrypted for 10K member org (81% reduction)
|
||||
**Benefits:**
|
||||
|
||||
- Members stored once in registry, referenced by ID from applications
|
||||
- Member and cipher IDs stored with at-risk flag (no duplicate arrays)
|
||||
- ApplicationData as Record enables O(1) lookup instead of O(n) find operations
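
Since `JSON.stringify` serializes a `Map` as `{}`, the registry is shown above as a plain object keyed by member ID. Loading it back into a `Map` and resolving an application's `memberRefs` could look like the sketch below; `hydrateRegistry` and `resolveAtRiskMembers` are hypothetical helpers, and the stored entry shape follows the example above.

```typescript
// Assumption: OrganizationUserId is a plain string alias for this example.
type OrganizationUserId = string;
type StoredMember = { userGuid: string; userName: string | null; email: string };
type StoredRegistry = Record<OrganizationUserId, StoredMember>;

// Hypothetical helper: turn the decrypted plain object back into a Map.
function hydrateRegistry(stored: StoredRegistry): Map<OrganizationUserId, StoredMember> {
  const registry = new Map<OrganizationUserId, StoredMember>();
  for (const id of Object.keys(stored)) {
    const member = stored[id];
    if (member !== undefined) registry.set(id, member);
  }
  return registry;
}

// Hypothetical helper: resolve the at-risk member IDs of one application.
function resolveAtRiskMembers(
  memberRefs: Record<OrganizationUserId, boolean>,
  registry: Map<OrganizationUserId, StoredMember>,
): StoredMember[] {
  const result: StoredMember[] = [];
  for (const id of Object.keys(memberRefs)) {
    if (memberRefs[id] !== true) continue;
    const member = registry.get(id);
    // IDs missing from the registry are skipped rather than returned as undefined.
    if (member !== undefined) result.push(member);
  }
  return result;
}

// Simulate a storage round trip through JSON.
const storedJson = JSON.stringify({
  abc: { userGuid: "abc", userName: "Alice", email: "alice@example.com" },
  def: { userGuid: "def", userName: "Bob", email: "bob@example.com" },
});
const hydrated = hydrateRegistry(JSON.parse(storedJson) as StoredRegistry);
const atRiskForApp = resolveAtRiskMembers({ abc: true, def: false, ghi: true }, hydrated);
```
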
|
||||
|
||||
---
|
||||
|
||||
### Design Decision: Single Record with Boolean Flag (for Members AND Ciphers)
|
||||
|
||||
**Chosen approach:** Use single `Record<string, boolean>` where the boolean indicates at-risk status for BOTH members and ciphers.
|
||||
|
||||
```typescript
|
||||
{
|
||||
applicationName: "google.com",
|
||||
memberRefs: {
|
||||
"abc": true, // at-risk member
|
||||
"def": false, // not at-risk
|
||||
"ghi": true // at-risk member
|
||||
},
|
||||
cipherRefs: {
|
||||
"cipher-1": true, // at-risk cipher
|
||||
"cipher-2": false, // not at-risk
|
||||
"cipher-3": true // at-risk cipher
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
|
||||
- ✅ **Members:** No duplicate IDs (previously stored in both memberDetails AND atRiskMemberDetails)
|
||||
- ✅ **Ciphers:** No duplicate IDs (previously stored in both cipherIds AND atRiskCipherIds)
|
||||
- ✅ O(1) lookup for both membership/presence and at-risk status
|
||||
- ✅ Automatic deduplication (can't have duplicate keys)
|
||||
- ✅ Saves ~60MB for members + ~1MB for ciphers = **~61MB saved** compared to separate arrays
|
||||
- ✅ Clear intent - one ID, one entry, one flag
|
||||
- ✅ Consistent pattern across both members and ciphers
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ⚠️ Slightly more complex iteration (need to check boolean value when filtering at-risk)
|
||||
- ⚠️ Summary recalculation requires iterating entries instead of just counting keys
|
||||
|
||||
**Trade-off Analysis:**
|
||||
|
||||
- **Member size savings:** ~60MB (400 apps × 3K at-risk IDs × 50 bytes per duplicate entry)
- **Cipher size savings:** ~1MB (400 apps × 50 at-risk IDs × 50 bytes per duplicate entry)
- **Total savings:** ~61MB for 10K org
|
||||
- **Performance:** Negligible - `Object.entries().filter()` is still O(n) like array iteration
|
||||
- **Correctness:** Better - impossible to have ID in at-risk array but not in main array
|
||||
|
||||
**Verdict:** Single Record with boolean flag is the clear winner for both members AND ciphers.
|
||||
|
||||
---
|
||||
|
||||
## 5. Encryption Approaches (Current vs Future Options)
|
||||
|
||||
### Current: Encrypt Whole Objects (No Compression)
|
||||
|
||||
**How it works:**
|
||||
|
||||
1. `JSON.stringify(reportData)` → encrypt → EncString (~700MB for large orgs)
|
||||
2. `JSON.stringify(summaryData)` → encrypt → EncString (~1KB)
|
||||
3. `JSON.stringify(applicationData)` → encrypt → EncString (~10KB)
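
The three steps can be sketched with the cipher injected as a function. Everything here is illustrative: `encryptTopLevelFields` is a hypothetical helper, and `toyEncrypt` is a reversible stand-in that is NOT encryption and says nothing about the real SDK call, which is not shown in this document.

```typescript
// Assumption: encryption is modelled as an injected synchronous string function.
type Encrypt = (plaintext: string) => string;

type PlainReport = {
  reportData: unknown[];
  summaryData: Record<string, number>;
  applicationData: unknown;
};
type EncryptedReport = { reportData: string; summaryData: string; applicationData: string };

// Each top-level field is serialized and encrypted as its own blob.
function encryptTopLevelFields(report: PlainReport, encrypt: Encrypt): EncryptedReport {
  return {
    reportData: encrypt(JSON.stringify(report.reportData)),
    summaryData: encrypt(JSON.stringify(report.summaryData)),
    applicationData: encrypt(JSON.stringify(report.applicationData)),
  };
}

// Stand-in for the example only: string reversal is its own inverse.
const toyEncrypt: Encrypt = (text) => text.split("").reverse().join("");
const toyDecrypt = toyEncrypt;

const encryptedReport = encryptTopLevelFields(
  {
    reportData: [{ applicationName: "google.com" }],
    summaryData: { totalMemberCount: 10000 },
    applicationData: [{ applicationName: "google.com", isCritical: true, reviewedDate: null }],
  },
  toyEncrypt,
);

// The summary can be read without touching the (much larger) reportData blob.
const summaryOnly = JSON.parse(toyDecrypt(encryptedReport.summaryData)) as Record<string, number>;
```
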
|
||||
|
||||
**Stored structure:**
|
||||
|
||||
```typescript
|
||||
{
|
||||
reportData: EncString, // ← Entire reportData[] array as one encrypted blob
|
||||
summaryData: EncString, // ← Entire summary object as one encrypted blob
|
||||
  applicationData: EncString, // ← Entire applicationData[] array as one encrypted blob
|
||||
contentEncryptionKey: EncString,
|
||||
id: OrganizationReportId,
|
||||
creationDate: Date
|
||||
}
|
||||
```
|
||||
|
||||
**Problems:**
|
||||
|
||||
- For large orgs (700MB+), may approach or exceed WASM encryption limits
|
||||
- Must decrypt entire report to access any application
|
||||
- Can't do field-level encryption with this approach
|
||||
|
||||
---
|
||||
|
||||
### Option 1: Encrypt Per Top-Level Field (Current Approach)
|
||||
|
||||
Encrypt `reportData`, `summaryData`, `applicationData` as separate EncStrings.
|
||||
|
||||
**Pros:**
|
||||
|
||||
- ✅ Allows decrypting summary without decrypting full report
|
||||
- ✅ Simple encryption logic
|
||||
- ✅ Separates metadata (summary, applicationData) from payload (reportData)
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ❌ Can't access individual applications without decrypting entire report
|
||||
- ❌ Can't do field-level encryption
|
||||
- ❌ May hit WASM limits for very large orgs (700MB+ unencrypted)
|
||||
|
||||
**Status:** This is what we have today.
|
||||
|
||||
---
|
||||
|
||||
### Option 2: True Field-Level Encryption (Ideal)
|
||||
|
||||
**Each field** within each object is encrypted separately, preserving JSON structure:
|
||||
|
||||
```typescript
|
||||
{
|
||||
memberRegistry: {
|
||||
"abc": {
|
||||
userGuid: EncString("abc"),
|
||||
userName: EncString("Alice"),
|
||||
email: EncString("alice@...")
|
||||
},
|
||||
"def": { /* ... */ }
|
||||
},
|
||||
reportData: [
|
||||
{
|
||||
applicationName: EncString("google.com"),
|
||||
cipherIds: [EncString("id1"), EncString("id2"), ...],
|
||||
memberRefs: [EncString("abc"), EncString("def"), ...],
|
||||
atRiskMemberRefs: [EncString("abc"), ...],
|
||||
passwordCount: EncString("100"),
|
||||
atRiskPasswordCount: EncString("50"),
|
||||
memberCount: EncString("5000"),
|
||||
atRiskMemberCount: EncString("3000")
|
||||
},
|
||||
// ... each application
|
||||
],
|
||||
summaryData: {
|
||||
totalMemberCount: EncString("10000"),
|
||||
totalApplicationCount: EncString("400"),
|
||||
// ... each field encrypted
|
||||
},
|
||||
applicationData: [
|
||||
{
|
||||
applicationName: EncString("google.com"),
|
||||
isCritical: EncString("true"),
|
||||
reviewedDate: EncString("2026-02-10")
|
||||
},
|
||||
// ... each application
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
|
||||
- ✅ Can decrypt individual fields on-demand
|
||||
- ✅ Can access single application without decrypting all
|
||||
- ✅ Each field is small enough for SDK (no size limits)
|
||||
- ✅ Better for partial updates (re-encrypt only changed fields)
|
||||
- ✅ Aligns with Bitwarden's data model architecture
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ❌ More complex encryption/decryption logic
|
||||
- ❌ Slightly larger overhead (each EncString has IV + metadata ~20 bytes)
|
||||
- ❌ Requires updating all encryption/decryption code paths
|
||||
|
||||
**Status:** **Can be implemented alongside member registry.** The member registry will reduce report size (making this easier), but field-level encryption is not blocked by it.
|
||||
|
||||
**Size estimate with field-level encryption:**
|
||||
|
||||
- Member registry (10K members): ~1.4MB unencrypted → ~2MB encrypted (each field encrypted)
|
||||
- Report data: ~116MB unencrypted → ~145MB encrypted (overhead from EncString metadata)
|
||||
- **Total: ~147MB** (vs ~154MB with whole-object encryption)
|
||||
|
||||
Field-level encryption adds ~10MB overhead but enables partial decryption and avoids WASM limits.
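
Partial decryption is the main point, so here is a minimal sketch of it with the cipher injected as a function. `encryptFields` is a hypothetical helper, and `fakeEncrypt` / `fakeDecrypt` are stand-ins that only wrap the text; they are NOT encryption and do not represent the SDK's API.

```typescript
// Assumption: encryption is modelled as injected synchronous string functions.
type Encrypt = (plaintext: string) => string;
type Decrypt = (ciphertext: string) => string;
type Primitive = string | number | boolean;

// Encrypt every field of a flat object separately, keeping the object's shape.
function encryptFields<T extends Record<string, Primitive>>(
  value: T,
  encrypt: Encrypt,
): Record<keyof T, string> {
  const out = {} as Record<keyof T, string>;
  (Object.keys(value) as Array<keyof T>).forEach((key) => {
    // Numbers and booleans are stringified first, e.g. 100 -> "100".
    out[key] = encrypt(String(value[key]));
  });
  return out;
}

// Stand-ins for the example only; decryptCalls counts how many fields were opened.
let decryptCalls = 0;
const fakeEncrypt: Encrypt = (plaintext) => "enc(" + plaintext + ")";
const fakeDecrypt: Decrypt = (ciphertext) => {
  decryptCalls += 1;
  return ciphertext.slice(4, -1);
};

const encryptedMember = encryptFields(
  { userGuid: "abc", userName: "Alice", email: "alice@example.com" },
  fakeEncrypt,
);

// Only the email field is decrypted; the other fields stay encrypted.
const emailOnly = fakeDecrypt(encryptedMember.email);
```

The trade-off is visible in the shape of the result: every field carries its own ciphertext wrapper, which is where the per-field overhead mentioned above comes from.
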
|
||||
|
||||
---
|
||||
|
||||
### Option 3: Compress Then Encrypt (Draft PR, Want to Avoid)
|
||||
|
||||
Compress `reportData` before encrypting. `summaryData` and `applicationData` remain uncompressed (TBD if `applicationData` needs compression).
|
||||
|
||||
**How it would work:**
|
||||
|
||||
1. `JSON.stringify(reportData)` → compress with pako → encrypt → EncString
|
||||
2. `JSON.stringify(summaryData)` → encrypt → EncString (no compression)
|
||||
3. `JSON.stringify(applicationData)` → encrypt → EncString (compression TBD)
|
||||
|
||||
**Pros:**
|
||||
|
||||
- ✅ Stored object is as small as we can get it (compression reduces size by ~70%)
|
||||
- ✅ Works for very large orgs without hitting WASM limits
|
||||
|
||||
**Cons:**
|
||||
|
||||
- ❌ Can't read the summary without decrypting and decompressing everything if the whole object is compressed
|
||||
- ❌ Makes field-level encryption impossible (can't decrypt individual fields from compressed blob)
|
||||
- ❌ More complex decryption logic (decrypt → decompress)
|
||||
- ❌ Not the direction we want to go architecturally
|
||||
|
||||
**Decision:** **Avoid if possible.** This was explored in a draft PR as a workaround, but we'd prefer to implement member registry (reduces size without compression) and move toward field-level encryption (Option 2).
|
||||