Data Encryption at Rest

How CodeDD protects your source code with AES-256-GCM encryption

Overview

Your source code is your intellectual property, your competitive advantage, and in many cases your entire business. CodeDD protects it by encrypting it in transit and at rest, and by holding it in plaintext only in memory, and only while it is being analyzed.

Encryption Architecture

Defense in Depth

CodeDD employs multiple layers of encryption:

Layer 1: Transport (TLS 1.3)

  • All data transmitted over HTTPS
  • Perfect forward secrecy
  • Modern cipher suites only
  • Certificate pinning

Layer 2: Storage Encryption (AES-256-GCM)

  • Files encrypted immediately after clone
  • Unique key per audit
  • Authenticated encryption (prevents tampering)
  • Encryption keys stored separately from encrypted data

Layer 3: Database Encryption

  • Sensitive metadata encrypted
  • Separate encryption keys
  • Encrypted backups
  • Encrypted replication

Layer 4: Memory Protection

  • Sensitive data cleared after use
  • Secure memory allocation
  • No swap for sensitive data
  • Memory encryption where available
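As a client-side illustration of Layer 1, the sketch below uses Python's standard `ssl` module to build a context that refuses anything older than TLS 1.3. It assumes a Python client connecting to an HTTPS endpoint. It is not CodeDD's server configuration, and it does not show certificate pinning.

```python
import ssl


def make_tls13_context() -> ssl.SSLContext:
    """Build a client-side SSL context that accepts only TLS 1.3."""
    # create_default_context() turns on certificate and hostname verification
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # TLS 1.3 only offers AEAD cipher suites, and its handshakes use ephemeral
    # (EC)DHE key exchange (except PSK-only resumption), giving forward secrecy
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

You can pass the context to, for example, `urllib.request.urlopen(url, context=ctx)` or `http.client.HTTPSConnection(host, context=ctx)`.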

File Encryption Process

Immediate Encryption

Sequence (per file):

1. Repository cloned to ephemeral container
2. File discovered by scanner
3. File metrics calculated (LOC, complexity)
4. File immediately encrypted in place
5. Original plaintext purged from memory

No Plaintext Storage:

  • No plaintext copy persists on disk after encryption
  • Original content overwritten securely
  • Only encrypted version persists
  • Encryption keys never stored with data

Encryption Specification

Algorithm: AES-256-GCM

Why AES-256-GCM?

  • AES-256: Industry standard, NIST-approved
  • GCM Mode: Authenticated encryption (integrity + confidentiality)
  • Performance: Hardware-accelerated on modern CPUs
  • Security: No known practical attacks

Implementation Details:

Encryption Parameters:
  - Algorithm: AES-256-GCM
  - Key Size: 256 bits (32 bytes)
  - IV Size: 96 bits (12 bytes, random per file)
  - Tag Size: 128 bits (16 bytes, authentication)

Per-File Process:
  1. Generate random IV (Initialization Vector)
  2. Encrypt file content with AES-GCM
  3. Append authentication tag
  4. Overwrite original file with encrypted version
  5. Securely wipe original content from memory

File Format:

Encrypted File Structure:
  [IV: 12 bytes][Encrypted Content][Auth Tag: 16 bytes]
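The per-file steps and file layout above map directly onto the `AESGCM` class in the Python `cryptography` package. Its `encrypt` output is the ciphertext with the 16-byte tag already appended, so prefixing the IV yields exactly `[IV][Encrypted Content][Auth Tag]`. This is an illustrative sketch only. The function names (`encrypt_file_bytes`, `decrypt_file_bytes`, `encrypt_file_in_place`) are hypothetical and are not CodeDD's code.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

IV_SIZE = 12   # 96-bit random IV, as in the parameters above
TAG_SIZE = 16  # 128-bit GCM authentication tag


def encrypt_file_bytes(key: bytes, plaintext: bytes) -> bytes:
    """Return [IV][ciphertext][tag] for one file's contents."""
    iv = os.urandom(IV_SIZE)  # fresh random IV per file; never reuse one with the same key
    # AESGCM.encrypt returns the ciphertext with the 16-byte tag appended
    return iv + AESGCM(key).encrypt(iv, plaintext, None)


def decrypt_file_bytes(key: bytes, blob: bytes) -> bytes:
    """Split off the IV and decrypt; raises InvalidTag if the data was altered."""
    iv, ct_and_tag = blob[:IV_SIZE], blob[IV_SIZE:]
    return AESGCM(key).decrypt(iv, ct_and_tag, None)


def encrypt_file_in_place(path: str, key: bytes) -> None:
    """Replace a file's contents with its encrypted form (hypothetical helper)."""
    with open(path, "rb") as f:
        plaintext = f.read()
    blob = encrypt_file_bytes(key, plaintext)
    # Truncate-and-rewrite replaces the contents logically, but on SSDs and
    # copy-on-write filesystems the old blocks may not be physically erased
    with open(path, "wb") as f:
        f.write(blob)
```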

Key Management

Audit-Specific Keys:

  • Unique encryption key per audit
  • Keys generated using cryptographically secure RNG
  • Keys derived from high-entropy sources
  • Never reused across audits

Key Storage:

Key Hierarchy:

Master Key (HSM-protected)
  ↓
Audit Encryption Key (derived, ephemeral)
  ↓
File Encryption (AES-256-GCM)
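A common way to realize this hierarchy is envelope encryption. A fresh random audit key encrypts the files, and the master key only encrypts ("wraps") the audit key so it can be stored in the vault. The sketch below assumes that approach and uses RFC 3394 AES Key Wrap from the `cryptography` package. The diagram labels the audit key as derived, so a KDF such as HKDF keyed by the master key is an equally plausible design. `MASTER_KEY` is a local stand-in for a key that would really stay inside the HSM.

```python
import secrets
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

# Stand-in for the HSM-protected master key (assumption). In a real deployment
# this key never leaves the HSM, and wrap/unwrap run as HSM operations.
MASTER_KEY = secrets.token_bytes(32)


def new_audit_key() -> tuple[bytes, bytes]:
    """Generate a fresh 256-bit audit key; return (plaintext_key, wrapped_key)."""
    audit_key = secrets.token_bytes(32)             # 256 bits from the OS CSPRNG
    wrapped = aes_key_wrap(MASTER_KEY, audit_key)   # safe to store in the key vault
    return audit_key, wrapped


def load_audit_key(wrapped: bytes) -> bytes:
    """Unwrap an audit key from the vault; raises InvalidUnwrap if it was altered."""
    return aes_key_unwrap(MASTER_KEY, wrapped)
```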

Key Lifecycle:

1. Key Generation:
   - Generated at audit start
   - 256 bits of entropy
   - Stored in encrypted key vault
   - Access logged

2. Key Usage:
   - Retrieved only when needed
   - Cached in memory (encrypted)
   - Never written to disk in plaintext
   - Access requires authentication

3. Key Destruction:
   - Immediately after audit completion
   - Cryptographic shredding
   - Multi-pass overwrite
   - Verification of destruction
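Cryptographic shredding works because the key is the only thing that makes the ciphertext readable. Once every copy of an audit key is destroyed, all files encrypted under it become unrecoverable, including copies in backups. The toy `AuditKeyVault` class below is hypothetical and only illustrates generate, use, and destroy. In Python, zeroing one buffer cannot guarantee that no other copy of the key remains in memory.

```python
import secrets
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class AuditKeyVault:
    """Toy key vault illustrating the key lifecycle (hypothetical API)."""

    def __init__(self) -> None:
        self._keys: dict[str, bytearray] = {}

    def generate(self, audit_id: str) -> None:
        # 256-bit key from the OS CSPRNG, kept in a mutable buffer so it can be zeroed
        self._keys[audit_id] = bytearray(secrets.token_bytes(32))

    def cipher(self, audit_id: str) -> AESGCM:
        # Raises KeyError once the key has been destroyed.
        # Note: bytes() and AESGCM each hold their own copy of the key.
        return AESGCM(bytes(self._keys[audit_id]))

    def destroy(self, audit_id: str) -> None:
        # Crypto-shredding: without the key, ciphertext under it is unreadable
        key = self._keys.pop(audit_id)
        key[:] = bytes(len(key))  # best-effort zeroing of this buffer only
```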

Decryption for Analysis

Just-in-Time Decryption

When AI analysis needs file content:

1. Retrieve encrypted file from storage
2. Load audit-specific decryption key
3. Decrypt in memory only
4. Pass plaintext to AI engine
5. AI analysis completes
6. Plaintext immediately cleared from memory
7. Encrypted file remains on disk

No Persistent Decryption:

  • Plaintext exists only for the duration of the analysis step
  • Never written back to disk in plaintext
  • Memory cleared immediately after use
  • No caching of plaintext content

Memory Security

Secure Memory Handling:

  • Sensitive data in protected memory pages
  • No swap/page file for sensitive data
  • Memory zeroed before release
  • Core dumps disabled for processes handling code

Example (Python with cryptography library):

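```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

IV_SIZE = 12   # bytes, per the encrypted file format
TAG_SIZE = 16  # bytes, GCM authentication tag


def decrypt_file_to_memory(file_path, key, analyze_content):
    """
    Decrypt an [IV][ciphertext][tag] file directly to memory without touching disk.
    `analyze_content` is a placeholder callback standing in for the AI analysis step.
    """
    # Read the whole encrypted file once, then split it into its three parts
    with open(file_path, 'rb') as f:
        data = f.read()
    iv, ciphertext, tag = data[:IV_SIZE], data[IV_SIZE:-TAG_SIZE], data[-TAG_SIZE:]

    decryptor = Cipher(algorithms.AES(key), modes.GCM(iv, tag)).decryptor()
    # Decrypt into a mutable buffer: Python bytes are immutable and cannot be wiped.
    # update_into needs room for len(data) + block_size - 1 bytes.
    buf = bytearray(len(ciphertext) + 15)
    try:
        n = decryptor.update_into(ciphertext, buf)
        decryptor.finalize()  # raises InvalidTag if the file was tampered with
        return analyze_content(memoryview(buf)[:n])
    finally:
        # Best-effort wipe of the plaintext buffer; copies made elsewhere are not covered
        buf[:] = bytes(len(buf))
```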
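You can illustrate the core-dump bullet under Secure Memory Handling with Python's standard `resource` module. This minimal sketch assumes a Unix-like host and covers only core dumps. It does not show keeping pages out of swap (for example with `mlock`).

```python
import resource


def disable_core_dumps() -> None:
    """Stop this process, and children it spawns, from writing core dumps."""
    # A core dump would write process memory, including any plaintext being
    # analyzed, to disk. With the hard limit at 0, an unprivileged process
    # cannot raise it again.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
```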

Encryption Performance

Hardware Acceleration

AES-NI Support:

  • Modern CPUs have AES instruction sets
  • Hardware-accelerated encryption/decryption
  • Minimal performance overhead
  • Typical speed: 1-3 GB/s per core

Benchmarks:

  • Small files (<1KB): <1ms encryption time
  • Medium files (1-100KB): 1-10ms
  • Large files (1-10MB): 10-100ms
  • Very large files (>10MB): Streaming encryption
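For very large files, AES-GCM can run incrementally, so the whole file never has to sit in memory. This sketch assumes the same `[IV][ciphertext][tag]` layout as the file format above and uses the `Cipher` interface from the `cryptography` package. `CHUNK_SIZE` is an arbitrary choice. A single GCM encryption under one key and IV is limited to 2^39 − 256 bits (about 64 GiB) of plaintext.

```python
import os
from typing import BinaryIO
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; arbitrary for this sketch


def encrypt_stream(key: bytes, src: BinaryIO, dst: BinaryIO) -> None:
    """Write [IV][ciphertext][tag] without loading the whole input into memory."""
    iv = os.urandom(12)
    encryptor = Cipher(algorithms.AES(key), modes.GCM(iv)).encryptor()
    dst.write(iv)
    while chunk := src.read(CHUNK_SIZE):
        dst.write(encryptor.update(chunk))
    dst.write(encryptor.finalize())  # empty for GCM, but required to compute the tag
    dst.write(encryptor.tag)         # 16-byte tag goes last, matching the file format
```

To decrypt such a file, a reader needs the tag before it starts (`modes.GCM(iv, tag)`), so it would first seek to the last 16 bytes. It must also not act on decrypted chunks until `finalize()` has verified the tag.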

Scalability

Concurrent Operations:

  • Multiple files encrypted in parallel
  • Thread-safe key access
  • Lock-free where possible
  • Typical throughput: 100+ files/second
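A thread pool gives a simple sketch of encrypting many files in parallel under one audit key. Each task creates its own `AESGCM` object and random IV, so threads share no mutable state. The function names are hypothetical. Whether threads give a real speedup in CPython depends on the library releasing the GIL during encryption, and this sketch does not assume that.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def _encrypt_one(key: bytes, plaintext: bytes) -> bytes:
    # Fresh AESGCM object and random 96-bit IV per file; nothing mutable is shared
    iv = os.urandom(12)
    return iv + AESGCM(key).encrypt(iv, plaintext, None)


def encrypt_many(key: bytes, files: dict[str, bytes], workers: int = 8) -> dict[str, bytes]:
    """Encrypt several in-memory files concurrently under one audit key."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(_encrypt_one, key, data) for name, data in files.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

NIST SP 800-38D caps random 96-bit IVs at 2^32 encryptions per key. A unique key per audit keeps usage far below that limit.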

What Gets Encrypted

Complete Protection

Encrypted:

  • ✅ All source code files
  • ✅ Configuration files
  • ✅ Documentation files
  • ✅ Any file cloned from repository
  • ✅ Temporary analysis files
  • ✅ Database backups

Not Encrypted at Rest (Not Stored, or Containing No Code):

  • ❌ File content in AI API calls (ephemeral, TLS-protected)
  • ❌ Memory during active processing (cleared immediately)
  • ❌ Results/findings (no code content, only metadata)

Metadata Protection

Encrypted Metadata:

  • File paths (if sensitive)
  • Git commit messages (if they contain sensitive info)
  • Developer names (if required by policy)

Plaintext Metadata:

  • File extensions
  • Lines of code counts
  • File types
  • Timestamps

Compliance Standards

Encryption Standards Met

NIST Compliance:

  • FIPS 140-2 approved algorithms (AES, FIPS 197)
  • Approved key lengths (256-bit)
  • Secure random number generation
  • Approved modes of operation (GCM)

Industry Standards:

  • AES-256: NSA approved for TOP SECRET
  • GCM Mode: NIST SP 800-38D
  • Key Management: NIST SP 800-57
  • TLS 1.3: RFC 8446

Regulatory Compliance

GDPR (EU Data Protection):

  • Encryption as "appropriate technical measure"
  • Pseudonymization through encryption
  • Data minimization (only encrypted files stored)
  • Right to erasure (cryptographic deletion)

SOC 2 Type II:

  • Encryption at rest controls
  • Key management procedures
  • Access logging
  • Audit trails

ISO 27001 (2013 Annex A numbering):

  • Cryptographic controls (A.10.1)
  • Secure deletion (A.8.3)
  • Access control (A.9)
  • Audit logging (A.12.4)

Encryption vs. No Encryption

Risk Comparison

Without Encryption:

Risks:
  - Data breach exposes entire codebase
  - Insider threats can access all code
  - Backup compromise leaks IP
  - Stolen disks/servers expose code
  - Cloud provider breach exposes data

With CodeDD Encryption:

Protections:
  - Breach only exposes encrypted data (useless without keys)
  - Access requires both file and key
  - Backups are protected
  - Stolen storage devices contain only encrypted data
  - Keys stored separately from data

Real-World Scenario

Incident: Cloud Storage Breach

Without Encryption:

Attacker gains access to cloud storage
  ↓
Reads all source code files
  ↓
Complete IP theft
  ↓
Result: Business-ending breach

With CodeDD Encryption:

Attacker gains access to cloud storage
  ↓
Finds only encrypted files
  ↓
Cannot decrypt without keys (stored separately)
  ↓
Result: No IP exposure, attack contained

Encryption Limitations

What Encryption Doesn't Protect

Honest About Trade-offs:

  1. Active Processing: While being analyzed, content is briefly in memory (but cleared immediately after)

  2. AI API Calls: Content sent to AI service over TLS (ephemeral, not stored by AI service)

  3. Metadata: File names, sizes, and structures are not encrypted unless classified as sensitive (but contain no code)

  4. Timing Attacks: File size and processing time might leak minimal information

Mitigations:

  • AI APIs use zero-retention policies
  • TLS 1.3 encrypts all transmissions
  • Metadata minimization
  • Constant-time operations where feasible

Key Takeaways

For Investors:

  • IP Protection: Your portfolio's code is encrypted in transit and at rest
  • Breach Resilience: Even if storage infrastructure is compromised, code stays protected
  • Compliance: Meets regulatory requirements for data protection
  • Insurance: Strong encryption controls may support lower cyber insurance premiums

For CTOs:

  • Standard Compliance: AES-256 is industry best practice
  • Performance: Minimal overhead due to hardware acceleration
  • Auditability: Encryption operations logged for compliance
  • Key Management: Proper key lifecycle management built-in

Next Steps