AI-Powered File Analysis
How CodeDD's multi-agent AI system analyzes your code
Overview
CodeDD's AI analysis stage is where the intelligence happens. Using specialized AI agents, every selected file is analyzed for architecture, security, technical debt, and quality, providing insights that traditional static analysis tools miss.
What Makes It Different
Beyond Traditional SAST
Traditional Static Analysis:
- Pattern matching against known vulnerabilities
- Limited to syntactic analysis
- High false positive rates
- Misses architectural issues
CodeDD's AI Analysis:
- Context-Aware: Understands business logic across files
- Intent-Based: Identifies what developers tried to do vs. what code actually does
- Architecture-Aware: Recognizes design patterns and anti-patterns
- Multi-Agent Verification: Cross-validates findings to eliminate false positives
Analysis Architecture
Concurrent Processing Pipeline
CodeDD processes files using a sophisticated concurrent architecture:
Stage 1: AI Auditing (Concurrent)
- 10 concurrent AI agent workers
- Rate-limited to respect API quotas (100 calls/minute)
- In-memory processing for speed
- Files queued for analysis
Stage 2: Batch Database Writes (Sequential)
- Single-threaded database writer
- Processes results in batches of 5
- Prevents database lock contention
- Ensures data consistency
Why This Architecture:
- High Throughput: AI analysis is CPU-intensive; parallelization maximizes speed
- Data Integrity: Sequential writes avoid database conflicts
- Graceful Degradation: Failed files don't block others
- Resource Efficient: Balanced CPU and I/O utilization
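The two-stage pipeline above can be sketched as a worker pool feeding a single-threaded batch writer. The worker count (10) and batch size (5) mirror the figures above; everything else (function names, the in-memory queue) is illustrative, not CodeDD's actual implementation.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 10   # concurrent AI agent workers
BATCH_SIZE = 5     # files per database transaction

def analyze_file(path):
    """Stand-in for the AI agent call; returns a findings dict."""
    return {"file": path, "findings": []}

def run_pipeline(files):
    results_q = queue.Queue()
    stored = []  # stands in for committed database batches

    def writer():
        # Stage 2: single-threaded writer drains results in batches.
        batch = []
        while True:
            item = results_q.get()
            if item is None:                    # sentinel: analysis finished
                if batch:
                    stored.append(list(batch))  # flush the final partial batch
                break
            batch.append(item)
            if len(batch) == BATCH_SIZE:
                stored.append(list(batch))      # one DB transaction per batch
                batch.clear()

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()

    # Stage 1: concurrent AI auditing.
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        for result in pool.map(analyze_file, files):
            results_q.put(result)

    results_q.put(None)  # signal the writer to flush and stop
    writer_thread.join()
    return stored

batches = run_pipeline([f"src/file_{i}.py" for i in range(12)])
```

With 12 files and a batch size of 5, the writer commits three batches (5, 5, 2), and a slow or failed analysis of one file never blocks the writer, only delays its own result.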
What Gets Analyzed
Source Code Files
Application Code:
- Business logic and algorithms
- API endpoints and controllers
- Database queries and ORM usage
- Authentication and authorization logic
- Data validation and sanitization
- Error handling patterns
Infrastructure Code:
- Dockerfile and container configurations
- Kubernetes manifests and Helm charts
- CI/CD pipeline definitions
- Infrastructure-as-Code (Terraform, CloudFormation)
Configuration Files:
- Application configuration
- Environment variables
- Feature flags
- Third-party integrations
File Selection
Not all files require deep AI analysis. CodeDD intelligently selects files based on:
Priority Criteria:
- Files marked as selected_for_audit: true in the database
- Security-critical files (authentication, authorization)
- Core business logic
- Recently modified files
- High complexity files
- Configuration files with security implications
Excluded from AI Analysis:
- Auto-generated code
- Minified files
- Binary files
- Very large files (>50k LOC) may be sampled
- Test fixtures and mock data
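A minimal selection heuristic following the criteria above might look like this. The `FileInfo` fields are hypothetical, not CodeDD's actual schema, and the sampling of very large files is reduced to a simple exclusion here.

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    path: str
    selected_for_audit: bool = False
    is_generated: bool = False
    is_minified: bool = False
    is_binary: bool = False
    loc: int = 0

def should_analyze(f: FileInfo) -> bool:
    # Hard exclusions first: generated, minified, and binary files are skipped.
    if f.is_generated or f.is_minified or f.is_binary:
        return False
    # Very large files (>50k LOC) go to a separate sampling pass instead.
    if f.loc > 50_000:
        return False
    return f.selected_for_audit

print(should_analyze(FileInfo("auth/login.py", selected_for_audit=True, loc=300)))  # True
```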
AI Agent Capabilities
Code Understanding
Semantic Analysis:
- Understands code intent beyond syntax
- Identifies logical errors and edge cases
- Recognizes business logic vulnerabilities
- Detects subtle security issues
Cross-File Context:
- Tracks data flow across files
- Identifies missing validations
- Recognizes architectural patterns
- Detects inconsistent implementations
Issue Detection
Security Vulnerabilities:
- SQL injection points
- Cross-site scripting (XSS) vectors
- Authentication bypasses
- Authorization flaws
- Insecure cryptography
- Secrets in code
- Sensitive data exposure
Architecture Issues:
- Tight coupling
- Circular dependencies
- God classes/objects
- Missing abstraction layers
- Inconsistent patterns
Technical Debt:
- Code duplication
- Overly complex functions
- Missing error handling
- Inadequate logging
- Dead code
Quality Issues:
- Poor naming conventions
- Missing documentation
- Inconsistent style
- Magic numbers
- Hard-coded values
Confidence Scoring
Every finding includes a confidence score:
High Confidence (90-100%):
- Clear, unambiguous issues
- Verified by multiple checks
- Well-established patterns
Medium Confidence (70-89%):
- Likely issues requiring context
- May have legitimate use cases
- Recommend manual review
Low Confidence (50-69%):
- Potential issues
- Requires domain expertise to validate
- Informational findings
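The three buckets map directly onto score ranges; a small helper makes the thresholds explicit (the sub-50% behavior is an assumption, since the document only describes findings at 50% and above):

```python
def confidence_bucket(score: float) -> str:
    """Map a confidence score to the buckets described above."""
    if score >= 90:
        return "high"       # clear, unambiguous issues
    if score >= 70:
        return "medium"     # likely issues requiring context
    if score >= 50:
        return "low"        # potential issues, informational
    return "unreported"     # below the reporting threshold (assumption)

print(confidence_bucket(95))  # high
```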
Example AI Analysis Output
🔍 Example: Excellent File Analysis
Below is an example of what CodeDD's AI analysis produces for a typical file:
Overview
- Script Purpose: Authentication middleware handling JWT token validation for API requests
- Domain: Backend Security
- Recommendation: Consider implementing rate limiting for failed authentication attempts and adding token refresh mechanism
Code Quality Assessment 🟢 Excellent
- Readability: Highly readable 🟢
- Consistency: Highly consistent 🟢
- Modularity: Excellent 🟢
- Maintainability: High 🟢
- Reusability: High 🟢
- Technical Debt: Low 🟡
- Code Smells: None 🟢
- Redundancy: No redundancies 🟢
Functionality Analysis 🟢 Strong
- Completeness: Fully functional 🟢
- Edge Cases: Excellently covered 🟢
- Error Handling: Robust 🟢
Performance & Scalability 🟡 Good
- Efficiency: High 🟢
- Scalability: Moderate 🟡
- Resource Utilization: Optimal 🟢
- Parallel Processing: Partially supported 🟡
- Database Interaction: Optimized 🟢
- Concurrency Management: Adequate 🟡
Security Analysis 🟢 Strong
- Input Validation: Strong 🟢
- Data Handling: Secure 🟢
- Authentication: Robust 🟢
- Flag Status: Green 🟢
- Security Concerns: No critical issues identified
Compatibility 🟢 Excellent
- Platform Independence: Multi-platform 🟢
- Integration: Seamless 🟢
Documentation 🟡 Adequate
- Inline Comments: Adequate 🟡
Standards & Best Practices 🟢 Strong
- Standards Compliance: Fully compliant 🟢
- Design Patterns: Extensive use of patterns 🟢
- Code Complexity: Low 🟢
- Refactoring Opportunities: Few opportunities 🟢
Legend:
- 🟢 Green (90-100): Excellent, no action needed
- 🟡 Yellow (66-89): Good, minor improvements recommended
- 🟠 Orange (33-65): Moderate issues, attention needed
- 🔴 Red (0-32): Critical issues, immediate action required
⚠️ Example: File with Critical Issues
Here's an example analysis of a file requiring immediate attention:
Overview
- Script Purpose: Payment processing handler for credit card transactions
- Domain: Backend Payment Processing
- Recommendation: URGENT - Fix SQL injection vulnerability and add input validation before production deployment
Code Quality Assessment 🔴 Poor
- Readability: Moderately readable 🟡
- Consistency: Somewhat inconsistent 🟡
- Modularity: Poor 🔴
- Maintainability: Low 🔴
- Reusability: Low 🔴
- Technical Debt: High 🔴
- Code Smells: High 🔴
- Redundancy: High redundancy 🔴
Functionality Analysis 🟠 Moderate
- Completeness: Partially functional 🟠
- Edge Cases: Poorly covered 🔴
- Error Handling: Poor 🔴
Performance & Scalability 🟠 Needs Improvement
- Efficiency: Average 🟡
- Scalability: Not scalable 🔴
- Resource Utilization: Excessive 🔴
- Database Interaction: Inefficient 🔴
Security Analysis 🔴 Critical Issues
- Input Validation: Weak 🔴
- Data Handling: Insecure 🔴
- Authentication: Non-existent 🔴
- Flag Status: Red 🔴
- Security Concerns:
- SQL injection vulnerability detected (line 45)
- Hardcoded credentials found (line 12)
- Sensitive data logged in plaintext (line 78)
- No authentication on payment endpoint
Compatibility 🟡 Limited
- Platform Independence: Limited platforms 🟡
- Integration: Requires workarounds 🟡
Documentation 🔴 Insufficient
- Inline Comments: None 🔴
Standards & Best Practices 🔴 Non-Compliant
- Standards Compliance: Non-compliant 🔴
- Design Patterns: None 🔴
- Code Complexity: High 🔴
- Refactoring Opportunities: Many 🟢
AI-Detected Vulnerabilities:
- Critical: SQL injection in payment query (Confidence: 95%)
- Critical: Hardcoded database credentials (Confidence: 100%)
- High: Sensitive payment data logged (Confidence: 92%)
- High: Missing authentication on public endpoint (Confidence: 98%)
- Medium: Excessive database connections (Confidence: 85%)
Recommended Actions:
- Immediate: Remove hardcoded credentials, use environment variables
- Immediate: Implement parameterized queries to prevent SQL injection
- Urgent: Add authentication/authorization to payment endpoints
- Urgent: Remove sensitive data from logs, implement proper logging
- High Priority: Refactor for better error handling and modularity
Concurrent Analysis Process
Step 1: File Decryption
Files are decrypted on-the-fly:
- Read encrypted file from disk
- Decrypt in memory using audit-specific key
- Content never written back to disk unencrypted
- Memory cleared after analysis
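The decrypt-in-memory flow can be sketched with Fernet (symmetric authenticated encryption from the `cryptography` package). This is only an illustration of the "never written back to disk unencrypted" pattern; CodeDD's actual cipher, key derivation, and memory-clearing strategy are not specified here, and `analyze` is a hypothetical callback.

```python
import tempfile

from cryptography.fernet import Fernet

def analyze_encrypted(path: str, audit_key: bytes, analyze) -> dict:
    with open(path, "rb") as fh:
        ciphertext = fh.read()                         # 1. read encrypted bytes
    plaintext = Fernet(audit_key).decrypt(ciphertext)  # 2. decrypt in memory only
    try:
        return analyze(plaintext.decode("utf-8"))      # 3. run the AI analysis
    finally:
        del plaintext  # 4. drop the plaintext reference; nothing hits disk

# Usage: round-trip a sample file through encrypt -> analyze.
key = Fernet.generate_key()
with tempfile.NamedTemporaryFile(suffix=".enc", delete=False) as tmp:
    tmp.write(Fernet(key).encrypt(b"def handler(): pass"))
result = analyze_encrypted(tmp.name, key, lambda src: {"loc": src.count("\n") + 1})
```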
Step 2: AI Processing
Each file is sent to AI agent:
- File content and metadata provided
- Context about file type and purpose
- Previous findings for consistency
- Language-specific analysis rules
AI Agent Response Time:
- Simple files: 2-5 seconds
- Complex files: 10-30 seconds
- Rate limiting ensures stability
Step 3: Result Validation
AI findings are validated:
- Schema validation (ensures proper structure)
- Confidence score calculation
- Deduplication against previous findings
- Priority assignment
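Schema validation of an AI response can be as simple as checking required fields and value ranges before a finding is queued for storage. The field names below are illustrative assumptions, not CodeDD's actual finding schema.

```python
# Required fields and their expected types for one AI finding (assumed schema).
REQUIRED_FIELDS = {
    "file": str,
    "category": str,
    "message": str,
    "confidence": (int, float),
}

def validate_finding(finding: dict) -> list[str]:
    """Return a list of schema errors; an empty list means the finding is valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in finding:
            errors.append(f"missing field: {field}")
        elif not isinstance(finding[field], expected):
            errors.append(f"wrong type for {field}")
    conf = finding.get("confidence")
    if isinstance(conf, (int, float)) and not 0 <= conf <= 100:
        errors.append("confidence out of range")
    return errors
```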
Step 4: Batch Storage
Results are queued and written in batches:
- 5 files per database transaction
- Each file isolated in transaction
- Failed writes don't affect other files
- Automatic retry with exponential backoff
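The batching-with-isolation behavior, where one bad record does not sink the other four in its batch, can be sketched like this. `write_one` stands in for a per-file transaction; all names are illustrative.

```python
from itertools import islice

def chunked(items, size=5):
    """Yield successive batches of `size` items (5 files per DB transaction)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

def store_results(results, write_one):
    """Write results in batches; collect failures instead of aborting the batch."""
    failed = []
    for batch in chunked(results, size=5):
        for result in batch:       # each file isolated in its own try/except
            try:
                write_one(result)
            except Exception:
                failed.append(result)  # queued for retry with backoff
    return failed
```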
SonarQube Integration (Optional)
Complementary Analysis
CodeDD can integrate with SonarQube for additional coverage:
When Enabled:
- Selected files decrypted to temporary workspace
- SonarScanner runs in isolated Docker container
- Results fetched via SonarQube API
- Issues stored alongside AI findings
- Temporary workspace securely deleted
What SonarQube Adds:
- Language-specific linting rules
- Code smell detection
- Cyclomatic complexity
- Duplication detection
- Security hotspots
Architecture:
Docker Container: SonarScanner
├── Mounted: Temporary workspace (encrypted)
├── Network: Isolated (codedd-network)
├── Execution: Scan all files
└── Output: CE Task ID
        ↓
Poll SonarQube API
├── Wait for analysis completion
├── Fetch issues (paginated)
├── Map issues to original file paths
└── Store in TypeDB
Security:
- SonarQube runs in isolated network
- Source code never leaves CodeDD infrastructure
- Results linked to encrypted file references
- Container destroyed after analysis
Dependency Analysis
Package Discovery
For each file, CodeDD identifies dependencies:
Languages Supported:
- JavaScript/TypeScript: package.json, npm, yarn
- Python: requirements.txt, pipenv, poetry
- Java: pom.xml, build.gradle
- Ruby: Gemfile
- Go: go.mod
- PHP: composer.json
- Rust: Cargo.toml
- .NET: packages.config, .csproj
Vulnerability Scanning
Dependencies are checked against:
- CVE Database: Known vulnerabilities
- NVD: National Vulnerability Database
- GitHub Advisory: Security advisories
- OSV: Open Source Vulnerabilities
For Each Dependency:
- Version detection
- Vulnerability lookup
- Severity assessment (Critical, High, Medium, Low)
- Patch availability
- Upgrade recommendations
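As one concrete example, the public OSV API (`POST https://api.osv.dev/v1/query`) accepts a package-and-version payload like the one built below. Whether CodeDD queries OSV this way is not stated; this only shows the request shape.

```python
import json

def osv_query(name: str, version: str, ecosystem: str) -> dict:
    """Build the request body for OSV's /v1/query endpoint."""
    return {
        "package": {"name": name, "ecosystem": ecosystem},
        "version": version,
    }

# The serialized body would be POSTed to https://api.osv.dev/v1/query;
# the response lists known vulnerabilities for this exact version.
payload = json.dumps(osv_query("lodash", "4.17.20", "npm"))
```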
License Compliance
Dependencies checked for license issues:
- License type identification
- GPL/copyleft detection
- Commercial license conflicts
- Missing license warnings
Cyclomatic Complexity
Measuring Code Complexity
CodeDD calculates cyclomatic complexity for functions:
What It Measures:
- Number of independent paths through code
- Branching complexity (if/else, switch, loops)
- Error handling paths
Thresholds:
- 1-10: Simple, easy to maintain
- 11-20: Moderate complexity
- 21-50: High complexity, refactoring recommended
- 50+: Very high, significant technical debt
Why It Matters:
- High complexity = higher bug likelihood
- Harder to test thoroughly
- Difficult to maintain and modify
- Risk indicator for investors
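A simplified version of the metric starts at 1 and adds one per branch point. Dedicated tools refine the rules considerably; the node list below is a rough approximation for Python, not CodeDD's actual implementation.

```python
import ast

# Node types that open an additional path through the code (approximation).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of branch points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

src = """
def process(x):
    if x > 0:
        for i in range(x):
            x -= 1
    return x
"""
print(cyclomatic_complexity(src))  # 3: base 1 + if + for
```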
Performance & Scale
Processing Speed
Typical Performance:
- Small Repos (<100 files): 5-10 minutes
- Medium Repos (100-1,000 files): 15-45 minutes
- Large Repos (1,000-10,000 files): 1-3 hours
- Very Large (10,000+ files): 3-8 hours
Factors Affecting Speed:
- File count and size
- API rate limits
- Network latency
- Database write speed
- Concurrent workers available
Resource Management
Memory Usage:
- Streaming file processing
- Results written incrementally
- Memory cleared after each file
- Typical peak: 2-4GB
CPU Utilization:
- 10 concurrent AI workers
- Additional threads for I/O
- Scales with available cores
Error Handling & Resilience
Graceful Degradation
File-Level Failures:
- Individual file errors logged
- Other files continue processing
- Partial audit results still valuable
Common Scenarios:
- API Timeouts: Automatic retry (3 attempts)
- Malformed Code: Logged, skipped, audit continues
- Rate Limiting: Automatic backoff and retry
- Database Errors: Transaction rollback, retry batch
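The "automatic retry (3 attempts)" with backoff behavior can be sketched as a small helper; the delays and the choice of `TimeoutError` as the retryable exception are illustrative.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn, retrying on timeout with exponential backoff (1s, 2s, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise                                  # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)      # back off before retrying
```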
Success Metrics
Audit is considered successful if:
- >80% of files analyzed successfully
- Critical files (security, config) analyzed
- Dependency scan completed
- Results stored in database
Data Privacy
Zero Code Retention
Critical Security Guarantee:
After AI analysis:
- File content never stored in database
- AI API calls use ephemeral processing
- Results stored (findings only, not code)
- Original files remain encrypted
- Memory cleared after analysis
What We Store:
- File paths and metrics
- AI findings and recommendations
- Dependency lists
- Vulnerability details
- Complexity scores
What We Never Store:
- Actual source code
- Secrets or credentials (if detected, flagged but not stored)
- PII from code comments
- Business logic implementation details
Key Takeaways
For Investors:
- AI analysis identifies risks traditional tools miss
- Confidence scoring helps prioritize findings
- Dependency vulnerabilities quantified
- Technical debt measured objectively
For CTOs:
- Deep semantic analysis, not just pattern matching
- Architecture issues identified early
- Actionable remediation guidance
- Benchmarking against industry standards
Next Steps
- Learn about Cross-File Contextualization
- Understand Audit Consolidation
- Review Data Encryption

