🗓️ 05092025 1514
🎯 Core Configuration Fields
cluster_name (String)
- Required: Yes
- When to use: Always set a descriptive name
- Best practices: Include environment, team, purpose (e.g., "prod-etl-daily", "dev-analytics")
- Limits: Max 100 characters
 
spark_version (String)
- Required: Yes
- When to use LTS: Production workloads (stability)
- When to use latest: Development, new features
- Avoid: photon-prefixed versions (use runtime_engine instead)
- Reference: Databricks Runtime Versions
 
node_type_id
- Required: Yes
- Storage Optimized (i3): Local NVMe SSDs for disk caching; a common general-purpose and development choice
- Memory Optimized (r5): ETL, analytics, large datasets
- Compute Optimized (c5): CPU-intensive jobs, streaming
- Reference: AWS Instance Types
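
The three required fields above can be sketched as a minimal Clusters API payload. Field names follow the Databricks Clusters API; the runtime string and instance type are illustrative, so substitute a current LTS runtime and a node type available in your workspace.

```python
# Minimal cluster payload: the three required core fields.
# Values are illustrative, not prescriptive.
core_config = {
    "cluster_name": "prod-etl-daily",      # env-purpose naming, <= 100 chars
    "spark_version": "14.3.x-scala2.12",   # an LTS runtime for production
    "node_type_id": "r5.2xlarge",          # memory optimized for ETL
}

# Sanity checks mirroring the notes above.
assert len(core_config["cluster_name"]) <= 100
assert not core_config["spark_version"].startswith("photon-")
```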
 
⚖️ Scaling Configuration
autoscale
- When to use: Variable workloads, cost optimization
- Development: 1-3 workers
- Production ETL: 2-10 workers
- Analytics: 3-20 workers
- Mutually exclusive with: num_workers
 
num_workers
- When to use: Consistent workloads, ML training
- ML/Training: Fixed size for stability
- Streaming: Fixed size for predictable performance
- Mutually exclusive with: autoscale
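
The mutual exclusivity above is easy to get wrong when templating cluster specs. A hypothetical validation helper (not part of any Databricks API) makes the rule explicit:

```python
# A cluster spec sets EITHER autoscale OR num_workers, never both.
# Hypothetical validator encoding the rules from the notes above.

def check_scaling(config: dict) -> str:
    """Return which scaling mode a cluster config uses, or raise."""
    has_autoscale = "autoscale" in config
    has_fixed = "num_workers" in config
    if has_autoscale and has_fixed:
        raise ValueError("autoscale and num_workers are mutually exclusive")
    if has_autoscale:
        a = config["autoscale"]
        if a["min_workers"] > a["max_workers"]:
            raise ValueError("min_workers must be <= max_workers")
        return "autoscale"
    if has_fixed:
        return "fixed"
    raise ValueError("set either autoscale or num_workers")

etl = {"autoscale": {"min_workers": 2, "max_workers": 10}}  # variable ETL load
training = {"num_workers": 8}                               # fixed size for ML

assert check_scaling(etl) == "autoscale"
assert check_scaling(training) == "fixed"
```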
 
💰 Cost Optimization
autotermination_minutes (Long)
- Required: No, but highly recommended
- Range: 10-10000 minutes (0 disables auto-termination)
- Development: 15-30 minutes
- Production: 60-120 minutes
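
A small sketch of clamping a requested idle timeout into the valid range (the helper is hypothetical; the 10-10000 bounds come from the API):

```python
# Clamp a requested idle timeout into the valid 10-10000 minute range.
# 0 explicitly disables auto-termination; use with care in production.

def autotermination(minutes: int) -> int:
    if minutes == 0:
        return 0
    return max(10, min(minutes, 10000))

dev_config = {"autotermination_minutes": autotermination(20)}   # dev: 15-30
prod_config = {"autotermination_minutes": autotermination(90)}  # prod: 60-120

assert dev_config["autotermination_minutes"] == 20
```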
 
aws_attributes (AwsAttributes)
- SPOT_WITH_FALLBACK: ~60% savings, production-safe (falls back to on-demand if spot capacity is unavailable)
- SPOT: Maximum savings, dev/testing only
- ON_DEMAND: Highest reliability, mission-critical workloads
- Reference: Spot Instance Best Practices
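
A production-safe spot setup can be sketched as below. The field names (availability, first_on_demand, zone_id, spot_bid_price_percent) follow the Clusters API for AWS; the specific values are illustrative.

```python
# aws_attributes sketch for a production-safe spot strategy.
aws_attributes = {
    "availability": "SPOT_WITH_FALLBACK",  # spot savings, on-demand fallback
    "first_on_demand": 1,                  # keep the driver node on-demand
    "zone_id": "auto",                     # let Databricks pick the zone
    "spot_bid_price_percent": 100,         # bid up to the on-demand price
}

assert aws_attributes["availability"] in {"SPOT", "SPOT_WITH_FALLBACK", "ON_DEMAND"}
```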
 
🚀 Performance Configuration
runtime_engine (RuntimeEngine)
- PHOTON: SQL workloads, ETL, analytics (up to ~3x faster)
- STANDARD: ML, streaming, general compute
- Cost: Photon adds ~20% pricing premium but up to ~3x performance on eligible workloads
- Reference: Photon Engine
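
The engine choice above can be encoded as a tiny (hypothetical) helper: PHOTON for SQL/ETL/analytics, STANDARD for everything else.

```python
# Hypothetical helper encoding the runtime_engine guidance above.

def pick_engine(workload: str) -> str:
    """Map a workload type to a runtime_engine value."""
    return "PHOTON" if workload in {"sql", "etl", "analytics"} else "STANDARD"

assert pick_engine("etl") == "PHOTON"
assert pick_engine("ml") == "STANDARD"
```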
 
spark_conf
- Always enable: Adaptive Query Execution (AQE)
- ETL workloads: Enable Delta optimizations
- Large datasets: Tune partition settings
- Reference: Spark Configuration
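
A spark_conf sketch for the guidance above. The keys are real Spark/Delta settings; the "auto" shuffle-partitions value is Databricks-specific, and whether to tune it further depends on data volume.

```python
# spark_conf sketch: AQE plus common Delta optimizations.
spark_conf = {
    "spark.sql.adaptive.enabled": "true",                     # AQE: always on
    "spark.sql.adaptive.coalescePartitions.enabled": "true",  # merge small partitions
    "spark.databricks.delta.optimizeWrite.enabled": "true",   # Delta: fewer small files
    "spark.databricks.delta.autoCompact.enabled": "true",     # Delta: background compaction
    "spark.sql.shuffle.partitions": "auto",                   # Databricks: let AQE size shuffles
}
```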
 
🔒 Security & Governance
data_security_mode
- SINGLE_USER: Production, highest security, Unity Catalog
- USER_ISOLATION: Shared clusters, user separation
- NONE: Legacy, not recommended for new clusters
- Reference: Data Security Modes
 
enable_local_disk_encryption
- true: Production, compliance requirements
- false: Development, testing only
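
The two security fields above combine into a production-oriented sketch. The single_user_name value is illustrative; SINGLE_USER clusters require naming the user or service principal allowed to attach.

```python
# Security settings sketch: Unity Catalog single-user cluster with
# local disk encryption, per the production recommendations above.
security = {
    "data_security_mode": "SINGLE_USER",
    "single_user_name": "etl-service-principal",  # illustrative principal
    "enable_local_disk_encryption": True,
}

assert security["data_security_mode"] in {"SINGLE_USER", "USER_ISOLATION", "NONE"}
```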
 
🏷️ Resource Management
custom_tags
- Required for: Cost tracking, resource management
- Limit: 45 custom tags max
- Best practices: Environment, Team, Project, CostCenter
 
policy_id
- When to use: Enforce organizational standards
- Governance: Restrict instance types, regions, settings
- Cost control: Limit expensive configurations
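
Attaching a policy is a single field on the cluster spec. The policy ID below is illustrative (real IDs come from your workspace); apply_policy_default_values asks the API to fill unspecified fields from the policy's defaults.

```python
# Policy attachment sketch: the policy constrains instance types,
# regions, and settings at cluster-creation time.
cluster_spec = {
    "cluster_name": "prod-etl-daily",
    "policy_id": "ABC123DEF456",          # illustrative workspace policy ID
    "apply_policy_default_values": True,  # inherit defaults from the policy
}
```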
 
🐳 Container & Advanced Options
docker_image
- When to use: Custom libraries, golden images
- Benefits: Consistent environments, faster startup
- Reference: Databricks Container Services
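
A docker_image sketch for Container Services. The registry URL and credentials are illustrative; basic_auth can be omitted for public images, and the password should reference a secret rather than a literal.

```python
# docker_image sketch: custom golden image from a private registry.
docker_image = {
    "url": "myregistry.example.com/data-eng/golden-image:1.4.2",  # illustrative
    "basic_auth": {
        "username": "registry-user",           # illustrative
        "password": "{{secrets/registry/pw}}"  # secret reference, never a literal
    },
}
```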
 
init_scripts
- When to use: Custom software installation, configuration
- Execution: Run sequentially in the order listed, before the Spark driver and executor processes start
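
An init_scripts sketch showing the list ordering. The paths are illustrative; workspace-file and Unity Catalog volume destinations are both supported source types.

```python
# init_scripts sketch: scripts run in the order listed, before Spark starts.
init_scripts = [
    {"workspace": {"destination": "/Shared/init/install-monitoring.sh"}},
    {"volumes": {"destination": "/Volumes/main/default/init/configure-proxy.sh"}},
]

# The list order IS the execution order.
assert [list(s)[0] for s in init_scripts] == ["workspace", "volumes"]
```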