Skip to main content

Database vs Data Warehouse vs Data Lake

Database

  • Designed to capture and record data through OLTP (online transactional process)
  • Live, real-time
  • Stored in tables (with rows and columns)
  • Highly detailed
  • Flexible schema (how data is organized)

Data Warehouse

  • Designed for analytical processing through OLAP (online analytical processing)
  • Data is refreshed from source systems - (stores current and historical) through ETL process
    • Data is summarized
  • Might not always have current data
  • Rigid schema

DB vs Data Warehouse

DatabaseDatawarehouse
Designed for transactionsDesigned for analytics / reporting
Fresh and detailedRefreshed periodically, can be summarised
Slow for querying large amt of data => slow down transactionsWon't interfere with process => generally faster

Data Lake

  • Designed for capturing raw data (structured, semi-structured, unstructured)
  • Made for large amounts of data
  • Used for ML and AI in its current state / for analytics with procsesing
  • Can organize and put into DBs / Datawarehouses