Skip to main content

🗓️ 31012025 1548
📎 #wip

hologres

ABSTRACT

Hologres is a unified real-time data warehouse built by Alibaba Cloud for handling large-scale data processing and analysis in real time.

Core Capabilities

  1. Supports standard SQL

Fully compatible with PostgreSQL, which allows reuse of PostgreSQL-based tools and queries.

  1. Real-Time Data Warehousing

Designed to ingest, update, and analyze petabytes of data with low latency.

  1. OLAP & Ad Hoc Queries

Optimized for analytical processing and fine-grained, on-the-fly data queries.

  1. Deep Integration with Alibaba Cloud Ecosystem

Works seamlessly with MaxCompute, Flink, Spark, and DataWorks to offer full-stack data processing

NOTE

Key use cases:

  • Real-time analytics, marketing profiling, audience grouping
  • Fine-grained analysis and self-service BI
  • Real-time risk management and ID mapping

Key Features Summary

Flexible Querying and Analysis

Support for Multiple Storage Models

depending on the query and performance requirements.

Massively Parallel Processing (MPP)

SQL statements are distributed across clusters for faster execution and resource efficiency.

Optimized for Diverse Query Types

Handles simple, complex, and ad hoc queries efficiently.

Sub-Second Interactive Analysis

Parallel Computing via MPP Architecture

Utilizes scalable, parallel query execution.

Vectorization and AliORC Format

Optimized storage and I/O throughput for large-scale SSD-based storage.

High-Performance Point Queries

Primary Key Indexing

Enables high-throughput, low-latency queries for operations like ID lookups and dimension table joins.

Performance Optimization

Offers up to 10x performance improvements over open-source alternatives.

Federated Queries and Data Lake Acceleration

Integrates with MaxCompute and OSS

allowing you to accelerate queries by 5-10x through optimized metadata imports and external tables.

Supports real-time synchronization of millions of rows per second from MaxCompute

Semi-Structured Data Analysis

Native JSON/JSONB Support

Efficient storage and querying of JSON data similar to native column stores

Real-Time High-Concurrency Data Writes

Seamless integration with Flink and Spark

High-throughput real-time ingestion

Supports real-time updates with atomicity and isolation guarantees

Event-Driven Development

Binary Log Parsing

End-to-end real-time pipelines via Flink consumption of update events for minimal data latency

Real-Time Materialized Views

Automatically updates aggregate views immediately after source data changes

enabling real-time aggregations.

Enterprise Features

Security and Governance

  • Fine-Grained Access Control: Integration with RAM, STS, and BYOK encryption.
  • PCI DSS Certified: Ensures compliance for sensitive data handling.

Load Isolation

  • Supports multiple compute instances in isolated modes to balance workloads.
  • Data is stored with triplicate redundancy for high reliability.

Self-O&M Capabilities

  • Diagnostics and Query History: Built-in monitoring to identify bottlenecks and optimize queries.

Ecosystem and Integration

PostgreSQL Compatibility

  • Native JDBC/ODBC connectivity, compatible with BI tools like Tableau and Quick BI.
  • Supports GIS and spatial extensions.

Seamless DataWorks Integration

  • Facilitates visualized data warehouse construction and lineage management.

Hadoop Integration

  • Supports Hive and Spark connectors, along with Delta Lake and Hudi formats.

AI Integration

  • Built-in Vector Search Engine (Proxima):
    Supports real-time feature storage, vector retrievals, and ML-based queries via integration with Alibaba Cloud’s PAI

References