hologres_sql_optimization
🗓️ 03022025 1110
📎
Basic principles
ABSTRACT
Reduce I/O resource consumption and optimize concurrency
Configuring appropriate indexes
- If you configure appropriate indexes for data distribution when you create a table, the data that you want to query can be quickly located after an SQL statement is executed
- This reduces I/O and computing resource consumption and accelerates queries
- Balanced data distribution also helps increase the utilization of concurrent resources and prevent single-node bottlenecks
- The following figure shows the execution process from executing an SQL statement to obtaining data
Partition pruning
- If you perform an SQL query on a partitioned table, the system locates the partition in which the required data is located by using the partition pruning feature
- If no partition is matched based on query conditions, the system needs to traverse all partitions, which consumes excessive I/O resources
- In most cases, partitioning by day is more appropriate
- For a non-partitioned table, partition pruning is not involved
hologres_shard pruning
- You can use a hologres_distribution_key to quickly locate the shard in which the desired data is located
- This reduces the resources that are required by a single SQL statement and supports a high throughput if multiple SQL statements are executed at the same time
- If the specific shard cannot be located, the system schedules all shards for computing based on the distributed framework
- In this case, a single SQL statement is executed on multiple shards, which consumes a large amount of resources and decreases the overall concurrency of the system
- Extra shuffle overhead is incurred if specific operators are executed in a centralized manner
- In most cases, you can configure fields whose values are evenly distributed as distribution key fields, such as the order ID, user ID, and event ID fields
- We recommend that you configure the same distribution key for the tables that you want to join to allow correlative data to be distributed to the same shard
- This helps achieve high efficiency in local join operations
hologres_segment_key pruning
- You can use an event time column to quickly locate the file in which the required data is located from multiple files on a node
- This eliminates the need to access other files
- If the filtering fails, the system needs to traverse all files
hologres_clustering_key pruning
- You can use the clustering key to quickly locate the desired data range in a file
- This helps improve the efficiency of range queries and field sorting