Best practises

note

🚓 - Mandatory
💪🏻 - Recommended

Service Division

Prohibit direct database access across services.
Data interaction across services must be done through API calls (or develop your own Flink tasks for data synchronization
- This is advised if relying on historical data for large-scale processing is too costly in MySQL
- Data team can provide support

Database Selection

Choose the appropriate type of database based on the usage scenario.
Particularly, the use of Elasticsearch and Linodorm must match real application scenarios.

Internal Service Table Model Design

Use the correct storage engine
- e.g. InnoDB recommended for MySQL
Table design should be simple
- Do not store multi-dimensional data in one table
  - Select the suitable table design model (1:1, 1:n, n:1) based on specific business scenarios
- Field design should be straightforward, with each column representing an independent attribute
- Store data clearly, simply, and independently
- Avoid storing similar types of data in multiple places.
- Do not store different types of data in the same table.
- A table should not have too many fields, generally no more than 20. Consider splitting the table if there are too many fields.
- Fields in the same table should be related. Split unrelated tables.
- For 1:n, n:n table design models, ensure database consistency at the code level. Avoid using foreign keys in the database (due to performance issues and concurrency deadlocks).
Tables must have a primary key (auto-increment, composite primary keys not recommended)
Avoid Redundant Data Storage
- 🚓 Strictly adhere to the first normal form when creating tables (ensure atomicity of each column) to avoid duplicate data.
- 💪🏻 Tables should meet the second normal form (ensure every column in the table is related to the primary key) to avoid partial dependencies.
- 💪🏻 Tables should meet the third normal form (ensure each column is directly related to the primary key, not indirectly) to avoid transitive dependencies.
Consider consistency and latency when setting up redundant fields for performance or usability reasons
- Denormalized field design can be considered if there is a significant improvement in performance
- Consider denormalized design if redundant fields facilitate storage
- However, prioritize data consistency and acceptable latency in denormalized designs
Avoid Storing the Same Data in Multiple Places
- Developers should be clear about whether they are storing data from other internal services to avoid duplication
- Determine a unique external data source (such as IP) to ensure data accuracy. If uncertain, submit to the data committee for discussion
- Data warehouses should check and review based on actual usage

Field Standards

🚓 Tables, fields, etc., must have necessary English comments. If a field's numeric values represent different meanings, it should be clearly explained in the comments
🚓 All columns storing the same data must have consistent names and types
- user_id fields in different tables must have the same type
🚓 Use TIMESTAMP for storing millisecond-level time, not int type or strings
- (Because they lack automatic update attributes)
🚓 Prefer the smallest data type suitable for your storage needs
- e.g., prefer varchar over char
🚓 Allocate field lengths as per actual needs; avoid arbitrarily large capacities
💪🏻 Define all fields as NOT NULL
💪🏻 Use DECIMAL for storing precise floating-point numbers
- float can sometimes lead to precision issues.
💪🏻 Use large fields like blob, text sparingly
💪🏻 Allocate field lengths as per actual needs
- avoid arbitrarily large capacities.
💪🏻 Keep the number of fields in core tables as few as possible; consider splitting large fields
💪🏻 Use UNSIGNED for storing non-negative integers
- The same byte size can store a larger range of values
💪🏻 Use VARBINARY for storing case-sensitive variable-length strings or binary content
💪🏻 For VARCHAR(N), keep N as small as possible,
- MySQL limits the total length of all VARCHAR fields in a table to 65535 bytes
- Memory operations like sorting and creating temporary tables use the length of N for memory allocation
💪🏻 Store large fields and less frequently accessed fields in separate tables to separate cold and hot data
💪🏻 Ideally, keep the number of columns in a single table below 50
💪🏻 Default to creating timestamp fields for creation and modification times

Index Standards

🚓 Tables must have a primary key, which creates a primary key index
- The primary key column should not be frequently updated and preferably not be a string type
- Default to using a non-null and unique column
- Ideally, the primary key should be auto_increment and not null
🚓 Newly created unique indexes must not duplicate the primary key
🚓 Choose columns with high distinctiveness as indexes
- Avoid creating indexes on attributes that are frequently updated or have low distinctiveness
🚓 Anticipate SQL queries and create indexes in advance
- Especially time field indexes for data warehouse pulls
  - Advised to add indexes for creation time and modification time
🚓 Avoid performing mathematical and function operations on indexed columns
- Prevents the index from being filtered and unused
🚓 When creating composite indexes, place columns with high distinctiveness first
🚓 Ensure indexed fields are not NULL, considering default values
- NULL occupies space and significantly impacts the efficiency of index queries
🚓 Avoid creating an individual index for every column in a table
💪🏻 Limit the number of fields in a single index to 5 and the total number of indexes on a table to 5
- Index design should follow the B+ Tree index leftmost prefix match principle
💪🏻 Avoid indexing large fields like blob/text, as they consume excessive storage space
💪🏻 Use prefix indexing for strings, with the prefix index length not exceeding 128 characters
- Prefer prefix indexes and consider adding virtual columns for indexing if necessary.
💪🏻 Ensure created indexes cover 80% of the main queries. Focus on solving the primary issues rather than covering all possibilities.
💪🏻 Use unique indexes wherever possible to improve query efficiency.
💪🏻 For repetitive SQL statements with multiple fields, modify the condition fields' order and create a combined index to reduce the number of indexes.
💪🏻 Create appropriate indexes for DML, order by, and group by fields.
💪🏻 Important SQL must be indexed, with priority given to covering indexes, especially for fields involved in multi-table joins.
💪🏻 Avoid implicit index type conversion.
💪🏻 Avoid redundant indexes (do not add unnecessary indexes).
💪🏻 Avoid indexing frequently used small tables.
💪🏻 Minimize the use of foreign keys
- Implement referential integrity at the business end for more efficient indexing
- Explanation: Avoid operations on parent and child tables affecting each other, reducing usability

Database Development Standards

🚓 Do not store large data such as images and files in the database
🚓 It is recommended to keep the data volume of a single table below 20 million rows
- Note that this is not a MySQL database limitation, but large sizes pose significant challenges in modifying table structures, backups, and recovery
- MySQL does not limit storage, which depends on storage settings and file systems
🚓 Use InnoDB as the storage engine.
🚓 Table creation requires review and approval by the development team leader (TL)
🚓 Perform major table modifications during low business traffic periods
💪🏻 Use utf8mb4 as the database character set
💪🏻 Avoid using stored procedures, triggers, functions, etc.
- Can lead to tight coupling with the database
💪🏻 Avoid using JOINs with large tables
- MySQL's optimizer has overly simplistic strategies for optimizing joins
💪🏻 Avoid performing calculations in the database
- Complex computations should be moved to the business application.
💪🏻 Refuse large SQL statements, large transactions, and large batches
- Should be handled at application layer
💪🏻 When designing databases, consider whether future scalability has been taken into account.
💪🏻 Combine SQL statements, especially in DML, by merging multiple values to reduce interactions with the database.
💪🏻 Avoid negative queries such as 'not in' or 'like', as they can lead to full table scans and reduce buffer pool utilization.
💪🏻 Use 'union all' instead of 'union', as 'union' involves sorting operations for deduplication, leading to lower efficiency.
💪🏻 Avoid subqueries; it's advisable to convert them into join queries.
💪🏻 Avoid cross-database queries.
💪🏻 Implement reasonable pagination, especially for large pagination.
💪🏻 If UPDATE and DELETE statements do not use LIMIT and involve large data volumes, it can impact database performance and cause master-slave delays, possibly leading to inconsistencies.
💪🏻 Be cautious when deleting database objects (including tables, indexes, fields, etc.).
💪🏻 Use internal domain names instead of IPs to connect to databases.
💪🏻 Consider horizontal partitioning of log-type tables based on creation time and periodically archive historical data.

Naming Conventions

🚓 The permissible character range for names of tables, fields, etc., includes A-Z, a-z, 0-9, and underscore (_)
- No other characters are allowed
- Principally, table names should not include numbers, and field names should not include numbers unless in special cases
🚓 Database names, table names, and field names should be lowercase, follow the underscore style, not exceed 32 characters, be meaningful and intuitive, and preferably use nouns rather than verbs
- The names should be related to business or product lines, and mixing of pinyin and English is prohibited
🚓 Naming format for ordinary indexes: idx_tableName_indexFieldName (if there are multiple indexes starting with the first field name, add the second field name; abbreviate if too long)
- Unique index naming format: uk_tableName_indexFieldName (index names must be all lowercase, abbreviate if too long)
- Primary key index naming: pk_fieldName
- Sequence naming format: seq_tablename
🚓 Database names, table names, and field names must not use MySQL reserved words, such as desc, range, match, asc, index, etc.
🚓 Temporary table names must start with 'tmp' and end with a date suffix.
🚓 Backup table names must start with 'bak' and end with a date suffix.
🚓 Fields expressing a yes/no concept should be named using the format is_xxx.
💪🏻 Tables within the same module should use the same prefix as much as possible, and table names should be meaningful
- for instance, all log tables starting with 'log_'
💪🏻 Foreign key fields (if any) should be named using the format tableName_fieldName to represent their relationships
💪🏻 Fields with the same meaning across different tables should have the same name
- For fields with the same meaning across tables, use the naming format tableName_fieldName, omitting the module prefix

Security Standards

🚓 Prohibit storing plaintext passwords in the database; passwords must be encrypted before storage.
🚓 Developers are forbidden from directly exporting or querying data involving user-sensitive information for business colleagues. If necessary, such actions require approval from senior leadership.
🚓 Operations involving sensitive data interaction with the database must have audit logs, and alerts should be set up when necessary.
🚓 Implement a whitelist feature for IP addresses connecting to the database to prevent unauthorized IP access.
💪🏻 Regularly change usernames and passwords for online database connections.
💪🏻 Prohibit the plaintext storage of user-sensitive information, such as phone numbers, in the database

Service Division​

Database Selection​

Internal Service Table Model Design​

Field Standards​

Index Standards​

Database Development Standards​

Naming Conventions​

Security Standards​

Service Division

Database Selection

Internal Service Table Model Design

Field Standards

Index Standards

Database Development Standards

Naming Conventions

Security Standards