Best practises
note
- 🚓 - Mandatory
- 💪🏻 - Recommended
Service Division
- Prohibit direct database access across services.
- Data interaction across services must be done through API calls (or develop your own Flink tasks for data synchronization
- This is advised if relying on historical data for large-scale processing is too costly in MySQL
- Data team can provide support
Database Selection
- Choose the appropriate type of database based on the usage scenario.
- Particularly, the use of Elasticsearch and Linodorm must match real application scenarios.
Internal Service Table Model Design
- Use the correct storage engine
- e.g. InnoDB recommended for MySQL
- Table design should be simple
- Do not store multi-dimensional data in one table
- Select the suitable table design model (1:1, 1:n, n:1) based on specific business scenarios
- Field design should be straightforward, with each column representing an independent attribute
- Store data clearly, simply, and independently
- Avoid storing similar types of data in multiple places.
- Do not store different types of data in the same table.
- A table should not have too many fields, generally no more than 20. Consider splitting the table if there are too many fields.
- Fields in the same table should be related. Split unrelated tables.
- For 1:n, n:n table design models, ensure database consistency at the code level. Avoid using foreign keys in the database (due to performance issues and concurrency deadlocks).
- Do not store multi-dimensional data in one table
- Tables must have a primary key (auto-increment, composite primary keys not recommended)
- Avoid Redundant Data Storage
- 🚓 Strictly adhere to the first normal form when creating tables (ensure atomicity of each column) to avoid duplicate data.
- 💪🏻 Tables should meet the second normal form (ensure every column in the table is related to the primary key) to avoid partial dependencies.
- 💪🏻 Tables should meet the third normal form (ensure each column is directly related to the primary key, not indirectly) to avoid transitive dependencies.
- Consider consistency and latency when setting up redundant fields for performance or usability reasons
- Denormalized field design can be considered if there is a significant improvement in performance
- Consider denormalized design if redundant fields facilitate storage
- However, prioritize data consistency and acceptable latency in denormalized designs
- Avoid Storing the Same Data in Multiple Places
- Developers should be clear about whether they are storing data from other internal services to avoid duplication
- Determine a unique external data source (such as IP) to ensure data accuracy. If uncertain, submit to the data committee for discussion
- Data warehouses should check and review based on actual usage
Field Standards
- 🚓 Tables, fields, etc., must have necessary English comments. If a field's numeric values represent different meanings, it should be clearly explained in the comments
- 🚓 All columns storing the same data must have consistent names and types
- user_id fields in different tables must have the same type
- 🚓 Use TIMESTAMP for storing millisecond-level time, not int type or strings
- (Because they lack automatic update attributes)
- 🚓 Prefer the smallest data type suitable for your storage needs
- e.g., prefer varchar over char
- 🚓 Allocate field lengths as per actual needs; avoid arbitrarily large capacities
- 💪🏻 Define all fields as NOT NULL
- 💪🏻 Use DECIMAL for storing precise floating-point numbers
- float can sometimes lead to precision issues.
- 💪🏻 Use large fields like blob, text sparingly
- 💪🏻 Allocate field lengths as per actual needs
- avoid arbitrarily large capacities.
- 💪🏻 Keep the number of fields in core tables as few as possible; consider splitting large fields
- 💪🏻 Use UNSIGNED for storing non-negative integers
- The same byte size can store a larger range of values
- 💪🏻 Use VARBINARY for storing case-sensitive variable-length strings or binary content
- 💪🏻 For VARCHAR(N), keep N as small as possible,
- MySQL limits the total length of all VARCHAR fields in a table to 65535 bytes
- Memory operations like sorting and creating temporary tables use the length of N for memory allocation
- 💪🏻 Store large fields and less frequently accessed fields in separate tables to separate cold and hot data
- 💪🏻 Ideally, keep the number of columns in a single table below 50
- 💪🏻 Default to creating timestamp fields for creation and modification times
Index Standards
- 🚓 Tables must have a primary key, which creates a primary key index
- The primary key column should not be frequently updated and preferably not be a string type
- Default to using a non-null and unique column
- Ideally, the primary key should be auto_increment and not null
- 🚓 Newly created unique indexes must not duplicate the primary key
- 🚓 Choose columns with high distinctiveness as indexes
- Avoid creating indexes on attributes that are frequently updated or have low distinctiveness
- 🚓 Anticipate SQL queries and create indexes in advance
- Especially time field indexes for data warehouse pulls
- Advised to add indexes for creation time and modification time
- Especially time field indexes for data warehouse pulls
- 🚓 Avoid performing mathematical and function operations on indexed columns
- Prevents the index from being filtered and unused
- 🚓 When creating composite indexes, place columns with high distinctiveness first
- 🚓 Ensure indexed fields are not NULL, considering default values
- NULL occupies space and significantly impacts the efficiency of index queries
- 🚓 Avoid creating an individual index for every column in a table
- 💪🏻 Limit the number of fields in a single index to 5 and the total number of indexes on a table to 5
- Index design should follow the B+ Tree index leftmost prefix match principle
- 💪🏻 Avoid indexing large fields like blob/text, as they consume excessive storage space
- 💪🏻 Use prefix indexing for strings, with the prefix index length not exceeding 128 characters
- Prefer prefix indexes and consider adding virtual columns for indexing if necessary.
- 💪🏻 Ensure created indexes cover 80% of the main queries. Focus on solving the primary issues rather than covering all possibilities.
- 💪🏻 Use unique indexes wherever possible to improve query efficiency.
- 💪🏻 For repetitive SQL statements with multiple fields, modify the condition fields' order and create a combined index to reduce the number of indexes.
- 💪🏻 Create appropriate indexes for DML, order by, and group by fields.
- 💪🏻 Important SQL must be indexed, with priority given to covering indexes, especially for fields involved in multi-table joins.
- 💪🏻 Avoid implicit index type conversion.
- 💪🏻 Avoid redundant indexes (do not add unnecessary indexes).
- 💪🏻 Avoid indexing frequently used small tables.
- 💪🏻 Minimize the use of foreign keys
- Implement referential integrity at the business end for more efficient indexing
- Explanation: Avoid operations on parent and child tables affecting each other, reducing usability
Database Development Standards
- 🚓 Do not store large data such as images and files in the database
- 🚓 It is recommended to keep the data volume of a single table below 20 million rows
- Note that this is not a MySQL database limitation, but large sizes pose significant challenges in modifying table structures, backups, and recovery
- MySQL does not limit storage, which depends on storage settings and file systems
- 🚓 Use InnoDB as the storage engine.
- 🚓 Table creation requires review and approval by the development team leader (TL)
- 🚓 Perform major table modifications during low business traffic periods
- 💪🏻 Use utf8mb4 as the database character set
- 💪🏻 Avoid using stored procedures, triggers, functions, etc.
- Can lead to tight coupling with the database
- 💪🏻 Avoid using JOINs with large tables
- MySQL's optimizer has overly simplistic strategies for optimizing joins
- 💪🏻 Avoid performing calculations in the database
- Complex computations should be moved to the business application.
- 💪🏻 Refuse large SQL statements, large transactions, and large batches
- Should be handled at application layer
- 💪🏻 When designing databases, consider whether future scalability has been taken into account.
- 💪🏻 Combine SQL statements, especially in DML, by merging multiple values to reduce interactions with the database.
- 💪🏻 Avoid negative queries such as 'not in' or 'like', as they can lead to full table scans and reduce buffer pool utilization.
- 💪🏻 Use 'union all' instead of 'union', as 'union' involves sorting operations for deduplication, leading to lower efficiency.
- 💪🏻 Avoid subqueries; it's advisable to convert them into join queries.
- 💪🏻 Avoid cross-database queries.
- 💪🏻 Implement reasonable pagination, especially for large pagination.
- 💪🏻 If UPDATE and DELETE statements do not use LIMIT and involve large data volumes, it can impact database performance and cause master-slave delays, possibly leading to inconsistencies.
- 💪🏻 Be cautious when deleting database objects (including tables, indexes, fields, etc.).
- 💪🏻 Use internal domain names instead of IPs to connect to databases.
- 💪🏻 Consider horizontal partitioning of log-type tables based on creation time and periodically archive historical data.
Naming Conventions
- 🚓 The permissible character range for names of tables, fields, etc., includes A-Z, a-z, 0-9, and underscore (_)
- No other characters are allowed
- Principally, table names should not include numbers, and field names should not include numbers unless in special cases
- 🚓 Database names, table names, and field names should be lowercase, follow the underscore style, not exceed 32 characters, be meaningful and intuitive, and preferably use nouns rather than verbs
- The names should be related to business or product lines, and mixing of pinyin and English is prohibited
- 🚓 Naming format for ordinary indexes: idx_tableName_indexFieldName (if there are multiple indexes starting with the first field name, add the second field name; abbreviate if too long)
- Unique index naming format: uk_tableName_indexFieldName (index names must be all lowercase, abbreviate if too long)
- Primary key index naming: pk_fieldName
- Sequence naming format: seq_tablename
- 🚓 Database names, table names, and field names must not use MySQL reserved words, such as desc, range, match, asc, index, etc.
- 🚓 Temporary table names must start with 'tmp' and end with a date suffix.
- 🚓 Backup table names must start with 'bak' and end with a date suffix.
- 🚓 Fields expressing a yes/no concept should be named using the format is_xxx.
- 💪🏻 Tables within the same module should use the same prefix as much as possible, and table names should be meaningful
- for instance, all log tables starting with 'log_'
- 💪🏻 Foreign key fields (if any) should be named using the format tableName_fieldName to represent their relationships
- 💪🏻 Fields with the same meaning across different tables should have the same name
- For fields with the same meaning across tables, use the naming format tableName_fieldName, omitting the module prefix
Security Standards
- 🚓 Prohibit storing plaintext passwords in the database; passwords must be encrypted before storage.
- 🚓 Developers are forbidden from directly exporting or querying data involving user-sensitive information for business colleagues. If necessary, such actions require approval from senior leadership.
- 🚓 Operations involving sensitive data interaction with the database must have audit logs, and alerts should be set up when necessary.
- 🚓 Implement a whitelist feature for IP addresses connecting to the database to prevent unauthorized IP access.
- 💪🏻 Regularly change usernames and passwords for online database connections.
- 💪🏻 Prohibit the plaintext storage of user-sensitive information, such as phone numbers, in the database