MySQL RANGE Partition
When partition doesn’t help
- A lot of relations in the table.
- A small table and won’t grow in the future.
- Don’t use a partition unless you have 1M rows.
It’s tempting to believe that partition will solve performance problems. But it is so often wrong. Partitioning splits up one table into several smaller tables. But table size is rarely a performance issue. Instead, I/O time and indexes are the issues.
What is Partitioning?
Partitioning is a physical database design technique that many data modelers and DBAs are quite familiar with. Although partitioning can be used to accomplish a number of various objectives, the main goal is to reduce the amount of data read for particular SQL operations so that overall response time is reduced.
- Horizontal Partitioning – this form of partitioning segments table rows so that distinct groups of physical row-based datasets are formed that can be addressed individually (one partition) or collectively (one-to-all partitions). All columns defined to a table are found in each set of partitions so no actual table attributes are missing. An example of horizontal partitioning might be a table that contains ten years worth of historical invoice data being partitioned into ten distinct partitions, where each partition contains a single year’s worth of data
- Range – this partitioning mode allows a DBA to specify various ranges for which data is assigned. For example, a DBA may create a partitioned table that is segmented by three partitions that contain data for the 1980’s, 1990’s, and everything beyond and including the year 2000.
- Hash – this partitioning mode allows a DBA to separate data based on a computed hash key that is defined on one or more table columns, with the end goal being an equal distribution of values among partitions. For example, a DBA may create a partitioned table that has ten partitions that are based on the table’s primary key.
- Key – a special form of Hash where MySQL guarantees even distribution of data through a system-generated hash key.
- List – this partitioning mode allows a DBA to segment data based on a pre-defined list of values that the DBA specifies. For example, a DBA may create a partitioned table that contains three partitions based on the years 2004, 2005, and 2006.
- Composite – this final partitioning mode allows a DBA to perform sub-partitioning where a table is initially partitioned by, for example range partitioning, but then each partition is segmented even further by another method (for example, hash).
- Vertical Partitioning – this partitioning scheme is traditionally used to reduce the width of a target table by splitting a table vertically so that only certain columns are included in a particular dataset, with each partition including all rows. An example of vertical partitioning might be a table that contains a number of very wide text or BLOB columns that aren’t addressed often being broken into two tables that has the most referenced columns in one table and the seldom-referenced text or BLOB data in another.
Why the Range partition is the best choice in my case
CREATE TABLE `HISTORY` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`symbol` varchar(16) NOT NULL,
`open` float DEFAULT NULL,
`high` float DEFAULT NULL,
`low` float DEFAULT NULL,
`close` float DEFAULT NULL,
`volume` double DEFAULT '0',
`exchange` varchar(8) NOT NULL,
`timestamp` int(16) NOT NULL DEFAULT '0',
`date` varchar(16) DEFAULT NULL,
`update_date_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `symbol` (`symbol`),
KEY `exchange` (`exchange`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=144753037 DEFAULT CHARSET=utf8;
My table structure is very simple. We will store Citigroup Inc daily stock data. Example dataset in CSV.
Since I have another 30K different company stock data like Citygroup and my total rows of this table is 128,947,460 at this moment. A Range partition is perfect in this case.
I will partition the table by year, like all the 2018 data in a partition called ‘p_2018’ and for the year 2019 it is ‘p_2019’.
My primary key is ID which is AUTO_INCREMENT. I have to find the ID of the first trading day of the next year 2019 (1st January), then I can create a partition for all 2018 data. ID 56839003 is the first trading of the year 2019.
PARTITION p_2018 VALUES less than (56839003)
Now p_2018 has all the data from 2018, we can create another partition for 2019, 2020, and so on.
How my existing SQL query gonna affected
None of your SQL needs to be changed.
But you can take advantage of the partition by adding your partition name in your SQL SELECT, INSERT queries.
SELECT * FROM `HISTORY` PARTITION (`p_2018`) WHERE `symbol`='C';
PARTITION SQLs
SHOW PARTITION
SHOW CREATE TABLE HISTORY
ADD PARTITION
ALTER TABLE `HISTORY`
PARTITION BY RANGE(`id`)
(
PARTITION p_2018 VALUES less than (56839003),
PARTITION p_2019 VALUES less than (61466903),
PARTITION p_others VALUES LESS THAN MAXVALUE
);
DROP PARTITION
ALTER TABLE t1 DISCARD PARTITION p_2018, p_2019 TABLESPACE;
DELETE PARTITION
All partitions
ALTER TABLE t1 REMOVE PARTITIONING;
One Partition
ALTER TABLE t1 DROP PARTITION p_2018, p_2019;
IMPORT PARTITION
Import Tablespace Partition
ALTER TABLE t1 IMPORT PARTITION p_2018, p_2019 TABLESPACE;
REORGANIZE PARTITION
ALTER TABLE SIGNALS
REORGANIZE PARTITION p_others INTO (
PARTITION p_2018 VALUES less than (91466903),
PARTITION p_2019 VALUES LESS THAN MAXVALUE
);
SELECT PARTITION
Rows of each partition using information_schema
SELECT PARTITION_ORDINAL_POSITION, TABLE_ROWS, PARTITION_METHOD FROM informatio_schema.PARTITIONS WHERE TABLE_SCHEMA = 'table' AND TABLE_NAME = 'HISTORY';
Few things to remember
- Duplicate your database before you try any partitions.
- You have to find your partition method. RANGE partition may not be suitable in your case.
- Run the same query before and after the partition and compare the time it takes.