When working with MySQL it is common to choose auto incrementing integer primary keys; however,
there are situations where using a GUID/UUID is desirable. For example, prior to MySQL 5.0, you
were unable to safely use auto incrementing primary keys in a multi-master replicated setup due to
potential (and inevitable) duplicate key generation on the multiple masters. (This issue can be
avoided in MySQL 5.0 and greater by appropriately configuring auto_increment_increment &
auto_increment_offset on each master.) Other reasons GUIDs/UUIDs are desirable include: data set
merging ease and non-sequential, unique identification for security (think credit card numbers).
Regardless of the motivation for their use, there is a very real consequence I encountered while
working on a rapidly increasing MySQL InnoDB data set. In short, the larger the data set became,
the longer it took to insert a new record. This behavior is not necessarily unexpected, but the
degree to which the insert time grew became rather problematic. So, years later, I've decided
to take a closer look to see if I could replicate this issue with MySQL 5.0.45, and if so,
determine (to a degree) what was happening.
Experimental Setup
| Hardware: |
PowerEdge SC440 with a Core2Duo 2160 (dedicated for the duration of the experiment) |
| MySQL: |
5.0.45 |
DDL
create table innodb_auto_increment (
id integer auto_increment not null,
primary key (id)
) type=innodb;
create table innodb_integer (
id integer not null,
primary key (id)
) type=innodb;
create table innodb_uuid (
id char(36),
primary key (id)
) type=innodb;
Code
For the sake of brevity, I am not including the actual Java code used to run this experiment; however,
I will explain the general idea. Essentially, the code truncates all tables and then proceeds to loop
through each table, inserting 100,000 per transaction, recording the time elapsed. It should be noted
that id's for the innodb_integer and innodb_uuid tables are generated in the Java code and passed
via a PreparedStatement.
And We're Off...
I wasn't sure how long I would have to wait for the expected results to show up, so, I started it up
and forgot about it for a couple of days. When I came back here is what I found:
(
how to read: it takes 20 minutes to insert 100,000 records into a UUID table which already
contains 15 million records)
(
how to read: it takes 25 hours to insert 15 million records into an empty UUID table)
Initial Thoughts
As you can see both the auto increment and manual increment tables grow pretty gracefully and the UUID
table, ya, not so much. So what is going on here? My first guess was the implicit unique index on
innodb_uuid's primary key was to blame, but I was
not able to say for sure that it simply was not the fact that the id field was bigger on innodb_uuid or that
MySQL was configured poorly. So, first I changed the MySQL config a bit, increasing various buffers by a factor
of 5 and rerunning for a while. What I found is that increasing the buffers (specifically
innodb_buffer_pool_size) did make a difference, but only for a brief period of time. Regardless of how
much RAM you provide, this issue will rear its ugly head eventually. In order to narrow down the issue
further, the previous experiment was ran with the addition of these tables:
create table innodb_uuid_no_key (
id char(36)
) type=innodb;
create table innodb_uuid_no_key_indexed (
id char(36),
index index1 (id)
) type=innodb;
create table innodb_uuid_no_key_unique_indexed (
id char(36),
unique index index1 (id)
) type=innodb;
Experiment Run #2
This time around I decided to watch vmstat while the experiment was running. At first, everything
was as expected, high IO wait due to heavy writing; however, after a few million records a trend started
to appear: when records were being inserted into innodb_uuid the high IO wait persisted
but the heavy writing was replaced with very heavy reading and much lower writing. This read/write
disparity grew with innodb_uuid's record count. Anyways, I allowed the second run to reach the
same 28 million records as the first. The results for all UUID tables can be seen below:
(
how to read: it takes 20 minutes to insert 100,000 records into a UUID table which already
contains 15 million records)
(
how to read: it takes 25 hours to insert 15 million records into an empty UUID table)
Conclusion
As you can see innodb_uuid_no_key performs closely to its integer counterparts,
innodb_uuid_no_key_indexed exhibits the same trend as innodb_uuid to a much less severe degree and
innodb_uuid_no_key_unique_indexed is nearly identical to innodb_uuid. So the unique index appears to
be the issue here. But why? Well, given the above information, I'd say that MySQL is unable to
buffer enough data to guarantee a value is unique and is therefore caused to perform a tremendous
amount of reading for each insert to guarantee uniqueness.
That's Great and All, But What About MyISAM?
The experiment is still running. I will update when it has finished...