When working with MySQL it is common to choose auto-incrementing integer primary keys; however, there are situations where a GUID/UUID is desirable. For example, prior to MySQL 5.0 you could not safely use auto-incrementing primary keys in a multi-master replicated setup due to the potential (and inevitable) generation of duplicate keys on the multiple masters. (This issue can be avoided in MySQL 5.0 and greater by appropriately configuring auto_increment_increment & auto_increment_offset on each master.) Other reasons GUIDs/UUIDs are desirable include ease of merging data sets and non-sequential, unique identification for security (think credit card numbers). Regardless of the motivation for their use, there is a very real consequence I encountered while working on a rapidly growing MySQL InnoDB data set. In short, the larger the data set became, the longer it took to insert a new record. This behavior is not necessarily unexpected, but the degree to which the insert time grew became rather problematic. So, years later, I've decided to take a closer look to see if I could replicate this issue with MySQL 5.0.45, and if so, determine (to a degree) what was happening.
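For reference, the multi-master configuration mentioned above looks roughly like this (a two-master topology and these particular values are assumed for illustration):

```
# master 1 (my.cnf) -- generates ids 1, 3, 5, ...
auto_increment_increment = 2
auto_increment_offset    = 1

# master 2 (my.cnf) -- generates ids 2, 4, 6, ...
auto_increment_increment = 2
auto_increment_offset    = 2
```

With the increment set to the number of masters and each master given a distinct offset, the id sequences can never collide.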

Experimental Setup

Hardware: PowerEdge SC440 with a Core2Duo 2160 (dedicated for the duration of the experiment)
MySQL: 5.0.45
DDL
create table innodb_auto_increment (
  id integer auto_increment not null,
  primary key (id)
) engine=innodb;

create table innodb_integer (
  id integer not null,
  primary key (id)
) engine=innodb;

create table innodb_uuid (
  id char(36) not null,
  primary key (id)


Code
For the sake of brevity, I am not including the actual Java code used to run this experiment; however, I will explain the general idea. Essentially, the code truncates all tables and then loops through each table, inserting 100,000 records per transaction and recording the elapsed time. It should be noted that the ids for the innodb_integer and innodb_uuid tables are generated in the Java code and passed via a PreparedStatement.
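The harness was not published with the post, but its general shape can be sketched as below. The batch size matches the description; the no-op sink standing in for the real PreparedStatement insert (and the class/method names) are assumptions so the sketch runs standalone, without a database.

```java
import java.util.UUID;
import java.util.function.Consumer;

public class InsertHarness {
    static final int BATCH_SIZE = 100_000; // records per transaction, per the description above

    // ids for innodb_uuid were generated in Java; char(36) matches this textual format
    static String nextUuid() {
        return UUID.randomUUID().toString();
    }

    // One timed batch. In the real harness the sink would be
    // "insert into innodb_uuid (id) values (?)" executed via a PreparedStatement
    // inside a single transaction; here it is abstracted behind a Consumer.
    static long timedBatchMillis(Consumer<String> sink) {
        long start = System.nanoTime();
        for (int i = 0; i < BATCH_SIZE; i++) {
            sink.accept(nextUuid());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timedBatchMillis(id -> { /* executeUpdate() would go here */ });
        System.out.println("batch of " + BATCH_SIZE + " ids took " + elapsed + " ms");
    }
}
```

The elapsed time per 100,000-record batch is what the charts below plot against the number of records already in the table.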

And We're Off...

I wasn't sure how long I would have to wait for the expected results to show up, so I started it up and forgot about it for a couple of days. When I came back, here is what I found:


(how to read: it takes 20 minutes to insert 100,000 records into a UUID table which already contains 15 million records)


(how to read: it takes 25 hours to insert 15 million records into an empty UUID table)

Initial Thoughts

As you can see, both the auto-increment and manual-increment tables grow pretty gracefully; the UUID table, ya, not so much. So what is going on here? My first guess was that the unique index backing innodb_uuid's primary key was to blame, but I could not rule out the larger id field on innodb_uuid, or a poorly configured MySQL, as the cause. So, first I changed the MySQL config a bit, increasing various buffers by a factor of 5 and rerunning for a while. What I found is that increasing the buffers (specifically innodb_buffer_pool_size) did make a difference, but only for a brief period of time. Regardless of how much RAM you provide, this issue will rear its ugly head eventually. In order to narrow down the issue further, the previous experiment was run with the addition of these tables:

create table innodb_uuid_no_key (
  id char(36)
) engine=innodb;

create table innodb_uuid_no_key_indexed (
  id char(36),
  index index1 (id)
) engine=innodb;

create table innodb_uuid_no_key_unique_indexed (
  id char(36),
  unique index index1 (id)
) engine=innodb;
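For reference, the buffer changes mentioned above were along these lines; the values shown are illustrative, not the exact ones used in the run:

```
# my.cnf (each value roughly 5x its previous setting)
innodb_buffer_pool_size = 640M   # the setting that mattered most here
innodb_log_buffer_size  = 5M
key_buffer_size         = 40M    # MyISAM key cache, relevant for the MyISAM run
```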

Experiment Run #2

This time around I decided to watch vmstat while the experiment was running. At first everything was as expected: high I/O wait due to heavy writing. However, after a few million records a trend started to appear: when records were being inserted into innodb_uuid, the high I/O wait persisted, but the heavy writing gave way to very heavy reading with comparatively little writing. This read/write disparity grew with innodb_uuid's record count. Anyways, I allowed the second run to reach the same 28 million records as the first. The results for all UUID tables can be seen below:


(how to read: it takes 20 minutes to insert 100,000 records into a UUID table which already contains 15 million records)


(how to read: it takes 25 hours to insert 15 million records into an empty UUID table)

Conclusion

As you can see, innodb_uuid_no_key performs comparably to its integer counterparts, innodb_uuid_no_key_indexed exhibits the same trend as innodb_uuid to a much less severe degree, and innodb_uuid_no_key_unique_indexed is nearly identical to innodb_uuid. So the unique index appears to be the issue here. But why? Well, given the above information, I'd say that MySQL cannot buffer enough of the index to verify from memory alone that a value is unique, and is therefore forced to perform a tremendous amount of reading from disk on each insert to guarantee uniqueness.
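A quick way to see why random keys defeat the buffer: sequential integer keys always land at the "right edge" of the sorted key space, so the pages being checked and written are always the same hot few, while each random UUID lands at an effectively random position, touching a different (likely cold) index page every time. The sketch below demonstrates the scatter with a TreeSet standing in for the B-tree; it illustrates key ordering only, not actual InnoDB page behavior.

```java
import java.util.TreeSet;
import java.util.UUID;

public class KeyScatter {
    // Count how often a newly inserted key is NOT the current maximum,
    // i.e. how often it lands somewhere in the middle of the sorted key space.
    static int midInserts(int n, boolean sequential) {
        TreeSet<String> index = new TreeSet<>(); // stand-in for the primary-key B-tree
        int mid = 0;
        for (int i = 0; i < n; i++) {
            String key = sequential
                    ? String.format("%036d", i)     // integer-like, always ascending
                    : UUID.randomUUID().toString(); // random, scatters across the tree
            if (!index.isEmpty() && index.last().compareTo(key) > 0) {
                mid++; // landed below the current maximum: a "cold" position
            }
            index.add(key);
        }
        return mid;
    }

    public static void main(String[] args) {
        // Sequential keys never land mid-tree; nearly every UUID does.
        System.out.println("sequential mid-inserts: " + midInserts(10_000, true));
        System.out.println("uuid mid-inserts:       " + midInserts(10_000, false));
    }
}
```

Once the set of recently touched positions no longer fits in the buffer pool, almost every uniqueness check becomes a disk read, which matches the heavy-read pattern seen in vmstat above.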

That's Great and All, But What About MyISAM?

The MyISAM results were virtually identical (in scale) to those of InnoDB.