Quick Merge in MySQL

This is just a quick tip. If you have a bunch of data that you need to concatenate, take a look at MySQL's GROUP_CONCAT function.

For example, way back when, I wrote about how to add tagging (and subsequently weighting those tags) to an application. It was a very simple approach with each record having a tags field. It was just a simple text field and I stored all the tags with spaces. However, to pull out a list of all tags being used in the system, I returned the tags field from all records, looped through each record, exploding the tags onto an array.

while($ftypes = mysql_fetch_array($rs))
{
  $types = array_merge($types, explode(" ", trim($ftypes[0]) ));
}

Using the GROUP_CONCAT function though, I no longer need to loop through them. I concatenate the list on the MySQL side, which returns me one record with one field.

select group_concat(tags SEPARATOR ' ') FROM posts

Then, on the PHP side, I no longer have to loop, I can just split the first record.

$ftypes = mysql_fetch_row($rs);
  $types = explode(" ", trim($ftypes[0]) );

This may not result in a huge reduction of code but from a performance perspective, I've been experiencing a couple interesting side effects. To test any performance advantages, I created a table with about two million records with some dummy data. The table only has one field to store my tags. The old method took about 23000ms for the first attempt and then an average of about 13000ms for each subsequent request. With the new GROUP_CONCAT method, the first attempt took on average about 7500ms. Already a speed improvement of about three times. But here's the kicker: due to caching, each subsequent request was under 15ms.

With this kind of result, I feel like I missed something somewhere but I ran these tests a few times and everything seems to be on the up-and-up. Inserting a new record into the database brought the select time back up to 7500ms but each call after that was back down to sub-15ms.

These performance tests don't even take into consideration the reduction in needing to loop through everything on the PHP end. All in all, a nice bonus.

Published August 22, 2006 · Updated September 14, 2006
Categorized as SQL
Short URL: http://snook.ca/s/645

Conversation

9 Comments · RSS feed
Nate K said on August 22, 2006

There are several instances where I found using the mysql functions to help processing - versus the programming end. I just have to evaluate each one for scalability - so if the database switches on me in the future, my code isnt broken.

I just decide where I keep the business logic for those matters, then adjust accordingly. The solution you have here seems much cleaner.

Ed Eliot said on August 22, 2006

It looks like a really useful function - thanks for posting info about it.

I think it may be more flexible however to have a separate tags table which is referenced from your main one. I think this would generally make tags management easier. As things currently stand if you wanted to return a distinct set of all tags used you'd be back to looping through PHP.

If instead you'd used a separate tags table you could have done something like:

select group_concat(distinct tag separator ' ') from tags

You could sort them at the same time by doing:

select group_concat(distinct tag order by tag asc separator ' ') from tags

Jonathan Snook said on August 22, 2006

Although I only eluded to it in the previous articles, I chose not to use a separate table because I didn't want to have to deal with creating multiple records or dealing with associations. Tagging for the most part was simply an enabler for search and therefore I didn't see a need for it. Mind you, with group_concat, I could have likely accomplished the same thing while still using a separate table. Always more than one way to skin a cat, as they say. :)

Andrew Kumar said on August 22, 2006

MySQL is powerful. I've been looking for a function index on MySQL, similar to phpFR widget for mac. Any ideas?

Jonathan Snook said on August 22, 2006

Andrew: Unfortunately, I don't know of any widgets. I'm a Windows guy so I use the CHM's made available on the MySQL site

Bill Mullin said on August 23, 2006

Mysql query Browser has a built in function Index, IT also provides ssyntax help and cut paste insert availability . Also gives you sql history as a helpful "how did i do that before: index.

Jeff J. Snider said on August 23, 2006

Jonathan, your link on "how to add tagging" is malformed, so the text of the link isn't actually showing up. Now I'll go read the rest of the post. :-)

Jonathan Snook said on August 23, 2006

Jeff: Ah, good catch. Fixed.

Andrew Kumar said on August 23, 2006

Thanks Bill and jonathan.

Sorry, comments are closed for this post. If you have any further questions or comments, feel free to send them to me directly.