Learning in action!
It’s always fun when you get to apply something you’re learning in a class to real life, right when you’re learning it.
I’m taking a Database Internals class, and we’ve spent a lot of time learning about indexes – what kinds there are, how each kind works, the pros and cons, etc. This week we learned about implementing different query operators, different kinds of joins, etc.
Now, for those of you who don’t know, indexes help to make looking up certain information in a database faster. I’ve never see the actual effect myself – all of the databases that I’ve worked with have been little toy databases of my own (with barely any data in them), or pre-established databases that already had the appropriate indexes.
Last night, C needed some help working with some data for his work. He has an ENORMOUS spreadsheet that someone gave him, but he needed to make some changes to it. It either couldn’t be done in Excel or would take a ridiculous amount of time, so he asked if I could do it with an SQL query. I couldn’t actually think of a way to do it in a single query, but I did know a simple way to do it with a PHP script running a lot of SQL queries. So we loaded the data up into a MySQL database I created for it* (and at first the file was actually too large to import into the database!) and I ran my script. It took about a minute to run, maybe a little longer. I tweaked it a bit to fix a couple of errors, and then it worked perfectly.
When finished, I decided, just for fun, to add an index to the table on one of the fields that was used a lot in the query. Then I ran the script again, and it only took ~5 seconds! This was really exciting for me – yes, I’m a nerd. I really enjoyed getting to see the results of indexing in action, just like I’d learned about – and I especially love the fact that I know exactly how and why it worked!
Also, on a side note, computers are awesome. What would take hours and hours of tedious manual labor took only a few seconds with a script and a properly-indexed database. It still boggles my mind sometimes just how fast computers really are!
*Actually, getting the data exported from Excel and into the database was the hardest part, and took a lot longer than either of us expected – partially because the Excel file was so huge that it took several minutes every time we needed to do any operation on the data to get it ready for export.
Have you played with the query planner at all? Most DBMSs (and I don’t think MySQL is an exception) have a way to input a query, and instead of running it (or while running it), they’ll tell you what steps they intend to take when they do run it, and exactly how long each one can be expected to take. Often the command is called EXPLAIN. When you add an index or something, the query plan will change, and what was once the big, expensive part will be different (an index scan instead of a table scan, for example), and cheaper. It’s kind of a neat tool, and occasionally very useful.
Yeah, I used that a little bit in my intro to DB class. I didn’t think to do that when I was playing around with the index, though. We also used a cool tool for Postgres called a visual planner that let us see tree visualizations of the different plans that Postgres considered for a query (and their costs) so you could see why you ended up with the query plan that you did. It was pretty interesting.