kSQL 

The last few weeks I've been teaching SQL concepts at work through code reviews and examples. Not everyone learns best this way. Me? I'm a doer. Show me or let me try it myself. Failing that, give me a good analogy to chew on. The following is a short story I dropped in our Business Intelligence Slack channel yesterday and I thought it was worth sharing. Keep in mind this was written for a target audience of Data Analysts.

- - - - -

The Survey Monkey BI team decides they're going to make a small flower garden up on the roof. We know what we want to buy, and we know what the plants look like, and we know how many we need. We all load into cars and Ubers and head over to a really, really big plant nursery. It's like, 10 acres big. 

We get to the closest plant nursery and we're horrified to find that the owner is crazy ... there are no rows of plants and trees. There's just one giant mess of plants strewn everywhere. Zero organization. It would take all of us wandering up and down every path and around every pallet to try and find the plants we want.  


This is the same as a table without a good clustered index. The table is the acres of nursery and the plants are the data and the indexes are the organization strategy. Indexes are not a thing you can see, but they have a critical affect on the grounds and the layout of the plants both physically and logically. A table scan is the same as us all wandering around, hoping to eventually find the right flower.


Screw this, let's go find a plant nursery that's organized!

We drive to a second nursery. This one is MUCH better. The plants are in neat rows, and they're organized by species and color! Akanksha instructs us to go grab one of every flower, bring it back to the checkout, and we'll pick what we want and make the purchase. We're like, " ... but .... there's so many plants!" This really doesn't seem efficient, even though we would technically end up with the right plants in the end. It's a lot of work for us, Vikram has to pee, and Elena is getting mad.


In this case, our team is like worker threads on a SQL server. Yeah, you CAN send us all out (<cough> SELECT * <cough>), we'll bring back waaaaay more than we need, and then pick out the right results to get the correct set in the end. But there's a better way . . .

Our shopping list is 6 orchids, 3 cacti, and 1 palm tree. The nice owner of the nursery gives us a map, Yusi runs off towards the palm trees, Jill goes for the cacti, and Anton gets to find orchids. They all come back, but Jill has brought us just 1 cacti! Geeze, Jill! Why?! Well, it turns out Jill thought maybe the cacti looked kinda sick. She brought us one and was going to run back and forth until we found 3 we like.  "No", says Elena. "Go get them all and we'll pick the best three".


This is why you put data that you're going to re-use into a temporary table instead of going back and forth and pulling the same data over and over again during a process. You grab a set and then work from that.

It was a very nice garden by the end of the day, and it was made efficiently.


The End.


- - - - -


It was well received, although one analyst and a DBA were miffed that they weren't included so I promised next week they can star in the story and it'll have zombies and tabasco sauce. Also, I was informed there's a plot hole and we never find out if Vikram gets to pee. Tough luck, buttercup.