Using Data Distributions to Write Better Code

I was recently asked to go back and look at some code to see if there was a way to speed things up because over time performance had decreased to unacceptable levels. Because I’m a DBA first and a developer second, I started with the SQLs, tables and indexes to see if there was anything obvious. There wasn’t, but I did find the following query being run a lot and it looked funky so I started to investigate.

select first 1
   col1
from
   tab1
where
   col1 = "A" or
   col1 = "AB" or
   col1 = "ABC" or
   col1 = "ABCD" or
   col1 = "ABCDE" or
   col1 = "ABCDEF" or
   col1 = "ABCDEFG" or
   col1 = "ABCDEFGH" or
   col1 = "ABCDEFGHI"
order by
   length(col1) desc

First of all, what is going on here?

There is a table, called tab1 (all names have been changed to protect the innocent) that has a char(n) field named col1 that holds some strings of varying lengths. The SQL is trying to find the row in tab1 with the col1 field that is the longest substring of “ABCDEFGHI”.

There is a unique index on col1 and the query runs fast enough, but this SQL is being run a lot in a tight loop for many different strings and the total ru…

Auteur : noreply@blogger.com (Andrew Ford)

No comments yet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.