Using INNER JOIN/SELF JOIN to allow for smaller indexes.

by Allan Svelmøe Hansen May 15, 2009 16:26

In databases, I often need foreign keys in my tables, because I’ll want to use them to select content out from my tables. However this can often result in either bad index utilization in the selection or making additional indexes based on the foreign key(s) and the content I need to select out.
This in turn can result in ending up with “many” indexes and sometimes many big indexes.

However a method can be to make smaller indexes and use an INNER JOIN to join into the table an extra time.

I’m going to show the pattern with a relative simple example to illustrate, because it is about the pattern more so then the specifics layout, content and size of the table.
It is just a pattern/technique to keep in mind and have in the SQL toolbox.
Suppose you have two tables of a similar pattern to this:

Tables for self join example
(click for larger size)

I’ve let the SQL Server create my clustered indexes based on the primary key, which means I have a clustered index over my primary key(s).
When having to look up rows in JoinOne based on the foreign key, it’ll often look like this:

SELECT 
FROM 
JoinOne T1
WHERE FKOne <VALUE>
  

Because I’ve added no other indexes to the tables, I’ll get an Index Scan or Table Scan when running the above query, which - as we - know is not a good thing most of the time.
This usually leads to the creating of a second index which indexes over my foreign key.
However if the table is large, and if I’ll need to extract many/most of the columns (like in my example with SELECT *), it can mean I’ll have to make either a complete index or one with many included columns, just ordered by my foreign key first.

If I make a small index with my foreign key and my primary key second, I can use this to join into my table again, and then after that use my primary key.
This is an index to illustrate:

Index for self join example 
(click for larger size)

Then I can make the following query:

SELECT T2.
FROM 
JoinOne T1
INNER JOIN JoinOne T2 
ON
 
T1.PKOne 
T2.PKOne
 
AND T1.PKTwo 
T2.PKTwo
 
AND T1.PKThree 
T2.PKThree
WHERE T1.FKOne <VALUE>
  

When looking at its execution plan, it’ll show two index seeks instead of my previous index scan. This way I have a small index as possible, but maintain seeks in my execution.
Of course there are some considerations one need to take with this pattern. Firstly – if the tables are “small”, then the scan in itself might be alright, or the overhead of keeping a complete index for the foreign key is neglible.

However the advantage of the pattern is that I can have smaller indexes on (very) large tables, which means less overhead when inserting/updating – but still have only index seeks in my selections execution plans.
It is a useful technique in my opinion – when used at the right times, which of course is on a case by case evaluation as always.  

Bookmark and Share DotnetKicks dotnetshoutout

Delete using join-queries

by Allan Svelmøe Hansen April 24, 2008 18:29

Occasionally I run into having to clean or modify data using a range of more or less complex conditions.
Then the following syntax comes in handy, because it is actually possible to use complex queries utilizing joins, to delete information from.

The example is realtive simple but it illustrates the point quite well ....... I hope :)

DELETE T1
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.TableOneID T2.TableOneID

As can be seen the syntax is a little bit different from a normal delete. This query will delete all rows in T1 aka Table1 which would be selected out by my query.

A good rule when building such queries to delete from is to build them up as selects, and then when you have selected the correct data out, you can substitute the SELECT with a DELETE <ALIAS>

I use this type of synax a lot in at work and just wanted to share it, in case it wasn't as known for people. It can quickly become a powerful tool for deleting the correct data when normal subqueries aren't useful.

Bookmark and Share DotnetKicks dotnetshoutout

Tags: , ,

SQL

SQL: Watch those join clauses

by Allan Svelmøe Hansen November 12, 2007 13:40

Watch those join clauses.

One of the statements I’ve heard a couple of times over the last couple of years, is that it doesn’t matter where you place your additional on-clauses when joining, meaning either in “on” or in “where”. Well it does, depending on what you want to extract of information from your database. Performance wise and result wise.

An easy way to show this is to take two tables.
Suppose I have table (“TableOne”) consisting of a TableOnePK (primary key) and a value, and a table two “TableTwo” consisting of TableOneFK (foreign key) and a value.

If I then do a left join like:

SELECT *
FROM TableOne T1
LEFT JOIN TableTwo T2 on T1.TableOnePK = T2.TableOneFK AND T2.TableTwoValue = <target value>

This will get you all the rows from TableOne with nulls for the left join where it didn’t match anything in TableTwo. Pretty much as you would expect.

Now change the statement to

SELECT *
FROM TableOne T1
LEFT JOIN TableTwo T2 on T1.TableOnePK = T2.TableOneFK
WHERE T2.TableTwoValue = <target value>

And now you’ll filter TableTwo prior to the left join and then left join that result into your TableOne, meaning you’ll get a vastly different result then the first one.

Thus one need to consider such things when doing joins as if not careful, it is possible to get a vastly different join result then otherwise intended.

Bookmark and Share DotnetKicks dotnetshoutout

SQL Fun: JOIN's

by Allan Svelmøe Hansen September 26, 2007 13:35

Having started to get better at SQL I’m looking at ways to optimize, change and simply parts of my own methodologies when I use SQL.  This is from MS SQL 2003/2005, as it is what I use most, but I doubt it would matter much anyway.

I’ve started to notice that you can use SQL joins for much more when joining tables then equaling one column to another. This is something I’ve found great usage for and in my view is a nice little thing to be aware of.

Normally – joins are explained something along the line of this (using Inner Join but I guess it doesn’t matter much).
Suppose you have two tables, which needs to be joined:

SELECT *
FROM TableOne AS T1
INNER JOIN TableTwo AS T2 ON T1.ColumnToCompare T2.ColumnToCompare
 

This is the way many see joins and utilize them.

Now for a situation I’ve used joins for.

When you map a 1:1 parent-child relationship, and aren’t using nested SQL, then a normal way is to map the child instance with a parent foreign key.
If having a longer parent-child relationship, you could extend the child with a “path” through its parents.
For instance
TableA
Field: ID
Field: ParentID
Field: ParentPath

Very much simplified, then the upmost parent instance could have ID 1 and would have ParentPath 0 and ParentID 0 because it has no parents. The child of ID 1 would thus for example have ID 2 and thus have ParentPath ‘0,1’ and ParentID 1.
Child of ID 2 would thus for example have ID 3 and thus ParentPath ‘0,1,2’ and ParentID 2.

Now to find the parent and the parent’s parent and onwards up, you could utilize a cursor or some other sort of recursive lookup, however trying my best to avoid cursors, I started to utilize the INNER JOIN for this instead.
This will incorporate the usage of PATINDEX, which illustrates the powers of a JOIN statement.


SELECT ID
FROM TableA AS T1
INNER JOIN TableA AS T2
 
ON PATINDEX('%,' CAST(T1.ID AS NVARCHAR) + ',%'',' T2.ParentPath ',') > 1
 

What it does, is that by appending commas to the ID of the T1 of TableA and checking to see if it is located within the ParentPath of T2 of TableA (also appended with commas), you will get the parent, and the parent’s parent and so onwards up the hierarchy.
This means, that in the very simplified situation mentioned, that you’re basically looking for all instances where “,ID,” from TableA is part of the TableA “,ParentPath,”.
It would in the example in question return the ID’s 1 and 2.
You could easily get ID 3 out as well, simply by adding the T1.ID to the end of the ParentPath of T2.

Of course a string search is not the world’s fastest method, but compared to some of the alternatives (cursor for instance), it does provides benefits.

However this also was mainly to show that JOINs are not just a direct column to column matching and it opens up for many tools for using SQL to do stuff.
It's just experimenting and finding what works for each situation.

Bookmark and Share DotnetKicks dotnetshoutout

Tags: ,

SQL

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant

About:
Allan Svelmøe Hansen

My real name is Allan Svelmøe Hansen.
I live in Denmark, where I work as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004.
My primary fields of expertise is back-end data integration, database design and optimization.


       View Allan Svelmøe Hansen's profile on LinkedIn     

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and does not represent my employers view in any way, nor are my results guaranteed for all situations.
Content is presented "as is", with no warranty.