| id_a | id_b | id_c |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 7 | NULL | NULL |
| NULL | 4 | 4 |
| NULL | 5 | 5 |
| NULL | NULL | 6 |
Just Some Things to Read
Query with multiple full outer joins
22 February 2020
Last week I got the question to help someone with a query because the query result was not as expected. At first sight the query was pretty straight forwarded. But there was one thing he used multiple full outer joins and I asked why this was needed from a functional viewpoint. At that moment I already realized that the unexpected query result was related to the multiple full outer joins. I failed to explain this because I couldn't find the right words. The next day I explained it with some examples and some Venn diagrams. Before I can show the expected and the actual result we need to create some tables and populate it with some fake data - it's obvious I can't share the real data - in our SQL Server database:
drop table if exists #a;
drop table if exists #b;
drop table if exists #c;
create table #a (id_a int);
create table #b (id_b int);
create table #c (id_c int);
insert into #a (id_a) values (1), (2), (3), (7);
insert into #b (id_b) values (1), (2), (3), (4), (5);
insert into #c (id_c) values (1), (2), (3), (4), (5), (6);
id_a id_b id_c
1 1 1
2 2 2
3 3 3
7 NULL NULL
NULL 4 NULL
NULL 5 NULL
NULL NULL 4
NULL NULL 5
NULL NULL 6
The used query to get the actual result:
select *
from #a
full outer join #b on #a.id_a = #b.id_b
full outer join #c on #a.id_a = #c.id_c;
Due the above query all records from the tables #a, #b and c# are shown and the records with ID 4 and 5 are duplicated because table #a is the driving table and there is no link between table #b and #c.
Below a Venn Diagram to visualize this. The person who wrote the query thought that those records will be appeared once.
The get the below expected result we need to use an extra join condition in the query:.
select *
from #a
full outer join #b on #a.id_a = #b.id_b
full outer join #c on #a.id_a = #c.id_c
or #b.id_b = #c.id_c;
With the above query we will get the below result:
| id_a | id_b | id_c |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 7 | NULL | NULL |
| NULL | 4 | 4 |
| NULL | 5 | 5 |
| NULL | NULL | 6 |
With the added join condition - the "or #b.id_b = #c.id_c" - we have a link between table #b and #c. See also the Venn diagram below.
My personal lesson is that I should use the Venn diagrams more often to explain query results. Deep down in my small brains I already knew...
But it is always good to get a nice reminder :-)