Query with multiple full outer joins

22 February 2020

Last week I got the question to help someone with a query because the query result was not as expected. At first sight the query was pretty straight forwarded. But there was one thing he used multiple full outer joins and I asked why this was needed from a functional viewpoint. At that moment I already realized that the unexpected query result was related to the multiple full outer joins. I failed to explain this because I couldn't find the right words. The next day I explained it with some examples and some Venn diagrams. Before I can show the expected and the actual result we need to create some tables and populate it with some fake data - it's obvious I can't share the real data - in our SQL Server database:


drop table if exists #a;
drop table if exists #b;
drop table if exists #c;

create table #a (id_a int);
create table #b (id_b int);
create table #c (id_c int);

insert into #a (id_a) values  (1), (2), (3), (7);
insert into #b (id_b) values  (1), (2), (3), (4), (5);
insert into #c (id_c) values  (1), (2), (3), (4), (5), (6);

Below the expected result:

id_aid_bid_c
111
222
333
7NULLNULL
NULL44
NULL55
NULLNULL6
Below the actual result:

id_aid_bid_c
111
222
333
7NULLNULL
NULL4NULL
NULL5NULL
NULLNULL4
NULLNULL5
NULLNULL6

The used query to get the actual result:


select *
from #a
    full outer join #b on #a.id_a = #b.id_b
    full outer join #c on #a.id_a = #c.id_c;
Due the above query all records from the tables #a, #b and c# are shown and the records with ID 4 and 5 are duplicated because table #a is the driving table and there is no link between table #b and #c. Below a Venn Diagram to visualize this. The person who wrote the query thought that those records will be appeared once.

Venn Diagram actual result

The get the below expected result we need to use an extra join condition in the query:.


select *
from #a
    full outer join #b on #a.id_a = #b.id_b
    full outer join #c on #a.id_a = #c.id_c
                        or #b.id_b = #c.id_c;

With the above query we will get the below result:

id_aid_bid_c
111
222
333
7NULLNULL
NULL44
NULL55
NULLNULL6

With the added join condition - the "or #b.id_b = #c.id_c" - we have a link between table #b and #c. See also the Venn diagram below.

Venn Diagram actual result

My personal lesson is that I should use the Venn diagrams more often to explain query results. Deep down in my small brains I already knew...
But it is always good to get a nice reminder :-)