Performance issue insert select

11 May 2022

A while ago I noticed a performance issue with a view in Azure SQL Server. The returned result set of this view is inserted into a table; just like this insert into <table> (<fields>) select <fields> from <view>. In this view we are joining tables and views. And those views are also joining other tables and views. So in short we have a view on a view construction. I know SQL Server is not always happy with this but from a data logistics point of view it is a simple and nice solution.

I still believe a view on a view construction is not bad and should not have an impact on performance. If it has then we have a code issue and/or a problem in the technical data model e.g. a missing index. But sometimes it is what it is and we can not change easily the techinical data model. And often the data and business rules are complex and so the code/SQL. Sometimes we need to persist a view for performance reasons. In the end it is finding the right balance between performance, Azure costs, easy development and we should be able to deliver the data right on time as agreed with our stakeholders.

Okay back to the performance issue. It turns out that inserting all returned records into an empty table takes about 11 minutes for 50k records. A clustered columnstore index is the only applied index on that table before inserting records and this one is not really needed as we take the number of records into account. But using clustered columnstore indexes is part of the homebuild framework.

My first thought was that this performance issue should be an easy one to solve with adding the option force order. In my experience with a view on a view construction this option might improve the execution of a view. But in this case it doesn't help, I cancel the query after 45 minutes. I forgot it was still running.... So, I decided to test several "insert - select" options (and in combination if possible) to investigate in this case the best option(s):


Below the average run time in minutes of the various scenario's I tested

ScenarioDegree of ParallelismRun Time (avg in min)
CCIX without tablock and force order111:32
CCIX with tablock and without force order14:43
CCIX with tablock and force order15:05
CCIX without tablock and with force order1after 20 min query is cancelled
TC without tablock and force order111:32
TC with tablock and without force order160:32
TC with tablock and force order10:34
TC without tablock and with force order1after 20 min query is cancelled
In this case I have chosen to populate the table with table compression and tablock.

Just some background of the test setup: Below the query to see the Degree of Parallism (field dop in the query) and other properties as well:
select
	req.session_id
,	req.start_time
,	req.status
,	req.command
,	DatabaseName = db_name(req.database_id)
,	req.blocking_session_id
,	req.wait_type
,	req.last_wait_type
,	ssn.login_name
,	ssn.program_name
,	ObjectName = object_name(qtext.objectid, qtext.dbid)
,	qtext.text
,	qplan.query_plan
,	req.dop
,	req.reads
,	req.writes
,	req.logical_reads
,	req.total_elapsed_time
,	req.parallel_worker_count

from sys.dm_exec_requests as req
	cross apply sys.dm_exec_sql_text(req.sql_handle) as qtext
	cross apply sys.dm_exec_query_plan(req.plan_handle) as qplan
	inner join sys.dm_exec_sessions as ssn on req.session_id = ssn.session_id

where req.session_id <> @@spid;
Happy coding!