  • Writer: Victor Escott
  • Aug 13, 2020
  • 2 min read

They do. Table variables are not affected by transactions, so we must be careful when using them alongside transactions, since that can lead to unexpected results.


Here is a quite common scenario that I have seen in production environments:


Insert data into a table, use OUTPUT to capture the records that were inserted, and finally do something weird with those records... I know, we do weird stuff with SQL Server.


If you are not familiar with OUTPUT, here is one of my previous posts where I explain it a little: SQL-How to capture the newly generated identity values after a multi-row insert?

The lab

The following script is very simple; it does the following:

  1. Create a table and a table variable.

  2. Open a transaction.

  3. Insert some data and capture such data with OUTPUT.

  4. Rollback the transaction.

  5. Show the data.

--\\ Drop the table
DROP TABLE IF EXISTS MyNewTable;

--\\ Create main table
CREATE TABLE MyNewTable (	ROW_ID INT IDENTITY(1,1) NOT NULL
						,	myValue_1 INT NULL
						,	myValue_2 INT NULL);

--\\ Declare table variable
DECLARE  @MyAudit TABLE (	ROW_ID INT NOT NULL
						,	myValue_1 INT NULL
						,	myValue_2 INT NULL);

--\\Begin Transaction
BEGIN TRANSACTION;
--\\ INSERT DATA
INSERT INTO MyNewTable

	--\\ Output the data 
	OUTPUT INSERTED.ROW_ID
		,  INSERTED.myValue_1
		,  INSERTED.myValue_2 
	INTO @MyAudit

	--\\ This generates the rows to insert.
	SELECT TOP 4
			myValue_1= ROW_NUMBER() OVER (ORDER BY a.name)% 4
		,	myValue_2= ROW_NUMBER() OVER (ORDER BY a.name)% 10
	FROM sys.all_objects a
	CROSS JOIN sys.messages c;

--\\Rollback
ROLLBACK;

--\\Show the data inserted in main table (remember transaction was rolled-back)
SELECT *
FROM MyNewTable;

--\\Show the data that was captured with Output
SELECT *
FROM @MyAudit;

The script will return the following results:


The main table does not return any rows due to the rollback, but the table variable was not affected by the transaction.


If you have any questions or suggestions, please let me know in the comments.

Enjoy!!

  • Writer: Victor Escott
  • Jul 29, 2020
  • 3 min read

Every time I tell somebody that percentage costs in execution plans lie and you should not trust them when tuning a query, I get different reactions/questions such as:

You are the one lying!

Can you prove it?

Why would they put it there if it is not right?

Every time I get these questions, I share a nice blog post by my friend Forrest [t,b]. You should read it before you start reading my craziness; he is way smarter than me :).

Yesterday, someone came back asking a ton of questions, insisting that the percentages were wrong because the statistics were forced, etc. I tried to explain that the main point was proven, and that statistics are not even close to perfect in a production environment, but it was not enough to convince them, so I decided to share my screen and demonstrate it in a live session.


Here is a script to create our lab: one table and one UDF.

--\\Create table
CREATE TABLE [dbo].[UserProfile] (UserID INT  PRIMARY KEY CLUSTERED IDENTITY(1,1)
,AboutMe NVARCHAR(1500) NOT NULL
, MyValue1 INT NOT NULL);
GO
--\\ Insert some data :)
INSERT INTO [dbo].[UserProfile](AboutMe,MyValue1)
SELECT TOP (2000000)    AboutMe= REPLICATE( LEFT(CAST(NEWID() AS VARCHAR(520)), 5), 3)
				,   ROW_NUMBER() OVER (ORDER BY a.name)%3
FROM sys.all_objects a
CROSS JOIN sys.messages c
UNION ALL
SELECT TOP (2000000)    AboutMe= REPLICATE( LEFT(CAST(NEWID() AS VARCHAR(520)), 5), 3)
				,   3
FROM sys.all_objects a
CROSS JOIN sys.messages c;
GO
--\\Create an index to help our query.
CREATE INDEX IX_Value1 ON [dbo].[UserProfile] (MyValue1) WITH FILLFACTOR=100;
GO
--\\ Create scalar function.
CREATE OR ALTER FUNCTION dbo.Total(@MyValue1 INT)  
RETURNS BIGINT  
WITH SCHEMABINDING
AS   
BEGIN    
    DECLARE @Total BIGINT;        
	SELECT @Total=COUNT_BIG(*) 
	FROM [dbo].[UserProfile]
	WHERE MyValue1=@MyValue1;

    RETURN @Total;    
END;
GO

Once the table is ready, we are going to run the following queries; please make sure you include the Actual Execution Plan.

Note: Compatibility level is set to 140 (SQL 2017)
--\\ Compare the function and raw query
--\\ Include Actual Execution Plan (Ctrl + M)

SELECT Total=COUNT_BIG(*) 
FROM [dbo].[UserProfile]
WHERE MyValue1=3;

SELECT dbo.Total(3) as Total;

Both queries are identical: same logic, same results, and same number of reads; the only difference is the UDF. What about the percentages? Are those identical? Not sure about those, please see below...


SQL Server cannot expand the logic behind the UDF; therefore, the percentages are lying.


I am aware of SQL Server 2019 and FROID, but let's be honest, we are not upgrading soon, and FROID is not going to fix every UDF for you.

The above queries were not manipulated, so what happens if we confuse SQL Server a little bit?


Let's run these three queries:

SELECT  A.MyValue1
	   ,A.AboutMe
FROM [dbo].[UserProfile] A  
WHERE A.MyValue1=3
ORDER BY A.MyValue1
OPTION(FAST 10); --\\ Give me the first 10 rows as fast as you can.

SELECT TOP (10) A.MyValue1
              , A.AboutMe
FROM [dbo].[UserProfile] A
WHERE A.MyValue1=3
ORDER BY A.MyValue1; --\\ Let SQL server handle.


SELECT TOP (10) A.MyValue1
              , B.AboutMe 
FROM [dbo].[UserProfile] A 
		INNER LOOP JOIN [dbo].[UserProfile] B --\\ Forcing a Loop Join and self-reference, expecting SQL Server to seek using the clustered and nonclustered indexes.
				ON A.UserID= B.UserID
WHERE A.MyValue1=3
ORDER BY A.MyValue1;
  1. Query #1 returns 2,000,000 rows, but I asked SQL Server to return the first 10 rows as fast as possible (query hint).

  2. Query #2 returns 10 rows; a very simple query without tricks.

  3. Query #3 returns 10 rows, with a forced Loop Join and self-reference.

Indeed, the percentages are lying again... Please see below.

If you have any questions or suggestions, please let me know in the comments.

Enjoy!!

Generally, when we write simple and small queries, we omit small details that make a huge difference. A few weeks ago I found an interesting query that was returning millions of rows; the problem is that it runs very often, and we were not expecting that amount of data... In fact, we were getting the whole table instead of a couple of rows.


The Lab

Let's create two tables: one for users, and the other for some kind of log per user.
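The create script is only shown as an image on the original post, so here is a minimal sketch that reproduces the setup. T2 and FK_UserID come from the plan discussion below; everything else (T1 as the log table name, the UserID, CreatedBy, and LogText columns, the row counts) is a hypothetical stand-in for the real schema:

```sql
--\\ T2: the users table. Note the key is UserID; there is NO FK_UserID column here.
DROP TABLE IF EXISTS T2;
CREATE TABLE T2 (	UserID INT IDENTITY(1,1) PRIMARY KEY
				,	CreatedBy VARCHAR(20) NOT NULL);

--\\ T1: the log table, many rows per user.
DROP TABLE IF EXISTS T1;
CREATE TABLE T1 (	LogID INT IDENTITY(1,1) PRIMARY KEY
				,	FK_UserID INT NOT NULL
				,	LogText VARCHAR(100) NULL);

--\\ Four users created through the external application, plus some regular ones.
INSERT INTO T2 (CreatedBy)
SELECT TOP (4) 'External' FROM sys.all_objects
UNION ALL
SELECT TOP (96) 'App' FROM sys.all_objects;

--\\ 1,000,000 log rows spread across the users.
INSERT INTO T1 (FK_UserID, LogText)
SELECT TOP (1000000)	(ROW_NUMBER() OVER (ORDER BY a.name) % 100) + 1
					,	'Some log entry'
FROM sys.all_objects a
CROSS JOIN sys.messages c;
```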


Here is the query that is running very often.
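The query is only shown as a screenshot on the original post, but based on the plan discussion below it would look roughly like this sketch (T2 and FK_UserID are taken from the points below; T1 is a hypothetical name for the log table):

```sql
--\\ Intent: all log rows for users created through the external application.
SELECT *
FROM T1
WHERE FK_UserID IN (	SELECT FK_UserID	--\\ FK_UserID does NOT exist in T2...
						FROM T2
						WHERE CreatedBy = 'External');
```

Because FK_UserID does not exist in T2, SQL Server resolves it against the outer scope (T1), silently turning the sub-query into a correlated one; every T1 row then matches its own FK_UserID, and the whole table comes back.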

The query requests all the logs for the users that were created through an external application. We know that only four records qualify, but the query returns 1,000,000 rows. Let's take a look at the execution plan.


  1. There is a warning for a missing join predicate.

  2. FK_UserID does not exist in table T2, but the query runs without error.

  3. A Row Count Spool is used to satisfy the query (Left Semi Join); therefore all rows are returned and there is no error.

If you run the sub-query by itself, you are going to get an error, because the field does not exist.
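Using the same hypothetical names, the sub-query on its own fails immediately, because there is no outer scope to fall back to:

```sql
SELECT FK_UserID
FROM T2
WHERE CreatedBy = 'External';
--\\ Fails: Invalid column name 'FK_UserID'.
```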

If you add the missing qualifiers to the select statement, then you are going to get an error too; now SQL Server knows what you are trying to do.


Here is the right query and execution plan.
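The screenshot is not reproduced here, but the corrected query would look something like this sketch (hypothetical names except T2 and FK_UserID): every column is qualified with its table alias, and the sub-query returns the key column that actually exists in T2:

```sql
SELECT L.*
FROM T1 AS L
WHERE L.FK_UserID IN (	SELECT U.UserID		--\\ the key that actually exists in T2
						FROM T2 AS U
						WHERE U.CreatedBy = 'External');
```

Now only the log rows for the four external users come back, and any future typo in the sub-query becomes a compile-time error instead of a silent correlation.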


Bonus

SQL Server completely ignores the qualifiers on the columns in an insert statement. This makes sense, since the table has already been specified.
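A quick sketch of that quirk, using the same hypothetical log table: the qualifier in the insert column list can be anything at all and the statement still runs, because only the column name itself is checked:

```sql
--\\ Both succeed; the qualifiers in the column list are simply ignored.
INSERT INTO T1 (FK_UserID, LogText) VALUES (1, 'normal');
INSERT INTO T1 (WhateverTable.FK_UserID, T2.LogText) VALUES (1, 'qualified?!');
```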


If you have any questions or suggestions, please let me know in the comments.

Enjoy!!

© 2018 by Victor Escott.
