English 4/5

IN vs EXISTS - Which one is faster?

Victor Escott
Jan 5, 2019
3 min read

Theoretically EXISTS must be faster because it just checks for whether a row exists or not (True or False), on the other hand, IN has to collect the result set before testing the condition.

Is the above statement true?

Based on my labs, the answer would be “It depends”, basically the optimizer treats EXISTS and IN the same way when possible, but it depends of your table schema, data and the way you wrote the query.

How to use it?

IN: (subquery | expression [ ,...n ])

Expression : you can pass a list of literal values IN (1,2,3,N) or a single value IN (1)
Subquery: Is a subquery that has a result set of one column. IN (SELECT A FROM B)

EXISTS: (Subquery)

Subquery: Is a subquery to compare the existence of the row (TRUE or FALSE), the result set does not matter.

The Lab (IN vs EXISTS)

Let’s prepare the environment, to do so run the following script to create two tables:

We have two tables, the “MyMainTable” (300K rows) is our main table and “MySubTable” (25% random data from “MyMainTable”) this the table that we are going to use to filter the data.

Let’s see how EXISTS and IN compare.

Please noted the division by zero in the EXISTS statement, at this point you are probably expecting an error, but the cool thing is that the field list in an EXISTS statement is not going to be executed and query will be executed.

I have forced those queries to do not run in parallel, this is to get a very simple execution plan that we can compare easily.

Here are the execution plans, noted they are identical, two clustered index scan and a hash match.

The IOs are also exactly the same.

So the query optimizer treated both queries the same way. No performance differences here.

Let’s add some indexes and see how it goes.

The execution plans are identical, though now we have an index scan, a stream aggregate and a merge join due to the new indexes.

The IOs are also exactly the same.

So far the IN and EXISTS appear to perform identically, not matter if there is an index or not, even when there are nulls in either the subquery or in the outer table.

What about NOT IN vs NOT EXISTS?

Let’s see how NOT EXISTS and NOT IN compare. Please noted that I have created some indexes already.

One more time, IOs and execution plans are also exactly the same.

So far they appear to perform identically.

Hang on… The potential issue with NULL values.

If you have a NULL value in the result of the subquery or expression, IN and EXISTS will give the same result set, but NOT IN behaves differently.

Let’s alter the columns on our tables.

Let’s see how NOT EXISTS and NOT IN compare one more time.

Very different execution plans, you can see the NOT IN has bad performance.

The IOs shows exactly the same, very bad performance using the NOT IN, please noted that even with the bad performance we are still retrieving the same number of rows.

What if we insert a null value?

Let’s insert a NULL value.

The plan did not change, we still have a bad plan for the NOT IN.

The IOs are the same… Hang on, what happened here?

The NOT INT did not return rows!!! How is this possible? The answer is very simple, please see below.

IN Resolves to OR (=).

NOT IN Resolved to AND (<>).

On the other hand, NOT EXISTS work just fine since it just checks for whether a row exists or not (True or False).

Rule of thumb.

If you are going to use a subquery, then you must use EXISTS/NOT EXISTS.
If you are going to use a list of literal values, then you must use IN/NOT IN.

Enjoy!!

Logical query processing phases

Victor Escott
Dec 27, 2018
2 min read

Updated: Dec 27, 2018

This is one of the most challenging concepts that I had to learn when I started working with SQL.

I am not ashamed to confess that it took me some time to assimilate the way SQL Server processes its code and this is because SQL server does not process statement from top to bottom (like most programming languages)

I am an Accidental DBA, so I was expecting SQL Server to process the code as other programming languages does.

The “keyed-in order"

This is the order that you are supposed to type the query clauses. Please see below.

Logical Query Processing Phases

Here is where the fun begins and this is because the logical query processing order is different. It starts with the FROM clause.

Here is the same query based on “logical query processing order”:

Each phase operates on one or more tables as inputs and returns a virtual table as output.
The output table of one phase is the input to the next phase.
The very last step returns the final query result.

If you are a visual learner like myself, please take look at Itzik Ben-Gan’s flow chart of Logical Query Processing.

Examples:

A common mistake is try to use an ALIAS on the WHERE clause, this is going to throw an error because the SELECT clause is evaluated after a WHERE clause, so the column alias is not known to the WHERE clause.

On the other hand, we can use ALIAS on the ORDER BY clause since it is evaluated after the SELECT clause.

Using a little bit of everything based on the following rules:

GROUP BY is processed before the SELECT (we cannot use alias here)
HAVING is processed before the SELECT(we cannot use alias here, but we can use COUNT)
ORDER BY is processed after the SELECT (we can use alias here)

Thanks for reading. Enjoy!

Bulk insert into SQL using XML data (.NET application Tool)

Victor Escott
Dec 14, 2018
3 min read

In this post, we will see:

How do we insert bulk records into SQL Server using XML Data Type?
How do we automate the process?
Create an SP automatically to handle the data insertion.
Transform a DataTable to XML.

Background

Sometimes, there is a need to insert bulk records into the database, this process is very simple when we have direct access to the database, but what happens when we have to extract the data from a Web Service, an ASP service or RESTful API?

Here is when I start to see all sorts of creative stuff, like hitting the database for each record that was extracted from the call, create a script on the fly, etc.

Scenario

We have a web service that returns a DataSet (.NET)
The schema of the DataTable could change any time without notice.
We have to extract the data from several web servers.

How do we insert bulk records into SQL Server using XML Data Type?

Let’s insert some data based on the following XML:

XQuery in SQL Server is a statically typed language. So in order to reduce the static type and guarantee that there is indeed at most one value, you should use the positional predicate [1] to explicitly indicate that the path expression returns a singleton.

Tip: Uppercase and lowercase matter in XML (case sensitive). Opening and closing tags must match exactly. For example, <ROOT>, <Root>, and <root> are three different tags.

Let’s read the XML and insert some data.

Voila, we have data ready to be queried, it was very simple and straight forward right? Well, this is true if you have to create the script only once and it has no more than 10 columns, but what happens when you have multiple XML files with different schema, lots of data and each one goes to a different table? Yes, I know it will be a hassle to create the script, verify each data type and create all the tables.

Don’t worry, I have you covered!

The problem

We have to extract some data from several servers, but we don’t have access to the database.
IT Team found a clever a solution, so instead of allow access to the database, they created a Web Service that retrieves a DataSet with one single table. IT Team rules!! :P
IT Team provided an image of the DataSet, this is what we should get when we call the Web Service, please see below.

The Solution

I know … I told you have I have you covered, but you still have to work a little bit to get this running. The only thing you have to do is create the target table based on the image, yes the one that you are going to use to store the data, then .NET will take care of the rest.

Let’s create a target table.

Now what?

We have to create the query to insert the data, but this time we are going to let .NET to do the work for us, so I’ve create an small program where you can pass the table name and the result is Stored Procedure based on the table schema. Please see below.

Stored Procedure:

Please observe that the xPath is hard-coded to “NewDataSet/XMLData”.

.NET Source Code.

This run a query against my database:

This get the job done:

What about the data?

Well the data is not XML, so we have to convert the DataTable to XML, I have a function that takes cares of this. Please see below.

Here is a sample of the XML after conversion:

We have the data, now what?

We have to insert the data into SQL… Yes I have another function that takes care of this too, you have to pass the DataTable and the SP name. Please see below.

Yes, we are done, now we can get data from the Web Service and do a bulk insert into SQL.

This solution work well enough for small / medium sized files, so anything under 2GB will be inserted very fast. Anything over 2GB will crash for sure, in these circumstances, SQLXML Bulk Loader should the best solution. I will try to cover this in my next posts.

Bonus

Let’s run some test with final program, we are going to insert 300K random rows into the database.

Data insertion took an average of 6 seconds, so it is not that bad.

Let’s review the data.

Conclusion

Welcome to XML world, this a valuable option when you want to bulk insert data from a Web Service, an ASP service or RESTful API.

Thanks for reading. Enjoy!

1 2 35