Thursday, April 28, 2011

QUOTED_IDENTIFIER, Filtered Indexes, SSMS & SQLCMD

So I’ve been working on a project that uses filtered indexes and decided I needed to build an automated deployment routine.  I had built a set of build scripts and, using SQLCMD mode in SSMS to allow for include files, created a master script to perform the entire build.

During development and initial testing I ran this in SSMS and everything worked fine, but for automation purposes I used SQLCMD.EXE.  As soon as I started using SQLCMD, I started to get errors indicating that QUOTED_IDENTIFIER was incorrectly set for indexed views, filtered indexes, etc.

Now, I’m aware that indexed views, among other objects, are picky about their settings, and the error message let me know that this was the case for filtered indexes as well.  However, the question was: why hadn’t it come up before?  Looking through the code, I found only one place where QUOTED_IDENTIFIER was set to OFF, and that was in the create database script.  Even after changing that, the errors kept showing up.

It turns out that by default, a connection through SSMS sets QUOTED_IDENTIFIER to ON, while a connection through SQLCMD sets it to OFF.  SQLCMD does this after each connect, so setting it at the beginning of a script won’t cover you if you subsequently run a :CONNECT command.  The best practice is to run the SET statements after each connect.
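
Here’s a minimal sketch of what that looks like in a SQLCMD-mode script (the server name and include file below are hypothetical).  SQLCMD.EXE also has a -I command line switch that sets QUOTED_IDENTIFIER to ON, but given the per-connection behavior described above, an explicit SET after each :CONNECT is still the safest bet:

:CONNECT MyBuildServer
-- Re-establish the setting the new connection just reset to OFF
SET QUOTED_IDENTIFIER ON
GO
-- Hypothetical include file containing the filtered index builds
:r .\CreateFilteredIndexes.sql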

The Remarks section of the MSDN article SET QUOTED_IDENTIFIER (Transact-SQL) states that the defaults are set differently for different connection types (DB-Library, SQL Server Native Client, ODBC, etc.), which indicates that SQLCMD and SSMS are probably using different connection types.

Tuesday, April 5, 2011

Debunking the Cult of the Machine SID

So this isn’t especially data oriented, but for any developer who is constantly creating new VMs for development or testing this info might come as a pleasant surprise.

Over the years, when cloning VMs, sometimes by simply copying the VHD files and attaching them to a new machine, we’ve been told that we’ll cause problems because we aren’t creating machines with unique SIDs.  As a result, we’ve had to use Sysprep, Sysinternals NewSID (now discontinued) or our VM software’s cloning tool.  It turns out, however, that our obsession with SIDs was all for nought.  The following TechNet blog post by the author of NewSID details why duplicate SIDs don’t matter:

The Machine SID Duplication Myth (and Why Sysprep Matters)

Note that it is still official Microsoft policy to require cloned machines to be sysprepped, and as such I would recommend following that policy in a production environment.  But for us developers: copy away and stop worrying about your SIDs!

Wednesday, March 23, 2011

SQL’s Choice–Helping SQL Server Pick Deadlock Victims

In my last post I showed how to debug deadlocks using SQL Profiler.  That’ll help you figure out what’s causing them.  But what if, for whatever reason, you can’t resolve all of them?  How do you make sure that a high priority transaction isn’t chosen as the victim?

SQL provides the answer with the SET DEADLOCK_PRIORITY statement.  You can pass it a value between -10 and 10, or the predefined values LOW, NORMAL and HIGH, which map to -5, 0 and 5 respectively.  A session with a lower value will be picked as a deadlock victim over a session with a higher value.

Below are the table and stored procedure scripts from the previous article that will allow us to create a deadlock by executing ProcedureA in one SSMS window and then immediately executing ProcedureB in a second window.

--  Create Tables
CREATE TABLE TableA(
    TableAId INT IDENTITY(1,1) NOT NULL,
    SomeValue NVARCHAR(50) NOT NULL
)
GO

CREATE TABLE TableB(
    TableBId INT IDENTITY(1,1) NOT NULL,
    SomeOtherValue NVARCHAR(50) NOT NULL
)
GO

-- Create Stored Procedures
CREATE PROCEDURE ProcedureA
AS
BEGIN
    SET DEADLOCK_PRIORITY HIGH

    BEGIN TRANSACTION

    INSERT INTO TableA WITH (TABLOCKX) (
        SomeValue)
    VALUES (
        'SomeValue A')

    WAITFOR DELAY '00:00:10'

    INSERT INTO TableB WITH (TABLOCKX) (
        SomeOtherValue)
    VALUES (
        'SomeOther Value A')

    COMMIT TRANSACTION
END
GO

CREATE PROCEDURE ProcedureB
AS
BEGIN
    SET DEADLOCK_PRIORITY LOW

    BEGIN TRANSACTION

    INSERT INTO TableB WITH (TABLOCKX) (
        SomeOtherValue)
    VALUES (
        'SomeValue B')

    INSERT INTO TableA WITH (TABLOCKX) (
        SomeValue)
    VALUES (
        'SomeOther Value B')

    COMMIT TRANSACTION
END
GO

Note that the difference here is that the first statement in ProcedureA is SET DEADLOCK_PRIORITY HIGH and in ProcedureB is SET DEADLOCK_PRIORITY LOW.  This will ensure that ProcedureB will always be the victim of a deadlock between Procedures A & B. 

By placing the SET statement within the stored procedure we’ve set the deadlock priority for the stored procedure alone.  If it’s run outside the stored procedure it can be used to set the priority of an entire session.
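
For example, to deprioritize everything you’re about to run in the current session, you could use the numeric form (a quick sketch; -3 is just an arbitrary value between LOW and NORMAL):

-- Applies to the entire session until set again
SET DEADLOCK_PRIORITY -3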

If you’re interested in finding out what the current deadlock priority is, you can use the following query:

SELECT
    session_id,
    deadlock_priority
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID

So now even if you can’t prevent all deadlocks, at least you can give your mission critical processes priority.

For more information on DEADLOCK_PRIORITY, see the following MSDN article:

SET DEADLOCK_PRIORITY (Transact-SQL)

Friday, March 18, 2011

Debugging Deadlocks with SQL Profiler

Don’t you love this:

Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

SQL helpfully tells you that your process was chosen as the deadlock victim, but doesn’t tell you who “another process” was or what the deadlock was on.  That certainly makes debugging fun and exciting.  It turns out that SQL Profiler has some nice functionality for just this purpose.

First let’s find a way to reliably reproduce a deadlock.  Let’s create a couple of tables using the following script:

CREATE TABLE TableA(
    TableAId INT IDENTITY(1,1) NOT NULL,
    SomeValue NVARCHAR(50) NOT NULL
)
GO

CREATE TABLE TableB(
    TableBId INT IDENTITY(1,1) NOT NULL,
    SomeOtherValue NVARCHAR(50) NOT NULL
)
GO

Now let’s create the following two stored procedures:

CREATE PROCEDURE ProcedureA
AS
BEGIN
    BEGIN TRANSACTION

    INSERT INTO TableA WITH (TABLOCKX) (
        SomeValue)
    VALUES (
        'SomeValue A')

    WAITFOR DELAY '00:00:10'

    INSERT INTO TableB WITH (TABLOCKX) (
        SomeOtherValue)
    VALUES (
        'SomeOther Value A')

    COMMIT TRANSACTION
END
GO

CREATE PROCEDURE ProcedureB
AS
BEGIN
    BEGIN TRANSACTION

    INSERT INTO TableB WITH (TABLOCKX) (
        SomeOtherValue)
    VALUES (
        'SomeValue B')

    INSERT INTO TableA WITH (TABLOCKX) (
        SomeValue)
    VALUES (
        'SomeOther Value B')

    COMMIT TRANSACTION
END
GO

Note that I’ve added a WAITFOR statement between the first and second INSERT statements in ProcedureA.  This will allow us to start ProcedureA in one window and give us enough time to start ProcedureB and create a deadlock in a second window.  Go ahead and give it a try.  One of them will be chosen as the deadlock victim.
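
If you want to try it yourself, something like this should reproduce it (the ten second WAITFOR in ProcedureA gives you time to start the second procedure):

-- Window 1:
EXEC ProcedureA

-- Window 2, within ten seconds of starting window 1:
EXEC ProcedureB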

OK.  Now that we can create one ourselves, how do we go about debugging it?  First, open up SQL Profiler.  You’ll get to the initial window to define your trace:

[Screenshot: Trace Properties dialog with the trace named Deadlock Trace]

I’ve renamed mine Deadlock Trace.  I probably would have left it untitled if it hadn’t been for this blog entry.  Next, go to the Events Selection tab and remove all of the existing selections as follows:

[Screenshot: Events Selection tab with all of the default events removed]

Once you have a clean slate, click on the Show all events and Show all columns checkboxes down on the lower right.  Now you can pick the deadlock graph event from under the Locks group as follows:

[Screenshot: Deadlock graph event selected under the Locks event group]

When you do this, you’ll notice that there’s now a third tab, Events Extraction Settings.  Navigate to this tab and you’ll be able to specify that Profiler should write the deadlock information to a file, and where, as follows:

[Screenshot: Events Extraction Settings tab with deadlock XML results saved to a file]

Note that I’ve chosen to have it write each deadlock to its own file.  I’ve found that to be much more convenient when reading them.  Now hit Run and off it goes.  Go back to SSMS and run our test procedures again to produce another deadlock.  When that’s completed, switch back to Profiler and you’ll see the following:

[Screenshot: Profiler displaying the captured deadlock graph]

There you go.  Debug away.  OK, it’s not showing much more than we got before.  We know there were two processes, they deadlocked and one was killed.  Time to navigate in Explorer to the folder we told Profiler to save the deadlocks in.  You should see a deadlock trace file as follows:

[Screenshot: Explorer showing the saved deadlock trace (.xdl) file]

Now if you simply double-click on it, it will open up in Profiler with the same graphical view we got above.  But since it’s just an XML file, open it up in the text/XML editor of your choice.  This file contains a wealth of information to help you resolve deadlock issues.  It’s a little large, so I’m not going to include the whole thing here, but here are some key parts:

<deadlock-list>
<deadlock victim="process4ad288">
  <process-list>
   <process id="process4ad288" clientoption2="390200" clientoption1="671090784" locktimeout="4294967295" currentdb="11" xactid="4791721" isolationlevel="read committed (2)" loginname="SILKROADTECH\jbastow" hostpid="6456" hostname="JBASTOWVS" clientapp="Microsoft SQL Server Management Studio - Query" lastbatchcompleted="2011-03-18T16:42:10.440" lastbatchstarted="2011-03-18T16:48:44.960" trancount="2" priority="0" ecid="0" sbid="0" spid="58" status="suspended" kpid="7072" schedulerid="4" lockmode="X" xdes="0x800ff950" lasttranstarted="2011-03-18T16:48:44.960" transactionname="user_transaction" ownerid="4791721" waittime="2577" waitresource="OBJECT: 11:1893581784:0 " logused="256" taskpriority="0">
    <executionstack>
     <frame sqlhandle="0x03000b00f4320d6d38a90201aa9e00000100000000000000" stmtend="568" stmtstart="368" line="13" procname="PlayPen.dbo.ProcedureA">
INSERT INTO TableB WITH (TABLOCKX) (
            SomeOtherValue)
        VALUES (
            &apos;SomeOther Value A&apos;)     </frame>

At the top you’ll note it identifies the specific process that was the victim.  Below that is a process list that shows the details of each process involved in the deadlock, including the stored procedure, if applicable.  It even shows which statements within the transaction are deadlocked.

After the process-list node you’ll see the resource-list node.  This lists each of the objects (tables, indexes, etc.) that are locked as part of the deadlock, and the type of lock each process has, or is attempting to acquire.

<resource-list>
<objectlock lockPartition="0" objid="1893581784" subresource="FULL" dbid="11" objectname="PlayPen.dbo.TableB" id="lock80022c80" mode="X" associatedObjectId="1893581784">
  <owner-list>
   <owner id="process493708" mode="X"/>
  </owner-list>
  <waiter-list>
   <waiter id="process4ad288" mode="X" requestType="wait"/>
  </waiter-list>
</objectlock>

OK.  Now that you know how to diagnose your deadlocks, get going fixing them!

You can read more on MSDN in:

Analyzing Deadlocks with SQL Server Profiler

Tuesday, February 15, 2011

SQL Server & Windows Firewall

If you’re installing SQL Server on a machine that has Windows Firewall enabled, you’ll have to configure the firewall to open the correct ports.  For a default instance, that’s TCP 1433 for the database engine and UDP 1434 for the SQL Browser (used by client applications to retrieve the dynamic port of a non-default instance of SQL).  The following MSDN article details what needs to be configured:

Configuring the Windows Firewall to Allow SQL Server Access

(http://msdn.microsoft.com/en-us/library/cc646023(SQL.100).aspx)
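
If you’d rather script the firewall rules than click through the UI, something along these lines should work with netsh on Windows Server 2008 and later (a sketch only; the rule names are arbitrary and a named instance may be listening on a different port):

REM Default instance of the database engine
netsh advfirewall firewall add rule name="SQL Server (TCP 1433)" dir=in action=allow protocol=TCP localport=1433

REM SQL Browser service
netsh advfirewall firewall add rule name="SQL Browser (UDP 1434)" dir=in action=allow protocol=UDP localport=1434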

Friday, February 4, 2011

Filtered Indexes Part 3: Data Archiving

In the past, I’ve often had to keep large amounts of transactional and/or inactive data around for historical reporting purposes (or maybe just because I’m a data pack rat by nature).  Keeping this in its original table can make querying current data rather costly, so often the approach has been to move it to an archive table or even archive database. 

This of course makes querying a bit more complex and requires managing the archiving process itself.  Filtered indexes give us an option to get many of the performance advantages of archiving while leaving the data in place.

Let’s consider the following scenario.  We have a customers table and orders table.  It’s been determined that most of the time we will be querying on open orders, where the order table’s CompletedOn field is null.  The following script will build the two tables:

--  Create Customers table
CREATE TABLE tblCustomers (
    CustomerId INT NOT NULL IDENTITY(1, 1)
    ,CustomerName NVARCHAR(255) NOT NULL
    ,CONSTRAINT PK_tblCustomers PRIMARY KEY CLUSTERED
    (
        CustomerId ASC
    )
)
GO

-- Create Orders table
CREATE TABLE tblOrders (
    OrderId INT NOT NULL IDENTITY(1, 1)
    ,CustomerId INT NOT NULL
    ,OrderDate DATETIME NOT NULL
    ,OrderAmount MONEY NOT NULL
    ,CompletedOn DATETIME NULL
    ,CONSTRAINT PK_tblOrders PRIMARY KEY NONCLUSTERED
    (
        OrderId ASC
    )
)
GO

-- Add foreign key constraint for CustomerId
ALTER TABLE tblOrders
    WITH CHECK ADD CONSTRAINT FK_tblOrders_tblCustomers
    FOREIGN KEY(CustomerId)
    REFERENCES tblCustomers (CustomerId)
GO

The script below will create some test data.  Depending on your machine you may want to tweak some of the parameters.  As written, it will create ~100,000 customer records and randomly create between 0 and 100 order records per customer (~5,000,000 records in total).

--  Create test customers
DECLARE @CustomersTemp TABLE (
    CustomerName NVARCHAR(255))
DECLARE @RecordCount INT
SET @RecordCount = 0

WHILE @RecordCount < 100000
BEGIN
    INSERT INTO @CustomersTemp (
        CustomerName)
    VALUES (
        'Customer ' + CAST(@RecordCount AS NVARCHAR))

    SET @RecordCount = @RecordCount + 1
END

INSERT INTO tblCustomers (
    CustomerName)
SELECT
    CustomerName
FROM @CustomersTemp

-- Create test orders
DECLARE @OrdersTemp TABLE(
    CustomerId INT
    ,OrderDate DATETIME
    ,OrderAmount MONEY)
DECLARE @CustomerId INT;
DECLARE @OrderCount INT

DECLARE Customers CURSOR
FOR
SELECT
    CustomerId
FROM tblCustomers;

OPEN Customers;

FETCH NEXT FROM Customers
INTO @CustomerId;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RecordCount = 0;
    SET @OrderCount = CAST((RAND() * 100.0) AS INT);

    WHILE @RecordCount < @OrderCount
    BEGIN
        INSERT INTO @OrdersTemp (
            CustomerId
            ,OrderDate
            ,OrderAmount)
        VALUES (
            @CustomerId
            ,DATEADD(DAY, RAND() * 3650, '1/1/2000')
            ,(RAND() * 1000))

        SET @RecordCount = @RecordCount + 1
    END

    FETCH NEXT FROM Customers
    INTO @CustomerId;
END

INSERT INTO tblOrders (
    CustomerId
    ,OrderDate
    ,OrderAmount)
SELECT
    CustomerId
    ,OrderDate
    ,OrderAmount
FROM @OrdersTemp

CLOSE Customers;
DEALLOCATE Customers;
GO

-- Mark orders completed.
UPDATE tblOrders SET
    CompletedOn = GETDATE()
WHERE OrderDate < '1/1/2009'
GO

Note that the last part of the script will mark about 90% of the orders as having been completed.
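
If you’d like to sanity-check the split, a quick count by status will do it (since the data is random, your exact numbers will vary):

SELECT
    COUNT(*) AS TotalOrders
    ,SUM(CASE WHEN CompletedOn IS NULL THEN 1 ELSE 0 END) AS OpenOrders
FROM tblOrders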

Now for a query.  The query below gets the record count for all uncompleted orders greater than $500.

SELECT
    COUNT(*)
FROM tblOrders
INNER JOIN tblCustomers
    ON tblOrders.CustomerId = tblCustomers.CustomerId
WHERE CompletedOn IS NULL
    AND OrderAmount > 500

At this point we haven’t defined any indexes other than the primary keys of the tables.  On my development server the query is taking about 120ms.
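
If you want to measure the timings on your own machine, one simple option is SET STATISTICS TIME (your numbers will almost certainly differ from mine):

SET STATISTICS TIME ON
-- Run the query above, then check the Messages tab for the elapsed time
SET STATISTICS TIME OFF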

Adding the unfiltered index below reduces that time to ~57ms.

--  Add index on OrderAmount
--DROP INDEX IX_tblOrders_OrderAmount ON tblOrders
GO

CREATE NONCLUSTERED INDEX IX_tblOrders_OrderAmount
ON tblOrders (
    OrderAmount ASC
    ,CompletedOn ASC
)
GO

Finally, adding the filtered index below reduces the time to ~34ms.

--  Add filtered index on OrderAmount
--DROP INDEX IX_tblOrders_OrderAmount_Current ON tblOrders
GO

CREATE NONCLUSTERED INDEX IX_tblOrders_OrderAmount_Current
ON tblOrders (
    OrderAmount ASC
)
INCLUDE (CompletedOn)
WHERE CompletedOn IS NULL
GO

Not only is the query execution time significantly reduced, but the space required to store the index drops to about 10% of the unfiltered version in this case.  This space savings could be used to add additional frequently used fields as key or included fields in the index to speed queries even more.  Note that in this case I’m simply including the CompletedOn field rather than adding it as a key, since it’s implicitly keyed by the WHERE clause.
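
If you want to compare the sizes of the two indexes yourself, a query along these lines against the DMVs should do it (a sketch, assuming the index names used above):

SELECT
    i.name AS IndexName
    ,SUM(ps.used_page_count) * 8 AS UsedKB
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.indexes i
    ON ps.object_id = i.object_id
    AND ps.index_id = i.index_id
WHERE ps.object_id = OBJECT_ID('tblOrders')
GROUP BY i.name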

If you review the execution plan for the query, you’ll notice that it automatically reverts to the unfiltered index if you remove the CompletedOn filter from the WHERE clause.  Thus, you could essentially have a separate set of indexes for current and archived data.
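
For example, you could pair the index above with a hypothetical archive-side twin (not part of the scripts above, just a sketch) so that queries restricted to completed orders get the same treatment:

CREATE NONCLUSTERED INDEX IX_tblOrders_OrderAmount_Archive
ON tblOrders (
    OrderAmount ASC
)
INCLUDE (CompletedOn)
WHERE CompletedOn IS NOT NULL
GO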

One final note.  In my experience with filtered indexes, I’ve found that filtering on a BIT field doesn’t seem to actually filter.  You end up with all the rows indexed as if there were no filter on it.  Has anyone else found that to be the case?

Additional Info

Click here to download ZIP file with all SQL scripts

Filtered Index Design Guidelines – Microsoft TechNet (http://technet.microsoft.com/en-us/library/cc280372.aspx)

Article detailing issues with filtered indexes and SET options - Interesting issue with Filtered indexes

Thursday, February 3, 2011

Filtered Indexes Part 2: More With Unique Constraints

In my last post I showed you how you could enforce a unique constraint on a nullable field using SQL 2008’s filtered indexes.  In a recent project I needed to have a unique code in a lookup table.  However, the code was only required to be unique for active records.  In other words, two records could have the code “ABC” as long as only one of the records was active.
Imagine the following table:
CREATE TABLE tblMyTable(
    MyTableId int IDENTITY(1,1) NOT NULL,
    UniqueCode nvarchar(10) NOT NULL,
    IsActive bit NOT NULL DEFAULT 1
)

To enforce a unique constraint on the UniqueCode field only when the IsActive field is true, you would create the following index:

CREATE UNIQUE NONCLUSTERED INDEX IX_tblMyTable_UniqueCode
    ON tblMyTable (
        UniqueCode ASC
    )
    WHERE IsActive = 1

In this case the WHERE clause causes the index to include only those UniqueCode values where the IsActive field is 1, and thus uniqueness is enforced only for active records.
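
To see it in action, a few test inserts like the following should demonstrate the behavior (the values are arbitrary):

-- Both of these succeed; only one row with code 'ABC' is active
INSERT INTO tblMyTable (UniqueCode, IsActive) VALUES (N'ABC', 1)
INSERT INTO tblMyTable (UniqueCode, IsActive) VALUES (N'ABC', 0)

-- This one fails with a duplicate key error, since 'ABC' is already active
INSERT INTO tblMyTable (UniqueCode, IsActive) VALUES (N'ABC', 1)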