Tuesday, February 15, 2011

SQL Server & Windows Firewall

If you’re installing SQL Server on a machine that has Windows Firewall enabled, you’ll have to configure the firewall to open the correct ports.  For a default instance, that’s TCP 1433 for the database engine and UDP 1434 for the SQL Browser (used by client applications to retrieve the dynamic port of a non-default instance of SQL).  The following MSDN article details what needs to be configured:

Configuring the Windows Firewall to Allow SQL Server Access

(http://msdn.microsoft.com/en-us/library/cc646023(SQL.100).aspx)
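
As a quick reference, here's one way to open those two ports from an elevated command prompt using netsh (the rule names are arbitrary; adjust the port if you've assigned a static port to a named instance):

```
netsh advfirewall firewall add rule name="SQL Server (TCP 1433)" dir=in action=allow protocol=TCP localport=1433
netsh advfirewall firewall add rule name="SQL Browser (UDP 1434)" dir=in action=allow protocol=UDP localport=1434
```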

Friday, February 4, 2011

Filtered Indexes Part 3: Data Archiving

In the past, I’ve often had to keep large amounts of transactional and/or inactive data around for historical reporting purposes (or maybe just because I’m a data pack rat by nature).  Keeping this in its original table can make querying current data rather costly, so often the approach has been to move it to an archive table or even archive database. 

This of course makes querying a bit more complex and requires the management of archiving the data itself.  Filtered indexes give us an option to have many of the performance advantages of archiving the data while leaving it in place.

Let’s consider the following scenario.  We have a customers table and orders table.  It’s been determined that most of the time we will be querying on open orders, where the order table’s CompletedOn field is null.  The following script will build the two tables:

--  Create Customers table
CREATE TABLE tblCustomers (
    CustomerId INT NOT NULL IDENTITY(1, 1)
    ,CustomerName NVARCHAR(255) NOT NULL
    ,CONSTRAINT PK_tblCustomers PRIMARY KEY CLUSTERED (
        CustomerId ASC
    )
)
GO

-- Create Orders table
CREATE TABLE tblOrders (
    OrderId INT NOT NULL IDENTITY(1, 1)
    ,CustomerId INT NOT NULL
    ,OrderDate DATETIME NOT NULL
    ,OrderAmount MONEY NOT NULL
    ,CompletedOn DATETIME NULL
    ,CONSTRAINT PK_tblOrders PRIMARY KEY NONCLUSTERED (
        OrderId ASC
    )
)
GO

-- Add foreign key constraint for CustomerId
ALTER TABLE tblOrders
    WITH CHECK ADD CONSTRAINT FK_tblOrders_tblCustomers
    FOREIGN KEY (CustomerId)
    REFERENCES tblCustomers (CustomerId)
GO



The script below will create some test data; depending on your machine, you may want to tweak some of the parameters. It creates ~100,000 customer records and, for each customer, randomly creates between 0 and 100 order records (~5,000,000 orders in total).




--  Create test customers
DECLARE @CustomersTemp TABLE (
    CustomerName NVARCHAR(255))
DECLARE @RecordCount INT
SET @RecordCount = 0
WHILE @RecordCount < 100000
BEGIN
    INSERT INTO @CustomersTemp (
        CustomerName)
    VALUES (
        'Customer ' + CAST(@RecordCount AS NVARCHAR))

    SET @RecordCount = @RecordCount + 1
END

INSERT INTO tblCustomers (
    CustomerName)
SELECT
    CustomerName
FROM @CustomersTemp

-- Create test orders
DECLARE @OrdersTemp TABLE (
    CustomerId INT
    ,OrderDate DATETIME
    ,OrderAmount MONEY)
DECLARE @CustomerId INT;
DECLARE @OrderCount INT
DECLARE Customers CURSOR
FOR
SELECT
    CustomerId
FROM tblCustomers;

OPEN Customers;

FETCH NEXT FROM Customers
INTO @CustomerId;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RecordCount = 0;
    SET @OrderCount = CAST((RAND() * 100.0) AS INT);
    WHILE @RecordCount < @OrderCount
    BEGIN
        INSERT INTO @OrdersTemp (
            CustomerId
            ,OrderDate
            ,OrderAmount)
        VALUES (
            @CustomerId
            ,DATEADD(DAY, RAND() * 3650, '1/1/2000')
            ,(RAND() * 1000))

        SET @RecordCount = @RecordCount + 1
    END

    FETCH NEXT FROM Customers
    INTO @CustomerId;
END

INSERT INTO tblOrders (
    CustomerId
    ,OrderDate
    ,OrderAmount)
SELECT
    CustomerId
    ,OrderDate
    ,OrderAmount
FROM @OrdersTemp

CLOSE Customers;
DEALLOCATE Customers;
GO

-- Mark orders completed.
UPDATE tblOrders SET
CompletedOn = GETDATE()
WHERE OrderDate < '1/1/2009'
GO




Note that the last part of the script will mark about 90% of the orders as having been completed.



Now for a query.  The query below returns the count of all uncompleted orders with an amount greater than $500.




SELECT
COUNT(*)
FROM tblOrders
INNER JOIN tblCustomers
ON tblOrders.CustomerId = tblCustomers.CustomerId
WHERE CompletedOn IS NULL
AND OrderAmount > 500



At this point we haven’t defined any indexes other than the tables’ primary keys.  On my development server, the query takes about 120ms.
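
Timings like these are easy to capture by turning on time statistics before running the query (standard T-SQL, not specific to this example); the elapsed and CPU times appear on the Messages tab:

```sql
SET STATISTICS TIME ON
GO
-- ...run the query here...
SET STATISTICS TIME OFF
GO
```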



Adding the unfiltered index below reduces that time to ~57ms.




--  Add index on OrderAmount
--DROP INDEX IX_tblOrders_OrderAmount ON tblOrders
GO

CREATE NONCLUSTERED INDEX IX_tblOrders_OrderAmount
    ON tblOrders (
        OrderAmount ASC
        ,CompletedOn ASC
    )
GO




Finally, adding the filtered index below reduces the time to ~34ms.




--  Add filtered index on OrderAmount
--DROP INDEX IX_tblOrders_OrderAmount_Current ON tblOrders
GO

CREATE NONCLUSTERED INDEX IX_tblOrders_OrderAmount_Current
    ON tblOrders (
        OrderAmount ASC
    )
    INCLUDE (CompletedOn)
    WHERE CompletedOn IS NULL
GO



Not only is the query execution time significantly reduced, but the space required to store the index drops to about 10% of the unfiltered index’s size in this case.  That space savings could be spent adding other frequently used fields as key or included columns to speed queries even more.  Note that here I’m simply including the CompletedOn field rather than adding it as a key, since it’s implicitly keyed by the WHERE clause.



If you review the execution plan, you’ll notice that the query automatically falls back to the unfiltered index if you remove the CompletedOn filter from the WHERE clause.  Thus, you could essentially maintain separate sets of indexes for current and archived data.
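
For example, running the count without the CompletedOn predicate (a quick check, assuming both indexes above are in place):

```sql
-- With the CompletedOn filter removed, the execution plan should use
-- IX_tblOrders_OrderAmount rather than the filtered index.
SELECT
COUNT(*)
FROM tblOrders
WHERE OrderAmount > 500
```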



One final note.  In my experience with filtered indexes, filtering on a BIT field doesn’t seem to actually filter: you end up with all the rows indexed, as if there were no filter at all.  Has anyone else found that to be the case?
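
One way to check whether a filter is actually being applied is to compare each index’s row count against the table’s using the catalog views (a diagnostic sketch; the object name is from the examples above):

```sql
-- filter_definition is NULL for unfiltered indexes;
-- p.rows shows how many rows each index actually contains.
SELECT
i.name AS IndexName
,i.filter_definition
,p.rows
FROM sys.indexes i
INNER JOIN sys.partitions p
ON p.object_id = i.object_id
AND p.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('tblOrders')
```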



Additional Info



Click here to download ZIP file with all SQL scripts



Filtered Index Design Guidelines – Microsoft TechNet (http://technet.microsoft.com/en-us/library/cc280372.aspx)



An article detailing issues with filtered indexes and SET options: Interesting issue with Filtered indexes

Thursday, February 3, 2011

Filtered Indexes Part 2: More With Unique Constraints

In my last post I showed you how you could enforce a unique constraint on a nullable field using SQL 2008’s filtered indexes.  In a recent project I needed to have a unique code in a lookup table.  However, the code was only required to be unique for active records.  In other words, two records could have the code “ABC” as long as only one of the records was active.
Imagine the following table:
CREATE TABLE tblMyTable(
    MyTableId int IDENTITY(1,1) NOT NULL,
    UniqueCode nvarchar(10) NOT NULL,
    IsActive bit NOT NULL DEFAULT 1
)

To enforce a unique constraint on the UniqueCode field only when the IsActive field is true, you would create the following index:

CREATE UNIQUE NONCLUSTERED INDEX IX_tblMyTable_UniqueCode
    ON tblMyTable (
        UniqueCode ASC
    )
    WHERE IsActive = 1

In this case the WHERE clause causes the index to include only those UniqueCode values where the IsActive field is 1, so uniqueness is checked only for active records.
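
A quick sanity check (with hypothetical values) shows the behavior:

```sql
INSERT INTO tblMyTable (UniqueCode, IsActive) VALUES ('ABC', 1)  -- Succeeds
INSERT INTO tblMyTable (UniqueCode, IsActive) VALUES ('ABC', 0)  -- Succeeds; inactive rows aren't indexed
INSERT INTO tblMyTable (UniqueCode, IsActive) VALUES ('ABC', 1)  -- Fails with a duplicate key error
```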

Tuesday, February 1, 2011

Filtered Indexes Part 1: Nullable Columns

SQL Server 2008 introduced the concept of filtered indexes, whereby the index only indexes a portion of the data based on a WHERE clause in the filter definition. 
I’ve toyed with some potential uses of this, such as keeping archived data in its original table (an entry for another day).  However, recently a more immediate use came up: solving the age-old problem of placing a unique constraint on a nullable field. 
An index treats a NULL like any other value, so a unique index on a nullable field allows only a single record with a NULL value.  In prior versions of SQL Server you would need to create triggers to enforce such a constraint.  However, in SQL 2008, by filtering the index to exclude NULLs, you can create a unique constraint on a nullable field.
Imagine the following table:
CREATE TABLE tblMyTable(
    MyTableId int IDENTITY(1,1) NOT NULL
    ,UniqueCode nvarchar(10) NULL
)
If we want to create a unique constraint on UniqueCode but retain the ability to have nulls you’d create an index as follows:
CREATE UNIQUE NONCLUSTERED INDEX IX_tblMyTable_UniqueCode
    ON tblMyTable (
        UniqueCode ASC
    )
    WHERE UniqueCode IS NOT NULL
By using the WHERE clause in the CREATE INDEX script you are removing NULLs from the index and thus ensuring that only non-null values are unique. 
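
A quick sanity check (with hypothetical values) shows the behavior:

```sql
INSERT INTO tblMyTable (UniqueCode) VALUES (NULL)   -- Succeeds
INSERT INTO tblMyTable (UniqueCode) VALUES (NULL)   -- Also succeeds; NULLs aren't in the index
INSERT INTO tblMyTable (UniqueCode) VALUES ('ABC')  -- Succeeds
INSERT INTO tblMyTable (UniqueCode) VALUES ('ABC')  -- Fails with a duplicate key error
```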

Thursday, September 30, 2010

Getting Feedback from Stored Procedures

I'm one of those people who puts as much business logic as possible in the database. Stored procedures, views and functions are my "middle tier". That can sometimes lead to long-running procedures with little feedback for the UI. I've often wished that you could return interim status information back from SQL.


Well, it turns out you can. The MSDN blog post, "Getting feedback / progress from batches and stored procedures", details how using RAISERROR.
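
The short version, in case the link goes stale: RAISERROR with severity 0 and the WITH NOWAIT option sends the message to the client immediately instead of waiting for the output buffer to fill. A sketch (the step variables are made up for illustration):

```sql
DECLARE @Step INT = 1, @TotalSteps INT = 5
WHILE @Step <= @TotalSteps
BEGIN
    -- Severity 0 is informational; WITH NOWAIT flushes the message immediately
    RAISERROR('Processing step %d of %d...', 0, 1, @Step, @TotalSteps) WITH NOWAIT

    -- ...do the actual work for this step here...

    SET @Step = @Step + 1
END
```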

BTW, the article also notes why there's a missing "E" in RAISERROR.

Tuesday, September 28, 2010

Putting SQL to Sleep

Have you ever wanted to test how your application would handle a long running stored procedure, but didn't want to create a multi-gigabyte database to do so?  Add a WAITFOR statement. 

WAITFOR is similar to .NET's Thread.Sleep(), in that it stops a batch from processing for a specified period of time.  You can pass WAITFOR either a delay interval for the batch to pause or a specific time at which the batch should continue, as follows:

WAITFOR DELAY '00:00:45' -- Delays the batch for 45 seconds.
WAITFOR TIME '12:30'     -- Resumes the batch at the next line at 12:30.

Note that there is no date specified in the WAITFOR TIME statement; using the TIME clause, you can only delay within a 24-hour period.

One thing to watch out for is transactions.  If you put a WAITFOR inside a transaction, any locked resources will remain locked for the duration of the delay, potentially causing timeouts and deadlocks.
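
For example (a sketch, borrowing the tblMyTable table from the earlier posts):

```sql
BEGIN TRANSACTION

-- This update acquires locks...
UPDATE tblMyTable SET UniqueCode = UniqueCode WHERE MyTableId = 1

-- ...which are then held for the full 30 seconds.
WAITFOR DELAY '00:00:30'

COMMIT TRANSACTION
```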

WAITFOR is also used for waiting for Service Broker messages, but that's a whole other blog entry.

I'm sure there are other uses for WAITFOR but for now, happy napping!

Tuesday, September 7, 2010

SQL Connection Kill Switch


Aaaarrrrg……

OK, so a few SSMS functions, such as Detach Database, have a checkbox letting you kill all current connections before performing the requested operation.  But some, say Restore, don’t (could someone give me a good reason why?), and end up failing if someone else (or you yourself in another query window) has an open connection.  Your solution is to open the Activity Monitor, sort by database, kill each of the connections, and hope that no new ones are opened in the meantime.  And what if you want to do this all in a script anyway?

To solve the above problems you can run the following script:

USE master

DECLARE @SQLStatement VARCHAR(MAX)
SET @SQLStatement = ''

SELECT
        @SQLStatement = @SQLStatement
            + 'KILL '
            + convert(VARCHAR(10), spid)
            + CHAR(13) + CHAR(10)
    FROM master.dbo.sysprocesses
    WHERE DB_NAME(dbid) = 'MyDatabase'
        AND dbid <> 0
        AND spid <> @@SPID

EXEC(@SQLStatement)

GO

So what exactly is this doing?  The SELECT statement is building a collection of KILL n statements where n is a process ID of a process connected to the specified database (in this case “MyDatabase”).  The CHAR(13) + CHAR(10) are really only there for aesthetics and debugging purposes.  Note that the WHERE clause explicitly excludes the connection of the current process.

Once this collection of KILL statements is built it executes them.  By using @SQLStatement = @SQLStatement + {next KILL statement}, it’s continually appending the next KILL statement to itself and the resulting value in the @SQLStatement variable is the entire list of KILL statements.
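
Those CHAR(13) + CHAR(10) line breaks pay off if you want to review the statements before running them; just swap the EXEC for a PRINT:

```sql
PRINT @SQLStatement
-- Produces something like:
-- KILL 53
-- KILL 57
```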

Giving credit where credit is due, I would like to note that I copied and tweaked the above statement from the Aaquagua blog.  In that blog, it is noted that the following statements will have the same effect:

ALTER DATABASE MyDatabase
    SET SINGLE_USER WITH ROLLBACK IMMEDIATE

ALTER DATABASE MyDatabase
    SET MULTI_USER WITH ROLLBACK IMMEDIATE

What we’re doing here is setting the database to single-user mode, which immediately kills any current connections, and then setting it back to multi-user mode.  Although this will in fact have the same effect, there’s something hackish about it that makes me a tad uncomfortable.

But that’s just my neuroses speaking.  You can pick whichever one works for you.

UPDATE:

OK.  So after writing this, I was thinking about the method of setting the database to single-user mode first.  My issue with using it to kill connections is that you’re using a statement that does one thing (set a database to single-user mode) to accomplish something else (kill all the connections).  Yes, it works, but if someone were to come along and review your script, they might wonder what you’re doing.

Anyway, upon further pondering, I decided that setting the database to single-user mode before performing an action, such as a restore or major update, is probably a good idea anyway, since it prevents any other connections from interfering with the action.  The idea would be to wrap your action between the two SET statements.
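
In script form, that would look something like this (the backup path is just a placeholder):

```sql
ALTER DATABASE MyDatabase
    SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO

RESTORE DATABASE MyDatabase
    FROM DISK = 'C:\Backups\MyDatabase.bak'  -- Placeholder path
    WITH REPLACE
GO

ALTER DATABASE MyDatabase
    SET MULTI_USER
GO
```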