Friday, February 4, 2011

Filtered Indexes Part 3: Data Archiving

In the past, I’ve often had to keep large amounts of transactional and/or inactive data around for historical reporting purposes (or maybe just because I’m a data pack rat by nature).  Keeping this in its original table can make querying current data rather costly, so often the approach has been to move it to an archive table or even archive database. 

This of course makes querying a bit more complex and requires the management of archiving the data itself.  Filtered indexes give us an option to have many of the performance advantages of archiving the data while leaving it in place.

Let’s consider the following scenario.  We have a customers table and orders table.  It’s been determined that most of the time we will be querying on open orders, where the order table’s CompletedOn field is null.  The following script will build the two tables:

--  Create Customers table
CREATE TABLE tblCustomers (
CustomerId INT NOT NULL IDENTITY(1, 1)
,CustomerName NVARCHAR(255) NOT NULL
CONSTRAINT PK_tblCustomers PRIMARY KEY CLUSTERED
(
CustomerId ASC
)
)
GO

-- Create Orders table
CREATE TABLE tblOrders (
OrderId INT NOT NULL IDENTITY(1, 1)
,CustomerId INT NOT NULL
,OrderDate DATETIME NOT NULL
,OrderAmount MONEY NOT NULL
,CompletedOn DATETIME NULL
CONSTRAINT PK_tblOrders PRIMARY KEY NONCLUSTERED
(
OrderId ASC
)
)
GO

-- Add foreign key constraint for CustomerId
ALTER TABLE tblOrders
WITH CHECK ADD CONSTRAINT FK_tblOrders_tblCustomers
FOREIGN KEY(CustomerId)
REFERENCES tblCustomers (CustomerId)
GO



The script below will create some test data. Depending on your machine you may want to tweak some of the parameters. As setup it will create ~100,000 customer records and randomly create between 0 and 100 order records per customer (~5,000,000 records in total).




--  Create test customers
DECLARE @CustomersTemp TABLE (
CustomerName NVARCHAR(255))
DECLARE @RecordCount INT
SET @RecordCount = 0
WHILE @RecordCount < 100000
BEGIN
INSERT INTO @CustomersTemp (
CustomerName)
VALUES (
'Customer ' + CAST(@RecordCount AS NVARCHAR))

SET @RecordCount = @RecordCount + 1
END

INSERT INTO tblCustomers (
CustomerName)
SELECT
CustomerName
FROM @CustomersTemp

-- Create test orders
DECLARE @OrdersTemp TABLE(
CustomerId INT
,OrderDate DATETIME
,OrderAmount MONEY)
DECLARE @CustomerId INT;
DECLARE @OrderCount INT
DECLARE Customers CURSOR
FOR
SELECT
CustomerId
FROM tblCustomers;

OPEN Customers;

FETCH NEXT FROM Customers
INTO @CustomerId;

WHILE @@FETCH_STATUS = 0
BEGIN
SET @RecordCount = 0;
SET @OrderCount = CAST((RAND() * 100.0) AS INT);
WHILE @RecordCount < @OrderCount
BEGIN
INSERT INTO @OrdersTemp (
CustomerId
,OrderDate
,OrderAmount)
VALUES (
@CustomerId
,DATEADD(DAY, RAND() * 3650, '1/1/2000')
,(RAND() * 1000))

SET @RecordCount = @RecordCount + 1
END

FETCH NEXT FROM Customers
INTO @CustomerId;
END

INSERT INTO tblOrders (
CustomerId
,OrderDate
,OrderAmount)
SELECT
CustomerId
,OrderDate
,OrderAmount
FROM @OrdersTemp

CLOSE Customers;
DEALLOCATE Customers;
GO

-- Mark orders completed.
UPDATE tblOrders SET
CompletedOn = GETDATE()
WHERE OrderDate < '1/1/2009'
GO




Note that the last part of the script will mark about 90% of the orders as having been completed.



Now for a query.  The query below is getting the record count for all uncompleted orders greater than $500.




SELECT
COUNT(*)
FROM tblOrders
INNER JOIN tblCustomers
ON tblOrders.CustomerId = tblCustomers.CustomerId
WHERE CompletedOn IS NULL
AND OrderAmount > 500



At this point we haven’t defined any indexes other than the primary keys of the tables.  On my development server the query at this point the query is taking about 120ms.



Adding the unfiltered index below reduces that time to ~57ms.




--  Add index on OrderAmount
--DROP INDEX IX_tblOrders_OrderAmount ON tblOrders
GO

CREATE NONCLUSTERED INDEX IX_tblOrders_OrderAmount
ON tblOrders (
OrderAmount ASC
,CompletedOn ASC
)
GO




Finally, adding the filtered index below reduces the time to ~34ms.




--  Add filtered index on OrderAmount
--DROP INDEX IX_tblOrders_OrderAmount_Current ON tblOrders
GO

CREATE NONCLUSTERED INDEX IX_tblOrders_OrderAmount_Current
ON tblOrders (
OrderAmount ASC
)
INCLUDE (CompletedOn)
WHERE CompletedOn IS NULL
GO



Not only is the query execution time significantly reduced, but also the space required to store the index is reduced to about 10% in this case.  This space savings could be used to add additional frequently used fields as key or included fields into the index to speed queries even more.  Note that in this case I’m simply including the CompletedOn field rather than adding it as a key since it’s implicitly keyed by the WHERE clause.



If you review the execution plan for the query, you’ll notice that it automatically reverts to the unfiltered index if you remove the CompletedOn filter from the WHERE clause.  Thus, you could essentially have a separate set of indexes for current and archived data.



One final note.  In my experience with filtered indexes, I’ve found that filtering on a BIT field doesn’t seem to actually filter.  You end up with all the rows indexed as if there were no filter on it.  Has anyone else found that to be the case?



Additional Info



Click here to download ZIP file with all SQL scripts



Filtered Index Design Guidelines – Microsoft TechNet (http://technet.microsoft.com/en-us/library/cc280372.aspx)



Article detailing issues with filtered indexes and SET options - Interesting issue with Filtered indexes

No comments:

Post a Comment