Wednesday, December 16, 2009

Working with large data in SQL Server 2005 and beyond

Recently I ran across a data-centric application that had been written for SQL Server 2005. This particular application needed to store a lot of large text-based data, and on the data tier this had been implemented using the text and ntext data types. Given that this was a new application that had never targeted an earlier version of SQL Server, I was surprised to find that the developers hadn't taken advantage of the newer approach to large data types, namely the MAX length specifier for the varchar and nvarchar data types, e.g. using VARCHAR(MAX) to work with up to 2GB of data.

There are a number of advantages to working with the new types over the old ones. They are supported by most, if not all, of the string functions, and unlike text and ntext they can also be used when declaring variables.
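
For example, a quick sketch along these lines (the table and column names are purely illustrative) shows both points:

CREATE TABLE dbo.Document
(
    DocumentId INT IDENTITY(1,1) PRIMARY KEY,
    Body       VARCHAR(MAX) NOT NULL    -- up to 2GB, in place of text
)
GO

-- Unlike text and ntext, the MAX types can be declared as variables
-- and passed to the familiar string functions.
DECLARE @body VARCHAR(MAX)
SELECT @body = Body FROM dbo.Document WHERE DocumentId = 1
SELECT LEN(@body) AS BodyLength,
       CHARINDEX('SQL', @body) AS FirstMention,
       SUBSTRING(@body, 1, 100) AS OpeningText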

I wonder if these new large data types are yet to become widely known by developers, or if it is simply a case of us all being comfortable with known ways of meeting a particular requirement and following familiar patterns.

Monday, December 14, 2009

Denormalisation and transactions

I'm all for denormalising a data model for performance or to meet business requirements with less complexity in the model. I'm happy to design a model with an aggregated figure from a related child table stored in a parent table - putting a year to date sales figure into a master customer table is one that seems to crop up regularly. One of my rules of thumb when I denormalise is to make sure that I transactionalise units of work to ensure that data integrity is maintained. In the case of the earlier example this means placing the insert, update or delete against the sales table inside the same transaction as the update to the year to date sales figure in the customer table. It's critical that the two parts of the data change either succeed completely or fail completely. If the sales data is changed but the customer year to date sales figure is not (due to some failure), then without a rollback of the transaction your data now has problems. I've seen enough cases where a denormalised data model is used without transactions around units of work to wonder if some developers simply don't realise that this problem even exists.
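
As a sketch of what I mean (the table, column and procedure names here are invented purely for illustration):

CREATE PROCEDURE dbo.AddSale
    @CustomerId INT,
    @SaleDate   DATETIME,
    @Amount     MONEY
AS
BEGIN
    SET NOCOUNT ON

    BEGIN TRY
        BEGIN TRANSACTION

        -- The granular change to the child table...
        INSERT INTO dbo.Sales (CustomerId, SaleDate, Amount)
        VALUES (@CustomerId, @SaleDate, @Amount)

        -- ...and the denormalised year to date figure, in the same unit of work.
        UPDATE dbo.Customer
        SET    YtdSales = YtdSales + @Amount
        WHERE  CustomerId = @CustomerId

        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        -- Either both changes succeed or neither does.
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION

        DECLARE @msg NVARCHAR(2048)
        SET @msg = ERROR_MESSAGE()
        RAISERROR (@msg, 16, 1)
    END CATCH
END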

By the way, you could also use triggers to protect your data integrity. It's not my preferred way of doing things - but it does make for a workable solution.
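
A trigger-based sketch over the same illustrative tables might look something like this (covering inserts only; updates and deletes would need the same treatment):

CREATE TRIGGER dbo.trg_Sales_MaintainYtd ON dbo.Sales
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON

    -- Roll the newly inserted amounts into the stored year to date figure.
    UPDATE c
    SET    c.YtdSales = c.YtdSales + i.TotalAmount
    FROM   dbo.Customer c
           JOIN (SELECT CustomerId, SUM(Amount) AS TotalAmount
                 FROM   inserted
                 GROUP BY CustomerId) i
             ON i.CustomerId = c.CustomerId
END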

One other thing that I've found useful (if you've inherited a denormalised model that doesn't make use of transactions) is to have a means of recalculating any aggregate values (that might be stored in the data model) by summing across the granular child rows. Having components that do this gives you the flexibility to check the stored aggregate values from time to time or, in the event that the data does develop problems, to calculate what the stored aggregate values should be and make the necessary updates.
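
Something along these lines (again using the illustrative tables from above) will surface any stored aggregates that have drifted from the underlying detail:

SELECT c.CustomerId,
       c.YtdSales               AS StoredYtd,
       ISNULL(SUM(s.Amount), 0) AS CalculatedYtd
FROM   dbo.Customer c
       LEFT JOIN dbo.Sales s
         ON  s.CustomerId = c.CustomerId
         AND s.SaleDate >= '20090101'   -- start of the year in question
GROUP BY c.CustomerId, c.YtdSales
HAVING c.YtdSales <> ISNULL(SUM(s.Amount), 0)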

Friday, December 11, 2009

Lazy Installation Scripts

One of my pet hates is working with installation / update scripts which don't check if objects exist before trying to drop them.

This is just lazy and, when lots of objects are involved, it makes reviewing the installation logs difficult due to the large number of "Cannot drop the view..." type errors. Too often I've heard the excuse "you can just ignore those errors, they're not important". They may not be important, but there's no reason the DBA or user running the script should have to see or be confused by them.

OK, so SQL Server doesn't have an equivalent of Oracle's one-liner CREATE OR REPLACE, but it's easy to check whether an object exists before running the DROP / CREATE pair of statements. There are a few options available. If the object is a table or a view you can query INFORMATION_SCHEMA.TABLES to see if it already exists. For wider coverage of object types you can query sys.objects directly.

The approach can go something like this:

IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES
           WHERE table_name = 'myView')
    DROP VIEW myView
GO
CREATE VIEW myView AS .....
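
For the sys.objects route, a similar guard works for object types beyond tables and views (dbo.myProc is just an illustrative name):

IF EXISTS (SELECT 1 FROM sys.objects
           WHERE object_id = OBJECT_ID(N'dbo.myProc')
             AND type = 'P')   -- 'P' = stored procedure
    DROP PROCEDURE dbo.myProc
GO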

This simple step will make for much cleaner install logs where the only errors are ones that are worthy of attention.

Thursday, December 10, 2009

Monitoring a Long Running Stored Procedure

Recently I had the need to write a SQL Server stored procedure which performed numerous complex comparisons and wrote updates back across many millions of rows. The result was a lengthy runtime. My client needed a means of quickly finding out how far along the process was at any point in time.


I'm probably like many others in that I now just use the debugging ability of an IDE to get a look at what's going on inside a stored procedure, but that wouldn't work here. Way back when, I'd use PRINT statements to get information out to the messages pane, but their output is buffered and won't display until the batch has finished, so they couldn't be used here either.


The answer came in the form of RAISERROR WITH NOWAIT.


RAISERROR doesn't have to be used only for errors. In fact, for severity levels of 10 or less SQL Server treats the output as informational messages, meaning that execution continues on as normal, even when TRY / CATCH blocks are being used.


You'll need to ensure that the results are being sent to either text or grid (CTRL+T or CTRL+D in Management Studio) and click onto the Messages tab of the output window in order to see the messages appear in real time.

Try this as an example:
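
A script along these lines will do, pairing a buffered PRINT with RAISERROR ... WITH NOWAIT (the message text and delays are arbitrary):

PRINT 'Step 1 complete'    -- PRINT output is buffered, so nothing appears yet

WAITFOR DELAY '00:00:10'

-- Severity 10 is informational; WITH NOWAIT flushes the message buffer,
-- so this message and the buffered PRINT appear together.
RAISERROR ('Step 2 complete', 10, 1) WITH NOWAIT

WAITFOR DELAY '00:00:10'

RAISERROR ('Step 3 complete', 10, 1) WITH NOWAIT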



When you execute the script you should see nothing for the first ten seconds, then the first two messages will appear together, followed by another ten second gap before the final message appears.

Don't forget to click over to the messages tab in order to watch the action!