String Manipulation to Find Difference in Different Columns in a Wide Table
The recent thread at T-SQL forum at MSDN Dynamic SQL Stored Proc with information_schema.columns presented an interesting problem � find the differences in columns and list old and new values along with the column names in a new table.
The test tables that demonstrate the problem are:
CREATE TABLE dbo.z_test (
businessdate DATE,
colpk VARCHAR(20),
col3 VARCHAR(10),
col4 VARCHAR(10),
col5 VARCHAR(10),
col6 VARCHAR(10),
col7 VARCHAR(10),
col8 VARCHAR(10),
col9 VARCHAR(10)
)
INSERT INTO z_test
VALUES ('20100628','ABC01','A','B','C','D','E','F','G')
INSERT INTO z_test
VALUES ('20100628','ABC02','AA','BB','CC','DD','EE','FF','GG')
INSERT INTO z_test
VALUES ('20100628','ABC03','AAA','BBB','CCC','DDD','EEE','FFF','GGG')
INSERT INTO z_test
VALUES ('20100627','ABC01','A','BB','C','D','E','F','GG')
INSERT INTO z_test
VALUES ('20100627','ABC02','A','BB','C','DD','E','F','GG')
INSERT INTO z_test
VALUES ('20100627','ABC03','AAA','BBB','CC','D','EEE','FF','GGG')
And log table:
-- Log table
CREATE TABLE dbo.z_logtable (
businessdate DATE,
colpk VARCHAR(20),
colname VARCHAR(100),
preval VARCHAR(20),
curval VARCHAR(20)
)
Now, we want to construct a SQL statement, that will populate the Log table with the Column Name and Previous and New values for this column if there is a difference.
For this purpose we will use meta-table with the information about columns Information_Schema.Columns
The query I�m using is not guaranteed to work 100% since it uses string concatenation technique and assumes that the string will be concatenated in the order of columns in a table. In my tests this technique always worked as intended.
The first script demonstrates how to list columns with differences and count number of such columns:
SET nocount ON;
DECLARE @SQL VARCHAR(MAX)
SELECT @SQL = Coalesce(@SQL + ' + space(1) + ','') +
' case when isnull(a.' + Quotename(column_name) + ',space(0)) <> isnull(b.' + Quotename(column_name) + ',space(0)) then ' + Char(39) + Quotename(column_name) + Char(39) + ' else '''' end'
FROM information_schema.columns
WHERE table_name = 'z_test'
AND column_name NOT IN ('BusinessDate','ColPK')
SET @SQL = '
select *, LEN(ColDiff) - LEN(replace(colDiff,'']'',space(0))) as CountDiff
from (
Select a.BusinessDate, a.ColPK, RTRIM(LTRIM(' + @SQL + ')) as ColDiff
from z_test a
INNER JOIN z_test b on a.ColPK = b.ColPK and a.BusinessDate = dateadd(DAY,-1,b.BusinessDate)
) X'
--print @SQL
EXECUTE( @SQL)
In the script above I used ISNULL(a.column,space(0)) <> ISNULL(b.column, space(0)) in order to account for possible NULLs in the columns. In the later scripts I will just test a.column <> b.column for simplicity.
The sample data can be displayed by running the below t-sql SELECT query.
select * from z_test order by colpk, businessdate
When we execute dynamic sql script, it will compare sql data in database table z_test and find differences.
The list of different fields are listed in the output of the data compare statement as follows.
The second query shows how to create and execute script that will log column differences and previous/new values:
SET nocount ON
DECLARE @SQL VARCHAR(MAX)
DECLARE @SQLCase VARCHAR(MAX)
DECLARE @SQlFrom VARCHAR(MAX)
DECLARE @SQlSelect VARCHAR(MAX)
SELECT @SQlSelect = 'SELECT b.BusinessDate, a.ColPK, '
SELECT @SQlFrom = ' FROM Z_Test a
INNER JOIN Z_Test b on a.ColPK = b.ColPK
and a.BusinessDate = dateadd(DAY,-1,b.BusinessDate)'
SELECT @SQL = Coalesce(@SQL + '','') + Quotename(column_name,'''') + ' as ColName, b.' +
Quotename(column_name) + ' as PrevVal, a.' + Quotename(column_name) + ' as CurVal ' +
@SQlFrom + '
where a.' + Quotename(column_name) + '<> b.' + Quotename(column_name) +
Char(13) + ' UNION ALL' + Char(13) + @SQlSelect
FROM information_schema.columns
WHERE table_name = 'Z_Test'
AND column_name NOT IN ('BusinessDate','ColPK')
SELECT @SQL = Substring(@SQL,1,Len(@SQL) - Len(@SQlSelect) - 10)
SELECT @SQL = 'Select b.BusinessDate, b.ColPK, ' + @SQL
DELETE FROM dbo.z_logtable
SELECT @SQL = 'Insert into dbo.Z_LogTable (
BusinessDate, ColPK, ColName, CurVal, PreVal
) ' + @SQL
PRINT @SQL
EXECUTE( @SQL)
select * from dbo.Z_LogTable order by BusinessDate, ColPK, ColName
This generates one long insert statement for all columns with differences.
The original poster also asked for amendment in this script to do such inserts in chunks.
The next script demonstrates the same technique but with split for every 3 columns (see the difference in generated scripts in the Messages pane):
set nocount on
declare @ColCount int
select @ColCount = COUNT(*)
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'Z_Test'
declare @SQL varchar(max)
declare @SQLInsert varchar(max)
declare @SQlFrom varchar(max)
declare @SQlSelect varchar(max)
select @SQlSelect = 'Select b.BusinessDate, a.ColPK, '
select @SQlFrom = '
from Z_Test a
INNER JOIN Z_Test b on a.ColPK = b.ColPK
and a.BusinessDate = dateadd(DAY,-1,b.BusinessDate)'
set @SQLInsert = 'Insert into dbo.Z_LogTable (
BusinessDate, ColPK, ColName, CurVal, PreVal
) '
select @SQL = coalesce(@SQL + '','') +
quotename(Column_Name,'''') + ' as ColName,
' +
'b.' + quotename(Column_Name) + ' as PrevVal, ' +
'a.' + quotename(Column_Name) + ' as CurVal ' +
@SQlFrom + '
where a.' + QUOTENAME(Column_Name) +
'<> b.' + QUOTENAME(Column_Name) + CHAR(13) +
Case when ORDINAL_POSITION = @ColCount
then ''
when ORDINAL_POSITION %3 = 0
then @SQLInsert + CHAR(13) + @SQlSelect
else
' UNION ALL' + CHAR(13) + @SQlSelect
end
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME ='Z_Test'
and COLUMN_NAME NOT IN ('BusinessDate','ColPK')
--SELECT @SQL = SUBSTRING(@SQL, 1,LEN(@SQL) -LEN(@SQlSelect) -10)
SELECT @SQL = 'Select b.BusinessDate, b.ColPK, ' + @SQL
DELETE FROM dbo.Z_LogTable
SELECT @SQL = @SQLInsert + @SQL
print @SQL
execute(@SQL)
SELECT * FROM Z_LogTable order by BusinessDate, ColPK, ColName
As you see, such tasks can be solved using simple string manipulation, although this technique is not guaranteed to work, especially if you try to add ORDER BY in the queries used. In my tests, though, it always worked.
You can see more scripts discussing the same technique at this blog
How to get information about all databases without a loop
About the Author : Naomi Nosonovsky
She is a personality of the year 2008 and 2009 at www.universalthread.com in .NET, SQL Server & VFP categories.
Naomi Nosonovsky is also an All Star contributor at ASP.NET Forums and MSDN Transact-SQL Forum.
Readers of this sql article can share their ideas about this article at String Manipulation Discussion Forum