Is there an equivalent of
CHECKSUM_AGG(CHECKSUM(*))
for HashBytes?
I know you can do
SELECT
HashBytes('MD5',
CONVERT(VARCHAR,Field1) + '|'
+ CONVERT(VARCHAR,Field2) + '|'
+ CONVERT(VARCHAR,field3) + '|'
)
FROM MyTable
But I am not sure how to aggregate all of the calculated HashBytes values into a single value inside SQL.
One reason I would want to do this is to determine if data has changed in the source table since the previous load before moving the data into my system.
You can loop through all records with a cursor and combine the hashes into one:
declare @c cursor;
declare @data varchar(max);
declare @hash varchar(400) = '';

set @c = cursor fast_forward for
    select cast(SomeINTData as varchar(50)) + SomeTextData
    from TFact
    where Year = @year
      and Month = @month;

open @c
fetch next from @c into @data
while @@FETCH_STATUS = 0 begin
    set @hash = HASHBYTES('sha1', @hash + @data)
    fetch next from @c into @data
end

select @hash Ha
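Note that the cursor's SELECT has no ORDER BY, so the combined hash depends on whatever row order the engine happens to return, and two runs over identical data can differ. A sketch of the same cursor declaration with a deterministic sort added (assuming SomeINTData and SomeTextData give a stable ordering):

set @c = cursor fast_forward for
    select cast(SomeINTData as varchar(50)) + SomeTextData
    from TFact
    where Year = @year
      and Month = @month
    order by SomeINTData, SomeTextData;  -- fixed order => repeatable combined hash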
If you want to check whether a given row has changed, I strongly recommend you use a "timestamp" (rowversion) column.
The value is automatically updated by SQL Server on every row modification.
Then if a row is changed, the value will be different after the modification, and you can notice it without implementing extra logic or querying the whole table.
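For example, a minimal sketch (the table and column names are placeholders, and the loading process would need to persist the last version value it saw):

-- add a rowversion (timestamp) column; SQL Server maintains it on every insert/update
ALTER TABLE dbo.MyTable ADD RowVer rowversion;

-- on the next load, anything modified since the previous load has a higher value
DECLARE @LastLoadVersion binary(8) = 0x0000000000000000;  -- value remembered from the previous load

SELECT COUNT(*) AS ChangedRows
FROM dbo.MyTable
WHERE RowVer > @LastLoadVersion;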
But if you want to know whether at least one row has been updated, I recommend you use:
DECLARE @Tablename sysname = 'MyTable';
SELECT modify_date FROM sys.tables WHERE name = @Tablename;
(If you are using .NET in your business layer, it might be interesting for you to take a look at SqlDependency.)
You can nest hashbytes, using a varbinary variable to accrue each row's inner hash results for the final outer hash.
My example below takes ~24 seconds against 870k rows on a mid-range Xeon. More columns and lots of null values will increase crunch time.
The Order by clause is essential to guaranteeing repeatable results.
Declare @TableHash varbinary(max) = 0x00;

Select @TableHash =
    hashbytes('MD5', @TableHash +
        hashbytes('MD5',
            isnull(convert(nvarchar(max), Col1_int), 'null') +
            isnull(convert(nvarchar(max), Col2_int), 'null') +
            isnull(convert(nvarchar(max), Col3_int), 'null') +
            isnull(convert(nvarchar(max), Col4_int), 'null') +
            isnull(convert(nvarchar(max), Col5_nvmax), 'null'))
    )
From MyTable
Order by Col2_int, Col1_int;

Print convert(varchar(max), @TableHash, 1) +
    Case @TableHash When 0x00 Then ' (Table has no data)' Else '' End;
Output:
0x2AF0A66411F23B67D3819AC407D3B8BD
With newer versions of SQL Server (2017 and later), you can use a combination of CONCAT and STRING_AGG to bung everything together, then hash the whole result.
SELECT
    HASHBYTES('SHA2_512',
        STRING_AGG(
            CONCAT(
                CAST(Field1 AS varchar(max)), -- at least one max
                Field2,
                field3
            ), ''
        )
    )
FROM MyTable;
Note that MD5 is deprecated, and would probably be at risk of hash collisions even in this case. You should use SHA2_512 or SHA2_256 instead.
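One thing to watch for: without an explicit ordering, the concatenation (and therefore the hash) is not guaranteed to be repeatable between runs. STRING_AGG accepts WITHIN GROUP (ORDER BY ...), so a hedged variant of the query above (with Field1 standing in for whatever key gives your rows a stable order) would be:

SELECT
    HASHBYTES('SHA2_512',
        STRING_AGG(
            CONCAT(
                CAST(Field1 AS varchar(max)),
                Field2,
                field3
            ), ''
        ) WITHIN GROUP (ORDER BY Field1)  -- fixed order => repeatable hash
    )
FROM MyTable;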
Related
My table has column names m1,m2,m3...,m12.
I'm using an iterator to select them and insert them one by one into another table.
In this iterator I'm trying to generate field names with:
'['+concat('m',cast(@P_MONTH as nvarchar))+']'
where @P_MONTH is incremented in each loop.
So for @P_MONTH = 1 this is supposed to give [m1], which works fine.
But when I run query I get:
Conversion failed when converting the nvarchar value '[m1]' to data
type int.
And if I simply put [m1] in that select, it works OK.
How do I concatenate a field name so that it is actually interpreted as a field name from a certain table?
EDIT
Here is the full query:
DECLARE @SQLString nvarchar(500),
        @P_YEAR int,
        @P_MONTH int = 1

set @P_YEAR = 2018

WHILE @P_MONTH < 13
BEGIN
    SET @SQLString =
        'INSERT INTO [dbo].[MASTER_TABLE]
            (sector,serial,
             date, number, source)'+
        'SELECT ' + '[SECTOR],[DEPARTMENT]' +
        QUOTENAME(cast(CONVERT(datetime,CONVERT(VARCHAR(4),@P_YEAR)+RIGHT('0'+CONVERT(VARCHAR(2),@P_MONTH),2)+'01',5) as nvarchar))+
        QUOTENAME ('M',cast(@P_MONTH as nvarchar)) +
        'EMPLOYED' +
        'FROM [dbo].[STATS]'+
        'where YEAR= @P_YEAR'

    EXECUTE sp_executesql @SQLString

    SET @P_MONTH = @P_MONTH + 1
END
It's still not working. It executes successfully but it does nothing.
Good day,
Let's create a simple table for the sake of the explanation
DROP TABLE IF EXISTS T
GO
CREATE TABLE T(a1 INT)
GO
INSERT T(a1) VALUES (1),(2)
GO
SELECT a1 FROM T
GO
When we use a query like the one below, the server parses the text as a value and not as a column name:
DECLARE @String NVARCHAR(10)
SELECT @String = '1'
--
SELECT '['+concat('a',cast(@String as nvarchar))+']'
FROM T
GO
This means that the result will be 2 rows with no column name, and the value in each row will be "[a1]".
Moreover, the above query uses the brackets as part of the string.
One simple solution is to use the function QUOTENAME in order to add brackets around a name.
Another issue with this approach is the potential risk of SQL injection. QUOTENAME might not be a perfect solution, but it can help with this as well.
If we need to use entity names dynamically, like the column name in this case, then in most cases a dynamic query is the best solution. This means using the stored procedure sp_executesql, as below:
DECLARE @String INT
SELECT @String = 1

DECLARE @SQLString nvarchar(500);
SET @SQLString =
    'SELECT ' + QUOTENAME(concat('a',cast(@String as nvarchar))) + ' FROM T'
EXECUTE sp_executesql @SQLString
GO
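A related note for the loop in the question: only identifiers need to be concatenated into the string; values can be passed to sp_executesql as real parameters. A sketch, assuming (as in the question's query) that the m1...m12 columns and a YEAR column live in [dbo].[STATS]:

DECLARE @P_YEAR int = 2018,
        @P_MONTH int = 1,
        @SQLString nvarchar(500);

SET @SQLString =
    N'SELECT ' + QUOTENAME(concat('m', @P_MONTH)) +
    N' FROM [dbo].[STATS] WHERE [YEAR] = @year';  -- @year is a real parameter, not concatenated text

EXECUTE sp_executesql
    @SQLString,
    N'@year int',     -- parameter definition
    @year = @P_YEAR;  -- parameter value
GO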
Goal: Write an STP for SQL Server 2012+ that can do server-side paging and per-column filtering, returning the row count of the unfiltered and filtered data, with as little dynamic SQL as possible.
Motivation: We have many queries for our website that would return thousands of rows, so server-side paging is needed. The results from these queries are displayed in large tables which allow for ordering on a single column and filtering on multiple columns, which is different from many solutions I've seen online which have a single search field for the whole table. Additionally, we need the total size of the unfiltered and filtered resultset to display to the user as Showing 1-10 of 2,500 (filtered from 15,000).
Our current method to achieve this is a stored procedure defined as:
CREATE PROCEDURE [dbo].[stp_MyBigQuery]
    @take INT = 25,
    @skip INT = 0,
    @sortBy VARCHAR(256) = 'Name',
    @sortDir VARCHAR(5) = 'desc',
    @searchBy NVARCHAR(max) = ''
AS BEGIN
    DECLARE @sql varchar(max) = '
    ;WITH fullList AS ( SELECT Name, Age, Town FROM MyBigTable ),
    totalCount AS ( SELECT COUNT(*) AS numRows FROM fullList )
    SELECT
        fl.*,
        COUNT(*) over() as filteredCount,
        (select top 1 totalCount.numRows from totalCount) as totalCount
    INTO #temp
    from fullList fl'

    IF(@searchBy != '') BEGIN
        SET @sql = @sql + ' WHERE ' + @searchBy + ' '
    END

    SET @sql = @sql + ' ORDER BY ' + @sortBy + ' ' + @sortDir +
        ' OFFSET ' + CAST(@skip AS varchar(32)) + ' rows ' +
        'fetch next ' + CAST(@take AS varchar(32)) + ' rows only '

    SET @sql = @sql + '
    declare @total bigint = (SELECT TOP 1 totalCount FROM #temp)
    declare @filteredCount bigint = (SELECT TOP 1 filteredCount FROM #temp)

    ALTER TABLE #temp
    DROP COLUMN filteredCount, totalCount

    SELECT
        t.*
    FROM #temp t
    ORDER BY ' + @sortBy + ' ' + @sortDir + '

    SELECT ISNULL(@filteredCount,0)
    SELECT ISNULL(@total,0)

    DROP TABLE #temp'

    exec(@sql)
END
Where @searchBy would be something like Name = 'Alice' and Age > 30
Downsides of this method:
Lots of dynamic SQL, reducing performance and maintainability
Pretty bad performance depending on the table sizes due to the creation of a temp table, and potentially from the CTEs?
From the research I've done so far, it seems completely possible to achieve this without the dynamic SQL as long as you're not doing the filtering, and also if you only allow filtering on a single column.
This is the closest solution I've seen to what I'm looking for, but it returns all rows from the table and does paging on the webserver instead of in the query. When I started thinking about how we could avoid all this messy dynamic SQL this was my first thought, but there are concerns about the amount of memory on our webserver not being able to handle the full size of some of these tables. Our application is deployed on servers provided by our clients, so we are fairly limited in how much memory we can request, hence why we want to only return the paged data and the total counts from the DB to the web server.
I've also found the much nicer way to do this using Entity Framework, but that is unfortunately not an option for us.
Lastly, it would be really great if I could write one STP that would handle this logic for multiple queries, which appears to be almost achieved by the exec_with_paging STP mentioned in this article, but the link for it returns a 404. I'm guessing this may not be possible, but I just wanted to mention it in case someone has any ideas.
Thank you for any help/guidance!
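For what it's worth, the non-dynamic shape of the paging part referenced above (fixed sort, no per-column filtering) needs no string building at all; this is only a sketch against the MyBigTable columns used in the procedure:

DECLARE @take INT = 25, @skip INT = 0;

SELECT
    Name, Age, Town,
    COUNT(*) OVER() AS filteredCount  -- total row count before paging is applied
FROM MyBigTable
ORDER BY Name DESC
OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY;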
I'm curious to see if this is possible.
I have a table (this could apply to any old table with data). A simple SELECT will return the columns and rows as a result set. What I'm trying to find out is whether it is possible to return the rows, but instead of separate columns, have the columns concatenated and comma separated. So the expected number of rows would be returned, but with only one varchar column holding the comma-separated values of all the columns, just like a CSV file.
Thanks.
[UPDATE]
Here is a bit more detail on why I'm asking. I don't have the option to do this on the client; this is a task I'm trying to do with SSIS.
Scenario: I have a table that is dynamically created in SSIS, but the column names change each time it's built. The original package uses BCP to grab the data and put it into a flat file, but due to permissions when run as a job, BCP can't create the flat file at the required destination. We can't get this changed either.
The other issue is that with SSIS 2005, using the flat file destination, you have to map the column names from the input source, which I can't do because the column names keep changing.
I've written a script task to grab all the data from the original tables and then use a StreamWriter to write to the CSV, but I have to loop through each row and then through each column to produce the string built up of all the columns. I want to measure the performance of this concatenation of columns on SQL Server against a nasty loop in VB.NET.
If I can get SQL to produce a single column for each row, I can just write a single line to the text file instead of iterating through each column to build the row.
I think you should try this:
SELECT UserName +','+ Password AS ColumnZ
FROM UserTable
Assuming you know what columns the table has, and you don't want to do something dynamic and crazy, you can do this
SELECT CONCAT(ColumnA, ',', ColumnB) AS ColumnZ
FROM Table
There is a fancy way to do this using SQL Server's XML functions, but for starters, could you just cast the contents of the columns you care about as varchar and concatenate them with commas?
SELECT cast(colA as varchar)+', '+cast(colB as varchar)+', '+cast(colC as varchar)
FROM table
Note that this will get tripped up if any of your contents have a comma or double quote in them, in which case you can also use a REPLACE on each cast to escape them.
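As a rough sketch of that escaping (using the usual CSV convention of quoting each field and doubling any embedded double quotes, with the same placeholder columns as above):

SELECT '"' + replace(cast(colA as varchar(max)), '"', '""') + '",'
     + '"' + replace(cast(colB as varchar(max)), '"', '""') + '",'
     + '"' + replace(cast(colC as varchar(max)), '"', '""') + '"'
FROM [table]  -- same placeholder table name as above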
This could stand to be cleaned up some, but you can do this by using the metadata stored in sys.objects and sys.columns along with dynamic SQL. Note that I am NOT a fan of dynamic SQL, but for reporting purposes it shouldn't be too much of a problem.
Some SQL to create test data:
if (object_id('test') is not null)
drop table test;
create table test
(
id uniqueidentifier not null default newId()
,col0 nvarchar(255)
,col1 nvarchar(255)
,col2 nvarchar(255)
,col3 nvarchar(255)
,col4 nvarchar(255)
);
insert into test (col0,col1,col2,col3,col4)
select 'alice','bob','charlie','dave','emily'
union
select 'abby','bill','charlotte','daniel','evan'
A stored proc to build CSV rows:
-- emit the contents of a table as a CSV.
-- @table_name: name of a permanent (in sys.objects) table
-- @debug: set to 1 to print the generated query
create procedure emit_csv(@table_name nvarchar(max), @debug bit = 0)
as
declare @object_id int;
set nocount on;
set @object_id = object_id(@table_name);

declare @name nvarchar(max);
declare db_cursor cursor for
    select name
    from sys.columns
    where object_id = @object_id;

open db_cursor;
fetch next from db_cursor into @name

declare @query nvarchar(max);
set @query = '';

while @@FETCH_STATUS = 0
begin
    -- TODO: modify appended clause to escape commas in addition to trimming
    set @query = @query + 'rtrim(cast('+@name+' as nvarchar(max)))'
    fetch next from db_cursor into @name;
    -- add concatenation to the end of the query.
    -- TODO: Rearrange @query construction order to make this unnecessary
    if (@@fetch_status = 0)
        set @query = @query + ' + '','' +'
end;

close db_cursor;
deallocate db_cursor;

set @query = 'select rtrim('+@query+') as csvrow from '+@table_name;

if @debug != 0
begin
    declare @newline nvarchar(2);
    set @newline = char(13) + char(10)
    print 'Generated SQL:' + @newline + @query + @newline + @newline;
end

exec (@query);
For my test table, this generates the query:
select
rtrim(rtrim(cast(id as nvarchar(max)))
+ ','
+rtrim(cast(col0 as nvarchar(max)))
+ ','
+rtrim(cast(col1 as nvarchar(max)))
+ ','
+rtrim(cast(col2 as nvarchar(max)))
+ ','
+rtrim(cast(col3 as nvarchar(max)))
+ ','
+rtrim(cast(col4 as nvarchar(max))))
as csvrow
from test
and the result set:
csvrow
-------------------------------------------------------------------------------------------
EEE16C3A-036E-4524-A8B8-7CCD2E575519,alice,bob,charlie,dave,emily
F1EE6C84-D6D9-4621-97E6-AA8716C0643B,abby,bill,charlotte,daniel,evan
Suggestions
Modify the cursor loop to escape commas
Make sure that @table_name refers to a valid table in the sproc (e.g., check whether object_id(@table_name) is null)
Some exception handling would be good
Set permissions on this so that only the account that runs the report can execute it. String concatenation in dynamic SQL can be a big security hole, but I don't see another way to do this.
Some error handling to ensure that the cursor gets closed and deallocated might be nice.
This can be used for any table that is not a #temp table. In that case, you'd have to use sys.objects and sys.columns from tempdb...
select STUFF((select ',' + convert(varchar, l.Subject)
              from tbl_Student B, tbl_StudentMarks L
              where B.Id = L.Id
              FOR XML PATH('')), 1, 1, '') Subject
FROM tbl_Student A
where A.Id = 10
I need to create a search for a java app I'm building where users can search through a SQL database based on the table they're currently viewing and a search term they provide. At first I was going to do something simple like this:
SELECT * FROM <table name> WHERE CAST((SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '<table name>')
AS VARCHAR) LIKE '%<search term>%'
but that subquery returns more than one result, so then I tried to make a procedure to loop through all the columns in a given table and put any relevant fields in a results table, like this:
CREATE PROC sp_search
    @tblname VARCHAR(4000),
    @term VARCHAR(4000)
AS
SET nocount on

SELECT COLUMN_NAME
INTO #tempcolumns
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @tblname

ALTER TABLE #tempcolumns
ADD printed BIT,
    num SMALLINT IDENTITY

UPDATE #tempcolumns
SET printed = 0

DECLARE @colname VARCHAR(4000),
        @num SMALLINT

WHILE EXISTS(SELECT MIN(num) FROM #tempcolumns WHERE printed = 0)
BEGIN
    SELECT @num = MIN(num)
    FROM #tempcolumns
    WHERE printed = 0

    SELECT @colname = COLUMN_NAME
    FROM #tempcolumns
    WHERE num = @num

    SELECT * INTO #results FROM @tblname WHERE CAST(@colname AS VARCHAR)
    LIKE '%' + @term + '%' --this is where I'm having trouble

    UPDATE #tempcolumns
    SET printed = 1
    WHERE @num = num
END

SELECT * FROM #results
GO
This has two problems: first is that it gets stuck in an infinite loop somehow, and second I can't select anything from #tblname. I tried using dynamic sql as well, but I don't know how to get results from that or if that's even possible.
This is for an assignment I'm doing at college and I've gotten this far after hours of trying to figure it out. Is there any way to do what I want to do?
You need to only search columns that actually contain strings, not all columns in a table (which may include integers, dates, GUIDs, etc).
You shouldn't need a #temp table (and certainly not a ##temp table) at all.
You need to use dynamic SQL (though I'm not sure if this has been part of your curriculum so far).
I find it beneficial to follow a few simple conventions, all of which you've violated:
use PROCEDURE not PROC - it's not a "prock," it's a "stored procedure."
use dbo. (or alternate schema) prefix when referencing any object.
wrap your procedure body in BEGIN/END.
use vowels liberally. Are you saving that many keystrokes, never mind time, saying @tblname instead of @tablename or @table_name? I'm not fighting for a specific convention but saving characters at the cost of readability lost its charm in the 70s.
don't use the sp_ prefix for stored procedures - this prefix has special meaning in SQL Server. Name the procedure for what it does. It doesn't need a prefix, just like we know they're tables even without a tbl prefix. If you really need a prefix there, use another one like usp_ or proc_ but I personally don't feel that prefix gives you any information you don't already have.
since tables are stored using Unicode (and some of your columns might be too), your parameters should be NVARCHAR, not VARCHAR. And identifiers are capped at 128 characters, so there is no reason to support > 257 characters for @tablename.
terminate statements with semi-colons.
use the catalog views instead of INFORMATION_SCHEMA - though the latter is what your professor may have taught and might expect.
CREATE PROCEDURE dbo.SearchTable
    @tablename NVARCHAR(257),
    @term NVARCHAR(4000)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql NVARCHAR(MAX);
    SET @sql = N'SELECT * FROM ' + @tablename + ' WHERE 1 = 0';

    SELECT @sql = @sql + '
        OR ' + c.name + ' LIKE ''%' + REPLACE(@term, '''', '''''') + '%'''
    FROM
        sys.all_columns AS c
    INNER JOIN
        sys.types AS t
        ON c.system_type_id = t.system_type_id
        AND c.user_type_id = t.user_type_id
    WHERE
        c.[object_id] = OBJECT_ID(@tablename)
        AND t.name IN (N'sysname', N'char', N'nchar',
            N'varchar', N'nvarchar', N'text', N'ntext');

    PRINT @sql;
    -- EXEC sp_executesql @sql;
END
GO
When you're happy that it's outputting the SELECT query you're after, comment out the PRINT and uncomment the EXEC.
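If you would rather not embed the search term in the generated string at all, one hedged refinement (keeping the rest of the procedure above unchanged) is to have the generated SQL reference a parameter and pass the value through sp_executesql:

    -- build the OR clauses around a parameter placeholder instead of the literal value
    SELECT @sql = @sql + '
        OR ' + QUOTENAME(c.name) + ' LIKE ''%'' + @term + ''%'''
    FROM sys.all_columns AS c
    INNER JOIN sys.types AS t
        ON c.system_type_id = t.system_type_id
        AND c.user_type_id = t.user_type_id
    WHERE c.[object_id] = OBJECT_ID(@tablename)
      AND t.name IN (N'sysname', N'char', N'nchar',
                     N'varchar', N'nvarchar', N'text', N'ntext');

    EXEC sp_executesql @sql, N'@term NVARCHAR(4000)', @term = @term;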
You get into an infinite loop because EXISTS(SELECT MIN(num) FROM #tempcolumns WHERE printed = 0) will always return a row even if there are no matches - you need to use EXISTS (SELECT * FROM #tempcolumns WHERE printed = 0) instead.
To use dynamic SQL, you need to build up a string (varchar) of the SQL statement you want to run, then you call it with EXEC
eg:
declare @s varchar(max)
select @s = 'SELECT * FROM mytable '
Exec (@s)
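As for getting results back from dynamic SQL: one common pattern (just a sketch, with a hypothetical #results table and column list) is to create the destination table first and then INSERT ... EXEC the dynamic statement into it:

declare @s varchar(max)

-- create the destination up front so its shape is known
create table #results (id int, name varchar(100))

select @s = 'SELECT id, name FROM mytable'

-- capture the dynamic query's output rows
insert into #results (id, name)
exec (@s)

select * from #results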
I hope to find a way to get the value in the Nth column of a dataset.
Thus, for N = 6 I want
SELECT (Column6Value) from MyTable where MyTable.RowID = 14
Is there a way to do this in TSQL as implemented in SQL Server 2005? Thanks.
You should be able to join with the system catalog (Information_Schema.Columns) to get the column number.
This works:
create table test (a int, b int, c int)
insert test values(1,2,3)
declare @column_number int
set @column_number = 2

declare @query varchar(8000)
select @query = COLUMN_NAME from information_Schema.Columns
where TABLE_NAME = 'test' and ORDINAL_POSITION = @column_number

set @query = 'select ' + @query + ' from test'
exec(@query)
But why you would ever do something like this is beyond me, what problem are you trying to solve?
Not sure if you're at liberty to redesign the table, but if the ordinal position of the column is significant, your data is not normalized and you're going to have to jump through lots of hoops for many common tasks.
Instead of having table MyTable with Column1... ColumnN you'd have a child table of those values you formerly stored in Column1...ColumnN each in their own row.
For those times when you really need those values in a single row, you could then do a PIVOT: Link
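As a rough illustration of that PIVOT idea (the child table MyTableValues and its ColumnPosition/Value columns are entirely hypothetical):

SELECT RowID, [1], [2], [3], [4], [5], [6]
FROM (
    SELECT RowID, ColumnPosition, [Value]
    FROM MyTableValues  -- hypothetical normalized child table
) AS src
PIVOT (
    MAX([Value]) FOR ColumnPosition IN ([1], [2], [3], [4], [5], [6])
) AS p
WHERE RowID = 14;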
Edit: My suggestion is somewhat moot. Ash clarified that it's "de-normalization by design, it's a pivot model where each row can contain one of any four data types." Yeah, that kind of design can be cumbersome when you normalize it.
If you know the range of n you could use a case statement
Select Case when @n = 1 then Column1Value
            when @n = 2 then Column2Value end
from MyTable
As far as I know there is no dynamic way to replace a column (or table) in a select statement without resorting to dynamic SQL (in which case you should probably refactor anyway).
Implementation of @Mike Sharek's answer.
Declare @columnName varchar(255),
        @tablename varchar(255), @columnNumber int, @SQL nvarchar(4000)

Set @tablename = 'MyTable'
Set @columnNumber = 6

Select @columnName = Column_Name from Information_Schema.columns
where Ordinal_position = @columnNumber and Table_Name = @tablename

Set @SQL = 'select ' + @columnName + ' from ' + @tableName + ' where RowID=14'
Exec sp_Executesql @SQL
I agree with Sambo - why are you trying to do this? If you are calling the code from C# or VB, it's much easier to grab the 6th column from a result set.