Will the use of CURSOR improve the performance / speed of querying using PIVOT in SQL? - sql

Newbie here on EAV (Entity-Attribute-Value) model of DB in SQL.
Just a background: I am using SQL Server 2016. The use of EAV is kind of a requirement at work so I am learning to do it one step at a time.
I recently learned how to do a dynamic PIVOT to return 800+ rows with 200+ columns in an EAV table.
See details here:
Converting 200+ rows to column in SQL Server using PIVOT
As successful as it was at returning the data I need, the performance was too slow: the query took about 30 minutes. By the way, I am using the following code:
declare @pivot_col varchar(max);
declare @sql varchar(max);
select @pivot_col = STUFF(
( SELECT ',' + CAST([Col_Name] AS VARCHAR(max) ) AS [text()]
FROM ( select distinct [Col_Name] from tbl_Values ) A
ORDER BY [Col_Name] FOR XML PATH('')), 1, 1, NULL
);
set @sql = 'SELECT *
FROM ( SELECT [Row_ID], [Col_Name], [Col_Value] FROM tbl_Values ) AS a
PIVOT (
MAX([Col_Value])
FOR [Col_Name] in (' + @pivot_col + ' )
) AS p
ORDER BY [Row_ID]';
exec ( @sql );
I am trying to incorporate a CURSOR into this but haven't gotten very far. Before I research further, can you tell me whether it would make any difference to performance / speed?
Thanks!

Found a solution to the poor performance of my PIVOT query: I was told to create a clustered index on the Row_ID column I have in my table. I ran the query below:
CREATE CLUSTERED INDEX IX_tbl_Values_Row_ID
ON dbo.tbl_Values (Row_ID);
GO
And the query in my question, which took 30 minutes before, now runs in just 6 seconds! Thanks to @MohitShrivastava for the tip! It definitely worked.
I also referred to this before creating the clustered index:
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-clustered-indexes?view=sql-server-ver15
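To check that the index is really what changed the timing, one option is to look the index up and switch on I/O and timing statistics before re-running the PIVOT. This is only a sketch: it assumes the table sits in the dbo schema, as in the CREATE CLUSTERED INDEX statement above, and the placeholder comment stands for the dynamic PIVOT batch from the question.

```sql
-- Confirm that a clustered index now exists on the table (dbo schema assumed).
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.tbl_Values')
  AND i.type_desc = N'CLUSTERED';

-- Report logical reads and CPU/elapsed time in the Messages output
-- for every statement that runs in this session from here on.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the dynamic PIVOT batch from the question at this point.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads before and after creating the index gives a more repeatable measure than wall-clock time alone.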

Related

how to check a dynamic column name is null

I have a table like the one below, with several columns whose names end in a series of numbers:
Name: JLEDG
name | user_val_1 | user_val_2 | user_val_3 | user_val_4
One  | Two        | Three      | Three      | Three
DECLARE @myvar int = 3;
So I would like to do the following, which is not working:
SELECT * FROM JLEDG WHERE ('user_val_' + @myvar) IS NULL;
I expect the SQL to be:
SELECT * FROM JLEDG WHERE user_val_3 IS NULL;
You can only do that in dynamic SQL. You seem to have a problem with your data model. You shouldn't be storing values splayed across columns like that. You should have another table with one row per value.
One thing you can do is unpivot (using apply) and then filter:
select j.*
from jledg j cross apply
(values (1, user_val_1), (2, user_val_2), . . .
) v(which, user_val)
where v.which = @myvar and v.user_val is null;
The alternative is to use dynamic SQL (sp_executesql), but that seems quite cumbersome when you could just fix the data model.
SQL Server is declarative by design, and does not support macro substitution. As Gordon mentioned in his solution (+1), dynamic SQL is just another option.
Example
Declare @myvar int = 3
Declare @SQL varchar(max) = concat('SELECT * FROM JLEDG WHERE user_val_',@myvar,' IS NULL;')
Exec(@SQL)
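For the sp_executesql route mentioned earlier, a slightly more defensive sketch could look like the following. It assumes the JLEDG table and user_val_N column naming from the question; the @col variable is introduced here purely for illustration.

```sql
DECLARE @myvar int = 3;
-- Build the column name first so it can be validated and quoted separately.
DECLARE @col sysname = CONCAT(N'user_val_', @myvar);
DECLARE @sql nvarchar(max);

-- COL_LENGTH returns NULL when the column does not exist on the table,
-- so the statement is only built for a real column.
IF COL_LENGTH(N'JLEDG', @col) IS NOT NULL
BEGIN
    -- QUOTENAME brackets the identifier before it is spliced into the text.
    SET @sql = N'SELECT * FROM JLEDG WHERE ' + QUOTENAME(@col) + N' IS NULL;';
    EXEC sys.sp_executesql @sql;
END
```

Checking the column and quoting it keeps the dynamic statement from failing, or doing something unintended, if the suffix ever comes from user input.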

SQL: Improving the string split

I have some code that takes a string value, splits it, and passes the pieces into a table. The code works, but it runs slowly. Any suggestions for modifying the code to make it run faster would be greatly appreciated.
DECLARE @StrPropertyIDs VARCHAR(1000)
SET @StrPropertyIDs = '419,429,459'
DECLARE @TblPropertyID TABLE
(
property_id varchar(100)
)
INSERT INTO @TblPropertyID(property_id)
select x.Item
from dbo.SplitString(@StrPropertyIDs, ',') x
select *
from vw_nfpa_firstArv_RPT
where property_use IN
(
SELECT property_id
FROM @TblPropertyID
)
The best long term strategy here would be to move away from CSV data in your SQL tables if at all possible. As a quick fix here, we could try creating the table variable with an index on property_id:
DECLARE @TblPropertyID TABLE (
property_id varchar(100) INDEX idx CLUSTERED
);
This would make the WHERE IN clause of your query faster, though we could try rewriting it using EXISTS:
SELECT *
FROM vw_nfpa_firstArv_RPT t1
WHERE EXISTS (SELECT 1 FROM @TblPropertyID t2
WHERE t2.property_id = t1.property_use);
Note that declaring an index inline on a table variable, as above, only works on SQL Server 2014 or later.
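If the server is recent enough, the built-in STRING_SPLIT function may be worth trying in place of the user-defined dbo.SplitString. The sketch below assumes SQL Server 2016 or later with the database at compatibility level 130 or higher, which the question does not confirm; the view and column names are taken from the question.

```sql
DECLARE @StrPropertyIDs varchar(1000) = '419,429,459';

-- STRING_SPLIT returns one row per list element in a column named "value",
-- so the intermediate table variable is not needed.
SELECT r.*
FROM vw_nfpa_firstArv_RPT AS r
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(@StrPropertyIDs, ',') AS s
    WHERE s.value = r.property_use
);
```

Whether this is actually faster depends on the view behind vw_nfpa_firstArv_RPT, so it is worth comparing execution plans for both versions.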

Convert SQL results from "N rows of 1 column" to "1 row of N columns" from WITHIN the query

Doing this seemingly trivial task should be simple and obvious using PIVOT - but isn't.
What is the cleanest way to do the conversion, not necessarily using pivot, when limited to ONLY using "pure" SQL (see other factors, below)?
It shouldn't affect the answer, but note that a Python 3.X front end is being used to run SQL queries on a MS SQL Server 2012 backend.
Background :
I need to create CSV files by calling SQL code from Python 3.x. The CSV header line is created from the field (column) names of the SQL table that holds the results of the query.
The following SQL code extracts the field names and returns them as N rows of 1 column - but I need them as 1 row of N columns. (In the example below, the final result must be "A", "B", "C" .)
CREATE TABLE #MyTable -- ideally the real code uses "DECLARE @MyTable TABLE"
(
A varchar( 32 ),
B varchar( 32 ),
C varchar( 32 )
) ;
CREATE TABLE #MetaData -- ideally the real code uses "DECLARE @MetaData TABLE"
(
NameOfField varchar( 32 ) not NULL
) ;
INSERT INTO #MetaData
SELECT name
FROM tempdb.sys.columns as X
WHERE ( object_id = Object_id( 'tempdb..#MyTable' ) )
ORDER BY column_id ; -- generally redundant, ensures correct order if results returned in random order
/*
OK so far, the field names are returned as 3 rows of 1 column (entitled "NameOfField").
Pivoting them into 1 row of 3 columns should be something simple like:
*/
SELECT NameOfField
FROM #MetaData AS Source
PIVOT
(
COUNT( [ NameOfField ] ) FOR [ NameOfField ]
IN ( #MetaData ) -- I've tried "IN (SELECT NameOfField FROM #Metadata)"
) AS Destination ;
This error gets raised twice, once for the COUNT and once for the "FOR" clause of the PIVOT statement:
Msg 207, Level 16, State 1, Line 32
Invalid column name ' NameOfField'.
How do I use the contents of #Metadata to get PIVOT to work? Or is there another simple way?
Other background factors to be aware of:
ODBC (Python's pyodbc package) is being used to pass the SQL queries from - and return the results (a cursor) to - a Python 3.x front end. Consequently there is no opportunity to use any type of manual intervention before the result set is returned to Python.
The above SQL code is intended to become standard boilerplate for every query passed to SQL. The code must dynamically "adapt" itself to the structure of #MyTable (e.g. if field B is removed while D and E are added after C, the end result must be "A", "C","D", "E"). This means that the field names of a table must never appear inside PIVOT's IN clause (the #MetaData table is intended to supply those values).
"Standard" SQL must be used. ALL vendor specific (e.g. Microsoft) extensions/utilities (e.g. "bcp", sqlcmd) must be avoided unless there is a very compelling reason to use them (because "it's there" doesn't count).
For known reasons the select clause (into #MetaData) doesn't work for temporary variables (@MyTable). Is there an equivalent SELECT that works for temporary variables (i.e. @MetaData)?
UPDATE: This problem is subtly different from that in SQL Server dynamic PIVOT query?. In my case I have to preserve the order of the fields, something not required by that question.
WHY I NEED TO DO THIS:
The Python code is a GUI for non-technical people. They use the GUI to pick and choose which (or even all) SQL reports to run from a HUGE number of reports.
Apps like Excel are being used to view these files: to keep our users happy each CSV file must have a header line. The header line will consist of the field names from the SQL table that holds the results of the query.
These scripts can change at any time (e.g. add/delete a column) without any advance notice. To meet our users' needs the header line must automatically "adjust itself" to make the corresponding changes. The SQL code below accomplishes this.
The header line gets merged (using UNION) with the query results to form the result set (a cursor) that gets passed back to Python. Python then processes the returned data and creates the CSV file (including the header line) that gets used by our customers.
In a nutshell: We have many sites, many users, many queries. By having SQL "dynamically create" the header line we remove the headache of having to manually manage/coordinate/roll out the SQL changes to all affected parties.
I am unsure what "pure" SQL is. Are you referring to ANSI-92 SQL?
Anyhow, if you can use SQL variables, try this:
DECLARE @STRING VARCHAR(MAX)
SELECT @STRING = COALESCE(@STRING + ', ' + '"' + NameOfField + '"', '"' + NameOfField + '"')
FROM #MetaData
SELECT @STRING
/*
Results:
"A", "B", "C"
*/
To @Tab Alleman, thanks. I was able to modify the answer to "SQL Server dynamic PIVOT query?" to do the swap (see below) in a way that meets all my needs.
NOTE: For some reason the "DISTINCT" keyword places the fields in alphabetical order - something I don't want.
Commenting that word out (as done below) preserves the order of the fields. I'm a bit uneasy about doing this but in this case it should be safe because the values being selected into #MetaData are guaranteed to be unique.
The difference can be easily seen by swapping fields A & B in #MyTable and uncommenting the "DISTINCT" keyword
--drop table #MyTable
--drop table #MetaData
Create TABLE #MyTable
(
A varchar( 10 ),
B varchar( 10 ),
C varchar( 10 )
)
;
CREATE TABLE #MetaData
(
NameOfField varchar( 100 ) not NULL,
Position int
)
;
INSERT INTO #MetaData
SELECT name, column_id
FROM tempdb.sys.columns as X
WHERE ( object_id = Object_id( 'tempdb..#MyTable' ) )
--ORDER BY column_id -- normally redundant, guards against results being returned in random order
;
select * from #MetaData
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX);
SET @cols = STUFF( (SELECT
-- DISTINCT
',' + QUOTENAME( c.NameOfField )
FROM #MetaData AS c
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
--print( @cols )
set @query = 'SELECT ' + @cols + ' from
(
select NameOfField
from #MetaData
) AS x
pivot
(
MAX( NameOfField )
for NameOfField in ( '+ @cols + ' )
) AS p
'
--print( @query )
execute( @query )
drop table #MyTable
drop table #MetaData

Convert SQL columns to rows with an ID

I have data in 192 separate columns. They are column pairs, so that each 15-minute time slice has a reading and a quality control number. The way it is set up now, each row represents a single day. I would like to insert the data into another table with fewer columns (Date,ReadTime,QualityControlNumber,Reading,...)
I started by trying a while loop like this, but using a variable to change the column header doesn't seem possible.
Should I nest while loops to increment the column headers, or is there another trick I should be using?
Code tried:
Declare @count varchar (10),
@QC varchar (10),
@Interval varchar(10)
set @count = 1
set @QC = 'QC#' + @count
set @Interval = 'Interval#' + @count
While (@count<97)
BEGIN
insert into Data_DATEstr (Number,[ReadDate],TimeInterval,QCReading,IntervalReading,ConversionFactor)
select [Number], [Start Date], @count, ['QC#'+@count], [Interval# +@count] ,[Conversion Factor]
from table
where [Number] = '103850581'
and [Start Date] = '060112'
set @count = (@count+1)
END
There is no need to use a while-loop for this. If you are trying to convert 192 separate columns to rows, then you will definitely want to use the UNPIVOT function. This can be written using dynamic SQL so that you do not have to code all of the fields by hand:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX)
select @cols = stuff((select ','+quotename(C.name)
from sys.columns as C
where C.object_id = object_id('test') and
C.name like 'col%'
for xml path('')), 1, 1, '')
set @query = 'SELECT QualityControlNumber, replace(col, ''col'', '''') as col, value
from test
unpivot
(
value
for col in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo
Using dynamic SQL for this will get the list of fields that you want to transpose when the query is executed. This also prevents you having to manually code for 192 separate columns.
Although your question is not entirely clear, it sounds like you need some form of operation similar to UNPIVOT. This lets you rotate columns into rows. I'd suggest reading up on it and seeing if it will work for you -- even if it doesn't, it might suggest an approach that would work.
The command that you want is unpivot. Something like:
select Date, ReadTime, QualityControlNumber, Reading
from t
unpivot (Reading for date in (day1, . . . , dayn)) as unpvt
You will find that you won't really get the date, but instead the original column name. You can fix this by putting the unpivot query in a subquery (or CTE) and then using string manipulations and cast to convert the column name to a date.
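Because the question's columns come in pairs (a QC column and an Interval column per time slice), another option is CROSS APPLY with a VALUES list, which can emit both values of a pair on the same output row. This is a sketch under assumptions: dbo.DailyReadings is a hypothetical name for the source table, the column names follow the QC#n / Interval#n pattern from the question's code, and only the first three of the 96 pairs are written out.

```sql
-- dbo.DailyReadings is a hypothetical table name used for illustration.
SELECT d.[Number],
       d.[Start Date]        AS ReadDate,
       v.TimeInterval,
       v.QCReading,
       v.IntervalReading,
       d.[Conversion Factor] AS ConversionFactor
FROM dbo.DailyReadings AS d
CROSS APPLY (VALUES
    -- One VALUES row per 15-minute slice; extend the list up to slice 96.
    (1, d.[QC#1], d.[Interval#1]),
    (2, d.[QC#2], d.[Interval#2]),
    (3, d.[QC#3], d.[Interval#3])
) AS v (TimeInterval, QCReading, IntervalReading)
WHERE d.[Number] = '103850581'
  AND d.[Start Date] = '060112';
```

The result can feed the INSERT INTO Data_DATEstr from the question directly, and the VALUES list itself can be generated from sys.columns in the same way the dynamic UNPIVOT answer builds its column list.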

How to select some particular columns from a table if the table has more than 100 columns

I need to select 90 columns out of 107 columns from my table.
Is it possible to write select * except( column1,column2,..) from table, or is there any other way to get specific columns only, or do I need to write all 90 columns in the select statement?
You could generate the column list:
select name + ', '
from sys.columns
where object_id = object_id('YourTable')
and name not in ('column1', 'column2')
It's possible to do this on the fly with dynamic SQL:
declare @columns varchar(max)
select @columns = case when @columns is null then '' else @columns + ', ' end +
quotename(name)
from sys.columns
where object_id = object_id('YourTable')
and name not in ('column1', 'column2')
declare @query varchar(max)
set @query = 'select ' + @columns + ' from YourTable'
exec (@query)
No, there's no way of doing * EXCEPT some columns. SELECT * itself should rarely, if ever, be used outside of EXISTS tests.
If you're using SSMS, you can drag the "columns" folder (under a table) from the Object Explorer into a query window, and it will insert all of the column names (so you can then go through them and remove the 17 you don't want)
There is no way in SQL to do select everything EXCEPT col1, col2 etc.
The only way to do this is to have your application handle this, and generate the sql query dynamically.
You could potentially do some dynamic sql for this, but it seems like overkill. Also it's generally considered poor practice to use SELECT *... much less SELECT * but not col3, col4, col5 since you won't get consistent results in the case of table changes.
Just use SSMS to script out a select statement and delete the columns you don't need. It should be simple.
No - you need to write all the columns you need. You might create a view for that, so your actual statement could use select * (but then you have to list all the columns in the view).
Since you should never be using select *, why is this a problem? Just drag the columns over from the Object Explorer and delete the ones you don't want.