Adding 1 to a TOP select hangs the query - sql

OK, first a disclaimer: I'm using an Entity Attribute Value approach in a couple of my tables. So basically I have a list of attributes in a single column in one table that I want to populate as a single row in a separate view.
I found this solution and it works great:
SQL: Dynamic view with column names based on column values in source table
However, the initial load was extremely slow (it took over 27 minutes to populate 514 rows). Something didn't seem right at all, so I messed around with selecting portions of the Client table using TOP. I got instant results. I found that I could instantly query the entire database this way. However, I found a very weird caveat: the most I could select was 5250 records.
Up to this point I was still getting instant results. If I tried to select 5251, the query hung. I tried it on a test server and got the same limitation, but with a different number (I could select a max of 5321 there). Keep in mind the table has only 514 records, so I have no idea why adding 1 to a TOP select would cause it to hang. Does anyone have any input on this? Here's my working SQL query below:
DECLARE @cols AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(a.AttributeName)
from AttributeCodes a
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT TOP 5250 ClientID, ' + @cols + ' from
(
select c.ClientID
, c.[Value]
, a.AttributeName
from Client c
inner join AttributeCodes a
on a.AttributeCodeId = c.AttributeCodeID
)x
pivot
(
min([Value])
for AttributeName in (' + @cols + ')
)p'
execute(@query)
EDIT:
OK, it seems as though the problem is that the execution plan changes completely when the TOP value is incremented by one. I'll post the two plans below. I still don't know why it changes, or whether there is any way I can prevent it from using a Hash Match instead of a Nested Loops (Inner Join).
Execution Plan 1 (instant):
Execution Plan 2 (30+ minutes):

Without looking at the exact index design and knowing exactly the size and scope of what you're selecting and what's in the database, I can't give you the exact reason.
But what I can tell you is that SQL Server uses cost thresholds to decide whether it's more advantageous to scan a table or perform a large number of seeks. What's happening here is that SQL Server crosses that threshold at the 5250/5251 boundary.
If you were to add many more data rows to your main table, rebuild the statistics, and then re-query, you would likely find that SQL Server would stop scanning and hashing and would return to the lookup-based plan because the proportion of rows it would have to repeatedly seek would once again be lower.
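If you want to test that theory, refreshing the statistics on the two tables from the question is a one-liner each (a minimal sketch; FULLSCAN is optional, but makes the comparison deterministic):

UPDATE STATISTICS dbo.Client WITH FULLSCAN;
UPDATE STATISTICS dbo.AttributeCodes WITH FULLSCAN;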
So, how do you eliminate this problem? I'll start out by saying that EAV can be fine for certain types of designs in certain scenarios, but when you're pulling a lot of data from the system, or you want reports off of these models, EAV becomes problematic, and these are exactly the types of problems you tend to see.
You have a few possible solutions here:
1) Add a LOOP JOIN hint to your join (sketched below). Yuck.
2) Be more selective in what you're asking from the database -- for example, just ask it for the values for a single client at a time.
3) Re-evaluate your reasons for using the EAV model and redesign your data model using conventional techniques.
Again, I don't know much about your reasoning for this design, so if I were in your position and my back were against the wall, I would probably do a combination of 1) and 3). I'd add the hint to "staunch the bleeding" as it were, and then I'd consider re-evaluating the decision to use an entity-attribute-value design altogether.
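For what it's worth, here is a sketch of what option 1) could look like against the dynamic query from the question. The INNER LOOP JOIN hint pins the physical join type to Nested Loops instead of the Hash Match; treat it as a band-aid, not a fix:

-- Same query as in the question, with the join hint added:
set @query = 'SELECT TOP 5251 ClientID, ' + @cols + ' from
(
select c.ClientID
, c.[Value]
, a.AttributeName
from Client c
inner loop join AttributeCodes a
on a.AttributeCodeId = c.AttributeCodeID
)x
pivot
(
min([Value])
for AttributeName in (' + @cols + ')
)p'
execute(@query)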

The following is a formatted comment, not an answer.
Out of morbid curiosity, I've been fiddling about with the following test code. I realize that it does not have the proper code for checking for the existence of the test tables, that the edge cases for the random numbers are undoubtedly wrong, ... . So it goes.
The intent is to create considerably more test data and larger results than those described in the question. With the parameters shown, it requires roughly 306 seconds to initialize and 87 seconds to run the dynamic query on my notebook. (Windows 7 Professional 64-bit, 16GB memory, SQL Server 2008 R2 64-bit.) I've seen no indication of any difficulties. Daniel, do you see any obvious differences, e.g. a different data type for the Value column that might be larger? Version of SQL Server? Available memory? Did I completely muck up your EAV representation?
-- Test parameters.
declare @NumberOfAttributes as Int = 1000
declare @NumberOfClients as Int = 1000
declare @NumberOfSampleRows as Int = 1000000
declare @NumberOfTopRows as Int = 10000
declare @Start as DateTime = GetDate()
-- Houseclean any prior data.
if Object_Id( 'AttributeCodes' ) is not NULL
drop table AttributeCodes
if Object_Id( 'Client' ) is not NULL
drop table Client
-- Create the tables.
create table AttributeCodes (
AttributeCodeId Int Identity(1,1) not NULL,
AttributeName VarChar(64) not NULL )
create table Client (
ClientId Int not NULL,
AttributeCodeId Int not NULL,
[Value] VarChar(64) not NULL )
set nocount on
-- Generate some sample attributes.
declare @Count as Int
set @Count = @NumberOfAttributes
while ( @Count > 0 )
begin
insert into AttributeCodes ( AttributeName ) values
( 'Attr_' + Right( '000' + Cast( @Count as VarChar(8) ), 4 ) )
set @Count = @Count - 1
end
-- Generate some sample client data.
declare @ClientId as Int
declare @AttributeCodeId as Int
set @Count = @NumberOfSampleRows
while ( @Count > 0 )
begin
set @ClientId = 1 + Cast( Rand() * @NumberOfClients as Int )
set @AttributeCodeId = 1 + Cast( Rand() * @NumberOfAttributes as Int )
insert into Client ( ClientId, AttributeCodeId, Value ) values
( @ClientId, @AttributeCodeId, Replicate( 'i', Cast( Rand() * 64 as Int ) ) )
set @Count = @Count - 1
end
-- Build the list of columns.
declare @Cols as NVarChar(Max)
select @Cols = Stuff(
( select ',' + QuoteName( AttributeName ) from AttributeCodes order by AttributeName for XML path(''), type).value( '.[1]', 'NVarChar(max)' ), 1, 1, '' );
-- Build and execute the summary query.
declare @Query as NVarChar(Max)
set @Query = 'select top (' + Cast( @NumberOfTopRows as VarChar(8) ) + ') ClientId, ' + @Cols +
' from ( select C.ClientId, C.[Value], A.AttributeName from Client as C inner join AttributeCodes as A on A.AttributeCodeId = C.AttributeCodeId ) as X' +
' pivot ( Min( [Value] ) for AttributeName in (' + @Cols + ') ) as P'
declare @InitializationComplete as DateTime = GetDate()
execute( @Query )
select DateDiff( ms, @Start, @InitializationComplete ) as 'Initialization (ms)',
DateDiff( ms, @InitializationComplete, GetDate() ) as 'Query (ms)',
DateDiff( mi, @Start, GetDate() ) as 'Total (min)'

Related

How to write Select query for EAV - Entity Attribute Value model

I have the following schema, in which:
Types represents a DB table
TypeProperty represents a column of a table
TypeRow represents a row of a table
I want to write a select query to which I will pass a single Type, and it should give me all of its TypeProperty, TypeRow, and TypeValue entries associated with those properties and rows.
I will be showing this data in a web application: the user will select a Type from a dropdown, and the application will fetch the properties, rows, and associated values and show them as a complete grid.
I am using SQL Server 2014.
Can anyone help me, please?
So, I'm going to try to take a crack at what you have been getting help from Kannan on.
It sounds like you need two different queries against the database:
1) Query the list of 'Type(s)' for your dropdown (you should be able to do this fairly easily).
2) Query the list of 'Propert(ies)', 'Row(s)', and 'Value(s)' that match the selected 'Type' in the dropdown, returned as a table with the properties as the header.
To me it seems the easiest and best way to handle this would be to get the data back using Kannan's script (probably inside a stored procedure, and maybe a view?) and build the grid in code from your back-end application or front-end client. However, if you can't, here is a script that should work, or at the least get you started.
I would suggest creating two stored procs: one to retrieve the data, and another to pivot it using dynamic SQL.
CREATE PROCEDURE dbo.EAV_GridGenerator
@TypeId int = 0,
@param2 int
AS
BEGIN TRY
DECLARE @cols varchar(max),
@query varchar(max);
--TODO: CLEAN UP VARIABLE NAMES THROUGHOUT
SELECT trow.TypesId, tprop.PropertyName AS [Column], trow.TypeRowId AS [RowID], tval.Value AS [Data]
INTO #TT2
FROM dbo.[Types] AS t
JOIN dbo.TypeRow trow
ON t.TypesId = trow.TypesId
JOIN dbo.TypeValue tval
ON tval.TypeRowsId = trow.TypeRowId
JOIN dbo.[TypeProperty] tprop
ON tval.TypesPropertyId = tprop.TypePropertyId
WHERE trow.TypesId = @TypeId
--AND t.IsActive = 1 AND tprop.IsActive = 1 AND trow.IsActive = 1 AND tval.IsActive = 1 --TODO: IDK but you should probably add both of these
--AND t.IsDelete = 1 AND tprop.IsDelete = 1 AND trow.IsDelete = 1 AND tval.IsDelete = 1 --TODO: IDK but you should probably add both of these
ORDER BY RowID, [Column], Data
SELECT @cols = STUFF(( SELECT DISTINCT
'],[' + t.[Column]
FROM #TT2 AS t
FOR XML PATH('')
), 1, 2, '') + ']'
SET @query = N'SELECT RowID, ' + @cols + ' FROM
(SELECT tt2.RowID, tt2.[Column], tt2.Data FROM #TT2 AS tt2) p
PIVOT (MAX([Data]) FOR [Column] IN (' + @cols + '))
AS pvt;'
EXECUTE(@query)
DROP TABLE #TT2
END TRY
BEGIN CATCH
--TODO: PROPER CATCH; for now just re-raise the error
THROW;
END CATCH
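For reference, a hedged usage sketch (the @TypeId value is hypothetical, and @param2 is unused by the procedure as written):

EXEC dbo.EAV_GridGenerator @TypeId = 1, @param2 = 0;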
A simple join will work. Are you looking for this?
Select * --your required columns
from Types t
inner join TypesProperty tp
on t.TypesId = tp.TypesId
inner join TypeRow tr
on t.TypesId = tr.TypesId
left join TypeValue tv
on tp.TypesPropertyId = tv.TypesPropertyId
--You need to join TypeValue on TypeRowId as well if you require value details (see the sketch below)
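Building on that comment, here is a hedged sketch of the fuller join that ties each value to both its property and its row; the column names are assumed from the snippets above and may need adjusting to the actual schema:

DECLARE @TypeId int = 1 -- hypothetical Type selected in the dropdown

SELECT tr.TypeRowId, tp.PropertyName, tv.[Value]
FROM Types t
INNER JOIN TypeProperty tp ON t.TypesId = tp.TypesId
INNER JOIN TypeRow tr ON t.TypesId = tr.TypesId
LEFT JOIN TypeValue tv ON tv.TypesPropertyId = tp.TypePropertyId
AND tv.TypeRowsId = tr.TypeRowId
WHERE t.TypesId = @TypeId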

How to make efficient pagination with total count

We have a web application which helps organize biological experiments (users describe an experiment and upload experiment data). On the main page, we show the first 10 experiments, and below them: Previous Next 1 2 3 .. 30.
It bugs me how to make the total count and pagination efficient. Currently:
select count(id) from experiments; // not very efficient on large datasets
But how does this scale when dealing with large record counts, say > 200,000? I tried importing random experiments into the table, and it still performs quite OK (0.6 s for 300,000 experiments).
The other alternative I thought about is to add an additional statistics table (columns: tableName, recordsCount). After each insert into the experiments table I would increment recordsCount in statistics (this means inserting into one table and updating the other, inside a SQL transaction of course). Vice versa for deletes (recordsCount--).
For pagination, the most efficient way is where id > last_id, since SQL uses the index, as sketched below. Is there any other better way?
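For reference, the id > last_id approach is plain keyset pagination. A minimal sketch in T-SQL (assuming an indexed, monotonically increasing id; the page size and value are hypothetical):

DECLARE @last_id INT = 20; -- id of the last row on the previous page

SELECT TOP (10) id, name
FROM experiments
WHERE id > @last_id
ORDER BY id; -- an index on id satisfies both the filter and the sort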
In case results are to be filtered, e.g. select * from experiments where name like 'name%', the statistics-table option fails. We would need to get the total count as: select count(id) from experiments where name like 'name%'.
The application was developed using Laravel 3, in case it makes any difference.
I would like to develop pagination that always performs the same: the number of records must not affect pagination speed nor the total count.
Try a query like the one below:
CREATE PROCEDURE [GetUsers]
(
@Inactive Bit = NULL,
@Name NVarChar(500),
@Culture VarChar(5) = NULL,
@SortExpression VarChar(50),
@StartRowIndex Int,
@MaxRowIndex Int,
@Count INT OUTPUT
)
AS
BEGIN
SELECT ROW_NUMBER()
OVER
(
ORDER BY
CASE WHEN @SortExpression = 'Name' THEN [User].[Name] END,
CASE WHEN @SortExpression = 'Name DESC' THEN [User].[Name] END DESC
) AS RowIndex, [User].*
INTO #tmpTable
FROM [User] WITH (NOLOCK)
WHERE (@Inactive IS NULL OR [User].[Inactive] = @Inactive)
AND (@Culture IS NULL OR [User].[DefaultCulture] = @Culture)
AND [User].Name LIKE '%' + @Name + '%'
SELECT *
FROM #tmpTable
WHERE #tmpTable.RowIndex > @StartRowIndex
AND #tmpTable.RowIndex < (@StartRowIndex + @MaxRowIndex + 1)
SELECT @Count = COUNT(*) FROM #tmpTable
IF OBJECT_ID('tempdb..#tmpTable') IS NOT NULL DROP TABLE #tmpTable;
END
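A hedged usage sketch (parameter values are hypothetical):

DECLARE @Total INT;

EXEC [GetUsers]
@Inactive = NULL,
@Name = N'smith',
@Culture = NULL,
@SortExpression = 'Name',
@StartRowIndex = 0,
@MaxRowIndex = 10,
@Count = @Total OUTPUT;

SELECT @Total AS TotalRows;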

Dynamic SP returning values in reverse order

I am using MS SQL and created a dynamic stored procedure:
ALTER Procedure [dbo].[sp_MTracking]
(
@OList varchar(MAX)
)
As
BEGIN TRY
SET NOCOUNT ON
DECLARE @SQL varchar(600)
SET @SQL = 'select os.X,os.Y from Table1 as os join Table2 as s on os.sID=s.sID where s.SCode IN ('+ @OList +')'
exec (@SQL)
END TRY
BEGIN CATCH
Execute sp_DB_ErrorInfo
Select -1 Result
END CATCH
GO
It is working properly, but I am getting the x,y values in reverse order.
For example, if I pass 'scode1,scode2' as the parameter, I get the x,y values for scode1 in the 2nd row and the x,y values for scode2 in the first row.
How can I fix this issue?
Thanks
This is a bit long for a comment.
SQL tables and result sets represent unordered sets. There is no ordering unless you explicitly use an ORDER BY clause.
Your query does not have an ORDER BY, hence you have no reason to expect the results in any particular order. In addition, the ordering may differ between runs of the query. If you want the results in a particular order, add ORDER BY.
Probably the easiest way is to use charindex():
order by charindex(',' + s.SCode + ',', ',' + @OList + ',')
This is a bit more cumbersome in dynamic SQL:
SET @SQL = '
select os.X, os.Y
from Table1 os join
Table2 s
on os.sID = s.sID
where s.SCode IN (' + @OList + ')
order by charindex('','' + s.SCode + '','', '',' + @OList + ','')
';
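To see why the wrapping commas matter, here is roughly what that ORDER BY evaluates to for a hypothetical two-code list:

-- With @OList = 'scode1,scode2', the second argument becomes ',scode1,scode2,':
--   charindex(',scode1,', ',scode1,scode2,') = 1   -- scode1 rows sort first
--   charindex(',scode2,', ',scode1,scode2,') = 8   -- scode2 rows sort second
-- Without the wrapping commas, a code like 'scode1' would also match inside 'scode10'.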
Well, there are a couple of things here.
The first thing is what Gordon wrote - to ensure the order of the result set you must use the order by clause.
Second, as Devart demonstrated in his answer, you don't need dynamic SQL for this kind of procedure.
Third, if you want your results ordered by the order of the parameters in the list, you should use a slightly different approach than the one Devart wrote.
Therefore, here are my 2 cents:
If you can change the stored procedure to accept a table-valued parameter instead of VARCHAR(MAX), that would be your best option IMHO.
If not, you must use a split function to create a table from that varchar and then use that table in your select.
Note that you will have to choose a split function that returns a table with two columns: one for the value and one for its position in the original string.
Whatever the case may be, the rest of the sql should be something like this:
SELECT os.X, os.Y
FROM Table1 os
INNER JOIN Table2 s ON os.[sID] = s.[sID]
INNER JOIN @TVP t ON s.SCode = t.Value
ORDER BY t.Sort
That's assuming @TVP is a table containing a Value column of the same data type as SCode in Table2, and a Sort column (an int, naturally).
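For illustration, a sketch of what the table-valued-parameter route might look like; the type name and the VARCHAR(50) data type are assumptions, not taken from the question:

-- One-time setup: a table type carrying each code and its position.
CREATE TYPE dbo.CodeList AS TABLE
(
Sort  INT         NOT NULL,  -- position in the original list
Value VARCHAR(50) NOT NULL   -- assumed to match SCode's type
);

-- Caller side: populate the TVP in the desired order and pass it in.
DECLARE @TVP dbo.CodeList;
INSERT @TVP (Sort, Value) VALUES (1, 'scode1'), (2, 'scode2');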
Without dynamic SQL -
ALTER PROCEDURE [dbo].[sp_MTracking]
(
@OList VARCHAR(MAX)
)
AS BEGIN
SET NOCOUNT ON
DECLARE @t TABLE (val VARCHAR(50) PRIMARY KEY WITH(IGNORE_DUP_KEY=ON))
INSERT INTO @t
SELECT item = t.c.value('.', 'VARCHAR(50)') -- extract as text; the codes are strings like 'scode1'
FROM (
SELECT txml = CAST('<r>' + REPLACE(@OList, ',', '</r><r>') + '</r>' AS XML)
) r
CROSS APPLY txml.nodes('/r') t(c)
SELECT os.X, os.Y
FROM Table1 os
JOIN Table2 s ON os.[sID] = s.[sID]
WHERE s.SCode IN (SELECT * FROM @t)
--OPTION(RECOMPILE)
END
GO
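A usage sketch (assuming Table1 and Table2 exist and are populated):

EXEC [dbo].[sp_MTracking] @OList = 'scode1,scode2'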

Pass EXEC statement to APPLY as a parameter

I need to grab data from multiple databases which have tables with the same schema. For this I created synonyms for these tables in one of the databases. The number of databases will grow with time, so the procedure which grabs the data should be flexible. I wrote the following code snippet to resolve the problem:
WHILE @i < @count
BEGIN
SELECT @synonymName = [Name]
FROM Synonyms
WHERE [ID] = @i
SELECT @sql = 'SELECT TOP (1) *
FROM [dbo].[synonym' + @synonymName + '] as syn
WHERE [syn].[Id] = tr.[Id]
ORDER BY [syn].[System.ChangedDate] DESC'
INSERT INTO #tmp
SELECT col1, col2
FROM
(
SELECT * FROM TableThatHasRelatedDataFromAllTheSynonyms
WHERE [Date] > @dateFrom
) AS tr
OUTER APPLY (EXEC(@sql)) result -- this APPLY over EXEC is the part that does not compile
SET @i = @i + 1
END
I would also appreciate any ideas on how to simplify the solution.
Actually, it would be better to import the data from all the tables into one table (maybe with an additional column for the source table name) and use that. The import can be performed through an SP or an SSIS package.
Regarding the initial question - in principle you could achieve it through a TVF wrapper for the exec statement (with exec .. into inside it).
UPD: As noticed in the comments, exec doesn't work inside a TVF. So, if you really don't want to change the DB structure and you need to use a lot of tables, I suggest you either:
- select all the data from the synonym*** table into variables (as I see you select only one row) and use them,
- or prepare dynamic SQL for the complete statement (with the insert, etc.) and use a temporary table instead of a table variable there.
My solution is quite simple: just put the whole query into a string and exec it. Unfortunately, it runs about three times slower than simply copy/pasting the code for all the synonyms.
WHILE @i < @count
BEGIN
SELECT @synonymName = [Name]
FROM Synonyms
WHERE [ID] = @i
SELECT @sql = 'SELECT col1, col2
FROM
(
SELECT * FROM TableThatHasRelatedDataFromAllTheSynonyms
WHERE [Date] > ''' + @dateFrom + '''
) AS tr
OUTER APPLY (SELECT TOP (1) *
FROM [dbo].[synonym' + @synonymName + '] as syn
WHERE [syn].[Id] = tr.[Id]
ORDER BY [syn].[System.ChangedDate] DESC) result'
INSERT INTO #tmp
EXEC(@sql)
SET @i = @i + 1
END

How to use PIVOT in SQL Server 2005 Stored Procedure Joining Two Views

Good Morning,
I have 2 views: ICCUDays, which contains one record per account with fields ACCOUNT and ICCUDays; and ICCUEnctrSelectedRevCatsDirCost, which contains multiple records per account with fields ACCOUNT, UBCATEGORY, and DirectCost.
My goal: to create a stored procedure that outputs one record per ACCOUNT with ICCUDays and DirectCost by UBCATEGORY. This will be a crosstab or pivot, and it has to allow for the possibility of nulls in one or more DirectCost UBCATEGORY buckets. Finally, this crosstab or pivot needs to be written to a new table, EnctrUBCatPivot.
Questions: What is the correct PIVOT syntax for the above scenario? Given that I want to output DirectCost for however many UBCATEGORY entries exist, how do I write the T-SQL to iterate over these and pivot by account and UBCATEGORY? Can all of this be accomplished in one sproc, or does it have to be separated into multiple sprocs to write the results out to a table?
Here's the code I've written so far:
ALTER PROCEDURE [dbo].[spICCUMain]
-- Add the parameters for the stored procedure here
AS
declare @columns varchar(8000)
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Insert statements for procedure here
SELECT @columns = COALESCE(@columns + ',[' + cast(UBCATEGORYmid as varchar) + ']','[' + cast(UBCATEGORYmid as varchar)+ ']')
FROM vwICCUEnctrSelectedRevCatsDirCost
GROUP BY UBCATEGORYmid
DECLARE @query VARCHAR(8000)
SET @query = '
SELECT *
FROM vwICCUEnctrSelectedRevCatsDirCost
PIVOT
(
MAX(DirectCost)
FOR [UBCATEGORYmid]
IN (' + @columns + ')
)
AS p'
EXECUTE(@query)
END
This works fine in that it outputs Account and all the direct costs for each UBCATEGORY. However, I need to inner join to vwICCUDays on ACCOUNT to add a column to the pivot for ICCUDays. The final pivot columns should be Account, ICCUDays, and DirectCost for each UBCATEGORYmid.
I'm not very familiar with the COALESCE syntax, so I cannot discern how to modify it to add further columns, nor am I sure how/where to add the inner join for ICCUDays.
Can someone point me in the proper direction?
Thanks,
Sid
You need to know all of the possible values to PIVOT by. So it is difficult to do this with T-SQL directly unless you use dynamic SQL and this can get hairy pretty quickly. Probably better to pass all of the rows back to the presentation tier or report writer and let it turn them sideways.
Here is a quick PIVOT example if you know all of the UBCategory values in advance. I left out ICCUDays since it seems rather irrelevant unless there are columns that come from that view as part of the result.
USE tempdb;
GO
SET NOCOUNT ON;
GO
-- who on earth is responsible for your naming scheme?
CREATE TABLE dbo.ICCUEnctrSelectedRevCatsDirCost
(
Account INT,
UBCategory VARCHAR(10),
DirectCost DECIMAL(9,2)
);
INSERT dbo.ICCUEnctrSelectedRevCatsDirCost
SELECT 1, 'foo', 5.25
UNION SELECT 1, 'bar', 6.25
UNION SELECT 1, 'smudge', 8.50
UNION SELECT 2, 'foo', 9.25
UNION SELECT 2, 'brap', 2.75;
SELECT Account,[foo],[bar],[smudge],[brap] FROM
dbo.ICCUEnctrSelectedRevCatsDirCost
-- WHERE <something>, I assume ???
PIVOT
(
MAX(DirectCost)
FOR UBCategory IN ([foo],[bar],[smudge],[brap])
) AS p;
GO
DROP TABLE dbo.ICCUEnctrSelectedRevCatsDirCost;
To make this more dynamic, you'd have to get the comma separated list of DISTINCT UBCategory values, and build the pivot on the fly. So it might look like this:
USE tempdb;
GO
SET NOCOUNT ON;
GO
-- who on earth is responsible for your naming scheme?
CREATE TABLE dbo.ICCUEnctrSelectedRevCatsDirCost
(
Account INT,
UBCategory VARCHAR(10),
DirectCost DECIMAL(9,2)
);
INSERT dbo.ICCUEnctrSelectedRevCatsDirCost
SELECT 1, 'foo', 5.25
UNION SELECT 1, 'bar', 6.25
UNION SELECT 1, 'smudge', 8.50
UNION SELECT 2, 'foo', 9.25
UNION SELECT 2, 'brap', 2.75
UNION SELECT 3, 'bingo', 4.00;
DECLARE @sql NVARCHAR(MAX),
@col NVARCHAR(MAX);
SELECT @col = COALESCE(@col, '') + QUOTENAME(UBCategory) + ','
FROM
(
SELECT DISTINCT UBCategory
FROM dbo.ICCUEnctrSelectedRevCatsDirCost
) AS x;
SET @col = LEFT(@col, LEN(@col)-1);
SET @sql = N'SELECT Account, $col$ FROM
dbo.ICCUEnctrSelectedRevCatsDirCost
-- WHERE <something>, I assume ???
PIVOT
(
MAX(DirectCost)
FOR UBCategory IN ($col$)
) AS p;';
SET @sql = REPLACE(@sql, '$col$', @col);
--EXEC sp_executesql @sql;
PRINT @sql;
GO
DROP TABLE dbo.ICCUEnctrSelectedRevCatsDirCost;
Then to "send the data to a new table" you can just make the query an INSERT INTO ... SELECT instead of a straight SELECT. Of course, this seems kind of useless, because in order to write that insert statement, you need to know the order of the columns (which isn't guaranteed with this approach) and you need to have already put in columns for each potential UBCategory value anyway, so this seems very chicken and egg.