more efficiently pivot rows - sql

I am trying to join multiple tables together. One of the tables I am trying to join has hundreds of rows per ID of data. I am trying to pivot about 100 rows for each ID into columns. The value I am trying to use isn't always in the same row. Below is an example (my real table has hundreds of rows per ID). AccNum for example in ID 1 may be in the NumV column, but for ID 2 it may be in the CharV column.
ID QType CharV NumV
1 AccNum 10
1 EmpNam John Inc 0
1 UW Josh 0
2 AccNum 11
2 EmpNam CBS 0
2 UW Dan 0
The original code I used was a select statement with hundreds of lines like one below:
Max(Case When PM.[QType] = 'AccNum' Then NumV End) as AccNum
This code with hundreds on lines completed in just under 10 min. The problem however is that in only pulls in values from the column I specify, so I will always loss the data that is in a different column. (In the example above I would get AccNum 10, but not AccNum11 because it's in the CharV column).
I updated the code to use a pivot:
;with CTE
As
(
Select [PMID], [QType],
Value=concat(Nullif([CharV],''''),Nullif([NumV],0))
From [DBase].[dbo].[PM]
)
Select C.[ID] AS M_ID
,Max(c.[AccNum]) As AcctNum
,Max(c.[EmpNam]) As EmpName
and so on...
I then select all of my hundreds of rows and then pivot it the data:
from CTE
pivot (max(Value) for [QType] in ([AccNum],[EmpNam],(more rows)))As c
The problem with this code, however, is that it takes almost 2 hours to run.
Is there a different, more efficient solution to what I am trying to accomplish? I need to have the speed of the first code, but the result of the second.

Perhaps you can reduce the Concat/NullIf processing by using a UNION ALL
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
For the conditional aggregation approach
No need to worry about which field, just reference VALUE
Select [ID]
,[Accnum] = Max(Case When [QType] = 'AccNum' Then Value End)
,[EmpNam] = Max(Case When [QType] = 'EmpNam' Then Value End)
,[UW] = Max(Case When [QType] = 'UW' Then Value End)
From (
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
) A
Group By ID
For the PIVOT approach
Select [ID],[AccNum],[EmpNam],[UW]
From (
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
) A
Pivot (max([Value]) For [QType] in ([AccNum],[EmpNam],[UW])) p

Related

Split column value into separate columns based on length

I have multiple comma-separated values in one column with a size up to 20000 characters, and I want to split that column into one column but its based on character values 2000 (like into one new column it will take 2000 character and if length is grater than 2000 then its will be in second column like this).
When it's comma-separated value goes into first new column, then it should be meaningful like it should be based on , and up to 2000 characters only like this.
I have done from row level value to column level only but its should be 2000 character and based on comma
Could you please help me with this ?
siddesh, although this question lacks of everything I want to point some things out and help you (as you are an unexperienced SO-user):
First I set up a minimal reproducible exampel. This is on you the next time.
I'll start with a declared table with some rows inserted.
We on SO can copy'n'paste this into our environment which makes it easy to answer.
DECLARE #tbl TABLE(ID INT IDENTITY, YourCSVString VARCHAR(MAX));
INSERT INTO #tbl VALUES('1 this is long text, 2 some second fragment, 3 third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ')
,('1 this is other long text, 2 some second fragment to show that this works with tabular data, 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ');
--This is, what you actually need:
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength);
The result should be stored within a physical table, where the fkID points to the ID of the original row and the fragmentPosition stores the order. fkID and fragmentPosition should be a combined unique key.
If you really want to do, what you are suggesting in your question (not recommended!) you can try something along this:
DECLARE #maxPerColumn INT=75; --You can set the portion's max size, in your case 2000.
WITH cte AS
(
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength)
)
,recCTE AS
(
SELECT *
,countPerColumn = 1
,columnCounter = 1
,sumLength = LEN(fragmentContent)
,growingString = CAST(fragmentContent AS NVARCHAR(MAX))
FROM cte WHERE fragmentPosition=0
UNION ALL
SELECT r.fkID
,cte.fragmentPosition
,cte.fragmentContent
,cte.framgentLength
,CASE WHEN A.newSumLength>#maxPerColumn THEN 1 ELSE r.countPerColumn + 1 END
,r.columnCounter + CASE WHEN A.newSumLength>#maxPerColumn THEN 1 ELSE 0 END
,CASE WHEN A.newSumLength>#maxPerColumn THEN LEN(cte.fragmentContent) ELSE newSumLength END
,CASE WHEN A.newSumLength>#maxPerColumn THEN cte.fragmentContent ELSE CONCAT(r.growingString,N', ',cte.fragmentContent) END
FROM cte
INNER JOIN recCTE r ON r.fkID=cte.fkID AND r.fragmentPosition+1=cte.fragmentPosition
CROSS APPLY(VALUES(r.sumLength+LEN(cte.fragmentContent))) A(newSumLength)
)
SELECT TOP 1 WITH TIES
fkID
,growingString
,LEN(growingString)
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC );
The result
fkID pos Content
1 2 1 this is long text, 2 some second fragment, 3 third fragment
1 4 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf
2 0 1 this is other long text
2 1 2 some second fragment to show that this works with tabular data
2 3 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k
2 4 5 halksjfh asdkf
The idea in short:
The first cte does the splitting (as above)
The recursive cte will iterate down the string and do the magic.
The final SELECT uses a hack with TOP 1 WITH TIES together with an ORDER BY ROW_NUMBER() OVER(...). This will return the highest intermediate result only.
Hint: Don't do this...
UPDATE
Just for fun:
You can replace the final SELECT with this
,getPortions AS
(
SELECT TOP 1 WITH TIES
fkID
,fragmentPosition
,growingString
,LEN(growingString) portionLength
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC )
)
SELECT p.*
FROM
(
SELECT fkID
,CONCAT(N'col',ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY fragmentPosition)) AS ColumnName
,growingString
FROM getPortions
) t
PIVOT(MAX(growingString) FOR ColumnName IN(col1,col2,col3,col4,col5)) p;
This will return exactly what you are asking for.
But - as said before - this is against all rules of best practice...

Separate columns for product counts using CTEs

Asking a question again as my post did not follow community rules.
I first tried to write a PIVOT statement to get the desired output. However, I am now trying to approach this using CTEs.
Here's the raw data. Let's call it ProductMaster:
PRODUCT_NUM
CO_CD
PROD_CD
MASTER_ID
Date
ROW_NUM
1854
MAWC
STATIONERY
10003493039
1/1/2021
1
1567
PREF
PRINTER
10003493039
2/1/2021
2
2151
MAWC
STATIONERY
10003497290
3/2/2021
1
I require the Count of each product for every Household from this data in separate columns, Printer_CT, Stationery_Ct
Each Master_ID represents a household. And a household can have multiple products.
So each household represents one row in my final output and I need the Product Counts in separate columns. There can be multiple products in each household, 4 or even more. But I have simplified this example.
I'm writing a query with CTEs to give me the output that I want. In my output, each row is grouped by Master ID
ORGL_CO_CD
ORGL_PROD_CD
STATIONERY_CT
PRINTER_CT
MAWC
STATIONERY
1
1
MAWC
STATIONERY
1
0
Here's my query. I'm not sure where to introduce Column 'Stationery_Ct'
WITH CTE AS
(
SELECT
CO_CD, Prod_CD, MASTER_ID,
'' as S1_CT, '' as P1_CT
FROM
ProductMaster
WHERE
ROW_NUM = 1
), CTE_2 AS
(
SELECT Prod_CD, MASTER_ID
FROM ProductMaster
WHERE ROW_NUM = 2
)
SELECT
CO_CD AS ORGL_CO_CD,
c.Prod_CD AS ORGL_PROD_CD,
(CASE WHEN c2.Prod_CD = ‘PRINTER’ THEN P1_CT = 1 END) PRINTER_CT
FROM
CTE AS c
LEFT OUTER JOIN
CTE_2 AS c2 ON c.MASTER_ID = c2.MASTER_ID
Any pointers would be appreciated.
Thank you!
I guess you can solve that using just GROUP BY and SUM:
-- Test data
DECLARE #ProductMaster AS TABLE (PRODUCT_NUM INT, CO_CD VARCHAR(30), PROD_CD VARCHAR(30), MASTER_ID BIGINT)
INSERT #ProductMaster VALUES (1854, 'MAWC', 'STATIONERY', 10003493039)
INSERT #ProductMaster VALUES (1567, 'PREF', 'PRINTER', 10003493039)
INSERT #ProductMaster VALUES (2151, 'MAWC', 'STATIONERY', 10003497290)
SELECT
MASTER_ID,
SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
FROM #ProductMaster
GROUP BY MASTER_ID
The result is:
MASTER_ID
STATIONERY_CT
PRINTER_CT
10003493039
1
1
10003497290
1
0

SQL. how to compare values and from two table, and report per-row results

I have two Tables.
table A
id name Size
===================
1 Apple 7
2 Orange 15
3 Banana 22
4 Kiwi 2
5 Melon 28
6 Peach 9
And Table B
id size
==============
1 14
2 5
3 31
4 9
5 1
6 16
7 7
8 25
My desired result will be (add one column to Table A, which is the number of rows in Table B that have size smaller than Size in Table A)
id name Size Num.smaller.in.B
==============================
1 Apple 7 2
2 Orange 15 5
3 Banana 22 6
4 Kiwi 2 1
5 Melon 28 7
6 Peach 9 3
Both Table A and B are pretty huge. Is there a clever way of doing this. Thanks
Use this query it's helpful
SELECT id,
name,
Size,
(Select count(*) From TableB Where TableB.size<Size)
FROM TableA
The standard way to get your result involves a non-equi-join, which will be a product join in Explain. First duplicating 20,000 rows, followed by 7,000,000 * 20,000 comparisons and a huge intermediate spool before the count.
There's a solution based on OLAP-functions which is usually quite efficient:
SELECT dt.*,
-- Do a cumulative count of the rows of table #2
-- sorted by size, i.e. count number of rows with a size #2 less size #1
Sum(CASE WHEN NAME = '' THEN 1 ELSE 0 end)
Over (ORDER BY SIZE, NAME DESC ROWS Unbounded Preceding)
FROM
( -- mix the rows of both tables, an empty name indicates rows from table #2
SELECT id, name, size
FROM a
UNION ALL
SELECT id, '', size
FROM b
) AS dt
-- only return the rows of table #1
QUALIFY name <> ''
If there are multiple rows with the same size in table #2 you better count before the Union to reduce the size:
SELECT dt.*,
-- Do a cumulative sum of the counts of table #2
-- sorted by size, i.e. count number of rows with a size #2 less size #1
Sum(CASE WHEN NAME ='' THEN id ELSE 0 end)
Over (ORDER BY SIZE, NAME DESC ROWS Unbounded Preceding)
FROM
( -- mix the rows of both tables, an empty name indicates rows from table #2
SELECT id, name, size
FROM a
UNION ALL
SELECT Count(*), '', SIZE
FROM b
GROUP BY SIZE
) AS dt
-- only return the rows of table #1
QUALIFY NAME <> ''
There is no clever way of doing that, you just need to join the tables like this:
select a.*, b.size
from TableA a join TableB b on a.id = b.id
To improve performance you'll need to have indexes on the id columns.
maybe
select
id,
name,
a.Size,
sum(cnt) as sum_cnt
from
a inner join
(select size, count(*) as cnt from b group by size) s on
s.size < a.size
group by id,name,a.size
if you're working with large tables. Indexing table b's size field could help. I'm also assuming the values in table B converge, that there's many duplicates you don't care about, other than you want to count them.
sqlfiddle
#Ritesh solution is perfectly correct, another similar solution is using CROSS JOIN as shown below
use tempdb
create table dbo.A (id int identity, name varchar(30), size int );
create table dbo.B (id int identity, size int);
go
insert into dbo.A (name, size)
values ('Apple', 7)
,('Orange', 15)
,('Banana', 22)
,('Kiwi', 2 )
,('Melon', 28)
,('Peach', 6 )
insert into dbo.B (size)
values (14), (5),(31),(9),(1),(16), (7),(25)
go
-- using cross join
select a.*, t.cnt
from dbo.A
cross apply (select cnt=count(*) from dbo.B where B.size < A.size) T(cnt)
try this query
SELECT
A.id,A.name,A.size,Count(B.size)
from A,B
where A.size>B.size
group by A.size
order by A.id;

Generate SQL rows

Given a number of types and a number of occurrences per type, I would like to generate something like this in T-SQL:
Occurrence | Type
-----------------
0 | A
1 | A
0 | B
1 | B
2 | B
Both the number of types and the number of occurrences per type are presented as values in different tables.
While I can do this with WHILE loops, I'm looking for a better solution.
Thanks!
This works with a number-table which i would use.
SELECT Occurrence = ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Type) - 1
, Type
FROM Numbers num
INNER JOIN #temp1 t
ON num.n BETWEEN 1 AND t.Occurrence
Tested with this sample data:
create table #temp1(Type varchar(10),Occurrence int)
insert into #temp1 VALUES('A',2)
insert into #temp1 VALUES('B',3)
How to create a number-table? http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
If you have a table with the columns type and num, you have two approaches. One way is to use recursive CTEs:
with CTE as (
select type, 0 as occurrence, num
from table t
union all
select type, 1 + occurrence, num
from cte
where occurrence + 1 < num
)
select cte.*
from cte;
You may have to set the MAXRECURSION option, if the number exceeds 100.
The other way is to join in a numbers table. SQL Server uses spt_values for this purpose:
select s.number - 1 as occurrence, t.type
from table t join
spt_values s
on s.number <= t.num ;

How do I aggregate numbers from a string column in SQL

I am dealing with a poorly designed database column which has values like this
ID cid Score
1 1 3 out of 3
2 1 1 out of 5
3 2 3 out of 6
4 3 7 out of 10
I want the aggregate sum and percentage of Score column grouped on cid like this
cid sum percentage
1 4 out of 8 50
2 3 out of 6 50
3 7 out of 10 70
How do I do this?
You can try this way :
select
t.cid
, cast(sum(s.a) as varchar(5)) +
' out of ' +
cast(sum(s.b) as varchar(5)) as sum
, ((cast(sum(s.a) as decimal))/sum(s.b))*100 as percentage
from MyTable t
inner join
(select
id
, cast(substring(score,0,2) as Int) a
, cast(substring(score,charindex('out of', score)+7,len(score)) as int) b
from MyTable
) s on s.id = t.id
group by t.cid
[SQLFiddle Demo]
Redesign the table, but on-the-fly as a CTE. Here's a solution that's not as short as you could make it, but that takes advantage of the handy SQL Server function PARSENAME. You may need to tweak the percentage calculation if you want to truncate rather than round, or if you want it to be a decimal value, not an int.
In this or most any solution, you have to count on the column values for Score to be in the very specific format you show. If you have the slightest doubt, you should run some other checks so you don't miss or misinterpret anything.
with
P(ID, cid, Score2Parse) as (
select
ID,
cid,
replace(Score,space(1),'.')
from scores
),
S(ID,cid,pts,tot) as (
select
ID,
cid,
cast(parsename(Score2Parse,4) as int),
cast(parsename(Score2Parse,1) as int)
from P
)
select
cid, cast(round(100e0*sum(pts)/sum(tot),0) as int) as percentage
from S
group by cid;