Separate columns for product counts using CTEs - sql

I'm asking this question again because my earlier post did not follow the community rules.
I first tried to write a PIVOT statement to get the desired output. However, I am now trying to approach this using CTEs.
Here's the raw data. Let's call it ProductMaster:
PRODUCT_NUM  CO_CD  PROD_CD     MASTER_ID    Date      ROW_NUM
---------------------------------------------------------------
1854         MAWC   STATIONERY  10003493039  1/1/2021  1
1567         PREF   PRINTER     10003493039  2/1/2021  2
2151         MAWC   STATIONERY  10003497290  3/2/2021  1
I need the count of each product for every household from this data, in separate columns: PRINTER_CT and STATIONERY_CT.
Each MASTER_ID represents a household, and a household can have multiple products.
So each household is one row in my final output, and I need the product counts in separate columns. A household can have many products, four or even more, but I have simplified this example.
I'm writing a query with CTEs to produce that output. In my output, each row is grouped by MASTER_ID:
ORGL_CO_CD  ORGL_PROD_CD  STATIONERY_CT  PRINTER_CT
---------------------------------------------------
MAWC        STATIONERY    1              1
MAWC        STATIONERY    1              0
Here's my query. I'm not sure where to introduce the STATIONERY_CT column.
WITH CTE AS
(
SELECT
CO_CD, Prod_CD, MASTER_ID,
'' as S1_CT, '' as P1_CT
FROM
ProductMaster
WHERE
ROW_NUM = 1
), CTE_2 AS
(
SELECT Prod_CD, MASTER_ID
FROM ProductMaster
WHERE ROW_NUM = 2
)
SELECT
CO_CD AS ORGL_CO_CD,
c.Prod_CD AS ORGL_PROD_CD,
(CASE WHEN c2.Prod_CD = 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
FROM
CTE AS c
LEFT OUTER JOIN
CTE_2 AS c2 ON c.MASTER_ID = c2.MASTER_ID
Any pointers would be appreciated.
Thank you!

I guess you can solve that using just GROUP BY and SUM:
-- Test data
DECLARE @ProductMaster AS TABLE (PRODUCT_NUM INT, CO_CD VARCHAR(30), PROD_CD VARCHAR(30), MASTER_ID BIGINT)
INSERT @ProductMaster VALUES (1854, 'MAWC', 'STATIONERY', 10003493039)
INSERT @ProductMaster VALUES (1567, 'PREF', 'PRINTER', 10003493039)
INSERT @ProductMaster VALUES (2151, 'MAWC', 'STATIONERY', 10003497290)
SELECT
MASTER_ID,
SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
FROM @ProductMaster
GROUP BY MASTER_ID
The result is:
MASTER_ID    STATIONERY_CT  PRINTER_CT
--------------------------------------
10003493039  1              1
10003497290  1              0
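To try the same conditional-aggregation pattern without SQL Server, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. It assumes only the three sample rows from the question, and the helper name `household_counts` is hypothetical; the `SUM(CASE ...)` query itself is the one from the answer.

```python
import sqlite3


def household_counts(rows):
    """Return (MASTER_ID, STATIONERY_CT, PRINTER_CT) tuples, one per household."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProductMaster (PRODUCT_NUM INTEGER, CO_CD TEXT, "
        "PROD_CD TEXT, MASTER_ID INTEGER)"
    )
    conn.executemany("INSERT INTO ProductMaster VALUES (?, ?, ?, ?)", rows)
    # One output row per household; each SUM(CASE ...) counts one product type.
    result = conn.execute(
        """
        SELECT MASTER_ID,
               SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
               SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
        FROM ProductMaster
        GROUP BY MASTER_ID
        ORDER BY MASTER_ID
        """
    ).fetchall()
    conn.close()
    return result


sample = [
    (1854, "MAWC", "STATIONERY", 10003493039),
    (1567, "PREF", "PRINTER", 10003493039),
    (2151, "MAWC", "STATIONERY", 10003497290),
]
print(household_counts(sample))
```

Supporting another product type only means adding another `SUM(CASE ...)` column; no per-ROW_NUM CTE is needed.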


SQL from per day table to date range table transformation

I need to transform the following input table into an output table that has day ranges instead of per-day data.
Input:
Asin day is_instock
--------------------
A1 1 0
A1 2 0
A1 3 1
A1 4 1
A1 5 0
A2 3 0
A2 4 0
Output:
asin start_day end_day is_instock
---------------------------------
A1 1 2 0
A1 3 4 1
A1 5 5 0
A2 3 4 0
This is what is referred to as the "gaps and islands" problem. There are plenty of articles and references you can find by searching for that term.
Solution below:
/*Data setup*/
DROP TABLE IF EXISTS #Stock
CREATE TABLE #Stock ([Asin] Char(2),[day] int,is_instock bit)
INSERT INTO #Stock
VALUES
('A1',1,0)
,('A1',2,0)
,('A1',3,1)
,('A1',4,1)
,('A1',5,0)
,('A2',3,0)
,('A2',4,0);
/*Solution*/
WITH cte_Prev AS (
SELECT *
/*Compare previous day's stock status with current row's status. Every time it changes, return 1*/
,StockStatusChange = CASE WHEN is_instock = LAG(is_instock) OVER (PARTITION BY [Asin] ORDER BY [day]) THEN 0 ELSE 1 END
FROM #Stock
)
,cte_Groups AS (
/*Running sum: every time the stock status changes, StockStatusChange adds 1, which starts the next group*/
SELECT GroupID = SUM(StockStatusChange) OVER (PARTITION BY [Asin] ORDER BY [day])
,*
FROM cte_Prev
)
SELECT [Asin]
,start_day = MIN([day])
,end_day = MAX([day])
,is_instock
FROM cte_Groups
GROUP BY [Asin],GroupID,is_instock
You are looking for an operator described in the temporal data literature, and "best known" as PACK.
This operator was not made part of the SQL standard (SQL:2011) that brought the temporal features from the literature into the language, so there's very little chance you'll find built-in support for it in any SQL product or dialect.
It boils down to this: you'll have to write out the PACK algorithm yourself.
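If you do end up writing the PACK step yourself outside SQL, the core of it is short. This is a minimal Python sketch (the `pack` name is hypothetical) that, like the LAG/running-SUM query above, starts a new range whenever the asin or the is_instock value changes. It assumes each asin's days are contiguous, as in the sample data, so it does not split ranges on missing days.

```python
from itertools import groupby


def pack(rows):
    """Collapse (asin, day, is_instock) rows into (asin, start_day, end_day, is_instock) ranges."""
    ranges = []
    # Sort by asin, then day, so consecutive rows belong to the same timeline.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    # groupby opens a new group whenever (asin, is_instock) changes: an "island".
    for (asin, status), island in groupby(ordered, key=lambda r: (r[0], r[2])):
        days = [day for _, day, _ in island]
        ranges.append((asin, days[0], days[-1], status))
    return ranges


stock = [
    ("A1", 1, 0), ("A1", 2, 0), ("A1", 3, 1), ("A1", 4, 1),
    ("A1", 5, 0), ("A2", 3, 0), ("A2", 4, 0),
]
print(pack(stock))
```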

Case in Sql group by query

I am working on a project in which I want to use CASE to calculate the price of a product under a specific reference number in SQL Server. Below is my SQL query:
SELECT
product AS Products,
refNum AS Reference,
COUNT(id) AS Count
FROM ProductPriceList
GROUP BY
refNum, product
Executing the query above gives:
Product Reference Count
Product1 Ref08 24
Product2 Ref08 7
Product3 Ref07 32
Product2 Ref12 1
Product3 Ref12 18
Product1 Ref07 76
Product1 Null 56
Can anyone show me how to use a CASE expression in a query with GROUP BY to show the price? The rules are:
if count < 10 then price 1
if count > 10 and < 100 then price 2
if count > 100 then price 3
I don't want to add a new table to my database. I hope my question is clear.
Thanks in advance.
I think a basic CASE expression can handle your requirement:
SELECT
product AS Products,
refNum AS Reference,
CASE WHEN COUNT(*) < 10 THEN 1
WHEN COUNT(*) >= 10 AND COUNT(*) < 100 THEN 2
ELSE 3 END AS price
FROM ProductPriceList
GROUP BY
product, refNum;
Not much to explain here, except that the price-2 case uses a lower bound that includes a count of exactly 10 (since the price-1 case excludes it), and a count of exactly 100 falls through to price 3.
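The boundary reasoning is easy to check outside SQL. This minimal Python sketch (the name `price_tier` is hypothetical) mirrors the CASE expression above: below 10 maps to 1, 10 through 99 to 2, and 100 or more to 3. The question itself leaves counts of exactly 10 and 100 unspecified, so those two edges follow this answer's choice, not a stated requirement.

```python
def price_tier(count: int) -> int:
    """Map a per-(product, refNum) row count to a price tier, like the CASE above."""
    if count < 10:
        return 1
    if count < 100:  # count >= 10 is implied by the first branch
        return 2
    return 3


# Counts from the question's sample output.
for count in (24, 7, 32, 1, 18, 76, 56):
    print(count, price_tier(count))
```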
Here's an alternative (it doesn't differ much from the existing one, though):
You can use your query in subquery and use case outside:
select Products,
--to get NULL values back
case Reference when 'RefNull' then NULL else Reference end [Reference],
case when [Count] < 10 then 1
when [Count] between 10 and 100 then 2
else 3 end [price]
from (
SELECT product AS Products,
--to allow also null values to be grouped
coalesce(refNum, 'RefNull') AS Reference,
COUNT(id) AS [Count]
FROM ProductPriceList
GROUP BY coalesce(refNum, 'RefNull'), product
) [a]
Dataset:
Create Table ProductPriceList
(
Product varchar(10)
,RefNum CHAR(5)
,Records Int
);
Insert into ProductPriceList
Values
('Product1','Ref08',24)
,('Product2','Ref08',7)
,('Product3','Ref07',32)
,('Product2','Ref12',1)
,('Product3','Ref12',18)
,('Product1','Ref07',76)
,('Product1', NULL, 56);
With RCTE AS
(
Select Product
,RefNum
,Records
,1 RowNo
From ProductPriceList PPL
Union All
Select Product
,RefNum
,Records
,RowNo + 1
From RCTE R
Where RowNo + 1 < Records
)
Insert Into ProductPriceList (Product, RefNum, Records)
Select Product, RefNum, Records
From RCTE
where Records > 1
Query to fetch desired result:
Select Product
,RefNum
,Case When Count(*) < 10 Then 1
When Count(*) Between 10 and 99 then 2
Else 3 End Price
From ProductPriceList
Group By Product, RefNum
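As a portable cross-check of the recursive-CTE idea, here is a sketch using Python's built-in `sqlite3` module, which also supports `WITH RECURSIVE`. Unlike the answer above, it assumes a separate, hypothetical staging table `Counts` that holds the per-row totals, expands each row into `Records` copies, and then runs the same GROUP BY/CASE pricing query.

```python
import sqlite3


def expand_and_price(counts):
    """Expand (product, ref_num, records) tuples into `records` rows each, then price them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Counts (Product TEXT, RefNum TEXT, Records INTEGER)")
    conn.executemany("INSERT INTO Counts VALUES (?, ?, ?)", counts)
    conn.execute("CREATE TABLE ProductPriceList (Product TEXT, RefNum TEXT)")
    # The anchor emits RowNo = 1; each recursive step adds 1 until RowNo
    # reaches Records, so every input row yields exactly Records rows.
    conn.execute(
        """
        WITH RECURSIVE RCTE(Product, RefNum, Records, RowNo) AS (
            SELECT Product, RefNum, Records, 1 FROM Counts
            UNION ALL
            SELECT Product, RefNum, Records, RowNo + 1 FROM RCTE
            WHERE RowNo < Records
        )
        INSERT INTO ProductPriceList (Product, RefNum)
        SELECT Product, RefNum FROM RCTE
        """
    )
    # Same pricing rule as the answer: <10 -> 1, 10..99 -> 2, otherwise 3.
    priced = conn.execute(
        """
        SELECT Product, RefNum,
               CASE WHEN COUNT(*) < 10 THEN 1
                    WHEN COUNT(*) BETWEEN 10 AND 99 THEN 2
                    ELSE 3 END AS Price
        FROM ProductPriceList
        GROUP BY Product, RefNum
        ORDER BY Product, RefNum
        """
    ).fetchall()
    conn.close()
    return priced


counts = [
    ("Product1", "Ref08", 24), ("Product2", "Ref08", 7), ("Product3", "Ref07", 32),
    ("Product2", "Ref12", 1), ("Product3", "Ref12", 18), ("Product1", "Ref07", 76),
    ("Product1", None, 56),
]
print(expand_and_price(counts))
```

GROUP BY puts all NULL RefNum rows into one group, so the `Product1`/NULL combination is priced like any other.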

Using Count distinct case in sql and group by multiple columns

I have a query that works great (listed below). The issue is that we have run into a patient who had an event on two different days, and because I am counting distinct PATNUM values, it only shows up as one.
How can I get it to count 1 for each distinct combination of PATNUM and SCHDT?
Example:
PATNUM SCHDT
12345 30817
12345 30817
54321 30817
54321 30717
PATNUM 12345 should only count once while PATNUM 54321 should count twice.
My count statement is this:
SELECT ph.*, pi.*,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='7' THEN pi.PATNUM ELSE NULL END) AS count1,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='8' THEN pi.PATNUM ELSE NULL END) AS count2
FROM patientinfo as pi
INNER JOIN physicians as ph ON pi.SURGEON=ph.PName
WHERE PID NOT IN ('1355','988','767','1289','484','2784')
GROUP BY SURGEON
ORDER BY Dept,SURGEON ASC
Which columns do you want to see?
You can adjust your GROUP BY:
SELECT
ph.pname,
ph.specialty,
SUM(CASE WHEN complete = 7 THEN 1 ELSE 0 END) count1,
SUM(CASE WHEN complete = 8 THEN 1 ELSE 0 END) count2
FROM
(
SELECT
DISTINCT
surgeon,
patnum,
schdt,
complete,
servtype
FROM patientinfo
WHERE complete IN (7,8)
AND servtype IN ('INPT','INPFOP','INFOBS','IP')
AND pid NOT IN ('1355','988','767','1289','484','2784')
) pisub
INNER JOIN physicians ph ON pisub.surgeon = ph.pname
GROUP BY ph.pname, ph.specialty
ORDER BY ph.pname, ph.specialty;
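The key step in the query above is the DISTINCT subquery, which collapses duplicate patient/date rows before counting. Here is a minimal sketch of just that step, using Python's built-in `sqlite3` module; the surgeon name "Dr A" and the three-column table are hypothetical simplifications of the real `patientinfo` table. It also shows `COUNT(DISTINCT patnum)` for contrast.

```python
import sqlite3


def visits_per_surgeon(rows):
    """Count distinct (patnum, schdt) pairs and distinct patients per surgeon."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patientinfo (surgeon TEXT, patnum INTEGER, schdt TEXT)")
    conn.executemany("INSERT INTO patientinfo VALUES (?, ?, ?)", rows)
    # After DISTINCT, each remaining row is one patient on one date.
    result = conn.execute(
        """
        SELECT surgeon,
               COUNT(*) AS visits,
               COUNT(DISTINCT patnum) AS patients
        FROM (SELECT DISTINCT surgeon, patnum, schdt FROM patientinfo) AS pisub
        GROUP BY surgeon
        ORDER BY surgeon
        """
    ).fetchall()
    conn.close()
    return result


sample_rows = [
    ("Dr A", 12345, "30817"), ("Dr A", 12345, "30817"),
    ("Dr A", 54321, "30817"), ("Dr A", 54321, "30717"),
]
print(visits_per_surgeon(sample_rows))
```

Patient 12345 contributes one visit and patient 54321 contributes two, while the distinct-patient count stays at 2.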
Also, I would make a few suggestions:
If you're going to give your tables an alias, then use the alias when referring to any column in your query. I've made a guess here about which table some of your columns come from (e.g. dept), so feel free to change it if it is not correct.
You don't need to select all columns from both tables if you don't need them.
The query won't run if you don't GROUP BY all the non-aggregated columns you're selecting. I've written about this for Oracle and SQL in general; in MySQL, I think it actually does run but can show incorrect results.

more efficiently pivot rows

I am trying to join multiple tables together. One of the tables has hundreds of rows of data per ID, and I am trying to pivot about 100 of those rows for each ID into columns. The value I need isn't always in the same column. Below is an example (my real table has hundreds of rows per ID). AccNum, for example, may be in the NumV column for ID 1 but in the CharV column for ID 2.
ID  QType   CharV     NumV
--------------------------
1   AccNum            10
1   EmpNam  John Inc  0
1   UW      Josh      0
2   AccNum  11
2   EmpNam  CBS       0
2   UW      Dan       0
The original code I used was a SELECT statement with hundreds of lines like the one below:
Max(Case When PM.[QType] = 'AccNum' Then NumV End) as AccNum
This code, with its hundreds of lines, completed in just under 10 minutes. The problem, however, is that it only pulls in values from the column I specify, so I always lose the data that is in a different column. (In the example above, I would get AccNum 10 but not AccNum 11, because it's in the CharV column.)
I updated the code to use a pivot:
;with CTE
As
(
Select [PMID], [QType],
Value=concat(Nullif([CharV],''''),Nullif([NumV],0))
From [DBase].[dbo].[PM]
)
Select C.[ID] AS M_ID
,Max(c.[AccNum]) As AcctNum
,Max(c.[EmpNam]) As EmpName
-- ...and so on for the remaining columns
I then select all of my hundreds of rows and pivot the data:
from CTE
pivot (max(Value) for [QType] in ([AccNum],[EmpNam],(more rows)))As c
The problem with this code, however, is that it takes almost 2 hours to run.
Is there a different, more efficient solution to what I am trying to accomplish? I need to have the speed of the first code, but the result of the second.
Perhaps you can reduce the CONCAT/NULLIF processing by using a UNION ALL:
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
For the conditional aggregation approach
There's no need to worry about which field holds the value; just reference Value.
Select [ID]
,[Accnum] = Max(Case When [QType] = 'AccNum' Then Value End)
,[EmpNam] = Max(Case When [QType] = 'EmpNam' Then Value End)
,[UW] = Max(Case When [QType] = 'UW' Then Value End)
From (
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
) A
Group By ID
For the PIVOT approach
Select [ID],[AccNum],[EmpNam],[UW]
From (
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
) A
Pivot (max([Value]) For [QType] in ([AccNum],[EmpNam],[UW])) p
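Here is a minimal sketch of the UNION ALL plus conditional-aggregation version, run through Python's built-in `sqlite3` module so it can be tried without SQL Server. The sample rows are an assumption reconstructed from the question's table, with ID 2's AccNum stored in CharV; `None` (NULL) marks an empty cell.

```python
import sqlite3


def pivot_values(rows):
    """Normalize CharV/NumV into one Value column, then pivot with MAX(CASE ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PM (ID INTEGER, QType TEXT, CharV TEXT, NumV INTEGER)")
    conn.executemany("INSERT INTO PM VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT ID,
               MAX(CASE WHEN QType = 'AccNum' THEN Value END) AS AccNum,
               MAX(CASE WHEN QType = 'EmpNam' THEN Value END) AS EmpNam,
               MAX(CASE WHEN QType = 'UW' THEN Value END) AS UW
        FROM (
            -- Whichever column actually holds the value ends up in Value.
            SELECT ID, QType, CharV AS Value FROM PM WHERE CharV > ''
            UNION ALL
            SELECT ID, QType, CAST(NumV AS TEXT) AS Value FROM PM WHERE NumV > 0
        ) AS A
        GROUP BY ID
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return result


pm_rows = [
    (1, "AccNum", None, 10), (1, "EmpNam", "John Inc", 0), (1, "UW", "Josh", 0),
    (2, "AccNum", "11", None), (2, "EmpNam", "CBS", 0), (2, "UW", "Dan", 0),
]
print(pivot_values(pm_rows))
```

Both AccNum values now come through, because the pivot only ever looks at Value.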

Aggregate data from multiple rows into single row

In my table, each row has some data columns and a Priority column (for example, a timestamp or just an integer). I want to group my data by ID and then, in each group, take the latest non-null value of each column. For example, I have the following table:
id A B C Priority
1 NULL 3 4 1
1 5 6 NULL 2
1 8 NULL NULL 3
2 634 346 359 1
2 34 NULL 734 2
Desired result is :
id A B C
1 8 6 4
2 34 346 734
In this example the table is small and has only 5 columns, but the real table will be much larger. I really want this script to run fast. I tried to do it myself, but my script only works on SQL Server 2012+, so I deleted it as not applicable.
Numbers: the table could have 150k rows, 20 columns, and 20-80k unique ids, and the average of SELECT COUNT(id) FROM T GROUP BY ID is 2 to 5.
Now I have working code (thanks to @ypercubeᵀᴹ), but it runs very slowly on big tables; in my case the script can take a minute or more (with indexes and so on).
How can it be sped up?
SELECT
d.id,
d1.A,
d2.B,
d3.C
FROM
( SELECT id
FROM T
GROUP BY id
) AS d
OUTER APPLY
( SELECT TOP (1) A
FROM T
WHERE id = d.id
AND A IS NOT NULL
ORDER BY priority DESC
) AS d1
OUTER APPLY
( SELECT TOP (1) B
FROM T
WHERE id = d.id
AND B IS NOT NULL
ORDER BY priority DESC
) AS d2
OUTER APPLY
( SELECT TOP (1) C
FROM T
WHERE id = d.id
AND C IS NOT NULL
ORDER BY priority DESC
) AS d3 ;
In my test database, with a realistic amount of data, I get the following execution plan:
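This should do the trick: every non-NULL value raised to the power 0 returns 1, while NULL stays NULL, so Priority*POWER(A,0) is NULL for rows where A is NULL, and those rows sort last in the descending ORDER BY: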
DECLARE @t table(id int,A int,B int,C int,Priority int)
INSERT @t
VALUES (1,NULL,3 ,4 ,1),
(1,5 ,6 ,NULL,2),(1,8 ,NULL,NULL,3),
(2,634 ,346 ,359 ,1),(2,34 ,NULL,734 ,2)
;WITH CTE as
(
SELECT id,
CASE WHEN row_number() over
(partition by id order by Priority*power(A,0) desc) = 1 THEN A END A,
CASE WHEN row_number() over
(partition by id order by Priority*power(B,0) desc) = 1 THEN B END B,
CASE WHEN row_number() over
(partition by id order by Priority*power(C,0) desc) = 1 THEN C END C
FROM @t
)
SELECT id, max(a) a, max(b) b, max(c) c
FROM CTE
GROUP BY id
Result:
id a b c
1 8 6 4
2 34 346 734
One alternative that might be faster is a multiple join approach. Get the priority for each column and then join back to the original table. For the first part:
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id;
Then join back to this table:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.id, ta.a, tb.b, tc.c
from pabc left join
t ta
on pabc.id = ta.id and pabc.pa = ta.priority left join
t tb
on pabc.id = tb.id and pabc.pb = tb.priority left join
t tc
on pabc.id = tc.id and pabc.pc = tc.priority ;
This can also take advantage of an index on t(id, priority).
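To check the multiple-join approach on the sample data, here is a minimal sketch that runs the same `pabc` query through Python's built-in `sqlite3` module. It assumes (id, priority) is unique, which the join relies on; the index mirrors the suggested t(id, priority), and whether SQLite actually uses it depends on its planner.

```python
import sqlite3


def latest_non_null(rows):
    """Return (id, a, b, c) with the highest-priority non-NULL value per column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER, c INTEGER, priority INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    # Index suggested in the answer; the planner may use it for the join lookups.
    conn.execute("CREATE INDEX ix_t_id_priority ON t (id, priority)")
    result = conn.execute(
        """
        WITH pabc AS (
            SELECT id,
                   MAX(CASE WHEN a IS NOT NULL THEN priority END) AS pa,
                   MAX(CASE WHEN b IS NOT NULL THEN priority END) AS pb,
                   MAX(CASE WHEN c IS NOT NULL THEN priority END) AS pc
            FROM t
            GROUP BY id
        )
        SELECT pabc.id, ta.a, tb.b, tc.c
        FROM pabc
        LEFT JOIN t ta ON pabc.id = ta.id AND pabc.pa = ta.priority
        LEFT JOIN t tb ON pabc.id = tb.id AND pabc.pb = tb.priority
        LEFT JOIN t tc ON pabc.id = tc.id AND pabc.pc = tc.priority
        ORDER BY pabc.id
        """
    ).fetchall()
    conn.close()
    return result


t_rows = [
    (1, None, 3, 4, 1), (1, 5, 6, None, 2), (1, 8, None, None, 3),
    (2, 634, 346, 359, 1), (2, 34, None, 734, 2),
]
print(latest_non_null(t_rows))
```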
The previous code also works when written with the following syntax:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.Id,ta.a, tb.b, tc.c
from pabc
left join t ta on pabc.id = ta.id and pabc.pa = ta.priority
left join t tb on pabc.id = tb.id and pabc.pb = tb.priority
left join t tc on pabc.id = tc.id and pabc.pc = tc.priority ;
This looks rather strange. You have a log table for all column changes, but no associated table with the current data. Now you are looking for a query to collect your current values from the log table, which is naturally a laborious task.
The solution is simple: have an additional table with the current data. You can even link the tables with a trigger (so either every time a record is inserted into your log table you update the current table, or every time a change is written to the current table you write a log entry).
Then just query your current table:
select id, a, b, c from currenttable order by id;
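Here is a minimal sketch of the trigger idea, using SQLite through Python's built-in `sqlite3` module rather than SQL Server. The table names `logtable` and `currenttable` are hypothetical, and it assumes log rows are inserted in increasing priority order; with that ordering, COALESCE keeps the newest non-NULL value for each column.

```python
import sqlite3


def build_current_table(log_rows):
    """Insert log rows; a trigger keeps currenttable at the latest non-NULL values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE logtable (id INTEGER, a INTEGER, b INTEGER, c INTEGER, priority INTEGER);
        CREATE TABLE currenttable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER);
        CREATE TRIGGER log_to_current AFTER INSERT ON logtable
        BEGIN
            -- Make sure a current row exists for this id.
            INSERT OR IGNORE INTO currenttable (id) VALUES (NEW.id);
            -- Overwrite only the columns the new log entry actually sets.
            UPDATE currenttable
            SET a = COALESCE(NEW.a, a),
                b = COALESCE(NEW.b, b),
                c = COALESCE(NEW.c, c)
            WHERE id = NEW.id;
        END;
        """
    )
    conn.executemany("INSERT INTO logtable VALUES (?, ?, ?, ?, ?)", log_rows)
    result = conn.execute("SELECT id, a, b, c FROM currenttable ORDER BY id").fetchall()
    conn.close()
    return result


log_rows = [
    (1, None, 3, 4, 1), (1, 5, 6, None, 2), (1, 8, None, None, 3),
    (2, 634, 346, 359, 1), (2, 34, None, 734, 2),
]
print(build_current_table(log_rows))
```

Reading current values then becomes a plain SELECT from `currenttable`, with no per-column lookups into the log.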