SQL Transform of raw data into summary table? - sql

I have a table full of survey data which I need to transform into a nice summary table. The survey question was (for example) "Rank your restaurant preferences in order". The raw data looks like:
CUST_ID WENDYS_RANK MCDONALDS_RANK BURGERKING_RANK
1 First Third Second
2 Second First Third
3 None First Second
4 Second Third First
(repeat for 100,000+ records)
I need to turn this into a nice table that looks like:
NAME NUM_FIRST NUM_SECOND NUM_THIRD
Wendys 1 2 0
McDonalds 2 0 2
BK 1 2 1
But it's been so darn long since I did a transformation like this I not only forgot how to write the SQL, I forgot what this transformation is called, which makes it hard to google. Can anyone help me?
Thanks...

Here is one method:
select 'Wendys',
sum(case when Wendys_Rank = 'First' then 1 else 0 end) as Rank1,
sum(case when Wendys_Rank = 'Second' then 1 else 0 end) as Rank2,
sum(case when Wendys_Rank = 'Third' then 1 else 0 end) as Rank3
from surveydata
union all
select 'McDonalds',
sum(case when McDonalds_Rank = 'First' then 1 else 0 end) as Rank1,
sum(case when McDonalds_Rank = 'Second' then 1 else 0 end) as Rank2,
sum(case when McDonalds_Rank = 'Third' then 1 else 0 end) as Rank3
from surveydata
union all
select 'BK',
sum(case when BK_Rank = 'First' then 1 else 0 end) as Rank1,
sum(case when BK_Rank = 'Second' then 1 else 0 end) as Rank2,
sum(case when BK_Rank = 'Third' then 1 else 0 end) as Rank3
from surveydata;

Related

Combining two aggregate queries into one

For some context, I am making an image browser which is connected to an SQLite database. Within the browser, similar images are grouped into an event (EventId) and each image (MicrosoftId) is labelled with a few tags (name).
I have these two queries on the same table (TagsMSCV) but pulling out different information. Ultimately I need to combine the information in my browser so if it was possible to combine these two queries (maybe with a JOIN?) it would be a lot faster and convenient for me. Both results of these queries share the EventId column.
1st Query ():
SELECT EventId as 'event', count(*) as 'size',
SUM(case when tag_count = 1 then 1 else 0 end) as '1',
SUM(case when tag_count = 2 then 1 else 0 end) as '2',
SUM(case when tag_count = 3 then 1 else 0 end) as '3'
FROM (SELECT EventId, MicrosoftId,
SUM(case when name in ('indoor', 'cluttered', 'screen') then 1 else 0 end) as tag_count
FROM TagsMSCV GROUP BY EventId, MicrosoftId) TagsMSCV
GROUP BY EventId ORDER BY 3 DESC, 2 DESC, 1 DESC
2nd Query
SELECT EventId,
SUM(CASE WHEN name = 'indoor' THEN 1 ELSE 0 END) as indoor,
SUM(CASE WHEN name = 'cluttered' THEN 1 ELSE 0 END) as cluttered,
SUM(CASE WHEN name = 'screen' THEN 1 ELSE 0 END) as screen
FROM TagsMSCV WHERE name IN ('indoor', 'cluttered', 'screen')
GROUP BY EventId
As you can see in both queries I am feeding in the tags 'necktie' 'man', 'male' and getting different information back.
SQL Fiddle Here: https://www.db-fiddle.com/f/f8WNimjmZAj1XXeCj4PHB8/3
You should do this all in one query:
SELECT EventId as event, count(*) as size,
SUM(case when (indoor + cluttered + screen) = 1 then 1 else 0 end) as tc_1,
SUM(case when (indoor + cluttered + screen) = 2 then 1 else 0 end) as tc_2,
SUM(case when (indoor + cluttered + screen) = 3 then 1 else 0 end) as tc_3,
SUM(indoor) as indoor,
SUM(cluttered) as cluttered,
SUM(screen) as screen
FROM (SELECT EventId, MicrosoftId,
SUM(CASE WHEN name = 'indoor' THEN 1 ELSE 0 END) as indoor,
SUM(CASE WHEN name = 'cluttered' THEN 1 ELSE 0 END) as cluttered,
SUM(CASE WHEN name = 'screen' THEN 1 ELSE 0 END) as screen
FROM TagsMSCV
GROUP BY EventId, MicrosoftId
) TagsMSCV
GROUP BY EventId
ORDER BY 3 DESC, 2 DESC, 1 DESC;
You need two aggregations to get the information about the tag counts. There is no need to add more aggregations and joins to the query.
You could use an Inner join subquery
SELECT TagsMSCV.EventId as 'event', count(*) as 'size',
SUM(case when tag_count = 1 then 1 else 0 end) as '1',
SUM(case when tag_count = 2 then 1 else 0 end) as '2',
SUM(case when tag_count = 3 then 1 else 0 end) as '3',
t.necktie,
t.man,
t.male
FROM (
SELECT EventId, MicrosoftId,
SUM(case when name in ('necktie' 'man', 'male') then 1 else 0 end) as tag_count
FROM TagsMSCV GROUP BY EventId, MicrosoftId
) TagsMSCV
INNER JOIN (
SELECT EventId,
SUM(CASE WHEN name = 'necktie' THEN 1 ELSE 0 END) as necktie,
SUM(CASE WHEN name = 'man' THEN 1 ELSE 0 END) as man,
SUM(CASE WHEN name = 'male' THEN 1 ELSE 0 END) as male
FROM TagsMSCV WHERE name IN ('necktie' 'man', 'male')
GROUP BY EventId
) t on t.EventId = TagsMSCV.EventId
GROUP BY TagsMSCV.EventId
ORDER BY 3 DESC, 2 DESC, 1 DESC

SQL sum of two conditional aggregation

So yesterday I learned about conditional aggregation. I'm fairly new to SQL.
Here is my query:
select
Year_CW,
sum(case when col = 0 then 1 else 0 end) as "Total_sampled(Checked)",
sum(case when col = 1 then 1 else 0 end) as "Total_unsampled(Not_Checked)",
sum(case when col = 0 AND col2 = 'accepted' then 1 else 0 end) as "Accepted",
sum(case when col = 0 AND col2 = 'accepted with comments' then 1 else 0 end) as "Accepted with comments",
sum(case when col = 0 AND col2 = 'request for rework' then 1 else 0 end) as "Request for rework",
sum(case when col = 0 AND col2 = 'rejected' then 1 else 0 end) as "Rejected",
sum(case when col = 0 Or col = 1 then 1 else 0 end) as "Total_DS"
from
(select
Year_CW, SAMPLED as col, APPROVAL as col2
from
View_TEST tv) tv
group by
Year_CW
order by
Year_CW desc
I'm basically just calculating some KPIs grouped by week.
Look at the row for "Total_DS". It is essentially the sum of the first two sums, "Total_sampled(Checked)" and "Total_unsampled(Not_Checked)".
Is there a way that I can add the two columns from the first two sums to get the third one instead of trying to get the data all over again? I feel performance wise this would be terrible practice. It doesn't matter for this database but I don't want to learn bad code practice from the start.
Thanks for helping.
You probably won't see a significant performance hit from what you're doing now as you already have all the data available, you're just repeating the case evaluation.
But you can't refer to the column aliases for the first two columns within the same level of query.
If you can't do a simple count as #Zeki suggested because you aren't sure if there might be values other than zero and one (though this looks rather like a binary true/false equivalent, so there may well be a check constraint limiting you to those values), or if you're just more interested in a more general case, you can use an inline view as #jarhl suggested:
select Year_CW,
"Total_sampled(Checked)",
"Total_unsampled(Not_Checked)",
"Accepted",
"Accepted with comments",
"Request for rework",
"Rejected",
"Total_sampled(Checked)" + "Total_unsampled(Not_Checked)" as "Total_DS"
from (
select Year_CW,
sum(case when col = 0 then 1 else 0 end) as "Total_sampled(Checked)",
sum(case when col = 1 then 1 else 0 end) as "Total_unsampled(Not_Checked)",
sum(case when col = 0 AND col2 = 'accepted' then 1 else 0 end) as "Accepted",
sum(case when col = 0 AND col2 = 'accepted with comments' then 1 else 0 end)
as "Accepted with comments",
sum(case when col = 0 AND col2 = 'request for rework' then 1 else 0 end)
as "Request for rework",
sum(case when col = 0 AND col2 = 'rejected' then 1 else 0 end) as "Rejected"
from (
select Year_CW, SAMPLED as col, APPROVAL as col2
from View_TEST tv
) tv
group by Year_CW
)
order by Year_CW desc;
The inner query gets the data and calculates the conditional aggregate values. The outer query just gets those values from the inner query, and also adds the Total_DS column to the result set by adding together the rwo values from the inner query.
You should generally avoid quoted identifiers, and if you really need them in your result set you should apply them at the last possible moment - so use unquoted identifiers in the inner query, and give them qupted aliases in the outer query. And personally if the point of a query is to count things, I prefer to use a conditional count over a conditional sum. I'm also not sure why you already have a subquery against your view, which just changes the column names and makes the main query slightly more obscure. So I might do this as:
select year_cw,
total_sampled_checked as "Total_sampled(Checked)",
total_unsampled_not_checked as "Total_unsampled(Not_Checked)",
accepted as "Accepted",
accepted_with_comments as "Accepted with comments",
request_for_rework as "Request for rework",
rejected as "Rejected",
total_sampled_checked + total_unsampled_not_checked as "Total_DS"
from (
select year_cw,
count(case when sampled = 0 then 1 end) as total_sampled_checked,
count(case when sampled = 1 then 1 end) as total_unsampled_not_checked,
count(case when sampled = 0 and approval = 'accepted' then 1 end) as accepted,
count(case when sampled = 0 and approval = 'accepted with comments' then 1 end)
as accepted_with_comments,
count(case when sampled = 0 and approval = 'request for rework' then 1 end)
as request_for_rework,
count(case when sampled = 0 and approval = 'rejected' then 1 end) as rejected
from view_test
group by year_cw
)
order by year_cw desc;
Note that in the case expression, then 1 can be then <anything that isn't null>, so you could do then sampled or whatever. I've left out the implicit else null. As count() ignores nulls, all the case expression has to do is evaluate to any not-null value for the rows you want to include in the count.
You can try below
select Year_CW,
sum(case when col = 0 then 1 else 0 end) as "Total_sampled(Checked)",
sum(case when col = 1 then 1 else 0 end) as "Total_unsampled(Not_Checked)",
sum(case when col = 0 AND col2 = 'accepted' then 1 else 0 end) as "Accepted",
sum(case when col = 0 AND col2 = 'accepted with comments' then 1 else 0 end) as "Accepted with comments",
sum(case when col = 0 AND col2 = 'request for rework' then 1 else 0 end) as "Request for rework",
sum(case when col = 0 AND col2 = 'rejected' then 1 else 0 end) as "Rejected",
sum(sum(case when col = 0 then 1 else 0 end) = 0 Or sum(case when col = 1 then 1 else 0 end) = 1 then 1 else 0 end) as "Total_DS"
from (select Year_CW, SAMPLED as col, APPROVAL as col2
from View_TEST tv
) tv
group by Year_CW
order by Year_CW desc

Exclude records that have sum greater than 1

I have query returning details of customers that are subscribed to channel xyz or all other channels.
To generate this results i am using the following query:
select customerID
,sum(case when channel='xyz' then 1 else 0 end) as 'xyz Count'
,sum(case when channel<>'xyz' then bundle_qty else 0 end) as 'Other'
From temptable
So my Question is, how do i Exclude customers that are subscribed to 2 channels, where one is xyz and one is another channel.
select customerID
,sum(case when channel='xyz' then 1 else 0 end) as 'xyz Count'
,sum(case when channel<>'xyz' then bundle_qty else 0 end) as 'Other'
From temptable
group by customerID
having sum(case when channel= 'xyz' then 1 else 0 end) > 0
and sum(case when channel<>'xyz' then 1 else 0 end) > 0
First, your query is not correct. It needs a group by. Second, you can do what you want using having:
select customerID,
sum(case when channel = 'xyz' then 1 else 0 end) as xyz_Count,
sum(case when channel<>'xyz' then bundle_qty else 0 end) as Other
From temptable
group by customerID
having count(*) = 2 and
sum(case when channel = 'xyz' then 1 else 0 end) = 1;
If customers can subscribe to the same channel multiple times, and you still want only "xyz" and another channel, then:
having count(distinct channel) = 2 and
(min(channel) = 'xyz' or max(channel) = 'xyz')

how to sum count sql

How to total count?
SELECT
COUNT(CASE WHEN SHP.id = 1 then 1 ELSE NULL END) as "New",
COUNT(CASE WHEN SHP.id = 2 then 5 ELSE NULL END) as "Accepted"
from SHP
RESULT:
NEW Accepted
1 5
But I need a total count
result: 6
I'd do something like this;
SELECT
COUNT(CASE WHEN id = 1 THEN 1 END) as New,
COUNT(CASE WHEN id = 2 THEN 5 END) as Accepted,
COUNT(CASE WHEN id = 1 THEN 1
WHEN id = 2 THEN 5 END) as Total
FROM SHP
This is exactly what the CASE statement should be used for, the logic is very simple. This will avoid having to perform multiple calculations on the same fields.
As a note, the value in your THEN statement isn't used in this instance at all, it's just doing a COUNT of the number rather than performing a SUM. I've also removed the ELSE NULL because this is what the CASE will do by default anyway.
If your intention was to SUM the values then do this;
SELECT
SUM(CASE WHEN id = 1 THEN 1 END) as New,
SUM(CASE WHEN id = 2 THEN 5 END) as Accepted,
SUM(CASE WHEN id = 1 THEN 1
WHEN id = 2 THEN 5 END) as Total
FROM SHP
Example
Assuming you have only two values in your database, 1 and 2, we can create test data like this;
CREATE TABLE #SHP (id int)
INSERT INTO #SHP (id)
VALUES (1),(2)
And use this query;
SELECT
SUM(CASE WHEN id = 1 then 1 END) as New,
SUM(CASE WHEN id = 2 then 5 END) as Accepted,
SUM(CASE WHEN id = 1 THEN 1
WHEN id = 2 THEN 5 END) as Total
FROM #SHP
Gives this result;
New Accepted Total
1 5 6
Try this:
SELECT
COUNT(CASE WHEN SHP.id = 1 then 1 ELSE NULL END) +
COUNT(CASE WHEN SHP.id = 2 then 5 ELSE NULL END) as "Total"
from SHP
You could wrap your query into a subquery and do something like this:
SELECT SUM(New) as New, Sum(Accepted) as Accepted, Sum(New + Accepted) as Total FROM
(SELECT
COUNT(CASE WHEN SHP.id = 1 then 1 ELSE NULL END) as "New",
COUNT(CASE WHEN SHP.id = 2 then 5 ELSE NULL END) as "Accepted"
from SHP) as SubQuery
That's if you don't want to duplicate doing the counts and just adding the two together.
try this
with s1 as(
SELECT
COUNT(CASE WHEN SHP.id = 1 then 1 ELSE 0 END) as "New"
from SHP
),s2 as
(
SELECT
COUNT(CASE WHEN SHP.id = 2 then 5 ELSE 0 END) as "Accepted"
from SHP
)
select sum("New"+ "Accepted") from s1,s2

Dynamic SQL Statement Creation - Scala

I have a file with following values
matchId Id
2e0c6c42-68ac-43e0-b130-1b986f61a462 segA
2e0c6c42-68ac-43e0-b130-1b986f61a463 segB
2e0c6c42-68ac-43e0-b130-1b986f61a463 segC
2e0c6c42-68ac-43e0-b130-1b986f61a463 segA
I want pivoted result like below
matchid segA segB segC
2e0c6c42-68ac-43e0-b130-1b986f61a463 1 1 1
2e0c6c42-68ac-43e0-b130-1b986f61a462 1 0 0
This means some id is present in SegA and some in all segments (should be denoted as binary 1's and 0's)
Since number of segments can vary I want SQL statement to be generated dynamically using Scala as below (it should scale up or down as per number of segments e.g. today i have 10 seg tomorrow it can be 5 and so on.)
From SQL (AWS Redshift DB) perspective I can generate below query if I know number of segments already but this can get complicated as number of segment increase.
CREATE TABLE pivotedsegments distkey(match_id) AS (
SELECT match_id,
MAX(CASE WHEN segment='Seg1' then 1 else 0 end) Seg1,
MAX(CASE WHEN segment='Seg2' then 1 else 0 end) Seg2,
MAX(CASE WHEN segment='Seg3' then 1 else 0 end) Seg3,
MAX(CASE WHEN segment='Seg4' then 1 else 0 end) Seg4,
MAX(CASE WHEN segment='Seg5' then 1 else 0 end) Seg5,
MAX(CASE WHEN segment='Seg6' then 1 else 0 end) Seg6,
MAX(CASE WHEN segment='Seg7' then 1 else 0 end) Seg7,
MAX(CASE WHEN segment='Seg8' then 1 else 0 end) Seg8,
MAX(CASE WHEN segment='Seg9' then 1 else 0 end) Seg9,
MAX(CASE WHEN segment='Seg10' then 1 else 0 end) Seg10,
MAX(CASE WHEN segment='Seg11' then 1 else 0 end) Seg11,
.
.
.
.
From some-table group by matchid;
So I want to design an scala API that can read such file and convert result in pivoted manner.
Please suggest.