Changing Row/Column in Aggregate function in SQL

Here is a SQL query (sorry about the complexity):
SELECT [Codes].[Description], RawData.FS, COUNT(*) As 'Total Units'
FROM RawData
INNER JOIN Codes ON RawData.ACR = Codes.Name
INNER JOIN Categories ON Codes.CategoryName = Categories.Name
WHERE Codes.CategoryName = 'ACR'
GROUP BY [Codes].[Description], [RawData].[FS]
ORDER BY [RawData].[FS]
To describe the schema: there is a Codes table that contains the codes used in each column of RawData. A second table, Categories, keeps track of all these columns, and Codes.CategoryName is a foreign key to Categories.Name. In effect, this creates a single lookup table for every coded value in RawData.
The field RawData.FS has three values: NULL, 1, and 2. RawData.ACR has three values, corresponding to the descriptions less than 1 acre, 1-10 acres, and more than 10 acres. The query above gives the correct results:
Description                          FS    Total Units
House on less than one acre          NULL  57080
House on one to less than ten acres  NULL  4760
House on ten acres or more           NULL  880
House on less than one acre          1     31496
House on one to less than ten acres  1     4312
House on ten acres or more           1     360
House on less than one acre          2     594404
House on one to less than ten acres  2     74688
House on ten acres or more           2     9104
The challenge is to rewrite the SQL so that, instead of three sets of three rows, there is one column for each value of FS. In other words, the header (with the first row) would be:
Description                  FS=NULL  FS=1   FS=2
House on less than one acre  57080    31496  594404
For reference, here is the SQL to create the Categories and Codes structure:
CREATE TABLE Categories (
[Name] NVARCHAR(50) PRIMARY KEY,
[Description] NVARCHAR(200)
)
CREATE TABLE Codes (
[Name] NVARCHAR(50),
[CategoryName] NVARCHAR(50) FOREIGN KEY REFERENCES Categories(Name),
[Description] NVARCHAR(200) )
Every field in RawData is coded (in fact, the data dictionary is at http://www.census.gov/acs/www/Downloads/data_documentation/pums/DataDict/PUMS_Data_Dictionary_2009-2011.pdf). This is one of those classic SQL puzzles.

It sounds like you want the following, which pivots the FS values into columns:
SELECT [Codes].[Description],
sum(case when RawData.FS is null then 1 else 0 end) FS_null,
sum(case when RawData.FS = 1 then 1 else 0 end) FS_1,
sum(case when RawData.FS = 2 then 1 else 0 end) FS_2
FROM RawData
INNER JOIN Codes
ON RawData.ACR = Codes.Name
INNER JOIN Categories
ON Codes.CategoryName = Categories.Name
WHERE Codes.CategoryName = 'ACR'
GROUP BY [Codes].[Description]
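To check the conditional-aggregation idea without the census data, here is a minimal sketch using Python's standard-library sqlite3 module. The two codes and six RawData rows are hypothetical, and SQLite stands in for SQL Server (so the bracketed identifiers are dropped); the SUM(CASE ...) pattern itself is the same.

```python
import sqlite3

# In-memory database with hypothetical rows shaped like the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Codes (Name TEXT, CategoryName TEXT, Description TEXT);
CREATE TABLE RawData (ACR TEXT, FS INTEGER);
INSERT INTO Codes VALUES
  ('1', 'ACR', 'House on less than one acre'),
  ('2', 'ACR', 'House on one to less than ten acres');
INSERT INTO RawData VALUES
  ('1', NULL), ('1', 1), ('1', 2), ('1', 2),
  ('2', NULL), ('2', 2);
""")

# Each SUM(CASE ...) counts only the rows with one FS value, so every
# description collapses into a single row with one column per FS value.
rows = conn.execute("""
SELECT c.Description,
       SUM(CASE WHEN r.FS IS NULL THEN 1 ELSE 0 END) AS FS_null,
       SUM(CASE WHEN r.FS = 1 THEN 1 ELSE 0 END) AS FS_1,
       SUM(CASE WHEN r.FS = 2 THEN 1 ELSE 0 END) AS FS_2
FROM RawData AS r
INNER JOIN Codes AS c ON r.ACR = c.Name
WHERE c.CategoryName = 'ACR'
GROUP BY c.Description
ORDER BY c.Description
""").fetchall()

for row in rows:
    print(row)
```

Note that `FS IS NULL` is required for the NULL column: `FS = NULL` is never true, so it would always count 0.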

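Untested, but CASE expressions over a subselect should work. The inner query has to group by FS as well as description, and the outer query needs an aggregate (here MAX) and its own GROUP BY so that each description collapses into one row:
SELECT description,
    MAX(CASE WHEN fs IS NULL THEN total_units END) AS 'FS=Null',
    MAX(CASE WHEN fs = 1 THEN total_units END) AS 'FS=1',
    MAX(CASE WHEN fs = 2 THEN total_units END) AS 'FS=2'
FROM (
    SELECT [Codes].[Description] AS description,
        RawData.FS AS fs,
        COUNT(*) AS total_units
    FROM RawData
    INNER JOIN Codes ON RawData.ACR = Codes.Name
    WHERE Codes.CategoryName = 'ACR'
    GROUP BY [Codes].[Description], RawData.FS
) AS t
GROUP BY description
ORDER BY description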


How can I count values in one table based on the same key in another table in BigQuery?

I have one table like the one below. Each id is unique.
id      times_of_going_out
fef666  2
S335gg  1
9a2c50  1
And another table like the one below. In this second table, id is not unique: a single id can have several different category_name values.
id      category_name          city
S335gg  Games & Game Supplies  tk
9a2c50  Telephone Companies    os
9a2c50  Recreation Centers     ky
fef666  Recreation Centers     ky
I want to find the difference between the destinations (category_name) of people who go out often (times_of_going_out > 5) and people who don't go out often (times_of_going_out <= 5).
Both tables are small samples of large tables.
・ Where do people who go out twice tend to go?
・ Where do people who go out six times tend to go?
Thank you
The expected result could be something like:
less than 5: the top ten category_name values for ids with times_of_going_out less than 5
more than 5: the top ten category_name values for ids with times_of_going_out more than 5
Steps:
1. Combine the data and aggregate the total times_of_going_out per category_name.
2. Create the groups you need: less than or equal to 5, and more than 5. If you don't want 5 itself in the lower group, you can adjust the code.
3. Rank within both groups using dense_rank(), which numbers the categories by their total times_of_going_out.
4. Filter so that only the top 10 ranks of each group are kept.
with main as (
select
category_name,
sum(coalesce(times_of_going_out,0)) as total_time_per_category
from table1 as t1
left join table2 as t2
on t1.id = t2.id
group by 1
),
category as (
select
*,
if(total_time_per_category > 5, 'more than 5', 'less than equal to 5') as is_more_than_5_times
from main
),
ranking_ as (
select *,
case when
is_more_than_5_times = 'more than 5' then
dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc)
else NULL
end AS rank_more_than_5,
case when
is_more_than_5_times = 'less than equal to 5' then
dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc)
else NULL
end AS rank_less_than_equal_5
from category
)
select
is_more_than_5_times,
string_agg(category_name,',') as list
from ranking_
where rank_less_than_equal_5 <=10 or rank_more_than_5 <= 10
group by 1
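The query above classifies categories by their summed times_of_going_out. If the question is instead read as classifying people (frequent goers versus others) and then finding their most common destinations, a minimal sketch might look like the following. It assumes Python's sqlite3 module in place of BigQuery, a hypothetical extra person 'a1b2c3' who goes out 7 times, and popularity measured as the number of distinct people visiting a category; the top_n_per_group helper is made up for illustration. In BigQuery the top-10 cut-off could stay in SQL with ROW_NUMBER() OVER (PARTITION BY grp ORDER BY num_people DESC) and a filter.

```python
import sqlite3

# Hypothetical data: the question's sample rows plus one made-up frequent goer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id TEXT PRIMARY KEY, times_of_going_out INTEGER);
CREATE TABLE table2 (id TEXT, category_name TEXT, city TEXT);
INSERT INTO table1 VALUES ('fef666', 2), ('S335gg', 1), ('9a2c50', 1), ('a1b2c3', 7);
INSERT INTO table2 VALUES
  ('S335gg', 'Games & Game Supplies', 'tk'),
  ('9a2c50', 'Telephone Companies', 'os'),
  ('9a2c50', 'Recreation Centers', 'ky'),
  ('fef666', 'Recreation Centers', 'ky'),
  ('a1b2c3', 'Recreation Centers', 'ky'),
  ('a1b2c3', 'Games & Game Supplies', 'tk');
""")

# Label each person by how often they go out, then count the distinct people
# in each group who visit each category, most-visited first.
rows = conn.execute("""
SELECT grp, category_name, COUNT(DISTINCT id) AS num_people
FROM (
    SELECT CASE WHEN t1.times_of_going_out > 5 THEN 'more than 5'
                ELSE 'less than or equal to 5' END AS grp,
           t2.category_name,
           t1.id
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.id = t2.id
) AS labelled
GROUP BY grp, category_name
ORDER BY grp, num_people DESC, category_name
""").fetchall()


def top_n_per_group(sorted_rows, n=10):
    """Keep the first n (category, count) pairs per group; input is pre-sorted."""
    result = {}
    for grp, category, count in sorted_rows:
        bucket = result.setdefault(grp, [])
        if len(bucket) < n:
            bucket.append((category, count))
    return result


top = top_n_per_group(rows)
print(top)
```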

Can you explain the difference between these two SQL queries? Codility SQL Exercise 3

I came up with a solution to the scenario below that produced the correct results on the test data, but when it was graded against different data it scored only 36%. Someone else asked for the solution to this problem (How do i crack this SQL Soccer Matches assignment?), and I found Strange Coder's solution to be similar to mine. That solution scored 100%. What is the difference between them?
Set Up
You are given two tables, teams and matches, with the following structures:
create table teams (
team_id integer not null,
team_name varchar(30) not null,
unique(team_id)
);
create table matches (
match_id integer not null,
host_team integer not null,
guest_team integer not null,
host_goals integer not null,
guest_goals integer not null,
unique(match_id)
);
Each record in the table teams represents a single soccer team. Each record in the table matches represents a finished match between two teams. Teams (host_team, guest_team) are represented by their IDs in the teams table (team_id). No team plays a match against itself. You know the result of each match (that is, the number of goals scored by each team).
You would like to compute the total number of points each team has scored after all the matches described in the table. The scoring rules are as follows:
If a team wins a match (scores strictly more goals than the other team), it receives three points.
If a team draws a match (scores exactly the same number of goals as the opponent), it receives one point.
If a team loses a match (scores fewer goals than the opponent), it receives no points.
Write an SQL query that returns a ranking of all teams (team_id) described in the table teams. For each team you should provide its name and the number of points it received after all described matches (num_points). The table should be ordered by num_points (in decreasing order). In case of a tie, order the rows by team_id (in increasing order).
For example, for:
teams:
team_id  team_name
10       Give
20       Never
30       You
40       Up
50       Gonna

matches:
match_id  host_team  guest_team  host_goals  guest_goals
1         30         20          1           0
2         10         20          1           2
3         20         50          2           2
4         10         30          1           0
5         30         50          0           1

your query should return:
team_id  team_name  num_points
20       Never      4
50       Gonna      4
10       Give       3
30       You        3
40       Up         0
My Solution
SELECT t.team_id, t.team_name, COALESCE(SUM(num_points), 0) AS num_points
FROM(
SELECT t.team_id, t.team_name,
(CASE WHEN m.host_goals > m.guest_goals THEN 3
WHEN m.host_goals = m.guest_goals THEN 1
WHEN m.host_goals < m.guest_goals THEN 0
END) AS num_points
FROM teams t
JOIN matches m
ON t.team_id = m.host_team
UNION
SELECT t.team_id, t.team_name,
(CASE WHEN m.guest_goals > m.host_goals THEN 3
WHEN m.guest_goals = m.host_goals THEN 1
WHEN m.guest_goals < m.host_goals THEN 0
END) AS num_points
FROM teams t
JOIN matches m
ON t.team_id = m.guest_team
) AS c
RIGHT JOIN teams t
ON t.team_id = c.team_id
GROUP BY t.team_id, t.team_name
ORDER BY COALESCE(SUM(num_points), 0) DESC, t.team_id
Strange Coder's Solution
From Strange Coder
select team_id, team_name,
coalesce(sum(case when team_id = host_team then
(
case when host_goals > guest_goals then 3
when host_goals = guest_goals then 1
when host_goals < guest_goals then 0
end
)
when team_id = guest_team then
(
case when guest_goals > host_goals then 3
when guest_goals = host_goals then 1
when guest_goals < host_goals then 0
end
)
end), 0) as num_points
from Teams
left join Matches
on
Teams.team_id = Matches.host_team
or Teams.team_id = Matches.guest_team
group by team_id, team_name
order by num_points desc, team_id;
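I have figured it out: I should have used UNION ALL instead of UNION. UNION removes duplicate rows, so when a team earns the same points in two of its home matches (or two of its away matches), the identical (team_id, team_name, num_points) rows collapse into one and those points are counted only once. UNION ALL keeps every row.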
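A minimal sketch of the difference, assuming Python's sqlite3 module and a hypothetical two-match dataset in which team 10 wins both of its home games. The query is a simplified form of the asker's (the inner rows carry only team_id and points, and a LEFT JOIN from teams replaces the RIGHT JOIN), run once with UNION and once with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (team_id INTEGER NOT NULL UNIQUE, team_name TEXT NOT NULL);
CREATE TABLE matches (
    match_id INTEGER NOT NULL UNIQUE, host_team INTEGER NOT NULL,
    guest_team INTEGER NOT NULL, host_goals INTEGER NOT NULL,
    guest_goals INTEGER NOT NULL);
INSERT INTO teams VALUES (10, 'Give'), (20, 'Never');
-- Hypothetical data: team 10 wins both home games, so both host rows are (10, 3).
INSERT INTO matches VALUES (1, 10, 20, 2, 0), (2, 10, 20, 1, 0);
""")

# {set_op} is filled with UNION or UNION ALL to compare the two.
QUERY = """
SELECT t.team_id, t.team_name, COALESCE(SUM(c.num_points), 0) AS num_points
FROM teams t
LEFT JOIN (
    SELECT m.host_team AS team_id,
           CASE WHEN m.host_goals > m.guest_goals THEN 3
                WHEN m.host_goals = m.guest_goals THEN 1
                ELSE 0 END AS num_points
    FROM matches m
    {set_op}
    SELECT m.guest_team,
           CASE WHEN m.guest_goals > m.host_goals THEN 3
                WHEN m.guest_goals = m.host_goals THEN 1
                ELSE 0 END
    FROM matches m
) AS c ON t.team_id = c.team_id
GROUP BY t.team_id, t.team_name
ORDER BY num_points DESC, t.team_id
"""

with_union = conn.execute(QUERY.format(set_op="UNION")).fetchall()
with_union_all = conn.execute(QUERY.format(set_op="UNION ALL")).fetchall()
print("UNION:    ", with_union)      # the two identical (10, 3) rows were merged
print("UNION ALL:", with_union_all)  # both wins are counted
```

With UNION, team 10 ends up with 3 points instead of 6, which is exactly the kind of data the grader's hidden tests would catch.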
Alternatively, you can simply unpivot your results with CROSS APPLY instead of using UNION. There is also no need to handle losses in the CASE expression: you're going to SUM() the results, and a loss (which yields NULL here) doesn't change the sum.
Calculate Total Points per Team
DROP TABLE IF EXISTS #Team
DROP TABLE IF EXISTS #Match
CREATE TABLE #Team (team_id INT, team_name VARCHAR(100))
INSERT INTO #Team VALUES (10,'Give'),(20,'Never'),(30,'You'),(40,'Up'),(50,'Gonna')
CREATE TABLE #Match (match_id INT,host_team INT,guest_team INT,host_goals INT,guest_goals INT)
INSERT INTO #Match VALUES
(1,30,20,1,0)
,(2,10,20,1,2)
,(3,20,50,2,2)
,(4,10,30,1,0)
,(5,30,50,0,1)
;WITH cte_TotalPoints AS
(
SELECT C.team_id,SUM(C.Points) AS TotalPoints
FROM #Match AS A
CROSS APPLY (
SELECT host_points = CASE
WHEN A.host_goals > A.guest_goals THEN 3
WHEN A.host_goals = A.guest_goals THEN 1
END
,guest_points = CASE
WHEN A.guest_goals > A.host_goals THEN 3
WHEN A.host_goals = A.guest_goals THEN 1
END
) AS B
CROSS APPLY (
VALUES
(host_team,host_points)
,(guest_team,guest_points)
) AS C(team_id,points)
GROUP BY c.team_id
)
SELECT A.team_id
,A.team_name
,TotalPoints = ISNULL(TotalPoints,0)
FROM #Team AS A
LEFT JOIN cte_TotalPoints AS B
ON A.team_id = B.team_id
ORDER BY ISNULL(B.TotalPoints,0) DESC, A.team_id

Separate columns for product counts using CTEs

I'm asking this question again because my previous post did not follow the community rules.
I first tried to write a PIVOT statement to get the desired output. However, I am now trying to approach this using CTEs.
Here's the raw data. Let's call it ProductMaster:
PRODUCT_NUM  CO_CD  PROD_CD     MASTER_ID    Date      ROW_NUM
1854         MAWC   STATIONERY  10003493039  1/1/2021  1
1567         PREF   PRINTER     10003493039  2/1/2021  2
2151         MAWC   STATIONERY  10003497290  3/2/2021  1
I need the count of each product for every household from this data, in separate columns: PRINTER_CT and STATIONERY_CT.
Each MASTER_ID represents a household, and a household can have multiple products (four or even more, although I have simplified this example). So each household should be one row in my final output, with the product counts in separate columns.
I'm writing a query with CTEs to produce this output, in which each row is grouped by MASTER_ID:
ORGL_CO_CD  ORGL_PROD_CD  STATIONERY_CT  PRINTER_CT
MAWC        STATIONERY    1              1
MAWC        STATIONERY    1              0
Here's my query. I'm not sure where to introduce the STATIONERY_CT column:
WITH CTE AS
(
SELECT
CO_CD, Prod_CD, MASTER_ID,
'' as S1_CT, '' as P1_CT
FROM
ProductMaster
WHERE
ROW_NUM = 1
), CTE_2 AS
(
SELECT Prod_CD, MASTER_ID
FROM ProductMaster
WHERE ROW_NUM = 2
)
SELECT
CO_CD AS ORGL_CO_CD,
c.Prod_CD AS ORGL_PROD_CD,
(CASE WHEN c2.Prod_CD = 'PRINTER' THEN P1_CT = 1 END) PRINTER_CT
FROM
CTE AS c
LEFT OUTER JOIN
CTE_2 AS c2 ON c.MASTER_ID = c2.MASTER_ID
Any pointers would be appreciated.
Thank you!
You can solve this with just GROUP BY and SUM:
-- Test data (a table variable, so its name starts with @)
DECLARE @ProductMaster AS TABLE (PRODUCT_NUM INT, CO_CD VARCHAR(30), PROD_CD VARCHAR(30), MASTER_ID BIGINT)
INSERT @ProductMaster VALUES (1854, 'MAWC', 'STATIONERY', 10003493039)
INSERT @ProductMaster VALUES (1567, 'PREF', 'PRINTER', 10003493039)
INSERT @ProductMaster VALUES (2151, 'MAWC', 'STATIONERY', 10003497290)
SELECT
MASTER_ID,
SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
FROM @ProductMaster
GROUP BY MASTER_ID
The result is:
MASTER_ID    STATIONERY_CT  PRINTER_CT
10003493039  1              1
10003497290  1              0
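The asker's desired output also includes ORGL_CO_CD and ORGL_PROD_CD. One way to get them in the same GROUP BY, sketched here under the assumption that they come from each household's ROW_NUM = 1 row, is to add MAX(CASE ...) columns alongside the SUM(CASE ...) counts. The sketch uses Python's sqlite3 module as a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductMaster (
    PRODUCT_NUM INTEGER, CO_CD TEXT, PROD_CD TEXT,
    MASTER_ID INTEGER, "Date" TEXT, ROW_NUM INTEGER);
INSERT INTO ProductMaster VALUES
  (1854, 'MAWC', 'STATIONERY', 10003493039, '1/1/2021', 1),
  (1567, 'PREF', 'PRINTER',    10003493039, '2/1/2021', 2),
  (2151, 'MAWC', 'STATIONERY', 10003497290, '3/2/2021', 1);
""")

rows = conn.execute("""
SELECT MASTER_ID,
       -- Company/product from the household's ROW_NUM = 1 row; MAX skips the NULLs
       MAX(CASE WHEN ROW_NUM = 1 THEN CO_CD END)   AS ORGL_CO_CD,
       MAX(CASE WHEN ROW_NUM = 1 THEN PROD_CD END) AS ORGL_PROD_CD,
       -- One conditional count per product type
       SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
       SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END)    AS PRINTER_CT
FROM ProductMaster
GROUP BY MASTER_ID
ORDER BY MASTER_ID
""").fetchall()

for row in rows:
    print(row)
```

Each further product type is one more SUM(CASE ...) line, so no extra CTE per product is needed.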

SQL count of related products via foreign key

We need to build a tree view with the group structure and a count of all the products that fit into each level below it like:
1 - Drink related (120)
  > 11 - Plastic Cups (70)
    > 111 - Vending Machine Cups (20)
      > 1111 - Vending Machine Cups 100-150cc (12)
        o Cup 1
        o Cup 2
        etc.
    > 112 - Party Cups (25)
    > 113 - Children's Cups (25)
  > 12 - Paper Cups (50)
2 - Food related (198)
  > 21 - Plastic Plates (75)
  etc.
So, we have two tables: one holds products, the other holds a group structure for the products. Each product row contains only a direct foreign key to the ID of the group row it belongs to; for example, Product ID 42231 references Group ID 4 because it is a clear plastic cup of a certain size. A product can sit at any group level; it won't always be at the fourth level if it doesn't fit a specific category. (For example, a new line of drink cup may be dumped in Group ID 1 "Drink Related" until it gets its own category some months later.)
The group table (currently) has 1800 rows and basically forms a category tree. Each group ID is alphanumeric as some groups have too many variants to work with just numbers so:
ID  Gp1  Gp2  Gp3  Gp4   Desc
1   1    0    0    0     Drink Related
2   1    11   0    0     Plastic Cups
3   1    11   111  0     Vending machine cups
4   1    11   111  1111  Vending machine cups 100-150cc
If I only wanted to show the exact number of products in each ID, I could do something like this:
select *,
(select count(1)
from products
where groupID=g.id and isDeleted=0)
as groupProductCount
from groups g
order by g.group1, g.group2, g.group3, g.group4
...but I'm after a more recursive count where it shows the count for all products below the current level so at a glance I can see there are 120 drink-related products within group 1 and not 3 which are directly in group ID 1 at the moment.
Personally I think I'm going to have to get the DBA to add in the 4 group levels to the product record too as otherwise for each record in the group table I'd have to determine which level we're at (zeros in unused level, so a zero in group 4 means we're at level 3, zero in group 3 means we're at level 2, etc) and then scan through every product record (currently 10,000 and growing) to see if the group it falls in (read via the foreign key group ID) has a level that matches the current group level record I'm trying to count for.
I can't see that this can be achieved efficiently with just the group ID in the product record. Am I right here or am I missing something obvious?
Here is a brute-force solution. (I assume SQL Server, but it shouldn't be difficult to port to other databases.)
-- Build a delimited string from the group levels, e.g. '1|11|111|'
WITH DelimitedGroups AS (
SELECT Id
,Gp1 + '|' +
CASE WHEN Gp2 = '0' THEN '' ELSE Gp2 + '|' END +
CASE WHEN Gp3 = '0' THEN '' ELSE Gp3 + '|' END +
CASE WHEN Gp4 = '0' THEN '' ELSE Gp4 + '|' END AS Delim
FROM Groups
)
-- Pair each group with every group at or below it:
-- a descendant's delimited string starts with its ancestor's string
,ConnectedGroups AS (
SELECT DG1.Id AS TopId
, DG2.Id AS ConnectedId
FROM DelimitedGroups DG1
INNER JOIN DelimitedGroups DG2
ON LEFT(DG2.Delim, LEN(DG1.Delim)) = DG1.Delim
)
-- Count all products in each group's subtree.
-- Products.GroupId references Groups.Id, so join on ConnectedId;
-- COUNT(Products.GroupId) returns 0 for subtrees without products.
SELECT ConnectedGroups.TopId AS Id
,COUNT(Products.GroupId) AS ProductCount
FROM ConnectedGroups
LEFT JOIN Products
ON Products.GroupId = ConnectedGroups.ConnectedId
AND Products.isDeleted = 0
GROUP BY ConnectedGroups.TopId
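Here is a minimal sketch of the prefix-matching idea in Python's sqlite3 module, using the question's four example groups, a hypothetical fifth top-level group, and hypothetical products. It assumes, as in the question's own query, that Products.GroupId references Groups.Id, and it uses SQLite's || and substr() in place of SQL Server's + and LEFT(). The trailing '|' delimiter matters: without it, group '1' would also match a hypothetical top-level group '11'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Groups" (Id INTEGER PRIMARY KEY, Gp1 TEXT, Gp2 TEXT, Gp3 TEXT, Gp4 TEXT, "Desc" TEXT);
CREATE TABLE Products (Id INTEGER PRIMARY KEY, GroupId INTEGER, isDeleted INTEGER);
INSERT INTO "Groups" VALUES
  (1, '1', '0',  '0',   '0',    'Drink Related'),
  (2, '1', '11', '0',   '0',    'Plastic Cups'),
  (3, '1', '11', '111', '0',    'Vending machine cups'),
  (4, '1', '11', '111', '1111', 'Vending machine cups 100-150cc'),
  (5, '2', '0',  '0',   '0',    'Food related');
-- Hypothetical products: two in group 4, one in group 2, one in group 1, one deleted in group 3.
INSERT INTO Products VALUES (1, 4, 0), (2, 4, 0), (3, 2, 0), (4, 1, 0), (5, 3, 1);
""")

rows = conn.execute("""
WITH DelimitedGroups AS (
    -- e.g. '1|11|111|' for group 3
    SELECT Id,
           Gp1 || '|' ||
           CASE WHEN Gp2 = '0' THEN '' ELSE Gp2 || '|' END ||
           CASE WHEN Gp3 = '0' THEN '' ELSE Gp3 || '|' END ||
           CASE WHEN Gp4 = '0' THEN '' ELSE Gp4 || '|' END AS Delim
    FROM "Groups"
),
ConnectedGroups AS (
    -- every (ancestor, descendant-or-self) pair
    SELECT DG1.Id AS TopId, DG2.Id AS ConnectedId
    FROM DelimitedGroups AS DG1
    JOIN DelimitedGroups AS DG2
      ON substr(DG2.Delim, 1, length(DG1.Delim)) = DG1.Delim
)
SELECT cg.TopId, COUNT(p.Id) AS ProductCount
FROM ConnectedGroups AS cg
LEFT JOIN Products AS p
  ON p.GroupId = cg.ConnectedId AND p.isDeleted = 0
GROUP BY cg.TopId
ORDER BY cg.TopId
""").fetchall()

for group_id, count in rows:
    print(group_id, count)
```

The self-join is quadratic in the number of groups, but with about 1800 groups that is roughly 3.2 million string comparisons, and the product table does not need the extra level columns.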

MySQL Query - getting missing records when using group-by

I have a query:
select score, count(1) as 'NumStudents' from testresults where testid = 'mytestid'
group by score order by score
where the testresults table contains students' results on a test. A sample result looks like the following, assuming the maximum mark for the test is 10:
score  NumStudents
0      10
1      20
2      12
3      5
5      34
..
10     23
As you can see, this query does not return any rows for scores that no student achieved. For example, nobody scored 4/10, so there is no row for score = 4 in the output.
I would like to change the query so that these missing rows are returned with 0 as the value of NumStudents, so that the output has max + 1 rows, one for each possible score.
Any ideas ?
EDIT:
The database contains several tests, and the maximum mark is part of each test's definition. So a separate table storing all possible scores is not feasible: whenever I create a new test with a new maximum mark, I would have to make sure that table is updated to contain the new scores as well.
SQL is good at working with sets of data values in the database, but not so good at sets of data values that are not in the database.
The best workaround is to keep one small table for the values you need to range over:
CREATE TABLE ScoreValues (score int);
INSERT INTO ScoreValues (score)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
Given your comment that the max marks of a test are defined in another table, you can join to that table as follows, as long as ScoreValues contains values at least as high as the largest max marks of any test:
SELECT v.score, COUNT(tr.score) AS 'NumStudents'
FROM ScoreValues v
JOIN Tests t ON (v.score <= t.maxmarks)
LEFT OUTER JOIN TestResults tr ON (v.score = tr.score AND t.testid = tr.testid)
WHERE t.testid = 'mytestid'
GROUP BY v.score;
The most obvious way would be to create a table named "Scores" and left outer join your table to it.
SELECT s.score, COUNT(ts.score) AS scoreCount
FROM Scores AS s
LEFT OUTER JOIN testresults AS ts
ON s.score = ts.score AND ts.testid = 'mytestid'
GROUP BY s.score
If you don't want to create the table, you could pivot the counts into columns instead (this returns one wide row rather than one row per score):
SELECT
1 as score, SUM(CASE WHEN ts.score = 1 THEN 1 ELSE 0 END) AS scoreCount,
2 as score, SUM(CASE WHEN ts.score = 2 THEN 1 ELSE 0 END) AS scoreCount,
3 as score, SUM(CASE WHEN ts.score = 3 THEN 1 ELSE 0 END) AS scoreCount,
4 as score, SUM(CASE WHEN ts.score = 4 THEN 1 ELSE 0 END) AS scoreCount,
...
10 as score, SUM(CASE WHEN ts.score = 10 THEN 1 ELSE 0 END) AS scoreCount
FROM testScores AS ts
Does MySQL support set-returning functions? Recent releases of PostgreSQL have a function, generate_series(start, stop), that produces start on the first row, start+1 on the second, and so on up to stop. The advantage is that you can put this function in a subselect in the FROM clause and join to it, instead of creating and populating a table and joining to that as suggested by le dorfier and Bill Karwin.
Just as a mental exercise, I came up with this to generate a sequence in MySQL. It works as long as the number of tables in all databases on the server, squared, is at least the length of the sequence. I wouldn't recommend it for production, though ;)
SELECT @n := @n + 1 AS n
FROM (SELECT @n := -1) x, Information_Schema.Tables y, Information_Schema.Tables z
WHERE @n < 20; /* sequence from 0 to 20 inclusive */
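Another way to avoid a permanent helper table is to generate the score range at query time with a recursive CTE, which MySQL supports from version 8.0 (as do PostgreSQL and SQLite). A minimal sketch, using Python's sqlite3 module and hypothetical Tests and testresults rows with a maximum mark of 5; the Tests table and maxmarks column follow the earlier answer's assumption, and the student column is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tests (testid TEXT PRIMARY KEY, maxmarks INTEGER);
CREATE TABLE testresults (testid TEXT, student TEXT, score INTEGER);
INSERT INTO Tests VALUES ('mytestid', 5);
INSERT INTO testresults VALUES
  ('mytestid', 'a', 0), ('mytestid', 'b', 2), ('mytestid', 'c', 2),
  ('mytestid', 'd', 5), ('othertest', 'e', 3);
""")

rows = conn.execute("""
WITH RECURSIVE ScoreValues(score, maxmarks) AS (
    -- Anchor: score 0, carrying this test's maximum mark along
    SELECT 0, maxmarks FROM Tests WHERE testid = :testid
    UNION ALL
    -- Recursive step: add 1 until the maximum mark is reached
    SELECT score + 1, maxmarks FROM ScoreValues WHERE score < maxmarks
)
SELECT v.score, COUNT(tr.score) AS NumStudents
FROM ScoreValues AS v
LEFT JOIN testresults AS tr
  ON tr.score = v.score AND tr.testid = :testid
GROUP BY v.score
ORDER BY v.score
""", {"testid": "mytestid"}).fetchall()

for score, num_students in rows:
    print(score, num_students)
```

COUNT(tr.score) rather than COUNT(1) is what turns the unmatched scores into 0, and putting the testid filter in the ON clause (not WHERE) keeps those unmatched rows in the LEFT JOIN.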