SQL: Values in column listed twice after pivot - sql

When querying a specific table, I need to change the structure of the result, making it so that all values from a given year are on the same row, in separate columns that identify the category that the value belongs to.
The table looks like this (example data):
year | category | amount
1991 | A of s | 56
1992 | A of s | 55
1993 | A of s | 40
1994 | A of s | 51
1995 | A of s | 45
1991 | Total | 89
1992 | Total | 80
1993 | Total | 80
1994 | Total | 81
1995 | Total | 82
The result I need is this:
year | a_of_s | total
1991 | 56 | 89
1992 | 55 | 80
1993 | 40 | 80
1994 | 51 | 81
1995 | 45 | 82
From what I can understand I need to use pivot. However, my problem seems to be that I don't understand pivot. I've attempted to adapt the queries of solutions in similar questions where pivot seems to be part of the answer, and so far what I've come up with is this:
SELECT year, [A of s], [Total] FROM table
pivot (
max(amount)
FOR category in ([A of s], [Total])
) pvt
ORDER BY year
This returns the correct table structure, but all cells in the columns a_of_s and total are NULL, and every year is listed twice. What am I missing to get the result I need?
EDIT: After fixing the errors pointed out in the comments, the only real issue that remains is that years in the year column are listed twice.
Possibly related: Is the aggregate function I use in pivot (max, sum, min, etc) arbitrary?

You don't necessarily need PIVOT for this result. An alternative approach is to join the table to itself, taking one category from each side.
This is the query I wrote that returns the result you asked for:
;With cte as
(
select year, Amount from tbl
where category = 'A of s'
)
select
tbl1.year, tbl2.Amount as A_of_S, tbl1.Amount as Total
from tbl as tbl1
inner join cte as tbl2 on tbl1.year = tbl2.year
where tbl1.category = 'Total'
And this is the SQL Fiddle I created for you with your test data. -> SQL fiddle

Much simpler answer:
WITH VTE AS(
SELECT *
FROM (VALUES (1991,'A of s',56),
(1992,'A of s',55),
(1993,'A of s',40),
(1994,'A of s',51),
(1995,'A of s',45),
(1991,'Total',89),
(1992,'Total',80),
(1993,'Total',80),
(1994,'Total',81),
(1995,'Total',82)) V([year],category, amount))
SELECT [year],
MAX(CASE category WHEN 'A of s' THEN amount END) AS [A of s],
MAX(CASE category WHEN 'Total' THEN amount END) AS Total
FROM VTE
GROUP BY [year];
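On the side question of whether the aggregate is arbitrary: with exactly one row per (year, category), as in the sample data, MAX, MIN and SUM all return that single value, so the choice makes no difference here. It would matter as soon as a pair had several rows. A minimal sketch of the conditional aggregation above, assuming SQLite through Python's sqlite3 module as a stand-in for SQL Server and a table named tbl (both are assumptions for illustration):

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the table name "tbl" is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (year INTEGER, category TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1991, "A of s", 56), (1992, "A of s", 55), (1993, "A of s", 40),
     (1994, "A of s", 51), (1995, "A of s", 45),
     (1991, "Total", 89), (1992, "Total", 80), (1993, "Total", 80),
     (1994, "Total", 81), (1995, "Total", 82)],
)

def pivot_with(agg):
    # agg is only ever taken from the fixed tuple below, never from user input.
    sql = f"""
        SELECT year,
               {agg}(CASE category WHEN 'A of s' THEN amount END) AS a_of_s,
               {agg}(CASE category WHEN 'Total'  THEN amount END) AS total
        FROM tbl
        GROUP BY year
        ORDER BY year
    """
    return conn.execute(sql).fetchall()

# Run the same pivot with three different aggregates.
results = {agg: pivot_with(agg) for agg in ("MAX", "MIN", "SUM")}
pivoted = results["MAX"]
for row in pivoted:
    print(row)
```

Each year appears once because the only grouping column is year.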

Related

How to find the average by day in SQL?

I am super new to SQL and am trying to figure out how to find the average by day. So YTD what were they averaging by day.
the table below is an example of the table I am working with
Study Date | ID | Subject
01\01\2018 | 123 | Math
01\01\2018 | 456 | Science
01\02\2018 | 789 | Science
01\02\2018 | 012 | History
01\03\2018 | 345 | Science
01\03\2018 | 678 | History
01\03\2018 | 921 | Art
01\03\2018 | 223 | Science
01\04\2018 | 256 | English
For instance, if I filter on just 'Science' as the Subject, the output I am looking for is: out of the 4 Science results, what are the daily average, min and max YTD?
So my max in a day would be 2 Science subjects, my min would be 1, etc.
How can I write a query to output this information?
So far I know to get the YTD total it would be
select SUBJECT, count (ID)
from table
where SUBJECT='science' and year (Study_date)=2022
group by SUBJECT
what would be the next step I have to take?
If you want the min/max of the daily subject count, then you need two levels of aggregation:
select subject, sum(cnt_daily) as cnt,
min(cnt_daily) as min_cnt_daily, max(cnt_daily) as max_cnt_daily
from (
select study_date, subject, count(*) as cnt_daily
from mytable
where study_date >= '2022-01-01'
group by study_date, subject
) t
group by subject
The subquery aggregates by day and by subject, and computes the number of occurrences in each group. Then, the outer query groups by subject only, and computes the total count (that's the sum of the intermediate counts), along with the min/max daily values.
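The same two-level pattern can also return the daily average the question asks for, by adding AVG over the per-day counts. A minimal sketch, assuming SQLite through Python's sqlite3 module, ISO-formatted date strings, and a date filter moved to 2018 so that it matches the sample rows (all three are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (study_date TEXT, id TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("2018-01-01", "123", "Math"), ("2018-01-01", "456", "Science"),
     ("2018-01-02", "789", "Science"), ("2018-01-02", "012", "History"),
     ("2018-01-03", "345", "Science"), ("2018-01-03", "678", "History"),
     ("2018-01-03", "921", "Art"), ("2018-01-03", "223", "Science"),
     ("2018-01-04", "256", "English")],
)

# Inner query: one row per (day, subject). Outer query: one row per subject.
rows = conn.execute("""
    SELECT subject,
           SUM(cnt_daily) AS cnt,
           MIN(cnt_daily) AS min_cnt_daily,
           MAX(cnt_daily) AS max_cnt_daily,
           AVG(cnt_daily) AS avg_cnt_daily
    FROM (
        SELECT study_date, subject, COUNT(*) AS cnt_daily
        FROM mytable
        WHERE study_date >= '2018-01-01'
        GROUP BY study_date, subject
    ) t
    GROUP BY subject
    ORDER BY subject
""").fetchall()

stats = {subject: rest for subject, *rest in rows}
print(stats["Science"])
```

For Science this averages over the three days on which the subject occurs (counts 1, 1 and 2), so the average is 4/3. Days with no Science rows do not appear in the inner query and are therefore not counted as zero.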
select Subject
,count(*) as total_count
,min(cnt) as min_daily_count
,max(cnt) as max_daily_count
,avg(cnt*1.0) as avg_daily_count
from
(
select *
,count(*) over(partition by Study_Date, Subject) as cnt
from t
) t
group by Subject
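Subject | total_count | min_daily_count | max_daily_count | avg_daily_count
Art | 1 | 1 | 1 | 1.000000
English | 1 | 1 | 1 | 1.000000
History | 2 | 1 | 1 | 1.000000
Math | 1 | 1 | 1 | 1.000000
Science | 4 | 1 | 2 | 1.500000
Fiddle
Note that avg_daily_count here is computed over rows, not over days: the window count is repeated on every row, so the day with two Science rows contributes its 2 twice, giving (1 + 1 + 2 + 2) / 4 = 1.5. Averaging over the three distinct days would give 4 / 3.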

SQL MIN() with GROUP BY select additional columns

I am trying to query a sql database table for the minimum price for products. I also want to grab an additional column with the value of the row with the minimum price. My data looks something like this.
ProductId | Price | Location
1 | 50 | florida
1 | 55 | texas
1 | 53 | california
2 | 65 | florida
2 | 64 | texas
2 | 60 | new york
I can query the minimum price for a product with this query
select ProductId, Min(Price)
from Table
group by ProductId
What I want to do is also include the Location where the Min price is being queried from in the above query. Is there a standard way to achieve this?
One method uses a correlated subquery:
select t.*
from t
where t.price = (select min(t2.price) from t t2 where t2.productid = t.productid);
In most databases, this has very good performance with an index on (productid, price).
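To see the correlated subquery at work on the sample data, here is a minimal sketch, assuming SQLite through Python's sqlite3 module and a hypothetical index name idx_t_product_price (both are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (productid INTEGER, price INTEGER, location TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 50, "florida"), (1, 55, "texas"), (1, 53, "california"),
     (2, 65, "florida"), (2, 64, "texas"), (2, 60, "new york")],
)
# The index suggested in the answer; its name is made up for this sketch.
conn.execute("CREATE INDEX idx_t_product_price ON t (productid, price)")

# Keep only the rows whose price equals the minimum price of their product.
cheapest = conn.execute("""
    SELECT t.productid, t.price, t.location
    FROM t
    WHERE t.price = (SELECT MIN(t2.price)
                     FROM t AS t2
                     WHERE t2.productid = t.productid)
    ORDER BY t.productid
""").fetchall()

for row in cheapest:
    print(row)
```

If two locations share the minimum price for a product, this form returns both rows.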

SQL Finding sum of rows and returning count of keys

For a database table looking something like this:
id | year | stint | sv
----+------+-------+---
mk1 | 2001 | 1 | 30
mk1 | 2001 | 2 | 20
ml0 | 1999 | 1 | 43
ml0 | 2000 | 1 | 44
hj2 | 1993 | 1 | 70
I want to get the following output:
count
-------
3
with the conditions being count the number of ids that have a sv > 40 for a single year greater than 1994. If there is more than one stint for the same year, add the sv points and see if > 40.
This is what I have written so far but it is obviously not right:
SELECT COUNT(DISTINCT id),
SUM(sv) as SV
FROM public.pitching
WHERE (year > 1994 AND sv >40);
I know the syntax is completely wrong and some of the conditions' information is missing but I'm not familiar enough with SQL and don't know how to properly do the summing of two rows in the same table with a condition (maybe with a subquery?). Any help would be appreciated! (using postgres)
You could use a nested query to get the aggregations, and wrap that for getting the count. Note that the condition on the sum must be in a having clause:
SELECT COUNT(id)
FROM (
SELECT id,
year,
SUM(sv) as SV
FROM public.pitching
WHERE year > 1994
GROUP BY id,
year
HAVING SUM(sv) > 40 ) years
If an id should only count once even if it fulfils the condition in more than one year, then use COUNT(DISTINCT id) instead of COUNT(id).
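The difference between the two counts shows up on the sample data, where ml0 qualifies in two separate years. A minimal sketch, assuming SQLite through Python's sqlite3 module instead of Postgres and therefore an unqualified table name pitching (both are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pitching (id TEXT, year INTEGER, stint INTEGER, sv INTEGER)")
conn.executemany(
    "INSERT INTO pitching VALUES (?, ?, ?, ?)",
    [("mk1", 2001, 1, 30), ("mk1", 2001, 2, 20),
     ("ml0", 1999, 1, 43), ("ml0", 2000, 1, 44),
     ("hj2", 1993, 1, 70)],
)

# Inner query: one row per (id, year) whose summed sv exceeds 40.
inner = """
    SELECT id, year, SUM(sv) AS sv
    FROM pitching
    WHERE year > 1994
    GROUP BY id, year
    HAVING SUM(sv) > 40
"""

# Counts every qualifying (id, year) pair.
count_pairs = conn.execute(f"SELECT COUNT(id) FROM ({inner}) years").fetchone()[0]
# Counts each id once, however many years it qualifies in.
count_ids = conn.execute(f"SELECT COUNT(DISTINCT id) FROM ({inner}) years").fetchone()[0]

print(count_pairs, count_ids)
```

The pair count matches the expected output of 3 from the question: mk1 in 2001 (30 + 20), ml0 in 1999 and ml0 in 2000.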
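You can also use SUM as a window function. Note that the partition has to include id as well as year, otherwise the sv values of different ids in the same year are added together:
select count(*) from
(
select distinct id, year, sum(sv) over (partition by id, year) s
from public.pitching
where year > 1994
) t where s>40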
Online Demo

Impala: change the column type prior to perform the aggregation function for group by

I have a table, my_table:
transaction_id | money | team
--------------------------------------------
1 | 10 | A
2 | 20 | B
3 | null | A
4 | 30 | A
5 | 16 | B
6 | 12 | B
When I group by team, I can compute max, min through query:
select team, max(money), min(money) from my_table group by team
However, I can't do avg and sum because there is null. i.e:
select team, avg(money), sum(money) from my_table group by team
would fail.
Is there a way to change the column type prior to computing the avg and sum? i.e. I want the output to be:
team | avg(money) | sum(money)
--------------------------------------
A | 20 | 40
B | 16 | 48
Thanks!
Per the documentation provided by Cloudera, your query should be working as-is. Both the AVG function and the SUM function ignore NULL.
SELECT team, AVG(money), SUM(money)
FROM my_table
GROUP BY team
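The NULL-skipping behaviour is easy to check on the sample data. A minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in (this is not Impala, so it only illustrates the standard SQL behaviour, not Impala's type handling):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (transaction_id INTEGER, money INTEGER, team TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 10, "A"), (2, 20, "B"), (3, None, "A"),
     (4, 30, "A"), (5, 16, "B"), (6, 12, "B")],
)

# COUNT(money) skips NULLs just like AVG and SUM do; COUNT(*) counts every row.
rows = conn.execute("""
    SELECT team, AVG(money), SUM(money), COUNT(money), COUNT(*)
    FROM my_table
    GROUP BY team
    ORDER BY team
""").fetchall()

for row in rows:
    print(row)
```

Team A has three rows but only two non-NULL values, so its average is 40 / 2 = 20, which is the output the question asks for.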
UPDATE: Per your comment, again I'm not familiar with Impala. Presumably standard SQL will work. Your error appears to be a datatype issue.
SELECT team, AVG(CAST(money AS INT)), SUM(CAST(money AS INT))
FROM my_table
GROUP BY team
Just divide the sum by the count:
SELECT team, SUM(money)/COUNT(money) AS AVG, SUM(money)
FROM my_table
GROUP BY team
Tested here: http://sqlfiddle.com/#!9/ba381/4

Comparing in SQL and SUM

I really couldn't figure out a good title for this question, but I have a problem that I'm sure you can help me with!
I have a query which outputs something like this:
Month | Year | Subcategory | PrivateLabel | Price
-------------------------------------------------
1 | 2010 | 666 | No | -520
1 | 2010 | 666 | No | -499,75
1 | 2010 | 666 | No | -59,95
1 | 2010 | 666 | No | -49,73
1 | 2010 | 666 | No | -32,95
I want to SUM on the price because all the other data is the same. I thought I could do this with SUM and GROUP BY, but I can't figure out how to do it or at least it doesn't output the right result.
The query is an inner join between two tables, if that helps.
select
month
,year
,subcategory
,privatelabel
,sum(price) as [total sales]
from
a inner join b ...
where
any where clauses
group by
month
,year
,subcategory
,privatelabel
This should work if I am understanding you correctly. Every column in the select list needs to be either part of the GROUP BY or wrapped in an aggregate function that is computed over all rows in the group.
I added a fiddle, mainly because I didn't know about the text-to-DDL functionality and wanted to test it ;-) (thanks Michael Buen)
http://sqlfiddle.com/#!3/35c1c/1
note the where clause is a place holder..
select month, year, subcategory, privatelabel, sum(price)
from (put your query in here) dummyName
group by month, year, subcategory, privatelabel
The basic idea is that it runs your current query to get the above output, then does the SUM and GROUP BY on the result.
Your query has to be in parentheses and you have to give it a name, e.g. dummyName. As long as the name is unique within the statement and preferably not a keyword, it doesn't matter what it is.
There might be a way of doing all this in one go, but without the sql for your query we can't help.
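A minimal sketch of the wrapping approach on the sample rows, assuming SQLite through Python's sqlite3 module, prices written with decimal points, and a hypothetical single table named sales standing in for the result of the original join (the real query was not posted, so all of these are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "sales" is a made-up table that plays the role of the original query's output.
conn.execute(
    "CREATE TABLE sales (month INTEGER, year INTEGER, subcategory INTEGER, "
    "privatelabel TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [(1, 2010, 666, "No", -520.0), (1, 2010, 666, "No", -499.75),
     (1, 2010, 666, "No", -59.95), (1, 2010, 666, "No", -49.73),
     (1, 2010, 666, "No", -32.95)],
)

# The inner SELECT is the placeholder for "your query"; dummyName is its alias.
rows = conn.execute("""
    SELECT month, year, subcategory, privatelabel, SUM(price) AS total_sales
    FROM (
        SELECT month, year, subcategory, privatelabel, price
        FROM sales
    ) AS dummyName
    GROUP BY month, year, subcategory, privatelabel
""").fetchall()

print(rows)
```

The five identical groups collapse into a single row whose total_sales is the sum of the five prices.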