SQL: Aggregations over partition by - sql

Query: Considering only Italian routes, for each category of goods and for each year
select the average daily income for each month
and the total monthly income since the beginning of the year.
SQL:
SELECT
gc.GoodCategory,
tm.Month,
tm.Year,
SUM(ro.Income) / COUNT(DISTINCT tm.Date),
SUM(ro.Income) OVER (PARTITION BY gc.GoodCategory, tm.Year
ORDER BY tm.Month ROWS UNBOUNDED PRECEDING)
FROM FactRoutes ro,
DimLocation dp,
DimLocation ds,
DimGoodCategory gc,
DimTime tm
WHERE ro.DepartureID = dp.LocationID
AND ro.DestinationID = ds.LocationID
AND ro.GoodCategoryID = gc.GoodCategoryID
AND ro.GoodTimeID = tm.GoodTimeID
AND dp.State = 'Italy'
AND ds.State = 'Italy'
GROUP BY gc.GoodCategory,
tm.Month,
tm.Year;
But I am getting the error below:
Column 'FactRoutes.Income' is invalid in the select list
because it is not contained in either an aggregate function
or the GROUP BY clause.
What's the best way to handle it?

I think that you want:
SELECT
gc.GoodCategory,
tm.Month,
tm.Year,
SUM(ro.Income) / COUNT(DISTINCT tm.Date),
SUM(SUM(ro.Income)) OVER (PARTITION BY gc.GoodCategory, tm.Year ORDER BY tm.Month)
FROM FactRoutes ro
INNER JOIN DimLocation dp ON ro.DepartureID = dp.LocationID
INNER JOIN DimLocation ds ON ro.DestinationID = ds.LocationID
INNER JOIN DimGoodCategory gc ON ro.GoodCategoryID = gc.GoodCategoryID
INNER JOIN DimTime tm ON ro.GoodTimeID = tm.GoodTimeID
WHERE dp.State = 'Italy' AND ds.State = 'Italy'
GROUP BY gc.GoodCategory, tm.Month, tm.Year;
The main point is that, to make this a valid aggregate query, you need to put an aggregate function inside the window function: SUM(SUM(ro.Income)) OVER (...) instead of just SUM(ro.Income) OVER (...). The inner SUM computes the total of each group, and the window SUM then adds up those totals over the current and preceding groups.
Other notable points:
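always use explicit joins (with the ON keyword) rather than old-school, implicit joins (with commas in the FROM clause), a syntax that fell out of favor decades ago
ROWS UNBOUNDED PRECEDING is not needed here; when a window function has an ORDER BY clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives the same running total because tm.Month is unique within each partition after grouping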
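To see the nested aggregate at work, here is a minimal sketch that runs the same pattern against SQLite through Python's sqlite3 module. The income table, its columns, and the sample rows are made up for illustration, and it assumes the bundled SQLite is 3.25 or later (the first version with window functions). SQL Server may differ in details, but the SUM(SUM(...)) OVER (...) idea is the same.

```python
import sqlite3

# Hypothetical, flattened stand-in for the fact and dimension tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE income (category TEXT, year INT, month INT, day INT, amount REAL)"
)
conn.executemany(
    "INSERT INTO income VALUES (?, ?, ?, ?, ?)",
    [
        ("Food", 2020, 1, 1, 10.0),
        ("Food", 2020, 1, 2, 30.0),
        ("Food", 2020, 2, 1, 5.0),
        ("Food", 2020, 3, 1, 20.0),
        ("Food", 2020, 3, 1, 20.0),
    ],
)

# Inner SUM: total per (category, year, month) group.
# Outer SUM ... OVER: running total of those group totals within the year.
rows = conn.execute(
    """
    SELECT category, year, month,
           SUM(amount) / COUNT(DISTINCT day) AS avg_daily,
           SUM(SUM(amount)) OVER (PARTITION BY category, year
                                  ORDER BY month) AS year_to_date
    FROM income
    GROUP BY category, year, month
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row is one month; the last column accumulates the monthly totals from January onward.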

Related

Group by after a partition by in MS SQL Server

I am working on some car accident data and am stuck on how to get the data in the form I want.
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
This is my code, which counts the accidents for each sex and each severity. I know I can do this with group by, but I wanted to use partition by so that I can work out percentages too.
However, I get a very large table (I assume one row for each underlying row of each sex/severity). When I do the following:
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
group by
sex_of_driver,
accident_severity
I get this:
sex_of_driver  accident_severity  (No column name)
1              1                  1
1              2                  1
-1             2                  1
-1             1                  1
1              3                  1
I won't give you the whole table, but basically, the group by has caused the count to just be 1.
I can't figure out why group by isn't working. Is this an MS SQL Server thing?
I want to get the same result as below (obviously without the CASE expressions):
select
accident.accident_severity,
count(accident.accident_severity) as num_accidents,
vehicle.sex_of_driver,
CASE vehicle.sex_of_driver WHEN '1' THEN 'Male' WHEN '2' THEN 'Female' end as sex_col,
CASE accident.accident_severity WHEN '1' THEN 'Fatal' WHEN '2' THEN 'Serious' WHEN '3' THEN 'Slight' end as serious_col
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
where
sex_of_driver != 3
and
sex_of_driver != -1
group by
accident.accident_severity,
vehicle.sex_of_driver
order by
accident.accident_severity
You seem to have a misunderstanding here.
GROUP BY will reduce your rows to a single row per grouping (i.e. per pair of sex_of_driver, accident_severity values). Any normal aggregates you use with this, such as COUNT(*), will return the aggregate value within that group.
Whereas OVER gives you a windowed aggregate, which is calculated after GROUP BY has reduced your rows. Therefore when you write count(accident_severity) over (partition by sex_of_driver, accident_severity) the aggregate only receives a single row in each partition, because the rows have already been reduced.
You say "I know I can do this with group by but I wanted to use a partition by in order to work out % too." but you are misunderstanding how to do that. You don't need PARTITION BY to work out percentage. All you need to calculate a percentage over the whole resultset is COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), in other words a windowed aggregate over a normal aggregate.
Note also that count(accident_severity) does not give you the number of distinct accident_severity values; it gives you the number of non-null values, which is probably not what you intend. You also have a very strange join predicate; you probably want something like a.vehicle_id = v.vehicle_id.
So you want something like this:
select
sex_of_driver,
accident_severity,
count(*) as Count,
count(*) * 1.0 /
sum(count(*)) over (partition by sex_of_driver) as PercentOfSex,
count(*) * 1.0 /
sum(count(*)) over () as PercentOfTotal
from
dbo.accident as a
inner join dbo.vehicle as v on
a.vehicle_id = v.vehicle_id
group by
sex_of_driver,
accident_severity;
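As a rough check of both claims (a windowed count after GROUP BY sees one row per partition, and a windowed SUM over COUNT(*) yields the percentages), here is a small sketch using Python's sqlite3 module. The driver_accident table and its rows are invented for the example, and it assumes SQLite 3.25 or later; SQL Server should behave the same way for these expressions.

```python
import sqlite3

# Hypothetical pre-joined table: one row per driver involved in an accident.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_accident (sex INT, severity INT)")
conn.executemany(
    "INSERT INTO driver_accident VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 2), (1, 2), (2, 2), (2, 3), (2, 3), (2, 3)],
)

rows = conn.execute(
    """
    SELECT sex, severity,
           COUNT(*) AS n,
           -- Window runs after grouping, so each partition holds one row.
           COUNT(*) OVER (PARTITION BY sex, severity) AS windowed_count,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY sex) AS pct_of_sex,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct_of_total
    FROM driver_accident
    GROUP BY sex, severity
    ORDER BY sex, severity
    """
).fetchall()

for row in rows:
    print(row)
```

The windowed_count column is 1 on every row, which matches the table in the question, while the two percentage columns are computed from the grouped counts.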

Oracle: select just last update of date

I have the following query, which returns 100 rows:
SELECT uni_id, uni_mast_id, uni_type
FROM UNIVERSITIES
WHERE uni_master = 'SO88' AND uni_stat = 'OK'
Now I need to join another table and get the last entry for each day:
SELECT uni_id, uni_teach_name, MAX(cal_update), cal_status
FROM UNIVERSITIES
LEFT JOIN CALENDAR
ON uni_id = cal_id
WHERE uni_master = 'SO88'
AND uni_stat = 'OK'
AND cal_name = 'REGISTRED'
GROUP BY uni_id, uni_teach_name, uni_stat
ORDER BY cal_update
but this query gives me 102 records, because cal_update appears twice for some uni_id values.
For example, one row has the date 22-OCT-2020 11:34:55 and another, for the same uni_id, has 22-OCT-2020 11:30:22.
I just want the max date for that day, not both.
The query with the join should return the same number of records as the first select query.
I think you can do what you want using row_number():
SELECT UNI_ID, UNI_TEACH_NAME, CAL_UPDATE, CAL_STATUS
FROM (SELECT U.UNI_ID, U.UNI_TEACH_NAME, C.CAL_UPDATE, C.CAL_STATUS,
ROW_NUMBER() OVER (PARTITION BY U.UNI_ID, TRUNC(C.CAL_UPDATE) ORDER BY C.CAL_UPDATE DESC) as seqnum
FROM UNIVERSITIES U LEFT JOIN
CALENDAR C
ON U.UNI_ID = C.CAL_ID AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88' AND
U.UNI_STAT= 'OK'
) UC
WHERE seqnum = 1;
I have to guess where the columns come from, because the question is not clear. Any filtering columns from CALENDAR should be in the ON clause if you are using a LEFT JOIN.
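Here is a small sketch of both points (keeping only the latest row per university per day, and where the CALENDAR filter goes), run against SQLite via Python's sqlite3 module. The table contents are made up, SQLite's date() stands in for Oracle's TRUNC(), and SQLite 3.25 or later is assumed for ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE universities (uni_id INT, uni_teach_name TEXT);
    CREATE TABLE calendar (cal_id INT, cal_name TEXT, cal_update TEXT, cal_status TEXT);
    INSERT INTO universities VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO calendar VALUES
        (1, 'REGISTRED', '2020-10-22 11:34:55', 'OK'),
        (1, 'REGISTRED', '2020-10-22 11:30:22', 'PENDING'),
        (2, 'OTHER',     '2020-10-22 09:00:00', 'OK');
    """
)

# Same query shape twice; only the position of the cal_name filter changes.
TEMPLATE = """
SELECT uni_id, cal_update, cal_status
FROM (SELECT u.uni_id, c.cal_update, c.cal_status,
             ROW_NUMBER() OVER (PARTITION BY u.uni_id, date(c.cal_update)
                                ORDER BY c.cal_update DESC) AS seqnum
      FROM universities u
      LEFT JOIN calendar c ON u.uni_id = c.cal_id {on_filter}
      {where_filter}) uc
WHERE seqnum = 1
ORDER BY uni_id
"""

filter_in_on = conn.execute(
    TEMPLATE.format(on_filter="AND c.cal_name = 'REGISTRED'", where_filter="")
).fetchall()
filter_in_where = conn.execute(
    TEMPLATE.format(on_filter="", where_filter="WHERE c.cal_name = 'REGISTRED'")
).fetchall()

print(filter_in_on)     # university 2 is kept, with NULL calendar columns
print(filter_in_where)  # university 2 is filtered out
```

With the filter in the ON clause every university survives the join; with the filter in the WHERE clause the unmatched university disappears, which is the inner-join behaviour described above.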
You can replace the last part of the query, after aliasing MAX(cal_update) as cal_update, with
ORDER BY cal_update DESC
FETCH FIRST 1 ROW WITH TIES
on Oracle 12c and later. This sorts by that column in descending order and picks the record with the latest value.
The WITH TIES option returns all records that share the same latest datetime value; replace it with ONLY to return a single row even when there are ties.
The column cal_status in the select list should be removed, because it is a non-aggregated column that is not in the GROUP BY.
As an alternative to a subquery and rank, you could use KEEP...LAST :
SELECT U.UNI_ID,
U.UNI_TEACH_NAME,
MAX(C.CAL_UPDATE) AS CAL_UPDATE,
MAX(C.CAL_STATUS) KEEP (DENSE_RANK LAST ORDER BY C.CAL_UPDATE) AS CAL_STATUS
FROM UNIVERSITIES U
LEFT JOIN CALENDAR C
ON U.UNI_ID = C.CAL_ID
AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88'
AND U.UNI_STAT= 'OK'
GROUP BY U.UNI_ID,
U.UNI_TEACH_NAME,
TRUNC(C.CAL_UPDATE)
I've moved the CAL_NAME check into the outer join's ON clause; if it's in the WHERE clause then it will effectively turn it back into an inner join. So this will get one row per university per day that the calendar was updated: "I want just to get the max date for that date". And it will show nulls for the calendar fields if there is no matching calendar, since it's an outer join.
If you actually only want the latest update on any day then just remove the TRUNC(C.CAL_UPDATE) from the grouping:
SELECT U.UNI_ID,
U.UNI_TEACH_NAME,
MAX(C.CAL_UPDATE) AS CAL_UPDATE,
MAX(C.CAL_STATUS) KEEP (DENSE_RANK LAST ORDER BY C.CAL_UPDATE) AS CAL_STATUS
FROM UNIVERSITIES U
LEFT JOIN CALENDAR C
ON U.UNI_ID = C.CAL_ID
AND C.CAL_NAME = 'REGISTRED'
WHERE U.UNI_MASTER = 'SO88'
AND U.UNI_STAT= 'OK'
GROUP BY U.UNI_ID,
U.UNI_TEACH_NAME
db<>fiddle with some made-up data; and also (just for fun) showing Gordon's query with the calendar name clause in both places to show the difference, and to show this gets the same result for that dummy data. (And an 18c version which shows Barbaros' too; getting back a single row.)

Oracle SQL query, getting a maximum of a sum

Hey, guys. I'm struggling to solve one query and just can't get my head around it.
Basically, I have some tables from a data mart:
DimTheatre(TheatreId(PK), TheatreNo, Name, Address, MainTel);
DimTrow(TrowId(PK), TrowNo, RowName, RowType);
DimProduction(ProductionId(PK), ProductionNo, Title, ProductionDir, PlayAuthor);
DimTime(TimeId(PK), Year, Month, Day, Hour);
TicketPurchaseFact( TheatreId(FK), TimeId(FK), TrowId(FK),
PId(FK), TicketAmount);
What I'm trying to achieve in Oracle: I need to retrieve the most popular row type in each theatre by value of ticket sales.
What I'm doing now is:
SELECT dthr.theatreid, dthr.name, max(tr.rowtype) keep(dense_rank last order
by tpf.ticketamount), sum(tpf.ticketamount) TotalSale
FROM TicketPurchaseFact tpf, DimTheatre dthr, DimTrow tr
WHERE dthr.theatreid = tpf.theatreid
GROUP BY dthr.theatreid, dthr.name;
It does give me output, but the 'TotalSale' column is way off; it gives much higher numbers than it should. How could I approach this issue?
I am not sure how MAX() KEEP () would help your case if I understand the problem correctly. But the below approach should work:
SELECT x.theatreid, x.name, x.rowtype, x.total_sale
FROM
(SELECT z.theatreid, z.name, z.rowtype, z.total_sale, DENSE_RANK() OVER (PARTITION BY z.theatreid, z.name ORDER BY z.total_sale DESC) as popular_row_rank
FROM
(SELECT dthr.theatreid, dthr.name, tr.rowtype, SUM(tpf.ticketamount) as total_sale
FROM TicketPurchaseFact tpf, DimTheatre dthr, DimTrow tr
WHERE dthr.theatreid = tpf.theatreid AND tr.trowid = tpf.trowid
GROUP BY dthr.theatreid, dthr.name, tr.rowtype) z
) x
WHERE x.popular_row_rank = 1;
You want the row type per theatre with the highest ticket amount. So join purchases and rows, then aggregate to get the total per row type. Use RANK to rank the row types per theatre and keep the best-ranked ones. Finally, join with the theatre table to get the theatre name.
select
theatreid,
t.name,
tr.rowtype
from
(
select
p.theatreid,
r.rowtype,
rank() over (partition by p.theatreid order by sum(p.ticketamount) desc) as rn
from ticketpurchasefact p
join dimtrow r using (trowid)
group by p.theatreid, r.rowtype
) tr
join dimtheatre t using (theatreid)
where tr.rn = 1;
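The sketch below tries this on a tiny made-up data set with Python's sqlite3 module (SQLite 3.25 or later assumed; the sample rows and theatre names are invented). It first reproduces the inflated totals you get when DimTrow is listed in FROM without a join condition, then aggregates per row type and ranks within each theatre. It uses the two-step form (aggregate in a subquery, rank outside) rather than ranking directly by sum(...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dimtheatre (theatreid INT, name TEXT);
    CREATE TABLE dimtrow (trowid INT, rowtype TEXT);
    CREATE TABLE ticketpurchasefact (theatreid INT, trowid INT, ticketamount INT);
    INSERT INTO dimtheatre VALUES (1, 'Globe'), (2, 'Lyric');
    INSERT INTO dimtrow VALUES (1, 'Stalls'), (2, 'Balcony'), (3, 'Box');
    INSERT INTO ticketpurchasefact VALUES
        (1, 1, 100), (1, 1, 50), (1, 2, 120),
        (2, 2, 80), (2, 3, 200), (2, 3, 10);
    """
)

# No predicate on dimtrow: every purchase is repeated once per dimtrow row,
# so each total is multiplied by the number of row types (3 here).
inflated = conn.execute(
    """
    SELECT t.theatreid, SUM(p.ticketamount)
    FROM ticketpurchasefact p, dimtheatre t, dimtrow r
    WHERE t.theatreid = p.theatreid
    GROUP BY t.theatreid
    ORDER BY t.theatreid
    """
).fetchall()

# Aggregate per theatre and row type first, then rank within each theatre.
top_rowtype = conn.execute(
    """
    SELECT theatreid, name, rowtype, total_sale
    FROM (SELECT z.*,
                 RANK() OVER (PARTITION BY theatreid
                              ORDER BY total_sale DESC) AS rn
          FROM (SELECT t.theatreid, t.name, r.rowtype,
                       SUM(p.ticketamount) AS total_sale
                FROM ticketpurchasefact p
                JOIN dimtheatre t ON t.theatreid = p.theatreid
                JOIN dimtrow r ON r.trowid = p.trowid
                GROUP BY t.theatreid, t.name, r.rowtype) z) x
    WHERE rn = 1
    ORDER BY theatreid
    """
).fetchall()

print(inflated)
print(top_rowtype)
```

RANK keeps ties, so a theatre with two equally popular row types would return both.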

SQL - combining consecutive months of the same block with same quantity

This question will seem very easy at first, but the complexity hits as you start writing. I have attached a picture below with the result set of my SQL. The result is 39 rows. I need to combine all the consecutive rows of the same block with the same value. With this example, the end result should be 29 rows, where each group of red-boxed rows below is consolidated into 1 row.
So, for example, the first red box with quantity = 40 should combine into 1 row with term_start = 2017-06-01 and term_end = 2017-08-01.
Here's my code:
SELECT
pp.position
, term_start = pq.begtime
, term_end = pq.endtime
, quantity = CONVERT(VARCHAR,convert(double precision, pq.energy))
, block = p.block
FROM trade t
INNER JOIN position p on p.trade = t.trade
INNER JOIN powerposition pp on p.position = pp.position
INNER JOIN powerquantity pq on pq.position = pp.position
AND pq.posdetail = pp.posdetail
AND pq.quantitystatus = 'TRADE'
WHERE 1=1
AND p.positionmode = 'PHYSICAL'
AND t.collaboration = 13119572
I've been stuck on this problem for three days straight now. I've explored using CTEs and Row_Number() over () but with no success. Any help would be greatly appreciated!!
You are looking for consecutive values. Here is one way, using a difference of row numbers to identify a group:
with t as (<your query here>)
select min(term_start), max(term_end), block, quantity
from (select t.*,
(row_number() over (partition by block order by position) -
row_number() over (partition by quantity, block order by position)
) as grp
from t
) t
group by quantity, grp, block;
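To show why the difference of row numbers works, here is a minimal sketch on invented data using Python's sqlite3 module (SQLite 3.25 or later assumed). The terms table stands in for the result of the original query, and the rows are ordered by term_start, which is an assumption on my part; the query above orders by position. Within a block, the first row number counts all rows and the second counts only rows with the same quantity, so their difference stays constant for as long as the quantity repeats on consecutive rows.

```python
import sqlite3

# Hypothetical stand-in for the question's result set: one row per month.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE terms (term_start TEXT, term_end TEXT, block TEXT, quantity INT)"
)
conn.executemany(
    "INSERT INTO terms VALUES (?, ?, ?, ?)",
    [
        ("2017-06-01", "2017-07-01", "A", 40),
        ("2017-07-01", "2017-08-01", "A", 40),
        ("2017-08-01", "2017-09-01", "A", 25),
        ("2017-09-01", "2017-10-01", "A", 40),
        ("2017-06-01", "2017-07-01", "B", 40),
    ],
)

merged = conn.execute(
    """
    SELECT MIN(term_start), MAX(term_end), block, quantity
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY block ORDER BY term_start)
               - ROW_NUMBER() OVER (PARTITION BY quantity, block
                                    ORDER BY term_start) AS grp
          FROM terms t) g
    GROUP BY quantity, grp, block
    ORDER BY block, MIN(term_start)
    """
).fetchall()

for row in merged:
    print(row)
```

The two June and July rows of block A collapse into one row, while the later September row with the same quantity stays separate because the August row interrupts the run.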

How to get a percentile rank based on a computation

There are four tables:
T_SALES has columns like
CUST_KEY,
ITEM_KEY,
SALE_DATE,
SALES_DLR_SALES_QTY,
ORDER_QTY.
T_CUST has columns like
CUST_KEY,
CUST_NUM,
PEER_GRP_ID
T_PEER_GRP has columns like
PEER_GRP_ID,
PEER_GRP_DESC,
PRNT_PEER_GRP_ID
T_PRNT_PEEER has columns like
PRNT_PEER_GRP_ID,
PRNT_PEER_DESC
Now, for the above tables, I need to generate a percentile rank of the customer based on the computation fillrate = SALES_QTY / ORDER_QTY * 100, by peer group within a parent peer.
Could someone please help with this?
You can use the analytic function PERCENT_RANK() to calculate the percentile rank, as below:
SELECT
t_s.cust_key,
t_c.cust_num,
PERCENT_RANK() OVER (ORDER BY (t_s.SALES_DLR_SALES_QTY / ORDER_QTY) DESC) as pr
FROM t_sales t_s
INNER JOIN t_cust t_c ON t_s.cust_key = t_c.cust_key
ORDER BY pr;
Reference:
PERCENT_RANK on Oracle® Database SQL Reference
If by "percentile rank" you mean "percent rank" (documented here), then the harder part is the joins. I think this is the basic data that you want for the percentile rank:
select t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID,
sum(SALES_DLR_SALES_QTY) / sum(ORDER_QTY) * 100 as fillrate
from t_sales s join
t_cust c
on s.CUST_KEY = c.cust_key join
t_peer_grp t
on t.PEER_GRP_ID = c.PEER_GRP_ID
group by t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID;
You can then calculate the percent rank (a value from 0 to 1; multiply by 100 for a 0 to 100 scale) as:
select t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID,
sum(SALES_DLR_SALES_QTY) / sum(ORDER_QTY) * 100 as fillrate,
percent_rank() over (partition by t.PRNT_PEER_GRP_ID
order by sum(SALES_DLR_SALES_QTY) / sum(ORDER_QTY)
) as pr
from t_sales s join
t_cust c
on s.CUST_KEY = c.cust_key join
t_peer_grp t
on t.PEER_GRP_ID = c.PEER_GRP_ID
group by t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID;
Note that this mixes analytic functions with aggregation functions. This can look awkward when you first learn about it.
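If the mixed form is hard to read, it may help to see the two steps separately. The sketch below, using Python's sqlite3 module on invented data (SQLite 3.25 or later assumed), aggregates a fill rate per peer group in a CTE and then applies PERCENT_RANK() to the aggregated rows, which is the same order of evaluation the mixed query relies on. The flattened sales table and its values are hypothetical, and the fill-rate formula follows the question's definition.

```python
import sqlite3

# Hypothetical flattened table: sales rows already tagged with their
# peer group and parent peer group.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (parent_grp TEXT, peer_grp TEXT, sales_qty INT, order_qty INT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("P1", "G1", 8, 10),
        ("P1", "G1", 9, 10),
        ("P1", "G2", 5, 10),
        ("P1", "G3", 10, 10),
        ("P2", "G4", 7, 10),
    ],
)

ranked = conn.execute(
    """
    WITH per_group AS (
        -- Step 1: ordinary aggregation, one row per peer group.
        SELECT parent_grp, peer_grp,
               SUM(sales_qty) * 100.0 / SUM(order_qty) AS fillrate
        FROM sales
        GROUP BY parent_grp, peer_grp
    )
    -- Step 2: the analytic function sees only the aggregated rows.
    SELECT parent_grp, peer_grp, fillrate,
           PERCENT_RANK() OVER (PARTITION BY parent_grp
                                ORDER BY fillrate) AS pr
    FROM per_group
    ORDER BY parent_grp, fillrate
    """
).fetchall()

for row in ranked:
    print(row)
```

A partition with a single peer group gets a percent rank of 0, since there is nothing to rank it against.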