Trying to do sum math using Partition By and Row_Number - sql

I'm trying to add a few columns to my table. I'm part of the way there, but it's not clear to me why the last step fails. This is an example starting table...
Date Name Amount
1/2/2015 Andy 148
2/5/2015 Andy 188
2/11/2015 Andy 154
1/15/2015 John 136
2/5/2015 John 176
1/7/2015 John 134
1/19/2015 John 251
2/21/2015 Carlos 120
2/15/2015 Carlos 211
1/8/2015 Carlos 120
1/2/2014 Andy 151
2/5/2014 Andy 281
2/11/2014 Andy 298
1/15/2014 John 292
2/5/2014 John 134
1/7/2014 John 281
1/19/2014 John 101
2/21/2014 Carlos 137
2/15/2014 Carlos 108
1/8/2014 Carlos 292
I want to take the above table and...
1) Sort by Year, Name, then Amount
2) Based on #1, add the "Ordered" column, which numbers the rows within each set of Year and Name, with Amount sorted ascending
3) Add the "Multiplied" column, which is Amount multiplied by Ordered
4) Sum the Multiplied column and add that sum to each row of the set
Result...
Date Year Name Amount Ordered Multiplied Sum
1/2/2014 2014 Andy 151 1 151 1607
2/5/2014 2014 Andy 281 2 562 1607
2/11/2014 2014 Andy 298 3 894 1607
2/15/2014 2014 Carlos 108 1 108 1258
2/21/2014 2014 Carlos 137 2 274 1258
1/8/2014 2014 Carlos 292 3 876 1258
1/19/2014 2014 John 101 1 101 2380
2/5/2014 2014 John 134 2 268 2380
1/7/2014 2014 John 281 3 843 2380
1/15/2014 2014 John 292 4 1168 2380
1/2/2015 2015 Andy 148 1 148 1020
2/11/2015 2015 Andy 154 2 308 1020
2/5/2015 2015 Andy 188 3 564 1020
1/8/2015 2015 Carlos 120 1 120 993
2/21/2015 2015 Carlos 120 2 240 993
2/15/2015 2015 Carlos 211 3 633 993
1/7/2015 2015 John 134 1 134 1938
1/15/2015 2015 John 136 2 272 1938
2/5/2015 2015 John 176 3 528 1938
1/19/2015 2015 John 251 4 1004 1938
I have everything but the last column as I keep getting the error...
'Invalid expression near Row_Number'.
SQL for 'Ordered'...
ROW_NUMBER() OVER ( Partition BY Name, DATEPART(YEAR, Date) ORDER BY Amount ) AS 'Ordered'
SQL for 'Multiplied'...
Amount * Ordered AS Multiplied
Now I could be thinking of this naively, but I thought I could just add a line like this...
sum(Multiplied) OVER ( Partition BY Name, DATEPART(YEAR, Date) ORDER BY Amount ) AS 'Sum'
But I keep getting the error mentioned. Any ideas how to handle this? I'm open to hearing other ways of handling the data. I only care about the last column.

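If your syntax worked, it would produce a cumulative sum, because of the ORDER BY in the OVER clause. That doesn't appear to be what you want.
The error most likely arises because Multiplied and Ordered are aliases defined in the same SELECT, so they cannot be referenced there, and a window function cannot be nested inside another one.
I think you can do what you want with a subquery: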
select t.*,
(seqnum * amount) as multiplied,
sum(seqnum * amount) over (partition by name, year(date)) as thesum
from (select t.*,
row_number() over (partition by name, year(date) order by amount) as seqnum
from table t
) t;
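
To check the subquery approach against the expected numbers, here is a minimal sketch that runs the same shape of query on SQLite through Python's sqlite3 module. The table name sales, the ISO date strings, and strftime('%Y', ...) standing in for year(date) are assumptions made for this illustration, and window functions need SQLite 3.25 or newer. Only Andy's rows from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; dates stored as ISO strings so strftime can read them.
conn.execute("CREATE TABLE sales (date TEXT, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2014-01-02", "Andy", 151),
        ("2014-02-05", "Andy", 281),
        ("2014-02-11", "Andy", 298),
        ("2015-01-02", "Andy", 148),
        ("2015-02-05", "Andy", 188),
        ("2015-02-11", "Andy", 154),
    ],
)

query = """
SELECT t.date, t.name, t.amount, t.seqnum,
       t.seqnum * t.amount AS multiplied,
       -- no ORDER BY in this OVER clause, so it is a per-group total
       SUM(t.seqnum * t.amount) OVER (PARTITION BY t.name, t.yr) AS thesum
FROM (
    SELECT s.*,
           strftime('%Y', s.date) AS yr,
           ROW_NUMBER() OVER (
               PARTITION BY s.name, strftime('%Y', s.date)
               ORDER BY s.amount
           ) AS seqnum
    FROM sales s
) t
ORDER BY t.yr, t.name, t.amount
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The inner query numbers the rows and the outer query is then free to use seqnum both in the product and inside SUM() OVER, which is the part that cannot be done in a single SELECT.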

Related

how to select a value based on multiple criteria

I'm trying to select some values based on some proprietary data, and I just changed the variables to reference house prices.
I am trying to get the total offers for houses where they were sold at the bid or at the ask price, with offers under 15 and offers * sale price less than 5,000,000.
I then want to get the total number of offers for each neighborhood on each day, but instead I'm getting the total offers across each neighborhood (n1 + n2 + n3 + n4 + n5) across all dates and the total offers in the dataset across all dates.
My current query is this:
SELECT DISTINCT(neighborhood),
DATE(date_of_sale),
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`
WHERE ((offers * accepted_sale_price < 5000000)
AND (offers < 15)
AND (house_bid = sale_price OR
house_ask = sale_price))) as bid_ask_off,
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`) as
total_offers,
FROM `big_query.a_table_name.houseprices`
GROUP BY neighborhood, DATE(date_of_sale) LIMIT 100
I am expecting one row per neighborhood per date, with the dates repeating throughout as d1, d2, d3, etc.,
but am instead receiving the same two totals on every row.
I'm aware that there are some inherent problems with what I'm trying to select / group, but I'm not sure what to google or what tutorials to look at in order to perform this operation.
It's querying quite a bit of data, and I want to keep costs down, as I've already racked up a smallish bill on queries.
Any help or advice would be greatly appreciated, and I hope I've provided enough information.
Here is a sample dataframe.
neighborhood date_of_sale offers accepted_sale_price house_bid house_ask
bronx 4/1/2022 3 323 320 323
manhattan 4/1/2022 4 244 230 244
manhattan 4/1/2022 8 856 856 900
queens 4/1/2022 15 110 110 135
brooklyn 4/2/2022 12 115 100 115
manhattan 4/2/2022 9 255 255 275
bronx 4/2/2022 6 330 300 330
queens 4/2/2022 10 405 395 405
brooklyn 4/2/2022 4 254 254 265
staten_island 4/3/2022 2 442 430 442
staten_island 4/3/2022 13 195 195 225
bronx 4/3/2022 4 650 650 690
manhattan 4/3/2022 2 286 266 286
manhattan 4/3/2022 6 356 356 400
staten_island 4/4/2022 4 361 361 401
staten_island 4/4/2022 5 348 348 399
bronx 4/4/2022 8 397 340 397
manhattan 4/4/2022 9 333 333 394
manhattan 4/4/2022 11 392 325 392
I think that this is what you need.
As we group by neighbourhood we do not need DISTINCT.
We take SUM(offers) for total_offers directly from the table, and the bids from a sub-query that is grouped by neighbourhood and joined back to the table.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY neighborhood) s
ON h.neighborhood = s.neighborhood
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
Or the following, which departs further from your initial query but may be closer to what you need, because the sub-query is also grouped by date of sale:
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
date_of_sale dos,
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY
neighborhood,
date_of_sale) s
ON h.neighborhood = s.neighborhood
AND h.date_of_sale = s.dos
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
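
As another option, not one of the queries above, conditional aggregation can produce both totals in a single pass over the table with no join. The sketch below is only an illustration on SQLite through Python's sqlite3 module, not BigQuery. It assumes the column names from the sample data, so the bid/ask comparison uses accepted_sale_price, and it assumes date_of_sale is already a plain date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE houseprices (
           neighborhood TEXT, date_of_sale TEXT, offers INTEGER,
           accepted_sale_price INTEGER, house_bid INTEGER, house_ask INTEGER)"""
)
# A few rows taken from the sample data in the question.
conn.executemany(
    "INSERT INTO houseprices VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("bronx", "2022-04-01", 3, 323, 320, 323),
        ("manhattan", "2022-04-01", 4, 244, 230, 244),
        ("manhattan", "2022-04-01", 8, 856, 856, 900),
        ("queens", "2022-04-01", 15, 110, 110, 135),
        ("brooklyn", "2022-04-02", 12, 115, 100, 115),
    ],
)

query = """
SELECT neighborhood,
       date_of_sale,
       -- only rows that meet every condition contribute their offers
       SUM(CASE WHEN offers * accepted_sale_price < 5000000
                 AND offers < 15
                 AND (house_bid = accepted_sale_price
                      OR house_ask = accepted_sale_price)
                THEN offers ELSE 0 END) AS bid_ask_off,
       SUM(offers) AS total_offers
FROM houseprices
GROUP BY neighborhood, date_of_sale
ORDER BY date_of_sale, neighborhood
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the table is read once, this shape may also help with the cost concern, though that depends on how BigQuery bills the query.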

In Azure Databricks I want to get start dates of every week with week numbers from datetime column

This is a sample Data Frame
Date Items_Sold
12/29/2019 10
12/30/2019 20
12/31/2019 30
1/1/2020 40
1/2/2020 50
1/3/2020 60
1/4/2020 35
1/5/2020 56
1/6/2020 34
1/7/2020 564
1/8/2020 6
1/9/2020 45
1/10/2020 56
1/11/2020 45
1/12/2020 37
1/13/2020 36
1/14/2020 479
1/15/2020 47
1/16/2020 47
1/17/2020 578
1/18/2020 478
1/19/2020 3578
1/20/2020 67
1/21/2020 578
1/22/2020 478
1/23/2020 4567
1/24/2020 7889
1/25/2020 8999
1/26/2020 99
1/27/2020 66
1/28/2020 678
1/29/2020 889
1/30/2020 990
1/31/2020 58585
2/1/2020 585
2/2/2020 555
2/3/2020 56
2/4/2020 66
2/5/2020 66
2/6/2020 6634
2/7/2020 588
2/8/2020 2588
2/9/2020 255
I am running this query
%sql
use my_items_table;
select weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date)=2020
group by weekofyear(Date)
order by weekofyear(Date)
I am getting this output. (Important: the values in Sum are random.)
Week Sum
1 | 300091
2 | 312756
3 | 309363
4 | 307312
5 | 310985
6 | 296889
7 | 315611
But I want an additional column that holds the start date of each week alongside the week number, like this:
Start_Date Week Sum
12/29/2019 1 300091
1/5/2020 2 312756
1/12/2020 3 309363
1/19/2020 4 307312
1/26/2020 5 310985
2/2/2020 6 296889
2/9/2020 7 315611
I am running the query on Azure Databricks.
If you have data for all days, then just use min():
select min(date), weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date) = 2020
group by weekofyear(Date)
order by weekofyear(Date);
Note: The year() is the calendar year starting on Jan 1. You are not going to get dates from other years using this query. If that is an issue, I would suggest that you ask a new question asking how to get the first day for the first week of the year.
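
If the week start has to be computed rather than read from the data, one possibility is to group by a derived week-start date. The sketch below is an assumption-laden illustration on SQLite through Python's sqlite3 module, not Databricks SQL. It assumes weeks start on Sunday, as in the desired output, uses a hypothetical column name sale_date with ISO date strings, and sums items_sold instead of counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sale_date is a hypothetical column name standing in for Date.
conn.execute("CREATE TABLE my_items_table (sale_date TEXT, items_sold INTEGER)")
conn.executemany(
    "INSERT INTO my_items_table VALUES (?, ?)",
    [
        ("2019-12-29", 10),  # Sunday
        ("2020-01-01", 40),  # Wednesday
        ("2020-01-04", 35),  # Saturday
        ("2020-01-05", 56),  # Sunday
        ("2020-01-06", 34),  # Monday
    ],
)

query = """
SELECT date(sale_date, '-' || strftime('%w', sale_date) || ' days') AS start_date,
       SUM(items_sold) AS total
FROM my_items_table
GROUP BY start_date
ORDER BY start_date
"""
# strftime('%w') is 0 for Sunday, so subtracting that many days
# lands on the Sunday that begins the week.
weeks = conn.execute(query).fetchall()
for row in weeks:
    print(row)
```

Grouping by the week-start date keeps the days from late December in the same group as the first days of January, which a filter on year() would cut off.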

database system

I need help calculating the number of days between Discharge_date and admission_date for every admission_id. Thanks.
ADMISSION_ID PATIENT_ID ADMISSION_ EXP ADMITTED_BY WAR DISCHARGE_
------------ ---------- ---------- --- ----------- --- ----------
205 101 02/02/2011 HB 114 P 16/02/2011
275 101 01/09/2010 HY 115 L 01/11/2010
286 101 03/05/2016 AR 116 A 03/07/2016
303 101 03/04/2018 LA 125 F 13/05/2018
298 103 23/01/2016 TS 114 L 24/04/2016
299 103 23/03/2016 AP 114 L 23/04/2016
305 103 23/04/2018 HT 125 F 29/05/2018
321 103 13/05/2018 AR 125 F 23/05/2018
283 105 03/12/2015 AR 116 A 05/12/2015
278 105 01/01/2011 HB 115 P 30/01/2011
307 105 03/04/2018 TS 125 F 13/05/2018
In Oracle, you can simply subtract two dates and get the difference in days.
Similar question: DATEDIFF function in Oracle
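
In Oracle this would be discharge_date - admission_date when both columns are of type DATE. To show the idea with runnable code, the sketch below uses SQLite through Python's sqlite3 module, where julianday() plays the role of the date subtraction. The table name admission and the ISO date strings are assumptions for the illustration. Three rows from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; column names follow the question's wording.
conn.execute(
    "CREATE TABLE admission ("
    "admission_id INTEGER, admission_date TEXT, discharge_date TEXT)"
)
conn.executemany(
    "INSERT INTO admission VALUES (?, ?, ?)",
    [
        (205, "2011-02-02", "2011-02-16"),
        (275, "2010-09-01", "2010-11-01"),
        (283, "2015-12-03", "2015-12-05"),
    ],
)

query = """
SELECT admission_id,
       -- SQLite stand-in for Oracle's date subtraction
       CAST(julianday(discharge_date) - julianday(admission_date) AS INTEGER)
           AS length_of_stay
FROM admission
ORDER BY admission_id
"""
stays = conn.execute(query).fetchall()
for row in stays:
    print(row)
```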

How do I select records that do not appear in another table

Table: Movie
mID title year director
101 Gone with the Wind 1939 Victor Fleming
102 Star Wars 1977 George Lucas
103 The Sound of Music 1965 Robert Wise
104 E.T. 1982 Steven Spielberg
105 Titanic 1997 James Cameron
106 Snow White 1937 <null>
107 Avatar 2009 James Cameron
108 Raiders of the Lost Ark 1981 Steven Spielberg
Table: Rating
rID mID stars ratingDate
201 101 2 2011-01-22
201 101 4 2011-01-27
202 106 4 <null>
203 103 2 2011-01-20
203 108 4 2011-01-12
203 108 2 2011-01-30
204 101 3 2011-01-09
205 103 3 2011-01-27
205 104 2 2011-01-22
205 108 4 <null>
206 107 3 2011-01-15
206 106 5 2011-01-19
207 107 5 2011-01-20
208 104 3 2011-01-02
I need to fetch the movies which have not been rated yet. In this case Titanic (mID 105) and Star Wars (mID 102) never appear in the Rating table.
I figured it out with
select distinct movie.title from movie,rating where
rating.mid!=movie.mid except select distinct movie.title from
movie,rating where rating.mid=movie.mid
However, I think there might be a better (easier/cleaner) way to do it.
Simple:
SELECT Movie.* FROM Movie LEFT JOIN Rating ON Movie.mID = Rating.mID WHERE Rating.mID IS NULL
If I understood your question properly, that looks like a textbook application of outer joins.
You could do it like this:
SELECT * FROM Movie WHERE mid NOT IN (SELECT DISTINCT(mid) FROM Rating)
Basically it will select all records from the movie table that are not in the rating table, linking them on the 'mid' column, which I am assuming is a unique identifier.
I will add another possibility.
Select [list columns here]
from Movie m
where NOT exists (SELECT * FROM RATING r where m.mid = r.mid)
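
The three suggestions above can be compared side by side. This is a minimal sketch on SQLite through Python's sqlite3 module with a cut-down copy of the two tables (only a few columns and rows are kept, which is an assumption for the illustration). It also shows one known difference: NOT IN returns no rows at all once the subquery yields a NULL, while NOT EXISTS is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (mID INTEGER, title TEXT)")
conn.execute("CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?)",
    [(101, "Gone with the Wind"), (102, "Star Wars"),
     (104, "E.T."), (105, "Titanic")],
)
conn.executemany(
    "INSERT INTO Rating VALUES (?, ?, ?)",
    [(201, 101, 2), (205, 104, 2), (208, 104, 3)],
)

queries = {
    "left_join": """SELECT m.title FROM Movie m
                    LEFT JOIN Rating r ON m.mID = r.mID
                    WHERE r.mID IS NULL ORDER BY m.mID""",
    "not_in": """SELECT title FROM Movie
                 WHERE mID NOT IN (SELECT mID FROM Rating) ORDER BY mID""",
    "not_exists": """SELECT m.title FROM Movie m
                     WHERE NOT EXISTS (SELECT 1 FROM Rating r
                                       WHERE r.mID = m.mID)
                     ORDER BY m.mID""",
}

def run(sql):
    """Return the titles produced by one query as a list."""
    return [row[0] for row in conn.execute(sql)]

results = {name: run(sql) for name, sql in queries.items()}
print(results)

# A rating row with a NULL mID changes what NOT IN returns.
conn.execute("INSERT INTO Rating VALUES (209, NULL, 1)")
not_in_with_null = run(queries["not_in"])
not_exists_with_null = run(queries["not_exists"])
print(not_in_with_null, not_exists_with_null)
```

With the data in the question Rating.mID has no NULLs, so all three forms agree; the NULL case only matters if that column is nullable.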

SQL query self join

I am working on a query for a report in Oracle 10g.
I need to generate a short list of each course along with the number of times they were offered in the past year (including ones that weren't actually offered).
I created one query
SELECT coursenumber, count(datestart) AS Offered
FROM class
WHERE datestart BETWEEN (sysdate-365) AND sysdate
GROUP BY coursenumber;
Which produces
COURSENUMBER OFFERED
---- ----------
ST03 2
PD01 1
AY03 2
TB01 4
This query is all correct. However ideally I want it to list those along with COURSENUMBER HY and CS in the left column as well with 0 or null as the OFFERED value. I have a feeling this involves a join of sorts, but so far what I have tried doesn't produce the classes with nothing offered.
The table normally looks like
REFERENCE_NO DATESTART TIME TIME EID ROOMID COURSENUMBER
------------ --------- ---- ---- ---------- ---------- ----
256 03-MAR-11 0930 1100 2 2 PD01
257 03-MAY-11 0930 1100 12 7 PD01
258 18-MAY-11 1230 0100 12 7 PD01
259 24-OCT-11 1930 2015 6 2 CS01
260 17-JUN-11 1130 1300 6 4 CS01
261 25-MAY-11 1900 2000 13 6 HY01
262 25-MAY-11 1900 2000 13 6 HY01
263 04-APR-11 0930 1100 13 5 ST03
264 13-SEP-11 1930 2100 6 4 ST03
265 05-NOV-11 1930 2100 6 5 ST03
266 04-FEB-11 1430 1600 6 5 ST03
267 02-JAN-11 0630 0700 13 1 TB01
268 01-FEB-11 0630 0700 13 1 TB01
269 01-MAR-11 0630 0700 13 1 TB01
270 01-APR-11 0630 0700 13 1 TB01
271 01-MAY-11 0630 0700 13 1 TB01
272 14-MAR-11 0830 0915 4 3 AY03
273 19-APR-11 0930 1015 4 3 AY03
274 17-JUN-11 0830 0915 14 3 AY03
275 14-AUG-09 0930 1015 14 3 AY03
276 03-MAY-09 0830 0915 14 3 AY03
SELECT
coursenumber,
COUNT(CASE WHEN datestart BETWEEN (sysdate-365) AND sysdate THEN 1 END) AS Offered
FROM class
GROUP BY coursenumber;
So, as you can see, this particular problem doesn't need a join.
I think something like this should work for you, by just doing it as a subquery.
SELECT distinct c.coursenumber,
(SELECT COUNT(*)
FROM class
WHERE class.coursenumber = c.coursenumber
AND datestart BETWEEN (sysdate-365) AND sysdate
) AS Offered
FROM class c
I like jschoen's answer better for this particular case (when you want one and only one row and column out of the subquery for each row of the main query), but just to demonstrate another way to do it:
select distinct t1.coursenumber, nvl(t2.cnt,0) as offered
from class t1 left outer join (
select coursenumber, count(*) cnt
from class
where datestart between (sysdate-365) AND sysdate
group by coursenumber
) t2 on t1.coursenumber = t2.coursenumber
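
To see why the conditional COUNT keeps courses with nothing offered, here is a small sketch on SQLite through Python's sqlite3 module. It is an illustration only: a fixed reference date (the :today parameter, a hypothetical name) replaces Oracle's sysdate so the result is reproducible, dates are ISO strings, and only a handful of rows with three columns are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE class (reference_no INTEGER, datestart TEXT, coursenumber TEXT)"
)
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?)",
    [
        (256, "2011-03-03", "PD01"),
        (259, "2011-10-24", "CS01"),
        (261, "2011-05-25", "HY01"),
        (263, "2011-04-04", "ST03"),
        (264, "2011-09-13", "ST03"),
        (275, "2009-08-14", "AY03"),  # outside the one-year window
    ],
)

query = """
SELECT coursenumber,
       -- CASE yields NULL outside the window, and COUNT skips NULLs,
       -- so every course keeps its row even when the count is 0
       COUNT(CASE WHEN datestart BETWEEN date(:today, '-365 days') AND :today
                  THEN 1 END) AS offered
FROM class
GROUP BY coursenumber
ORDER BY coursenumber
"""
counts = conn.execute(query, {"today": "2011-12-31"}).fetchall()
for row in counts:
    print(row)
```

Moving the date test out of WHERE and into the CASE is what makes the difference: a WHERE clause removes the rows before grouping, so a course with no recent class never reaches GROUP BY.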