how to select a value based on multiple criteria - sql

I'm trying to select some values based on some proprietary data, and I just changed the variables to reference house prices.
I am trying to get the total offers for houses where they were sold at the bid or at the ask price, with offers under 15 and offers * sale price less than 5,000,000.
I then want to get the total number of offers for each neighborhood on each day, but instead I'm getting the total offers across each neighborhood (n1 + n2 + n3 + n4 + n5) across all dates and the total offers in the dataset across all dates.
My current query is this:
SELECT DISTINCT(neighborhood),
DATE(date_of_sale),
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`
WHERE ((offers * accepted_sale_price < 5000000)
AND (offers < 15)
AND (house_bid = sale_price OR
house_ask = sale_price))) as bid_ask_off,
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`) as
total_offers,
FROM `big_query.a_table_name.houseprices`
GROUP BY neighborhood, DATE(date_of_sale) LIMIT 100
Which I am expecting a result like, with date being repeated throughout as d1, d2, d3, etc.:
but am instead receiving
I'm aware that there are some inherent problems with what I'm trying to select / group, but I'm not sure what to google or what tutorials to look at in order to perform this operation.
It's querying quite a bit of data, and I want to keep costs down, as I've already racked up a smallish bill on queries.
Any help or advice would be greatly appreciated, and I hope I've provided enough information.
Here is a sample dataframe.
neighborhood date_of_sale offers accepted_sale_price house_bid house_ask
bronx 4/1/2022 3 323 320 323
manhattan 4/1/2022 4 244 230 244
manhattan 4/1/2022 8 856 856 900
queens 4/1/2022 15 110 110 135
brooklyn 4/2/2022 12 115 100 115
manhattan 4/2/2022 9 255 255 275
bronx 4/2/2022 6 330 300 330
queens 4/2/2022 10 405 395 405
brooklyn 4/2/2022 4 254 254 265
staten_island 4/3/2022 2 442 430 442
staten_island 4/3/2022 13 195 195 225
bronx 4/3/2022 4 650 650 690
manhattan 4/3/2022 2 286 266 286
manhattan 4/3/2022 6 356 356 400
staten_island 4/4/2022 4 361 361 401
staten_island 4/4/2022 5 348 348 399
bronx 4/4/2022 8 397 340 397
manhattan 4/4/2022 9 333 333 394
manhattan 4/4/2022 11 392 325 392

I think that this is what you need.
As we group by neighbourhood we do not need DISTINCT.
We take sum(offers) for total_offers directly from the table and bids from a sub-query which we join to so that it is grouped by neighbourhood.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY neighborhood) s
ON h.neighborhood = s.neighborhood
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
Or the following which modifies more the initial query but may be more like what you need.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
date_of_sale dos,
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY
neighborhood,
date_of_sale) s
ON h.neighborhood = s.neighborhood
AND h.date_of_sale = s.dos
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;

Related

How to return the top 3 spending customers per country?

I'm trying to return the top 3 spending customers per country for a table like this:
customer_id
country
spend
159
China
45
152
China
8
159
China
21
160
China
6
161
China
9
162
China
93
152
China
3
168
Germany
91
169
Germany
101
170
Germany
38
171
Germany
17
154
Germany
11
154
Germany
50
167
Germany
63
168
Germany
1
153
Japan
7
163
Japan
58
164
Japan
44
153
Japan
19
164
Japan
10
165
Japan
15
166
Japan
24
153
Japan
105
I've tried the below code but it's not returning the correct results.
SELECT customer_id, country, spend FROM (SELECT customer_id, country, spend,
#country_rank := IF(#current_country = country, #country_rank + 1, 1)
AS country_rank,
#current_country := country
FROM table1
ORDER BY country ASC, spend DESC) ranked_rows
WHERE country_rank<=3;
Since some customers are also repeat customers, I want to make sure that it's the sum of spend per customer that's being taken into account.
You appear to be using MySQL. If you're running version 8 or later, then just use ROW_NUMBER() here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY country ORDER BY spend DESC) rn
FROM table1
)
SELECT customer_id, country, spend
FROM cte
WHERE rn <= 3;

Get nearest date column value from another table in SQL Server

I have two tables A and B,
Table A
PstngDate WorkingDayOutput
12/1/2020 221
12/3/2020 327
12/4/2020 509
12/5/2020 418
12/7/2020 390
12/8/2020 431
12/9/2020 244
12/10/2020 246
12/11/2020 314
12/12/2020 301
12/14/2020 411
12/15/2020 530
12/16/2020 554
12/17/2020 300
12/18/2020 375
12/23/2020 402
12/24/2020 302
12/25/2020 269
12/26/2020 382
12/28/2020 608
Table B
PstngDate HolidayOutput isWorkingDay
12/2/2020 20 0
12/6/2020 24 0
12/13/2020 31 0
12/19/2020 82 0
12/22/2020 507 0
12/27/2020 537 0
Expected output:
PstngDate WorkingDayOutput HolidayOutput
12/1/2020 221 20
12/3/2020 327
12/4/2020 509
12/5/2020 418 24
12/7/2020 390
12/8/2020 431
12/9/2020 244
12/10/2020 246
12/11/2020 314
12/12/2020 301 31
12/14/2020 411
12/15/2020 530
12/16/2020 554
12/17/2020 300
12/18/2020 375 589
12/23/2020 402
12/24/2020 302
12/25/2020 269
12/26/2020 382 537
12/28/2020 608
I want to join TableB to TableA with nearest lesser date column. If you see Expectedoutput table, day 18 row of holidayoutput column is taking sum of day19 and day22 of table B.
I want to join TableB to TableA with nearest lesser date column
This sounds like a lateral join:
select a.*, coalesce(b.holidayquantity, 0) as holidayquantity
from a
outer apply (
select top (1) b.*
from b
where b.pstng_date >= a.pstng_date
order by b.pstng_date
) b
You can use self left join as follows:
Select pstng_date, workingDayQuantity,
HolidayQuantity,
workingDayQuantity + HolidayQuantity as total
From
(Select a.*, b.HolidayQuantity,
Row_number() over (partirion by a.psrng_date order by b.pstng_date) ad rn
From tablea a join tableb b On b.pstng_date > a.pstng_date) t
Where rn=1

Sqlite substract sums (with group by) with JOIN and duplicates

I previously found the solution to my problem but unfortunately I lost files on my harddrive and I can't find the statement I managed to produce.
I have 2 tables T2REQ and T2STOCK, both have 2 columns (typeID and quantity) and my problem reside in the fact that I can have multiple occurences of SAME typeID in BOTH tables.
What I'm trying to do is SUM(QUANTITY) grouped by typeID and substract the values of T2STOCK from T2REQ but since I have multiple occurences of same typeID in both tables, the SUM I get is multiplied by the number of occurences of typeID.
Here's a sample of T2REQ (take typeID 11399 for example):
typeID quantity
---------- ----------
34 102900
35 10500
36 3220
37 840
11399 700
563 140
9848 140
11486 28
11688 700
11399 390
4393 130
9840 390
9842 390
11399 390
11483 19.5
11541 780
And this is a sample of T2STOCK table :
typeID quantity
---------- ----------
9842 1921
9848 2400
11399 1700
11475 165
11476 27
11478 28
11481 34
11483 122
11476 2
And this is where I'm at for now, I know that the SUM(t2stock.quantity) is affected (multiplied) because of the JOIN 1 = 1 but whatever I tried, I'm not doing it in the right order:
SELECT
t2req.typeID, sum(t2req.quantity), sum(t2stock.quantity),
sum(t2req.quantity) - sum(t2stock.quantity) as diff
FROM t2req JOIN t2stock ON t2req.typeID = t2stock.typeID
GROUP BY t2req.typeID
ORDER BY diff DESC;
typeID sum(t2req.quantity) sum(t2stock.quantity) diff
---------- ------------------- --------------------- ----------
563 140 30 110
11541 780 780 0
11486 28 40 -12
11483 19.5 122 -102.5
9840 390 1000 -610
40 260 940 -680
9842 390 1921 -1531
9848 140 2400 -2260
11399 1480 5100 -3620
39 650 7650 -7000
37 1230 116336 -115106
36 28570 967098 -938528
35 33770 2477820 -2444050
34 102900 2798355 -2695455
You can see that SUM(t2req) for typeID 11399 is correct : 1480
And you can see that the SUM(t2stock) for typeID 11399 is not correct : 5100 instead of 1700 (which is 5100 divided by 3, the number of occurences in t2req)
What would be the best way to avoid multiplications because of multiple typeIDs (in both tables) with the JOIN for my sum substract ?
Sorry for the wall of text, just trying to explain as best as I can since english is not my mother tongue.
Thanks a lot for your help.
You can aggregate before join:
SELECT
t2req.typeID,
t2req.quantity,
t2stock.quantity,
t2req.quantity - t2stock.quantity as diff
FROM
(SELECT TypeID, SUM(Quantity) Quantity FROM t2req GROUP BY TypeID) t2req JOIN
(SELECT TypeID, SUM(Quantity) Quantity FROM t2stock GROUP BY TypeID) t2stock
ON t2req.typeID = t2stock.typeID
ORDER BY diff DESC;
Fiddle sample: http://sqlfiddle.com/#!7/06711/5
You can't do this in a single aggregation:
SELECT
COALESCE(r.typeID, s.typeID) AS typeID,
COALESCE(r.quantity, 0) AS req_quantity,
COALESCE(s.quantity, 0) AS stock_quantity,
COALESCE(r.quantity, 0) - COALESCE(s.quantity, 0) AS diff
FROM (
SELECT rr.typeID, SUM(rr.quantity) AS quantity
FROM t2req rr
GROUP BY rr.typeID
) r
CROSS JOIN (
SELECT ss.typeID, SUM(ss.quantity) AS quantity
FROM t2stock ss
GROUP BY ss.typeID
) s ON r.typeID = s.typeID
ORDER BY 4 DESC;

Calculating difference from previous record

May I ask for your help with the following please ?
I am trying to calculate a change from one record to the next in my results. It will probably help if I show you my current query and results ...
SELECT A.AuditDate, COUNT(A.NickName) as [TAccounts],
SUM(IIF((A.CurrGBP > 100 OR A.CurrUSD > 100), 1, 0)) as [Funded]
FROM Audits A
GROUP BY A.AuditDate;
The query gives me these results ...
AuditDate D/M/Y TAccounts Funded
--------------------------------------------
30/12/2011 506 285
04/01/2012 514 287
05/01/2012 514 288
06/01/2012 516 288
09/01/2012 520 289
10/01/2012 522 289
11/01/2012 523 290
12/01/2012 524 290
13/01/2012 526 291
17/01/2012 531 292
18/01/2012 532 292
19/01/2012 533 293
20/01/2012 537 295
Ideally, the results I would like to get, would be similar to the following ...
AuditDate D/M/Y TAccounts TChange Funded FChange
------------------------------------------------------------------------
30/12/2011 506 0 285 0
04/01/2012 514 8 287 2
05/01/2012 514 0 288 1
06/01/2012 516 2 288 0
09/01/2012 520 4 289 1
10/01/2012 522 2 289 0
11/01/2012 523 1 290 1
12/01/2012 524 1 290 0
13/01/2012 526 2 291 1
17/01/2012 531 5 292 1
18/01/2012 532 1 292 0
19/01/2012 533 1 293 1
20/01/2012 537 4 295 2
Looking at the row for '17/01/2012', 'TChange' has a value of 5 as the 'TAccounts' has increased from previous 526 to 531. And the 'FChange' would be based on the 'Funded' field. I guess something to be aware of is the fact that the previous row to this example, is dated '13/01/2012'. What I mean is, there are some days where I have no data (for example over weekends).
I think I need to use a SubQuery but I am really struggling to figure out where to start. Could you show me how to get the results I need please ?
I am using MS Access 2010
Many thanks for your time.
Johnny.
Here is one approach you could try...
SELECT B.AuditDate,B.TAccounts,
B.TAccount -
(SELECT Count(NickName) FROM Audits WHERE AuditDate=B.PrevAuditDate) as TChange,
B.Funded -
(SELECT Count(*) FROM Audits WHERE AuditDate=B.PrevAuditDate AND (CurrGBP > 100 OR CurrUSD > 100)) as FChange
FROM (
SELECT A.AuditDate,
(SELECT Count(NickName) FROM Audits WHERE AuditDate=A.AuditDate) as TAccounts,
(SELECT Count(*) FROM Audits WHERE (CurrGBP > 100 OR CurrUSD > 100)) as Funded,
(SELECT Max(AuditDate) FROM Audits WHERE AuditDate<A.AuditDate) as PrevAuditDate
FROM
(SELECT DISTINCT AuditDate FROM Audits) AS A) AS B
Instead of using a Group By I've used subquerys to get both TAccounts and Funded, as well as the Previous Audit Date, which is then used on the main SELECT statement to get TAccounts and Funded again but this time for the previous date, so that any required calculation can be done against them.
But I would imagine this may be slow to process
It's a shame MS never made this type of thing simple in Access, how many rows are you working with on your report?
If it's under 65K then I would suggest dumping the data on to an Excel spreadsheet and using a simple formula to calculate the different between rows.
You can try something like the following (sql is untested and will require some changes)
SELECT
A.AuditDate,
A.TAccounts,
A.TAccounts - B.TAccounts AS TChange,
A.Funded,
A.Funded - B.Funded AS FChange
FROM
( SELECT
ROW_NUMBER() OVER (ORDER BY AuditDate DESC) AS ROW,
AuditDate,
COUNT(NickName) as [TAccounts],
SUM(IIF((CurrGBP > 100 OR CurrUSD > 100), 1, 0)) as [Funded]
FROM Audits
GROUP BY AuditDate
) A
INNER JOIN
( SELECT
ROW_NUMBER() OVER (ORDER BY AuditDate DESC) AS ROW,
AuditDate,
COUNT(NickName) as [TAccounts],
SUM(IIF((CurrGBP > 100 OR CurrUSD > 100), 1, 0)) as [Funded]
FROM Audits
GROUP BY AuditDate
) B ON B.ROW = A.ROW + 1

SQL query self join

I am working on a query for a report in Oracle 10g.
I need to generate a short list of each course along with the number of times they were offered in the past year (including ones that weren't actually offered).
I created one query
SELECT coursenumber, count(datestart) AS Offered
FROM class
WHERE datestart BETWEEN (sysdate-365) AND sysdate
GROUP BY coursenumber;
Which produces
COURSENUMBER OFFERED
---- ----------
ST03 2
PD01 1
AY03 2
TB01 4
This query is all correct. However ideally I want it to list those along with COURSENUMBER HY and CS in the left column as well with 0 or null as the OFFERED value. I have a feeling this involves a join of sorts, but so far what I have tried doesn't produce the classes with nothing offered.
The table normally looks like
REFERENCE_NO DATESTART TIME TIME EID ROOMID COURSENUMBER
------------ --------- ---- ---- ---------- ---------- ----
256 03-MAR-11 0930 1100 2 2 PD01
257 03-MAY-11 0930 1100 12 7 PD01
258 18-MAY-11 1230 0100 12 7 PD01
259 24-OCT-11 1930 2015 6 2 CS01
260 17-JUN-11 1130 1300 6 4 CS01
261 25-MAY-11 1900 2000 13 6 HY01
262 25-MAY-11 1900 2000 13 6 HY01
263 04-APR-11 0930 1100 13 5 ST03
264 13-SEP-11 1930 2100 6 4 ST03
265 05-NOV-11 1930 2100 6 5 ST03
266 04-FEB-11 1430 1600 6 5 ST03
267 02-JAN-11 0630 0700 13 1 TB01
268 01-FEB-11 0630 0700 13 1 TB01
269 01-MAR-11 0630 0700 13 1 TB01
270 01-APR-11 0630 0700 13 1 TB01
271 01-MAY-11 0630 0700 13 1 TB01
272 14-MAR-11 0830 0915 4 3 AY03
273 19-APR-11 0930 1015 4 3 AY03
274 17-JUN-11 0830 0915 14 3 AY03
275 14-AUG-09 0930 1015 14 3 AY03
276 03-MAY-09 0830 0915 14 3 AY03
SELECT
coursenumber,
COUNT(CASE WHEN datestart BETWEEN (sysdate-365) AND sysdate THEN 1 END) AS Offered
FROM class
GROUP BY coursenumber;
So, as you can see, this particular problem doesn't need a join.
I think something like this should work for you, by just doing it as a subquery.
SELECT distinct c.coursenumber,
(SELECT COUNT(*)
FROM class
WHERE class.coursenumber = c.coursenumber
AND datestart BETWEEN (sysdate-365) AND sysdate
) AS Offered
FROM class c
I like jschoen's answer better for this particular case (when you want one and only one row and column out of the subquery for each row of the main query), but just to demonstrate another way to do it:
select t1.coursenumber, nvl(t2.cnt,0)
from class t1 left outer join (
select coursenumber, count(*) cnt
from class
where datestart between (sysdate-365) AND sysdate
group by coursenumber
) t2 on t1.coursenumber = t2.coursenumber