How to sort a sql result based on values in previous row?

I'm trying to sort a sql data selection by values in columns of the result set. The data looks like:
(This data is not sorted correctly, just an example)
ID     projectID  testName                               objectBefore  objectAfter
=======================================================================================
13147  280        CDM-710 Generic TP-0000120 TOC~~#~~    -1            13148
1145   280        3.2 Quadrature/Carrier Null 25 Deg C   4940          1146
1146   280        3.2 Quadrature/Carrier Null 0 Deg C    1145          1147
1147   280        3.3 External Frequency Reference       1146          1148
1148   280        3.4 Phase Noise 50 Deg C               1147          1149
1149   280        3.4 Phase Noise 25 Deg C               1148          1150
1150   280        3.4 Phase Noise 0 Deg C                1149          1151
1151   280        3.5 Output Spurious 50 Deg C           1150          1152
1152   280        3.5 Output Spurious 25 Deg C           1151          1153
1153   280        3.5 Output Spurious 0 Deg C            1152          1154
............
18196  280        IP Regression Suite                    18195         -1
The order of the data is based on the objectBefore and the objectAfter columns. The first row will always be when objectBefore = -1 and the last row will be when objectAfter = -1. In the above example, the second row would be ID 13148 as that is what row 1 objectAfter is equal to. Is there any way to write a query that would order the data in this manner?

This is actually sorting a linked list:
WITH SortedList (Id, objectBefore , projectID, testName, Level)
AS
(
SELECT Id, objectBefore , projectID, testName, 0 as Level
FROM YourTable
WHERE objectBefore = -1
UNION ALL
SELECT ll.Id, ll.objectBefore , ll.projectID, ll.testName, Level+1 as Level
FROM YourTable ll
INNER JOIN SortedList as s
ON ll.objectBefore = s.Id
)
SELECT Id, objectBefore , projectID, testName
FROM SortedList
ORDER BY Level
You can find more details in this post
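To see the linked-list ordering in action, here is a minimal sketch that runs the same recursive CTE against an in-memory SQLite database from Python. The four rows and their test names are made up for illustration, and SQLite needs the RECURSIVE keyword, which SQL Server does not use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (Id INTEGER, projectID INTEGER, testName TEXT, "
    "objectBefore INTEGER, objectAfter INTEGER)"
)
# Hypothetical rows, deliberately inserted out of order.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        (3, 280, "third", 2, 4),
        (1, 280, "first", -1, 2),
        (4, 280, "fourth", 3, -1),
        (2, 280, "second", 1, 3),
    ],
)

query = """
WITH RECURSIVE SortedList (Id, objectBefore, projectID, testName, Level) AS (
    -- anchor: the head of the list has no predecessor
    SELECT Id, objectBefore, projectID, testName, 0
    FROM YourTable
    WHERE objectBefore = -1
    UNION ALL
    -- step: follow the link from the previous row to the next one
    SELECT ll.Id, ll.objectBefore, ll.projectID, ll.testName, s.Level + 1
    FROM YourTable ll
    INNER JOIN SortedList s ON ll.objectBefore = s.Id
)
SELECT Id, testName FROM SortedList ORDER BY Level
"""
sorted_rows = conn.execute(query).fetchall()
print(sorted_rows)
```

Each recursion step adds exactly one row, so Level ends up being the position of the row in the list.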

Related

how to select a value based on multiple criteria

I'm trying to select some values based on some proprietary data, and I just changed the variables to reference house prices.
I am trying to get the total offers for houses where they were sold at the bid or at the ask price, with offers under 15 and offers * sale price less than 5,000,000.
I then want to get the total number of offers for each neighborhood on each day, but instead I'm getting the total offers across each neighborhood (n1 + n2 + n3 + n4 + n5) across all dates and the total offers in the dataset across all dates.
My current query is this:
SELECT DISTINCT(neighborhood),
DATE(date_of_sale),
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`
WHERE ((offers * accepted_sale_price < 5000000)
AND (offers < 15)
AND (house_bid = sale_price OR
house_ask = sale_price))) as bid_ask_off,
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`) as
total_offers,
FROM `big_query.a_table_name.houseprices`
GROUP BY neighborhood, DATE(date_of_sale) LIMIT 100
Which I am expecting a result like, with date being repeated throughout as d1, d2, d3, etc.:
but am instead receiving
I'm aware that there are some inherent problems with what I'm trying to select / group, but I'm not sure what to google or what tutorials to look at in order to perform this operation.
It's querying quite a bit of data, and I want to keep costs down, as I've already racked up a smallish bill on queries.
Any help or advice would be greatly appreciated, and I hope I've provided enough information.
Here is a sample dataframe.
neighborhood date_of_sale offers accepted_sale_price house_bid house_ask
bronx 4/1/2022 3 323 320 323
manhattan 4/1/2022 4 244 230 244
manhattan 4/1/2022 8 856 856 900
queens 4/1/2022 15 110 110 135
brooklyn 4/2/2022 12 115 100 115
manhattan 4/2/2022 9 255 255 275
bronx 4/2/2022 6 330 300 330
queens 4/2/2022 10 405 395 405
brooklyn 4/2/2022 4 254 254 265
staten_island 4/3/2022 2 442 430 442
staten_island 4/3/2022 13 195 195 225
bronx 4/3/2022 4 650 650 690
manhattan 4/3/2022 2 286 266 286
manhattan 4/3/2022 6 356 356 400
staten_island 4/4/2022 4 361 361 401
staten_island 4/4/2022 5 348 348 399
bronx 4/4/2022 8 397 340 397
manhattan 4/4/2022 9 333 333 394
manhattan 4/4/2022 11 392 325 392
I think that this is what you need.
Because we group by neighborhood, DISTINCT is not needed.
total_offers is SUM(offers) taken directly from the table, while the bid/ask figure comes from a subquery that is grouped by neighborhood and joined back in.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = accepted_sale_price OR
house_ask = accepted_sale_price)
GROUP BY neighborhood) s
ON h.neighborhood = s.neighborhood
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
Alternatively, the following departs further from the initial query but may be closer to what you need, because the subquery is grouped by date as well as by neighborhood.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
date_of_sale dos,
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = accepted_sale_price OR
house_ask = accepted_sale_price)
GROUP BY
neighborhood,
date_of_sale) s
ON h.neighborhood = s.neighborhood
AND h.date_of_sale = s.dos
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
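A different technique that may also fit here is conditional aggregation, which computes both sums in a single pass over the table with no join. The sketch below is not the query from the answer above. It uses SQLite from Python as a stand-in for BigQuery, a few rows from the sample data with dates rewritten in ISO format, and assumes the price column is accepted_sale_price as in the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE houseprices (neighborhood TEXT, date_of_sale TEXT, offers INTEGER, "
    "accepted_sale_price INTEGER, house_bid INTEGER, house_ask INTEGER)"
)
# A subset of the sample rows from the question.
conn.executemany(
    "INSERT INTO houseprices VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("bronx", "2022-04-01", 3, 323, 320, 323),
        ("manhattan", "2022-04-01", 4, 244, 230, 244),
        ("manhattan", "2022-04-01", 8, 856, 856, 900),
        ("queens", "2022-04-01", 15, 110, 110, 135),
        ("brooklyn", "2022-04-02", 12, 115, 100, 115),
        ("brooklyn", "2022-04-02", 4, 254, 254, 265),
    ],
)

query = """
SELECT neighborhood,
       date_of_sale,
       -- only rows that satisfy every filter contribute to this sum
       SUM(CASE WHEN offers * accepted_sale_price < 5000000
                 AND offers < 15
                 AND (house_bid = accepted_sale_price
                      OR house_ask = accepted_sale_price)
                THEN offers ELSE 0 END) AS bid_ask_off,
       SUM(offers) AS total_offers
FROM houseprices
GROUP BY neighborhood, date_of_sale
ORDER BY date_of_sale, neighborhood
"""
per_day = conn.execute(query).fetchall()
for row in per_day:
    print(row)
```

The queens row has exactly 15 offers, so it fails the offers < 15 filter and contributes only to total_offers.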

Create calculated field from MIN() and MAX() values of another column, grouped by unique ID

I have a table that looks like this, containing information about an object's position and the length of time it existed for (age):
Date ID Age x y
2021-03-25 20 1 531 295
2021-03-25 20 2 478 272
2021-03-25 20 3 421 272
2021-03-26 20 1 478 286
2021-03-26 21 1 903 342
I am trying to select the x position of a given ID when its age is at the minimum value for that ID (column named xStart) and when it is at the maximum value (column named xEnd). An ID represents a different object on each day, so ID 20 on the 25th is not the same object as ID 20 on the 26th.
I would like the resulting table to look something like this:
Date ID Age x y xStart xEnd
2021-03-25 20 1 531 295 531 421
2021-03-25 20 2 478 272 531 421
2021-03-25 20 3 421 272 531 421
2021-03-26 20 1 478 286 478 some number
2021-03-26 21 1 903 342 903 some other number
And that table could be grouped for each ID:
Date ID MAX(Age) xStart xEnd
2021-03-25 20 3 531 421
2021-03-26 20 1 478 some number
2021-03-26 21 1 903 some other number
You can use window functions, if I understand:
select distinct date, id,
max(age) over (partition by date, id),
first_value(x) over (partition by date, id order by age) as xstart,
first_value(x) over (partition by date, id order by age desc) as xend
from t;
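As a quick check of the window-function query, here is a sketch that runs it on the sample rows with SQLite from Python. It assumes a SQLite build of 3.25 or later (window function support), and the max_age alias is added only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, id INTEGER, age INTEGER, x INTEGER, y INTEGER)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("2021-03-25", 20, 1, 531, 295),
        ("2021-03-25", 20, 2, 478, 272),
        ("2021-03-25", 20, 3, 421, 272),
        ("2021-03-26", 20, 1, 478, 286),
        ("2021-03-26", 21, 1, 903, 342),
    ],
)

query = """
SELECT DISTINCT date, id,
       MAX(age) OVER (PARTITION BY date, id) AS max_age,
       -- x at the smallest age within each (date, id) group
       FIRST_VALUE(x) OVER (PARTITION BY date, id ORDER BY age) AS xstart,
       -- x at the largest age within each (date, id) group
       FIRST_VALUE(x) OVER (PARTITION BY date, id ORDER BY age DESC) AS xend
FROM t
ORDER BY date, id
"""
grouped = conn.execute(query).fetchall()
for row in grouped:
    print(row)
```

For objects that only have a single age, xstart and xend come out equal to that row's x.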

Get nearest date column value from another table in SQL Server

I have two tables A and B,
Table A
PstngDate WorkingDayOutput
12/1/2020 221
12/3/2020 327
12/4/2020 509
12/5/2020 418
12/7/2020 390
12/8/2020 431
12/9/2020 244
12/10/2020 246
12/11/2020 314
12/12/2020 301
12/14/2020 411
12/15/2020 530
12/16/2020 554
12/17/2020 300
12/18/2020 375
12/23/2020 402
12/24/2020 302
12/25/2020 269
12/26/2020 382
12/28/2020 608
Table B
PstngDate HolidayOutput isWorkingDay
12/2/2020 20 0
12/6/2020 24 0
12/13/2020 31 0
12/19/2020 82 0
12/22/2020 507 0
12/27/2020 537 0
Expected output:
PstngDate WorkingDayOutput HolidayOutput
12/1/2020 221 20
12/3/2020 327
12/4/2020 509
12/5/2020 418 24
12/7/2020 390
12/8/2020 431
12/9/2020 244
12/10/2020 246
12/11/2020 314
12/12/2020 301 31
12/14/2020 411
12/15/2020 530
12/16/2020 554
12/17/2020 300
12/18/2020 375 589
12/23/2020 402
12/24/2020 302
12/25/2020 269
12/26/2020 382 537
12/28/2020 608
I want to join TableB to TableA with nearest lesser date column. In the expected output, the HolidayOutput value in the 12/18 row is the sum of the 12/19 and 12/22 rows of Table B (82 + 507 = 589).
I want to join TableB to TableA with nearest lesser date column
This sounds like a lateral join:
select a.*, coalesce(b.holidayquantity, 0) as holidayquantity
from a
outer apply (
select top (1) b.*
from b
where b.pstng_date >= a.pstng_date
order by b.pstng_date
) b
You can also join the two tables and keep the nearest match with ROW_NUMBER(), as follows:
Select pstng_date, workingDayQuantity,
HolidayQuantity,
workingDayQuantity + HolidayQuantity as total
From
(Select a.*, b.HolidayQuantity,
Row_number() over (partition by a.pstng_date order by b.pstng_date) as rn
From tablea a join tableb b On b.pstng_date > a.pstng_date) t
Where rn=1
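Another way to read the requirement is from the holiday side: assign each Table B row to the latest Table A date before it, then sum per working day. The sketch below tries that idea with SQLite from Python rather than SQL Server. It is not either of the queries above, it uses the column names from the question, a subset of the sample rows, and assumes dates stored in ISO format so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (PstngDate TEXT, WorkingDayOutput INTEGER)")
conn.execute("CREATE TABLE TableB (PstngDate TEXT, HolidayOutput INTEGER)")
# Subset of the sample rows, with dates rewritten as YYYY-MM-DD.
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [
        ("2020-12-17", 300),
        ("2020-12-18", 375),
        ("2020-12-23", 402),
        ("2020-12-26", 382),
        ("2020-12-28", 608),
    ],
)
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [("2020-12-19", 82), ("2020-12-22", 507), ("2020-12-27", 537)],
)

query = """
WITH assigned AS (
    -- for every holiday row, find the latest working day before it
    SELECT (SELECT MAX(a.PstngDate)
            FROM TableA a
            WHERE a.PstngDate < b.PstngDate) AS TargetDate,
           b.HolidayOutput
    FROM TableB b
)
SELECT a.PstngDate, a.WorkingDayOutput, SUM(s.HolidayOutput) AS HolidayOutput
FROM TableA a
LEFT JOIN assigned s ON s.TargetDate = a.PstngDate
GROUP BY a.PstngDate, a.WorkingDayOutput
ORDER BY a.PstngDate
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

Working days with no following holiday keep a NULL HolidayOutput, which matches the blanks in the expected output.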

HSQLDB query to replace a null value with a value derived from another record

This is a small excerpt from a much larger table, call it LOG:
RN EID FID FRID TID TFAID
1 364 509 7045 null 7452
2 364 509 7045 7452 null
3 364 509 7045 7457 null
4 375 512 4525 5442 5241
5 375 513 4525 5863 5241
6 375 515 4525 2542 5241
7 576 621 5632 null 5452
8 576 621 5632 2595 null
9 672 622 5632 null 5966
10 672 622 5632 2635 null
I would like a query that will replace each null in the 'TFAID' column with the non-null 'TFAID' value from another row that has the same 'FID'.
Desired output would therefore be:
RN EID FID FRID TID TFAID
1 364 509 7045 null 7452
2 364 509 7045 7452 7452
3 364 509 7045 7457 7452
4 375 512 4525 5442 5241
5 375 513 4525 5863 5241
6 375 515 4525 2542 5241
7 576 621 5632 null 5452
8 576 621 5632 2595 5452
9 672 622 5632 null 5966
10 672 622 5632 2635 5966
I know that something like
SELECT RN,
EID,
FID,
FRID,
TID,
COALESCE(TFAID, {insert clever code here}) AS TFAID
FROM LOG
is what I need, but I can't for the life of me come up with the clever bit of SQL that will fill in the proper TFAID.
HSQLDB supports SQL features that can be used as alternatives. These features are not supported by some other databases.
CREATE TABLE LOG (RN INT, EID INT, FID INT, FRID INT, TID INT, TFAID INT);
-- using LATERAL
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, f.TFAID) AS TFAID
FROM LOG l , LATERAL (SELECT MAX(TFAID) AS TFAID FROM LOG f WHERE f.FID = l.FID) f;
-- using scalar subquery
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, (SELECT MAX(TFAID) AS TFAID FROM LOG f WHERE f.FID = l.FID)) AS TFAID
FROM LOG l;
Here is one approach. This aggregates the log to get the value and then joins the result in:
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, f.TFAID) AS TFAID
FROM LOG l join
(select fid, max(tfaid) as tfaid
from log
group by fid
) f
on l.fid = f.fid;
There may be other approaches that are more efficient. However, HSQL doesn't implement all SQL features.
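The scalar-subquery form is plain enough SQL that it can be tried outside HSQLDB. Here is a small sketch using SQLite from Python with rows 1 to 3 and 7 to 8 of the excerpt. It only shows the COALESCE fill and makes no claim about HSQLDB performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOG (RN INT, EID INT, FID INT, FRID INT, TID INT, TFAID INT)"
)
# Rows 1-3 and 7-8 from the excerpt; None is stored as NULL.
conn.executemany(
    "INSERT INTO LOG VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 364, 509, 7045, None, 7452),
        (2, 364, 509, 7045, 7452, None),
        (3, 364, 509, 7045, 7457, None),
        (7, 576, 621, 5632, None, 5452),
        (8, 576, 621, 5632, 2595, None),
    ],
)

query = """
SELECT l.RN,
       -- keep the row's own TFAID, otherwise borrow the one known for its FID
       COALESCE(l.TFAID,
                (SELECT MAX(f.TFAID) FROM LOG f WHERE f.FID = l.FID)) AS TFAID
FROM LOG l
ORDER BY l.RN
"""
filled = conn.execute(query).fetchall()
print(filled)
```

MAX ignores NULLs, so it picks up the single non-null TFAID in each FID group. If a group had several different non-null values, MAX would silently choose the largest.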

Calculating difference from previous record

May I ask for your help with the following please ?
I am trying to calculate a change from one record to the next in my results. It will probably help if I show you my current query and results ...
SELECT A.AuditDate, COUNT(A.NickName) as [TAccounts],
SUM(IIF((A.CurrGBP > 100 OR A.CurrUSD > 100), 1, 0)) as [Funded]
FROM Audits A
GROUP BY A.AuditDate;
The query gives me these results ...
AuditDate D/M/Y TAccounts Funded
--------------------------------------------
30/12/2011 506 285
04/01/2012 514 287
05/01/2012 514 288
06/01/2012 516 288
09/01/2012 520 289
10/01/2012 522 289
11/01/2012 523 290
12/01/2012 524 290
13/01/2012 526 291
17/01/2012 531 292
18/01/2012 532 292
19/01/2012 533 293
20/01/2012 537 295
Ideally, the results I would like to get, would be similar to the following ...
AuditDate D/M/Y TAccounts TChange Funded FChange
------------------------------------------------------------------------
30/12/2011 506 0 285 0
04/01/2012 514 8 287 2
05/01/2012 514 0 288 1
06/01/2012 516 2 288 0
09/01/2012 520 4 289 1
10/01/2012 522 2 289 0
11/01/2012 523 1 290 1
12/01/2012 524 1 290 0
13/01/2012 526 2 291 1
17/01/2012 531 5 292 1
18/01/2012 532 1 292 0
19/01/2012 533 1 293 1
20/01/2012 537 4 295 2
Looking at the row for '17/01/2012', 'TChange' has a value of 5 as the 'TAccounts' has increased from previous 526 to 531. And the 'FChange' would be based on the 'Funded' field. I guess something to be aware of is the fact that the previous row to this example, is dated '13/01/2012'. What I mean is, there are some days where I have no data (for example over weekends).
I think I need to use a SubQuery but I am really struggling to figure out where to start. Could you show me how to get the results I need please ?
I am using MS Access 2010
Many thanks for your time.
Johnny.
Here is one approach you could try...
SELECT B.AuditDate, B.TAccounts,
B.TAccounts -
(SELECT Count(NickName) FROM Audits WHERE AuditDate=B.PrevAuditDate) as TChange,
B.Funded,
B.Funded -
(SELECT Count(*) FROM Audits WHERE AuditDate=B.PrevAuditDate AND (CurrGBP > 100 OR CurrUSD > 100)) as FChange
FROM (
SELECT A.AuditDate,
(SELECT Count(NickName) FROM Audits WHERE AuditDate=A.AuditDate) as TAccounts,
(SELECT Count(*) FROM Audits WHERE AuditDate=A.AuditDate AND (CurrGBP > 100 OR CurrUSD > 100)) as Funded,
(SELECT Max(AuditDate) FROM Audits WHERE AuditDate<A.AuditDate) as PrevAuditDate
FROM
(SELECT DISTINCT AuditDate FROM Audits) AS A) AS B
Instead of using a GROUP BY, I've used subqueries to get TAccounts, Funded, and the previous audit date. The outer SELECT then uses that date to compute TAccounts and Funded again for the previous date, so that the differences can be calculated.
However, I would imagine this may be slow to process.
It's a shame MS never made this type of thing simple in Access. How many rows are you working with in your report?
If it's under 65K, then I would suggest dumping the data into an Excel spreadsheet and using a simple formula to calculate the difference between rows.
You can try something like the following (sql is untested and will require some changes)
SELECT
A.AuditDate,
A.TAccounts,
A.TAccounts - B.TAccounts AS TChange,
A.Funded,
A.Funded - B.Funded AS FChange
FROM
( SELECT
ROW_NUMBER() OVER (ORDER BY AuditDate DESC) AS ROW,
AuditDate,
COUNT(NickName) as [TAccounts],
SUM(IIF((CurrGBP > 100 OR CurrUSD > 100), 1, 0)) as [Funded]
FROM Audits
GROUP BY AuditDate
) A
INNER JOIN
( SELECT
ROW_NUMBER() OVER (ORDER BY AuditDate DESC) AS ROW,
AuditDate,
COUNT(NickName) as [TAccounts],
SUM(IIF((CurrGBP > 100 OR CurrUSD > 100), 1, 0)) as [Funded]
FROM Audits
GROUP BY AuditDate
) B ON B.ROW = A.ROW + 1
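The "previous record" idea can also be expressed without ROW_NUMBER by joining each date to the latest earlier date. The sketch below tries this with SQLite from Python, not Access. DailyTotals is a hypothetical table holding the already-aggregated results from the question (first four rows, dates in ISO format), and the first row is given a change of 0 through COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the per-date results of the original GROUP BY query.
conn.execute("CREATE TABLE DailyTotals (AuditDate TEXT, TAccounts INTEGER, Funded INTEGER)")
conn.executemany(
    "INSERT INTO DailyTotals VALUES (?, ?, ?)",
    [
        ("2011-12-30", 506, 285),
        ("2012-01-04", 514, 287),
        ("2012-01-05", 514, 288),
        ("2012-01-06", 516, 288),
    ],
)

query = """
SELECT cur.AuditDate,
       cur.TAccounts,
       cur.TAccounts - COALESCE(prev.TAccounts, cur.TAccounts) AS TChange,
       cur.Funded,
       cur.Funded - COALESCE(prev.Funded, cur.Funded) AS FChange
FROM DailyTotals cur
LEFT JOIN DailyTotals prev
       -- the previous record is the latest date before the current one,
       -- which skips over weekends and other gaps
       ON prev.AuditDate = (SELECT MAX(AuditDate)
                            FROM DailyTotals
                            WHERE AuditDate < cur.AuditDate)
ORDER BY cur.AuditDate
"""
changes = conn.execute(query).fetchall()
for row in changes:
    print(row)
```

Because the join looks for the latest earlier date rather than "yesterday", missing days do not break the calculation.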