Say I had a table like so:
Timestamp1            Timestamp2            Diff
2015-03-17 20:33:00   2015-03-17 20:00:00   33
2015-03-17 20:33:00   2015-03-17 21:00:00   27
2015-03-18 19:17:00   2015-03-18 20:00:00   43
2015-03-18 19:17:00   2015-03-18 19:00:00   17
Note that Diff is the difference between Timestamp1 and Timestamp2, in minutes. I would like to use SQL to return, for each distinct Timestamp1 value, the record with the smallest difference:
Timestamp1            Timestamp2            Diff
2015-03-17 20:33:00   2015-03-17 21:00:00   27
2015-03-18 19:17:00   2015-03-18 19:00:00   17
Also, there could be ties; in that case pick one arbitrarily (it doesn't matter whether the choice is actually random or hardcoded).
I've tried following an approach like this, but I'm having trouble with the tie-breaking case, where the difference is 30.
You can use the ROW_NUMBER window function to assign an incremental rank to the differences within each Timestamp1 group, then use a QUALIFY clause to keep only the rows where that rank equals 1. That gives you just the row with the minimum difference per Timestamp1, ignoring ties.
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY Timestamp1 ORDER BY Diff) = 1
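QUALIFY is specific to BigQuery, Snowflake, and a few other engines. On databases without it, the same filter can be written as a subquery; here is a minimal sketch, run against SQLite via Python's sqlite3 module (table and column names taken from the question):

```python
import sqlite3

# Portable equivalent of the QUALIFY version: filter the ROW_NUMBER()
# result in a subquery instead of a QUALIFY clause.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (Timestamp1 TEXT, Timestamp2 TEXT, Diff INTEGER);
INSERT INTO tab VALUES
  ('2015-03-17 20:33:00', '2015-03-17 20:00:00', 33),
  ('2015-03-17 20:33:00', '2015-03-17 21:00:00', 27),
  ('2015-03-18 19:17:00', '2015-03-18 20:00:00', 43),
  ('2015-03-18 19:17:00', '2015-03-18 19:00:00', 17);
""")
rows = conn.execute("""
SELECT Timestamp1, Timestamp2, Diff
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY Timestamp1 ORDER BY Diff) AS rn
  FROM tab t
)
WHERE rn = 1
ORDER BY Timestamp1
""").fetchall()
print(rows)
```

This returns exactly the two rows from the desired result, one per Timestamp1, with ties broken arbitrarily.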
Window functions can help, since you can take the smallest Diff per group with the FIRST_VALUE function:
https://cloud.google.com/bigquery/docs/reference/standard-sql/navigation_functions#first_value
The query could be like this:
SELECT
  FIRST_VALUE(Diff)
    OVER (PARTITION BY Timestamp1 ORDER BY Diff
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS min_diff
FROM tableA;
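One caveat worth showing: FIRST_VALUE computes the per-group minimum on every row of the group, so you still need a DISTINCT (or a ROW_NUMBER filter) to collapse to one row per Timestamp1. A sketch using SQLite through Python's sqlite3 as a stand-in for BigQuery:

```python
import sqlite3

# FIRST_VALUE over "ORDER BY Diff" yields the per-partition minimum,
# repeated on every row of the partition; DISTINCT collapses it to
# one row per Timestamp1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (Timestamp1 TEXT, Timestamp2 TEXT, Diff INTEGER);
INSERT INTO tableA VALUES
  ('2015-03-17 20:33:00', '2015-03-17 20:00:00', 33),
  ('2015-03-17 20:33:00', '2015-03-17 21:00:00', 27),
  ('2015-03-18 19:17:00', '2015-03-18 20:00:00', 43),
  ('2015-03-18 19:17:00', '2015-03-18 19:00:00', 17);
""")
min_diffs = conn.execute("""
SELECT DISTINCT
  Timestamp1,
  FIRST_VALUE(Diff) OVER (PARTITION BY Timestamp1 ORDER BY Diff) AS min_diff
FROM tableA
ORDER BY Timestamp1
""").fetchall()
print(min_diffs)
```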
My table is like:
2021-03-01
2021-03-02 (for example, 3rd March is a holiday and not included in the table)
2021-03-04
2021-03-05
...
2021-05-03
2021-05-04
2021-05-05
2021-05-06
2021-05-07
...
And I should get 2021-03-05 as the result for March, and 2021-05-06 for May.
So for every month I should get the date in the 4th row as the result.
You can use row_number(). But the real trick is the date functions. Let me assume that your database supports year() and month():
select t.*
from (select t.*,
             row_number() over (partition by year(col), month(col) order by col) as seqnum
      from t
     ) t
where seqnum = 4;
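SQLite, for one, has no year() or month(); strftime('%Y-%m', col) can serve as the partition key instead. A runnable sketch of the same idea (column name col as in the answer), using Python's sqlite3 module:

```python
import sqlite3

# Number the dates within each calendar month and keep the 4th one.
# strftime('%Y-%m', col) stands in for year()/month(), which SQLite lacks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [
    ('2021-03-01',), ('2021-03-02',), ('2021-03-04',), ('2021-03-05',),
    ('2021-05-03',), ('2021-05-04',), ('2021-05-05',), ('2021-05-06',),
    ('2021-05-07',),
])
fourth = conn.execute("""
SELECT col
FROM (
  SELECT col,
         ROW_NUMBER() OVER (PARTITION BY strftime('%Y-%m', col)
                            ORDER BY col) AS seqnum
  FROM t
)
WHERE seqnum = 4
ORDER BY col
""").fetchall()
print(fourth)
```

With the question's data this yields 2021-03-05 for March and 2021-05-06 for May.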
I have this data in the table:
internal_id  match_id  company_id  market_id  selection_id  odds_value  update_date
1442         8483075   66          1          1             100         2021-01-04 18:58:19
1            8483075   66          1          1             10          2021-01-04 18:57:19
2            8483075   66          1          2             19          2021-01-04 18:57:19
3            8483075   66          1          3             1.08        2021-01-04 18:57:19
I'm trying to get the last value of odds_value from the whole table for each combination of match_id + company_id + market_id + selection_id, based on update_date.
I wrote this query, which is not working:
SELECT
    odds.`internal_id`,
    odds.`match_id`,
    odds.`company_id`,
    odds.`market_id`,
    odds.`selection_id`,
    odds.`update_date`,
    odds.`odd_value`,
    LAST_VALUE(odds.`odd_value`) OVER (
        PARTITION BY odds.`internal_id`, odds.`match_id`, odds.`company_id`,
                     odds.`market_id`, odds.`selection_id`
        ORDER BY odds.`update_date` DESC
    ) AS last_value
FROM
    `odds`
    LEFT JOIN `matches` ON matches.match_id = odds.match_id
WHERE
    odds.match_id = 8483075
    AND odds.company_id = 66
GROUP BY
    odds.match_id,
    odds.company_id,
    odds.market_id,
    odds.selection_id
For match_id = 8483075, market_id = 1 and selection_id = 1 I'm getting odd_value 10 instead of 100.
What am I doing wrong? Or maybe there is a better way to get this (a higher internal_id means more recent)?
LAST_VALUE() is very strange. The problem is that the default window frame for the ordering is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE() only sees rows up to and including the current one, and typically just returns the current row's value.
I won't go into the details, but the fix is to just always use FIRST_VALUE() with a descending sort. I'm also fixing the PARTITION BY to match the description in your question (internal_id is unique per row, so partitioning by it would make every group a single row):
FIRST_VALUE(odds.odd_value) OVER (PARTITION BY odds.company_id, odds.market_id, odds.selection_id
ORDER BY odds.update_date DESC
) AS last_value
Ironically, you already have a descending sort, so your last value was really fetching the first value anyway, sort of.
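A minimal reproduction of the frame problem, using SQLite's sqlite3 module as a stand-in (two rows from the question's data, columns trimmed to the relevant ones): with the default frame, LAST_VALUE over an ascending sort returns only the current row's value, while FIRST_VALUE over a descending sort returns the true latest odd on every row.

```python
import sqlite3

# Compare the broken LAST_VALUE (default frame ends at the current row)
# against the fixed FIRST_VALUE over a DESC ordering.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE odds (market_id INT, selection_id INT, odd_value REAL, update_date TEXT);
INSERT INTO odds VALUES
  (1, 1, 100, '2021-01-04 18:58:19'),
  (1, 1, 10,  '2021-01-04 18:57:19');
""")
rows = conn.execute("""
SELECT
  odd_value,
  LAST_VALUE(odd_value) OVER (
    PARTITION BY market_id, selection_id ORDER BY update_date) AS broken,
  FIRST_VALUE(odd_value) OVER (
    PARTITION BY market_id, selection_id ORDER BY update_date DESC) AS fixed
FROM odds
ORDER BY update_date
""").fetchall()
print(rows)
```

On the older row, `broken` is 10 (the frame stops at that row), while `fixed` is 100 on both rows.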
I've been trying to retrieve other columns from a table in which I'm performing an aggregate function to get the minimum value per date. This is an example of the data:
id resource date quality ask ask_volume
1 1 2020-06-08 10:50 0 6.9 5102
2 1 2020-06-08 10:50 1 6.8 2943
3 1 2020-06-08 10:50 2 6.9 25338
4 1 2020-06-08 10:50 3 7.0 69720
5 1 2020-06-08 10:50 4 7.0 9778
6 1 2020-06-08 10:50 5 7.0 297435
7 1 2020-06-08 10:40 0 6.6 611
8 1 2020-06-08 10:40 1 6.6 4331
9 1 2020-06-08 10:40 2 6.7 1000
10 1 2020-06-08 10:40 3 7.0 69720
11 1 2020-06-08 10:40 4 7.0 9778
12 1 2020-06-08 10:40 5 7.0 297435
...
This is the desired result I'm trying to get, so I can perform a weighted average on it:
date ask ask_volume
2020-06-08 10:50 6.8 2943
2020-06-08 10:40 6.6 4331
...
Though both quality 0 and quality 1 have the same ask, quality 1 shall be chosen because its ask_volume is higher.
I have tried the classic:
SELECT date, min(ask) FROM table GROUP BY date;
But adding ask_volume to the column list will force me to add it to the GROUP BY as well, messing up the result.
The problems are:
How can I get the corresponding ask_volume of the minimum ask displayed in the result?
And, if there are two records with the same ask value on the same date, how can I get ask_volume to show the one with the highest value?
I use PostgreSQL, but SQL from a different database will help me get the idea as well.
In standard SQL, you would use window functions:
select *
from (
    select t.*, row_number() over(partition by date order by ask, ask_volume desc) rn
    from mytable t
) t
where rn = 1
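A runnable check of this approach, using SQLite through Python's sqlite3 in place of Postgres (a subset of the question's rows): the tie on ask at 10:40 is broken toward the larger ask_volume by the DESC ordering.

```python
import sqlite3

# Greatest-n-per-group via ROW_NUMBER: min ask per date, ties broken
# by the largest ask_volume.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INT, resource INT, date TEXT, quality INT, ask REAL, ask_volume INT);
INSERT INTO mytable VALUES
  (1, 1, '2020-06-08 10:50', 0, 6.9, 5102),
  (2, 1, '2020-06-08 10:50', 1, 6.8, 2943),
  (3, 1, '2020-06-08 10:50', 2, 6.9, 25338),
  (7, 1, '2020-06-08 10:40', 0, 6.6, 611),
  (8, 1, '2020-06-08 10:40', 1, 6.6, 4331);
""")
best = conn.execute("""
SELECT date, ask, ask_volume
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY date
                            ORDER BY ask, ask_volume DESC) AS rn
  FROM mytable t
)
WHERE rn = 1
ORDER BY date DESC
""").fetchall()
print(best)
```

This matches the desired result: (6.8, 2943) for 10:50, and (6.6, 4331) for 10:40 because of the ask_volume tiebreak.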
In Postgres this is better suited for distinct on:
select distinct on (date) *
from mytable
order by ask, ask_volume desc
You can do what you want with distinct on:
select distinct on (date) t.*
from mytable t
order by date, ask, ask_volume desc;
I find your date column confusing. It has a time component, so the name is misleading.
Other answers are simpler and better, but here is an alternative to get around your aggregation problem. You could use a subquery to only include max ask_volume per date per ask before you get the min ask per date.
select date, min(ask), max(ask_volume)
from t
where (date, ask_volume) in (select date, max(ask_volume)
from t
group by date, ask)
group by date;
DISTINCT ON has already been suggested, but in imperfect ways. (The currently accepted answer is incorrect.) That's how you do it:
SELECT DISTINCT ON (date) *
FROM tbl
ORDER BY date, ask, ask_volume DESC NULLS LAST;
Most importantly, leading expressions in ORDER BY must be in the set of expressions in DISTINCT ON. In other words for the simple case, date must be the first ORDER BY expression.
Unless null values have been ruled out (with a NOT NULL constraint), you must add NULLS LAST, or you get null values first in descending order.
Detailed explanation:
Select first row in each GROUP BY group?
I have a table called prices that references the products table through its product_id column. I want a query that selects, for each product_id, the price row with the max finish date, and also returns that row's start_date.
I tried the query below, but I'm getting the wrong start date. It's weird, because the subquery returns more than one row even though I use the price id in the WHERE clause; because of that I put a LIMIT on it, but the result is still wrong.
select prices.product_id, prices.id,
       MAX(CASE WHEN prices.finish_date IS NULL THEN COALESCE(prices.finish_date, '9999-12-31') ELSE prices.finish_date END) as finish_date,
       (select start_date from prices where prices.id = prices.id limit 1) as start_date
from prices
group by prices.product_id, prices.id
How can I get the start date that corresponds to the price id in my grouped row? I am using PostgreSQL.
An example to show what I want from my query:
DataSet:
ID | PRODUCT_ID | START_DATE | FINISH_DATE
1 1689 2018-01-19 02:00:00 2019-11-19 23:59:59
2 1689 2019-10-11 03:00:00 2019-10-15 23:59:59
3 1689 2019-01-11 03:00:00 2019-05-15 23:59:59
4 1690 2019-11-11 03:00:00 2019-12-15 23:59:59
5 1690 2019-05-11 03:00:00 2025-12-15 23:59:59
6 1691 2019-05-11 03:00:00 null
I want this result:
ID | PRODUCT_ID | START_DATE | FINISH_DATE
1 1689 2018-01-19 02:00:00 2019-11-19 23:59:59
5 1690 2019-05-11 03:00:00 2025-12-15 23:59:59
6 1691 2019-05-11 03:00:00 9999-12-31 23:59:59
The start date should come from the same underlying row as the selected finish date.
I would recommend DISTINCT ON in Postgres:
select distinct on (p.product_id) p.*
from prices p
order by p.product_id,
p.finish_date desc nulls first;
NULL values are treated as larger than any other value, so a descending sort puts them first. However, I've included nulls first just to be explicit.
DISTINCT ON is a very handy Postgres extension, which you can learn more about in the documentation.
Try this:
with data as (
    select id, product_id,
           max(coalesce(finish_date, '9999-12-31')) as finish_date
    from prices
    group by 1, 2
)
select d.*, p.start_date
from data d
join prices p on p.id = d.id;
It surely isn't the most elegant solution, but it should work.
I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
On any given day I need to find the highest number of fixes made on a given bay number, and if that maximum count occurs on more than one day, each of those days should also appear in the result set.
So I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each, but I'm struggling to get the max and to keep the tied rows. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
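The same pipeline can be checked end to end with SQLite's sqlite3 module as a stand-in for SQL Server: date(fixdatetime) replaces CAST(fixdatetime AS date), and the timestamps are ISO-formatted. Using a subset of the question's rows:

```python
import sqlite3

# Count fixes per (bayno, day), rank days per bayno by that count with
# dense_rank, and keep the top-ranked day(s) including ties.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (bayno INT, fixdatetime TEXT, fixtype TEXT);
INSERT INTO tablename VALUES
  (1, '2015-05-04 16:15:00', 'tyre change'),
  (1, '2015-05-12 00:15:00', 'oil change'),
  (1, '2015-05-12 08:15:00', 'engine tuning'),
  (4, '2016-05-12 05:30:00', 'stereo unlock'),
  (4, '2016-05-17 15:05:00', 'door handle repair');
""")
result = conn.execute("""
SELECT DISTINCT bayno, minfixdatetime, no_of_fixes
FROM (
  SELECT bayno, minfixdatetime, no_of_fixes,
         DENSE_RANK() OVER (PARTITION BY bayno
                            ORDER BY no_of_fixes DESC) AS rnk
  FROM (
    SELECT t.*,
           COUNT(*)         OVER (PARTITION BY bayno, date(fixdatetime)) AS no_of_fixes,
           MIN(fixdatetime) OVER (PARTITION BY bayno, date(fixdatetime)) AS minfixdatetime
    FROM tablename t
  )
)
WHERE rnk = 1
ORDER BY bayno, minfixdatetime
""").fetchall()
print(result)
```

Bay 1 keeps only its 2-fix day, while bay 4's two single-fix days tie at rank 1 and both survive, as in the desired result set.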
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
             count(*) as numFixes,
             rank() over (partition by bayno order by count(*) desc) as seqnum
      from t
      group by bayno, cast(fixdatetime as date)
     ) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.