SQL get max value with date smaller date - sql

I have a table like this:
| colA | date | num |
| x | 1.7. | 2 |
| x | 3.7. | 1 |
| x | 4.7. | 3 |
| z | 1.8. | 0 | (edit)
| z | 2.8. | 1 |
| z | 5.8. | 2 |
And I want a result like this:
| colA | date | maxNum |
| x | 1.7. | null |
| x | 3.7. | 2 |
| x | 4.7. | 2 |
| z | 1.8. | null | (edit)
| z | 2.8. | 0 |
| z | 5.8. | 1 |
So I want to have the max(num) for every row where the date is smaller the date grouped by colA.
Is this somehow possible with a simple query? It would be part of a bigger query needed for some calculations on big databases.
Edit: maxNum should be null if there is no value before a date in the group
Thanks in advance.

Use MAX..KEEP syntax.
select cola,
adate,
max(num) keep ( dense_rank first order by adate ) over (partition by cola ) maxnum,
case when adate = min(adate) over ( partition by cola )
then null
else max(num) keep ( dense_rank first order by adate ) over (partition by cola ) end maxnum_op
from input;
+------+-------+--------+-----------+
| COLA | ADATE | MAXNUM | MAXNUM_OP |
+------+-------+--------+-----------+
| x | 1.7 | 2 | |
| x | 3.7 | 2 | 2 |
| x | 4.7 | 2 | 2 |
| z | 2.8 | 1 | |
| z | 5.8 | 1 | 1 |
+------+-------+--------+-----------+
The MAXNUM_OP column shows the results you wanted, but you never explained why some of the values were supposed to be null. The MAXNUM column shows the results that I think you described in the text of your post.

You can use first_value and row_number analytical function as following:
Select cola,
date,
case when row_number() over (partition by cola order by date) > 1 then
first_value(num) over (partition by cola order by date)
end as maxnum
From your_table;
Cheers!!

One way is to use a subquery.
SELECT t1.cola,
t1.date,
(SELECT max(t2.num)
FROM elbat t2
WHERE t2.cola = t1.cola
AND t2.date < t1.date) maxnum
FROM elbat t1;

Related

How to use SQL PARTITION BY GROUPS?

I'm working with PostgreSQL 12, but the question is standard SQL.
I have a table like this:
| timestamp | raw_value |
| ------------------------ | --------- |
| 2015-06-27T03:52:50.000Z | 0 |
| 2015-06-27T03:53:00.000Z | 0 |
| 2015-06-27T03:53:10.000Z | 1 |
| 2015-06-27T03:53:20.000Z | 1 |
| 2015-06-27T04:22:20.000Z | 1 |
| 2015-06-27T04:22:30.000Z | 0 |
| 2015-06-27T05:33:40.000Z | 1 |
| 2015-06-27T05:33:50.000Z | 1 |
I need to get the first and last timestamp of each group with raw_value = 1, i.e. needed result :
| start_time | end_time |
| ------------------------ | ------------------------ |
| 2015-06-27T03:53:10.000Z | 2015-06-27T04:22:20.000Z |
| 2015-06-27T05:33:40.000Z | 2015-06-27T05:33:50.000Z |
My best effort so far looks like this:
SELECT timestamp, raw_value, row_number() over w as rn, first_value(obt) OVER w AS start_time, last_value(obt) OVER w AS end_time
FROM mytable
WINDOW w AS (PARTITION BY raw_value ORDER BY timestamp GROUPS CURRENT ROW )
ORDER BY timestamp;
Google doesn't have much info about it, but according to the docs the "GROUPS" clause is exactly what I need, but the end result is wrong, because window functions simply copy value from the timestamp column:
| timestamp | raw_value | rn | start_time | end_time |
| ------------------------ | --------- | --- | ------------------------ | ------------------------ |
| 2015-06-27T03:52:50.000Z | 0 | 1 | 2015-06-27T03:52:50.000Z | 2015-06-27T03:52:50.000Z |
| 2015-06-27T03:53:00.000Z | 0 | 2 | 2015-06-27T03:53:00.000Z | 2015-06-27T03:53:00.000Z |
| 2015-06-27T03:53:10.000Z | 1 | 1 | 2015-06-27T03:53:10.000Z | 2015-06-27T03:53:10.000Z |
| 2015-06-27T03:53:20.000Z | 1 | 2 | 2015-06-27T03:53:20.000Z | 2015-06-27T03:53:20.000Z |
| 2015-06-27T04:22:20.000Z | 1 | 3 | 2015-06-27T04:22:20.000Z | 2015-06-27T04:22:20.000Z |
| 2015-06-27T04:22:30.000Z | 0 | 3 | 2015-06-27T04:22:30.000Z | 2015-06-27T04:22:30.000Z |
| 2015-06-27T05:33:40.000Z | 1 | 4 | 2015-06-27T05:33:40.000Z | 2015-06-27T05:33:40.000Z |
| 2015-06-27T05:33:50.000Z | 1 | 5 | 2015-06-27T05:33:50.000Z | 2015-06-27T05:33:50.000Z |
At line#6 I'd expect the row number to reset to 1, but it doesn't! I tried using BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING as well without luck.
I have created a DB Fiddle link for your convenience as well.
If there is any other way to achieve the same result in SQL (ok to be PG-specific) without window functions, I'd like to know.
Identify groups using row_number() - sum() trick, then choose min and max time for each identified group.
with grp as (
select obt, raw_value
, row_number() over w - sum(raw_value) over w as g
from tm_series
window w as (order by obt)
)
select min(obt), max(obt)
from grp
where raw_value = 1
group by g;
DB fiddle here.
(The GROUPS clause depends on window ordering and seems to have nothing common with your problem.)
Your updated fiddle here.
For an gaps and islands approach, first mark your transitions from raw_value = 0 to raw_value = 1
with mark_changes as (
select obt, raw_value,
case
when raw_value = 0 then 0
when raw_value = lag(raw_value) over (order by obt) then 0
else 1
end as transition
from tm_series
),
Keep only the raw_value = 1 rows, and sum() the preceding transition markers to place each row into a group.
id_groups as (
select obt, raw_value,
sum(transition) over (order by obt) as grp_num
from mark_changes
where raw_value = 1
)
Use group by on these grp_num values to get your desired result.
select min(obt) as start_time,
max(obt) as end_time
from id_groups
group by grp_num
order by min(obt);

SQL Count In Range

How could I count data in range which could be configured
Something like this,
CAR_AVBL
+--------+-----------+
| CAR_ID | DATE_AVBL |
+--------------------|
| JJ01 | 1 |
| JJ02 | 1 |
| JJ03 | 3 |
| JJ04 | 10 |
| JJ05 | 13 |
| JJ06 | 4 |
| JJ07 | 10 |
| JJ08 | 1 |
| JJ09 | 23 |
| JJ10 | 11 |
| JJ11 | 20 |
| JJ12 | 3 |
| JJ13 | 19 |
| JJ14 | 22 |
| JJ15 | 7 |
+--------------------+
ZONE_CFG
+--------+------------+
| DATE | ZONE_DESCR |
+--------+------------+
| 15 | GREEN_ZONE |
| 25 | YELLOW_ZONE|
| 30 | RED_ZONE |
+--------+------------+
Table ZONE_CFG is configurable, so I could not use static value for this
The DATE column mean maximum date for each ZONE
And the result what I expected :
+------------+----------+
| ZONE_DESCR | AVBL_CAR |
+------------+----------+
| GREEN_ZONE | 11 |
| YELLOW_ZONE| 4 |
| RED_ZONE | 0 |
+------------+----------+
Please could someone help me with this
You can use LAG and group by as following:
SELECT
ZC.ZONE_DESCR,
COUNT(1) AS AVBL_CAR
FROM
CAR_AVBL CA
JOIN ( SELECT
ZONE_DECR,
COALESCE(LAG(DATE) OVER(ORDER BY DATE) + 1, 0) AS START_DATE,
DATE AS END_DATE
FROM ZONE_CFG ) ZC
ON ( CA.DATE_AVBL BETWEEN ZC.START_DATE AND ZC.END_DATE )
GROUP BY
ZC.ZONE_DESCR;
Note: Don't use oracle preserved keywords (DATE, in your case) as the name of the columns. Try to change it to something like DATE_ or DATE_START or etc..
Cheers!!
If you want the zero 0, I might suggest a correlated subquery instead:
select z.*,
(select count(*)
from car_avbl c
where c.date_avbl >= start_date and
c.date_avbl <= date
) as avbl_car
from (select z.*,
lag(date, 1, 0) as start_date
from zone_cfg z
) z;
In Oracle 12C, can phrase this using a lateral join:
select z.*,
(c.cnt - lag(c.cnt, 1, 0) over (order by z.date)) as cnt
from zone_cfg z left join lateral
(select count(*) as cnt
from avbl_car c
where c.date_avbl <= z.date
) c
on 1=1

Select latest values for group of related records

I have a table that accommodates data that is logically groupable by multiple properties (foreign key for example). Data is sequential over continuous time interval; i.e. it is a time series data. What I am trying to achieve is to select only latest values for each group of groups.
Here is example data:
+-----------------------------------------+
| code | value | date | relation_id |
+-----------------------------------------+
| A | 1 | 01.01.2016 | 1 |
| A | 2 | 02.01.2016 | 1 |
| A | 3 | 03.01.2016 | 1 |
| A | 4 | 01.01.2016 | 2 |
| A | 5 | 02.01.2016 | 2 |
| A | 6 | 03.01.2016 | 2 |
| B | 1 | 01.01.2016 | 1 |
| B | 2 | 02.01.2016 | 1 |
| B | 3 | 03.01.2016 | 1 |
| B | 4 | 01.01.2016 | 2 |
| B | 5 | 02.01.2016 | 2 |
| B | 6 | 03.01.2016 | 2 |
+-----------------------------------------+
And here is example of desired output:
+-----------------------------------------+
| code | value | date | relation_id |
+-----------------------------------------+
| A | 3 | 03.01.2016 | 1 |
| A | 6 | 03.01.2016 | 2 |
| B | 3 | 03.01.2016 | 1 |
| B | 6 | 03.01.2016 | 2 |
+-----------------------------------------+
To put this in perspective — for every related object I want to select each code with latest date.
Here is a select I came with. I've used ROW_NUMBER OVER (PARTITION BY...) approach:
SELECT indicators.code, indicators.dimension, indicators.unit, x.value, x.date, x.ticker, x.name
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY indicator_id ORDER BY date DESC) AS r,
t.indicator_id, t.value, t.date, t.company_id, companies.sic_id,
companies.ticker, companies.name
FROM fundamentals t
INNER JOIN companies on companies.id = t.company_id
WHERE companies.sic_id = 89
) x
INNER JOIN indicators on indicators.id = x.indicator_id
WHERE x.r <= (SELECT count(*) FROM companies where sic_id = 89)
It works but the problem is that it is painfully slow; when working with about 5% of production data which equals to roughly 3 million fundamentals records this select take about 10 seconds to finish. My guess is that happens due to subselect selecting huge amounts of records first.
Is there any way to speed this query up or am I digging in wrong direction trying to do it the way I do?
Postgres offers the convenient distinct on for this purpose:
select distinct on (relation_id, code) t.*
from t
order by relation_id, code, date desc;
So your query uses different column names than your sample data, so it's hard to tell, but it looks like you just want to group by everything except for date? Assuming you don't have multiple most recent dates, something like this should work. Basically don't use the window function, use a proper group by, and your engine should optimize the query better.
SELECT mytable.code,
mytable.value,
mytable.date,
mytable.relation_id
FROM mytable
JOIN (
SELECT code,
max(date) as date,
relation_id
FROM mytable
GROUP BY code, relation_id
) Q1
ON Q1.code = mytable.code
AND Q1.date = mytable.date
AND Q1.relation_id = mytable.relation_id
Other option:
SELECT DISTINCT Code,
Relation_ID,
FIRST_VALUE(Value) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Value,
FIRST_VALUE(Date) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Date
FROM mytable
This will return top value for what ever you partition by, and for whatever you order by.
I believe we can try something like this
SELECT CODE,Relation_ID,Date,MAX(value)value FROM mytable
GROUP BY CODE,Relation_ID,Date

No rowid or key need most recent row

I am trying my hardest to get a list of the most recent rows by date in a DB2 file. The file has no unique id, so I am trying to get the entries by matching a set of columns. I need DESCGA most importantly as that changes often. When it does they keep another row for historical reasons.
SELECT B.COGA, B.COMSUBGA, B.ACCTGA, B.PRFXGA, B.DESCGA
FROM mylib.myfile B
WHERE
(
SELECT COUNT(*)
FROM
(
SELECT A.COGA,A.COMSUBGA,A.ACCTGA,A.PRFXGA,MAX(A.DATEGA) AS EDATE
FROM mylib.myfile A
GROUP BY A.COGA, A.COMSUBGA, A.ACCTGA, A.PRFXGA
) T
WHERE
(B.ACCTGA = T.ACCTGA AND
B.COGA = T.COGA AND
B.COMSUBGA = T.COMSUBGA AND
B.PRFXGA = T.PRFXGA AND
B.DATEGA = T.EDATE)
) > 1
This is what I am trying and so far I get 0 results.
If I remove
B.ACCTGA = T.ACCTGA AND
It will return results (of course wrong).
I am using ODBC in VS 2013 to structure this query.
I have a table with the following
| a | b | descri | date |
-----------------------------
| 1 | 0 | string | 20140102 |
| 2 | 1 | string | 20140103 |
| 1 | 1 | string | 20140101 |
| 1 | 1 | string | 20150101 |
| 1 | 0 | string | 20150102 |
| 2 | 1 | string | 20150103 |
| 1 | 1 | string | 20150103 |
and i need
| 1 | 0 | string | 20150102 |
| 2 | 1 | string | 20150103 |
| 1 | 1 | string | 20150103 |
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by a, b order by date desc) as seqnum
from mylib.myfile t
) t
where seqnum = 1;

sql - select row from group based on multiple values

I have a table like:
| ID | Val |
+-------+-----+
| abc-1 | 10 |
| abc-2 | 30 |
| cde-1 | 10 |
| cde-2 | 10 |
| efg-1 | 20 |
| efg-2 | 11 |
and would like to get the result based on the substring(ID, 1, 3) and minimum value and ist must be only the first in case the Val has duplicates
| ID | Val |
+-------+-----+
| abc-1 | 10 |
| cde-1 | 10 |
| efg-2 | 11 |
the problem is that I am stuck, because I cannot use group by substring(id,1,3), ID since it will then have again 2 rows (each for abc-1 and abc-2)
WITH
sorted
AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY substring(id,1,3) ORDER BY val, id) AS sequence_id
FROM
yourTable
)
SELECT
*
FROM
sorted
WHERE
sequence_id = 1
SELECT SUBSTRING(id,1,3),MIN(val) FROM Table1 GROUP BY SUBSTRING(id,1,3);
You were grouping the columns using both SUBSTRING(id,1,3),id instead of just SUBSTRING(id,1,3). It works perfectly fine.Check the same example in this below link.
http://sqlfiddle.com/#!3/fd9fc/1