SQL min / max with all fields

I am facing a simple problem with an SQL query that I do not know how to tackle.
I have a table with the following structure:
CITY COUNTRY DATES TEMPERATURE
Note that for a given country, I can have several cities. And for a given city, I have several rows giving me the TEMPERATURE at each available DATE. This is just a time series.
I would like to write a query that gives me, for every city, the DATE where the TEMPERATURE is at its MIN and the DATE where the TEMPERATURE is at its MAX. The query should return something like this:
CITY COUNTRY DATE_MIN_TEMPERATURE MIN_TEMPERATURE DATE_MAX_TEMPERATURE MAX_TEMPERATURE
Any idea on how to achieve this?
Best regards,
Deny

Oracle provides KEEP (DENSE_RANK FIRST ...) for this purpose (the question's date column is DATES; DATE itself is a reserved word in Oracle):
select city, country,
min(temperature) as min_temperature,
max(dates) keep (dense_rank first order by temperature asc) as min_temperature_date,
max(temperature) as max_temperature,
max(dates) keep (dense_rank first order by temperature desc) as max_temperature_date
from t
group by city, country;
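To try the idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build of 3.25 or newer (for window functions) and reuses the test rows from the PIVOT answer further down. SQLite has no KEEP (DENSE_RANK FIRST ...), so ROW_NUMBER() is swapped in to pick one row per city; ordering ties by the date descending mimics MAX(dates) KEEP (...), which returns the latest of the tied dates.

```python
import sqlite3

# Test rows taken from the PIVOT answer; dates are assumed to be ISO 'YYYY-MM-DD' text.
ROWS = [
    ("Palermo", "Italy", "2014-02-13", 3),
    ("Palermo", "Italy", "2002-01-23", 3),
    ("Palermo", "Italy", "1998-07-22", 42),
    ("Palermo", "Italy", "1993-08-24", 30),
    ("Maseru", "Lesotho", "1994-01-11", 34),
    ("Maseru", "Lesotho", "2004-08-13", 12),
]

# ROW_NUMBER() stands in for Oracle's KEEP (DENSE_RANK FIRST ...):
# ties on temperature are broken by the latest date, like MAX(dates) KEEP (...).
QUERY = """
SELECT city, country,
       MAX(CASE WHEN rn_min = 1 THEN dates END) AS date_min_temperature,
       MIN(temperature)                         AS min_temperature,
       MAX(CASE WHEN rn_max = 1 THEN dates END) AS date_max_temperature,
       MAX(temperature)                         AS max_temperature
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY city, country
                              ORDER BY temperature ASC, dates DESC)  AS rn_min,
           ROW_NUMBER() OVER (PARTITION BY city, country
                              ORDER BY temperature DESC, dates DESC) AS rn_max
    FROM t
) AS ranked
GROUP BY city, country
ORDER BY country, city
"""


def min_max_dates(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (city TEXT, country TEXT, dates TEXT, temperature INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in min_max_dates(ROWS):
    print(row)
```

Palermo has two days at 3 degrees, and, like the KEEP version, only the later one (2014-02-13) is reported.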
Note that this returns only one date if there are ties (the latest one, because of MAX). If you want to handle that, more logic is needed:
select city, country, min(temperature) as min_temperature,
listagg(case when seqnum_min = 1 then dates end, ',') within group (order by dates) as mindates,
max(temperature) as max_temperature,
listagg(case when seqnum_max = 1 then dates end, ',') within group (order by dates) as maxdates
from (select t.*,
rank() over (partition by city order by temperature) as seqnum_min,
rank() over (partition by city order by temperature desc) as seqnum_max
from t
) t
where seqnum_min = 1 or seqnum_max = 1
group by city, country;
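The tie-handling idea can be checked with a small sketch in Python's sqlite3 module (assuming SQLite 3.25+ for window functions). RANK() gives every tied row rank 1, so all dates at the extreme temperature pass the filter. Because SQLite's GROUP_CONCAT does not guarantee an order in older versions, the sketch collects and sorts the dates in Python as a stand-in for LISTAGG ... WITHIN GROUP (ORDER BY dates).

```python
import sqlite3

# Same test rows as the PIVOT answer; dates assumed to be ISO text.
ROWS = [
    ("Palermo", "Italy", "2014-02-13", 3),
    ("Palermo", "Italy", "2002-01-23", 3),
    ("Palermo", "Italy", "1998-07-22", 42),
    ("Palermo", "Italy", "1993-08-24", 30),
    ("Maseru", "Lesotho", "1994-01-11", 34),
    ("Maseru", "Lesotho", "2004-08-13", 12),
]

# RANK() assigns rank 1 to every tied row, so all extreme dates survive the WHERE.
QUERY = """
SELECT city, dates, temperature, seqnum_min, seqnum_max
FROM (
    SELECT t.*,
           RANK() OVER (PARTITION BY city ORDER BY temperature)      AS seqnum_min,
           RANK() OVER (PARTITION BY city ORDER BY temperature DESC) AS seqnum_max
    FROM t
) AS ranked
WHERE seqnum_min = 1 OR seqnum_max = 1
"""


def extreme_dates(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (city TEXT, country TEXT, dates TEXT, temperature INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = {}
    for city, dt, temp, seq_min, seq_max in conn.execute(QUERY):
        info = result.setdefault(city, {"min_dates": [], "max_dates": []})
        if seq_min == 1:
            info["min_temperature"] = temp
            info["min_dates"].append(dt)
        if seq_max == 1:
            info["max_temperature"] = temp
            info["max_dates"].append(dt)
    conn.close()
    # Sorting here plays the role of LISTAGG's WITHIN GROUP (ORDER BY dates).
    for info in result.values():
        info["min_dates"].sort()
        info["max_dates"].sort()
    return result


for city, info in extreme_dates(ROWS).items():
    print(city, info)
```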

In Oracle 11g and above, you can use PIVOT. In the solution below I use LISTAGG to show all the dates in case of ties. Another option, in case of ties, is to show the most recent date when the extreme temperature was reached; if that is preferred, simply replace LISTAGG(dt, ....) (including the WITHIN GROUP clause) with MAX(dt). In that case, however, the first solution offered by Gordon (using KEEP (DENSE_RANK FIRST ...)) is more efficient anyway, with no need for pivoting.
Note that I changed "date" to "dt" - DATE is a reserved word in Oracle. I also show the rows by country first, then city (the more logical ordering). I created test data in a WITH clause, but the solution is everything below the comment line.
with
inputs ( city, country, dt, temperature ) as (
select 'Palermo', 'Italy' , date '2014-02-13', 3 from dual union all
select 'Palermo', 'Italy' , date '2002-01-23', 3 from dual union all
select 'Palermo', 'Italy' , date '1998-07-22', 42 from dual union all
select 'Palermo', 'Italy' , date '1993-08-24', 30 from dual union all
select 'Maseru' , 'Lesotho', date '1994-01-11', 34 from dual union all
select 'Maseru' , 'Lesotho', date '2004-08-13', 12 from dual
)
-- >> end test data; solution (SQL query) begins with the next line
select country, city,
"'min'_DT" as date_min_temp, "'min'_TEMP" as min_temp,
"'max'_DT" as date_max_temp, "'max'_TEMP" as max_temp
from (
select city, country, dt, temperature,
case when temperature = min(temperature)
over (partition by city, country) then 'min'
when temperature = max(temperature)
over (partition by city, country) then 'max'
end as flag
from inputs
)
pivot ( listagg(to_char(dt, 'dd-MON-yyyy'), ', ')
within group (order by dt) as dt, min(temperature) as temp
for flag in ('min', 'max'))
order by country, city -- ORDER BY is optional
;
COUNTRY CITY DATE_MIN_TEMP MIN_TEMP DATE_MAX_TEMP MAX_TEMP
------- ------- ------------------------ ---------- -------------- ----------
Italy Palermo 23-JAN-2002, 13-FEB-2014 3 22-JUL-1998 42
Lesotho Maseru 13-AUG-2004 12 11-JAN-1994 34
2 rows selected.

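Instead of the KEEP (DENSE_RANK FIRST ...) aggregate, you can also use the FIRST_VALUE analytic function. For the max-temperature date, sort by temperature descending; LAST_VALUE with only an ORDER BY does not work here, because the default window frame ends at the current row:
select distinct city, country,
MIN(temperature) OVER (PARTITION BY city) as min_temperature,
FIRST_VALUE(dates) OVER (PARTITION BY city ORDER BY temperature) AS min_temperature_date,
MAX(temperature) OVER (PARTITION BY city) as max_temperature,
FIRST_VALUE(dates) OVER (PARTITION BY city ORDER BY temperature DESC) AS max_temperature_date
FROM t;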
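The frame caveat is easy to verify. The sketch below is a hypothetical check in Python's sqlite3 (assuming SQLite 3.25+; SQLite, like Oracle, defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when an ORDER BY is present). It runs three variants over the Palermo rows from the PIVOT answer: LAST_VALUE with the default frame, LAST_VALUE with an explicit full frame, and FIRST_VALUE with a descending sort.

```python
import sqlite3

# Palermo rows from the PIVOT answer's test data (dates as ISO text).
PALERMO = [
    ("Palermo", "2014-02-13", 3),
    ("Palermo", "2002-01-23", 3),
    ("Palermo", "1998-07-22", 42),
    ("Palermo", "1993-08-24", 30),
]


def max_temperature_dates(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (city TEXT, dates TEXT, temperature INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # Default frame stops at the current row (plus peers): value differs per row.
    naive = conn.execute("""
        SELECT DISTINCT LAST_VALUE(dates) OVER (PARTITION BY city ORDER BY temperature)
        FROM t""").fetchall()
    # Explicit full frame: every row sees the whole partition.
    framed = conn.execute("""
        SELECT DISTINCT LAST_VALUE(dates) OVER (
            PARTITION BY city ORDER BY temperature
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
        FROM t""").fetchall()
    # Reversed sort with FIRST_VALUE: the first row is already the maximum.
    reversed_sort = conn.execute("""
        SELECT DISTINCT FIRST_VALUE(dates) OVER (PARTITION BY city ORDER BY temperature DESC)
        FROM t""").fetchall()
    conn.close()
    return naive, framed, reversed_sort


naive, framed, reversed_sort = max_temperature_dates(PALERMO)
print(len(naive), framed, reversed_sort)
```

With the default frame, DISTINCT leaves three different "max" dates for a single city; either of the other two forms yields only 1998-07-22.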


How to differentiate iterations using a date field in BigQuery

I have a process that occurs every 30 days but can take a few days.
How can I differentiate between each iteration in order to sum the output of the process?
For example, the output I expect is:
Name         Date        amount  iteration (optional)
Sophia Liu   2016-01-01  4       1
Sophia Liu   2016-02-01  5       2
Nikki Leith  2016-01-02  5       1
Nikki Leith  2016-02-01  10      2
I tried using the LAG function on the date field and taking the difference between that column and the date column.
WITH base AS
(SELECT 'Sophia Liu' as name, DATE '2016-01-01' as date, 3 as amount
UNION ALL SELECT 'Sophia Liu', DATE '2016-01-02', 1
UNION ALL SELECT 'Sophia Liu', DATE '2016-02-01', 3
UNION ALL SELECT 'Sophia Liu', DATE '2016-02-02', 2
UNION ALL SELECT 'Nikki Leith', DATE '2016-01-02', 5
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-01', 5
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-02', 3
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-03', 1
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-04', 1)
select
name
,date
,lag(date) over (partition by name order by date) as lag_func
,date_diff(date,lag(date) over (partition by name order by date),day) date_difference
,case when date_diff(date,lag(date) over (partition by name order by date),day) >= 10
or date_diff(date,lag(date) over (partition by name order by date),day) is null then true else false end as new_iteration
,amount
from base
Edited answer
After your clarification, and after looking at what's actually in your SQL code, I'm guessing you are looking for a solution to what's called a gaps-and-islands problem. That is, you want to identify the "islands" of activity and sum the amount for each iteration, or island. Taking your example, you can first flag the start of a new session (a "gap" of 10 or more days since the previous row) and then turn those flags into a unique iteration ("island") identifier for each user with a running SUM(). You can then use that identifier to perform a SUM() of the amounts.
-- continues the WITH base AS (...) clause from the question, hence the leading comma
, gaps as (
select
name,
date,
amount,
if(date_diff(date, lag(date,1) over(partition by name order by date), DAY) >= 10, 1, 0) new_iteration
from base
),
islands as (
select
*,
1 + sum(new_iteration) over(partition by name order by date) iteration_id
from gaps
)
select
*,
sum(amount) over(partition by name, iteration_id) iteration_amount
from islands
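The expected output in the question has one row per iteration rather than one per day, so a variant that groups by iteration_id may fit better. Below is a minimal sketch of that variant in Python's sqlite3, under these assumptions: SQLite 3.25+ for window functions, julianday() subtraction swapped in for BigQuery's DATE_DIFF, and CASE swapped in for IF. The rows are the question's base CTE.

```python
import sqlite3

# Rows from the question's base CTE (dates as ISO text).
BASE = [
    ("Sophia Liu", "2016-01-01", 3),
    ("Sophia Liu", "2016-01-02", 1),
    ("Sophia Liu", "2016-02-01", 3),
    ("Sophia Liu", "2016-02-02", 2),
    ("Nikki Leith", "2016-01-02", 5),
    ("Nikki Leith", "2016-02-01", 5),
    ("Nikki Leith", "2016-02-02", 3),
    ("Nikki Leith", "2016-02-03", 1),
    ("Nikki Leith", "2016-02-04", 1),
]

QUERY = """
WITH gaps AS (
    SELECT name, date, amount,
           -- 1 when 10+ days passed since the previous row; NULL (first row) falls to 0
           CASE WHEN julianday(date)
                     - julianday(LAG(date) OVER (PARTITION BY name ORDER BY date)) >= 10
                THEN 1 ELSE 0 END AS new_iteration
    FROM base
),
islands AS (
    -- running count of gaps turns each island into its own id: 1, 2, ...
    SELECT *, 1 + SUM(new_iteration) OVER (PARTITION BY name ORDER BY date) AS iteration_id
    FROM gaps
)
SELECT name, MIN(date) AS date, SUM(amount) AS amount, iteration_id
FROM islands
GROUP BY name, iteration_id
ORDER BY name DESC, iteration_id
"""


def iterations(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE base (name TEXT, date TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO base VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in iterations(BASE):
    print(row)
```

The four rows match the expected output in the question: 4 and 5 for Sophia Liu, 5 and 10 for Nikki Leith.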
Previous answer
Sounds like you just need RANK() to count the iterations in your window functions. Depending on your needs, you can then sum cumulative or total amounts with a similar window function. Something like this:
select
name
,date
,rank() over (partition by name order by date) as iteration
,sum(amount) over (partition by name order by date) as cumulative_amount
,sum(amount) over (partition by name) as total_amount
,amount
from base

Select Must Return only one row against every id

I'm using the query below. Its purpose is to get the latest sales_amount of every customer, but when there are two or more sales in the given date range, the query returns all of them. How can I get only the latest record for each id? Each id should appear in only one row.
SELECT id,
Max(date),
sales_amount
FROM customer
WHERE date BETWEEN '2020-08-01' AND '2020-08-15'
AND id = 1001
GROUP BY id,
sales_amount;
You can use ROW_NUMBER() in a subquery to number the rows from newest to oldest and then pick the first one.
SELECT *
FROM (
SELECT id, date, sales_amount,
ROW_NUMBER() OVER (ORDER BY date DESC) as RN
FROM customer
WHERE date BETWEEN '2020-08-01' AND '2020-08-15'
AND id = 1001
) sub
WHERE RN = 1
Note: if you want to do it for all customers, use PARTITION BY id so the numbering restarts for each customer:
SELECT *
FROM (
SELECT id, date, sales_amount,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) as RN
FROM customer
WHERE date BETWEEN '2020-08-01' AND '2020-08-15'
) sub
WHERE RN = 1
That will give you the most recent row for each customer.
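A quick way to see the PARTITION BY version at work is this sketch in Python's sqlite3 (assuming SQLite 3.25+ for window functions). The sales rows are hypothetical, invented only for illustration; the query is the all-customers one above with an ORDER BY added for a stable result.

```python
import sqlite3

# Hypothetical sample rows (id, date, sales_amount), invented for illustration.
SALES = [
    (1001, "2020-08-03", 250),
    (1001, "2020-08-10", 400),
    (1002, "2020-08-05", 120),
    (1002, "2020-08-20", 999),  # outside the date range, so it is filtered out
]

QUERY = """
SELECT id, date, sales_amount
FROM (
    SELECT id, date, sales_amount,
           -- numbering restarts for each id, newest sale first
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
    FROM customer
    WHERE date BETWEEN '2020-08-01' AND '2020-08-15'
) AS sub
WHERE rn = 1
ORDER BY id
"""


def latest_sales(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, date TEXT, sales_amount INTEGER)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_sales(SALES))
```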
This is because you have included sales_amount in the GROUP BY clause, so one row is returned for every distinct sales_amount instead of one per id.
Have you tried ordering by the latest date and adding LIMIT 1:
SELECT id,
Max(date),
sales_amount
FROM customer
WHERE date BETWEEN '2020-08-01' AND '2020-08-15'
AND id = 1001
GROUP BY id,
sales_amount
ORDER BY MAX(date) DESC
LIMIT 1;
Or use ROW_NUMBER(), which needs an OVER clause, instead of GROUP BY:
select * from (
SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) as Rn,
id,
date,
sales_amount
FROM customer
WHERE date BETWEEN '2020-08-01' AND '2020-08-15'
AND id = 1001
) sub
where Rn = 1

Top 2 per month in SQL

I have this dataset, which has dates and products for cities:
CREATE TABLE my_table (
the_id varchar(5) NOT NULL,
the_date timestamp NOT NULL,
the_city varchar(5) NOT NULL,
the_product varchar(1) NOT NULL
);
INSERT INTO my_table
VALUES ('VIS01', '2019-05-02 09:00:00','LISBO','A'),
('VIS02', '2019-05-04 12:00:00','EVORA','A'),
('VIS03', '2019-05-05 18:00:00','LISBO','B'),
('VIS04', '2019-05-06 18:30:00','PORTO','B'),
('VIS05', '2019-05-15 12:05:00','PORTO','C'),
('VIS06', '2019-06-02 18:06:00','EVORA','C'),
('VIS07', '2019-06-02 18:07:00','PORTO','A'),
('VIS08', '2019-06-04 18:08:00','EVORA','B'),
('VIS09', '2019-06-07 18:09:00','LISBO','B'),
('VIS10', '2019-06-09 18:10:00','LISBO','D'),
('VIS11', '2019-06-12 18:11:00','EVORA','D'),
('VIS12', '2019-06-15 18:12:00','LISBO','E'),
('VIS13', '2019-06-15 18:13:00','EVORA','F'),
('VIS14', '2019-06-18 18:14:00','PORTO','G'),
('VIS15', '2019-06-23 18:15:00','LISBO','A'),
('VIS16', '2019-06-25 18:16:00','LISBO','A'),
('VIS17', '2019-06-27 18:17:00','LISBO','F'),
('VIS18', '2019-06-27 18:18:00','LISBO','A'),
('VIS19', '2019-06-28 18:19:00','LISBO','A'),
('VIS20', '2019-06-30 18:20:00','EVORA','D'),
('VIS21', '2019-07-01 18:21:00','EVORA','D'),
('VIS22', '2019-07-04 18:30:00','EVORA','D'),
('VIS23', '2019-07-04 18:31:00','EVORA','B'),
('VIS24', '2019-07-06 18:40:00','EVORA','K'),
('VIS25', '2019-07-12 18:50:00','EVORA','G'),
('VIS26', '2019-07-15 18:00:00','PORTO','C'),
('VIS27', '2019-07-18 18:00:00','PORTO','C'),
('VIS28', '2019-07-25 18:00:00','PORTO','B'),
('VIS29', '2019-07-30 18:00:00','PORTO','M');
I want the top two products per month, by number of rows. The expected result should be:
month product count
2019-05 A 2
2019-05 B 2
2019-06 A 5
2019-06 D 3
2019-07 C 2
2019-07 D 2
But I'm not quite sure how to group by month. Please, any help will be greatly appreciated.
First, you can use to_char(the_date,'YYYY-MM') to get the year and month in the right format.
Next, you can group by month and product, use count(*) to count the rows in each group, and use row_number() to number the products within each month from most to least frequent.
SELECT to_char(the_date,'YYYY-MM') as month,
the_product as product,
count(*) as p_count,
row_number() over (partition by to_char(the_date,'YYYY-MM') order by count(*) desc) as seq
FROM my_table
group by month, product
Last, you can wrap that in an outer query to select just the columns and rows that you want.
SELECT month, product, p_count as count
FROM (
SELECT to_char(the_date,'YYYY-MM') as month,
the_product as product,
count(*) as p_count,
row_number() over (partition by to_char(the_date,'YYYY-MM') order by count(*) desc) as seq
FROM my_table
group by month, product
) as foo
where foo.seq <= 2;
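The same two-step approach can be sketched in Python's sqlite3 (assuming SQLite 3.25+ for window functions), with strftime('%Y-%m', ...) swapped in for PostgreSQL's to_char. One detail the data brings out: in 2019-07, products B, C and D each have 2 rows, so row_number() picks two of them arbitrarily and the expected C/D pair is only one valid answer. The sketch adds the product name as a tie-breaker (an assumption, not something the question asks for), which makes the output deterministic and returns B and C for July.

```python
import sqlite3

# (the_date, the_product) pairs from the question's INSERT statement.
VISITS = [
    ("2019-05-02 09:00:00", "A"), ("2019-05-04 12:00:00", "A"),
    ("2019-05-05 18:00:00", "B"), ("2019-05-06 18:30:00", "B"),
    ("2019-05-15 12:05:00", "C"), ("2019-06-02 18:06:00", "C"),
    ("2019-06-02 18:07:00", "A"), ("2019-06-04 18:08:00", "B"),
    ("2019-06-07 18:09:00", "B"), ("2019-06-09 18:10:00", "D"),
    ("2019-06-12 18:11:00", "D"), ("2019-06-15 18:12:00", "E"),
    ("2019-06-15 18:13:00", "F"), ("2019-06-18 18:14:00", "G"),
    ("2019-06-23 18:15:00", "A"), ("2019-06-25 18:16:00", "A"),
    ("2019-06-27 18:17:00", "F"), ("2019-06-27 18:18:00", "A"),
    ("2019-06-28 18:19:00", "A"), ("2019-06-30 18:20:00", "D"),
    ("2019-07-01 18:21:00", "D"), ("2019-07-04 18:30:00", "D"),
    ("2019-07-04 18:31:00", "B"), ("2019-07-06 18:40:00", "K"),
    ("2019-07-12 18:50:00", "G"), ("2019-07-15 18:00:00", "C"),
    ("2019-07-18 18:00:00", "C"), ("2019-07-25 18:00:00", "B"),
    ("2019-07-30 18:00:00", "M"),
]

QUERY = """
WITH counts AS (
    SELECT strftime('%Y-%m', the_date) AS month,
           the_product AS product,
           COUNT(*) AS p_count
    FROM my_table
    GROUP BY strftime('%Y-%m', the_date), the_product
),
ranked AS (
    SELECT month, product, p_count,
           -- product name breaks ties so the result is deterministic
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY p_count DESC, product) AS seq
    FROM counts
)
SELECT month, product, p_count
FROM ranked
WHERE seq <= 2
ORDER BY month, seq
"""


def top_two_per_month(visits):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (the_date TEXT, the_product TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", visits)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in top_two_per_month(VISITS):
    print(row)
```

Using rank() instead of row_number() with seq <= 2 would keep all tied products, returning three rows for July.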
You can use aggregation and window functions:
select mp.*
from (select date_trunc('month', the_date) as yyyymm,
the_product, count(*) as cnt,
row_number() over (partition by date_trunc('month', the_date) order by count(*) desc) as seqnum
from my_table
group by yyyymm, the_product
) mp
where seqnum <= 2;
In PostgreSQL, you can extract any part of a timestamp using the EXTRACT function,
e.g.:
SELECT the_date, EXTRACT(MONTH FROM the_date) AS month FROM my_table;
the_date     month
2019-08-05   8
With that, you can group by month and product, and select the top 2:
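SELECT EXTRACT(MONTH FROM the_date) AS month, the_product, count(*) FROM my_table
GROUP BY EXTRACT(MONTH FROM the_date), the_product
ORDER BY count(*) DESC
LIMIT 2
Note that LIMIT 2 applies to the whole result, so this returns the two most frequent month/product pairs overall rather than two per month; a per-month top 2 needs a window function such as row_number(). I don't have a database to test the query, but it might give you a good start.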

Retrieve recent 5 days forecast for each city with latest issue date

I need to retrieve the most recent 5 days of forecast info for each city.
My table looks like below
The real problem is with the issue date.
A city may have several forecasts for the same date, each with a different issue date.
For each city, I need the 5 most recent forecast dates, taking only the forecast with the latest issue date for each forecast date.
I have tried something like the following, but it does not give the expected result:
SELECT * FROM(
SELECT
ROW_NUMBER () OVER (PARTITION BY CITY_ID ORDER BY FORECAST_DATE DESC, ISSUE_DATE DESC) AS rn,
CITY_ID, FORECAST_DATE, ISSUE_DATE
FROM
FORECAST
GROUP BY FORECAST_DATE
) WHERE rn <= 5
Any suggestion or advice will be helpful
This will get the latest issued forecast per day over the most recent 5 days for each city:
SELECT *
FROM (
SELECT f.*,
DENSE_RANK() OVER ( PARTITION BY city_id ORDER BY forecast_date DESC )
AS forecast_rank,
ROW_NUMBER() OVER ( PARTITION BY city_id, forecast_date ORDER BY issue_date DESC )
AS issue_rn
FROM Forecast f
)
WHERE forecast_rank <= 5
AND issue_rn = 1;
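Here is a small sketch of the DENSE_RANK/ROW_NUMBER combination in Python's sqlite3 (assuming SQLite 3.25+ for window functions). The forecast rows are hypothetical, invented for illustration. DENSE_RANK matters here: several issues of the same forecast date share one rank, so they count as a single day toward the limit of 5.

```python
import sqlite3

# Hypothetical forecast rows (city_id, forecast_date, issue_date), invented for
# illustration; city 1 has two issues for 2021-03-02 and for 2021-03-06.
FORECASTS = [
    (1, "2021-03-01", "2021-02-25"),
    (1, "2021-03-02", "2021-02-25"),
    (1, "2021-03-02", "2021-02-27"),
    (1, "2021-03-03", "2021-02-27"),
    (1, "2021-03-04", "2021-02-27"),
    (1, "2021-03-05", "2021-02-27"),
    (1, "2021-03-06", "2021-02-27"),
    (1, "2021-03-06", "2021-03-01"),
    (2, "2021-03-01", "2021-02-26"),
    (2, "2021-03-01", "2021-02-28"),
]

QUERY = """
SELECT city_id, forecast_date, issue_date
FROM (
    SELECT f.*,
           -- same forecast_date => same rank, so repeated issues count as one day
           DENSE_RANK() OVER (PARTITION BY city_id ORDER BY forecast_date DESC) AS forecast_rank,
           -- newest issue of each forecast_date gets 1
           ROW_NUMBER() OVER (PARTITION BY city_id, forecast_date
                              ORDER BY issue_date DESC) AS issue_rn
    FROM forecast f
) AS ranked
WHERE forecast_rank <= 5 AND issue_rn = 1
ORDER BY city_id, forecast_date DESC
"""


def latest_forecasts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE forecast (city_id INTEGER, forecast_date TEXT, issue_date TEXT)"
    )
    conn.executemany("INSERT INTO forecast VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in latest_forecasts(FORECASTS):
    print(row)
```

City 1's oldest date (2021-03-01) is ranked 6th and dropped, and each remaining date appears once with its newest issue date.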
Partition by works like group by but for the function only.
Try
with CTE as
(
select t1.*,
row_number() over (partition by city_id, forecast_date order by issue_date desc) as r_ord
from Forecast t1
)
select CTE.*
from CTE
where r_ord <= 5
Try this
SELECT * FROM(
SELECT
ROW_NUMBER () OVER (PARTITION BY CITY_ID, FORECAST_DATE order by ISSUE_DATE DESC) AS rn,
CITY_ID, FORECAST_DATE, ISSUE_DATE
FROM
FORECAST
) WHERE rn <= 5

Get Date of Change

I have a table containing Dates and Statuses. I wish to get the date that the status changed to the most recent status. Sample data:
DATE STATUS
01/01/2000 P
02/01/2000 A
03/01/2000 C
04/01/2000 A
05/01/2000 A
06/01/2000 A
So in this instance the most recent status is A, and it changed to this on 04/01/2000. (The 02/01/2000 row should be ignored in this situation.)
Any suggestions for how to go about selecting this row?
At first, I misunderstood the question. You need to get the earliest date of the last status.
You can group runs of identical consecutive statuses using a trick: a difference of row numbers. In the query below, seqnum1 - seqnum2 is constant within each run of consecutive rows with the same status. Then you can use aggregation to get the minimum date of each run and select the latest one:
select mindate
from (select min(date) as mindate
from (select t.*,
row_number() over (order by date) as seqnum1,
row_number() over (partition by status order by date) as seqnum2
from t
) t
group by status, (seqnum1 - seqnum2)
order by mindate desc
) t
where rownum = 1
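The difference-of-row-numbers trick can be traced on the question's sample data with this sketch in Python's sqlite3. It assumes SQLite 3.25+ for window functions, rewrites the dd/mm/yyyy dates as ISO text, and uses LIMIT 1 in place of Oracle's ROWNUM = 1. For the sample, the runs are (P, 0), (A, 1), (C, 2) and (A, 2); grouping by status as well as the difference keeps the C and final A runs apart even though both have a difference of 2.

```python
import sqlite3

# Sample data from the question, dd/mm/yyyy rewritten as ISO 'YYYY-MM-DD' text.
STATUSES = [
    ("2000-01-01", "P"), ("2000-01-02", "A"), ("2000-01-03", "C"),
    ("2000-01-04", "A"), ("2000-01-05", "A"), ("2000-01-06", "A"),
]

QUERY = """
SELECT MIN(dt) AS mindate
FROM (
    SELECT dt, status,
           -- constant within each run of consecutive identical statuses
           ROW_NUMBER() OVER (ORDER BY dt)
         - ROW_NUMBER() OVER (PARTITION BY status ORDER BY dt) AS grp
    FROM t
) AS runs
GROUP BY status, grp
ORDER BY mindate DESC
LIMIT 1
"""


def latest_run_start(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (dt TEXT, status TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (mindate,) = conn.execute(QUERY).fetchone()
    conn.close()
    return mindate


print(latest_run_start(STATUSES))
```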
EDIT:
In any case, the right way to do this is using lag():
select max(date)
from (select t.*, lag(status) over (order by date) as prev_status
from t
)
where prev_status <> status or prev_status is null;
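The lag() version can be checked the same way. This is a minimal sketch in Python's sqlite3, assuming SQLite 3.25+ for window functions and the question's sample dates rewritten as ISO text. A row marks a change when its status differs from the previous row's, and the first row counts as a change because its prev_status is NULL, which also covers a table with a single row.

```python
import sqlite3

# Sample data from the question, dd/mm/yyyy rewritten as ISO 'YYYY-MM-DD' text.
STATUSES = [
    ("2000-01-01", "P"), ("2000-01-02", "A"), ("2000-01-03", "C"),
    ("2000-01-04", "A"), ("2000-01-05", "A"), ("2000-01-06", "A"),
]

QUERY = """
SELECT MAX(dt)
FROM (
    SELECT dt, status, LAG(status) OVER (ORDER BY dt) AS prev_status
    FROM t
) AS x
-- keep only rows where the status changed (or the very first row)
WHERE prev_status <> status OR prev_status IS NULL
"""


def last_change_date(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (dt TEXT, status TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (changed,) = conn.execute(QUERY).fetchone()
    conn.close()
    return changed


print(last_change_date(STATUSES))
```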
You can do this using lag or lead. Here I'm using lead, ordering by date descending, to find the previous status (if it's null, I supply the date instead, which is needed in case there's only one record).
select max(date)
from (
select status, date, nvl(lead(status) over (order by date desc),date) as previous_status
from t
order by date desc
)
where status <> previous_status;
Something like this ought to do the trick:
with sample_data as (select to_date('01/01/2000', 'dd/mm/yyyy') dt, 'P' status from dual union all
select to_date('02/01/2000', 'dd/mm/yyyy') dt, 'A' status from dual union all
select to_date('03/01/2000', 'dd/mm/yyyy') dt, 'C' status from dual union all
select to_date('04/01/2000', 'dd/mm/yyyy') dt, 'A' status from dual union all
select to_date('05/01/2000', 'dd/mm/yyyy') dt, 'A' status from dual union all
select to_date('06/01/2000', 'dd/mm/yyyy') dt, 'A' status from dual),
results1 as (select dt,
status,
row_number() over (order by dt) - row_number() over (partition by status order by dt) grp
from sample_data),
results2 as (select status, min(dt) min_dt, grp, max(min(dt)) over () max_min_dt
from results1
group by status, grp)
select status, min_dt
from results2
where min_dt = max_min_dt;
STATUS MIN_DT
------ ----------
A 04/01/2000