I have a table wherein I have to report the present status and the date from which this status is applicable.
Example:
Status date
1 26 July
1 24 July
1 22 July
2 21 July
2 19 July
1 16 July
0 14 July
Given this, I want to display the current status as 1 and the date as 22 July. I am not sure how to go about this.
Status date
1 25 July
1 24 July
1 20 July
In this case, I want to show the status as 1 and the date as 20 July.
This should pull what you need using very standard SQL:
-- Get the oldest date that is the current Status
select Status, min(date) as date
from MyTable
where date > (
    -- Get the most recent date that isn't the current Status
    select max(date)
    from MyTable
    where Status != (
        -- Get the current Status
        select Status -- May need max/min here for multiple statuses on same date
        from MyTable
        where date = (
            -- Get the most recent date
            select max(date)
            from MyTable
        )
    )
)
group by Status
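I'm assuming that the date column is of a data type that sorts properly (that is, not a string, unless you can cast it). Note that if the status has never changed, the subquery for the most recent date with a different Status returns NULL, so the outer comparison matches no rows; that case needs separate handling.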
This is a little inelegant, but it should work:
SELECT status, date
FROM my_table t
WHERE status = ALL (SELECT status
                    FROM my_table
                    WHERE date = ALL (SELECT MAX(date) FROM my_table))
  AND date = ALL (SELECT MIN(date)
                  FROM my_table t1
                  WHERE t1.status = t.status
                    AND NOT EXISTS (SELECT *
                                    FROM my_table t2
                                    WHERE t2.date > t1.date AND t2.status <> t1.status))
Another option is to use a window function like LEAD (or LAG depending on how you order your results). In this example we mark the row when the status changes with the date, order the results and exclude rows other than the first one:
with test_data as (
select 1 status, date '2012-07-26' status_date from dual union all
select 1 status, date '2012-07-24' status_date from dual union all
select 1 status, date '2012-07-22' status_date from dual union all
select 2 status, date '2012-07-21' status_date from dual union all
select 2 status, date '2012-07-19' status_date from dual union all
select 1 status, date '2012-07-16' status_date from dual union all
select 0 status, date '2012-07-14' status_date from dual)
select status, as_of
from (
select status
, case when status != lead(status) over (order by status_date desc) then status_date else null end as_of
from test_data
order by as_of desc nulls last
)
where rownum = 1;
Addendum:
The LEAD and LAG functions accept two more parameters: offset and default. The offset defaults to 1, and default defaults to null. The default allows you to determine what value to consider when you are at the beginning or end of the result set. In your case when the status has never changed, a default is needed. In this example I supplied -1 as a status default because I am assuming that status value is not part of your expected set:
with test_data as (
select 1 status, date '2012-07-25' status_date from dual union all
select 1 status, date '2012-07-24' status_date from dual union all
select 1 status, date '2012-07-20' status_date from dual)
select status, as_of
from (
select status
, case when status != lead(status,1,-1) over (order by status_date desc) then status_date else null end as_of
from test_data
order by as_of desc nulls last
)
where rownum = 1;
You can play around with the case condition (equals/not equals), the order by clause in the lead function, and the desired default to accomplish your needs.
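To try the LEAD-with-default idea outside Oracle, here is a minimal sketch that ports it to SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name status_log, the ISO date strings, and the helper current_status_since are illustrative assumptions, and LIMIT 1 stands in for rownum = 1:

```python
import sqlite3

QUERY = """
SELECT status, status_date
FROM (
    SELECT status, status_date,
           -- status of the next-older row; -1 is a sentinel for "no older row"
           LEAD(status, 1, -1) OVER (ORDER BY status_date DESC) AS older_status
    FROM status_log
)
WHERE status <> older_status      -- rows where a run of equal statuses starts
ORDER BY status_date DESC         -- the most recent such row starts the current run
LIMIT 1
"""

def current_status_since(rows):
    """Return (status, date) for the start of the current status run."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE status_log (status INTEGER, status_date TEXT)")
    con.executemany("INSERT INTO status_log VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result

first_example = [(1, "2012-07-26"), (1, "2012-07-24"), (1, "2012-07-22"),
                 (2, "2012-07-21"), (2, "2012-07-19"), (1, "2012-07-16"),
                 (0, "2012-07-14")]
second_example = [(1, "2012-07-25"), (1, "2012-07-24"), (1, "2012-07-20")]

print(current_status_since(first_example))   # (1, '2012-07-22')
print(current_status_since(second_example))  # (1, '2012-07-20')
```

The second call covers the case where the status has never changed: the sentinel default makes the oldest row count as the start of the run.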
I have a process that occurs every 30 days but can take a few days.
How can I differentiate between each iteration in order to sum the output of the process?
For example, the output I expect is:
Name         Date        amount  iteration (optional)
Sophia Liu   2016-01-01  4       1
Sophia Liu   2016-02-01  5       2
Nikki Leith  2016-01-02  5       1
Nikki Leith  2016-02-01  10      2
I tried using the lag function on the date field and taking the difference between that column and the date column:
WITH base AS
(SELECT 'Sophia Liu' as name, DATE '2016-01-01' as date, 3 as amount
UNION ALL SELECT 'Sophia Liu', DATE '2016-01-02', 1
UNION ALL SELECT 'Sophia Liu', DATE '2016-02-01', 3
UNION ALL SELECT 'Sophia Liu', DATE '2016-02-02', 2
UNION ALL SELECT 'Nikki Leith', DATE '2016-01-02', 5
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-01', 5
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-02', 3
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-03', 1
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-04', 1)
select
name
,date
,lag(date) over (partition by name order by date) as lag_func
,date_diff(date,lag(date) over (partition by name order by date),day) date_differacne
,case when date_diff(date,lag(date) over (partition by name order by date),day) >= 10
or date_diff(date,lag(date) over (partition by name order by date),day) is null then true else false end as new_iteration
,amount
from base
Edited answer
After your clarification, and looking at what's actually in your SQL code, I'm guessing you are looking for a solution to what's called a gaps-and-islands problem. That is, you want to identify the "islands" of activity and sum the amount for each iteration or island. Taking your example, you can first identify the start of a new iteration (a "gap" of at least 10 days since the previous row) and then use that to create a unique iteration ("island") identifier for each user. You can then use that identifier to perform a SUM().
-- continues the WITH clause from the question, after the base CTE: WITH base AS (...),
gaps as (
select
name,
date,
amount,
if(date_diff(date, lag(date,1) over(partition by name order by date), DAY) >= 10, 1, 0) new_iteration
from base
),
islands as (
select
*,
1 + sum(new_iteration) over(partition by name order by date) iteration_id
from gaps
)
select
*,
sum(amount) over(partition by name, iteration_id) iteration_amount
from islands
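As a rough check of the gaps-and-islands steps, the sketch below runs the same logic on the question's data in SQLite via Python's sqlite3 module. Some details are assumptions made for the port: the column is named d instead of date, julianday() differences replace date_diff, and the final step aggregates with GROUP BY to produce one row per iteration instead of a windowed sum:

```python
import sqlite3

ROWS = [
    ("Sophia Liu", "2016-01-01", 3), ("Sophia Liu", "2016-01-02", 1),
    ("Sophia Liu", "2016-02-01", 3), ("Sophia Liu", "2016-02-02", 2),
    ("Nikki Leith", "2016-01-02", 5), ("Nikki Leith", "2016-02-01", 5),
    ("Nikki Leith", "2016-02-02", 3), ("Nikki Leith", "2016-02-03", 1),
    ("Nikki Leith", "2016-02-04", 1),
]

QUERY = """
WITH gaps AS (
    SELECT name, d, amount,
           -- 1 when at least 10 days passed since this name's previous row;
           -- the first row per name has no previous row and gets 0
           CASE WHEN julianday(d)
                     - julianday(LAG(d) OVER (PARTITION BY name ORDER BY d)) >= 10
                THEN 1 ELSE 0 END AS new_iteration
    FROM base
),
islands AS (
    SELECT name, d, amount,
           -- running count of gaps gives each island its own id
           1 + SUM(new_iteration) OVER (PARTITION BY name ORDER BY d) AS iteration_id
    FROM gaps
)
SELECT name, MIN(d) AS start_date, SUM(amount) AS amount, iteration_id
FROM islands
GROUP BY name, iteration_id
ORDER BY name DESC, iteration_id
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE base (name TEXT, d TEXT, amount INTEGER)")
con.executemany("INSERT INTO base VALUES (?, ?, ?)", ROWS)
iterations = con.execute(QUERY).fetchall()
con.close()

for row in iterations:
    print(row)
```

With the sample data this yields one row per name and iteration, matching the expected output in the question (4 and 5 for Sophia Liu, 5 and 10 for Nikki Leith).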
Previous answer
Sounds like you just need a RANK() to count the iterations in your window functions. Depending on your need you can then sum cumulative or total amounts in a similar window function. Something like this:
select
name
,date
,rank() over (partition by name order by date) as iteration
,sum(amount) over (partition by name order by date) as cumulative_amount
,sum(amount) over (partition by name) as total_amount
,amount
from base
I have a very simple query that results in two rows:
SELECT DISTINCT
id,
trunc(start_date) start_date
FROM example.table
WHERE ID = 1
This results in the following rows:
id start_date
1 7/1/2012
1 9/1/2016
I want to add a column that simply shows the previous date for each row. So I'm using the following:
SELECT DISTINCT id,
Trunc(start_date) start_date,
Lag(start_date, 1)
over (
ORDER BY start_date) pdate
FROM example.table
WHERE id = 1
However, when I do this, I get four rows instead of two:
id start_date pdate
1 7/1/2012 NULL
1 7/1/2012 7/1/2012
1 9/1/2016 7/1/2012
1 9/1/2016 9/1/2012
If I change the offset to 2 or 3 the results remain the same. If I change the offset to 0, I get two rows again but of course now the start_date == pdate.
I can't figure out what's going on.
Use an explicit GROUP BY instead:
SELECT id, trunc(start_date) as start_date,
LAG(trunc(start_date)) OVER (PARTITION BY id ORDER BY trunc(start_date)) as pdate
FROM example.table
WHERE ID = 1
GROUP BY id, trunc(start_date)
The reason is the order of execution of an SQL statement: LAG runs before DISTINCT.
You actually want to run LAG after DISTINCT, so the query should be:
WITH t1 AS (
SELECT DISTINCT id, trunc(start_date) start_date
FROM example.table
WHERE ID = 1
)
SELECT t1.*, LAG(start_date, 1) OVER (ORDER BY start_date) pdate
FROM t1
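The ordering effect can be reproduced with a small sketch in SQLite via Python's sqlite3 module. The table name example_table and the four timestamps are hypothetical (the question does not show the underlying rows), and date() stands in for trunc():

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example_table (id INTEGER, start_date TEXT)")
# Hypothetical rows: two timestamps on each of two days, so that
# truncating to the day produces duplicates.
con.executemany("INSERT INTO example_table VALUES (?, ?)", [
    (1, "2012-07-01 08:00"), (1, "2012-07-01 17:00"),
    (1, "2016-09-01 09:00"), (1, "2016-09-01 18:00"),
])

# LAG is computed over all four rows first; DISTINCT then sees four
# different (start_day, pdate) combinations and removes nothing.
lag_then_distinct = con.execute("""
    SELECT DISTINCT id, date(start_date) AS start_day,
           LAG(date(start_date)) OVER (ORDER BY start_date) AS pdate
    FROM example_table
    WHERE id = 1
""").fetchall()

# De-duplicate first, then apply LAG to the two remaining rows.
distinct_then_lag = con.execute("""
    WITH t1 AS (
        SELECT DISTINCT id, date(start_date) AS start_day
        FROM example_table
        WHERE id = 1
    )
    SELECT id, start_day, LAG(start_day) OVER (ORDER BY start_day) AS pdate
    FROM t1
    ORDER BY start_day
""").fetchall()
con.close()

print(len(lag_then_distinct))  # 4
print(distinct_then_lag)       # two rows, one per day
```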
I don't know how to solve this problem. Maybe you can point me in the right direction or give a link.
I have a table:
id Date
23 01.01.2020
23 03.01.2020
23 04.01.2020
56 07.01.2020
56 08.01.2020
87 11.01.2020
23 12.01.2020
23 18.01.2020
I want to aggregate data (id, Date_min) and add new column like this one:
id Date_min Date_new
23 01.01.2020 07.01.2020
56 07.01.2020 11.01.2020
87 11.01.2020 12.01.2020
23 12.01.2020 18.01.2020
In column Date_new I want to see the next user's first date. If there is no next user, use the current user's max date.
LEAD will give you the next date, but we also have the slight sticking problem that your ID repeats, so we need something to make the second 23 distinct from the first. For that I guess we can establish a counter that ticks up every time the ID changes:
with a as(
select '23' as id, '01.01.2020' as "date" union all
select '23' as id, '03.01.2020' as "date" union all
select '23' as id, '04.01.2020' as "date" union all
select '56' as id, '07.01.2020' as "date" union all
select '56' as id, '08.01.2020' as "date" union all
select '87' as id, '11.01.2020' as "date" union all
select '23' as id, '12.01.2020' as "date" union all
select '23' as id, '18.01.2020' as "date"
), b as (
SELECT *, LAG(id) OVER(ORDER BY "date") as last_id FROM a
), c AS(
SELECT *,
LEAD("date") OVER(ORDER BY "date") as next_date,
SUM(CASE WHEN last_id <> id THEN 1 ELSE 0 END) OVER(ORDER BY "date" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) id_ctr
FROM b
)
SELECT id, MIN("date"), MAX(next_date)
FROM c
GROUP BY id, id_ctr
I haven't got a PostgreSQL instance to test this on, but it works in SQL Server, and I'm pretty sure PostgreSQL supports everything used here - there isn't any SQL Server-specific stuff.
a takes the place of your table - you can drop it from your query and start directly with: with b as (select... from yourtablenamehere)
b calculates the previous ID; we'll use this to detect if the id has changed between current row and prev row. If it changes we'll put a 1, otherwise a 0. When these are summed as a running total it effectively means the counter ticks up every time the ID changes, so we can group by this counter as well as the ID to split our two 23s apart. We need to do this separately because window functions can't be nested.
c takes the last_id and does the running total. It also does the next_date with a simple window function that pulls the date from the following row (rows ordered by date). The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is technically unnecessary here: the default frame for a SUM OVER ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives the same running total as long as no two rows share a date, but I find being explicit helps document/change if needed.
Then all that is required is to select the id, min date and max next_date, but throw the counter in there too to split the 23s up - you're allowed to group by more columns than you select but not the other way round.
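To see how the counter separates the two runs of id 23, here is a small plain-Python sketch of the same running total. The function name with_counter is made up for illustration, and the dates are parsed as day.month.year:

```python
from datetime import date

rows = [("23", date(2020, 1, 1)), ("23", date(2020, 1, 3)), ("23", date(2020, 1, 4)),
        ("56", date(2020, 1, 7)), ("56", date(2020, 1, 8)), ("87", date(2020, 1, 11)),
        ("23", date(2020, 1, 12)), ("23", date(2020, 1, 18))]

def with_counter(rows):
    """Attach id_ctr, a running count of id changes, to rows sorted by date."""
    out, ctr, last_id = [], 0, None
    for row_id, day in sorted(rows, key=lambda r: r[1]):
        # Mirrors SUM(CASE WHEN last_id <> id THEN 1 ELSE 0 END):
        # the first row has no previous id, so it contributes 0.
        if last_id is not None and last_id != row_id:
            ctr += 1
        out.append((row_id, day, ctr))
        last_id = row_id
    return out

counters = [ctr for _, _, ctr in with_counter(rows)]
print(counters)  # [0, 0, 0, 1, 1, 2, 3, 3]
```

Grouping by (id, id_ctr) then gives four groups, with the first 23 under counter 0 and the second under counter 3.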
This is a particularly simple type of gaps-and-islands problem.
You can simply use lag() to determine the first row of each bunch of rows and then a lead() to get date_new:
select id, date as date_min,
lead(date, 1, max_date) over (order by date) as date_new
from (select t.*,
lag(id) over (order by date) as prev_id,
max(date) over () as max_date
from t
) t
where prev_id is null or prev_id <> id;
Here is a db<>fiddle.
Three window functions and no aggregation: this should be by far the fastest approach to this problem.
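A quick way to check the lag()/lead() query is to run a port of it in SQLite through Python's sqlite3 module. In this sketch the table name visits, the column name d, and the ISO date strings are assumptions, and COALESCE around a one-argument LEAD is used as a stand-in for the three-argument lead:

```python
import sqlite3

ROWS = [("23", "2020-01-01"), ("23", "2020-01-03"), ("23", "2020-01-04"),
        ("56", "2020-01-07"), ("56", "2020-01-08"), ("87", "2020-01-11"),
        ("23", "2020-01-12"), ("23", "2020-01-18")]

QUERY = """
SELECT id, d AS date_min,
       -- the window runs over the rows that survive the WHERE filter,
       -- so LEAD sees only the first row of each run
       COALESCE(LEAD(d) OVER (ORDER BY d), max_d) AS date_new
FROM (SELECT id, d,
             LAG(id) OVER (ORDER BY d) AS prev_id,
             MAX(d) OVER () AS max_d
      FROM visits)
WHERE prev_id IS NULL OR prev_id <> id
ORDER BY d
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (id TEXT, d TEXT)")
con.executemany("INSERT INTO visits VALUES (?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(row)
```

The four rows printed correspond to the four rows of the desired output, with the last run falling back to the overall maximum date.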
I need a suggestion for making this dynamic on dates.
Expected:
Date, Total Sellers, Sellers From Previous Date
Currently:
Data in table(active_seller_codes): date, seller_code
Queries:
-- Date Wise Sellers Count
select date,count(distinct seller_code) as Sellers_Count
from active_seller_codes where date between '2016-12-15' AND '2016-12-15'
group by 1
-- Sellers from previous Days
select date,count(distinct seller_code) as Last_Day_Seller
from active_seller_codes
where date between '2016-12-15' AND '2016-12-15'
and seller_code IN(
select seller_code from active_seller_codes
where date between '2016-12-14' AND '2016-12-14'
)
group by 1
Database Using: Vertica
Reading attentively, you seem to want one row in the report, with the data from the search date in the first two columns and the data of the day before the search date in the third and fourth column, like so:
sales_date|sellers_count|prev_date |prev_sellers_count
2016-12-15| 8|2016-12-14| 5
The solution could be something like this. You would leave out the first Common Table Expression: in my case it contains the data, but in your case the data would be in your active_seller_codes table.
WITH
-- initial input
active_seller_codes(sales_date,seller_code) AS (
SELECT DATE '2016-12-15',42
UNION ALL SELECT DATE '2016-12-15',43
UNION ALL SELECT DATE '2016-12-15',44
UNION ALL SELECT DATE '2016-12-15',45
UNION ALL SELECT DATE '2016-12-15',46
UNION ALL SELECT DATE '2016-12-15',47
UNION ALL SELECT DATE '2016-12-15',48
UNION ALL SELECT DATE '2016-12-15',49
UNION ALL SELECT DATE '2016-12-14',42
UNION ALL SELECT DATE '2016-12-14',44
UNION ALL SELECT DATE '2016-12-14',46
UNION ALL SELECT DATE '2016-12-14',48
UNION ALL SELECT DATE '2016-12-14',50
UNION ALL SELECT DATE '2016-12-13',42
UNION ALL SELECT DATE '2016-12-13',43
UNION ALL SELECT DATE '2016-12-13',44
UNION ALL SELECT DATE '2016-12-13',45
UNION ALL SELECT DATE '2016-12-13',46
UNION ALL SELECT DATE '2016-12-13',47
UNION ALL SELECT DATE '2016-12-13',48
UNION ALL SELECT DATE '2016-12-13',49
)
,
-- search argument: in the real query, this would come just after the WITH keyword,
-- as the above would be the source table
search_dt(search_dt) AS (SELECT DATE '2016-12-15')
,
-- the two days we're interested in, de-duped
distinct_two_days AS (
SELECT DISTINCT
sales_date
, seller_code
FROM active_seller_codes
WHERE sales_date IN (
SELECT search_dt FROM search_dt -- the search date
UNION ALL SELECT search_dt - 1 FROM search_dt -- the day before
)
)
,
-- the two days we want one above the other,
-- with index for the final pivot
vertical AS (
SELECT
ROW_NUMBER() OVER (ORDER BY sales_date DESC) AS idx
, sales_date
, count(DISTINCT seller_code) AS seller_count
FROM distinct_two_days
GROUP BY 2
)
SELECT
MAX(CASE idx WHEN 1 THEN sales_date END) AS sales_date
, SUM(CASE idx WHEN 1 THEN seller_count END) AS sellers_count
, MAX(CASE idx WHEN 2 THEN sales_date END) AS prev_date
, SUM(CASE idx WHEN 2 THEN seller_count END) AS prev_sellers_count
FROM vertical
;
sales_date|sellers_count|prev_date |prev_sellers_count
2016-12-15| 8|2016-12-14| 5
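If the goal is one row per date without hard-coding the search date, a different technique from the pivot above is a self-join on the previous day. The following is only a sketch under assumptions: it runs in SQLite through Python's sqlite3 module, names the date column d, uses SQLite's date(d, '-1 day') for the day arithmetic, and reads "sellers from previous date" as sellers active on both days, as in the question's second query:

```python
import sqlite3

# Same sample data as the answer above, keyed by day.
DAYS = {
    "2016-12-15": range(42, 50),
    "2016-12-14": [42, 44, 46, 48, 50],
    "2016-12-13": range(42, 50),
}

QUERY = """
SELECT cur.d,
       COUNT(DISTINCT cur.seller_code)  AS sellers_count,
       -- sellers of this day that were also active the day before
       COUNT(DISTINCT prev.seller_code) AS sellers_from_previous_date
FROM active_seller_codes AS cur
LEFT JOIN active_seller_codes AS prev
       ON prev.seller_code = cur.seller_code
      AND prev.d = date(cur.d, '-1 day')
GROUP BY cur.d
ORDER BY cur.d
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE active_seller_codes (d TEXT, seller_code INTEGER)")
con.executemany(
    "INSERT INTO active_seller_codes VALUES (?, ?)",
    [(day, code) for day, codes in DAYS.items() for code in codes],
)
per_day = con.execute(QUERY).fetchall()
con.close()

for row in per_day:
    print(row)
```

In Vertica the join condition would need that dialect's date arithmetic instead of SQLite's date() function.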
I have a table containing Dates and Statuses. I wish to get the date that the status changed to the most recent status. Sample data:
DATE STATUS
01/01/2000 P
02/01/2000 A
03/01/2000 C
04/01/2000 A
05/01/2000 A
06/01/2000 A
So in this instance the most recent status is A and it changed to this on 04/01/2000. (The 02/01/2000 row should be ignored in this situation)
Any suggestions for how to go about selecting this row?
At first, I misunderstood the question. You need to get the earliest date of the last status.
You can group sequences of like statuses using a trick -- a difference of row numbers. The difference (in the query below) is constant for sequences that are the same. Then you can use aggregation to get the minimum date and select the latest one:
select mindate
from (select min(date) as mindate
from (select t.*,
row_number() over (order by date) as seqnum1,
row_number() over (partition by status order by date) as seqnum2
from table t
) t
group by status, (seqnum1 - seqnum2)
order by mindate desc
) t
where rownum = 1
EDIT:
In any case, the right way to do this is using lag():
select max(date)
from (select t.*, lag(status) over (order by date) as prev_status
from table t
)
where prev_status <> status or prev_status is null;
Here is the SQL Fiddle.
You can do this using lag or lead. Here I'm using lead, ordering by date descending to find the previous status date (if it's null I'm just supplying the date, which is needed in case there's only one record).
select max(date)
from (
select status, date, nvl(lead(status) over (order by date desc),date) as previous_status
from t
order by date desc
)
where status <> previous_status;
Something like this ought to do the trick:
with sample_data as (select to_date('01/01/2000', 'dd/mm/yyyy') dt, 'P' status from dual union all
select to_date('02/01/2000', 'dd/mm/yyyy') dt, 'A' status from dual union all
select to_date('03/01/2000', 'dd/mm/yyyy') dt, 'C' status from dual union all
select to_date('04/01/2000', 'dd/mm/yyyy') dt, 'A' status from dual union all
select to_date('05/01/2000', 'dd/mm/yyyy') dt, 'A' status from dual union all
select to_date('06/01/2000', 'dd/mm/yyyy') dt, 'A' status from dual),
results1 as (select dt,
status,
row_number() over (order by dt) - row_number() over (partition by status order by dt) grp
from sample_data),
results2 as (select status, min(dt) min_dt, grp, max(min(dt)) over () max_min_dt
from results1
group by status, grp)
select status, min_dt
from results2
where min_dt = max_min_dt;
STATUS MIN_DT
------ ----------
A 04/01/2000
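The difference-of-row-numbers trick used in results1 can be traced by hand with a short plain-Python sketch. The function name latest_run_start is made up for illustration, and the sample dates are read as day/month/year:

```python
from collections import defaultdict
from datetime import date

rows = [(date(2000, 1, 1), "P"), (date(2000, 1, 2), "A"), (date(2000, 1, 3), "C"),
        (date(2000, 1, 4), "A"), (date(2000, 1, 5), "A"), (date(2000, 1, 6), "A")]

def latest_run_start(rows):
    """Return (status, first date) of the most recent run of equal statuses."""
    per_status = defaultdict(int)   # row_number() partitioned by status
    groups = {}                     # (status, grp) -> earliest date in that run
    for overall, (day, status) in enumerate(sorted(rows), start=1):
        per_status[status] += 1
        grp = overall - per_status[status]      # constant within a consecutive run
        groups.setdefault((status, grp), day)   # first date seen is the minimum
    (status, _), start = max(groups.items(), key=lambda item: item[1])
    return status, start

print(latest_run_start(rows))  # ('A', datetime.date(2000, 1, 4))
```

Here the differences come out as 0 for P, 1 for the isolated A, and 2 for both C and the final run of A, so grouping by status together with the difference keeps the two A runs apart.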