Retrieve rows for a time interval but also the previous row of each - how to? - sql

I have a table like this:
Id FKId Amount1 Amount2 Date
-----------------------------------------------------
1 1 100,0000 33,0000 2018-01-18 19:57:39.403
2 2 50,0000 10,0000 2018-01-19 19:57:57.097
3 1 130,0000 40,0000 2018-01-20 19:58:13.660
5 2 44,0000 2,0000 2018-01-21 11:11:00.000
How can I get the rows with Id 3 and 5 (all rows dated 2018-01-20 or 2018-01-21) together with the previous row for each FKId (rows 1 and 2)?
Thank you

In most databases, you can use the ANSI standard lead() function:
select t.*
from (select t.*, lead(date) over (partition by fkid order by date) as next_date
from t
) t
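-- Date includes a time of day, so match a half-open range rather than exact dates
where (date >= '2018-01-20' and date < '2018-01-22') or
      (next_date >= '2018-01-20' and next_date < '2018-01-22');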
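To sanity-check the lead() idea, here is a minimal sketch that runs it against the question's sample rows in an in-memory SQLite database through Python's sqlite3 module. Assumptions: SQLite 3.25 or later (for window functions), dates stored as ISO-8601 text, and a half-open range test instead of date IN (...), because the Date values carry a time of day and never equal a bare '2018-01-20'.

```python
import sqlite3

# Sample rows from the question; the comma decimals (100,0000) are written as floats here.
ROWS = [
    (1, 1, 100.0, 33.0, "2018-01-18 19:57:39.403"),
    (2, 2, 50.0, 10.0, "2018-01-19 19:57:57.097"),
    (3, 1, 130.0, 40.0, "2018-01-20 19:58:13.660"),
    (5, 2, 44.0, 2.0, "2018-01-21 11:11:00.000"),
]

# Keep a row if it falls in [2018-01-20, 2018-01-22), or if the NEXT row for the
# same fkid does (which makes it the "previous row" of an in-range row).
QUERY = """
SELECT id
FROM (SELECT t.*,
             LEAD(date) OVER (PARTITION BY fkid ORDER BY date) AS next_date
      FROM t) AS x
WHERE (date >= '2018-01-20' AND date < '2018-01-22')
   OR (next_date >= '2018-01-20' AND next_date < '2018-01-22')
ORDER BY id
"""


def rows_with_previous():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, fkid INTEGER, amount1 REAL, amount2 REAL, date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


print(rows_with_previous())  # [1, 2, 3, 5]
```

Rows 1 and 2 are returned only because the next row for their FKId falls inside the range.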
Alternatively, if you just want all records dated on or after some date, plus the record just before them for each fkid, this logic also works:
select t.*
from t
where t.date >= coalesce((select max(t2.date)
                          from t t2
                          where t2.fkid = t.fkid and t2.date < '2018-01-20'
                         ),
                         '2018-01-20'  -- fkids with no earlier row: start at the cutoff
                        );
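A small sketch of the max()-subquery variant, again on SQLite via Python's sqlite3. Row 6 (FKId 3) is a hypothetical addition, not from the question: it has no row before the cutoff, so the subquery returns NULL and a bare comparison drops it, while wrapping the subquery in COALESCE with the cutoff keeps it.

```python
import sqlite3

# (id, fkid, date); row 6 is hypothetical: fkid 3 has no row before the cutoff.
ROWS = [
    (1, 1, "2018-01-18 19:57:39.403"),
    (2, 2, "2018-01-19 19:57:57.097"),
    (3, 1, "2018-01-20 19:58:13.660"),
    (5, 2, "2018-01-21 11:11:00.000"),
    (6, 3, "2018-01-21 09:00:00.000"),
]

# Bare comparison: when MAX() finds nothing it returns NULL and the row is lost.
BARE = """
SELECT id FROM t
WHERE t.date >= (SELECT MAX(t2.date) FROM t AS t2
                 WHERE t2.fkid = t.fkid AND t2.date < :cutoff)
ORDER BY id
"""

# Falling back to the cutoff keeps groups whose MAX() subquery returns NULL.
WITH_FALLBACK = """
SELECT id FROM t
WHERE t.date >= COALESCE((SELECT MAX(t2.date) FROM t AS t2
                          WHERE t2.fkid = t.fkid AND t2.date < :cutoff),
                         :cutoff)
ORDER BY id
"""


def run(query, cutoff="2018-01-20"):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, fkid INTEGER, date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(query, {"cutoff": cutoff})]
    conn.close()
    return ids


print(run(BARE))           # [1, 2, 3, 5]  (row 6 is dropped: t.date >= NULL is never true)
print(run(WITH_FALLBACK))  # [1, 2, 3, 5, 6]
```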

Related

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using lag(price) over (partition by id order by date), but I can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, you can generate row numbers for each id (assuming there are multiple id values) in the source data, ordered by date. The rows with row number 1 (the minimum date for each id) serve as the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To reuse the price values from the original data, use a subquery to get the price for that new date; if there is no such value (no row for that date), fall back to the previous price value from the CTE with the COALESCE function.
For SQL Server, the query can look like this:
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) so the recursion can run through all the steps: the default limit is 100 recursion levels, which is not enough to cover the whole date range.
The same approach for MySQL (CTEs require MySQL 8.0 or later):
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
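Both queries produce the same results; the only difference is the engine-specific date functions. MySQL also limits recursion depth, through the cte_max_recursion_depth system variable (default 1000), so a longer date range requires raising it.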
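For experimenting without SQL Server or MySQL, the same recursion can be sketched in SQLite through Python's sqlite3 module. This is an adaptation under assumptions: SQLite 3.25+ for ROW_NUMBER(), date(..., '+1 day') in place of DATEADD/DATE_ADD, a fixed :end_date parameter instead of GETDATE()/NOW() so the output is repeatable, and a LEFT JOIN in place of the correlated price subquery (both look up the price stored for the new day). The sample rows are the ones used in the calendar-table example further down.

```python
import sqlite3

# (id, price, date) rows, taken from the calendar-table example's sample data.
ROWS = [
    (1, 2.0, "2022-05-21"),
    (1, 3.0, "2022-05-25"),
    (1, 2.5, "2022-05-28"),
    (2, 10.0, "2022-05-25"),
    (2, 100.0, "2022-05-30"),
]

QUERY = """
WITH RECURSIVE cte(id, date, price) AS (
    -- Anchor: the earliest row for each id (rn = 1).
    SELECT id, date, price
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
          FROM tbl) AS s
    WHERE rn = 1
    UNION ALL
    -- Step: move one day forward; use that day's price if stored, else carry the old one.
    SELECT cte.id, date(cte.date, '+1 day'), COALESCE(tbl.price, cte.price)
    FROM cte
    LEFT JOIN tbl ON tbl.id = cte.id AND tbl.date = date(cte.date, '+1 day')
    WHERE date(cte.date, '+1 day') <= :end_date
)
SELECT id, date, price FROM cte ORDER BY id, date
"""


def daily_prices(end_date):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, price REAL, date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, {"end_date": end_date}).fetchall()
    conn.close()
    return rows


for row in daily_prices("2022-05-30"):
    print(row)
```

With an end date of 2022-05-30 this yields 10 daily rows for id 1 and 6 for id 2.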
For MySQL versions before 8.0, which lack CTE support and so cannot generate the date range on the fly, you can use a calendar table instead.
Assuming the calendar table has a column holding the date values (call it date for simplicity), you can CROSS JOIN it with the distinct id values in your table to produce, for each id, the calendar dates from that id's first date up to today. Then a correlated subquery fetches the latest price stored for that id on or before each date.
So the query would look like this:
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so
id|price|date
1|2|2022-05-21
1|3|2022-05-25
1|2.5|2022-05-28
2|10|2022-05-25
2|100|2022-05-30
the query produces the following results:
id|date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
1|2022-05-24|2
1|2022-05-25|3
1|2022-05-26|3
1|2022-05-27|3
1|2022-05-28|2.5
1|2022-05-29|2.5
1|2022-05-30|2.5
2|2022-05-25|10
2|2022-05-26|10
2|2022-05-27|10
2|2022-05-28|10
2|2022-05-29|10
2|2022-05-30|100
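The calendar-table query itself is portable enough to try in SQLite via Python's sqlite3 module. In this sketch the calendar is filled from Python with 2022-05-20 through 2022-05-30, matching the pseudo-calendar described above, and a fixed :end_date parameter stands in for NOW(); both are assumptions made so the output is repeatable.

```python
import sqlite3
from datetime import date, timedelta

# (id, price, date) sample rows from the answer.
ROWS = [
    (1, 2.0, "2022-05-21"),
    (1, 3.0, "2022-05-25"),
    (1, 2.5, "2022-05-28"),
    (2, 10.0, "2022-05-25"),
    (2, 100.0, "2022-05-30"),
]

# Cross join calendar x ids, keep dates from each id's first date to :end_date,
# then pick the latest price stored on or before each date.
QUERY = """
SELECT d.id,
       d.date,
       (SELECT price
        FROM tbl
        WHERE tbl.id = d.id AND tbl.date <= d.date
        ORDER BY tbl.date DESC
        LIMIT 1) AS price
FROM (SELECT t.id, c.date
      FROM calendar AS c
      CROSS JOIN (SELECT DISTINCT id FROM tbl) AS t
      WHERE c.date BETWEEN (SELECT MIN(tbl.date) FROM tbl WHERE tbl.id = t.id)
                       AND :end_date) AS d
ORDER BY d.id, d.date
"""


def build_calendar(conn, first, last):
    """Create a one-column calendar table holding every day from first to last."""
    conn.execute("CREATE TABLE calendar (date TEXT PRIMARY KEY)")
    day = first
    while day <= last:
        conn.execute("INSERT INTO calendar VALUES (?)", (day.isoformat(),))
        day += timedelta(days=1)


def daily_prices(end_date):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, price REAL, date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    build_calendar(conn, date(2022, 5, 20), date(2022, 5, 30))
    rows = conn.execute(QUERY, {"end_date": end_date}).fetchall()
    conn.close()
    return rows


for row in daily_prices("2022-05-30"):
    print(row)
```

It returns the same 16 rows as the table above.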

How to cross join but using latest value in BIGQUERY

I have the table below:
date|id|value
2021-01-01|1|3
2021-01-04|1|5
2021-01-05|1|10
And I expect output like the following, where the date column increases by one day per row and the value column carries forward the last value for each id:
date|id|value
2021-01-01|1|3
2021-01-02|1|3
2021-01-03|1|3
2021-01-04|1|5
2021-01-05|1|10
2021-01-06|1|10
I think I could use a cross join, but I can't get the expected output, and I suspect there is a special syntax or technique for this.
Consider the approach below:
select * from `project.dataset.table`
union all
select missing_date, prev_row.id, prev_row.value
from (
select *, lag(t) over(partition by id order by date) prev_row
from `project.dataset.table` t
), unnest(generate_date_array(prev_row.date + 1, date - 1)) missing_date
I would write this using:
select dte, t.id, t.value
from (select t.*,
lead(date) over (partition by id order by date) as next_date
from `table` t
) t cross join
unnest(generate_date_array(
date,
ifnull(
date_add(next_date, interval -1 day), -- generate missing date rows
(select max(date) from `table`) -- add last row
)
)) dte;
Note that this requires no union all, and it fills in the values without a window function such as last_value(); lead() is only used to find where each row's date range ends.
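The lead() plus generate_date_array logic can be illustrated outside BigQuery with a plain-Python sketch: each row is paired with the next row for the same id (the role lead() plays) and expanded to one row per day up to the day before that next date. The end date for the last row is a parameter here (an assumption; 2021-01-06 reproduces the expected output in the question).

```python
from datetime import date, timedelta
from itertools import groupby


def fill_daily(rows, end):
    """Expand (date, id, value) rows to one row per day per id, up to `end`.

    Each row covers the days from its own date up to the day before the next
    row's date for the same id, or up to `end` for the last row of that id.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # by id, then date
    out = []
    for id_, group in groupby(ordered, key=lambda r: r[1]):
        group = list(group)
        next_rows = group[1:] + [None]  # like lead(): the following row, or None
        for (day, _, value), nxt in zip(group, next_rows):
            last = nxt[0] - timedelta(days=1) if nxt else end
            while day <= last:
                out.append((day, id_, value))
                day += timedelta(days=1)
    return out


source = [
    (date(2021, 1, 1), 1, 3),
    (date(2021, 1, 4), 1, 5),
    (date(2021, 1, 5), 1, 10),
]
for row in fill_daily(source, end=date(2021, 1, 6)):
    print(row)
```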
An alternative solution uses last_value(). You can explore the following query and customize the logic that generates the days, if needed. Note that the window has no partition by, so as written the query assumes a single id:
WITH
query AS (
SELECT
date,
id,
value
FROM
`mydataset.newtable`
ORDER BY
date ),
generated_days AS (
SELECT
day
FROM (
SELECT
MIN(date) min_dt,
MAX(date) max_dt
FROM
query),
UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day )
SELECT
g.day,
LAST_VALUE(q.id IGNORE NULLS) OVER(ORDER BY g.day) id,
LAST_VALUE(q.value IGNORE NULLS) OVER(ORDER BY g.day) value,
FROM
generated_days g
LEFT OUTER JOIN
query q
ON
g.day = q.date
ORDER BY
g.day

Count new entries day by day

I would like to count the new ids on each day. By "new", I mean new relative to the day before.
Assume we have a table:
Date|Id
2021-01-01|1
2021-01-02|4
2021-01-02|5
2021-01-02|6
2021-01-03|1
2021-01-03|5
2021-01-03|7
My desired output would look like this:
Date|Count(NewId)
2021-01-01|1
2021-01-02|3
2021-01-03|2
If "new" means the first day an id ever appears, you can use two levels of aggregation:
select date, count(*)
from (select id, min(date) as date
from t
group by id
) i
group by date
order by date;
If by "relative to the day before" you mean that you want to count someone as new whenever they have no record on the previous day, then use lag() . . . carefully:
select date,
sum(case when prev_date = date - interval '1' day then 0 else 1 end)
from (select t.*,
lag(date) over (partition by id order by date) as prev_date
from t
) t
group by date
order by date;
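The two queries above count different things, which is easy to see by running both on the question's data. A minimal sketch in SQLite via Python's sqlite3 (SQLite 3.25+ assumed for lag()); date(date, '-1 day') stands in for date - interval '1' day:

```python
import sqlite3

# (date, id) rows from the question.
ROWS = [
    ("2021-01-01", 1),
    ("2021-01-02", 4),
    ("2021-01-02", 5),
    ("2021-01-02", 6),
    ("2021-01-03", 1),
    ("2021-01-03", 5),
    ("2021-01-03", 7),
]

# "New" = the first day an id ever appears (two levels of aggregation).
FIRST_SEEN = """
SELECT date, COUNT(*)
FROM (SELECT id, MIN(date) AS date FROM t GROUP BY id) AS i
GROUP BY date
ORDER BY date
"""

# "New" = the same id has no row on the previous calendar day (lag()).
NOT_SEEN_YESTERDAY = """
SELECT date,
       SUM(CASE WHEN prev_date = date(date, '-1 day') THEN 0 ELSE 1 END)
FROM (SELECT t.*, LAG(date) OVER (PARTITION BY id ORDER BY date) AS prev_date
      FROM t) AS x
GROUP BY date
ORDER BY date
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date TEXT, id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(FIRST_SEEN))          # [('2021-01-01', 1), ('2021-01-02', 3), ('2021-01-03', 1)]
print(run(NOT_SEEN_YESTERDAY))  # [('2021-01-01', 1), ('2021-01-02', 3), ('2021-01-03', 2)]
```

Only the lag() version reproduces the desired 2 for 2021-01-03 (ids 1 and 7); the first-appearance version counts only id 7 there.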
Here is another way, probably the simplest:
select t1.Date, count(*) from table t1
where id not in (select id from table t2 where t2.date = t1.date - interval '1 day')
group by t1.Date
This other option might also do the job, but to be honest I would prefer Gordon Linoff's answer:
select date, count(*)
from your_table t
where not exists (
select 1
from your_table tt
where tt.Id=t.id
and tt.date = date_sub(t.date,1)
)
group by date

SQL: Get next date value per row

I have a table containing a date column.
ID | Date
----|-----------
1 | 2000-01-01
2 | 2000-02-01
3 | 2000-02-01
4 | 2000-03-01
I need a select that returns for each row, the ID, the Date and the smallest date (of all dates in the table) that is larger than the current date.
ID | Date | Next date
----+------------+------------
1 | 2000-01-01 | 2000-02-01
2 | 2000-02-01 | 2000-03-01
3 | 2000-02-01 | 2000-03-01
4 | 2000-03-01 | (NULL)
My first approach was
SELECT id, date, LEAD (date, 1) OVER (ORDER BY date NULLS LAST) AS next_date
FROM t
But this only works if the values in the DATE column are unique.
Any ideas?
You could use an analytic function with a windowing clause. lead() doesn't support a windowing clause, so you need to use one that does, like min() or first_value():
FIRST_VALUE ("Date")
OVER (ORDER BY "Date" RANGE BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
The default windowing clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which would give all your rows the same value of 2000-01-01. Using a ROWS window would run into the same problem you're having with lead() when dates are duplicated: ID 2 would still get 2000-02-01, and if you used ROWS BETWEEN CURRENT ROW ... rather than 1 FOLLOWING, ID 4 would get 2000-03-01 instead of null.
Demo using this range:
with t (ID, "Date") as (
select 1, date '2000-01-01' from dual
union all select 2, date '2000-02-01' from dual
union all select 3, date '2000-02-01' from dual
union all select 4, date '2000-03-01' from dual
)
select id, "Date", FIRST_VALUE ("Date") OVER (ORDER BY "Date"
RANGE BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS next_date
FROM t;
ID Date NEXT_DATE
---------- ---------- ----------
1 2000-01-01 2000-02-01
2 2000-02-01 2000-03-01
3 2000-02-01 2000-03-01
4 2000-03-01
Only rows whose date is at least one day later than the current row's are considered (in Oracle, a numeric RANGE offset on a DATE is measured in days). And this still only has to hit the table once.
(I've put "Date" in double-quotes because date is a reserved word; from your sample data it looks like a quoted identifier, but it isn't quoted in your query, so it's probably just got a more sensible name really...)
select t1.*, (select min(t2.date) from t t2 where t2.date > t1.date) as next_date
from t t1
The code above is for SQL Server.
To answer my own question. ;-) (Just to show another option to people stumbling across this post)
Another solution would be using a subselect:
SELECT t.id,
t.date,
(SELECT MIN (t2.date)
FROM t t2
WHERE t2.date > t.date)
AS next_date
FROM t;
One approach would be to create a CTE containing the distinct dates and their immediate lead values. Then, join this CTE to your original table on the date to get the final result.
WITH cte AS (
SELECT t.date,
LEAD(t.date, 1) OVER (ORDER BY t.date NULLS LAST) AS next_date
FROM (SELECT DISTINCT date FROM yourTable) t
)
SELECT
t1.ID,
t1.date,
t2.next_date
FROM yourTable t1
INNER JOIN cte t2
ON t1.date = t2.date
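A quick way to confirm that the distinct-dates CTE handles duplicate dates is to run it next to the correlated MIN() subquery on the question's four rows. This sketch uses SQLite via Python's sqlite3 (3.25+ assumed for lead()) and drops NULLS LAST, which is not needed when no dates are NULL.

```python
import sqlite3

ROWS = [(1, "2000-01-01"), (2, "2000-02-01"), (3, "2000-02-01"), (4, "2000-03-01")]

# Reduce to distinct dates first, so LEAD() steps to the next *different* date.
DISTINCT_LEAD = """
WITH cte AS (
    SELECT d.date, LEAD(d.date, 1) OVER (ORDER BY d.date) AS next_date
    FROM (SELECT DISTINCT date FROM yourTable) AS d
)
SELECT t1.id, t1.date, t2.next_date
FROM yourTable AS t1
INNER JOIN cte AS t2 ON t1.date = t2.date
ORDER BY t1.id
"""

# Correlated subquery: the smallest date strictly greater than the row's own date.
CORRELATED_MIN = """
SELECT t.id, t.date,
       (SELECT MIN(t2.date) FROM yourTable AS t2 WHERE t2.date > t.date) AS next_date
FROM yourTable AS t
ORDER BY t.id
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in run(DISTINCT_LEAD):
    print(row)
print(run(DISTINCT_LEAD) == run(CORRELATED_MIN))  # True
```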
Here is another approach, without "distinct":
select
ted.id,
ted.date_col,
(select
min(ted2.date_col)
from
test_date_v ted2
where
ted2.id != ted.id and
ted2.date_col > ted.date_col) next_date_col
from
test_date_v ted;

SQL Server- find all records within a certain date (not that straightforward!)

Ok. My SQL is pretty pants so I'm struggling to get my head around this.
I have a table that stores records complete with a time stamp.
What I want is a list of uids that have 2 or more records within 1 second of each other. Maybe I've made it more complicated in my head, but I just cannot figure it out.
Shortened version of table (pk ignored)
uid date
1 2015-01-01 10:00:30.020*
1 2015-01-01 10:00:30.300*
1 2015-01-01 10:00:30.500*
1 2015-01-01 10:00:39.000
1 2015-01-01 10:00:35.000
1 2015-01-01 10:00:37.800
2 2015-02-02 12:00:30.000
2 2015-02-02 14:00:30.000
2 2015-02-02 15:00:30.000
2 2015-02-02 18:00:30.000
3 2015-03-02 15:00:24.000
3 2015-03-02 15:00:20.000 *
3 2015-03-02 15:00:20.300 *
I've marked * next to the records I'd expect to match.
The result I'd like is just a list of uids, so here it would be:
1
3
You can do this with exists:
select distinct uid
from t
where exists (select 1
from t t2
where t2.uid = t.uid and
t2.date > t.date and
t2.date <= t.date + interval 1 second
);
Note: The syntax for adding 1 second varies by database. But the above gives the idea for the logic.
In SQL Server, the syntax is:
select distinct uid
from t
where exists (select 1
from t t2
where t2.uid = t.uid and
t2.date > t.date and
t2.date <= dateadd(second, 1, t.date)
);
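To try the EXISTS logic without SQL Server, here is a sketch in SQLite via Python's sqlite3 using the question's rows. SQLite has no dateadd(); strftime('%Y-%m-%d %H:%M:%f', date, '+1 seconds') is used instead to add one second while keeping milliseconds, which assumes the dates are stored as ISO-8601 text in exactly that format.

```python
import sqlite3

# (uid, date) rows from the question.
ROWS = [
    (1, "2015-01-01 10:00:30.020"),
    (1, "2015-01-01 10:00:30.300"),
    (1, "2015-01-01 10:00:30.500"),
    (1, "2015-01-01 10:00:39.000"),
    (1, "2015-01-01 10:00:35.000"),
    (1, "2015-01-01 10:00:37.800"),
    (2, "2015-02-02 12:00:30.000"),
    (2, "2015-02-02 14:00:30.000"),
    (2, "2015-02-02 15:00:30.000"),
    (2, "2015-02-02 18:00:30.000"),
    (3, "2015-03-02 15:00:24.000"),
    (3, "2015-03-02 15:00:20.000"),
    (3, "2015-03-02 15:00:20.300"),
]

# A uid qualifies if one of its rows has a later row for the same uid
# at most one second after it.
QUERY = """
SELECT DISTINCT uid
FROM t
WHERE EXISTS (SELECT 1
              FROM t AS t2
              WHERE t2.uid = t.uid
                AND t2.date > t.date
                AND t2.date <= strftime('%Y-%m-%d %H:%M:%f', t.date, '+1 seconds'))
ORDER BY uid
"""


def uids_within_one_second():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (uid INTEGER, date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    uids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return uids


print(uids_within_one_second())  # [1, 3]
```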
EDIT:
Or, in SQL Server 2012+, a faster alternative is to use lead() or lag():
select distinct uid
from (select t.*, lead(date) over (partition by uid order by date) as next_date
from t
) t
where next_date <= dateadd(second, 1, date);
If you want the records, not just the uids, then you need to check both the previous and the next date:
select t.*
from (select t.*,
lag(date) over (partition by uid order by date) as prev_date,
lead(date) over (partition by uid order by date) as next_date
from t
) t
where next_date <= dateadd(second, 1, date) or
prev_date >= dateadd(second, -1, date);
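The record-level version can be checked the same way. This SQLite sketch (3.25+ assumed for lag()/lead(), with strftime standing in for dateadd() as an assumption about how the dates are stored) should return exactly the rows marked with * in the question.

```python
import sqlite3

# (uid, date) rows from the question.
ROWS = [
    (1, "2015-01-01 10:00:30.020"),
    (1, "2015-01-01 10:00:30.300"),
    (1, "2015-01-01 10:00:30.500"),
    (1, "2015-01-01 10:00:39.000"),
    (1, "2015-01-01 10:00:35.000"),
    (1, "2015-01-01 10:00:37.800"),
    (2, "2015-02-02 12:00:30.000"),
    (2, "2015-02-02 14:00:30.000"),
    (2, "2015-02-02 15:00:30.000"),
    (2, "2015-02-02 18:00:30.000"),
    (3, "2015-03-02 15:00:24.000"),
    (3, "2015-03-02 15:00:20.000"),
    (3, "2015-03-02 15:00:20.300"),
]

# Keep a row if the next row for the same uid is at most one second later,
# or the previous row is at most one second earlier.
QUERY = """
SELECT uid, date
FROM (SELECT t.*,
             LAG(date) OVER (PARTITION BY uid ORDER BY date) AS prev_date,
             LEAD(date) OVER (PARTITION BY uid ORDER BY date) AS next_date
      FROM t) AS x
WHERE next_date <= strftime('%Y-%m-%d %H:%M:%f', date, '+1 seconds')
   OR prev_date >= strftime('%Y-%m-%d %H:%M:%f', date, '-1 seconds')
ORDER BY uid, date
"""


def matching_records():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (uid INTEGER, date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in matching_records():
    print(row)
```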