SQL Server- find all records within a certain date (not that straightforward!)


Ok. My SQL is pretty pants so I'm struggling to get my head around this.
I have a table that stores records complete with a time stamp.
What I want is a list of uids where there are 2 or more records for that user within 1 second of each other. Maybe I've made it more complicated in my head, but I just cannot figure it out.
Shortened version of table (pk ignored)
uid  date
1    2015-01-01 10:00:30.020 *
1    2015-01-01 10:00:30.300 *
1    2015-01-01 10:00:30.500 *
1    2015-01-01 10:00:39.000
1    2015-01-01 10:00:35.000
1    2015-01-01 10:00:37.800
2    2015-02-02 12:00:30.000
2    2015-02-02 14:00:30.000
2    2015-02-02 15:00:30.000
2    2015-02-02 18:00:30.000
3    2015-03-02 15:00:24.000
3    2015-03-02 15:00:20.000 *
3    2015-03-02 15:00:20.300 *
I've marked * next to the records I'd expect to match.
The results list I'd like is just a list of uid, so the result I'd want would just be
1
3

You can do this with exists:
select distinct uid
from t
where exists (select 1
from t t2
where t2.uid = t.uid and
t2.date > t.date and
t2.date <= t.date + interval 1 second
);
Note: The syntax for adding 1 second varies by database. But the above gives the idea for the logic.
In SQL Server, the syntax is:
select distinct uid
from t
where exists (select 1
from t t2
where t2.uid = t.uid and
t2.date > t.date and
t2.date <= dateadd(second, 1, t.date)
);
EDIT:
Or, in SQL Server 2012+, a faster alternative is to use lead() or lag():
select distinct uid
from (select t.*, lead(date) over (partition by uid order by date) as next_date
      from t
     ) t
where next_date <= dateadd(second, 1, date);
If you want the records, not just the uids, then you need to get both:
select t.*
from (select t.*,
lag(date) over (partition by uid order by date) as prev_date,
lead(date) over (partition by uid order by date) as next_date
from t
) t
where next_date <= dateadd(second, 1, date) or
prev_date >= dateadd(second, -1, date);
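
To sanity-check the logic against the sample data in the question, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is only a stand-in for SQL Server here: it has no dateadd(), so strftime() with a '+1 second' modifier is assumed to play that role, and the lead() query needs SQLite 3.25 or later. The table name t comes from the answer above; the variable names are just for illustration.

```python
import sqlite3

# Sample rows from the question: (uid, timestamp stored as text).
rows = [
    (1, "2015-01-01 10:00:30.020"),
    (1, "2015-01-01 10:00:30.300"),
    (1, "2015-01-01 10:00:30.500"),
    (1, "2015-01-01 10:00:39.000"),
    (1, "2015-01-01 10:00:35.000"),
    (1, "2015-01-01 10:00:37.800"),
    (2, "2015-02-02 12:00:30.000"),
    (2, "2015-02-02 14:00:30.000"),
    (2, "2015-02-02 15:00:30.000"),
    (2, "2015-02-02 18:00:30.000"),
    (3, "2015-03-02 15:00:24.000"),
    (3, "2015-03-02 15:00:20.000"),
    (3, "2015-03-02 15:00:20.300"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (uid INTEGER, date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# EXISTS version: is there a later row for the same uid at most 1 second away?
exists_sql = """
    SELECT DISTINCT uid
    FROM t
    WHERE EXISTS (SELECT 1
                  FROM t t2
                  WHERE t2.uid = t.uid
                    AND t2.date > t.date
                    AND t2.date <= strftime('%Y-%m-%d %H:%M:%f', t.date, '+1 second'))
    ORDER BY uid
"""

# LEAD version: compare each row only with the next row of the same uid.
lead_sql = """
    SELECT DISTINCT uid
    FROM (SELECT uid, date,
                 LEAD(date) OVER (PARTITION BY uid ORDER BY date) AS next_date
          FROM t) AS s
    WHERE next_date <= strftime('%Y-%m-%d %H:%M:%f', date, '+1 second')
    ORDER BY uid
"""

exists_uids = [r[0] for r in conn.execute(exists_sql)]
lead_uids = [r[0] for r in conn.execute(lead_sql)]
print(exists_uids, lead_uids)
```

Both queries should agree on this data, since a uid has two records within a second of each other exactly when some record's immediate successor is within a second.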

Related

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using the lag(price) over (partition by id order by date)
But i can't get it right.

I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, generate row numbers for each id (assuming multiple id values) in the source data, ordered by date. The rows with row number 1 (the minimum date for each id) are the starting point for the recursion. Each recursive step then uses the DATEADD function to generate the row for the next day. To carry over the price values from the original data, a subquery looks up the price for the new date; if there is no row for that date, COALESCE falls back to the previous price value from the CTE.
For SQL Server, the query can look like this:
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) so the recursion can run through all the steps. The default limit is 100, which is not enough to complete the recursion here.
The same approach for MySQL (you need MySQL 8.0 or later to use CTEs):
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
Both queries produce the same results; the only difference is the use of each engine's specific date functions.
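
For anyone who wants to try the recursion without a database server, here is a rough sketch of the same idea in Python with the standard-library sqlite3 module. Several things are assumptions on my side: SQLite stands in for SQL Server/MySQL (window functions need SQLite 3.25+), date(x, '+1 day') replaces DATEADD/DATE_ADD, a LEFT JOIN replaces the correlated price lookup, and a fixed :end_date parameter replaces GETDATE()/NOW() so the output does not change from day to day. The sample rows are the ones used further down for the calendar-table query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, price REAL, date TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, 2, "2022-05-21"),
        (1, 3, "2022-05-25"),
        (1, 2.5, "2022-05-28"),
        (2, 10, "2022-05-25"),
        (2, 100, "2022-05-30"),
    ],
)

fill_sql = """
    WITH RECURSIVE cte AS (
        -- Anchor: the earliest row for each id.
        SELECT id, date, price
        FROM (SELECT *,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
              FROM tbl) AS first_rows
        WHERE rn = 1
        UNION ALL
        -- Step: move one day forward; use that day's price if a row exists,
        -- otherwise carry the previous price forward.
        SELECT cte.id,
               date(cte.date, '+1 day'),
               COALESCE(nxt.price, cte.price)
        FROM cte
        LEFT JOIN tbl AS nxt
               ON nxt.id = cte.id AND nxt.date = date(cte.date, '+1 day')
        WHERE date(cte.date, '+1 day') <= :end_date
    )
    SELECT id, date, price FROM cte ORDER BY id, date
"""

# Hypothetical fixed end date instead of "today".
filled = conn.execute(fill_sql, {"end_date": "2022-05-30"}).fetchall()
for row in filled:
    print(row)
```

Each id advances one day per recursive step until the end date is reached, so the number of steps grows with the length of the date range, which is why the SQL Server version needs the MAXRECURSION hint.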
For MySQL versions below 8.0, you can use a calendar table, since you don't have CTE support and can't generate the required date range.
Assuming there is a column in the calendar table to store date values (let's call it date for simplicity), you can use the CROSS JOIN operator to generate date ranges for the id values in your table. Then you can use a subquery to get the latest price value stored in the table for the corresponding date or before it.
So the query would be like this:
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so
id|price|date
1|2|2022-05-21
1|3|2022-05-25
1|2.5|2022-05-28
2|10|2022-05-25
2|100|2022-05-30
the query produces the following results
id|date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
1|2022-05-24|2
1|2022-05-25|3
1|2022-05-26|3
1|2022-05-27|3
1|2022-05-28|2.5
1|2022-05-29|2.5
1|2022-05-30|2.5
2|2022-05-25|10
2|2022-05-26|10
2|2022-05-27|10
2|2022-05-28|10
2|2022-05-29|10
2|2022-05-30|100

select rows in sql with end_date >= start_date for each ID repeated multiple times

In my table I have 3 columns (id, start date, and end date) with values like this:
id start date end date
-------------------------------
100 2015-01-01 2015-12-31
100 2016-01-10 2018-12-31
200 2015-02-15 2016-03-15
200 2016-03-15 2016-12-31
300 2016-01-01 2016-12-31
400 2017-01-01 2017-12-31
500 2017-02-01 2017-12-31
600 2017-01-15 2017-03-05
600 2017-02-01 2018-12-31
I want my output to be
id start date end date
--------------------------------
100 2015-01-01 2015-12-31
100 2016-01-10 2018-12-31
200 2015-02-15 2016-12-31
300 2016-01-01 2016-12-31
400 2017-01-01 2017-12-31
500 2017-02-01 2017-12-31
600 2017-01-15 2018-12-31
Query:
select
id, *
from
dbo.test_sl
where
id in (select id
from dbo.test_sl
where end_date >= start_date
group by id)
Please help me get the output I am looking for.

This is an example of a gaps-and-islands problem. In this case, you want to find the rows that do not overlap with an earlier row for the same id. These are the starts of groups. A cumulative sum of the group starts provides a grouping number, which can be used for aggregation.
In a query, this looks like:
select id, min(start_date) as start_date, max(end_date) as end_date
from (select t.*,
             sum(isstart) over (partition by id order by start_date) as grp
      from (select t.*,
                   -- a row starts a new group unless an earlier row overlaps it
                   (case when exists (select 1
                                      from test_sl t2
                                      where t2.id = t.id and
                                            t2.start_date < t.start_date and
                                            t2.end_date >= t.start_date
                                     )
                         then 0 else 1
                    end) as isstart
            from test_sl t
           ) t
     ) t
group by id, grp;
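
Here is a small sketch that runs this grouping logic on the question's sample rows, using Python's standard-library sqlite3 module as a stand-in for SQL Server (an assumption on my part; the window sum needs SQLite 3.25+). The column names start_date and end_date follow the query in the question, and the merged_start / merged_end aliases are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_sl (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO test_sl VALUES (?, ?, ?)",
    [
        (100, "2015-01-01", "2015-12-31"),
        (100, "2016-01-10", "2018-12-31"),
        (200, "2015-02-15", "2016-03-15"),
        (200, "2016-03-15", "2016-12-31"),
        (300, "2016-01-01", "2016-12-31"),
        (400, "2017-01-01", "2017-12-31"),
        (500, "2017-02-01", "2017-12-31"),
        (600, "2017-01-15", "2017-03-05"),
        (600, "2017-02-01", "2018-12-31"),
    ],
)

islands_sql = """
    SELECT id, MIN(start_date) AS merged_start, MAX(end_date) AS merged_end
    FROM (SELECT s.*,
                 -- running count of group starts = group number within the id
                 SUM(isstart) OVER (PARTITION BY id ORDER BY start_date) AS grp
          FROM (SELECT t.*,
                       -- 1 when no earlier row of the same id reaches this start_date
                       CASE WHEN EXISTS (SELECT 1
                                         FROM test_sl t2
                                         WHERE t2.id = t.id
                                           AND t2.start_date < t.start_date
                                           AND t2.end_date >= t.start_date)
                            THEN 0 ELSE 1
                       END AS isstart
                FROM test_sl t) AS s) AS g
    GROUP BY id, grp
    ORDER BY id, merged_start
"""

merged = conn.execute(islands_sql).fetchall()
for row in merged:
    print(row)
```

On this data, id 100 stays as two rows because its second period starts after the first one ends, while ids 200 and 600 collapse into a single period each.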

Assuming that only two records can be combined together, you can LEFT JOIN the table with itself and then use a CASE to display the end date of the self-joined record, if available.
SELECT
    t1.id,
    min(t1.start_date) AS start_date,
    CASE WHEN t2.end_date IS NULL THEN t1.end_date ELSE t2.end_date END AS end_date
FROM
    test_sl t1
    -- t2 is a later row of the same id that starts before t1 ends
    LEFT JOIN test_sl t2
        ON t1.id = t2.id
        AND t2.start_date > t1.start_date
        AND t2.start_date <= t1.end_date
GROUP BY
    t1.id,
    CASE WHEN t2.end_date IS NULL THEN t1.end_date ELSE t2.end_date END
ORDER BY 1
Tested in this SQL Fiddle

Here's a solution that uses a Recursive CTE.
It basically loops through the dates per id, and keeps the smallest start_date for the overlapping end_date/start_date.
Then the result is grouped so there are no more overlaps.
Test here on rextester.
WITH SRC AS
(
SELECT id, start_date, end_date,
row_number() over (partition by id order by start_date) as rn
FROM test_sl
)
, RCTE AS
(
SELECT id, rn, start_date, end_date
FROM SRC
WHERE rn = 1
UNION ALL
SELECT t.id, t.rn, iif(r.end_date >= t.start_date, r.start_date, t.start_date), t.end_date
FROM RCTE r
JOIN SRC t ON t.id = r.id AND t.rn = r.rn + 1
)
SELECT id, start_date, max(end_date) as end_date
FROM RCTE
GROUP BY id, start_date
ORDER BY id, start_date;

get max date when sum of a field equals a value

I have a problem with writing a query.
The raw data is as follows:
DATE CUSTOMER_ID AMOUNT
20170101 1 150
20170201 1 50
20170203 1 200
20170204 1 250
20170101 2 300
20170201 2 70
I want to know when (on which date) the sum of amount for each customer_id becomes more than 350.
How can I write this query to get a result like this?
CUSTOMER_ID MAX_DATE
1 20170203
2 20170201
Thanks,

Simply use ANSI/ISO standard window functions to calculate the running sum:
select t.*
from (select t.*,
sum(t.amount) over (partition by t.customer_id order by t.date) as running_amount
from t
) t
where running_amount - amount < 350 and
running_amount >= 350;
If for some reason, your database doesn't support this functionality, you can use a correlated subquery:
select t.*
from (select t.*,
(select sum(t2.amount)
from t t2
where t2.customer_id = t.customer_id and
t2.date <= t.date
) as running_amount
from t
) t
where running_amount - amount < 350 and
running_amount >= 350;
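
As a quick check of both variants on the question's data, here is a minimal sketch in Python with the standard-library sqlite3 module. Using SQLite is my assumption (the window version needs SQLite 3.25+); the table name t matches the queries above and the dates are kept as the YYYYMMDD strings from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("20170101", 1, 150),
        ("20170201", 1, 50),
        ("20170203", 1, 200),
        ("20170204", 1, 250),
        ("20170101", 2, 300),
        ("20170201", 2, 70),
    ],
)

# Running sum with a window function.
window_sql = """
    SELECT customer_id, date
    FROM (SELECT t.*,
                 SUM(amount) OVER (PARTITION BY customer_id ORDER BY date) AS running_amount
          FROM t) AS r
    WHERE running_amount - amount < 350
      AND running_amount >= 350
    ORDER BY customer_id
"""

# Same running sum with a correlated subquery.
subquery_sql = """
    SELECT customer_id, date
    FROM (SELECT t.*,
                 (SELECT SUM(t2.amount)
                  FROM t t2
                  WHERE t2.customer_id = t.customer_id
                    AND t2.date <= t.date) AS running_amount
          FROM t) AS r
    WHERE running_amount - amount < 350
      AND running_amount >= 350
    ORDER BY customer_id
"""

window_rows = conn.execute(window_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(window_rows, subquery_rows)
```

The pair of conditions picks the single row per customer where the running total crosses the threshold: the total before the row is still below 350 and the total including it is not.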

ANSI SQL
Used for the test: TSQL and MS SQL Server 2012
select
"CUSTOMER_ID",
min("DATE")
FROM
(
select
"CUSTOMER_ID",
"DATE",
(
SELECT
sum(T02."AMOUNT") AMOUNT
FROM "TABLE01" T02
WHERE
T01."CUSTOMER_ID" = T02."CUSTOMER_ID"
AND T02."DATE" <= T01."DATE"
) "AMOUNT"
from "TABLE01" T01
) T03
where
T03."AMOUNT" > 350
group by
"CUSTOMER_ID"
GO
CUSTOMER_ID | (No column name)
----------: | :------------------
1 | 03/02/2017 00:00:00
2 | 01/02/2017 00:00:00

SELECT
tmp.`CUSTOMER_ID`,
MIN(tmp.`DATE`) as MAX_DATE
FROM
(
SELECT
`DATE`,
`CUSTOMER_ID`,
`AMOUNT`,
(
SELECT SUM(`AMOUNT`) FROM tbl t2 WHERE t2.`DATE` <= t1.`DATE` AND `CUSTOMER_ID` = t1.`CUSTOMER_ID`
) AS SUM_UP
FROM
`tbl` t1
ORDER BY
`DATE` ASC
) tmp
WHERE
tmp.`SUM_UP` > 350
GROUP BY
tmp.`CUSTOMER_ID`
Explanation:
First I select all rows and, for each row, use a subselect to sum the amounts of all rows for the same customer whose DATE is earlier than or the same as the current row's DATE. From this table I select the MIN date that has a running sum of more than 350.

I don't think this is an easy calculation, so I want to work through it step by step. It may look a little convoluted, but once a first version works for your scenario, I believe its performance can be improved. If anybody can improve my query, please edit my post.
Unfortunately I could not try the solution below on a computer, but I expect it to give you the expected result:
-- Get the start date of each customer
SELECT CUSTOMER_ID
    ,MIN(DATE) AS DATE
INTO #table
FROM [TABLE]
GROUP BY CUSTOMER_ID;
-- For every row of a customer, sum the amounts from the start date up to that row's date
SELECT t1.CUSTOMER_ID
    ,t2.DATE AS DATE
    ,(SELECT SUM(t3.AMOUNT)
      FROM [TABLE] t3
      WHERE t3.CUSTOMER_ID = t1.CUSTOMER_ID
        AND t3.DATE BETWEEN t1.DATE AND t2.DATE) AS total
INTO #tableCalculated
FROM #table t1
INNER JOIN [TABLE] t2 ON t2.CUSTOMER_ID = t1.CUSTOMER_ID;
-- Select the earliest date per CUSTOMER_ID where the total is greater than 350
SELECT CUSTOMER_ID, MIN(DATE) AS DATE
FROM #tableCalculated
WHERE total > 350
GROUP BY CUSTOMER_ID;

SELECT CUSTOMER_ID, MIN(DATE) AS GOALDATE
FROM ( SELECT cd1.*, (SELECT SUM(AMOUNT)
FROM CustData cd2
WHERE cd2.CUSTOMER_ID = cd1.CUSTOMER_ID
AND cd2.DATE <= cd1.DATE) AS RUNNINGTOTAL
FROM CustData cd1) AS custdata2
WHERE RUNNINGTOTAL >= 350
GROUP BY CUSTOMER_ID

Retrieve rows for time interval but also previous row of each - how to?

I have a table like this:
Id FKId Amount1 Amount2 Date
-----------------------------------------------------
1 1 100,0000 33,0000 2018-01-18 19:57:39.403
2 2 50,0000 10,0000 2018-01-19 19:57:57.097
3 1 130,0000 40,0000 2018-01-20 19:58:13.660
5 2 44,0000 2,0000 2018-01-21 11:11:00.000
How do I get rows 3 - 5 (all that have dates 2018-01-20 or 2018-01-21) but also the previous row of each with respect to FKId (1 and 2)?
Thank you

In most databases, you can use the ANSI standard lead() function:
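select t.*
from (select t.*, lead(date) over (partition by fkid order by date) as next_date
      from t
     ) t
-- the column holds a time of day as well, so compare on the date part only
where cast(date as date) in ('2018-01-20', '2018-01-21') or
      cast(next_date as date) in ('2018-01-20', '2018-01-21');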
Alternatively, if you just want all records where the date is bigger than some date and the previous record, this logic also works:
select t.*
from t
where t.date >= (select max(t2.date)
from t t2
where t2.fkid = t.fkid and t2.date < '2018-01-20'
);
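
Here is a small runnable sketch of both queries, using Python's standard-library sqlite3 module in place of a real server (my assumption; lead() needs SQLite 3.25+, and SQLite's date() function is used to strip the time part). The row with id 0 is a hypothetical extra row I added so that there is something for the queries to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, fkid INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (0, 1, "2018-01-10 08:00:00.000"),  # hypothetical older row, should be excluded
        (1, 1, "2018-01-18 19:57:39.403"),
        (2, 2, "2018-01-19 19:57:57.097"),
        (3, 1, "2018-01-20 19:58:13.660"),
        (5, 2, "2018-01-21 11:11:00.000"),
    ],
)

# Keep a row if it, or the next row for the same fkid, falls on one of the dates.
lead_sql = """
    SELECT id
    FROM (SELECT t.*,
                 LEAD(date) OVER (PARTITION BY fkid ORDER BY date) AS next_date
          FROM t) AS s
    WHERE date(s.date) IN ('2018-01-20', '2018-01-21')
       OR date(s.next_date) IN ('2018-01-20', '2018-01-21')
    ORDER BY id
"""

# Keep everything from the last row before the cutoff onwards, per fkid.
cutoff_sql = """
    SELECT t.id
    FROM t
    WHERE t.date >= (SELECT MAX(t2.date)
                     FROM t t2
                     WHERE t2.fkid = t.fkid AND t2.date < '2018-01-20')
    ORDER BY t.id
"""

lead_ids = [r[0] for r in conn.execute(lead_sql)]
cutoff_ids = [r[0] for r in conn.execute(cutoff_sql)]
print(lead_ids, cutoff_ids)
```

One thing to watch with the second query: if an fkid has no row before the cutoff, the subquery returns NULL and all rows of that fkid are dropped.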

Adding a Date column based on the next row date value

I'm using SQL Server 2005. From the tbl_temp table below, I would like to add an EndDate column based on the next row's StartDate minus 1 day, until there's a change in the AID and UID combination. This calculated EndDate goes to the row above it as its EndDate. The last row of each AID and UID group gets the system date as its EndDate. The table has to be ordered by AID, UID, StartDate. Thanks for the help.
-- tbl_temp
AID UID StartDate
1 1 2013-02-20
2 1 2013-02-06
1 1 2013-02-21
1 1 2013-02-27
1 2 2013-02-02
1 2 2013-02-04
-- Result needed
AID UID StartDate EndDate
1 1 2013-02-20 2013-02-20
1 1 2013-02-21 2013-02-26
1 1 2013-02-27 sysdate
1 2 2013-02-02 2013-02-03
1 2 2013-02-04 sysdate
2 1 2013-02-06 sysdate

The easiest way to do this is with a correlated subquery:
select t.*,
       (select top 1 dateadd(day, -1, startDate )
        from tbl_temp t2
        where t2.aid = t.aid and
              t2.uid = t.uid and
              t2.startdate > t.startdate
        order by t2.startdate  -- top 1 needs an order to pick the next row
       ) as endDate
from tbl_temp t
To get the current date, use isnull():
select t.*,
       isnull((select top 1 dateadd(day, -1, startDate )
               from tbl_temp t2
               where t2.aid = t.aid and
                     t2.uid = t.uid and
                     t2.startdate > t.startdate
               order by t2.startdate  -- top 1 needs an order to pick the next row
              ), getdate()
             ) as endDate
from tbl_temp t
Normally, I would recommend coalesce() over isnull(). However, SQL Server expands coalesce() into a case expression, so a subquery used as the first argument can be evaluated twice. Normally, this doesn't make a difference, but with a subquery it does.
And finally, the use of sysdate makes me think of Oracle. The same approach will work there too.
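
To see the subquery approach produce the table from the question, here is a minimal sketch in Python with the standard-library sqlite3 module. The substitutions are my assumptions: SQLite instead of SQL Server, LIMIT 1 instead of top 1, date(x, '-1 day') instead of dateadd(), and a fixed :today parameter instead of getdate() so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_temp (AID INTEGER, UID INTEGER, StartDate TEXT)")
conn.executemany(
    "INSERT INTO tbl_temp VALUES (?, ?, ?)",
    [
        (1, 1, "2013-02-20"),
        (2, 1, "2013-02-06"),
        (1, 1, "2013-02-21"),
        (1, 1, "2013-02-27"),
        (1, 2, "2013-02-02"),
        (1, 2, "2013-02-04"),
    ],
)

end_date_sql = """
    SELECT t.AID, t.UID, t.StartDate,
           -- next StartDate in the same (AID, UID) group minus one day,
           -- or the supplied "today" value for the last row of the group
           COALESCE((SELECT date(t2.StartDate, '-1 day')
                     FROM tbl_temp t2
                     WHERE t2.AID = t.AID
                       AND t2.UID = t.UID
                       AND t2.StartDate > t.StartDate
                     ORDER BY t2.StartDate
                     LIMIT 1),
                    :today) AS EndDate
    FROM tbl_temp t
    ORDER BY t.AID, t.UID, t.StartDate
"""

TODAY = "2013-03-01"  # hypothetical stand-in for the system date
result = conn.execute(end_date_sql, {"today": TODAY}).fetchall()
for row in result:
    print(row)
```

The ORDER BY inside the subquery matters: without it, the row picked by LIMIT 1 (or top 1) is not guaranteed to be the closest following StartDate.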

;WITH x AS
(
SELECT AID, UID, StartDate,
ROW_NUMBER() OVER(PARTITION BY AID, UID ORDER BY StartDate) AS rn
FROM tbl_temp
)
SELECT x1.AID, x1.UID, x1.StartDate,
COALESCE(DATEADD(day,-1,x2.StartDate), CAST(getdate() AS date)) AS EndDate
FROM x x1
LEFT OUTER JOIN x x2 ON x2.AID = x1.AID AND x2.UID = x1.UID
AND x2.rn = x1.rn + 1
ORDER BY x1.AID, x1.UID, x1.StartDate
