Imagine you have a set of dates. You want any date which is within X days of the lowest date to be "merged" into that date. Then you want to repeat until you have merged all date points.
For example:
ID
DatePoints
1
2023-01-01
2
2023-01-02
3
2023-01-12
4
2023-01-21
5
2023-02-01
6
2023-02-02
7
2023-03-01
If you applied this rule to this data using 10 days as your X, you would end up with this output:
DateRangeStarts
2023-01-01
2023-01-12
2023-02-01
2023-03-01
IDs 1 and 2 into range 1, IDs 3 and 4 into range 2, IDs 5 and 6 into range 3, and ID 7 into range 4.
Is there any way to do this without a loop? Answer can work in SQL Server or BigQuery. Thanks
You could consider something like the following. It's not pretty and I'm not at all confident it is the best solution, but I do think it works. Maybe it's a good starting point for you to work from.
WITH cte AS
(
SELECT min(datepoint) datepoint
FROM test
UNION ALL
SELECT min(t.datepoint) OVER() datepoint
FROM test t CROSS APPLY (SELECT max(cte.datepoint) OVER() md FROM cte) c
WHERE t.datepoint > DATEADD(DAY, 10, c.md)
)
SELECT distinct datepoint
FROM cte
ORDER BY datepoint
(You might want to change the > to a >=, depending on what counts as within X days.)
The basic idea is to get the minimum date from your table into the cte, then recursively get the minimum date from your table that is bigger than the current maximum date in the cte + X days.
It gets messy because of the limitations SQL Server places on recursive CTEs. They can't be used in subqueries, with normal OUTER JOINs, or with aggregate functions. Therefore, I use CROSS APPLY and the window versions of min/max. This gets the correct result, but multiple times, so I'm forced to use DISTINCT to clean it up afterward.
Depending on your data, it might be better to do a loop anyway, but I think this is an option to consider.
Here's a Fiddle of it working.
Related
I have a dataset like this called data_per_day
instructional_day
points
2023-01-24
2
2023-01-23
2
2023-01-20
1
2023-01-19
0
and so on. the table shows weekdays (days minus holidays and weekends) and the number of points someone has earned. 1 is the start of a streak and 0 is the end of a streak. 2 is max points after a streak has started.
I need to find how long is the latest streak. so in this case the result should be 3
I created a recursive cte but the query returns 2 as the streak count because i'm using lag mechanism with days. instead I need to adjust so that the instructional days are used rather than all dates.
RECURSIVE cte AS (
SELECT
student_unique_id,
instructional_day,
points,
1 AS cnt
FROM
`data_per_day`
WHERE
instructional_day = DATE_ADD(CURRENT_DATE('America/Chicago'), INTERVAL -1 DAY)
UNION ALL
SELECT
a.student_unique_id,
a.instructional_day,
a.points,
c.cnt+1
FROM (
SELECT
*
FROM
`data_per_day`
WHERE
points > 0 ) a
INNER JOIN
cte c
ON
a.student_unique_id = c.student_unique_id
AND a.instructional_day = c.instructional_day - INTERVAL '1' day )
SELECT
student_unique_id,
MAX(cnt) AS streak
FROM
cte --
WHERE
student_unique_id = "419"
GROUP BY
student_unique_id
How do I adjust the query?
This is not a trivial coding exercise, so I won't actually write the code and provide it.
What you have here is a gaps and islands question. You want to identify the largest "island" of days with points within a date range. Depending upon what dates are contained in your data, you may need to generate a list of sequential dates that meet your criteria.
One problem I see is that you are trying to combine the steps to generate the date range (the recursive CTE) with the points. You'll need to separate those steps.
Define the date range.
Generate the dates within the range.
Filter the dates with isweekday = 'no' and isholiday = 'no'. You will probably want to add a row number during this step.
[left] join the dates to your data, including coalesce(points, 0)
Filter the data to points > 0.
Identify the islands.
Identify the largest island per student.
I'm trying to get the day difference between 2 dates in Impala but I need to exclude weekends.
I know it should be something like this but I'm not sure how the weekend piece would go...
DATEDIFF(resolution_date,created_date)
Thanks!
One approach at such task is to enumerate each and every day in the range, and then filter out the week ends before counting.
Some databases have specific features to generate date series, while in others offer recursive common-table-expression. Impala does not support recursive queries, so we need to look at alternative solutions.
If you have a table wit at least as many rows as the maximum number of days in a range, you can use row_number() to offset the starting date, and then conditional aggregation to count working days.
Assuming that your table is called mytable, with column id as primary key, and that the big table is called bigtable, you would do:
select
t.id,
sum(
case when dayofweek(dateadd(t.created_date, n.rn)) between 2 and 6
then 1 else 0 end
) no_days
from mytable t
inner join (select row_number() over(order by 1) - 1 rn from bigtable) n
on t.resolution_date > dateadd(t.created_date, n.rn)
group by id
This question is a bit complicated but to make it as simple as possible:
I have a list of timestamps (it is in the millions but let's say for simplicity sake it is much smaller):
order_times
-----------
2014-10-11 15:00:00
2014-10-11 15:02:00
2014-10-11 15:03:31
2014-10-11 15:07:00
2014-10-11 16:00:00
2014-10-11 16:04:00
I am trying to build a query (in PostgeSQL) that would allow me to determine the number of times a an order_time occurs within 10 minutes of 2 order_times prior to it (and no more).
In the sample data above:
first time stamp is considered 0 as there were no orders before it
second timestamp is considered 0 as it was within 10 minutes of it
prior but there was only 1 request before it
third timestamp is considered 1 because there were at least 2 orders within 10 minutes of it
(and so on)...
I hope this is clear!
You don't need to look at the first previous, just the one 2 prior to each. If that is within 10 minutes, then the one after it will be also.
Best way is to get the data that is important to you into a single row, so you can do set operations on it. For that, use the windowing function ROW_NUMBER() and a self join. This is the MS SQL way of doing what you want.
WITH T1 AS (
SELECT ID, Order_Time, ROW_NUMBER() OVER( ORDER BY Order_Time) AS RowNumber FROM myTest)
SELECT T1.ID,T1.Order_Time, T2.ID AS CompareID,T2.Order_Time AS CompareTime
FROM T1 LEFT OUTER JOIN T1 AS T2 ON T1.RowNumber-2 = T2.RowNumber
WHERE DATEDIFF(n,t2.Order_Time,t1.Order_Time)<=10
First we create a query that has the row numbers, then use it as an inline table to do a self join to build a row that contains each order, and the one that happened 2 orders prior to it. Then just do a simple date comparison to select out the rows you want.
I am trying to figure out how to be able to select records based on even or uneven dates.
I have a table with 4 columns and one has the sign up date and I would like to segment them into two groups based on their sign up dates (using the day as the denominator). So 12/4/2013 would be in the even and 4/3/2012 in the uneven.
I am not sure how to construct this query, I was looking at the datepart but wasn't sure if there is something more straight forward.
thanks
SELECT signed_on, DAY(signed_on) % 2 AS uneven
FROM YourTable
uneven will be either 0 or 1
SELECT DAY('2013-01-03') % 2 -- 1
SELECT DAY('2013-01-02') % 2 -- 0
If you want to name your column even rather than uneven you could just inverse the result.
SELECT signed_on, POWER(DAY(signed_on) % 2 - 1, 2) AS even
FROM YourTable
HINT (for now unless you can show you've tried something yourself) You could do something like DATEPART(day, date) % 2 with a case statement (case statement isn't necessary, but could be if you want to easily go between even and uneven without changing the query of course depending on your environment).
DATEPART(dd, your_date) should give you the date part of your date and then use % to find if it is even or not in the WHERE clause.
I believe something like this should be straight forward -
SELECT COL,CASE WHEN DAY(COL)%2=0 THEN 'EVEN' ELSE 'UNEVEN' END AS [EVEN_UNEVEN]
FROM TEST
Sample output from my sample table-
COL EVEN_UNEVEN
2014-01-09 22:39:51.203 UNEVEN
2014-01-10 22:39:51.210 EVEN
2014-01-12 22:39:51.210 EVEN
2014-01-13 22:39:51.213 UNEVEN
I need to generate a diagram out of data from a table. This table has the following date:
Timestamp | Value
01-20-2013| 5
01-21-2013| 7
01-22-2013| 3
01-25-2013| 5
As you can see not every date has a value. If I put that into a diagram it looks weird. Dates are used for the X-axis. As 01-23-2013 and 01-24-2013 is missing this values are either not printed in the diagram (looks weird) or the are printed put the line of the diagram goes from 3 directly to 5 and not to 0 as it should.
Is there a way via SQL to select the data so that it looks like this:
Timestamp | Value
01-20-2013| 5
01-21-2013| 7
01-22-2013| 3
01-23-2013| 0
01-24-2013| 0
01-25-2013| 5
Any help is appreciated!
Regards,
Alex
Edit: I had no clue that the database engine was that important. This is running on a MySQL 5 Database (not sure about the complete version string).
There are various ways to do this, depending on the database. Date functions are notoriously database independent.
Here is an approach using a "driver" table with all dates and to use this for a left outer join:
select driver.timestamp, coalesce(t.value, 0) as value
from (select distinct timestamp + n.n as timestamp
from t cross join
(select 0 as n union all select 1 union all select 2 union all select 3
)
) driver left outer join
t;
This version assumes that there are gaps of no more than three days.
In some databases, you can construct the list of dates using a recursive CTE. Such an approach would handle gaps of any size.