Oracle SQL count of unique IDs in moving time window

Oracle SQL count of unique IDs in moving time window - sql

I need to query data on a large transaction dataset in a Oracle SQL Developer database but unfortunately I didn't came up with the correct solution yet - hopefully someone can help me.
The dataset consists of the following:
Customer_ID, Counterparty_ID, Transaction ID, Transaction_Amount, Date
Task :
Flag the transactions of a customer, when a customers made transactions (>= 1000 € each) with >= 5 different counterparties in a 7day time window.
The time window should be a moving time window: E.g. if the transaction date is 17.6. the time window would be +- 6 days (11. - 23.06.).
Within a time window only distinct counterparties should be counted. E.g. if a customer has made 5 transactions with counterparty X in time window A, it is counted as 1. If the customer made additional transactions with counterparty X but in time window B, it is again counted as 1 for that time window.
So far I was only able to solve the task with calendar weeks as time window but that is not as intended.

You haven't provided any sample data or expected output so its difficult to know exactly what you want to output; however, you can immediately filter out all the rows with transaction_amount < 1000 and then, from Oracle 12, you use MATCH_RECOGNIZE to perform row-by-row matching:
SELECT *
FROM (
SELECT m.*,
COUNT(DISTINCT counterparty_id) OVER (
PARTITION BY customer_id, match
) AS num_counterparties
FROM (
SELECT *
FROM table_name
WHERE transaction_amount >= 1000
)
MATCH_RECOGNIZE (
PARTITION BY customer_id
ORDER BY "DATE"
MEASURES
MATCH_NUMBER() AS match
ALL ROWS PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN ( within_week+ )
DEFINE
within_week AS "DATE" < FIRST("DATE") + INTERVAL '7' DAY
) m
)
WHERE num_counterparties >= 5;
Which, for the sample data:
CREATE TABLE table_name (
Customer_ID, Counterparty_ID, Transaction_ID, Transaction_Amount, "DATE"
) AS
SELECT 1, 1, 1, 1000, DATE '1970-01-01' FROM DUAL UNION ALL
SELECT 1, 2, 2, 1000, DATE '1970-01-02' FROM DUAL UNION ALL
SELECT 1, 3, 3, 1000, DATE '1970-01-03' FROM DUAL UNION ALL
SELECT 1, 4, 4, 1000, DATE '1970-01-04' FROM DUAL UNION ALL
SELECT 1, 5, 5, 1000, DATE '1970-01-05' FROM DUAL;
Outputs:
CUSTOMER_ID
DATE
MATCH
COUNTERPARTY_ID
TRANSACTION_ID
TRANSACTION_AMOUNT
NUM_COUNTERPARTIES
1
01-JAN-70
1
1
1
1000
5
1
02-JAN-70
1
2
2
1000
5
1
03-JAN-70
1
3
3
1000
5
1
04-JAN-70
1
4
4
1000
5
1
05-JAN-70
1
5
5
1000
5
db<>fiddle here

Related

BigQuery: How to merge HLL Sketches over a window function? (Count distinct values over a rolling window)

Example relevant table schema:
+---------------------------+-------------------+
| activity_date - TIMESTAMP | user_id - STRING |
+---------------------------+-------------------+
| 2017-02-22 17:36:08 UTC | fake_id_i24385787 |
+---------------------------+-------------------+
| 2017-02-22 04:27:08 UTC | fake_id_234885747 |
+---------------------------+-------------------+
| 2017-02-22 08:36:08 UTC | fake_id_i24385787 |
+---------------------------+-------------------+
I need to count active distinct users over a large data set over a rolling time period (90 days), and am running into issues due to the size of the dataset.
At first, I attempted to use a window function, similar to the answer here.
https://stackoverflow.com/a/27574474
WITH
daily AS (
SELECT
DATE(activity_date) day,
user_id
FROM
`fake-table`)
SELECT
day,
SUM(APPROX_COUNT_DISTINCT(user_id)) OVER (ORDER BY day ROWS BETWEEN 89 PRECEDING AND CURRENT ROW) ninty_day_window_apprx
FROM
daily
GROUP BY
1
ORDER BY
1 DESC
However, this resulted in getting the distinct number of users per day, then summing these up - but distincts could be duplicated within the window, if they appeared multiple times. So this is not a true accurate measure of distinct users over 90 days.
The next thing I tried is to use the following solution
https://stackoverflow.com/a/47659590
- concatenating all the distinct user_ids for each window to an array and then counting the distincts within this.
WITH daily AS (
SELECT date(activity_date) day, STRING_AGG(DISTINCT user_id) users
FROM `fake-table`
GROUP BY day
), temp2 AS (
SELECT
day,
STRING_AGG(users) OVER(ORDER BY UNIX_DATE(day) RANGE BETWEEN 89 PRECEDING AND CURRENT ROW) users
FROM daily
)
SELECT day,
(SELECT APPROX_COUNT_DISTINCT(id) FROM UNNEST(SPLIT(users)) AS id) Unique90Days
FROM temp2
order by 1 desc
However this quickly ran out of memory with anything large.
Next was to use a HLL sketch to represent the distinct IDs in a much smaller value, so memory would be less of an issue. I thought my problems were solved, but I'm getting an error when running the following: The error is simply "Function MERGE_PARTIAL is not supported." I tried with MERGE as well and got the same error. It only happens when using the window function. Creating the sketches for each day's value works fine.
I read through the BigQuery Standard SQL documentation and don't see anything about HLL_COUNT.MERGE_PARTIAL and HLL_COUNT.MERGE with window functions. Presumably this should take the 90 sketches and combine them into one HLL sketch, representing the distinct values between the 90 original sketches?
WITH
daily AS (
SELECT
DATE(activity_date) day,
HLL_COUNT.INIT(user_id) sketch
FROM
`fake-table`
GROUP BY
1
ORDER BY
1 DESC),
rolling AS (
SELECT
day,
HLL_COUNT.MERGE_PARTIAL(sketch) OVER (ORDER BY UNIX_DATE(day) RANGE BETWEEN 89 PRECEDING AND CURRENT ROW) rolling_sketch
FROM daily)
SELECT
day,
HLL_COUNT.EXTRACT(rolling_sketch)
FROM
rolling
ORDER BY
1
"Image of the error - Function MERGE_PARTIAL is not supported"
Any ideas why this error happens or how to adjust?

Below is for BigQuery Standard SQL and does exactly what you want with use of window function
#standardSQL
SELECT day,
(SELECT HLL_COUNT.MERGE(sketch) FROM UNNEST(rolling_sketch_arr) sketch) rolling_sketch
FROM (
SELECT day,
ARRAY_AGG(ids_sketch) OVER(ORDER BY UNIX_DATE(day) RANGE BETWEEN 89 PRECEDING AND CURRENT ROW) rolling_sketch_arr
FROM (
SELECT day, HLL_COUNT.INIT(id) ids_sketch
FROM `project.dataset.table`
GROUP BY day
)
)
You can test, play with above using [totally] dummy data as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, DATE '2019-01-01' day UNION ALL
SELECT 2, '2019-01-01' UNION ALL
SELECT 3, '2019-01-01' UNION ALL
SELECT 1, '2019-01-02' UNION ALL
SELECT 4, '2019-01-02' UNION ALL
SELECT 2, '2019-01-03' UNION ALL
SELECT 3, '2019-01-03' UNION ALL
SELECT 4, '2019-01-03' UNION ALL
SELECT 5, '2019-01-03' UNION ALL
SELECT 1, '2019-01-04' UNION ALL
SELECT 4, '2019-01-04' UNION ALL
SELECT 2, '2019-01-05' UNION ALL
SELECT 3, '2019-01-05' UNION ALL
SELECT 5, '2019-01-05' UNION ALL
SELECT 6, '2019-01-05'
)
SELECT day,
(SELECT HLL_COUNT.MERGE(sketch) FROM UNNEST(rolling_sketch_arr) sketch) rolling_sketch
FROM (
SELECT day,
ARRAY_AGG(ids_sketch) OVER(ORDER BY UNIX_DATE(day) RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) rolling_sketch_arr
FROM (
SELECT day, HLL_COUNT.INIT(id) ids_sketch
FROM `project.dataset.table`
GROUP BY day
)
)
-- ORDER BY day
with result
Row day rolling_sketch
1 2019-01-01 3
2 2019-01-02 4
3 2019-01-03 5
4 2019-01-04 5
5 2019-01-05 6

Combine HLL_COUNT.INIT and HLL_COUNT.MERGE. This solution uses a 90 days cross join with GENERATE_ARRAY(1, 90) instead of OVER.
#standardSQL
SELECT DATE_SUB(date, INTERVAL i DAY) date_grp
, HLL_COUNT.MERGE(sketch) unique_90_day_users
, HLL_COUNT.MERGE(DISTINCT IF(i<31,sketch,null)) unique_30_day_users
, HLL_COUNT.MERGE(DISTINCT IF(i<8,sketch,null)) unique_7_day_users
FROM (
SELECT DATE(creation_date) date, HLL_COUNT.INIT(owner_user_id) sketch
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE EXTRACT(YEAR FROM creation_date)=2017
GROUP BY 1
), UNNEST(GENERATE_ARRAY(1, 90)) i
GROUP BY 1
ORDER BY date_grp

Google Big Query SQL - Get most recent unique value by date

#EDIT - Following the comments, I rephrase my question
I have a BigQuery table that i want to use to get some KPI of my application.
In this table, I save each create or update as a new line in order to keep a better history.
So I have several times the same data with a different state.
Example of the table :
uuid |status |date
––––––|–––––––––––|––––––––––
3 |'inactive' |2018-05-12
1 |'active' |2018-05-10
1 |'inactive' |2018-05-08
2 |'active' |2018-05-08
3 |'active' |2018-05-04
2 |'inactive' |2018-04-22
3 |'inactive' |2018-04-18
We can see that we have multiple value of each data.
What I would like to get:
I would like to have the number of current 'active' entry (So there must be no 'inactive' entry with the same uuid after). And to complicate everything, I need this total per day.
So for each day, the amount of 'active' entries, including those from previous days.
So with this example I should have this result :
date | actives
____________|_________
2018-05-02 | 0
2018-05-03 | 0
2018-05-04 | 1
2018-05-05 | 1
2018-05-06 | 1
2018-05-07 | 1
2018-05-08 | 2
2018-05-09 | 2
2018-05-10 | 3
2018-05-11 | 3
2018-05-12 | 2
Actually i've managed to get the good amount of actives for one day. But my problem is when i want the results for each days.
What I've tried:
I'm stuck with two solutions that each return a different error.
First solution :
WITH
dates AS(
SELECT GENERATE_DATE_ARRAY(
DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), CURRENT_DATE(), INTERVAL 1 DAY)
arr_dates )
SELECT
i_date date,
(
SELECT COUNT(uuid)
FROM (
SELECT
uuid, status, date,
RANK() OVER(PARTITION BY uuid ORDER BY date DESC) rank
FROM users
WHERE
PARSE_DATE("%Y-%m-%d", FORMAT_DATETIME("%Y-%m-%d",date)) <= i_date
)
WHERE
status = 'active'
and rank = 1
## rank is the condition which causes the error
) users
FROM
dates, UNNEST(arr_dates) i_date
ORDER BY i_date;
The SELECT with the RANK() OVER correctly returns the users with a rank column that allow me to know which entry is the last for each uuid.
But when I try this, I got a :
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN. because of the rank = 1 condition.
Second solution :
WITH
dates AS(
SELECT GENERATE_DATE_ARRAY(
DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), CURRENT_DATE(), INTERVAL 1 DAY)
arr_dates )
SELECT
i_date date,
(
SELECT
COUNT(t1.uuid)
FROM
users t1
WHERE
t1.date = (
SELECT MAX(t2.date)
FROM users t2
WHERE
t2.uuid = t1.uuid
## Here that's the i_date condition which causes problem
AND PARSE_DATE("%Y-%m-%d", FORMAT_DATETIME("%Y-%m-%d", t2.date)) <= i_date
)
AND status='active' ) users
FROM
dates,
UNNEST(arr_dates) i_date
ORDER BY i_date;
Here, the second select is working too and correctly returning the number of active user for a current day.
But the problem is when i try to use i_date to retrieve datas among the multiple days.
And Here i got a LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join. error...
Which solution is more able to succeed ? What should i change ?
And, if my way of storing the data isn't good, how should i proceed in order to keep a precise history ?

Below is for BigQuery Standard SQL
#standardSQL
SELECT date, COUNT(DISTINCT uuid) total_active
FROM `project.dataset.table`
WHERE status = 'active'
GROUP BY date
-- ORDER BY date
Update to address your "rephrased" question :o)
Below example is using dummy data from your question
#standardSQL
WITH `project.dataset.users` AS (
SELECT 3 uuid, 'inactive' status, DATE '2018-05-12' date UNION ALL
SELECT 1, 'active', '2018-05-10' UNION ALL
SELECT 1, 'inactive', '2018-05-08' UNION ALL
SELECT 2, 'active', '2018-05-08' UNION ALL
SELECT 3, 'active', '2018-05-04' UNION ALL
SELECT 2, 'inactive', '2018-04-22' UNION ALL
SELECT 3, 'inactive', '2018-04-18'
), dates AS (
SELECT day FROM UNNEST((
SELECT GENERATE_DATE_ARRAY(MIN(date), MAX(date))
FROM `project.dataset.users`
)) day
), active_users AS (
SELECT uuid, status, date first, DATE_SUB(next_status.date, INTERVAL 1 DAY) last FROM (
SELECT uuid, date, status, LEAD(STRUCT(status, date)) OVER(PARTITION BY uuid ORDER BY date ) next_status
FROM `project.dataset.users` u
)
WHERE status = 'active'
)
SELECT day, COUNT(DISTINCT uuid) actives
FROM dates d JOIN active_users u
ON day BETWEEN first AND IFNULL(last, day)
GROUP BY day
-- ORDER BY day
with result
Row day actives
1 2018-05-04 1
2 2018-05-05 1
3 2018-05-06 1
4 2018-05-07 1
5 2018-05-08 2
6 2018-05-09 2
7 2018-05-10 3
8 2018-05-11 3
9 2018-05-12 2

I think this -- or something similar -- will do what you want:
SELECT day,
coalesce(running_actives, 0) - coalesce(running_inactives, 0)
FROM UNNEST(GENERATE_DATE_ARRAY(DATE('2015-05-11'), DATE('2018-06-29'), INTERVAL 1 DAY)
) AS day left join
(select date, sum(countif(status = 'active')) over (order by date) as running_actives,
sum(countif(status = 'active')) over (order by date) as running_inactives
from t
group by date
) a
on a.date = day
order by day;
The exact solution depends on whether the "inactive" is inclusive of the day (as above) or takes effect the next day. Either is handled the same way, by using cumulative sums of actives and inactives and then taking the difference.
In order to get data on all days, this generates the days using arrays and unnest(). If you have data on all days, that step may be unnecessary

POSTGRES - Average for previous 4 weekdays

Hi I am trying to calculate the average of previous 4 Tuesdays. I have daily sales data and I am trying to calculate what the average for previous 4 weeks were for the same weekday.
Attached is a snapshot of how my dataset looks like
Now for March 6, I would like to know what is the average for the previous 4 weeks were, (namely Feb 6, Feb 13, Feb 20 and Feb 27). This value needs to be assigned to Monthly Average column
I am using a PostGres DB.
Thanks

You can use window functions:
select t.*,
avg(dailycount) over (partition by seller_name, day
order by date
rows between 3 preceding and current row
) as avg_4_weeks
from t
where day = 'Tuesday';
This assumes that "previous 4 weeks" is the current date plus the previous three weeks. If it starts the week before, only the windowing clause needs to change:
select t.*,
avg(dailycount) over (partition by seller_name, day
order by date
rows between 4 preceding and 1 preceding
) as avg_4_weeks
from t
where day = 'Tuesday';

I decided to post my answer also, for anyone else searching. My answer will allow you to put in any date and get the average for the previous 4 weeks ( current day + previous 3 weeks matching the day).
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE sales (sellerName varchar(10), dailyCount int, saleDay date) ;
INSERT INTO sales (sellerName, dailyCount, saleDay)
SELECT 'ABC',10,to_date('2018-03-15','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',11,to_date('2018-03-14','YYYY-MM-DD') UNION ALL
SELECT 'ABC',12,to_date('2018-03-12','YYYY-MM-DD') UNION ALL
SELECT 'ABC',13,to_date('2018-03-11','YYYY-MM-DD') UNION ALL
SELECT 'ABC',14,to_date('2018-03-10','YYYY-MM-DD') UNION ALL
SELECT 'ABC',15,to_date('2018-03-09','YYYY-MM-DD') UNION ALL
SELECT 'ABC',16,to_date('2018-03-08','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',17,to_date('2018-03-07','YYYY-MM-DD') UNION ALL
SELECT 'ABC',18,to_date('2018-03-06','YYYY-MM-DD') UNION ALL
SELECT 'ABC',19,to_date('2018-03-05','YYYY-MM-DD') UNION ALL
SELECT 'ABC',20,to_date('2018-03-04','YYYY-MM-DD') UNION ALL
SELECT 'ABC',21,to_date('2018-03-03','YYYY-MM-DD') UNION ALL
SELECT 'ABC',22,to_date('2018-03-02','YYYY-MM-DD') UNION ALL
SELECT 'ABC',23,to_date('2018-03-01','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',24,to_date('2018-02-28','YYYY-MM-DD') UNION ALL
SELECT 'ABC',25,to_date('2018-02-22','YYYY-MM-DD') UNION ALL /* THIS ONE */
SELECT 'ABC',26,to_date('2018-02-15','YYYY-MM-DD') UNION ALL
SELECT 'ABC',27,to_date('2018-02-08','YYYY-MM-DD') UNION ALL
SELECT 'ABC',28,to_date('2018-02-01','YYYY-MM-DD')
;
Now For The Query:
WITH theDay AS (
SELECT to_date('2018-03-15','YYYY-MM-DD') AS inDate
)
SELECT AVG(dailyCount) AS totalCount /* 18.5 = (10(3/15)+16(3/8)+23(3/1)+25(2/22))/4 */
FROM sales
CROSS JOIN theDay
WHERE extract(dow from saleDay) = extract(dow from theDay.inDate)
AND saleDay <= theDay.inDate
AND saleDay >= theDay.inDate-INTERVAL '3 weeks' /* Since we want to include the entered
day, for the INTERVAL we need 1 less week than we want */
Results:
| totalcount |
|------------|
| 18.5 |

Oracle SQL - Easiest way to produce a row of answers in one query, so I don't have to run the query multiple times?

I have this simple query:
Select
To_Date('2012-sep-03','yyyy-mon-dd')as Date_Of_Concern,
Count(Player_Id) as Retained
From Player
Where
(To_Date('2012-sep-03','yyyy-mon-dd')-Trunc(Init_Dtime))<=7
Which Results In:
Date_Of_Concern Retained
03-Sep-12 81319
This query counts all of the players in my database who have logged in(init_dtime) within 7 days of a specific date.
As it stands, I will have to run this query multiple times, for every "Day of Concern" that I wish to know about. Is there a better solution?

If you need to run this query for multiple dates, you would need some mean to hold more than one value. I suggest you use a NESTED TABLE object:
CREATE TYPE my_dates AS TABLE OF DATE;
/
SELECT d.column_value AS Date_Of_Concern, count(Player_Id) AS Retained
FROM Player
JOIN TABLE (my_dates(to_date('2012-sep-03', 'yyyy-mon-dd'),
to_date('2012-sep-04', 'yyyy-mon-dd'),
to_date('2012-sep-05', 'yyyy-mon-dd'))) d
ON d.column_value - trunc(Init_Dtime) BETWEEN 0 AND 7
GROUP BY d.column_value

Simply use GROUP BY to get the count by day:
Select
To_Date(Init_Dtime,'yyyy-mon-dd') as Date_Of_Concern,
Count(Player_Id) as Retained
From Player
Where
(To_Date('2012-sep-03','yyyy-mon-dd') - Trunc(Init_Dtime)) <= 7
GROUP BY To_Date(Init_Dtime,'yyyy-mon-dd')
ORDER BY To_Date(Init_Dtime,'yyyy-mon-dd')

within 7 days of a specific date
To be able to do what you want you will have to know what "specific date" you are talking about by either a formula or a date range. Any random date would obviously require the user to either enter that date or modify the query to run for that date (the way you mentioned).

Not sure if I understood you correctly but this is probably what you want. Might have suboptimal performance though.
12:32:22 HR#vm_xe> l
1 with player(id, dt) as (
2 select 1, date '2012-01-01' from dual union all
3 select 2, date '2012-01-01' from dual union all
4 select 3, date '2012-01-02' from dual union all
5 select 4, date '2012-01-03' from dual union all
6 select 5, date '2012-01-04' from dual union all
7 select 6, date '2012-01-05' from dual union all
8 select 7, date '2012-01-06' from dual union all
9 select 8, date '2012-01-07' from dual union all
10 select 9, date '2012-01-08' from dual union all
11 select 10, date '2012-01-09' from dual union all
12 select 11, date '2012-01-10' from dual
13 )
14 select distinct
15 to_char(dt, 'dd-mm-yyyy') dt
16 ,count(*) over (order by trunc(dt) range interval '7' day preceding) week_cnt
17 from player
18* order by 1, 2
12:32:22 HR#vm_xe> /
DT WEEK_CNT
---------- ----------
01-01-2012 2
02-01-2012 3
03-01-2012 4
04-01-2012 5
05-01-2012 6
06-01-2012 7
07-01-2012 8
08-01-2012 9
09-01-2012 8
10-01-2012 8
10 rows selected.
Elapsed: 00:00:00.01
p.s. do not code like
(To_Date('2012-sep-03','yyyy-mon-dd')-Trunc(Init_Dtime))<=7
code like
init_time between to_date('2012-SEP-03', 'yyyy-mon-dd') and to_date('2012-SEP-03', 'yyyy-mon-dd') + 7
Unless you don't care about indexes, of course :)

SELECT any FROM system

Can any of these queries be done in SQL?
SELECT dates FROM system
WHERE dates > 'January 5, 2010' AND dates < 'January 30, 2010'
SELECT number FROM system
WHERE number > 10 AND number < 20
I'd like to create a generate_series, and that's why I'm asking.

I assume you want to generate a recordset of arbitrary number of values, based on the first and last value in the series.
In PostgreSQL:
SELECT num
FROM generate_series (11, 19) num
In SQL Server:
WITH q (num) AS
(
SELECT 11
UNION ALL
SELECT num + 1
FROM q
WHERE num < 19
)
SELECT num
FROM q
OPTION (MAXRECURSION 0)
In Oracle:
SELECT level + 10 AS num
FROM dual
CONNECT BY
level < 10
In MySQL:
Sorry.

Sort of for dates...
Michael Valentine Jones from SQL Team has an AWESOME date function
Check it out here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=61519

In Oracle
WITH
START_DATE AS
(
SELECT TO_CHAR(TO_DATE('JANUARY 5 2010','MONTH DD YYYY'),'J')
JULIAN FROM DUAL
),
END_DATE AS
(
SELECT TO_CHAR(TO_DATE('JANUARY 30 2010','MONTH DD YYYY'),'J')
JULIAN FROM DUAL
),
DAYS AS
(
SELECT END_DATE.JULIAN - START_DATE.JULIAN DIFF
FROM START_DATE, END_DATE
)
SELECT TO_CHAR(TO_DATE(N + START_DATE.JULIAN, 'J'), 'MONTH DD YYYY')
DESIRED_DATES
FROM
START_DATE,
(
SELECT LEVEL N
FROM DUAL, DAYS
CONNECT BY LEVEL < DAYS.DIFF
)

If you want to get the list of days, with a SQL like
select ... as days where date is between '2010-01-20' and '2010-01-24'
And return data like:
days
----------
2010-01-20
2010-01-21
2010-01-22
2010-01-23
2010-01-24
This solution uses no loops, procedures, or temp tables. The subquery generates dates for the last thousand days, and could be extended to go as far back or forward as you wish.
select a.Date
from (
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) a
where a.Date between '2010-01-20' and '2010-01-24'
Output:
Date
----------
2010-01-24
2010-01-23
2010-01-22
2010-01-21
2010-01-20
Notes on Performance
Testing it out here, the performance is surprisingly good: the above query takes 0.0009 sec.
If we extend the subquery to generate approx. 100,000 numbers (and thus about 274 years worth of dates), it runs in 0.0458 sec.
Incidentally, this is a very portable technique that works with most databases with minor adjustments.

Not sure if this is what you're asking, but if you are wanting to select something not from a table, you can use 'DUAL'
select 1, 2, 3 from dual;
will return a row with 3 columns, contain those three digits.
Selecting from dual is useful for running functions. A function can be run with manual input instead of selecting something else into it. For example:
select some_func('First Parameter', 'Second parameter') from dual;
will return the results of some_func.

In SQL Server you can use the BETWEEN keyword.
Link:
http://msdn.microsoft.com/nl-be/library/ms187922(en-us).aspx

You can select a range by using WHERE and AND WHERE. I can't speak to performance, but its possible.

The simplest solution to this problem is a Tally or Numbers table. That is a table that simply stores a sequence of integers and/or dates
Create Table dbo.Tally (
NumericValue int not null Primary Key Clustered
, DateValue datetime NOT NULL
, Constraint UK_Tally_DateValue Unique ( DateValue )
)
GO
;With TallyItems
As (
Select 0 As Num
Union All
Select ROW_NUMBER() OVER ( Order By C1.object_id ) As Num
From sys.columns as c1
cross join sys.columns as c2
)
Insert dbo.Tally(NumericValue, DateValue)
Select Num, DateAdd(d, Num, '19000101')
From TallyItems
Where Num
Once you have that table populated, you never need touch it unless you want to expand it. I combined the dates and numbers into a single table but if you needed more numbers than dates, then you could break it into two tables. In addition, I arbitrarily filled the table with 100K rows but you could obviously add more. Every day between 1900-01-01 to 9999-12-31 takes about 434K rows. You probably won't need that many but even if you did, the storage is tiny.
Regardless, this is a common technique to solving many gaps and sequences problems. For example, your original queries all ran in less than tenth of a second. You can also use this sort of table to solve gaps problems like:
Select NumericValue
From dbo.Tally
Left Join MyTable
On Tally.NumericValue = MyTable.IdentityColumn
Where Tally.NumericValue Between SomeLowValue And SomeHighValue

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Oracle SQL count of unique IDs in moving time window - sql

Related

BigQuery: How to merge HLL Sketches over a window function? (Count distinct values over a rolling window)

Google Big Query SQL - Get most recent unique value by date

POSTGRES - Average for previous 4 weekdays

Oracle SQL - Easiest way to produce a row of answers in one query, so I don't have to run the query multiple times?

SELECT any FROM system

Categories

Resources