Get the last 30 unique days that had data - sql

I am trying to run a query that will retrieve the most recent 30 days that have data (not the last 30 days)
There are can be several rows for the same date (so can't use the limit 30)
My data has the following formatting:
date count
2017-05-05 111
2017-05-05 78
2017-04-28 54
2017-01-11 124
Is there a way for me to add a WHERE clause to get the most recent 30 days with data?

Not sure if I correctly understand, though...
(this is for most recent 2 day):
with t(date, count) as(
select '2017-05-05', 111 union all
select '2017-05-05', 78 union all
select '2017-04-28', 54 union all
select '2017-01-11', 124
)
select date from t group by date order by date desc limit 2

If you want all rows, which has the same date as the last 30, distinct dates in your table, you can use the dense_rank() window function:
select (t).*
from (select t, dense_rank() over (order by date desc)
from t) s
where dense_rank <= 30
or IN, with a sub-select:
select *
from t
where date in (select distinct date
from t
order by date desc
limit 30)
http://rextester.com/ESDLIM64772

select * from tablename
where datecolumn in (select TOP 30 max(datecolumn) from tablename group by datecolumn)

Related

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using the lag(price) over (partition by id order by date)
But i can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, you can generate line numbers for each id (assuming multiple id values) in the source data ordered by date. These rows with row number equal to 1 (with the minimum date value for the corresponding id) will be used as the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To use the price values ​​from the original data, you can use a subquery to get the price for this new date, and if there is no such value (no row for this date), use the previous price value from CTE (use the COALESCE function for this).
For SQL Server query can look like this
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) to make the recursion run through all the steps, since the default value is 100, this is not enough to complete the recursion.
db<>fiddle here
The same approach for MySQL (you need MySQL of version 8.0 to use CTE)
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
db<>fiddle here
Both queries produces the same results, the only difference is the use of the engine's specific date functions.
For MySQL versions below 8.0, you can use a calendar table since you don't have CTE support and can't generate the required date range.
Assuming there is a column in the calendar table to store date values ​​(let's call it date for simplicity) you can use the CROSS JOIN operator to generate date ranges for the id values in your table that will match existing dates. Then you can use a subquery to get the latest price value from the table which is stored for the corresponding date or before it.
So the query would be like this
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so
id
price
date
1
2
2022-05-21
1
3
2022-05-25
1
2.5
2022-05-28
2
10
2022-05-25
2
100
2022-05-30
the query produces following results
id
date
price
1
2022-05-21
2
1
2022-05-22
2
1
2022-05-23
2
1
2022-05-24
2
1
2022-05-25
3
1
2022-05-26
3
1
2022-05-27
3
1
2022-05-28
2.5
1
2022-05-29
2.5
1
2022-05-30
2.5
2
2022-05-25
10
2
2022-05-26
10
2
2022-05-27
10
2
2022-05-28
10
2
2022-05-29
10
2
2022-05-30
100
db<>fiddle here

Select weekly data from date table

I have a table with date and other columns. The dates are all weekdays excluding holidays and weekends. I need to select weekly data from the table (OR every Monday data and if Monday is a holiday select Tuesday's. Next row will be Monday's data and so on.).
Example table columns and data:
Date Rate StockQty
2018/08/31 22 25
2018/09/04 24 25
2018/09/05 23 24
2018/09/06 19 21
2018/09/07 25 22
2018/09/10 21 21
I need to select data such that the result will be:
Date Rate StockQty
2018/08/31 22 25
2018/09/04 24 25
2018/09/10 21 21
It is selecting one row per week. 9/3 is Monday and a holiday, so select Tuesday date, then select next week's Monday date.
I tried to partition by DatePart, but it lupms all week together.
create table #Date_rate
(
date smalldatetime,rate int,stockQty int
)
Insert into #Date_rate
select '2018/08/31', 22 , 25 union
select '2018/09/04', 24 , 25 union
select '2018/09/05', 23 , 24 union
select '2018/09/06', 19 , 21 union
select '2018/09/07', 25 , 22 union
select '2018/09/10', 21 , 21
select
a.date
,a.rate
,a.stockQty
from(
select
*
,dense_rank() over(partition by datepart(WEEK,date) order by datepart(WEEKDAY,date) asc) as SekectedDay
from #Date_rate
) a where SekectedDay=1
You can follow logic like this:
select t.*
from (select t.*,
row_number() over (partition by extract(year from date), extract(week from date) order by date asc) as seqnum
from t
) t
where seqnum = 1;
Date functions can vary by database. This uses ANSI/ISO standard functions.
This should work in SQL Server:
SELECT date,Rate,StockQty FROM
(SELECT
date,
Rate,
StockQty,
ROW_NUMBER() OVER(PARTITION BY YEAR(date),DATENAME(WK,Date) ORDER BY day(date))cnt
FROM
#temp
)m
WHERE
cnt = 1

sum last 7 days of sales in new column

I have the following data set:
I want to create a new column that sums the last 7 days of sales. So the query result should look be the following:
Pls help
Thanks!
In standard SQL, you would use a window function -- assuming you have data for each day:
select t.*,
sum(sales) over (partition by itemid order by date rows between 6 preceding and current row) as sales_7
from t;
use sum() aggregate function and group by
select country,itemid,year,monthnumber,week sum(sales) as sales_last_7days from your_table
where date>=DATEADD(day, -7, getdate()) and date< getdate()
group by country,itemid,year,monthnumber,week
with window:
select (list other columns here), sum(sum(sales)) over
(partition by week
order by day
rows between 6 preceding and current row)
from table
group by date, week;
note that week doesen't change group by beacause a date is reffered to one week only, but it is needed in window.
Seems you are working with SQL Server if so, then you can use apply :
select t.*, t1.[last7day]
from table t outer apply
(select sum(t1.sales) as [last7day]
from table t1
where t.itemid = t1.itemid and
t1.date <= dateadd(day, -6, t.dt)
) t1;
If you don't have exactly one day for each row, for example if you have a list of transactions...
The below example completely confused me the first time I saw it, so I've tried to comment as much as I can to explain what's happening.
Suppose we have a table tbl with date column dt and amount column amt, and for each date in tbl we want to return a rolling sum of the amount from the current day and the past 6 days.
select distinct -- see note after code on what this distinct is doing.
dt
, ( -- Has to be in brackets to denote we're returning 1 value per row.
-- for each row of T1:
select sum(b.amt) -- the sum of amounts in T2. The where clause will restrict which rows in T2 will be summed.
from tbl T2
where T2.dt between T1.dt - 6 and T1.dt -- for each row in T1, give me all rows in T2 where the date is between 6 days before this T1 row's date and T1 row's date, giving us our rolling sum
-- WARNING: CHECK YOUR VERSION OF SQL FOR HOW TO SUBTRACT DAYS FROM A DATE, I'VE MADE IT (T1.dt - 6) FOR SIMPLICITY
-- we don't need a group by, because we're returning one value for each row in T1
)
from tbl T1
We have a main version of tbl, aliased T1. We then have a secondary table, aliased T2. For each row in T1, we're going to ask for a set of rows in T2 that we're going to sum before giving it to our main query.
To understand what's happening, run the code without the distinct. You'll notice that we have the same number of rows as in tbl, because the T2 statement is happening for every row in T1.
Notes:
If you have any days for which no rows exist in your table you will not get a calculation for this day. To be certain this doesn't happen, join your table to a table containing a distinct list of consecutive dates, and use this as your date column.
If you have nulls in your amount column the calculation will still work, but if the rolling average contains only nulls you will have null instead of 0 as your result. If that troubles you convert all your nulls to zero's before (or after) you use the query.
The beginning of the period will have a 'ramp up'. But this would be the same whatever method you use to do a rolling sum. If it bothers you, don't return the first 6 days.
Finally a worked example if you're playing along at home using SQL Server:
with tbl as (
-- a list of transactions from 1.10.2019 to 14.10.2019
select cast('2019-10-01' as date) dt, 1 amt
union select cast('2019-10-02' as date), 4
union select cast('2019-10-01' as date), 10
union select cast('2019-10-03' as date), 3
union select cast('2019-10-04' as date), 20
union select cast('2019-10-04' as date), 2
union select cast('2019-10-04' as date), 12
union select cast('2019-10-04' as date), 17
union select cast('2019-10-05' as date), null -- a whole week of null values because we all had the week off... I hope this data wasn't important
union select cast('2019-10-06' as date), null
union select cast('2019-10-07' as date), null
union select cast('2019-10-08' as date), null
union select cast('2019-10-09' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-11' as date), null
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-13' as date), 2
union select cast('2019-10-14' as date), 1000
)
select distinct
a.dt
, (
select sum(b.amt)
from tbl b
where b.dt between dateadd(dd, -6, a.dt) and a.dt
) past_7_days_amt
from tbl a
Returns:
+------------+-----------------+
| dt | past_7_days_amt |
+------------+-----------------+
| 2019-10-01 | 11 |
| 2019-10-02 | 15 |
| 2019-10-03 | 18 |
| 2019-10-04 | 69 |
| 2019-10-05 | 69 |
| 2019-10-06 | 69 |
| 2019-10-07 | 69 |
| 2019-10-08 | 58 |
| 2019-10-09 | 54 |
| 2019-10-10 | 51 |
| 2019-10-11 | NULL |
| 2019-10-12 | 1 |
| 2019-10-13 | 3 |
| 2019-10-14 | 1003 |
+------------+-----------------+

Combining multiple scalar bigquery queries into a single query to generate one table

I have a BiqQuery query that basically takes a date as a parameter and calculates the number of active users our app had near that date.
Right now, if I want to make a graph over a year of active users, I have to run the query 12 times (once per month) and collate the results manually, which is error-prone and time consuming.
Is there a way to make a single bigquery query that runs the subquery 12 times and puts the results on 12 different rows?
For example, if my query is
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
How can I get a table like
| Date | Count |
|------------|---------|
| 2017-01-01 | 50000 |
| 2017-02-01 | 40000 |
| 2017-03-01 | 30000 |
| 2017-04-01 | 20000 |
| 2017-05-01 | 10000 |
Supposing that you have a column called date and one called user_id and you want to calculate distinct users on a monthly basis, you can run a query such as:
#standardSQL
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
(Here you can replace YourTable with the subquery that you want to run). As a self-contained example:
#standardSQL
WITH YourTable AS (
SELECT DATE '2017-06-25' AS date, 10 AS user_id UNION ALL
SELECT DATE '2017-05-04', 11 UNION ALL
SELECT DATE '2017-06-20', 10 UNION ALL
SELECT DATE '2017-04-01', 11 UNION ALL
SELECT DATE '2017-06-02', 12 UNION ALL
SELECT DATE '2017-04-13', 10
)
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
Elliot taught me UNION ALL and it seemed to do the trick:
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-02-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-03-01'
Maybe there's a nicer way to parameterize the dates in the WHERE clause, but this did the trick for me.

SQL Server 2008 - DateDiff

This is an example of some data
id CurrentMailPromoDate
1 1/1/2013
2 3/1/2013
3 6/9/2013
4 6/10/2013
5 9/18/2013
6 12/27/2013
7 12/27/2013
What I need is to extract only those id's where the date diff between the current and previous record >= 100
In other words the result set would be :
id CurrentMailPromoDate
1 1/1/2013 **initial record**
3 6/9/2013
5 9/18/2013
6 12/27/2013
7 12/27/2013
The challenge is getting the previous date. In SQL Server 2012, you can use lag(), but this is not supported in SQL Server 2008. Instead, I use a correlated subquery to get the previous date.
The following returns the set you are looking for:
with t as (
select 1 as id, CAST('2013-01-01' as DATE) as CurrentMailPromoDate union all
select 2, '2013-03-01' union all
select 3, '2013-06-09' union all
select 4, '2013-06-10' union all
select 5, '2013-09-18' union all
select 6, '2013-12-27' union all
select 7, '2013-12-27')
select t.id, t.CurrentMailPromoDate
from (select t.*,
(select top 1 CurrentMailPromoDate
from t t2
where t2.CurrentMailPromoDate < t.CurrentMailPromoDate
order by CurrentMailPromoDate desc
) as prevDate
from t
) t
where DATEDIFF(dd, prevDate, CurrentMailPromoDate) >= 100 or prevDate is null;
EDIT:
The subquery is returning the previous date for each record. For a given record, it is looking at all the dates less than the date on the record. These are sorted in descending order and the first one is picked. This is prevDate.
The where clause just filters out anything that is less than 100 days from the previous date.