SQL - daily change in a value, taking business days into consideration

Hi, I am trying to write a query that tracks daily changes in a column which isn't populated on weekends/holidays.
First, my data looks something like this:
Date Value
11/5/2015 10
11/6/2015 11
11/9/2015 12
11/10/2015 12
11/11/2015 11
So I want my query to return the change in value for each date vs. the previous business day, something like this:
Date Change in Value since previous business day
11/5/2015 -
11/6/2015 1
11/9/2015 1
11/10/2015 0
11/11/2015 -1
How do I write a query in MS Access which tracks daily changes over a business day? Currently I have written the following, which only returns the daily change over a calendar day as opposed to a business day, so it won't return anything on Mondays.
SELECT A.Date, A.Value, ( A.Value - B.Value) as [Daily change]
FROM Table as A INNER JOIN Table as B on (A.date = B.date+1)
=============================================================================
Thanks guys. I've tried all 3 suggestions but unfortunately they didn't work :( There's another column called product ID, and perhaps that is why? In other words, on each day each product ID has its own distinct value. There are 100 product IDs in total, so on each date there are 100 different values, and I would like to track daily changes (on a business-day basis) for each of the 100 product IDs. Could anyone kindly help here? :(

It's hacky, but why not:
Join on 3 days ago also
use iif to say "if the 1 day ago diff is null then show the 3 days ago diff"
SELECT
A.Date, A.Value,
iif (isNull( A.Value - B.Value), ( A.Value - C.Value), ( A.Value - B.Value) ) as [change since last biz day]
FROM [Table] as A
left JOIN [Table] as B on ( A.Date = B.Date + 1 )
left JOIN [Table] as C on ( A.Date = C.Date + 3 )

Sometimes I just say it many times in English and the SQL follows. You want it where B equals the maximum date that is less than A.
SELECT A.Date,
A.Value,
A.Value - B.Value as [Daily Change]
FROM MyTable as A
INNER JOIN MyTable as B
ON B.date = (SELECT MAX(C.date) FROM MyTable C WHERE C.Date < A.Date)
ORDER BY A.Date
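For the follow-up about product IDs, the same "maximum earlier date" idea can probably be applied per product by adding the product ID to both the join and the subquery. Below is a minimal sketch, run against SQLite from Python rather than MS Access, so the syntax may need adjusting. The table name prices and the columns product_id, dt and value are assumptions made for illustration, as is the second product's data.

```python
import sqlite3

# In-memory database with a hypothetical table layout (not from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product_id INTEGER, dt TEXT, value INTEGER)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    (1, "2015-11-05", 10), (1, "2015-11-06", 11), (1, "2015-11-09", 12),
    (1, "2015-11-10", 12), (1, "2015-11-11", 11),
    (2, "2015-11-06", 50), (2, "2015-11-09", 47),
])

# For each row, join to the latest earlier row of the SAME product.
# LEFT JOIN keeps each product's first date, with a NULL change.
query = """
SELECT a.product_id, a.dt, a.value - b.value AS daily_change
FROM prices AS a
LEFT JOIN prices AS b
  ON b.product_id = a.product_id
 AND b.dt = (SELECT MAX(c.dt) FROM prices AS c
             WHERE c.product_id = a.product_id AND c.dt < a.dt)
ORDER BY a.product_id, a.dt
"""
changes = conn.execute(query).fetchall()
for row in changes:
    print(row)
```

Because the previous row is found by date order rather than by calendar arithmetic, weekends and holidays are skipped without needing a calendar table.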

Related

Filter customers with at least 3 transactions a year for the past 2 years Presto/SQL

I have a table of customer transactions called cust_trans where each transaction made by a customer is stored as one row. It has a column called visit_date that contains the transaction date. I would like to filter the customers who transact at least 3 times a year for the past 2 years.
The data looks like below
Id visit_date
---- ------
1 01/01/2019
1 01/02/2019
1 01/01/2019
1 02/01/2020
1 02/01/2020
1 03/01/2020
1 03/01/2020
2 01/02/2019
3 02/04/2019
I would like to know the customers who visited at least 3 times every year for the past two years,
i.e. I want the output below.
id
---
1
From the customer table, only one person visited at least 3 times in each of the 2 years.
I tried the query below, but it only checks whether total visits are greater than or equal to 3:
select id
from
cust_scan
GROUP by
id
having count(visit_date) >= 3
and year(date(max(visit_date)))-year(date(min(visit_date))) >=2
I would appreciate any help, guidance or suggestions
One option would be to generate a list of distinct ids, cross join it with the last two years, and then bring the original table with a left join. You can then aggregate to count how many visits each id had each year. The final step is to aggregate again, and filter with a having clause
select t.id
from (
select i.id, y.yr, count(c.id) cnt
from (select distinct id from cust_scan) i
cross join (values
(date_trunc('year', current_date)),
(date_trunc('year', current_date) - interval '1' year)
) as y(yr)
left join cust_scan c
on i.id = c.id
and c.visit_date >= y.yr
and c.visit_date < y.yr + interval '1' year
group by i.id, y.yr
) t
group by t.id
having min(cnt) >= 3
Another option would be to use two correlated subqueries:
select distinct id
from cust_scan c
where
(
select count(*)
from cust_scan c1
where
c1.id = c.id
and c1.visit_date >= date_trunc('year', current_date)
and c1.visit_date < date_trunc('year', current_date) + interval '1' year
) >= 3
and (
select count(*)
from cust_scan c1
where
c1.id = c.id
and c1.visit_date >= date_trunc('year', current_date) - interval '1' year
and c1.visit_date < date_trunc('year', current_date)
) >= 3
I assume you mean calendar years. I think I would use two levels of aggregation:
select ct.id
from (select ct.id, year(visit_date) as yyyy, count(*) as cnt
from cust_trans ct
where ct.visit_date >= '2019-01-01' -- or whatever
group by ct.id, year(visit_date)
) ct
group by ct.id
having count(*) = 2 and -- both years
min(cnt) >= 3; -- at least three transactions
If you want the last two complete years, just change the where clause in the subquery.
You can use a similar idea -- of two aggregations -- if you want the last two years relative to the current date. That would be two full years, rather than 1 and some fraction of the current year.
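As a rough illustration of the two-level aggregation, here is a sketch run against SQLite from Python using the sample rows from the question. It assumes the sample dates are month/day/year and rewrites them as ISO strings, it uses SQLite's strftime in place of Presto's year(), and the two-year window is passed in as fixed parameters rather than derived from the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_trans (id INTEGER, visit_date TEXT)")
conn.executemany("INSERT INTO cust_trans VALUES (?, ?)", [
    (1, "2019-01-01"), (1, "2019-01-02"), (1, "2019-01-01"),
    (1, "2020-02-01"), (1, "2020-02-01"), (1, "2020-03-01"), (1, "2020-03-01"),
    (2, "2019-01-02"), (3, "2019-02-04"),
])

# Inner level: visits per customer per calendar year.
# Outer level: keep customers present in both years with at least 3 visits in each.
query = """
SELECT id
FROM (
    SELECT id, strftime('%Y', visit_date) AS yyyy, COUNT(*) AS cnt
    FROM cust_trans
    WHERE visit_date >= ? AND visit_date < ?
    GROUP BY id, strftime('%Y', visit_date)
) AS per_year
GROUP BY id
HAVING COUNT(*) = 2 AND MIN(cnt) >= 3
"""
qualifying = [r[0] for r in conn.execute(query, ("2019-01-01", "2021-01-01"))]
print(qualifying)  # [1]
```

The COUNT(*) = 2 check matters: without it, a customer with three visits in only one of the two years would also pass the MIN(cnt) >= 3 test.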

Adding all values for certain dates and the following day

I am trying to do some reporting to see if an event drove sales on the day after the event as well. So for this I am trying to add all the sales from the day of an event and the day after it for each set of data. It does not matter which store the sale happened in for the purpose of this report. However, it is possible for a day to appear up to 22 times (once for each store). All the data is stored in one table called UCS.
I have tried finding a way to make Lag or Lead work using case statements and temp tables but so far no luck.
Below are a couple of screenshots to help show what I am trying to do. I need to add the sales marked yellow for each screenshot.
You will notice in example 2 I am not adding the values from the days 11/4 or 11/13. While those are the next date in the data set they are not the next day on the calendar.
Example1
Example 2
Moments after posting this my brain clicked and figured it out. I can make a temp table pulling in a list of dates where there is an event doing a union to a list of dates equal to the dates of event +1 using a date table that just houses the dates of the calendar. Then use this to join back and limit the original table. Sample code below to better explain.
If OBJECT_ID('tempdb.dbo.#Event') IS NOT NULL DROP TABLE #Event
Select *
Into #Event
From (
Select
Universal_ID
,Date
From UCS
Where Month = 11
and Year = 2018
and Event = 1
Union
Select
Universal_ID
,DL.Date
From UCS
Join DateLookup as DL
on DATEADD(d,1,UCS.Date) = DL.Date
Where UCS.Month = 11
and UCS.Year = 2018
and Event = 1
) as A
Select
Sum(Sale) as Sale
From UCS
Join #Event as E
on UCS.Universal_ID = E.Universal_ID
and UCS.date = E.date
Where Month = 11
and Year = 2018
You don't really need a universal set of dates for this: if the date isn't in the UCS table, it will not matter to the end result.
select
sum(sale)
from UCS t
inner join (
select date from UCS where event = 1
union
select dateadd(day,1,date) from UCS where event = 1
) d on t.date = d.date
You can avoid a union in the subquery which may help avoid a pass through the UCS table by using a cross join, but this might not be worthwhile - only assessing execution plans would reveal this:
select
sum(sale)
from UCS t
inner join (
select distinct dateadd(day,cj.n,date) as date
from UCS
cross join (select 0 as n union all select 1) cj
where event = 1
) d on t.date = d.date
;
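To check the behaviour of the union approach on a small data set, here is a sketch run against SQLite from Python. The column names store and sale_date and all of the rows are made up for illustration, and SQLite's date(..., '+1 day') stands in for dateadd.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ucs (store INTEGER, sale_date TEXT, sale INTEGER, event INTEGER)"
)
conn.executemany("INSERT INTO ucs VALUES (?, ?, ?, ?)", [
    (1, "2018-11-02", 100, 1),  # event day
    (2, "2018-11-02", 50, 0),   # same calendar day, another store
    (1, "2018-11-03", 70, 0),   # calendar day after the event
    (1, "2018-11-04", 999, 0),  # two days after: should be ignored
    (1, "2018-11-10", 20, 1),   # second event day
    (1, "2018-11-13", 500, 0),  # next row in the data, but not the next calendar day
])

# Build the set of event dates plus the following calendar day, then sum
# every sale that falls on one of those dates, regardless of store.
query = """
SELECT SUM(t.sale)
FROM ucs AS t
JOIN (
    SELECT sale_date AS d FROM ucs WHERE event = 1
    UNION
    SELECT date(sale_date, '+1 day') FROM ucs WHERE event = 1
) AS days ON t.sale_date = days.d
"""
total = conn.execute(query).fetchone()[0]
print(total)  # 240
```

UNION (rather than UNION ALL) removes duplicate dates, so a date that is both an event day and the day after another event is not counted twice.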

sql count statement with multiple date ranges

I have two tables with different appointment dates.
Table 1
id start date
1 5/1/14
2 3/2/14
3 4/5/14
4 9/6/14
5 10/7/14
Table 2
id start date
1 4/7/14
1 4/10/14
1 7/11/13
2 2/6/14
2 2/7/14
3 1/1/14
3 1/2/14
3 1/3/14
If I had set date ranges I could count each appointment date just fine, but I need to change the date ranges.
For each id in table 1 I need to count the distinct appointment dates from table 2, BUT only those within 6 months prior to the start date from table 1.
Example: count all distinct appointment dates for id 1 (in table 2) with appointment dates between 11/1/13 and 5/1/14 (6 months prior). So the result is 2: 4/7/14 and 4/10/14 are within the range and 7/11/13 is outside of the 6 months.
So my issue is that the range changes for each record and I cannot seem to figure out how to code this. For id 2 the date range will be 9/2/13-3/2/14 and so on.
Thanks everyone in advance!
Try this out:
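SELECT id,
(
SELECT COUNT(DISTINCT table2.start_date)
FROM table2
WHERE table2.id = table1.id
AND table2.start_date >= DATEADD(MM,-6,table1.start_date)
AND table2.start_date <= table1.start_date
) AS table2records
FROM table1
The DATEADD subtracts 6 months from the date in table1, the second condition caps the range at the start date itself, and the subquery returns the count of distinct related appointment dates.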
I think what you want is a type of join.
select t1.id, count(distinct t2.startdate) as numt2dates
from table1 t1 left outer join
table2 t2
on t1.id = t2.id and
t2.startdate between dateadd(month, -6, t1.startdate) and t1.startdate
group by t1.id;
The exact syntax for the date arithmetic depends on the database.
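As one concrete instance of that date arithmetic, here is a sketch of the join approach run against SQLite from Python, where date(x, '-6 months') plays the role of dateadd. It uses the sample rows from the question, assumed to be month/day/year and rewritten as ISO strings, plus one extra duplicate row added for illustration to show the effect of DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, start_date TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, start_date TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (1, "2014-05-01"), (2, "2014-03-02"), (3, "2014-04-05"),
    (4, "2014-09-06"), (5, "2014-10-07"),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    (1, "2014-04-07"), (1, "2014-04-10"), (1, "2013-07-11"),
    (1, "2014-04-07"),  # duplicate appointment date, added for illustration
    (2, "2014-02-06"), (2, "2014-02-07"),
    (3, "2014-01-01"), (3, "2014-01-02"), (3, "2014-01-03"),
])

# The date range is computed per row of table1, so it differs for each id.
# LEFT JOIN keeps ids with no matching appointments (count 0).
query = """
SELECT t1.id, COUNT(DISTINCT t2.start_date) AS num_dates
FROM table1 AS t1
LEFT JOIN table2 AS t2
  ON t1.id = t2.id
 AND t2.start_date BETWEEN date(t1.start_date, '-6 months') AND t1.start_date
GROUP BY t1.id
ORDER BY t1.id
"""
counts = conn.execute(query).fetchall()
print(counts)  # [(1, 2), (2, 2), (3, 3), (4, 0), (5, 0)]
```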
Thank you, this solved my issue. This may not help you, since you are not attempting to group by date, but the answer gave me the insight to resolve the issue I was facing.
I was attempting to count the total users matching a date criterion that had to be evaluated against multiple fields.
WITH data AS (
SELECT generate_series(
(date '2020-01-01')::timestamp,
NOW(),
INTERVAL '1 week'
) AS date
)
SELECT d.date, (SELECT COUNT(DISTINCT h.id) AS user_count
FROM history h WHERE h.startDate < d.date AND h.endDate > d.date
ORDER BY 1 DESC) AS total_records
FROM data d ORDER BY d.date DESC
2022-05-16, 15
2022-05-09, 13
2022-05-02, 13
...

Sum of revenue everyday, for last 30 days on each day

I have a simple table.
Date | Revenue
5/1 12
5/2 25
5/3 93
.
.
11/15 47
I am trying to write a query that returns two columns. The first column is Date, day-by-day, like the original table. The second column is 30-Day-Revenue, which is the sum of the "Revenue" column in the original table for the last 30 days, ending on the displayed date. There is a lot of overlap when we sum. Thanks in advance!
I have an alternative solution (assumes your table is called revenue_table):
SELECT a.Date, SUM(b.Revenue)
FROM revenue_table a, revenue_table b
WHERE b.Date <= a.Date AND b.Date > a.Date - 30
GROUP BY a.Date;
SELECT table1.Date, table1.Revenue, Past30DayRevenue = SUM(table2.Revenue)
FROM insert_your_table_name_here table1
JOIN insert_your_table_name_here table2 ON DATEDIFF(day, table2.Date, table1.Date) BETWEEN 0 AND 29
GROUP BY table1.Date, table1.Revenue
ORDER BY table1.Date;
You can do this by using a correlated subquery, e.g.:
SELECT o.date, (SELECT SUM(i.revenue)
FROM table i
WHERE i.date > o.date-30
AND i.date <= o.date) AS thirtydayrevenue
FROM table o
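To see the overlapping windows on actual numbers, here is a small sketch of the self-join variant run against SQLite from Python. The year 2023 and the two rows after 5/3 are assumptions added for illustration, and the window is taken to be the displayed date plus the 29 days before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue_table (d TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO revenue_table VALUES (?, ?)", [
    ("2023-05-01", 12), ("2023-05-02", 25), ("2023-05-03", 93),
    ("2023-05-31", 10), ("2023-06-01", 5),
])

# Pair each date with every row in the 30-day window ending on it,
# bounded on BOTH sides, then sum per displayed date.
query = """
SELECT a.d, SUM(b.revenue) AS revenue_30d
FROM revenue_table AS a
JOIN revenue_table AS b
  ON b.d BETWEEN date(a.d, '-29 days') AND a.d
GROUP BY a.d
ORDER BY a.d
"""
rolling = conn.execute(query).fetchall()
for row in rolling:
    print(row)
```

On 2023-05-31 the window starts on 2023-05-02, so the 12 from 5/1 has dropped out of the sum while 25 and 93 are still included.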

SELECT Query between dates, only selecting items between start and end fields

I have two tables that I will be using for tracking purposes, a Date Table and an Item Table. The Date Table is used to track the start and end dates of a tracked id. The Item Table is the amount of items that are pulled on a specific date for an id. The id is the foreign key between these two tables.
What I want to do is sum the items with a GROUP BY on the id, but only counting items whose pull date falls between the start_date and end_date of the tracked id.
The Date Table
id start_date end_date
1 2014-01-01 NULL
2 2014-01-01 2014-01-02
3 2014-01-25 NULL
The Item Table
id items date
1 3 2014-01-01
1 5 2014-01-02
1 5 2014-01-26
2 2 2014-01-01
2 3 2014-01-05
2 2 2014-01-26
3 2 2014-01-01
3 3 2014-01-05
3 2 2014-01-26
SQL I have so far, but I'm lost as to what to add to it from here.
SELECT
a.id,
SUM(items)
FROM
ww_test.dbo.items a
INNER JOIN ww_test.dbo.dates b ON
a.id = b.id
WHERE
a.date >= '2014-01-01' AND a.date <= '2014-01-30'
GROUP BY
a.id
ORDER BY
a.id
The output should be:
id items
1 13
2 2
3 2
Instead of:
id items
1 13
2 7
3 7
First of all, I strongly recommend that you stop using NULL in your date ranges to represent "no end date" and instead use a sentinel value such as 9999-12-31. The reason for this is primarily performance and secondarily query simplicity--a benefit to yourself now in writing the queries and to you or others later who have to maintain them. In front-end or middle-tier code, there is little difference to comparing a date range to Null or to 9999-12-31, and in fact you get some of the same benefits of simplified code there as you do in your SQL. I base this recommendation on over 10 years of full-time professional SQL query writing experience.
To fix your query as is, I think this would work:
SELECT
a.id,
ItemsSum = SUM(items)
FROM
ww_test.dbo.items a
INNER JOIN ww_test.dbo.dates b
ON a.id = b.id
AND a.date >= Coalesce(b.start_date, '19000101') -- a date literal also works when the column is of type date, where 0 does not convert
AND a.date <= Coalesce(b.end_date, '99991231')
WHERE
a.date >= '20140101'
AND a.date <= '20140130'
GROUP BY
a.id
ORDER BY
a.id
;
Note that if you followed my recommendation, your query JOIN conditions could look like this:
INNER JOIN ww_test.dbo.dates b
ON a.id = b.id
AND a.date >= b.start_date
AND a.date <= b.end_date
You will find that if your data sets become large, having to put a Coalesce or IsNull in there will hurt performance in a significant way. It doesn't help to use OR clauses, either:
INNER JOIN ww_test.dbo.dates b
ON a.id = b.id
AND (a.date >= b.start_date OR b.start_date IS NULL)
AND (a.date <= b.end_date OR b.end_date IS NULL)
That's going to have the same problems (for example converting what could have been a seek when there's a suitable index, into a scan, which would be very sad).
Last, I also recommend that you change your end dates to be exclusive instead of inclusive. This means that for the end date, instead of entering the date of the beginning of the final day the information is true, you put the date of the first day it is no longer true. There are several reasons for this recommendation:
If your date resolution ever changes to hours, or minutes, or seconds, every piece of code you have ever written dealing with this data will have to change (and it won't if you use exclusive end dates).
If you ever have to compare date ranges to each other (to collapse date ranges together or locate contiguous ranges or even locate non-contiguous ranges), you now have to do all the comparisons on a.end_date + 1 = b.start_date instead of a simple equijoin of a.end_date = b.start_date. This is painful, and easy to make mistakes.
Always thinking of dates as suggesting time of day will be extremely salutary to your coding ability in any language. Many mistakes are made, over and over, by people forgetting that dates, even ones in formats that can't denote a time portion (such as the date data type in SQL 2008 and up) still have an implicit time portion, and can be converted directly to date data types that do have a time portion, and that time portion will always be 0 or 12 a.m.
The only drawback is that in some cases, you have to do some twiddling about what date you show users (to convert to the inclusive date) and then convert dates they enter into the exclusive date for storing into the database. But this is confined to UI-handling code and is not throughout your database, so it's not that big a drawback.
The only change to your query would be:
INNER JOIN ww_test.dbo.dates b
ON a.id = b.id
AND a.date >= b.start_date
AND a.date < b.end_date -- no equal sign now
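Putting the two recommendations together (a sentinel instead of NULL, and exclusive end dates), here is a sketch run against SQLite from Python with the sample data from the question. The stored end dates are adapted by assumption: NULL becomes 9999-12-31 and the inclusive 2014-01-02 becomes the exclusive 2014-01-03. The item date column is called dt here purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (id INTEGER, start_date TEXT, end_date TEXT)")
conn.execute("CREATE TABLE items (id INTEGER, items INTEGER, dt TEXT)")
conn.executemany("INSERT INTO dates VALUES (?, ?, ?)", [
    (1, "2014-01-01", "9999-12-31"),  # sentinel: no end date
    (2, "2014-01-01", "2014-01-03"),  # exclusive end: last valid day is 01-02
    (3, "2014-01-25", "9999-12-31"),
])
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, 3, "2014-01-01"), (1, 5, "2014-01-02"), (1, 5, "2014-01-26"),
    (2, 2, "2014-01-01"), (2, 3, "2014-01-05"), (2, 2, "2014-01-26"),
    (3, 2, "2014-01-01"), (3, 3, "2014-01-05"), (3, 2, "2014-01-26"),
])

# Plain range comparison: no COALESCE, no OR ... IS NULL.
query = """
SELECT a.id, SUM(a.items) AS items_sum
FROM items AS a
JOIN dates AS b
  ON a.id = b.id
 AND a.dt >= b.start_date
 AND a.dt <  b.end_date
WHERE a.dt >= '2014-01-01' AND a.dt <= '2014-01-30'
GROUP BY a.id
ORDER BY a.id
"""
sums = conn.execute(query).fetchall()
print(sums)  # [(1, 13), (2, 2), (3, 2)]
```

The result matches the output the question asks for, and the join conditions are simple comparisons on the stored columns.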
One last thing: be aware that the date format 'yyyy-mm-dd' is not culture-safe when converting to datetime in SQL Server.
SET LANGUAGE FRENCH;
SELECT Convert(datetime, '2014-01-30'); -- fails with an error
The only invariantly culture-safe formats for datetime in SQL Server are:
yyyymmdd
yyyy-mm-ddThh:mm:ss
I think what you want to do is check that the item date falls between the start_date and end_date of your Date table.
Change your query to the following and try
SELECT
a.id,
SUM(items)
FROM
ww_test.dbo.items a
INNER JOIN ww_test.dbo.dates b ON a.id = b.id
WHERE
a.date >= ISNULL(b.start_date, GETDATE())
AND a.date <= ISNULL(b.end_date, GETDATE())
GROUP BY a.id
ORDER BY a.id
The problem with the query is the condition part.
Also, since you need to retrieve data based on the condition defined in Dates table, you do not have to explicitly hard code the condition.
Assuming that your End Date can either be null or have values, you can use the following query:
SELECT
a.id,
SUM(items)
FROM
ww_test.dbo.items a
INNER JOIN ww_test.dbo.dates b ON
a.id = b.id
where (b.end_date is not null and a.date between b.start_date and b.end_date)
or (b.end_date is null and a.date >= b.start_date)
GROUP BY
a.id
ORDER BY
a.id