SQL - get summary of differences vs previous month - sql

I have a table similar to this one:
| id | store | BOMdate |
| 1 | A | 01/10/2018 |
| 1 | B | 01/10/2018 |
| 1 | C | 01/10/2018 |
|... | ... | ... |
| 1 | A | 01/11/2018 |
| 1 | C | 01/11/2018 |
| 1 | D | 01/11/2018 |
|... | ... | ... |
| 1 | B | 01/12/2018 |
| 1 | C | 01/12/2018 |
| 1 | E | 01/12/2018 |
It contains the stores that are active at BOM (beginning of month).
How do I query it to get the number of stores that are new that month, i.e. those that were not active the previous month?
The output should be this:
| BOMdate | #newstores |
| 01/10/2018 | 3 | * no stores on previous month
| 01/11/2018 | 1 | * D is the only new active store
| 01/12/2018 | 2 | * store B was not active on November, E is new
I know how to count the first time that each store is active (nested select, taking the MIN(BOMdate) and then counting). But I have no idea how to check each month against its previous month.
I use SQL Server, but I am interested in the differences in other platforms if there are any.
Thanks

How do I query it to get the number of stores that are new that month, i.e. those that were not active the previous month?
One option uses not exists:
select bomdate, count(*) cnt_new_stores
from mytable t
where not exists (
select 1
from mytable t1
where t1.store = t.store and t1.bomdate = dateadd(month, -1, t.bomdate)
)
group by bomdate
You can also use window functions:
select bomdate, count(*) cnt_new_stores
from (
select t.*, lag(bomdate) over(partition by store order by bomdate) lag_bomdate
from mytable t
) t
where bomdate <> dateadd(month, 1, lag_bomdate) or lag_bomdate is null
group by bomdate
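As a quick check of the NOT EXISTS approach against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in here: `date(x, '-1 month')` is assumed as the equivalent of SQL Server's `dateadd(month, -1, x)`, the dates are rewritten as ISO strings, and the `...` rows from the question are left out.

```python
import sqlite3

# Sample rows from the question (id, store, bomdate), dates as ISO strings.
ROWS = [
    (1, "A", "2018-10-01"), (1, "B", "2018-10-01"), (1, "C", "2018-10-01"),
    (1, "A", "2018-11-01"), (1, "C", "2018-11-01"), (1, "D", "2018-11-01"),
    (1, "B", "2018-12-01"), (1, "C", "2018-12-01"), (1, "E", "2018-12-01"),
]

# SQLite spelling of the NOT EXISTS query: a store counts as new in a month
# when no row exists for the same store one month earlier.
QUERY = """
select t.bomdate, count(*) as cnt_new_stores
from mytable t
where not exists (
    select 1
    from mytable t1
    where t1.store = t.store
      and t1.bomdate = date(t.bomdate, '-1 month')
)
group by t.bomdate
order by t.bomdate
"""


def new_stores_per_month(rows):
    """Return a list of (bomdate, number of stores not active the month before)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (id integer, store text, bomdate text)")
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(new_stores_per_month(ROWS))
```

On these rows the counts come out as 3, 1 and 2, which matches the output requested in the question.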

You can compare a date with the previous month's date using the T-SQL DATEDIFF function.
Using NOT EXISTS, you can count the stores that did not appear in the previous month. You can also list their names with the STRING_AGG function, available since SQL Server 2017.
select t.BOMDate, NewStoresCount = count(1), NewStores = STRING_AGG(t.store, ',')
from yourtable t
where not exists
(
select 1
from yourtable y
-- the outer table needs its own alias; an unqualified "store" here would resolve to y.store
where y.store = t.store and DATEDIFF(m, y.BOMDate, t.BOMDate) = 1
)
group by t.BOMDate

Related

How to add records for each user based on another existing row in BigQuery?

Posting here in case someone with more knowledge than me may be able to help me with some direction.
I have a table like this:
| Row | date |user id | score |
-----------------------------------
| 1 | 20201120 | 1 | 26 |
-----------------------------------
| 2 | 20201121 | 1 | 14 |
-----------------------------------
| 3 | 20201125 | 1 | 0 |
-----------------------------------
| 4 | 20201114 | 2 | 32 |
-----------------------------------
| 5 | 20201116 | 2 | 0 |
-----------------------------------
| 6 | 20201120 | 2 | 23 |
-----------------------------------
However, from this I need a record for each user for each day: if a day is missing for a user, the last recorded score should be carried forward. The result would look like this:
| Row | date |user id | score |
-----------------------------------
| 1 | 20201120 | 1 | 26 |
-----------------------------------
| 2 | 20201121 | 1 | 14 |
-----------------------------------
| 3 | 20201122 | 1 | 14 |
-----------------------------------
| 4 | 20201123 | 1 | 14 |
-----------------------------------
| 5 | 20201124 | 1 | 14 |
-----------------------------------
| 6 | 20201125 | 1 | 0 |
-----------------------------------
| 7 | 20201114 | 2 | 32 |
-----------------------------------
| 8 | 20201115 | 2 | 32 |
-----------------------------------
| 9 | 20201116 | 2 | 0 |
-----------------------------------
| 10 | 20201117 | 2 | 0 |
-----------------------------------
| 11 | 20201118 | 2 | 0 |
-----------------------------------
| 12 | 20201119 | 2 | 0 |
-----------------------------------
| 13 | 20201120 | 2 | 23 |
-----------------------------------
I'm trying to do this in BigQuery using Standard SQL. I have an idea of how to keep the same score across the following empty dates, but I really don't know how to add new rows for the missing dates for each user. Also, keep in mind that this example only has 2 users, but my data has more than 1500.
My end goal would be to show something like the average of the score per day. For background, because of our logic, if the score wasn't recorded in a specific day, this means that the user is still in the last score recorded which is why I need a score for every user every day.
I'd really appreciate any help I could get! I've been trying different options without success
Below is for BigQuery Standard SQL
#standardSQL
select date, user_id,
last_value(score ignore nulls) over(partition by user_id order by date) as score
from (
select user_id, format_date('%Y%m%d', day) date,
from (
select user_id, min(parse_date('%Y%m%d', date)) min_date, max(parse_date('%Y%m%d', date)) max_date
from `project.dataset.table`
group by user_id
) a, unnest(generate_date_array(min_date, max_date)) day
)
left join `project.dataset.table` b
using(date, user_id)
-- order by user_id, date
One option uses generate_date_array() to create the series of dates of each user, then brings the table with a left join.
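select d.date, d.user_id,
last_value(t.score ignore nulls) over(partition by d.user_id order by d.date) as score
from (
-- one row per user per day between that user's first and last date
-- (assumes date is a DATE column; aggregates cannot be used directly inside unnest)
select r.user_id, date
from (
select user_id, min(date) as min_date, max(date) as max_date
from mytable
group by user_id
) r
cross join unnest(generate_date_array(r.min_date, r.max_date, interval 1 day)) as date
) d
left join mytable t on t.user_id = d.user_id and t.date = d.date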
I think the most efficient method is to use generate_date_array() but in a very particular way:
with t as (
select t.*,
date_add(lead(date) over (partition by user_id order by date), interval -1 day) as next_date
from t
)
select row_number() over (order by t.user_id, dte) as id,
t.user_id, dte, t.score
from t cross join
unnest(generate_date_array(date,
coalesce(next_date, date),
interval 1 day
)
) dte;
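To check what any of these queries should return, the same gap-filling logic can be sketched in plain Python. This is not BigQuery code, only an illustration of the intended result: for each user, walk every day between the first and last recorded date and carry the last known score forward. The function name `fill_daily` and the tuple layout are assumptions made for this sketch.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (date as 'YYYYMMDD', user_id, score).
ROWS = [
    ("20201120", 1, 26), ("20201121", 1, 14), ("20201125", 1, 0),
    ("20201114", 2, 32), ("20201116", 2, 0), ("20201120", 2, 23),
]


def fill_daily(rows):
    """Return one (date, user_id, score) row per user per day, forward-filling gaps."""
    by_user = {}
    for day, user_id, score in rows:
        parsed = datetime.strptime(day, "%Y%m%d").date()
        by_user.setdefault(user_id, {})[parsed] = score

    filled = []
    for user_id in sorted(by_user):
        scores = by_user[user_id]
        day, last_day = min(scores), max(scores)
        current = None
        while day <= last_day:
            # Use the recorded score if there is one, else keep the previous one.
            current = scores.get(day, current)
            filled.append((day.strftime("%Y%m%d"), user_id, current))
            day += timedelta(days=1)
    return filled


for row in fill_daily(ROWS):
    print(row)
```

For the sample data this yields the 13 rows listed in the question, including the carried-forward zeros for user 2.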

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
sum(days) over (partition by ClientId)
from data
https://rextester.com/HCKOR53440
You need a subquery that computes the sum grouped by client_id, and a join between your table and that subquery, e.g.:
select Stay_id, client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id
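The per-client sum can be tried out on the sample stays with a small sketch in SQLite, run from Python. Several things here are assumptions for the sake of a repeatable example: the table and column names, a fixed "today" of 2019-06-05 passed as a parameter instead of CURRENT_TIMESTAMP, and SQLite's `julianday()` difference standing in for `DATEDIFF(day, ...)`. Because of the fixed date, the numbers are not expected to match the Duration column in the question exactly.

```python
import sqlite3

# Sample stays from the question: (stay_id, client_id, start_date, end_date).
STAYS = [
    (1, 38, "2018-01-01", "2019-01-31"),
    (2, 16, "2019-01-03", "2019-01-07"),
    (3, 27, "2019-01-10", "2019-01-12"),
    (4, 27, "2019-05-15", None),
    (5, 38, "2019-05-17", None),
]

# Nights per client within the last year:
#   start = later of (start_date, today - 1 year); end = end_date or today.
QUERY = """
select client_id,
       sum(cast(julianday(coalesce(end_date, :today))
                - julianday(max(start_date, date(:today, '-1 year')))
                as integer)) as nights
from stays
group by client_id
order by client_id
"""


def nights_per_client(stays, today):
    """Return (client_id, total nights in the year before `today`) per client."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table stays (stay_id integer, client_id integer, "
        "start_date text, end_date text)"
    )
    conn.executemany("insert into stays values (?, ?, ?, ?)", stays)
    result = conn.execute(QUERY, {"today": today}).fetchall()
    conn.close()
    return result


print(nights_per_client(STAYS, "2019-06-05"))
```

Joining this grouped result back to the stays table on the client id, as in the answer above, repeats the total on every row of the same client.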

Split the date of same column in multiple rows till the next date value is specified - SQL Server

I have this table
+------+------------+-----+
| Code | date | qty |
+------+------------+-----+
| 1 | 06-07-2017 | 44 |
| 1 | 08-07-2017 | 45 |
| 2 | 07-07-2017 | 32 |
| 2 | 09-07-2017 | 33 |
+------+------------+-----+
and I want to display it this way
+------+------------+-----+
| Code | date | qty |
+------+------------+-----+
| 1 | 06-07-2017 | 44 |
| 1 | 07-07-2017 | 44 |
| 1 | 08-07-2017 | 45 |
| 2 | 07-07-2017 | 32 |
| 2 | 08-07-2017 | 32 |
| 2 | 09-07-2017 | 33 |
+------+------------+-----+
I want to split the date of same 'Code' and keep the same value for 'qty' till the next date of same 'Code'.
You need a calendar table and Outer Apply
;WITH cte
AS (SELECT Min([date]) AS st,
Max([date]) ed,
code
FROM Yourtable
GROUP BY code
UNION ALL
SELECT Dateadd(dd, 1, st) AS st,
ed,
code
FROM cte
WHERE Dateadd(dd, 1, st) <= ed)
SELECT c.code,
[date]=c.st,
qty
FROM cte c
OUTER apply (SELECT TOP 1 qty
FROM Yourtable a
WHERE a.code = c.code
AND c.st >= a.[date]
ORDER BY [date] DESC) oa
ORDER BY c.code,st
Note: for the sake of completeness I have used a recursive CTE to generate the dates. You can always create a physical calendar table in your database and use that instead.
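The same idea can be tried without SQL Server using SQLite from Python. This is a sketch under a few assumptions: the date column is called `day` and holds ISO strings, and a correlated subquery with `order by ... limit 1` takes the place of `OUTER APPLY (SELECT TOP 1 ...)`, which SQLite does not have.

```python
import sqlite3

# Sample rows from the question: (code, day, qty), dates as ISO strings.
ROWS = [
    (1, "2017-07-06", 44), (1, "2017-07-08", 45),
    (2, "2017-07-07", 32), (2, "2017-07-09", 33),
]

# The recursive CTE expands each code into one row per day between its first
# and last date; the subquery picks the most recent qty on or before that day.
QUERY = """
with recursive span(code, day, last_day) as (
    select code, min(day), max(day) from yourtable group by code
    union all
    select code, date(day, '+1 day'), last_day from span where day < last_day
)
select s.code, s.day,
       (select a.qty
        from yourtable a
        where a.code = s.code and a.day <= s.day
        order by a.day desc
        limit 1) as qty
from span s
order by s.code, s.day
"""


def fill_dates(rows):
    """Return (code, day, qty) for every day in each code's date range."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table yourtable (code integer, day text, qty integer)")
    conn.executemany("insert into yourtable values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in fill_dates(ROWS):
    print(row)
```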

Union in outer query

I'm attempting to combine multiple rows using a UNION but I need to pull in additional data as well. My thought was to use a UNION in the outer query but I can't seem to make it work. Or am I going about this all wrong?
The data I have is like this:
+------+------+-------+---------+---------+
| ID | Time | Total | Weekday | Weekend |
+------+------+-------+---------+---------+
| 1001 | AM | 5 | 5 | 0 |
| 1001 | AM | 2 | 0 | 2 |
| 1001 | AM | 4 | 1 | 3 |
| 1001 | AM | 5 | 3 | 2 |
| 1001 | PM | 5 | 3 | 2 |
| 1001 | PM | 5 | 5 | 0 |
| 1002 | PM | 4 | 2 | 2 |
| 1002 | PM | 3 | 3 | 0 |
| 1002 | PM | 1 | 0 | 1 |
+------+------+-------+---------+---------+
What I want to see is like this:
+------+---------+------+-------+
| ID | DayType | Time | Tasks |
+------+---------+------+-------+
| 1001 | Weekday | AM | 9 |
| 1001 | Weekend | AM | 7 |
| 1001 | Weekday | PM | 8 |
| 1001 | Weekend | PM | 2 |
| 1002 | Weekday | PM | 5 |
| 1002 | Weekend | PM | 3 |
+------+---------+------+-------+
The closest I've come so far is using UNION statement like the following:
SELECT * FROM
(
SELECT Weekday, 'Weekday' as 'DayType' FROM t1
UNION
SELECT Weekend, 'Weekend' as 'DayType' FROM t1
) AS X
Which results in something like the following:
+---------+---------+
| Weekday | DayType |
+---------+---------+
| 2 | Weekend |
| 0 | Weekday |
| 2 | Weekday |
| 0 | Weekend |
| 10 | Weekday |
+---------+---------+
I don't see any rhyme or reason as to what the numbers are under the 'Weekday' column, I suspect they're being grouped somehow. And of course there are several other columns missing, but since I can't put a large scope in the outer query with this as inner one, I can't figure out how to pull those in. Help is greatly appreciated.
It looks like you want to union all a pair of aggregation queries that use sum() and group by id, time, one for Weekday and one for Weekend:
select Id, DayType = 'Weekend', [time], Tasks=sum(Weekend)
from t
group by id, [time]
union all
select Id, DayType = 'Weekday', [time], Tasks=sum(Weekday)
from t
group by id, [time]
Try with this
select ID, 'Weekday' as DayType, Time, sum(Weekday)
from t1
group by ID, Time
union all
select ID, 'Weekend', Time, sum(Weekend)
from t1
group by ID, Time
order by 1, 3, 2
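A runnable version of the two-query UNION ALL, using SQLite from Python and the sample rows from the question, might look like the sketch below. The column name `time_of_day` is an assumption chosen to avoid the bare name Time; everything else follows the queries above.

```python
import sqlite3

# Sample rows from the question: (id, time_of_day, total, weekday, weekend).
ROWS = [
    (1001, "AM", 5, 5, 0), (1001, "AM", 2, 0, 2), (1001, "AM", 4, 1, 3),
    (1001, "AM", 5, 3, 2), (1001, "PM", 5, 3, 2), (1001, "PM", 5, 5, 0),
    (1002, "PM", 4, 2, 2), (1002, "PM", 3, 3, 0), (1002, "PM", 1, 0, 1),
]

# One aggregate query per day type, stacked with UNION ALL; the final
# ORDER BY applies to the combined result (id, time_of_day, day_type).
QUERY = """
select id, 'Weekday' as day_type, time_of_day, sum(weekday) as tasks
from t1
group by id, time_of_day
union all
select id, 'Weekend', time_of_day, sum(weekend)
from t1
group by id, time_of_day
order by 1, 3, 2
"""


def tasks_by_day_type(rows):
    """Return (id, day_type, time_of_day, tasks) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t1 (id integer, time_of_day text, total integer, "
        "weekday integer, weekend integer)"
    )
    conn.executemany("insert into t1 values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in tasks_by_day_type(ROWS):
    print(row)
```

UNION ALL is used rather than UNION because UNION removes duplicate rows, which is probably why the attempt in the question returned fewer, seemingly arbitrary values.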
Not tested, but it should do the trick. It may require 2 proc sql steps for the calculation, one for summing and one for the case when statements. If you have extra lines, just use a max statement and group by ID, Time, type_day.
Proc sql; create table want as select ID, Time,
sum(weekday) as weekdayTask,
sum(weekend) as weekendTask,
case when calculated weekdaytask>0 then weekdaytask
when calculated weekendtask>0 then weekendtask else .
end as Task,
case when calculated weekdaytask>0 then "Weekday"
when calculated weekendtask>0 then "Weekend"
end as Day_Type
from have
group by ID, Time
;quit;
Proc sql; create table want2 as select ID, Time, Day_Type, Task
from want
;quit;

Data aggregation with left-outer join

I am trying to pull some data with transaction counts, by branch, by week, which will later be used to feed some dynamic .Net charts.
I have a calendar table, I have a branch table and I have a transaction table.
Here is my DB info (only relevant columns included):
Branch Table:
ID (int), Branch (varchar)
Calendar Table:
Date (datetime), WeekOfYear(int)
Transaction Table:
Date (datetime), Branch (int), TransactionCount(int)
So, I want to do something like the following:
Select b.Branch, c.WeekOfYear, sum(TransactionCount)
FROM BranchTable b
LEFT OUTER JOIN TransactionTable t
on t.Branch = b.ID
JOIN Calendar c
on t.Date = c.Date
WHERE YEAR(c.Date) = @Year -- (SP accepts this parameter)
GROUP BY b.Branch, c.WeekOfYear
Now, this works EXCEPT when a branch doesn't have any transactions for a week, in which case NO RECORD is returned for that branch on that week. What I WANT is to get that branch, that week and "0" for the sum. I tried isnull(sum(TransactionCount), 0) - but that didn't work, either. So I will get the following (making up sums for illustration purposes):
+--------+------------+-----+
| Branch | WeekOfYear | Sum |
+--------+------------+-----+
| 1 | 1 | 25 |
| 2 | 1 | 37 |
| 3 | 1 | 19 |
| 4 | 1 | 0 | //THIS RECORD DOES NOT GET RETURNED, BUT I NEED IT!
| 1 | 2 | 64 |
| 2 | 2 | 34 |
| 3 | 2 | 53 |
| 4 | 2 | 11 |
+--------+------------+-----+
So, why doesn't the left-outer join work? Isn't that supposed to
Any help will be greatly appreciated. Thank you!
EDIT: SAMPLE TABLE DATA:
Branch Table:
+----+---------------+
| ID | Branch |
+----+---------------+
| 1 | First Branch |
| 2 | Second Branch |
| 3 | Third Branch |
| 4 | Fourth Branch |
+----+---------------+
Calendar Table:
+------------+------------+
| Date | WeekOfYear |
+------------+------------+
| 01/01/2015 | 1 |
| 01/02/2015 | 1 |
+------------+------------+
Transaction Table
+------------+--------+--------------+
| Date | Branch | Transactions |
+------------+--------+--------------+
| 01/01/2015 | 1 | 12 |
| 01/01/2015 | 1 | 9 |
| 01/01/2015 | 2 | 4 |
| 01/01/2015 | 2 | 2 |
| 01/01/2015 | 2 | 23 |
| 01/01/2015 | 3 | 42 |
| 01/01/2015 | 3 | 19 |
| 01/01/2015 | 3 | 7 |
+------------+--------+--------------+
If you want to return a query that contains each Branch and each week, then you'll need to first create a full list of that, then use a LEFT JOIN to the transactions to get the count. The code will be similar to:
select bc.Branch,
bc.WeekOfYear,
TotalTransaction = coalesce(sum(t.TransactionCount), 0)
from
(
select b.id, b.branch, c.WeekOfYear, c.date
from branch b
cross join Calendar c
-- if you want to limit the number of rows returned use a WHERE to limit the weeks
-- so far in the year or using the date column
WHERE c.date <= getdate()
and YEAR(c.Date) = @Year -- (SP accepts this parameter)
) bc
left join TransactionTable t
on t.Date = bc.Date
and bc.id = t.branch
GROUP BY bc.Branch, bc.WeekOfYear
This code will create in your subquery a full list of each branch with each date. Once you have this list, then you can JOIN to the transactions to get your total transaction count and you'd return each date as you want.
Bring in the Calendar before you bring in the transactions:
SELECT b.Branch, c.WeekOfYear, sum(TransactionCount)
FROM BranchTable b
INNER JOIN CalendarTable c ON YEAR(c.Date) = @Year
LEFT JOIN TransactionTable t ON t.Branch = b.ID AND t.Date = c.Date
GROUP BY b.Branch, c.WeekOfYear
ORDER BY c.WeekOfYear, b.Branch
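The original query loses the empty branches because the inner JOIN to the calendar on t.Date = c.Date discards the rows where the LEFT JOIN produced no transaction. Building the branch and calendar combinations first avoids that. Here is a minimal sketch of that pattern on the sample data, using SQLite from Python; the table and column names (`branch`, `calendar`, `txn`, `day`, `branch_id`) and the `strftime('%Y', ...)` year filter are assumptions standing in for the SQL Server originals.

```python
import sqlite3

BRANCHES = [(1, "First Branch"), (2, "Second Branch"),
            (3, "Third Branch"), (4, "Fourth Branch")]
CALENDAR = [("2015-01-01", 1), ("2015-01-02", 1)]
TRANSACTIONS = [
    ("2015-01-01", 1, 12), ("2015-01-01", 1, 9),
    ("2015-01-01", 2, 4), ("2015-01-01", 2, 2), ("2015-01-01", 2, 23),
    ("2015-01-01", 3, 42), ("2015-01-01", 3, 19), ("2015-01-01", 3, 7),
]

# Every branch is paired with every calendar day first; transactions are
# then attached with a LEFT JOIN so branches without any still appear.
QUERY = """
select b.branch, c.week_of_year,
       coalesce(sum(t.transaction_count), 0) as total
from branch b
cross join calendar c
left join txn t on t.branch_id = b.id and t.day = c.day
where strftime('%Y', c.day) = :year
group by b.id, b.branch, c.week_of_year
order by c.week_of_year, b.id
"""


def weekly_totals(year):
    """Return (branch, week_of_year, total) including branches with no transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table branch (id integer, branch text)")
    conn.execute("create table calendar (day text, week_of_year integer)")
    conn.execute("create table txn (day text, branch_id integer, transaction_count integer)")
    conn.executemany("insert into branch values (?, ?)", BRANCHES)
    conn.executemany("insert into calendar values (?, ?)", CALENDAR)
    conn.executemany("insert into txn values (?, ?, ?)", TRANSACTIONS)
    result = conn.execute(QUERY, {"year": year}).fetchall()
    conn.close()
    return result


for row in weekly_totals("2015"):
    print(row)
```

The coalesce around the sum is what turns the NULL for the Fourth Branch into 0.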