How do I obtain/insert dummy zero week and costs in between valid data - sql

This query has been driving me crazy.
What I want to do is insert zero-cost rows for the missing weeks in between the valid weeks and costs in a temp table. See the list below:
Practice ID  Practice Name   Cost     Week
1            1 - Practice 1  56.00    18
1            1 - Practice 1  80.00    18
1            1 - Practice 1  122.00   18
1            1 - Practice 1  -80.00   19
1            1 - Practice 1  80.00    19
1            1 - Practice 1  80.00    21
3            3 - Practice 3  80.00    24
3            3 - Practice 3  18.00    28
3            3 - Practice 3  50.00    29
3            3 - Practice 3  18.00    30
3            3 - Practice 3  18.00    34
3            3 - Practice 3  18.00    35
4            4 - Practice 4  36.00    29
4            4 - Practice 4  299.81   31
4            4 - Practice 4  54.00    32
4            4 - Practice 4  132.00   34
4            4 - Practice 4  314.00   35
4            4 - Practice 4  18.00    35
4            4 - Practice 4  501.00   36
4            4 - Practice 4  342.00   36
7            7 - Practice 7  28.00    24
7            7 - Practice 7  56.00    27
7            7 - Practice 7  40.00    27
Where weeks are missing between 1 and 36 for a practice, I need to somehow insert rows for those weeks with a zero cost. For example:
Practice ID  Practice Name   Cost     Week
1            1 - Practice 1  0.00     12
1            1 - Practice 1  0.00     13
1            1 - Practice 1  0.00     14
1            1 - Practice 1  0.00     15
1            1 - Practice 1  0.00     16
1            1 - Practice 1  0.00     17
1            1 - Practice 1  56.00    18
1            1 - Practice 1  80.00    18
1            1 - Practice 1  122.00   18
1            1 - Practice 1  -80.00   19
1            1 - Practice 1  80.00    19
1            1 - Practice 1  80.00    21
This might seem easy enough, but I'm obtaining the payments above from a number of joins. The week comes from a Time table, where the payment's date is joined to the calendar date that carries the week number, so a row is only returned where there is a matching payment.
I've included a depersonalised version of the query below.
If there is no payment in the payment table, then a week number cannot be returned, as there is nothing to join it to. I have tried creating a temp table with just the fiscal week numbers and then updating the weeks where there is a payment, but because there is no fixed number of payments per practice per week, I cannot determine how many week entries I would need.
If anyone can help with this, you're a life saver, as this has been giving me a headache for a day or so now.
SELECT R.[Practice id]
,W.[Practice Short Name]
,convert(VARCHAR(10), R.[Practice id]) + ' - ' + W.[Practice Short Name] AS PracticeNumberName
,SUM(R.[Cost]) AS 'Cost'
,I.FiscalWeek
,I.WeekEndingDate
,R.[Owner]
,R.[Description]
FROM dbo.Payments R
INNER JOIN dbo.Time I ON convert(VARCHAR(10), R.[AllocationDate], 121) = convert(VARCHAR(10), I.CalendarDate, 121)
INNER JOIN dbo.OtherDetails W ON R.[Practice id] = W.[Practice Id]
WHERE [paymenttype] NOT IN (5,14,15)
AND I.[FiscalYear] in (2013,2014)
AND I.[FiscalWeek] between 1 and 36
AND R.[cost] <> 0
AND R.[Practice id] IN (1,2,3,4,5,6,7,8,9,10,11,12)
GROUP BY R.[Practice id]
,W.[Practice Short Name]
,convert(VARCHAR(10), R.[Practice id]) + ' - ' + W.[Practice Short Name]
,I.FiscalWeek
,I.WeekEndingDate
,R.[Owner]
,R.[Description]
HAVING sum(R.[cost]) <> 0

Something like this will get you started. You can look after the details.
select i.FiscalWeek, isnull(somefield, 0) cost
, etc
from time i left join payments r on etc
left join other tables
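This will give you all the fiscal weeks, whether or not there are matching records in the other tables. Keep any filters on the payment columns in the join's ON clause; in the WHERE clause they would discard the unmatched weeks again.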
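Here is a minimal sketch of that left-join idea, run against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server. The tables weeks, practices and payments and their columns are simplified, hypothetical stand-ins for the Time, OtherDetails and Payments tables in the question, and the rows are a small subset of the sample data. Cross joining practices to weeks first is one possible way to get a zero row per practice per missing week while still returning every individual payment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weeks (fiscal_week INTEGER PRIMARY KEY);
    CREATE TABLE practices (practice_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payments (practice_id INTEGER, cost REAL, fiscal_week INTEGER);
""")
# Weeks 17..21 only, to keep the output short (the question uses 1..36).
conn.executemany("INSERT INTO weeks VALUES (?)", [(w,) for w in range(17, 22)])
conn.executemany("INSERT INTO practices VALUES (?, ?)",
                 [(1, "Practice 1"), (7, "Practice 7")])
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)",
                 [(1, 56.0, 18), (1, 80.0, 18), (1, 122.0, 18),
                  (1, -80.0, 19), (1, 80.0, 19), (1, 80.0, 21)])

query = """
    SELECT p.practice_id, w.fiscal_week, COALESCE(r.cost, 0) AS cost
    FROM practices p
    CROSS JOIN weeks w                 -- every practice paired with every week
    LEFT JOIN payments r               -- payments attached where they exist
           ON r.practice_id = p.practice_id
          AND r.fiscal_week = w.fiscal_week
          AND r.cost <> 0              -- payment filters stay in ON, not WHERE
    ORDER BY p.practice_id, w.fiscal_week, r.rowid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Weeks with several payments return several rows, and weeks with none return a single row with a cost of 0, so there is no need to know in advance how many entries each week needs.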

Related

Pandas: to get mean for each data category daily [duplicate]

I am a relative beginner at programming and am learning Python (+pandas), and I hope I can explain this well enough. I have a large time series pd dataframe of over 3 million rows and initially 12 columns, spanning a number of years. It covers people taking a ticket from different locations denoted by Id numbers (350 of them). Each row is one instance (one ticket taken).
I have searched many questions, such as counting records per hour per day and getting the average per hour over several years. However, I run into trouble when including the 'Id' variable.
I'm looking to get the mean number of people taking a ticket for each hour, for each day of the week (mon-fri) and per station.
I have the following, setting datetime to index:
Id Start_date Count Day_name_no
149 2011-12-31 21:30:00 1 5
150 2011-12-31 20:51:00 1 0
259 2011-12-31 20:48:00 1 1
3015 2011-12-31 19:38:00 1 4
28 2011-12-31 19:37:00 1 4
Using groupby and Start_date.index.hour, I can't seem to include the 'Id'.
My alternative approach is to split the hour out of the date and have the following:
Id Count Day_name_no Trip_hour
149 1 2 5
150 1 4 10
153 1 2 15
1867 1 4 11
2387 1 2 7
I then get the count first with:
Count_Item = TestFreq.groupby(['Id', 'Day_name_no', 'Trip_hour'])['Count'].count().reset_index()
Id Day_name_no Trip_hour Count
1 0 7 24
1 0 8 48
1 0 9 31
1 0 10 28
1 0 11 26
1 0 12 25
Then use groupby and mean:
Mean_Count = Count_Item.groupby(['Id', 'Day_name_no', 'Trip_hour'])['Count'].mean().reset_index()
However, this does not give the desired result as the mean values are incorrect.
I hope I have explained this issue clearly. I am looking for the mean per hour, per day, per Id, as I plan to do clustering to separate my dataset into groups before applying a predictive model to these groups.
Any help would be appreciated, along with, if possible, an explanation of what I am doing wrong, either in the code or in my approach.
Thanks in advance.
I have edited this to try to make it a little clearer. Writing a question with a lack of sleep is probably not advisable.
A toy dataset that I start with:
Date Id Dow Hour Count
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
26/12/2014 1234 0 10 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
04/01/2015 1234 1 11 1
I now realise I would have to use the date first and get something like:
Date Id Dow Hour Count
12/12/2014 1234 0 9 5
19/12/2014 1234 0 9 3
26/12/2014 1234 0 10 1
27/12/2014 1234 1 11 4
04/01/2015 1234 1 11 1
And then calculate the mean per Id, per Dow, per hour. And want to get this:
Id Dow Hour Mean
1234 0 9 4
1234 0 10 1
1234 1 11 2.5
I hope this makes it a bit clearer. My real dataset spans 3 years with 3 million rows, contains 350 Id numbers.
Your question is not very clear, but I hope this helps:
df.reset_index(inplace=True)
# helper columns with date, hour and dow
df['date'] = df['Start_date'].dt.date
df['hour'] = df['Start_date'].dt.hour
df['dow'] = df['Start_date'].dt.dayofweek
# sum of counts for all combinations (select 'Count' so only that column is aggregated)
df = df.groupby(['Id', 'date', 'dow', 'hour'])['Count'].sum()
# take the mean of those per-date sums over all dates
df = df.reset_index().groupby(['Id', 'dow', 'hour'])['Count'].mean()
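You can also group by the 'Id' column and then resample, summing each bin. Older pandas versions wrote this as resample(..., how='sum'); the how argument has since been removed, so call .sum() on the resampled object instead.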
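To make the two-step logic concrete, here is a small sketch in plain Python (no pandas) run on the toy dataset from the question. It is only an illustration of the sum-per-date-then-mean idea, not the pandas code itself, and the variable names are made up for the example.

```python
from collections import defaultdict

# Toy rows from the question: (date, Id, dow, hour, count)
rows = (
    [("12/12/2014", 1234, 0, 9, 1)] * 5
    + [("19/12/2014", 1234, 0, 9, 1)] * 3
    + [("26/12/2014", 1234, 0, 10, 1)]
    + [("27/12/2014", 1234, 1, 11, 1)] * 4
    + [("04/01/2015", 1234, 1, 11, 1)]
)

# Step 1: total count per (Id, date, dow, hour).
per_date = defaultdict(int)
for date, id_, dow, hour, count in rows:
    per_date[(id_, date, dow, hour)] += count

# Step 2: average the per-date totals over all dates for each (Id, dow, hour).
totals = defaultdict(list)
for (id_, date, dow, hour), total in per_date.items():
    totals[(id_, dow, hour)].append(total)

means = {key: sum(vals) / len(vals) for key, vals in totals.items()}
print(means)
```

Grouping by Id, day and hour in both steps, as in the original attempt, leaves one row per group, so the second mean just returns the count. The date has to be part of the first grouping and dropped in the second. Note that dates on which a station had no tickets in a given hour never appear in the data, so they do not pull the mean down; whether that is desirable depends on the analysis.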

running total starting from a date column

I'm trying to get a running total as of a date. This is the data I have
Date   Transaction Amount  End of Week Balance
jan 1  5                   100
jan 2  3                   100
jan 3  4                   100
jan 4  3                   100
jan 5  1                   100
jan 6  3                   100
I would like to find out what the daily end balance is. My thought is to get a running total from each day to the end of the week and subtract it from the end of week balance, like below
Date   Transaction Amount  Running total  End of Week Balance  Balance - Running total
jan 1  5                   19             100                  86
jan 2  3                   14             100                  89
jan 3  4                   11             100                  93
jan 4  3                   7              100                  96
jan 5  1                   4              100                  97
jan 6  3                   3              100                  100
I can use
SUM(transactionAmount) OVER (Order by Date)
to get a running total, is there a way to specify that I only want the total of transactions that have taken place after the date?
You can use sum() as a window function, but accumulate in reverse:
select t.*,
(end_of_week_balance -
sum(transactionAmount) over (order by date desc)
)
from t;
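As a rough check of the reverse accumulation, the sketch below runs it on the question's six rows in an in-memory SQLite database via Python's sqlite3 module (window functions need SQLite 3.25 or later). The ISO dates in 2023 are an assumption made so that the dates sort correctly; the question only gives "jan 1" to "jan 6". The plain descending sum includes the current day's transaction, which reproduces the question's "Running total" column; the last column of the expected output instead matches a sum over strictly later days, so the second expression uses a frame that stops one row before the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (date TEXT, transactionAmount INTEGER, end_of_week_balance INTEGER)"
)
# Hypothetical ISO dates standing in for "jan 1" .. "jan 6".
amounts = [5, 3, 4, 3, 1, 3]
conn.executemany(
    "INSERT INTO t VALUES (?, ?, 100)",
    [(f"2023-01-{day:02d}", amt) for day, amt in enumerate(amounts, start=1)],
)

rows = conn.execute("""
    SELECT date,
           -- includes the current day's transaction
           SUM(transactionAmount) OVER (ORDER BY date DESC) AS running_total,
           -- only transactions after the current day; the frame is empty
           -- (NULL) on the last day, hence the COALESCE
           end_of_week_balance - COALESCE(
               SUM(transactionAmount) OVER (
                   ORDER BY date DESC
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ), 0) AS end_of_day_balance
    FROM t
    ORDER BY date
""").fetchall()
for row in rows:
    print(row)
```

Subtracting the inclusive running total from the balance would give 81 for jan 1 rather than the 86 shown in the question, which is why the exclusive frame is used for the balance column here.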
If you have this example:
1> select i, sum(i) over (order by i) S from integers where i<10;
2> go
i S
----------- -----------
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
you can also do:
1> select i, sum(case when i>3 then i else 0 end) over (order by i) S from integers where i<10;
2> go
i S
----------- -----------
1 0
2 0
3 0
4 4
5 9
6 15
7 22
8 30
9 39

Keep first record in group and populate rest with Null/0 in SQL?

I have the following table in my database:
id date sales
1 2010-12-13 10
2 2010-12-13 10
3 2010-12-13 10
4 2010-12-13 10
5 2010-12-13 10
6 2010-12-14 20
7 2010-12-14 20
8 2010-12-14 20
9 2010-12-14 20
10 2010-12-14 20
Is there a way to keep the value on the first record only and populate the rest of the group with NULL or 0? The grouping will be done by date and sales.
For example the intended output is:
id date sales
1 2010-12-13 10
2 2010-12-13 0
3 2010-12-13 0
4 2010-12-13 0
5 2010-12-13 0
6 2010-12-14 20
7 2010-12-14 0
8 2010-12-14 0
9 2010-12-14 0
10 2010-12-14 0
So essentially, keep the first record but make the rest of the records in the group 0 (or NULL, if that is quicker/easier).
The closest I have got to solving this is obtaining just the first record through an inner join. I think a PARTITION BY / OVER clause may solve it, but I'm stuck at the moment!
Any help appreciated!
I'm using SQLite, but GCP (SQL) is also accessible to me.
This might work in SQLite:
CASE WHEN id = MIN(id) OVER(PARTITION BY date) THEN sales ELSE 0 END as sales
If it doesn't you can prepare a subquery that has only the min ID per date and join it in:
SELECT
CASE WHEN y.id IS NULL THEN 0 ELSE sales END as sales
FROM
x
LEFT JOIN (SELECT MIN(id) as id FROM x GROUP BY date) y ON x.id= y.id
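Both variants can be tried quickly against an in-memory SQLite database from Python. This is a sketch that assumes the table is called x and has an id column, as in the answer above; the window-function form needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [(i, "2010-12-13", 10) for i in range(1, 6)]
    + [(i, "2010-12-14", 20) for i in range(6, 11)],
)

# Variant 1: window function, keep sales only on the lowest id per date.
window_rows = conn.execute("""
    SELECT id, date,
           CASE WHEN id = MIN(id) OVER (PARTITION BY date)
                THEN sales ELSE 0 END AS sales
    FROM x
    ORDER BY id
""").fetchall()

# Variant 2: join to a subquery holding the lowest id per date.
join_rows = conn.execute("""
    SELECT x.id, x.date,
           CASE WHEN y.id IS NULL THEN 0 ELSE x.sales END AS sales
    FROM x
    LEFT JOIN (SELECT MIN(id) AS id FROM x GROUP BY date) y ON x.id = y.id
    ORDER BY x.id
""").fetchall()

for row in window_rows:
    print(row)
```

Replacing the 0 in either CASE expression with NULL gives the NULL-filled version instead.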

Max date among records and across tables - SQL Server

I tried my best to present this in table format, but it does not render well on Stack Overflow, so I am attaching a snapshot of the 2 tables. Apologies for the formatting.
SQL Server 2012
**MS Table**
**mId tdId name dueDate**
1 1 **forecastedDate** 1/1/2015
2 1 **hypercareDate** 11/30/2016
3 1 LOE 1 7/4/2016
4 1 LOE 2 7/4/2016
5 1 demo for yy test 10/15/2016
6 1 Implementation – testing 7/4/2016
7 1 Phased Rollout – final 7/4/2016
8 2 forecastedDate 1/7/2016
9 2 hypercareDate 11/12/2016
10 2 domain - Forte NULL
11 2 Fortis completion 1/1/2016
12 2 Certification NULL
13 2 Implementation 7/4/2016
-----------------------------------------------
**MSRevised**
**mId revisedDate**
1 1/5/2015
1 1/8/2015
3 3/25/2017
2 2/1/2016
2 12/30/2016
3 4/28/2016
4 4/28/2016
5 10/1/2016
6 7/28/2016
7 7/28/2016
8 4/28/2016
9 8/4/2016
9 5/28/2016
11 10/4/2016
11 10/5/2016
13 11/1/2016
----------------------------------------
The required output is:
1. I will be passing in the 'tId' number, for instance 1; let's call it tId(1).
2. I want to compare all of tId(1)'s milestones (except hypercareDate) with tId(1)'s forecastedDate milestone.
3. Return whether any milestone date (other than hypercareDate) is greater than the forecastedDate.
The 3 steps above are simple, but I first have to compare each milestone's date with its corresponding revised dates, if any, from the revised table, and pick the max date among them all. That max date is what needs to be compared with the forecastedDate.
I managed to solve this. Posting the answer in the hope that it helps somebody.
-- Insert the result into a temp table
INSERT INTO #mstab
SELECT [mId]
, [tId]
, [msDate]
FROM [dbo].[MS]
WHERE ([msName] NOT LIKE 'forecastedDate' AND [msName] NOT LIKE 'hypercareDate')
-- this scalar function gets the max date between the forecasted due date and the forecasted revised date
SELECT @maxForecastedDate = [dbo].[fnGetMaxDate] ( 'forecastedDate');
-- this gets the max date from the temp table, to be compared with the forecastedDate
SET @maxmilestoneDate = (SELECT MAX(maxDate)
FROM ( SELECT ms.msDueDate AS dueDate
, mr.msRevisedDate AS revDate
FROM #mstab as ms
LEFT JOIN [MSRev] as mr on ms.msId = mr.msId
) maxDate
UNPIVOT (maxDate FOR DateCols IN (dueDate, revDate))up );
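The same comparison can be sketched in a self-contained way with SQLite from Python. This is not the SQL Server code above: it swaps UNPIVOT for a UNION ALL of due dates and revised dates, uses the column names from the question's tables (with tId for the second column), rewrites the dates in ISO format so they compare as text, and loads only a subset of the tId 1 rows. The helper max_date is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ms (mId INTEGER PRIMARY KEY, tId INTEGER, name TEXT, dueDate TEXT);
    CREATE TABLE msRevised (mId INTEGER, revisedDate TEXT);
""")
conn.executemany("INSERT INTO ms VALUES (?, ?, ?, ?)", [
    (1, 1, "forecastedDate", "2015-01-01"),
    (2, 1, "hypercareDate", "2016-11-30"),
    (3, 1, "LOE 1", "2016-07-04"),
    (5, 1, "demo for yy test", "2016-10-15"),
])
conn.executemany("INSERT INTO msRevised VALUES (?, ?)", [
    (1, "2015-01-05"), (1, "2015-01-08"), (3, "2017-03-25"),
    (2, "2016-12-30"), (3, "2016-04-28"), (5, "2016-10-01"),
])

def max_date(t_id, forecast):
    """Latest due or revised date for the forecastedDate milestone
    (forecast=True) or for every other milestone except hypercareDate."""
    name_filter = (
        "m.name = 'forecastedDate'" if forecast
        else "m.name NOT IN ('forecastedDate', 'hypercareDate')"
    )
    sql = f"""
        SELECT MAX(d) FROM (
            SELECT m.dueDate AS d FROM ms m
            WHERE m.tId = ? AND {name_filter}
            UNION ALL
            SELECT r.revisedDate FROM ms m
            JOIN msRevised r ON r.mId = m.mId
            WHERE m.tId = ? AND {name_filter}
        )
    """
    return conn.execute(sql, (t_id, t_id)).fetchone()[0]

max_forecast = max_date(1, forecast=True)
max_milestone = max_date(1, forecast=False)
is_late = max_milestone > max_forecast
print(max_forecast, max_milestone, is_late)
```

MAX ignores NULL due dates, so milestones without a date simply do not contribute.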

Using Sum() with multiple where clauses

I'm pretty new to this, so forgive if this has been posted (I had no idea what to even search on).
I have 2 tables, Accounts and Usage
AccountID AccountStartDate AccountEndDate
-------------------------------------------
1 12/1/2012 12/1/2013
2 1/1/2013 1/1/2014
UsageId AccountID EstimatedUsage StartDate EndDate
------------------------------------------------------
1 1 10 1/1 1/31
2 1 11 2/1 2/29
3 1 23 3/1 3/31
4 1 23 4/1 4/30
5 1 15 5/1 5/31
6 1 20 6/1 6/30
7 1 15 7/1 7/31
8 1 12 8/1 8/31
9 1 14 9/1 9/30
10 1 21 10/1 10/31
11 1 27 11/1 11/30
12 1 34 12/1 12/31
13 2 13 1/1 1/31
14 2 13 2/1 2/29
15 2 28 3/1 3/31
16 2 29 4/1 4/30
17 2 31 5/1 5/31
18 2 26 6/1 6/30
19 2 43 7/1 7/31
20 2 32 8/1 8/31
21 2 18 9/1 9/30
22 2 20 10/1 10/31
23 2 47 11/1 11/30
24 2 33 12/1 12/31
I'd like to write one query that gives me estimated usage for each month (starting now until the last month that we serve an account) for all accounts being served during that month.
The results would be as follows:
Month-Year Total Est Usage
------------------------------
Oct-12 0 (none being served)
Nov-12 0 (none being served)
Dec-12 34 (only accountid 1 being served)
Jan-13 23 (accountid 1 & 2 being served)
Feb-13 24 (accountid 1 & 2 being served)
Mar-13 51 (accountid 1 & 2 being served)
...
Dec-13 33 (only accountid 2 being served)
Jan-14 0 (none being served)
Feb-14 0 (none being served)
I'm assuming I need to sum and then do a Group By...but not really sure logically how I'd lay this out.
Revised Answer:
I've created a Months table with columns MonthID, Month with values like (201212, 12), (201301, 1), ...
I've also reorganised the usage table to have a month column rather than the start date and end date, as it makes the idea clearer.
See http://sqlfiddle.com/#!3/f57d84/6 for details
The query is now:
Select
m.MonthID,
Sum(u.EstimatedUsage) TotalEstimatedUsage
From
Accounts a
Inner Join
Usage u
On a.AccountID = u.AccountID
Inner Join
Months m
On m.MonthID Between
Year(a.AccountStartDate) * 100 + Month(a.AccountStartDate) And
Year(a.AccountEndDate) * 100 + Month(a.AccountEndDate) And
m.Month = u.Month
Group By
m.MonthID
Order By
1
Previous answer, kept for reference, which assumed usage ranges were full dates rather than just months:
Select
Year(u.StartDate),
Month(u.StartDate),
Sum(Case When a.AccountStartDate <= u.StartDate And a.AccountEndDate >= u.EndDate Then u.EstimatedUsage Else 0 End) TotalEstimatedUsage
From
Accounts a
Inner Join
Usage u
On a.AccountID = u.AccountID
Group By
Year(u.StartDate),
Month(u.StartDate)
Order By
1, 2
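For comparison, here is a sketch of the Months-table approach that also returns the zero rows the question asks for (Oct-12, Jan-14 and so on). It runs in SQLite via Python, and it makes several assumptions that are not in the answer: the account start and end are stored directly as YYYYMM integers in hypothetical StartMonthID and EndMonthID columns, the end month is treated as exclusive so that the totals match the expected output in the question, and Months drives the query through LEFT JOINs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Months (MonthID INTEGER PRIMARY KEY, Month INTEGER);
    CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY,
                           StartMonthID INTEGER, EndMonthID INTEGER);
    CREATE TABLE Usage (AccountID INTEGER, Month INTEGER, EstimatedUsage INTEGER);
""")
# Months 2012-10 .. 2014-02 as (YYYYMM, month number).
month_ids = [y * 100 + m for y in (2012, 2013, 2014) for m in range(1, 13)]
month_ids = [mid for mid in month_ids if 201210 <= mid <= 201402]
conn.executemany("INSERT INTO Months VALUES (?, ?)",
                 [(mid, mid % 100) for mid in month_ids])
conn.executemany("INSERT INTO Accounts VALUES (?, ?, ?)",
                 [(1, 201212, 201312), (2, 201301, 201401)])
usage = {
    1: [10, 11, 23, 23, 15, 20, 15, 12, 14, 21, 27, 34],
    2: [13, 13, 28, 29, 31, 26, 43, 32, 18, 20, 47, 33],
}
conn.executemany(
    "INSERT INTO Usage VALUES (?, ?, ?)",
    [(acc, month, est)
     for acc, ests in usage.items()
     for month, est in enumerate(ests, start=1)],
)

rows = conn.execute("""
    SELECT m.MonthID, COALESCE(SUM(u.EstimatedUsage), 0) AS TotalEstimatedUsage
    FROM Months m
    LEFT JOIN Accounts a               -- accounts being served in that month
           ON m.MonthID >= a.StartMonthID AND m.MonthID < a.EndMonthID
    LEFT JOIN Usage u                  -- their estimate for that calendar month
           ON u.AccountID = a.AccountID AND u.Month = m.Month
    GROUP BY m.MonthID
    ORDER BY m.MonthID
""").fetchall()
totals = dict(rows)
print(totals)
```

With an inclusive end month, as in the Between condition of the revised query, Dec-13 would also count account 1 and Jan-14 would count account 2, so the choice depends on whether the end date's month should be treated as served.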