SQL: Split data over months based on expected hours

I really hope you can help me with this problem, which seems pretty complicated to me.
Dealid  DealprojectStartDate  expectedhours
3534    2021-01-01            200
What I want is to split the weightamount out over different months in the future, based on the expected number of hours.
I have following distribution key for expected hours:
0-500 = 2 months
500-1500 = 4 months
1500-4000 = 6 months
4000 and above = 8 months
So for example: in the above observation the start date is 01/01 and the expected hours are 200, therefore the weightamount should be split over 2 months -> month 1 = 100 and month 2 = 100.
Important note: if the start date is the first of the month, it should be allocated starting in that month. So, for example, with a start date of 01/05 (the first of the month) it should be allocated to months 5 and 6, but if the start date was 07/05 it should be allocated to months 6 and 7.
What I think would work is to get a new table that splits the above observation into this:
Dealid  allocation date     expectedhours
3534    2021-01-01 (Jan)    100
3534    2021-02-01 (Feb)    100
Hope you guys can help. Thanks.

Consider the following table:
min max count month_number
0 500 2 1
0 500 2 2
500 1500 4 1
500 1500 4 2
500 1500 4 3
500 1500 4 4
1500 4000 6 1
1500 4000 6 2
1500 4000 6 3
1500 4000 6 4
1500 4000 6 5
1500 4000 6 6
4000 1000000000 8 1
4000 1000000000 8 2
4000 1000000000 8 3
4000 1000000000 8 4
4000 1000000000 8 5
4000 1000000000 8 6
4000 1000000000 8 7
4000 1000000000 8 8
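If you'd rather not maintain that lookup table by hand, here is a minimal sketch to generate it (assuming SQL Server's VALUES table constructor; min, max and count are bracketed since they collide with built-in function names):

CREATE TABLE calc_lookup ([min] int, [max] int, [count] int, month_number int);

INSERT INTO calc_lookup ([min], [max], [count], month_number)
SELECT b.[min], b.[max], b.[count], n.month_number
FROM (VALUES (0, 500, 2),
             (500, 1500, 4),
             (1500, 4000, 6),
             (4000, 1000000000, 8)) AS b([min], [max], [count])
JOIN (VALUES (1), (2), (3), (4), (5), (6), (7), (8)) AS n(month_number)
  ON n.month_number <= b.[count];  -- one row per month of each band's schedule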
If we call that table calc_lookup and your table atable, then this query will give you the results you want:
SELECT a.Dealid,
       dateadd(month, l.month_number - 1, a.dealprojectstartdate) AS allocation_date,
       a.expectedhours / l.count AS expectedhours
FROM atable a
JOIN calc_lookup l ON a.expectedhours BETWEEN l.min AND l.max
You don't give details of the edge cases and such -- so there may be some changes needed (off-by-one, rounding, integer division, etc.).
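For the "first of the month" rule in the question, a hedged sketch (still assuming SQL Server's dateadd/day functions, as above): normalize the start date to the 1st of its month, then shift the whole schedule forward one month when the deal starts after the 1st.

SELECT a.Dealid,
       dateadd(month,
               l.month_number - 1
                 + CASE WHEN day(a.dealprojectstartdate) = 1 THEN 0 ELSE 1 END,
               -- back up to the 1st of the start month
               dateadd(day, 1 - day(a.dealprojectstartdate), a.dealprojectstartdate)
       ) AS allocation_date,
       a.expectedhours / l.count AS expectedhours
FROM atable a
JOIN calc_lookup l ON a.expectedhours BETWEEN l.min AND l.max

With a start date of 2021-05-01 this allocates to months 5 and 6; with 2021-05-07 it allocates to months 6 and 7, matching the note in the question.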

Related

How can I write PySpark code to get the below output?

Input:

Buy - A
Date  Time  Qty  Per Share price  Total Value
15    10    10   10               100
15    14    20   10               200

Sell - B
Date  Time  Qty  Per Share price  Total Value
15    15    15   20               300

Output:

Date  Buy Time  Buy Qty  Buy Per Share price  Buy Total Value  Sell Qty  Sell Per Share Price  Sell Total Value
15    10        10       10                   100              10        20                    200
15    14        5        10                   50               5         20                    100
15    14        15       10                   150
This needs to be done using PySpark.

Cumulative sum over table rows with condition in Oracle PL/SQL

I have two tables:
Employees:
employee_id field max_amount
3 a 3000
4 a 3000
1 a 1600
2 a 500
4 b 4000
2 b 4000
3 b 1700
ordered by employee, field, amount desc.
Amounts (pol, premia, field):
pol premia field assign_to_employee (the desired output column)
11 900 a 3
44 1000 a 3
55 1400 a 4
77 500 a 3
88 1300 a 1
22 800 b 4
33 3900 b 2
66 1300 b 4
Assign Stats Table:
employee_id field max_amount true_amount remain
3 a 3000 2400 600
4 a 3000 1400 1600
1 a 1600 1300 300
2 a 500 0 500
4 b 4000 2100 1900
2 b 4000 3900 100
3 b 1700 0 1700
The output: the assign_to_employee column (merged into the Amounts table).
Algorithm-wise: the method is to assign pols to an employee until the premia that would be added to the cumulative sum is bigger than the max amount for that employee listed in the Employees table. Per field, you always start with the employee with the largest max_amount and keep assigning pols to that employee until no more fit.
I keep doing this until no pols remain to be assigned.
Can you help me solve this?
Thank you.
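The greedy assignment itself (stop adding pols once an employee's running total would exceed their max_amount, then move to the next employee) almost certainly needs PL/SQL or a recursive query, but the running-total building block is a plain analytic function. A minimal sketch, assuming the Amounts table is named amounts and that pol order stands in for your real assignment order:

SELECT pol, premia, field,
       -- running total of premia within each field, in pol order
       SUM(premia) OVER (PARTITION BY field
                         ORDER BY pol
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_premia
FROM amounts

Comparing cumulative_premia against an employee's max_amount is then a join away; the hard part, restarting the running total when you switch to the next employee, is where the procedural loop or recursion comes in.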

Pandas: to get mean for each data category daily [duplicate]

I am a somewhat beginner programmer learning Python (+pandas) and hope I can explain this well enough. I have a large time-series pandas dataframe of over 3 million rows and initially 12 columns, spanning a number of years. It covers people taking a ticket from different locations denoted by Id numbers (350 of them). Each row is one instance (one ticket taken).
I have searched many questions, like counting records per hour per day and getting the average per hour over several years. However, I run into trouble including the 'Id' variable.
I'm looking to get the mean value of people taking a ticket for each hour, for each day of the week (mon-fri) and per station.
I have the following, setting datetime to index:
Id Start_date Count Day_name_no
149 2011-12-31 21:30:00 1 5
150 2011-12-31 20:51:00 1 0
259 2011-12-31 20:48:00 1 1
3015 2011-12-31 19:38:00 1 4
28 2011-12-31 19:37:00 1 4
Using groupby and Start_date.index.hour, I can't seem to include the 'Id'.
My alternative approach is to split the hour out of the date and have the following:
Id Count Day_name_no Trip_hour
149 1 2 5
150 1 4 10
153 1 2 15
1867 1 4 11
2387 1 2 7
I then get the count first with:
Count_Item = TestFreq.groupby([TestFreq['Id'], TestFreq['Day_name_no'], TestFreq['Hour']]).count().reset_index()
Id Day_name_no Trip_hour Count
1 0 7 24
1 0 8 48
1 0 9 31
1 0 10 28
1 0 11 26
1 0 12 25
Then use groupby and mean:
Mean_Count = Count_Item.groupby([Count_Item['Id'], Count_Item['Day_name_no'], Count_Item['Hour']]).mean().reset_index()
However, this does not give the desired result as the mean values are incorrect.
I hope I have explained this issue in a clear way. I looking for the mean per hour per day per Id as I plan to do clustering to separate my dataset into groups before applying a predictive model on these groups.
Any help would be appreciated, and if possible an explanation of what I am doing wrong, either code-wise or in my approach.
Thanks in advance.
I have edited this to try to make it a little clearer. Writing a question with a lack of sleep is probably not advisable.
A toy dataset that I start with:
Date Id Dow Hour Count
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
12/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
19/12/2014 1234 0 9 1
26/12/2014 1234 0 10 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
27/12/2014 1234 1 11 1
04/01/2015 1234 1 11 1
I now realise I would have to use the date first and get something like:
Date Id Dow Hour Count
12/12/2014 1234 0 9 5
19/12/2014 1234 0 9 3
26/12/2014 1234 0 10 1
27/12/2014 1234 1 11 4
04/01/2015 1234 1 11 1
And then calculate the mean per Id, per Dow, per Hour, to get this:
Id Dow Hour Mean
1234 0 9 4
1234 0 10 1
1234 1 11 2.5
I hope this makes it a bit clearer. My real dataset spans 3 years, has 3 million rows, and contains 350 Id numbers.
Your question is not very clear, but I hope this helps:
df.reset_index(inplace=True)
# helper columns with date, hour and dow
df['date'] = df['Start_date'].dt.date
df['hour'] = df['Start_date'].dt.hour
df['dow'] = df['Start_date'].dt.dayofweek
# sum of counts for all combinations
df = df.groupby(['Id', 'date', 'dow', 'hour']).sum()
# take the mean over all dates
df = df.reset_index().groupby(['Id', 'dow', 'hour']).mean()
You can also use groupby on the 'Id' column and then aggregate with resample; note that in recent pandas versions you call .resample(...).sum() directly, as the how='sum' argument is deprecated.

How can I transfer the data from one column to another based on another column's values in SQL

select Products, Fiscal_year, Fiscal_Period, Stock_QTY, DaysRemaining,
(Stock_QTY / DaysRemaining) as QtyforPeriod,
Stock_QTY -(Stock_QTY / DaysRemaining) as LeftforNextmonth
from Stocks
products | Fiscal_year | Fiscal_period | Stock_QTY | DaysRemaining | QtyforPeriod | LeftforNextMonth
5000 22 1 100 4
6000 22 1 200 4
7000 22 2 300 20
7000 22 3 400 40
8000 23 1 500 60
5000 23 1 600 60
7000 23 2 700 90
8000 23 3 800 100
Is there any possibility to write a query so that if Fiscal_year = 22 and Fiscal_period = 4, it subtracts the LeftforNextMonth of period 3 from Stock_QTY and divides by DaysRemaining?
Likewise, if Fiscal_year = 22 and Fiscal_period = 5: subtract the LeftforNextMonth of period 4 from Stock_QTY and divide by DaysRemaining.
And if Fiscal_year = 22 and Fiscal_period = 6: subtract the LeftforNextMonth of period 5 from Stock_QTY and divide by DaysRemaining.
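If I read that right, each period needs a value from the previous period's row, which is what the LAG window function provides. A minimal sketch, assuming each product has at most one row per fiscal period and that your SQL dialect supports window functions (column names are taken from the query above):

SELECT Products, Fiscal_year, Fiscal_Period, Stock_QTY, DaysRemaining,
       -- LeftforNextMonth of the previous period, 0 when there is none
       (Stock_QTY
          - LAG(Stock_QTY - Stock_QTY / DaysRemaining, 1, 0)
              OVER (PARTITION BY Products, Fiscal_year ORDER BY Fiscal_Period)
       ) / DaysRemaining AS QtyforPeriod
FROM Stocks

Whether the previous period's leftover should be subtracted or added back is something only you can confirm from your data; flip the sign accordingly.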

Join Three tables with Sum of column in access query

I have three tables, as shown below.
I need output as shown in the output table.
For this I need to join the three tables and order the output in month order.
tbl_MonthList
MonthID MonthList
1 January
2 February
3 March
4 April
5 May
6 June
7 July
8 August
9 September
10 October
11 November
12 December
tbl_Amount:
Month_id Amount_Received Customer_id
3 500 aaa
3 1000 bbb
4 700 jjj
5 300 aaa
5 400 jjj
5 500 ppp
7 1000 aaa
10 1500 bbb
12 700 jjj
tbl_Month_Target
MonthID MonthF_L
1 10000
2 150000
3 1000
4 50000
5 5000
6 3000
7 20000
8 12000
9 34000
10 85000
11 34000
12 45000
I need output as shown below
Month Total_amount MonthF_L
January 0 10000
February 0 150000
March 2000 1000
April 700 50000
May 1200 5000
June 0 3000
July 1000 20000
August 0 12000
September 0 34000
October 1500 85000
November 0 34000
December 700 45000
SELECT ML.MonthList AS Month,
Sum(A.Amount_Received) AS Total_amount,
First(MT.MonthF_L) AS MonthF_L
FROM (tbl_MonthList AS ML
INNER JOIN tbl_Month_Target AS MT ON ML.MonthID = MT.MonthID)
LEFT JOIN tbl_Amount AS A ON ML.MonthID = A.Month_id
GROUP BY ML.MonthList, ML.MonthID
ORDER BY ML.MonthID
Note: In MS Access, multiple joins must be explicitly nested within parentheses
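One caveat: with a LEFT JOIN, months that have no rows in tbl_Amount produce a Null sum rather than the 0 shown in the desired output. In Access you can wrap the aggregate, e.g. (Nz is Access-specific; other engines would use COALESCE):

Nz(Sum(A.Amount_Received), 0) AS Total_amount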
Try this:
select ml.MonthList, sum(a.Amount_Received) as Total_amount, mt.MonthF_L
from (tbl_MonthList ml
left join tbl_Month_Target mt on mt.MonthID = ml.MonthID)
left join tbl_Amount a on a.Month_id = ml.MonthID
group by ml.MonthList, mt.MonthF_L