SQL: How to create a row per period value

I am currently trying to upskill in data analysis using tools such as SQL and Excel, so apologies if my question does not make much sense; I am happy to expand or clarify where required.
Problem:
I am trying to create a period-by-period line graph showing pay across periods. However, in my current dataset the rows are employee codes and the columns are the individual periods, holding the value for that period in each row.
To meet the requirements of the line graph, I need to perform a pivot of some sort to create one row per period for each worker. This will then allow me to group by period for the graph.
Current dataset:
Code | Name      | Period 1 | Period 2 | Period 3
P1   | Worker 1  | 2740.67  | 0        | 0
2    | Worker 2  | 0        | 0        | 0
3    | Worker 3  | 0        | 759.85   | 607.88
4    | Worker 4  | 0        | 0        | 0
5    | Worker 5  | 5000     | 5000     | 5000
6    | Worker 6  | 1762.5   | 1672.5   | 960
12   | Worker 7  | 6050     | 7750     | 5000
7    | Worker 8  | 625.38   | 748.46   | 10
1234 | Worker 9  | 2616.67  | 2616.67  | 2616.67
8    | Worker 10 | 500      | 200      | 0
144  | Worker 11 | 0        | 0        | 0
M100 | Worker 12 | 423.08   | 0        | 0
M01  | Worker 13 | 1583.33  | 1583.33  | 1583.33
M102 | Worker 14 | 5833.33  | 5833.33  | 0
2403 | Worker 15 | 8333.33  | 8333.33  | 11269.23
So Worker 5, for example, should have 3 rows. The only approaches I can think of are subqueries per worker that make up the columns, or multiple unions, but both seem rather time-consuming. I was hoping for a quicker, more efficient way of achieving what I need.
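One common way to get a row per period is to stack one SELECT per period column with UNION ALL. The sketch below tries that in SQLite through Python's sqlite3 module. The table name `pay` and the column names `period_1` to `period_3` and `amount` are assumptions made for illustration, and only three workers from the sample are loaded.

```python
import sqlite3

# Hypothetical table and column names; three workers taken from the sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pay (code TEXT, name TEXT, period_1 REAL, period_2 REAL, period_3 REAL)"
)
conn.executemany(
    "INSERT INTO pay VALUES (?, ?, ?, ?, ?)",
    [
        ("3", "Worker 3", 0, 759.85, 607.88),
        ("5", "Worker 5", 5000, 5000, 5000),
        ("8", "Worker 10", 500, 200, 0),
    ],
)

# One SELECT per period column, stacked with UNION ALL:
# every worker ends up with one row per period.
UNPIVOT_SQL = """
    SELECT code, name, 1 AS period, period_1 AS amount FROM pay
    UNION ALL
    SELECT code, name, 2, period_2 FROM pay
    UNION ALL
    SELECT code, name, 3, period_3 FROM pay
"""
long_rows = conn.execute(UNPIVOT_SQL + " ORDER BY code, period").fetchall()

# Grouping the long form by period gives the series for the line graph.
totals = conn.execute(
    "SELECT period, SUM(amount) FROM (" + UNPIVOT_SQL + ") GROUP BY period ORDER BY period"
).fetchall()

for row in long_rows:
    print(row)
```

This needs one SELECT per period column but not one per worker, so it stays short as long as the number of periods is modest. Some databases also provide a dedicated UNPIVOT operator, which may be worth checking for the engine in use.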

Related

One step left to filter consecutive numbers in GPS data according to conditions

Many posts have been published about filtering desired data from a dataframe. I reviewed most of them and noticed a gap: responders mainly just correct the code or recommend a line that solves the presented issue. So I would be grateful if you could also recommend training resources to improve our knowledge about filtering data.
ID CI timestamp speed Lat Long
1 1 2013-01-08 10:22:36 20 23.01 33.54
2 1 2013-01-08 10:22:42 21 23.04 33.54
3 1 2013-01-08 10:22:47 25 23.05 33.54
4 1 2013-01-08 10:22:51 10 23.06 33.54
5 2 2013-01-08 10:22:27 24 23.07 33.54
6 2 2013-01-08 10:22:29 18 23.08 33.54
7 1 2013-01-08 10:33:15 07 23.09 33.55
8 1 2013-01-08 10:33:36 20 24.01 33.55
9 1 2013-01-08 10:33:42 21 24.11 33.55
10 1 2013-01-08 10:33:47 25 24.14 33.55
11 1 2013-01-08 10:33:51 10 24.21 33.55
12 1 2013-01-08 10:33:57 24 24.31 33.55
13 1 2013-01-08 10:33:59 10 24.51 33.55
14 1 2013-01-08 10:34:04 24 24.61 33.55
I have a dataframe that includes 200 thousand records. It has three columns: cycler's ID (CI), timestamp, and speed. For each CI, the time difference (TD) is calculated by subtracting the timestamps of two successive rows. I used groupby to calculate TD per CI. Then I dropped the CIs whose number of records was less than five. In the data sample shown, CI "2" is eliminated because it has only 2 records. Next, I counted the number of consecutive values whose time difference was less than seven seconds and put them in a list for each CI. Obviously, each CI can have runs of successive values of various lengths. For instance, in the sample shown, CI 1 has lengths 4 and 8.
The final step that is still missing:
I need to check the generated list for each CI and save the records belonging to the longest run of consecutive values in a new CSV file.
My code follows. I need to complete it, or would welcome a faster solution.
import pandas as pd
from itertools import groupby

grouped = df.sort_values(by='timestamp').groupby('CI')
for i in grouped.groups.keys():
    p = grouped.get_group(i).copy()
    if len(p.index) > 13:
        p['Time_diff'] = pd.to_datetime(p['timestamp'].astype(str)).diff(1).dt.total_seconds()
        # lengths of each run of consecutive rows whose time difference is below 7 seconds
        y = [len(list(g)) for k, g in groupby(p['Time_diff'] < 7) if k]
        if len(y) != 0:
            if max(y) > 5:
                ################ till here, my code works perfectly ##########
                # ??? generating code to save only consecutive values having a maximum length
                # for each CI
                p.to_csv(f'D:/out/{i}.csv', sep=';')
Expected output:
In this case, only the last 7 records are saved in a CSV file.
Thanks in advance
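The missing step can be sketched without pandas: walk through the sorted timestamps of one CI, restart the run whenever a gap reaches the threshold, and remember the longest stretch seen. The function name `longest_run` and its parameters are hypothetical, and the 7-second threshold is taken from the code above. The sample timestamps of CI 1 are used as input.

```python
from datetime import datetime


def longest_run(timestamps, max_gap_seconds=7):
    """Return (start, end) positions, inclusive, of the longest stretch of
    records in which every gap to the previous record is below max_gap_seconds."""
    best_start, best_end = 0, 0
    run_start = 0
    for pos in range(1, len(timestamps)):
        gap = (timestamps[pos] - timestamps[pos - 1]).total_seconds()
        if gap >= max_gap_seconds:
            run_start = pos  # the run is broken; a new one starts at this record
        if pos - run_start > best_end - best_start:
            best_start, best_end = run_start, pos
    return best_start, best_end


# Timestamps of CI 1 from the sample, already sorted.
ci1 = [
    datetime.strptime("2013-01-08 " + t, "%Y-%m-%d %H:%M:%S")
    for t in [
        "10:22:36", "10:22:42", "10:22:47", "10:22:51",
        "10:33:15", "10:33:36", "10:33:42", "10:33:47",
        "10:33:51", "10:33:57", "10:33:59", "10:34:04",
    ]
]
start, end = longest_run(ci1)
print(start, end, end - start + 1)
```

For the sample this selects the last 7 records of CI 1, matching the expected output. Inside the loop of the original code, the same positions could presumably be used to slice the group, for example with `p.iloc[start:end + 1]`, before calling `to_csv`.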

How to run a loop on a query that gives the sum of time remaining on tickets so that we get time remaining of individual tickets?

I have a table consisting of Entity_Id, Date_of_Modification, Previous_State, and New_State for tickets we are working on.
Entity_Id | Date_of_Modification | Previous_State  | New_State       | Time Difference (Days)
1         | 3/18/2020            | Internal Review | Done            | 0
1         | 3/18/2020            | Open            | Internal Review | 0
2         | 6/25/2020            | Internal Review | Done            | 1
2         | 6/24/2020            | Done            | Internal Review | 0
2         | 6/21/2020            | Testing         | Done            | 3
2         | 6/18/2020            | In Dev          | Testing         | 3
2         | 4/30/2020            | Planned         | In Dev          | 49
2         | 3/21/2020            | Open            | Planned         | 0
3         | 3/31/2020            | Internal Review | Internal Review | 6
3         | 3/25/2020            | Analyzing       | Internal Review | 5
3         | 3/20/2020            | Analyzing       | Analyzing       | 1
3         | 3/10/2020            | Open            | Analyzing       | 0
4         | 3/25/2020            | Internal Review | Done            | 2
4         | 3/23/2020            | Internal Review | Internal Review | 0
4         | 3/23/2020            | Open            | Internal Review | 5
4         | 3/18/2020            | Open            | Open            | 32
4         | 3/18/2020            | Done            | Open            | 0
4         | 2/14/2020            | Done            | Done            | 17
4         | 2/14/2020            | Internal Review | Done            | 0
4         | 1/28/2020            | Internal Review | Internal Review | 2
4         | 1/28/2020            | Open            | Internal Review | 0
I have figured out the query for calculating the total amount of time already spent by a ticket.
I have also figured out the time spent by the ticket in the 'Internal Review' state, because we want the time spent apart from this state, and have written a query to calculate the remaining time.
-------query to find total time remaining for a ticket apart from internal review---------
SELECT M.TotalTime - N.IRTotalTime AS RemainingHours
FROM
    ----------query to find total time spent on a ticket---------
    (SELECT SUM(B.Diff) AS TotalTime
     FROM
        (SELECT
            A.Modification_Id,
            A.Date_of_Modification,
            A.Previous_State,
            A.State AS NewState,
            DATEDIFF(DAY, LAG(Date_of_Modification) OVER (ORDER BY Date_of_Modification), Date_of_Modification)
                AS Diff
         FROM
            (SELECT
                Modification_Id,
                Date_of_Modification,
                Previous_State,
                State
             FROM Book2
            ) AS A)
        AS B) AS M
    ,
    ----------query to find total time spent on internal review---------
    (SELECT SUM(B.Diff) AS IRTotalTime
     FROM
        (SELECT
            A.Modification_Id,
            A.Date_of_Modification,
            A.Previous_State,
            A.State AS NewState,
            DATEDIFF(DAY, LAG(Date_of_Modification) OVER (ORDER BY Date_of_Modification), Date_of_Modification) AS Diff
         FROM
            (SELECT
                Modification_Id,
                Date_of_Modification,
                Previous_State,
                State
             FROM Book2
             WHERE Previous_State = 'Internal Review' AND State <> 'Internal Review'
             UNION
             SELECT
                Modification_Id,
                Date_of_Modification,
                Previous_State,
                State
             FROM Book2
             WHERE Previous_State = 'Internal Review' AND State = 'Internal Review'
            ) AS A
        ) AS B
     WHERE B.Previous_State = 'Internal Review' AND B.NewState <> 'Internal Review') AS N
But for some reason this query only works when I specify the ticket number (i.e. Entity_Id). It does not work when I run it over the entire table, so I thought we could use a loop to get the total remaining time of individual tickets.
But I am having difficulty running the query in a loop and getting the Entity_Id displayed for each ticket's calculation.
When I run the query I get the value 55, which might be the total remaining time. But I want the total remaining time for individual tickets, like:
Entity_Id | Remaining Time (Days)
1         | NULL
2         | 95
3         | 11
4         | 20
Thank you
Update:
I used PARTITION BY Entity_Id, got the required total time and Internal Review time of individual tickets, and saved the results in separate tables. I now need to subtract the time in the 2nd table from the time in the 1st table. Some rows in both tables have a NULL value in the time spent column.
Table A (Total time spent):
Entity_Id | Remaining Time (Days)
1         | NULL
2         | 96
3         | 21
4         | 21
Table B (Time spent in Internal Review):
Entity_Id | Remaining Time (Days)
2         | 1
3         | 15
4         | 5
Thanks
Update:
I have figured out the query for it. Thank you all for your suggestions.
If the question regarding the Internal Review state was unclear, here is a diagram representing what I require from this query for a particular ticket:
Total: sum of time diff = 58 days
Internal Review State: 19 days
Final result: 39 days
#JonArmstrong
Try adding a PARTITION BY in your lag functions, like so:
DATEDIFF(DAY, LAG(Date_of_Modification) OVER (PARTITION BY Entity_Id ORDER BY Date_of_Modification), Date_of_Modification)
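The effect of the PARTITION BY can be tried out in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). This is only a sketch under assumptions: the rows are hypothetical ISO-format dates rather than the data from the question, and a `julianday()` difference stands in for SQL Server's `DATEDIFF(DAY, ...)`, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book2 (Entity_Id INTEGER, Date_of_Modification TEXT)")
# Hypothetical rows in ISO date format, not the data from the question.
conn.executemany(
    "INSERT INTO Book2 VALUES (?, ?)",
    [
        (1, "2020-03-01"), (1, "2020-03-04"), (1, "2020-03-10"),
        (2, "2020-03-02"), (2, "2020-03-03"),
    ],
)

rows = conn.execute(
    """
    SELECT Entity_Id, SUM(Diff) AS TotalTime
    FROM (
        SELECT Entity_Id,
               -- julianday() difference stands in for DATEDIFF(DAY, ...)
               julianday(Date_of_Modification)
                 - julianday(LAG(Date_of_Modification) OVER (
                       PARTITION BY Entity_Id ORDER BY Date_of_Modification)) AS Diff
        FROM Book2
    ) AS B
    GROUP BY Entity_Id
    ORDER BY Entity_Id
    """
).fetchall()

for entity_id, total_days in rows:
    print(entity_id, total_days)
```

Because LAG restarts inside each partition, the first row of every ticket gets a NULL difference instead of a difference to the previous ticket's last row, and grouping by Entity_Id then yields one total per ticket without any loop.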

Is it possible to set a dynamic window frame bound in the SQL OVER (ROWS BETWEEN ...) clause?

Consider the following table, describing a patient's medication plan. For example, the first row describes that the patient with patient_id = 1 is treated from timestamp 0 to 4. At time = 0, the patient has not yet received any medication (kum_amount_start = 0). At time = 4, the patient has received a cumulative amount of 100 units of a certain drug. It can be assumed that the drug is given at a constant rate. For the first row, this means that the drug is given at a rate of 25 units/h.
patient_id | starttime [h] | endtime [h] | kum_amount_start | kum_amount_end
1          | 0             | 4           | 0                | 100
1          | 4             | 5           | 100              | 300
1          | 5             | 15          | 300              | 550
1          | 15            | 18          | 550              | 700
2          | 0             | 3           | 0                | 150
2          | 3             | 6           | 150              | 350
2          | 6             | 10          | 350              | 700
2          | 10            | 15          | 700              | 1100
2          | 15            | 19          | 1100             | 1500
I want to add the two columns "kum_amount_start_last_6hr" and "kum_amount_end_last_6hr" that describe the amount that has been given within the last 6 hours of the treatment (for the respective timestamps start, end).
I have been stuck on this problem for a while now.
I tried to tackle it with something like this
SUM(kum_amount) OVER (PARTITION BY patient_id ROWS BETWEEN "dynamic window size" AND CURRENT ROW)
but I'm not sure whether this is the right approach.
I would be very happy if you could help me out here, thanks!
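Since the amounts are already cumulative, the amount given in the last 6 hours at time t can be written as cum(t) - cum(t - 6), where cum is read off the table by linear interpolation (the constant-rate assumption stated above). A ROWS frame counts rows rather than hours, so it cannot express this on its own. The sketch below illustrates the arithmetic in plain Python rather than SQL; the function names are hypothetical.

```python
def cumulative_at(intervals, t):
    """Cumulative amount at time t, interpolating linearly inside each interval.

    intervals: list of (starttime, endtime, kum_amount_start, kum_amount_end)
    for one patient, sorted by starttime and contiguous.
    """
    if t <= intervals[0][0]:
        return intervals[0][2]  # before treatment starts
    for start, end, amount_start, amount_end in intervals:
        if start <= t <= end:
            rate = (amount_end - amount_start) / (end - start)
            return amount_start + rate * (t - start)
    return intervals[-1][3]  # after the last interval nothing more is given


def amount_last_hours(intervals, t, window=6):
    """Amount given in the `window` hours before time t."""
    return cumulative_at(intervals, t) - cumulative_at(intervals, t - window)


# Rows of patient 1 from the table above.
patient_1 = [(0, 4, 0, 100), (4, 5, 100, 300), (5, 15, 300, 550), (15, 18, 550, 700)]

start_last_6hr = [amount_last_hours(patient_1, start) for start, _, _, _ in patient_1]
end_last_6hr = [amount_last_hours(patient_1, end) for _, end, _, _ in patient_1]
print(start_last_6hr)
print(end_last_6hr)
```

For the third row of patient 1, for example, the window ending at t = 15 starts at t = 9, where the interpolated cumulative amount is 300 + 25 * 4 = 400, so 550 - 400 = 150 units fall inside the last 6 hours. In SQL the same idea would likely need a self-join on the interval containing t - 6 rather than a window frame.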

Reward distribution Reinforcement Learning

Problem1:
We want to go from s to e. In each cell we can move right R or down D. The environment is fully known. The table has (4*5) 20 cells. The challenge is that we do not know what the reward of each cell is, but we will receive an overall reward as we pass and finish a path.
Example: a solution can be RRDDRDR and the overall reward is 16.
s 3 5 1 5
1 2 4 5 1
7 3 1 2 8
9 2 1 1 e
The target is to find a set of actions from Start to End which maximizes the obtained overall reward. How can we distribute the overall reward among actions?
Problem2:
This problem is the same as Problem1, but the rewards of the environment are dynamic, so the way we reach a cell affects the rewards of the cells ahead.
Example: the two movement sequences RRD and DRR both get us to the same cell, but since they follow different paths, the cells ahead will have different rewards.
s 3 5 1 5
1 2 4 9 -1
7 3 2 -5 18
9 2 9 7 e
(RRD path, selecting this path will result in changes of rewards of ahead cells)
s 3 5 1 5
1 2 4 3 1
7 3 30 7 -8
9 2 40 11 e
(DRR path, selecting this path will result in changes of rewards of ahead cells)
The target is to find a set of actions from Start to End which maximizes the obtained overall reward. How can we distribute the overall reward between actions? (After passing a path from Start to End and the overall reward is obtained)
Can you say more about the research you are doing? (The problem sounds a lot like the sort of thing someone might assign just to get you thinking about temporal credit assignment.)
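For a grid this small, one option that sidesteps credit assignment altogether is to treat the environment as a black box and try every path: with 4 right moves and 3 down moves there are only 35 of them. The sketch below assumes exactly that; `overall_reward` is a hypothetical stand-in for the environment, and the Problem1 grid is used only to simulate it. It does not split the reward among actions.

```python
from itertools import combinations

# Cell rewards of Problem1, with 0 for the start and end cells.
# Used here only to simulate the environment; the agent never reads it directly.
GRID = [
    [0, 3, 5, 1, 5],
    [1, 2, 4, 5, 1],
    [7, 3, 1, 2, 8],
    [9, 2, 1, 1, 0],
]


def overall_reward(path):
    """Stand-in for the environment: returns only the total reward of a full path."""
    row = col = total = 0
    for move in path:
        if move == "R":
            col += 1
        else:
            row += 1
        total += GRID[row][col]
    return total


def all_paths(n_right=4, n_down=3):
    """Every ordering of the right and down moves that leads from s to e."""
    n = n_right + n_down
    for down_positions in combinations(range(n), n_down):
        yield "".join("D" if i in down_positions else "R" for i in range(n))


example_reward = overall_reward("RRDDRDR")  # the example path from Problem1
best_path = max(all_paths(), key=overall_reward)
best_reward = overall_reward(best_path)
print(example_reward, best_path, best_reward)
```

Because only whole-path totals are queried, the same enumeration would still apply if the rewards depend on the path taken, as in Problem2. It stops being practical once the grid grows, which is where distributing the overall reward over individual actions becomes the real question.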

How to return a group of rows when one row meets "where" criteria in SQL Anywhere

I am somewhat overwhelmed by what I am trying to do, since I have only been using SQL for 3 days now, but I already love the increased functionality over MS query. The need for the IN function is what drove me to learn about this, and I thank the community for the info here to get me through learning that.
I tried looking thru other questions, but I couldn't find one in which the intent was to group more than two rows, or to group a varying number of rows. This means that count and duplicate are both out as options.
What I am doing is analyzing a table of part number information that spans multiple store locations. The table gives a row to each instance of a part number, so if all 15 stores have some sort of history for a given part number, that part number will have 15 rows in the table.
I want to look at other stores' history for parts that meet the criteria of 0 sales history for my location. The purpose is to see if they can be transferred to another store instead of being returned to the vendor and incurring a restock fee.
Here is a simplified version of the table organized in the way I would want the output to be structured. I got here by having suspected part numbers and using the list of them as a text string in IN() but I want to go about this the other way and build a list of part numbers from sales data in this table.
Branch | Part_No  | Description | Bin Qty | current 12 mo sales | previous 12 mo sales
-------|----------|-------------|---------|---------------------|---------------------
20     | CA38385  | SUPPORT     | 2       | 1                   | 1
23     | CA38385  | SUPPORT     | 1       | 0                   | 0
25     | CA38385  | SUPPORT     | 0       | 0                   | 1
20     | DFC10513 | Hdw Kit     | 0       | 1                   | 0
23     | DFC10513 | Hdw Kit     | 1       | 0                   | 0
07     | DFC10513 | Hdw Kit     | 0       | 1                   | 0
3      | D59096   | VALVE       | 0       | 0                   | 12
5      | D59096   | VALVE       | 0       | 0                   | 4
6      | D59096   | VALVE       | 4       | 6                   | 12
8      | D59096   | VALVE       | 0       | 0                   | 0
33     | D59096   | VALVE       | 11      | 14                  | 18
21     | D59096   | VALVE       | 4       | 4                   | 4
22     | D59096   | VALVE       | 0       | 0                   | 0
23     | D59096   | VALVE       | 10      | 0                   | 0
24     | D59096   | VALVE       | 0       | 0                   | 0
25     | D59096   | VALVE       | 0       | 0                   | 0
26     | D59096   | VALVE       | 2       | 2                   | 0
1      | TE67401  | Repair Kit  | 1       | 1                   | 2
21     | TE67401  | REPAIR KIT  | 1       | 3                   | 0
22     | TE67401  | REPAIR KIT  | 0       | 1                   | 0
I am branch 23, so the start of the query as I understand it would be
Select * from part_information
Group By part_number
Having IN(Branch) 23 and bin qty > 0 and current_12_mo_sales=0 and previous_12_mo_sales = 0
Can you point me down the right track? This table has approx. 200000 rows in it, so I really need to learn how to do this. I really don't see a better way.
Thank you in advance for your help and or criticism -Cody
Select * from part_information
where part_number in (
    select part_number from part_information
    where branch = 23 and bin_qty > 0 -- etc...
)
(Apologies for lack of formatting).
This ended up working the way I wanted
SELECT pi_Branch, pi_Franchise, pi_Part_No, pi_Description, pi_Bin_Qty,
       pi_Bin, pi_current_12_mo_sales, pi_previous_12_mo_sales, pi_Inventory_Cost,
       pi_Return_Indicator
From Part_Information
Where pi_Part_No IN (Select pi_Part_No
                     From Part_Information
                     Where pi_Branch=23 And
                           pi_Bin_Qty>0 And pi_current_12_mo_sales<=0
                           And pi_previous_12_mo_sales<=0)
I was thinking that this had to be some complex process, but in reality, two simple queries were all that was needed.
I would still be interested in anyone's opinion on a better or more efficient way of handling this.
Thanks Mischa for getting me there!
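On the question of another way to write this: the same filter can also be phrased with a correlated EXISTS instead of IN. The sketch below tries it in SQLite through Python's sqlite3 module on a few rows from the sample table; the aliases `o` and `mine` are hypothetical, only some of the columns are created, and whether it is any faster than IN depends on the database's optimizer and indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Part_Information (
        pi_Branch INTEGER, pi_Part_No TEXT, pi_Description TEXT, pi_Bin_Qty INTEGER,
        pi_current_12_mo_sales INTEGER, pi_previous_12_mo_sales INTEGER)"""
)
# A few rows from the sample table in the question.
conn.executemany(
    "INSERT INTO Part_Information VALUES (?, ?, ?, ?, ?, ?)",
    [
        (20, "CA38385", "SUPPORT", 2, 1, 1),
        (23, "CA38385", "SUPPORT", 1, 0, 0),
        (25, "CA38385", "SUPPORT", 0, 0, 1),
        (1, "TE67401", "Repair Kit", 1, 1, 2),
        (21, "TE67401", "REPAIR KIT", 1, 3, 0),
        (22, "TE67401", "REPAIR KIT", 0, 1, 0),
    ],
)

# Keep every branch's row for a part if branch 23 holds stock of it with no sales.
rows = conn.execute(
    """
    SELECT o.pi_Branch, o.pi_Part_No, o.pi_Bin_Qty
    FROM Part_Information AS o
    WHERE EXISTS (
        SELECT 1 FROM Part_Information AS mine
        WHERE mine.pi_Part_No = o.pi_Part_No
          AND mine.pi_Branch = 23
          AND mine.pi_Bin_Qty > 0
          AND mine.pi_current_12_mo_sales <= 0
          AND mine.pi_previous_12_mo_sales <= 0
    )
    ORDER BY o.pi_Part_No, o.pi_Branch
    """
).fetchall()

for row in rows:
    print(row)
```

With these rows only CA38385 qualifies, so all three of its branch rows come back and TE67401, which branch 23 does not stock, is left out.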