I successfully inserted data from another table, but it landed in new rows rather than the rows I wanted. How do I specify a condition so the data goes into particular existing rows? Most of the query's conditions come from the other (old) table.
INSERT INTO machine_types (machine_id)
SELECT DISTINCT machine_id
FROM machine_events
WHERE platform_id = '70ZOvy' AND cpu = 0.25 AND memory = 0.2498
The machine_types table has a machine_type column, and I want the data inserted where machine_type = 1.
machine_types table:
machine_id // has null value now
cpu
memory
platform
machine_type
machine_events:
machine_id
cpu
memory
platform
The question is: how do I write the SQL query so the data is inserted into the records that have machine_type = 1?
Note: the machine_types table has only 10 records, and 126 new records are going to be inserted. All of them should get machine_type = 1.
Machine_types table:
machine_type platform cpu memory machine_id
1 HofLG 0.25 0.2498 NULL
2 HofLG 0.5 0.03085 NULL
3 HofLG 0.5 0.06158 NULL
4 HofLG 0.5 0.1241 NULL
Machine_events table:
time machine_id platform_id cpu memory machine_type
0 5 HofLG 0.25 0.2493 1
0 6 HofLG 0.5 0.03085 1
0 7 HofLG 0.5 0.2493 2
0 10 HofLG 0.5 0.2493 2
The NULL machine_id values in the first table should be filled in with machine_id from the second table, based on machine_type.
machine_types after machine_id has been updated:
machine_type platform cpu memory machine_id
1 HofLG 0.25 0.2498 5
1 HofLG 0.5 0.03085 6
2 HofLG 0.5 0.03085 NULL
3 HofLG 0.5 0.06158 NULL
4 HofLG 0.5 0.1241 NULL
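Reading the expected result literally, one way to get there is a plain INSERT ... SELECT that hard-codes machine_type = 1 and copies the remaining attributes from machine_events. This is only a minimal sketch, reusing the filter literals from the query above; the leftover placeholder row with a NULL machine_id would still need to be updated or removed separately:

-- Hedged sketch: one machine_types row per distinct matching machine,
-- with machine_type fixed at 1 for every inserted row.
INSERT INTO machine_types (machine_type, platform, cpu, memory, machine_id)
SELECT DISTINCT 1, platform_id, cpu, memory, machine_id
FROM machine_events
WHERE platform_id = '70ZOvy'
  AND cpu = 0.25
  AND memory = 0.2498;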
I have the input table below. I need to create a query that generates the output table shown below it.
Time should be accumulated, and the summing should stop when a record with both time and qty defined is reached, restarting from there. Spent_Qty is the sum of all qty values from a record with both time and qty defined until the next record with non-zero time and qty.
Example:
The first 3 rows have no meaning. The 4th row has Qty defined, but the next row has time defined, so that qty belongs to the previous time.
The 5th row has 3.75 (decimal time) and no Qty, so it must be summed with the next record that has qty defined. The 6th row has both defined, so the accumulated time is now 7.25 hours. The 6th row has a qty of 2; the 7th row has 0 time and 0 qty, and the 8th row has no time but shows a qty of 0.5, which should be summed with the 6th row's qty, giving 2.5. The 9th row has hours defined, so the qty accumulation stops and restarts from there.
The result:
7.25 hrs took 2.5 spent qty
INPUT:
Time   Qty
0      0
0      0
0      0
0      1
3.75   0
3.5    2
0      0
0      0.5
2.5    0
2.5    0.5
0      0.5
0      0
3      0
3.5    0.4
0      0.5
1      0
3      2
0      0
0      2
0      1
4      1
1.75   0
1.75   0
0      1
0.75   1
Output:
TOT_TIME   Spent QTY
7.25       2.5
5          1
6.5        0.9
4          5
4          1
3.5        1
0.75       1
I have used LEAD, LAG and other analytic functions. I need to write a SELECT statement to get this result along with a few other columns, but it's not working out.
You can use:
SELECT *
FROM   table_name
MATCH_RECOGNIZE(
  ORDER BY rn
  MEASURES
    SUM(time) AS total_time,
    SUM(qty)  AS total_qty
  -- A match is either the leading run of rows with no time, or a shortest
  -- possible run of rows ending in a row with both time and qty defined,
  -- followed by any trailing rows with no time.
  PATTERN ( ^ no_time* | any_row*? time_and_qty no_time* )
  DEFINE
    time_and_qty AS time > 0 AND qty > 0,
    no_time      AS time = 0
)
Which, for the sample data, outputs:
TOTAL_TIME   TOTAL_QTY
0            1
7.25         2.5
5            1
6.5          .9
4            5
4            1
4.25         2
Note: The final 4 rows are aggregated together due to the rule "The time should be accumulated and the summing up should stop when a record with both time and qty is defined and should restart from there." It is not until you get to the final row that it has both time and qty.
fiddle
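A side note on the answer above, as a hedged sketch: MATCH_RECOGNIZE needs a deterministic row order, and the input table as shown only has Time and Qty, so the rn column in ORDER BY rn is assumed to already exist. If the real table instead has some other ordering column (a hypothetical seq here), rn can be derived first:

WITH ordered AS (
  SELECT t.time,
         t.qty,
         ROW_NUMBER() OVER (ORDER BY t.seq) AS rn  -- seq is a hypothetical ordering column
  FROM   table_name t
)
SELECT total_time, total_qty
FROM   ordered
MATCH_RECOGNIZE(
  ORDER BY rn
  MEASURES
    SUM(time) AS total_time,
    SUM(qty)  AS total_qty
  PATTERN ( ^ no_time* | any_row*? time_and_qty no_time* )
  DEFINE
    time_and_qty AS time > 0 AND qty > 0,
    no_time      AS time = 0
)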
I have a dataframe as below
id col1 col2
1 1 1
2 0 NaN
3 0 NaN
4 0 0
5 1 1
6 0 NaN
7 1 1
8 0 0
Column 2 (col2) follows a pattern: a 1 followed later by a 0. A run starts with a 1 (as in row 1) and ends with a 0 (as in row 4). There may or may not be NaN rows in between.
For example
9 1 1
10 0 0
is valid, as the 9th row has a 1 followed by a 0 in the 10th row.
As I mentioned before, the expectation is that a 1 should be followed by a 0. However, in some places a 1 occurs in between the 1 and the 0.
For example:
Row 5 has a 1; however, row 7 is also a 1. In such a case I need to change both column 1 and column 2 to 0, as shown below:
id col1 col2
1 1 1
2 0 NaN
3 0 NaN
4 0 0
5 1 1
6 0 NaN
7 0 0
8 0 0
I have an iterative solution in mind:
Iterate through each row. Whenever col2 is 1, set a flag to True.
During the iteration, if the flag is set and col2 is 1 again, update col1 and col2 to 0.
But this will be very slow, as I have over a million rows in the dataframe. Is there any way to achieve this without iterating?
Well, what do you think about this solution?
# Drop the NaN rows, then flag places where two consecutive remaining
# col2 values are both 1: their rolling sum over a window of 2 exceeds 1.
dropped_df = df.dropna()
rolling_df = dropped_df.rolling(window=2).sum()
indexes = rolling_df[rolling_df.col2 > 1].index
# Zero out the offending rows (df.loc avoids chained-assignment issues)
df.loc[indexes, ['col1', 'col2']] = 0
Also I tried your idea:
ind = 0
for i in range(len(df)):
    if (ind == 1) and (df['col2'].loc[i] == 1):
        # Second 1 before the closing 0: zero out both columns
        df.loc[i, ['col1', 'col2']] = 0
        ind = 0
    elif df['col2'].loc[i] == 1:
        ind = 1
    elif df['col2'].loc[i] == 0:
        ind = 0
And for your example, your idea works much faster than both my first solution and the solution in the other answer.
My first solution:
CPU times: user 7.81 ms, sys: 2.11 ms, total: 9.92 ms
Wall time: 11.8 ms
Loop solution:
CPU times: user 2.23 ms, sys: 1.44 ms, total: 3.67 ms
Wall time: 5.53 ms
Solution from the second answer:
CPU times: user 7.4 ms, sys: 1.22 ms, total: 8.62 ms
Wall time: 11.2 ms
Though I guess it's not a good idea to draw conclusions from a single run on small data. You could try all of these solutions yourself.
You can do something like this.
# Get previous non-null col2 value.
df.loc[~df.col2.isna(), 'pv'] = df.col2.dropna().shift()
# If current col2 and previous values are both 1, change them to 0
df.loc[(df.col2 == 1) & (df.pv == 1), ['col1','col2']] = 0
df = df.drop('pv', axis=1)
Let's assume that the table below is called Table:
---------------------------------------------
ID  Col1  Col2  Col3  Col4  ...  Total
---------------------------------------------
1   1     0     NULL  1          30.33
2   0     1     1     1          60.12
3   1     1     0     0          20.12
4   1     0     1     1          60.12
5   0     NULL  NULL  1          10.19
6   1     1     NULL  1          90.00
7   0     0     NULL  0          0.00
---------------------------------------------
I want to count the columns that contain a "true" value and display the average in the Total column. For example, if there are 10 columns and 5 of them are true, that gives 50% in Total. Assume all of the columns being counted are bit columns holding NULL, 0, or 1. How do I achieve this?
You could use:
SELECT
  ID,
  -- COALESCE maps NULL to 0, so NULL bits count as "false";
  -- the literal 10 is the number of bit columns being averaged
  100.0 * (COALESCE(Col1, 0) + COALESCE(Col2, 0) + ... + COALESCE(Col10, 0)) / 10 AS pct
FROM yourTable;
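If the goal is to store the percentage in the existing Total column rather than just select it, here is a minimal sketch under the same assumptions (written with only the four columns shown, Col1..Col4; extend the sum and the divisor to the real column count):

UPDATE yourTable
SET Total = 100.0 * (COALESCE(Col1, 0) + COALESCE(Col2, 0)
                   + COALESCE(Col3, 0) + COALESCE(Col4, 0)) / 4;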
I work for a small company and we're trying to get away from Excel workbooks for inventory control. I thought I had it figured out with help from (Nasser), but it's beyond me. Below is what I can get into a table; from there I need to get it to look like the expected output table.
My data
ID|GrpID|InOut| LoadFt | LoadCostft| LoadCost | RunFt | RunCost| AvgRunCostFt
1 1 1 4549.00 0.99 4503.51 4549.00 0 0
2 1 1 1523.22 1.29 1964.9538 6072.22 0 0
3 1 2 -2491.73 0 0 3580.49 0 0
4 1 2 -96.00 0 0 3484.49 0 0
5 1 1 8471.68 1.41 11945.0688 11956.17 0 0
6 1 2 -369.00 0 0 11468.0568 0 0
7 2 1 1030.89 5.07 5223.56 1030.89 0 0
8 2 1 314.17 5.75 1806.4775 1345.06 0 0
9 2 1 239.56 6.3 1508.24 1509.228 0 0
10 2 2 -554.46 0 0 954.768 0 0
11 2 1 826.24 5.884 4861.5961 1781.008 0 0
Expected output
ID|GrpID|InOut| LoadFt | LoadCostft| LoadCost | RunFt | RunCost| AvgRunCostFt
1 1 1 4549.00 0.99 4503.51 4549.00 4503.51 0.99
2 1 1 1523.22 1.29 1964.9538 6072.22 6468.4638 1.0653
3 1 2 -2491.73 1.0653 -2490.6647 3580.49 3977.7991 1.111
4 1 2 -96.00 1.111 -106.656 3484.49 3871.1431 1.111
5 1 1 8471.68 1.41 11945.0688 11956.17 15816.2119 1.3228
6 1 2 -369.00 1.3228 -488.1132 11468.0568 15328.0987 1.3366
7 2 1 1030.89 5.07 5223.56 1030.89 5223.56 5.067
8 2 1 314.17 5.75 1806.4775 1345.06 7030.0375 5.2266
9 2 1 239.56 6.3 1508.24 1509.228 8539.2655 5.658
10 2 2 -554.46 5.658 -3137.1346 954.768 5402.1309 5.658
11 2 1 826.24 5.884 4861.5961 1781.008 10263.727 5.7629
The first record of a group is considered the opening balance. Inventory going into the yard has an InOut of 1, and inventory going out of the yard has 2. Load footage going into the yard always has a load cost per foot, and I can calculate the running total of footage. For the first record of a group it is easy to calculate the run cost and run cost per foot; the following records are a little more difficult. When something goes out of the yard, I need to carry the average run cost per foot forward as the load cost per foot, and then recalculate the run cost and average run cost per foot. Hopefully this makes sense to somebody and we can automate some of these calculations. Thanks for any help.
Here's an Oracle example I found:
select order_id
     , volume
     , price
     , total_vol
     , total_costs
     , unit_costs
from ( select order_id
            , volume
            , price
            , volume total_vol
            , 0.0 total_costs
            , 0.0 unit_costs
            , row_number() over (order by order_id) rn
       from costs
       order by order_id
     )
model
  dimension by (order_id)
  measures (volume, price, total_vol, total_costs, unit_costs)
  rules iterate (4)
  ( -- running total of volume
    total_vol[any] = volume[cv()] + nvl(total_vol[cv()-1], 0.0)
    -- outgoing stock (negative volume) is costed at the previous average
    -- unit cost; incoming stock adds volume * price to the running cost
  , total_costs[any]
    = case sign(volume[cv()])
        when -1 then total_vol[cv()] * nvl(unit_costs[cv()-1], 0.0)
        else volume[cv()] * price[cv()] + nvl(total_costs[cv()-1], 0.0)
      end
  , unit_costs[any] = total_costs[cv()] / total_vol[cv()]
  )
order by order_id
ORDER_ID VOLUME PRICE TOTAL_VOL TOTAL_COSTS UNIT_COSTS
---------- ---------- ---------- ---------- ----------- ----------
1 1000 100 1000 100000 100
2 -500 110 500 50000 100
3 1500 80 2000 170000 85
4 -100 150 1900 161500 85
5 -600 110 1300 110500 85
6 700 105 2000 184000 92
6 rows selected.
Let me say first off three things:
This is certainly not the best way to do it. There is a rule saying that if you need a while-loop, then you are most probably doing something wrong.
I suspect there are some calculation errors in your original "Expected output"; please check them, since my calculated values differ when applying your formulas.
This question could also be seen as a gimme teh codez type of question, but since you asked a decently formed question with some follow-up research, my answer is below. (So no upvoting since this is help for a specific case)
Now onto the solution:
I attempted to use my initial hint of LAG in a single, nicely formed UPDATE statement, but since a windowed function (such as LAG) can only be used in a SELECT or ORDER BY clause, that will not work.
What the code below does, in short: for each record, it calculates the derived fields once they can be calculated (using the appropriate functions), updates the table, and then moves on to the next record.
Please see comments in the code for additional information.
TempTable is a demo table (visible in the linked SQLFiddle).
Please read this answer for information about decimal(19, 4)
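For completeness, here is a minimal setup sketch (assuming TempTable already exists with the columns shown in the question) for adding the helper column; resetting it lets the batch be re-run:

ALTER TABLE TempTable ADD DoneFlag BIT NOT NULL DEFAULT 0;

-- To re-run the calculation from scratch:
UPDATE TempTable SET DoneFlag = 0;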
-- Our state and running variables
DECLARE @curId INT = 0,
        @curGrpId INT,
        @prevId INT = 0,
        @prevGrpId INT = 0,
        @LoadCostFt DECIMAL(19, 4),
        @RunFt DECIMAL(19, 4),
        @RunCost DECIMAL(19, 4)

WHILE EXISTS (SELECT 1
              FROM TempTable
              WHERE DoneFlag = 0) -- DoneFlag is a bit column I added to the table for calculation purposes, could also be called "IsCalced"
BEGIN
    SELECT TOP 1 -- top 1 here to get the next row based on the ID column
           @prevId = @curId,
           @curId = tmp.ID,
           @curGrpId = GrpID
    FROM TempTable tmp
    WHERE tmp.DoneFlag = 0
    ORDER BY tmp.GrpID, tmp.ID -- order by to ensure that we get everything from one GrpID first

    -- Calculate the LoadCostFt.
    -- It is either predetermined (if InOut = 1) or derived from the previous record's AvgRunCostFt (if InOut = 2)
    SELECT @LoadCostFt = CASE
                             WHEN tmp.InOut = 2
                                 THEN (LAG(tmp.AvgRunCostFt, 1, 0.0) OVER (PARTITION BY GrpID ORDER BY ID))
                             ELSE tmp.LoadCostFt
                         END
    FROM TempTable tmp
    WHERE tmp.ID IN (@curId, @prevId)
      AND tmp.GrpID = @curGrpId

    -- Calculate the LoadCost
    UPDATE TempTable
    SET LoadCost = LoadFt * @LoadCostFt
    WHERE ID = @curId

    -- Calculate the current RunFt and RunCost based on the current LoadFt and LoadCost plus the previous row's RunFt and RunCost
    SELECT @RunFt = (LoadFt + (LAG(RunFt, 1, 0) OVER (PARTITION BY GrpID ORDER BY ID))),
           @RunCost = (LoadCost + (LAG(RunCost, 1, 0) OVER (PARTITION BY GrpID ORDER BY ID)))
    FROM TempTable tmp
    WHERE tmp.ID IN (@curId, @prevId)
      AND tmp.GrpID = @curGrpId

    -- Set all our values, including the AvgRunCostFt calc
    UPDATE TempTable
    SET RunFt = @RunFt,
        RunCost = @RunCost,
        LoadCostFt = @LoadCostFt,
        AvgRunCostFt = @RunCost / @RunFt,
        DoneFlag = 1
    WHERE ID = @curId
END

SELECT ID, GrpID, InOut, LoadFt, RunFt, LoadCost,
       RunCost, LoadCostFt, AvgRunCostFt
FROM TempTable
ORDER BY GrpID, ID
The output with your sample data and a SQLFiddle demonstrating how it all works:
ID GrpID InOut LoadFt RunFt LoadCost RunCost LoadCostFt AvgRunCostFt
1 1 1 4549 4549 4503.51 4503.51 0.99 0.99
2 1 1 1523.22 6072.22 1964.9538 6468.4638 1.29 1.0653
3 1 2 -2491.73 3580.49 -2654.44 3814.0238 1.0653 1.0652
4 1 2 -96 3484.49 -102.2592 3711.7646 1.0652 1.0652
5 1 1 8471.68 11956.17 11945.0688 15656.8334 1.41 1.3095
6 1 2 -369 11587.17 -483.2055 15173.6279 1.3095 1.3095
7 2 1 1030.89 1030.89 5226.6123 5226.6123 5.07 5.07
8 2 1 314.17 1345.06 1806.4775 7033.0898 5.75 5.2288
9 2 1 239.56 1584.62 1509.228 8542.3178 6.3 5.3908
10 2 2 -554.46 1030.16 -2988.983 5553.3348 5.3908 5.3907
11 2 1 826.24 1856.4 4861.5962 10414.931 5.884 5.6103
If you are unclear about parts of the code, I can update with additional explanations.
I have a table that looks like this at the moment:
Day Limit Price
1 52 0.3
1 4 70
1 44 200
1 9 0.01
1 0 0.03
1 0 0.03
2 52 0.4
2 10 70
2 44 200
2 5 0.01
2 0 0.55
2 2 50
Is there a way to use SQL to pivot the result into a table with different price categories, selecting the maximum Limit value within each category?
Day 0-10 10-100 100+
1 52 4 44
2 52 10 44
You can use CASE and MAX:
SELECT Day,
MAX(CASE WHEN Price BETWEEN 0 AND 10 THEN Limit ELSE 0 END) as ZeroToTen,
MAX(CASE WHEN Price BETWEEN 10 AND 100 THEN Limit ELSE 0 END) as TenToHundred,
MAX(CASE WHEN Price > 100 THEN Limit ELSE 0 END) as HundredPlus
FROM YourTable
GROUP BY Day
Here is the Fiddle.
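One thing to watch: BETWEEN is inclusive on both ends, so a Price of exactly 10 satisfies both of the first two buckets above. A hedged variant using half-open ranges avoids the overlap:

SELECT Day,
       MAX(CASE WHEN Price >= 0 AND Price < 10 THEN Limit ELSE 0 END) as ZeroToTen,
       MAX(CASE WHEN Price >= 10 AND Price < 100 THEN Limit ELSE 0 END) as TenToHundred,
       MAX(CASE WHEN Price >= 100 THEN Limit ELSE 0 END) as HundredPlus
FROM YourTable
GROUP BY Day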
BTW -- if you're using MySQL, add ticks around LIMIT since it's a keyword.
Good luck.