SQL: Copying records within same table

SQL Server 2008 Table: table1
ID DESC TYP SUBSET VAL1 VAL2 VAL3 VAL4 PReview Country
1 DESC1 1 1 1.0 1.1 1.2 1.2 0 1
2 DESC1 1 1 2.0 1.1 1.2 1.2 0 1
3 DESC1 1 1 1.0 1.1 1.2 1.2 0 1
4 DESC2 2 1 3.0 2.1 1.7 1.8 0 1
5 DESC2 2 1 4.0 3.1 1.7 1.9 0 1
6 DESC2 2 1 5.0 6.1 1.5 1.6 0 1
13 DESC1 1 1 1.0 1.1 1.2 1.2 1 1
14 DESC1 1 1 2.0 1.1 1.2 1.2 1 1
15 DESC1 1 1 1.0 1.1 1.2 1.2 1 1
16 DESC2 2 1 1.0 6.1 1.7 1.2 1 1
17 DESC2 2 1 2.0 4.1 6.2 8.2 1 1
18 DESC2 2 1 1.0 8.1 7.2 1.9 1 1
I need to copy the records that have PReview = 1 into the records that have PReview = 0. There is no way to uniquely match each record; they should just be copied in order:
Record 13 should be copied to Record 1
Record 14 should be copied to Record 2
Record 15 should be copied to Record 3
Thanks.

The basic idea is to "enumerate" (i.e. attach indexes to) both source and destination rows and then assign the source row with index 1 to the destination row with index 1, source row with index 2 to the destination row with index 2 etc:
UPDATE TABLE1
SET
[DESC] = SOURCE.[DESC],
TYP = SOURCE.TYP,
SUBSET = SOURCE.SUBSET,
VAL1 = SOURCE.VAL1,
VAL2 = SOURCE.VAL2,
VAL3 = SOURCE.VAL3,
VAL4 = SOURCE.VAL4,
PREVIEW = SOURCE.PREVIEW,
COUNTRY = SOURCE.COUNTRY
FROM (
SELECT DEST_ID, SRC.*
FROM
(SELECT ID DEST_ID, RANK() OVER (ORDER BY ID) R FROM TABLE1 WHERE PREVIEW = 0) DEST
JOIN (SELECT *, RANK() OVER (ORDER BY ID) R FROM TABLE1 WHERE PREVIEW = 1) SRC
ON SRC.R = DEST.R
) SOURCE
WHERE TABLE1.ID = SOURCE.DEST_ID
In plain English:
Attach indexes to rows where PREVIEW = 0, in order of ID (RANK() OVER (ORDER BY ID)).
Do the same where PREVIEW = 1.
Match source to destination indexes (JOIN ... ON SRC.R = DEST.R).
Update the table based on that matching.
Be careful when the number of destination rows is larger than the number of source rows: the first execution will not update all the destination rows, and because the rows it did update now have PREVIEW = 1, a second execution may copy the same source data to a different destination row.
In effect, you'd be copying the same source row to multiple destination rows.
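The pairing step described above can be sketched outside SQL. This is only an illustration in Python, assuming rows are plain dicts with ID and PREVIEW keys; pair_by_position is a hypothetical helper name, not part of the answer's query.

```python
def pair_by_position(rows):
    """Pair preview=1 rows with preview=0 rows by their position when sorted by ID."""
    dest = sorted(r["ID"] for r in rows if r["PREVIEW"] == 0)
    src = sorted(r["ID"] for r in rows if r["PREVIEW"] == 1)
    # zip stops at the shorter list, so surplus rows on either side stay unpaired
    return list(zip(src, dest))


# Six destination rows but only three source rows (a deliberately uneven case)
rows = [{"ID": i, "PREVIEW": 0} for i in (1, 2, 3, 4, 5, 6)]
rows += [{"ID": i, "PREVIEW": 1} for i in (13, 14, 15)]

pairs = pair_by_position(rows)
print(pairs)  # (source ID, destination ID) pairs
```

Destinations 4, 5 and 6 get no partner here, which is the situation where re-running the UPDATE would start reusing source data.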

If you need the records to "copy over" the preview = 0 records, couldn't you just:
Delete the preview = 0 records
Replicate each preview = 1 record so you have two copies of each.
This sounds like what you're asking for.

Related

How to merge two rows if same values in sql server

I have the Following Output:
Sno | Value Stream | Duration | Inspection
1 | Test1 | 3 | 1
2 | ON | 14 | 0
3 | Start | 5 | 0
4 | Test1 | 5 | 1
5 | OFF | 0 | 1
6 | Start | 0 | 1
7 | Test2 | 0 | 1
8 | ON | 3 | 1
9 | START | 0 | 1
10 | Test2 | 2 | 2
I want to merge rows that have the same value when the rows between them are ON followed by Start. For example, S.no 4 will merge into S.no 1:
1 | Test1 | 8 | 2 |
If the combination is different, the rows must not be merged: only ON/Start counts. If the rows in between are OFF/Start, the merge is not allowed. E.g. S.no 5 and 6 are OFF/Start, so S.no 4 and 7 must not be merged.
I think you are talking about summarization not merging:
select [Value Stream],
min(Sno) as First_Sno,
sum(Duration) as total_Duration,
sum(Inspection) as Inspection
from yourtable
group by [Value Stream]
Will give you the result
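To see what that GROUP BY produces, here is a small sketch using Python's built-in sqlite3 module with only the first four rows from the question. The table name yourtable comes from the answer; using SQLite rather than SQL Server is an assumption made so the example is self-contained. Note that it sums every row with the same value and does not apply the ON/Start condition from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE yourtable '
    '(Sno INTEGER, "Value Stream" TEXT, Duration INTEGER, Inspection INTEGER)'
)
# First four rows of the question's data
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [(1, "Test1", 3, 1), (2, "ON", 14, 0), (3, "Start", 5, 0), (4, "Test1", 5, 1)],
)

# Same aggregation as the answer, ordered by the first Sno of each group
summary = conn.execute(
    'SELECT "Value Stream", MIN(Sno), SUM(Duration), SUM(Inspection) '
    'FROM yourtable GROUP BY "Value Stream" ORDER BY MIN(Sno)'
).fetchall()
print(summary)
```

The Test1 group comes out as first Sno 1, total duration 8 and inspection 2, matching the merged row the question asks for.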

How to show percentage of individuals on y axis instead of count in histogram by groups?

I have a data frame like this:
> head(a)
FID IID FLASER PLASER DIABDUR HBA1C ESRD pheno
1 fam1000-03 G1000 1 1 38 10.2 1 control
2 fam1001-03 G1001 1 1 15 7.3 1 control
3 fam1003-03 G1003 1 2 17 7.0 1 case
4 fam1005-03 G1005 1 1 36 7.7 1 control
5 fam1009-03 G1009 1 1 23 7.6 1 control
6 fam1052-03 G1052 1 1 32 7.3 1 control
My df has 1698 observations, of which 828 have "case" in the pheno column and 836 have "control".
I make a histogram via:
library(ggplot2)
ggplot(a, aes(x=HBA1C, fill=pheno)) +
geom_histogram(binwidth=.5, position="dodge")
I would like to have the y-axis show the percentage of individuals which have either "case" or "control" in pheno instead of the count. So the percentage would be calculated for each group ("case" or "control") on the y axis. I also have NAs in my plot and it would be good to exclude those from the plot.
I guess I can remove NAs from pheno with this:
ggplot(data=subset(a, !is.na(pheno)), aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5, position="dodge")
This can be achieved like so:
Note: Concerning the NAs you were right. Simply subset for non-NA values or use dplyr::filter or ...
a <- read.table(text = "id FID IID FLASER PLASER DIABDUR HBA1C ESRD pheno
1 fam1000-03 G1000 1 1 38 10.2 1 control
2 fam1001-03 G1001 1 1 15 7.3 1 control
3 fam1003-03 G1003 1 2 17 7.0 1 case
4 fam1005-03 G1005 1 1 36 7.7 1 control
5 fam1009-03 G1009 1 1 23 7.6 1 control
6 fam1052-03 G1052 1 1 32 7.3 1 control
7 fam1052-03 G1052 1 1 32 7.3 1 NA", header = TRUE)
library(ggplot2)
ggplot(a, aes(x=HBA1C, fill=pheno)) +
geom_histogram(aes(y = ..count.. / tapply(..count.., ..group.., sum)[..group..]),
position='dodge', binwidth=0.5) +
scale_y_continuous(labels = scales::percent)
Created on 2020-05-23 by the reprex package (v0.3.0)
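The arithmetic behind that y aesthetic (each bin's count divided by the total count of its group) can be sketched in plain Python. This is an illustration only: bin_percentages is a hypothetical helper, and its bins start at multiples of binwidth, which is a simplifying assumption and may not match how geom_histogram places its bin edges.

```python
import math
from collections import Counter


def bin_percentages(values, groups, binwidth=0.5):
    """Return {(group, bin_start): share of that group's rows that fall in the bin}."""
    # Rows without a group (NA in the R data) are skipped
    pairs = [
        (g, math.floor(v / binwidth) * binwidth)
        for v, g in zip(values, groups)
        if g is not None
    ]
    per_bin = Counter(pairs)
    per_group = Counter(g for g, _ in pairs)
    return {key: n / per_group[key[0]] for key, n in per_bin.items()}


# HBA1C and pheno columns from the answer's seven sample rows
hba1c = [10.2, 7.3, 7.0, 7.7, 7.6, 7.3, 7.3]
pheno = ["control", "control", "case", "control", "control", "control", None]

shares = bin_percentages(hba1c, pheno)
print(shares)
```

Within each group the shares add up to 1, so the tallest bar of a small group can be as high as that of a large group.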

How to get the average of every three records in a column starting from first record in MS Access/SQL?

I am stuck on getting the average of every three/four/five records in a column, starting from the first record. Say I have a table with this data:
ID_Col1 | Value_Col2
1 | 1.5
2 | 2
3 | 2.5
4 | 3
5 | 3.5
6 | 4
7 | 4.5
8 | 5
9 | 5.5
10 | 6
If we say average of every three records then the Output required is
every_three_records_average_Column
none
none
average(1.5, 2, 2.5)
average(2, 2.5, 3)
average(2.5, 3, 3.5)
average(3, 3.5, 4)
average(3.5, 4, 4.5)
average(4, 4.5, 5)
average(4.5, 5, 5.5)
average(5, 5.5, 6)
Does anyone have any idea to get this kind of output in SQL query.
Any help would be much appreciated.
Thanks,
Honey
SQL Fiddle Demo
SELECT
T1.[ID_Col1], T2.[ID_Col1], T3.[ID_Col1],
T1.[Value_Col2] , T2.[Value_Col2] , T3.[Value_Col2],
(T1.[Value_Col2] + T2.[Value_Col2] + T3.[Value_Col2])/3
FROM Source T1
JOIN Source T2
ON T1.[ID_Col1] = T2.[ID_Col1] - 1
JOIN Source T3
ON T2.[ID_Col1] = T3.[ID_Col1] - 1
Consider a correlated aggregate subquery filtering on last three IDs:
SELECT myTable.ID_Col1, myTable.Value_Col2,
(SELECT Avg(sub.Value_Col2)
FROM myTable As sub
WHERE sub.ID_Col1 >= myTable.ID_Col1 - 2
AND sub.ID_Col1 <= myTable.ID_Col1
AND myTable.ID_Col1 >= 3) As LastThreeAvg
FROM myTable;
Output
ID_Col1 Value_Col2 LastThreeAvg
1 1.5
2 2
3 2.5 2
4 3 2.5
5 3.5 3
6 4 3.5
7 4.5 4
8 5 4.5
9 5.5 5
10 6 5.5
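As a cross-check of that output, the same "current row plus the two before it" window can be sketched in Python. trailing_average is a hypothetical helper written for this illustration; it works on list positions, so it assumes the rows are already in ID order.

```python
def trailing_average(values, window=3):
    """Average of each value and the window-1 values before it; None until enough rows exist."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough preceding rows yet
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# Value_Col2 from the question
values = [1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6]
averages = trailing_average(values)
print(averages)
```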
However, if ID_Col1 is an AutoNumber field, there is no guarantee values will remain in numeric ordinal count. Therefore, a calculated row number, RowNo, is needed in both the derived table and aggregate subquery. In MS Access SQL without CTEs, the query becomes a bit verbose:
SELECT dT.ID_Col1, dT.Value_Col2,
(SELECT Avg(sub.Value_Col2)
FROM
(SELECT ID_Col1, Value_Col2,
(SELECT Count(*)
FROM myTable As sub
WHERE sub.ID_Col1 <= myTable.ID_Col1) As RowNo
FROM myTable) As sub
WHERE sub.RowNo >= dT.RowNo - 2
AND sub.RowNo <= dT.RowNo
AND dT.RowNo >= 3) As LastThreeAvg
FROM
(SELECT ID_Col1, Value_Col2,
(SELECT Count(*)
FROM myTable As sub
WHERE sub.ID_Col1 <= myTable.ID_Col1) As RowNo
FROM myTable) As dT
SELECT
(
SELECT Avg(A.Value_Col2) As Result
FROM myTable As A
WHERE A.ID_Col1 >= C.ID_Col1 and A.ID_Col1 < C.ID_Col1 + [MyParam]
)
FROM myTable As C
WHERE C.ID_Col1 + [MyParam] -1 <= (SELECT MAX (D.ID_Col1) From myTable As D)
Explanation:
Outer query: go through each record C in myTable, up to the last record that still has MyParam (3, 4, or 5 in the question) records from itself onward.
Represented in the query by the WHERE clause: FROM myTable As C WHERE C.ID_Col1 + [MyParam] -1 <= (SELECT MAX (D.ID_Col1) From myTable As D)
Inner query: calculate the average Value_Col2 of MyParam records, starting at the current record.
Represented in the SELECT statement by SELECT Avg(A.Value_Col2), and in the WHERE clause by WHERE A.ID_Col1 >= C.ID_Col1 (C.ID_Col1 being the current ID) and, to take no more than [MyParam] records, A.ID_Col1 < C.ID_Col1 + [MyParam].
Test
MyTable:
ID_Col1 Value_Col2
1 1.5
2 2
3 2.5
4 3
5 3.5
6 4
7 4.5
8 5
9 5.5
10 6
11 6.5
12 7
13 7.5
14 8
15 8.5
16 9
17 9.5
Result for MyParam = 3
Result
2
2.5
3
3.5
4
4.5
5
5.5
6
6.5
7
7.5
8
8.5
9
Result for MyParam = 5
Result
2.5
3
3.5
4
4.5
5
5.5
6
6.5
7
7.5
8
8.5
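The two result lists can be reproduced with a short Python sketch of the same forward-looking window. forward_averages is a hypothetical helper; like the query, it assumes consecutive IDs with no gaps, so list position stands in for ID_Col1.

```python
def forward_averages(values, my_param):
    """Average of each run of my_param consecutive values, for every row that has a full run."""
    return [
        sum(values[i : i + my_param]) / my_param
        for i in range(len(values) - my_param + 1)
    ]


# Value_Col2 for IDs 1..17 from the test table: 1.5, 2, ..., 9.5
values = [1.5 + 0.5 * k for k in range(17)]

result_3 = forward_averages(values, 3)
result_5 = forward_averages(values, 5)
print(result_3)
print(result_5)
```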

SQL - Copying rows of the same table but with different foreign key ID

I have three tables which are interlinked:
Table #1 : MAIN
ID_MAIN NAME_MAIN ID_VER
1 XYZ 1.0
2 PQR 1.0
3 ABC 1.0
SUBMAIN: Columns ID_SUBMAIN(identity), NAME_SUBMAIN , ID_MAIN(Foreign Key with MAIN) , ID_VER(Foreign Key with MAIN)
ID_SUBMAIN NAME_SUBMAIN ID_MAIN ID_VER
1 X 1 1.0
2 Y 1 1.0
3 Z 1 1.0
4 A 2 1.0
Table #3 LAST:
Columns ID_LAST(identity) , ID_SUBMAIN(Foreign Key with SUBMAIN)
ID_LAST ID_SUBMAIN
1 1
2 1
3 1
4 2
5 4
Now whenever I create a new MAIN row with ID_MAIN as 1 and ID_VER as 2.0, I want to copy all of the associated records of SUBMAIN with new ID_MAIN and LAST with the new ID_SUBMAIN.
New record in MAIN
ID_MAIN NAME_MAIN ID-VER
1 XYZ 2.0
I am using Insert query to copy all of SUBMAIN records for ID_MAIN = 1
My query is like this:
INSERT INTO SUBMAIN(NAME_SUBMAIN, ID_MAIN, ID_VER)
SELECT
NAME_SUBMAIN, ID_MAIN, '2.0'
FROM
SUBMAIN
WHERE
ID_MAIN = 1
SO new records of SUBMAIN will be:
ID_SUBMAIN NAME_SUBMAIN ID_MAIN ID_VER
5 X 1 2.0
6 Y 1 2.0
7 Z 1 2.0
Now I want to copy all of the records of LAST table where ID_SUBMAIN was 1,2 and 3. Replace ID_SUBMAIN with new ID_SUBMAIN 5,6 and 7.
New records in LAST should look like this:
ID_LAST ID_SUBMAIN
6 5
7 5
8 5
9 6
I am stuck here as I am not able to figure out how can I achieve that?
The SQL script you are looking for looks like this:
INSERT INTO LAST(ID_SUBMAIN)
SELECT SUB2.ID_SUBMAIN
FROM SUBMAIN AS SUB1
INNER JOIN LAST ON LAST.ID_SUBMAIN = SUB1.ID_SUBMAIN
INNER JOIN SUBMAIN AS SUB2 ON SUB2.NAME_SUBMAIN = SUB1.NAME_SUBMAIN
AND SUB2.ID_VER = '2.0'
This way, the first join gives you the submains that are already referenced in the "LAST" table, and the next inner join gives you the new ones with the same name and the new version.
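The two-step copy can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for the asker's database, AUTOINCREMENT stands in for the identity columns, and the extra ID_MAIN and ID_VER = '1.0' conditions are additions meant to keep the name match from crossing into other MAIN rows or versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUBMAIN (ID_SUBMAIN INTEGER PRIMARY KEY AUTOINCREMENT,
                      NAME_SUBMAIN TEXT, ID_MAIN INTEGER, ID_VER TEXT);
CREATE TABLE "LAST" (ID_LAST INTEGER PRIMARY KEY AUTOINCREMENT, ID_SUBMAIN INTEGER);
INSERT INTO SUBMAIN (NAME_SUBMAIN, ID_MAIN, ID_VER)
VALUES ('X', 1, '1.0'), ('Y', 1, '1.0'), ('Z', 1, '1.0'), ('A', 2, '1.0');
INSERT INTO "LAST" (ID_SUBMAIN) VALUES (1), (1), (1), (2), (4);
""")

# Step 1: copy the version 1.0 SUBMAIN rows of ID_MAIN = 1 as version 2.0
conn.execute("""
INSERT INTO SUBMAIN (NAME_SUBMAIN, ID_MAIN, ID_VER)
SELECT NAME_SUBMAIN, ID_MAIN, '2.0' FROM SUBMAIN
WHERE ID_MAIN = 1 AND ID_VER = '1.0' ORDER BY ID_SUBMAIN
""")

# Step 2: for each LAST row pointing at an old SUBMAIN row,
# insert a LAST row pointing at the matching new SUBMAIN row
conn.execute("""
INSERT INTO "LAST" (ID_SUBMAIN)
SELECT SUB2.ID_SUBMAIN
FROM SUBMAIN AS SUB1
JOIN "LAST" AS L ON L.ID_SUBMAIN = SUB1.ID_SUBMAIN
JOIN SUBMAIN AS SUB2 ON SUB2.NAME_SUBMAIN = SUB1.NAME_SUBMAIN
                    AND SUB2.ID_MAIN = SUB1.ID_MAIN
                    AND SUB2.ID_VER = '2.0'
WHERE SUB1.ID_VER = '1.0'
ORDER BY L.ID_LAST
""")

new_rows = conn.execute(
    'SELECT ID_LAST, ID_SUBMAIN FROM "LAST" WHERE ID_LAST > 5 ORDER BY ID_LAST'
).fetchall()
print(new_rows)
```

Matching old and new rows by name only works if NAME_SUBMAIN is unique within one MAIN row and version; if it is not, an explicit old-ID to new-ID mapping would be needed instead.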

Calculating Run Cost for lengths of Pipe & Pile

I work for a small company and we're trying to get away from Excel workbooks for inventory control. I thought I had it figured out with help from (Nasser), but it's beyond me. This is what I can get into a table; from there I need to get it to look like the expected output below.
My data
ID|GrpID|InOut| LoadFt | LoadCostft| LoadCost | RunFt | RunCost| AvgRunCostFt
1 1 1 4549.00 0.99 4503.51 4549.00 0 0
2 1 1 1523.22 1.29 1964.9538 6072.22 0 0
3 1 2 -2491.73 0 0 3580.49 0 0
4 1 2 -96.00 0 0 3484.49 0 0
5 1 1 8471.68 1.41 11945.0688 11956.17 0 0
6 1 2 -369.00 0 0 11468.0568 0 0
7 2 1 1030.89 5.07 5223.56 1030.89 0 0
8 2 1 314.17 5.75 1806.4775 1345.06 0 0
9 2 1 239.56 6.3 1508.24 1509.228 0 0
10 2 2 -554.46 0 0 954.768 0 0
11 2 1 826.24 5.884 4861.5961 1781.008 0 0
Expected output
ID|GrpID|InOut| LoadFt | LoadCostft| LoadCost | RunFt | RunCost| AvgRunCostFt
1 1 1 4549.00 0.99 4503.51 4549.00 4503.51 0.99
2 1 1 1523.22 1.29 1964.9538 6072.22 6468.4638 1.0653
3 1 2 -2491.73 1.0653 -2490.6647 3580.49 3977.7991 1.111
4 1 2 -96.00 1.111 -106.656 3484.49 3871.1431 1.111
5 1 1 8471.68 1.41 11945.0688 11956.17 15816.2119 1.3228
6 1 2 -369.00 1.3228 -488.1132 11468.0568 15328.0987 1.3366
7 2 1 1030.89 5.07 5223.56 1030.89 5223.56 5.067
8 2 1 314.17 5.75 1806.4775 1345.06 7030.0375 5.2266
9 2 1 239.56 6.3 1508.24 1509.228 8539.2655 5.658
10 2 2 -554.46 5.658 -3137.1346 954.768 5402.1309 5.658
11 2 1 826.24 5.884 4861.5961 1781.008 10263.727 5.7629
The first record of a group is considered the opening balance. Inventory going into the yard has an InOut value of 1, and inventory going out of the yard has 2. Load footage going into the yard always has a load cost per foot, and I can calculate the running total of footage. For the first record of a group, the run cost and run cost per foot are easy to calculate. The next record is more difficult. When something goes out of the yard, I need to carry the average run cost per foot forward as that record's load cost per foot, and then calculate the run cost and average run cost per foot again. Hopefully this makes sense to somebody and we can automate some of these calculations. Thanks for any help.
Here's an Oracle example I found;
select order_id
, volume
, price
, total_vol
, total_costs
, unit_costs
from ( select order_id
, volume
, price
, volume total_vol
, 0.0 total_costs
, 0.0 unit_costs
, row_number() over (order by order_id) rn
from costs
order by order_id
)
model
dimension by (order_id)
measures (volume, price, total_vol, total_costs, unit_costs)
rules iterate (4)
( total_vol[any] = volume[cv()] + nvl(total_vol[cv()-1],0.0)
, total_costs[any]
= case SIGN(volume[cv()])
when -1 then total_vol[cv()] * nvl(unit_costs[cv()-1],0.0)
else volume[cv()] * price[cv()] + nvl(total_costs[cv()-1],0.0)
end
, unit_costs[any] = total_costs[cv()] / total_vol[cv()]
)
order by order_id
/
ORDER_ID VOLUME PRICE TOTAL_VOL TOTAL_COSTS UNIT_COSTS
---------- ---------- ---------- ---------- ----------- ----------
1 1000 100 1000 100000 100
2 -500 110 500 50000 100
3 1500 80 2000 170000 85
4 -100 150 1900 161500 85
5 -600 110 1300 110500 85
6 700 105 2000 184000 92
6 rows selected.
Let me say first off three things:
This is certainly not the best way to do it. There is a rule saying that if you need a while-loop, then you are most probably doing something wrong.
I suspect there are some calculation errors in your original "Expected output"; please check the calculations, since my calculated values differ when I follow your formulas.
This question could also be seen as a gimme teh codez type of question, but since you asked a decently formed question with some follow-up research, my answer is below. (So no upvoting since this is help for a specific case)
Now onto the solution:
I attempted to use my initial hint of the LAG statement in a nicely formed single update statement, but since you can only use a windowed function (aka LAG) inside a select or order by clause, that will not work.
What the code below does in short:
It calculates the various calculated fields for each record when they can be calculated and with the appropriate functions, updates the table and then moves onto the next record.
Please see comments in the code for additional information.
TempTable is a demo table (visible in the linked SQLFiddle).
Please read this answer for information about decimal(19, 4)
-- Our state and running variables
DECLARE @curId INT = 0,
@curGrpId INT,
@prevId INT = 0,
@prevGrpId INT = 0,
@LoadCostFt DECIMAL(19, 4),
@RunFt DECIMAL(19, 4),
@RunCost DECIMAL(19, 4)
WHILE EXISTS (SELECT 1
FROM TempTable
WHERE DoneFlag = 0) -- DoneFlag is a bit column I added to the table for calculation purposes, could also be called "IsCalced"
BEGIN
SELECT top 1 -- top 1 here to get the next row based on the ID column
@prevId = @curId,
@curId = tmp.ID,
@curGrpId = Grpid
FROM TempTable tmp
WHERE tmp.DoneFlag = 0
ORDER BY tmp.GrpID, tmp.ID -- order by to ensure that we get everything from one GrpID first
-- Calculate the LoadCostFt.
-- It is either predetermined (if InOut = 1) or derived from the previous record's AvgRunCostFt (if InOut = 2)
SELECT @LoadCostFt = CASE
WHEN tmp.INOUT = 2
THEN (lag(tmp.AvgRunCostFt, 1, 0.0) OVER (partition BY GrpId ORDER BY ID))
ELSE tmp.LoadCostFt
END
FROM TempTable tmp
WHERE tmp.ID IN (@curId, @prevId)
AND tmp.GrpID = @curGrpId
-- Calculate the LoadCost
UPDATE TempTable
SET LoadCost = LoadFt * @LoadCostFt
WHERE Id = @curId
-- Calculate the current RunFt and RunCost based on the current LoadFt and LoadCost plus the previous row's RunFt and RunCost
SELECT @RunFt = (LoadFt + (lag(RunFt, 1, 0) OVER (partition BY GrpId ORDER BY ID))),
@RunCost = (LoadCost + (lag(RunCost, 1, 0) OVER (partition BY GrpId ORDER BY ID)))
FROM TempTable tmp
WHERE tmp.ID IN (@curId, @prevId)
AND tmp.GrpID = @curGrpId
-- Set all our values, including the AvgRunCostFt calc
UPDATE TempTable
SET RunFt = @RunFt,
RunCost = @RunCost,
LoadCostFt = @LoadCostFt,
AvgRunCostFt = @RunCost / @RunFt,
doneflag = 1
WHERE ID = @curId
END
SELECT ID, GrpID, InOut, LoadFt, RunFt, LoadCost,
RunCost, LoadCostFt, AvgRunCostFt
FROM TempTable
ORDER BY GrpID, Id
The output with your sample data and a SQLFiddle demonstrating how it all works:
ID GrpID InOut LoadFt RunFt LoadCost RunCost LoadCostFt AvgRunCostFt
1 1 1 4549 4549 4503.51 4503.51 0.99 0.99
2 1 1 1523.22 6072.22 1964.9538 6468.4638 1.29 1.0653
3 1 2 -2491.73 3580.49 -2654.44 3814.0238 1.0653 1.0652
4 1 2 -96 3484.49 -102.2592 3711.7646 1.0652 1.0652
5 1 1 8471.68 11956.17 11945.0688 15656.8334 1.41 1.3095
6 1 2 -369 11587.17 -483.2055 15173.6279 1.3095 1.3095
7 2 1 1030.89 1030.89 5226.6123 5226.6123 5.07 5.07
8 2 1 314.17 1345.06 1806.4775 7033.0898 5.75 5.2288
9 2 1 239.56 1584.62 1509.228 8542.3178 6.3 5.3908
10 2 2 -554.46 1030.16 -2988.983 5553.3348 5.3908 5.3907
11 2 1 826.24 1856.4 4861.5962 10414.931 5.884 5.6103
If you are unclear about parts of the code, I can update with additional explanations.
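For anyone who wants to check the numbers outside the database, the row-by-row logic can be sketched in Python. This is an illustration only: running_costs is a hypothetical helper, it uses floats rather than DECIMAL(19, 4), and the three sample rows are made-up round numbers, not the question's data.

```python
def running_costs(rows):
    """rows: dicts with GrpID, InOut, LoadFt, LoadCostFt, already ordered by GrpID then ID."""
    out, prev_grp = [], None
    run_ft = run_cost = avg = 0.0
    for row in rows:
        if row["GrpID"] != prev_grp:  # new group: reset the running state
            run_ft = run_cost = avg = 0.0
            prev_grp = row["GrpID"]
        # Outgoing loads (InOut = 2) are priced at the previous running average
        cost_ft = avg if row["InOut"] == 2 else row["LoadCostFt"]
        load_cost = row["LoadFt"] * cost_ft
        run_ft += row["LoadFt"]
        run_cost += load_cost
        avg = run_cost / run_ft if run_ft else 0.0
        out.append({**row, "LoadCostFt": cost_ft, "LoadCost": load_cost,
                    "RunFt": run_ft, "RunCost": run_cost, "AvgRunCostFt": avg})
    return out


# Made-up sample: two loads in, one load out
sample = [
    {"ID": 1, "GrpID": 1, "InOut": 1, "LoadFt": 100.0, "LoadCostFt": 1.0},
    {"ID": 2, "GrpID": 1, "InOut": 1, "LoadFt": 100.0, "LoadCostFt": 3.0},
    {"ID": 3, "GrpID": 1, "InOut": 2, "LoadFt": -50.0, "LoadCostFt": 0.0},
]
result = running_costs(sample)
for r in result:
    print(r["ID"], r["RunFt"], r["RunCost"], r["AvgRunCostFt"])
```

Because an outgoing load is priced at the current average, it lowers RunFt and RunCost together and leaves AvgRunCostFt unchanged, apart from rounding in the DECIMAL version.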