Complex SQL request with double join - sql

There are two tables. The first one contains each variable's name and its ID:
1000131 AddIn_EM63Alarms_Alarm_ShotCycle
1000132 AddIn_EM63Alarms_Alarm_Status
1000133 AddIn_EM63Alarms_Alarm_Code
1000134 AddIn_EM63Alarms_Alarm_Message
The second one contains the variable ID and the data of those variables in one column called "STRVALUE":
1000131 0 1646404026 664 33 1078067200 1209
1000132 0 1646404026 664 122 1078067200 1
1000133 0 1646404026 664 48 1078067200 650
1000134 0 1646404026 664 61 1078067200 HOPPER TEMP.: TOL. LIM. +/-
1000131 0 1646404026 886 131 1078067200 1209
1000132 0 1646404026 886 220 1078067200 1
1000133 0 1646404026 886 146 1078067200 650
1000134 0 1646404026 886 159 1078067200 HOPPER TEMP.: TOL. LIM. +/-
1000131 0 1646404027 146 229 1078067200 1209
1000132 0 1646404027 146 318 1078067200 0
1000133 0 1646404027 146 244 1078067200 650
1000134 0 1646404027 146 257 1078067200 HOPPER TEMP.: TOL. LIM. +/-
1000131 0 1646404027 360 327 1078067200 1209
1000132 0 1646404027 360 416 1078067200 0
1000133 0 1646404027 360 342 1078067200 650
1000134 0 1646404027 360 355 1078067200 HOPPER TEMP.: TOL. LIM. +/-
1000131 0 1646404027 607 425 1078067200 1209
1000132 0 1646404027 607 514 1078067200 1
1000133 0 1646404027 607 440 1078067200 650
1000134 0 1646404027 607 453 1078067200 HOPPER TEMP.: TOL. LIM. +/-
1000131 0 1646404027 777 523 1078067200 1209
1000132 0 1646404027 777 612 1078067200 1
1000133 0 1646404027 777 538 1078067200 650
1000134 0 1646404027 777 551 1078067200 HOPPER TEMP.: TOL. LIM. +/-
1000131 0 1646404028 190 621 1078067200 1512
1000132 0 1646404028 190 698 1078067200 1
1000133 0 1646404028 190 636 1078067200 306
1000134 0 1646404028 190 649 1078067200 REQUEST: INSPECTION 2
I would like to write an SQL query that outputs the data of those four variables in separate columns, like this:
timestamp_s | timestamp_ms | strvalue of 1st variable | strvalue of 2nd variable | strvalue of 3rd variable | strvalue of 4th variable
Here is what I tried:
a.timestamp_ ,
a.strvalue as ShotCycle,
b.strvalue as Code
a.timestamp_s, ,
inner join
IMM0190_AL as a
v.nAME = 'AddIn_EM63Alarms_Alarm_ShotCycle' ) as a
left join
a.timestamp_s, ,
inner join
IMM0190_AL as a
v.nAME = 'AddIn_EM63Alarms_Alarm_Code' ) as b
However, the result is not as expected.
We can observe that rows are duplicated:
Timestamp_s ShotCycle Code
1646404026 1209 650
1646404026 1209 650
1646404026 1209 650
1646404026 1209 650
Please suggest how to achieve the required result.

This is how I resolved it.
select * from (
select
    dateadd(S, b.TIMESTAMP_S, '1970-01-01') as Time,
    b.STRVALUE as Shot,
    c.STRVALUE as Status,
    d.STRVALUE as Code,
    e.STRVALUE as Message
from (
    select al.[VARIABLE], al.TIMESTAMP_S, al.TIMESTAMP_MS, al.STRVALUE
    from [IMM0190_T].[dbo].[IMM0190_AL] as al
    where (al.variable = 1000131)
) as b
left join (
    select al.[VARIABLE], al.TIMESTAMP_S, al.TIMESTAMP_MS, al.STRVALUE
    from [IMM0190_T].[dbo].[IMM0190_AL] as al
    where (al.variable = 1000132)
) as c on b.timestamp_s=c.timestamp_s and b.TIMESTAMP_MS=c.TIMESTAMP_MS
left join (
    select al.[VARIABLE], al.TIMESTAMP_S, al.TIMESTAMP_MS, al.STRVALUE
    from [IMM0190_T].[dbo].[IMM0190_AL] as al
    where (al.variable = 1000133)
) as d on b.timestamp_s=d.timestamp_s and b.TIMESTAMP_MS=d.TIMESTAMP_MS
left join (
    select al.[VARIABLE], al.TIMESTAMP_S, al.TIMESTAMP_MS, al.STRVALUE
    from [IMM0190_T].[dbo].[IMM0190_AL] as al
    where (al.variable = 1000134)
) as e on b.timestamp_s=e.timestamp_s and b.TIMESTAMP_MS=e.TIMESTAMP_MS
) Main
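For anyone who wants to see why matching on both timestamp columns matters, here is a minimal sketch using Python's built-in sqlite3 module. The table name al and its four columns are a simplified, hypothetical stand-in for IMM0190_AL (the unnamed columns from the sample are left out); the rows are the first two timestamps from the question. Joining on timestamp_s alone reproduces the four duplicated rows, while joining on timestamp_s and timestamp_ms gives one row per sample.

```python
import sqlite3

# Hypothetical in-memory stand-in for IMM0190_AL with only the columns the join needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE al (variable INTEGER, timestamp_s INTEGER, "
    "timestamp_ms INTEGER, strvalue TEXT)"
)
conn.executemany(
    "INSERT INTO al VALUES (?, ?, ?, ?)",
    [
        (1000131, 1646404026, 664, "1209"),
        (1000132, 1646404026, 664, "1"),
        (1000133, 1646404026, 664, "650"),
        (1000134, 1646404026, 664, "HOPPER TEMP.: TOL. LIM. +/-"),
        (1000131, 1646404026, 886, "1209"),
        (1000132, 1646404026, 886, "1"),
        (1000133, 1646404026, 886, "650"),
        (1000134, 1646404026, 886, "HOPPER TEMP.: TOL. LIM. +/-"),
    ],
)

# Joining on the second alone pairs every ShotCycle row with every Code row
# recorded within that second, which is where the duplicates come from.
dupes = conn.execute(
    "SELECT a.timestamp_s, a.strvalue, b.strvalue "
    "FROM al AS a LEFT JOIN al AS b "
    "ON b.variable = 1000133 AND b.timestamp_s = a.timestamp_s "
    "WHERE a.variable = 1000131"
).fetchall()

# Joining on seconds and milliseconds yields exactly one row per sample.
result = conn.execute(
    "SELECT b.timestamp_s, b.timestamp_ms, b.strvalue, c.strvalue, "
    "       d.strvalue, e.strvalue "
    "FROM al AS b "
    "LEFT JOIN al AS c ON c.variable = 1000132 "
    "  AND c.timestamp_s = b.timestamp_s AND c.timestamp_ms = b.timestamp_ms "
    "LEFT JOIN al AS d ON d.variable = 1000133 "
    "  AND d.timestamp_s = b.timestamp_s AND d.timestamp_ms = b.timestamp_ms "
    "LEFT JOIN al AS e ON e.variable = 1000134 "
    "  AND e.timestamp_s = b.timestamp_s AND e.timestamp_ms = b.timestamp_ms "
    "WHERE b.variable = 1000131 "
    "ORDER BY b.timestamp_s, b.timestamp_ms"
).fetchall()

print(len(dupes))   # number of rows when matching on seconds only
for row in result:
    print(row)
```

This assumes that all four variables of one alarm sample share the same (timestamp_s, timestamp_ms) pair, as they do in the sample data.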


create new column from divided columns over iteration

I am working with the following code:
url = ''
df = pd.read_csv(url)
year regional_schlüssel Aus15 Deu15 Aus16 Deu16 Aus17 Deu17 Aus18 Deu18 ... aus36 aus37 aus38 aus39 aus40 aus41 aus42 aus43 aus44 aus45
0 2000 5111000 0 4 8 25 20 45 56 89 ... 935 862 746 732 792 660 687 663 623 722
1 2000 5113000 1 1 4 14 13 33 19 48 ... 614 602 498 461 521 470 393 411 397 400
2 2000 5114000 0 11 0 5 2 13 7 20 ... 317 278 265 235 259 228 204 173 213 192
3 2000 5116000 0 2 2 7 3 28 13 26 ... 264 217 206 207 197 177 171 146 181 169
4 2000 5117000 0 0 3 1 2 4 4 7 ... 135 129 118 116 128 148 89 110 124 83
I would like to create a new set of columns fertility_deu15, ..., fertility_deu45 and fertility_aus15, ..., fertility_aus45 such that aus15 / Aus15 = fertility_aus15 and deu15 / Deu15 = fertility_deu15, for each pair ausi and Ausj where j == i ∈ [15, 45], and likewise for each pair deui and Deuj.
I'm not sure what is up with that data but we need to fix it to make it numeric. I'll end up doing that while filtering
numerator = df.filter(regex=r'^[a-z]+\d+$') # Lower case ones
numerator = numerator.apply(pd.to_numeric, errors='coerce') # Fix numbers
denominator = df.filter(regex=r'^[A-Z][a-z]+\d+$').rename(columns=str.lower) # Capitalized ones, renamed to match
denominator = denominator.apply(pd.to_numeric, errors='coerce')
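Once both frames carry the same lower-case column names, the division itself can be done in one step. The snippet below is only a sketch on a made-up toy frame (the values are invented, and only the age 15 columns are included); the fertility_ prefix follows the naming asked for in the question.

```python
import pandas as pd

# Toy frame with invented values, following the question's naming scheme:
# capitalized columns are the denominators, lower-case columns the numerators.
df = pd.DataFrame({
    "year": [2000, 2000],
    "Aus15": ["2", "4"],    # stored as strings on purpose, to mimic non-numeric input
    "Deu15": ["5", "10"],
    "aus15": ["8", "2"],
    "deu15": ["10", "5"],
})

numerator = df.filter(regex=r"^[a-z]+\d+$").apply(pd.to_numeric, errors="coerce")
denominator = (
    df.filter(regex=r"^[A-Z][a-z]+\d+$")
      .rename(columns=str.lower)
      .apply(pd.to_numeric, errors="coerce")
)

# Both frames now share column names (aus15, deu15), so div aligns them pairwise.
fertility = numerator.div(denominator).add_prefix("fertility_")
df = df.join(fertility)
print(df)
```

Note that rows with a zero denominator (the sample data has several zeros in Aus15) would come out as inf or NaN rather than raising an error, so those may need separate handling.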

Taking the last two rows' minimum value

I have this data frame:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777 250 810
A 17-07-19 9 637 121 529
A 18-07-19 7 878 786 406
A 19-07-19 4 656 140 204
A 20-07-19 2 295 272 490
A 21-07-19 3 778 600 544
A 22-07-19 6 741 792 907
B 01-07-19 4 509 690 406
B 02-07-19 2 732 915 199
B 03-07-19 2 413 725 414
B 04-07-19 2 170 702 912
B 09-08-19 3 851 616 477
B 10-08-19 9 475 447 555
B 11-08-19 1 412 403 708
B 12-08-19 2 299 537 321
B 13-08-19 4 310 119 125
C 01-12-18 4 912 755 657
C 02-12-18 4 586 771 394
C 04-12-18 9 498 122 193
C 05-12-18 2 500 528 764
C 06-12-18 1 982 383 654
C 07-12-18 1 299 496 488
C 08-12-18 3 336 691 496
C 09-12-18 3 206 433 263
C 10-12-18 2 373 319 111
For each of the columns 123_Var, 456_Var and 789_Var, I want to show the minimum of the current row's value and the previous row's value.
This should be applied separately for each ID (i.e. with a groupby).
The first row of each ID should show the current value, since there is no "previous" value to compare with.
Expected result:
ID Date X 123_Var 456_Var 789_Var 123_Min2 456_Min2 789_Min2
A 16-07-19 3 777 250 810 777 250 810
A 17-07-19 9 637 121 529 637 121 529
A 18-07-19 7 878 786 406 637 121 406
A 19-07-19 4 656 140 204 656 140 204
A 20-07-19 2 295 272 490 295 140 204
A 21-07-19 3 778 600 544 295 272 490
A 22-07-19 6 741 792 907 741 600 544
B 01-07-19 4 509 690 406 509 690 406
B 02-07-19 2 732 915 199 509 690 199
B 03-07-19 2 413 725 414 413 725 199
B 04-07-19 2 170 702 912 170 702 414
B 09-08-19 3 851 616 477 170 616 477
B 10-08-19 9 475 447 555 475 447 477
B 11-08-19 1 412 403 708 412 403 555
B 12-08-19 2 299 537 321 299 403 321
B 13-08-19 4 310 119 125 299 119 125
C 01-12-18 4 912 755 657 912 755 657
C 02-12-18 4 586 771 394 586 755 394
C 04-12-18 9 498 122 193 498 122 193
C 05-12-18 2 500 528 764 498 122 193
C 06-12-18 1 982 383 654 500 383 654
C 07-12-18 1 299 496 488 299 383 488
C 08-12-18 3 336 691 496 299 496 488
C 09-12-18 3 206 433 263 206 433 263
C 10-12-18 2 373 319 111 206 319 111
IIUC, we can use groupby.shift to select the previous value for each ID, then DataFrame.where to keep the previous value only where it is less than or equal to the current value and fill in the current value everywhere else (this also covers the first row of each ID, where the shifted value is NaN). We use DataFrame.add_suffix to add _Min2 and join the result to df with DataFrame.join:
df_vars = df[['123_Var','456_Var','789_Var']]
df = df.join(df.groupby('ID')[['123_Var','456_Var','789_Var']]
               .shift()
               .where(lambda x: x.le(df_vars), df_vars)
               .add_suffix('_Min2'))
ID Date X 123_Var 456_Var 789_Var 123_Var_Min2 456_Var_Min2 789_Var_Min2
0 A 16-07-19 3 777 250 810 777.0 250.0 810.0
1 A 17-07-19 9 637 121 529 637.0 121.0 529.0
2 A 18-07-19 7 878 786 406 637.0 121.0 406.0
3 A 19-07-19 4 656 140 204 656.0 140.0 204.0
4 A 20-07-19 2 295 272 490 295.0 140.0 204.0
5 A 21-07-19 3 778 600 544 295.0 272.0 490.0
6 A 22-07-19 6 741 792 907 741.0 600.0 544.0
7 B 01-07-19 4 509 690 406 509.0 690.0 406.0
8 B 02-07-19 2 732 915 199 509.0 690.0 199.0
9 B 03-07-19 2 413 725 414 413.0 725.0 199.0
10 B 04-07-19 2 170 702 912 170.0 702.0 414.0
11 B 09-08-19 3 851 616 477 170.0 616.0 477.0
12 B 10-08-19 9 475 447 555 475.0 447.0 477.0
13 B 11-08-19 1 412 403 708 412.0 403.0 555.0
14 B 12-08-19 2 299 537 321 299.0 403.0 321.0
15 B 13-08-19 4 310 119 125 299.0 119.0 125.0
16 C 01-12-18 4 912 755 657 912.0 755.0 657.0
17 C 02-12-18 4 586 771 394 586.0 755.0 394.0
18 C 04-12-18 9 498 122 193 498.0 122.0 193.0
19 C 05-12-18 2 500 528 764 498.0 122.0 193.0
20 C 06-12-18 1 982 383 654 500.0 383.0 654.0
21 C 07-12-18 1 299 496 488 299.0 383.0 488.0
22 C 08-12-18 3 336 691 496 299.0 496.0 488.0
23 C 09-12-18 3 206 433 263 206.0 433.0 263.0
24 C 10-12-18 2 373 319 111 206.0 319.0 111.0
Case 2: If you want the minimum over the last n rows (current row included), use groupby.rolling:
df_vars = df[['123_Var','456_Var','789_Var']]
n = 3
df = df.join(df.groupby('ID')[['123_Var','456_Var','789_Var']]
               .rolling(n, min_periods=1).min()
               .reset_index(level=0, drop=True)
               .add_suffix(f'_Min{n}'))
ID Date X 123_Var 456_Var 789_Var 123_Var_Min3 456_Var_Min3 789_Var_Min3
0 A 16-07-19 3 777 250 810 777.0 250.0 810.0
1 A 17-07-19 9 637 121 529 637.0 121.0 529.0
2 A 18-07-19 7 878 786 406 637.0 121.0 406.0
3 A 19-07-19 4 656 140 204 637.0 121.0 204.0
4 A 20-07-19 2 295 272 490 295.0 121.0 204.0
5 A 21-07-19 3 778 600 544 295.0 140.0 204.0
6 A 22-07-19 6 741 792 907 295.0 140.0 204.0
7 B 01-07-19 4 509 690 406 509.0 690.0 406.0
8 B 02-07-19 2 732 915 199 509.0 690.0 199.0
9 B 03-07-19 2 413 725 414 413.0 690.0 199.0
10 B 04-07-19 2 170 702 912 170.0 690.0 199.0
11 B 09-08-19 3 851 616 477 170.0 616.0 199.0
12 B 10-08-19 9 475 447 555 170.0 447.0 414.0
13 B 11-08-19 1 412 403 708 170.0 403.0 477.0
14 B 12-08-19 2 299 537 321 299.0 403.0 321.0
15 B 13-08-19 4 310 119 125 299.0 119.0 125.0
16 C 01-12-18 4 912 755 657 912.0 755.0 657.0
17 C 02-12-18 4 586 771 394 586.0 755.0 394.0
18 C 04-12-18 9 498 122 193 498.0 122.0 193.0
19 C 05-12-18 2 500 528 764 498.0 122.0 193.0
20 C 06-12-18 1 982 383 654 498.0 122.0 193.0
21 C 07-12-18 1 299 496 488 299.0 122.0 193.0
22 C 08-12-18 3 336 691 496 299.0 383.0 488.0
23 C 09-12-18 3 206 433 263 206.0 383.0 263.0
24 C 10-12-18 2 373 319 111 206.0 319.0 111.0
A quite elegant solution is to apply rolling(2).min() to each group,
but to avoid the first row of NaN in each group, this first row
should be "replicated" from the source group.
To do your task, start by defining the following function:
def fnMin2(grp):
    rv = pd.concat([pd.DataFrame([grp.iloc[0, -3:]]),
        grp[['123_Var', '456_Var', '789_Var']].rolling(2).min().iloc[1:]])
    rv.columns = [it.replace('Var', 'Min2') for it in rv.columns]
    return grp.join(rv)
Then apply it to each group:
df = df.groupby('ID', group_keys=False).apply(fnMin2)
Note that column names assigned to new columns in my solution are
just as you wish, contrary to the solution you accepted.
#this compares the next row to the previous row
ext = df.iloc[:,3:].gt(df.iloc[:,3:].shift(1))
#simply renamed the columns here
#join the two dataframes by columns
M = pd.concat([df,ext],axis=1)
#based on the conditions, if it is False,
#use value from current row,
#else use value from previous row
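The comparison idea in those comments can be sketched with numpy.where as below. This is not the answerer's exact code: the toy frame is a cut-down version of the question's data, the shift is done per ID (an assumption, so that the first row of each group is not compared with the last row of the previous group), and the _Min2 names are taken from the expected result.

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's data (two variable columns, two IDs).
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B"],
    "123_Var": [777, 637, 878, 509, 732],
    "456_Var": [250, 121, 786, 690, 915],
})
var_cols = ["123_Var", "456_Var"]

# Previous row within each ID; the first row of each group has no predecessor (NaN).
prev = df.groupby("ID")[var_cols].shift(1)

# True where the current value exceeds the previous one; comparisons with NaN are False.
ext = df[var_cols].gt(prev)

# Where ext is True take the previous value, otherwise keep the current one.
mins = pd.DataFrame(
    np.where(ext, prev, df[var_cols]),
    columns=[c.replace("Var", "Min2") for c in var_cols],
    index=df.index,
)

# Join the two dataframes by columns.
M = pd.concat([df, mins], axis=1)
print(M)
```

Because the shifted frame contains NaN, the new columns come out as floats; cast them back with astype(int) if integers are needed.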

How to aggregate multiple columns - Pandas

I have this df:
ID Date XXX 123_Var 456_Var 789_Var 123_P 456_P 789_P
A 07/16/2019 1 987 551 313 22 12 94
A 07/16/2019 9 135 748 403 92 40 41
A 07/18/2019 8 376 938 825 14 69 96
A 07/18/2019 5 259 176 674 52 75 72
B 07/16/2019 9 690 304 948 56 14 78
B 07/16/2019 8 819 185 699 33 81 83
B 07/18/2019 1 580 210 847 51 64 87
I want to group the df by ID and Date, aggregate the XXX column by the maximum value, and aggregate 123_Var, 456_Var, 789_Var columns by the minimum value.
* Note: The df contains many of these columns. Their names follow the pattern {some int}_Var.
This is the current code I've started to write:
df = (df.groupby(['ID','Date'], as_index=False)
.agg({'XXX':'max', list(df.filter(regex='_Var')): 'min'}))
Expected result:
ID Date XXX 123_Var 456_Var 789_Var
A 07/16/2019 9 135 551 313
A 07/18/2019 8 259 176 674
B 07/16/2019 9 690 185 699
B 07/18/2019 1 580 210 847
Create the dictionary dynamically with dict.fromkeys, merge it with the {'XXX':'max'} dict, and pass the result to GroupBy.agg:
d = dict.fromkeys(df.filter(regex='_Var').columns, 'min')
df = df.groupby(['ID','Date'], as_index=False).agg({**{'XXX':'max'}, **d})
print (df)
ID Date XXX 123_Var 456_Var 789_Var
0 A 07/16/2019 9 135 551 313
1 A 07/18/2019 8 259 176 674
2 B 07/16/2019 9 690 185 699
3 B 07/18/2019 1 580 210 847

To find avg in pig and sort it in ascending order

I have a schema with 9 fields and I want to take only two of them (fields 6 and 7, i.e. $5 and $6). I want to calculate the average of $5 and sort $6 in ascending order. How can I do this? Can someone help me?
Input Data:
N368SW 188 170 175 17 -1 MCO MHT 1142
N360SW 100 115 87 -10 5 MCO MSY 550
N626SW 114 115 90 13 14 MCO MSY 550
N252WN 107 115 84 -10 -2 MCO MSY 550
N355SW 104 115 85 -1 10 MCO MSY 550
N405WN 113 110 96 14 11 MCO ORF 655
N456WN 110 110 92 24 24 MCO ORF 655
N743SW 144 155 124 7 18 MCO PHL 861
N276WN 142 150 129 -2 6 MCO PHL 861
N369SW 153 145 134 30 22 MCO PHL 861
N363SW 151 145 137 5 -1 MCO PHL 861
N346SW 141 150 128 51 60 MCO PHL 861
N785SW 131 145 118 -15 -1 MCO PHL 861
N635SW 144 155 127 -6 5 MCO PHL 861
N242WN 298 300 276 68 70 MCO PHX 1848
N439WN 130 140 111 -4 6 MCO PIT 834
N348SW 140 135 124 7 2 MCO PIT 834
N672SW 136 135 122 9 8 MCO PIT 834
N493WN 151 160 136 -9 0 MCO PVD 1073
N380SW 170 155 155 13 -2 MCO PVD 1073
N705SW 164 160 147 6 2 MCO PVD 1073
N233LV 157 160 143 1 4 MCO PVD 1073
N786SW 156 160 139 6 10 MCO PVD 1073
N280WN 160 160 146 1 1 MCO PVD 1073
N282WN 104 95 81 10 1 MCO RDU 534
N694SW 89 100 77 3 14 MCO RDU 534
N266WN 94 95 82 9 10 MCO RDU 534
N218WN 98 100 77 12 14 MCO RDU 534
N355SW 47 50 35 15 18 MCO RSW 133
N388SW 44 45 30 37 38 MCO RSW 133
N786SW 46 50 31 4 8 MCO RSW 133
N707SA 52 50 33 10 8 MCO RSW 133
N795SW 176 185 153 -9 0 MCO SAT 1040
N402WN 176 185 161 4 13 MCO SAT 1040
N690SW 123 130 107 -1 6 MCO SDF 718
N457WN 135 130 105 20 15 MCO SDF 718
N720WN 144 155 131 13 24 MCO STL 880
N775SW 147 160 135 -6 7 MCO STL 880
N291WN 136 155 122 96 115 MCO STL 880
N247WN 144 155 127 43 54 MCO STL 880
N748SW 179 185 159 -4 2 MDW ABQ 1121
N709SW 176 190 158 21 35 MDW ABQ 1121
N325SW 110 105 97 36 31 MDW ALB 717
N305SW 116 110 90 107 101 MDW ALB 717
N403WN 145 165 128 -6 14 MDW AUS 972
N767SW 136 165 125 59 88 MDW AUS 972
N730SW 118 120 100 28 30 MDW BDL 777
I have written the code like this, but it is not working properly:
a = load '/path/to/file' using PigStorage('\t');
b = foreach a generate (int)$5 as field_a:int,(chararray)$6 as field_b:chararray;
c = group b all;
d = foreach c generate b.field_b,AVG(b.field_a);
e = order d by field_b ASC;
dump e;
I am getting an error at the ORDER BY step:
grunt> a = load '/user/horton/sample_pig_data.txt' using PigStorage('\t');
grunt> b = foreach a generate (int)$5 as fielda:int,(chararray)$6 as fieldb:chararray;
grunt> describe #;
b: {fielda: int,fieldb: chararray}
grunt> c = group b all;
grunt> describe #;
c: {group: chararray,b: {(fielda: int,fieldb: chararray)}}
grunt> d = foreach c generate b.fieldb,AVG(b.fielda);
grunt> e = order d by fieldb ;
2017-01-05 15:51:29,623 [main] ERROR - ERROR 1025:
<line 6, column 15> Invalid field projection. Projected field [fieldb] does not exist in schema: :bag{:tuple(fieldb:chararray)},:double.
Details at logfile: /root/pig_1483631021021.log
I want output like this (not related to the input data):
(({(Bharathi),(Komal),(Archana),(Trupthi),(Preethi),(Rajesh),(siddarth),(Rajiv) },
{ (72) , (83) , (87) , (75) , (93) , (90) , (78) , (89) }),83.375)
If you have found the answer, best practice is to post it so that others referring to this can have a better understanding.

Group clause in SQL command

I have 3 tables: Deliveries, IssuedWarehouse, ReturnedStock.
Deliveries: ID, OrderNumber, Material, Width, Gauge, DelKG
IssuedWarehouse: OrderNumber, IssuedKG
ReturnedStock: OrderNumber, IssuedKG
What I'd like to do is group all the orders by Material, Width and Gauge and then sum the amount delivered, issued to the warehouse and issued back to stock.
This is the SQL that is really quite close:
SELECT Material, Width, Gauge,
Count(DELIVERIES.OrderNo) AS [Orders Placed],
Sum(DELIVERIES.DeldQtyKilos) AS [KG Delivered],
Sum(IssuedWarehouse.[Qty Issued]) AS [Film Issued],
Sum([Film Retns].[Qty Issued]) AS [Film Returned],
[KG Delivered]-[Film Issued]+[Film Returned] AS [Qty Remaining]
FROM (DELIVERIES
INNER JOIN IssuedWarehouse
ON DELIVERIES.OrderNo = IssuedWarehouse.[Order No From])
INNER JOIN [Film Retns]
ON DELIVERIES.OrderNo = [Film Retns].[Order No From]
GROUP BY Material, Width, Gauge, ActDelDate
HAVING ActDelDate Between [start date] And [end date]
This groups the products almost perfectly. However if you take a look at the results:
Material Width Gauge Orders Placed Delivered Qnty Kilos Film Issued Film Returned Qty Remaining
COEX-GLOSS 590 75 1 534 500 124 158
COEX-MATT 1080 80 1 4226 4226 52 52
CPP 660 38 8 6720 2768 1384 5336
CPP 666 47 1 5677 5716 536 497
CPP 690 65 2 1232 717 202 717
CPP 760 38 3 3444 1318 510 2636
CPP 770 38 4 4316 3318 2592 3590
CPP 786 38 2 672 442 212 442
CPP 800 47 1 1122 1122 116 116
CPP 810 47 1 1127 1134 69 62
CPP 810 47 2 2250 1285 320 1285
CPP 1460 38 12 6540 4704 2442 4278
LD 975 75 1 502 502 182 182
LDPE 450 50 1 252 252 50 50
LDPE 520 70 1 250 250 95 95
LDPE 570 65 2 504 295 86 295
LDPE 570 65 2 508 278 48 278
LDPE 620 50 1 252 252 67 67
LDPE 660 50 1 256 256 62 62
LDPE 670 75 1 248 248 80 80
LDPE 690 47 1 476 476 390 390
LDPE 790 38 2 2104 1122 140 1122
LDPE 790 50 1 286 286 134 134
LDPE 790 50 1 250 250 125 125
LDPE 810 30 1 4062 4062 100 100
LDPE 843 33 1 408 408 835 835
LDPE 850 80 1 412 412 34 34
LDPE 855 30 1 740 740 83 83
LDPE 880 60 1 304 304 130 130
LDPE 900 70 2 1000 650 500 850
LDPE 1017 60 1 1056 1056 174 174
OPP 25 1100 1 381 381 95 95
OPP 1000 30 2 1358 1112 300 546
OPP 1000 30 1 1492 1491 100 101
OPP 1200 20 1 418 417 461 462
PET 760 12 3 1227 1876 132 -517
You'll see that there are some materials that have the same width and gauge yet they are not grouped. I think this is because the delivered qty is different on the orders. For example:
Material Width Gauge Orders Placed Delivered Qnty Kilos Film Issued Film Returned Qty Remaining
LDPE 620 50 1 252 252 67 67
LDPE 660 50 1 256 256 62 62
I would like these two rows to be grouped. They have the same material, width and gauge, but the delivered qty is different, so they have not been grouped.
Can anyone help me group these strange rows?
Your "problem" is that the deliveries occurred on different dates, and you're grouping by ActDelDate so the data splits, but because you haven't selected the ActDelDate column, this isn't obvious.
The fix is: remove ActDelDate from the GROUP BY list.
You should also remove the unnecessary brackets around the first join, and change
HAVING ActDelDate Between [start date] And [end date]
to
WHERE ActDelDate Between [start date] And [end date]
and place it before the GROUP BY.
You are grouping by the delivery date, which is causing the rows to be split. Either omit the delivery date from the results and group by, or take the min/max of the delivery date.
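To make the effect of the grouping column concrete, here is a small sketch using Python's built-in sqlite3 module. The deliveries table is a simplified, hypothetical version of the one in the question (no joins; the snake_case column names and the dates are invented), and the two LDPE 790/50 quantities are taken from the result listing above. Grouping by the delivery date keeps the two LDPE orders apart, while filtering with WHERE and grouping only by material, width and gauge merges them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified deliveries table; the dates are made up.
conn.execute(
    "CREATE TABLE deliveries (material TEXT, width INTEGER, gauge INTEGER, "
    "act_del_date TEXT, del_kg INTEGER)"
)
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?, ?, ?)",
    [
        ("LDPE", 790, 50, "2014-01-10", 286),
        ("LDPE", 790, 50, "2014-02-03", 250),
        ("CPP", 800, 47, "2014-01-15", 1122),
    ],
)
date_range = ("2014-01-01", "2014-12-31")

# Original shape: the date is in GROUP BY, so each delivery date gets its own row.
split = conn.execute(
    "SELECT material, width, gauge, COUNT(*), SUM(del_kg) "
    "FROM deliveries "
    "GROUP BY material, width, gauge, act_del_date "
    "HAVING act_del_date BETWEEN ? AND ? "
    "ORDER BY material, width, gauge, act_del_date",
    date_range,
).fetchall()

# Suggested shape: filter rows first with WHERE, then group without the date.
merged = conn.execute(
    "SELECT material, width, gauge, COUNT(*), SUM(del_kg) "
    "FROM deliveries "
    "WHERE act_del_date BETWEEN ? AND ? "
    "GROUP BY material, width, gauge "
    "ORDER BY material, width, gauge",
    date_range,
).fetchall()

print(split)
print(merged)
```

The same change should carry over to the Access query: move the date condition into a WHERE clause and drop ActDelDate from the GROUP BY list.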