hierarchy/tree - get all strings to root - sql

Say I have this table T below which defines a tree structure by storing parent/child pairs. These values are integers. Say in another table S, I have each ID/value mapped to a string.
So, let's say in S we have:
Table S
ID Name
90 "node 90"
301 "node 301"
etc. (even though the real names are different)
Is it possible to add a computed column in T which gives, for each child node, the textual representation of the path all the way up to the root of the tree in an appended form, e.g.
"node 1 > node 2 > node 3" (for child/leaf node 3)
or
"node 10 > node 20" (for child/leaf node 20)
If it's not possible through a computed column, can I do it with a regular column and a one-time update of that column? I was thinking of some recursive CTE but I cannot get my head around it (for now).
Table T
ParentEventID ChildEventID
90 301
90 302
90 303
90 304
90 305
90 306
90 307
301 401
301 402
302 403
302 404
302 405
302 406
302 407
303 408
304 409
304 410
304 411
304 412
304 413
304 414
305 415
305 416
305 417
305 418
306 419
306 420
306 421
306 422
307 423
307 424
307 425
307 426
307 427
403 501
403 502
403 503
403 504
403 505
404 506
404 507
404 508
404 509
404 510
405 511
405 512
405 513
405 514
405 515
406 516
406 517
406 518
406 519
406 520
407 521
407 522
407 523
407 524
407 525
415 526
415 527
415 528
415 529
415 530
416 531
416 532
416 533
416 534
416 535
417 536
417 537
417 538
417 539
417 540
418 541
418 542
418 543
418 544
418 545
420 546
420 547
420 548
420 549
420 550
421 551
421 552
421 553
421 554
421 555
422 556
422 557
422 558
422 559
422 560

Here's what I came up with:
WITH cte AS (
SELECT * FROM (VALUES
(90, 301),
(90, 302),
(90, 303),
(90, 304),
(90, 305),
(90, 306),
(90, 307),
(301,401),
(301,402),
(302,403),
(302,404),
(302,405),
(302,406),
(302,407),
(303,408),
(304,409),
(304,410),
(304,411),
(304,412),
(304,413),
(304,414),
(305,415),
(305,416),
(305,417),
(305,418),
(306,419),
(306,420),
(306,421),
(306,422),
(307,423),
(307,424),
(307,425),
(307,426),
(307,427),
(403,501),
(403,502),
(403,503),
(403,504),
(403,505),
(404,506),
(404,507),
(404,508),
(404,509),
(404,510),
(405,511),
(405,512),
(405,513),
(405,514),
(405,515),
(406,516),
(406,517),
(406,518),
(406,519),
(406,520),
(407,521),
(407,522),
(407,523),
(407,524),
(407,525),
(415,526),
(415,527),
(415,528),
(415,529),
(415,530),
(416,531),
(416,532),
(416,533),
(416,534),
(416,535),
(417,536),
(417,537),
(417,538),
(417,539),
(417,540),
(418,541),
(418,542),
(418,543),
(418,544),
(418,545),
(420,546),
(420,547),
(420,548),
(420,549),
(420,550),
(421,551),
(421,552),
(421,553),
(421,554),
(421,555),
(422,556),
(422,557),
(422,558),
(422,559),
(422,560)
) AS x(ParentEventID, ChildEventID)
), rcte AS (
    SELECT DISTINCT
        NULL AS [ParentEventID],
        a.[ParentEventID] AS ChildEventID,
        CONCAT('/', CAST(a.[ParentEventID] AS NVARCHAR(MAX)), '/') AS h
    FROM cte AS a
    WHERE NOT EXISTS (
        SELECT *
        FROM [cte]
        WHERE [cte].[ChildEventID] = a.[ParentEventID]
    )
    UNION ALL
    SELECT child.[ParentEventID], child.[ChildEventID], CONCAT(parent.h, [child].[ChildEventID], '/')
    FROM [cte] AS child
    JOIN rcte AS parent
        ON child.[ParentEventID] = [parent].[ChildEventID]
)
SELECT * FROM rcte
The first CTE is just a quick way for me to expose your data; the real meat of the solution is in rcte. Note that the h column is immediately convertible to a hierarchyid, if that is what you're looking for. And you probably should be looking for that, since it lets you answer questions like "what are the children of this row?" or "which rows are in this row's lineage?" quite easily (i.e. without having to compute the entire hierarchy on the fly).
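To get the textual form asked for in the question ("node 90 > node 301 > ..."), the same recursive pattern can carry a name path by joining S at each level. A minimal sketch, untested, assuming the real tables are T(ParentEventID, ChildEventID) and S(ID, Name) as described:
WITH rcte AS (
    -- anchor: roots are parents that never appear as a child
    SELECT a.ParentEventID AS ChildEventID,
           CAST(s.Name AS NVARCHAR(MAX)) AS NamePath
    FROM (SELECT DISTINCT ParentEventID FROM T) AS a
    JOIN S AS s
        ON s.ID = a.ParentEventID
    WHERE NOT EXISTS (SELECT * FROM T WHERE T.ChildEventID = a.ParentEventID)
    UNION ALL
    -- recursive step: append each child's name to its parent's path
    SELECT child.ChildEventID,
           CONCAT(parent.NamePath, ' > ', s.Name)
    FROM T AS child
    JOIN rcte AS parent
        ON child.ParentEventID = parent.ChildEventID
    JOIN S AS s
        ON s.ID = child.ChildEventID
)
SELECT ChildEventID, NamePath
FROM rcte;
Since each path depends on other rows, it cannot be a plain computed column; persisting it would be the one-time UPDATE the question mentions. And the h column above is convertible with CAST(h AS hierarchyid) if you go the hierarchyid route.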

When I import a PDF file inside Overleaf the math mode characters turn "bold" (shadow)

I used the website diagrams.net to create a figure with some mathematical expressions. Of course, I can export it as PNG and import it into my Overleaf project, but I want to retain the vectorization of the expressions. Because of that, I am trying to import it as PDF inside my Overleaf document.
When I use:
\begin{figure}[tbp!]
\centering
\includegraphics[width=\linewidth]{images/math_structure.pdf}
\caption{My figure description.}
\label{fig:math_structure}
\end{figure}
My figure is shown normally, apparently, but when I zoom in on the mathematical expressions I see this:
Another interesting thing I noticed is that when I download the PDF from Overleaf and open it using MuPDF the "bold" disappears, but when I open it using Google Chrome or Firefox the "bold" is still there.
This is a pretty strange thing, because I guessed it was a problem of font embedding inside the PDF, but my file opens normally in MuPDF. Does anyone know what is happening and how I can resolve it?
I am sharing the math_structure in order to reproduce the problem in the following link: PDF
As an addendum to K J's answer:
Looking at each letter there are two objects, so although I cannot see the shadow within the editor I accept it is there, so it must be placed by the text outline generator? Here I have moved and coloured some glyphs, so the second edge is deliberate, but most viewers would not show them as a "GLOW".
Indeed, all those items with a glow are drawn twice in the content stream, once for filling the defining path, once for stroking. E.g. the capital S of "State":
.111484379 0 0 -.111468516 140.764496 314.20746 cm
/G3 gs
55 507 m
55 562.33331 74 609 112 647 c
150 685 193.66666 704 243 704 c
257 704 l
313.66666 704 363 683 405 641 c
426 672 l
429.33334 676.66669 432.66666 681.66669 436 687 c
439.33334 692.33331 442.66666 696.66669 446 700 c
449 704 l
449.66666 704 451 704 453 704 c
455 704 457 704.33331 459 705 c
463 705 l
465 705 468 703 472 699 c
472 462 l
466 456 l
448 456 l
440.66666 456 436.33334 457 435 459 c
433.66666 461 432 467.66666 430 479 c
418.66666 563 385 618.66669 329 646 c
304.33334 656.66669 279.33334 662 254 662 c
218.66666 662 190 650 168 626 c
146 602 135 574 135 542 c
135 519.33331 140.666672 498.66666 152 480 c
163.333328 461.33334 179.33333 446.33334 200 435 c
206.66667 432.33334 235.33333 424.66666 286 412 c
336.66666 399.33334 364.66666 391.66666 370 389 c
408 374.33334 439 349.33334 463 314 c
487 278.66666 499.33334 237.66667 500 191 c
500 137 482.66666 88.333336 448 45 c
413.33334 1.66666412 364.33334 -20.333334 301 -21 c
263.66666 -21 230.33333 -15.333334 201 -4 c
171.66667 7.333334 151.333328 17.666666 140 27 c
122 41 l
119.333336 37.666668 114.333336 31 107 21 c
99.666664 11 93 1.66666698 87 -7 c
81 -15.666667 78 -20.333334 78 -21 c
76.666664 -21.666666 73.333336 -22 68 -22 c
64 -22 l
62 -22 59 -20 55 -16 c
55 101 l
55 180.33334 55.333332 220.66667 56 222 c
57.333332 225.33333 64 227 76 227 c
89 227 l
93 223 95 218.66667 95 214 c
95 192.66667 98.333336 171.66667 105 151 c
111.666664 130.333328 123 110 139 90 c
155 70 177 54 205 42 c
233 30 266.33334 24 305 24 c
336.33334 24 363.33334 36.666664 386 62 c
408.66666 87.333336 420 118.333328 420 155 c
420 183.66667 412.66666 209.66667 398 233 c
383.33334 256.33334 364 272.33334 340 281 c
302.66666 290.33334 278 296.66666 266 300 c
262.66666 300.66666 253.66667 302.66666 239 306 c
224.33333 309.33334 213.33333 312 206 314 c
198.66667 316 188 319.66666 174 325 c
160 330.33334 149 336.33334 141 343 c
133 349.66666 123.333336 357.66666 112 367 c
100.666664 376.33334 91.666664 388 85 402 c
65 434.66666 55 469.66666 55 507 c
h
f
/G7 gs
55 507 m
55 562.33331 74 609 112 647 c
150 685 193.66666 704 243 704 c
257 704 l
313.66666 704 363 683 405 641 c
426 672 l
429.33334 676.66669 432.66666 681.66669 436 687 c
439.33334 692.33331 442.66666 696.66669 446 700 c
449 704 l
449.66666 704 451 704 453 704 c
455 704 457 704.33331 459 705 c
463 705 l
465 705 468 703 472 699 c
472 462 l
466 456 l
448 456 l
440.66666 456 436.33334 457 435 459 c
433.66666 461 432 467.66666 430 479 c
418.66666 563 385 618.66669 329 646 c
304.33334 656.66669 279.33334 662 254 662 c
218.66666 662 190 650 168 626 c
146 602 135 574 135 542 c
135 519.33331 140.666672 498.66666 152 480 c
163.333328 461.33334 179.33333 446.33334 200 435 c
206.66667 432.33334 235.33333 424.66666 286 412 c
336.66666 399.33334 364.66666 391.66666 370 389 c
408 374.33334 439 349.33334 463 314 c
487 278.66666 499.33334 237.66667 500 191 c
500 137 482.66666 88.333336 448 45 c
413.33334 1.66666412 364.33334 -20.333334 301 -21 c
263.66666 -21 230.33333 -15.333334 201 -4 c
171.66667 7.333334 151.333328 17.666666 140 27 c
122 41 l
119.333336 37.666668 114.333336 31 107 21 c
99.666664 11 93 1.66666698 87 -7 c
81 -15.666667 78 -20.333334 78 -21 c
76.666664 -21.666666 73.333336 -22 68 -22 c
64 -22 l
62 -22 59 -20 55 -16 c
55 101 l
55 180.33334 55.333332 220.66667 56 222 c
57.333332 225.33333 64 227 76 227 c
89 227 l
93 223 95 218.66667 95 214 c
95 192.66667 98.333336 171.66667 105 151 c
111.666664 130.333328 123 110 139 90 c
155 70 177 54 205 42 c
233 30 266.33334 24 305 24 c
336.33334 24 363.33334 36.666664 386 62 c
408.66666 87.333336 420 118.333328 420 155 c
420 183.66667 412.66666 209.66667 398 233 c
383.33334 256.33334 364 272.33334 340 281 c
302.66666 290.33334 278 296.66666 266 300 c
262.66666 300.66666 253.66667 302.66666 239 306 c
224.33333 309.33334 213.33333 312 206 314 c
198.66667 316 188 319.66666 174 325 c
160 330.33334 149 336.33334 141 343 c
133 349.66666 123.333336 357.66666 112 367 c
100.666664 376.33334 91.666664 388 85 402 c
65 434.66666 55 469.66666 55 507 c
h
S
The filled version is drawn with the extended graphics state G3, the stroked version is drawn with the extended graphics state G7.
G3 fills in an opaque manner:
<</BM/Normal/ca 1>>
but G7 strokes very transparently (opacity .1098) and sets some other parameters:
<</BM/Normal/CA .1098/LC 0/LJ 0/LW 0/ML 4/SA true/ca .1098>>
But in particular G7 also sets the line width to 0 (the thinnest line that can be rendered at device resolution: 1 device pixel wide).
The OP mentions that they see the shadows when they zoom in. Thus, maybe those viewers in which you see a broad shadow/glow after zooming do simply zoom by drawing everything magnified by the zoom factor, i.e. the shadow/glow becomes zoom factor * 1 pixel wide; and those viewers in which you don't see a broad shadow/glow draw the outlines even after zooming with a 1 pixel width.
The difference does not appear to be in the font style, since the weighting between standard 24 and bold 24, shown below on the right, is not evident in your two samples.
However, what is noticeable in your sample is the "shadows" around each of those letters on the left, giving the impression of extra thickness.
Initially I would have expected that to be caused by the difference between JPEG (haloed lettering) and PNG (crisp anti-aliased outlines). But the shadow is too regular, i.e. not uneven as it would normally be in a JPEG.
At this stage it looks like there may be some other reason for such fuzzy fonts.
Without a sample I would have to guess that the PDF potentially has a font with an alpha component, but I could be way off with such a wild assumption.
Later Edit
Thanks for your link, but the mystery deepens: even enlarged, that linked PDF shows no evidence of any shadows in Chromium Edge. Then again, the maths looks like vector outlines; only the middle Tahoma appears to be a font, and the embedded one, as generated by Skia/PDF and thus built by Chrome?
I have to agree there is some other influence somewhere down the line, but the browser should not affect the PDF unless it adds or respects some overlay based on an extra component. Looking at each letter there are two objects, so although I cannot see the shadow within the editor I accept it is there; it must be placed by the text outline generator.
Here I have moved and coloured some glyphs, so the second edge is deliberate, but most viewers would not show them as a "GLOW".
You mentioned diagrams.net, which does have many shadow options, but I have never experienced any other than those deliberately set to right and down. Perhaps look for a rogue setting there.
In summary, the file is declared as compatible with version 1.4 (which may have transparency), and clearly some transparent objects have been included around each letter, but not in a fashion expected by all viewers. As a result of mkl's observation I retested the PDF in many viewers with the settings that could have an effect, such as vector line thickening in Acrobat. However, none I tested showed the extra thick outlines, so the PDF seems valid, but some of the PDF viewer apps you are using seem to thicken the anti-aliasing much more than should be expected for a single-pixel boundary.
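As a reproduction note: one convenient way to inspect content streams like the one quoted above is qpdf's QDF mode, which rewrites the file with uncompressed, human-readable streams. A sketch, assuming qpdf is installed and the file is named math_structure.pdf:
qpdf --qdf --object-streams=disable math_structure.pdf inspectable.pdf
The page content streams in inspectable.pdf can then be read in a text editor; searching for the gs operator locates the /G3 and /G7 ExtGState switches quoted above.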

Taking the last two rows' minimum value

I have this data frame:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777 250 810
A 17-07-19 9 637 121 529
A 18-07-19 7 878 786 406
A 19-07-19 4 656 140 204
A 20-07-19 2 295 272 490
A 21-07-19 3 778 600 544
A 22-07-19 6 741 792 907
B 01-07-19 4 509 690 406
B 02-07-19 2 732 915 199
B 03-07-19 2 413 725 414
B 04-07-19 2 170 702 912
B 09-08-19 3 851 616 477
B 10-08-19 9 475 447 555
B 11-08-19 1 412 403 708
B 12-08-19 2 299 537 321
B 13-08-19 4 310 119 125
C 01-12-18 4 912 755 657
C 02-12-18 4 586 771 394
C 04-12-18 9 498 122 193
C 05-12-18 2 500 528 764
C 06-12-18 1 982 383 654
C 07-12-18 1 299 496 488
C 08-12-18 3 336 691 496
C 09-12-18 3 206 433 263
C 10-12-18 2 373 319 111
I want to show the minimum of the current row's and previous row's values, for each column in the 123_Var 456_Var 789_Var set.
That should be applied separately for each ID. (Groupby.)
The first row of each ID will show the current value. (Since there's no "previous" value to compare.)
Expected result:
ID Date X 123_Var 456_Var 789_Var 123_Min2 456_Min2 789_Min2
A 16-07-19 3 777 250 810 777 250 810
A 17-07-19 9 637 121 529 637 121 529
A 18-07-19 7 878 786 406 637 121 406
A 19-07-19 4 656 140 204 656 140 204
A 20-07-19 2 295 272 490 295 140 204
A 21-07-19 3 778 600 544 295 272 490
A 22-07-19 6 741 792 907 741 600 544
B 01-07-19 4 509 690 406 509 690 406
B 02-07-19 2 732 915 199 509 690 199
B 03-07-19 2 413 725 414 413 725 199
B 04-07-19 2 170 702 912 170 702 414
B 09-08-19 3 851 616 477 170 616 477
B 10-08-19 9 475 447 555 475 447 477
B 11-08-19 1 412 403 708 412 403 555
B 12-08-19 2 299 537 321 299 403 321
B 13-08-19 4 310 119 125 299 119 125
C 01-12-18 4 912 755 657 912 755 657
C 02-12-18 4 586 771 394 586 755 394
C 04-12-18 9 498 122 193 498 122 193
C 05-12-18 2 500 528 764 498 122 193
C 06-12-18 1 982 383 654 500 383 654
C 07-12-18 1 299 496 488 299 383 488
C 08-12-18 3 336 691 496 299 496 488
C 09-12-18 3 206 433 263 206 433 263
C 10-12-18 2 373 319 111 206 319 111
IIUC, we use groupby.shift to select the previous row's values for each ID, then DataFrame.where to leave only the cells where the previous value is lower than the current value, filling with the current value in the rest. We use DataFrame.add_suffix to add _Min2 and join back to df with DataFrame.join:
df_vars = df[['123_Var', '456_Var', '789_Var']]
df = df.join(df.groupby('ID')[['123_Var', '456_Var', '789_Var']]
               .shift()
               .fillna(df_vars)
               .where(lambda x: x.le(df_vars), df_vars)
               .add_suffix('_Min2'))
print(df)
Output
ID Date X 123_Var 456_Var 789_Var 123_Var_Min2 456_Var_Min2 789_Var_Min2
0 A 16-07-19 3 777 250 810 777.0 250.0 810.0
1 A 17-07-19 9 637 121 529 637.0 121.0 529.0
2 A 18-07-19 7 878 786 406 637.0 121.0 406.0
3 A 19-07-19 4 656 140 204 656.0 140.0 204.0
4 A 20-07-19 2 295 272 490 295.0 140.0 204.0
5 A 21-07-19 3 778 600 544 295.0 272.0 490.0
6 A 22-07-19 6 741 792 907 741.0 600.0 544.0
7 B 01-07-19 4 509 690 406 509.0 690.0 406.0
8 B 02-07-19 2 732 915 199 509.0 690.0 199.0
9 B 03-07-19 2 413 725 414 413.0 725.0 199.0
10 B 04-07-19 2 170 702 912 170.0 702.0 414.0
11 B 09-08-19 3 851 616 477 170.0 616.0 477.0
12 B 10-08-19 9 475 447 555 475.0 447.0 477.0
13 B 11-08-19 1 412 403 708 412.0 403.0 555.0
14 B 12-08-19 2 299 537 321 299.0 403.0 321.0
15 B 13-08-19 4 310 119 125 299.0 119.0 125.0
16 C 01-12-18 4 912 755 657 912.0 755.0 657.0
17 C 02-12-18 4 586 771 394 586.0 755.0 394.0
18 C 04-12-18 9 498 122 193 498.0 122.0 193.0
19 C 05-12-18 2 500 528 764 498.0 122.0 193.0
20 C 06-12-18 1 982 383 654 500.0 383.0 654.0
21 C 07-12-18 1 299 496 488 299.0 383.0 488.0
22 C 08-12-18 3 336 691 496 299.0 496.0 488.0
23 C 09-12-18 3 206 433 263 206.0 433.0 263.0
24 C 10-12-18 2 373 319 111 206.0 319.0 111.0
Case 2: if you want to check the n previous rows, use groupby.rolling:
n = 3
df = df.join(df.groupby('ID')[['123_Var', '456_Var', '789_Var']]
               .rolling(n, min_periods=1).min()
               .reset_index(drop=True)
               .add_suffix(f'_Min{n}'))
print(df)
ID Date X 123_Var 456_Var 789_Var 123_Var_Min3 456_Var_Min3 789_Var_Min3
0 A 16-07-19 3 777 250 810 777.0 250.0 810.0
1 A 17-07-19 9 637 121 529 637.0 121.0 529.0
2 A 18-07-19 7 878 786 406 637.0 121.0 406.0
3 A 19-07-19 4 656 140 204 637.0 121.0 204.0
4 A 20-07-19 2 295 272 490 295.0 121.0 204.0
5 A 21-07-19 3 778 600 544 295.0 140.0 204.0
6 A 22-07-19 6 741 792 907 295.0 140.0 204.0
7 B 01-07-19 4 509 690 406 509.0 690.0 406.0
8 B 02-07-19 2 732 915 199 509.0 690.0 199.0
9 B 03-07-19 2 413 725 414 413.0 690.0 199.0
10 B 04-07-19 2 170 702 912 170.0 690.0 199.0
11 B 09-08-19 3 851 616 477 170.0 616.0 199.0
12 B 10-08-19 9 475 447 555 170.0 447.0 414.0
13 B 11-08-19 1 412 403 708 170.0 403.0 477.0
14 B 12-08-19 2 299 537 321 299.0 403.0 321.0
15 B 13-08-19 4 310 119 125 299.0 119.0 125.0
16 C 01-12-18 4 912 755 657 912.0 755.0 657.0
17 C 02-12-18 4 586 771 394 586.0 755.0 394.0
18 C 04-12-18 9 498 122 193 498.0 122.0 193.0
19 C 05-12-18 2 500 528 764 498.0 122.0 193.0
20 C 06-12-18 1 982 383 654 498.0 122.0 193.0
21 C 07-12-18 1 299 496 488 299.0 122.0 193.0
22 C 08-12-18 3 336 691 496 299.0 383.0 488.0
23 C 09-12-18 3 206 433 263 206.0 383.0 263.0
24 C 10-12-18 2 373 319 111 206.0 319.0 111.0
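As a side note, the first (shift-based) approach can also be written with numpy's elementwise minimum instead of where; a small sketch, assuming import numpy as np and the df from the question:
cols = ['123_Var', '456_Var', '789_Var']
# previous row per ID; each group's first row has no predecessor, so fall back to the current value
prev = df.groupby('ID')[cols].shift().fillna(df[cols])
# elementwise minimum of current and previous, renamed *_Min2 like the expected output
mins = np.minimum(df[cols], prev).astype(int)
mins.columns = [c.replace('Var', 'Min2') for c in cols]
df = df.join(mins)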
A quite elegant solution is to apply rolling(2).min() to each group,
but to avoid the first row of NaN in each group, this first row
should be "replicated" from the source group.
To do your task, start from defining the following function:
def fnMin2(grp):
    rv = pd.concat([pd.DataFrame([grp.iloc[0, -3:]]),
                    grp[['123_Var', '456_Var', '789_Var']].rolling(2).min().iloc[1:]])\
           .astype('int')
    rv.columns = [it.replace('Var', 'Min2') for it in rv.columns]
    return grp.join(rv)
Then apply it to each group:
df.groupby('ID').apply(fnMin2)
Note that column names assigned to new columns in my solution are
just as you wish, contrary to the solution you accepted.
import numpy as np
import pandas as pd

# previous row's values, taken within each ID so values don't leak across groups
prev = df.groupby('ID')[['123_Var', '456_Var', '789_Var']].shift(1)
# this compares the current row to the previous row
ext = df[['123_Var', '456_Var', '789_Var']].gt(prev)
# simply renamed the columns here
ext.columns = ['123_min', '456_min', '789_min']
# join the two dataframes by columns
M = pd.concat([df, ext], axis=1)
# based on the conditions: if False, use the value from the current row,
# else use the value from the previous row
M['123_min'] = np.where(M['123_min'] == 0, M['123_Var'], prev['123_Var'])
M['456_min'] = np.where(M['456_min'] == 0, M['456_Var'], prev['456_Var'])
M['789_min'] = np.where(M['789_min'] == 0, M['789_Var'], prev['789_Var'])

SQL Hierarchy Visualization

I have the following problem:
There exists an entity called Branch.
A branch may belong to another branch or may be a standalone branch.
A parent branch may belong to another branch or may be the highest level branch.
There may be up to 4-5 levels of hierarchy.
There are no loop hierarchies (that we know of as of right now).
I want to somehow export the data from SQL and visualize it as some sort of tree diagram. Any ideas are highly appreciated.
Here is a snapshot of my data model. Note that when DivisionParentBranch = RegionParentBranch = Branch, this implies that the branch is standalone.
DivisionParentBranch RegionParentBranch Branch
150 401 401
150 401 402
150 401 403
150 401 404
273 248 248
273 248 277
273 248 278
273 273 273
273 273 286
273 273 408
273 273 809
356 356 356
356 356 358
356 356 363
356 356 405
356 356 773
356 357 357
356 361 361
356 361 364
739 511 511
739 511 513
739 511 514
739 511 515
739 511 517
739 511 519
739 511 520
739 511 779
UPDATE:
The expected result is to visualize these branch hierarchies, something along the lines of the image below. We have around 500+ branches, so this needs to be automated somehow.
This isn't pretty, and I am sure someone could do better using GROUP BY ROLLUP (a sketch of that follows the sample output below), but starting with the output of the select below you could loop through the result set and build a display based on the hierarchy. Level 1 would be the topmost part of the tree, level 2 would link to level 1 and be the second row, etc.
Also, as mentioned in the comments, please don't post 'data' as screen prints.
SELECT div_branch, NULL AS reg_branch, NULL AS branch, '1' AS level
FROM #branches
GROUP BY div_branch
UNION ALL
SELECT div_branch, reg_branch, NULL AS branch, '2' AS level
FROM #branches
GROUP BY div_branch, reg_branch
UNION ALL
SELECT div_branch, reg_branch, branch, '3' AS level
FROM #branches
GROUP BY div_branch, reg_branch, branch
My output, at least the little bit I bothered to test with, looks like this:
div_branch reg_branch branch level
150 NULL NULL 1
273 NULL NULL 1
356 NULL NULL 1
150 150 NULL 2
150 401 NULL 2
273 248 NULL 2
273 273 NULL 2
356 356 NULL 2
356 357 NULL 2
356 361 NULL 2
150 150 150 3
150 150 151 3
150 150 153 3
150 150 154 3
150 150 961 3
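For what it's worth, here is roughly how the GROUP BY ROLLUP variant hinted at above might look; an untested sketch against the same #branches table, using GROUPING_ID to derive the level in a single pass:
SELECT div_branch,
       reg_branch,
       branch,
       CASE GROUPING_ID(reg_branch, branch)
            WHEN 3 THEN '1'   -- reg_branch and branch rolled up: division level
            WHEN 1 THEN '2'   -- branch rolled up: region level
            ELSE '3'          -- full detail rows
       END AS level
FROM #branches
GROUP BY ROLLUP (div_branch, reg_branch, branch)
HAVING GROUPING(div_branch) = 0   -- drop the grand-total row
ORDER BY div_branch, level, reg_branch, branch;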

MPI_sendrecv changes loop index with -O1 flag [duplicate]

This question already has an answer here:
MPI_Recv overwrites parts of memory it should not access
Despite having written long, heavily parallelized codes with complicated sends/receives over three-dimensional arrays, this simple code with a two-dimensional array of integers has me at my wits' end. I combed Stack Overflow for possible solutions and found one that slightly resembled the issue I am having:
Boost.MPI: What's received isn't what was sent!
However, the solutions seem to point to the looping segment of code as the culprit for overwriting sections of memory. But this one seems to act even stranger. Maybe it is a careless oversight of some simple detail on my part. The problem is with the code below:
program main
    implicit none
    include 'mpif.h'

    integer :: i, j
    integer :: counter, offset
    integer :: rank, ierr, stVal
    integer, dimension(10, 10) :: passMat, prntMat   !! passMat CONTAINS VALUES TO BE PASSED TO prntMat

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

    counter = 0
    offset = (rank + 1)*300
    do j = 1, 10
        do i = 1, 10
            prntMat(i, j) = 10                   !! prntMat OF BOTH RANKS CONTAINS 10
            passMat(i, j) = offset + counter     !! passMat OF rank=0 CONTAINS 300..399 AND rank=1 CONTAINS 600..699
            counter = counter + 1
        end do
    end do

    if (rank == 1) then
        call MPI_SEND(passMat(1:10, 1:10), 100, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)   !! SEND passMat OF rank=1 TO rank=0
    else
        call MPI_RECV(prntMat(1:10, 1:10), 100, MPI_INTEGER, 1, 1, MPI_COMM_WORLD, stVal, ierr)
        do i = 1, 10
            print *, prntMat(:, i)
        end do
    end if

    call MPI_FINALIZE(ierr)
end program main
When I compile the code with mpif90 with no flags and run it on my machine with mpirun -np 2, I get the following output with wrong values in the first four indices of the array:
0 0 400 0 604 605 606 607 608 609
610 611 612 613 614 615 616 617 618 619
620 621 622 623 624 625 626 627 628 629
630 631 632 633 634 635 636 637 638 639
640 641 642 643 644 645 646 647 648 649
650 651 652 653 654 655 656 657 658 659
660 661 662 663 664 665 666 667 668 669
670 671 672 673 674 675 676 677 678 679
680 681 682 683 684 685 686 687 688 689
690 691 692 693 694 695 696 697 698 699
However, when I compile it with the same compiler but with the -O3 flag on, I get the correct output:
600 601 602 603 604 605 606 607 608 609
610 611 612 613 614 615 616 617 618 619
620 621 622 623 624 625 626 627 628 629
630 631 632 633 634 635 636 637 638 639
640 641 642 643 644 645 646 647 648 649
650 651 652 653 654 655 656 657 658 659
660 661 662 663 664 665 666 667 668 669
670 671 672 673 674 675 676 677 678 679
680 681 682 683 684 685 686 687 688 689
690 691 692 693 694 695 696 697 698 699
This error is machine dependent. The issue turns up only on my system running Ubuntu 14.04.2 with OpenMPI 1.6.5.
I tried this on other systems running RedHat and CentOS, and the code ran well with and without the -O3 flag. Curiously, those machines use an older version of OpenMPI, 1.4.
I am guessing that the -O3 flag is performing some odd optimization that is modifying the manner in which arrays are being passed between the processes.
I also tried other versions of array allocation. The above code uses explicit-shape arrays. With assumed-shape and allocatable arrays I get equally bizarre results, if not more so, with some of them seg-faulting. I tried using Valgrind to trace the origin of these seg-faults, but I still haven't gotten the hang of keeping Valgrind from giving false positives when running MPI programs.
I believe that resolving the difference in performance of the above code will help me understand the tantrums of my other codes as well.
Any help would be greatly appreciated! This code has really gotten me questioning if all the other MPI codes I wrote are sound at all.
Using the Fortran 90 interface to MPI reveals a mismatch in your call to MPI_RECV:
call MPI_RECV(prntMat(1:10, 1:10), 100, MPI_INTEGER, 1, 1, MPI_COMM_WORLD, stVal, ierr)
1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)
This is because the status variable stVal is an integer scalar, rather than an array of MPI_STATUS_SIZE. The F77 interface (include 'mpif.h') to MPI_RECV is:
INCLUDE 'mpif.h'
MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)
<type> BUF(*)
INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
INTEGER STATUS(MPI_STATUS_SIZE), IERROR
Changing
integer :: rank, ierr, stVal
to
integer :: rank, ierr, stVal(mpi_status_size)
produces a program that works as expected, tested with gfortran 5.1 and OpenMPI 1.8.5.
Using the F90 interface (use mpi vs include "mpif.h") lets the compiler detect the mismatched arguments at compile time rather than producing confusing runtime problems.
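For reference, a minimal sketch of a corrected program using the F90 interface (assuming an MPI installation that ships the mpi module; compile with mpif90 and run with mpirun -np 2):
program main
    use mpi                                  ! F90 interface instead of include 'mpif.h'
    implicit none
    integer :: rank, ierr
    integer :: stVal(MPI_STATUS_SIZE)        ! status is an array, not a scalar
    integer, dimension(10, 10) :: mat

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    mat = rank
    if (rank == 1) then
        call MPI_SEND(mat, 100, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
    else if (rank == 0) then
        call MPI_RECV(mat, 100, MPI_INTEGER, 1, 1, MPI_COMM_WORLD, stVal, ierr)
        print *, mat(1, 1)                   ! prints 1: the buffer from rank 1, with no memory corruption
    end if
    call MPI_FINALIZE(ierr)
end program main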

Group clause in SQL command

I have 3 tables: Deliveries, IssuedWarehouse, ReturnedStock.
Deliveries: ID, OrderNumber, Material, Width, Gauge, DelKG
IssuedWarehouse: OrderNumber, IssuedKG
ReturnedStock: OrderNumber, IssuedKG
What I'd like to do is group all the orders by Material, Width and Gauge and then sum the amount delivered, issued to the warehouse and issued back to stock.
This is the SQL that is really quite close:
SELECT
DELIVERIES.Material,
DELIVERIES.Width,
DELIVERIES.Gauge,
Count(DELIVERIES.OrderNo) AS [Orders Placed],
Sum(DELIVERIES.DeldQtyKilos) AS [KG Delivered],
Sum(IssuedWarehouse.[Qty Issued]) AS [Film Issued],
Sum([Film Retns].[Qty Issued]) AS [Film Returned],
[KG Delivered]-[Film Issued]+[Film Returned] AS [Qty Remaining]
FROM (DELIVERIES
INNER JOIN IssuedWarehouse
ON DELIVERIES.OrderNo = IssuedWarehouse.[Order No From])
INNER JOIN [Film Retns]
ON DELIVERIES.OrderNo = [Film Retns].[Order No From]
GROUP BY Material, Width, Gauge, ActDelDate
HAVING ActDelDate Between [start date] And [end date]
ORDER BY DELIVERIES.Material;
This groups the products almost perfectly. However if you take a look at the results:
Material Width Gauge Orders Placed Delivered Qnty Kilos Film Issued Film Returned Qty Remaining
COEX-GLOSS 590 75 1 534 500 124 158
COEX-MATT 1080 80 1 4226 4226 52 52
CPP 660 38 8 6720 2768 1384 5336
CPP 666 47 1 5677 5716 536 497
CPP 690 65 2 1232 717 202 717
CPP 760 38 3 3444 1318 510 2636
CPP 770 38 4 4316 3318 2592 3590
CPP 786 38 2 672 442 212 442
CPP 800 47 1 1122 1122 116 116
CPP 810 47 1 1127 1134 69 62
CPP 810 47 2 2250 1285 320 1285
CPP 1460 38 12 6540 4704 2442 4278
LD 975 75 1 502 502 182 182
LDPE 450 50 1 252 252 50 50
LDPE 520 70 1 250 250 95 95
LDPE 570 65 2 504 295 86 295
LDPE 570 65 2 508 278 48 278
LDPE 620 50 1 252 252 67 67
LDPE 660 50 1 256 256 62 62
LDPE 670 75 1 248 248 80 80
LDPE 690 47 1 476 476 390 390
LDPE 790 38 2 2104 1122 140 1122
LDPE 790 50 1 286 286 134 134
LDPE 790 50 1 250 250 125 125
LDPE 810 30 1 4062 4062 100 100
LDPE 843 33 1 408 408 835 835
LDPE 850 80 1 412 412 34 34
LDPE 855 30 1 740 740 83 83
LDPE 880 60 1 304 304 130 130
LDPE 900 70 2 1000 650 500 850
LDPE 1017 60 1 1056 1056 174 174
OPP 25 1100 1 381 381 95 95
OPP 1000 30 2 1358 1112 300 546
OPP 1000 30 1 1492 1491 100 101
OPP 1200 20 1 418 417 461 462
PET 760 12 3 1227 1876 132 -517
You'll see that there are some materials that have the same width and gauge yet they are not grouped. I think this is because the delivered qty is different on the orders. For example:
Material Width Gauge Orders Placed Delivered Qnty Kilos Film Issued Film Returned Qty Remaining
LDPE 620 50 1 252 252 67 67
LDPE 660 50 1 256 256 62 62
I would like these two rows to be grouped. They have the same material, width and gauge but the delivered qty is different therefore it hasn't grouped it.
Can anyone help me group these strange rows?
Your "problem" is that the deliveries occurred on different dates, and you're grouping by ActDelDate so the data splits, but because you haven't selected the ActDelDate column, this isn't obvious.
The fix is: Remove ActDelDate from the group by list
You should also remove the unnecessary brackets around the first join, and change
HAVING ActDelDate Between [start date] And [end date]
to
WHERE ActDelDate Between [start date] And [end date]
and have it before the GROUP BY
You are grouping by the delivery date, which is causing the rows to be split. Either omit the delivery date from the results and group by, or take the min/max of the delivery date.
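Putting the advice together, the corrected query would look something like the sketch below (untested; same table and column names as the question, with the [Qty Remaining] alias arithmetic expanded into aggregates since most engines do not allow referencing select-list aliases in the same SELECT):
SELECT
    DELIVERIES.Material,
    DELIVERIES.Width,
    DELIVERIES.Gauge,
    Count(DELIVERIES.OrderNo) AS [Orders Placed],
    Sum(DELIVERIES.DeldQtyKilos) AS [KG Delivered],
    Sum(IssuedWarehouse.[Qty Issued]) AS [Film Issued],
    Sum([Film Retns].[Qty Issued]) AS [Film Returned],
    Sum(DELIVERIES.DeldQtyKilos) - Sum(IssuedWarehouse.[Qty Issued])
        + Sum([Film Retns].[Qty Issued]) AS [Qty Remaining]
FROM DELIVERIES
INNER JOIN IssuedWarehouse
    ON DELIVERIES.OrderNo = IssuedWarehouse.[Order No From]
INNER JOIN [Film Retns]
    ON DELIVERIES.OrderNo = [Film Retns].[Order No From]
WHERE DELIVERIES.ActDelDate BETWEEN [start date] AND [end date]  -- row filter in WHERE, not HAVING
GROUP BY DELIVERIES.Material, DELIVERIES.Width, DELIVERIES.Gauge -- ActDelDate removed
ORDER BY DELIVERIES.Material;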