Pig script to transpose on the basis of certain criteria - apache-pig

I have a file containing data in the following format:
abc 123 456
cde 45 32
efg 322 654
abc 445 856
cde 65 21
efg 147 384
abc 815 078
efg 843 286
and so on.
How can I transpose it into the following format using Pig?
abc 123 456 cde 45 32 efg 322 654
abc 445 856 cde 65 21 efg 147 384
abc 815 078 efg 843 286
Also, if cde is missing after abc, blank spaces should be inserted in its place, since this is a fixed-width file.
I tried grouping, but it didn't work for me.

Well, you can do it by writing a custom loader. The easiest approach is to extend PigStorage and override its getNext() method so that it calls the record reader three times instead of once and returns a single combined Tuple.
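The loader approach above is Java code built on Pig's API. Independent of that API, the grouping such a getNext() has to perform can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a Pig loader: a new output row starts at every `abc` record (or when a record type repeats), the record types always appear in the order abc, cde, efg, and `SLOT_WIDTH = 11` is a hypothetical fixed width for one record slot.

```python
# Assumed fixed order of record types within one output row
KEYS = ("abc", "cde", "efg")
# Hypothetical fixed width of one record slot ("abc 123 456" is 11 characters)
SLOT_WIDTH = 11


def transpose_records(lines, keys=KEYS, slot_width=SLOT_WIDTH):
    """Merge consecutive records into one fixed-width row per group.

    A new row starts at every record whose key is keys[0], or when a key
    repeats inside the current row. Missing record types become blanks;
    records whose key is not in `keys` do not appear in the output.
    """
    groups = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key = line.split()[0]
        if not groups or key == keys[0] or key in groups[-1]:
            groups.append({})
        groups[-1][key] = line
    # Pad every slot to slot_width so each output row has the same width
    return [
        " ".join(group.get(key, "").ljust(slot_width) for key in keys)
        for group in groups
    ]


if __name__ == "__main__":
    sample = [
        "abc 123 456", "cde 45 32", "efg 322 654",
        "abc 445 856", "cde 65 21", "efg 147 384",
        "abc 815 078", "efg 843 286",
    ]
    for row in transpose_records(sample):
        print(repr(row))
```

Reading until the next `abc` record, rather than exactly three records, is what lets the sketch leave a blank slot when cde is missing instead of pulling the following efg into the wrong position.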

Related

How to plot a separate pie chart for each row of a pandas DataFrame

I would like to create one pie chart per row, so that each pie chart contains the 3 different columns for that row's year.
I managed to create the pie charts, but they are all squeezed together in one graph. How can I separate them?
This is my dataset:
sector year Total in Practice (OT) Total in Practice (SLP) Total in Practice (SLP)
0 2014 123 400 123
1 2015 234 456 123
2 2016 345 484 345
3 2017 345 539 566
4 2018 453 565 123
5 2019 454 598 234
6 2020 453 626 243
7 2021 755 682 243
This is my code:
df_all.T.plot.pie(df_all,subplots=True, figsize=(10, 3))
This is how my plot ends up: [plot image not available]

Dynamically calculate difference columns based on a slicer - Power BI

I have a table with quarterly volume data, and a slicer that lets you choose which quarter/year you want to see the volume per code for. The slicer has selections from 2019Q1 through 2021Q4. I need to create a dynamic difference column that adjusts depending on which quarter/year is selected in the slicer. I know I need to create a new measure using CALCULATE/FILTER, but I am a beginner in Power BI and unsure how to write that formula.
Example of raw table data:
Code 2019Q1 2019Q2 2019Q3 2019Q4 2020Q1 2020Q2 2020Q3 2020Q4
11111 232 283 289 19 222 283 289 19
22222 117 481 231 31 232 286 2 19
11111 232 397 94 444 232 553 0 188
22222 117 411 15 14 232 283 25 189
Example if 2019Q1 and 2020Q1 are selected:
Code 2019Q1 2020Q1 Difference
11111 232 222 10
22222 117 481 -364
11111 232 397 -165
22222 117 411 -294
Power BI doesn't work that way; this is an Excel pivot-table setup. You don't have any column to distinguish the first and third rows, or the second and fourth rows. They have the same code, so Power BI will aggregate their volumes. You could introduce a hidden index column, but then why not simply stick to Excel? The Power BI approach to the problem would be to unpivot (stack) your table into Code, Quarter and Volume columns, create 2 independent slicer tables for the minuend and the subtrahend, and then CALCULATE your aggregated differences based on the SELECTEDVALUE of the 2 slicers.
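To make the answer's two points concrete (rows sharing a Code get aggregated, and the difference is computed on an unpivoted Code/Quarter/Volume table), here is a plain-Python sketch of the same data flow. It is not DAX: the `minuend` and `subtrahend` arguments are hypothetical stand-ins for the SELECTEDVALUE of the two slicers, and the function names are illustrative only.

```python
QUARTERS = ["2019Q1", "2019Q2", "2019Q3", "2019Q4",
            "2020Q1", "2020Q2", "2020Q3", "2020Q4"]

# Wide table from the question: one row per code, one volume per quarter
WIDE = [
    ("11111", [232, 283, 289, 19, 222, 283, 289, 19]),
    ("22222", [117, 481, 231, 31, 232, 286, 2, 19]),
    ("11111", [232, 397, 94, 444, 232, 553, 0, 188]),
    ("22222", [117, 411, 15, 14, 232, 283, 25, 189]),
]


def unpivot(wide, quarters=QUARTERS):
    """Stack the wide table into long (code, quarter, volume) rows."""
    return [(code, q, v) for code, volumes in wide for q, v in zip(quarters, volumes)]


def difference(long_rows, minuend, subtrahend):
    """Per code: total volume in `minuend` minus total volume in `subtrahend`.

    Rows that share a code are summed, mirroring Power BI's aggregation.
    """
    totals = {}
    for code, quarter, volume in long_rows:
        pair = totals.setdefault(code, [0, 0])
        if quarter == minuend:
            pair[0] += volume
        if quarter == subtrahend:
            pair[1] += volume
    return {code: a - b for code, (a, b) in totals.items()}


if __name__ == "__main__":
    print(difference(unpivot(WIDE), "2019Q1", "2020Q1"))
```

With 2019Q1 and 2020Q1 selected, code 11111 yields a single aggregated difference (464 - 454 = 10) rather than one value per source row, which is why the rows can only be kept apart with something like the index column the answer mentions.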

Mark the record with the lowest value in a group in SQL

I have a table that looks like the one below:
ID ID2 Name
111 223 ABC
111 225 ABC
111 227 ABC
113 234 DEF
113 242 DEF
113 248 DEF
113 259 DEF
113 288 DEF
What I am trying to achieve is to mark, in a SELECT statement, the record that has the lowest value in the ID2 column within every ID1 group, e.g.:
ID1 ID2 Name R
111 223 ABC Y
111 225 ABC
111 227 ABC
113 234 DEF Y
113 242 DEF
113 248 DEF
113 259 DEF
113 288 DEF
116 350 GHI Y
116 356 GHI
How do I achieve this in a SELECT statement?
Window functions should do the trick. Use dense_rank() if you want to see ties.
Select *
,R = case when row_number() over (partition by ID1,Name order by ID2) = 1
then 'Y'
else ''
end
From YourTable
I should add: window functions can be invaluable, and they are well worth your time to experiment with.
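The query above uses SQL Server's `R = ...` alias syntax. To try the same ROW_NUMBER() idea without a SQL Server instance, here is a sketch using Python's built-in sqlite3 module (SQLite supports window functions from version 3.25), with the standard `AS R` alias instead. `YourTable` is the answer's placeholder name; the `ranking` parameter is a hypothetical convenience for swapping in DENSE_RANK() to flag ties, as the answer suggests.

```python
import sqlite3

# Rows from the question's expected-output table: (ID1, ID2, Name)
ROWS = [
    (111, 223, "ABC"), (111, 225, "ABC"), (111, 227, "ABC"),
    (113, 234, "DEF"), (113, 242, "DEF"), (113, 248, "DEF"),
    (113, 259, "DEF"), (113, 288, "DEF"),
    (116, 350, "GHI"), (116, 356, "GHI"),
]

# Whitelist, because the function name is spliced into the SQL text
RANKINGS = {"ROW_NUMBER", "RANK", "DENSE_RANK"}


def mark_lowest(rows, ranking="ROW_NUMBER"):
    """Return (ID1, ID2, Name, R) rows with R = 'Y' on the lowest ID2 per (ID1, Name)."""
    if ranking not in RANKINGS:
        raise ValueError(f"unsupported ranking function: {ranking}")
    query = f"""
        SELECT ID1, ID2, Name,
               CASE WHEN {ranking}() OVER (PARTITION BY ID1, Name ORDER BY ID2) = 1
                    THEN 'Y' ELSE '' END AS R
        FROM YourTable
        ORDER BY ID1, ID2
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE YourTable (ID1 INTEGER, ID2 INTEGER, Name TEXT)")
        conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in mark_lowest(ROWS):
        print(row)
```

With ROW_NUMBER() exactly one row per group is marked even when two rows share the lowest ID2; with DENSE_RANK() (or RANK()) every tied row is marked.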

PSQL - Find all values and make them unique based on non-unique values in another column

I have a view with multiple columns and need to update the values in column CHILD so that every distinct value in column PARENT has a unique value in column CHILD. Where that is not the case, the CHILD value should be updated by adding a few characters before the '-'.
Example: Initial table
ID | PARENT | CHILD
1 | ABC - 123 | BBB - 364
2 | ABC - 123 | BBB - 364
3 | GHI - 789 | BBB - 364
4 | JKL - 343 | NNN - 679
5 | MNO - 524 | NNN - 679
6 | PQR - 785 | YYY - 678
7 | STU - 765 | MMM - 687
Final result:
ID | PARENT | CHILD
1 | ABC - 123 | BBBA - 364
2 | ABC - 123 | BBBA - 364
3 | GHI - 789 | BBB - 364
4 | JKL - 343 | NNNQ - 679
5 | MNO - 524 | NNN - 679
6 | PQR - 785 | YYY - 678
7 | STU - 765 | MMM - 687
You should write a PL/pgSQL function in PostgreSQL that uses a cursor to process each distinct value in the PARENT column, iterating in a loop to update the value in the CHILD column. You may need to append letters or digits to make the values unique. Read this article on how to use cursors to iterate row by row and update the tuples in your table.
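The answer describes a PL/pgSQL cursor loop. The underlying rule (collect the distinct PARENT values that share each CHILD value, then rename the CHILD for all but one of them) can be sketched in plain Python to check the expected result before writing the database function. This is not PL/pgSQL, and the choices are assumptions: the letter is inserted right before `" - "`, the last distinct parent (by first appearance) keeps the original value, and suffixes are tried A, B, C, ... skipping any value that already exists. The question's choice of `Q` for NNN is not reproduced.

```python
import string


def make_child_unique(rows):
    """rows: list of (id, parent, child) tuples; returns a new list with CHILD disambiguated."""
    # Distinct PARENT values per CHILD value, in order of first appearance
    parents_by_child = {}
    for _, parent, child in rows:
        parents = parents_by_child.setdefault(child, [])
        if parent not in parents:
            parents.append(parent)

    taken = {child for _, _, child in rows}  # never produce a value that already exists
    renamed = {}  # (parent, old child) -> new child
    for child, parents in parents_by_child.items():
        prefix, sep, rest = child.partition(" - ")
        letters = iter(string.ascii_uppercase)
        # Every parent except the last one gets a suffixed prefix
        for parent in parents[:-1]:
            for letter in letters:
                candidate = f"{prefix}{letter}{sep}{rest}"
                if candidate not in taken:
                    break
            else:
                raise ValueError(f"ran out of suffix letters for {child!r}")
            taken.add(candidate)
            renamed[(parent, child)] = candidate
    return [(rid, parent, renamed.get((parent, child), child)) for rid, parent, child in rows]


if __name__ == "__main__":
    initial = [
        (1, "ABC - 123", "BBB - 364"), (2, "ABC - 123", "BBB - 364"),
        (3, "GHI - 789", "BBB - 364"), (4, "JKL - 343", "NNN - 679"),
        (5, "MNO - 524", "NNN - 679"), (6, "PQR - 785", "YYY - 678"),
        (7, "STU - 765", "MMM - 687"),
    ]
    for row in make_child_unique(initial):
        print(row)
```

On the initial table this gives BBBA - 364 for IDs 1 and 2 and NNNA - 679 for ID 4, leaving the other rows unchanged; the same per-group logic is what the cursor loop would apply before issuing its UPDATEs.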

Matching customer id values in PostgreSQL

I'm new to SQL/PostgreSQL and have been hunting all over for help with a query that finds only the matching id values in a table, so that I can pull data from another table for a learning project. I have been trying to use COUNT, which doesn't seem right, and I'm struggling with GROUP BY.
Here is my table:
id acct_num sales_tot cust_id date_of_purchase
1 9001 1106.10 116 12-Jan-00
2 9002 645.22 125 13-Jan-00
3 9003 1096.18 137 14-Jan-00
4 9004 1482.16 118 15-Jan-00
5 9005 1132.88 141 16-Jan-00
6 9006 482.16 137 17-Jan-00
7 9007 1748.65 147 18-Jan-00
8 9008 3206.29 122 19-Jan-00
9 9009 1184.16 115 20-Jan-00
10 9010 2198.25 133 21-Jan-00
11 9011 769.22 141 22-Jan-00
12 9012 2639.17 117 23-Jan-00
13 9013 546.12 122 24-Jan-00
14 9014 3149.18 116 25-Jan-00
I'm trying to write a simple query that finds only the matching customer IDs and shows them in the query window.
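Assuming "matching" means cust_id values that appear more than once, the usual pattern combines the COUNT and GROUP BY the question mentions with a HAVING filter. The sketch below runs the queries through Python's built-in sqlite3 module so it is self-contained. The table name `sales` is hypothetical, since the question does not name the table, and both queries use only standard GROUP BY/HAVING syntax that PostgreSQL also accepts.

```python
import sqlite3

# Rows from the question: (id, acct_num, sales_tot, cust_id, date_of_purchase)
SALES = [
    (1, 9001, 1106.10, 116, "12-Jan-00"), (2, 9002, 645.22, 125, "13-Jan-00"),
    (3, 9003, 1096.18, 137, "14-Jan-00"), (4, 9004, 1482.16, 118, "15-Jan-00"),
    (5, 9005, 1132.88, 141, "16-Jan-00"), (6, 9006, 482.16, 137, "17-Jan-00"),
    (7, 9007, 1748.65, 147, "18-Jan-00"), (8, 9008, 3206.29, 122, "19-Jan-00"),
    (9, 9009, 1184.16, 115, "20-Jan-00"), (10, 9010, 2198.25, 133, "21-Jan-00"),
    (11, 9011, 769.22, 141, "22-Jan-00"), (12, 9012, 2639.17, 117, "23-Jan-00"),
    (13, 9013, 546.12, 122, "24-Jan-00"), (14, 9014, 3149.18, 116, "25-Jan-00"),
]

# Customer ids that occur more than once, with how often they occur
DUPLICATE_IDS = """
    SELECT cust_id, COUNT(*) AS purchases
    FROM sales
    GROUP BY cust_id
    HAVING COUNT(*) > 1
    ORDER BY cust_id
"""

# Full rows for those customers, e.g. to join against another table later
MATCHING_ROWS = """
    SELECT *
    FROM sales
    WHERE cust_id IN (SELECT cust_id FROM sales GROUP BY cust_id HAVING COUNT(*) > 1)
    ORDER BY cust_id, id
"""


def run(query):
    """Load SALES into an in-memory table and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (id INTEGER, acct_num INTEGER, sales_tot REAL,"
            " cust_id INTEGER, date_of_purchase TEXT)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(DUPLICATE_IDS))
    for row in run(MATCHING_ROWS):
        print(row)
```

COUNT(*) on its own counts rows over the whole table; it is GROUP BY that turns it into a count per cust_id, and HAVING that keeps only the groups with more than one row.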