Split content of a column and get the other replicated - awk

I have a very large file with a structure like this:
A B C,D,E,F
The third column contains 4 values (but could be variable) separated with commas. I would like to convert that file into
A B C
A B D
A B E
A B F
Basically, replicating the first two columns and splitting the third into rows.
Any idea on how to do that in awk?

$ awk '{n=split($3,a,/,/);for(i=1;i<=n;i++)print $1,$2,a[i]}' file
A B C
A B D
A B E
A B F
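For comparison, the same split-and-replicate logic can be sketched in Python. This is only an illustration of what the awk one-liner does, not a replacement for it; the function name `split_third_column` is hypothetical. It reads line by line, so a large file is never held in memory.

```python
def split_third_column(lines):
    """Yield one 'col1 col2 value' row per comma-separated value in column 3."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            # Nothing to split; the awk one-liner prints nothing here either.
            continue
        first, second, packed = fields[0], fields[1], fields[2]
        for value in packed.split(","):
            yield f"{first} {second} {value}"


# Usage sketch (the path "file" is the same placeholder as in the awk call):
# with open("file") as handle:
#     for row in split_third_column(handle):
#         print(row)
```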

Related

How to read excel two dimensional parameter in Gams?

I have a GAMS model and I want to read sets and parameters from Excel into GAMS, as shown below:
How can I read this parameter in Gams?
Thanks
For that table you need 2 indexes (i.e. sets), e.g. set i for the column of a, b and c, and set j for the row of d, e and f. Try this:
parameter d(i,j) "Data with column of a, b and c and row of d, e and f";
$Call GDXXRW.exe i=C:\Input.xlsx par=d rng=Sheet1!C1:F4 Rdim=1 Cdim=1 o=C:\Input.gdx
$GDXIN C:\Input.gdx
$LOAD d
$GDXIN
Display d;
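To make the meaning of Rdim=1 and Cdim=1 concrete, here is a small Python sketch of the same idea: the first column of the range holds the row labels (set i), the first row holds the column labels (set j), and every other cell becomes one entry d(i,j). The function name `table_to_parameter` and the numbers in the usage comment are made up for illustration; they are not the values from the spreadsheet in the question.

```python
def table_to_parameter(rows):
    """Turn a labelled 2D table into a {(row_label, col_label): value} mapping.

    rows[0] is the header row (its first cell is the empty corner cell),
    and the first cell of every later row is that row's label.
    """
    column_labels = rows[0][1:]
    parameter = {}
    for row in rows[1:]:
        row_label = row[0]
        for column_label, value in zip(column_labels, row[1:]):
            parameter[(row_label, column_label)] = value
    return parameter


# Hypothetical 4x4 range, shaped like Sheet1!C1:F4 in the answer:
# table = [["", "d", "e", "f"],
#          ["a", 1, 2, 3],
#          ["b", 4, 5, 6],
#          ["c", 7, 8, 9]]
# table_to_parameter(table)[("b", "e")] is then 5
```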

Split one row into multiple rows based on comma-separated string column

I have a table like below with columns A(int) and B(string):
A B
1 a,b,c
2 d,e
3 f,g,h
I want to create an output like below:
A B
1 a
1 b
1 c
2 d
2 e
3 f
3 g
3 h
If it helps, I am doing this in Amazon Athena (which is based on Presto). I know that Presto provides a function to split a string into an array. From the Presto docs:
split(string, delimiter) → array
Splits string on delimiter and returns an array.
Not sure how to proceed from here though.
Use unnest on the array returned by split.
SELECT a,split_b
FROM tbl
CROSS JOIN UNNEST(SPLIT(b,',')) AS t (split_b)
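What the CROSS JOIN UNNEST does can be pictured with a short Python sketch: each input row is paired with every element of its own split array. The function name `unnest_split` is hypothetical and the code only mirrors the query's semantics; it is not how Presto executes it.

```python
def unnest_split(rows, delimiter=","):
    """Yield (a, element) for every element of b.split(delimiter)."""
    for a, b in rows:
        for element in b.split(delimiter):
            yield (a, element)


# Usage sketch with the table from the question:
# list(unnest_split([(1, "a,b,c"), (2, "d,e"), (3, "f,g,h")]))
```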

Odd Even Sorting in VBA

I am trying to sort rows of data so that the integer value of an alpha-numerical address is in order of odd values then even values given they are of the same type.
The only way I have got it to (semi)work was this:
-Find if the integer of the address is even or odd
-Add EVEN or ODD to a cell in that addresses corresponding row
-Run the macro
-Filter the data by EVEN or ODD designation
This approach isn't ideal. I am interested in rearranging the rows without having to use filtering.
Below is an example of how the sorting would go.
UNSORTED SORTED
Address Type Address Type
1.1p A 1.1p A
1.2p A 1.2p A
1.3p A 1.3p A
1.4p A 1.4p A
2.1p A 3.1p A
2.2p A 3.2p A
2.3p A 3.3p A
2.4p A 3.4p A
3.1p A 5.1p A
3.2p A 5.2p A
3.3p A 5.3p A
3.4p A 5.4p A
4.1p A 2.1p A
4.2p A 2.2p A
4.3p A 2.3p A
4.4p A 2.4p A
5.1p A 4.1p A
5.2p A 4.2p A
5.3p A 4.3p A
5.4p A 4.4p A
6.1p B 7.1p B
6.2p B 7.2p B
6.3p B 7.3p B
6.4p B 7.4p B
7.1p B 9.1p B
7.2p B 9.2p B
7.3p B 9.3p B
7.4p B 9.4p B
8.1p B 6.1p B
8.2p B 6.2p B
8.3p B 6.3p B
8.4p B 6.4p B
9.1p B 8.1p B
9.2p B 8.2p B
9.3p B 8.3p B
9.4p B 8.4p B
10.1p B 10.1p B
10.2p B 10.2p B
10.3p B 10.3p B
10.4p B 10.4p B
I am new to VBA. Thank you in advance for any suggestions.
I think you need to create a helper column where you can store a value that you can use for sorting.
The basic idea is to extract the numeric value from your "Address" column, check whether its integer part is even, and if so multiply it by a high value (e.g. 1000) so that it is guaranteed to be higher than the highest possible odd value. Then sort by Type first and by the helper column second.
You can either use a formula for this cell, although it looks a little complicated to me. Assuming that your data starts in cell A2:
=VALUE(LEFT(A2, SEARCH("p", A2, 1)-1))*IF(ISODD(VALUE(LEFT(A2, SEARCH("p", A2, 1)-1))),1,1000)
or write a small UDF
Function SortVal(s As String) As Double
SortVal = Val(s)
If Int(SortVal) Mod 2 = 0 Then SortVal = SortVal * 1000
End Function
and put a call to it in your helper column
=SortVal(A2)
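The helper-value idea can be checked quickly outside Excel. The Python sketch below mirrors the UDF under the same assumption (no odd address reaches 2000, so every scaled even value sorts after every odd one) and sorts by Type first, then by the helper value. The name `sort_val` and the regular expression standing in for VBA's Val are illustrative choices.

```python
import re


def sort_val(address):
    """Return the leading number of an address, scaled by 1000 if its integer part is even."""
    match = re.match(r"\d+(?:\.\d+)?", address)
    value = float(match.group()) if match else 0.0  # like Val(), 0 when no number is found
    if int(value) % 2 == 0:
        value *= 1000  # push even addresses behind all odd ones
    return value


def sort_rows(rows):
    """Sort (address, type) rows by type, then odd addresses before even ones."""
    return sorted(rows, key=lambda row: (row[1], sort_val(row[0])))
```

Applied to the unsorted example from the question, this should give 1.x, 3.x, 5.x, 2.x, 4.x for type A, followed by 7.x, 9.x, 6.x, 8.x, 10.x for type B.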

How to load array of strings with tab delimiter in pig

I have a tab-delimited text file and I am trying to print the first column as id and the remaining strings as a second column, names.
Consider the file below:
cat file.txt;
1 A B
2 C D E F
3 G
4 H I J K L M
In the above file, first column is an id and the remaining are names.
I should get the output like:
id names
1 A,B
2 C,D,E,F
3 G
4 H,I,J,K,L,M
If the names are separated by commas, I get the output by using the commands below:
test = load '/tmp/arr' using PigStorage('\t') as (id:int,names:chararray);
btest = FOREACH test GENERATE id, FLATTEN(TOBAG(STRSPLIT(names,','))) as value:tuple(name:CHARARRAY);
But when the names are tab-delimited, this does not work, because only the first value ends up in column 2 (i.e., names).
Any solution for this?
I have a solution for this.
When you use PigStorage('\t') in the load, every tab in a line starts a new column, so a line with n tabs produces n+1 columns. That is why only the first name ends up in names.
There is a trick, though: load the file with a different delimiter, such as a comma, between the id and the names. Then all the names arrive together in the second column.
This should work as long as the input file looks like this sample:
1,A B
2,C D E F
3,G
4,H I J K L M
Hope this helps
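If the input cannot be changed at the source, a small preprocessing step outside Pig can rewrite it. The Python sketch below is one possible way to do that, not part of the answer above: it splits each line on the first tab only and joins the remaining values with commas, which yields the id/names layout asked for in the question. The function name `collapse_names` is hypothetical.

```python
def collapse_names(line, sep="\t"):
    """Keep the id, then join all remaining tab-separated values with commas."""
    record_id, _, rest = line.rstrip("\n").partition(sep)
    names = [name for name in rest.split(sep) if name]
    return f"{record_id}{sep}{','.join(names)}"


# Usage sketch (file names are placeholders):
# with open("file.txt") as src, open("file_fixed.txt", "w") as dst:
#     for line in src:
#         dst.write(collapse_names(line) + "\n")
```

The rewritten file has exactly one tab per line, so PigStorage('\t') with the schema (id:int,names:chararray) would see the whole name list as a single field.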

Using Pig, how do I parse and compare a grouped item

I have
A B
a, d
a, e
a, y
z, v
z, k
z, o
and so on.
Column B is of type chararray and contains key-value pairs separated by &.
For example - d = 'abc=1&c=1&p=success'
What I want to figure out --
Suppose -
d = 'abc=1&c=1&xyz=23423423'
e = 'xyz=1&it=ssd'
y = 'abc=1&c=1&p=success'
For every 'a' I want to figure out whether it has column B values that share the same value of abc and have c=1 and p=success. I also want to extract the values of abc and c from d and y.
For instance lets take the above example -
d contains abc=1 and c=1
y contains abc=1 and p= success
So this satisfies what I am looking for, i.e. for a given 'a' I have the same value of abc together with c=1 and p=success.
I started with grouping my data :
grouped = group data BY A;
which gives me
a, (a,d)(a,e)(a,y)
z, (z,v)(z,k)(z,o)
But after this I am clueless on how to compare data within each group so that the above condition is satisfied.
Any help on this is appreciated.
Please let me know if you want me to clarify further on my question.
Since you are only concerned with some of the fields in the query string (I assume that's what it is), you will want to split the data with a FOREACH and STRSPLIT. Flatten it so you have something that looks like this:
(a, b), where b is a single key/value pair from the query, e.g. abc=1
Filter out the key/value pairs you don't care about, join the remaining ones back together, and then group by the combined key/value pairs. That will give you a list of every a with the same b, where b only contains abc=X, c=1 and p=success.
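The split, filter and group steps can be prototyped in Python before writing the Pig script. The sketch below is one reading of the requirement, not a Pig translation: for each value of A it reports the abc values that appear both in a row with c=1 and in a row with p=success. The function name `matching_abc_values` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl


def matching_abc_values(rows):
    """Map each A to the abc values seen with both c=1 and p=success."""
    # seen[a][abc] collects which of the two conditions occurred for that abc value.
    seen = defaultdict(lambda: defaultdict(set))
    for a, query in rows:
        pairs = dict(parse_qsl(query))  # split 'k=v&k=v' into a dict
        abc = pairs.get("abc")
        if abc is None:
            continue  # rows without abc cannot contribute
        if pairs.get("c") == "1":
            seen[a][abc].add("c")
        if pairs.get("p") == "success":
            seen[a][abc].add("p")
    result = {}
    for a, by_abc in seen.items():
        hits = sorted(abc for abc, conds in by_abc.items() if conds == {"c", "p"})
        if hits:
            result[a] = hits
    return result
```

With the d, e and y strings from the question under key 'a', the result contains 'a' with abc value '1', because d supplies abc=1 with c=1 and y supplies abc=1 with p=success.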