Merging content of 2 files based on some common field - awk

file 1
a,b, c, d,session-111, e, f
p,f, y, j,session-222, e, o
p,e, c, j,session-333, e, r
t,y, u, j,session-444, r, r
t,y, u, j,session-555, e, w
e,g, m, j,session-555, e, m
e,e, m, j,session-555, e, m
file 2
session-111, data-123, 123, erwt
session-222, data-234, 345, fghjf
session-333, data-345, 456, aasdf
session-555, data-567, 789, aasdf
session-555, data-890, 121, aasdf
session-666, data-678, 121, aasdf
Output
a,b, c, d,session-111, e, f, data-123, 123
p,f, y, j,session-222, e, o, data-234, 345
p,e, c, j,session-333, e, r, data-345, 456
t,y, u, j,session-444, e, r, NODATA
t,y, u, j,session-555, e, r, date-567, 789
t,y, u, j,session-555, e, r, date-890, 121
e,e, m, j,session-555, e, m, NODATA
All data from file1 should be printed - no matter there is reference found in file2 or not
if reference found in file 2, then specific fields (field 2 and 3) will get concatinated in output file

If I understand you correctly, you want to sequentially match fields 5 and 1 in file1 to file2 respectively, and if there is no match a "NODATA" field should be used instead. The following comes close to what you want, I think your listed output has some errors, see the comments made by sudo_O:
parse.awk
BEGIN { FS = OFS = "," }
FNR == NR {
lines[$1][++count[$1]] = $2 FS $3
next
}
count[$5] == 0 { print $0, " NODATA" }
count[$5] > 0 {
count[$5]--
print $0, lines[$5][++prn[$5]]
}
Run it like this:
awk -f parse.awk file2 file1
Output:
a,b, c, d,session-111, e, f, data-123, 123
p,f, y, j,session-222, e, o, data-234, 345
p,e, c, j,session-333, e, r, data-345, 456
t,y, u, j,session-444, r, r, NODATA
t,y, u, j,session-555, e, w, data-567, 789
e,g, m, j,session-555, e, m, data-890, 121
e,e, m, j,session-555, e, m, NODATA

try this one-liner:
awk -F, 'NR==FNR{k[$1]=$2 OFS $3;next} {if($5 in k)print $0,k[$5];else print $0," NODATA"}' OFS="," file2 file1
a,b, c, d,session-111, e, f, data-123, 123
p,f, y, j,session-222, e, o, data-234, 345
p,e, c, j,session-333, e, r, data-345, 456
t,y, u, j,session-444, r, r, NODATA
t,y, u, j,session-555, e, w, data-890, 121
e,g, m, j,session-555, e, m, data-890, 121
e,e, m, j,session-555, e, m, data-890, 121

Related

How to create a Mutable List of Alphabets in Kotlin?

I want to create a MutableList of alphabets form A to Z but so far I can only think of the following method as the shortest one without writing each and every alphabet.
fun main()
{
var alphabets: MutableList<Char> = mutableListOf()
for (a in 'A'..'Z')
{
alphabets.add(a)
}
print(alphabets)
}
So I wanted to know if there is any Lambda implementation or shorter method for the same?
You can use CharRange() or using .. operator to create a range.
For example:
val alphabets = ('A'..'Z').toMutableList()
print(alphabets)
// [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]
or
val alphabets = CharRange('A','Z').toMutableList()
print(alphabets)
// [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]
Alternatively, you can use ('a'..'z').joinToString("").
val abc = ('a'..'z').joinToString("")
println(abc) //abcdefghijklmnopqrstuvwxyz
You can perform the same functions on a string in Kotlin as you can perform on a list of characters
For example:
for (i in abc) { println(i) }
This lists each alphabet individually on a new line just as it would if you converted it into a list.
You can use rangeTo extension function as below
private val alphabets = ('A').rangeTo('Z').toMutableList()
println(alphabets)
[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]

How to shuffle elements of Mutable list in Kotlin?

I wanted to create a MutableList of alphabets and then shuffle them and store it in another MutableList.
I used shuffle() function but it resulted in the original list being shuffled as well which I didn't wanted to happen as I will be using the original list to map it with new shuffled one.
fun main(){
val alphabets = ('A'..'Z').toMutableList()
var shuffAlp = alphabets
shuffAlp.shuffle()
println(alphabets)
println(shuffAlp)
}
So I had to create two mutable list and then shuffle one of them
val alphabets = ('A'..'Z').toMutableList()
var shuffAlp = ('A'..'Z').toMutableList()
shuffAlp.shuffle()
This might be a trivial question but is there any other way where I do not have to create two same list?
shuffle does shuffle into original list, shuffled do and return new list.
And same behavior is for sort & sorted, sortBy & sortedBy, reverse & asReversed:
fun main(){
val alphabets = ('A'..'Z').toMutableList()
val shuffAlp = alphabets.shuffled()
println(alphabets)
println(shuffAlp)
}
Result:
[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]
[U, B, A, N, H, R, O, K, X, C, W, E, Q, P, J, Z, L, Y, S, M, I, D, V, F, G, T]

Maple genmatrix command

Does anyone know what I am doing wrong here? I am trying to generate a matrix from the linear system defined and then solve the matrix using it's inverse. For some reason it won't create the matrix.
sys := {a+.9*h+.8*c+.4*d+.1*e+0*f = 1, .1*a+.2*h+.4*c+.6*d+.5*e+.6*f = .6,
.4*a+.5*h+.7*c+d+.6*e+.3*f = .7, .6*a+.1*h+.2*c+.3*d+.5*e+f = .5,
.8*a+.8*h+c+.7*d+.4*e+.2*f = .8, .9*a+h+.8*c+.5*d+.2*e+.1*f = .9}:
solve(sys, {a, c, d, e, f, h});
Z := genmatrix(sys, [a, h, c, d, e, f], 'b');
Indented values are Maple's response. Third line of code should have generated a matrix, but instead is giving back my input.
You need to load the linear-algebra package with(linalg).
restart:with(linalg):
Z := genmatrix(sys, [a, h, c, d, e, f], 'b');

MD5 - shift function

I try to understand MD5 algorithm. I found on the wikipedia a pseudocode. This is a fragment:
//Main loop:
for i from 0 to 63
if 0 ≤ i ≤ 15 then
F := (B and C) or ((not B) and D)
g := i
else if 16 ≤ i ≤ 31
F := (D and B) or ((not D) and C)
g := (5×i + 1) mod 16
else if 32 ≤ i ≤ 47
F := B xor C xor D
g := (3×i + 5) mod 16
else if 48 ≤ i ≤ 63
F := C xor (B or (not D))
g := (7×i) mod 16
dTemp := D
D := C
C := B
B := B + leftrotate((A + F + K[i] + M[g]), s[i])
A := dTemp
end for
I almost understand it but I'm wondering what's stored in g variable?
Wikipedia states:
"The processing of a message block consists of four similar stages, termed rounds; each round is composed of 16 similar operations". From the pseudo code it can be seen that each round the parts of the message, denoted by M, are mixed in at a different time of the round.
The function to generate g - an index into the message block - will make sure that each part of the message is mixed in. 5, 3 and 7 are all primes and of course they don't have a common divisor with each other or 16. That means that, together with the initial offsets 1, 5 and 0 they make sure that the distribution is as dissimilar as possible.
The index i goes from 0 to 63, which is 4 x 16 - 1. This makes sure that all the parts of the message are mixed in once, at each of the rounds.
EDIT: I thought that the initial g = i could generate an index out of bounds, but that code is only executed for i in the range 0..15 so the % 16 can indeed be left out.
Below is an "unrolled" (meaning all the loops have been written out) implementation of the inner structure of MD5. Here you can see how g operates on the in array:
/* Basic MD5 step. Transform buf based on in.
*/
static void Transform (buf, in)
UINT4 *buf;
UINT4 *in;
{
UINT4 a = buf[0], b = buf[1], c = buf[2], d = buf[3];
/* Round 1 */
#define S11 7
#define S12 12
#define S13 17
#define S14 22
FF ( a, b, c, d, in[ 0], S11, 3614090360); /* 1 */
FF ( d, a, b, c, in[ 1], S12, 3905402710); /* 2 */
FF ( c, d, a, b, in[ 2], S13, 606105819); /* 3 */
FF ( b, c, d, a, in[ 3], S14, 3250441966); /* 4 */
FF ( a, b, c, d, in[ 4], S11, 4118548399); /* 5 */
FF ( d, a, b, c, in[ 5], S12, 1200080426); /* 6 */
FF ( c, d, a, b, in[ 6], S13, 2821735955); /* 7 */
FF ( b, c, d, a, in[ 7], S14, 4249261313); /* 8 */
FF ( a, b, c, d, in[ 8], S11, 1770035416); /* 9 */
FF ( d, a, b, c, in[ 9], S12, 2336552879); /* 10 */
FF ( c, d, a, b, in[10], S13, 4294925233); /* 11 */
FF ( b, c, d, a, in[11], S14, 2304563134); /* 12 */
FF ( a, b, c, d, in[12], S11, 1804603682); /* 13 */
FF ( d, a, b, c, in[13], S12, 4254626195); /* 14 */
FF ( c, d, a, b, in[14], S13, 2792965006); /* 15 */
FF ( b, c, d, a, in[15], S14, 1236535329); /* 16 */
/* Round 2 */
#define S21 5
#define S22 9
#define S23 14
#define S24 20
GG ( a, b, c, d, in[ 1], S21, 4129170786); /* 17 */
GG ( d, a, b, c, in[ 6], S22, 3225465664); /* 18 */
GG ( c, d, a, b, in[11], S23, 643717713); /* 19 */
GG ( b, c, d, a, in[ 0], S24, 3921069994); /* 20 */
GG ( a, b, c, d, in[ 5], S21, 3593408605); /* 21 */
GG ( d, a, b, c, in[10], S22, 38016083); /* 22 */
GG ( c, d, a, b, in[15], S23, 3634488961); /* 23 */
GG ( b, c, d, a, in[ 4], S24, 3889429448); /* 24 */
GG ( a, b, c, d, in[ 9], S21, 568446438); /* 25 */
GG ( d, a, b, c, in[14], S22, 3275163606); /* 26 */
GG ( c, d, a, b, in[ 3], S23, 4107603335); /* 27 */
GG ( b, c, d, a, in[ 8], S24, 1163531501); /* 28 */
GG ( a, b, c, d, in[13], S21, 2850285829); /* 29 */
GG ( d, a, b, c, in[ 2], S22, 4243563512); /* 30 */
GG ( c, d, a, b, in[ 7], S23, 1735328473); /* 31 */
GG ( b, c, d, a, in[12], S24, 2368359562); /* 32 */
/* Round 3 */
#define S31 4
#define S32 11
#define S33 16
#define S34 23
HH ( a, b, c, d, in[ 5], S31, 4294588738); /* 33 */
HH ( d, a, b, c, in[ 8], S32, 2272392833); /* 34 */
HH ( c, d, a, b, in[11], S33, 1839030562); /* 35 */
HH ( b, c, d, a, in[14], S34, 4259657740); /* 36 */
HH ( a, b, c, d, in[ 1], S31, 2763975236); /* 37 */
HH ( d, a, b, c, in[ 4], S32, 1272893353); /* 38 */
HH ( c, d, a, b, in[ 7], S33, 4139469664); /* 39 */
HH ( b, c, d, a, in[10], S34, 3200236656); /* 40 */
HH ( a, b, c, d, in[13], S31, 681279174); /* 41 */
HH ( d, a, b, c, in[ 0], S32, 3936430074); /* 42 */
HH ( c, d, a, b, in[ 3], S33, 3572445317); /* 43 */
HH ( b, c, d, a, in[ 6], S34, 76029189); /* 44 */
HH ( a, b, c, d, in[ 9], S31, 3654602809); /* 45 */
HH ( d, a, b, c, in[12], S32, 3873151461); /* 46 */
HH ( c, d, a, b, in[15], S33, 530742520); /* 47 */
HH ( b, c, d, a, in[ 2], S34, 3299628645); /* 48 */
/* Round 4 */
#define S41 6
#define S42 10
#define S43 15
#define S44 21
II ( a, b, c, d, in[ 0], S41, 4096336452); /* 49 */
II ( d, a, b, c, in[ 7], S42, 1126891415); /* 50 */
II ( c, d, a, b, in[14], S43, 2878612391); /* 51 */
II ( b, c, d, a, in[ 5], S44, 4237533241); /* 52 */
II ( a, b, c, d, in[12], S41, 1700485571); /* 53 */
II ( d, a, b, c, in[ 3], S42, 2399980690); /* 54 */
II ( c, d, a, b, in[10], S43, 4293915773); /* 55 */
II ( b, c, d, a, in[ 1], S44, 2240044497); /* 56 */
II ( a, b, c, d, in[ 8], S41, 1873313359); /* 57 */
II ( d, a, b, c, in[15], S42, 4264355552); /* 58 */
II ( c, d, a, b, in[ 6], S43, 2734768916); /* 59 */
II ( b, c, d, a, in[13], S44, 1309151649); /* 60 */
II ( a, b, c, d, in[ 4], S41, 4149444226); /* 61 */
II ( d, a, b, c, in[11], S42, 3174756917); /* 62 */
II ( c, d, a, b, in[ 2], S43, 718787259); /* 63 */
II ( b, c, d, a, in[ 9], S44, 3951481745); /* 64 */
buf[0] += a;
buf[1] += b;
buf[2] += c;
buf[3] += d;
}

How to obtain a minimal key from functional dependencies?

I need some help and guidelines.
I have the following relation: R = {A, B, C, D, E, F} and the set of functional dependencies
F = {
{AB -> C};
{A -> D};
{D -> AE};
{E -> F};
}
What is the primary key for R ?
If i apply inference rules i get these additional Function dependencies:
D -> A
D -> E
D -> F
D -> AEF
A -> E
A -> F
A -> DEF
How do I continue?
There is a well known algorithm to do this. I don't remember it, but the excercise seems to be simple enough not to use it.
I think this is all about transitivity:
CurrentKey = {A, B, C, D, E, F}
You know D determines E and E determines F. Hence, D determines F by transitivity. As F doesn't determine anything, we can remove it and as E can be obtained from D we can remove it as well:
CurrentKey = {A, B, C, D}
As AB determines C and C doesn't determine anything we know it can't be part of the key, so we remove it:
CurrentKey = {A, B, D}
Finally we know A determines D so we can remove the latter from the key:
CurrentKey = {A, B}
If once you have this possible key, you can recreate all functional dependencies it is a possible key.
PS: If you happen to have the algorithm handy, please post it as I'd be glad to re-learn that :)
Algorithm: Key computation (call with x = ∅)
procedure key(x;A;F)
foreach ! B 2 F do
if x and B 2 x and B ̸2 then
return; /* x not minimal */
fi
od
if x+ = A then
print x; /* found a minimal key x */
else
X any element of A 􀀀 x+;
key(x [ fXg;A;F);
foreach ! X 2 F do
key(x [ ;A;F);
od
fi