awk opposite of split - awk

What would be the opposite of split() in awk?
Imagine I have an array containing characters/integers.
What I've tried:
color = "#FFFF00";
printf("color original: %s\n", color);
split(color, chars, "");
joined = "";
for (i=1; i <= length(chars); i++) {
joined = joined + chars[i];
}
printf("color joined: %s\n", joined);
however the output is:
color original: #FFFF00
color joined: 0
that is of course incorrect.
UPDATE:
cool, ended up with the following code (inspired by join function present in answers):
color = "#FFFF00";
printf("color original: %s\n", color);
split(color, chars, "");
joined = "";
for (i=1; i <= length(chars); i++) {
joined = joined "" chars[i];
}
printf("color joined: %s\n", joined);
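The trick was not to use the + sign when joining things back: in awk, + is numeric addition, and strings are concatenated simply by writing them next to each other.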
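A minimal sketch of the difference, wrapped in a shell function (the name concat_vs_add is only for illustration): with +, both operands are converted to numbers, and strings that do not look numeric convert to 0.

```shell
concat_vs_add() {
    awk 'BEGIN {
        added = "#" + "F"    # numeric addition: both strings convert to 0
        joined = "#" "F"     # juxtaposition: string concatenation
        print added
        print joined
    }'
}
concat_vs_add
```

This prints 0 on the first line and #F on the second, which matches the "color joined: 0" result in the question.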

Here's a solution that doesn't rely on gawk or knowing the length of the array and lets you put a separator (space in this case) string between each array element if you like:
color = "#FFFF00"
printf "color original: %s\n", color
split(color, chars, "")
joined = sep = ""
for (i=1; i in chars; i++) {
joined = joined sep chars[i]
sep = " " # populate sep here with whatever string you want between elements
}
printf "color joined: %s\n", joined
I also cleaned up the incorrect use of printf and the spurious semi-colons.
In the above script split(color, chars, "") relies on having a version of awk that will split a string into an array given a null field separator, which is undefined behavior per POSIX, but that's not what this answer is about - the question is how to join array elements not how to split them.

Here is a way with POSIX Awk:
br = "red,orange,yellow,green,blue"
ch = split(br, de, ",")
print "original: " br
printf "joined: "
for (ec = 1; ec <= ch; ec++) printf "%s%s", de[ec], (ec == ch ? "\n" : "-")
Output:
original: red,orange,yellow,green,blue
joined: red-orange-yellow-green-blue

What you want (in your loop) is string concatenation.

Similar to previous answers and less elegant, but easy and short:
color = "#FFFF00"
printf "color original: %s\n", color
n = split(color, chars, "")
for (i=1; i<=n; i++) {
joined = (i==1 ? "" : joined " ") chars[i] # Define separator here
}
printf "color joined: %s\n", joined

Knowing that the opposite of split() is join(), a quick Google search gives me this page from the gawk manual, which seems to contain the solution: http://www.gnu.org/software/gawk/manual/html_node/Join-Function.html . The join() shown there is a user-defined library function, not a built-in. It joins all the elements of an array together and returns the corresponding string.
['f','o','o'] => "foo"
Have fun
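As a rough sketch of that idea, here is a simplified, hypothetical join() written for this thread. It is loosely modeled on the one in the gawk manual but is not a copy of it, and the wrapper name join_demo is made up. It splits on a comma so it does not depend on the null-separator behavior of split().

```shell
join_demo() {
    awk '
    # Hypothetical helper: concatenates arr[first..last],
    # placing sep between consecutive elements.
    function join(arr, first, last, sep,    i, out) {
        out = arr[first]
        for (i = first + 1; i <= last; i++)
            out = out sep arr[i]
        return out
    }
    BEGIN {
        n = split("red,orange,yellow", parts, ",")
        print join(parts, 1, n, "-")    # with a separator
        print join(parts, 1, n, "")     # plain concatenation
    }'
}
join_demo
```

The extra names after the wide gap in the parameter list (i, out) are the usual awk convention for local variables.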

Related

Awk how to negate a condition

I'm trying to compute some stuff in awk, and at the end print the result in the order of the input. For each line, I check if it has not been already seen. If not, I add it to the array and also store it in an order array.
{
if (! $0 in seen) {
seen[$0] = 1
order[o++] = $0
}
} END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}
You can try it with
printf 'a\nb\na\nc\nb\na\n' | awk script_above
It prints nothing. If I print the variable o at the end, it shows that its value is still 0. What am I doing wrong?
You just need to add parens to get the right operator precedence*:
# a.awk
{
if (!($0 in seen)) {
seen[$0] = 1
order[o++] = $0
}
}
END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}
Test:
$ awk -f a.awk file
a
b
c
* (The unary ! binds more tightly than the in operator: https://www.gnu.org/software/gawk/manual/html_node/Precedence.html)
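To see the precedence effect in isolation, here is a small sketch (the function and variable names are made up for illustration). For a non-empty, non-numeric line, ! $0 evaluates to 0, so the unparenthesized form asks whether the index 0 is in the array.

```shell
precedence_demo() {
    printf 'a\n' | awk '{
        without_parens = (! $0 in seen)     # parsed as (!$0) in seen
        with_parens    = (!($0 in seen))    # the intended test
        print without_parens, with_parens
    }'
}
precedence_demo
```

With the single input line a, this prints 0 1: the first form is false even though a has never been seen.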
What you are doing is more of a shell-style approach. awk has an idiom that tests whether an element has already been seen and records it in a single expression; try the following:
printf 'a\nb\na\nc\nb\na\n' | awk '
!seen[$0]++ {
order[o++] = $0
}
END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}'
Here !seen[$0]++ tests whether the current line has NOT yet been used as an index of the array seen. On the first occurrence seen[$0] is unset (0), so the negation is true and the block with your statements runs. The post-increment ++ then raises that element's counter to 1, so the next time the same line appears the condition !seen[$0]++ is false and the block is skipped.
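A small trace may make this easier to follow. This sketch (function and variable names are illustrative only) prints each line, the value of the condition, and the counter after the increment:

```shell
first_seen_demo() {
    printf 'a\nb\na\n' | awk '{
        is_first = !seen[$0]++      # 1 on first occurrence, 0 afterwards
        print $0, is_first, seen[$0]
    }'
}
first_seen_demo
```

For the input a, b, a it prints "a 1 1", "b 1 1" and "a 0 2".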

Sum of specific columns data with based on date using awk

I have data separated by commas:
LBA0SF004,2018-10-01,4681,4681
LBA0SF004,2018-10-01,919,919
LBA0SF004,2018-10-01,3,3
LBA0SF004,2018-10-01,11453,11453
LBA0SF004,2018-10-02,4681,4681
LBA0SF004,2018-10-02,1052,1052
LBA0SF004,2018-10-02,3,3
LBA0SF004,2018-10-02,8032,8032
I need an awk command that sums the 3rd and 4th columns per date. The same server appears with different dates, so I need output like this:
LBA0SF004 2018-10-01 17056 17056
LBA0SF004 2018-10-02 13768 13768
The GNU awk construct below should do what you are looking for (arrays of arrays require gawk 4.0 or later).
awk '
BEGIN {
FS = ","
OFS = " "
}
{
if(NF == 4)
{
a[$1][$2]["3rd"] += $3;
a[$1][$2]["4th"] += $4;
}
}
END {
for (i in a)
{
for (j in a[i])
{
print i, j, a[i][j]["3rd"], a[i][j]["4th"];
}
}
}
' Input_File.txt
Explanation:
FS is the input field separator, which in your case is ,
OFS is the output field separator, which is a single space
Create an array a indexed by the first and second columns, holding the sums of the third and fourth columns
At the END, print the contents of the array
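If gawk is not available, the same grouping can be sketched with ordinary one-dimensional arrays and a composite key. This is an alternative under stated assumptions, not the answer's method: the names sum_by_date, sum3, sum4 and order are invented here, and the extra order array is only there to print groups in the order they first appear.

```shell
sum_by_date() {
    awk -F, '
    NF == 4 {
        key = $1 SUBSEP $2                      # composite key: server + date
        if (!(key in sum3)) order[++n] = key    # remember first-seen order
        sum3[key] += $3
        sum4[key] += $4
    }
    END {
        for (i = 1; i <= n; i++) {
            key = order[i]
            split(key, part, SUBSEP)            # recover server and date
            print part[1], part[2], sum3[key], sum4[key]
        }
    }'
}
printf '%s\n' \
    'LBA0SF004,2018-10-01,4681,4681' \
    'LBA0SF004,2018-10-01,919,919' \
    'LBA0SF004,2018-10-02,4681,4681' \
    'LBA0SF004,2018-10-02,1052,1052' | sum_by_date
```

Because the output order comes from the order array rather than from for (key in ...), it does not depend on the awk implementation's traversal order.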

gsub for substituting translations not working

I have a dictionary dict with records separated by ":" and data fields by new lines, for example:
:one
1
:two
2
:three
3
:four
4
Now I want awk to substitute all occurrences of each record in the input
file, eg
onetwotwotwoone
two
threetwoone
four
My first awk script looked like this and works just fine:
BEGIN { RS = ":" ; FS = "\n"}
NR == FNR {
rep[$1] = $2
next
}
{
for (key in rep)
gsub(key,rep[key])
print
}
giving me:
12221
2
321
4
Unfortunately another dict file contains some character used by regular expressions, so I have to substitute escape characters in my script. By moving key and rep[key] into a string (which can then be parsed for escape characters), the script will only substitute the second record in the dict. Why? And how to solve?
Here's the current second part of the script:
{
for (key in rep)
orig=key
trans=rep[key]
gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
gsub(orig,trans)
print
}
All scripts are run by awk -f translate.awk dict input
Thanks in advance!
Your fundamental problem is using strings in regexp and backreference contexts when you don't want them and then trying to escape the metacharacters in your strings to disable the characters that you're enabling by using them in those contexts. If you want strings, use them in string contexts, that's all.
You don't want this:
gsub(regexp,backreference-enabled-string)
You want something more like this:
index(...,string) substr(string)
I think this is what you're trying to do:
$ cat tst.awk
BEGIN { FS = ":" }
NR == FNR {
if ( NR%2 ) {
key = $2
}
else {
rep[key] = $0
}
next
}
{
for ( key in rep ) {
head = ""
tail = $0
while ( start = index(tail,key) ) {
head = head substr(tail,1,start-1) rep[key]
tail = substr(tail,start+length(key))
}
$0 = head tail
}
print
}
$ awk -f tst.awk dict file
12221
2
321
4
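The index()/substr() loop above can also be packaged as a function. The following is only a sketch with a made-up name (replace_literal), contrasted with gsub() on a key that contains a regexp metacharacter:

```shell
literal_demo() {
    awk '
    # Hypothetical helper: replace every literal occurrence of from in str.
    function replace_literal(str, from, to,    out, pos) {
        if (from == "") return str              # nothing to search for
        out = ""
        while ((pos = index(str, from)) > 0) {
            out = out substr(str, 1, pos - 1) to
            str = substr(str, pos + length(from))
        }
        return out str
    }
    BEGIN {
        s = "abc a.c"
        t = s
        gsub("a.c", "X", t)                     # regexp: the dot matches any character
        print t
        print replace_literal(s, "a.c", "X")    # literal: only a.c is replaced
    }'
}
literal_demo
```

The gsub() line prints "X X" because the dot also matches the b in abc, while the literal version prints "abc X".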
Never mind for asking....
Just some missing braces around the loop body...?!
{
for (key in rep)
{
orig=key
trans=rep[key]
gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
gsub(orig,trans)
}
print
}
works like a charm.

MAWK: Store match() in variable

I'm trying to use MAWK, whose match() built-in function doesn't accept a third argument (an array to store the match in):
match($1, /9f7fde/) {
substr($1, RSTART, RLENGTH);
}
See doc.
How can I store this output into a variable named var when later I want to construct my output like this?
EDIT2 - Complete example:
Input file structure:
<iframe src="https://vimeo.com/191081157" frameborder="0" height="481" width="608" scrolling="no"></iframe>|Random title|Uploader|fun|tag1,tag2,tag3
<iframe src="https://vimeo.com/212192268" frameborder="0" height="481" width="608" scrolling="no"></iframe>|Random title|Uploader|fun|tag1,tag2,tag3
parser.awk:
{
Embed = $1;
Title = $2;
User = $3;
Categories = $4;
Tags = $5;
}
BEGIN {
FS="|";
}
# Regexp without pattern matching for testing purposes
match(Embed, /191081157/) {
Id = substr(Embed, RSTART, RLENGTH);
}
{
print Id"\t"Title"\t"User"\t"Categories"\t"Tags;
}
Expected output:
191081157|Random title|Uploader|fun|tag1,tag2,tag3
I want to call the Id variable outside the match() function.
MAWK version:
mawk 1.3.4 20160930
Copyright 2008-2015,2016, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan
random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
The obvious answer would seem to be
match($1, /9f7fde/) { var = "9f7fde"; }
But more general would be:
match($1, /9f7fde/) { var = substr($1, RSTART, RLENGTH); }
UPDATE: The solution above mine could be simplified
from
match($1, /9f7fde/) { var = substr($1, RSTART, RLENGTH) }
to
{ __=substr($!_,match($!_,"9f7fde"),RLENGTH) }
A failed match would have RLENGTH auto set to -1, so nothing gets substring'ed out.
But even that is too verbose: since the matching criterion is a constant string, then simply
mawk '$(_~_)~_{__=_}' \_='9f7fde'
============================================
let's say this line
.....vimeo.com/191081157" frameborder="0" height="481" width="608" scrolling="no">Random title|Uploader|fun|tag1,tag2,tag3
{mawk/mawk2/gawk} 'BEGIN { OFS = "";
FS = "(^.+vimeo[\056]com[\057]|[\042] frameborder.+[\057]iframe[>])" ;
} (NF < 4) || ($2 !~ /191081157/) { next } ( $1 = $1 )'
\056 is the dot ( . ), \057 is the forward slash ( / ), and \042 is the straight double quote ( " ).
If the line doesn't match at all, move on to the next row. Otherwise, use the power of the field separator to gobble away all the unneeded parts of the line. The $1 = $1 assignment rebuilds the record with the empty OFS, which drops the prefix and the rest of the HTML tags you don't need.
The assignment operation of $1 = $1 will also return true, providing the input for boolean evaluation for it to print. This way, you don't need either match( ) or substr( ) at all.
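For the complete example in the question, the plain match()/substr() approach can be sketched as follows. This assumes the ID is the run of digits after "vimeo.com/" and that the output should be pipe-separated as in the expected output; the function name extract_id is made up. It uses only the two-argument match(), so it should work in mawk as well as gawk.

```shell
extract_id() {
    awk '
    BEGIN { FS = "|"; OFS = "|" }
    {
        id = ""                                  # reset for every record
        if (match($1, /vimeo\.com\/[0-9]+/)) {
            # skip the 10 characters of "vimeo.com/" and keep the digits
            id = substr($1, RSTART + 10, RLENGTH - 10)
        }
        print id, $2, $3, $4, $5
    }'
}
printf '%s\n' '<iframe src="https://vimeo.com/191081157" frameborder="0" height="481" width="608" scrolling="no"></iframe>|Random title|Uploader|fun|tag1,tag2,tag3' | extract_id
```

Doing the match inside the main action, rather than as a separate pattern, keeps id available when the output line is built and avoids carrying over a stale value from a previous record.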

awk | Rearrange fields of CSV file on the basis of column value

I need your help writing an awk script for the problem below. I have a source file and the required output for it.
Source File
a:5,b:1,c:2,session:4,e:8
b:3,a:11,c:5,e:9,session:3,c:3
Output File
session:4,a=5,b=1,c=2
session:3,a=11,b=3,c=5|3
Notes:
Fields are not in a fixed order in the source file.
In the output file, fields are arranged in a specific order: for example, all a values are in the 2nd column, then b, then c.
The value c can appear any number of times on a line (as in the second line); in the output its values are merged with a pipe symbol.
Please help.
Will work in any modern awk:
$ cat file
a:5,b:1,c:2,session:4,e:8
a:5,c:2,session:4,e:8
b:3,a:11,c:5,e:9,session:3,c:3
$ cat tst.awk
BEGIN{ FS="[,:]"; split("session,a,b,c",order) }
{
split("",val) # or delete(val) in gawk
for (i=1;i<NF;i+=2) {
val[$i] = (val[$i]=="" ? "" : val[$i] "|") $(i+1)
}
for (i=1;i in order;i++) {
name = order[i]
printf "%s%s", (i==1 ? name ":" : "," name "="), val[name]
}
print ""
}
$ awk -f tst.awk file
session:4,a=5,b=1,c=2
session:4,a=5,b=,c=2
session:3,a=11,b=3,c=5|3
If you actually want the e values printed, unlike your posted desired output, just add ,e to the string in the split() in the BEGIN section wherever you'd like those values to appear in the ordered output.
Note that when b was missing from the input on line 2 above, it output a null value as you said you wanted.
Try with:
awk '
BEGIN {
FS = "[,:]"
OFS = ","
}
{
for ( i = 1; i <= NF; i+= 2 ) {
if ( $i == "session" ) { printf "%s:%s", $i, $(i+1); continue }
hash[$i] = hash[$i] (hash[$i] ? "|" : "") $(i+1)
}
asorti( hash, hash_orig )
for ( i = 1; i <= length(hash); i++ ) {
printf ",%s:%s", hash_orig[i], hash[ hash_orig[i] ]
}
printf "\n"
delete hash
delete hash_orig
}
' infile
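This splits each line on any comma or colon and walks the odd-numbered fields, saving each name and its values in a hash that is printed at the end of the line. Note that asorti() is a gawk extension, and that this version prints name:value pairs and includes e. It yields: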
session:4,a:5,b:1,c:2,e:8
session:3,a:11,b:3,c:5|3,e:9