Insert blank based on first digit of line - awk

Input:
3abdce
412ae3
21dege
Expected output: the leading digit of each line is removed and a blank is inserted at the offset given by that digit
abd ce
12ae 3
1d ege
I can only remove the first character:
sed 's/^.\{1\}//g' file

GNU awk solution:
awk -v FS="" '{ print substr($0,2,$1), substr($0,$1+2) }' file
With an empty FS each character is its own field, so $1 is the leading digit (the slice size).
The output:
abd ce
12ae 3
1d ege
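Splitting a line into single characters with an empty FS is an extension that gawk and some other awks support, but POSIX does not define it. Here is a minimal sketch of the same slicing that uses only substr(). It assumes each line starts with exactly one digit, and insert_blank is a made-up name:

```shell
# insert_blank (hypothetical name): drop the leading digit and put a blank
# after that many of the remaining characters, using only POSIX awk.
insert_blank() {
  awk '{
    n = substr($0, 1, 1) + 0          # leading digit = offset of the blank
    rest = substr($0, 2)              # the line without that digit
    print substr(rest, 1, n) " " substr(rest, n + 1)
  }' "$@"
}

printf '3abdce\n412ae3\n21dege\n' | insert_blank
```

For the sample input this prints abd ce, 12ae 3 and 1d ege. A line shorter than its offset gets the blank appended at the end.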

this one should do the trick:
awk '{ split($0, a, ""); print substr($0, 2, a[1])" "substr($0, 2+a[1]) }' yourfile
Output:
abd ce
12ae 3
1d ege

If perl is okay
$ perl -F -lane 'print @F[1..$F[0]], " ", @F[$F[0]+1..$#F]' ip.txt
abd ce
12ae 3
1d ege
-F -lane splits each line on the empty string, so each character is a field, saved in the @F array
Then print as required; indexing starts from 0

Using gawk as it supports empty FS and OFS
awk -v FS="" -v OFS="" '{gsub($($1+1),"& ");gsub(/^./,"")}1' inputfile
abd ce
12ae 3
1d ege
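Here, FS and OFS are set to empty strings and two gsub calls do the required search and replace: the first appends a blank to the character at field $1+1, the second removes the leading digit. Note that the first gsub adds a blank after every occurrence of that character in the line, so this only gives the expected result when the character is unique.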

Related

gawk - Delimit lines with custom character and no similar ending character

Let's say I have a file like so:
test.txt
one
two
three
I'd like to get the following output: one|two|three
And am currently using this command: gawk -v ORS='|' '{ print $0 }' test.txt
Which gives: one|two|three|
How can I print it so that the last | isn't there?
Here's one way to do it:
$ seq 1 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1
$ seq 3 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1|2|3
With paste:
$ seq 1 | paste -sd'|'
1
$ seq 3 | paste -sd'|'
1|2|3
Convert one column to one row with field separator:
awk '{$1=$1} 1' FS='\n' OFS='|' RS='' file
Or in another notation:
awk -v FS='\n' -v OFS='|' -v RS='' '{$1=$1} 1' file
Output:
one|two|three
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
The awk solutions work great. Here is a tr + sed solution:
tr '\n' '|' < file | sed 's/|$//'
1|2|3
Just flatten it (works with gawk or mawk):
gawk 'BEGIN { FS = ORS; RS = "^[\n]*$"; OFS = "|"
} NF && ( $NF ? NF=NF : --NF )'
ASCII | is octal \174, hex 0x7C. The reason for --NF is that, more often than not, the input includes a trailing newline, which makes the field count one too many and results in
1|2|3|
Both NF=NF and --NF are similar concepts to $1=$1. Empty inputs, regardless of whether trailing new lines exist or not, would result in nothing printed.
Because the separator sits in OFS, you can delimit with any string you like instead of being constrained by tr, which behaves inconsistently across implementations. For instance:
gtr '\012' '高' # UTF8 高 = \351\253\230 = xE9 xAB x98
With bsd-tr, \n is replaced by the full character, giving 1高2高3高, but with gnu-tr only the leading byte of the multibyte character is kept, resulting in
1 \351 2 \351 . . .
For Unicode equivalence classes, bsd-tr works as expected, while gtr '[=高=]' '\v' results in
gtr: ?\230: equivalence class operand must be a single character
And if you attempt equivalence classes with an arbitrary non-ASCII byte, bsd-tr does nothing while gnu-tr gladly obliges, even if it means slicing straight through valid UTF-8 characters:
g3bn 77138 | (g)tr '[=\224=]' '\v'
bsd-tr : 77138=Koyote 코요태 KYT✜ 高耀太
gnu-tr : 77138=Koyote ?
?
태 KYT✜ 高耀太
I would do it the following way, using GNU AWK. Let test.txt content be
one
two
three
then
awk '{printf NR==1?"%s":"|%s", $0}' test.txt
output
one|two|three
Explanation: for the first line, print its content without a trailing newline; otherwise print | followed by the line content, again without a newline. Note that the output itself therefore ends without a newline. I assumed that test.txt has no trailing newline; if that is not the case, test this solution before applying it.
(tested in gawk 5.0.1)
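Here is a variation of the same idea, assuming you want a trailing newline on the joined output. Passing $0 as a printf argument keeps any % in the data from being read as a format directive, and an END rule ends the output line. join_lines is a hypothetical helper name:

```shell
# join_lines (hypothetical name): join input lines with a separator
# (default "|"), with no trailing separator and one final newline.
join_lines() {
  awk -v sep="${1:-|}" '
    { printf "%s%s", (NR > 1 ? sep : ""), $0 }   # separator only between lines
    END { if (NR > 0) print "" }                 # terminate the joined line
  '
}

printf 'one\ntwo\nthree\n' | join_lines '|'
```

Empty input produces no output at all, and the separator can be any string, e.g. join_lines ', '.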
Also you can try this with awk:
awk '{ORS = (NR%3 ? "|" : RS)} 1' file
one|two|three
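% is the modulo operator and NR%3 ? "|" : RS is a ternary expression. Note that this hard-codes the line count: it joins every group of 3 lines, so it only gives one|two|three for a 3-line file.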
See Ed Morton's explanation here: https://stackoverflow.com/a/55998710/14259465
With GNU sed, you can pass the -z option so that line breaks become part of the text being matched; then all you need is to replace each newline except the last one at the end of the string:
sed -z 's/\n\(.\)/|\1/g' test.txt
perl -0pe 's/\n(?!\z)/|/g' test.txt
perl -pe 's/\n/|/g if !eof' test.txt
Details:
s - substitution command
\n\(.\) - an LF char followed with any one char captured into Group 1 (so \n at the end of string won't get matched)
|\1 - a | char and the captured char
g - all occurrences.
The first perl command matches any LF char (\n) not at the end of string ((?!\z)) after slurping the whole file into a single string input (again, to make \n visible to the regex engine).
The second perl command replaces an LF char at the end of each line except the one at the end of file (eof).
To make the changes inline add -i option (mind this is a GNU sed example):
sed -i -z 's/\n\(.\)/|\1/g' test.txt
perl -i -0pe 's/\n(?!\z)/|/g' test.txt
perl -i -pe 's/\n/|/g if !eof' test.txt
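-z is specific to GNU sed. A commonly used alternative should also work with BSD/macOS sed. It reads the whole input into the pattern space with N and a branch loop, then replaces the embedded newlines. Like -z and -0, it holds the entire file in memory:

```shell
# Append each following line to the pattern space ($!N) and branch back
# to :a until the last line has been read, then turn the embedded
# newlines into "|". Separate -e expressions keep the label portable.
printf 'one\ntwo\nthree\n' |
  sed -e ':a' -e '$!N' -e '$!ba' -e 's/\n/|/g'
```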

linux csv file concatenate columns into one column

I've been looking to do this with sed, awk, or cut. I am willing to use any other command-line program that I can pipe data through.
I have a large set of data that is comma delimited. The rows have between 14 and 20 columns. I need to recursively concatenate column 10 with column 11 per row such that every row has exactly 14 columns. In other words, this:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
will become:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
I can get the first 10 columns. I can get the last N columns. I can concatenate columns. I cannot think of how to do it in one line so I can pass a stream of endless data through it and end up with exactly 14 columns per row.
Examples (by request):
How many columns are in the row?
sed 's/[^,]//g' | wc -c
Get the first 10 columns:
cut -d, -f1-10
Get the last 4 columns:
rev | cut -d, -f1-4 | rev
Concatenate columns 10 and 11, showing columns 1-10 after that:
awk -F',' ' NF { print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10$11}'
Awk solution:
awk 'BEGIN{ FS=OFS="," }
{
diff = NF - 14;
for (i=1; i <= NF; i++)
printf "%s%s", $i, (diff > 0 && i >= 10 && i < (10+diff)?
"": (i == NF? ORS : ","))
}' file
The output:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
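The same column arithmetic can be parameterised. Here is a minimal sketch in POSIX awk. It assumes the question's target of 14 columns and merges into column 10. squash_cols and its two arguments are illustrative names, not taken from any answer here:

```shell
# squash_cols (hypothetical name): merge the surplus columns into column
# `start` so that every row ends up with `want` columns (defaults 10 and 14).
squash_cols() {
  awk -F, -v start="${1:-10}" -v want="${2:-14}" '{
    extra = NF - want                 # how many columns are too many
    out = $1
    # no comma before the columns that get glued onto column `start`
    for (i = 2; i <= NF; i++)
      out = out ((i > start && i <= start + extra) ? "" : ",") $i
    print out
  }'
}

printf 'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p\n' | squash_cols 10 14
```

Rows with 14 columns or fewer pass through unchanged, because the merge range is then empty.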
With GNU awk for the 3rd arg to match() and gensub():
$ cat tst.awk
BEGIN{ FS="," }
match($0,"(([^,]+,){9})(([^,]+,){"NF-14"})(.*)",a) {
$0 = a[1] gensub(/,/,"","g",a[3]) a[5]
}
{ print }
$ awk -f tst.awk file
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
If perl is okay - can be used just like awk for stream processing
$ cat ip.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
$ awk -F, '{print NF}' ip.txt
16
18
22
$ perl -F, -lane '$n = $#F - 4;
print join ",", (@F[0..8], join("", @F[9..$n]), @F[$n+1..$#F])
' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
-F, -lane splits each line on , and saves the fields in the @F array
$n = $#F - 4 magic number, to ensure the output ends with 14 columns. $#F gives the index of the last element of the array (won't work if the input line has fewer than 14 columns)
join stitches array elements together with the specified string
@F[0..8] array slice with the first 9 elements
@F[9..$n] and @F[$n+1..$#F] the other slices as needed
Borrowing from Ed Morton's regex based solution
$ perl -F, -lape '$n=$#F-13; s/^([^,]*,){9}\K([^,]*,){$n}/$&=~tr|,||dr/e' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
$n=$#F-13 magic number
^([^,]*,){9}\K first 9 fields
([^,]*,){$n} fields to change
$&=~tr|,||dr use tr to delete the commas
e this modifier allows use of Perl code in replacement section
This solution also has the added advantage of working even if the input has fewer than 14 fields.
You can try this with GNU sed:
sed -E '
s/,/\n/9g
:A
s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
tA
s/\n/,/g
' infile
First variant - with awk
awk -F, '
{
for(i = 1; i <= NF; i++) {
OFS = (i > 9 && i < NF - 4) ? "" : ","
if(i == NF) OFS = "\n"
printf "%s%s", $i, OFS
}
}' input.txt
Second variant - with sed (# is used as a temporary marker, so the data must not contain it)
sed -r 's/,/#/10g; :l; s/#(.*)((#[^#]*){4})/\1\2/; tl; s/#/,/g' input.txt
or, more straightforwardly (without a loop) and probably faster:
sed -r 's/,([^,]*),([^,]*),([^,]*),([^,]*)$/#\1#\2#\3#\4/; s/,//10g; s/#/,/g' input.txt
Testing
Input
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u
Output
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r
a,b,c,d,e,f,g,h,i,jklmnopq,r,s,t,u
Solved a similar problem using csvtool. Source file, copied from one of the other answers:
$ cat input.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
Concatenating columns:
$ cat input.txt | csvtool format '%1,%2,%3,%4,%5,%6,%7,%8,%9,%10%11%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22\n' -
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p,,,,,,
1,2,3,4,5,6,3,4,2,434,3,2,5,2,3,4,,,,
1,2,3,4,5,6,3,4,2,4as,f,e,3,4,3,2,5,2,3,4

Confusion about awk command when dealing with if statement

$ cat awk.txt
12 32 45
5 2 3
33 11 33
$ cat awk.txt | awk '{FS='\t'} $1==5 {print $0}'
5 2 3
$ cat awk.txt | awk '{FS='\t'} $1==33 {print $0}'
Nothing is returned when judging the first field is 33 or not. It's confusing.
By saying
awk '{FS='\t'} $1==5 {print}' file
You are defining the field separator incorrectly. To make it be a tab, you need to say "\t" (with double quotes). Further reading: awk not capturing first line / separator.
Also, you are setting it on every line, so it only takes effect from the second line on. You want to use:
awk 'BEGIN{FS="\t"} $1==5' file
Yes, but why did it work in one case but not in the other?
awk '{FS='\t'} $1==5' file # it works
awk '{FS='\t'} $1==33' file # it does not work
You're using single quotes around '\t', which means that you're actually concatenating 3 strings together: '{FS=', \t and '} $1==5' to produce your awk command. The shell interprets the \t as t, so your awk script is actually:
awk '{FS=t} $1==5'
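To check what the shell really passes to awk, print the argument instead of running it. printf '%s\n' echoes each argument verbatim:

```shell
# Print the program text exactly as awk would receive it.
printf '%s\n' '{FS='\t'} $1==5'
# {FS=t} $1==5

# With double quotes inside the single-quoted program, awk gets the escape.
printf '%s\n' 'BEGIN{FS="\t"} $1==5'
# BEGIN{FS="\t"} $1==5
```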
The variable t is unset, so you're setting the field separator to the empty string "". In gawk (and some other awks) this splits each line into one field per character. You can see this by running awk 'BEGIN{FS='\t'} {print NF}' file, which shows how many fields each record has.
So for the line 33 11 33, $1 is just 3 and $2 is the second 3, which is why $1==33 never matches, while for 5 2 3, $1 is 5 and the test happens to succeed.
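Here is a sketch of the corrected filters. It assumes the columns in awk.txt really are separated by tabs, and it recreates the sample with printf rather than reading the file:

```shell
# Tab-separated copy of the sample data.
sample() { printf '12\t32\t45\n5\t2\t3\n33\t11\t33\n'; }

# Set FS in BEGIN with a double-quoted "\t" so it applies to every record ...
sample | awk 'BEGIN { FS = "\t" } $1 == 33'
# ... or pass it with -F; awk turns the \t escape into a tab.
sample | awk -F'\t' '$1 == 5'
```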
First of all, could you explain more clearly what you really want to do before you ask? Look:
more awk.txt
12 32 45
5 2 3
33 11 33
awk -F"[ \t]" '$1 == 5 { print $0}' awk.txt
5 2 3
awk -F"[ \t]" '$1 == 33 { print $0}' awk.txt
33 11 33
awk -F"[ \t]" '$1 == 12 { print $0}' awk.txt
12 32 45
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_23.html
Fcs

awk - concatenate two string variable and assign to a third

In awk, I have 2 fields: $1 and $2.
They are both strings that I want to concatenate and assign to a variable.
Just use var = var1 var2 and it will automatically concatenate the vars var1 and var2:
awk '{new_var=$1$2; print new_var}' file
You can put an space in between with:
awk '{new_var=$1" "$2; print new_var}' file
Which in fact is the same as using FS, because it defaults to the space:
awk '{new_var=$1 FS $2; print new_var}' file
Test
$ cat file
hello how are you
i am fine
$ awk '{new_var=$1$2; print new_var}' file
hellohow
iam
$ awk '{new_var=$1 FS $2; print new_var}' file
hello how
i am
You can play around with it in ideone: http://ideone.com/4u2Aip
You could also use sprintf to accomplish this (note that printing in the END block shows only the value built from the last line):
awk '{str = sprintf("%s %s", $1, $2)} END {print str}' file
You can also concatenate strings from across multiple lines with whitespaces.
$ cat file.txt
apple 10
oranges 22
grapes 7
Example 1:
awk '{aggr=aggr " " $2} END {print aggr}' file.txt
10 22 7
Example 2:
awk '{aggr=aggr ", " $1 ":" $2} END {print aggr}' file.txt
, apple:10, oranges:22, grapes:7
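The leading ", " in the second example comes from adding the separator before the very first item. This small sketch only adds the separator from the second line on:

```shell
# Prefix the separator to every item except the first one.
printf 'apple 10\noranges 22\ngrapes 7\n' |
  awk '{ aggr = aggr (NR > 1 ? ", " : "") $1 ":" $2 } END { print aggr }'
# apple:10, oranges:22, grapes:7
```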
Concatenating strings in awk can also be accomplished with the print command (see the AWK manual page), and you can build complicated combinations. Here I was trying to change the 16th character to A using string concatenation:
echo CTCTCTGAAATCACTGAGCAGGAGAAAGATT | awk -v w=15 -v BA=A '{OFS=""; print substr($0, 1, w), BA, substr($0,w+2)}'
Output: CTCTCTGAAATCACTAAGCAGGAGAAAGATT
I used the substr function to extract portions of the input (STDIN). I passed some external parameters (here hard-coded values) that would usually be shell variables; in the context of shell programming you can write -v w=$width -v BA=$my_charval. The key is OFS, the Output Field Separator in awk: the print statement takes a list of values, writes them to STDOUT and glues them together with OFS. This is analogous to Perl's join function.
It looks like in awk, strings can be concatenated by printing variables next to each other:
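Commas between print arguments join them with OFS, while writing them next to each other is plain concatenation. This two-line sketch shows the difference (the input x y is arbitrary):

```shell
# A comma in print inserts OFS between items; juxtaposition does not.
echo 'x y' | awk 'BEGIN { OFS = "-" } { print $1, $2; print $1 $2 }'
# x-y
# xy
```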
echo xxx | awk -v a="aaa" -v b="bbb" '{ print a b $1 "string literal"}'
# will produce: aaabbbxxxstring literal

How to split a delimited string into an array in awk?

How do I split a string that contains pipe symbols (|)?
I want to split it into an array.
I tried
echo "12:23:11" | awk '{split($0,a,":"); print a[3] a[2] a[1]}'
Which works fine. If my string is like "12|23|11" then how do I split them into an array?
Have you tried:
echo "12|23|11" | awk '{split($0,a,"|"); print a[3],a[2],a[1]}'
To split a string to an array in awk we use the function split():
awk '{split($0, array, ":")}'
# \/ \___/ \_/
# | | |
# string | delimiter
# |
# array to store the pieces
If no separator is given, it uses the FS, which defaults to the space:
$ awk '{split($0, array); print array[2]}' <<< "a:b c:d e"
c:d
We can give a separator, for example ":":
$ awk '{split($0, array, ":"); print array[2]}' <<< "a:b c:d e"
b c
Which is equivalent to setting it through the FS:
$ awk -F: '{split($0, array); print array[2]}' <<< "a:b c:d e"
b c
In GNU Awk you can also provide the separator as a regexp:
$ awk '{split($0, array, ":*"); print array[2]}' <<< "a:::b c::d e"
# note the multiple colons
b c
And even see what the delimiter was on every step by using its fourth parameter:
$ awk '{split($0, array, ":*", sep); print array[2]; print sep[1]}' <<< "a:::b c::d e"
b c
:::
Let's quote the man page of GNU awk:
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
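Because split() returns the number of elements, you can process the pieces without knowing their count in advance. This sketch reverses the pipe-separated values from the question in POSIX awk. It relies on the rule that a single-character separator other than a blank is taken literally:

```shell
# split() returns the element count, so the loop works for any number
# of pieces; here they are concatenated in reverse order.
echo '12|23|11' |
  awk '{
    n = split($0, a, "|")
    out = ""
    for (i = n; i >= 1; i--)
      out = out a[i]
    print out
  }'
# 112312
```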
Please be more specific! What do you mean by "it doesn't work"?
Post the exact output (or error message), your OS and awk version:
% awk -F\| '{
for (i = 0; ++i <= NF;)
print i, $i
}' <<<'12|23|11'
1 12
2 23
3 11
Or, using split:
% awk '{
n = split($0, t, "|")
for (i = 0; ++i <= n;)
print i, t[i]
}' <<<'12|23|11'
1 12
2 23
3 11
Edit: on Solaris you'll need to use the POSIX awk (/usr/xpg4/bin/awk) in order to process 4000 fields correctly.
I do not like the echo "..." | awk ... solution, as it needs unnecessary fork and exec system calls.
I prefer Dimitre's solution with a little twist:
awk -F\| '{print $3 $2 $1}' <<<'12|23|11'
Or a bit shorter version:
awk -F\| '$0=$3 $2 $1' <<<'12|23|11'
Here the assignment builds the new output record, and since the assigned value is a non-empty string (a true condition), the record gets printed.
In this specific case the stdin redirection can be spared with setting an awk internal variable:
awk -v T='12|23|11' 'BEGIN{split(T,a,"|");print a[3] a[2] a[1]}'
I used ksh for quite a while, but in bash this can also be handled with built-in string manipulation. In the first case the original string is split at the separator. In the second case it is assumed that the string always contains digit pairs separated by a one-character separator.
T='12|23|11';echo -n ${T##*|};T=${T%|*};echo ${T#*|}${T%|*}
T='12|23|11';echo ${T:6}${T:3:2}${T:0:2}
The result in all cases is
112312
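If the fields are not guaranteed to be fixed-width pairs, the shell can also split on | through IFS. Here is a POSIX sh sketch (the variable names first, second and third are arbitrary):

```shell
# IFS='|' applies only to this read command; the here-document feeds $T.
T='12|23|11'
IFS='|' read -r first second third <<EOF
$T
EOF
echo "$third$second$first"
# 112312
```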
Actually awk has a feature called the 'Input Field Separator Variable'. This is how to use it. It's not really an array, but it uses the internal $ variables. For splitting a simple string it is easier.
echo "12|23|11" | awk 'BEGIN {FS="|";} { print $1, $2, $3 }'
I know this is kind of an old question, but I thought someone might like my trick, especially since this solution is not limited to a specific number of items.
# Convert to an array
_ITEMS=($(echo "12|23|11" | tr '|' '\n'))
# Output array items
for _ITEM in "${_ITEMS[@]}"; do
echo "Item: ${_ITEM}"
done
The output will be:
Item: 12
Item: 23
Item: 11
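The unquoted $(...) inside the array assignment is also subject to pathname expansion, so an item such as * would be replaced by file names. This POSIX sh sketch splits with IFS and turns globbing off while doing so. print_items is a hypothetical name:

```shell
# print_items (hypothetical name): split "$1" on "|" into the positional
# parameters; set -f disables globbing during the unquoted expansion.
print_items() {
  old_ifs=$IFS
  IFS='|'
  set -f
  set -- $1
  set +f
  IFS=$old_ifs
  for item in "$@"; do
    echo "Item: $item"
  done
}

print_items '12|23|11'
```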
Joke? :)
How about echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
This is my output:
p2> echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
112312
so I guess it's working after all..
echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
should work.
code
awk -F"|" '{split($0,a); print a[1],a[2],a[3]}' <<< '12|23|11'
output
12 23 11
The challenge: parse and store split strings with spaces and insert them into variables.
Solution: the simplest choice is to convert the string list into an array and then assign its elements to variables by index. Here's an example of how you can convert and access the array.
Example: parse disk space statistics on each line:
sudo df -k | awk 'NR>1' | while read -r line; do
#convert into array:
array=($line)
#variables:
filesystem="${array[0]}"
size="${array[1]}"
capacity="${array[4]}"
mountpoint="${array[5]}"
echo "filesystem:$filesystem|size:$size|usage:$capacity|mountpoint:$mountpoint"
done
#output:
filesystem:/dev/dsk/c0t0d0s1|size:4000|usage:40%|mountpoint:/
filesystem:/dev/dsk/c0t0d0s2|size:5000|usage:50%|mountpoint:/usr
filesystem:/proc|size:0|usage:0%|mountpoint:/proc
filesystem:mnttab|size:0|usage:0%|mountpoint:/etc/mnttab
filesystem:fd|size:1000|usage:10%|mountpoint:/dev/fd
filesystem:swap|size:9000|usage:9%|mountpoint:/var/run
filesystem:swap|size:1500|usage:15%|mountpoint:/tmp
filesystem:/dev/dsk/c0t0d0s3|size:8000|usage:80%|mountpoint:/export
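A single awk pass can also replace the read loop. This sketch assumes the POSIX output format of df -P (Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on). df_summary is an illustrative name:

```shell
# df_summary (hypothetical name): reformat `df -kP` output in one awk pass.
# -P requests one line per filesystem, so long device names are not
# wrapped; mount points containing blanks would still be split.
df_summary() {
  awk 'NR > 1 { printf "filesystem:%s|size:%s|usage:%s|mountpoint:%s\n", $1, $2, $5, $6 }'
}

df -kP | df_summary
```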
awk -F'[|]' '{print $1"\t"$2"\t"$3}' <<<'12|23|11'