how to assign a string with multiple white spaces to a variable using awk - variables

I'm running into issues saving awk output to a variable while preserving the input. I have a tab-delimited file, ABC.txt, with the following format:
Field 1 = A
Field 2 = A  B <- 2 spaces between the letters A and B
Field 3 = C
set text = `awk -F "\t" '{print $2}' ABC.txt`
echo $text
A B <- 1 space between the letters A and B,
echo "$text"
A B <- 1 space between the letters A and B,
I need a solution where the variable preserves multiple spaces (I've seen instances where there can be two or three consecutive spaces):
echo $text = A  B <- 2 spaces between the letters A and B
or
echo "$text" = A  B <- 2 spaces between the letters A and B

In csh, to keep the blanks you need to double-quote both the variable expansion and the backticks expression; a simple case would be:
set text = "`echo 'A  B'`"
echo "$text"
A  B
But unlike POSIX-compliant shells, csh doesn't start a new quoting context for command substitutions (a.k.a. backticks expressions); once you double-quote it, the escaping rules become those of the surrounding double quotes:
no backslash escaping.
you can't use a ".
$ starts a variable expansion.
` starts a command substitution.
etc.
The way to include those characters as literals inside double-quotes is in fact to append them from outside: close the surrounding double-quotes, append the character (with backslash-escaping or inside single-quotes) and then reopen the double-quotes; the shell will take care of the concatenation for you.
Examples:
echo "..."\""..."
echo "..."\$"..."
echo "..."\`"..."
echo "..."'"'"..."
echo "..."'$'"..."
echo "..."'`'"..."
Each pair of those outputs, respectively:
..."...
...$...
...`...
What you need to do when double-quoting the command-substitution of awk -F "\t" '{print $2}' ABC.txt is then:
set text = "`awk -F "\""\t"\"" '{print "\$"2}' ABC.txt`"
echo "$text"
A  B
BTW, there's no point in using awk -F "\t"; it's easier with awk -F '\t' instead:
set text = "`awk -F '\t' '{print "\$"2}' ABC.txt`"
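For comparison, in a POSIX shell (sh, bash, ...) none of this is needed: a double-quoted $(...) command substitution preserves internal whitespace, and only an unquoted expansion lets the shell collapse it. A minimal sketch, assuming the same ABC.txt layout as in the question:

```shell
# Build the tab-delimited sample; field 2 is "A  B" (two spaces).
printf 'A\tA  B\tC\n' > ABC.txt

text=$(awk -F '\t' '{print $2}' ABC.txt)

echo "$text"   # quoted: prints "A  B", the two spaces survive
echo $text     # unquoted: word splitting collapses runs of spaces
```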

Related

Add a comma to every column value in a table [unix]

I have a file produced by a program that is filled with values like this:
1    2    3    ...    N
There are 4 spaces between values; I want to remove 3 of the spaces and place a comma after each value, to get this final result:
1, 2, 3, ..., N
I found out from other topics that this command can remove the extra spaces:
awk -F' +' -v OFS='\t' '{sub(/ +$/,""); $1=$1}1' file
I then need to add the commas, or maybe there is a way to remove the spaces and add the commas at the same time.
To replace all space with comma and a space, use:
$ awk '{gsub(/ +/,", ")}1' file
1, 2, 3, ..., N
To replace exactly three spaces with a comma, use:
$ awk '{gsub(/ {3}/,",")}1' file
Using field delimiters for it:
$ awk -F" " -v OFS=", " '{$1=$1}1' file
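Side by side, all three variants produce the same result on a generated sample (four spaces between values, as in the question); a quick sketch:

```shell
# Sample input: values separated by four spaces each.
printf '1    2    3\n' > nums.txt

awk '{gsub(/ +/,", ")}1' nums.txt           # any run of spaces -> ", "
awk '{gsub(/ {3}/,",")}1' nums.txt          # three spaces -> ",", the fourth stays
                                            # (needs an awk with ERE intervals, e.g. gawk)
awk -F' ' -v OFS=', ' '{$1=$1}1' nums.txt   # rebuild the record with the new OFS

rm -f nums.txt
```

Each one prints 1, 2, 3 for this input.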
Using GNU sed to modify the file in place (a run of spaces becomes a comma and a space):
sed -i -e 's/  */, /g' file
And with brackets around the line:
sed -i -e 's/  */, /g;s/^/[/;s/$/]/' file
how about hands-free driving with awk?
{m,n,g}awk NF=NF RS='\r?\n' OFS=', '
It'll handle both CRLF from Windows and LF from Unix, trim both ends, and place ", " in between each field
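A quick sketch of that one-liner in action on CRLF input (the regex RS needs gawk or mawk; the sample numbers are made up):

```shell
# CRLF line endings, runs of spaces between values.
printf '1    2    3\r\n' | awk 'NF=NF' RS='\r?\n' OFS=', '
# prints: 1, 2, 3   (the \r is consumed by RS, fields rejoined with ", ")
```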

gawk - Delimit lines with custom character and no similar ending character

Let's say I have a file like so:
test.txt
one
two
three
I'd like to get the following output: one|two|three
And am currently using this command: gawk -v ORS='|' '{ print $0 }' test.txt
Which gives: one|two|three|
How can I print it so that the last | isn't there?
Here's one way to do it:
$ seq 1 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1
$ seq 3 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1|2|3
With paste:
$ seq 1 | paste -sd'|'
1
$ seq 3 | paste -sd'|'
1|2|3
Convert one column to one row with field separator:
awk '{$1=$1} 1' FS='\n' OFS='|' RS='' file
Or in another notation:
awk -v FS='\n' -v OFS='|' -v RS='' '{$1=$1} 1' file
Output:
one|two|three
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
awk solutions work great. Here is tr + sed solution:
tr '\n' '|' < file | sed 's/|$//'
one|two|three
just flatten it :
gawk/mawk 'BEGIN { FS = ORS; RS = "^[\n]*$"; OFS = "|"
} NF && ( $NF ? NF=NF : --NF )'
ascii | = octal \174 = hex 0x7C. The reason for --NF is that, more often than not, the input includes a trailing newline, which makes the field count one too many and would result in
1|2|3|
Both NF=NF and --NF are similar in concept to $1=$1. Empty inputs, with or without trailing newlines, result in nothing being printed.
At the OFS spot, you can delimit with any string combo you like, instead of being constrained by tr, which has inconsistent behavior across implementations. For instance:
gtr '\012' '高' # UTF8 高 = \351\253\230 = xE9 xAB x98
on bsd-tr, \n will get replaced by the unicode properly 1高2高3高 , but if you're on gnu-tr, it would only keep the leading byte of the unicode, and result in
1 \351 2 \351 . . .
For unicode equiv-classes, bsd-tr works as expected while gtr '[=高=]' '\v' results in
gtr: ?\230: equivalence class operand must be a single character
and if you attempt equiv-classes with an arbitrary non-ASCII byte, bsd-tr does nothing, while gnu-tr will gladly oblige, even if it means slicing straight through UTF-8 characters:
g3bn 77138 | (g)tr '[=\224=]' '\v'
bsd-tr : 77138=Koyote 코요태 KYT✜ 高耀太
gnu-tr : 77138=Koyote ?
?
태 KYT✜ 高耀太
I would do it following way, using GNU AWK, let test.txt content be
one
two
three
then
awk '{printf NR==1?"%s":"|%s", $0}' test.txt
output
one|two|three
Explanation: if it is the first line, print the line content; otherwise print | followed by the line content (printf adds no trailing newline in either case). Note that I assumed test.txt has no trailing newline; if that is not the case, test this solution before applying it.
(tested in gawk 5.0.1)
Also you can try this with awk:
awk '{ORS = (NR%3 ? "|" : RS)} 1' file
one|two|three
% is the modulo operator and NR%3 ? "|" : RS is a ternary expression.
See Ed Morton's explanation here: https://stackoverflow.com/a/55998710/14259465
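The NR%3 in that ternary is tied to a three-line input. The same print-a-separator-before-every-record-after-the-first idea can be written without hard-coding the count (a sketch; the sep variable name is arbitrary):

```shell
printf 'one\ntwo\nthree\n' |
awk '{printf "%s%s", sep, $0; sep="|"} END{if (NR) print ""}'
# prints: one|two|three
```

sep is empty (uninitialized) for the first record; the END block emits the final newline only when there was input.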
With GNU sed, you can pass the -z option so that the regex can match line breaks; then all you need is to replace each newline except the one at the end of the string:
sed -z 's/\n\(.\)/|\1/g' test.txt
perl -0pe 's/\n(?!\z)/|/g' test.txt
perl -pe 's/\n/|/g if !eof' test.txt
See the online demo.
Details:
s - substitution command
\n\(.\) - an LF char followed with any one char captured into Group 1 (so \n at the end of string won't get matched)
|\1 - a | char and the captured char
g - all occurrences.
The first perl command matches any LF char (\n) not at the end of string ((?!\z)) after slurping the whole file into a single string input (again, to make \n visible to the regex engine).
The second perl command replaces an LF char at the end of each line except the one at the end of file (eof).
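A quick check of all three commands on the sample file (GNU sed is assumed for -z):

```shell
printf 'one\ntwo\nthree\n' > test.txt

sed -z 's/\n\(.\)/|\1/g' test.txt      # prints: one|two|three
perl -0pe 's/\n(?!\z)/|/g' test.txt    # prints: one|two|three
perl -pe 's/\n/|/g if !eof' test.txt   # prints: one|two|three

rm -f test.txt
```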
To make the changes in place, add the -i option (mind that the first is a GNU sed example):
sed -i -z 's/\n\(.\)/|\1/g' test.txt
perl -i -0pe 's/\n(?!\z)/|/g' test.txt
perl -i -pe 's/\n/|/g if !eof' test.txt

Use a word as a delimiter for cut command in linux without using awk

I'm trying to take the second segment of a line inside a file, delimited by a certain string. It would usually work with something like awk -F ':: My Delimiter ::' '{print $2}', but if I try to print $2 it prints the second argument passed to the function in which the awk command is located. Is there an alternate way to split a line by a delimiter and print a certain part of the result?
This is the exact line I'm having issues with:
for transaction in $(cat $1)
do
echo "$transaction" | awk -F ':: My Delimiter ::' '{print $2}' >> testLog.data.out
done
Note: The delimiter would be exactly as described. :: My Delimiter ::
Rather than using cut, I think you should really use awk; your example awk -F ':: My Delimiter ::' '{print $2}' should work. If it printed the second argument passed into the function containing that awk, then the $2 was not inside single quotes; maybe you used double quotes? This wouldn't work (notice the double quotes):
awk -F ':: My Delimiter ::' "{print $2}"
But this would (your example):
awk -F ':: My Delimiter ::' '{print $2}'
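A sketch of the difference; the sample line is made up, and the point is who expands $2, the shell or awk:

```shell
line='first:: My Delimiter ::second'

# Single quotes: awk itself evaluates $2 (the second field).
echo "$line" | awk -F ':: My Delimiter ::' '{print $2}'   # prints: second

# Double quotes: the SHELL expands $2 before awk runs. In a script
# with no second positional parameter it expands to nothing, so awk
# receives the program '{print }' and prints the whole line.
echo "$line" | awk -F ':: My Delimiter ::' "{print $2}"
```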
You can use this code:
echo ':: My Delimiter ::' | awk '{split($0,v,"My Delimiter"); print v[2]}'
>>> echo 'ThisIsINeed0 My Delimiter ThisIsINeed1' | awk '{split($0,v," My Delimiter "); print v[1]}'
ThisIsINeed0
>>> echo 'ThisIsINeed0 My Delimiter ThisIsINeed1' | awk '{split($0,v," My Delimiter "); print v[2]}'
ThisIsINeed1
>>> echo 'This Is I Need 0 My Delimiter This Is I Need 1' | awk '{split($0,v," My Delimiter "); print v[2]}'
This Is I Need 1
>>> echo 'This Is I Need 0 My Delimiter This Is I Need 1' | awk '{split($0,v," My Delimiter "); print v[1]}'
This Is I Need 0

awk or sed etc replace comma with | but where between quotes

I have a delimited file in which I'm trying to replace the commas with a pipe |, except where the comma (and other text) is between quotes (").
I know that I can replace the commas using sed 's/,/|/g' filename, but I'm not sure how to make the text between quotes an exception to the rule, or whether it is even possible this easily.
As others recommended here, the best and safest approach is to read CSV as CSV, with an appropriate module/library.
Anyway, if you want sed, here it is:
sed -i 's/|//g;y/,/|/;:r;s/\("[^"]*\)|\([^"]*"\)/\1,\2/g;tr' file.csv
Procedure:
Firstly it removes any pipes from the CSV, so as not to corrupt it.
Secondly it transforms all commas to pipes.
Thirdly it "recovers" all quoted pipes back to commas, recursively.
Test:
$ cat file.csv
aaa,1,"what's up"
bbb,2,"this is pipe | in text"
ccc,3,"here is comma, in text"
ddd,4, ",, here a,r,e multi, commas,, ,,"
"e,e",5,first column
$ cat file.csv | sed 's/|//g;y/,/|/;:r;s/\("[^"]*\)|\([^"]*"\)/\1,\2/g;tr'
aaa|1|"what's up"
bbb|2|"this is pipe in text"
ccc|3|"here is comma, in text"
ddd|4| ",, here a,r,e multi, commas,, ,,"
"e,e"|5|first column
$ cat file.csv | sed 's/|//g;y/,/|/;:r;s/\("[^"]*\)|\([^"]*"\)/\1,\2/g;tr' | awk -F'|' '{ print NF }'
3
3
3
3
3
You can try this sed :
sed ':A;s/\([^"]*"[^"]*"\)\([^"]*\),/\1\2|/;tA' infile
Using GNU awk, FPAT and #Kubator's sample file:
$ awk '
BEGIN {
FPAT="([^,]+)|( *\"[^\"]+\" *)" # define the field pattern, notice the space before "
OFS="|" # output file separator
}
{
$1=$1 # rebuild the record
}1' file # output
aaa|1|"what's up"
bbb|2|"this is pipe | in text"
ccc|3|"here is comma, in text"
ddd|4| ",, here a,r,e multi, commas,, ,,"
"e,e"|5|first column

awk: changing OFS without looping through variables

I'm working on an awk one-liner to substitute commas for tabs in a file (and to swap in \\N for missing values, in preparation for a MySQL SELECT INTO).
The following link http://www.unix.com/unix-for-dummies-questions-and-answers/211941-awk-output-field-separator.html (at the bottom) suggest the following approach to avoid looping through the variables:
echo a b c d | awk '{gsub(OFS,";")}1'
head -n1 flatfile.tab | awk -F $'\t' '{for(j=1;j<=NF;j++){gsub(" +","\\N",$j)}gsub(OFS,",")}1'
Clearly, the trailing 1 (it could be any nonzero number or non-empty string) triggers printing of the entire record. Could you please explain why this works?
SO also has Print all Fields with AWK separated by OFS, but that post doesn't make it clear why this works.
Thanks.
Awk evaluates 1, or any number other than 0, as a true pattern, and a pattern without an action part defaults to { print $0 }. So it prints the line.
For example:
$ echo "hello" | awk '1'
hello
$ echo "hello" | awk '0'
$