Ignoring escaped delimiters (commas) with awk?

If I had a string with escaped commas like so:
a,b,{c\,d\,e},f,g
How might I use awk to parse that into the following items?
a
b
{c\,d\,e}
f
g

{
    n = split($0, a, /,/)
    j = 1
    for (i = 1; i <= n; ++i) {
        if (match(b[j], /\\$/)) {
            b[j] = b[j] "," a[i]
        } else {
            b[++j] = a[i]
        }
    }
    for (k = 2; k <= j; ++k) {
        print b[k]
    }
}
Split the line into array a, using ',' as the delimiter.
Build array b from a, merging elements that end in '\'.
Print array b (note: printing starts at index 2 since the first item is blank).
This solution presumes (for now) that ',' is the only character that is ever escaped with '\'--that is, there is no need to handle any \\ in the input, nor weird combinations such as \\\,\\,\\\\,,\,.
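If you do need to cope with arbitrary escaped characters, a sketch (not taken from the answers here) is to walk the line one character at a time, so that whatever follows a '\' is copied verbatim and only unescaped commas end a field:

{
    field = ""
    len = length($0)
    for (p = 1; p <= len; p++) {
        ch = substr($0, p, 1)
        if (ch == "\\" && p < len) {
            # keep the backslash and the escaped character as-is
            field = field ch substr($0, p + 1, 1)
            p++
        } else if (ch == ",") {
            print field
            field = ""
        } else {
            field = field ch
        }
    }
    print field
}

For the sample line this prints the five requested items with the escapes left intact.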

The idea here is to hide the escaped commas behind a placeholder, split on the remaining (real) commas, then restore the escaped commas in each piece:
{
    gsub("\\\\,", "!Q!")
    n = split($0, a, ",")
    for (i = 1; i <= n; ++i) {
        gsub("!Q!", "\\,", a[i])
        print a[i]
    }
}

I don't think awk has any built-in support for something like this. Here's a solution that's not nearly as short as DigitalRoss's, but should have no danger of ever accidentally hitting your made-up string (!Q!). Since it tests with an if, you could also extend it to check whether a field actually ends in \\, which would be an escaped backslash rather than an escaped comma.
BEGIN {
    FS = ","
}
{
    curfield = 1
    for (i = 1; i <= NF; i++) {
        if (substr($i, length($i)) == "\\") {
            fields[curfield] = fields[curfield] substr($i, 1, length($i) - 1) FS
        } else {
            fields[curfield] = fields[curfield] $i
            curfield++
        }
    }
    nf = curfield - 1
    for (i = 1; i <= nf; i++) {
        printf("%d: %s ", i, fields[i])
    }
    printf("\n")
}
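For the sample line this prints the following (assuming the script is saved as fields.awk, a made-up name):

$ printf '%s\n' 'a,b,{c\,d\,e},f,g' | awk -f fields.awk
1: a 2: b 3: {c,d,e} 4: f 5: g

Note that, unlike the output requested in the question, this variant consumes the escaping backslashes and keeps the commas; if you want the backslashes preserved, append $i FS as-is instead of stripping the trailing backslash before appending FS.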


equivalent to cut -f1-3,5 in awk

I want fields 1,2,3,5
With cut I do:
cut -f1-3,5
However with awk I would do:
awk '{for (i=0;i<=5;i++) {if (i!=4) {print $i}} }'
But I want to make it more succinct. Moreover, in other cases I could have more fields with varying distances between them. awk '{for (i in 1 2 3 5) {print $i}}' doesn't work. How can I do this?
For the job of picking fields by position number and field ranges, cut does the job better. If you really want to mimic this behavior in awk (assuming you have other tasks to do in awk as well), you may consider the following code:
cat fcut.awk
BEGIN {
    n = split(f, a, /,/)
    for (i=1; i<=n; ++i) {
        if (split(a[i], b, /-/) == 2) {
            for (j=b[1]; j<=b[2]; ++j)
                fld[j]              # referencing fld[j] is enough to create the key
        }
        else
            fld[a[i]]
    }
}
{
    for (i=1; i<=NF; ++i) {
        if (i in fld)
            s = (st++ ? s OFS : "") $i
    }
    print s
    s = st = ""
}
Now run it as:
awk -v f='1-3,5' -f fcut.awk file
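For example, with the same whitespace-separated sample line used further down (output shown for illustration):

$ echo 'a b c d e f g' | awk -v f='1-3,5' -f fcut.awk
a b c e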
This does what cut does and a bit more:
$ echo 'a b c d e f g' |
awk -v ranges='1-3,5' '
BEGIN {
    split(ranges,r,/,/)
    for ( i=1; i in r; i++ ) {
        n = split(r[i],range,/-/)
        for ( j=range[1]; j<=range[n]; j++ ) {
            f[++onf] = j
        }
    }
}
{
    for ( i=1; i<=onf; i++ ) {
        printf "%s%s", $(f[i]), (i<onf ? OFS : ORS)
    }
}
'
a b c e
The above assumes that if you specify the same field number multiple times you want it printed that many times, and that you want the fields printed in the order you specify, so you can, for example, rearrange and/or duplicate fields, e.g.:
$ echo 'a b c d e f g' |
awk -v ranges='6,1-3,5,2,1' '
BEGIN {
    split(ranges,r,/,/)
    for ( i=1; i in r; i++ ) {
        n = split(r[i],range,/-/)
        for ( j=range[1]; j<=range[n]; j++ ) {
            f[++onf] = j
        }
    }
}
{
    for ( i=1; i<=onf; i++ ) {
        printf "%s%s", $(f[i]), (i<onf ? OFS : ORS)
    }
}
'
f a b c e b a
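Neither version understands cut's open-ended ranges such as 5- ("from field 5 to the last field"). A sketch of how the second script could be extended to allow that, treating an empty upper bound as NF (which means the range expansion has to move out of BEGIN, since NF isn't known there):

$ echo 'a b c d e f g' |
awk -v ranges='1-3,5-' '
BEGIN {
    nr = split(ranges, r, /,/)
}
{
    onf = 0
    for (i = 1; i <= nr; i++) {
        n = split(r[i], range, /-/)
        lo = range[1]
        # a single number means lo == hi; an empty upper bound means "to the last field"
        hi = (n == 1 ? range[1] : (range[2] == "" ? NF : range[2]))
        for (j = lo; j <= hi; j++) {
            f[++onf] = j
        }
    }
    for (i = 1; i <= onf; i++) {
        printf "%s%s", $(f[i]), (i < onf ? OFS : ORS)
    }
}
'
a b c e f g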

Awk create a new array of unique values from another array

I have my array:
array = [1:"PLCH2", 2:"PLCH1", 3:"PLCH2"]
I want to loop on array to create a new array unique of unique values and obtain:
unique = [1:"PLCH2", 2:"PLCH1"]
How can I achieve that?
EDIT: as per @Ed Morton's request, I show below how my array is populated. In fact, this post is the key to the solution of my previous post.
In my file.txt, I have:
PLCH2:A1007int&PLCH1:D987int&PLCH2:P977L
INTS11:P446P&INTS11:P449P&INTS11:P518P&INTS11:P547P&INTS11:P553P
I use split to obtain array:
awk '{
    split($0,a,"&")
    for ( i in a ) {
        split(a[i], b, ":");
        array[i] = b[1];
    }
}' file.txt
This might be what you're trying to do:
$ cat tst.awk
BEGIN {
    split("PLCH2 PLCH1 PLCH2",array)
    printf "array ="
    for (i=1; i in array; i++) {
        printf " %s:\"%s\"", i, array[i]
    }
    print ""
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {      # true only the first time a value is encountered
            unique[++j] = array[i]
        }
    }
    printf "unique ="
    for (i=1; i in unique; i++) {
        printf " %s:\"%s\"", i, unique[i]
    }
    print ""
}
$ awk -f tst.awk
array = 1:"PLCH2" 2:"PLCH1" 3:"PLCH2"
unique = 1:"PLCH2" 2:"PLCH1"
EDIT: given your updated question, here's how I'd really approach that:
$ cat tst.awk
BEGIN { FS="[:&]" }
{
    delete vals                 # clear the per-line arrays so values from a previous record don't leak in
    numVals=0
    for (i=1; i<NF; i+=2) {
        vals[++numVals] = $i
    }
    print "vals =" arr2str(vals)
    delete seen
    delete uniq
    numUniq=0
    for (i=1; i<=numVals; i++) {
        if ( !seen[vals[i]]++ ) {
            uniq[++numUniq] = vals[i]
        }
    }
    print "uniq =" arr2str(uniq)
}
function arr2str(arr, str, i) {
    for (i=1; i in arr; i++) {
        str = str sprintf(" %s:\"%s\"", i, arr[i])
    }
    return str
}
$ awk -f tst.awk file
vals = 1:"PLCH2" 2:"PLCH1" 3:"PLCH2"
uniq = 1:"PLCH2" 2:"PLCH1"
vals = 1:"INTS11" 2:"INTS11" 3:"INTS11" 4:"INTS11" 5:"INTS11"
uniq = 1:"INTS11"
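If all you ultimately need is the list of unique names per line, without the debugging output, a more compact sketch (same idea, same seen idiom) would be:

$ awk -F'[:&]' '{
    delete seen
    out = ""
    for (i = 1; i < NF; i += 2) {
        if (!seen[$i]++) {              # keep only the first occurrence on this line
            out = out (out == "" ? "" : OFS) $i
        }
    }
    print out
}' file
PLCH2 PLCH1
INTS11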

How to substitute a string with multiple patterns using awk/sed?

I have a patternfile.txt:
1 FPAT = "pata"
2 FPAT = "patb"
3 FPAT = "patc"
and an awkfile.txt:
BEGIN { FPAT }
{
for(i=1; i<=NF; i++){
print("%s\n", $i)}
}
I want to make multiple files after substitute a string 'FPAT' to each patterns pata, patb, patc one by one like these.
awkfile1.txt :
BEGIN { FPAT = "pata" }
{
for(i=1; i<=NF; i++){
print("%s\n", $i)}
}
awkfile2.txt :
BEGIN { FPAT = "patb" }
{
for(i=1; i<=NF; i++){
print("%s\n", $i)}
}
awkfile3.txt :
BEGIN { FPAT = "patc" }
{
for(i=1; i<=NF; i++){
print("%s\n", $i)}
}
Please help me.
I've got a feeling that this is a bit of an XY problem but I think you can get the output you're looking for in one invocation of awk:
awk '{ print "BEGIN { " substr($0, index($0, $2)) " }" > ("awkfile" $1 ".txt") }' patternfile.txt
This simply reads each line of the input file and writes the part you're interested in (from the second field to the end of the line) to the output file. It uses the first field of the input to determine the output file name.
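Note that this only emits the BEGIN line. If each generated awkfileN.txt should also contain the shared body, one sketch (assumptions: the template lives in awkfile.txt as shown in the question, FPAT appears exactly once in it, and the patterns contain no & or \ characters, which are special in sub()) is to slurp the template first and substitute into a fresh copy of it for every pattern:

awk '
NR == FNR { tmpl = tmpl $0 ORS; next }              # first file: collect awkfile.txt as the template
{
    out = "awkfile" $1 ".txt"
    prog = tmpl
    sub(/FPAT/, substr($0, index($0, $2)), prog)    # replace the FPAT placeholder with, e.g., FPAT = "pata"
    printf "%s", prog > out
    close(out)
}
' awkfile.txt patternfile.txt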

Awk input variable as a rule

Good day!
I have the following code:
BLOCK=`awk '
/\/\* R \*\// {
    level=1
    count=0
}
level {
    n = split($0, c, "");
    for (i = 1; i <= n; i++)
    {
        printf(c[i]);
        if (c[i] == ";")
        {
            if(level==1)
            {
                level = 0;
                if (count != 0)
                    printf("\n");
            };
        }
        else if (c[i] == "{")
        {
            level++;
            count++;
        }
        else if (c[i] == "}")
        {
            level--;
            count++;
        }
    }
    printf("\n")
}' $i`
That code cuts the piece of the file from the /* R */ mark to the ';' symbol, taking into account details like braces etc. But that isn't important. I want to replace the hard-coded /* R */ with a variable:
RECORDSEQ="/* R */"
...
BLOCK=`awk -v rec="$RECORDSEQ" '
rec {
    level=1
    count=0
}
But that doesn't work.
How can I fix it?
Thank you in advance.
Found the solution:
RECORDSEQ="/* R */"
# Construct regexp for awk
RECORDSEQREG=`echo "$RECORDSEQ" | sed 's:\/:\\\/:g;s:\*:\\\*:g'`
# Loop over the files
for i in $SOURCE;
do
    # Find RECORDSEQ and cut out the block
    BLOCK=`awk -v rec="$RECORDSEQREG" '
    $0 ~ rec {
        level=1
        count=0
    }
...
Many thanks to people who helped.
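For the record, the reason the first attempt failed is that a bare rec used as a pattern only tests whether the variable is non-empty (so it matched every line); $0 ~ rec treats it as a regular expression, which is also why the / and * characters then have to be escaped. If the marker is really a fixed string, a sketch that avoids the sed escaping step entirely is to search for it with index() instead of a regex match:

RECORDSEQ="/* R */"
...
BLOCK=`awk -v rec="$RECORDSEQ" '
index($0, rec) {      # fixed-string search: no regex metacharacters to escape
    level=1
    count=0
}
...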

awk '/range start/,/range end/' within script

How do I use the awk range pattern '/begin regex/,/end regex/' within a self-contained awk script?
To clarify, given program csv.awk:
#!/usr/bin/awk -f
BEGIN {
FS = "\""
}
/TREE/,/^$/
{
    line="";
    for (i=1; i<=NF; i++) {
        if (i != 2) line=line $i;
    }
    split(line, v, ",");
    if (v[5] ~ "FOAM") {
        print NR, v[5];
    }
}
and file chunk:
TREE
10362900,A,INSTL - SEAL,Revise
,10362901,A,ASSY / DETAIL - PANEL,Revise
,,-203,ASSY - PANEL,Qty -,Add
,,,-309,PANEL,Qty 1,Add
,,,,"FABRICATE FROM TEKLAM NE1G1-02-250 PER TPS-CN-500, TYPE A"
,,,-311,PANEL,Qty 1,Add
,,,,"FABRICATE FROM TEKLAM NE1G1-02-750 PER TPS-CN-500, TYPE A"
,,,-313,FOAM SEAL,1.00 X 20.21 X .50 THK,Qty 1,Add
,,,,"BMS1-68, GRADE B, FORM II, COLOR BAC706 (BLACK)"
,,,-315,FOAM SEAL,1.50 X 8.00 X .25 THK,Qty 1,Add
,,,,"BMS1-68, GRADE B, FORM II, COLOR BAC706 (BLACK)"
,PN HERE,Dual Lock,Add
,
10442900,IR,INSTL - SEAL,Update (not released)
,10362901,A,ASSY / DETAIL - PANEL,Revise
,PN HERE,Dual Lock,Add
I want to have this output:
27 FOAM SEAL
29 FOAM SEAL
What is the syntax for adding the command-line form '/begin regex/,/end regex/' to the script so it operates on those lines only? All my attempts lead to syntax errors and googling only gives me the command-line form.
Why not use 2 steps:
% awk '/start/,/end/' < input.csv | awk -f csv.awk
Simply put the action block on the same line as the range pattern. In your script the pattern and the { ... } block are two separate rules, so the range prints its lines with the default action and the block runs for every line:
#!/usr/bin/awk -f
BEGIN {
FS = "\""
}
/from/,/to/ {
    line="";
    for (i=1; i<=NF; i++) {
        if (i != 2) line=line $i;
    }
    split(line, v, ",");
    if (v[5] ~ "FOAM") {
        print NR, v[5];
    }
}
If the from/to regexes are dynamic:
#!/usr/bin/awk -f
BEGIN {
    FS = "\""
    FROM = ARGV[1]
    TO   = ARGV[2]
    if (ARGC == 3) {        # only the two patterns were given, so read standard input
        ARGV[1] = "-"
    } else {
        ARGV[1] = ARGV[3]   # shift the real input file into the first operand slot
    }
    ARGC = 2                # everything after ARGV[1] has been consumed as a pattern
}
{ if ($0 ~ FROM) { p = 1 ; l = 0 } }
{ if ($0 ~ TO)   { p = 0 ; l = 1 } }
{
    if (p == 1 || l == 1) {
        line="";
        for (i=1; i<=NF; i++) {
            if (i != 2) line=line $i;
        }
        split(line, v, ",");
        if (v[5] ~ "FOAM") {
            print NR, v[5];
        }
        l = 0
    }
}
Now you have to call it like: ./scriptname.awk "FROM_REGEX" "TO_REGEX" INPUTFILE. The last parameter is optional; if it is missing, standard input is read instead.
HTH
You need to show us what you have tried. Is there something about /begin regex/ or /end regex/ you're not telling us? Otherwise your script with the additions should work, i.e.
#!/usr/bin/awk -f
BEGIN {
    FS = "\""
}
/begin regex/,/end regex/{
    line="";
    for (i=1; i<=NF; i++) {
        if (i != 2) line=line $i;
    }
    split(line, v, ",");
    if (v[5] ~ "FOAM") {
        print NR, v[5];
    }
}
Or are you using an old Unix, where there is old awk as /usr/bin/awk and new awk as /usr/bin/nawk? Also see if you have /usr/xpg4/bin/awk or gawk (the path could be anything).
Finally, show us the error messages you are getting.
I hope this helps.