Remove bad characters from file name while splitting with awk

I have a large file that I split with awk, using the last column as the name for the new files, but one of the columns includes a "/", which gives a "can't open" error.
I tried to make a function to transform the file name, but awk doesn't use it when I run it; maybe the error is in this part:
tried_func() {
echo $1 | tr "/" "_"
}
awk -F ',' 'NR>1 {fname="a_map/" tried_func $NF".csv"; print >> fname;
close(fname)}' large_file.csv
large_file.csv:
A, row, I don't, need
plenty, with, columns, good_name
alot, off, them, another_good_name
more, more, more, bad/name
Expected result, list of files in a_map:
good_name.csv
another_good_name.csv
bad_name.csv
Actual result:
awk: can't open file a_map/bad/name.csv
It doesn't need to be a function; if I can just replace the "/" in awk, that is fab too.

Awk is not part of the shell; it's an independent programming language, so you can't call shell functions that way. Instead, just do the whole thing within awk:
$ awk -F ',' '
NR>1 {
gsub(/\//,"_",$NF) # replace /s with _s
fname="a_map/" $NF ".csv"
print >> fname
close(fname)
}' file
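A quick sketch of the result, run against a simplified sample (the header and data rows here are invented for illustration; OFS is set so the rebuilt record keeps its commas):

```shell
mkdir -p a_map
printf 'h1,h2,h3,h4\nplenty,with,columns,good_name\nmore,more,more,bad/name\n' > large_file.csv

awk -F ',' -v OFS=',' 'NR>1 {
    gsub(/\//, "_", $NF)            # bad/name -> bad_name (also rebuilds $0)
    fname = "a_map/" $NF ".csv"
    print >> fname
    close(fname)
}' large_file.csv

ls a_map    # -> bad_name.csv  good_name.csv
```

Note that modifying $NF makes awk rebuild the record with OFS, so without -v OFS=',' the written lines would be space-separated.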

Related

How to use filenames having special characters with awk '{system("stat " $0)}'

For example, list.txt is like this:
abc.txt
-abc.txt
I couldn't get the correct answer with either
awk '{system("stat " $0)}' list.txt or awk '{system("stat \"" $0 "\"")}' list.txt.
How could I tell the awk-system to add quotes around the filename?
awk '{system("stat " $0)}' list.txt certainly would not work.
But why wouldn't awk '{system("stat \"" $0 "\"")}' list.txt work either? It behaves just like the former.
And with awk '{system("stat \\\"" $0 "\\\"")}' list.txt, I get this:
stat: cannot stat '"abc.txt"': No such file or directory
First of all, if you want the output of the stat command, system() is not the right way to go: it merely returns the exit code, not the command's output.
You may try cmd | getline myOutput in awk; the myOutput variable will hold one line of the output per call. Or you can write to a pipe, print ... | cmd, to feed input to a command.
Regarding your file -abc.txt: quoting it isn't enough. Try executing stat "-abc.txt" in a terminal; it won't work, because the filename starts with -. You need to add --: stat -- "-abc.txt". So you probably want to check whether the filename starts with - and add the -- in your awk code.
Finally, about quotes: you can declare an awk variable, like awk -v q='"' '{.... Then, when you want a ", you use q; this way your code may be easier to read. E.g., print "stat " q "myName" q
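Putting those pieces together, a sketch (GNU stat assumed; plain double quotes still break if a filename itself contains one):

```shell
# Sample files; touch -- creates the awkward -abc.txt safely.
printf 'abc.txt\n-abc.txt\n' > list.txt
touch -- abc.txt -abc.txt

# Read each line of stat's output via cmd | getline, with -- and quotes.
awk -v q='"' '{
    cmd = "stat -- " q $0 q          # e.g. stat -- "-abc.txt"
    while ((cmd | getline line) > 0)
        print line
    close(cmd)
}' list.txt
```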

How to use "," as field delimiter [duplicate]

This question already has answers here:
Escaping separator within double quotes, in awk
(3 answers)
Closed 1 year ago.
I have a file like this:
"1","ab,c","def"
so using only a comma as the field delimiter will give the wrong result, so I want to use "," (quote, comma, quote) as the field delimiter. I tried like this:
awk -F "," '{print $0}' file
or like this:
awk -F "","" '{print $0}' file
or like this:
awk -F '","' '{print $0}' file
but the result is incorrect, and I don't know how to include the " characters as part of the field delimiter itself.
If you can handle GNU awk, you could use FPAT:
$ echo '"1","ab,c","def"' | # echo outputs with double quotes
gawk ' # use GNU awk
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")" # because FPAT
}
{
for(i=1;i<=NF;i++) # loop all fields
gsub(/^"|"$/,"",$i) # remove leading and trailing double quotes
print $2 # output for example the second field
}'
Output:
ab,c
Note that FPAT cannot handle newlines (the record separator RS) inside quoted fields.
What you are attempting seems misdirected anyway. How about this instead?
awk '/^".*"$/{ sub(/^\"/, ""); sub(/\"$/, ""); gsub(/\",\"/, ",") }1'
The proper solution to handling CSV files with quoting in them is to use a language which has an actual CSV parser. My thoughts go to Python, which includes a csv module in its standard library.
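For instance, invoking the csv module from the shell (python3 assumed to be available):

```shell
# Parse the quoted CSV with Python's csv module; quoting is handled for us.
echo '"1","ab,c","def"' |
python3 -c '
import csv, sys
for row in csv.reader(sys.stdin):
    print(row[1])    # second field -> ab,c
'
```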
In GNU AWK
{print $0}
prints the whole line; if no changes were made, the original line is printed. No matter what field separator you set, you will get the original lines if the only action is print $0. Use $1=$1 to trigger a rebuild of the record.
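A quick demonstration of the rebuild behaviour:

```shell
# No field assignment: the record is printed verbatim, FS notwithstanding.
echo 'a:b:c' | awk -F: '{print $0}'          # -> a:b:c
# $1=$1 forces a rebuild, joining fields with OFS (a single space by default).
echo 'a:b:c' | awk -F: '{$1=$1; print $0}'   # -> a b c
```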
If you must do it via FS at any price, then you might do it as follows: let the file.txt content be
"1","ab,c","def"
then
awk 'BEGIN{FS="\x22,?\x22?"}{$1=$1;print $0}' file.txt
output
1 ab,c def
Note the leading space (ab,c is $3). Explanation: I inform GNU AWK that the field separator is a literal " (\x22, as " is 0x22 in ASCII), followed by zero or one (?) comma, followed by zero or one (?) literal " (\x22). $1=$1 triggers the line rebuild mentioned earlier. Disclaimer: this solution assumes you never have an escaped " inside your strings.
(tested in gawk 4.2.1)

Find line in file and replace it with line from another file

My goal is to find a string within a file (file1) and replace its whole line with the content of a specific line (in this example line 3) from another file (file2). As I understand, I need to use RegEx to do the first part and then use a second sed command to store the contents of file2. sed is definitely not my strong suit, so I hope someone here can help a rookie out!
So far I have:
sed -i '/^matching.string.here*/s' <(sed '3!d' file2) file1
Edit
Example file1:
string one
string two
matching.string.here.
string three
Example file2:
alt string one
alt string two
alt string three
Expected Result in file1:
string one
string two
alt string three
string three
Your sed attempt contains several unexplained errors; it's actually hard to see what you are in fact trying to do.
You probably want to do something along the lines of
sed '3!d;s%.*%s/^matching\.string\.here.*/&/%' file2 |
sed -f - -i file1
It's unclear what you hope for the /s to mean; does your sed have a flag with this name?
This creates a sed script from the third line of file2; take out the pipeline to sed -f - to see what the generated script looks like. (If your sed does not allow you to pass in a script on standard input, you will have to write it to a temporary file, and then pass that to the second sed.)
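Run end to end with the example files (GNU sed assumed, for -f - and -i):

```shell
# Recreate the example files from the question.
printf 'string one\nstring two\nmatching.string.here.\nstring three\n' > file1
printf 'alt string one\nalt string two\nalt string three\n' > file2

# Stage 1 builds an s/// script from line 3 of file2;
# stage 2 applies that script to file1 in place.
sed '3!d;s%.*%s/^matching\.string\.here.*/&/%' file2 |
sed -f - -i file1

cat file1    # line 3 is now "alt string three"
```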
Anyway, this is probably both simpler and more robust using Awk.
awk 'NR==3 && NR==FNR { keep=$0; next }
/^matching\.string\.here/ { $0 = keep } 1' file2 file1
This writes the new content to standard output. If you have GNU Awk, you can explore its -i inplace option; otherwise, you will need to write the result to a file, then move it back to file1.
This might work for you (GNU sed):
sed -n '3s#.*#sed "/matching\\.string\\.here\\./c&" file1#ep' file2
Focus on line 3 of file2.
Manufacture a sed script which changes a matching line in file1 to the contents of the line in focus and print the result.
N.B. The periods in the match must be escaped twice so as not to match an arbitrary character.
This is a tailor-made job for awk, and as a bonus you can avoid any regex:
awk -v s='matching.string.here' 'FNR == NR {a[FNR] = $0; next} index($0, s) {$0 = a[FNR]} 1' file2 file1
string one
string two
alt string three
string three
A more readable version:
awk -v s='matching.string.here' '
FNR == NR {
a[FNR] = $0
next
}
index($0, s) {
$0 = a[FNR]
} 1' file2 file1

Strip last field

My script will be receiving input of various lengths, and I want to strip the last field, separated by a "/". An example of the input I will be dealing with:
this/that/and/more
But the issue I am running into is that the length of the input will vary like so:
this/that/maybe/more/and/more
or/even/this/could/be/it/and/maybe/more
short/more
In any case, the expected output should be the whole string minus the last "/more".
Note: The word "more" will not be a constant these are arbitrary examples.
Example input:
this/that/and/more
this/that/maybe/more/and/more
Expected output:
this/that/and
this/that/maybe/more/and
What I know works for a string whose length you know is
cut -d'/' -f[x]
What I need is a '/'-delimited awk command, something like:
awk '{$NF=""; print $0}'
With awk as requested:
$ awk '{sub("/[^/]*$","")} 1' file
this/that/maybe/more/and
or/even/this/could/be/it/and/maybe
short
but this is the type of job sed is best suited for:
$ sed 's:/[^/]*$::' file
this/that/maybe/more/and
or/even/this/could/be/it/and/maybe
short
The above were run against this input file:
$ cat file
this/that/maybe/more/and/more
or/even/this/could/be/it/and/maybe/more
short/more
Depending on how you have the input in your script, bash's Shell Parameter Expansion may be convenient:
$ s1=this/that/maybe/more/and/more
$ s2=or/even/this/could/be/it/and/maybe/more
$ s3=short/more
$ echo ${s1%/*}
this/that/maybe/more/and
$ echo ${s2%/*}
or/even/this/could/be/it/and/maybe
$ echo ${s3%/*}
short
(Lots of additional info on parameter expansion at https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html)
In your script, you could create a loop that removes the last character of the input string on each iteration, as long as that character is not a slash. When the loop finds a slash character, exit the loop, then remove the final character (which should be the slash).
Pseudo-code:
while (lastCharacter != '/') {
removeLastCharacter();
}
removeLastCharacter(); # removes the slash
(Sorry, it's been a while since I wrote a bash script.)
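In bash, the pseudo-code above might be sketched as (POSIX parameter expansion only; the sample string is invented):

```shell
s='this/that/and/more'    # sample input
# While the string does not end in a slash, chop off its last character.
while [ -n "$s" ] && [ "${s%/}" = "$s" ]; do
    s=${s%?}
done
s=${s%/}                  # finally remove the slash itself
echo "$s"                 # -> this/that/and
```

Of course, ${s%/*} from the parameter-expansion answer above does the same in one step.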
Another awk alternative, using fields instead of regexes:
awk -F/ '{printf "%s", $1; for (i=2; i<NF; i++) printf "/%s", $i; printf "\n"}'
Here is an alternative shell solution:
while read -r path; do dirname "$path"; done < file

Count field separators on each line of input file and if missing/exceeding, output filename to error file

I have to validate the input file, Input.txt, for the proper number of field separators on each row; if even one row, including the header, is missing or exceeds the correct number of field separators, print the name of the file to errorfiles.txt and exit.
I have another file, valid.txt, to use as a reference for the correct number of field separators; I compare the number of field separators on each row of the input file with the number in valid.txt.
awk -F '|' '{ print NF-1; exit }' valid.txt > fscount
awk -F '|' '(NF-1) != "cat fscount" { print FILENAME>"errorfiles.txt"; exit}' Input.txt
This is not working.
It is not fully clear what your requirement is. Do you want to print the FILENAME when just a single input file is provided, or perhaps loop over a list of files in a directory running this command?
Anyway, to use the content of the file in the context of awk, just use its -v switch with input redirection on the file:
awk -F '|' -v count="$(<fscount)" -v fname="errorfiles.txt" '(NF-1) != (count+0) { print FILENAME > fname; close(fname); exit}' Input.txt
Notice the use of close(fname) here, which is generally required when you are manipulating files inside awk constructs. The close() call explicitly closes the file descriptor associated with the opened file, instead of letting the OS do it.
GNU awk solution:
awk -F '|' 'ARGIND==1{aimNF=NF; nextfile} ARGIND==2{if (NF!=aimNF) {print FILENAME > "errorfiles.txt"; exit}}' valid.txt Input.txt
You can do it with just one command: use awk to read both files, store the NF of the first file, and compare against it in the second file.
For other awks you can replace ARGIND==1 with FILENAME==ARGV[1], and so on.
Or, if you are sure the first file won't be empty, use NR==FNR and NR>FNR instead.
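A sketch of that portable NR==FNR variant (the file contents here are invented for illustration):

```shell
printf 'a|b|c\n' > valid.txt
printf 'x|y|z\nx|y\n' > Input.txt      # second row is one field short

awk -F '|' '
NR == FNR               { aimNF = NF; next }    # valid.txt: remember its NF
NR > FNR && NF != aimNF {                        # Input.txt: compare
    print FILENAME > "errorfiles.txt"; exit
}' valid.txt Input.txt

cat errorfiles.txt    # -> Input.txt
```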