Inserting characters into specified fields in large files

Inserting characters into specified fields in large files - awk

Here is my question. I have several hundred files that are too large to edit utilizing vi editor. I'm looking for a possible awk or sed command to manipulate my files. Bit of a rookie. I have a simplified file:
001|1|3|053412|16|1234|||
001|21|4|123618|15|88|||
The files were created, with the fourth field being in the wrong format.
The fourth field should be 05:34:12 reflecting HH:MM:SS. The time values are correct, I just need to insert the : in the appropriate places.
How do I insert the colons after the second character and the fourth characters in the fourth field? I cannot do it by character count since the values before and after the fourth field are variable.

gawk to the rescue!
$ awk -F\| -v OFS=\| '{$4=gensub(/(..)(..)(..)/,"\\1:\\2:\\3","g",$4)}1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
otherwise you can do the same with substr($4,1,2)":"...

With GNU awk for gensub() and inplace editing
awk -i inplace 'BEGIN{FS=OFS="|"} {$4=gensub(/(..)(..)/,"\\1:\\2:",1,$4)} 1' *
Similarly with GNU sed for EREs and inplace editing:
sed -i -E 's/(([^|]*\|){3}..)(..)/\1:\3:/' *
e.g.:
$ awk 'BEGIN{FS=OFS="|"} {$4=gensub(/(..)(..)/,"\\1:\\2:",1,$4)} 1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
$ sed -E 's/(([^|]*\|){3}..)(..)/\1:\3:/' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||

With sed:
$ sed -r 's/([^|]*|[^|]*|[^|]*|)([0-9]{2})([0-9]{2})([0-9]{2})/\1\2:\3:\4/' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||

This might work for you (GNU sed):
sed -r 's/^(([^|]*\|){3})(..)(..)/\1\3:\4:/' file
Use back references to group the first three fields and the following two pairs. Then format the 4th field as required.

Try this awk!
awk -F"|" -v OFS="|" '{r=split($4,T,"");for(i=2;i<=r;i+=2){if(i!=r)T[i]=T[i]":"}tmp="";for(i=1;i<=r;i++){tmp=tmp T[i]}$4=tmp;}1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
Longer whit explanation:
BEGIN{
FS=OFS="|"; #Field separator and output field separator
}
{
tmp="";
r=split($4,time_field,""); # Chunk field into pieces
for(i=2;i<=r;i+=2) # Loop two by two
{
if(i!=r)
{
time_field[i]=time_field[i]":"; # Add ":"
}
}
for(i=1;i<=r;i++) # Loop over again to rebuild
{
tmp=tmp time_field[i];
}
$4=tmp; #rebuid field
print
}
How you could use it in bash: Save it as whatever.awk
while IFS='' read -r file
do
awk -f whatever.awk "$file" > out_file
done < list_of_files_to_edit.txt
If you want edit the files in place, you can add the option -i to Kenavoz sed command. sed -ri ...

Related

awk command to print columns with colum data

cat file1.txt | awk -F '{print $1 "|~|" $2 "|~|" $3}' > file2.txt
I am using above command to filter first three columns from file1 and put into file.
But only getting the column names and not the column data.
How to do that?
|~| - is the delimiter.
file1.txt has values as :
a|~|b|~|c|~|d|~|e
1|~|2|~|3|~|4|~|5
11|~|22|~|33|~|44|~|55
111|~|222|~|333|~|444|~|555
my expedted output is :
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333

With your shown samples, please try following awk code. You need to set field separator to |~| and remove starting space from lines, then print the lines.
awk -F'\\|~\\|' -v OFS='|~|' '{sub(/^[[:blank:]]+/,"");print $1,$2,$3}' Input_file
In case you want to keep spaces(which was in initial post before edit) then try following:
awk -F'\\|~\\|' -v OFS='|~|' '{print $1,$2,$3}' Input_file
NOTE: Had a chat with user in room and got to know why this code was not working for user because of gunzip -c file was being used wrongly, its output was being saved into a variable on which user was running awk program, so correcting that command generated right file and awk program ran fine on it. Adding this as a reference for future readers.

One approach would be:
awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
The approach simply replaces all "|~|" with a "," setting the output file separator to "|~|". All leading whitespace is trimmed with sub().
Example Use/Output
With your data in file1.txt, you would have:
$ awk -v FS="," -v OFS="|~|" '{gsub(/[|][~][|]/,","); sub(/^\s*/,""); print $1,$2,$3}' file1.txt
a|~|b|~|c
1|~|2|~|3
11|~|22|~|33
111|~|222|~|333
Let me know if this is what you intended. You can simply redirect, e.g. > file2.txt to write to the second file.

For such cases, my bash+awk script rcut comes in handy:
rcut -Fd'|~|' -f-3 ip.txt
The -F option enables fixed string input delimiter (which is given using the -d option). And by default, the output field separator will also be same as -d when -F is active. -f-3 is similar to cut syntax to specify first three fields.
For better speed, use hck command:
hck -Ld'|~|' -D'|~|' -f-3 ip.txt
Here, -L enables literal field separator and -D specifies output field separator.
Another benefit is that hck supports -z option to automatically handle common compressed formats based on filename extension (adding this since OP had an issue with compressed input).

Another way:
sed 's/|~|/\t/g' file1.txt | awk '{print $1"|~|"$2"|~|"$3}' > file2.txt
First replace the |~| delimiter, and use the default awk separator, then print columns what you need.

How to add N blank lines between all rows of a text file?

I have a file that looks
a
b
c
d
Suppose I want to add N lines (in the example 3, but I actually need 20 or 100 depending on the file)
a
b
c
d
I can add one blank line between all of them with sed
sed -i '0~1 a\\' file
But sed -i '0~3 a\\' file inserts one line every 3 rows.

You may use with GNU sed:
sed -i 'G;G;G' file
The G;G;G will append three empty lines below each non-final line.
Or, awk:
awk 'BEGIN{ORS="\n\n\n"};1'
See an online sed and awk demo.
If you need to set the number of newlines dynamically use
nl="
"
awk -v nl="$nl" 'BEGIN{for(c=0;c<3;c++) v=v""nl;ORS=v};1' file > newfile

With GNU awk:
awk -i inplace -v lines=3 '{print; for(i=0;i<lines;i++) print ""}' file
Update with Ed's hints (see comments):
awk -i inplace -v lines=3 '{print; for(i=1;i<=lines;i++) print ""}' file
Update (without trailing empty lines):
awk -i inplace -v lines=3 'NR==1; NR>1{for(i=1;i<=lines;i++) print ""; print}' file
Output to file:
a
b
c
d

With sed and corutils:
N=4
sed "\$b;$(yes G\; | head -n$N)" infile
Similar trick with awk:
N=4
awk 1 RS="$(yes \\n | head -n$N | tr -d '\n')" infile

This might work for you (GNU sed):
sed ':a;G;s/\n/&/2;Ta' file
This will add 2 blank lines following each line.
Change 2 to what ever number you desire between each line.
An alternative (more efficient?):
sed '1{x;:a;/^.\{2\}/!s/^/\n/;ta;s/.//;x};G' file

sed replace text between comma

I have csv files that need to be changed f -> 0 and t -> 1 only between commas for every single csv if it matches. From:
,t,t,f,f,a,t,f,t,f,f,t,f,
tftf
to:
,1,1,0,0,a,1,0,1,0,0,1,0,
tftf
Works this way, but want to know better way that could reduce the replacing time consume
for i in 1 2 3 4 5 6
do
echo "converting tables for mariaDB"
find ./ -type f -name "*.csv" -print0 | xargs -0 sed -i 's/\,t\,/\,1\,/g'
find ./ -type f -name "*.csv" -print0 | xargs -0 sed -i 's/\,f\,/\,0\,/g'
echo "$i time(s) changed "
done
I except , one single command will change the line

Could you please try following. Though it is not perfect solution but would be simplest use it in case you don't have gawk's latest version where -inplace edit option is present.
for file in *.csv
awk '{gsub(/,t,/,",1,");gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(/,f,/,",0,")} 1' "$file" > temp && mv temp"$file"
done
OR
for file in *.csv
awk -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' "$file" > temp && mv temp "$file"
done
2nd solution: Using gawk's latest version where we could save edit into Input_file itself.
gawk -i inplace '{gsub(/,t,/,",1,");gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(/,f,/,",0,")} 1' *.csv
OR
gawk -i inplace -v t_val="1" -v f_val="0" 'BEGIN{FS=OFS=","}{for(i=2;i<NF;i++){$i=($i=="t"?t_val:$i=="f"?f_val:$i)}} 1' Input_file

The main problem, in this case, is that a regular expression does not allow overlap when parsing it with sed 's/ere/str/g' or awk '{gsub(ere,str,$0)}'. This comment nicely explains how you can circumvent this in sed using the t<label> command, which means: if a change happened to the pattern space, move to <label>. The comment shows a generic way of doing it. The awk alternative to this rule would be:
$ awk '{while(match($0,ere)) gsub(ere,str)}'
An alternative sed solution in the case of the OP's example could use the following idea:
duplicate all commas. Since we are searching for strings of the form ",t,", this duplication avoid overlap using s.
since no overlap is possible, replace all ",f," with ",0," and all ",t," with ",1,".
We can now revert all duplicated commas again. As no overlap is allowed, sequences like ,,,, will be nicely converted to ,, and not ,
In POSIX sed this looks like:
$ sed -e 's/,/,,/g' -e 's/,f,/,0,/g' \
-e 's/,t,/,1,/g' -e 's/,,/,/g' file > file.tmp
$ mv file.tmp file
With GNU sed we can do it in one go:
$ sed -i 's/,/,,/g;s/,f,/,0,/g;s/,t,/,1,/g;s/,,/,/g' file
With awk, this would look like:
$ awk 'BEGIN{FS=",";OFS=FS FS}
{$1=$1;gsub(/,f,/,",0,");gsub(/,t,/,",1,");gsub(OFS,FS)}1' file > file.tmp
$ mv file.tmp file

How can I print only lines that are immediately preceeded by an empty line in a file using sed?

I have a text file with the following structure:
bla1
bla2
bla3
bla4
bla5
So you can see that some lines of text are preceeded by an empty line.
I understand that sed has the concept of two buffers, a pattern space buffer and a hold space buffer, so I'm guessing these need to come in to play here, but I'm unclear how to specify them to accomplish what I need.
In my contrived example above, I'd expect to see the following lines outputted:
bla3
bla5

sed is for doing s/old/new on individual lines, that is all. Any time you start talking about buffers or doing anything related to multi-lines comparisons you're using the wrong tool.
You could do this with awk:
$ awk -v RS= -F'\n' 'NR>1{print $1}' file
bla3
bla5
but it would fail to print the first non-empty line if the first line(s) in the file were empty so this may be what you want if you want lines of all space chars considered to be empty lines:
$ awk 'NF && !p{print} {p=NF}' file
bla3
bla5
and this otherwise:
$ awk '($0!="") && (p==""){print} {p=$0}' file
bla3
bla5
All of the above will work even if there are multiple empty lines preceding any given non-empty line.
To see the difference between the 3 approaches (which you won't see given the sample input in the question):
PS1> printf '\nfoo\n \nbar\n\netc\n' | cat -E
$
foo$
$
bar$
$
etc$
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk -v RS= -F'\n' 'NR>1{print $1}'
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk 'NF && !p{print} {p=NF}'
foo
bar
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk '($0!="") && (p==""){print} {p=$0}'
foo
etc

You can use the hold buffer easily to print the line before the blank like this:
sed -n -e '/^$/{x; p;}' -e h input
But I don't see an easy way to use it for your use case. For your case, instead of using the hold buffer, you could do:
sed -n -e '/^$/ba' -e d -e :a -e n -e p input
But I would do this with awk.
awk 'NR!=1{print $1}' RS= FS=\\n input-file

awk 'p;{p=/^$/}' file
above command does these for each line:
if p is 1, print line;
if line is empty, set p to 1.
if lines consisting of one or more spaces are also considered empty:
awk 'p;{p=!NF}' file
to print non-empty lines each coming right after an empty line, you can use this:
awk 'p*!(p=/^$/)' file
if p is 1 and this line is not empty (1*!(0) = 1*1 = 1), print this line;
otherwise (1*!(1) = 1*0 = 0, 0*anything = 0), don't print anything.
note that this one may not work with all awks, a portable version of this would look like:
awk 'p*(/./);{p=/^$/}' file
if lines consisting of one or more spaces are also considered empty:
awk 'p*NF;{p=!NF}' file
see them online here, and here.

If sed/awk is not mandatory, you can do it with grep:
grep -A 1 '^$' input.txt | grep -v -E '^$|--'

You can use sed to match a range of lines and do sub-matches inside the matches, like so:
# - use the "-n" option to omit printing of lines
# - match lines between a blank line (/^$/) and a non-blank one (/^./),
# then print only the line that contains at least a character,
# i.e, the non-blank line.
sed -ne '
/^$/,/^./ {
/^./{ p; }
}' input.txt

tested by gnu sed, your data in 'a':
$ sed -nE '/^$/{N;s/\n(.+)/\1/p}' a
bla3
bla5
add -i option precedes -n to real editing

Append prefix to first column of a file with awk

I have a couple of hundreds of files which I want to process with xargs. They all need a fix of their first column.
Therefore I need an awk command to append the prefix "ID_" to the first column of a file (except for the first header line). Can anyone help me with this?
Something along the line:
gawk -f ';' "{$1='ID_' $1; print $0}" file.csv > file_processed.csv
I am not expert for the command, though. And I would rather like to have some inplace processing instead of making a copy of each file. Beforehand, I made it in VIM, but then I only had one file.
:%s/^-/ID_/
I hope someone can help me here.

gawk 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv > file_processed.csv
FS and OFS set the input and output field separators, respectively.
NR>1 checks whether current line number is larger than 1, so we don't modify the header line.
You can also modify the file in place with -i inplace option:
gawk -i inplace 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv
Edit
After elaborating the original question, here's the final version:
gawk -i inplace 'BEGIN{FS=OFS=";"} NR>1{sub(/^-/,"ID_",$2)} 1' file.csv
which substitutes - in the beginning of second column with ID_.
NR>1 action applies for all but first (header) line. 1 invokes the default default print action.

If you just want to do something, particularly adding a prefix, on the first field, it is not different from adding the prefix to the whole line.
So you can just awk '$0 = "ID_" $0' file.csv it should do the work. If you want to make it "change in place", you can:
awk '$0="ID_"$0' csv >/tmp/foo && mv /tmp/foo file.csv
You can also make use of sed:
sed -i 's/^/ID_/' file
The -i does "in-place modification"
You mentioned vim, and gave s/^-/ID_/ cmd, it doesn't add the prefix (ID_), it will replace the leading - by the ID_, they are different.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Inserting characters into specified fields in large files - awk

gawk to the rescue! $ awk -F\| -v OFS=\| '{$4=gensub(/(..)(..)(..)/,"\\1:\\2:\\3","g",$4)}1' file 001|1|3|05:34:12|16|1234||| 001|21|4|12:36:18|15|88||| otherwise you can do the same with substr($4,1,2)":"...

With sed: $ sed -r 's/([^|]|[^|]|[^|]*|)([0-9]{2})([0-9]{2})([0-9]{2})/\1\2:\3:\4/' file 001|1|3|05:34:12|16|1234||| 001|21|4|12:36:18|15|88|||

This might work for you (GNU sed): sed -r 's/^(([^|]*\|){3})(..)(..)/\1\3:\4:/' file Use back references to group the first three fields and the following two pairs. Then format the 4th field as required.

Related

awk command to print columns with colum data

How to add N blank lines between all rows of a text file?

sed replace text between comma

How can I print only lines that are immediately preceeded by an empty line in a file using sed?

Append prefix to first column of a file with awk

Categories

Resources

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Inserting characters into specified fields in large files - awk

gawk to the rescue! $ awk -F\| -v OFS=\| '{$4=gensub(/(..)(..)(..)/,"\\1:\\2:\\3","g",$4)}1' file 001|1|3|05:34:12|16|1234||| 001|21|4|12:36:18|15|88||| otherwise you can do the same with substr($4,1,2)":"...

With sed: $ sed -r 's/([^|]*|[^|]*|[^|]*|)([0-9]{2})([0-9]{2})([0-9]{2})/\1\2:\3:\4/' file 001|1|3|05:34:12|16|1234||| 001|21|4|12:36:18|15|88|||

This might work for you (GNU sed): sed -r 's/^(([^|]*\|){3})(..)(..)/\1\3:\4:/' file Use back references to group the first three fields and the following two pairs. Then format the 4th field as required.

Related

awk command to print columns with colum data

How to add N blank lines between all rows of a text file?

sed replace text between comma

How can I print only lines that are immediately preceeded by an empty line in a file using sed?

Append prefix to first column of a file with awk

Categories

Resources

With sed: $ sed -r 's/([^|]|[^|]|[^|]*|)([0-9]{2})([0-9]{2})([0-9]{2})/\1\2:\3:\4/' file 001|1|3|05:34:12|16|1234||| 001|21|4|12:36:18|15|88|||