Awk or sed to prepend missing zeros in MAC addresses

I've got a file consisting of IPs and MAC address pairs and I need to pad the MAC addresses with zeros in each octet, but I don't want to change the IP. So this...
10.5.96.41 0:0:e:4c:b7:42
10.5.96.42 c4:f7:0:13:ef:32
10.5.96.43 0:e8:4c:60:2b:42
10.5.96.44 0:6a:bf:b:35:f1
Should get changed to this...
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
I tried sed 's/\b\(\w\)\b/0\1/g' but that produces:
10.05.96.41 00:00:0e:4c:b7:42
10.05.96.42 c4:f7:00:13:ef:32
10.05.96.43 00:e8:4c:60:2b:42
10.05.96.44 00:6a:bf:0b:35:f1
which is not desired because I only want to affect the MAC address portion.

Since you've tagged macos, I'm not sure if this will work for you. I tested it on GNU awk
$ awk '{gsub(/\<[0-9a-f]\>/, "0&", $2)} 1' ip.txt
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
awk is well suited to field processing: here the substitution is applied only to the second field.
But I see \b and \w in your sed command, so are you using GNU sed? If so:
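Where the GNU-only \< and \> word-boundary operators are not available (for instance with a non-GNU awk), the same field-only padding can be sketched with split(). This is a minimal sketch rather than part of the answer above: pad_mac is a hypothetical helper name, and it assumes the MAC address is always the second whitespace-separated field.

```shell
# pad_mac: hypothetical helper; zero-pads one-character octets in field 2 only.
pad_mac() {
  awk '{
    n = split($2, octet, ":")          # break the MAC into its octets
    mac = ""
    for (i = 1; i <= n; i++) {
      if (length(octet[i]) == 1)       # single hex digit: prepend a zero
        octet[i] = "0" octet[i]
      sep = (i > 1) ? ":" : ""
      mac = mac sep octet[i]
    }
    $2 = mac                           # the IP in field 1 is never touched
    print
  }' "$@"
}

printf '10.5.96.44 0:6a:bf:b:35:f1\n' | pad_mac   # 10.5.96.44 00:6a:bf:0b:35:f1
```

With no arguments the function reads standard input; with file names (e.g. pad_mac ip.txt) it reads those files.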
sed -E ':a s/( .*)(\b\w\b)/\10\2/; ta' ip.txt
With perl
$ perl -lane '$F[1] =~ s/\b\w\b/0$&/g; print join " ", @F' ip.txt
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
If you're feeling adventurous, tell the regex engine to skip the first field:
perl -pe 's/^\H+(*SKIP)(*F)|\b\w\b/0$&/g' ip.txt

With any sed that uses -E to support EREs, e.g. GNU sed or OSX/BSD (MacOS) sed:
$ sed -E 's/[ :]/&0/g; s/0([^:]{2}(:|$))/\1/g' file
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
and with any sed:
$ sed 's/[ :]/&0/g; s/0\([^:][^:]:\)/\1/g; s/0\([^:][^:]$\)/\1/' file
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1

This might work for you (GNU sed):
sed 's/\b.\(:\|$\)/0&/g' file
Prepend a 0 before any single character followed by a : or the end of line.
Other seds may use:
sed 's/\<.\(:\|$\)/0&/g' file

With GNU sed:
sed -E ':a;s/([ :])(.)(:|$)/\10\2\3/g;ta' file
with any sed:
sed ':a;s/\([ :]\)\(.\):/\10\2:/g;ta' file
Explanation (of the GNU version)
:a # a label called 'a', used as a jump target
; # command separator
s # substitute command ...
/([ :])(.)(:|$)/ # search for any single char which is enclosed by
# either two colons, a whitespace and a colon or
# a colon and the end of the line ($)
# Content between () will be matched in a group
# which is used in the replacement pattern
\10\2\3 # replacement pattern: group1 \1, a zero, group2 and
# group3 (see above)
/g # replace as often as possible
; # command separator
ta # jump back to a if the previous s command replaced
# something (see below)
The loop (the label a plus the ta command) is needed because, within a single pass, sed will not match text that an earlier match has already consumed. That happens here, for example, in the first line:
0:0
When the above pattern is applied, sed replaces
<space>0: by <space>00: <- colon
That colon has already been consumed, so it cannot also match as the leading : of the second zero. Hence the loop, which repeats until nothing is left to replace.
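The effect is easy to see by running the same substitution once and then in a loop on the first sample line. A small sketch; pad_once and pad_loop are hypothetical names, and the loop is written with separate -e expressions on the assumption that this form is accepted by both GNU and BSD sed.

```shell
line='10.5.96.41 0:0:e:4c:b7:42'

# One pass with /g: the colon consumed by the first match cannot start the next.
pad_once() { sed -E 's/([ :])(.)(:|$)/\10\2\3/g'; }

# Same substitution, repeated via the label/branch loop until nothing changes.
pad_loop() { sed -E -e ':a' -e 's/([ :])(.)(:|$)/\10\2\3/g' -e 'ta'; }

printf '%s\n' "$line" | pad_once   # 10.5.96.41 00:0:0e:4c:b7:42
printf '%s\n' "$line" | pad_loop   # 10.5.96.41 00:00:0e:4c:b7:42
```

The single pass leaves the second octet as a bare 0; the loop picks it up on its second iteration.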

A succinct and precise solution for GNU sed:
sed -Ee 's/\b[0-9a-f](:|$)/0&/gi' file
(On macOS, I recommend installing gsed using brew install gnu-sed.)

A very circuitous and verbose solution, to cope with mawk not having regex back-references. The approach is to pre-pad every slot with extra zeros, then trim the excess:
nawk ' sub(".+","\5:&:\3", $NF)^_ + gsub(":", "&00") + \
gsub("[0-9A-Fa-f]{2}:","\6&") + gsub("[^:]*\6|\5:|:00\3$",_)'
mawk ' sub("^", "\5:", $NF)^_ + gsub(":", "&00") + \
gsub("[^ :][^ :](:|$)", "\6&") + gsub("[^:]*\6|\5:",_)'
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
To do it the proper gawk gensub() way, two calls to gensub() are needed, because a single call ended up missing a few octets:
gawk -be 'BEGIN { ___ *= __ = "([ :\t])([[:xdigit:]]?)(:|$)"
_="\\10\\2\\3" } $___ = gensub(__, _, "g", gensub(__, _, "g"))'

Related

How can I search for a dot and a number in sed or awk and prefix the number with a leading zero?

I am trying to modify the name of a large number of files, all of them with the following structure:
4.A.1 Introduction to foo.txt
2.C.3 Lectures on bar.pdf
3.D.6 Processes on baz.mp4
5.A.8 History of foo.txt
And I want to add a leading zero to the last digit:
4.A.01 Introduction to foo.txt
2.C.03 Lectures on bar.pdf
3.D.06 Processes on baz.mp4
5.A.08 History of foo.txt
At first I am trying to get the new names with sed (FreeBSD implementation):
ls | sed 's/\.[0-9]/0&/'
But I get the zero before the .
Note: replacing the second dot would also work. I am also open to use awk.
While it may have worked for you here, in general slicing and dicing ls output is fragile, whether using sed or awk or anything else. Fortunately one can accomplish this robustly in plain old POSIX sh using globbing and fancy-pants parameter expansions:
for f in [[:digit:]].[[:alpha:]].[[:digit:]]\ ?*; do
# $f = "[[:digit:]].[[:alpha:]].[[:digit:]] ?*" if no files match.
if [ "$f" != '[[:digit:]].[[:alpha:]].[[:digit:]] ?*' ]; then
tail=${f#*.*.} # filename sans "1.A." prefix
head=${f%"$tail"} # the "1.A." prefix
mv "$f" "${head}0${tail}"
fi
done
(EDIT: Filter out filenames that don't match desired format.)
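To see what the two parameter expansions produce before renaming anything, the name computation can be pulled out into a function. A minimal sketch; padded_name is a hypothetical name, and it assumes file names of the form digit.letter.digit followed by the rest of the name.

```shell
# padded_name: hypothetical helper; prints the new name without touching any file.
padded_name() {
  f=$1
  tail=${f#*.*.}        # strip the shortest "x.y." prefix, e.g. "1 Introduction to foo.txt"
  head=${f%"$tail"}     # the part that was stripped, e.g. "4.A."
  printf '%s\n' "${head}0${tail}"
}

padded_name '4.A.1 Introduction to foo.txt'   # 4.A.01 Introduction to foo.txt
```

Printing the result first makes it possible to review the planned renames before running the loop with mv.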
This pipeline should work for you:
ls | sed 's/\.\([0-9]\)/.0\1/'
The sed command here will capture the digit and replace it with a preceding 0.
Here, \1 references the first (and in this case only) capture group - the parenthesized expression.
I am also open to use awk.
Let file.txt content be:
4.A.1 Introduction to foo.txt
2.C.3 Lectures on bar.pdf
3.D.6 Processes on baz.mp4
5.A.8 History of foo.txt
then
awk 'BEGIN{FS=OFS="."}{$3="0" $3;print}' file.txt
outputs
4.A.01 Introduction to foo.txt
2.C.03 Lectures on bar.pdf
3.D.06 Processes on baz.mp4
5.A.08 History of foo.txt
Explanation: I set dot (.) as both the field separator and the output field separator, then for every line I add a leading 0 to the third column ($3) by concatenating 0 and that column. Finally I print the altered line.
(tested in GNU Awk 5.0.1)
This might work for you (GNU sed):
sed 's/^\S*\./&0/' file
This appends 0 after the last . in the first run of non-space characters in each line.
In case it helps somebody else, as an alternative to @costaparas's answer:
ls | sed -E -e 's/^([0-9][.][A-Z][.])/\10/' > files
Then, to generate the script that renames the files:
cat files | awk '{printf "mv \"%s\" \"%s\"\n", $0, $0}' | sed 's/\.0/\./' > movefiles.sh

How to extract something occurring after some common paths?

I want to extract whatever occurs right after some common paths. For example, print the next word that occurs after pytests/ or after src/.
for "src/cs-test/test_bugcheck_0001.py"
awk -F"/" '{print $2}' works
for "metadata/pytests/ipa-cert.yaml"
awk -F"/pytests/" '{print $2}' | awk -F"." '{print $1}' works
But I want to have these in one awk statement.
metadata/pytests/ipa-cert.yaml
src/cs-test/test_bugcheck_0001.py
Expected result:
ipa-cert
cs-test
I suggest using
sed -E 's,^(.*/pytests/|[^/]+/)([^/.]+).*,\2,' file > newfile
See the online sed demo and the regex demo (not proof).
POSIX ERE pattern details
^ - start of line
(.*/pytests/|[^/]+/) - Group 1: either of the two alternatives:
.*/pytests/ - any 0+ chars as many as possible and then /pytests/ string
| - or
[^/]+/ - a negated bracket expression matching 1+ chars other than / and then a /
([^/.]+) - Group 2: a negated bracket expression matching 1 or more chars other than / and .
.* - any 0 or more chars up to the line end.
The , chars are used as delimiters in the sed command so as not to overescape the pattern that has many / chars.
Simple substitutions on individual strings is what sed is designed to do. With GNU or OSX/BSD sed for -E:
$ sed -E 's:(^|.*/)(pytests|src)/([^/.]+).*:\3:' file
ipa-cert
cs-test
or if you really want to use awk for some reason then with GNU awk for gensub():
$ awk '{print gensub(/(^|.*\/)(pytests|src)\/([^/.]+).*/,"\\3",1)}' file
ipa-cert
cs-test
and with any awk:
$ awk 'match($0,/(^|.*\/)(pytests|src)\/[^/.]+/){$0=substr($0,1,RLENGTH); sub(/.*\//,"")} 1' file
ipa-cert
cs-test
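If a single regex feels opaque, the rule "print the path component that follows pytests or src, minus any extension" can also be sketched by splitting each line on /. This is a different technique from the match()-based command above, not a restatement of it; next_component is a hypothetical name, and unlike the greedy regexes it stops at the first marker on a line.

```shell
# next_component: hypothetical helper; works with any POSIX awk.
next_component() {
  awk '{
    n = split($0, part, "/")
    for (i = 1; i < n; i++)
      if (part[i] == "pytests" || part[i] == "src") {
        name = part[i + 1]
        sub(/\..*/, "", name)     # drop everything from the first dot on
        print name
        next                      # only the first marker on a line counts
      }
  }' "$@"
}

printf '%s\n' metadata/pytests/ipa-cert.yaml src/cs-test/test_bugcheck_0001.py |
  next_component
```

For the two sample paths this prints ipa-cert and cs-test; lines containing neither marker produce no output.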

I need to extract the ID from Google Drive URLs with sed, gawk or grep

URLs:
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
I need a single regex for all of these URLs.
This is what I tried to use but didn't get expected results.
sed -E 's/.*\(folders\)?\(id\)?=?\/?(.*)&?.*/\1/'
Expected results:
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
With your own code updated:
$ cat file
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
$ sed -E 's#.*(folders/|id=)([^?&]+).*#\2#' file
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
$ sed -E 's#.*(folders/|id=)([^?&]+).*#\2#' file | uniq
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
And yours updated to sed -E 's#.*(folders/|id=)([^?&]*)(\?|&|$).*#\2#' would work on GNU sed (the ID group must exclude ? and &, because a greedy .* would run on to the end of the line).
You are using -E, so there is no need to escape the grouping parentheses (), and | means OR.
When matching a literal ?, you need to escape it.
The separator of the s command can be changed to another character, which is # here.
Note that uniq only removes adjacent duplicates; if there are duplicates in different places, use sort -u instead.
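sort -u also reorders the IDs. If first-seen order matters, duplicates anywhere in the stream can instead be dropped with a small awk filter. A sketch under that assumption; extract_ids is a hypothetical name wrapping the sed command shown above.

```shell
# extract_ids: hypothetical helper; prints each ID once, in order of first appearance.
extract_ids() {
  sed -E 's#.*(folders/|id=)([^?&]+).*#\2#' "$@" |
    awk '!seen[$0]++'    # print a line only the first time it appears
}

printf '%s\n' \
  'https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download' \
  'https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py' \
  'https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing' |
  extract_ids
```

For the three sample URLs this prints the two distinct IDs, with 0B3X9GlR6EmbnQ0FtZmJJUXEyRTA first.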
A GNU grep solution :
$ grep -Poi '(id=|folders/)\K[a-z0-9_-]*' file
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
These two also give the same results, and are more precise than the shorter sed command above:
sed -E 's#.*(folders/|id=)([A-Za-z0-9_-]*).*#\2#'
sed -E 's#.*(folders/|id=)([[:alnum:]_-]*).*#\2#'
Btw, + means one or more occurrences, * means zero or more.
A GNU awk version (removes duplicates at the same time):
awk 'match($0,".*(folders/|id=)([A-Za-z0-9_-]+)",m){if(!a[m[2]]++)print m[2]}' file
Could you please try the following:
awk 'match($0,/uc\?id=[^&]*|folders\/[^?]*/){value=substr($0,RSTART,RLENGTH);gsub(/.*=|.*\//,"",value);print value}' Input_file
Try this:
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file
Explanations:
.*(id=|folders\/): after any characters(.*) followed by id= or folders/
([^&?/]*): search and capture any characters except &, ? and /
\2: using backreference, matching string is replaced with the second captured text([^&?/]*)
Edit:
To remove duplicate IDs, pipe the command to sort and then to uniq (uniq only removes adjacent duplicate lines, so the list has to be sorted first):
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file | sort | uniq
As @Tiw suggests in an edit, you can also pipe to a single command by using sort with the -u flag:
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file | sort -u
Using Perl
$ cat rohit.txt
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
$ perl -lne ' s/.*\/.*..\/(.*)$/$1/g; s/(.*id=)//g; /(.+?)(&|\?|$)/ and print $1 ' rohit.txt
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
$

Inserting characters into specified fields in large files

Here is my question. I have several hundred files that are too large to edit utilizing vi editor. I'm looking for a possible awk or sed command to manipulate my files. Bit of a rookie. I have a simplified file:
001|1|3|053412|16|1234|||
001|21|4|123618|15|88|||
The files were created, with the fourth field being in the wrong format.
The fourth field should be 05:34:12 reflecting HH:MM:SS. The time values are correct, I just need to insert the : in the appropriate places.
How do I insert the colons after the second character and the fourth characters in the fourth field? I cannot do it by character count since the values before and after the fourth field are variable.
gawk to the rescue!
$ awk -F\| -v OFS=\| '{$4=gensub(/(..)(..)(..)/,"\\1:\\2:\\3","g",$4)}1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
otherwise you can do the same with substr($4,1,2)":"...
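Spelled out, the substr() variant hinted at above might look like the following. It is a sketch that avoids gensub() so it should run on any POSIX awk; add_colons is a hypothetical name, and the length check is an added assumption so that fields that are not exactly six characters long are left alone.

```shell
# add_colons: hypothetical helper; rewrites HHMMSS in field 4 as HH:MM:SS.
add_colons() {
  awk 'BEGIN { FS = OFS = "|" }
       length($4) == 6 {
         $4 = substr($4, 1, 2) ":" substr($4, 3, 2) ":" substr($4, 5, 2)
       }
       { print }' "$@"
}

printf '001|1|3|053412|16|1234|||\n' | add_colons   # 001|1|3|05:34:12|16|1234|||
```

Because the output goes to standard output, the original files stay untouched until the result has been checked.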
With GNU awk for gensub() and inplace editing
awk -i inplace 'BEGIN{FS=OFS="|"} {$4=gensub(/(..)(..)/,"\\1:\\2:",1,$4)} 1' *
Similarly with GNU sed for EREs and inplace editing:
sed -i -E 's/(([^|]*\|){3}..)(..)/\1:\3:/' *
e.g.:
$ awk 'BEGIN{FS=OFS="|"} {$4=gensub(/(..)(..)/,"\\1:\\2:",1,$4)} 1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
$ sed -E 's/(([^|]*\|){3}..)(..)/\1:\3:/' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
With sed:
$ sed -r 's/^([^|]*\|[^|]*\|[^|]*\|)([0-9]{2})([0-9]{2})([0-9]{2})/\1\2:\3:\4/' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
This might work for you (GNU sed):
sed -r 's/^(([^|]*\|){3})(..)(..)/\1\3:\4:/' file
Use back references to group the first three fields and the following two pairs. Then format the 4th field as required.
Try this awk!
awk -F"|" -v OFS="|" '{r=split($4,T,"");for(i=2;i<=r;i+=2){if(i!=r)T[i]=T[i]":"}tmp="";for(i=1;i<=r;i++){tmp=tmp T[i]}$4=tmp;}1' file
001|1|3|05:34:12|16|1234|||
001|21|4|12:36:18|15|88|||
Longer, with explanation:
BEGIN{
FS=OFS="|"; #Field separator and output field separator
}
{
tmp="";
r=split($4,time_field,""); # Chunk field into pieces
for(i=2;i<=r;i+=2) # Loop two by two
{
if(i!=r)
{
time_field[i]=time_field[i]":"; # Add ":"
}
}
for(i=1;i<=r;i++) # Loop over again to rebuild
{
tmp=tmp time_field[i];
}
$4=tmp; # rebuild field
print
}
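How you could use it in bash: save the script as whatever.awk, then write one output file per input file (a fixed output name would be overwritten on every iteration):
while IFS='' read -r file
do
awk -f whatever.awk "$file" > "$file.new"
done < list_of_files_to_edit.txt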
If you want to edit the files in place, you can add the -i option to Kenavoz's sed command: sed -ri ...

How to work with literal square bracket using awk and foreach iterations

I have a file named mapstring. Because my patterns contain the [ character, my script is not working. Please help me find a solution to this.
Content of mapstring
BC1 bc1
BC2 bc2
BAD_BIT[0] badl0
BAD_BIT[1] badlleftnr
I am using the following script to replace patterns in the file testfile
Content of script
foreach cel (`cat mapstring |awk '{print $1}'`)
echo $cel
grep -wq $cel testfile
if( $status == 0 ) then
set var2 = `grep -w $cel rajeshmap |awk '{print $2}'`
sed -i "s% ${cel} % ${var2} %g" testfile
endif
end
Content of testfile
rajesh jain BAD_BIT[0] 1234 BAD_BIT[1000]
jain rajesh DA[0] snps
raj jain CLK stm
That's because square brackets are reserved in sed's basic regex syntax.
You'll have to escape them (and any other special characters in fact) using backslashes (i.e. \[) before using them later in your script; this can itself be done with sed, e.g.:
sed -re 's/(\[|\])/\\\1/g'
(note that using extended regexes in sed (-r) can make this easier).
Your script is rather inefficient anyhow. You can simply get rid of csh entirely (along with the useless cat and the other stylistic problems), and do this with two connected sed scripts.
sed 's/[][*\\%]/\\&/g;s/\([^ ]*\) *\(.*\)/s%\1%\2%g/' mapstring |
sed -i -f - testfile
This is assuming your sed can accept a script on standard input (-f -) and that your sed dialect does not understand any additional special characters which need to be escaped.
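If your sed cannot read a script from standard input, the generated commands can go through a temporary file instead. This is a sketch of that variation only: apply_map is a hypothetical name, mktemp is assumed to be available (it is common but not required by POSIX), and the result is written to standard output rather than edited in place.

```shell
# apply_map: hypothetical helper. $1 = map file (old new), $2 = file to rewrite.
apply_map() {
  script=$(mktemp) || return 1
  # Escape regex-special characters, then turn "old new" into "s%old%new%g".
  sed 's/[][*\\%]/\\&/g;s/\([^ ]*\) *\(.*\)/s%\1%\2%g/' "$1" > "$script"
  sed -f "$script" "$2"
  rm -f "$script"
}

# Example call, assuming the two files from the question exist:
# apply_map mapstring testfile > testfile.new
```

As with the pipeline above, the generated commands replace every occurrence of a pattern, not only space-delimited words, so BAD_BIT[1000] is left alone only because it does not contain BAD_BIT[1] as a substring.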
#!/bin/ksh
# or sh
sed 's/[[\\$^&.+*]/\\&/g' mapstring | while read -r OldCel NewCel
do
echo ${OldCel}
sed -i "/${OldCel}/ {
s/.*/ & /;s% ${OldCel} % ${NewCel} %g;s/.\\(.*\\)./\\1/
}" testfile
done
Pre-escape your cel values before using them in a sed command (you could add other special characters, such as { ( ), if they occur and depending on the options given to sed).
Try something like the script above (I cannot test it; no GNU sed is available here).
Following the good remark from @tripleee: this needs a different shell from the one used in the question, so the script has been adapted accordingly.