Substring in awk using gsub regular expression

The requirement is very simple, I feel.
Input string format:
DTC_SubrProfile_20141205230707.unl
Required output format:
SubrProfile
That is, "DTC_" and "_20141205230707.unl" should be removed from the input string.
Is there a possible way to achieve this using awk's gsub?

Through GNU sed (\| alternation in a basic regular expression is a GNU extension),
$ echo 'DTC_SubrProfile_20141205230707.unl' | sed 's/^[^_]*_\|_.*//g'
SubrProfile
Through awk,
$ echo 'DTC_SubrProfile_20141205230707.unl' | awk '{gsub(/^[^_]*_|_.*/,"")}1'
SubrProfile
The above commands remove everything from the start of the string up to and including the first underscore, and then remove everything from the next _ to the end of the remaining string.
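A minimal sketch that wraps the awk gsub() one-liner in a shell function so several names can be processed in one call. The function name strip_affixes and the second sample name are hypothetical, and it assumes every name has the prefix_middle_suffix shape from the question with no underscore inside the middle part.

```shell
# strip_affixes (hypothetical helper): for each argument, remove everything up to
# and including the first "_" and everything from the next "_" to the end.
strip_affixes() {
    for name in "$@"; do
        printf '%s\n' "$name"      # printf, not echo: safe for "-" or "\" in names
    done | awk '{ gsub(/^[^_]*_|_.*/, "") } 1'
}

# Usage (the second name is made-up sample data):
strip_affixes 'DTC_SubrProfile_20141205230707.unl' 'DTC_Orders_20141206010101.unl'
# SubrProfile
# Orders
```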
$ echo 'DTC_SubrProfile_20141205230707.unl' | awk -F'_' '{print $2}'
SubrProfile
The above awk prints the second field, using _ as the input field separator.
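Both the gsub() and the -F'_' versions stop at the second underscore, so a middle part such as Subr_Profile (a hypothetical name, not from the question) would be cut down to Subr. If that can happen, here is a sketch that strips only up to the first underscore and from the last underscore onward; the function name middle_part is hypothetical.

```shell
# middle_part (hypothetical helper): remove the text up to the first "_" and
# from the last "_" to the end, so underscores inside the middle survive.
middle_part() {
    printf '%s\n' "$1" | awk '{ sub(/^[^_]*_/, ""); sub(/_[^_]*$/, ""); print }'
}

middle_part 'DTC_SubrProfile_20141205230707.unl'    # SubrProfile
middle_part 'DTC_Subr_Profile_20141205230707.unl'   # Subr_Profile
```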

Through cut:
echo "DTC_SubrProfile_20141205230707.unl" | cut -d _ -f2

Related

How can I use three blanks as a delimiter?

I need to set the delimiter from a : to three blanks.
So far my code is as follows:
cut -d: -f1,6,7 test | tr : ' '
Don't mind anything before the pipe; that is just for formatting the file "test". It replaces the delimiter with one blank, but not three. How do I accomplish this?
This should work:
awk -F: -v OFS="   " '{print $1,$6,$7}' test
-F sets the input field separator to a colon, -v sets the output field separator (OFS) to three spaces, and the awk body then prints the desired fields.
The sed option would be:
cut -d: -f1,6,7 test | sed 's/:/   /g'
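A minimal sketch of both answers side by side. Since the contents of the file test are not shown, it assumes an /etc/passwd-style line; the function names and the sample line are hypothetical.

```shell
# join3_awk / join3_cut (hypothetical helpers): print fields 1, 6 and 7 of a
# ":"-separated line, joined by three spaces.
join3_awk() {
    # -F: splits input on ":"; OFS is placed between the fields given to print.
    printf '%s\n' "$1" | awk -F: -v OFS='   ' '{ print $1, $6, $7 }'
}
join3_cut() {
    # cut keeps ":" between the selected fields; sed widens each to three spaces.
    printf '%s\n' "$1" | cut -d: -f1,6,7 | sed 's/:/   /g'
}

line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'   # made-up sample line
join3_awk "$line"   # alice   /home/alice   /bin/bash
join3_cut "$line"   # alice   /home/alice   /bin/bash
```

The original tr attempt cannot work, because tr maps one character to one character, so ":" can only ever become a single blank.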

AWK get specific pattern

I have lines like this:
Volume.Free_IBM_LUN59_28D: 2072083693568
I would like to get only IBM_LUN59_28D from this line using awk.
Thanks
You can use sub to do substitutions on each input line, as per the following transcript:
pax> echo 'Volume.Free_IBM_LUN59_28D: 2072083693568' | awk '
...> {
...> sub (".*Free_", "");
...> sub (":.*", "");
...> print
...> }'
IBM_LUN59_28D
That command crosses multiple lines for readability but, if you're operating on a file and not too concerned about readability, you can just use the compressed version:
awk '{sub(".*Free_","");sub(":.*","");print}' inputFile
If you're amenable to non-awk solutions, you could also use sed:
sed -e 's/.*Free_//' -e 's/:.*//' inputFile
Note that both those solutions rely on your (somewhat sparse) test data. If your definition of "like" includes preceding textual segments other than Free_ or subsequent characters other than :, some more work may be needed.
For example, if you wanted the string between the first _ and the first :, you could use:
awk '{sub("[^_]*_","");sub(":.*","");print}'
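A minimal sketch of that more general version used as a filter over several lines. The function name lun_names and the second sample line are hypothetical.

```shell
# lun_names (hypothetical helper): on each line, delete the leftmost run of
# non-"_" characters plus the "_" after it, then delete from the first ":" on.
lun_names() {
    awk '{ sub(/[^_]*_/, ""); sub(/:.*/, ""); print }'
}

printf '%s\n' 'Volume.Free_IBM_LUN59_28D: 2072083693568' \
              'Volume.Free_IBM_LUN60_29A: 1024' | lun_names
# IBM_LUN59_28D
# IBM_LUN60_29A
```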
With sed:
sed 's/[^_]*_\(.*\):.*/\1/'
Search for a sequence of non-_ characters followed by _ (this matches Volume.Free_), then another sequence of characters (this matches IBM_LUN59_28D; we group it for later use), followed by : and any character sequence. Substitute with the saved group (\1). That's it.
Sample:
$ echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | sed 's/[^_]*_\(.*\):.*/\1/'
IBM_LUN59_28D
Here is one awk solution:
awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
Example:
echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
IBM_LUN59_28D
It splits the line on Free_.
If the line then has more than one field (NF>1), it splits the second field on : and prints the first part, a[1].
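A minimal sketch showing what the NF>1 guard buys: a line that does not contain Free_ has only one field, so it is skipped instead of printing an empty line. The function name free_names and the Volume.Total line are hypothetical sample input.

```shell
# free_names (hypothetical helper): split each line on "Free_"; when that text
# occurs (NF > 1), print the part of field 2 that comes before the first ":".
free_names() {
    awk -F'Free_' 'NF > 1 { split($2, a, ":"); print a[1] }'
}

printf '%s\n' 'Volume.Total: 4096' \
              'Volume.Free_IBM_LUN59_28D: 2072083693568' | free_names
# IBM_LUN59_28D
```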
With awk:
echo "$val" | awk -F: '{print $1}' | awk -F. '{print $2}' | awk '{print substr($0,6)}'
where the given string is in $val.

Why can't I use the string "?B?" as a delimiter in awk?

By running the following, I get the string "utf-8" as the result.
I thought this command would return the string "tralala":
echo "=?utf-8?B?tralala" | awk -F "?B?" '{print $2 }'
Why is that?
What delimiter should I use in order to get the string "tralala"?
? is a regex metacharacter that means zero or one occurrence of the preceding atom. (I'm surprised awk didn't complain about the one at the start.)
Try echo "=?utf-8?B?tralala" | awk -F '\\?B\\?' '{print $2 }' instead.
Awk delimiters are NOT strings; they are "field separators" (hence the variable name FS), which are a type of Extended Regular Expression with some additional features (e.g. a single blank character as the field separator, when not inside square brackets, means: separate fields on every run of contiguous white space and ignore leading and trailing white space on each record).
The differences between a string, a regular expression, and a field separator are very important to be aware of. You sometimes also see the word "pattern" used; do not use that term, as it has no (or too many possible) meanings.
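A small sketch of the single-blank special case, comparing the default FS (a single blank) with a bracketed blank [ ], which is just an ordinary one-character regular expression. The function names are hypothetical.

```shell
# nf_default (hypothetical): NF with the default FS, a single blank, where runs
# of blanks separate fields and leading/trailing blanks are ignored.
nf_default() { printf '%s\n' "$1" | awk '{ print NF }'; }

# nf_bracket (hypothetical): NF with FS "[ ]", where each single space is a
# separator, so leading and repeated spaces produce empty fields.
nf_bracket() { printf '%s\n' "$1" | awk -F'[ ]' '{ print NF }'; }

nf_default '  a   b'   # 2
nf_bracket '  a   b'   # 6
```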
A ? is an RE metacharacter so you need to tell awk not to treat it as such in your case by either of these methods:
$ echo "=?utf-8?B?tralala" | awk -F '[?]B[?]' '{print $2}'
tralala
$ echo "=?utf-8?B?tralala" | awk -F '\\?B\\?' '{print $2}'
tralala
You don't strictly need to do that for the first ?, as its metacharacter functionality is not applicable when it's the first char in an RE:
$ echo "=?utf-8?B?tralala" | awk -F '?B[?]' '{print $2}'
tralala
$ echo "=?utf-8?B?tralala" | awk -F '?B\\?' '{print $2}'
tralala
but IMHO it's best to do it anyway for clarity and future-proofing.
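If the separator is always a fixed string, another option is to avoid regular expressions altogether: index() does a plain substring search, so ? needs no escaping. A minimal sketch; the function name split_literal is hypothetical, and the separator is passed through ENVIRON rather than -v because -v processes backslash escapes in its value.

```shell
# split_literal SEP N (hypothetical helper): print field N of each input line,
# where fields are separated by the literal string SEP (no regex involved).
split_literal() {
    SEP=$1 N=$2 awk '
    BEGIN { sep = ENVIRON["SEP"]; n = ENVIRON["N"] + 0 }
    {
        rest = $0
        # Skip past the first n-1 occurrences of sep.
        for (i = 1; i < n && rest != ""; i++) {
            p = index(rest, sep)
            rest = p ? substr(rest, p + length(sep)) : ""
        }
        # Field n ends at the next occurrence of sep, or at end of line.
        p = index(rest, sep)
        print (p ? substr(rest, 1, p - 1) : rest)
    }'
}

echo "=?utf-8?B?tralala" | split_literal '?B?' 2   # tralala
echo "=?utf-8?B?tralala" | split_literal '?B?' 1   # =?utf-8
```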

awk to transpose lines of a text file

A .csv file that has lines like this:
20111205 010016287,1.236220,1.236440
It needs to read like this:
20111205 01:00:16.287,1.236220,1.236440
How do I do this in awk? Experimenting, I got this far. I think I need to do it in two passes: one sub to read the date & time field, and the next to change it.
awk -F, '{print;x=$1;sub(/.*=/,"",$1);}' data.csv
Use this awk command:
echo "20111205 010016287,1.236220,1.236440" | \
awk -F'[ ,]' '{printf "%s %s:%s:%s.%s,%s,%s\n", \
$1,substr($2,1,2),substr($2,3,2),substr($2,5,2),substr($2,7,3),$3,$4}'
Explanation:
-F'[ ,]': sets the field separator to either a space or a comma
printf "%s %s:%s:%s.%s,%s,%s\n": formats the output
substr($2,1,2), substr($2,3,2), ...: cut the second field ($2) into the desired pieces
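The -F'[ ,]' version rebuilds the whole line from $1 to $4, so it assumes exactly four space/comma-separated pieces per line. A sketch that splits only on commas and rewrites just the first field, leaving any further fields alone; the function name fix_time is hypothetical, and it assumes the time part is always nine digits (HHMMSSmmm).

```shell
# fix_time (hypothetical helper): rewrite "YYYYMMDD HHMMSSmmm" in the first
# comma-separated field as "YYYYMMDD HH:MM:SS.mmm"; other fields pass through.
fix_time() {
    awk -F, -v OFS=, '{
        split($1, dt, " ")    # dt[1] = date, dt[2] = HHMMSSmmm
        t = dt[2]
        # Assigning $1 rebuilds $0 with OFS (",") between the fields.
        $1 = dt[1] " " substr(t, 1, 2) ":" substr(t, 3, 2) ":" substr(t, 5, 2) "." substr(t, 7, 3)
        print
    }' "$@"
}

echo "20111205 010016287,1.236220,1.236440" | fix_time
# 20111205 01:00:16.287,1.236220,1.236440
```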
Or use this sed command:
echo "20111205 010016287,1.236220,1.236440" | \
sed 's/\([0-9]\{8\}\) \([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1 \2:\3:\4.\5/g'
Explanation:
[0-9]\{8\}: first match an 8-digit group and save it as \1
[0-9]\{2\}...: after a space, match a 2-digit group three times and save them as \2, \3 and \4
[0-9]\{3\}: finally, match a 3-digit group and save it as \5
\1 \2:\3:\4.\5: format the output
sed is better suited to this job since it's a simple substitution on single lines:
$ sed -r 's/( ..)(..)(..)/\1:\2:\3./' file
20111205 01:00:16.287,1.236220,1.236440
but if you prefer, here's GNU awk with gensub():
$ awk '{print gensub(/( ..)(..)(..)/,"\\1:\\2:\\3.","")}' file
20111205 01:00:16.287,1.236220,1.236440
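gensub() is a GNU awk extension, so that command will not run under mawk or BSD awk. A sketch of a POSIX-awk equivalent using match(), RSTART and substr(); the function name is hypothetical, and the nine [0-9] brackets are spelled out because some older awks do not support {n} interval expressions.

```shell
# fix_time_posix (hypothetical helper): find " " followed by nine digits and
# splice ":" ":" "." into them, using only POSIX awk functions.
fix_time_posix() {
    awk '{
        if (match($0, / [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/)) {
            s = RSTART   # position of the space before HHMMSSmmm
            $0 = substr($0, 1, s + 2) ":" substr($0, s + 3, 2) ":" \
                 substr($0, s + 5, 2) "." substr($0, s + 7)
        }
        print
    }' "$@"
}

echo "20111205 010016287,1.236220,1.236440" | fix_time_posix
# 20111205 01:00:16.287,1.236220,1.236440
```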

How to preserve spaces in input fields with awk

I'm trying to do something pretty simple, but it appears more complicated than expected...
I have comma-separated lines in a text file that I want to output to another file, without the first field.
Input:
echo file1,item,   12345678 | awk -F',' '{OFS = ";";$1=""; print $0}'
Output:
;item; 12345678
As you can see, the spaces before 12345678 are kind of merged into one space only.
I also tried with the cut command:
echo file1,item,   12345678 | cut -d, -f2-
and I ended up with the same result.
Is there any workaround to handle this?
Actually my entire script is as follows:
cat myfile | while read l_line
do
l_line="'$l_line'"
v_OutputFile=$(echo $l_line | awk -F',' '{print $1}')
echo $(echo $l_line | cut -d, -f2-) >> ${v_OutputFile}
done
But in l_line, all spaces but one are still removed. I also put the quotes inside the file, but got the same result.
It has nothing to do with awk. Quote the string in your echo:
#with quotes
kent$ echo 'a,b,   c'|awk -F, -v OFS=";" '{$1="";print $0}'
;b;   c
#without quotes
kent$ echo a,b,   c|awk -F, -v OFS=";" '{$1="";print $0}'
;b; c
The problem is with your invocation of the echo command you're using to feed awk the test data above. The shell is looking at this command:
echo file1,item,   12345678
and treating file1,item, and 12345678 as two separate parameters to echo. echo just prints all its parameters, separated by one space.
If you were to quote the whitespace, as follows:
echo 'file1,item,   12345678'
the shell would interpret this as a single parameter to feed to echo, so you'd get the expected result.
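If you want to keep the shell loop from the question, here is a minimal sketch of it with the quoting fixed: IFS= and read -r keep leading blanks and backslashes, and every expansion is double-quoted so runs of spaces reach the output file intact. The function name split_by_first_field is hypothetical; the variable names follow the question.

```shell
# split_by_first_field (hypothetical helper): append everything after the first
# comma of each line of "$1" to a file named by the text before that comma.
split_by_first_field() {
    while IFS= read -r l_line; do
        v_OutputFile=${l_line%%,*}                          # before first comma
        printf '%s\n' "${l_line#*,}" >> "$v_OutputFile"     # after first comma
    done < "$1"
}
```

The two parameter expansions replace the awk and cut calls, and no extra quotes need to be added to l_line. Note that a last line without a trailing newline is not processed by read in this form.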
Update after the edit to the OP: having seen your full script, you could do this entirely in awk:
awk -F, '{ OFS = "," ; f = $1 ; sub("^[^,]*,","") ; print $0 >> f }' myfile
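A sketch of that one-liner run against a small, made-up myfile in a temporary directory (the output names out1 and out2 are hypothetical). Because sub() edits $0 directly, the text after the first comma, including its runs of spaces, is printed exactly as read.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Made-up input: the first field names the output file.
printf '%s\n' 'out1,item,   12345678' 'out2,other,  42' 'out1,more,x' > myfile

# The answer's one-liner: f = first field, then delete it and its comma from $0.
awk -F, '{ OFS = "," ; f = $1 ; sub("^[^,]*,","") ; print $0 >> f }' myfile

cat out1
# item,   12345678
# more,x
cat out2
# other,  42
```

Since it appends with >>, running it twice doubles the contents of the output files, so remove them first when re-running.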