I know I can use sed 's/[[:blank:]]/,/g' to convert blanks into commas (or anything of my choosing) in my file, but is there a way to convert only the first 5 instances of whitespace into commas?
This is because my last column contains free text with spaces, so it is annoying when sed converts all the spaces in that column into commas.
Sample input file:
sample1 gi|11| 123 33 97.23 This is a sentence
sample2 gi|22| 234 33 97.05 Keep these spaces
And the output I was looking for:
sample1,gi|11|,123,33,97.23,This is a sentence
sample2,gi|22|,234,33,97.05,Keep these spaces
Only the first 5 runs of whitespace are changed to a comma.
With GNU awk for the 3rd arg to match():
$ awk '{ match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,",",a[1]); print a[1] a[3] }' file
sample1,gi|11|,123,33,97.23,This is a sentence
sample2,gi|22|,234,33,97.05,Keep these spaces
but I'd recommend you actually turn it into valid CSV (i.e. one that conforms to RFC 4180), such as could be read by MS-Excel and other tools, since "This is a sentence" (and possibly other fields) can presumably contain commas and double quotes:
$ awk '{
gsub(/"/,"\"\"");
match($0,/((\S+\s+){5})(.*)/,a)
gsub(/\s+/,"\",\"",a[1])
print "\"" a[1] a[3] "\""
}' file
"sample1","gi|11|","123","33","97.23","This is a sentence"
"sample2","gi|22|","234","33","97.05","This is a sentence"
For example given this input:
$ cat file
sample1 gi|11| 123 33 97.23 This is a sentence
a,b,sample2 gi|22| 234 33 97.05 This is, "typically", a sentence
The output from the first script is not valid CSV:
$ awk '{ match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,",",a[1]); print a[1] a[3] }' file
sample1,gi|11|,123,33,97.23,This is a sentence
a,b,sample2,gi|22|,234,33,97.05,This is, "typically", a sentence
while the output from the 2nd script IS valid CSV:
$ awk '{ gsub(/"/,"\"\""); match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,"\",\"",a[1]); print "\"" a[1] a[3] "\"" }' file
"sample1","gi|11|","123","33","97.23","This is a sentence"
"a,b,sample2","gi|22|","234","33","97.05","This is, ""typically"", a sentence"
perl's split can limit the number of fields with its third argument:
$ perl -lnE 'say join ",", split(" ",$_,6)' file
sample1,gi|11|,123,33,97.23,This is a sentence
sample2,gi|22|,234,33,97.05,Keep these spaces
If fields might require quoting:
perl -lnE 'say join ",", map { s/"/""/g || y/,// ? "\"$_\"" : $_ } split(" ",$_,6)' file
Ruby has a str.split that can take a limit:
ruby -ne 'puts $_.split(/\s+/,6).join(",")' file
sample1,gi|11|,123,33,97.23,This is a sentence
sample2,gi|22|,234,33,97.05,Keep these spaces
As does Perl:
perl -lnE 'say join ",", split /\s+/,$_,6 ' file
# same
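The same limit-6 split is also available in plain shell via read, which leaves the internal whitespace of the final variable untouched. A minimal sketch:

```shell
# read splits on IFS whitespace and dumps the remainder of the line,
# internal spacing intact, into the last variable (here: rest).
printf '%s\n' \
  'sample1 gi|11| 123 33 97.23 This is a sentence' \
  'sample2 gi|22| 234 33 97.05 Keep these spaces' |
while read -r f1 f2 f3 f4 f5 rest; do
  printf '%s,%s,%s,%s,%s,%s\n' "$f1" "$f2" "$f3" "$f4" "$f5" "$rest"
done
```

This works in any POSIX shell, not just bash.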
This might work for you (GNU sed):
sed -E 's/\s+/&\n/5;h;s/\n.*//;s/\s+/,/g;G;s/\n.*\n//' file
Append a newline to the 5th occurrence of a group of whitespace.
Make a copy of the amended line in the hold space.
Remove the section from the inserted newline to the end of the line.
Translate groups of whitespace into commas.
Append the copy.
Remove the section between newlines.
Thus the first five groups of whitespace are converted to commas and the remaining groups are untouched.
Here is a way to do it with 3 sed commands. This requires GNU sed, which supports the /ng (e.g. /6g) pattern flag. That will only apply the substitution from the nth occurrence on. Also note: this method will compress multiple spaces in the last column permanently.
sed 's/ \+/␣/6g' | sed 's/ \+/,/g' | sed 's/␣/ /g'
Another variation: Do the multiple space compression as a separate step with tr -s ' '. This may be more readable.
tr -s ' ' | sed 's/ /␣/6g' | sed 's/ /,/g' | sed 's/␣/ /g'
Another variation: Compress all whitespaces, not just spaces with \s
sed 's/\s\+/␣/6g' | sed 's/\s\+/,/g' | sed 's/␣/ /g'
Explanation:
The first step "protects" the spaces from the 6th occurrence on, by converting them to a special character. I've used ␣ here (unicode U+2423), but you could use any character that doesn't exist in the source data, such as \x00, {space}, etc.
sed 's/ \+/␣/6g'
The second step converts the remaining spaces to commas.
sed 's/ \+/,/g'
The third step converts the "protected" spaces back to spaces.
sed 's/␣/ /g'
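Putting the three steps together on the question's sample input (GNU sed required for the /6g flag):

```shell
# 1) protect whitespace from the 6th run on, 2) convert the rest to
# commas, 3) restore the protected spaces.
printf '%s\n' 'sample1 gi|11| 123 33 97.23 This is a sentence' |
  sed 's/ \+/␣/6g' | sed 's/ \+/,/g' | sed 's/␣/ /g'
# sample1,gi|11|,123,33,97.23,This is a sentence
```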
I have a file with three columns. I would like to delete the 3rd column(in-place editing). How can I do this with awk or sed?
123 abc 22.3
453 abg 56.7
1236 hjg 2.3
Desired output
123 abc
453 abg
1236 hjg
try this short thing:
awk '!($3="")' file
With GNU awk for inplace editing, \s/\S, and gensub() to delete
1) the FIRST field:
awk -i inplace '{sub(/^\S+\s*/,"")}1' file
or
awk -i inplace '{$0=gensub(/^\S+\s*/,"",1)}1' file
2) the LAST field:
awk -i inplace '{sub(/\s*\S+$/,"")}1' file
or
awk -i inplace '{$0=gensub(/\s*\S+$/,"",1)}1' file
3) the Nth field where N=3:
awk -i inplace '{$0=gensub(/\s*\S+/,"",3)}1' file
Without GNU awk you need a match()+substr() combo or multiple sub()s + vars to remove a middle field. See also Print all but the first three columns.
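For example, the match()+substr() combo might look like this in POSIX awk (n is the field to drop; the spacing of the remaining fields is preserved):

```shell
# Sketch: peel off fields (with their leading whitespace) one at a
# time, keeping every piece except the n-th.
printf '%s\n' '123 abc 22.3' '453 abg 56.7' '1236 hjg 2.3' |
awk -v n=3 '{
  s = $0; out = ""
  for (i = 1; i <= n && match(s, /[ \t]*[^ \t]+/); i++) {
    if (i != n) out = out substr(s, RSTART, RLENGTH)
    s = substr(s, RSTART + RLENGTH)
  }
  print out s
}'
```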
This might work for you (GNU sed):
sed -i -r 's/\S+//3' file
If you want to delete the white space before the 3rd field:
sed -i -r 's/(\s+)?\S+//3' file
It seems you could simply go with
awk '{print $1 " " $2}' file
This prints the two first fields of each line in your input file, separated with a space.
Try using cut... it's fast and easy
First, you have repeated spaces; you can squeeze those down to a single space between columns, if that's what you want, with tr -s ' '
If each column already has just one delimiter between it, you can use cut -d ' ' -f-2 to print fields (columns) <= 2.
for example if your data is in a file input.txt you can do one of the following:
cat input.txt | tr -s ' ' | cut -d ' ' -f-2
Or, if you prefer to think of this problem as removing the 3rd column, you can write the following:
cat input.txt | tr -s ' ' | cut -d ' ' --complement -f3
cut is pretty powerful, you can also extract ranges of bytes, or characters, in addition to columns
excerpt from the man page on the syntax of how to specify the list range
Each LIST is made up of one range, or many ranges separated by commas.
Selected input is written in the same order that it is read, and is
written exactly once. Each range is one of:
N N'th byte, character or field, counted from 1
N- from N'th byte, character or field, to end of line
N-M from N'th to M'th (included) byte, character or field
-M from first to M'th (included) byte, character or field
so you could also have said you want specific columns 1 and 2 with...
cat input.txt | tr -s ' ' | cut -d ' ' -f1,2
Try this :
awk '$3="";1' file.txt > new_file && mv new_file file.txt
or
awk '{$3="";print}' file.txt > new_file && mv new_file file.txt
Try
awk '{$3=""; print $0}'
If you're open to a Perl solution...
perl -ane 'print "$F[0] $F[1]\n"' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the following perl code
I have lines like this:
Volume.Free_IBM_LUN59_28D: 2072083693568
I would like to get only IBM_LUN59_28D from this line using awk.
Thanks
You can use sub to do substitutions on each input line, as per the following transcript:
pax> echo 'Volume.Free_IBM_LUN59_28D: 2072083693568' | awk '
...> {
...> sub (".*Free_", "");
...> sub (":.*", "");
...> print
...> }'
IBM_LUN59_28D
That command crosses multiple lines for readability but, if you're operating on a file and not too concerned about readability, you can just use the compressed version:
awk '{sub(".*Free_","");sub(":.*","");print}' inputFile
If you're amenable to non-awk solutions, you could also use sed:
sed -e 's/.*Free_//' -e 's/:.*//' inputFile
Note that both those solutions rely on your (somewhat sparse) test data. If your definition of "like" includes preceding textual segments other than Free_ or subsequent characters other than :, some more work may be needed.
For example, if you wanted the string between the first _ and the first :, you could use:
awk '{sub("[^_]*_","");sub(":.*","");print}'
With sed:
sed 's/[^_]*_\(.*\):.*/\1/'
Search for a sequence of non-_ characters followed by _ (this matches Volume.Free_), then another sequence of characters (this matches IBM_LUN59_28D; we group it for later use), followed by : and any character sequence. Substitute with the saved group (\1). That's it.
Sample:
$ echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | sed 's/[^_]*_\(.*\):.*/\1/'
IBM_LUN59_28D
Here is one awk
awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
Example:
echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
IBM_LUN59_28D
It divides the line by Free_.
If the line then has more than one field (NF>1):
Split the second field by : and print the first part, a[1]
With awk:
echo "$val" | awk -F: '{print $1}' | awk -F. '{print $2}' | awk '{print substr($0,6)}'
where the given string is in $val.
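If a shell-only solution is acceptable, the same extraction can be done with parameter expansion, with no awk at all (assuming the Free_ prefix and the : suffix from the sample line):

```shell
val='Volume.Free_IBM_LUN59_28D: 2072083693568'
s=${val#*Free_}   # strip everything up to and including "Free_"
s=${s%%:*}        # strip from the first ":" to the end
echo "$s"         # IBM_LUN59_28D
```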
The requirement is very simple, I feel.
Input string format:
DTC_SubrProfile_20141205230707.unl
Required output format:
SubrProfile
Meaning, "DTC_" "_20141205230707.unl" should be removed from the input string.
Is there possible way we can achieve it using awk gsub?
Through sed,
$ echo 'DTC_SubrProfile_20141205230707.unl' | sed 's/^[^_]*_\|_.*//g'
SubrProfile
Through awk,
$ echo 'DTC_SubrProfile_20141205230707.unl' | awk '{gsub(/^[^_]*_|_.*/,"")}1'
SubrProfile
The above commands remove all the characters from the start to the first underscore, then remove everything from the next _ to the end of the remaining string.
$ echo 'DTC_SubrProfile_20141205230707.unl' | awk -F'_' '{print $2}'
SubrProfile
The above awk would print the second column according to the input Field Separator _
by cut
echo "DTC_SubrProfile_20141205230707.unl"|cut -d _ -f2
I'm trying to do something pretty simple but it appears more complicated than expected...
I have lines in a text file, separated by commas, that I want to output to another file without the first field.
Input:
echo file1,item, 12345678 | awk -F',' '{OFS = ";";$1=""; print $0}'
Output:
;item; 12345678
As you can see, the spaces before 12345678 are merged into a single space.
I also tried with the cut command:
echo file1,item, 12345678 | cut -d, -f2-
and I ended up with the same result.
Is there any workaround to handle this?
Actually my entire script is as follows:
cat myfile | while read l_line
do
l_line="'$l_line'"
v_OutputFile=$(echo $l_line | awk -F',' '{print $1}')
echo $(echo $l_line | cut -d, -f2-) >> ${v_OutputFile}
done
But still, in l_line all spaces but one are removed. I also added the quotes inside the file, but got the same result.
It has nothing to do with awk. Quote the string in your echo:
#with quotes
kent$ echo 'a,b,   c'|awk -F, -v OFS=";" '{$1="";print $0}'
;b;   c
#without quotes
kent$ echo a,b,   c|awk -F, -v OFS=";" '{$1="";print $0}'
;b; c
The problem is with your invocation of the echo command you're using to feed awk the test data above. The shell is looking at this command:
echo file1,item, 12345678
and treating file1,item, and 12345678 as two separate parameters to echo. echo just prints all its parameters, separated by one space.
If you were to quote the whitespace, as follows:
echo 'file1,item, 12345678'
the shell would interpret this as a single parameter to feed to echo, so you'd get the expected result.
Update after edit to OP - having seen your full script, you could do this entirely in awk:
awk -F, '{ OFS = "," ; f = $1 ; sub("^[^,]*,","") ; print $0 >> f }' myfile
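If the per-file dispatch isn't needed and the goal is just to drop the first comma-separated field while preserving every other character (including runs of spaces), a single sed substitution is a minimal alternative:

```shell
# sed never word-splits, so the spacing in the remaining fields
# survives untouched.
printf '%s\n' 'file1,item,    12345678' | sed 's/^[^,]*,//'
# item,    12345678
```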