awk to replace a line in a text file and save it - awk

I want to open a text file that has a list of 500 IP addresses. I want to make the following changes to one of the lines and save the file. Is it possible to do that with awk or sed?
current line :
100.72.78.46:1900
changes :
100.72.78.46:1800

You can achieve that with the following:
sed -i.bak 's/100.72.78.46:1900/100.72.78.46:1800/' file.txt
The -i option updates the original file in place, and the optional suffix (.bak here) makes sed keep a backup copy first. (Beware that writing -ie is parsed as -i with backup suffix e, producing file.txte, not as two separate options.) This will edit only the first occurrence of the pattern on each line. If you want to replace all matching occurrences, add a g after the last /.
This solution, however (as pointed out in the comments), fails in many other instances, such as 72100372578146:190032, which would be transformed into 72100.72.78.46:180032.
To circumvent that, you'd have to do an exact match, and also escape the . so it is not treated as a special character (see here):
sed -i.bak 's/\<100\.72\.78\.46:1900\>/100.72.78.46:1800/g' file.txt
Note the \. and the \<...\> "word boundary" notation for the exact match. This solution worked for me on a Linux machine, but not on macOS. There you would have to use a slightly different syntax (see here):
sed -i.bak 's/[[:<:]]100\.72\.78\.46:1900[[:>:]]/100.72.78.46:1800/g' file.txt
where the [[:<:]]...[[:>:]] would give you the exact match.
Finally, I also realized that, if you have only one IP address per line, you could use the anchors ^ and $ for the beginning and end of the line, preventing the erroneous replacement:
sed -i.bak 's/^100\.72\.78\.46:1900$/100.72.78.46:1800/g' file.txt
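As a quick check of the difference, you can run the naive and the anchored forms side by side on a small sample (the file name ips.txt is just for illustration):

```shell
# One genuine match plus a look-alike line with no literal dots.
printf '%s\n' '100.72.78.46:1900' '72100372578146:190032' > ips.txt

# Naive pattern: the unescaped dots match any character, so BOTH lines change.
sed 's/100.72.78.46:1900/100.72.78.46:1800/' ips.txt

# Anchored, escaped pattern: only the exact line is rewritten.
sed 's/^100\.72\.78\.46:1900$/100.72.78.46:1800/' ips.txt
# 100.72.78.46:1800
# 72100372578146:190032
```

The look-alike line becomes 72100.72.78.46:180032 under the naive pattern but is left untouched by the anchored one.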

How to specify a file prefix in gawk

I am trying to identify file extensions from a list of filenames extracted from a floppy disk image. The problem is different from this example where files are already extracted from the disk image. I'm new to gawk so maybe it is not the right tool.
ls Sounddsk2.img -a1 > allfilenames
The command above creates the list of filenames shown below.
flute.pt
flute.ss
flute.vc
guitar.pt
guitar.ss
guitar.vc
The gawk command below identifies files ending in .ss
cat allfilenames | gawk '/[fluteguitar].ss/' > ssfilenames
This would be fine when there are just a few known file names. How do I specify a file prefix in a more generic form?
Unless someone can suggest a better one, this seems to be the most generic way to express it. It will work for any prefix spelt with uppercase letters, lowercase letters and numbers (note the escaped dot, so that .ss is matched literally):
cat allfilenames | gawk '/[a-zA-Z0-9]+\.ss$/' > ssfilenames
Edit
αғsнιη's first suggested answer and jetchisel's comment prompted me to try using gawk without cat:
gawk '/^[a-zA-Z0-9]+\.ss$/' allfilenames > ssfilenames
(the + is needed so that multi-character names such as flute can match the anchored pattern), and this also worked:
gawk '/[a-zA-Z0-9]\.ss/' allfilenames > ssfilenames
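A quick check of the anchored filter against the sample names (plain awk suffices here; gawk behaves the same):

```shell
# Sample file names from the question.
printf '%s\n' flute.pt flute.ss flute.vc guitar.pt guitar.ss guitar.vc > allfilenames

# Keep only names built from letters/digits that end in a literal ".ss".
awk '/^[a-zA-Z0-9]+\.ss$/' allfilenames
# flute.ss
# guitar.ss
```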
You could use the find command to match the file names directly; with your shown samples, you need not store the names in a file and then filter it with awk:
find . -regextype egrep -regex '.*/(flute|guitar)\.ss$'
Explanation: this uses find's ability to select a regex dialect (egrep style here), giving a regex that matches the file names flute OR guitar and makes sure the name ends with .ss.
You might also use grep with -E for extended regexp, with an alternation to match either flute or guitar.
ls Sounddsk2.img -a1 | grep -E "^(flute|guitar)\.ss$" > ssfilenames
The pattern matches:
^ Start of string
(flute|guitar) Match either flute or guitar
\.ss Match .ss
$ End of string
The file ssfilenames contains:
flute.ss
guitar.ss
The regex you came up with, /[fluteguitar].ss/, matches lines containing any one of the characters f, l, u, e, g, i, t, a or r (within a bracket expression [...], duplicated characters count only once), followed by any single character except newline (which is what an unescaped dot . matches), then ss, anywhere in the line.
You need to restrict the match by using the start-of-line ^ and end-of-line $ anchors, as well as a match group.
awk '/^(flute|guitar)\.ss$/' allFilesName> ssFileNames
to filter only the two file names flute.ss and/or guitar.ss. The group (...|...) matches any one of the regexps separated by the pipe, i.e. a logical OR.
If these are just prefixes, and you want to match any file name beginning with those characters and ending with .ss, use:
awk '/^(flute|guitar).*\.ss$/' allFilesName> ssFileNames
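To see the difference between the exact and the prefix forms, here is a small sketch (guitarlike.ss is a made-up name for illustration):

```shell
printf '%s\n' flute.ss guitar.ss guitarlike.ss piano.ss > allFilesName

# Exact names only:
awk '/^(flute|guitar)\.ss$/' allFilesName
# flute.ss
# guitar.ss

# Prefix form also admits guitarlike.ss:
awk '/^(flute|guitar).*\.ss$/' allFilesName
# flute.ss
# guitar.ss
# guitarlike.ss
```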

How to delete the "0"-row for multiple files in a folder?

Each file's name starts with "input". One example of the files looks like:
0.0005
lii_bk_new
traj_new.xyz
0
73001
146300
I want to delete the lines which only includes '0' and the expected output is:
0.0005
lii_bk_new
traj_new.xyz
73001
146300
I have tried with
sed -i 's/^0\n//g' input_*
and
grep -RiIl '^0\n' input_* | xargs sed -i 's/^0\n//g'
but neither works.
Please give some suggestions.
Could you please try changing your attempted code to the following, and run it on a single input file first:
sed 's/^0$//' Input_file
OR, as per the OP's comment, to also delete the resulting null lines:
sed 's/^0$//;/^$/d' Input_file
I have intentionally not put the -i option here: first test this on a single file, and only if the output looks good run it with -i on multiple files.
Also, the problem in your attempt was that you put \n in the sed regex; \n is the line separator, which sed strips before matching, so we need $ instead to tell sed to match lines that start and end with 0.
In case you want to take a backup of the files (assuming you have enough space available in your file system), you could use sed's -i.bak option, which takes a backup of each file before editing it (this isn't necessary, but it is the safer side).
$ sed '/^0$/d' file
0.0005
lii_bk_new
traj_new.xyz
73001
146300
In your regexp you were confusing \n (the literal LineFeed character which will not be present in the string sed is analyzing since sed reads one \n-separated line at a time) with $ (the end-of-string regexp metacharacter which represents end-of-line when the string being parsed is a line as is done with sed by default).
The other mistake in your script was replacing 0 with null in the matching line instead of just deleting the matching line.
Please give some suggestions.
I would use GNU awk -i inplace for that following way:
awk -i inplace '!/^0$/' input_*
This simply will preserve all lines which do not match ^0$ i.e. (start of line)0(end of line). If you want to know more about -i inplace I suggest reading this tutorial.
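A quick way to verify both forms on the sample data before touching the real files (input_test is an illustrative name):

```shell
printf '%s\n' 0.0005 lii_bk_new traj_new.xyz 0 73001 146300 > input_test

# awk: keep every line that is not exactly "0".
awk '!/^0$/' input_test

# sed: delete every line that is exactly "0" (same result).
sed '/^0$/d' input_test
```

Note that 0.0005 survives in both cases, because the anchors require the whole line to be a single 0.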

Break a long line into multiple lines on every occurrence of a pattern using sed/awk

I have a file having single line (actually a very long line) as shown below:
{worda:wordB:[{"wordc","active":true},}{:wordb:"words","wordt""wordu""wordv","active":true} and so, on.
In this line, one pattern that is common is '"active":true'. I am trying to break this single line into multiple lines based upon this pattern.
Required output:
{worda:wordB:[{"wordc","active":true}
{:wordb:"words","wordt""wordu""wordv","active":true}
I tried sed and awk; however, the result is either a 0 KB file after processing or the same file as the input.
I have Windows environment and using GNU sed/awk for the same.
Please help.
$ sed 's/true}.*{/true}\n{/' file
{worda:wordB:[{"wordc","active":true}
{:wordb:"words","wordt""wordu""wordv","active":true}
Brief explanation:
use sed to substitute true}.*{ with true}\n{, where .* greedily matches any characters between the first true} and the last { on the line, and \n is the newline character
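Using the sample line from the question, the split can be checked like this (GNU sed, since \n in the replacement text is a GNU extension):

```shell
line='{worda:wordB:[{"wordc","active":true},}{:wordb:"words","wordt""wordu""wordv","active":true}'

# Break the single line at the boundary between the two "...true}" segments.
printf '%s\n' "$line" | sed 's/true}.*{/true}\n{/'
# {worda:wordB:[{"wordc","active":true}
# {:wordb:"words","wordt""wordu""wordv","active":true}
```

Note this relies on there being exactly two segments; with more repetitions of the pattern, the greedy .* would swallow the middle ones.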

How to remove lines that match an exact phrase on Linux?

My file contains two lines with Unicode (probably) characters:
▒▒▒▒=
▒▒▒=
and I wish to remove both these lines from the file.
I searched and found I can use this command to remove non UTF-8 characters:
iconv -c -f utf-8 -t ascii file
but it leaves those two lines like this:
=
=
I can't find how to remove lines that match (not just contain, but exactly match) a certain phrase, in my case: =.
UPDATE: I found that when I redirect the "=" lines to another file and open it, it contains an unwanted line: ^A=
which I was unable to match with sed to delete.
This might work for you (GNU sed):
sed '/^\(\o342\o226\o222\)\+=/d' file
Use:
sed -n l file
to find the octal representation of the Unicode characters, and then use the \o... metacharacter in the regexp to match them.
EDIT:
To remove the lines only containing = use:
sed '/^\(\o342\o226\o222\)*=\s*$/d' file
(note the bare * after the group; in sed's basic regexps \* would match a literal asterisk)
Here is a command to delete these lines:
sed -i '/^=$/d' your_file
(A substitution like s/^=$//g would only empty the lines, leaving blank lines behind; the d command removes them entirely.)
As specified in the comments, you can also use grep -v '^whatever$' your_file > cleared_file. Note that this solution requires a different output file (cleared_file), while the sed solution modifies the content in place.
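Both approaches can be sketched on a tiny sample (the file names here are illustrative; after iconv, the offending lines are reduced to a bare =):

```shell
printf '%s\n' 'first line' '=' 'second line' '=' > your_file

# grep -v keeps every line that does NOT exactly match "=".
grep -v '^=$' your_file > cleared_file
cat cleared_file
# first line
# second line

# Equivalent in-place deletion with sed.
sed -i '/^=$/d' your_file
```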

text processing: sed to work backwards to delete until string

My AWK script generates 1 of the following 2 outputs depending on what text file it is being used on.
49 1146.469387755102 mongodb 192.168.0.8:27017 -p mongodb.database
1 1243.0 jdbc:mysql 192.168.0.8:3306/ycsb -p db.user
I need a way of deleting everything past the IP address, including the port number.
sed 's/:[^:]*//2g'
works, apart from the fact that it counts matches from left to right; as one of the outputs contains two :'s, the occurrence count lands on a different colon in each line. Is there a way of reversing sed to work from right to left?
Just to be clear, desired output of each would be:
49 1146.469387755102 mongodb 192.168.0.8
1 1243.0 jdbc:mysql 192.168.0.8
You could use the below sed command.
sed 's/:[0-9]\{4\}.*//' file
OR
sed 's/:[^:]*$//' file
[^:]* negated character class which matches any char but not of :, zero or more times. $ matches the end of the line boundary. So :[^:]*$ matches all the chars from the last colon upto the end. Replacing those matched chars with empty string will give you the desired output.
You can take advantage of the greedy nature of the Kleene *:
sed 's/\(.*\):.*/\1/' file
The .* consumes as much as it can, while still matching the pattern. The captured part of the line is used in the replacement.
Alternatively, using awk (thanks to glenn jackman for setting me straight):
awk -F: -v OFS=: 'NF{NF--}1' file
Set the input and output field separators to a colon, then remove the final field by decrementing NF. The 1 is true, so the default action {print} is performed. The NF condition prevents empty lines from causing an error; that may not be necessary in your case, but it does no harm.
Output either way:
49 1146.469387755102 mongodb 192.168.0.8
1 1243.0 jdbc:mysql 192.168.0.8
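Both one-liners can be checked against the two sample records (out.txt is an illustrative name):

```shell
printf '%s\n' \
  '49 1146.469387755102 mongodb 192.168.0.8:27017 -p mongodb.database' \
  '1 1243.0 jdbc:mysql 192.168.0.8:3306/ycsb -p db.user' > out.txt

# sed: delete from the LAST colon to the end of the line.
sed 's/:[^:]*$//' out.txt

# awk: drop the last colon-separated field (same result).
awk -F: -v OFS=: 'NF{NF--}1' out.txt
# 49 1146.469387755102 mongodb 192.168.0.8
# 1 1243.0 jdbc:mysql 192.168.0.8
```

Both leave the jdbc:mysql colon intact because they only operate on the final colon of each line.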