How to show the ^M character in gedit (Red Hat)?

When I open the file with the cat command in a terminal, the '^M' symbol is shown, but in gedit it is not visible. How can I show it in gedit, or how can I remove it?

^M, as noted in the comments, is the carriage-return character. Instead of looking for a way to make gedit display these characters just so you can remove them manually, it may be easier to use dos2unix:
$ dos2unix myfile.txt
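If dos2unix isn't installed, a minimal sketch using tr achieves the same thing, assuming the only change needed is stripping carriage returns (the sample input here is made up):

```shell
# Create a small DOS-style sample file, then strip the \r characters.
# tr cannot edit in place, so redirect to a new file.
printf 'line1\r\nline2\r\n' > myfile.txt
tr -d '\r' < myfile.txt > myfile.unix.txt
cat -v myfile.unix.txt   # no ^M should remain in the output
```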

Related

Writing special characters (α, β) in bash to an output file - not encoding correctly

I'm attempting to modify an existing file (input.csv) using the awk command in the OSX terminal and then write the output to a file (output.csv).
The command I have works for the text changes I need and displays the output correctly in the terminal (including correct special characters):
awk '{for(i=1;i<=NF;i++){ $i=toupper(substr($i,1,1)) substr($i,2) }}1' input.csv
I then attempt to write this output to a file by piping it to the tee command:
awk '{for(i=1;i<=NF;i++){ $i=toupper(substr($i,1,1)) substr($i,2) }}1' input.csv | tee output.csv
I find that the special characters in the file are now corrupted, e.g. "α-Synuclein"
becomes "α-synuclein". I believe this is to do with the encoding but am unsure how to specify which to use or where to change it.
I've noticed that by running the command file input.csv, the encoding listed is:
"UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line
terminators"
When I run file output.csv on the output file I get:
"Non-ISO extended-ASCII text, with very long lines, with CRLF, LF line
terminators"
I think this means I should be encoding the output file as UTF-8... Can anyone suggest a way to fix this so that I can write these characters to my output file without them becoming corrupted?
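One thing worth checking (a guess, since no answer is recorded here): awk's toupper() works on bytes rather than characters when the locale is not UTF-8, which can corrupt multibyte characters like α. Forcing a UTF-8 locale may fix it; en_US.UTF-8 is an assumption, so substitute any UTF-8 locale available on your system:

```shell
# Hypothetical fix: run awk under an explicit UTF-8 locale so toupper()
# treats multibyte characters as characters, not raw bytes.
# Filenames match the question; the locale name is an assumption.
LC_ALL=en_US.UTF-8 awk '{for(i=1;i<=NF;i++){ $i=toupper(substr($i,1,1)) substr($i,2) }}1' input.csv > output.csv
```

On plain ASCII input the command behaves the same either way, capitalizing the first letter of each field.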

Checking GAWK binary characters

Win 7-64
Cygwin
GNU Awk 5.1.0
I'm trying to develop a program to handle both DOS- and Unix-formatted files. It looks like the only difference between the two at the application level is that the last character of a line in a DOS file is "\r". I can't figure out how to do a comparison.
My input looks like "w,x,y,z", where z can be "" in Unix or "\r" in DOS. The following does not work:
if (z || z == "\r") # check for Unix ($4) and DOS ($4 == "\r").
gawk may not even see the \r as they can be stripped off by underlying primitives. You need to set BINMODE to make sure your script sees them. See https://www.gnu.org/software/gawk/manual/gawk.html#Built_002din-Variables and https://www.gnu.org/software/gawk/manual/gawk.html#PC-Using where it talks about:
Under MS-Windows, gawk (and many other text programs) silently
translates end-of-line ‘\r\n’ to ‘\n’ on input and ‘\n’ to ‘\r\n’ on
output.
So you can do:
awk -v BINMODE=3 '
{ print "is " ( /\r$/ ? "" : "not " ) "dos" }
'
but even with that you can't tell on a line-by-line basis whether a file has DOS line endings, since Windows tools can generate quoted text with linefeeds in the middle. For example, Excel would output
"foo","some\nother\nstuff","bar"\r\n
where the above is a single row of the spreadsheet in which the middle cell contains a couple of linefeeds. It would look like this and be read as 3 separate lines by gawk on a UNIX platform unless you specifically set RS='\r\n':
"foo","some
other
stuff","bar"\r
So to detect whether your file has DOS line endings, you need to search your whole input file until you find \r\n, and even then you don't really KNOW that's what it means - the data could just happen to have a \r at the end of one line.
From your comments below I think you're trying to do something that is simply impossible.
Here's a file created on UNIX and using UNIX line endings where the final field is bar\r:
foo,bar\r\n
Here's a file created on Windows and using DOS line endings where the final field is bar:
foo,bar\r\n
As you can see, there's no way to programmatically determine, given just the file foo,bar\r\n, which of the above 2 cases it represents.
$ cat -t carriage
a1^M
a2^M
$ cat -t nocarriage
a1
a2
$ gawk '/\r/' carriage
a1
a2
$ gawk '/\r/' nocarriage
As you can see, with gawk it is straightforward to check if each line has carriage-returns. Writing the octal \015 is a possible alternative to \r.
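The demonstration above can be reproduced end to end; a minimal sketch that builds both sample files and counts the lines containing a carriage return (using plain awk, which behaves the same as gawk here on a UNIX platform):

```shell
# Recreate the two sample files from the transcript above.
printf 'a1\r\na2\r\n' > carriage     # DOS line endings
printf 'a1\na2\n'     > nocarriage   # Unix line endings

# Count lines matching \r in each file; n+0 forces 0 when nothing matched.
awk '/\r/{n++} END{print n+0}' carriage     # prints 2
awk '/\r/{n++} END{print n+0}' nocarriage   # prints 0
```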

How to extract the first column from a tsv file?

I have a file containing some data and I want to use only the first column as stdin for my script, but I'm having trouble extracting it.
I tried using this
awk -F"\t" '{print $1}' inputs.tsv
but it only shows the first letter of the first column. I tried some other things but it either shows the entire file or just the first letter of the first column.
My file looks something like this:
Harry_Potter 1
Lord_of_the_rings 10
Shameless 23
....
You can use cut which is available on all Unix and Linux systems:
cut -f1 inputs.tsv
You don't need to specify the -d option because tab is the default delimiter. From man cut:
-d delim
Use delim as the field delimiter character instead of the tab character.
As Benjamin has rightly stated, your awk command is indeed correct: the shell passes the literal \t as the argument and awk interprets it as a tab, while other commands like cut may not.
It's not clear why you are getting just the first character as the output.
You may want to take a look at this post:
Difference between single and double quotes in Bash
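A quick way to confirm that the awk and cut versions behave identically on tab-separated input (sample data made up to match the question):

```shell
# Build a small tab-separated sample like the one in the question.
printf 'Harry_Potter\t1\nLord_of_the_rings\t10\n' > inputs.tsv

awk -F'\t' '{print $1}' inputs.tsv   # first column via awk
cut -f1 inputs.tsv                   # same result via cut
```

Both commands should print Harry_Potter and Lord_of_the_rings, one per line.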
Try this (better to rely on a real CSV parser...):
csvcut -c 1 -f $'\t' file
Check out csvkit.
Output:
Harry_Potter
Lord_of_the_rings
Shameless
Note:
As @RomanPerekhrest said, you should fix your broken sample input (we saw spaces where tabs were expected...)

awk to replace a line in a text file and save it

I want to open a text file that has a list of 500 IP addresses. I want to make the following changes to one of the lines and save the file. Is it possible to do that with awk or sed?
current line :
100.72.78.46:1900
changes :
100.72.78.46:1800
You can achieve that with the following:
sed -ie 's/100.72.78.46:1900/100.72.78.46:1800/' file.txt
The -i option updates the original file, and a backup copy is created (GNU sed treats the trailing e as a backup suffix, producing file.txte). This replaces only the first occurrence of the pattern on each line. If you want to replace all matching occurrences, add a g after the last /.
This solution, however (as pointed out in the comments), fails in many other instances, such as 72100372578146:190032, which would be transformed into 72100.72.78.46:180032.
To circumvent that, you'd have to do an exact match, and also not treat the . as special character (see here):
sed -ie 's/\<100\.72\.78\.46:1900\>/100.72.78.46:1800/g' file.txt
Note the \. and the \<...\> "word boundary" notation for the exact match. This solution worked for me on a Linux machine, but not on a Mac. There, you have to use a slightly different syntax (see here):
sed -ie 's/[[:<:]]100\.72\.78\.46:1900[[:>:]]/100.72.78.46:1800/g' file.txt
where the [[:<:]]...[[:>:]] would give you the exact match.
Finally, I also realized that, if you have only one IP address per line, you could use the special characters ^ and $ for the beginning and end of the line, preventing the erroneous replacement:
sed -ie 's/^100\.72\.78\.46:1900$/100.72.78.46:1800/g' file.txt
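The anchored variant can be sanity-checked quickly. The sample below is made up: it pairs the target line with the longer lookalike mentioned above, and writes to stdout instead of editing in place:

```shell
# Two sample lines: the exact IP:port to change, and a longer lookalike
# that the unanchored pattern would mangle. Only the first is rewritten.
printf '100.72.78.46:1900\n72100372578146:190032\n' |
sed 's/^100\.72\.78\.46:1900$/100.72.78.46:1800/'
```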

How to remove lines that match an exact phrase on Linux?

My file contains two lines with (probably) Unicode characters:
▒▒▒▒=
▒▒▒=
and I wish to remove both these lines from the file.
I searched and found I can use this command to remove non UTF-8 characters:
iconv -c -f utf-8 -t ascii file
but it leaves those two lines like this:
=
=
I can't find how to remove lines that match (not just contain, but match) a certain phrase, in my case: =.
UPDATE: I found that when I redirect the "=" lines to another file and open it, it contains an unwanted line: ^A=
which I was unable to match with sed to delete.
This might work for you (GNU sed):
sed '/^\(\o342\o226\o222\)\+=/d' file
Use:
sed -n l file
to find the octal representation of the Unicode characters, and then use the \o... metacharacter in the regexp to match them.
EDIT:
To remove the lines only containing = use:
sed '/^\(\o342\o226\o222\)*=\s*$/d' file
Here is a sed command to delete those lines (once iconv has reduced them to a bare =):
sed -i '/^=$/d' your_file
As specified in the comments, you can also use grep -v '^=$' your_file > cleared_file. Note that this solution requires writing to a different output file (cleared_file), while the sed solution modifies the content "in place".
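A quick check of the grep -v approach on made-up input confirms that only lines consisting of exactly = are dropped, while lines merely containing = survive:

```shell
# foo and bar= are kept; the two bare = lines are removed.
printf 'foo\n=\nbar=\n=\n' | grep -v '^=$'
```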