Need a solution for a line-break issue in a string - scripting

I have the string below, which contains newline (Enter) characters at random positions. Fields are separated by ~$~ and each record ends with ##&.
Please help me merge the broken lines into one.
In the string below, the newline occurs in the address field (4/79A).
-------String----------
23510053~$~ABC~$~4313708~$~19072017~$~XYZ~$~CHINNUSAMY~$~~$~R~$~~$~~$~~$~42~$~~$~~$~~$~~$~28022017~$~
4/79A PQR Marg, Mumbai 4000001~$~TN~$~637301~$~Owns~$~RAT~$~31102015~$~12345~$~##&
Thanks in advance.
Rupesh

Seems to be a (more or less) duplicate of https://stackoverflow.com/a/802439/3595749
Note: ideally, you should ask your client to remove the CRLF characters at the source rather than applying the code below.
Nevertheless, try this:
tr -d '\n' < inputfile | sed 's/##&/##\&\n/g' > outputfile
Explanation:
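tr removes the line feeds (newline characters),
sed adds them back, but only where ##& is encountered. s/##&/##\&\n/g substitutes "##&" with "##&\n": it appends a newline, and "&" must be escaped because an unescaped & in the replacement stands for the whole matched text. The "g" flag at the end applies the substitution to every occurrence. (\n in the replacement works in GNU sed; other sed implementations may not support it.)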
Note: depending on the source (Unix or Windows), "\n" may need to be replaced by "\r\n".
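For comparison, the same idea (drop every line break, then cut after each ##&) can be sketched in Python. This assumes every record really ends with ##& and that any CR or LF inside a record is spurious; the function name merge_broken_records and the shortened sample record are hypothetical.

```python
from typing import List


def merge_broken_records(data: str, terminator: str = "##&") -> List[str]:
    """Rejoin records that were split by stray line breaks (hypothetical helper)."""
    # Like `tr -d '\n'`: drop every line break; CR is dropped too, for CRLF input.
    flat = data.replace("\r", "").replace("\n", "")
    # Like the sed step: cut the stream right after each terminator.
    parts = flat.split(terminator)
    records = [part + terminator for part in parts[:-1]]
    # Anything after the last terminator is an incomplete record; keep it if non-empty.
    if parts[-1]:
        records.append(parts[-1])
    return records


# Shortened version of the question's record, broken before the address field.
sample = "23510053~$~ABC~$~28022017~$~\n4/79A PQR Marg, Mumbai 4000001~$~TN~$~##&\n"
for record in merge_broken_records(sample):
    print(record)
```

Returning a list instead of writing a file keeps the sketch easy to test; writing each record followed by "\n" (or "\r\n") reproduces the sed output.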

Related

Awk - How to escape the | in sub?

I'd like to substitute a string that contains a |.
My STDIN:
13|Test|123|6232
14|Move|126|6692
15|Test|123|6152
I'd like to obtain:
13|Essai|666|6232
14|Move|126|6692
15|Essai|666|6152
I tried this:
{sub("|Test|123","|Essai|666") ;} {print;}
But I think the | is what bothers me... I really need to replace the complete string, including the |.
How can I get this result?
Many thanks for your precious help.
You can use
awk '{sub(/\|Test\|123\|/,"|Essai|666|")}1' file
Note:
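/\|Test\|123\|/ is a regex that matches the literal substring |Test|123| (| is the regex alternation operator, so each one has to be escaped with a backslash)
sub(/\|Test\|123\|/,"|Essai|666|") replaces the first occurrence of the regex pattern in the whole record (since the target argument is omitted, $0 is assumed)
1 is an always-true condition that triggers the default print action, so there is no need to call print explicitly.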
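The same escaping idea carries over to other regex engines. A minimal Python sketch, assuming a hypothetical helper replace_first: re.escape() adds the backslashes in front of each | for you, and count=1 mirrors awk's sub() replacing only the first match.

```python
import re


def replace_first(line: str, old: str, new: str) -> str:
    # re.escape() backslash-escapes regex metacharacters such as "|",
    # the same job the backslashes do in the awk regex /\|Test\|123\|/.
    pattern = re.escape(old)
    # A function replacement is inserted verbatim (no backslash processing).
    return re.sub(pattern, lambda _m: new, line, count=1)


for line in ["13|Test|123|6232", "14|Move|126|6692", "15|Test|123|6152"]:
    print(replace_first(line, "|Test|123|", "|Essai|666|"))
```

For a purely literal replacement, str.replace(old, new, 1) gives the same result without any regex at all.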

removing unconventional field separators (^@^@^@) in a text file [duplicate]

I have a text file containing unwanted null characters (ASCII NUL, \0). When I view it in vi I see ^@ symbols interleaved with the normal text. How can I:
Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.
Remove the null characters? Running strings on the file cleaned it up, but I'm wondering whether this is the best way.
I’d use tr:
tr < file-with-nulls -d '\000' > file-without-nulls
If you are wondering whether input redirection in the middle of the command arguments works: it does. Most shells recognize and handle I/O redirection (<, >, …) anywhere in the command line.
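The same byte-level deletion as the tr command can be sketched in Python. The helper names strip_nuls and strip_nuls_file are hypothetical; the file version streams in chunks, which is safe because NUL is a single byte and a match can never straddle a chunk boundary.

```python
def strip_nuls(data: bytes) -> bytes:
    # Same effect as: tr -d '\000'  (delete every 0x00 byte)
    return data.replace(b"\x00", b"")


def strip_nuls_file(src_path: str, dst_path: str, chunk_size: int = 1 << 16) -> None:
    # Open both files in binary mode so no newline or encoding translation happens.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # iter(callable, sentinel) keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(strip_nuls(chunk))
```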
Use the following sed command for removing the null characters in a file.
sed -i 's/\x0//g' null.txt
This solution edits the file in place, which is important if the file is still being used. Passing -i'ext' creates a backup of the original file with the suffix 'ext' appended.
A large number of unwanted NUL characters, say one in every other byte, usually indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
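If iconv is not at hand, the check and the conversion can be sketched in Python. The 0.4 NUL-ratio threshold is an assumption, not a rule, and so is treating BOM-less input as little-endian (what Windows tools usually write); the function names are hypothetical.

```python
import codecs


def looks_like_utf16(data: bytes, threshold: float = 0.4) -> bool:
    # Heuristic only: mostly-ASCII text stored as UTF-16 has a NUL in
    # roughly every other byte, i.e. a NUL ratio near 0.5.
    if len(data) < 2:
        return False
    return data.count(b"\x00") / len(data) >= threshold


def utf16_to_utf8(data: bytes) -> bytes:
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # With a byte-order mark, the "utf-16" codec picks the byte order and drops the BOM.
        text = data.decode("utf-16")
    else:
        # Assumption: no BOM means little-endian.
        text = data.decode("utf-16-le")
    return text.encode("utf-8")
```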
I discovered the following, which prints out which lines, if any, have null characters:
perl -ne '/\000/ and print;' file-with-nulls
Also, an octal dump can tell you whether there are nulls (od -c prints each NUL byte as \0):
od -c file-with-nulls | grep '\\0'
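The perl one-liner prints the offending lines themselves; if line numbers are more useful, a rough Python equivalent is sketched below. lines_with_nuls is a hypothetical helper, and it assumes the file is read in binary mode so NUL bytes reach it unchanged.

```python
from typing import Iterable, List


def lines_with_nuls(lines: Iterable[bytes]) -> List[int]:
    """Return the 1-based numbers of the lines that contain a NUL byte."""
    return [lineno for lineno, line in enumerate(lines, start=1) if b"\x00" in line]


# Usage with a real file (binary mode, iterating yields one line per item):
# with open("file-with-nulls", "rb") as f:
#     print(lines_with_nuls(f))
```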
If the lines in the file end with \r\n\000, what works is to delete the \n and \000 bytes and then replace each \r with \n:
tr -d '\n\000' <infile | tr '\r' '\n' >outfile
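The two tr passes can be mirrored in Python to make the order of operations explicit. Like the tr pipeline, this assumes every \r really marks a line end, because all \n and NUL bytes are deleted, not only those inside a \r\n\000 sequence. The function name is hypothetical.

```python
def fix_crlf_nul(data: bytes) -> bytes:
    # First tr: delete every LF and every NUL byte.
    data = data.replace(b"\n", b"").replace(b"\x00", b"")
    # Second tr: turn each remaining CR into an LF.
    return data.replace(b"\r", b"\n")
```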
Here is an example of how to remove NULL characters in place using ex:
ex -s +"%s/\%x00//g" -cwq nulls.txt
and for multiple files:
ex -s +'bufdo!%s/\%x00//g' -cxa *.txt
To recurse into subdirectories, you can use the **/*.txt glob (if your shell supports it, e.g. bash with shopt -s globstar).
This is useful for scripting, since sed's -i option is not specified by POSIX and behaves differently in GNU and BSD sed.
See also: How to check if the file is a binary file and read all the files which are not?
I used:
recode UTF-16..UTF-8 <filename>
to get rid of the zero bytes in the file.
I faced the same error with:
import codecs as cd
f=cd.open(filePath,'r','ISO-8859-1')
I solved the problem by changing the encoding to utf-16
f=cd.open(filePath,'r','utf-16')
Remove a trailing null character at the end of a PDF file using PHP. This is independent of the OS.
This script uses PHP to remove a trailing NULL value at the end of a binary file, solving a crashing issue that was triggered by the NULL value. You can edit this script to remove all NULL characters, but seeing it done once will help you understand how this works.
Backstory
We were receiving PDFs from a third party that we needed to upload to our system using a PDF library. A null value was sometimes appended to the files sent to us, and when our system processed them, the files with the trailing NULL value caused it to crash.
Originally we were using sed, but sed behaves differently on Macs and Linux machines. We needed a platform-independent way to strip the trailing null value, and PHP was the best option. Also, it was a PHP application, so it made sense :)
This script performs the following operation:
Take the binary file, convert it to hex (binary data doesn't split cleanly on line feeds or carriage returns), explode the hex string using CRLF (0d0a) as the delimiter, pop the last element of the array if it is a NUL byte (00), implode the array using CRLF as the glue, and process the file.
<?php
// In this case we are getting the file as a string from another application.
// Here we read a sample bad file instead ($filename is a hypothetical path).
$filename = 'sample-bad.pdf';
$fd = file_get_contents($filename);
// Convert the binary string to hex (bin2hex only outputs 0-9a-f, so trim() is not needed)
$bin2hex = bin2hex($fd);
// Create an array using CRLF ("\r\n" = 0d0a) as the delimiter
$bin2hex_ex = explode('0d0a', $bin2hex);
// Look at the last element: if it is a lone NUL byte (00), pop it off.
// Note: implode() then also drops the CRLF that preceded the NUL.
$end = end($bin2hex_ex);
if ($end === '00') {
    array_pop($bin2hex_ex);
}
// Implode the array using CRLF as the glue
$bin2hex = implode('0d0a', $bin2hex_ex);
// The new string no longer has the null character at the EOF
$fd = hex2bin($bin2hex);
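In a language that handles binary strings directly, the hex round-trip is not strictly needed. The hypothetical Python helper below is a simpler byte-level alternative, not a port of the script above: it removes only a single trailing NUL byte and leaves any CRLF before it in place, whereas popping the exploded array in the PHP version also drops the 0d0a that preceded the final 00.

```python
def strip_trailing_nul(data: bytes) -> bytes:
    # Operate on the raw bytes: drop exactly one NUL if it is the last byte.
    if data.endswith(b"\x00"):
        return data[:-1]
    return data
```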

Character-set of SSH keys (safe delimiter for using sed with public keys)

I am using sed to replace a placeholder in a script with my public ssh key. The character / is definitely present in some SSH keys, how can I find out which character I can use as delimiter for sed?
I am looking for an answer of either the set of all characters that can be part of the string generated by ssh-keygen, or which characters are guaranteed not to.
The key data of a public key in OpenSSH format is base64-encoded, as mentioned for example in the sshd manual page. Therefore you can use any character that is not in the base64 alphabet (A-Z, a-z, 0-9, +, / and the = padding). / is part of it, but | for example can be used safely (note, though, that the comment section can contain anything).
For reference, from info sed, section 3.5:
The '/' characters may be uniformly replaced by any other single character within any given 's' command.
The '/' character (or whatever other character is used in its stead) can
appear in the REGEXP or REPLACEMENT only if it is preceded by a '\'
character.
So you can choose any suitable character that doesn't appear in your input data.
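Both points (avoid the base64 alphabet, and check the actual line because the comment is free text) can be combined in a small check before building the sed command. This is a sketch; pick_sed_delimiter and its default candidate list are hypothetical choices.

```python
import string

# Characters that can appear in the base64 key blob of an OpenSSH public key.
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def pick_sed_delimiter(key_line: str, candidates: str = "|#,:@%") -> str:
    for ch in candidates:
        # Skip base64 characters: they may appear in the key blob of any key.
        if ch in BASE64_CHARS:
            continue
        # Also check this particular line, because the trailing comment is free text.
        if ch not in key_line:
            return ch
    raise ValueError("no safe delimiter among the candidates")
```

The returned character can then be used in place of / in the s command, e.g. s|PLACEHOLDER|...| when | is chosen.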

Write multiple lines to text file with '\n'

I have a program that iterates over all lines of a text file, adds spaces between the characters, and writes the output to the same file. However, if there are multiple lines in the input, I want the output to have separate lines as well. I tried:
let text = format!(r"{}\n", line); // Add newline character to each line (while iterating)
file.write_all(text.as_bytes()); // Write each line + newline
Here is an example input text file:
foo
bar
baz
And its output:
f o o\n b a r\n b a z
It seems that Rust treats "\n" as an escaped n character, but using r"\n" treats it as a string. How can I have Rust treat \n as a newline character to write multiple lines to a text file?
Note: I can include the rest of my code if you need it, let me know.
Edit: I am on Windows 7 64 bit
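The problem is the r in front of your string literal: in a raw string (r"..."), escape sequences are not processed, so r"\n" is a backslash followed by n. Remove the r and your program will write newlines instead of a literal \n.
Also note that "\n" is the line ending on Unix-like systems, while Windows conventionally uses "\r\n".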
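Python's raw string literals behave the same way as Rust's, which makes the difference easy to demonstrate. The helper spaced_lines is hypothetical and only mirrors what the question describes (spaces between characters, one output line per input line, with a configurable line ending).

```python
raw = r"\n"      # raw literal: a backslash and an 'n', two characters
escaped = "\n"   # normal literal: a single line-feed character
assert len(raw) == 2 and len(escaped) == 1


def spaced_lines(lines, newline="\n"):
    """Put a space between the characters of each line and end each line with `newline`."""
    return "".join(" ".join(line) + newline for line in lines)


print(spaced_lines(["foo", "bar", "baz"]), end="")
```

Pass newline="\r\n" if Windows-style line endings are wanted in the output.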

Import formatting rules into IntelliJ IDEA from JSMin/JSFormat

Does anybody know which formatting rules the JSMin/JSFormat plugin for Notepad++ uses? I need them because we are required to use this formatter, but I write JS code in IntelliJ IDEA. With these rules I could import them somehow or, at least, apply them manually.
Thanks everyone in advance!
The minimising rules applied are listed here:
http://www.crockford.com/javascript/jsmin.html
JSMin is a filter that omits or modifies some characters. This does
not change the behavior of the program that it is minifying. The
result may be harder to debug. It will definitely be harder to read.
JSMin first replaces carriage returns ('\r') with linefeeds ('\n'). It
replaces all other control characters (including tab) with spaces. It
replaces comments in the // form with linefeeds. It replaces comments
in the /* */ form with spaces. All runs of spaces are replaced with a
single space. All runs of linefeeds are replaced with a single
linefeed.
It omits spaces except when a space is preceded and followed by a
non-ASCII character or by an ASCII letter or digit, or by one of these
characters:
\ $ _
It is more conservative in omitting linefeeds, because linefeeds are
sometimes treated as semicolons. A linefeed is not omitted if it
precedes a non-ASCII character or an ASCII letter or digit or one of
these characters:
\ $ _ { [ ( + -
and if it follows a non-ASCII character or an ASCII letter or digit or
one of these characters:
\ $ _ } ] ) + - " '
No other characters are omitted or modified.
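To make the first group of rules concrete, here is a partial Python sketch covering only the character normalization (carriage returns to linefeeds, other control characters to spaces, runs of spaces and of linefeeds collapsed). It is an illustration, not JSMin: it does not handle comments, string literals or regex literals, and it does not omit any spaces or linefeeds. The function name is hypothetical.

```python
import re


def jsmin_whitespace_pass(src: str) -> str:
    # 1. Carriage returns become linefeeds.
    s = src.replace("\r", "\n")
    # 2. All other control characters (below ASCII space, including tab) become spaces.
    s = "".join(" " if ord(c) < 32 and c != "\n" else c for c in s)
    # 3. Runs of spaces collapse to one space; runs of linefeeds collapse to one linefeed.
    s = re.sub(r" {2,}", " ", s)
    s = re.sub(r"\n{2,}", "\n", s)
    return s
```

Because string literals are not recognized, running this on real code could change the contents of strings that contain tabs or repeated spaces.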
There are other custom formatting rules applied according to the plugin developer's page:
http://www.sunjw.us/jsminnpp/