Change file delimiters from TAB and Pipe (|) to CTRL-A - awk

I have two .txt files, one with TAB as field delimiter and another with | as field delimiter.
For the first file, I want to change the delimiter from TAB to CTRL-A and save it as a .txt file; for the second file, change the delimiter from | to CTRL-A and also save it as a .txt file.
These two files are separate files.
How can we do it using awk or sed?

For file one, try:
sed -e 's/\t/\x01/g' file1 > file1.txt
For file two, try:
sed -e 's/|/\x01/g' file2 > file2.txt
(Note: the | is left unescaped on purpose; in GNU sed's basic regular expressions, \| means alternation, not a literal pipe.)

This is a great use for tr:
tr '\t' '\001' <file1 >file1-new
That translates every horizontal tab in file1 to Ctrl-A and writes the result to file1-new. You can do the same thing for the pipe-delimited file.
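For example, for the pipe-delimited file (assuming a literal | delimiter in file2):
tr '|' '\001' <file2 >file2-new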

An alternative using perl:
Replacing pipes:
echo "a|b|c" | perl -pe '$c=chr(1); s/\|/$c/g' | cat -A
a^Ab^Ac$
Replacing tabs:
echo -e "a\tb\tc" | perl -pe '$c=chr(1); s/\t/$c/g' | cat -A
a^Ab^Ac$
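Since the question also asks about awk, here is a rough awk equivalent; a sketch, assuming an awk that accepts the \001 octal escape in strings (GNU awk does):
awk 'BEGIN{FS="\t"; OFS="\001"} {$1=$1; print}' file1 > file1.txt
awk 'BEGIN{FS="|"; OFS="\001"} {$1=$1; print}' file2 > file2.txt
Reassigning $1 to itself forces awk to rebuild each line with the new output separator.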

bash script variables weird results when using cp

If I use cp inside a bash script, the copied file will have weird characters around the destination filename.
The destination name comes from the results of an operation, it's put inside a variable, and echoing the variable shows normal output.
The objective is to name a file after a string.
#!/bin/bash
newname=`cat outputfile | grep 'hostname ' | sed 's/hostname //g'`
newecho=`echo $newname`
echo $newecho
cp outputfile "$newecho"
If I launch the script, the echo looks OK:
$ ./rename.sh
mo-swc-56001
However, the file is named differently:
~$ ls
'mo-swc-56001'$'\r'
As you can see, the file name contains extra characters which the echo does not show.
Edit: the line endings of the file are like this:
# file outputfile
outputfile: ASCII text, with CRLF, CR line terminators
I tried every possible way to get rid of the ^M character, but this is one example of the hundreds of attempts:
# cat outputfile | grep 'hostname ' | sed 's/hostname //g' | cat -v
mo-swc-56001^M
# cat outputfile | grep 'hostname ' | sed 's/hostname //g' | cat -v | sed 's/\r//g' | cat -v
mo-swc-56001^M
This newline will stay there. Any ideas?
Edit: crazy, the only way is to perform a dos2unix on the output...
Looks like your outputfile has \r characters in it, so you could add logic to remove them first and then give it a try.
#!/bin/bash
## Remove the control-M (\r) characters from outputfile first, using tr.
tr -d '\r' < outputfile > temp && mv temp outputfile
newname=$(sed -n 's/hostname //p' outputfile)
echo "$newname"
cp outputfile "$newname"
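As an alternative to dos2unix, bash parameter expansion can strip the carriage returns from the variable itself; a minimal sketch, assuming the only stray characters are \r:
newname=$(sed -n 's/hostname //p' outputfile)
newname=${newname//$'\r'/}   # remove any carriage returns from the value
cp outputfile "$newname"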
The only way was to use dos2unix

awk/sed solution for printing only next line after it matches a pattern

I have multiple files in a folder. This is what a file looks like:
File1.txt
ghfgh gfghh
dffd kjkjoliukjkj
sdf ffghf
sf 898575
sfkj utiith
##
my data to be extracted
I want to extract the line immediately below the "##" pattern from all the files and write those lines to an output file. I want the file name to be included in the output file, too.
Desired output
>File1
My data to be extracted
>File2
My data to be extracted
>File3
My data to be extracted
This is what I tried:
awk '/##/{getline; print FILENAME; print ">"; print}' *.txt > output.txt
This assumes one extract per file (otherwise the filename header will be repeated):
$ awk '/##/{f=1; next} f{print ">"FILENAME; print; f=0}' *.txt > output.txt
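If you also want the header without the .txt extension, as in the desired output, a small variation of the same awk should work (a sketch):
awk '/##/{f=1; next} f{fn=FILENAME; sub(/\.txt$/, "", fn); print ">" fn; print; f=0}' *.txt > output.txt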
Perl to the rescue!
perl -ne 'print ">$ARGV\n", scalar <> if /^##/' -- *.txt > output.txt
-n reads the input line by line
$ARGV contains the current input file name
scalar <> reads one line from the input
A quick way with grep:
grep -A1 '##' *.txt | grep -v '##' > output.txt
POSIX or GNU sed:
$ sed -n '/^##/{n;p;}' file
my data to be extracted
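To run this across all files and keep the filename headers from the desired output, a small shell loop around the sed command should work (a sketch; it strips the .txt extension for the header):
for f in *.txt; do
  printf '>%s\n' "${f%.txt}"
  sed -n '/^##/{n;p;}' "$f"
done > output.txt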
grep and sed:
$ grep -A 1 '##' file | sed '1d'
my data to be extracted

Removing blank lines

I have a csv file in which every other line is blank. I have tried everything; nothing removes the lines. What should make it easier is that the digits 44 appear in each valid line. Things I have tried:
grep -ir 44 file.csv
sed '/^$/d' <file.csv
cat -A file.csv
sed 's/^ *//; s/ *$//; /^$/d' <file.csv
egrep -v "^$" file.csv
awk 'NF' file.csv
grep '\S' file.csv
sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' <file.csv
cat file.csv | tr -s \n
I decided I was imagining the blank lines, but after importing into Google Sheets, there they are still! Starting to question my sanity! Can anyone help?
sed -n -i '/44/p' file
-n suppresses the automatic printing of lines
-i edits the file in place (overwrites the same file)
/44/p prints only the lines where '44' exists
If you can't rely on '44' being present:
sed -i '/^\s*$/d' file
\s matches whitespace, ^ is start of line, $ is end of line, d deletes the line.
Use the -i option to replace the original file with the edited one.
sed -i '/^[ \t]*$/d' file.csv
Alternatively, output to another file and rename it, which does exactly what -i does:
sed '/^[[:blank:]]*$/d' file.csv > file.csv.out && mv file.csv.out file.csv
Given:
$ cat bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
You can remove blank lines with Perl:
$ perl -lne 'print unless /^\s*$/' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
awk:
$ awk 'NF>0' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
sed + tr:
$ cat bl.txt | tr '\t' ' ' | sed '/^ *$/d'
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Just sed:
$ sed '/^[[:space:]]*$/d' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Aside from the fact that your commands do not show that you capture their output in a new file to be used in place of the original, there's nothing wrong with them, EXCEPT that:
cat file.csv | tr -s \n
should be:
cat file.csv | tr -s '\n' # more efficient alternative: tr -s '\n' < file.csv
Otherwise, the shell eats the \ and all that tr sees is n.
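A quick way to see the difference, using a throwaway sample string:
$ printf 'funny\n' | tr -s \n      # unquoted: tr squeezes runs of the letter n
funy
$ printf 'a\n\n\nb\n' | tr -s '\n'  # quoted: tr squeezes runs of newlines
a
b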
Note, however, that the above eliminates only truly empty lines, whereas some of your other commands also eliminate blank lines (empty or all-whitespace).
Also, the -i (for case-insensitive matching) in grep -ir 44 file.csv is pointless, and while using -r (for recursive searches) will not change the fact that only file.csv is searched, it will prepend the filename followed by : to each matching line.
If you have indeed captured the output in a new file and that file truly still has blank lines, the cat -A (cat -et on BSD-like platforms) you already mention in your question should show you if any unusual characters are present in the file, in the form of ^<char> sequences, such as ^M for \r chars.
If you like awk, this should do:
awk '/44/' file
It will only print lines that contain 44.

Shell script: How to split line?

Here's my scenario:
My input file looks like:
/tmp/abc.txt
/tmp/cde.txt
/tmp/xyz/123.txt
and I'd like to obtain the following output in two files:
first file
/tmp/
/tmp/
/tmp/xyz/
second file
abc.txt
cde.txt
123.txt
Thanks a lot.
Here it is all in one single awk:
awk -F'/' -v OFS='/' '{print $NF > "file2"; $NF=""; print > "file1"}' input
cat file1
/tmp/
/tmp/
/tmp/xyz/
cat file2
abc.txt
cde.txt
123.txt
Here we set the input and output separators to /.
Then print the last field, $NF, to file2.
Set the last field to nothing, then print the rest to file1.
I realize you already have an answer, but you might be interested in the following two commands:
basename
dirname
If they're available on your system, you'll be able to get what you want just piping through these:
cat input | xargs -l dirname > file1
cat input | xargs -l basename > file2
Enjoy!
Edit: Fixed per quantdev's comment. Good catch!
Through grep,
grep -o '.*/' file > file1.txt
grep -o '[^/]*$' file > file2.txt
.*/ matches all the characters from the start up to the last / symbol.
[^/]*$ matches any character except /, zero or more times; $ asserts that we are at the end of the line.
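On the sample input, those two commands should leave the following in file1.txt and file2.txt:
cat file1.txt
/tmp/
/tmp/
/tmp/xyz/
cat file2.txt
abc.txt
cde.txt
123.txt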
The awk solution is probably the best, but here is a pure sed solution:
#n sed script to write the directory parts to file1 and the file-name parts to file2
h
s/.*\/\(.*\.txt\)/\1/
w file2
g
s/\(.*\/\).*\.txt/\1/
w file1
Note how we save the line in the hold buffer with h (and bring it back with g), and how we use the write (w) command to produce the output files. There are many other ways to do it with sed, but I like this one for using multiple different commands.
To use it:
> sed -f sed_script testfile
Here is another one-liner that uses tee:
cat f1.txt | tee >(xargs -n 1 dirname >> f2.txt) >(xargs -n 1 basename >> f3.txt) &>/dev/random
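If you'd rather not rely on external tools at all, a plain bash loop with parameter expansion can also do the split. A minimal sketch, assuming the paths are in a file named input (an example name) and every line contains at least one /:
while IFS= read -r path; do
  printf '%s/\n' "${path%/*}"  >> file1   # directory part, keeping the trailing slash
  printf '%s\n' "${path##*/}" >> file2    # filename part
done < input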

Using grep and awk to search and print the output to new file

I have 100 files and want to search for a specific word in the first column of each file, then print all columns of the matching lines to a new file.
I tried this code, but it doesn't work well; it prints only the content of one file, not all of them:
ls -t *.txt > Filelist.tmp
cat Filelist.tmp | while read line do; grep "searchword" | awk '{print $0}' > outputfile.txt; done
This is what you want:
$ awk '$1~/searchword/' *.txt >> output
This compares the first field against searchword and appends the line to output if it matches. The default field separator with awk is whitespace.
The main problem with your attempt is that you are overwriting (>) the file every time; you want to use append (>>) instead.
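If you want to keep the loop structure, a corrected sketch might look like this (redirecting once, outside the loop, so the output is not clobbered on every iteration; searchword is the placeholder from your attempt):
ls -t *.txt > Filelist.tmp
while read -r f; do
  awk '$1 ~ /searchword/' "$f"
done < Filelist.tmp > outputfile.txt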