No difference in output running 'cut -f 1 sample.txt'

echo 'The quick brown; fox jumps over the lazy dog' > sample.txt
I then run
cut -f 1 sample.txt
or
cut -f 2 sample.txt
and my output is always the same,
The quick brown; fox jumps over the lazy dog
shouldn't the output of the first 'cut' command be 'The'? Why is the output the same whichever 'cut' command I run?

The default separator is tab; if you want space instead, set it with -d (delimiter):
cut -f 1 -d ' ' sample.txt
The
https://en.wikibooks.org/wiki/Cut
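To see the default tab behaviour in action, here is a quick sketch (tabs.txt is a made-up test file, not from the question):
printf 'one\ttwo\tthree\n' > tabs.txt
cut -f 2 tabs.txt
two
And with -d set, the second field of your sample is:
cut -f 2 -d ' ' sample.txt
quick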

Awk to print the single next word following a pattern match

This Q is a variation on the theme of printing something after a pattern.
There will be input lines with words. Some lines will match a pattern, where the pattern is one or more words separated by spaces. The pattern might have a leading/trailing space, which needs to be respected. I need to print the word immediately following the match.
Example input
The quick brown fox jumps over the lazy dog
Pattern : "brown fox "
Desired output : jumps
The pattern will only occur once in the line. There will always be a word following the pattern. There will be lines without the pattern.
awk or sed would be nice.
Cheers.
EDIT:
I failed to ask the question properly. There will be one or more spaces between the pattern and the next word. This breaks Andre's proposal.
% echo -e "The quick brown fox jumps over the lazy dog\n" | awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }'
jumps
% echo -e "The quick brown fox  jumps over the lazy dog\n" | awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }'
(with two or more spaces after "fox", the second command prints only an empty line)
This works, given that the desired word is followed by a space:
$ echo -e "The quick brown fox jumps over the lazy dog\n" > file
$ awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }' file
jumps
Edit:
If there are more spaces, use this:
$ awk -F 'brown fox' 'NF>1{ sub(/^ */,"",$NF);
sub(/ .*/,"",$NF); print $NF }' file
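For instance, with three spaces in the input (file2 is my own test file, not from the question):
$ echo 'The quick brown fox   jumps over the lazy dog' > file2
$ awk -F 'brown fox' 'NF>1{ sub(/^ */,"",$NF); sub(/ .*/,"",$NF); print $NF }' file2
jumps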
Disclaimer: this solution assumes that when no pattern is found ("There will be lines without the pattern.") it is appropriate to print an empty line; if that does not hold true, ignore this answer entirely.
I would use AWK for this in the following way. Let file.txt content be
The quick brown fox jumps over the lazy dog
No animals in this line
The quick brown fox jumps over the lazy dog
then
awk 'BEGIN{FS="brown fox *"}{sub(/ .*/,"",$2);print $2}' file.txt
output
jumps
jumps
Explanation: I set the field separator FS to "brown fox" followed by any number of spaces. Whatever follows the separator appears in the 2nd column; I jettison from the 2nd column everything after the first space, including said space, then print that column. In case there is no match, the second column is empty and these actions result in an empty line.
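If the empty lines are unwanted, a guard in the spirit of the NF>1 answers above (my addition, not part of the original answer) skips the non-matching lines instead:
awk 'BEGIN{FS="brown fox *"} NF>1{sub(/ .*/,"",$2); print $2}' file.txt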
With GNU grep:
$ grep -oP '(?<=brown fox )(\w+)' file
jumps
If you have more than 1 space after the match:
$ echo 'The quick brown fox jumps over the lazy dog' | grep -oP '(?<=\bbrown fox\b)\s+\K(\w+)'
jumps
Perl, with the same regex:
$ perl -lnE 'print $1 if /(?<=\bbrown fox )(\w+)/' file
Or, if you have multiple spaces:
$ perl -lnE 'print $1 if /(?<=brown fox)\s+(\w+)/' file
(As stated in comments, both the GNU grep and Perl regex could be \bbrown\h+fox\h+\K\w+ which has the advantage of supporting multiple spaces between brown and fox)
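For reference, that suggested pattern would be used like this (a sketch based on the comment, not from the original answers):
$ echo 'The quick brown   fox   jumps over the lazy dog' | grep -oP '\bbrown\h+fox\h+\K\w+'
jumps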
With awk, you can split on the string and split the result (this works as-is for multi spaces):
pat='brown fox'
awk -v pat="$pat" 'index($0, pat){
  split($0, arr, pat)
  split(arr[2], arr2)
  print arr2[1]
}' file
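For example, with doubled spaces in the input (my own test line, not from the question):
$ echo 'The quick brown fox  jumps over the lazy dog' | awk -v pat='brown fox' 'index($0, pat){split($0, arr, pat); split(arr[2], arr2); print arr2[1]}'
jumps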
With GNU awk, you might also use a capture group with function match.
\ybrown\s+fox\s+(\w+)
\y A word boundary
brown\s+ Match brown and 1+ whitespace chars
fox\s+ Match fox and 1+ whitespace chars
(\w+) Capture 1+ word chars in group 1
In awk, get the group 1 value using arr[1]
Example
echo "The quick brown fox jumps over the lazy dog" |
awk 'match($0,/\ybrown\s+fox\s+(\w+)/, arr) {print arr[1]}'
Output
jumps

How do I decrement all array indexes in a text file?

Background
I have a text file that looks like the following:
$SomeText.element_[1]="MoreText[3]";\r"
$SomeText.element_[2]="MoreText[6]";\r"
$SomeText.element_[3]="MoreText[2]";\r"
$SomeText.element_[4]="MoreText[1]";\r"
$SomeText.element_[5]="MoreText[5]";\r"
This goes on for over a thousand lines. I want to do the following:
$SomeText.element_[0]="MoreText[3]";\r"
$SomeText.element_[1]="MoreText[6]";\r"
$SomeText.element_[2]="MoreText[2]";\r"
$SomeText.element_[3]="MoreText[1]";\r"
$SomeText.element_[4]="MoreText[5]";\r"
Each line of text in the file should have the left most index reduced by one, with the rest of the text unchanged.
Attempted Solutions
So far I have tried the following, but the issue is that I do not know how to feed the result back into the file properly:
Attempt 1
I tried a double cutting technique:
cat file.txt | cut -d '[' -f2 | cut -d ']' -f1 | xargs -I {} expr {} - 1
This properly outputs all of the indices reduced by one to the command line.
Attempt 2
I tried using awk with a mix of sed, but this caused my machine to hang:
awk -F'[' '{printf("%d\n", $2-1)}' file.txt | xargs -I {} sed -i 's/\[\d+\]/{}/g' file.txt
Question
How do I properly decrement all of the array indexes in the file by one and write the decremented indexes back into the right location in the text file?
A Perl one-liner makes this easy, overwriting the input file:
perl -pi -e 's/(\d+)/$1-1/e' your-file-name-here
(assuming the first number on each line is the index)
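If you want to be extra safe that only the bracketed index is touched, the substitution can be anchored to the brackets (a sketch under the same assumption; only the first match per line is replaced, since there is no /g flag):
perl -pi -e 's/\[(\d+)\]/"[" . ($1-1) . "]"/e' your-file-name-here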
With simple awk you could try the following, written and tested with the shown samples.
awk '
match($0,/\[[^]]*/){
  print substr($0,1,RSTART) count++ substr($0,RSTART+RLENGTH)
}
' Input_file
OR, in case the counts between [..] in your Input_file are in arbitrary order, simply subtract 1 from each of them as follows.
awk '
match($0,/\[[^]]*/){
  print substr($0,1,RSTART) substr($0,RSTART+1,RLENGTH)-1 substr($0,RSTART+RLENGTH)
}
' Input_file
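Note that both snippets print only the lines that contain a [..] match; if your real file also has other lines that should pass through unchanged, a catch-all can be added (my variation, not part of the original answer):
awk '
match($0,/\[[^]]*/){
  $0=substr($0,1,RSTART) substr($0,RSTART+1,RLENGTH)-1 substr($0,RSTART+RLENGTH)
}
1
' Input_file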
With GNU sed and bash:
sed -E "s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e" file
Or, if it is possible that the lines contain the ' character:
sed -E "
/\[[0-9]+]/{
s/'/'\\\''/g
s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e
}" file

awk print overwrites strings

I have a problem using awk in the terminal.
I need to move many files as a group from the current directory to another one, and I have the list of the necessary files in a text file, as:
filename.txt
file1
file2
file3
...
I usually type:
paste filename.txt | awk '{print "mv "$1" ../dir/"}' | sh
and it executes:
mv file1 ../dir/
mv file2 ../dir/
mv file3 ../dir/
It usually works, but now the command has changed its behaviour: awk overwrites the last string ../dir/ onto the beginning of each line, as if the print restarted from the initial position, giving:
../dire1 ../dir/
../dire2 ../dir/
../dire3 ../dir/
and of course it cannot be executed.
What's happened?
How do I solve it?
Your input file contains carriage returns (\r aka control-M). Run dos2unix on it before running a UNIX tool on it.
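To confirm that this is what's happening, cat -v (or cat -A) renders carriage returns visibly as ^M, e.g.:
$ cat -v filename.txt
file1^M
file2^M
file3^M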
idk what you're using paste for though, and you shouldn't be using awk for this at all anyway; it's just a job for a simple shell script, e.g. remove the echo once you've tested this:
$ < file xargs -n 1 -I {} echo mv "{}" "../dir"
mv file1 ../dir
mv file2 ../dir
mv file3 ../dir
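If you'd rather do it in pure shell, a plain read loop does the same job (a sketch; fine as long as the file names contain no newlines):
while IFS= read -r f; do mv "$f" ../dir/; done < filename.txt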

Removing blank lines

I have a csv file in which every other line is blank. I have tried everything, but nothing removes the lines. What should make it easier is that the digits 44 appear in each valid line. Things I have tried:
grep -ir 44 file.csv
sed '/^$/d' <file.csv
cat -A file.csv
sed 's/^ *//; s/ *$//; /^$/d' <file.csv
egrep -v "^$" file.csv
awk 'NF' file.csv
grep '\S' file.csv
sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' <file.csv
cat file.csv | tr -s \n
I decided I was imagining the blank lines, but when I import the file into Google Sheets, there they are still! Starting to question my sanity! Can anyone help?
sed -n -i '/44/p' file
-n means suppress automatic printing
-i means in-place (overwrite the same file)
/44/p prints lines where '44' exists
If you can't rely on '44' being present:
sed -i '/^\s*$/d' file
\s matches whitespace, ^ anchors the start of the line, $ the end of the line, and d deletes the line.
Use the -i option to replace the original file with the edited one.
sed -i '/^[ \t]*$/d' file.csv
Alternatively, output to another file and rename it, which does exactly what -i does:
sed '/^[[:blank:]]*$/d' file.csv > file.csv.out && mv file.csv.out file.csv
Given:
$ cat bl.txt
Line 1 (next line has a tab)
	
Line 2 (next has several space)
    
Line 3
You can remove blank lines with Perl:
$ perl -lne 'print unless /^\s*$/' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
awk:
$ awk 'NF>0' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
sed + tr:
$ cat bl.txt | tr '\t' ' ' | sed '/^ *$/d'
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Just sed:
$ sed '/^[[:space:]]*$/d' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Aside from the fact that your commands do not show that you capture their output in a new file to be used in place of the original, there's nothing wrong with them, EXCEPT that:
cat file.csv | tr -s \n
should be:
cat file.csv | tr -s '\n' # more efficient alternative: tr -s '\n' < file.csv
Otherwise, the shell eats the \ and all that tr sees is n.
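You can see the difference for yourself (a quick sketch):
printf 'running\n' | tr -s \n     # unquoted: tr sees just "n" and squeezes the doubled n
runing
printf 'aa\n\nbb\n' | tr -s '\n'  # quoted: consecutive newlines are squeezed
aa
bb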
Note, however, that the above eliminates only truly empty lines, whereas some of your other commands also eliminate blank lines (empty or all-whitespace).
Also, the -i (for case-insensitive matching) in grep -ir 44 file.csv is pointless, and while using -r (for recursive searches) will not change the fact that only file.csv is searched, it will prepend the filename followed by : to each matching line.
If you have indeed captured the output in a new file and that file truly still has blank lines, the cat -A (cat -et on BSD-like platforms) you already mention in your question should show you if any unusual characters are present in the file, in the form of ^<char> sequences, such as ^M for \r chars.
If you like awk, this should do:
awk '/44/' file
It will only print lines that contain 44.

In a file with two-word lines, grep only those lines which have both words from a whitelist

I have a file1:
green
yellow
apple
mango
and a file2:
red apple
blue banana
yellow mango
purple cabbage
I need to find elements from file2 where both words belong to the list in file1. So it should show:
yellow mango
I tried:
awk < file2 '{if [grep -q $1 file1] && [grep -q $2 file1]; then print $0; fi}'
I am getting a syntax error.
This will do the trick:
$ awk 'NR==FNR{a[$0];next}($1 in a)&&($2 in a)' file1 file2
yellow mango
Explanation:
NR is a special awk variable that tracks the current line number across all input, and FNR tracks the current line within each individual file, so the condition NR==FNR is only true while we are reading the first file. a is an associative array whose keys are the unique lines of the first file. $0 is the current line in awk. The next statement jumps to the next input line, so the rest of the script is not executed for lines of file1. The final part is straightforward: if the first field $1 is in the array a and the second field $2 is too, print the current line. The default action in awk is {print $0}, so it is implicit.
This is a very hackish approach and probably frowned upon by many of the grep/sed implementors. In addition it is probably terminal dependent. You have been warned.
GNU grep, when in color mode, highlights pieces of the input that were matched by one of the patterns, this could in theory, be used as a test for a full match. Here, this even works in practice, that is, with some help from GNU sed:
grep --color=always -f file1 file2 | sed -n '/^\x1b.*\x1b\[K *\x1b.*\x1b\[K$/ { s/\x1b\[K//g; s/\x1b[^m]*m//gp }'
Output:
yellow mango
Note that the sed pattern assumes space separated columns in file2.
You can do it with bash, sed and grep:
grep -f <(sed 's/^/^/' file1) file2 | grep -f <(sed 's/$/$/' file1)
this is a bit obscure, so I will break it down:
grep -f <file> reads a sequence of patterns from a file and will match on any of them.
<(...) is bash process substitution and will execute a shell command and create a pseudo-file with the output that can be used in place of a filename.
sed 's/^/^/' file1 inserts a ^ character at the start of each line in file1, turning the lines into patterns that will match the first word of file2.
sed 's/$/$/' file1 inserts a $ character at the end, so the patterns will match the second word.
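For the sample file1 above, the first process substitution therefore feeds grep these anchored patterns (the second one is analogous, with a trailing $):
sed 's/^/^/' file1
^green
^yellow
^apple
^mango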
Edit:
Use:
grep -f <(sed 's/^/^/;s/$/\b/' file1) file2 | grep -f <(sed 's/$/$/;s/^/\b/' file1)
to get round the issue that Jonathan pointed out in his comment.