delete line if string is matched and the next line contains another string - awk

I've got an annoying text-manipulation problem: I need to delete a line in a file if it contains a string, but only if the next line also contains another string. For example, I have these lines:
john paul
george
john paul
12
john paul
I want to delete any line containing 'john paul' if it is immediately followed by a line that contains 'george', so it would return:
george
john paul
12
john paul
I'm not sure how to grep or sed this. If anyone could lend a hand, that'd be great!

This might work for you (GNU sed):
sed '/john paul/{$!N;/\n.*george/!P;D}' file
If a line contains john paul, read the next line; if that next line contains george, don't print the first line.
N.B. If the line containing george also contains john paul, it will be checked as well.
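To see it in action, the sample data can be fed in directly (a quick sketch; assumes GNU sed):

```shell
# Run the sed command above on the sample lines without needing a file.
printf '%s\n' 'john paul' 'george' 'john paul' '12' 'john paul' |
  sed '/john paul/{$!N;/\n.*george/!P;D}'
```

The D command restarts the cycle with whatever is left in the pattern space, which is what lets the george line itself be re-tested.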

awk 'NR > 1 && !(/george/ && p ~ /john paul/) { print p } { p = $0 } END { print }' file
Output:
george
john paul
12
john paul

This awk should do:
cat file
john paul
george
john paul
12
john paul
hans
george
awk 'f~/john paul/ && /george/ {f=$0;next} NR>1 {print f} {f=$0} END {print}' file
george
john paul
12
john paul
hans
george
This will only delete the name above george if that name is john paul.

Here is a more general version:
If a line matches a pattern and the previous line was exactly "john paul", do nothing; otherwise, print the previous line. (Change the ^[a-zA-Z]+$ part to george if you only want george to be detected.)
awk 'NR>1 && !(/^[a-zA-Z]+$/ && previous ~ /^john paul$/){print previous}{previous=$0}END{print}'
In your example:
$> echo 'john paul
george
john paul
12
john paul' |awk 'NR>1 && !(/^[a-zA-Z]+$/ && previous ~ /^john paul$/){print previous}{previous=$0}END{print}'
george
john paul
12
john paul
If a line contains numbers (or anything other than letters), the previous line is printed; otherwise it isn't:
$> echo 'john paul
george 234
john paul
auie
john paul' |awk 'NR>1 && !(/^[a-zA-Z]+$/ && previous ~ /^john paul$/){print previous}{previous=$0}END{print}'
john paul
george 234
auie
john paul

The sed solution is short: two commands and lots of comments ;)
/john paul/ {
# read the next line and append to pattern space
N
# and then if we find "george" in that next line,
# only retain the last line in the pattern space
s/.*\n\(.*george\)/\1/
# and finally print the pattern space,
# as we don't use the -n option
}
You put the above in some sedscript file and then run:
sed -f sedscript your_input_file

Just to throw some Perl into the mix:
perl -ne 'print $p unless /george/ && $p =~ /john paul/; $p = $_ }{ print $p' file
Print the previous line, unless the current line matches /george/ and the previous line $p matched /john paul/. Set $p to the value of the previous line. }{ effectively creates an END block, so the last line is also printed after the file has been read.

You might have to change the \r\n to \n or to \r; other than that, this should work:
<?php
$string = "john paul
george
john paul
12
john paul";
$string = preg_replace("#john paul\r\n(george)#i",'$1',$string);
echo $string;
?>
You could also read a file into the variable and then overwrite the file afterwards.

With GNU awk for multi-char RS:
$ gawk -vRS='^$' '{gsub(/john paul\ngeorge/,"george")}1' file
george
john paul
12
john paul
or, if there's more on each line than your sample input shows, just change the RE to suit and use gensub():
$ gawk -vRS='^$' '{$0 = gensub(/[^\n]*john paul[^\n]*\n([^\n]*george[^\n]*)/,"\\1","")}1' file
george
john paul
12
john paul

Add file name as a new column with awk

First of all, existing questions didn't solve my problem; that's why I am asking again.
I have two txt files temp.txt
adam 12
george 15
thomas 20
and demo.txt
mark 8
richard 11
james 18
I want to combine them and add a 3rd column as their file names without extension, like this:
adam 12 temp
george 15 temp
thomas 20 temp
mark 8 demo
richard 11 demo
james 18 demo
I used this script:
for i in $(ls); do name=$(basename -s .txt $i)| awk '{OFS="\t";print $0, $name} ' $i; done
But it yields the following table:
mark 8 mark 8
richard 11 richard 11
james 18 james 18
adam 12 adam 12
george 15 george 15
thomas 20 thomas 20
I don't understand why it prints the whole line in place of the name variable.
Thanks in advance.
Awk has no access to Bash's variables, or vice versa. Inside the Awk script, name is undefined, so $name gets interpreted as $0.
Also, don't use ls in scripts, and quote your shell variables.
Finally, the assignment to name does not print anything, so piping its output to Awk makes no sense.
for i in ./*; do
name=$(basename -s .txt "$i")
awk -v name="$name" '{OFS="\t"; print $0, name}' "$i"
done
Incidentally, the basename calculation could easily be performed natively in Awk, but I leave that as an exercise. (Hint: sub(regex, "", FILENAME).)
awk has a FILENAME variable whose value is the path of the file being processed, and a FNR variable whose value is the current line number in the file;
so, at FNR == 1 you can process FILENAME and store the result in a variable that you'll use afterwards:
awk -v OFS='\t' '
FNR == 1 {
basename = FILENAME
sub(".*/", "", basename) # strip from the start up to the last "/"
sub(/\.[^.]*$/, "", basename) # strip from the last "." up to the end
}
{ print $0, basename }
' ./path/temp.txt ./path/demo.txt
adam 12 temp
george 15 temp
thomas 20 temp
mark 8 demo
richard 11 demo
james 18 demo
Using BASH:
for i in temp.txt demo.txt ; do while read -r a b ; do printf "%s\t%s\t%s\n" "$a" "$b" "${i%%.*}" ; done <"$i" ; done
Output:
adam 12 temp
george 15 temp
thomas 20 temp
mark 8 demo
richard 11 demo
james 18 demo
For each source file read each line and use printf to output tab-delimited columns including the current source file name without extension via bash parameter expansion.
First, you need to unquote $name: inside the single quotes, it does not get replaced by the shell. After you do that, you need to add double quotes around $name so that awk sees it as a string:
for i in $(ls); do name=$(basename -s .txt $i); awk '{OFS="\t";print $0, "'$name'"} ' $i; done

Linux word in column ends with 'a'

I have a text file. I need to print the people whose names end with 'a' and who are older than 30.
I did this:
awk '{if($4>30)print $1,$2}' New.txt
I don't know how to finish.
New.txt:
Name Lastname City Age
John Smith Verona 12
Barney Stinson York 55
Jessica Alba London 33
Could you please try the following, written per the samples shown, in GNU awk.
awk '$1~/[aA]$/ && $NF>30{print $1,$2}' Input_file
Explanation: simply check whether the 1st field ends with a or A AND the last field is greater than 30; if so, print that line's first and second fields.
You can try
awk '{if($4>30 && $1 ~ /a$/) print $1, $2}' New.txt
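Neither answer shows its output; here is a quick self-contained run against the sample rows (fed in via printf rather than New.txt):

```shell
# Only the row whose first field ends in 'a' and whose last field
# exceeds 30 survives the filter.
printf '%s\n' 'Name Lastname City Age' 'John Smith Verona 12' \
  'Barney Stinson York 55' 'Jessica Alba London 33' |
  awk '$1~/[aA]$/ && $NF>30{print $1,$2}'
```

The header line is harmless here because "Name" does not end in a or A.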

Remove line if the second or second to last character is a space in the first column of a CSV

First of all, I apologise for not giving an example of what I've tried, because with this one I really don't know where to begin. It's a job for sed or awk; that's about as far as I can get.
I would like to remove lines if:
The second character is a space in the first column
The second to last character is a space in the first column
Example input
John Smith|Chicago|IL
J Smith|Chicago|IL
Jane Brown|New York|NY
Jane B|New York|NY
Expected Output
John Smith|Chicago|IL
Jane Brown|New York|NY
The files are |-delimited; some contain 4 columns of data, others contain 5 or more (I know it's bad formatting, but it's data collected by an NGO that I'm trying to help). In each case I'd like this to happen just for the first column of the file.
I simply translated your two criteria into regexps and used grep with option -v to remove these patterns:
The second character is a space in the first column -> '^[^|] '
The second-to-last character is a space in the first column -> '^[^|]* [^|]\|'
grep -Ev '(^[^|] )|(^[^|]* [^|]\|)' <input>
Result:
John Smith|Chicago|IL
Jane Brown|New York|NY
Could you please try following.
awk 'BEGIN{FS=OFS="|"} substr($1,2,1)==" " || substr($1,length($1)-1,1)==" "{next} 1' Input_file
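Since no output is shown for this one, a quick self-contained check with the sample rows:

```shell
# substr($1,2,1) is the second character of the first column;
# substr($1,length($1)-1,1) is its second-to-last character.
# Rows where either is a space are skipped via next; 1 prints the rest.
printf '%s\n' 'John Smith|Chicago|IL' 'J Smith|Chicago|IL' \
  'Jane Brown|New York|NY' 'Jane B|New York|NY' |
  awk 'BEGIN{FS=OFS="|"} substr($1,2,1)==" " || substr($1,length($1)-1,1)==" "{next} 1'
```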
This awk should do:
awk -F\| '{s=split($1,a,"")} !(a[2]==" " || a[s-1]==" ")' file
John Smith|Chicago|IL
Jane Brown|New York|NY
It splits the first field into array a, with its length in s, then tests whether the second and second-to-last characters are spaces.
Easy to read, and easy to understand how it works :)
$ awk -F'|' '$1 !~ /^. | .$/' file
John Smith|Chicago|IL
Jane Brown|New York|NY
A smaller version of Corentin Limier's answer:
grep -Ev '(^. )|(^[^|]* .\|)' filename
Result:
John Smith|Chicago|IL
Jane Brown|New York|NY
This may also be possible with sed (note the second pattern is not anchored to the first column, so it relies on later columns not containing a space followed by a single character and a |):
sed '/^. /d; / .|/d' filename

how to perform a search for words in one file against another file and display the first matching word in a line

I have an annoying problem. I have two files.
$ cat file1
Sam
Tom
$ cat file2
I am Sam. Sam I am.
Tom
I am Tom. Tom I am.
File1 is a word-list file, whereas file2 is a file containing a varying number of columns. I want to perform a search using file1 against file2 and display, for each line of file2, the first match of every word that appears in that line. Thus the result needs to be the following:
Sam (line 1 match)
Tom (line 2 match)
Tom (line 3 match)
If the f2 is the following,
I am Sam. Sam I am.
Tom
I am Tom. Tom I am.
I am Tom. Sam I am.
I am Sam. Tom I am.
I am Sammy.
It needs to display the following:
Sam (1st line match)
Tom (2nd line match)
Tom (3rd line match)
Tom (4th line match)
Sam (4th line match)
Sam (5th line match)
Tom (5th line match)
Sam (6th line match)
I think I need an awk solution, since "grep -f file1 file2" won't work.
Seems like you want first match from each line:
$ cat f1
Sam
Tom
$ cat f2
I am Sam. Sam I am.
Tom
I am Tom. Tom I am.
I am Tom. Sam I am.
I am Sam. Tom I am.
$ grep -Fnof f1 f2 | sort -t: -u -k1,1n
1:Sam
2:Tom
3:Tom
4:Tom
5:Sam
-n to display line numbers, which are later used to remove duplicates
-F to match the search terms literally and not as regexes
-o to display only the matching terms
Pipe the output to cut -d: --complement -f1 to remove the first column of line numbers.
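Putting the pieces together (a sketch; cut -d: -f2- is the short form of the --complement variant mentioned above, and the files are recreated here so the example is self-contained):

```shell
# Find first matches per line, keep one entry per line number,
# then strip the leading line-number column.
printf '%s\n' 'Sam' 'Tom' > f1
printf '%s\n' 'I am Sam. Sam I am.' 'Tom' 'I am Tom. Tom I am.' \
  'I am Tom. Sam I am.' 'I am Sam. Tom I am.' 'I am Sammy.' > f2
grep -Fnof f1 f2 | sort -t: -u -k1,1n | cut -d: -f2-
```

With GNU sort, -u keeps the first of each run of equal keys, so the leftmost match on each line survives.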
With GNU awk for sorted_in:
$ cat tst.awk
BEGIN { PROCINFO["sorted_in"] = "#val_num_asc" }
NR==FNR { res[$0]; next }
{
delete found
for ( re in res ) {
if ( !(re in found) ) {
if ( match($0,re) ) {
found[re] = RSTART
}
}
}
for ( re in found ) {
printf "%s (line #%d match)\n", re, FNR
}
}
$ awk -f tst.awk file1 file2
Sam (line #1 match)
Tom (line #2 match)
Tom (line #3 match)
Tom (line #4 match)
Sam (line #4 match)
Sam (line #5 match)
Tom (line #5 match)
Sam (line #6 match)
Could you please try the following and let me know if it helps you.
awk -F"[. ]" 'FNR==NR{a[$0];next} {for(i=1;i<=NF;i++){if($i in a){print $i;next}}}' Input_file1 Input_file2
It seems grep could be made to work:
grep -nof f1 f2 | sort -u
1:Sam
2:Tom
3:Tom
4:Sam
4:Tom
5:Sam
5:Tom
6:Sam

adding common names in the column - awk

Is it possible to print the unique names from the 1st column, merging the values from the 2nd column as below? Thanks in advance!
input
tony singapore
johnny germany
johnny singapore
output
tony singapore
johnny germany;singapore
try this one-liner:
awk '{a[$1]=$1 in a?a[$1]";"$2:$2}END{for(x in a)print x, a[x]}' file
$ awk '{name2vals[$1] = name2vals[$1] sep[$1] $2; sep[$1] = ";"} END { for (name in name2vals) print name, name2vals[name]}' file
johnny germany;singapore
tony singapore
Here is a cryptic sed variant:
Content of script.sed
$ cat script.sed
# create a label called a
:a
# if not the last line, append the next line to the pattern space
$!N
# if the first column repeats, append its second column separated by ;
s/^(([^ ]+ ).*)\n\2/\1;/
# if the last substitution was successful, loop back to a
ta
# print up to the first \n of the current pattern space
P
# delete from the current pattern space up to and including the first \n
D
Execution:
$ cat file
tony singapore
johnny germany
johnny singapore
$ sed -rf script.sed file
tony singapore
johnny germany;singapore