Using grep and awk to search and print the output to new file - awk

I have 100 files and want to search for a specific word in the first column of each file, then print the content of all columns of the matching lines to a new file.
I tried this code, but it doesn't work well: it prints only the content of one file, not all of them:
ls -t *.txt > Filelist.tmp
cat Filelist.tmp | while read line; do grep "searchword" "$line" | awk '{print $0}' > outputfile.txt; done

This is what you want:
$ awk '$1~/searchword/' *.txt >> output
This compares the first field against searchword and appends the line to output if it matches. The default field separator with awk is whitespace.
The main problem with your attempt is that you are overwriting the file with > every time through the loop; you want to be using append, >>.
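As a runnable sketch (the file names a.txt and b.txt are made up for the demo), this shows the regex match against the first field, plus an exact string comparison for the case where partial matches such as "searchwords" must not count:

```shell
# Sample data files (these names are illustrative)
printf 'searchword a1 a2\nother b1 b2\n' > a.txt
printf 'searchword c1 c2\n' > b.txt

# Regex match on field 1, appended to the output file as in the answer
awk '$1 ~ /searchword/' a.txt b.txt >> output

# Exact comparison: does not match fields that merely contain the word
awk '$1 == "searchword"' a.txt b.txt > output_exact
```

Here both commands select the same two lines; they differ only when a first field contains searchword as a substring.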

Related

How do I merge 2 txt file of different layout into a single file using linux command?

I have 2 txt files.
File 1: columns are A,B,C,F
File 2: columns are A,B,D,E
The final file should have the following columns: A,B,C,D,E,F
How should I do this?
You could write a script with the following steps:
first sort the files
sort file1 -o file1sorted
sort file2 -o file2sorted
use the paste command to combine
paste file1sorted file2sorted > combofile
use awk to rearrange the columns
awk -F' ' '{print $1,$2,$3,$7,$8,$4}' combofile > finalfile
-F sets the field separator; I have assumed it to be space. If it is a comma you write it as -F,
Each value $1, $2, etc. is the position of a field in combofile. After the paste, combofile has eight fields (A,B,C,F,A,B,D,E), so $1,$2,$3 give A,B,C, $7,$8 give D,E, and $4 gives F.
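Putting the three steps together as a sketch with made-up sample data (the file names and values are illustrative; since pasting two four-column files yields eight fields, D and E come from fields 7 and 8):

```shell
# Sample four-column inputs with layouts A,B,C,F and A,B,D,E
printf 'k1 b1 c1 f1\nk2 b2 c2 f2\n' > file1
printf 'k1 b1 d1 e1\nk2 b2 d2 e2\n' > file2

sort file1 -o file1sorted
sort file2 -o file2sorted
paste file1sorted file2sorted > combofile   # eight fields per line

# Reorder to A,B,C,D,E,F: C is $3 (file1), D,E are $7,$8 (file2), F is $4
awk '{print $1,$2,$3,$7,$8,$4}' combofile > finalfile
```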

awk/sed solution for printing only next line after it matches a pattern

I have multiple files in a folder. This is what a file looks like:
File1.txt
ghfgh gfghh
dffd kjkjoliukjkj
sdf ffghf
sf 898575
sfkj utiith
##
my data to be extracted
I want to extract the line immediately below the "##" pattern from all the files and write those lines to an output file. I want the file name to appear in the output file too.
Desired output
>File1
My data to be extracted
>File2
My data to be extracted
>File3
My data to be extracted
This is what I tried:
awk '/##/{getline; print FILENAME; print ">"; print}' *.txt > output.txt
assumes one extract per file (otherwise filename header will be repeated)
$ awk '/##/{f=1; next} f{print ">"FILENAME; print; f=0}' *.txt > output.txt
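A quick check of that one-liner with two small sample files (note it prints the full file name, File1.txt rather than File1, since that is what FILENAME holds):

```shell
# Sample files containing a "##" marker line
printf 'noise one\n##\nmy data A\ntrailer\n' > File1.txt
printf '##\nmy data B\n' > File2.txt

# Print ">" plus the file name, then the line after each "##"
awk '/##/{f=1; next} f{print ">"FILENAME; print; f=0}' File1.txt File2.txt > output.txt
```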
Perl to the rescue!
perl -ne 'print ">$ARGV\n", scalar <> if /^##/' -- *.txt > output.txt
-n reads the input line by line
$ARGV contains the current input file name
scalar <> reads one line from the input
A quick way with grep:
grep -A1 '##' *.txt | grep -v '##' > output.txt
Note that with multiple files grep prefixes each line with its file name and inserts -- group separators, so you may want a final | grep -v '^--$' as well.
POSIX or GNU sed:
$ sed -n '/^##/{n;p;}' file
my data to be extracted
grep and sed:
$ grep -A 1 '##' file | sed '1d'
my data to be extracted

Giving an argument list as file names extracted from another tab separated file

I have a tab-delimited file as below. The first column holds the list of file names, without the .txt extension, which I want to pass as an argument list to another awk command.
File1 abcd xyz 234 pqr
File2 abcd xyz 234 pqr
File3 abcd xyz 234 pqr
File4 abcd xyz 234 pqr
e.g. Assume this is my awk command, I want to pass arguments as
awk -F"\t" '---Commamd-----' File1.txt File2.txt File3.txt File4.txt >> Final.txt
So that it takes each row of the 1st column, with the ".txt" extension added, as an input argument and creates the Final.txt output file. It should be noted that the number of columns may vary each time.
I thought of writing a bash script, but I am not able to build the argument list correctly, appending each successive row of the 1st column as the next argument.
Going by my understanding of your requirements, you want to use the tab-separated file to get the file names in column 1, add a .txt extension to them, and pass them as arguments to another command. First, use mapfile to get the names from the tab-separated file:
mapfile -t fileNames < <(awk -v FS="\t" '{print $1}' tabfile)
Now, to pass this as an argument list to another command, all you need to do is use the quoted array, suffixing the .txt extension to each element:
awk ... "${fileNames[@]/%/.txt}"
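A small end-to-end sketch of the mapfile approach (bash is required for mapfile and process substitution; the file names and contents are invented for the demo):

```shell
#!/usr/bin/env bash
# Driver file: column 1 holds the base names (tab-separated)
printf 'File1\tmeta1\nFile2\tmeta2\n' > tabfile
printf 'a 11\n' > File1.txt
printf 'b 22\n' > File2.txt

# Read column 1 into an array, then expand with a .txt suffix on each element
mapfile -t fileNames < <(awk -F'\t' '{print $1}' tabfile)
awk '{print $2}' "${fileNames[@]/%/.txt}" > result
```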
Not completely sure here, as the requirement is not clear. Based on your statement that you want to get file names from one awk and pass them to another awk, the following could be tried:
awk '{print $0}' <(awk 'NF{print $1".txt"}' Input_file)
In place of print $0 you could do your real operations here; I just printed it to check that the file names come out properly. Also add -F"\t" to the 2nd awk in case your Input_file is TAB-delimited, and change $1 to another field in case the file names are not in the first column.
You can try this awk
awk '{file=$1".txt"; while ((getline < file) > 0) print $2; close(file)}' infile
Append .txt to $1 of each infile row to get the file name, e.g. File2.txt.
Print $2 of each line of that file, if it exists. Testing ((getline < file) > 0) stops the loop at end of file (getline returns 0 at EOF and -1 on error), and close(file) avoids running out of open file descriptors.
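A runnable sketch of the getline approach with invented sample files; the ((getline < file) > 0) test is the portable way to loop until end of file:

```shell
# infile: column 1 names the data files (without .txt)
printf 'File1 x\nFile2 y\n' > infile
printf 'a 11\nb 12\n' > File1.txt
printf 'c 21\n' > File2.txt

# For each row, open "$1.txt", print its 2nd column, then close it
awk '{file=$1".txt"; while ((getline < file) > 0) print $2; close(file)}' infile > out
```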

How to write awk -F commands

#!/bin/bash
cat $1 | awk ' /Info/, /<\/Body>/ {print $0}' | while read line; do
file=`awk -F '>' "{print $4}"`
echo "$file"
done
Basically, the input file has some information removed by the first awk. Now what I'm trying to do is find a variable using awk -F and print what comes after the >, which is the 4th field. I cannot just search for the > because the file has 100s of them, since it's HTML.
OK, maybe someone can answer this: when I run the file now, it does not look at the fourth field; it just removes all of the '>', which is not the goal. I am trying to locate the field that comes after the third '>', so that would be field 4, but that's not what I'm getting. Any help would be great!
AWK requires 2 parts:
Options (e.g. the delimiter, what to print)
A file (or input) to work on
In your example you have given the options, i.e. the delimiter and what to print, but you have not mentioned which file to work with.
Try this
On the command prompt
cat file | awk -F ">" '{print $4}'
In script
result=`cat file | awk -F ">" '{print $4}'`
echo $result
For a text file "file" containing the data
a>b>c>d>e>f
both of the above will display 'd'.
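For instance, with a one-line sample file:

```shell
# "file" holds one line of ">"-separated fields
printf 'a>b>c>d>e>f\n' > file

# The 4th ">"-separated field
awk -F '>' '{print $4}' file > out4
```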

Replace a string in each occurence of string in a file, add additional line at first line in that file

I searched and found how to replace each occurrence of a string in files. Besides that, I want to add one line to a file, only at the first occurrence of the string.
I know this
grep -rl 'windows' ./ | xargs sed -i 's/windows/linux/g'
will replace each occurrence of the string. So how do I add a line to the file at the first match of the string? Does anyone have an idea how to do that? I appreciate your time.
Edit:
Example: replace xxx with TTT in each file, and add a line at the start of the file for the first match.
Input : file1, file2.
file1
abc xxx pp
xxxy rrr
aaaaaaaaaaaddddd
file2
aaaaaaaaaaaddddd
Output
file1
#ADD LINE HERE FOR FIRST MATCH DONT ADD FOR REST OF MATCHES
abc TTT pp
TTTy rrr
aaaaaaaaaaaddddd
file2
aaaaaaaaaaaddddd
Cribbing from the answers to this question.
Something like this would seem to work:
sed -e '0,/windows/{s/windows/linux/; p; T e; a \new line
;:e;d}; s/windows/linux/g'
From start of the file to the first match of /windows/ do:
replace windows with linux
print the line
if s/windows/linux/ did not replace anything jump to label e
add the line new line
create label e
delete the current pattern space, read the next line and start processing again
Alternatively:
awk '{s=$0; gsub(/windows/, "linux")} 7; (s ~ /windows/) && !w {w=1; print "new line"}' file
save the line in s
replace windows with linux
print the line (7 is true and any true pattern runs the default action of {print})
if the original line contained windows and w is false (variables are empty strings by default and empty strings are false-y in awk)
set w to 1 (truth-y value)
add the new line
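A quick demo of that awk one-liner on an invented sample (note this variant emits the marker line after the first matching line, not at the top of the file):

```shell
# Sample input with two occurrences of the target word
printf 'abc windows pp\nwindowsy rrr\nplain tail\n' > file
awk '{s=$0; gsub(/windows/, "linux")} 7; (s ~ /windows/) && !w {w=1; print "new line"}' file > out
```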
If I understand you correctly, all you need is:
find . -type f -print |
while IFS= read -r file; do
awk 'gsub(/windows/,"unix"){if (!f) $0 = $0 ORS "an added line"; f=1} 1' "$file" > tmp &&
mv tmp "$file"
done
Note that the above, like sed and grep, is working with REs, not strings. To use strings would require index() and substr() in awk, is not possible with sed, and with grep requires an extra flag (-F).
To add a leading line to the file if a change is made, using GNU awk for multi-char RS (and we may as well do sed-like in-place editing since we're using gawk):
find . -type f -print |
while IFS= read -r file; do
gawk -i inplace -v RS='^$' -v ORS= 'gsub(/windows/,"unix"){print "an added line"} 1' "$file"
done