I have an HTML file with plenty of content in it, and I want to extract specific lines from it.
Example:
I want to extract every line that contains the text class="red", e.g.:
<tr class="even"><td>***FRQ\AUTO\spml-hlr601\FC122_005036_PDPContext\DB8PD073\BulkPDPModreq***</a><td align='center' **class="red"**></tr>
Once I extract such a line, I want this string from it: FRQ\AUTO\spml-hlr601\FC122_005036_PDPContext\DB8PD073\BulkPDPModreq.
This string is a directory name, and I want to copy the contents of that directory to a specific directory (/home/user).
I want to do this for all occurrences of lines containing class="red".
I would like to do this with sed.
This will work for the sample you provided. I also assumed that the * parts of the interesting string are not really part of your input file; if they are, this will need tweaking:
$ cat foo.html
foo
<tr class="even"><td>FRQ\AUTO\spml-hlr601\FC122_005036_PDPContext\DB8PD073\BulkPDPModreq1</a><td align='center' class="red"></tr>
<tr class="even"><td>FRQ\AUTO\spml-hlr601\FC122_005036_PDPContext\DB8PD073\BulkPDPModreq2</a><td align='center' class="red"></tr>
bar
<tr class="even"><td>FRQ\AUTO\spml-hlr601\FC122_005036_PDPContext\DB8PD073\BulkPDPModreq3</a><td align='center' class="red"></tr>
<tr class="even"><td>FRQ\AUTO\spml-hlr601\FC122_005036_PDPContext\DB8PD073\BulkPDPModreq4</a><td align='center' class="red"></tr>
quux
$ grep 'class="red"' foo.html \
| sed 's#.*<td>##g;s#</a>.*##g;s#\\#/#g' \
| xargs -I% echo cp -r /home/hlrci/%/* /home/hlrci/CopyReq/
cp -r /home/hlrci/FRQ/AUTO/spml-hlr601/FC122_005036_PDPContext/DB8PD073/BulkPDPModreq1/* /home/hlrci/CopyReq/
cp -r /home/hlrci/FRQ/AUTO/spml-hlr601/FC122_005036_PDPContext/DB8PD073/BulkPDPModreq2/* /home/hlrci/CopyReq/
cp -r /home/hlrci/FRQ/AUTO/spml-hlr601/FC122_005036_PDPContext/DB8PD073/BulkPDPModreq3/* /home/hlrci/CopyReq/
cp -r /home/hlrci/FRQ/AUTO/spml-hlr601/FC122_005036_PDPContext/DB8PD073/BulkPDPModreq4/* /home/hlrci/CopyReq/
This searches for class="red" in foo.html (grep); removes everything up to and including <td> and everything from </a> onwards on each line, and converts the backslashes to forward slashes (sed); then reads each line and crafts a cp command around it to copy your files. Depending on the input file, your situation, and your preference, you might have to (or want to):
adapt the sed regex or make it more specific
use something other than cp to actually copy your stuff (tar, cpio, rsync, ...)
Dry-run with the echo in place, and if you are happy with the output, remove the echo and rerun.
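If the extracted directory names could ever contain spaces, a while-read loop avoids the word splitting that xargs does; a minimal untested sketch reusing the example paths from above (keep the echo for a dry-run):
grep 'class="red"' foo.html \
| sed 's#.*<td>##;s#</a>.*##;s#\\#/#g' \
| while IFS= read -r dir; do
    echo cp -r "/home/hlrci/$dir"/* /home/hlrci/CopyReq/
done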
I have a problem importing an XML file into a table. My XML file looks like this:
<?xml version = "1.0" encoding="utf-8"?>
<a>
<date>20221011</date>
</a>
<b>
<b1>
<field1>092010</field1>
</b1>
<b1>
<field1>093456</field1>
</b1>
....
</b>
I want to import the <field1> data into my table, not the <a> data. But of course there's an error about multiple root elements. I want to remove the <a> tag, or wrap the <a> and <b> tags in a new root element.
What should I do?
In case you are on a *nix OS, something like this sed command would work:
sed -i -e '2i<fake>' -e '$a</fake>' input.xml
This runs two sed commands on the file: it inserts <fake> as the second line, then appends </fake> at the end of the named file.
CAVEAT: There is no checking first to see if you need the tags before adding them.
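If you do need it to be idempotent, a small untested sketch: only add the wrapper when line 2 isn't already <fake>:
if ! sed -n '2p' input.xml | grep -qx '<fake>'; then
  sed -i -e '2i<fake>' -e '$a</fake>' input.xml
fi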
Take it to the next level by using xargs to run this on all xml files in the current directory:
ls *.xml | xargs -I{} sed -i -e '2i<fake>' -e '$a</fake>' {}
xargs runs the command once for every file output by ls. The {} is replaced with the current filename in this case.
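If any of the filenames could contain spaces, piping ls into xargs will mangle them; a find-based sketch that avoids parsing ls (GNU find assumed):
find . -maxdepth 1 -name '*.xml' -exec sed -i -e '2i<fake>' -e '$a</fake>' {} +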
My task searches for process.cfg in the CMD column, to find out the directory of the application config and also the PID.
---
- hosts: all
pre_tasks:
- name: Check if process is running
become: yes
shell: 'ps -e --format="pid cmd" | grep process.cfg | sed -e "s/[[:space:]]\+/ /g"| grep -v color'
register: proces_out
The output looks like this after that command:
32423 /var/local/bin/application -c /var/local/etc/process.cfg
But I think Ansible has trouble with two greps in one command. I need them both, because if I don't use the inverted "grep -v color", this annoying entry appears: "grep --color=auto". Then I can't cut out the PID I need in another task which kills the process, because the real process is on the second line.
My second idea was to use awk, which I think would be the best tool for this case, but the quotes clash: the double quotes in the --format parameter and in the sed command, and the single quotes around the awk parameters, don't want to cooperate. Even when I keep them balanced they interfere with each other.
The awk idea:
shell: 'ps -e --format="pid cmd" | grep process.cfg | sed -e "s/[[:space:]]\+/ /g"| awk 'FNR == 2''
I would like a hint on what would be the best way to avoid this quoting incompatibility, so that I can then use the output in variables afterwards:
## PID
{{ proces_out.stdout.split(' ')[0] }}
## application
{{ proces_out.stdout.split(' ')[1] }}
## config
{{ proces_out.stdout.split(' ')[3] }}
But I think Ansible has trouble with two greps in one command
That is for sure not true.
if I don't use the inverted "grep -v color", this annoying entry appears: "grep --color=auto" ... the real process is on the second line
You are running into the classic case of the grep process matching its own regex, as happens in a lot of "simple" cases. What you want is a regex that matches your string but does not match itself. In the example above it would be:
shell: 'ps -e --format="pid cmd" | grep process[.]cfg | sed -e "s/[[:space:]]\+/ /g"'
because process[.]cfg matches process.cfg but does not match process[.]cfg. I also fixed your regex: in a regex, the . means "any character", which doesn't appear to be what you really wanted.
With regard to that --color bit, you can likely side-step that nonsense by using the full path to grep, which will cause bash to really execute the binary rather than some alias that uses --color=auto. I actually wouldn't have expected the colors to show up in an Ansible run, because it's not the right $TERM, but systems are weird.
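Another way to side-step both the self-match and the --color noise is pgrep, which never matches its own process; a sketch, assuming procps-ng's pgrep with -a (print the PID followed by the full command line) is available on the target hosts:
shell: 'pgrep -af process[.]cfg'
The output would then be just "54345 /var/local/bin/application -c /var/local/etc/process.cfg", with no leading whitespace, so the split indices from the question ([0] for the PID, [1] for the application, [3] for the config) should still line up.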
Thank you Matthew for that solution, but I found a different option to avoid the unnecessary output.
The syntax is almost the same, but I added an additional parameter to --format: ppid, the parent process ID. In most cases I believe the parent process has the number 1 in the output, which helps to sort it the way I want.
It looks like this:
shell: >
ps -e --format="ppid pid cmd" |
grep process.cfg |
sed -e "s/[[:space:]]\+/ /g"
register: output_process
And output looks like this:
1 54345 /var/local/bin/application -c /var/local/etc/process.cfg
6435 6577 grep --color=auto process.cfg
Now it's easy; we can use Ansible to pick out what we need:
- name: Kill process
become: yes
shell: "kill {{ output_process.stdout_lines[0].split(' ')[2] }}"
What does it do? It selects line 0, which is the first line, splits the output on spaces, and selects the third element. There is a space before ppid in the output, so the split yields an empty first element; that's why the PID is the third one.
Thank you again for your solution Matthew, it might be helpful in another case.
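For the record, the single-quote clash that blocked the awk idea also disappears with the folded block scalar, because the command then needs no outer quotes; an untested sketch combining both approaches (awk splits on whitespace itself, so the sed step is no longer needed):
shell: >
  ps -e --format="pid cmd" |
  grep process[.]cfg |
  awk '{ print $1 }'
register: output_process
With the [.] trick the grep line never matches itself, so only the real process remains and awk can print its PID directly.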
I am interested in efficiently searching files for content using bash and related tools (e.g. sed, grep), in the specific case that I have additional information about where in the file the intended content is. For example, I want to replace a particular string on line 3 of each file that contains a specific string on line 3 of the file.
Therefore, I don't want to do a recursive grep -r on the whole directory, as that would search the entirety of each file, wasting time since I know that the string of interest is on line 3, if it is there at all. This full-grep approach could be done with grep -rl 'string_to_find_in_files' base_directory_to_search_recursively.
Instead I am thinking about using sed -i ".bak" '3s/string_to_replace/string_to_replace_with/' files to search only line 3 of all files recursively in a directory; however, sed seems to only be able to take one file as its input argument. How can I apply sed to multiple files recursively? find -exec {} \; and find -print0 | xargs -0 seem to be very slow. Is there a faster method than using find?
I can achieve the desired effect very quickly with awk, but only on a single directory; it does not seem to be recursive, e.g. awk 'FNR==3{print $0}' directory/*. Any way to make this recursive? Thanks.
You can use find to get the list of files, and feed them one by one to sed or awk with xargs.
For example, this will print the first line of each file listed by find:
$ find . -name "*.csv" | xargs -L 1 sed -n '1p'
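And for the replacement case from the question, the same pattern works; with -i, GNU sed processes each file independently, so the line-3 address applies per file. A sketch with placeholder strings (GNU sed syntax; on BSD/macOS the backup suffix is a separate argument):
find . -type f -print0 | xargs -0 sed -i.bak '3s/string_to_replace/string_to_replace_with/'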
I'm looking to create a bash/perl script in Linux that will restore .gz files based on filename:
_path_to_file.txt.gz
_path_to_another_file.conf.gz
Where the underscores form the directory structure, so the two above would be:
/path/to/file.txt
/path/to/another/file.conf
These are all in the /backup/ directory.
I want to write a script that will cat each .gz file into its correct location by changing the _ to / to find the correct path, so that the contents of _path_to_another_file.conf.gz replace the text in /path/to/another/file.conf:
zcat _path_to_another_file.conf.gz > /path/to/another/file.conf
I've started by creating a file with the correct destination filenames in it. I could create another file listing the original filenames and have the script go through them line by line?
ls /backup/ |grep .gz > /backup/backup_files && sed -i 's,_,\/,g' /backup/backup_files && cat /backup/backup_files
Whatcha think?
Here's a Bash script that should do what you want:
#!/bin/bash
for f in *.gz; do
  # turn _path_to_file.txt.gz into /path/to/file.txt.gz
  n=$(echo "$f" | tr _ /)
  # strip the trailing .gz and decompress into place
  zcat "$f" > "${n%.*}"
done
It loops over all files that end with .gz, and extracts them into the path represented by their filename with _ replaced with /.
That's not necessarily an invertible mapping (what if the original file is named high_scores for instance? is that encoded specially, e.g., with double underscore as high__scores.gz?) but if you just want to take a name and translate _ to / and remove .gz at the end, sed will do it:
for name in /backup/*.gz; do
newname=$(echo "$name" |
sed -e 's,^/backup/,,' \
-e 's,_,/,g' \
-e 's/\.gz$//')
echo "zcat $name > $newname"
done
Make sure it works right (the above is completely untested!) then take out the echo, leaving:
zcat "$name" > "$newname"
(the quotes protect against white space in the names).
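One more untested precaution: the redirection fails if the destination directory doesn't exist, so if that can happen, create it first inside the loop:
mkdir -p "$(dirname "$newname")"
zcat "$name" > "$newname"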
How do I iterate over all the lines output by a command using zsh, without setting IFS?
The reason is that I want to run a command against every file output by a command, and some of these files contain spaces.
E.g., given the deleted file:
foo/bar baz/gamma
That is, a single directory 'foo', containing a subdirectory 'bar baz', containing a file 'gamma'.
Then running:
git ls-files --deleted | xargs ls
will result in that file being handled as two files: 'foo/bar' and 'baz/gamma'.
I need it to be handled as one file: 'foo/bar baz/gamma'.
If you want to run the command once for all the lines:
ls "${(#f)$(git ls-files --deleted)}"
The f parameter expansion flag means to split the command's output on newlines. There's a more general form (#s:||:) to split at an arbitrary string like ||. The # flag means to retain empty records. Somewhat confusingly, the whole expansion needs to be inside double quotes, to avoid IFS splitting on the output of the command substitution, but it will produce separate words for each record.
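As a quick illustration of the arbitrary-string form, splitting a hypothetical ||-delimited string into three records:
print -rl -- "${(@s:||:)$(echo 'a||b c||d')}"
prints a, then b c, then d, each on its own line, with the space inside "b c" preserved.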
If you want to run the command for each line in turn, the portable idiom isn't particularly complicated:
git ls-files --deleted | while IFS= read -r line; do ls -- "$line"; done
If you want to run the command as few times as the command line length limit permits, use zargs.
autoload -U zargs
zargs -- "${(@f)$(git ls-files --deleted)}" -- ls
Using tr and the -0 option of xargs, assuming that the lines don't contain \000 (NUL), which is a fair assumption due to NUL being one of the characters that can't appear in filenames:
git ls-files --deleted | tr '\n' '\000' | xargs -0 ls
This turns the line foo/bar baz/gamma\n into foo/bar baz/gamma\000, which xargs -0 knows how to handle.
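Alternatively, git can emit NUL-delimited output itself; with a reasonably recent git, ls-files accepts -z, which lets you drop the tr step entirely:
git ls-files --deleted -z | xargs -0 ls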