Using sed, Insert a line above or below the pattern? [duplicate] - scripting

This question already has answers here:
How to insert a line using sed before a pattern and after a line number?
(5 answers)
Closed 9 years ago.
I need to edit a good number of files by inserting a line (or multiple lines) either directly below a unique pattern or directly above it. Please advise on how to do that using sed, awk, perl (or anything else) in a shell. Thanks! Example:
some text
lorem ipsum dolor sit amet
more text
I want to insert consectetur adipiscing elit after lorem ipsum dolor sit amet, so the output file will look like:
some text
lorem ipsum dolor sit amet
consectetur adipiscing elit
more text

To append lines after the pattern (-i edits the file in place; line1 and line2 are the lines you want to append or prepend):
sed -i '/pattern/a \
line1 \
line2' inputfile
Output:
# cat inputfile
pattern
line1
line2
To prepend the lines before:
sed -i '/pattern/i \
line1 \
line2' inputfile
Output:
# cat inputfile
line1
line2
pattern

The following adds one line after SearchPattern.
sed -i '/SearchPattern/aNew Text' SomeFile.txt
It inserts New Text one line below each line that contains SearchPattern.
To add two lines, end the first line of New Text with a \ and continue on the next line.
POSIX sed requires a \ and a newline after the a sed function. [1]
Specifying the text to append without the newline is a GNU sed extension (as documented in the sed info page), so its usage is not as portable.
[1] https://unix.stackexchange.com/questions/52131/sed-on-osx-insert-at-a-certain-line/
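A quick way to see the difference between the two forms (the file name and contents here are just for illustration):

```shell
# Create a throwaway sample file
printf 'before\nSearchPattern\nafter\n' > sample.txt

# POSIX-portable form: the "a" function is followed by a backslash,
# a newline, and then the text to append
sed '/SearchPattern/a\
New Text' sample.txt

# GNU sed also accepts the one-line extension:
#   sed '/SearchPattern/aNew Text' sample.txt
```

Both print the file with New Text inserted after the matching line.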

Insert a new verse after the given verse in your stanza:
sed -i '/^lorem ipsum dolor sit amet$/ s:$:\nconsectetur adipiscing elit:' FILE

It can be more portable to use ed, since some sed implementations don't support \n in the replacement text:
printf '/^lorem ipsum dolor sit amet/a\nconsectetur adipiscing elit\n.\nw\nq\n' |
/bin/ed "$filename"

Related

Using "grep" command to find Crypto Seed Phrase

Let’s say I have my 24-word crypto backup phrase somewhere on my PC and I don’t know where; the word list it draws from is a total of 2048 words or so.
How can I use grep to print all/any files containing at least 2 words from a given list? I found
grep 'extra|value'
but that’s for 2 words, and they both must be found. How would I use grep (or whatever command) to find any file containing at least 2 words from a given list of 2048 words? Thanks!
grep 'extra|value'
You cannot use a single grep run to find two different words that are potentially on different lines. But you can first list all the files containing one word and then search only those for the second one:
find / -type f -exec grep -l 'extra' {} + | xargs grep -l 'value'
(With GNU grep, use grep -lZ 'extra' and xargs -0 grep -l 'value' if file names may contain whitespace.)
2 words and they both must be found
I would harness GNU grep for this task following way
grep --perl-regexp --recursive --null-data '(extra[\s\S]*value)|(value[\s\S]*extra)' .
Explanation: I start the search from the current directory (.) and traverse all subdirectories (--recursive), looking for files which contain extra followed later by value, or value followed later by extra. --null-data makes grep treat each file as a single record, and --perl-regexp allows the class [\s\S], which matches any character including newlines, so the two words may be on different lines. Consult the grep man page if you need further explanation of the options used.
Use find + awk
find / -type f -exec awk 'FNR==1{a=b=0} /extra/{a=1} /value/{b=1} a&&b{print FILENAME; nextfile}' {} +
That requires an awk that has nextfile which most do these days. If yours doesn't then pipe the output to sort -u or uniq to ensure unique file names.
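If your awk lacks nextfile, the filename is printed once for every line after both words have been seen, so deduplicate the output; a sketch (the directory and file names are illustrative):

```shell
# Same search as above, but without nextfile; sort -u removes the
# duplicate filenames that result from printing on every later line
find . -type f -exec awk '
  FNR==1{a=b=0}
  /extra/{a=1}
  /value/{b=1}
  a&&b{print FILENAME}' {} + | sort -u
```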
From man grep (GNU grep and BSD grep)
-E, --extended-regexp
Interpret PATTERNS as extended regular expressions (EREs,
see below).
...
grep understands three different versions of regular expression
syntax: “basic” (BRE), “extended” (ERE) ...
That includes the use of logical "or" | in the search pattern.
-n: print line numbers (somewhat guarantees : as record sep)
-o: only print matches (more than one hit on same line)
-H: print matching files names
The awk prints matched files with more than 1 hit.
% str="labore|dolor"
% grep -EnoH "${str}" {file,file2} |
awk -F ':' 'NF>1{x = $1} {mat[x,$NF]++}
END{for(i in mat){split(i, arr, SUBSEP); a[arr[1]]++};
for(i in a){if(a[i] > 1){print i}}}'
file
Include -w to match whole words only.
Data
% cat file
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
labore labore labore culpa qui officia deserunt mollit anim id est laborum.
% cat file2
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor

Why doesn't pandoc convert a plaintext file to PDF properly?

Commands tried:
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=pdflatex 1.txt -o 1.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=lualatex 1.txt -o 2.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=xelatex 1.txt -o 3.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=latexmk 1.txt -o 4.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=tectonic 1.txt -o 5.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=wkhtmltopdf 1.txt -o 6.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=weasyprint 1.txt -o 7.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=prince 1.txt -o 8.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=context 1.txt -o 9.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=pdfroff 1.txt -o 10.pdf
Contents of 1.txt:
--------------------------------------------------------------------------------
Left Right
--------------------------------------------------------------------------------
Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum 1
whatever. Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. 2
Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum 3
whatever. Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. 4
Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. 5
--------------------------------------------------------------------------------
Results:
Out of all those allegedly supported "engines", only the first and third produce any PDF at all (the others just dump a bunch of nonsensical errors). And those two that do produce PDFs, produce horribly butchered ones:
"pdflatex" (the first command) entirely ignores the specified font, so it's completely useless.
"xelatex" (the third command) seems to mostly use the right font, but it deletes all the spaces between "Left" and "Right", morphs the "-"s into straight lines (that's not how that font looks...), and messes up the lines completely, so that the numbers in the last column are no longer right-aligned and the entire contents are crammed into the middle of the page instead of near the top-left corner, as expected:
screenshot of the xelatex-produced PDF
I have spent enormous amounts of time hunting for options and trying a million variations of the above commands, but it seems like this tool is fundamentally broken. I have no idea how others (apparently) use these tools; for me they just don't work. It's impossible to convert a text file to PDF...
Pandoc is not broken; it is doing just what its documentation says it will do. Pandoc treats your input file as Markdown with pandoc extensions (since you didn't specify a format). What you have here is a one-column simple table (since there is no break in the line of ----s to indicate a column break).
If what you want is a rendering of this content as verbatim text in a PDF, you could use e.g. enscript 1.txt --output=- | ps2pdf - > 1.pdf. If you want to do it using pandoc, the easiest way is to put the content inside backtick fences so that it is treated as a Markdown verbatim block. One way to do this would be to modify your file, but you could also do it by creating a file ticks.txt containing just
```
and then run
pandoc ticks.txt 1.txt ticks.txt -o 1.pdf

Get the line number of the last line with non-blank characters

I have a file which has the following content:
10 tiny toes
tree
this is that tree
5 funny 0
There are spaces at the end of the file. I want to get the line number of the last line of the file that contains non-blank characters. How do I do that in sed?
This is easily done with awk,
awk 'NF{c=FNR}END{print c}' file
With sed it is more tricky. You can use the = function, but this prints the line number to standard output, not into the pattern space, so you cannot manipulate it. If you want to use sed, you'll have to pipe its output to another command such as tail:
sed -n '/^[[:blank:]]*$/!=' file | tail -1
You can use following pseudo-code:
Replace all spaces by empty string
Remove all <beginning_of_line><end_of_line> (the lines, only containing spaces, will be removed like this)
Count the number of remaining lines in your file
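Assuming the blank lines occur only at the end of the file, the pseudo-code above can be sketched as:

```shell
# 1. strip all whitespace, 2. delete the now-empty lines,
# 3. count what remains -- the count equals the line number
#    of the last non-blank line
sed -e 's/[[:space:]]//g' -e '/^$/d' file | wc -l
```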
It's tough to count line numbers in sed. The = function prints the line number, but only to standard output, so you cannot manipulate it within sed. You could use an external tool to generate line numbers and do something like:
nl -ba -n ln -s ' ' input | sed -n 's/^\([0-9]\{1,\}\)[[:space:]]*[^[:space:]].*/\1/p' | sed -n '$p'
but if you're going to do that you might as well just use awk.
This might work for you (GNU sed):
sed -n '/\S/=' file | sed -n '$p'
For all lines that contain a non white space character, print a line number. Pipe this output to second invocation of sed and print only the last line.
Alternative:
grep -n '\S' file | sed -n '$s/:.*//p'

How to count words of plain text in PostgreSQL?

I have a few columns with HTML strings in a Postgres 9.5 database. I want to count the words of the plain text, without the HTML tags, to get the length of the plain text for each row.
Is there a stored procedure or another way to do this?
Edit:
existing sample text in one field:
<p>Lorem Ipsum: </p><p><br/></p><p align="center"><img src="d9b4c473-08ac-4cd8-883d-86ac30ee9044.png" width="287" height="192"/></p><p><br/></p><p>Lorem ipsum dolor sit amet, <span style="font-weight:bold;color:#86b920">consetetur</span> sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut l. </p><p><br/></p><p><br/></p><p><br/></p>
expected output for this text:
Lorem Ipsum: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut l.
At best an additional column with a word count of this text
You can do it by using regexp_split_to_table: with the right regular expression you can break out all the words from the HTML and return them as a table.
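A sketch of that idea, assuming a hypothetical table pages with an id column and an HTML column body (all names are illustrative): strip the tags with regexp_replace, then split the remainder on whitespace and count the non-empty pieces:

```sql
SELECT id,
       regexp_replace(body, '<[^>]*>', ' ', 'g') AS plain_text,
       (SELECT count(*)
          FROM regexp_split_to_table(
                 regexp_replace(body, '<[^>]*>', ' ', 'g'),
                 '\s+') AS w
         WHERE w <> '') AS word_count
  FROM pages;
```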

Use Output from one file to delete or outcomment lines in another file using sed / awk

So I have a gigantic file (file1) where I need to delete or comment out specific lines; this file could look something like this:
Lorem ipsum **abc** dolor sit amet,
consectetur adipiscing elit.
Cras finibus **123** laoreet dignissim.
Curabitur dignissim auctor tortor a cursus.
Nullam sapien ante, tempor eu rutrum
...
For this I have file2, which contains the strings I need in order to locate lines in file1.
file2 could look like this:
abc
123
xyz
098
...
Now, when a string from file2 is found, the line in file1 where it is found, plus the line directly beneath it, should be commented out or deleted,
so that if 123 is found in the above example, it should delete these two lines (marked with -->):
Lorem ipsum abc dolor sit amet,
consectetur adipiscing elit.
--> Cras finibus 123 laoreet dignissim.
--> Curabitur dignissim auctor tortor a cursus.
Nullam sapien ante, tempor eu rutrum
...
I hope this makes sense.
I was fiddling around with sed and awk, but never got it to work.
Something like this would work:
awk 'NR==FNR{a[$0]; next}p{p=0;next}{for(i in a)if(p = $0 ~ i)next}1' file2 file1
Populate the array a with the lines in file2. The first block only applies to file2 because the total record number NR is equal to the record number of the current file FNR. next skips the rest of the blocks.
For each line of file1, loop through the keys in array a. If the current line matches the key, skip the line in the output. Also assign p the true value. For lines where p is true, set it back to false but skip the line in the output. The 1 at the end is always true, so any line that has made it this far is printed, as the default action is to print the line.
This might work for you (GNU sed):
sed 's|.*|/&/{N;d}|' file2 | sed -f - file1 >file3
Create a sed script from file2 and run it against file1 saving the results in file3.