How to get all text from HTML code using awk or grep or sed?

Here is an example of the HTML I have in a file:
+ "name": "Product 1",
- <title>Cool</title>
+ Nice music
- Great music
- <a class="button" href="/url" data-name="buy">Buy <span class="product">Product 1</span></a>
+ <a class="button" href="/index" data-name="buy">Buy <span class="product">Product 1</span></a>
- Product (1st generation)
+ Product <br class="break" />(1st generation)
- Cool
+ Not cool
- <li> Nice > cool > good. ok.</li>
+ <li> Nice > cool > good. Tap.</li>
+<meta property="track" content="page_1" />
+<meta property="track" content="page_2" />
+ "name": "Product 2",
Expected output:
+ "name": "Product 1",
- Cool
+ Nice music
- Great music
- Buy Product 1
+ Buy Product 1
- Product (1st generation)
+ Product (1st generation)
- Cool
+ Not cool
- Nice > cool > good. ok.
+ Nice > cool > good. Tap.
+ "name": "Product 2",
Here is my code:
awk 'BEGIN{RS="<";FS=">"}NF>1{printf "%s", $NF ""}'
I get the following result, which is not what I expected:
Cool
+ Nice music
- Great music
- Buy Product 1
+ Buy Product 1
- Product (1st generation)
+ Product (1st generation)
- Cool
+ Not cool
- good. ok.
+ good. Tap.
+
+
+ "name": "Product 2",
How can I get the expected result using grep, awk, or sed? It should be able to extract all the text from an HTML page.
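The sketch below roughly emulates the awk program above in Python. It shows where the first line goes and why the text before the inner ">" characters is lost. Python is used only for illustration, and the sketch assumes that awk's single-character RS behaves like a plain split on "<".

```python
def emulate_question_awk(text: str) -> str:
    """Rough emulation of: awk 'BEGIN{RS="<";FS=">"}NF>1{printf "%s", $NF ""}'"""
    out = []
    for record in text.split("<"):  # RS="<": every "<" starts a new record
        fields = record.split(">")  # FS=">": split the record on every ">"
        if len(fields) > 1:         # NF>1: records without any ">" are skipped
            out.append(fields[-1])  # $NF: only the text after the LAST ">" is printed
    return "".join(out)


if __name__ == "__main__":
    # The text before the first "<" contains no ">", so it is never printed.
    print(repr(emulate_question_awk('+ "name": "Product 1",\n- <title>Cool</title>\n+ Nice music\n')))
    # "Nice > cool >" is discarded because only the last field survives.
    print(repr(emulate_question_awk("- <li> Nice > cool > good. ok.</li>\n")))
```

The first call keeps only "Cool" and the following music line. The second keeps only " good. ok.". These are the same two defects visible in the output above.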

This might work for you (GNU sed):
sed 's/<[^<>]*>//g;/^[+-]\s*$/d' file
Remove all tags.
Delete lines that are left with only a + or - marker and optional whitespace.
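Here is a rough Python translation of the same two sed steps, for illustration only. It assumes Python's \s is close enough to GNU sed's \s for this data.

```python
import re

TAG = re.compile(r"<[^<>]*>")           # s/<[^<>]*>//g : "<", no "<" or ">", then ">"
MARKER_ONLY = re.compile(r"^[+-]\s*$")  # /^[+-]\s*$/d  : only a +/- marker and whitespace left


def strip_tags(text: str) -> str:
    kept = []
    for line in text.splitlines():
        line = TAG.sub("", line)         # 1. remove all tags on the line
        if not MARKER_ONLY.match(line):  # 2. drop lines that step 1 emptied
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = (
        '+ "name": "Product 1",\n'
        "- <title>Cool</title>\n"
        '+ Product <br class="break" />(1st generation)\n'
        "+ <li>Nice > cool > good. Tap.</li>\n"
        '+<meta property="track" content="page_1" />\n'
    )
    print(strip_tags(sample))
```

The bare ">" characters in "Nice > cool > good" survive because they are not preceded by a matching "<". The meta line shrinks to "+" and is then deleted.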

I would harness GNU AWK for this task in the following way. Let the content of file.txt be
+ "name": "Product 1",
- <title>Cool</title>
+ Nice music
- Great music
- <a class="button" href="/url" data-name="buy">Buy <span class="product">Product 1</span></a>
+ <a class="button" href="/index" data-name="buy">Buy <span class="product">Product 1</span></a>
- Product (1st generation)
+ Product <br class="break" />(1st generation)
- Cool
+ Not cool
- <li> Nice > cool > good. ok.</li>
+ <li> Nice > cool > good. Tap.</li>
+<meta property="track" content="page_1" />
+<meta property="track" content="page_2" />
+ "name": "Product 2",
then
awk 'BEGIN{FS="<[^>]*>";OFS=""}{$1=$1}/.../' file.txt
gives the output
+ "name": "Product 1",
- Cool
+ Nice music
- Great music
- Buy Product 1
+ Buy Product 1
- Product (1st generation)
+ Product (1st generation)
- Cool
+ Not cool
- Nice > cool > good. ok.
+ Nice > cool > good. Tap.
+ "name": "Product 2",
Explanation: I tell GNU AWK that the field separator (FS) is < followed by zero or more (*) characters that are not (^) >, followed by >, and that the output field separator (OFS) is the empty string. For each line I do $1=$1 to force the record to be rebuilt using OFS. Then I select only the lines matching the regular expression /.../, that is, lines with at least 3 characters. Disclaimer: HTML is not a Chomsky Type 3 (regular) language and therefore cannot be parsed robustly with regular expressions. The proposed code uses a heuristic that will hopefully work well enough for the data you want to process.
(tested in gawk 4.2.1)
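To make the $1=$1 step concrete, here is a rough Python mirror of the gawk program. It splits each line on the same FS regex, rejoins the fields with an empty OFS, and keeps the line only if /.../ matches. This is an illustrative translation, not gawk itself.

```python
import re
from typing import List

FS = re.compile(r"<[^>]*>")      # FS="<[^>]*>": "<", any non-">" characters, then ">"
AT_LEAST_3 = re.compile(r"...")  # the /.../ pattern: any three characters


def gawk_strip(text: str) -> List[str]:
    out = []
    for line in text.splitlines():
        fields = FS.split(line)          # the fields gawk would see with this FS
        rebuilt = "".join(fields)        # $1=$1 with OFS="" rebuilds the record from its fields
        if AT_LEAST_3.search(rebuilt):   # /.../ keeps lines with at least 3 characters
            out.append(rebuilt)
    return out


if __name__ == "__main__":
    for line in gawk_strip('- <title>Cool</title>\n+<meta property="track" content="page_1" />\n'):
        print(line)
```

The two answers filter lines differently. The sed answer drops only lines reduced to a bare + or - (plus whitespace), whereas /.../ drops any line shorter than three characters. So a genuine two-character line such as "+a" would also disappear here.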

Related

How to insert conditional text in a p tag on a cshtml page

I have this portion of a cshtml page that is part of my Razor application:
<p style="text-align: justify">
thank you for confirming your reservation for the <b>@{ @Model.HotelChoiceDescription.Substring(@Model.HotelChoiceDescription.IndexOf("_") + 1) }</b>
</p>
if (!string.IsNullOrEmpty(Model.TypeOfRoomDescription))
{
<!-- Verify if english -->
if (Model.TypeOfRoomDescription.Contains("#STARTENG#") && Model.TypeOfRoomDescription.Contains("#ENDENG#"))
{
<p style="text-align: justify">
@(new HtmlString(Model.TypeOfRoomDescription.Substring(Model.TypeOfRoomDescription.IndexOf("#STARTENG#") + 10, Model.TypeOfRoomDescription.IndexOf("#ENDENG#") - Model.TypeOfRoomDescription.IndexOf("#STARTENG#") - 10))) for the period @Model.ReservationDate
</p>
}
<!-- default -->
if (!Model.TypeOfRoomDescription.Contains("#STARTENG#") && !Model.TypeOfRoomDescription.Contains("#STARTITA#"))
{
<p style="text-align: justify">
@(new HtmlString(Model.TypeOfRoomDescription)) for the period @Model.ReservationDate
</p>
}
}
The result is:
thank you for confirming your reservation for the Giant Hotel
Single room (eur. 44,00/night) for the period August 20, 2022 - August 27, 2022.
But to reduce whitespace, and consequently the number of pages to print, the best result would be this:
thank you for confirming your reservation for the Giant Hotel Single room (eur. 44,00/night) for the period August 20, 2022 - August 27, 2022.
All the text in the same paragraph.
I tried putting everything inside the first <p> tag, but then the if and the "{" are printed as well.
I also tried writing @if, but then the page does not load because it throws an error.
You can try using only one <p></p>:
<p style="text-align: justify">
thank you for confirming your reservation for the <b>@{ @Model.HotelChoiceDescription.Substring(@Model.HotelChoiceDescription.IndexOf("_") + 1) }</b>
@if (!string.IsNullOrEmpty(Model.TypeOfRoomDescription))
{
<!-- Verify if english -->
if (Model.TypeOfRoomDescription.Contains("#STARTENG#") && Model.TypeOfRoomDescription.Contains("#ENDENG#"))
{
@(new HtmlString(Model.TypeOfRoomDescription.Substring(Model.TypeOfRoomDescription.IndexOf("#STARTENG#") + 10, Model.TypeOfRoomDescription.IndexOf("#ENDENG#") - Model.TypeOfRoomDescription.IndexOf("#STARTENG#") - 10)+" for the period "+Model.ReservationDate))
}
<!-- default -->
if (!Model.TypeOfRoomDescription.Contains("#STARTENG#") && !Model.TypeOfRoomDescription.Contains("#STARTITA#"))
{
@(new HtmlString(Model.TypeOfRoomDescription+" for the period "+Model.ReservationDate))
}
}
</p>
Then the result will be on one line:
thank you for confirming your reservation for the Giant Hotel Single room (eur. 44,00/night) for the period August 20, 2022 - August 27, 2022.
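The Substring/IndexOf arithmetic is easier to check outside Razor, so here is a rough Python mirror of the same two branches. The constant 10 in the Razor code is simply the length of "#STARTENG#". The "3_" prefix in the sample hotel value is a hypothetical stand-in for whatever precedes the "_" that IndexOf("_") + 1 skips. The function names are made up for illustration.

```python
from typing import Optional

START_ENG, END_ENG, START_ITA = "#STARTENG#", "#ENDENG#", "#STARTITA#"


def room_text(desc: str) -> Optional[str]:
    """Mirror the two Razor branches; None means neither branch prints anything."""
    if START_ENG in desc and END_ENG in desc:
        # Substring(IndexOf(START) + 10, IndexOf(END) - IndexOf(START) - 10)
        # is the slice from the end of START_ENG up to the start of END_ENG.
        begin = desc.find(START_ENG) + len(START_ENG)  # len("#STARTENG#") == 10
        return desc[begin:desc.find(END_ENG)]
    if START_ENG not in desc and START_ITA not in desc:
        return desc  # "default" branch: the whole description
    return None


def confirmation_paragraph(hotel_choice: str, room_desc: str, reservation_date: str) -> str:
    """Build the single-line sentence produced by the answer's one <p>."""
    # Substring(IndexOf("_") + 1): C# IndexOf returns -1 when "_" is missing, like str.find.
    hotel = hotel_choice[hotel_choice.find("_") + 1:]
    parts = [f"thank you for confirming your reservation for the {hotel}"]
    if room_desc:  # !string.IsNullOrEmpty(...)
        text = room_text(room_desc)
        if text is not None:
            parts.append(f"{text} for the period {reservation_date}")
    return " ".join(parts)


if __name__ == "__main__":
    print(confirmation_paragraph(
        "3_Giant Hotel",  # hypothetical stored value
        "#STARTENG#Single room (eur. 44,00/night)#ENDENG#",  # hypothetical stored value
        "August 20, 2022 - August 27, 2022.",
    ))
```

With these hypothetical inputs, the sketch prints the same one-line sentence shown above.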

Long-lived Selenium script?

Let's say I have a web page whose UI tests I need to automate with a Selenium script. The page has a list of sections, in the order "Section A", "Section B", "Section C", etc. Here is my code to automate the steps in the "Section B" area:
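from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
by = By.CSS_SELECTOR
dropdown_in_SectionB = "#app > div.app > div > div > div.content > div:nth-child(2) > div > div:nth-child(7) > ..."
WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((by, dropdown_in_SectionB))).click()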
Unfortunately, I cannot find an XPath for this "dropdown_in_SectionB" element, so I have to use a CSS selector here. It works for now.
But later the page was updated, and it now contains "Section A", "Section X", "Section B", "Section C", in that order. My script above is broken, because the CSS selector for dropdown_in_SectionB becomes:
dropdown_in_SectionB = "#app > div.app > div > div > div.content > div:nth-child(2) > div > div:nth-child(8) > ..."
So I have to update the script to get it working again, which is annoying.
My question is: how can I make the script smart enough to locate the element with a CSS selector, no matter how the page changes?
If it were an XPath, I could easily solve this, since I have other clues for tracing the dropdown within its section. But how can this be done with a CSS selector?
Thanks,
Jack
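CSS selectors cannot match an element by its text, but XPath can. One way to survive reordered sections is therefore to anchor the locator on the section's heading text rather than on nth-child positions. The sketch below assumes a hypothetical structure: a heading whose text is exactly "Section B", inside the section's div, with a <select> dropdown in that same div. The helpers xpath_literal and section_dropdown_xpath are made up for illustration and would need adapting to the real markup.

```python
def xpath_literal(value: str) -> str:
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Contains both quote kinds: split on ' and stitch the pieces back with concat().
    pieces = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in pieces) + ")"


def section_dropdown_xpath(section_title: str, dropdown_tag: str = "select") -> str:
    """Locate a dropdown by its section heading instead of by position (hypothetical layout)."""
    return (
        "//div[contains(@class, 'content')]"
        f"//*[normalize-space(text()) = {xpath_literal(section_title)}]"
        f"/ancestor::div[1]//{dropdown_tag}"  # nearest enclosing div = the section container
    )


if __name__ == "__main__":
    print(section_dropdown_xpath("Section B"))
    # Usage with Selenium (needs a running WebDriver, so not executed here):
    # from selenium.webdriver.common.by import By
    # from selenium.webdriver.support.ui import WebDriverWait
    # from selenium.webdriver.support import expected_conditions as EC
    # WebDriverWait(driver, 20).until(
    #     EC.element_to_be_clickable((By.XPATH, section_dropdown_xpath("Section B")))
    # ).click()
```

Because the locator no longer mentions nth-child(7) or nth-child(8), inserting "Section X" before "Section B" does not change it. It breaks only if the heading text itself changes.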

Robot Framework for loop - selector always returning same element

I have an unordered list from which I need to get the title of each element.
<li class="chart-flyout-option" ng-repeat="option in filteredOptions = ($ctrl.options | filter: { value: $ctrl.filterText })" ng-class="{'chart-flyout-option-selected': option.key == $ctrl.selectedOption.key}"><a class="chart-flyout-link" ng-bind-html="option.value | highlight:$ctrl.filterText" title="HERE IS TITLE 1" ng-click="$ctrl.select(option.key, $event)" data-key="126">HERE IS TITLE 1</a></li>
The next element is similar, but with HERE IS TITLE 2.
I have a for loop, which looks like this:
Get Flyout Entry Child Links
${numflyoutentries}= Get Number Chart Flyout Entries
${numflyoutentries}= Evaluate ${numflyoutentries} + ${1}
: FOR ${entry} IN RANGE 1 ${numflyoutentries}
\ ${label}= Get Text class:chart-flyout-link
\ Log ${label}
However this only ever returns the text "HERE IS TITLE 1".
Why doesn't my selector increment?
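The locator class:chart-flyout-link never uses ${entry}, so each iteration resolves to the first matching element. The usual fix in SeleniumLibrary is either an indexed locator such as xpath:(//a[@class='chart-flyout-link'])[${entry}] or collecting all matches with Get WebElements. Here is a plain-Python illustration of the "every match, not the first" idea, using the stdlib html.parser on simplified markup shaped like the list above (Angular attributes dropped). It is not Robot Framework code.

```python
from html.parser import HTMLParser
from typing import List


class FlyoutTitleCollector(HTMLParser):
    """Collect the title attribute of EVERY <a class="chart-flyout-link">, not just the first."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "chart-flyout-link" in classes:
            self.titles.append(attributes.get("title") or "")


def flyout_titles(html: str) -> List[str]:
    parser = FlyoutTitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Simplified markup shaped like the list in the question.
SAMPLE = (
    '<ul>'
    '<li class="chart-flyout-option"><a class="chart-flyout-link" title="HERE IS TITLE 1">HERE IS TITLE 1</a></li>'
    '<li class="chart-flyout-option"><a class="chart-flyout-link" title="HERE IS TITLE 2">HERE IS TITLE 2</a></li>'
    '</ul>'
)

if __name__ == "__main__":
    for index, title in enumerate(flyout_titles(SAMPLE), start=1):
        print(index, title)
```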

HAML how to craft this line?

I have this code:
%p.date= "Submitted #{time_ago_in_words(@post.created_at)} ago | " |
= link_to "Edit Post", edit_post_path(@post)
I get an unexpected result:
<p class="date">Submitted 10 minutes ago | </p>
Edit Post
I want the link inside the p tag:
<p class="date">Submitted 10 minutes ago | Edit Post</p>
I have also tried this:
%p.date= "Submitted #{time_ago_in_words(@post.created_at)} ago | "
  = link_to "Edit Post", edit_post_path(@post)
and this:
%p.date= "Submitted #{time_ago_in_words(@post.created_at)} ago "
  = "| #{link_to "Edit Post", edit_post_path(@post)}"
In both cases I get the same error:
Illegal nesting: content can't be both given on the same line as %p and nested within it.
It seems that Haml doesn't recognize that the trailing | is inside a Ruby string?
How do I fix this?
Move them to the following lines, indented, so they are nested inside the p:
%p.date
  = "Submitted #{time_ago_in_words(@post.created_at)} ago | "
  = link_to "Edit Post", edit_post_path(@post)

PrestaShop - Show that prices are tax excluded for visitors

I am on PrestaShop 1.6.
When a user is a guest or a member, the label "tax excl." is displayed next to the price on the product pages.
Like this:
I would like this text to appear for visitors as well.
For visitors, it currently looks like this:
Go to Customers > Groups and, for every group, set the price display method to tax excluded (HT).
Regards,
In the Back Office, go to Customers > Groups, choose the Visitor group, click the Edit button, and change the tax display to tax incl. If it is already set that way, go to the Countries admin page, choose your country, click the Edit button, and enable the tax label display.
I think I found the solution. In product.tpl, replace:
<span id="our_price_display" class="price" itemprop="price" content="{$productPrice}">{convertPrice price=$productPrice|floatval}</span>
{if $tax_enabled && ((isset($display_tax_label) && $display_tax_label == 1) || !isset($display_tax_label))}
{if $priceDisplay == 1} {l s='tax excl.'}{else} {l s='tax incl.'}{/if}
{/if}
with:
<span id="our_price_display" class="price" itemprop="price" content="{$productPrice}">{convertPrice price=$productPrice|floatval}</span>
{if $priceDisplay == 1} {l s='tax excl.'}{else} {l s='tax incl.'}{/if}
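To see why the replacement makes the label appear, here is a rough Python transliteration of the two Smarty conditions. It treats Smarty's isset() as "not None". The source does not say which of $tax_enabled or $display_tax_label is actually false for visitors on a given shop, so the inputs below are illustrative.

```python
from typing import Optional


def label_before(tax_enabled: bool, display_tax_label: Optional[int], price_display: int) -> str:
    """Mirror of the original product.tpl condition; "" means no label is printed."""
    # {if $tax_enabled && ((isset($display_tax_label) && $display_tax_label == 1) || !isset($display_tax_label))}
    if tax_enabled and (display_tax_label is None or display_tax_label == 1):
        return "tax excl." if price_display == 1 else "tax incl."
    return ""


def label_after(price_display: int) -> str:
    """Mirror of the replacement: only $priceDisplay is checked."""
    return "tax excl." if price_display == 1 else "tax incl."


if __name__ == "__main__":
    print(repr(label_before(tax_enabled=True, display_tax_label=0, price_display=1)))
    print(repr(label_after(price_display=1)))
```

The trade-off is that the patched template prints "tax excl." or "tax incl." even when taxes are disabled or the country is configured not to show tax labels.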