Form and process a list of strings with a Lua plugin on very old Lua 2.5 - arraylist

I'm trying to write a Lua plugin to extract and place some metadata from an HTML page. It's a plugin for the soupault static site generator, which requires Lua version 2.5 to work. So no closures and no for loops in particular.
There is no need to burden you with how this generator works, because the plugin exists as a small standalone .lua file and is plugged in while the generator runs.
What matters are the methods the plugin uses at its input and output.
More importantly, the question is how to form and process the list of extracted tags in Lua.
Input data in the body of the HTML page:
<site-meta-data>
#+title: post 1 title
#+subtitle: Post 1 subtitle
#+description: Post 1 decription
#+author: Billy
#+date: 2021-11-03
#+datepublished: 2021-06-02
#+usertags: inventory,errand
#+summary: Post 1 summary
#+id: 1-test1com
</site-meta-data>
And these are the steps the plugin should take:
Get the strings between the <site-meta-data></site-meta-data> tags into a list.
Get each string from the list and split it at the first colon, e.g. string.match(destination_number, "(.-):").
Create a variable name from the first word before the colon, without the #+ (e.g. title).
Create conditions for the next operation on this variable. If the variable name created from the string equals the current name, e.g. meta_tag = title, insert a meta tag with the value, or insert a new tag with the extracted value after the parent tag. I'll decide myself which tags should be inserted.
After processing the list, remove everything between <site-meta-data></site-meta-data>, including the tags themselves.
Being nearly a beginner in Lua scripting, I have only written a rough draft of the script.
all_meta_tags = HTML.select_one(page, "site-meta-data")
all_meta_tags = HTML.parse(page, "site-meta-data")
print(all_meta_tags)
local index = 1
while all_meta_tags[index] do
  meta_tag_line = all_meta_tags[index]
  meta_tag = string.match(meta_tag_line, "(.-):")
  meta_tag_content = string.match(meta_tag_line, ":(.*)")
  meta_tag_content = strlower(String.trim(meta_tag_content))
  meta_tag = Regex.replace(meta_tag, "#+", "")
  if (meta_tag == "title") then
    HTML.append_child(page, HTML.create_string('<meta name="title" content="value..">'))
  elseif (meta_tag == 'subtitle') then
    HTML.append_child(page, HTML.create_string('...'))
  elseif (meta_tag == 'description') then
    HTML.append_child(page, HTML.create_string('<meta name="description" content="meta_tag_value">'))
  elseif (meta_tag == 'author') then
    HTML.append_child(page, HTML.create_string('author...'))
  elseif (meta_tag == 'date') then
    HTML.append_child(page, HTML.create_string('<meta name="date" content="meta_tag_content">'))
  end
  index = index + 1
end
HTML.delete(HTML.select_one(page, "site-meta-data"))
Please help me modify the script to accomplish the task above. I'll adjust which tags it should insert myself.
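The parsing steps in the question (split each line at the first colon, drop the #+ prefix, trim the value) can be sketched independently of soupault. The sketch below is in Python rather than Lua 2.5, purely to illustrate the logic; the function name parse_metadata and the shortened sample are assumptions, not part of the plugin API. Note that in a regular expression the pattern #+ means "one or more #", so it would leave the + behind; the sketch slices off the two-character prefix instead.

```python
def parse_metadata(block):
    """Map each '#+key: value' line to key -> value, splitting at the first colon."""
    metadata = {}
    for line in block.splitlines():
        line = line.strip()
        # Skip anything that is not a '#+key: value' line.
        if not line.startswith("#+") or ":" not in line:
            continue
        # Drop the two-character '#+' prefix, then split only at the first colon.
        key, value = line[2:].split(":", 1)
        metadata[key.strip()] = value.strip()
    return metadata


# Shortened sample taken from the question's input block.
sample = """#+title: post 1 title
#+date: 2021-11-03
#+usertags: inventory,errand"""

meta = parse_metadata(sample)
print(meta["title"])
```

Collecting the values into a key/value table first keeps the "which tag goes where" decisions separate from the line parsing.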

Since version 4.0, soupault supports a pre-parse hook, so it is now possible to reimplement various types of front matter with that hook. The plugin should always put the rendered HTML before the page body. A plugin can be written like this example:
[hooks.pre-parse]
file = "hooks/org-mode-metadata.lua"
template = """
<h1 id="post-title">{{title}}</h1>
...

Related

Is there any way to find out if a particular heading is present in a HTML document using dataweave 2.0 or in Mule 4

I have a document in which I have to check if the document has a heading called "Name". How can I check this using dataweave?
Example of document:
<h2 id="name">Name</h2>
<p>This Anypoint Template should serve as a foundation for setting an online sync of accounts from a Salesforce instance to many destination systems, using the Publish-subscribe pattern. Every time there is a new account or a change in an already existing one, the integration will poll for changes in the Salesforce source Org, publish the changes to a JMS topic and each subscriber will be responsible for updating the accounts in the target systems.</p>
DataWeave doesn't support HTML as a format; however, it does support XML. If the HTML input has a root tag, you can parse it as XML and use a recursive function to check whether it contains a matching key name and value.
%dw 2.0
output application/java
import every from dw::core::Arrays
import someEntry from dw::core::Objects
fun findString(x, keyName, s)= x match {
case o is Object -> o mapObject (($$): if ($$ as String == keyName and $ == s) true else findString($, keyName, s) ) someEntry (value, key) -> value == true
case s1 is String -> s1
case is Number -> false
case is Boolean -> false
case a is Array -> !((a map findString($, keyName, s)) every ($ == false)) // should not happen in XML
case is Null -> false
else -> false
}
---
findString(payload, "h2", "Name")
Input:
<html>
<body>
<h2 id="name">Name</h2>
<p>This Anypoint Template should serve as a foundation for setting an online sync of accounts from a Salesforce instance to many destination systems, using the Publish-subscribe pattern. Every time there is a new account or a change in an already existing one, the integration will poll for changes in the Salesforce source Org, publish the changes to a JMS topic and each subscriber will be responsible for updating the accounts in the target systems.</p>
</body>
</html>
Output: true
You can use xpath-extract from the xml-module in Mule 4 to evaluate any XPath against your payload. For example, the XPath below will work for you.
//*[matches(name(), '^h[1-6]$') and (text() = 'Name')]
It will match all tags whose name matches the regex ^h[1-6]$ (so h1 to h6) and whose value ( text() ) is Name. If you only want to look for h2, you can update the regex part accordingly.
<xml-module:xpath-extract
doc:name="Xpath extract"
doc:id="4b04a662-98cb-4dac-b4b5-18f8a9c6604e"
xpath="//*[matches(name(), '^h[1-6]$') and (text() = 'Name')]">
</xml-module:xpath-extract>
The module will return an array of strings containing the results of the XPath. You can check whether this output isEmpty().
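Outside Mule, the same check (any h1 to h6 element whose text is Name) can be sketched in Python with the standard library. This is only an illustration of the logic, not DataWeave or Mule code; the function name has_heading and the shortened sample document are assumptions. Unlike the XPath above, the sketch strips surrounding whitespace before comparing.

```python
import re
import xml.etree.ElementTree as ET


def has_heading(xml_text, heading_text):
    """Return True if any h1..h6 element has exactly the given text."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        is_heading = re.fullmatch(r"h[1-6]", element.tag) is not None
        # element.text is None for empty elements, so fall back to "".
        if is_heading and (element.text or "").strip() == heading_text:
            return True
    return False


# Shortened version of the input from the answer above.
doc = """<html>
<body>
<h2 id="name">Name</h2>
<p>Some description text.</p>
</body>
</html>"""

print(has_heading(doc, "Name"))
```

As with the DataWeave approach, this only works when the HTML is well-formed enough to parse as XML.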

Select specific elements from a website in VB.net (WebScraping)

I found a website where I can look up vehicle inspections in Denmark. I need to extract some information from the page and loop through a series of license plates. Let's take this car as an example: http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87640
In the table on the left, you can see some basic information about the vehicle. On the right, you can see a list of the inspections for this specific car. I need a script that can check whether the car has any inspections and then grab the link to each of the inspection reports. Let's take the first inspection from the example. I would like to extract the onclick text from each of the inspections.
The first inspection link would be:
location.href="/Sider/synsrapport.aspx?Inspection=18014439&Vin=VF7X1REVF72378327"
or if you could extract the inspection ID and Vin variable from the URL immediately:
Inspection ID: 18014439
Vin: VF7X1REVF72378327
Here is an example of a car which doesn't have any inspections yet, if you want to see what that looks like: http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87400
Current Solution plan:
Download the HTML source code as a String in VB.net
Search the string and extract the specific parts.
Store it in a StringBuilder and upload this to my SQL server
Is this the most efficient way, or do you know of any libraries used specifically to extract elements from a website in VB.net? Thanks!
You could use Java libraries HtmlUnit or Jsoup to webscrape the page.
Here's an example using HtmlUnit:
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = client.getPage("http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87640");
HtmlTable inspectionsTable = (HtmlTable) page.getElementById("tblInspections");
Map<String, String> inspections = new HashMap<String, String>();
for (HtmlTableRow row: inspectionsTable.getRows()) {
    String[] splitRow = row.getAttribute("onclick").split("=");
    if (splitRow.length >= 4) {
        String id = splitRow[2].split("&")[0];
        String vin = splitRow[3].replace("\"", "");
        inspections.put(id, vin);
        System.out.println(id + " " + vin);
    }
}
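Splitting the onclick text on "=" works for the example but depends on the exact order of the parameters. As an alternative sketch of the extraction step only (in Python, not VB.net or Java), the URL can be pulled out of the quotes and its query string parsed by name. The function name parse_onclick is an assumption for illustration; the example string is the one from the question.

```python
from urllib.parse import parse_qs, urlparse


def parse_onclick(onclick):
    """Return (inspection_id, vin) from a location.href onclick string, or None."""
    # The URL is the part between the first pair of double quotes.
    parts = onclick.split('"')
    if len(parts) < 3:
        return None
    query = parse_qs(urlparse(parts[1]).query)
    if "Inspection" not in query or "Vin" not in query:
        return None
    return query["Inspection"][0], query["Vin"][0]


example = 'location.href="/Sider/synsrapport.aspx?Inspection=18014439&Vin=VF7X1REVF72378327"'
print(parse_onclick(example))
```

Rows without a matching onclick value (for example, a car with no inspections) simply yield None and can be skipped.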

Ektron Solr search with smart form fields: facing a problem

I am working with Ektron 9.
I have created a smart form and implemented search for smart form fields using the search API.
For that I am using the Ektron.Cms.Framework.Search.SearchManager class. It works fine for single XPath values.
When my smart form has multiple fields with the same XPath, the search API returns the result of the first occurrence only.
In the example below, when I search for Book->Title using the XPath "/root/Books/Book/Title", the search always returns "Hai" in the result.
<root>
<Books>
<Book>
<Id>1
</Id>
<Title>Hai
</Title>
</Book>
<Book>
<Id>2
</Id>
<Title>Hello
</Title>
</Book>
</Books>
</root>
How can I get "Hello" in the result as well? Is there a separate API to handle this?
Or is it possible to handle this scenario another way, for example by specifying "/root/Books/Book[id=1]/Title"?
For more details on search please look:
http://documentation.ektron.com/cms400/v85/webhelp/Navigating/Search85/APISearch.htm#Major
You haven't provided the code you are using so it is difficult to see where you are going wrong.
However, here is some code that will allow you to search against a SmartForm field in Ektron using Solr (or Microsoft Search Server).
This searches against a specific SmartForm in a field called "Path" - which is accessed using the XPath "/root/Path".
Ektron.Cms.Framework.Search.SearchManager sManager = new Ektron.Cms.Framework.Search.SearchManager();
AdvancedSearchCriteria searchCriteria = new AdvancedSearchCriteria();
searchCriteria.ExpressionTree = SearchContentProperty.XmlConfigId.EqualTo(YourSmartFormID);
searchCriteria.ExpressionTree &= SearchSmartFormProperty.GetStringProperty("/root/Path").EqualTo(YourPathValue);
searchCriteria.PagingInfo = new PagingInfo(10, 1);
searchCriteria.ReturnProperties = new HashSet<PropertyExpression>
{
SearchContentProperty.Id,
SearchContentProperty.Title,
SearchContentProperty.QuickLink
};
SearchResponseData response = sManager.Search(searchCriteria);
The above example asks Search (Solr or Search Server) to return three properties: Id, Title and QuickLink.
You are likely to need to add "using" statements for Ektron.Cms.Search and Ektron.Cms.Framework.Search if you have not already.
Your best reference guide for the Ektron API is this site.
Ektron 9's Solr integration has been fairly buggy for me thus far (granted, it isn't even really out yet!), so this may actually just be a bug.
That said, does the same thing happen when you select /root/Books/Book, or does that also return only one result?
If the API is only ever returning one result, you could try making the search several times, until it comes up empty. The general pseudocode algorithm would be:
var i = 1; // XPath positions are 1-based
List<item> allItems = new List<item>();
item myItem = select("(/root/Books/Book/Title)[" + i + "]");
while (myItem != null) {
    allItems.add(myItem);
    i++;
    myItem = select("(/root/Books/Book/Title)[" + i + "]");
}
Keep in mind that this is extremely inefficient.
Solr supports multivalued attributes, so when smart form fields are indexed they are stored as true multivalued fields instead of delimiter-separated values, as was the case with Search Server 2010/FAST 2010.
In case of multivalued fields, from the SearchResponseData you would have to use the SearchResultData returned in the following manner.
For the case of Multivalued String properties
GetValue(StringMultiValuePropertyExpression) or use the indexer [StringMultiValuePropertyExpression]
For the case of Multivalued Floating point properties
GetValue(DecimalMultiValuePropertyExpression) or use the indexer [DecimalMultiValuePropertyExpression]
Reference
http://reference.ektron.com/developer/framework/Search/SearchResultData/
In case one doesn't use the MultiValuePropertyExpression, the API will return the first value of the set of values which is what you are seeing.
Hope this helps.

Looping in selenium

I recorded a script using Selenium IDE which clicks on a link, and now I want to add a loop to run the same script multiple times. For this I am converting the script to Python, but I am unable to add the loop. Please help me with this.
Here's some text straight from the Selenium docs:
Data Driven Testing:
Data Driven Testing refers to using the same test (or tests) multiple times with varying data. These data sets are often from external files i.e. .csv file, text file, or perhaps loaded from a database. Data driven testing is a commonly used test automation technique used to validate an application against many varying input. When the test is designed for varying data, the input data can expand, essentially creating additional tests, without requiring changes to the test code.
# Collection of String values
source = open("input_file.txt", "r")
values = source.readlines()
source.close()
# Execute For loop for each String in the values array
for search in values:
    sel.open("/")
    sel.type("q", search)
    sel.click("btnG")
    sel.waitForPageToLoad("30000")
    self.failUnless(sel.is_text_present("Results * for " + search))
Hope it helps. More info at: Selenium Documentation
Best Regards,
Paulo Bueno.
Try a loop similar to this example using "for x in range (0,5):" to set the number of times you wish it to iterate.
def test_py2webdriverselenium(self):
    for x in range(0,5):
        driver = self.driver
        driver.get("http://www.bing.com/")
        driver.find_element_by_id("sb_form_q").click()
        driver.find_element_by_id("sb_form_q").clear()
        driver.find_element_by_id("sb_form_q").send_keys("testing software")
        driver.find_element_by_id("sb_form_go").click()
        driver.find_element_by_link_text("Bing").click()
I have used this in situations where I have little information about the list:
items = [''' list containing all items ''']
index = 0
while True:
    try:
        # do what you want with items[index]
        items[index]
        index += 1
    except IndexError:
        # index went past the end of the list
        break
In Java you can do it as follows:
// import packages or classes
public class TestClassName {
    // driver is assumed to be a WebDriver field initialized in setUp()
    @Before
    public void setUp() {
    }
    @Test
    public void testMethod() {
        for (int i = 0; i <= 5; i++) {
            WebElement element = driver.findElement(By.id("link_ID"));
            element.click();
            waitForPageLoaded(5); // placeholder for your own wait helper
        }
    }
    @After
    public void tearDown() {
    }
}

RegEx help - finding / returning a code

I must admit it's been a few years since my RegEx class and since then, I have done little with them. So I turn to the brain power of SO. . .
I have an Excel spreadsheet (2007) with some data. I want to search one of the columns for a pattern (here's the RegEx part). When I find a match I want to copy a portion of the found match to another column in the same row.
A sample of the source data is included below. Each line represents a cell in the source.
I'm looking for a regex that matches "abms feature = XXX", where XXX is a variable-length word with no spaces in it and, I think, all alpha characters. Once I find a match, I want to toss out the "abms feature = " portion of the match and place the code (the XXX part) into another column.
I can handle the excel coding part. I just need help with the regex.
If you can provide a solution to do this entirely within Excel - no coding required, just using native excel formula and commands - I would like to hear that, too.
Thanks!
###################################
Structure
abms feature = rl
abms feature = sta
abms feature = pc, pcc, pi, poc, pot, psc, pst, pt, radp
font = 5 abms feature = equl, equr
abms feature = bl
abms feature = tl
abms feature = prl
font = 5
###################################
I am still learning about regex myself, but I have found this site useful for getting ideas or comparing against what I came up with; it might help in the future:
http://regexlib.com/
Try this regular expression:
abms feature = (\w+)
Here is an example of how to extract the value from the capture group:
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        // The verbatim string (@"...") keeps the backslash in \w literal.
        Regex regex = new Regex(@"abms feature = (\w+)",
            RegexOptions.Compiled |
            RegexOptions.CultureInvariant |
            RegexOptions.IgnoreCase);
        Match match = regex.Match("abms feature = XXX");
        if (match.Success)
        {
            // Group 1 holds the code after "abms feature = ".
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}
(?<=^abms feature = )[a-zA-Z]*
assuming you're not doing anything with the words after the commas
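To see how the capture-group pattern behaves on the sample cells from the question, here is a small sketch in Python (chosen only for illustration; it is not Excel or VBA code). The helper name first_code is an assumption. It returns only the first code in each cell, which matches the assumption above that the words after the commas are not needed.

```python
import re

# Sample cells copied from the question; each string is one cell.
cells = [
    "Structure",
    "abms feature = rl",
    "abms feature = pc, pcc, pi, poc, pot, psc, pst, pt, radp",
    "font = 5 abms feature = equl, equr",
    "font = 5",
]

PATTERN = re.compile(r"abms feature = (\w+)", re.IGNORECASE)


def first_code(cell):
    """Return the first code after 'abms feature = ', or None if absent."""
    match = PATTERN.search(cell)
    return match.group(1) if match else None


codes = [first_code(cell) for cell in cells]
print(codes)
```

Because the pattern is not anchored to the start of the cell, it also finds the code in cells such as "font = 5 abms feature = equl, equr".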