Querying Reagent/Hiccup Markup For Tests - testing

I want to test a Reagent/Re-Frame view by asserting properties of the Hiccup markup it returns. Therefore I need functions that traverse the tree, filter it, or check that certain elements with certain attributes are present.
I can not be the first to have this problem, yet Google wasn't very helpful. Is there a library that does that? If not, how would you, e.g., write a function that traverses the markup and returns a seq of all elements with a certain tag?

I'd be looking at:
Convert the hiccup out to HTML then use HTML selectors to do analysis
Put the tree inside DataScript then use datalog to do analysis
More details here https://juxt.pro/blog/posts/datascript-dom.html
Explore the sea of libraries for data filtering:
https://github.com/noprompt/meander
https://github.com/BrunoBonacci/where
There's a lot of ways of doing this it's hard to suggest a good answer

You can analyze and modify any tree-like data structure using the Tupelo Forest library. Please view the Lightning Talk video and peruse the many live examples.
If you add more detail to your question I can update the answer with something specific.
You can also see this answer.

Related

Normalizer in cloudconnect for Gooddata

I have some doubts. I'm doing a BI for my company, and I needed to develop a data converter in ETL because the database to which it's connected (PostgreSQL) is bringing me some negative values within the time CSV. It doesn't make much sense to bring from the database (to which we don't have many accesses) negative data like this:
The solution I found so that I don't need to rely exclusively on dealing directly with the database would be to perform a conversion within the cloudconnect. I verified in my researches that the one that most contemplates would be the normalizer, but there are not many explanations available. Could you give me a hand? because I couldn't parameterize how I could convert this data from 00:00:-50 to 00:00:50 with the normalizer.
It might help you to review our CC documentation: https://help.gooddata.com/cloudconnect/manual/normalizer.html
However, I am not sure if normalizer would be able to process timestamps.
Normalizer is basically a generic transform component with a normalization template. You might as well use reformat component that is more universal.
However, what you are trying to do would require some very custom transform script, written in CTL (CloudConnect transformation language) or java.
You can find some templates and examples in the documentation: https://help.gooddata.com/cloudconnect/manual/ctl-templates-for-transformers.html

How to use Alignment API to generate a Alignment Format file?

I am going to attend the Instance Matching of OAEI, now I need to make my results to Alignment Format. In order to achieve it, I have learned official tutorials.(link:http://alignapi.gforge.inria.fr/tutorial/tutorial1/index.html).
But there are many differences between the method taught and the method I want. In other words, I can't understand the API.
This is my situation:
I have 2 rdf file(person11.rdf and person12.rdf respectively.data link is http://oaei.ontologymatching.org/2010/im/index.html, the PR dataset), each file has information of many person. I want to find the coreferent entities, the results must be printed in Alignment Format. I find the results by using SPARQL, but I don't know how to print it in Alignment Format.
So, I have three questions:
First, if I want to generate a Alignment Format file, is the method taught the only way?
Second, can you give me your method(code better) to generate the Alignment Format file? Maybe I am wrong from the beginning, can you give me some suggestions?
Third, if you attended OAEI or know something about Instance Matching, can you give me some advice? I want to find the coreferent entities.
Thank you!
First question: I guess that the "mentioned method" is the one in tutorial1. It is not the appropriate one since you have to write a program to output the alignment format and this is a command line interface tutorial. In this case, you'd better look at http://alignapi.gforge.inria.fr/tutorial/tutorial2/index.html
Then, there are basically two ways to do:
The advised one (for several reasons and for participating to OAEI) is to follow these tutorials, to create an empty alignment in it, to create the correspondences from the results of your SPARQL query and to render it. Everything is covered by the tutorials but the part concerning your SPARQL queries. This assumes that you are programming in Java.
The non-advised solution (primarily non advised because you will have to debug your own renderer), is to write, in any programming language that you want a program that output the format (which corresponds to what you cite).
Think about it: how would you expect that the Alignment API knows the results of your SPARQL query? If you come up with a nice solution, contact the API developers, they may integrate it and others could benefit.
Second question: I cannot do better than what is above.
Third question: too general. Read the OAEI results (http://oaei.ontologymatching.org) and look at the code of others.
Good luck!

General stategy for designing Flexible Language application using ANTLR4

Requirement:
I am trying to develop a language application using antlr4. The language in question is not important. The important thing is that the grammar is very vast (easily >2000 rules!!!). I want to do a number of operations
Extract bunch of informations. These can be call graphs, variable names. constant expressions etc.
Any number of transformations:
if a loop can be expanded, we go ahead and expand it
If we can eliminate dead code we might choose to do that
we might choose to rename all variable names to conform to some norms.
Each of these operations can be applied independent of each other. And after application of these steps I want the rewrite the input as close as possible to the original input.
e.g. So we might want to eliminate loops and rename the variable and then output the result in the original language format.
Questions:
I see a need to build a custom Tree (read AST) for this. So that I can modify the tree with each of the transformations. However when I want to generate the output, I lose the nice abilities of the TokenStreamRewriter. I have to specify how to write each of the nodes of the tree and I lose the original input formatting for the places I didn't do any transformations. Does antlr4 provide a good way to get around this problem?
Is AST the best way to go? Or do I build my own object representation? If so how do I create that object efficiently? Creating object representation is very big pain for such a vast language. But may be better in the long run. Again how do I get back the original formatting?
Is it possible to work just on the parse tree?
Are there similar language applications which do the same thing? If so what strategy do they use?
Any input is welcome.
Thanks in advance.
In general, what you want is called a Program Transformation System (PTS).
PTSs generally have parsers, build ASTs, can prettyprint the ASTs to recover compilable source text. More importantly, they have standard ways to navigate/inspect/modify the ASTs so that you can change them programmatically.
Many offer these capabilities in the form of pattern-matching code fragments written in the surface syntax of the language being transformed; this avoids the need to forever having to know excruciatingly fine details about which nodes are in your AST and how they are related to children. This is incredibly useful when you big complex grammars, as most of our modern (and our legacy languages) all seem to have.
More sophisticated PTSs (very few) provide additional facilities for teasing out the semantics of the source code. It is pretty hard to analyze/transform most code without knowing what scopes individual symbols belong to, or their type, and many other details such as data flow. Full disclosure: I build one of these.

Django Treebeard display tree category using bootstrap effectively

How to traverse a tree of MP_Node (django-treebeard) categories and display effectively with lease amount of queries? I tried looking the docs but I see the queries number increasing with more categories.
Is there a method to limit the number of queries to display a menu like amazon.com and get all the categories in an optimized manner?
I see that dump_bulk() api in treebeard gets all the categories in a single query. Is it advisable to use it? If not why? Where is its practical usage?
A sample code using twitter-bootstrap nav menu would be appreciated.
I'm looking to reduce the number of queries. Answer with explanation with least number of queries will be accepted.
I chose django-mptt in lieu of Treebeard. It's much more simple, and uses only one tree management methodology (MPTT, hence the name). See this function (disclaimer, I submitted a patch which rewrote this function to be more optimized) - it caches an entire tree below a given node, allowing you to go up and down the tree as much as you want (e.g., node.get_children(), node.parent, etc.), without running any more queries. In other words, it's ideal for doing exactly what you're wanting to do.

Tool or methods for automatically creating contextual links within a large corpus of content?

Here's the basic scenario - I have a corpus of say 100,000 newspaper-like articles. Minimally they will all have a well-defined title, and some amount of body content.
What I want to do is find runs of text in articles that ought to link to other articles.
So, if article Foo has a run of text like "Students in 8th grade are being encouraged to read works by John-Paul Sartre" and article Bar is titled (and about) "The important works of John-Paul Sartre", I'd like to automagically create that HTML link from Foo to Bar within the text of Foo.
You should ask yourself something before adding the links. What benefit for users do you want to achieve by doing this? You probably want to increase the navigability of your site. Maybe it is better to create an easier way to add links to older articles in form used to submit new ones. Maybe it is possible to add a "one click search for selected text" feature. Maybe you can add a wiki-like functionality that lets users propose link for selected text. You probably want to add links to related articles (generated through tagging system or text mining) below the articles.
Some potential problems with fully automated link adder:
You may need to implement a good word sense disambiguation algorithm to avoid confusing or even irritating the user by placing bad automatic links with regex (or simple substring matching).
As the number of articles is large you do not want to generate the html for extra links on every request, cache it instead.
You need to make a decision on duplicate titles or titles that contain other title as substring (either take longest title or link to most recent article or prefer article from same category).
TLDR version: find alternative solutions that provide desired functionality to the users.
What you are looking for are text mining tools. You can find more info and links at http://en.wikipedia.org/wiki/Text_mining. You might also want to check out Lucene and its ports at http://lucene.apache.org. Using these tools, the basic idea would be to find a set of similar articles based on the article (or title) in question. You could search various properties of the article including titles and content or both. A tagging system a la Delicious (or Stackoverflow) might also be helpful. Rather than pre-creating the links between articles, you'd present the relevant articles in an interface much like the Related questions interface on the right-hand side of this page.
If you wanted to find and link specific text in each article, I think you'd need to do some preprocessing to select pertinent phrases to key on. Even then I think it would be very hard not to miss things due to punctuation/misspellings or to not include irrelevant links for the same reasons.