geohashTrie for location Nodes - cypher

I found a source here that shows a geohash trie (geohashTrie) is possible in Neo4j.
My question is: how do I do that? What does the Cypher code look like?
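The linked source isn't reproduced here, so the following is only a sketch of the usual idea, assuming the trie is built from geohash prefixes: each extra geohash character narrows the cell, so every prefix of a point's geohash becomes one trie node. It is written in Python and only generates the Cypher text; the :Geohash label, the hash property and the :CHILD relationship type are hypothetical names, not part of any Neo4j API.

```python
from typing import List

# Standard geohash base32 alphabet (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a point by repeatedly bisecting lon/lat ranges and interleaving the bits."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars: List[str] = []
    value, bit_count, use_lon = 0, 0, True  # the first bit is a longitude bit
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def trie_path(geohash: str) -> List[str]:
    """All prefixes of the geohash: the chain of trie nodes from the root down to the leaf."""
    return [geohash[:i] for i in range(1, len(geohash) + 1)]


def merge_statement(geohash: str) -> str:
    """Cypher that MERGEs one (hypothetical) :Geohash node per prefix and
    links each level to the next with a (hypothetical) :CHILD relationship."""
    lines = []
    for i, prefix in enumerate(trie_path(geohash)):
        lines.append(f"MERGE (g{i}:Geohash {{hash: '{prefix}'}})")
        if i > 0:
            lines.append(f"MERGE (g{i - 1})-[:CHILD]->(g{i})")
    return "\n".join(lines)
```

Nodes are MERGEd on their own before the relationship is MERGEd, so an existing prefix node is reused rather than duplicated. Points inside a cell could then be found by starting at the node whose hash is that cell's prefix and walking :CHILD relationships down to wherever you attach your location nodes.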

Related

Downloading all full-text articles in PMC and PubMed databases

According to one of the questions answered by the NCBI Help Desk, we cannot "bulk-download" PubMed Central. However, can I use "NCBI E-utilities" to download all full-text papers in the PMC database using Efetch, or at least find all corresponding PMCIDs using Esearch in the Entrez Programming Utilities? If yes, then how? If E-utilities cannot be used, is there any other way to download all full-text articles?
First of all, before you go downloading files in bulk, I highly recommend you read the E-utilities usage guidelines.
If you want full-text articles, you're going to want to limit your search to open access files. Furthermore, I suggest also restricting your search to Medline articles if you want articles that are any good. Then you can do the search.
Using Biopython, this gives us:
from Bio import Entrez
Entrez.email = "you@example.com"  # NCBI asks for a contact email; use your own
search_query = 'medline[sb] AND "open access"[filter]'
# getting search results for the query
search_results = Entrez.read(Entrez.esearch(db="pmc", term=search_query, retmax=10, usehistory="y"))
You can use the search function on the PMC website and it will display the generated query that you can copy/paste into your code.
Now that you've done the search, you can actually download the files:
handle = Entrez.efetch(db="pmc", rettype="full", retmode="xml", retstart=0, retmax=int(search_results["Count"]), webenv=search_results["WebEnv"], query_key=search_results["QueryKey"])
You might want to download in batches, replacing retstart and retmax with loop variables, in order to avoid flooding the servers.
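A minimal sketch of that batching loop, kept independent of Biopython so it can be tried offline: fetch is any callable taking (retstart, retmax) and returning the downloaded text. The 0.34 s default delay is an assumption based on the commonly cited limit of about 3 requests per second without an API key; check the E-utilities usage guidelines for the current rules.

```python
import time
from typing import Callable, Iterator, Tuple


def batch_ranges(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (retstart, retmax) pairs that together cover records 0..total-1."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def fetch_in_batches(fetch: Callable[[int, int], str], total: int,
                     batch_size: int = 100, delay: float = 0.34) -> Iterator[str]:
    """Call fetch(retstart, retmax) once per batch, pausing between requests."""
    for i, (start, size) in enumerate(batch_ranges(total, batch_size)):
        if i > 0 and delay > 0:
            time.sleep(delay)  # throttle so we don't flood the E-utilities servers
        yield fetch(start, size)
```

With Biopython, fetch would wrap the efetch call above, e.g. lambda s, n: Entrez.efetch(db="pmc", retmode="xml", retstart=s, retmax=n, webenv=search_results["WebEnv"], query_key=search_results["QueryKey"]).read(), with total set to int(search_results["Count"]).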
If handle contains only one file, handle.read() contains the whole XML file as a string. If it contains more, the articles are contained in <article></article> nodes.
The full text is only available in XML, and the default parser available in pubmed doesn't handle XML namespaces, so you're going to be on your own with ElementTree (or another parser) to parse your XML.
Here, the articles are found through the internal history of E-utilities, which is accessed with the webenv argument and enabled by the usehistory="y" argument in Entrez.esearch().
A few tips about XML parsing with ElementTree: Element.remove() only removes direct children, so you can't delete a grandchild node directly, and you're probably going to want to delete some nodes recursively. node.text returns the text in node, but only up to the first child, so you'll need to do something along the lines of "".join(node.itertext()) if you want to get all the text in a given node.
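A small sketch of both tips with the standard library's ElementTree: recursive removal of one tag (using xref, the JATS cross-reference element, as an example choice) and itertext() to collect all text of each <article>. It assumes tags without namespaces; namespaced tags would appear as "{uri}name" and the comparison would need adjusting.

```python
import xml.etree.ElementTree as ET
from typing import List


def strip_elements(parent: ET.Element, tag: str) -> None:
    """Recursively remove every descendant with the given tag.

    Element.remove() only works on direct children, so we recurse and remove at
    each level. A removed element's tail text belongs to the surrounding content,
    so it is moved onto the previous kept sibling (or onto the parent's text).
    """
    prev = None
    for child in list(parent):  # iterate over a copy while removing
        if child.tag == tag:
            if child.tail:
                if prev is not None:
                    prev.tail = (prev.tail or "") + child.tail
                else:
                    parent.text = (parent.text or "") + child.tail
            parent.remove(child)
        else:
            strip_elements(child, tag)
            prev = child


def article_texts(xml_string: str, drop_tag: str = "xref") -> List[str]:
    """Return the whitespace-normalised full text of each <article> element."""
    root = ET.fromstring(xml_string)
    texts = []
    for article in list(root.iter("article")):  # includes root if it is an <article>
        strip_elements(article, drop_tag)
        # node.text stops at the first child; itertext() walks the whole subtree.
        raw = "".join(article.itertext())
        texts.append(" ".join(raw.split()))
    return texts
```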
According to one of the questions answered by the NCBI Help Desk, we cannot "bulk-download" PubMed Central.
Combining https://www.nlm.nih.gov/bsd/medline.html and https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/ will get you a good portion of it (I don't know the percentage). It will, however, miss the PMC full-text articles whose license doesn't allow redistribution, as explained on https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.

Aem fulltextsearch

I want to search for an exact combination of words in all nodes in AEM using QueryBuilder.
When I debug the query in http://localhost:4502/libs/cq/search/content/querydebug.html, it returns results that don't match my query.
For example, if I want to search for 'foo bar' in all nodes, I need to get all nodes that contain 'Foo Bar', 'foo Bar', 'Foo bar' or 'FOO BAR', but not nodes that contain only 'foo', only 'bar', or 'foo-bar'. The query in the service is built with QueryBuilder.
QueryBuilder is useful when you want to perform a query similar to SQL, where you search against a property and its value. The full-text search capabilities of the query debug interface are very limited, as you have experienced.
However, AEM uses an underlying Lucene and/or Solr index, and it does provide a way to run a native Solr/Lucene query.
First, create an embedded Solr index (embedded is sufficient for a local development AEM instance) as described under "Configuring AEM with an embedded SOLR server" in https://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/queries-and-indexing.html. This will trigger Solr indexing of your JCR content.
Once indexing is complete (as seen from logs), you can perform native queries using the crx/de query interface.
Example query: select [jcr:path] from [nt:base] where native('solr', '<filter>?<solr_query_goes_here>'). Obviously, you need to be familiar with Solr query syntax. Credit to the following SlideShare deck (slide 50 covers native queries within AEM): http://www.slideshare.net/justinedelson/demystifying-oak-search
AEM support for native Solr queries is a bit patchy. You might need to edit the Solr schema XML file (created under the crx-quickstart folder) manually to add additional filters, custom fields, etc. We successfully tuned Solr within AEM to perform a spatial search using this method.
If you need all sorts of combinations for "foo bar" then you have to query:
fulltext=foo bar
You will only get the first 10 results. To get all, you'll need to:
p.limit=-1
You may want to specify the path:
path=/content/website/
Visit Adobe Query Builder API for more info.
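The same three predicates can also be sent over HTTP. A minimal sketch that only builds the request URL, assuming the local author instance from the question (localhost:4502) and AEM's QueryBuilder JSON servlet at /bin/querybuilder.json; authentication and the actual request are left out.

```python
from urllib.parse import urlencode


def querybuilder_url(host: str, path: str, text: str, limit: int = -1) -> str:
    """Build a QueryBuilder JSON servlet URL with path, fulltext and p.limit predicates."""
    params = {
        "path": path,           # restrict the search to this subtree
        "fulltext": text,       # full-text predicate
        "p.limit": str(limit),  # -1 returns all hits instead of only the first 10
    }
    return f"{host}/bin/querybuilder.json?{urlencode(params)}"
```

urlencode takes care of escaping, so a space in the search text becomes "+" and slashes in the path become %2F.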
Behind the scenes, AEM creates an xpath query and then executes it. Then, for any part of the query that doesn't map to xpath, it runs through the results and filters them.
You should also consider whether there is a property to match, rather than matching any text. That will give you much better results, since you want accuracy. Right now you are casting an overly wide net, and I think you should consider restricting it, if only for performance reasons. Just a suggestion.
You say the results don't match your query; can you give us some idea of what comes back? And please post your actual query here. That will make it much easier to help.
This is a minimal example of a full-text search:
import com.day.cq.search.Predicate;
import com.day.cq.search.Query;
import com.day.cq.search.eval.FulltextPredicateEvaluator;
import com.day.cq.search.eval.PathPredicateEvaluator;
Query query = queryBuilder.createQuery(...);
// limit the search to a path
Predicate path = new Predicate(PathPredicateEvaluator.PATH);
path.set(PathPredicateEvaluator.PATH, "/content/where/ever");
query.getPredicates().add(path);
// full-text predicate, evaluated relative to the jcr:content child
Predicate fulltextSearch = new Predicate(FulltextPredicateEvaluator.FULLTEXT);
fulltextSearch.set(FulltextPredicateEvaluator.FULLTEXT, "foo bar");
fulltextSearch.set(FulltextPredicateEvaluator.REL_PATH, "jcr:content");
query.getPredicates().add(fulltextSearch);
// enable excerpts in the results
query.setExcerpt(true);
// paging
query.setStart(...);
query.setHitsPerPage(-1);
Note: you don't need to configure a Solr index or anything else; this should work out of the box.
But if you limit the search to specific fields, you should create an index entry in oak:index. You can find a great cheat-sheet here.
I'm not sure if this helps, but to match all variations of the text I'm looking for, I use jcr:like in XPath.
For example, if I want to search all nodes that have any property with "Foo bar" in its value or name, my query looks like:
/jcr:root/content/yourpath//*[jcr:like(\*/, '%FOO bar%')]
You will not get that flexibility in QueryBuilder but you can still get what you want by using JCR-SQL2.
The following query will return all entries with "Foo Bar", "foo bar", "foo Bar", "Foo bar", but not "foo", "bar", "foo-bar" when your value is "foo bar".
SELECT * FROM [nt:unstructured] WHERE ISDESCENDANTNODE('/content/yourpath') AND LOWER([prop]) LIKE "%foo bar%" ORDER BY [cq:lastModified] desc
Make sure the value you search for is itself lowercase, since it is compared against LOWER([prop]); that is what makes the search case-insensitive.
For case-sensitive search you can use:
SELECT * FROM [nt:unstructured] WHERE ISDESCENDANTNODE('/content/yourpath') AND [prop] LIKE "%foo bar%" ORDER BY [cq:lastModified] desc
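To see why the LOWER(...) LIKE '%foo bar%' form matches exactly the variants listed above, here is a small Python simulation of LIKE semantics (% is any run of characters, _ is one character) applied to lowercased values. It is an illustration of the matching rule, not Oak's implementation; the query builder does no escaping of quotes or wildcards in the term, which is an explicit simplification.

```python
import re
from typing import List


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern (% = any run, _ = any one char) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def lower_like(values: List[str], pattern: str) -> List[str]:
    """Mimic LOWER([prop]) LIKE pattern: lowercase each value, then match the whole string."""
    regex = re.compile(like_to_regex(pattern), re.DOTALL)
    return [v for v in values if regex.fullmatch(v.lower())]


def case_insensitive_query(path: str, prop: str, term: str) -> str:
    """Build the JCR-SQL2 text with the term lowercased (no quote/wildcard escaping)."""
    return (f"SELECT * FROM [nt:unstructured] WHERE ISDESCENDANTNODE('{path}') "
            f"AND LOWER([{prop}]) LIKE '%{term.lower()}%'")
```

Note that a pattern containing uppercase letters can never match a lowercased value, which is why the search term has to be lowercased before it goes into the query.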

cytoscape.js successors and predecessors

I'm looking to select the successors and the predecessors of a selected node in my graph. Essentially, I need my code to select the full path into and out of a node, all the way to the end nodes.
I know how to select one or the other (successors or predecessors), but not both at once.
I'm currently using:
var nhood = node.successors();
cy.batch(function(){
cy.elements().not( nhood ).removeClass('highlighted').addClass('faded');
nhood.removeClass('faded').addClass('highlighted');
});
I'm very new to JS and I'm pretty much fumbling around in the dark just now, learning as I go, so please excuse me if this is a simple question.
Thanks.
It looks like you want a BFS instead, because you essentially want the entire connected component. See http://js.cytoscape.org/#collection/algorithms/eles.breadthFirstSearch
You can keep an array and put the visited nodes into it.
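The question asks for the full path in and out of a node, which is the union of everything reachable downstream and everything reachable upstream; this is narrower than the connected component (a sibling sharing a parent is not included). A language-neutral sketch of that in Python, with two BFS passes over an edge list; it is not the Cytoscape.js API. In Cytoscape.js itself, I believe node.successors().union(node.predecessors()) gives the same set (Cytoscape's versions also include the connecting edges).

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def reachable(start: Hashable, adjacency: Dict[Hashable, List[Hashable]]) -> Set[Hashable]:
    """BFS from start over adjacency; returns the visited nodes, start itself excluded."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def path_in_and_out(node: Hashable, edges: Iterable[Edge]) -> Set[Hashable]:
    """Predecessors (following edges backwards) plus successors (following them forwards)."""
    out_adj: Dict[Hashable, List[Hashable]] = {}
    in_adj: Dict[Hashable, List[Hashable]] = {}
    for src, dst in edges:
        out_adj.setdefault(src, []).append(dst)
        in_adj.setdefault(dst, []).append(src)
    return reachable(node, out_adj) | reachable(node, in_adj)
```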

Querying for shared nodes in JCR (ModeShape)

I have a JCR content repository implemented in ModeShape (4.0.0.Final). The structure of the repository is quite simple and looks like this:
/ (root)
  Content/
    Item 1
    Item 2
    Item 3
    ...
  Tags/
    Foo/
    Bar/
    .../
The content is initially created and stored under /Content as [nt:unstructured] nodes with [mix:shareable] mixin. When a content item is tagged, the tag node is first created under /Tags if it's not already there, and the content node is shared/cloned to the tag node using Workspace.clone(...) as described in the JCR 2.0 spec, section 14.1, Creation of Shared Nodes.
(I don't find this particularly elegant and I did just read this answer, about creating a tag based search system in JCR, so I realize this might not be the best/fastest/most scaleable solution. But I "inherited" this solution from developers before me, so I hope I don't have to rewrite it all...)
Anyway, the sharing itself seems to work (I can verify that the nodes are there using the ModeShape Content Explorer web app, or programmatically via session.getRootNode().getNode("Tags/Foo").getNodes()). But I am not able to find any shared nodes using a query!
My initial try (using JCR_SQL2 syntax) was:
SELECT * FROM [nt:unstructured] AS content
WHERE PATH(content) LIKE '/Tags/Foo/%' // ISDESCENDANTNODE(content, '/Tags/Foo') gives same result
ORDER BY NAME(content)
The result set was to my surprise empty.
I also tried searching in [mix:shareable] like this:
SELECT * FROM [mix:shareable] AS content
WHERE PATH(content) LIKE '/Tags/Foo/%' // ISDESCENDANTNODE(content, '/Tags/Foo') gives same result
ORDER BY NAME(content)
This also returned an empty result set.
I can see from the query:
SELECT * FROM [nt:unstructured] AS content
WHERE PATH(content) LIKE '/Content/%' // ISDESCENDANTNODE(content, '/Content') works just as well
ORDER BY NAME(content)
...that the query otherwise works, and returns the expected result (all content). It just doesn't work when searching for the shared nodes.
How do I correctly search for shared nodes in JCR using ModeShape?
Update: I upgraded to 4.1.0.Final to see if that helped, but it had no effect on the described behaviour.
Cross-posted from the ModeShape forum:
Shared nodes are really just a single node that appears in multiple places within a workspace, so it's not exactly clear what it semantically means to get multiple query results for that one shareable node. Per Section 14.16 of the JSR-283 (JCR 2.0) specification implementations are free to include shareable nodes in query results at just one or at multiple/all of those locations.
ModeShape 2.x and 3.x always returned only a single location of a shared node in query results, as this was the behavior of the reference implementation and this was the feedback we got from users. When we were working on ModeShape 4.0, we tried to make it possible to return multiple results, but we ran into problems with the TCK and uncertainty about what the new expected behavior would be. Therefore, we backed off and implemented query to return only one of the shared locations, as we did with 2.x and 3.x.
I'm not sure that any JCR implementation returns multiple rows for a single shared node, but I may be wrong.

To detect [programmatically] documents that are in certain Lifecycle state

I want to develop software that (using DFS) will scan documents on a Documentum Content Server and find the ones that have a lifecycle attached and whose current lifecycle state has a certain name, say ‘ToBeExported’.
Below is the DQL query I have created for this:
select dm_document.r_current_state, dm_document.r_object_id, dm_document.object_name
from dm_document
where dm_document.r_policy_id is not NULL
and dm_document.r_current_state in (select i_state_no from (select dm_policy.i_state_no, dm_policy.state_name from dm_policy where dm_policy.r_object_id=dm_document.r_policy_id) where state_name='ToBeExported')
My question is: have I missed something, or is there a better way to do it?
Thanks for your help.
Something like that would probably work.
But a question arises: why isn't the lifecycle itself doing whatever work you want to do?