NiFi GetSolr processor not working

When I run the GetSolr processor in NiFi for the first time, it extracts the data from Solr.
But if I run it a second time, it doesn't fetch any data from Solr.
Could anybody please help me with this?

It is meant to do incremental extraction, so after the first run it fetches only data that is newer than the data previously fetched.
If you right-click the processor and view its state, there is an option to clear state, which resets it back to the beginning.
If this does not answer your question, please show how you have configured GetSolr and which version of NiFi you are using.
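If you want to verify outside NiFi that newer documents actually exist, here is a minimal Python sketch of roughly the kind of incremental query GetSolr issues; the core name "mycore", the date field "timestamp", and the last-run value are all hypothetical:

```python
import requests

# Hypothetical Solr core and date field; GetSolr persists the last-run
# timestamp in processor state and only asks for rows newer than it.
SOLR = "http://localhost:8983/solr/mycore/select"
last_run = "2018-01-01T00:00:00Z"

params = {
    "q": "*:*",
    "fq": f"timestamp:[{last_run} TO NOW]",  # only rows newer than last run
    "sort": "timestamp asc",
    "wt": "json",
}
docs = requests.get(SOLR, params=params).json()["response"]["docs"]
print(f"{len(docs)} new documents since {last_run}")
```

If this returns zero documents, the processor has nothing new to fetch, which would explain the empty second run.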

Related

Run flowtype checker manually

I have IDEA Ultimate 2018.1 with flowtype (flow-bin) configured and all the checkboxes selected. I followed this guide: https://www.jetbrains.com/help/idea/2017.2/flow-type-checker.html
The type checking takes a long time to run. When I change something in my code (reverting a wrong annotation, or creating a wrong one), I have to wait around 30 seconds for the correct annotation to appear; that is, for IDEA to trigger the flow server to analyse the files and update the editor accordingly. That is quite a lot.
Can I trigger that type-checking analysis manually inside IDEA to get the editor updated? Or can I change the auto-run interval?
As Kraus noticed, my version of flow-bin was old.
I was using version 0.26.0 instead of the current 0.74.0, mainly because when I updated flow I was not using flow-bin but flow...
Thanks. Now IDEA and flow are fast.

-Denable-debug-rules=true not giving out statistics

I'm passing the flag -Denable-debug-rules=true, which according to the documentation should print something to a log at least every 5 minutes: http://graphdb.ontotext.com/documentation/standard/rules-optimisations.html
Unfortunately it isn't, and I need to figure out why inference is taking so long.
Help?
The specific file is http://purl.obolibrary.org/obo/pr.owl and I'm using the owl2-rl-optimized ruleset.
Version: graphdb-ee-6.3.1
An exchange with GraphDB tech support clarified that the built-in rulesets cannot be monitored. To monitor one effectively, copy it into a new file and add that file as a custom ruleset, following http://graphdb.ontotext.com/documentation/enterprise/reasoning.html#operations-on-rulesets

dispatching started for transformation

When I preview rows in the Text file input step of Pentaho, no rows appear, and the 'Show log' option displays this message:
"Dispatching started for transformation".
What does it mean? How do I overcome this issue?
It seems that either your transformation is invalid (you're missing one essential checkbox or another) or your PDI installation isn't working properly.
Which Java version are you using? And which PDI version? Try it on a fresh install, and if it still doesn't work, go over your Text file input step and validate that it's correctly configured.
Also, try removing all other steps; it could be that one of the subsequent steps is causing problems and stopping PDI from starting the transformation execution.
Well... maybe it's quite late, but I'm currently struggling with this issue in Pentaho Community Version 8.
What I found, and what solved some of my issues, is that this message can be a warning of a potential deadlock. You have to make sure that none of these situations are present in your transformation:
An external component, such as a table lock held by the database, blocks the transformation.
The "Block this step until steps finish" step can run into a deadlock when there are more rows to process than the number of rows in the rowset (see the sketch at the end of this answer).
Within transformations there are situations where streams get split and joined again, so that the transformation blocks by design.
You can see full examples on the Pentaho community wiki page:
https://pentaho-community.atlassian.net/wiki/spaces/EAI/pages/386807182/Transformation+Deadlocks
I hope it helps you!
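To make the second situation concrete, here is a minimal Python sketch of that bounded-buffer deadlock; the rowset size and row count are hypothetical values, and queue.Queue stands in for PDI's internal rowset:

```python
import queue
import threading

ROWSET_SIZE = 5   # stand-in for the "Nr of rows in rowset" setting (hypothetical value)
TOTAL_ROWS = 10   # more rows than the rowset can hold

rowset = queue.Queue(maxsize=ROWSET_SIZE)

def upstream_step():
    # The producing step blocks as soon as the rowset buffer is full.
    for row in range(TOTAL_ROWS):
        rowset.put(row)  # blocks on the 6th row: buffer full, nobody reading yet

producer = threading.Thread(target=upstream_step, daemon=True)
producer.start()

# "Block this step until steps finish": wait for the upstream step to finish
# before consuming any rows. It can never finish, because finishing requires
# someone to drain the rowset first -> deadlock.
producer.join(timeout=2)
print("upstream step finished:", not producer.is_alive())  # False: deadlocked
```

Raising the rowset size above the total row count, or not holding back the consumer, breaks the cycle; that is why this deadlock only appears once the data volume exceeds the rowset setting.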

Query RavenDB without using the studio interface

I am trying to view my sagas in the RavenDB management studio, but when loading even the initial page, all I see is a "Querying documents..." box with a continuously moving progress bar. I cannot seem to get past it; going from page to page, it does not go away. Is there a way to pull all of the saga data into a list so I can look at it? It appears the issue is that saga documents are continuously being added.
I've looked into the HTTP API and the LINQ adapters, but I guess I am looking for something that already exists that can easily peer into the server, much like the Silverlight studio, except not such a pain. I more or less just want to pull a snapshot of all the documents into some kind of readable list.
I find LINQPad 4 convenient; the RavenDB driver for LINQPad can be found here:
https://github.com/ronnieoverby/RavenDB-Linqpad-Driver
For the command line, use cURL with dynamic indexes, as explained here:
http://ravendb.net/docs/http-api/indexes/dynamic-indexes
In the browser, go to http://localhost:8080/docs
You might need to install JsonView, but that should give you what you want.
If anyone wants to know how to browse the data through REST calls:
localhost:8080/databases/{database-name}/docs/{dataset-name}/{id}
example:
localhost:8080/databases/testDB/docs/Sites/1
will give the JSON data for the "Sites" document with id 1, and
localhost:8080/databases/testDB/docs/
will give the JSON data for all the documents in testDB.
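For a scripted snapshot rather than browsing by hand, here is a small Python sketch against the URL pattern above; "testDB" and "Sites" are the example names from the answer, and the start/pageSize paging parameters are an assumption about the classic RavenDB HTTP API:

```python
import requests

# Example database from the answer above; adjust host and names to taste.
BASE = "http://localhost:8080/databases/testDB"

# Fetch one document by id.
site = requests.get(f"{BASE}/docs/Sites/1").json()
print(site)

# Fetch a page of documents from the whole database (paging parameters
# are an assumption about the classic RavenDB HTTP API).
docs = requests.get(f"{BASE}/docs/", params={"start": 0, "pageSize": 128}).json()
print(f"fetched {len(docs)} documents")
```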

Ubuntu + PBS + Apache? How can I show a list of running jobs as a website?

Is there a plugin/package to display status information for a PBS queue? I am currently running an Apache webserver on the login node of my PBS cluster. I would like to display status info and have the ability to perform minimal queries without writing it from scratch (or modifying an age-old Python script, à la jobmonarch). Note: the accepted/bountied solution must work with Ubuntu.
Update: In addition to Ganglia, as noted below, I also looked at the Rocks Cluster Toolkit, but I firmly want to stay with Ubuntu. So I've updated the question to reflect that.
Update 2: I've also looked at PBSWeb as well as MyPBS; neither one appears to suit my needs. The first is too out of date with the current system, and the second is more focused on cost estimation and project budgeting. They're both nice, but I'm more interested in resource availability, job completion, and general status updates. So I'm probably just going to write my own from scratch -- starting Aug 15th.
Have you tried Ganglia?
I have no personal experience, but a few sysadmins I know are using it.
The following pages may help:
http://taos.groups.wuyasea.com/articles/how-to-setup-ganglia-to-monitor-server-stats/3
http://coe04.ucalgary.ca/rocks-documentation/2.3.2/monitoring-pbs.html
my two cents
Have you tried using Nagios: http://www.nagios.org/?
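If you do end up writing your own from scratch, a minimal starting point could be a CGI script behind the existing Apache server that shells out to qstat and renders the output as HTML. This is only a sketch under stated assumptions: qstat must be on the PATH of the Apache user, and the parser expects the common two-header-line column layout, which varies between PBS flavours:

```python
#!/usr/bin/env python3
# Minimal CGI sketch: run `qstat` and render the queue as an HTML table.
# Assumes qstat is on PATH for the Apache user; column layout varies
# between PBS flavours, so treat the parsing as illustrative only.
import subprocess
import html

out = subprocess.run(["qstat"], capture_output=True, text=True).stdout
rows = [line.split() for line in out.splitlines()[2:]]  # skip the two header lines

print("Content-Type: text/html\n")
print("<html><body><h1>PBS queue</h1><table border='1'>")
print("<tr><th>Job ID</th><th>Name</th><th>User</th><th>Time</th><th>State</th><th>Queue</th></tr>")
for r in rows:
    cells = "".join(f"<td>{html.escape(c)}</td>" for c in r[:6])
    print(f"<tr>{cells}</tr>")
print("</table></body></html>")
```

Dropping this into Apache's cgi-bin (with ExecCGI enabled) gives a crude live job list; status filters and per-user queries could be layered on top of the same qstat output.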