Reconciliation services for OpenRefine not working? - openrefine

Has anyone been experiencing problems with reconciliation in OpenRefine? I've imported a list of American universities and colleges, selected 50 rows, and tried the Freebase, DBpedia, and OpenCorporates reconciliation services. I've previously had multiple successes with DBpedia (for colleges and universities), but right now none of these are working. (I went through every service listed, too.) I've trimmed leading and trailing spaces, checked for duplicates, etc. Things were working fine only about two months ago, and I have not updated OpenRefine during that time. [UPDATED]: I was able to get one reconciliation service to work, but not with the ontology I want. So either I'm forgetting some key bit of info, or the services that reconcile university/college names no longer function.

The OpenCorporates Refine service (https://opencorporates.com/reconcile) is currently working, but note that you need to use https; http worked at some point in the past but no longer does.
Using the RDF Refine extension (http://refine.deri.ie) and trying a SPARQL-based reconciliation against DBpedia, I'm finding problems at the moment, but using the same approach for other SPARQL services I have no problems. I don't know the underlying cause, but it seems likely this is due to some issue or change at DBpedia rather than in OpenRefine or the RDF Refine extension.
Any more information about how you are setting up the reconciliation services, and about any extensions you are using, might help with further diagnosis.
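For diagnosis, it can also help to probe a reconciliation endpoint directly, outside OpenRefine. Below is a minimal sketch in Python, assuming the standard reconciliation API behaviour (a bare GET returns the JSON service manifest, a POST with a "queries" payload returns candidates); the example query string is arbitrary:

    import json
    import requests

    SERVICE = "https://opencorporates.com/reconcile"  # endpoint from the answer above

    # A bare GET should return the JSON service manifest.
    manifest = requests.get(SERVICE, timeout=30)
    print(manifest.status_code, manifest.json().get("name"))

    # A POST with a "queries" form parameter should return candidate matches.
    queries = {"q0": {"query": "Harvard University"}}  # arbitrary example query
    result = requests.post(SERVICE, data={"queries": json.dumps(queries)}, timeout=30)
    print(result.status_code, result.json().get("q0", {}).get("result", [])[:3])

If the GET already fails or returns an error page, the problem is on the service side rather than in OpenRefine.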

I have been having the same problem with reconciliation using the RDF extension and DBpedia.org. However, if you try some other service (e.g., a local file or the Spanish DBpedia (http://es.dbpedia.org/sparql)), it works very well.
As Owen already mentioned, it is likely due to DBpedia. It looks like the problem is with XML 1.1 and the recent update of DBpedia.org; take a look here: https://github.com/openlink/virtuoso-opensource/issues/405 If you check the OpenRefine log file or the console, you'll see exactly the same exception that we get when reconciling against DBpedia.org.
Hope this gives you some idea.
Cheers

At the moment it is also possible to create a "database" (actually an OWL/RDF ontology) and the operation JSON for reconciliation on the fly from OpenRefine facets and operation JSON. Please refer to https://stdgont.uk.to for details.

Related

How can I get more results from AnzoGraph?

I am using AnzoGraph with SPARQL over HTTP using RDFlib. I do not specify any limit in my query, and still I only receive 1000 solutions. The same seems to happen in the web interface.
If I fire the same query on other triple stores with the same data, I do get all results.
Moreover, if I fire this query using their command line tool on the same machine as the database, I do get all results (millions). Maybe it is using a different protocol with the local database. If I specify the hostname and port explicitly on the command line, I get 1030 results...
Is there a way to specify that I want all results from AnzoGraph over HTTP?
I have found the service_graph_rowset_limit setting and changed its value to 100000000 in both config/settings_standalone.conf and config/settings.conf (and restarted the database), but to no avail.
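For what it's worth, one client-side workaround for a server-side row cap is to page with LIMIT/OFFSET. A minimal sketch with SPARQLWrapper, assuming a hypothetical endpoint URL; for stable paging the query should normally include an ORDER BY:

    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "http://localhost:7070/sparql"  # hypothetical host/port, substitute your own
    PAGE = 1000  # stay at or below the observed server-side cap

    def fetch_all(query_body):
        """Page through results with LIMIT/OFFSET until a short page comes back."""
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setReturnFormat(JSON)
        offset, rows = 0, []
        while True:
            sparql.setQuery(f"{query_body} LIMIT {PAGE} OFFSET {offset}")
            page = sparql.query().convert()["results"]["bindings"]
            rows.extend(page)
            if len(page) < PAGE:
                return rows
            offset += PAGE

    results = fetch_all("SELECT ?s ?p ?o WHERE { ?s ?p ?o } ORDER BY ?s")
    print(len(results))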
Let me start by thanking you for pointing this issue out.
You have identified a regression of a fix that had been intended to protect the web UI from freezing on unbounded result sets, but it affected regular SPARQL endpoint users as well.
Our Anzo customers do not see this issue, as they use the internal gRPC API directly.
We have produced a fix that will be in our upcoming AnzoGraph 2.4.0 and in our upcoming patch release 2.3.2 set of images.
Older releases will receive this fix as well (when we have a shipment vehicle).
If it is urgent for you, I can provide you with a point fix (root.war file).
What exact image are you using?
Best - Frank

Related keyword recommendation using Solr and MongoDB

I'm new to Solr...
I have been looking into related-content recommendation engines to implement on my core PHP and MongoDB website. It's a music-listening website, so I have added keywords (singer, music, lyrics) for every track to MongoDB.
My question: for related-music recommendation using keywords, can Solr (the MoreLikeThis handler) recommend it?
Example keywords: bobby-singer, a-r-rehaman, shreya goshal
It should look for related keywords in order, like:
bobby-singer,a-r-rehaman,shreya goshal
bobby-singer,a-r-rehaman
bobby-singer,shreya goshal
a-r-rehaman,shreya goshal
bobby-singer
a-r-rehaman
shreya goshal
My keywords are already in MongoDB. I'm planning to work with the Apache Solr MoreLikeThis handler, or please recommend me a good recommendation engine.
Thanks
There are a couple of different things here.
First of all, you can use MLT (MoreLikeThis) to get Solr to bring you related documents, but...
I am wondering if you could also benefit from synonyms, so that on certain searches you can get results that are similar and may satisfy the user.
Also, if you already have the list of relationships, you can build a small index where you can run an OR of your query and get potential related searches, or execute potential related searches and get related results.
Hope this helps
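To make the MLT idea concrete, here is a minimal sketch of calling Solr's MoreLikeThis handler over HTTP from Python; the core name, document id, and keywords field are hypothetical, and the /mlt request handler must be enabled in solrconfig.xml:

    import requests

    SOLR = "http://localhost:8983/solr/music"  # hypothetical core

    params = {
        "q": "id:track_123",      # seed document (hypothetical id)
        "mlt.fl": "keywords",     # field(s) compared for similarity
        "mlt.mintf": 1,           # min term frequency in the seed document
        "mlt.mindf": 1,           # min document frequency across the index
        "rows": 10,
        "wt": "json",
    }
    resp = requests.get(f"{SOLR}/mlt", params=params, timeout=10)
    for doc in resp.json()["response"]["docs"]:
        print(doc.get("id"), doc.get("keywords"))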

Rename NServiceBus.Host.exe

This may very well be a silly question, but I cannot find any documentation on this, or anywhere anyone has really asked this question, except here: NServiceBus Yahoo Groups. I want to rename my NServiceBus.Host.exe: even though the services have different names in the Services list when I install them, in Task Manager they all show up as NServiceBus.Host.exe. I have tried renaming the exe, but of course this causes issues with deployment, and it simply will not run (as per the URL I posted), encountering an endpoint configuration error.
I've looked through the configuration options and I do not see anything that looks like such an option. If anyone knows a good way to do this, that would be great. Thanks!
That doesn't sound like a very good idea. That road is bound to be riddled with potholes.
In order to tell the processes apart in Task Manager, there's a pretty simple solution.
Go to View -> Select Columns, and add "Command Line" which is just a few up from the bottom.
This will show you the full path to the specific NServiceBus.Host.exe instance, along with the command line arguments, which can give you valuable information like the Profiles that were used.
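If you ever need to do the same thing programmatically rather than in Task Manager, a small sketch with the psutil package (an assumption, not part of NServiceBus) lists each NServiceBus.Host.exe instance with its command line:

    import psutil

    # Print the full command line of every running NServiceBus.Host.exe,
    # which includes the endpoint directory and any profile arguments.
    for proc in psutil.process_iter(attrs=["name", "cmdline"]):
        if proc.info["name"] == "NServiceBus.Host.exe":
            print(proc.pid, " ".join(proc.info["cmdline"] or []))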

How to access results of Sonar metrics for use with applications like PowerPivot

I'm trying to run a number of applications with known failure rates through Sonar, with hopes of deciding which metrics are most valuable in determining whether a particular application will fail. Ultimately I'll be making some sort of algorithm that will look at the outputs of whatever metrics I'm using and generate a score from 1 to 100. I've got about 21 applications put through Sonar, and the results have been stored in a MySQL database. I originally planned to use PowerPivot to find relationships in the data, but it seems like the formatting of the tables doesn't lend itself well to that. Other questions on Stack Overflow have told me that Sonar's tables are unformatted and that I should instead use the Web Service API to get the information. I'm unfamiliar with the API and was unsuccessful in trying to do what I wanted by looking at Sonar's API documentation.
From an answer to another question:
http://nemo.sonarsource.org/api/timemachine?resource=org.apache.cxf:cxf&format=csv&metrics=ncloc,violations_density,comment_lines_density,public_documented_api_density,duplicated_lines_density,blocker_violations,critical_violations,major_violations,minor_violations
This looks very similar to what I'd like to have, except I'm only looking at each application once (I'm analyzing a sample of all the live applications on a grid), which means Timemachine isn't really what I'm looking for. Would it be possible to generate a similar table, except instead of the stats for a particular application per date, it showed the statistics for an application and all of its classes, etc?
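For reference, pulling one of these WS API CSV responses into a script is straightforward; below is a minimal sketch using the timemachine URL quoted above (whether a non-timemachine endpoint can return per-class statistics in the same shape is something to check against the Sonar WS documentation):

    import csv
    import io
    import requests

    URL = ("http://nemo.sonarsource.org/api/timemachine"
           "?resource=org.apache.cxf:cxf&format=csv"
           "&metrics=ncloc,violations_density,comment_lines_density,"
           "public_documented_api_density,duplicated_lines_density,"
           "blocker_violations,critical_violations,major_violations,minor_violations")

    resp = requests.get(URL, timeout=30)
    rows = list(csv.reader(io.StringIO(resp.text)))
    # First row is the header (date plus one column per metric); the rest are snapshots.
    for row in rows[:5]:
        print(row)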
If you're not familiar with the WS API, you can also create your own Sonar plugin to achieve whatever you want: it is written in Java and it will execute on every analysis you run. This way, in the code of this custom plugin, you can do whatever you want: write the metrics you need to an output file, push them into a third-party system, etc.
Just take a look at how to write a plugin (most probably you will create a Decorator). There are also concrete examples to help you get started faster.

Automating WebTrends analysis

Every week I access server logs processed by WebTrends (for about 7 profiles) and copy ad clickthrough and visitor information into Excel spreadsheets. A lot of it is just accessing certain sections and finding the right title and then copying the unique visitor information.
I tried using WebTrends' built-in query tool, but it is really poorly done (it only uses a drag-and-drop system instead of being text-based), and it has a maximum number of parameters and a maximum query length. As far as I know, the tools in WebTrends are not suitable for my purpose of automating the entire web metrics gathering process.
I've gotten access to the raw server logs, but it seems redundant to parse them given that they are already being processed by WebTrends.
To me it seems very scriptable, but how would I go about doing that? Is screen-scraping an option?
I use ODBC for querying metrics and numbers out of WebTrends. We even fill a scorecard with all key performance metrics.
It's in German, but maybe the idea helps you: http://www.web-scorecard.net/
Michael
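For completeness, querying a WebTrends ODBC source from a script looks roughly like the sketch below; the DSN, table, and column names are placeholders that depend on how the ODBC driver and profile are set up:

    import pyodbc

    # Hypothetical DSN configured for the WebTrends ODBC driver.
    conn = pyodbc.connect("DSN=WebTrendsProfile1")
    cursor = conn.cursor()
    # Table name is a placeholder; the real names come from the profile's report schema.
    cursor.execute("SELECT * FROM AdClickthroughs")
    for row in cursor.fetchmany(10):
        print(row)
    conn.close()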
Which version of WebTrends are you using? Unless this is a very old install, there should be options to schedule these reports to be emailed to you, and also to bookmark queries. Let me know which version it is and I can make some recommendations.