RDF4J rdf lucene configuration - lucene

I have been trying for some time to configure my Sesame RDF repository (now called RDF4J) to support full-text queries.
I did not find much documentation about this configuration. I think I need to create a template file that I can then use with the console. Here is the little information I found on the topic: https://groups.google.com/forum/#!topic/rdf4j-users/xw2UJCziKl8
Does anybody know how to configure RDF4J with Lucene? Any clue would be much appreciated. Otherwise, I would consider switching the whole repository to another store, for example Virtuoso.
Thanks in advance.

You need to perform these steps:
Start rdf4j-server. I used rdf4j-server.war (and rdf4j-workbench.war). My URL was http://127.0.0.1:8080/rdf4j-server
Put a lucene.ttl template file into ~/.RDF4J/console/templates, where "~" is your home directory.
Correct the settings in this file.
Then start the console from the RDF4J distribution.
Then execute the following commands in the console:
connect http://127.0.0.1:8080/rdf4j-server
create lucene
Enter lucene repository Id: myRepositoryId
Enter lucene repository name: myRepositoryName
You can then see the created repository at http://127.0.0.1:8080/rdf4j-workbench. Once you add some triples, you can use Lucene search.
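Once the repository exists, a full-text search is expressed as a SPARQL query that uses the LuceneSail search predicates. The following is a minimal Python sketch, assuming the repository is reachable at the standard RDF4J server path http://127.0.0.1:8080/rdf4j-server/repositories/myRepositoryId. The function names build_lucene_query and run_query are hypothetical helpers made up for illustration, not part of RDF4J.

```python
import json
import urllib.parse
import urllib.request

# Namespace of the LuceneSail search vocabulary.
LUCENE_PREFIX = "http://www.openrdf.org/contrib/lucenesail#"


def build_lucene_query(text, limit=10):
    """Build a SPARQL query that asks the Lucene index for resources matching text."""
    # Escape backslashes and double quotes so the text stays a valid SPARQL literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX search: <{LUCENE_PREFIX}>\n"
        "SELECT ?subj ?score ?snippet WHERE {\n"
        "  ?subj search:matches [\n"
        f'      search:query "{escaped}" ;\n'
        "      search:score ?score ;\n"
        "      search:snippet ?snippet ] .\n"
        f"}} LIMIT {int(limit)}"
    )


def run_query(endpoint, query):
    """POST the query to a SPARQL endpoint and return the JSON result bindings."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

With the server from the steps above running, calling run_query("http://127.0.0.1:8080/rdf4j-server/repositories/myRepositoryId", build_lucene_query("person")) should return one binding per matching resource. The same query text can also be pasted into the workbench query page.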

Related

DSE: Synonym Search (Thesaurus)

I am currently trying to get synonym search to work on multiple nodes, via DataStax Studio if possible. The installation was done through OpsCenter with 3 nodes using DSE Search + Graph. I have seen various posts about this, and the approaches consist of directly changing schema.xml. However, as I am using multiple nodes, I am unsure whether that is the right approach.
I have also looked for information in the DataStax docs but couldn't find what I needed, so any advice or direction on this would be greatly appreciated. In an ideal world I would be able to do synonym search through the graph (Gremlin) interface.
For Graph, the only option is to create the necessary indexes, get the existing schema.xml via dsetool get_core_schema, modify it, and load it again with dsetool reload_core. The core must be reloaded in every data center, but not on every machine within a data center.
Don't forget that before reloading the core with the modified schema, you need to upload the synonym files with dsetool write_resource.
See the DSE Search documentation for the full list of options for the dsetool command.
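The "modify it" step can be scripted. Below is a rough Python sketch that adds a synonym filter to the analyzers of one field type in a schema obtained from dsetool get_core_schema. It assumes the field type is named TextField and the synonym file is called synonyms.txt. Both names, the sample schema, and the helper add_synonym_filter are illustrative assumptions, and the filter attributes should be checked against the Solr version bundled with your DSE release.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed schema used only to demonstrate the edit.
SAMPLE_SCHEMA = """<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.TextField" name="TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>"""


def add_synonym_filter(schema_xml, field_type_name, synonyms_file="synonyms.txt"):
    """Return schema_xml with a synonym filter appended to each analyzer of the named field type."""
    root = ET.fromstring(schema_xml)
    for field_type in root.iter("fieldType"):
        if field_type.get("name") != field_type_name:
            continue
        for analyzer in field_type.findall("analyzer"):
            # Skip analyzers that already have a synonym filter.
            if any(f.get("class") == "solr.SynonymFilterFactory"
                   for f in analyzer.findall("filter")):
                continue
            ET.SubElement(analyzer, "filter", {
                "class": "solr.SynonymFilterFactory",
                "synonyms": synonyms_file,
                "ignoreCase": "true",
                "expand": "true",
            })
    return ET.tostring(root, encoding="unicode")
```

The modified XML would then be saved to a file and loaded with dsetool reload_core, after the synonym file itself has been uploaded with dsetool write_resource.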

What is the significance of data-config.xml file in Solr?

When should I use it? How is it configured? Can anyone please explain in detail?
The data-config.xml file is an example configuration file for how to use the DataImportHandler in Solr. It's one way of getting data into Solr, allowing one of the servers to connect through JDBC (or through a few other plugins) to a database server or a set of files and import them into Solr.
DIH has a few issues (for example the non-distributed way it works), so it's usually suggested to write the indexing code yourself (and POST it to Solr from a suitable client, such as SolrJ, Solarium, SolrClient, MySolr, etc.)
It has been mentioned that the DIH functionality really should be moved into a separate application, but that hasn't happened yet as far as I know.
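As an illustration of "writing the indexing code yourself", here is a small Python sketch that posts documents as JSON to Solr's update handler using only the standard library. The base URL http://localhost:8983/solr, the core name, and the field names are assumptions for the example, and build_update_request and index_documents are hypothetical helper names. In practice one of the client libraries mentioned above would usually be a better fit.

```python
import json
import urllib.request


def build_update_request(solr_url, core, docs, commit=True):
    """Build a POST request that sends docs (a list of dicts) to the core's update handler."""
    url = f"{solr_url.rstrip('/')}/{core}/update"
    if commit:
        # Ask Solr to commit right away so the documents become searchable.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_documents(solr_url, core, docs):
    """Send the documents to Solr and return its JSON response."""
    request = build_update_request(solr_url, core, docs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The rows would typically come from your own database query, for example index_documents("http://localhost:8983/solr", "mycore", [{"id": "1", "title": "hello"}]), which lets you run the import from any machine and in parallel if needed.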

Alfresco web-scripts-application-context.xml

I want to register a bean. I'm following this tutorial: http://aboutalfresco.blogspot.com/2010/07/java-backed-web-scripts.html.
But I can't find the file tomcat\webapps\alfresco\WEB-INF\classes\web-scripts-application-context.xml.
Is it deprecated?
Should I create it, or should I use the file alfresco\web-client-application-context.xml, which also defines beans?
What is the difference between these two files?
Alfresco version?
I'm using Alfresco 5.0.d.
In Alfresco 5 many of the context files have been bundled up in jars that you won't be able to access.
Besides, best practice is to create your own custom context file rather than overwrite a system context file. If you don't want to create an AMP, your best bet is to put your bean definition in a file named something like custom-webscripts-context.xml and place it in shared/classes/extension. Alfresco will pick up anything that ends in -context.xml.
Also, please don't follow a 5-year-old tutorial. The tutorials linked in the comments were created by Jeff Potts, the former Community Manager for Alfresco, so they really are the most up-to-date and easiest to follow that you're going to find.

Apache Nutch: Get outlink URL's text context

Does anyone know an efficient way to extract the text context that surrounds an outlink URL? For example, given this sample text containing an outlink:
Nutch can run on a single machine, but gains a lot of its strength from running in a Hadoop cluster. You can download Nutch here.
For more information about Apache Nutch, please see the Nutch wiki.
In this example, I would like to get the sentence containing the link, and a sentence before and after that sentence. Any way to do this efficiently? Any methods I can invoke to get something like the position of the link within a fetched content? Or even a part of the nutch code I can modify to do this? Thanks!
What you want to do is web scraping. Python and Hadoop offer tools for that. To achieve it, you can use selectors.
Here are some examples of how to do that using Python Scrapy:
Selectors
Scrapy Tutorial
On Hadoop, the best way to go is to implement a crawler using selectors:
Web crawl with Hadoop
HiveQL
Cascading can be used to address the URL you specify:
Hadoop and Cascading
After having the data, you can also use R to optimize analysis:
R and Hadoop
Enabling R on Hadoop
If you haven't done anything with Hadoop yet, here is a good starting point. You may also want to have a look at Hue Beeswax, an interactive tool that is very useful for data analysis.
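To make the idea concrete without any particular framework, the sketch below uses only the Python standard library to find each link in a fetched page and return the sentence containing it plus its neighbouring sentences. It is not Nutch or Scrapy code. The class LinkTextCollector, the function link_contexts, and the example.org URLs are made up for illustration, and the sentence splitting is deliberately naive (it breaks on ".", "!" and "?", so abbreviations will confuse it).

```python
import re
from html.parser import HTMLParser

# Sample page built from the text in the question; the href values are placeholders.
SAMPLE_HTML = (
    "<p>Nutch can run on a single machine, but gains a lot of its strength "
    "from running in a Hadoop cluster. You can download Nutch "
    '<a href="http://example.org/download">here</a>. '
    "For more information about Apache Nutch, please see the "
    '<a href="http://example.org/wiki">Nutch wiki</a>.</p>'
)


class LinkTextCollector(HTMLParser):
    """Collect the visible text and remember where each link starts in it."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.length = 0
        self.links = []  # (href, character offset of the anchor text)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append((href, self.length))

    def handle_data(self, data):
        self.parts.append(data)
        self.length += len(data)


def link_contexts(html, window=1):
    """Map each href to its sentence plus `window` sentences on either side."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.parts)
    # Naive sentence spans: text up to terminal punctuation plus trailing whitespace.
    spans = [m.span() for m in re.finditer(r"[^.!?]+[.!?]*\s*", text)]
    contexts = {}
    for href, offset in collector.links:
        for i, (start, end) in enumerate(spans):
            if start <= offset < end:
                lo, hi = max(0, i - window), min(len(spans), i + window + 1)
                contexts[href] = " ".join(text[s:e].strip() for s, e in spans[lo:hi])
                break
    return contexts
```

Calling link_contexts(SAMPLE_HTML) returns a dictionary keyed by URL. Tracking the character offset of each link is what answers the "position of the link within the fetched content" part of the question, and the same approach could be ported to a custom parse filter.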

Lucene and Solr

I am creating the index using Lucene, but using the Solr search engine to search.
My problem is that while indexing, I add each field in my code using
doc.add(fieldname, val, tokenized)
**But my code does not see the schema file. I even need to add copy fields manually.**
Now I want to use the autosuggest feature of Solr, but I do not know how to enable this feature while creating the index.
When I use the SimplePostTool to post the data through Solr, all is fine. However, I cannot do that because I have to get some text from different URLs.
So can someone please advise me how I can achieve this? Sample code would be very helpful. In any case, it would be great to have some code that can see the schema file and use the fieldTypes.
Thanks everyone.
--pramila
See EmbeddedSolrServer at:
http://wiki.apache.org/solr/Solrj
It's a pure Java API to Solr, which will allow you to index your documents using the schema.xml you have defined.