How should I use SPARQL queries and semantic web? - sparql

I am currently building a website which sends SPARQL queries to DBpedia to get information (mostly names of cities which can then be displayed on a map, plus further information about these places such as number of inhabitants, prominent persons, etc.).
I would like your opinion about the general use of SPARQL queries and DBpedia:
is it preferable that I create a database specific to my website with the information I need, and use DBpedia queries only to update this database at regular intervals (e.g. every day)?
or is it OK if my website sends queries to DBpedia every time someone visits the website?
The second option is much easier for me to implement because I don't need to worry about the extra database,
but if everybody does the same, won't it overload the DBpedia servers?

DBpedia is frequently down or unresponsive for the reasons you cite - there can be unanticipated periods of high load on the servers. So the first choice, caching to a local data store, is probably best.

The public DBpedia endpoint, which is made available at no cost to the Web community at large, may not be suited to the use you describe -- being the backend to what seems likely to be a freemium service of some kind.
You can instantiate your own mirror of the DBpedia service on AWS or elsewhere, if you need a more consistently available instance.
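A minimal sketch of the caching approach, assuming Python with the SPARQLWrapper library and a local SQLite cache (the table name and the example query are illustrative, not part of the original answer): a scheduled job refreshes the cache once a day, and the website only ever reads from SQLite.

```python
import sqlite3
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical refresh job: run once a day (cron, scheduler, etc.) so that
# page views never hit the public DBpedia endpoint directly.
def refresh_city_cache(db_path="cities.db"):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?city ?name ?population WHERE {
          ?city a dbo:City ;
                rdfs:label ?name ;
                dbo:populationTotal ?population .
          FILTER (lang(?name) = "en")
        } LIMIT 1000
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cities (uri TEXT PRIMARY KEY, name TEXT, population INTEGER)"
    )
    for row in results["results"]["bindings"]:
        conn.execute(
            "INSERT OR REPLACE INTO cities VALUES (?, ?, ?)",
            (row["city"]["value"], row["name"]["value"], int(row["population"]["value"])),
        )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    refresh_city_cache()
```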

Related

API which provides data from Elastic Search and not SQL

I have a system with large dataset(s) over which I want fast searches, and Elasticsearch is suitable for that. So the data resides in SQL and is synced to ES. There is an obvious small delay in this sync.
Some consumers of this data can work with slightly stale data, for example an API used by the UI through which end users browse the dataset, where a delay of 3-4 seconds is acceptable. An API handler backed by ES is perfect here.
Then there are other consumers of this data (bots) who want to work with real-time data. For almost the same requirements, should I create another API, just like the one for the UI consumer, which gets its data from SQL?
What is the usual best practice here? I'm assuming this is a very common use case.
You should probably stick to creating just a single API and use a query string parameter to decide which of the two data sources to use. This will result in less code to maintain.
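A minimal sketch of that idea, assuming a Flask API where a hypothetical `source` query parameter switches between the two backends (the `query_elasticsearch` and `query_sql` helpers are placeholders for your real data access code):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def query_elasticsearch(term):
    # Placeholder: would call the Elasticsearch client here (slightly stale data).
    return {"source": "es", "term": term, "results": []}

def query_sql(term):
    # Placeholder: would query the relational database here (real-time data).
    return {"source": "sql", "term": term, "results": []}

@app.route("/search")
def search():
    term = request.args.get("q", "")
    # ?source=sql for bots that need real-time data; default is the ES-backed path.
    if request.args.get("source") == "sql":
        return jsonify(query_sql(term))
    return jsonify(query_elasticsearch(term))

if __name__ == "__main__":
    app.run()
```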

Amazon CloudSearch and Amazon Kendra

I was wondering what the main difference is between Amazon CloudSearch and Amazon Kendra. Why are there two different tools from the same company that compete with each other? Both look the same to me, and I am not sure how their features differ or how one is differentiated from the other.
Amazon CloudSearch: Set up, manage, and scale a search solution for your website or application. Amazon CloudSearch enables you to search large collections of data such as web pages, document files, forum posts, or product information. With a few clicks in the AWS Management Console, you can create a search domain, upload the data you want to make searchable to Amazon CloudSearch, and the search service automatically provisions the required technology resources and deploys a highly tuned search index.
Amazon Kendra: A highly accurate and easy-to-use enterprise search service powered by machine learning. It delivers powerful natural language search capabilities to your websites and applications so your end users can more easily find the information they need within the vast amount of content spread across your company.
The key difference between the two services is that Amazon CloudSearch is based on Solr, a keyword engine, while Amazon Kendra is an ML-powered search engine designed to provide more accurate search results over unstructured data such as Word documents, PDFs, HTML, PPTs, and FAQs. Kendra was designed from the ground up to natively handle natural language queries and return specific answers, instead of just lists of documents as keyword engines do.
Another key difference is that in CloudSearch, to upload data to your domain, it must be formatted as a valid JSON or XML batch. Kendra, on the other hand, provides out-of-the-box connectors that allow customers to automatically index content from popular repositories like SharePoint Online, S3, Salesforce, ServiceNow, etc., directly into the Kendra index. So, depending on your use case, Kendra may be a better choice, especially if you're considering the service for enterprise search applications, or even website search where deeper language understanding is important. Hope this helps, happy to address follow-up questions. You can also visit our Kendra FAQ page for more specific answers around the service: https://aws.amazon.com/kendra/faqs/
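For reference, a minimal sketch of what the CloudSearch side of that difference looks like, assuming Python with boto3 and a hypothetical document endpoint URL; the batch below follows CloudSearch's JSON batch format of add/delete operations:

```python
import json
import boto3

# Hypothetical document endpoint of your CloudSearch domain.
client = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://doc-example-domain.us-east-1.cloudsearch.amazonaws.com",
)

# CloudSearch expects a batch of "add"/"delete" operations as JSON (or XML).
batch = [
    {
        "type": "add",
        "id": "doc-1",
        "fields": {"title": "Example page", "content": "Searchable text goes here."},
    }
]

response = client.upload_documents(
    documents=json.dumps(batch).encode("utf-8"),
    contentType="application/json",
)
print(response["status"])
```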

SPARQL Service on the Web

If I have the IRI of an RDF dataset, is there any service on the web which can take the IRI of the dataset and a SPARQL query and return me the SPARQL results?
If I run Apache Fuseki on my server, can I configure it so that it can take any IRI as the default dataset and perform queries on it?
There are a few such services - there is a reason they aren't common.
Loading a dataset of any size is expensive compared to the query. Loading it for a single query is only practical if the dataset is small (say, a few thousand triples at most).
If there isn't a SPARQL endpoint for the dataset you should consider setting up your own server with the data already loaded.
If you need a "load-and-query" service, Apache Jena Fuseki1 provides this facility at /sparql.html (the form) and /sparql (the endpoint). Fuseki2 does not provide this feature; there is no configuration support for it.
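A minimal sketch of what such a "load-and-query" request looks like over the standard SPARQL protocol, assuming Python with requests and a hypothetical endpoint that dereferences FROM URIs (Fuseki1's /sparql service behaves roughly this way; most public endpoints do not):

```python
import requests

ENDPOINT = "http://localhost:3030/sparql"        # hypothetical load-and-query endpoint
DATASET_IRI = "http://example.org/data.ttl"      # hypothetical dataset IRI

# The FROM clause names the dataset IRI; a load-and-query service fetches and
# loads it before evaluating the query. Only practical for small datasets.
query = f"""
SELECT ?s ?p ?o
FROM <{DATASET_IRI}>
WHERE {{ ?s ?p ?o }}
LIMIT 10
"""

resp = requests.get(
    ENDPOINT,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```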
OpenLink Software (my employer) hosts a couple of services that might be relevant.
Both of these offer Faceted Browsing over their data (at /fct), as well as the typical SPARQL interface (at /sparql), and various other Virtuoso-powered services.
The LOD Cloud Cache has been loaded with DBpedia and nearly every other dataset from the LOD Cloud diagram that makes a suitable dump available.
URI Burner imports data from submitted URIs, with RDFizing as needed via the built-in Virtuoso Sponger.
If you want/need immediate and/or large RDF imports, and/or want to make demanding queries over the data that's been loaded, you may need to set up a relationship with us — as the default restrictions may block your desired activity. That said, just identifying yourself by logging in with any of several supported authenticating services (Twitter, LinkedIn, OpenID, WebID, etc.) lets you do a lot — and it may be enough for you.

Advantage of LDAP over RDBMS?

I have an application with a backend as database.
The application follows a sort of PUB-SUB model where users post changes to the application and other peers subscribe to those changes. These changes may happen very frequently or periodically, and all of them have to be written to the database.
Now, I am being asked to look into the possibility of replacing this RDBMS with LDAP. They probably want a unified DB for all applications, but in any case I have to find the advantages/disadvantages of both approaches.
I cannot directly compare an RDBMS with LDAP as I have almost no idea of LDAP, though I have tried to learn about it.
I understand that LDAP is designed for directory access and is optimized for read access, so it is write-once, read-many. I have read that frequent writes will reduce the performance of an LDAP server, as each write triggers the indexing process.
To give a scenario regarding indexing in LDAP: my table has a few columns, say two, viz. Name and Desc. In LDAP I suppose these would become two attributes, Name and Desc. In my scenario it is Desc which will be frequently updated. I assume Name will be indexed, so even if Desc changes frequently it won't trigger the indexing process.
A point worth mentioning is that the database will be hosted on some cloud platform.
I tried to find out the differences, but could not find anything conclusive.
LDAP is a protocol; REST is a service based on HTTP (a protocol). So if the LDAP server should not be exposed to the internet, how do you intend to get the data from it? Since LDAP is the protocol, you would need direct access to the LDAP server. It is like a database server that you would not expose directly to the internet: you would build an interface to encapsulate it, and that might as well be a REST interface.
The point I'd try to get across is that one is the transfer protocol to a storage backend and the other is the public interface to its data. It is a bit like asking why MySQL is better than a web interface: you would never make the MySQL server publicly available, but would encapsulate its protocol in an application.
REST is an interface. It doesn't matter how you organize your data behind that interface. When you decide you want to organize it differently, you can do so without the consumer of your API noticing any change, and you can provide different versions of your API as your service improves.
LDAP, on the other hand, is an implementation. You can't change the way your data is handled without the consumer noticing, so there is no way to rearrange your backend without affecting the consumer.
With REST you can therefore change the backend from MySQL to PostgreSQL, or even to LDAP, without the consumer noticing, which you won't be able to do if you expose LDAP directly.
Hope that helps.
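To make that concrete, a minimal sketch of encapsulating an LDAP server behind a REST interface, assuming Python with Flask and the ldap3 library (the server address, base DN, and attribute names are illustrative, not from the original post):

```python
from flask import Flask, jsonify
from ldap3 import Server, Connection, ALL

app = Flask(__name__)

LDAP_URL = "ldap://ldap.internal.example.com"   # internal server, never exposed directly
BASE_DN = "ou=people,dc=example,dc=com"

@app.route("/people/<name>")
def get_person(name):
    # The REST layer is the only thing reachable from outside; it speaks LDAP internally.
    # (In real code, escape `name` before putting it into the search filter.)
    server = Server(LDAP_URL, get_info=ALL)
    conn = Connection(server, auto_bind=True)
    conn.search(BASE_DN, f"(cn={name})", attributes=["cn", "description"])
    results = [{"name": str(e.cn), "desc": str(e.description)} for e in conn.entries]
    conn.unbind()
    return jsonify(results)

if __name__ == "__main__":
    app.run()
```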
Now that we finally know what you're actually asking, which has nothing to do with your title, the body of your question, or REST, the simple answer is that there is no particular reason to believe that an LDAP server will perform significantly better than an RDBMS in this application, with two riders:
it may not even be feasible, due to the schema issue, and
if it is feasible it may not be semantically suitable, due to the lack of ACID properties, lack of JOINs, and the other issues mentioned in comments.
I will state that this is one of the worst formulated questions I have seen here for some considerable time, and the difficulty of extracting the actual question was extreme.

What are the data differences between live.dbpedia.org, dbpedia.org, and the dbpedia data dump?

I understand that live.dbpedia.org is closer to a real-time version of the dbpedia.org data, but that invites the question: how often is the regular DBpedia extraction/update process run? How often are the data dumps updated? Also, it has been said that the main endpoint incorporates other datasets in addition to what is extracted from Wikipedia.
What are the differences in data between dbpedia.org, live.dbpedia.org, and the data dumps?
I did some research on DBpedia for a project and I am going to share what I found out:
http://dbpedia.org/sparql: This endpoint is using most of the Datasets from DBpedia Downloads 2014 (WayBackMachineLink). For the complete list of the Datasets it is using and a little more information go to this site: Datasets Loaded 2014 (WayBackMachineLink). So how often are the Downloads updated? See the changelog of the Downloads.
http://live.dbpedia.org/sparql: This endpoint is using the data from DBpedia Live. If you take a look at the live changesets you can see that sometimes it is updated at least every hour, and sometimes, as in September 2014, just once a month. DBpedia says about this:
Q: The live-updates of DBpedia (changesets) have the structure year/month/day/hour/xxxx.nt.gz. What does it mean if there are some gaps in between, e.g. a folder for some hour is missing?
A: This means that the service was down at that time.
And DBpedia live - 3. new features (WayBackMachineLink) says:
5. Development of synchronization tool: The synchronization tool enables a DBpedia Live mirror to stay in synchronization with our live endpoint. It downloads the changeset files sequentially, decompresses them, and integrates them with another DBpedia Live mirror.
So I think that if you apply the changesets as they are published, you stay in sync with the live endpoint, since the live endpoint is applying the same changesets.
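As an illustration of what consuming those changesets involves, here is a minimal sketch in Python, assuming requests and a hypothetical changeset URL following the year/month/day/hour/xxxx.nt.gz structure described above (the exact host and file name are assumptions):

```python
import gzip
import requests

# Hypothetical changeset file; real paths follow year/month/day/hour/xxxx.nt.gz
# under the DBpedia Live changesets directory.
CHANGESET_URL = "https://example.org/dbpedia-live/changesets/2014/09/01/00/000000.added.nt.gz"

resp = requests.get(CHANGESET_URL)
resp.raise_for_status()

# Decompress the gzipped N-Triples and print the added triples; a mirror would
# instead load them (and delete the triples listed in the matching .removed file).
triples = gzip.decompress(resp.content).decode("utf-8")
for line in triples.splitlines():
    if line.strip():
        print(line)
```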