I run a SPARQL query against GraphDB through its REST API and I get only the first 1000 rows back. I am not using a LIMIT clause at the end of my query. The documentation says this number is a default result set limit, but how can it be overridden without downloading the data? Downloading is the solution the GraphDB team proposes.
Some SPARQL endpoints limit how much data you can fetch with one query. The limit varies from endpoint to endpoint; it is often around 10,000 results. You can still use pagination to get more results: in SPARQL, combine LIMIT with OFFSET. Documentation on how to use it is here: https://www.w3.org/TR/rdf-sparql-query/#modOffset
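As a rough illustration of the pagination idea, here is a minimal Python sketch that appends ORDER BY, LIMIT and OFFSET to a query. The function name `paged_query` and its parameters are made up for this example, and it assumes the base query has no solution modifiers of its own. The ORDER BY is there because, without a fixed ordering, pages are not guaranteed to be consistent between requests.

```python
def paged_query(base_query: str, page_size: int, page: int, order_by: str) -> str:
    """Return base_query with ORDER BY / LIMIT / OFFSET appended for one page.

    Hypothetical helper: assumes base_query is a SELECT or CONSTRUCT query
    that does not already end in ORDER BY, LIMIT or OFFSET.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if page < 0:
        raise ValueError("page must be zero or greater")
    offset = page * page_size  # page 0 starts at the first row
    return (
        f"{base_query.rstrip()}\n"
        f"ORDER BY {order_by}\n"
        f"LIMIT {page_size}\n"
        f"OFFSET {offset}"
    )


if __name__ == "__main__":
    query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"
    # Third page (index 2) of 1000 rows each, i.e. rows 2000 to 2999
    print(paged_query(query, page_size=1000, page=2, order_by="?s ?p ?o"))
```

You would send each generated query in turn and stop once a page comes back with fewer rows than the page size.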
I need to get Wikidata artifacts (instance-types, redirects and disambiguations) for a project.
Because the original Wikidata endpoint imposes time constraints on queries, I have turned to the Virtuoso Wikidata endpoint.
The problem is that when I try to get, for example, the redirects with this query, it returns at most 100,000 results:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT {?resource owl:sameAs ?resource2}
WHERE
{
?resource owl:sameAs ?resource2
}
I'm writing to ask if you know of any way to get more than 100,000 results. I would like to be able to achieve the maximum number of possible results.
Once the results are obtained, I must have 3 files (or as few files as possible) in the N-Triples format: wikidata_intance_types.nt, wikidata_redirecions.nt and wikidata_disambiguations.nt.
Thank you very much in advance.
All the best,
Jose Manuel
Please recognize that in both cases (Wikidata itself, and the Virtuoso instance provided by OpenLink Software, my employer), you are querying against a shared resource, and various limits should be expected.
You should space your queries out over time, and consider smaller chunks than the 100,000 limit you've run into -- perhaps 50,000 at a time, waiting for each query to finish retrieving results, plus another second or ten, before issuing the next query.
Most of the guidance in this article about working with the DBpedia public SPARQL endpoint is relevant for any public SPARQL endpoint, especially those powered by Virtuoso. Specific settings on other endpoints will vary, but if you try to be friendly (by limiting the rate of your queries; limiting the size of partial result sets when using ORDER BY, LIMIT, and OFFSET to step through to get a full result set for a query that overflows the instance's maximum result set size; and the like), you'll be far more successful.
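The "friendly" stepping described above could look roughly like the following Python sketch. It is not tied to any particular endpoint or client library: `fetch_page` is a hypothetical callable that you supply, which runs the query with the given LIMIT and OFFSET and returns the rows. The default chunk size and pause are taken from the suggestion above, not from any endpoint setting.

```python
import time
from typing import Callable, Iterator, List


def fetch_all(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 50_000,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield all rows, one page at a time, pausing between requests.

    fetch_page(limit, offset) is a caller-supplied (hypothetical) function
    that executes the query with those modifiers and returns a list of rows.
    """
    offset = 0
    while True:
        rows = fetch_page(page_size, offset)  # wait for this page to finish
        yield from rows
        if len(rows) < page_size:
            break  # a short (or empty) page means nothing is left
        offset += page_size
        sleep(pause_seconds)  # give the shared endpoint a rest


if __name__ == "__main__":
    data = [{"n": i} for i in range(7)]
    fake_fetch = lambda limit, offset: data[offset:offset + limit]
    print(list(fetch_all(fake_fetch, page_size=3, pause_seconds=0.0)))
```

Passing `sleep` as a parameter is only a convenience for testing; in normal use the default `time.sleep` applies.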
You can get and host your own copy of Wikidata, as explained in
https://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData
There are also alternatives for getting a partial dump of Wikidata, e.g. with https://github.com/bennofs/wdumper
Or ask for access to one of the non-public copies we run by sending me a personal e-mail via my RWTH Aachen i5 account.
A LINQ query returns 17,000 records. An OData controller's Get() returns this query, but it takes more than 6 minutes. How can I reduce the time?
You can try adding paging logic to your query. The idea is to fetch only a fixed number of records per request instead of all of them, which may reduce your query time.
Note that this solution works only if your client is also able to support the paging logic.
Refer to these articles for possible implementations:
OData:
https://learn.microsoft.com/en-us/aspnet/web-api/overview/odata-support-in-aspnet-web-api/supporting-odata-query-options#server-paging
SQL:
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/paging-through-a-query-result
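On the client side, supporting paging mostly means following the "next page" link until there is none. Below is a minimal Python sketch of that loop. It assumes OData v4-style JSON, where each page carries its rows under `value` and the link to the next page under `@odata.nextLink` (older protocol versions name these differently). `get_json` is a hypothetical callable you provide that fetches a URL and returns the parsed JSON body.

```python
from typing import Callable, Dict, Iterator


def iter_odata(first_url: str, get_json: Callable[[str], Dict]) -> Iterator[dict]:
    """Yield every record from a server-paged OData feed.

    Assumes OData v4 JSON ("value" and "@odata.nextLink" keys).
    get_json(url) is a caller-supplied function returning the parsed page.
    """
    url = first_url
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        # The server omits the next link on the last page
        url = page.get("@odata.nextLink")


if __name__ == "__main__":
    # Fake two-page feed standing in for a real service (made-up URLs)
    pages = {
        "http://example.invalid/odata/Items": {
            "value": [{"Id": 1}, {"Id": 2}],
            "@odata.nextLink": "http://example.invalid/odata/Items?$skip=2",
        },
        "http://example.invalid/odata/Items?$skip=2": {"value": [{"Id": 3}]},
    }
    print(list(iter_odata("http://example.invalid/odata/Items", pages.__getitem__)))
```

Because this is a generator, the client can start processing the first page while later pages have not been requested yet.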
I'm trying to filter by datatype in DBpedia. For example:
SELECT *
WHERE {?s ?p ?o .
FILTER ( datatype(?o) = xsd:integer)
}
LIMIT 10
But I get no results, while there are certainly integer values. I get the same from other endpoints using Virtuoso, but I do get results from alternative endpoints. What could be the problem? If Virtuoso does not implement this SPARQL function properly, what to use instead?
This is an expensive query, since it has to traverse all triples (?s ?p ?o .). The query's execution time exceeds the maximum time configured for the Virtuoso instance that serves DBpedia's SPARQL endpoint at http://dbpedia.org/sparql.
If you don't use the timeout parameter, you will get a timeout error (Virtuoso S1T00 Error SR171: Transaction timed out). When you use the timeout (by default set to 30000 for the DBpedia endpoint), you will get incomplete results, and the response will contain HTTP headers like these:
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout. Activity: 1.389M rnd 5.146M seq 0 same seg 55.39K same pg 195 same par 0 disk 0 spec disk 0B / 0 me
An empty result may therefore simply be incomplete; it does not necessarily indicate that there are no xsd:integer literals in DBpedia. A related discussion about partial results in Virtuoso can be found here.
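If you query the endpoint from code, you can check for this condition instead of trusting an empty result. Here is a small Python sketch that only looks at the X-SQL-State header value shown above. The function name `is_partial_result` is made up for this example, and `headers` stands for whatever mapping of response headers your HTTP client gives you.

```python
from typing import Mapping


def is_partial_result(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers flag an interrupted query.

    Checks only for the X-SQL-State value S1TAT quoted above; header names
    are compared case-insensitively, as HTTP header names are.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-sql-state", "").strip().upper() == "S1TAT"


if __name__ == "__main__":
    sample = {
        "Content-Type": "application/sparql-results+json",
        "X-SQL-State": "S1TAT",
    }
    if is_partial_result(sample):
        print("Result set is incomplete; do not treat it as the full answer.")
```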
To run a query like yours in full, you can load DBpedia from its dumps and analyze it locally.
As a side note, your query is syntactically invalid, because it's missing the namespace declaration for the xsd prefix. You can check the syntax of SPARQL queries via SPARQLer Query Validator. You can find the namespaces for common prefixes using Prefix.cc. The Virtuoso instance that provides the DBpedia SPARQL endpoint tolerates the missing declaration for the xsd prefix, but it's good practice to stick to syntactically valid SPARQL queries for greater interoperability.
I use CouchDB 1.5.0 and noticed a strange thing:
When I query some API action, for example:
curl -X GET "http://localhost:5984/mydb/_changes?limit=1"
I get the same result with limit=1, with limit=0 and with limit=-55. In all cases it is one row from the start of the list.
PostgreSQL, by contrast, returns:
Zero rows when LIMIT 0
Message ERROR: LIMIT must not be negative when LIMIT -55
My question is mainly about API design. I would like to know your opinions.
Is it a flaw, or is it good/acceptable practice?
This is how the _changes API is designed. If you do not specify the type of feed (e.g. longpoll, continuous), the default is to return a list of all the changes in a single results array.
If you want a row-by-row result of the changes in the database, specify the type of feed in the URL like so:
curl -X GET "http://localhost:5984/mydb/_changes?feed=continuous"
Another point to note is that in the _changes API, using 0 in the limit parameter has the same effect as using 1.
Is there a way to fetch results from LDAP using a count and an offset?
Entries returned to search requests are never ordered in any way, nor are attributes, attribute options, or attribute values. Some servers support VLV and server-side sorting, however.
For more information, see LDAP search.