Limit records returned by range - sparql

Hi, I have a SPARQL query that returns all the triples in a graph. It takes some time to run, so I was wondering whether I can limit the number of triples and rerun the query every time the user presses Next in the interface: first the user sees the first 10, then the next 10, and so on. Right now I fetch all the results, save them, and then traverse them. Can I just get the first 10, then the next 10, etc.?

For a SELECT query, to get only the first ten rows:
SELECT ...
WHERE {
...
}
LIMIT 10
To get the next ten:
SELECT ...
WHERE {
...
}
OFFSET 10 LIMIT 10
For the next ten, increase the OFFSET to 20, and so on.
You say that your query is returning triples, so is it a CONSTRUCT query? LIMIT and OFFSET also work with CONSTRUCT, but the number of triples returned will depend on the number of triple patterns in the construct template.
Edit: When using LIMIT and OFFSET, on some SPARQL stores you will need to also use ORDER BY ?x, where ?x is one of the query variables. This ensures a predictable order of the results.
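The paging pattern above can be generated programmatically. This is a minimal sketch in Python, not a definitive implementation: the helper name `page_query`, the placeholder query in `BASE`, and the choice of `?s ?p ?o` as ordering variables are all assumptions made for illustration.

```python
def page_query(select_body: str, order_by: str, page: int, page_size: int = 10) -> str:
    """Append ORDER BY / LIMIT / OFFSET to a query for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size  # page 0 -> OFFSET 0, page 1 -> OFFSET 10, ...
    return (
        f"{select_body}\n"
        f"ORDER BY {order_by}\n"
        f"LIMIT {page_size}\n"
        f"OFFSET {offset}"
    )

# Hypothetical query body that lists every triple in the default graph.
BASE = "SELECT ?s ?p ?o\nWHERE {\n  ?s ?p ?o .\n}"

first_page = page_query(BASE, "?s ?p ?o", page=0)
third_page = page_query(BASE, "?s ?p ?o", page=2)
print(third_page)
```

Each press of Next would then send `page_query(BASE, "?s ?p ?o", page=n)` for the next value of `n` to whatever SPARQL client the application already uses.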

Related

What order does Wikidata's SPARQL endpoint use when there is no `ORDER BY` clause?

The following query loads twenty cities from Wikidata, together with the country or state of which they are the capital:
SELECT ?item ?capitalOf WHERE {
?item wdt:P31/wdt:P279* wd:Q515.
OPTIONAL { ?item wdt:P1376 ?capitalOf. }
}
LIMIT 20 OFFSET 0
The results are Q60 (New York), Q62 (San Francisco), Q64 (Berlin), Q84 (London) etc.
Now set the OFFSET parameter to 1, and you get the same list starting at Q62. Index 0 is omitted, as expected. With OFFSET set to 2, you get the same list starting at index 2.
However, sometimes you get a completely different result. For example, I just got the list Q2807, Q2861, Q2900 ... when using OFFSET 70, but this list didn't overlap the list from OFFSET 60. It seems that there is some randomness in the LIMIT and OFFSET query clauses.
What is the default sort order of SPARQL results?
The reason I'm asking: We are using some queries that need to be loaded with LIMIT and OFFSET, because the number of results is so big. Moreover, these queries run into timeouts when using an ORDER BY.
Just as with SQL, there is no default sort order of SPARQL results.
The SPARQL processor is allowed to return solutions in any order, which may (but typically will not) vary with each execution of the query.
The only way to be certain of the order of solutions is to include an ORDER BY clause.
When running expensive (e.g., long-running) queries, the best way to address this is to set up your own local repository/processor/endpoint, instead of using a public endpoint, such as that at wikidata.org.
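To see why paging without ORDER BY can lose rows, here is a small simulation in Python. It is a deliberately contrived sketch: `fetch_page` is a made-up stand-in for an endpoint that returns solutions in one order on even-numbered requests and in reverse order on odd-numbered ones. Real stores will not behave exactly like this, but they are allowed to vary the order between requests.

```python
def fetch_page(rows, offset, limit, call_no, ordered):
    """Simulate one LIMIT/OFFSET request against a store."""
    if ordered:
        solutions = sorted(rows)           # ORDER BY: same order on every request
    elif call_no % 2 == 0:
        solutions = list(rows)             # one arbitrary order
    else:
        solutions = list(reversed(rows))   # a different arbitrary order
    return solutions[offset:offset + limit]

def fetch_all(rows, page_size, ordered):
    """Collect every page by stepping OFFSET in increments of page_size."""
    collected = []
    for call_no, offset in enumerate(range(0, len(rows), page_size)):
        collected.extend(fetch_page(rows, offset, page_size, call_no, ordered))
    return collected

rows = list(range(20))
unordered = fetch_all(rows, 10, ordered=False)
ordered = fetch_all(rows, 10, ordered=True)

print(sorted(set(unordered)))  # only 0..9: rows 10..19 were never delivered
print(ordered == rows)         # True: every row exactly once
```

Without a fixed order, the two pages here overlap completely, so half the rows are duplicated and the other half never arrive.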

OFFSET in sparql

I have a query that counts the number of records; it returns 129980:
SELECT count distinct ?url
WHERE {
?url a dbo:Film.
}
Because SPARQL returns only 10000 records each time, I have to use "offset":
SELECT distinct ?url
WHERE {
?url a dbo:Film.
}limit 10000 offset 1000
Question: if I want to fetch all the records, I need to set offset = 12. But why, when I set offset = 1000, do I still get 10000 records?
Thank you so much for responding. I appreciate your help.
Note that your first query uses invalid SPARQL syntax. You only get a result because the engine you're querying (if you're querying DBpedia, as it appears, that's Virtuoso) is very forgiving of many errors. Correct and complete syntax would be --
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ( COUNT ( DISTINCT ?url ) AS ?HowManyFilms )
WHERE {
?url a dbo:Film .
}
Things to know, for your second query --
OFFSET means "skip this many rows from the total result set"
LIMIT means "only give me this many rows (starting after any OFFSET)"
Rows may be delivered in any order, and this ordering may change from query-to-query, if you don't include an ORDER BY. This can mean that multiple queries with different OFFSET may not get you all rows, and may deliver duplicate rows, when all the partial result sets are combined. So -- anytime you're using OFFSET and/or LIMIT, it's best practice to also use an ORDER BY.
All together, add this to the second query to get the first 10,000 rows--
ORDER BY ?url LIMIT 10000 OFFSET 0
-- and this to get the last 9,980 rows --
ORDER BY ?url LIMIT 10000 OFFSET 120000
I leave the intermediary queries for you...
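The intermediary OFFSET values follow directly from the total of 129980 rows and the page size of 10000. A short Python sketch of that arithmetic (the helper name `page_windows` is made up for illustration):

```python
def page_windows(total_rows, page_size):
    """Yield (offset, rows_on_page) for each LIMIT/OFFSET request."""
    for offset in range(0, total_rows, page_size):
        yield offset, min(page_size, total_rows - offset)

windows = list(page_windows(129980, 10000))

print(len(windows))   # 13 requests in total
print(windows[0])     # (0, 10000)
print(windows[-1])    # (120000, 9980)

# The clause to append to the SELECT DISTINCT query for each request.
for offset, _ in windows:
    print(f"ORDER BY ?url LIMIT 10000 OFFSET {offset}")
```

Note that OFFSET counts rows, not pages, so the values step by 10000 rather than by 1.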

PDO SQL issue displaying multiple rows when using COUNT()

To display my results from PDO, I always use following PHP code for example:
$STH = $DBH->prepare("SELECT logo_id, guess_count, guessed, count(id) AS counter FROM guess WHERE user_id=:id");
$STH->bindParam(":id",$loginuser['id']);
$STH->execute();
while($row = $STH->fetch()){
print_r($row);
}
Now the issue is that I only get one result. I used to use $STH->rowCount() to check the amount of rows returned, but this method isn't really advised for SELECT statements because in some databases it doesn't react correctly. So I used the count(id) AS counter, but now I only get one result every time, even though the value of $row['counter'] is larger than one.
What is the correct way to count the amount of results in one query?
If you want to check the number of rows that are returned by a query, there are a couple of options.
You could do a ->fetchAll to get an array of all rows; checking the length of the array is trivial. This isn't advisable for large result sets (i.e. a lot of rows returned by the query). You could add a LIMIT clause to your query to avoid returning more than a certain number of rows; if what you are checking is whether you get more than one row back, you would only need to retrieve two rows.
Another option is to run another, separate query to get the count, e.g.
SELECT COUNT(1) AS counter FROM guess WHERE user_id=:id
But, that approach requires another round trip to the database.
And the old standby SQL_CALC_FOUND_ROWS is another option, though that too can have problematic performance with large sets.
You could also just add a loop counter in your existing code:
$i = 0;
while($row = $STH->fetch()){
$i++;
print_r($row);
}
print "fetched row count: ".$i;
If what you need is an exact count of the number of rows that satisfy a particular predicate PRIOR to running a query to return the rows, then the separate COUNT(1) query is likely the most suitable approach. Yes, it's extra code in your app; I recommend you preface the code with a comment that indicates the purpose of the code... to get an exact count of rows that satisfy a set of predicates, prior to running a query that will retrieve the rows.
If I had to process the rows anyway, and adding LIMIT 0,100 to the query was acceptable, I would go for the ->fetchAll(), get the count from the length of the array, and process the rows from the array.
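The three counting options above can be compared side by side. This sketch uses Python's built-in sqlite3 module instead of PDO and MySQL so that it runs without a server; the table layout and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guess (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "logo_id INTEGER, guess_count INTEGER, guessed INTEGER)"
)
conn.executemany(
    "INSERT INTO guess (user_id, logo_id, guess_count, guessed) VALUES (?, ?, ?, ?)",
    [(1, 10, 3, 1), (1, 11, 1, 0), (1, 12, 5, 1), (2, 10, 2, 1)],  # made-up data
)
params = {"id": 1}
select_rows = "SELECT logo_id, guess_count, guessed FROM guess WHERE user_id = :id"

# Option 1: a separate COUNT query (one extra round trip to the database).
(counted,) = conn.execute(
    "SELECT COUNT(1) AS counter FROM guess WHERE user_id = :id", params
).fetchone()

# Option 2: fetch all rows and take the length of the resulting list.
rows = conn.execute(select_rows, params).fetchall()

# Option 3: count rows while iterating over the cursor.
looped = 0
for row in conn.execute(select_rows, params):
    looped += 1

print(counted, len(rows), looped)  # 3 3 3
```

Only option 1 gives the count before any rows are fetched; the other two give it as a by-product of reading the rows.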
You have to use GROUP BY. Your query should look like
SELECT logo_id, guess_count, guessed, COUNT(id) AS counter
FROM guess
WHERE user_id=:id
GROUP BY logo_id, guess_count, guessed
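The effect of the missing GROUP BY can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for the MySQL database in the question, and the table and rows are invented. SQLite, like MySQL in its permissive mode, accepts the ungrouped query and collapses it to a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guess (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "logo_id INTEGER, guess_count INTEGER, guessed INTEGER)"
)
conn.executemany(
    "INSERT INTO guess (user_id, logo_id, guess_count, guessed) VALUES (?, ?, ?, ?)",
    [(1, 10, 3, 1), (1, 10, 3, 1), (1, 11, 1, 0), (2, 10, 2, 1)],  # made-up data
)
params = {"id": 1}

# COUNT() without GROUP BY aggregates all matching rows into one result row.
ungrouped = conn.execute(
    "SELECT logo_id, guess_count, guessed, COUNT(id) AS counter "
    "FROM guess WHERE user_id = :id",
    params,
).fetchall()

# With GROUP BY, there is one result row per distinct combination of columns.
grouped = conn.execute(
    "SELECT logo_id, guess_count, guessed, COUNT(id) AS counter "
    "FROM guess WHERE user_id = :id "
    "GROUP BY logo_id, guess_count, guessed ORDER BY logo_id",
    params,
).fetchall()

print(len(ungrouped), ungrouped[0][3])  # 1 3
print(grouped)                          # [(10, 3, 1, 2), (11, 1, 0, 1)]
```

Note that with GROUP BY the counter is a per-group count, not the total number of rows returned; the per-group counters add up to the total.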

rowcount before and after query

Can someone explain the difference between these 2 simple queries ...
SET ROWCOUNT = 10
select * from t_Account order by account_date_last_maintenance
and this one
select * from t_Account order by account_date_last_maintenance
SET ROWCOUNT = 10
when executed both return only 10 rows, but the rows are different. There are millions of rows in the table if that matters. Also, the first query runs consistently 20% longer.
Thanks everyone
When you execute SET ROWCOUNT 10 you are telling SQL to stop the query after 10 results are returned. Your first SQL statement is the correct syntax (with the exception of the first line which should read SET ROWCOUNT 10).
The second statement as written will return all of the values ordered when initially executed and then set the row count to 10, so any subsequent execution will return the first 10 items.
ROWCOUNT must be set to 0 to get things back to "normal" execution.
As to why the results differed: the data might not be processed the same way every time, and given the size of your dataset you might sometimes get matching results, but it is not a sure thing. If you want consistent results and only want the first 10 rows, I would recommend using TOP.

How to stop looking in a database after X rows are found?

I have a query to a database that returns a number X of results. I am looking to return a maximum of 10 results. Is there a way to do this without using LIMIT 0,9? I'll use LIMIT if I have to, but I'd rather use something else that will literally stop the searching, rather than look at all rows and then only return the top 10.
If you're not doing any ordering in the query, then "LIMIT 10" will stop after the first 10 matching rows. But as soon as you do ordering, then all the rows will have to be processed, sorted, and THEN the first 10 rows will be returned.
Think of it as the difference between "ok, first 10 people in line can come in", and "ok, the 10 tallest people in line can come in". First one you just open the door and let in 10 people. Second one you have to go through the whole line to find the 10 tallest people.
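The analogy can be made concrete by counting how many rows are actually read in each case. This Python sketch only mirrors the analogy; it says nothing about how a particular database plans a query. The `scan` generator and the list of heights are invented for illustration.

```python
import heapq
from itertools import islice

def scan(rows, seen):
    """Yield rows one at a time, counting how many have been read."""
    for row in rows:
        seen[0] += 1
        yield row

heights = [150 + (i * 37) % 60 for i in range(1000)]  # made-up data

# "First 10 people in line": stop reading as soon as 10 rows are found.
seen_plain = [0]
first_ten = list(islice(scan(heights, seen_plain), 10))

# "10 tallest people in line": every row must be examined before answering.
seen_sorted = [0]
tallest_ten = heapq.nlargest(10, scan(heights, seen_sorted))

print(seen_plain[0])   # 10
print(seen_sorted[0])  # 1000
```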
The LIMIT syntax is the most reliable means of returning a specified number of rows. For ten rows, use:
LIMIT 10
Using LIMIT 0, 9 will return 9 rows, not 10. And there's no benefit to specifying the offset if you're going to start at zero anyways.
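The row counts are easy to check. This sketch uses Python's built-in sqlite3 module, which accepts the same `LIMIT offset, count` form; the original question is presumably about MySQL, so treat SQLite here as a stand-in, and the table `t` as invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(50)])

# In the comma form, the first number is the offset and the second is the row count.
nine = conn.execute("SELECT n FROM t LIMIT 0, 9").fetchall()
ten = conn.execute("SELECT n FROM t LIMIT 10").fetchall()

print(len(nine), len(ten))  # 9 10
```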