Counting results in SQLite, given query with functions - sql

As you may (or may not) already know, SQLite does not provide information about total number of results from the query. One has to wrap the query in SELECT count(*) FROM (original query); in order to get row count.
This worked perfectly fine for me, until one of users created custom SQL function (you can define your own functions in SQLite) that does INSERT into another, unrelated table. Then he executes query:
SELECT customFunction() FROM primaryTable WHERE primaryKeyColumnId = 1;
The query returns always 1 row, that is certain. It turns out that customFunction() was called twice (and inserted to that other table 2 rows) and that's because my application called his query as usuall and then called count(*) on that query as a followup.
How to approach this problem? How to execute only the original query and still have a row count from SQLite?
I'm using SQLite (3.13.0) C API.

You either have to remove such function calls from the query, or you cannot get the row count before actually having stepped through all the result rows.

Related

using SQL COUNT function or executing search query directly which is more efficient

Let's say i have a very big database , if i execute a search query directly then count the returned rows would it be more faster ? Or using COUNT(searchquery) then start executing query like ->
SELECT *
FROM TABLE
WHERE bla='blabla'
OFFSET 0 FETCH NEXT 20 ROWS ONLY
I searched for it but i couldn't find any solution.
Do the count in the database! It will be much faster.
First, a count(*) only returns one row and one value. That is much, much less data -- and much faster -- than returning all the rows.
Second, a count(*) does not reference any columns in the select, so the query can be better optimized. It might be possible to get the count without ever looking at the data pages.
It looks like you are doing paging. You need the total count to do display the total count and calculate the total number of pages to the user, yes?
Than Gordon's answer is the one to use.

Different result size between SELECT * and SELECT COUNT(*) on Oracle

I have an strange behavior on an oracle database. We make a huge insert of around 3.1 million records. Everything fine so far.
Shortly after the insert finished (around 1 too 10 minutes) I execute two statements.
SELECT COUNT(*) FROM TABLE
SELECT * FROM TABLE
The result from the first statement is fine it gives me the exact number of rows that was inserted.
The result from the second statement is now the problem. Depending on the time, the number of rows that are returned is for example around 500K lower than the result from the first statement. The difference of the two results is decreasing with time.
So I have to wait 15 to 30 minutes before both statements return the same number of rows.
I already talked with the oracle dba about this issue but he has no idea how this could happen.
Any ideas, questions or suggestions?
Update
When I select only an index column I get the correct row count.
When I instead select an non index column I get again the wrong row count.
That doesn't sounds like a bug to me, if I understood you correctly, it just takes time for Oracle to fetch the entire table . After all, 3 Mil is not a small amount.
As opposed to count, which brings 1 record with the total number of rows.
If after some waiting, the number of records being output equals to the number that the count query returns, then everything is fine.
Have you already verified with these things:
1- Count single column instead of * ALL to verify both result
2- You can verify both queries result by adding where clause and gradually select more rows by removing conditions so that you can get the issue where it is returning different value from both.
I think you should check Execution plan to identify missing indexes to improve performance.
Add missing Indexes and check the result.
Why missing Indexes are impotent:
To count row, Oracle engine no need to go throw paging operation. But while fetching all the details from a table, it requires to go through paging.
And paging process depends on indexes created on a table to fetch the data effectively and fast.
So to decrease time for your second statement, you should find missing indexes and create those indexes.
How to Find Missing Indexes:
You can start with DBA_HIST_ACTIVE_SESS_HISTORY, and look at all statements that contain that type of hint.
From there, you can pull the index name coming from that hint, and then do a lookup on dba_indexes to see if the index exists, is valid, etc.

Select of calculated value always returns row

I have a database (running on postgres 9.3) of bookings of resources. This database contains a table reservations which contains beside other values the start and stop time of the reservation (as timestamp with time zone)
Now I need to know how much reservations a given company has currently active in the future in terms of total hours of all these reservations added together.
I have put together the following query that does the job:
SELECT EXTRACT(EPOCH FROM Sum(stop-start))/3600 AS total
FROM (reservations JOIN partners ON partner = email)
WHERE stop > now() AND company = 'givencompany'
This works quite well if the given company has reservations in the future. The problem I am experiencing is that when the company doesnt have any reservations the query does in fact return a row but the collumn total is empty whereas I would like it to return no row at all (or a row containing 0 if nothing is too complicated) in that case.
Is this possible to accomplish with a different SELECT or another modification to the database or does the consuming application have to check for null every time?
Sorry if my question is trivial but I am very new to databases altogether
Edit
I found out that I could default the returned value with 0 by using COALESCE but I would much prefer it if no row would be returned
Short answer: just add HAVING Sum(stop-start) IS NOT NULL at the end of query.
Long answer:
This query has no explicit GROUP BY, but since it aggregates the rows with sum(), it's implicitly turned into a GROUP BY query, with all the rows matching the WHERE condition taken as one group.
See the doc on SELECT :
without GROUP BY, an aggregate produces a single value computed across
all the selected rows
And about the HAVING clause:
The presence of HAVING turns a query into a grouped query even if
there is no GROUP BY clause. This is the same as what happens when the
query contains aggregate functions but no GROUP BY clause. All the
selected rows are considered to form a single group, and the SELECT
list and HAVING clause can only reference table columns from within
aggregate functions. Such a query will emit a single row if the HAVING
condition is true, zero rows if it is not true.

SQL - Using MAX in a WHERE clause

Assume value is an int and the following query is valid:
SELECT blah
FROM table
WHERE attribute = value
Though MAX(expression) returns int, the following is not valid:
SELECT blah
FROM table
WHERE attribute = MAX(expression)
OF course the desired effect can be achieved using a subquery, but my question is why was SQL designed this way - is there some reason why this sort of thing is not allowed? Students coming from programming languages where you can always replace a data-type by a function call that returns that type find this issue confusing. Is there an explanation one can give them rather than just saying "that's the way it is"?
It's just because of the order of operations of a query.
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
WHERE just filters the rows returned by FROM. An aggregate function like MAX() can't have a result returned because it hasn't even been applied to anything.
That's also the reason, why you can't use aliases defined in the SELECT clause in a WHERE clause, but you can use aliases defined in FROM clause.
A where clause checks every row to see if it matches the conditions specified.
A max computes a single value from a row set. If you put a max, or any other aggregate function into a where clause, how can SQL server figure out what rows the max function can use until the where clause has finished it filter?
This deals with the order that SQL Server processes commands in. It runs the WHERE clause before a GROUP BY or any aggregate. Since a where clause runs first, SQL Server can't tell if a row will be included in an aggregate until it processes the where. That is what the HAVING clause is for. HAVING runs after the GROUP BY and the WHERE and can include MAX since you have already filtered out the rows you don't want to use. See http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm for a good explanation of the order in which SQL commands run.
Maybe this work
SELECT blah
FROM table
WHERE attribute = (SELECT MAX(expresion) FROM table1)
The WHERE clause is specifically designed to test conditions against raw data (individual rows of the table). However, MAX is an aggregate function over multiple rows of data. Basically, without a sub-select, the WHERE clause knows nothing about any rows in the table except for the current row. So how can you determine the maximum value over a whole bunch of rows when you don't even know what those rows are?
Yes, it's a little bit of a simplification, especially when dealing with joins, but the same principle applies. WHERE is always row-by-row, so that's all it really knows about.
Even if you have a GROUP BY clause, the WHERE clause still only processes one row at a time in the raw data before grouping. It doesn't know the value of a column in any other rows, so it has no way of knowing which row has the maximum value.
Assuming this is MS SQL Server, the following would work.
SELECT TOP 1 blah
FROM table
ORDER BY expression DESC

Microsoft Access SQL STDEV of COUNT of data

I have a table in MS Access 2010 I'm trying to analyze of people who belong to various groups having completed various jobs. What I would like to do is calculate the standard deviation of the count of the number of jobs each person has completed per group. Meaning, the output I would like is that for each group, I'd have a number that constitutes the standard deviation of how many jobs each person did.
The data is structured like this:
OldGroup, OldPerson, JobID
I know that I need to do a COUNT of the job IDs by Group and Person. I tried creating a subquery to work with, but that didn't work:
SELECT data.OldGroup, STDEV(
SELECT COUNT(data.JobID)
FROM data
WHERE data.Classification = 1
GROUP BY data.OldGroup, data.OldPerson
)
FROM data
GROUP BY data.OldGroup;
This returned an error "At most one record can be returned by this subquery," which I know is wrong, since when I tried to run the subquery as a standalone query it successfully returned more than one record.
Question:
How can I get the STDEV of a COUNT?
Subquestion: If this question can be answered by correcting incorrect syntax in my examples, please do so.
A minor change in strategy that wouldn't work for all cases but did end up working for this one seemed to take care of the problem. Instead of sticking the subquery in the SELECT statement, I put it in FROM, mimicking creating a separate table.
As such, my code looks like:
SELECT OldGroup, STDEV(NumberJobs) AS JobsStDev
FROM (
SELECT OldGroup, OldPerson, COUNT(JobID) AS NumberJobs
FROM data
WHERE data.Classification = 1
GROUP BY OldGroup, OldPerson
) AS TempTable
GROUP BY OldGroup;
That seemed to get the job done.
Try doing a max table query for "SELECT COUNT(data.JobID)...."
Then for the 2nd query, use the new base table.
Sometimes it is just easier to do something in 2 or more queries.