efficiently find subset of records as well as total count - sql

I'm writing a function in ColdFusion that returns the first couple of records that match the user's input, as well as the total count of matching records in the entire database. The function will be used to feed an autocomplete, so speed/efficiency are its top concerns. For example, if the function receives input "bl", it might return {sampleMatches:["blue", "blade", "blunt"], totalMatches:5000}
I attempted to do this in a single query for speed purposes, and ended up with something that looked like this:
select record, count(*) over ()
from table
where criteria like :criteria
and rownum <= :desiredCount
The problem with this solution is that count(*) over () always returns the value of :desiredCount. I saw a similar question to mine here, but my app will not have permissions to create a temp table. So is there a way to solve my problem in one query? Is there a better way to solve it? Thanks!

I'm writing this off the top of my head, so you should definitely time it, but I believe that using the following CTE
only requires you to write the conditions once
only returns the amount of records you specify
has the correct total count added to each record
and is evaluated only once
SQL Statement
WITH q AS (
SELECT record
FROM table
WHERE criteria like :criteria
)
SELECT q1.*, q2.*
FROM q q1
CROSS JOIN (
SELECT COUNT(*) FROM q
) q2
WHERE rownum <= :desiredCount

A nested subquery should return the results you want
select record, cnt
from (select record, count(*) over () cnt
from table
where criteria like :criteria)
where rownum <= :desiredCount
This will, however, force Oracle to completely process the query in order to generate the accurate count. That seems unlikely to be what you want for an autocomplete, particularly when Oracle may decide that it would be more efficient to do a full table scan if :criteria is just b, since that predicate isn't selective enough. Are you really sure that you need a completely accurate count of the number of results? Are you sure that your table is small enough / your system is fast enough / your predicates are selective enough for that to be a requirement you could realistically meet? Would it be possible to return a less expensive (but less accurate) estimate of the number of rows? Or to limit the count to something smaller (say, 100) and have the UI display something like "and 100+ more results"?
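The "limit the count to something smaller" idea can be sketched as follows. This is only an illustration, not the asker's Oracle setup: it uses Python's built-in sqlite3 module, a hypothetical table named words, and two statements instead of one. The inner LIMIT is what allows the database to stop scanning once cap + 1 matches have been found.

```python
import sqlite3

def sample_and_capped_count(conn, prefix, sample_size=3, cap=100):
    """Return a few matches plus a match count that never exceeds cap."""
    pattern = prefix + "%"
    samples = [row[0] for row in conn.execute(
        "SELECT record FROM words WHERE record LIKE ? ORDER BY record LIMIT ?",
        (pattern, sample_size))]
    # Count at most cap + 1 rows: enough to know whether there are more than cap.
    (counted,) = conn.execute(
        "SELECT COUNT(*) FROM (SELECT 1 FROM words WHERE record LIKE ? LIMIT ?)",
        (pattern, cap + 1)).fetchone()
    return {"sampleMatches": samples,
            "totalMatches": min(counted, cap),
            "more": counted > cap}

# Hypothetical sample data: 250 records starting with "bl" and one that does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (record TEXT)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [("blue%03d" % i,) for i in range(250)] + [("red",)])

result = sample_and_capped_count(conn, "bl")
rare = sample_and_capped_count(conn, "re")
```

When "more" is true, the UI can show "100+ results" instead of an exact number.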

Related

SQL Aggregate Functions -- HAVING vs WHERE

Context: I'm fiddling with SQL in SQLFiddle Postgres 9.6. I'm trying to apply aggregate functions to 2 columns in the outer query that are dependent on the existence of values from a subquery.
I'm having a hard time determining whether the query is correct using the WHERE clause instead of HAVING. The SQL executes, but I'm not confident that it's generating the intended results.
Question: Can someone help me understand if this is the correct way to perform the aggregation? And if not how can I modify the query to get the intended results if including HAVING requires GROUPING BY user_id in the outer query which kinda defeats the purpose.
Intended Results: I want to count the number of actions a user takes before progressing to a new action. I only want to count the number of Read Article events if a user (user_id) made it to the next action (View Product). I'm going to use the aggregation to calculate some averages.
Sample output:
Query:
SELECT event_type as action_a,
COUNT(event_type) as action_a_count,
COUNT(DISTINCT user_id) as unique_users
FROM events
WHERE event_type in ('Read Article')
AND user_id in
(
SELECT DISTINCT(user_id) as user_id
FROM events
WHERE event_type in ('View Product')
)
GROUP BY event_type
Your query is good. With WHERE event_type = 'Read Article' you filter events rows. Thus only those rows must be aggregated.
You could use HAVING event_type = 'Read Article' instead, because you are grouping by that column, too. That would mean you would first look up users for all rows and would aggregate over all desired user rows and only then dismiss undesired event_types. This would give the DBMS much more work to do.
Conclusion: Use WHERE to reduce the rows as soon as possible, so the DBMS can work on smaller data sets. This will speed up your queries.
HAVING and WHERE do appear to overlap, but there are differences: WHERE filters individual rows, whereas HAVING filters groups after aggregation. The most basic example would be finding duplicates in a table with a query like this:
SELECT column_name, count(*)
FROM table_name
GROUP BY column_name
HAVING count(*) > 1
This query would need to count the rows before filtering, thus uses HAVING. In your case, filtering for equality using WHERE is fine because it only needs to take a single row into account.
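To check that both forms return the same rows here, a small experiment along these lines can help. It is a sketch using Python's built-in sqlite3 module rather than Postgres, and the six sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "Read Article"), (1, "Read Article"), (1, "View Product"),
    (2, "Read Article"),                      # user 2 never views a product
    (3, "Read Article"), (3, "View Product"),
])

# Keep only users who reached the next action.
reached_next = ("user_id IN (SELECT user_id FROM events "
                "WHERE event_type = 'View Product')")

# Filter rows before grouping.
where_sql = f"""
    SELECT event_type, COUNT(event_type), COUNT(DISTINCT user_id)
    FROM events
    WHERE event_type = 'Read Article' AND {reached_next}
    GROUP BY event_type
"""
# Group every event type first, then discard the unwanted groups.
having_sql = f"""
    SELECT event_type, COUNT(event_type), COUNT(DISTINCT user_id)
    FROM events
    WHERE {reached_next}
    GROUP BY event_type
    HAVING event_type = 'Read Article'
"""
where_rows = conn.execute(where_sql).fetchall()
having_rows = conn.execute(having_sql).fetchall()
```

Both lists hold the single row ('Read Article', 3, 2); the HAVING version simply aggregates the View Product group as well before throwing it away.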

Get count and result from SQL query in Go

I'm running a pretty straightforward query using the database/sql and lib/pq (postgres) packages and I want to toss the results of some of the fields into a slice, but I need to know how big to make the slice.
The only solution I can find is to do another query that is just SELECT COUNT(*) FROM tableName;.
Is there a way to both get the result of the query AND the count of returned rows in one query?
Conceptually, the problem is that the database cursor may not be enumerated to the end, so the database does not really know how many records you will get before you actually read all of them. The only way to count (in the general case) is to go through all the records in the result set.
But practically, you can force it to do so by using subqueries like
select *, (select count(*) from table) from table
and just ignore the extra count column for records other than the first. But it is very crude and I do not recommend doing so.
Not sure if this is what you are asking for, but you can read the @@ROWCOUNT function to get the row count of the previous select statement that was executed.
SELECT mytable.mycol FROM mytable WHERE mytable.foo = 'bar'
SELECT @@ROWCOUNT
If you want the row count included in your result set, you can use the OVER clause (MSDN)
SELECT mytable.mycol, count(*) OVER(PARTITION BY mytable.foo) AS 'Count' FROM mytable WHERE mytable.foo = 'bar'
You could also perhaps just separate the two SQL statements with a ; character. This would return the result sets of both statements.
You would use count(*)
SELECT count(distinct last)
FROM (XYZTable)
WHERE date(FROM_UNIXTIME(time)) >= '2013-10-28' AND
id = 90 ;

Is there any way to calculate total number of rows that return from dynamic query in Common Table Expression(CTE) or Subquery

We are in the process of optimizing our database. Most of our stored procedures use CTEs because they give us high performance with our table structure. Most of our queries are dynamic and return different results under different conditions. We hold all the data in a CTE and check the conditions; that was not a problem, but we also need the total number of rows returned by each query, and calculating it takes a lot of time. A temporary table or table variable is not suitable in our case, as inserting the data into it takes a lot of time. Our structure is as follows:
With t(fields) as
(select field1,field2.......
ROW_NUMBER() OVER (order by some column) as row...
from some table and lots of
inner n left joins
where some condition ),
rowTotal(RowTotal) as
(select max(row) from t)
select * from t,RowTotal
where condition for paging
But max(row) takes a lot of time; if I remove it, the query returns data within 100 ms. I tried COUNT(*), COUNT(SomeField) and many others; they work but take a lot of time. How can I get the total number of rows from the CTE within a few ms? Aggregate functions are too slow for me. Is there any other way to calculate the row total, like @@ROWCOUNT? Thanks in advance for any help.
If you are after the total number of rows from the inner query, you can add it as a column to your select using COUNT(*) with an OVER (PARTITION BY ...) clause.
With t(fields) as
(select COUNT(*) OVER (PARTITION BY 1) AS TotalRows,
field1,field2.......
ROW_NUMBER() OVER (order by some column) as row...
from some table ...
This should give you a count of the total rows in 't' as the first column of t.
I don't know that this is the fastest way to get the result you want, but it works for me on thousands of returned records and avoids extra select queries to find the count separately.
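A runnable sketch of the same pattern, with the total carried on every row of the CTE and paging applied afterwards, might look like this. It uses Python's built-in sqlite3 module instead of SQL Server and assumes an SQLite build with window functions (3.25 or newer); the items table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [("item%02d" % i,) for i in range(1, 26)])

page_sql = """
    WITH t AS (
        SELECT name,
               ROW_NUMBER() OVER (ORDER BY name) AS rn,
               COUNT(*) OVER () AS total_rows   -- total of the filtered set
        FROM items
        WHERE name >= 'item06'
    )
    SELECT name, rn, total_rows
    FROM t
    WHERE rn BETWEEN ? AND ?                    -- paging condition
    ORDER BY rn
"""
page = conn.execute(page_sql, (11, 15)).fetchall()
```

Each returned row carries total_rows = 20, the size of the filtered set before paging. The database still has to produce every filtered row to compute that number, so this saves a second statement rather than the counting work itself.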

Efficient SQL to count an occurrence in the latest X rows

For example I have:
create table a (i int);
Assume there are 10k rows.
I want to count 0's in the last 20 rows.
Something like:
select count(*) from (select i from a limit 20) where i = 0;
Is it possible to make it more efficient? Like a single SQL statement or something?
PS. DB is SQLite3 if that matters at all...
UPDATE
PPS. No need to group by anything in this instance; assume the table is literally 1 column (and presumably the internal DB row_ID or something). I'm just curious if this is possible to do without the nested selects?
You'll need to order by something in order to determine the last 20 rows. When you say last, do you mean by date, by ID, ...?
Something like this should work:
select count(*)
from (
select i
from a
order by j desc
limit 20
) where i = 0;
If you do not remove rows from the table, you may try the following hacky query:
SELECT COUNT(*) as cnt
FROM A
WHERE
ROWID > (SELECT MAX(ROWID)-20 FROM A)
AND i=0;
It operates with ROWIDs only. As the documentation says: Rows are stored in rowid order.
You need to remember to order by when you use limit, otherwise the result is indeterminate. To get the latest rows added, you need to include a column with the insertion date, then you can use that. Without this column you cannot guarantee that you will get the latest rows.
To make it efficient you should ensure that there is an index on the column you order by, possibly even a clustered index.
I'm afraid that you need a nested select to be able to count and restrict to last X rows at a time, because something like this
SELECT count(*) FROM a GROUP BY i HAVING i = 0
will count 0's, but in ALL table records, because a LIMIT in this query will basically have no effect.
However, you can optimize by using COUNT(i), as it is faster to count only one field than two or more (in this case your table has two fields, i and rowid, the latter being created automatically by SQLite in tables without a primary key).
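For comparison, here is a small sketch that runs the nested-select approach and the ROWID approach side by side, using Python's built-in sqlite3 module. The sample data (10,000 rows where every fifth value is 0) is made up, and "last" is taken to mean highest rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (i INT)")
# 10,000 rows; every fifth value is 0.
conn.executemany("INSERT INTO a VALUES (?)", [(n % 5,) for n in range(10000)])

# Nested select: take the 20 newest rows, then count the zeros among them.
nested_sql = """
    SELECT COUNT(*)
    FROM (SELECT i FROM a ORDER BY rowid DESC LIMIT 20)
    WHERE i = 0
"""
# ROWID arithmetic: only equivalent while the rowids have no gaps.
rowid_sql = """
    SELECT COUNT(*)
    FROM a
    WHERE rowid > (SELECT MAX(rowid) - 20 FROM a) AND i = 0
"""
nested_count = conn.execute(nested_sql).fetchone()[0]
rowid_count = conn.execute(rowid_sql).fetchone()[0]
```

Both queries agree on this data. Once rows have been deleted, the ROWID version may look at fewer than 20 rows, which is why it only suits tables that never lose rows.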

SQL - Use results of a query as basis for two other queries in one statement

I'm doing a probability calculation. I have a query to calculate the total number of times an event occurs. From these events, I want to get the number of times a sub-event occurs. The query to get the total events is 25 lines long and I don't want to just copy + paste it twice.
I want to do two things to this query: calculate the number of rows in it, and calculate the number of rows in the result of a query on this query. Right now, the only way I can think of doing that is this (replace #total# with the complicated query to get all rows, and #conditions# with the less-complicated conditions that rows, from #total#, must have to match the sub-event):
SELECT (SELECT COUNT(*) FROM (#total#) AS t1 WHERE #conditions#) AS suboccurs,
COUNT(*) AS totaloccurs FROM (#total#) as t2
As you notice, #total# is repeated twice. Is there any way around this? Is there a better way to do what I'm trying to do?
To re-emphasize: #conditions# does depend on what #total# returns (it does stuff like t1.foo = bar).
Some final notes: #total# by itself takes ~250ms. This more complicated query takes ~300ms, so postgres is likely doing some optimization, itself. Still, the query looks terribly ugly with #total# literally pasted in twice.
If your SQL dialect supports subquery factoring, then rewriting it using the WITH clause is an option. It allows a subquery to be used more than once. In Oracle, WITH will create it as either an inline view or a temporary table.
Here is a contrived example.
WITH
x AS
(
SELECT this
FROM THERE
WHERE something is true
),
y AS
(
SELECT this-other-thing
FROM somewhereelse
WHERE something else is true
),
z AS
(
select count(*) k
FROM X
)
SELECT z.k, y.*, x.*
FROM x,y, z
WHERE X.abc = Y.abc
SELECT COUNT(*) AS totaloccurs,
COUNT(CASE WHEN #conditions# THEN 1 END) AS suboccurs
FROM (#total#) AS t1
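Counting all rows and the matching rows in a single pass over one derived table can be sketched like this. It is an illustration in Python's built-in sqlite3 module, not the asker's Postgres query: the events table, the kind = 'hit' filter standing in for #total#, and the foo = 1 test standing in for #conditions# are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, foo INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("hit", 1), ("hit", 2), ("hit", 1), ("miss", 1), ("hit", 3),
])

sql = """
    WITH total AS (                 -- stands in for the long #total# query
        SELECT foo FROM events WHERE kind = 'hit'
    )
    SELECT COUNT(*) AS totaloccurs,
           -- CASE yields NULL for non-matching rows, and COUNT skips NULLs
           COUNT(CASE WHEN foo = 1 THEN 1 END) AS suboccurs
    FROM total
"""
totaloccurs, suboccurs = conn.execute(sql).fetchone()
probability = suboccurs / totaloccurs
```

The long query is written once, and the CASE expression is needed because COUNT over a plain boolean would count false values too.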
Put the reused sub-query into a temp table, then select what you need from the temp table.
@EvilTeach:
I've not seen the "with" (probably not implemented in Sybase :-(). I like it: does what you need in one chunk then goes away, with even less cruft than temp tables. Cool.