SQL - Use results of a query as basis for two other queries in one statement

I'm doing a probability calculation. I have a query to calculate the total number of times an event occurs. From these events, I want to get the number of times a sub-event occurs. The query to get the total events is 25 lines long and I don't want to just copy + paste it twice.
I want to do two things with this query: count its rows, and count the rows in the result of a query over it. Right now, the only way I can think of doing that is this (replace #total# with the complicated query that gets all rows, and #conditions# with the less-complicated conditions that rows from #total# must meet to match the sub-event):
SELECT (SELECT COUNT(*) FROM (#total#) AS t1 WHERE #conditions#) AS suboccurs,
COUNT(*) AS totaloccurs FROM (#total#) as t2
As you can see, #total# appears twice. Is there any way around this? Is there a better way to do what I'm trying to do?
To re-emphasize: #conditions# does depend on what #total# returns (it does stuff like t1.foo = bar).
Some final notes: #total# by itself takes ~250 ms. The more complicated query takes ~300 ms, so Postgres is likely doing some optimization on its own. Still, the query looks terribly ugly with #total# literally pasted in twice.

If your SQL dialect supports subquery factoring, then rewriting the query using a WITH clause is an option. It allows a subquery to be defined once and referenced more than once. In Oracle, WITH will materialize each subquery as either an inline view or a temporary table.
Here is a contrived example.
WITH
x AS
(
    SELECT this
    FROM there
    WHERE something IS TRUE
),
y AS
(
    SELECT this_other_thing
    FROM somewhere_else
    WHERE something_else IS TRUE
),
z AS
(
    SELECT COUNT(*) AS k
    FROM x
)
SELECT z.k, y.*, x.*
FROM x, y, z
WHERE x.abc = y.abc
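Applied to the question, that might look something like this (a sketch; #total# and #conditions# are still placeholders for the real query and predicates):
WITH total AS (
    #total#
)
SELECT (SELECT COUNT(*) FROM total t1 WHERE #conditions#) AS suboccurs,
       COUNT(*) AS totaloccurs
FROM total t2;
The CTE is written once and referenced twice, which is exactly what the question asks for.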

SELECT COUNT(*) AS totaloccurs,
       COUNT(CASE WHEN #conditions# THEN 1 END) AS suboccurs
FROM (#total#) AS t1

Put the reused sub-query into a temp table, then select what you need from the temp table.
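In PostgreSQL that could look something like this (a sketch; the temp table name is made up):
CREATE TEMP TABLE total_events AS
#total#;

SELECT (SELECT COUNT(*) FROM total_events t1 WHERE #conditions#) AS suboccurs,
       COUNT(*) AS totaloccurs
FROM total_events t2;
The temp table lives for the session (or for the transaction, with ON COMMIT DROP), so the expensive query runs only once.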

@EvilTeach:
I've not seen WITH before (probably not implemented in Sybase :-(). I like it: it does what you need in one chunk, then goes away, with even less cruft than temp tables. Cool.

Related

How do I set a number of rows and count the conditional results?

I'm trying to count conditional matches within a fixed set of 100 results.
Example: of the last 100 actions, how many were for 'X'?
I'm not quite sure where to start.
I'm fairly new to SQL, and I've tried inner joins, subqueries, etc, but I just can't seem to figure it out. I feel it's something fairly simple.
Thank you!
To do this, you simply need to sum a CASE expression that checks for the value. However, if you want to do this for only 100 rows, you will need to run the query against a derived table (or a common table expression, if your RDBMS supports them) to apply that row-count restriction.
Here is an example that should work in most RDBMSes (you may need to replace TOP with LIMIT, depending on your RDBMS):
select
    sum(
        case
            when my_val = 'X' then 1
            else 0
        end
    )
from
(
    select top 100
        my_val
    from
        my_table
) t
@RToyo's answer covers it for any DBMS.
In Snowflake you can simplify it slightly by using IFF:
select
    sum(iff(my_val = 'X', 1, 0))
from
(
    select top 100
        my_val
    from
        my_table
) t

Get count and result from SQL query in Go

I'm running a pretty straightforward query using the database/sql and lib/pq (postgres) packages and I want to toss the results of some of the fields into a slice, but I need to know how big to make the slice.
The only solution I can find is to do another query that is just SELECT COUNT(*) FROM tableName;.
Is there a way to both get the result of the query AND the count of returned rows in one query?
Conceptually, the problem is that the database cursor may not be enumerated to the end, so the database does not really know how many records you will get before you actually read all of them. The only way to count (in the general case) is to go through all the records in the result set.
But practically, you can force it to do so by using a subquery like
select *, (select count(*) from table) from table
and just ignore the second column for every record other than the first. But this is very crude and I do not recommend doing it.
Not sure if this is what you are asking for, but you can call the @@ROWCOUNT function to return the count of the previous SELECT statement that has been executed.
SELECT mytable.mycol FROM mytable WHERE mytable.foo = 'bar'
SELECT @@ROWCOUNT
If you want the row count included in your result set you can use the OVER clause (MSDN):
SELECT mytable.mycol, count(*) OVER(PARTITION BY mytable.foo) AS 'Count' FROM mytable WHERE mytable.foo = 'bar'
You could also just separate the two SQL statements with a semicolon (;). This would return a result set for both executed statements.
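Since the question is about Postgres, note that an empty OVER () gives the total number of rows in the result on every row, rather than the per-group count that PARTITION BY produces. A minimal sketch (total_rows is just an alias I've chosen):
SELECT mytable.mycol, count(*) OVER () AS total_rows
FROM mytable
WHERE mytable.foo = 'bar'
In Go you could then read total_rows from the first row, size the slice with make, and append the remaining rows.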
You would use count(*):
SELECT count(distinct last)
FROM XYZTable
WHERE date(FROM_UNIXTIME(time)) >= '2013-10-28'
  AND id = 90;

What exactly is the/this data statement in SAS doing? PostgreSQL equivalent?

I'm converting a SAS script to Python for a PostgreSQL environment. In a few places I've found a data statement in SAS, which looks something like this (in multiple scripts):
data dups;
set picc;
by btn wtn resp_ji;
if not (first.resp_ji and last.resp_ji);
run;
Obviously datasets aren't the same in python or SQL environments, and I'm having trouble determining what this specific statement is doing. To be clear, there are a number of scripts being converted which create a dataset in this manner with this same name. So my expectation would be that most of these would be overwritten over and over.
I'm also unclear as to what the postgres equivalent to the condition in the data dups statement would be.
Is there an obvious PostgreSQL statement that would work in its place? Something like this?:
CREATE TABLE dups AS
SELECT btn, wtn, resp_ji
WHERE /*some condition that matches the condition in the data statement*/
Does the
by btn wtn resp_ji;
statement mean which columns are copied over, or is that the equivalent of an ORDER BY clause in PostgreSQL?
Thanks.
The statement is using what's called 'by group processing'. Before the step can run, it requires that the data is sorted by btn wtn resp_ji.
The first.resp_ji piece is checking to see if it's the first time it's seen the current value of resp_ji within the current btn/wtn combination. Likewise the last.resp_ji piece is checking if it's the final time that it will see the current value of resp_ji within the current btn/wtn combination.
Combining it all together the statement:
if not (first.resp_ji and last.resp_ji);
Is saying, if the current value of resp_ji occurs multiple times for the current combination of btn/wtn then keep the record, otherwise discard the record. The behaviour of the if statement when used like that implicitly keeps/discards the record.
To do the equivalent in SQL, you could do something like:
Find all records to discard.
Discard those records from the original dataset.
So...
create table rows_to_discard as
select btn, wtn, resp_ji, count(*) as freq
from mytable
group by btn, wtn, resp_ji
having count(*) = 1;

create table want as
select a.*
from mytable a
left join rows_to_discard b
    on  b.btn = a.btn
    and b.wtn = a.wtn
    and b.resp_ji = a.resp_ji
where b.btn is null;
EDIT: I should mention that there is no simple SQL equivalent. It may be possible by numbering rows in subqueries and then building logic on top of that, but it'd be ugh-ly. It may also depend on the specific flavour of SQL being used.
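For what it's worth, in PostgreSQL that row-numbering idea can be written with a window count, which keeps it to one statement (a sketch, reusing the mytable name from above; the extra freq column comes along unless you list the columns explicitly):
CREATE TABLE want AS
SELECT *
FROM (
    SELECT p.*,
           COUNT(*) OVER (PARTITION BY btn, wtn, resp_ji) AS freq
    FROM mytable p
) t
WHERE freq > 1;
Groups of size 1 are exactly the rows the SAS step discards, so keeping freq > 1 matches the if not (first.resp_ji and last.resp_ji) logic.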
As someone who learned SAS before PostgreSQL, I found the following much more similar to SAS first./last. logic:
-- first.
select distinct on (resp_ji) * from <table> order by resp_ji
-- last.
select distinct on (resp_ji) * from <table> order by resp_ji desc
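To mirror the full by btn wtn resp_ji grouping rather than resp_ji alone, the DISTINCT ON list and the ORDER BY have to cover all three columns (a sketch; which row counts as first within a group is arbitrary unless you add a tiebreaker column to the ORDER BY):
-- first. within each btn/wtn/resp_ji group
select distinct on (btn, wtn, resp_ji) *
from <table>
order by btn, wtn, resp_ji;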
A way to detect duplicates (when no extra differentiating field is available) is to use the ctid as tie-breaker:
CREATE TABLE dups AS
SELECT *
FROM picc p
WHERE EXISTS ( SELECT *
               FROM picc x
               WHERE x.btn = p.btn
                 AND x.wtn = p.wtn
                 AND x.resp_ji = p.resp_ji
                 AND x.ctid <> p.ctid
             );

efficiently find subset of records as well as total count

I'm writing a function in ColdFusion that returns the first couple of records that match the user's input, as well as the total count of matching records in the entire database. The function will be used to feed an autocomplete, so speed/efficiency are its top concerns. For example, if the function receives input "bl", it might return {sampleMatches:["blue", "blade", "blunt"], totalMatches:5000}
I attempted to do this in a single query for speed purposes, and ended up with something that looked like this:
select record, count(*) over ()
from table
where criteria like :criteria
and rownum <= :desiredCount
The problem with this solution is that count(*) over () always returns the value of :desiredCount. I saw a similar question to mine here, but my app will not have permissions to create a temp table. So is there a way to solve my problem in one query? Is there a better way to solve it? Thanks!
I'm writing this off the top of my head, so you should definitely time it, but I believe the following CTE:
only requires you to write the conditions once,
only returns the number of records you specify,
adds the correct total count to each record,
and is evaluated only once.
SQL Statement
WITH q AS (
    SELECT record
    FROM table
    WHERE criteria like :criteria
)
SELECT q1.*, q2.*
FROM q q1
CROSS JOIN (
    SELECT COUNT(*) AS total
    FROM q
) q2
WHERE rownum <= :desiredCount
A nested subquery should return the results you want
select record, cnt
from (select record, count(*) over () cnt
      from table
      where criteria like :criteria)
where rownum <= :desiredCount
This will, however, force Oracle to completely process the query in order to generate an accurate count. That seems unlikely to be what you want for an autocomplete, particularly since Oracle may decide that a table scan of table is more efficient when :criteria is just b, because that predicate isn't selective enough. Are you really sure you need a completely accurate count of the number of results? Are you sure your table is small enough, your system fast enough, and your predicates selective enough for that to be a requirement you can realistically meet? Would it be possible to return a less expensive (but less accurate) estimate of the number of rows? Or to limit the count to something smaller (say, 100) and have the UI display something like "and 100+ more results"?
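A sketch of that last suggestion in Oracle syntax: cap the count at 101 matches so the UI can show "100+" without forcing a full scan (the 101 cap and the capped_count alias are assumptions, not part of the original query):
select record,
       (select count(*)
        from (select 1
              from table
              where criteria like :criteria
                and rownum <= 101)) as capped_count
from table
where criteria like :criteria
  and rownum <= :desiredCount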

How to get all results, except one row based on a timestamp?

I have a simple question (?) about SQL. I have come across this problem a few times before and I have always solved it, but I'm looking for a more elegant and perhaps faster solution.
The problem is that I would like to select all rows in a table except the one with the max value in a timestamp column (in this case that row is a summary row, but it's not marked as such in any way, and that's not relevant to my result).
I could do something like this:
select * from [table] t
where loggedat < (select max(loggedat) from [table] where somecolumn='somevalue')
and somecolumn='somevalue'
But when working with large tables this seems kind of slow. Any suggestions?
If you don't want to change your DB structure, then your query (or one with a slight variation using <> instead of <) is the way to go.
You could add a column IsSummary bit to the table, and always mark the most recent row as true (and all others false). Then your query would change to:
Select * from [table] where IsSummary = 0 and somecolumn = 'somevalue'
This would sacrifice some insert speed (since an insert would also trigger an update of the IsSummary value) in exchange for a faster select query.
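A sketch of that maintenance step, using the names already in the answer (whether you run it from a trigger or from application code after each insert is up to you):
UPDATE [table]
SET IsSummary = CASE
                    WHEN loggedat = (SELECT MAX(loggedat)
                                     FROM [table]
                                     WHERE somecolumn = 'somevalue')
                    THEN 1
                    ELSE 0
                END
WHERE somecolumn = 'somevalue';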
If you don't mind one tiny (4 byte) extra column, then you might go like this:
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY loggedat DESC) AS rownum
    FROM [table] t
    WHERE somecolumn = 'somevalue'
    /* and all the other filters you want */
) s
WHERE rownum > 1
In case you do mind the extra column, you'll just have to list the necessary columns explicitly in the outer SELECT.
It may not be the elegant SQL query you're looking for, but it would be trivial to do this in Java, PHP, etc. after fetching the results. To keep it as simple as possible, use ORDER BY timestamp DESC and discard the first row.
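If your database supports OFFSET, you can even push that last step into SQL (a sketch in PostgreSQL-style syntax, with mytable standing in for [table]):
select *
from mytable
where somecolumn = 'somevalue'
order by loggedat desc
offset 1;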