Two ways to use Count, are they equivalent?


Is
SELECT COUNT(a.attr)
FROM TABLE a
equivalent to
SELECT B
FROM (SELECT COUNT(a.attr) as B
FROM TABLE a)
I would guess no, but I'm not sure.
I'm also assuming the answer would be the same for functions like min, max, avg, correct?
EDIT:
This is all out of curiosity, I'm still new at this. Is there a difference between the value returned for the count of the following and the above?
SELECT B, C
FROM (SELECT COUNT(a.attr) as B, a.C
FROM TABLE a
GROUP BY c)
EDIT AGAIN: I looked into it, lesson learned: I should be awake when I try to learn about these things.

Technically, they are not the same: the first one is a simple select, while the second one is a select with a subselect.
But every sane optimizer will generate the same execution plan for both of them.
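A quick way to convince yourself that the results match is to run both shapes against a throwaway table. This is only a sketch using Python's built-in sqlite3 module rather than the asker's database; the table name `a`, its columns and the sample rows are made up for illustration. It compares results only, not execution plans.

```python
import sqlite3

# Hypothetical table and data, used only to compare the two query shapes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (attr INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO a (attr, c) VALUES (?, ?)",
    [(1, 1), (2, 1), (None, 1), (4, 2), (5, 2)],
)

# Plain aggregate: COUNT(attr) skips the row where attr is NULL.
direct = conn.execute("SELECT COUNT(attr) FROM a").fetchall()

# Same aggregate wrapped in a derived table that only renames the column.
wrapped = conn.execute(
    "SELECT B FROM (SELECT COUNT(attr) AS B FROM a) AS t"
).fetchall()

print(direct, wrapped)
```

Both queries return a single row holding the same count, because the outer SELECT does nothing except pass the aliased column through.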

The results are the same, and would be the same as:
SELECT E
FROM
(SELECT D as E
FROM
(SELECT C as D
FROM
(SELECT B as C
FROM
(SELECT COUNT(a.attr) as B
FROM TABLE a))))
And equally as pointless.
The second query is essentially obfuscating a COUNT and should be avoided.
EDIT:
Yes, your edited query that was added to the OP is the same thing. It's just adding a subquery for no reason.

Am posting this answer to supplement what has already been said in the other answers, and because you cannot format comments :)
You can always check the execution plan to see if queries are equivalent; this is what SQL Server makes of it:
DECLARE @A TABLE
(
attr int,
c int
)
INSERT @A(attr,c) VALUES(1,1)
INSERT @A(attr,c) VALUES(2,1)
INSERT @A(attr,c) VALUES(3,1)
INSERT @A(attr,c) VALUES(4,2)
INSERT @A(attr,c) VALUES(5,2)
SELECT count(attr) FROM @A
SELECT B
FROM (SELECT COUNT(attr) as B
FROM @A) AS T
SELECT B, C
FROM (SELECT COUNT(attr) as B, c AS C
FROM @A
GROUP BY c) AS T
Here's the execution plan of the SELECT statements; as you can see, there is no difference between the first two.

Yes, they are. All you're doing in the second one is naming the returned count B. They will return the same results.
http://www.roseindia.net/sql/sql-as-keyword.shtml
EDIT:
Better example:
http://www.w3schools.com/sql/sql_alias.asp
The third example will be different because it contains a group by. It will return the count for every distinct a.C entry. Example
B C
w/e a
w/e a
w/e b
w/e a
w/e c
Would return
3 a
1 b
1 c
Not necessarily in that order
Easiest way to check all of this is to try it for yourself and see what it returns.
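In that spirit, here is a small sketch that runs the GROUP BY version on the five sample rows above. It uses Python's sqlite3 module as a stand-in database; the table name `a` and the placeholder attr values are assumptions, since the original only says "w/e". An ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (attr TEXT, c TEXT)")

# Five rows whose C values are a, a, b, a, c (attr values are placeholders).
conn.executemany(
    "INSERT INTO a (attr, c) VALUES (?, ?)",
    [("v1", "a"), ("v2", "a"), ("v3", "b"), ("v4", "a"), ("v5", "c")],
)

# One output row per distinct c, each with its own count.
grouped = conn.execute(
    """
    SELECT B, C
    FROM (SELECT COUNT(attr) AS B, c AS C
          FROM a
          GROUP BY c) AS t
    ORDER BY C
    """
).fetchall()

print(grouped)
```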

Your first code sample is correct, but the second does not make any sense.
You just select all data twice without any operations.
So, output for first and second samples will be equal.

Related

Can I (in a many-to-many relationship) select only those ids in column A that have a connection to all ids in column B?

I need to retrieve only those ids in "A" that have a connection to all ids in "B".
In the example below, the result should be '...fa3e' because '...65d6' does NOT have a reference to all ids in "B".
However, if '...fa3e' and '...65d6' reference the same ids in column B, then the query should return both '...fa3e' and '...65d6'.
And, subsequently, if a fifth row connected '...fa3e' with a completely new id in "B", then '...65d6' would be excluded because it would no longer hold a reference to all ids in column "B".
Is there a way to accomplish this in SQL Server?
I can't really come up with a good description/search term for what I'm trying to do ("Exclude column A based on values in column B" is not quite right), so I'm striking out when looking for resources.

I believe these values reside in the same table.
For distinct a values only:
select a
from T
group by a
having count(distinct b) = (select count(distinct b) from T);
To return all the rows:
select * from T where a in (
select a from T group by a
having count(distinct b) = (select count(distinct b) from T)
);
If (a, b) pairs are always unique then you wouldn't need the distinct qualifier on the left-hand counts. In fact you could even use count(*) for that.
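To see the counting approach behave as the question describes, here is a minimal sketch using Python's sqlite3 module. The table name T comes from the queries above; the shortened ids ('fa3e', '65d6') and the B values ('b1', 'b2') are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a TEXT, b TEXT)")

# 'fa3e' is linked to every b; '65d6' is linked to only one of them.
conn.executemany(
    "INSERT INTO T (a, b) VALUES (?, ?)",
    [("fa3e", "b1"), ("fa3e", "b2"), ("65d6", "b1")],
)

# Keep an a only if it covers as many distinct b values as exist overall.
query = """
    SELECT a
    FROM T
    GROUP BY a
    HAVING COUNT(DISTINCT b) = (SELECT COUNT(DISTINCT b) FROM T)
    ORDER BY a
"""

before = [row[0] for row in conn.execute(query)]

# Once '65d6' is also linked to 'b2', both ids qualify.
conn.execute("INSERT INTO T (a, b) VALUES ('65d6', 'b2')")
after = [row[0] for row in conn.execute(query)]

print(before, after)
```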

This seems like it's going to be a terrible query, but at its most basic, you want
All A where B in...
All B that are fully distinct
In SQL, that looks like
select distinct A
from test
where B in (select B from test group by B having count(1) = 1);
Absolutely zero guarantees on performance, but, this gives you the right value A. If you want to see which A/B pair actually made the cut, it could be SELECT A, B FROM test... too.

Pass a function return to another in the same row

I need to take the return value of a function selected in one column and pass it as a parameter to another function in the same row. I cannot use the alias:
What I want to have is:
SELECT
dbo.GetSecondID(f.ID) as SecondID,
dbo.GetThirdID(SecondID) as ThirdID
FROM Foo f
Any workaround? Thank you!
EDIT:
The method dbo.GetSecondID() is very heavy and I am dealing with a couple of million records in the table. It is not wise to pass the method as a parameter.

The way that SQL is designed, it is intended that all columns can be computed in parallel (in theory). This means that you cannot have one column's value depend on the result of computing a different column (within the same SELECT clause).
To be able to reference the column, you might introduce a subquery:
SELECT SecondID,dbo.GetThirdID(SecondID) as ThirdID
FROM
(
SELECT
dbo.GetSecondID(f.ID) as SecondID
FROM Foo f
) t
or a CTE:
;WITH Results1 AS (
SELECT
dbo.GetSecondID(f.ID) as SecondID
FROM Foo f
)
SELECT SecondID,dbo.GetThirdID(SecondID) as ThirdID
FROM Results1
If you're building up calculations multiple times (e.g. A depends on B, B depends on C, C depends on D...), then the CTE form usually ends up looking neater (IMO).
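The CTE form can be tried out without SQL Server. The sketch below uses Python's sqlite3 module and registers two Python functions under the names GetSecondID and GetThirdID; their bodies (add 100, double) are purely hypothetical stand-ins for the real dbo functions, and the dbo. prefix is dropped because SQLite has no schemas of that kind. It only demonstrates that the alias becomes usable in the outer query; it says nothing about how many times any given engine evaluates the function.

```python
import sqlite3

# Hypothetical stand-ins for dbo.GetSecondID and dbo.GetThirdID.
def get_second_id(value):
    return value + 100

def get_third_id(value):
    return value * 2

conn = sqlite3.connect(":memory:")
conn.create_function("GetSecondID", 1, get_second_id)
conn.create_function("GetThirdID", 1, get_third_id)

conn.execute("CREATE TABLE Foo (ID INTEGER)")
conn.executemany("INSERT INTO Foo (ID) VALUES (?)", [(1,), (2,), (3,)])

# The inner query names the first result; the outer query can then use it.
rows = conn.execute(
    """
    WITH Results1 AS (
        SELECT GetSecondID(f.ID) AS SecondID
        FROM Foo f
    )
    SELECT SecondID, GetThirdID(SecondID) AS ThirdID
    FROM Results1
    ORDER BY SecondID
    """
).fetchall()

print(rows)
```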

Bingo! The secret lies in using CROSS APPLY. The following code was helpful:
SELECT
sndID.SecondID,
dbo.GetThirdID(sndID.SecondID) as ThirdID
FROM Foo f
CROSS APPLY
(
SELECT dbo.GetSecondID(f.ID) as SecondID
) sndID
EDIT:
This only works if SecondID is unique (only one record is returned) or GROUP BY is used

Did you mean this:
SELECT
dbo.GetThirdID(dbo.GetSecondID(f.ID)) as ThirdID
FROM Foo f

Error in using EXCEPT with INTERSECT in SQL

Suppose I have three tables Table A,Table B and Table C.
Table A contains the col t1 with entries 1,2,2,3,4,4.
Table B has col t2 with entries 1,3,4,4.
Table C has col t3 with entries 1,2,4,4.
The query given was
SELECT * FROM A EXCEPT (SELECT * FROM B INTERSECT SELECT * FROM C ).
I saw this question in a test paper. It was mentioned that the expected answer was 2 but the answer obtained from this query was 1,2,4. I am not able to understand the principle behind this.

Well, as I see it, both the expected answer and the answer you obtained are wrong. It may be the RDBMS that you are using, but analyzing your query the results should be 2,3. First you should do the INTERSECT between tables B and C, the values that intersect are 1 and 4. Taking that result, you should take all the values from table A except 1 and 4, that leaves us with 2 and 3 (since EXCEPT and INTERSECT return only distinct values). Here is a sqlfiddle with this for you to try.

Because of the brackets, the INTERSECT between B and C is done first, resulting in (1,4). You can even verify this just by taking the latter part and running it in isolation:
SELECT * FROM B INTERSECT SELECT * FROM C
The next step is to select everything in A EXCEPT those that exist in the previous result of (1,4), which leaves (2,3).
The answer should be 2 and 3, not 1,2 and 4.
BTW, it should be mentioned that even if you had no parentheses in the query at all, the result should still be the same because the INTERSECT operator has a higher precedence than the EXCEPT/UNION operators. This is the SQL Server documentation but it's consistent with the standard that applies to any DBMS that implements these operators.
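The walkthrough can be reproduced with the question's data. This sketch uses Python's sqlite3 module, which comes with two caveats: SQLite does not accept parentheses around part of a compound SELECT, so the bracketed INTERSECT is written as a subquery instead, and SQLite groups compound operators strictly left to right rather than giving INTERSECT higher precedence. The unparenthesized form is included to show that difference; it is a SQLite quirk, not a statement about SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (t1 INTEGER)")
conn.execute("CREATE TABLE B (t2 INTEGER)")
conn.execute("CREATE TABLE C (t3 INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(v,) for v in (1, 2, 2, 3, 4, 4)])
conn.executemany("INSERT INTO B VALUES (?)", [(v,) for v in (1, 3, 4, 4)])
conn.executemany("INSERT INTO C VALUES (?)", [(v,) for v in (1, 2, 4, 4)])

# INTERSECT first (as a subquery), then EXCEPT: A minus (B intersect C).
nested = sorted(
    row[0]
    for row in conn.execute(
        """
        SELECT t1 FROM A
        EXCEPT
        SELECT * FROM (SELECT t2 FROM B INTERSECT SELECT t3 FROM C)
        """
    )
)

# No grouping: SQLite evaluates this as (A except B) intersect C.
flat = sorted(
    row[0]
    for row in conn.execute(
        "SELECT t1 FROM A EXCEPT SELECT t2 FROM B INTERSECT SELECT t3 FROM C"
    )
)

print(nested, flat)
```

The nested form gives 2 and 3, matching the reasoning above. The flat form gives only 2 in SQLite because of its left-to-right grouping.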

SQL query in MySQL using GROUP BY

Okay, so this query should be easy, but I'm having a bit of difficulty. Let's say I have a table called 'foo' with columns 'a', 'b'.
I'm trying to figure out the following in one query:
select how many of column 'a' are available for each type in column 'b'. This is done with the following:
mysql> select count(a),b from foo GROUP BY b;
That's straightforward. But now I want to add a third output to that query which shows the percentage of the result from count(a) divided by count(*). So if I have 100 rows total, and one of the GROUP BY results comes back with 20, I can get the third column to output 20%, meaning that group makes up 20% of the aggregate pool.

Assuming you have > 0 rows in foo
SELECT count(a), b, (count(a) / (SELECT count(*) FROM foo)) * 100
FROM foo
GROUP BY b
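Here is a small sketch of the same idea that can be run with Python's sqlite3 module. The ten sample rows are invented. One adjustment is an assumption specific to SQLite: dividing two integers there truncates, so the expression multiplies by 100.0 first to force floating-point arithmetic (MySQL's / operator does not truncate in this way).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b TEXT)")

# Ten rows in total: two of type 'x' and eight of type 'y'.
conn.executemany(
    "INSERT INTO foo (a, b) VALUES (?, ?)",
    [(i, "x" if i < 2 else "y") for i in range(10)],
)

# Per-group count plus its share of all rows, as a percentage.
shares = conn.execute(
    """
    SELECT COUNT(a), b, 100.0 * COUNT(a) / (SELECT COUNT(*) FROM foo)
    FROM foo
    GROUP BY b
    ORDER BY b
    """
).fetchall()

print(shares)
```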

There is a risk of that running slowly; your best bet is to have the program perform two separate queries:
SELECT count(*) INTO @c FROM foo;
SELECT count(a), b, (count(a)/@c)*100 FROM foo GROUP by b;

Lazy evaluation of Oracle PL/SQL statements in SELECT clauses of SQL queries

I have a performance problem with an Oracle select statement that I use in a cursor. In the statement one of the terms in the SELECT clause is expensive to evaluate (it's a PL/SQL procedure call, which accesses the database quite heavily). The WHERE clause and ORDER BY clauses are straightforward, however.
I expected that Oracle would first perform the WHERE clause to identify the set of records that match the query, then perform the ORDER BY clause to order them, and finally evaluate each of the terms in the SELECT clause. As I'm using this statement in a cursor from which I then pull results, I expected that the expensive evaluation of the SELECT term would only be performed as needed, when each result was requested from the cursor.
However, I've found that this is not the sequence that Oracle uses. Instead it appears to evaluate the terms in the SELECT clause for each record that matches the WHERE clause before performing the sort. Due to this, the procedure that is expensive to call is called for every result in the result set before any results are returned from the cursor.
I want to be able to get the first results out of the cursor as quickly as possible. Can anyone tell me how to persuade Oracle not to evaluate the procedure call in the SELECT statement until after the sort has been performed?
This is all probably easier to describe in example code:
Given a table example with columns a, b, c and d, I have a statement like:
select a, b, expensive_procedure(c)
from example
where <the_where_clause>
order by d;
On executing this, expensive_procedure() is called for every record that matches the WHERE clause, even if I open the statement as a cursor and only pull one result from it.
I've tried restructuring the statement as:
select a, b, expensive_procedure(c)
from example, (select example2.rowid, ROWNUM
from example example2
where <the_where_clause>
order by d)
where example.rowid = example2.rowid;
Where the presence of ROWNUM in the inner SELECT statement forces Oracle to evaluate it first. This restructuring has the desired performance benefit. Unfortunately it doesn't always respect the ordering that is required.
Just to be clear, I know that I won't be improving the time it takes to return the entire result set. I'm looking to improve the time taken to return the first few results from the statement. I want the time taken to be progressive as I iterate over the results from the cursor, not all of it to elapse before the first result is returned.
Can any Oracle gurus tell me how I can persuade Oracle to stop executing the PL/SQL until it is necessary?

Why join EXAMPLE to itself in the in-line view? Why not just:
select /*+ no_merge(v) */ a, b, expensive_procedure(c)
from
( select a, b, c
from example
where <the_where_clause>
order by d
) v;
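The evaluation order this inline view is aiming for (filter, sort, and only then call the expensive function as each row is fetched) can be sketched outside the database. The Python below is not Oracle and makes no claim about what the optimizer does with the hint; it is a plain generator that models the desired behaviour. The sample rows, the c > 0 filter and the body of expensive_procedure are all made up for illustration.

```python
from itertools import islice

calls = 0  # counts how often the expensive function actually runs

def expensive_procedure(c):
    """Hypothetical stand-in for the costly PL/SQL call."""
    global calls
    calls += 1
    return c * 10

# Invented rows for the 'example' table with columns a, b, c, d.
example = [
    {"a": 1, "b": "p", "c": 5, "d": 3},
    {"a": 2, "b": "q", "c": 7, "d": 1},
    {"a": 3, "b": "r", "c": -1, "d": 2},
    {"a": 4, "b": "s", "c": 9, "d": 0},
]

def lazy_query(rows):
    # WHERE clause (assumed condition), evaluated up front and cheaply.
    matching = [row for row in rows if row["c"] > 0]
    # ORDER BY d, also done before any expensive call.
    matching.sort(key=lambda row: row["d"])
    # SELECT list: the expensive call happens only when a row is fetched.
    for row in matching:
        yield row["a"], row["b"], expensive_procedure(row["c"])

cursor = lazy_query(example)
first_two = list(islice(cursor, 2))

print(first_two, calls)
```

Fetching two rows triggers exactly two calls, which is the progressive cost the question is asking for.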

If your WHERE conditions are equalities, i. e.
WHERE col1 = :value1
AND col2 = :value2
you can create a composite index on (col1, col2, d):
CREATE INDEX ix_example_col1_col2_d ON example(col1, col2, d)
and hint your query to use it:
SELECT /*+ INDEX (e ix_example_col1_col2_d) */
a, b, expensive_procedure(c)
FROM example e
WHERE col1 = :value1
AND col2 = :value2
ORDER BY
d
In the example below, t_even is a 1,000,000 rows table with an index on value.
Fetching 100 rows from this query:
SELECT SYS_GUID()
FROM t_even
ORDER BY
value
is instant (0,03 seconds), while this one:
SELECT SYS_GUID()
FROM t_even
ORDER BY
value + 1
takes about 170 seconds to fetch first 100 rows.
SYS_GUID() is quite expensive in Oracle
As proposed by others, you can also use this:
SELECT a, b, expensive_proc(c)
FROM (
SELECT /*+ NO_MERGE */
*
FROM mytable
ORDER BY
d
)
, but using an index will improve your query response time (how soon the first row is returned).

Does this do what you intend?
WITH
cheap AS
(
SELECT A, B, C
FROM EXAMPLE
WHERE <the_where_clause>
)
SELECT A, B, expensive_procedure(C)
FROM cheap
ORDER BY D

You might want to give this a try
select a, b, expensive_procedure(c)
from example, (select /*+ NO_MERGE */
example2.rowid,
ROWNUM
from example example2
where <the_where_clause>
order by d)
where example.rowid = example2.rowid;

Might some form of this work?
FOR R IN (SELECT a,b,c FROM example WHERE ...) LOOP
e := expensive_procedure(R.c);
...
END LOOP;

One of the key problems with the solutions that we've tried is how to adjust the application that generates the SQL to structure the query correctly. The built SQL will vary in terms of number of columns retrieved, number and type of conditions in the where clause and number and type of expressions in the order by.
The inline view returning ROWIDs for joining to the outer was an almost completely generic solution that we can utilise, except where the search is returning a significant portion of the data. In this case the optimiser decides [correctly] that a HASH join is cheaper than a NESTED LOOP.
The other issue was that some of the objects involved are VIEWs that can't have ROWIDs.
For information: "D" was not a typo. The expression for the order by is not selected as part of the return value. Not an unusual thing:
select index_name, column_name
from user_ind_columns
where table_name = 'TABLE_OF_INTEREST'
order by index_name, column_position;
Here, you don't need to know the column_position, but sorting by it is critical.
We have reasons (with which we won't bore the reader) for avoiding the need for hints in the solution, but it's not looking like this is possible.
Thanks for the suggestions thus far - we have tried most of them already ...
