Redundant use of distinct in group by? - sql

I'm reviewing some SQL queries in SAS and I encountered the following query structure:
SELECT distinct A, B, Sum(C) FROM Table1 GROUP BY A, B;
I would like to know if it's strictly equivalent to:
SELECT A, B, Sum(C) FROM Table1 GROUP BY A, B;
Or am I missing a nuance, either in the output or in how the computation is handled?

The two queries are equivalent.
Generally,
SELECT DISTINCT a, b, c
FROM <something>
is equivalent to
SELECT a, b, c
FROM <something>
GROUP BY a, b, c
In your case, <something> is the result of a GROUP BY query, whose rows already have distinct (A, B) pairs. That is enough to guarantee that the triples A, B, SUM(C) are unique as well.
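A quick way to check this reasoning is to run both forms against the same data. The sketch below uses Python's built-in sqlite3 module rather than SAS PROC SQL, and the sample rows are invented for illustration, so it shows the logic of the argument rather than proving anything about SAS itself.

```python
import sqlite3

# Hypothetical sample data; only the names Table1, A, B, C come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (A TEXT, B TEXT, C INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [("x", "p", 1), ("x", "p", 2), ("x", "q", 5), ("y", "p", 7), ("y", "p", 7)],
)

with_distinct = conn.execute(
    "SELECT DISTINCT A, B, SUM(C) FROM Table1 GROUP BY A, B ORDER BY A, B"
).fetchall()
without_distinct = conn.execute(
    "SELECT A, B, SUM(C) FROM Table1 GROUP BY A, B ORDER BY A, B"
).fetchall()

# GROUP BY already yields one row per (A, B), so DISTINCT has nothing to remove.
print(with_distinct == without_distinct)
```

Both queries return one row per (A, B) pair, so the two result lists compare equal.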

Related

Syntax error for WHERE clause when using proc sql

I am new to SAS and I need to recreate a query that I had running in R.
The syntax rules may be different in SAS, but I don't see where I am going wrong here.
Table "Old" columns: A, B, C, D, E
Table "New" columns: A, B, C, D, E
PROC SQL;
create table delta as
SELECT *
FROM New
WHERE
(A, B, C)
IN(
SELECT (A, B, C)
FROM New
EXCEPT
SELECT A, B, C
FROM Old);
QUIT;
My code should find delta rows based on A, B, C variables.
Error Message on comma
WHERE(A, B, C): ERROR 79-322: Expecting a (.
I don't use SAS, but it could be that this database doesn't allow tuples in a WHERE ... IN clause.
In that case, you could try refactoring your query as an inner join:
SELECT N.*
FROM New N
INNER JOIN (
SELECT A, B, C
FROM New
EXCEPT
SELECT A, B, C
FROM Old
) T ON T.A = N.A
AND T.B = N.B
AND T.C = N.C
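The join-based rewrite can be tried out on a small data set. This is a minimal sketch using Python's sqlite3 module, not SAS; the rows are made up, and it selects N.* so that the key columns A, B, C are not returned twice.

```python
import sqlite3

# Hypothetical rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Old (A, B, C, D, E);
CREATE TABLE New (A, B, C, D, E);
INSERT INTO Old VALUES (1, 1, 1, 'd1', 'e1'), (2, 2, 2, 'd2', 'e2');
INSERT INTO New VALUES (1, 1, 1, 'd1', 'e1'), (2, 2, 9, 'd2', 'e2'), (3, 3, 3, 'd3', 'e3');
""")

delta = conn.execute("""
    SELECT N.*
    FROM New N
    INNER JOIN (
        SELECT A, B, C FROM New
        EXCEPT
        SELECT A, B, C FROM Old
    ) T ON T.A = N.A AND T.B = N.B AND T.C = N.C
    ORDER BY N.A
""").fetchall()

# Rows of New whose (A, B, C) key does not occur in Old.
for row in delta:
    print(row)
```

Because EXCEPT removes duplicates, the derived table T holds each new key once, so the join does not multiply rows of New.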

Integrate two sql queries

I have these two queries:
SELECT DISTINCT a, b, c, d FROM x WHERE b IN (1, 2)
SELECT DISTINCT c, d FROM y
I would now like to combine these queries so that the first query returns only rows whose (c, d) combination appears in the output of the second query. Any thoughts on how to do this? My table is large, so efficiency is important.
Use exists?
SELECT DISTINCT a, b, c, d
FROM x
WHERE b IN (1, 2) AND
EXISTS (SELECT 1 FROM y WHERE x.c = y.c and x.d = y.d);
With EXISTS, the SELECT DISTINCT is only necessary if x contains duplicate rows.
For performance, you want an index on y(c, d). An index on x(b, a, c, d) would also help in most databases.
Note: DISTINCT is not necessary in the subquery. In some databases, you can also use IN with composite values.
SELECT DISTINCT x.a,x.b,x.c,x.d
FROM x
INNER JOIN y ON x.c = y.c
AND x.d = y.d
WHERE x.b IN (1,2)
Regarding efficiency, your indexing will determine how well that performs.
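The EXISTS form and the JOIN form can be compared side by side. The following is a sketch in Python's sqlite3 module with invented rows, where y deliberately contains a repeated (c, d) pair; it is meant to illustrate why the JOIN version depends on DISTINCT, not to measure performance.

```python
import sqlite3

# Hypothetical data; y contains the pair (10, 100) twice on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE x (a, b, c, d);
CREATE TABLE y (c, d);
INSERT INTO x VALUES (1, 1, 10, 100), (2, 2, 20, 200), (3, 3, 10, 100), (4, 1, 30, 300);
INSERT INTO y VALUES (10, 100), (10, 100), (20, 200);
""")

exists_rows = conn.execute("""
    SELECT DISTINCT a, b, c, d
    FROM x
    WHERE b IN (1, 2)
      AND EXISTS (SELECT 1 FROM y WHERE x.c = y.c AND x.d = y.d)
    ORDER BY a
""").fetchall()

join_rows = conn.execute("""
    SELECT DISTINCT x.a, x.b, x.c, x.d
    FROM x
    INNER JOIN y ON x.c = y.c AND x.d = y.d
    WHERE x.b IN (1, 2)
    ORDER BY x.a
""").fetchall()

# Same join without DISTINCT: the repeated pair in y repeats the matching x row.
join_no_distinct = conn.execute("""
    SELECT x.a, x.b, x.c, x.d
    FROM x
    INNER JOIN y ON x.c = y.c AND x.d = y.d
    WHERE x.b IN (1, 2)
""").fetchall()

print(exists_rows == join_rows, len(join_no_distinct))
```

With these rows the two DISTINCT queries agree, while the plain join returns an extra copy of the row that matches the repeated pair in y.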

UNION without comparing one of the columns

I have two queries
select A, B, C, D from T1, T2
select A, B, C, D from T2, T3
I want to do a UNION of the two queries (no duplicates) without comparing column D; that is, if columns A, B, and C are the same, the rows are considered duplicates regardless of D. I do not want to select from joined tables T1, T2, and T3. Is this possible in a single statement?
(this is Oracle)
Use UNION and GROUP BY to do this, as follows:
select A, B, C
from(
select A, B, C, D from T1, T2
union
select A, B, C, D from T2, T3
)t
group by A, B, C
You also have to specify which D value you want when A, B, and C are the same. Here I assume you want max(D):
select A, B, C, max(D) as D
from(
select A, B, C, D from T1, T2
union
select A, B, C, D from T2, T3
)t
group by A, B, C
Whichever value you want to keep, when you use GROUP BY in Oracle you can only select columns that appear in the GROUP BY clause, or other columns wrapped in aggregate functions.
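To see the effect of grouping on A, B, C while collapsing D, here is a small sketch using Python's sqlite3 module instead of Oracle. The tables q1 and q2 are hypothetical stand-ins for the outputs of the two original queries, and the rows are invented.

```python
import sqlite3

# q1 and q2 are hypothetical stand-ins for the two queries' result sets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE q1 (A, B, C, D);
CREATE TABLE q2 (A, B, C, D);
INSERT INTO q1 VALUES (1, 1, 1, 'a'), (2, 2, 2, 'b');
INSERT INTO q2 VALUES (1, 1, 1, 'z'), (3, 3, 3, 'c');
""")

merged = conn.execute("""
    SELECT A, B, C, MAX(D) AS D
    FROM (
        SELECT A, B, C, D FROM q1
        UNION
        SELECT A, B, C, D FROM q2
    ) t
    GROUP BY A, B, C
    ORDER BY A, B, C
""").fetchall()

# (1, 1, 1) appears in both inputs with different D; only one row survives.
for row in merged:
    print(row)
```

The key (1, 1, 1) occurs in both inputs with different D values, and the grouped result keeps a single row for it carrying the larger D.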

Reuse subquery results in multiple SELECT in Oracle (Can't create table)

I have multiple SELECT queries that use the same subset of data. I would like to reuse that subset without repeating the subquery or WITH clause. However, I can't CREATE TABLE or VIEW because of insufficient privileges. Is there a workaround?
I'm using TOAD Oracle.
For example,
WITH LOCAL_RESULTS
AS (SELECT a, b, c, d...
FROM SURVEY )
SELECT A, B
FROM LOCAL_RESULTS
where condition=1
WITH LOCAL_RESULTS
AS (SELECT a, b, c, d...
FROM SURVEY )
SELECT A, C
FROM LOCAL_RESULTS
where condition=2
WITH LOCAL_RESULTS
AS (SELECT a, b, c, d...
FROM SURVEY )
SELECT B, D, A...
FROM LOCAL_RESULTS
where condition=3
Thanks for any help.
A union query might work.
with local_results as
(subquery goes here)
select a, b, c, 1 condition
from local_results
where whatever
union
select a, b, null c, 2 condition
from local_results
etc
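The idea of defining the subset once and tagging each branch of the union can be sketched as follows. This uses Python's sqlite3 module rather than Oracle, and everything beyond the names SURVEY and LOCAL_RESULTS is an assumption: the filter column cond, the tag column branch, and the sample rows are all hypothetical. Columns a branch does not need are padded with NULL so that every branch has the same shape.

```python
import sqlite3

# Hypothetical survey data; "cond" stands in for the question's filter condition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey (a, b, c, d, cond);
INSERT INTO survey VALUES
    (1, 'b1', 'c1', 'd1', 1),
    (2, 'b2', 'c2', 'd2', 2),
    (3, 'b3', 'c3', 'd3', 1);
""")

rows = conn.execute("""
    WITH local_results AS (
        SELECT a, b, c, d, cond FROM survey
    )
    SELECT a, b, NULL AS c, 1 AS branch FROM local_results WHERE cond = 1
    UNION
    SELECT a, NULL AS b, c, 2 AS branch FROM local_results WHERE cond = 2
    ORDER BY branch, a
""").fetchall()

# The "branch" column records which of the original queries produced each row.
for row in rows:
    print(row)
```

The caller can then split the combined result on the tag column to recover what each separate query would have returned.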

Problem: Group BY clause showing results previously filtered out by where clause

I have something like this
select A, B, C
from tableA
where A = '123'
group by B
and the results include entries whose A is not '123'. Why are these not the results I expected?
thanks
database has 16k entries
actual result (7k entries): a mixture of entries with A='123' and A='other'
expected results (5k entries): all entries with A='123'
Your query will not work as written, because A and C appear in the SELECT list but are not in the GROUP BY clause. For C you have to use an aggregate function such as Min, Max, Avg, or Count, while for A you can use either an aggregate function or the literal value of A directly, something like:
Select Max(A) as A, B, Max(C) as C
From tableA
Where A='123'
Group by B
Or
Select '123' as A, B, Max(C) as C
From tableA
Where A='123'
Group by B
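As a sanity check, the corrected form can be run on a few rows. The sketch below uses Python's sqlite3 module with invented data (the question does not name its database), and shows that rows removed by WHERE do not reach the groups.

```python
import sqlite3

# Hypothetical rows; only the names tableA, A, B, C come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (A TEXT, B TEXT, C INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [("123", "x", 1), ("123", "x", 5), ("123", "y", 2), ("other", "x", 9), ("other", "z", 4)],
)

grouped = conn.execute("""
    SELECT '123' AS A, B, MAX(C) AS C
    FROM tableA
    WHERE A = '123'
    GROUP BY B
    ORDER BY B
""").fetchall()

# WHERE is applied before grouping, so the 'other' rows never form a group.
for row in grouped:
    print(row)
```

Group z exists only among the 'other' rows, so it is absent from the result, and the maximum for group x ignores the 'other' row with C = 9.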