How do I modify this query without increasing the number of rows returned? - sql

I've got a sub-select in a query that looks something like this:
left outer join
(select distinct ID from OTHER_TABLE) as MYJOIN
on BASE_OBJECT.ID = MYJOIN.ID
It's pretty straightforward. Checks to see if a certain relation exists between the main object being queried for and the object represented by OTHER_TABLE by whether or not MYJOIN.ID is null on the row in question.
But now the requirements have changed a little. There's another row in OTHER_TABLE that can have a value of 1 or 0, and the query needs to know whether a relation exists between the primary for a 1-value, and also if it exists for a 0 value. The obvious solutions is to put:
left outer join
(select distinct ID, TYPE_VALUE from OTHER_TABLE) as MYJOIN
on BASE_OBJECT.ID = MYJOIN.ID
But that would be wrong because if 0-type and 1-type objects both exist for the same ID, it will increase the number of rows returned by the query, which isn't acceptable. So what I need is some sort of subselect that will return 1 row for each distinct ID, with a "1-type exists" column and a "0-type exists" column. And I have no idea how to code that in SQL.
For example, for the following table,
ID | TYPE_VALUE
_________________
1 | 1
3 | 0
3 | 1
4 | 0
I'd like to see a result set like this:
ID | HAS_TYPE_0 | HAS_TYPE_1
______________________________
1 | 0 | 1
3 | 1 | 1
4 | 1 | 0
Anyone know how I could set up a query to do this? Hopefully with a minimum of ugly hacks?

In the general case, you would use EXISTS:
SELECT DISTINCT ID,
CASE WHEN EXISTS (
SELECT * FROM Table1 y
WHERE y.TYPE_VALUE = 0 AND ID = x.ID)
THEN 1
ELSE 0 END AS HAS_TYPE_0,
CASE WHEN EXISTS (
SELECT * FROM Table1 y
WHERE y.TYPE_VALUE = 1 AND ID = x.ID)
THEN 1
ELSE 0 END AS HAS_TYPE_1
FROM Table1 x;
If you have a very large number of elements in the table, this won't perform so great - those nested subselects are often a kiss of death when it comes to performance.
For your specific case, you could also use GROUP BY and MAX() and MIN() to speed things up:
SELECT
ID,
CASE WHEN MIN(TYPE_VALUE) = 0 THEN '1' ELSE 0 END AS HAS_TYPE_0,
CASE WHEN MAX(TYPE_VALUE) = 1 THEN '1' ELSE 0 END AS HAS_TYPE_1
FROM Table1
GROUP BY ID;

Instead of select distinct ID, TYPE_VALUE from OTHER_TABLE
use
select ID,
MAX(CASE WHEN TYPE_VALUE =0 THEN 1 END) as has_type_0,
MAX(CASE WHEN TYPE_VALUE =1 THEN 1 END) as has_type_1
from OTHER_TABLE
GROUP BY ID;
You can do the same using PIVOT opearator...

Related

Querying a subset

I want to write an SQL query to find records which contain a particular column and from that subset want to find records which doesn't contain a some other value. How do you write a query for that?
cid id2 attribute
--------------------------------
1 100 delete
1 100 payment
1 100 void
2 100 delete
2 102 payment
2 102 void
3 102 delete
3 103 payment
In above example, I want to list cid for which payment and delete attributes exist but void attribute doesn't exist. So it should list out 3 from above example because it doesn't have void attribute.
Forgot to mention that there could be more attributes. However, I need to list out records for which delete and payment exist regardless of other attributes but void doesn’t.
I call this a "set-within-sets" query, because you are looking for particular sets of attributes within each cid.
I would express this with group by and conditions in the having:
select cid
from t
group by cid
having sum(case when attribute = 'payment' then 1 else 0 end) > 0 and
sum(case when attribute = 'delete' then 1 else 0 end) > 0 and
sum(case when attribute = 'void' then 1 else 0 end) = 0 ;
In some databases, you can simplify this with string aggregation -- assuming there are no duplicate attributes for cids. For instance, using the MySQL function:
select cid
from t
where attribute in ('payment', 'delete' 'void')
group by cid
having group_concat(attribute order by attribute) = 'delete,payment';
You can use conditional aggregation:
select cid
from tablename
where attribute in ('delete', 'payment', 'void')
group by cid
having
count(distinct attribute) = 2
and
sum(
case attribute
when 'void' then 1
else 0
end
) = 0
If there are not more attributes than these 3, then you can omit the WHERE clause.
See the demo.
Results:
| cid |
| --- |
| 3 |
I'm assuming that there are only three attributes, so the logic behind this query is:
First COUNT the number of attributes GROUP BY cid, and then LEFT JOIN the original table ON attribute is void. You should grab cid that has exactly 2 attributes and no void.
The original table is named as temp:
SELECT
subq2.result_cid
FROM (
SELECT
*
FROM (
SELECT
T.cid AS result_cid,
COUNT(T.attribute) AS count
FROM
temp AS T
GROUP BY
T.cid
) AS subq
LEFT OUTER JOIN temp AS T2 ON subq.result_cid = T2.cid AND T2.attribute = 'void'
) AS subq2
WHERE subq2.count = 2 AND subq2.id2 IS NULL
use corelated subquery by using not exists
select t1.* from tablename t1
where not exists( select 1 from tablename t2
where t1.cid=t2.cid and attribute='void'
)
and exists ( select 1 from tablename t2
where t1.cid=t2.cid
having count(distinct attribute)=2
)
and attribute in ('payment','delete')
demo online

Selecting a group with or without certain conditions across many rows in SQL

I have data like this:
ID SomeVar
123 0
123 1
123 2
234 1
234 2
234 3
456 3
567 0
567 1
I'm trying to group by my ID to to return all of the IDs that do not have a record with the value 0. That is, my selection would look like this:
ID
234
456
Is there an easy way to do this without creating a subset table with all records not containing 0 then joining it back to the full data set where the tables don't match?
I generally try to avoid subqueries, but you could use one for this case. Do the same group by, and check that the id isn't in a subquery of ids that have 0 for SomeVar. In this case, distinct will do the same and more efficiently, so I'll do that first:
SELECT DISTINCT ID
FROM [table_name]
WHERE ID NOT IN (
SELECT ID FROM [table_name] WHERE SomeVar = 0
);
And if you want to get other information by using a GROUP BY:
SELECT ID, max(SomeVar), count(*), sum(SomeVar)
FROM [table_name]
WHERE ID NOT IN (
SELECT ID FROM [table_name] WHERE SomeVar = 0
)
GROUP BY ID;
You can use aggregation and having:
select id
from t
group by id
having min(somevar) > 0;
This assumes that somevar is never negative. If that is a possibility, then you can use the slightly more verbose:
select id
from t
group by id
having sum(case when somevar = 0 then 1 else 0 end) = 0;
Use case statement with count or sum aggregation, filter by count using having:
select ID
from
(
select ID, count(case when SomeVar=0 then 1 end) cnt
from mytable
group by ID having count(case when SomeVar=0 then 1 end) = 0
) s
;

Multiple Selects from Subquery

I have multiple queries that look like this:
select count(*) from (
SELECT * FROM TABLE1 t
JOIN TABLE2 e
USING (EVENT_ID)
) s1
WHERE
s1.SOURCE_ID = 1;
where the only difference is the t1.SOURCE_ID = (some other number). I would like to turn these into a single query that just selects from the subquery using a different SOURCE_ID for each column in the result, like this:
+----------------+----------------+----------------+
| source_1_count | source_2_count | source_3_count | ... so on
+----------------+----------------+----------------+
I am trying to avoid using the multiple queries as the join is on a very large table and takes some time, so I would rather do it once and query the result multiple times.
This is on a Snowflake data warehouse which I think uses something similar to PostgreSQL (also I'm fairly new to SQL so feel free to suggest a completely different solution as well).
Use conditional aggregation
SELECT sum(case when sourceid=1 then 1 else 0 end) source_1_count, sum(case when sourceid=2 then 1 else 0 end) source_2_count...
FROM TABLE1 t
JOIN TABLE2 e
USING (EVENT_ID)
You would put the results in separate rows, using group by:
SELECT SOURCE_ID, COUNT(*)
FROM TABLE1 t JOIN
TABLE2 e
USING (EVENT_ID)
GROUP BY SOURCE_ID;
Putting the separate sources in columns is troublesome, unless you know the exact list of sources that you want in the result set.
EDIT:
If you know the exact list of sources, you can use conditional aggregation or pivot:
SELECT SUM(CASE WHEN SOURCE_ID = 1 THEN 1 ELSE 0 END) as source_id_1,
SUM(CASE WHEN SOURCE_ID = 2 THEN 1 ELSE 0 END) as source_id_2,
SUM(CASE WHEN SOURCE_ID = 3 THEN 1 ELSE 0 END) as source_id_3
FROM TABLE1 t JOIN
TABLE2 e
USING (EVENT_ID);
All the comments so far ignore the fact that you won't have the possible benefits of pruning the data during the scan, as there are no WHERE predicates. Join can also be slower than it needs to be because of that.
This is a possible improvement:
SELECT SUM(CASE WHEN SOURCE_ID = 1 THEN 1 ELSE 0 END) as source_id_1,
SUM(CASE WHEN SOURCE_ID = 2 THEN 1 ELSE 0 END) as source_id_2,
SUM(CASE WHEN SOURCE_ID = 3 THEN 1 ELSE 0 END) as source_id_3
FROM TABLE1 t JOIN
TABLE2 e
USING (EVENT_ID);
WHERE SOURCE_ID IN (1, 2, 3)

Group by multiple criteria

Given the table like
| userid | active | anonymous |
| 1 | t | f |
| 2 | f | f |
| 3 | f | t |
I need to get:
number of users
number of users with 'active' = true
number of users with 'active' = false
number of users with 'anonymous' = true
number of users with 'anonymous' = false
with single query.
As for now, I only came out with the solution using union:
SELECT count(*) FROM mytable
UNION ALL
SELECT count(*) FROM mytable where active
UNION ALL
SELECT count(*) FROM mytable where anonymous
So I can take first number and find non-active and non-anonymous users with simple deduction .
Is there any way to get rid of union and calculate number of records matching these simple conditions with some magic and efficient query in PostgreSQL 9?
You can use an aggregate function with a CASE to get the result in separate columns:
select
count(*) TotalUsers,
sum(case when active = 't' then 1 else 0 end) TotalActiveTrue,
sum(case when active = 'f' then 1 else 0 end) TotalActiveFalse,
sum(case when anonymous = 't' then 1 else 0 end) TotalAnonTrue,
sum(case when anonymous = 'f' then 1 else 0 end) TotalAnonFalse
from mytable;
See SQL Fiddle with Demo
Assuming your columns are boolean NOT NULL, this should be a bit faster:
SELECT total_ct
,active_ct
,(total_ct - active_ct) AS not_active_ct
,anon_ct
,(total_ct - anon_ct) AS not_anon_ct
FROM (
SELECT count(*) AS total_ct
,count(active OR NULL) AS active_ct
,count(anonymous OR NULL) AS anon_ct
FROM tbl
) sub;
Find a detailed explanation for the techniques used in this closely related answer:
Compute percents from SUM() in the same SELECT sql query
Indexes are hardly going to be of any use, since the whole table has to be read anyway. A covering index might be of help if your rows are bigger than in the example. Depends on the specifics of your actual table.
-> SQLfiddle comparing to #bluefeet's version with CASE statements for each value.
SQL server folks are not used to the proper boolean type of Postgres and tend to go the long way round.

SQL (TSQL) - Select values in a column where another column is not null?

I will keep this simple- I would like to know if there is a good way to select all the values in a column when it never has a null in another column. For example.
A B
----- -----
1 7
2 7
NULL 7
4 9
1 9
2 9
From the above set I would just want 9 from B and not 7 because 7 has a NULL in A. Obviously I could wrap this as a subquery and USE the IN clause etc. but this is already part of a pretty unique set and am looking to keep this efficient.
I should note that for my purposes this would only be a one-way comparison... I would only be returning values in B and examining A.
I imagine there is an easy way to do this that I am missing, but being in the thick of things I don't see it right now.
You can do something like this:
select *
from t
where t.b not in (select b from t where a is null);
If you want only distinct b values, then you can do:
select b
from t
group by b
having sum(case when a is null then 1 else 0 end) = 0;
And, finally, you could use window functions:
select a, b
from (select t.*,
sum(case when a is null then 1 else 0 end) over (partition by b) as NullCnt
from t
) t
where NullCnt = 0;
The query below will only output one column in the final result. The records are grouped by column B and test if the record is null or not. When the record is null, the value for the group will increment each time by 1. The HAVING clause filters only the group which has a value of 0.
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
If you want to get all the rows from the records, you can use join.
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
) b ON a.b = b.b