Hive SQL - Refining JOIN query to ignore Null values - sql

I'm a little new with SQL so bear with me.
I have two tables, each with an ID column. Table A has a column titled role, Table B has a column titled outcome. I want to query these tables to find which rows based on the ID have role = 'PS' and outcome = 'DE'. Here is my code:
SELECT count(*)
FROM A JOIN B
ON (A.id = B.id
AND A.role = 'PS'
AND B.outcome = 'DE')
I've been searching the internet for a way to do this so that it doesn't include rows that have null values for either A.role or B.outcome.
The above code returns, let's say, 40,100, even though the total number of entries in B where B.outcome = 'DE' is only 40,000. So it is obviously including entries that do not fit my conditions. Is there a way to better refine my query?

Your query already excludes rows with a null value in A.role or B.outcome. After all, null = 'PS' is not true, and you're using an inner join.
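As a quick check of that claim, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for Hive here, and the sample rows are made up for illustration; the NULL comparison rules shown are standard SQL.

```python
import sqlite3

# SQLite is a stand-in for Hive; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, role TEXT);
    CREATE TABLE B (id INTEGER, outcome TEXT);
    INSERT INTO A VALUES (1, 'PS'), (2, NULL), (3, 'PS');
    INSERT INTO B VALUES (1, 'DE'), (2, 'DE'), (3, NULL);
""")

# Same shape as the query in the question.
matched = conn.execute("""
    SELECT count(*)
    FROM A JOIN B
      ON A.id = B.id
     AND A.role = 'PS'
     AND B.outcome = 'DE'
""").fetchone()[0]

# Comparing NULL with a value yields NULL (None in Python), never TRUE.
null_comparison = conn.execute("SELECT NULL = 'PS'").fetchone()[0]

print(matched)          # only id 1 survives the join
print(null_comparison)
```

Only the row pair without any NULL passes the join condition, so no extra NULL filter is needed.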
There's an easy explanation of how you can retrieve more rows from the join than there are in B. Say you have these rows for A:
A.id A.role
1 'A'
1 'A'
And these rows for B:
B.id B.outcome
1 'A'
1 'A'
Then this query:
select *
from A
join B
on A.id = B.id and A.role = 'A' and B.outcome = 'A'
will return 4 rows. That's more than there are in table A or B!
So I'd investigate whether id is unique:
select count(*) from A group by id having count(*) > 1
select count(*) from B group by id having count(*) > 1
If either query returns any rows, id is not unique in that table. Since a join repeats rows for each match, that would explain a large increase in the number of returned records.
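The row multiplication can be reproduced with a small sketch. It uses Python's sqlite3 module as a stand-in for Hive and the two-row tables from the example above; the variable names are only illustrative.

```python
import sqlite3

# SQLite stands in for Hive; the data mirrors the two-row example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, role TEXT);
    CREATE TABLE B (id INTEGER, outcome TEXT);
    INSERT INTO A VALUES (1, 'A'), (1, 'A');
    INSERT INTO B VALUES (1, 'A'), (1, 'A');
""")

# Every A row pairs with every B row that shares its id: 2 x 2 rows.
joined_rows = conn.execute("""
    SELECT *
    FROM A JOIN B
      ON A.id = B.id AND A.role = 'A' AND B.outcome = 'A'
""").fetchall()

# One result row per id that occurs more than once in A.
duplicate_ids_in_a = conn.execute("""
    SELECT id, count(*) FROM A GROUP BY id HAVING count(*) > 1
""").fetchall()

print(len(joined_rows))
print(duplicate_ids_in_a)
```

Selecting the id alongside count(*) makes it easier to see which keys are duplicated.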

Related

Select from table where another value doesn't exist in any of the joined tables

I am trying to retrieve the IDs from a table in Oracle only IF another column value doesn't exist in any of the joined tables. Let me give you an example:
[image: example]
As you can see in the sketch, Table A is joined to tables B via the ID. I would like to get the IDs from Table A only if all statuses in any joined Table B DO NOT contain the value 2.
Here is my SQL statement:
SELECT ID FROM TABLE A
LEFT JOIN TABLE B
ON A.ID = B.REF_ID
WHERE B.STATUS NOT IN (2)
Unfortunately, I still get all IDs (which makes sense) and am not able to come up with a method to retrieve only the IDs without a certain value in the Status column of a joined table. In the example, I would like to get only ID 1, since none of its joined tables contain the value 2 in their Status.
Many thanks for any inputs.
Use aggregation:
SELECT A.ID
FROM TABLE A LEFT JOIN
TABLE B
ON A.ID = B.REF_ID
GROUP BY A.ID
HAVING SUM(CASE WHEN B.STATUS IN (2) THEN 1 ELSE 0 END) = 0;
Or, simply use NOT EXISTS:
SELECT A.ID
FROM TABLE A
WHERE NOT EXISTS (SELECT 1
FROM TABLE B
WHERE A.ID = B.REF_ID AND B.STATUS IN (2)
);
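To compare the two approaches, here is a minimal sketch with Python's sqlite3 module. SQLite stands in for Oracle, the tables are simply named A and B, and the rows are invented so that only ID 1 has no status 2.

```python
import sqlite3

# SQLite stands in for Oracle; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (REF_ID INTEGER, STATUS INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, 1), (1, 3), (2, 1), (2, 2), (3, 2);
""")

# Aggregation: keep an ID only if none of its joined rows has STATUS 2.
with_aggregation = conn.execute("""
    SELECT A.ID
    FROM A LEFT JOIN B ON A.ID = B.REF_ID
    GROUP BY A.ID
    HAVING SUM(CASE WHEN B.STATUS IN (2) THEN 1 ELSE 0 END) = 0
    ORDER BY A.ID
""").fetchall()

# NOT EXISTS: keep an ID only if no B row with STATUS 2 references it.
with_not_exists = conn.execute("""
    SELECT A.ID
    FROM A
    WHERE NOT EXISTS (SELECT 1
                      FROM B
                      WHERE A.ID = B.REF_ID AND B.STATUS IN (2))
    ORDER BY A.ID
""").fetchall()

print(with_aggregation, with_not_exists)
```

Both queries would also return an ID that has no rows in B at all, since such an ID trivially has no status 2.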

Query to find all records in A that has no join records in B for specific criteria

I am trying to construct a query to find all records in a PostgreSQL table (a) for which:
There are no records at all in table b having state='ready'
The reason there are no records in table b with state='ready' doesn't matter (there may be no association, the association may be null, the state may be different, or the state may be null)
In the below example I expect to find records of ID: 2,3,4 in table a.
I have tried with a left join, but I can't make it work.
PS. The query has to be performant, since tables have millions of records.
https://www.db-fiddle.com/f/5thdgDkv5B6Mx56NyHfoiz/0
Presumably, you just want not exists:
select a.*
from a
where not exists (select 1
from b
where b.? = a.? and -- whatever the join conditions are
b.state = 'ready'
);
Try this:
SELECT * FROM a WHERE id NOT IN
(SELECT b.a_id FROM b WHERE b.state = 'ready'
AND b.a_id IS NOT NULL); -- a NULL in the list would make NOT IN match nothing
https://www.db-fiddle.com/f/5thdgDkv5B6Mx56NyHfoiz/2
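Since the question allows the association to be null, it is worth seeing how NOT EXISTS and NOT IN behave in that case. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL; the column name a_id comes from the second answer, and the rows are invented so that IDs 2, 3 and 4 are the expected result.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER, state TEXT);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (1, 'ready'), (2, 'pending'), (3, NULL), (NULL, 'ready');
""")

def ids(sql):
    """Run a query and return the first column as a plain list."""
    return [row[0] for row in conn.execute(sql)]

not_exists = ids("""
    SELECT a.id FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b
                      WHERE b.a_id = a.id AND b.state = 'ready')
    ORDER BY a.id
""")

# The subquery yields (1, NULL); "id NOT IN (1, NULL)" is never TRUE.
not_in_plain = ids("""
    SELECT id FROM a
    WHERE id NOT IN (SELECT b.a_id FROM b WHERE b.state = 'ready')
    ORDER BY id
""")

# Filtering out NULLs inside the subquery restores the expected result.
not_in_guarded = ids("""
    SELECT id FROM a
    WHERE id NOT IN (SELECT b.a_id FROM b
                     WHERE b.state = 'ready' AND b.a_id IS NOT NULL)
    ORDER BY id
""")

print(not_exists, not_in_plain, not_in_guarded)
```

NOT EXISTS is unaffected by NULL keys, which is one reason it is often the safer default for this kind of anti-join.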

Query two tables by a condition from a third table

Here is my case:
table a
table b
table c (type int)
if c.type = 1 select all rows in table a
if c.type = 2 select all rows in table b
Currently my solution is find all rows in 3 tables and handle result to get values but it's really bad.
You don't specify what the relationship is between the tables. The expression c.type refers to a row, not to the entire table. So, let me assume that c.type = 1 means "there exists a row where c.type = 1".
The solution to this problem is then conditional union all:
select a.*
from tablea a
where exists (select 1 from tablec c where c.type = 1)
union all
select b.*
from tableb b
where exists (select 1 from tablec c where c.type = 2)
This assumes that the columns are the same in a and b. Otherwise, you need to specify the correct set of columns.
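A minimal sketch of the conditional union all, run through Python's sqlite3 module. The column layout (id, name) and the sample rows are assumptions made only so the example is runnable.

```python
import sqlite3

# Hypothetical schema: both tables share the same two columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER, name TEXT);
    CREATE TABLE tableb (id INTEGER, name TEXT);
    CREATE TABLE tablec (type INTEGER);
    INSERT INTO tablea VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO tableb VALUES (10, 'b1');
    INSERT INTO tablec VALUES (1);
""")

QUERY = """
    SELECT a.* FROM tablea a
    WHERE EXISTS (SELECT 1 FROM tablec c WHERE c.type = 1)
    UNION ALL
    SELECT b.* FROM tableb b
    WHERE EXISTS (SELECT 1 FROM tablec c WHERE c.type = 2)
"""

# tablec holds only type 1, so only the tablea branch contributes rows.
only_type_1 = sorted(conn.execute(QUERY).fetchall())

# Once a type 2 row exists, the tableb branch contributes as well.
conn.execute("INSERT INTO tablec VALUES (2)")
both_types = sorted(conn.execute(QUERY).fetchall())

print(only_type_1)
print(both_types)
```

Each branch is switched on or off as a whole by its EXISTS test, so a single round trip replaces fetching all three tables.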

Is it possible to have different conditions for each row in a query?

How can I select a set of rows where each row matches a different condition?
Example:
Supposing I have a table with a column called name, I want the result ONLY IF the first row name matches 'A', the second row name matches 'B' and the third row name matches 'C'.
Edit:
I want this to work without a fixed size, so that I can define a sequence like R,X,V,P,T and have it match rows one by one, in that order.
You can, but probably not in the way you would want:
If your table has a numeric id field that is incremented with each row, you can join the table to itself so that it appears three times (let's say as "a", "b" and "c"), use the join conditions a.id + 1 = b.id and b.id + 1 = c.id, and put your filter in a where clause like: a.name = 'A' AND b.name = 'B' AND c.name = 'C'
But don't expect performance ...
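The self-join idea can be sketched as follows, using Python's sqlite3 module. The table name t and its rows are hypothetical, and the sketch assumes the ids are consecutive with no gaps.

```python
import sqlite3

# Hypothetical table with gap-free, increasing ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO t VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
""")

# Three aliases of the same table, chained by "next id".
sequence_hits = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM t a
    JOIN t b ON a.id + 1 = b.id
    JOIN t c ON b.id + 1 = c.id
    WHERE a.name = 'A' AND b.name = 'B' AND c.name = 'C'
""").fetchall()

print(sequence_hits)
```

The query returns one row per place where the sequence A, B, C occurs on consecutive ids; a longer sequence needs one more alias and join per element, which is why this does not generalize well.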
Assuming that you know how to assign a row number to your rows (ROW_NUMBER() in SQL Server, for instance), you can create a lookup (match) table and join on it. See below for an explanation:
LookupTable:
RowNum Name
1 A
2 B
3 C
SourceTable (assuming you have already added RowNum to it; if you haven't, just introduce a subquery for it, or a CTE on SQL Server 2005 or newer):
RowNum Name
-----------
1 A
2 B
3 C
4 D
Now inner join LookupTable with SourceTable on LookupTable.RowNum = SourceTable.RowNum AND LookupTable.Name = SourceTable.Name, then left join LookupTable to that result on RowNum only. If any LookupTable row has no partner in the final result (its Matched column is NULL), you know there is no complete match on at least one row.
Here is the code for the joins:
SELECT LT2.RowNum, LT2.Name, T.RowNum AS Matched
FROM LookupTable LT2
LEFT JOIN
(
SELECT ST.*
FROM SourceTable ST
INNER JOIN LookupTable LT ON LT.RowNum = ST.RowNum AND LT.Name = ST.Name
) T
ON LT2.RowNum = T.RowNum
The result set of the above query contains a row with Matched IS NULL for every LookupTable row whose condition is not met by SourceTable.
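Here is a runnable sketch of the lookup-table approach using Python's sqlite3 module in place of SQL Server. It uses a variant of the join that exposes the subquery's RowNum as Matched, so a NULL there marks a lookup row with no matching source row. RowNum is stored as a plain column here rather than computed with ROW_NUMBER(), which is an assumption made to keep the example short.

```python
import sqlite3

# SQLite stands in for SQL Server; RowNum is a stored column in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LookupTable (RowNum INTEGER, Name TEXT);
    CREATE TABLE SourceTable (RowNum INTEGER, Name TEXT);
    INSERT INTO LookupTable VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO SourceTable VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
""")

QUERY = """
    SELECT LT2.RowNum, T.RowNum AS Matched
    FROM LookupTable LT2
    LEFT JOIN (
        SELECT ST.*
        FROM SourceTable ST
        INNER JOIN LookupTable LT
            ON LT.RowNum = ST.RowNum AND LT.Name = ST.Name
    ) T ON LT2.RowNum = T.RowNum
    ORDER BY LT2.RowNum
"""

# Every lookup row finds its partner: the sequence A, B, C matches.
full_match = conn.execute(QUERY).fetchall()

# Break the second position; its Matched value becomes NULL (None).
conn.execute("UPDATE SourceTable SET Name = 'X' WHERE RowNum = 2")
broken_match = conn.execute(QUERY).fetchall()

sequence_ok = all(matched is not None for _, matched in broken_match)
print(full_match, broken_match, sequence_ok)
```

Because the expected sequence lives in a table, its length can change without rewriting the query.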
I suppose you could do a sub query for each row, but it wouldn't perform well or scale well at all and would be hard to maintain.
This may be close to what you're after, but I need to know where you're getting your values for A, B, C etc.
Select [insert your fields here]
FROM
(Select T1.Name, T1.Age, RowNum as t1RowNum from T T1 order by name) T1O
Full Outer JOIN
(Select T2.Name, T2.Age, RowNum as T2rowNum From T T2 order By name) T2O
ON T1O.T1RowNum+1 = T2O.T2RowNum

Select 2 Rows from Table when COUNT of another table

Here is the code that I currently have:
SELECT `A`.*
FROM `A`
LEFT JOIN `B` ON `A`.`A_id` = `B`.`value_1`
WHERE `B`.`value_2` IS NULL
AND `B`.`userid` IS NULL
ORDER BY RAND() LIMIT 2
What it is currently supposed to do is select 2 rows from A whose A_id does not appear in value_1 or value_2 in B. The rows in B are specific to individual users, identified by userid.
What I also need is a check for whether there are already N rows in B matching an A_id (in either value_1 or value_2) and the userid; if there are more than N such rows, the A row should not be selected.
The following would handle your first request:
Select ...
From A
Left Join B
On ( B.value_1 = A.A_id Or B.value_2 = A.A_id )
And B.userid = @userid
Where B.<non-nullable column> Is Null
Part of the trick is moving your criteria into the ON clause of the Left Join. I'm not sure how the second part of your request fits with the first part. If there are no rows in B that match on value_1 or value_2 for the given user, then by definition that row count will be zero. Do you instead want to allow up to a maximum number of rows in B matching the given criteria? If so, then I'd write my query like so:
Select ...
From A
Where (
Select Count(*)
From B B2
Where ( B2.value_1 = A.A_id Or B2.value_2 = A.A_id )
And B2.userid = @userid
) <= @MaxItems
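The count-based filter can be tried out with a small sketch in Python's sqlite3 module. SQLite stands in for MySQL, the user id and limit are passed as named parameters (:userid and :max_items, names chosen for this example), and the rows are made up. The ORDER BY RAND() LIMIT 2 part of the question is left out so the output stays deterministic.

```python
import sqlite3

# SQLite stands in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (A_id INTEGER);
    CREATE TABLE B (value_1 INTEGER, value_2 INTEGER, userid INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, 9, 7), (8, 1, 7), (2, 9, 7), (1, 9, 99);
""")

QUERY = """
    SELECT A.A_id
    FROM A
    WHERE (SELECT COUNT(*)
           FROM B B2
           WHERE (B2.value_1 = A.A_id OR B2.value_2 = A.A_id)
             AND B2.userid = :userid) <= :max_items
    ORDER BY A.A_id
"""

def allowed_ids(userid, max_items):
    """Return the A_ids with at most max_items matching B rows for userid."""
    params = {"userid": userid, "max_items": max_items}
    return [row[0] for row in conn.execute(QUERY, params)]

# For user 7: A_id 1 has two B rows, A_id 2 has one, A_id 3 has none.
print(allowed_ids(7, 0))
print(allowed_ids(7, 1))
print(allowed_ids(7, 2))
```

With a limit of zero the query behaves like the anti-join from the first part of the answer; raising the limit lets in rows that already have a few matches.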