Restrict many - many results in SQL join - sql

The SQL below contains some DDL and a simple query.
The result I am getting is
a1|b1|c1
a1|b2|c3
a3|b3|c2
a3|b3|c3
a3|b3|c4
a3|b3|c5
a3|b5|c6
a3|b5|c7
The result I want is
a1 |b1 |c1
a1 |b2 |c3
a3 |b3 |c2
null |null |c4
null |null |c5
a3 |b5 |c6
null |null |c7
I tried using MAX, MIN, rownums and what not. I am at my wit's end. I am including only the base query I started with and not all the options I tried because they don't work at all. Any help is appreciated!
BEGIN TRANSACTION;
drop table if exists table_A;
drop table if exists table_B;
drop table if exists table_C;
/* Create the three tables */
CREATE TABLE table_A(a_Id text PRIMARY KEY, val_a text);
CREATE TABLE table_B(a_Id text, b_Id text, val_b text);
CREATE TABLE table_C(b_Id text, c_Id text, val_c text);
/* Insert a few records into each table */
INSERT INTO table_A VALUES('a1','va1');
INSERT INTO table_A VALUES('a2','va2');
INSERT INTO table_A VALUES('a3','va3');
INSERT INTO table_B VALUES('a1', 'b1','vb1');
INSERT INTO table_B VALUES('a1', 'b2','vb2');
INSERT INTO table_B VALUES('a3', 'b3','vb31');
INSERT INTO table_B VALUES('a2', 'b4','vb4');
INSERT INTO table_B VALUES('a3', 'b5','vb31');
INSERT INTO table_C VALUES('b1', 'c1','vc1');
INSERT INTO table_C VALUES('b3', 'c2','vc2');
INSERT INTO table_C VALUES('b3', 'c3','vc3');
INSERT INTO table_C VALUES('b2', 'c3','vc3');
INSERT INTO table_C VALUES('b3', 'c4','vc2');
INSERT INTO table_C VALUES('b3', 'c5','vc3');
INSERT INTO table_C VALUES('b5', 'c6','vc3');
INSERT INTO table_C VALUES('b5', 'c7','vc3');
COMMIT;
select
a.a_Id, b.b_Id, c.c_Id
from
table_A as a
join
table_B as b
on a.a_Id = b.a_Id
join
table_C as c
on b.b_Id = c.b_Id;

Something like this should work (I have tested it on PostgreSQL; it should work on Oracle too):
SELECT
case when row_number = 1 then a_id end as a_id,
case when row_number = 1 then b_id end as b_id,
c_id
FROM (
SELECT
a.a_Id,
b.b_Id,
c.c_Id,
row_number() OVER (partition by a.a_id, b.b_id order by c.c_id) as row_number, --for a_id, b_id
row_number() OVER (partition by c.c_id order by a.a_id, b.b_id) as row_number2 --to avoid c_id duplicates; ordering by a_id, b_id makes the kept row deterministic
FROM
table_A a
join
table_B b on a.a_Id = b.a_Id
join table_C c on b.b_Id = c.b_Id
) innerquery
WHERE
row_number2 = 1 --this is to avoid c_id duplicates
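The query above can be checked against the question's sample data. The following is a minimal sketch, assuming SQLite 3.25 or later (for window functions) accessed through Python's sqlite3 module as a stand-in for PostgreSQL or Oracle. The aliases rn, rn2, shown_a and shown_b are illustrative names, not part of the original answer, and the duplicate filter orders by a_id, b_id so the surviving c3 row is deterministic.

```python
import sqlite3

# In-memory SQLite database loaded with the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A(a_Id text PRIMARY KEY, val_a text);
CREATE TABLE table_B(a_Id text, b_Id text, val_b text);
CREATE TABLE table_C(b_Id text, c_Id text, val_c text);
INSERT INTO table_A VALUES ('a1','va1'),('a2','va2'),('a3','va3');
INSERT INTO table_B VALUES ('a1','b1','vb1'),('a1','b2','vb2'),
  ('a3','b3','vb31'),('a2','b4','vb4'),('a3','b5','vb31');
INSERT INTO table_C VALUES ('b1','c1','vc1'),('b3','c2','vc2'),
  ('b3','c3','vc3'),('b2','c3','vc3'),('b3','c4','vc2'),
  ('b3','c5','vc3'),('b5','c6','vc3'),('b5','c7','vc3');
""")

dedup_rows = conn.execute("""
SELECT CASE WHEN rn = 1 THEN a_id END AS shown_a,  -- only on the first row of each (a, b) group
       CASE WHEN rn = 1 THEN b_id END AS shown_b,
       c_id
FROM (
  SELECT a.a_Id AS a_id, b.b_Id AS b_id, c.c_Id AS c_id,
         row_number() OVER (PARTITION BY a.a_Id, b.b_Id ORDER BY c.c_Id) AS rn,
         row_number() OVER (PARTITION BY c.c_Id ORDER BY a.a_Id, b.b_Id) AS rn2
  FROM table_A a
  JOIN table_B b ON a.a_Id = b.a_Id
  JOIN table_C c ON b.b_Id = c.b_Id
) AS q
WHERE rn2 = 1                      -- keep one row per c_id
ORDER BY q.a_id, q.b_id, q.c_id    -- sort on the underlying ids, not the blanked ones
""").fetchall()

for row in dedup_rows:
    print(row)
```

On this data the sketch prints the seven rows the question asks for. Note that rn is numbered before the duplicate filter runs, so if the first row of a group were the one removed as a duplicate, that group would lose its a_id and b_id labels; the sample data does not hit that case.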

First, this sounds like something that is better handled in your presentation logic. However, it is possible to accomplish with SQL alone.
You can take advantage of Oracle's LAG() function along with CASE to check if the previous row had the same a and b id values.
Here's an example using a common table expression:
with cte as (
select
a.a_Id, b.b_Id, c.c_Id,
-- c_Id is part of the window ordering so that "previous row" is well defined
lag (a.a_Id,1) over (order by a.a_Id, b.b_Id, c.c_Id) prev_a_Id,
lag (b.b_Id,1) over (order by a.a_Id, b.b_Id, c.c_Id) prev_b_Id
from table_A a
join table_B b
on a.a_Id = b.a_Id
join table_C c
on b.b_Id = c.b_Id
)
select
case
when prev_a_Id is null or
prev_a_Id <> a_Id or
prev_b_Id <> b_Id
then a_id
end new_a_Id,
case
-- same condition as above: show b_id only when the (a, b) pair changes
when prev_a_Id is null or
prev_a_Id <> a_Id or
prev_b_Id <> b_Id
then b_id
end new_b_Id, c_Id
from cte
order by a_Id, b_Id, c_Id;
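To see what the LAG() approach returns on the sample data, here is a small sketch. It assumes SQLite 3.25 or later through Python's sqlite3 module rather than Oracle, uses the same "changed from the previous row" condition for both columns, and adds c_Id to the window ordering so the previous row is well defined. The names prev_a, prev_b, new_a and new_b are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A(a_Id text PRIMARY KEY, val_a text);
CREATE TABLE table_B(a_Id text, b_Id text, val_b text);
CREATE TABLE table_C(b_Id text, c_Id text, val_c text);
INSERT INTO table_A VALUES ('a1','va1'),('a2','va2'),('a3','va3');
INSERT INTO table_B VALUES ('a1','b1','vb1'),('a1','b2','vb2'),
  ('a3','b3','vb31'),('a2','b4','vb4'),('a3','b5','vb31');
INSERT INTO table_C VALUES ('b1','c1','vc1'),('b3','c2','vc2'),
  ('b3','c3','vc3'),('b2','c3','vc3'),('b3','c4','vc2'),
  ('b3','c5','vc3'),('b5','c6','vc3'),('b5','c7','vc3');
""")

lag_rows = conn.execute("""
WITH cte AS (
  SELECT a.a_Id AS a_id, b.b_Id AS b_id, c.c_Id AS c_id,
         lag(a.a_Id) OVER (ORDER BY a.a_Id, b.b_Id, c.c_Id) AS prev_a,
         lag(b.b_Id) OVER (ORDER BY a.a_Id, b.b_Id, c.c_Id) AS prev_b
  FROM table_A a
  JOIN table_B b ON a.a_Id = b.a_Id
  JOIN table_C c ON b.b_Id = c.b_Id
)
SELECT CASE WHEN prev_a IS NULL OR prev_a <> a_id OR prev_b <> b_id
            THEN a_id END AS new_a,   -- blank unless the (a, b) pair changed
       CASE WHEN prev_a IS NULL OR prev_a <> a_id OR prev_b <> b_id
            THEN b_id END AS new_b,
       c_id
FROM cte
ORDER BY a_id, b_id, c_id
""").fetchall()

for row in lag_rows:
    print(row)
```

Unlike the desired output in the question, this keeps the second c3 row (as a blanked row under a3/b3), because LAG() only suppresses repeated labels and does not remove duplicate c_Id values.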

select t1.a_id, t1.b_id, table_c.c_id
from table_c
left join
(
select a_Id, b_Id, c_Id
from
(
select a.a_Id as a_id, b.b_Id as b_id, c.c_Id as c_id,
ROW_NUMBER() OVER (PARTITION BY a.a_ID, b.b_id ORDER BY C_ID) as aNum
from table_A as a
join table_B as b on a.a_Id = b.a_Id
join table_C as c on b.b_Id = c.b_Id
) t2
where aNum = 1
) t1 on table_c.c_id = t1.c_id
order by table_c.c_id
fiddle:
http://sqlfiddle.com/#!3/6049b/1

I don't think I fully understood what you are trying to achieve, so please correct me if I'm wrong.
First, you use an inner join to join the tables (at least that is what a plain JOIN means in SQL Server, and it should be the same in Oracle). That means, for example, that you get a row from the first table only if it has a corresponding row in the second table. Likewise, if the first table has no rows corresponding to some rows in the second table, those rows from the second table never appear in the results.
Judging by the result you describe, what you need is an outer join. Which one exactly (left, right, or full outer join) depends on what you are trying to achieve; it looks like you need a left outer join or a full outer join. I'm not quite sure what your exact aim is, because you explain how the data should look in this concrete example rather than in the general case.
So please have a look at a description of the different SQL join types and choose the one you need.
One more important remark: text is probably the last type on the list that I would consider for a primary key.
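The difference between the join types described above can be sketched on a cut-down version of the question's tables. This assumes SQLite through Python's sqlite3 module; the reduced data set (a2/b4 with no table_C rows, a3 with no table_B rows) is chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A(a_Id text PRIMARY KEY, val_a text);
CREATE TABLE table_B(a_Id text, b_Id text, val_b text);
CREATE TABLE table_C(b_Id text, c_Id text, val_c text);
INSERT INTO table_A VALUES ('a1','va1'),('a2','va2'),('a3','va3');
INSERT INTO table_B VALUES ('a1','b1','vb1'),('a2','b4','vb4');
INSERT INTO table_C VALUES ('b1','c1','vc1');
""")

# Inner joins: a row survives only if it has a match in every joined table.
inner_rows = conn.execute("""
SELECT a.a_Id, b.b_Id, c.c_Id
FROM table_A a
JOIN table_B b ON a.a_Id = b.a_Id
JOIN table_C c ON b.b_Id = c.b_Id
ORDER BY a.a_Id
""").fetchall()

# Left outer joins: every table_A row is kept; missing matches become NULL.
left_rows = conn.execute("""
SELECT a.a_Id, b.b_Id, c.c_Id
FROM table_A a
LEFT JOIN table_B b ON a.a_Id = b.a_Id
LEFT JOIN table_C c ON b.b_Id = c.b_Id
ORDER BY a.a_Id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Here the inner join returns only the a1/b1/c1 row, while the left join also returns a2/b4 with a NULL c_Id and a3 with NULLs for both b_Id and c_Id.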

Related

SQL inner join with conditional selection

I am new to SQL. Let's say I have two tables, table_A and table_B, and I want to create a view, view_1, from the two of them.
table_A:
id | foo
1 | d
2 | e
null | f
table_B:
id | name
1 | a
2 | b
3 | c
When I use this query:
SELECT DISTINCT table_A.id, table_B.name
FROM table_A
INNER JOIN table_B ON table_B.id = table_A.id
the row with the null id in table_A does not appear in view_1, since it has no match in table_B. I want view_1 to show this null row as well, like:
id | name
1 | a
2 | b
null | no entry
Should I create a fourth table? I couldn't find a way.
Try this Query:
SELECT DISTINCT a.id,(CASE When b.name IS NULL OR b.name = '' Then 'No Entry' else b.name end) name FROM table_A a
LEFT JOIN table_B b on a.id = b.id
You are looking for an outer join. Thus you keep all table_A rows and join table_B rows where they exist. If no match exists, the table_B columns in the joined row are NULL.
You replace NULLs with a value with COALESCE.
SELECT a.id, COALESCE(b.name, 'no entry') AS name
FROM table_a a
LEFT OUTER JOIN table_b b ON b.id = a.id
ORDER BY a.id NULLS LAST;
You haven't tagged your request with your DBMS. Not all DBMS support the NULLS LAST clause.
Please note that there is no DISTINCT in my query. It is not needed. And every time you think you must use DISTINCT, think twice. SELECT DISTINCT is very seldom needed. Most often it is used, because the query is kind of flawed and causes the undesired duplicates itself.
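As a quick check of the outer join plus COALESCE approach, here is a sketch using the question's two tables. It assumes SQLite through Python's sqlite3 module. Because NULLS LAST is not available everywhere, the sketch sorts with the more portable `a.id IS NULL, a.id` instead, which is a substitution and not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a(id INTEGER, foo TEXT);
CREATE TABLE table_b(id INTEGER, name TEXT);
INSERT INTO table_a VALUES (1,'d'),(2,'e'),(NULL,'f');
INSERT INTO table_b VALUES (1,'a'),(2,'b'),(3,'c');
""")

view_rows = conn.execute("""
SELECT a.id, COALESCE(b.name, 'no entry') AS name
FROM table_a a
LEFT OUTER JOIN table_b b ON b.id = a.id
ORDER BY a.id IS NULL, a.id   -- non-NULL ids first, then the NULL row
""").fetchall()

print(view_rows)  # [(1, 'a'), (2, 'b'), (None, 'no entry')]
```

The NULL id never equals anything in table_b, so its row comes back with a NULL name, which COALESCE turns into 'no entry'. No DISTINCT is needed because each table_a row matches at most one table_b row.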

Select Name instead OF ID in table with ID-Ref Column SQL

Let's say we have two tables:
Table A: A_ID, A_Name
Table B: B_ID, A_ID
I need a select statement that selects * from Table B, showing the A_Name instead of the A_ID.
While trying, I came up with the following select statement, which doesn't work too well. It gives me a lot of nulls, but no names.
SELECT B_ID,
(select A_NAME from TableA as A where A.A_ID = B.A_ID) as Name
FROM TableB as B
Thanks for all your Answers.
The final solution:
The shown query DOES work (even though it may be slow) and the solutions in the answers also do work.
The problem why it didn't give results for me was because of my data. On another database with the same schema all the commands work.
You should try LEFT JOIN
SELECT
B_ID, A_Name
FROM
tableB B LEFT JOIN tableA A
ON B.A_ID = A.A_ID
you can do it with a join:
SELECT B.B_ID, A.A_Name
FROM B
INNER JOIN A
ON A.A_ID = B.A_ID;
Edit:
If you want all the entries from table B, even those without a matching row in table A, you can do it with a left join, like @jarlh said:
SELECT B.B_ID, A.A_Name
FROM B
LEFT JOIN A
ON A.A_ID = B.A_ID;
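The three variants discussed in this thread can be compared side by side. This is a sketch assuming SQLite through Python's sqlite3 module; the sample rows (including a TableB row whose A_ID is NULL) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA(A_ID INTEGER PRIMARY KEY, A_Name TEXT);
CREATE TABLE TableB(B_ID INTEGER PRIMARY KEY, A_ID INTEGER);
INSERT INTO TableA VALUES (1,'x'),(2,'y');
INSERT INTO TableB VALUES (10,1),(11,2),(12,NULL);  -- row 12 references nothing
""")

# The question's correlated scalar subquery.
subquery_rows = conn.execute("""
SELECT B_ID,
       (SELECT A_Name FROM TableA AS A WHERE A.A_ID = B.A_ID) AS Name
FROM TableB AS B
ORDER BY B_ID
""").fetchall()

# LEFT JOIN: keeps every TableB row, NULL name when there is no match.
left_rows = conn.execute("""
SELECT B.B_ID, A.A_Name
FROM TableB B
LEFT JOIN TableA A ON A.A_ID = B.A_ID
ORDER BY B.B_ID
""").fetchall()

# INNER JOIN: drops TableB rows without a match.
inner_rows = conn.execute("""
SELECT B.B_ID, A.A_Name
FROM TableB B
INNER JOIN TableA A ON A.A_ID = B.A_ID
ORDER BY B.B_ID
""").fetchall()

print(subquery_rows, left_rows, inner_rows, sep="\n")
```

The scalar subquery and the LEFT JOIN return the same rows here, so NULL names from either of them point at TableB rows whose A_ID has no counterpart in TableA, which fits the asker's finding that the data was the problem.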

SQL Query Duplicating records

I've got two tables.
Let's call them table_A and table_B.
Table_B contains the ForeignKey of table_A.
Table_A
ID Name
1 A
2 B
3 C
Table_B
ID table_a_fk
1 2
2 3
Now I want to get all the names out of table_a IF table_b does not contain the ID of the record in table_a.
I've tried it with this query:
SELECT a.name
FROM table_a a, table_b b
WHERE a.id != b.table_a_fk
With this query I'm getting the right result, but I get it about five times and I don't know why.
I hope someone can explain that to me.
Your query creates a cartesian product between your two tables A and B. It is the cartesian product that generates those duplicate values. Instead, you want to use an anti-join, which is most commonly written in SQL using NOT EXISTS
SELECT a.name
FROM table_a a
WHERE NOT EXISTS (
SELECT *
FROM table_b b
WHERE a.id = b.table_a_fk
)
Another way to express an anti-join is with NOT IN (safe only if table_b.table_a_fk can never be NULL, e.g. because it is declared NOT NULL):
SELECT a.name
FROM table_a a
WHERE a.id NOT IN (
SELECT b.table_a_fk
FROM table_b b
)
Another, less common way to express an anti-join:
SELECT a.name
FROM table_a a
LEFT OUTER JOIN table_b b ON a.id = b.table_a_fk
WHERE b.id IS NULL
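The behaviour described above can be reproduced with the question's data. This is a sketch assuming SQLite through Python's sqlite3 module; the helper function names() and the extra table_b row with a NULL foreign key are added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table_b(id INTEGER PRIMARY KEY, table_a_fk INTEGER);
INSERT INTO table_a VALUES (1,'A'),(2,'B'),(3,'C');
INSERT INTO table_b VALUES (1,2),(2,3);
""")

def names(sql):
    """Run a query and return its first column as a list (hypothetical helper)."""
    return [row[0] for row in conn.execute(sql)]

NOT_EXISTS = """
SELECT a.name FROM table_a a
WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.table_a_fk = a.id)
"""

# Comma join: one output row per (a, b) pair whose ids differ.
cross = names("""
SELECT a.name FROM table_a a, table_b b
WHERE a.id != b.table_a_fk
ORDER BY a.name
""")

# Anti-join: names that no table_b row references.
anti = names(NOT_EXISTS)

# NOT IN pitfall: a single NULL in the subquery makes the predicate
# never true, so no rows come back at all.
conn.execute("INSERT INTO table_b VALUES (3, NULL)")
not_in = names("""
SELECT a.name FROM table_a a
WHERE a.id NOT IN (SELECT b.table_a_fk FROM table_b b)
""")
anti_with_null = names(NOT_EXISTS)

print(cross, anti, not_in, anti_with_null)
```

On this data the comma join returns A twice and also returns B and C, because each of them differs from at least one foreign key. NOT EXISTS returns only A, with or without the NULL row, while NOT IN returns nothing once the NULL is present.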
use distinct
SELECT distinct a.name
FROM table_a a, table_b b
WHERE a.id != b.table_a_fk
or better is...
Select distinct name
from table_a a
Where not exists (Select * from table_b
Where table_a_fk = a.id)

Trouble with select statement on three table join

I'm trying to write a query that will return a set of columns from three different tables.
The table that links the other two is called Table_A; it contains the keys for the other two tables.
The second table is called Table_B and the last table is called Table_C.
Table_A columns
| a_ID (primary key) | b_ID (foreign key) | c_ID (foreign key) | ..... |
Table_B columns
|b_ID (primary key)| b1 | b2 | ...... |
Table_C columns
| c_ID (primary key) | c1 | c2 | ...... |
This is my SQL query below. (I'm only concerned with the columns above, although there are more in each table.)
SELECT b.b_ID
, b.b1
, b.b2
, a.a_ID
, c.c1
, c.c2
FROM Table_A AS a
JOIN Table_B AS b ON a.b_ID = b.b_ID
JOIN Table_C AS c ON a.c_ID = c.c_ID
I'm using OpenOffice for my projects and the error I'm getting is
"Table not found in statement [ SELECT b.b_ID
, b.b1
, b.b2
, a.a_ID
, c.c1
, c.c2
FROM Table_A AS a
JOIN Table_B AS b ON a.b_ID = b.b_ID
JOIN Table_C AS c ON a.c_ID = c.c_ID]"
For some reason, if I change the select statement to get all columns (*), it returns the correct results, but I need to narrow it down to the columns listed in my query.
SELECT *
FROM Table_A AS a
JOIN Table_B AS b ON a.b_ID = b.b_ID
JOIN Table_C AS c ON a.c_ID = c.c_ID
EDIT: I have removed the actual table and column names so that you don't have to understand the story to help with the issue.
Shouldn't your WHERE clause be:
WHERE b.eventStartDate > '2013-10-01'
?
This is a "spot the difference" type of question...

SQL: How to find the minimum number of entries linked from one table to another

Let's say I have some Table_A:
A_id | A_val
1 a
2 b
3 c
Some Table_B:
B_id | B_val
1 d
2 e
3 g
and a linker Table_C:
A_id | B_id
1 1
2 1
2 2
3 1
3 2
3 3
I need help finding the items in Table A that have the fewest items in Table B linked to them. I'm a beginner with SQL, using PostgreSQL, and figured it may have something to do with using a sub-query. I've managed to count the links using:
SELECT A_id, COUNT(B_id) as Num_links
FROM TABLE_C
GROUP BY A_id;
But I've no idea where to go from here.
You could use a with clause to give an alias to your "count" query and treat it like a temp table. Then select the a_id with the num_links less-than-or-equal-to the lowest count in num_links.
WITH link_counts AS (
SELECT a_id, COUNT(b_id) as num_links
FROM table_c
GROUP BY a_id
)
SELECT a_id
FROM link_counts
WHERE num_links <= (SELECT MIN(num_links) FROM link_counts)
Note that this could return multiple rows if different a_id have same (lowest) number of links (for instance if a_id 1 and 4 both only had 1 link each).
You can use RANK(). This will rank your Aids by COUNT(Bid); those that have the same count are all returned with the same rank.
SELECT *
FROM A T1
JOIN (
SELECT Aid, RANK() OVER (ORDER BY COUNT(Bid)) rnk
FROM C
GROUP BY Aid
) T2 ON T1.Id = T2.Aid
WHERE T2.rnk = 1
And here is the Fiddle.
Good luck.
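The tie behaviour of RANK() can be sketched as follows. This assumes SQLite 3.25 or later through Python's sqlite3 module instead of PostgreSQL, uses the question's a_id/b_id column names, and adds a hypothetical fourth a_id so that two ids share the lowest count. The counts are computed in a CTE first and ranked in a second step, which is a layout choice for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_c(a_id INTEGER, b_id INTEGER);
INSERT INTO table_c VALUES (1,1),(2,1),(2,2),(3,1),(3,2),(3,3),
  (4,2);  -- extra row: a_id 4 ties with a_id 1 at one link
""")

ranked_rows = conn.execute("""
WITH link_counts AS (
  SELECT a_id, COUNT(b_id) AS num_links
  FROM table_c
  GROUP BY a_id
),
ranked AS (
  SELECT a_id, num_links,
         RANK() OVER (ORDER BY num_links) AS rnk  -- ties share a rank
  FROM link_counts
)
SELECT a_id, num_links
FROM ranked
WHERE rnk = 1
ORDER BY a_id
""").fetchall()

print(ranked_rows)  # [(1, 1), (4, 1)]
```

Both a_id 1 and a_id 4 get rank 1, so both are returned.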
Here's another option. It uses a subquery in the HAVING clause:
SELECT DISTINCT AId, COUNT(*)
FROM C
GROUP BY AId
HAVING COUNT(*) <= ALL (SELECT COUNT(*)
FROM C
GROUP BY AId)
And the related fiddle. I have no idea how this would compare to the other solutions in terms of performance, but it seems to show clearly what is going on.
Here is the strategy. Calculate the minimum number of links. You can do that by revising your query with an order by and limit.
Next, calculate the total number of links for each row in table_c. For this, I'm using a window function. The expression:
count(*) over (partition by a_id)
says: for each row, count the rows in the table that share its a_id.
Then join this together.
select distinct c.a_id
from (select c.*,
count(*) over (partition by a_id) as num_links
from table_c c
) c join
(select a_id, count(*) as num_links
from table_c c
group by a_id
order by 2 asc
limit 1
) cmax
on c.num_links = cmax.num_links
WITH ct AS (
SELECT a.a_id
,count(c.a_id) AS link_ct
,min(count(c.a_id)) OVER () AS min_ct
FROM table_a a
LEFT JOIN table_c c USING (a_id)
GROUP BY 1
)
SELECT a_id, link_ct
FROM ct
WHERE link_ct = min_ct;
This is similar to what @matts posted. It differs in some aspects:
- In the CTE ct I count over a LEFT JOIN to table_c; this way I don't miss rows from table_a with 0 connections to table_b, which should win according to the definition in the question.
- min_ct is computed in the CTE with a window function (and therefore without an additional subquery in the final WHERE condition). It may or may not be faster; it's cleaner in any case.
- The final WHERE condition is good with = instead of <=.
->sqlfiddle demonstrating the difference.
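The effect of counting over a LEFT JOIN can be sketched like this. It assumes SQLite through Python's sqlite3 module instead of PostgreSQL, adds a hypothetical a_id 4 with no links, and finds the minimum with a plain subquery rather than the window function used above, to keep the sketch portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a(a_id INTEGER PRIMARY KEY, a_val TEXT);
CREATE TABLE table_c(a_id INTEGER, b_id INTEGER);
INSERT INTO table_a VALUES (1,'a'),(2,'b'),(3,'c'),(4,'z');  -- 4 has no links
INSERT INTO table_c VALUES (1,1),(2,1),(2,2),(3,1),(3,2),(3,3);
""")

# Counting from the linker table alone never sees a_id 4.
links_only = conn.execute("""
WITH link_counts AS (
  SELECT a_id, COUNT(b_id) AS num_links FROM table_c GROUP BY a_id
)
SELECT a_id, num_links FROM link_counts
WHERE num_links = (SELECT MIN(num_links) FROM link_counts)
""").fetchall()

# Counting over a LEFT JOIN from table_a gives unlinked rows a count of 0.
with_zero = conn.execute("""
WITH ct AS (
  SELECT a.a_id, COUNT(c.a_id) AS link_ct
  FROM table_a a
  LEFT JOIN table_c c ON c.a_id = a.a_id
  GROUP BY a.a_id
)
SELECT a_id, link_ct FROM ct
WHERE link_ct = (SELECT MIN(link_ct) FROM ct)
""").fetchall()

print(links_only)  # [(1, 1)]
print(with_zero)   # [(4, 0)]
```

COUNT(c.a_id) counts only non-NULL values, so the unmatched row produced by the LEFT JOIN contributes 0 rather than 1.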
It looks like others here have a more elegant solution... my SQL Fu is a little rusty but this will work as well.
CREATE TABLE Table_C
(
A_id INT,
B_id INT
);
INSERT INTO Table_C (A_id, B_id) VALUES (13, 112);
INSERT INTO Table_C (A_id, B_id) VALUES (44, 105);
INSERT INTO Table_C (A_id, B_id) VALUES (66, 68);
INSERT INTO Table_C (A_id, B_id) VALUES (13, 113);
INSERT INTO Table_C (A_id, B_id) VALUES (445, 105);
INSERT INTO Table_C (A_id, B_id) VALUES (660, 68);
CREATE TABLE TempTable
(
A_id INT,
Cnt INT
);
INSERT INTO
TempTable (A_id, Cnt)
SELECT
t.A_id
, COUNT(t.A_id) AS Cnt
FROM
Table_C t
GROUP BY
t.A_id;
SELECT @minCnt := MIN(Cnt) FROM TempTable;
SELECT
A_id
FROM
Table_C
GROUP BY
A_id
HAVING
COUNT(A_id) = @minCnt;