SQL query two tables with relation one-to-many - sql

I have two tables A and Band the relation between A to B is A--->one-to-Many--->B
Normally i have one record of B for every record of A.
I am trying to write a query which would give me list of ONLY records of A which have more than ONE(MULTIPLE) record(s) in B.
I am very confused as I have only done basic sql queries and this one seems complex to me.
Could some one please guide me to correct answer or give me the solution.
edited:
ok i tried something like below and it gave me an error
SELECT SOME_COLUMN_NAME FROM A a, B b WHERE a.ID=b.ID and count(b.SOME_OTHER_COLUMN_NAME)>1;
ORA-00934: group function is not allowed here
I tried to search on the internet ad i am not allowed to use grouping in where clause and should go by using having. I am stuck here now.

You haven't specified which database system you are using (sql-server/mysql/sqlite/oracle etc) so this is a generic answer.
In this form, list out all the columns of A explicitly in the SELECT and GROUP BY clauses. It normally generates a good straightforward plan in most DBMS. But it can also fail miserably if the type is not GROUP-able, such as TEXT columns in SQL Server.
SELECT A.Col1, A.Col2, A.Col3
FROM A
JOIN B ON A.LinkID = B.LinkID
GROUP BY A.Col1, A.Col2, A.Col3
HAVING COUNT(*) > 1
This other form using a subquery works for any column types in A and normally produces exactly the same plan.
SELECT A.Col1, A.Col2, A.Col3
FROM A
WHERE 1 < (
SELECT COUNT(*)
FROM B
WHERE A.LinkID = B.LinkID)

You could do it with a sub-query:
select *
from A
where ( select count(*) from B where B.id = A.id ) > 1

select *
from tableA
where Id in (select IdA from tableb group by idA having COUNT(*)>1)
assuming tableB has a field called idA that links it to tableA

Related

Oracle SQL - Select not using index as expected

So I haven't used Oracle in more than 5 years and I'm out of practice. I've been on SQL Server all that time.
I'm looking at some of the existing queries and trying to improve them, but they're reacting really weirdly. According to the explain plan instead of going faster they're instead doing full table scans and not using the indexes.
In the original query, there is an equijoin done between two tables done in the where statement. We'll call them table A and B. I used an explain plan followed by SELECT * FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE')); and it tells me that Table A is queried by Local Index.
TABLE ACCESS BY LOCAL INDEX ROWID
SELECT A.*
FROM TableA A, TableB B
WHERE A.SecondaryID = B.ID;
I tried to change the query and join TableA with a new table (Table C). Table C is a subset of Table B with 700 records instead of 100K. However the explain plan tells me that Table A is now queried with a full lookup.
CREATE TableC
AS<br>
SELECT * FROM TableB WHERE Active='Y';
SELECT A.*
FROM TableA A, TableC C
WHERE A.SecondaryID = C.ID;
Next step, I kept the join between tables A & C, but used a hint to tell it to use the index on Table A. However it still does a full lookup.
SELECT /*+ INDEX (A_NDX01) */ A.*
FROM TableA A, TableC C
WHERE A.SecondaryID = C.ID;
So I tried to change from a join to a simple Select of table A and use an IN statement to compare to table C. Still a full table scan.
SELECT A.*
FROM TableA A
WHERE A.SecondaryID in (SELECT ID FROM TableC);
Lastly, I took the previous statement and changed the subselect to pull the top 1000 records, and it used the index. The odd thing is that there are only 700 records in Table C.
SELECT A.*
FROM TableA A
WHERE A.SecondaryID in (SELECT ID FROM TableC WHERE rownum <1000
)
I was wondering if someone could help me figure out what's happening?
My best guess is that since TableC is a new table, maybe the optimizer doesn't know how many records are in it and that's why it's it will only use the index if it knows that there are fewer than 1000 records?
I tried to run dbms_stats.gather_schema_stats on my schema though and it did not help.
Thank you for your help.
As a general rule Using an index will not necessarily make your query go faster ALWAYS.
Hints are directives to the optimizer to make use of the path, it doenst mean optimizer would choose to obey the hint directive. In this case, the optimizer would have considered that an index lookup on TableA is more expensive in the
SELECT A.*
FROM TableA A, TableB B
WHERE A.SecondaryID = B.ID;
SELECT /*+ INDEX (A_NDX01) */ A.*
FROM TableA A, TableC C
WHERE A.SecondaryID = C.ID;
SELECT A.*
FROM TableA A
WHERE A.SecondaryID in (SELECT ID FROM TableC);
Internally it might have converted all of these statements(IN) into a join which when considering the data in the tableA and tableC decided to make use of full table scan.
When you did the rownum condition, this plan conversion was not done. This is because view-merging will not work when it has the rownum in the query block.
I believe this is what is happening when you did
SELECT A.*
FROM TableA A
WHERE A.SecondaryID in (SELECT ID FROM TableC WHERE rownum <1000)
Have a look at the following link
Oracle. Preventing merge subquery and main query conditions

Change join condition based on a condition - Oracle

assume we have two tables.
TabA(Acol1,Acol2,Acol3)
TabB(Bcol1,Bcol2.Ccol3)
Requirement is like, join two tables on Acol1,Bcol1 and if Acol3='C' then join based on Acol2=Bcol2 in addition to above join. Can we make this in single SQL query ? Is join is record wise or table wise ?
One solution I can get to is using Union, but I dont think this will be a optimized one. Any other solutions ?
Another solution I figured
SELECT A.*,B.* FROM TabA A
INNER JOIN TabB B ON A.Acol1 = B.BCol1
and case when A.Acol3='C' then A.ACol2 else '1' end =
case when A.Acol3='C' then B.BCol2 else '1' end ;
Any other solution without case and Union ?
Thanks in advance
If you want to join on TabA.col2 only when it is 'C' then in those case, TabB.col2 will also be 'C', as you are already joining from col1. So your output will be same which you get just by first join.
select a.*, b.* from tabA a join tabB b
on a.col1=b.col1
This should give you the same output anytime. Try creating a different scenario on values of 'C'. The result will always be a subset of your first join result.
Hmmm, after thinking about the question, I think you might just want a complicated on clause:
select a.*, b.*
from tabA a join
tabB b
on a.col1 = b.col1 and
(a.col3 <> 'C' or a.col2 = b.col2);
Note: the above assumes that a.col2 is not null (that condition is easily included if needed).
You may need to work out some examples by hand to see that the or method is equivalent to the case statement.

Returning only duplicate rows from two tables

Every thread I've seen so far has been to check for duplicate rows and avoiding them. I'm trying to get a query to only return the duplicate rows. I thought it would be as simple as a subquery, but I was wrong. Then I tried the following:
SELECT * FROM a
WHERE EXISTS
(
SELECT * FROM b
WHERE b.id = a.id
)
Was a bust too. How do I return only the duplicate rows? I'm currently going through two tables, but I'm afraid there are a large amount of duplicates.
use this query, maybe is better if you check the relevant column.
SELECT * FROM a
INTERSECT
SELECT * FROM b
I am sure your posted code would work too like
SELECT * FROM a
WHERE EXISTS
(
SELECT 1 FROM b WHERE id = a.id
)
You can as well do a INNER JOIN like
SELECT a.* FROM a
JOIN b on a.id = b.id;
You can as well use a IN operator saying
SELECT * FROM a where id in (select id from b);
If none of them, then you can use UNION if both table satisfies the union restriction along with ROW_NUMBER() function like
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id) AS rn
FROM (
select * from a
union all
select * from b) xx ) yy
WHERE rn = 1;
Note: there's an ambiguity as to what you mean by a duplicate row, and whether you're talking about duplicate keys, or all fields being the same. My answer deals with all fields being the same; some of the others are assuming it's just the keys. It's unclear which you intend.
You might try
SELECT id, col1, col2 FROM a INNER JOIN b ON a.id = b.id
WHERE a.col1 = b.col1 AND a.col2 = b.col2
adding in other columns as necessary. The database engine should be intelligent enough to do the comparisons on the indexed columns first, so it'll be efficient as long as you don't have rows that are different only on lots of non-indexed fields. (If you do, then I don't think anything will do it particularly efficiently.)

Weird SQL request(Join)

Assuming there are two tables A={a,b} and B={0,1,2}, which can be joined
tableA tableB
a 0
b 1
a 2
3
How to get the following result
ExpectingResult:
tableA tableB
a-------0
b-------1
null----2
null----3
OR
tableA tableB
a-------2
b-------3
null----0
null----1
Just make sure the element in each table just appear once, I tried all kinds of join(inner, full, cross), none of them can achieve so. Could anybody give me a tip?
Thank you very much
Please check this link out to the question itself: http://www.sqlfiddle.com/#!3/9fc21/2
That is a crappy request. There is almost negligible reasons for producing such an output that comes to mind, although I'm not ruling out a sane reason completely.
For SQL Server 2000, you will need to go through temp tables to get a sequential key to zip up with.
SELECT IDENTITY(int,1,1) ID, Value
INTO #tblA
FROM tableA
ORDER BY Value;
SELECT IDENTITY(int,1,1) ID, Value
INTO #tblB
FROM tableB
ORDER BY Value;
SELECT A.Value, B.Value
FROM #tblA A FULL OUTER JOIN #tblB B ON A.ID = B.ID
ORDER BY Coalesce(A.ID, B.ID);
Using SQL Server 2005, you can use a combination of ROW_NUMBER and a FULL OUTER JOIN to combine both results.
See SQL Fiddle
Unfortunately, SQL Server 2000 doesn't have a ROW_NUMBER function so you are kind of stuck with using temporary tables with identity fields to simulate a rownumber.
The gist of this would be to
Select your required data from tableA into #tempTableA, adding an identity field.
Repeat for tableB
Use the temptables to FULL OUTER JOIN the results

How do I get a count of items in one column that match items in another column?

Assume I have two data tables and a linking table as such:
A B A_B_Link
----- ----- -----
ID ID A_ID
Name Name B_ID
2 Questions:
I would like to write a query so that I have all of A's columns and a count of how many B's are linked to A, what is the best way to do this?
Is there a way to have a query return a row with all of the columns from A and a column containing all of linked names from B (maybe separated by some delimiter?)
Note that the query must return distinct rows from A, so a simple left outer join is not going to work here...I'm guessing I'll need nested select statements?
For your first question:
SELECT A.ID, A.Name, COUNT(ab.B_ID) AS bcount
FROM A LEFT JOIN A_B_Link ab ON (ab.A_ID = A.ID)
GROUP BY A.ID, A.Name;
This outputs one row per row of A, with the count of matching B's. Note that you must list all columns of A in the GROUP BY statement; there's no way to use a wildcard here.
An alternate solution is to use a correlated subquery, as #Ray Booysen shows:
SELECT A.*,
(SELECT COUNT(*) FROM A_B_Link
WHERE A_B_Link.A_ID = A.A_ID) AS bcount
FROM A;
This works, but correlated subqueries aren't very good for performance.
For your second question, you need something like MySQL's GROUP_CONCAT() aggregate function. In MySQL, you can get a comma-separated list of B.Name per row of A like this:
SELECT A.*, GROUP_CONCAT(B.Name) AS bname_list
FROM A
LEFT OUTER JOIN A_B_Link ab ON (A.ID = ab.A_ID)
LEFT OUTER JOIN B ON (ab.B_ID = B.ID)
GROUP BY A.ID;
There's no easy equivalent in Microsoft SQL Server. Check here for another question on SO about this:
"Simulating group_concat MySQL function in MS SQL Server 2005?"
Or Google for 'microsoft SQL server "group_concat"' for a variety of other solutions.
For #1
SELECT A.*,
(SELECT COUNT(*) FROM A_B_Link WHERE A_B_Link.A_ID = AOuter.A_ID)
FROM A as AOuter
SELECT A.*, COUNT(B_ID)
FROM A
LEFT JOIN A_B_Link ab ON ab.A_ID=A.ID