Selecting common items from two tables - sql

Given two SQLite databases, db1 and db2, each with a "functions" table of almost 10k entries.
There is a column "fcn_name" that the condition applies to.
About 90% of the entries are the same in both tables.
I need to SELECT name and column2 for the rows whose name appears in both tables. The expected output is about 90% of 10k rows, but I get more than 10k rows as a result. In brief, I need an intersection on the name column only, but with name and column2 in the result.
I have tried the following SQL statements:
select count(*) from main.functions f, other.functions d where f.name=d.name
select distinct f.name from main.functions f, other.functions d where f.name=d.name
Neither works. With the "distinct" keyword the query returns fewer entries, but still more than 10k. Why are there more entries?

Your problem could be caused by duplicates in both tables. Join both tables using DISTINCT names.
For example:
SELECT COUNT(*)
FROM (SELECT DISTINCT name FROM main.functions) f1
INNER JOIN (SELECT DISTINCT name FROM other.functions) f2
ON f1.name = f2.name
Maybe that's the solution.
Referring to your comment:
If your tables have a primary key (e.g. an id), the following should deliver the desired result under SQLite:
SELECT f1.name, f1.yourField1, f1.yourField2, f1.yourField3
FROM (SELECT name, yourField1, yourField2, yourField3
FROM main.functions
GROUP BY name
HAVING id = MIN(id)
ORDER BY id) f1
INNER JOIN (SELECT DISTINCT name FROM other.functions) f2
ON f1.name = f2.name
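A minimal sketch of why the plain join returns more rows than either table holds, and how joining on DISTINCT names avoids it. It uses Python's built-in sqlite3 module; the two table names and the sample rows are made up for illustration, and both tables live in one in-memory database rather than in two attached files.

```python
import sqlite3

# Hypothetical stand-ins for main.functions and other.functions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_functions  (id INTEGER PRIMARY KEY, name TEXT, column2 TEXT);
    CREATE TABLE other_functions (id INTEGER PRIMARY KEY, name TEXT, column2 TEXT);
    INSERT INTO main_functions (name, column2) VALUES
        ('open', 'a'), ('open', 'b'), ('close', 'c'), ('read', 'd');
    INSERT INTO other_functions (name, column2) VALUES
        ('open', 'x'), ('open', 'y'), ('open', 'z'), ('close', 'w');
""")

# Plain join: every 'open' row on the left pairs with every 'open' row
# on the right, so 2 * 3 = 6 rows, plus 1 row for 'close'.
plain_join = conn.execute("""
    SELECT COUNT(*)
    FROM main_functions f
    JOIN other_functions d ON f.name = d.name
""").fetchone()[0]

# Join on de-duplicated names: one row per name present in both tables.
distinct_join = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT name FROM main_functions) f1
    JOIN (SELECT DISTINCT name FROM other_functions) f2
      ON f1.name = f2.name
""").fetchone()[0]

print(plain_join, distinct_join)  # 7 2
```

With m duplicates of a name on one side and n on the other, the plain join yields m * n rows for that name, which is how the result can exceed the size of either table.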

To select rows that exist in both tables based on certain matching criteria, we'd typically use an INNER JOIN, e.g.
SELECT COUNT(*)
FROM main.functions f1
INNER JOIN other.functions f2
ON f1.name = f2.name
I'm not familiar with SQLite, but this is the usual approach in most databases.

I think you want all rows from one table where the name is in the other table. For that, I would suggest:
select f.*
from main.functions f
where exists (select 1
from other.functions o
where o.name = f.name
);
This does not check that the rows are the same, only that the name exists in both tables. It also won't duplicate rows if there are multiple rows with the same name in both tables.
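As a rough check of the EXISTS approach, the sketch below runs it through Python's sqlite3 module. The short table names f and o and the sample rows are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE f (name TEXT, column2 TEXT);  -- stands in for main.functions
    CREATE TABLE o (name TEXT, column2 TEXT);  -- stands in for other.functions
    INSERT INTO f VALUES ('open', 'a'), ('close', 'c'), ('read', 'd');
    INSERT INTO o VALUES ('open', 'x'), ('open', 'y'), ('close', 'w');
""")

# Each row of f is tested once: is there at least one row in o with the
# same name? The two 'open' rows in o do not multiply the output.
semi_join_rows = conn.execute("""
    SELECT f.name, f.column2
    FROM f
    WHERE EXISTS (SELECT 1 FROM o WHERE o.name = f.name)
    ORDER BY f.name
""").fetchall()

print(semi_join_rows)  # [('close', 'c'), ('open', 'a')]
```

Note that 'read' is dropped because it has no match in o, and 'open' appears once even though o contains it twice.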

Related

Joined query producing more results compared to solo query

I am performing the following query which has an inner join against another table.
select count(myTable.name)
from sch2.sample_detail as myTable
inner join sch1.otherTable as otherTable on myTable.name = otherTable.name
where otherTable.is_valid = 1
and myTable.name IS NOT NULL;
This produces a count of 4912304.
The following is a query just on a single table (my table).
SELECT COUNT(myTable.name)
from sch2.sample_detail as myTable
where myTable.name IS NOT NULL;
This produces a count of 2864654.
But how is this possible? Both queries have the clause where myTable.name IS NOT NULL.
Shouldn't the second query produce the same count, or an even higher one, since it doesn't have the otherTable.is_valid = 1 clause?
Why does the inner join produce a higher count?
Please advise if there is something I should amend in the first query, thanks.
An inner, left, or cross join can duplicate rows. sch1.otherTable.name is not unique, and this causes row duplication: for each row in the left table, all matching rows from the right table are selected. This is normal join behavior.
To list the duplicated names, use this query, then decide how to remove the duplicates: a filter, DISTINCT, a filter on row_number, etc.
select count(*) cnt,
name
from sch1.otherTable
group by name
having count(*)>1
order by cnt desc;
If you need EXISTS semantics (and do not need to select columns from otherTable), use a left semi join.
A subquery with DISTINCT can also be used to pre-aggregate the names before the join and filter:
select count(myTable.name)
from sch2.sample_detail as myTable
LEFT SEMI JOIN (select distinct name from sch1.otherTable otherTable where otherTable.is_valid = 1 ) as otherTable on myTable.name = otherTable.name
where myTable.name IS NOT NULL;
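The effect can be reproduced on a tiny data set. The sketch below uses Python's sqlite3 module with made-up tables and rows; SQLite has no LEFT SEMI JOIN, so the de-duplicated variant is written as an inner join against a DISTINCT subquery, which is an assumption about how to port the idea rather than the original syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample_detail (name TEXT);
    CREATE TABLE other_table (name TEXT, is_valid INTEGER);
    INSERT INTO sample_detail VALUES ('a'), ('b'), ('c'), (NULL);
    INSERT INTO other_table VALUES ('a', 1), ('a', 1), ('a', 1), ('b', 1), ('c', 0);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Single-table count: 'a', 'b', 'c'.
solo_count = scalar("SELECT COUNT(name) FROM sample_detail WHERE name IS NOT NULL")

# Joined count: 'a' matches three valid rows, 'b' matches one.
join_count = scalar("""
    SELECT COUNT(s.name)
    FROM sample_detail s
    JOIN other_table o ON s.name = o.name
    WHERE o.is_valid = 1 AND s.name IS NOT NULL
""")

# Names that occur more than once in the right-hand table.
duplicates = conn.execute("""
    SELECT COUNT(*) AS cnt, name
    FROM other_table
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY cnt DESC
""").fetchall()

# Pre-aggregating the names removes the fan-out.
dedup_count = scalar("""
    SELECT COUNT(s.name)
    FROM sample_detail s
    JOIN (SELECT DISTINCT name FROM other_table WHERE is_valid = 1) o
      ON s.name = o.name
    WHERE s.name IS NOT NULL
""")

print(solo_count, join_count, duplicates, dedup_count)  # 3 4 [(3, 'a')] 2
```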

Joining on 2 tables but only selecting rows from one of the tables

I have 2 tables with identical names and schema. I would like to join on them, but only select rows from one of the tables. What is a good way to do this? The below query selects the rows from both tables, but I just want table a2 from the other DB.
select a.fkey_id, a2.fkey_id, a.otherthing, a2.otherthing from mytable a
inner join otherdb.dbo.mytable a2 on a.fkey_id=a2.fkey_id
I tried using left outer join but since the schemas are identical between the 2 tables this doesn't seem to work.
EDIT: I am only including the "a" table columns in the select to get an idea of what values the rows are returning. I just don't want any rows returned from "a", so I'd like to filter those rows out somehow.
Just take out the references to "a2" columns from the select list.
select a.fkey_id, a.otherthing from mytable a
inner join otherdb.dbo.mytable a2 on a.fkey_id=a2.fkey_id
OR
select a.* from mytable a
inner join otherdb.dbo.mytable a2 on a.fkey_id=a2.fkey_id
Which raises the question of why you're joining to the other table if you don't want data from it. Is this a filtering method? If so, it would probably be better performance-wise to use an EXISTS.
select a.* from mytable a
WHERE EXISTS (
SELECT 1
FROM otherdb.dbo.mytable a2
WHERE a.fkey_id=a2.fkey_id)
OR
select a.fkey_id
, a.otherthing
from mytable a
WHERE EXISTS (SELECT 1
FROM otherdb.dbo.mytable a2
WHERE a.fkey_id=a2.fkey_id)

Include table name in column from select wildcard sql

Is it possible to include table name in the returned column if I use wildcard to select all columns from tables?
To explain it further. Suppose I want to join two tables and both tables have the column name “name” and many other columns. I want to use wildcard to select all columns and not explicitly specifying each column name in the select.
Select *
From
TableA a,
TableB b
Where
a.id = b.id
Instead of seeing two columns with the same name "name", could I write SQL that returns one column named "a.name" (or TableA.name) and one named "b.name" (or TableB.name) without explicitly listing the column names in the select?
I would prefer a solution for MSSQL, but other databases could serve as a reference too.
Thanks!
You can use select a.*, ' ', b.* from T1 a, T2 b to make it more visible where columns from T1 end and columns from T2 begin.
You are basically joining two tables on the ID field, so you only see those records where the ID is the same in table a and table b. Note that SELECT * still returns the id column from both tables; the two columns simply hold the same value.
Try ...
SELECT 'TableA' AS 'Table', A.* FROM TableA A
WHERE A.id IN (SELECT id FROM TableB)
UNION
SELECT 'TableB' AS 'Table', B.* FROM TableB B
WHERE B.id IN (SELECT id FROM TableA)
ORDER BY id, [Table]
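Another option, when the client code builds the query, is to generate the prefixed aliases from the table metadata instead of typing them. The sketch below does this for SQLite with Python's sqlite3 module, not for MSSQL; the helper name prefixed_columns, the city column, and the sample rows are hypothetical, and the table names are assumed to be trusted because they are interpolated into the SQL text.

```python
import sqlite3

def prefixed_columns(conn, table, alias):
    """Build 'alias."col" AS "alias.col"' for every column of `table`."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return ", ".join(f'{alias}."{c}" AS "{alias}.{c}"' for c in cols)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, name TEXT);
    CREATE TABLE TableB (id INTEGER, name TEXT, city TEXT);
    INSERT INTO TableA VALUES (1, 'alpha');
    INSERT INTO TableB VALUES (1, 'beta', 'Oslo');
""")

select_list = (prefixed_columns(conn, "TableA", "a") + ", "
               + prefixed_columns(conn, "TableB", "b"))
cursor = conn.execute(
    f"SELECT {select_list} FROM TableA a JOIN TableB b ON a.id = b.id"
)
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['a.id', 'a.name', 'b.id', 'b.name', 'b.city']
print(rows)          # [(1, 'alpha', 1, 'beta', 'Oslo')]
```

The same idea should carry over to other databases by reading their own catalog views in place of PRAGMA table_info.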

multi-table query when there is no record in one table

What should I do in the following situation?
There are two tables, A and B:
A:
id, name, address //the id is unique
B:
id, contact, email
One person may have more than one contact and email, or no contact and email at all (which means no record in table B).
Now I want to count how many records in B there are for each id, even if the count is 0.
The result should look like:
id, name, contact_email_total_count
How can I do that? (The only part I cannot figure out is how to get a count of 0 when there is no record in table B.)
For that case you will want to use a LEFT JOIN, then add an aggregate and a GROUP BY:
select a.id,
a.name,
count(b.id) as contact_email_total_count
from tablea a
left join tableb b
on a.id = b.id
group by a.id, a.name
See SQL Fiddle with Demo
If you need help learning join syntax here is a great visual explanation of joins.
Based on your comment the typical order of execution is as follows:
FROM
ON
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Need to do a left join to maintain the records in table A regardless of B:
PostgreSQL: left outer join syntax
Need to aggregate the count of records in B:
PostgreSQL GROUP BY different from MySQL?
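The LEFT JOIN plus GROUP BY query from the answers above can be tried on a few rows. This sketch uses Python's sqlite3 module with invented sample data, and adds a COUNT(*) column purely to show why COUNT(b.id) is the one to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE tableb (id INTEGER, contact TEXT, email TEXT);
    INSERT INTO tablea VALUES (1, 'Ann', 'x'), (2, 'Bob', 'y'), (3, 'Cy', 'z');
    INSERT INTO tableb VALUES (1, 'c1', 'e1'), (1, 'c2', 'e2'), (2, 'c3', 'e3');
""")

rows = conn.execute("""
    SELECT a.id,
           a.name,
           COUNT(b.id) AS contact_email_total_count,  -- ignores NULLs from unmatched rows
           COUNT(*)    AS star_count                  -- counts the NULL-padded row too
    FROM tablea a
    LEFT JOIN tableb b ON a.id = b.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'Ann', 2, 2)
# (2, 'Bob', 1, 1)
# (3, 'Cy', 0, 1)
```

The LEFT JOIN keeps 'Cy' with NULLs in the b columns; COUNT(b.id) skips those NULLs and reports 0, whereas COUNT(*) would report 1.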

How do I get a count of items in one column that match items in another column?

Assume I have two data tables and a linking table as such:
A        B        A_B_Link
-----    -----    --------
ID       ID       A_ID
Name     Name     B_ID
2 Questions:
1. I would like to write a query so that I have all of A's columns and a count of how many B's are linked to A, what is the best way to do this?
2. Is there a way to have a query return a row with all of the columns from A and a column containing all of linked names from B (maybe separated by some delimiter?)
Note that the query must return distinct rows from A, so a simple left outer join is not going to work here...I'm guessing I'll need nested select statements?
For your first question:
SELECT A.ID, A.Name, COUNT(ab.B_ID) AS bcount
FROM A LEFT JOIN A_B_Link ab ON (ab.A_ID = A.ID)
GROUP BY A.ID, A.Name;
This outputs one row per row of A, with the count of matching B's. Note that you must list all columns of A in the GROUP BY statement; there's no way to use a wildcard here.
An alternate solution is to use a correlated subquery, as @Ray Booysen shows:
SELECT A.*,
(SELECT COUNT(*) FROM A_B_Link
WHERE A_B_Link.A_ID = A.ID) AS bcount
FROM A;
This works, but correlated subqueries often aren't very good for performance.
For your second question, you need something like MySQL's GROUP_CONCAT() aggregate function. In MySQL, you can get a comma-separated list of B.Name per row of A like this:
SELECT A.*, GROUP_CONCAT(B.Name) AS bname_list
FROM A
LEFT OUTER JOIN A_B_Link ab ON (A.ID = ab.A_ID)
LEFT OUTER JOIN B ON (ab.B_ID = B.ID)
GROUP BY A.ID;
There's no easy equivalent in Microsoft SQL Server. Check here for another question on SO about this:
"Simulating group_concat MySQL function in MS SQL Server 2005?"
Or Google for 'microsoft SQL server "group_concat"' for a variety of other solutions.
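SQLite also provides a group_concat() aggregate, so both questions can be tried out together there. The sketch below uses Python's sqlite3 module; the sample rows are invented, and because the order of the concatenated names is not guaranteed, the example sorts them before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE A_B_Link (A_ID INTEGER, B_ID INTEGER);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO B VALUES (1, 'x'), (2, 'y');
    INSERT INTO A_B_Link VALUES (1, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT A.ID,
           A.Name,
           COUNT(ab.B_ID)       AS bcount,      -- question 1: number of linked B rows
           GROUP_CONCAT(B.Name) AS bname_list   -- question 2: comma-separated B names
    FROM A
    LEFT OUTER JOIN A_B_Link ab ON ab.A_ID = A.ID
    LEFT OUTER JOIN B ON B.ID = ab.B_ID
    GROUP BY A.ID, A.Name
    ORDER BY A.ID
""").fetchall()

for a_id, name, bcount, bname_list in rows:
    names = sorted(bname_list.split(",")) if bname_list else []
    print(a_id, name, bcount, names)
# 1 a1 2 ['x', 'y']
# 2 a2 0 []
```

Each row of A appears exactly once; an A row with no links gets a count of 0 and a NULL name list.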
For #1
SELECT AOuter.*,
(SELECT COUNT(*) FROM A_B_Link WHERE A_B_Link.A_ID = AOuter.ID)
FROM A as AOuter;
SELECT A.*, COUNT(B_ID)
FROM A
LEFT JOIN A_B_Link ab ON ab.A_ID=A.ID