SQL - select * given count from another table - sql

I'm trying to select * from two tables (a and b) using a join (column a.id and b.id), given that the count of a column (b.owner) in b is lower than 3, i.e. the occurence of a person's name can be max 2.
I've tried:
SELECT a.*, COUNT(b.owner) AS b_count
FROM a LEFT JOIN b on a.id = b.id
GROUP BY b.owner HAVING COUNT(b_count) <3
As im pretty new to SQL, im pretty stuck here. How can i resolve this issue? The result should be all columns for owners who do not appear more than twice in the data.

The query you are trying to run is not working due to the columns missing in the GROUP BY clause.
As you are outputting all columns from table a (with SELECT a.*), you need to include all those columns in the GROUP BY statement, so that the database understand the group of fields to group by and perform the aggregation required (in your case COUNT(b.owner)).
Example
Considering that your table a has 3 columns below:
CREATE TABLE persons (
id INTEGER,
name VARCHAR(50),
birthday DATE,
PRIMARY KEY (id)
);
.. and your table b the following and referencing the first table as below:
CREATE TABLE sales (
id INTEGER,
person_id INTEGER,
sale_value DECIMAL,
PRIMARY KEY (id),
FOREIGN KEY (person_id) REFERENCES persons(id)
);
.. you should query it aggregating the COUNT() by those 3 columns:
SELECT a.id, a.name, a.birthday, COUNT(b.person_id) AS b_count
FROM persons a
LEFT JOIN sales b ON a.id = b.person_id
GROUP BY a.id, a.name, a.birthday
HAVING COUNT(b.person_id) < 3
Alternative
In case the total of records on the 2nd table is not important to you, you could use a different "strategy" here to avoid performing the JOIN between the tables (useful when joining two huge tables) and rewriting all the columns from a on the SELECT+GROUP BY.
By identifying the records that has less than the 3 occurrences firstly:
SELECT b.person_id
FROM sales b
GROUP BY b.person_id
HAVING COUNT(b.id) < 3;
.. and using it in the WHERE clause to retrieve all the columns from the 1st table only for the ids that resulted from the previous query:
SELECT a.*
FROM persons a
WHERE a.id IN (....other query here....);
.. the execution happens in a more chronological and, perhaps, easier way to visualize while getting more familiar with SQL:
SELECT a.*
FROM persons a
WHERE a.id IN (SELECT b.person_id
FROM sales b
GROUP BY b.person_id
HAVING COUNT(b.id) < 3);
DB Fiddle here

In Standard SQL, you can use:
SELECT a.*, COUNT(b.owner) AS b_count
FROM a LEFT JOIN
b
ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.owner) < 3;
This may not work in all databases (and it assumes that a.id is unique/primary key). An alternative would be to use a correlated subquery:
SELECT a.*
FROM (SELECT a.*,
(SELECT COUNT(*)
FROM b
WHERE a.id = b.id
) as b_count
FROM a
) a
WHERE b_count < 3;

Related

Compare the results of a ROW COUNT

I have 2 databases in the same server and I need to compare the registers on each one, since one of the databases is not importing all the information
I was trying to do a ROW count but it's not working
Currently I am doing packages of 100,000 rows approximate, and lookup at them in Excel.
Let's say I want a query that does a count for each ID in TABLE A and then compares the count result VS TABLE B count for each ID, since they are the same ID the count should be the same, and I want that brings me the ID on which there where any mismatch between counts.
--this table will contain the count of occurences of each ID in tableA
declare #TableA_Results table(
ID bigint,
Total bigint
)
insert into #TableA_Results
select ID,count(*) from database1.TableA
group by ID
--this table will contain the count of occurences of each ID in tableB
declare #TableB_Results table(
ID bigint,
Total bigint
);
insert into #TableB_Results
select ID,count(*) from database2.TableB
group by ID
--this table will contain the IDs that doesn't have the same amount in both tables
declare #Discordances table(
ID bigint,
TotalA bigint,
TotalB bigint
)
insert into #Discordances
select TA.ID,TA.Total,TB.Total
from #TableA_Results TA
inner join #TableB_Results TB on TA.ID=TB.ID and TA.Total!=TB.Total
--the final output
select * from #Discordances
The question is vague, but maybe this SQL Code might help nudge you in the right direction.
It grabs the IDs and Counts of each ID from database one, the IDs and counts of IDs from database two, and compares them, listing out all the rows where the counts are DIFFERENT.
WITH DB1Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseOne.dbo.TableOne
GROUP BY ID
), DB2Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseTwo.dbo.TableTwo
GROUP BY ID
)
SELECT a.ID, a.CountOfIDs AS DBOneCount, b.CountOfIDs AS DBTwoCount
FROM DB1Counts a
INNER JOIN DB2Counts b ON a.ID = b.ID
WHERE a.CountOfIDs <> b.CountOfIDs
This SQL selects from the specific IDs using the "Database.Schema.Table" notation. So replace "DatabaseOne" and "DatabaseTwo" with the names of your two databases. And of course replace TableOne and TableTwo with the names of your tables (I'm assuming they're the same). This sets up two selects, one for each database, that groups by ID to get the count of each ID. It then joins these two selects on ID, and returns all rows where the counts are different.
You could full outer join two aggregate queries and pull out ids that are either missing in one table, or for which the record count is different:
select coalesce(ta.id, tb.id), ta.cnt, tb.cnt
from
(select id, count(*) cnt from tableA) ta
full outer join (select id, count(*) cnt from tableB) tb
on ta.id = tb.id
where
coalesce(ta.cnt, -1) <> coalesce(tb.cnt, -1)
You seem to want aggregation and a full join:
select coalesce(a.id, b.id) as id, a.cnt, b.cnt
from (select id, count(*) as cnt
from a
group by id
) a full join
(select id, count(*) as cnt
from b
group by id
) b
on a.id = b.id
where coalesce(a.cnt, 0) <> coalesce(b.cnt, 0);

use SUM with left join get me wrong result

So I have :
CREATE TABLE A (id INT,type int,amount int);
INSERT INTO A (id,type,amount) VALUES (1,0,25);
INSERT INTO A (id,type,amount) VALUES (2,0,25);
INSERT INTO A (id,type,amount) VALUES (3,1,10);
CREATE TABLE B (id INT,A_ID int,txt text);
INSERT INTO B (id,A_id,txt) VALUES (1,1,'abc');
INSERT INTO B (id,A_id,txt) VALUES (2,1,'def');
INSERT INTO B (id,A_id,txt) VALUES (3,2,'xxx');
I run this query:
SELECT min(A.id), SUM(A.amount), COUNT(B.id) FROM A
LEFT JOIN B ON A.id = B.A_id
GROUP BY A.type
I get :
min(A.id) SUM(A.amount) COUNT(B.id)
1 75 3
3 10 0
But I'm instead expecting to get :
min(A.id) SUM(A.amount) COUNT(B.id)
1 50 3
3 10 0
Can someone help? What is the best way to achieve this exact result ?
I want group BY type and get SUM of grouped A.amount and get count() of all B corresponding to its foreign key.
here is the repro : https://www.db-fiddle.com/f/esu13uGLcgFDpX7aEQRMJR/0 please RUN sql code.
EDIT to add more detail : I know the result is correct if I remove group by we can see
1, 50, 2
2, 25, 1
But I expect the above result, what is the best way to achieve it ? I want make SUM of a TYPE then count all B related to this groupped A
Just a shorter version of the solution. It counts B_IDs first in the inner query, so I need to Sum the counts in the outer query.
SELECT min(A.id), SUM(A.amount), Sum(Bid) FROM A
LEFT JOIN (select count(id) as Bid, A_id from B group by A_id) as Bcount
ON A.id = Bcount.A_id
GROUP BY A.type
This can happen when you SUM from an 1-N relation.
The matching records can multiply the result.
For example, when 1 records in A are joined with 2 in B it returns 2 times the amount of A before the GROUP BY. So a SUM then doubles A.amount.
A way to get around that is using sub-queries that join one-on-one.
And a COUNT DISTINCT can be used to count unique id's.
So this just a way to get the SUM of A correct.
SELECT
q1.type,
q1.min_id,
q2.amount,
COALESCE(q1.totalB, 0) as totalB
FROM
(
SELECT
A.type,
MIN(A.id) AS min_id,
COUNT(DISTINCT B.id) AS totalB
FROM A
LEFT JOIN B ON B.A_id = A.id
GROUP BY A.type
) AS q1
JOIN
(
SELECT
type,
SUM(amount) AS amount
FROM A
GROUP BY type
) AS q2 ON q2.type = q1.type
View on DB Fiddle
The SQL is tested for MySql. But it's an ANSI standard SQL that would run on almost any RDBMS, including MS Sql Server.
one way of doing this would be to use ROW_NUMBER():
WITH CTE AS (SELECT A.id AS Aid,
A.[type],
A.amount,
B.id AS bid,
txt,
ROW_NUMBER() OVER (PARTITION BY A.id ORDER BY B.id) AS RN
FROM A
LEFT JOIN B ON A.id = B.A_ID)
SELECT MIN(Aid) AS Min_A_ID,
SUM(CASE RN WHEN 1 THEN amount END) AS Amount,
COUNT(bid) AS BCount
FROM CTE
GROUP BY [type];
I also recommend getting rid of that text datatype and using varchar(MAX).

Filter by count from another table

This query works fine only without WHERE, otherwise there is an error:
column "cnt" does not exist
SELECT
*,
(SELECT count(*)
FROM B
WHERE A.id = B.id) AS cnt
FROM A
WHERE cnt > 0
Use a subquery:
SELECT a.*
FROM (SELECT A.*,
(SELECT count(*)
FROM B
WHERE A.id = B.id
) AS cnt
FROM A
) a
WHERE cnt > 0;
Column aliases defined in the SELECT cannot be used by the WHERE (or other clauses) for that SELECT.
Or, if the id on a is unique, you can more simply do:
SELECT a.*, COUNT(B.id)
FROM A LEFT JOIN
B
ON A.id = B.id
GROUP BY A.id
HAVING COUNT(B.id) > 0;
Or, if you don't really need the count, then:
select a.*
from a
where exists (select 1 from b where b.id = a.id);
Assumptions:
You need all columns from A in the result, plus the count from B. That's what your demonstrated query does.
You only want rows with cnt > 0. That's what started your question after all.
Most or all B.id exist in A. That's the typical case and certainly true if a FK constraint on B.id references to A.id.
Solution
Faster, shorter, correct:
SELECT * -- !
FROM (SELECT id, count(*) AS cnt FROM B) B
JOIN A USING (id) -- !
-- WHERE cnt > 0 -- this predicate is implicit now!
Major points
Aggregate before the join, that's typically (substantially) faster when processing the whole table or major parts of it. It also defends against problems if you join to more than one n-table. See:
Aggregate functions on multiple joined tables
You don't need to add the predicate WHERE cnt > 0 any more, that's implicit with the [INNER] JOIN.
You can simply write SELECT *, since the join only adds the column cnt to A.* when done with the USING clause - only one instance of the joining column(s) (id in the example) is added to the out columns. See:
How to drop one join key when joining two tables
Your added question in the comment
postgres really allows to have outside aggregate function attributes that are not behind group by?
That's only true if the PK column(s) is listed in the GROUP BY clause - which covers the whole row. Not the case for a UNIQUE or EXCLUSION constraint. See:
Return a grouped list with occurrences using Rails and PostgreSQL
SQL Fiddle demo (extended version of Gordon's demo).

Value present in more than one table

I have 3 tables. All of them have a column - id. I want to find if there is any value that is common across the tables. Assuming that the tables are named a.b and c, if id value 3 is present is a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged and there is no need to check for existence in he third table, or check if there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexes in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like carthesian joins with 3 OR conditions. The below query is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;

Sql - Simple query with count

I have a simple question on how to read the number of occurences of some values in a table B that references to a value in a table A. It's all explained better in the following example:
Let's say I have two simple tables, A with an attribute ID and B with an attribute ID that references A.ID; I use this code to find the number of occurences of a A.ID value into B:
SELECT A.ID, COUNT(*)
FROM A JOIN B ON A.ID = B.ID
GROUP BY A.ID
Is it possible to achieve the same result using something like the following code...
SELECT ID, -- SOMETHING --
FROM A
....
... using this subquery?
SELECT COUNT(*)
FROM B WHERE B.ID = A.ID
Thank you very much
I think you might be referring to a correlated subquery:
select a.id, (select count(1) from b where id=a.id) cnt from a;
The a.id term in the subquery binds with table a in the outer query.