Efficiently Include Column not in Group By of SQL Query - sql

Given
Table A
Id INTEGER
Name VARCHAR(50)
Table B
Id INTEGER
FkId INTEGER ; Foreign key to Table A
I wish to count the occurrences of each FkId value:
SELECT FkId, COUNT(FkId)
FROM B
GROUP BY FkId
Now I simply want to also output the Name from Table A.
This will not work:
SELECT FkId, COUNT(FkId), a.Name
FROM B b
INNER JOIN A a ON a.Id=b.FkId
GROUP BY FkId
because a.Name is not contained in the GROUP BY clause. The query fails with the error "is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".
The point is to move from output like this
FkId Count
1 42
2 25
to output like this
FkId Count Name
1 42 Ronald
2 25 John
There are quite a few matches on SO for that error message, but some e.g. https://stackoverflow.com/a/6456944/141172 have comments like "will generate 3 scans on the table, rather than 1, so won't scale".
How can I efficiently include a field from the joined Table A (which has a 1:1 relationship to FkId) in the query output?

You can try something like this:
;WITH GroupedData AS
(
SELECT FkId, COUNT(FkId) As FkCount
FROM B
GROUP BY FkId
)
SELECT gd.*, a.Name
FROM GroupedData gd
INNER JOIN dbo.A a ON gd.FkId = a.Id
Create a CTE (Common Table Expression) to handle the grouping/counting on your Table B, and then join that result (one row per FkId) to Table A and grab some more columns from Table A into your final result set.
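To check the aggregate-then-join idea end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are made up for illustration, and SQLite stands in for whatever engine you actually use (the dbo. schema prefix is dropped because SQLite has no schemas).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Id INTEGER PRIMARY KEY, Name VARCHAR(50));
    CREATE TABLE B (Id INTEGER PRIMARY KEY, FkId INTEGER REFERENCES A(Id));
    -- Hypothetical sample data: three rows point at Ronald, two at John.
    INSERT INTO A (Id, Name) VALUES (1, 'Ronald'), (2, 'John');
    INSERT INTO B (FkId) VALUES (1), (1), (1), (2), (2);
""")

# Count per FkId first (one row per FkId), then join that small result to A.
rows = conn.execute("""
    WITH GroupedData AS (
        SELECT FkId, COUNT(FkId) AS FkCount
        FROM B
        GROUP BY FkId
    )
    SELECT gd.FkId, gd.FkCount, a.Name
    FROM GroupedData gd
    INNER JOIN A a ON gd.FkId = a.Id
    ORDER BY gd.FkId
""").fetchall()

print(rows)  # [(1, 3, 'Ronald'), (2, 2, 'John')]
```

Because the join happens after the grouping, Name never has to appear in a GROUP BY clause.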

Did you try adding the field to the group by?
SELECT FkId, COUNT(FkId), a.Name
FROM B b
INNER JOIN A a ON a.Id=b.FkId
GROUP BY FkId,a.Name

select t1.Name, t2.FkId, t2.countedFkId
from a t1
join (select FkId, count(FkId) as countedFkId from b group by FkId) t2
on t1.Id = t2.FkId;

Related

How to count rows matching multiple filters in SQL?

I have data in which I'm aiming to find rows that have unique values of their main_ID column and then count the total of those IDs that also have either of 2 values for another ID column.
I am trying this:
SELECT COUNT(DISTINCT(main_id))
FROM (SELECT other_id, main_id FROM database.table WHERE other_id ='5') a INNER JOIN
(SELECT other_id, main_id FROM database.table WHERE other_id ='6') b USING (main_id)
This returns an error at (SELECT saying subquery in FROM must have an alias. I've never coded in SQL before so I'm not sure what to start with addressing this. As I understand it, it wants aliases for the 2 columns - how do I assign these for my inner join?
Your query can be optimized to this:
select count(*) from (
select main_id
from database.table
where other_id in ('5','6')
group by main_id
having count(distinct other_id) = 2
) t
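A quick way to convince yourself that the HAVING COUNT(DISTINCT ...) form counts only the main_id values that have both other_id values is to run it on a tiny data set. The sketch below uses SQLite from Python; the table name t and the rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (main_id INTEGER, other_id TEXT);
    -- Hypothetical rows: only main_id 1 and 4 have both '5' and '6'.
    INSERT INTO t (main_id, other_id) VALUES
        (1, '5'), (1, '6'),
        (2, '5'),
        (3, '6'), (3, '6'),
        (4, '5'), (4, '6'), (4, '7');
""")

(both_count,) = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT main_id
        FROM t
        WHERE other_id IN ('5', '6')
        GROUP BY main_id
        HAVING COUNT(DISTINCT other_id) = 2
    ) AS x
""").fetchone()

print(both_count)  # 2
```

main_id 3 is excluded even though it has two matching rows, because both rows carry the same other_id.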
You need to follow this structure for an inner join
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
In your query you need to state how the two subqueries relate to each other. Both of them select main_id, so join them with an ON clause on that column, and qualify main_id in the SELECT list so that it is not ambiguous:
SELECT COUNT(DISTINCT a.main_id)
FROM (SELECT other_id, main_id FROM database.table WHERE other_id ='5') a INNER JOIN
(SELECT other_id, main_id FROM database.table WHERE other_id ='6') b ON
a.main_id = b.main_id
You can read more information about inner joins.

SQL - select * given count from another table

I'm trying to select * from two tables (a and b) using a join (column a.id and b.id), given that the count of a column (b.owner) in b is lower than 3, i.e. the occurrence of a person's name can be max 2.
I've tried:
SELECT a.*, COUNT(b.owner) AS b_count
FROM a LEFT JOIN b on a.id = b.id
GROUP BY b.owner HAVING COUNT(b_count) <3
As I'm pretty new to SQL, I'm pretty stuck here. How can I resolve this issue? The result should be all columns for owners who do not appear more than twice in the data.
The query you are trying to run is not working due to the columns missing in the GROUP BY clause.
As you are outputting all columns from table a (with SELECT a.*), you need to include all those columns in the GROUP BY clause, so that the database understands which fields to group by when it performs the aggregation (in your case COUNT(b.owner)).
Example
Considering that your table a has 3 columns below:
CREATE TABLE persons (
id INTEGER,
name VARCHAR(50),
birthday DATE,
PRIMARY KEY (id)
);
.. and your table b the following and referencing the first table as below:
CREATE TABLE sales (
id INTEGER,
person_id INTEGER,
sale_value DECIMAL,
PRIMARY KEY (id),
FOREIGN KEY (person_id) REFERENCES persons(id)
);
.. you should query it aggregating the COUNT() by those 3 columns:
SELECT a.id, a.name, a.birthday, COUNT(b.person_id) AS b_count
FROM persons a
LEFT JOIN sales b ON a.id = b.person_id
GROUP BY a.id, a.name, a.birthday
HAVING COUNT(b.person_id) < 3
Alternative
In case the total of records on the 2nd table is not important to you, you could use a different "strategy" here to avoid performing the JOIN between the tables (useful when joining two huge tables) and rewriting all the columns from a on the SELECT+GROUP BY.
By first identifying the records that have fewer than 3 occurrences:
SELECT b.person_id
FROM sales b
GROUP BY b.person_id
HAVING COUNT(b.id) < 3;
.. and using it in the WHERE clause to retrieve all the columns from the 1st table only for the ids that resulted from the previous query:
SELECT a.*
FROM persons a
WHERE a.id IN (....other query here....);
.. the combined query reads in the order the steps run, which may be easier to follow while you are getting familiar with SQL. Note that, unlike the LEFT JOIN version, it leaves out persons who have no rows in sales at all:
SELECT a.*
FROM persons a
WHERE a.id IN (SELECT b.person_id
FROM sales b
GROUP BY b.person_id
HAVING COUNT(b.id) < 3);
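The two approaches above are not quite interchangeable: the LEFT JOIN query keeps persons with no sales (their count is 0), while the IN subquery can only return ids that occur in sales. A small sketch with SQLite from Python shows the difference; the three persons and their sales are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name VARCHAR(50), birthday DATE);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES persons(id),
        sale_value DECIMAL
    );
    -- Hypothetical data: Ann has 0 sales, Bob has 2, Cy has 3.
    INSERT INTO persons (id, name, birthday) VALUES
        (1, 'Ann', '1990-01-01'), (2, 'Bob', '1985-05-05'), (3, 'Cy', '1970-07-07');
    INSERT INTO sales (person_id, sale_value) VALUES
        (2, 10), (2, 20), (3, 5), (3, 6), (3, 7);
""")

# Version 1: LEFT JOIN with every selected column listed in GROUP BY.
join_ids = [row[0] for row in conn.execute("""
    SELECT a.id, a.name, a.birthday, COUNT(b.person_id) AS b_count
    FROM persons a
    LEFT JOIN sales b ON a.id = b.person_id
    GROUP BY a.id, a.name, a.birthday
    HAVING COUNT(b.person_id) < 3
    ORDER BY a.id
""")]

# Version 2: filter persons by the ids that have fewer than 3 sales rows.
in_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM persons a
    WHERE a.id IN (SELECT b.person_id
                   FROM sales b
                   GROUP BY b.person_id
                   HAVING COUNT(b.id) < 3)
    ORDER BY a.id
""")]

print(join_ids)  # [1, 2]
print(in_ids)    # [2]
```

If persons without any sales should be included, the LEFT JOIN form (or a correlated COUNT subquery) is the one to use.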
In Standard SQL, you can use:
SELECT a.*, COUNT(b.owner) AS b_count
FROM a LEFT JOIN
b
ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.owner) < 3;
This may not work in all databases (and it assumes that a.id is unique/primary key). An alternative would be to use a correlated subquery:
SELECT a.*
FROM (SELECT a.*,
(SELECT COUNT(*)
FROM b
WHERE a.id = b.id
) as b_count
FROM a
) a
WHERE b_count < 3;

Compare the results of a ROW COUNT

I have 2 databases on the same server and I need to compare the records in each one, since one of the databases is not importing all the information.
I was trying to do a row count but it's not working.
Currently I am exporting batches of approximately 100,000 rows and looking them up in Excel.
Let's say I want a query that counts the rows for each ID in TABLE A and then compares that count with the count for the same ID in TABLE B. Since the IDs are the same, the counts should be the same, and I want the query to return the IDs for which the counts do not match.
--this table will contain the count of occurrences of each ID in TableA
declare @TableA_Results table(
ID bigint,
Total bigint
)
insert into @TableA_Results
select ID,count(*) from database1.dbo.TableA --three-part name; the dbo schema is assumed
group by ID
--this table will contain the count of occurrences of each ID in TableB
declare @TableB_Results table(
ID bigint,
Total bigint
);
insert into @TableB_Results
select ID,count(*) from database2.dbo.TableB --three-part name; the dbo schema is assumed
group by ID
--this table will contain the IDs that don't have the same count in both tables
declare @Discordances table(
ID bigint,
TotalA bigint,
TotalB bigint
)
insert into @Discordances
select TA.ID,TA.Total,TB.Total
from @TableA_Results TA
inner join @TableB_Results TB on TA.ID=TB.ID and TA.Total!=TB.Total
--the final output
select * from @Discordances
The question is vague, but maybe this SQL Code might help nudge you in the right direction.
It grabs the IDs and Counts of each ID from database one, the IDs and counts of IDs from database two, and compares them, listing out all the rows where the counts are DIFFERENT.
WITH DB1Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseOne.dbo.TableOne
GROUP BY ID
), DB2Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseTwo.dbo.TableTwo
GROUP BY ID
)
SELECT a.ID, a.CountOfIDs AS DBOneCount, b.CountOfIDs AS DBTwoCount
FROM DB1Counts a
INNER JOIN DB2Counts b ON a.ID = b.ID
WHERE a.CountOfIDs <> b.CountOfIDs
This SQL selects from the specific IDs using the "Database.Schema.Table" notation. So replace "DatabaseOne" and "DatabaseTwo" with the names of your two databases. And of course replace TableOne and TableTwo with the names of your tables (I'm assuming they're the same). This sets up two selects, one for each database, that groups by ID to get the count of each ID. It then joins these two selects on ID, and returns all rows where the counts are different.
You could full outer join two aggregate queries and pull out ids that are either missing in one table, or for which the record count is different:
select coalesce(ta.id, tb.id), ta.cnt, tb.cnt
from
(select id, count(*) cnt from tableA group by id) ta
full outer join (select id, count(*) cnt from tableB group by id) tb
on ta.id = tb.id
where
coalesce(ta.cnt, -1) <> coalesce(tb.cnt, -1)
You seem to want aggregation and a full join:
select coalesce(a.id, b.id) as id, a.cnt, b.cnt
from (select id, count(*) as cnt
from a
group by id
) a full join
(select id, count(*) as cnt
from b
group by id
) b
on a.id = b.id
where coalesce(a.cnt, 0) <> coalesce(b.cnt, 0);
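Some engines (MySQL, for example) have no FULL JOIN. As a sketch of a different technique that gives the same answer, the per-table counts can be stacked with UNION ALL and then summed per id. The version below runs on SQLite from Python; the tables a and b and their rows are hypothetical, and total_a / total_b are names introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    -- Hypothetical data: id 1 matches, id 2 differs, id 3 only in a, id 4 only in b.
    INSERT INTO a (id) VALUES (1), (1), (2), (3);
    INSERT INTO b (id) VALUES (1), (1), (2), (2), (4);
""")

# Each branch contributes its own count and 0 for the other table;
# summing per id then yields both counts side by side.
mismatches = conn.execute("""
    SELECT id, SUM(a_cnt) AS total_a, SUM(b_cnt) AS total_b
    FROM (
        SELECT id, COUNT(*) AS a_cnt, 0 AS b_cnt FROM a GROUP BY id
        UNION ALL
        SELECT id, 0 AS a_cnt, COUNT(*) AS b_cnt FROM b GROUP BY id
    ) AS u
    GROUP BY id
    HAVING SUM(a_cnt) <> SUM(b_cnt)
    ORDER BY id
""").fetchall()

print(mismatches)  # [(2, 1, 2), (3, 1, 0), (4, 0, 1)]
```

Like the full join, this also reports ids that are missing entirely from one side, which the inner join versions cannot do.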

Query a table that have 2 cols with multiple criteria

I have a table with the following structure and Example data:
Now I want to query the records that have a value equal to @ and #.
For example, according to the above image, it should return 1 and 2
id
-----
1
2
Also, if the parameters were @, # and $, it should give us 1, because only the records with id 1 have all the given values.
id
-----
1
You can use a group by and having to get the distinct Id's that contain a distinct count of the number of items you're looking for
SELECT Id
FROM Table
WHERE Value IN ('#','$')
GROUP BY Id
HAVING COUNT(DISTINCT Value) = 2
SELECT Id
FROM Table
WHERE Value IN ('@','$','#')
GROUP BY Id
HAVING COUNT(DISTINCT Value) = 3
There are several ways to do this.
The subquery method:
SELECT DISTINCT Id
FROM Table
WHERE Id IN (SELECT Id FROM Table WHERE Value = '@')
AND Id IN (SELECT Id FROM Table WHERE Value = '#');
The correlated subquery method:
SELECT DISTINCT t.Id
FROM Table t
WHERE EXISTS (SELECT 1 FROM Table a WHERE a.Id = t.Id and a.Value = '@')
AND EXISTS (SELECT 1 FROM Table b WHERE b.Id = t.Id and b.Value = '#');
And the INTERSECT method:
SELECT Id FROM Table WHERE Value = '@'
INTERSECT
SELECT Id FROM Table WHERE Value = '#';
Best performance will depend on RDBMS vendor, size of table, and indexes. Not all RDBMS vendors support all methods.
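The methods can be compared on a small data set before choosing one. The sketch below, using SQLite from Python, wraps the GROUP BY / HAVING form in a helper that accepts any number of required values and checks it against INTERSECT. The table name items (Table is a reserved word), the helper ids_with_all, the values '@', '#', '$' and the rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (Id INTEGER, Value TEXT);
    -- Hypothetical data: id 1 has all three values, id 2 has two, id 3 has one.
    INSERT INTO items (Id, Value) VALUES
        (1, '@'), (1, '#'), (1, '$'),
        (2, '@'), (2, '#'),
        (3, '$');
""")

def ids_with_all(conn, values):
    """Return the ids that have every one of the given values."""
    wanted = sorted(set(values))              # duplicates would break the count
    placeholders = ", ".join("?" * len(wanted))
    sql = f"""
        SELECT Id
        FROM items
        WHERE Value IN ({placeholders})
        GROUP BY Id
        HAVING COUNT(DISTINCT Value) = ?
        ORDER BY Id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]

# The same two-value question asked with INTERSECT.
intersect_ids = [row[0] for row in conn.execute("""
    SELECT Id FROM items WHERE Value = '@'
    INTERSECT
    SELECT Id FROM items WHERE Value = '#'
    ORDER BY Id
""")]

print(ids_with_all(conn, ["@", "#"]))       # [1, 2]
print(ids_with_all(conn, ["@", "#", "$"]))  # [1]
print(intersect_ids)                        # [1, 2]
```

The HAVING form scales to any number of values with a single query, whereas the subquery, EXISTS and INTERSECT forms need one more branch per value.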
Maybe a multiple self join like this?
select
distinct t1.id
from
table t1
join table t2 on (t1.id=t2.id)
join table t3 on (t1.id=t3.id)
...
where
t1.value='@' and
t2.value='#' and
t3.value='$' and
...

Value present in more than one table

I have 3 tables. All of them have a column - id. I want to find if there is any value that is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged and there is no need to check for existence in the third table, or check if there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
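As a sketch of why the per-table distinct matters, the example below runs the query on SQLite from Python with made-up rows: id 1 is duplicated inside a but appears nowhere else, while id 4 appears in both b and c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    -- Hypothetical data: 1 is repeated within a only; 4 is shared by b and c.
    INSERT INTO a (id) VALUES (1), (1), (2);
    INSERT INTO b (id) VALUES (3), (4);
    INSERT INTO c (id) VALUES (4), (5);
""")

# DISTINCT collapses duplicates inside each table, so after UNION ALL
# a count above 1 can only come from different tables.
shared = [row[0] for row in conn.execute("""
    SELECT id FROM (
        SELECT DISTINCT id FROM a
        UNION ALL
        SELECT DISTINCT id FROM b
        UNION ALL
        SELECT DISTINCT id FROM c
    ) AS x
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
""")]

print(shared)  # [4]
```

Without the DISTINCT, id 1 would also be reported, even though it lives in a single table.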
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexed in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The below query is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
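The pairwise EXISTS form can be turned into a yes/no check. The sketch below does that with SQLite from Python rather than PostgreSQL; make_db and any_shared_id are hypothetical helpers written for this example, and the ids are invented.

```python
import sqlite3

def make_db(a_ids, b_ids, c_ids):
    """Build an in-memory database with tables a, b, c holding the given ids."""
    conn = sqlite3.connect(":memory:")
    for name, ids in (("a", a_ids), ("b", b_ids), ("c", c_ids)):
        conn.execute(f"CREATE TABLE {name} (id INTEGER)")
        conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in ids])
    return conn

def any_shared_id(conn):
    """True if at least one id occurs in two or more of the three tables."""
    (flag,) = conn.execute("""
        SELECT EXISTS (SELECT 1 FROM a JOIN b USING (id))
            OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
            OR EXISTS (SELECT 1 FROM c JOIN b USING (id))
    """).fetchone()
    return bool(flag)

print(any_shared_id(make_db([1, 1, 2], [3], [2, 5])))  # True  (2 is in a and c)
print(any_shared_id(make_db([1, 1], [2], [3])))        # False (duplicates within a are fine)
```

Each EXISTS only needs one matching row to be satisfied, which fits the requirement to stop at the first value found in more than one table.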