Using SUM with LEFT JOIN gives me the wrong result - sql

So I have:
CREATE TABLE A (id INT,type int,amount int);
INSERT INTO A (id,type,amount) VALUES (1,0,25);
INSERT INTO A (id,type,amount) VALUES (2,0,25);
INSERT INTO A (id,type,amount) VALUES (3,1,10);
CREATE TABLE B (id INT,A_ID int,txt text);
INSERT INTO B (id,A_id,txt) VALUES (1,1,'abc');
INSERT INTO B (id,A_id,txt) VALUES (2,1,'def');
INSERT INTO B (id,A_id,txt) VALUES (3,2,'xxx');
I run this query:
SELECT min(A.id), SUM(A.amount), COUNT(B.id) FROM A
LEFT JOIN B ON A.id = B.A_id
GROUP BY A.type
I get:
min(A.id) SUM(A.amount) COUNT(B.id)
1 75 3
3 10 0
But I expected to get:
min(A.id) SUM(A.amount) COUNT(B.id)
1 50 3
3 10 0
Can someone help? What is the best way to achieve this exact result?
I want to GROUP BY type, get the SUM of A.amount for each group, and get the COUNT() of all B rows that reference those A rows through the foreign key.
Here is the repro: https://www.db-fiddle.com/f/esu13uGLcgFDpX7aEQRMJR/0 (please run the SQL code).
EDIT to add more detail: I know the result is correct; if I remove the GROUP BY we can see:
1, 50, 2
2, 25, 1
But I expect the result above. What is the best way to achieve it? I want to SUM per type, then count all B rows related to the A rows in that group.

Just a shorter version of the solution. It counts the B rows per A_id first in the inner query, so the outer query has to SUM those counts. COALESCE turns the NULL sum for a type with no B rows into 0:
SELECT min(A.id), SUM(A.amount), COALESCE(SUM(Bid), 0) FROM A
LEFT JOIN (select count(id) as Bid, A_id from B group by A_id) as Bcount
ON A.id = Bcount.A_id
GROUP BY A.type
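A minimal sketch to check the pre-aggregation approach end to end. It assumes Python's built-in `sqlite3` module, with SQLite standing in for MySQL; the tables and rows are the ones from the question.

```python
import sqlite3

# In-memory database with the question's tables and rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INT, type INT, amount INT);
    INSERT INTO A VALUES (1, 0, 25), (2, 0, 25), (3, 1, 10);
    CREATE TABLE B (id INT, A_id INT, txt TEXT);
    INSERT INTO B VALUES (1, 1, 'abc'), (2, 1, 'def'), (3, 2, 'xxx');
""")

# Count B rows per A_id first, so each A row joins to at most one Bcount row
# and A.amount is not duplicated before SUM. COALESCE maps "no B rows" to 0.
rows = con.execute("""
    SELECT MIN(A.id), SUM(A.amount), COALESCE(SUM(Bcount.Bid), 0)
    FROM A
    LEFT JOIN (SELECT COUNT(id) AS Bid, A_id FROM B GROUP BY A_id) AS Bcount
           ON A.id = Bcount.A_id
    GROUP BY A.type
    ORDER BY A.type
""").fetchall()

print(rows)  # [(1, 50, 3), (3, 10, 0)]
```

The ORDER BY is only there to make the output order deterministic.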

This can happen when you SUM over a 1-N relation.
Each matching record multiplies the rows of the other side.
For example, when 1 record in A is joined with 2 records in B, the join returns that A row twice before the GROUP BY, so a SUM then doubles A.amount.
A way to get around that is to use sub-queries that join one-to-one.
And a COUNT DISTINCT can be used to count unique ids.
So this is just one way to get the SUM of A right.
SELECT
q1.type,
q1.min_id,
q2.amount,
COALESCE(q1.totalB, 0) as totalB
FROM
(
SELECT
A.type,
MIN(A.id) AS min_id,
COUNT(DISTINCT B.id) AS totalB
FROM A
LEFT JOIN B ON B.A_id = A.id
GROUP BY A.type
) AS q1
JOIN
(
SELECT
type,
SUM(amount) AS amount
FROM A
GROUP BY type
) AS q2 ON q2.type = q1.type
View on DB Fiddle
The SQL was tested on MySQL, but it is standard ANSI SQL that should run on almost any RDBMS, including MS SQL Server.
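To see the row multiplication described in this answer, and to check that the two one-to-one subqueries fix it, here is a small sketch. It assumes Python's built-in `sqlite3` module as a stand-in engine, using the question's data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INT, type INT, amount INT);
    INSERT INTO A VALUES (1, 0, 25), (2, 0, 25), (3, 1, 10);
    CREATE TABLE B (id INT, A_id INT, txt TEXT);
    INSERT INTO B VALUES (1, 1, 'abc'), (2, 1, 'def'), (3, 2, 'xxx');
""")

# The raw 1-N join: A row 1 matches two B rows, so its amount (25) shows up twice.
joined = con.execute("""
    SELECT A.id, A.amount, B.id
    FROM A LEFT JOIN B ON B.A_id = A.id
    ORDER BY A.id, B.id
""").fetchall()

# Summing over that join counts A row 1's amount twice: 25 + 25 + 25 = 75.
naive = con.execute("""
    SELECT MIN(A.id), SUM(A.amount), COUNT(B.id)
    FROM A LEFT JOIN B ON A.id = B.A_id
    GROUP BY A.type
    ORDER BY A.type
""").fetchall()

# q1 counts distinct B ids per type; q2 sums A.amount per type without touching B.
# Joining q1 and q2 on type is one-to-one, so nothing is multiplied.
fixed = con.execute("""
    SELECT q1.type, q1.min_id, q2.amount, COALESCE(q1.totalB, 0) AS totalB
    FROM (SELECT A.type AS type, MIN(A.id) AS min_id, COUNT(DISTINCT B.id) AS totalB
          FROM A LEFT JOIN B ON B.A_id = A.id
          GROUP BY A.type) AS q1
    JOIN (SELECT type, SUM(amount) AS amount FROM A GROUP BY type) AS q2
      ON q2.type = q1.type
    ORDER BY q1.type
""").fetchall()

print(joined)  # [(1, 25, 1), (1, 25, 2), (2, 25, 3), (3, 10, None)]
print(naive)   # [(1, 75, 3), (3, 10, 0)]
print(fixed)   # [(0, 1, 50, 3), (1, 3, 10, 0)]
```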

One way of doing this would be to use ROW_NUMBER():
WITH CTE AS (SELECT A.id AS Aid,
A.[type],
A.amount,
B.id AS bid,
txt,
ROW_NUMBER() OVER (PARTITION BY A.id ORDER BY B.id) AS RN
FROM A
LEFT JOIN B ON A.id = B.A_ID)
SELECT MIN(Aid) AS Min_A_ID,
SUM(CASE RN WHEN 1 THEN amount END) AS Amount,
COUNT(bid) AS BCount
FROM CTE
GROUP BY [type];
I also recommend replacing the text datatype, which is deprecated in SQL Server, with varchar(MAX).
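The ROW_NUMBER() idea can be sketched outside SQL Server as well. This assumes Python's built-in `sqlite3` module with an SQLite build of 3.25 or newer (the first version with window functions); the T-SQL `[type]` brackets are dropped and the unused `txt` column is left out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INT, type INT, amount INT);
    INSERT INTO A VALUES (1, 0, 25), (2, 0, 25), (3, 1, 10);
    CREATE TABLE B (id INT, A_id INT, txt TEXT);
    INSERT INTO B VALUES (1, 1, 'abc'), (2, 1, 'def'), (3, 2, 'xxx');
""")

rows = con.execute("""
    WITH CTE AS (
        SELECT A.id AS Aid, A.type AS type, A.amount AS amount, B.id AS bid,
               -- RN = 1 marks the first joined row of each A row
               ROW_NUMBER() OVER (PARTITION BY A.id ORDER BY B.id) AS RN
        FROM A LEFT JOIN B ON A.id = B.A_id
    )
    SELECT MIN(Aid) AS Min_A_ID,
           SUM(CASE RN WHEN 1 THEN amount END) AS Amount,  -- each A.amount once
           COUNT(bid) AS BCount                            -- every matched B row
    FROM CTE
    GROUP BY type
    ORDER BY type
""").fetchall()

print(rows)  # [(1, 50, 3), (3, 10, 0)]
```

The CASE without an ELSE yields NULL for the duplicate rows, and SUM ignores NULLs, which is what keeps each amount from being counted more than once.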

Related

SQL - select * given count from another table

I'm trying to select * from two tables (a and b) using a join (on columns a.id and b.id), given that the count of a column (b.owner) in b is lower than 3, i.e. a person's name can occur at most 2 times.
I've tried:
SELECT a.*, COUNT(b.owner) AS b_count
FROM a LEFT JOIN b on a.id = b.id
GROUP BY b.owner HAVING COUNT(b_count) <3
As I'm pretty new to SQL, I'm quite stuck here. How can I resolve this issue? The result should be all columns for owners who do not appear more than twice in the data.
The query you are trying to run does not work because columns are missing from the GROUP BY clause.
As you are outputting all columns from table a (with SELECT a.*), you need to include all those columns in the GROUP BY clause, so that the database knows which set of fields to group by and can perform the required aggregation (in your case COUNT(b.owner)).
Example
Suppose your table a has the 3 columns below:
CREATE TABLE persons (
id INTEGER,
name VARCHAR(50),
birthday DATE,
PRIMARY KEY (id)
);
.. and your table b is the following, referencing the first table:
CREATE TABLE sales (
id INTEGER,
person_id INTEGER,
sale_value DECIMAL,
PRIMARY KEY (id),
FOREIGN KEY (person_id) REFERENCES persons(id)
);
.. you should query it, grouping the COUNT() by those 3 columns:
SELECT a.id, a.name, a.birthday, COUNT(b.person_id) AS b_count
FROM persons a
LEFT JOIN sales b ON a.id = b.person_id
GROUP BY a.id, a.name, a.birthday
HAVING COUNT(b.person_id) < 3
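A quick way to try the grouped query is the sketch below. It assumes Python's built-in `sqlite3` module and uses hypothetical rows (Ann with 3 sales, Bob with 1, Cid with none) that are not part of the original answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE persons (id INTEGER, name VARCHAR(50), birthday DATE, PRIMARY KEY (id));
    CREATE TABLE sales (id INTEGER, person_id INTEGER, sale_value DECIMAL,
                        PRIMARY KEY (id), FOREIGN KEY (person_id) REFERENCES persons(id));
    -- Hypothetical rows: Ann has 3 sales, Bob has 1, Cid has none.
    INSERT INTO persons VALUES (1, 'Ann', '1990-01-01'), (2, 'Bob', '1985-05-05'),
                               (3, 'Cid', '2000-12-31');
    INSERT INTO sales VALUES (1, 1, 10), (2, 1, 20), (3, 1, 5), (4, 2, 7);
""")

# Every selected column of persons is listed in GROUP BY; the LEFT JOIN keeps
# Cid, whose COUNT(b.person_id) is 0 because the joined b columns are NULL.
rows = con.execute("""
    SELECT a.id, a.name, a.birthday, COUNT(b.person_id) AS b_count
    FROM persons a
    LEFT JOIN sales b ON a.id = b.person_id
    GROUP BY a.id, a.name, a.birthday
    HAVING COUNT(b.person_id) < 3
    ORDER BY a.id
""").fetchall()

print(rows)  # [(2, 'Bob', '1985-05-05', 1), (3, 'Cid', '2000-12-31', 0)]
```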
Alternative
If the count of records in the 2nd table is not important to you, you could use a different "strategy" here that avoids performing the JOIN between the tables (useful when joining two huge tables) and repeating all the columns from a in the SELECT and GROUP BY.
First, identify the records that have fewer than 3 occurrences:
SELECT b.person_id
FROM sales b
GROUP BY b.person_id
HAVING COUNT(b.id) < 3;
.. then use it in the WHERE clause to retrieve all the columns from the 1st table, only for the ids returned by the previous query:
SELECT a.*
FROM persons a
WHERE a.id IN (....other query here....);
.. put together, the query executes in a more step-by-step order that is perhaps easier to visualize while you get more familiar with SQL:
SELECT a.*
FROM persons a
WHERE a.id IN (SELECT b.person_id
FROM sales b
GROUP BY b.person_id
HAVING COUNT(b.id) < 3);
DB Fiddle here
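The IN-subquery variant can be checked the same way. This sketch assumes Python's built-in `sqlite3` module and the same hypothetical rows as before. Note one difference from the LEFT JOIN version: a person with no rows in sales at all (Cid here) is never returned, because the subquery only produces person_ids that actually appear in sales.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE persons (id INTEGER, name VARCHAR(50), birthday DATE, PRIMARY KEY (id));
    CREATE TABLE sales (id INTEGER, person_id INTEGER, sale_value DECIMAL,
                        PRIMARY KEY (id), FOREIGN KEY (person_id) REFERENCES persons(id));
    -- Hypothetical rows: Ann has 3 sales, Bob has 1, Cid has none.
    INSERT INTO persons VALUES (1, 'Ann', '1990-01-01'), (2, 'Bob', '1985-05-05'),
                               (3, 'Cid', '2000-12-31');
    INSERT INTO sales VALUES (1, 1, 10), (2, 1, 20), (3, 1, 5), (4, 2, 7);
""")

# The subquery yields only person_ids that occur in sales fewer than 3 times,
# so a person with no sales at all never makes it into the IN list.
rows = con.execute("""
    SELECT a.*
    FROM persons a
    WHERE a.id IN (SELECT b.person_id
                   FROM sales b
                   GROUP BY b.person_id
                   HAVING COUNT(b.id) < 3)
    ORDER BY a.id
""").fetchall()

print(rows)  # [(2, 'Bob', '1985-05-05')]
```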
In Standard SQL, you can use:
SELECT a.*, COUNT(b.owner) AS b_count
FROM a LEFT JOIN
b
ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.owner) < 3;
This may not work in all databases (and it assumes that a.id is unique/primary key). An alternative would be to use a correlated subquery:
SELECT a.*
FROM (SELECT a.*,
(SELECT COUNT(*)
FROM b
WHERE a.id = b.id
) as b_count
FROM a
) a
WHERE b_count < 3;
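Both forms in this answer can be compared on a toy dataset. The sketch assumes Python's built-in `sqlite3` module and hypothetical tables shaped like the question (a and b share the id column; the `label` column is made up). SQLite accepts `a.*` with `GROUP BY a.id`, so both queries run there.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical tables: 'Ann' appears 3 times for id 1, 'Bob' once for id 2.
    CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE b (id INTEGER, owner TEXT);
    INSERT INTO a VALUES (1, 'first'), (2, 'second');
    INSERT INTO b VALUES (1, 'Ann'), (1, 'Ann'), (1, 'Ann'), (2, 'Bob');
""")

# Join + GROUP BY the primary key; relies on a.id being unique.
grouped = con.execute("""
    SELECT a.*, COUNT(b.owner) AS b_count
    FROM a LEFT JOIN b ON a.id = b.id
    GROUP BY a.id
    HAVING COUNT(b.owner) < 3
""").fetchall()

# Correlated subquery: computes b_count per row of a, then filters outside.
correlated = con.execute("""
    SELECT a.*
    FROM (SELECT a.*,
                 (SELECT COUNT(*) FROM b WHERE a.id = b.id) AS b_count
          FROM a) a
    WHERE b_count < 3
""").fetchall()

print(grouped)     # [(2, 'second', 1)]
print(correlated)  # [(2, 'second', 1)]
```

The correlated version needs no GROUP BY at all, which is why it sidesteps the portability question.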

SQL Server query for calculating sum

I am trying, without success, to write a SQL query that calculates a sum.
Let's say that we have:
table A with columns id and type
table B with columns id, a_id (relation to table A) and amount
I managed to calculate the number of records by type, as in the following example:
SELECT DISTINCT
type,
COUNT(A.id) OVER (PARTITION BY type) AS numOfRecords
FROM A;
How can I also calculate the sum of amounts per type (summing all amounts from table B for each distinct type in A)?
Your query would normally be written as:
select type, count(*) as num_records
from A
group by type;
Then, you can incorporate b as:
select a.type, count(*) as num_records, sum(b.amount)
from A left join
(select a_id, sum(amount) as amount
from b
group by a_id
) b
on b.a_id = a.id
group by a.type;
You can also join and aggregate without a subquery, but this will throw off the count. To fix that, you can use count(distinct):
select a.type, count(distinct a.id) as num_records, sum(b.amount)
from A left join
b
on b.a_id = a.id
group by a.type;
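The difference between the pre-aggregated subquery, a plain join, and the count(distinct) fix is easiest to see on data where one A row has several B rows. This sketch assumes Python's built-in `sqlite3` module and hypothetical rows (not from the question).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical data: type 'x' has A rows 1 and 2, type 'y' has rows 3 and 4;
    -- A row 1 has two B rows, A row 4 has none.
    CREATE TABLE A (id INTEGER, type TEXT);
    CREATE TABLE B (id INTEGER, a_id INTEGER, amount INTEGER);
    INSERT INTO A VALUES (1, 'x'), (2, 'x'), (3, 'y'), (4, 'y');
    INSERT INTO B VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7), (4, 3, 1);
""")

# B summed per a_id first: at most one b row per A row, so count(*) is correct.
pre_aggregated = con.execute("""
    SELECT A.type, COUNT(*) AS num_records, SUM(b.amount)
    FROM A LEFT JOIN
         (SELECT a_id, SUM(amount) AS amount FROM B GROUP BY a_id) b
      ON b.a_id = A.id
    GROUP BY A.type ORDER BY A.type
""").fetchall()

# Plain join: A row 1 appears twice, so count(*) for 'x' is 3 instead of 2.
naive = con.execute("""
    SELECT A.type, COUNT(*), SUM(B.amount)
    FROM A LEFT JOIN B ON B.a_id = A.id
    GROUP BY A.type ORDER BY A.type
""").fetchall()

# count(distinct A.id) undoes the duplication in the record count.
distinct = con.execute("""
    SELECT A.type, COUNT(DISTINCT A.id), SUM(B.amount)
    FROM A LEFT JOIN B ON B.a_id = A.id
    GROUP BY A.type ORDER BY A.type
""").fetchall()

print(pre_aggregated)  # [('x', 2, 22), ('y', 2, 1)]
print(naive)           # [('x', 3, 22), ('y', 2, 1)]
print(distinct)        # [('x', 2, 22), ('y', 2, 1)]
```

Here the amounts live in B, so the sums agree in all three queries; only the record count is inflated by the plain join.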

Finding % of rows in a table that exist in another table?

I have the following tables:
Table A:
entryDate memberID course
Each memberID can occur multiple times on the same date:
2016-05-10 1192875 STAT-2294
2016-05-10 3292875 STAT-2294
2016-05-10 1192875 ENG-115
Table B consists only of memberIDs
What I’m looking for is a way to find the % of memberIDs in Table A that exist in Table B on a given day.
This is where I'm at so far:
SELECT entryDate,
Count(CASE
WHEN tableA.memberID IN (SELECT memberID
FROM tableB) THEN 1
ELSE 0
END) AS membership
FROM tableA
WHERE entryDate BETWEEN '2016-05-01' AND '2016-05-15'
GROUP BY entryDate;
I'm trying to get the raw count as a starting point, but I get the following error:
Unsupported SubQuery Expression ‘memberID': Currently SubQuery
expressions are only allowed as Where Clause predicates
What is wrong with my current query?
How can I get the % of rows in TableA that exist in TableB for a specific entryDate?
TIA! -Craig
You can use EXISTS to do this:
select count(*)
from tableA a
where exists (select 1 from tableB b where a.memberID = b.memberID)
and entryDate BETWEEN '20160501' AND '20160515'
To get the percentage of entries (matching rows divided by all rows in the date range):
select 100.0 * count(*) / (select count(*)
from tableA
where entryDate BETWEEN '20160501' AND '20160515')
from tableA a
where exists (select 1 from tableB b where a.memberID = b.memberID)
and entryDate BETWEEN '20160501' AND '20160515'
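A minimal sketch of the EXISTS-based percentage, assuming Python's built-in `sqlite3` module, the three sample rows from the question, and a hypothetical tableB containing only member 1192875. Because SQLite compares these dates as text, the literals are written in ISO form ('2016-05-01') to match the stored values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (entryDate TEXT, memberID INTEGER, course TEXT);
    CREATE TABLE tableB (memberID INTEGER);
    INSERT INTO tableA VALUES
        ('2016-05-10', 1192875, 'STAT-2294'),
        ('2016-05-10', 3292875, 'STAT-2294'),
        ('2016-05-10', 1192875, 'ENG-115');
    -- Hypothetical: only member 1192875 exists in tableB.
    INSERT INTO tableB VALUES (1192875);
""")

# Numerator: rows in range whose member exists in tableB.
# Denominator: all rows in range. 100.0 forces floating-point division.
pct = con.execute("""
    SELECT 100.0 * COUNT(*) / (SELECT COUNT(*) FROM tableA
                               WHERE entryDate BETWEEN '2016-05-01' AND '2016-05-15')
    FROM tableA a
    WHERE EXISTS (SELECT 1 FROM tableB b WHERE a.memberID = b.memberID)
      AND a.entryDate BETWEEN '2016-05-01' AND '2016-05-15'
""").fetchone()[0]

print(pct)  # roughly 66.67 (2 of the 3 rows match)
```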
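Edit: correlated subqueries aren't supported in Hive, but this can be done with a LEFT JOIN. Keep the date filter in the WHERE clause; in the ON clause it would not remove any rows of tableA from the denominator:
select 100.0 * count(b.memberID) / count(a.memberID)
from tableA a
left join tableB b on a.memberID = b.memberID
where a.entryDate BETWEEN '20160501' AND '20160515'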
I think LEFT JOIN is the easiest method. Assuming tableB has no duplicates:
SELECT entryDate, COUNT(*) as numA,
COUNT(b.memberId) as numB,
AVG(CASE WHEN b.memberId is not null THEN 1.0 ELSE 0.0 END) as ratio
FROM tableA a LEFT JOIN
tableB b
ON a.memberId = b.memberId
WHERE entryDate BETWEEN '2016-05-01' AND '2016-05-15'
GROUP BY entryDate;
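The per-date LEFT JOIN version can be tried like this. The sketch assumes Python's built-in `sqlite3` module, the question's sample rows plus one hypothetical row on 2016-05-11, and a hypothetical tableB without duplicates (as the answer requires).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (entryDate TEXT, memberID INTEGER, course TEXT);
    CREATE TABLE tableB (memberID INTEGER);
    INSERT INTO tableA VALUES
        ('2016-05-10', 1192875, 'STAT-2294'),
        ('2016-05-10', 3292875, 'STAT-2294'),
        ('2016-05-10', 1192875, 'ENG-115'),
        ('2016-05-11', 4444444, 'MATH-101');  -- hypothetical extra row
    INSERT INTO tableB VALUES (1192875);      -- hypothetical, no duplicates
""")

# COUNT(b.memberID) skips the NULLs produced by unmatched rows; averaging a
# 1.0/0.0 flag per row gives the matched fraction directly.
rows = con.execute("""
    SELECT a.entryDate, COUNT(*) AS numA, COUNT(b.memberID) AS numB,
           AVG(CASE WHEN b.memberID IS NOT NULL THEN 1.0 ELSE 0.0 END) AS ratio
    FROM tableA a LEFT JOIN tableB b ON a.memberID = b.memberID
    WHERE a.entryDate BETWEEN '2016-05-01' AND '2016-05-15'
    GROUP BY a.entryDate
    ORDER BY a.entryDate
""").fetchall()

for row in rows:
    print(row)
```

Duplicates in tableB would multiply the matched rows, which is why the answer states that assumption up front.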

How to INSERT SELECT with a TOP() clause for every customer

These are my tables:
http://sqlfiddle.com/#!3/a8087/1
What I'm trying to achieve is to insert into a new table by selecting from Tbl2 and CustTable.
E.g.:
INSERT INTO tbl3
SELECT TOP(SELECT Counter FROM Tbl2) a.name, a.amount FROM custTable a
INNER JOIN Tbl2 b ON a.custId = b.custid
I want to insert X rows for each CustId, where X is that customer's [Counter].
It's not working because the subquery returned more than 1 value.
How can I fix the query in TOP()?
You can use windowing functions to rank the rows per customer, and then filter by the counter:
WITH cte as
(
SELECT a.Name, a.Amount, b.Counter,
ROW_NUMBER() OVER (PARTITION BY a.CustID ORDER BY a.Amount DESC) AS RN
FROM custTable a
INNER JOIN Tbl2 b ON a.custId = b.custid
)
SELECT cte.name, cte.amount
INTO tbl3
FROM cte
WHERE cte.rn <= Counter;
You'll need to choose an ORDER for each customer to determine 'which' of the TOP records get included (I've assumed you want the top amounts here).
I've also used SELECT ... INTO to create tbl3 on the fly, but you can use INSERT INTO if the table already exists.
Updated your SqlFiddle here
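The same ROW_NUMBER() filter works with INSERT INTO when tbl3 already exists. This sketch assumes Python's built-in `sqlite3` module (SQLite 3.25+ for window functions) and hypothetical tables and rows, since the real schema is only in the linked fiddle. SQLite has no SELECT ... INTO, so the CTE is placed in front of an INSERT instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical schema and rows.
    CREATE TABLE custTable (custId INTEGER, name TEXT, amount INTEGER);
    CREATE TABLE Tbl2 (custid INTEGER, Counter INTEGER);
    CREATE TABLE tbl3 (name TEXT, amount INTEGER);
    INSERT INTO custTable VALUES (1, 'alice', 100), (1, 'alice', 50), (1, 'alice', 75),
                                 (2, 'bob', 10), (2, 'bob', 20);
    INSERT INTO Tbl2 VALUES (1, 2), (2, 1);  -- alice keeps 2 rows, bob keeps 1
""")

# Rank each customer's rows by amount, then keep rank <= that customer's Counter.
con.execute("""
    WITH cte AS (
        SELECT a.name, a.amount, b.Counter,
               ROW_NUMBER() OVER (PARTITION BY a.custId ORDER BY a.amount DESC) AS rn
        FROM custTable a
        INNER JOIN Tbl2 b ON a.custId = b.custid
    )
    INSERT INTO tbl3 (name, amount)
    SELECT name, amount FROM cte WHERE rn <= Counter
""")

rows = con.execute("SELECT name, amount FROM tbl3 ORDER BY name, amount DESC").fetchall()
print(rows)  # [('alice', 100), ('alice', 75), ('bob', 20)]
```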

LEFT JOIN - How to join tables and include an extra row even if there is a match

I have two tables:
Table A
-------
ID
ProductName
Table B
-------
ID
ProductID
Size
I want to join these two tables:
SELECT * FROM
(SELECT * FROM A) AS A
LEFT JOIN
(SELECT * FROM B) AS B
ON A.ID = B.ProductID
This is easy: I get every row from A repeated once per matching row in B, and NULL fields if there is no match.
But here comes the tricky part: how can I get every row from A with NULL fields for table B even if there is a match, so that I get an extra line with NULL values plus all the matches?
SELECT A.*
, B3.ID
, B3.ProductID
, B3.Size
FROM A
LEFT JOIN
(
SELECT ProductID as MatchID
, ID
, ProductID
, Size
FROM B
UNION ALL
SELECT ID
, null
, null
, null
FROM A A2
) B3
ON A.ID = B3.MatchID
Live example at SQL Fiddle.
Instead of using UNION ALL in a subquery as suggested by others, you could also (and I would) use UNION ALL at the outer level, which keeps the query simpler:
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B
ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
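A small sketch of the outer-level UNION ALL, assuming Python's built-in `sqlite3` module and hypothetical products: 'Shirt' has two sizes in B and 'Hat' has none. Every product gets its all-NULL line, plus one line per match.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical rows.
    CREATE TABLE A (ID INTEGER, ProductName TEXT);
    CREATE TABLE B (ID INTEGER, ProductID INTEGER, Size TEXT);
    INSERT INTO A VALUES (1, 'Shirt'), (2, 'Hat');
    INSERT INTO B VALUES (10, 1, 'S'), (11, 1, 'M');
""")

# First branch: the real matches. Second branch: one NULL-padded row per A row.
# ORDER BY 1, 3 sorts by A.ID, then B.ID (SQLite sorts NULLs first).
rows = con.execute("""
    SELECT A.ID, A.ProductName, B.ID, B.Size
    FROM A
    INNER JOIN B ON B.ProductID = A.ID
    UNION ALL
    SELECT A.ID, A.ProductName, NULL, NULL
    FROM A
    ORDER BY 1, 3
""").fetchall()

for row in rows:
    print(row)
# (1, 'Shirt', None, None)
# (1, 'Shirt', 10, 'S')
# (1, 'Shirt', 11, 'M')
# (2, 'Hat', None, None)
```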
Since every join is going to be successful, we can switch to an INNER JOIN:
SELECT
*
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size FROM B
UNION ALL
SELECT NULL,ID,NULL FROM A) B
ON
A.ID = B.ProductID
Now would be a very good time to switch to naming columns explicitly, rather than using SELECT *.
Or, if, as per @Andomar's comment, you need all of the B columns to be NULL:
SELECT
A.ID,A.ProductName,
B.ID,B.ProductID,B.Size
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size,ProductID as MatchID FROM B
UNION ALL
SELECT NULL,NULL,NULL,ID FROM A) B
ON
A.ID = B.MatchID