Finding % of rows in a table that exist in another table? - sql

I have the following tables:
Table A:
entryDate memberID course
2016-05-10 1192875 STAT-2294
2016-05-10 3292875 STAT-2294
2016-05-10 1192875 ENG-115
Each memberID can occur multiple times on the same date.
Table B consists only of memberIDs
What I’m looking for is a way to find the % of memberIDs in Table A that exist in Table B on a given day.
This is where I'm at so far:
SELECT entryDate,
Count(CASE
WHEN tableA.memberID IN (SELECT memberID
FROM tableB) THEN 1
ELSE 0
END) AS membership
FROM tableA
WHERE entryDate BETWEEN '2016-05-01' AND '2016-05-15'
GROUP BY entryDate;
I'm trying to get the raw count as a starting point, but I get the following error:
Unsupported SubQuery Expression 'memberID': Currently SubQuery
expressions are only allowed as Where Clause predicates
What is wrong with my current query?
How can I get the % of rows in TableA that exist in TableB for a specific entryDate?
TIA! -Craig

You can use exists to do this.
select count(*)
from tableA a
where exists (select 1 from tableB b where a.memberID = b.memberID)
and entryDate BETWEEN '2016-05-01' AND '2016-05-15'
To get the percentage of entries, divide the matching count by the total count for the same date range:
select 100.0 * (select count(*)
from tableA a
where exists (select 1 from tableB b where a.memberID = b.memberID)
and a.entryDate BETWEEN '2016-05-01' AND '2016-05-15') / count(*)
from tableA
where entryDate BETWEEN '2016-05-01' AND '2016-05-15'
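A minimal sketch for checking the EXISTS-based percentage (rows whose memberID is in tableB, divided by all rows in the date range). It runs the SQL through Python's built-in sqlite3 module rather than Hive, and the extra row dated 2016-05-11 is made-up sample data.

```python
import sqlite3

# In-memory database; the 2016-05-11 row is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (entryDate TEXT, memberID INTEGER, course TEXT);
CREATE TABLE tableB (memberID INTEGER);
INSERT INTO tableA VALUES
  ('2016-05-10', 1192875, 'STAT-2294'),
  ('2016-05-10', 3292875, 'STAT-2294'),
  ('2016-05-10', 1192875, 'ENG-115'),
  ('2016-05-11', 3292875, 'ENG-115');
INSERT INTO tableB VALUES (1192875);
""")

# Numerator: rows of tableA whose memberID exists in tableB.
# Denominator: all rows of tableA in the same date range.
pct = conn.execute("""
SELECT 100.0 * (SELECT COUNT(*) FROM tableA a
                WHERE EXISTS (SELECT 1 FROM tableB b WHERE b.memberID = a.memberID)
                  AND a.entryDate BETWEEN '2016-05-01' AND '2016-05-15')
       / COUNT(*)
FROM tableA
WHERE entryDate BETWEEN '2016-05-01' AND '2016-05-15'
""").fetchone()[0]
print(pct)  # 50.0
```

Two of the four rows in range belong to member 1192875, so the result is 50%.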
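Edit: Correlated subqueries aren't supported in Hive, so this can be done with a left join instead. Keep the date filter in the WHERE clause; in the ON clause of a left join it would not remove any rows of tableA.
select 100.0 * count(b.memberID) / count(a.memberID)
from tableA a
left join tableB b on a.memberID = b.memberID
where a.entryDate BETWEEN '2016-05-01' AND '2016-05-15'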

I think LEFT JOIN is the easiest method. Assuming tableB has no duplicates:
SELECT entryDate, COUNT(*) as numA,
COUNT(b.memberId) as numB,
AVG(CASE WHEN b.memberId is not null THEN 1.0 ELSE 0.0 END) as ratio
FROM tableA a LEFT JOIN
tableB b
ON a.memberId = b.memberId
WHERE entryDate BETWEEN '2016-05-01' AND '2016-05-15'
GROUP BY entryDate;
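A small sketch of the per-date LEFT JOIN query above, run with Python's sqlite3 on the question's rows plus one hypothetical row for 2016-05-11. tableB is declared with a primary key to match the "no duplicates" assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (entryDate TEXT, memberID INTEGER, course TEXT);
CREATE TABLE tableB (memberID INTEGER PRIMARY KEY);  -- no duplicates, as assumed
INSERT INTO tableA VALUES
  ('2016-05-10', 1192875, 'STAT-2294'),
  ('2016-05-10', 3292875, 'STAT-2294'),
  ('2016-05-10', 1192875, 'ENG-115'),
  ('2016-05-11', 3292875, 'ENG-115');
INSERT INTO tableB VALUES (1192875);
""")

# Unmatched rows get b.memberID = NULL: COUNT(b.memberID) skips them,
# and the CASE turns them into 0.0 so AVG gives the matched fraction.
rows = conn.execute("""
SELECT entryDate,
       COUNT(*) AS numA,
       COUNT(b.memberID) AS numB,
       AVG(CASE WHEN b.memberID IS NOT NULL THEN 1.0 ELSE 0.0 END) AS ratio
FROM tableA a
LEFT JOIN tableB b ON a.memberID = b.memberID
WHERE entryDate BETWEEN '2016-05-01' AND '2016-05-15'
GROUP BY entryDate
ORDER BY entryDate
""").fetchall()
for row in rows:
    print(row)
```

For 2016-05-10, two of the three rows match, so ratio is 2/3. Multiply by 100 for a percentage.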

Related

SQL select record right after a particular date, compare NULL with date

I'd like to keep all records in tableA that come right after my targeted date.
Main table A
Table B
SELECT *
FROM tableA a
LEFT JOIN tableB b on b.customerID = a.customerID and b.target_date = a.sell_date
WHERE a.sell_date > b.target_date
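Unfortunately, my code above doesn't work: for rows in tableA with no match in tableB, b.target_date is NULL, and comparing a date with NULL is never true, so the WHERE clause filters those rows out.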
My expected output is
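The inequality between target_date and sell_date could go in the join condition of the FROM clause. This way the WHERE clause could be eliminated, and rows of tableA without a matching tableB row are kept (with NULLs) instead of being filtered out.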
SELECT *
FROM tableA a
LEFT JOIN tableB b on b.customerID=a.customerID
and b.target_date <= a.sell_date;
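A minimal sketch contrasting the two placements of the date condition, using Python's sqlite3. The question's sample data is not included here, so the rows below are hypothetical, and the first query joins on customerID only to isolate the effect of the WHERE filter.

```python
import sqlite3

# Hypothetical rows: customer 2 has no row in tableB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (customerID INTEGER, sell_date TEXT);
CREATE TABLE tableB (customerID INTEGER, target_date TEXT);
INSERT INTO tableA VALUES (1, '2020-01-05'), (2, '2020-01-07');
INSERT INTO tableB VALUES (1, '2020-01-01');
""")

# Condition in WHERE: for customer 2, b.target_date is NULL,
# the comparison is not true, and the row is dropped.
in_where = conn.execute("""
SELECT a.customerID, b.target_date
FROM tableA a
LEFT JOIN tableB b ON b.customerID = a.customerID
WHERE a.sell_date > b.target_date
ORDER BY a.customerID
""").fetchall()

# Condition in ON: unmatched rows of tableA survive with NULLs.
in_on = conn.execute("""
SELECT a.customerID, b.target_date
FROM tableA a
LEFT JOIN tableB b ON b.customerID = a.customerID
                  AND b.target_date <= a.sell_date
ORDER BY a.customerID
""").fetchall()

print(in_where)  # [(1, '2020-01-01')]
print(in_on)     # [(1, '2020-01-01'), (2, None)]
```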

Using Min in Update statement to get oldest record

I want to update tableA from tableB, but using only the record from tableB with the oldest entry.
TableA:
name ID
nick 15
john 12
tableB:
ID sportsname createddate
12 tennis 15march2019
14 baseball 15march2019
15 basketball 16march2019
15 cricket 20march2020
15 football 17may2020
My query:
update a
set a.sportsname=b.sportsname
from tablea a join tableb b
on a.id=b.id where b.createddate=( select min(createddate) from tableb )
But this is not giving the correct result.
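You can use a subquery to attain this: find the oldest createddate per ID, then join back to get the matching sportsname.
update a
set a.name = b.sportsname
from #T a
join #t2 b on b.ID = a.ID
join (select ID, min(createddate) as min_createddate
      from #t2
      group by ID) m on m.ID = b.ID and m.min_createddate = b.createddate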
I suspect that the problem with your query is that you are using the minimum create date over the entire tableb rather than per id. Although you could fix that using a correlated subquery, I would recommend apply:
update a
set a.sportsname = b.sportsname
from tablea a cross apply
(select top (1) b.*
from tableb b
where a.id = b.id
order by b.createddate asc
) b;
For performance, you want an index on tableb(id, createddate, sportsname).
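The correlated-subquery alternative mentioned above can be sketched with Python's sqlite3, where ORDER BY ... LIMIT 1 plays the role of TOP (1). Assumptions: tableA is given a sportsname column (the question's UPDATE sets a.sportsname), and the dates are rewritten in ISO format so that string ordering matches date ordering.

```python
import sqlite3

# Question's data; sportsname column on tableA and ISO dates are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (name TEXT, ID INTEGER, sportsname TEXT);
CREATE TABLE tableB (ID INTEGER, sportsname TEXT, createddate TEXT);
INSERT INTO tableA (name, ID) VALUES ('nick', 15), ('john', 12);
INSERT INTO tableB VALUES
  (12, 'tennis',     '2019-03-15'),
  (14, 'baseball',   '2019-03-15'),
  (15, 'basketball', '2019-03-16'),
  (15, 'cricket',    '2020-03-20'),
  (15, 'football',   '2020-05-17');
""")

# For each row of tableA, take the sportsname of the oldest tableB row
# with the same ID (per ID, not the global minimum date).
conn.execute("""
UPDATE tableA
SET sportsname = (SELECT b.sportsname
                  FROM tableB b
                  WHERE b.ID = tableA.ID
                  ORDER BY b.createddate
                  LIMIT 1)
WHERE EXISTS (SELECT 1 FROM tableB b WHERE b.ID = tableA.ID)
""")

result = conn.execute("SELECT name, ID, sportsname FROM tableA ORDER BY ID").fetchall()
print(result)  # [('john', 12, 'tennis'), ('nick', 15, 'basketball')]
```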
You can use FIRST_VALUE() window function:
UPDATE a
SET a.sportsname=b.sportsname
FROM TableA a INNER JOIN (
SELECT DISTINCT ID,
FIRST_VALUE(sportsname) OVER (PARTITION BY ID ORDER BY createddate) sportsname
FROM TableB
) b ON b.ID = a.ID

Using SUM with LEFT JOIN gives me the wrong result

So I have:
CREATE TABLE A (id INT,type int,amount int);
INSERT INTO A (id,type,amount) VALUES (1,0,25);
INSERT INTO A (id,type,amount) VALUES (2,0,25);
INSERT INTO A (id,type,amount) VALUES (3,1,10);
CREATE TABLE B (id INT,A_ID int,txt text);
INSERT INTO B (id,A_id,txt) VALUES (1,1,'abc');
INSERT INTO B (id,A_id,txt) VALUES (2,1,'def');
INSERT INTO B (id,A_id,txt) VALUES (3,2,'xxx');
I run this query:
SELECT min(A.id), SUM(A.amount), COUNT(B.id) FROM A
LEFT JOIN B ON A.id = B.A_id
GROUP BY A.type
I get:
min(A.id) SUM(A.amount) COUNT(B.id)
1 75 3
3 10 0
But I'm expecting to get:
min(A.id) SUM(A.amount) COUNT(B.id)
1 50 3
3 10 0
Can someone help? What is the best way to achieve this exact result?
I want to group by type, get the SUM of A.amount for each group, and get the count of all B rows linked to those A rows through the foreign key.
Here is the repro (please run the SQL code): https://www.db-fiddle.com/f/esu13uGLcgFDpX7aEQRMJR/0
EDIT to add more detail: I know the result is correct if I remove the GROUP BY on type; then we can see:
1, 50, 2
2, 25, 1
But I expect the result above. What is the best way to achieve it? I want to SUM the amounts of each type, then count all B rows related to the grouped A rows.
Here is just a shorter version of the solution. It counts the B ids per A_id first in the inner query, so the outer query needs to SUM those counts. COALESCE turns the NULL sum for a type with no B rows into 0:
SELECT min(A.id), SUM(A.amount), COALESCE(Sum(Bcount.Bid), 0) FROM A
LEFT JOIN (select count(id) as Bid, A_id from B group by A_id) as Bcount
ON A.id = Bcount.A_id
GROUP BY A.type
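A quick check of the pre-aggregation idea with the question's own tables, run through Python's sqlite3 (SQLite stands in for MySQL here). It prints both the naive join and the version that counts B rows per A_id before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INT, type INT, amount INT);
INSERT INTO A VALUES (1,0,25), (2,0,25), (3,1,10);
CREATE TABLE B (id INT, A_id INT, txt TEXT);
INSERT INTO B VALUES (1,1,'abc'), (2,1,'def'), (3,2,'xxx');
""")

# Naive join: A row 1 matches two B rows, so its amount is summed twice.
naive = conn.execute("""
SELECT MIN(A.id), SUM(A.amount), COUNT(B.id)
FROM A LEFT JOIN B ON A.id = B.A_id
GROUP BY A.type ORDER BY A.type
""").fetchall()

# Pre-aggregated: one count row per A_id, so each A row joins at most once.
fixed = conn.execute("""
SELECT MIN(A.id), SUM(A.amount), COALESCE(SUM(Bcount.Bid), 0)
FROM A
LEFT JOIN (SELECT A_id, COUNT(id) AS Bid FROM B GROUP BY A_id) AS Bcount
  ON A.id = Bcount.A_id
GROUP BY A.type ORDER BY A.type
""").fetchall()

print(naive)  # [(1, 75, 3), (3, 10, 0)]
print(fixed)  # [(1, 50, 3), (3, 10, 0)]
```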
This happens when you SUM over a 1-to-N relation: the matching rows multiply the result.
For example, when 1 record in A is joined with 2 records in B, the join returns that A row twice before the GROUP BY, so the SUM counts its A.amount twice.
One way around that is to use subqueries that each join one-to-one.
A COUNT(DISTINCT ...) can then count the unique B ids.
So this is just one way to get the SUM of A correct:
SELECT
q1.type,
q1.min_id,
q2.amount,
COALESCE(q1.totalB, 0) as totalB
FROM
(
SELECT
A.type,
MIN(A.id) AS min_id,
COUNT(DISTINCT B.id) AS totalB
FROM A
LEFT JOIN B ON B.A_id = A.id
GROUP BY A.type
) AS q1
JOIN
(
SELECT
type,
SUM(amount) AS amount
FROM A
GROUP BY type
) AS q2 ON q2.type = q1.type
The SQL was tested on MySQL, but it is standard ANSI SQL that should run on almost any RDBMS, including MS SQL Server.
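The two-subquery query above can be checked the same way with Python's sqlite3 on the question's tables; SQLite is used here only as a convenient stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INT, type INT, amount INT);
INSERT INTO A VALUES (1,0,25), (2,0,25), (3,1,10);
CREATE TABLE B (id INT, A_id INT, txt TEXT);
INSERT INTO B VALUES (1,1,'abc'), (2,1,'def'), (3,2,'xxx');
""")

# q1 joins A to B but only counts DISTINCT B ids (safe under fan-out);
# q2 sums amounts from A alone, so no join can duplicate them.
rows = conn.execute("""
SELECT q1.type, q1.min_id, q2.amount, COALESCE(q1.totalB, 0) AS totalB
FROM (SELECT A.type, MIN(A.id) AS min_id, COUNT(DISTINCT B.id) AS totalB
      FROM A LEFT JOIN B ON B.A_id = A.id
      GROUP BY A.type) AS q1
JOIN (SELECT type, SUM(amount) AS amount FROM A GROUP BY type) AS q2
  ON q2.type = q1.type
ORDER BY q1.type
""").fetchall()
print(rows)  # [(0, 1, 50, 3), (1, 3, 10, 0)]
```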
One way of doing this would be to use ROW_NUMBER():
WITH CTE AS (SELECT A.id AS Aid,
A.[type],
A.amount,
B.id AS bid,
txt,
ROW_NUMBER() OVER (PARTITION BY A.id ORDER BY B.id) AS RN
FROM A
LEFT JOIN B ON A.id = B.A_ID)
SELECT MIN(Aid) AS Min_A_ID,
SUM(CASE RN WHEN 1 THEN amount END) AS Amount,
COUNT(bid) AS BCount
FROM CTE
GROUP BY [type];
I also recommend getting rid of that text datatype and using varchar(MAX).

LEFT JOIN: how to join tables and include an extra row even when there is a match on the right

I have two tables
Table A
-------
ID
ProductName
Table B
-------
ID
ProductID
Size
I want to join these two tables
SELECT * FROM
(SELECT * FROM A) AS A
LEFT JOIN
(SELECT * FROM B) AS B
ON A.ID = B.ProductID
This is easy: I get every row from A repeated once for each matching row in B, and NULL fields where there is no match.
But here comes the tricky question: how can I get every row from A with NULL fields for table B even when there is a match, so that I get one extra line with NULL values plus all the matches?
SELECT A.*
, B3.ID
, B3.ProductID
, B3.Size
FROM A
LEFT JOIN
(
SELECT ProductID as MatchID
, ID
, ProductID
, Size
FROM B
UNION ALL
SELECT ID
, null
, null
, null
FROM A A2
) B3
ON A.ID = B3.MatchID
Instead of using UNION ALL in a subquery as suggested by others, you could also (and I would) use UNION ALL at the outer level, which keeps the query simpler:
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B
ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
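A minimal sketch of the outer-level UNION ALL, run with Python's sqlite3. The product names, B ids, and sizes are hypothetical sample data; product 2 has no sizes, so it only gets the extra NULL row.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER, ProductName TEXT);
CREATE TABLE B (ID INTEGER, ProductID INTEGER, Size TEXT);
INSERT INTO A VALUES (1, 'Shirt'), (2, 'Hat');
INSERT INTO B VALUES (10, 1, 'S'), (11, 1, 'M');
""")

# First branch: the real matches. Second branch: one all-NULL B row per A row.
rows = conn.execute("""
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A INNER JOIN B ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
ORDER BY 1, 3
""").fetchall()
print(rows)
```

SQLite sorts NULLs first in ascending order, so each product's extra NULL row comes before its matches.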
Since every row of A is now guaranteed a match, we can switch to an inner join:
SELECT
*
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size FROM B
UNION ALL
SELECT NULL,ID,NULL FROM A) B
ON
A.ID = B.ProductID
Now would be a very good time to switch to naming columns explicitly, rather than using SELECT *.
Or, if, as per @Andomar's comment, you need all of the B columns to be NULL:
SELECT
A.ID,A.ProductName,
B.ID,B.ProductID,B.Size
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size,ProductID as MatchID FROM B
UNION ALL
SELECT NULL,NULL,NULL,ID FROM A) B
ON
A.ID = B.MatchID

SQL Select all from table A with counting table B

I need a query that grabs all columns from table A and also counts how many B rows each row in table A has.
Table A: id | username | email | address
Table B: user_id
SELECT *, total
FROM table_a
WHERE total = (SELECT * FROM table_b WHERE table_a.id==table_b.user_id)
Any ideas?
Edit: For more clarification here is the desired output
1 | steve | steve@steve.steve | 123 Steve | 5 // letters
2 | chris | chris@chris.chris | 123 chris | 2 // letters
SELECT
table_a.id,
table_a.username,
table_a.email,
table_a.address,
count(table_b.user_id) as total
FROM table_a
LEFT OUTER JOIN table_b
ON table_a.id = table_b.user_id
GROUP BY
table_a.id,
table_a.username,
table_a.email,
table_a.address
This is a good example of needing an outer join. If we used an inner join, the query would exclude the entries in table_a that have zero table_b entries.
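A small sketch of that difference using Python's sqlite3. The first two users mirror the desired output; the third user ('dana') is hypothetical and has no rows in table_b.

```python
import sqlite3

# Third user is hypothetical and has no letters in table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, username TEXT, email TEXT, address TEXT);
CREATE TABLE table_b (user_id INTEGER);
INSERT INTO table_a VALUES
  (1, 'steve', 'steve@steve.steve', '123 Steve'),
  (2, 'chris', 'chris@chris.chris', '123 chris'),
  (3, 'dana',  'dana@dana.dana',    '123 dana');
INSERT INTO table_b VALUES (1), (1), (1), (1), (1), (2), (2);
""")

# Same query with two join types; COUNT(table_b.user_id) ignores NULLs,
# so an unmatched user counts as 0 under the outer join.
query = """
SELECT table_a.id, table_a.username, COUNT(table_b.user_id) AS total
FROM table_a
{join} table_b ON table_a.id = table_b.user_id
GROUP BY table_a.id, table_a.username
ORDER BY table_a.id
"""
outer = conn.execute(query.format(join="LEFT OUTER JOIN")).fetchall()
inner = conn.execute(query.format(join="INNER JOIN")).fetchall()
print(outer)  # [(1, 'steve', 5), (2, 'chris', 2), (3, 'dana', 0)]
print(inner)  # [(1, 'steve', 5), (2, 'chris', 2)]
```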
This could be further refined to meet two challenges:
Include all of the columns of table_a without explicitly asking for them.
Handle the zero-entry scenario without using non-standard SQL (e.g. ISNULL, WHERE).
The code below should do it.
SELECT
table_a.*,
tempTable.total
FROM (
SELECT
table_a.Id,
COUNT(table_b.user_id) as total
FROM table_a
LEFT OUTER JOIN table_b
ON table_a.id = table_b.user_id
GROUP BY table_a.id
) AS tempTable
INNER JOIN table_a
ON tempTable.Id = table_a.Id;
Comparing this with Cybernate's solution, non-standard SQL looks very attractive :-)
Try this:
SELECT a.*, ISNULL(bcnt, 0) bcnt
FROM TableA a LEFT JOIN
(
SELECT user_id, COUNT(1) AS BCNT
FROM TableB
GROUP BY user_id
) b
ON a.id = b.user_id
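The same pre-aggregated join can be written portably with COALESCE, the standard-SQL counterpart of SQL Server's ISNULL in this role. A minimal sketch with Python's sqlite3; the table contents are hypothetical.

```python
import sqlite3

# Hypothetical data: user 3 has no rows in TableB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER, username TEXT);
CREATE TABLE TableB (user_id INTEGER);
INSERT INTO TableA VALUES (1, 'steve'), (2, 'chris'), (3, 'dana');
INSERT INTO TableB VALUES (1), (1), (2);
""")

# Count per user_id first, then LEFT JOIN; unmatched users get NULL,
# which COALESCE replaces with 0.
rows = conn.execute("""
SELECT a.*, COALESCE(b.bcnt, 0) AS bcnt
FROM TableA a
LEFT JOIN (SELECT user_id, COUNT(1) AS bcnt FROM TableB GROUP BY user_id) b
  ON a.id = b.user_id
ORDER BY a.id
""").fetchall()
print(rows)  # [(1, 'steve', 2), (2, 'chris', 1), (3, 'dana', 0)]
```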
You can just use a LEFT JOIN and aggregate functions. Starting from table_a keeps the users that have no rows in table_b:
SELECT a.id,
min(a.username) UserName,
min(a.email) Email,
min(a.address) Address,
COUNT(b.user_id) Quantity
FROM table_a a left join
table_b b on a.id=b.user_id
group by a.id
select
*,
(select count(*)
from #TableB as B
where A.id = B.user_id) as total
from #TableA as A
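A sketch of the correlated-subquery version with Python's sqlite3. SQLite has no #temp tables, so plain TableA/TableB (with hypothetical rows) stand in for #TableA/#TableB.

```python
import sqlite3

# Hypothetical data: user 3 has no rows in TableB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER, username TEXT);
CREATE TABLE TableB (user_id INTEGER);
INSERT INTO TableA VALUES (1, 'steve'), (2, 'chris'), (3, 'dana');
INSERT INTO TableB VALUES (1), (1), (2);
""")

# The scalar subquery is evaluated for each row of TableA; COUNT(*) over
# zero matching rows is 0, so no COALESCE/ISNULL is needed.
rows = conn.execute("""
SELECT A.*,
       (SELECT COUNT(*) FROM TableB AS B WHERE A.id = B.user_id) AS total
FROM TableA AS A
ORDER BY A.id
""").fetchall()
print(rows)  # [(1, 'steve', 2), (2, 'chris', 1), (3, 'dana', 0)]
```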