SQL: Select every row in table1 that has a row in table2

I have two tables, one for users and one for renewals. I want to select all users who have a row in the renewals table for a specific year, and that part works. However, if a user has more than one row in the renewals table for that year, I get duplicates, which I don't want.
I assume it's because I still don't quite understand JOINs, so here is my query:
SELECT * FROM `users` AS US
RIGHT JOIN `usermeta` UM1
ON UM1.`user_id` = US.`ID`
RIGHT JOIN `membership_renewals` MR
ON MR.`user` = US.`ID` AND MR.year = '2011'
WHERE
UM1.meta_key = 'member'
AND UM1.meta_value = 1
AND US.`user_pass` NOT LIKE '-%'

You can do it with JOINs, but I like to do this kind of thing with EXISTS and subqueries, because it reads more like the rule I am trying to enforce.
SELECT * FROM `users` AS US
RIGHT JOIN `usermeta` UM1
ON UM1.`user_id` = US.`ID`
WHERE
Exists (Select 1 FROM `membership_renewals` MR
WHERE MR.`user` = US.`ID`
AND MR.year = '2011')
AND UM1.meta_key = 'member'
AND UM1.meta_value = 1
AND US.`user_pass` NOT LIKE '-%'
P.S. I really don't like using RIGHT JOIN unless I have to. If you can, just use INNER JOIN. If not, rearrange the FROM clause so you can use LEFT JOIN. Again, it is just for readability, but I don't know that I have ever used RIGHT JOIN.

SELECT *
FROM `users` AS US
RIGHT JOIN `usermeta` UM1
ON UM1.`user_id` = US.`ID`
RIGHT JOIN (SELECT DISTINCT MR.`user` FROM `membership_renewals` MR WHERE MR.year = '2011') vt ON vt.`user` = US.`ID`
WHERE UM1.meta_key = 'member'
AND UM1.meta_value = 1
AND US.`user_pass` NOT LIKE '-%'
This will do it.
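To see why the plain join duplicates users while both answers do not, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The data is hypothetical and the usermeta filter is left out to keep the focus on the renewals table:

```python
import sqlite3

# In-memory stand-in for the MySQL tables (hypothetical data):
# user 1 has two 2011 renewals, user 2 has one, user 3 only renewed in 2010.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE membership_renewals (user INTEGER, year TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO membership_renewals VALUES
  (1, '2011'), (1, '2011'), (2, '2011'), (3, '2010');
""")

# Plain join: one output row per matching renewal, so user 1 appears twice.
joined = conn.execute("""
SELECT US.ID FROM users AS US
INNER JOIN membership_renewals MR ON MR.user = US.ID AND MR.year = '2011'
ORDER BY US.ID
""").fetchall()

# EXISTS acts as a semi-join: each user is returned at most once.
with_exists = conn.execute("""
SELECT US.ID FROM users AS US
WHERE EXISTS (SELECT 1 FROM membership_renewals MR
              WHERE MR.user = US.ID AND MR.year = '2011')
ORDER BY US.ID
""").fetchall()

# Derived table: de-duplicate the renewals first, then join.
with_distinct = conn.execute("""
SELECT US.ID FROM users AS US
INNER JOIN (SELECT DISTINCT MR.user FROM membership_renewals MR
            WHERE MR.year = '2011') vt ON vt.user = US.ID
ORDER BY US.ID
""").fetchall()

print(joined)         # [(1,), (1,), (2,)]
print(with_exists)    # [(1,), (2,)]
print(with_distinct)  # [(1,), (2,)]
```

The EXISTS form only asks "is there at least one matching row?", so the number of renewals never changes the number of output rows.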

Related

How to Display Two Query Side by Side using SQL

I have the following two queries that are almost identical, except that the second query adds a table join and a WHERE condition and returns fewer FAXDEPTs. (The commented-out lines are what Query 2 adds; I only shared one query to make it easier to read.)
I want to join the two queries on FAXDEPT and have the results look something like this:
FAXDEPT | TOTAL DOCUMENTS (Query1) | TOTAL PAGES (Query1) | TOTAL DOCUMENTS (Query2) | TOTAL PAGES (Query2)
Query 1 contains more FAXDEPTs than Query 2. Is there a way to display 0s for Query 2's TOTAL DOCUMENTS and TOTAL PAGES when a FAXDEPT has no Query 2 rows?
I assumed I would need some sort of FULL OUTER JOIN but can't get it to work. I'm not sure whether my use of aliases is part of the issue. I appreciate all the help in advance!
select sq.faxdept 'Fax Department', count(sq.docid)'Total Documents', sum(sq.pages)'Total Pages'
from
(
select idp.id as docid, (MAX(idp.pagenum)+1) as pages, ki178.keyvaluechar as faxdept
from DOCDATA dd
left join FAXDEPT ki178 on id.id = ki178.id
left join IMPORTSOURCE ki228 on id.id = ki228.id
left join BATCHINFO kgd105 on dd.id = kgd105.id
left join PAGEDATA idp on id.id = idp.id
--left join QUEUE ilc on id.id = ilc.id
where
id.datestored > '10/07/2021' and id.status = 0
--and ilc.num = 252
and (kgd105.kg128 like 'DISC DIP%SJIN%' or ki228.keyvaluechar like 'SJIN' )
group by idp.id, ki178.keyvaluechar
)
as sq
group by sq.faxdept
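This question has no answer in the thread. One common approach, sketched here as an assumption rather than the asker's or anyone's confirmed solution, is to aggregate each query separately and LEFT JOIN the smaller result onto the larger one, using COALESCE to turn the missing side into 0. Since Query 1 is said to contain every FAXDEPT, a LEFT JOIN from Query 1 should be enough; a FULL OUTER JOIN only matters if Query 2 can have departments that Query 1 lacks. The sketch uses Python's sqlite3 with one hypothetical flattened table (`docs`, with a `queue_num` column standing in for the commented-out QUEUE filter):

```python
import sqlite3

# Hypothetical, flattened stand-in for the question's subquery "sq":
# one row per document with its department, page count and queue number.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (docid INTEGER, faxdept TEXT, pages INTEGER, queue_num INTEGER);
INSERT INTO docs VALUES
  (1, 'Billing', 3, 252), (2, 'Billing', 2, 100),
  (3, 'Intake',  5, 100), (4, 'Records', 1, 252);
""")

rows = conn.execute("""
WITH q1 AS (            -- Query 1: every department
  SELECT faxdept, COUNT(docid) AS docs, SUM(pages) AS pages
  FROM docs GROUP BY faxdept
),
q2 AS (                 -- Query 2: same aggregate with an extra filter
  SELECT faxdept, COUNT(docid) AS docs, SUM(pages) AS pages
  FROM docs WHERE queue_num = 252 GROUP BY faxdept
)
SELECT q1.faxdept, q1.docs, q1.pages,
       COALESCE(q2.docs, 0)  AS docs_q2,   -- 0 when Query 2 has no row
       COALESCE(q2.pages, 0) AS pages_q2
FROM q1
LEFT JOIN q2 ON q2.faxdept = q1.faxdept
ORDER BY q1.faxdept
""").fetchall()

for row in rows:
    print(row)
# ('Billing', 2, 5, 1, 3)
# ('Intake', 1, 5, 0, 0)
# ('Records', 1, 1, 1, 1)
```

COALESCE is also available in SQL Server, so the final SELECT should carry over, with the two existing queries dropped in as the CTE bodies.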

sql counting the number is not working correctly

I'm querying related tables and the count does not work correctly. When I add a 4th JOIN and a condition on it, the count is wrong, but without the 4th JOIN and the condition it works correctly. The first version gives result = 2:
SELECT
pxixolog_details.*,
directions.direction,
COUNT(directions.direction) procent
FROM
pxixolog_details
LEFT JOIN psixologs_direction ON pxixolog_details.id = psixologs_direction.psixolog_id
LEFT JOIN directions ON directions.id = psixologs_direction.direction_id
LEFT JOIN psixologs_weeks ON pxixolog_details.id = psixologs_weeks.psixolog_id
WHERE
directions.direction IN(
'Трудности в отношениях', -- "relationship difficulties"
'Проблемы со сном', -- "sleep problems"
'Нежелательная агрессия' -- "unwanted aggression"
)
AND birthday BETWEEN '1956-04-29' AND '2021-04-29' AND psixologs_weeks.week = '4'
GROUP BY
pxixolog_details.id
The second one doesn't work correctly; it gives result = 4:
SELECT
pxixolog_details.*,
directions.direction,
COUNT(directions.direction) procent
FROM
pxixolog_details
LEFT JOIN psixologs_direction ON pxixolog_details.id = psixologs_direction.psixolog_id
LEFT JOIN directions ON directions.id = psixologs_direction.direction_id
LEFT JOIN psixologs_weeks ON pxixolog_details.id = psixologs_weeks.psixolog_id
LEFT JOIN psixologs_times ON pxixolog_details.id = psixologs_times.psixolog_id
WHERE
directions.direction IN(
'Трудности в отношениях', -- "relationship difficulties"
'Проблемы со сном', -- "sleep problems"
'Нежелательная агрессия' -- "unwanted aggression"
)
AND birthday BETWEEN '1956-04-29' AND '2021-04-29' AND psixologs_weeks.week = '4'
AND (psixologs_times.time = '09:00' OR psixologs_times.time = '10:00')
GROUP BY
pxixolog_details.id
What am I doing wrong?
You get double the number of results with four JOINs because the new (4th) JOIN matches two rows (09:00 and 10:00) for each row produced by the first three JOINs, so every row is duplicated before COUNT runs. That can lead to the observed result.
Check your data and make sure that your 4th JOIN condition yields a 1:1 match with the other data.
The last table, psixologs_times, matches multiple rows for each psixolog_id.
You can easily see this using a query:
select psixolog_id, count(*)
from psixologs_times
group by psixolog_id
having count(*) > 1;
How you fix this problem depends on what you want to do. The simplest solution is to use count(distinct):
COUNT(DISTINCT directions.direction) as procent
However, this might just be hiding the problem. You might want to choose one row from the psixologs_times table. Or pre-aggregate it. Or do something else.
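The fan-out can be reproduced on a toy dataset. This sketch uses Python's sqlite3 in place of MySQL, with hypothetical rows and English direction names: one psychologist with two matching directions and two time slots. It compares the plain count, COUNT(DISTINCT), and, as one example of "something else", replacing the fourth join with an EXISTS check so the time slots can no longer multiply rows:

```python
import sqlite3

# Hypothetical toy data: psychologist 1 has two matching directions,
# one week-4 row and two time slots (09:00 and 10:00).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pxixolog_details (id INTEGER PRIMARY KEY);
CREATE TABLE directions (id INTEGER PRIMARY KEY, direction TEXT);
CREATE TABLE psixologs_direction (psixolog_id INTEGER, direction_id INTEGER);
CREATE TABLE psixologs_weeks (psixolog_id INTEGER, week TEXT);
CREATE TABLE psixologs_times (psixolog_id INTEGER, time TEXT);
INSERT INTO pxixolog_details VALUES (1);
INSERT INTO directions VALUES (1, 'sleep'), (2, 'aggression');
INSERT INTO psixologs_direction VALUES (1, 1), (1, 2);
INSERT INTO psixologs_weeks VALUES (1, '4');
INSERT INTO psixologs_times VALUES (1, '09:00'), (1, '10:00');
""")

def run(count_expr, extra_join="", extra_where=""):
    """Run the question's query shape with optional extra JOIN/WHERE text."""
    sql = f"""
    SELECT d.id, {count_expr}
    FROM pxixolog_details d
    LEFT JOIN psixologs_direction pd ON d.id = pd.psixolog_id
    LEFT JOIN directions dr ON dr.id = pd.direction_id
    LEFT JOIN psixologs_weeks w ON d.id = w.psixolog_id
    {extra_join}
    WHERE dr.direction IN ('sleep', 'aggression') AND w.week = '4'
    {extra_where}
    GROUP BY d.id"""
    return conn.execute(sql).fetchall()

times_join = "LEFT JOIN psixologs_times t ON d.id = t.psixolog_id"
times_where = "AND t.time IN ('09:00', '10:00')"
times_exists = ("AND EXISTS (SELECT 1 FROM psixologs_times t "
                "WHERE t.psixolog_id = d.id AND t.time IN ('09:00', '10:00'))")

three_joins = run("COUNT(dr.direction)")
four_joins = run("COUNT(dr.direction)", times_join, times_where)
four_distinct = run("COUNT(DISTINCT dr.direction)", times_join, times_where)
with_exists = run("COUNT(dr.direction)", extra_where=times_exists)

print(three_joins)    # [(1, 2)]
print(four_joins)     # [(1, 4)]  each direction row repeated per time slot
print(four_distinct)  # [(1, 2)]
print(with_exists)    # [(1, 2)]
```

The EXISTS variant keeps the "has a 09:00 or 10:00 slot" condition without adding rows, which avoids relying on DISTINCT to undo the duplication.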

SQL Coding, Unable to Limit based on where in subquery

I have a query I'm writing, and it's nearly perfect, except for one error I can't seem to control for.
SELECT Clients.client_id,
Clients.last_name + ', ' + Clients.first_name AS Client_Name,
Query3.team_id,
Query3.team_name,
Query2.CSW_Name,
Query1.Date_of_Service AS Last_Service_by_CSW,
Query.Tx_Start_Date AS Tx_Start_Date,
Query.Tx_End_Date AS Tx_End_Date
FROM Clients
LEFT OUTER JOIN
(SELECT TxPlus.client_id,
Max(TxPlus.start_date) AS Tx_Start_Date,
Max(TxPlus.end_date) AS Tx_End_Date
FROM TxPlus
GROUP BY TxPlus.client_id) Query ON Query.client_id = Clients.client_id
LEFT OUTER JOIN
(SELECT EmployeeClients.client_id,
Employees.last_name + ', ' + Employees.first_name AS CSW_Name
FROM EmployeeClients
INNER JOIN Employees ON EmployeeClients.emp_id = Employees.emp_id
WHERE EmployeeClients.case_manager = 1) Query2 ON Clients.client_id = Query2.client_id
LEFT OUTER JOIN
(SELECT ClientVisit.client_id,
Max(ClientVisit.rev_timeout) AS Date_of_Service
FROM ClientVisit
INNER JOIN EmployeeClients ON ClientVisit.emp_id = EmployeeClients.emp_id
GROUP BY ClientVisit.client_id,
EmployeeClients.case_manager,
ClientVisit.visittype_id
HAVING EmployeeClients.case_manager = 1
AND ClientVisit.visittype_id = 9) Query1 ON Clients.client_id = Query1.client_id
LEFT OUTER JOIN
(SELECT TeamClient.client_id,
Team.team_id,
Team.team_name
FROM TeamClient
INNER JOIN Team ON TeamClient.team_id = Team.team_id
WHERE TeamClient.primary_flag = 1) Query3 ON Clients.client_id = Query3.client_id
WHERE Clients.client_status LIKE 'Active'
The objective is basically to pull a client record and show the dates of their Treatment Plan, their Primary Team assignment, their case manager, and the most recent Community Support service their case manager provided for them.
Right now, with the code as it's written, I get all the correct information, EXCEPT the most recent Community Support date. This returns the most recent service done by ANYONE, not the Case Manager.
I'm certain it's a very simple problem, but it's driving me up the wall.
You all are an invaluable resource, and I thank you in advance.
Well, without knowing what the data looks like, I can't be 100% certain that this is the answer, but I have a pretty good feeling about it. I believe this is the subquery that returns your "Community Support date". (It wasn't identified as clearly as I would have liked, but it looks to me like this is it.)
(SELECT ClientVisit.client_id,
Max(ClientVisit.rev_timeout) AS Date_of_Service
FROM ClientVisit
INNER JOIN EmployeeClients ON ClientVisit.emp_id = EmployeeClients.emp_id
GROUP BY ClientVisit.client_id,
EmployeeClients.case_manager,
ClientVisit.visittype_id
HAVING EmployeeClients.case_manager = 1
AND ClientVisit.visittype_id = 9) Query1 ON Clients.client_id = Query1.client_id
So what you're doing is creating a subset of the data in these two tables, joined by emp_id and grouped by client, case manager and visit type. Then you remove any groups that are not the case manager or visit type you want. But by this point, you've already selected your Max(ClientVisit.rev_timeout) AS Date_of_Service. I believe that if you change your HAVING clause into a WHERE clause (and thus filter before you group) you will get the results you're looking for.
EDIT:
(SELECT ClientVisit.client_id,
Max(ClientVisit.rev_timeout) AS Date_of_Service
FROM ClientVisit
INNER JOIN EmployeeClients ON ClientVisit.emp_id = EmployeeClients.emp_id
WHERE EmployeeClients.case_manager = 1
AND ClientVisit.visittype_id = 9
GROUP BY ClientVisit.client_id) Query1 ON Clients.client_id = Query1.client_id
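Filtering in WHERE is the usual way to restrict rows before MAX is computed. A toy sketch (Python's sqlite3 standing in for SQL Server, hypothetical data) also shows one more thing worth checking: ClientVisit is joined to EmployeeClients on emp_id alone, so a visit qualifies whenever its employee is a case manager for *any* client. Assuming EmployeeClients has one row per employee/client assignment (it has both columns in the question's Query2), adding client_id to the join ties the visit to that client's own case manager:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeClients (emp_id INTEGER, client_id INTEGER, case_manager INTEGER);
CREATE TABLE ClientVisit (client_id INTEGER, emp_id INTEGER, visittype_id INTEGER, rev_timeout TEXT);
-- Employee 10 is client 1's case manager; employee 20 manages client 2
-- but is only a non-case-manager team member for client 1.
INSERT INTO EmployeeClients VALUES (10, 1, 1), (20, 2, 1), (20, 1, 0);
INSERT INTO ClientVisit VALUES
  (1, 10, 9, '2021-01-05'),   -- client 1 seen by their own case manager
  (1, 20, 9, '2021-02-10');   -- client 1 seen by another client's case manager
""")

# Join on emp_id only: employee 20 counts because of the client-2 assignment.
emp_only = conn.execute("""
SELECT cv.client_id, MAX(cv.rev_timeout)
FROM ClientVisit cv
INNER JOIN EmployeeClients ec ON cv.emp_id = ec.emp_id
WHERE ec.case_manager = 1 AND cv.visittype_id = 9
GROUP BY cv.client_id
""").fetchall()

# Join on emp_id and client_id: only the client's own case manager counts.
emp_and_client = conn.execute("""
SELECT cv.client_id, MAX(cv.rev_timeout)
FROM ClientVisit cv
INNER JOIN EmployeeClients ec
  ON cv.emp_id = ec.emp_id AND cv.client_id = ec.client_id
WHERE ec.case_manager = 1 AND cv.visittype_id = 9
GROUP BY cv.client_id
""").fetchall()

print(emp_only)        # [(1, '2021-02-10')]
print(emp_and_client)  # [(1, '2021-01-05')]
```

If the real query still reports visits by other staff after the WHERE change, the join condition is a plausible place to look.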

NOT IN query not working, SQL Server 2008

The first part of the query, before NOT IN, runs and gives me a list of 100 records. The second query runs and gives me a list of 75 records. The query I'm trying to write with NOT IN should return the records that are in the first result set but not in the second. The error I get is incorrect syntax near the word "not".
SELECT distinct Patient.patientid
FROM Patient INNER JOIN
patientICD ON Patient.patientid = patientICD.patientid AND Patient.admissiondate = patientICD.admissiondate AND
Patient.dischargedate = patientICD.dischargedate INNER JOIN
tblICD ON patientICD.primarycode = tblICD.ICD_ID
WHERE (tblICD.descrip LIKE N'%diabetes%') and not in
(
SELECT distinct Patient.patientid
FROM Patient INNER JOIN
patientICD ON Patient.patientid = patientICD.patientid AND Patient.admissiondate = patientICD.admissiondate AND
Patient.dischargedate = patientICD.dischargedate INNER JOIN
tblICD ON patientICD.primarycode = tblICD.ICD_ID
WHERE (tblICD.icd_id LIKE N'25000')
)
Is it ever allowed to write a query with an expression like AND NOT IN (SELECT ...)?
You need to specify which field should not be in the second query:
and Patient.patientid not in
Did you mean to write this?
WHERE (tblICD.descrip LIKE N'%diabetes%') and Patient.patientid not in
UPDATE
Would it be possible to rewrite the entire thing as this?
SELECT distinct Patient.patientid
FROM Patient INNER JOIN
patientICD ON Patient.patientid = patientICD.patientid AND Patient.admissiondate = patientICD.admissiondate AND
Patient.dischargedate = patientICD.dischargedate INNER JOIN
tblICD ON patientICD.primarycode = tblICD.ICD_ID
WHERE tblICD.descrip LIKE N'%diabetes%' AND tblICD.icd_id NOT LIKE N'25000'
You forgot a field before NOT.
I believe you need to specify the column that the NOT IN is checking. So, based on your script, I think you want: and Patient.patientid not in
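Here is the fixed syntax on a toy table, using Python's sqlite3 in place of SQL Server and a single hypothetical `diag` table instead of the three-table join. It also shows a well-known NOT IN behavior that NOT EXISTS avoids: if the subquery returns a NULL, `x NOT IN (...)` can never be true, so the query silently returns nothing:

```python
import sqlite3

# Hypothetical flattened diagnosis table standing in for
# Patient + patientICD + tblICD.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diag (patientid INTEGER, icd_id TEXT, descrip TEXT);
INSERT INTO diag VALUES
  (1, '25000', 'diabetes mellitus'),
  (2, '25001', 'type 1 diabetes'),
  (3, '4019',  'hypertension');
""")

# The column being tested goes before NOT IN.
NOT_IN = """
SELECT DISTINCT patientid FROM diag
WHERE descrip LIKE '%diabetes%'
  AND patientid NOT IN (SELECT patientid FROM diag WHERE icd_id = '25000')
"""
NOT_EXISTS = """
SELECT DISTINCT d.patientid FROM diag d
WHERE d.descrip LIKE '%diabetes%'
  AND NOT EXISTS (SELECT 1 FROM diag x
                  WHERE x.patientid = d.patientid AND x.icd_id = '25000')
"""

before_in = conn.execute(NOT_IN).fetchall()
before_exists = conn.execute(NOT_EXISTS).fetchall()

# Add a 25000 row whose patientid is NULL: the NOT IN list now contains NULL,
# so "2 NOT IN (1, NULL)" is unknown and the NOT IN query returns nothing.
conn.execute("INSERT INTO diag VALUES (NULL, '25000', 'unlinked code')")
after_in = conn.execute(NOT_IN).fetchall()
after_exists = conn.execute(NOT_EXISTS).fetchall()

print(before_in, before_exists)  # [(2,)] [(2,)]
print(after_in, after_exists)    # [] [(2,)]
```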
Purely stylistic: you can factor the patientICD/tblICD join out into a CTE and reference it twice, like this (untested):
WITH zzz AS (
SELECT pic.patientid , pic.admissiondate , pic.dischargedate
, tab.ICD_ID , tab.descrip
FROM patientICD pic
JOIN tblICD tab ON pic.primarycode = tab.ICD_ID
)
SELECT DISTINCT p.patientid
FROM Patient p
JOIN zzz one ON one.patientid = p.patientid
AND one.admissiondate = p.admissiondate
AND one.dischargedate = p.dischargedate
WHERE one.descrip LIKE N'%diabetes%'
AND p.patientid NOT IN (
SELECT two.patientid
FROM zzz two
WHERE two.admissiondate = p.admissiondate
AND two.dischargedate = p.dischargedate
AND two.icd_id LIKE N'25000'
);
NOTE: I don't like the LIKE N'25000'; an exact match would be fine. The icd_id field should probably be numeric. And the {admissiondate, dischargedate} pair should be modelled out too, possibly by using a diagnosis_id or incident_id.

Super Slow Query - sped up, but not perfect... Please help

I posted a query yesterday (see here) that was horrible (took over a minute to run, resulting in 18,215 records):
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
ORDER BY EMAIL
With a lot of coaching and research, I've tuned it up to the following:
SELECT contacts.ContactID,
contacts.InstitutionID,
contacts.First,
contacts.Last,
institutionswithzipcodesadditional.CountyID,
institutionswithzipcodesadditional.StateID,
institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN contacts_link_emails ON
contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN institutionswithzipcodesadditional ON
contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
WHERE
(contacts.ContactID IN
(SELECT contacts_2.ContactID
FROM contacts AS contacts_2
INNER JOIN contacts_link_emails AS contacts_link_emails_2 ON
contacts_2.ContactID = contacts_link_emails_2.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions ON
contacts_2.JobTitle = contacts_def_jobfunctions.JobID
RIGHT OUTER JOIN newsletterremovelist ON
contacts_link_emails_2.Email = newsletterremovelist.EmailAddress
WHERE (contacts_def_jobfunctions.ParentJobID <> 1841)
GROUP BY contacts_2.ContactID
UNION
SELECT contacts_1.ContactID
FROM contacts_link_jobfunctions
INNER JOIN contacts_def_jobfunctions AS contacts_def_jobfunctions_1 ON
contacts_link_jobfunctions.JobID = contacts_def_jobfunctions_1.JobID
AND contacts_def_jobfunctions_1.ParentJobID <> 1841
INNER JOIN contacts AS contacts_1 ON
contacts_link_jobfunctions.ContactID = contacts_1.ContactID
INNER JOIN contacts_link_emails AS contacts_link_emails_1 ON
contacts_link_emails_1.ContactID = contacts_1.ContactID
LEFT OUTER JOIN newsletterremovelist AS newsletterremovelist_1 ON
contacts_link_emails_1.Email = newsletterremovelist_1.EmailAddress
GROUP BY contacts_1.ContactID))
While this query is now super fast (about 3 seconds), I've broken part of the logic somewhere: it only returns 14,863 rows (instead of the 18,215 rows that I believe are accurate).
The results seem nearly correct. I'm working to discover what data might be missing from the result set.
Can you please coach me through whatever I've done wrong here?
Thanks,
Russell Schutte
The main problem with your original query was that you had two extra joins that only introduced duplicates, followed by a DISTINCT to get rid of them.
Use this:
SELECT cle.Email,
c.ContactID,
c.First AS ContactFirstName,
c.Last AS ContactLastName,
c.InstitutionID,
izip.CountyID,
izip.StateID,
izip.DistrictID
FROM dbo.contacts c
INNER JOIN
dbo.institutionswithzipcodesadditional izip
ON izip.InstitutionID = c.InstitutionID
INNER JOIN
dbo.contacts_link_emails cle
ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN
(
SELECT EmailAddress
FROM dbo.newsletterremovelist
)
AND EXISTS
(
SELECT NULL
FROM dbo.contacts_def_jobfunctions cdj
WHERE cdj.JobId = c.JobTitle
AND cdj.ParentJobId <> '1841'
UNION ALL
SELECT NULL
FROM dbo.contacts_link_jobfunctions clj
JOIN dbo.contacts_def_jobfunctions cdj
ON cdj.JobID = clj.JobID
WHERE clj.ContactID = c.ContactID
AND cdj.ParentJobId <> '1841'
)
ORDER BY
email
Create the following indexes:
newsletterremovelist (EmailAddress)
contacts_link_jobfunctions (ContactID, JobID)
contacts_def_jobfunctions (JobID)
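A toy comparison of the two shapes, using Python's sqlite3 in place of SQL Server with hypothetical data: a contact that qualifies through two linked job functions appears twice when the job-function tables are joined, which is why the original query needed DISTINCT, but only once with the answer's EXISTS over UNION ALL:

```python
import sqlite3

# Hypothetical toy data. Job 1 has ParentJobID 1841 (excluded);
# jobs 2 and 3 qualify.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, JobTitle INTEGER);
CREATE TABLE contacts_link_emails (ContactID INTEGER, Email TEXT);
CREATE TABLE contacts_def_jobfunctions (JobID INTEGER, ParentJobID INTEGER);
CREATE TABLE contacts_link_jobfunctions (ContactID INTEGER, JobID INTEGER);
CREATE TABLE newsletterremovelist (EmailAddress TEXT);
INSERT INTO contacts_def_jobfunctions VALUES (1, 1841), (2, 5), (3, 7);
INSERT INTO contacts VALUES (100, 2), (101, 1), (102, 1), (103, 3);
INSERT INTO contacts_link_emails VALUES
  (100, 'a@x'), (101, 'b@x'), (102, 'c@x'), (103, 'd@x');
-- Contact 101 qualifies only through two linked job functions.
INSERT INTO contacts_link_jobfunctions VALUES (101, 2), (101, 3);
INSERT INTO newsletterremovelist VALUES ('d@x');
""")

# Join-based shape: the job-function join repeats contact 101 once per link.
joined = conn.execute("""
SELECT cle.Email, c.ContactID
FROM contacts c
JOIN contacts_link_emails cle ON cle.ContactID = c.ContactID
LEFT JOIN contacts_link_jobfunctions clj ON clj.ContactID = c.ContactID
LEFT JOIN contacts_def_jobfunctions cdj ON cdj.JobID = clj.JobID
WHERE cle.Email NOT IN (SELECT EmailAddress FROM newsletterremovelist)
  AND (c.JobTitle IN (SELECT JobID FROM contacts_def_jobfunctions
                      WHERE ParentJobID <> 1841)
       OR cdj.ParentJobID <> 1841)
ORDER BY cle.Email
""").fetchall()

# The answer's shape: EXISTS over a UNION ALL only tests for a match,
# so it never multiplies rows and needs no DISTINCT.
with_exists = conn.execute("""
SELECT cle.Email, c.ContactID
FROM contacts c
JOIN contacts_link_emails cle ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN (SELECT EmailAddress FROM newsletterremovelist)
  AND EXISTS (
    SELECT NULL FROM contacts_def_jobfunctions cdj
    WHERE cdj.JobID = c.JobTitle AND cdj.ParentJobID <> 1841
    UNION ALL
    SELECT NULL FROM contacts_link_jobfunctions clj
    JOIN contacts_def_jobfunctions cdj ON cdj.JobID = clj.JobID
    WHERE clj.ContactID = c.ContactID AND cdj.ParentJobID <> 1841)
ORDER BY cle.Email
""").fetchall()

print(joined)       # [('a@x', 100), ('b@x', 101), ('b@x', 101)]
print(with_exists)  # [('a@x', 100), ('b@x', 101)]
```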
Do you get the same results when you do:
SELECT count(*)
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
SELECT COUNT(*)
FROM
contacts
INNER JOIN contacts_link_jobfunctions
ON contacts.ContactID = contacts_link_jobfunctions.ContactID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
If so, keep adding the join conditions one at a time until the counts differ, and you will see where your mistake is. If all the joins match, then look at the WHERE clauses. But I will be surprised if the problem isn't in the first join: the nested-join syntax you used originally (several JOINs followed by their ON clauses) is unusual and hard to read, so it may have been incorrect all along without anyone noticing.
Alternatively, pick a few of the records that are returned by the original query but not by the revised one. Track them through the tables one at a time to see why the second query filters them out.
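Both debugging steps can be scripted. This sketch (Python's sqlite3 standing in for SQL Server; the tables are reduced and the data is hypothetical) adds one join at a time and records the row count, then uses EXCEPT to list the rows a broader query returns that a narrower one drops, which gives concrete records to trace:

```python
import sqlite3

# Hypothetical toy data: contact 3 has no email, contact 2 has no job-function link.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, JobTitle INTEGER);
CREATE TABLE contacts_link_emails (ContactID INTEGER, Email TEXT);
CREATE TABLE contacts_link_jobfunctions (ContactID INTEGER, JobID INTEGER);
INSERT INTO contacts VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO contacts_link_emails VALUES (1, 'a@x'), (2, 'b@x');
INSERT INTO contacts_link_jobfunctions VALUES (1, 10);
""")

joins = [
    "INNER JOIN contacts_link_emails cle ON cle.ContactID = c.ContactID",
    "INNER JOIN contacts_link_jobfunctions clj ON clj.ContactID = c.ContactID",
]

# Add one join at a time and record the row count after each step;
# a step where the count changes unexpectedly is where rows are lost or duplicated.
counts = []
for n in range(len(joins) + 1):
    sql = "SELECT COUNT(*) FROM contacts c " + " ".join(joins[:n])
    counts.append(conn.execute(sql).fetchone()[0])
    print(f"{n} join(s): {counts[-1]} rows")

# Rows the broader query returns but the fully joined query drops.
narrow = "SELECT c.ContactID FROM contacts c " + " ".join(joins)
missing = conn.execute(
    "SELECT ContactID FROM contacts EXCEPT " + narrow + " ORDER BY 1"
).fetchall()
print(missing)  # [(2,), (3,)]
```

Here `counts` ends up as `[3, 2, 1]`: the email join drops contact 3 and the job-function join drops contact 2.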
I'm not sure exactly what is wrong, but when I run into this situation, the first thing I do is start removing variables.
So, comment out the WHERE clause. How many rows are returned?
If you get back the 11,604 rows, then you've isolated the problem to the joins. Work through the joins, commenting each one out (remove the associated columns too), and figure out how many rows are eliminated.
As you do this, aim to find what is causing the desired rows to be eliminated. Once isolated, consider the join differences between the first query and the second query.
Looking at the first query, you could probably just modify it to replace the INs with EXISTS.
Consider your indexes as well. Anything in the WHERE or JOIN clauses should probably be indexed.