Trying to write SQL Self Join? - sql

I have a user table for a website where customers who forget their password create a new account rather than bother retrieving the forgotten one.
I would like to see how many times a customer may have joined the website by joining the user table to itself on the customer email address. The ID is unique each time they join, so I added a condition checking that the Account IDs are different.
Here is my query:
Select Distinct
T1.Email as "eMail-01", T2.Email as "eMail-02",
T1.AccountID as "AccountID-01", T2.AccountID as "AccountID-02",
T1.UserID as "UserID-01", T2.UserID as "UserID-02"
From User T1
Left Join User T2 on T1.eMail = T2.eMail
Where ( T2.eMail is not null ) and ( T2.eMail <> '' )
and ( T1.AccountID <> T2.AccountID )
The table has about 60,000 records in it, and I seem to be getting a great number of records returned, based on the number of AccountID permutations.
For instance, one customer registered 5 times with the same email address, so I'm getting 25 records back (5 x 5). I'm not sure if I'm writing this query correctly.
The query is running very long.

If I understand correctly, what you most probably want is a count of AccountID per email address, so there's no need for a self join here. The query would be:
SELECT Email, count(AccountID)
FROM User
GROUP BY Email
and should run quite quickly even with 60,000 emails.
Anyway, you should think about putting a UNIQUE index on the email column after cleaning the table. You would then get faster lookups by email and prevent users from creating multiple accounts with the same e-mail address. This should push them to retrieve their password instead.
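To make the two steps above concrete, here is a minimal sketch run against an in-memory SQLite database from Python. The table is called Users here, the sample rows are made up, and the cleanup rule (keep the oldest row per email) is only an assumption for illustration; a real cleanup would more likely merge accounts than delete them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserID INTEGER PRIMARY KEY, AccountID INTEGER, Email TEXT)"
)
# Made-up sample data: one customer registered three times.
conn.executemany(
    "INSERT INTO Users (AccountID, Email) VALUES (?, ?)",
    [
        (101, "ann@example.com"),
        (102, "ann@example.com"),
        (103, "ann@example.com"),
        (201, "bob@example.com"),
    ],
)

# One row per email with the number of accounts, no self join needed.
counts = conn.execute(
    "SELECT Email, COUNT(AccountID) FROM Users GROUP BY Email ORDER BY Email"
).fetchall()
print(counts)

# Hypothetical cleanup: keep only the lowest UserID for each email.
conn.execute(
    "DELETE FROM Users WHERE UserID NOT IN "
    "(SELECT MIN(UserID) FROM Users GROUP BY Email)"
)
remaining = conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]

# With the duplicates gone, the unique index can be created.
conn.execute("CREATE UNIQUE INDEX ux_users_email ON Users (Email)")

# A second registration with the same email is now rejected.
try:
    conn.execute(
        "INSERT INTO Users (AccountID, Email) VALUES (?, ?)",
        (104, "ann@example.com"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Note that the index can only be created once the duplicates are gone, which is why the cleanup has to come first.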

Suggestions:
First, the left join is pointless here, because you then filter out the rows where the right side is null, which is exactly what a plain inner join does. Either use an inner join or remove the condition T2.eMail is not null.
Secondly, is your table properly indexed? If it is not, add the appropriate indexes.
Thirdly, you can use a very simple query to track down the emails that have more than one accountId:
select email, count(accountId) as accounts
from user
group by email
having count(accountId) > 1
Then you can work using only the emails that have more than one account.
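As a rough sketch of that last step, the HAVING query can be used as a derived table and joined back to list the accounts behind each repeated email. This runs on an in-memory SQLite database from Python; the Users table name, the sample rows and the extra filter on empty emails are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserID INTEGER PRIMARY KEY, AccountID INTEGER, Email TEXT)"
)
# Made-up sample data, including two rows with an empty email.
conn.executemany(
    "INSERT INTO Users (AccountID, Email) VALUES (?, ?)",
    [
        (101, "ann@example.com"),
        (102, "ann@example.com"),
        (201, "bob@example.com"),
        (301, ""),
        (302, ""),
    ],
)

# Emails that occur on more than one account (empty emails skipped).
DUPLICATE_EMAILS = """
    SELECT Email, COUNT(AccountID) AS accounts
    FROM Users
    WHERE Email <> ''
    GROUP BY Email
    HAVING COUNT(AccountID) > 1
"""
dupes = conn.execute(DUPLICATE_EMAILS).fetchall()
print(dupes)

# Join back to get one row per account, not one row per pair of accounts.
accounts = conn.execute(
    "SELECT u.Email, u.AccountID FROM Users AS u "
    "INNER JOIN (" + DUPLICATE_EMAILS + ") AS d ON d.Email = u.Email "
    "ORDER BY u.AccountID"
).fetchall()
print(accounts)
```

A customer who registered five times then shows up as five rows instead of one row per combination of accounts.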
Hope this helps

Related

Query build to find records where all of a series of records have a value

Let me explain a little bit about what I am trying to do, because I don't even know the vocab to use to ask. I have an Access 2016 database that records staff QA data. When a staff member misses a QA we assign a job aid that explains the process, and they can optionally send back a worksheet showing they learned about what was missed. If they do all of these in a 3 month period they get a credit on their QA score. So I have a series of records, all of which have a date we assigned the work (RA1) and MAY have a work returned date (RC1).
In the below image "lavalleer" has earned the credit because both of her sheets got returned. "maduncn" did not earn the credit because he didn't do one.
I want to create a query that returns only the people that are like "lavalleer". I tried hitting google and searched here and access.programmers.co.uk, but I'm only coming up with instructions to use Not Null statements. That wouldn't work for me, because if I did an Is Not Null on "maduncn" I would get the 4 records but it would exclude the null.
What I need to do is build a query where I can see staff that have dates in ALL of their RC1 fields. If any of their RC1 fields are blank I don't want them to return.
Consider:
SELECT * FROM tablename WHERE NOT UserLogin IN (SELECT UserLogin FROM tablename WHERE RC1 IS NULL);
You could use a not exists clause with a correlated subquery, e.g.
select t.* from YourTable t where not exists
(select 1 from YourTable u where t.userlogin = u.userlogin and u.rc1 is null)
Here, select 1 is used purely for optimisation - we don't care what the query returns, just that it has records (or doesn't have records).
Or, you could use a left join to exclude those users for which there is a null rc1 record, e.g.:
select t.* from YourTable t left join
(select u.userlogin from YourTable u where u.rc1 is null) v on t.userlogin = v.userlogin
where v.userlogin is null
In all of the above, change all occurrences of YourTable to the name of your table.
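Here is a small sketch of the NOT EXISTS and LEFT JOIN variants side by side, run on an in-memory SQLite database from Python as a stand-in for Access. The table name qa and the dates are made up; only the UserLogin, RA1 and RC1 columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qa (UserLogin TEXT, RA1 TEXT, RC1 TEXT)")
# Made-up rows: lavalleer returned everything, maduncn has one blank RC1.
conn.executemany(
    "INSERT INTO qa VALUES (?, ?, ?)",
    [
        ("lavalleer", "2019-01-05", "2019-01-12"),
        ("lavalleer", "2019-02-03", "2019-02-10"),
        ("maduncn", "2019-01-07", "2019-01-15"),
        ("maduncn", "2019-02-11", None),
    ],
)

# Keep a login only if no row for that login has a NULL RC1.
not_exists = conn.execute("""
    SELECT DISTINCT t.UserLogin
    FROM qa AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM qa AS u
        WHERE u.UserLogin = t.UserLogin AND u.RC1 IS NULL
    )
""").fetchall()

# Same result via a left join against the logins that have a NULL RC1.
left_join = conn.execute("""
    SELECT DISTINCT t.UserLogin
    FROM qa AS t
    LEFT JOIN (SELECT u.UserLogin FROM qa AS u WHERE u.RC1 IS NULL) AS v
        ON t.UserLogin = v.UserLogin
    WHERE v.UserLogin IS NULL
""").fetchall()

print(not_exists, left_join)
```

Both queries drop every row belonging to maduncn, including the ones that do have a return date, which is the difference from a plain Is Not Null filter.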

Issues with subqueries for stored procedure

The query I am trying to perform is
With getusers As
(Select userID from userprofspecinst_v where institutionID IN
(select institutionID, professionID from userprofspecinst_v where userID=#UserID)
and professionID IN
(select institutionID, professionID from userprofspecinst_v where userID=#UserID))
select username from user where userID IN (select userID from getusers)
Here's what I'm trying to do. Given a userID and a view which contains the userID and the ID of their institution and profession, I want to get the list of other userIDs who also have the same institutionID and professionID. Then with that list of userIDs I want to get the usernames that correspond to each userID from another table (user). The error I am getting when I try to create the procedure is, "Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.". Am I taking the correct approach to how I should build this query?
The following query should do what you want to do:
SELECT u.username
FROM user AS u
INNER JOIN userprofspecinst_v AS up ON u.userID = up.userID
INNER JOIN (SELECT institutionID, professionID FROM userprofspecinst_v
WHERE userID = #userID) AS ProInsts
ON (up.institutionID = ProInsts.institutionID
AND up.professionID = ProInsts.professionID)
Effectively the crucial part is the last INNER JOIN statement - this creates a derived table of the institution IDs and profession IDs the user ID belongs to. We then get all matching rows in the view with the same institution ID and profession ID (the ON condition) and then link these back to the user table on the corresponding user IDs (the first JOIN).
You can either run this for each user id you are interested in, or JOIN onto the result of a query (your getusers) (it depends on what database engine you are running).
If you aren't familiar with JOINs, Jeff Atwood's introductory post is a good starting place.
The JOIN statement effectively allows you to exploit the logical links between your tables - the userID, institutionID and professionID are all examples of candidates for foreign keys - so, rather than having to constantly subquery each table and piece the results together, you can link all the tables together and filter down to the rows you want. It's usually a cleaner, more maintainable approach (although that is opinion).
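For anyone who wants to try the join, here is a minimal sketch on an in-memory SQLite database from Python. The view is replaced by a plain table of the same name, the sample users are made up, and the parameter is written as :uid instead of #userID because that is what sqlite3 accepts. The extra filter that drops the user themselves is an assumption about what "other users" should mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userID INTEGER PRIMARY KEY, username TEXT)")
# Plain table standing in for the view from the question.
conn.execute(
    "CREATE TABLE userprofspecinst_v "
    "(userID INTEGER, institutionID INTEGER, professionID INTEGER)"
)
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")],
)
conn.executemany(
    "INSERT INTO userprofspecinst_v VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 1), (3, 10, 2), (4, 20, 1)],
)

BASE = """
    SELECT u.username
    FROM user AS u
    INNER JOIN userprofspecinst_v AS up ON u.userID = up.userID
    INNER JOIN (SELECT institutionID, professionID
                FROM userprofspecinst_v
                WHERE userID = :uid) AS ProInsts
        ON up.institutionID = ProInsts.institutionID
       AND up.professionID = ProInsts.professionID
"""

# Everyone sharing both institution and profession with user 1.
same_group = conn.execute(BASE + " ORDER BY u.username", {"uid": 1}).fetchall()

# Variant that leaves out user 1 themselves.
others = conn.execute(
    BASE + " WHERE u.userID <> :uid ORDER BY u.username", {"uid": 1}
).fetchall()

print(same_group, others)
```

carol shares only the institution and dave only the profession, so neither is returned; the pair has to match as a whole, which the two separate IN subqueries in the question would not guarantee.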

Using Join to parse out duplicate data

I am having a bit of a problem here with this statement below using tables:
[Users] Name,Email,Subscribed
[Email] Name,Email,Subscribed
Basically what needs to be accomplished is all the Subscribed Users in the Users table need to be checked against the users in the Email table to see if they exist in the table or not, and only return the users in the Users table which are not found in the Email table.
Here is the statement I've used, but it returns millions of rows and takes forever, and it is returning every email address in the Email table. I don't think this is the best way to approach this issue, because it is not returning accurate data. Any thoughts?
SELECT Distinct c.Name,c.Email from Users c
INNER JOIN Email e on c.Email <> e.Email
WHERE c.Subscribed=1
I think this is what you are looking for.
SELECT c.Name,c.Email
FROM Users c LEFT JOIN Email e ON (c.email=e.email)
WHERE (e.email is null) and (c.subscribed=1)
The INNER JOIN in your query, joined on an inequality rather than a match, is why you are getting wacky results.
Change the <> to =.
You are doing a join on non-equality, so that is why your result size is exploding.
On a side note - check into foreign keys going forward - they make this type of cross table referential integrity a non-issue.
UPDATE - if you want results in one table that are not in the other do:
select * from table1 where email not in (select distinct email from table2);
This will give you all the records in table1 that don't match an email in table 2.
HTH
You will want to use a LEFT JOIN to accomplish this:
SELECT Distinct u.Name, u.Email
FROM Users AS u
LEFT JOIN Email AS e ON u.Email = e.Email
WHERE e.Email IS NULL
AND u.Subscribed = 1
Graphical explanation of JOINS may be helpful
I strongly recommend using Foreign Keys to maintain referential integrity.
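The difference between the approaches in this thread can be checked on a few rows. The sketch below uses an in-memory SQLite database from Python with made-up names and addresses. It also shows one caveat with NOT IN that is easy to miss: if the subquery returns a NULL, NOT IN matches nothing, while the LEFT JOIN version is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT, Email TEXT, Subscribed INTEGER)")
conn.execute("CREATE TABLE Email (Name TEXT, Email TEXT, Subscribed INTEGER)")
# Made-up sample data: only Ann appears in both tables.
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        ("Ann", "ann@example.com", 1),
        ("Bob", "bob@example.com", 1),
        ("Cat", "cat@example.com", 0),
    ],
)
conn.executemany(
    "INSERT INTO Email VALUES (?, ?, ?)",
    [("Ann", "ann@example.com", 1), ("Zed", "zed@example.com", 1)],
)

# The original join: each user pairs with every Email row that differs.
inequality_rows = conn.execute("""
    SELECT c.Name FROM Users c
    INNER JOIN Email e ON c.Email <> e.Email
    WHERE c.Subscribed = 1
    ORDER BY c.Name
""").fetchall()

ANTI_JOIN = """
    SELECT c.Name FROM Users c
    LEFT JOIN Email e ON c.Email = e.Email
    WHERE e.Email IS NULL AND c.Subscribed = 1
"""
anti_join = conn.execute(ANTI_JOIN).fetchall()

# Add an Email row with a NULL address and compare NOT IN with the anti join.
conn.execute("INSERT INTO Email VALUES ('Unknown', NULL, 1)")
not_in_with_null = conn.execute("""
    SELECT Name FROM Users
    WHERE Subscribed = 1 AND Email NOT IN (SELECT Email FROM Email)
""").fetchall()
anti_join_with_null = conn.execute(ANTI_JOIN).fetchall()

print(inequality_rows, anti_join, not_in_with_null, anti_join_with_null)
```

Ann is returned by the inequality join even though she is in the Email table, which is the inaccurate data described in the question. If the email column in the second table can be NULL, adding a WHERE email IS NOT NULL to the NOT IN subquery avoids the empty result.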

Start Center Result Set for People Table

I am working with the Maximo Asset Management System (version 7.1.1.6). I am trying to display a result set on the start center that contains a Do Not Call list of specific people. However, when using a query for the result set (a query that was saved in the People section that contains the appropriate "where" clause such as "department='ABC'"), I cannot select the phone number or the email address as a column to display. I believe this is due to the fact that the "Primary Phone" and "Primary Email" fields in the person table are not really there. They are virtual fields that are connected in the People application to the Phone and Email tables and joined on the personid column. If I run the following query in the database, I get the result set that I want:
select * from dbo.person as p
left outer join dbo.phone as ph on p.personid=ph.personid and ph.isprimary=1
left outer join dbo.email as e on p.personid=e.personid and e.isprimary=1
Unfortunately, for the result sets you don't have access to the "FROM" clause; you can only edit the "WHERE" clause.
Anyone have any idea how to solve this other than adding 2 new columns to the person table for primary phone and primary email? I don't want to HAVE to do it, but I can if I must.
How about this for a where clause
... where (select count(*) from dbo.phone ph where :personid = ph.personid and ph.isprimary=1) > 0 and (select count(*) from dbo.email e where :personid = e.personid and e.isprimary=1) > 0
I can also think of a solution that creates relationships in the Database Configuration application, but the above query is more straightforward.

FIRST ORDER BY ... THEN GROUP BY

I have two tables, one stores the users, the other stores the users' email addresses.
table users: (userId, username, etc)
table userEmail: (emailId, userId, email)
I would like to do a query that allows me to fetch the latest email address along with the user record.
I'm basically looking for a query that says
FIRST ORDER BY userEmail.emailId DESC
THEN GROUP BY userEmail.userId
This can be done with:
SELECT
users.userId
, users.username
, (
SELECT
userEmail.email
FROM userEmail
WHERE userEmail.userId = users.userId
ORDER BY userEmail.emailId DESC
LIMIT 1
) AS email
FROM users
ORDER BY users.username;
But this does a subquery for every row and is very inefficient. (It is faster to do 2 separate queries and 'join' them together in my program logic).
The intuitive query to write for what I want would be:
SELECT
users.userId
, users.username
, userEmail.email
FROM users
LEFT JOIN userEmail USING(userId)
GROUP BY users.userId
ORDER BY
userEmail.emailId
, users.username;
But, this does not function as I would like. (The GROUP BY is performed before the sorting, so the ORDER BY userEmail.emailId has nothing to do).
So my question is:
Is it possible to write the first query without making use of the subqueries?
I've searched and read the other questions on stackoverflow, but none seems to answer the question about this query pattern.
"But this does a subquery for every row and is very inefficient"
Firstly, do you have a query plan / timings that demonstrate this? The way you've done it (with the subselect) is pretty much the 'intuitive' way to do it. Many DBMS (though I'm not sure about MySQL) have optimisations for this case, and will have a way to execute the query only once.
Alternatively, you should be able to create a subtable with ONLY (user id, latest email id) tuples and JOIN onto that:
SELECT
users.userId
, users.username
, userEmail.email
FROM users
INNER JOIN
(SELECT userId, MAX(emailId) AS latestEmailId
FROM userEmail GROUP BY userId)
AS latestEmails
ON (users.userId = latestEmails.userId)
INNER JOIN userEmail ON
(latestEmails.latestEmailId = userEmail.emailId)
ORDER BY users.username;
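A quick way to compare the two forms is to run them on a few rows. This sketch uses an in-memory SQLite database from Python with made-up users and addresses (the question appears to be about MySQL, so treat it as an illustration of the logic only). One difference worth knowing: the INNER JOIN version drops users with no email at all, while the correlated subquery returns them with a NULL email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    "CREATE TABLE userEmail (emailId INTEGER PRIMARY KEY, userId INTEGER, email TEXT)"
)
# Made-up sample data: zoe has two addresses, mia has none.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "zoe"), (2, "adam"), (3, "mia")]
)
conn.executemany(
    "INSERT INTO userEmail VALUES (?, ?, ?)",
    [
        (1, 1, "zoe.old@example.com"),
        (2, 2, "adam@example.com"),
        (3, 1, "zoe.new@example.com"),
    ],
)

# Derived table with the highest emailId per user, joined back for the address.
join_rows = conn.execute("""
    SELECT users.userId, users.username, userEmail.email
    FROM users
    INNER JOIN (SELECT userId, MAX(emailId) AS latestEmailId
                FROM userEmail GROUP BY userId) AS latestEmails
        ON users.userId = latestEmails.userId
    INNER JOIN userEmail
        ON latestEmails.latestEmailId = userEmail.emailId
    ORDER BY users.username
""").fetchall()

# Correlated subquery from the question, for comparison.
subquery_rows = conn.execute("""
    SELECT users.userId, users.username,
           (SELECT userEmail.email FROM userEmail
            WHERE userEmail.userId = users.userId
            ORDER BY userEmail.emailId DESC LIMIT 1) AS email
    FROM users
    ORDER BY users.username
""").fetchall()

print(join_rows)
print(subquery_rows)
```

If users without an email should still be listed, switching both INNER JOINs to LEFT JOINs should bring the two results in line.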
If this is a query you do often, I recommend optimizing your tables to handle this.
I suggest adding an emailId column to the users table. When a user changes their email address, or sets an older email address as the primary email address, update the user's row in the users table to point to the current emailId.
Once you modify your code to do this update, you can go back and update your older data to set emailId for all users.
Alternatively, you can add an email column to the users table, so you don't have to do a join to get a user's current email address.