For about a year now, we’ve been allowing our users to log in with usernames and/or email addresses that are not unique (though each user does have a unique id). Although the system handles duplicate usernames/emails elegantly, we’ve decided to finally enforce unique usernames and email addresses. I’ve been tasked with generating a table in MySQL that will show the duplicates and the tables in which a duplicate’s id is being used (i.e., the tables dependent on the duplicate’s user id, using 1 for true and 0 for false). This table will then be used as a reference once duplicate data is marked for deletion. In short, I’m looking to generate a table something like this:
| User_id | Username | Email | Exists_in_Table1 | Exists_in_Table2 | Exists_in_Table3 |
-----------------------------------------------------------------------------------
0001 | test1 | email | 0 | 0 | 1
0002 | test2 | email | 0 | 1 | 1
0003 | test3 | email | 1 | 1 | 1
It doesn’t matter much how this is accomplished. Since my SQL skills are somewhat lacking, I intended to do this programmatically using PHP and a number of simple SQL queries. However, I believe a single SQL query or a series of queries (without the use of PHP) is the cleanest approach. I know how to query for duplicates, but I can’t seem to figure out how to query multiple tables and join them by the user id in an appropriate manner. I appreciate any and all help with this. Thank you.
SELECT u.User_id, u.Username, u.Email,
IF(t1.User_id IS NULL, 0, 1) AS Exists_in_Table1,
IF(t2.User_id IS NULL, 0, 1) AS Exists_in_Table2,
IF(t3.User_id IS NULL, 0, 1) AS Exists_in_Table3
FROM Users u
LEFT OUTER JOIN Table1 t1 USING (User_id)
LEFT OUTER JOIN Table2 t2 USING (User_id)
LEFT OUTER JOIN Table3 t3 USING (User_id);
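Below is a minimal sketch of the same report. It runs through Python's built-in sqlite3 module rather than MySQL. The rows in Users, Table1, Table2 and Table3 are made up for illustration and are assumptions. The sketch swaps the LEFT JOINs for correlated EXISTS subqueries, which evaluate to 1 or 0 in both SQLite and MySQL. That way, a user id that appears several times in one dependent table still yields a single row. The WHERE clause keeps only users whose Username or Email is shared with another user, which is the "duplicates" part of the question.

```python
import sqlite3

def duplicate_report(conn):
    """Return one row per user whose Username or Email is not unique,
    with 1/0 flags for whether that user id appears in each dependent table."""
    sql = """
        SELECT u.User_id, u.Username, u.Email,
               EXISTS (SELECT 1 FROM Table1 t WHERE t.User_id = u.User_id) AS Exists_in_Table1,
               EXISTS (SELECT 1 FROM Table2 t WHERE t.User_id = u.User_id) AS Exists_in_Table2,
               EXISTS (SELECT 1 FROM Table3 t WHERE t.User_id = u.User_id) AS Exists_in_Table3
        FROM Users u
        WHERE u.Username IN (SELECT Username FROM Users GROUP BY Username HAVING COUNT(*) > 1)
           OR u.Email    IN (SELECT Email    FROM Users GROUP BY Email    HAVING COUNT(*) > 1)
        ORDER BY u.User_id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users  (User_id INTEGER PRIMARY KEY, Username TEXT, Email TEXT);
    CREATE TABLE Table1 (User_id INTEGER);
    CREATE TABLE Table2 (User_id INTEGER);
    CREATE TABLE Table3 (User_id INTEGER);
    -- Users 1 and 2 share a username, user 3 shares an email with user 1, user 4 is unique.
    INSERT INTO Users VALUES (1, 'test1', 'a@example.com'), (2, 'test1', 'b@example.com'),
                             (3, 'test3', 'a@example.com'), (4, 'test4', 'd@example.com');
    INSERT INTO Table1 VALUES (3);
    INSERT INTO Table2 VALUES (2), (3), (3);  -- user 3 is referenced twice here
    INSERT INTO Table3 VALUES (1), (2), (3), (4);
""")

for row in duplicate_report(conn):
    print(row)
```

With the LEFT JOIN version, user 3 would be listed twice for this data, because it has two rows in Table2. EXISTS only asks whether at least one matching row exists.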
Imagine I have four tables:
Agents
| agent_id | agent_name |
Teams
| team_id | team_name | agent_id |
Menu
| menu_id | menu_name |
Team_assignment
| menu_id | team_id|
I need to write a query that selects all agents that are assigned to all teams and all queues, and disregards the ones that are not assigned to a queue. Note that every agent is always assigned to a team, but an agent is not necessarily assigned to a queue.
Since you stated that this is for a school project, I'll try to stay within the guidelines mentioned here: How do I ask and answer homework questions?
From what I can make out from your question, you basically want to select all the data from the different tables, joining them where one of the columns in the first table a equals = a column from the second table b. Most commonly, that is where the primary key of one table equals the foreign key of another table. Then you want to add conditions to your query, for example where some column from table 1 equals = some value.
Do you catch my drift? 😏
No?
You want to SELECT a.*, b.* (everything) FROM table Agents a, JOINing table Teams b ON column a.agent_id being equal to = column b.agent_id.
You probably want to JOIN another table, let's say Team_assignment c, ON column c.team_id being equal to = b.team_id.
You can JOIN more tables in the same way.
Sadly, I do not understand what you mean by "the ones that are not assigned to a queue", but it sounds like a condition that your query needs to match, so WHERE the potential column a.is_assigned_to_queue equals = true AND, for example, a.agent_name IS NOT NULL.
If you got this far, you should have caught onto my drift 😎, congrats. Hopefully this way you also got a better understanding of how building a query works, instead of me just blatantly giving you the answer and you learning nothing from it. Like this:
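SELECT a.*, b.*, c.*, d.* FROM Agents a
JOIN Teams b ON a.agent_id = b.agent_id
JOIN Team_assignment c ON c.team_id = b.team_id
JOIN Menu d ON d.menu_id = c.menu_id
-- is_assigned_to_queue is a hypothetical column; it is not in the tables listed above.
-- The inner JOINs to Team_assignment and Menu already drop agents whose team has no queue.
WHERE a.is_assigned_to_queue = true
AND a.agent_name IS NOT NULL;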
Now, it is possible that copying and pasting the snippet above will not work, because I'm not an SQL expert and I had to refresh my old memories of SQL myself by googling it. But that's the nice part of actually learning it: being able to explain it to someone else :)
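The sketch below checks that the chain of inner joins by itself throws out agents with no queue. It uses Python's sqlite3 module. It assumes that each Menu row is what the question calls a "queue", and the agents, teams and menu names are invented. No is_assigned_to_queue column is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Agents          (agent_id INTEGER PRIMARY KEY, agent_name TEXT);
    CREATE TABLE Teams           (team_id INTEGER PRIMARY KEY, team_name TEXT, agent_id INTEGER);
    CREATE TABLE Menu            (menu_id INTEGER PRIMARY KEY, menu_name TEXT);
    CREATE TABLE Team_assignment (menu_id INTEGER, team_id INTEGER);
    INSERT INTO Agents VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Teams  VALUES (10, 'Sales', 1), (20, 'Support', 2);
    INSERT INTO Menu   VALUES (100, 'Billing queue');
    -- Only the Sales team (agent Ann) is assigned to a queue; Bob's team is not.
    INSERT INTO Team_assignment VALUES (100, 10);
""")

# Each INNER JOIN keeps only rows that have a match, so Bob (whose team has no
# Team_assignment row) is dropped without any extra WHERE condition.
rows = conn.execute("""
    SELECT a.agent_name, b.team_name, d.menu_name
    FROM Agents a
    JOIN Teams b           ON a.agent_id = b.agent_id
    JOIN Team_assignment c ON c.team_id  = b.team_id
    JOIN Menu d            ON d.menu_id  = c.menu_id
    ORDER BY a.agent_id
""").fetchall()
print(rows)  # [('Ann', 'Sales', 'Billing queue')]
```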
What is the difference between a self join and an inner join?
I find it helpful to think of all of the tables in a SELECT statement as representing their own data sets.
Before you've applied any conditions you can think of each data set as being complete (the entire table, for instance).
A join is just one of several ways to begin refining those data sets to find the information that you really want.
Though a database schema may be designed with certain relationships in mind (Primary Key <-> Foreign Key) these relationships really only exist in the context of a particular query. The query writer can relate whatever they want to whatever they want. I'll give an example of this later...
An INNER JOIN relates two tables to each other. There are often multiple JOIN operations in one query to chain together multiple tables. It can get as complicated as it needs to. For a simple example, consider the following three tables...
STUDENT
| STUDENTID | LASTNAME | FIRSTNAME |
------------------------------------
1 | Smith | John
2 | Patel | Sanjay
3 | Lee | Kevin
4 | Jackson | Steven
ENROLLMENT
| ENROLLMENTID | STUDENTID | CLASSID |
---------------------------------------
1 | 2 | 3
2 | 3 | 1
3 | 4 | 2
CLASS
| CLASSID | COURSE | PROFESSOR |
--------------------------------
1 | CS 101 | Smith
2 | CS 201 | Ghandi
3 | CS 301 | McDavid
4 | CS 401 | Martinez
The STUDENT table and the CLASS table were designed to relate to each other through the ENROLLMENT table. This kind of table is called a Junction Table.
To write a query that displays the students and the classes in which they are enrolled, one would use two inner joins...
SELECT stud.LASTNAME, stud.FIRSTNAME, class.COURSE, class.PROFESSOR
FROM STUDENT stud
INNER JOIN ENROLLMENT enr
ON stud.STUDENTID = enr.STUDENTID
INNER JOIN CLASS class
ON class.CLASSID = enr.CLASSID;
Read the above closely and you should see what is happening. What you will get in return is the following data set...
| LASTNAME | FIRSTNAME | COURSE | PROFESSOR |
---------------------------------------------
Patel | Sanjay | CS 301 | McDavid
Lee | Kevin | CS 101 | Smith
Jackson | Steven | CS 201 | Ghandi
Using the JOIN clauses we've limited the data sets of all three tables to only those that match each other. The "matches" are defined using the ON clauses. Note that if you ran this query you would not see the CLASSID 4 row from the CLASS table or the STUDENTID 1 row from the STUDENT table because those IDs don't exist in the matches (in this case the ENROLLMENT table). Look into "LEFT"/"RIGHT"/"FULL OUTER" JOINs for more reading on how to make that work a little differently.
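As a quick sanity check, the STUDENT/ENROLLMENT/CLASS example can be reproduced with Python's sqlite3 module. SQLite is only a stand-in here. The sketch first runs the two-inner-join query, with an ORDER BY added so the row order is predictable. It then runs the same query with LEFT JOINs, which keeps the unenrolled John Smith and puts NULLs (None in Python) in his class columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT    (STUDENTID INTEGER PRIMARY KEY, LASTNAME TEXT, FIRSTNAME TEXT);
    CREATE TABLE ENROLLMENT (ENROLLMENTID INTEGER PRIMARY KEY, STUDENTID INTEGER, CLASSID INTEGER);
    CREATE TABLE CLASS      (CLASSID INTEGER PRIMARY KEY, COURSE TEXT, PROFESSOR TEXT);
    INSERT INTO STUDENT VALUES (1, 'Smith', 'John'), (2, 'Patel', 'Sanjay'),
                               (3, 'Lee', 'Kevin'), (4, 'Jackson', 'Steven');
    INSERT INTO ENROLLMENT VALUES (1, 2, 3), (2, 3, 1), (3, 4, 2);
    INSERT INTO CLASS VALUES (1, 'CS 101', 'Smith'), (2, 'CS 201', 'Ghandi'),
                             (3, 'CS 301', 'McDavid'), (4, 'CS 401', 'Martinez');
""")

# Inner joins: only students that have a matching ENROLLMENT and CLASS row.
inner = conn.execute("""
    SELECT stud.LASTNAME, stud.FIRSTNAME, class.COURSE, class.PROFESSOR
    FROM STUDENT stud
    INNER JOIN ENROLLMENT enr ON stud.STUDENTID = enr.STUDENTID
    INNER JOIN CLASS class    ON class.CLASSID  = enr.CLASSID
    ORDER BY enr.ENROLLMENTID
""").fetchall()

# Left joins: every STUDENT row survives; unmatched columns come back as NULL.
left = conn.execute("""
    SELECT stud.LASTNAME, stud.FIRSTNAME, class.COURSE, class.PROFESSOR
    FROM STUDENT stud
    LEFT JOIN ENROLLMENT enr ON stud.STUDENTID = enr.STUDENTID
    LEFT JOIN CLASS class    ON class.CLASSID  = enr.CLASSID
    ORDER BY stud.STUDENTID
""").fetchall()

print(inner)
print(left)
```

CS 401 still does not appear in either result. Showing it as well would take a FULL OUTER JOIN, or a UNION of a LEFT JOIN and a RIGHT JOIN.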
Please note, per my comments on "relationships" earlier, there is no reason why you couldn't run a query relating the STUDENT table and the CLASS table directly on the LASTNAME and PROFESSOR columns. Those two columns match in data type and, well, look at that! They even have a value in common! This would probably be a weird data set to get in return. My point is that it can be done, and you never know what needs you might have in the future for interesting connections in your data. Understand the design of the database, but don't think of "relationships" as being rules that can't be ignored.
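Here is a sketch of that unusual join, again using Python's sqlite3 with the sample STUDENT and CLASS rows. The ON clause compares LASTNAME to PROFESSOR even though no key relates the two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (STUDENTID INTEGER PRIMARY KEY, LASTNAME TEXT, FIRSTNAME TEXT);
    CREATE TABLE CLASS   (CLASSID INTEGER PRIMARY KEY, COURSE TEXT, PROFESSOR TEXT);
    INSERT INTO STUDENT VALUES (1, 'Smith', 'John'), (2, 'Patel', 'Sanjay'),
                               (3, 'Lee', 'Kevin'), (4, 'Jackson', 'Steven');
    INSERT INTO CLASS VALUES (1, 'CS 101', 'Smith'), (2, 'CS 201', 'Ghandi'),
                             (3, 'CS 301', 'McDavid'), (4, 'CS 401', 'Martinez');
""")

# No foreign key links these columns; the ON clause alone defines the "relationship".
rows = conn.execute("""
    SELECT stud.FIRSTNAME, stud.LASTNAME, class.COURSE
    FROM STUDENT stud
    INNER JOIN CLASS class ON stud.LASTNAME = class.PROFESSOR
""").fetchall()
print(rows)  # [('John', 'Smith', 'CS 101')]
```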
In the meantime... SELF JOINS!
Consider the following table...
PERSON
| PERSONID | FAMILYID | NAME |
--------------------------------
1 | 1 | John
2 | 1 | Brynn
3 | 2 | Arpan
4 | 2 | Steve
5 | 2 | Tim
6 | 3 | Becca
If you felt so inclined as to make a database of all the people you know and which ones are in the same family this might be what it looks like.
If you wanted to return one person, PERSONID 4, for instance, you would write...
SELECT * FROM PERSON WHERE PERSONID = 4;
You would learn that he is in the family with FAMILYID 2. Then to find all of the PERSONs in his family you would write...
SELECT * FROM PERSON WHERE FAMILYID = 2;
Done and done! SQL, of course, can accomplish this in one query using, you guessed it, a SELF JOIN.
What really triggers the need for a SELF JOIN here is that the table contains a unique column (PERSONID) and a column that serves as a sort of "Category" (FAMILYID). This concept is called Cardinality, and in this case it represents a one-to-many (1:M) relationship. There is only one of each PERSON, but there are many PERSONs in a FAMILY.
So, what we want to return is all of the members of a family if one member of the family's PERSONID is known...
SELECT fam.*
FROM PERSON per
JOIN PERSON fam
ON per.FAMILYID = fam.FAMILYID
WHERE per.PERSONID = 4;
Here's what you would get...
| PERSONID | FAMILYID | NAME |
--------------------------------
3 | 2 | Arpan
4 | 2 | Steve
5 | 2 | Tim
Let's note a couple of things. The words SELF JOIN don't occur anywhere. That's because a SELF JOIN is just a concept. The JOIN in the query above could have been a LEFT JOIN instead; in general, that choice changes which rows come back. The point of a SELF JOIN is that you are using the same table twice.
Consider my soapbox from before on data sets. Here we have started with the data set from the PERSON table twice. Neither instance of the data set affects the other one unless we say it does.
Let's start at the bottom of the query. The per data set is being limited to only those rows where PERSONID = 4. Knowing the table we know that will return exactly one row. The FAMILYID column in that row has a value of 2.
In the ON clause we are limiting the fam data set (which at this point is still the entire PERSON table) to only those rows where the value of FAMILYID matches one or more of the FAMILYIDs of the per data set. As we discussed we know the per data set only has one row, therefore one FAMILYID value. Therefore the fam data set now contains only rows where FAMILYID = 2.
Finally, at the top of the query we are SELECTing all of the rows in the fam data set.
Voila! Two queries in one.
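The PERSON self join can be tried out with Python's sqlite3 module. The sketch below wraps the query in a hypothetical helper, family_of, which takes the known PERSONID as a bound parameter. It adds an ORDER BY for a stable row order, and the table contents are the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON (PERSONID INTEGER PRIMARY KEY, FAMILYID INTEGER, NAME TEXT);
    INSERT INTO PERSON VALUES (1, 1, 'John'), (2, 1, 'Brynn'), (3, 2, 'Arpan'),
                              (4, 2, 'Steve'), (5, 2, 'Tim'), (6, 3, 'Becca');
""")

def family_of(conn, person_id):
    """Return every PERSON row in the same family as person_id (hypothetical helper).

    per is the copy filtered to the known person; fam is the copy whose rows
    share per's FAMILYID. Both aliases point at the same PERSON table."""
    return conn.execute("""
        SELECT fam.*
        FROM PERSON per
        JOIN PERSON fam ON per.FAMILYID = fam.FAMILYID
        WHERE per.PERSONID = ?
        ORDER BY fam.PERSONID
    """, (person_id,)).fetchall()

print(family_of(conn, 4))  # [(3, 2, 'Arpan'), (4, 2, 'Steve'), (5, 2, 'Tim')]
```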
In conclusion, an INNER JOIN is one of several kinds of JOIN operations. I would strongly suggest reading further into LEFT, RIGHT and FULL OUTER JOINs (which are, collectively, called OUTER JOINs). I personally missed a job opportunity for having a weak knowledge of OUTER JOINs once and won't let it happen again!
A SELF JOIN is simply any JOIN operation where you are relating a table to itself. The way you choose to JOIN that table to itself can use an INNER JOIN or an OUTER JOIN. Note that with a SELF JOIN you must use table aliases (fam and per above; make up whatever makes sense for your query) so as not to confuse your SQL engine, or there is no way to differentiate the two copies of the same table.
Now that you understand the difference open your mind nice and wide and realize that one single query could contain all different kinds of JOINs at once. It's just a matter of what data you want and how you have to twist and bend your query to get it. If you find yourself running one query and taking the result of that query and using it as the input of another query then you can probably use a JOIN to make it one query instead.
To play around with SQL, try visiting W3Schools.com. There is a locally stored database there with a bunch of tables that are designed to relate to each other in various ways, and it's filled with data! You can CREATE, DROP, INSERT, UPDATE and SELECT all you want and return the database to its default at any time. Try all sorts of SQL to experiment with different tricks. I've learned a lot there myself.
Sorry if this was a little wordy but I personally struggled with the concept of JOINs when I was starting to learn SQL and explaining a concept by using a bunch of other complex concepts bogged me down. Best to start at the bottom sometimes.
I hope it helps. If you can put JOINs in your back pocket you can work magic with SQL!
Happy querying!
A self join joins a table to itself. The employee table might be joined to itself in order to show the manager name and the employee name in the same row.
An inner join joins any two tables and returns rows where the key exists in both tables. A self join can be an inner join (most joins are inner joins and most self joins are inner joins). An inner join can be a self join but most inner joins involve joining two different tables (generally a parent table and a child table).
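Here is a sketch of the employee/manager self join mentioned above, run through Python's sqlite3 module. It assumes a hypothetical Employee table with emp_id, name and manager_id columns, invented for this example. The query is a self join and also an inner join, so an employee without a manager drops out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO Employee VALUES (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 1), (4, 'Dan', 2);
""")

# e is the "employee" copy of the table and m is the "manager" copy.
# Alice's manager_id is NULL, so the inner join drops her; a LEFT JOIN would keep her.
rows = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM Employee e
    JOIN Employee m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # [('Bob', 'Alice'), ('Carol', 'Alice'), ('Dan', 'Bob')]
```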
An inner join (sometimes called a simple join) is a join of two or more tables that returns only those rows that satisfy the join condition.
A self join is a join of a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition. To perform a self join, Oracle Database combines and returns rows of the table that satisfy the join condition.
I have a database in MS Access, and I ran into a problem with empty values. I have 3 tables that are connected to each other. Let's say Table1 contains people, Table2 contains phone numbers, and Table3 connects Table1 and Table2, holding both their IDs, so I can later see which person has which numbers by using the IDs.
What I want from Access is to display a person even if he/she doesn't have a number assigned, and also a number when there are no people assigned to it.
Something like this:
Persons_name |Phone_number
--------------------------
Fred | 123
| 222
Anna |
The tables look something like this:
People
| ID | Persons_name |
People_phones
| ID | People_ID | Phones_ID |
Phones
| ID | Phone_number |
So far I've managed to get Access to show either Table1's null values or Table2's null values, but not both.
As E Mett indicated above, you're looking for a full outer join, which Access doesn't handle directly. Here is an example of what he's suggesting:
How do I write a full outer join query in access
JB
In SQL jargon, what you are looking for is a full outer join.
This is unfortunately not available in MS Access, because it is rarely needed.
You should create two queries, one using a left join and the other using a right join.
Then use the UNION keyword to combine the results.
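Here is a sketch of that two-query approach. It uses Python's sqlite3 module instead of Access, with the People/People_phones/Phones layout from the question and made-up rows. Older SQLite versions have no RIGHT JOIN, so the second half is written as a LEFT JOIN with the tables in the opposite order, which returns the same rows a RIGHT JOIN would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People        (ID INTEGER PRIMARY KEY, Persons_name TEXT);
    CREATE TABLE Phones        (ID INTEGER PRIMARY KEY, Phone_number TEXT);
    CREATE TABLE People_phones (ID INTEGER PRIMARY KEY, People_ID INTEGER, Phones_ID INTEGER);
    INSERT INTO People VALUES (1, 'Fred'), (2, 'Anna');   -- Anna has no number
    INSERT INTO Phones VALUES (1, '123'), (2, '222');     -- 222 has no owner
    INSERT INTO People_phones VALUES (1, 1, 1);           -- Fred owns 123
""")

# First half: every person, with a number if they have one.
# Second half: every number, with a person if it has one (the "right join" side).
rows = conn.execute("""
    SELECT p.Persons_name, ph.Phone_number
    FROM People p
    LEFT JOIN People_phones pp ON pp.People_ID = p.ID
    LEFT JOIN Phones ph        ON ph.ID = pp.Phones_ID
    UNION
    SELECT p.Persons_name, ph.Phone_number
    FROM Phones ph
    LEFT JOIN People_phones pp ON pp.Phones_ID = ph.ID
    LEFT JOIN People p         ON p.ID = pp.People_ID
    ORDER BY 1, 2
""").fetchall()

for name, number in rows:
    print(name, number)
```

UNION also removes exact duplicate rows. That is why Fred's matched row, which both halves return, appears only once.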
I'm trying to wrap my head around SQL and I need some help figuring out how to do the following query in PostgreSQL 9.3.
I have a users table, and a friends table that lists user IDs and the user IDs of friends in multiple rows.
I would like to query the user table, and ORDER BY the number of mutual friends in common to a user ID.
So, the friends table would look like:
user_id | friend_user_id
1 | 4
1 | 5
2 | 10
3 | 7
And so on: user 1 lists 4 and 5 as friends, and user 2 lists 10 as a friend. I want to sort by the highest count of user 1 in friend_user_id for the result of user_id in the select.
The Postgres way to do this:
SELECT *
FROM users u
LEFT JOIN (
SELECT user_id, count(*) AS friends
FROM friends
GROUP BY user_id
) f USING (user_id)
ORDER BY f.friends DESC NULLS LAST, user_id; -- user_id as tiebreaker
The keyword AS is just noise for table aliases. But don't omit it from column aliases. The manual on "Omitting the AS Key Word":
In FROM items, both the standard and PostgreSQL allow AS to be omitted before an alias that is an unreserved keyword. But this is impractical for output column names, because of syntactic ambiguities.
ISNULL() is a custom extension of MySQL or SQL Server. Postgres uses the SQL-standard function COALESCE(). But you don't need either here. Use the NULLS LAST clause instead, which is faster and cleaner. See:
PostgreSQL sort by datetime asc, null first?
Multiple users will have the same number of friends. These peers would be sorted arbitrarily. Repeated execution might yield different sort order, which is typically not desirable. Add more expressions to ORDER BY as tiebreaker. Ultimately, the primary key resolves any remaining ambiguity.
If the two tables share the same column name user_id (like they should) you can use the syntax shortcut USING in the join clause. Another standard SQL feature. Welcome side effect: user_id is only listed once in the output for SELECT *, as opposed to when joining with ON. Many clients wouldn't even accept duplicate column names in the output.
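Python's sqlite3 cannot run PostgreSQL, but the counting-and-sorting idea ports over directly. This sketch uses made-up users and the friends rows from the question plus one extra pair. The subquery includes GROUP BY user_id, which it needs in order to return one count per user. `f.friends IS NULL` in the ORDER BY stands in for NULLS LAST, which older SQLite versions do not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friends (user_id INTEGER, friend_user_id INTEGER);
    INSERT INTO users   VALUES (1, 'u1'), (2, 'u2'), (3, 'u3'), (4, 'u4');
    INSERT INTO friends VALUES (1, 4), (1, 5), (2, 10), (3, 7), (3, 8);
""")

rows = conn.execute("""
    SELECT u.user_id, f.friends
    FROM users u
    LEFT JOIN (
        SELECT user_id, count(*) AS friends
        FROM friends
        GROUP BY user_id
    ) f USING (user_id)
    -- "f.friends IS NULL" is 0 for users with friends and 1 for users without,
    -- so NULL counts sort last; user_id breaks ties between equal counts.
    ORDER BY f.friends IS NULL, f.friends DESC, u.user_id
""").fetchall()
print(rows)  # [(1, 2), (3, 2), (2, 1), (4, None)]
```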
Something like this?
SELECT * FROM [users] u
LEFT JOIN (SELECT user_id, COUNT(*) friends FROM friends GROUP BY user_id) f
ON u.user_id = f.user_id
ORDER BY ISNULL(f.friends,0) DESC
Say I have two tables
User
-----
id
first_name
last_name
User_Prefs
-----
user_id
pref
Sample data in User_Prefs might be:
user_id | pref
2 | SMS_NOTIFICATION
2 | EMAIL_OPT_OUT
2 | PINK_BACKGROUND_ON_FRIDAYS
And some users might have no corresponding rows in User_Prefs.
I need to query for the first name and last name of any user who does NOT have EMAIL_OPT_OUT as one of their (possibly many, possibly none) User_Pref rows.
SELECT DISTINCT u.* from User u
LEFT JOIN User_Prefs up ON (u.id=up.user_id)
WHERE up.pref<>'EMAIL_OPT_OUT'
gets me everyone who has at least one row that isn't "EMAIL_OPT_OUT", which of course is not what I want. I want everyone with no rows that match "EMAIL_OPT_OUT".
Is there a way to have the join type and the join conditions filter out the rows I want to leave out here? Or do I need a sub-query?
I personally think a "where not exists" type of clause might be easier to read, but here's a query with a join that does the same thing.
select distinct u.* from User u
left join User_Prefs up ON u.id = up.user_id and up.pref = 'EMAIL_OPT_OUT'
where up.user_id is null
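This sketch compares the three queries in Python's sqlite3 module, using invented users and preference rows. The original `WHERE up.pref <> 'EMAIL_OPT_OUT'` query keeps user 2, who opted out. It also loses users with no preferences at all, because `NULL <> 'EMAIL_OPT_OUT'` is not true. Both the anti-join above and a NOT EXISTS version return the intended users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User       (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE User_Prefs (user_id INTEGER, pref TEXT);
    INSERT INTO User VALUES (1, 'Ada', 'Lovelace'), (2, 'Bea', 'Smith'), (3, 'Cy', 'Jones');
    -- User 1 has no prefs, user 2 opted out (among others), user 3 did not opt out.
    INSERT INTO User_Prefs VALUES (2, 'SMS_NOTIFICATION'), (2, 'EMAIL_OPT_OUT'),
                                  (2, 'PINK_BACKGROUND_ON_FRIDAYS'), (3, 'SMS_NOTIFICATION');
""")

# The query from the question: wrong, since any non-opt-out row lets a user through.
wrong = conn.execute("""
    SELECT DISTINCT u.id, u.first_name, u.last_name FROM User u
    LEFT JOIN User_Prefs up ON u.id = up.user_id
    WHERE up.pref <> 'EMAIL_OPT_OUT'
    ORDER BY u.id
""").fetchall()

# Anti-join: move the pref test into ON, then keep users with no matching row.
anti = conn.execute("""
    SELECT u.id, u.first_name, u.last_name FROM User u
    LEFT JOIN User_Prefs up ON u.id = up.user_id AND up.pref = 'EMAIL_OPT_OUT'
    WHERE up.user_id IS NULL
    ORDER BY u.id
""").fetchall()

# NOT EXISTS: the same filter written as a subquery.
not_exists = conn.execute("""
    SELECT u.id, u.first_name, u.last_name FROM User u
    WHERE NOT EXISTS (SELECT 1 FROM User_Prefs up
                      WHERE up.user_id = u.id AND up.pref = 'EMAIL_OPT_OUT')
    ORDER BY u.id
""").fetchall()

print(wrong)       # [(2, 'Bea', 'Smith'), (3, 'Cy', 'Jones')]
print(anti)        # [(1, 'Ada', 'Lovelace'), (3, 'Cy', 'Jones')]
print(not_exists)  # [(1, 'Ada', 'Lovelace'), (3, 'Cy', 'Jones')]
```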
Why not have your user preferences stored in the user table as boolean fields? This would simplify your queries significantly.
SELECT * FROM User WHERE EMAIL_OPT_OUT = false