SQL Query - Displaying the same column twice under different conditions

I am wondering if the following is possible. Say I have the following table:
ID | NAME
1 | John
2 | Bob
3 | John
4 | Bob
Is it possible to run a query that results in the following:
NAME| ID1 | ID2
John | 1 | 3
Bob | 2 | 4
EDIT
Sorry for the confusion. My question addresses instances where I need to handle the possibility of 2 duplicates for a large data set.

Assuming exactly 2 duplicates
SELECT
NAME,
MIN(ID) as ID1,
MAX(ID) as ID2
FROM Table t
GROUP BY NAME
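A quick way to check the MIN/MAX approach is to run it against the sample data. The sketch below uses Python's built-in sqlite3 module with an in-memory table; the table name `people` and the extra row for 'Ann' are assumptions added for illustration. It also shows what happens when a name occurs only once: both columns end up holding the same id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    # 'Ann' is a hypothetical extra row with no duplicate
    [(1, "John"), (2, "Bob"), (3, "John"), (4, "Bob"), (5, "Ann")],
)

# Lowest and highest id per name, shown side by side
pairs = conn.execute(
    """
    SELECT name, MIN(id) AS id1, MAX(id) AS id2
    FROM people
    GROUP BY name
    ORDER BY id1
    """
).fetchall()

for row in pairs:
    print(row)
```

If a name could occur three or more times, the ids between the lowest and highest are silently dropped, so this only fits the "at most two duplicates" case.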

This should work. Note that the subquery screens out all names that don't have exactly two ids.
select name,min(id) as id1,max(id) as id2
from table
join(
select name
from table
group by name
having count(1)=2
)names
using(name)
group by name;

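If there are exactly two rows with each name, then the following should work (the a.id < b.id condition keeps each pair once, with the lower id first; a.id <> b.id would return every pair twice):
SELECT a.name,
a.id as id1,
b.id as id2
FROM the_table a
JOIN the_table b ON a.name = b.name AND a.id < b.id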
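With a self-join, the comparison operator on id decides how many rows come back per name. The following sketch (sqlite3 in memory, with a hypothetical helper `self_join`) runs the join once with `<>` and once with `<` on the question's data to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO the_table (id, name) VALUES (?, ?)",
    [(1, "John"), (2, "Bob"), (3, "John"), (4, "Bob")],
)

def self_join(op):
    # op is only ever "<>" or "<" here; never build SQL from user input this way
    sql = f"""
        SELECT a.name, a.id AS id1, b.id AS id2
        FROM the_table a
        JOIN the_table b ON a.name = b.name AND a.id {op} b.id
        ORDER BY a.id, b.id
    """
    return conn.execute(sql).fetchall()

mirrored = self_join("<>")  # each pair appears twice, once in each direction
once = self_join("<")       # each pair appears once, lower id first

print(mirrored)
print(once)
```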

Related

Why no similar ids in the results set when query with a correlated query inside where clause

I have a table with columns id, forename, surname, created (date), such as the following:
ID | Forename | Surname | Created
---------------------------------
1 | Tom | Smith | 2008-01-01
1 | Tom | Windsor | 2008-02-01
2 | Anne | Thorn | 2008-01-05
2 | Anne | Baker | 2008-03-01
3 | Bill | Sykes | 2008-01-20
Basically, I want this to return the most recent name for each ID, so it would return:
ID | Forename | Surname | Created
---------------------------------
1 | Tom | Windsor | 2008-02-01
2 | Anne | Baker | 2008-03-01
3 | Bill | Sykes | 2008-01-20
I get the desired result with this query.
SELECT id, forename, surname, created
FROM name n
WHERE created = (SELECT MAX(created)
FROM name
GROUP BY id
HAVING id = n.id);
I am getting the result I want, but I fail to understand why the ids are not repeated in the result set. As I understand it, a correlated subquery takes one row at a time from the outer query and runs the inner subquery for it. Shouldn't it repeat an id when that id repeats in the outer query? Can someone explain what exactly is happening behind the scenes?
First, your subquery does not need a GROUP BY. It is more commonly written as:
SELECT n.id, n.forename, n.surname, n.created
FROM name n
WHERE n.created = (SELECT MAX(n2.created)
FROM name n2
WHERE n2.id = n.id
);
You should get in the habit of qualifying all column references, especially when your query has multiple table references.
I think you are asking why this works. Well, each row in the outer query is tested for the condition. The condition is: "is my created the same as the maximum created for all rows in the name table with the same id". In your data, only one row per id matches that condition, so ids are not repeated.
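Conceptually (the database's actual execution plan may differ), the correlated subquery behaves like a nested loop: every outer row is tested once against the maximum computed for its own id. A minimal Python sketch of that logic, using the question's data; the function name `latest_per_id` and the tied row for 'Jones' are assumptions for illustration.

```python
rows = [
    (1, "Tom", "Smith", "2008-01-01"),
    (1, "Tom", "Windsor", "2008-02-01"),
    (2, "Anne", "Thorn", "2008-01-05"),
    (2, "Anne", "Baker", "2008-03-01"),
    (3, "Bill", "Sykes", "2008-01-20"),
]

def latest_per_id(table):
    result = []
    for outer in table:  # each outer row is tested exactly once
        outer_id, _, _, outer_created = outer
        # The "subquery": MAX(created) over the rows sharing the outer row's id.
        # ISO-formatted date strings compare correctly as plain strings.
        max_created = max(created for (i, _, _, created) in table if i == outer_id)
        if outer_created == max_created:  # the WHERE condition
            result.append(outer)
    return result

latest = latest_per_id(rows)

# Hypothetical tie: a second row for id 3 with the same latest date.
tied = latest_per_id(rows + [(3, "Bill", "Jones", "2008-01-20")])

print(latest)
print(tied)
```

The second call shows that an id is repeated only when two of its rows share the maximum date; nothing in the query itself enforces one row per id.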
You can also consider joining the table to a derived table holding MAX(created) per id, matching on both id and created:
SELECT n.id, n.forename, n.surname, n.created
FROM name n
JOIN ( SELECT id, MAX(created) as created FROM name GROUP BY id ) t
ON n.id = t.id AND n.created = t.created;
or using IN operator :
SELECT id, forename, surname, created
FROM name n
WHERE ( id, created ) IN (SELECT id, MAX(created)
FROM name
GROUP BY id );
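or using EXISTS with HAVING clause in the subquery (the subquery has to be restricted to the outer row's id, otherwise a row matches whenever its date equals the latest date of any id):
SELECT id, forename, surname, created
FROM name n
WHERE EXISTS (SELECT id
FROM name
WHERE id = n.id
GROUP BY id
HAVING MAX(created) = n.created
);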
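Whichever variant is used, the comparison has to involve id as well as created; matching on created alone can pull in a row whose date merely coincides with another id's latest date. The sketch below (sqlite3 in memory) changes Tom Smith's date, as an assumption for illustration, so that it coincides with Bill's latest date, and compares the two join conditions. The helper `latest_rows` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name (id INTEGER, forename TEXT, surname TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO name VALUES (?, ?, ?, ?)",
    [
        (1, "Tom", "Smith", "2008-01-20"),  # altered date: equals Bill's latest
        (1, "Tom", "Windsor", "2008-02-01"),
        (2, "Anne", "Thorn", "2008-01-05"),
        (2, "Anne", "Baker", "2008-03-01"),
        (3, "Bill", "Sykes", "2008-01-20"),
    ],
)

def latest_rows(condition):
    # condition is a fixed string chosen below, not user input
    sql = f"""
        SELECT n.id, n.surname
        FROM name n
        JOIN (SELECT id, MAX(created) AS created FROM name GROUP BY id) t
          ON {condition}
        ORDER BY n.id, n.surname
    """
    return conn.execute(sql).fetchall()

by_date_only = latest_rows("n.created = t.created")
by_id_and_date = latest_rows("n.id = t.id AND n.created = t.created")

print(by_date_only)    # includes Tom Smith, who is not Tom's latest row
print(by_id_and_date)  # one row per id
```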

postgresql find duplicates in column with ID

For instance, I have a table say "Name" with duplicate records in it:
Id | Firstname
--------------------
1 | John
2 | John
3 | Marc
4 | Jammie
5 | John
6 | Marc
How can I fetch duplicate records and display them with their respective primary key ID?
Use the COUNT() OVER() window aggregate function:
Select * from
(
select Id, Firstname, count(1)over(partition by Firstname) as Cnt
from yourtable
)a
Where Cnt > 1
SELECT t.*
FROM t
INNER JOIN
(SELECT firstname
FROM t
GROUP BY firstname
HAVING COUNT(*) > 1) sub
ON t.firstname = sub.firstname
A sub-query would do the trick. Select the first names that are found more than once in your table, t. Then join these names back to the main table to pull in the primary key.
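As a rough check of the join-back approach, here is a sketch that loads the question's sample rows into an in-memory SQLite table (named `t`, as in the answer) and lists every duplicated first name with its ids. The grouping into a dictionary at the end is an illustrative addition, not part of the original answers.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, firstname TEXT)")
conn.executemany(
    "INSERT INTO t (id, firstname) VALUES (?, ?)",
    [(1, "John"), (2, "John"), (3, "Marc"), (4, "Jammie"), (5, "John"), (6, "Marc")],
)

# Names occurring more than once, joined back to fetch each row's id
duplicates = conn.execute(
    """
    SELECT t.id, t.firstname
    FROM t
    INNER JOIN (SELECT firstname
                FROM t
                GROUP BY firstname
                HAVING COUNT(*) > 1) sub
      ON t.firstname = sub.firstname
    ORDER BY t.firstname, t.id
    """
).fetchall()

# Optional: collect the ids per name for display
ids_by_name = defaultdict(list)
for row_id, firstname in duplicates:
    ids_by_name[firstname].append(row_id)

print(duplicates)
print(dict(ids_by_name))
```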

How to make Oracle DB SQL print out two columns side by side with no correlation?

I currently have two queries that output different numbers of names and I want to list them side by side. Say:
select *
from group A
where.....................;
select *
from group B
where ..................;
A gives me: John, Ana, Joseph while B gives me Bob, Juan, Nick, Jess
Then if I do:
select name1, name2
from
(select name1
from group A
where.....................),
(select name2
from group B
where ..................)
;
I want to get
name1 name2
-----------------
| John | Bob |
| Ana | Juan|
|Joseph | Nick |
| | Jess |
but so far my outputs are grouped by first column name, so for each name in name1, there are 4 names corresponding in name2, like such:
name1 name2
---------------------
| John | Bob |
| John | Juan|
| John | Nick |
| John | Jess |
| Ana | Bob |
| Ana | Juan|
| Ana | Nick |
| Ana | Jess |
...
Is there any way I can get the desired output I mentioned above?
Thanks!
You can do this with a full outer join ("outer" to allow for lists of different length, and "full" since you don't know which one is longer). In a slightly more complicated query, you can also order your two lists (for example alphabetically, or by any other criteria you may have in your data).
For example:
with A (name1) as (select 'John' from dual union all
select 'Ana' from dual union all
select 'Joseph' from dual),
B (name2) as (select 'Bob' from dual union all
select 'Juan' from dual union all
select 'Nick' from dual union all
select 'Jess' from dual)
select name1, name2
from (select name1, row_number() over (order by name1) rn from A) aaa
full outer join
(select name2, row_number() over (order by name2) rn from B) bbb
on aaa.rn = bbb.rn;
Output:
NAME1  NAME2
------ -----
Ana    Bob
John   Jess
Joseph Juan
       Nick
You can assign a row number to each row in each subquery and use that as a pseudo-join-condition:
with a as (
select name1, row_number() over (order by ...) as rn
from group_a
where ...
),
b as (
select name2, row_number() over (order by ...) as rn
from group_b
where ...
)
select a.name1, b.name2
from a
full outer join b on b.rn = a.rn;
I've used subquery factoring (CTEs, or 'with' clauses) to make it easier to use (or follow, anyway) a full outer join between the two result sets, but that isn't necessary - you can use inline views if you prefer; and I've done a full outer join because you expect the two subqueries to have different numbers of rows, but you may not know which subquery will return more rows.
You aren't currently ordering the results in either subquery, so the values are in an arbitrary and indeterminate order. If you did want them ordered you'd do that inside the row_number() over (order by ...) part, replacing the ... with the ordering criteria, e.g. (order by name1). If you really don't want them ordered in a specific way you can order by a constant, or null, or a random value.
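To see what joining on a generated row number amounts to, here is a small Python sketch of the same pairing-by-position idea outside the database. It is an analogy rather than Oracle code: `sorted()` stands in for the ORDER BY inside row_number(), and `itertools.zip_longest` pads the shorter list with None, the role NULL plays in the full outer join. The function name `side_by_side` is hypothetical.

```python
from itertools import zip_longest

group_a = ["John", "Ana", "Joseph"]
group_b = ["Bob", "Juan", "Nick", "Jess"]

def side_by_side(left, right):
    # Position n in each sorted list corresponds to rn = n + 1 in the SQL version;
    # zip_longest keeps going until the longer list is exhausted.
    return list(zip_longest(sorted(left), sorted(right)))

paired = side_by_side(group_a, group_b)

for name1, name2 in paired:
    print(name1, name2)
```

Without the sort, the pairing would follow whatever order the lists happen to be in, which mirrors the indeterminate order of an unordered subquery.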

How to select all attributes (*) with distinct values in a particular column(s)?

Here is a link to the W3Schools database for learners:
W3School Database
If we execute the following query:
SELECT DISTINCT city FROM Customers
it returns a list of the distinct City values from the table.
What should we do if we want all the columns, as returned by a SELECT * FROM Customers query, but with only one row for each distinct City value?
DISTINCT, when used with multiple columns, is applied to all the columns together. So the combination of values across all selected columns is considered, not just one column.
If you want to have distinct values, then concatenate all the columns, which will make it distinct.
Or, you could group the rows using GROUP BY.
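You need to select all values from the customers table where city is unique. So, logically, I came up with this query:
SELECT * FROM `customers` WHERE `city` in (SELECT DISTINCT `city` FROM `customers`)
Note that every non-NULL city appears in the subquery, so this returns all such rows rather than one row per city.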
I think you want something like this:
(change PK field to your Customers Table primary key or index like Id)
In SQL Server (and standard SQL)
SELECT
*
FROM (
SELECT
*, ROW_NUMBER() OVER (PARTITION BY City ORDER BY PK) rn
FROM
Customers ) Dt
WHERE
(rn = 1)
In MySQL
SELECT
*
FROM (
SELECT
a.City, a.PK, count(*) as rn
FROM
Customers a
JOIN
Customers b ON a.City = b.City AND a.PK >= b.PK
GROUP BY a.City, a.PK ) As DT
WHERE (rn = 1)
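This query, I hope, will return each city once along with the other columns. Note that the MySQL version selects only City and PK; join its result back to Customers on PK to get the remaining columns.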
You can use GROUP BY clause for getting distinct values in a particular column. Consider the following table - 'contact':
+---------+------+---------+
| id | name | city |
+---------+------+---------+
| 1 | ABC | Chennai |
+---------+------+---------+
| 2 | PQR | Chennai |
+---------+------+---------+
| 3 | XYZ | Mumbai |
+---------+------+---------+
To select all columns with distinct values in City attribute, use the following query:
SELECT *
FROM contact
GROUP BY city;
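In MySQL with the ONLY_FULL_GROUP_BY mode disabled, this gives output such as the following (which row is returned for each city is not guaranteed, and many other databases reject the query):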
+---------+------+---------+
| id | name | city |
+---------+------+---------+
| 1 | ABC | Chennai |
+---------+------+---------+
| 3 | XYZ | Mumbai |
+---------+------+---------+
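Because SELECT * combined with GROUP BY city depends on database-specific behaviour, a more portable sketch is to choose the row per city explicitly, for example the one with the lowest id, and join back to it. The example below runs that idea on the 'contact' table with sqlite3; picking the lowest id is an assumption, since the question does not say which row should represent a city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contact (id, name, city) VALUES (?, ?, ?)",
    [(1, "ABC", "Chennai"), (2, "PQR", "Chennai"), (3, "XYZ", "Mumbai")],
)

# The derived table finds the lowest id per city; the join fetches that full row
one_per_city = conn.execute(
    """
    SELECT c.id, c.name, c.city
    FROM contact c
    JOIN (SELECT city, MIN(id) AS first_id
          FROM contact
          GROUP BY city) f
      ON c.id = f.first_id
    ORDER BY c.id
    """
).fetchall()

print(one_per_city)
```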

SQL query - select uncommon values from 2 tables

Today I was asked the following question in an interview for a QA position, and because my query was incorrect, I did not get selected. Since then, I have been itching to find the correct answer for the following scenario:
I was given the following 2 tables:
Table A | Table B
ID | ID
---------------
0 | 5
1 | 6
2 | 7
3 | 8
4 | 9
5 | 10
6 |
And the following output was expected using an SQL query:
ID
--
0
1
2
3
4
7
8
9
10
Thanks everyone, I really like this forum and from now on will be active here to learn more and more about SQL. I would like to make it my strong point rather than a weak one, so as not to get kicked out of other interviews. I know there is a long way to go. However, besides all of your responses, I drafted the following query and would like to know the experts' opinions about it (and the reasoning behind them):
BTW the query has worked on MSSQLSRV-2008 (using Union or Union All, didn't matter to the result I got):
select ID from A where ID not in (5,6)
union
select ID from B where ID not in (5,6)
Is this really an efficient query?
If you want values in only one of two tables, I would use a full outer join and condition:
select coalesce(a.id, b.id)
from tableA a full outer join
tableB b
on a.id = b.id
where a.id is null or b.id is null;
Of course, if the job is at a company that uses MS Access or MySQL, then this isn't the right answer, because these systems don't support full outer join. You can also do this in more complicated ways using union all and aggregation or even with other methods.
EDIT:
Here is another method:
select id
from (select a.id, 1 as isa, 0 as isb from tablea a union all
select b.id, 0, 1 from tableb b
) ab
group by id
having sum(isa) = 0 or sum(isb) = 0;
And another:
select a.id
from tablea a
where a.id not in (select id from tableb)
union all
select b.id
from tableb b
where b.id not in (select id from tablea);
As I think about this, it is a pretty good interview question (even though I've just given three reasonable answers).
Edit: See Gordon's answer above for a better query; this is a very inefficient way of doing what you want.
I think this should do the trick :
(SELECT * FROM A WHERE NOT id IN (SELECT A.id FROM A, B WHERE A.id = B.id))
UNION
(SELECT * FROM B WHERE NOT id IN (SELECT A.id FROM A, B WHERE A.id = B.id))
You could avoid the duplication of SELECT A.id ...by using a temporary table.
without full outer joins...
Select id
from (Select id from tableA
Union all
Select id from tableB) Z
group by id
Having count(*) = 1
or using Except and Intersect .....
(Select id from tableA Except Select id from tableB)
Union
(Select id from tableB Except Select id from tableA)
or ....
(Select id from tableA union Select id from tableB)
Except
(Select id from tableA intersect Select id from tableB)
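Two of the approaches from this thread can be cross-checked against the interview data with a short sqlite3 sketch. It loads ids 0 to 6 into tableA and 5 to 10 into tableB, runs the count-based query and the NOT IN query, and compares both with Python's set symmetric difference. The count-based form assumes an id never occurs twice within the same table, and NOT IN assumes the id columns contain no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER)")
conn.execute("CREATE TABLE tableB (id INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?)", [(i,) for i in range(0, 7)])
conn.executemany("INSERT INTO tableB VALUES (?)", [(i,) for i in range(5, 11)])

# Ids that occur exactly once across both tables
count_based = [
    r[0]
    for r in conn.execute(
        """
        SELECT id
        FROM (SELECT id FROM tableA
              UNION ALL
              SELECT id FROM tableB) z
        GROUP BY id
        HAVING COUNT(*) = 1
        ORDER BY id
        """
    )
]

# Ids in A but not in B, plus ids in B but not in A
not_in_based = [
    r[0]
    for r in conn.execute(
        """
        SELECT id FROM tableA WHERE id NOT IN (SELECT id FROM tableB)
        UNION ALL
        SELECT id FROM tableB WHERE id NOT IN (SELECT id FROM tableA)
        ORDER BY id
        """
    )
]

# Reference result computed in Python
expected = sorted(set(range(0, 7)) ^ set(range(5, 11)))

print(count_based)
print(not_in_based)
```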