SQL DISTINCT Value Question - sql

How can I filter my results in a query? For example, I have 5 records:
John,Smith,apple
Jane,Doe,apple
Fred,James,apple
Bill,evans,orange
Willma,Jones,grape
Now I want a query that returns 3 records, one per DISTINCT fruit, but (and here is the tricky part) I still want the First Name and Last Name columns. PS: I do not care which of the matching rows it returns for each fruit, but it must return only 3 rows (or however many DISTINCT fruits there are).
An example return would be:
John,Smith,apple
Bill,evans,orange
Willma,Jones,grape
Thanks in advance I've been banging my head on this all day.

Oddly enough, the best solution doesn't involve GROUP BY.
WITH DistinctFruit AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY Fruit ORDER BY LastName) AS FruitNo,
LastName,
FirstName,
Fruit
FROM table)
SELECT FirstName, LastName, Fruit
FROM DistinctFruit
WHERE FruitNo = 1;
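To see what the ROW_NUMBER() approach returns, here is a minimal sketch that runs the same query against the question's five rows. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is needed for window functions). The table name t1 is borrowed from the test data further down, and the final ORDER BY fruit is added only to make the output order predictable.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (firstname TEXT, lastname TEXT, fruit TEXT);
    INSERT INTO t1 VALUES
        ('John', 'Smith', 'apple'),
        ('Jane', 'Doe', 'apple'),
        ('Fred', 'James', 'apple'),
        ('Bill', 'evans', 'orange'),
        ('Willma', 'Jones', 'grape');
""")

# Number the rows inside each fruit group, then keep only row 1 of each group.
query = """
    WITH DistinctFruit AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY lastname) AS FruitNo,
               firstname, lastname, fruit
        FROM t1
    )
    SELECT firstname, lastname, fruit
    FROM DistinctFruit
    WHERE FruitNo = 1
    ORDER BY fruit
"""
one_per_fruit = conn.execute(query).fetchall()
for row in one_per_fruit:
    print(row)
```

Because the window orders by lastname, the apple row that survives is Jane Doe rather than John Smith, which is fine since the question does not care which one comes back.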

If you have a small amount of data (not tens of thousands of rows), you can do sub-queries.
select distinct t1.fruit as Fruit,
(select top 1 t2.lastname
from t1 as t2
where t1.fruit = t2.fruit
order by t2.lastname) as LastName,
(select top 1 t2.firstname
from t1 as t2
where t1.fruit = t2.fruit
order by t2.lastname, t2.firstname) as FirstName
from t1
Note the FirstName column is sorted the same as the LastName column. This will give you a matching last name with the correct first name.
Here is my test data:
create table t1
(firstname varchar(20),
lastname varchar(20),
fruit varchar(20))
insert into t1
values ('John','Smith','apple')
insert into t1
values ('Jane','Doe','apple')
insert into t1
values ('Fred','James','apple')
insert into t1
values ('Bill','evans','orange')
insert into t1
values ('Willma','Jones','grape')

Just another solution
select distinct x.*,fruit from t1
cross apply
(select top 1 firstname, lastname from t1 t2 where t1.fruit=t2.fruit) x

SELECT DISTINCT x.*,fruit FROM peopleFruit pf
CROSS APPLY
(SELECT TOP 1 firstname, lastname FROM peopleFruit pf1 WHERE pf.fruit=pf1.fruit) x

Related

SQL - How to Order By in UNION query

Is there a way to union two tables, but keep the rows from the first table appearing first in the result set? However orderby column is not in select query
For example:
Table 1
name surname
-------------------
John Doe
Bob Marley
Ras Tafari
Table 2
name surname
------------------
Lucky Dube
Abby Arnold
Expected Result:
name surname
-------------------
John Doe
Bob Marley
Ras Tafari
Lucky Dube
Abby Arnold
I am bringing Data by following query
SELECT name,surname FROM TABLE 1 ORDER BY ID
UNION
SELECT name,surname FROM TABLE 2
The above query does not keep the ORDER BY after the UNION.
P.S. I don't want to show ID in my select query.
I get the ORDER BY column by joining tables. The following is my real query:
SELECT tbl_Event_Type_Sort_Orders.Appraisal_Event_Type_ID AS Appraisal_Event_Type_ID , ISNULL(tbl_Appraisal_Event_Types.Appraisal_Event_Type_Display_Name, 'UnCategorized') AS Appraisal_Event_Type_Display_Name
INTO #temptbl
FROM tbl_Event_Type_Sort_Orders
INNER JOIN tbl_Appraisal_Event_Types
ON tbl_Event_Type_Sort_Orders.Appraisal_Event_Type_ID = tbl_Appraisal_Event_Types.Appraisal_Event_Type_ID
WHERE 1=1
AND User_Name='abc'
ORDER BY tbl_Event_Type_Sort_Orders.Sort_Order
SELECT * FROM #temptbl
UNION
SELECT DISTINCT (tbl_Appraisal_Event_Types.Appraisal_Event_Type_ID) AS Appraisal_Event_Type_ID , ISNULL(tbl_Appraisal_Event_Types.Appraisal_Event_Type_Display_Name, 'UnCategorized') AS Appraisal_Event_Type_Display_Name
FROM tbl_Appraisal_Event_Types
INNER JOIN tbl_Appraisal_Events
ON tbl_Appraisal_Event_Types.Appraisal_Event_Type_ID = tbl_Appraisal_Events.Event_Type_ID
INNER JOIN tbl_Appraisals
ON tbl_Appraisal_Events.Appraisal_ID = tbl_Appraisals.Appraisal_ID
WHERE 1=1
AND ((tbl_Appraisals.Assigned_To_Staff_User) = 'abc' OR (tbl_Appraisals.Assigned_To_Staff_User2) = 'abc' OR (tbl_Appraisals.Assigned_To_Staff_User3) = 'abc')
Put a UNION ALL in a derived table. To keep the duplicate elimination, use SELECT DISTINCT in each branch and add a NOT EXISTS to the second select, so that a person found in both tables is not returned twice:
select name, surname
from
(
select distinct name, surname, 1 as tno
from table1
union all
select distinct name, surname, 2 as tno
from table2 t2
where not exists (select * from table1 t1
where t2.name = t1.name
and t2.surname = t1.surname)
) dt
order by tno, surname, name
You can use a column for the table and one for the ID to order by:
SELECT x.name, x.surname FROM (
SELECT ID, TableID = 1, name, surname
FROM table1
UNION ALL
SELECT ID = -1, TableID = 2, name, surname
FROM table2
) x
ORDER BY x.TableID, x.ID
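Here is a small sketch of the TableID idea on the question's sample names, again using Python's sqlite3 with an in-memory SQLite database as a stand-in for SQL Server (an assumption). The ID columns are auto-assigned here, and unlike the query above, this version keeps table2's own ID instead of the constant -1 so the second table is ordered by ID as well.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, name TEXT, surname TEXT);
    CREATE TABLE table2 (ID INTEGER PRIMARY KEY, name TEXT, surname TEXT);
    INSERT INTO table1 (name, surname)
        VALUES ('John', 'Doe'), ('Bob', 'Marley'), ('Ras', 'Tafari');
    INSERT INTO table2 (name, surname)
        VALUES ('Lucky', 'Dube'), ('Abby', 'Arnold');
""")

# TableID tags each branch; the outer query sorts by it but does not select it.
query = """
    SELECT x.name, x.surname
    FROM (
        SELECT ID, 1 AS TableID, name, surname FROM table1
        UNION ALL
        SELECT ID, 2 AS TableID, name, surname FROM table2
    ) AS x
    ORDER BY x.TableID, x.ID
"""
ordered_rows = conn.execute(query).fetchall()
for row in ordered_rows:
    print(row)
```

The sort columns live only in the derived table, so ID never appears in the final result.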
You can write it as below. If you are OK with duplicate rows, use UNION ALL instead; it will be faster:
SELECT NAME, surname FROM (
SELECT ID,name,surname FROM TABLE 1
UNION
SELECT ID,name,surname FROM TABLE 2 ) t ORDER BY ID
This will order the rows of the first table first, then by anything else you need (I haven't tested the code):
;with cte_1
as
(SELECT ID,name,surname,1 as table_id FROM TABLE 1
UNION
SELECT ID,name,surname,2 as table_id FROM TABLE 2 )
SELECT name, surname
FROM cte_1
ORDER BY table_id,ID
Simply use UNION without an ORDER BY (note that the row order is then not guaranteed):
SELECT name,surname FROM TABLE 1
UNION
SELECT name,surname FROM TABLE 2
If you want to order the first table, use the query below.
;WITH cte_1
AS
(SELECT name,surname,ROW_NUMBER()OVER(ORDER BY Id)b FROM TABLE 1 )
SELECT name,surname
FROM cte_1
UNION
SELECT name,surname
FROM TABLE 2

How do you make a column value slide to the right or the left using SQL?

This was an interview question.
If I have a table like this:
ID FirstName LastName
-- --------- --------
1 Aaron Aames
2 Malcolm Middle
3 Zamon Zorr
How can I get output that looks like this?
Aaron Aames
Aames Malcolm
Malcolm Middle
Middle Zamon
Zamon Zorr
Note: If you need a specific dialect to do it, use T-SQL.
Here is another way using a self-join.
CREATE TABLE temp (ID INT IDENTITY, FirstName VARCHAR(25), LastName VARCHAR(25));
INSERT INTO temp VALUES
(N'Aaron', N'Aames'),
(N'Malcolm', N'Middle'),
(N'Zamon', N'Zorr');
WITH names(ID, Name, ColNum) AS(
SELECT
ID, FirstName, 1
FROM temp
UNION ALL
SELECT
ID, LastName, 2
FROM temp
),
numbered AS(
SELECT
rn = ROW_NUMBER() OVER(ORDER BY ID, ColNum),
Name
FROM names
)
SELECT
n.Name AS Name1, n2.Name AS Name2
FROM numbered n
INNER JOIN numbered n2
ON n.rn = n2.rn - 1
DROP TABLE temp
http://sqlfiddle.com/#!3/d91c4/2
You have really high reputation, so this isn't just a "they asked me at an interview" kind of question.
There are several approaches. I think the one that I would take is a union all. Recognize that every other row is from the table. The rest are from joining one row to the next. So, that suggests:
select firstname, lastname
from likethis t
union all
select t.lastname, lead(t.firstname) over (order by id)
from likethis t
Alas, this gives you six rows instead of five, so that last one needs to be filtered out:
select firstname, lastname
from (select firstname, lastname
from likethis t
union all
select t.lastname, lead(t.firstname) over (order by id)
from likethis t
) t
where lastname is not null
order by firstname;
Note: I cannot determine if the sort criteria is alphabetical or by id; these solutions assume it is alphabetical.
Second note: I'm guessing this is not the solution they have in mind. They probably are looking for a self-join. But why bother when lead() does the work for you.
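For anyone who wants to try the lead() idea, here is a minimal sketch using Python's sqlite3 with an in-memory SQLite database as a stand-in for T-SQL (an assumption; SQLite 3.25 or later is needed for LEAD). It differs from the query above in two labeled ways: the shifted rows are computed in a CTE named shifted, and the hypothetical helper columns part, left_name and right_name are added so the result can be ordered by id instead of alphabetically.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for T-SQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE likethis (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    INSERT INTO likethis VALUES
        (1, 'Aaron', 'Aames'),
        (2, 'Malcolm', 'Middle'),
        (3, 'Zamon', 'Zorr');
""")

# part = 1 marks the original rows, part = 2 the rows that pair a last name
# with the next row's first name. The last shifted row has no next row,
# so its right_name is NULL and it is filtered out.
query = """
    WITH shifted AS (
        SELECT id,
               lastname AS left_name,
               LEAD(firstname) OVER (ORDER BY id) AS right_name
        FROM likethis
    )
    SELECT left_name, right_name
    FROM (
        SELECT id, 1 AS part, firstname AS left_name, lastname AS right_name
        FROM likethis
        UNION ALL
        SELECT id, 2 AS part, left_name, right_name
        FROM shifted
    ) AS t
    WHERE right_name IS NOT NULL
    ORDER BY id, part
"""
slid_rows = conn.execute(query).fetchall()
for row in slid_rows:
    print(row)
```

Ordering by id and part interleaves the original and shifted rows, which reproduces the row order shown in the question.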
I think it could be solved this way:
SELECT
t.LastName AS FirstName, t2.FirstName AS LastName
FROM
t
INNER JOIN t as t2 ON t2.ID - 1 = t.ID
UNION
SELECT
t3.FirstName, t3.LastName
FROM t AS t3
As far as I can tell, it should produce the final result set as follows:
Aaron Aames <= originates from t3: where t3.ID = 1
Aames Malcolm <= originates from (t1, t2) Join: where t2.ID = 2 and t.ID = 1
Malcolm Middle <= originates from t3: where t3.ID = 2
Middle Zamon <= originates from (t1, t2) Join: where t2.ID = 3 and t.ID = 2
Zamon Zorr <= originates from t3: where t3.ID = 3

TSQL merge 2 datasets with even number of rows next to each other

What I am trying to accomplish:
Dataset 1
Name1
Name2
Name3
Dataset 2
Number1
Number2
Number3
will become 2 columns:
dataset1 dataset2
Name1 Number1
Name2 Number2
Name3 Number3
My datasets 1 & 2 will always have an equal number of rows.
I don't care which name is linked to which number, as long as no two names are linked to the same number and vice versa.
How can I solve this with SQL / SQL Server ?
If you don't want to add an identity column to the tables, you can use the ROW_NUMBER() function like this:
SELECT
T1.Col1,
T2.Col1
FROM
(SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table1) T1
INNER JOIN
(SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table2) T2
ON T1.N = T2.N
Here, replace Table1 and Table2 with the name of your tables, and replace Col1 with the name of the column (or columns) that you want to output from the two tables.
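A quick sketch of the ROW_NUMBER() pairing on the question's sample values, using Python's sqlite3 with an in-memory SQLite database as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is needed for window functions). The names Table1, Table2 and Col1 are the placeholders from the query above, and the trailing ORDER BY is added only for stable output.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 TEXT);
    CREATE TABLE Table2 (Col1 TEXT);
    INSERT INTO Table1 VALUES ('Name1'), ('Name2'), ('Name3');
    INSERT INTO Table2 VALUES ('Number1'), ('Number2'), ('Number3');
""")

# Each side gets its own 1..n numbering; joining on it pairs row k with row k.
query = """
    SELECT T1.Col1, T2.Col1
    FROM (SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table1) AS T1
    INNER JOIN
         (SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table2) AS T2
      ON T1.N = T2.N
    ORDER BY T1.N
"""
pairs = conn.execute(query).fetchall()
for pair in pairs:
    print(pair)
```

Since both numberings run from 1 to the same row count, every name is matched with exactly one number and no value is reused.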
Add identity columns to both tables and join on those columns:
ALTER TABLE Table1
ADD ID INT IDENTITY(1,1) NOT NULL
ALTER TABLE Table2
ADD ID INT IDENTITY(1,1) NOT NULL
SELECT Table1.dataset1col , Table2.dataset2Col
From Table1 INNER JOIN Table2
ON Table1.ID = Table2.ID
This may work for you:
;WITH cte1 (name, rn)
AS (SELECT Name,
row_number()
OVER(
ORDER BY Name) rn
FROM Dataset1),
cte2 (Number, rn)
AS (SELECT Number,
row_number()
OVER(
ORDER BY Number) rn
FROM Dataset2)
SELECT name,
Number
FROM cte1
JOIN cte2
ON cte1.rn = cte2.rn
WITH Table1 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Dataset1) as Rnk, Dataset1
FROM TA1
),
Table2 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Dataset2) as Rnk, Dataset2
FROM TA2
)
Select Table1.Dataset1 as 'DataSet1', Table2.DataSet2 as 'DataSet2'
From Table1
inner join Table2 on Table1.Rnk = Table2.Rnk
Since you haven't given the table names, I've assumed TA1 and TA2.
Another way of writing the query is:
select row_number() over (order by Names asc) as rownum,
Names
into #Temp1
from NameTable
select row_number() over (order by Numbers asc) as rownum,
Numbers
into #Temp2
from NumberTable
select Names, Numbers
from #Temp1
inner join #Temp2 on #Temp1.rownum = #Temp2.rownum
There are 3 possible solutions to this.
First: Use following trick (Warning: Use this in case of small datasets)
SELECT DISTINCT tbl1.col1, tbl2.col2
FROM
(SELECT FirstName AS col1, ROW_NUMBER() OVER (ORDER BY FirstName) Number FROM dbo.[User]) tbl1
INNER JOIN
(SELECT LastName AS col2, ROW_NUMBER() OVER (ORDER BY LastName) Number FROM dbo.[User]) tbl2
ON tbl1.Number = tbl2.Number
Second: Use table variables to store the result temporarily. This solution is for relatively large datasets (records in the hundreds, approximately).
Third:
Use an identity field in both tables, as already mentioned by mmhasannn. But I prefer this method least, since it requires modifying the DB structure.
RECOMMENDED: Use the table variables approach

Show all rows that have certain columns duplicated

suppose I have following sql table
objid firstname lastname active
1 test test 0
2 test test 1
3 test1 test1 1
4 test2 test2 0
5 test2 test2 0
6 test3 test3 1
Now, the result I am interested in is as follows:
objid firstname lastname active
1 test test 0
2 test test 1
4 test2 test2 0
5 test2 test2 0
How can I achieve this?
I have tried the following query,
select firstname,lastname from table
group by firstname,lastname
having count(*) > 1
But this query gives results like
firstname lastname
test test
test2 test2
You've found your duplicated records but you're interested in getting all the information attached to them. You need to join your duplicates to your main table to get that information.
select *
from my_table a
join ( select firstname, lastname
from my_table
group by firstname, lastname
having count(*) > 1 ) b
on a.firstname = b.firstname
and a.lastname = b.lastname
This is the same as an inner join: for every record in the sub-query that found the duplicates, you get every row from the main table that has the same firstname and lastname combination.
You can also do this with in, though you should test the difference (row-value IN works in Oracle, MySQL and PostgreSQL, but not in SQL Server):
select *
from my_table a
where ( firstname, lastname ) in
( select firstname, lastname
from my_table
group by firstname, lastname
having count(*) > 1 )
Further Reading:
A visual representation of joins from Coding Horror
Join explanation from Wikipedia
SELECT DISTINCT t1.*
FROM myTable AS t1
INNER JOIN myTable AS t2
ON t1.firstname = t2.firstname
AND t1.lastname = t2.lastname
AND t1.objid <> t2.objid
This will output every row which has a duplicate, based on firstname and lastname.
Here's a little more legible way to do Ben's first answer:
WITH duplicates AS (
select firstname, lastname
from my_table
group by firstname, lastname
having count(*) > 1
)
SELECT a.*
FROM my_table a
JOIN duplicates b ON (a.firstname = b.firstname and a.lastname = b.lastname)
SELECT user_name,email_ID
FROM User_Master WHERE
email_ID
in (SELECT email_ID
FROM User_Master GROUP BY
email_ID HAVING COUNT(*)>1)
A nice option to get all duplicated values from a table:
select * from Employee where Name in (select Name from Employee group by Name having COUNT(*)>1)
This is the easiest way:
SELECT * FROM yourtable a WHERE EXISTS (SELECT * FROM yourtable b WHERE a.firstname = b.firstname AND a.lastname = b.lastname AND a.objid <> b.objid)
If you want to print all duplicate IDs from the table:
select * from table where id in (select id from table group By id having count(id)>1)
I'm surprised that there is no answer using Window function. I just came across this use case and this helped me.
select t.objid, t.firstname, t.lastname, t.active
from
(
select t.*, count(*) over (partition by firstname, lastname) as cnt
from my_table t
) t
where t.cnt > 1;
Fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=c0cc3b679df63c4d7d632cbb83a9ef13
The format goes like
select
tbl.relevantColumns
from
(
select t.*, count(*) over (partition by key_columns) as cnt
from desiredTable t
) as tbl
where tbl.cnt > 1;
This format selects whatever columns you require from the table (sometimes all columns) where the count > 1 for the key_columns being used to identify the duplicate rows. key_columns can be any number of columns.
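To check that the window-function form returns the rows the question asks for, here is a small sketch on the question's six rows. It uses Python's sqlite3 with an in-memory SQLite database as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is needed for window functions), and it also runs the join-on-GROUP-BY form from the earlier answer for comparison. The ORDER BY clauses are added only for stable output.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        objid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, active INTEGER
    );
    INSERT INTO my_table VALUES
        (1, 'test', 'test', 0),
        (2, 'test', 'test', 1),
        (3, 'test1', 'test1', 1),
        (4, 'test2', 'test2', 0),
        (5, 'test2', 'test2', 0),
        (6, 'test3', 'test3', 1);
""")

# Window form: attach the size of each (firstname, lastname) group to every row.
window_query = """
    SELECT t.objid, t.firstname, t.lastname, t.active
    FROM (
        SELECT m.*, COUNT(*) OVER (PARTITION BY firstname, lastname) AS cnt
        FROM my_table AS m
    ) AS t
    WHERE t.cnt > 1
    ORDER BY t.objid
"""

# Join form: find the duplicated keys first, then join back for the full rows.
join_query = """
    SELECT a.objid, a.firstname, a.lastname, a.active
    FROM my_table AS a
    JOIN (
        SELECT firstname, lastname
        FROM my_table
        GROUP BY firstname, lastname
        HAVING COUNT(*) > 1
    ) AS b
      ON a.firstname = b.firstname AND a.lastname = b.lastname
    ORDER BY a.objid
"""
window_rows = conn.execute(window_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
for row in window_rows:
    print(row)
```

On this data both forms return objid 1, 2, 4 and 5, which is the result set requested in the question.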
This answer may not be great one, but I think it is simple to understand.
SELECT * FROM table1 WHERE (firstname, lastname) IN ( SELECT firstname, lastname FROM table1 GROUP BY firstname, lastname having count(*) > 1);
This query returns the duplicates, excluding the first row (lowest objid) of each group:
SELECT * FROM (
SELECT a.*
FROM table a
WHERE (`firstname`,`lastname`) IN (
SELECT `firstname`,`lastname` FROM table
GROUP BY `firstname`,`lastname` HAVING COUNT(*)>1
)
)z WHERE z.`objid` NOT IN (
SELECT MIN(`objid`) FROM table
GROUP BY `firstname`,`lastname` HAVING COUNT(*)>1
)
Please try
WITH cteTemp AS (
SELECT EmployeeID, JoinDT,
row_number() OVER(PARTITION BY EmployeeID, JoinDT ORDER BY EmployeeID) AS [RowFound]
FROM dbo.Employee
)
SELECT * FROM cteTemp WHERE [RowFound] > 1 ORDER BY JoinDT

Find duplicates in SQL

I have a large table with the following data on users.
social security number
name
address
I want to find all possible duplicates in the table
where the ssn is equal but the name is not
My attempt is:
SELECT * FROM Table t1
WHERE (SELECT count(*) from Table t2 where t1.name <> t2.name) > 1
A grouping on SSN should do it
SELECT
ssn
FROM
Table t1
GROUP BY
ssn
HAVING COUNT(*) > 1
...or, if you have many rows per ssn and only want to find duplicate names:
...
HAVING COUNT(DISTINCT name) > 1
Edit: oops, I misunderstood.
SELECT
ssn
FROM
Table t1
GROUP BY
ssn
HAVING MIN(name) <> MAX(name)
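A minimal sketch of the MIN(name) <> MAX(name) test, using Python's sqlite3 with an in-memory SQLite database as a stand-in for the real system (an assumption). The table name users and all of its rows are made-up illustration data, not from the question. The second query, which pulls the full rows for the flagged ssn values, is an addition for convenience.

```python
import sqlite3

# Assumption: in-memory SQLite and a hypothetical users table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (ssn TEXT, name TEXT, address TEXT);
    INSERT INTO users VALUES
        ('111', 'Ann Lee', '1 Main St'),
        ('111', 'Ann Lee', '2 Oak Ave'),
        ('222', 'Bob Ray', '3 Elm St'),
        ('222', 'Rob Ray', '3 Elm St'),
        ('333', 'Cy Poe', '4 Fir Rd');
""")

# An ssn is suspect when its smallest and largest name differ,
# i.e. when at least two distinct names share that ssn.
ssn_query = """
    SELECT ssn
    FROM users
    GROUP BY ssn
    HAVING MIN(name) <> MAX(name)
    ORDER BY ssn
"""
suspect_ssns = [row[0] for row in conn.execute(ssn_query)]

# Fetch the full rows behind each suspect ssn.
detail_query = """
    SELECT ssn, name
    FROM users
    WHERE ssn IN (SELECT ssn FROM users GROUP BY ssn HAVING MIN(name) <> MAX(name))
    ORDER BY ssn, name
"""
suspect_rows = conn.execute(detail_query).fetchall()
print(suspect_ssns)
print(suspect_rows)
```

Here ssn 111 appears twice but always with the same name, so it is not flagged; only ssn 222, which has two different names, comes back.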
This will handle more than two records with duplicate ssn's:
select t1.ssn, t1.name, count(*) as name_count
from table t1, (
select count(*) ssn_count, ssn
from table
group by ssn
having count(*) > 1
) t2
where t1.ssn = t2.ssn
group by t1.ssn, t1.name, t2.ssn_count
having count(*) <> t2.ssn_count