Querying SQL table with different values in same column with same ID - sql

I have an SQL Server 2012 table with ID, First Name and Last name. The ID is unique per person but due to an error in the historical feed, different people were assigned the same id.
------------------------------
ID FirstName LastName
------------------------------
1 ABC M
1 ABC M
1 ABC M
1 ABC N
2 BCD S
3 CDE T
4 DEF T
4 DEG T
In this case, the people with ID’s 1 are different (their last name is clearly different) but they have the same ID. How do I query and get the result? The table in this case has millions of rows. If it was a smaller table, I would probably have queried all ID’s with a count > 1 and filtered them in an excel.
What I am trying to do is, get a list of all such ID's which have been assigned to two different users.
Any ideas or help would be very appreciated.
Edit: I dont think I framed the question very well.
There are two ID's which are present multiple time. 1 and 4. The rows with id 4 are identical. I dont want this in my result. The rows with ID 1, although the first name is same, the last name is different for 1 row. I want only those ID's whose ID is same but one of the first or last names is different.
I tried loading ID's which have multiple occurrences into a temp table and tried to compare it against the parent table albeit unsuccessfully. Any other ideas that I can try and implement?

SELECT
ID
FROM
<<Table>>
GROUP BY
ID
HAVING
COUNT(*) > 1;

SELECT *
FROM myTable
WHERE ID IN (
SELECT ID
FROM myTable
GROUP BY ID
HAVING MAX(LastName) <> MIN(LastName) OR MAX(FirstName) <> MIN(FirstName)
)
ORDER BY ID, LASTNAME

Related

How to create a new table that only keeps rows with more than 5 data records under the same id in Bigquery

I have a table like this:
Id
Date
Steps
Distance
1
2016-06-01
1000
1
There are over 1000 records and 50 Ids in this table, most ids have about 20 records, and some ids only have 1, or 2 records which I think are useless.
I want to create a table that excludes those ids with less than 5 records.
I wrote this code to find the ids that I want to exclude:
SELECT
Id,
COUNT(Id) AS num_id
FROM `table`
GROUP BY
Id
ORDER BY
num_id
Since there are only two ids I need to exclude, I use WHERE clause:
CREATE TABLE `` AS
SELECT
*
FROM ``
WHERE
Id <> 2320127002
AND Id <> 7007744171
Although I can get the result I want, I think there are better ways to solve this kind of problem. For example, if there are over 20 ids with less than 5 records in this table, what shall I do? Thank you.
Consider this:
CREATE TABLE `filtered_table` AS
SELECT *
FROM `table`
WHERE TRUE QUALIFY COUNT(*) OVER (PARTITION BY Id) >= 5
Note: You can remove WHERE TRUE if it runs successfully without it.

Is there a way to return a partial blank row in SQL when joining tables through automation?

For my project, I have 2 tables. Initially I inner joined both tables (table 1 and 2) via an inner join. However, I wanted an outcome as seen in table 4 where the repeated value from table 1 is left blank instead.
For table 2, the number of rows will always vary. There will always only be 1 department ID attached to numerous function IDs. Is there a query then where regardless of the number of function IDs, the department ID will only appear as the first row as seen in table 4?
(I think to many, this might seem weird and frankly not clean data, but for mail merges within word, it is easier to field code when the data is presented this way to refrain sections from 'reprinting itself'.)
Current Code:
SELECT Table1.*, Table2.* FROM
INNER JOIN Table 2 ON Table1.DepartmentID = Table2.DepartmentID
Table 1:
Department ID
Department
1
XYZ
Table 2:
Department ID
Function ID
Function
1
1
ABC
1
2
DEF
Table 3 (inner joined):
Department ID
Department
FunctionID
Function
1
XYZ
1
ABC
1
XYZ
2
DEF
Table 4 (desired):
Department ID
Department
FunctionID
Function
1
XYZ
1
ABC
2
DEF

Count and Group By Returning Different Sets - Halfway solved :/

Can anyone help? My trouble is that people might have the same id or different and have different name spellings. If I group by id (which is not the primary key) I get a different amount of rows than if I group by ID and name. How do I just group by ID, while still having the ID and Name in the select?
Create Table Client(ID Int, Name Varchar(15))
Insert Into Client VALUES(11,'Batman'),(22,'Batman'),(33,'Robin'),(44,'Joker'),(44,'The Joker'),(33,'Robin')
Select Count(ID) From Client
Select * From Client
--This returns 4 rows as it should
Select Count (ID)
From Client
Group By ID
--This returns 5 rows because Joker and The Joker have different names, but the same ID. I want to count by ID and not the name, since so many have typos.
Select Count (ID), [Name] , ID
From Client
Group By ID, [Name]
How do I do this and have it work?
Select Count (ID), [Name] , ID
From Client
Group By ID --<< Always throws and error unless I include Name, which
--returns too many rows.
It should return
Count Name ID
1 Batman 11
1 Batman 22
2 Joker 44 --<< Correct
2 Robin 33
And not
Count Name ID
1 Batman 11
1 Batman 22
2 Robin 33
1 Joker 44 --Wrong
1 The Joker 44 --Wrong
using select count(*) from ClientLog will tell you exactly how many records there are in your table. If your ID field is the primary key, then select count(ID) from ClientLog should return the same number.
Your first query is a little confusing, because you're grouping by ID but not displaying the ID. So you're likely getting a row for each record, where the row value is 1.
Your 2nd query is also a bit confusing, because there's no aggregation happening (since your ID field is unique).
What specifically are you trying to obtain in your query (if anything besides just how many records you have in your table)?

SQL get differences in one column by ID

It's hard for me to word what I want which is why I've had trouble researching this issue. What I want is to look at a table by id and see if another column changes:
id name
---- ------
1 Al
2 Mia
1 Al
2 Jean
In the example, I don't care about id 1 because the name always stayed as Al but I care about id 2 because there is a record with the name Mia but then, that id 2 also has a record with the name Jean. I was thinking of using group by somehow but that doesn't work. Any ideas?
Try this:
SELECT id
FROM mytable
GROUP BY id
HAVING MIN(name) <> MAX(name)
This will select all ids having at least two different values.

I DISTINCTly hate MySQL (help building a query)

This is staight forward I believe:
I have a table with 30,000 rows. When I SELECT DISTINCT 'location' FROM myTable it returns 21,000 rows, about what I'd expect, but it only returns that one column.
What I want is to move those to a new table, but the whole row for each match.
My best guess is something like SELECT * from (SELECT DISTINCT 'location' FROM myTable) or something like that, but it says I have a vague syntax error.
Is there a good way to grab the rest of each DISTINCT row and move it to a new table all in one go?
SELECT * FROM myTable GROUP BY `location`
or if you want to move to another table
CREATE TABLE foo AS SELECT * FROM myTable GROUP BY `location`
Distinct means for the entire row returned. So you can simply use
SELECT DISTINCT * FROM myTable GROUP BY 'location'
Using Distinct on a single column doesn't make a lot of sense. Let's say I have the following simple set
-id- -location-
1 store
2 store
3 home
if there were some sort of query that returned all columns, but just distinct on location, which row would be returned? 1 or 2? Should it just pick one at random? Because of this, DISTINCT works for all columns in the result set returned.
Well, first you need to decide what you really want returned.
The problem is that, presumably, for some of the location values in your table there are different values in the other columns even when the location value is the same:
Location OtherCol StillOtherCol
Place1 1 Fred
Place1 89 Fred
Place1 1 Joe
In that case, which of the three rows do you want to select? When you talk about a DISTINCT Location, you're condensing those three rows of different data into a single row, there's no meaning to moving the original rows from the original table into a new table since those original rows no longer exist in your DISTINCT result set. (If all the other columns are always the same for a given Location, your problem is easier: Just SELECT DISTINCT * FROM YourTable).
If you don't care which values come from the other columns you can use a (bad, IMHO) MySQL extension to SQL and do:
SELECT * FROM YourTable GROUP BY Location
which will give a result set with one row per location and values for the other columns derived from the original data in an undefined fashion.
Multiple rows with identical values in all columns don't have any sense. OK - the question might be a way to correct exactly that situation.
Considering this table, with id being the PK:
kram=# select * from foba;
id | no | name
----+----+---------------
2 | 1 | a
3 | 1 | b
4 | 2 | c
5 | 2 | a,b,c,d,e,f,g
you may extract a sample for every single no (:=location) by grouping over that column, and selecting the row with minimum PK (for example):
SELECT * FROM foba WHERE id IN (SELECT min (id) FROM foba GROUP BY no);
id | no | name
----+----+------
2 | 1 | a
4 | 2 | c