SQL add IDENTITY based on grouped column value

I have data like this:
FirstName LastName
El Even
Mike Wheeler
Mike Byers
Dustin Henderson
My desired output adds an ID to each unique FirstName:
ID FirstName LastName
1 El Even
2 Mike Wheeler
2 Mike Byers
3 Dustin Henderson
The way I do this is:
/* part 1 */
SELECT IDENTITY(int, 1,1) AS ID, FirstName
INTO TabTemp FROM TabName
GROUP BY FirstName
/* part 2 */
SELECT B.ID, A.FirstName, A.LastName
INTO TabTempFinal FROM TabName A, TabTemp B
WHERE A.FirstName = B.FirstName
My question: can I achieve the same result without part 2?

You could just use the dense_rank window function:
SELECT DENSE_RANK() OVER (ORDER BY first_name) AS id,
first_name,
last_name
FROM tabname

Not really. You can't add an identity column to a table where the identity is not supposed to be unique. That is not how identity columns work.
But, you can use dense_rank() instead:
SELECT DENSE_RANK() OVER (ORDER BY FirstName) AS ID,
FirstName, LastName
INTO TabTempFinal
FROM TabName ;
This generates the unique indicator for each FirstName. The only difference is that the column is not an identity column.
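As a quick check of the DENSE_RANK approach, here is a minimal sketch that runs the same query against the sample data from the question. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for the real server, which is an assumption made only for illustration; window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TabName (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO TabName VALUES (?, ?)",
    [("El", "Even"), ("Mike", "Wheeler"), ("Mike", "Byers"), ("Dustin", "Henderson")],
)

# DENSE_RANK gives equal FirstName values the same number, with no gaps.
rows = conn.execute(
    """
    SELECT DENSE_RANK() OVER (ORDER BY FirstName) AS ID, FirstName, LastName
    FROM TabName
    ORDER BY ID, LastName
    """
).fetchall()

for row in rows:
    print(row)
```

Note that the IDs follow the sort order of FirstName, so Dustin gets 1, El gets 2, and both Mike rows get 3. The grouping matches the desired output, but the numbering does not follow the original row order.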

You could do an update with a CTE using DENSE_RANK:
WITH cte AS (
SELECT ID, DENSE_RANK() OVER (ORDER BY FirstName) AS dr
FROM TabName
)
UPDATE cte
SET ID = dr;
This assumes that you already have a column called ID. The CTE has to select that column so the UPDATE can set it.
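Updating through a CTE is SQL Server syntax. For engines without it, a rough equivalent is a correlated subquery that counts the distinct names sorting at or before the current one, which yields the same dense rank. The sketch below swaps in that technique and runs it on an in-memory SQLite database via Python's sqlite3 module (an assumption for illustration), with an ID column already present as the answer requires.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TabName (ID INTEGER, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO TabName (FirstName, LastName) VALUES (?, ?)",
    [("El", "Even"), ("Mike", "Wheeler"), ("Mike", "Byers"), ("Dustin", "Henderson")],
)

# Dense rank of a name = number of distinct names that sort at or before it.
conn.execute(
    """
    UPDATE TabName
    SET ID = (SELECT COUNT(DISTINCT t2.FirstName)
              FROM TabName AS t2
              WHERE t2.FirstName <= TabName.FirstName)
    """
)

updated = conn.execute(
    "SELECT ID, FirstName, LastName FROM TabName ORDER BY ID, LastName"
).fetchall()
print(updated)
```

On a large table the correlated subquery can be slow, so the CTE form is likely preferable where it is available.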

Related

SQL Server : finding duplicates based on first few characters on column

I want to find duplicates based on the first three characters of the surname. Is there a way to do that in SQL? I can compare the whole name, but how do we compare only the first few characters?
Below is my table:
custid  forename  surname  dateofbirth
----------------------------------------
1       David     John     16-09-1985
2       David     Jon      16-09-1985
3       Sarah     Smith    10-08-2015
4       Peter     Proca    11-06-2011
5       Peter     Proka    11-06-2011
This is the query I am currently running to compare whole names:
SELECT
y.custid, y.forename, y.surname
FROM
customers y
INNER JOIN
(SELECT
forename, surname, COUNT(*) AS CountOf
FROM customers
GROUP BY forename, surname
HAVING COUNT(*) > 1) dt ON y.forename = dt.forename AND y.surname = dt.surname

You can use left():
select c.*
from (select c.*, count(*) over (partition by left(surname, 3)) as cnt
from customers c
) c
where cnt > 1
order by surname;
You can also include forename in the partition by if you mean forename plus the first three letters of the surname.
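To see what the windowed count returns on the sample table, here is a small sketch using Python's sqlite3 module with an in-memory database (an assumption for illustration). SQLite has no left() function, so substr(surname, 1, 3) is used in its place, and the filter on cnt > 1 keeps only the rows whose prefix occurs more than once.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "custid INTEGER PRIMARY KEY, forename TEXT, surname TEXT, dateofbirth TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "David", "John", "16-09-1985"),
        (2, "David", "Jon", "16-09-1985"),
        (3, "Sarah", "Smith", "10-08-2015"),
        (4, "Peter", "Proca", "11-06-2011"),
        (5, "Peter", "Proka", "11-06-2011"),
    ],
)

# Count the rows sharing each three-character surname prefix, keep the repeats.
duplicates = conn.execute(
    """
    SELECT custid, forename, surname
    FROM (SELECT c.*,
                 COUNT(*) OVER (PARTITION BY substr(surname, 1, 3)) AS cnt
          FROM customers c) AS c
    WHERE cnt > 1
    ORDER BY surname
    """
).fetchall()
print(duplicates)
```

Only Proca and Proka are reported. John and Jon differ in their third character ('Joh' versus 'Jon'), so a three-character prefix does not treat them as duplicates; a two-character prefix would.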

You can use exists as follows:
select t.* from t
Where exists
(select 1 from t tt
Where left(t.surname, 3) = left(tt.surname, 3) and t.custid <> tt.custid
)
order by t.surname;

SQL Union Select for alternate data

I wrote a SQL query with UNION to combine two columns into one:
(select top 10 FirstName from Users) union (select top 10 LastName from Users)
Here is the result:
[image: QUERY RESULT 1]
And here is the original data behind that result:
[image: ORIGINAL DATA]
Here is my problem: how do I select each row's first name and last name into the same column, so that each first name is followed by its last name? For example:
Tumbaga Temp - <FirstName>
Villamor - <LastName>
Jun - <FirstName>
Villamor - <LastName>
FN83 - <FirstName>
Lising Geron - <LastName>
And so on.
I am new to SQL. Thanks for your help.

We add a common row_number() to both parts to pair them up, then order by that number and by the name type so that the rows come out as first/last pairs:
select 'First' as thename,
Firstname,
row_number() over(order by firstname) rn
from Users
union all
select 'Last',
Lastname,
row_number() over(order by firstname)
from users
order by rn, thename
If you only want the first 10, wrap this in a derived table (which needs an alias in SQL Server) and add a where clause:
select *
from
(
select 'First' as thename,
Firstname,
row_number() over(order by firstname) rn
from Users
union all
select 'Last',
Lastname,
row_number() over(order by firstname)
from users
) x
where rn <= 10
order by rn, thename
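The pairing idea can be tried out with the three example rows from the question. This is a minimal sketch on an in-memory SQLite database through Python's sqlite3 module (an assumption for illustration); the column alias name is hypothetical and only labels the combined column.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [("Tumbaga Temp", "Villamor"), ("Jun", "Villamor"), ("FN83", "Lising Geron")],
)

# Both halves number the rows the same way, so rn pairs each first name
# with its last name; 'First' sorts before 'Last' within each pair.
interleaved = conn.execute(
    """
    SELECT 'First' AS thename, FirstName AS name,
           ROW_NUMBER() OVER (ORDER BY FirstName) AS rn
    FROM Users
    UNION ALL
    SELECT 'Last', LastName,
           ROW_NUMBER() OVER (ORDER BY FirstName)
    FROM Users
    ORDER BY rn, thename
    """
).fetchall()

names = [name for _, name, _ in interleaved]
print(names)
```

The pairing relies on both halves numbering the rows identically. If two users share a first name, ordering by a unique key instead keeps the two numberings in step.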

There is no need to use union. As per your description, you have two columns, 'FirstName' and 'LastName', in a table and you want both in a single column. Try the following query (SQL Server):
select FirstName+' '+LastName as FullName from Users

You can add a column to both queries that labels which kind of name each row holds:
(select top 10 FirstName, 'FirstName' as NameType from SysUser) union (select top 10 LastName, 'LastName' as NameType from SysUser)

SQL: select from same table and same column, just different counts

I have a table called names. I want to select 2 names from the pool of unique names (as if grouped with count(*)), and then another 2 names from the entire sample pool.
firstname
John
John
Jessica
Mary
Jessica
John
David
Walter
So the first 2 names would be selected from a pool of John, Jessica, Mary, etc., giving each name an equal chance of being selected, while the second 2 names would be selected from the entire pool, so names with multiple rows, such as John and Jessica, are obviously favored.
I'm sure there's a way to do this but I just can't figure it out. I want to do something like
SELECT uniq.firstname
FROM (SELECT firstname, count(*) as count from names GROUP BY firstname) uniq
limit 2
AND
SELECT firstname
FROM (SELECT firstname from names) limit 2
Is this possible? Appreciate any pointers.

I think you are close, but you need some randomness for the sampling:
(SELECT uniq.firstname
FROM (SELECT firstname, count(*) as count from names GROUP BY firstname) uniq
ORDER BY rand()
limit 2
)
UNION ALL
(SELECT firstname
FROM names
ORDER BY rand()
limit 2
)
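The two pools can be sampled as in the sketch below, which uses an in-memory SQLite database through Python's sqlite3 module (an assumption for illustration). SQLite spells the function random() rather than rand(), and the two samples are run as separate queries here instead of one UNION ALL.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (firstname TEXT)")
sample = ["John", "John", "Jessica", "Mary", "Jessica", "John", "David", "Walter"]
conn.executemany("INSERT INTO names VALUES (?)", [(n,) for n in sample])

# Pool 1: each distinct name appears once, so every name is equally likely.
unique_picks = [
    r[0]
    for r in conn.execute(
        """
        SELECT firstname
        FROM (SELECT DISTINCT firstname FROM names) AS uniq
        ORDER BY random()
        LIMIT 2
        """
    )
]

# Pool 2: every row is a candidate, so repeated names are favored.
all_picks = [
    r[0]
    for r in conn.execute("SELECT firstname FROM names ORDER BY random() LIMIT 2")
]

print(unique_picks, all_picks)
```

The second sample draws two different rows, so it can return the same name twice (for example John and John), whereas the first sample never repeats a name.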

You can use RAND or a similar function to achieve this, depending on the database. Most of these databases require an alias on the derived table.
MySQL:
SELECT firstname
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) uniq
ORDER BY RAND()
LIMIT 2
PostgreSQL:
SELECT firstname
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) uniq
ORDER BY RANDOM()
LIMIT 2
Microsoft SQL Server:
SELECT TOP 2 firstname
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) uniq
ORDER BY NEWID()
IBM DB2:
SELECT firstname , RAND() as IDX
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) uniq
ORDER BY IDX FETCH FIRST 2 ROWS ONLY
Oracle:
SELECT firstname
FROM(SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname ORDER BY dbms_random.value )
WHERE rownum <= 2
Follow a similar approach to select from the entire pool.

LIMIT on SQL query

How can I LIMIT the results to 10 in the following query?
I use SQLSRV.
SELECT Id, Firstname, Surname FROM Person WHERE Firstname LIKE ?

Use TOP:
SELECT TOP 10 Id, Firstname, Surname FROM Person WHERE Firstname LIKE ?

Use:
select top(10) Id, Firstname, Surname ....

The answer by kevingessner is certainly the easiest way.
I just thought I would throw some alternatives in for fun.
SET ROWCOUNT 10
SELECT Id, Firstname, Surname FROM Person WHERE Firstname LIKE ?
SET ROWCOUNT 0
Or a more convoluted way:
With q
as
(
Select ROW_NUMBER() Over(Order by Id) as rn,
Id,
Firstname,
Surname
FROM Person WHERE Firstname LIKE ?
)
Select *
From q
where q.rn <= 10
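The ROW_NUMBER variant can be exercised as in the sketch below, run against an in-memory SQLite database through Python's sqlite3 module (an assumption for illustration; SQLite itself has no TOP or SET ROWCOUNT). The 25 Person rows are made-up test data.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (Id INTEGER PRIMARY KEY, Firstname TEXT, Surname TEXT)"
)
# Hypothetical test data: 25 people whose first names all start with 'Ann'.
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(i, f"Ann{i}", f"Smith{i}") for i in range(1, 26)],
)

# Number the matching rows by Id and keep the first ten.
first_ten = conn.execute(
    """
    WITH q AS (
        SELECT ROW_NUMBER() OVER (ORDER BY Id) AS rn, Id, Firstname, Surname
        FROM Person
        WHERE Firstname LIKE ?
    )
    SELECT Id, Firstname, Surname
    FROM q
    WHERE rn <= 10
    ORDER BY rn
    """,
    ("Ann%",),
).fetchall()
print(first_ten)
```

Unlike a bare TOP 10 without ORDER BY, this form states explicitly which ten rows are kept, here the ten lowest Id values.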

In a SQL GROUP BY query, what value is used for the non-aggregate columns?

Say I've got the following data back from a SQL query:
Lastname Firstname Age
Anderson Jane 28
Anderson Lisa 22
Anderson Jack 37
If I want to know the age of the oldest person with the last name Anderson, I can select MAX(Age) and GROUP BY Lastname. But I also want to know the first name of that oldest person. How can I make sure that, when the Firstname values are collapsed into one row by the GROUP BY, I get the Firstname value from the same row where I got the max age?

For those RDBMS that support it (e.g., SQL Server 2005+), you can use a window function:
select t.Lastname, t.Firstname, t.Age
from (select Lastname, Firstname, Age,
row_number() over (partition by Lastname order by Age desc) as RowNum
from YourTable
) t
where t.RowNum = 1
For others, you'd need a subquery on Lastname and a join to get Firstname:
select yt.Lastname, yt.Firstname, yt.Age
from YourTable yt
inner join (select LastName, max(Age) as MaxAge
from YourTable
group by LastName) q
on yt.Lastname = q.Lastname
and yt.Age = q.MaxAge
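Both queries can be checked against the three Anderson rows from the question. This is a minimal sketch on an in-memory SQLite database through Python's sqlite3 module (an assumption for illustration; window functions need SQLite 3.25 or later).

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Lastname TEXT, Firstname TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [("Anderson", "Jane", 28), ("Anderson", "Lisa", 22), ("Anderson", "Jack", 37)],
)

# Window-function version: number rows per Lastname, oldest first, keep row 1.
window_rows = conn.execute(
    """
    SELECT t.Lastname, t.Firstname, t.Age
    FROM (SELECT Lastname, Firstname, Age,
                 ROW_NUMBER() OVER (PARTITION BY Lastname ORDER BY Age DESC) AS RowNum
          FROM YourTable) AS t
    WHERE t.RowNum = 1
    """
).fetchall()

# Join version: find the max age per Lastname, then join back for the Firstname.
join_rows = conn.execute(
    """
    SELECT yt.Lastname, yt.Firstname, yt.Age
    FROM YourTable yt
    INNER JOIN (SELECT Lastname, MAX(Age) AS MaxAge
                FROM YourTable
                GROUP BY Lastname) q
      ON yt.Lastname = q.Lastname AND yt.Age = q.MaxAge
    """
).fetchall()
print(window_rows, join_rows)
```

With this data both versions return the single row for Jack, aged 37.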

You have to join back to the table from your grouped results - i.e. create a view or a nested query to contain the group by.

Whatever your approach, the main thing to watch out for is that more than one firstname may have the same age for a given lastname.
This query will return just 1 row, but if your data set had more than one 'Anderson' aged 37, it could return either one:
select firstname, age
from yourtable
where lastname = 'Anderson'
order by age desc limit 1
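To make the tie case concrete, the sketch below adds a hypothetical fourth row, another Anderson aged 37 named June, and compares the LIMIT 1 query with a join on the maximum age. It runs on an in-memory SQLite database through Python's sqlite3 module, which is an assumption for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (lastname TEXT, firstname TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        ("Anderson", "Jane", 28),
        ("Anderson", "Lisa", 22),
        ("Anderson", "Jack", 37),
        ("Anderson", "June", 37),  # hypothetical row that creates the tie
    ],
)

# LIMIT 1 returns exactly one row; which of the tied rows is unspecified.
one_oldest = conn.execute(
    """
    SELECT firstname, age
    FROM yourtable
    WHERE lastname = 'Anderson'
    ORDER BY age DESC
    LIMIT 1
    """
).fetchall()

# Joining on the maximum age returns every tied row.
all_oldest = conn.execute(
    """
    SELECT yt.firstname, yt.age
    FROM yourtable yt
    INNER JOIN (SELECT lastname, MAX(age) AS maxage
                FROM yourtable
                GROUP BY lastname) q
      ON yt.lastname = q.lastname AND yt.age = q.maxage
    ORDER BY yt.firstname
    """
).fetchall()
print(one_oldest, all_oldest)
```

If exactly one row is wanted and the choice should be repeatable, adding a tie-breaker such as firstname to the ORDER BY makes the result deterministic.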

We Keep Coding
