Get unique row based on 2 columns with duplicate values - sql

I have a table with 3 columns and 6 rows:
As you can see based on the highlighted red text, Ash and Joey have the same Last name and Street address i.e. column "Last" and column "Street" have a duplicate value. I would like to only get one of them.
Desired result would be to get rows without duplicate values on the "Last" and "Street" columns:
Where only one of Ash or Joey is retained (I just used Ash in this example, but Joey would be fine too - just need 1 or the other, not both).
Is this even possible? Any advice is appreciated, thanks.
P.S. the “Street” column is actually on a different table so the picture of the graph represents 2 tables already joined.

Since you don't care which record of the duplicates survives, you can give this a shot. It will actually keep the first one alphabetically by First:
DROP TABLE IF EXISTS #t;
CREATE TABLE #t (First VARCHAR(255), Last VARCHAR(255), Street VARCHAR(255));
INSERT #t SELECT 'Ash', 'Williams', '123 Main';
INSERT #t SELECT 'Ben', 'O''Shea', '456 Grand';
INSERT #t SELECT 'Claire', 'Port', '543 Jasper';
INSERT #t SELECT 'Denise', 'Stone', '543 Jasper';
INSERT #t SELECT 'Erica', 'Thomas', '789 Holt';
INSERT #t SELECT 'Joey', 'Williams', '123 Main';
WITH dupes AS (
    SELECT First,
           Last,
           Street,
           ROW_NUMBER() OVER (PARTITION BY Last, Street ORDER BY First) AS RowNum
    FROM #t
)
SELECT First, Last, Street
FROM dupes
WHERE RowNum = 1;

On the assumption that you want to retain the person whose first name comes first alphabetically, you can use the ROW_NUMBER window function to number the rows within each group of duplicates and use that number to filter out the dupes:
CREATE TABLE Peeps
(
FirstName NVARCHAR(20),
LastName NVARCHAR(20),
Street NVARCHAR(20)
)
INSERT INTO Peeps
VALUES
('Ash','Williams','123 Main'),
('Ben','O''Shea','456 grand'),
('Claire','Port','543 Jasper'),
('Denise','Stone','543 Jasper'),
('Erica','Thomas','789 Holt'),
('Joey','Williams','123 Main')
SELECT FirstName,
LastName,
Street
FROM (
SELECT FirstName,
LastName,
Street,
ROW_NUMBER () OVER (PARTITION BY LastName,Street ORDER BY FirstName) AS RowN
FROM Peeps
) a
WHERE RowN = 1
DROP TABLE Peeps
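The question's P.S. mentions that Street actually lives in a second table. A minimal sketch of the same ROW_NUMBER filter applied over the join follows. The table variables @People and @Addresses and the PersonId join key are assumptions made for illustration, not the asker's real schema.

```sql
-- Hypothetical schema: names and addresses live in separate tables.
DECLARE @People TABLE (PersonId INT, FirstName VARCHAR(255), LastName VARCHAR(255));
DECLARE @Addresses TABLE (PersonId INT, Street VARCHAR(255));

INSERT INTO @People VALUES
    (1, 'Ash', 'Williams'), (2, 'Ben', 'O''Shea'), (3, 'Claire', 'Port'),
    (4, 'Denise', 'Stone'), (5, 'Erica', 'Thomas'), (6, 'Joey', 'Williams');
INSERT INTO @Addresses VALUES
    (1, '123 Main'), (2, '456 Grand'), (3, '543 Jasper'),
    (4, '543 Jasper'), (5, '789 Holt'), (6, '123 Main');

-- Number the joined rows within each (LastName, Street) group, then keep row 1.
WITH ranked AS (
    SELECT p.FirstName,
           p.LastName,
           a.Street,
           ROW_NUMBER() OVER (PARTITION BY p.LastName, a.Street
                              ORDER BY p.FirstName) AS RowNum
    FROM @People AS p
    JOIN @Addresses AS a ON a.PersonId = p.PersonId
)
SELECT FirstName, LastName, Street
FROM ranked
WHERE RowNum = 1;
```

With this sample data Joey is dropped because Ash sorts first, while Claire and Denise both remain: they share a street but not a last name.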

Related

Identify which columns are different in the two queries

I currently have a query that looks like this:
Select val1, val2, val3, val4 from Table_A where someID = 10
UNION
Select oth1, val2, val3, oth4 from Table_B where someId = 10
I initially run the same query with EXCEPT instead of UNION to identify which IDs are returned with differences, and then I run the UNION query to find which columns specifically are different.
My goal is to compare the values between the two tables (some columns have different names). And that's what I'm doing.
However, the two queries above have about 250 different field names, so it is quite tedious to scroll through to find the differences.
Is there a better and quicker way to identify which column names are different after running the two queries?
EDIT: Here's my current process:
DROP TABLE IF EXISTS #Table_1
DROP TABLE IF EXISTS #Table_2
SELECT 'Dave' AS Name, 'Smih' AS LName, 18 AS Age, 'Alabama' AS State
INTO #Table_1
SELECT 'Dave' AS Name, 'Smith' AS LName, 19 AS Age, 'Alabama' AS State
INTO #Table_2
--Find differences
SELECT Name, LName,Age,State FROM #Table_1
EXCEPT
SELECT Name, LName,Age,State FROM #Table_2
--How I compare differences
SELECT Name, LName,Age,State FROM #Table_1
UNION
SELECT Name, LName,Age,State FROM #Table_2
Is there any way to streamline this so I can get a column list of differences?
Here is a generic way to find the differences between two tables.
We just need to know their primary key column.
It is based on JSON, and will work starting from SQL Server 2016 onwards.
SQL
-- DDL and sample data population, start
DECLARE @TableA TABLE (rowid INT IDENTITY(1,1), FirstName VARCHAR(100), LastName VARCHAR(100), Phone VARCHAR(100));
DECLARE @TableB TABLE (rowid INT IDENTITY(1,1), FirstName VARCHAR(100), LastName VARCHAR(100), Phone VARCHAR(100));
INSERT INTO @TableA(FirstName, LastName, Phone) VALUES
('JORGE','LUIS','41514493'),
('JUAN','ROBERRTO','41324133'),
('ALBERTO','JOSE','41514461'),
('JULIO','ESTUARDO','56201550'),
('ALFREDO','JOSE','32356654'),
('LUIS','FERNANDO','98596210');
INSERT INTO @TableB(FirstName, LastName, Phone) VALUES
('JORGE','LUIS','41514493'),
('JUAN','ROBERTO','41324132'),
('ALBERTO','JOSE','41514461'),
('JULIO','ESTUARDO','56201551'),
('ALFRIDO','JOSE','32356653'),
('LUIS','FERNANDOO','98596210');
-- DDL and sample data population, end
-- Unpivot every row of both tables into (key, value) pairs via JSON,
-- then compare the two sources per rowid and column name.
SELECT rowid
      ,[key] AS [column]
      ,Org_Value = MAX(CASE WHEN Src=1 THEN Value END)
      ,New_Value = MAX(CASE WHEN Src=2 THEN Value END)
FROM (
    SELECT Src=1
          ,rowid
          ,B.*
    FROM @TableA A
    CROSS APPLY ( SELECT [Key]
                        ,Value
                  FROM OpenJson( (SELECT A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES))
                ) AS B
    UNION ALL
    SELECT Src=2
          ,rowid
          ,B.*
    FROM @TableB A
    CROSS APPLY ( SELECT [Key]
                        ,Value
                  FROM OpenJson( (SELECT A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES))
                ) AS B
) AS A
GROUP BY rowid,[key]
HAVING MAX(CASE WHEN Src=1 THEN Value END)
    <> MAX(CASE WHEN Src=2 THEN Value END)
ORDER BY rowid,[key];
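Output
+-------+-----------+-----------+-----------+
| rowid | column    | Org_Value | New_Value |
+-------+-----------+-----------+-----------+
| 2     | LastName  | ROBERRTO  | ROBERTO   |
| 2     | Phone     | 41324133  | 41324132  |
| 4     | Phone     | 56201550  | 56201551  |
| 5     | FirstName | ALFREDO   | ALFRIDO   |
| 5     | Phone     | 32356654  | 32356653  |
| 6     | LastName  | FERNANDO  | FERNANDOO |
+-------+-----------+-----------+-----------+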
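One caveat with a HAVING filter built on `<>`: the comparison evaluates to UNKNOWN when either side is NULL, so a column that is NULL in one table and populated in the other would not be reported. Below is a sketch of a NULL-aware variant of the same JSON approach. The table variables @A and @B and their contents are made up purely for illustration.

```sql
-- Illustrative data: rowid 2 has a NULL phone in @A but a value in @B.
DECLARE @A TABLE (rowid INT, Phone VARCHAR(100));
DECLARE @B TABLE (rowid INT, Phone VARCHAR(100));
INSERT INTO @A VALUES (1, '555'), (2, NULL);
INSERT INTO @B VALUES (1, '555'), (2, '777');

SELECT rowid,
       [key] AS [column],
       Org_Value = MAX(CASE WHEN Src = 1 THEN Value END),
       New_Value = MAX(CASE WHEN Src = 2 THEN Value END)
FROM (
    SELECT Src = 1, A.rowid, J.[key], J.Value
    FROM @A AS A
    CROSS APPLY OPENJSON((SELECT A.* FOR JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES)) AS J
    UNION ALL
    SELECT Src = 2, B.rowid, J.[key], J.Value
    FROM @B AS B
    CROSS APPLY OPENJSON((SELECT B.* FOR JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES)) AS J
) AS U
GROUP BY rowid, [key]
-- Report a column when the values differ, or when exactly one side is NULL.
HAVING MAX(CASE WHEN Src = 1 THEN Value END) <> MAX(CASE WHEN Src = 2 THEN Value END)
    OR (MAX(CASE WHEN Src = 1 THEN Value END) IS NULL
        AND MAX(CASE WHEN Src = 2 THEN Value END) IS NOT NULL)
    OR (MAX(CASE WHEN Src = 1 THEN Value END) IS NOT NULL
        AND MAX(CASE WHEN Src = 2 THEN Value END) IS NULL)
ORDER BY rowid, [key];
```

The intent is that the Phone column of rowid 2 is reported even though one side is NULL. Whether NULL versus a value should count as a difference depends on your data, so treat the extra conditions as optional.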

How to handle Multivalue attribute in sql server

I have a table in Data warehouse.
Create table customer
(
id int,
name varchar(30),
address varchar(50)
);
Sample data in the table:
insert into Customer values(1, 'Smith', 'abc,def, lkj');
insert into Customer values(2, 'James', 'pqr,lmn');
I want to split the address column and insert a new row for each value when an address contains several. For example:
1 Smith abc
1 Smith def
1 Smith lkj
2 James pqr
2 James lmn
I have about 100,000 records. Please help me with this.
You can use string_split() and adjust the insert statement:
insert into Customer (id, name, address)
select v.id, v.name, s.value
from (values (1, 'Smith', 'abc,def,lkj')
) v(id, name, address) cross apply
string_split(v.address, ',') s;
You might also want to add a check constraint on address so it does not contain a comma.
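As a rough illustration of such a check constraint, here is a sketch against a throwaway temp table. The name #customer_demo is hypothetical, and on the real Customer table you would add the constraint with ALTER TABLE only after the existing comma-separated rows have been split.

```sql
-- Hypothetical demo table mirroring the Customer columns.
CREATE TABLE #customer_demo (
    id INT,
    name VARCHAR(30),
    -- Reject any address value that still contains a comma.
    address VARCHAR(50) CHECK (address NOT LIKE '%,%')
);

INSERT INTO #customer_demo VALUES (1, 'Smith', 'abc');          -- accepted
-- INSERT INTO #customer_demo VALUES (1, 'Smith', 'abc,def');   -- would violate the CHECK

DROP TABLE #customer_demo;
```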
You would load from a staging table to the final table by doing:
insert into Customer (id, name, address)
select t.id, t.name, s.value
from staging t cross apply
string_split(t.address, ',') s;
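The sample data in the question contains a space after one of the commas ('abc,def, lkj'), and string_split() keeps that space in the returned value. A small sketch that trims each piece, using a table variable @staging as a stand-in for the real staging table (an assumption for illustration):

```sql
-- Stand-in for the staging table, loaded with the question's sample rows.
DECLARE @staging TABLE (id INT, name VARCHAR(30), address VARCHAR(50));
INSERT INTO @staging VALUES
    (1, 'Smith', 'abc,def, lkj'),
    (2, 'James', 'pqr,lmn');

-- One output row per comma-separated piece, with surrounding spaces removed.
SELECT t.id,
       t.name,
       LTRIM(RTRIM(s.value)) AS address
FROM @staging AS t
CROSS APPLY STRING_SPLIT(t.address, ',') AS s;
```

STRING_SPLIT requires SQL Server 2016 or later with database compatibility level 130 or higher.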

Insert values of one table in a database to another table in another database

I would like to take some data from a table from DB1 and insert some of that data to a table in DB2.
How would one proceed to do this?
This is what I've got so far:
CREATE VIEW old_study AS
SELECT *
FROM dblink('dbname=mydb', 'select name,begins,ends from study')
AS t1(name varchar(50), register_start date, register_end date);
/*old_study now contains the data I wanna transfer*/
INSERT INTO studies VALUES (nextval('studiesSequence'),name, '',3, 0, register_start, register_end)
SELECT name, register_start, register_end from old_study;
This is how my table in DB2 looks:
CREATE TABLE studies(
id int8 PRIMARY KEY NOT NULL,
name_string VARCHAR(255) NOT NULL,
description VARCHAR(255),
field int8 REFERENCES options_table(id) NOT NULL,
is_active INTEGER NOT NULL,
register_start DATE NOT NULL,
register_end DATE NOT NULL
);
You should include the column names in both the insert and select:
insert into vip_employees(name, age, occupation)
select name, age, occupation
from employees;
However, your data structure is suspect. Either you should use a flag in employees to identify the "VIP employees". Or you should have a primary key in employees and use this primary key in vip_employees to refer to employees. Copying over the data fields is rarely the right thing to do, especially for columns such as age which are going to change over time. Speaking of that, you normally derive age from the date of birth, rather than storing it directly in a table.
INSERT INTO studies
(
id
,name_string
,description
,field
,is_active
,register_start
,register_end
)
SELECT nextval('studiesSequence')
,NAME
,''
,3
,0
,register_start
,register_end
FROM dblink('dbname=mydb', 'select name,begins,ends from study')
AS t1(NAME VARCHAR(50), register_start DATE, register_end DATE);
You can directly insert the values that are returned by dblink(), which means there is no need to create a view.
Loop and cursor are weapons of last resort. Try to avoid them. You probably want INSERT INTO ... SELECT:
INSERT INTO x(x, y, z)
SELECT x, y, z
FROM t;
EDIT:
INSERT INTO vip_employees(name, age, occupation) -- your column list may vary
SELECT name, age, occupation
FROM employees;
Your syntax is wrong. You cannot have both, a values clause for constant values and a select clause for a query in your INSERT statement.
You'd have to select constant values in your query:
insert into studies
(
id,
name_string,
description,
field,
is_active,
register_start,
register_end
)
select
nextval('studiesSequence'),
name,
'Test',
3, -- field is declared NOT NULL, so a valid options_table id is needed here
0,
register_start,
register_end
from old_study;

Inserting multiple names in same cell with separated commas

Previously, I was trying to keep ALL previous last names of an employee in a single table cell, separated by commas (see below), but I didn't know how. Someone then suggested normalization, which I'm not sure would be easier in this situation. My boss wants to display all previous last names of an employee on a web page each time she edits her account info. Simply put, when Judy changes her last name again, her new last name Kingsley should replace Smith, and Smith should be appended to the Alias column after Hall. So my question is whether it is possible to store multiple last names in the same cell, separated by commas. I thought that would be simpler, since I could use one variable on the web page to display all the aliases at once. However, I have no idea how complex the code for this would be. Any help is truly appreciated.
Current SQL table
+--------+-----------+----------+-------------+
| People | FirstName | LastName | Alias       |
+--------+-----------+----------+-------------+
| 002112 | Judy      | Smith    | Hall        |
+--------+-----------+----------+-------------+
Preferred
+--------+-----------+----------+-------------+
| People | FirstName | LastName | Alias       |
+--------+-----------+----------+-------------+
| 002112 | Judy      | Kingsley | Hall, Smith |
+--------+-----------+----------+-------------+
Keep the database normalized.
People:
(Id, Firstname, Lastname)
LastnameHistory:
(PeopleId, OldLastname, NewLastname, DateChanged)
You can then create a view which would be a "GROUP_CONCAT" type of query to transform the data as required.
An example:
DECLARE @people TABLE ( id INT IDENTITY(1,1), fname VARCHAR(50), lname VARCHAR(50))
DECLARE @lnameHistory TABLE ( id INT IDENTITY(1,1), people_id INT, lname VARCHAR(50), date_changed DATETIME)
INSERT INTO @people (fname, lname)
VALUES ('john', 'smith'), ('jane', 'doe')
INSERT INTO @lnameHistory (people_id, lname, date_changed)
VALUES (2, 'shakespeare', '2012-01-01'), (2, 'einstein', '2013-12-12')
;WITH group_concat AS
(
    -- Concatenate each person's historical last names, then trim the trailing comma
    SELECT people_id, LEFT(lnames , LEN(lnames )-1) AS lnames
    FROM @lnameHistory AS o
    CROSS APPLY
    (
        SELECT lname + ', '
        FROM @lnameHistory AS i
        WHERE o.people_id = i.people_id
        ORDER BY date_changed ASC
        FOR XML PATH('')
    ) pre_trimmed (lnames)
    GROUP BY people_id, lnames
)
SELECT p.*, gc.lnames FROM @people p
JOIN group_concat gc ON gc.people_id = p.id
Some reference for syntax:
SQL Server CROSS APPLY and OUTER APPLY
XML Data (SQL Server)
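On SQL Server 2017 or later, STRING_AGG can produce the same comma-separated list without the FOR XML PATH trick. A minimal sketch using the same sample tables as the example in this answer (this is an alternative, not what the answer itself used):

```sql
DECLARE @people TABLE (id INT IDENTITY(1,1), fname VARCHAR(50), lname VARCHAR(50));
DECLARE @lnameHistory TABLE (id INT IDENTITY(1,1), people_id INT, lname VARCHAR(50), date_changed DATETIME);

INSERT INTO @people (fname, lname)
VALUES ('john', 'smith'), ('jane', 'doe');
INSERT INTO @lnameHistory (people_id, lname, date_changed)
VALUES (2, 'shakespeare', '2012-01-01'), (2, 'einstein', '2013-12-12');

-- One row per person who has a name history, oldest last name first.
SELECT p.id,
       p.fname,
       p.lname,
       STRING_AGG(h.lname, ', ') WITHIN GROUP (ORDER BY h.date_changed) AS lnames
FROM @people AS p
JOIN @lnameHistory AS h ON h.people_id = p.id
GROUP BY p.id, p.fname, p.lname;
```

Switching the JOIN to a LEFT JOIN would also return people with no history, with lnames as NULL.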
Assuming your update statement is a stored procedure taking in parameters of @personId and @newLastName:
EDIT
Sorry, wrong version of SQL. Have to do it the old school way!
UPDATE PeopleTable
SET Alias = Alias + ', ' + LastName,
    LastName = @newLastName
WHERE
    People = @personId
When you update the table for a new LastName, use something like this:
UPDATE <table> SET Alias = Alias + ', ' + LastName, LastName = <newLastName>
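One thing to watch with this concatenation: if Alias is NULL for an employee who has never changed her name, Alias + ', ' + LastName evaluates to NULL under the default settings and the old name is lost. A hedged sketch that handles the empty case follows. The table variable @PeopleTable, its column types, and the second sample row are assumptions for illustration.

```sql
-- Hypothetical copy of the table; the second row has no alias yet.
DECLARE @PeopleTable TABLE (People CHAR(6), FirstName VARCHAR(50), LastName VARCHAR(50), Alias VARCHAR(200));
INSERT INTO @PeopleTable VALUES
    ('002112', 'Judy', 'Smith', 'Hall'),
    ('002113', 'Ann',  'Lee',   NULL);

DECLARE @personId CHAR(6) = '002113',
        @newLastName VARCHAR(50) = 'Kingsley';

-- Both SET expressions read the pre-update values, so Alias receives the old LastName.
UPDATE @PeopleTable
SET Alias = CASE WHEN Alias IS NULL OR Alias = ''
                 THEN LastName
                 ELSE Alias + ', ' + LastName
            END,
    LastName = @newLastName
WHERE People = @personId;

SELECT People, FirstName, LastName, Alias FROM @PeopleTable;
```

For the row being updated, Alias becomes the previous last name on its own instead of NULL, and rows that already have an alias get the usual ', ' separator.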

How do i insert new record in this table?

I'm sorry for this noob question.
There are 2 tables: the smaller one is derived from the bigger one.
How do I insert a new record into the bigger table?
INSERT INTO people (lname, fname, city, age, salary) VALUES (' Doe','John','Paris', '25','1000$' );
But the bigger table stores the city as a number. How should I insert 'Paris'? Do I need to know its number beforehand? 'Paris' isn't in the Cities (smaller) table yet!
How do records get inserted into the bigger (people) table?
EDIT:
Added IF block to check for Paris.
IF NOT EXISTS (SELECT 1 FROM City WHERE City = 'Paris')
    INSERT INTO City (City) VALUES ('Paris')
DECLARE @Cid int = (SELECT CityID FROM City WHERE City = 'Paris')
INSERT INTO people (lname, fname, city, age, salary)
VALUES (' Doe','John', @Cid, '25','1000$' )
I made an assumption about the structure of the City table, obviously.
You could also parameterize this with a @city variable and substitute it for 'Paris' everywhere in the code.
In the insert to the people table, the value of the city column should be the number of the city you want the person to be linked to - so in your quoted example replace 'Paris' with the city number for Paris.
If you want to create a new person record with a city that does not yet exist in the cities table, you'll need to do an insert into the cities table first, get the number of the created city and then use that in your insert to the people table.
EDIT:
Adapted the solution to handle entries that already exist in the city table. As I wrote in the comments, the transaction could be omitted (for example, when running as a stored procedure).
BEGIN TRANSACTION
DECLARE @city_id INT
SELECT @city_id=id FROM city WHERE name='Paris'
IF @city_id IS NULL
BEGIN
    INSERT INTO city (name) VALUES ('Paris')
    SET @city_id=@@IDENTITY
END
INSERT INTO people (lname, fname, city, age, salary) VALUES ('Doe','John',@city_id, '25','1000$' );
COMMIT TRANSACTION
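A variant of the same lookup-or-insert pattern, sketched with SCOPE_IDENTITY(), which returns the last identity value generated in the current scope and so is not affected by identity values produced inside triggers. The table variables @city and @people and their column types are assumptions standing in for the real tables.

```sql
-- Hypothetical stand-ins for the city and people tables.
DECLARE @city TABLE (id INT IDENTITY(1,1) PRIMARY KEY, name VARCHAR(50));
DECLARE @people TABLE (lname VARCHAR(50), fname VARCHAR(50), city INT, age INT, salary VARCHAR(20));
DECLARE @city_id INT;

-- Look the city up first; insert it only when it is missing.
SELECT @city_id = id FROM @city WHERE name = 'Paris';
IF @city_id IS NULL
BEGIN
    INSERT INTO @city (name) VALUES ('Paris');
    SET @city_id = SCOPE_IDENTITY();
END

INSERT INTO @people (lname, fname, city, age, salary)
VALUES ('Doe', 'John', @city_id, 25, '1000$');

-- Confirm that the person row points at the city row.
SELECT p.fname, p.lname, c.name AS city
FROM @people AS p
JOIN @city AS c ON c.id = p.city;
```

Against real tables you would still wrap the lookup and both inserts in a transaction if concurrent sessions might insert the same city at the same time.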