Update data in table1 where duplicates in table2 - sql

I have 2 tables: Users and Results.
The usertable contains duplicate data which is reflected in the results table. The user below is created 3 times. I need to update the results table where UserId 2 and 3 to 1 so that all the results can be viewed on this user only.
This is easy if I have only have a few users and a few results for them, but in my case I have 500 duplicated users and 30000 results.
I am using SQL Server Express 2014
I will really appreciate any help with this!
Edit: misstyped column names in resultTable. Im sorry if you got confused by it.
UserTable
UserId---Fname---LName
1-----Georg-----Smith
2-----Georg-----Smith
3-----Georg-----Smith
ResultsTable
ResultId---UserRefId
1-----1
2-----2
3-----3
4-----1
I have manage to select duplicates from usertable, but i don't know how to proceed further.
;WITH T AS
(
SELECT *, COUNT(*) OVER (PARTITION BY Fname + Lname) as Cnt
FROM TestDatabase.Users
)
SELECT Id, Fname, Lname
FROM T
WHERE Cnt > 2

Your ResultTable has 2 columns with the same UserId name. I changed the second to UserId2 for the query below:
;WITH cte As
(
SELECT R.UserId, R.UserId2,
MIN(U.UserId) OVER (PARTITION BY U.FName, U.LName) As OriginalUserId
FROM ResultTable R
INNER JOIN UserTable U ON R.UserId = U.UserId
)
UPDATE cte
SET UserId2 = OriginalUserId

You are on the right track with the cte. The ROW_NUMBER() function can be used to flag duplicate UserIds, then you can join the cte into the from clause of your update statement to find the UserIds you want to replace, and join again to find the UserIds you want to replace them with.
;WITH cteDedup AS(
SELECT
UserId
,FName
,LName
,ROW_NUMBER() OVER(PARTITION BY FName, LName ORDER BY UserID ASC) AS row_num
FROM UserTable
)
UPDATE rt
SET UserId = original.UserId
FROM ResultsTable rt
JOIN cteDedup dupe
ON rt.UserId = dupe.UserId
JOIN cteDedup original
ON dupe.FName = original.FName
AND dupe.LName = original.LName
WHERE dupe.row_num <> 1
AND original.row_num = 1
See the SQLFiddle

A little tricky query looks like this:
;with t as (
select fname+lname name,id,
ROW_NUMBER() over(partition by fname+lname order by id) rn
from #users
)
--for test purpose comment next 2 lines
update #results
set userid=t1.id
--and uncomment the next one
--select t.name,t.id,userid,res,t1.id id1--,(select top 1 id from t t1 where t1.name=t.name and t.rn=1) id1
from t
inner join #results r on t.id=r.userid
inner join t t1 on t.name=t1.name and t1.rn=1
And then you can delete duplicate users
;with t as (
select name,id,
ROW_NUMBER() over(partition by name order by id) rn
from #users
)
delete t where rn>1

Related

Need SQL MAX please

I have a table with a list of Client No's, ID etc & there are many Clients with different ID's. I need to pull out the MAX ID No. for each Client, e.g.
ClientNo: 1500 has 3 ID's - the maximum in the ID field is the one I need!
UPDATE: This works:
SELECT MP.ClientID, MP.SequenceID
FROM TABLENAME MP
INNER JOIN (
SELECT ClientID, MAX(SequenceID) SequenceID
FROM TABLENAME
GROUP BY ClientID
) b on MP.ClientID = b.ClientID AND MP.SequenceID = b.SequenceID
BUT....
I need to link the table to many others to pull in other data, where do I insert the left joins to these tables please?
I am assuming you have multiple same client (numbers) with different IDs and for each client (number) you have to get the maximum ID. You may do the following:
select client_number, max(ID) from client group by client_number;
Depending on your need tweak this query.
You may want to do this:
SELECT MP.ClientID, b.DesiredValue
FROM TABLENAME MP
INNER JOIN (
SELECT ClientID, DesiredValue, ROW_NUMBER () OVER (PARTITION BY ClientID order by SequenceID desc) As RecentRN
FROM TABLENAME
) b
on MP.ClientID = b.ClientID AND b.RecentRN = 1
or
SELECT MP.ClientID, b.DesiredValue
FROM TABLENAME MP
INNER JOIN (
SELECT ClientID, DesiredValue,ROW_NUMBER () OVER (PARTITION BY ClientID order by SequenceID desc) As RecentRN
FROM TABLENAME
) b
on MP.ClientID = b.ClientID
Where b.RecentRN = 1

delete duplicates and retain MAX(id) mysql

I have a code where it list all the duplicates of the data on database
SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
HAVING COUNT(*) > 1
Now, I'm trying to retain the MAX(id), then the rest of the duplicates should be deleted
I tried the code
DELETE us
FROM el_student_class_relation us
INNER JOIN(SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id HAVING COUNT(*) > 1) t ON t.id = us.id
But it deletes the MAX(ID) and it is retaining the the other duplicates and it is the opposite of what I want.
Try this
DELETE FROM el_student_class_relation
WHERE id not in
(
SELECT * from
(SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id) temp_tbl
)
Please note:
do not use the HAVING COUNT(*) > 1 in inner query.
it will create issue when there is only single record with same id.
You might try the following query that deletes all elements for which another one with a higher ID (and same class and student) exists:
DELETE
FROM el_student_class_relation el1
WHERE EXISTS (SELECT el2.id
FROM el_student_class_relation el2
WHERE el1.student_id = el2.student_id
AND el1.class_id = el2.class_id
AND el2.id > el1.id);
The direct fix for your query is to use an "anti-join", where NOT joining is the important feature. This can be done with LEFT JOIN.
DELETE
us
FROM
el_student_class_relation us
LEFT JOIN
(
SELECT student_id, class_id, MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
-- HAVING COUNT(*) > 1 [Don't do this, you need to return ALL the rows you want to keep]
)
gr
ON gr.id = us.id
WHERE
gr.id IS NULL -- WHERE there wasn't a match in the "good rows" table
EDIT MariaDB and MySQL aren't the same thing. MariaDB DOES allow self joins on the table being deleted from.
in mysql(lower version) in case of delete sub-query work a little bit different way, you have to use a layer more than required
DELETE FROM el_student_class_relation us
WHERE us.id not in
(
select * from (
SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
) t1
)

sql get max of multiple results

In SQL server 2008 R2, how do you get the MAX result if you have two rows that have the same value? I have a table that I use to store the data about the number of times a client is seen at a location.
create table #templocation
(
ClientID Int,
Location Varchar(100),
LocationCount Int,
MaxCount Int
)
I only fill the first three columns with an insert statement then I get the max for each client like this:
update t
set t.MaxCount = t2.locationcount
from #templocation t
join (select tl.ClientID, MAX(tl.LocationCount) as 'locationcount'
from #templocation tl
group by tl.ClientID) t2
on t2.ClientID = t.ClientID
update t
set t.PrimaryLocation = tl.Location
from #temp t
join #templocation tl
on t.ClientId = tl.ClientID
where tl.LocationCount = tl.MaxCount
I then join this to the main table to get the final results. My problem is if a client is has two or more max counts. Then it wants to display all of the max locations like this:
ClientId Location LocationCount
12502 Main St. 4
12502 Lake Ave 4
12502 Tracy Rd 2
The results I get are:
ClientId ClientName Location
12502 John Smith Main St.
12502 John Smith Lake Ave
I only want the first or top one displayed though preferably by alphabetical order.
I do things like this through a couple of CTEs. One will give us the max(locationcount) for each client, and one just adds a row_number() for every clientid so we can pick the "first" location".
SQL Fiddle
It's a bit of a hack, but it works.
;with maxrows as
(select
ClientID as ClientID,
max(LocationCount) as MaxValue
--row_number() OVER (partition by ClientID order by ClientID)
from
table1
group by ClientID)
,
rowcounts as
(
select
t1.ClientID,
t1.Location,
row_number() over (PARTITION BY t1.ClientID order by t1.Location) as RowNum
from
table1 t1
)
select
t2.ClientID,
t2.Location
from
maxrows t1
inner join rowcounts t2
on t1.ClientID = t2.ClientID
and t2.rownum = 1
From how I understood the issue, you could use MAX() OVER () to achieve what you want like this:
WITH maxvalues AS (
SELECT
ClientId,
Location,
LocationCount,
MaxCount = MAX(LocationCount) OVER (PARTITION BY ClientId)
FROM atable
)
SELECT
ClientId,
Location
FROM maxvalues
WHERE LocationCount = MaxCount
;
Join maxvalues to whatever table holds the client names to include those into the output too.
Learn more about the OVER clause in this manual:
OVER clause (Transact-SQL)

SQL display two results side-by-side

I have two tables, and am doing an ordered select on each of them. I wold like to see the results of both orders in one result.
Example (simplified):
"SELECT * FROM table1 ORDER BY visits;"
name|# of visits
----+-----------
AA | 5
BB | 9
CC | 12
.
.
.
"SELECT * FROM table2 ORDER BY spent;"
name|$ spent
----+-------
AA | 20
CC | 30
BB | 50
.
.
.
I want to display the results as two columns so I can visually get a feeling if the most frequent visitors are also the best buyers. (I know this example is bad DB design and not a real scenario. It is an example)
I want to get this:
name by visits|name by spent
--------------+-------------
AA | AA
BB | CC
CC | BB
I am using SQLite.
Select A.Name as NameByVisits, B.Name as NameBySpent
From (Select C.*, RowId as RowNumber From (Select Name From Table1 Order by visits) C) A
Inner Join
(Select D.*, RowId as RowNumber From (Select Name From Table2 Order by spent) D) B
On A.RowNumber = B.RowNumber
Try this
select
ISNULL(ts.rn,tv.rn),
spent.name,
visits.name
from
(select *, (select count(*) rn from spent s where s.value>=spent.value ) rn from spent) ts
full outer join
(select *, (select count(*) rn from visits v where v.visits>=visits.visits ) rn from visits) tv
on ts.rn = tv.rn
order by ISNULL(ts.rn,tv.rn)
It creates a rank for each entry in the source table, and joins the two on their rank. If there are duplicate ranks they will return duplicates in the results.
I know it is not a direct answer, but I was searching for it so in case someone needs it: this is a simpler solution for when the results are only one per column:
select
(select roleid from role where rolename='app.roles/anon') roleid, -- the name of the subselect will be the name of the column
(select userid from users where username='pepe') userid; -- same here
Result:
roleid | userid
--------------------------------------+--------------------------------------
31aa33c4-4e66-4da3-8525-42689e46e635 | 12ad8c95-fbef-4287-9834-7458a4b250ee
For RDBMS that support common table expressions and window functions (e.g., SQL Server, Oracle, PostreSQL), I would use:
WITH most_visited AS
(
SELECT ROW_NUMBER() OVER (ORDER BY num_visits) AS num, name, num_visits
FROM visits
),
most_spent AS
(
SELECT ROW_NUMBER() OVER (ORDER BY amt_spent) AS num, name, amt_spent
FROM spent
)
SELECT mv.name, ms.name
FROM most_visited mv INNER JOIN most_spent ms
ON mv.num = ms.num
ORDER BY mv.num
Just join table1 and table2 with name as key like bellow:
select a.name,
b.name,
a.NumOfVisitField,
b.TotalSpentField
from table1 a
left join table2 b on a.name = b.name

How to display the record with the highest value in Oracle?

I have 4 tables with the following structure:
Table artist:
artistID lastname firstname nationality dateofbirth datedcease
Table work:
workId title copy medium description artist ID
Table Trans:
TransactionID Date Acquired Acquistionprice datesold askingprice salesprice customerID workID
Table Customer:
customerID lastname Firstname street city state zippostalcode country areacode phonenumber email
First question is which artist has the most works of artsold and how many of the artist works have been sold.
My SQL query is this:
SELECT * From dtoohey.artist A1
INNER JOIN
(
SELECT COUNT(W1.ArtistID) AS COUNTER, artistID FROM dtoohey.trans T1
INNER JOIN dtoohey.work W1
ON W1.workid = T1.Workid
GROUP BY W1.artistID
) TEMP1
ON TEMP1.artistID = A1.artistID
WHERE A1.artistID = TEMP1.artistId
ORDER BY COUNTER desc;
I am to get the whole table but I only want show only the first row which is the highest count how do I do that??
I have tried inserting WHERE ROWNUM <=1 but it shows artist ID with 1
qns 2 is sales of which artist's work have resulted in the highest average profit (i.e) the average of the profits made on each sale of worksby an artist), and what is that amount.
My SQL query is:
SELECT A1.artistid, A1.firstname FROM
(
SELECT
(salesPrice - AcquisitionPrice) as profit,
w1.artistid as ArtistID
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
on W1.workid = T1.workid
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artistID = TEMP1.artistID
GROUP BY A1.artistid
HAVING MAX(PROFIT) = AVG(PROFIT);
I'm not able to execute it
I have tried query below but still not able to get it keep getting the error missing right parenthesis
SELECT A1.artistid, A1.firstname, TEMP1.avgProfit
FROM
(
SELECT
AVG(salesPrice - AcquisitionPrice) as avgProfit,
W1.artistid as artistid
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
ON W1.workid = T1.workid
GROUP BY artistid
ORDER BY avgProfit DESC
LIMIT 1
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artisid = TEMP1.artistid
Sometimes ORA-00907: missing right parenthesis means exactly that: we have a left bracket without a matching right one. But it can also be thrown by a syntax error in a part of a statement bounded by parentheses.
It's that second cause here: LIMIT is a Mysql command which Oracle does not recognise. You can use an analytic function here:
SELECT A1.artistid, A1.firstname, TEMP1.avgProfit
FROM
(
select artistid
, avgProfit
, rank() over (order by avgProfit desc) as rnk
from (
SELECT
AVG(salesPrice - AcquisitionPrice) as avgProfit,
W1.artistid as artistid
FROM dtoohey.trans T1
INNER JOIN dtoohey.WORK W1
ON W1.workid = T1.workid
GROUP BY artistid
)
) TEMP1
INNER JOIN dtoohey.artist A1
ON A1.artisid = TEMP1.artistid
where TEMP1.rnk = 1
This uses the RANK() function which will return more than one row if several artists achieve the same average profit. You might want to use ROW_NUMBER() instead. Analytic functions can be very powerful. Find out more.
You can apply ROWN_NUMBER(), RANK() and DENSE_RANK() to any top-n problem. You can use one of them to solve your first problem too.
"however the avg profit is null."
That's probably a data issue. If one of the numbers in (salesPrice - AcquisitionPrice) is null the result will be null, and won't be included in the average. If all the rows for an artist are null the AVG() will be null.
As it happens the sort order will put NULL last. But as the PARTITION BY clause sorts by AvgProfit desc that puts the NULL results at rank 1. The solution is to use the NULLS LAST in the windowing clause:
, rank() over (order by avgProfit desc nulls last) as rnk
This will guarantee you a non-null result at the top (providing at least one of your artists has values in both columns).
1st question - Oracle does not guarantee the order by which rows are retrieved. Hence you must first order and then limit the ordered set.
SELECT * from (
SELECT A1.* From dtoohey.artist A1
INNER JOIN
(
SELECT COUNT(W1.ArtistID) AS COUNTER, artistID FROM dtoohey.trans T1
INNER JOIN dtoohey.work W1
ON W1.workid = T1.Workid
GROUP BY W1.artistID
) TEMP1
ON TEMP1.artistID = A1.artistID
WHERE A1.artistID = TEMP1.artistId
ORDER BY COUNTER desc
) WHERE ROWNUM = 1
2nd question: I believe (haven't tested) that you have that LIMIT 1 wrong. That keyword is for use with Bulk collecting.