Data with more than one record

I have the following temporary table.
The aim is to flag the data with more than one record and put "More than one records" in nom_etp.
In my example below, if Siren appears more than once, I would have:
Siren     | ETS_RS                                | Voie                     | Ville             | nom_etp
348177155 | POITOU-CHARENTES ENGRAIS P.C.E. (SNC) | BOULEVARD WLADIMIR MORCH | 17000 LA ROCHELLE | More than one records
For records that appear only once, I want the single name of the company (here nom_etp):
Siren     | ETS_RS                        | Voie               | Ville             | nom_etp
344843347 | PRESTIGE AUTO ROCHELAIS (SAS) | 4 RUE JEAN DEMEOCQ | 17000 LA ROCHELLE | NIGER
I tried a few things, based on the idea that if I can get a count greater than one, I can easily flag those rows with a CASE.
First, I tried a count:
WITH cte_ssrep_moraux AS (...)
SELECT SIREN,ETS_RS,Voie,Ville
,Denomination AS nom_etp,COUNT(SIREN)
FROM cte_ssrep_moraux
GROUP BY ETS_RS,Voie,Ville,Denomination,SIREN
It hit a snag: all counts were equal to one, and I got the same dataset as in the picture...
Second:
WITH cte_ssrep_moraux AS (...)
SELECT ETS_RS,Voie,Ville
,Denomination AS nom_etp,SIREN,
RANK() OVER (PARTITION BY ETS_RS ORDER BY ETS_RS ASC) AS xx
FROM cte_ssrep_moraux
GROUP BY ETS_RS,Voie,Ville,Denomination,SIREN
It hit a snag again: all ranks were equal to one, and I got the same dataset as in the picture...
I'm a bit confused about what I should do next. I have the feeling this will be an easy one and I'll facepalm myself.
Many thanks for reading my question.

If this is your criterion:
if Siren appears more than once,
then the GROUP BY clause should contain only Siren:
SELECT SIREN, COUNT(*)
FROM cte_ssrep_moraux
GROUP BY SIREN
HAVING COUNT(*) > 1;
I'm not sure what you want to do after that, but this will return the SIREN values that appear more than once.
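To see the GROUP BY / HAVING idea in action, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; this particular query is the same in both). The sample rows, including the Denomination values "Name A" and "Name B", are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for cte_ssrep_moraux (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cte_ssrep_moraux (SIREN TEXT, ETS_RS TEXT, Denomination TEXT)")
conn.executemany(
    "INSERT INTO cte_ssrep_moraux VALUES (?, ?, ?)",
    [
        ("348177155", "POITOU-CHARENTES ENGRAIS P.C.E. (SNC)", "Name A"),
        ("348177155", "POITOU-CHARENTES ENGRAIS P.C.E. (SNC)", "Name B"),
        ("344843347", "PRESTIGE AUTO ROCHELAIS (SAS)", "NIGER"),
    ],
)

# Grouping on SIREN alone puts every row that shares a SIREN into one group,
# so COUNT(*) can exceed 1 (grouping on all columns would keep each row separate).
duplicates = conn.execute(
    """
    SELECT SIREN, COUNT(*)
    FROM cte_ssrep_moraux
    GROUP BY SIREN
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)
```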

If there is more than one row and you change every nom_etp to 'more than one record', you end up with identical rows. That's why I prepared a tweaked query. See the following (table simplified for clarity):
CREATE TABLE Duplicates
(
Id int,
Name varchar(20),
Item varchar(20)
)
INSERT Duplicates VALUES
(1,'Name1', 'Item1'),
(2,'Name2', 'Item2'),
(2,'Name2', 'Item3'),
(3,'Name3', 'Item4'),
(3,'Name3', 'Item5'),
(3,'Name3', 'Item6'),
(4,'Name4', 'Item7');
If you need just a query:
WITH Numbered AS
(
SELECT Id, Name, Item,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Id) RowNum,
COUNT(*) OVER (PARTITION BY Id) TotalInGroup
FROM Duplicates
)
SELECT Id, Name,
CASE WHEN TotalInGroup>1 THEN 'More records' ELSE Item END Item
FROM Numbered
-- keep only the first row of each Id group
WHERE RowNum=1
If you need to normalize:
WITH Numbered AS
(
SELECT Id, Name, Item,
ROW_NUMBER() OVER (ORDER BY Id) Number,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Id) RowNum,
COUNT(*) OVER (PARTITION BY Id ORDER BY ID) TotalInGroup
FROM Duplicates
)
MERGE Numbered AS tgt
USING Numbered AS src
ON src.Number=tgt.Number
WHEN MATCHED AND tgt.RowNum=1 AND tgt.TotalInGroup>1 THEN
UPDATE SET tgt.Item='More'
WHEN MATCHED AND tgt.RowNum>1 THEN
DELETE;
Table will look like below:
Id Name Item
-- ---- ----
1 Name1 Item1
2 Name2 More
3 Name3 More
4 Name4 Item7
If there are multiple rows with the same Id, the first of them is updated with the constant 'More' and all the others in the group are deleted.
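SQLite has no MERGE, so this sketch (Python's sqlite3 module as a stand-in for SQL Server, an assumption) only runs the read-only variant of the idea: number the rows per Id, count them, keep RowNum = 1 and replace Item with 'More' when the group has more than one row. It checks that this yields the same four rows as the table above. Ordering by Item inside ROW_NUMBER() is a deterministic tiebreaker chosen here, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Duplicates (Id INTEGER, Name TEXT, Item TEXT)")
conn.executemany(
    "INSERT INTO Duplicates VALUES (?, ?, ?)",
    [
        (1, "Name1", "Item1"),
        (2, "Name2", "Item2"),
        (2, "Name2", "Item3"),
        (3, "Name3", "Item4"),
        (3, "Name3", "Item5"),
        (3, "Name3", "Item6"),
        (4, "Name4", "Item7"),
    ],
)

# Window functions require SQLite 3.25 or later.
collapsed = conn.execute(
    """
    WITH Numbered AS (
        SELECT Id, Name, Item,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Item) AS RowNum,
               COUNT(*) OVER (PARTITION BY Id) AS TotalInGroup
        FROM Duplicates
    )
    SELECT Id, Name,
           CASE WHEN TotalInGroup > 1 THEN 'More' ELSE Item END AS Item
    FROM Numbered
    WHERE RowNum = 1          -- one row per Id
    ORDER BY Id
    """
).fetchall()
for row in collapsed:
    print(row)
```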

Use a CTE for this purpose:
;WITH CTE AS(
SELECT ETS_RS,Voie,Ville,Denomination AS nom_etp,SIREN,
ROW_NUMBER() OVER (PARTITION BY ETS_RS ORDER BY ETS_RS ASC) AS RN
FROM cte_ssrep_moraux
--GROUP BY ETS_RS,Voie,Ville,Denomination,SIREN
)
SELECT ETS_RS,
Voie,Ville,
CASE WHEN RN > 1 THEN 'More than one records'
ELSE nom_etp
END AS 'nom_etp',
SIREN
FROM CTE
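To see what this query returns, here is a sketch using Python's sqlite3 module as a stand-in for SQL Server (the rows and the names "Name A" / "Name B" are hypothetical). Because ROW_NUMBER() numbers the rows 1, 2, ... within each ETS_RS, the first row of a duplicated group keeps its own nom_etp and only the later rows receive the flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cte_ssrep_moraux "
    "(ETS_RS TEXT, Voie TEXT, Ville TEXT, Denomination TEXT, SIREN TEXT)"
)
poitou = ("POITOU-CHARENTES ENGRAIS P.C.E. (SNC)", "BOULEVARD WLADIMIR MORCH", "17000 LA ROCHELLE")
conn.executemany(
    "INSERT INTO cte_ssrep_moraux VALUES (?, ?, ?, ?, ?)",
    [
        (*poitou, "Name A", "348177155"),
        (*poitou, "Name B", "348177155"),
        ("PRESTIGE AUTO ROCHELAIS (SAS)", "4 RUE JEAN DEMEOCQ", "17000 LA ROCHELLE", "NIGER", "344843347"),
    ],
)

rows = conn.execute(
    """
    WITH CTE AS (
        SELECT ETS_RS, Voie, Ville, Denomination AS nom_etp, SIREN,
               ROW_NUMBER() OVER (PARTITION BY ETS_RS ORDER BY ETS_RS ASC) AS RN
        FROM cte_ssrep_moraux
    )
    SELECT ETS_RS, Voie, Ville,
           CASE WHEN RN > 1 THEN 'More than one records' ELSE nom_etp END AS nom_etp,
           SIREN
    FROM CTE
    """
).fetchall()

# Labels produced for the duplicated SIREN: one keeps its name, one is flagged.
dup_labels = [r[3] for r in rows if r[4] == "348177155"]
flagged = dup_labels.count("More than one records")
print(len(rows), flagged)
```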

;with cte
as
(
select siren, count(*) as cnt
from yourtable
group by siren
having count(*) > 1
)
update t
set nom_etp = 'more than one records'
from yourtable t
where exists (select 1 from cte c where c.siren = t.siren);
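The UPDATE can be tried in SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite does not accept the T-SQL `UPDATE t ... FROM yourtable t` alias form, so this version refers to the target table by name inside the EXISTS subquery, and the rows are hypothetical. Note that, unlike the SELECT-based answers, this permanently changes the stored nom_etp values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (siren TEXT, nom_etp TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [("348177155", "Name A"), ("348177155", "Name B"), ("344843347", "NIGER")],
)

# The CTE lists the sirens that occur more than once; EXISTS matches them.
conn.execute(
    """
    WITH cte AS (
        SELECT siren, COUNT(*) AS cnt
        FROM yourtable
        GROUP BY siren
        HAVING COUNT(*) > 1
    )
    UPDATE yourtable
    SET nom_etp = 'More than one records'
    WHERE EXISTS (SELECT 1 FROM cte c WHERE c.siren = yourtable.siren)
    """
)
conn.commit()

updated = conn.execute("SELECT siren, nom_etp FROM yourtable ORDER BY siren, nom_etp").fetchall()
print(updated)
```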

Since you still want all the records, including the unique ones, you can use COUNT as a window function, with a CASE to choose what to display as nom_etp:
select Siren, ETS_RS, Voie, Ville,
(case when count(*) over (partition by Siren) > 1 then 'More than one records' else nom_etp end) as nom_etp
from cte_ssrep_moraux;
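A minimal check of the window-function approach, again using Python's sqlite3 module as a stand-in for SQL Server (hypothetical rows; window functions need SQLite 3.25 or later). Every row is kept, and both rows that share a Siren get the flag, while the unique Siren keeps its own name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cte_ssrep_moraux (Siren TEXT, ETS_RS TEXT, Voie TEXT, Ville TEXT, nom_etp TEXT)"
)
poitou = ("POITOU-CHARENTES ENGRAIS P.C.E. (SNC)", "BOULEVARD WLADIMIR MORCH", "17000 LA ROCHELLE")
conn.executemany(
    "INSERT INTO cte_ssrep_moraux VALUES (?, ?, ?, ?, ?)",
    [
        ("348177155", *poitou, "Name A"),
        ("348177155", *poitou, "Name B"),
        ("344843347", "PRESTIGE AUTO ROCHELAIS (SAS)", "4 RUE JEAN DEMEOCQ", "17000 LA ROCHELLE", "NIGER"),
    ],
)

# COUNT(*) OVER (PARTITION BY Siren) gives each row the size of its Siren group
# without collapsing the rows the way GROUP BY would.
rows = conn.execute(
    """
    SELECT Siren, ETS_RS, Voie, Ville,
           (CASE WHEN COUNT(*) OVER (PARTITION BY Siren) > 1
                 THEN 'More than one records' ELSE nom_etp END) AS nom_etp
    FROM cte_ssrep_moraux
    ORDER BY Siren
    """
).fetchall()

labels = [(r[0], r[4]) for r in rows]
print(labels)
```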

Here is what I did:
WITH cte_ssrep_moraux AS (
SELECT SIREN,ETS_RS,Voie,Ville
,Denomination AS nom_etp,ROW_NUMBER()
OVER (PARTITION BY ETS_RS ORDER BY ETS_RS ASC) AS Counting
FROM
(my_initial_cte) AS tb
)
SELECT Siren, ETS_RS, Voie, Ville,nom_etp
FROM cte_ssrep_moraux
WHERE counting = 1
AND Siren NOT IN (SELECT Siren FROM cte_ssrep_moraux WHERE counting > 1)
UNION ALL
SELECT DISTINCT Siren, ETS_RS, Voie, Ville,'More than one records'
FROM cte_ssrep_moraux
WHERE counting > 1
Explanation: after the initial CTE, I tried many of the solutions mentioned above, especially the ones using CASE.
The issue with the CASE was that it produced something like this:
Siren | ETS_RS | Voie  | Ville    | nom_etp
xxxx  | xyxy   | xyzet | Bordeaux | More than one records
xxxx  | xyxy   | xyzet | Bordeaux | More than one records
xxxx  | xyxy   | xyzet | Bordeaux | More than one records
xxxy  | zzzy   | ssare | Paris    | Firm ABC
So instead of putting everything under a CASE, I split the query into two parts:
The first part returns every row with a counting equal to 1 whose Siren has no other rows.
The second part returns the rest (counting above 1) with a DISTINCT.
The two results are then combined with UNION ALL, since both sets have the same columns.
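The repeated rows shown above come from the fact that CASE only changes the nom_etp value, not the number of rows. One alternative to the UNION ALL split (not what was used above) is to put DISTINCT on top of the COUNT(*) OVER query, since window functions are evaluated before DISTINCT. This is a sketch in Python's sqlite3 with hypothetical firm names ("Firm 1" to "Firm 3"); it only collapses rows whose Siren, ETS_RS, Voie and Ville are all identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cte_ssrep_moraux (Siren TEXT, ETS_RS TEXT, Voie TEXT, Ville TEXT, nom_etp TEXT)"
)
conn.executemany(
    "INSERT INTO cte_ssrep_moraux VALUES (?, ?, ?, ?, ?)",
    [
        ("xxxx", "xyxy", "xyzet", "Bordeaux", "Firm 1"),
        ("xxxx", "xyxy", "xyzet", "Bordeaux", "Firm 2"),
        ("xxxx", "xyxy", "xyzet", "Bordeaux", "Firm 3"),
        ("xxxy", "zzzy", "ssare", "Paris", "Firm ABC"),
    ],
)

# The window COUNT is computed per row first; DISTINCT then merges the
# now-identical flagged rows into one.
result = conn.execute(
    """
    SELECT DISTINCT Siren, ETS_RS, Voie, Ville,
           CASE WHEN COUNT(*) OVER (PARTITION BY Siren) > 1
                THEN 'More than one records' ELSE nom_etp END AS nom_etp
    FROM cte_ssrep_moraux
    ORDER BY Siren
    """
).fetchall()
for row in result:
    print(row)
```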

Related

How to find Max value in a column in SQL Server 2012

I want to find the max value in a column
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
2 1 10 P2
3 2 50 P2
4 2 80 P1
Above is my table structure. I just want to find the max total value from the table. Of those four rows, IDs 1 and 2 have the same CName value but different Tot_Val and PName values. What I expect is to find the max value among IDs 1 and 2 (and likewise among IDs 3 and 4).
Expected result:
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
4 2 80 P1
I need the result to be the same as shown above.
select Max(Tot_Val), CName
from table1
where PName in ('P1', 'P2')
group by CName
This is the query I have tried, but my problem is that I am not able to bring PName into the result. If I add PName to the select list and the GROUP BY list, the rows multiply: e.g. the result is 100 rows, but when I add PName it shows 600 rows. That is the problem.
Can someone please help me resolve this?
One possible option is to use a subquery. Give each row a number within each CName group ordered by Tot_Val. Then select the rows with a row number equal to one.
select x.*
from ( select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt ) x
where x.No = 1;
An alternative would be to use a common table expression (CTE) instead of a subquery to isolate the first result set.
with x as
(
select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt
)
select x.*
from x
where x.No = 1;
See both solutions in action in this fiddle.
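The ROW_NUMBER approach can be checked on the sample data from the question using Python's sqlite3 module as a stand-in for SQL Server (an assumption; window functions need SQLite 3.25 or later). The alias No is renamed rn here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, CName INTEGER, Tot_Val INTEGER, PName TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 1, 100, "P1"), (2, 1, 10, "P2"), (3, 2, 50, "P2"), (4, 2, 80, "P1")],
)

# Number rows within each CName from highest Tot_Val down, then keep number 1.
top_rows = conn.execute(
    """
    WITH x AS (
        SELECT mt.ID, mt.CName, mt.Tot_Val, mt.PName,
               ROW_NUMBER() OVER (PARTITION BY mt.CName ORDER BY mt.Tot_Val DESC) AS rn
        FROM MyTable mt
    )
    SELECT ID, CName, Tot_Val, PName
    FROM x
    WHERE rn = 1
    ORDER BY CName
    """
).fetchall()
print(top_rows)
```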
You can search for "top-n-per-group" for this kind of query.
There are two common ways to do it. The most efficient method depends on your indexes and data distribution and whether you already have another table with the list of all CName values.
Using ROW_NUMBER
WITH
CTE
AS
(
SELECT
ID, CName, Tot_Val, PName,
ROW_NUMBER() OVER (PARTITION BY CName ORDER BY Tot_Val DESC) AS rn
FROM table1
)
SELECT
ID, CName, Tot_Val, PName
FROM CTE
WHERE rn=1
;
Using CROSS APPLY
WITH
CTE
AS
(
SELECT CName
FROM table1
GROUP BY CName
)
SELECT
A.ID
,A.CName
,A.Tot_Val
,A.PName
FROM
CTE
CROSS APPLY
(
SELECT TOP(1)
table1.ID
,table1.CName
,table1.Tot_Val
,table1.PName
FROM table1
WHERE
table1.CName = CTE.CName
ORDER BY
table1.Tot_Val DESC
) AS A
;
See a very detailed answer on dba.se, Retrieving n rows per group, or here, Get top 1 row of each group.
CROSS APPLY might be as fast as a correlated subquery, but the correlated subquery often has very good performance (and often better than ROW_NUMBER()):
select t.*
from t
where t.tot_val = (select max(t2.tot_val)
from t t2
where t2.cname = t.cname
);
Note: The performance depends on having an index on (cname, tot_val).
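To illustrate how the correlated subquery treats ties, this sketch adds one hypothetical row (ID 5) that ties with ID 4 on Tot_Val. Unlike filtering on ROW_NUMBER() = 1, the subquery returns every row equal to its group's maximum. It uses Python's sqlite3 module as a stand-in for SQL Server and creates the (cname, tot_val) index mentioned in the note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, CName INTEGER, Tot_Val INTEGER, PName TEXT)")
conn.execute("CREATE INDEX ix_t_cname_totval ON t (CName, Tot_Val)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, "P1"),
        (2, 1, 10, "P2"),
        (3, 2, 50, "P2"),
        (4, 2, 80, "P1"),
        (5, 2, 80, "P3"),  # hypothetical tie with ID 4
    ],
)

# Keep a row when its Tot_Val equals the maximum of its own CName group.
max_rows = conn.execute(
    """
    SELECT t.*
    FROM t
    WHERE t.Tot_Val = (SELECT MAX(t2.Tot_Val)
                       FROM t t2
                       WHERE t2.CName = t.CName)
    ORDER BY t.ID
    """
).fetchall()
print(max_rows)
```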

Remove duplicate records except the first record in SQL

I want to remove all duplicate records except the first one.
For example:
NAME
R
R
rajesh
YOGESH
YOGESH
Now in the above I want to remove the second "R" and the second "YOGESH".
I have only one column whose name is "NAME".
Use a CTE (I have several of these in production).
;WITH duplicateRemoval as (
SELECT
[name]
,ROW_NUMBER() OVER(PARTITION BY [name] ORDER BY [name]) ranked
from #myTable
)
DELETE
FROM duplicateRemoval
WHERE ranked > 1;
Explanation: the CTE grabs all of your records and numbers the rows within each group of identical names: the first occurrence gets 1 and each additional duplicate gets the next number, so WHERE ranked > 1 deletes every copy except the first. Replace the DELETE with a SELECT * to see what it does.
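SQLite, used here through Python's sqlite3 module as a stand-in, cannot DELETE through a CTE the way SQL Server can, so this sketch swaps in a different mechanism: it still ranks rows with ROW_NUMBER() in a CTE, but deletes by SQLite's rowid. "First" is assumed to mean insertion order (lowest rowid), and the name myTable replaces the temp table #myTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (name TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?)",
    [("R",), ("R",), ("rajesh",), ("YOGESH",), ("YOGESH",)],
)

# Rank each copy of a name by insertion order, then delete every copy ranked above 1.
conn.execute(
    """
    WITH duplicateRemoval AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY rowid) AS ranked
        FROM myTable
    )
    DELETE FROM myTable
    WHERE rowid IN (SELECT rid FROM duplicateRemoval WHERE ranked > 1)
    """
)
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT name FROM myTable ORDER BY rowid")]
print(remaining)
```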
Seems like a simple distinct modifier would do the trick:
SELECT DISTINCT name
FROM mytable
This is bigger code, but it works well when you want to skip the original row and return only the duplicate rows:
select majorTable.RowID, majorTable.Name, majorTable.Value
from (
    select outerTable.Name, outerTable.Value, outerTable.RowID,
           ROW_NUMBER() over (partition by outerTable.Name, outerTable.Value order by outerTable.RowID) as RowNo
    from #Your_Table outerTable
    inner join (
        select Name, Value, COUNT(*) as duplicateRows
        from #Your_Table
        group by Name, Value
        having COUNT(*) > 1
    ) innerTable
        on innerTable.Name = outerTable.Name
       and innerTable.Value = outerTable.Value
) majorTable
where majorTable.RowNo <> 1

SQL Separating Distinct Values using single column

Does anyone happen to know a way of basically applying the DISTINCT keyword to a single column only? For example, something similar to this:
Select (Distinct ID), Name, Term from Table
It would get rid of rows with duplicate IDs but still return the other columns' information. I would use DISTINCT on the full query, but the rows are all different because of the data in certain columns. I also need to output only the topmost term between the two duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
select t.Id, t.Name, min(t.Term) as Term
from Table t
group by t.Id, t.Name
order by t.Id
You can use ROW_NUMBER for this:
Select ID, Name, Term from (
Select ID, Name, Term, ROW_NUMBER()
OVER (PARTITION BY ID ORDER BY Term) as rn from Table
) as tbl
Where rn = 1
Order by determines the order from which the first row will be picked.
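A sketch of the ROW_NUMBER answer on the sample data, using Python's sqlite3 module as a stand-in and a hypothetical table name Terms (since Table is a reserved word). Ordering by Term inside the window makes "A" the row kept whenever an ID has both terms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Terms (ID INTEGER, Name TEXT, Term TEXT)")
conn.executemany(
    "INSERT INTO Terms VALUES (?, ?, ?)",
    [
        (1, "Suzy", "A"), (1, "Suzy", "B"),
        (2, "John", "A"), (2, "John", "B"),
        (3, "Pete", "A"), (4, "Carl", "A"), (5, "Sally", "B"),
    ],
)

# The filter on rn must sit outside the subquery, where the alias exists.
first_per_id = conn.execute(
    """
    SELECT ID, Name, Term
    FROM (
        SELECT ID, Name, Term,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Term) AS rn
        FROM Terms
    ) AS tbl
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()
print(first_per_id)
```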

SQL Query to obtain the maximum value for each unique value in another column

ID Sum Name
a 10 Joe
a 8 Mary
b 21 Kate
b 110 Casey
b 67 Pierce
What would you recommend as the best way to obtain, for each ID, the name that corresponds to the largest sum (grouping by ID)?
What I tried so far:
select ID, SUM(Sum) s, Name
from Table1
group by ID, Name
Order by SUM(Sum) DESC;
This will arrange the records into groups, with the highest sum first. Then I have to somehow flag those records and keep only those. Any tips or pointers? Thanks a lot!
In the end I'd like to obtain:
a 10 Joe
b 110 Casey
You want the row_number() function:
select id, [sum], name
from (select t.*,
row_number() over (partition by id order by [sum] desc) as seqnum
from table1
) t
where seqnum = 1;
Your question is more confusing than it needs to be because you have a column called sum. You should avoid using SQL reserved words for identifiers.
The row_number() function assigns a sequential number to a group of rows, starting with 1. The group is defined by the partition by clause. In this case, all rows with the same id are in the same group. The ordering of the numbers is determined by the order by clause, so the one with the largest value of sum gets the value of 1.
If you might have duplicate maximum values and you want all of them, use the related function rank() or dense_rank().
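To compare row_number() with rank() when the maximum is tied, this sketch adds a hypothetical row ('a', 10, 'Ann') that ties with Joe. It uses Python's sqlite3 module as a stand-in for SQL Server; the window function name is interpolated only from the two fixed values passed below, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (ID TEXT, "Sum" INTEGER, Name TEXT)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("a", 10, "Joe"), ("a", 8, "Mary"),
        ("b", 21, "Kate"), ("b", 110, "Casey"), ("b", 67, "Pierce"),
        ("a", 10, "Ann"),  # hypothetical tie with Joe
    ],
)


def top_per_id(func):
    """Return the rows ranked 1 per ID by the given window function."""
    return conn.execute(
        f"""
        SELECT ID, "Sum", Name
        FROM (
            SELECT t.*, {func}() OVER (PARTITION BY ID ORDER BY "Sum" DESC) AS seqnum
            FROM table1 t
        ) x
        WHERE seqnum = 1
        ORDER BY ID, Name
        """
    ).fetchall()


rn = top_per_id("ROW_NUMBER")   # exactly one row per ID; tie broken arbitrarily
rk = top_per_id("RANK")         # every row tied for first place
print(rn)
print(rk)
```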
select *
from
(
select *
,rn = row_number() over (partition by Id order by [sum] desc)
from table1
)x
where x.rn=1

Oracle query - select top records

Assuming the following table:
ID Name Revision
--- ----- --------
1 blah 0
2 yada 1
3 blah 1
4 yada 0
5 blah 2
6 blah 3
How do I get the two records, one for "blah" and one for "yada" with highest revision number (3 for blah and 1 for yada)? Something like:
ID Name Revision
--- ----- --------
6 blah 3
2 yada 1
Also, once these records are retrieved, how do I get the rest, ordered by name and revision?
I am trying to create a master-detail view where master records are latest revisions and details include the previous revisions.
Basically, with the aggregate function MAX():
SELECT "Name", MAX("Revision") AS max_revision
FROM tbl
WHERE "Name" IN ('blah', 'yada')
GROUP BY "Name"
ORDER BY "Name"; -- ordering by revision would be pointless
If you need more columns from the row, there are several ways. One would be to join the above subquery back to the base table:
SELECT t.*
FROM (
SELECT "Name", max("Revision") AS max_revision
FROM tbl
WHERE "Name" IN ('blah', 'yada')
GROUP BY "Name"
) sub
JOIN tbl t ON t."Revision" = sub.max_revision
AND t."Name" = sub."Name"
ORDER BY t."Name";
Generally, this has the potential to yield more than one row per "Name" - if "Revision" is not unique (per "Name"). You would have to define how to pick one from a group of peers sharing the same maximum "Revision" - a tiebreaker.
Another way would be with NOT EXISTS, excluding rows that have greater peers, possibly faster:
SELECT t.*
FROM tbl t
WHERE "Name" IN ('blah', 'yada')
AND NOT EXISTS (
SELECT 1
FROM tbl t1
WHERE t1."Name" = t."Name"
AND t1."Revision" > t."Revision"
)
ORDER BY "Name";
Or you could use a CTE with an analytic function (window function):
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER(PARTITION BY t."Name" ORDER BY t."Revision" DESC) AS rn
FROM tbl t
WHERE "Name" IN ('blah', 'yada')
)
SELECT *
FROM cte
WHERE rn = 1;
The last one is slightly different: one row per "Name" is guaranteed. If you don't use more ORDER BY items, an arbitrary row will be picked in case of a tie. If you want all peers use RANK() instead.
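The NOT EXISTS variant can be tried on the question's data with Python's sqlite3 module standing in for Oracle (an assumption; SQLite accepts the same quoted identifiers). Flipping NOT EXISTS to EXISTS returns the remaining, older revisions, which addresses the follow-up about the detail rows; ordering them by revision descending is a choice made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl (ID INTEGER, "Name" TEXT, "Revision" INTEGER)')
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "blah", 0), (2, "yada", 1), (3, "blah", 1), (4, "yada", 0), (5, "blah", 2), (6, "blah", 3)],
)

# Master rows: no other row with the same name has a higher revision.
latest = conn.execute(
    """
    SELECT t.*
    FROM tbl t
    WHERE t."Name" IN ('blah', 'yada')
      AND NOT EXISTS (SELECT 1 FROM tbl t1
                      WHERE t1."Name" = t."Name" AND t1."Revision" > t."Revision")
    ORDER BY t."Name"
    """
).fetchall()

# Detail rows: some other row with the same name has a higher revision.
rest = conn.execute(
    """
    SELECT t.*
    FROM tbl t
    WHERE t."Name" IN ('blah', 'yada')
      AND EXISTS (SELECT 1 FROM tbl t1
                  WHERE t1."Name" = t."Name" AND t1."Revision" > t."Revision")
    ORDER BY t."Name", t."Revision" DESC
    """
).fetchall()
print(latest)
print(rest)
```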
This approach will select the rows for each Name with the maximum revision number for that Name. The result will be the exact output you were looking for in your post.
SELECT *
FROM tbl a
WHERE a.revision = (select max(revision) from tbl where name = a.name)
ORDER BY a.name
In Oracle, you can use the LAST function to simplify this.
select max(id) keep (dense_rank last order by revision),
name,
max(revision)
from table
group by name;