Select last duplicate row with different id Oracle 11g - sql

I have a table that look like this:
The problem is I need to get the last record with duplicates in the column "NRODENUNCIA".

You can use MAX(DENUNCIAID), along with GROUP BY... HAVING to find the duplicates and select the row with the largest DENUNCIAID:
SELECT MAX(DENUNCIAID), NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
HAVING COUNT(1) > 1
This will only show rows that have at least one duplicate. If you want to see non-duplicate rows too, just remove the HAVING COUNT(1) > 1

There are a number of solutions for your problem. One is to use row_number.
Note that I've ordered by DENUNCIID in the OVER clause. This defines the "Last Record" as the one that has the largest DENUNCIID. If you want to define it differently you'd need to change the field that is being ordered.
with dupes as (
SELECT
ROW_NUMBER() OVER (Partition by NRODENUNCIA ORDER BY DENUNCIID DESC) RN,
*
FROM
YourTable
)
SELECT * FROM dupes where rn = 1
This only get's the last record per dupe.
If you want to only include records that have dupes then you change the where clause to
WHERE rn =1
and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)

Related

Getting MAX of a column and adding one more

I'm trying to make an SQL query that returns the greatest number from a column and its respective id.
For more information I have two columns ID and NUMBER. Both of them have 2 entries and I want to get the highest number with the ID next to it. This is what I tried but didn't success.
SELECT ID, MAX(NUMBER) AS MAXNUMB
FROM TABLE1
GROUP BY ID, MAXNUMB;
The problem I'm experiencing is that it just shows ALL the entries and if I add a "where" expression it just shows the same (all entries [ids+numbers]).
Pd.: Yes, I got what I wanted but only with one column (number) if I add another column (ID) to select it "brokes".
Try:
SELECT
ID,
A_NUMBER
FROM TABLE1
WHERE A_NUMBER = (
SELECT MAX(A_NUMBER)
FROM TABLE1);
Presuming you want the IDs* of the row with the highest number (and not, instead, the highest number for each ID -- if IDs were not unique in your table, for example).
* there may be more than one ID returned if there are two or more IDs with equal maximum numbers
you can try this
Select ID,maxNumber
From
(
SELECT
ID,
(Select Max(NUMBER) from Tmp where Id = t.Id) maxNumber
FROM
Tmp t
)T1
Group By ID,maxNumber
The query you posted has an illegal column name (number) and is group by the alias for the max value, which is illegal and also doesn't make sense; and you can't include the unaliased max() within the group-by either. So it's likely you're actually doing something like:
select id, max(numb) as maxnumb
from table1
group by id;
which will give one row per ID, with the maximum numb (which is the new name I've made up for your numeric column) for each ID. Or as you said you get "ALL the entries" you might have group by id, numb, which would show all rows from the table (unless there are duplicate combinations).
To get the maximum numb and the corresponding id you could group by id only, order by descending maxnumb, and then return the first row only:
select id, max(numb) as maxnumb
from table1
group by id
order by maxnumb desc
fetch first 1 row only
If there are two ID with the same maxnumb then you would only get one of them - and which one is indeterminate unless you modify the order by - but in that case you might prefer to use first 1 row with ties to see them all.
You could achieve the same thing with a subquery and analytic function to generating a ranking, and have the outer query return the highest-ranking row(s):
select id, numb as maxnumb
from (
select id, numb, dense_rank() over (order by numb desc) as rnk
from table1
)
where rnk = 1
You could also use keep to get the same result as first 1 row only:
select max(id) keep (dense_rank last order by numb) as id, max(numb) as maxnumb
from table1
fiddle

Removing duplicate rows based on one column same values but keep one record

SQL Server Version
Remove all dupe rows (row 3 thru 18) with service_date = '2018-08-29 13:05:00.000' but keep the oldest row (row 2) and of course keep row 1 since its different service_date. Don't mind the create_timestamp or document_file since it's the same customer. Any idea?
In SQL Server, we can try deleting using a CTE:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY service_date ORDER BY create_timestamp) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;
The strategy here is to assign a row number to each group of records sharing the same service_date, with 1 being assigned to the oldest record in that group. Then, we can phrase the delete by just targeting all records which have a row number greater than 1.
You don't need to use Partition function.please use the below query for efficient performance.i have tested its working fine.
with result as
(
select *, row_number() over(order by create_timestamp) as Row_To_Delete from TableName
)
delete from result where result.Row_To_Delete>2
I think you will want to remove these data per customer basis
I mean, if customers are different you will want to keep the entries even on the same date
If you you will require the addition of Customer column in partition by clause used to identify duplicate rows in SQL
By copying and modifying Tim's solution, you can check following
;WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer, service_date ORDER BY create_timestamp) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;

BigQuery - Select only first row in BigQuery

I have a table with data where in Column A I have groups of repeating Data (one after another).
I want to select only first row of each group based on values in column A only (no other criteria). Mind you, I want all corresponding columns selected also for the mentioned new found row (I don't want to exclude them).
Can someone help me with a proper query.
Here is a sample:
SAMPLE
Thanks!
#standardSQL
SELECT row.*
FROM (
SELECT ARRAY_AGG(t LIMIT 1)[OFFSET(0)] row
FROM `project.dataset.table` t
GROUP BY columnA
)
you can try smth like this:
#standardSQL
SELECT
* EXCEPT(rn)
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY columnA ORDER BY columnA) AS rn
FROM
your_dataset.your_table)
WHERE rn = 1
that will return:
Row columnA col2 ...
1 AC1001 Z_Creation
2 ACO112BISPIC QN
...
Add LIMIT 1 at the end of the query
something like
SELECT name, year FROM person_table ORDER BY year LIMIT 1
You can now use qualify for a more concise solution:
select
*
from
your_dataset.your_table
where true
qualify ROW_NUMBER() OVER(PARTITION BY columnA ORDER BY columnA) = 1
In BigQuery the physical sequence of rows is not significant. “BigQuery does not guarantee a stable ordering of rows in a table. Only the result of a query with an explicit ORDER BY clause has well-defined ordering.”[1].
First, you need to define which property will determine the first row of your group, then you can run Vasily Bronsky’s query by changing ORDER BY with that property. Which means either you should add another column to the table to store the order of the rows or select one from the columns you have.

Remove duplicate records except the first record in SQL

I want to remove all duplicate records except the first one.
Like :
NAME
R
R
rajesh
YOGESH
YOGESH
Now in the above I want to remove the second "R" and the second "YOGESH".
I have only one column whose name is "NAME".
Use a CTE (I have several of these in production).
;WITH duplicateRemoval as (
SELECT
[name]
,ROW_NUMBER() OVER(PARTITION BY [name] ORDER BY [name]) ranked
from #myTable
ORDER BY name
)
DELETE
FROM duplicateRemoval
WHERE ranked > 1;
Explanation: The CTE will grab all of your records and apply a row number for each unique entry. Each additional entry will get an incrementing number. Replace the DELETE with a SELECT * in order to see what it does.
Seems like a simple distinct modifier would do the trick:
SELECT DISTINCT name
FROM mytable
This is bigger code but it works perfectly where you don't take the original row but find all the duplicate Rows
select majorTable.RowID,majorTable.Name,majorTable.Value from
(select outerTable.Name, outerTable.Value, RowID, ROW_NUMBER()
over(partition by outerTable.Name,outerTable.Value order by RowID)
as RowNo from #Your_Table outerTable inner join
(select Name, Value,COUNT(*) as duplicateRows FROM #Your_Table group by Name, Value
having COUNT(*)>1)innerTable on innerTable.Name = outerTable.Name
and innerTable.Value = outerTable.Value)majorTable where MajorTable.ROwNo <>1

How do I use ROW_NUMBER()?

I want to use the ROW_NUMBER() to get...
To get the max(ROW_NUMBER()) --> Or i guess this would also be the count of all rows
I tried doing:
SELECT max(ROW_NUMBER() OVER(ORDER BY UserId)) FROM Users
but it didn't seem to work...
To get ROW_NUMBER() using a given piece of information, ie. if I have a name and I want to know what row the name came from.
I assume it would be something similar to what I tried for #1
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
but this didn't work either...
Any Ideas?
For the first question, why not just use?
SELECT COUNT(*) FROM myTable
to get the count.
And for the second question, the primary key of the row is what should be used to identify a particular row. Don't try and use the row number for that.
If you returned Row_Number() in your main query,
SELECT ROW_NUMBER() OVER (Order by Id) AS RowNumber, Field1, Field2, Field3
FROM User
Then when you want to go 5 rows back then you can take the current row number and use the following query to determine the row with currentrow -5
SELECT us.Id
FROM (SELECT ROW_NUMBER() OVER (ORDER BY id) AS Row, Id
FROM User ) us
WHERE Row = CurrentRow - 5
Though I agree with others that you could use count() to get the total number of rows, here is how you can use the row_count():
To get the total no of rows:
with temp as (
select row_number() over (order by id) as rownum
from table_name
)
select max(rownum) from temp
To get the row numbers where name is Matt:
with temp as (
select name, row_number() over (order by id) as rownum
from table_name
)
select rownum from temp where name like 'Matt'
You can further use min(rownum) or max(rownum) to get the first or last row for Matt respectively.
These were very simple implementations of row_number(). You can use it for more complex grouping. Check out my response on Advanced grouping without using a sub query
If you need to return the table's total row count, you can use an alternative way to the SELECT COUNT(*) statement.
Because SELECT COUNT(*) makes a full table scan to return the row count, it can take very long time for a large table. You can use the sysindexes system table instead in this case. There is a ROWS column that contains the total row count for each table in your database. You can use the following select statement:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
This will drastically reduce the time your query takes.
You can use this for get first record where has clause
SELECT TOP(1) * , ROW_NUMBER() OVER(ORDER BY UserId) AS rownum
FROM Users
WHERE UserName = 'Joe'
ORDER BY rownum ASC
ROW_NUMBER() returns a unique number for each row starting with 1. You can easily use this by simply writing:
ROW_NUMBER() OVER (ORDER BY 'Column_Name' DESC) as ROW_NUMBER
May not be related to the question here. But I found it could be useful when using ROW_NUMBER -
SELECT *,
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS Any_ID
FROM #Any_Table
select
Ml.Hid,
ml.blockid,
row_number() over (partition by ml.blockid order by Ml.Hid desc) as rownumber,
H.HNAME
from MIT_LeadBechmarkHamletwise ML
join [MT.HAMLE] h on ML.Hid=h.HID
SELECT num, UserName FROM
(SELECT UserName, ROW_NUMBER() OVER(ORDER BY UserId) AS num
From Users) AS numbered
WHERE UserName='Joe'
You can use Row_Number for limit query result.
Example:
SELECT * FROM (
select row_number() OVER (order by createtime desc) AS ROWINDEX,*
from TABLENAME ) TB
WHERE TB.ROWINDEX between 0 and 10
--
With above query, I will get PAGE 1 of results from TABLENAME.
If you absolutely want to use ROW_NUMBER for this (instead of count(*)) you can always use:
SELECT TOP 1 ROW_NUMBER() OVER (ORDER BY Id)
FROM USERS
ORDER BY ROW_NUMBER() OVER (ORDER BY Id) DESC
Need to create virtual table by using WITH table AS, which is mention in given Query.
By using this virtual table, you can perform CRUD operation w.r.t row_number.
QUERY:
WITH table AS
-
(SELECT row_number() OVER(ORDER BY UserId) rn, * FROM Users)
-
SELECT * FROM table WHERE UserName='Joe'
-
You can use INSERT, UPDATE or DELETE in last sentence by in spite of SELECT.
SQL Row_Number() function is to sort and assign an order number to data rows in related record set. So it is used to number rows, for example to identify the top 10 rows which have the highest order amount or identify the order of each customer which is the highest amount, etc.
If you want to sort the dataset and number each row by seperating them into categories we use Row_Number() with Partition By clause. For example, sorting orders of each customer within itself where the dataset contains all orders, etc.
SELECT
SalesOrderNumber,
CustomerId,
SubTotal,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) rn
FROM Sales.SalesOrderHeader
But as I understand you want to calculate the number of rows of grouped by a column. To visualize the requirement, if you want to see the count of all orders of the related customer as a seperate column besides order info, you can use COUNT() aggregation function with Partition By clause
For example,
SELECT
SalesOrderNumber,
CustomerId,
COUNT(*) OVER (PARTITION BY CustomerId) CustomerOrderCount
FROM Sales.SalesOrderHeader
This query:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
will return all rows where the UserName is 'Joe' UNLESS you have no UserName='Joe'
They will be listed in order of UserID and the row_number field will start with 1 and increment however many rows contain UserName='Joe'
If it does not work for you then your WHERE command has an issue OR there is no UserID in the table. Check spelling for both fields UserID and UserName.