How to identify duplicates in SQL? - sql

I have a requirement to identify the duplicate values in the result data and append a colour to them, where the result is limited to 10 records only (duplicates should not be checked against the entire table).
Now my issue is how to find the duplicates within that result set.
I have tried it this way, but it checks the whole table for duplicates. I want the check to apply only within the limit.
SELECT count(*)
,safer_id
,CONCAT (
(DAYS_OPEN)
,CASE
WHEN (count(*) > 1)
THEN '~#0a9ec1'
END
) AS DAYS_OPEN
FROM table_gear
WHERE SAFER_ID NOT LIKE '%WYN%'
GROUP BY safer_id
,url
,DAYS_OPEN
ORDER BY Days_open DESC limit 10 offset 0;

I would use COUNT as an analytic function here:
SELECT
safer_id,
url,
CASE WHEN cnt > 1 THEN '~#0a9ec1' END AS color
FROM
(
SELECT t.*, COUNT(*) OVER (PARTITION BY safer_id, url) cnt
FROM table_gear t
WHERE safer_id NOT LIKE '%WYN%'
) a
ORDER BY cnt DESC
LIMIT 10;
This query conditionally assigns a hex color to those records which are duplicate with respect to the combination of safer_id and url values. I'm not entirely sure about your limit or ordering logic, but you can easily modify what I wrote above to fit your needs.
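If the duplicate check should cover only the rows that are actually returned, one option is to apply the limit in a subquery first and count inside that limited set. The following is a minimal sketch rather than a tested solution for the original database: it runs the SQL through Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is needed for window functions), the sample rows are invented, and it assumes a duplicate means a repeated safer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_gear (safer_id TEXT, url TEXT, days_open INTEGER)")
# Invented sample data: 'A1' appears twice among the top 3 rows by days_open,
# while 'B2' appears twice in the table but only once among the top 3.
conn.executemany(
    "INSERT INTO table_gear VALUES (?, ?, ?)",
    [
        ("A1", "u1", 90),
        ("A1", "u2", 80),
        ("B2", "u3", 70),
        ("B2", "u4", 10),
        ("C3", "u5", 5),
    ],
)

ROW_LIMIT = 3  # the question uses 10; 3 keeps the sample small

query = """
SELECT safer_id,
       days_open,
       CASE WHEN COUNT(*) OVER (PARTITION BY safer_id) > 1
            THEN '~#0a9ec1' END AS color
FROM (
    -- Limit first, so the window function only sees the returned rows.
    SELECT safer_id, days_open
    FROM table_gear
    WHERE safer_id NOT LIKE '%WYN%'
    ORDER BY days_open DESC
    LIMIT ?
) AS limited
ORDER BY days_open DESC
"""
result = conn.execute(query, (ROW_LIMIT,)).fetchall()
for row in result:
    print(row)
```

With this sample data only the two 'A1' rows get the colour; 'B2' is a duplicate in the full table but not within the limited rows, so its colour stays NULL.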

Related

I wish to return the number of limit based on the field/column in SQL

I am trying to use LIMIT in SQL to restrict the number of rows, but I want the limit to be based on a field instead. For example, there can be many rows overall, but the limit on table1 should be 500.
Code:
select
table1,
table2
from
place_table
limit 100 --I wish to change this. to only focus on table1 for 100 data.
You can use row_number():
select p.*
from (select p.*, row_number() over (order by <col>) as seqnum
from place_table p
) p
where seqnum <= <limit col>
<col> is a column to specify the ordering -- which rows you want.

Select last duplicate row with different id Oracle 11g

I have a table that looks like this:
The problem is I need to get the last record with duplicates in the column "NRODENUNCIA".
You can use MAX(DENUNCIAID), along with GROUP BY... HAVING to find the duplicates and select the row with the largest DENUNCIAID:
SELECT MAX(DENUNCIAID), NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
HAVING COUNT(1) > 1
This will only show rows that have at least one duplicate. If you want to see non-duplicate rows too, just remove the HAVING COUNT(1) > 1
There are a number of solutions for your problem. One is to use row_number.
Note that I've ordered by DENUNCIID in the OVER clause. This defines the "Last Record" as the one that has the largest DENUNCIID. If you want to define it differently you'd need to change the field that is being ordered.
with dupes as (
SELECT
ROW_NUMBER() OVER (PARTITION BY NRODENUNCIA ORDER BY DENUNCIID DESC) RN,
t.* -- Oracle needs a qualified t.* when * is combined with other select-list items
FROM
YourTable t
)
SELECT * FROM dupes where rn = 1
This only gets the last record per dupe.
If you want to only include records that have dupes then you change the where clause to
WHERE rn =1
and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)
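To see the "last record per group, duplicates only" idea on concrete data, here is a small sketch. It is an illustration under assumptions, not the answer's exact query: it runs on SQLite through Python's sqlite3 module (3.25 or newer assumed for window functions) instead of Oracle, the table your_table and its two columns are simplified stand-ins with invented values, and it swaps the IN subquery for a second window function, COUNT(*) OVER, to detect groups with more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the table in the question; column names are assumed.
conn.execute("CREATE TABLE your_table (denunciaid INTEGER, nrodenuncia TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "N-100"), (2, "N-100"), (3, "N-200"),
     (4, "N-100"), (5, "N-300"), (6, "N-300")],
)

query = """
WITH dupes AS (
    SELECT t.*,
           -- rn = 1 marks the row with the largest id in each group
           ROW_NUMBER() OVER (PARTITION BY nrodenuncia
                              ORDER BY denunciaid DESC) AS rn,
           -- cnt is the size of the group the row belongs to
           COUNT(*) OVER (PARTITION BY nrodenuncia) AS cnt
    FROM your_table t
)
SELECT denunciaid, nrodenuncia
FROM dupes
WHERE rn = 1 AND cnt > 1
ORDER BY denunciaid
"""
last_dupes = conn.execute(query).fetchall()
for row in last_dupes:
    print(row)
```

'N-200' occurs only once, so it is filtered out by cnt > 1; the other two groups each contribute the row with the highest id.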

How to select the rows in original order in Hive?

I want to select a fixed number of rows from mytable in their original order.
As we know, the keyword 'limit' will select rows randomly. The rows in mytable are in order, and I just want to select them in that original order. For example, selecting 10000 rows should return row 1 to row 10000.
How can I do this?
Thanks.
Try:
SET mapred.reduce.tasks = 1;
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER () AS row_num
FROM table ) table1
SORT BY row_num LIMIT 10000;
Rows in your table may be in order but...
Tables are read in parallel, and the results returned from different mappers or reducers do not arrive in the original order. That is why you need to know the rule that defines the "original order".
If you know then you can use row_number() or order by. For example:
select * from table order by ... limit 10000;

Update top N values using PostgreSQL

I want to update the top 10 values of a column in table. I have three columns; id, account and accountrank. To get the top 10 values I can use the following:
SELECT * FROM accountrecords
ORDER BY account DESC
LIMIT 10;
What I would like to do is to set the value in accountrank to be a series of 1 - 10, based on the magnitude of account. Is this possible to do in PostgreSQL?
WITH cte AS (
SELECT id, row_number() OVER (ORDER BY account DESC NULLS LAST) AS rn
FROM accountrecords
ORDER BY account DESC NULLS LAST
LIMIT 10
)
UPDATE accountrecords a
SET accountrank = cte.rn
FROM cte
WHERE cte.id = a.id;
Joining in a table expression is typically faster than correlated subqueries. It is also shorter.
With the window function row_number() distinct numbers are guaranteed. Use rank() (or possibly dense_rank()) if you want rows with equal values for account to share the same number.
Only if there can be NULL values in account, you need to append NULLS LAST for descending sort order, or NULL values sort on top:
Sort by column ASC, but NULL values first?
If there can be concurrent write access, the above query is subject to a race condition. Consider:
Atomic UPDATE .. SELECT in Postgres
Postgres UPDATE … LIMIT 1
However, if that was the case, the whole concept of hard-coding the top ten would be a dubious approach to begin with.
Use a CTE instead of a plain subquery to enforce the LIMIT reliably. See links above.
Sure, you can use your select statement in a subquery. Generating the rank-order isn't trivial, but here's at least one way to do it. I haven't tested this, but off the top of my head:
update accountrecords a
set accountrank =
(select count(*) + 1 from accountrecords r where r.account > a.account) -- qualify the outer row; a bare "account" would resolve to r.account
where a.id in (select id from accountrecords order by account desc limit 10);
This has the quirk that if two records have the same value for account, then they will get the same rank. You could consider that a feature... :-)
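One detail worth checking in the correlated-subquery approach is name scoping: inside the subquery, an unqualified account refers to the inner table r, so the comparison has to name the outer row explicitly. The sketch below is an assumption-laden illustration, not a PostgreSQL test: it uses SQLite through Python's sqlite3 module, qualifies the outer column with the table name rather than an alias, and fills the table with invented values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accountrecords "
    "(id INTEGER PRIMARY KEY, account INTEGER, accountrank INTEGER)"
)
# Invented data: 12 rows with account values 10, 20, ..., 120.
conn.executemany(
    "INSERT INTO accountrecords (id, account) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 13)],
)

conn.execute("""
UPDATE accountrecords
SET accountrank = (
    SELECT COUNT(*) + 1
    FROM accountrecords r
    -- accountrecords.account is the row being updated; r.account is the inner row
    WHERE r.account > accountrecords.account
)
WHERE id IN (SELECT id FROM accountrecords ORDER BY account DESC LIMIT 10)
""")

ranked = conn.execute(
    "SELECT id, accountrank FROM accountrecords ORDER BY account DESC"
).fetchall()
for row in ranked:
    print(row)
```

The ten largest accounts receive ranks 1 through 10, and the two smallest rows keep a NULL accountrank because they fall outside the LIMIT 10 subquery.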

Oracle Select query help please

SELECT id
FROM (
SELECT id
FROM table
WHERE
PROCS_DT is null
ORDER BY prty desc, cret_dt ) where rownum >0 and rownum <=100
The above query is giving me back 100 records as expected
SELECT id
FROM (
SELECT id
FROM table
WHERE
PROCS_DT is null
ORDER BY prty desc, cret_dt ) where rownum >101 and rownum <=200
Why is the above query returning zero records?
Can someone help me understand how to proceed? I am new to Oracle.
Try this:
SELECT id
FROM
(SELECT id,
rownum AS rn
FROM
(SELECT id
FROM TABLE
WHERE PROCS_DT IS NULL
ORDER BY prty DESC, cret_dt) )
WHERE rn >101
AND rn <=200
If you are comfortable using the ANALYTIC functions, try this:
SELECT id
FROM
(
SELECT id,
ROW_NUMBER() OVER(ORDER BY prty DESC, cret_dt ) rn
FROM table
WHERE procs_dt IS NULL
)
WHERE rn >101 and rn <=200
ROWNUM values are assigned to rows as they are returned from a query (or subquery). If a row is not returned, it is not assigned a ROWNUM value at all; so the ROWNUM values returned always begin at 1 and increment by 1 for each row.
(Note that these values are assigned prior to any sorting indicated by the ORDER BY clause. This is why in your case you need to check rownum outside the subquery.)
The odd bit of logic you have to understand is that when you have a predicate on ROWNUM, you are filtering on a value that will only exist if the row passes the filter. Conceptually, Oracle applies any other filters in the query first, then tentatively assigns ROWNUM 1 to the first matching row and checks it against the filter on ROWNUM. If it passes this check, it will be returned with that ROWNUM value, and the next row will be tentatively assigned ROWNUM 2. But if it does not pass the check, the row is discarded, and the same ROWNUM value is tentatively assigned to the next row.
Therefore, if the filter on ROWNUM does not accept a value of 1, no rows will ever pass the filter.
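The tentative-assignment behaviour described above can be mimicked in a few lines of Python. This is only a conceptual model of the explanation, not how Oracle is implemented; the function name apply_rownum_filter and the sample rows are made up for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def apply_rownum_filter(
    rows: Iterable[str], predicate: Callable[[int], bool]
) -> List[Tuple[int, str]]:
    """Model a ROWNUM predicate: a number is consumed only by a returned row."""
    result = []
    rownum = 1  # tentative number for the next candidate row
    for row in rows:
        if predicate(rownum):
            result.append((rownum, row))
            rownum += 1  # the row was returned, so the next row gets a new number
        # otherwise the row is discarded and the same number is offered again
    return result


rows = [f"row{i}" for i in range(1, 301)]

# rownum > 0 AND rownum <= 100: the value 1 passes, so numbering can advance.
first_page = apply_rownum_filter(rows, lambda n: 0 < n <= 100)

# rownum > 101 AND rownum <= 200: the value 1 never passes, so nothing is returned.
second_page = apply_rownum_filter(rows, lambda n: 101 < n <= 200)

print(len(first_page), len(second_page))
```

In this model the first predicate returns 100 rows and the second returns none, matching the behaviour seen in the question.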
The use of the analytic function ROW_NUMBER() shown in the other answers is one way around this. This function explicitly assigns row numbers (distinct from ROWNUM) based on a given ordering. However, this can change performance significantly, as the optimizer does not necessarily realize that it can avoid assigning numbers to every possible row in order to complete the query.
The traditional ROWNUM-based way of doing what you want is:
SELECT id
FROM (
SELECT rownum rn, id
FROM (
SELECT id
FROM table
WHERE
PROCS_DT is null
ORDER BY prty desc, cret_dt
) where rownum <=200
) where rn > 101
The innermost query conceptually finds all matching rows and sorts them. The next layer assigns ROWNUMs to these and returns only the first 200 matches. (And actually, the Oracle optimizer understands the significance of a sort followed by a ROWNUM filter, and will usually do the sort in such a way as to identify the top 200 rows without caring about the specific ordering of the other rows.)
The middle layer also takes the ROWNUMs that it assigns and returns them as part of its result set with the alias "rn". This allows the outermost layer to filter on that value to establish the lower limit.
I would experiment with this variant and the analytic function to see which performs better in your case.