Oracle: find duplicate rows in select query - sql

My SQL query returns results with 4 columns "A", "B", "C", "D".
Suppose the results are:
A B C D
1 1 1 1
1 1 1 2
2 2 2 1
Is it possible to get the count of duplicate rows with columns "A", "B", "C" in each row.
e.g. the expected result is:
A B C D cnt
1 1 1 1 2
1 1 1 2 2
2 2 2 1 1
I tried using count(*) over. But it returns me the total number of rows returned by the query.
Another information is that in example I have mentioned only 3 columns based on which I need to check the count. But my actual query has such 8 columns. And number of rows in database are huge. So I think group by will not be a feasible option here.
Any hint is appreciable.
Thanks.

Maybe too late, but probably the count over as analytic function (aka window function) within oracle helps you. When I understand your request correctly, this should solve your problem :
create table sne_test(a number(1)
,b number(1)
,c number(1)
,d number(1)
,e number(1)
,f number(1));
insert into sne_test values(1,1,1,1,1,1);
insert into sne_test values(1,1,2,1,1,1);
insert into sne_test values(1,1,2,4,1,1);
insert into sne_test values(1,1,2,5,1,1);
insert into sne_test values(1,2,1,1,3,1);
insert into sne_test values(1,2,1,2,1,2);
insert into sne_test values(2,1,1,1,1,1);
commit;
SELECT a,b,c,d,e,f,
count(*) over (PARTITION BY a,b,c)
FROM sne_test;
A B C D E F AMOUNT
-- -- -- -- -- -- ------
1 1 1 1 1 1 1
1 1 2 4 1 1 3
1 1 2 1 1 1 3
1 1 2 5 1 1 3
1 2 1 1 3 1 2
1 2 1 2 1 2 2
2 1 1 1 1 1 1

To find duplicates you must group the data based on key column
select
count(*)
,empno
from
emp
group by
empno
having
count(*) > 1;
This allows you to aggregate by empno even when multiple records exist for each category (more than one).

You have to use a subquery where you get the count of rows, grouped by A, B and C. And then you join this subquery again with your table (or with your query), like this:
select your_table.A, your_table.B, your_table.C, your_table.D, cnt
from
your_table inner join
(SELECT A, B, C, count(*) as cnt
FROM your_table
GROUP BY A, B, C) t
on t.A = your_table.A
and t.B = your_table.B
and t.C = your_table.C

Related

SQL reset row_number when previous id column null

This is hard to explain so I will give an example.
I need SQL (ms server), I assume its with row_number over partition but can't get it to work.
I have this table:
ID
PreviousID
Data
1
a
2
1
b
3
2
c
4
d
5
4
e
6
f
I want these results:
ID
NewID
Data
1
1
a
2
1
b
3
1
c
4
2
d
5
2
e
6
3
f
And another with just the new IDs of each sequence:
NewID
Data
1
a
2
d
3
f
Instead of a row number new id, it could also have the first id of the sequence, whatever is easier, as long as it identifies the sequence.
Seems you want a windowed COUNT of rows where the value of PreviousID is NULL.
SELECT ID,
COUNT(CASE WHEN PreviousID IS NULL THEN 1 END) OVER (ORDER BY ID) AS NewID,
Data
FROM dbo.YourTable;

Select field 1 Where Field 2 ='value1' And field 2 ='value2' when Field 1 is Same

I am currently having troubles with filtering my SQL records. I need something like what it results in the following concept: Table is
A B
1 1
1 3
2 1
2 2
2 3
2 4
3 1
3 2
I want to select value of A , where B=1 and B=2 And B=3 when same A .... result is
A
2
Please help
You can use aggregation:
select a
from mytable
where b in (1, 2, 3)
group by a
having count(*) = 3
This assumes no duplicates in the table - else, you need to change the having clause to:
having count(distinct b) = 3

SQL query to find the entries corresponding to the maximum count of each type

I have a table X in Postgres with the following entries
A B C
2 3 1
3 3 1
0 4 1
1 4 1
2 4 1
3 4 1
0 5 1
1 5 1
2 5 1
3 5 1
0 2 2
1 2 3
I would like to find out the entries having maximum of Column C for every kind of A and B i.e (group by B) with the most efficient query possible and return corresponding A and B.
Expected Output:
A B C
1 2 3
2 3 1
0 4 1
0 5 1
Please help me with this problem . Thank you
demo: db<>fiddle
Using DISTINCT ON:
SELECT DISTINCT ON (B)
A, B, C
FROM
my_table
ORDER BY B, C DESC, A
DISTINCT ON gives you exactly the first row for an ordered group. In this case B is grouped.
After ordering B (which is necessary): We first order the maximum C (with DESC) to the top of each group. Then (if there are tied MAX(C) values) we order the A to get the minimum A to the top.
Seems like it is a greatest n per group problem:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY B ORDER BY C DESC, A ASC) AS rnk
FROM t
)
SELECT *
FROM cte
WHERE rnk = 1
You're not clear which A needs to be considered, the above returns the row with smallest A.
itseems to me you need max()
select A,B, max(c) from table_name
group by A,B
this will work:
select * from (SELECT t.*,
rank() OVER (PARTITION BY A,B order by C) rank
FROM tablename t)
where rank=1 ;

Create multiple rows based on 1 column

I currently have a table with a quantity in it.
ID Code Quantity
1 A 1
2 B 3
3 C 2
4 D 1
Is there anyway to write a sql statement that would get me
ID Code Quantity
1 A 1
2 B 1
2 B 1
2 B 1
3 C 1
3 C 1
4 D 1
I need to break out the quantity and have that many number of rows
Thanks
Here's one option using a numbers table to join to:
with numberstable as (
select 1 AS Number
union all
select Number + 1 from numberstable where Number<100
)
select t.id, t.code, 1
from yourtable t
join numberstable n on t.quantity >= n.number
order by t.id
Online Demo
Please note, depending on which database you are using, this may not be the correct approach to creating the numbers table. This works in most databases supporting common table expressions. But the key to the answer is the join and the on criteria.
One way would be to generate an array with X elements (where X is the quantity). So for rows
ID Code Quantity
1 A 1
2 B 3
3 C 2
you would get
ID Code Quantity ArrayVar
1 A 1 [1]
2 B 3 [1,2,3]
3 C 2 [2]
using a sequence function (e.g, in PrestoDB, sequence(start, stop) -> array(bigint))
Then, unnest the array, so for each ID, you get a X rows, and set the quantity to 1. Not sure what SQL distribution you're using, but this should work!
You can use connect by statement to cross join tables in order to get your desired output.
check my solution it works pretty robust.
select
"ID",
"Code",
1 QUANTITY
from Table1, table(cast(multiset
(select level from dual
connect by level <= Table1."Quantity") as sys.OdciNumberList));

SQL: How to count instances of Column A in Column B

I have a table with 2 columns A, and B that represent a connection graph between the two.
A B
1 3
2 5
4 2
3 5
2 3
I need to find how many instances of column A occur in column B (including 0)
So for the example above I would need the result set
A OccurencesInB
1 0
2 1
3 2
4 0
The best I have so far is
SELECT B, COUNT(*) AS TABLE_COUNT
FROM TABLE
GROUP BY B
ORDER BY TABLE_COUNT DESC
This does not find the instances of A that do not occur in B, which is crucial.
Any assistance will be greatly appreciated!
Use a correlated sub-query:
SELECT A,
TABLE_COUNT = (SELECT COUNT(*)
FROM TableName t2
WHERE t2.B = t1.A)
FROM TableName t1
GROUP BY A
ORDER BY TABLE_COUNT DESC, A
Result:
A TABLE_COUNT
3 2
2 1
1 0
4 0