sql query to find unique records - sql

I am new to SQL and need help with the following. I have tried using GROUP BY and COUNT, but the duplicated rows still show up in what should be the unique group.
Below is my source data.
CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
543,xxx-23,12,12,500
543,xxx-23,12,12,501
543,xxx-23,12,12,510
643,xxx-33,11,17,700
343,xxx-33,11,17,700
766,xxx-74,32,1,300
766,xxx-74,32,1,300
877,xxx-32,12,2,300
877,xxx-32,12,2,300
877,xxx-32,12,2,301
Please note: the source contains many copies of each distinct row, so when I count, the unique set does not show up with count = 1.
Example: the source data below has 60 records for each combination:
877,xxx-32,12,2,300 -- 60 records
877,xxx-32,12,2,301 -- 60 records
I am trying to get only the unique records, but the duplicated records are also being returned.
Below are the rows that should appear in the unique group. There can be multiple Call_Plans for the same combination of CDR_ID,TelephoneNo,Call_ID,call_Duration; I want only the records for which there is exactly one call plan for each unique combination of CDR_ID,TelephoneNo,Call_ID,call_Duration.
CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
643,xxx-33,11,17,700
343,xxx-33,11,17,700
766,xxx-74,32,1,300
Please advise on this.
Thanks and Regards

To do more complex groupings you could also use a Common Table Expression/Derived Table along with windowed functions:
declare @t table(CDR_ID int,TelephoneNo nvarchar(20),Call_ID int,call_Duration int,Call_Plan int);
insert into @t values (543,'xxx-23',12,12,500),(543,'xxx-23',12,12,501),(543,'xxx-23',12,12,510),(643,'xxx-33',11,17,700),(343,'xxx-33',11,17,700),(766,'xxx-74',32,1,300),(766,'xxx-74',32,1,300),(877,'xxx-32',12,2,300),(877,'xxx-32',12,2,300),(877,'xxx-32',12,2,301);
with cte as
(
select CDR_ID
,TelephoneNo
,Call_ID
,call_Duration
,Call_Plan
,count(*) over (partition by CDR_ID,TelephoneNo,Call_ID,call_Duration) as c
from (select distinct * from @t) a -- de-duplicate first so repeated identical rows count once
)
select *
from cte
where c = 1;
Output:
+--------+-------------+---------+---------------+-----------+---+
| CDR_ID | TelephoneNo | Call_ID | call_Duration | Call_Plan | c |
+--------+-------------+---------+---------------+-----------+---+
| 343 | xxx-33 | 11 | 17 | 700 | 1 |
| 643 | xxx-33 | 11 | 17 | 700 | 1 |
| 766 | xxx-74 | 32 | 1 | 300 | 1 |
+--------+-------------+---------+---------------+-----------+---+

Using not exists():
select distinct *
from t
where not exists (
select 1
from t as i
where i.cdr_id = t.cdr_id
and i.telephoneno = t.telephoneno
and i.call_id = t.call_id
and i.call_duration = t.call_duration
and i.call_plan <> t.call_plan
)
rextester demo: http://rextester.com/RRNNE20636
returns:
+--------+-------------+---------+---------------+-----------+-----+
| cdr_id | TelephoneNo | Call_id | call_Duration | Call_Plan | cnt |
+--------+-------------+---------+---------------+-----------+-----+
| 343 | xxx-33 | 11 | 17 | 700 | 1 |
| 643 | xxx-33 | 11 | 17 | 700 | 1 |
| 766 | xxx-74 | 32 | 1 | 300 | 1 |
+--------+-------------+---------+---------------+-----------+-----+
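The NOT EXISTS query above can be tried on the question's sample rows with a minimal sketch like the one below. It assumes an in-memory SQLite database driven from Python's standard sqlite3 module, and the table name t is taken from the answer; neither is what the asker is actually running.

```python
import sqlite3

# In-memory SQLite database; the table name `t` follows the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (cdr_id INT, telephoneno TEXT, call_id INT, "
    "call_duration INT, call_plan INT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (543, "xxx-23", 12, 12, 500),
        (543, "xxx-23", 12, 12, 501),
        (543, "xxx-23", 12, 12, 510),
        (643, "xxx-33", 11, 17, 700),
        (343, "xxx-33", 11, 17, 700),
        (766, "xxx-74", 32, 1, 300),
        (766, "xxx-74", 32, 1, 300),
        (877, "xxx-32", 12, 2, 300),
        (877, "xxx-32", 12, 2, 300),
        (877, "xxx-32", 12, 2, 301),
    ],
)

# Keep a row only if no other row shares its four key columns
# while carrying a different call plan.
not_exists_sql = """
SELECT DISTINCT *
FROM t
WHERE NOT EXISTS (
    SELECT 1
    FROM t AS i
    WHERE i.cdr_id = t.cdr_id
      AND i.telephoneno = t.telephoneno
      AND i.call_id = t.call_id
      AND i.call_duration = t.call_duration
      AND i.call_plan <> t.call_plan
)
ORDER BY cdr_id
"""
unique_rows = conn.execute(not_exists_sql).fetchall()
for row in unique_rows:
    print(row)
```

Repeated identical rows (such as the two 766 rows) do not disqualify a combination here, because the subquery only looks for a different Call_Plan; the DISTINCT then collapses the repeats.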

Basically you should try this:
SELECT DISTINCT A.CDR_ID, A.TelephoneNo, A.Call_ID, A.call_Duration, A.Call_Plan
FROM YOUR_TABLE A
INNER JOIN (SELECT CDR_ID,TelephoneNo,Call_ID,call_Duration
FROM YOUR_TABLE
GROUP BY CDR_ID,TelephoneNo,Call_ID,call_Duration
HAVING COUNT(DISTINCT Call_Plan)=1 -- exactly one call plan per combination, however often the row repeats
) B ON A.CDR_ID= B.CDR_ID AND A.TelephoneNo=B.TelephoneNo AND A.Call_ID=B.Call_ID AND A.call_Duration=B.call_Duration
You can write a shorter query using the window function COUNT(*) OVER ...

The query below will give you the result:
SELECT CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan, COUNT(*)
FROM TABLE_NAME GROUP BY CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
HAVING COUNT(*) < 2;
It also returns the count. If you do not need it, remove COUNT(*) from the select list.

Select *, count(CDR_ID)
from table
group by CDR_ID, TelephoneNo, Call_ID, call_Duration, Call_Plan
having count(CDR_ID) = 1

Related

How to delete the rows with three same data columns and one different data column

I have a table "MARK_TABLE" as below.
How can I delete the rows with same "STUDENT", "COURSE" and "SCORE" values?
| ID | STUDENT | COURSE | SCORE |
|----|---------|--------|-------|
| 1 | 1 | 1 | 60 |
| 3 | 1 | 2 | 81 |
| 4 | 1 | 3 | 81 |
| 9 | 2 | 1 | 80 |
| 10 | 1 | 1 | 60 |
| 11 | 2 | 1 | 80 |
Now I already filtered the data I want to KEEP, but without the "ID"...
SELECT student, course, score FROM mark_table
INTERSECT
SELECT student, course, score FROM mark_table
The output:
| STUDENT | COURSE | SCORE |
|---------|--------|-------|
| 1 | 1 | 60 |
| 1 | 2 | 81 |
| 1 | 3 | 81 |
| 2 | 1 | 80 |
Use the following query to delete the desired rows:
DELETE FROM MARK_TABLE M
WHERE
EXISTS (
SELECT
1
FROM
MARK_TABLE M_IN
WHERE
M.STUDENT = M_IN.STUDENT
AND M.COURSE = M_IN.COURSE
AND M.SCORE = M_IN.SCORE
AND M.ID < M_IN.ID
)
Cheers!!
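As a rough check of the EXISTS-based delete above, here is a minimal sketch against the question's sample rows. It assumes an in-memory SQLite database driven from Python's sqlite3 module, and it references the outer table by name instead of the alias M, which is a portability choice of this sketch and not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mark_table (id INT, student INT, course INT, score INT)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO mark_table VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 60),
        (3, 1, 2, 81),
        (4, 1, 3, 81),
        (9, 2, 1, 80),
        (10, 1, 1, 60),
        (11, 2, 1, 80),
    ],
)

# Delete every row for which another row with the same student, course and
# score but a higher id exists, so the highest id of each group survives.
conn.execute("""
DELETE FROM mark_table
WHERE EXISTS (
    SELECT 1
    FROM mark_table AS m_in
    WHERE mark_table.student = m_in.student
      AND mark_table.course = m_in.course
      AND mark_table.score = m_in.score
      AND mark_table.id < m_in.id
)
""")
conn.commit()

kept_ids = [row[0] for row in conn.execute("SELECT id FROM mark_table ORDER BY id")]
print(kept_ids)
```

Flipping the comparison to `>` would keep the lowest id of each group instead.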
use distinct
SELECT distinct student, course, score FROM mark_table
Assuming you don't just want to select the unique data you want to keep (you mention you've already done this), you can proceed as follows:
Create a temporary table to hold the data you want to keep
Insert the data you want to keep into the temporary table
Empty the source table
Re-Insert the data you want to keep into the source table.
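The four steps above can be sketched as follows. This is only an illustration under assumptions: it uses an in-memory SQLite database via Python's sqlite3 module, the temporary table name keep_rows is hypothetical, and keeping MIN(id) for each group is a choice made here because the question does not say which ID should survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mark_table (id INT, student INT, course INT, score INT)")
conn.executemany(
    "INSERT INTO mark_table VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 60),
        (3, 1, 2, 81),
        (4, 1, 3, 81),
        (9, 2, 1, 80),
        (10, 1, 1, 60),
        (11, 2, 1, 80),
    ],
)

# Steps 1 and 2: create a temporary table and fill it with one row per
# (student, course, score). MIN(id) is an assumption of this sketch.
conn.execute("""
CREATE TEMP TABLE keep_rows AS
SELECT MIN(id) AS id, student, course, score
FROM mark_table
GROUP BY student, course, score
""")

# Step 3: empty the source table.
conn.execute("DELETE FROM mark_table")

# Step 4: re-insert the rows that were set aside.
conn.execute(
    "INSERT INTO mark_table (id, student, course, score) "
    "SELECT id, student, course, score FROM keep_rows"
)
conn.commit()

remaining = conn.execute("SELECT * FROM mark_table ORDER BY id").fetchall()
print(remaining)
```

On a real system the delete and re-insert would normally run inside one transaction so that a failure between the two steps does not leave the source table empty.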
select * from
(
select row_number() over (partition by student,course,score order by score) rn,
student,course,score from mark_table
) t
where rn=1
Use a CTE with ROW_NUMBER():
create table #MARK_TABLE (ID int, STUDENT int, COURSE int, SCORE int)
insert into #MARK_TABLE
values
(1,1,1,60),
(3,1,2,81),
(4,1,3,81),
(9,2,1,80),
(10,1,1,60),
(11,2,1,80)
;with cteDeleteID as(
Select id, row_number() over (partition by student,course,score order by id) [row_number] from #MARK_TABLE -- ordering by id makes the kept row deterministic (lowest id)
)
delete from #MARK_TABLE where id in
(
select id from cteDeleteID where [row_number] != 1
)
select * from #MARK_TABLE
drop table #MARK_TABLE

SQL : Getting duplicate rows along with other variables

I am working with Teradata SQL. I would like to get the duplicated rows together with their count and the other variables as well. I can only find ways to get the count, but not the variables along with it.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID alone, and I want to retrieve the full duplicate rows, not just their count.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a Teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
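To see what the window-count approach returns on the question's sample rows, here is a minimal sketch. It assumes an in-memory SQLite database (3.25 or later, for window functions) driven from Python's sqlite3 module, keeps the answer's table name users, and uses the derived-table form because QUALIFY is a Teradata extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INT, name TEXT, date TEXT)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "abc", "21.03.2015"),
        (1, "def", "22.04.2015"),
        (2, "ajk", "22.03.2015"),
        (3, "ghi", "23.03.2015"),
        (3, "ghi", "23.03.2015"),
    ],
)

# COUNT(*) OVER (PARTITION BY id) attaches the per-id row count to every row
# without collapsing the rows; the outer query keeps ids seen more than once.
window_sql = """
SELECT id, name, duplicates
FROM (
    SELECT id, name,
           COUNT(*) OVER (PARTITION BY id) AS duplicates
    FROM users
) AS sub
WHERE duplicates > 1
ORDER BY id, name
"""
duplicate_rows = conn.execute(window_sql).fetchall()
for row in duplicate_rows:
    print(row)
```

The row with id 2 is dropped because its count is 1; removing the WHERE clause would return all five rows with their counts, which matches the expected output in the question.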
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
FROM Table_NAME
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1

An SQL query that combines aggregate and non-aggregate values in one row

The following query gives me the information that I need but I want it to take it just a step further. In the table at the bottom (only showing a subset of the fields), I want to group by cust_line in an unusual way (at least to me it's unusual).
Let's look at the items with a cust_line of 2 as an example. I would like these to be represented by one line, not five. For this line, I would like to select all the fields, except the price field, from the row where cust_part = "GROUPINV". For the total field I would like 'sum(total) as new_total', and for the price I would like new_total / qty_invoiced, where qty_invoiced is the value on the line where cust_part = "GROUPINV".
Is what I am asking for completely ridiculous? Is it even possible? I'm not advanced at SQL so it may also be easy and I just don't know how to approach it. I thought of using 'partition by' but I couldn't imagine how I would get it to work as I figured it would still return 5 rows where I only want 1.
I've also looked at these questions with similar titles but not really what I am looking for:
SQL query that returns aggregate AND non aggregate results
Combined aggregated and non-aggregate query in SQL
SELECT L.CUST_LINE, I.LINE_NO, I.ORDER_NO, I.STAGE, I.ORDER_LINE_POS, I.CUST_PART,
I.LINE_ITEM_NO, I.QTY_INVOICED, I.CUST_DESC, I.DESCRIPTION, I.SALE_UNIT_PRICE, I.PRICE_TOTAL,
I.INVOICE_NO, I.CUSTOMER_PO_NO, I.ORDER_NO, I.CUSTOMER_NO, I.CATALOG_DESC, I.ORDER_LINE_NOTES
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
ORDER BY L.CUST_LINE;
| cust_line | line_no | cust_part | qty_invoiced | cust_desc | price | total |
| 1 | 4 | ... | 1 | ... | 55 | 55 |
| 2 | 1 | GROUPINV | 1 | some part | 0 | 0 |
| 2 | 6 | ... | 3 | ... | 0 | 0 |
| 2 | 2 | ... | 1 | ... | 0 | 0 |
| 2 | 3 | ... | 1 | ... | 0 | 0 |
| 2 | 7 | ... | 2 | ... | 10 | 20 |
| 3 | 7 | ... | 1 | ... | 67 | 67 |
You can use an analytic function to calculate a total over multiple rows of a result set, then filter out the rows you don't want.
Leaving out all the extra columns for sake of brevity:
SELECT cust_line, qty_invoiced, order_total/qty_invoiced AS price
FROM (
SELECT l.cust_line, cust_part, qty_invoiced,
SUM(total) OVER (PARTITION BY l.cust_line) AS order_total,
COUNT(cust_line) OVER (PARTITION BY l.cust_line) AS group_count
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
)
WHERE ( cust_part = 'GROUPINV' OR group_count = 1 )
ORDER BY cust_line
I am guessing on what you want in the PARTITION BY clause; this is essentially a GROUP BY that applies only to the SUM function. Not sure if you might also want order_no in the partition.
The trick is to select all the rows in the inner query, applying SUM across them all; then filter out the rows you are not interested in in the outermost query.
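The select-everything-then-filter trick can be sketched on the sample rows from the question. This is a simplification under assumptions: a single hypothetical table invoice_lines stands in for the join of CUSTOMER_ORDER_LINE and CUSTOMER_ORDER_IVC_REP, the cust_part values shown as "..." in the question are replaced by a placeholder 'OTHER', and the database is in-memory SQLite (3.25 or later) driven from Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table standing in for the joined result in the question.
conn.execute(
    "CREATE TABLE invoice_lines (cust_line INT, line_no INT, cust_part TEXT, "
    "qty_invoiced INT, price INT, total INT)"
)
conn.executemany(
    "INSERT INTO invoice_lines VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 4, "OTHER", 1, 55, 55),
        (2, 1, "GROUPINV", 1, 0, 0),
        (2, 6, "OTHER", 3, 0, 0),
        (2, 2, "OTHER", 1, 0, 0),
        (2, 3, "OTHER", 1, 0, 0),
        (2, 7, "OTHER", 2, 10, 20),
        (3, 7, "OTHER", 1, 67, 67),
    ],
)

# The inner query keeps every row and attaches the per-cust_line total and
# row count; the outer query keeps the GROUPINV row of each multi-row group
# and the single row of each one-row group.
grouped_sql = """
SELECT cust_line, qty_invoiced, order_total,
       order_total * 1.0 / qty_invoiced AS price
FROM (
    SELECT cust_line, cust_part, qty_invoiced,
           SUM(total) OVER (PARTITION BY cust_line) AS order_total,
           COUNT(*) OVER (PARTITION BY cust_line) AS group_count
    FROM invoice_lines
) AS sub
WHERE cust_part = 'GROUPINV' OR group_count = 1
ORDER BY cust_line
"""
invoice_rows = conn.execute(grouped_sql).fetchall()
for row in invoice_rows:
    print(row)
```

For cust_line 2 the five rows collapse to the GROUPINV row with a total of 20 and a price of 20 / 1.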

Find one single row for a column with a unique value using SQL

I have a table which contains data similar to this:
RowID | CustomerID | Quantity | Type | .....
1 | 345 | 100 | Software | .....
2 | 1280 | 200 | Software | .....
3 | 456 | 20 | Hub | .....
4 | 345 | 100 | Software | .....
5 | 345 | 180 | Monitor | .....
6 | 23 | 15 | Router | .....
7 | 1280 | 120 | Software | .....
8 | 345 | 5 | Mac | .....
.... | .... | ... | ..... | .....
The database has hundreds of thousands of rows. As you can see, the CustomerID has duplicates.
What I want to do is to find EXACTLY ONE row for each unique CustomerID and Type combination and with Quantity more than 10.
For example, for the above table, I want to get:
RowID | CustomerID | Quantity | Type | .....
2 | 1280 | 200 | Software | .....
3 | 456 | 20 | Hub | .....
4 | 345 | 100 | Software | .....
5 | 345 | 180 | Monitor | .....
6 | 23 | 15 | Router | .....
What I tried to do is:
select distinct CustomerID, Type from MyTable
where Quantity > 10
Which gives me:
CustomerID | Type
1280 | Software
456 | Hub
345 | Software
345 | Monitor
23 | Router
But I don't know how to select other columns because if I do:
select distinct CustomerID, Type, RowID, Quantity from MyTable
where Quantity > 10
It returns every row because the RowID is unique.
I think maybe I should use a subquery that iterates over the result of the above query. Can someone help me with this?
Use ROW_NUMBER() with OVER (PARTITION BY ...). This lets you group all similar rows together, and you then query that result to get just the first row of each group. Note: some databases, such as SQL Server, require an ORDER BY in the OVER clause of ROW_NUMBER(), even if you don't care about the order. Here it is useful for picking the row with the highest quantity in each combination. If you also want Quantity in the result, add that column to the select lists.
select CustomerId
, Type
FROM
(
select
CustomerId
, Type
, row_number() over (partition by CustomerId, Type order by Quantity desc) as rn
From MyTable
where Quantity > 10
) dta
Where rn = 1
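A minimal sketch of the ROW_NUMBER approach on the question's sample rows, extended to return RowID and Quantity as well. It assumes an in-memory SQLite database (3.25 or later) driven from Python's sqlite3 module; the RowID tie-break in the ORDER BY is an addition of this sketch so that equal quantities resolve deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (RowID INT, CustomerID INT, Quantity INT, Type TEXT)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, 345, 100, "Software"),
        (2, 1280, 200, "Software"),
        (3, 456, 20, "Hub"),
        (4, 345, 100, "Software"),
        (5, 345, 180, "Monitor"),
        (6, 23, 15, "Router"),
        (7, 1280, 120, "Software"),
        (8, 345, 5, "Mac"),
    ],
)

# Number the rows inside each (CustomerID, Type) group, highest Quantity
# first, then keep only the first row of every group.
first_row_sql = """
SELECT RowID, CustomerID, Quantity, Type
FROM (
    SELECT RowID, CustomerID, Quantity, Type,
           ROW_NUMBER() OVER (
               PARTITION BY CustomerID, Type
               ORDER BY Quantity DESC, RowID
           ) AS rn
    FROM MyTable
    WHERE Quantity > 10
) AS dta
WHERE rn = 1
ORDER BY RowID
"""
first_rows = conn.execute(first_row_sql).fetchall()
for row in first_rows:
    print(row)
```

With this tie-break, customer 345's Software group yields RowID 1 where the question's example shows RowID 4; both satisfy "exactly one row per combination".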
Something like this will work (unless you have more requirements that you didn't mention):
SELECT CustomerID, Type, SUM(Quantity) AS Quantity
FROM MyTable
GROUP BY CustomerID, Type
HAVING SUM(Quantity) > 10
You need to choose which one of the "duplicated" rows to retrieve.
I wrote duplicated with quotes because they are not technically duplicated:
+-------+------------+----------+----------+
| RowID | CustomerID | Type | Quantity |
+-------+------------+----------+----------+
| 1 | 345 | Software | 100 |
| 2 | 345 | Software | 200 |
| 3 | 345 | Software | 300 |
+-------+------------+----------+----------+
All of these are different rows because their RowID and Quantity values differ.
So you must specify which one of them you want to retrieve.
For this example I will use the row with the minimum RowID and Quantity.
To tell SQL to pick that one, I order the table by RowID and Quantity in ascending order and use a correlated subquery on the same table
to pick the first row, the one with the lowest RowID and Quantity, for the same CustomerID and Type.
+-------+------------+----------+----------+
| RowID | CustomerID | Type | Quantity |
+-------+------------+----------+----------+
| 1 | 345 | Software | 100 |
+-------+------------+----------+----------+
The SQL code for this is the following:
SELECT
*
FROM
MyTable originalTable
WHERE
originalTable.Quantity > 10 AND
originalTable.RowID =
(
SELECT TOP 1 orderedTable.RowID
FROM MyTable orderedTable
WHERE orderedTable.CustomerID = originalTable.CustomerID AND orderedTable.Type = originalTable.Type AND orderedTable.Quantity > 10
ORDER BY orderedTable.RowID ASC, orderedTable.Quantity ASC
)
One way is to use the ROW_NUMBER window function to partition the data by CustomerID and Type, and then keep only the first row in each partition.
WITH Uniq AS (
SELECT
CustomerID, Type, RowID, Quantity,
rn = ROW_NUMBER() OVER (PARTITION BY CustomerID, Type ORDER BY RowID)
FROM MyTable WHERE Quantity > 10
)
SELECT * FROM Uniq WHERE rn = 1;
Or you could find a unique RowID (min or max) for each group of CustomerID and Type and use that as the source in a join, either as a common table expression or a derived table:
WITH Uniq AS (
SELECT MIN(RowID) RowID FROM MyTable WHERE Quantity > 10 GROUP BY CustomerID, Type
)
SELECT MyTable.* FROM MyTable JOIN Uniq ON MyTable.RowID = Uniq.RowID

Looking for a solution to a SQL GROUP BY .... WHERE MIN date

EDIT: The following question is angled at both MS-SQL and MySQL.
I've been pondering this for a good 7 hours now. I've seen many Stack Overflow answers that are similar, but none that I've properly understood or worked out how to implement.
I am looking to SELECT id, title, etc. FROM a table WHERE the date is the next available date AFTER NOW(). The catch is that it needs to be GROUPED BY one particular column.
Here is the table:
==================================
id | name | date_start | sequence_id
--------------------------------------------------------
1 | Foo1 | 20150520 | 70
2 | Foo2 | 20150521 | 70
3 | Foo3 | 20150522 | 70
4 | Foo4 | 20150523 | 70
5 | FooX | 20150524 | 70
6 | FooY | 20150525 | 70
7 | Bar | 20150821 | 61
8 | BarN | 20151110 | 43
9 | BarZ | 20151104 | 43
And here is what I would like to see:
==================================
id | name | date_start | sequence_id
--------------------------------------------------------
1 | Foo1 | 20150520 | 70
7 | Bar | 20150821 | 61
9 | BarZ | 20151104 | 43
The results are filtered by MIN(date_start) > NOW() AND GROUPED BY sequence_id.
I'm not entirely sure this can be achieved with a GROUP BY as the rest of the columns would need to be contained within an aggregate function which I don't think will work.
Does anyone have an answer for this dilemma?
Many Thanks!
Simon
Just use a join and aggregation in a subquery:
select t.*
from table t join
(select sequence_id, min(date_start) as minds
from table t
group by sequence_id
) tt
on tt.sequence_id = t.sequence_id and t.date_start = tt.minds;
This is standard SQL, so it should run in any database.
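The join-on-aggregate pattern above can be tried on the question's sample rows with a sketch like this one. Several things here are assumptions and not part of the answer: the table is named events (hypothetical, since table is a reserved word), date_start is stored as a YYYYMMDD integer, the database is in-memory SQLite driven from Python's sqlite3 module, and a fixed cutoff of 20150519 stands in for NOW(). The WHERE filter on the cutoff is also added here to reflect the "after now" requirement in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `events` is a hypothetical table name; dates are YYYYMMDD integers.
conn.execute("CREATE TABLE events (id INT, name TEXT, date_start INT, sequence_id INT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "Foo1", 20150520, 70),
        (2, "Foo2", 20150521, 70),
        (3, "Foo3", 20150522, 70),
        (4, "Foo4", 20150523, 70),
        (5, "FooX", 20150524, 70),
        (6, "FooY", 20150525, 70),
        (7, "Bar", 20150821, 61),
        (8, "BarN", 20151110, 43),
        (9, "BarZ", 20151104, 43),
    ],
)

# Assumed stand-in for NOW(): a fixed cutoff passed as a bound parameter.
cutoff = 20150519

# The subquery finds the earliest future date per sequence_id; joining back
# on (sequence_id, date_start) recovers the full row for that date.
next_event_sql = """
SELECT t.id, t.name, t.date_start, t.sequence_id
FROM events AS t
JOIN (
    SELECT sequence_id, MIN(date_start) AS minds
    FROM events
    WHERE date_start > ?
    GROUP BY sequence_id
) AS tt
  ON tt.sequence_id = t.sequence_id
 AND t.date_start = tt.minds
ORDER BY t.id
"""
next_events = conn.execute(next_event_sql, (cutoff,)).fetchall()
for row in next_events:
    print(row)
```

If two rows in one sequence share the same earliest date, this pattern returns both of them.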
http://sqlfiddle.com/#!9/d8576/4
SELECT *
FROM table1 as t1
LEFT JOIN
(SELECT *
FROM table1
WHERE date_start>NOW()
) as t2
ON t1.sequence_id = t2.sequence_id and t1.date_start>t2.date_start
WHERE t1.date_start>NOW() and t2.date_start IS NULL
GROUP BY t1.sequence_id
MSSQL fiddle
SELECT *
FROM table1 as t1
LEFT JOIN
(SELECT *
FROM table1
WHERE date_start>GetDate()
) as t2
ON t1.sequence_id = t2.sequence_id and t1.date_start>t2.date_start
WHERE t1.date_start>GetDate() and t2.date_start IS NULL