I have a table like below.
Id amount
--------------
10. 12345
10. 12345
12. 34567
13. 34567
Per my business requirement, rows with the same id and the same amount are not duplicates. Rows with different ids but the same amount are duplicates.
In the sample above, I need the duplicated amount values and their counts, counting only rows whose ids differ.
The expected result is amount 34567 with a count of 2.
If you need to display the id as well:
SELECT a.*
FROM
(
SELECT id, amount, count(1) OVER (PARTITION BY amount) num_dup
FROM table1
)a
WHERE a.num_dup >1
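Update: if you only care about distinct ids, use COUNT(DISTINCT id) instead of COUNT(1). Note that some databases (SQL Server and PostgreSQL, for example) do not accept DISTINCT inside a window aggregate; the GROUP BY variants below avoid that.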
More examples.
With joining another table
SELECT a.*
FROM
(
SELECT a.id, a.amount,
count(distinct a.id) OVER (PARTITION BY a.amount) num_dup
FROM table1 a
INNER JOIN table2 b ON (b.id = a.id)
)a
WHERE a.num_dup >1
Without a window function and without table1.id:
SELECT a.amount, count(distinct a.id)
FROM table1 a
INNER JOIN table2 b ON (b.id = a.id)
GROUP BY a.amount
HAVING count(distinct a.id) >1 ;
Without a window function and with table1.id:
SELECT b.*
FROM
(
SELECT a.amount, count(distinct a.id)
FROM table1 a
INNER JOIN table2 b ON (b.id = a.id)
GROUP BY a.amount
HAVING count(distinct a.id) >1
)a
INNER JOIN table1 b ON (b.amount = a.amount)
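The GROUP BY variant above can be checked against the question's sample rows. The following is only a minimal sketch: it assumes an in-memory SQLite database, drops the join to table2 (the question shows a single table), and the function name duplicate_amounts is made up for illustration.

```python
import sqlite3

def duplicate_amounts(rows):
    """Return (amount, distinct_id_count) for amounts shared by more than one id.

    Hypothetical helper: loads the rows into an in-memory table1 (id, amount).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT amount, COUNT(DISTINCT id) AS num_dup
        FROM table1
        GROUP BY amount
        HAVING COUNT(DISTINCT id) > 1   -- same id repeated does not count
        ORDER BY amount
        """
    ).fetchall()
    conn.close()
    return result

# Sample rows from the question: id 10 appears twice with the same amount.
sample = [(10, 12345), (10, 12345), (12, 34567), (13, 34567)]
print(duplicate_amounts(sample))  # [(34567, 2)]
```

Amount 12345 is excluded because both of its rows carry id 10, which is what the requirement asks for.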
Related
I have 4 tables as shown below
For each table I want the count of users that are present exclusively in that table (not present in the other tables). The result should look something like this
I have one way of getting desired result as shown below:
First Column:
SELECT COUNT(DISTINCT A.id) table1_only
FROM table1 A
LEFT JOIN (SELECT DISTINCT id
FROM table2
UNION
SELECT DISTINCT id
FROM table3
UNION
SELECT DISTINCT id
FROM table4) B
ON A.id = B.id
WHERE B.id IS NULL
Second Column:
SELECT COUNT(DISTINCT A.id) table2_only
FROM table2 A
LEFT JOIN (SELECT DISTINCT id
FROM table1
UNION
SELECT DISTINCT id
FROM table3
UNION
SELECT DISTINCT id
FROM table4) B
ON A.id = B.id
WHERE B.id IS NULL
Third Column:
SELECT COUNT(DISTINCT A.id) table3_only
FROM table3 A
LEFT JOIN (SELECT DISTINCT id
FROM table1
UNION
SELECT DISTINCT id
FROM table2
UNION
SELECT DISTINCT id
FROM table4) B
ON A.id = B.id
WHERE B.id IS NULL
Fourth Column:
SELECT COUNT(DISTINCT A.id) table4_only
FROM table4 A
LEFT JOIN (SELECT DISTINCT id
FROM table1
UNION
SELECT DISTINCT id
FROM table2
UNION
SELECT DISTINCT id
FROM table3) B
ON A.id = B.id
WHERE B.id IS NULL
Is there a more efficient and scalable way to get the same result? Even for just 4 tables, this is a lot of code.
Any suggestions for optimizing this would be really helpful.
Sample fiddle. (This fiddle is for mysql, I am looking for a generic SQL based approach than any db specific approach)
P.S.:
The result does not have to be column-wise. It can be row-wise as well, as shown below:
I would approach this by combining the data from all tables. Then aggregate and filter:
select which, count(*) as num_in_table_only
from (select id, min(which) as which, count(*) as cnt
from ((select id, 1 as which from table1) union all
(select id, 2 as which from table2) union all
(select id, 3 as which from table3) union all
(select id, 4 as which from table4)
) t
group by id
) i
where cnt = 1
group by which
Note: In your sample data, the ids are unique in each table. This solution assumes that is true, but can easily be tweaked to handle duplicates within a table.
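To see how the UNION ALL approach behaves, here is a small sketch that runs the same query on an in-memory SQLite database. The ids loaded into the four tables are invented for illustration (the question's sample data is not reproduced here), and exclusive_counts is a hypothetical helper name.

```python
import sqlite3

def exclusive_counts(ids_by_table):
    """ids_by_table maps 1..4 to a list of ids; returns [(which, count), ...].

    Assumes ids are unique within each table, as the answer does.
    """
    conn = sqlite3.connect(":memory:")
    for n in (1, 2, 3, 4):
        conn.execute(f"CREATE TABLE table{n} (id INTEGER)")
        conn.executemany(
            f"INSERT INTO table{n} VALUES (?)", [(i,) for i in ids_by_table[n]]
        )
    result = conn.execute(
        """
        SELECT which, COUNT(*) AS num_in_table_only
        FROM (SELECT id, MIN(which) AS which, COUNT(*) AS cnt
              FROM (SELECT id, 1 AS which FROM table1 UNION ALL
                    SELECT id, 2 AS which FROM table2 UNION ALL
                    SELECT id, 3 AS which FROM table3 UNION ALL
                    SELECT id, 4 AS which FROM table4) t
              GROUP BY id) i
        WHERE cnt = 1          -- id seen in exactly one table
        GROUP BY which
        ORDER BY which
        """
    ).fetchall()
    conn.close()
    return result

# Invented data: ids 1, 2, 3 are shared; 4 is only in table2,
# 5 and 6 only in table3, 7 only in table4.
data = {1: [1, 2, 3], 2: [2, 4], 3: [3, 5, 6], 4: [1, 7]}
print(exclusive_counts(data))  # [(2, 1), (3, 2), (4, 1)]
```

A table with no exclusive ids (table1 in this data) simply produces no row, so a caller that needs an explicit zero has to fill it in.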
I know this has been asked a lot but I can't seem to get my query working.
I'm trying to get only one row per id in a query looking like this :
SELECT a.id, b.name
FROM table1 a
LEFT JOIN table2 b ON a.key = b.key
WHERE a.Date =
(SELECT MAX(a1.date) from table1 a1 WHERE a1.primarykey = a.primarykey)
GROUP BY a.id, b.name
I do not need to group by b.name, but I have to because I need to group by id.
Right now I get multiple occurrences of b.name, which duplicates a.id, when I just want the b.name that corresponds to the latest date for a.id.
Can anyone point me to the right way to do this?
Thank you
I guess this condition:
WHERE a1.primarykey = a.primarykey
should be:
WHERE a1.key = a.key
Here key is presumably not the primary key of table1: if you really meant the primary key, there would be no point in searching for MAX(date), since each primary key value has only one date.
If I'm not wrong then try with row_number():
SELECT t.id, t.name
FROM (
SELECT a.id, b.name,
row_number() over (partition by a.key order by a.date desc) rn
FROM table1 a LEFT JOIN table2 b
ON a.key = b.key
) t
WHERE t.rn = 1
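The row_number() query above can be tried out with a short sketch like the one below. It assumes SQLite 3.25 or later (needed for window functions), invented sample rows, and that table2 holds one row per key; latest_per_key is a hypothetical helper name.

```python
import sqlite3

def latest_per_key(table1_rows, table2_rows):
    """Return (id, name) for the most recent table1 row of each key."""
    conn = sqlite3.connect(":memory:")
    # "key" and "date" are quoted because they can clash with SQL keywords.
    conn.execute('CREATE TABLE table1 (id INTEGER, "key" TEXT, "date" TEXT)')
    conn.execute('CREATE TABLE table2 ("key" TEXT, name TEXT)')
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", table1_rows)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", table2_rows)
    result = conn.execute(
        """
        SELECT t.id, t.name
        FROM (SELECT a.id, b.name,
                     ROW_NUMBER() OVER (PARTITION BY a."key"
                                        ORDER BY a."date" DESC) AS rn
              FROM table1 a
              LEFT JOIN table2 b ON a."key" = b."key") t
        WHERE t.rn = 1         -- keep only the newest row per key
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return result

# Invented rows: key k1 has two dates, key k2 has one.
t1 = [(1, "k1", "2020-01-01"), (2, "k1", "2020-03-01"), (3, "k2", "2020-02-01")]
t2 = [("k1", "alpha"), ("k2", "beta")]
print(latest_per_key(t1, t2))  # [(2, 'alpha'), (3, 'beta')]
```

ISO-formatted date strings are used so that text ordering matches chronological ordering.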
It looks like you would be getting 1 row per id if you would be removing b.name from your group statement.
Not sure why you would need to group on b.name if you group on a.id?
try this:
SELECT a.id, b.name from (
SELECT a1.id,a1.key,
rank() over(partition by a1.key order by a1.date desc) md FROM table1 a1 )a
LEFT JOIN table2 b ON a.key = b.key
WHERE a.md = 1;
However, I am not sure whether you need to group by id or by key, so double check that.
So I know MS-Access does not allow SELECT COUNT(DISTINCT....) FROM ..., but I am trying to find a more viable alternative to the usual standard of
SELECT COUNT(*) FROM (SELECT DISTINCT Name FROM table1)
My problem is I am trying to do three separate Count functions and group them on ID. If I use the method above, it is giving me the total unique value count for the whole table instead of the total count for only the value of ID. I tried doing
(SELECT COUNT(*) FROM (SELECT DISTINCT Name FROM table1 as T2
WHERE T2.ColumnA = T1.ColumnA)) As MyVal
FROM table1 as T1
but it tells me I need to specify a value for T1.ColumnA.
The SQL query I am trying to accomplish is this:
SELECT ID,
COUNT(DISTINCT ColumnA) as CA,
COUNT(DISTINCT ColumnB) as CB,
COUNT(DISTINCT ColumnC) as CC
FROM table1
GROUP BY ID
Any ideas?
You can use subqueries. Assuming you have a table where each id occurs once:
select (select count(*)
from (select columnA
from table1 t1
where t1.id = t.id
group by columnA
) as a
) as num_a,
(select count(*)
from (select columnB
from table1 t1
where t1.id = t.id
group by columnB
) as b
) as num_b,
(select count(*)
from (select columnC
from table1 t1
where t1.id = t.id
group by columnC
) as c
) as num_c
from <table with ids> as t;
I'm not sure if you'll think this is "viable".
EDIT:
This makes it even more complicated . . . it suggests that MS Access doesn't support correlation clauses more than one level deep (might you consider switching to another database?).
In any case, the brute force way:
select a.id, a.numA, b.numB, c.numC
from ((select id, count(*) as numA
from (select id, columnA
from table1 t1
group by id, columnA
) as a
group by id
) as a inner join
(select id, count(*) as numB
from (select id, columnB
from table1 t1
group by id, columnB
) as b
group by id
) as b
on a.id = b.id
) inner join
(select id, count(*) as numC
from (select id, columnC
from table1 t1
group by id, columnC
) as c
group by id
) as c
on c.id = a.id;
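As a sanity check of the brute-force idea (deduplicate with GROUP BY, then count per id), the sketch below runs it on SQLite, which does support COUNT(DISTINCT ...), so the two results can be compared. This says nothing about Access-specific syntax; the sample rows and the helper name distinct_counts are invented.

```python
import sqlite3

ROWS = [  # invented (id, columnA, columnB, columnC) rows
    (1, "x", "p", "m"), (1, "x", "q", "m"), (1, "y", "q", "m"),
    (2, "z", "p", "n"), (2, "z", "p", "o"),
]

def distinct_counts(rows):
    """Return (emulated, reference) per-id distinct counts for columns A, B, C."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER, columnA TEXT, columnB TEXT, columnC TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)

    def per_col(col, alias):
        # Inner GROUP BY removes duplicates; outer GROUP BY counts per id.
        return (
            f"(SELECT id, COUNT(*) AS {alias} "
            f"FROM (SELECT id, {col} FROM table1 GROUP BY id, {col}) AS d "
            f"GROUP BY id)"
        )

    emulated = conn.execute(
        f"""
        SELECT a.id, a.numA, b.numB, c.numC
        FROM {per_col('columnA', 'numA')} AS a
        INNER JOIN {per_col('columnB', 'numB')} AS b ON a.id = b.id
        INNER JOIN {per_col('columnC', 'numC')} AS c ON a.id = c.id
        ORDER BY a.id
        """
    ).fetchall()
    reference = conn.execute(
        """
        SELECT id, COUNT(DISTINCT columnA), COUNT(DISTINCT columnB),
               COUNT(DISTINCT columnC)
        FROM table1 GROUP BY id ORDER BY id
        """
    ).fetchall()
    conn.close()
    return emulated, reference

emulated, reference = distinct_counts(ROWS)
print(emulated)  # [(1, 2, 2, 1), (2, 1, 1, 2)]
```

One difference to keep in mind: COUNT(DISTINCT col) ignores NULLs, while the GROUP BY emulation counts NULL as its own group, so the two can disagree on columns that contain NULLs.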
I am using SQL Server 2012, and have the following query. Let's call this query A.
SELECT a.col, a.fk
FROM Table1 a
INNER JOIN (
select b.col
from Table1 b
group by b.col
having count(*) > 1)
b on b.col = a.col
I want to delete only the rows returned from query A, specifically rows that match the returned col AND fk
I am thinking of doing the following, but it will only delete rows that match on the col.
delete from Table1
where col in (
SELECT a.col
FROM Table1 a
INNER JOIN (
select b.col
from Table1 b
group by b.col
having count(*) > 1)
b on b.col = a.col
)
Use the DELETE ... FROM ... JOIN syntax:
delete t1
from table1 t1
INNER JOIN (SELECT a.col, a.fk
FROM Table1 a
INNER JOIN (
select b.col
from Table1 b
group by b.col
having count(*) > 1)
b on b.col = a.col) t2
ON t1.col=t2.col and t1.fk=t2.fk
You can combine the col and fk fields into a single unique field and use it to pick the rows you want:
delete from Table1
where cast(col as varchar(50))+'//'+cast(fk as varchar(50)) in (
SELECT cast(a.col as varchar(50))+'//'+cast(a.fk as varchar(50))
FROM Table1 a
INNER JOIN (
select b.col
from Table1 b
group by b.col
having count(*) > 1)
b on b.col = a.col
)
You can express Query A like this:
SELECT col, fk
FROM (
SELECT a.col, a.fk, COUNT(*) OVER (PARTITION BY a.col) AS [count]
FROM Table1 a
) counted
WHERE [count] > 1
Which leads to a nice way to do the DELETE using a CTE:
;WITH ToDelete AS (
SELECT a.col, a.fk, COUNT(*) OVER (PARTITION BY a.col) AS [count]
FROM Table1 a
)
DELETE FROM ToDelete
WHERE [count] > 1
Note that this gives the same result as the DELETE statement in your question.
If you want to delete all but one row with the duplicate col value you can use something like this:
;WITH ToDelete AS (
SELECT a.col, a.fk
, ROW_NUMBER() OVER (PARTITION BY a.col ORDER BY a.fk) AS [occurance]
FROM Table1 a
)
DELETE FROM ToDelete
WHERE [occurance] > 1
The ORDER BY clause will determine which row is kept.
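The keep-one-row-per-col idea can be sketched outside SQL Server as well. SQLite does not allow deleting through a CTE, so the version below swaps in a different technique: it numbers the rows with ROW_NUMBER() and deletes by SQLite's implicit rowid. The sample rows and the helper name delete_duplicates are invented, and SQLite 3.25 or later is assumed.

```python
import sqlite3

def delete_duplicates(rows):
    """Keep the row with the lowest fk for each col; return the remaining rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (col TEXT, fk INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", rows)
    conn.execute(
        """
        DELETE FROM Table1
        WHERE rowid IN (
            SELECT rid
            FROM (SELECT rowid AS rid,
                         ROW_NUMBER() OVER (PARTITION BY col
                                            ORDER BY fk) AS occurrence
                  FROM Table1) numbered
            WHERE occurrence > 1)   -- everything after the first row per col
        """
    )
    remaining = conn.execute(
        "SELECT col, fk FROM Table1 ORDER BY col, fk"
    ).fetchall()
    conn.close()
    return remaining

# Invented rows: 'a' appears three times, 'c' twice, 'b' once.
rows = [("a", 1), ("a", 2), ("a", 3), ("b", 5), ("c", 7), ("c", 6)]
print(delete_duplicates(rows))  # [('a', 1), ('b', 5), ('c', 6)]
```

Changing ORDER BY fk to ORDER BY fk DESC would keep the highest fk per col instead.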
This is my simplified table layout:
table1: pID (pkey), data
table2: rowID (pkey), pID (fkey), data, date
I want to select some rows from table1 joining one row from table2 per pID for the most recent date for that pID.
I currently do this with the following query:
SELECT * FROM table1 as a
LEFT JOIN table2 AS b ON b.rowID = (SELECT TOP(1) rowID FROM table2 WHERE pID = a.pID ORDER BY date DESC)
This approach is slow, probably because it has to run a subquery for each row of table1. Is there a way to improve its performance, or another way to do it?
You can try something along these lines: use a subquery to get the latest date per pID (grouping by pID), then join that to the first table. This way the subquery does not have to be executed for each row of Table1, which should result in better performance:
Select *
FROM Table1 a
INNER JOIN
(
SELECT pID, Max(Date) AS MaxDate FROM Table2
GROUP BY pID
) b
ON a.pID = b.pID
I have provided the sample SQL for one column using the group by, in case you need additional columns, add them to the GROUP BY clause. Hope this helps.
Use the code below. Note that I added ORDER BY Date DESC to get the most recent data:
select *
from table1 a
inner join table2 b on a.pID=b.pID
where b.rowID in (select top(1) t.rowID from table2 t where t.pID=a.pID order by Date desc)
I am using the code below in a similar scenario (I adapted it to your example):
SELECT b.*
FROM table1 AS a
left outer join (
SELECT a.*
FROM table2 a
inner join (
SELECT pID, max(date) as date
FROM table2
WHERE date <= <max_date>
group by pID
) b ON a.pID = b.pID AND a.date = b.date
) b ON a.pID = b.pID
The only problem with this approach is that you have to make sure dates do not repeat within a pID.
You can do this with the row_number() function and a subquery:
SELECT t1.*
FROM table1 t1 LEFT JOIN
(select t2.*, row_number() over (partition by pId order by rowId desc) as seqnum
from table2 t2
) t2
on t1.pId = t2.pId and t2.seqnum = 1;
Use the ROW_NUMBER() function to add a column that marks which row of table2 comes first for each pID (partitioned by pID and ordered by rowDate descending).
Example:
WITH cte AS
(
SELECT
rowID AS t2RowId,
ROW_NUMBER() OVER (PARTITION BY pID ORDER BY rowDate DESC) AS rowNum
FROM table2 t2
) -- gets the t2RowIds + a column which says which is the latest for each pID
SELECT t1.*, t2.*
FROM table1 t1
LEFT JOIN
(
table2 t2
JOIN cte ON t2.rowID = cte.t2RowId AND cte.rowNum = 1
) ON t1.pID = t2.pID
This is guaranteed to return only 1 item from table2 per pID, even if multiple items have the same date. You should of course ensure that the date column is indexed in table2 for quick performance (ideally with an index that also covers the primary key of table2).
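A compact way to try the ROW_NUMBER() approach is the sketch below. It assumes SQLite 3.25 or later and invented rows, and it adds rowID DESC as a tie-breaker for equal dates, which is an assumption of this example rather than something the answers specify. The helper name latest_rows is hypothetical.

```python
import sqlite3

def latest_rows(table1_rows, table2_rows):
    """Return (pID, rowID, data) with the newest table2 row per pID, if any."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (pID INTEGER PRIMARY KEY, data TEXT)")
    conn.execute(
        'CREATE TABLE table2 (rowID INTEGER PRIMARY KEY, pID INTEGER, '
        'data TEXT, "date" TEXT)'
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", table1_rows)
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", table2_rows)
    result = conn.execute(
        """
        SELECT t1.pID, t2.rowID, t2.data AS latest_data
        FROM table1 t1
        LEFT JOIN (SELECT t.*,
                          ROW_NUMBER() OVER (PARTITION BY pID
                                             ORDER BY "date" DESC, rowID DESC
                                            ) AS seqnum
                   FROM table2 t) t2
               ON t1.pID = t2.pID AND t2.seqnum = 1
        ORDER BY t1.pID
        """
    ).fetchall()
    conn.close()
    return result

# Invented rows: pID 1 has an old and a new row, pID 2 has a date tie,
# pID 3 has no table2 rows at all.
p_rows = [(1, "p1"), (2, "p2"), (3, "p3")]
d_rows = [
    (10, 1, "old", "2021-01-01"), (11, 1, "new", "2021-06-01"),
    (12, 2, "x", "2021-03-01"), (13, 2, "y", "2021-03-01"),
]
print(latest_rows(p_rows, d_rows))
# [(1, 11, 'new'), (2, 13, 'y'), (3, None, None)]
```

Because the seqnum = 1 filter sits in the ON clause of the LEFT JOIN, table1 rows without any table2 match (pID 3 here) are still returned, with NULLs for the table2 columns.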