I want to find duplicate records in the database. Is this query correct?
select (fields)
from DB
group by name, city
having count(*) > 1
If it is wrong, please let me know how I can correct it.
Also, if I want to delete the duplicate records, will this work?
delete from tbl_name
where row_id in
(select row_id from tbl_name group by name, city having count(*) > 1)
So can I write the above query like this?
DELETE FROM tb_name where row_id not in(select min(row_id) from tb_name groupBy(name, city) having count(*)>1)
Your DELETE is logically wrong. It would delete every row that occurs more than once, including the copy you want to keep, so no data is left for those groups.
What you can do in SQL Server 2005 and up is use a CTE (Common Table Expression) and the
ROW_NUMBER() ranking function:
;WITH Duplicates AS
(
SELECT
Name, City,
ROW_NUMBER() OVER (PARTITION BY Name, City ORDER BY City) AS 'RowNum'
FROM dbo.YourTable
)
DELETE FROM Duplicates
WHERE RowNum > 1
You basically create "partitions" of your data by the (name, city) combo - each of those pairs will get sequential numbers from 1 on up.
Those that have more than one occurrence will also have entries in that CTE with a RowNum > 1. Delete all of those and your duplicates are gone, while one row per (name, city) pair stays.
Read about Using Common Table Expressions in SQL Server 2005 and about Ranking Functions and Performance in SQL Server 2005 (or consult the MSDN docs on those topics)
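To make the partition-and-number idea concrete, here is a minimal sketch that runs the same technique through Python's built-in sqlite3 module instead of SQL Server. The table name your_table, the row_id column and the sample rows are made up for illustration. SQLite does not allow deleting through the CTE itself, so the numbered rows are matched back to the base table by row_id, and the ordering uses row_id so the lowest id in each group is the one kept.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (row_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO your_table (name, city) VALUES (?, ?)",
    [("Ann", "Oslo"), ("Ann", "Oslo"), ("Bob", "Rome"), ("Ann", "Oslo")],
)

# Number the rows inside each (name, city) partition, then delete every
# row whose number is greater than 1. The DELETE targets the base table
# and picks the rows by row_id.
conn.execute(
    """
    WITH numbered AS (
        SELECT row_id,
               ROW_NUMBER() OVER (PARTITION BY name, city ORDER BY row_id) AS row_num
        FROM your_table
    )
    DELETE FROM your_table
    WHERE row_id IN (SELECT row_id FROM numbered WHERE row_num > 1)
    """
)

remaining = conn.execute(
    "SELECT row_id, name, city FROM your_table ORDER BY row_id"
).fetchall()
print(remaining)
```

One row per (name, city) pair should be left. This needs an SQLite build with window function support (3.25 or later).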
You have the syntax wrong:
select name, city, count(*) from table group by name, city having count(*) > 1
If you are not interested in the actual count, remove ", count(*)" from the query
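If you want to try the corrected SELECT without touching a real database, a small sketch with Python's sqlite3 module could look like this. The table name people and the sample rows are assumptions for the demo, not part of the question.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (row_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [
        ("Ann", "Oslo"),
        ("Ann", "Oslo"),
        ("Bob", "Rome"),
        ("Ann", "Rome"),
        ("Ann", "Oslo"),
    ],
)

# One output row per (name, city) group that occurs more than once.
duplicates = conn.execute(
    """
    SELECT name, city, COUNT(*) AS cnt
    FROM people
    GROUP BY name, city
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)
```

Only the (Ann, Oslo) group is reported, because every other pair occurs once.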
Related
I'm fairly new to PostgreSQL and trying out a few count queries. I'm looking to count all values and all distinct values in a table. Pretty straightforward, something like this:
CountD Count
351 400
With a query like this:
SELECT COUNT(*)
COUNT(id) AS count_id,
COUNT DISTINCT(id) AS count_d_id
FROM table
I see that I can create a single column this way:
SELECT COUNT(*) FROM (SELECT DISTINCT id FROM table) AS count_d_id
But the column title (count_d_id) doesn't come through properly, and I'm unsure how to add an additional column. Guidance appreciated.
This is the correct syntax:
SELECT COUNT(id) AS count_id,
COUNT(DISTINCT id) AS count_d_id
FROM table
Your original query aliases the subquery rather than the column. You seem to want:
SELECT COUNT(*) AS count_d_id            -- column alias
FROM (SELECT DISTINCT id FROM table) t   -- subquery alias
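As a quick check that both forms give the same distinct count and that the column alias comes through, here is a sketch using Python's sqlite3 module rather than PostgreSQL. The table name items and the sample ids are invented for the demo.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(1,), (1,), (2,), (3,), (3,)])

# Both counts side by side in a single row.
counts = conn.execute(
    "SELECT COUNT(id) AS count_id, COUNT(DISTINCT id) AS count_d_id FROM items"
).fetchone()

# Subquery form: the alias after COUNT(*) names the column,
# the alias after the closing parenthesis names the subquery.
cur = conn.execute(
    "SELECT COUNT(*) AS count_d_id FROM (SELECT DISTINCT id FROM items) t"
)
column_name = cur.description[0][0]
distinct_count = cur.fetchone()[0]

print(counts, column_name, distinct_count)
```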
This code I have finds duplicate rows in a table.
SELECT position, name, count(*) as cnt
FROM team
GROUP BY position, name
HAVING COUNT(*) > 1
How do I delete the duplicate rows that I have found in Hiveql?
Apart from distinct, you can use row_number for this in Hive. Explicit delete and update can only be performed on tables that support ACID. So insert overwrite is more universal.
insert overwrite table team
select position, name, other1, other2...
from (
select
*,
row_number() over(partition by position, name order by rand()) as rn
from team
) tmp
where rn = 1
;
Please try this, assuming id is the primary key column:
delete from team where id in (
select t1.id from team t1,
(SELECT position, name, count(*) as cnt ,max(id) as id1
FROM team
GROUP BY position, name
HAVING COUNT(*) > 1) t2
where t1.position=t2.position
and t1.name=t2.name
and t1.id<>t2.id1)
This is an alternative way, since deletes are expensive in Hive
Create table Team_new
As
Select distinct <col1>, <col2>,...
from Team;
Drop table Team purge;
Alter table Team_new rename to Team;
This is assuming you don’t have an id column. If you have an id column then the 1st query would change slightly as
Create table Team_new
As
Select <col1>,<col2>,...,max(id) as id from Team
Group by <col1>,<col2>,... ;
Other queries (drop & alter post this) would remain the same as above.
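The create, drop and rename steps can be tried out on a toy table first. The sketch below uses Python's sqlite3 module as a stand-in for Hive, so the PURGE option is left out, and the team columns and rows are assumptions for the demo. It follows the variant with an id column, keeping the highest id per (position, name) group.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (id INTEGER, position TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO team VALUES (?, ?, ?)",
    [(1, "GK", "Lee"), (2, "GK", "Lee"), (3, "FW", "Kim"), (4, "GK", "Lee")],
)

# Build a de-duplicated copy, drop the original, rename the copy.
conn.executescript(
    """
    CREATE TABLE team_new AS
    SELECT position, name, MAX(id) AS id
    FROM team
    GROUP BY position, name;

    DROP TABLE team;

    ALTER TABLE team_new RENAME TO team;
    """
)

rows = conn.execute("SELECT id, position, name FROM team ORDER BY id").fetchall()
print(rows)
```

Note that the rebuilt table only has the columns listed in the SELECT, so any other columns have to be added to both the SELECT and the GROUP BY (or aggregated).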
I have the following SQL related question:
Let us assume I have the following simple data table:
I would like to identify the most common street address and place it in column 3:
I think this should be fairly straight-forward using COUNT? Not quite sure how to go about it though. Any help is greatly appreciated
Regards
This is a very long method that I just wrote. It only lists the most frequent address. You have to get these values and insert them into the table. See if it works for you:
select * from
(select d.company, count(d.address) as final, c.maxcount,d.address
from dbo.test d inner join
(select a.company,max(a.add_count) as maxcount from
(select company,address,count(address) as add_count from dbo.test group by company,address)a
group by a.company) c
on (d.company = c.company)
group by d.company,c.maxcount,d.address)e
where e.maxcount=e.final
Here is a query in standard SQL. It first counts records per company and address, then ranks them per company giving the most often occurring address rank #1. Then it only keeps those best ranked address records, joins with the table again and shows the results.
select
mytable.company,
mytable.address,
ranked.address as most_common_address
from mytable
join
(
select
company,
address,
row_number() over (partition by company order by cnt desc) as rn
from
(
select
company,
address,
count(*) over (partition by company, address) as cnt
from mytable
) counted
) ranked on ranked.rn = 1
and ranked.company = mytable.company;
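Here is a small runnable sketch of the count-then-rank idea, using Python's sqlite3 module. It is a variation, not the exact query above: the counting step uses GROUP BY instead of a window count, the join is on company only so that every row gets its company's top address, and ties are broken alphabetically by address. The table contents are made up, and the tie-break rule is my own assumption.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (company TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("Acme", "1 Main St"),
        ("Acme", "1 Main St"),
        ("Acme", "9 Side Rd"),
        ("Bolt", "5 High St"),
    ],
)

rows = conn.execute(
    """
    SELECT m.company, m.address, r.address AS most_common_address
    FROM mytable m
    JOIN (
        -- rank the addresses of each company by how often they occur
        SELECT company, address,
               ROW_NUMBER() OVER (
                   PARTITION BY company ORDER BY cnt DESC, address
               ) AS rn
        FROM (
            SELECT company, address, COUNT(*) AS cnt
            FROM mytable
            GROUP BY company, address
        ) counted
    ) r ON r.rn = 1 AND r.company = m.company
    ORDER BY m.rowid
    """
).fetchall()
for row in rows:
    print(row)
```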
This select statement will give you the most frequent occurrence. Let us call this A.
SELECT `value`,
COUNT(`value`) AS `value_occurrence`
FROM `my_table`
GROUP BY `value`
ORDER BY `value_occurrence` DESC
LIMIT 1;
To INSERT this into your table,
INSERT INTO db (col1, col2, col3) VALUES (val1, val2, A)
Note that you want that whole select statement for A!
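A sketch of how the select can be dropped into the INSERT as a subquery, run here through Python's sqlite3 module instead of MySQL. The target table, its columns and the sample values are placeholders. One assumption worth flagging: inside VALUES the subquery has to return a single column, so this version selects only value and moves the count into ORDER BY.

```python
import sqlite3

# Hypothetical tables and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("x",), ("y",), ("x",), ("z",), ("x",), ("y",)],
)
conn.execute("CREATE TABLE target (col1 TEXT, col2 TEXT, col3 TEXT)")

# The parenthesised SELECT plays the role of A: it yields the single
# most frequent value, which becomes col3 of the new row.
conn.execute(
    """
    INSERT INTO target (col1, col2, col3)
    VALUES (?, ?, (SELECT value
                   FROM my_table
                   GROUP BY value
                   ORDER BY COUNT(value) DESC
                   LIMIT 1))
    """,
    ("a", "b"),
)

inserted = conn.execute("SELECT col1, col2, col3 FROM target").fetchall()
print(inserted)
```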
You don't mention your DBMS. Here is a solution for Oracle.
select
company,
address,
(
select stats_mode(address)
from mytable this_company_only
where this_company_only.company = mytable.company
) as most_common_address
from mytable;
This looks a bit clumsy, because STATS_MODE is only available as an aggregate function, not as an analytic window function.
I am looking to add in a simple counter to an SQL Query - i.e., if I run a query on individual surnames that returns 3 results I would like the results to display their row value from the query. E.g.:
Surnames Counter
Smith 1
Murphy 2
Brown 3
How can this be done?
The row_number analytic function should do the trick:
SELECT surname, ROW_NUMBER() OVER (ORDER BY surname) AS counter
FROM my_table
EDIT:
In a simple query like this, you could just use the rownum pseudocolumn:
SELECT surname, rownum
FROM my_table
ORDER BY 1 DESC
select a.*
, rownum rnum
from
(select surname from name_table order by surname) a
will get you a simple numbering according to the order of surname, but will not deal with ties.
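For a quick experiment with the counter, here is a sketch of the ROW_NUMBER() version run through Python's sqlite3 module rather than Oracle, so the rownum pseudocolumn is not used. The table name my_table is taken from the answer above and the three surnames from the question.

```python
import sqlite3

# Sample surnames from the question, loaded into a throwaway table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (surname TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)", [("Smith",), ("Murphy",), ("Brown",)]
)

# ROW_NUMBER() numbers the rows in surname order, starting at 1.
rows = conn.execute(
    """
    SELECT surname, ROW_NUMBER() OVER (ORDER BY surname) AS counter
    FROM my_table
    ORDER BY counter
    """
).fetchall()
for surname, counter in rows:
    print(surname, counter)
```

If two people share a surname, the order between them is arbitrary unless a second column is added to the ORDER BY inside OVER.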
You can try this one
SELECT Surnames,row_number() OVER (ORDER BY Surnames)
FROM t
SQL Fiddle:- http://sqlfiddle.com/#!4/ad3aa/3
SELECT Surnames,row_number() OVER (ORDER BY Surnames) as Counter FROM table;
Create an SQL Expression Field named something like 'RowNumber' with simply 'rownum' as the statement. Add that field where you want it in the details row, then sort your report by the name of your Expression Field, 'RowNumber' in this case.
I have a Personal table with a LastName column and a MaybeUniqueID.
I want to output a table with a LastName column, containing only the rows whose MaybeUniqueID value occurs more than once.
I would like to do everything in one unique run, avoiding mid-step outputs.
I prefer not to use temporary tables or table variables. At most I would use one table variable (no temporary tables), but I think this should not be necessary.
I am using Microsoft SQL Server 2005.
I tried different scenarios with different SQL statements like HAVING or GROUP BY, but I failed to get the outcome I am looking for.
Please have a look at the following not-working summary test:
SELECT LastName
FROM Personal
JOIN
(SELECT MaybeUniqueID AS ID2,
COUNT(*) AS CNT
FROM Personal
--WHERE CNT > 1
GROUP BY MaybeUniqueID
HAVING cnt > 1
) AS MultiMaybeUniqueID
ON Personal.MaybeUniqueID = MultiMaybeUniqueID.ID2
HAVING cnt > 1 should be HAVING COUNT(*) > 1.
Column aliases can only be referenced in the ORDER BY clause, not in the HAVING clause.
Though you could also use
;WITH T AS
(
SELECT LastName,
COUNT(*) OVER (PARTITION BY MaybeUniqueID) AS Cnt
FROM Personal
)
SELECT LastName
FROM T
WHERE Cnt > 1
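To see that the join with HAVING COUNT(*) > 1 and the windowed COUNT(*) version return the same names, here is a sketch using Python's sqlite3 module. The sample rows are invented, and because this runs on SQLite rather than SQL Server 2005 it only checks that the two rewrites agree, not the original alias error.

```python
import sqlite3

# Hypothetical data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Personal (LastName TEXT, MaybeUniqueID INTEGER)")
conn.executemany(
    "INSERT INTO Personal VALUES (?, ?)",
    [
        ("Rossi", 1),
        ("Bianchi", 1),
        ("Verdi", 2),
        ("Neri", 3),
        ("Russo", 3),
        ("Gallo", 4),
    ],
)

# Variant 1: join against the ids that occur more than once.
join_sql = """
    SELECT LastName
    FROM Personal
    JOIN (SELECT MaybeUniqueID AS ID2, COUNT(*) AS CNT
          FROM Personal
          GROUP BY MaybeUniqueID
          HAVING COUNT(*) > 1) AS MultiMaybeUniqueID
      ON Personal.MaybeUniqueID = MultiMaybeUniqueID.ID2
"""

# Variant 2: count per id with a window function, then filter.
window_sql = """
    WITH T AS (
        SELECT LastName,
               COUNT(*) OVER (PARTITION BY MaybeUniqueID) AS Cnt
        FROM Personal
    )
    SELECT LastName FROM T WHERE Cnt > 1
"""

via_join = sorted(name for (name,) in conn.execute(join_sql))
via_window = sorted(name for (name,) in conn.execute(window_sql))
print(via_join, via_window)
```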