SQL query for finding records where count > 1

I have a table named PAYMENT. Within this table I have a user ID, an account number, a ZIP code and a date. I would like to find all records for all users that have more than one payment per day with the same account number.
UPDATE: Additionally, there should be a filter that only counts the records whose ZIP code is different.
This is what the table looks like:
| user_id | account_no | zip | date |
| 1 | 123 | 55555 | 12-DEC-09 |
| 1 | 123 | 66666 | 12-DEC-09 |
| 1 | 123 | 55555 | 13-DEC-09 |
| 2 | 456 | 77777 | 14-DEC-09 |
| 2 | 456 | 77777 | 14-DEC-09 |
| 2 | 789 | 77777 | 14-DEC-09 |
| 2 | 789 | 77777 | 14-DEC-09 |
The result should look similar to this:
| user_id | count |
| 1 | 2 |
How would you express this in a SQL query? I was thinking self join but for some reason my count is wrong.

Use the HAVING clause and GROUP BY the fields that make the row unique.
The query below will find
all users that have more than one payment per day with the same account number:
SELECT
user_id,
COUNT(*) count
FROM
PAYMENT
GROUP BY
account_no,
user_id,
date
HAVING COUNT(*) > 1
Update
If you want to count only the rows that have a distinct ZIP, you can select a distinct set first and then perform your GROUP BY/HAVING:
SELECT
user_id,
account_no,
date,
COUNT(*)
FROM
(SELECT DISTINCT
user_id,
account_no,
zip,
date
FROM
payment
) payment
GROUP BY
user_id,
account_no,
date
HAVING COUNT(*) > 1
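To see how the two queries differ on the question's sample rows, here is a minimal sketch that loads them into an in-memory SQLite database. SQLite, the text types for zip and date, and the Python variable names are assumptions made for illustration; the original poster's database is not stated.

```python
import sqlite3

# In-memory copy of the question's PAYMENT table (dates kept as plain text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (user_id INT, account_no INT, zip TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?, ?)",
    [
        (1, 123, "55555", "12-DEC-09"),
        (1, 123, "66666", "12-DEC-09"),
        (1, 123, "55555", "13-DEC-09"),
        (2, 456, "77777", "14-DEC-09"),
        (2, 456, "77777", "14-DEC-09"),
        (2, 789, "77777", "14-DEC-09"),
        (2, 789, "77777", "14-DEC-09"),
    ],
)

# Count every payment per (user, account, day).
all_payments = conn.execute(
    """
    SELECT user_id, account_no, date, COUNT(*)
    FROM payment
    GROUP BY user_id, account_no, date
    HAVING COUNT(*) > 1
    ORDER BY user_id, account_no, date
    """
).fetchall()

# Count only distinct ZIP codes per (user, account, day).
distinct_zip = conn.execute(
    """
    SELECT user_id, account_no, date, COUNT(*)
    FROM (SELECT DISTINCT user_id, account_no, zip, date FROM payment) AS p
    GROUP BY user_id, account_no, date
    HAVING COUNT(*) > 1
    """
).fetchall()

print(all_payments)
print(distinct_zip)
```

The first query also reports user 2, whose repeated payments all share one ZIP code; the second keeps only user 1, which matches the result the question asks for.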

Try this query:
SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(column_name) > 1;

I wouldn't recommend the HAVING keyword for newbies; it is essentially for legacy purposes.
I am not clear on what the key for this table is (is it fully normalized, I wonder?), so I find it difficult to follow your specification:
I would like to find all records for all users that have more than one
payment per day with the same account number... Additionally, there
should be a filter that only counts the records whose ZIP code is
different.
So I've taken a literal interpretation.
The following is more verbose but could be easier to understand and therefore maintain (I've used a CTE for the table PAYMENT_TALLIES, but it could be a VIEW):
WITH PAYMENT_TALLIES (user_id, zip, tally)
AS
(
SELECT user_id, zip, COUNT(*) AS tally
FROM PAYMENT
GROUP
BY user_id, zip
)
SELECT DISTINCT *
FROM PAYMENT AS P
WHERE EXISTS (
SELECT *
FROM PAYMENT_TALLIES AS PT
WHERE P.user_id = PT.user_id
AND PT.tally > 1
);

create table payment(
user_id int(11),
account int(11) not null,
zip int(11) not null,
dt date not null
);
insert into payment values
(1,123,55555,'2009-12-12'),
(1,123,66666,'2009-12-12'),
(1,123,77777,'2009-12-13'),
(2,456,77777,'2009-12-14'),
(2,456,77777,'2009-12-14'),
(2,789,77777,'2009-12-14'),
(2,789,77777,'2009-12-14');
select foo.user_id, foo.cnt from
(select user_id, count(account) as cnt, dt from payment group by user_id, account, dt) foo
where foo.cnt > 1;

Related

Update in the same column from the same table

I'm trying to update a column in my table that was ignored at the initial insert based on a key and not null values in the same column.
My table is a history table in a data warehouse. To simplify, it consists of:
id which is its primary key
employee_id
date_of_birth
project_id
The rows help the company keep track of projects that an employee had worked on.
The problem is that when updating this table, the date_of_birth column is ignored, which is a problem for me since I'm working on a project that needs the age of the employee at the time he changed projects.
Actual:
+----+-------------+---------------+------------+
| ID | EMPLOYEE_ID | YEAR_OF_BIRTH | PROJECT_ID |
+----+-------------+---------------+------------+
| 1 | 1 | 1980 | 1 |
| 2 | 1 | NULL | 2 |
| 3 | 2 | 1990 | 2 |
| 4 | 2 | NULL | 1 |
+----+-------------+---------------+------------+
And this is what I want:
+----+-------------+---------------+------------+
| ID | EMPLOYEE_ID | YEAR_OF_BIRTH | PROJECT_ID |
+----+-------------+---------------+------------+
| 1 | 1 | 1980 | 1 |
| 2 | 1 | 1980 | 2 |
| 3 | 2 | 1990 | 2 |
| 4 | 2 | 1990 | 1 |
+----+-------------+---------------+------------+
We could try using COALESCE to conditionally replace a NULL year of birth with a non NULL value:
SELECT
ID,
EMPLOYEE_ID,
COALESCE(YEAR_OF_BIRTH, MAX(YEAR_OF_BIRTH) OVER (PARTITION BY EMPLOYEE_ID)) AS YEAR_OF_BIRTH,
PROJECT_ID
FROM yourTable;
The following query should do what you want:
UPDATE yourTable
SET YEAR_OF_BIRTH = (SELECT MIN(a.YEAR_OF_BIRTH) FROM yourTable a WHERE a.EMPLOYEE_ID = yourTable.EMPLOYEE_ID)
WHERE YEAR_OF_BIRTH IS NULL
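A quick way to check a backfill like this is to run it against the question's four sample rows. The sketch below uses an in-memory SQLite database and a hypothetical table name, history; both are assumptions for illustration, and some engines (MySQL, for example) restrict subqueries that read from the table being updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history "
    "(id INT PRIMARY KEY, employee_id INT, year_of_birth INT, project_id INT)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [(1, 1, 1980, 1), (2, 1, None, 2), (3, 2, 1990, 2), (4, 2, None, 1)],
)

# Qualify the outer column with the table name. A bare employee_id inside the
# subquery would resolve to the inner alias and compare the column to itself.
conn.execute(
    """
    UPDATE history
    SET year_of_birth = (
        SELECT MIN(a.year_of_birth)
        FROM history AS a
        WHERE a.employee_id = history.employee_id
    )
    WHERE year_of_birth IS NULL
    """
)

rows = conn.execute(
    "SELECT id, employee_id, year_of_birth, project_id FROM history ORDER BY id"
).fetchall()
print(rows)
```

MIN ignores NULLs, so each missing value is filled from the employee's other rows and the table ends up matching the desired output in the question.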
According to your sample data, you can also use a correlated subquery as
SELECT T1.ID,
T1.EMPLOYEE_ID,
ISNULL(T1.YEAR_OF_BIRTH,(
SELECT MAX(T2.YEAR_OF_BIRTH)
FROM T T2
WHERE T2.EMPLOYEE_ID = T1.EMPLOYEE_ID
)) AS YEAR_OF_BIRTH,
T1.PROJECT_ID
FROM T T1 ;
OR
SELECT ID,
EMPLOYEE_ID,
ISNULL(YEAR_OF_BIRTH, MAX(YEAR_OF_BIRTH) OVER (PARTITION BY EMPLOYEE_ID)) AS YEAR_OF_BIRTH,
PROJECT_ID
FROM T;
I would use an updatable CTE for this purpose:
with toupdate as (
select a.*, min(year_of_birth) over (partition by employee_id) as min_year_of_birth
from actual a
)
update toupdate
set year_of_birth = min_year_of_birth
where year_of_birth is null or year_of_birth <> min_year_of_birth;
The where clause reduces the number of rows being updated.
That said, FIX YOUR DATA MODEL. Sorry for raising my voice. The date-of-birth information should not be stored in this table. It should be in the employee table, because an employee has only one of them.
You can get your desired output with this query:
SELECT ID, EMPLOYEE_ID,
MAX(YEAR_OF_BIRTH) OVER (PARTITION BY EMPLOYEE_ID) AS YEAR_OF_BIRTH,
PROJECT_ID
FROM Table1

Select similar records in SQL

I am looking to find some records that are very similar (for all intents and purposes, duplicate records) after adding a new field to a table. Here's some sample data.
+-----+------+-----+------+-------------+----------+
| Id | Task | Sig | Form | Description | Location |
+-----+------+-----+------+-------------+----------+
| 255 | 5000 | 1 | 1 | Record 1 | (null) |
| 256 | 5000 | 1 | 1 | Record 1 | 000 |
| 257 | 5001 | 1 | 1 | Record 2 | 0T3 |
| 258 | 5001 | 1 | 2 | Record 3 | 0T3 |
| 259 | 5002 | 1 | 1 | Record 4 | 001 |
| 260 | 5003 | 1 | 1 | Record 5 | 001 |
+-----+------+-----+------+-------------+----------+
How could I design the query to just find 'duplicate' records whose only difference is the Location field?
If I use a query like this:
SELECT *
FROM MY_SAMPLE_TABLE
WHERE Task IN
(SELECT Task FROM MY_SAMPLE_TABLE
GROUP BY Task, Sig, Form, Description HAVING COUNT(*) > 1);
It returns any records with the same Task, unfortunately. And this is a table with tens of thousands of records.
One simple method is to use window functions:
select t.*
from (select t.*, count(*) over (partition by task, sig, form, description) as cnt
from my_sample_table t
) t
where cnt > 1;
If you actually want the locations to be different, you can use count(distinct):
select t.*
from (select t.*,
count(distinct location) over (partition by task, sig, form, description) as cnt
from my_sample_table t
) t
where cnt > 1;
If you want to treat NULL as a "different" value, then the logic is a little more complex.
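One possible way to treat NULL as its own "different" location is sketched below. Since COUNT(DISTINCT ...) skips NULLs, the query adds 1 whenever a group contains a NULL location. It uses a grouped subquery joined back to the table rather than a window function, and runs on SQLite with the question's sample rows; the engine choice and the name dup_ids are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_sample_table "
    "(id INT, task INT, sig INT, form INT, description TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO my_sample_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (255, 5000, 1, 1, "Record 1", None),
        (256, 5000, 1, 1, "Record 1", "000"),
        (257, 5001, 1, 1, "Record 2", "0T3"),
        (258, 5001, 1, 2, "Record 3", "0T3"),
        (259, 5002, 1, 1, "Record 4", "001"),
        (260, 5003, 1, 1, "Record 5", "001"),
    ],
)

dup_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.id
        FROM my_sample_table AS t
        JOIN (
            SELECT task, sig, form, description
            FROM my_sample_table
            GROUP BY task, sig, form, description
            -- COUNT(DISTINCT ...) skips NULLs, so add 1 when the group has one
            HAVING COUNT(DISTINCT location)
                 + MAX(CASE WHEN location IS NULL THEN 1 ELSE 0 END) > 1
        ) AS g
          ON  t.task = g.task
          AND t.sig = g.sig
          AND t.form = g.form
          AND t.description = g.description
        ORDER BY t.id
        """
    )
]
print(dup_ids)
```

On the sample data only records 255 and 256 come back: they agree on every column except Location, where one value is NULL and the other is 000.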
Presumably each record has a unique id.
Since you should have the id of the new record, just do a join:
SELECT IF(new.id = other.id, 'New', other.id) AS recordnum
, other.task
, other.sig
, other.form
, other.description
, other.location
FROM my_sample_table new
INNER JOIN my_sample_table other
ON new.task = other.task
AND new.sig = other.sig
AND new.form = other.form
AND new.description = other.description
-- AND new.id <> other.id -- optional to exclude the new record from the output
WHERE new.id = $THE_INSERTED_ID
If you don't want to do this on insert, but retrospectively,
SELECT task, sig,form, description, COUNT(*), GROUP_CONCAT(id), GROUP_CONCAT(location)
FROM my_sample_table
GROUP BY task, sig,form, description
HAVING COUNT(*)>1
and just use this as a subquery to get the row-level records:
SELECT r.*, dups.count_dups, dups.ids, dups.locns
FROM my_sample_table r
INNER JOIN (
SELECT task, sig, form, description, COUNT(*) AS count_dups,
GROUP_CONCAT(id) AS ids, GROUP_CONCAT(location) AS locns
FROM my_sample_table
GROUP BY task, sig, form, description
HAVING COUNT(*) > 1
) dups
ON r.task = dups.task
AND r.sig = dups.sig
AND r.form = dups.form
AND r.description = dups.description

Using aggregate function to return minimum value

Please help me create a query to determine the minimum Date_Time from the table below:
ID | Name | Date_Time | Location
---------------------------------------
001 | John | 01/01/2015 | 901
001 | john | 02/01/2015 | 903
001 | john | 05/01/2015 | 905
001 | john | 06/01/2015 | 904
002 | Jack | 01/01/2015 | 903
002 | Jack | 03/01/2015 | 904
002 | Jack | 04/01/2015 | 905
003 | Sam | 01/01/2015 | 904
003 | Sam | 03/01/2015 | 903
003 | Sam | 04/01/2015 | 901
003 | Sam | 06/01/2015 | 903
I tried this query:
SELECT ID, NAME, MIN(DATE_TIME), LOCATION
FROM TABLE
GROUP BY (ID)
but I got this error message:
ORA-00979: not a GROUP BY expression
If you use an aggregate function, you have to specify the fields over which the aggregation should be applied. That is what the GROUP BY clause is for. In this case you probably mean to find the minimum date_time for each id, name combination.
select id, name, min(date_time)
from my_table group by id, name
When you group by a key, all rows that share that key are collapsed into a single output row, so for each key you can return only one value per column.
The shortcut: anything in the GROUP BY can appear freely in the SELECT. Every other column has to be wrapped in an aggregate function.
When you group by id,
the key 001 has 4 rows clustered under it. Think about what would happen if you specified a non-grouped column in the SELECT: the database would not know which of the 4 values to return. With MIN(date), by contrast, the minimum of the 4 dates is taken.
So, your query has to be
SELECT ID,MIN(NAME),MIN(LOCATION),MIN(DATE)
FROM TABLE
GROUP BY ID
OR
SELECT ID,LOCATION,NAME,MIN(DATE)
FROM TABLE
GROUP BY ID,LOCATION,NAME
OR
Analytical approach.
SELECT ID,LOCATION,DATE,MIN(DATE) OVER(PARTITION BY ID ORDER BY NULL) AS MIN_DATE
FROM TABLE
Still, how the query should be rewritten depends on the requirements.
EDIT: To fetch the rows corresponding to the min date, we can use a self join like the one below.
SELECT T.ID,T.NAME,T.LOCATION,MIN_DATE
FROM
(
SELECT ID,MIN(DATE) AS MIN_DATE
FROM TABLE T1
GROUP BY ID
) AGG, TABLE T
WHERE T.ID = AGG.ID
AND T.DATE = AGG.MIN_DATE
OR
SELECT ID,NAME,LOCATION,MIN_DATE
FROM
(
SELECT ID,
NAME,
LOCATION,
MIN(DATE) OVER(PARTITION BY ID ORDER BY NULL) MIN_DATE,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY DATE) RNK
FROM TABLE
)
WHERE RNK = 1;
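For a concrete check of the "row with the earliest date per ID" idea, here is a small sketch using the join-on-MIN form in an in-memory SQLite database. The table name visits, the trimmed data set and the ISO date strings (which sort chronologically as text) are assumptions made for illustration; the question's real column types are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id TEXT, name TEXT, date_time TEXT, location INT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [
        ("001", "John", "2015-01-01", 901),
        ("001", "John", "2015-01-02", 903),
        ("002", "Jack", "2015-01-03", 904),
        ("002", "Jack", "2015-01-01", 903),
        ("003", "Sam", "2015-01-04", 901),
        ("003", "Sam", "2015-01-01", 904),
    ],
)

# Find the minimum date per id, then join back to pick up the matching row.
earliest = conn.execute(
    """
    SELECT v.id, v.name, v.date_time, v.location
    FROM visits AS v
    JOIN (
        SELECT id, MIN(date_time) AS min_date
        FROM visits
        GROUP BY id
    ) AS agg
      ON v.id = agg.id AND v.date_time = agg.min_date
    ORDER BY v.id
    """
).fetchall()
print(earliest)
```

Each ID comes back with the name and location from its earliest row. If two rows of one ID shared the same minimum date, the join would return both of them.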
Try grouping by all the other columns, and if the table is named 'table', try changing the table name to something else in the schema.
SELECT ID , NAME , MIN(DATE_TIME) , LOCATION FROM TABLE GROUP BY ID, Name, Location
select t1.name,t1.id,t1.location,t1.date from (select id,MIN(Date) as min_date from table group by id ) t2 inner join TABLE t1 on t1.date=t2.min_date and t1.id=t2.id;

Selecting unique records from database

Running this query,
select * from table;
Returns the following
|branch | number |
-------------------
| 1 | 123 |
| 1 | 001 |
| 2 | 123 |
| 3 | 123 |
| 4 | 123 |
| 1 | 123 |
| 1 | 789 |
| 2 | 123 |
| 3 | 123 |
| 4 | 009 |
I want to find values that are unique to ONLY branch 1
| 1 | 001 |
| 1 | 789 |
Can this be done without the data being stored in separate tables? I've tried a few "select distinct" queries & don't seem to get the results I'm expecting.
SELECT branch, number
FROM table
WHERE branch = 1
GROUP BY branch, number
If you do not need any aggregates, you can use distinct instead of group by:
select distinct branch
, number
from YourTable
where branch = 1
I guess what I'm trying to say is that I want to find all numbers that are unique to ONLY branch 1. If they are found in any other branch, I don't want to see them.
I guess this is what you want.
SELECT distinct number
FROM MyTable
WHERE branch=1 and number not in
( SELECT distinct number
FROM MyTable
WHERE branch != 1 )
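The NOT IN approach can be tried on the question's ten rows with the sketch below. It assumes an in-memory SQLite database, a hypothetical table name branch_numbers, and numbers stored as text so that leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch_numbers (branch INT, number TEXT)")
conn.executemany(
    "INSERT INTO branch_numbers VALUES (?, ?)",
    [
        (1, "123"), (1, "001"), (2, "123"), (3, "123"), (4, "123"),
        (1, "123"), (1, "789"), (2, "123"), (3, "123"), (4, "009"),
    ],
)

# Numbers seen in branch 1 that never appear in any other branch.
only_branch_1 = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT number
        FROM branch_numbers
        WHERE branch = 1
          AND number NOT IN (SELECT number FROM branch_numbers WHERE branch <> 1)
        ORDER BY number
        """
    )
]
print(only_branch_1)
```

One caveat: if the subquery can return a NULL number, NOT IN matches nothing, so a NOT EXISTS form may be the safer choice on nullable columns.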
Try this:
SELECT branch, number
FROM table
GROUP BY branch, number
If you want to limit it to only branch 1, then just add a where clause.
SELECT branch, number
FROM table
WHERE branch = 1
GROUP BY branch, number
To select all values that are unique in column number and have a branch value of 1 you can use the following code:
SELECT branch, number
FROM table1
WHERE number IN (
SELECT number
FROM table1
GROUP BY number
HAVING (COUNT(number ) = 1)
)
AND branch = 1
For a demo see http://sqlfiddle.com/#!2/97145/62

How to use count and group by at the same select statement

I have an SQL SELECT query that also uses a GROUP BY,
I want to count all the records after the GROUP BY clause filtered the resultset.
Is there any way to do this directly with SQL? For example, if I have the table users and want to select the different towns and the total number of users:
SELECT `town`, COUNT(*)
FROM `user`
GROUP BY `town`;
I want to have a column with all the towns and another with the number of users in all rows.
An example of the result for having 3 towns and 58 users in total is:
| Town | Count |
| Copenhagen | 58 |
| New York | 58 |
| Athens | 58 |
This will do what you want (list of towns, with the number of users in each):
SELECT `town`, COUNT(`town`)
FROM `user`
GROUP BY `town`;
You can use most aggregate functions when using a GROUP BY statement
(COUNT, MAX, COUNT DISTINCT etc.)
Update:
You can declare a variable for the number of users and save the result there, and then SELECT the value of the variable:
-- MySQL user variables need no DECLARE; the subquery must be parenthesized
SET @numOfUsers = (SELECT COUNT(*) FROM `user`);
SELECT DISTINCT `town`, @numOfUsers FROM `user`;
You can use COUNT(DISTINCT ...) :
SELECT COUNT(DISTINCT town)
FROM user
The other way is:
/* Number of rows in a derived table called d1. */
select count(*) from
(
/* Number of times each town appears in user. */
select town, count(*)
from user
group by town
) d1
Ten non-deleted answers; most do not do what the user asked for. Most answers misread the question, assuming that there are 58 users in each town instead of 58 in total. Even the few that are correct are not optimal.
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
SELECT province, total_cities
FROM ( SELECT DISTINCT province FROM canada ) AS provinces
CROSS JOIN ( SELECT COUNT(*) total_cities FROM canada ) AS tot;
+---------------------------+--------------+
| province | total_cities |
+---------------------------+--------------+
| Alberta | 5484 |
| British Columbia | 5484 |
| Manitoba | 5484 |
| New Brunswick | 5484 |
| Newfoundland and Labrador | 5484 |
| Northwest Territories | 5484 |
| Nova Scotia | 5484 |
| Nunavut | 5484 |
| Ontario | 5484 |
| Prince Edward Island | 5484 |
| Quebec | 5484 |
| Saskatchewan | 5484 |
| Yukon | 5484 |
+---------------------------+--------------+
13 rows in set (0.01 sec)
SHOW session status LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 4 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 3 |
| Handler_read_key | 16 |
| Handler_read_last | 1 |
| Handler_read_next | 5484 | -- One table scan to get COUNT(*)
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 15 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 14 | -- leapfrog through index to find provinces
+----------------------------+-------+
In the OP's context:
SELECT town, total_users
FROM ( SELECT DISTINCT town FROM `user` ) AS towns
CROSS JOIN ( SELECT COUNT(*) total_users FROM `user` ) AS tot;
Since there is only one row from tot, the CROSS JOIN is not as voluminous as it might otherwise be.
The usual pattern is COUNT(*) instead of COUNT(town). The latter implies checking town for being not null, which is unnecessary in this context.
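Both points (the one-row CROSS JOIN and the difference between COUNT(*) and COUNT(town)) can be seen in a small sketch. The five rows, the NULL town and the use of SQLite are assumptions made for illustration, and the IS NOT NULL filter is added here only so the NULL town is not listed as a town.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (name TEXT, town TEXT)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [
        ("ann", "Copenhagen"),
        ("bob", "Copenhagen"),
        ("cy", "New York"),
        ("dee", "Athens"),
        ("eve", None),  # user with no town recorded
    ],
)

# Every distinct town paired with the single-row total.
towns_with_total = conn.execute(
    """
    SELECT town, total_users
    FROM (SELECT DISTINCT town FROM user WHERE town IS NOT NULL) AS towns
    CROSS JOIN (SELECT COUNT(*) AS total_users FROM user) AS tot
    ORDER BY town
    """
).fetchall()

# COUNT(*) counts rows; COUNT(town) skips rows where town is NULL.
counts = conn.execute("SELECT COUNT(*), COUNT(town) FROM user").fetchone()

print(towns_with_total)
print(counts)
```

Because tot has exactly one row, the cross join just appends the same total to each town. The second query shows why COUNT(*) is the safer default when the goal is to count users rather than non-NULL towns.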
With Oracle you could use analytic functions:
select town, count(town), sum(count(town)) over () total_count from user
group by town
Your other option is to use a subquery:
select town, count(town), (select count(town) from user) as total_count from user
group by town
If you want to order by the count (it sounds simple, but I couldn't find an answer on Stack Overflow showing how to do it), you can do:
SELECT town, count(town) as total FROM user
GROUP BY town ORDER BY total DESC
You can use DISTINCT inside the COUNT, like milkovsky said.
In my case:
select COUNT(distinct user_id) from answers_votes where answer_id in (694,695);
This counts the answer votes, treating all votes from the same user_id as one.
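To make the effect of DISTINCT concrete, the sketch below compares COUNT(*) with COUNT(DISTINCT user_id) on a handful of made-up votes. The rows and the use of SQLite are assumptions; only the table and column names come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers_votes (answer_id INT, user_id INT)")
conn.executemany(
    "INSERT INTO answers_votes VALUES (?, ?)",
    [(694, 1), (694, 2), (695, 1), (695, 3), (696, 4)],
)

# User 1 voted on both answers: counted twice by COUNT(*), once by DISTINCT.
total_votes, distinct_voters = conn.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT user_id)
    FROM answers_votes
    WHERE answer_id IN (694, 695)
    """
).fetchone()
print(total_votes, distinct_voters)
```

Four votes match the filter, but they come from three different users, so the DISTINCT count is lower.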
I know this is an old post, in SQL Server:
select isnull(town,'TOTAL') Town, count(*) cnt
from user
group by town WITH ROLLUP
| Town | cnt |
| Copenhagen | 58 |
| NewYork | 58 |
| Athens | 58 |
| TOTAL | 174 |
If you want to select town and total user count, you can use this query below:
SELECT Town, (SELECT Count(*) FROM User) `Count` FROM user GROUP BY Town;
If you want to select all columns together with a count, try this:
select a.*, (Select count(b.name) from table_name as b where Condition) as totCount from table_name as a where Condition
Try the following code:
select ccode, count(empno)
from company_details
group by ccode;