SQL Server: GROUP BY with multiple columns produces duplicate results - sql

I'm trying to include a 3rd column into my existing SQL Server query but I am getting duplicate result values.
Here is an example of the data contained in tb_IssuedPermits:
| EmployeeName | Current |
|--------------|---------|
| Person A | 0 |
| Person A | 0 |
| Person B | 1 |
| Person C | 0 |
| Person B | 0 |
| Person A | 1 |
This is my current query which produces duplicate values based on 1 or 0 bit values.
SELECT EmployeeName, COUNT(*) AS Count, [Current]
FROM tb_IssuedPermits
GROUP BY EmployeeName, [Current]
| EmployeeName | Count | Current |
|--------------|-------|---------|
| Person A | 2 | 0 |
| Person B | 1 | 0 |
| Person C | 1 | 0 |
| Person A | 1 | 1 |
| Person B | 1 | 1 |
Any ideas on how I can amend my query to have the following expected result? I want one result row per EmployeeName. And Current shall be 1, if for the EmployeeName exists a row with Current = 1, else it shall be 0.
| EmployeeName | Count | Current |
|--------------|-------|---------|
| Person A | 3 | 1 |
| Person B | 2 | 1 |
| Person C | 1 | 0 |
The result does not need to be in any specific order.
TIA

If your Current column contains the string values 'FALSE' and 'TRUE' you can do this
SELECT EmployeeName, Count(*) AS Count,
MAX([Current]) AS Current
FROM tb_IssuedPermits
GROUP BY EmployeeName
It's a hack but it works: MAX will get the TRUE from each group if there is one.
If your Current column is a BIT, cast to INT and cast back, as #ThorstenKettner suggested.
SELECT EmployeeName,
Count(*) AS Count,
CAST(MAX(CAST([Current] AS INT)) AS BIT) AS Current
FROM tb_IssuedPermits
GROUP BY EmployeeName
Alternatively, you can use conditional aggregation:
SELECT EmployeeName,
Count(*) AS Count,
CAST(COUNT(NULLIF(Current, 0)) AS BIT) AS Current
FROM tb_IssuedPermits
GROUP BY EmployeeName

you can do like this
SELECT EmployeeName, Count(1) AS Count,SUM(CAST([Current]AS INT)) AS Current FROM tb_IssuedPermits GROUP BY EmployeeName

Related

Get some values from the table by selecting

I have a table:
| id | Number |Address
| -----| ------------|-----------
| 1 | 0 | NULL
| 1 | 1 | NULL
| 1 | 2 | 50
| 1 | 3 | NULL
| 2 | 0 | 10
| 3 | 1 | 30
| 3 | 2 | 20
| 3 | 3 | 20
| 4 | 0 | 75
| 4 | 1 | 22
| 4 | 2 | 30
| 5 | 0 | NULL
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select not correctly handling the case when the address first changed, and then changed to the previous value. For example id=1: group by return:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
These values are the same only when the value is 1, which is the most recent address in the data. Then aggregation pulls back the earliest row in this group.
I answered my question myself, if anyone needs it, my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);

SQL group by a field and only return one joined row for each grouping

Table data
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 2 | 7 August | cat | Y |
| 3 | 10 August | cat | Z |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.
For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.
Expected result:
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 2 | 7 August | cat | Y |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
I have something that returns 1,2 and 4 but I'm not sure how to only pick one from the first group to get the desired result. I'm joining the grouping with the data table so that I can get the ID and another_field back after the grouping.
SELECT d.id, d.name, d.required_by, d.another_field
FROM
(
SELECT min(required_by) as min_date, name
FROM data
GROUP BY name
) agg
INNER JOIN
data d
on d.required_by = agg.min_date AND d.name = agg.name
This is typically solved using window functions:
select d.id, d.name, d.required_by, d.another_field
from (
select id, name, required_by, another_field,
row_number() over (partition by name order by required_by) as rn
from data
) d
where d.rn = 1;
In Postgres using distinct on() is typically faster:
select distinct on (name) *
from data
order by name, required_by
Online example
SELECT [id]
,[date]
,[name]
FROM [test].[dbo].[data]
WHERE date IN (SELECT min(date) FROM data GROUP BY name)
enter image description here

SQL - Count the occurence of one column value in another

I have a table as shown below. Record key is unique per row
ID RECORD_KEY CONCAT_REJECT CONCAT_SUB
1 A34785 A123 23
1 B23845 R384 A123
1 H38959 Y345 A123
Expected Result
ID CONCAT_REJECT COUNT_REJECT_IN_SUB
1 A123 2
1 R384 0
1 Y345 0
How do I perform this count? I tried using COUNT(CONCAT_REJECT) over (PARTITION BY CONCAT_SUB). But it's not giving the desired result
Count concat_sub in the inner query and then do a left join with concat_reject to get final result. Here is the demo.
select
id,
concat_reject,
coalesce(total, 0) as count_reject_in_sub
from myTable m
left join(
select
concat_sub,
count(*) as total
from myTable
group by
concat_sub
) m1
on m.concat_reject = m1.concat_sub
output:
| id | concat_reject | count_reject_in_sub |
| --- | ------------- | ------------------- |
| 1 | A123 | 2 |
| 1 | R384 | 0 |
| 1 | Y345 | 0 |

Identify the counts of each distinct value in one column in sqlite3

I am trying to identify the count of each distinct value in one column (name) in a table called brgy.
---------------------
| ID | name |
---------------------
| 1 | Alfonso |
| 2 | Arakan |
| 3 | Poblacion |
| 4 | Ilaya |
| 5 | Poblacion |
----------------------
I tried using this code but it keeps giving the COUNT as 1 despite Poblacion appearing twice in the name column:
SELECT name,COUNT(name) AS distinct_name
FROM (
SELECT DISTINCT name
FROM brgy
GROUP BY name
)
GROUP BY name;
The intended output should eliminate duplicate names but sum up the number of times the distinct name appears in the name column:
Expected Output is as below,
-----------------------------
| name | distinct_name |
-----------------------------
| Alfonso | 1 |
| Arakan | 1 |
| Ilaya | 1 |
| Poblacion | 2 |
-----------------------------
A simple GROUP BY name:
SELECT name, COUNT(name) AS distinct_name
FROM brgy
GROUP BY name;
you don't need the subquery:
SELECT DISTINCT name FROM brgy GROUP BY name
because GROUP BY name takes care of it.
Just removed the subquery because it will give you unique entry :
select name, count(*) as distinct_name
from brgy
group by name
order by distinct_name, name;

SQL : Getting duplicate rows along with other variables

I am working on Terradata SQL. I would like to get the duplicate fields with their count and other variables as well. I can only find ways to get the count, but not exactly the variables as well.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID and to retrieve the duplicate rows as well.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1