SQL Server aggregate functions - how to? - sql

The input table contains two columns, name and dept:
+------+------+
| name | dept |
+------+------+
| A | 123 |
| B | 456 |
| A | 789 |
| C | 123 |
| A | 456 |
| B | 789 |
+------+------+
The expected output is
name
-----
A
Here A works in all 3 depts (123, 456, 789). How can I retrieve the names of people who work in all 3 depts?

This might help you.
SELECT NAME
FROM TABLE1
GROUP BY NAME
HAVING COUNT(DISTINCT DEPT) =
(
SELECT COUNT(DISTINCT DEPT)
FROM TABLE1
)
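To check the idea on the sample data, here is a minimal sketch that runs the same HAVING comparison against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here (an assumption made for the sake of a runnable example); the table name table1 follows the answer above.

```python
import sqlite3

# In-memory copy of the question's sample table (SQLite stands in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, dept INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("A", 123), ("B", 456), ("A", 789), ("C", 123), ("A", 456), ("B", 789)],
)

# Keep only names whose distinct-dept count equals the table-wide distinct-dept count.
query = """
    SELECT name
    FROM table1
    GROUP BY name
    HAVING COUNT(DISTINCT dept) = (SELECT COUNT(DISTINCT dept) FROM table1)
"""
names_in_all_depts = [row[0] for row in conn.execute(query)]
print(names_in_all_depts)
```

Only A is printed, since B covers two of the three depts and C covers one.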

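Here's one option using a derived table. SQL Server does not allow DISTINCT in a windowed aggregate such as count(distinct dept) over (), so the overall count comes from a scalar subquery:
select name
from (
select name, count(distinct dept) cnt,
(select count(distinct dept) from yourtable) overallcnt
from yourtable
group by name
) t
where cnt = overallcnt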

Try this:
SELECT NAME
FROM TABLE1
GROUP BY NAME
HAVING COUNT(DISTINCT DEPT)=(SELECT COUNT(DISTINCT DEPT) FROM TABLE1 )

Related

SQL - SUM for the max ID

I have a table like this,
| id | name | subtask | maintask |
|----|------|---------|----------|
| 1 | t1 | 11 | 20 |
| 1 | t1 | 12 | 20 |
| 1 | t1 | 1 | 30 |
| 2 | t1 | 2 | 20 |
| 2 | t1 | 2 | 20 |
I want to prepare a result like this
| id | name | sum_of_subtask | sum_of_maintask | diff |
|----|------|----------------|-----------------|------|
| 2 | t1 | 4 | 40 | 36 |
I need to pick the max ID, then sum subtask and maintask for that ID; the last column is the difference between sum(subtask) and sum(maintask).
I tried the query below, but it calculates the sums over all rows rather than only the rows with the max ID.
select max(id), name, sum(subtask),sum(maintask),sum(subtask-maintask) from tbl
group by name
Do you just want one row? If so, use order by and limit:
select id, name, sum(subtask), sum(maintask), sum(subtask-maintask)
from tbl
group by id, name
order by id desc
limit 1;
If your data is large, it might be more efficient to filter before aggregating:
select id, name, sum(subtask), sum(maintask), sum(subtask-maintask)
from tbl
where id = (select max(id) from tbl)
group by id, name;
If you want the maximum id per name, then the filtering logic is:
select id, name, sum(subtask), sum(maintask), sum(subtask-maintask)
from tbl t
where t.id = (select max(t2.id) from tbl t2 where t2.name = t.name)
group by id, name;
Please use the query below:
select id, name, sum(subtask), sum(maintask), sum(subtask) - sum(maintask)
from tbl
where id in
(select max(id) from tbl)
group by id, name;
select id, name, sum(subtask), sum(maintask), sum(subtask-maintask)
from tbl
where id = (select max(id) from tbl)
group by id, name
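As a quick check of the filter-before-aggregate approach, here is a small sketch on the question's sample rows, run through SQLite from Python (SQLite is an assumption; the question does not name its database). The diff column is written as SUM(maintask) - SUM(subtask) because that is the direction that yields the 36 shown in the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, name TEXT, subtask INTEGER, maintask INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, "t1", 11, 20),
        (1, "t1", 12, 20),
        (1, "t1", 1, 30),
        (2, "t1", 2, 20),
        (2, "t1", 2, 20),
    ],
)

# Restrict to the rows carrying the maximum id first, then aggregate them.
query = """
    SELECT id, name,
           SUM(subtask)  AS sum_of_subtask,
           SUM(maintask) AS sum_of_maintask,
           SUM(maintask) - SUM(subtask) AS diff
    FROM tbl
    WHERE id = (SELECT MAX(id) FROM tbl)
    GROUP BY id, name
"""
max_id_rows = conn.execute(query).fetchall()
print(max_id_rows)
```

This yields the single row (2, 't1', 4, 40, 36), matching the desired result.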

How to handle duplicates created by LEFT JOIN

LEFT TABLE:
+------+---------+--------+
| Name | Surname | Salary |
+------+---------+--------+
| Foo | Bar | 100 |
| Foo | Kar | 300 |
| Fo | Ba | 35 |
+------+---------+--------+
RIGHT TABLE:
+------+-------+
| Name | Bonus |
+------+-------+
| Foo | 10 |
| Foo | 20 |
| Foo | 50 |
| Fo | 10 |
| Fo | 100 |
| F | 1000 |
+------+-------+
DESIRED OUTPUT:
+------+---------+--------+-------+
| Name | Surname | Salary | Bonus |
+------+---------+--------+-------+
| Foo | Bar | 100 | 80 |
| Foo | Kar | 300 | 0 |
| Fo | Ba | 35 | 110 |
+------+---------+--------+-------+
The closest I get is this:
SELECT
a.Name,
Surname,
sum(Salary),
sum(Bonus)
FROM (SELECT
Name,
Surname,
sum(Salary) as Salary
FROM input
GROUP BY 1,2) a LEFT JOIN (SELECT Name,
SUM(Bonus) as Bonus
FROM input2
GROUP BY 1) b
ON a.Name = b.Name
GROUP BY 1,2;
Which gives:
+------+---------+-------------+------------+
| Name | Surname | sum(Salary) | sum(Bonus) |
+------+---------+-------------+------------+
| Fo | Ba | 35 | 110 |
| Foo | Bar | 100 | 80 |
| Foo | Kar | 300 | 80 |
+------+---------+-------------+------------+
I can't figure out how to get rid of Bonus duplication. Ideal solution for me would be as specified in the 'DESIRED OUTPUT', which is adding Bonus to only one Name and for other records with the same Name adding 0.
You can use row_number():
select l.*, (case when l.seqnum = 1 then r.bonus else 0 end) as bonus
from (select l.*, row_number() over (partition by name order by salary) as seqnum
from "left" l
) l left join
(select r.name, sum(bonus) as bonus
from "right" r
group by r.name
) r
on r.name = l.name
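The row_number() idea can be tried out with the sketch below, which loads the two sample tables into SQLite from Python. The table names employees and bonuses are hypothetical stand-ins for "left" and "right", COALESCE is added so that a name without any bonus rows would show 0 rather than NULL, and window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names standing in for the question's left and right tables.
conn.execute("CREATE TABLE employees (name TEXT, surname TEXT, salary INTEGER)")
conn.execute("CREATE TABLE bonuses (name TEXT, bonus INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Foo", "Bar", 100), ("Foo", "Kar", 300), ("Fo", "Ba", 35)],
)
conn.executemany(
    "INSERT INTO bonuses VALUES (?, ?)",
    [("Foo", 10), ("Foo", 20), ("Foo", 50), ("Fo", 10), ("Fo", 100), ("F", 1000)],
)

# Number the rows within each name; only row 1 receives the summed bonus.
query = """
    SELECT e.name, e.surname, e.salary,
           CASE WHEN e.seqnum = 1 THEN COALESCE(b.bonus, 0) ELSE 0 END AS bonus
    FROM (
        SELECT name, surname, salary,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY salary) AS seqnum
        FROM employees
    ) e
    LEFT JOIN (
        SELECT name, SUM(bonus) AS bonus
        FROM bonuses
        GROUP BY name
    ) b ON b.name = e.name
    ORDER BY e.name DESC, e.surname
"""
bonus_rows = conn.execute(query).fetchall()
for row in bonus_rows:
    print(row)
```

Because the rows are numbered by salary, the lowest-paid person for each name (Bar for Foo) is the one who receives the bonus; ordering by a different column would move it to a different row.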
Try a ROW_NUMBER partitioned by Name. This gives each of the duplicate rows for a name a different number. You can then return the bonus when this number is 1 and 0 otherwise. The code can look something like this:
SELECT
a.Name,
a.Surname,
a.Salary,
CASE WHEN a.Duplicate_Order = 1
THEN COALESCE(b.Bonus, 0)
ELSE 0
END AS Bonus
FROM (SELECT
Name,
Surname,
SUM(Salary) AS Salary,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Surname) AS Duplicate_Order
FROM input
GROUP BY Name, Surname) a
LEFT JOIN (SELECT Name,
SUM(Bonus) AS Bonus
FROM input2
GROUP BY Name) b
ON a.Name = b.Name;
Hope that helps!
Hope that helps!
You can use a correlated subquery with sum() to compute the total bonus per name, and then subtract lag() of that bonus (with a default of 0) within each name, so that every row after the first one for a name gets 0:
select Name, Surname, Salary,
bonus - lag(bonus::int,1,0) over (partition by name order by salary) as bonus
from
(
select i1.*,
( select sum(Bonus)
from input2 i2
where i1.Name = i2.Name
group by i2.Name ) as bonus
from input i1
) ii
order by name desc, surname;

SQL - SELECT duplicates between IDs, but not show records if duplicates occur for same ID

I have the following table (simplified from the real table) at the moment:
+----+-------+-------+
| ID | Name | Phone |
+----+-------+-------+
| 1 | Tom | 123 |
| 1 | Tom | 123 |
| 1 | Tom | 123 |
| 2 | Mark | 321 |
| 2 | Mark | 321 |
| 3 | Kate | 321 |
+----+-------+-------+
My desired output in the SELECT statement is:
+----+------+-------+
| ID | Name | Phone |
+----+------+-------+
| 2 | Mark | 321 |
| 3 | Kate | 321 |
+----+------+-------+
I want to select duplicates only when they occur between two different IDs (like Mark and Kate sharing the same phone number), but not to show any records for IDs that share the same phone number with themselves only (like Tom).
Could someone advise how this can be achieved?
You can use an EXISTS condition with a correlated subquery to ensure that another record exists that has the same phone and a different id. We also need DISTINCT to remove the duplicates in the resultset.
SELECT DISTINCT id, name, phone
FROM mytable t
WHERE EXISTS (
SELECT 1
FROM mytable t1
WHERE t1.phone = t.phone AND t1.id <> t.id
)
Demo on DB Fiddle:
| id | name | phone |
| --- | ---- | ----- |
| 2 | Mark | 321 |
| 3 | Kate | 321 |
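The EXISTS approach can be reproduced locally with a short sketch like the following, which runs the same query on the sample rows in an in-memory SQLite database from Python (SQLite is assumed here; an ORDER BY is added only to make the output order predictable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, phone INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "Tom", 123),
        (1, "Tom", 123),
        (1, "Tom", 123),
        (2, "Mark", 321),
        (2, "Mark", 321),
        (3, "Kate", 321),
    ],
)

# A row qualifies only if some row with a different id shares its phone.
query = """
    SELECT DISTINCT id, name, phone
    FROM mytable t
    WHERE EXISTS (
        SELECT 1
        FROM mytable t1
        WHERE t1.phone = t.phone AND t1.id <> t.id
    )
    ORDER BY id
"""
shared_phone_rows = conn.execute(query).fetchall()
print(shared_phone_rows)
```

Tom is excluded because every row with phone 123 has the same id.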
You can use window functions for this:
select t.*
from (select t.*,
row_number() over (partition by phone, name order by id) as seqnum,
min(id) over (partition by phone) as min_id,
max(id) over (partition by phone) as max_id
from t
) t
where seqnum = 1 and min_id <> max_id;
Another method uses aggregation and a window function:
select phone, name, id
from (select phone, name, id,
count(*) over (partition by phone) as num_ids
from t
group by phone, name, id
) pn
where num_ids > 1;
Both of these have the advantage over the exists solution (GMB's) that they refer to the "table" only once. That can be a big advantage if the table is a complex view or query. If performance is an issue, I would encourage you to test several variants to see which works best.
You can also use a correlated subquery with GROUP BY and HAVING, as below:
SELECT id, name, MAX(phone) AS phone
FROM mytable t
GROUP BY id, name
HAVING 1 = MAX(
CASE
WHEN phone IN (SELECT phone FROM mytable t2 WHERE t2.id <> t.id) THEN 1 ELSE 0
END)

SQL: Select only one row of table with same value

I'm a bit new to SQL, and for my project I need to do some database sorting and filtering.
Let's assume my database looks like this:
==========================================
| id | email | name
==========================================
| 1 | 123#test.com | John
| 2 | 234#test.com | Peter
| 3 | 234#test.com | Steward
| 4 | 123#test.com | Ethan
| 5 | 542#test.com | Bob
| 6 | 123#test.com | Patrick
==========================================
What should I do so that only the last row for each email is returned:
==========================================
| id | email | name
==========================================
| 3 | 234#test.com | Steward
| 5 | 542#test.com | Bob
| 6 | 123#test.com | Patrick
==========================================
Thanks in advance!
SQL Query:
SELECT * FROM test.test1 WHERE id IN (
SELECT MAX(id) FROM test.test1 GROUP BY email
);
Hope this solves your problem. Thanks.
A generic way to do this in SQL is to use the ANSI standard row_number() function:
select t.*
from (select t.*, row_number() over (partition by email order by id desc) as seqnum
from t
) t
where seqnum = 1;
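If you only need a single row overall (the one with the greatest email) rather than one row per email, you can sort and limit: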
SELECT *
FROM table
ORDER BY email DESC
LIMIT 1;
You can use following query to get the MAX id value per email:
SELECT email, MAX(id)
FROM mytable
GROUP BY email
Using the above query as a derived table you can obtain the whole record:
SELECT t1.*
FROM mytable AS t1
JOIN (
SELECT email, MAX(id) AS id
FROM mytable
GROUP BY email
) AS t2 ON t1.id = t2.id
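The derived-table join can be checked against the question's sample data with a sketch like this one, run through SQLite from Python (SQLite is an assumption; the table name mytable follows the answer above). An ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "123#test.com", "John"),
        (2, "234#test.com", "Peter"),
        (3, "234#test.com", "Steward"),
        (4, "123#test.com", "Ethan"),
        (5, "542#test.com", "Bob"),
        (6, "123#test.com", "Patrick"),
    ],
)

# Find the highest id per email, then join back to fetch the full rows.
query = """
    SELECT t1.id, t1.email, t1.name
    FROM mytable AS t1
    JOIN (
        SELECT email, MAX(id) AS id
        FROM mytable
        GROUP BY email
    ) AS t2 ON t1.id = t2.id
    ORDER BY t1.id
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

Joining on id alone is enough here because id is unique in the sample data; if it were not, the join would also need to compare email.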

SQL Query needs to match similar records

I have a very large table of contacts, and I am building an interface to help my client de-dupe it. Here is an example of the table content:
id | firstname | lastname | email | address1 | addres2 | verifiedAt |
1 | James | johnson | james#test.com | | | |
2 | David | bloggs | james#bloggs.com | | | |
3 | John | nobel | james#nobel.com | | | |
4 | Terry | jacket | james#jacket.com | | | 05/05/2013 |
5 | James | johnson | james#johnson.com| | | |
6 | James | privett | james#test.com | | | |
I need to write a query that will return the first contact that has another contact in the same table where either the email addresses match or the firstname + lastname match.
Is this possible in a single query?
Thanks in advance
Try this (SQL Fiddle).
SELECT DISTINCT *
FROM
( SELECT
MIN(id) as [id]
FROM mytable
GROUP BY email
HAVING COUNT(*) > 1
UNION ALL
SELECT
MIN(id) as [id]
FROM mytable
GROUP BY firstName,lastName
HAVING Count(*) > 1 )dups
JOIN myTable t
ON t.Id = dups.id
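A rough way to test the MIN(id) approach is the sketch below, which loads the sample contacts into SQLite from Python. It is a variant rather than the exact query above: UNION replaces UNION ALL plus DISTINCT, the SQL Server style [id] brackets are dropped, and the empty address and verifiedAt columns are left out for brevity. SQLite itself is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, firstname TEXT, lastname TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "James", "johnson", "james#test.com"),
        (2, "David", "bloggs", "james#bloggs.com"),
        (3, "John", "nobel", "james#nobel.com"),
        (4, "Terry", "jacket", "james#jacket.com"),
        (5, "James", "johnson", "james#johnson.com"),
        (6, "James", "privett", "james#test.com"),
    ],
)

# Lowest id of every duplicated email, plus lowest id of every duplicated
# firstname + lastname pair; UNION collapses ids found by both rules.
query = """
    SELECT t.id, t.firstname, t.lastname, t.email
    FROM (
        SELECT MIN(id) AS id FROM mytable
        GROUP BY email HAVING COUNT(*) > 1
        UNION
        SELECT MIN(id) AS id FROM mytable
        GROUP BY firstname, lastname HAVING COUNT(*) > 1
    ) dups
    JOIN mytable t ON t.id = dups.id
    ORDER BY t.id
"""
first_duplicates = conn.execute(query).fetchall()
print(first_duplicates)
```

On this sample, contact 1 is the first row of both the duplicated email (shared with contact 6) and the duplicated name (shared with contact 5), so it is returned once.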
This works (SQLFiddle DEMO):
SELECT a.* FROM mytable a
JOIN (
SELECT email
FROM mytable
GROUP BY email
HAVING count(*) > 1
) b ON a.email = b.email
UNION
SELECT a.* FROM mytable a
JOIN (
SELECT firstname, lastname
FROM mytable
GROUP BY firstname, lastname
HAVING count(*) > 1
) b ON a.firstname = b.firstname AND a.lastname = b.lastname
To make sure that this query works fast, be sure to have at least following indexes:
CREATE INDEX i1 ON mytable(email);
CREATE INDEX i2 ON mytable(firstname, lastname);
One method:
with cte as
(select c.*,
row_number() over (partition by email order by id) rnem,
count(*) over (partition by email) ctem,
row_number() over (partition by firstname, lastname order by id) rnfl,
count(*) over (partition by firstname, lastname) ctfl
from contacts c)
select * from cte
where (ctem > 1 and rnem = 1) or (ctfl > 1 and rnfl = 1)