How to use window functions in sql to bring distinct values?

I have this query to return a company name along with its top 5 contact names and top 5 phone numbers.
It works fine when I return only contacts or only phones, but when I try to return both, the rows are no longer distinct (i.e. there is more than one row for each company).
I think it has something to do with the partitions, but I have no idea what.
Can someone please help me to:
Fix this query.
Understand what the fix means.
Query:
select
p.company_name,
p.Contact_1, p.Contact_2, p.Contact_3, p.Contact_4, p.Contact_5,
p.Phone_1, p.Phone_2, p.Phone_3, p.Phone_4, p.Phone_5
from
(
select contact.first_name + ' ' + contact.last_name as contact_name,
phone.display_phone,
company.company_name,
'Contact_'+
cast(row_number() over(partition by relation.company_id
order by contact.first_name, contact.last_name) as varchar(50)) row,
'Phone_'+
cast(row_number() over(partition by phone.contact_id
order by phone.display_phone) as varchar(50)) row2
from contacts company
left join contact_company_relation_additional_information relation
on company.id = relation.company_id and relation.ContactCompanyRelation_IsActive = 1
left join contacts contact
on relation.contact_id = contact.id and contact.is_company = 0 and contact.is_active = 1
left join contact_phones phone on company.id = phone.contact_id and phone.is_active = 1
where company.is_company = 1 and company.is_active = 1
) d
pivot
(
max(contact_name)
for row in (Contact_1, Contact_2, Contact_3, Contact_4, Contact_5)
) x
pivot
(
max(display_phone)
for row2 in (Phone_1, Phone_2, Phone_3, Phone_4, Phone_5)
) p
Here is a link to sql fiddle with the duplicated rows: Contacts and Phones
Here are links to the queries with only contacts or only phones that bring one row for each company:
Contacts only
Phones only

To get the result you want, I would suggest a slightly different approach. Since you want to pivot on two columns of data, Contacts and Phone, I would first unpivot these columns into multiple rows and then apply the PIVOT. I think that is easier than trying to apply PIVOT twice.
I see a few things that I would fix in your current query. The main part of your query, which joins all of the tables, needs a couple of changes. First, I would only create one row column:
row_number() over(partition by relation.company_id
order by contact.first_name, contact.last_name) row
This column is partitioned by the company_id in the contact_company_relation table. This new row number will be used for both the Contact and the Phone number columns.
Second, your current join to return the Phone number appears to be incorrect. Your current code is using the main company id but you want to join on each contact. Change your code from:
left join contact_phones phone
on company.id = phone.contact_id
to:
left join contact_phones phone
on contact.id = phone.contact_id
This will make your subquery:
select
contact.first_name + ' ' + contact.last_name as contact_name,
phone.display_phone,
company.company_name,
row_number() over(partition by relation.company_id
order by contact.first_name, contact.last_name) row
from contacts company
left join contact_company_relation relation
on company.id = relation.company_id
left join contacts contact
on relation.contact_id = contact.id
and contact.is_company = 0
left join contact_phones phone
on contact.id = phone.contact_id -- change to join on contact
where company.is_company = 1;
See SQL Fiddle with Demo. The data will now look like:
| CONTACT_NAME | DISPLAY_PHONE | COMPANY_NAME | ROW |
|--------------|---------------|--------------|-----|
| Ben Gurion | 2222222 | Analist | 1 |
| Ofer Jerus | 3333333 | Analist | 2 |
| Ori Reshef | 1111111 | Analist | 3 |
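To try the numbering step without SQL Server, here is a minimal sketch using Python's built-in sqlite3 module (this needs SQLite 3.25 or later for window functions). The simplified schema and the sample rows are assumptions chosen to mirror the table above, not the original fiddle; the point is only that joining phones on the contact and using a single ROW_NUMBER per company yields one numbered row per contact/phone pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified schema: companies and people share one table.
CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                       company_name TEXT, is_company INTEGER);
CREATE TABLE contact_company_relation (company_id INTEGER, contact_id INTEGER);
CREATE TABLE contact_phones (contact_id INTEGER, display_phone TEXT);
INSERT INTO contacts VALUES
  (1, NULL, NULL, 'Analist', 1),
  (2, 'Ben', 'Gurion', NULL, 0),
  (3, 'Ofer', 'Jerus', NULL, 0),
  (4, 'Ori', 'Reshef', NULL, 0);
INSERT INTO contact_company_relation VALUES (1, 2), (1, 3), (1, 4);
INSERT INTO contact_phones VALUES (2, '2222222'), (3, '3333333'), (4, '1111111');
""")

# One row number per company, shared by the contact and its phone.
rows = conn.execute("""
SELECT contact.first_name || ' ' || contact.last_name AS contact_name,
       phone.display_phone,
       company.company_name,
       ROW_NUMBER() OVER (PARTITION BY relation.company_id
                          ORDER BY contact.first_name, contact.last_name) AS rn
FROM contacts company
LEFT JOIN contact_company_relation relation ON company.id = relation.company_id
LEFT JOIN contacts contact
       ON relation.contact_id = contact.id AND contact.is_company = 0
LEFT JOIN contact_phones phone ON contact.id = phone.contact_id  -- join on contact
WHERE company.is_company = 1
ORDER BY company.company_name, rn
""").fetchall()

for row in rows:
    print(row)
```

If a contact had several phones, each phone would get its own row number here, so contacts and phones would stop lining up one to one; the sketch assumes one phone per contact, as in the sample data.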
Once you have the data with the row number, you can unpivot the contact_name and display_phone columns into multiple rows. You didn't specify which version of SQL Server you are using, but you can use either UNPIVOT or CROSS APPLY to do this. When you unpivot the data, you use the row value to associate each contact and phone pair, which makes sure that each contact stays associated with the correct phone number. The code would be similar to:
;with cte as
(
-- query from above here
)
select company_name, col, value
from
(
select company_name,
col = col+'_'+cast(row as varchar(50)),
value
from cte
cross apply
(
select 'Contact', Contact_name union all
select 'Phone', display_phone
) c (col, value)
) src;
See SQL Fiddle with Demo. The data will now be in the format which has multiple rows for each company_name, contact and phone:
| COMPANY_NAME | COL | VALUE |
|--------------|-----------|------------|
| Analist | Contact_1 | Ben Gurion |
| Analist | Phone_1 | 2222222 |
| Analist | Contact_2 | Ofer Jerus |
| Analist | Phone_2 | 3333333 |
| Analist | Contact_3 | Ori Reshef |
| Analist | Phone_3 | 1111111 |
The final step would be to add the PIVOT function making the final code:
;with cte as
(
select
contact.first_name + ' ' + contact.last_name as contact_name,
phone.display_phone,
company.company_name,
row_number() over(partition by relation.company_id
order by contact.first_name, contact.last_name) row
from contacts company
left join contact_company_relation relation
on company.id = relation.company_id
left join contacts contact
on relation.contact_id = contact.id
and contact.is_company = 0
left join contact_phones phone
on contact.id = phone.contact_id -- change to join on contact
where company.is_company = 1
)
select company_name,
contact_1, contact_2, contact_3, contact_4, contact_5,
phone_1, phone_2, phone_3, phone_4, phone_5
from
(
select company_name,
col = col+'_'+cast(row as varchar(50)),
value
from cte
cross apply
(
select 'Contact', Contact_name union all
select 'Phone', display_phone
) c (col, value)
) src
pivot
(
max(value)
for col in (contact_1, contact_2, contact_3, contact_4, contact_5,
phone_1, phone_2, phone_3, phone_4, phone_5)
) p;
See SQL Fiddle with Demo. The final result looks like:
| COMPANY_NAME | CONTACT_1 | CONTACT_2 | CONTACT_3 | CONTACT_4 | CONTACT_5 | PHONE_1 | PHONE_2 | PHONE_3 | PHONE_4 | PHONE_5 |
|--------------|------------|------------|------------|-----------|-----------|---------|---------|---------|---------|---------|
| Analist | Ben Gurion | Ofer Jerus | Ori Reshef | (null) | (null) | 2222222 | 3333333 | 1111111 | (null) | (null) |
| Bar Net | Dima Brods | Maya Leshe | Yossi Farc | (null) | (null) | 7777777 | 4444444 | 6666666 | (null) | (null) |
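The same unpivot-then-pivot idea can be sketched on engines that have no PIVOT or CROSS APPLY. The example below uses Python's sqlite3 module, UNION ALL for the unpivot, and MAX(CASE ...) for the pivot. The numbered table is a hypothetical stand-in for the cte above, and only three slots are pivoted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the cte: rows already numbered per company.
CREATE TABLE numbered (company_name TEXT, contact_name TEXT,
                       display_phone TEXT, rn INTEGER);
INSERT INTO numbered VALUES
  ('Analist', 'Ben Gurion', '2222222', 1),
  ('Analist', 'Ofer Jerus', '3333333', 2),
  ('Analist', 'Ori Reshef', '1111111', 3),
  ('Bar Net', 'Dima Brods', '7777777', 1),
  ('Bar Net', 'Maya Leshe', '4444444', 2);
""")

rows = conn.execute("""
WITH src AS (
    -- Unpivot: one (col, value) row per contact and one per phone.
    SELECT company_name, 'contact_' || rn AS col, contact_name AS value
    FROM numbered
    UNION ALL
    SELECT company_name, 'phone_' || rn, display_phone
    FROM numbered
)
-- Pivot: conditional aggregation collapses each company to a single row.
SELECT company_name,
       MAX(CASE WHEN col = 'contact_1' THEN value END) AS contact_1,
       MAX(CASE WHEN col = 'contact_2' THEN value END) AS contact_2,
       MAX(CASE WHEN col = 'contact_3' THEN value END) AS contact_3,
       MAX(CASE WHEN col = 'phone_1' THEN value END) AS phone_1,
       MAX(CASE WHEN col = 'phone_2' THEN value END) AS phone_2,
       MAX(CASE WHEN col = 'phone_3' THEN value END) AS phone_3
FROM src
GROUP BY company_name
ORDER BY company_name
""").fetchall()

for row in rows:
    print(row)
```

Because every value lives in a single col/value pair before aggregation, there is only one grouping key (company_name), which is why each company comes back as exactly one row.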

Related

Get related tables not contains result

I have a DesignGroup table as:
+--------------------------------------+----------+
| DesignGroupId | Name |
+--------------------------------------+----------+
| 3A81C1FF-442F-4291-B8E2-7079D80920CF | Design 1 |
| 3238F4C6-7BA7-4B3F-9383-17702B0D1CC3 | Design 2 |
+--------------------------------------+----------+
Each DesignGroup can have multiple customers, so I have a table DesignGroupCustomers as:
+--------------------------------------+--------------------------------------+-------------+
| DesignGroupCustomerId | DesignGroupId (FK) | CustomerKey |
+--------------------------------------+--------------------------------------+-------------+
| D0828677-F295-46F7-BB85-65888D5A48B7 | 3A81C1FF-442F-4291-B8E2-7079D80920CF | 10 |
| 10C01BB9-1DDB-4DB4-BEC4-9539E030BF68 | 3A81C1FF-442F-4291-B8E2-7079D80920CF | 20 |
| F88C9F66-C0D9-EB11-8481-5CF9DDF6DC87 | 3238F4C6-7BA7-4B3F-9383-17702B0D1CC3 | 10 |
+--------------------------------------+--------------------------------------+-------------+
Each customer has a customer type, stored in customerTable:
+-------------+-------------+
| CustomerKey | CustTypeKey |
+-------------+-------------+
| 10 | 2 |
| 20 | 1 |
+-------------+-------------+
What I want to achieve is this:
return only the DesignGroups that do not have a customer with custTypeKey = 1
In this case it should return Design 2, because it does not have a customer with custTypeKey = 1.
I was thinking about using a CTE, but I have no idea how to get the desired result:
;WITH CTE
AS (SELECT
[DG].[DesignGroupId]
, ROW_NUMBER() OVER(PARTITION BY [DesignGroupCustomer]) AS [RN]
FROM [DesignGroup] AS [DG]
INNER JOIN [DesignGroupCustomer] AS [DGC] ON [DG].[DesignGroupId] = [DGC].[DesignGroupId]
INNER JOIN [Customer] AS [C] ON [DGC].[CustomerKey] = [C].[CustomerKey]
INNER JOIN [CustomerType] AS [CT] ON [C].[CustTypeKey] = [CT].[CustTypeKey])
SELECT
[DesignGroupId]
FROM [CTE] -- WHERE CustomerType NOT CONTAINS (1)

WITH temp AS (
SELECT DISTINCT
dgc.DesignGroupId AS DesignGroupId
FROM DesignGroupCustomers dgc
INNER JOIN customerTable ct
ON dgc.CustomerKey = ct.CustomerKey
WHERE ct.CustTypeKey = 1
)
SELECT
DesignGroupId
FROM DesignGroup
WHERE DesignGroupId NOT IN (
SELECT
DesignGroupId
FROM temp
)
First get all design groups that have a customer with CustTypeKey = 1, then return every other design group using NOT IN. Please let me know if you face any issues.
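One thing to watch with NOT IN: if the subquery can ever return a NULL DesignGroupId, NOT IN matches nothing. A NOT EXISTS variant sidesteps that. Here is a minimal sketch with Python's sqlite3 module; the shortened IDs are placeholders for the GUIDs in the question, and the table names follow the question's description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- IDs shortened for readability (placeholders for the question's GUIDs).
CREATE TABLE DesignGroup (DesignGroupId TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE DesignGroupCustomers (DesignGroupId TEXT, CustomerKey INTEGER);
CREATE TABLE customerTable (CustomerKey INTEGER PRIMARY KEY, CustTypeKey INTEGER);
INSERT INTO DesignGroup VALUES ('3A81C1FF', 'Design 1'), ('3238F4C6', 'Design 2');
INSERT INTO DesignGroupCustomers VALUES
  ('3A81C1FF', 10), ('3A81C1FF', 20), ('3238F4C6', 10);
INSERT INTO customerTable VALUES (10, 2), (20, 1);
""")

# Keep a design group only if no linked customer has CustTypeKey = 1.
not_exists_rows = conn.execute("""
SELECT dg.Name
FROM DesignGroup dg
WHERE NOT EXISTS (
    SELECT 1
    FROM DesignGroupCustomers dgc
    JOIN customerTable ct ON ct.CustomerKey = dgc.CustomerKey
    WHERE dgc.DesignGroupId = dg.DesignGroupId
      AND ct.CustTypeKey = 1
)
""").fetchall()

print(not_exists_rows)
```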

You can use a subquery that returns the design groups having a customer with type key 1, LEFT JOIN that subquery to the DesignGroup table, and keep only the rows where the subquery's DesignGroupId is NULL (that is, any design group that does not appear in the subquery's result):
SELECT d.[DesignGroupId]
FROM [DesignGroup] AS d
LEFT JOIN
(
SELECT dgc.[DesignGroupId]
FROM [DesignGroupCustomer] AS dgc
INNER JOIN [Customer] AS c
ON c.[CustomerKey] = dgc.[CustomerKey]
WHERE c.[CustTypeKey] = 1
GROUP BY dgc.[DesignGroupId]
) x
ON x.[DesignGroupId] = d.[DesignGroupId]
WHERE x.[DesignGroupId] IS NULL
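As a quick check of the anti-join pattern, here is a sketch using Python's sqlite3 module. The shortened IDs and the extra "Design 3" group (which has no customers at all) are assumptions added for illustration; they show that a group without any customers also passes the IS NULL filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DesignGroup (DesignGroupId TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE DesignGroupCustomers (DesignGroupId TEXT, CustomerKey INTEGER);
CREATE TABLE customerTable (CustomerKey INTEGER PRIMARY KEY, CustTypeKey INTEGER);
INSERT INTO DesignGroup VALUES
  ('3A81C1FF', 'Design 1'), ('3238F4C6', 'Design 2'),
  ('00000000', 'Design 3');  -- hypothetical group with no customers
INSERT INTO DesignGroupCustomers VALUES
  ('3A81C1FF', 10), ('3A81C1FF', 20), ('3238F4C6', 10);
INSERT INTO customerTable VALUES (10, 2), (20, 1);
""")

# Anti-join: groups that find no match in the "has a type-1 customer" set.
anti_join_rows = conn.execute("""
SELECT d.Name
FROM DesignGroup d
LEFT JOIN (
    SELECT dgc.DesignGroupId
    FROM DesignGroupCustomers dgc
    JOIN customerTable c ON c.CustomerKey = dgc.CustomerKey
    WHERE c.CustTypeKey = 1
    GROUP BY dgc.DesignGroupId
) x ON x.DesignGroupId = d.DesignGroupId
WHERE x.DesignGroupId IS NULL
ORDER BY d.Name
""").fetchall()

print(anti_join_rows)
```

If groups without customers should be excluded, an additional EXISTS check against DesignGroupCustomers would be needed.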

A better way to aggregate into a default value

For this example I have three tables (individual, business, and ind_to_business). Individual has information on people. Business has information on businesses. And ind_to_business has information on which people are linked to which business. Here are their DDL:
CREATE TABLE individual
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE business
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE ind_to_business
(
ID INTEGER PRIMARY KEY,
IND_ID REFERENCES individual(id),
BUS_ID REFERENCES business(id),
START_DT DATE NOT NULL,
END_DT DATE
);
I'm looking for the best way to display one row for each person. If they are linked to one business, I want to display the business's ENTERPRISE_ID. If they are linked to more than one business, I want to display the default value 'Multiple'. They will always be linked to a business, so no LEFT JOIN is necessary. They can also be linked to the same business more than once (leaving and coming back). Multiple records for the same business should be aggregated.
So for the following sample data:
Individual:
+----+------------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+------------+---------------+
| 1 | John Smith | 53a23B7 |
| 2 | Jane Doe | 63f2a35 |
+----+------------+---------------+
Business:
+----+----------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+----------+---------------+
| 3 | ABC Corp | 2a34d9b |
| 4 | XYZ Inc | 34bf21e |
+----+----------+---------------+
ind_to_business
+----+--------+--------+-------------+-------------+
| ID | IND_ID | BUS_ID | START_DT | END_DT |
+----+--------+--------+-------------+-------------+
| 5 | 1 | 3 | 01-JAN-2000 | 31-DEC-2002 |
| 6 | 1 | 3 | 01-JAN-2015 | |
| 7 | 2 | 3 | 01-JAN-2000 | |
| 8 | 2 | 4 | 01-MAR-2006 | 05-JUN-2010 |
| 9 | 2 | 4 | 15-DEC-2019 | |
+----+--------+--------+-------------+-------------+
I would expect the following output:
+---------+------------+------------+
| IND_ID | NAME | LINKED_BUS |
+---------+------------+------------+
| 53a23B7 | John Smith | 2a34d9b |
| 63f2a35 | Jane Doe | Multiple |
+---------+------------+------------+
Here is my current query:
SELECT DISTINCT
sub.ind_id,
sub.name,
DECODE(sub.bus_count, 1, sub.bus_id, 'Multiple') AS LINKED_BUS
FROM (SELECT i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID,
COUNT(DISTINCT b.enterprise_id) OVER (PARTITION BY i.id) AS BUS_COUNT
FROM individual i
INNER JOIN ind_to_business i2b ON i.id = i2b.ind_id
INNER JOIN business b ON i2b.bus_id = b.id) sub;
My query works, but it runs on a large dataset and takes a long time. I'm wondering if anyone has ideas on how to improve it so that there isn't so much wasted processing (i.e. needing a DISTINCT on the final result, or doing COUNT(DISTINCT) in the inline view only to use that value in the DECODE above).
I've also created a DBFiddle for this question. (Link)
Thanks in advance for any input.

You could try and use a correlated subquery. This removes the need for outer distinct:
SELECT
i.enterprise_id ind_id,
i.name,
(
SELECT DECODE(COUNT(DISTINCT b.enterprise_id), 1, MIN(b.enterprise_id), 'Multiple')
FROM ind_to_business i2b
INNER JOIN business b ON i2b.bus_id = b.id
WHERE i2b.ind_id = i.id
) linked_bus
FROM individual i
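Here is a sketch of the correlated-subquery idea that runs outside Oracle, using Python's sqlite3 module. DECODE is replaced with a CASE expression, and the date columns are left out because they do not affect the result; both are simplifications, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified copies of the question's tables (date columns omitted).
CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
CREATE TABLE ind_to_business (id INTEGER PRIMARY KEY, ind_id INTEGER, bus_id INTEGER);
INSERT INTO individual VALUES (1, 'John Smith', '53a23B7'), (2, 'Jane Doe', '63f2a35');
INSERT INTO business VALUES (3, 'ABC Corp', '2a34d9b'), (4, 'XYZ Inc', '34bf21e');
INSERT INTO ind_to_business VALUES (5, 1, 3), (6, 1, 3), (7, 2, 3), (8, 2, 4), (9, 2, 4);
""")

# One scalar subquery per person: a single business id or the default label.
rows = conn.execute("""
SELECT i.enterprise_id AS ind_id,
       i.name,
       (SELECT CASE WHEN COUNT(DISTINCT b.enterprise_id) = 1
                    THEN MIN(b.enterprise_id)
                    ELSE 'Multiple' END
        FROM ind_to_business i2b
        JOIN business b ON b.id = i2b.bus_id
        WHERE i2b.ind_id = i.id) AS linked_bus
FROM individual i
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```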

You can join with the aggregated ind_to_business per individual. One way to do this:
select i.enterprise_id as ind_id, i.name, coalesce(b.enterprise_id, 'Multiple') as linked_bus
from individual i
join
(
select
ind_id,
case when min(bus_id) = max(bus_id) then min(bus_id) else null end as bus_id
from ind_to_business
group by ind_id
) ib on ib.ind_id = i.id
left join business b on b.id = ib.bus_id
order by i.id;
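The MIN = MAX comparison is a cheap way to test "exactly one distinct value" without COUNT(DISTINCT), and it only touches ind_to_business before joining. A minimal sketch with Python's sqlite3 module, using simplified copies of the question's tables (date columns omitted as an assumption that they do not matter here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
CREATE TABLE ind_to_business (id INTEGER PRIMARY KEY, ind_id INTEGER, bus_id INTEGER);
INSERT INTO individual VALUES (1, 'John Smith', '53a23B7'), (2, 'Jane Doe', '63f2a35');
INSERT INTO business VALUES (3, 'ABC Corp', '2a34d9b'), (4, 'XYZ Inc', '34bf21e');
INSERT INTO ind_to_business VALUES (5, 1, 3), (6, 1, 3), (7, 2, 3), (8, 2, 4), (9, 2, 4);
""")

rows = conn.execute("""
SELECT i.enterprise_id AS ind_id,
       i.name,
       COALESCE(b.enterprise_id, 'Multiple') AS linked_bus
FROM individual i
JOIN (
    -- bus_id stays set only when every link points at the same business.
    SELECT ind_id,
           CASE WHEN MIN(bus_id) = MAX(bus_id) THEN MIN(bus_id) END AS bus_id
    FROM ind_to_business
    GROUP BY ind_id
) ib ON ib.ind_id = i.id
LEFT JOIN business b ON b.id = ib.bus_id
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```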

First use a subquery to get the distinct person/business pairs, then do the final aggregation with a CASE expression.
select
ind_id,
name,
case
when count(*) > 1 then 'Multiple'
else min(bus_id)
end as linked_bus
from
(
select
distinct i.enterprise_id as ind_id,
i.name,
b.enterprise_id as bus_id
from individual i
join ind_to_business i2b
on i.id = i2b.ind_id
join business b
on i2b.bus_id = b.id
) vals
group by
ind_id,
name
order by
ind_id

There is no need to use DISTINCT twice. You can use subquery factoring: put the inline view in a WITH clause and apply DISTINCT inside it.
WITH data AS
(
SELECT distinct
i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID
FROM individual i
JOIN ind_to_business i2b ON i.id = i2b.ind_id
JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
name,
case
when count(*) = 1 then MIN(bus_id)
else 'Multiple'
end AS LINKED_BUS
FROM data
GROUP BY ind_id, name;
IND_ID NAME LINKED_BUS
---------- ---------- -------------------------
53a23B7 John Smith 2a34d9b
63f2a35 Jane Doe Multiple

Postgres group by empty string question to include empty string in output

I have following table in Postgres
| phone | group | spec |
| 1 | 1 | 'Lock' |
| 1 | 2 | 'Full' |
| 1 | 3 | 'Face' |
| 2 | 1 | 'Lock' |
| 2 | 3 | 'Face' |
| 3 | 2 | 'Scan' |
Tried this
SELECT phone, string_agg(spec, ', ')
FROM mytable
GROUP BY phone;
Need this output for each phone, with an empty string wherever a group is missing:
| phone | spec |
| 1 | Lock, Full, Face |
| 2 | Lock, '', Face |
| 3 | '', Scan, '' |

You need a CTE which returns all possible combinations of phone and group and a left join to the table so you can group by phone:
with cte as (
select *
from (
select distinct phone from mytable
) m cross join (
select distinct "group" from mytable
) g
)
select c.phone, string_agg(coalesce(t.spec, ''''''), ',' order by c."group") spec
from cte c left join mytable t
on t.phone = c.phone and t."group" = c."group"
group by c.phone
See the demo.
Results:
| phone | spec |
| ----- | -------------- |
| 1 | Lock,Full,Face |
| 2 | Lock,'',Face |
| 3 | '',Scan,'' |
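The key step is building the full phone × group grid before aggregating, so the gaps exist as rows. The sketch below shows that step with Python's sqlite3 module; the final concatenation is done in Python rather than with string_agg, purely to keep the ordering explicit in this illustration.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (phone INTEGER, "group" INTEGER, spec TEXT);
INSERT INTO mytable VALUES
  (1, 1, 'Lock'), (1, 2, 'Full'), (1, 3, 'Face'),
  (2, 1, 'Lock'), (2, 3, 'Face'),
  (3, 2, 'Scan');
""")

# Every phone paired with every group; missing specs become the literal ''.
grid = conn.execute("""
SELECT p.phone, g."group", COALESCE(t.spec, '''''') AS spec
FROM (SELECT DISTINCT phone FROM mytable) p
CROSS JOIN (SELECT DISTINCT "group" FROM mytable) g
LEFT JOIN mytable t ON t.phone = p.phone AND t."group" = g."group"
ORDER BY p.phone, g."group"
""").fetchall()

# Join the specs per phone in group order.
result = {phone: ", ".join(spec for _, _, spec in cells)
          for phone, cells in groupby(grid, key=lambda r: r[0])}

for phone, spec in result.items():
    print(phone, spec)
```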

You can use conditional aggregation:
select phone,
(coalesce(max(case when "group" = 1 then spec end), '''''') || ', ' ||
coalesce(max(case when "group" = 2 then spec end), '''''') || ', ' ||
coalesce(max(case when "group" = 3 then spec end), '''''')
) as specs
from mytable t
group by phone;
Alternatively, you can generate all the groups using generate_series() and then aggregate:
select p.phone,
string_agg(coalesce(t.spec, ''''''), ', ' order by gs.grp) as specs
from (select distinct phone from mytable) p cross join
generate_series(1, 3, 1) gs(grp) left join
mytable t
on t.phone = p.phone and t."group" = gs.grp
group by p.phone
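When the set of groups is fixed and small, conditional aggregation needs no helper grid at all. A sketch using Python's sqlite3 module follows; it hard-codes groups 1 to 3, which is an assumption taken from the sample data, and quotes "group" because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (phone INTEGER, "group" INTEGER, spec TEXT);
INSERT INTO mytable VALUES
  (1, 1, 'Lock'), (1, 2, 'Full'), (1, 3, 'Face'),
  (2, 1, 'Lock'), (2, 3, 'Face'),
  (3, 2, 'Scan');
""")

# MAX(CASE ...) is NULL when the phone has no row for that group,
# and COALESCE turns that NULL into the literal ''.
rows = conn.execute("""
SELECT phone,
       COALESCE(MAX(CASE WHEN "group" = 1 THEN spec END), '''''') || ', ' ||
       COALESCE(MAX(CASE WHEN "group" = 2 THEN spec END), '''''') || ', ' ||
       COALESCE(MAX(CASE WHEN "group" = 3 THEN spec END), '''''') AS specs
FROM mytable
GROUP BY phone
ORDER BY phone
""").fetchall()

for row in rows:
    print(row)
```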

You can also use a self-join (RIGHT or LEFT JOIN) against the three distinct groups, which come from the subquery right after the RIGHT JOIN keywords, together with a correlated subquery on your table:
WITH mytable1 AS
(
SELECT distinct t1.phone, t2."group",
( SELECT spec FROM mytable WHERE phone = t1.phone AND "group"=t2."group" )
FROM mytable t1
RIGHT JOIN ( SELECT distinct "group" FROM mytable ) t2
ON t2."group" = coalesce(t2."group",t1."group")
)
SELECT phone, string_agg(coalesce(spec,''''''), ', ') as spec
FROM mytable1
GROUP BY phone;
Demo

Using GROUP BY to only show the row with latest update for each user

I've been having trouble figuring out the syntax to do a GROUP BY to only show the row that has the latest ups.db_LastUpdate for each User (by db_UserId).
SELECT up.db_FirstName, up.db_LastName, up.db_UserId, ups.db_Initials, ups.db_LastUpdate
FROM tblUserProfile up
JOIN tblUserSel ups
ON ups.db_Code = up.db_UserId
WHERE ups.db_UserTech = 'U'
Output (There will be several hundred users, but you get the point):
Jeff | Ledger | 1-34 | JL | 2015-08-11
Jeff | Ledger | 1-34 | DBC | 2015-06-06
Jeff | Ledger | 1-34 | YX | 2015-08-01
John | Barker | 1-26 | JR | 2015-04-04
John | Barker | 1-26 | YY | 2015-02-18
John | Barker | 1-26 | FF | 2015-11-14
Maybe something like GROUP BY ups.dbUserId, MAX(db_LastUpdate)
Thanks for your help

Use ROW_NUMBER:
;WITH CTE AS
(
SELECT up.db_FirstName,
up.db_LastName,
up.db_UserId,
ups.db_Initials,
ups.db_LastUpdate,
RN = ROW_NUMBER() OVER(PARTITION BY up.db_UserId ORDER BY ups.db_LastUpdate DESC)
FROM tblUserProfile up
INNER JOIN tblUserSel ups
ON ups.db_Code = up.db_UserId
WHERE ups.db_UserTech = 'U'
)
SELECT *
FROM CTE
WHERE RN = 1;
As pointed out in the comments, you can also use MAX and then join back to your table:
;WITH CTE AS
(
SELECT up.db_UserId,
MAX(ups.db_LastUpdate) MaxLastUpdate
FROM tblUserProfile up
INNER JOIN tblUserSel ups
ON ups.db_Code = up.db_UserId
WHERE ups.db_UserTech = 'U'
GROUP BY up.db_UserId
)
SELECT B.*
FROM CTE A
INNER JOIN tblUserSel B
ON A.db_UserId = B.db_Code
AND A.MaxLastUpdate = B.db_LastUpdate;
Be aware, though, that if two rows for the same user share the same latest date, you will get both rows in the result.
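To see the difference on ties, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later). The data is hypothetical: the DBC row has been given the same date as JL to force a tie, and db_Initials is added to the ORDER BY as an arbitrary tiebreaker so the ROW_NUMBER result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUserSel (db_Code TEXT, db_Initials TEXT, db_LastUpdate TEXT);
INSERT INTO tblUserSel VALUES
  ('1-34', 'JL',  '2015-08-11'),
  ('1-34', 'DBC', '2015-08-11'),  -- hypothetical tie with JL
  ('1-34', 'YX',  '2015-08-01'),
  ('1-26', 'FF',  '2015-11-14'),
  ('1-26', 'JR',  '2015-04-04');
""")

# ROW_NUMBER always keeps exactly one row per user.
row_number_rows = conn.execute("""
SELECT db_Code, db_Initials
FROM (
    SELECT db_Code, db_Initials,
           ROW_NUMBER() OVER (PARTITION BY db_Code
                              ORDER BY db_LastUpdate DESC, db_Initials) AS rn
    FROM tblUserSel
) ranked
WHERE rn = 1
ORDER BY db_Code
""").fetchall()

# MAX + join keeps every row that carries the latest date.
max_join_rows = conn.execute("""
SELECT s.db_Code, s.db_Initials
FROM tblUserSel s
JOIN (
    SELECT db_Code, MAX(db_LastUpdate) AS max_update
    FROM tblUserSel
    GROUP BY db_Code
) m ON m.db_Code = s.db_Code AND m.max_update = s.db_LastUpdate
ORDER BY s.db_Code, s.db_Initials
""").fetchall()

print(row_number_rows)
print(max_join_rows)
```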

If your tables have a unique ID column, I usually handle that situation something like this:
WITH LastEdit AS (
SELECT ups.db_Code, ups.db_Initials, ups.db_LastUpdate
FROM tblUserSel ups
WHERE ups.db_UserTech = 'U' AND ups.ID = (
SELECT TOP 1 upsn.ID
FROM tblUserSel upsn
WHERE upsn.db_Code = ups.db_Code -- correlate on the user, not the initials
AND upsn.db_UserTech = 'U'
ORDER BY upsn.db_LastUpdate DESC
)
)
SELECT up.db_FirstName, up.db_LastName, up.db_UserId, le.db_Initials, le.db_LastUpdate
FROM tblUserProfile up
INNER JOIN LastEdit le
ON le.db_Code = up.db_UserId

SQL GROUP BY and retrieve last child records

I'm writing a DB view that pulls data from several tables. The goal is to determine the latest status of a company, and this is noted by each record (grouped by company_id) with the highest vetting_event_type_position.
Essentially I'm trying to grab the latest record for each company. I'm not a SQL guru at all; I understand I need to group by in order to collapse the related records, but I can't get that to work.
Current results
company_id | name | ... | vetting_event_type_position
-----------------------------------------------------
1 | ABC | ... | 1
1 | ABC | ... | 2
1 | ABC | ... | 3
2 | CBS | ... | 1
2 | CBS | ... | 2
3 | HBO | ... | 1
DESIRED results
company_id | name | ... | vetting_event_type_position
-----------------------------------------------------
1 | ABC | ... | 3
2 | CBS | ... | 2
3 | HBO | ... | 1
SQL Code
SELECT
companies.id as company_id,
companies.name as name,
companies.uuid as uuid,
companies.company_type as company_type,
companies.description as overview,
practice_areas.id as practice_area_id,
practice_areas.name as practice_area_name,
companies.created_at as created_at,
companies.updated_at as updated_at,
companies.created_by as created_by,
companies.updated_by as updated_by,
vettings.id as vetting_id,
vettings.name as vetting_name,
vetting_event_types.name as vetting_event_status,
vetting_events.id as vetting_event_id,
vetting_event_types.position as vetting_event_type_position
FROM
vettings
LEFT OUTER JOIN vetting_events ON (vettings.id = vetting_events.vetting_id)
LEFT OUTER JOIN vetting_event_types ON (vetting_events.vetting_event_type_id = vetting_event_types.id)
RIGHT OUTER JOIN companies ON (companies.id = vettings.company_id)
LEFT OUTER JOIN practice_areas ON (companies.practice_area_id = practice_areas.id)
LEFT OUTER JOIN dispositions ON (companies.disposition_id = dispositions.id)
ORDER BY
name, vetting_name, vetting_event_type_position
;
Associations among tables
companies has_many vettings
vettings has_many vetting_events
vetting_events belongs_to vetting_event_types
or put another way...
companies -> vettings -> vetting_events <- vetting_event_types
I am trying to retrieve the company record with the highest vetting_event_types.position value for each group.

SELECT company_id
,name
,uuid
,company_type
,overview
,practice_area_id
,practice_area_name
,created_at
,updated_at
,created_by
,updated_by
,vetting_id
,vetting_name
,vetting_event_status
,vetting_event_id
,vetting_event_type_position
FROM (
SELECT
companies.id as company_id,
companies.name as name,
companies.uuid as uuid,
companies.company_type as company_type,
companies.description as overview,
practice_areas.id as practice_area_id,
practice_areas.name as practice_area_name,
companies.created_at as created_at,
companies.updated_at as updated_at,
companies.created_by as created_by,
companies.updated_by as updated_by,
vettings.id as vetting_id,
vettings.name as vetting_name,
vetting_event_types.name as vetting_event_status,
vetting_events.id as vetting_event_id,
vetting_event_types.position as vetting_event_type_position,
ROW_NUMBER() OVER (PARTITION BY companies.id ORDER BY vetting_event_types.position DESC) rn
FROM vettings
LEFT OUTER JOIN vetting_events ON (vettings.id = vetting_events.vetting_id)
LEFT OUTER JOIN vetting_event_types ON (vetting_events.vetting_event_type_id = vetting_event_types.id)
RIGHT OUTER JOIN companies ON (companies.id = vettings.company_id)
LEFT OUTER JOIN practice_areas ON (companies.practice_area_id = practice_areas.id)
LEFT OUTER JOIN dispositions ON (companies.disposition_id = dispositions.id)
) A
WHERE A.rn = 1
ORDER BY name, vetting_name, vetting_event_type_position

You can use the ROW_NUMBER analytic function:
Select * from (
Select ...,
Row_number() over ( partition by company_id order by vetting_event_type_position desc) as seq) T
Where seq=1
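A cut-down version of this can be tried with Python's sqlite3 module. The tables below keep only the columns needed for the ranking, the event type names are made up, and HBO is deliberately given no vettings to show that the outer joins still return it with rn = 1. The joins start from companies with LEFT JOINs, which is equivalent here to the RIGHT OUTER JOIN in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, reduced schema: only what the ranking needs.
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vettings (id INTEGER PRIMARY KEY, company_id INTEGER);
CREATE TABLE vetting_event_types (id INTEGER PRIMARY KEY, name TEXT, position INTEGER);
CREATE TABLE vetting_events (id INTEGER PRIMARY KEY, vetting_id INTEGER,
                             vetting_event_type_id INTEGER);
INSERT INTO companies VALUES (1, 'ABC'), (2, 'CBS'), (3, 'HBO');
INSERT INTO vettings VALUES (10, 1), (20, 2);
INSERT INTO vetting_event_types VALUES
  (1, 'started', 1), (2, 'reviewed', 2), (3, 'approved', 3);
INSERT INTO vetting_events VALUES
  (100, 10, 1), (101, 10, 2), (102, 10, 3), (103, 20, 1), (104, 20, 2);
""")

# Rank each company's events by position and keep the highest one.
rows = conn.execute("""
SELECT company_id, name, vetting_event_type_position
FROM (
    SELECT c.id AS company_id,
           c.name,
           t.position AS vetting_event_type_position,
           ROW_NUMBER() OVER (PARTITION BY c.id
                              ORDER BY t.position DESC) AS rn
    FROM companies c
    LEFT JOIN vettings v ON v.company_id = c.id
    LEFT JOIN vetting_events e ON e.vetting_id = v.id
    LEFT JOIN vetting_event_types t ON t.id = e.vetting_event_type_id
) ranked
WHERE rn = 1
ORDER BY company_id
""").fetchall()

for row in rows:
    print(row)
```

One caveat worth checking on your own database: NULL ordering differs between engines. SQLite sorts NULLs last under DESC, while PostgreSQL sorts them first by default, so there a company with one event-less vetting could rank the NULL row first unless NULLS LAST is added to the ORDER BY.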
