SQL - Find duplicate children

I have a table containing meetings:
MeetID Description
-----------------------------------------------------
1 SQL Workshop
2 Cake Workshop
I have another table containing all participants in the meetings:
PartID MeetID Name Role
-----------------------------------------------------
1 1 Jan Coordinator
2 1 Peter Participant
3 1 Eva Participant
4 1 Michael Coordinator
5 2 Jan Coordinator
6 2 Peter Participant
What I want to find is a list of all meetings that have 2 or more participants with Role = 'Coordinator'.
E.g., in the example above, that would be the meeting with MeetID = 1, not 2.
I cannot for the life of me figure out how to do this, although I think it should be simple :-)
(I am using SQL Server 2012)

This is easy to do using group by and having:
select MeetId
from participants p
where Role = 'Coordinator'
group by MeetId
having count(*) >= 2;
Note: Role is a potential keyword/reserved word, so it is a bad choice for a column name.
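Here is a minimal way to try the answer's GROUP BY / HAVING query on the sample data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server 2012 (the query uses only standard SQL, so the logic carries over), and it assumes the meetings table is called meetings, since the question doesn't name it. The inner query is the answer's; the outer join back to meetings only adds the description.

```python
import sqlite3

def meetings_with_two_coordinators(conn):
    """Return (MeetID, Description) for meetings with 2 or more coordinators."""
    sql = """
        SELECT m.MeetID, m.Description
        FROM meetings m
        JOIN (
            -- the answer's query: count coordinators per meeting, keep counts >= 2
            SELECT MeetID
            FROM participants
            WHERE Role = 'Coordinator'
            GROUP BY MeetID
            HAVING COUNT(*) >= 2
        ) c ON c.MeetID = m.MeetID
        ORDER BY m.MeetID
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meetings (MeetID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE participants (PartID INTEGER PRIMARY KEY, MeetID INTEGER,
                               Name TEXT, Role TEXT);
    INSERT INTO meetings VALUES (1, 'SQL Workshop'), (2, 'Cake Workshop');
    INSERT INTO participants VALUES
        (1, 1, 'Jan', 'Coordinator'),
        (2, 1, 'Peter', 'Participant'),
        (3, 1, 'Eva', 'Participant'),
        (4, 1, 'Michael', 'Coordinator'),
        (5, 2, 'Jan', 'Coordinator'),
        (6, 2, 'Peter', 'Participant');
""")
print(meetings_with_two_coordinators(conn))  # [(1, 'SQL Workshop')]
```

Meeting 2 drops out in the HAVING step because it has only one coordinator.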

BigQuery count each item in array across table

I cannot quite find what I'm looking for, so here goes:
I'm looking for a way to get a count of the number of times an item occurs in an array across the entire table.
Imagine you have a table child_names with two columns, user_id and children (an array of names).
I know it's unusual to have two children with the same name, but bear with me.
user_id children
1 Bob, Jane, Bob
2 Jeff, Jane
3 Bob, Matt
4 Jane, John
I am looking for a result that would have two columns:
Bob 3
Jane 3
Jeff 1
Matt 1
John 1
So far I have this:
SELECT
ARRAY(
SELECT AS STRUCT child, `count`
FROM t.children child
LEFT JOIN (
SELECT AS STRUCT child, COUNT(1) `count`
FROM t.children child
GROUP BY child
) stats
USING(child)
) hashtag
FROM `child_names` t,
UNNEST(children)
But this gives me a count of how many children have that name per parent, not across the whole table.
I get:
Bob 2
Jane 1
Jeff 1
Jane 1
Bob 1
Matt 1
etc.
I hope that makes sense. Any help would be appreciated.
Use the query below:
SELECT name, COUNT(*) cnt
FROM child_names,
UNNEST(children) name
GROUP BY name
If applied to the sample data in your question, the output matches the desired result: Bob 3, Jane 3, Jeff 1, Matt 1, John 1.
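The difference between the attempt in the question and the answer is where the counting happens: the attempt counts inside each row's array, while the answer unnests every array into one long list of names and then groups across the whole table. The Python sketch below only models that flatten-then-count semantics with itertools.chain and collections.Counter; it is an illustration, not BigQuery code.

```python
from collections import Counter
from itertools import chain

# Sample rows from the question: (user_id, children array)
child_names = [
    (1, ["Bob", "Jane", "Bob"]),
    (2, ["Jeff", "Jane"]),
    (3, ["Bob", "Matt"]),
    (4, ["Jane", "John"]),
]

def count_across_table(rows):
    # Flatten every array into one stream (like FROM t, UNNEST(children) name),
    # then count each name over that whole stream (like GROUP BY name).
    return Counter(chain.from_iterable(children for _, children in rows))

def count_per_row(rows):
    # Roughly what the original attempt produced: one separate count per array.
    return [Counter(children) for _, children in rows]

totals = count_across_table(child_names)
print(sorted(totals.items()))
# [('Bob', 3), ('Jane', 3), ('Jeff', 1), ('John', 1), ('Matt', 1)]
```

The per-row version gives Bob 2, Jane 1 for user 1, which is the kind of result the question describes getting.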

SQL for joining two tables and grouping by shared column

I want to join two tables and use the column they both share to group the results, including a null result for those accountIds which only appear in one table.
Table a
AccountId productApurchases
---------------------------
Steve     1
Jane      5
Bill      10
Abed      2
Table b
AccountId productBpurchases
---------------------------
Allan     1
Jane      10
Bill      2
Abed      1
Mike      2
Desired output
AccountId productApurchases productBpurchases
---------------------------------------------
Steve     1                 0
Jane      5                 10
Bill      10                2
Abed      2                 1
Mike      0                 2
I've been trying with various joins but cannot figure out how to group by all the account ids.
Any advice much appreciated, thanks.
Use a full join:
select accountid,
coalesce(productApurchases, 0) as productApurchases,
coalesce(productBpurchases, 0) as productBpurchases
from a full join
b
using (accountid);
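To see the full join return one row per account, including accounts that exist in only one table, here is a sketch in Python's sqlite3. It assumes table b's column is named productBpurchases, as the desired output implies. SQLite only gained FULL OUTER JOIN in version 3.39.0, so the sketch emulates it portably: a LEFT JOIN from a to b, plus the rows of b with no match in a. On a database that supports FULL JOIN, the answer's query is the simpler choice. Note that Allan, who appears only in table b, also comes back (with 0 product A purchases), even though the desired output in the question leaves him out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (accountid TEXT PRIMARY KEY, productApurchases INTEGER);
    CREATE TABLE b (accountid TEXT PRIMARY KEY, productBpurchases INTEGER);
    INSERT INTO a VALUES ('Steve', 1), ('Jane', 5), ('Bill', 10), ('Abed', 2);
    INSERT INTO b VALUES ('Allan', 1), ('Jane', 10), ('Bill', 2), ('Abed', 1), ('Mike', 2);
""")

# FULL OUTER JOIN emulated as: every row of a (matched in b or not) ...
# ... plus the rows of b whose accountid never appears in a.
FULL_JOIN_SQL = """
    SELECT a.accountid,
           COALESCE(a.productApurchases, 0) AS productApurchases,
           COALESCE(b.productBpurchases, 0) AS productBpurchases
    FROM a LEFT JOIN b ON b.accountid = a.accountid
    UNION ALL
    SELECT b.accountid, 0, b.productBpurchases
    FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.accountid = b.accountid)
    ORDER BY 1
"""
rows = conn.execute(FULL_JOIN_SQL).fetchall()
for row in rows:
    print(row)
```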

SQL Query: Join (or select) 2 columns from 1 table with 1 column from another table for a view without extra join columns

This is my very first Stack Overflow post, so I apologize if I am not formatting my question correctly. I'm pounding my head against the wall with what I'm sure is a simple problem. I have a table with a bunch of event information, about 10 columns, like so:
Table: event_info
date location_id lead_user_id colead_user_id attendees start end <and a few more...>
------------------------------------------------------------------------------------------------
2020-10-10 1 3 1 26 2100 2200
2020-10-11 3 2 4 18 0600 0700
2020-10-12 2 5 6 6 0800 0900
And another table with user information:
Table: users
user_id user_name display_name email phone city
----------------------------------------------------------------------
1 Joe S goofball ...
2 John T schmoofball ...
3 Jack U aloofball ...
4 Jim V poofball ...
5 Joy W tootball ...
6 George A boring ...
I want to create a view that has only a subset of the information, not full table joins. The event table's lead_user_id and colead_user_id columns both refer to the user_id column in the users table.
I want to create a view like this:
date Location Lead Name CoLead Name attendees
---------------------------------------------------------------------
2020-10-10 1 Jack U Joe S 26
2020-10-11 3 John T Jim V 18
2020-10-12 2 Joy W George A 6
I have tried the following and several iterations like it to no avail...
SELECT
E.date, E.location,
U1.display_name AS Lead Name,
U2.display_name AS CoLead Name.
E.attendees
FROM
users U1, event_info E
INNER JOIN
event_info E ON U1.user_id = E.lead_user_id
INNER JOIN
users U2 ON U2.user_id = E.colead_user_id
And I get the dreaded "You have an error in your SQL syntax" message. I'm not surprised, as I've really only ever used joins on single columns or nested select statements... having two columns point to the same one is throwing me for a loop. Help!
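A corrected query: each table appears only once in the FROM clause, the co-lead's name comes from a scalar subquery, the location column is location_id, and the aliases that contain spaces are quoted with backticks (the error message indicates MySQL):
SELECT
E.date, E.location_id AS Location,
U1.display_name AS `Lead Name`,
(SELECT display_name FROM users WHERE user_id = E.colead_user_id) AS `CoLead Name`,
E.attendees
FROM
event_info E
INNER JOIN
users U1 ON U1.user_id = E.lead_user_id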
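Another common way to pull two names from the same users table is to join it twice under two aliases, one per foreign-key column, instead of using a subquery. Below is a minimal sketch of that variant as a view, run in Python's sqlite3 with the sample rows trimmed to the columns the view needs. Two assumptions: it selects user_name, because the desired view shows values like "Jack U" (swap in display_name if that is the column you meant), and it LEFT JOINs the co-lead in case colead_user_id can be empty. The double-quoted aliases are standard SQL; in MySQL, use backticks unless ANSI_QUOTES is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, display_name TEXT);
    CREATE TABLE event_info (date TEXT, location_id INTEGER, lead_user_id INTEGER,
                             colead_user_id INTEGER, attendees INTEGER);
    INSERT INTO users VALUES
        (1, 'Joe S', 'goofball'), (2, 'John T', 'schmoofball'), (3, 'Jack U', 'aloofball'),
        (4, 'Jim V', 'poofball'), (5, 'Joy W', 'tootball'), (6, 'George A', 'boring');
    INSERT INTO event_info VALUES
        ('2020-10-10', 1, 3, 1, 26),
        ('2020-10-11', 3, 2, 4, 18),
        ('2020-10-12', 2, 5, 6, 6);
    -- One alias of users per foreign-key column: lu = lead, cu = co-lead.
    CREATE VIEW event_summary AS
    SELECT e.date,
           e.location_id AS "Location",
           lu.user_name  AS "Lead Name",
           cu.user_name  AS "CoLead Name",
           e.attendees
    FROM event_info e
    JOIN users lu ON lu.user_id = e.lead_user_id
    LEFT JOIN users cu ON cu.user_id = e.colead_user_id;
""")
rows = conn.execute("SELECT * FROM event_summary ORDER BY date").fetchall()
for row in rows:
    print(row)
```

Each alias is an independent copy of users in the join, which is why one table can satisfy two different foreign keys in the same row.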

Recursive SQL Query, but not the usual kind of recursive

I have a set of tables that organize a group of people into teams.
Users (ID int PK, etc.)
Teams (ID int PK, etc.)
UsersToTeams (ID int PK, UserID int FK, TeamID int FK, TeamSupervisor bit not null)
There is no Parent ID in the table because users can be on any number of teams, and teams can have any number of supervisors. A user may be on six teams but only supervise two of them, and one or both of those supervised teams may have other supervisors in it. So my hierarchy looks more like a web than a tree.
I recognize that a recursive query may result in a circular reference. Assume the software is handling that for the moment.
The company hierarchy is described by a Supervisor supervising a team of Users, a Manager supervising a team of Supervisors, etc. So it's hierarchical, but not in the usual way.
I need a query which, given a UserID, will return the IDs of the users he supervises, down any number of levels. How might such a query go?
Example
Users (ID, Name)
1 Archie
2 Betty
3 Chuck
4 Dilton
5 Eddie
6 Fannie
User 1 is a Manager (level 3). Users 2 and 3 are Supervisors (level 2). Users 4, 5, 6 are Users (level 1).
Teams (ID, Name)
1 Team Alpha
2 Team Bravo
3 Sup Team
UsersToTeams (ID INT PK, UserID INT FK, TeamID INT FK, isSupervisor BIT)
1 1 3 1 -- Archie supervises Sup Team
2 2 3 0 -- Betty is a member of Sup Team
3 3 3 0 -- Chuck is a member of Sup Team
4 2 1 1 -- Betty supervises team Alpha
5 4 1 0 -- Dilton is a member of team Alpha
6 5 1 0 -- Eddie is a member of team Alpha
7 3 2 1 -- Chuck supervises Team Bravo
8 6 2 0 -- Fannie is a member of Team Bravo
Archie is a Manager, supervising a team of supervisors.
Betty is a Supervisor, supervising a team of users.
Chuck is a Supervisor, supervising a team of users.
Betty and Chuck are also on Archie's team, but do not supervise it.
Therefore:
If I pass in UserID 5 (Eddie), I should get back only 5, because Eddie doesn't supervise anyone.
If I pass in UserID 3 (Chuck), I should get back 3 and 6, because Fannie is on a team that Chuck supervises.
If I pass in UserID 1 (Archie), I should get back all UserIDs described here, because Betty and Chuck are on Archie's team, and everyone else is on either Betty's team or Chuck's team.
Sorry, I tried that SQL fiddle link, but after 15 minutes of "Building Schema" I lost hope for it.
You can do this with a recursive CTE.
First, select the user himself, and then recursively select all the users he directly supervises:
declare @userID int = 1; -- T-SQL variables start with @
with u as (
-- anchor member: the user himself
select id from users where id = @userID
union all
-- recursive member: non-supervising members of every team the current user supervises
select lackey.userID from u supervisor
join usersToTeams supervising on supervising.userID = supervisor.id and supervising.isSupervisor = 1
join usersToTeams lackey on lackey.teamID = supervising.teamID and lackey.isSupervisor = 0
)
select * from u
Here's the fiddle: http://www.sqlfiddle.com/#!3/525e1/3
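To check the CTE against the expected answers in the question (5 returns 5; 3 returns 3 and 6; 1 returns everyone), here is a sketch of the same query in Python's sqlite3. SQLite spells it WITH RECURSIVE and takes the user ID as a bound parameter instead of a variable. One deliberate difference: it uses UNION rather than UNION ALL. In SQLite that discards rows already produced, so a circular supervision chain stops instead of looping. SQL Server only accepts UNION ALL between the anchor and recursive members, so there the usual safety net is the MAXRECURSION query hint (default limit 100).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE usersToTeams (id INTEGER PRIMARY KEY, userID INTEGER,
                               teamID INTEGER, isSupervisor INTEGER);
    INSERT INTO users VALUES (1, 'Archie'), (2, 'Betty'), (3, 'Chuck'),
                             (4, 'Dilton'), (5, 'Eddie'), (6, 'Fannie');
    INSERT INTO usersToTeams VALUES
        (1, 1, 3, 1), (2, 2, 3, 0), (3, 3, 3, 0), (4, 2, 1, 1),
        (5, 4, 1, 0), (6, 5, 1, 0), (7, 3, 2, 1), (8, 6, 2, 0);
""")

SUBORDINATES_SQL = """
    WITH RECURSIVE u(id) AS (
        SELECT id FROM users WHERE id = ?   -- anchor: the user himself
        UNION                               -- UNION (not ALL): repeats are dropped
        SELECT lackey.userID
        FROM u AS supervisor
        JOIN usersToTeams AS supervising
          ON supervising.userID = supervisor.id AND supervising.isSupervisor = 1
        JOIN usersToTeams AS lackey
          ON lackey.teamID = supervising.teamID AND lackey.isSupervisor = 0
    )
    SELECT id FROM u ORDER BY id
"""

def subordinates(conn, user_id):
    """IDs of user_id plus everyone below them, at any depth."""
    return [row[0] for row in conn.execute(SUBORDINATES_SQL, (user_id,))]

for uid in (5, 3, 1):
    print(uid, subordinates(conn, uid))
```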

Excluding results that appear in another column of a CONNECT BY query

I have a heavy query (it takes 15 minutes to run), but it's returning more results than I need. It's a CONNECT BY query, and nodes that are descendants of one root also show up as roots of their own in the results. I.e.:
Ted
  Bob
    John
Bob
  John
John
Normally, the way to resolve this is a START WITH condition, typically requiring the parent of a node to be null. But due to the nature of the query, I don't have the START WITH values I need to compare to until I have the full result set. I'm basically trying to double-query my results to say QUERY STUFF START WITH RECORDS THAT AREN'T IN THAT STUFF.
Here's the query (built with the help of Nicholas Krasnov, here: Oracle Self-Join on multiple possible column matches - CONNECT BY?):
select cudroot.root_user, cudroot.node_level, cudroot.user_id, cudroot.new_user_id,
cudbase.* -- Not really, just simplifying
from css.user_desc cudbase
join (select connect_by_root(user_id) root_user,
user_id user_id,
new_user_id new_user_id,
level node_level
from (select cudordered.user_id,
coalesce(cudordered.new_user_id, cudordered.nextUser) new_user_id
from (select cud.user_id,
cud.new_user_id,
decode(cud.global_hr_id, null, null, lead(cud.user_id ignore nulls) over (partition by cud.global_hr_id order by cud.user_id)) nextUser
from css.user_desc cud
left join gsu.stg_userdata gstgu
on (gstgu.user_id = cud.user_id
or (gstgu.sap_asoc_global_id = cud.global_hr_id))
where upper(cud.user_type_code) in ('EMPLOYEE','CONTRACTOR','DIV_EMPLOYEE','DIV_CONTRACTOR','DIV_MYTEAPPROVED')) cudordered)
connect by nocycle user_id = prior new_user_id) cudroot
on cudbase.user_id = cudroot.user_id
order by
cudroot.root_user, cudroot.node_level, cudroot.user_id;
This gives me results about related users (based off of user_id renames or associated SAP IDs) that look like this:
ROOT_ID LEVEL USER_ID NEW_USER_ID
------------------------------------------------
A5093522 1 A5093522 FG096489
A5093522 2 FG096489 A5093665
A5093522 3 A5093665
FG096489 1 FG096489 A5093665
FG096489 2 A5093665
A5093665 1 A5093665
What I need is a way to filter the first join (select connect_by_root(user_id)...) to exclude FG096489 and A5093665 from the root list.
The best START WITH I can think of would look like this (not tested yet):
start with user_id not in (select new_user_id
from (select coalesce(cudordered.new_user_id, cudordered.nextUser) new_user_id
from (select cud.new_user_id,
decode(cud.global_hr_id, null, null, lead(cud.user_id ignore nulls) over (partition by cud.global_hr_id order by cud.user_id)) nextUser
from css.user_desc cud
where upper(cud.user_type_code) in ('EMPLOYEE','CONTRACTOR','DIV_EMPLOYEE','DIV_CONTRACTOR','DIV_MYTEAPPROVED')) cudordered)
connect by nocycle user_id = prior new_user_id)
... but then I'm effectively executing my 15-minute query twice.
I've looked at using partitions in the query, but there's not really a partition... I want to look at the full result set of new_user_ids. I have also explored analytic functions such as rank()... my bag of tricks is empty.
Any ideas?
Clarification
The reason I don't want the extra records in the root list is that I only want one group of results for each user. I.e., if Bob Smith has had four accounts during his career here (people come and go frequently, as employees and/or contractors), I want to work with a set of accounts that all belong(ed) to Bob Smith.
If Bob came here as a contractor, converted to an employee, left, came back as a contractor in another country, and left/returned to a legal org that is now in our SAP system, his account rename chain might look like:
Bob Smith CONTRACTOR ---- US0T0001 -> US001101 (given a new ID as an employee)
Bob Smith EMPLOYEE ---- US001101 -> EB0T0001 (contractor ID for the UK)
Bob Smith CONTRACTOR SAP001 EB0T0001 (no rename performed)
Bob Smith EMPLOYEE SAP001 TE110001 (currently-active ID)
In the above example, the four accounts are linked by either a new_user_id field that was set when the user was renamed or through having the same SAP ID.
Because HR frequently fails to follow the business process, returning users may end up with any of those four IDs being restored to them. I have to analyze all the IDs for Bob Smith and say "Bob Smith can only have TE110001 restored", and kick back an error if they try to restore something else. I have to do it for 90,000+ records.
The first column, "Bob Smith", is just an identifier to the group of associated accounts. In my original example, I'm using the root User ID as the identifier (e.g. US0T0001). If I use first/last names to identify users, I end up with collisions.
So Bob Smith would look like this:
US0T0001 1 CONTRACTOR ---- US0T0001 -> US001101 (given a new ID as an employee)
US0T0001 2 EMPLOYEE ---- US001101 -> EB0T0001 (contractor ID for the UK)
US0T0001 3 CONTRACTOR SAP001 EB0T0001 (no rename performed)
US0T0001 4 EMPLOYEE SAP001 TE110001 (currently-active ID)
... where 1, 2, 3, 4 are the levels in the hierarchy.
Since US0T0001, US001101, EB0T0001, and TE110001 are all accounted for, I don't want another group for them. But the results I have now have those accounts listed in multiple groups:
US001101 1 EMPLOYEE ---- US001101 -> EB0T0001
US001101 2 CONTRACTOR SAP001 EB0T0001
US001101 3 EMPLOYEE SAP001 TE110001
EB0T0001 1 CONTRACTOR SAP001 EB0T0001
EB0T0001 2 EMPLOYEE SAP001 TE110001
TE110001 1 EMPLOYEE SAP001 TE110001
This causes two problems:
When I query the results for a User ID, I get hits from multiple groups
Each group will report a different expected user ID for Bob Smith.
You asked for an expanded set of records... here is some actual data:
-- NumRootUsers tells me how many accounts are associated with a user.
-- The new user ID field is explicitly set in the database, but may be null.
-- The calculated new user ID analyzes records to determine what the next related record is
NumRoot New User Calculated
RootUser Users Level UserId ID Field New User ID SapId LastName FirstName
-----------------------------------------------------------------------------------------------
BG100502 3 1 BG100502 BG1T0873 BG1T0873 GRIENS VAN KION
BG100502 3 2 BG1T0873 BG103443 BG103443 GRIENS VAN KION
BG100502 3 3 BG103443 41008318 VAN GRIENS KION
-- This group causes bad matches for Kion van Griens... the IDs are already accounted for,
-- and this group doesn't even grab all of the accounts for Kion. It's also using a new
-- ID to identify the group
BG1T0873 2 1 BG1T0873 BG103443 BG103443 GRIENS VAN KION
BG1T0873 2 2 BG103443 41008318 VAN GRIENS KION
-- Same here...
BG103443 1 1 BG103443 41008318 VAN GRIENS KION
-- Good group of records
BG100506 3 1 BG100506 BG100778 41008640 MALEN VAN LARS
BG100506 3 2 BG100778 BG1T0877 41008640 MALEN VAN LARS
BG100506 3 3 BG1T0877 41008640 VAN MALEN LARS
-- Bad, unwanted group of records
BG100778 2 1 BG100778 BG1T0877 41008640 MALEN VAN LARS
BG100778 2 2 BG1T0877 41008640 VAN MALEN LARS
-- Third group for Lars
BG1T0877 1 1 BG1T0877 41008640 VAN MALEN LARS
-- Jan... fields are set differently than the above examples, but the chain is calculated correctly
BG100525 3 1 BG100525 BG1T0894 41008651 ZANWIJK VAN JAN
BG100525 3 2 BG1T0894 TE035165 TE035165 41008651 VAN ZANWIJK JAN
BG100525 3 3 TE035165 41008651 VAN ZANWIJK JAN
-- Bad
BG1T0894 2 1 BG1T0894 TE035165 TE035165 41008651 VAN ZANWIJK JAN
BG1T0894 2 2 TE035165 41008651 VAN ZANWIJK JAN
-- Bad bad
TE035165 1 1 TE035165 41008651 VAN ZANWIJK JAN
-- Somebody goofed and gave Ziano a second SAP ID... but we still matched correctly
BG100527 3 1 BG100527 BG1T0896 41008652 STEFANI DE ZIANO
BG100527 3 2 BG1T0896 TE033030 TE033030 41008652 STEFANI DE ZIANO
BG100527 3 3 TE033030 42006172 DE STEFANI ZIANO
-- And we still got extra, unwanted groups
BG1T0896 3 2 BG1T0896 TE033030 TE033030 41008652 STEFANI DE ZIANO
BG1T0896 3 3 TE033030 42006172 DE STEFANI ZIANO
TE033030 3 3 TE033030 42006172 DE STEFANI ZIANO
-- Mark's a perfect example of the missing/frustrating data I'm dealing with... but we still matched correctly
BG102188 3 1 BG102188 BG1T0543 41008250 BULINS MARK
BG102188 3 2 BG1T0543 TE908583 41008250 BULINS R.J.M.A.
BG102188 3 3 TE908583 41008250 BULINS RICHARD JOHANNES MARTINUS ALPHISIUS
-- Not wanted
BG1T0543 3 2 BG1T0543 TE908583 41008250 BULINS R.J.M.A.
BG1T0543 3 3 TE908583 41008250 BULINS RICHARD JOHANNES MARTINUS ALPHISIUS
TE908583 3 3 TE908583 41008250 BULINS RICHARD JOHANNES MARTINUS ALPHISIUS
-- One more for good measure
BG1T0146 3 1 BG1T0146 BG105905 BG105905 LUIJENT VALERIE
BG1T0146 3 2 BG105905 TE034165 42006121 LUIJENT VALERIE
BG1T0146 3 3 TE034165 42006121 LUIJENT VALERIE
BG105905 3 2 BG105905 TE034165 42006121 LUIJENT VALERIE
BG105905 3 3 TE034165 42006121 LUIJENT VALERIE
TE034165 3 3 TE034165 42006121 LUIJENT VALERIE
Not sure if all that info makes it clearer or will make your eyes roll back into your head : )
Thanks for looking at this!
I think I have it. We have allowed ourselves to become fixated on the chronological order, whereas in fact it doesn't matter. Your START WITH clause should be 'NEW_USER_ID IS NULL'.
To get chronological order you could 'ORDER BY cudroot.node_level * -1'.
I would also recommend that you look at using a WITH clause to form your base data and perform the hierarchical query on that.
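Starting at NEW_USER_ID IS NULL only pays off if the walk also runs in the opposite direction, from each account to the account(s) renamed into it; in Oracle terms that would be something like CONNECT BY new_user_id = PRIOR user_id. The answer doesn't spell the direction out, so treat that as an assumption. The sketch below models the reversed walk with a recursive CTE in Python's sqlite3 on a tiny, hypothetical accounts table that holds the calculated new-user-ID column, using the two chains from the question. It is a stand-in for the Oracle CONNECT BY, not a drop-in replacement, and it assumes the chains contain no cycles (the Oracle query uses NOCYCLE for that). Each chain yields exactly one group, rooted at the currently active ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: user_id plus the calculated next account (NULL = active ID).
    CREATE TABLE accounts (user_id TEXT PRIMARY KEY, new_user_id TEXT);
    INSERT INTO accounts VALUES
        ('US0T0001', 'US001101'), ('US001101', 'EB0T0001'),
        ('EB0T0001', 'TE110001'), ('TE110001', NULL),
        ('A5093522', 'FG096489'), ('FG096489', 'A5093665'), ('A5093665', NULL);
""")

GROUPS_SQL = """
    WITH RECURSIVE chain(root_user, user_id, node_level) AS (
        -- START WITH new_user_id IS NULL: begin at each currently active ID
        SELECT user_id, user_id, 1 FROM accounts WHERE new_user_id IS NULL
        UNION ALL
        -- walk backwards: find the accounts that were renamed into the current one
        SELECT c.root_user, a.user_id, c.node_level + 1
        FROM chain AS c
        JOIN accounts AS a ON a.new_user_id = c.user_id
    )
    -- node_level DESC lists the oldest account first (same effect as node_level * -1)
    SELECT root_user, node_level, user_id FROM chain
    ORDER BY root_user, node_level DESC
"""
rows = conn.execute(GROUPS_SQL).fetchall()
for row in rows:
    print(row)
```

Because every walk starts from an account with no successor, an older ID such as FG096489 can only ever appear as a descendant, never as a root of its own group.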
Perhaps what you need here is multiple queries. Each query will find a subset of the records you are trying to find. Each query will hopefully be simpler and faster than a single, ginormous query. Something like:
where new_user_id is null and SAP ID is null
where new_user_id is not null and SAP ID is null
where new_user_id is null and SAP ID is not null
where new_user_id is not null and SAP ID is not null
(these are off-the-cuff examples)
I think part of the problem with solving this conundrum is that the problem space is too large. By subdividing this problem into smaller pieces, each piece will be workable.