I'm having trouble understanding how to do a multi-table join without generating lots of duplicate fields.
Let's say that I have three tables:
family: id, name
parent: id, family, name
child: id, family, name
If I do a simple select:
select family.id, family.name from family
order by family.id;
I get a simple list:
ID Name
1 Smith
2 Jones
3 Wong
If I add an inner join:
select family.id, family.name, parent.first_name, parent.last_name
from family
inner join parent
on parent.family = family.id
order by family.id;
I get some duplicated fields:
ID Name Parent
1 Smith Howard Smith
1 Smith Janet Smith
2 Jones Phil Jones
2 Jones Harriet Jones
3 Wong Billy Wong
3 Wong Rachel Wong
And if I add another inner join:
select family.id, family.name, parent.first_name, parent.last_name, child.first_name, child.last_name
from family
inner join parent
on parent.family = family.id
inner join child
on child.family = family.id
order by family.id;
I get even more duplicated fields:
ID Name Parent Child
1 Smith Howard Smith Peter Smith
1 Smith Howard Smith Sally Smith
1 Smith Howard Smith Fred Smith
1 Smith Janet Smith Peter Smith
1 Smith Janet Smith Sally Smith
1 Smith Janet Smith Fred Smith
2 Jones Phil Jones Mark Jones
2 Jones Phil Jones Melissa Jones
2 Jones Harriet Jones Mark Jones
2 Jones Harriet Jones Melissa Jones
3 Wong Billy Wong Mary Wong
3 Wong Billy Wong Jennifer Wong
3 Wong Rachel Wong Mary Wong
3 Wong Rachel Wong Jennifer Wong
What I would prefer, because it's more human readable, is something like this:
ID  Name   Parent         Child
1   Smith  Howard Smith   Peter Smith
           Janet Smith    Sally Smith
                          Fred Smith
2   Jones  Phil Jones     Mark Jones
           Harriet Jones  Melissa Jones
3   Wong   Billy Wong     Mary Wong
           Rachel Wong    Jennifer Wong
I know that one of the benefits of an inner join is to avoid presenting excess information through a Cartesian product. But it seems that I get something similar with a multi-table join. Is there a way to summarize each group as shown above or will this require post-processing with a scripting language like Python?
Thanks,
--Dan
This is precisely the way relational databases work: each row must carry every field that you request, so each row makes sense in isolation from all other rows. If you run a single query and need all three levels of information, you have to eliminate the duplicates yourself to get the desired formatting.
Alternatively, you can run three separate queries and then do the joins in memory, in code. Although this may be desirable in certain rare situations, it is generally the wrong way to spend your development time, because an RDBMS is usually much more efficient at joining relational data.
You've hit it on the head. You'll need some post processing to get the results you're looking for.
SQL query results are always simple tabular data, so to get the results you're looking for would definitely not be a pretty query. You could do it, but it would involve quite a bit of query voodoo, storing things in temporary tables or using cursors, or some other funky workaround.
I'd definitely suggest using an external application to retrieve your data and format it appropriately from there.
ORMs like Entity Framework in .NET can probably do this pretty easily, but you could definitely do this with a few nested collections or dictionaries in any language.
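A minimal sketch of the nested-collections idea in Python, assuming an in-memory SQLite database with the question's schema (a single name column per table) and a subset of its sample data. The helper names_by_family and the column widths are made up for illustration. It fetches parents and children separately, groups them by family id, and prints the family columns only on the first line of each group:

```python
import sqlite3
from itertools import zip_longest

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table family (id integer primary key, name text);
    create table parent (id integer primary key, family integer, name text);
    create table child  (id integer primary key, family integer, name text);
    insert into family values (1, 'Smith'), (2, 'Jones');
    insert into parent (family, name) values
        (1, 'Howard Smith'), (1, 'Janet Smith'),
        (2, 'Phil Jones'), (2, 'Harriet Jones');
    insert into child (family, name) values
        (1, 'Peter Smith'), (1, 'Sally Smith'), (1, 'Fred Smith'),
        (2, 'Mark Jones'), (2, 'Melissa Jones');
""")

def names_by_family(table):
    """Map family id -> list of names (hypothetical helper)."""
    grouped = {}
    # table is one of two fixed literals below, never user input.
    query = f"select family, name from {table} order by id"
    for family_id, name in conn.execute(query):
        grouped.setdefault(family_id, []).append(name)
    return grouped

parents = names_by_family("parent")
children = names_by_family("child")

report = []
for family_id, family_name in conn.execute("select id, name from family order by id"):
    # Pair parents and children side by side; pad the shorter list with blanks.
    pairs = zip_longest(parents.get(family_id, []),
                        children.get(family_id, []), fillvalue="")
    for i, (parent, child) in enumerate(pairs):
        # Show the family columns only on the first line of each group.
        left = (family_id, family_name) if i == 0 else ("", "")
        report.append((*left, parent, child))

for row in report:
    print("{:<4}{:<7}{:<15}{:<15}".format(*row).rstrip())
```

Note that the pairing of a parent with a child on the same line is purely cosmetic here; the two lists are independent.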
Related
I need to create a Supervisors column from a JSON-format column (AllParticipants). It should contain the name(s) that do not appear in the two other name columns (Employee and Client). There can be several supervisors at the same time, or no supervisor at all.
Id  Employee         Client              AllParticipants
1   Justin Bieber    Ariana Grande       [{"ParticipantName":"Justin Bieber"},{"ParticipantName":"Ariana Grande"}]
2   Lionel Messi     Christiano Ronaldo  [{"ParticipantName":"Christiano Ronaldo"},{"ParticipantName":"Lionel Messi"}]
3   Nicolas Cage     Robert De Niro      [{"ParticipantName":"Robert De Niro"},{"ParticipantName":"Nicolas Cage"},{"ParticipantName":"Brad Pitt"}]
4   Harry Potter     Ron Weasley         [{"ParticipantName":"Ron Weasley"},{"ParticipantName":"Albus Dumbldor"},{"ParticipantName":"Harry Potter"},{"ParticipantName":"Lord Voldemort"}]
5   Tom Holland      Henry Cavill        [{"ParticipantName":"Henry Cavill"},{"ParticipantName":"Tom Holland"}]
6   Spider Man       Venom               [{"ParticipantName":"Venom"},{"ParticipantName":"Iron Man"},{"ParticipantName":"Superman"},{"ParticipantName":"Spider Man"}]
7   Andrew Garfield  Leonardo DiCaprio   [{"ParticipantName":"Tom Cruise"},{"ParticipantName":"Andrew Garfield"},{"ParticipantName":"Leonardo DiCaprio"}]
8   Dwayne Johnson   Jennifer Lawrence   [{"ParticipantName":"Jennifer Lawrence"},{"ParticipantName":"Dwayne Johnson"}]
The output column I need:
Supervisors
NULL
NULL
Brad Pitt
Albus Dumbldor, Lord Voldemort
NULL
Iron Man, Superman
Tom Cruise
NULL
I've tried to create extra columns to use Case expression after that, but it seems too complex.
SELECT *,
JSON_VALUE(w.AllParticipants,'$[0].ParticipantName') AS ParticipantName1,
JSON_VALUE(w.AllParticipants,'$[1].ParticipantName') AS ParticipantName2,
JSON_VALUE(w.AllParticipants,'$[2].ParticipantName') AS ParticipantName3,
JSON_VALUE(w.AllParticipants,'$[3].ParticipantName') AS ParticipantName4
FROM Work AS w
I'm wondering if there is an easy way to compare values and extract only unique ones.
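One way to see the logic, independent of any SQL dialect, is a small Python sketch: parse the JSON array, drop the names equal to Employee or Client, and join whatever is left. The function name supervisors is made up for illustration, and the rows are a subset of the sample data above. In SQL Server the same idea would probably be expressed with OPENJSON and STRING_AGG, but that is an assumption about the target version rather than something tested here.

```python
import json

def supervisors(employee, client, all_participants_json):
    """Return the participants other than employee and client, or None."""
    names = [p["ParticipantName"] for p in json.loads(all_participants_json)]
    # Keep the original order of the JSON array.
    extra = [n for n in names if n not in (employee, client)]
    return ", ".join(extra) if extra else None

rows = [
    ("Nicolas Cage", "Robert De Niro",
     '[{"ParticipantName":"Robert De Niro"},{"ParticipantName":"Nicolas Cage"},'
     '{"ParticipantName":"Brad Pitt"}]'),
    ("Tom Holland", "Henry Cavill",
     '[{"ParticipantName":"Henry Cavill"},{"ParticipantName":"Tom Holland"}]'),
    ("Spider Man", "Venom",
     '[{"ParticipantName":"Venom"},{"ParticipantName":"Iron Man"},'
     '{"ParticipantName":"Superman"},{"ParticipantName":"Spider Man"}]'),
]

results = [supervisors(*row) for row in rows]
for (employee, client, _), result in zip(rows, results):
    print(employee, "|", client, "|", result)
```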
I have tried to simplify my question with the following example:
I have a table with the following data:
Marker Name Location
1 Eric Benson Mixed
2 John Smith Rural
3 A David Rural
4 B John Mixed
And I want to insert into the table:
Name Location
Andy Jones Mixed
Ian Davies Rural
How can I continue the sequence in the Marker column to end up with:
Marker Name Location
1 Eric Benson Mixed
2 John Smith Rural
3 A David Rural
4 B John Mixed
5 Andy Jones Mixed
6 Ian Davies Rural
If you do this in a stored procedure, you can look up the current maximum of Marker just before you insert.
(That only works if the Marker column is not an identity column.)
Like this:
declare @max_marker int
set @max_marker = isnull((select max(Marker) from table), 0)
-- Insert comes here
insert into table (Marker, Name, Location) values (@max_marker + 1, 'Andy Jones', 'Mixed')
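The same max-plus-one idea can be tried out with a short Python sketch against an in-memory SQLite database. The table name people and the helper insert_with_next_marker are assumptions for illustration, and SQLite's coalesce stands in for isnull. Computing the next marker inside the insert statement keeps the lookup and the insert in one statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table people (Marker integer, Name text, Location text)")
conn.executemany("insert into people values (?, ?, ?)", [
    (1, "Eric Benson", "Mixed"), (2, "John Smith", "Rural"),
    (3, "A David", "Rural"), (4, "B John", "Mixed"),
])

def insert_with_next_marker(conn, name, location):
    """Insert one row, using max(Marker) + 1 as its marker (hypothetical helper)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            # coalesce covers the empty-table case, where max() returns NULL.
            "insert into people (Marker, Name, Location) "
            "select coalesce(max(Marker), 0) + 1, ?, ? from people",
            (name, location),
        )

insert_with_next_marker(conn, "Andy Jones", "Mixed")
insert_with_next_marker(conn, "Ian Davies", "Rural")

rows = conn.execute(
    "select Marker, Name, Location from people order by Marker"
).fetchall()
for row in rows:
    print(row)
```

With concurrent writers, two sessions could still read the same maximum, so on a real server this would need appropriate locking or an identity/sequence column instead.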
The goal is to try and obtain two random sample cases per handler ID.
The data for this project is below.
ID Complaint Handler Handler ID Reference Outcome Handler Notes
1 John Doe h384 R38423 Uphold Not Applicable
2 Ryan Jones h632 R38482 Uphold Not Applicable
3 Chris Smith h238 R84823 Defend Not Applicable
4 Emily Surry h634 R48384 Reject Not Applicable
5 Elle Smith h123 R48823 Uphold Not Applicable
6 Jane Doe h324 R48282 Uphold Not Applicable
7 Joe Bloggs h538 R83322 Reject Not Applicable
8 Ryan Jones h632 R38299 Defend Not Applicable
9 Chris Smith h238 R83482 Reject Not Applicable
10 Chris Smith h238 R91823 Reject Not Applicable
11 Joe Bloggs h538 R18291 Uphold Not Applicable
I have used the following query to select all the unique case handler references.
SELECT Cases.[Handler ID]
FROM Cases
GROUP BY Cases.[Handler ID];
I then need to loop through all these unique references and execute the following query
SELECT TOP 2 *
FROM Cases
WHERE Cases.[Handler ID] = 'XXXXXX'
ORDER BY Rnd(ID)
An example of the result would be
ID Complaint Handler Handler ID Reference Outcome Handler Notes
1 John Doe h384 R38423 Uphold Not Applicable
2 Ryan Jones h632 R38482 Uphold Not Applicable
3 Chris Smith h238 R84823 Defend Not Applicable
4 Emily Surry h634 R48384 Reject Not Applicable
5 Elle Smith h123 R48823 Uphold Not Applicable
6 Jane Doe h324 R48282 Uphold Not Applicable
7 Joe Bloggs h538 R83322 Reject Not Applicable
8 Ryan Jones h632 R38299 Defend Not Applicable
10 Chris Smith h238 R91823 Reject Not Applicable
11 Joe Bloggs h538 R18291 Uphold Not Applicable
Result example: row 9 was randomly removed because there were three Chris Smith rows.
No other rows are affected because every other handler has two or fewer rows.
I think this will work in MS Access:
select c.*
from cases as c
where c.id in (select top 2 c2.id
from cases as c2
where c2.[Handler ID] = c.[Handler ID]
order by rnd(-Timer() * c2.id)
);
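If the query turns out to be awkward in Access, the per-handler sampling is also easy to do after fetching the rows. A minimal Python sketch, assuming the rows have already been loaded as (ID, Handler ID) pairs; the function name sample_per_handler is made up for illustration:

```python
import random
from collections import defaultdict

# (ID, Handler ID) pairs taken from the sample data in the question.
cases = [
    (1, "h384"), (2, "h632"), (3, "h238"), (4, "h634"), (5, "h123"),
    (6, "h324"), (7, "h538"), (8, "h632"), (9, "h238"), (10, "h238"),
    (11, "h538"),
]

def sample_per_handler(cases, k=2, rng=random):
    """Pick up to k random case IDs for each handler ID."""
    by_handler = defaultdict(list)
    for case_id, handler_id in cases:
        by_handler[handler_id].append(case_id)
    picked = []
    for ids in by_handler.values():
        # Handlers with k or fewer cases keep all of them.
        picked.extend(rng.sample(ids, min(k, len(ids))))
    return sorted(picked)

sample = sample_per_handler(cases)
print(sample)
```

Only handler h238 has more than two cases, so one of IDs 3, 9, and 10 is dropped on each run.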
I have a Column named 'Complete name'
I need to update people with any last name 'Smiht' to 'Smith' without losing the name and the second last name.
For example, now I have:
John Smiht G.
Sarah Connor Smiht
John Ford Connor
James Smiht Ford
And the result of the update has to be the same data, but with Smiht replaced by Smith:
John Smith G.
Sarah Connor Smith
John Ford Connor
James Smith Ford
Thanks!
The generic method is something like this:
update t
set CompleteName = replace(CompleteName, ' Smiht', ' Smith')
where CompleteName like '% Smiht%';
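To check the statement against the sample names, here is a small Python sketch using an in-memory SQLite database. The table name t and column name CompleteName follow the answer and are assumptions about the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (CompleteName text)")
conn.executemany("insert into t values (?)", [
    ("John Smiht G.",), ("Sarah Connor Smiht",),
    ("John Ford Connor",), ("James Smiht Ford",),
])

# The leading space in the pattern keeps the match to the start of a word.
cursor = conn.execute(
    "update t set CompleteName = replace(CompleteName, ' Smiht', ' Smith') "
    "where CompleteName like '% Smiht%'"
)
updated = cursor.rowcount

names = [row[0] for row in conn.execute("select CompleteName from t order by rowid")]
for name in names:
    print(name)
```

One caveat: the pattern only anchors the start of the word, so a longer surname that begins with Smiht would be changed as well.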
I have two tables:
RecommendedFriends and AddedFriends
Each of the tables has a User field and a Friends field. I am trying to figure out how many friends a User added who were also recommended to them. Here's an example of the tables:
RecommendedFriends
User Friends Time
------------------------------------
Jake Eric 8:00am
Jake John 8:00am
Jake Jack 8:30am
Greg John 8:30am
Greg Tim 9:00am
Greg Steve 9:30am
Will Jackson 9:30am
AddedFriends
User Friends Time
------------------------------------
Jake Jack 8:35am
Greg John 8:35am
Greg Tim 9:00pm
Greg Jim 10:30am
Greg Tina 10:45am
Greg Bob 10:00am
Charlie Brian 11:00am
So the table I need would look like this:
Results
User     RecFriends  AddFriends
------------------------------------
Jake     Eric
Jake     John
Jake     Jack        Jack
Greg     John        John
Greg     Tim         Tim
Greg     Steve
Greg                 Tina
Will     Jackson
Charlie              Brian
So I can go in and say 3 people added friends they were recommended, 4 Recommendations failed, and 2 people added someone they weren't recommended.
I think what you want is full outer join:
select coalesce(rf.user, af.user) as user, rf.Friends as RecFriends, af.Friends as AddFriends
from RecommendedFriends rf full outer join
AddedFriends af
on rf.user = af.user and
rf.Friends = af.Friends
This doesn't take time into account. You might want to check that the time of the add is after the time of the recommendation, if you want to infer causality between the recommendation and the add.
If you are using a database that doesn't support full outer join (can anyone say "MySQL"), you can get the same result doing:
select t.user, MAX(case when which = 'rec' then friends end) as RecFriends,
MAX(case when which = 'add' then friends end) as AddFriends
from ((select rf.user, rf.friends, 'rec' as which
from RecommendedFriends rf
) union all
(select af.user, af.friends, 'add' as which
from AddedFriends af
)
) t
group by t.user, t.friends
This version has the nice feature that it will not produce duplicate records, in the event of multiple recommendations or adds.
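A runnable sketch of the union all variant, using Python with an in-memory SQLite database and the sample data from the question. The table and column names (recommended, added, user_name, friend) are shortened stand-ins rather than the real schema, and the time column is left out. Grouping by both user and friend gives one row per (user, friend) pair, which can then be sorted into the three categories:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table recommended (user_name text, friend text)")
conn.execute("create table added (user_name text, friend text)")
conn.executemany("insert into recommended values (?, ?)", [
    ("Jake", "Eric"), ("Jake", "John"), ("Jake", "Jack"), ("Greg", "John"),
    ("Greg", "Tim"), ("Greg", "Steve"), ("Will", "Jackson"),
])
conn.executemany("insert into added values (?, ?)", [
    ("Jake", "Jack"), ("Greg", "John"), ("Greg", "Tim"), ("Greg", "Jim"),
    ("Greg", "Tina"), ("Greg", "Bob"), ("Charlie", "Brian"),
])

rows = conn.execute("""
    select user_name,
           max(case when which = 'rec' then friend end) as rec_friend,
           max(case when which = 'add' then friend end) as add_friend
    from (select user_name, friend, 'rec' as which from recommended
          union all
          select user_name, friend, 'add' as which from added) t
    group by user_name, friend
""").fetchall()

# Classify each (user, friend) pair by which side(s) it appeared on.
both = [r for r in rows if r[1] is not None and r[2] is not None]
rec_only = [r for r in rows if r[2] is None]
add_only = [r for r in rows if r[1] is None]

print("recommended and added:", len(both))
print("recommended, not added:", len(rec_only))
print("added, not recommended:", len(add_only))
```

These counts are per (user, friend) pair; counting distinct users in add_only would give the "people who added someone they weren't recommended" figure instead.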