Interesting SQL Sorting Issue - sql

It's crunch time: the deadline for my most recent contract is two days away, and almost everything is complete and working fine (knock on wood) except for one issue.
In one of my stored procedures, I need to return a result set as follows.
group_id name
-------- --------
A101 Craig
A102 Craig
Z101 Craig
Z102 Craig
A101 Jim
A102 Jim
Z101 Jim
Z102 Jim
B101 Andy
B102 Andy
Z101 Andy
Z102 Andy
The names need to be sorted by the first character of the group id and also include the Z101/Z102 entries. By sorting strictly by the group id, I get a result set as follows:
group_id name
-------- --------
A101 Craig
A102 Craig
A101 Jim
A102 Jim
B101 Andy
B102 Andy
Z101 Andy
Z102 Andy
Z101 Craig
Z102 Craig
Z101 Jim
Z102 Jim
I really can't think of a solution that doesn't involve me making a cursor and bloating the stored procedure up more than it already is. I'm sure a great mind out there has an elegant solution and I'm eager to see what the community can come up with.
Thanks a ton in advance.
Edit: Let me expand :) I'm sorry, it's late and I'm coffee addled.
The above result set is a special case for a special type of data entry. Being transparent, we're making an election based website and these are going to be candidates sorted by office, name, and then district.
Most offices have multiple districts in them except for district positions like magistrate/coroner, which will have only one. The Z comes in as the "district" for absentee machine and absentee paper votes.
The non-magistrate positions can be sorted by name first, as they are all grouped together. However, the existing system lists all magistrates in a huge clump of information, when they should be sorted by individual districts. This is where the issue lies.
To protect my pride, I want to add that I had no control over the normalization of the database. It was given to me by the client.
Here's the order clause of my stored procedure, if it helps:
ORDER BY candidate.party,
candidate.ballot_name,
CASE WHEN candidate.district_type = 'MAG' THEN LEFT(votecount.precinct_id, 1) END,
candidate.last_name,
candidate.first_name,
precinct.name
Edit 2: Here's where I currently stand (1:43 A.M.) -
I'm using a suggestion below to create a conditional inner join as follows:
IF candidate.district_type = 'MAG'
BEGIN
(
SELECT candidate.id AS candidate_id, candidate.last_name, LEFT(votecount.precinct_id, 1) AS district, votecount.precinct_id
FROM candidate
INNER JOIN votecount
ON votecount.candidate_id = candidate.id
GROUP BY name
) mag_order
INNER JOIN mag_order
ON mag_order.candidate_id = candidate.id
END
and then I'll sort it by mag_order.district, candidate.precinct_id, candidate.last_name.
For some reason I'm getting a SQL error when aliasing the ( SELECT ) as mag_order. Anyone see anything wrong with the code? I can't for the life of me. Sorry this is a bit tangential.

SELECT g1.group_id, g1.name
FROM
groups g1
INNER JOIN
(
-- lowest group_id per name, aliased so the outer ORDER BY can reference it
SELECT MIN(group_id) AS group_id, name
FROM groups
GROUP BY name
) g2 ON g1.name = g2.name
ORDER BY g2.group_id, g1.name, g1.group_id
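As a quick check of the join-on-minimum idea, here is a minimal sketch that runs it against the question's sample rows using Python's built-in sqlite3 module. The table name `roster`, the alias `first_group_id`, and the helper `sort_roster` are hypothetical names chosen for this illustration; the production query would run against the real schema.

```python
import sqlite3

# Sample rows from the question, listed in the desired output order.
ROSTER_ROWS = [
    ("A101", "Craig"), ("A102", "Craig"), ("Z101", "Craig"), ("Z102", "Craig"),
    ("A101", "Jim"), ("A102", "Jim"), ("Z101", "Jim"), ("Z102", "Jim"),
    ("B101", "Andy"), ("B102", "Andy"), ("Z101", "Andy"), ("Z102", "Andy"),
]

# Each name is ranked by its lowest group_id, then rows are listed per name.
ROSTER_QUERY = """
SELECT r1.group_id, r1.name
FROM roster r1
INNER JOIN (
    SELECT MIN(group_id) AS first_group_id, name
    FROM roster
    GROUP BY name
) r2 ON r1.name = r2.name
ORDER BY r2.first_group_id, r1.name, r1.group_id
"""

def sort_roster(rows):
    """Load rows into an in-memory table (hypothetical name) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE roster (group_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO roster VALUES (?, ?)", rows)
    result = conn.execute(ROSTER_QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for group_id, name in sort_roster(ROSTER_ROWS):
        print(group_id, name)
```

Craig and Jim both start at A101, so the tie is broken by name; Andy follows because his lowest group is B101.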

ORDER BY name DESC, SUBSTR(group_id, 1, 1), group_id

SELECT groupId, name
FROM table
ORDER BY getFirstGroupId(name), name, groupId
Then your getFirstGroupId() function would return the first groupId for that name:
SELECT MIN(groupId)
FROM groupTable
WHERE name = #name

Related

Merge SQL Rows in Subquery

I am trying to work with two tables on BigQuery. From table1 I want to find the accession ID of all records that are "World", and then from each of those accession numbers I want to create a column with every name in a separate row. Unfortunately, when I run this:
Select name
From `table2`
Where acc IN (Select acc
From `table1`
WHERE source = 'World')
Instead of getting something like this:
Acc1 Acc2 Acc3
---- ---- ----
Jeff Jeff Ted
Chris Ted Blake
Rob Jack Jack
I get something more like this:
row name
--- ----
1 Jeff
2 Chris
3 Rob
4 Jack
5 Jeff
6 Jack
7 Ted
8 Blake
Ultimately, I am hoping to download the data and somehow use python or something to take each name and count the number of times it shows up with each other name at a given accession number, and furthermore measure the degree to which each pairing is also found with third names in any given column, i.e. the degree to which they share a cohort. So I need to preserve the groupings which exist with each accession number, but I am struggling to find info on how one might do this.
Could anybody point me in the right direction, or tell me whether this approach is sensible given my end goal?
Thanks!
This is not a direct answer to the question you asked. In general, it is easier to handle multiple rows rather than multiple columns.
So, I would recommend that you put each acc value in a separate row and then list the names as an array:
select t2.acc, array_agg(t2.name order by t2.name) as names
from `table2` t2
where t2.acc in (Select t1.acc
From `table1` t1
where t1.source = 'World'
)
group by t2.acc;
Otherwise, you are going to have a challenge just naming the columns in your result set.
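Once the names are grouped per accession, the pair counting the question mentions becomes straightforward in Python. The following is only a sketch: the function `pair_counts` and the `example_groups` dictionary are hypothetical, and the sample data assumes the question's desired table is read column by column.

```python
from collections import Counter
from itertools import combinations

def pair_counts(names_by_acc):
    """Count how many accessions each unordered pair of names shares."""
    counts = Counter()
    for names in names_by_acc.values():
        # De-duplicate and sort so a pair is counted once per accession
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts

# Hypothetical data shaped like one array of names per acc value.
example_groups = {
    "Acc1": ["Jeff", "Chris", "Rob"],
    "Acc2": ["Jeff", "Ted", "Jack"],
    "Acc3": ["Ted", "Blake", "Jack"],
}

example_counts = pair_counts(example_groups)

if __name__ == "__main__":
    for pair, n in example_counts.most_common():
        print(pair, n)
```

Keeping one row per accession, as suggested above, is what makes this loop simple: each group of names is already together.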

Compute number of direct report for each employee in the organization (aggregation)

FYI I use Redshift SQL.
I have a database that looks roughly like the one below (the database has multiple columns that I'll abstract away for simplicity).
This table is a representation of the hierarchical tree within my organization.
employee manager
-------- -------
daniel louis
matt martha
martha kim
laura matt
michael martha
...
As you can see, matt appears in two distinct records: once as an employee and once as laura's manager. Martha appears in three records: once as an employee and twice as a manager.
I'd like to find a way to compute the number of direct reports each employee has. A conditional count in which the criteria would be where employee = manager, perhaps?
I guess I could find this information using a subquery and then join it back but I was wondering if there was a more "elegant" way to do this making use of window functions maybe.
The expected output for the table above would be:
employee manager direct_reports
-------- ------- --------------
daniel louis 0
matt martha 1
martha kim 2
laura matt 0
michael martha 0
...
I would approach this with a correlated subquery:
select
t.employee,
t.manager,
(select count(*) from mytable t1 where t1.manager = t.employee) direct_reports
from mytable t
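This should be quite an efficient method, especially with an index that leads with manager, the column the subquery filters on. (Redshift itself has no conventional indexes; a sort key on manager would play a similar role there.)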
Use a left join and aggregation:
select em.employee, em.manager, count(ew.employee) as direct_reports
from employees em left join
employees ew
on ew.manager = em.employee
group by em.employee, em.manager;
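To check the left join approach against the sample data from the question, here is a small sketch using Python's built-in sqlite3 module rather than Redshift. The helper `count_direct_reports`, the column alias, and the added ORDER BY are assumptions made only so the output is deterministic.

```python
import sqlite3

# Sample hierarchy from the question: (employee, manager).
EMPLOYEE_ROWS = [
    ("daniel", "louis"),
    ("matt", "martha"),
    ("martha", "kim"),
    ("laura", "matt"),
    ("michael", "martha"),
]

# COUNT(ew.employee) skips the NULLs produced by the left join,
# so employees with no reports get 0 instead of 1.
REPORTS_QUERY = """
SELECT em.employee, em.manager, COUNT(ew.employee) AS direct_reports
FROM employees em
LEFT JOIN employees ew ON ew.manager = em.employee
GROUP BY em.employee, em.manager
ORDER BY em.employee
"""

def count_direct_reports(rows):
    """Load rows into an in-memory table and return the aggregated result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (employee TEXT, manager TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    result = conn.execute(REPORTS_QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in count_direct_reports(EMPLOYEE_ROWS):
        print(row)
```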

SQL Combine null rows with non null

Due to the way a particular table is written, I need to do something a little strange in SQL and I can't find a 'simple' way to do this.
Table
Name Place Amount
Chris Scotland NULL
Chris NULL £1
Amy England NULL
Amy NULL £5
Output
Chris Scotland £1
Amy England £5
What I am trying to do is shown above: the null values are essentially ignored and the rows are 'grouped' up based on the Name.
I have this working using For XML, but it is incredibly slow. Is there a smarter way to do this?
This is where MAX would work
select
Name
,Place = Max(Place)
,Amount = Max(Amount)
from
YourTable
group by
Name
Naturally, if you have more than one occurrence of a place for a given name, you may get unexpected results.
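The reason MAX works here is that aggregate functions skip NULLs, so each group collapses to its single non-null value per column. A minimal sketch with Python's built-in sqlite3 module follows; the table name `payments` and the helper `combine_rows` are hypothetical, and standard `AS` aliases are used instead of the `alias = expression` form.

```python
import sqlite3

# Sample rows from the question; None becomes NULL in SQLite.
PAYMENT_ROWS = [
    ("Chris", "Scotland", None),
    ("Chris", None, "£1"),
    ("Amy", "England", None),
    ("Amy", None, "£5"),
]

# MAX ignores NULLs, so each column keeps its one non-null value per name.
COMBINE_QUERY = """
SELECT name, MAX(place) AS place, MAX(amount) AS amount
FROM payments
GROUP BY name
ORDER BY name
"""

def combine_rows(rows):
    """Load rows into an in-memory table and collapse them by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (name TEXT, place TEXT, amount TEXT)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    result = conn.execute(COMBINE_QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in combine_rows(PAYMENT_ROWS):
        print(row)
```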

MS Access: Selecting the first item according to a rank

Imagine I have a query called QueryA that returns stuff like this:
Employee Description Rank
John Happy 1
John Depressed 3
James Happy 1
James Confused 2
Mark Depressed 3
I am trying to make a query that grabs the Employee and the Description, but only one description -- the one with the best "rank." (the lower the rank the better). I sort QueryA by Employee then by Rank (descending).
So I'd want my new query QueryB to show that John as Happy, James as Happy, and Mark as Depressed.
However, when I try selecting Employee and then First of Description, it doesn't always work.
I'm not able to check this for Access, but it should work fine. Check my SQL Fiddle
select
r.employee, d.description
from
table1 as d
inner join (select min(rank) as rank, employee
from
table1
group by employee) r on d.rank = r.rank
and d.employee = r.employee
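Since the answer was not tested in Access, here is a sketch that at least exercises the same join-on-minimum-rank logic against the sample data, using Python's built-in sqlite3 module. This says nothing about Access-specific syntax. The table name `moods`, the column name `rnk` (renamed because `rank` is reserved in some engines), and the helper `best_description` are assumptions for this illustration.

```python
import sqlite3

# Sample rows from QueryA: (employee, description, rank).
MOOD_ROWS = [
    ("John", "Happy", 1),
    ("John", "Depressed", 3),
    ("James", "Happy", 1),
    ("James", "Confused", 2),
    ("Mark", "Depressed", 3),
]

# The subquery finds each employee's lowest rank; the join keeps
# only the row that carries that rank.
BEST_QUERY = """
SELECT d.employee, d.description
FROM moods AS d
INNER JOIN (
    SELECT employee, MIN(rnk) AS rnk
    FROM moods
    GROUP BY employee
) AS r ON d.rnk = r.rnk AND d.employee = r.employee
ORDER BY d.employee
"""

def best_description(rows):
    """Load rows into an in-memory table and return the best-ranked description per employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE moods (employee TEXT, description TEXT, rnk INTEGER)")
    conn.executemany("INSERT INTO moods VALUES (?, ?, ?)", rows)
    result = conn.execute(BEST_QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in best_description(MOOD_ROWS):
        print(row)
```

If two descriptions share an employee's lowest rank, this approach returns both rows, so a further tie-breaker would be needed in that case.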

Group by a field not in select

I want to find how many modules a lecturer taught in a specific year, selecting the name of the lecturer and the number of modules for that lecturer.
The problem is that because I am selecting the name, I have to group by name to make it work. But what if there are two lecturers with the same name? Then SQL will merge them into one row, and the output would be wrong.
So what I really want to do is select the name but group by ID, which SQL is not allowing me to do. Is there a way around it?
Below are the tables:
Lecturer(lecturerID, lecturerName)
Teaches(lecturerID, moduleID, year)
This is my query so far:
SELECT l.lecturerName, COUNT(moduleID) AS NumOfModules
FROM Lecturer l , Teaches t
WHERE l.lecturerID = t.lecturerID
AND year = 2011
GROUP BY l.lecturerName --I want lecturerID here, but it doesn't run if I do that
SELECT a.lecturerName, b.NumOfModules
FROM Lecturer a,(
SELECT l.lecturerID, COUNT(moduleID) AS NumOfModules
FROM Lecturer l , Teaches t
WHERE l.lecturerID = t.lecturerID
AND year = 2011
GROUP BY l.lecturerID) b
WHERE a.lecturerID = b.lecturerID
You should probably just group by lecturerID and include it in the select column list. Otherwise, you're going to end up with two rows containing the same name with no way to distinguish between them.
You raise the problem of "wrong output" when grouping just by name but "undecipherable output" is just as big a problem. In other words, your desired output (grouping by ID but giving name):
lecturerName Module
------------ ------
Bob Smith 1
Bob Smith 2
is no better than your erroneous output (grouping by, and giving, name):
lecturerName Module
------------ ------
Bob Smith 3
since, while you now know that one of the lecturers taught two modules and the other taught one, you have no idea which is which.
The better output (grouping by ID and displaying both ID and name) would be:
lecturerId lecturerName Module
---------- ------------ ------
314159 Bob Smith 1
271828 Bob Smith 2
And, yes, I'm aware this doesn't answer your specific request but sometimes the right answer to "How do I do XYZZY?" is "Don't do XYZZY, it's a bad idea for these reasons ...".
Things like writing operating systems in COBOL, accounting packages in assembler, or anything in Pascal come to mind instantly :-)
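The "group by ID and display both ID and name" suggestion can be sketched as follows, using Python's built-in sqlite3 module and the Lecturer/Teaches schema from the question. The module IDs, the extra 2010 row, and the helper `modules_per_lecturer` are made up for illustration; the two lecturer IDs come from the example output above.

```python
import sqlite3

LECTURERS = [(314159, "Bob Smith"), (271828, "Bob Smith")]

# (lecturerID, moduleID, year); module IDs are hypothetical.
TEACHES = [
    (314159, "M1", 2011),
    (271828, "M2", 2011),
    (271828, "M3", 2011),
    (271828, "M4", 2010),  # outside the requested year, so not counted
]

# Grouping by both ID and name keeps same-named lecturers apart
# while still allowing the name in the select list.
MODULES_QUERY = """
SELECT l.lecturerID, l.lecturerName, COUNT(t.moduleID) AS NumOfModules
FROM Lecturer l
INNER JOIN Teaches t ON t.lecturerID = l.lecturerID
WHERE t.year = 2011
GROUP BY l.lecturerID, l.lecturerName
ORDER BY l.lecturerID
"""

def modules_per_lecturer(lecturers, teaches):
    """Load both tables into an in-memory database and count modules per lecturer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Lecturer (lecturerID INTEGER, lecturerName TEXT)")
    conn.execute("CREATE TABLE Teaches (lecturerID INTEGER, moduleID TEXT, year INTEGER)")
    conn.executemany("INSERT INTO Lecturer VALUES (?, ?)", lecturers)
    conn.executemany("INSERT INTO Teaches VALUES (?, ?, ?)", teaches)
    result = conn.execute(MODULES_QUERY).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in modules_per_lecturer(LECTURERS, TEACHES):
        print(row)
```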
You could subquery your count statement.
SELECT lecturername,
(SELECT Count(*)
FROM teaches t
WHERE t.lecturerid = l.lecturerid
AND t.year = 2011) AS NumOfModules
FROM lecturer l
Note there are other ways of doing this. If you also want to eliminate the rows with no modules, you can try:
SELECT *
FROM (SELECT lecturername,
(SELECT Count(*)
FROM teaches t
WHERE t.lecturerid = l.lecturerid
AND t.year = 2011) AS NumOfModules
FROM lecturer l) AS temp
WHERE temp.numofmodules > 0