SQL JOINing on max value, even if it is 0


I have two tables that look roughly like this:
Airports
uniqueID | Name
0001 | Dallas
Runways
uniqueID | AirportID | Length
000101 | 0001 | 8000
I'm doing a join that looks like this:
SELECT Airports.Name, Runways.Length FROM Airports, Runways
WHERE Airports.uniqueID = Runways.AirportID
Obviously, each runway has exactly one airport, and each airport has 1..n runways.
For an airport with multiple runways, this gives me several rows, one for each runway at that airport.
I want a result set that contains ONLY the row for the longest runway, i.e. MAX(Length).
Sometimes the Length is 0 for several runways in the database, because the source data is missing. In that case I want only one row, with Length = 0.
I've tried the approach laid out here: Inner Join table with respect to a maximum value, but that's not actually helpful, because it finds the longest runway overall, not the longest runway at each airport.

This seems too simple to be what you want, but it seems to meet all the cases you've described:
SELECT A.Name, MAX(R.Length)
FROM Airports A
INNER JOIN Runways R
  ON A.uniqueID = R.AirportID
GROUP BY A.uniqueID, A.Name
This should give you the longest runway for each airport, with exactly one row per airport even when all of its lengths are 0.
If you need additional columns, use the above as an inline view (a subquery in the FROM/JOIN clause) to limit the result set to just those airports and their longest runway.
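A minimal sketch that runs both ideas against an in-memory SQLite database through Python's standard `sqlite3` module. The sample rows are invented for illustration: a second airport, here called 'Nowhere', has two runways whose Length is 0. The second query shows the inline-view idea: the per-airport maximum is computed in a subquery and joined back to Runways to pull runway columns, with MIN(R.uniqueID) as an assumed tie-breaker so an airport whose runways are all 0 still yields one row. Both queries group by uniqueID as well as Name so that two airports sharing a name are not merged.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Airports (uniqueID TEXT PRIMARY KEY, Name TEXT);
        CREATE TABLE Runways (uniqueID TEXT PRIMARY KEY, AirportID TEXT, Length INTEGER);
        INSERT INTO Airports VALUES ('0001', 'Dallas'), ('0002', 'Nowhere');
        -- Hypothetical data: Nowhere has two runways with missing (0) lengths.
        INSERT INTO Runways VALUES
            ('000101', '0001', 8000),
            ('000102', '0001', 11000),
            ('000201', '0002', 0),
            ('000202', '0002', 0);
    """)
    return conn

def max_length_per_airport(conn):
    # GROUP BY query: one row per airport, even when every length is 0.
    return conn.execute("""
        SELECT A.Name, MAX(R.Length)
        FROM Airports A
        INNER JOIN Runways R ON A.uniqueID = R.AirportID
        GROUP BY A.uniqueID, A.Name
        ORDER BY A.uniqueID
    """).fetchall()

def longest_runway_rows(conn):
    # Inline view M holds each airport's maximum; joining back to Runways
    # fetches runway columns. Ties collapse to the smallest runway ID.
    return conn.execute("""
        SELECT A.Name, MIN(R.uniqueID) AS RunwayID, M.MaxLength
        FROM Airports A
        INNER JOIN (
            SELECT AirportID, MAX(Length) AS MaxLength
            FROM Runways
            GROUP BY AirportID
        ) M ON M.AirportID = A.uniqueID
        INNER JOIN Runways R
            ON R.AirportID = M.AirportID AND R.Length = M.MaxLength
        GROUP BY A.uniqueID, A.Name, M.MaxLength
        ORDER BY A.uniqueID
    """).fetchall()

conn = build_db()
print(max_length_per_airport(conn))  # [('Dallas', 11000), ('Nowhere', 0)]
print(longest_runway_rows(conn))     # [('Dallas', '000102', 11000), ('Nowhere', '000201', 0)]
```

Without the outer GROUP BY in the second query, 'Nowhere' would come back twice, once per 0-length runway, which is exactly the case the question wants to avoid.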

Related

How does SQL count(distinct) work in this case?

I'm trying to find the match number of the match in which Germany played against Poland. This is from https://www.w3resource.com/sql-exercises/soccer-database-exercise/sql-subqueries-exercise-soccer-database-4.php. There are two tables: match_details and soccer_country. I don't understand how COUNT(DISTINCT) works in this case. Can someone please clarify? Thanks!
SELECT match_no
FROM match_details
WHERE team_id = (
SELECT country_id
FROM soccer_country
WHERE country_name = 'Germany')
OR team_id = (
SELECT country_id
FROM soccer_country
WHERE country_name = 'Poland')
GROUP BY match_no
HAVING COUNT(DISTINCT team_id) = 2;

As Lamak mentioned, this is an ugly way to write the query, but there are many ways to approach it.
Counting DISTINCT team_id makes sure the match involves exactly 2 different teams. Because the WHERE clause keeps only Germany's and Poland's rows, a distinct count of 2 means both of them played in that match. If the data ever contains repeated rows (for example from a Cartesian result), the same team could appear more than once for a match; COUNT(DISTINCT team_id) counts each team only once, so repeats cannot pass for a second team.
That said, other match data structures I have seen store a single record per match, with a column for EACH team playing in it. That is far easier to query, but this structure still allows a relatively easy query.
Break the query down a little, and picture a large data set (not that this, or even a professional league, would have enough records to slow down a SQL engine).
Your first criterion is matches involving Germany, so let's start with that.
SELECT
md1.match_no
FROM
match_details md1
JOIN soccer_country sc1
on md1.team_id = sc1.country_id
AND sc1.country_name = 'Germany'
Why even look at any other record or match if Germany is not part of the match on either side? This step alone returns 6 matches from the sample data of 51 matches. Now all you need to do is join to the match details table a second time, for only those matches, and require that the second team is Poland:
SELECT
md1.match_no
FROM
match_details md1
JOIN soccer_country sc1
on md1.team_id = sc1.country_id
AND sc1.country_name = 'Germany'
-- joining again for the same match Germany was already qualified
JOIN match_details md2
on md1.match_no = md2.match_no
-- but we want the OTHER team record since Germany was first team
and md1.team_id != md2.team_id
-- and on to the second country table based on the SECOND team ID
JOIN soccer_country sc2
on md2.team_id = sc2.country_id
-- and the second team was Poland
AND sc2.country_name = 'Poland'
Yes, it may be a longer query, but by eliminating the other 45 matches up front (again, think of a LARGE database), you have already cut a lot of data down to a very small set, and then you keep only the Germany/Poland matches. No aggregates, counts, or DISTINCTs, just direct joins.
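A minimal sketch, using Python's `sqlite3` with a hypothetical cut-down schema (the real exercise tables have more columns), showing that the self-join and the original GROUP BY / COUNT(DISTINCT) query pick out the same match on a tiny data set with one row per team per match.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE soccer_country (country_id INTEGER PRIMARY KEY, country_name TEXT);
        CREATE TABLE match_details (match_no INTEGER, team_id INTEGER);
        INSERT INTO soccer_country VALUES (1, 'Germany'), (2, 'Poland'), (3, 'France');
        -- Hypothetical sample data: two rows per match, one per team.
        INSERT INTO match_details VALUES
            (1, 1), (1, 2),   -- Germany vs Poland
            (2, 1), (2, 3),   -- Germany vs France
            (3, 2), (3, 3);   -- Poland vs France
    """)
    return conn

# Self-join: qualify Germany first, then look for a different team in the same match.
SELF_JOIN = """
    SELECT md1.match_no
    FROM match_details md1
    JOIN soccer_country sc1
      ON md1.team_id = sc1.country_id AND sc1.country_name = 'Germany'
    JOIN match_details md2
      ON md1.match_no = md2.match_no AND md1.team_id != md2.team_id
    JOIN soccer_country sc2
      ON md2.team_id = sc2.country_id AND sc2.country_name = 'Poland'
"""

# The query from the question.
GROUP_BY = """
    SELECT match_no
    FROM match_details
    WHERE team_id = (SELECT country_id FROM soccer_country WHERE country_name = 'Germany')
       OR team_id = (SELECT country_id FROM soccer_country WHERE country_name = 'Poland')
    GROUP BY match_no
    HAVING COUNT(DISTINCT team_id) = 2
"""

def run(conn, sql):
    return sorted(row[0] for row in conn.execute(sql))

conn = build_db()
print(run(conn, SELF_JOIN), run(conn, GROUP_BY))  # [1] [1]
```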
FEEDBACK
Let's take a look at some BAD sample data... which, as all programmers know, never exists (NOT). Anyhow, let's look at these few matches (team names are shown instead of team IDs for simplicity):
Match | Team
52 | Poland
52 | Poland
53 | Germany
53 | Germany
If you ran the query with COUNT(team_id) instead of COUNT(DISTINCT team_id), both matches 52 and 53 would show up, because Poland appears 2 times for match 52 and Germany 2 times for match 53. With DISTINCT, each of those matches has only 1 distinct team, so both are excluded. Does that help? Again, no such thing as bad data :)
And here are a few more sample matches, where more than 2 teams, or only 1, were recorded:
Match | Team
54 | France
54 | Poland
54 | England
55 | Hungary
56 | Austria
NONE of these matches would be returned. Match 54 has 3 distinct teams, and of those only Poland passes the WHERE filter, so its distinct count is 1. Matches 55 and 56 have only a single entry each, so there is no opponent.
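The effect of DISTINCT on the sample data above can be checked with a small sketch using Python's `sqlite3`. Country IDs are hypothetical, and match 1 is an invented, well-formed Germany vs Poland match added as a control. The same query is run with COUNT(DISTINCT team_id) and with plain COUNT(team_id).

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE soccer_country (country_id INTEGER PRIMARY KEY, country_name TEXT);
        CREATE TABLE match_details (match_no INTEGER, team_id INTEGER);
        INSERT INTO soccer_country VALUES
            (1, 'Germany'), (2, 'Poland'), (3, 'France'),
            (4, 'England'), (5, 'Hungary'), (6, 'Austria');
        INSERT INTO match_details VALUES
            (1, 1), (1, 2),            -- genuine Germany vs Poland
            (52, 2), (52, 2),          -- Poland twice
            (53, 1), (53, 1),          -- Germany twice
            (54, 3), (54, 2), (54, 4), -- three teams
            (55, 5), (56, 6);          -- no opponent
    """)
    return conn

QUERY = """
    SELECT match_no
    FROM match_details
    WHERE team_id = (SELECT country_id FROM soccer_country WHERE country_name = 'Germany')
       OR team_id = (SELECT country_id FROM soccer_country WHERE country_name = 'Poland')
    GROUP BY match_no
    HAVING {count_expr} = 2
    ORDER BY match_no
"""

def matches(conn, count_expr):
    return [row[0] for row in conn.execute(QUERY.format(count_expr=count_expr))]

conn = build_db()
print(matches(conn, "COUNT(DISTINCT team_id)"))  # [1]
print(matches(conn, "COUNT(team_id)"))           # [1, 52, 53]
```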
2nd FEEDBACK
To clarify the query: in the short Germany-only query, the alias md1 is already positioned on a record for a Germany match. For the second join, to md2, I only care about the same match, so I join on the same match_no. The != operator means NOT EQUAL (! is logical NOT), so the join says: from md1, join to md2 on the same match number, but only where the teams are NOT the same. Since md1 already holds Germany's team ID (already qualified), md2 gives me the other team's ID, which I can then join to the country table to confirm it is Poland.
Does this now clarify things for you?

Partitioning join to limit records in SQL

I have 2 tables:
- first one containing spatial data - geometry of circles
- second contains geometries of lines.
I want to find all lines which are inside each circle. I have a query which can do that, however there are millions of records so it is unusably slow.
There is an area_id column in both tables: every circle is assigned to a particular area, and so is every line. If I could intersect each circle only with the lines in the matching area, that would reduce the load a lot. The problem is that I can't think of a solution, e.g. using a windowing function. The query I am using is:
SELECT ct.AREA_ID, ct.Circle_descr, lt.Line_descr
FROM circles_table AS ct
JOIN lines_table AS lt
  ON ct.Circle_location.STIntersects(lt.Line_location) = 1
(Adding a WHERE clause at the end makes no difference, as it is effectively evaluated as part of the slow join.)
circles_table
+---------------+-----------------------+---------------------------+
| AREA_ID (int) | Circle_descr(varchar) | Circle_location(geometry) |
+---------------+-----------------------+---------------------------+
lines_table
+---------------+---------------------+-------------------------+
| AREA_ID (int) | Line_descr(varchar) | Line_location(geometry) |
+---------------+---------------------+-------------------------+

Add an additional join criterion to partition the rows by area_id before comparing them. Something like:
SELECT ct.AREA_ID, ct.Circle_descr, lt.Line_descr
FROM circles_table AS ct
JOIN lines_table AS lt
  ON ct.area_id = lt.area_id
  AND ct.Circle_location.STIntersects(lt.Line_location) = 1
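Why the extra predicate helps can be sketched outside SQL Server. This is a hypothetical model, not SQL Server's spatial engine: circles and line segments are plain tuples, and `segment_intersects_circle` is a hand-written stand-in for STIntersects. Bucketing lines by area_id mimics what an equality join key allows, and the sketch counts how many geometry tests each strategy performs. The results only match when every intersecting pair shares an area_id, which is the assumption stated in the question.

```python
import math
from collections import defaultdict

def segment_intersects_circle(cx, cy, r, x1, y1, x2, y2):
    # Distance from the circle centre to the closest point of the segment, compared with r.
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((cx - x1) * dx + (cy - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    px, py = x1 + t * dx, y1 + t * dy
    return math.hypot(cx - px, cy - py) <= r

def join_all_pairs(circles, lines):
    # Original query shape: every circle is tested against every line.
    tests, hits = 0, []
    for c_area, c_descr, (cx, cy, r) in circles:
        for _l_area, l_descr, seg in lines:
            tests += 1
            if segment_intersects_circle(cx, cy, r, *seg):
                hits.append((c_area, c_descr, l_descr))
    return sorted(hits), tests

def join_by_area(circles, lines):
    # Partitioned: a circle is only tested against lines in its own area.
    lines_by_area = defaultdict(list)
    for l_area, l_descr, seg in lines:
        lines_by_area[l_area].append((l_descr, seg))
    tests, hits = 0, []
    for c_area, c_descr, (cx, cy, r) in circles:
        for l_descr, seg in lines_by_area[c_area]:
            tests += 1
            if segment_intersects_circle(cx, cy, r, *seg):
                hits.append((c_area, c_descr, l_descr))
    return sorted(hits), tests

# Hypothetical rows: (area_id, descr, geometry); circles are (cx, cy, r), lines are (x1, y1, x2, y2).
circles = [(1, "c1", (0.0, 0.0, 1.0)), (2, "c2", (10.0, 10.0, 1.0))]
lines = [
    (1, "l1", (-2.0, 0.0, 2.0, 0.0)),     # crosses c1
    (1, "l2", (5.0, 5.0, 6.0, 5.0)),      # same area as c1, misses it
    (2, "l3", (10.0, 9.5, 10.0, 12.0)),   # crosses c2
    (2, "l4", (20.0, 20.0, 21.0, 21.0)),  # misses c2
]
print(join_all_pairs(circles, lines))  # ([(1, 'c1', 'l1'), (2, 'c2', 'l3')], 8)
print(join_by_area(circles, lines))    # ([(1, 'c1', 'l1'), (2, 'c2', 'l3')], 4)
```

With millions of rows spread over many areas, the number of expensive geometry tests drops from roughly circles times lines to the sum of per-area products.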

SSRS query and WHERE with multiple

I'm new to SQL and SSRS. I can already do many things, but I think I must be missing some basics, so I keep banging my head against the wall.
I have a report that is almost working, but it needs to include more results, based on conditions.
My working query so far is like this:
SELECT projects.project_number, project_phases.project_phase_id, project_phases.project_phase_number, project_phases.project_phase_header, project_phase_expensegroups.projectphase_expense_total, invoicerows.invoicerow_total
FROM projects INNER JOIN
project_phases ON projects.project_id = project_phases.project_id
LEFT OUTER JOIN
project_phase_expensegroups ON project_phases.project_phase_id = project_phase_expensegroups.project_phase_id
LEFT OUTER JOIN
invoicerows ON project_phases.project_phase_id = invoicerows.project_phase_id
WHERE ( projects.project_number = #iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total >0 )
The parameter comes from a selection list that is used to choose a project for the report.
How can I also include records where project_phase_expensegroups.projectphase_expense_total is 0, but there might be invoices for that project phase?
I already tried adding another condition like this:
WHERE ( projects.project_number = #iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total > 0 )
OR
( invoicerows.invoicerow_total > 0 )
It does return more results, including the one where projectphase_expense_total is 0, but the report is a total mess.
So my question is: what am I doing wrong here?

There is a core problem with your query: you are left joining to two tables, implying that rows may not exist, but then putting conditions on those tables that eliminate the NULL rows the outer joins produce. That makes your query internally inconsistent as it stands.
The next problem is that you're joining two tables to project_phases that may both have multiple rows per phase. Since these data are not related to each other (as shown by the fact that you have no join condition between project_phase_expensegroups and invoicerows), your query is not going to work correctly. For example, given a list of people, a list of those people's favorite foods, and a list of their favorite colors like so:
People
Person
------
Joe
Mary
FavoriteFoods
Person Food
------ ---------
Joe Broccoli
Joe Bananas
Mary Chocolate
Mary Cake
FavoriteColors
Person Color
------ ----------
Joe Red
Joe Blue
Mary Periwinkle
Mary Fuchsia
When you join these with links between Person <-> Food and Person <-> Color, you'll get a result like this:
Person Food Color
------ --------- ----------
Joe Broccoli Red
Joe Bananas Red
Joe Broccoli Blue
Joe Bananas Blue
Mary Chocolate Periwinkle
Mary Chocolate Fuchsia
Mary Cake Periwinkle
Mary Cake Fuchsia
This is essentially a cross-join, also known as a Cartesian product, between the Foods and the Colors, because they have a many-to-one relationship with each person, but no relationship with each other.
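The multiplication is easy to reproduce. A minimal sketch using Python's `sqlite3` with the People, FavoriteFoods, and FavoriteColors tables from the example, joined on Person only:

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE People (Person TEXT);
        CREATE TABLE FavoriteFoods (Person TEXT, Food TEXT);
        CREATE TABLE FavoriteColors (Person TEXT, Color TEXT);
        INSERT INTO People VALUES ('Joe'), ('Mary');
        INSERT INTO FavoriteFoods VALUES ('Joe', 'Broccoli'), ('Joe', 'Bananas'),
                                         ('Mary', 'Chocolate'), ('Mary', 'Cake');
        INSERT INTO FavoriteColors VALUES ('Joe', 'Red'), ('Joe', 'Blue'),
                                          ('Mary', 'Periwinkle'), ('Mary', 'Fuchsia');
    """)
    return conn

def joined_rows(conn):
    # Foods and colors are each linked to the person, but not to each other.
    return conn.execute("""
        SELECT p.Person, f.Food, c.Color
        FROM People p
        JOIN FavoriteFoods f ON f.Person = p.Person
        JOIN FavoriteColors c ON c.Person = p.Person
    """).fetchall()

rows = joined_rows(build_db())
print(len(rows))  # 8: 2 foods x 2 colors for each of the 2 people
```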
There are a few ways to deal with this in the report:
- Create ExpenseGroup and InvoiceRow subreports that are called from the main report with a combination of project_id and project_phase_id parameters.
- Summarize one or the other set of data into a single value. For example, you could sum the invoice rows, or you could concatenate the expense groups into a single string separated by commas.
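The summarizing option can be sketched with Python's `sqlite3` on a hypothetical, cut-down version of the report's tables (only the ID and total columns are modeled, and the amounts are invented). Joining both child tables directly multiplies the rows, so the sums are inflated; summing each child table per phase in a subquery first means each phase joins to at most one row from each side.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE project_phases (project_phase_id INTEGER PRIMARY KEY);
        CREATE TABLE project_phase_expensegroups (project_phase_id INTEGER, projectphase_expense_total INTEGER);
        CREATE TABLE invoicerows (project_phase_id INTEGER, invoicerow_total INTEGER);
        INSERT INTO project_phases VALUES (1), (2);
        INSERT INTO project_phase_expensegroups VALUES (1, 100), (1, 50);
        INSERT INTO invoicerows VALUES (1, 10), (1, 20), (1, 30), (2, 40);
    """)
    return conn

def naive_totals(conn):
    # Both child tables joined directly: 2 expense rows x 3 invoice rows = 6 rows for phase 1.
    return conn.execute("""
        SELECT pp.project_phase_id, SUM(e.projectphase_expense_total), SUM(i.invoicerow_total)
        FROM project_phases pp
        LEFT JOIN project_phase_expensegroups e ON e.project_phase_id = pp.project_phase_id
        LEFT JOIN invoicerows i ON i.project_phase_id = pp.project_phase_id
        GROUP BY pp.project_phase_id
        ORDER BY pp.project_phase_id
    """).fetchall()

def summarized_totals(conn):
    # Each child table is summed per phase before the join.
    return conn.execute("""
        SELECT pp.project_phase_id,
               COALESCE(e.expense_total, 0),
               COALESCE(i.invoice_total, 0)
        FROM project_phases pp
        LEFT JOIN (SELECT project_phase_id, SUM(projectphase_expense_total) AS expense_total
                   FROM project_phase_expensegroups GROUP BY project_phase_id) e
               ON e.project_phase_id = pp.project_phase_id
        LEFT JOIN (SELECT project_phase_id, SUM(invoicerow_total) AS invoice_total
                   FROM invoicerows GROUP BY project_phase_id) i
               ON i.project_phase_id = pp.project_phase_id
        ORDER BY pp.project_phase_id
    """).fetchall()

conn = build_db()
print(naive_totals(conn))       # [(1, 450, 120), (2, None, 40)]
print(summarized_totals(conn))  # [(1, 150, 60), (2, 0, 40)]
```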
Some notes:
- Please, please format your query before posting it in a question. It is almost impossible to read when it isn't formatted. It seems pretty clear that you're using a GUI to create the query, but do us the favor of not making us format it ourselves just to help you.
- While formatting, please use aliases rather than full table names. Full names just make the query that much harder to understand.

You need an extra set of parentheses in your WHERE clause to get the logic right, because AND is evaluated before OR:
WHERE ( projects.project_number = #iProjectNumber )
AND (
(project_phase_expensegroups.projectphase_expense_total > 0)
OR
(invoicerows.invoicerow_total > 0)
)
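A minimal sketch of the precedence problem, using Python's `sqlite3` and a hypothetical flattened table `report_rows` in place of the real joins (SQLite's `?` placeholder stands in for the report parameter). Without parentheses the condition is read as (project AND expense) OR invoice, so a row from project 2 leaks in.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE report_rows (project_number INTEGER, expense_total INTEGER, invoice_total INTEGER);
        INSERT INTO report_rows VALUES (1, 100, 0), (1, 0, 50), (1, 0, 0), (2, 0, 70);
    """)
    return conn

NO_PARENS = """
    SELECT project_number, expense_total, invoice_total FROM report_rows
    WHERE project_number = ? AND expense_total > 0 OR invoice_total > 0
    ORDER BY project_number, expense_total DESC
"""

WITH_PARENS = """
    SELECT project_number, expense_total, invoice_total FROM report_rows
    WHERE project_number = ? AND (expense_total > 0 OR invoice_total > 0)
    ORDER BY project_number, expense_total DESC
"""

conn = build_db()
print(conn.execute(NO_PARENS, (1,)).fetchall())    # [(1, 100, 0), (1, 0, 50), (2, 0, 70)]
print(conn.execute(WITH_PARENS, (1,)).fetchall())  # [(1, 100, 0), (1, 0, 50)]
```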
Also, you're using a column in your WHERE clause from a table that is left joined, without checking for NULLs. That effectively makes it a (slow) inner join. If you want to include rows that have no match in that table, you also need to check for NULL: any comparison other than IS NULL evaluates to unknown (not true) when the value is NULL, so those rows are filtered out. See this page for more information about SQL's three-valued logic: http://www.firstsql.com/idefend3.htm
To keep your LEFT JOINs working as you intended you would need to do this:
WHERE ( projects.project_number = #iProjectNumber )
AND (
project_phase_expensegroups.projectphase_expense_total > 0
OR project_phase_expensegroups.project_phase_id IS NULL
OR invoicerows.invoicerow_total > 0
OR invoicerows.project_phase_id IS NULL
)
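A minimal sketch of the NULL behavior, using Python's `sqlite3` with a single LEFT JOIN for brevity (the tables are cut down to the relevant columns and the data is hypothetical). Phase 2 has no expense row, so the outer join gives it NULLs; a plain `> 0` test drops it, and adding the IS NULL check on the joined key keeps it.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE project_phases (project_phase_id INTEGER PRIMARY KEY);
        CREATE TABLE project_phase_expensegroups (project_phase_id INTEGER, projectphase_expense_total INTEGER);
        INSERT INTO project_phases VALUES (1), (2), (3);
        -- Phase 1 has a positive expense, phase 2 has no expense row at all, phase 3 has 0.
        INSERT INTO project_phase_expensegroups VALUES (1, 100), (3, 0);
    """)
    return conn

BASE = """
    SELECT pp.project_phase_id
    FROM project_phases pp
    LEFT JOIN project_phase_expensegroups e ON e.project_phase_id = pp.project_phase_id
    WHERE {condition}
    ORDER BY pp.project_phase_id
"""

def phases(conn, condition):
    return [row[0] for row in conn.execute(BASE.format(condition=condition))]

conn = build_db()
print(conn.execute("SELECT NULL > 0").fetchone())  # (None,): unknown, not true
print(phases(conn, "e.projectphase_expense_total > 0"))  # [1]
print(phases(conn, "e.projectphase_expense_total > 0 OR e.project_phase_id IS NULL"))  # [1, 2]
```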

I found the solution, and it was fairly easy after all. I changed only the second LEFT OUTER JOIN to an INNER JOIN and dropped the condition that limited the results to values over zero. I also used SELECT DISTINCT.
Now my report is working perfectly.

oracle - sql query select max from each base

I'm trying to write a query that finds the top balance at each base. The balance is in one table and the bases are in another table.
This is my existing query. It returns all the results, but I need to find a way to limit it to 1 top result per baseID.
SELECT o.names.name, t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY o.names.name, t.accounts.bidd.baseID;
accounts is a nested table.
This is the output:
Name accounts.BIDD.baseID MAX(T.accounts.BALANCE)
--------------- ------------------------- ---------------------------
Jerard 010 1251.21
john 012 3122.2
susan 012 3022.2
fin 012 3022.2
dan 010 1751.21
What I want is to calculate the highest balance for each baseID and display only one record for that baseID.
So for baseID 012 the output should display only john, because he has the highest balance.
Any pointers in the right direction would be fantastic.

I think the problem is caused by the "Name" column. Since you have three names mapped to one base ID (012), each name forms its own group, so the three records are grouped individually rather than together.
Try leaving the "Name" column out of both the SELECT list and the GROUP BY clause.
SELECT t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY t.accounts.bidd.baseID;
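Dropping the name gives one row per base but no longer says whose balance it is. If the name is needed too, one common option is to join back to the per-base maximum. The sketch below is a hypothetical stand-in, not Oracle: it flattens the nested accounts table into a plain SQLite table `accounts(name, base_id, balance, acctype)` using the balances from the question, plus one invented unverified row to show that the filter must appear in both the subquery and the outer query. If two accounts tie for the top balance, both are returned.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (name TEXT, base_id TEXT, balance REAL, acctype TEXT);
        INSERT INTO accounts VALUES
            ('Jerard', '010', 1251.21, 'verified'),
            ('john',   '012', 3122.2,  'verified'),
            ('susan',  '012', 3022.2,  'verified'),
            ('fin',    '012', 3022.2,  'verified'),
            ('dan',    '010', 1751.21, 'verified'),
            ('zed',    '012', 5000.0,  'pending');   -- hypothetical unverified row
    """)
    return conn

def max_per_base(conn):
    # The answer's approach: one row per base, without the name.
    return conn.execute("""
        SELECT base_id, MAX(balance) FROM accounts
        WHERE acctype = 'verified'
        GROUP BY base_id ORDER BY base_id
    """).fetchall()

def top_account_per_base(conn):
    # Join back to the per-base maximum to recover the name.
    return conn.execute("""
        SELECT a.name, a.base_id, a.balance
        FROM accounts a
        JOIN (SELECT base_id, MAX(balance) AS max_balance
              FROM accounts
              WHERE acctype = 'verified'
              GROUP BY base_id) m
          ON m.base_id = a.base_id AND a.balance = m.max_balance
        WHERE a.acctype = 'verified'
        ORDER BY a.base_id
    """).fetchall()

conn = build_db()
print(max_per_base(conn))          # [('010', 1751.21), ('012', 3122.2)]
print(top_account_per_base(conn))  # [('dan', '010', 1751.21), ('john', '012', 3122.2)]
```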

SQL count query not returning correct results

I'm struggling to get a query to work...
I have two tables:
tbl.candidates:
candidate_id
agency_business_unit_id
tbl.candidate_employment_tracker:
candidate_id
The candidate_employment_tracker table can hold several records with the same candidate_id, because it contains each candidate's working history with different clients.
The candidates table has one row per candidate.
I'm trying to get results grouped by agency_business_unit_id, counting how many of each unit's candidates exist in candidate_employment_tracker.
E.g.
Agency Business Unit Id | Candidates
------------------------------------------------------------
100 | 2
987 | 1
12 | 90
The query I'm working on doesn't appear to work, as I'm getting the count of the candidates' rows in candidate_employment_tracker instead.
SELECT
abu.agency_business_unit_id,
abu.agency_business_unit_name,
count(c.candidate_id) AS candidateCount
FROM candidate_employment_tracker cet
INNER JOIN candidate c ON c.candidate_id = cet.candidate_id
INNER JOIN agency_business_unit abu ON abu.agency_business_unit_id = c.agency_business_unit_id
WHERE c.candidate_ni_number NOT REGEXP '^[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z] ?[0-9]{2} ?[0-9]{2} ?[0-9]{2} ?[ABCD]$'
GROUP BY abu.agency_business_unit_id
ORDER BY abu.agency_business_unit_name ASC
I've tried several approaches and the results are inconsistent. For instance I know one of the agency business units only has 1 candidate but the result is 2. This is as a result of this particular candidate having 2 records in the candidate employment tracker table. I'll keep bashing away but any help would be much appreciated.

Do you need this instead of count(c.candidate_id)?
count(DISTINCT c.candidate_id)
That would avoid the double counting where a candidate has 2 records in the candidate employment tracker table.
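A minimal sketch of the difference, using Python's `sqlite3` with hypothetical rows: candidate 3 has two work-history records and candidate 4 has none. SQLite has no built-in REGEXP function, so the NI-number filter from the question is left out.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE agency_business_unit (agency_business_unit_id INTEGER, agency_business_unit_name TEXT);
        CREATE TABLE candidate (candidate_id INTEGER, agency_business_unit_id INTEGER);
        CREATE TABLE candidate_employment_tracker (candidate_id INTEGER);
        INSERT INTO agency_business_unit VALUES (100, 'North'), (987, 'South');
        INSERT INTO candidate VALUES (1, 100), (2, 100), (3, 987), (4, 987);
        -- Candidate 3 has two work-history rows; candidate 4 has none.
        INSERT INTO candidate_employment_tracker VALUES (1), (2), (3), (3);
    """)
    return conn

QUERY = """
    SELECT abu.agency_business_unit_id, abu.agency_business_unit_name, {count_expr}
    FROM candidate_employment_tracker cet
    INNER JOIN candidate c ON c.candidate_id = cet.candidate_id
    INNER JOIN agency_business_unit abu ON abu.agency_business_unit_id = c.agency_business_unit_id
    GROUP BY abu.agency_business_unit_id, abu.agency_business_unit_name
    ORDER BY abu.agency_business_unit_name
"""

def counts(conn, count_expr):
    return conn.execute(QUERY.format(count_expr=count_expr)).fetchall()

conn = build_db()
print(counts(conn, "COUNT(c.candidate_id)"))           # [(100, 'North', 2), (987, 'South', 2)]
print(counts(conn, "COUNT(DISTINCT c.candidate_id)"))  # [(100, 'North', 2), (987, 'South', 1)]
```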

Hmm, this doesn't appear to work now that I look further into the results. When I compare the candidates for an agency business unit, I get inconsistent counts.
