SQL exercise - implicit join


Good afternoon everyone,
I'm a beginner trying to solve an exercise, and I can't understand why my query returns nothing but the headers.
I'm using IBM Db2.
Consider 2 tables:
CHICAGO_PUBLIC_SCHOOLS: (COMMUNITY_AREA_NAME | SAFETY_SCORE)
CENSUS_DATA: (COMMUNITY_AREA_NAME | PER_CAPITA_INCOME)
Exercise:
[Without using an explicit JOIN operator] Find the Per Capita Income of the Community Area which has a Safety Score of 1
my code:
SELECT COMMUNITY_AREA_NAME, PER_CAPITA_INCOME
FROM CENSUS_DATA
where COMMUNITY_AREA_NAME IN
(SELECT COMMUNITY_AREA_NAME FROM CHICAGO_PUBLIC_SCHOOLS
WHERE SAFETY_SCORE=1)
Thanks a lot to whoever can help me understand.
EDIT:
A small addition based on the first two comments I received, in case it helps: the safety score shouldn't be the issue, since the following code works properly:
SELECT COMMUNITY_AREA_NAME FROM CHICAGO_PUBLIC_SCHOOLS
WHERE SAFETY_SCORE=1

First, you should qualify all column names in any query that has more than one table reference:
SELECT cd.COMMUNITY_AREA_NAME, cd.PER_CAPITA_INCOME
FROM CENSUS_DATA cd
WHERE cd.COMMUNITY_AREA_NAME IN
(SELECT cps.COMMUNITY_AREA_NAME
FROM CHICAGO_PUBLIC_SCHOOLS cps
WHERE cps.SAFETY_SCORE = 1
);
Here are possibilities that I can readily think of to explain why this query would return no rows:
Either table has no rows.
CHICAGO_PUBLIC_SCHOOLS has no rows that match SAFETY_SCORE = 1.
Any rows with SAFETY_SCORE = 1 have area names that are not in the census table.
COMMUNITY_AREA_NAME does not match between the two tables.
You have not provided sample data or a fiddle. Here are some ideas for debugging:
For the first possibility, run COUNT(*) on the two tables.
For the second, just run the subquery to see if it returns any rows.
For the third, run the subquery and manually check to see if the names match in the table.
For the fourth, you have to consider strings that look the same but are different -- often because of hidden characters.
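The fourth check can be sketched as follows. This is only an illustration run against SQLite through Python, not Db2, and the sample rows are made up: the two tables hold the same area name in different letter case, which is one way names can "look the same" yet fail to match. The queries `unmatched` and `normalized` are my own additions.

```python
import sqlite3

# Made-up sample data (an assumption): same area, different letter case.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CENSUS_DATA (COMMUNITY_AREA_NAME TEXT, PER_CAPITA_INCOME INTEGER);
    CREATE TABLE CHICAGO_PUBLIC_SCHOOLS (COMMUNITY_AREA_NAME TEXT, SAFETY_SCORE INTEGER);
    INSERT INTO CENSUS_DATA VALUES ('Area A', 10000);
    INSERT INTO CHICAGO_PUBLIC_SCHOOLS VALUES ('AREA A', 1);
""")

# The query from the question: returns no rows because the names differ.
exact = conn.execute("""
    SELECT cd.COMMUNITY_AREA_NAME, cd.PER_CAPITA_INCOME
    FROM CENSUS_DATA cd
    WHERE cd.COMMUNITY_AREA_NAME IN
          (SELECT cps.COMMUNITY_AREA_NAME
           FROM CHICAGO_PUBLIC_SCHOOLS cps
           WHERE cps.SAFETY_SCORE = 1)
""").fetchall()

# Diagnostic: school names with SAFETY_SCORE = 1 that have no exact census match.
unmatched = conn.execute("""
    SELECT cps.COMMUNITY_AREA_NAME
    FROM CHICAGO_PUBLIC_SCHOOLS cps
    WHERE cps.SAFETY_SCORE = 1
      AND NOT EXISTS (SELECT 1 FROM CENSUS_DATA cd
                      WHERE cd.COMMUNITY_AREA_NAME = cps.COMMUNITY_AREA_NAME)
""").fetchall()

# Compare on a normalized form (trimmed, upper-cased) on both sides.
normalized = conn.execute("""
    SELECT cd.COMMUNITY_AREA_NAME, cd.PER_CAPITA_INCOME
    FROM CENSUS_DATA cd
    WHERE UPPER(TRIM(cd.COMMUNITY_AREA_NAME)) IN
          (SELECT UPPER(TRIM(cps.COMMUNITY_AREA_NAME))
           FROM CHICAGO_PUBLIC_SCHOOLS cps
           WHERE cps.SAFETY_SCORE = 1)
""").fetchall()

print(exact)       # []
print(unmatched)   # [('AREA A',)]
print(normalized)  # [('Area A', 10000)]
```

If the normalized query returns rows while the exact one does not, the mismatch is in the stored names rather than in the query logic.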

Related

Need a SQL query explained

I'm learning the databricks platform at the moment, and I'm on a lesson where we are talking about CTE's. This specific query is of a CTE in a CTE definition, and the girl in the video is not doing the best job breaking down what exactly this query is doing.
WITH lax_bos AS (
WITH origin_destination (origin_airport, destination_airport) AS (
SELECT
origin,
destination
FROM
external_table
)
SELECT
*
FROM
origin_destination
WHERE
origin_airport = 'LAX'
AND destination_airport = 'BOS'
)
SELECT
count(origin_airport) AS `Total Flights from LAX to BOS`
FROM
lax_bos;
The output of the query is 684, which I know comes from the last SELECT statement. It's mostly everything going on above it that I don't fully understand.

First, you select the two needed columns from external_table and name this CTE "origin_destination":
SELECT
origin,
destination
FROM
external_table
Next, you filter it in another CTE named "lax_bos":
SELECT
*
FROM
origin_destination -- the CTE you already made
WHERE
origin_airport = 'LAX'
AND destination_airport = 'BOS'
And this is the main query, where you use the CTE "lax_bos" that you made in the previous step; here you just count the number of flights:
SELECT
count(origin_airport) AS `Total Flights from LAX to BOS`
FROM
lax_bos

Nesting CTE's is weird. Normally they form a single-level transformation pipeline, like this:
WITH origin_destination (origin_airport, destination_airport) AS
(
SELECT origin, destination
FROM external_table
), lax_bos AS
(
SELECT *
FROM origin_destination
WHERE origin_airport = 'LAX'
AND destination_airport = 'BOS'
)
SELECT count(origin_airport) AS `Total Flights from LAX to BOS`
FROM lax_bos;
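As a rough check of what the pipeline computes, here is a small sketch in Python with SQLite (not Databricks). The four rows in `external_table` are invented, and the alias `total_flights` replaces the backtick-quoted one. It runs the single-level CTE version and a plain filtered count side by side.

```python
import sqlite3

# Invented sample rows (an assumption): two LAX -> BOS flights among four.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE external_table (origin TEXT, destination TEXT);
    INSERT INTO external_table VALUES
        ('LAX', 'BOS'), ('LAX', 'BOS'), ('LAX', 'JFK'), ('SFO', 'BOS');
""")

# Step 1 renames the columns, step 2 filters, the final SELECT counts.
pipeline = conn.execute("""
    WITH origin_destination (origin_airport, destination_airport) AS (
        SELECT origin, destination FROM external_table
    ), lax_bos AS (
        SELECT * FROM origin_destination
        WHERE origin_airport = 'LAX' AND destination_airport = 'BOS'
    )
    SELECT count(origin_airport) AS total_flights FROM lax_bos
""").fetchone()[0]

# The same result without any CTE.
direct = conn.execute("""
    SELECT count(origin) FROM external_table
    WHERE origin = 'LAX' AND destination = 'BOS'
""").fetchone()[0]

print(pipeline, direct)  # 2 2
```

Both forms return the same count; the CTEs only name the intermediate steps.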

I do not understand why you are using a common table expression (CTE).
I am going to give you a quick overview of how this can be done without a CTE.
Always, use some type of sample data set. There are plenty that are installed with databricks. In fact, there is one for delayed airplane departures.
The next step is to read in the file and convert it to a temporary view.
At this point, we can use the Spark SQL magic command to query the data.
The query shows plane flights from LAX to BOS. We can remove the limit 10 option and change the '*' to "count(*) as Total" to get your answer. Thus, we solved the problem without a CTE.
The above image uses a CTE to pull the origin, destination and delay for all flights from LAX to BOS. Then it bins the delays from -9 to 9 hours with counts.
Again, this can all be done in one SQL statement that might be cleaner.
I reserve CTEs for more complex situations. For instance, calculating a complex math formula using a range of data and pairing it with the base data set.

A CTE can be a recursive query or a simple subquery. Here, both are simple subqueries.
First, the query origin_destination is evaluated. Second, the query lax_bos runs over the origin_destination result. Finally, the main query runs over the lax_bos result.

MS Access 2013, How to add totals row within SQL

I'm in need of some assistance. I have searched and not found what I'm looking for. I have an assignment for school that requires me to use SQL. I have a query that pulls some columns from two tables:
SELECT Course.CourseNo, Course.CrHrs, Sections.Yr, Sections.Term, Sections.Location
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring";
I need to add a Totals row at the bottom to count the CourseNo and Sum the CrHrs. It has to be done through SQL query design as I need to paste the code. I know it can be done with the datasheet view but she will not accept that. Any advice?

To accomplish this, you can union your query together with an aggregation query. It's not clear from your question which columns you are trying to get "Totals" from, but here's an example of what I mean using your query and getting counts of each (kind of a useless example, but you should be able to apply it to what you are doing):
SELECT
[Course].[CourseNo]
, [Course].[CrHrs]
, [Sections].[Yr]
, [Sections].[Term]
, [Sections].[Location]
FROM
[Course]
INNER JOIN [Sections] ON [Course].[CourseNo] = [Sections].[CourseNo]
WHERE [Sections].[Term] = "spring"
UNION ALL
SELECT
"TOTALS"
, SUM([Course].[CrHrs])
, count([Sections].[Yr])
, Count([Sections].[Term])
, Count([Sections].[Location])
FROM
[Course]
INNER JOIN [Sections] ON [Course].[CourseNo] = [Sections].[CourseNo]
WHERE [Sections].[Term] = "spring"

You can prepare your "total" query separately, and then output both query results together with "UNION".
It might look like:
SELECT Course.CourseNo, Course.CrHrs, Sections.Yr, Sections.Term, Sections.Location
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring"
UNION
SELECT "Total", SUM(Course.CrHrs), COUNT(Sections.Yr), COUNT(Sections.Term), COUNT(Sections.Location)
FROM Course
INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term="spring";
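A minimal sketch of the totals-row idea, run in SQLite through Python rather than Access, with invented sample rows. The `sort_key` column is my own addition to keep the totals row last, and SQLite's `||` concatenation stands in for what Access would write with `&`.

```python
import sqlite3

# Invented sample data (an assumption): two spring sections, one fall section.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Course (CourseNo TEXT, CrHrs INTEGER);
    CREATE TABLE Sections (CourseNo TEXT, Yr INTEGER, Term TEXT, Location TEXT);
    INSERT INTO Course VALUES ('CS101', 3), ('MA201', 4);
    INSERT INTO Sections VALUES
        ('CS101', 2015, 'spring', 'Main'),
        ('MA201', 2015, 'spring', 'North'),
        ('CS101', 2015, 'fall', 'Main');
""")

# Detail rows get sort_key 0, the totals row gets 1, so it sorts to the bottom.
rows = conn.execute("""
    SELECT CourseNo, CrHrs, Yr, Term, Location
    FROM (
        SELECT 0 AS sort_key, c.CourseNo, c.CrHrs, s.Yr, s.Term, s.Location
        FROM Course c INNER JOIN Sections s ON c.CourseNo = s.CourseNo
        WHERE s.Term = 'spring'
        UNION ALL
        SELECT 1, 'Total: ' || COUNT(c.CourseNo), SUM(c.CrHrs), NULL, NULL, NULL
        FROM Course c INNER JOIN Sections s ON c.CourseNo = s.CourseNo
        WHERE s.Term = 'spring'
    )
    ORDER BY sort_key, CourseNo
""").fetchall()

for row in rows:
    print(row)
# ('CS101', 3, 2015, 'spring', 'Main')
# ('MA201', 4, 2015, 'spring', 'North')
# ('Total: 2', 7, None, None, None)
```

UNION ALL is used here because plain UNION removes duplicate rows, which would silently drop identical detail rows.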

Whilst you can certainly union the aggregated totals query to the end of your original query, in my opinion this would be really bad practice and would be undesirable for any real-world application.
Consider that the resulting query could no longer be used for any meaningful analysis of the data: if displayed in a datagrid, the user would not be able to sort the data without the totals row being interspersed amongst the rest of the data; the user could no longer use the built-in Totals option to perform their own aggregate operation, and the insertion of a row only identifiable by the term totals could even conflict with other data within the set.
Instead, I would suggest displaying the totals within an entirely separate form control, using a separate query such as the following (based on your own example):
SELECT Count(Course.CourseNo) as Courses, Sum(Course.CrHrs) as Hours
FROM Course INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
WHERE Sections.Term = "spring";
However, since CrHrs are fields within your Course table and not within your Sections table, the above may yield multiples of the desired result, with the number of hours multiplied by the number of corresponding records in the Sections table.
If this is the case, the following may be more suitable:
SELECT Count(Course.CourseNo) as Courses, Sum(Course.CrHrs) as Hours
FROM
Course INNER JOIN
(SELECT DISTINCT s.CourseNo FROM Sections s WHERE s.Term = "spring") q
ON Course.CourseNo = q.CourseNo
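To see the double counting described above, here is a small sketch in SQLite via Python (not Access) with invented data in which one course has two spring sections. The variable names are mine.

```python
import sqlite3

# Invented sample data (an assumption): CS101 has two spring sections.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Course (CourseNo TEXT, CrHrs INTEGER);
    CREATE TABLE Sections (CourseNo TEXT, Yr INTEGER, Term TEXT, Location TEXT);
    INSERT INTO Course VALUES ('CS101', 3), ('MA201', 4);
    INSERT INTO Sections VALUES
        ('CS101', 2015, 'spring', 'Main'),
        ('CS101', 2015, 'spring', 'North'),
        ('MA201', 2015, 'spring', 'Main');
""")

# Plain join: CS101's credit hours are counted once per matching section.
per_section = conn.execute("""
    SELECT COUNT(Course.CourseNo), SUM(Course.CrHrs)
    FROM Course INNER JOIN Sections ON Course.CourseNo = Sections.CourseNo
    WHERE Sections.Term = 'spring'
""").fetchone()

# Join against the distinct course numbers: each course is counted once.
per_course = conn.execute("""
    SELECT COUNT(Course.CourseNo), SUM(Course.CrHrs)
    FROM Course INNER JOIN
         (SELECT DISTINCT s.CourseNo FROM Sections s WHERE s.Term = 'spring') q
         ON Course.CourseNo = q.CourseNo
""").fetchone()

print(per_section)  # (3, 10)
print(per_course)   # (2, 7)
```

Which of the two totals is wanted depends on whether the assignment counts sections or courses.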

Trouble with pulling distinct data

Ok this is hard to explain partially because I'm bad at sql but this code isn't doing exactly what I want it to do. I'll try to explain what it is supposed to do as best I can and hopefully someone can spot a glaring mistake. I'm sorry about the long winded explanation but there is a lot going on here and I really could use the help.
The point of this script is to search for parts which need to be obsoleted. In other words, they haven't been used in three years and are still active.
When we obsolete a part, "part.status" is set to 'O'. It is normally null. Also, the word 'OBSOLETE' is usually written into "part.description".
The "WORK_ORDER" table contains every scheduled work order. These are defined by base, lot, and sub IDs. It also contains many dates, such as the date when the work order was closed.
The "REQUIREMENT" table contains all the parts required for each job. Many jobs require multiple parts, some at different legs of the job. The way this is handled is that a given "REQUIREMENT.WORKORDER_BASE_ID" and "REQUIREMENT.WORKORDER_LOT_ID" may be listed on a dozen or so subsequent rows. Each line specifies a different "REQUIREMENT.PART_ID". The sub ID identifies the leg of the job where the part is needed. All of the parts I care about start with 'PCH'.
When I run this code it returns 14 lines; I happen to know it should be returning about 39 right now. I believe the screwy part starts at line 17. I found that code on another forum, hoping that it would help solve the original problem. Without that code, I get about 27K lines because the DB is pulling every criteria-matching requirement from every criteria-matching work order. Many of these parts are used on multiple jobs. I've also tried using DISTINCT on REQUIREMENT.PART_ID, which seems like it should solve the problem. Alas, it doesn't.
So I know despite all the information I probably still didn't give nearly enough. Does anyone have any suggestions?
SELECT
PART.ID [Engr Master]
,PART.STATUS [Master Status]
,WO.CLOSE_DATE
,PT.ID [Die]
,PT.STATUS [Die Status]
FROM PART
CROSS APPLY(
SELECT
WORK_ORDER.BASE_ID
,WORK_ORDER.LOT_ID
,WORK_ORDER.SUB_ID
,WORK_ORDER.PART_ID
,WORK_ORDER.CLOSE_DATE
FROM WORK_ORDER
WHERE
GETDATE() - (360*3) > WORK_ORDER.CLOSE_DATE
AND PART.ID = WORK_ORDER.PART_ID
AND PART.STATUS ='O'
)WO
CROSS APPLY(
SELECT
REQUIREMENT.WORKORDER_BASE_ID
,REQUIREMENT.WORKORDER_LOT_ID
,REQUIREMENT.WORKORDER_SUB_ID
,REQUIREMENT.PART_ID
FROM REQUIREMENT
WHERE
WO.BASE_ID = REQUIREMENT.WORKORDER_BASE_ID
AND WO.LOT_ID = REQUIREMENT.WORKORDER_LOT_ID
AND WO.SUB_ID = REQUIREMENT.WORKORDER_SUB_ID
AND REQUIREMENT.PART_ID LIKE 'PCH%'
)REQ
CROSS APPLY(
SELECT
PART.ID
,PART.STATUS
FROM PART
WHERE
REQ.PART_ID = PART.ID
AND PART.STATUS IS NULL
)PT
ORDER BY PT.ID

This is difficult to understand without any sample data, but I took a stab at it anyway. I removed the second JOIN to PART (that had alias PART1) as it seemed unnecessary. I also removed the subquery that was looking for parts HAVING COUNT(PART_ID) = 1.
The first JOIN to PART should be done on REQUIREMENT.PART_ID = PART.ID, as the relationship has already been defined from WORK_ORDER to REQUIREMENT; hence you can JOIN PART directly to REQUIREMENT at this point.
EDIT 03/23/2015
If I understand this correctly, you just need a distinct list of PCH parts, and their respective last (read: MAX) CLOSE_DATE. If that is the case, here is what I propose.
I broke the query up into a couple of CTE's. The first CTE is simply going through the PART table and pulling out a DISTINCT list of PCH parts, grouping by PART_ID and DESCRIPTION.
The second CTE, is going through the REQUIREMENT table, joining to the WORK_ORDER table and, for each PART_ID (handled by the PARTITION) assigning the CLOSE_DATE a ROW_NUMBER in descending order. This will ensure that each ROW_NUMBER with a value of "1" will be the Max CLOSE_DATE for each PART_ID.
The final SELECT statement simply joins the two CTEs on PART_ID, filtering where LastCloseDate = 1 (the ROW_NUMBER assigned in the second CTE).
If I understand the requirements correctly, this should give you the desired results.
Additionally, I removed the filter WHERE PART.DESCRIPTION NOT LIKE 'OB%' because we're already filtering by PART.STATUS IS NULL and you stated above that an 'O' is placed in this field for Obsolete parts. Also, [DIE] and [ENGR MASTER] have the same value in the 27 rows being pulled before, so I just used the same field and labeled them differently.
; WITH Parts AS(
SELECT prt.ID AS PART_ID
, prt.DESCRIPTION
FROM PART prt
WHERE prt.STATUS IS NULL
AND prt.ID LIKE 'PCH%'
GROUP BY prt.ID, prt.DESCRIPTION
)
, LastCloseDate AS(
SELECT req.PART_ID
, wrd.CLOSE_DATE
, ROW_NUMBER() OVER(PARTITION BY req.PART_ID ORDER BY wrd.CLOSE_DATE DESC) AS LastCloseDate
FROM REQUIREMENT req
INNER JOIN WORK_ORDER wrd
ON wrd.BASE_ID = req.WORKORDER_BASE_ID
AND wrd.LOT_ID = req.WORKORDER_LOT_ID
AND wrd.SUB_ID = req.WORKORDER_SUB_ID
WHERE wrd.CLOSE_DATE IS NOT NULL
AND GETDATE() - (365 * 3) > wrd.CLOSE_DATE
)
SELECT prt.PART_ID AS [DIE]
, prt.PART_ID AS [ENGR MASTER]
, prt.DESCRIPTION
, lst.CLOSE_DATE
FROM Parts prt
INNER JOIN LastCloseDate lst
ON prt.PART_ID = lst.PART_ID
WHERE LastCloseDate = 1
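The ROW_NUMBER step can be tried in isolation. The sketch below uses SQLite through Python (window functions need SQLite 3.25 or later) instead of SQL Server. The table `closes` is hypothetical: it stands in for REQUIREMENT already joined to WORK_ORDER, and its rows are invented.

```python
import sqlite3

# Hypothetical stand-in for REQUIREMENT joined to WORK_ORDER; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE closes (part_id TEXT, close_date TEXT);
    INSERT INTO closes VALUES
        ('PCH-1', '2010-01-15'),
        ('PCH-1', '2011-06-30'),
        ('PCH-2', '2009-03-01');
""")

# Number each part's rows from newest to oldest, then keep only row 1,
# which is the most recent close date for that part.
latest = conn.execute("""
    WITH ranked AS (
        SELECT part_id,
               close_date,
               ROW_NUMBER() OVER (PARTITION BY part_id
                                  ORDER BY close_date DESC) AS rn
        FROM closes
    )
    SELECT part_id, close_date
    FROM ranked
    WHERE rn = 1
    ORDER BY part_id
""").fetchall()

print(latest)  # [('PCH-1', '2011-06-30'), ('PCH-2', '2009-03-01')]
```

Each part appears exactly once in the output, which is what removes the repeated rows caused by a part being used on several jobs.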

SQL Nested Query with distinct count

I have a dilemma, and I'm hoping someone will be able to help me out. I am working on some made-up problems using data from an old textbook of mine; this isn't a question from the book, but the data is. I just wanted to see if I could still work in SQL, so here goes. When this code is executed,
SELECT COUNT(code_description) "Number of Different Crimes", last, first,
code_description
FROM
(
SELECT criminal_id, last, first, crime_code, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
ORDER BY criminal_id
)
WHERE criminal_id = 1020
GROUP BY last, first, code_description;
I am provided with these results:
Number of Different Crimes LAST FIRST CODE_DESCRIPTION
1 Phelps Sam Agg Assault
1 Phelps Sam Drug Offense
Ultimately, I would like the number of different crimes to be 2 for each line, since this criminal has two unique crimes charged to him. I would like it to be displayed something like:
Number of Different Crimes LAST FIRST CODE_DESCRIPTION
2 Phelps Sam Agg Assault
2 Phelps Sam Drug Offense
Not to push my luck, but I would also like to change the following line:
WHERE criminal_id = 1020
to something a little more elegant that represents any criminal with more than one crime type associated with them. In this case, Sam Phelps is the only one in this data set.

As @sgeddes said in a comment, you can use an analytic count, which doesn't need a subquery if you're specifying the criminal ID:
SELECT COUNT(code_description) OVER (PARTITION BY first, last) AS "Number of Different Crimes",
last, first, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
WHERE criminal_id = 1020;
If you want to look for anyone with multiple crimes then you do need a subquery so you can filter on the analytic result:
SELECT charge_count AS "Number of Different Crimes",
last, first, code_description
FROM (
SELECT COUNT(DISTINCT code_description) OVER (PARTITION BY first, last) AS charge_count,
criminal_id, last, first, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
)
WHERE charge_count > 1
ORDER BY criminal_id, code_description;
SQL Fiddle demo.
If the charges are across multiple crimes, but duplicated, then the distinct count still works, but you might want to add a DISTINCT to the overall result set (unless you want to show other crime-specific info); otherwise you get something like this.
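Some engines do not accept DISTINCT inside a window aggregate (SQLite, used here through Python, is one of them). As a swapped-in alternative to the analytic count above, the sketch below computes the distinct count in a grouped subquery and joins it back. The flattened `charges` table is hypothetical and stands in for the four-table join; the second criminal and the repeated charge are invented.

```python
import sqlite3

# Hypothetical flattened table standing in for the four-table join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE charges (criminal_id INTEGER, last TEXT, first TEXT,
                          code_description TEXT);
    INSERT INTO charges VALUES
        (1020, 'Phelps', 'Sam', 'Agg Assault'),
        (1020, 'Phelps', 'Sam', 'Drug Offense'),
        (1020, 'Phelps', 'Sam', 'Drug Offense'),
        (1021, 'Doe', 'Jane', 'Theft');
""")

# Count distinct charge types per criminal, join the count back to the rows,
# keep criminals with more than one type, and collapse repeated charges.
rows = conn.execute("""
    SELECT DISTINCT c.criminal_id, n.charge_count, c.last, c.first,
                    c.code_description
    FROM charges c
    JOIN (SELECT criminal_id,
                 COUNT(DISTINCT code_description) AS charge_count
          FROM charges
          GROUP BY criminal_id) n
      ON n.criminal_id = c.criminal_id
    WHERE n.charge_count > 1
    ORDER BY c.criminal_id, c.code_description
""").fetchall()

for row in rows:
    print(row)
# (1020, 2, 'Phelps', 'Sam', 'Agg Assault')
# (1020, 2, 'Phelps', 'Sam', 'Drug Offense')
```

Grouping by criminal_id rather than by name also avoids merging two different people who happen to share a name.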

single-row subquery returns more than one row - how to find the duplicate?

I am not a big Oracle SQL expert, so I hope someone knows a good way to find the "duplicate" record which is causing the error: single-row subquery returns more than one row.
This is my statement:
SELECT
CAST(af.SAP_SID AS VARCHAR2(4000)) APP_ID,
(SELECT DR_OPTION
FROM
DR_OPTIONS
WHERE DR_OPTIONS.ID = (
select dr_option from applications where applications.sap_sid = af.sap_sid)) DR_OPTION
FROM
APPLICATIONS_FILER_VIEW af
It works on my test system, so I am "sure" there must be an error in the available data records, but I have no idea how to find it.

Try with this query:
select applications.sap_sid, count(dr_option)
from applications
group by applications.sap_sid
having count(dr_option) > 1
This should give you the sap_sid of the duplicated rows
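A quick way to try the duplicate check on a toy table, sketched in SQLite through Python rather than Oracle, with invented rows. It uses COUNT(*) instead of count(dr_option) because a scalar subquery fails on the number of rows returned, including rows where dr_option is NULL, which count(dr_option) would skip.

```python
import sqlite3

# Invented sample rows (an assumption): sap_sid 'A02' appears twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (sap_sid TEXT, dr_option INTEGER);
    INSERT INTO applications VALUES ('A01', 1), ('A02', 2), ('A02', 3);
""")

# Any sap_sid listed here would make the scalar subquery return several rows.
duplicates = conn.execute("""
    SELECT sap_sid, COUNT(*) AS row_count
    FROM applications
    GROUP BY sap_sid
    HAVING COUNT(*) > 1
    ORDER BY sap_sid
""").fetchall()

print(duplicates)  # [('A02', 2)]
```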

I'd suggest simplifying your query:
SELECT CAST(af.SAP_SID AS VARCHAR2(4000)) APP_ID,
dr.DR_OPTION
FROM APPLICATIONS_FILER_VIEW af
INNER JOIN applications a ON af.sap_sid = a.sap_sid
INNER JOIN DR_OPTIONS dr ON a.dr_option = dr.ID

I would investigate what you get when you run:
select dr_option from applications where applications.sap_sid = af.sap_sid
but you could force only one row to be returned. I see this as a fudge and would not recommend it; at the very least, add an ORDER BY to have some control over which row is returned. It would look something like:
SELECT
CAST(af.SAP_SID AS VARCHAR2(4000)) APP_ID,
(SELECT DR_OPTION
FROM
DR_OPTIONS
WHERE DR_OPTIONS.ID = (
select dr_option
from applications
where applications.sap_sid = af.sap_sid
and rownum = 1)
) DR_OPTION
FROM
APPLICATIONS_FILER_VIEW af
(Not tested; I just googled how to limit results in Oracle.)
If you fix the data issue (as per A.B.Cade's comment) then I would recommend converting it to use joins as per weenoid's answer. This would also highlight other data issues that may arise in the future.
IN SHORT: I have never fixed anything in this way. The real answer is to investigate the multiple rows returned and decide what you want to do, maybe:
add more where clauses
order the results and only select top row
actually keep the duplicates as they represent a scenario you have not thought of before
