SQL - subquery returning more than 1 value - sql

What my issue is:
I am constantly returning multiple values when I don't expect to. I am attempting to get a specific climate, determined by the state, county, and country.
What I've tried:
The code given below. I am unsure as to what is wrong with it specifically. I do know that it is returning multiple values. But why? I specify that STATE_ABBREVIATION = PROV_TERR_STATE_LOC and with the inner joins that I do, shouldn't that create rows that are similar except for their different CLIMATE_IDs?
SELECT
...<code>...
(SELECT locations.CLIMATE_ID
FROM REF_CLIMATE_LOCATION locations, SED_BANK_TST.dbo.STATIONS stations
INNER JOIN REF_STATE states ON STATE_ID = states.STATE_ID
INNER JOIN REF_COUNTY counties ON COUNTY_ID = counties.COUNTY_ID
INNER JOIN REF_COUNTRY countries ON COUNTRY_ID = countries.COUNTRY_ID
WHERE STATE_ABBREVIATION = PROV_TERR_STATE_LOC) AS CLIMATE_ID
...<more code>...
FROM SED_BANK_TST.dbo.STATIONS stations
I've been at this for hours, looking up different questions on SO, but I cannot figure out how to make this subquery return a single value.

All those inner joins don't reduce the result set if the IDs you're testing exist in the REF tables. Apart from that you're doing a Cartesian product between locations and stations (which may be an old fashioned inner join because of the where clause).
You'll only get a single row if you only have a single row in the locations table that matches a single row in the stations table under the condition that STATE_ABBREVIATION = PROV_TERR_STATE_LOC

Your JOINs show a hierarchy of locations: Country->State->County, but your WHERE clause only limits by the state abbreviation. By joining the county you'll get one record for every county in that state. You CAN limit your results by taking the TOP 1 of the results, but you need to be very careful that that's really what you want. If you're looking for a specific county, you'll need to include that in the WHERE clause. You get some control with the TOP 1 in that it will give the top 1 based on an ORDER BY clause. I.e., if you want the most recently added, use:
SELECT TOP 1 [whatever] ORDER BY [DateCreated] DESC;
For your subquery, you can do something like this:
SELECT TOP 1
locations.CLIMATE_ID
FROM REF_CLIMATE_LOCATION locations ,
SED_BANK_TST.dbo.STATIONS stations
INNER JOIN REF_STATE states ON STATE_ID = states.STATE_ID
INNER JOIN REF_COUNTY counties ON COUNTY_ID = counties.COUNTY_ID
INNER JOIN REF_COUNTRY countries ON COUNTRY_ID = countries.COUNTRY_ID
WHERE STATE_ABBREVIATION = PROV_TERR_STATE_LOC
Just be sure to either add an ORDER BY at the end or be okay with it choosing the TOP 1 based on the "natural order" on the tables.

If you are expecting to have a single value on your sub-query, probably you need to use DISTINCT. The best way to see it is you run your sub-query separately and see the result. If you need to include other columns from the tables you used, you may do so to check what makes your result have multiple rows.
You can also use MAX() or MIN() or TOP 1 to get a single value on the sub-query but this is dependent to the logic you want to achieve for locations.CLIMATE_ID. You need to answer the question, "How is it related to the rest of the columns retrieved?"

Related

SQL Select Top with Left Join and Where clause

I have two database tables:
Cities with columns:
Country_Code | City_Code | City_Name
Countries with columns
Country_Code | Country_Name
Based on a few chars entered by User, it checks the City_Name column to return results to populate a City autocomplete box. The result needs to have the city code, city name, country code, and country name, hence the need for a join.
The query I am using is
SELECT TOP 10
ci.Country_Code, ci.City_Code, ci.City_Name, co.Country_Name
FROM
Cities ci
LEFT OUTER JOIN
Countries co ON ci.Country_Code = co.Country_Code
WHERE
ci.City_Name LIKE '#CityName'
ORDER BY
ci.City_Name
The results I get are correct, but the query takes a long time to complete. From what I understand, first, the results contain join of both the tables, then the where clause kicks in to get the specific rows only, which are ordered by City Name and top 10 results returned.
My question is, is there a way to speed up the query. Have the where clause checked, and then only perform the join, better still perform it only on the top 10 results? I tried putting my WHERE clause in the ON clause, but that gave wrong results.
EDIT : #CityName contains 2-3 chars entered by the user and then a '%'.
I'd suggest start with adding clustered index on Countries.Country_Code (also making it the primary key of the Countries table if it is not already so). An index would sort the table such that the search speed in join is increased.
This appears to be your query:
SELECT TOP 10 ci.Country_Code, ci.City_Code, ci.City_Name, co.Country_Name
FROM Cities ci LEFT OUTER JOIN
Countries co
ON ci.Country_Code = co.Country_Code
WHERE ci.City_Name LIKE #CityName
ORDER BY ci.City_Name ;
Quotes should not be needed around #CityName.
I don't understand the LEFT JOIN. It suggests that there are cities without a valid Country_Code -- and that seems unlikely.
Assuming #CityName does not start with a wildcard (as suggested by your question), then this can make use of an index. I would suggest the following indexes:
cities(city_name, country_code)
countries(country_code, country_name)
The second is not needed if country_code is a primary key.

Why use many columns in GROUP BY and HAVING clause in these examples

Given the schema here I'm trying to understand and solve the below 3 sql queries as I'm confused:
1- Present a table giving the names of the countries with ≥ 50% urbanization
rates, their urbanization rates, and their per capita GDP. Note that
urbanization rate is the percentage of population living in cities. Do not
count cities with NULL values for population.
SELECT country.name, round(sum(city.population)/country.population, 3) AS urban, round(gdp/country.population, 3) AS gdppc
FROM city
INNER JOIN country ON code = country
INNER JOIN economy ON code = economy.country
WHERE city.population IS NOT NULL
GROUP BY country.name, country.population, economy.gdp
HAVING round(sum(city.population)/country.population, 3) >= 0.5
ORDER BY urban DESC;
In the above query, Why I need to include country.population and economy.gdp in the GROUP BY? If I tried using just country.name in the GROUP BY I get an error saying I should include the others.
2- Show organizations that have as members all the European countries with over 50 million people?
SELECT name
FROM organization
INNER JOIN (SELECT organization
FROM country
INNER JOIN encompasses
ON code = encompasses.country
INNER JOIN ismember
ON code = ismember.country
WHERE population > 50000000 AND continent = 'Europe'
GROUP BY organization
HAVING count(ismember.country) = (SELECT count(*)
FROM country
INNER JOIN encompasses
ON code = country
WHERE population > 50000000 AND continent = 'Europe'))
AS innerQuery
ON abbreviation = innerQuery.organization;
Why I need the HAVING Part above?
3- Insert a new organization called “Tivoli” and a trigger that says if Germany joins “Tivoli” then so too must the UK and France. Insert Germany into the “Tivoli” organization. Confirm proper behavior.
I tried the below script but it's not working, any advice please?
do $$
begin
IF(NOT EXISTS ( SELECT 1 FROM organization WHERE organization."name" = 'Tivoli' AND organization.country = 'D' ))
BEGIN
INSERT INTO organization VALUES ('Tivoli','Tivoli organization',NULL,'F',NULL,NULL);
INSERT INTO organization VALUES ('Tivoli','Tivoli organization',NULL,'GB',NULL,NULL);
END;
end $$
1)
You used country.population and economy.gdp in the select, outside of aggregate functions ( COUNT(), AVG() and SUM() ), and you have a GROUP BY. Everything that you select has to be in GROUP BY or inside of aggregate functions.
2)
Because you were asked to show organizations that have ALL of 50mil + people countries. With HAVING, you check if that organization has the right amount of countries.
3)
organization."name" = 'Tivoli'
It's supposed to be :
organization.name
First of all, you should limit a question to one only, not 3. But here are some pointers for all 3:
In the above query, Why I need to include country.population and economy.gdp in the GROUP BY? If I tried using just country.name in the GROUP BY I get an error saying I should include the others.
This is a requirement. A group by country.name alone would work (in Postgres 9.1+) only if the other two fields are known to be functionally dependent on country.name. But probably country.name is not the primary key of the country table, so in theory it is possible to have two records in that table with the same name, but different population.
The rule is as follows:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
This is implemented since version 9.1.
Why I need the HAVING Part above?
Because a condition on an aggregate (count in this case) can only be performed after grouping, and can thus not be expressed in the where clause. In this case the having clause makes sure that the organisation is not only present in some big EU Member States, but all big EU Member states.
I tried the below script but it's not working, any advice please?
Without a proper database schema, it is not possible to provide you with the correct SQ, but from the ERD diagram it seems that the organization table does not have a country field. Instead the ismember table connects organizations with countries. You would only insert one organization, but several ismember records (one per Member State involved)
It is better also to name the fields in your insert statement, so it is clear which value corresponds to which field.

SQL statement for many-to-many WHERE Item IN ALL results, not just one

Put into a simplified way, I have three tables: Products, ProductsCounties, and Counties. We can only sell certain products in certain counties, so there is a many-to-many relationship between Products and Counties with ProductsCounties.
To make it even simpler to ask (because I'm using split strings from a comma separated list in a stored procedure... among magic), let's say I just have a table called WantedCounties with CountyName's County1, County2, and County3 as records. This is basically the result of the split string function based on what I give it. My problem is, I want to give three counties (think WantedCounties is going to be what I'm giving) and that Product has to be available in ALL THREE and not just one, hence why my IN statement fails here.
SELECT DISTINCT p.* FROM Products p
JOIN ProductsCounties pc ON pc.ProductId = p.ProductId
JOIN Counties c ON pc.CountyId = c.CountyId
WHERE c.CountyName IN (SELECT CountyName FROM WantedCounties)
This is as much as I can narrow down my problem for sake of making it easy to ask. Does anyone know how to do this? I want only products that have a relationship with every single county given, not just one match IN the subquery.
I think my issue is I can't grasp how to deal with there being multiple rows being selected because of each county match on the join.
I have also tried something along the lines of = ALL (SELECT CountyName FROM WantedCounties) to no avail.
Edit: I found the solution here. I had to include this:
GROUP BY --All my refrences
HAVING COUNT(DISTINCT c.CountyName) = (
SELECT CountyName
FROM WantedCounties)
)
This only returns products that have a match in every County in WantedCounties. So there had to be all 3 counties in my results to match the 3 counties in WantedCounties in order for it to say "that's all of them."
I found the solution here. I had to include this:
GROUP BY --All my refrences
HAVING COUNT(DISTINCT c.CountyName) = (
SELECT CountyName
FROM WantedCounties)
)
This only returns products that have a match in every County in WantedCounties. So there had to be all 3 counties in my results to match the 3 counties in WantedCounties in order for it to say "that's all of them."

Cannot find correct number of values in a table that are not in another table, though I can do otherwise

I want to retrieve the course_id in table course that is not in the table takes. Table takes only contains course_id of courses taken by students. The problem is that if I have:
select count (distinct course.course_id)
from course, takes
where course.course_id = (takes.course_id);
the result is 85 which is smaller than the total number of course_id in table course, which is 200. The result is correct.
But I want to find the number of course_id that are not in the table takes, and I have:
select count (distinct course.course_id)
from course, takes
where course.course_id != (takes.course_id);
The result is 200, which is equal the number of course_id in table course. What is wrong with my code?
This SQL will give you the count of course_id in table course that aren't in the table takes:
select count (*)
from course c
where not exists (select *
from takes t
where c.course_id = t.course_id);
You didn't specify your DBMS, however, this SQL is pretty standard so it should work in the popular DBMSs.
There are a few different ways to accomplish what you're looking for. My personal favorite is the LEFT JOIN condition. Let me walk you through it:
Fact One: You want to return a list of courses
Fact Two: You want to
filter that list to not include anything in the Takes table.
I'd go about this by first mentally selecting a list of courses:
SELECT c.Course_ID
FROM Course c
and then filtering out the ones I don't want. One way to do this is to use a LEFT JOIN to get all the rows from the first table, along with any that happen to match in the second table, and then filter out the rows that actually do match, like so:
SELECT c.Course_ID
FROM
Course c
LEFT JOIN -- note the syntax: 'comma joins' are a bad idea.
Takes t ON
c.Course_ID = t.Course_ID -- at this point, you have all courses
WHERE t.Course_ID IS NULL -- this phrase means that none of the matching records will be returned.
Another note: as mentioned above, comma joins should be avoided. Instead, use the syntax I demonstrated above (INNER JOIN or LEFT JOIN, followed by the table name and an ON condition).

How do I remove "duplicate" rows from a view?

I have a view which was working fine when I was joining my main table:
LEFT OUTER JOIN OFFICE ON CLIENT.CASE_OFFICE = OFFICE.TABLE_CODE.
However I needed to add the following join:
LEFT OUTER JOIN OFFICE_MIS ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE
Although I added DISTINCT, I still get a "duplicate" row. I say "duplicate" because the second row has a different value.
However, if I change the LEFT OUTER to an INNER JOIN, I lose all the rows for the clients who have these "duplicate" rows.
What am I doing wrong? How can I remove these "duplicate" rows from my view?
Note:
This question is not applicable in this instance:
How can I remove duplicate rows?
DISTINCT won't help you if the rows have any columns that are different. Obviously, one of the tables you are joining to has multiple rows for a single row in another table. To get one row back, you have to eliminate the other multiple rows in the table you are joining to.
The easiest way to do this is to enhance your where clause or JOIN restriction to only join to the single record you would like. Usually this requires determining a rule which will always select the 'correct' entry from the other table.
Let us assume you have a simple problem such as this:
Person: Jane
Pets: Cat, Dog
If you create a simple join here, you would receive two records for Jane:
Jane|Cat
Jane|Dog
This is completely correct if the point of your view is to list all of the combinations of people and pets. However, if your view was instead supposed to list people with pets, or list people and display one of their pets, you hit the problem you have now. For this, you need a rule.
SELECT Person.Name, Pets.Name
FROM Person
LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
WHERE 0 = (SELECT COUNT(pets2.ID)
FROM Pets pets2
WHERE pets2.PersonID = pets1.PersonID
AND pets2.ID < pets1.ID);
What this does is apply a rule to restrict the Pets record in the join to to the Pet with the lowest ID (first in the Pets table). The WHERE clause essentially says "where there are no pets belonging to the same person with a lower ID value).
This would yield a one record result:
Jane|Cat
The rule you'll need to apply to your view will depend on the data in the columns you have, and which of the 'multiple' records should be displayed in the column. However, that will wind up hiding some data, which may not be what you want. For example, the above rule hides the fact that Jane has a Dog. It makes it appear as if Jane only has a Cat, when this is not correct.
You may need to rethink the contents of your view, and what you are trying to accomplish with your view, if you are starting to filter out valid data.
So you added a left outer join that is matching two rows? OFFICE_MIS.TABLE_CODE is not unique in that table I presume? you need to restrict that join to only grab one row. It depends on which row you are looking for, but you can do something like this...
LEFT OUTER JOIN OFFICE_MIS ON
OFFICE_MIS.ID = /* whatever the primary key is? */
(select top 1 om2.ID
from OFFICE_MIS om2
where CLIENT.REFERRAL_OFFICE = om2.TABLE_CODE
order by om2.ID /* change the order to fit your needs */)
If the secondd row has one different value than it is not really duplicate and should be included.
Instead of using DISTINCT, you could use a GROUP BY.
Group by all the fields that you want to be returned as unique values.
Use MIN/MAX/AVG or any other function to give you one result for fields that could return multiple values.
Example:
SELECT Office.Field1, Client.Field1, MIN(Office.Field1), MIN(Client.Field2)
FROM YourQuery
GROUP BY Office.Field1, Client.Field1
You could try using Distinct Top 1 but as Hunter pointed out, if there is if even one column is different then it should either be included or if you don't care about or need the column you should probably remove it. Any other suggestions would probably require more specific info.
EDIT: When using Distinct Top 1 you need to have an appropriate group by statement. You would really be using the Top 1 part. The Distinct is in there because if there is a tie for Top 1 you'll get an error without having some way to avoid a tie. The two most common ways I've seen are adding Distinct to Top 1 or you could add a column to the query that is unique so that sql would have a way to choose which record to pick in what would otherwise be a tie.