I have a table with cities and publishers in those cities.
How do I create a query that counts how many publishers are in each city, and then also shows the names of the publishers only for the city with the most publishers?
See if this works for you.
SELECT Sub2.City,
       Table1.Publisher
FROM (
    SELECT TOP 1 Sub.City, Sub.CountOfPublisher AS MaxOfCountOfPublisher
    FROM (
        SELECT Table1.City,
               Count(Table1.Publisher) AS CountOfPublisher
        FROM Table1
        GROUP BY Table1.City
    ) AS Sub
    ORDER BY Sub.CountOfPublisher DESC
) AS Sub2
INNER JOIN Table1
    ON Sub2.City = Table1.City
Note: Replace Table1 with your table name.
Explanation
So, we need to nest the aggregation in subqueries so that Access can reuse the result of the Count.
From the inside out:
We need a Count of Publishers for each City, and we want to Group By each City. This gives us some numbers we can work with per City.
Next, we select the TOP 1 record from those results, sorted by the Count of Publishers in descending order (highest to lowest). Sorting by the Count descending makes the first record the City with the highest count of Publishers. We then use that in our final step.
Finally, we want the City and the Publishers for the City with the most publishers. So far we have the City with the most Publishers, but we don't have a list of Publishers. To get those, we need to join the original table back on City.
This basically says: I have the City with the most Publishers, now give me all of the records from the table where the City matches it (INNER JOIN Table1 ON Sub2.City = Table1.City).
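If you want to sanity-check the intermediate steps, you can run the inner pieces on their own (again, just replace Table1 with your table name):

-- Step 1: publisher count per city
SELECT Table1.City, Count(Table1.Publisher) AS CountOfPublisher
FROM Table1
GROUP BY Table1.City;

-- Step 2: just the city with the highest count
SELECT TOP 1 Sub.City, Sub.CountOfPublisher
FROM (SELECT Table1.City, Count(Table1.Publisher) AS CountOfPublisher
      FROM Table1
      GROUP BY Table1.City) AS Sub
ORDER BY Sub.CountOfPublisher DESC;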
Related
I have a table bls_jobs with the following columns: city, state, occ_title, jobs_1000, and loc_quotient
I am trying to retrieve the highest loc_quotient for each city (each city has several occ_titles, and each occ_title has a loc_quotient)
Currently, I can use this query:
SELECT *
FROM bls_jobs
WHERE city = 'Hattiesburg'
ORDER BY loc_quotient DESC
LIMIT 1
Which does return what I'm looking for (the highest loc_quotient in the city, with each of the columns returned), but I'm struggling to figure out how to do this for all cities, so the output is just each city's highest loc_quotient along with its data from the other columns ...
Use distinct on:
SELECT DISTINCT ON (j.city) j.*
FROM bls_jobs j
ORDER BY j.city, j.loc_quotient DESC;
DISTINCT ON is a convenient Postgres extension. It returns the first row in each group, where groups are the keys in the DISTINCT ON () clause (and the ORDER BY is consistent with them).
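If you ever need the same result outside Postgres, a window-function version gives the same "first row per city" behaviour. This is just a sketch using the column names from the question:

-- Portable alternative: rank rows per city by loc_quotient and keep the top one
SELECT city, state, occ_title, jobs_1000, loc_quotient
FROM (
    SELECT j.*,
           ROW_NUMBER() OVER (PARTITION BY j.city ORDER BY j.loc_quotient DESC) AS rn
    FROM bls_jobs j
) ranked
WHERE rn = 1;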
I have two database tables:
Cities with columns:
Country_Code | City_Code | City_Name
Countries with columns
Country_Code | Country_Name
Based on a few characters entered by the user, the query checks the City_Name column and returns results to populate a City autocomplete box. The result needs to include the city code, city name, country code, and country name, hence the need for a join.
The query I am using is
SELECT TOP 10
ci.Country_Code, ci.City_Code, ci.City_Name, co.Country_Name
FROM
Cities ci
LEFT OUTER JOIN
Countries co ON ci.Country_Code = co.Country_Code
WHERE
ci.City_Name LIKE '#CityName'
ORDER BY
ci.City_Name
The results I get are correct, but the query takes a long time to complete. From what I understand, the two tables are joined first, then the WHERE clause kicks in to keep only the matching rows, which are ordered by City_Name and the top 10 returned.
My question is: is there a way to speed up the query? Could the WHERE clause be applied first and only then the join performed, or better still, the join performed only on the top 10 results? I tried putting my WHERE clause in the ON clause, but that gave wrong results.
EDIT : #CityName contains 2-3 chars entered by the user and then a '%'.
I'd suggest starting by adding a clustered index on Countries.Country_Code (also making it the primary key of the Countries table, if it is not already). The index keeps the table sorted, so lookups during the join are faster.
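Assuming SQL Server (the TOP keyword suggests it) and that Countries has no primary key yet, a sketch of that change could look like this; the constraint name is illustrative:

-- Make Country_Code the clustered primary key of Countries
ALTER TABLE Countries
    ADD CONSTRAINT PK_Countries PRIMARY KEY CLUSTERED (Country_Code);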
This appears to be your query:
SELECT TOP 10 ci.Country_Code, ci.City_Code, ci.City_Name, co.Country_Name
FROM Cities ci LEFT OUTER JOIN
Countries co
ON ci.Country_Code = co.Country_Code
WHERE ci.City_Name LIKE #CityName
ORDER BY ci.City_Name ;
Quotes should not be needed around #CityName.
I don't understand the LEFT JOIN. It suggests that there are cities without a valid Country_Code -- and that seems unlikely.
Assuming #CityName does not start with a wildcard (as suggested by your question), then this can make use of an index. I would suggest the following indexes:
cities(city_name, country_code)
countries(country_code, country_name)
The second is not needed if country_code is a primary key.
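In T-SQL, a sketch of those two indexes (the names are illustrative) would be:

-- Supports the LIKE 'abc%' seek on City_Name and carries Country_Code for the join
CREATE INDEX IX_Cities_CityName ON Cities (City_Name, Country_Code);

-- Only needed if Country_Code is not already the primary key of Countries
CREATE INDEX IX_Countries_CountryCode ON Countries (Country_Code, Country_Name);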
Given the schema here, I'm trying to understand and solve the 3 SQL problems below, as I'm confused:
1- Present a table giving the names of the countries with ≥ 50% urbanization
rates, their urbanization rates, and their per capita GDP. Note that
urbanization rate is the percentage of population living in cities. Do not
count cities with NULL values for population.
SELECT country.name,
       round(sum(city.population)/country.population, 3) AS urban,
       round(gdp/country.population, 3) AS gdppc
FROM city
INNER JOIN country ON code = country
INNER JOIN economy ON code = economy.country
WHERE city.population IS NOT NULL
GROUP BY country.name, country.population, economy.gdp
HAVING round(sum(city.population)/country.population, 3) >= 0.5
ORDER BY urban DESC;
In the above query, why do I need to include country.population and economy.gdp in the GROUP BY? If I try using just country.name in the GROUP BY, I get an error saying I should include the others.
2- Show organizations that have as members all the European countries with over 50 million people?
SELECT name
FROM organization
INNER JOIN (SELECT organization
FROM country
INNER JOIN encompasses
ON code = encompasses.country
INNER JOIN ismember
ON code = ismember.country
WHERE population > 50000000 AND continent = 'Europe'
GROUP BY organization
HAVING count(ismember.country) = (SELECT count(*)
FROM country
INNER JOIN encompasses
ON code = country
WHERE population > 50000000 AND continent = 'Europe'))
AS innerQuery
ON abbreviation = innerQuery.organization;
Why do I need the HAVING part above?
3- Insert a new organization called “Tivoli” and a trigger that says if Germany joins “Tivoli” then so too must the UK and France. Insert Germany into the “Tivoli” organization. Confirm proper behavior.
I tried the below script but it's not working, any advice please?
do $$
begin
IF(NOT EXISTS ( SELECT 1 FROM organization WHERE organization."name" = 'Tivoli' AND organization.country = 'D' ))
BEGIN
INSERT INTO organization VALUES ('Tivoli','Tivoli organization',NULL,'F',NULL,NULL);
INSERT INTO organization VALUES ('Tivoli','Tivoli organization',NULL,'GB',NULL,NULL);
END;
end $$
1)
You used country.population and economy.gdp in the SELECT outside of aggregate functions (COUNT(), AVG(), SUM(), ...), and you have a GROUP BY. Everything you select has to be either in the GROUP BY or inside an aggregate function.
2)
Because you were asked to show organizations that have ALL of the 50-million-plus-people countries as members. With HAVING, you check that the organization has the right number of those countries.
3)
organization."name" = 'Tivoli'
It's supposed to be :
organization.name
First of all, you should limit a question to one only, not 3. But here are some pointers for all 3:
In the above query, why do I need to include country.population and economy.gdp in the GROUP BY? If I try using just country.name in the GROUP BY, I get an error saying I should include the others.
This is a requirement. A group by country.name alone would work (in Postgres 9.1+) only if the other two fields are known to be functionally dependent on country.name. But probably country.name is not the primary key of the country table, so in theory it is possible to have two records in that table with the same name, but different population.
The rule is as follows:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
This is implemented since version 9.1.
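For illustration, assuming country.code is the primary key and city.country references it (as the joins in the original query suggest), grouping by the key alone is then enough, because name and population are functionally dependent on it:

-- Postgres 9.1+: name and population may appear unaggregated because code is the primary key
SELECT country.name,
       round(sum(city.population)/country.population, 3) AS urban
FROM city
INNER JOIN country ON country.code = city.country
WHERE city.population IS NOT NULL
GROUP BY country.code;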
Why do I need the HAVING part above?
Because a condition on an aggregate (count in this case) can only be evaluated after grouping, and thus cannot be expressed in the WHERE clause. In this case the HAVING clause makes sure that the organisation is present not just in some of the big European countries, but in all of them.
I tried the below script but it's not working, any advice please?
Without a proper database schema, it is not possible to provide you with the correct SQL, but from the ER diagram it seems that the organization table does not have a country field. Instead, the ismember table connects organizations with countries. You would insert only one organization row, but several ismember records (one per country involved).
It is better also to name the fields in your insert statement, so it is clear which value corresponds to which field.
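As a rough sketch only, assuming a Mondial-style schema where organization(abbreviation, name, ...) is keyed by abbreviation and ismember(country, organization, type) links country codes to organization abbreviations, the insert plus trigger could look like this (adjust column names and the type value to your actual schema):

-- Create the organization once, naming the columns explicitly
INSERT INTO organization (abbreviation, name)
VALUES ('Tivoli', 'Tivoli organization');

-- Trigger: if Germany ('D') joins Tivoli, the UK and France must join as well
CREATE OR REPLACE FUNCTION tivoli_cascade() RETURNS trigger AS $$
BEGIN
    IF NEW.organization = 'Tivoli' AND NEW.country = 'D' THEN
        INSERT INTO ismember (country, organization, type)
        SELECT v.c, 'Tivoli', 'member'
        FROM (VALUES ('GB'), ('F')) AS v(c)
        WHERE NOT EXISTS (SELECT 1 FROM ismember m
                          WHERE m.country = v.c AND m.organization = 'Tivoli');
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_tivoli_cascade
AFTER INSERT ON ismember
FOR EACH ROW EXECUTE PROCEDURE tivoli_cascade();

-- Confirm the behaviour: insert Germany, then check the members
INSERT INTO ismember (country, organization, type) VALUES ('D', 'Tivoli', 'member');
SELECT * FROM ismember WHERE organization = 'Tivoli';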
I have a table with customers, and a table with cities.
In the customers table, the city_id is related to the id_city of the cities table.
Other fields in the tables
customers: name, surname
cities: ext_code, description, address_code
The problem is that I have thousands of customer records related to cities in which the ext_code is not present.
The cities table, moreover, contains a lot of duplicated records; in each set of duplicates only one record has a valid ext_code.
The goal is to replace each customer's city_id with an id_city that has a valid ext_code. The only fields to evaluate to group cities are address_code or description.
Any suggestion?
If "the only fields to evaluate to group cities are address_code or description", then that's what you should use to join your data on, while filtering out the records without a valid ext_code that you mentioned.
Here is a crude way to do a one time update of customer.
You will need to tweak the code to make it work with your system, but that should be easy enough.
UPDATE Customer
SET CityID =
(
--This bit will find cities that look like the one we already have,
SELECT TOP 1 CityID
FROM City AS X
WHERE X.AddressCode = City.AddressCode
OR X.Description = City.Description
ORDER BY X.ExtCode DESC --This puts nulls last!
)
FROM Customer
INNER JOIN City ON City.CityID = Customer.CityID
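Before running the update, it can help to preview the proposed mapping with a plain SELECT using the same matching logic; CustomerID below is just an assumed identifier column, so substitute your own:

-- Preview: which replacement CityID would each customer get?
SELECT Customer.CustomerID, Customer.CityID AS OldCityID,
       (SELECT TOP 1 X.CityID
        FROM City AS X
        WHERE X.AddressCode = City.AddressCode
           OR X.Description = City.Description
        ORDER BY X.ExtCode DESC) AS NewCityID
FROM Customer
INNER JOIN City ON City.CityID = Customer.CityID;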
What's the best way to do this when looking for distinct rows?
SELECT DISTINCT name, address
FROM table;
I still want to return all fields, i.e. address1, city, etc., but not include them in the DISTINCT row check.
Then you have to decide what to do when there are multiple rows with the same value for the column you want the distinct check to run against, but with different values in the other columns. In that case, how does the query processor know which of the multiple values in the other columns to output? If you don't care, just write a GROUP BY on the distinct column, with MIN() or MAX() on all the other ones.
EDIT: I agree with the comments from others that, as long as you have multiple dependent columns in the same table (e.g. Address1, Address2, City, State), this approach is going to give you mixed (and therefore inconsistent) results. If each column attribute in the table is independent (if addresses are all in an Address table and only an AddressId is in this table), then it's not as significant an issue, because at least all the columns from a join to the Address table will produce data for the same address. But you are still getting a more or less random selection from the set of multiple addresses...
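A sketch of that GROUP BY approach, using the column names mentioned in the thread (your_table is a placeholder):

-- One row per (name, address); the remaining columns are collapsed with MAX()
SELECT name, address,
       MAX(address1) AS address1,
       MAX(city)     AS city,
       MAX(state)    AS state
FROM your_table
GROUP BY name, address;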
This will not mix and match your city, state, etc., and should even give you the last one added:
select b.*
from (
select max(id) id, Name, Address
from table a
group by Name, Address) as a
inner join table b
on a.id = b.id
When you have a mixed set of fields, some of which you want to be DISTINCT and others that you just want to appear, you require an aggregate query rather than DISTINCT. DISTINCT is only for returning single copies of identical fieldsets. Something like this might work:
SELECT name,
GROUP_CONCAT(DISTINCT address) AS addresses,
GROUP_CONCAT(DISTINCT city) AS cities
FROM the_table
GROUP BY name;
The above will get one row for each name. addresses contains a comma-delimited string of all the addresses for that name, each listed once; cities does the same for the cities.
However, I don't see how the results of this query are going to be useful. It will be impossible to tell which address belongs to which city.
If, as is often the case, you are trying to create a query that will output rows in the format you require for presentation, you're much better off accepting multiple rows and then processing the query results in your application layer.
I don't think you can do this because it doesn't really make sense.
name | address | city | etc...
abc | 123 | def | ...
abc | 123 | hij | ...
If you were to include city but not have it as part of the DISTINCT clause, the value of city would be unpredictable unless you did something like MAX(city).
You can do
SELECT Name, Address, MAX(Address1), MAX(City)
FROM table
GROUP BY Name, Address
Use #JBrooks answer below. He has a better answer.
Return all Fields and Distinct Rows
If you're using SQL Server 2005 or above you can use the RowNumber function. This will get you the row with the lowest ID for each name. If you want to 'group' by more columns, add them in the PARTITION BY section of the RowNumber.
SELECT id, Name, Address, ...
FROM (SELECT id, Name, Address, ...,
             ROW_NUMBER() OVER (PARTITION BY Name ORDER BY id) AS RowNo
      FROM table) sub
WHERE RowNo = 1