I am taking a databases course and have a question that I can't seem to answer correctly.
There are 3 tables:
country(code, iso_abbreviation, name)
area(name, city, country_code, latitude, longitude, elevation)
attraction(name, type, city, country_name, latitude, longitude, elevation)
Now, the question asks this: areas are found in both the attraction and area tables. List
(country_abbreviation, area_name, latitude, longitude, elevation)
for all the areas above 5000 feet elevation. As there may be some inconsistency between the area and attraction data, latitude, longitude and elevation might differ. In such cases, display both variants of the data.
So I came up with the query below, but I'm not sure it pairs them up correctly and it also doesn't split the data into two rows where one of the (latitude, longitude, elevation) elements is different.
SELECT country.iso_abbreviation as country_abbreviation, area.name as name,
area.latitude, area.longitude, area.elevation
FROM area JOIN country on country.code = area.country_code
JOIN attraction on area.name = attraction.name
WHERE area.elevation > 10000
UNION
SELECT DISTINCT country.iso_abbreviation as country_abbreviation, area.name,
attraction.latitude, attraction.longitude, attraction.elevation
FROM area JOIN country on country.code = area.state_code
JOIN attraction on area.name = attraction.name
WHERE attraction.elevation > 10000 ORDER BY country_abbreviation
;
Could someone please help me out with this?
This would do what you describe:
WITH cte AS (
SELECT c.iso_abbreviation AS country_abbreviation
, a.name, a.latitude, a.longitude, a.elevation
FROM area a
JOIN country c ON c.code = a.country_code
WHERE a.elevation > 5000
)
SELECT * FROM cte
UNION
SELECT c.country_abbreviation
, t.name, t.latitude, t.longitude, t.elevation
FROM cte c
JOIN attraction t USING (name) -- assuming name links area & attraction (?)
ORDER BY country_abbreviation, name -- (?)
But honestly, the table layout as well as the task you have been given seem unclear.
This uses a common table expression (CTE) to reuse the result of the first query.
UNION (as opposed to UNION ALL) removes complete duplicate rows automatically, so an area whose attraction data matches exactly appears only once.
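To see how the CTE and UNION interact, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The sample rows are hypothetical, and SQLite stands in for whatever database the course uses; the table and column names follow the schema in the question.

```python
import sqlite3

# Hypothetical sample rows; only the schema comes from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country(code TEXT, iso_abbreviation TEXT, name TEXT);
CREATE TABLE area(name TEXT, city TEXT, country_code TEXT,
                  latitude REAL, longitude REAL, elevation INTEGER);
CREATE TABLE attraction(name TEXT, type TEXT, city TEXT, country_name TEXT,
                        latitude REAL, longitude REAL, elevation INTEGER);
INSERT INTO country VALUES ('1', 'AA', 'Aland'), ('2', 'BB', 'Bland');
INSERT INTO area VALUES
  ('High Peak', 'X', '1', 10.0, 20.0, 6000),
  ('Twin Ridge', 'Y', '2', 30.0, 40.0, 7000),
  ('Low Hill', 'Z', '1', 50.0, 60.0, 100);
INSERT INTO attraction VALUES
  ('High Peak', 'mountain', 'X', 'Aland', 10.0, 20.0, 6000),   -- same as area
  ('Twin Ridge', 'mountain', 'Y', 'Bland', 30.0, 40.0, 7100),  -- elevation differs
  ('Low Hill', 'hill', 'Z', 'Aland', 50.0, 60.0, 100);
""")

rows = conn.execute("""
WITH cte AS (
  SELECT c.iso_abbreviation AS country_abbreviation,
         a.name, a.latitude, a.longitude, a.elevation
  FROM area a
  JOIN country c ON c.code = a.country_code
  WHERE a.elevation > 5000
)
SELECT * FROM cte
UNION
SELECT c.country_abbreviation, t.name, t.latitude, t.longitude, t.elevation
FROM cte c
JOIN attraction t USING (name)
ORDER BY country_abbreviation, name, elevation
""").fetchall()

for row in rows:
    print(row)
```

With this data, 'High Peak' shows up once because both tables agree, while 'Twin Ridge' shows up twice, once per elevation variant. 'Low Hill' is filtered out by the elevation condition in the CTE.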
I'm using MS Access for the following task (due to office restrictions). I'm quite new to SQL.
I have the following table:
I want to select all stores grouped by street, ZIP and place. But I only want to group them if the SumSquare (after GROUP BY) is < 1000. Rue de gare 2 should be grouped, while the Bahnhofstrasse 23 rows should stay as separate lines.
As far as I know, MS Access doesn't support a CASE statement, so my query looks like this:
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare,
FROM Table1
SWITCH (SumSquare > 1000, GROUP BY (Street, ZIP, Place))
I also tried:
GROUP BY
SWITCH (SumSquare > 1000, (Street, ZIP, Place))
But it keeps telling me I have a syntax error. Could someone please help me?
In Access, I would do this with several queries.
This would be easier to do if you had an id on the rows (such as an autonumber).
First query identifies the streets that should be summed.
query: SumTheseStreets
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare
FROM Table1
GROUP BY Street, ZIP, Place
HAVING sum(Square) < 1000
Note the HAVING clause: it works like a WHERE clause, but it is applied after the GROUP BY, so it can filter on aggregates such as SUM.
Second query identifies the other rows (notes on this one below):
query: StreetsNotSummed
SELECT
Table1.Street,
Table1.ZIP,
Table1.Place,
Table1.Square AS SumSquare
FROM Table1
LEFT JOIN SumTheseStreets ON Table1.Street = SumTheseStreets.Street AND Table1.ZIP = SumTheseStreets.ZIP AND Table1.Place = SumTheseStreets.Place
WHERE SumTheseStreets.Street IS NULL;
A couple of notes:
I've called the field SumSquare because I want it to be the same name as the SumSquare field in the first query
It uses the first query as one of the input "tables"
This uses a LEFT JOIN, which means "give me all of the rows in the first table (Table1) and, if any rows in the second table (SumTheseStreets) match, put those in as well",
but the WHERE clause then filters out the rows that DO match.
So this query lists only the streets that should NOT be summed.
So now you need a third query.
This simply includes all of the rows in both of those queries.
I'm not too sure on the Access syntax on this one, but there's a union query wizard if this isn't right.
Query: TheAnswerRequired
SELECT
Street,
ZIP,
Place,
SumSquare
FROM SumTheseStreets
UNION
SELECT
Street,
ZIP,
Place,
SumSquare
FROM StreetsNotSummed
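(This should be UNION ALL if identical rows must be kept: UNION removes duplicate rows, so two unsummed stores with the same street, ZIP, place and square would collapse into one.)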
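The difference between UNION and UNION ALL in the third query can be checked with a small sketch. It assumes SQLite (driven from Python) as a stand-in for Access, views as a stand-in for the saved queries, and hypothetical rows; the ZIP codes, places and square values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1(Street TEXT, ZIP TEXT, Place TEXT, Square INTEGER);
-- Hypothetical rows: the two Bahnhofstrasse stores are deliberately identical.
INSERT INTO Table1 VALUES
  ('Rue de gare 2', '1000', 'Lausanne', 300),
  ('Rue de gare 2', '1000', 'Lausanne', 400),
  ('Bahnhofstrasse 23', '8000', 'Zurich', 600),
  ('Bahnhofstrasse 23', '8000', 'Zurich', 600);

-- Views play the role of the saved Access queries.
CREATE VIEW SumTheseStreets AS
  SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
  FROM Table1
  GROUP BY Street, ZIP, Place
  HAVING SUM(Square) < 1000;

CREATE VIEW StreetsNotSummed AS
  SELECT t.Street, t.ZIP, t.Place, t.Square AS SumSquare
  FROM Table1 AS t
  LEFT JOIN SumTheseStreets AS s
    ON t.Street = s.Street AND t.ZIP = s.ZIP AND t.Place = s.Place
  WHERE s.Street IS NULL;
""")

def combine(operator):
    """Combine both views with the given set operator (UNION or UNION ALL)."""
    sql = f"""
        SELECT Street, ZIP, Place, SumSquare FROM SumTheseStreets
        {operator}
        SELECT Street, ZIP, Place, SumSquare FROM StreetsNotSummed
        ORDER BY Street, SumSquare
    """
    return conn.execute(sql).fetchall()

with_union = combine("UNION")          # identical unsummed rows collapse
with_union_all = combine("UNION ALL")  # every unsummed row is kept
print(len(with_union), len(with_union_all))
```

With these rows, UNION returns one Bahnhofstrasse line and UNION ALL returns two, which is why UNION ALL is the safer choice when stores at the same address can have the same size.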
Good luck.
You can use UNION ALL:
SELECT ts.*
FROM (SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
WHERE ts.SumSquare < 1000
UNION ALL
SELECT t1.*
FROM Table1 as t1 INNER JOIN
(SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
ON t1.Street = ts.Street AND t1.Zip = ts.Zip and t1.Place = ts.Place
WHERE ts.SumSquare >= 1000
I have two tables in BigQuery:
City list: table invertible-fin-XXX238.Reports.City
Station names: table invertible-fin-XXX238.Reports.Station
Most of the station names contain a city name. Now I want to extract the city from the Station table.
Here some example data:
City: Berlin
Stationname: inStore_Berlin_Alexanderplatz
Stationname: Berlin Schönefeld Airport
Stationname: Train Station Franchise Berlin
I tried the INSTR Function, but had no success (the INSTR works only with Legacy SQL and there I couldn’t use SUBSELECTS).
SELECT City,
INSTR((SELECT AdGroupName
FROM [invertible-fin-XXX238.Reports.City]),City) AS Match
FROM [invertible-fin-XXX238.Reports.Station]
Therefore I tried it with WHERE LIKE. Below the SQL Code:
SELECT a.City
FROM [invertible-fin-XXX238.Reports.City] a
CROSS JOIN [invertible-fin-XXX238.Reports.Station] b
WHERE b.Name LIKE '%' + a.City + '%'
GROUP BY a.City
But now the query is too computationally intensive, and I got back the error “Query exceeded resource limits for tier 1. Tier 18 or higher required.”
Could someone please help me write a more resource-friendly query?
Thanks in advance,
Philipp
Below are a few of many possible versions for BigQuery Standard SQL:
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON REPLACE(LOWER(station), LOWER(city), '') <> LOWER(station)
or
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(station) LIKE CONCAT('%',LOWER(city),'%')
You can remove the LOWER() calls if city names are spelled in the same case in both tables.
While the versions above look more straightforward, I would prefer the one below because it lets you control how the city is extracted from the station name. In the pattern r'([^ _]+)', list all the characters that you observe acting as delimiters in the station column. That way a city is matched only when it is a whole token, not when it is part of a longer name.
Of course, you should first check whether you even need to worry about this.
#standardSQL
WITH tokens AS (
SELECT token, station
FROM `invertible-fin-XXX238.Reports.Station` AS s,
UNNEST(REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)')) token
)
SELECT city, station
FROM tokens AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(city) = token
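The difference between substring matching and token matching can be illustrated outside BigQuery. The following is a rough Python sketch of the same idea, not the BigQuery implementation: `match_cities` is a hypothetical helper, and the station 'Berlinchen Mitte' is made up to show a city name embedded in a longer word.

```python
import re

def match_cities(stations, cities, delimiters=" _"):
    """Return (city, station) pairs where the city equals a whole token.

    Tokens are maximal runs of characters that are not delimiters,
    mirroring the pattern r'([^ _]+)' used in the query above.
    """
    pattern = re.compile(f"[^{re.escape(delimiters)}]+")
    wanted = {city.lower(): city for city in cities}
    matches = []
    for station in stations:
        for token in pattern.findall(station.lower()):
            if token in wanted:
                matches.append((wanted[token], station))
    return matches

stations = [
    "inStore_Berlin_Alexanderplatz",
    "Berlin Schönefeld Airport",
    "Train Station Franchise Berlin",
    "Berlinchen Mitte",  # hypothetical: contains "berlin" only as a substring
]

token_matches = match_cities(stations, ["Berlin"])
substring_matches = [s for s in stations if "berlin" in s.lower()]
print(len(token_matches), len(substring_matches))
```

The token version skips 'Berlinchen Mitte', while the substring test (the equivalent of LIKE '%berlin%') accepts it. One limitation of this design: with a space as a delimiter, a city name made of several words is never a single token, so it would need separate handling.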
I also wonder how the performance for a sub-query would be in this case. For instance:
WITH City AS(
SELECT 'Berlin' As Name UNION ALL
SELECT 'Hamburg'
),
StationNames AS(
SELECT 'inStore_Berlin_Alexanderplatz' AS Name UNION ALL
SELECT 'Berlin Schönefeld Airport' UNION ALL
SELECT 'Train Station Franchise Berlin' UNION ALL
SELECT 'Train Station Hamburg' UNION ALL
SELECT 'Train Station Pluton'
)
SELECT
Name StationName,
(SELECT Name FROM City c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM StationNames s
Or in your case:
SELECT
Name StationName,
(SELECT Name FROM `invertible-fin-XXX238.Reports.City` c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM `invertible-fin-XXX238.Reports.Station` s
I know that for most databases JOINs are generally assumed to perform better than subqueries, but BigQuery has many different optimization techniques for storing and querying data, so I was curious how different the performance would be in this case.
I have the following relations in my db:
Organization: information about political and economical organizations.
    name: the full name of the organization
    abbreviation: its abbreviation
isMember: memberships in political and economical organizations.
    organization: the abbreviation of the organization
    country: the code of the member country
geo_desert: geographical information about deserts
    desert: the name of the desert
    country: the country code where it is located
    province: the province of this country
My task is to retrieve the organizations whose members include the full set of countries with deserts. Such an organization may also have member countries without deserts. In other words, given the set of countries with deserts, every organization in the result must have all of them as members, plus an arbitrary number of other (non-desert) countries.
So far I have tried the following code, but it doesn't work.
WITH CountriesWithDeserts AS (
SELECT DISTINCT country
FROM dbmaster.geo_desert
), OrganizationsWithAllDesertMembers AS (
SELECT organization
FROM dbmaster.isMember AS ism
WHERE (
SELECT count(*)
FROM (
SELECT *
FROM CountriesWithDeserts
EXCEPT
SELECT country
FROM dbmaster.isMember
WHERE organization = ism.organization
)
) IS NULL
), OrganizationCode AS (
SELECT name, abbreviation
FROM dbmaster.Organization
)
SELECT oc.name AS Organization
FROM OrganizationCode AS oc, OrganizationsWithAllDesertMembers AS owadm
WHERE oc.abbreviation=owadm.organization;
UPD: DBMS says: "ism.organization is not defined"
I'm using DB2/LINUXX8664 9.7.0
Output should look like this:
NAME
--------------------------------------------------------------------------------
African, Caribbean, and Pacific Countries
African Development Bank
Agency for Cultural and Technical Cooperation
Andean Group
I find the easiest way to handle this is by using group by and having. You just want to focus on the deserts, so the rest of the countries don't matter.
select m.organization
from isMember m join
geo_desert d
on m.country = d.country
group by m.organization
having count(distinct m.country) = (select count(distinct country) from geo_desert);
The having clause simply counts the number of matching (i.e. desert) countries and checks that all are included.
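As a sanity check of the counting idea, here is a small sketch on an in-memory SQLite database driven from Python. SQLite stands in for DB2, the rows are hypothetical, and the dbmaster schema prefix is dropped; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE isMember(organization TEXT, country TEXT);
CREATE TABLE geo_desert(desert TEXT, country TEXT, province TEXT);
-- Hypothetical data: the desert countries are A and B (B has two deserts).
INSERT INTO geo_desert VALUES
  ('D1', 'A', 'p1'), ('D1', 'B', 'p2'), ('D2', 'B', 'p3');
INSERT INTO isMember VALUES
  ('ORG1', 'A'), ('ORG1', 'B'), ('ORG1', 'C'),  -- all desert countries + one extra
  ('ORG2', 'A'), ('ORG2', 'C'),                 -- misses B
  ('ORG3', 'C');                                -- no desert country at all
""")

organizations = conn.execute("""
SELECT m.organization
FROM isMember m
JOIN geo_desert d ON m.country = d.country
GROUP BY m.organization
HAVING COUNT(DISTINCT m.country) =
       (SELECT COUNT(DISTINCT country) FROM geo_desert)
ORDER BY m.organization
""").fetchall()
print(organizations)
```

The DISTINCT matters on both sides: country B joins to two desert rows, so a plain COUNT would overcount the matches for ORG1.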
Put it this way: you are looking for organizations for which there does not exist a desert country that they don't include.
select *
from organization o
where not exists
(
select country from geo_desert
except
select country from ismember
where organization = o.abbreviation
);
Here are two equivalent solutions:
First:
WITH CountriesWithDeserts AS (
SELECT DISTINCT country
FROM dbmaster.geo_desert
), OrganizationsWithAllDesertMembers AS (
SELECT ism.organization
FROM dbmaster.isMember AS ism
JOIN CountriesWithDeserts AS cwd
ON ism.country = cwd.country
GROUP BY ism.organization
HAVING count(ism.country) = (SELECT count(*) FROM CountriesWithDeserts)
), OrganizationCode AS (
SELECT name, abbreviation
FROM dbmaster.Organization
)
SELECT oc.name AS Organization
FROM OrganizationCode AS oc, OrganizationsWithAllDesertMembers AS owadm
WHERE oc.abbreviation=owadm.organization;
Second:
WITH CountriesWithDeserts AS (
SELECT DISTINCT country
FROM dbmaster.geo_desert
)
SELECT org.name AS Organization
FROM dbmaster.Organization AS org
WHERE NOT EXISTS (
SELECT *
FROM CountriesWithDeserts
EXCEPT
SELECT country
FROM dbmaster.isMember
WHERE organization = org.abbreviation
);
Schema is below:
Ships(name, yearLaunched, country, numGuns, gunSize, displacement)
Battles(ship, battleName, result)
where name and ship hold the same values. By this I mean that if 'Missouri' appears as a value for name, 'Missouri' also appears as a value for ship
(i.e. name = 'Missouri', ship = 'Missouri').
Now the question I have is what SQL statement would I make in order to list
the battleship amongst a list of battleships that has the largest amount
of guns (i.e. gunSize)
I tried:
SELECT name, max(gunSize)
FROM Ships
But this gave me the wrong result.
I then tried:
SELECT s.name
FROM Ships s,
(SELECT MAX(gunSize) as "Largest # of Guns"
FROM Ships
GROUP BY name) maxGuns
WHERE s.name = maxGuns.name
But then SQLite Admin gave me an error saying that no such column 'maxGuns' exists
even though I assigned it as an alias: maxGuns
Do any of you know what the correct query for this problem would be?
Thanks!
The problem in your query is that the subquery has no column named name.
Anyway, to find the largest amount of guns, just use SELECT MAX(gunSize) FROM Ships.
To get all ships with that number of guns, you need nothing more than a simple comparison with that value:
SELECT name
FROM Ships
WHERE gunSize = (SELECT MAX(gunSize)
FROM Ships)
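A quick way to try this is an in-memory SQLite database driven from Python. This is only a sketch with hypothetical rows ('Ship B' and 'Ship C' and all values are made up); it also contrasts the subquery with an ORDER BY ... LIMIT 1 query to show how ties are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ships(name TEXT, yearLaunched INTEGER, country TEXT,
                   numGuns INTEGER, gunSize INTEGER, displacement INTEGER);
-- Hypothetical rows: two ships tie for the largest gunSize.
INSERT INTO Ships VALUES
  ('Missouri', 1944, 'X', 9, 16, 45000),
  ('Ship B',   1941, 'Y', 9, 18, 65000),
  ('Ship C',   1942, 'Y', 9, 18, 65000);
""")

# Comparison with the maximum: returns every ship that reaches it.
all_largest = conn.execute("""
SELECT name
FROM Ships
WHERE gunSize = (SELECT MAX(gunSize) FROM Ships)
ORDER BY name
""").fetchall()

# Sort-and-cut: returns exactly one row, even when several ships tie.
first_only = conn.execute("""
SELECT name FROM Ships ORDER BY gunSize DESC LIMIT 1
""").fetchall()

print(all_largest, first_only)
```

The subquery version returns both tied ships, whereas the LIMIT 1 version picks just one of them.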
The error occurs because the derived table maxGuns exposes only one column, "Largest # of Guns"; it has no name column, so maxGuns.name cannot be resolved. (Aliasing the subquery itself is fine.) To identify the ship with the most guns you could try something like:
with cte as (select *
,ROW_NUMBER() over (order by s.gunsize desc) seq
from ships s )
select * from cte
where seq = 1
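Another approach is to sort by gun size and keep only the first row, which contains the ship with the largest guns. Note that TOP 1 is SQL Server syntax; SQLite uses LIMIT 1 at the end of the query instead. Both this and the ROW_NUMBER version return a single ship even when several tie for the maximum.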
select Top 1 *
from ships s
order by s.gunsize desc
WITH TAB_SHIPS(NAME, NUMGUNS,DISPLACEMENT) AS (SELECT NAME, NUMGUNS,DISPLACEMENT FROM SHIPS AS S
LEFT JOIN CLASSES AS C
ON S.CLASS=C.CLASS
WHERE C.NUMGUNS >=ALL(SELECT NUMGUNS FROM CLASSES C1 WHERE C1.DISPLACEMENT = C.DISPLACEMENT )
UNION
SELECT SHIP, NUMGUNS,DISPLACEMENT FROM OUTCOMES AS O
LEFT JOIN CLASSES AS C
ON C.CLASS=O.SHIP
WHERE C.NUMGUNS >=ALL(SELECT NUMGUNS FROM CLASSES C1 WHERE C1.DISPLACEMENT = C.DISPLACEMENT ) )
SELECT NAME FROM TAB_SHIPS
WHERE NUMGUNS IS NOT NULL
I am using the MySQL world.sql database. Exactly what is in it doesn't matter, but the part of the schema that matters to us looks like:
CREATE TABLE city (
name char(35),
country_code char(3),
population int(11)
);
CREATE TABLE country (
code char(3),
name char(52),
population int(11)
);
The query in question is, in English: "for each country, give me its name and population, along with the name and population of the city that has the highest ratio of its population to the country's population."
Currently I have the following SQL:
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
WHERE t.population / c.population = (
SELECT MAX(tt.population / c.population)
FROM city tt
WHERE t.country_code = tt.country_code
)
Currently the query takes about 10 minutes to run on my SQLite database. The world.sql database isn't large (4000-5000 rows?) so I'm guessing I'm doing something wrong here.
I currently don't have any sort of indexes or anything: the database is an empty database with this dataset (https://dl.dropboxusercontent.com/u/7997532/world.sql) entered into it. Could anyone give me any pointers as to what I need to fix to make it run in a reasonable amount of time?
EDIT: well here's another twist to the question:
This runs in <2 seconds
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
WHERE t.population * 1.0 / c.population = (
SELECT MAX(tt.population * 1.0 / c.population)
FROM city tt
WHERE tt.country_code = t.country_code
)
While this takes 10 minutes to run
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
AND t.population * 1.0 / c.population = (
SELECT MAX(tt.population * 1.0 / c.population)
FROM city tt
WHERE tt.country_code = t.country_code
)
Is the solution then simply to put as much as possible into the ON clause when I'm doing JOINs? It seems that in this case I can get away without an index if I do that...
For each country, the city with the highest ratio of population to its country's population is simply the city with the highest population, so try this:
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
And t.population =
(Select Max(population) from city
Where country_code = c.Code)
But this may still not improve performance much if you have no indexes. You need to put an index on country.code and on city.country_code.
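Putting the two suggestions together, a minimal sketch on an in-memory SQLite database (driven from Python) might look like this. The rows and the index names are hypothetical; the tables follow the schema in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (name char(35), country_code char(3), population int(11));
CREATE TABLE country (code char(3), name char(52), population int(11));
-- Hypothetical rows for illustration.
INSERT INTO country VALUES ('AAA', 'Aland', 1000), ('BBB', 'Bland', 500);
INSERT INTO city VALUES ('A1', 'AAA', 300), ('A2', 'AAA', 100), ('B1', 'BBB', 50);

-- Index names are made up; the indexed columns are the ones named above.
CREATE INDEX idx_country_code ON country(code);
CREATE INDEX idx_city_country_code ON city(country_code);
""")

largest_cities = conn.execute("""
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t ON t.country_code = c.code
WHERE t.population = (SELECT MAX(population)
                      FROM city
                      WHERE country_code = c.code)
ORDER BY c.name
""").fetchall()

for row in largest_cities:
    print(row)
```

Note that the outer comparison uses t.population: both tables have a population column, so an unqualified name would be ambiguous. Prefixing the query with EXPLAIN QUERY PLAN is one way to check whether SQLite actually uses the indexes.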
Ideally, I would start with indexes and then consider adding a computed field that pre-calculates t.population / c.population in a link table.
Then, for each country and city, you can look up its population ratio without computing it row by agonizing row (RBAR).
I suggest adding numeric primary keys to both tables and a foreign key on country_code in your city table. One of the benefits will be better performance because primary keys are indexed.
Edit starts here
Since the question doesn't ask you to provide the actual ratio, don't worry about trying to calculate it. The city with the highest population in the country will have the highest proportion of the country's population.