Subquery in a subquery - SQL

How do I create a subquery inside another subquery?
This is what I have; obviously it doesn't work:
"Select `region` from `region` where `regioncode` IN
(Select `city` from `city` where `citycode` IN
(Select `citycode` from `postcode` where `postcode` LIKE '%" + txtPostcode.Text + "%'))"
Sorry for my amateur questions by the way.
Table structure:
Table region:
regioncode region provincecode netnumber
Table city:
citycode city from to regioncode
"from" and "to" give the postcode range of the city.
Table postcode:
postcode street from to citycode
Here "from" and "to" give the range of the address numbers.
Table province:
provincecode province
So I should be able to get the region that the postal code belongs to. The postal code is entered in a textbox.
I can get the street, city and address number range, but the region doesn't work for me.
Also, sorry for my broken English and my amateur questions.

You might consider using JOINs instead of multiple subqueries.
SELECT region from region r
JOIN city c ON c.regioncode = r.regioncode
JOIN postcode pc ON pc.citycode = c.citycode
WHERE pc.postcode LIKE '%test%'
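This does what you are trying to do with the nested subqueries, with one difference: the JOIN returns a region once for every matching postcode row, so use SELECT DISTINCT if you only want each region once.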
EDIT: I've updated the query above so that it no longer goes through province, which isn't needed since city has a regioncode and you only want the region. You could still use subqueries; you just need to select regioncode from the city table instead of city:
SELECT region FROM region WHERE regioncode IN
(SELECT regioncode FROM city WHERE citycode IN
(SELECT citycode FROM postcode WHERE postcode LIKE '%test%')
)
I'm just using test as a placeholder for whatever would be in txtPostcode.Text.
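A minimal, runnable sketch of both forms, using Python's built-in sqlite3 module as a throwaway database. The table contents are hypothetical and only the columns the query needs are created. It also passes the textbox value as a bound parameter instead of concatenating txtPostcode.Text into the SQL string, which avoids SQL injection. SQLite concatenates strings with || and uses ? placeholders; other databases differ (MySQL uses CONCAT(), SQL Server uses +), and the placeholder syntax depends on your driver.

```python
import sqlite3

# Hypothetical sample data: only the columns this query needs are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region   (regioncode INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE city     (citycode INTEGER PRIMARY KEY, city TEXT, regioncode INTEGER);
CREATE TABLE postcode (postcode TEXT, street TEXT, citycode INTEGER);
INSERT INTO region VALUES (1, 'North'), (2, 'South');
INSERT INTO city VALUES (10, 'Alpha', 1), (20, 'Beta', 2);
INSERT INTO postcode VALUES
  ('1234AB', 'Main St', 10),
  ('1234AC', 'High St', 10),
  ('5678CD', 'Long Rd', 20);
""")

user_input = "1234"  # stands in for txtPostcode.Text

# JOIN version: one output row per matching postcode row, hence DISTINCT.
join_sql = """
SELECT DISTINCT r.region
FROM region r
JOIN city c      ON c.regioncode = r.regioncode
JOIN postcode pc ON pc.citycode = c.citycode
WHERE pc.postcode LIKE '%' || ? || '%'
"""

# Nested subqueries: the middle level selects regioncode, not city.
subquery_sql = """
SELECT region FROM region WHERE regioncode IN
  (SELECT regioncode FROM city WHERE citycode IN
    (SELECT citycode FROM postcode WHERE postcode LIKE '%' || ? || '%'))
"""

# The user's text is a bound parameter, never pasted into the SQL string.
join_rows = conn.execute(join_sql, (user_input,)).fetchall()
subquery_rows = conn.execute(subquery_sql, (user_input,)).fetchall()
duplicate_rows = conn.execute(join_sql.replace("DISTINCT ", ""), (user_input,)).fetchall()
print(join_rows, subquery_rows, len(duplicate_rows))
```

Both queries return North; without DISTINCT the JOIN version returns it twice, once per matching postcode row.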

Related

Subquery yields different results when used alone

I have to write a query across two tables, country and city. The goal is to get every district and that district's population for every country. As the district is just an attribute of each city, I have to sum the populations of all cities belonging to a district.
My query so far looks like this:
SELECT country.name, country.population, array_agg(
(SELECT (c.district, sum(city.population))
FROM city GROUP BY c.district))
AS districts
FROM country
FULL OUTER JOIN city c ON country.code = c.countrycode
GROUP BY country.name, country.population;
The result:
name | population | districts
---------------------------------------------+------------+------------------------------------------------------------------------------------------------------------------
Afghanistan | 22720000 | {"(Balkh,1429559884)","(Qandahar,1429559884)","(Herat,1429559884)","(Kabol,1429559884)"}
Albania | 3401200 | {"(Tirana,1429559884)"}
Algeria | 31471000 | {"(Blida,1429559884)","(Béjaïa,1429559884)","(Annaba,1429559884)","(Batna,1429559884)","(Mostaganem,1429559884)"
American Samoa | 68000 | {"(Tutuila,1429559884)","(Tutuila,1429559884)"}
So apparently it sums the populations of all cities in the world. I need to limit that somehow to each district.
But if I run the subquery alone as
SELECT (city.district, sum(city.population)) FROM city GROUP BY city.district;
it gives me the districts with their population:
row
----------------------------------
(Bali,435000)
(,4207443)
(Dnjestria,194300)
(Mérida,224887)
(Kochi,324710)
(Qazvin,291117)
(Izmir,2130359)
(Meta,273140)
(Saint-Denis,131480)
(Manitoba,618477)
(Changhwa,354117)
I realized it has something to do with the alias I use when joining. I used it for convenience, but it seems to have real consequences, because if I don't use it, I get the error
more than one row returned by a subquery used as an expression
Also, if I use
sum(c.population)
in the subquery it won't execute because
aggregate function calls cannot be nested
Using this alias when joining apparently changes a lot.
I hope someone can shed some light on that.
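Solved it myself.
The problem was the alias. Inside the subquery, c refers to the outer query's city row, so c.district is a single constant for each outer row; GROUP BY c.district therefore puts the whole city table into one group, and sum(city.population) returns the world total. Without the alias, the subquery groups by its own city.district and returns one row per district, which a subquery used as an expression may not do. And sum(c.population) contains only outer-level columns, so PostgreSQL treats it as an aggregate of the outer query, which ends up nested inside array_agg.
Window functions are the most convenient method for this kind of task. Partition by country as well as district, because districts in different countries can share a name:
SELECT DISTINCT
country.name
, country.population
, city.district
, sum(city.population) OVER (PARTITION BY country.code, city.district)
AS district_population
, sum(city.population) OVER (PARTITION BY country.code, city.district)/ CAST(country.population as float)
AS district_share
FROM
country JOIN city ON country.code = city.countrycode
;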
But it also works with subselects:
SELECT DISTINCT
country.name
, country.population
, city.district
,(
SELECT
sum(ci.population)
FROM
city ci
WHERE ci.district = city.district
AND ci.countrycode = city.countrycode
) AS district_population
,(
SELECT
sum(ci2.population)/ CAST(country.population as float)
FROM
city ci2
WHERE ci2.district = city.district
AND ci2.countrycode = city.countrycode
) AS district_share
FROM
country JOIN city ON country.code = city.countrycode
ORDER BY
country.name
, country.population
;
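The window query's PARTITION BY decides which city rows are summed together. A small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later) compares partitioning by district alone with partitioning by country and district. The two countries and their cities are hypothetical; both countries have a district named "Capital". It also includes a plain GROUP BY, another way to get one row per country and district without DISTINCT.

```python
import sqlite3

# Hypothetical data: both countries have a district called "Capital".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT, population INTEGER);
CREATE TABLE city (name TEXT, countrycode TEXT, district TEXT, population INTEGER);
INSERT INTO country VALUES ('AAA', 'Country A', 1000), ('BBB', 'Country B', 2000);
INSERT INTO city VALUES
  ('A1', 'AAA', 'Capital', 100),
  ('A2', 'AAA', 'Capital', 50),
  ('B1', 'BBB', 'Capital', 300),
  ('B2', 'BBB', 'Coast', 200);
""")

def district_totals(partition):
    # Window-function query with a configurable PARTITION BY list
    # (partition is a fixed string from this script, not user input).
    sql = f"""
    SELECT DISTINCT country.name, city.district,
           sum(city.population) OVER (PARTITION BY {partition}) AS district_population
    FROM country JOIN city ON country.code = city.countrycode
    ORDER BY country.name, city.district
    """
    return conn.execute(sql).fetchall()

by_district_only = district_totals("city.district")
by_country_and_district = district_totals("country.code, city.district")

# Plain GROUP BY: one row per country and district, no DISTINCT needed.
grouped = conn.execute("""
    SELECT country.name, city.district, sum(city.population)
    FROM country JOIN city ON country.code = city.countrycode
    GROUP BY country.code, country.name, city.district
    ORDER BY country.name, city.district
""").fetchall()
print(by_district_only)
print(by_country_and_district)
```

Partitioning by district alone reports 450 for both "Capital" districts; adding country.code gives 150 and 300, the same totals as the GROUP BY.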

Find string from table in cell in BigQuery --> Query exceeded resource limits

I have two tables in BigQuery:
City List: Table: invertible-fin-XXX238.Reports.City
StationNames: Table: invertible-fin-XXX238.Reports.Station
Most of the station names contain city names. Now I want to extract the city from the Station table.
Here is some example data:
City: Berlin
Stationname: inStore_Berlin_Alexanderplatz
Stationname: Berlin Schönefeld Airport
Stationname: Train Station Franchise Berlin
I tried the INSTR function, but had no success (INSTR works only in Legacy SQL, and there I couldn't use subselects).
SELECT City,
INSTR((SELECT AdGroupName
FROM [invertible-fin-XXX238.Reports.City]),City) AS Match
FROM [invertible-fin-XXX238.Reports.Station]
So I tried it with WHERE ... LIKE instead. Here is the SQL code:
SELECT a.City
FROM [invertible-fin-XXX238.Reports.City] a
CROSS JOIN [invertible-fin-XXX238.Reports.Station] b
WHERE b. Name LIKE '%' + a.City + '%'
GROUP BY a.City
But now the query is too computationally intensive, and I get the error “Query exceeded resource limits for tier 1. Tier 18 or higher required.”
Could someone please help me write a more resource-friendly query?
Thanks in advance,
Philipp
Below are a few of many possible versions for BigQuery Standard SQL:
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON REPLACE(LOWER(station), LOWER(city), '') <> LOWER(station)
or
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(station) LIKE CONCAT('%',LOWER(city),'%')
You can remove the LOWER() calls if city names are spelled with the same case in both tables.
While the versions above look more straightforward, I would prefer the one below, because it lets you control how the city is extracted from the station name: in the pattern r'([^ _]+)' you should list every character you observe being used as a delimiter in the station column. This way a city is matched only as a whole token, not when it is part of a longer name.
Of course, you should first check whether you even need to worry about this.
#standardSQL
WITH tokens AS (
SELECT token, station
FROM `invertible-fin-XXX238.Reports.Station` AS s,
UNNEST(REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)')) token
)
SELECT city, station
FROM tokens AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(city) = token
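The difference between the LIKE join and the token join can be sketched in plain Python with the re module. The station names are the examples from the question plus one hypothetical name, "Hamburger Hof", which contains Hamburg only as part of a longer word. This illustrates the matching semantics only, not BigQuery's execution: the substring version has to test every station/city pair, like the CROSS JOIN with LIKE, while the token version turns the match into an equality comparison, which databases can generally execute far more cheaply.

```python
import re

# City list and station names from the question, plus one hypothetical
# station ("Hamburger Hof") where the city name is only part of a word.
cities = ["Berlin", "Hamburg"]
stations = [
    "inStore_Berlin_Alexanderplatz",
    "Berlin Schönefeld Airport",
    "Train Station Franchise Berlin",
    "Hamburger Hof",
]

def substring_matches(stations, cities):
    # Like ON LOWER(station) LIKE CONCAT('%', LOWER(city), '%'):
    # every station/city pair is tested.
    return [(c, s) for s in stations for c in cities if c.lower() in s.lower()]

def token_matches(stations, cities):
    # Like the REGEXP_EXTRACT_ALL + UNNEST version: split each station on
    # spaces and underscores, then match tokens to cities by equality.
    by_lower = {c.lower(): c for c in cities}
    return [
        (by_lower[token], s)
        for s in stations
        for token in re.findall(r"([^ _]+)", s.lower())
        if token in by_lower
    ]

print(substring_matches(stations, cities))
print(token_matches(stations, cities))
```

One caveat of the token approach: a city whose name contains a delimiter, such as the spaces in "Frankfurt am Main", can never equal a single token.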
I also wonder how a subquery would perform in this case. For instance:
WITH City AS(
SELECT 'Berlin' As Name UNION ALL
SELECT 'Hamburg'
),
StationNames AS(
SELECT 'inStore_Berlin_Alexanderplatz' AS Name UNION ALL
SELECT 'Berlin Schönefeld Airport' UNION ALL
SELECT 'Train Station Franchise Berlin' UNION ALL
SELECT 'Train Station Hamburg' UNION ALL
SELECT 'Train Station Pluton'
)
SELECT
Name StationName,
(SELECT Name FROM City c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM StationNames s
Or in your case:
SELECT
Name StationName,
(SELECT Name FROM `invertible-fin-XXX238.Reports.City` c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM `invertible-fin-XXX238.Reports.Station` s
I know it's common wisdom for most databases that a JOIN performs better than a subquery, but BigQuery has many different optimization techniques for storing and querying data, so I was curious how different the performance would be in this case.

Speeding up a slow SQL query

I am using the MySQL world.sql database. Exactly what is in it doesn't matter, but the part of the schema that matters here looks like this:
CREATE TABLE city (
name char(35),
country_code char(3),
population int(11)
);
CREATE TABLE country (
code char(3),
name char(52),
population int(11)
);
The query in question is, in English, "for each country, give me its name and population, along with the name and population of the city that has the highest ratio of its population to the country's population".
Currently I have the following SQL:
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
WHERE t.population / c.population = (
SELECT MAX(tt.population / c.population)
FROM city tt
WHERE t.country_code = tt.country_code
)
Currently the query takes about 10 minutes to run on my SQLite database. The world.sql database isn't large (4000-5000 rows?), so I'm guessing I'm doing something wrong here.
I don't currently have any indexes: the database is an empty database with this dataset (https://dl.dropboxusercontent.com/u/7997532/world.sql) loaded into it. Could anyone give me some pointers on what I need to fix to make it run in a reasonable amount of time?
EDIT: here's another twist to the question:
This runs in under 2 seconds:
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
WHERE t.population * 1.0 / c.population = (
SELECT MAX(tt.population * 1.0 / c.population)
FROM city tt
WHERE tt.country_code = t.country_code
)
While this takes 10 minutes to run:
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
AND t.population * 1.0 / c.population = (
SELECT MAX(tt.population * 1.0 / c.population)
FROM city tt
WHERE tt.country_code = t.country_code
)
Is the solution, then, simply to stuff as much as possible into the ON clause when I'm doing JOINs? It seems that in this case I can get away without an index if I do that...
For each country, the city with the highest ratio of its population to its country's population is simply the city with the highest population (the denominator is the same for every city in that country), so try this:
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
AND t.population =
(SELECT MAX(population) FROM city
WHERE country_code = c.code)
But this may still not improve performance much if you have no indexes. You need to put an index on country.code and on city.country_code.
Ideally, I would start with indexes and then consider adding a computed field that pre-calculates t.population / c.population in a link table,
so that for each country and city you can look up its population ratio instead of computing it RBAR (row by agonizing row).
I suggest adding numeric primary keys to both tables and a foreign key on country_code in your city table. One of the benefits will be better performance because primary keys are indexed.
EDIT: Since the question doesn't ask you to provide the actual ratio, don't worry about calculating it. The city with the highest population in a country will have the highest proportion of that country's population.
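A small sketch with Python's sqlite3 module and hypothetical data, showing the indexes suggested above and the rewritten largest-city query. It also shows a separate pitfall in the question's first query: in SQLite, dividing one integer by another truncates, so t.population / c.population is 0 for every city smaller than its country, every city ties with the maximum, and the WHERE clause matches them all. Multiplying by 1.0 first, as in the question's edit, forces real division.

```python
import sqlite3

# Hypothetical two-country dataset using the question's column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT, name TEXT, population INTEGER);
CREATE TABLE city (name TEXT, country_code TEXT, population INTEGER);
-- Indexes on the join key and on the column the correlated subquery filters by.
CREATE INDEX idx_country_code ON country (code);
CREATE INDEX idx_city_country_code ON city (country_code);
INSERT INTO country VALUES ('AAA', 'Country A', 1000), ('BBB', 'Country B', 5000);
INSERT INTO city VALUES
  ('A-big', 'AAA', 400), ('A-small', 'AAA', 100),
  ('B-big', 'BBB', 900), ('B-small', 'BBB', 300);
""")

# Integer / integer truncates, so every ratio here comes out as 0 ...
int_ratios = [r for (r,) in conn.execute("""
    SELECT t.population / c.population
    FROM city t JOIN country c ON t.country_code = c.code
    ORDER BY t.name""")]
# ... while multiplying by 1.0 first forces real division.
real_ratios = [r for (r,) in conn.execute("""
    SELECT t.population * 1.0 / c.population
    FROM city t JOIN country c ON t.country_code = c.code
    ORDER BY t.name""")]

# The rewritten query: compare populations directly, no ratio needed.
largest = conn.execute("""
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t ON t.country_code = c.code
AND t.population = (SELECT MAX(population) FROM city WHERE country_code = c.code)
ORDER BY c.name
""").fetchall()
print(int_ratios, real_ratios, largest)
```

The rewritten query returns A-big and B-big, the largest city of each hypothetical country.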

Query for Counting number of orders by UK postcode

I have a table of orders placed by customers, and I want to check which part of the country the orders have historically come from. I can only tell this from the postcode: for instance, an order with a postcode starting SK... is from Stockport, and a postcode starting M... means the order is from Manchester. Is it possible to write a query that counts the orders by postcode?
Some of the fields of the Order table:
OrderNumber OGUID custID firstname last name address postcode email authorisation date etc...
Any suggestion or assistance will be appreciated.
Thanks
Here is a way that works, but it can get too long for a large list of postcode prefixes. I will try to find a way around that problem.
SELECT
CASE
WHEN postcode LIKE 'SK%' THEN 'SK'
WHEN postcode LIKE 'M%' THEN 'M'
END AS group_by_value
, COUNT(*) AS group_by_count
FROM [Table] a
GROUP BY
CASE
WHEN postcode LIKE 'SK%' THEN 'SK'
WHEN postcode LIKE 'M%' THEN 'M'
END
If you have a table that contains the city code and city name, then you might be able to use something like the following which joins your orders table to the codes using a LIKE:
select o.postcode,
c.city,
count(c.code) over(partition by c.code) Total
from orders o
inner join codes c
on o.postcode like c.code+'%'
You can use GROUP BY to get the total number of orders in each postcode:
select postcode, count(postcode) TotalOrdersByPostCode
from orders
group by postcode
If you want the City included, then you can also GROUP BY city:
select city, postcode, count(postcode) TotalOrdersByPostCode
from orders
group by city, postcode
select count(1) over(partition by postcode) as countByPostcode, othercolumnhere
from [Order]
Have you tried something like this? The area part of a UK postcode is its first one or two letters, followed by a digit, so this will give you those leading letters:
select substring(postcode, 1, patindex('%[0-9]%', postcode) - 1), count(*)
from [Order]
group by substring(postcode, 1, patindex('%[0-9]%', postcode) - 1)
Then you'll have to decode M into Manchester, W into West London, GU into Guildford, etc.
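The same area extraction can be sketched in Python to check the logic before writing the T-SQL. The postcodes are hypothetical, and the regex assumes, as the answer above does, that the area is the run of letters before the first digit. The last line shows why a bare prefix test such as LIKE 'M%' needs care: the ME area also starts with M.

```python
import re
from collections import Counter

# Hypothetical order postcodes.
postcodes = ["SK1 3AB", "M1 1AE", "M20 2XY", "ME4 4QU", "GU1 1AA"]

def postcode_area(postcode):
    # Leading letters before the first digit, like
    # substring(postcode, 1, patindex('%[0-9]%', postcode) - 1) in T-SQL.
    match = re.match(r"[A-Z]+", postcode.strip().upper())
    return match.group(0) if match else None

area_counts = Counter(postcode_area(p) for p in postcodes)

# A bare prefix test like LIKE 'M%' also counts other areas starting with M.
naive_m_count = sum(1 for p in postcodes if p.upper().startswith("M"))
print(area_counts, naive_m_count)
```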

Is it Possible to Use IF/Else in SQL?

Is it possible to use if/else in SQL? Say I have a table called supplier with columns sid (primary key), sname and city.
Then I want to:
select sid from supplier where city="taipei", if that result is not empty,
or else select sid from supplier where city="tainan"
Yes, you can. I don't know about other DBMSs, but I have used this in Microsoft SQL Server in a stored procedure, like this:
IF EXISTS
(SELECT [sid] FROM [supplier] WHERE [city] = 'taipei')
select sid from supplier where city = 'taipei' -- your true-condition query
ELSE
select sid from supplier where city = 'tainan'
In MySQL this is also possible, but only inside a stored program such as a stored procedure, for example:
IF EXISTS(SELECT * FROM tbl_name WHERE category_code = 'some-category-code') THEN UPDATE tbl_name SET active = '0' WHERE category_code = 'some-category-code'; END IF;
It was unclear what you want to do (I leave my previous hypotheses below).
You want to associate a priority with your suppliers, so that the one in Taipei is selected, but if it is unavailable, the one in Tainan is selected instead.
In this specific case you can just use:
SELECT sid FROM supplier WHERE city = (
SELECT MAX(city) FROM supplier WHERE city IN ('Taipei', 'Tainan')
);
The inner sub-SELECT will retrieve Taipei or, if unavailable, Tainan.
This uses the fact that Taipei is lexicographically greater than Tainan, but if you wanted a more flexible solution, MAX would not work. In that case you would change the subselect to sort cities in order of desirability (missing cities are of course undesirable) and then fetch the one most desirable:
SELECT sid FROM supplier WHERE city = (
SELECT city FROM supplier ORDER BY CASE
WHEN city = 'Taipei' THEN 1
WHEN city = 'Tainan' THEN 2
WHEN city = 'New York' THEN 3
ELSE 4
END
LIMIT 1
);
The subselect now will retrieve first Taipei, but missing Taipei it will get to Tainan and so on.
Note that if you want only one SID, you can do it much more simply:
SELECT sid FROM supplier ORDER BY CASE
WHEN city = 'Taipei' THEN 1
WHEN city = 'Tainan' THEN 2
WHEN city = 'New York' THEN 3
ELSE 4
END
LIMIT 1
This will retrieve all suppliers, but the one from Taipei, if available, will come out first; and the LIMIT 1 will truncate the response to that first row.
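A quick way to check the priority subselect is to run it against hypothetical supplier tables, one with Taipei suppliers and one without. The sketch uses Python's sqlite3 module, whose SQLite dialect accepts ORDER BY CASE ... LIMIT 1 inside a scalar subquery; the supplier rows and the helper name pick_suppliers are invented for illustration.

```python
import sqlite3

# Priority subselect: Taipei first, then Tainan, then New York, then anything else.
PRIORITY_SQL = """
SELECT sid FROM supplier WHERE city = (
  SELECT city FROM supplier ORDER BY CASE
    WHEN city = 'Taipei' THEN 1
    WHEN city = 'Tainan' THEN 2
    WHEN city = 'New York' THEN 3
    ELSE 4
  END
  LIMIT 1
)
ORDER BY sid
"""

def pick_suppliers(rows):
    # A fresh in-memory table for each hypothetical scenario.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE supplier (sid INTEGER PRIMARY KEY, sname TEXT, city TEXT)")
    conn.executemany("INSERT INTO supplier VALUES (?, ?, ?)", rows)
    return [sid for (sid,) in conn.execute(PRIORITY_SQL)]

with_taipei = pick_suppliers([(1, "S1", "Tainan"), (2, "S2", "Taipei"), (3, "S3", "Taipei")])
without_taipei = pick_suppliers([(1, "S1", "Tainan"), (4, "S4", "Kaohsiung")])
empty_table = pick_suppliers([])
print(with_taipei, without_taipei, empty_table)  # [2, 3] [1] []
```

With Taipei present, both Taipei suppliers are returned; without it, the query falls back to Tainan. On an empty table the subselect yields NULL, so no rows come back.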
The solutions below do not apply
This will get sid from supplier where city is Taipei or Tainan (which of course means that city is not empty!):
SELECT sid FROM supplier WHERE city IN ('Taipei', 'Tainan');
This will get sid from supplier as above, provided sid is not empty:
SELECT sid FROM supplier WHERE city IN ('Taipei', 'Tainan') AND sid IS NOT NULL;
This will get sid from supplier as above, and replace sid if it is empty.
SELECT CASE WHEN sid IS NULL then 'Empty' ELSE sid END AS sid
FROM supplier WHERE city IN ('Taipei', 'Tainan');
Maybe you should provide two or three sample rows with the expected results.
Edit: sorry, I see now that sid is a primary key, which means it should never be empty. This means that cases 2 and 3 can never apply.
Then perhaps you mean that sname is not empty?:
SELECT sid FROM supplier WHERE city IN ('Taipei', 'Tainan')
AND sname IS NOT NULL AND sname != '';
The following selects a supplier if there is one in Taipei; otherwise it selects the one in Tainan. If neither exists, nothing is returned.
select sid
from supplier
where city = 'Taipei'
union all
select sid
from supplier
where city = 'Tainan'
and not exists (select 1 from supplier where city = 'Taipei')
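The UNION ALL fallback can be checked with Python's sqlite3 module and hypothetical rows. SQLite's = compares strings case-sensitively by default, so the literal in the NOT EXISTS check must match the stored value exactly. The hypothetical preferred_literal parameter exists only to show what happens when it does not: with 'taipei', the check never finds a row, and Tainan is returned alongside Taipei.

```python
import sqlite3

def fallback(rows, preferred_literal="Taipei"):
    # UNION ALL fallback; preferred_literal (hypothetical) feeds the NOT EXISTS
    # check so a case mismatch can be demonstrated.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE supplier (sid INTEGER PRIMARY KEY, sname TEXT, city TEXT)")
    conn.executemany("INSERT INTO supplier VALUES (?, ?, ?)", rows)
    sql = """
    SELECT sid FROM supplier WHERE city = 'Taipei'
    UNION ALL
    SELECT sid FROM supplier WHERE city = 'Tainan'
      AND NOT EXISTS (SELECT 1 FROM supplier WHERE city = ?)
    """
    return sorted(sid for (sid,) in conn.execute(sql, (preferred_literal,)))

both = [(1, "S1", "Tainan"), (2, "S2", "Taipei")]
print(fallback(both))                               # [2]
print(fallback([(1, "S1", "Tainan")]))              # [1]
print(fallback([(4, "S4", "Kaohsiung")]))           # []
print(fallback(both, preferred_literal="taipei"))   # [1, 2]
```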