How to design table relationship where the foreign key can mean "all rows", "some rows" or "one row"? - sql

I hope you can help me with this. I've used pseudocode to keep everything simple.
I have a table which describes locations.
location_table
location = charfield(200) # New York, London, Tokyo
A product manager now wants locations to be as follows:
Global = select every location
Asia = select every location in Asia
US = select every location in US
Current system = London (etc.)
This is my proposed redesign.
location_table
location = charfield(200) # New York, London, Tokyo
continent = foreign key to continent_table
continent_table
continent = charfield(50) # "None", "Global", Asia, Europe
But this seems horrible. It means in my code I'll always need to check if the customer is using "global" or "none", and then select the corresponding location records. For example, there will be code like this scattered everywhere:
get continent
if continent is global, select everything from location_table
else if continent is none, select location from location_table
else select location from location_table where foreign key is continent
My feeling is this is a known problem, and there is a known solution for it. Any ideas?
Thank you.

What you seem to have here is a set of locations, and then a set of location groups. Those groups might be all of the locations (global), or a subset of them.
You can build this with an intermediate table between the locations and a new location sets table which associates locations and location sets.
You might build the location set table and the join table so that the individual locations are also location sets, but ones which join only to one location. That way all location selections come from one table -- the location sets.
So you end up with three different types of location set:
Ones which map 1:1 with a location
One which maps 1:all ("global")
Ones which map 1:many (continents and other areas)
It's conceivable that this could be created as a hierarchy, but those queries can be inefficient because the join cardinalities tend to be obscured from the optimiser.
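A minimal sketch of the location-set idea, run against an in-memory SQLite database from Python. The table and column names (location_set, location_set_member, set_id, and so on) are assumptions made for illustration, not names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE location_set (
    set_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
-- Join table: which locations belong to which set (hypothetical name).
CREATE TABLE location_set_member (
    set_id      INTEGER NOT NULL REFERENCES location_set(set_id),
    location_id INTEGER NOT NULL REFERENCES location(location_id),
    PRIMARY KEY (set_id, location_id)
);
INSERT INTO location VALUES (1, 'New York'), (2, 'London'), (3, 'Tokyo');
INSERT INTO location_set VALUES
    (1, 'Global'), (2, 'Asia'), (3, 'US'), (4, 'London');
INSERT INTO location_set_member VALUES
    (1, 1), (1, 2), (1, 3),   -- 1:all   ("Global")
    (2, 3),                   -- 1:many  (Asia; only Tokyo in this sample)
    (3, 1),                   -- 1:many  (US; only New York in this sample)
    (4, 2);                   -- 1:1     (a set holding just London)
""")

def locations_in(set_name):
    """Return the location names in a set; the query is the same for every set type."""
    rows = conn.execute(
        """
        SELECT l.name
        FROM location_set AS s
        JOIN location_set_member AS m ON m.set_id = s.set_id
        JOIN location AS l ON l.location_id = m.location_id
        WHERE s.name = ?
        ORDER BY l.name
        """,
        (set_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling locations_in("Global"), locations_in("Asia") or locations_in("London") runs the same query each time, so the application code needs no branching. The cost is that the "Global" set has to be kept in sync whenever a location is added.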

You could do this using a hierarchy, and a self referencing foreign key, e.g.
LocationID  Name           ParentLocationID  LocationType
------------------------------------------------------------------
1           Planet Earth   NULL              Planet
2           Africa         1                 Continent
3           Antarctica     1                 Continent
4           Asia           1                 Continent
5           Australasia    1                 Continent
6           Europe         1                 Continent
7           North America  1                 Continent
8           South America  1                 Continent
9           United States  7                 Country
10          Canada         7                 Country
11          Mexico         7                 Country
12          California     9                 State
13          San Diego      12                City
14          England        6                 Country
15          Cornwall       14                County
16          Truro          15                City
Hierarchical data usually requires either recursion or multiple joins to get all levels; this answer contains links to articles comparing performance on the major DBMS.
Many DBMS now support recursive common table expressions. Since no DBMS is specified, I will use SQL Server syntax because it is what I am most comfortable with. A quick example would be:
DECLARE @LocationID INT = 7; -- NORTH AMERICA
WITH LocationCTE AS
(   -- Anchor member: the location we start from
    SELECT l.LocationID, l.Name, l.ParentLocationID, l.LocationType
    FROM dbo.Location AS l
    WHERE l.LocationID = @LocationID
    UNION ALL
    -- Recursive member: children of the rows found so far
    SELECT l.LocationID, l.Name, l.ParentLocationID, l.LocationType
    FROM dbo.Location AS l
    INNER JOIN LocationCTE AS c
        ON c.LocationID = l.ParentLocationID
)
SELECT *
FROM LocationCTE;
Output based on above sample data
LocationID  Name           ParentLocationID  LocationType
-----------------------------------------------------------------
7           North America  1                 Continent
9           United States  7                 Country
10          Canada         7                 Country
11          Mexico         7                 Country
12          California     9                 State
13          San Diego      12                City
Supplying a value of 1 (Planet Earth) for the location ID will return the full table, or supplying a locationID of 11 (Mexico) would only return this one row, because there is nothing smaller than this in the sample data.

I'll go with your own proposal: I don't find it all that horrible to check, every time, whether the customer is searching by a single location, by continent, or across everything. That is the role of the backend code, and it will always lead to different queries depending on the option chosen.
But I would remove "None" and "Global" from the continent table, and simply run a different query when no continent is chosen. You would end up with the three possible SQL queries you already have, and I don't find that to be bad design per se. Other solutions may perform better, but this one seems more readable and logical. It's just optional querying with join tables.
The other answers trade performance or duplication for readability (which isn't a bad thing, depending on how often your application relies on this condition, how many queries use it, and how many cities you have).
For readability and to avoid repetition, the best thing would be to concentrate these conditions in one SQL function which takes a string parameter and returns all locations matching the input (but at the cost of performance).
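As a rough illustration of concentrating the three branches in one place, here is an application-side sketch in Python over SQLite. It is not a SQL function as such, and the helper name select_locations and the sample rows are assumptions. It follows the suggestion above of keeping "None" and "Global" out of the continent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE continent_table (continent TEXT PRIMARY KEY);
CREATE TABLE location_table (
    location  TEXT PRIMARY KEY,
    continent TEXT REFERENCES continent_table(continent)
);
INSERT INTO continent_table VALUES ('Asia'), ('Europe'), ('North America');
INSERT INTO location_table VALUES
    ('New York', 'North America'), ('London', 'Europe'), ('Tokyo', 'Asia');
""")

def select_locations(scope):
    """Hypothetical helper: scope is 'Global', a continent name, or one location."""
    is_continent = conn.execute(
        "SELECT 1 FROM continent_table WHERE continent = ?", (scope,)
    ).fetchone() is not None

    if scope == "Global":
        # Every location.
        sql, params = "SELECT location FROM location_table", ()
    elif is_continent:
        # Every location on that continent.
        sql = "SELECT location FROM location_table WHERE continent = ?"
        params = (scope,)
    else:
        # A single named location (the current behaviour).
        sql = "SELECT location FROM location_table WHERE location = ?"
        params = (scope,)
    return sorted(row[0] for row in conn.execute(sql, params))
```

The rest of the code base then calls select_locations("Global"), select_locations("Asia") or select_locations("London") and never repeats the if/else chain.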

Use levels:
0 -> None
00 -> Global
001 -> Europe
002 -> Asia
003 -> Africa
select location from location_table where continent like '[value]%'
Using a fixed length code, you can prefix regions, and then add one more digit for a region inside a region, and so on.
Ok, let me try to improve it.
Consider the world, it has the minimum level (or maximum depending on how you see it)
World ID = '0' (1 digit)
Now, select how you want to divide the world: (Continents, Half-Continents, ...) and assign the next level.
Europe ID = '01' (First digit World + Second digit Europe)
Asia ID = '02'
America ID = '03'
...
Next Level: Countries. (At least 2 digits)
England ID = '0101' (World + Continent + Country)
Deutschland ID = '0102'
....
Texas ID = '0301'
....
Next Level: Regions (2 digits)
Yorkshire ID = '010101' (World + Continent + Country + Region)
....
Next Level: Cities (2 or 3 digits)
London ID = '01010101' (World + Continent + Country + Region + City)
And so on.
Now the same SELECT some_aggregate, statistics, ... FROM ... can be used for any region; simply change the WHERE clause:
WHERE Region like '0%' --> The whole world
WHERE Region like '02%' --> Asia
WHERE Region like '01010101%' --> London
WHERE Region like '02%' OR Region like '01%' --> Asia & Europe
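A small sketch of the prefix-code lookup, using SQLite from Python. The code for London is the one given above; the cities Berlin, Tokyo and Austin and their codes are made up for illustration, as is the helper name names_for. Several regions are combined with OR, since a single code cannot start with two different prefixes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_table (name TEXT, region TEXT);
INSERT INTO location_table VALUES
    ('London', '01010101'),   -- code taken from the answer
    ('Berlin', '01020101'),   -- hypothetical code
    ('Tokyo',  '02010101'),   -- hypothetical code
    ('Austin', '03010101');   -- hypothetical code
""")

def names_for(*prefixes):
    """Return location names whose region code starts with any given prefix."""
    where = " OR ".join("region LIKE ?" for _ in prefixes)
    rows = conn.execute(
        f"SELECT name FROM location_table WHERE {where} ORDER BY name",
        [prefix + "%" for prefix in prefixes],
    )
    return [name for (name,) in rows]
```

names_for("0") covers the whole world, names_for("02") only Asia, and names_for("01", "02") Europe and Asia together.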

Related

Best way to add info/description to my items?

I made a geo game a while back where the player has to guess an item from an image (what I call an item is a SQL row basically) for example the bot sends the flag of the Netherlands, you have to type "Netherlands" to win.
Items can be the flag of a country, a capital city, a french department...
I made an info tab where it would basically give info about an item (ie region, former name, capital city, etc).
What I would like to do is properly save this information. I don't really know if I should store this in files like JSON because I would also like to give stats (Win rate per region, amount of games played per region, etc...).
Also, these elements are not fixed because some items have regions, capital cities or whatever and some don't.
Item examples :
(For a flag)
Column        Attribute
ID            1
Name          United Kingdom
Former name   United Kingdom of Great Britain and Northern Ireland
Code          GB
Continent     Europe
Subregion     Northern Europe
Capital city  London
...
(For a U.S. State)
Column        Attribute
ID            1
Name          Arizona
Capital city  Phoenix
Largest city  Phoenix
...
Neither solution (adding every attribute as a column, or storing JSON) is the right approach.
I think the best design is to have a key-value table.
Create Table tableName (ID INT, [Key] SYSNAME, [Value] NVARCHAR(MAX))
And data will look like:
ID  Key           Value
1   Name          Arizona
1   Capital City  Phoenix
1   Largest City  Phoenix
2   Name          United Kingdom
2   Former name   United Kingdom of Great Britain and Northern Ireland
The most valuable benefit: no storage is wasted on columns that would be NULL for a large number of rows.
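A minimal sketch of such a key-value table, using SQLite from Python. The names item_attribute, attr and val are assumptions chosen for illustration; the rows come from the examples in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_attribute (
    item_id INTEGER NOT NULL,
    attr    TEXT    NOT NULL,
    val     TEXT    NOT NULL,
    PRIMARY KEY (item_id, attr)   -- one value per attribute per item
);
INSERT INTO item_attribute VALUES
    (1, 'Name', 'Arizona'),
    (1, 'Capital city', 'Phoenix'),
    (1, 'Largest city', 'Phoenix'),
    (2, 'Name', 'United Kingdom'),
    (2, 'Former name', 'United Kingdom of Great Britain and Northern Ireland'),
    (2, 'Continent', 'Europe');
""")

def attributes(item_id):
    """Return all attributes of one item as a dict; absent attributes have no row."""
    rows = conn.execute(
        "SELECT attr, val FROM item_attribute WHERE item_id = ?", (item_id,)
    )
    return dict(rows)

def items_with(attr, val):
    """Return the ids of items whose attribute has the given value."""
    rows = conn.execute(
        "SELECT item_id FROM item_attribute WHERE attr = ? AND val = ? ORDER BY item_id",
        (attr, val),
    )
    return [item_id for (item_id,) in rows]
```

Per-region statistics could then be built by joining a games table to the rows where attr = 'Continent'. The trade-off is that every value is stored as text, and reading several attributes of one item takes several rows rather than one.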

When a statement contains an item in a list, show it in a new column

I would appreciate a little help with a SQL script. I have a list like the one below and a database table, Table1, with a column named Statement. I would like to create a column called location: the script should search the Statement column and, whenever a row contains any of the items in the list, put that item in the location column.
(Tema, london, Sydney, Germany, China, Africa,)
Statement
-------------------
Going to london
Apples in Tema
Sydney is a city
China is a country
Africa is a continent
In the end I hope to see a table like this :
Statement              location
Going to london        London
Apples in Tema         Tema
Sydney is a city       Sydney
china is a country     China
Africa is a continent  Africa
By using this script,
SELECT Statement,
Case
WHEN Statement::text ~~* '%london%'::character varying::text
THEN 'london'::character varying
ELSE NULL::character varying
END AS location
FROM Table1
I think I would have to write a very long script this way, so I was wondering whether there is a simpler and more efficient way to achieve this.
If you have a list of places, you can use that:
select t1.*, v.place
from table1 t1 left join
     (values ('tema'), ('london'), ('sydney'), ('germany'), ('china'), ('africa')
     ) v(place)
     on t1.Statement::text ilike '%' || v.place || '%';
Note: You might want to use regular expressions so you can include word boundaries, but your example code doesn't do this.
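To show why word boundaries matter, here is a small Python sketch of the same matching logic. It stands in for the database-side regular expression and is not PostgreSQL syntax. The names PLACES, PLACE_RE and find_location are hypothetical.

```python
import re

# The list from the question.
PLACES = ["Tema", "london", "Sydney", "Germany", "China", "Africa"]

# \b anchors each place at word boundaries; re.escape guards against
# special characters in place names.
PLACE_RE = re.compile(
    r"\b(" + "|".join(re.escape(place) for place in PLACES) + r")\b",
    re.IGNORECASE,
)

def find_location(statement):
    """Return the first place found as a whole word, or None."""
    match = PLACE_RE.search(statement)
    return match.group(1) if match else None
```

A plain substring test such as ilike '%tema%' would also match "A systematic review", because "systematic" contains the letters "tema". The word-boundary version returns None for that statement and still finds "Tema" in "Apples in Tema".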

Filtering out records in a SQL table using rules in another SQL table

Data Table
Name     Company  Continent  country  state    district
Tom      HP       Asia       India    Assam    Kdk
George   SAP      Africa     Sudan    Chak     ksk
Bill     EBAY     Europe     Denmark  Lekh     Sip
Charles  WM       Asia       India    Haryana  Jhat
Chip     WM       Asia       India    Punjab   Chista
Chia     WM       Asia       India    Punjab   Mast
Rule Table
Continent  country  state   district  Pass
Asia       India    ALL     ALL       Yes
Asia       India    Punjab  ALL       NO
Asia       India    Punjab  Mast      Yes
I have two tables in Hive. Depending on the rule I have to filter out the data in the data table.
In the rule table there is a column called pass which determines whether a record in data table needs to be filtered or not.
In this example there are different kinds of rules. They are the ones at broader level and at narrow level.
The rules at the narrow level should not affect the rules at the broader level. This means a rule at the narrow level is an exception to the rule at the broader level.
For example, in the rules table there are 3 records. The first record is the rule at the broader level. The other ones are at the narrow level.
The first rule says to pass all the records that have country India, state any/all and district any/all.
The second rule says not to pass the records that have country India, state Punjab and district any/all.
The third rule says to pass all records that have country India, state Punjab and district Mast.
The second rule is an exception to the first rule. The third rule is an exception to the second rule.
Considering the data in the data table and rules in the rules table, the pass columns will be as follows for the Indian(country) records.
Name     Company  Continent  country  state    district  Pass
Tom      HP       Asia       India    Assam    Kdk       Yes
Charles  WM       Asia       India    Haryana  Jhat      Yes
Chia     WM       Asia       India    Punjab   Mast      Yes
Chip     WM       Asia       India    Punjab   Chista    No
This is just an example. In production the data will be different.
How do I implement this using SQL/Sql script?
Help is much appreciated.
You want the most specific rule. In Hive, you can use multiple left joins:
select d.*, coalesce(r1.pass, r2.pass, r3.pass)
from data d left join
rules r1
on r1.Continent = d.Continent and
r1.country = d.country and
r1.state = d.state and
r1.district = d.district left join
rules r2
on r2.Continent = d.Continent and
r2.country = d.country and
r2.state = d.state and
r2.district = 'ALL' left join
rules r3
on r3.Continent = d.Continent and
r3.country = d.country and
r3.state = 'ALL' and
r3.district = 'ALL' ;
You might want to continue with the LEFT JOINs if 'ALL' is allowed for continent and country.
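The "most specific rule wins" logic can be checked against the sample data from the question. The sketch below runs the same three left joins in SQLite from Python rather than in Hive, so treat it as an illustration of the logic and not as Hive-tested code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (name TEXT, company TEXT, continent TEXT,
                   country TEXT, state TEXT, district TEXT);
CREATE TABLE rules (continent TEXT, country TEXT, state TEXT,
                    district TEXT, pass TEXT);
INSERT INTO data VALUES
    ('Tom',     'HP',   'Asia',   'India',   'Assam',   'Kdk'),
    ('George',  'SAP',  'Africa', 'Sudan',   'Chak',    'ksk'),
    ('Bill',    'EBAY', 'Europe', 'Denmark', 'Lekh',    'Sip'),
    ('Charles', 'WM',   'Asia',   'India',   'Haryana', 'Jhat'),
    ('Chip',    'WM',   'Asia',   'India',   'Punjab',  'Chista'),
    ('Chia',    'WM',   'Asia',   'India',   'Punjab',  'Mast');
INSERT INTO rules VALUES
    ('Asia', 'India', 'ALL',    'ALL',  'Yes'),
    ('Asia', 'India', 'Punjab', 'ALL',  'NO'),
    ('Asia', 'India', 'Punjab', 'Mast', 'Yes');
""")

# r1 = exact district rule, r2 = state-wide rule, r3 = country-wide rule.
# COALESCE picks the first non-NULL value, i.e. the most specific match.
result = dict(conn.execute("""
SELECT d.name, COALESCE(r1.pass, r2.pass, r3.pass) AS pass
FROM data AS d
LEFT JOIN rules AS r1
       ON r1.continent = d.continent AND r1.country = d.country
      AND r1.state = d.state AND r1.district = d.district
LEFT JOIN rules AS r2
       ON r2.continent = d.continent AND r2.country = d.country
      AND r2.state = d.state AND r2.district = 'ALL'
LEFT JOIN rules AS r3
       ON r3.continent = d.continent AND r3.country = d.country
      AND r3.state = 'ALL' AND r3.district = 'ALL'
"""))
```

With this data, Chia picks up the district-level rule, Chip the Punjab-wide rule, and Tom and Charles the India-wide rule. George and Bill match no rule, so their pass value is NULL and they would need an explicit default.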
@TomG: Please see the below code if that helps
select * from TEMP_TESTING where country = 'India' and district <> 'Chista'
union
(select * from TEMP_TESTING where country = 'India' except
select * from TEMP_TESTING where country = 'India' and state = 'Punjab')
union
select * from TEMP_TESTING where country = 'India' and state = 'Punjab' and district = 'Mast'

SQL (COUNT(*) / locations.area)

We are learning SQL at school, and my professor has this sql code in his documents.
SELECT wp.city, (COUNT(*) / locations.area) AS population_density
FROM world_poulation AS wp
INNER JOIN location
ON wp.city = locations.city
WHERE locations.state = “Hessen”
GROUP BY wp.city, locations.area
Everything is almost clear for me, just the aggregate function with /locations.area doesn't make any sense to me. Can anybody help?
Thank you in advance!
Look at what the query is grouped on, that tells you what each group consists of. In this case, each group is a city, and contains all the rows that have the same value for wp.city (and as the location table is joined on that value too, the locations.area is only included in the grouping so that it can be used in the result).
So each group has a number of rows, and the COUNT(*) aggregate will contain the number of rows for each group. The value of (COUNT(*) / locations.area) will be the number of rows in the group divided by the value of locations.area for that group.
Suppose you had data like this:
world_population
name       city
---------  ---------
John       London
Peter      London
Sarah      London
Malcolm    London
Ian        Cardiff
Johanna    Stockholm
Sven       Stockholm
Egil       Stockholm
locations
city       state           area
---------  --------------  ---------
London     Hessen          2
Cardiff    Somewhere else  14
Stockholm  Hessen          1
Then you would get a result with two groups (as Cardiff is not in the state Hessen). One group has four people from London which has the area 2, so the population density would be 2. The other group has three people from Stockholm which has the area 1, so the population density would be 3.
Side note: There is a typo in the query, as it joins in the table location but refers to it as locations everywhere else.
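The worked example can be reproduced with SQLite from Python. This sketch uses the sample rows above, spells the table names world_population and locations consistently, and uses ordinary single quotes around 'Hessen'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE world_population (name TEXT, city TEXT);
CREATE TABLE locations (city TEXT, state TEXT, area INTEGER);
INSERT INTO world_population VALUES
    ('John', 'London'), ('Peter', 'London'), ('Sarah', 'London'),
    ('Malcolm', 'London'), ('Ian', 'Cardiff'),
    ('Johanna', 'Stockholm'), ('Sven', 'Stockholm'), ('Egil', 'Stockholm');
INSERT INTO locations VALUES
    ('London', 'Hessen', 2),
    ('Cardiff', 'Somewhere else', 14),
    ('Stockholm', 'Hessen', 1);
""")

# One group per city; COUNT(*) is the number of people in that group.
density = dict(conn.execute("""
SELECT wp.city, COUNT(*) / locations.area AS population_density
FROM world_population AS wp
INNER JOIN locations ON wp.city = locations.city
WHERE locations.state = 'Hessen'
GROUP BY wp.city, locations.area
"""))
```

density maps London to 2 (four people over area 2) and Stockholm to 3 (three people over area 1). Note that area is an integer column here, so SQLite performs integer division. If fractional densities matter, one of the operands would need to be cast to a decimal or floating-point type.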
Try writing it like:
SELECT wp.city,
locations.area,
COUNT(*) AS population,
(COUNT(*) / locations.area) AS population_density
FROM world_poulation AS wp
INNER JOIN locations
ON wp.city = locations.city
WHERE locations.state = 'Hessen'
GROUP BY wp.city, locations.area
The key is the GROUP BY clause. You are showing pairs of cities and areas. The COUNT(*) is the number of times a given pair shows up in the table you created by joining world population and locations. The area is just a number, so you can divide the COUNT by the area.

Needing Clarity on SQL Join Query

Having some trouble understanding this query, particularly the WHERE in the subquery. I don't really get what it is accomplishing. Any help would be appreciated. Thanks
# Find the largest country (by area) in each continent. Show the continent,
# name, and area.
SELECT continent, name, area
FROM countries AS a
WHERE area = (
SELECT MAX(area)
FROM countries AS b
WHERE a.continent = b.continent
)
Consider the following subset of the countries data:
Continent      Country  Area
North America  USA      3718691
North America  Canada   3855081
North America  Mexico   761602
Europe         France   211208
Europe         Germany  137846
Europe         UK       94525
Europe         Italy    116305
This is a correlated query that behaves as follows:
1. Reads the first row returned by the outer query (North America, USA, 3718691).
2. Runs the subquery, which correlates on a.continent (North America) and returns 3855081, the maximum area in North America.
3. Evaluates the WHERE equality, which checks whether 3855081 matches the area on the row we're working on.
4. It doesn't match, so the next row in the outer query is read and we start over at step 1, this time working on the second row.
5. Repeats for all rows in the outer query.
When we're looking at rows 2 and 4, the comparison in step 3 will match, so those rows will be returned by the query.
You can check the results by using this data in your countries table and running the query.
Note that this can be a very poor way to determine the country with the maximum area per continent because, evaluated naively, it repeats the subquery for every country. Using my sample data, it determines the maximum area for North America 3 times and the maximum area for Europe 4 times.
Since you asked in your comment, I would write this query as follows:
SELECT a.continent, a.name, a.area
FROM countries AS a
inner join (select continent, max(area) max_area
from countries
group by continent) as b on a.continent = b.continent
WHERE a.area = b.max_area
In this version of the query, the maximum for each continent is only determined once. The original query was written to illustrate correlated queries and it's important to understand them. Correlated queries can often be used to resolve complex logic.
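As a quick check that the two formulations agree, the sketch below loads the sample rows into SQLite from Python and runs both queries. An ORDER BY is added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (continent TEXT, name TEXT, area INTEGER);
INSERT INTO countries VALUES
    ('North America', 'USA',     3718691),
    ('North America', 'Canada',  3855081),
    ('North America', 'Mexico',   761602),
    ('Europe',        'France',   211208),
    ('Europe',        'Germany',  137846),
    ('Europe',        'UK',        94525),
    ('Europe',        'Italy',    116305);
""")

# Correlated subquery: the inner query refers to the outer row's continent.
correlated = conn.execute("""
SELECT continent, name, area
FROM countries AS a
WHERE area = (SELECT MAX(area)
              FROM countries AS b
              WHERE a.continent = b.continent)
ORDER BY continent
""").fetchall()

# Join against a derived table that computes each continent's maximum once.
joined = conn.execute("""
SELECT a.continent, a.name, a.area
FROM countries AS a
INNER JOIN (SELECT continent, MAX(area) AS max_area
            FROM countries
            GROUP BY continent) AS b
        ON a.continent = b.continent
WHERE a.area = b.max_area
ORDER BY a.continent
""").fetchall()
```

Both result lists contain France for Europe and Canada for North America, matching rows 4 and 2 of the sample data.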
The subquery is finding the maximum area for countries. Which countries? All countries that match the continent of the country in the outer query.
So, for each country it gets the area of the largest country on the same continent.
The WHERE clause then says "are the two areas the same -- the maximum area and the area of this country?". It chooses only countries that have the maximum area.