I have categories and Listings stored in a ListingCategory table and a Listing table respectively.
A listing can be stored in many categories and a category can have many listings. These are joined by a table *ListingCategory_Listings*:
ID | ListingCategoryID | ListingID
I need to grab all the ListingCategories that contain at least one listing meeting certain criteria.
As an example, imagine categories such as Food, Drink and Lodging.
A bar listing would be linked to Food and Drink, a hotel to Food, Drink and Lodging, a hostel to Lodging, and so on.
Each of these listings is geo-coded, and I want to display the categories that have listings within X miles of a given geo-location. So if just the bar fell within the X miles, we would show Food and Drink. If just the hostel fell within this radius, we would only show Lodging, etc. I have the logic to work out the distance; I just don't know how to get my desired result.
Lastly... apologies for the horrible post title
It should be as simple as:
SELECT DISTINCT c.ID, c.name
FROM ListingCategory c
JOIN ListingCategory_Listings lc
ON c.ID = lc.ListingCategoryID
WHERE lc.ListingID IN (<list of listings comma separated>)
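A minimal sketch of the query above, run from Python against an in-memory SQLite database. The sample rows, the `Name` column, and the helper name `categories_for_listings` are assumptions for illustration; the listing IDs passed in are assumed to come from the existing distance logic.

```python
import sqlite3


def categories_for_listings(conn, listing_ids):
    """Return the distinct categories linked to any of the given listing IDs."""
    if not listing_ids:
        return []
    # One "?" placeholder per ID, so values are bound instead of pasted into the SQL.
    placeholders = ", ".join("?" for _ in listing_ids)
    sql = f"""
        SELECT DISTINCT c.ID, c.Name
        FROM ListingCategory c
        JOIN ListingCategory_Listings lc ON c.ID = lc.ListingCategoryID
        WHERE lc.ListingID IN ({placeholders})
        ORDER BY c.ID
    """
    return conn.execute(sql, list(listing_ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ListingCategory (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Listing (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ListingCategory_Listings (
        ID INTEGER PRIMARY KEY, ListingCategoryID INTEGER, ListingID INTEGER);
    INSERT INTO ListingCategory VALUES (1, 'Food'), (2, 'Drink'), (3, 'Lodging');
    INSERT INTO Listing VALUES (1, 'Bar'), (2, 'Hotel'), (3, 'Hostel');
    INSERT INTO ListingCategory_Listings (ListingCategoryID, ListingID) VALUES
        (1, 1), (2, 1),          -- bar: Food, Drink
        (1, 2), (2, 2), (3, 2),  -- hotel: Food, Drink, Lodging
        (3, 3);                  -- hostel: Lodging
""")

bar_only = categories_for_listings(conn, [1])     # only the bar is in range
hostel_only = categories_for_listings(conn, [3])  # only the hostel is in range
```

With this sample data, `bar_only` holds Food and Drink and `hostel_only` holds Lodging, which matches the behaviour described in the question.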
A restaurant provides wine pairings for most food items on its menu. The structure of two of the tables containing this information is shown below
Join these two tables by their id columns to find the country that the recommended wine is produced in.
Here is the code I have tried:
SELECT country, item
FROM regions
INNER JOIN pairing
regions.id = pairing.id
ORDER BY item
LIMIT 5;
But the compiler gives the solution as:
SELECT country, item
FROM regions
INNER JOIN pairing
USING(id)
ORDER BY item
LIMIT 5;
OUTPUT:
country | item
France | caviar
Italy | curry
Italy | grilled vegetables
Argentina | lamb
Germany | roast duck
Doubt:
Is there any difference between USING(id) and an equality condition on id (regions.id = pairing.id), or are they the same?
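One way to check this is a small experiment. The sketch below uses SQLite from Python with made-up `id` values (the real table contents are not shown in the question), so it demonstrates SQLite's behaviour only; other engines may differ in details. It compares the two join forms and then looks at which columns `SELECT *` returns in each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample rows; only the column names come from the question.
    CREATE TABLE regions (id INTEGER, country TEXT);
    CREATE TABLE pairing (id INTEGER, item TEXT);
    INSERT INTO regions VALUES (1, 'France'), (2, 'Italy');
    INSERT INTO pairing VALUES (1, 'caviar'), (2, 'curry');
""")


def run(sql):
    """Execute a query and return (column names, rows)."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


# Same join written with ON and with USING.
on_cols, on_rows = run(
    "SELECT country, item FROM regions INNER JOIN pairing "
    "ON regions.id = pairing.id ORDER BY item")
using_cols, using_rows = run(
    "SELECT country, item FROM regions INNER JOIN pairing "
    "USING (id) ORDER BY item")

# The visible difference shows up with SELECT *.
star_on_cols, _ = run(
    "SELECT * FROM regions INNER JOIN pairing ON regions.id = pairing.id")
star_using_cols, _ = run(
    "SELECT * FROM regions INNER JOIN pairing USING (id)")
```

In this experiment both forms return the same rows for an inner join. The difference is in the result columns: with `ON`, `SELECT *` carries both `id` columns, while with `USING (id)` the shared column appears once. `USING` also requires the column to have the same name in both tables.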
I'm using BigQuery.
I have two simple tables with "bad" data quality from our systems. One represents revenue and the other production rows for bus journeys.
I need to match every journey to a revenue transaction but I only have a set of fields and no key and I don't really know how to do this matching.
This is a sample of the data:
Revenue
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, London, Manchester, Qwerty
Journeys
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, Kings Cross, Piccadilly Gardens, Qwer
2020, 123123, Kings Cross, Victoria Station, Qwert
2020, 123123, London, Manchester, Qwerty
Every station has a maximum of 9 alternative names and these are stored in a "station" table.
Stations
Station Name, Station Name 2, Station Name 3,...
London, Kings Cross, Euston,...
Manchester, Piccadilly Gardens, Victoria Station,...
I would like to first try joining the tables on the original fields. This will generate some matches, but many journeys will remain unmatched. For the unmatched revenue rows, I would like to change the product name (shortening it to two letters, which may produce many matches from the production table) and then the station names, first changing station_origin and then station_destination. When a shorter product name produces many matches, I want the row from the production table with the most common product.
Something like this:
1. Do a direct match. That is, I can use the fields as they are in the tables.
2. Do a match where the revenue.product is changed by shortening it to two letters. substr(product,0,2)
3. Change the rev.station_origin to the first alternative, Station Name 2, and then try a join. The product or other station are not changed.
4. Change the rev.station_origin to the first alternative, Station Name 2, and then try a join. The product is changed as above with a substr(product,0,2) but rev.station_destination is not changed.
5. Change the rev.station_destination to the first alternative, Station Name 2, and then try a join. The product or other station are not changed.
I was told that maybe I should create an intermediate table with all combinations of stations and products and let a rank column decide the order. The station names in the stations table are in order of importance, so "Station Name" is more important than "Station Name 2", and so on.
I started writing a query with one subquery per rank, combined with UNION ALL, but there are so many combinations that there must be another way to do this.
I don't know if this makes sense, but I would appreciate any help or ideas for doing this in a better way.
Cheers,
Cris
To implement a complex joining strategy with approximate matching, it might make more sense to define the strategy in JavaScript and call the function from a BigQuery SQL query.
For example, the query below performs these steps:
Take the top 200 male names in the US.
Find if one of the top 200 female names matches.
If not, look for the most similar female name within the options.
Note that the logic to choose the closest option is encapsulated within the JS UDF fhoffa.x.fuzzy_extract_one(). See https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83 to learn more about this.
WITH data AS (
SELECT name, gender, SUM(number) c
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY 1,2
), top_men AS (
SELECT * FROM data WHERE gender='M'
ORDER BY c DESC LIMIT 200
), top_women AS (
SELECT * FROM data WHERE gender='F'
ORDER BY c DESC LIMIT 200
)
SELECT name male_name,
COALESCE(
(SELECT name FROM top_women WHERE name=a.name)
, fhoffa.x.fuzzy_extract_one(name, ARRAY(SELECT name FROM top_women))
) female_version
FROM top_men a
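The rank-column idea from the question can also be prototyped outside SQL before committing to a query. The following is a rough sketch in plain Python, not BigQuery code: the dictionary layout, the `STATIONS` lookup, and the function names are all hypothetical, and the ordering of candidates (original names first, then origin alternatives before destination alternatives, full product before two-letter prefix) is one reading of the steps listed in the question.

```python
from collections import Counter

# Hypothetical station table: primary name first, then alternatives by importance.
STATIONS = {
    "London": ["London", "Kings Cross", "Euston"],
    "Manchester": ["Manchester", "Piccadilly Gardens", "Victoria Station"],
}


def candidates(rev):
    """Yield (rank, origin, destination, exact_product), strictest first."""
    origins = STATIONS.get(rev["origin"], [rev["origin"]])
    destinations = STATIONS.get(rev["destination"], [rev["destination"]])
    # Order pairs by how far down the alternative lists they go;
    # on ties, change the origin before the destination.
    pairs = sorted(
        ((i, j) for i in range(len(origins)) for j in range(len(destinations))),
        key=lambda p: (p[0] + p[1], p[1]),
    )
    rank = 0
    for i, j in pairs:
        for exact_product in (True, False):
            rank += 1
            yield rank, origins[i], destinations[j], exact_product


def best_match(rev, journeys):
    """Return (rank, journey) for the lowest-ranked candidate with a hit, else None."""
    for rank, origin, destination, exact_product in candidates(rev):
        hits = [
            j for j in journeys
            if (j["year"], j["agreement"]) == (rev["year"], rev["agreement"])
            and j["origin"] == origin
            and j["destination"] == destination
            and (j["product"] == rev["product"] if exact_product
                 else j["product"][:2] == rev["product"][:2])
        ]
        if hits:
            # Several hits at the same rank: keep one with the most common product.
            top_product, _ = Counter(j["product"] for j in hits).most_common(1)[0]
            return rank, next(j for j in hits if j["product"] == top_product)
    return None


def row(origin, destination, product):
    return {"year": 2020, "agreement": 123123, "origin": origin,
            "destination": destination, "product": product}


revenue = row("London", "Manchester", "Qwerty")
journeys = [
    row("Kings Cross", "Piccadilly Gardens", "Qwer"),
    row("Kings Cross", "Victoria Station", "Qwert"),
    row("London", "Manchester", "Qwerty"),
]
direct = best_match(revenue, journeys)        # the exact row is present
fallback = best_match(revenue, journeys[:2])  # only alternative names available
```

The same ranked list of candidates could be materialised as the intermediate table mentioned in the question, with one join on it and a `ROW_NUMBER()` over the rank replacing the per-rank `UNION ALL` subqueries.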
Consider a relation that contained the names and number of locations of restaurants including split and stand alone restaurants:
RESTAURANT: NUM_OF_LOC
Pizza Hut 1
Pizza Hut/Taco Bell 2
Taco Bell 2
Also assume you do not know in advance the restaurant names, whether a restaurant is stand-alone or split, or the number of locations. The only consistent piece is the "/" character between the names in a split restaurant.
How can I return the above table with the stand-alone restaurants' location counts summed into the split restaurants that contain them, sorted in descending order, like so:
RESTAURANT: NUM_OF_LOC
Pizza Hut/Taco Bell 5
Taco Bell 2
Pizza Hut 1
So are you looking to get the count of all restaurants for just Taco Bell and Pizza Hut where a joint counts as 1 for each or are you looking to count all occurrences of each variant?
I'm thinking you aren't just looking for totals and are looking to tear apart the combined restaurants so you can do something like
SELECT name, count(*)
FROM restaurants
WHERE CONTAINS (name, 'Taco Bell')
Looking at it again, it seems you want the consolidated groups to include all occurrences of either, which would be something like:
CREATE TABLE sums AS
SELECT name, count(*)
FROM restaurants
WHERE CONTAINS (name, 'Taco Bell')
OR CONTAINS(name, 'Pizza Hut')
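For comparison, here is a sketch that reproduces the expected table from the question without hard-coding any restaurant name. It is an assumption-laden illustration: it runs on SQLite from Python, uses `LIKE` instead of `CONTAINS`, and the table and column names (`restaurants`, `name`, `num_of_loc`) are made up. It relies only on the "/" separator, as the question requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurants (name TEXT, num_of_loc INTEGER);
    INSERT INTO restaurants VALUES
        ('Pizza Hut', 1), ('Pizza Hut/Taco Bell', 2), ('Taco Bell', 2);
""")

# For every split row, add the locations of each stand-alone restaurant
# whose name appears as one of its "/"-separated parts.
ROLLUP_SQL = """
    SELECT r.name,
           r.num_of_loc + COALESCE((
               SELECT SUM(s.num_of_loc)
               FROM restaurants s
               WHERE r.name LIKE '%/%'       -- only split rows absorb others
                 AND s.name NOT LIKE '%/%'   -- only stand-alone rows are absorbed
                 AND ('/' || r.name || '/') LIKE ('%/' || s.name || '/%')
           ), 0) AS total
    FROM restaurants r
    ORDER BY total DESC
"""
rows = conn.execute(ROLLUP_SQL).fetchall()
```

Wrapping both sides in "/" makes the comparison match whole parts only, so a name that is merely a substring of another part is not counted. Names that themselves contain `%` or `_` would need escaping, which this sketch does not handle.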
I have 3 database tables.
The first one contains Ingredients, the second one contains Dishes, and the third one connects Ingredients and Dishes.
Adding data to those tables was easy, but I ran into a problem while trying to select specific content.
Returning all ingredients for a specific dish:
SELECT *
FROM Ingredient As I
JOIN DishIngredients as DI
ON I.ID = DI.IngredientID
WHERE DI.DishID = 1;
But if I try to query for the dish Name and Description, no matter what kind of join I use, I always get a number of results equal to the number of ingredients used. If I have 4 ingredients in my dish, then the select returns Name and Description 4 times. How can I modify my select to return those values just once?
Here is the result of my query (same as hawk's) if I try to select Name and Description. I am using MS SQL.
ID Name Description DishID IngredientID
-- -------------------- -------------------------------------------------------------------- ------ ---------
1 Spaghetti Carbonara This delcitious pasta is made with fresh Panceta and Single Cream 1 1
1 Spaghetti Carbonara This delcitious pasta is made with fresh Panceta and Single Cream 1 2
Kuzgun's query worked fine for me. However, from your suggestions I see that I don't really need a join between DishIngredient and Dish.
When I need Name and Description I can simply go for
SELECT * FROM Dish WHERE ID=1;
When I need the list of ingredients I can use my query above.
If you need to display both dish details and ingredient details, you need to join all 3 tables:
SELECT *
FROM Ingredient As I
JOIN DishIngredients as DI
ON I.ID = DI.IngredientID
JOIN Dish AS D
ON D.ID=DI.DishID
WHERE DI.DishID = 1;
If you don't care about the ingredients, you don't have to use the DishIngredients table. Just use the Dish table: select * from dish d where d.id=1.
If you want to know what the ingredients are, a query that returns only the ingredient IDs is of little use. Given the design of your database, a little redundancy is unavoidable.
select * from dish d join DishIngredients di on d.id=di.dishid join ingredient i on
i.id=di.ingredientid where d.id=1
Of course, you will get as many result rows as there are ingredients, each repeating the dish's name and description.
If you want the full information with the least redundancy, you can do it in two steps:
select * from dish d where d.id=1;
select * from ingredient i join DishIngredients di on i.id=di.ingredientid where di.dishid=1
In Java, you can write a class to represent a dish and a list to represent the ingredients it uses.
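import java.math.BigDecimal;
import java.util.List;

// A dish together with the ingredients it uses.
public class Dish {
    BigDecimal id;
    String name;
    String description;
    List<Ingredient> ingredient; // filled from the second (ingredient) query
}

class Ingredient {
    BigDecimal id;
    String name;
    // ..... (remaining fields elided in the original)
}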
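The two-step approach can be sketched end to end as follows. This is an illustration in Python with an in-memory SQLite database rather than MS SQL; the `Ingredient.Name` column, the sample ingredient rows, and the `load_dish` helper are assumptions, while the table names follow the question.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Ingredient:
    id: int
    name: str


@dataclass
class Dish:
    id: int
    name: str
    description: str
    ingredients: list = field(default_factory=list)


def load_dish(conn, dish_id):
    """Load the dish row once, then attach its ingredients from a second query."""
    row = conn.execute(
        "SELECT ID, Name, Description FROM Dish WHERE ID = ?", (dish_id,)
    ).fetchone()
    if row is None:
        return None
    dish = Dish(*row)
    rows = conn.execute(
        """
        SELECT I.ID, I.Name
        FROM Ingredient AS I
        JOIN DishIngredients AS DI ON I.ID = DI.IngredientID
        WHERE DI.DishID = ?
        ORDER BY I.ID
        """,
        (dish_id,),
    )
    dish.ingredients = [Ingredient(*r) for r in rows]
    return dish


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dish (ID INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
    CREATE TABLE Ingredient (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DishIngredients (DishID INTEGER, IngredientID INTEGER);
    INSERT INTO Dish VALUES (1, 'Spaghetti Carbonara', 'Pasta with pancetta and cream');
    INSERT INTO Ingredient VALUES (1, 'Pancetta'), (2, 'Single Cream');
    INSERT INTO DishIngredients VALUES (1, 1), (1, 2);
""")

dish = load_dish(conn, 1)
```

The dish's name and description are read exactly once, and the ingredient list can be any length without repeating them.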
I am working on a site that lists a directory of various restaurants, and I am currently in the process of switching to a newer CMS. The problem I have is that the two CMSes represent the restaurant data differently.
Old CMS
It uses a cross-reference database, so an entry might look like this:
ID / FieldID / ItemID / data
3 / 1 / 6 / 123 Foo Street
4 / 2 / 6 / Bar
One reference table maps FieldID 1 to Street and FieldID 2 to City.
Another reference table maps ItemID 6 to Delicious Restaurant.
New CMS
In the new CMS, as I found when I set up a sample listing, the database stores everything in direct rows with no cross-referencing. So the data for the same restaurant will instead be:
ID / Name / Street / City
3 / Delicious Restaurant / 123 Foo Street / Bar
There are about 2,000 restaurant listings so it's not a HUGE amount in terms of SQL row data size, but of course enough to not even consider re-entering all the restaurant listings by hand.
I have a few ideas, but they would be extremely dirty and take a while, and I'm not a MySQL expert, so I am here for some ideas on how I should tackle it.
Many thanks to those who can help.
You can join against the data table multiple times to get something like this:
insert into newTable
select oldNames.ItemID,
oldNames.Name,
oldStreets.data,
oldCities.data
from oldNames
inner join oldData as oldStreets on oldNames.ItemID = oldStreets.ItemID
inner join oldData as oldCities on oldNames.ItemID = oldCities.ItemID
inner join oldFields as streetsFields
on oldStreets.FieldID = streetsFields.FieldID
and streetsFields.Name = 'Street'
inner join oldFields as citiesFields
on oldCities.FieldID = citiesFields.FieldID
and citiesFields.Name = 'City'
You didn't provide names for all of the tables, so I made some names up. If you have more fields that you need to extract, it should be trivial to extend this sort of query.
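As a variation on the same idea, the sketch below pivots the cross-reference rows with conditional aggregation (`MAX(CASE ...)` per field) instead of one self-join per field. It is an illustration run on SQLite from Python, not MySQL, and it reuses the made-up table names from the answer above; the second restaurant ("Plain Diner") is invented sample data to show what happens when a field is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oldData (ID INTEGER, FieldID INTEGER, ItemID INTEGER, data TEXT);
    CREATE TABLE oldFields (FieldID INTEGER, Name TEXT);
    CREATE TABLE oldNames (ItemID INTEGER, Name TEXT);
    CREATE TABLE newTable (ID INTEGER, Name TEXT, Street TEXT, City TEXT);

    INSERT INTO oldFields VALUES (1, 'Street'), (2, 'City');
    INSERT INTO oldNames VALUES (6, 'Delicious Restaurant'), (7, 'Plain Diner');
    INSERT INTO oldData VALUES
        (3, 1, 6, '123 Foo Street'),
        (4, 2, 6, 'Bar'),
        (5, 1, 7, '9 Baz Road');   -- hypothetical listing with no city row
""")

# One output row per restaurant; each CASE picks out the value for one field.
# LEFT JOINs keep restaurants even when some of their fields are missing.
conn.execute("""
    INSERT INTO newTable (ID, Name, Street, City)
    SELECT n.ItemID,
           n.Name,
           MAX(CASE WHEN f.Name = 'Street' THEN d.data END),
           MAX(CASE WHEN f.Name = 'City' THEN d.data END)
    FROM oldNames AS n
    LEFT JOIN oldData AS d ON d.ItemID = n.ItemID
    LEFT JOIN oldFields AS f ON f.FieldID = d.FieldID
    GROUP BY n.ItemID, n.Name
""")

migrated = conn.execute(
    "SELECT ID, Name, Street, City FROM newTable ORDER BY ID"
).fetchall()
```

Adding another field means adding one more `MAX(CASE ...)` line rather than two more joins. With inner joins, as in the query above, a restaurant that lacks a street or city row would be dropped from the result, so it is worth deciding which behaviour the migration should have.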