SQL: Looking up the same field in one table for multiple values in another table? - sql

(Not sure if the name of this question really makes sense, but what I want to do is pretty straightforward)
Given tables that looks something like this:
Table Foo
---------------------------------
| bar1_id | bar2_id | other_val |
---------------------------------
Table Bar
--------------------
| bar_id | bar_desc|
--------------------
How would I create a select that would return a table that would look like the following:
---------------------------------------------------------
| bar1_id | bar1_desc | bar2_id | bar2_desc | other_val |
---------------------------------------------------------
i.e. I want to grab every row from Foo and add in a column containing the description of that bar_id from Bar. So there might be some rows from Bar that don't end up in the result set, but every row of Foo should be in it.
Also, this is postgres, if that makes a difference.

SELECT F.bar_id1,
(SELECT bar_desc FROM Bar B WHERE (F.bar_id1 = B.bar_id)),
F.bar_id2,
(SELECT bar_desc FROM Bar B WHERE (F.bar_id2 = B.bar_id)),
F.other_val
FROM FOO F;

This doesn't directly answer your question (but that's ok, the people above already have), but...
This is considered very bad design. What happens in the future when your foo can be associated with 3 bars? Or more? (Don't say it will never happen. I've lost count of the number of "that'll never happen" things I've implemented over the years).
The generally correct way to do this is to do a one-to-many relationship (either with each bar pointing back to a foo, or an intermediate foo-to-bar table, see many-to-many relationships). Now you correctly format output on the front end, and just fetch a list of bars per foo to pass up to it (easy to do in SQL). Reports are a special case, but it's still relatively easily accomplished with pivoting or CrossTab queries.

SELECT
foo.bar1_id, bar1.bar_desc AS bar1_desc,
foo.bar2_id, bar2.bar_desc AS bar2_desc,
foo.other_val
FROM
foo
INNER JOIN bar bar1 ON bar1.id = foo.bar1_id
INNER JOIN bar bar2 ON bar2.id = foo.bar2_id
This assumes you'll always have both a bar1_id and a bar2_id in foo. If these can be null then change INNER JOIN to LEFT OUTER JOIN.

select f.bar1, b1.desc, f.bar2, b2.desc, f.value
from foo as f, bar as b1, bar as b2
where f.bar1 = b1.id
and f.bar2 = b2.id

I would try with information_schema.colums table.
SELECT concat(table_name,'_',column_name)
FROM information_schema.columns WHERE table_name = bar1 OR table_name = bar2
into new_table
Then you can populate it.
with foo as select * from bar1
or select into

Related

Filter list of points using list of Polygons

Given a list of points and a list of polygons. How do you return a list of points (subset of original list of points) that is in any of the polygons on the list
I've removed other columns in the sample tables to simplify things
Points Table:
| Longitude| Latitude |
|----------|-----------|
| 7.07491 | 51.28725 |
| 3.674765 | 51.40205 |
| 6.049105 | 51.86624 |
LocationPolygons Table:
| LineString |
|----------------------|
| CURVEPOLYGON (COMPOUNDCURVE (CIRCULARSTRING (-122.20 47.45, -122.81 47.0, -122.942505 46.687131 ... |
| MULTIPOLYGON (((-110.3086 24.2154, -110.30842 24.2185966, -110.3127...
If I had row from the LocationPolygons table I could do something like
DECLARE #homeLocation geography;
SET #homeLocation = (select top 1 GEOGRAPHY::STGeomFromText(LineString, 4326)
FROM LocationPolygon where LocationPolygonId = '123abc')
select Id, Longitude, Latitude, #homeLocation.STContains(geography::Point(Latitude, Longitude, 4326))
as IsInLocation from Points PointId in (1, 2, 3,)
which would return what I want in a format like the below. However this is only true for just one location on the list
| Id | Longitude| Latitude | IsInLocation |
|----|----------|-----------|--------------|
| 1 | 7.07491 | 51.28725 | 0 |
| 2 | 3.674765 | 51.40205 | 1 |
| 3 | 6.049105 | 51.86624 | 0 |
How do I handle the scenario with multiple rows of the LocationPolygon table?
I'd like to know
if any of the points are in any of the locationPolygons?
what specific location polygon they are in? or if they are in more than one polygon.
Question 2 is more of an extra. Can someone help?
Update #1
In response to #Ben-Thul answer.
Unfortunately I don't have access/permission to make changes to the original tables, I can request access but not certain it'll be given. So not certain I'll be able to add the columns or create the index. Although I can create temp tables in a stored proc, I might be able to use test your solution that way
I stumbled on an answer like the below, but slightly worried about performance implications of using a cross join.
WITH cte AS (
select *, (GEOGRAPHY::STGeomFromText(LineString, 4326)).STContains(geography::Point(Latitude, Longitude, 4326)) as IsInALocation from
(
select Longitude, Latitude from Points nolock
) a cross join (
select LineString FROM LocationPolygons nolock
) b
)
select * from cte where IsInALocation = 1
Obviously, it's better to look at a query plan but is the solution I stumbled upon essentially the same as yours? Are there any potential issues that I missed. Apologies for this but my sql isn't very good.
Question 1 shouldn't be too bad. First, some set up:
alter table dbo.Points add Point as (GEOGRAPHY::Point(Latitude, Longitude, 4326));
create spatial index IX_Point on dbo.Points (Point) with (online = on);
alter table dbo.LocationPolygon add Polygon as (GEOGRAPHY::STGeomFromText(LineString, 4326));
create spatial index IX_Polygon on dbo.LocationPolygon (Polygon) with (online = on);
This will create a computed column on each of your tables that is of type geography that has a spatial index on it.
From there, you should be able to do something like this:
select pt.ID,
pt.Longitude,
pt.Latitude,
coalesce(pg.IsInLocation, 0) as IsInLocation
from Points as pt
outer apply (
select top(1) 1 as IsInLocation
from dbo.LocationPolygon as pg
where pg.Polygon.STContains(p.Point) = 1
) as pg;
Here, you're selecting every row from the Points table and using outer apply to see if any polygons contain that point. If one does (it doesn't matter which one), that query will return a 1 in the result set and bubble that back up to the driving select.
To extend this to Question 2, you can remove the top() from the outer apply and have it return either the IDs from the Polygon table or whatever you want. Note though that it'll return one row per polygon that contains the point, potentially changing the cardinality of your result set!

Efficiently return words that match, or whose synonym(s), match a keyword

I have a database of industry-specific terms, each of which may have zero or more synonyms. Users of the system can search for terms by keyword and the results should include any term that contains the keyword or that has at least one synonym that contains the keyword. The result should then include the term and ONLY ONE of the matching synonyms.
Here's the setup... I have a term table with 2 fields: id and term. I also have a synonym table with 3 fields: id, termId, and synonym. So there would data like:
term Table
id | term
-- | -----
1 | dog
2 | cat
3 | bird
synonym Table
id | termId | synonym
-- | ------ | --------
1 | 1 | canine
2 | 1 | man's best friend
3 | 2 | feline
A keyword search for (the letter) "i" should return the following as a result:
id | term | synonym
-- | ------ | --------
1 | dog | canine <- because of the "i" in "canine"
2 | cat | feline <- because of the "i" in "feline"
3 | bird | <- because of the "i" in "bird"
Notice how, even though both "dog" synonyms contain the letter "i", only one was returned in the result (doesn't matter which one).
Because I need to return all matches from the term table regardless of whether or not there's a synonym and I need no more than 1 matching synonym, I'm using an OUTER APPLY as follows:
<!-- language: sql -->
SELECT
term.id,
term.term,
synonyms.synonym
FROM
term
OUTER APPLY (
SELECT
TOP 1
term.id,
synonym.synonym
FROM
synonym
WHERE
term.id = synonym.termId
AND synonym.synonym LIKE #keyword
) AS synonyms
WHERE
term.term LIKE #keyword
OR synonyms.synonym LIKE #keyword
There are indexes on term.term, synonym.termId and synonym.synonym. #Keyword is always something like '%foo%'. The problem is that, with close to 50,000 terms (not that much for databases, I know, but...), the performance is horrible. Any thoughts on how this can be done more efficiently?
Just a note, one thing I had thought to try was flattening the synonyms into a comma-delimited list in the term table so that I could get around the OUTER APPLY. Unfortunately though, that list can easily exceed 900 characters which would then prevent SQL Server from adding an index to that column. So that's a no-go.
Thanks very much in advance.
You've got a lot of unnecessary logic in there. There's no telling how SQL server is creating an execution path. It's simpler and more efficient to split this up into two separate db calls and then merge them in your code:
Get matches based on synonyms:
SELECT
term.id
,term.term
,synonyms.synonym
FROM
term
INNER JOIN synonyms ON term.termId = synonyms.termId
WHERE
synonyms.synonym LIKE #keyword
Get matches based on terms:
SELECT
term.id
,term.term
FROM
term
WHERE
term.term LIKE #keyword
For "flattening the synonyms into a comma-delimited list in the term table: - Have you considered using Full Text Search feature? It would be much faster even when your data goes on becoming bulky.
You can put all synonyms (as comma delimited) in "synonym" column and put full text index on the same.
If you want to get results also with the synonyms of the words, I recommend you to use Freetext. This is an example:
SELECT Title, Text, * FROM [dbo].[Post] where freetext(Title, 'phone')
The previous query will match the words with ‘phone’ by it’s meaning, not the exact word. It will also compare the inflectional forms of the words. In this case it will return any title that has ‘mobile’, ‘telephone’, ‘smartphone’, etc.
Take a look at this article about SQL Server Full Text Search, hope it helps

Find and group duplicates

Hopefully I'm able to explain what I'm trying to achieve, it's a bit complicated I think.
I have two tables like this:
ID | Names
--------------
A | Name1
B | Name2
C | Name3
ID | md5s
--------------
A | a
A | b
B | c
C | a
C | c
I'm trying to achieve this: In the end I want to have a list of all "Names" that have duplicate MD5 values and in which other "Names" these MD5 values were found.
So I want to get something like this:
Name1 has 5 duplicate entries in "md5s" with Name8, 4 with Name10 ...
I need a list for all "Names" like described above.
Hopefully that makes sense to someone. :)
I already tried it with this SQL statement:
SELECT names,COUNT(names) AS Num FROM tablename GROUP BY names HAVING(Num > 1);
But that gives me only the md5s that are duplicates. The relation to the rest is totally missing.
*edit:fixed typo
I feel like there must be a better solution than this, but here's what I've thrown together for you:
SELECT a.names NAME,
b.names DUPE_NAME,
COUNT(*) NUM_DUPES
FROM names_tbl a, names_tbl b, md5_tbl md5a, md5_tbl md5b
WHERE a.id < b.id
AND a.id = md5a.id
AND b.id = md5b.id
AND md5a.md5 = md5b.md5
GROUP BY a.names, b.names
ORDER BY a.names
The rule of thumb with finding duplicates is that you probably need to do a self join. This would be simpler if the names and their associated md5's were in the same record, but because they're in separate tables I think you need two versions of each table.

Creating new table from data of other tables

I'm very new to SQL and I hope someone can help me with some SQL syntax. I have a database with these tables and fields,
DATA: data_id, person_id, attribute_id, date, value
PERSONS: person_id, parent_id, name
ATTRIBUTES: attribute_id, attribute_type
attribute_type can be "Height" or "Weight"
Question 1
Give a person's "Name", I would like to return a table of "Weight" measurements for each children. Ie: if John has 3 children names Alice, Bob and Carol, then I want a table like this
| date | Alice | Bob | Carol |
I know how to get a long list of children's weights like this:
select d.date,
d.value
from data d,
persons child,
persons parent,
attributes a
where parent.name='John'
and child.parent_id = parent.person_id
and d.attribute_id = a.attribute_id
and a.attribute_type = "Weight';
but I don't know how to create a new table that looks like:
| date | Child 1 name | Child 2 name | ... | Child N name |
Question 2
Also, I would like to select the attributes to be between a certain range.
Question 3
What happens if the dates are not consistent across the children? For example, suppose Alice is 3 years older than Bob, then there's no data for Bob during the first 3 years of Alice's life. How does the database handle this if we request all the data?
1) It might not be so easy. MS SQL Server can PIVOT a table on an axis, but dumping the resultset to an array and sorting there (assuming this is tied to some sort of program) might be the simpler way right now if you're new to SQL.
If you can manage to do it in SQL it still won't be enough info to create a new table, just return the data you'd use to fill it in, so some sort of external manipulation will probably be required. But you can probably just use INSERT INTO [new table] SELECT [...] to fill that new table from your select query, at least.
2) You can join on attributes for each unique attribute:
SELECT [...] FROM data AS d
JOIN persons AS p ON d.person_id = p.person_id
JOIN attributes AS weight ON p.attribute_id = weight.attribute_id
HAVING weight.attribute_type = 'Weight'
JOIN attributes AS height ON p.attribute_id = height.attribute_id
HAVING height.attribute_type = 'Height'
[...]
(The way you're joining in the original query is just shorthand for [INNER] JOIN .. ON, same thing except you'll need the HAVING clause in there)
3) It depends on the type of JOIN you use to match parent/child relationships, and any dates you're filtering on in the WHERE, if I'm reading that right (entirely possible I'm not). I'm not sure quite what you're looking for, or what kind of database you're using, so no good answer. If you're new enough to SQL that you don't know the different kinds of JOINs and what they can do, it's very worthwhile to learn them - they put the R in RDBMS.
when you do a select, you need to specify the exact columns you want. In other words you can't return the Nth child's name. Ie this isn't possible:
1/2/2010 | Child_1_name | Child_2_name | Child_3_name
1/3/2010 | Child_1_name
1/4/2010 | Child_1_name | Child_2_name
Each record needs to have the same amount of columns. So you might be able to make a select that does this:
1/2/2010 | Child_1_name
1/2/2010 | Child_2_name
1/2/2010 | Child_3_name
1/3/2010 | Child_1_name
1/4/2010 | Child_1_name
1/4/2010 | Child_2_name
And then in a report remap it to how you want it displayed

How can I optimize this query...?

I have two tables, one for routes and one for airports.
Routes contains just over 9000 rows and I have indexed every column.
Airports only 2000 rows and I have also indexed every column.
When I run this query it can take up to 35 seconds to return 300 rows:
SELECT routes.* , a1.name as origin_name, a2.name as destination_name FROM routes
LEFT JOIN airports a1 ON a1.IATA = routes.origin
LEFT JOIN airports a2 ON a2.IATA = routes.destination
WHERE routes_build.carrier = "Carrier Name"
Running it with "DESCRIBE" I get the followinf info, but I'm not 100% sure on what it's telling me.
id | Select Type | Table | Type | possible_keys | Key | Key_len | ref | rows | Extra
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | routes_build | ref | carrier,carrier_2 | carrier | 678 | const | 26 | Using where
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | a1 | ALL | NULL | NULL | NULL | NULL | 5389 |
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | a2 | ALL | NULL | NULL | NULL | NULL | 5389 |
--------------------------------------------------------------------------------------------------------------------------------------
The only alternative I can think of is to run two separate queries and join them up with PHP although, I can't believe something like this being something that could kill a mysql server. So as usual, I suspect I'm doing something stupid. SQL is my number 1 weakness.
Personally, I would start by removing the left joins and replacing them with inner joins as each route must have a start and end point.
It's telling you that it's not using an index for joining on the airports table. See how the "rows" column is so huge, 5000 odd? that's how many rows it's having to read to answer your query.
I don't know why, as you have claimed you have indexed every column. What is IATA? Is it Unique? I believe if mysql decides the index is inefficient it may ignore it.
EDIT: if IATA is a unique string, maybe try indexing half of it only? (You can select how many characters to index) That may give mysql an index it can use.
SELECT routes.*, a1.name as origin_name, a2.name as destination_name
FROM routes_build
LEFT JOIN
airports a1
ON a1.IATA = routes_build.origin
LEFT JOIN
airports a2
ON a2.IATA = routes_build.destination
WHERE routes_build.carrier = "Carrier Name"
From your EXPLAIN PLAN I can see that you don't have an index on airports.IATA.
You should create it for the query to work fast.
Name also suggests that it should be a UNIQUE index, since IATA codes are unique.
Update:
Please post your table definition. Issue this query to show it:
SHOW CREATE TABLE airports
Also I should note that your FULLTEXT index on IATA is useless unless you have set ft_max_word_len is MySQL configuration to 3 or less.
By default, it's 4.
IATA codes are 3 characters long, and MySQL doesn't search for such short words using FULLTEXT with default settings.
After you implement Martin Robins's excellent advice (i.e. remove every instance of the word LEFT from your query), try giving routes_build a compound index on carrier, origin, and destination.
It really depends on what information you're trying to get to. You probably don't need to join airports twice and you probably don't need to use left joins. Also, if you can search on a numeric field rather than a text field, that would speed things up as well.
So what are you trying to fetch?