Say I have this very simple table with duplicate entries. Is the relationship between the A and B columns one-to-one or many-to-many?
A | B | C
1 | 2 | x
1 | 2 | y
Undoubtedly a simple question, but I can't find confirmation for this corner case... Thanks in advance!
EDIT: Changed the content of the table to avoid sticking to the math definition.
As I said in the comment, this is a one-to-many relation. To clarify, let's look at an example (I normalized your table into the tables below).
Suppose you have a table named continent:
id | title
---|--------
1 | Asia
2 | Europe
3 | America
Now we have another table named country:
id | title | continent_Id
---|----------|--------------
1 | Norway | 2
2 | Germany | 2
3 | Canada | 3
4 | Japan | 1
We also have a state table with this structure:
id | stateTitle | country_Id
---|------------|--------------
1 | Munich | 2
2 | Berlin | 2
3 | Toronto | 3
4 | Tokyo | 4
5 | Osaka | 4
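To make the one-to-many shape concrete, here is a minimal sketch that loads the continent and country tables above into an in-memory SQLite database and counts the countries per continent. The use of SQLite and Python's `sqlite3` module is an assumption made only for illustration; the original post does not name a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE continent (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE country (
    id INTEGER PRIMARY KEY,
    title TEXT,
    continent_Id INTEGER REFERENCES continent(id)
);
INSERT INTO continent VALUES (1, 'Asia'), (2, 'Europe'), (3, 'America');
INSERT INTO country VALUES
    (1, 'Norway', 2), (2, 'Germany', 2), (3, 'Canada', 3), (4, 'Japan', 1);
""")

# One continent row can be referenced by many country rows,
# while each country row points at exactly one continent.
countries_per_continent = conn.execute("""
SELECT c.title, COUNT(k.id)
FROM continent AS c
LEFT JOIN country AS k ON k.continent_Id = c.id
GROUP BY c.id
ORDER BY c.id
""").fetchall()

print(countries_per_continent)  # [('Asia', 1), ('Europe', 2), ('America', 1)]
```

Europe appears once in continent but is referenced twice from country, which is what "one-to-many" means here.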
I want to find relationships between two persons using a database. For example, I have a database like this:
Person:
Id| Name
1 | Edvard
2 | Ivan
3 | Molly
4 | Julian
5 | Emily
6 | Katarina
Relationship:
Id| Type
1 | Parent
2 | Husband\Wife
3 | ex-Husband\ex-Wife
Relationships:
Id| Person_1_Id | Person_2_Id | Relation_Id
1 | 1 | 3 | 2
2 | 3 | 4 | 3
3 | 3 | 2 | 1
4 | 4 | 2 | 1
5 | 1 | 6 | 3
6 | 1 | 5 | 1
7 | 6 | 5 | 1
What is the best way to find the relationship between Person-2 and Person-5? This example is small, but what if there were 5 families, or 10000? I think that with many families it becomes necessary to introduce the concept of depth. Would it be better to change the database design? Is it possible to model this as trees or graphs? Any ideas on how to solve this problem differently?
As soon as you get above a handful of nodes and a few relationships between them, this becomes a very complex problem: there are whole branches of maths based around this type of challenge and how long it takes to compute a result.
For any non-trivial set of nodes/relationships you are going to need to look at deploying a graph database e.g. Neo4j
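For a data set as small as the one in the question, the traversal a graph database would perform can be sketched in memory. The block below is not Neo4j; it is a plain breadth-first search over the Relationships rows, under the assumption that every relation may be walked in both directions. The function name `shortest_chain` is made up for this illustration.

```python
from collections import deque

# Rows of the Relationships table: (Person_1_Id, Person_2_Id, Relation_Id).
RELATIONSHIPS = [
    (1, 3, 2), (3, 4, 3), (3, 2, 1), (4, 2, 1),
    (1, 6, 3), (1, 5, 1), (6, 5, 1),
]


def shortest_chain(rows, start, goal):
    """Return the shortest list of person ids linking start to goal, or None."""
    neighbours = {}
    for a, b, _relation in rows:
        # Assumption: every relation can be walked in both directions.
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    previous = {start: None}  # person id -> person we reached it from
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            chain = []
            while person is not None:
                chain.append(person)
                person = previous[person]
            return chain[::-1]
        for other in neighbours.get(person, []):
            if other not in previous:
                previous[other] = person
                queue.append(other)
    return None


print(shortest_chain(RELATIONSHIPS, 2, 5))  # [2, 3, 1, 5]
```

The chain 2 → 3 → 1 → 5 reads as Ivan, Molly, Edvard, Emily. The length of the chain is the "depth" mentioned in the question, and a maximum depth could be enforced by stopping the search once it is exceeded.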
I cannot disclose the actual data, so I am just using an example. I have two tables. The first is a dictionary table that holds the IDs for the titles. The second table holds new data coming into the database, and it doesn't have IDs. I need to set the ID for each new row by checking whether the dictionary table already contains something similar; otherwise I need to add the new value to the dictionary, get a new ID for it, and use that ID for the new row. The Expected ID column in the second table shows what I expect the IDs to be.
ID Title
-- ------------------------
1 Aliens
2 The Hunger Games
3 John Wick
4 Alien vs Predator
ID Title Expected ID
----------------------------------------------------------------
null The Hunger Games: Mockingjay Part I 2
null The Hunger Games: Mockingjay Part II 2
null John Wick (2014) 3
null John Wick Chapter 2 3
null Alien 1
null Aliens 1
null Alien 3 1
null Lord of the Rings 5 (New ID generated)
This seems like a task for pg_trgm. It provides you with the % operator, which returns true if the two strings are "close enough". You can tune what close enough means via changing pg_trgm.similarity_threshold. You can create an index to accelerate this operation. The higher similarity_threshold is set, the more acceleration you are likely to achieve.
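To give a feel for what "close enough" measures, here is a rough Python approximation of a trigram similarity. It is not the extension's code and simplifies details (for example, it only treats ASCII letters and digits as word characters). The helper names `trigrams` and `similarity` are chosen for this sketch, and 0.3 is the documented default of pg_trgm.similarity_threshold.

```python
import re


def trigrams(text):
    """Approximate a pg_trgm-style trigram set: lowercase the text, split it into
    alphanumeric words, pad each word with two leading spaces and one trailing
    space, then take every 3-character window."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


THRESHOLD = 0.3  # default of pg_trgm.similarity_threshold

print(similarity("Alien", "Aliens"))                              # 0.625
print(similarity("Lord of the Rings", "The Hunger Games") >= THRESHOLD)  # False
```

The distance reported by the `<->` operator is one minus this similarity, so 0.625 corresponds to a distance of 0.375.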
If I take your first table as foo1 and all the distinct titles from your desired output as foo2, then this query gives reasonable results:
select *, foo1.title <-> foo2.title as distance from foo2 left join foo1 on foo1.title % foo2.title;
title | id | title | distance
--------------------------------------+--------+-------------------+-----------
The Hunger Games: Mockingjay Part I | 2 | The Hunger Games | 0.5142857
The Hunger Games: Mockingjay Part II | 2 | The Hunger Games | 0.5277778
John Wick (2014) | 3 | John Wick | 0.3333333
John Wick Chapter 2 | 3 | John Wick | 0.5
Alien | 1 | Aliens | 0.375
Alien | 4 | Alien vs Predator | 0.6666666
Aliens | 1 | Aliens | 0
Alien 3 | 1 | Aliens | 0.5
Alien 3 | 4 | Alien vs Predator | 0.7
Lord of the Rings | (null) | (null) | (null)
If you want a single output row for each row of foo2, showing the "best" match only, then you would use a LEFT JOIN LATERAL:
select *, a.title <-> foo2.title as distance from foo2 left join lateral
(select * from foo1 where foo1.title % foo2.title order by foo1.title <-> foo2.title limit 1) a
on true;
title | id | title | distance
--------------------------------------+--------+------------------+-----------
The Hunger Games: Mockingjay Part I | 2 | The Hunger Games | 0.5142857
The Hunger Games: Mockingjay Part II | 2 | The Hunger Games | 0.5277778
John Wick (2014) | 3 | John Wick | 0.3333333
John Wick Chapter 2 | 3 | John Wick | 0.5
Alien | 1 | Aliens | 0.375
Aliens | 1 | Aliens | 0
Alien 3 | 1 | Aliens | 0.5
Lord of the Rings | (null) | (null) | (null)
How to replace the NULL in the "id" column (where there are no close-enough matches) with a newly generated id is a separate question, and you should ask separate questions separately.
For any realistically sized datasets, it is unlikely you would be able to just blindly accept whatever a query like the above produces, at least not if you want high quality results. Rather, you could have the computer generate something like the above as recommendations, and then offer them (in a convenient interface) for a human to ratify, reject, or investigate further.
I am trying to move data from a database into a pandas data frame. I have data in multiple tables that I want to combine.
I'm using SQLAlchemy with a parent/children relationship.
I'm trying to understand how I'd do this in SQL before attempting it in SQLAlchemy.
I am using SQLite as the DB.
parent_table
ID | Name | Class
1 | Joe | Paladin
2 | Ron | Mage
3 | Sara | Knight
child1
ID | distance | finished | parent_id
1 | 2 miles | yes | 1
2 | 3 miles | yes | 1
3 | 1 miles | yes | 1
4 | 10 miles | no | 2
child2
ID | Weight | height | parent_id
1 | 5 lbs | 5'3 | 1
2 | 10 lbs | 5'5 | 2
I want to write a query where the result would be everything for Joe (id: 1) on a row.
1 | Joe | Paladin | 2 miles | yes | 3 miles | yes | 1 miles | yes | 5lbs | 5'3
2 | Ron | Mage | 10 miles | no | None | None | None | None | 10lbs | 5'5
3 | Sara | Knight | None | None | None | None | None | None | None | None
I'm guessing I need to do a join, but I'm confused by the fact that Ron has fewer child1 entries.
How do I construct a table that has as many columns as needed and fills out the empty ones as None when some of the rows in parent_table don't have as many children?
Simply query each table separately and use a UNION to combine the results:
SELECT Name,Class FROM parent_table WHERE ID = 1
UNION
SELECT distance,finished FROM child1 WHERE parent_id = 1
UNION
SELECT weight,height FROM child2 WHERE parent_id =1
This way you avoid the problem for Ron, or for anyone who does not have a record in one of the tables.
You can't have "as many columns as needed", because the number of child rows is variable and a table can't have a variable number of columns. If you can settle on a fixed number of children (say 2), you can do:
CREATE TABLE "some_table" AS
SELECT
    "parent_table"."ID",
    "parent_table"."Name",
    "parent_table"."Class",
    "child1_1"."finished" AS "2_miles",
    "child1_2"."finished" AS "3_miles"
FROM "parent_table"
-- LEFT JOIN keeps parents that have no matching child row (the columns become NULL)
LEFT JOIN "child1" AS "child1_1"
    ON "child1_1"."parent_id" = "parent_table"."ID"
    AND "child1_1"."distance" = '2 miles'
LEFT JOIN "child1" AS "child1_2"
    ON "child1_2"."parent_id" = "parent_table"."ID"
    AND "child1_2"."distance" = '3 miles'
You can add columns from child2 in the same manner. The child subkeys (e.g. the values in child1.distance) have to become column names. But for variable one-to-many relations you need multiple tables; that is basically what the relational concept is all about.
For data analysis (which seems to be what you are trying to do) you will also need two datasets (like tables), because the two measurements (sample sets, i.e. distances and weights) are not correlated. Think of what a "sample" is: the result of a measurement. It can't be "entity 1 completed 2 miles and 4 lbs", because "2 miles and 4 lbs" is not a measurable event. So you have 2 distinct samples: "entity 1 completed 2 miles" and "entity 1 completed 4 lbs". (Or is the data in child2 a set of 1-to-1 properties of the entity in parent_table? You should explain the meaning of the data and what you're trying to achieve in more detail.)
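Since the question mentions SQLite, here is a minimal sketch of the "keep it as rows" alternative: a LEFT JOIN returns one row per parent/child pair and still returns parents that have no children, with None in the child columns. Only child1 is shown; the aliases `p` and `c` are chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent_table (ID INTEGER PRIMARY KEY, Name TEXT, Class TEXT);
CREATE TABLE child1 (
    ID INTEGER PRIMARY KEY, distance TEXT, finished TEXT, parent_id INTEGER
);
INSERT INTO parent_table VALUES
    (1, 'Joe', 'Paladin'), (2, 'Ron', 'Mage'), (3, 'Sara', 'Knight');
INSERT INTO child1 VALUES
    (1, '2 miles', 'yes', 1), (2, '3 miles', 'yes', 1),
    (3, '1 miles', 'yes', 1), (4, '10 miles', 'no', 2);
""")

# LEFT JOIN keeps every parent; missing child columns come back as None.
rows = conn.execute("""
SELECT p.ID, p.Name, p.Class, c.distance, c.finished
FROM parent_table AS p
LEFT JOIN child1 AS c ON c.parent_id = p.ID
ORDER BY p.ID, c.ID
""").fetchall()

for row in rows:
    print(row)
```

Joe yields three rows, Ron one, and Sara a single row whose child columns are None. A long result like this can be loaded into a data frame and reshaped there if a wide layout is really needed.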
I have to do this in SQL.
I have a table called 'locations'. It contains a list of locations ranging from houses, to streets, to cities all the way up to continents.
locationId | name | desiredValue
1 | Wimbledon |
2 | Peckham |
3 | London |
4 | UK |
5 | France | 123
6 | Europe | 456
7 | Australia |
8 | Paris |
I have a second table called 'links' which contains the link of locations, and their relation
Location1 | Location2 | Linktype
3 | 1 | 5
3 | 2 | 5
4 | 3 | 5
6 | 4 | 5
5 | 8 | 5
Linktype 5 indicates that Location2 is situated 'in' Location1. In the example above, locationId 1 (Wimbledon) is located 'in' locationId 3 (London). LocationId 3 (London) is located 'in' locationId 4 (UK), and so on.
The linktype just describes this 'in' relationship - the link table contains other relations as well which are not pertinent to this question, I just mention it in case it needs to be in a where clause.
For a given location, I want to get the first instance in its location hierarchy that has a 'desiredValue'
For example:
if I was interested in Peckham, I'd like to see that Peckham has no value, that London has no value, that UK has no value but that Europe does (456).
If I was interested in London, I'd see that it has no value, nor does the UK, but that Europe does (456)
If I was interested in Europe, I'd see that it has a value (456)
If I was interested in Paris, I'd see that it has no value, but France does (123)
I know I should probably be using recursive CTEs for this, but I'm stumped. Any help would be gratefully received!
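One possible shape for such a recursive CTE, sketched against an in-memory SQLite database so it can be run as-is. The question does not say which database is in use, so SQLite, the CTE name `chain`, the `depth` column and the helper `first_desired_value` are all assumptions of this example. It also assumes the 'in' links contain no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (locationId INTEGER PRIMARY KEY, name TEXT, desiredValue INTEGER);
CREATE TABLE links (Location1 INTEGER, Location2 INTEGER, Linktype INTEGER);
INSERT INTO locations VALUES
    (1, 'Wimbledon', NULL), (2, 'Peckham', NULL), (3, 'London', NULL),
    (4, 'UK', NULL), (5, 'France', 123), (6, 'Europe', 456),
    (7, 'Australia', NULL), (8, 'Paris', NULL);
INSERT INTO links VALUES (3, 1, 5), (3, 2, 5), (4, 3, 5), (6, 4, 5), (5, 8, 5);
""")

FIRST_VALUE_SQL = """
WITH RECURSIVE chain(locationId, depth) AS (
    SELECT :start, 0                      -- the location we start from
    UNION ALL
    SELECT l.Location1, chain.depth + 1   -- step up to the containing location
    FROM chain
    JOIN links AS l
      ON l.Location2 = chain.locationId AND l.Linktype = 5
)
SELECT loc.name, loc.desiredValue
FROM chain
JOIN locations AS loc ON loc.locationId = chain.locationId
WHERE loc.desiredValue IS NOT NULL
ORDER BY chain.depth                      -- nearest ancestor first
LIMIT 1
"""


def first_desired_value(start_id):
    """Return (name, desiredValue) of the nearest location with a value, or None."""
    return conn.execute(FIRST_VALUE_SQL, {"start": start_id}).fetchone()


print(first_desired_value(2))  # ('Europe', 456)  Peckham
print(first_desired_value(8))  # ('France', 123)  Paris
```

Dropping the WHERE and LIMIT clauses returns the whole chain (Peckham, London, UK, Europe), which matches the "Peckham has no value, London has no value..." listing in the examples.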
I have a table with information like this.
ID | Name | #ofCow | UItem | place
--------+---------+-------- +---------+----------
0 | Bob | 7 | 1 | maine
1 | Bob | 3 | 5 | new york
2 | Tom | 2 | 5 | cali
I wish to produce a table like this, where the number of cows and UItem are added up when the name is the same. However, my select query does not seem to be working. I suspect the place column is the problem, since you can't add 'Maine' and 'New York' together. Can anyone help me find a solution?
ID | Name | #ofCow | UItem |
--------+---------+-------- +---------+
0 | Bob | 10 | 6 |
2 | Tom | 2 | 5 |
TLDR : Add the values in two columns in table 1 if name is same. Output in another column. Don't show the two columns. I don't need places also.
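You could use this (I have assumed the table is named HolyCow and the cow-count column is named Cows). Group by the name only; grouping by ID as well would keep Bob's two rows separate, so take the smallest ID of each group instead:
SELECT MIN(holy.ID) AS ID,
       holy.Name,
       SUM(holy.Cows) AS '#ofCow',
       SUM(holy.UItem) AS 'UItem'
FROM HolyCow holy
GROUP BY holy.Name
ORDER BY holy.Name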
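As a quick check of the grouping idea, here is a small sketch that runs the aggregation on the sample rows in an in-memory SQLite database. The table name HolyCow and the column name Cows follow the assumption made above; SQLite itself and the aliases `total_cows` and `total_uitem` are assumptions of this example, since the question does not name a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HolyCow (ID INTEGER, Name TEXT, Cows INTEGER, UItem INTEGER, place TEXT);
INSERT INTO HolyCow VALUES
    (0, 'Bob', 7, 1, 'maine'),
    (1, 'Bob', 3, 5, 'new york'),
    (2, 'Tom', 2, 5, 'cali');
""")

# Group by Name only, so both Bob rows fall into the same group.
# place is simply left out of the SELECT list, so it causes no conflict.
totals = conn.execute("""
SELECT MIN(ID) AS ID, Name, SUM(Cows) AS total_cows, SUM(UItem) AS total_uitem
FROM HolyCow
GROUP BY Name
ORDER BY Name
""").fetchall()

print(totals)  # [(0, 'Bob', 10, 6), (2, 'Tom', 2, 5)]
```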
Hope this helps!!