I want to find relationships between two persons using a database. For example, I have a database like this:
Person:
Id| Name
1 | Edvard
2 | Ivan
3 | Molly
4 | Julian
5 | Emily
6 | Katarina
Relationship:
Id| Type
1 | Parent
2 | Husband\Wife
3 | ex-Husband\ex-Wife
Relationships:
Id| Person_1_Id | Person_2_Id | Relation_Id
1 | 1 | 3 | 2
2 | 3 | 4 | 3
3 | 3 | 2 | 1
4 | 4 | 2 | 1
5 | 1 | 6 | 3
6 | 1 | 5 | 1
7 | 6 | 5 | 1
What is the best way to find the relationship between Person 2 and Person 5? This example is small, but what if there were 5 families, or 10,000? I think that with many families it becomes necessary to introduce the concept of depth. Would it be better to change the database design? Is it possible to model this as trees or graphs? Are there other ways to approach this problem?
As soon as you get above a handful of nodes and a few relationships between them, this becomes a very complex problem: there are whole branches of maths based around this type of challenge and how long it takes to compute a result.
For any non-trivial set of nodes/relationships you are going to need to look at deploying a graph database, e.g. Neo4j.
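For a data set as small as the one in the question, the underlying graph idea can be illustrated without a graph database. The following is only a sketch: it runs a breadth-first search over the Relationships rows, treats every row as an undirected edge (the question does not say which side of a Parent row is the parent), and uses a hypothetical max_depth parameter for the "depth" limit the question mentions. The names find_path and describe are made up for this example.

```python
from collections import deque

# Rows copied from the question: (Person_1_Id, Person_2_Id, Relation_Id).
RELATIONSHIPS = [
    (1, 3, 2), (3, 4, 3), (3, 2, 1), (4, 2, 1),
    (1, 6, 3), (1, 5, 1), (6, 5, 1),
]
RELATION_TYPES = {1: "Parent", 2: "Husband\\Wife", 3: "ex-Husband\\ex-Wife"}


def find_path(start, goal, rows=RELATIONSHIPS, max_depth=None):
    """Return the shortest chain of person ids from start to goal, or None."""
    # Assumption: every row is an undirected edge, so the search walks both ways.
    neighbours = {}
    for a, b, _relation in rows:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    previous = {start: None}  # doubles as the visited set
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if person == goal:
            # Walk the back-pointers to rebuild the chain.
            path = []
            while person is not None:
                path.append(person)
                person = previous[person]
            return path[::-1]
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for other in neighbours.get(person, []):
            if other not in previous:
                previous[other] = person
                queue.append((other, depth + 1))
    return None


def describe(path, rows=RELATIONSHIPS):
    """List the relation type on each hop of a path returned by find_path."""
    lookup = {frozenset((a, b)): RELATION_TYPES[r] for a, b, r in rows}
    return [(a, b, lookup[frozenset((a, b))]) for a, b in zip(path, path[1:])]


path = find_path(2, 5)
print(path)
print(describe(path))
```

For Person 2 and Person 5 the chain runs through persons 3 and 1: a Parent link, a Husband\Wife link, and another Parent link. Interpreting such a chain as a named relationship (step-sibling, for instance) is a separate step that this sketch does not attempt.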
I am trying to move data from a database into a pandas data frame. I have data in multiple tables that I want to combine.
I'm using SQLAlchemy and relationship between parent/children.
I'm trying to understand how I'd do this in SQL before attempting in SQLAlchemy
I am using Sqlite as a DB.
parent_table
ID | Name | Class
1 | Joe | Paladin
2 | Ron | Mage
3 | Sara | Knight
child1
ID | distance | finished | parent_id
1 | 2 miles | yes | 1
2 | 3 miles | yes | 1
3 | 1 miles | yes | 1
4 | 10 miles | no | 2
child2
ID | Weight | height | parent_id
1 | 5 lbs | 5'3 | 1
2 | 10 lbs | 5'5 | 2
I want to write a query where the result would be everything for Joe (id: 1) on a row.
1 | Joe | Paladin | 2 miles | yes | 3 miles | yes | 1 miles | yes | 5lbs | 5'3
2 | Ron | Mage | 10 miles | no | None | None | None | None | 10lbs | 5'5
3 | Sara | Knight | None | None | None | None | None | None | None | None
I'm guessing I need to do a join, but I'm confused by the fact that Ron has fewer child1 entries.
How do I construct a table that has as many columns as needed and fills out the empty ones as None when some of the rows in parent_table don't have as many children?
Simply query each table separately and use UNION to combine the results:
SELECT Name,Class FROM parent_table WHERE ID = 1
UNION
SELECT distance,finished FROM child1 WHERE parent_id = 1
UNION
SELECT weight,height FROM child2 WHERE parent_id =1
This way you avoid the problem for Ron or anyone else who has no rows in one of the tables.
You can't have "as many columns as needed", because the number of child rows varies and a table can't have a variable number of columns. If you can settle on a fixed number of children (say 2), you can do:
CREATE TABLE
"some_table"
AS
SELECT
"parent_table"."ID",
"parent_table"."Name",
"parent_table"."Class",
"child1_1"."finished" AS "2_miles",
"child1_2"."finished" AS "3_miles"
FROM
"parent_table",
"child1" AS "child1_1",
"child1" AS "child1_2"
WHERE
"child1_1"."parent_id"="parent_table"."id" AND
"child1_2"."parent_id"="parent_table"."id" AND
"child1_1"."distance"='2 miles' AND
"child1_2"."distance"='3 miles'
You can add columns from child2 in the same manner. The child subkeys (for example, the data in child1.distance) will need to become column names. But for variable one-to-many relations, you need multiple tables. That is basically what the relational model is all about.
For data analysis (which seems to be what you are trying to do) you will also need two datasets (like tables), because the two measurements (sample sets), i.e. distances and weights, are not correlated, and you can obtain them as two tables. Think of what a "sample" is: the result of a measurement. It can't be "entity 1 completed 2 miles and 4 lbs", because "2 miles and 4 lbs" is not a measurable event. So you have two distinct samples: "entity 1 completed 2 miles" and "entity 1 completed 4 lbs". (Or are the data in child2 one-to-one properties of the entity in parent_table? You should explain in more detail what the data means and what you're trying to achieve.)
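If one row per parent is still wanted despite the variable number of children, a possible compromise is to keep a fixed set of columns and pack each parent's child rows into a single text column. The sketch below assumes SQLite (as in the question) and uses LEFT JOIN against pre-aggregated subqueries with GROUP_CONCAT; the column aliases runs and measurements are invented for this example, and the order of items inside a GROUP_CONCAT result is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent_table (ID INTEGER PRIMARY KEY, Name TEXT, Class TEXT);
    CREATE TABLE child1 (ID INTEGER PRIMARY KEY, distance TEXT, finished TEXT, parent_id INTEGER);
    CREATE TABLE child2 (ID INTEGER PRIMARY KEY, Weight TEXT, height TEXT, parent_id INTEGER);
""")
# Sample data copied from the question.
conn.executemany("INSERT INTO parent_table VALUES (?, ?, ?)",
                 [(1, "Joe", "Paladin"), (2, "Ron", "Mage"), (3, "Sara", "Knight")])
conn.executemany("INSERT INTO child1 VALUES (?, ?, ?, ?)",
                 [(1, "2 miles", "yes", 1), (2, "3 miles", "yes", 1),
                  (3, "1 miles", "yes", 1), (4, "10 miles", "no", 2)])
conn.executemany("INSERT INTO child2 VALUES (?, ?, ?, ?)",
                 [(1, "5 lbs", "5'3", 1), (2, "10 lbs", "5'5", 2)])

# Each child table is aggregated on its own first; joining both raw child
# tables to the parent in one step would multiply their rows together.
QUERY = """
SELECT p.ID, p.Name, p.Class, c1.runs, c2.measurements
FROM parent_table AS p
LEFT JOIN (
    SELECT parent_id, GROUP_CONCAT(distance || ' ' || finished, '; ') AS runs
    FROM child1 GROUP BY parent_id
) AS c1 ON c1.parent_id = p.ID
LEFT JOIN (
    SELECT parent_id, GROUP_CONCAT(Weight || ' ' || height, '; ') AS measurements
    FROM child2 GROUP BY parent_id
) AS c2 ON c2.parent_id = p.ID
ORDER BY p.ID
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because of the LEFT JOINs, Sara still appears, with None in both aggregated columns. Splitting the packed strings back into separate values would have to happen on the client side.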
I have an InfluxDB table; let's call it my_table.
my_table is structured like this (simplified):
+------+----+----+
| Time | m1 | m2 |
+======+====+====+
| 1    | 8  | 4  |
+------+----+----+
| 2    | 1  | 12 |
+------+----+----+
| 3    | 6  | 18 |
+------+----+----+
| 4    | 4  | 1  |
+------+----+----+
I was wondering whether it is possible to find out, for each time, how many of the metrics are larger than a certain (dynamic) threshold.
So let's say I want to know how many of the metrics (columns) are higher than 5.
I would want to do something like this:
select fieldcount(/m*/) from my_table where /m*/ > 5
Returning:
1
1
2
0
I am relatively restricted in how I can structure the database, as I'm using the Diamond collector (Python), which takes care of all data collection for me and flushes it to my InfluxDB without letting me specify what the tables should look like.
EDIT
I am aware of a possible solution if I hardcode the threshold and add a third metric named mGreaterThan5:
+------+----+----+---------------+
| Time | m1 | m2 | mGreaterThan5 |
+======+====+====+===============+
| 1    | 8  | 4  | 1             |
+------+----+----+---------------+
| 2    | 1  | 12 | 1             |
+------+----+----+---------------+
| 3    | 6  | 18 | 2             |
+------+----+----+---------------+
| 4    | 4  | 1  | 0             |
+------+----+----+---------------+
However, this means that I can't easily change the threshold to 6 or any other number, which is why I would prefer a better solution if there is one.
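One way to keep the threshold dynamic, if the query language cannot express this directly, would be to fetch the rows as they are and count on the client side. The sketch below does not use any InfluxDB client API: it assumes the query result has already been turned into a list of dictionaries (ROWS is hand-written sample data from the table above), and count_above is a hypothetical helper name.

```python
# Assumed shape of an already-fetched result: one dict per point in time.
ROWS = [
    {"time": 1, "m1": 8, "m2": 4},
    {"time": 2, "m1": 1, "m2": 12},
    {"time": 3, "m1": 6, "m2": 18},
    {"time": 4, "m1": 4, "m2": 1},
]


def count_above(row, threshold, time_key="time"):
    """Count the metric columns of one row whose value exceeds threshold."""
    return sum(
        1
        for name, value in row.items()
        if name != time_key and value > threshold
    )


for threshold in (5, 6):
    print(threshold, [count_above(row, threshold) for row in ROWS])
```

With a threshold of 5 this yields the 1, 1, 2, 0 sequence from the question, and changing the threshold needs no change to the stored data.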
EDIT2
A similar problem occurs when trying to retrieve the x highest metrics. E.g.:
On Jan 1st what were the highest 3 values of m? Given table:
+-----+-----+----+-----+----+-----+----+
| Time| m1 | m2 | m3 | m4 | m5 | m6 |
+=====+=====+====+=====+====+=====+====+
| 1/1 | 8 | 4 | 1 | 7 | 2 | 0 |
+-----+-----+----+-----+----+-----+----+
Am I screwed if I keep the table structured this way?
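With the table kept as it is, the "highest 3 values" question can at least be answered on the client side once the row has been fetched. This is a sketch under the same assumption as before (the row is already available as a plain dictionary, no InfluxDB API involved); top_metrics is a made-up helper name.

```python
import heapq

# The single sample row from the question, as an assumed dict representation.
ROW = {"time": "1/1", "m1": 8, "m2": 4, "m3": 1, "m4": 7, "m5": 2, "m6": 0}


def top_metrics(row, n, time_key="time"):
    """Return the n (name, value) pairs with the largest values in one row."""
    metrics = [(name, value) for name, value in row.items() if name != time_key]
    return heapq.nlargest(n, metrics, key=lambda item: item[1])


print(top_metrics(ROW, 3))
```

Storing each metric as its own row (time, metric name, value) would let the database do this sorting instead, but whether the collector allows that layout is not something the question settles.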
I have the following issue:
I am planning a database of trains. Each train has carriages, which are either compartment or non-compartment. Both types come in three classes (1, 2, 3), and each combination has a different number of seats per compartment or per row.
I could create the following table:
| type | class | seats in a row | rows | seats in a compartment | compartments |
| non-c| 1 | 3 | 18 | NULL | NULL |
| non-c| 2 | 4 | 22 | NULL | NULL |
| non-c| 3 | 5 | 25 | NULL | NULL |
| comp | 1 | NULL | NULL | 6 | 9 |
| comp | 2 | NULL | NULL | 8 | 10 |
| comp | 3 | NULL | NULL | 10 | 11 |
That is, I would set NULL where a property does not apply to a particular type (for example, the number of seats in a compartment for a non-compartment carriage), but in my opinion this is not a good-looking solution. Do you have any other ideas? Maybe two tables: non-compartment attributes and compartment attributes? However, I think a better solution exists.
Like you said, break your design into tables that correspond to logical entities (normalization), that way you will have more scope to accommodate change and less redundant info.
Proposed design
Tables
Tbl_train(Id, other_train_info) - stores only train info
Tbl_Carriage(Id, trainid, carriagetypeid, other_carriage_info) - stores carriage info related to a train
Tbl_carriagetype_master(Id, type_desc, class, .. Etc) - stores all the static compartmental info
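The three tables above could be sketched roughly as follows, here in SQLite through Python so the example is runnable. Several things are assumptions rather than part of the proposal: the lower-case table names, the name column on train, and in particular the generic seats_per_unit and units columns, where a "unit" is a row in a non-compartment carriage and a compartment in a compartment carriage. That generic pair is one possible way to avoid the NULL columns from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Static description of each carriage type (the "master" table).
    CREATE TABLE carriage_type (
        id             INTEGER PRIMARY KEY,
        type_desc      TEXT    NOT NULL,
        class          INTEGER NOT NULL,
        seats_per_unit INTEGER NOT NULL,  -- assumption: seats per row or per compartment
        units          INTEGER NOT NULL   -- assumption: number of rows or compartments
    );
    CREATE TABLE train (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    -- One row per physical carriage, pointing at its train and its type.
    CREATE TABLE carriage (
        id               INTEGER PRIMARY KEY,
        train_id         INTEGER NOT NULL REFERENCES train(id),
        carriage_type_id INTEGER NOT NULL REFERENCES carriage_type(id)
    );
""")
# Figures taken from the table in the question.
conn.executemany("INSERT INTO carriage_type VALUES (?, ?, ?, ?, ?)", [
    (1, "non-compartment", 1, 3, 18),
    (2, "non-compartment", 2, 4, 22),
    (3, "non-compartment", 3, 5, 25),
    (4, "compartment", 1, 6, 9),
    (5, "compartment", 2, 8, 10),
    (6, "compartment", 3, 10, 11),
])
# Hypothetical train with one class 1 non-compartment carriage
# and one class 2 compartment carriage.
conn.execute("INSERT INTO train VALUES (1, 'example train')")
conn.executemany("INSERT INTO carriage VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 5)])

total_seats = conn.execute("""
    SELECT SUM(t.seats_per_unit * t.units)
    FROM carriage AS c
    JOIN carriage_type AS t ON t.id = c.carriage_type_id
    WHERE c.train_id = ?
""", (1,)).fetchone()[0]
print(total_seats)
```

The seat count for the example train comes out as 3 * 18 + 8 * 10. If compartment and non-compartment carriages later need attributes that really differ, separate attribute tables per type, as the question suggests, remain an option.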
In MySQL I have a table called "meanings" with three columns:
"person" (int),
"word" (byte, 16 possible values)
"meaning" (byte, 26 possible values).
A person assigns one or more meanings to each word:
person word meaning
-------------------
1 1 4
1 2 19
1 2 7 <-- Note: second meaning for word 2
1 3 5
...
1 16 2
Then another person, and so on. There will be thousands of persons.
I need to find for each of the 16 words the top three meanings (with their frequencies). Something like:
+--------+-----------------+------------------+-----------------+
| Word | 1st Most Ranked | 2nd Most Ranked | 3rd Most Ranked |
+--------+-----------------+------------------+-----------------+
| 1 | meaning 5 (35%) | meaning 19 (22%) | meaning 2 (13%) |
| 2 | meaning 8 (57%) | meaning 1 (18%) | meaning 22 (7%) |
+--------+-----------------+------------------+-----------------+
...
Is it possible to solve this with a single MySQL query?
Well, if you group by word and meaning, you can easily get the % of people who use each word/meaning combination out of the dataset.
In order to limit the number of meanings returned for each word, you will need to create some sort of filter per word/meaning combination.
It seems like you just want the answer to your homework, so I won't post more than this, but it should be enough to get you on the right track.
Of course you can do
SELECT * FROM meanings WHERE word = 2 ORDER BY meaning DESC LIMIT 3
But this is cheating, since you would need a loop over the words.
I'm working on a better solution.
I believe the problem I had a while ago looks similar. I ended up with the #counter thing.
Note about the problem
Let's suppose there is only one person, who says:
+--------+------+---------+
| Person | Word | Meaning |
+--------+------+---------+
| 1      | 1    | 7       |
| 1      | 1    | 3       |
| 1      | 2    | 8       |
+--------+------+---------+
The report should read:
+--------+------------------+------------------+-----------------+
| Word | 1st Most Ranked | 2nd Most Ranked | 3rd Most Ranked |
+--------+------------------+------------------+-----------------+
| 1 | meaning 7 (100%) | meaning 3 (100%) | NULL |
| 2 | meaning 8 (100%) | NULL | NULL |
+--------+------------------+------------------+-----------------+
The following is not OK (50% frequency is absurd in a population of one person):
+--------+------------------+------------------+-----------------+
| Word | 1st Most Ranked | 2nd Most Ranked | 3rd Most Ranked |
+--------+------------------+------------------+-----------------+
| 1 | meaning 7 (50%) | meaning 3 (50%) | NULL |
| 2 | meaning 8 (100%) | NULL | NULL |
+--------+------------------+------------------+-----------------+
The intended meaning of the frequencies is "How many people think this meaning corresponds to that word?"
So it's not merely about counting "cases", but about counting persons in the table.
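The counting rule described in this note can be sketched as follows. This is not a single MySQL query: it uses SQLite through Python so that it is runnable, groups by word and meaning with COUNT(DISTINCT person), divides by the number of distinct persons in the table, and then keeps the first n meanings per word in Python. The function name top_meanings and the tie-break (equal frequencies are ordered by meaning number) are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict


def top_meanings(rows, n=3):
    """Map each word to its n most frequent meanings as (meaning, percent)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE meanings (person INTEGER, word INTEGER, meaning INTEGER)")
    conn.executemany("INSERT INTO meanings VALUES (?, ?, ?)", rows)

    # The denominator is the number of persons, not the number of rows.
    total = conn.execute("SELECT COUNT(DISTINCT person) FROM meanings").fetchone()[0]

    grouped = conn.execute("""
        SELECT word, meaning, COUNT(DISTINCT person) AS persons
        FROM meanings
        GROUP BY word, meaning
        ORDER BY word, persons DESC, meaning
    """).fetchall()

    report = defaultdict(list)
    for word, meaning, persons in grouped:
        if len(report[word]) < n:  # rows arrive best-first within each word
            report[word].append((meaning, 100.0 * persons / total))
    return dict(report)


# The one-person example from the note above.
print(top_meanings([(1, 1, 7), (1, 1, 3), (1, 2, 8)]))
```

For the one-person example both meanings of word 1 come out at 100%, as the note requires, rather than 50% each.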
Thankfully, I haven't had to work with particularly complex SQL queries before. Here's my goal.
I have the table hams, which I would like to cross-join with the table eggs - that is, get all ham-egg combinations... to an extent.
The eggs table also has an attribute how_cooked, which is defined as ENUM('over-easy','scrambled','poached'). I would like a resultset listing every possible combination of ham and egg-cooking method, along with a sample egg cooked that way. (I don't care which egg in particular.)
So if there are 3 hams with ids 1, 2, and 3, and 3 eggs for each cooking method, my resultset should look something like this:
+---------+-----------------+---------+
| hams.id | eggs.how_cooked | eggs.id |
+---------+-----------------+---------+
| 1 | over-easy | 1 |
| 1 | scrambled | 4 |
| 1 | poached | 7 |
| 2 | over-easy | 1 |
| 2 | scrambled | 4 |
| 2 | poached | 7 |
| 3 | over-easy | 1 |
| 3 | scrambled | 4 |
| 3 | poached | 7 |
+---------+-----------------+---------+
I'm sure I could hack together some solution with loads of subqueries here and there, but is there any elegant way to do this in MySQL?
Through a bit of thinking real hard and Googling, I may have found a good solution:
SELECT * FROM hams, eggs GROUP BY hams.id, eggs.how_cooked
It seems to work. Is it really that easy?
SELECT hams.id, eggs.how_cooked, eggs.id
FROM hams
CROSS JOIN eggs
This does the trick. CROSS JOIN is synonymous with the comma operator (,) but has a higher precedence in MySQL.
MySQL 5.0 Reference - JOIN syntax
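Note that a plain cross join returns every ham paired with every egg, not one sample egg per cooking method. One way to get the result set shown in the question is to reduce eggs to one row per how_cooked first and cross-join that. The sketch below uses SQLite through Python so it can be run as is (SQLite has no ENUM, so how_cooked is plain TEXT here); picking MIN(id) as the sample egg is an arbitrary choice, and the alias sample is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hams (id INTEGER PRIMARY KEY);
    CREATE TABLE eggs (id INTEGER PRIMARY KEY, how_cooked TEXT);
""")
conn.executemany("INSERT INTO hams VALUES (?)", [(1,), (2,), (3,)])
# Eggs 1-3 are over-easy, 4-6 scrambled, 7-9 poached.
methods = ["over-easy", "scrambled", "poached"]
conn.executemany("INSERT INTO eggs VALUES (?, ?)",
                 [(i + 1, methods[i // 3]) for i in range(9)])

# A plain cross join pairs every ham with every single egg.
all_pairs = conn.execute("SELECT COUNT(*) FROM hams CROSS JOIN eggs").fetchone()[0]

# Reduce eggs to one sample per cooking method, then cross-join with hams.
rows = conn.execute("""
    SELECT hams.id, sample.how_cooked, sample.egg_id
    FROM hams
    CROSS JOIN (
        SELECT how_cooked, MIN(id) AS egg_id
        FROM eggs
        GROUP BY how_cooked
    ) AS sample
    ORDER BY hams.id, sample.egg_id
""").fetchall()

print(all_pairs)
for row in rows:
    print(row)
```

The derived table keeps the query valid under strict GROUP BY rules, since every selected column is either grouped or aggregated.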