I have a dataset with two columns of non-unique IDs (ID-A and ID-B respectively).
A single ID-A can have multiple ID-Bs and vice versa. I am trying to generate a third, set identifier using transitivity (calling it ID-C) which is set to the same value for all records with either ID-A or ID-B in common. Two records having neither ID-A nor ID-B in common should only share an ID-C set identifier if there is a transitive chain of records between them.
To visualize, I have something like the first two columns, and want to generate the third column(ID-C)
ID-A ID-B ID-C
1 1 1
1 2 1
1 3 1
2 2 1
2 4 1
3 4 1
4 5 2
5 5 2
5 6 2
6 7 3
I am using Presto SQL inside of AWS Athena, so I cannot use any variables or loops, that I am aware of.
Related
I have a table that contains parent child relationships and I need to find all related issues.
Parent
Child
1
3
1
4
2
4
5
6
7
6
I need to be able to identify all related issues, ideally I could return a new table that returns the following.
Issue
Group
1
1
2
1
3
1
4
1
5
2
6
2
7
2
I'm currently using CTEs to try and solve this, but I'm running into circular reference issues and I'm not sure how to create a group number for each. Essentially, it is just tracing up and down different parents and each group does not have one central parent node which I think may be causing the issues.
The teacher gave us a team assignment, and me and my teammate are quite struggling with it (especially since we need to use things like TRIGGERS and PROCEDURES, things we didn't see in class yet …).
We need to implement an arc-relationship, and we fail to understand how …
But before I tell you guys what I need to accomplish, I will give you part of the description of the task, so you guys can understand the situation a bit better …
We basically need to make an ERD for a VLSI CAD-system and we need to implement it. Now, we have our CELL entity, the attributes of which aren't really relevant … The only thing you guys need to know in order to help us is that it has a primary key, CELL_CODE, which is a VARCHAR.
Each CELL has many (I think at least four, I don't think you can have triangular CELLS, but doesn't matter anyways) SIDES. A SIDE can be logically identified by its CELL, and to make matters ridiculously difficult, each SIDE has to be numbered by its CELL, like so:
CELLS:
CELL_CODE
1
2
SIDES:
SEQUENCE_NUMBER CELL_CODE
1 1
2 1
3 1
1 2
2 2
3 2
Now, each SIDE has its CONNECTION_PINS. CONNECTION_PINS is also uniquely identified by SIDES, which are basically numbered in a similar manner:
CELLS:
CELL_CODE
1
2
SIDES:
SEQUENCE_NUMBER CELL_CODE
1 1
2 1
3 1
1 2
2 2
3 2
CONNECTION_PINS:
SEQUENCE_NUMBER SIE_SEQUENCE_NUMBER CELL_CODE
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
I tried to explain the numbering issue we have here: Data model - PRIMARY KEY numbering issue, but yeah, I didn't really explain it the way it should be explained ...
Now, we have one final entity, which is where the Arc comes in: CONNECTIONS. CONNECTIONS has 2 CONNECTION_PINS: one for START_FROMand one for END_OF. Now, logically seen the start pin can't be the end pin as well, for a given connection. And that's our struggle. Basically, this shouldn't be allowed:
CELLS:
CELL_CODE
1
2
SIDES:
SEQUENCE_NUMBER CELL_CODE
1 1
2 1
3 1
1 2
2 2
3 2
CONNECTION_PINS:
SEQUENCE_NUMBER SIE_SEQUENCE_NUMBER CELL_CODE
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
CONNECTIONS:
(you shouldn't be able to put this in …)
CPI_SEQNUM_START SIE_SEQNUM_START CELL_CODE_START CPI_SEQNUM_END SIE_SEQNUM_END CELL_CODE_END
1 1 1 1 1 1
Now, this is basically the ERD for this part:
ERD with barred relationships and the arc-relationship in question
and this is the physical model:
Physical model
I basically thought a simple CHECK might do (CHECK (CPI_SEQNUM_START <> CPI_SEQNUM_END AND CELL_CODE_START <> CELL_CODE_END AND SIE_SEQNUM_START <> SIE_SEQNUM_END) ), but that prevented us from inserting anything somehow … Any advice?
Your approach was correct to use a CHECK constraint. Your logic for the constraint was wrong though. You need an OR condition. Only one of the three fields needs to be different.
CPI_SEQNUM_START <> CPI_SEQNUM_END OR
CELL_CODE_START <> CELL_CODE_END OR
SIE_SEQNUM_START <> SIE_SEQNUM
... assuming all three fields are not nullable.
I have a requirement from client where I need to store a value against list of combination.
For example I have following LOBs and against each combination I need to store a value.
Auto
WC
Personal
I purposed multiple solutions he is not satisfied with anyone.
Solution 1: create single table, insert value against all possible combination(string) something like
LOB Value
Auto 1
WC 2
Personal 3
Auto,WC 4
Auto, personal 5
WC, Personal 6
Auto, WC, Personal 7
Solution 2: create lkp_lob, lob_group and lob_group_detail tables. Each group combination represent a group.
Lkp_lob
Lob_key Name
1 Auto
2 WC
3 Person
Lob_group (unique query constrain on lob_group_key and lob_key)
Lob_group_key Lob_key
1 1
2 2
3 3
4 1
4 2
5 1
5 3
6 2
6 3
7 1
7 2
7 3
Lob_group_detail
Lob_group_key Value
1 1
2 2
3 3
4 4
5 5
6 6
7 7
Any suggestion would be highly appreciated.
First of all I did not understood that terms you said.
But from database perspective it is always good to have multiple tables for each module. You will be facing less difficulties when doing CRUD. And will be more faster.
I've Benchmarking table like this
BMID TestID BMTitle ConnectedTestID
---------------------------------------------------
1 5 My BM1 0
2 6 My BM2 5
3 7 My BM3 5,6
4 8 My BM4 10,12,8
5 9 My BM5 0
6 10 My BM6 3,6
7 5 My BM7 8,3,12,9
8 3 My BM8 7,10
9 8 My BM9 0
10 12 My BM10 9
---------------------------------------------
Explaining the table a little
Here the TestID and the connected TestID is playing the roles. If the user wants all the benchmarks for the TestID 3
It should return rows where testID=3 and also if any rows having connectedTestID column having that testID in it among the comma separated values
That means if the user specify the value 3 as the testID, it should return
---------------------------------------------
8 3 My BM8 7,10
7 5 My BM7 8,3,12,9
6 10 My BM6 3,6
--------------------------------------------
Hope its clear how those 3 rows returned. Means First row is because the testID 3 is there. the other two rows because 3 is in their connectedIDs cell
You should fix the data structure. Storing numeric ids in a comma-delimited list is a bad, bad, bad idea:
SQL Server doesn't have the best string manipulation functions.
Storing numberings as character strings is a bad idea.
Having undeclared foreign key relationships is a bad idea.
The resulting queries cannot make use of indexes.
While you are exploring what a junction table is so you can fix the problem with the data structure, you can use a query such as this:
where testid = 3 or
',' + ConnectedTestID + ',' like '%,3,%'
How to fit multiple equal references into table structure? How could I do that? For example: I have list of classmates:
1 Peter
2 Jack
3 John
4 Mary
5 Birgit
6 Stella
7 Janus
8 Margo
9 Fred
Now I want to define fellowships. In first place, let's limit that every kid may belong to one fellowship. So we could have 3 fellowships:
[Peter, Jack]
[John, Mary, Birgit]
[Stella, Janus, Margo, Fred]
All members are equal, so they all should reference to other members. Is there better ways to define such relations than just to have table of pairs? Like:
1 2
3 4
3 5
4 5
4 3
5 3
5 4
6 7
6 8
6 9
7 6
7 8
7 9
8 6
8 7
8 9
9 6
9 7
9 8
If using table of pairs, is it better to describe relation both way (like above), or is it enough to have link just from one way to another? What are the benefits of both ways?
Table of pairs does not constrain any member into just one fellowsip, but how would it possible?
I was looking for SQL table solution, but maybe there are better tools for handling such data-structures, so I added nosql-tag too. I am looking for right tools for such data, but I am eager to know, how to fit it in SQL tables too.
Yes, there is another way. If you have "fellowships", then you do not have pair-wise relationships. STart with a Fellowships table that has a FellowshipsId.
Then you would have a FellowshipsKids table. This is called a junction table, and it would have one row for each member of each fellowship. It would have rows like this:
FellowshipId KidId
1 1
1 2
2 3
2 4
2 5
. . .
What you have is an m-n relationship between fellowships and kids -- one fellowship can have multiple kids, one kids can be in multiple fellowships. A junction table is the standard way of represent this in a relational database.