Using CTEs to show transitive relationships in data - sql

I'm working on an archaeological database which includes a couple of tables describing the spatial relationships between stratigraphic units. It's quite simple: a unit is either above or below another unit. For this I have a table that records unit_1, unit_2 and the type of spatial relationship between them (above or below). I also want to generate a view that records the inverse counterpart of each relationship. In other words, if unit A is above unit B, I also want a derived row stating that unit B is below unit A.
This is what my CTE currently looks like. The error I get is "ERROR: relation "matrix_cte" does not exist", so this is probably not the way to do it. The idea is that when the relationship is 'above' (stored as 1), the INSERT should add a new row to the CTE's result in which the two units are swapped and the relationship is 'below' (stored as 2). Any help greatly appreciated, thanks in advance.
WITH matrix_cte (unit, related_unit, relationship)
AS (SELECT lookup_unit,
lookup_unit_2,
lookup_unit_relationship
FROM register_unit_matrix)
INSERT INTO matrix_cte(unit, related_unit, relationship)
SELECT lookup_unit_2, lookup_unit, 2
FROM (register_unit_matrix
INNER JOIN matrix_cte ON ((register_unit_matrix.lookup_unit = matrix_cte.unit)))
WHERE relationship = 1;

You can't INSERT into a CTE. A CTE is a logical table: a named alias for a result set that exists only for the duration of the statement. You can SELECT from a CTE, but you cannot write to it.
I'm not really sure what you are trying to achieve there.
ERROR: relation "matrix_cte" does not exist
This error message means that you can INSERT only into relations (tables). A CTE is not a table and not a permanent object in the database, so as far as the database is concerned there is no table called matrix_cte.
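If you want to keep the CTE from the question, the fix is to read from it instead of inserting into it. A minimal sketch, assuming PostgreSQL-style SQL and a few hypothetical sample rows (1 = above, 2 = below, as in the question): the CTE is selected from twice, once as-is and once with the units swapped for every 'above' row.

```sql
-- Hypothetical sample data; 1 = above, 2 = below as in the question.
CREATE TEMP TABLE register_unit_matrix (
    lookup_unit              integer NOT NULL,
    lookup_unit_2            integer NOT NULL,
    lookup_unit_relationship integer NOT NULL
);

INSERT INTO register_unit_matrix VALUES
    (10, 11, 1),   -- unit 10 is above unit 11
    (11, 12, 1),   -- unit 11 is above unit 12
    (13, 12, 2);   -- unit 13 is below unit 12

-- The CTE is only read from, never inserted into.
WITH matrix_cte (unit, related_unit, relationship) AS (
    SELECT lookup_unit, lookup_unit_2, lookup_unit_relationship
    FROM register_unit_matrix
)
SELECT unit, related_unit, relationship
FROM matrix_cte
UNION ALL
-- for every 'above' row, add the matching 'below' row with the units swapped
SELECT related_unit, unit, 2
FROM matrix_cte
WHERE relationship = 1
ORDER BY unit, related_unit;
```

Like the INSERT in the question, this only adds inverses for 'above' rows, so it returns five rows and the stored "13 below 12" row gets no 'above' counterpart. Flipping both directions needs a CASE expression on the relationship code.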
To generate all relationships, both direct and inverse, you can UNION two result sets together. If your original table stores each relationship in only one direction, you can use UNION ALL instead, and the query will be faster because no duplicate elimination is needed. That is, if the original table never has two rows for the same pair of units, such as:
unit1, unit2, 1
unit2, unit1, 2
then you can replace UNION with UNION ALL in the query below. If the original table may contain such pairs, keep UNION so that the resulting duplicates are removed.
-- all direct relationships as they are
SELECT
    lookup_unit,
    lookup_unit_2,
    lookup_unit_relationship
FROM register_unit_matrix
UNION
-- all relationships inverted: swap the units and flip 1 (above) <-> 2 (below)
SELECT
    lookup_unit_2,
    lookup_unit,
    CASE WHEN lookup_unit_relationship = 1 THEN 2 ELSE 1 END AS lookup_unit_relationship
FROM register_unit_matrix;
You can wrap the query above in a view, or use it as is.
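Here is a small end-to-end sketch of the view suggestion, using hypothetical sample rows in which one pair of units is stored in both directions. The temporary table, the TEMP view and the view name unit_matrix_both_ways exist only for this demonstration; against the real table you would use a plain CREATE VIEW.

```sql
-- Hypothetical sample data (1 = above, 2 = below). The pair 20/21 is
-- deliberately stored in both directions.
CREATE TEMP TABLE register_unit_matrix (
    lookup_unit              integer NOT NULL,
    lookup_unit_2            integer NOT NULL,
    lookup_unit_relationship integer NOT NULL
);

INSERT INTO register_unit_matrix VALUES
    (20, 21, 1),   -- 20 is above 21
    (21, 20, 2),   -- the same fact, stored again in the other direction
    (21, 22, 1);   -- 21 is above 22

-- TEMP only because the sample table is temporary.
CREATE TEMP VIEW unit_matrix_both_ways AS
SELECT lookup_unit, lookup_unit_2, lookup_unit_relationship
FROM register_unit_matrix
UNION            -- UNION (not UNION ALL) drops the duplicated 20/21 rows
SELECT lookup_unit_2, lookup_unit,
       CASE WHEN lookup_unit_relationship = 1 THEN 2 ELSE 1 END
FROM register_unit_matrix;

SELECT * FROM unit_matrix_both_ways
ORDER BY lookup_unit, lookup_unit_2;
```

With UNION the view returns four rows, because the duplicated 20/21 facts collapse to one copy each. With UNION ALL the same data would return six rows, which is why UNION ALL is only safe when every pair is stored in one direction.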

Related

How to get the differences between two - kind of - duplicated tables (sql)

Background:
I have two tables in two different databases; one is an updated version of the other. For example, imagine that one year ago I duplicated table 1 into the new database (as table 2), and since then I have worked only on table 2, never updating table 1.
I would like to compare the two tables to find the differences that have accumulated over this period (the tables have kept the same structure, so the comparison is meaningful).
My approach was to create a third table, copy both table 1 and table 2 into it, and then count how many times each entry is repeated.
In my opinion, this, together with a new column recording which table each entry came from, would do the job.
Problem:
When copying the two tables into the third table, I get the (obvious) error that duplicate key values violate a unique or primary key constraint.
How can I get around this error, or how could I do the same job better? Any idea is appreciated.
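Something like this should do what you want if A and B have the same structure; otherwise, just select and rename the columns you want to compare. The subquery must be correlated with the outer row; an uncorrelated NOT EXISTS (SELECT * FROM A) is simply true when A is empty and false otherwise. This returns the rows of B that have no identical row in A:
SELECT
*
FROM
B
WHERE NOT EXISTS (SELECT * FROM A WHERE A.col = B.col and ....)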
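A runnable sketch of the NOT EXISTS comparison in both directions, with the subquery correlated on every column. The table names table_a (last year's copy) and table_b (current copy), their columns and the sample rows are hypothetical, and the DBMS is assumed to support multi-row VALUES.

```sql
-- Hypothetical tables: table_a is the old copy, table_b the current one.
CREATE TEMP TABLE table_a (id integer PRIMARY KEY, name varchar(50));
CREATE TEMP TABLE table_b (id integer PRIMARY KEY, name varchar(50));

INSERT INTO table_a VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO table_b VALUES (1, 'alpha'), (2, 'BETA'), (4, 'delta');

-- Rows in B with no identical row in A (new or changed rows).
SELECT 'only in B' AS side, b.*
FROM table_b AS b
WHERE NOT EXISTS (
    SELECT 1 FROM table_a AS a
    WHERE a.id = b.id AND a.name = b.name   -- compare every column
)
UNION ALL
-- Rows in A with no identical row in B (deleted or changed rows).
SELECT 'only in A', a.*
FROM table_a AS a
WHERE NOT EXISTS (
    SELECT 1 FROM table_b AS b
    WHERE b.id = a.id AND b.name = a.name
)
ORDER BY id, side;
```

It returns four rows: id 2 appears on both sides because its name changed, id 3 was deleted and id 4 was added. Note that = never matches NULLs, so rows with NULL columns always show up as different; in PostgreSQL, IS NOT DISTINCT FROM compares NULLs as equal.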
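If NOT EXISTS doesn't work in your DBMS, you could also use a LEFT OUTER JOIN that compares the rows' column values and keeps only the rows that found no match. This version lists the rows of A that are missing from B; swap A and B for the other direction:
SELECT
A.*
from
A left outer join B
on A.col = B.col and ....
where B.col is null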
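The approach from the question (copy both tables into one set, tag each row with its source, count repetitions) also works without a physical third table, so there is no primary key to violate. A sketch with hypothetical tables old_copy and new_copy: UNION ALL stacks the rows, GROUP BY counts identical rows, and a count of 1 means the row exists in only one table.

```sql
-- Hypothetical tables with the same structure in both "databases".
CREATE TEMP TABLE old_copy (id integer PRIMARY KEY, name varchar(50));
CREATE TEMP TABLE new_copy (id integer PRIMARY KEY, name varchar(50));

INSERT INTO old_copy VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO new_copy VALUES (1, 'alpha'), (2, 'BETA'), (4, 'delta');

-- Stack both tables (UNION ALL keeps every row; no key is involved),
-- tag each row with its source, and keep the rows that occur only once.
SELECT id, name, MIN(source) AS source
FROM (
    SELECT id, name, 'old' AS source FROM old_copy
    UNION ALL
    SELECT id, name, 'new' AS source FROM new_copy
) AS combined
GROUP BY id, name
HAVING COUNT(*) = 1
ORDER BY id, source;
```

This relies on neither table containing duplicate rows of its own, which a primary key guarantees. A side benefit is that GROUP BY treats NULLs as equal, so rows containing NULLs compare the way you would expect.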

Importing one to many relations in a Join transformation Azure Synapse

I have two data sources that are loaded into Azure Synapse. Both raw data sources contain an 'Apple' table.
I merge these into a single 'Apple' table in my Enriched data store.
SELECT * FROM datasource1.apple JOIN datasource2.apple on datasource1.apple.id = datasource2.apple.id
However, both data sources also contain a one-to-many relation, AppleColours.
Please could someone help me understand the correct approach to creating a single AppleColours table in my enriched zone?
You need data from both sources when you want to merge them. JOIN (INNER JOIN) returns only the apple.id values that exist in both datasource1 and datasource2.
You should try a FULL OUTER JOIN instead, which also keeps the apples that exist in only one of the sources.
For the one-to-many AppleColours relation there are two methods:
1. Put the color directly in the Apple table. In this case there is no need for a separate AppleColours table:
Apple
ID| Color
1 | red
2 | green
To populate the Color column, add another JOIN, this time to AppleColours, matching the color ID from the source Apple table with the ID in AppleColours.
2. Create a separate AppleColours table with ID and color columns, and load the AppleColours tables from both data sources into it (for example with a UNION, which removes rows that are identical in both sources).
Then add a column named AppleColorId to the Apple table that holds the IDs from AppleColours.
If you want an Apple table that holds all the data and needs no joins to determine an apple's color, use method 1.
If you want a 'slim' Apple table that holds minimal data, use method 2.
In this case, to get the apple's color you have to make an extra JOIN (INNER JOIN) to the AppleColours table.
Maybe combine them in a subquery with a UNION (you will get only one of each identical row), but your problem will still be that, since each table has its own relationship with colours and you are joining both, the same item can give you two different colours. My proposal is a switch that chooses only one: if the first is null, choose the second, and if the second is also null, use a default value (some colour code). Other options are to use the lower ID, because it was created earlier, or the higher one, because it was the latest...
Something like this:
SELECT a1.*, a2.*, Q.Name, Q.Value
FROM datasource1.apple AS a1
JOIN datasource2.apple AS a2 ON a1.id = a2.id
JOIN
(SELECT ColourID, Name, Value FROM datasource1.AppleColours UNION SELECT ColourID, Name, Value FROM datasource2.AppleColours) Q
ON Q.ColourID = COALESCE(a1.ColourID, a2.ColourID, {DefaultColor})
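A runnable sketch of the COALESCE precedence, assuming each source's apple table carries a ColourID column as in the query above. The flattened table names (ds1_apple, ds2_apple_colours, ...) are hypothetical stand-ins for the datasource1/datasource2 schemas, and colour 0 is an assumed stand-in for {DefaultColor}.

```sql
-- Hypothetical flattened names stand in for the two source schemas.
CREATE TEMP TABLE ds1_apple (id integer PRIMARY KEY, ColourID integer);
CREATE TEMP TABLE ds2_apple (id integer PRIMARY KEY, ColourID integer);
CREATE TEMP TABLE ds1_apple_colours (ColourID integer, Name varchar(20));
CREATE TEMP TABLE ds2_apple_colours (ColourID integer, Name varchar(20));

INSERT INTO ds1_apple VALUES (1, 10), (2, NULL), (3, NULL);
INSERT INTO ds2_apple VALUES (1, 20), (2, 20), (3, NULL);
INSERT INTO ds1_apple_colours VALUES (0, 'unknown'), (10, 'red');
INSERT INTO ds2_apple_colours VALUES (0, 'unknown'), (20, 'green');

-- Source 1 wins, source 2 is the fallback, 0 is the assumed default colour.
SELECT a1.id, q.Name AS colour
FROM ds1_apple AS a1
JOIN ds2_apple AS a2 ON a1.id = a2.id
JOIN (
    SELECT ColourID, Name FROM ds1_apple_colours
    UNION            -- the shared (0, 'unknown') row appears only once
    SELECT ColourID, Name FROM ds2_apple_colours
) AS q
    ON q.ColourID = COALESCE(a1.ColourID, a2.ColourID, 0)
ORDER BY a1.id;
```

Apple 1 gets red from source 1, apple 2 falls back to green from source 2, and apple 3 gets the default. If the two sources used the same ColourID for different names, UNION would keep both rows and the join would return that apple twice, so the colour IDs need to agree across sources.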
Are the two data sources supposed to represent slices of the same real population?
I.e., if full joining datasource1 with datasource2 on apple.id is logically consistent, then full joining AppleColours between the two data sources should be logically correct as well.
The one-to-many relationship then logically preserves the information from the two datasets and remains correctly one-to-many. If this join produces any cardinality violations, those weren't the right cardinalities to begin with.
(By the way, the apple join should be a FULL JOIN.)
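A minimal sketch of the full-join approach, assuming each source has an apple table and an AppleColours-style table with one row per (apple, colour) pair. All table and column names here are hypothetical, and FULL OUTER JOIN support is assumed (PostgreSQL and Synapse have it; SQLite only from version 3.39).

```sql
-- Hypothetical tables: each source has apples plus a one-to-many apple -> colour table.
CREATE TEMP TABLE src1_apple (id integer PRIMARY KEY, variety varchar(20));
CREATE TEMP TABLE src2_apple (id integer PRIMARY KEY, variety varchar(20));
CREATE TEMP TABLE src1_apple_colours (apple_id integer, colour varchar(20));
CREATE TEMP TABLE src2_apple_colours (apple_id integer, colour varchar(20));

INSERT INTO src1_apple VALUES (1, 'Gala'), (2, 'Fuji');
INSERT INTO src2_apple VALUES (2, 'Fuji'), (3, 'Braeburn');
INSERT INTO src1_apple_colours VALUES (1, 'red'), (2, 'red'), (2, 'yellow');
INSERT INTO src2_apple_colours VALUES (2, 'yellow'), (3, 'green');

-- Enriched Apple: FULL JOIN keeps apples found in only one source.
SELECT COALESCE(a1.id, a2.id)           AS id,
       COALESCE(a1.variety, a2.variety) AS variety
FROM src1_apple AS a1
FULL OUTER JOIN src2_apple AS a2 ON a1.id = a2.id
ORDER BY id;

-- Enriched AppleColours: the same kind of full join, matched on the whole
-- (apple_id, colour) pair, so a colour reported by both sources appears once.
SELECT COALESCE(c1.apple_id, c2.apple_id) AS apple_id,
       COALESCE(c1.colour, c2.colour)     AS colour
FROM src1_apple_colours AS c1
FULL OUTER JOIN src2_apple_colours AS c2
    ON c1.apple_id = c2.apple_id AND c1.colour = c2.colour
ORDER BY apple_id, colour;
```

The first query returns apples 1, 2 and 3, including the two that exist in only one source. The second returns four colour rows: the yellow reported by both sources for apple 2 is matched and appears once, and every apple_id still points to an apple in the merged table.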

SQL 2 JOINS USING SINGLE REFERENCE TABLE

I'm trying to do two joins. If I run the first join alone, it returns 4 rows, which is correct. However, when I add the second join, which queries the same reference table using values from the SELECT statement, it returns additional rows. Please see the attached screenshot; the rows in the outlined section should not be returned.
So I removed the second join to try to explain better (see pic2). I'm trying to get another column that looks up InvolvedInternalID against the original reference table IRIS.Practice.idvClient.
Your database is simply doing as you tell it. When you add the second join (confusingly aliased as tb1 in a three-table query), the database finds every matching row that satisfies the predicate (the truth statement) in the ON part of the join.
If you don't want those rows in there, then one of two things must be the case:
1) The truth you specified in the ON clause is faulty. For example, SELECT * FROM person INNER JOIN shoes ON person.age = shoes.size is faulty: two people aged 13 and two shoes of size 13 will produce 4 results, and shoe size has nothing to do with age anyway.
2) There were rows in the joined table that don't apply to the results you were looking for, but you forgot to filter them out with a WHERE clause (or an additional restriction in the ON clause). For example, a table holds historical data as well as current data, and the current record is the one with a NULL in the DeletedOn column. If you forget to say WHERE DeletedOn IS NULL, your data will multiply, as all the past rows that don't apply to your query are brought in.
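Both cases can be reproduced with a few rows. The person, shoes and address tables and their contents are hypothetical; the first query shows the 2 x 2 = 4 multiplication, the second shows the current-row filter on DeletedOn.

```sql
-- Hypothetical tables for the two cases above.
CREATE TEMP TABLE person (name varchar(20), age integer);
CREATE TEMP TABLE shoes  (model varchar(20), size integer);
INSERT INTO person VALUES ('Ann', 13), ('Bob', 13);
INSERT INTO shoes  VALUES ('Runner', 13), ('Boot', 13);

-- Case 1, faulty predicate: every 13-year-old matches every size-13 shoe.
SELECT p.name, s.model
FROM person AS p
INNER JOIN shoes AS s ON p.age = s.size;

-- Case 2, missing filter: a table holding current and historical rows.
CREATE TEMP TABLE address (person_name varchar(20), city varchar(20), DeletedOn date);
INSERT INTO address VALUES
    ('Ann', 'Leeds', '2020-01-31'),   -- historical row
    ('Ann', 'York',  NULL);           -- current row

-- Without "AND a.DeletedOn IS NULL", Ann would appear twice.
SELECT p.name, a.city
FROM person AS p
INNER JOIN address AS a
    ON a.person_name = p.name
   AND a.DeletedOn IS NULL;
```

The first query returns 4 rows; the second returns just Ann in York. Bob has no address row, so the inner join drops him.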
Don't alias tables as tbX, tbY, etc. Make the names meaningful! Aliases like tbX have no relation to the original table name, so when you encounter tbX you have to search the rest of the query for where it's declared before you can say "ah, it's the addresses table". Moreover, in this case you join idvclient twice but give the copies unhelpful aliases like tb1 and tb3, when you should have given them aliases that describe how each copy relates to the other tables in the query.
For example, ParentClient and SubClient or OriginatingClient/HandlingClient would be better names, if these tables are in some relationship with each other.
Whatever the purpose of joining this table twice is, alias each copy according to that purpose. It may make what you've done wrong easier to spot, for example "oh, of course... I'm missing a WHERE parentclient.type = 'parent'" (or WHERE handlingclient.handlingdate IS NOT NULL, etc.).
The first step to wisdom is calling things by their proper names.
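A sketch of joining one reference table twice with role-based aliases. The schema is hypothetical: idv_client stands in for IRIS.Practice.idvClient, and client_case is an assumed table whose ClientID and InvolvedInternalID columns both point to a client.

```sql
-- Hypothetical schema: each case row references two clients by ID.
CREATE TEMP TABLE idv_client (ClientID integer PRIMARY KEY, ClientName varchar(30));
CREATE TEMP TABLE client_case (CaseID             integer PRIMARY KEY,
                               ClientID           integer,
                               InvolvedInternalID integer);

INSERT INTO idv_client VALUES (1, 'Acme Ltd'), (2, 'J. Smith'), (3, 'B. Jones');
INSERT INTO client_case VALUES (100, 1, 2), (101, 1, 3);

-- The same reference table is joined twice; each alias names the role it plays.
SELECT c.CaseID,
       main_client.ClientName     AS main_client_name,
       involved_client.ClientName AS involved_client_name
FROM client_case AS c
INNER JOIN idv_client AS main_client
    ON main_client.ClientID = c.ClientID
INNER JOIN idv_client AS involved_client
    ON involved_client.ClientID = c.InvolvedInternalID
ORDER BY c.CaseID;
```

Each case comes back as exactly one row because ClientID is the primary key of idv_client, so each join matches at most one client. If InvolvedInternalID can be NULL, a LEFT OUTER JOIN for involved_client would keep those cases.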

Reproduce certain rows in PostgreSQL

I have a table with IDs that represent Individuals (call it id_table) and another table with characteristics an individual can have (call it char_table).
Now I want to add a column to id_table containing certain values from char_table. The difficulty for me is that some IDs should get only one characteristic (the simple case) and some IDs should get several. For that reason, the rows for IDs that get several characteristics must be duplicated until every characteristic has been matched to that ID.
For example:
ID '001' should get the characteristics 'a', 'b', and 'c'. So the row with ID='001' must be duplicated twice (to get the same row three times), and each of these rows should get one of the three characteristics.
I hope I explained this clearly enough.
Does anyone have an idea how to do this?
Thanks.
In almost all cases, you want to do this with an association (junction) table. This table would look like:
create table IndividualCharacteristics (
individualId int not null,
characteristicId int not null
);
(It might also have a unique id itself.)
Each characteristic an individual has would be on a separate row. So, three characteristics mean three rows. A typical query using this information would join all three tables:
select i.*, c.*
from id_table i left outer join
IndividualCharacteristics ic
on i.individualId = ic.individualId left outer join
char_table c
on c.characteristicId = ic.characteristicId
(The left outer joins include individuals with no characteristics.)
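A runnable version with hypothetical sample data. It uses varchar IDs so that '001' keeps its leading zeros (an int column would store it as 1), and the composite primary key on the junction table is an assumed addition that stops the same characteristic from being recorded twice for one individual.

```sql
-- Hypothetical data; id_table and char_table follow the question's names.
CREATE TEMP TABLE id_table (individualId varchar(3) PRIMARY KEY);
CREATE TEMP TABLE char_table (characteristicId varchar(1) PRIMARY KEY);
CREATE TEMP TABLE IndividualCharacteristics (
    individualId     varchar(3) NOT NULL REFERENCES id_table (individualId),
    characteristicId varchar(1) NOT NULL REFERENCES char_table (characteristicId),
    PRIMARY KEY (individualId, characteristicId)   -- no duplicate pairs
);

INSERT INTO id_table VALUES ('001'), ('002'), ('003');
INSERT INTO char_table VALUES ('a'), ('b'), ('c');
INSERT INTO IndividualCharacteristics VALUES
    ('001', 'a'), ('001', 'b'), ('001', 'c'),   -- three characteristics
    ('002', 'b');                               -- one characteristic; '003' has none

SELECT i.individualId, c.characteristicId
FROM id_table AS i
LEFT OUTER JOIN IndividualCharacteristics AS ic
    ON i.individualId = ic.individualId
LEFT OUTER JOIN char_table AS c
    ON c.characteristicId = ic.characteristicId
ORDER BY i.individualId, c.characteristicId;
```

Individual '001' comes back on three rows, one per characteristic, which is the row duplication the question asks for; '002' gets one row, and '003' gets a single row with a NULL characteristic.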
In Postgres, you could store the characteristics in a string or an array, but neither is very convenient for joining back to char_table. The better approach is an additional table.
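If the characteristics already sit in a PostgreSQL array (for example in imported data), unnest() can still expand them into one row per element, which is the row duplication the question describes. This is a PostgreSQL-only sketch with a hypothetical table id_with_chars; the junction table remains the cleaner design.

```sql
-- PostgreSQL only. Hypothetical table storing characteristics as a text array.
CREATE TEMP TABLE id_with_chars (individualId varchar(3), chars text[]);
INSERT INTO id_with_chars VALUES
    ('001', ARRAY['a', 'b', 'c']),
    ('002', ARRAY['b']);

-- unnest() expands each array into one row per element, so the '001'
-- row is reproduced three times, once per characteristic.
SELECT w.individualId, ch.characteristic
FROM id_with_chars AS w
CROSS JOIN LATERAL unnest(w.chars) AS ch(characteristic)
ORDER BY w.individualId, ch.characteristic;
```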

SQL join basic questions

When I have to select a number of fields from different tables:
do I always need to join tables?
which tables do I need to join?
which fields do I have to use for the join/s?
do joins affect the fields specified in the SELECT clause, or the WHERE conditions?
Thanks in advance.
Think about joins as a way of creating a new table (just for the purposes of running the query) with data from several different sources. Absent a specific example to work with, let's imagine we have a database of cars which includes these two tables:
CREATE TABLE car (plate_number CHAR(8),
state_code CHAR(2),
make VARCHAR(128),
model VARCHAR(128));
CREATE TABLE state (state_code CHAR(2),
state_name VARCHAR(128));
If you wanted, say, to get a list of the license plates of all the Hondas in the database, that information is already contained in the car table. You can simply SELECT * FROM car WHERE make='Honda';
Similarly, if you wanted a list of all the states beginning with "A" you can SELECT * FROM state WHERE state_name LIKE 'A%';
In either case, since you already have a table with the information you want, there's no need for a join.
You may even want a list of cars with Colorado plates, but if you already know that "CO" is the state code for Colorado you can SELECT * FROM car WHERE state_code='CO'; Once again, the information you need is all in one place, so there is no need for a join.
But suppose you want a list of Hondas including the name of the state where they're registered. That information is not already contained within a table in your database. You will need to "create" one via a join:
car INNER JOIN state ON (car.state_code = state.state_code)
Note that I've said absolutely nothing about what we're SELECTing. That's a separate question entirely. We also haven't applied a WHERE clause limiting what rows are included in the results. That too is a separate question. The only thing we're addressing with the join is getting data from two tables together. We now, in effect, have a new table called car INNER JOIN state with each row from car joined to each row in state that has the same state_code.
Now, from this new "table" we can apply some conditions and select some specific fields:
SELECT plate_number, make, model, state_name
FROM car
INNER JOIN state ON (car.state_code = state.state_code)
WHERE make = 'Honda'
So, to answer your questions more directly, do you always need to join tables? Yes, if you intend to select data from both of them. You cannot select fields from car that are not in the car table. You must first join in the other tables you need.
Which tables do you need to join? Whichever tables contain the data you're interested in querying.
Which fields do you have to use? Whichever fields are relevant. In this case, the relationship between cars and states is through the state_code field in both tables. I could just as easily have written
car INNER JOIN state ON (state.state_code = car.plate_number)
This would, for each car, show any states whose abbreviations happen to match the car's license plate number. This is, of course, nonsensical and likely to find no results, but as far as your database is concerned it's perfectly valid. Only you know that state_code is what's relevant.
And does the join affect SELECTed fields or WHERE conditions? Not really. You can still select whatever fields you want and you can still limit the results to whichever rows you want. There are two caveats.
First, if you have the same column name in both tables (e.g., state_code) you cannot select it without clarifying which table you want it from. In this case I might write SELECT car.state_code ...
Second, when you're using an INNER JOIN (or on many database engines just a JOIN), only rows where your join conditions are met will be returned. So in my nonsensical example of looking for a state code that matches a car's license plate, there probably won't be any states that match. No rows will be returned. So while you can still use the WHERE clause however you'd like, if you have an INNER JOIN your results may already be limited by that condition.
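To see both caveats in action, here is the car/state example with a few hypothetical rows, one of which has a state code ('ZZ') that is missing from the state table. The LEFT OUTER JOIN comparison goes slightly beyond the answer, to show what happens to the rows an INNER JOIN drops.

```sql
-- The car/state tables from above, with hypothetical sample rows.
CREATE TEMP TABLE car (plate_number CHAR(8), state_code CHAR(2),
                       make VARCHAR(128), model VARCHAR(128));
CREATE TEMP TABLE state (state_code CHAR(2), state_name VARCHAR(128));

INSERT INTO state VALUES ('CO', 'Colorado'), ('AZ', 'Arizona');
INSERT INTO car VALUES
    ('ABC1234', 'CO', 'Honda', 'Civic'),
    ('XYZ9876', 'AZ', 'Honda', 'Accord'),
    ('JKL5555', 'ZZ', 'Honda', 'Fit');     -- state code missing from state

-- INNER JOIN: the 'ZZ' car has no matching state, so it is dropped.
-- car.state_code must be qualified because both tables have that column.
SELECT car.plate_number, car.state_code, state.state_name
FROM car
INNER JOIN state ON car.state_code = state.state_code
WHERE car.make = 'Honda';

-- LEFT OUTER JOIN keeps every car; state_name is NULL where nothing matched.
SELECT car.plate_number, car.state_code, state.state_name
FROM car
LEFT OUTER JOIN state ON car.state_code = state.state_code
WHERE car.make = 'Honda';
```

The INNER JOIN returns two Hondas; the LEFT OUTER JOIN returns all three, with a NULL state_name for plate JKL5555.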
This is a very broad question, and I would suggest doing some reading on it first, but in summary:
1. Joins can make life much easier and queries faster; in a nutshell, try to...
2. The ones with the data you are looking for.
3. A field that is in both tables and is generally unique in at least one of them.
4. Yes: essentially, joins create one larger table. If two fields have the same name, you will need to reference them as tablename.columnname.
do I always need to join tables?
No; you could perform multiple SELECTs if you wished.
which tables do I need to join?
Any that you want data from (and they need to be related to each other).
which fields do I have to use for the join/s?
Any that hold the same values in the tables within the join (usually a primary key).
do joins affect the fields specified in the SELECT clause, or the WHERE conditions?
No; however, outer joins can cause surprises, for example NULLs in the columns from the side that has no matching row.
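One common way outer joins cause problems is a WHERE condition on the outer table's columns, which discards the NULL rows the outer join added. A sketch with hypothetical customer and orders tables, filtering once in WHERE and once in ON:

```sql
-- Hypothetical tables: customers and their orders.
CREATE TEMP TABLE customer (customer_id integer PRIMARY KEY, name varchar(20));
CREATE TEMP TABLE orders (order_id integer PRIMARY KEY, customer_id integer, status varchar(10));

INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 'open'), (11, 1, 'closed');   -- Bob has no orders

-- Filter in WHERE: Bob's row has status NULL, fails the test and disappears,
-- so the LEFT JOIN ends up behaving like an INNER JOIN.
SELECT c.name, o.order_id
FROM customer AS c
LEFT OUTER JOIN orders AS o ON o.customer_id = c.customer_id
WHERE o.status = 'open';

-- Filter in ON: only open orders are joined, and Bob is kept with a NULL order_id.
SELECT c.name, o.order_id
FROM customer AS c
LEFT OUTER JOIN orders AS o
    ON o.customer_id = c.customer_id
   AND o.status = 'open';
```

The first query returns only Ann's open order, exactly as an INNER JOIN would; the second returns Ann with order 10 and Bob with a NULL order_id.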
(1) What else but tables would you want to join in MySQL?
(2) Those from which you want to correlate and retrieve fields (i.e. data).
(3) Preferably join on indexed fields (unique identifiers), as this is fast. For example, to retrieve a user's email and all of that user's comments in a two-table database (tables user_settings and comments, both having a uid column that identifies the user):
select * from user_settings as uset join comments as c on uset.uid = c.uid where uset.email = 'test#stackoverflow.com';
(4) Both...
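A self-contained version of the user_settings/comments example from point (3), with hypothetical columns and rows. The index on comments.uid is an assumed addition reflecting that point: user_settings.uid is already indexed through its primary key, and the extra index lets the database look up a user's comments by uid instead of scanning the whole table.

```sql
-- Hypothetical schema and rows for the user_settings/comments example.
CREATE TEMP TABLE user_settings (
    uid   integer PRIMARY KEY,          -- primary key: indexed automatically
    email varchar(100) NOT NULL
);
CREATE TEMP TABLE comments (
    comment_id integer PRIMARY KEY,
    uid        integer NOT NULL,        -- refers to user_settings.uid
    body       varchar(200)
);
-- Index the join column on the "many" side as well.
CREATE INDEX comments_uid_idx ON comments (uid);

INSERT INTO user_settings VALUES (1, 'test@example.com'), (2, 'other@example.com');
INSERT INTO comments VALUES
    (100, 1, 'First comment'),
    (101, 1, 'Second comment'),
    (102, 2, 'Someone else''s comment');

SELECT uset.email, c.comment_id, c.body
FROM user_settings AS uset
JOIN comments AS c ON uset.uid = c.uid
WHERE uset.email = 'test@example.com'
ORDER BY c.comment_id;
```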