How to make this relation in PostgreSQL? - sql

Hello.
As shown in the ER model, I want to create a relation between "Busses" and "Chauffeurs", where every entity in "Chauffeurs" must have at least one relation in "Certified", and every entity in "Busses" must have at least one relation in "Certified".
Though it was pretty easy to design the ER model, I can't seem to find a way of making this relation in PostgreSQL. Does anybody have any ideas?
Thanks

The solution should be database agnostic. If I understand you correctly, you probably want your certified table to look like:
CERTIFIED
id
bus_id
chauffer_id
...
...

The only solution I've been able to find is the notion of a single mandatory field in the parent table to represent the "at least one" and then storing the 2 or more relationships in the intersection table.
chauffeurs
chauffeur_id
chauffer_name
certified_bus_id (not null)
certified
chauffer_id
bus_id
busses
bus_id
bus_name
certified_chauffer_id (not null)
The query to list all busses for which a chauffeur is certified becomes:
select c.chauffer_name, b.bus_name
from chauffeurs c
inner join busses b on (b.bus_id = c.certified_bus_id)
UNION
select c.chauffer_name, b.bus_name
from chauffeurs c
inner join certified ct on (c.chauffeur_id = ct.chauffer_id)
inner join busses b on (ct.bus_id = b.bus_id)
The UNION (vs UNION ALL) takes care of deduplication with the values in certified.
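A minimal sketch of the design above, run against an in-memory SQLite database as a stand-in for PostgreSQL. The sample rows ('Ann', 'Bus 1', 'Bus 2') are made up, and only the chauffeur side of the mandatory column is modelled; the mirrored certified_chauffer_id column on busses would create a circular reference that typically needs deferred constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE busses (
    bus_id   INTEGER PRIMARY KEY,
    bus_name TEXT NOT NULL
);
CREATE TABLE chauffeurs (
    chauffeur_id     INTEGER PRIMARY KEY,
    chauffer_name    TEXT NOT NULL,
    -- the mandatory "at least one" certification
    certified_bus_id INTEGER NOT NULL REFERENCES busses (bus_id)
);
CREATE TABLE certified (
    chauffer_id INTEGER NOT NULL REFERENCES chauffeurs (chauffeur_id),
    bus_id      INTEGER NOT NULL REFERENCES busses (bus_id),
    PRIMARY KEY (chauffer_id, bus_id)
);
INSERT INTO busses VALUES (1, 'Bus 1'), (2, 'Bus 2');
INSERT INTO chauffeurs VALUES (10, 'Ann', 1);
-- Bus 1 is repeated here on purpose to show the deduplication
INSERT INTO certified VALUES (10, 1), (10, 2);
""")

query = """
SELECT c.chauffer_name, b.bus_name
FROM chauffeurs c
INNER JOIN busses b ON b.bus_id = c.certified_bus_id
{op}
SELECT c.chauffer_name, b.bus_name
FROM chauffeurs c
INNER JOIN certified ct ON c.chauffeur_id = ct.chauffer_id
INNER JOIN busses b ON ct.bus_id = b.bus_id
"""

# UNION removes the row that appears in both branches; UNION ALL keeps it.
deduplicated = sorted(conn.execute(query.format(op="UNION")).fetchall())
with_duplicates = sorted(conn.execute(query.format(op="UNION ALL")).fetchall())
print(deduplicated)
print(with_duplicates)
```

With this data, the UNION version lists each bus once per chauffeur, while UNION ALL returns 'Bus 1' twice.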

Related

What is a SQL Server data structure to represent several objects chained together?

Suppose I have two tables in a SQL Server database: dbo.Companies and dbo.Contracts.
Suppose that I want to represent a one-to-one relationship between companies and contracts to indicate that a contract has been awarded to a company. I would simply create a join table with a foreign key to dbo.Companies and dbo.Contracts.
The above scenario is easy for any experienced SQL developer.
Suppose then, that I want to do something similar, but each company to which a contract is awarded can subcontract out to another company, which can subcontract out further.
I might have a company called "Acme", which might subcontract to "Evil Geniuses", which might subcontract to "Evil on a Budget".
Or then again, "Acme" might subcontract to "Evil Geniuses", which subcontracts to "Evil on a Budget", but "Acme" might simultaneously subcontract another portion of the same contract to "Falling Anvils".
One must bear in mind that a company may be involved in many, many different contracts at different levels, or there may be companies in the system that are awarded no contracts at all.
What sort of data structure allows me to describe these relationships in SQL?
Edit: I am using MS SQL Server.
The same structure you would use for a one-to-one (or one-to-many) relationship is sufficient.
SQL allows for recursive traversal of such relationships.
I have not tested the following query, but it should give you a basic idea.
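-- SQL Server uses plain WITH; it does not accept the RECURSIVE keyword.
WITH ContractHolders AS (
-- anchor: contracts held directly by Acme
SELECT m.id, m.name, n.ContractRecipientId
FROM dbo.Companies AS m
JOIN dbo.Contracts AS n ON m.id = n.ContractHolderId
WHERE m.name = 'Acme'
UNION ALL
-- recursive step: contracts held by the previous level's recipients
-- (both branches must return the same columns)
SELECT m.id, m.name, n.ContractRecipientId
FROM ContractHolders AS h
JOIN dbo.Companies AS m ON m.id = h.ContractRecipientId
JOIN dbo.Contracts AS n ON m.id = n.ContractHolderId
)
SELECT * FROM ContractHolders;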
I guess you use Microsoft SQL Server based on your use of dbo. Read Recursive Queries Using Common Table Expressions for more information.
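As a runnable illustration of the recursive traversal, here is a sketch against an in-memory SQLite database (SQLite, unlike SQL Server, does use the WITH RECURSIVE spelling). The table layout is simplified to one row per subcontract, and the depth column is an addition for illustration; the company names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contracts (
    ContractHolderId    INTEGER NOT NULL REFERENCES companies (id),
    ContractRecipientId INTEGER NOT NULL REFERENCES companies (id)
);
INSERT INTO companies VALUES
    (1, 'Acme'), (2, 'Evil Geniuses'),
    (3, 'Evil on a Budget'), (4, 'Falling Anvils');
-- Acme -> Evil Geniuses -> Evil on a Budget, and Acme -> Falling Anvils
INSERT INTO contracts VALUES (1, 2), (2, 3), (1, 4);
""")

chain = conn.execute("""
WITH RECURSIVE chain (holder_id, recipient_id, depth) AS (
    -- anchor: subcontracts handed out directly by the starting company
    SELECT c.ContractHolderId, c.ContractRecipientId, 1
    FROM contracts c
    JOIN companies m ON m.id = c.ContractHolderId
    WHERE m.name = ?
    UNION ALL
    -- recursive step: subcontracts handed out by the previous recipients
    SELECT c.ContractHolderId, c.ContractRecipientId, chain.depth + 1
    FROM chain
    JOIN contracts c ON c.ContractHolderId = chain.recipient_id
)
SELECT h.name, r.name, chain.depth
FROM chain
JOIN companies h ON h.id = chain.holder_id
JOIN companies r ON r.id = chain.recipient_id
ORDER BY chain.depth, r.name
""", ("Acme",)).fetchall()

for holder, recipient, depth in chain:
    print(depth, holder, "->", recipient)
```

Note that this sketch assumes the subcontract graph has no cycles; a cycle would make the recursion run until the engine's limit is hit.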
I would use the contracts table like:
contract_id primary key,
company_id foreign key (companies.company_id),
parent_id foreign key (contracts.contract_id),
*** contract_level,
*** contract details
contract_level is optional here, depending on the business requirements. The largest drawback of this design is that you cannot find the top-level owner of a contract in a single query. This can be solved by adding a column such as contract_family that holds the contract_id of the top-level contract. The query to view the whole chain then looks like:
select * from contracts
left join companies on contracts.company_id = companies.company_id
where contract_family = ...
order by contract_level
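To make the contract_family idea concrete, here is a small sketch using an in-memory SQLite database. The column types and the sample rows are assumptions; the column names follow the outline above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contracts (
    contract_id     INTEGER PRIMARY KEY,
    company_id      INTEGER NOT NULL REFERENCES companies (company_id),
    parent_id       INTEGER REFERENCES contracts (contract_id),
    contract_level  INTEGER NOT NULL,
    contract_family INTEGER NOT NULL  -- contract_id of the top-level contract
);
INSERT INTO companies VALUES
    (1, 'Acme'), (2, 'Evil Geniuses'),
    (3, 'Evil on a Budget'), (4, 'Falling Anvils');
INSERT INTO contracts VALUES
    (1, 1, NULL, 0, 1),  -- Acme holds the top-level contract
    (2, 2, 1,    1, 1),  -- Evil Geniuses subcontracts from Acme
    (3, 3, 2,    2, 1),  -- Evil on a Budget subcontracts from Evil Geniuses
    (4, 4, 1,    1, 1),  -- Falling Anvils subcontracts from Acme
    (5, 1, NULL, 0, 5);  -- an unrelated second contract
""")

# One non-recursive query returns the whole family, top level first.
family = conn.execute("""
SELECT companies.name, contracts.contract_level
FROM contracts
LEFT JOIN companies ON contracts.company_id = companies.company_id
WHERE contract_family = ?
ORDER BY contract_level, contracts.contract_id
""", (1,)).fetchall()
print(family)
```

The trade-off is that contract_family and contract_level are redundant with parent_id, so the application has to keep them consistent when rows are inserted or moved.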

Subquery that matches column with several ranges defined in table

I've got a pretty common setup for an address database: a person is tied to a company with a join table, the company can have an address and so forth.
All pretty normalized and easy to use. But for search performance, I'm creating a materialized, rather denormalized view. I only need a very limited set of information and quick queries. Most of everything that's usually done via a join table is now in an array. Depending on the query, I can either search it directly or join it via unnest.
As a complement to my zipcodes column (varchar[]), I'd like to add a states column that has the (German federal) states already precomputed, so that I don't have to transform a query to include all kinds of range comparisons.
My mapping data is in a table like this:
CREATE TABLE zip2state (
state TEXT NOT NULL,
range_start CHARACTER VARYING(5) NOT NULL,
range_end CHARACTER VARYING(5) NOT NULL
)
Each state has several ranges, and ranges can overlap (one zip code can be for two different states). Some ranges have range_start = range_end.
Now I'm a bit at wit's end on how to get that into a materialized view all at once. Normally, I'd feel tempted to just do it iteratively (via trigger or on the application level).
Or as we're just talking about 5 digits, I could create a big table mapping zip to state directly instead of doing it via a range (my current favorite, yet something ugly enough that it prompted me to ask whether there's a better way)
Any way to do that in SQL, with a table like the above (or something similar)? I'm at postgres 9.3, all features allowed...
For completeness' sake, here's the subquery for the zip codes:
(select array_agg(distinct address.zipcode)
from affiliation
join company
on affiliation.ins_id = company.id
join address
on address.com_id = company.id
where affiliation.per_id = person.id) AS zipcodes,
I suggest a LATERAL join instead of the correlated subquery to conveniently compute both columns at once. Could look like this:
SELECT p.*, z.*
FROM person p
LEFT JOIN LATERAL (
SELECT array_agg(DISTINCT d.zipcode) AS zipcodes
, array_agg(DISTINCT z.state) AS states
FROM affiliation a
-- JOIN company c ON a.ins_id = c.id -- suspect you don't need this
JOIN address d ON d.com_id = a.ins_id -- c.id
LEFT JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
WHERE a.per_id = p.id
) z ON true;
If referential integrity is guaranteed, you don't need to join to the table company at all. I took the shortcut.
Be aware that varchar or text behaves differently than expected for numbers. For example: '333' > '0999'. If all zip codes have 5 digits you are fine.
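The range join and the string-comparison caveat can be checked with a small sketch. It uses an in-memory SQLite database rather than PostgreSQL, so it shows only the BETWEEN join and not LATERAL or array_agg; the states and ranges are made-up placeholders, not real German zip ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zip2state (
    state       TEXT NOT NULL,
    range_start VARCHAR(5) NOT NULL,
    range_end   VARCHAR(5) NOT NULL
);
CREATE TABLE address (zipcode VARCHAR(5) NOT NULL);
-- overlapping placeholder ranges: 14000-14999 belongs to both states
INSERT INTO zip2state VALUES
    ('State A', '10000', '14999'),
    ('State B', '14000', '16999');
INSERT INTO address VALUES ('14100'), ('20000');
""")

# LEFT JOIN keeps zip codes that fall into no range (state is NULL).
matches = conn.execute("""
SELECT d.zipcode, z.state
FROM address d
LEFT JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
ORDER BY d.zipcode, z.state
""").fetchall()
print(matches)

# Text is compared character by character, so '333' sorts after '0999'.
text_compare = conn.execute("SELECT '333' > '0999'").fetchone()[0]

# An unpadded code misses its range; zero-padding to 5 digits fixes it.
in_range = "SELECT ? BETWEEN '00000' AND '09999'"
unpadded = conn.execute(in_range, ("333",)).fetchone()[0]
padded = conn.execute(in_range, ("333".zfill(5),)).fetchone()[0]
print(text_compare, unpadded, padded)
```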
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

SQL Multiple Joins - How do they work exactly?

I'm pretty sure this works universally across various SQL implementations. Suppose I have many-to-many relationship between 2 tables:
Customer: id, name
has many:
Order: id, description, total_price
and this relationship is in a junction table:
Customer_Order: order_date, customer_id, order_id
Now I want to write SQL query to join all of these together, mentioning the customer's name, the order's description and total price and the order date:
SELECT name, description, total_price, order_date FROM Customer
JOIN Customer_Order ON Customer_Order.customer_id = Customer.id
JOIN "Order" ON "Order".id = Customer_Order.order_id
This is all well and good. This query will also work if we change the order so it's FROM Customer_Order JOIN Customer or put the Order table first. Why is this the case? Somewhere I've read that JOIN works like an arithmetic operator (+, * etc.) taking 2 operands and you can chain operator together so you can have: 2+3+5, for example. Following this logic, first we have to calculate 2+3 and then take that result and add 5 to it. Is it the same with JOINs?
Is it that under the hood, the first JOIN must first be completed in order for the second JOIN to take place? So basically, the first JOIN will create a table out of the 2 operands left and right of it. Then, the second JOIN will take that resulting table as its left operand and perform the usual joining. Basically, I want to understand how multiple JOINs work under the hood.
In many ways I think ORMs are the bane of modern programming, unleashing a barrage of underprepared coders. Diatribe out of the way: you're asking a question about set theory. There are potentially other approaches that center on relational algebra, but SQL is fundamentally based on set theory. Here are a couple of links to get you started:
Using set theory to understand SQL
A visual explanation of SQL
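Logically, a chain of joins can be read left to right: the first JOIN produces an intermediate row set, and the next JOIN takes that as its left operand. Because inner joins are commutative and associative, every ordering yields the same rows, which is why the database is free to pick the execution order. A small sketch with an in-memory SQLite database and made-up sample rows shows the three orderings returning identical results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- ORDER is a reserved word, so the table name is quoted
CREATE TABLE "Order" (
    id INTEGER PRIMARY KEY, description TEXT NOT NULL, total_price REAL NOT NULL
);
CREATE TABLE Customer_Order (
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES Customer (id),
    order_id    INTEGER NOT NULL REFERENCES "Order" (id)
);
INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO "Order" VALUES (1, 'Books', 20.0), (2, 'Pens', 5.0);
INSERT INTO Customer_Order VALUES
    ('2024-01-01', 1, 1), ('2024-01-02', 1, 2), ('2024-01-03', 2, 2);
""")

columns = "SELECT name, description, total_price, order_date "
orderings = [
    # Customer first
    """FROM Customer
       JOIN Customer_Order ON Customer_Order.customer_id = Customer.id
       JOIN "Order" ON "Order".id = Customer_Order.order_id""",
    # junction table first
    """FROM Customer_Order
       JOIN Customer ON Customer.id = Customer_Order.customer_id
       JOIN "Order" ON "Order".id = Customer_Order.order_id""",
    # Order first
    """FROM "Order"
       JOIN Customer_Order ON Customer_Order.order_id = "Order".id
       JOIN Customer ON Customer.id = Customer_Order.customer_id""",
]

# Sort each result so that row order does not affect the comparison.
results = [sorted(conn.execute(columns + o).fetchall()) for o in orderings]
print(results[0])
print(results[0] == results[1] == results[2])
```

This equivalence holds for inner joins; with outer joins, changing the order can change the result.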

Modeling Existential Facts in a Relational Database

I need a way to represent existential relations in a database. For instance I have a bio-historical table (i.e. a family tree) that stores a parent id and a child id which are foreign keys to a people table. This table is used to describe arbitrary family relationships. Thus I’d like to be able to say that X and Y are siblings without having to know exactly who the parents of X and Y are. I just want to be able to say that there exist two different people A and B such that A and B are each parents of X and Y. Once I do know who A and/or B are I’d need to be able to reconcile them.
The simplest solution I can think of is to store existential people with negative integer user ids. Once I know who the people are, I’d need to cascade update all of the IDs. Are there any well-known techniques for this?
Does existential mean "non-existent"?
They don't have to be negative. You could just add a record to People table with no last/first name and perhaps a flag "unknown person". Or existential if you like.
Then when you know something (e.g. like last name but not first) you update this record.
Reconciling duplicate people could be more difficult. I guess you could just update FamilyTree set parent_id=new_id where parent_id=old_id, etc. But this means for instance that the same person could end up with too many parents, so you'll need to perform a number of complex checks before doing that.
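The placeholder-and-merge idea might look roughly like the sketch below, using an in-memory SQLite database. The is_unknown flag, the column names, and the merge_person helper are hypothetical, and the sketch deliberately skips the consistency checks mentioned above (for example, that nobody ends up with too many parents).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (
    person_id  INTEGER PRIMARY KEY,
    name       TEXT,                        -- NULL while the person is unknown
    is_unknown INTEGER NOT NULL DEFAULT 0   -- hypothetical placeholder flag
);
CREATE TABLE FamilyTree (
    parent_id INTEGER NOT NULL REFERENCES People (person_id),
    child_id  INTEGER NOT NULL REFERENCES People (person_id)
);
INSERT INTO People VALUES (1, 'X', 0), (2, 'Y', 0), (3, NULL, 1), (4, 'Alice', 0);
-- X and Y share an unknown parent (person 3)
INSERT INTO FamilyTree VALUES (3, 1), (3, 2);
""")


def merge_person(conn, old_id, new_id):
    """Repoint every FamilyTree reference from old_id to new_id, then drop old_id."""
    with conn:  # one transaction: all three statements commit or roll back together
        conn.execute(
            "UPDATE FamilyTree SET parent_id = ? WHERE parent_id = ?",
            (new_id, old_id),
        )
        conn.execute(
            "UPDATE FamilyTree SET child_id = ? WHERE child_id = ?",
            (new_id, old_id),
        )
        conn.execute("DELETE FROM People WHERE person_id = ?", (old_id,))


# The unknown parent turns out to be Alice (person 4).
merge_person(conn, old_id=3, new_id=4)
family = sorted(conn.execute("SELECT parent_id, child_id FROM FamilyTree").fetchall())
remaining = conn.execute("SELECT COUNT(*) FROM People").fetchone()[0]
print(family, remaining)
```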
I would document only the known relationships in a link table which links your Person table to itself with:
FK Person1ID
FK Person2ID
RelationshipTypeID (Sibling, Father, Mother, Step-Father, Step-Mother, etc.)
With some appropriate constraints on that table (or multiple tables, one for each relationship type if that makes the constraints more logical)
Then, when another relationship can be inferred from the known ones (by running an exception query) but is missing, create it. Be careful with the inference, though: half-siblings share only one parent.
For instance, people who are siblings who don't have all their parents identified:
SELECT *
FROM People p1
INNER JOIN Relationship r_sibling
ON r_sibling.Person1ID = p1.PersonID
AND r_sibling.RelationshipTypeID = SIBLING_TYPE_CONSTANT
INNER JOIN People p2
ON r_sibling.Person2ID = p2.PersonID
WHERE EXISTS (
-- p1 has a father
SELECT *
FROM Relationship r_father
WHERE r_father.RelationshipTypeID = FATHER_TYPE_CONSTANT
AND r_father.Person2ID = p1.PersonID
)
AND NOT EXISTS (
-- p2 (p1's sibling) doesn't have a father yet
SELECT *
FROM Relationship r_father
WHERE r_father.RelationshipTypeID = FATHER_TYPE_CONSTANT
AND r_father.Person2ID = p2.PersonID
)
You might need to UNION the reverse of this query depending on how you want your relationships constrained (siblings are always commutative, unlike other relationships) and then handle mothers similarly.
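A runnable version of the exception query, as a sketch against an in-memory SQLite database. The numeric type ids and the sample people are made up, and the convention that Person1ID is the father and Person2ID the child is read off the query above.

```python
import sqlite3

SIBLING, FATHER = 1, 2  # hypothetical RelationshipTypeID values

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (PersonID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Relationship (
    Person1ID          INTEGER NOT NULL REFERENCES People (PersonID),
    Person2ID          INTEGER NOT NULL REFERENCES People (PersonID),
    RelationshipTypeID INTEGER NOT NULL
);
INSERT INTO People VALUES (1, 'Dad'), (2, 'X'), (3, 'Y');
-- X and Y are siblings (type 1); Dad is recorded as father (type 2) of X only
INSERT INTO Relationship VALUES (2, 3, 1), (1, 2, 2);
""")

# Sibling pairs where the first sibling has a father but the second does not.
missing_father = conn.execute("""
SELECT p1.Name, p2.Name
FROM People p1
INNER JOIN Relationship r_sibling
    ON r_sibling.Person1ID = p1.PersonID
   AND r_sibling.RelationshipTypeID = :sibling
INNER JOIN People p2 ON r_sibling.Person2ID = p2.PersonID
WHERE EXISTS (
    SELECT 1 FROM Relationship r_father
    WHERE r_father.RelationshipTypeID = :father
      AND r_father.Person2ID = p1.PersonID)
AND NOT EXISTS (
    SELECT 1 FROM Relationship r_father
    WHERE r_father.RelationshipTypeID = :father
      AND r_father.Person2ID = p2.PersonID)
""", {"sibling": SIBLING, "father": FATHER}).fetchall()
print(missing_father)
```

Each returned pair is a candidate for a new father relationship on the second person, subject to the half-sibling caveat.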
Hmmm, come to think of it, I guess I need a general way to reconcile duplicate people anyway and I can use it for this purpose. Thoughts?

Weird many to many and one to many relationship

I know I'm gonna get down votes, but I have to make sure if this is logical or not.
I have three tables A, B, C. B is a table used to make a many-many relationship between A and C. But the thing is that A and C are also related directly in a 1-many relationship
A customer added the following requirement:
Obtain the information from the Table B inner joining with A and C, and in the same query relate A and C in a one-many relationship
Something like this (diagram: http://img247.imageshack.us/img247/7371/74492374sa4.png):
I tried doing the query but always got 0 rows back. The customer insists that I can accomplish the requirement, but I doubt it. Any comments?
PS. I didn't have a more descriptive title, any ideas?
UPDATE:
Thanks to rcar: in some cases this can be logical, for example to keep a history of all the classes a student has taken (supposing the student can only take one class at a time).
UPDATE:
There is a table for Contacts, a table with the Information of each Contact, and the Relationship table. To get the information of a Contact I have to make a 1:1 relationship with Information, and each contact can also have something like an address book; this is why the many-to-many relationship is implemented.
The full idea is to obtain the contact's name and his address book.
Now that I understand the customer's idea, I'm having trouble with the query. Basically I am trying to use the query that jdecuyper wrote but, as he warns, I get no data back.
This is a doable scenario. You can join a table twice in a query, usually assigning it a different alias to keep things straight.
For example:
SELECT s.name AS "student name", c1.className AS "student class", c2.className as "class list"
FROM s
JOIN many_to_many mtm ON s.id_student = mtm.id_student
JOIN c c1 ON s.id_class = c1.id_class
JOIN c c2 ON mtm.id_class = c2.id_class
This will give you a list of all students' names and "hardcoded" classes with all their classes from the many_to_many table.
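A quick way to try the double join is an in-memory SQLite database; the sketch below assumes column types and uses made-up sample rows, with the table and column names taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c (id_class INTEGER PRIMARY KEY, className TEXT NOT NULL);
CREATE TABLE s (
    id_student INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    id_class   INTEGER NOT NULL REFERENCES c (id_class)  -- the 1-many link
);
CREATE TABLE many_to_many (
    id_student INTEGER NOT NULL REFERENCES s (id_student),
    id_class   INTEGER NOT NULL REFERENCES c (id_class)
);
INSERT INTO c VALUES (10, 'Math'), (20, 'Art');
INSERT INTO s VALUES (1, 'Ann', 10);
INSERT INTO many_to_many VALUES (1, 10), (1, 20);
""")

# c is joined twice: c1 follows s.id_class, c2 follows the junction table.
rows = sorted(conn.execute("""
SELECT s.name AS "student name",
       c1.className AS "student class",
       c2.className AS "class list"
FROM s
JOIN many_to_many mtm ON s.id_student = mtm.id_student
JOIN c c1 ON s.id_class = c1.id_class
JOIN c c2 ON mtm.id_class = c2.id_class
""").fetchall())
print(rows)
```

If a query of this shape returns no rows, a likely cause is that one of the inner joins finds no match, for example a student with no rows in many_to_many.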
That said, this schema doesn't make logical sense. From what I can gather, you want students to be able to have multiple classes, so the many_to_many table should be where you'd want to find the classes associated with a student. If the id_class entries used in table s are distinct from those in many_to_many (e.g., if s.id_class refers to, say, homeroom class assignments that only appear in that table while many_to_many.id_class refers to classes for credit and excludes homeroom classes), you're going to be better off splitting c into two tables instead.
If that's not the case, I have a hard time understanding why you'd want one class hardwired to the s table.
EDIT: Just saw your comment that this was a made-up schema to give an example. In other cases, this could be a sensible way to do things. For example, if you wanted to keep track of company locations, you might have a Company table, a Locations table, and a Countries table. The Company table might have a 1-many link to Countries where you would keep track of a company's headquarters country, but a many-to-many link through Locations where you keep track of every place the company has a store.
If you can give real information as to what the schema really represents for your client, it might be easier for us to figure out whether it's logical in this case or not.
Perhaps it's a lack of caffeine, but I can't conceive of a legitimate reason for wanting to do this. In the example you gave, you've got students, classes and a table which relates the two. If you think about what you want the query to do, in plain English, surely it has to be driven by either the student table or the class table. i.e.
select all the classes which are attended by student 1245235
select all the students which attend class 101
Can you explain the requirement better? If not, tell your customer to suck it up. Having a relationship between Students and Classes directly (A and C), seems like pure madness, you've already got table B which does that...
Bear in mind that the one-to-many relationship can be represented through the many-to-many, most simply by adding a field there to indicate the type of relationship. Then you could have one "current" record and any number of "history" ones.
Was the customer "requirement" phrased as given, by the way? I think I'd be looking to redefine my relationship with them if so: they should be telling me "what" they want (ideally what, in business domain language, their problem is) and leaving the "how" to me. If they know exactly how the thing should be implemented, then I'd be inclined to open the source code in an editor and leave them to it!
I'm supposing that s.id_class indicates the student's current class, as opposed to classes she has taken in the past.
The solution shown by rcar works, but it repeats the c1.className on every row.
Here's an alternative that doesn't repeat information and it uses one fewer join. You can use an expression to compare s.id_class to the current c.id_class matched via the mtm table.
SELECT s.name, c.className, (s.id_class = c.id_class) AS is_current
FROM s JOIN many_to_many AS mtm ON (s.id_student = mtm.id_student)
JOIN c ON (c.id_class = mtm.id_class);
So is_current will be 1 (true) on one row, and 0 (false) on all the other rows. Or you can output something more informative using a CASE construct:
SELECT s.name, c.className,
CASE WHEN s.id_class = c.id_class THEN 'current' ELSE 'past' END AS is_current
FROM s JOIN many_to_many AS mtm ON (s.id_student = mtm.id_student)
JOIN c ON (c.id_class = mtm.id_class);
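Both variants can be tried with a small sketch against an in-memory SQLite database, where a comparison evaluates to 1 or 0. Column types and sample rows are assumptions; the second column alias is renamed to status here only so the two results can share one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c (id_class INTEGER PRIMARY KEY, className TEXT NOT NULL);
CREATE TABLE s (
    id_student INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    id_class   INTEGER NOT NULL REFERENCES c (id_class)  -- current class
);
CREATE TABLE many_to_many (
    id_student INTEGER NOT NULL REFERENCES s (id_student),
    id_class   INTEGER NOT NULL REFERENCES c (id_class)
);
INSERT INTO c VALUES (10, 'Math'), (20, 'Art');
INSERT INTO s VALUES (1, 'Ann', 10);
INSERT INTO many_to_many VALUES (1, 10), (1, 20);
""")

# One join to c; the flag marks which junction row matches s.id_class.
rows = conn.execute("""
SELECT s.name,
       c.className,
       (s.id_class = c.id_class) AS is_current,
       CASE WHEN s.id_class = c.id_class THEN 'current' ELSE 'past' END AS status
FROM s
JOIN many_to_many AS mtm ON s.id_student = mtm.id_student
JOIN c ON c.id_class = mtm.id_class
ORDER BY c.className
""").fetchall()
print(rows)
```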
It doesn't seem to make sense. A query like:
SELECT * FROM relAC RAC
INNER JOIN tableA A ON A.id_class = RAC.id_class
INNER JOIN tableC C ON C.id_class = RAC.id_class
WHERE A.id_class = B.id_class
could return a result set, but an inconsistent one. Or maybe we are missing some important information about the content and the relationships of those 3 tables.
I personally never heard a requirement from a customer that would sound like:
Obtain the information from the Table B inner joining with A and C, and in the same query relate A and C in a one-many relationship
It looks like that is your own translation of the requirement.
Could you specify the requirement in plain English, as what results your customer wants to get?