Group by non-scalar value - sql

Given a one-to-many relationship between Person and Item
Person            Item
-------           ------
Id      <-----.   Id
Name          `-- PersonId
                  Label
Where there are many people and Item.Label takes few distinct values, it might make sense to adopt an equivalent schema:
Person         List         Item
--------       ------       ------
Id         .-> Id <--.      Id
ListId   --`         `--    ListId
Name                        Label
That way many people can share the same list.
The migration from the second schema to the first is trivial. My question is: how do I migrate from the first schema to the second?
The challenge is to pick exactly one representative Person for each possible outcome of
SELECT Label FROM Item WHERE PersonId = ?
I was able to solve the problem by using FOR XML present in MS SQL server. That is,
SELECT P.Id, (SELECT Label FROM Item WHERE PersonId = P.Id ORDER BY Label FOR XML PATH('')) list
FROM Person P
and then simply SELECT MIN(P.Id) FROM ... GROUP BY list to collect representatives. I'm unsatisfied with this workaround though and wish to find a more pure solution.
edit:
SELECT p.Id, q.Id FROM Person p, Person q
WHERE NOT EXISTS ( --symmetric difference between
(SELECT Label FROM Item WHERE PersonId = p.Id) --and
(SELECT Label FROM Item WHERE PersonId = q.id))
Should be the equivalence relation of Persons, for which representatives need to be found. I still wouldn't know how to finish, and this does seem rather inefficient.
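One possible way to finish this, as a minimal sketch rather than a definitive answer: treat two people as equivalent when neither has a label the other lacks, and take the smallest Id in each class as the representative. The query is run through Python's sqlite3 module only so that the example is self-contained; the sample rows and the RepId alias are invented for illustration, and the pairwise comparison is still quadratic in the number of people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Item (Id INTEGER PRIMARY KEY, PersonId INTEGER, Label TEXT);
INSERT INTO Person (Id, Name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee');
-- Persons 1 and 2 share the label set {A, B}; person 4 has no items at all.
INSERT INTO Item (PersonId, Label) VALUES
    (1, 'A'), (1, 'B'), (2, 'B'), (2, 'A'), (3, 'A');
""")

REPRESENTATIVES = """
SELECT p.Id, MIN(q.Id) AS RepId
FROM Person p
JOIN Person q
  ON NOT EXISTS (  -- no label of p is missing from q
        SELECT 1 FROM Item a
        WHERE a.PersonId = p.Id
          AND NOT EXISTS (SELECT 1 FROM Item b
                          WHERE b.PersonId = q.Id AND b.Label = a.Label))
 AND NOT EXISTS (  -- no label of q is missing from p
        SELECT 1 FROM Item a
        WHERE a.PersonId = q.Id
          AND NOT EXISTS (SELECT 1 FROM Item b
                          WHERE b.PersonId = p.Id AND b.Label = a.Label))
GROUP BY p.Id
ORDER BY p.Id
"""

# Each row pairs a person with the representative of their label set.
rows = conn.execute(REPRESENTATIVES).fetchall()
print(rows)
```

Each distinct RepId could then become one row of List, with the representative's items copied over as that list's items.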

It depends! I suggest you keep your model aligned with your business logic.
If people own pre-made sets of items, it makes sense to create a table to hold that logic.
Suppose people can own just a "home edition", a "pro edition" or a "std edition".
It then makes sense to create a relational table Edition_Items, so that the editions can contain the items (A,B), (A,B,C,D) and (A,C), for example.
You can also add a relational table between People and the Edition each person owns. In your scenario, if those editions are "customized" editions, then even if two of them contain the same set of items you can consider them different sets (just because they are owned by different people).
So that "Assembled Set" table can be used as a relational table between people and items.
Edit:
The OP's comment reinforces my last statement.
So your "List" table can be a relational table between People and Items.
|People | |List| |List_Item| |Item|
|-------| |----| |---------| |----|
|P1, L1 | | L1 | | L1, I1  | |I1  |
|P2, L2 | | L2 | | L1, I2  | |I2  |
|       | | L3 | | L2, I1  | |I3  |
|       | | L4 | | L2, I1  |
Seeing this, you may ask: why keep a List table? It is useful if a List has properties of its own, such as isDeleted, Description, CreateTime, etc.
And the final question is: do we put a reference to the list on People, a reference to the person on the List, or create another relational table?
It depends on:
1) Is People-List a 1-1 relation?
2) Which comes first? (chicken-and-egg problem?)
It is usually better to ask: which one can exist without the other?

Related

Language dependent column headers

I am working on a PostgreSQL based application and am very curious whether there is a clever solution for language-dependent column headers.
I know that I can set an alias for a header with the "as" keyword, but that obviously has to be done for every select, over and over again.
So I have a table for converting the technical column name to a mnemonic one, to be shown to the user.
I can handle the mapping in the application, but would prefer a database solution. Is there any?
At least could I set the column header to table.column?
You could use a "view". You can think of a view as a pseudo-table defined by a query over one or more tables. For instance, if I have a table that has the following shape
Table: Pets
Id | Name | OwnerId | AnimalType
1 | Frank| 1 | 1
2 | Jim | 1 | 2
3 | Bobo | 2 | 1
I could create a "view" that changes the Name field to look like PetName instead without changing the table
CREATE VIEW PetView AS
SELECT Id, Name as PetName, OwnerId, AnimalType
FROM Pets
Then I can use the view just like any other table
SELECT PetName
FROM PetView
WHERE AnimalType = 1
Further we could combine another table as well into the view. For instance if we add another table to our DB for Owners then we could create a view that automatically joins the two tables together before subjecting to other queries
Table: Owners
Id | Name
1 | Susan
2 | Ravi
CREATE VIEW PetsAndOwners AS
SELECT p.Id, p.Name as PetName, o.Name as OwnerName, p.AnimalType
FROM Pets p, Owners o
WHERE p.OwnerId = o.Id
Now we can query the new view just like any other table. (Note that PostgreSQL cannot insert into or delete from a join view like this one unless you add rules or INSTEAD OF triggers.)
SELECT * FROM PetsAndOwners
WHERE OwnerName = 'Susan'
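For the language-dependent headers in the question, the same idea could be applied once per language. The sketch below is only an assumption about how that might look: the view names pets_en and pets_de and the German headers are invented, and SQLite (through Python's sqlite3 module) stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pets (Id INTEGER PRIMARY KEY, Name TEXT, OwnerId INTEGER, AnimalType INTEGER);
INSERT INTO Pets VALUES (1, 'Frank', 1, 1), (2, 'Jim', 1, 2), (3, 'Bobo', 2, 1);

-- One view per language: the aliases are written once, not in every SELECT.
CREATE VIEW pets_en AS
    SELECT Id, Name AS "Pet name", OwnerId AS "Owner" FROM Pets;
CREATE VIEW pets_de AS
    SELECT Id, Name AS "Tiername", OwnerId AS "Besitzer" FROM Pets;
""")

def headers(view_name):
    """Return the column headers a client sees for SELECT * on the given view."""
    # view_name is interpolated into the SQL, so only pass trusted, hard-coded names.
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [column[0] for column in cursor.description]

english_headers = headers("pets_en")
german_headers = headers("pets_de")
print(english_headers, german_headers)
```

The application would then only have to pick the view that matches the user's language; whether that is nicer than mapping the names in the application is a judgment call.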

Query Parent and Children from single table

I currently have a single table that hosts all of my users. Now some users have team_leaders which reference the user id of the team leader which is also stored in the database.
Now, what I want to do (and can't figure out) is query the database so that it retrieves the ids of all the team members and the leader in one result set.
For Example
name | id | team_leader
--------------------------------------------------
Jack | 1 | null
--------------------------------------------------
Susan| 2 | 1
--------------------------------------------------
Bob | 3 | 1
--------------------------------------------------
Eric | 4 | null
--------------------------------------------------
SELECT name FROM users where team_leader = '<some user's id>'
returns ['Susan', 'Bob']
But I would like it to return the team leader included, such as
['Jack', 'Susan', 'Bob']
Does anyone have any idea how to include the team leader in the query results?
EDIT:
Okay, so it seems like I have not explained myself 100%, my apologies. so the goal of this query is to do as follows.
I have another table called leads and there is a field there that is called user_id which correlates to the user that has access to the lead. Now, I want to introduce the ability for team leaders to update the leads that are associated with their accounts, so if the current user is a team leader they should have the ability to update the user_id from their id to anyone on their team, from one of their children to another, and from one of the children to themselves, but not to anyone not on their team. So the way I thought of it was to have a WHERE EXISTS or a WHERE IN (this would mean adding a field to the lead table called leader_id) and it checks if the new user_id is in a list of that team leader's members, including themselves.
Based off the example above.
UPDATE lead SET user_id = xxx
WHERE lead.id = yyy
AND ...
-- here is where I would check that the user_id xxx is part of the current
-- user's team which must be a team leader, for example user.id = 1
So my thought process was to get the previous query to then check against.
Hope this clears things up.
If I'm understanding correctly, you can just use or:
select name
from users
where team_leader = 1 or id = 1
WITH CTE AS (
-- anchor: start from the team leader so the leader is part of the result
SELECT name, id, team_leader FROM [users]
WHERE id = 1
UNION ALL
-- recursive step: add every user who reports to a row already found
SELECT u.name, u.id, u.team_leader FROM [users] u
JOIN CTE ON CTE.id = u.team_leader
)
SELECT * FROM CTE
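Applied to the UPDATE from the question, the same "leader or member" condition can serve as the guard. This is a minimal sketch under a few assumptions: the table is called leads (the question uses both lead and leads), the helper reassign_lead and the sample lead row are made up, and SQLite via Python's sqlite3 is used only to keep the example runnable on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, team_leader INTEGER);
CREATE TABLE leads (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users VALUES (1, 'Jack', NULL), (2, 'Susan', 1), (3, 'Bob', 1), (4, 'Eric', NULL);
INSERT INTO leads VALUES (10, 2);  -- lead 10 currently belongs to Susan
""")

GUARDED_UPDATE = """
UPDATE leads
SET user_id = :new_user
WHERE id = :lead_id
  -- the lead must currently belong to the leader or to one of the team members
  AND user_id IN (SELECT id FROM users WHERE id = :leader OR team_leader = :leader)
  -- and the new owner must be on that same team
  AND :new_user IN (SELECT id FROM users WHERE id = :leader OR team_leader = :leader)
"""

def reassign_lead(lead_id, new_user, leader):
    """Return True if the lead was reassigned, False if the team check failed."""
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            GUARDED_UPDATE,
            {"lead_id": lead_id, "new_user": new_user, "leader": leader},
        )
    return cursor.rowcount == 1

moved_to_bob = reassign_lead(10, new_user=3, leader=1)   # Bob is on Jack's team
moved_to_eric = reassign_lead(10, new_user=4, leader=1)  # Eric is not
print(moved_to_bob, moved_to_eric)
```

Because the check lives in the WHERE clause, a rejected reassignment simply updates zero rows, so no extra leader_id column on the leads table is needed for this variant.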

Recursively duplicating entries

I am attempting to duplicate an entry. That part isn't hard. The tricky part is: there are n entries connected with a foreign key. And for each of those entries, there are n entries connected to that. I did it manually using a lookup to duplicate and cross reference the foreign keys.
Is there some subroutine or method to duplicate an entry and search for and duplicate foreign entries? Perhaps there is a name for this type of replication I haven't stumbled on yet, is there a specific database related title for this type of operation?
PostgreSQL 8.4.13
main entry (uid is serial)
uid | title
-----+-------
1 | stuff
department (departmentid is serial, uidref is foreign key for uid above)
departmentid | uidref | title
--------------+--------+-------
100 | 1 | Foo
101 | 1 | Bar
sub_category of department (textid is serial, departmentref is foreign for departmentid above)
textid | departmentref | title
-------+---------------+----------------
1000 | 100 | Text for Foo 1
1001 | 100 | Text for Foo 2
1002 | 101 | Text for Bar 1
You can do it all in a single statement using data-modifying CTEs (requires Postgres 9.1 or later).
Your primary keys being serial columns makes it easier:
WITH m AS (
INSERT INTO main (<all columns except pk>)
SELECT <all columns except pk>
FROM main
WHERE uid = 1
RETURNING uid AS uidref -- returns new uid
)
, d AS (
INSERT INTO department (<all columns except pk>)
SELECT <all columns except pk>
FROM m
JOIN department d USING (uidref)
RETURNING departmentid AS departmentref -- returns new departmentids
)
INSERT INTO sub_category (<all columns except pk>)
SELECT <all columns except pk>
FROM d
JOIN sub_category s USING (departmentref);
Replace <all columns except pk> with your actual columns. pk is for primary key, like main.uid.
The query returns nothing. You can return pretty much anything. You just didn't specify anything.
You wouldn't call that "replication". That term is usually applied to keeping multiple database instances or objects in sync. You are just duplicating an entry, and its dependent objects recursively.
Aside about naming conventions:
It would get even simpler with a naming convention that labels all columns signifying "ID of table foo" with the same (descriptive) name, like foo_id. There are other naming conventions floating around, but this is the best for writing queries, IMO.
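For comparison, here is a minimal application-side sketch of the same duplication that keeps an explicit link from each old id to its new one. The detail that is easy to get wrong in any such copy is that child rows have to be found through the old parent id while being inserted with the new one. The function name duplicate_entry is made up, only the title columns from the question are copied, and SQLite via Python's sqlite3 stands in for PostgreSQL so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main (uid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE department (departmentid INTEGER PRIMARY KEY,
                         uidref INTEGER REFERENCES main (uid), title TEXT);
CREATE TABLE sub_category (textid INTEGER PRIMARY KEY,
                           departmentref INTEGER REFERENCES department (departmentid),
                           title TEXT);
INSERT INTO main VALUES (1, 'stuff');
INSERT INTO department VALUES (100, 1, 'Foo'), (101, 1, 'Bar');
INSERT INTO sub_category VALUES
    (1000, 100, 'Text for Foo 1'), (1001, 100, 'Text for Foo 2'), (1002, 101, 'Text for Bar 1');
""")

def duplicate_entry(uid):
    """Copy one main row plus its departments and sub_categories; return the new uid."""
    with conn:  # one transaction: either everything is copied or nothing is
        cur = conn.execute(
            "INSERT INTO main (title) SELECT title FROM main WHERE uid = ?", (uid,)
        )
        if cur.rowcount != 1:
            raise LookupError(f"no main row with uid {uid}")
        new_uid = cur.lastrowid
        old_departments = conn.execute(
            "SELECT departmentid, title FROM department WHERE uidref = ?", (uid,)
        ).fetchall()
        for old_id, title in old_departments:
            cur = conn.execute(
                "INSERT INTO department (uidref, title) VALUES (?, ?)", (new_uid, title)
            )
            # Children are looked up by the OLD department id, inserted with the NEW one.
            conn.execute(
                "INSERT INTO sub_category (departmentref, title) "
                "SELECT ?, title FROM sub_category WHERE departmentref = ?",
                (cur.lastrowid, old_id),
            )
    return new_uid

new_uid = duplicate_entry(1)
copied = conn.execute(
    "SELECT d.title, s.title FROM department d "
    "JOIN sub_category s ON s.departmentref = d.departmentid "
    "WHERE d.uidref = ? ORDER BY s.title",
    (new_uid,),
).fetchall()
print(new_uid, copied)
```

This trades the single-statement elegance for one round trip per department, which is usually acceptable when duplicating a single entry.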

Delete data from child tables

I have 2 tables:
"customers" and "addresses". A customer can have several addresses, so they have an "n:m" relationship.
For this reason, I also have the table "customer-addr".
This is what my tables look like:
                    +---------------+
+-----------+       | customer_addr |
| customers |       +---------------+       +-----------+
+-----------+       | id            |       | addresses |
| id        | <---> | cid           |       +-----------+
| name      |       | aid           | <---> | id        |
+-----------+       +---------------+       | address   |
                                            +-----------+
I need to update all customer-data incl. all addresses. For this reason I thought about deleting all existing addresses first, then updating the customer-table, and after that, I create every address new.
My question: How can I delete all existing addresses from one customer efficiently? (I have to remove rows from 2 tables).
Is there a single-statement I can use? (Without the cascade-method, this is too risky)
Or can I do it with 2 statements, without using subselects?
What's the best approach for this?
Notice that I'm using postgresql
Edit:
My whole database-design is more complex, and the address-table is not only a child from "customers" but also from "suppliers","bulkbuyers",..
Every address belongs to only one customer OR one supplier OR one bulkbuyer.
(No address is used by more than one parent / no address-sharing)
Every customer/supplier/.. can have multiple addresses.
For this reason, the edited solution from zebediah49 won't work, because it would also delete all addresses from every supplier/bulkbuyer/...
I would use a writable CTE also called data-modifying CTE in PostgreSQL 9.1 or later:
WITH del AS (
DELETE FROM customer_addr
WHERE cid = $kill_this_cid
RETURNING aid
)
DELETE FROM addresses a
USING (SELECT DISTINCT aid FROM del) d
WHERE a.id = d.aid;
This should be fastest and safest.
If (cid, aid) is defined UNIQUE in customer_addr you don't need the DISTINCT step:
...
DELETE FROM addresses a
USING del d
WHERE a.id = d.aid;
EDIT:
Got it; this is safer because of the risk of two customers sharing an address anyway:
DELETE FROM customer_addr WHERE cid = $TARGET_CID;
DELETE FROM addresses WHERE id NOT IN (SELECT aid FROM customer_addr);
First, delete all references, then delete all unreferenced addresses.
Note that you could, for example, only do the first step, and run the "cleanup" second step at a later time.
I would suggest a two step transaction:
DELETE FROM addresses WHERE id IN (SELECT ca.aid FROM customers c LEFT JOIN customer_addr ca ON ca.cid=c.id WHERE c.name='$NAME_TO_DELETE');
DELETE FROM customer_addr WHERE cid = (SELECT id FROM customers WHERE name='$NAME_TO_DELETE');
If you have the customer ID already (EDIT: You do), you can skip most of that:
DELETE FROM addresses WHERE id IN (SELECT aid FROM customer_addr WHERE cid=$TARGET_CID);
DELETE FROM customer_addr WHERE cid = $TARGET_CID;
Wrap those with the appropriate transactional BEGIN/END, to make sure that you don't end up in an inconsistent state, and you should be set.
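As a rough illustration of that two-statement transaction driven from application code, here is a sketch using Python's sqlite3 module in place of a PostgreSQL driver. The function name delete_customer_addresses and the sample rows are invented; the customer id is passed as a bound parameter instead of being pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE customer_addr (id INTEGER PRIMARY KEY, cid INTEGER, aid INTEGER);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO addresses VALUES (10, 'Main St 1'), (11, 'Side St 2'), (12, 'Hill Rd 3');
INSERT INTO customer_addr (cid, aid) VALUES (1, 10), (1, 11), (2, 12);
""")

def delete_customer_addresses(customer_id):
    """Remove all addresses of one customer and the link rows, atomically."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        # Addresses first, while the link rows still say which ones belong to the customer.
        conn.execute(
            "DELETE FROM addresses WHERE id IN "
            "(SELECT aid FROM customer_addr WHERE cid = ?)",
            (customer_id,),
        )
        conn.execute("DELETE FROM customer_addr WHERE cid = ?", (customer_id,))

delete_customer_addresses(1)
remaining_addresses = conn.execute("SELECT id FROM addresses ORDER BY id").fetchall()
remaining_links = conn.execute("SELECT cid, aid FROM customer_addr").fetchall()
print(remaining_addresses, remaining_links)
```

Since only rows reachable through customer_addr are touched, addresses that belong to suppliers or bulk buyers through their own link tables are left alone.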

Database structure for items with varying attributes

I am developing a clothes web application and would appreciate advice on how to structure the data in my mysql database.
Every product (item of clothing) will be photographed in a number of ways, let's call them 'modes'. For example a shirt would be photographed buttoned or unbuttoned, and/or tucked in/not tucked in. A pair of trousers would have a different set of possible attributes. I want to store information on the way these items are photographed so I can later use that information to display the item of clothing in a particular way.
So one method would be just to store all the possible attributes in a single table, something like:
productId (FK,PK)
modeId (PK)
isLoose
isTuckedIn
Size
HasSmthUnderneath
Where the attributes could be a value or a code defined in another table or NULL if it does not apply to a particular mode.
Then given a particular productId and modeId, I imagine I could filter out the NULL values for attributes which do not apply and use only the relevant ones.
However, I am not sure if that is the ideal way to store this kind of value, as I would have a lot of NULL values, for example for a pair of trousers which is only photographed in one way. I've heard of the EAV model; is it appropriate here?
It's probably worth noting that the number of attributes will be decided by me and not the user and should not change considerably; and that my end goal is to extract the attributes of a particular mode so I can use that data in my application.
Sorry if anything is unclear!
I would be tempted to have the following normalized schema design
Mode Table
id | mode_style
---------------
1 | buttoned
2 | unbuttoned
3 | tucked in
4 | untucked
Clothes Table
id | name | description
----------------------------
1 | shirt | mans shirt...
2 | dress | short sleeve
Clothes_mm_Mode Table (Junction/Map table)
mode_id | clothes_id
--------------------
1 | 1
1 | 2
3 | 3
Then you can easily query those clothes that have an unbuttoned display
SELECT
c.id,
c.name,
c.description
FROM
Clothes c
INNER JOIN
Clothes_mm_Mode cm
ON c.id = cm.clothes_id
WHERE
cm.mode_id = 2
If certain types of clothes are always displayed in the same way i.e. all shirts always have a buttoned and unbuttoned display, you could take out the Clothes_mm_Mode Table and introduce a Common Mode table that maps Modes to a Common Mode id
Common_Modes Table
id | name | description
--------------------------------------------------
1 | Men's Shirt | Common Modes for a Mens shirt
2 | Women's Shirt | Common Modes for a Womens shirt
Common_Modes_mm_Mode Table (Junction/Map table)
common_mode_id | mode_id
--------------------------------------------------
1 | 1
1 | 2
2 | 1
2 | 2
and then associate each item of Clothing with a Common Mode type
Clothing_Common_Modes Table
clothing_id | common_mode_id
----------------------------
1 | 1
The advantage of this design would be that when adding a new item of clothing, only one record need be entered into the Common Modes table to associate that item of clothing with the Modes common to the clothing type. Of course this could be handled without a common modes table by having a procedure that inserts the appropriate records into the original Clothes_mm_Mode Table for a new item of clothing, but by having the relationship in the database, it will be more prominent, visible and easier to maintain.
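To make the lookup through the Common Modes tables concrete, here is a small sketch that resolves the modes of one item of clothing. It reuses the table and column names from the sample tables above, loads only a subset of the sample rows, and runs on SQLite through Python's sqlite3 module purely so it is self-contained; the helper modes_for is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Mode (id INTEGER PRIMARY KEY, mode_style TEXT);
CREATE TABLE Clothes (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE Common_Modes (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE Common_Modes_mm_Mode (common_mode_id INTEGER, mode_id INTEGER,
                                   PRIMARY KEY (common_mode_id, mode_id));
CREATE TABLE Clothing_Common_Modes (clothing_id INTEGER PRIMARY KEY, common_mode_id INTEGER);
INSERT INTO Mode VALUES (1, 'buttoned'), (2, 'unbuttoned'), (3, 'tucked in'), (4, 'untucked');
INSERT INTO Clothes VALUES (1, 'shirt', 'mans shirt'), (2, 'dress', 'short sleeve');
INSERT INTO Common_Modes VALUES (1, 'Men''s Shirt', 'Common Modes for a Mens shirt');
INSERT INTO Common_Modes_mm_Mode VALUES (1, 1), (1, 2);
INSERT INTO Clothing_Common_Modes VALUES (1, 1);  -- only the shirt is mapped
""")

def modes_for(clothing_id):
    """Return the mode styles an item of clothing inherits from its common mode."""
    rows = conn.execute(
        """
        SELECT m.mode_style
        FROM Clothing_Common_Modes ccm
        JOIN Common_Modes_mm_Mode cmm ON cmm.common_mode_id = ccm.common_mode_id
        JOIN Mode m ON m.id = cmm.mode_id
        WHERE ccm.clothing_id = ?
        ORDER BY m.id
        """,
        (clothing_id,),
    ).fetchall()
    return [style for (style,) in rows]

shirt_modes = modes_for(1)
dress_modes = modes_for(2)  # the dress has no common mode assigned in this sample
print(shirt_modes, dress_modes)
```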
I think your design is fine. It would be possible to apply database normalization to it, which may give you the following designs alternatively:
have one table per property, each with (id, propvalue) pairs. Only add rows into these tables for items where the property actually applies.
have generic tables (id, propname, propvalue), perhaps one such table per property datatype (boolean, number, string).
With your description, I feel that either is overkill. The only exception would be cases where properties are multi-valued (e.g. a list of available colors).
I personally think plain old key/value pairs for this type of thing are underrated, so if you're happy to control it more in the application itself you could also do something like this:
create table ProductStates
(
ProductId int not null,
ModeState nvarchar(200) not null,
primary key (ProductId, ModeState) -- one row per mode a product has
);
Nice and simple in my mind. You get no redundant null values; if the product has that mode then there's a row, if not there's no row. Also means no schema changes required if there's a new state. If you wanted to you could have ModeState instead link out to a ModeStates lookup table, if you think integrity is going to be a problem.
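create table ModeStates
(
ModeStateId int primary key,
ModeStateDescription nvarchar(500)
-- (...whatever else you might need here)
);
create table ProductStates
(
ProductId int not null,
ModeStateId int not null,
primary key (ProductId, ModeStateId),
foreign key (ModeStateId) references ModeStates (ModeStateId)
);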
... though that's probably redundant.
Just an alternative, not sure if I'd do it that way myself (depends on the brief(s)). Did I get the specification right?
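To show how the application side of the key/value variant might look, here is a minimal sketch. The helper names has_mode and modes_of and the sample rows are invented, and SQLite via Python's sqlite3 is used instead of MySQL only so that the example can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductStates (
    ProductId INTEGER NOT NULL,
    ModeState TEXT NOT NULL,
    PRIMARY KEY (ProductId, ModeState)
);
-- A row exists only where the mode applies, so there are no NULL placeholders.
INSERT INTO ProductStates VALUES (1, 'buttoned'), (1, 'tucked in'), (2, 'loose');
""")

def has_mode(product_id, mode_state):
    """True if the product was photographed in the given mode state."""
    row = conn.execute(
        "SELECT 1 FROM ProductStates WHERE ProductId = ? AND ModeState = ?",
        (product_id, mode_state),
    ).fetchone()
    return row is not None

def modes_of(product_id):
    """All mode states recorded for a product, in alphabetical order."""
    rows = conn.execute(
        "SELECT ModeState FROM ProductStates WHERE ProductId = ? ORDER BY ModeState",
        (product_id,),
    ).fetchall()
    return [state for (state,) in rows]

shirt_states = modes_of(1)
shirt_is_loose = has_mode(1, "loose")
print(shirt_states, shirt_is_loose)
```

Adding a new state is then just another INSERT, at the cost of the database no longer knowing which states are valid unless the lookup table is added.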