Database structure for items with varying attributes

I am developing a clothes web application and would appreciate advice on how to structure the data in my mysql database.
Every product (item of clothing) will be photographed in a number of ways; let's call them 'modes'. For example, a shirt would be photographed buttoned or unbuttoned, and/or tucked in or not tucked in. A pair of trousers would have a different set of possible attributes. I want to store information on the way these items are photographed so I can later use that information to display the item of clothing in a particular way.
So one method would be just to store all the possible attributes in a single table, something like:
productId (FK,PK)
modeId (PK)
isLoose
isTuckedIn
Size
HasSmthUnderneath
Where the attributes could be a value or a code defined in another table or NULL if it does not apply to a particular mode.
Then given a particular productId and modeId, I imagine I could filter out the NULL values for attributes which do not apply and use only the relevant ones.
However, I am not sure if that is the ideal way to store this kind of value, as I would have a lot of NULL values, for example for a pair of trousers which is only photographed in one way. I've heard of the EAV model; is this appropriate?
It's probably worth noting that the number of attributes will be decided by me and not the user and should not change considerably; and that my end goal is to extract the attributes of a particular mode so I can use that data in my application.
Sorry if anything is unclear!

I would be tempted to have the following normalized schema design
Mode Table
id | mode_style
---------------
1 | buttoned
2 | unbuttoned
3 | tucked in
4 | untucked
Clothes Table
id | name | description
----------------------------
1 | shirt | mans shirt...
2 | dress | short sleeve
Clothes_mm_Mode Table (Junction/Map table)
mode_id | clothes_id
--------------------
1 | 1
1 | 2
3 | 3
Then you can easily query those clothes that have an unbuttoned display
SELECT
    c.id,
    c.name,
    c.description
FROM
    Clothes c
INNER JOIN
    Clothes_mm_Mode cm
    ON c.id = cm.clothes_id
WHERE
    cm.mode_id = 2
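As a rough, runnable sketch of the junction-table design, here is the same schema driven from Python's sqlite3 module rather than MySQL. The sample links in Clothes_mm_Mode and the helper names (build_demo_db, clothes_with_mode, modes_of) are assumptions made for illustration, not part of the design itself.

```python
import sqlite3

def build_demo_db():
    """Create the three tables in an in-memory database with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Mode (id INTEGER PRIMARY KEY, mode_style TEXT NOT NULL);
        CREATE TABLE Clothes (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                              description TEXT);
        CREATE TABLE Clothes_mm_Mode (
            mode_id    INTEGER NOT NULL REFERENCES Mode(id),
            clothes_id INTEGER NOT NULL REFERENCES Clothes(id),
            PRIMARY KEY (mode_id, clothes_id)
        );
        INSERT INTO Mode VALUES (1, 'buttoned'), (2, 'unbuttoned'),
                                (3, 'tucked in'), (4, 'untucked');
        INSERT INTO Clothes VALUES (1, 'shirt', 'mans shirt'),
                                   (2, 'dress', 'short sleeve');
        -- Assumed links: shirt is shot buttoned and unbuttoned, dress buttoned.
        INSERT INTO Clothes_mm_Mode VALUES (1, 1), (2, 1), (1, 2);
    """)
    return conn

def clothes_with_mode(conn, mode_style):
    """Names of all clothes photographed in the given mode."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM Clothes c
        JOIN Clothes_mm_Mode cm ON c.id = cm.clothes_id
        JOIN Mode m ON m.id = cm.mode_id
        WHERE m.mode_style = ?
        ORDER BY c.id
        """,
        (mode_style,),
    )
    return [r[0] for r in rows]

def modes_of(conn, clothes_id):
    """Mode names recorded for one item of clothing."""
    rows = conn.execute(
        """
        SELECT m.mode_style
        FROM Mode m
        JOIN Clothes_mm_Mode cm ON m.id = cm.mode_id
        WHERE cm.clothes_id = ?
        ORDER BY m.id
        """,
        (clothes_id,),
    )
    return [r[0] for r in rows]
```

The second helper covers the asker's stated end goal: given a product, list only the modes that apply to it, with no NULL filtering needed.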
If certain types of clothes are always displayed in the same way i.e. all shirts always have a buttoned and unbuttoned display, you could take out the Clothes_mm_Mode Table and introduce a Common Mode table that maps Modes to a Common Mode id
Common_Modes Table
id | name | description
--------------------------------------------------
1 | Men's Shirt | Common Modes for a Mens shirt
2 | Women's Shirt | Common Modes for a Womens shirt
Common_Modes_mm_Mode Table (Junction/Map table)
common_mode_id | mode_id
--------------------------------------------------
1 | 1
1 | 2
2 | 1
2 | 2
and then associate each item of Clothing with a Common Mode type
Clothing_Common_Modes Table
clothing_id | common_mode_id
----------------------------
1 | 1
The advantage of this design would be that when adding a new item of clothing, only one record need be entered into the Clothing_Common_Modes table to associate that item of clothing with the Modes common to the clothing type. Of course this could be handled without a common modes table by having a procedure that inserts the appropriate records into the original Clothes_mm_Mode table for a new item of clothing, but by having the relationship in the database, it will be more prominent, visible and easier to maintain.

I think your design is fine. You could normalize it further, which would give you one of the following alternative designs:
have one table per property, each with (id, propvalue) pairs. Only add rows into these tables for items where the property actually applies.
have generic tables (id, propname, propvalue), perhaps one such table per property datatype (boolean, number, string).
From your description, I feel that either is overkill. The only exception would be cases where properties are multi-valued (e.g. a list of available colors).

I personally think plain old key/value pairs for this type of thing are underrated, so if you're happy to control it more in the application itself you could also do something like this:
create table ProductStates
(
    ProductId int not null,
    ModeState nvarchar(200) not null,
    primary key (ProductId, ModeState)
)
Nice and simple in my mind. You get no redundant null values; if the product has that mode then there's a row, if not there's no row. Also means no schema changes required if there's a new state. If you wanted to you could have ModeState instead link out to a ModeStates lookup table, if you think integrity is going to be a problem.
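create table ModeStates
(
    ModeStateId int not null primary key,
    ModeStateDescription nvarchar(500)
    -- (...whatever else you might need here)
);
create table ProductStates
(
    ProductId int not null,
    ModeStateId int not null,
    primary key (ProductId, ModeStateId),
    foreign key (ModeStateId) references ModeStates (ModeStateId)
);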
... though that's probably redundant.
Just an alternative, not sure if I'd do it that way myself (depends on the brief(s)). Did I get the specification right?
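A minimal sketch of how the key/value table might be used, run through Python's sqlite3 for convenience (the column types are SQLite's, and the helper names add_state and states_of are made up for this example):

```python
import sqlite3

def make_db():
    """In-memory database holding only the ProductStates table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE ProductStates (
            ProductId INTEGER NOT NULL,
            ModeState TEXT    NOT NULL,
            PRIMARY KEY (ProductId, ModeState)
        )
    """)
    return conn

def add_state(conn, product_id, mode_state):
    """Record that a product has a state; repeating the call is harmless."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO ProductStates (ProductId, ModeState) "
            "VALUES (?, ?)",
            (product_id, mode_state),
        )

def states_of(conn, product_id):
    """All states recorded for a product; a missing row means 'not applicable'."""
    rows = conn.execute(
        "SELECT ModeState FROM ProductStates WHERE ProductId = ? "
        "ORDER BY ModeState",
        (product_id,),
    )
    return [r[0] for r in rows]
```

A brand-new state such as 'sleeves rolled' only needs another add_state call, which is the "no schema changes" point made above.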

Language dependent column headers

I am working on a PostgreSQL based application and am very curious whether there might be a clever solution for language dependent column headers.
I know that I can set an alias for a header with the "as" keyword, but that obviously has to be done for every select, over and over again.
So I have a table for converting the technical column name to a mnemonic one, to be shown to the user.
I can handle the mapping in the application, but would prefer a database solution. Is there any?
At least could I set the column header to table.column?

You could use a "view". You can think of a view as a pseudo-table defined by a query over one or more tables. For instance, if I have a table that has the following shape
Table: Pets
Id | Name | OwnerId | AnimalType
1 | Frank| 1 | 1
2 | Jim | 1 | 2
3 | Bobo | 2 | 1
I could create a "view" that changes the Name field to look like PetName instead without changing the table
CREATE VIEW PetView AS
SELECT Id, Name as PetName, OwnerId, AnimalType
FROM Pets
Then I can use the view just like any other table
SELECT PetName
FROM PetView
WHERE AnimalType = 1
Further we could combine another table as well into the view. For instance if we add another table to our DB for Owners then we could create a view that automatically joins the two tables together before subjecting to other queries
Table: Owners
Id | Name
1 | Susan
2 | Ravi
CREATE VIEW PetsAndOwners AS
SELECT p.Id, p.Name as PetName, o.Name as OwnerName, p.AnimalType
FROM Pets p, Owners o
WHERE p.OwnerId = o.Id
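Now we can query the new view just like any other table. (Note that PostgreSQL only makes simple single-table views automatically updatable; a join view like this one does not accept inserts or deletes unless you add INSTEAD OF triggers or rules.)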
SELECT * FROM PetsAndOwners
WHERE OwnerName = 'Susan'
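To tie this back to the question's header-mapping table, one possible approach is to generate a view per language from that mapping. The sketch below uses Python's sqlite3 so it runs without a server. The column_headers table, its columns, and the function name create_localized_view are all hypothetical. The same CREATE VIEW statement shape should carry over to PostgreSQL.

```python
import sqlite3

def create_localized_view(conn, table, view, lang):
    """Create `view` over `table`, renaming columns per the header mapping."""
    rows = conn.execute(
        "SELECT column_name, header FROM column_headers "
        "WHERE table_name = ? AND lang = ? ORDER BY position",
        (table, lang),
    ).fetchall()
    # Identifiers cannot be bound as parameters, so they are quoted by hand;
    # this assumes the mapping holds trusted names without double quotes.
    select_list = ", ".join(f'"{col}" AS "{hdr}"' for col, hdr in rows)
    conn.execute(f'CREATE VIEW "{view}" AS SELECT {select_list} FROM "{table}"')

def make_db():
    """Sample table plus a (hypothetical) header-mapping table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE column_headers (
            table_name TEXT, column_name TEXT, lang TEXT,
            header TEXT, position INTEGER
        );
        INSERT INTO pets VALUES (1, 'Frank');
        INSERT INTO column_headers VALUES
            ('pets', 'id',   'de', 'Nr',           1),
            ('pets', 'name', 'de', 'Haustiername', 2);
    """)
    return conn

def headers_of(conn, relation):
    """Column headers the database reports for SELECT * on a table or view."""
    cur = conn.execute(f'SELECT * FROM "{relation}"')
    return [d[0] for d in cur.description]
```

With this in place the application selects from pets_de (or whichever language view applies) and never repeats the aliases in individual queries.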

Is there a way to insert a record in SQL server if it does not match the latest version of the record based on three of the columns?

Consider the following table named UserAttributes:
+----+--------+----------+-----------+
| Id | UserId | AttrName | AttrValue |
+----+--------+----------+-----------+
| 4 | 1 | FavFood | Apples |
| 3 | 2 | FavFood | Burgers |
| 2 | 1 | FavShape | Circle |
| 1 | 1 | FavFood | Chicken |
+----+--------+----------+-----------+
I would like to insert a new record in this table only if the new value differs from the latest value of that attribute for that user.
What I mean by the latest is, for example, if I was to do:
SELECT TOP(1) * FROM [UserAttributes] WHERE [UserId] = 1 AND [AttrName] = 'FavFood' ORDER BY [Id] DESC
I will be able to see that user ID 1's current favorite food is "Apples".
Is there a query safe for concurrency that will only insert a new favorite food if it doesn't match the current favorite food for this user?
I tried using a MERGE query with HOLDLOCK, but the problem is that its WHEN MATCHED/WHEN NOT MATCHED clauses match against any earlier row, so it only works if I never want to insert a new record after a user has previously set their favorite food (in this example) to the new value. It does not account for a user who switches to a new favorite food and then changes back to their old one. I would like to keep all the changes as a historical record.
In the data set above, I would like to insert a new record if the user ID 1's new favorite food is "Burgers", but I do not want to insert a record if their new favorite food is "Apples" (since that is their current favorite food). I would also like to make this operation safe for concurrency.
Thank you for your help!
EDIT: I should probably also mention that when I split this operation into two queries (ie: first select their current favorite food, then do an insert query only if there is a new food detected) it works under normal conditions. However, we are observing race conditions (and therefore duplicates) since (as you may have guessed) the data set above is simply an example and there are many threads operating on this table at the same time.

A bit ugly, but to do it in one command, you could insert the user's (new) favorite food but filter with an EXCEPT of their current values.
e.g., (assuming the user's new data is in @UserID and @FavFood):
; WITH LatestFavFood AS
    (SELECT TOP(1) UserID, AttrName, AttrValue
     FROM [UserAttributes]
     WHERE [UserId] = @UserID AND [AttrName] = 'FavFood'
     ORDER BY [Id] DESC
    )
INSERT INTO UserAttributes (UserID, AttrName, AttrValue)
SELECT @UserID, 'FavFood', @FavFood
EXCEPT
SELECT UserID, AttrName, AttrValue
FROM LatestFavFood
Here's a DB_Fiddle with three runs.
EDIT: I have changed the above to assume varchar types for AttrName rather than nvarchar. The fiddle has a mixture. Would be good to ensure you get them correct (especially food as it may have special characters).
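For readers who want to try the single-statement idea without SQL Server, here is a rough equivalent using Python's sqlite3 (LIMIT 1 stands in for TOP(1), and the function names are invented for this sketch). It only illustrates the insert-unless-equal-to-latest logic. It says nothing about how SQL Server behaves under concurrent writers, where the isolation level or locking hints would still need to be considered.

```python
import sqlite3

INSERT_IF_CHANGED = """
INSERT INTO UserAttributes (UserId, AttrName, AttrValue)
SELECT :user_id, :attr_name, :attr_value
EXCEPT
SELECT UserId, AttrName, AttrValue
FROM (SELECT UserId, AttrName, AttrValue
      FROM UserAttributes
      WHERE UserId = :user_id AND AttrName = :attr_name
      ORDER BY Id DESC
      LIMIT 1)
"""

def make_db():
    """In-memory copy of the UserAttributes example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE UserAttributes (
            Id INTEGER PRIMARY KEY AUTOINCREMENT,
            UserId INTEGER NOT NULL,
            AttrName TEXT NOT NULL,
            AttrValue TEXT NOT NULL
        );
        INSERT INTO UserAttributes (Id, UserId, AttrName, AttrValue) VALUES
            (1, 1, 'FavFood',  'Chicken'),
            (2, 1, 'FavShape', 'Circle'),
            (3, 2, 'FavFood',  'Burgers'),
            (4, 1, 'FavFood',  'Apples');
    """)
    return conn

def set_attribute(conn, user_id, attr_name, attr_value):
    """Append a history row unless it equals the latest value; True if inserted."""
    params = {"user_id": user_id, "attr_name": attr_name,
              "attr_value": attr_value}
    with conn:  # commit on success, roll back on error
        cur = conn.execute(INSERT_IF_CHANGED, params)
    return cur.rowcount == 1

def history(conn, user_id, attr_name):
    """All recorded values for one attribute, oldest first."""
    rows = conn.execute(
        "SELECT AttrValue FROM UserAttributes "
        "WHERE UserId = ? AND AttrName = ? ORDER BY Id",
        (user_id, attr_name),
    )
    return [r[0] for r in rows]
```

Setting user 1's favorite food to "Apples" is a no-op against the sample data, while "Burgers" followed by "Apples" appends two rows, so switching back to an old value is still recorded.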

Group by non-scalar value

Given a one-to-many relationship between Person and Item
Person           Item
-------          ------
Id <-----.       Id
Name      `----  PersonId
                 Label
Where there are many people and Item.Label takes few distinct values, it might make sense to adopt an equivalent schema:
Person            List           Item
--------          ------         ------
Id           .--> Id <--.        Id
ListId -----`            `-----  ListId
Name                             Label
That way many people can share the same list.
The migration from second schema to the first is trivial. My question is, how to migrate from the first schema to the second?
The challenge is to pick exactly one representative Person for each possible outcome of
SELECT Label FROM Item WHERE PersonId = ?
I was able to solve the problem by using FOR XML present in MS SQL server. That is,
SELECT P.Id, (SELECT Label FROM Item WHERE PersonId = P.Id ORDER BY Label FOR XML PATH('')) list
FROM Person P
and then simply SELECT MIN(P.Id) FROM ... GROUP BY list to collect representatives. I'm unsatisfied with this workaround though and wish to find a more pure solution.
edit:
SELECT p.Id, q.Id FROM Person p, Person q
WHERE NOT EXISTS ( --symmetric difference between
(SELECT Label FROM Item WHERE PersonId = p.Id) --and
(SELECT Label FROM Item WHERE PersonId = q.id))
Should be the equivalence relation of Persons, for which representatives need to be found. I still wouldn't know how to finish, and this does seem rather inefficient.
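One way the equivalence idea could be finished without string aggregation is to keep a person only when no person with a smaller Id has exactly the same set of labels, testing set equality with two EXCEPT subqueries. The sketch below runs this on SQLite through Python. The sample data and function names are assumptions, the comparison treats labels as a set (duplicates are ignored), and it is quadratic in the number of people, so it does not address the efficiency concern.

```python
import sqlite3

REPRESENTATIVES = """
SELECT p.Id
FROM Person p
WHERE NOT EXISTS (
    SELECT 1
    FROM Person q
    WHERE q.Id < p.Id
      -- no label of p is missing from q ...
      AND NOT EXISTS (SELECT Label FROM Item WHERE PersonId = p.Id
                      EXCEPT
                      SELECT Label FROM Item WHERE PersonId = q.Id)
      -- ... and no label of q is missing from p
      AND NOT EXISTS (SELECT Label FROM Item WHERE PersonId = q.Id
                      EXCEPT
                      SELECT Label FROM Item WHERE PersonId = p.Id)
)
ORDER BY p.Id
"""

def make_db():
    """Five people: 1 and 2 share {a, b}, 3 has {a}, 4 and 5 have no items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Item (Id INTEGER PRIMARY KEY,
                           PersonId INTEGER REFERENCES Person(Id),
                           Label TEXT);
        INSERT INTO Person VALUES (1, 'P1'), (2, 'P2'), (3, 'P3'),
                                  (4, 'P4'), (5, 'P5');
        INSERT INTO Item (PersonId, Label) VALUES
            (1, 'a'), (1, 'b'), (2, 'b'), (2, 'a'), (3, 'a');
    """)
    return conn

def representatives(conn):
    """Smallest Person.Id for each distinct set of labels."""
    return [r[0] for r in conn.execute(REPRESENTATIVES)]
```

Each representative would then become one row in List, and every other person would take the ListId of the representative whose label set matches theirs.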

It depends! I suggest you keep your model aligned with your business logic.
If people own pre-made sets of items, it makes sense to create a table to hold that logic.
Suppose people can own just "home edition", "pro edition" or "std edition".
It then makes sense to create a relational table Edition_Items, so that editions can contain the items (A,B), (A,B,C,D) and (A,C), for example.
And you can make a relational table between People and the Edition each person owns. In your scenario, if those editions are "customized" editions, then even if two of them contain the same set of items you can consider them different sets (just because they are owned by different people).
So that "Assembled Set" table can be used as a relational table between people and items.
Edit:
The OP's comment reinforces my last statement.
So your "List" table can be a relational table between People and Items.
|People |  |List|  |List_Item|  |Item|
|-------|  |----|  |---------|  |----|
|P1, L1 |  | L1 |  | L1, I1  |  | I1 |
|P2, L2 |  | L2 |  | L1, I2  |  | I2 |
|       |  | L3 |  | L2, I1  |  | I3 |
|       |  | L4 |  | L2, I1  |
Seeing this, you might ask: why keep a List table? It is useful if the List has properties of its own, such as isDeleted, Description, CreateTime, etc.
And the final question is: do we put a reference to the list on people, or a reference to people on the list (or create another relational table)?
It depends on:
1) Is People-List a 1-1 relation?
2) Which comes first? (a chicken-and-egg problem?)
It is usually better to ask: which one can exist without the other?

How to change values of foreign keys in postgresql?

Let's say I have two tables: Customer and City. There are many Customers that live in the same City. The cities have an uid that is primary key. The customers have a foreign key reference to their respective city via Customer.city_uid.
I have to swap two City.uids with one another for external reasons. But the customers should stay attached to their cities. Therefore it is necessary to swap the Customer.city_uids as well. So I thought I would first swap the City.uids and then change the Customer.city_uids accordingly via an UPDATE statement. Unfortunately, I cannot do that, since these uids are referenced from the Customer table and PostgreSQL prevents me from doing that.
Is there an easy way of swapping the two City.uids with one another as well as the Customer.city_uids?

One solution could be:
BEGIN;
1. Drop foreign key
2. Make update
3. Create foreign key
COMMIT;
Or:
BEGIN;
1. Insert "new" correct information
2. Remove outdated information
COMMIT;

My instinct is to recommend not trying to change the city table's id field. But there is a lot of information missing here, so it really is a feeling rather than a definitive point of view.
Instead, I would swap the values in the other fields of the city table. For example, change the name of city1 to city2's name, and vice-versa.
For example:
OLD TABLE                       NEW TABLE
id | name  | population         id | name  | population
-------------------------       -------------------------
 1 | ABerg | 123456              1 | BBerg | 654321
 2 | BBerg | 654321              2 | ABerg | 123456
 3 | CBerg | 333333              3 | CBerg | 333333
(The ID was not touched, but the other values were swapped. Functionally the same as swapping the IDs, but with 'softer touch' queries that don't need to make any changes to table constraints, etc.)
Then, in your associated tables, you can do...
UPDATE
Customer
SET
city_uid = CASE WHEN city_uid = 1 THEN 2 ELSE 1 END
WHERE
city_uid IN (1,2)
But then, do you have other tables that reference city_uid? And if so, is it feasible for you to repeat that update on all those tables?

You could create two temporary cities.
You would have:
City 1
City 2
City Temp 1
City Temp 2
Then, you could do the following:
Update all Customer UIDs from City 1 to City Temp 1.
Update all Customer UIDs from City 2 to City Temp 2.
Swap City 1 and 2 UIDs
Move all Customers back from City Temp 1 to City 1.
Move all Customers back from City Temp 2 to City 2.
Delete the temporary cities.

You can also add an ON UPDATE CASCADE clause to the foreign key constraint on the referencing (Customer) table, so that changes to City.uid propagate automatically, as described here:
How to do a cascading update?
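A small sketch of the cascading approach, using Python's sqlite3 so it can be run without a server (PostgreSQL accepts the same REFERENCES ... ON UPDATE CASCADE clause). The swap goes through a placeholder uid, assumed here to be -1 and unused, because updating one city directly to the other's uid would collide with the primary key. Table contents and function names are illustrative only.

```python
import sqlite3

def make_db():
    """City/Customer tables where Customer.city_uid follows City.uid changes."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE City (uid INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Customer (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            city_uid INTEGER NOT NULL
                REFERENCES City(uid) ON UPDATE CASCADE
        );
        INSERT INTO City VALUES (1, 'ABerg'), (2, 'BBerg');
        INSERT INTO Customer VALUES (1, 'Alice', 1), (2, 'Bob', 2);
    """)
    return conn

def swap_city_uids(conn, a, b, placeholder=-1):
    """Swap two City.uid values; customers follow via ON UPDATE CASCADE."""
    with conn:  # one transaction: all three updates or none
        conn.execute("UPDATE City SET uid = ? WHERE uid = ?", (placeholder, a))
        conn.execute("UPDATE City SET uid = ? WHERE uid = ?", (a, b))
        conn.execute("UPDATE City SET uid = ? WHERE uid = ?", (b, placeholder))

def customer_cities(conn):
    """(customer name, city_uid, city name) for every customer."""
    return conn.execute("""
        SELECT cu.name, cu.city_uid, ci.name
        FROM Customer cu JOIN City ci ON ci.uid = cu.city_uid
        ORDER BY cu.id
    """).fetchall()
```

After the swap each customer is still joined to the same city name, only under the other uid, and no constraint had to be dropped.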

SQL Server 2008 localization of tables

I need to localize a SQL Server 2008 database. After investigating recommendations, I have found that it is best to have separate tables for each of the languages for the strings. That way different sorting settings can be set for each table. For example, a typical Product table has ProdID, Product Description, and Price fields. The recommended solution is to set the table structures to have the Product table be ProdID and Price. Then a specific table for each language would have the following structure: ProdID and Description.
My question is: how do I create a stored procedure that has a parameter which passes in the culture to use for the sub-table, and then use that to join the tables? The sub-table needs to change based on the parameter. How can that be done? I am using SQL Server 2008.

First off, are you sure you really want to implement different tables for each culture? It would make more sense to modify your Product table to remove the description, and then add a ProductDescription table with a ProdID, culture, and description field. This way you don't have to toy around with dynamic SQL (which is what you'll have to use) to select the correct table based on the culture parameter.
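To show why the single-table layout avoids dynamic SQL, here is a minimal sketch run through Python's sqlite3, with a Python function standing in for the stored procedure. The Culture column name, the sample rows and the function names are assumptions. In SQL Server the same SELECT could sit inside a procedure taking a culture parameter.

```python
import sqlite3

def make_db():
    """Product holds language-neutral data; descriptions are keyed by culture."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Product (ProdID INTEGER PRIMARY KEY, Price REAL NOT NULL);
        CREATE TABLE ProductDescription (
            ProdID INTEGER NOT NULL REFERENCES Product(ProdID),
            Culture TEXT NOT NULL,
            Description TEXT NOT NULL,
            PRIMARY KEY (ProdID, Culture)
        );
        INSERT INTO Product VALUES (1, 49.5), (2, 120.0);
        INSERT INTO ProductDescription VALUES
            (1, 'en-US', 'Chair'),
            (1, 'fr-FR', 'Chaise'),
            (2, 'en-US', 'Table');
    """)
    return conn

def products_for_culture(conn, culture):
    """Products with their description in one culture.

    The culture is an ordinary bound parameter, so the table name in the
    query never changes and no SQL string has to be assembled at run time.
    """
    return conn.execute(
        """
        SELECT p.ProdID, d.Description, p.Price
        FROM Product p
        JOIN ProductDescription d ON d.ProdID = p.ProdID
        WHERE d.Culture = ?
        ORDER BY p.ProdID
        """,
        (culture,),
    ).fetchall()
```

Products without a description in the requested culture simply drop out of the inner join. A LEFT JOIN with a fallback culture would be one way to handle that, if it matters for the application.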

...specific table for each language would have the following structure: ProdID and Description.
...which is why you're having to look at a really involved setup to get your information out of the database.
A better approach would be to use a single table, and use a code for the language. You don't want to be defining a column per attribute you want translated either, so you'd be looking at implementing something like:
LANGUAGES table
LANGUAGE_ID, pk
LANGUAGE_DESCRIPTION
Example data:
LANGUAGE_ID | LANGUAGE_DESCRIPTION
------------------------------------
1 | ENGLISH
2 | FRENCH
TRANSLATED_ATTRIBUTES table
TRANSLATED_ATTRIBUTE_ID, pk
TRANSLATED_ATTRIBUTE_DESC
Example data:
TRANSLATED_ATTRIBUTE_ID | TRANSLATED_ATTRIBUTE_DESC
------------------------------------
1 | PROD_ID
2 | PROD_DESC
LOCALIZATIONS table
LANGUAGE_ID, pk
TRANSLATED_ATTRIBUTE_ID, pk
TRANSLATED_VALUE
Example data:
LANGUAGE_ID | TRANSLATED_ATTRIBUTE_ID | TRANSLATED_VALUE
----------------------------------------------------------
1 | 1 | Product ID
2 | 1 | Produit ID
You'll want a table associating the TRANSLATED_ATTRIBUTE_ID with a given item - Product is the example you've given so:
ATTRIBUTES table
ATTRIBUTE_ID, pk
ATTRIBUTE_TYPE_CODE, fk
TRANSLATED_ATTRIBUTE_ID, fk
Example data:
ATTRIBUTE_ID | ATTRIBUTE_TYPE_CODE | TRANSLATED_ATTRIBUTE_ID
----------------------------------------------------------------
1 | PRODUCT | 1
If you want to relate on a per product basis:
ATTRIBUTES table
ATTRIBUTE_ID, pk
PRODUCT_ID, fk
TRANSLATED_ATTRIBUTE_ID, fk
Now you can use two parameters - the language (English) & what the item is (Product):
SELECT ta.translated_attribute_desc,
       t.translated_value
  FROM LOCALIZATIONS t
  JOIN TRANSLATED_ATTRIBUTES ta ON ta.translated_attribute_id = t.translated_attribute_id
  JOIN ATTRIBUTES a ON a.translated_attribute_id = ta.translated_attribute_id
  JOIN ATTRIBUTE_TYPE_CODES atc ON atc.attribute_type_code = a.attribute_type_code
  JOIN LANGUAGES lang ON lang.language_id = t.language_id
 WHERE lang.language_description = 'ENGLISH' --alternate: lang.language_id = 1
   AND atc.attribute_type_code = 'PRODUCT'
You can pivot the data as necessary.
