SQL Constraint/Check on Join tables

I have three tables: store, product, storeproduct.
It doesn't really matter what's in the store and the product table, just know there is a storeID in the store table, and a productID in the product table. However the storeproduct table keeps track of the different products each store has. So the storeproduct table has two columns. The storeID column, and the productID column, both foreign keys from the store and the product table.
Is there a way to put a constraint or check on any of the tables to make sure that a store has more than 0 products and fewer than 50 products?
Note: I do not want a select statement to do this. I just want to know if there is a way to put a constraint or a check when creating the tables.
The point of this is so a user cannot insert into the storeproduct table if there are already 50 products (rows) with the same storeID, or delete from the storeproduct table if deleting a row would remove the last row with that storeID.
The storeproduct table might look like this:
storeID  productID
-------  ---------
1        1
1        2
1        3
2        4
2        5
2        6
2        7
3        4
3        2
3        6
3        1
3        8

Actually, depending on your database you may be able to do this.
Oracle (and maybe others) provides materialized views to which you can apply constraints. So you could create an MV with a PRODUCTS_IN_STORES column, something like
select store.storeID, count(storeproduct.productID) as PRODUCTS_IN_STORES
from store left outer join storeproduct
  on store.storeID = storeproduct.storeID
group by store.storeID
Then put a constraint on it asserting that PRODUCTS_IN_STORES is between 0 and 50, or whatever.
http://www.sqlsnippets.com/en/topic-12896.html
and
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:21389386132607
Not a complete answer for you, but something to think about and hopefully set you on your way.
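Where materialized views with constraints aren't available, triggers are a common alternative way to enforce the same rules. Here is a minimal sketch in SQLite (driven from Python); the trigger names are hypothetical, and the "minimum" rule is implemented as "a store that has products must keep at least one":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (storeID INTEGER PRIMARY KEY);
CREATE TABLE product (productID INTEGER PRIMARY KEY);
CREATE TABLE storeproduct (
    storeID   INTEGER REFERENCES store(storeID),
    productID INTEGER REFERENCES product(productID),
    PRIMARY KEY (storeID, productID)
);

-- Reject an insert that would give a store more than 50 products.
CREATE TRIGGER storeproduct_max BEFORE INSERT ON storeproduct
WHEN (SELECT COUNT(*) FROM storeproduct WHERE storeID = NEW.storeID) >= 50
BEGIN
    SELECT RAISE(ABORT, 'store already has 50 products');
END;

-- Reject a delete that would leave a store with zero products.
CREATE TRIGGER storeproduct_min BEFORE DELETE ON storeproduct
WHEN (SELECT COUNT(*) FROM storeproduct WHERE storeID = OLD.storeID) <= 1
BEGIN
    SELECT RAISE(ABORT, 'store must keep at least one product');
END;
""")

conn.execute("INSERT INTO storeproduct VALUES (1, 1)")
conn.execute("INSERT INTO storeproduct VALUES (1, 2)")
# Deleting one of two rows is fine; deleting the last one is blocked.
conn.execute("DELETE FROM storeproduct WHERE storeID = 1 AND productID = 2")
try:
    conn.execute("DELETE FROM storeproduct WHERE storeID = 1 AND productID = 1")
except sqlite3.IntegrityError as e:
    print(e)
```

The trigger syntax (and whether triggers fire per row or per statement) differs between engines, so treat this as a sketch of the approach rather than portable DDL.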

Related

SQL insert, select performance for categorized product table

I have relational category & product tables. Categories are hierarchical. I will have queries based on category, for example
select *
from products
where CatId = 3
or
select *
from products
where CatId = 1
I have 6 levels of categories and 24 million product rows, and I need a fast, optimal solution. My question is which structure is suitable.
I'll write out some options; feel free to suggest a better alternative.
Current category table:
Id ParentId Name
---------------------
1 null CatA
2 null CatB
3 1 CatAa
4 2 CatBa
Product table option 1
Id Cat Name
------------------
1 3 Product_1
2 4 Product_2
Product table option 2
Id CatLevel1 CatLevel2 ... Name
-------------------------------------
1 1 3 . Product_1
2 2 4 . Product_2
Product table option 3
Id Cats Name
------------------
1 1:3 Product_1
2 2:4 Product_2
Always keep option one, plus some denormalised tables (options two onwards) if you so desire. By keeping option one, you have the source of truth to revert to, or to derive the others from.
Option two is only recommended if the searcher always knows what depth/level to search at. For example, if they know they need Level2=CATAb then it works, but if they don't know CATAb is at level two, they don't know which column to look in. It also relies on knowing how many levels to represent; if you can have a hundred levels, you need a hundred columns, and it's fragile if you need to add more depths. Usually that isn't the case, so it's generally not a good optimisation.
Option three is a straight no. Never store multiple values in one field (one column of one row). It will make efficient searching of that column next to impossible.
The alternative to option three is to have a "link" table. Just two columns, category_id and product_id. Then you list all ancestors of a product, just on different rows.
category_id  product_id
-----------  ----------
1            1
3            1
2            2
4            2
These are all known as adjacency lists. A different model altogether is Nested Sets. I'm on my phone, and it's hard to describe without lots of formatting, but if you research online you'll find lots of information. They're much harder to comprehend and implement initially, but very fast at retrieval when specifying a parent.
Your product table option 1 is fine and needs no change:
product_id,
category_id,
... other attributes
Your problem is in accessing products based on the category hierarchy, which requires a hierarchical query to get all categories in the tree below your selected category.
Instead of
select * from product where category_id = 1;
you'll need to write an additional hierarchical query to get the whole hierarchy tree
with cat_tree (id) as (
    select id
    from category
    where id = 1
    union all
    select ca.id
    from cat_tree ct
    join category ca on ct.id = ca.parent_id
)
select *
from product
where category_id in (select id from cat_tree);
This may not always be practical, but you can simplify it by denormalizing the category table.
Let's assume your category data is such as
ID PARENT_ID
---------- ----------
1
3 1
5 3
6 3
The query below, which may be implemented as a MATERIALIZED VIEW refreshed on each category change, pre-calculates all direct and indirect parent-child relations.
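The query itself appears to have been lost in formatting. A sketch of a recursive query that produces the closure described (SQLite syntax here, run from Python; Oracle would use a recursive WITH or CONNECT BY), using the sample category data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO category VALUES (1, NULL), (3, 1), (5, 3), (6, 3);
""")

# Pair every category with itself and with all of its direct and
# indirect children: a transitive closure of the hierarchy.
rows = conn.execute("""
WITH RECURSIVE category_denorm (id, child_id) AS (
    SELECT id, id FROM category
    UNION ALL
    SELECT cd.id, c.id
    FROM category_denorm cd
    JOIN category c ON c.parent_id = cd.child_id
)
SELECT id, child_id FROM category_denorm ORDER BY id, child_id
""").fetchall()
print(rows)
```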
The result is
ID CHILD_ID
---------- ----------
1 1
1 3
1 5
1 6
3 3
3 5
3 6
5 5
6 6
E.g. for 1 you get the category itself, all its children, their children, and so on.
Using this category_denorm object, your query can be simplified to:
select *
from product
where category_id in
(select child_id from category_denorm where id = 1);
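To see the simplified lookup end to end, here's a runnable sketch in SQLite (via Python), where category_denorm is an ordinary view standing in for the materialized view; the product rows and names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO category VALUES (1, NULL), (3, 1), (5, 3), (6, 3);
CREATE TABLE product (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT);
INSERT INTO product VALUES
    (10, 5, 'P10'), (11, 6, 'P11'), (12, 3, 'P12'), (13, 1, 'P13');

-- Stand-in for the materialized view: an ordinary view over the closure.
CREATE VIEW category_denorm AS
WITH RECURSIVE t (id, child_id) AS (
    SELECT id, id FROM category
    UNION ALL
    SELECT t.id, c.id FROM t JOIN category c ON c.parent_id = t.child_id
)
SELECT id, child_id FROM t;
""")

# All products in category 3 or any category below it (3, 5, 6).
rows = conn.execute("""
SELECT id, name FROM product
WHERE category_id IN (SELECT child_id FROM category_denorm WHERE id = 3)
ORDER BY id
""").fetchall()
print(rows)
```

In a production Oracle setup the view would be a materialized view refreshed on category changes, so the closure is not recomputed on every product query.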

SQL select from two tables gives wrong result

I cannot understand what I am doing wrong.
My tables:
ps_product
id_product  active
----------  ------
1           1
2           1
and
ps_product_sync
id_product  status
----------  ------
1           0
2           1
and my SQL code
SELECT pr_product.id_product, pr_product.active
FROM pr_product, pr_product_sync
WHERE pr_product.active = pr_product_sync.status
I get a result like this:
id_product  status
----------  ------
2           1
2           1
2           1
...
(24 rows)
I tried the same with an inner join but the result is the same. I don't have duplicates in the tables... I don't understand why I get one row 24 times.
PS: all tables look good before posting/saving.
If you query two tables and include both in the FROM clause, you create a Cartesian product of these tables. In other words, if one table has 4 rows and the other 6 rows, the result is 24 rows.
It is better to create an INNER JOIN using the key of the first table and the foreign key of the second table.
Change your query accordingly
SELECT pr_product.id_product, pr_product.active
FROM pr_product
INNER JOIN pr_product_sync
ON pr_product.id_product = pr_product_sync.id_product
WHERE pr_product.active = pr_product_sync.status
Of course, you could also compare the keys in the WHERE clause or eliminate duplicates using DISTINCT. IMHO the most understandable solution is an INNER JOIN.
I hope this solves your problem.
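The difference is easy to reproduce. A small sketch in Python with SQLite, using a two-row version of each table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pr_product (id_product INTEGER, active INTEGER);
INSERT INTO pr_product VALUES (1, 1), (2, 1);
CREATE TABLE pr_product_sync (id_product INTEGER, status INTEGER);
INSERT INTO pr_product_sync VALUES (1, 0), (2, 1);
""")

# Cartesian product filtered only on active = status: every product row is
# combined with every sync row, so matches are not tied to the same id.
cartesian = conn.execute("""
SELECT p.id_product, p.active
FROM pr_product p, pr_product_sync s
WHERE p.active = s.status
""").fetchall()

# Join on the key first, then compare the flags: one row per matching product.
joined = conn.execute("""
SELECT p.id_product, p.active
FROM pr_product p
INNER JOIN pr_product_sync s ON p.id_product = s.id_product
WHERE p.active = s.status
""").fetchall()
print(len(cartesian), len(joined))
```

With two rows per table, the comma-style FROM examines all 2 × 2 combinations and keeps every pair whose flags happen to match, while the keyed join returns at most one row per product.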
Missing the primary key join.
Add:
WHERE
pr_product.id_product=pr_product_sync.id_product
AND pr_product.active=pr_product_sync.status

Merge two versions of database tables with conflicting keys

I have been asked to merge 2 Access databases. They are conflicting versions of the same file.
A database was emailed to somebody. (I know.) Somebody added records to the 'main' copy while somebody else added records to their copy. I want to add the new records from the 'unauthorised' copy into the main version, before utterly destroying all other copies.
Unfortunately, the database has several related tables. As would naturally happen when records are added, records in different versions have conflicting primary keys. These conflicting keys are also used as foreign keys in the new records. A foreign key reference to ID x means different things in the 2 versions.
Is there any hope? I thought of maybe importing it all into Excel and using formulas to update the primary and foreign keys.
Is there any way to fix this programmatically?
EDIT: Here is a picture showing the full relationships. Tables teachers, tests, and test_results have been changed; the others are the same in both.
In the main database, add a Long field named [oldID] to each table into which you need to append data. Then create Linked Tables pointing to the relevant tables in the "other" database. Since the table names are the same, the linked tables will have a '1' appended to them.
For this example, we have
[teachers]
ID teacher oldID
-- -------- -----
1 TeacherA
2 TeacherB
3 TeacherX
[teachers1]
ID teacher
-- --------
1 TeacherA
2 TeacherB
3 TeacherY
[tests]
ID test_name teacher oldID
-- -------------- ------- -----
1 TeacherA_Test1 1
2 TeacherA_Test2 1
3 TeacherB_Test1 2
4 TeacherX_Test1 3
[tests1]
ID test_name teacher
-- -------------- -------
1 TeacherA_Test1 1
2 TeacherA_Test2 1
3 TeacherB_Test1 2
4 TeacherY_Test1 3
5 TeacherY_Test2 3
Make a note of where the tables diverge. In this case the [teachers] tables diverge after ID=2. So, insert the new rows from [teachers1] into [teachers], putting [teachers1].[ID] into [teachers].[oldID] so we can map old IDs to new ones:
INSERT INTO [teachers] ([teacher], [oldID])
SELECT [teacher], [ID] FROM [teachers1] WHERE [ID]>2
So now we have
[teachers]
ID teacher oldID
-- -------- -----
1 TeacherA
2 TeacherB
3 TeacherX
4 TeacherY 3
Now when we append the new rows from [tests1] into [tests] we can use an INNER JOIN on [teachers].[oldID] to adjust the foreign key values that get inserted:
INSERT INTO [tests] ([test_name], [teacher], [oldID])
SELECT [tests1].[test_name], [teachers].[ID], [tests1].[ID]
FROM [tests1] INNER JOIN [teachers] ON [tests1].[teacher]=[teachers].[oldID]
giving us
[tests]
ID test_name teacher oldID
-- -------------- ------- -----
1 TeacherA_Test1 1
2 TeacherA_Test2 1
3 TeacherB_Test1 2
4 TeacherX_Test1 3
5 TeacherY_Test1 4 4
6 TeacherY_Test2 4 5
Notice how the [teacher] foreign key has been mapped from the value 3 in [tests1] to 4 in [tests], reflecting the new [teachers].[ID] value for 'TeacherY'.
You can then repeat the process for child tables of [tests].
(Once the cleanup is complete you can remove the table links and drop the [oldID] columns.)
Is there any way to fix this programmatically?
No. This must be done by a human capable of reading and understanding the data and making decisions.
Create a query with an inner join between table one and table two, another query with an outer join between table one and table two, and another query with an outer join between table two and table one.
Now you can study the differences and decide which version of similar records to keep, and which records are completely new and should be inserted, some with a new primary key.

optimizing child/parent structure in one table with a lot of data

I have a table which has a simple parent child structure
products:
- id
- product_id
- time_created
- ... a few other columns
It is a parent if product_id IS NULL. Product id behaves here like parent_id. Data inside looks like this:
id | product_id
---+-----------
 1 | NULL
 2 | 1
 3 | 1
 4 | NULL
 4 | 4
This table is updated every night; new versions are added.
Every user uses many of these products, but only one version of each. A user is notified when new rows are added for a product_id.
He can stop using id 2 and start using id 3. Another user will continue using id 2, etc.
The products table is updated every night and grows pretty fast. There are around 500,000 rows at the moment, every night adds around 20,000, probably 5-7 million new rows per year.
Is there a way to optimize this database/table structure? Should I change anything? Is it a problem to have so much data in one table?
Your question is not clear. The sample data is suggesting that the parent-child relationship is only one level deep. If so, this is not a particularly hard problem. You can create a query to look up the most recent product id for each product -- and I'm assuming this is the one with the maximum id:
select id, product_id,
       max(id) over (partition by coalesce(product_id, id)) as biggest_id
from products;
This then acts as a lookup table to get the biggest id. It would produce:
id | product_id | biggest_id
---+------------+-----------
 1 | NULL       | 3
 2 | 1          | 3
 3 | 1          | 3
 4 | NULL       | 4
 4 | 4          | 4
If your table has deeper hierarchies, you can solve the problem using recursive CTEs, or by doing the calculation when the table is updated.
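A runnable version of that window-function query, in SQLite (3.25+ for window functions) via Python. The duplicated id 4 in the sample data looks like a typo, so the second version row uses id 5 here:

```python
import sqlite3  # the bundled SQLite must be 3.25+ for window functions

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, product_id INTEGER);
INSERT INTO products VALUES (1, NULL), (2, 1), (3, 1), (4, NULL), (5, 4);
""")

# For each row, the biggest (most recent) id within its parent group,
# where a parent row groups with itself via COALESCE(product_id, id).
rows = conn.execute("""
SELECT id, product_id,
       MAX(id) OVER (PARTITION BY COALESCE(product_id, id)) AS biggest_id
FROM products
ORDER BY id
""").fetchall()
print(rows)
```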

Frequently finding the first OrderLine for each Orders

Given the OrderLine table below:
OrderID OrderLineID
======= ===========
1 1
1 2
2 3
3 4
1 5
3 6
... ...
... ...
221 123 365 282
What is the most efficient way to find the FIRST OrderLine for each order, given that this information is needed every now and then by users?
This is my SQL to find the first OrderLine, but it takes about 3-5 seconds every time it executes (about 300k rows).
SELECT OrderID, MIN(OrderLineID)
FROM OrderLine
GROUP BY OrderID
It's very expensive to repeat this every time I need to find the first order line to join with another table. Given that changing the table structure is not an option, what possible solutions do I have to improve this?
Try adding a composite index on (OrderID, OrderLineID).
(You say you can't change the table structure. If you were allowed to change it, you could add a flag that identifies the first line of every order, then index that flag.)
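A small sketch of the suggested index plus the MIN/GROUP BY query, in SQLite via Python; with the composite index in place, the aggregate can typically be answered from the index alone rather than a full table scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderLine (OrderID INTEGER, OrderLineID INTEGER);
INSERT INTO OrderLine VALUES (1,1), (1,2), (2,3), (3,4), (1,5), (3,6);

-- Composite index: rows for each OrderID are stored together, already
-- sorted by OrderLineID, so MIN per group is a cheap index lookup.
CREATE INDEX idx_orderline ON OrderLine (OrderID, OrderLineID);
""")

firsts = conn.execute("""
SELECT OrderID, MIN(OrderLineID)
FROM OrderLine
GROUP BY OrderID
ORDER BY OrderID
""").fetchall()
print(firsts)
```

On a real 300k-row table, check the query plan (EXPLAIN QUERY PLAN in SQLite, the execution plan in SQL Server/Oracle) to confirm the index is actually used.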