I'm creating a (postgres) table that has:
CREATE TABLE workers (id INT PRIMARY KEY, deleted_at DATE, account_id INT)
I'd like to have a uniqueness constraint only across workers that have not been deleted. Is there a good way to achieve this in sql? As an example:
id | deleted_at | account_id
1  | NULL       | 1
# valid, was deleted
2  | yesterday  | 1
# invalid, duplicate active account
# 3 | NULL | 1
You want what Postgres calls a "partial index" (and other databases call a filtered index):
create unique index idx_workers_account_id on workers(account_id)
where deleted_at is null;
Here is the documentation on this feature.
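For illustration, a minimal sketch of how the index behaves, using the table and values from the question (column order: id, deleted_at, account_id):

INSERT INTO workers VALUES (1, NULL, 1);         -- ok: first active row for account 1
INSERT INTO workers VALUES (2, CURRENT_DATE, 1); -- ok: deleted rows are excluded by the WHERE clause
INSERT INTO workers VALUES (3, NULL, 1);         -- fails: second active row for account 1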
I have a table with logs which grew in size (~100M records) to the point where querying even the latest entries takes a considerable amount of time.
I am wondering whether there is a smart way to make access to the latest records (largest PK values) fast while also keeping inserts (appends) fast. I do not want to delete any data if possible; in fact, there is already a mechanism which monthly deletes logs older than N days.
Ideally, what I mean is for the query
select * from t_logs order by log_id desc fetch first 50 rows only
to run in a split second (up to a reasonable row count, say 500, if that matters).
The table is defined as follows:
CREATE TABLE t_logs (
log_id NUMBER NOT NULL,
method_name VARCHAR2(128 CHAR) NOT NULL,
msg VARCHAR2(4000 CHAR) NOT NULL,
type VARCHAR2(1 CHAR) NOT NULL,
time_stamp TIMESTAMP(6) NOT NULL,
user_created VARCHAR2(50 CHAR) DEFAULT user NOT NULL
);
CREATE UNIQUE INDEX logs_pk ON t_logs ( log_id ) REVERSE;
ALTER TABLE t_logs ADD (
CONSTRAINT logs_pk PRIMARY KEY ( log_id )
);
I am not really a DBA, so I am not familiar with all the performance tuning methods. I just use the logs a lot, and I was wondering if I could do something non-invasive to the data to ease my pain. To the best of my knowledge, here is what I did: tried re-computing statistics / re-analyzing the table (no effect), and looked into the query plan
-------------------------------------------
| Id | Operation | Name |
-------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | |
| 2 | WINDOW SORT PUSHED RANK| |
| 3 | TABLE ACCESS FULL | T_LOGS |
-------------------------------------------
I would expect the query to leverage the index to perform the lookup; why doesn't it? Maybe this is the reason it takes so long to find the results?
Version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Mr Cave, in the accepted answer, seems to be right:
alter table t_logs drop constraint logs_pk;
drop index logs_pk;
create unique index logs_pk on t_logs ( log_id );
alter table t_logs add (
constraint logs_pk primary key ( log_id )
);
Queries run super fast now, plan looks as expected:
-------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | |
| 2 | WINDOW NOSORT STOPKEY | |
| 3 | TABLE ACCESS BY INDEX ROWID| T_LOGS |
| 4 | INDEX FULL SCAN DESCENDING| LOGS_PK |
-------------------------------------------------
100 million rows isn't that large.
Why are you creating a reverse-key index for your primary key? Sure, that has the potential to reduce contention on inserts, but were you really constrained by contention? That would be pretty unusual. Maybe you have an unusual environment, but my guess is that someone was trying to prematurely optimize the design for inserts without considering what that did to queries.
My wager would be that a nice, basic design would be more than sufficient for your needs:
CREATE TABLE t_logs (
log_id NUMBER NOT NULL,
method_name VARCHAR2(128 CHAR) NOT NULL,
msg VARCHAR2(4000 CHAR) NOT NULL,
type VARCHAR2(1 CHAR) NOT NULL,
time_stamp TIMESTAMP(6) NOT NULL,
user_created VARCHAR2(50 CHAR) DEFAULT user NOT NULL
);
CREATE UNIQUE INDEX logs_pk ON t_logs ( log_id );
ALTER TABLE t_logs ADD (
CONSTRAINT logs_pk PRIMARY KEY ( log_id )
);
If you can't recreate the primary key for some reason, create an index on time_stamp and change your queries to use that:
CREATE INDEX log_ts ON t_logs( time_stamp );
SELECT *
FROM t_logs
ORDER BY time_stamp DESC
FETCH FIRST 100 ROWS ONLY;
I have a PostgreSQL database which heavily relies on events from the outside; e.g., an administrator changing or adding some fields or records might trigger a change in the overall field structure of other tables.
Therein lies the problem, however, as sometimes the fields changed by the trigger function are primary key fields. There is a table which uses two foreign key IDs as its primary key, as in the example below:
# | PK id1 | PK id2 | data |
0 | 1 | 1 | ab |
1 | 1 | 2 | cd |
2 | 1 | 3 | ef |
However, within one transaction (if I may call it that, since it is in fact a plpgsql function), the structure might be changed to:
# | PK id1 | PK id2 | data |
0 | 1 | 3 | ab |
1 | 1 | 2 | cd |
2 | 1 | 1 | ef |
Which, as you might have noticed, changed the 0th record's second primary key to 3 and the 2nd record's to 1, swapping the values they had before.
It is 100% certain that after the function has taken its effect there will be no collisions whatsoever, but I'm wondering, how can this be implemented?
I could, in fact, use a synthetic BIGSERIAL primary key, yet there is still a need for those two IDs to be UNIQUE constrained, so it wouldn't do the trick, unfortunately.
You can declare a constraint as deferrable, for example a primary key:
CREATE TABLE elbat (id int,
nmuloc int,
PRIMARY KEY (id)
DEFERRABLE);
You can then use SET CONSTRAINTS in a transaction to set deferrable constraints as deferred. That means that they can be violated temporarily during the transaction but must be fulfilled at the transaction's COMMIT.
Let's assume we have some data in our example table:
INSERT INTO elbat (id,
nmuloc)
VALUES (1,
1),
(2,
2);
We can now switch the IDs like this:
BEGIN TRANSACTION;
SET CONSTRAINTS ALL DEFERRED;
UPDATE elbat
SET id = 2
WHERE nmuloc = 1;
SELECT *
FROM elbat;
UPDATE elbat
SET id = 1
WHERE nmuloc = 2;
COMMIT;
There's no error even though the IDs are both 2 after the first UPDATE.
db<>fiddle
More on that can be found in the documentation, e.g. in CREATE TABLE (or ALTER TABLE) and SET CONSTRAINTS.
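If the primary key already exists as non-deferrable, you can drop and recreate it as deferrable in one statement. A minimal sketch, assuming PostgreSQL's default constraint name elbat_pkey:

ALTER TABLE elbat
  DROP CONSTRAINT elbat_pkey,
  ADD PRIMARY KEY (id) DEFERRABLE;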
I have the following three tables:
CREATE TABLE groups (
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE customer (
id SERIAL PRIMARY KEY,
ext_id VARCHAR NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE customer_in_group (
id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
group_id INT NOT NULL,
insert_date TIMESTAMP WITH TIME ZONE NOT NULL,
CONSTRAINT customer_id_fk
FOREIGN KEY(customer_id)
REFERENCES customer(id),
CONSTRAINT group_id_fk
FOREIGN KEY(group_id)
REFERENCES groups(id)
);
I need to find all of the groups which have not had any customer_in_group rows referencing them (via group_id) within the last two years. I then plan to delete all of the customer_in_group rows that reference those groups, and finally delete the groups themselves.
So basically, given the following two groups and the following three customer_in_group rows:
Group
| id | name | insert_date |
|----|--------|--------------------------|
| 1 | group1 | 2011-10-05T14:48:00.000Z |
| 2 | group2 | 2011-10-05T14:48:00.000Z |
Customer In Group
| id | group_id | customer_id | insert_date |
|----|----------|-------------|--------------------------|
| 1 | 1 | 1 | 2011-10-05T14:48:00.000Z |
| 2 | 1 | 1 | 2020-10-05T14:48:00.000Z |
| 3 | 2 | 1 | 2011-10-05T14:48:00.000Z |
I would expect to get back just group2, since group1 has a customer_in_group row referencing it that was inserted within the last two years.
I am not sure how I would write the query that would find all of these groups.
As a starter, I would recommend enabling ON DELETE CASCADE on the foreign keys of customer_in_group (a sketch of the DDL follows after the query below).
Then, you can just delete the rows you want from groups, and it will drop the dependent rows in the child table. For this, you can use not exists:
delete from groups g
where not exists (
select 1
from customer_in_group cig
where cig.group_id = g.id and cig.insert_date >= now() - interval '2 year'
)
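For the cascade part, the foreign key from the question can be recreated with ON DELETE CASCADE. A minimal sketch, reusing the constraint name from the question's DDL:

ALTER TABLE customer_in_group
  DROP CONSTRAINT group_id_fk,
  ADD CONSTRAINT group_id_fk
    FOREIGN KEY (group_id)
    REFERENCES groups(id)
    ON DELETE CASCADE;

With that in place, the DELETE above also removes the dependent customer_in_group rows.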
I'm looking to use two columns in Table A as foreign keys for either one of two tables: Table B or Table C. Using columns table_a.item_id and table_a.item_type_id, I want to force any new rows to either have a matching item_id and item_type_id in Table B or Table C.
Example:
Table A: Inventory
+---------+--------------+-------+
| item_id | item_type_id | count |
+---------+--------------+-------+
| 2 | 1 | 32 |
| 3 | 1 | 24 |
| 1 | 2 | 10 |
+---------+--------------+-------+
Table B: Recipes
+----+--------------+-------------------+-------------+----------------------+
| id | item_type_id | name | consistency | gram_to_fluid_ounces |
+----+--------------+-------------------+-------------+----------------------+
| 1 | 1 | Delicious Juice | thin | .0048472 |
| 2 | 1 | Ok Tasting Juice | thin | .0057263 |
| 3 | 1 | Protein Smoothie | heavy | .0049847 |
+----+--------------+-------------------+-------------+----------------------+
Table C: Products
+----+--------------+----------+--------+----------+----------+
| id | item_type_id | name | price | in_stock | is_taxed |
+----+--------------+----------+--------+----------+----------+
| 1 | 2 | Purse | $200 | TRUE | TRUE |
| 2 | 2 | Notebook | $14.99 | TRUE | TRUE |
| 3 | 2 | Computer | $1,099 | FALSE | TRUE |
+----+--------------+----------+--------+----------+----------+
Other Table: Item_Types
+----+-----------+
| id | type_name |
+----+-----------+
| 1 | recipes |
| 2 | products |
+----+-----------+
I want to have an inventory table where employees can enter inventory counts regardless of whether an item is a recipe or a product. I don't want separate product_inventory and recipe_inventory tables, as there are many operations I need to do across all inventory items regardless of item type.
One solution would be to create a reference table like so:
Table CD: Items
+---------+--------------+------------+-----------+
| item_id | item_type_id | product_id | recipe_id |
+---------+--------------+------------+-----------+
| 2 | 1 | NULL | 2 |
| 3 | 1 | NULL | 3 |
| 1 | 2 | 1 | NULL |
+---------+--------------+------------+-----------+
It just seems very cumbersome, plus I'd now need to add/remove products/recipes from this new table whenever they are added/removed from their respective tables. (Is there an automatic way to achieve this?)
CREATE TABLE [dbo].[inventory] (
[id] [bigint] IDENTITY(1,1) NOT NULL,
[item_id] [smallint] NOT NULL,
[item_type_id] [tinyint] NOT NULL,
[count] [float] NOT NULL,
CONSTRAINT [PK_inventory_id] PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
What I would really like to do is something like this...
ALTER TABLE [inventory]
ADD CONSTRAINT [FK_inventory_sources] FOREIGN KEY ([item_id],[item_type_id])
REFERENCES {[products] ([id],[item_type_id]) OR [recipes] ([id],[item_type_id])}
Maybe there is no solution as I'm describing it, so if you have any ideas where I can maintain the same/similar schema, I'm definitely open to hearing them!
Thanks :)
Since your products and recipes are stored separately and appear to mostly have separate columns, separate inventory tables are probably the correct approach, e.g.
CREATE TABLE dbo.ProductInventory
(
Product_id INT NOT NULL,
[count] INT NOT NULL,
CONSTRAINT FK_ProductInventory__Product_id FOREIGN KEY (Product_id)
REFERENCES dbo.Product (Product_id)
);
CREATE TABLE dbo.RecipeInventory
(
Recipe_id INT NOT NULL,
[count] INT NOT NULL,
CONSTRAINT FK_RecipeInventory__Recipe_id FOREIGN KEY (Recipe_id)
REFERENCES dbo.Recipe (Recipe_id )
);
If you need all types combined, you can simply use a view:
CREATE VIEW dbo.Inventory
AS
SELECT Product_id AS item_id,
2 AS item_type_id,
[Count]
FROM ProductInventory
UNION ALL
SELECT recipe_id AS item_id,
1 AS item_type_id,
[Count]
FROM RecipeInventory;
GO
If you create a new item_type, then you need to amend the DB design anyway to create a new table, so you would just need to amend the view at the same time.
Another possibility, would be to have a single Items table, and then have Products/Recipes reference this. So you start with your items table, each of which has a unique ID:
CREATE TABLE dbo.Items
(
item_id INT IDENTITY(1, 1) NOT NULL,
Item_type_id INT NOT NULL,
CONSTRAINT PK_Items__ItemID PRIMARY KEY (item_id),
CONSTRAINT FK_Items__Item_Type_ID FOREIGN KEY (Item_Type_ID) REFERENCES Item_Type (Item_Type_ID),
CONSTRAINT UQ_Items__ItemID_ItemTypeID UNIQUE (Item_ID, Item_type_id)
);
Note the unique key added on (item_id, item_type_id); this is important for referential integrity later on.
Then each of your sub tables has a 1:1 relationship with this, so your product table would become:
CREATE TABLE dbo.Products
(
item_id BIGINT NOT NULL,
Item_type_id AS 2 PERSISTED,
name VARCHAR(50) NOT NULL,
Price DECIMAL(10, 4) NOT NULL,
InStock BIT NOT NULL,
CONSTRAINT PK_Products__ItemID PRIMARY KEY (item_id),
CONSTRAINT FK_Products__Item_Type_ID FOREIGN KEY (Item_Type_ID)
REFERENCES Item_Type (Item_Type_ID),
CONSTRAINT FK_Products__ItemID_ItemTypeID FOREIGN KEY (item_id, Item_Type_ID)
REFERENCES dbo.Items (item_id, item_type_id)
);
A few things to note:
item_id is again the primary key, ensuring the 1:1 relationship.
the computed column item_type_id (AS 2), ensuring all item_type_id values are set to 2. This is key, as it allows the foreign key constraint to be added.
the foreign key on (item_id, item_type_id) back to the items table. This ensures that you can only insert a record into the product table if the original record in the items table has an item_type_id of 2.
A third option would be a single table for both recipes and products, making any columns not required by both nullable, as sketched below. This answer on types of inheritance is well worth a read.
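A minimal sketch of that single-table option; the table name InventoryItems and the column sizes are illustrative, with the type-specific columns made nullable:

CREATE TABLE dbo.InventoryItems
(
    item_id              INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    item_type_id         INT NOT NULL,         -- 1 = recipe, 2 = product
    name                 VARCHAR(50) NOT NULL,
    consistency          VARCHAR(20) NULL,     -- recipes only
    gram_to_fluid_ounces DECIMAL(10, 7) NULL,  -- recipes only
    price                DECIMAL(10, 2) NULL,  -- products only
    in_stock             BIT NULL,             -- products only
    is_taxed             BIT NULL              -- products only
);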
I think there is a flaw in your database design. The best way to solve your actual problem is to have recipes and products in one single table. Right now you have a redundant column in each table called item_type_id. That column is not worth anything unless you actually have the items in the same table. I say redundant because it has the same value for absolutely every entry in each table.
You have two options. If you cannot change the database design, work without foreign keys, and make the logic layer select from the correct tables.
Or, if you can change the database design, make products and recipes exist in the same table. You already have an item_type table, which can identify item categorization, so it makes sense to put all items in the same table.
A foreign key on a column (or pair of columns) can only reference one table. Think about apples and oranges: a column cannot refer to both oranges and apples; it must be either orange or apple.
As a side note, this can be somewhat achieved with PERSISTED COMPUTED columns; however, it introduces overhead and complexity.
Check This for Reference
You can add some computed columns to the Inventory table:
ALTER TABLE Inventory
    ADD _recipe_item_id AS CASE WHEN item_type_id = 1 THEN item_id END PERSISTED;
ALTER TABLE Inventory
    ADD _product_item_id AS CASE WHEN item_type_id = 2 THEN item_id END PERSISTED;
You can then add two separate foreign keys to the two tables, using those two columns instead of item_id. I'm assuming the item_type_id column in those two tables is already computed/constraint appropriately but if not you may want to consider that too.
Because these computed columns are NULL when the wrong type is selected, and because SQL Server doesn't check FK constraints if at least one column value is NULL, they can both exist and only one or the other will be satisfied at any time.
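A sketch of those two foreign keys; the constraint names are illustrative, and the referenced tables are the Recipes and Products tables from the question:

ALTER TABLE Inventory
    ADD CONSTRAINT FK_Inventory_Recipes
        FOREIGN KEY (_recipe_item_id) REFERENCES Recipes (id);
ALTER TABLE Inventory
    ADD CONSTRAINT FK_Inventory_Products
        FOREIGN KEY (_product_item_id) REFERENCES Products (id);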
I have a huge table that is partitioned by a partition id. Each partition can have a different number of fields in its unique constraint. Consider this table:
+----+---------+-------+-----+
| id | part_id | name  | age |
+----+---------+-------+-----+
| 1  | 1       | James | 12  |
| 2  | 1       | Mary  | 33  |
| 3  | 2       | James | 1   |
| 4  | 2       | Mike  | 19  |
| 5  | 3       | James | 12  |
+----+---------+-------+-----+
For part_id 1, I need a unique constraint on the fields name and age. part_id 2 needs a unique constraint on name, and so does part_id 3. I am open to any database that can accomplish this.
A classic RDBMS is designed to work with a stable schema. That means the structure of your tables, columns, indexes, and relations doesn't change often: each table has a fixed number of columns with fixed types, and it is hard/inefficient to make them dynamic.
SQL Server has filtered indexes.
So, you can create a separate unique index for each partition.
CREATE UNIQUE NONCLUSTERED INDEX IX_Part1 ON YourTable
(
name ASC,
age ASC
)
WHERE (part_id = 1)
CREATE UNIQUE NONCLUSTERED INDEX IX_Part2 ON YourTable
(
name ASC
)
WHERE (part_id = 2)
CREATE UNIQUE NONCLUSTERED INDEX IX_Part3 ON YourTable
(
name ASC
)
WHERE (part_id = 3)
These DDL statements are static, and the value of part_id is hard-coded in them. The optimiser is able to use such indexes in queries that have the same WHERE filter, so they are useful not just for enforcing the constraint.
You can always write a procedure that generates the text of the CREATE INDEX statement dynamically and runs it via EXEC/sp_executesql, as sketched below. There may be some clever use of triggers on YourTable to create it on the fly as the data in your table changes, but in the end it will be some static CREATE INDEX statement.
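A minimal sketch of that dynamic creation; the procedure wrapper is omitted and the index name is illustrative:

DECLARE @part_id INT = 4;
DECLARE @sql NVARCHAR(MAX) =
    N'CREATE UNIQUE NONCLUSTERED INDEX IX_Part' + CAST(@part_id AS NVARCHAR(10))
    + N' ON YourTable (name ASC) WHERE (part_id = ' + CAST(@part_id AS NVARCHAR(10)) + N');';
EXEC sp_executesql @sql;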
You can create these indexes in advance for all possible values of part_id, even if there are no such actual values in the table yet.
If you have thousands of part_id and you want to create thousands of such unique constraints, then your current schema may not be quite appropriate.
SQL Server allows max 999 nonclustered indexes per table. See Maximum Capacity Specifications for SQL Server.
Are you trying to build some variation of EAV (entity-attribute-value) model?
Maybe there are non-relational DBMSs that allow greater flexibility and would suit your task better, but I don't have experience with them.
In Oracle, per-partition uniqueness can be expressed with a single function-based unique index: a CASE expression yields NULL for partitions where a column does not participate, and Oracle rejects partially null composite keys whose non-null parts are identical. Matching the rules above (name and age for part_id 1, name alone for part_id 2 and 3):
CREATE UNIQUE INDEX idx_part_id_dynamic ON partition_table (
  part_id,
  name,
  CASE WHEN part_id = 1 THEN age END
);
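Illustrative checks against the sample data (hypothetical inserts; column order id, part_id, name, age):

INSERT INTO partition_table VALUES (6, 2, 'James', 40); -- fails: 'James' already exists within part_id 2
INSERT INTO partition_table VALUES (7, 1, 'James', 40); -- ok: ('James', 40) is new within part_id 1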