Create materialised view without primary key column - sql

Is it possible to create a materialised view without using one of the primary key columns?
Use Case:-
I have a table with a composite primary key of user_id and app_id. I want to create a view to fetch data on the basis of app_id, regardless of user_id. I am trying to create a materialised view, but Cassandra does not allow me to do so if I keep only one of the primary key columns.
I know that I can use "allow filtering", but this will not give 100% accuracy in the data.

In Cassandra, a materialized view must always include all of the existing primary key components, but they can appear in a different order. So in this case you can create an MV with primary key (app_id, user_id), though this may lead to big partitions if you have a very popular application.
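In CQL that looks like the following sketch; the base table layout and column names here are assumptions, not from the question:

```sql
-- Assumed base table (illustrative names):
CREATE TABLE users_by_app_user (
    user_id int,
    app_id  int,
    data    text,
    PRIMARY KEY (user_id, app_id)
);

-- The view must include both primary key columns, but app_id can come first.
-- Every primary key column needs an IS NOT NULL restriction:
CREATE MATERIALIZED VIEW users_by_app AS
    SELECT * FROM users_by_app_user
    WHERE app_id IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (app_id, user_id);
```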
But I suggest just creating a second table with the necessary primary key and populating it from your application. It can even be more performant than having a materialized view, because a materialized view needs to read data from disk every time you insert/update/delete a record in the base table. Also take into account that materialized views in Cassandra are an experimental feature and have quite a lot of known problems.
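A sketch of the second-table approach, with assumed table and column names; the application writes every change to both tables, for example in a logged batch:

```sql
-- Query-specific duplicate table, partitioned by app_id:
CREATE TABLE users_by_app (
    app_id  int,
    user_id int,
    data    text,
    PRIMARY KEY (app_id, user_id)
);

-- The application keeps both copies in sync on every write:
BEGIN BATCH
    INSERT INTO users_by_app_user (user_id, app_id, data) VALUES (1, 42, 'x');
    INSERT INTO users_by_app (app_id, user_id, data) VALUES (42, 1, 'x');
APPLY BATCH;
```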

Related

Error while dropping column from a table with secondary index (Scylladb)

While dropping a column from a table that has a secondary index, I get the following error. I am using ScyllaDB version 3.0.4.
[Invalid query] message="Cannot drop column name on base table warehouse.myuser with materialized views"
Below are the example commands
create table myuser (id int primary key, name text, email text);
create index on myuser(email);
alter table myuser drop name;
I can successfully run the above statements in Apache Cassandra.
Default secondary indexes in Scylla are global and implemented on top of materialized views (as opposed to Apache Cassandra's local indexing implementation), which gives them new possibilities but also adds certain restrictions. Dropping a column from a table with materialized views is a complex operation, especially if the target column is selected by one of the views or its liveness can affect view row liveness. In order to avoid these problems, dropping a column is unconditionally impossible when there are materialized views attached to a table. The error you see is a combination of that restriction and the fact that Scylla's index uses a materialized view underneath to store the corresponding base key for each row.
The obvious workaround is to drop the index first, then drop the column and recreate the index, but that of course takes time and resources.
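A sketch of that workaround for the example in the question; the index name is an assumption based on the default <table>_<column>_idx naming:

```sql
-- Drop the auto-created index (its name is an assumption), then the column,
-- then recreate the index:
DROP INDEX warehouse.myuser_email_idx;
ALTER TABLE warehouse.myuser DROP name;
CREATE INDEX ON warehouse.myuser (email);
```

Note that recreating the index triggers a full rebuild, which is where the time and resource cost mentioned above comes from.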
However, in some cases columns could be allowed to be dropped from the base table even if it has materialized views, particularly if the column is not selected in any view and its liveness has no impact on view rows. For reference, I created an issue requesting this in our bug tracker: https://github.com/scylladb/scylla/issues/4448

Creating Oracle tables with foreign keys that reference primary keys on materialized views

I have several materialized views in Oracle which I can query to get information.
Now I want to create several tables with foreign keys referencing those MVs and to do so, I have already "added" the corresponding primary keys to the MVs (as stated in adding primary key to sql view).
Then, when I execute my SQL CREATE TABLE statement, I get an Oracle error (ORA-02270: no matching unique or primary key for this column-list) at position 0, right at the beginning...
Am I doing something wrong? Is it possible what I am trying to do?
If not, how is it usually done?
When materialized views are referenced by other tables' foreign keys, you have to pay attention to your views' refresh method and how it affects those foreign keys.
Two things may prevent you from refreshing your materialized views:
1) The data in the tables referencing your views may reference rows that need to be updated or deleted. In that case you have to fix your data.
2) Your views' refresh method is complete. In a complete refresh, Oracle deletes all data in your mview tables and repopulates them by rerunning their queries (see the Oracle documentation on refresh types), while in a fast refresh only the differences are applied to your mview tables. A fast refresh is incremental, and it fails only if your data does not respect the foreign keys.
Now, if there are mviews that can't be created with fast refresh (Oracle calls these "complex queries"), you can alter the constraints on those mviews to be deferrable.
That way even a complete refresh will work, because Oracle validates deferrable constraints only at the end of the current transaction. Therefore, as long as your refresh is atomic, Oracle will issue a DELETE and then INSERT all rows back, all in one transaction.
In other words, in the next command to refresh your mview, keep the parameter ATOMIC_REFRESH set to TRUE:
dbms_mview.refresh(LIST => 'MVIEW', METHOD => 'C', ATOMIC_REFRESH => TRUE);
By the way, this parameter's default value is TRUE, so you can simply omit it and it will work.
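A hedged sketch of the two pieces together; the table, column and mview names are illustrative:

```sql
-- 1) Make the foreign key deferrable, so it is checked only at commit time:
ALTER TABLE child_table ADD CONSTRAINT fk_child_mv
    FOREIGN KEY (mv_id) REFERENCES my_mview (id)
    DEFERRABLE INITIALLY DEFERRED;

-- 2) Atomic complete refresh: the DELETE and INSERT run in one transaction,
--    so the deferred constraint is validated only at the end.
BEGIN
    DBMS_MVIEW.REFRESH(LIST => 'MY_MVIEW', METHOD => 'C', ATOMIC_REFRESH => TRUE);
END;
/
```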
The documentation states that:
View Constraints
Oracle does not enforce view constraints. However, operations on
views are subject to the integrity constraints defined on the
underlying base tables. This means that you can enforce constraints on
views through constraints on base tables.
and also:
View constraints are a subset of table constraints and are subject to
the following restrictions:
...
View constraints are supported only in DISABLE NOVALIDATE mode. You cannot specify any other mode. You must specify the keyword DISABLE
when you declare the view constraint. You need not specify NOVALIDATE
explicitly, as it is the default.
...
In practice, the above means that although constraints on views can be created, they are disabled and never enforced, as if they did not exist at all.
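For illustration, this is the only form in which Oracle accepts such a constraint (names are made up):

```sql
-- A view constraint can be declared, but only in DISABLE NOVALIDATE mode,
-- and it is never enforced:
CREATE OR REPLACE VIEW emp_v AS
    SELECT employee_id, last_name FROM employees;

ALTER VIEW emp_v
    ADD CONSTRAINT emp_v_pk PRIMARY KEY (employee_id) DISABLE NOVALIDATE;
```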
Apart from this, think for a moment what sense a foreign key constraint on a table would make if it referred to a materialized view:
tables are always "online" and always have "fresh" data
materialized views can contain stale data
Imagine this case: you insert a record X into some table. This record is not visible in the materialized view yet, because the view has not been refreshed at this moment. Then you try to insert record X into another table that has a foreign key constraint pointing to that materialized view. What should the database do? Should it reject the insert statement (since X is not yet visible in the view and the foreign key exists)? If so, what about data integrity? Maybe it should block and wait until the view is refreshed? Should it force the view to start refreshing in such a case?
As you can see, such a case raises many questions and difficult implementation problems, so Oracle simply does not enforce constraints on views.

PostgreSQL - "polymorphic table" vs 3 tables

I am using PostgreSQL 9.5 (but upgrade is possible to say 9.6).
I have permissions table:
CREATE TABLE public.permissions
(
    id integer NOT NULL DEFAULT nextval('permissions_id_seq'::regclass),
    item_id integer NOT NULL,
    item_type character varying NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    CONSTRAINT permissions_pkey PRIMARY KEY (id)
);
-- skipping indices declaration, but they would be present
-- on item_id, item_type
And 3 tables for many-to-many associations
-companies_permissions (+indices declaration)
CREATE TABLE public.companies_permissions
(
    id integer NOT NULL DEFAULT nextval('companies_permissions_id_seq'::regclass),
    company_id integer,
    permission_id integer,
    CONSTRAINT companies_permissions_pkey PRIMARY KEY (id),
    CONSTRAINT fk_rails_462a923fa2 FOREIGN KEY (company_id)
        REFERENCES public.companies (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT fk_rails_9dd0d015b9 FOREIGN KEY (permission_id)
        REFERENCES public.permissions (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE INDEX index_companies_permissions_on_company_id
    ON public.companies_permissions USING btree (company_id);
CREATE INDEX index_companies_permissions_on_permission_id
    ON public.companies_permissions USING btree (permission_id);
CREATE UNIQUE INDEX index_companies_permissions_on_permission_id_and_company_id
    ON public.companies_permissions USING btree (permission_id, company_id);
-permissions_user_groups (+indices declaration)
CREATE TABLE public.permissions_user_groups
(
    id integer NOT NULL DEFAULT nextval('permissions_user_groups_id_seq'::regclass),
    permission_id integer,
    user_group_id integer,
    CONSTRAINT permissions_user_groups_pkey PRIMARY KEY (id),
    CONSTRAINT fk_rails_c1743245ea FOREIGN KEY (permission_id)
        REFERENCES public.permissions (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT fk_rails_e966751863 FOREIGN KEY (user_group_id)
        REFERENCES public.user_groups (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE UNIQUE INDEX index_permissions_user_groups_on_permission_and_user_group
    ON public.permissions_user_groups USING btree (permission_id, user_group_id);
CREATE INDEX index_permissions_user_groups_on_permission_id
    ON public.permissions_user_groups USING btree (permission_id);
CREATE INDEX index_permissions_user_groups_on_user_group_id
    ON public.permissions_user_groups USING btree (user_group_id);
-permissions_users (+indices declaration)
CREATE TABLE public.permissions_users
(
    id integer NOT NULL DEFAULT nextval('permissions_users_id_seq'::regclass),
    permission_id integer,
    user_id integer,
    CONSTRAINT permissions_users_pkey PRIMARY KEY (id),
    CONSTRAINT fk_rails_26289d56f4 FOREIGN KEY (user_id)
        REFERENCES public.users (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT fk_rails_7ac7e9f5ad FOREIGN KEY (permission_id)
        REFERENCES public.permissions (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
);
CREATE INDEX index_permissions_users_on_permission_id
    ON public.permissions_users USING btree (permission_id);
CREATE UNIQUE INDEX index_permissions_users_on_permission_id_and_user_id
    ON public.permissions_users USING btree (permission_id, user_id);
CREATE INDEX index_permissions_users_on_user_id
    ON public.permissions_users USING btree (user_id);
I will have to run an SQL query like this many times:
SELECT
"permissions".*,
"permissions_users".*,
"companies_permissions".*,
"permissions_user_groups".*
FROM "permissions"
LEFT OUTER JOIN
"permissions_users" ON "permissions_users"."permission_id" = "permissions"."id"
LEFT OUTER JOIN
"companies_permissions" ON "companies_permissions"."permission_id" = "permissions"."id"
LEFT OUTER JOIN
"permissions_user_groups" ON "permissions_user_groups"."permission_id" = "permissions"."id"
WHERE
(companies_permissions.company_id = <company_id> OR
permissions_users.user_id IN (<user_ids> OR NULL) OR
permissions_user_groups.user_group_id IN (<user_group_ids> OR NULL)) AND
permissions.item_type = 'Topic'
Let's say we have about 10,000+ permissions and a similar number of records in the other tables.
Do I need to worry about performance?
I mean, I have 3 LEFT OUTER JOINs and it should return results pretty fast (say <200ms).
I was thinking about declaring 1 "polymorphic" table, something like:
CREATE TABLE public.permissables
(
    id integer NOT NULL DEFAULT nextval('permissables_id_seq'::regclass),
    permission_id integer,
    resource_id integer NOT NULL,
    resource_type character varying NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    CONSTRAINT permissables_pkey PRIMARY KEY (id)
);
-- skipping indices declaration, but they would be present
Then I could run query like this:
SELECT
permissions.*,
permissables.*
FROM permissions
LEFT OUTER JOIN
permissables ON permissables.permission_id = permissions.id
WHERE
permissions.item_type = 'Topic' AND
((permissables.resource_id IN (<user_ids>) AND permissables.resource_type = 'User') OR
(permissables.resource_id = <company_id> AND permissables.resource_type = 'Company') OR
(permissables.resource_id IN (<user_groups_ids>) AND permissables.resource_type = 'UserGroup'))
QUESTIONS:
Which option is better/faster? Maybe there is a better way to do this?
a) 4 tables (permissions, companies_permissions, permissions_user_groups, permissions_users)
b) 2 tables (permissions, permissables)
Do I need to declare indexes other than btree on permissions.item_type?
Do I need to run VACUUM ANALYZE on the tables a few times per day to keep the indexes effective (in both options)?
EDIT1:
SQLFiddle examples:
wildplasser suggestion (from comment), not working: http://sqlfiddle.com/#!15/9723f8/1
Original query (4 tables): http://sqlfiddle.com/#!15/9723f8/2
(I also removed backticks in wrong places, thanks #wildplasser)
I'd recommend abstracting all access to your permissions system into a couple of model classes. Unfortunately, I've found that permission systems like this sometimes end up being performance bottlenecks, and that it is sometimes necessary to significantly refactor your data representation.
So my recommendation is to keep the permission-related queries isolated in a few classes, and to keep the interface to those classes independent of the rest of the system.
An example of a good approach is what you have above: you don't actually join against the topics table; you already have the topic IDs you care about when you're constructing the permissions query.
An example of a bad interface would be a class interface that makes it easy to join the permissions tables into arbitrary other SQL.
I understand you asked the question in terms of SQL rather than a particular framework on top of SQL, but from the rails constraint names it looks like you are using such a framework, and I think taking advantage of it will be useful to your future code maintainability.
In the 10,000 rows cases, I think either approach will work fine.
I'm not actually sure the two approaches will be all that different. If you think about the generated query plans, and assuming you're fetching a small number of rows from the table, the joins might be handled with a loop against each table in exactly the same way the OR query would be, provided the indexes return a small number of rows.
I have not fed a plausible data set into Postgres to check whether that's what it actually does on real data, but I have reasonably high confidence that Postgres is smart enough to do so when it makes sense.
The polymorphic approach does give you a bit more control and if you run into performance problems you may want to check if moving to it will help.
If you choose the polymorphic approach, I'd recommend writing code to periodically check that your data is consistent, that is, that every resource_type and resource_id pair corresponds to an actual resource that exists in your system.
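One way to sketch such a consistency check in SQL, assuming the resource_type values 'User', 'Company' and 'UserGroup' map to the users, companies and user_groups tables:

```sql
-- Rows returned here point at resources that no longer exist:
SELECT p.*
FROM permissables p
LEFT JOIN users u       ON p.resource_type = 'User'      AND u.id = p.resource_id
LEFT JOIN companies c   ON p.resource_type = 'Company'   AND c.id = p.resource_id
LEFT JOIN user_groups g ON p.resource_type = 'UserGroup' AND g.id = p.resource_id
WHERE u.id IS NULL AND c.id IS NULL AND g.id IS NULL;
```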
I'd make that recommendation in any case where application concerns force you to denormalize your data such that database constraints are not sufficient to enforce consistency.
If you start running into performance problems, here are the sorts of things you may need to do in the future:
Create a cache in your application mapping objects (such as topics) to the set of permissions for those objects.
Create a cache in your application caching all the permissions a given user has (including the groups they are a member of) for the objects in your application.
Materialize the user group permissions. That is, create a materialized view that combines the user_group permissions with the user permissions and the user group memberships.
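A sketch of that materialization, assuming a hypothetical user_groups_users membership table (user_id, user_group_id):

```sql
-- Flatten direct and group-derived permissions into one per-user view:
CREATE MATERIALIZED VIEW user_effective_permissions AS
SELECT pu.user_id, pu.permission_id
FROM permissions_users pu
UNION
SELECT ugu.user_id, pug.permission_id
FROM permissions_user_groups pug
JOIN user_groups_users ugu ON ugu.user_group_id = pug.user_group_id;

-- Needs an explicit refresh after permission or membership changes:
REFRESH MATERIALIZED VIEW user_effective_permissions;
```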
In my experience the thing that really kills performance of permission systems is when you add something like permitting one group to be a member of another group. At that point you very quickly get to a point where you need caching or materialized views.
Unfortunately, it's really hard to give more specific advice without actually having your data and looking at real query plans and real performance. I think that if you prepare for future changes you'll be fine though.
Maybe it's an obvious answer, but I think the option with the 3 association tables should be just fine. SQL databases are good at join operations, and 10,000 records is not a big amount of data at all, so I am not sure what makes you think there will be a performance problem.
With proper indexes (btree should be OK) it should work fast. You can actually go a bit further and generate sample data for your tables to see how your query behaves on a realistic amount of data.
I also don't think you'll need to worry about running VACUUM manually.
Regarding option two, the polymorphic table: it may not be a very good idea, as you now have a single resource_id field that can point to different tables, which is a source of problems (for example, due to a bug you could end up with a record with resource_type=User and resource_id pointing to a Company; the table structure doesn't prevent it).
One more note: you don't say anything about the relations between User, UserGroup and Company. If they are all related too, it may be possible to fetch permissions using just user id(s), by also joining groups and companies to users.
And one more: you don't need ids in the many-to-many tables. Nothing bad happens if you have them, but it's enough to have permission_id and user_id and make them a composite primary key.
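For example, a sketch of permissions_users without the surrogate id:

```sql
-- Composite primary key replaces both the id column and the separate
-- unique index on (permission_id, user_id):
CREATE TABLE public.permissions_users
(
    permission_id integer NOT NULL REFERENCES public.permissions (id),
    user_id       integer NOT NULL REFERENCES public.users (id),
    CONSTRAINT permissions_users_pkey PRIMARY KEY (permission_id, user_id)
);

-- A separate index on user_id is still useful for lookups by user alone:
CREATE INDEX ON public.permissions_users (user_id);
```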
You can try to denormalize the many-to-many relations into a permissions field on each of the 3 tables (user, user_group, company).
You can use this field to store the permissions in JSON format, and use it only for reading (SELECTs). You can still use the many-to-many tables for changing the permissions of specific users, groups and companies; just write a trigger on them that updates the denormalized permissions field whenever the many-to-many table changes. With this solution you still get fast query execution on the SELECTs, while keeping the relationships normalized and in compliance with database standards.
Here is an example script that I wrote for MySQL for a one-to-many relation; a similar approach can be applied to your case as well:
https://github.com/martintaleski/mysql-denormalization/blob/master/one-to-many.sql
I have used this approach several times, and it makes sense when the SELECT statements outnumber, and matter more than, the INSERT, UPDATE and DELETE statements.
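As a rough PostgreSQL adaptation of that idea (the linked script is MySQL), assuming a jsonb permissions column on users; names and details are illustrative:

```sql
ALTER TABLE users ADD COLUMN permissions jsonb DEFAULT '[]';

-- Recompute the denormalized list for the affected user on every change:
CREATE OR REPLACE FUNCTION refresh_user_permissions() RETURNS trigger AS $$
DECLARE
    uid integer;
BEGIN
    IF TG_OP = 'DELETE' THEN
        uid := OLD.user_id;
    ELSE
        uid := NEW.user_id;
    END IF;
    UPDATE users u
    SET permissions = COALESCE(
        (SELECT jsonb_agg(pu.permission_id)
         FROM permissions_users pu
         WHERE pu.user_id = uid), '[]'::jsonb)
    WHERE u.id = uid;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER permissions_users_denorm
AFTER INSERT OR UPDATE OR DELETE ON permissions_users
FOR EACH ROW EXECUTE PROCEDURE refresh_user_permissions();
```

An UPDATE that changes user_id would need to refresh both the old and new user; the sketch above only handles the common insert/delete cases.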
In case you do not change your permissions often, materialized views might speed up your search enormously. I will prepare an example based on your setup later today and post it; afterwards, we can do some benchmarking.
Nevertheless, a materialized view requires a refresh after the underlying data changes. So that solution might be fast, but it will speed up your queries only if the base data does not change too often.

How to validate user's access to specific rows in a SQL table?

I'm working on a project and I'm new to both web apps and SQL, so bear with me. I'm building an API and I want to make sure that my users only have access to certain rows in a specific table: rows that have a foreign key to their customer id in another table, validated by user id in a third table. (A single Customer has multiple Users and owns multiple Assets. For right now, all of a Customer's Users can access any of its Assets, but no Customers share an Asset or a User.) The way I can think to do this is:
SELECT * FROM [Asset] WHERE Id=#AssetId AND CustomerId=(SELECT CustomerId FROM [User] WHERE UserId=#UserId);
This works, but with many entries in the Asset and User tables, this query could take a lot of time. That is bad, since every request made to my API that needs Asset data should be doing this check. I could set up an index; in fact, UserId is already a secondary key in User because it's a unique identifier from the auth provider, but I'm not sure if I should add an index for CustomerId in Asset. The Asset table should grow relatively slowly compared to some other tables (I have a messaging record table for auditing purposes), but I'm not sure if that's the right answer, or if there's some simpler answer that's more optimized. Or is this kind of query so fast at scale that I have nothing to worry about?
For your particular case, it looks like the perfect context to build a junction table between the User table and the Asset table. Both fields together become the primary key; individually, AssetId and UserId are foreign keys.
Let's say the junction table is called AssetUser.
Foreign keys :
CONSTRAINT [FK_AssetUser_User] FOREIGN KEY ([UserId]) REFERENCES [User]([UserId])
CONSTRAINT [FK_AssetUser_Asset] FOREIGN KEY ([AssetId]) REFERENCES [Asset]([AssetId])
Primary key :
CONSTRAINT [PK_AssetUser] PRIMARY KEY([AssetId], [UserId]));
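Putting the pieces together as one sketch (names follow the question; treat it as illustrative):

```sql
CREATE TABLE [AssetUser] (
    [AssetId] int NOT NULL,
    [UserId]  int NOT NULL,
    CONSTRAINT [PK_AssetUser] PRIMARY KEY ([AssetId], [UserId]),
    CONSTRAINT [FK_AssetUser_User]  FOREIGN KEY ([UserId])  REFERENCES [User]([UserId]),
    CONSTRAINT [FK_AssetUser_Asset] FOREIGN KEY ([AssetId]) REFERENCES [Asset]([AssetId])
);

-- The access check then becomes a single lookup on the primary key:
SELECT a.*
FROM [Asset] a
JOIN [AssetUser] au ON au.[AssetId] = a.[Id]
WHERE a.[Id] = @AssetId AND au.[UserId] = @UserId;
```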
You shouldn't worry about scale too much unless you are going to have a lot of data and/or performance is critical in your application. If so, you have the option to use Hadoop or to migrate to a NoSQL database.

SQL table performance with foreign key

I have a website that needs to do a lot of active searching of users. I have a User table which contains links to all the full user details but that is only really of interest when looking at your own account. When searching for other users, there is very limited information you need so in order to make searches faster and more efficient, every time you update your user details, the code writes an entry to a separate table called UserLight - which only contains about 8 columns and is all pure data - ie no links to other child tables or collection objects, just string data for speed. Each user can only have one UserLight entry at a time which is the summary representation of how their account appears to other users.
My question is about performance: does it matter that I am making UserId a foreign key constraint against the User table? That way you cannot create a UserLight entry without the corresponding row in User, and when you delete the User row it automatically cascades and deletes the UserLight entry. That is ideal and how I would like to have it, but I'm wondering whether having this FK constraint on the UserLight table slows down read or write operations on the table in any way. If it does, I am happy to drop the FK constraint, have a completely isolated table with no constraints or external references to other objects, and just manage housekeeping manually; but if the FK constraint doesn't affect performance at all, I would prefer to keep it.
It will not hamper your performance; on the contrary, it is preferred to have the data constrained, so as to avoid insert/delete/update anomalies.
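For illustration, a sketch of the described setup with a cascading foreign key (column names beyond UserId are assumptions):

```sql
-- UserLight keyed by UserId; deleting a User row removes its UserLight row.
CREATE TABLE UserLight (
    UserId      int PRIMARY KEY,
    DisplayName varchar(100) NOT NULL,
    -- ...remaining summary columns go here...
    CONSTRAINT FK_UserLight_User FOREIGN KEY (UserId)
        REFERENCES [User](UserId) ON DELETE CASCADE
);
```

The FK is checked only when UserLight rows are inserted or updated and when User rows are deleted; reads from UserLight are unaffected.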