Database support for immutable fields - sql

We're about to start a new project and I'm breaking down the models at the moment. Most of the entities I'm modelling are meant to be immutable.
While I can control this to a certain extent at the code level by using something like django-immutablemodel, I'd be more comfortable if I could enforce this at the database level as well.
I'm planning to use PostgreSQL, although I would be willing to consider alternatives if they supported this. From what I can tell, the two main ways to do this currently are:
Add a set of triggers to make sure that immutable fields aren't modified
Enforce the immutability through user rights (i.e. don't give user update rights to columns that you want immutable)
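A minimal PostgreSQL sketch of the second option, assuming a hypothetical `customer` table in which only `email` may change and a hypothetical application role `app_user`. Column-level privileges let the role update some columns but not others:

```sql
-- Hypothetical table: id, name and created_at are meant to be immutable.
CREATE TABLE customer (
    id         bigint PRIMARY KEY,
    name       text NOT NULL,
    email      text,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Hypothetical application login.
CREATE ROLE app_user LOGIN;

-- Start from nothing, then hand back only what the application needs.
REVOKE ALL ON customer FROM PUBLIC, app_user;
GRANT SELECT, INSERT ON customer TO app_user;
-- UPDATE is granted on the mutable column only.
GRANT UPDATE (email) ON customer TO app_user;
```

Connected as `app_user`, `UPDATE customer SET email = ...` succeeds while `UPDATE customer SET name = ...` fails with a permission error. The table owner and superusers are not restricted by this, and it cannot express "write once", which is why it is often combined with a trigger.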
I'd be interested if anyone has tried these methods and can comment on them or knows a better way to do this.
For some fields I want effectively write-once behaviour: if the field is NULL, allow it to be updated to a value, but never allow a field that already has a value to be updated. That would suggest I need to go down the trigger route.
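A minimal PostgreSQL sketch of the trigger route, covering both a fully immutable column and a write-once column. The `invoice` table and its columns are hypothetical; the `EXECUTE FUNCTION` syntax needs PostgreSQL 11 or later (older versions spell it `EXECUTE PROCEDURE`).

```sql
-- Hypothetical table: amount never changes, approved_at may be set once.
CREATE TABLE invoice (
    id          bigint PRIMARY KEY,
    amount      numeric(12,2) NOT NULL,
    approved_at timestamptz
);

CREATE FUNCTION invoice_guard() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Fully immutable column: any change is rejected.
    IF NEW.amount IS DISTINCT FROM OLD.amount THEN
        RAISE EXCEPTION 'invoice.amount is immutable';
    END IF;
    -- Write-once column: NULL -> value is allowed; changing or
    -- clearing an existing value is not.
    IF OLD.approved_at IS NOT NULL
       AND NEW.approved_at IS DISTINCT FROM OLD.approved_at THEN
        RAISE EXCEPTION 'invoice.approved_at can only be set once';
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER invoice_guard
    BEFORE UPDATE ON invoice
    FOR EACH ROW EXECUTE FUNCTION invoice_guard();
```

`IS DISTINCT FROM` is used instead of `<>` so that comparisons involving NULL behave as "changed" or "unchanged" rather than yielding NULL.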

If most of the entities (tables? you mention columns later on) are immutable, then place that information in separate tables and revoke all access privileges on those tables. Create a second table for the modifiable data, again with all privileges revoked, and link the two with a key. Now create a view built from both tables, and create an INSTEAD OF INSERT OR UPDATE OR DELETE trigger on it which restricts updates to the modifiable table.
There are other possible solutions, such as column privileges, but the above solution has the nice side effect that you can optimize read-only table access and that you only have to back up the few tables that are modifiable.
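A minimal PostgreSQL sketch of the split-table design, with hypothetical names (`product_fixed`, `product_mutable`, view `product`, role `shop_app`). The trigger function is `SECURITY DEFINER` because trigger functions otherwise run with the caller's rights, and the caller has no privileges on the base tables; pinning `search_path` is a common precaution for such functions.

```sql
-- Immutable part of the entity.
CREATE TABLE product_fixed (
    id  bigint PRIMARY KEY,
    sku text NOT NULL UNIQUE
);
-- Modifiable part, linked by the same key.
CREATE TABLE product_mutable (
    id    bigint PRIMARY KEY REFERENCES product_fixed (id),
    price numeric(12,2) NOT NULL
);

-- The view the application actually uses.
CREATE VIEW product AS
SELECT f.id, f.sku, m.price
FROM product_fixed f
JOIN product_mutable m USING (id);

CREATE FUNCTION product_write() RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER             -- runs with the owner's rights on the base tables
SET search_path FROM CURRENT
AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO product_fixed (id, sku) VALUES (NEW.id, NEW.sku);
        INSERT INTO product_mutable (id, price) VALUES (NEW.id, NEW.price);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        IF NEW.id IS DISTINCT FROM OLD.id OR NEW.sku IS DISTINCT FROM OLD.sku THEN
            RAISE EXCEPTION 'id and sku are immutable';
        END IF;
        -- Only the modifiable table is ever updated.
        UPDATE product_mutable SET price = NEW.price WHERE id = OLD.id;
        RETURN NEW;
    ELSE  -- DELETE
        RAISE EXCEPTION 'products cannot be deleted';
    END IF;
END;
$$;

CREATE TRIGGER product_write
    INSTEAD OF INSERT OR UPDATE OR DELETE ON product
    FOR EACH ROW EXECUTE FUNCTION product_write();

-- Application role: no rights on the base tables, only on the view.
CREATE ROLE shop_app LOGIN;
REVOKE ALL ON product_fixed, product_mutable FROM PUBLIC;
GRANT SELECT, INSERT, UPDATE, DELETE ON product TO shop_app;
```

Because `product_fixed` is never written after insert, it can be tuned for reads (for example with a lower fillfactor-free layout and aggressive indexing) independently of the mutable table.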

Related

How to design an immutable append-only database?

For a project I need to implement a database which is immutable and only allows new entries. Editing or deleting entries should be impossible in any case.
I was thinking about a database which allows editing and deleting only for admins (so only me). However, I'm unsure whether that is 100% safe, or whether someone could illegally obtain admin rights and forge the data. So the best solution would be a database which does not offer editing or deleting in the first place.
Suggestions appreciated! Thanks
Since version 9.5, PostgreSQL supports row security policies, which let you define SELECT, INSERT, UPDATE and DELETE policies based on the current user and/or column values in the table. That may be what you are looking for.
The simplest way is to GRANT INSERT, UPDATE and DELETE rights to users separately, but that may be insufficient for some business rules. However, many DBMSs (SQL Server, for example) support INSTEAD OF triggers, which can silently skip any DELETE/UPDATE and process an INSERT according to custom criteria implemented in the trigger code.
You can also define an updatable view with INSTEAD OF triggers that only allow inserts.
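A minimal PostgreSQL sketch of an append-only table using row security, with a hypothetical `ledger` table and `ledger_writer` role. UPDATE and DELETE are granted on purpose to show that the missing policies still block them: with row security enabled, a command that has no applicable policy matches no rows.

```sql
-- Hypothetical append-only ledger.
CREATE TABLE ledger (
    id         bigserial PRIMARY KEY,
    entry      text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Hypothetical writer role.
CREATE ROLE ledger_writer LOGIN;
GRANT SELECT, INSERT, UPDATE, DELETE ON ledger TO ledger_writer;
GRANT USAGE ON SEQUENCE ledger_id_seq TO ledger_writer;

ALTER TABLE ledger ENABLE ROW LEVEL SECURITY;
-- Apply the policies to the table owner as well. Superusers and roles
-- with BYPASSRLS still bypass them.
ALTER TABLE ledger FORCE ROW LEVEL SECURITY;

-- Only SELECT and INSERT policies exist, so UPDATE and DELETE are
-- default-denied: they run without error but affect zero rows.
CREATE POLICY ledger_read   ON ledger FOR SELECT USING (true);
CREATE POLICY ledger_append ON ledger FOR INSERT WITH CHECK (true);
```

Note that a denied UPDATE or DELETE reports "0 rows" rather than raising an error; if the application should be told loudly, a trigger that raises an exception is the usual complement.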
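A minimal PostgreSQL sketch combining both ideas, with hypothetical names (`event_log`, `log_app`). PostgreSQL has no INSTEAD OF triggers on tables, so a BEFORE trigger that raises an error plays that role; a separate statement-level trigger is needed because row triggers do not fire for TRUNCATE.

```sql
-- Hypothetical append-only table.
CREATE TABLE event_log (
    id         bigserial PRIMARY KEY,
    payload    text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Layer 1: privileges. The application role can only read and append.
CREATE ROLE log_app LOGIN;
REVOKE ALL ON event_log FROM PUBLIC;
GRANT SELECT, INSERT ON event_log TO log_app;
GRANT USAGE ON SEQUENCE event_log_id_seq TO log_app;

-- Layer 2: a trigger that also stops the table owner from editing history.
CREATE FUNCTION reject_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION '% on % is not allowed (append-only)', TG_OP, TG_TABLE_NAME;
END;
$$;

CREATE TRIGGER event_log_no_update_delete
    BEFORE UPDATE OR DELETE ON event_log
    FOR EACH ROW EXECUTE FUNCTION reject_change();

-- Row triggers do not fire for TRUNCATE, so block it separately.
CREATE TRIGGER event_log_no_truncate
    BEFORE TRUNCATE ON event_log
    FOR EACH STATEMENT EXECUTE FUNCTION reject_change();
```

A superuser can still disable or drop the triggers, so this raises the bar considerably but does not make tampering by an administrator impossible.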

Grant/Revoke Oracle SQL

I want to use GRANT/REVOKE and create roles in Oracle APEX. However, I want to do this not per user but per row of data in a table: e.g. based on the ID in the table, I want to revoke a user's permission to update that data. Is this possible, and what is the best way to do it?
You can't do this for a specific row in your table. You can only issue the GRANT/REVOKE for a table as a whole.
http://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqljgrant.html
You would have to deal with updates to a specific data in your table at the application (PL/SQL) level.
A couple of options:
You could use Oracle Label Security.
Or you can create a view on top of the table whose query filters on the current user's EMPLOYEE_ID.
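A minimal Oracle sketch of the view option. The `app_users` mapping table, the `timesheets` table and the `timesheet_user` role are hypothetical; in APEX the signed-in user is normally an application user (available via `V('APP_USER')`) rather than the database session user, so the lookup would differ there.

```sql
-- Hypothetical mapping from database user name to employee.
CREATE TABLE app_users (
    username    VARCHAR2(128) PRIMARY KEY,
    employee_id NUMBER NOT NULL UNIQUE
);

-- Hypothetical data table.
CREATE TABLE timesheets (
    id          NUMBER PRIMARY KEY,
    employee_id NUMBER NOT NULL,
    hours       NUMBER(5,2) NOT NULL
);

-- Users see and update only their own rows. EMPLOYEE_ID is not exposed,
-- so a row cannot be reassigned to another employee through the view.
CREATE OR REPLACE VIEW my_timesheets AS
SELECT t.id, t.hours
FROM   timesheets t
WHERE  t.employee_id = (SELECT u.employee_id
                        FROM   app_users u
                        WHERE  u.username = SYS_CONTEXT('USERENV', 'SESSION_USER'));

-- Hypothetical role: no direct access to TIMESHEETS, only to the view.
CREATE ROLE timesheet_user;
GRANT SELECT, UPDATE (hours) ON my_timesheets TO timesheet_user;
```

The filtering logic lives in the view, so every new rule means editing view definitions; that is the trade-off the next answer points out.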
A few notes, having done things like this in the past on PostgreSQL.
It is possible to create views on top of data for this sort of thing.
Row level security is what you usually want to do. See http://www.dba-oracle.com/concepts/restricting_access.htm for a tutorial.
The first approach has all the issues with the second approach and then some. The reason is that you have to write a policy engine into your view. In the second approach you get a policy engine you can just apply policies to.
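In Oracle, the built-in row-level security mechanism is Virtual Private Database, configured through the `DBMS_RLS` package (an Enterprise Edition feature, so check your licence). A minimal sketch, assuming a hypothetical `orders` table with an `owner_name` column and that the policy is created in the table owner's schema:

```sql
-- Hypothetical table recording which database user owns each row.
CREATE TABLE orders (
    id         NUMBER PRIMARY KEY,
    owner_name VARCHAR2(128) NOT NULL,
    status     VARCHAR2(20)  NOT NULL
);

-- Policy function: returns a predicate that Oracle appends to the
-- WHERE clause of every statement the policy covers.
CREATE OR REPLACE FUNCTION orders_own_rows (
    p_schema IN VARCHAR2,
    p_object IN VARCHAR2
) RETURN VARCHAR2
AS
BEGIN
    RETURN 'owner_name = SYS_CONTEXT(''USERENV'', ''SESSION_USER'')';
END;
/

-- Apply it to UPDATE and DELETE only; everyone can still read all rows.
BEGIN
    DBMS_RLS.ADD_POLICY(
        object_schema   => USER,
        object_name     => 'ORDERS',
        policy_name     => 'ORDERS_OWN_ROWS',
        function_schema => USER,
        policy_function => 'ORDERS_OWN_ROWS',
        statement_types => 'UPDATE,DELETE',
        update_check    => TRUE);
END;
/
```

Because the predicate is injected into every affected statement, keep it cheap and index-friendly; this is where the performance notes below come into play.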
A few notes from experience:
You really want to avoid looking up permission data from other rows in a table if you are ever going to review things many rows at a time.
Think through the performance implications of permission checking carefully before you implement it.
This sort of thing is usually a performance headache that takes some time to resolve. Be prepared for a long period of optimizations as use cases you had not thought of become important.

Keeping track of changes made by users in a multi-user SQL database

I'm working on the design of a relational database. It has several tables, and there are multiple users at the application level. For changes to a record in a certain table, I need to know which user made them, at what time, and what actually changed. There is a table holding the users' information, and that table should get the same tracking.
How should I do this in the SQL database design so I can let users see which one of them made these changes?
What you want is wiki-like versioning. Basically, for every table you want to keep versions of, you'll want to create at least a copy of that table with the fields you mentioned added (userid, when it was added). That's probably all there is to it, as long as you only need to track changes. Then, upon an edit, you just copy the current row into that history table and put the new version in the actual table. This way you can (hopefully) add versioning without having to touch existing presentation code.
It gets a little trickier if you also need to record other actions, such as the creation and deletion of rows.
If you need a code example, just have a look under the hood of a wiki such as https://mediawiki.org/
For starters, you can look at SQL Server's version-tracking mechanisms (row versioning or row changes). After that, you can look at the SQL Server Audit features; I think SQL Server Audit would best fit your needs.
On the other hand, if you want to do ad-hoc versioning, you MUST NOT use triggers. Imagine having to create insert, update and delete triggers for every table. That IS bad practice.
I think ad-hoc versioning should be avoided (it degrades performance and is difficult to support), but if it cannot be avoided, I would use CONTEXT_INFO to track the current user, and then build something that reads the table schema, gets the changes using SQL Server's change-tracking mechanisms, and stores them in a (tablename, changeduser, changedtime, column, prevValue, newValue) style table. I would not replicate every table for the changes.
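A minimal PostgreSQL sketch of the history-table approach, with hypothetical names (`article`, `article_history`). Here a trigger does the copying, which is one way to automate it (the next answer argues against triggers for this). Since the users exist only at the application level, the sketch assumes the application runs `SET LOCAL app.user_id = '...'` in each transaction; `app.user_id` is a made-up setting name. `EXECUTE FUNCTION` needs PostgreSQL 11 or later.

```sql
-- Hypothetical table to be versioned.
CREATE TABLE article (
    id    bigint PRIMARY KEY,
    title text NOT NULL,
    body  text NOT NULL
);

-- Copy of the table plus who/when/what columns.
CREATE TABLE article_history (
    history_id bigserial PRIMARY KEY,
    id         bigint NOT NULL,
    title      text NOT NULL,
    body       text NOT NULL,
    changed_by text,                              -- application user, if set
    changed_at timestamptz NOT NULL DEFAULT now(),
    operation  text NOT NULL                      -- 'UPDATE' or 'DELETE'
);

-- Before a row is changed or deleted, save the old version.
-- Inserts (row creation) are not recorded by this sketch.
CREATE FUNCTION article_versioning() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO article_history (id, title, body, changed_by, operation)
    VALUES (OLD.id, OLD.title, OLD.body,
            current_setting('app.user_id', true),  -- NULL if never set
            TG_OP);
    IF TG_OP = 'DELETE' THEN
        RETURN OLD;   -- let the delete proceed
    END IF;
    RETURN NEW;       -- let the update proceed
END;
$$;

CREATE TRIGGER article_versioning
    BEFORE UPDATE OR DELETE ON article
    FOR EACH ROW EXECUTE FUNCTION article_versioning();
```

Usage from the application, per transaction: `BEGIN; SET LOCAL app.user_id = 'alice'; UPDATE article SET title = 'New title' WHERE id = 1; COMMIT;`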
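A minimal T-SQL sketch of the narrow change-log table the answer describes, with hypothetical table and column names. For tagging the current user it swaps in `SESSION_CONTEXT` (SQL Server 2016 and later) instead of `CONTEXT_INFO`, because it stores named values rather than a 128-byte binary blob. How changes are captured (change tracking job, procedures, ...) is left out.

```sql
-- Narrow change log: one row per changed column (hypothetical names).
CREATE TABLE dbo.ChangeLog (
    ChangeLogId bigint IDENTITY(1,1) PRIMARY KEY,
    TableName   sysname       NOT NULL,
    RowKey      nvarchar(100) NOT NULL,
    ColumnName  sysname       NOT NULL,
    ChangedUser nvarchar(128) NULL,
    ChangedTime datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    PrevValue   nvarchar(max) NULL,
    NewValue    nvarchar(max) NULL
);
GO

-- On each connection, the application records who the end user is.
-- (Older versions would use SET CONTEXT_INFO instead.)
EXEC sys.sp_set_session_context @key = N'app_user', @value = N'alice';

-- Whatever captures the change writes one row per column and reads
-- the user back from the session.
INSERT INTO dbo.ChangeLog (TableName, RowKey, ColumnName, ChangedUser, PrevValue, NewValue)
VALUES (N'dbo.Customer', N'42', N'Email',
        CAST(SESSION_CONTEXT(N'app_user') AS nvarchar(128)),
        N'old@example.com', N'new@example.com');

-- History of one record, newest first.
SELECT ColumnName, PrevValue, NewValue, ChangedUser, ChangedTime
FROM dbo.ChangeLog
WHERE TableName = N'dbo.Customer' AND RowKey = N'42'
ORDER BY ChangedTime DESC;
```

Storing values as `nvarchar(max)` keeps one log table for all source tables, at the cost of losing the original data types.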

In Oracle: how can I tell if an SQL query will cause changes without executing it?

I've got a string containing an SQL statement. I want to find out whether the query will modify data or database structure, or if it will only read data. Is there some way to do this?
More info: In our application we need to let users enter SQL queries, mainly as part of the application's report system. These SQL queries should be allowed to read whatever they like from the database, but they shouldn't be allowed to modify anything: no updates, deletes, inserts, table drops, constraint removals, etc.
As of now I only test whether the first word in the string is "select", but this is too restrictive and too insecure.
To be sure, grant the login that the application uses for these queries only SELECT privileges on your tables.
Create a new user for that part of the application that only has select privileges. Bear in mind that you'll also need to create synonyms for all the tables/views that that "read-only" user will be able to view.
The "regular" part of your application will still be able to do other operations (insert, update, delete). Just the reporting will use the read-only user.
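A minimal Oracle sketch of such a reporting user, assuming the tables live in a hypothetical `APP` schema (`APP.ORDERS`, `APP.CUSTOMERS`) and that it is run by a DBA, since creating a synonym in another user's schema needs the CREATE ANY SYNONYM privilege. The user name and password are placeholders.

```sql
-- Hypothetical reporting login; replace the placeholder password.
CREATE USER report_ro IDENTIFIED BY "ChangeMe_123";
GRANT CREATE SESSION TO report_ro;

-- Read access only, object by object.
GRANT SELECT ON app.orders    TO report_ro;
GRANT SELECT ON app.customers TO report_ro;

-- Private synonyms so reports can say ORDERS instead of APP.ORDERS.
CREATE SYNONYM report_ro.orders    FOR app.orders;
CREATE SYNONYM report_ro.customers FOR app.customers;
```

An alternative to maintaining synonyms is to have the reporting session run `ALTER SESSION SET CURRENT_SCHEMA = app` after connecting; unqualified names then resolve in `APP`, while privileges are still checked against `REPORT_RO`.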
As Horacio suggests, it is also a good idea/practice to add "wrapper" views that only expose what you want to expose. Some sort of "public API". This can give you flexibility if you need to change the underlying tables and don't want to/can't change the reports to the new definitions of said tables. This might, however, be seen as a lot of "extra work".
I agree with others that the right thing to do is use a separate schema with limited access & privileges for those queries that should be read-only.
Another option, however, is to set the transaction read-only before executing the statement entered by the user (SET TRANSACTION READ ONLY).
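A minimal Oracle sketch of wrapping each user-supplied report query in its own read-only transaction. The `orders` table stands in for whatever the user's query touches and is hypothetical.

```sql
-- End any open transaction: SET TRANSACTION must be the first
-- statement of a transaction.
COMMIT;
SET TRANSACTION READ ONLY;

-- The user's report query goes here. INSERT, UPDATE, DELETE and
-- SELECT ... FOR UPDATE are rejected inside a read-only transaction.
SELECT * FROM orders WHERE status = 'OPEN';

-- Ends the read-only transaction.
COMMIT;
```

One caveat: a DDL statement implicitly commits and so ends the read-only transaction before it runs; this complements a low-privilege reporting user rather than replacing it.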
Create views to expose the data to end users. This is worthwhile for three reasons:
The end user doesn't see what your database really looks like.
You can provide a simpler way to extract some pieces of data.
You can create the view with a read-only constraint:
CREATE VIEW items (name, price, tax)
AS SELECT name, price, tax_rate
FROM item
WITH READ ONLY;
Something that has worked well for me in the past, but may not fit your situation:
Use stored procedures to implement an API for the application. All modifications are done via that API. The procedures exposed to the front end are all complete units of work, and those procedures are responsible for rights enforcement.
The users running the front end application are only allowed to call the API stored procedures and read data.
Since the exposed API does complete units of work that correspond to actions the user could take via the GUI, letting them run the procedures directly doesn't give them any additional abilities, nor does it allow them to corrupt the database accidentally.
SELECT * FROM table FOR UPDATE works even with only the SELECT privilege, and can still cause a lot of damage, because it locks the selected rows. If you want to be safe, read-only transactions are better.
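A minimal Oracle sketch of that pattern, with hypothetical names (`accounts`, `transfer_funds`, role `frontend`). The procedure uses definer's rights, so front-end users can change data only by calling it.

```sql
-- Hypothetical table owned by the application schema.
CREATE TABLE accounts (
    id      NUMBER PRIMARY KEY,
    balance NUMBER(12,2) NOT NULL
);

-- One complete unit of work exposed as a procedure. With AUTHID DEFINER
-- (the default) it runs with the owner's rights, so callers need no
-- UPDATE privilege on ACCOUNTS.
CREATE OR REPLACE PROCEDURE transfer_funds (
    p_from   IN NUMBER,
    p_to     IN NUMBER,
    p_amount IN NUMBER
) AUTHID DEFINER
AS
BEGIN
    -- Rights and business-rule checks live here, not in the front end.
    IF p_amount IS NULL OR p_amount <= 0 THEN
        RAISE_APPLICATION_ERROR(-20001, 'amount must be positive');
    END IF;

    UPDATE accounts SET balance = balance - p_amount WHERE id = p_from;
    IF SQL%ROWCOUNT <> 1 THEN
        RAISE_APPLICATION_ERROR(-20002, 'unknown source account');
    END IF;

    UPDATE accounts SET balance = balance + p_amount WHERE id = p_to;
    IF SQL%ROWCOUNT <> 1 THEN
        RAISE_APPLICATION_ERROR(-20003, 'unknown target account');
    END IF;

    COMMIT;
END;
/

-- Front-end users may read data and call the API, nothing else.
CREATE ROLE frontend;
GRANT SELECT ON accounts TO frontend;
GRANT EXECUTE ON transfer_funds TO frontend;
```

If any check fails, the unhandled error rolls back the changes made by that call, so a half-finished transfer is never committed.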

Ideas for Combining Thousand Databases into One Database

We have a SQL server that has a database for each client, and we have hundreds of clients. So imagine the following: database001, database002, database003, ..., database999. We want to combine all of these databases into one database.
Our thoughts are to add a siteId column, 001, 002, 003, ..., 999.
We are exploring options to make this transition as smoothly as possible. And we would LOVE to hear any ideas you have. It's proving to be a VERY challenging problem.
I've heard of a technique that would create a view that would match and then filter.
Any ideas guys?
Create a client database id for each of the client databases. You will use this id to keep the data logically separated. This is the "site id" concept, but you can use a derived key (identity field) instead of manually creating these numbers. Create a table that has database name and id, with any other metadata you need.
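A minimal T-SQL sketch of that lookup table, with hypothetical names (`dbo.ClientDatabase`, `ClientDbId`). The `database001` ... `database999` naming pattern comes from the question.

```sql
-- One row per legacy client database; ClientDbId becomes the "site id".
CREATE TABLE dbo.ClientDatabase (
    ClientDbId   int IDENTITY(1,1) PRIMARY KEY,
    DatabaseName sysname       NOT NULL UNIQUE,
    ClientName   nvarchar(200) NULL,
    MigratedAt   datetime2     NULL   -- set once the data has been moved
);

-- Register every existing client database (database001 ... database999).
INSERT INTO dbo.ClientDatabase (DatabaseName)
SELECT name
FROM sys.databases
WHERE name LIKE N'database[0-9][0-9][0-9]';
```

The import package can then look up `ClientDbId` by `DatabaseName` for whichever database it is processing.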
The next step would be to create an SSIS package that gets the ID for the database in question and adds it to the tables that have to have their data separated out logically. You then can run that same package over each database with the lookup for ID for the database in question.
After you have a unique ID for each client's data and have imported the data, you will have to alter your apps to fit the new schema (actually, do that before, or you are pretty much screwed).
If you want to do this in steps, you can create views or functions in the different "databases" so the old client can still hit the client's data, even though it has been moved. This step may not be necessary if you deploy with some downtime.
The method I propose is fairly flexible and can be applied to one client at a time, depending on your client application deployment methodology.
Why do you want to do that?
You can read about Multi-Tenant Data Architecture and also listen to Stack Overflow podcast #19 (around 40-50 min) about this design.
The "site-id" solution is what's done.
Another possibility that may not work out as well (but is still appealing) is multiple schemas within a single database. You can pull common tables into a "common" schema, and leave the customer-specific stuff in customer-specific schemas. In some database products, however, each schema is, effectively, a separate database. In other products (Oracle and DB2, for example) you can easily write queries that work across multiple schemas.
Also note that -- as an optimization -- you may not need to add siteId column to EVERY table.
Sometimes you have a "contains" relationship. It's a master-detail FK, often defined with a cascade delete so that detail cannot exist without the parent. In this case, the children don't need siteId because they don't have an independent existence.
Your first step will be to determine if these databases even have the same structure. Even if you think they do, you need to compare them to make sure they do. Chances are there will be some that are customized or missed an upgrade cycle or two.
Now, depending on the number of clients and the number of records per client, your tables may get huge. Are you sure this will not create a performance problem? At any rate, you may need to take a fresh look at indexing. You may need a much more powerful set of servers, and may also need to partition by client anyway for performance.
Next, yes each table will need a site id of some sort. Further, depending on your design, you may have primary keys that are now no longer unique. You may need to redefine all primary keys to include the siteid. Always index this field when you add it.
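A minimal T-SQL sketch of re-keying a consolidated table, with a hypothetical `dbo.Orders` table. Legacy `OrderId` values repeat across client databases, so the primary key becomes (`SiteId`, `OrderId`), and `SiteId` leads the indexes.

```sql
-- Hypothetical consolidated table.
CREATE TABLE dbo.Orders (
    SiteId     int           NOT NULL,
    OrderId    int           NOT NULL,   -- unique only within one site
    CustomerId int           NOT NULL,
    OrderDate  date          NOT NULL,
    Total      decimal(12,2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (SiteId, OrderId)
);

-- SiteId leads secondary indexes too, so per-client lookups stay cheap.
CREATE INDEX IX_Orders_Site_Customer ON dbo.Orders (SiteId, CustomerId);

-- Every query must now filter on SiteId.
DECLARE @SiteId int = 42;
SELECT OrderId, OrderDate, Total
FROM dbo.Orders
WHERE SiteId = @SiteId AND CustomerId = 7;
```

Foreign keys that referenced `OrderId` alone need to become composite (`SiteId`, `OrderId`) as well.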
Now all your queries, stored procs, views and UDFs will need to be rewritten to ensure that the siteid is part of them. Pay particular attention to any dynamic SQL. Otherwise you could be showing client A's information to client B, and clients don't tend to like that. We once brought a client from a separate database into the main application (when they decided they no longer wanted to pay for a separate server). The developer missed just one place where client_id had to be added. Unfortunately, that sent emails to every client concerning this client's proprietary information, and to make matters worse, it was a nightly process that ran in the middle of the night, so nobody knew about it until the next day. (The developer was very lucky not to get fired.) The point is: be very, very careful when you do this, and test, test, test, and test some more. Make sure to test all the automated behind-the-scenes processes as well as the UI.
What I was explaining in Florence towards the end of last year applies if you have to keep the database names and the logical layer of the database the same for the application. In that case you'd do the following:
Collapse all the data into consolidated tables in one master, consolidated database (hereafter referred to as the consolidated DB).
Those tables would have to have an identifier like SiteID.
Create the new databases with the existing names.
Create views with the old table names which use row-level security to query the tables in the consolidated DB, but using the SiteID to filter.
Set up the databases for cross-database ownership chaining so that the service accounts can't "accidentally" query the base tables in the consolidated DB. Access must happen through the views or through stored procedures and other constructs that will enforce row-level security. Now, if it's the same service account for all sites, you can avoid the cross DB ownership chaining and assign the rights on the objects in the consolidated DB.
Rewrite the stored procedures to either handle the change (since they now refer to views, they don't know to hit the base tables and include SiteID), or use INSTEAD OF triggers on the views to intercept update requests and put the appropriate site-specific information into the base tables.
If the data is large, you could look at using a partitioned view. This would simplify your access code, as all you'd have to maintain is the view; however, if the data is not large, just add a column to identify the customer.
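A minimal T-SQL sketch of the view-plus-trigger steps for one site, with hypothetical names: `ConsolidatedDB` holds the merged data, and `database042` is a recreated client database (SiteID 42) with an illustrative `Customer` table. Cross-database permissions still have to be arranged as described in the ownership-chaining step.

```sql
-- Hypothetical databases.
CREATE DATABASE ConsolidatedDB;
GO
CREATE DATABASE database042;
GO
USE ConsolidatedDB;
GO
CREATE TABLE dbo.Customer (
    SiteID     int           NOT NULL,
    CustomerID int           NOT NULL,
    Name       nvarchar(200) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (SiteID, CustomerID)
);
GO
USE database042;
GO
-- Same name and columns the application used before, filtered to one site.
-- UPDATE and DELETE through this view can only reach SiteID 42 rows.
CREATE VIEW dbo.Customer
AS
SELECT CustomerID, Name
FROM ConsolidatedDB.dbo.Customer
WHERE SiteID = 42;
GO
-- SiteID is not exposed by the view, so inserts are intercepted and
-- stamped with this database's SiteID.
CREATE TRIGGER dbo.Customer_Insert
ON dbo.Customer
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO ConsolidatedDB.dbo.Customer (SiteID, CustomerID, Name)
    SELECT 42, CustomerID, Name
    FROM inserted;
END;
GO
```

Hard-coding the SiteID per database keeps each view and trigger trivial; the definitions for all sites can be generated from a list of site IDs.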
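A minimal T-SQL sketch of a local partitioned view, with hypothetical member tables split by SiteId range. The non-overlapping CHECK constraints on the partitioning column let the optimizer skip member tables that cannot contain the requested SiteId.

```sql
-- Hypothetical member tables, one per range of clients.
CREATE TABLE dbo.Orders_Sites001_499 (
    SiteId  int NOT NULL CHECK (SiteId BETWEEN 1 AND 499),
    OrderId int NOT NULL,
    Total   decimal(12,2) NOT NULL,
    CONSTRAINT PK_Orders_Sites001_499 PRIMARY KEY (SiteId, OrderId)
);
CREATE TABLE dbo.Orders_Sites500_999 (
    SiteId  int NOT NULL CHECK (SiteId BETWEEN 500 AND 999),
    OrderId int NOT NULL,
    Total   decimal(12,2) NOT NULL,
    CONSTRAINT PK_Orders_Sites500_999 PRIMARY KEY (SiteId, OrderId)
);
GO

-- Access code only ever references the view.
CREATE VIEW dbo.AllOrders
AS
SELECT SiteId, OrderId, Total FROM dbo.Orders_Sites001_499
UNION ALL
SELECT SiteId, OrderId, Total FROM dbo.Orders_Sites500_999;
GO
```

A query such as `SELECT * FROM dbo.AllOrders WHERE SiteId = 600` should then only touch the second member table.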
Depending on what the data is and your security requirements the threat of cross contamination may be a show stopper.
Assuming you have considered this and deem it "safe enough", you may need or want to create VIEWs or impose some other access control to prevent customers from seeing each other's data.
IIRC a product called "Trusted Oracle" had the ability to partition data based on such a key (about the time Oracle 7 or 8 was out). The idea was that any given query would automagically have "and sourceKey = #userSecurityKey" (or some such) appended. The feature may have been rolled into later versions of the popular commercial product.
To expand on Gregory's answer, you can also make a parent SSIS package that calls the package doing the actual moving within a Foreach Loop container.
The parent package queries a config table and puts this in an object variable. The foreach loop then uses this recordset to pass variables to the package, such as your database name and any other details the package might need.
Your table could list all of your client databases and have a flag to mark when you are ready to move them. This way you are not sitting around running the SSIS package on 32,767 databases. I'm hooked on the Foreach Loop in SSIS.