I'm trying to design a system where an administrator will have to approve changes to the data and other various administrative tasks -- add a user, add an admin etc.
My idea is to have a notification table that contains these notifications, but the problem is that a notification can be any of the previously mentioned types, i.e. its data is stored in one of many tables. Here is a picture to describe my current plan -- note I'm sure that it's not a proper ER diagram.
Also, the data goes into a pending table, that reflects the table it will eventually wind up in, provided the data is approved -- it's a staging ground of sorts. So, a pending_user is a user that is not in the user table. And as you can see the user table, amongst others, is not shown here, but one can use their imagination.
I'm concerned that the multiple null values in the pending table will have adverse effects that I'm not totally aware of, such as increased space usage and possibly increased query time. Also, I'm not sure how I'll implement the retrieval of these notifications. My naive approach is to select the first X notifications, analyze the rows to find the non-null column, retrieve the appropriate data and then load all the data into a response.
Is there a more straightforward pattern for this type of problem?
Thanks in advance for any help.
I think the traditional way is to provide various levels of access/read/write rights to users. These access rights define what actions a user can and can't perform. In this traditional approach, if a user has access to a certain function, they can perform it without further approval.
Also, traditionally there are some kind of audit logs that contain a trace of all important changes to the data. With such logs it would be possible to know who made a change (and when).
If you need to build a two-stage system, where a change has to go through an approval, I'd add a flag column to each important table that would indicate that values in the given row are not final and have to be approved. The table would store all historical changes to the data and with the help of this flag the system would know which variant is the latest approved version and which variant is pending and waiting for approval.
I would not try to make a single universal table that would hold data related to changes in many different tables. Each table is different and approval process for each table is likely to be different. I doubt that you'll have more than a dozen entities that are important enough to go through this approval process.
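As a rough illustration of the flag-column idea above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `user_versions`, its columns, and the three-valued `status` flag are all assumptions made for this example, not something taken from the question; a real schema would carry its own columns and probably record who approved the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: every proposed variant of a user row is kept,
# and the status flag says whether it is pending, approved or rejected.
conn.execute("""
    CREATE TABLE user_versions (
        version_id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id    INTEGER NOT NULL,
        name       TEXT    NOT NULL,
        status     TEXT    NOT NULL DEFAULT 'pending'
                   CHECK (status IN ('pending', 'approved', 'rejected'))
    )
""")

def propose_change(user_id, name):
    """Store a new variant of the row; it stays pending until reviewed."""
    with conn:
        cur = conn.execute(
            "INSERT INTO user_versions (user_id, name) VALUES (?, ?)",
            (user_id, name),
        )
    return cur.lastrowid

def review(version_id, approve):
    """Approve or reject a pending variant; False if it was not pending."""
    new_status = "approved" if approve else "rejected"
    with conn:
        cur = conn.execute(
            "UPDATE user_versions SET status = ? "
            "WHERE version_id = ? AND status = 'pending'",
            (new_status, version_id),
        )
    return cur.rowcount == 1

def current_name(user_id):
    """Return the latest approved value, ignoring pending variants."""
    row = conn.execute(
        "SELECT name FROM user_versions "
        "WHERE user_id = ? AND status = 'approved' "
        "ORDER BY version_id DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    return row[0] if row else None

def pending_notifications():
    """Everything an administrator still has to look at for this table."""
    return conn.execute(
        "SELECT version_id, user_id, name FROM user_versions "
        "WHERE status = 'pending' ORDER BY version_id"
    ).fetchall()
```

With this shape, the list of things waiting for approval is a plain query per table, so no row has to be inspected for its one non-null column.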
I am currently developing a backend system that has two endpoints of concern that interacts with a common relational database table. The main purpose of this system is an after-registration email verification system that has a time limit.
Let's suppose there are three tables that contain the users that are pending verification, already verified, and out of time for verification. These tables will contain similar attributes of the users. One user (represented by a unique ID) should exist in only one of these tables.
The first endpoint is the verification endpoint, which will be triggered by the user through a verification link (e.g., www.hello.com/verify?token=XXXX). The to-be-verified user will be looked up in the pending table. If the user is not found, it means that the token has expired and nothing further will be done. Otherwise, the user will be moved to the verified table. Moving, in this case, means that the selected row will be removed from the first table and then inserted into the second table. Therefore, at least 3 queries will be executed as below, the last two of which could run in a single transaction.
SELECT * FROM pending WHERE pending.id = :id;
DELETE FROM pending WHERE pending.id = :id;
INSERT INTO verified VALUES (...); -- the values we got from the SELECT
The second endpoint is the expired users cleaning endpoint, which will be triggered by some kind of scheduler. Let's assume it will be triggered exactly when the user's verification token has just expired. The overall task will be similar to the first endpoint, but the data row will be moved into the out of time table instead, and if the SELECT does not find the specified user, we assume that the user has already been verified.
SELECT * FROM pending WHERE pending.id = :id;
DELETE FROM pending WHERE pending.id = :id;
INSERT INTO outoftime VALUES (...); -- the values we got from the SELECT
I believe a problem may arise if these two endpoints are unfortunately triggered at the same time (i.e., the user verifies themselves right at the expiration time) by two concurrent processes. Both processes might manage to find the user with SELECT before either runs DELETE. Therefore, they will both also run INSERT, causing the user data to be inserted into two tables, violating our rule (one user should exist in only one of these tables).
An ideal solution for me would be to find a way to detect and "fail" one of the two processes, which will produce a similar result to the more common situation where that process starts after another process has already done its job (i.e., the second process will terminate when it fails to retrieve a user from SELECT). The choice of the process to be failed is not significant in this case; either of the two would work.
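One way the "fail one of the two processes" behaviour could be sketched is to run the three statements in one transaction and let the DELETE decide the winner: only the process whose DELETE actually removed a row goes on to INSERT. The example below uses Python's sqlite3 module purely for illustration; the `email` column, the helper names and the table allow-list are assumptions. Whether a concurrent loser really sees zero affected rows depends on the locking and isolation behaviour of the database eventually chosen, so treat this as a sketch rather than a guarantee.

```python
import sqlite3

# Table names cannot be bound as parameters, so restrict them explicitly.
ALLOWED_TARGETS = {"verified", "outoftime"}

def make_db():
    """Create the three hypothetical tables with an assumed (id, email) shape."""
    conn = sqlite3.connect(":memory:")
    for table in ("pending", "verified", "outoftime"):
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )
    return conn

def move_user(conn, user_id, target):
    """Move a user out of pending; return False if someone else already did."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unexpected target table: {target}")
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT id, email FROM pending WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return False  # already moved by the other endpoint
        deleted = conn.execute(
            "DELETE FROM pending WHERE id = ?", (user_id,)
        ).rowcount
        if deleted != 1:
            return False  # lost the race between SELECT and DELETE
        conn.execute(f"INSERT INTO {target} (id, email) VALUES (?, ?)", row)
    return True
```

Calling `move_user(conn, 1, "verified")` and then `move_user(conn, 1, "outoftime")` leaves the user in exactly one table; the second call simply reports failure.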
I am aware that using locks is one of the possible solutions theoretically, by covering each critical section with a lock acquisition and release. However, I am not sure whether it is a good practice or not in this problem.
Are there any common design patterns or ideas that could solve this problem?
Please note that no specific technology/database stacks have been chosen yet.
Thanks!
Edit: There are multiple tables in this case since I found that the frequency of access for each type of user may not be equal, so we could use different system specifications for each table. For example, the out-of-time table is more like an archive -- just a big pile of data with minimal access -- while the active table will be accessed every time there are changes to the user, so it might require better hardware, etc. Using a status column seems to be one solution, though. However, are there similar situations in system design where this kind of problem is inevitable? How would it be dealt with?
I need to create an Audit table that is going to track the actions (insert, update, delete) on the tables in my database and add a new row with the date, row id, table name and a few more details, so I will know what action happened and when.
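For comparison, the status-column variant mentioned in the edit can be sketched as a single conditional UPDATE: the row only changes if it is still pending, so the affected-row count tells each process whether it won. This again uses Python's sqlite3 module for illustration only, and the `users` table, its columns and the function names are assumptions.

```python
import sqlite3

def make_users_db():
    """One hypothetical users table with a status column instead of three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn

def transition(conn, user_id, new_status):
    """Change status only if the user is still pending; report whether we won."""
    with conn:
        cur = conn.execute(
            "UPDATE users SET status = ? WHERE id = ? AND status = 'pending'",
            (new_status, user_id),
        )
    return cur.rowcount == 1
```

Because the check and the change happen in one statement, there is no gap between reading and writing in which the other endpoint could slip in.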
So basically from my understanding I need a trigger for each table which is going to track insert/update/delete and a trigger on the database which is going to track new table creation.
My main problem is understanding how to connect between those things so when a new table is being created a trigger will be created for that table which is going to track the actions and add new rows for the Audit table as needed.
Is it possible to make a DDL trigger for create_table and inside of it another trigger for insert / update / delete ?
What you're hoping for is not possible. And I'd strongly advise that you'd be better off thinking about what you really want to achieve at a business level with auditing. It will yield a much simpler and more practical solution.
First up
...trigger on the database which is going to track new table creation.
I cannot stress enough how terrible this idea is. Who exactly has such unfettered access to your database that they can create tables without going through code review and QA? Those should of course be on the gated pathway towards production. Once you realise that schema changes should not happen ad hoc, it's patently obvious that you don't need triggers (which are by their very nature reactive) to do something because the schema changed.
Even if you could write such triggers: it's at a meta-programming level that simply isn't worth the effort of trying to foresee all possible permutations.
Better options include:
Requirements assessment and acceptance: This is new information in the system. What are the audit requirements?
Design review: New table; does it need auditing?
Test design: How to test the audit requirements?
Code Review: You've added a new table. Does it need auditing?
Not to mention features provided by tools such as:
Source Control.
Db deployment utilities (whether home-grown or third party).
Part two
... a trigger will be created for that table which is going to track the actions and add new rows for the Audit table as needed.
I've already pointed out why doing the above automatically is a terrible idea. Now I'm going a step further to point out that doing the above at all is also a bad idea.
It's a popular approach, and I'm sure to get some flak from people who've nicely compartmentalised their particular flavour of it, swearing blind how much time it "saves" them. (There may even be claims of it being a "business requirement", which I can assure you is more likely a misstated version of the real requirement.)
There are fundamental problems with this approach:
It's reactive instead of proactive. So it usually lacks context.
You'll struggle to audit attempted changes that get rolled back. (Which can be a nightmare for debugging and usually violates real business audit requirements.)
Interpreting audit will be a nightmare because it's just raw data. The information is lost in the detail.
As columns are added/renamed/deleted your audit data loses cohesion. (This is usually the least of problems though.)
These extra tables that always get updated as part of other updates can wreak havoc on performance.
Usually this style of auditing involves: every time a column is added to the "base" table, it's also added to the "audit" table. (This ultimately makes the "audit" table very much like a poorly architected persistent transaction log.)
Most people following this approach overlook the significance of NULLable columns in the "base" tables.
I can tell you from first-hand experience, interpreting such audit trails in any but the simplest of cases is not easy. The amount of time wasted is ridiculous: investigating issues, training others to be able to interpret them correctly, writing utilities to try to make working with these audit trails less painful, painstakingly documenting findings (because the information is not immediately apparent in the raw data).
If you have any sense of self-preservation you'll heed my advice.
Make it great
(Sorry, couldn't resist.)
A better approach is to proactively plan for what needs auditing. Push for specific business requirements. Note that different cases may need different auditing techniques:
If user performs action X, record A details about the action for legal traceability.
If user attempts to do Y but is prevented by system rules, record B details to track rule system integrity.
If user fails to log in, record C details for security purposes.
If system is upgraded, record D details for troubleshooting.
If certain system events occur, record E details ...
The important thing is that once you know the real business requirements, you won't be saying: "Uh, let's just track everything. It might be useful." Instead you'll:
Be able to produce a cleaner more appropriate and reliable design for each distinct kind of auditing.
Be able to test that it behaves as required!
Be able to use the audit data more easily whenever it's needed.
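To make the idea of purpose-specific auditing a little more concrete, here is a minimal sketch in Python with sqlite3. It covers two of the cases listed above: an action that succeeds and an attempt that a system rule prevents. Everything named here (the `products` table, the two audit tables, the positive-price rule, the `change_price` function) is invented for illustration; real requirements would dictate what gets recorded.

```python
import sqlite3
from datetime import datetime, timezone

def make_shop_db():
    """Hypothetical schema: one business table and two purpose-built audit tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL NOT NULL);
        CREATE TABLE price_change_audit (
            at TEXT NOT NULL, actor TEXT NOT NULL, product_id INTEGER NOT NULL,
            old_price REAL NOT NULL, new_price REAL NOT NULL);
        CREATE TABLE rejected_action_audit (
            at TEXT NOT NULL, actor TEXT NOT NULL,
            action TEXT NOT NULL, reason TEXT NOT NULL);
    """)
    return conn

def change_price(conn, actor, product_id, new_price):
    """Apply a price change, auditing both successes and rejected attempts."""
    now = datetime.now(timezone.utc).isoformat()
    if new_price <= 0:
        # The attempt is recorded in its own transaction, so the record
        # exists even though no business data was changed.
        with conn:
            conn.execute(
                "INSERT INTO rejected_action_audit VALUES (?, ?, ?, ?)",
                (now, actor,
                 f"set price of product {product_id} to {new_price}",
                 "price must be positive"),
            )
        return False
    with conn:  # the change and its audit row succeed or fail together
        row = conn.execute(
            "SELECT price FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no product with id {product_id}")
        conn.execute(
            "UPDATE products SET price = ? WHERE id = ?", (new_price, product_id)
        )
        conn.execute(
            "INSERT INTO price_change_audit VALUES (?, ?, ?, ?, ?)",
            (now, actor, product_id, row[0], new_price),
        )
    return True
```

Each audit table says who did what and why it matters, which is the context a generic trigger cannot know.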
DB: Oracle11gR2
OS: Linux
I want to drop the USER1 Oracle user, which has been locked for a few weeks now.
I can run "drop user USER1 cascade;" to drop the user, but before dropping it I want to confirm that nobody else is using, or has used, its objects since the user was locked.
How can I verify in Oracle that nobody is accessing or has accessed USER1's objects in the last month or so?
Is there a db query/view available which we can use to make sure it's safe to run DROP command?
Thanks
Ideally, you would have enabled auditing of accesses on the various objects when you locked the account and left that in place for however long you would need to feel comfortable. A month may be sufficient but there may be quarterly or annual processes as well that you need to consider.
Assuming that you didn't enable auditing at the time and don't want to enable auditing now and wait another month, there are less complete approaches that you may be able to use (with the understanding that those approaches are going to provide less certainty).
You can query v$segment_statistics (the available statistic names are listed in v$segstat_name) to look at a variety of statistics about the table segments. "logical reads" and "physical reads", for example, would show you how many block reads were done against a table segment, in total and from disk respectively. But it won't tell you what did the reads -- the background job that gathers statistics, for example, might read the data from the table. Those views accumulate data since the database was last restarted, which may be significantly longer or shorter than the time period that you're interested in. You can get a list of the available statistics in the Oracle documentation to fine-tune exactly what you want to look for.
You can query dba_hist_seg_stat rather than v$segment_statistics. That will break out the statistics by time period so it will tell you when reads and writes happened. But it won't tell you who did them. That also requires that you be licensed to use the AWR (otherwise querying the table may violate your license and create an issue if you're ever audited).
You can look at dba_dependencies to see if any objects depend on objects owned by the user in question. But that will only work for stored objects (views, procedures, etc.). It won't capture information about SQL statements that are submitted from applications or ad hoc queries issued by users.
If you don't want to enable auditing and wait an appropriate period, you may be better served revoking privileges on the user1 objects from whatever roles/ users have them rather than dropping the objects outright. That way if something blows up from lack of privileges, it's relatively easy to restore the privilege without getting the object(s) back from backup. You could also create a trigger on a permission denied error that told you where the request was coming from.
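If you go the revoke route, it helps to generate the REVOKE statements and the matching GRANT statements together, so that restoring access is a copy-and-paste job. The sketch below is plain Python that only builds statement text from rows you would fetch yourself, for example from dba_tab_privs; the function name is made up for illustration, and it deliberately ignores details such as WITH GRANT OPTION, column-level grants, and identifiers that need quoting.

```python
# Query whose rows can be fed to the helper below (run it with your own
# Oracle client; this script does not connect to a database).
GRANTS_QUERY = (
    "SELECT grantee, owner, table_name, privilege "
    "FROM dba_tab_privs WHERE owner = 'USER1'"
)

def revoke_and_restore_statements(grants):
    """Build REVOKE statements and the GRANT statements that undo them.

    grants: iterable of (grantee, owner, table_name, privilege) tuples.
    """
    revokes, restores = [], []
    for grantee, owner, table_name, privilege in grants:
        target = f"{owner}.{table_name}"
        revokes.append(f"REVOKE {privilege} ON {target} FROM {grantee};")
        restores.append(f"GRANT {privilege} ON {target} TO {grantee};")
    return revokes, restores
```

Keep the restore script somewhere safe before running the revokes; if an application starts failing with a privilege error, the corresponding GRANT brings it back without touching a backup.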
I'm working on creating an application best described as a CRM. There is a relatively complex table structure, and I'm thinking about allowing users to do a fair bit of customization (adding fields and the like). One concern is that I will be reaching a certain level of scale almost immediately. We have about 50,000 individual users who will be coming online within about nine months of launch. So I want to build to last.
I'm thinking about two and maybe even three options.
One table set with a userID column on everything, plus custom attributes handled by one table which indexes the custom attributes and another table which holds their values, which can then be joined to the existing contact records for the user. -- From what I've read, this seems like the right option, but I keep feeling like it's not. It seems like once these tables start reaching the millions of records, searching for just one user's records in every query is going to become a database hog.
For each user account recreate the table set, prepended with a unique identifier (the userID for example). Then rather than using a WHERE userID=? everywhere I can use a FROM ?_contacts. For attributes I could then have a custom attributes table where users could add additional columns for custom attributes. -- This feels like the simplest way to go, though of course when I decide to change the database structure there would be a migration from hell.
The third option, which I'm pretty confident is wrong, but for that reason alone I can not rule out, is that a new database should be created for each user with all the requisite tables.
Am I crazy? Is option one really the best?
The first method is the best. Create individual userIds and then you can assign specific roles to them. Database retrieval time does depend on the number of records, but you can offset that by writing efficient SQL queries to fetch the data. According to this site, you probably won't run out of memory or run into concurrency issues, because with a good server the performance ought to be good, provided that you write your queries efficiently.
If you recreate table sets, you will just end up creating lots of tables, which can make indexing slow and is bad practice. Instead, opt for a relational database scheme rather than an ordinary database scheme, and normalize the database and its tables to improve efficiency.
Creating a new database for each and every user just adds up the complexity of both of the above, resulting in shabby and disorganized database access. If you decide to run individual database instances for every single user, you will just end up consuming your server's physical resources, like RAM and CPU, which will affect the service quality for all the other users.
Take up option 1. Assign separate userIds and assign them roles and privileges where needed. That is more efficient than the other two methods.
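A minimal sketch of option 1, written with Python's sqlite3 module, might look like the following. The table and column names, the index, and the `contacts_with_attributes` helper are assumptions made for illustration; the point is only that one indexed `user_id` column plus an attribute/value pair of tables covers per-user data and per-user custom fields without creating tables per account.

```python
import sqlite3

def make_crm_db():
    """Shared tables for all users; user_id scopes every row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contacts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            name TEXT NOT NULL);
        -- The index keeps "just this user's rows" cheap as the table grows.
        CREATE INDEX idx_contacts_user ON contacts (user_id);
        CREATE TABLE custom_attributes (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            UNIQUE (user_id, name));
        CREATE TABLE custom_attribute_values (
            contact_id INTEGER NOT NULL REFERENCES contacts (id),
            attribute_id INTEGER NOT NULL REFERENCES custom_attributes (id),
            value TEXT,
            PRIMARY KEY (contact_id, attribute_id));
    """)
    return conn

def contacts_with_attributes(conn, user_id):
    """Return one user's contacts, each with a dict of its custom attributes."""
    rows = conn.execute("""
        SELECT c.id, c.name, a.name, v.value
        FROM contacts AS c
        LEFT JOIN custom_attribute_values AS v ON v.contact_id = c.id
        LEFT JOIN custom_attributes AS a ON a.id = v.attribute_id
        WHERE c.user_id = ?
        ORDER BY c.id, a.name
    """, (user_id,)).fetchall()
    result = {}
    for contact_id, name, attr_name, value in rows:
        entry = result.setdefault(contact_id, {"name": name, "custom": {}})
        if attr_name is not None:  # contact has at least one custom value
            entry["custom"][attr_name] = value
    return result
```

Schema changes then touch one set of tables, which is exactly what options 2 and 3 make painful.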
I am trying to decide on the best method for audit logging within my application. The main reason for the log is reporting the sequence of events (changes).
I have a hierarchy of Objects, and I need to create reports at a later date when something changes on any part of that hierarchy.
I think that I have three options:
Have a log for each table and therefore matching the hierarchy of objects then creating a view for the report.
Flatten the hierarchy and de-normalise the table, making reporting easier - simple select statement.
Have one log table and have a record for each change making reporting harder but more flexible to changes.
I am currently leaning towards option 1.
I have to speak to this subject even though it's old.
It is usually a poor idea to have only one audit table as you will create locking problems in the database as everything hits that table. Use separate audit tables for each table.
It is also a poor idea to have the application do the auditing. Auditing must be done at the database level or you risk losing some of the information. In most databases, data does not change only through applications; no one is going to change the prices of all their products one at a time from the user interface when you need a 10% increase on all 10,000,000 of them. Auditing should capture all changes, not just some of them. In most databases this should be done in a trigger (SQL Server 2008 has a built-in auditing feature). Some of the worst possible changes (employees committing fraud or wanting to maliciously destroy data) also frequently come from places other than the application, especially if you allow table-level access to users (which you should not do in any financial database or one that contains personal information). Auditing from the application won't catch this. Developers often forget that, in protecting their data, outside sources are not the only threat.
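As an illustration of trigger-based auditing with a separate audit table per base table, here is a small sketch using SQLite through Python's sqlite3 module. SQLite is used only because it runs anywhere; trigger syntax differs between databases, and the `products` and `products_audit` tables and the trigger names are assumptions for this example.

```python
import sqlite3

def make_audited_db():
    """Create a products table plus its own audit table and three triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL NOT NULL);
        CREATE TABLE products_audit (
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            action     TEXT NOT NULL,
            product_id INTEGER NOT NULL,
            old_price  REAL,
            new_price  REAL);

        CREATE TRIGGER products_ai AFTER INSERT ON products
        BEGIN
            INSERT INTO products_audit (action, product_id, new_price)
            VALUES ('INSERT', NEW.id, NEW.price);
        END;

        CREATE TRIGGER products_au AFTER UPDATE ON products
        BEGIN
            INSERT INTO products_audit (action, product_id, old_price, new_price)
            VALUES ('UPDATE', OLD.id, OLD.price, NEW.price);
        END;

        CREATE TRIGGER products_ad AFTER DELETE ON products
        BEGIN
            INSERT INTO products_audit (action, product_id, old_price)
            VALUES ('DELETE', OLD.id, OLD.price);
        END;
    """)
    return conn
```

A bulk statement such as `UPDATE products SET price = price * 2`, run from any client, produces one audit row per changed product, which is the kind of change application-level auditing would miss.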
An audit log is basically a chronological list of events that occurred, who performed these events, and what the events were.
I think a flat view would be better as it can be easily ordered and queried. So I'm leaning more towards your option #2/#3.
Include things like the transaction type, the time, the user id, a description of what's changed, and other pertinent information related to your product.
You can also add things to your product over time and you won't need to continually modify your audit log module.
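A flat log along these lines could be sketched as below, using Python's sqlite3 module. The column set (time, user id, event type, free-text description) follows the list above, while the table name and helper functions are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timezone

def make_audit_log_db():
    """One flat, chronological table for every kind of audited event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE audit_log (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            occurred_at TEXT    NOT NULL,
            user_id     INTEGER NOT NULL,
            event_type  TEXT    NOT NULL,
            description TEXT    NOT NULL)
    """)
    return conn

def log_event(conn, user_id, event_type, description):
    """Append one event; rows are only ever inserted, never updated."""
    with conn:
        cur = conn.execute(
            "INSERT INTO audit_log (occurred_at, user_id, event_type, description) "
            "VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), user_id, event_type, description),
        )
    return cur.lastrowid

def events_for_user(conn, user_id):
    """Report a user's events in the order they were recorded."""
    return conn.execute(
        "SELECT event_type, description FROM audit_log "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
```

New kinds of events only need a new `event_type` value, so the log module itself does not change as the product grows.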
If it's for auditing purposes I'd use a true append-only medium rather than a table/tables in the same db.
You suggest it's for change history purposes - in which case I would restructure your application/db to record the actual events in the first place rather than just the current state.
I would go with (2) and (3): create a single table for all Audit entries.
A flat view is good, provided the extra work of flattening does not impact performance.
You could look into an AOP framework to help with this. It would allow you to inject logging functionality at the beginning or end of any/all methods. If you go down this road, it might help define what would make sense for storing the log data.
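No particular AOP framework is named above, so as a stand-in here is a minimal sketch of the same idea using a plain Python decorator: logging is injected at the beginning and end of a method without touching its body. The `audited` decorator, the in-memory `call_log` list and the `rename_item` function are all hypothetical; a real system would write to its audit store instead of a list.

```python
import functools

# Stand-in for a real audit sink (database table, log file, ...).
call_log = []

def audited(func):
    """Wrap func so every call is recorded on entry and on exit or error."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(("enter", func.__name__, args, kwargs))
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            call_log.append(("error", func.__name__, repr(exc)))
            raise  # auditing must not swallow the original failure
        call_log.append(("exit", func.__name__, result))
        return result
    return wrapper

@audited
def rename_item(item_id, new_name):
    """Example business method; its body knows nothing about auditing."""
    return f"item {item_id} renamed to {new_name}"
```

Looking at what such a wrapper can see (method name, arguments, result) is a quick way to judge which fields the stored log entries could realistically contain.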