I've not really used Access for a while and I'm not too sure how best to proceed with this data model:
I have a set of resource tables of differing types, e.g. Data, Literature, Contractors, etc.
I also have a set of category tables such as Procedures, Topics, and Regions.
I need to create many-to-many relationships between the various resources and the various categories, so that it is possible to view a resource record and see lists of the various categories to which the resource is allocated, and vice versa, i.e. to view all resources allocated to a specific category.
I realise that I could create lots of link tables, e.g. LnkDataProcs, LnkDataTopics, etc. However, with perhaps 10 resource tables and 3 category tables I would wind up with 30-odd link tables, which seems wrong (it may also be useful to query all resources from each category anyway, so it would be good to have one link table for each category).
I've done this kind of thing before using SQL in custom DB client apps by using one link table with fields as follows: CategoryTable, CategoryID, ResourceTable, ResourceID, so that the link table stores the table name as well as the foreign key.
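For illustration, a minimal sketch of that single generic link table (generic SQL; all names are invented for the example):

-- Generic link table: each row connects one row in some category table
-- to one row in some resource table, storing the table names explicitly.
CREATE TABLE LnkResourceCategory (
    CategoryTable VARCHAR(50) NOT NULL,  -- e.g. 'Procedures', 'Topics', 'Regions'
    CategoryID    INT         NOT NULL,  -- key within that category table
    ResourceTable VARCHAR(50) NOT NULL,  -- e.g. 'Data', 'Literature', 'Contractors'
    ResourceID    INT         NOT NULL,  -- key within that resource table
    PRIMARY KEY (CategoryTable, CategoryID, ResourceTable, ResourceID)
);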
However, I'm not too sure how to fit this kind of model into an Access database; it would be nice to use the Access framework (master/child form objects) rather than having to write loads of custom code to perform queries and populate forms.
Any ideas how to proceed, or even what this kind of relationship is called?
I know SQL, but I'm not terribly experienced with it. I have a system in which I would like to log user logins, logouts and other security-related events in an indexed database to be able to pose manual queries to it, and I figure some SQL-based RDBMS should be the best fit for the job.
However, the records I'd like to store have similar, but not identical, data. All records would store a timestamp and a username, but the other data items would differ. For instance:
A login event would store the IP address the user logged in from, along with an ID for the created session.
A logout event would store the session ID but not the IP address (since I don't have access to the IP address at the point of logout).
An email-change event would store the former and new e-mail addresses of the user.
How should one model something like this in a relational database? I can imagine at least three possibilities:
Use different tables for each kind of data item
Add columns for all the different data items and leave them as NULL for records that don't use them
Use one central table with the common data items, and auxiliary tables that store the rest of the data, linking to an event ID in the central table
Clearly, each one has its own advantages and disadvantages. I do realize that this is a somewhat subjective question and is also likely to depend on actual use cases, but I imagine there ought to be standard/best practices for this kind of thing that I just haven't seen. Are any of my suggestions reasonable or standard? Is there some other option that I have missed that is better?
The solutions you mention appear in Martin Fowler's book Patterns of Enterprise Application Architecture. You might like to read that book to see what he says about using these patterns.
Use different tables for each kind of data item
Concrete Table Inheritance
Add columns for all the different data items and leave them as NULL for records that don't use them
Single Table Inheritance
Use one central table with the common data items, and auxiliary tables that store the rest of the data, linking to an event ID in the central table
Class Table Inheritance
Fowler also covers a fourth solution for this problem:
Serialized LOB
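For concreteness, here is a rough sketch of the third option (Class Table Inheritance) applied to the events above; generic SQL, with all table and column names invented:

-- Central table holds the columns common to every event.
CREATE TABLE event (
    event_id    INT PRIMARY KEY,
    event_type  VARCHAR(20) NOT NULL,   -- 'login', 'logout', 'email_change'
    username    VARCHAR(50) NOT NULL,
    occurred_at TIMESTAMP   NOT NULL
);

-- One auxiliary table per event type, keyed by the central event's ID.
CREATE TABLE login_event (
    event_id   INT PRIMARY KEY REFERENCES event(event_id),
    ip_address VARCHAR(45) NOT NULL,    -- wide enough for IPv6
    session_id VARCHAR(64) NOT NULL
);

CREATE TABLE logout_event (
    event_id   INT PRIMARY KEY REFERENCES event(event_id),
    session_id VARCHAR(64) NOT NULL     -- no IP address at logout time
);

CREATE TABLE email_change_event (
    event_id  INT PRIMARY KEY REFERENCES event(event_id),
    old_email VARCHAR(255) NOT NULL,
    new_email VARCHAR(255) NOT NULL
);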
For a web application (with some real private data) we want to use privacy enhancing technology to limit the damage if someone gains access to our database.
The application is built in different layers, and we use (as said in the topic title) Fluent NHibernate to connect to our database; we've created our own wrapper class to create queries.
Security is a big issue for the kind of application we're building. I'll try to explain the setting by a simple example:
Our customers have clients in their application (each installation of the application uses its own database) for whom sensitive data is stored; there is a client table and a person table, which are linked.
The base table is the client table; it links to the other tables (there will be hundreds of them soon), which probably contain sensitive data.
At this moment, the client has a client_id and a table_id in the database. Our customer only knows the client_id; the system links the data by the table_id, which is unknown to the user.
What we want to ensure:
A hacker who has gained access to our database should not be able to see the link between the customer and the other tables just by opening the database. So there should actually be some kind of "hidden link" between the customer and the other tables. The personal data and all other sensitive tables should not be obviously linked together.
Because of the data sensitivity, we're looking for a more robust solution than "statically hash the table_id and use this in other tables", so that when one person is linked to the corresponding client, not all other clients' data is compromised too.
Ultimately, the customer table cannot be linked to the other tables at all just by working inside the database; the application code is needed to link the tables.
To accomplish this we've been looking into different methods, but because of the multiple tables linked to this client, and further development (thus probably even more tables), we're looking for a centralised solution. That's why we concluded this should be handled in the database connector. Searching on the internet and here on Stack Overflow did not point us in the right direction; perhaps we couldn't find it because of wrong search terms (PET, privacy enhancing technology, combined with NHibernate did not give us any directions).
How can we accomplish our goals in this specific situation, or where should we search for help with this?
We have a similar requirement for our application, and what we ended up with was using database schemas.
We have one database and each customer has a separate schema, where all the data for that customer is stored. It is possible to link from the schema to the rest of the database, but not to different schemas.
Security can be set for each schema separately so you can make the life of a hacker harder.
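As a rough sketch of this setup (SQL Server syntax; the schema, login, and user names are invented, and the login is assumed to already exist):

-- One schema per customer, holding that customer's data.
CREATE SCHEMA customer_042;
GO
CREATE TABLE customer_042.client (
    client_id INT IDENTITY PRIMARY KEY,
    name      NVARCHAR(100) NOT NULL
);
GO
-- A database user tied to this customer, with rights limited to
-- this customer's schema only.
CREATE USER customer_042_app FOR LOGIN customer_042_login;
GRANT SELECT, INSERT, UPDATE, DELETE
    ON SCHEMA::customer_042 TO customer_042_app;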
That being said, I can also imagine a solution where you let NHibernate encrypt every piece of data it sends to the database and decrypt everything it gets back. The data will be stored safely, but it will be very difficult to query over the data.
So there is probably not a single answer to this question, and you have to decide what is better: not being able to query, or just making it more difficult for a hacker to get to the data.
How do I describe the partition of client data when all data is stored in one place and separated via programming?
If a collection of data from various clients is stored in a variety of SQL tables and is separated via the code (e.g. members from different orgs, defined by an organisation table), at which layer is the data separation defined?
Sorry if this question is a bit poorly worded.
In terms of how to explain it, I'd need more information on how you're actually separating the data for consumption by different members, but we've done a similar thing using SQL views. In our case, it's pretty easy to explain because each role (i.e., a set of user permissions determined by their need-to-know) has a set of SQL views they have permissions to view and query but not modify. Then users can query the views as needed to make their own reports and datasets.
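A minimal sketch of that idea (generic SQL; all table, view, and role names are invented):

-- A read-only view scoped to a single organisation.
CREATE VIEW org_alpha_members AS
SELECT m.member_id, m.full_name, m.joined_on
FROM members m
JOIN organisations o ON o.org_id = m.org_id
WHERE o.org_name = 'Alpha';

-- The role gets SELECT on the view only: it can query and build
-- reports, but has no path to the underlying tables and cannot modify.
GRANT SELECT ON org_alpha_members TO role_org_alpha_reporting;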
If you're looking for more technical jargon, this was one of the documents we came across when setting up our security.
It might be easiest to explain that each data element has a set of roles that have access to that data element. Your role within the multitude of client organizations determines which data elements you can work with in your reports. Then you would just want to use very strong language indicating how you have implemented safeguards ensuring that users cannot, in any way, access data that is not relevant to their need-to-know.
I want to build an online form builder much like wufoo that allows the users to create and publish their own web forms. Each submission should be saved to a data base where the user can later retrieve the submissions.
As these forms will be dynamic, i.e. the user has complete control over the number and type of form fields, I am trying to think of a solid database design to store this information.
I would have one table, fieldtype, which contains every type of field available to the users, e.g. textfield, emailfield, etc.
One baseform table which will hold each form's id, url, etc.
I would then have a table formfields which would contain a reference to the baseform and to the fieldtype; this table could also include the custom validation to be done on each field.
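A minimal sketch of the three tables just described (generic SQL; the column names are my guesses):

CREATE TABLE fieldtype (
    fieldtype_id INT PRIMARY KEY,
    name         VARCHAR(50) NOT NULL   -- 'textfield', 'emailfield', ...
);

CREATE TABLE baseform (
    baseform_id INT PRIMARY KEY,
    url         VARCHAR(255) NOT NULL
);

CREATE TABLE formfields (
    formfield_id    INT PRIMARY KEY,
    baseform_id     INT NOT NULL REFERENCES baseform(baseform_id),
    fieldtype_id    INT NOT NULL REFERENCES fieldtype(fieldtype_id),
    label           VARCHAR(100),
    validation_rule VARCHAR(255)        -- e.g. a regex or a rule name
);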
Is this design good as a base structure? I imagine it will be easy to add new types of fields to the application; however, I don't know what the potential downsides are, as I am far from a SQL expert.
I think you are looking for the Entity–attribute–value database model in which:
The basic idea is to store attributes, and their corresponding values, as rows in a single table. Typically the table has at least three columns: entity, attribute, and value. Though if there is only a single relevant entity, e.g. a table for application configuration or option settings, the entity column can be excluded.
See these pages as a start:
Using Database Metadata and its Semantics to Generate Automatic and Dynamic Web Entry Forms (pdf)
Planning and Implementing a Metadata-Driven Digital Repository (pdf)
I retagged your question with the entity-attribute-value tag, under which you can browse a lot of threads that relate to your case.
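For illustration, an EAV table for the form submissions might look like this (names invented):

-- One row per (submission, field) pair: the submission is the entity,
-- the form field is the attribute, and the entered text is the value.
CREATE TABLE submission_value (
    submission_id INT NOT NULL,
    formfield_id  INT NOT NULL,
    value         VARCHAR(4000),
    PRIMARY KEY (submission_id, formfield_id)
);

-- Retrieving one submission returns attribute/value rows, which the
-- application then reassembles into a filled-in form.
SELECT formfield_id, value
FROM submission_value
WHERE submission_id = 42;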
As Mahmoud Gamal writes, the model you describe is "Entity/Attribute/Value"; as Borys writes, there are many known problems with this model.
As an alternative, you might consider storing the form entries in a "document" - e.g. XML or JSON - within a relational model.
For instance, you might have a table along the lines of:
FORM_SUBMISSION
--------------------
Submission_ID (pk)
Client_ID (fk to clients table)
Submission_date
SubmissionDocument
I'm using "client" to represent the users who create the form; to retrieve all submissions for a given client, you use a where clause on client_id.
This model makes it harder to run SQL queries against the form submission (though that becomes hard with EAV too when going beyond very simple queries), but it dramatically simplifies the persistence solution.
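For example, on a database with JSON support (this sketch assumes SQL Server 2016+ and its JSON_VALUE function; the data is invented), storing and querying a submission could look like:

-- The whole form entry is stored as one JSON document.
INSERT INTO FORM_SUBMISSION
    (Submission_ID, Client_ID, Submission_date, SubmissionDocument)
VALUES
    (1, 42, '2015-06-01',
     '{"name": "Jane Doe", "email": "jane@example.com"}');

-- Pulling a single field back out needs a JSON function rather than a
-- plain column reference -- the querying cost mentioned above.
SELECT Submission_ID,
       JSON_VALUE(SubmissionDocument, '$.email') AS email
FROM FORM_SUBMISSION
WHERE Client_ID = 42;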
We have a SQL server that has a database for each client, and we have hundreds of clients. So imagine the following: database001, database002, database003, ..., database999. We want to combine all of these databases into one database.
Our thought is to add a siteId column with values 001, 002, 003, ..., 999.
We are exploring options to make this transition as smoothly as possible. And we would LOVE to hear any ideas you have. It's proving to be a VERY challenging problem.
I've heard of a technique that would create a view that would match and then filter.
Any ideas guys?
Create a client database id for each of the client databases. You will use this id to keep the data logically separated. This is the "site id" concept, but you can use a derived key (identity field) instead of manually creating these numbers. Create a table that has the database name and id, with any other metadata you need.
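A sketch of that metadata table (SQL Server syntax; names invented):

-- One row per original client database; the identity column becomes
-- the site id used to tag that database's rows after the merge.
CREATE TABLE SourceDatabase (
    SiteId       INT IDENTITY(1,1) PRIMARY KEY,
    DatabaseName SYSNAME  NOT NULL UNIQUE,
    ImportedOn   DATETIME NULL            -- any other metadata you need
);

INSERT INTO SourceDatabase (DatabaseName)
VALUES ('database001'), ('database002'), ('database003');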
The next step would be to create an SSIS package that gets the ID for the database in question and adds it to the tables that have to have their data separated out logically. You can then run that same package over each database, looking up the ID for the database in question.
After you have a unique id for the data and have imported it, you will have to alter your apps to fit the new schema (actually before, or you are pretty much screwed).
If you want to do this in steps, you can create views or functions in the different "databases" so the old client can still hit the client's data, even though it has been moved. This step may not be necessary if you deploy with some downtime.
The method I propose is fairly flexible and can be applied to one client at a time, depending on your client application deployment methodology.
Why do you want to do that?
You can read about Multi-Tenant Data Architecture and also listen to SO #19 (around 40-50 min) about this design.
The "site-id" solution is what's done.
Another possibility that may not work out as well (but is still appealing) is multiple schemas within a single database. You can pull common tables into a "common" schema, and leave the customer-specific stuff in customer-specific schemas. In some database products, however, each schema is -- effectively -- a separate database. In other products (Oracle, DB2, for example) you can easily write queries that work across multiple schemas.
Also note that -- as an optimization -- you may not need to add siteId column to EVERY table.
Sometimes you have a "contains" relationship. It's a master-detail FK, often defined with a cascade delete so that detail cannot exist without the parent. In this case, the children don't need siteId because they don't have an independent existence.
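For example (an invented sketch), only the parent table carries SiteId; the detail rows are reachable only through the parent, so a join recovers the site when needed:

-- The parent row is tagged with the site.
CREATE TABLE Orders (
    OrderId INT PRIMARY KEY,
    SiteId  INT NOT NULL
);

-- Detail rows cannot exist without their parent (cascade delete),
-- so they need no SiteId of their own.
CREATE TABLE OrderLines (
    OrderLineId INT PRIMARY KEY,
    OrderId     INT NOT NULL
        REFERENCES Orders(OrderId) ON DELETE CASCADE,
    Quantity    INT NOT NULL
);

-- Filtering by site still works by joining through the parent.
SELECT ol.*
FROM OrderLines ol
JOIN Orders o ON o.OrderId = ol.OrderId
WHERE o.SiteId = 3;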
Your first step will be to determine if these databases even have the same structure. Even if you think they do, you need to compare them to make sure they do. Chances are there will be some that are customized or missed an upgrade cycle or two.
Now, depending on the number of clients and the number of records per client, your tables may get huge. Are you sure this will not create a performance problem? At any rate, you may need to take a fresh look at indexing. You may need a much more powerful set of servers, and may also need to partition by client anyway for performance.
Next, yes each table will need a site id of some sort. Further, depending on your design, you may have primary keys that are now no longer unique. You may need to redefine all primary keys to include the siteid. Always index this field when you add it.
Now all your queries, stored procs, views, and udfs will need to be rewritten to ensure that the siteid is part of them. Pay particular attention to any dynamic SQL. Otherwise you could be showing client A's information to client B. Clients don't tend to like that. We brought a client from a separate database into the main application one time (when they decided they no longer wanted to pay for a separate server). The developer missed just one place where client_id had to be added. Unfortunately, that sent emails to every client concerning this client's proprietary information, and to make matters worse, it was a nightly process that ran in the middle of the night, so it wasn't known about until the next day. (The developer was very lucky not to get fired.) The point is: be very, very careful when you do this, and test, test, test, and test some more. Make sure to test all the automated behind-the-scenes stuff as well as the UI stuff.
What I was explaining in Florence towards the end of last year applies if you have to keep the database names and the logical layer of the database the same for the application. In that case you'd do the following:
Collapse all the data into consolidated tables into one master, consolidated database (hereafter referred to as the consolidated DB).
Those tables would have to have an identifier like SiteID.
Create the new databases with the existing names.
Create views with the old table names which use row-level security to query the tables in the consolidated DB, but using the SiteID to filter.
Set up the databases for cross-database ownership chaining so that the service accounts can't "accidentally" query the base tables in the consolidated DB. Access must happen through the views or through stored procedures and other constructs that will enforce row-level security. Now, if it's the same service account for all sites, you can avoid the cross DB ownership chaining and assign the rights on the objects in the consolidated DB.
Rewrite the stored procedures to either handle the change (since they are now referring to views and they don't know to hit the base tables and include SiteID) or use INSTEAD OF triggers on the views to intercept update requests and put the appropriate site-specific information into the base tables (see the sketch below).
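A minimal sketch of the view-plus-trigger part of this (SQL Server syntax; the database, table, and column names are invented, and 42 stands in for this site's SiteID):

-- In the per-site database: a view with the old table name that reads
-- the consolidated table, hard-coding this site's id as the filter.
CREATE VIEW dbo.Clients
AS
SELECT ClientID, Name, Email
FROM ConsolidatedDB.dbo.Clients
WHERE SiteID = 42;
GO

-- The view hides SiteID, so inserts are intercepted by an INSTEAD OF
-- trigger that stamps each new row with this site's id.
CREATE TRIGGER dbo.Clients_Insert
ON dbo.Clients
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO ConsolidatedDB.dbo.Clients (ClientID, Name, Email, SiteID)
    SELECT ClientID, Name, Email, 42
    FROM inserted;
END;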
If the data is large you could look at using a partitioned view. This would simplify your access code, as all you'd have to maintain is the view; however, if the data is not large, just add a column to identify the customer.
Depending on what the data is and your security requirements the threat of cross contamination may be a show stopper.
Assuming you have considered this and deem it "safe enough", you may need/want to create VIEWs or impose some other access control to prevent customers from seeing each other's data.
IIRC a product called "Trusted Oracle" had the ability to partition data based on such a key (about the time Oracle 7 or 8 was out). The idea was that any given query would automagically have "and sourceKey = #userSecurityKey" (or some such) appended. The feature may have been rolled into later versions of the popular commercial product.
To expand on Gregory's answer, you can also make a parent SSIS package that calls the package doing the actual moving from within a Foreach Loop container.
The parent package queries a config table and puts the result in an object variable. The Foreach Loop then uses this recordset to pass variables to the child package, such as the database name and any other details the package might need.
Your table could list all of your client databases and have a flag to mark when you are ready to move them. This way you are not sitting around running the SSIS package on 32,767 databases. I'm hooked on the Foreach Loop in SSIS.