I've been trying to get my head around NoSQL, and I do see the benefits to embedding data in documents.
What I can't understand, and hope someone can clear up, is how to store data if it must be relational.
For example.
I have many users. They are all buying a product. So everytime that they buy a product, we add it under the users document in mongo, so its embedded and its all great.
The problem I have is when something in reference to that product changes.
Lets say user A buys a car called "Porsche". Then, we add a reference to that under the users profile. However, in a strange turn of events Porsche gets purchased by Ferrari.
What do you do now, update each and every record and change to name from Porsche to Ferrari?
Typically in SQL, we would create 3 tables. One for users, one for Cars (description, model etc) & one for mapping users to purchases.
Do you do the same thing for Mongo? It seems like if you go down this route, you are trying to make Mongo do things SQL way, which is not what its intended for.
I can understand how certain data is great for embedding (addresses, contact details, comments, etc) but what happens when you need to reference data that can and needs to change at a regular basis?
I hope this question is clear
DBRefs/Manual References were made specifically to solve this issue. Instead of manually adding the data to each document and then needing to update when something changes, you can store a reference to another collection. Here is the mongoDB documentation for details.
References in Mongo
Then all you would need to do is update the reference collection and the change would be reflected in all downstream locations.
When i used the mongoose library for node js it actually creates 3 tables similar to how you might do it in SQL, you can use object id's as foreign keys and enrich them either on the client side or on the backend, still no joining but you could do an 'in' query for the ID's then enrich the objects that way, mongoose can do this automatically by 'populating'
Related
I am building a small social networking website, I have a doubt regarding database schema:
How should I store the posts(text) by a user?
I'll have a separate POST table and will link USERS table with it, through USERS_POST table.
But every time to display all the posts on user's profile, system will have to search the entire USERS_POST table for USER id and then display?
What else should I do?
Similarly how should I store the multiple places the user has worked or studied?
I understand it's broad but I am new to Database. :)
First don't worry too much, start by making it work and see where you get performance problems. The database might be a lot quicker then you expect. Also it is often much easier to see what the best solution is when you have an actual query that is too slow.
Regarding your design, if a post is never linked to more then one user then forget the USERS_POST table and put the user id in the POST table. In either case an index on the user id would help (as in not having to read the whole table) when the database grows large.
Multiple places for a single user you would store in an additional table. For instance called USERS_PLACES, give it a column user_id to link it to USERS plus other columns for the data you wish to store per place.
BTW In postgresql you might want to keep all object (tables, columns, ...) names lowercase because unless you take care to always quote them like "USERS" postgresql will make them lowercase which can be confusing.
I have just started using RavenDB on a personal project and so far inserting, updating and querying have all been very easy to implement. However, I have come across a situation where I need a GetOrCreate method and I'm wondering what the best way to achieve this is.
Specifically I am integrating with OpenID and once authentication has taken place the user is redirected to my site. At this point I'd either like to retrieve their user record from Raven (by querying on the ClaimsIdentifier property) or create a new record. The user's ID is currently being set by Raven.
Obviously I can write this in two statements but without some sort of transaction around the select and the create I could potentially end up with two user records in the database with the same claims identifier.
Is there anyway to achieve this kind of functionality? Possibly even more importantly is do you think I'm going down the wrong path. I'm assuming even if I could create a transaction it would make scaling out to multiple servers difficult and in anycase could add a performance bottle-neck.
Would a better approach be to have the Query and Create operations as separate statements and check for duplicates when the user is retrieved and merge at that point. Or do something similar but on a scheduled task?
I can't help but feel I'm missing something obvious here so any advice on this problem would be greatly appreciated.
Note: while scaling out to multiple servers may seem unnessecary for a personal project, I'm using it as an evaluation of Raven before using it in work.
Dan, although RavenDB has support for transactions, I wouldn't go that way in your case. Instead, you could just use the users ClaimsIdentifier as the user documents id, because they are granted to be unique.
Alternatively, you can also stay with user ids being generated by Raven (HiLo btw) and use the new UniqueConstraintsBundle, which lets you attribute certain properties to be unique. Internally it will create an additional document that has the value of your unique property as its id.
The templates for these HTML emails are all the same, but there are just different variables for say, first name, last name and such.
Would it just make sense to store the most minimal of data that I need, and load the template in and replace the variables everytime?
Another option would be to actually create the HTML file and store a reference to it, which probably would be the easiest to do except it might be a pain managing the files, and it adds complexity in regards to migrating, file permissions, et cetera.
Looking for opinions from people who've done this before...
GOAL/PURPOSE/USE:
I have a booking engine. When users make a booking, they are sent a confirmation email, generated from the sessionized booking data.
This email provides a "Cannot view this email? See it here" link which provides a web view of the email, in addition to a plaintext view.
I need to display the same email that was sent out, in addition to the plaintext view.
The template is subject to change, but I think because of that very fact I should have a table of templates and map the data to a template.
That's what I would do, because the template layout may change over the time, but the person information should remain the same. So, it makes sense to just store the person information in the database and leave the template out from the database.
In fact, it would be even better if you use template engine such as Velocity (in Java) to construct your HTML emails... very easy, by the way.
On the one hand cpu is more expensive then memory, so mostly it is better to save more data to reduce cpu power used by computation.
But in your case, I would save the minimal data, the emails or what you are tying to save, because it allows you to easily remodel your templates, and to reuse the data at multiple places of your application.
You persist redundant data (especially because of the template) which is in no way normalized. I would not suggest to do that. But mentioned in the comment it is important what you want to do with that data.
If you only save the data you need you could for example exchange that template easy and use another one.
Yea, your right on track. I did a similar thing. All dynamic/runtime variables were starting from ##symbol.
So in database you would have one Template table. One table would be for dynamic/runtime variables. One table for Mapping between Template and dynamic/runtime variables.
tblTemplate - TemplateID, TemplateValue
tblRuntimeVariables - RuntimeVariableID, VariableString, VariableSQL
tblMapping - TemplateID, RuntimeVariableID, RuntimeVariableValue
Advantage of using an extra mapping table is that on adding new dynamic variables to existing change would mean making no change to existing database. Only more rows would be added to tblMapping.
In my case I was also having one extra column for storing SQL Statements in tblRuntimeVariables in case the value for a runtime variable is fetched from database.
I recently acquired this book from Microsoft Press. I currently have Office Enterprise 2007 (Access included) and have firmly decided to convert my Informix-SQL app to Access 2010. However, I'm not experienced with VBA, Macros and several other functionality my app needs. This is going to be a new learning process for me, but I must modernize my 20 year old char-based app and take advantage of new features. I have begun defining my tables and columns, but not the relationships. With INFORMIX, I join a serial (autoincrement) column with an INT column in another table. Now when I display a customers master row, I would like to automatically display all of the transactions associated with that customer in a sub-form and have the ability to add, update, query, delete on either tables. Can this be accomplished with A'10?
EDIT: OK, this is what I have done so far, defined tables and relationships:
there are more validation lookup tables to follow, but these are the main tables. So if now I create a form and specify the CLIENTES (customer table), LOTES (lot table), ARTICULOS (item table) and TRANSACCIONES (transaction table) it will create a CLIENT table as the master form and the other child tables as subforms on one screen?
Also, the reason I created a lot table is because when customers pawn or sell items, the pawnshop groups all these items into one lot, calculates the total loan or purchase amount, stores it all under one transaction and prints the ticket with a description of all the items and total amount. So I want the ability to say, if customer defaults on interest payments or does not redeem pawn, then customer forfeits all items and pawnshop may choose to sell some items to gold refinery and/or transfer other non-gold items to inventory to sell to the public, so would the above ER be adequate for this capability?
I also want to ensure that every row in every table has the same store_ID (company ID) while users are working within a specific company, as this system will be multi-company and there will be consolidated reports, etc.
This type of setup can be accomplished in any version of access going back to 1992.
The way you model these classic one to many relationships in access is to base the parent form on the parent table (note I said partent table, not a query – I going to repeat this again: you do not need a query that joins the data for you). You then create what is called a continues form based on the child table. You now have two forms, and you then can simple drag + drop in the form for the child table into the above parent form, and you are done.
In fact, if you design and setup the relationship correctly in the relationships window, then if you use a wizard to create the form, it will actually build and automatically insert a sub form for you.
So, there is some several interesting issues about the above process that you should know As I pointed above, you don't have to build any SQL query at all. You don't have to write sql to join together the data. Access will do all of this for you, and do it without any code.
So, when you navigate records in the main form, the child records will be automatically displayed and filtered for you in the sub form. (and if you add or delete or edit those child records, the correct relationship key is inserted and maintained for you, again done without writing any code at all).
In the following form, we have a classic accounting funds distribution example. In this example we are tracking membership donations. So, the top part of the form is one record based on the master table and is the donation event. I then created an continuous form. When dropped into the main form, it becomes a sub form. That form is the one on the left side, and it simply allows me to enter a list of members who donated for the above event.
The form looks like:
At this point the form will work without any code having been written.
In fact the above form I have the one main record, on the left side I have many members who made a donation. However, I also needed to split out each donation for those memebers on the left side to an to particular account for accounting purposes. (a classic check spliting that you see in just about every accounting package these days)
So the above models a one to many and then the many members also split out into many different accounts for each donation. A really incredible powerful setup, and one that has almost no code.
So, in the above I'm really doing three tables deep as a model. |And, to be fair, the right side (donations split into accounts) did need one line of code to update correctly, as access does not do this for me when you go 3 tables deep.
However for the most part, to model these classic parent to child table relationships in access, you don't need to write any code at all. You use a main form and then for the child table, you insert a sub-form based on the child table.
And as noted if you set the relationships up correctly, access will automatically stitch the two together for you, and maintain the relationship for you. So display of the child records belonging to the one parent record will display for you automatically. And this ability includes you to edit and delete and add those child records. And, thus as you navigate to a new reocrd, all of the child records and information will automatically be refreshed for the next form.
In other words all the above can be done without building any SQL queries, and not one line of code is required.
Unfortunately Stack Overflow works great for asking a question and having an answer. However, a serious of Q + A that builds on previous question we much find that StackOverflow REALLY breaks down here. I will give this a shot, and with all due respect to the great stack overflow, I am temped to suggest you try a forum based system as opposed to so. Perahps Utter access, or even the access dev forum here:
http://social.msdn.microsoft.com/Forums/en-US/accessdev
Anyway, lets see what we can come up with. I should also point out that your table design and how to relate those tables is not really a access question but simply that of database design. How this should be laid out will be the same for MySql, Oracle, FoxPro or in fact just about ANY relational database system that we had in the last 20+ years in our industry. So, this is not really a access question when you ask about how a database is to be setup.
Ok, lets see. In the above you have the LOTES table attached to customers. If you WERE going to attach (relate) the LOTS table to a customer, then you don't need the store_id in the LOTES tables since the table is a child to that customer record. Anytime you can go up to the parent table to get that information, you don't need to repeat it in the child table(s). So, that customer record you are thus relating back to already has the store id and you be able to get at that store id value any time since it is a relation (that is the whole idea of a relational database, you don't have to repeat data over and over again).
And, as you have it now, the same advice applies to the two child table to LOTES. Again they don't need the store id since by you above design they are already attached to the LOTES table which in turn is attached to the customer table where the store id is. So, in your current design ALL of the child tables below customer can and should and would have the store_id removed.
However, the above I suspect is not what you looking to accomplish here. It is quite likely that LOTES records belong to a store. If that is the case, then you want LOTES table to be child of stores.
You thus should have STORES->LOTES
The only part not clear here is if your design is going to allow one customer to be attached to more then one store. If this is a very rare case, then perhaps you adopt a easier design that either forces you to enter the customer again, or even via some copy code. While this breaks normalizing here often there are some issues like if the shipping address is going to change for different stores for the same customer, then often you save a lot of code and design issues by not normalizing in this case.
However, if you need this feature, then the reverse is also true and there is MANY advantages to normalizing this data correctly, but often more desing work up front.
If you really do want customers to have more then one store, then you need a table called tblCustomerStores. You will attach this table to customers as a child table, and this this table will have the store_id. You can then attached all customer transactions etc. to this child store table. And this Customer store table might even include perhaps payment type and deliver preferences (if they vary from store to store for example).
So, you start at
Customers->customer stores->Transactions->
And, it not quite clear what is in table transactions, but it possible that everthing else belongs below the above trans table.
And, it not clear if Articles are attached to a particular transaction or not? If they are, then articles for thus be a child of transactions.
I thinking more of
Customers->customer stores->Articles->lots
And you want Customers stores->Tranascation
And you want Customers stores->Articles
So, I think you are currently going too many tables deep. Just keep in mind that you can relate many table to the parent table just like you have in your diagram, but I think what you have it not what you are looking for
I am working on a design where I can have flexible attributes for users and I am confused how to continue the design of the schema.
I made a table where I kept system needed information:
Table name: users
id
username
password
Now, I wish to create a profile table and have one to one relation where all the other attributes in profile table such as email, first name, last name, etc. My question is: is there a way to add a third table in which profiles will be flexible? In other words, if my clients need to create a new attribute he/she won't need any customization to the code.
You're looking for a normalized table. That is a table that has user_id, key, value columns which produce a 1:N relationship between User & this new table. Look into http://en.wikipedia.org/wiki/Database_normalization for a little more information. Performance isn't amazing with normalized tables and it can take some interesting planning for optimization of your code but it's a very standard practice.
Keep the fixed parts of the profile in a standard table to make it easy to query, add constraints, etc.
For the configurable parts it sounds like you are looking for an entity-attribute-value model. The extra configurability comes at a high cost though: everything will have to be stored as strings and you will have to do any data validation in the application, not in the database.
How will these attributes be used? Are they simply a bag of data or would the user expect that the system would do something with these values? Are there ever going to be any reports against them?
If the system must do something with these attributes then you should make them columns since code will have to be written anyway that does something special with the values. However, if the customers just want them to store data then an EAV might be the ticket.
If you are going to implement an EAV, I would suggest adding a DataType column to your attributes table. This enables you to do some rudimentary validation on the entered data and dynamically change the control used for entry.
If you are going to use an EAV, then the one rule you must follow is to never write any code where you specify a particular attribute. If these custom attributes are nothing more than a wad of data, then an EAV for this one portion of your system will work. You could even consider creating an XML column to store these attributes. SQL Server actually has an XML data type but all databases have some form of large text data type that will also work. On reports, the data would only ever be spit out. You would never place specific values in specific places on reports nor would you ever do any kind of numerical operation against the data.
The price of an EAV is vigilence and discipline. You have to have discipline amongst yourself and the other developers and especially report writers to never filter on a specific attribute no matter how much pressure you get from management. The moment a client wants to filter or do operations on a specific attribute, it must become a first class attribute as a column. If you feel that this kind of discipline cannot be maintained, then I would simply create columns for each attribute which would mean an adjustment to code but it will create less of mess down the road.