Collection of relations: NoSQL vs SQL

I understand the difference between NoSQL and SQL, but I still have a small question. It's about a many-to-many relationship. I know that if I need such a relationship, I should use relational databases. However, in my case, I prefer to use document-oriented databases, because I need to store a large number of documents, much larger than the number of entities with relationships.
So, I need to implement user groups. Of course, users can exist outside of groups, therefore they are documents of a separate collection. In addition, one user can be in several groups, which means this is a real many-to-many relationship.
People say the "Mongo way" is to give a user a list of references to groups and a group a list of references to users, but this option doesn't suit me, because sometimes I need to display a list of groups without displaying each group's user list, which can take up most of the document.
As an alternative, I want to use the traditional “relationship tables” that are used for many-to-many relationships in relational databases.
So my question is, what is the practical difference between using such tables in MySQL and in MongoDB? As far as I know, there are no foreign keys in MongoDB, but does that really prevent me from doing something like this? The only problem I see is that you cannot get rid of the required _id field and its index. By the way, in this case, should I create indexes for the UserId and GroupId fields?
Or maybe I should give up the idea to fit everything into one database and use SQL and NoSQL together in one project?

The link below might help you with schema design:
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
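
For the "relationship table" idea from the question, a minimal sketch in the mongo shell could look like the following; the memberships collection and its field names are placeholders, not an established convention:

    // Assumed collections: users, groups, and a junction collection "memberships".
    var userId = db.users.findOne({ name: "alice" })._id;   // assumes seeded data
    var groupId = db.groups.findOne({ name: "admins" })._id;

    // One document per user-group pair, like a row in a relational junction table.
    db.memberships.insertOne({ userId: userId, groupId: groupId });

    // The unique compound index prevents duplicate memberships and serves
    // "which groups is this user in?"; the second index serves
    // "which users are in this group?".
    db.memberships.createIndex({ userId: 1, groupId: 1 }, { unique: true });
    db.memberships.createIndex({ groupId: 1 });

    db.memberships.find({ userId: userId });   // a user's groups
    db.memberships.find({ groupId: groupId }); // a group's users

The missing piece compared to MySQL is referential integrity: nothing stops a membership document from pointing at a deleted user, so cleanup (e.g., removing memberships when a user is deleted) has to live in application code. You do pay for the mandatory _id index, but it is the two indexes above that actually serve the queries.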

Related

How to convert SQL many to many data model to firebase data model

I am having trouble converting my data model for an attendance system for a football trainer (I designed it as a normalized relational SQL model) to a Firebase model. Here is a picture of my relational model:
I was thinking about making 4 Collections also:
Players
Attendance
Match
MatchType (it can be friendly-match, tournament, practice, among others)
I think this depends on how you want to use the data. Looking at your schema, it seems that one collection, "Attendance", would be enough, since it is the one connecting all the tables.
The idea of relational databases is that data should not be redundant, so each piece of information is stored only once and connected by relations using keys such as PlayerID.
In NoSQL databases, on the other hand, you do not worry about data redundancy, so the same information (like a player's name) is stored in many documents. The idea is to have everything in one document and avoid sophisticated queries to get information: just fetch the document and you have everything.
So it all depends on how you use the information, which we do not know. You could put everything in one collection and get all the information from a single document.
On the other hand, you could create four collections with exactly the same fields as in the SQL database and use them in a relational way, just to have a cheap, fast, serverless database engine.
What's more, you can change your solution at any time, since you do not define any schema.
So in Firestore you are free to choose either solution; you should first think about how you will use the information.
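
To make the "everything in one document" option concrete, here is a sketch of a single, denormalized attendance document using the Node.js Admin SDK; all collection names, field names, and values are illustrative:

    const admin = require('firebase-admin');
    admin.initializeApp();
    const db = admin.firestore();

    // One self-contained attendance record: player and match details are
    // copied in (redundantly), so displaying it needs no further reads.
    db.collection('attendance').add({
      playerId: 'player42',
      playerName: 'John Doe',        // duplicated from the player document
      matchId: 'match7',
      matchType: 'friendly-match',   // duplicated from the match-type document
      matchDate: new Date('2024-05-01'),
      present: true,
    }).then(ref => console.log('saved attendance', ref.id));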

Create SQL tables for each user as security measure

I've researched this topic and I'm relatively sure that in most cases the answer is "No", but I would like some second opinions specific to my case.
We're currently working on a multi-user web app where each user will basically have their own copy ("portal/app") within the web app. It's not performance I'm worried about, but security.
I'm considering partitioning the data with a prefix (userid_table1, userid_table2) to make it more manageable and to ensure no security-validation oversight is made by the team during development, since we can easily add a check that queries can only be run against tables matching userid_*.
Would you still recommend against this method?
More manageable? That sounds like a joke. Your database will end up with a zillion different tables. Any operation that you want to do across all users will be a nightmare:
Declaring foreign key constraints.
Defining a new index on the tables.
Adding a new column.
Restructuring the tables.
And so on. And so on.
Your users may be limited to a single table. But the application developer and DBA need to deal with all of them. I cringe thinking about trying to figure out where performance bottlenecks are in such a system.
I should add that databases are optimized for big tables, not lots of tables, so multiple tables are typically less efficient. And even less efficient when you think about all the half-filled pages in all those tables.
The same entities should not be spread among multiple tables, unless you have a really, really good reason. This is not a really good reason. One simple solution is to prevent users from having access to the base tables. Just give them access to views or user-defined table functions -- and have all of these filter on user ids.
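
As a sketch of the views approach in Postgres via node-postgres (the user_data table and the app.user_id setting are hypothetical names, not a fixed recipe):

    const { Client } = require('pg');
    const client = new Client();

    async function main() {
      await client.connect();
      // Users only ever see the view; the shared base table stays off-limits.
      await client.query(`
        CREATE VIEW my_rows AS
        SELECT * FROM user_data
        WHERE user_id = current_setting('app.user_id')::int
      `);
      // The application pins the current user once per session/transaction.
      await client.query("SET app.user_id = '42'");
      const { rows } = await client.query('SELECT * FROM my_rows');
      console.log(rows);
      await client.end();
    }
    main();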
There are some edge cases where you do want separate tables for each user. Typically, each user would have very complex tables (think B2B application) and, in fact, they might have their own database. There may also be legal requirements to separate data. In these cases, though, the "separateness" would typically be at the database level, not the table level.

Modeling data in firebase using joins - many-to-many relationship

I'm interested in the new firebase.util package that allows you to join data (paths) and how I might be able to continue modeling with UML as I have become accustomed to over many years. I can see how easy it might be to make one-to-many relationships in this way. And because Firebase is hierarchical, component relationships are just very natural.
Aggregate relationships can be duck'd as we're all accustomed to this in javascript - enforcing aggregate relationship doesn't seem to me to be a barrier to modeling successful projects using firebase...
My question is if anyone has experimented | had success with | can show examples of how it might be possible to represent many-to-many relationships, perhaps by joining the join paths themselves.
If I don't get much interest in the question I may post my own trial-error results...
Thanks
I have tried using a composite key. For example, a user can be a member of many rooms. We need two queries: the list of a room's members, and the list of a user's rooms. So we can have just one collection, rooms-users, where the key is built like this:
id = [roomId, userId].join()
The truth is, I'm not sure whether it is a good pattern. It seems it can get in the way of security rules settings (https://stackoverflow.com/a/17431390/233902) and may even have performance implications.
So maybe two or even more collections are required: two for the many-to-many relation, and a third for relation metadata. As I think about it, collections should be optimized for queries, so the composite key is an anti-pattern for Firebase.
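
In the Realtime Database, that "two collections" variant is usually written as a pair of index nodes kept in sync with one atomic multi-path write. A sketch using the older namespaced JavaScript SDK, with made-up path names:

    // Assumes the Firebase app is already initialized.
    const roomId = 'room1';
    const userId = 'user42';

    // Two index nodes, updated atomically in a single multi-path write:
    //   roomUsers/$roomId/$userId -> answers "who is in this room?"
    //   userRooms/$userId/$roomId -> answers "which rooms is this user in?"
    const updates = {};
    updates['roomUsers/' + roomId + '/' + userId] = true;
    updates['userRooms/' + userId + '/' + roomId] = true;
    firebase.database().ref().update(updates);

Because each query direction lives under its own path, security rules can be written per node, which sidesteps the composite-key problem linked above.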

Database efficiency - table per user vs. table of users

For a website having users, where each user can create any number of, we'll call them, "posts":
Efficiency-wise, is it better to create one table for all of the posts, saving with each post the user id of the user who created it, OR to create a separate table for each user and put there just the posts created by that user?
The database layout should not change when you add more data to it, so the user data should definitely be in one table.
Also:
Having multiple tables means that you have to create queries dynamically.
The cached query plan for one table won't be used for any other of the tables.
Having a lot of data in one table doesn't affect performance much, but having a lot of tables does.
If you want to add an index to the table to make queries faster, it's a lot easier to do on a single table.
Well, to answer the specific question: in terms of querying efficiency, it will always be better to have small tables, hence a table per user is likely to be the most efficient.
However, unless you have a lot of posts and users, this is not likely to matter. Even with millions of rows, you will get good performance with a well-placed index.
I would strongly advise against the table-per-user strategy, because it adds a lot of complexity to your solution. How would you query when you need to find, say, users that have posted on a subject within the year?
Optimize when you need to. Not because you think/are afraid something will be slow. (And even if you need to optimize, there will be easier options than table-per-user)
Schemas with a varying number of tables are, generally, bad. Use one single table for your posts.
If performance is a concern, you should learn about database indexes. While indexes are not part of the SQL standard, nearly all databases support them to help improve performance.
I recommend that you create a single table for all users' posts and then add indexes to this table to improve search performance. For example, you can add an index on the user column so that you can quickly find all posts for a given user. You may also want to consider adding other indexes, depending on your application's requirements.
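
A minimal sketch of that single-table layout with the suggested index, using node-postgres (table and column names are illustrative):

    const { Client } = require('pg');
    const client = new Client();

    async function setup() {
      await client.connect();
      // One table holds every user's posts; user_id records who wrote what.
      await client.query(`
        CREATE TABLE posts (
          id      serial PRIMARY KEY,
          user_id integer NOT NULL,
          body    text    NOT NULL
        )
      `);
      // The index makes "all posts by a given user" a cheap lookup.
      await client.query('CREATE INDEX posts_user_id_idx ON posts (user_id)');
      await client.end();
    }
    setup();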
Your first proposal of having a single user and a single post table is the standard approach to take.
At the moment posts may be the only user-specific feature on your site, but imagine that it might need to grow in the future to support users having messages, preferences, etc. Now your separate table-per-user approach leads to an explosion in the number of tables you'd need to create.
I have a similar but different issue with your answer because both #guffa and #driis are assuming that the "posts" need to be shared among users.
In my particular situation: not a single user datapoint can be shared for privacy reason with any other user not even for analytics.
We plan on using mysql or postgres and here are the three options our team is warring about:
N schemas and 5 tables - some of our devs feel that this is the best direction to take to keep the data completely segregated.
Pros - less complexity if you think of a schema as a folder and tables as files; we'll have one schema per user.
Cons - most ORMs do connection pooling per schema.
1 schema and N×5 tables - some devs like this because it allows for connection pooling, but it appears to make the issue more complex.
Pros - connection pooling in the ORM is possible.
Cons - we cannot find an ORM whose Models are set up for this.
1 schema and 5 tables - some devs like this because they think we benefit from caching.
Pros - ORMs are happy because this is what they are designed to do.
Cons - every query requires the username table.
I, personally, land in camp 1: n schemas.
My lead dev lands in camp 3: 1 schema 5 tables.
Caching:
If data is always 1:1, I cannot see how caching will ever help regardless of the solution we use because each user will be searching for different info.
Any thoughts?
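
For concreteness, option 1 (N schemas) might look roughly like this in Postgres via node-postgres; all names are hypothetical, and userId is assumed to be validated server-side since it is interpolated into DDL:

    const { Client } = require('pg');
    const client = new Client();

    // Option 1: one schema per user, with the same five tables inside each.
    // userId must be validated (e.g., numeric) before being spliced into DDL.
    async function provisionUser(userId) {
      await client.query(`CREATE SCHEMA user_${userId}`);
      await client.query(`
        CREATE TABLE user_${userId}.datapoints (
          id    serial PRIMARY KEY,
          value text
        )
      `);
      // ...and the other four tables, likewise.
    }

    // Route all unqualified table names to this user's schema.
    async function queryAsUser(userId) {
      await client.query(`SET search_path TO user_${userId}`);
      return client.query('SELECT * FROM datapoints');
    }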

Address book database design: denormalize?

I'm designing a contact manager/address book-like application but can't settle on the database design.
In my current setup I have a Contact, which has Addresses, Phonenumbers, Emails, and Organizations. All contact properties are currently separate tables with a fk to the Contact table. Needless to say a contact can have any number of these properties.
Now, I find myself joining all these tables together if I want to read contacts into the app. Since no filters, reverse lookups, sorts etc. are performed on the related tables, isn't it a better/simpler solution to just store the related fields as json-encoded lists on direct properties of the Contact table?
E.g., instead of a Contact with a fk to a phonenumber table with 3 entries, just encode all phone numbers and store them in a field of the Contact table?
Any insights really appreciated! (fyi I'm using Django although that doesn't really matter)
Can you guarantee that your app will never grow to need these other functionalities? Do you really want to paint yourself into the corner such that you can't easily support all of this later?
Generally, denormalization happens only for performance reasons. And then, a copy of the normalized data is still kept for live work, and the denormalized data is used for offline processing where having a static snapshot is fine.
Get used to writing joins. That's the way SQL works. Having to do so doesn't mean something is wrong.
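
For reference, the join being recommended stays short. A sketch with node-postgres, assuming contact and phonenumber tables as described in the question:

    const { Client } = require('pg');
    const client = new Client();

    // One round trip fetches a contact together with all of its phone numbers.
    async function contactWithPhones(contactId) {
      await client.connect();
      const { rows } = await client.query(
        `SELECT c.id, c.name, p.number
           FROM contact c
           LEFT JOIN phonenumber p ON p.contact_id = c.id
          WHERE c.id = $1`,
        [contactId]
      );
      await client.end();
      return rows;
    }

    contactWithPhones(1).then(console.log);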
I know I'm too late, but for anyone with the same issue.
IMO, in this case metadata modeling is the way to go.
http://searchdatamanagement.techtarget.com/feature/Data-model-patterns-A-metadata-map
Sounds like you propose taking data currently modelled as five SQL tables and converting it to a common multi-valued type (does your SQL product have good support for this?). The only way I can see this would constitute 'denormalization' would be if you were proposing to violate 1NF, at which point you may as well abandon SQL as a data store, because your data would no longer be relational! Otherwise, your data would still be normalized, but you will have lost the ability to query its attributes using SQL (unless your SQL product has extensions for querying multi-valued attributes). The deciding factor seems to be: do you need to query these attributes using SQL?