Is it a good idea to store lists in document-based databases? - sql

I’m trying to build a mobile app that involves users following each other. I’ve seen posts (here) that say it is a cardinal sin to store a user’s followees and followers as a list in a SQL database, as each “cell” should only store one discrete value.
However, is this the case for NoSQL, document-based databases? What are the pros and cons of storing followers and followees as a list in the user document vs. storing them in a separate collection?
The only ones I can see now are that retrieving the follower/followee data could be faster with the former method, as you don’t have to go through an index over the entire follower/followee collection, unlike the latter method (or is the time difference negligible?). On the other hand, the former method would require 2 writes every time someone follows/unfollows another user, which may be disadvantageous for billing in cloud databases, but might not be a problem if the database is hosted locally (?)
I’m very new to working with databases so I’m hoping to get some insight from more experienced people about long term/large scale effects of this choice. Thanks!
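To make the two layouts concrete, here is a minimal sketch in plain Python, with dicts standing in for documents and collections. It does not use any particular database or SDK, and every name in it is hypothetical.

```python
# Plain dicts stand in for collections; all names are hypothetical.

# Option A: follower/followee lists embedded in each user document.
users = {
    "alice": {"following": [], "followers": []},
    "bob": {"following": [], "followers": []},
}


def follow_embedded(follower: str, followee: str) -> int:
    """Update both user documents; return the number of document writes."""
    users[follower]["following"].append(followee)  # write 1
    users[followee]["followers"].append(follower)  # write 2
    return 2


# Option B: one small document per follow relationship, in its own collection.
follows = {}  # key "follower->followee" maps to one edge document


def follow_separate(follower: str, followee: str) -> int:
    """Insert a single edge document; return the number of document writes."""
    follows[f"{follower}->{followee}"] = {"follower": follower, "followee": followee}
    return 1


def followers_of(user: str) -> list:
    """Option B read; a real database would answer this from an index on 'followee'."""
    return sorted(e["follower"] for e in follows.values() if e["followee"] == user)


writes_a = follow_embedded("alice", "bob")
writes_b = follow_separate("alice", "bob")
```

With option A, reading a user's followers is a single document fetch, but the lists grow with the number of followers and each follow touches two documents. With option B, each follow is one write and documents stay small, at the cost of an indexed query on reads.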

Related

What is best practice to store user data?

I know this is a really basic question, yet I could not find further information. Say I want to develop a basic multi-user application for notes. In my database I have a table where I store user IDs, usernames and passwords.
I now want to store the users' notes, but a user should only be able to see their own notes. What is the best practice for this? The two possibilities that come to my mind are:
1. Create a table for each user where you store their notes (probably scales horribly bad)
2. Have one big notes-table and save the user IDs as secondary keys (It just feels a bit "off" to have everything stored in one big table)
Is one of these two ideas used in this exact way in large scale real-world projects? If so, is there anything else one has to pay attention to?
In general you need the 2nd option.
My advice: please don’t write your own auth functions, because that is very hard to get right for beginners. For this type of application (notes), it is much better to use a serverless backend, e.g. Firebase, Supabase and so on, where you get a database, authentication, record-level security, file storage, etc.
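As a rough illustration of the 2nd option, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are assumptions made up for this example, and a hosted service would add authentication and record-level security on top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL);
    CREATE TABLE notes (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body    TEXT NOT NULL
    );
    -- The index keeps per-user lookups fast as the single table grows.
    CREATE INDEX notes_by_user ON notes(user_id);
""")
conn.executemany("INSERT INTO users (id, username) VALUES (?, ?)",
                 [(1, "ann"), (2, "ben")])
conn.executemany("INSERT INTO notes (user_id, body) VALUES (?, ?)",
                 [(1, "buy milk"), (2, "call mom"), (1, "pay rent")])


def notes_for(user_id: int) -> list:
    """Return only the notes owned by the given user."""
    rows = conn.execute(
        "SELECT body FROM notes WHERE user_id = ? ORDER BY id", (user_id,))
    return [body for (body,) in rows]


ann_notes = notes_for(1)
```

All notes live in one table; the `user_id` column plus its index is what keeps each user's view separate and fast.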

PostgreSQL for multiple users

I am building an app for a workshop at a conference. It will be used by the participants to input answers to a survey on their mobile devices and then these answers will be saved to a database.
I am currently looking at PostgreSQL, and from what I have seen it is extremely capable of handling well over the 100 users I expect to have using the app at one time. What I haven't been able to decide conclusively is whether these 100 people all adding to the same database at once will cause any problems. I have looked into locks and understand that there shouldn't be any conflicts when inserting into tables (which is all the users will be doing), but I just wanted to confirm before moving forward with the app.
I assume it is also important to deploy the app using a hosting service which can handle the load. I am intending to use Heroku, which I have experience deploying PostgreSQL databases to.
Just in case it is relevant I was intending to use Knex.js to build the database in a node backend.
Happy to provide any further information and would appreciate any input or better suggestions to look into.
Cheers,
Tim

vcard vs sql-table for contacts

The other day I was looking at SOGo SQL tables and saw that the records are stored as vcard data instead of a fine table with different columns like surname, phone number, etc.
Though there is a table called sogo_quick_contacts with the schema I was expecting, not all the columns are there, only some basic ones.
I'm wondering why it is that way. Is it better to query a record with the whole vCard data and extract the information I require? Wouldn't it be better (faster) to run a SELECT query naming just the columns I'm looking for, if they were available?
CardDAV seems to provide this vCard data. Is it more suitable for contact lookups, and if so, why?
What if I just want to list the names and birthdays? Wouldn't extracting all the vCards be much slower than using a SQL query where I have everything split up into different columns?
There are a lot of things which played a role in the way the ScalableOGo database schema is designed. Which BTW was designed by me ;-)
I think the core thing here is that it is designed specifically for two types of clients: a) native CardDAV clients (macOS/iOS contacts, Thunderbird) and b) the ScalableOGo web interface.
Native clients essentially never do the type of query you are asking about. They always sync a full vCard to their local cache. So there has to be a fast way to store and retrieve a full vCard, it is the most common operation against the server.
Web clients in 2003 (I suppose that was around the time I wrote the original web client) didn't yet have the capacity to store full objects locally and had to do what you are asking for: query just the fields the web client needs to display on a respective page.
This is what the 'quick' tables are for. They contain the columns the web client needs to display overviews and such. It is essentially an app-server-provided index over the vCard content.
This should be the main answer to your question.
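A minimal sketch of that idea, using Python's built-in sqlite3 module: the full vCard is stored untouched as a BLOB, and a small 'quick' table holds the few fields an overview page needs. The table names, column names and the naive field extraction are assumptions for illustration only, not the actual SOGo schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (name TEXT PRIMARY KEY, vcard BLOB NOT NULL);
    CREATE TABLE quick_contacts (name TEXT PRIMARY KEY, full_name TEXT, birthday TEXT);
""")


def quick_fields(vcard: bytes) -> dict:
    """Naive line-based extraction of FN and BDAY. A real parser must also
    handle folded lines, property parameters and escaping."""
    fields = {}
    for line in vcard.decode("utf-8").splitlines():
        key, _, value = line.partition(":")
        if key in ("FN", "BDAY"):
            fields[key] = value
    return fields


def store(name: str, vcard: bytes) -> None:
    # The blob is written as-is; the quick row is derived from it.
    conn.execute("INSERT INTO contacts VALUES (?, ?)", (name, vcard))
    f = quick_fields(vcard)
    conn.execute("INSERT INTO quick_contacts VALUES (?, ?, ?)",
                 (name, f.get("FN"), f.get("BDAY")))


card = (b"BEGIN:VCARD\r\nVERSION:3.0\r\nFN:Ada Lovelace\r\n"
        b"BDAY:1815-12-10\r\nX-CUSTOM:kept as-is\r\nEND:VCARD\r\n")
store("ada.vcf", card)

# Sync-style read: the full card comes back byte for byte.
stored = conn.execute(
    "SELECT vcard FROM contacts WHERE name = ?", ("ada.vcf",)).fetchone()[0]
# Overview-style read: names and birthdays without touching the blobs.
overview = conn.execute("SELECT full_name, birthday FROM quick_contacts").fetchall()
```

In this sketch the quick row is written at the same moment as the blob; it could equally be filled in later by a background job, since it is derived entirely from the stored vCard.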
There are other reasons too, in no particular order:
- A vCard is quite complex. Converting it to a proper SQL schema / normalising it is (was at the time, but this is still relevant, since the scale of systems grew 100-fold over the last 15 years) quite compute intensive (hence OpenGroupware.org vs ScalableOGo). A BLOB just needs to be streamed to disk.
- A CardDAV server is supposed to store a full vCard as-is, byte-by-byte, so that clients can do ETag-protected requests and store custom fields (all clients use their own X- tags for client-specific fields).
- The quick tables are also set up so that they can be built asynchronously, though I think that feature never made it into SOGo. If a client quickly loads 10000 vCards into the server (e.g. just dragging the vCards into the server using Finder), the server can batch-update the quick table in the background. The vCard to DB conversion doesn't have to happen in real time.
(Notably, native clients often have a similar 'quick' table setup locally.)
Hope this helps. Maybe one would design the thing a little different in 2017, though I think the basic ideas are still sound ;-)

Data modeling Issue for Moqui custom application

We are working on a custom project management application on top of the Moqui framework. Our requirement is to notify the developers associated with a project by email about any changes to a ticket.
Currently we use the WorkEffortParty entity to store all parties associated with the project and the PartyContactMech entity to store their email addresses. Every time a ticket changes, we have to iterate through WorkEffortParty and PartyContactMech to fetch all the email addresses to notify.
To avoid these iterations, we are now thinking of adding a feature to store comma-separated email addresses at the project level. The project admin could add the email addresses of associated parties, or a mailing list address, to which ticket change notifications should be sent.
For this requirement, we studied the data model but didn't find the right place to store this information. Do we need to extend an entity for this, or is there a best practice? This requirement would be useful in any project management application. We appreciate any help on this data modeling problem.
The best practice is to use existing data model elements as they are available. Having a normalized data model involves more work in querying data, but also more flexibility in addressing a wide variety of requirements without changes to the data structures.
In this case with a joined query you can get a list of email addresses in a single query based on the project's workEffortId. If you are dealing with massive data and message volumes there are better solutions than denormalizing source data, but I doubt that's the case... unless you're dealing with more than thousands of projects and millions of messages per day the basic query and iterate approach will work just fine.
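As a rough sketch of what such a joined query looks like, here is a simplified stand-in using Python's built-in sqlite3 module. The two tables are hypothetical and heavily reduced: the real entities carry more fields, the email address itself may live on a related entity, and this is plain SQL rather than the Moqui entity API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical stand-ins for WorkEffortParty and PartyContactMech.
    CREATE TABLE work_effort_party (work_effort_id TEXT, party_id TEXT);
    CREATE TABLE party_contact_mech (party_id TEXT, email TEXT);
""")
conn.executemany("INSERT INTO work_effort_party VALUES (?, ?)",
                 [("PROJ_1", "dev_a"), ("PROJ_1", "dev_b"), ("PROJ_2", "dev_c")])
conn.executemany("INSERT INTO party_contact_mech VALUES (?, ?)",
                 [("dev_a", "a@example.com"), ("dev_b", "b@example.com"),
                  ("dev_c", "c@example.com")])


def project_emails(work_effort_id: str) -> list:
    """One joined query instead of iterating over two entities in code."""
    rows = conn.execute("""
        SELECT DISTINCT pcm.email
        FROM work_effort_party AS wep
        JOIN party_contact_mech AS pcm ON pcm.party_id = wep.party_id
        WHERE wep.work_effort_id = ?
        ORDER BY pcm.email
    """, (work_effort_id,))
    return [email for (email,) in rows]


emails = project_emails("PROJ_1")
```

The source data stays normalized, so adding or removing a party from the project changes the notification list without any extra bookkeeping.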
If you need to go beyond that the easiest approach with Moqui is to use a DataDocument and DataFeed to send updates on the fly to ElasticSearch, and then use it for your high volume queries and filtering (with arbitrarily complex filtering, etc requirements).
Your question is too open to answer directly: data modeling is a complex topic, and without a good understanding of the context and intended usage there are no good answers. In general it's best to start with a data model based on decades of experience and used in a large number of production systems. The Mantle UDM is one such model.

Multi-tenancy with SQL/WCF/Silverlight

We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
- Easy upgrades
- Data security
- Scalability
Three different models to consider are listed on MSDN:
1. Separate databases. This is not easy to maintain, as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
2. Shared Database, Shared Schema. A TenantID column is added to each table. Ensuring that each customer gets the correct data is potentially dangerous. Easy to maintain and scales well (?).
3. Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
It depends on the type of application and the scale of the data. Each one has downsides.
1a) Separate databases + single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time, what if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is that if your application is data intensive, you have to worry about data scalability. For example, if every customer is expected to have tens or hundreds of millions of rows of data, you will start to run into query performance issues for individual customers due to the total volume across your customer base. Clients are more forgiving of slowdowns caused by their own data volume. Being told it's slow because the other 99 clients' data is large is generally a no-go.
Unless you know for a fact you will be dealing with huge data volumes from the start I would probably go with #2 for now, and begin looking at clustering or moving to 1a/b setup if needed in the future.
We also have a SaaS product and we use solution #2 (Shared DB / Shared Schema with TenantId). Some things to consider for Shared DB / same schema for all:
- As mentioned above, a high volume of data for one tenant may affect the performance of the other tenants if you're not careful. For starters, index your tables properly/carefully and never ever run queries that force a table scan. Monitor query performance, and at least plan/design to be able to partition your DB later on based on some criteria that make sense for your domain.
- Data separation is very, very important: you don't want to end up showing one tenant a piece of data that belongs to another tenant. Every query must have a WHERE TenantId = ... in it, and you should be able to verify/enforce this during dev.
- Extensibility of the schema is something that solutions 1 and 3 may give you, but you can get around it by designing a way to extend the fields associated with the documents/tables in your domain where that makes sense (i.e. metadata for tables, as the MSDN article mentions).
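One way to enforce the tenant filter is to route all reads through a small wrapper that always adds it. The following is a minimal sketch using Python's built-in sqlite3 module; the `TenantDb` class, the `invoices` table and the column names are all hypothetical and only illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, amount REAL)")
# Index on the tenant column so tenant-scoped reads avoid a full table scan.
conn.execute("CREATE INDEX invoices_by_tenant ON invoices(tenant_id)")
conn.executemany("INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
                 [(1, 10.0), (2, 99.0), (1, 5.5)])


class TenantDb:
    """Hypothetical wrapper: the only read path, always scoped to one tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int):
        self._conn = conn
        self._tenant_id = tenant_id

    def select(self, table: str, columns: str, where: str = "1 = 1", params: tuple = ()):
        # table, columns and where are written by developers, never taken from
        # user input; values always travel as bound parameters.
        sql = f"SELECT {columns} FROM {table} WHERE tenant_id = ? AND ({where})"
        return self._conn.execute(sql, (self._tenant_id, *params)).fetchall()


tenant1 = TenantDb(conn, 1)
amounts = tenant1.select("invoices", "amount", "amount > ?", (6,))
all_tenant2 = TenantDb(conn, 2).select("invoices", "amount")
```

Because application code never builds a query without going through `select`, a forgotten tenant filter becomes a code-review question about one class rather than about every query in the codebase.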
What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I have a similar case, and my solution takes advantage of both approaches.
Where the data lives and how it is stored is the tenant's concern. As a tenant, of course I don't want my data to be shared; I want my data isolated and secure, and I want to be able to get it any time I want.
Certain data can be shared, e.g. a company list. So there should be a global database plus per-tenant databases; just make sure to lock down the tenant database schema in operation, and have a procedure to update all tenant databases at once.
Anyway, in the SaaS model everything is delivered as a server / web service, so no matter where the database lives, the data should reach the client as a service and only be rendered by the client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.