So I have what would seem like a common question, but I can't seem to find an answer to it: what is the "best practice" for architecting a database that maintains data locally and then syncs that data to a remote database shared between many clients?
For example, say I had a desktop application that stored to-do lists (in SQL), each with individual items. I then want to be able to send that data to a web service that holds a "master" copy of all the different clients' information. I'm not worried about sync conflicts so much as the actual architecture of the client's tables and the web service's tables.
Here's an example of how I was thinking about it:
Client Database
list
--list_client_id (primary key, auto-increment)
--list_name
list_item
--list_item_client_id (primary key, auto-increment)
--list_id
--list_item_text
Web Based Master Database (Shared between many clients)
list
--list_master_id
--list_client_id (primary key, auto-increment)
--list_name
--user_id
list_item
--list_item_master_id (primary key, auto-increment)
--list_item_remote_id
--list_id
--list_item_text
--user_id
The idea is that the client can create to-do lists with items and sync them with the web service at any given time (i.e. if they lose data connectivity and aren't able to send the information until later, nothing will get out of order). The web service would store the records, keeping the clients' IDs as extra fields.
That way, the client can say "update list number 4 with a new name" and the server takes this to mean "update user 12's list number 4 with a new name".
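For instance, the server-side translation might look something like the following sketch (assuming the master schema above and MySQL-style SQL; the values are made up):

-- "Update user 12's list number 4 with a new name"
UPDATE list
SET    list_name = 'Groceries'
WHERE  user_id        = 12
  AND  list_client_id = 4;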
I think the general concept you're working with is headed in the right direction, but you may need to pay careful attention to the use of auto-increment columns. For example, auto-increment on the server is useless if the client is the owner of that ID. Instead, you probably want list.list_master_id to be the auto-increment. Everything else you've mentioned is entirely plausible, though the complexity may increase if there can be multiple clients per user. In that case an auto-increment alone probably isn't sufficient; instead, you may need a GUID, or a key that also includes a client identifier, to prevent ID collisions.
Without having more details it would be difficult to speculate on what other situations you may need to consider.
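As a rough sketch of that suggestion (assuming MySQL-style SQL; the column names are only illustrative), the server could own the master ID while uniqueness is enforced on the combination of user, client, and client-assigned ID:

CREATE TABLE list (
    list_master_id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- owned by the server
    user_id         INT NOT NULL,
    client_id       INT NOT NULL,             -- which client/device created the row
    list_client_id  INT NOT NULL,             -- the client's own auto-increment value
    list_name       VARCHAR(255),
    UNIQUE KEY uq_list_origin (user_id, client_id, list_client_id)
);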
SERVER:
list
--id
--name
--user_id
--updated_at
--created_from_device_id
The following two tables link all records to devices; they could also be combined into one table.
list_ids
--list_id
--device_id
--device_record_id
user_ids
--user_id
--device_id
--device_record_id
CLIENT (device_id=5)
list
--id
--name
--user_id
--updated_at
That will allow you to save records as follows (only showing relevant fields):
server
list: id=1, name=shopping, user_id=1234
user: id=27, name=John Doe
list_ids: list_id=1, device_id=5, device_record_id=999
user_ids: user_id=27, device_id=5, device_record_id=567
client
id=999, name=shopping, user_id=567
This way the clients are completely unaware of the server's IDs, translations can be done quickly, and you can supply each client with only the information and IDs it knows about.
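As a rough sketch of that translation (assuming MySQL-style SQL and the tables above), looking up the server-side list for a record a client refers to by its own ID could be a simple join:

-- Client with device_id = 5 refers to "list 999"; find the server's copy
SELECT l.*
FROM   list_ids li
JOIN   list l ON l.id = li.list_id
WHERE  li.device_id        = 5
  AND  li.device_record_id = 999;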
I had the same issue with a project I am working on; the solution in my case was to create an extra nullable field in the local tables named remote_id. When synchronizing records from the local to the remote database, a null remote_id means the row has never been synchronized, and the server needs to return a unique ID matching the remote row's ID.
Local Table              Remote Table
_id (used locally)
remote_id -------------- id
name ------------------- name
In the client application I link tables by the _id field; remotely I use the remote_id field to fetch data, do joins, etc.
example locally:

Local Client Table        (link table)                        Local ClientType Table
                          _id
                          remote_id
_id --------------------- client_id
remote_id                 client_type_id -------------------- _id
                                                              remote_id
name                      name                                name

example remotely:

Remote Client Table       (link table)                        Remote ClientType Table
id ---------------------- client_id
                          client_type_id -------------------- id
name                      name                                name
This scenario, without any supporting logic in the code, would cause data integrity failures, as the client_type table may not match the real ID in either the local or the remote tables. Therefore, whenever a remote_id is generated, the server returns a signal to the client application asking it to update the local _id field; this fires a previously created trigger in SQLite that updates the affected tables.
http://www.sqlite.org/lang_createtrigger.html
1. remote_id is generated on the server.
2. The server returns a signal to the client.
3. The client updates its _id field and fires a trigger that updates the local tables joined on the local _id.
Of course, I also use a last_updated field to help with synchronization and to avoid duplicate syncs.
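As a rough illustration of step 3 only (this is a sketch, not the project's actual trigger; the related table and column names are hypothetical), a SQLite trigger along these lines could propagate the server-assigned ID once the client records it:

-- Hypothetical sketch: when a local client row receives its server-generated ID,
-- copy it into the related rows that reference that client.
CREATE TRIGGER client_remote_id_received
AFTER UPDATE OF remote_id ON client
BEGIN
    UPDATE client_type_link               -- hypothetical related table
    SET    client_remote_id = NEW.remote_id
    WHERE  client_id = NEW._id;
END;

-- Step 3 on the client, after the server returns (for example) ID 42 for local row 7:
UPDATE client SET remote_id = 42 WHERE _id = 7;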
Related question
What is the accepted way of using multiple databases that record information about the same object that will ultimately end up living in one central database?
Example
There is one main SQL database about trees.
This database holds information about unique trees from all over the UK.
To collect the information, a blank SQLite database is created (with the same schema) and taken to the tree on a phone.
The collected information is stored in the SQLite database until it is brought back and transferred into the main database.
Now this works fine as long as there is only one SQLite database out for any one tree at a time.
However, if two people wanted to collect different information for the same tree at the same time, then when they both came back and attempted to transfer their data into the main database, there would be collisions on their primary key constraints.
ID Schemes (with example data)
There is a Tree table which has a unique identifier called TreeID.
TreeID - TreeName - Location
1001 - Teddington Field - Plymouth
Branch table
BranchID - BranchName - TreeID
1001-10001 - 1st Branch - 1001
1001-10002 - 2nd Branch -1001
Leaf table
LeafID - LeafName - BranchId
1001-10001-1 - Bedroom - 1001-10001
1001-10002-2 - Bathroom - 1001-10001
Possible ideas
Assign each database 1000 unique IDs; then, when they come back in, the IDs from each database won't collide because they were preassigned.
Downfall
This isn't very dynamic and could fail if one database overruns on its preassigned ids.
Is there another way to achieve the same flexibility but without the downfall mentioned above?
So, as an answer:
on the master db, store an extra id field identifying the source/collection database that the dataset was collected on, as well as the tree id.
(src01, 1001), (src02, 1001)
This also allows you to link back easily to the collection source of the information, which is likely going to be a future requirement. Now, you may or may not want to autogenerate another sequential ID key value on the master DB's table (I wouldn't, but that's because I am not that fond of surrogate keys), but I would definitely keep track of the source/tree ID it was originally collected with in the field, separately from any unique-key considerations on the master DB.
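For example (only a sketch; the column names and sizes are made up), the master table could carry the source identifier alongside the field-assigned tree ID and enforce uniqueness on the pair:

CREATE TABLE tree (
    source_id VARCHAR(10) NOT NULL,    -- e.g. 'src01', 'src02'
    tree_id   INT         NOT NULL,    -- the TreeID assigned in the field
    tree_name VARCHAR(100),
    location  VARCHAR(100),
    PRIMARY KEY (source_id, tree_id)   -- (src01, 1001) and (src02, 1001) can coexist
);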
Apparently you are talking about auto-generated IDs for related objects, not the IDs for the trees themselves. Two different people collecting information about the same tree, starting from the same starting set, end up generating the same IDs independently. The two sets of generated IDs cannot coexist in the same DB.
Since you want to keep all the new data, one possible solution is to avoid using the field-generated IDs in the central database at all. When each set of data comes in, take the rows that were added in the field and programmatically add them to the central DB in a way equivalent to how they were added in the field, letting the central DB autogenerate its own IDs.
This requires a mechanism to distinguish newly-collected data from old, but that might be as simple as a timestamp.
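A rough sketch of that idea (assuming MySQL-style SQL, a staging copy of the field data, and a created_at timestamp; all names are illustrative): copy only the rows collected since the last sync and deliberately leave out the field-generated IDs so the central table assigns fresh ones.

-- Rows pulled from the field database, inserted without their field-generated BranchIDs
INSERT INTO branch (tree_id, branch_name)
SELECT tree_id, branch_name
FROM   field_branch                 -- staging copy of the SQLite data
WHERE  created_at > @last_sync;     -- the timestamp distinguishes newly collected rows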
What is the best table design for a simple social networking website using Azure Table Service?
The website could have millions of users.
Users need to be able to view a list of all other users in the system sorted by the number of mutual connections.
Users must be able to view a list of their connections.
Users must be able to view content posted by themselves and their connections.
One major design constraint is that Azure table service queries are generally limited to the partition key and row key when there are a large number of records or else they get really slow. Another constraint is that query results are only sorted by the partition key and then the row key.
Try this Design:
UserTable
PK: GUID (a GUID for the PK will maximize scalability: only one partition, with a single row, on each server)
RK: GUID
... Rest of properties
UserFriendsTable
PK: UserTable.RK (every user, with his friends, in a separate server)
RK: GUID
FriendWith: UserTable.PK - UserTable.RK (concatenate the PK and RK from the user table, separated with "-"; this will help you execute a fast point query when you access a friend's profile)
PostsTable
PK: UserTable.RK + "-" + YYYYMM + random number (this will allow Azure to put each user's monthly posts in a separate server; the random number prevents Azure from automatically grouping partitions in sequence. You can query posts by filtering on part of the PK, e.g. PK starts with XCtghi94ktY-201411.)
RK: use the following code to generate the row key in descending order, so the latest post comes first.
// Reverse the tick count so that newer posts get smaller values and sort first
long ticks = DateTimeOffset.MaxValue.UtcDateTime.Ticks - DateTimeOffset.Now.UtcDateTime.Ticks;
string guid = Guid.NewGuid().ToString("N"); // guarantees uniqueness if two posts share the same tick
string suffix = "-";
string rowKey = string.Format("{0:d21}{1}{2}", ticks, suffix, guid);
Post : String
Hi everyone, I have a project I am working on that consists of keeping tables the same at 3 different locations.
I pull data that doesn't exist from each of these locations into a corporate table; I then need to send the new data back down to the locations so they are all the same.
The table I am pulling from uses an IDENTITY column.
My question is: in SQL, is there any way to make a table act like an identity without actually making it an identity, such as setting the default value to max(id)+1? That is the only way I can figure to keep the data structure the same without going to replication.
The problem is that you're generating records in an IDENTITY field in multiple sources, then unable to combine them without those records being assigned new IDENTITY values.
By using a GUID as your key field, each of the 3 databases can create records which will have a unique ID, and you'll be able to then combine them without issue. You can still have a UNIQUE constraint on the field, but the likelihood of generating the same GUID is astronomically small.
Most replication processes utilize this GUID approach at some level already, so it's a common solution to this problem.
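A minimal sketch of what that could look like (assuming SQL Server, since the question mentions IDENTITY; the table and constraint names are made up):

CREATE TABLE corporate_item (
    id   UNIQUEIDENTIFIER NOT NULL
         CONSTRAINT df_corporate_item_id DEFAULT NEWID()
         CONSTRAINT pk_corporate_item PRIMARY KEY,
    name NVARCHAR(100) NOT NULL
);
-- Each of the 3 locations can insert rows independently; the generated GUIDs will not
-- collide, so the rows can later be merged into the corporate table unchanged.

(NEWSEQUENTIALID() is sometimes preferred over NEWID() as a default to reduce index fragmentation; either avoids the collision problem.)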
The thing I'm trying to implement is an ID table. Basically it has the structure (user_id, lecturer_id), where user_id refers to the primary key in my User table and lecturer_id refers to the primary key of my Lecturer table.
I'm trying to implement this in Redis, but if I set the key to the user's primary ID, then when I try to run a query like "get all the records with lecturer_id = 5", I won't be able to reach them in O(1) time, since the lecturer is the value rather than the key.
How can I form a structure like the ID table I mentioned above, or does Redis not support that?
One of the things you learn quickly while working with Redis is that you design your data structures around your access patterns, especially when it comes to relations (it's not a relational database, after all).
There is no way to search by "value" with O(1) time complexity, as you already noticed, but there are ways to approach what you describe using Redis. Here's what I would recommend:
Store your user data by user id (in e.g. a hash) as you are already doing.
Have an additional set for each lecturer id containing all user ids that correspond to the lecturer id in question.
This might seem like duplicating the relation's data, since your user data would have to store the lecturer ID and your lecturer data would store user IDs, but that's the (tiny) price to pay if one is to build relations in a non-relational data store like Redis. In practical terms this works well; memory is rarely a bottleneck for small-ish data sets (think thousands of IDs).
To get a better picture of how people use Redis to model applications with relations, I recommend reading Design and implementation of a simple Twitter clone and the source code of Lamernews, both of which were written by Redis author Salvatore Sanfilippo.
As already answered, in vanilla Redis there is no way to store the data only once and have Redis query it for you.
You have to maintain secondary indexes yourself.
However, with Redis modules this is not necessarily true. Modules like zeeSQL or RediSearch allow you to store data directly in Redis and retrieve it with a SQL query (for zeeSQL) or a SQL-like query (for RediSearch).
In your case, here is a small example with zeeSQL:
> ZEESQL.CREATE_DB DB
OK
> ZEESQL.EXEC DB COMMAND "CREATE TABLE user(user_id INT, lecture_id INT);"
OK
> ZEESQL.EXEC DB COMMAND "SELECT * FROM user WHERE lecture_id = 3;"
... your result ...
Earlier today I asked this question which arose from A- My poor planning and B- My complete disregard for the practice of normalizing databases. I spent the last 8 hours reading about normalizing databases and the finer points of JOIN and worked my way through the SQLZoo.com tutorials.
I am enlightened. I understand the purpose of database normalization and how it can suit me. Except that I'm not entirely sure how to execute that vision from a procedural standpoint.
Here's my old vision: 1 table called "files" that held, let's say, a file ID, a file URL, and the appropriate grade levels for that file.
New vision!: 1 table for "files", 1 table for "grades", and a junction table to mediate.
But that's not my problem. This is a really basic Q that I'm sure has an obvious answer- When I create a record in "files", it gets assigned the incremented primary key automatically (file_id). However, from now on I'm going to need to write that file_id to the other tables as well. Because I don't assign that id manually, how do I know what it is?
If I upload text.doc and it gets file_id 123, how do I know it got 123 in order to write it to "grades" and the junction table? I can't do a max(file_id) because if you have concurrent users, you might nab a different id. I just don't know how to get the file_id value without having manually assigned it.
You may want to use LAST_INSERT_ID() as in the following example:
START TRANSACTION;
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
INSERT INTO grades (file_id, grade) VALUES (LAST_INSERT_ID(), 'some-grade');
COMMIT;
The transaction ensures that the operation remains atomic: either both inserts complete successfully or neither does. It is optional, but recommended in order to maintain the integrity of the data.
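If you also need the same generated ID for the junction table mentioned in the question, one variant (a sketch; the files_grades junction table and the grade_id value are assumed) is to capture it in a user variable first:

START TRANSACTION;
INSERT INTO files (file_id, url) VALUES (NULL, 'text.doc');
SET @new_file_id = LAST_INSERT_ID();                 -- remember the auto-generated file_id
INSERT INTO grades (file_id, grade) VALUES (@new_file_id, 'some-grade');
INSERT INTO files_grades (file_id, grade_id) VALUES (@new_file_id, 3);  -- 3 is a placeholder grade_id
COMMIT;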
For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a nonmagic value (that is, a value that is not NULL and not 0).

Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed.
Source and further reading:
MySQL Reference: How to Get the Unique ID for the Last Inserted Row
MySQL Reference: START TRANSACTION, COMMIT, and ROLLBACK Syntax
In PHP, to get the automatically generated ID of a MySQL record, use the mysqli->insert_id property of your mysqli object.
How are you going to find the entry tomorrow, after your program has forgotten the value of last_insert_id()?
Using a surrogate key is fine, but your table still represents an entity, and you should be able to answer the question: which measurable properties define this particular entity? The set of these properties is the natural key of your table, and even if you use surrogate keys, such a natural key should always exist and you should use it to retrieve information from the table. Use the surrogate key to enforce referential integrity, for indexing purposes, and to make joins easier on the eye. But don't let it escape from the database.
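To make that concrete with the files example from the question (a sketch; the column size is arbitrary): keep the surrogate file_id for joins, but also declare the natural key so a row can be found again tomorrow without remembering the generated value.

CREATE TABLE files (
    file_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- surrogate key: joins, foreign keys
    url     VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_files_url (url)                              -- natural key: how you find the file later
);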