I need to store some user data in a server database. It is not sensitive data, but neither I nor other users should be able to read it. I want the owner to be confident that even I or other developers cannot read their data by querying the database.
The table structure will be like this: (ID | Data), where Data contains an encrypted JSON string.
I would like to do encryption/decryption on the client, using a secret key that is not persisted anywhere (the user must keep this password; otherwise they will not be able to recover their data).
Do you see any pitfall in this approach?
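One way to make "the key is never persisted" concrete is to derive the encryption key from the user's password on the client. A minimal sketch using Python's standard library, assuming PBKDF2-SHA256 for key stretching (the salt and iteration count are illustrative choices, not prescribed by the question):

```python
import hashlib
import os

def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the user's password.
    Only the salt is stored alongside the ciphertext; the password
    (and therefore the key) never leaves the client."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 600_000)

salt = os.urandom(16)                              # safe to store in the database
key = derive_key("user-secret-passphrase", salt)   # hypothetical passphrase
assert len(key) == 32
# Feed `key` into an AEAD cipher (e.g. AES-GCM) to encrypt the JSON string.
# Losing the password makes the data unrecoverable by design.
```

The main pitfalls this exposes: a forgotten password means permanent data loss, a weak password can be brute-forced offline against the stored ciphertext, and the server can never index or search the encrypted column.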
I want to create an anonymized version of a table where one of the id fields (long) needs to be anonymized.
The table is queried by a huge number of different business stakeholders, so I would prefer not to change the field type, in order to minimize SQL changes for consumers.
I guess it requires some sort of HMAC-like hash algorithm with a secret that makes the mapping fully one-way after the secret is deleted/forgotten.
This sounds like something that one should not roll yourself.
It has to be secure and have very few collisions.
Is there something recommended by GDPR specialists?
Or is this not really possible, meaning we would need to change the field to a larger string type?
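One commonly suggested construction for this is a keyed pseudonymization: HMAC the id with a secret, then truncate the digest so it still fits the existing column type. A sketch under the assumption that the field is a signed 64-bit long (the secret value and the 63-bit truncation are illustrative):

```python
import hashlib
import hmac

SECRET = b"rotate-then-destroy-me"  # hypothetical secret; deleting it makes the mapping one-way

def pseudonymize(user_id: int) -> int:
    """Map a long id to another long via HMAC-SHA256 truncated to 63 bits,
    so the result still fits a signed 64-bit column."""
    digest = hmac.new(SECRET, str(user_id).encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") & 0x7FFF_FFFF_FFFF_FFFF

assert pseudonymize(42) == pseudonymize(42)   # deterministic for joins
assert pseudonymize(42) != pseudonymize(43)
```

Note the trade-off the question anticipates: truncating to 63 bits means collisions become likely only around the birthday bound (roughly 2^31.5 distinct ids), which is acceptable for many tables but worth checking against your row count before committing to the fixed-width type.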
I'm new to databases and I'm thinking of creating one for a website. I started with SQL, but I really am not sure if I'm using the right kind of database.
Here's the problem:
What I have right now is the first option. So that means that, my query looks something like this:
user_id  photo_id  photo_url
0        0         abc.jpg
0        1         123.jpg
0        2         lol.png
etc.. But to me that seems a little bit inefficient when the database becomes BIG. So the thing I want is the second option shown in the picture. Something like this, then:
user_id photos
0 {abc.jpg, 123.jpg, lol.png}
Or something like that:
user_id photo_ids
0 {0, 1, 2}
I couldn't find anything like that; I only find ordinary SQL. Is there any way to do something like that (even if it isn't considered a "database")? If not, why is SQL more efficient for those kinds of situations? How can I make it more efficient?
Thanks in advance.
Your initial approach of having a user_id, photo_id, photo_url is correct. This is the normalized design that relational database management systems expect.
The following relationship is called "one to many," as a user can have many photos.
You may want to go as far as separating the photo details and just providing a reference table between the users and photos.
The reason your second approach is inefficient is that relational databases are not designed to store or search multiple values in a single column. While it's possible to store data in this fashion, you shouldn't.
If you wanted to locate a particular photo for a user using your second approach, you would have to search using LIKE, which will most likely not make use of any indexes. The process of extracting or listing those photos would also be inefficient.
You can read more about these basic database normalization principles in any introduction to relational design.
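The normalized one-to-many layout described above can be sketched end to end with SQLite (the column and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE photos (
        photo_id  INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES users(user_id),
        photo_url TEXT NOT NULL
    );
    CREATE INDEX idx_photos_user ON photos(user_id);
""")
conn.execute("INSERT INTO users VALUES (0)")
conn.executemany("INSERT INTO photos VALUES (?, ?, ?)",
                 [(0, 0, "abc.jpg"), (1, 0, "123.jpg"), (2, 0, "lol.png")])

# Thanks to the index on user_id, this lookup stays fast as the table grows,
# and no LIKE pattern matching is needed:
urls = [r[0] for r in conn.execute(
    "SELECT photo_url FROM photos WHERE user_id = ? ORDER BY photo_id", (0,))]
print(urls)  # ['abc.jpg', '123.jpg', 'lol.png']
```

Compare this with the multi-value column: to find one photo there you would have to pattern-match inside the string, which cannot use the index.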
Your first example looks like a traditional relational database, where a table stores a single record per row in a standard 1:1 key-value attribute set. This is how data is stored in RDBMSs like Oracle, MySQL and SQL Server. Your second example looks more like a document database or NoSQL database, where data is stored in nested data objects (like hashes and arrays). This is how data is stored in database systems like MongoDB.
There are benefits and costs to storing data in either model. With relational databases, where data is spread across multiple tables and linked by keys, it is easy to get at data from multiple angles and aggregate it for multiple purposes. With document databases, data is typically more difficult to join in single queries, but much faster to retrieve, and also typically formatted for quicker application use.
For your application, the latter (document database model) might be best if you only care about referencing a user's images when you have a user ID. This would not be ideal for say, querying for all images of category 'profile pic' or for all images uploaded after a certain date. You could probably accomplish your task with either database type, and choosing the right database will always depend on the application(s) that it will be used for, but as a general rule-of-thumb, relational databases are more flexible and hard to go wrong with.
What you want (having user -> (photo1, photo2, ...)) is essentially what an INDEX gives you:
When you execute your query, the engine looks up "user" in the index on the photos table and fetches just the matching photo rows. The whole table is not scanned; it's optimised.
I would do something like this:
One User - One Photo
Users_Table, with all the columns that every user will have. If each user will have only one photo, just add a photo_url column to this table.
One User - Many Photos
If one user can have multiple photos, create a separate table for photos that contains the UserID from Users_Table plus a Photo_ID and Photo_File.
Many Users - Many Photos
If one photo can be assigned to multiple users, create a separate Photos table with PhotoID and Photo_File, and a third table, User_Photos, which has the UserID from Users_Table and the PhotoID from the Photos table.
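The many-to-many case with the User_Photos junction table can be sketched like this in SQLite (table and column names follow the answer; the sample data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users_Table (UserID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Photos (PhotoID INTEGER PRIMARY KEY, Photo_File TEXT);
    CREATE TABLE User_Photos (
        UserID  INTEGER REFERENCES Users_Table(UserID),
        PhotoID INTEGER REFERENCES Photos(PhotoID),
        PRIMARY KEY (UserID, PhotoID)
    );
""")
conn.execute("INSERT INTO Users_Table VALUES (1, 'alice'), (2, 'bob')")
conn.execute("INSERT INTO Photos VALUES (10, 'group.jpg')")
# The same photo assigned to two users: just two junction rows.
conn.executemany("INSERT INTO User_Photos VALUES (?, ?)", [(1, 10), (2, 10)])

rows = conn.execute("""
    SELECT u.Name FROM Users_Table u
    JOIN User_Photos up ON up.UserID = u.UserID
    WHERE up.PhotoID = 10 ORDER BY u.Name
""").fetchall()
print([r[0] for r in rows])  # ['alice', 'bob']
```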
I have a database table that I'd like to prevent a user from modifying the values/rows. How can I accomplish this?
Here are some criteria:
The table to be protected has a single column with data stored in plain text.
Other columns can be added to the table if needed to help protect the single column.
My application needs to have the ability to add, edit, and delete the values/rows in the table.
For this question, I am assuming the user has full and direct admin/read/write access to the database, i.e. the user can log into the database and execute queries directly.
If the user changes the values directly in the database, my application needs to flag that this has happened when it inspects the table.
There are other tables in the database, but they do not need to be protected in this manner. They can be used, if needed, to help protect the first table.
A database engine agnostic solution would be nice, but I am using SQL Server 2005 or later.
For example:
Let's say my table has 3 rows with data "A", "B", "C". My application should be able change the values to "A", "B", "D", but not my user (through direct modification of the database). Additionally, I expect that if my application changes the values to "A", "B", "D", the user cannot edit the table directly to go back to "A", "B", "C". If that happens, the application will flag that the table has been tampered with.
All I can think of here would be some sort of MAC or signature scheme:
Derive a hash from the data in the columns you want to protect, plus a secret value, to obtain a message authentication code, and store that in another column ...
Your application can recalculate that MAC whenever you want to test for integrity ...
Problem: that secret needs to be stored somewhere ...
You can also set this up as a digital signature scheme, where your application only holds the verification key and a service somewhere will timestamp, hash and sign the data for your application ... that way, you only have to keep the secret key of that service a real secret ...
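The MAC-column idea can be sketched with the standard library like so. This is a minimal sketch, assuming the secret lives only in the application (the secret value and the `row_id|value` message format are illustrative choices):

```python
import hashlib
import hmac

APP_SECRET = b"held-by-the-application-only"  # hypothetical; never stored in the database

def row_mac(row_id: int, value: str) -> str:
    """MAC over the row id and the protected column, stored in an extra column.
    Binding the row id prevents a user from swapping MACs between rows."""
    msg = f"{row_id}|{value}".encode("utf-8")
    return hmac.new(APP_SECRET, msg, hashlib.sha256).hexdigest()

def verify(row_id: int, value: str, stored_mac: str) -> bool:
    return hmac.compare_digest(row_mac(row_id, value), stored_mac)

mac = row_mac(3, "C")
assert verify(3, "C", mac)        # untouched row passes
assert not verify(3, "D", mac)    # a direct edit without the secret is flagged
```

One caveat for the "cannot roll back to A, B, C" requirement: a user who saved the old (value, MAC) pair could replay both columns together, so the MAC should also cover a version counter or timestamp that the application tracks.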
So I have, what would seem like a common question that I can't seem to find an answer to. I'm trying to find what is the "best practice" for how to architect a database that maintains data locally, then syncs that data to a remote database that is shared between many clients. To make things more clear, this remote database would have many clients that use it.
For example, if I had a desktop application that stored to-do lists (in SQL) with individual items, I'd want to be able to send that data to a web service that has a "master" copy of all the different clients' information. I'm not worried about syncing problems so much as I'm just trying to think through the actual architecture of the client's tables and the web service's tables.
Here's an example of how I was thinking about it:
Client Database
list
--list_client_id (primary key, auto-increment)
--list_name
list_item
--list_item_client_id (primary key, auto-increment)
--list_id
--list_item_text
Web Based Master Database (Shared between many clients)
list
--list_master_id
--list_client_id (primary key, auto-increment)
--list_name
--user_id
list_item
--list_item_master_id (primary key, auto-increment)
--list_item_remote_id
--list_id
--list_item_text
--user_id
The idea would be that the client can create todo lists with items, and sync this with the web service at any given time (i.e. if they lose data connectivity, and aren't able to send the information until later, nothing will get out of order). The web service would record the records with the clients id's as just extra fields.
That way, the client can say "update list number 4 with a new name" and the server takes this to mean "update user 12's list number 4 with a new name".
I think the general concept you're working with is the right direction, but you may need to pay careful attention to the use of auto-increment columns. For example, auto-increment on the server is useless if the client is the owner of this ID. Instead, you probably want list.list_master_id to be an auto-increment. Everything else you've mentioned is entirely plausible, though the complexity may increase if there can be multiple clients per user. Then the use of an auto-increment alone probably isn't sufficient; instead, you may need a GUID or a data type that also includes a client identifier to prevent ID collisions.
Without having more details it would be difficult to speculate on what other situations you may need to consider.
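The two collision-avoidance options mentioned above can be sketched briefly (the field names reference the schema in the question; the concrete values are illustrative):

```python
import uuid

# Option A: each client generates globally unique ids itself, so two devices
# can never produce colliding keys that the server has to reconcile.
list_id = str(uuid.uuid4())

# Option B: keep auto-increment locally, but make the server's key composite,
# e.g. (user_id, list_client_id) from the master schema above.
server_key = (12, 4)   # user 12's local list number 4

assert len(list_id) == 36        # canonical UUID string length
assert server_key != (13, 4)     # the same local id on another client stays distinct
```

Option A trades a wider column for simpler sync logic; Option B keeps the existing integer ids but every server-side query must carry both parts of the key.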
SERVER:
list
--id
--name
--user_id
--updated_at
--created_from_device_id
The following two mapping tables link all records; they could also be combined into one table.
list_ids
--list_id
--device_id
--device_record_id
user_ids
--user_id
--device_id
--device_record_id
CLIENT (device_id=5)
list
--id
--name
--user_id
--updated_at
That will allow you to save records as follows (only showing the relevant fields):
server
list: id=1, name=shopping, user_id=1234
user: id=27, name=John Doe
list_ids: list_id=1, device_id=5, device_record_id=999
user_ids: user_id=27, device_id=5, device_record_id=567
client
id=999, name=shopping, user_id=567
This way the clients are totally unaware of the server's IDs, translations can be done quite fast, and you can supply the clients only with information and IDs they know of.
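The translation step the mapping tables enable can be sketched as a simple lookup keyed by (device_id, device_record_id), using the sample values from above:

```python
# Server-side translation tables, mirroring list_ids / user_ids above.
list_ids = {(5, 999): 1}    # device 5's record 999 is master list 1
user_ids = {(5, 567): 27}   # device 5's record 567 is master user 27

def to_master_list_id(device_id: int, device_record_id: int) -> int:
    """Translate a client-local id into the server's master id."""
    return list_ids[(device_id, device_record_id)]

assert to_master_list_id(5, 999) == 1
assert user_ids[(5, 567)] == 27
```

In a real deployment these dictionaries would be the indexed mapping tables themselves; the point is that every incoming client request is rewritten through this lookup before touching the master tables.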
I had the same issue with a project I am working on. The solution in my case was to create an extra nullable field in the local tables named remote_id. When synchronizing records from the local to the remote database, if remote_id is null it means this row has never been synchronized, and the server needs to return a unique id matching the remote row id.
Local Table               Remote Table
_id (used locally)
remote_id --------------- id
name -------------------- name
In the client application I link tables by the _id field; remotely I use the remote_id field to fetch data, do joins, etc.
example locally:
Local Client Table       Local ClientType Table        Local ClientType
                         _id
                         remote_id
_id -------------------- client_id
remote_id                client_type_id -------------- _id
                                                       remote_id
name                     name                          name
example remotely:
Remote Client Table      Remote ClientType Table       Remote ClientType
id --------------------- client_id
                         client_type_id -------------- id
name                     name                          name
This scenario, without any logic in the code, would cause data integrity failures, as the client_type table may not match the real id in either the local or the remote tables. Therefore, whenever a remote_id is generated, the server returns a signal to the client application asking it to update the local _id field; this fires a previously created SQLite trigger that updates the affected tables.
http://www.sqlite.org/lang_createtrigger.html
1- remote_id is generated on the server
2- the server returns a signal to the client
3- the client updates its _id field and fires a trigger that updates the local tables joined on the local _id
Of course, I also use a last_updated field to help synchronization and to avoid duplicate syncs.
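The remote_id handshake described in steps 1-3 can be sketched with SQLite (the table layout follows the diagrams above; the server response value is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE client (
        _id INTEGER PRIMARY KEY,
        remote_id INTEGER,          -- NULL means 'never synchronized'
        name TEXT,
        last_updated TEXT
    )
""")
conn.execute("INSERT INTO client (name, last_updated) VALUES ('shopping', '2024-01-01')")

# Rows that still need to be pushed to the server:
pending = conn.execute(
    "SELECT _id, name FROM client WHERE remote_id IS NULL").fetchall()
print(pending)  # [(1, 'shopping')]

# The server answers with the id it assigned; store it locally.
server_assigned_id = 42   # hypothetical server response
conn.execute("UPDATE client SET remote_id = ? WHERE _id = ?",
             (server_assigned_id, 1))
assert conn.execute(
    "SELECT COUNT(*) FROM client WHERE remote_id IS NULL").fetchone()[0] == 0
```

In the answer's full design, that UPDATE is what fires the trigger propagating the id into the joined local tables.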
I have a User table in my Postgres database. In my application, the User can have various allowed websites. My question is: which is more disk-space efficient, having a many-to-many relationship between a user and a URL, or storing the array as JSON in a column in the User table? Essentially, how much space does Postgres use per row for tuple headers?
Thanks.
which is more disk space efficient, having a many-to-many relationship between a user and a url or storing the array in JSON in a column in the User table.
Updating a many-to-many relationship means a single INSERT, UPDATE, or DELETE statement.
Updating a JSON array stored in a table means:
SELECTing the data to get it out of the database, to the application
Manipulating the data in the application
UPDATE statement to write the updated JSON array back to the table
Which is simpler/more efficient to you?
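The two update patterns can be contrasted concretely. A sketch using SQLite as a stand-in for Postgres (the table and column names are illustrative; Postgres would behave the same way for a plain JSON text column):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_urls (user_id INTEGER, url TEXT);               -- relational style
    CREATE TABLE users_json (user_id INTEGER PRIMARY KEY, urls TEXT); -- JSON column
""")

# Relational: adding one URL is a single INSERT, no round trip needed.
conn.execute("INSERT INTO user_urls VALUES (1, 'a.example')")

# JSON column: select, modify in the application, write the whole array back.
conn.execute("INSERT INTO users_json VALUES (1, '[]')")
urls = json.loads(conn.execute(
    "SELECT urls FROM users_json WHERE user_id = 1").fetchone()[0])
urls.append("a.example")
conn.execute("UPDATE users_json SET urls = ? WHERE user_id = 1",
             (json.dumps(urls),))
```

So the JSON column may save a few bytes of per-row overhead, but every change pays for a read-modify-write of the entire array, which is usually the worse trade.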