How can blockchains be used in audit trails? - audit

I'm currently trying to figure out how to use blockchain in audit trails and potentially in accounting (and if they actually make sense). Both Deloitte and EY mention them.
I somehow cannot understand how this could be of benefit for audits and/or accounting.
To my understanding to make use of the power of blockchains you need multiple users. Only one user means you cannot validate the integrity since all blocks of that user could be compromised (if one block of a blockchain of a user got changed maybe also all of the following where changed, making it impossible to detect the modification). This means blockchains only make sense if you can share them with different users?
Data and thus blockchains however aren't always shared between multiple users. In accounting you often only have one "user"/"owner" of the data. Sure you could create multiple users in one company but there wouldn't be any benefit since they are in one location (company) and potentially all compromised. Or if the admin want's to change something he could easily modify all users making it useless for audits.
To make it work you would need different partners (supplier/customer) to share the information with. In that case you could however only have two users share the same blockchain (depending on legal regulations in your country) and then again who do you trust if one of the two doesn't validate?
Deloitt mentions that they can be used for files. Again I don't see the benefit since you would need multiple users AND files might get compressed with a different algorithm over time rendering them invalid (the useful information didn't change but the block will still be invalid). Or is this a not an issue from your experience? To me it seems it could be a problem.
The same goes for all the internal data which may be important for audits from my point of view. Which company would like to share the information with independent users. Or is it only intendet for "public"/"shared" data?
To identify a modification of one block in a blockchain the user would have ot validate every single block (every hash in the header of a block needs to be compared to the data of the previous block). In terms of accounting a blockchain could be all transactions of one account during one fiscal year. This however could easily be thousands of transactions. Wouldn't this be very slow to validate?
Maybe I'm misunderstanding the point in terms of audit trails but as long as the users are not independent data can always be modified making it useless for audits. And you need a critical mass to share the blockchain with.

First of all, I think that it's neccesary to get the power of Blockchain. It gives us the chance to create descentralized data bases, i.e. data bases that are not controled by an authority. Also, the data of Blockchain is immutable and permanent, i.e. it can not be modified or deleted. Thanks to it you achive a unique descentralized registry in a distributed network, for example for audit trails.
It's true that it has no sense if you use it inside your company. But if you use it among different companies? Each one could encode its data, so the rest of the companies couldn't see it. However, all the data would be stored in all the companies, so anyone couldn't change it. Moreover, you can have more than one user (node) for each company.
Nowadays, there are many implementations of Blockchain, each one with a different objetive. To understan better the power of Blockchain, I suggest you to wathc the video were is explained the new version (the v 1.0) of the Hyperledger Fabric.

Related

MariaDB data separation in public and private, database design

I am working at a company that merged with another company a while ago.
There we have several business units that are basically equivalent. One in Europe, one in China, each. We already had an in-house MariaDB database, which we want to start sharing.
The problem is that there are different GDPR regulations and contracts that prohibit sharing certain data across sites. So what I can't do, is replicate data across sites and then just hide in from the user in the frontend. The private data has to stay at the facility, it belongs to.
So my idea was to separate each table that we have now and where possibly sensitive information is contained into two tables each.
One say table_contracts_private and table_contracts_public.
This would still seem pretty doable with basic database replication and replicating the public tables across sites. But how would you go about publishing private data? Also how would I best combine private and public data? Just using a view
I just could not find any good mechanisms for this, especially because we would also like to avoid data duplication, so the private entries would need to be removed and replaced by the public ones, which would entail also changing all referencing IDs.
Is this a possible application of sharding?
I'd be really grateful, if someone could point me in the right direction, or if someone has a demo project with similar requirements that I could check out.
Cheers
Is this a possible application of sharding?
I wouldn't think so. Sharding is a performance optimization method. What you need is to support legal constraints. Those are two very different problems.
I think you are on the right track. I call this a "walled garden" approach. You create a database with all non-PII information, using ids only. Nothing that remotely directly identifies people, their addresses, phones, credit cards, and so on. This can be tricky. In some jurisdictions combinations of demographics can be PII.
Some of those ids then refer to another database where you store all the sensitive information; this is the "walled garden". I would recommend that this second database be on a separate server. It has a very restricted access list. And this is where you implement requirements for things like "forgetting" a customer.
In any case, the point is that sharding is not the right approach. You want an application redesign with privacy and security as the top priorities. Happily, this is not actually that hard to implement, although if the databases are changing, you may need period auditing. For instance, in one database I worked with, we discovered that "coupon codes" sometimes contained unencrypted email addresses. Arrgggh!

Keeping information private, even from database users

I have a unique use case. I want to create a front-end system to manage employee pay. I will have a profile for each employee and their hourly rate stored for viewing/updates in the future.
With user permissions, we can block certain people from seeing pay in the frontend.
My challenge is that I want to keep developers from opening up the database and viewing pay.
An initial thought was to hash the pay against my password. I'm sure there is some reverse engineering that could be used to get the payout, but it wouldn't be as easy.
Open to thoughts on how this might be possible.
This is by no means a comprehensive answer, but I wanted at least to point out a couple of things:
In this case, you need to control security at the server level. Trying to control security at the browser level, using Javascript (or any similar frameword like ReactJs) is fighting a losing battle. It will be always insecure, since any one (given the necessary time and resources) will eventually find out how to break it, and will see (and maybe even modify) the whole database.
Also, if you need an environment with security, you'll need to separate developers from the Production environment. They can play in the Development environment, and maybe in the Quality Assurance environment, but by no means in the Production environment. Not even read-only access. A separate team controls Production (access, passwords, firewalls, etc.) and deploys to it -- using instructions provided by the developers.

Ever ok to remove referential integrity on database design for 'right to be forgotten' deletion of user records?

I'm currently reviewing a database design that in order to deal with the removal of user records, in order to deal with requirements such as DPA and EU GDPR Right to be Forgotten, is proposing not to enforce referential integrity between the user record and 'related' tables, such as Transaction, Communication Event, etc., so that the user record can be deleted when requested but records in related tables (that use a non-identifying key/sequence number) will remain intact.
So, before I push back on this and open up the 'discussion' that will follow, I wondered whether anyone thought it was ever acceptable to remove, or do without, referential integrity in cases such as this, or should other methods be used - such as masking the user details, or changing the user record to a placeholder record to show that the transaction relates to a redacted user.
All thoughts welcome...
This is a complicated topic that goes beyond referential integrity constraints.
My understanding of the EU privacy restrictions (and I stress that I am not a lawyer) is that they relate to personally identifiable information, not to business related "anonymous" relationships. For instance, I think you can still count a removed user as "active" for the period when they were active; you just can't know who they are.
My approach would be to put all PII data into a single table/database. When a user wishes to be forgotten, I would update the record to remove the PII. All the foreign key relationships are then fine. You are just missing the name, address, email address, and whatever else is deemed PII.
Just identifying the PII is very tricky, because email addresses and user names and so on can be embedded in the most unusual places (URLs are one obvious place to look but there can be others).
I don't recommend actually removing all traces of the person from all databases. You will then be in a situation where your reports no longer balance . . . Oh, our reports said we had 1,000,000 customers then, but we can only find 999,900 of them. Let's waste a bunch of people's efforts to figure out what happened.
My suggestion: Be careful. This is a long process and set expectations in your organization accordingly.
Please have a look at retention laws for your industry. People have the right to be forgotten, but businesses also have a legal obligation to retain certain records for a period of time.
At this point it's unclear to me which regulation overrules another so my advice is you bring in a legal expert that will be able to clear up this matter.
From a technical perspective, your application might require business data related to private data, so a good approach is to flag the records as forgotten and replace private data with generated data. This way, your application keeps behaving the same way, but the private information is gone.
This is a simple approach that can be applied on many legacy applications, even automated as a process.
The only thing you must watch out for are the backups taken as your changes might be reverted if data has to be restored from a backup. Keep a separate table with keys pointing to records require to be forgotten so if a backup is overwriting latest changes, you can use your automation script again to remove those wha want to be forgotten.

Is it considered a poor practiced to use a single database for different uses across different applcations

What if you had one large database to server all your apps. So your website that needs to store customer orders can use the same database that your game uses to store registered users. Different applications could have tables only for them to use. Some may say that this could be a security issue, because if someone cracks your database, they could attack all your applications. But in a lot of databases you could use a line like the following to restrict access:
deny select on aTable to aUser;
I am wondering if this central database would be considered a poor practice, and if so why?
They way I look at it, a web application is nothing more than a collection of web pages. Because of this, it really doesn't matter if one page is about, say, cooking, while the other page is about computer programming.
If you also consider it, this is very similar to Openid, which I use to log into my SO account!
If you have your fundamental security implemented correctly, it doesn't matter how the user is interacting with your website. Where I would make this distinction is in two cases:
Don't mix http with https. On a shared host, this isn't going to be an issue anyway; if you buy the certificate for https, make everything that way (excluding the rare case where this might affect performance).
E-commerce or financial data should be handled fundamentally in a different way. If you look at your typical bank, they have multiple log-in protocols, picture verification and short log-in times. This builds confidence in user's securities. It would be a pain in the butt for a game site, or most other non-mission critical applications.
Regarding structure, if you do mix applications into one large database, you should consider the other maintenance issues, such as:
Keep tables separate; consider a prefix for every table unique to each application. Following my example above, you would then start the cooking DB table names with 'ck', and the computer programming DB table names with 'pg'. This would allow you to easily separate the applications if you need to in the future.
Use a matching table to identify which ID goes to which web application.
Consider what you would do and how to handle it if a user decided to register for both applications. Do you want to offer transparency that they can share the same username?
Keep an eye on both your data storage limit AND your bandwidth limit.
If you are counting on these applications to drive revenue, you are putting "all your eggs in one basket". Make sure if it goes down, you have options to restore or move to another host.
These are just a few of the things to consider. But fundamentally, outside of huge (big data) applications there is nothing wrong with sharing resources/databases/hardware between applications.
Conceptually, it could be done.
Implementation-wise, to make the various parts distinct from one another, you could use both naming conventions (as per #Sable Foste) and/or separate database schemas (table Finance.Users, GameApp.Users, etc.)
Management-wise, things could get tricky. Repeating some points, adding others:
One application could use a disproportionally large share of resources (disk space, I/O, CPU)
Tracking versions could be tricky (App is v4, finance is v7) -- depends on how many application instances you have to support.
Disaster recovery-wise, everything is lumped together. It all gets backed up as one set, it all gets restored as one set. Finance corrupt? Restore from backup... and lose your more recent game data.
Single point of failure. One database goes down, all your applications are down.
These (and other similar issues) are trade-offs you'll want to consider. Plan ahead, to lessen the chance that what's reasonable and economic today becomes a major headache tomorrow.

Multi-tenancy with SQL/WCF/Silverlight

We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
Easy upgrades
Data security
Scalability
Three different models to consider are listed on msdn
Separate databases. This is not easy to maintain as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
Shared Database, Separate Schemas. A TenantID column is added to each table. Ensuring that each customer gets the correct data is potentially dangerous. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
Depends on the type of application and scale of data. Each one has downfalls.
1a) Separate databases + single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time, what if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is if your application is data intensive you have to worry about data scalability. For example if every customer is expected to have tens/hundreds millions rows of data. Then you will start to run into issues and query performance for individual customers due to total customer base volumes. Clients are more forgiving for slowdowns caused by their own data volume. Being told its slow because the other 99 clients data is large is generally a no-go.
Unless you know for a fact you will be dealing with huge data volumes from the start I would probably go with #2 for now, and begin looking at clustering or moving to 1a/b setup if needed in the future.
We also have a SaaS product and we use solution #2 (Shared DB/Shared Schema with TenandId). Some things to consider for Share DB / Same schema for all:
As mention above, high volume of data for one tenant may affect performance of the other tenants if you're not careful; for starters index your tables properly/carefully and never ever do queries that force a table scan. Monitor query performance and at least plan/design to be able to partition your DB later on based some criteria that makes sense for your domain.
Data separation is very very important, you don't want to end up showing a piece of data to some tenant that belongs to other tenant. every query must have a WHERE TenandId = ... in it and you should be able to verify/enforce this during dev.
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can go around it by designing a way to extend the fields that are associated with the documents/tables in your domain that make sense (ie. Metadata for tables as the msdn article mentions)
What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I've similar case, but my solution is take both advantage.
Where data and how data being placed is the question from tenant. Being a tenant of course I don't want my data to be shared, I want my data isolated, secure and I can get at anytime I want.
Certain data it possibly share eg: company list. So database should be global and tenant database, just make sure to locked in operation tenant database schema, and procedure to update all tenant database at once.
Anyway SaaS model everything delivered as server / web service, so no matter where the database should come to client as service, then only render by client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.