Access control in BigQuery - google-bigquery

The BigQuery documentation lists only three types of resources: organization, project and dataset. Roles and permissions are set on these resources. My question is: is there any way to define access control for a particular table in a dataset?

No, at the time of writing you cannot define access control on an individual table. Permissions go only down to the dataset level.
At the same time, there is a way to define row-level access.
Yet another option (depending on your specific use case) would be Protecting Data with Cloud KMS Keys. This does not control access, but rather the ability to see the actual data vs. the encrypted data.

With BigQuery you can define read access down to a per-row level:
https://cloud.google.com/bigquery/docs/views#row-level-permissions
That said, per-row or per-table access takes a lot more work (it involves authorized views) than using the native project/dataset access controls.

Related

BigQuery dataset level access control via IAM

Issue: In GCP IAM I have >30 users assigned the pre-defined roles BigQuery Data Viewer and BigQuery Data Editor, and now when I create a new dataset, it's automatically accessible to these 30+ users because of "policy inheritance".
Question: As BQ project admin, I want a newly created dataset only accessible to certain users (a small subset of the 30+ users). What's the best approach to do this? Thanks!
You cannot override permissions granted at higher levels. So, if you want to restrict access at the dataset level, the best approach would be to:
1) Remove the current roles BigQuery Data Viewer and BigQuery Data Editor at the project level.
2) Grant the roles again, but only at the dataset level.
This also complies with the recommended best practice of least privilege. Also, if possible, use groups to grant the permissions, as it will be easier to manage.
In addition to this, you could create the dataset in another project and allow access only to the desired subset of users; however, I wouldn't recommend this approach, as it only makes it more difficult to manage the data and the users with access to it.
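To illustrate why both steps are needed, here is a minimal sketch of additive policy inheritance. It is a toy model in plain Python, not the IAM or BigQuery API, and the user and dataset names (`alice`, `new_ds`, and so on) are hypothetical.

```python
# Toy model of additive policy inheritance. This is NOT the real IAM API;
# all user and dataset names below are hypothetical.

def effective_readers(project_grants, dataset_grants, dataset):
    """Return the users who can read `dataset`.

    Project-level grants are inherited by every dataset, and a
    dataset-level policy can only add users, never subtract them.
    """
    return set(project_grants) | set(dataset_grants.get(dataset, ()))

all_users = {"alice", "bob", "carol"}

# Before: roles are granted at project level, so a new dataset is open to all,
# no matter what its own policy says.
before = effective_readers(all_users, {"new_ds": {"alice"}}, "new_ds")

# After: project-level grants are removed (step 1) and the roles are
# granted again per dataset (step 2).
dataset_grants = {"old_ds": all_users, "new_ds": {"alice"}}
after_new = effective_readers(set(), dataset_grants, "new_ds")
after_old = effective_readers(set(), dataset_grants, "old_ds")

print(sorted(before))     # everyone
print(sorted(after_new))  # only the chosen subset
print(sorted(after_old))  # existing datasets keep their readers
```

In this model the only way to shrink the audience of `new_ds` is to empty the project-level set, which mirrors the advice above.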

How to phrase an answer, data is separated at the logic / programmatic / application layer?

How do I describe the partition of client data when all data is stored in one place and separated via programming?
If data from various clients is stored in a variety of SQL tables and is separated via the code (e.g. members from different orgs are distinguished by an organisation table), at which layer is the data separation defined?
Sorry if this question is a bit poorly worded.
In terms of how to explain it, I'd need more information on how you're actually separating the data for consumption by different members, but we've done a similar thing using SQL views. In our case, it's pretty easy to explain because each role (i.e., a set of user permissions determined by their need-to-know) has a set of SQL views they have permissions to view and query but not modify. Then users can query the views as needed to make their own reports and datasets.
If you're looking for more technical jargon, this was one of the documents we came across when setting up our security.
It might be easiest to explain that each data element has a set of roles that have access to that data element. Your role within the multitude of client organizations determines which data elements you can work with in your reports. Then you would just want to use very strong language indicating how you have implemented safeguards ensuring that users cannot, in any way, access data that is not relevant to their need-to-know.

MS Access column-level security

I need to be able to restrict access to specific columns in my database.
The user must not be able to make a SQL view of columns A and D or C and D, but is allowed to for B and D.
Any suggestions and help would be greatly appreciated.
The Access Database Engine is not designed to manage security in the way that you have described*. You could restrict access to certain items at the application level, but users would still be able to open the back-end database file directly and see things that you apparently don't want them to see.
If this sort of security really is important then you should use a client/server back-end database and set permissions on various objects at the database level. For example, any edition of Microsoft SQL Server, even the Express Edition, can do this.
*(The older Access .mdb database format supported user-level security, but that security model has been deprecated.)
As a general approach for relational databases, I would suggest creating a view that contains only the columns available to a particular user, then granting access to the views and denying access to the underlying tables.
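A minimal sketch of the view approach, using Python's built-in `sqlite3` module purely for illustration. The table name `records`, the view name `records_b_d`, and the sample row are hypothetical. SQLite has no user permissions, so the grant/deny step is not shown; in a client/server database you would grant SELECT on the view only.

```python
import sqlite3

# In-memory database with a table holding columns a, b, c and d.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.execute("INSERT INTO records VALUES ('a1', 'b1', 'c1', 'd1')")

# The view exposes only the permitted combination: columns b and d.
conn.execute("CREATE VIEW records_b_d AS SELECT b, d FROM records")

cursor = conn.execute("SELECT * FROM records_b_d")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)  # column names visible through the view
print(view_rows)     # rows as seen through the view
```

A user who can query only `records_b_d` never sees columns a or c, so the forbidden combinations cannot be built from it.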

Mongodb autosharding vs. authentication

Long time lurker, first time poster, please bear with me.
I'm trying to set up a sharded, secure MongoDB environment. I would like to make use of MongoDB's autosharding capability, since I'm fairly new to databases and on a tight schedule.
It seems that autosharding only applies to individual collections (tables), but I don't want users to have access to the entire collection. Further, MongoDB only allows authentication into databases, so once authenticated, a user can see 1) every collection in the db and 2) all data within each collection. So, as far as I can tell, I can either have autosharding and no authentication, or manual sharding and authentication.
I would like the best of both worlds, that is: autosharding and authentication. Is this possible? If not, how should I go about manual sharding in MongoDB?
A simplified use case of this system: the collection 'Users' has data on every user. I want to authenticate user X so that X can only see X's data in the Users collection. And Users is distributed across multiple servers, partitioned (sharded) by user_name.
MongoDB doesn't have authentication like traditional SQL databases. In fact, if you read the manual, it's recommended that you use a secured environment instead of relying on authentication. Any access control on your data would be implemented within your application.
Even with traditional SQL, access isn't controlled by row. That's usually something implemented at the application level, based on some sort of key within the data.
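As a sketch of what application-level row control could look like, the helper below forces every query to be scoped to the authenticated user. It is plain Python with an in-memory list standing in for the collection; `scoped_filter` and `find_users` are hypothetical names, and only the `user_name` field comes from the question.

```python
def scoped_filter(authenticated_user, requested_filter=None):
    """Return a query filter restricted to the caller's own rows."""
    scoped = dict(requested_filter or {})
    # Set the key last so a caller-supplied "user_name" cannot override it.
    scoped["user_name"] = authenticated_user
    return scoped


def find_users(documents, query):
    """In-memory stand-in for a collection lookup (equality matches only)."""
    return [
        doc for doc in documents
        if all(doc.get(key) == value for key, value in query.items())
    ]


users = [
    {"user_name": "x", "email": "x@example.com"},
    {"user_name": "y", "email": "y@example.com"},
]

# User "x" tries to read user "y"; the scoped filter still returns only x's data.
visible = find_users(users, scoped_filter("x", {"user_name": "y"}))
print(visible)
```

The same filter dictionary could be passed to a real database driver; the point is that the application, not the database, decides which key the user is allowed to query.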

Can I create domain schema only (without any data) in Amazon SimpleDB?

I am evaluating Amazon SimpleDB at this time. SimpleDB is very flexible in the sense that it does not have to have table (or domain) schemas. The schema evolves as the create / update commands flow in. All this is good but while I am using a modeling tool (evaluating MindScape LightSpeed) I require the schema upfront, in order for the tool to generate models based on the schema. I can handcraft domains in SimpleDB and that does help but for that I have to perform at least one create operation on the domain. I am looking for the ability to create domain schema only. Any clues?
There is no schema in SimpleDB.
This is the reason why the NoSQL people suggest to "unlearn" relational databases before shifting the paradigm to these non-relational data stores.
So, you cannot do what you describe. Without the data, there will be nothing.
While it's true that SimpleDB has no schema support, keeping some type information turns out to be crucial if you run queries on numeric data or dates*. Most NoSQL products have both queries and types, or else no queries and no types, but SimpleDB has chosen queries and no types.
As a result, integrating with any tool outside of your main application will require you to either:
1) store duplicate type information in different places
2) create your own simple schema system to store the type information
Option 2 seems much better and choosing it, despite what some suggest, does not mean that you "don't have your mind right."
S3 can be a good option for this data. You can keep it in a file with the same name as your domain, and it will be accessible from anywhere with the same AWS credentials as your SimpleDB account.
Storing the data as a list of attributename=formatname is the extent of what I have needed to do. You can, in fact, store all this in an item in your domain. The only issue is that this special item could unintentionally come back from a domain query where you are expecting live data not type information.
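A minimal sketch of reading such a list, assuming one `attributename=formatname` pair per line. The attribute names and format names below (`price`, `int10`, and so on) are hypothetical; the answer does not specify them.

```python
def parse_type_info(text):
    """Parse lines of the form attributename=formatname into a dict."""
    types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, format_name = line.partition("=")
        types[name.strip()] = format_name.strip()
    return types


# Hypothetical contents of the type-information file for one domain.
schema_text = """
price=int10
created=iso8601
title=string
"""

type_info = parse_type_info(schema_text)
print(type_info)
```

Any tool that needs the types can load this one file instead of relying on format choices scattered through the source code.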
I'm not familiar with MindScape LightSpeed, but this is a general strategy I have found beneficial when using SimpleDB, and if the product is able to load/store a file in S3 then all the better.
*Note: just to be clear, I'm not talking about reinventing the wheel or trying to use SimpleDB as a relational database. I'm talking about the fact that numeric data must be stored with both zero padding (to a length of your choosing) and an offset value (depending on whether it is signed or unsigned) in order to work with SimpleDB's string-based query language. Once you decide on a format, or a set of formats to be used in your application, it would be folly to leave that information hidden in and scattered across your source files when it is needed by source code tools, query tools, reporting tools or any other code.
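The zero padding and offset described in the note can be sketched as follows. The `OFFSET` and `WIDTH` values are hypothetical choices for illustration; pick them to suit the range of your own data.

```python
OFFSET = 10**9  # hypothetical: makes any value >= -10**9 non-negative
WIDTH = 10      # hypothetical: digits needed for shifted values below 10**10


def encode_int(value):
    """Encode a signed integer so string order matches numeric order."""
    shifted = value + OFFSET
    if not 0 <= shifted < 10**WIDTH:
        raise ValueError("value out of range for the chosen format")
    return str(shifted).zfill(WIDTH)


def decode_int(text):
    """Reverse encode_int."""
    return int(text) - OFFSET


values = [5, -3, 120, -40, 0]
encoded = [encode_int(v) for v in values]

# Sorting the strings lexicographically yields the numeric order.
round_trip = [decode_int(s) for s in sorted(encoded)]
print(round_trip)
```

Because every encoded string has the same length and no sign character, string comparison agrees with numeric comparison, which is what a string-based query language needs.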