How can I restrict access to sensitive columns in an Apache Drill view based on user permissions in another view?

Background:
My users connect to Apache Drill with Kerberos authentication to read from a Parquet file, which is effectively a single table with multiple columns. Some of the columns in that file are known to be sensitive, and only certain users may see them. Apart from the data table, Drill has access to another table that records who has access to sensitive data (two columns: userId, sensitiveDataAccess). To emphasize: all users can see all rows in the data table, but only those with access to sensitive data can see the sensitive columns.

This can be achieved using impersonation.
https://drill.apache.org/docs/configuring-user-impersonation/

The solution is to create a view joining the data table with a row from the security table containing information about access to sensitive data for the logged in user and then using conditions in the SELECT clause to nullify sensitive columns if a user does not have access to them.
SELECT
hc.name,
CASE WHEN sec.`sensitiveDataAccess`=TRUE THEN hc.`salary` ELSE null END AS salary --example of a sensitive column
FROM dfs.`/data/headcount.parquet` hc
JOIN (SELECT * FROM dfs.`/data/security.parquet` WHERE userId=session_user) sec
ON sec.userId=session_user;
You might need to enable cartesian joins in Drill to make it work or add a dummy column with zeroes in both tables and then add the below to the join condition as a workaround:
AND hc.JoinHack=sec.JoinHack
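
The masking logic of the view above can be tried out locally. The following is only a sketch under assumptions: it uses Python's sqlite3 instead of Drill, the tables headcount and security are small in-memory stand-ins for the two Parquet files, and the logged-in user is passed as a query parameter because SQLite has no session_user.

```python
import sqlite3

def visible_rows(conn, session_user):
    """Return headcount rows, nulling salary unless the user has access."""
    query = """
        SELECT hc.name,
               CASE WHEN sec.sensitiveDataAccess = 1 THEN hc.salary ELSE NULL END AS salary
        FROM headcount hc
        JOIN (SELECT * FROM security WHERE userId = :user) sec
          ON sec.userId = :user
        ORDER BY hc.name
    """
    return conn.execute(query, {"user": session_user}).fetchall()

# In-memory stand-ins for the data file and the security file (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE headcount (name TEXT, salary INTEGER);
    CREATE TABLE security (userId TEXT, sensitiveDataAccess INTEGER);
    INSERT INTO headcount VALUES ('Ann', 5000), ('Bob', 4000);
    INSERT INTO security VALUES ('alice', 1), ('carol', 0);
""")

print(visible_rows(conn, "alice"))    # user with access: salaries visible
print(visible_rows(conn, "carol"))    # user without access: salaries are NULL
print(visible_rows(conn, "mallory"))  # user missing from security: no rows at all
```

Note that with an inner join, a user who has no row in the security table gets an empty result rather than masked columns. If such users should still see the non-sensitive columns, a LEFT JOIN would be one way to keep the rows.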

Related

How to assign the IDs to the referring table and how to display this correctly? (SSMS)

I am in the process of creating an audit plan using an ERD. Going off the image below, you can see that there is a permissions table with four FK columns referring to the PK columns of the other four tables. I am just confused about how the IDs will relate to the other tables and how they will show up correctly in the permissions table.
For the Users table, I imported the data from master.sys.server_principals.
For the Instance table, I imported the data by using @@SERVERNAME.
For the Databases table, I imported the data from master.sys.databases.
For the Object Types table, I imported the data from master.sys.objects.
Now, I am currently on the permissions table and stuck at this point because I am wondering how will the IDs match from the four other tables (mentioned above and shown in the image link below) to this permissions table. I know I need to query from master.sys.database_permissions to get the information for both columns 'Permissions_Permission_Name' and 'Permissions_Object_Name' but it's just the other four ID columns which I am confused about...(you can ignore the column Permissions_ID)
I'm going to use the Answer field because there is not enough space in the comment editor. This answers only part of your question: of the four tables, I can relate two (Databases and Users) to system tables.
First and foremost: when filling in IDs, you generate the records in the other tables first, keep the identity IDs that were generated, and finally create a new Permission record and fill in the correct IDs there, one in each ID field. That applies whenever a table holds references to other tables, as I assume you know.
The issue is that your structure differs from the system tables. You will need more "permission" records than master.sys.database_permissions holds, because SQL Server registers these as permissions per principal (role), not per user.
I solved two of the four:
The user is connected to a principal role via master.sys.database_role_members. The Id of the user role can be found in your source as master.sys.database_permissions.grantee_principal_id and the corresponding users that have this principal_id are listed in master.sys.database_role_members.
Your permission on a database (ONE database) is defined in your Permission record. The database name in this database record should map to a database on your server. In that database, you will find database_permissions.sys.server_principals. Users that have the permissions are (again) found in master.sys.database_role_members.
I'm not sure what you intend to do with the other two tables, Instances and Object Types.
See the Microsoft docs on the subject at https://learn.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-database-permissions-transact-sql?view=sql-server-ver15

Hide rows in a table from a user group

We have a table that multiple users access. However, we would like to hide records in that table from a user group, let's say GroupA, so that they don't see any records in their database except the ones they created themselves.
GroupA users should also have an option to add new records and edit their own record.
However, all other users (except groupA) should be able to see all records in the table.
We have SQL Server 2012.
Thanks.
You could create a view on the table for GroupA that restricts records with appropriate criteria, and grant SELECT on the view to GroupA.
This is called row-level security.
One common way is to allow access to the table only via a view.
(You can also write through a view.)
The view must contain a WHERE clause that selects the rows the user is allowed to see.
BTW, be sure to use ORIGINAL_LOGIN() when you refer to the current user.
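
As a rough illustration of the view-based approach, here is a sketch that runs on Python's sqlite3 rather than SQL Server. Everything in it is an assumption made for the example: the tables records and group_a, the view visible_records, and above all the one-row table current_session, which stands in for ORIGINAL_LOGIN() because SQLite has no notion of a logged-in user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, created_by TEXT, payload TEXT);
    CREATE TABLE group_a (user_name TEXT PRIMARY KEY);
    -- Stand-in for ORIGINAL_LOGIN(): holds exactly one row naming the current user.
    CREATE TABLE current_session (user_name TEXT);

    INSERT INTO records (created_by, payload) VALUES
        ('ga_user1', 'r1'), ('ga_user2', 'r2'), ('manager', 'r3');
    INSERT INTO group_a VALUES ('ga_user1'), ('ga_user2');
    INSERT INTO current_session VALUES ('nobody');

    -- GroupA members see only their own rows; everyone else sees every row.
    CREATE VIEW visible_records AS
    SELECT r.id, r.created_by, r.payload
    FROM records r
    CROSS JOIN current_session s
    WHERE r.created_by = s.user_name
       OR NOT EXISTS (SELECT 1 FROM group_a g WHERE g.user_name = s.user_name);
""")

def rows_for(user):
    """Switch the simulated login, then read the payloads visible through the view."""
    conn.execute("UPDATE current_session SET user_name = ?", (user,))
    cursor = conn.execute("SELECT payload FROM visible_records ORDER BY id")
    return [row[0] for row in cursor]

print(rows_for("ga_user1"))  # only the row created by ga_user1
print(rows_for("manager"))   # all rows
```

In SQL Server the WHERE clause would compare against ORIGINAL_LOGIN() directly, and GroupA would be granted rights on the view but not on the underlying table.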

How do I give different users access to different rows without creating separate views in BigQuery?

The question "How do I use row-level permissions in BigQuery?" describes how to use an authorized view to grant access to only a portion of a table. But I'd like to give different users access to different rows. Does this mean I need to create separate views for each user? Is there an easier way?
Happily, if you want to give different users access to different rows in your table, you don't need to create separate views for each one. You have a couple of options.
These options all make use of the SESSION_USER() function in BigQuery, which returns the e-mail address of the currently running user. For example, if I run:
SELECT SESSION_USER()
I get back tigani@google.com.
The simplest option, then, for displaying different rows to different users, is to add another column to your table that is the user who is allowed to see the row. For example, the schema: {customer:string, id:integer} would become {customer:string, id:integer, allowed_viewer: string}. Then you can define a view:
#standardSQL
SELECT customer, id
FROM private.customers
WHERE allowed_viewer = SESSION_USER()
(note, don't forget to authorize the view as described here).
Then I'd be able to see only the rows where tigani@google.com was the value in the allowed_viewer column.
This approach has its own drawbacks, however: you can only grant access to a single user at a time. One option would be to make the allowed_viewer column a repeated field; this would let you provide a list of users for each row.
However, this is still pretty restrictive, and requires a lot of bookkeeping about which users should have access to which row. Chances are, what you'd really like to do is specify a group. So your schema would look like: {customer:string, id:integer, allowed_group: string}, and anyone in the allowed_group would be able to see that row.
You can make this work by having another table that has your group mappings. That table would look like: {group:string, user_name:string}. The rows might look like:
{engineers, tigani@google.com}
{engineers, some_engineer@google.com}
{administrators, some_admin@google.com}
{sales, some_salesperson@google.com}
...
Let's call this table private.access_control. Then we can change our view definition:
#standardSQL
SELECT c.customer, c.id
FROM private.customers c
INNER JOIN (
SELECT `group`
FROM private.access_control
WHERE SESSION_USER() = user_name) g
ON c.allowed_group = g.`group`
(note: you will want to make sure that there are no duplicates in private.access_control, otherwise records could be repeated in the results).
In this way, you can manage the groups in the private.access_control separately from the data table (private.customers).
There is still one piece missing that you might want; the ability for groups to contain other groups. You can get this by doing a more complex join to expand the groups in the access control table (you might want to consider doing this only once and saving the results, to save the work each time the main table is queried).
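
One possible shape for that group expansion is a recursive query. The sketch below runs on Python's sqlite3, not BigQuery, and whether the same statement is accepted by BigQuery's dialect is not verified here. The table group_nesting and its columns are hypothetical, the e-mail addresses are made up, the column is called group_name because group is a reserved word, and the current user is passed as a parameter in place of SESSION_USER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer TEXT, id INTEGER, allowed_group TEXT);
    CREATE TABLE access_control (group_name TEXT, user_name TEXT);
    -- Hypothetical nesting table: every member of child_group also belongs to parent_group.
    CREATE TABLE group_nesting (parent_group TEXT, child_group TEXT);

    INSERT INTO customers VALUES
        ('acme', 1, 'engineers'), ('globex', 2, 'staff'), ('initech', 3, 'sales');
    INSERT INTO access_control VALUES
        ('engineers', 'eng@example.com'), ('sales', 'rep@example.com');
    INSERT INTO group_nesting VALUES ('staff', 'engineers');
""")

VISIBLE_CUSTOMERS = """
    WITH RECURSIVE my_groups(group_name) AS (
        -- Groups the user belongs to directly.
        SELECT group_name FROM access_control WHERE user_name = :user
        UNION
        -- Groups that contain a group already found (UNION also drops duplicates).
        SELECT n.parent_group
        FROM group_nesting n
        JOIN my_groups m ON n.child_group = m.group_name
    )
    SELECT c.customer, c.id
    FROM customers c
    WHERE c.allowed_group IN (SELECT group_name FROM my_groups)
    ORDER BY c.id
"""

def visible_customers(user):
    """Rows whose allowed_group is one of the user's direct or inherited groups."""
    return conn.execute(VISIBLE_CUSTOMERS, {"user": user}).fetchall()

print(visible_customers("eng@example.com"))  # engineers rows plus staff rows
print(visible_customers("rep@example.com"))  # sales rows only
```

Filtering with IN instead of joining against the group list means a user who reaches the same group by several paths still sees each customer row once.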

Joining two tables on different database servers

I need to join two tables Companies and Customers.
The Companies table is in MS SQL Server and the Customers table is in MySQL.
What is the best way to achieve this?
If I understand correctly, you need to join the tables in SQL Server rather than in code, since the question is tagged sql.
If that is right, you need to do some administrative work first, namely linking the servers.
Here you have an explanation of how to link a MySQL server to a SQL Server instance.
Once the servers are linked, the syntax is as simple as:
SELECT
[column_list]
FROM companies c
JOIN [server_name].[database_name].[schema_name].[table_name] t
ON ...
WHERE ...
Keep in mind that when accessing tables on a linked server, you must use four-part names.
In order to query 2 databases, you need 2 separate connections. In this case, you would also need separate drivers, since you have a MSSQL and a MySQL database. Because you need separate connections, you need 2 separate queries. Depending on what you want to do, you can first retrieve your Companies and then do a query on Customers WHERE company = 'some value from COMPANIES' (or the other way around).
You could also just fetch every row from both tables in their own lists and compare those lists in your code, rather than using a query.
Try the following:
1 Retrieve the data from the Companies table on the SQL Server and store the required columns in an ArrayList<HashMap<String,String>>.
Each ArrayList index is a row, and each HashMap holds that row's key-value pairs: the key is the column name and the value is the column value.
2 Then pull the data from the Customer table, adding a WHERE clause built by converting the data from your first map into a comma-separated list. This creates a filter similar to the join in SQL.
Add the data to the same result structure as before, making sure the column names in the HashMap do not overlap.
When you need to access column7 of the 5th row, and rows is your ArrayList, you write
rows.get(4).get("column7");
The logic is given; please implement it yourself.
Select Companies from DB1
Select Customers from DB2
Put them in Map<WhatToJoinOn, Company> and Map<WhatToJoinOn, Customer>
Join on map keys, creating a List<CompanyCustomer>
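
The four steps above could be sketched as follows. This is an illustration in Python, not a definitive implementation: two in-memory SQLite connections stand in for the SQL Server and MySQL connections, and the column names (id, name, company_id) and the sample rows are assumptions made for the example.

```python
import sqlite3

def join_in_code(companies, customers):
    """Hash join: index companies by id, then probe the index with each customer."""
    company_by_id = {company_id: name for company_id, name in companies}
    joined = []
    for customer_name, company_id in customers:
        # Inner-join semantics: customers without a matching company are skipped.
        if company_id in company_by_id:
            joined.append((company_by_id[company_id], customer_name))
    return joined

# db1 stands in for the SQL Server holding Companies.
db1 = sqlite3.connect(":memory:")
db1.executescript("""
    CREATE TABLE Companies (id INTEGER, name TEXT);
    INSERT INTO Companies VALUES (1, 'Acme'), (2, 'Globex');
""")

# db2 stands in for the MySQL server holding Customers.
db2 = sqlite3.connect(":memory:")
db2.executescript("""
    CREATE TABLE Customers (name TEXT, company_id INTEGER);
    INSERT INTO Customers VALUES ('Ann', 1), ('Bob', 2), ('Cid', 1), ('Dee', 9);
""")

companies = db1.execute("SELECT id, name FROM Companies").fetchall()
customers = db2.execute("SELECT name, company_id FROM Customers ORDER BY name").fetchall()

result = join_in_code(companies, customers)
print(result)
```

Building the lookup on the smaller table keeps memory use down. For large tables, filtering the second query by the keys fetched from the first, as suggested earlier, avoids pulling every row.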

How can I get a complete list of unique values from multiple columns in multiple tables?

Back story: I have an odd situation. An organization affiliated with my own provides us with a database that we use heavily; a few other organizations also use it, but our records are easily more than 90% of the total. We have a web frontend for data entry that is connected to the live database, but we only get the backend data as an Access file of selected tables that are sent to us periodically.
That's a hassle in general, but a critical problem that I run into in every report is differentiating records produced by our organization from others'. Records are identified by the staff who created them, but I don't have (and am unlikely to get) the users table itself - which means I have to manually keep a list of which user IDs correspond to which users, and if those users belong to our organization, etc. Right now, I'm building a sort of shadow DB that links to the data extract and has queries that append that kind of information onto the data tables - so when I pull out a list of records, I can get them by user ID, name, organization, role, etc.
The problem: not all users create or modify records of all types, so the user IDs I need to make this list complete are scattered across several tables. How can I create a list of unique user IDs from across all of these tables? I'm currently using a union of the IDs from the two biggest tables, but I don't know if I can stack subquery upon subquery to make that work - and I'm kind of hesitant to dive into writing that for Access without knowing if it will ultimately work. I'm interested in other methods, too.
TL;DR: What's the simplest way to get a column of the unique values of several columns that are spread across several tables?
Combine SELECT queries on each of the tables into a UNION query. The UNION query returns distinct values.
SELECT UserID FROM Table1
UNION
SELECT UserID FROM Table2
UNION
SELECT UserID FROM Table3;
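
To see the de-duplication at work, here is a small sketch using Python's sqlite3 instead of Access. The tables and the columns CreatedBy and ModifiedBy are made up for the example. It also shows that several columns of the same table can each contribute a SELECT to the union.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (CreatedBy INTEGER, ModifiedBy INTEGER);
    CREATE TABLE Table2 (UserID INTEGER);
    INSERT INTO Table1 VALUES (1, 2), (2, 3);
    INSERT INTO Table2 VALUES (3), (4), (1);
""")

# UNION removes duplicates across all of the stacked SELECTs.
distinct_ids = [row[0] for row in conn.execute("""
    SELECT CreatedBy AS UserID FROM Table1
    UNION
    SELECT ModifiedBy FROM Table1
    UNION
    SELECT UserID FROM Table2
    ORDER BY UserID
""")]

# UNION ALL keeps every value, duplicates included.
all_ids = [row[0] for row in conn.execute("""
    SELECT CreatedBy FROM Table1
    UNION ALL
    SELECT ModifiedBy FROM Table1
    UNION ALL
    SELECT UserID FROM Table2
""")]

print(distinct_ids)
print(len(all_ids))
```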