Ranger Permissions | Hive table access based on Partition Directory - hive

I was looking for help on the following use case to be implemented with the Ranger Authorization service in HDP.
I have a Hive table 'customer' which holds the two partitions below, loaded from HDFS.
/data/mydatabase/customer/partition1/
/data/mydatabase/customer/SenstivePartition2/
I have two users - user1 and user2 - and I want to define a policy in such a way that
user1 --> should be able to access --> partition1
user2 --> should be able to access --> partition1 and SenstivePartition2 both.
As the second partition is highly sensitive, I do not want to define a table-level policy; otherwise both users would get full access.
Thanks
Shashi

Based on this forum info, I don't think you can set authorization based on partitions. You can set up a partitioned materialized view in HDP 3.0.1 and later as described in these docs and then set up the Ranger authorization on the views as tables.
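In case it helps, here is a minimal sketch of that approach, assuming HDP 3.0.1+, an ACID source table, and an illustrative partition column name part_col (the real column name and view name depend on how the customer table is defined):

-- Expose only the non-sensitive partition through a materialized view
CREATE MATERIALIZED VIEW mydatabase.customer_public_mv
AS SELECT * FROM mydatabase.customer
WHERE part_col = 'partition1';

-- Then define the Ranger Hive policy for user1 on customer_public_mv only,
-- and give user2 a policy on the base table mydatabase.customer.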

It is not exactly what you asked, but what may also help you is the even more fine-grained row-level access control.
Here is the documentation on how to set up row-level filtering in Ranger for Hive tables:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/ranger_row_level_filtering_in_hive.html
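As a rough illustration of what the row filter could look like for this case (the Ranger filter expression is a WHERE-clause fragment entered in the policy UI; part_col is a placeholder for the real partition column):

-- Row-filter expression attached to user1 in the Ranger Hive policy (not run as SQL):
part_col = 'partition1'

-- Hive then effectively rewrites user1's queries as:
SELECT * FROM mydatabase.customer WHERE part_col = 'partition1';

-- user2 is given no filter (or the condition 1 = 1), so both partitions remain visible.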

Related

Who accessed Hive tables and how many times

I want to know how many times my Hive tables are accessed.
The details I would like to get are the table name and how many times it was accessed, e.g.:
tableName    No. of Accesses
Table1       100
Table2       80
....         ....
Table n      n
Is there any Hive/Linux command/code to do so? Also, I tried to understand the last access time of my table using
describe formatted database.table
But it shows me:
Name               type
LastAccessTime:    UNKNOWN
Any suggestions/help is greatly appreciated.
Check out your Hive audit log.
Audit Logs
Audit logs are logged from the Hive metastore server for every metastore API invocation.
An audit log has the function and some of the relevant function arguments logged in the metastore log file. It is logged at the INFO level of log4j, so you need to make sure that the logging at the INFO level is enabled (see HIVE-3505). The name of the log entry is "HiveMetaStore.audit".
Audit logs were added in Hive 0.7 for secure client connections (HIVE-1948) and in Hive 0.10 for non-secure connections (HIVE-3277; also see HIVE-2797).
There are also HDFS audit logs that you could use to derive access to Hive tables.
If you have Ranger enabled, that is your best bet to see who is accessing what.
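If you want a per-table count from the metastore audit log, one rough approach is to copy the log files into HDFS, expose them to Hive, and aggregate. This is only a sketch: the HDFS path, the copy step, and the exact line format (ugi=... cmd=get_table ... tbl=...) vary by Hive version and install, so adjust the location and the regex accordingly.

-- Hypothetical external table over a copy of the metastore log files
CREATE EXTERNAL TABLE metastore_audit_raw (line STRING)
LOCATION '/tmp/metastore_audit_logs/';

-- Count metastore get_table calls per table (regex assumes a tbl=<name> token)
SELECT regexp_extract(line, 'tbl=([^\\s,]+)', 1) AS table_name,
       count(*) AS access_count
FROM metastore_audit_raw
WHERE line LIKE '%HiveMetaStore.audit%'
  AND line LIKE '%cmd=get_table%'
GROUP BY regexp_extract(line, 'tbl=([^\\s,]+)', 1)
ORDER BY access_count DESC;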

IN with a list of numbers (BigQuery + Google Sheets)

I have access to BigQuery through work, but no write access. Just read.
So I have a bunch of integers in a Google Sheet, one column (~400 rows):
User.
332321
031230
938101
These numbers all correspond to a specific value in a table in BQ, but unfortunately, they aren't easily queried, as they are the result of multiple queries, etc.
So, my dilemma: how can I take the column of integers from Google Sheets and then use it in a query (say, in a WHERE clause)? The only suggestion I've gotten has been to get write access: https://supermetrics.com/blog/bigquery-query-google-sheets
You will need to create an external table in BigQuery using your Google Sheet data to be able to query it in BigQuery. However, as you have already mentioned, the only solution for this is to get write/create access permission to BigQuery, which is also mentioned in the reference you provided: https://supermetrics.com/blog/bigquery-query-google-sheets.
In addition, you may refer to the Query Google Drive Data documentation, as it contains more details about the permissions needed in BigQuery and Google Drive before you can create/query an external table in BigQuery, as well as the actual creation and query execution.
BigQuery permissions
At a minimum, the following permissions are required to create and query an external table in BigQuery.
bigquery.tables.create
bigquery.tables.getData
bigquery.jobs.create
Drive permissions
At a minimum, to query external data in Drive you must be granted View access to the Drive file linked to the external table.
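Once those permissions are in place, here is a hedged sketch of the setup using standard SQL DDL; the dataset, table, column names, and sheet URL are all placeholders, and the sheet column could also be declared INT64 if leading zeros don't matter:

-- External table backed by the Google Sheet
CREATE EXTERNAL TABLE mydataset.sheet_users (user_id STRING)
OPTIONS (
  format = 'GOOGLE_SHEETS',
  uris = ['https://docs.google.com/spreadsheets/d/<sheet-id>'],
  skip_leading_rows = 1
);

-- Use the sheet column as an IN-list in a WHERE clause
SELECT t.*
FROM mydataset.big_table AS t
WHERE CAST(t.user_id AS STRING) IN (SELECT user_id FROM mydataset.sheet_users);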

How to create a delegation token in Metastore and HiveServer2?

I've got an HDP3 Kerberized cluster.
The question is: how can I create a delegation token for a user that doesn't have a keytab?
With that user I want to retrieve information from Metastore and run SQL queries on Hive tables.
The property hive.cluster.delegation.token.store.class is set to org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.
Znodes /hive/cluster/delegationHIVESERVER2/tokens and /hive/cluster/delegationMETASTORE/tokens are empty.
I've found information about how to generate a delegation token for HDFS.
But for Hive there is only info about how to get that token, which assumes the token already exists. How do I create one?

How to transfer data from Google Cloud Storage to BigQuery into a partitioned and clustered table?

First, I created an empty table with partitioning and clustering. After that, I would like to configure the Data Transfer Service to fill my table from Google Cloud Storage. But when I configure the transfer, I don't see a parameter field that allows me to choose the clustering field.
I tried to do the same thing without the clustering and I can fill my table easily.
BigQuery error when I ran the transfer:
Failed to start job for table matable$20190701 with error INVALID_ARGUMENT: Incompatible table partitioning specification. Destination table exists with partitioning specification interval(type:DAY,field:) clustering(string_field_15), but transfer target partitioning specification is interval(type:DAY,field:). Please retry after updating either the destination table or the transfer partitioning specification.
When you define the table, you specify the partitioning and clustering columns. That's everything you need to do.
When you load the data from GCS (from the CLI or UI), BigQuery automatically partitions and clusters the data.
If you can give more detail on how you create the table and set up the transfer, it would be helpful to provide a more detailed explanation.
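In case it helps, a minimal sketch of a destination table defined with both ingestion-time (day) partitioning and clustering; the column names are placeholders, except string_field_15, which is taken from the error message:

-- Destination table with ingestion-time partitioning plus clustering
CREATE TABLE mydataset.matable
(
  string_field_15 STRING,
  some_value FLOAT64
)
PARTITION BY _PARTITIONDATE
CLUSTER BY string_field_15;

-- A load or transfer into matable$20190701 then targets that day's partition.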
Thanks for your time.
Of course:
empty table configuration
transfer configuration
I can transfer data successfully without clustering, but when I add clustering to my empty table, the transfer fails.

Google BigQuery table disappears after a few days

I'm streaming data into BigQuery, but somehow the table I created disappears from the web UI while the dataset remains.
I set up the dataset as never expire, is there any configuration for the table itself?
I'd look into Mikhail's suggestion of the table's explicit expiration time. The tables could also be getting deleted via the tables.delete API, possibly by another user or process. You could check operations on your table in your project's audit logs and see if something is deleting them.
is there any configuration for the table itself?
The expiration that is set for a dataset is just the default expiration for newly created tables.
The table itself can have an expiration set using the expirationTime property.
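For reference, a rough sketch of adjusting the table-level expiration with standard SQL (the table name is a placeholder; setting the option to NULL removes any expiration so the table only disappears if something explicitly deletes it):

-- Remove the table's expiration
ALTER TABLE mydataset.mytable SET OPTIONS (expiration_timestamp = NULL);

-- Or set an explicit expiration, e.g. 30 days from now
ALTER TABLE mydataset.mytable
SET OPTIONS (expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY));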