I've got an HDP3 kerberized cluster.
The question is: how can I create a delegation token for a user that doesn't have a keytab?
With that user I want to retrieve information from the Metastore and run SQL queries on Hive tables.
The property hive.cluster.delegation.token.store.class is set to org.apache.hadoop.hive.thrift.ZooKeeperTokenStore.
The znodes /hive/cluster/delegationHIVESERVER2/tokens and /hive/cluster/delegationMETASTORE/tokens are empty.
I've found information about how to generate a delegation token for HDFS.
But for Hive there is only information about how to retrieve such a token, which assumes the token already exists. How do I create one?
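For HDFS, what I found boils down to roughly the following (a rough sketch only; the renewer principal and configuration are placeholders):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class FetchHdfsDelegationToken {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath; the caller
        // must already be authenticated to Kerberos (keytab or kinit).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Credentials creds = new Credentials();
        // Ask the NameNode for delegation tokens renewable by the given principal
        // ("yarn" here is a placeholder renewer).
        Token<?>[] tokens = fs.addDelegationTokens("yarn", creds);
        for (Token<?> t : tokens) {
            System.out.println(t.getService() + " -> " + t.encodeToUrlString());
        }
    }
}
What I'm missing is the equivalent call for creating a Hive/Metastore delegation token for a keytab-less user.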
I am using Google Cloud Logging to sink Dialogflow CX request data to BigQuery. The BigQuery tables are auto-generated when you create the sink via Google Logging.
We keep getting a sink error: field: value is not a record.
This is because pageInfo/formInfo/parameterInfo/value is of type STRING in BigQuery, but some of the values are records, not strings. One example is #sys.date-time.
How do we fix this?
We have not tried anything at this point, since the BigQuery dataset is auto-created via a Logging filter. We cannot modify the logs, and even if we could modify the table schema, what would we change it to, given that most of the time "value" is a string but other times it is a record?
I want to know how many times my Hive tables are accessed.
The details I would like to get are the table name and how many times it was accessed, e.g.:
tableName    No. of accesses
Table1       100
Table2       80
...          ...
Table n      n
Is there any Hive/Linux command or code to do this? Also, I tried to check the last access time of my table using
describe formatted database.table
But it shows me:
Name               type
LastAccessTime:    UNKNOWN
Any suggestions/help is greatly appreciated.
Check out your Hive audit log.
Audit logs are logged from the Hive metastore server for every metastore API invocation.
An audit log has the function and some of the relevant function arguments logged in the metastore log file. It is logged at the INFO level of log4j, so you need to make sure that logging at the INFO level is enabled (see HIVE-3505). The name of the log entry is "HiveMetaStore.audit".
Audit logs were added in Hive 0.7 for secure client connections (HIVE-1948) and in Hive 0.10 for non-secure connections (HIVE-3277; also see HIVE-2797).
There are also HDFS audit logs that you could use to derive access to Hive tables.
If you have Ranger enabled, that is your best bet to see who is accessing what.
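As a rough illustration of turning those audit entries into per-table counts (a sketch only: it assumes the stock HiveMetaStore.audit line format with cmd=... / tbl=... fields and a placeholder log path, both of which vary by distribution):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class CountTableAccesses {
    public static void main(String[] args) throws IOException {
        // Matches the "tbl=<name>" field that audit entries carry for table-level calls.
        Pattern tablePattern = Pattern.compile("tbl=([\\w.]+)");
        Map<String, Integer> counts = new HashMap<>();
        // Placeholder log path; adjust to wherever your metastore writes its log.
        try (Stream<String> lines = Files.lines(Paths.get("/var/log/hive/hivemetastore.log"))) {
            lines.filter(line -> line.contains("HiveMetaStore.audit"))
                 .forEach(line -> {
                     Matcher m = tablePattern.matcher(line);
                     if (m.find()) {
                         counts.merge(m.group(1), 1, Integer::sum);
                     }
                 });
        }
        counts.forEach((table, n) -> System.out.println(table + "\t" + n));
    }
}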
I am attempting to pull in data from a CSV file that is stored in an Azure Blob container, and when I try to query the file I get the following error:
File 'https://<storageaccount>.blob.core.windows.net/<container>/Sales/2020-10-01/Iris.csv' cannot be opened because it does not exist or it is used by another process.
The file does exist and, as far as I know, it is not being used by anything else.
I am using SSMS and also a SQL On-Demand endpoint from Azure Synapse.
What I did in SSMS was run the following commands after connecting to the endpoint:
CREATE DATABASE [Demo2];
CREATE EXTERNAL DATA SOURCE AzureBlob WITH ( LOCATION = 'wasbs://<container>@<storageaccount>.blob.core.windows.net/' );
SELECT * FROM OPENROWSET (
BULK 'Sales/2020-10-01/Iris.csv',
DATA_SOURCE = 'AzureBlob',
FORMAT = '*'
) AS tv1;
I am not sure where my issue is or where to go next. Did I mess up anything when creating the external data source? Do I need to use a SAS token there, and if so, what is the syntax for that?
@Ubiquitinoob44, you need to create a database credential:
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-storage-access-control?tabs=shared-access-signature
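The SAS route on that page boils down to roughly the following (a sketch only: the credential name, SAS value and connection details are placeholders, and the statements are wrapped in JDBC just to keep the example self-contained):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateSasDataSource {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string for the serverless (on-demand) endpoint.
        String url = "jdbc:sqlserver://<ondemand-endpoint>;databaseName=Demo2;"
                   + "user=<user>;password=<password>;encrypt=true";
        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement()) {
            // A master key must exist before database scoped credentials can be created.
            st.execute("CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>'");
            // Store the SAS token (without the leading '?') as a credential.
            st.execute("CREATE DATABASE SCOPED CREDENTIAL BlobSasCredential "
                     + "WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '<sas-token>'");
            // Point the external data source at the container and attach the credential.
            st.execute("CREATE EXTERNAL DATA SOURCE AzureBlobSas WITH ("
                     + "LOCATION = 'wasbs://<container>@<storageaccount>.blob.core.windows.net/', "
                     + "CREDENTIAL = BlobSasCredential)");
        }
    }
}
OPENROWSET queries that reference AzureBlobSas as their DATA_SOURCE should then authenticate with the SAS token rather than your interactive login.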
I figured out what the issue was. I haven't tried Armando's suggestion yet.
First I had to go to the container and edit its IAM policies to give my Azure Active Directory login the Storage Blob Data Contributor role. The user to grant access to is the email address you use to log in to the portal.
https://learn.microsoft.com/en-us/azure/storage/common/storage-auth-aad-rbac-portal?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json
After that I had to reconnect to the On-Demand endpoint in SSMS. Make sure you log in through the Azure AD - MFA option. Originally I was using the On-Demand endpoint's username and password, which had not been granted the Storage Blob Data Contributor role on the container.
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand
I am using Hive on an HDInsight Hadoop cluster -- Hadoop 2.7 (HDI 3.6).
We have some old Hive tables that point to storage accounts that don't exist any more. These tables still point to those storage locations; basically, the Hive Metastore still contains references to the deleted storage accounts. If I try to drop such a Hive table, I get an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException org.apache.hadoop.fs.azure.AzureException: No credentials found for account <deletedstorage>.blob.core.windows.net in the configuration, and its container data is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.)
Manipulating the Hive Metastore directly is risky, as it could land the Metastore in an invalid state.
Is there any way to get rid of these orphan tables?
At first, I created an empty table with partitioning and clustering. After that, I wanted to configure the Data Transfer Service to fill my table from Google Cloud Storage. But when I configured the transfer, I didn't see a parameter field that allows choosing the clustering field.
I tried to do the same thing without clustering and I could fill my table easily.
BigQuery error when I ran the transfer:
Failed to start job for table matable$20190701 with error INVALID_ARGUMENT: Incompatible table partitioning specification. Destination table exists with partitioning specification interval(type:DAY,field:) clustering(string_field_15), but transfer target partitioning specification is interval(type:DAY,field:). Please retry after updating either the destination table or the transfer partitioning specification.
When you define the table, you specify the partitioning and clustering columns. That's all you need to do.
When you load the data from GCS (via the CLI or the UI), BigQuery automatically partitions and clusters it.
If you can give more detail on how you created the table and set up the transfer, that would help me provide a more detailed explanation.
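As a sketch with the Java client (the dataset name and the rest of the schema are placeholders; the daily partitioning and the string_field_15 clustering are taken from your error message), the destination table definition would look something like this:
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Clustering;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TimePartitioning;
import java.util.Collections;

public class CreateClusteredTable {
    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Placeholder schema: only the clustering column is known from the error message;
        // the remaining columns would come from the files in GCS.
        Schema schema = Schema.of(
            Field.of("string_field_15", StandardSQLTypeName.STRING)
        );

        StandardTableDefinition definition = StandardTableDefinition.newBuilder()
            .setSchema(schema)
            // Ingestion-time, daily partitioning: interval(type:DAY) in the error message.
            .setTimePartitioning(TimePartitioning.of(TimePartitioning.Type.DAY))
            // Clustering on the same field the destination table already uses.
            .setClustering(Clustering.newBuilder()
                .setFields(Collections.singletonList("string_field_15"))
                .build())
            .build();

        bigquery.create(TableInfo.of(TableId.of("<dataset>", "matable"), definition));
    }
}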
Thanks for your time.
Of course:
[screenshot: empty table configuration]
[screenshot: transfer configuration]
I managed to transfer data without clustering but, when I add clustering to my empty table, the transfer fails.