What is the impact on running queries in Hive if I swap a partition's location using
ALTER TABLE user_data
PARTITION (name = 'ABC')
SET LOCATION 'db/partitions/new';
Does this command wait until running queries have finished executing?
Hive translates your query into a temporary MapReduce job, and that job is executed on behalf of your Hive query: when you submit a query, Hive creates a MapReduce job from it, the job runs, and you get its result. If you ALTER the table and change a partition (or anything else) while a query is executing, the ALTER command does not wait for the running job to finish. It alters the table immediately, and the in-flight query still returns results based on the old location unless you kill that job.
The best way to understand this is to try it: submit your Hive query and redirect the result into a file, then change the partition location, submit the same query again, redirect that result into a second file, and compare the two outputs. A sketch of this experiment follows below.
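For concreteness, here is a minimal sketch of that experiment in Python, assuming a HiveServer2 endpoint on localhost:10000 and the PyHive client; the table, partition, and location are the ones from the question and are placeholders.

# Hedged sketch of the suggested experiment, assuming a HiveServer2
# endpoint on localhost:10000 and the PyHive client.
from pyhive import hive

cur = hive.Connection(host="localhost", port=10000).cursor()

# 1. Query the partition at its old location and keep the rows.
cur.execute("SELECT * FROM user_data WHERE name = 'ABC'")
before = cur.fetchall()

# 2. Swap the partition location; this only affects jobs started afterwards.
cur.execute(
    "ALTER TABLE user_data PARTITION (name = 'ABC') "
    "SET LOCATION 'db/partitions/new'"
)

# 3. Re-run the same query; it spawns a fresh job over the new location.
cur.execute("SELECT * FROM user_data WHERE name = 'ABC'")
after = cur.fetchall()

print(before == after)  # differs if the new location holds different data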
I had a scheduled query with a SELECT statement and an APPEND write configuration; the destination was a specific table within the dataset "genera_analytics". Recently I modified the query, and it now runs a sequence of DML steps: INSERT, DELETE, INSERT. Now I get this error when the query executes:
"Dataset specified in the query ('') is not consistent with Destination dataset 'genera_analytics'"
I have tried to update the scheduled query configuration to remove the destination dataset through the UI, but it seems impossible. I have also tried some bq commands:
bq update --transfer_config --target_dataset='' resource_name
but the destination dataset is still 'genera_analytics'.
How can I update this scheduled query, removing the destination dataset from the configuration?
It looks like the scheduled query was originally defined with a destination dataset and an APPEND write disposition. After updating it to a DML query, the GUI does not expose the dataset fields so they could be cleared, so the error comes from the dataset and table name that are still attached to the scheduled query.
The fix is therefore to delete the scheduled query and recreate it from scratch as a DML query with no destination dataset; a sketch follows below.
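As a hedged sketch of that fix using the BigQuery Data Transfer Service Python client; the resource name, project, location, display name, query text, and schedule below are all placeholders for your own values.

# Hedged sketch: delete the old scheduled query and recreate it as a
# pure DML query, using the google-cloud-bigquery-datatransfer client.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

# 1. Delete the existing scheduled query (placeholder resource name;
#    it is the same name you pass to `bq update --transfer_config`).
client.delete_transfer_config(
    name="projects/123/locations/us/transferConfigs/abc"
)

# 2. Recreate it as a DML scheduled query. Note there is no
#    destination_dataset_id and no destination_table_name_template.
transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="genera_analytics DML refresh",  # placeholder name
    data_source_id="scheduled_query",
    params={"query": "INSERT ...; DELETE ...; INSERT ..."},  # your DML script
    schedule="every 24 hours",  # placeholder schedule
)
client.create_transfer_config(
    parent="projects/123/locations/us",  # placeholder project/location
    transfer_config=transfer_config,
)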
I want to alter thousands of tables in a Hive database, but some of the tables exist and some don't. As I execute the .sql file, as soon as it finds a table that is not present, it exits from Hive. How can I override or skip the statements whose table is not present in Hive?
Try this configuration parameter:
set hive.cli.errors.ignore=true;
After setting it to 'true' (put the set command at the top of your .sql file so it applies to every statement that follows), all commands in the script are executed, no matter how many of them fail.
See here: https://issues.apache.org/jira/browse/HIVE-1847
We recently had a test fail, and it raises a question about BigQuery's consistency model: after we create a table, should other operations immediately see that table?
Background:
Our test creates a table in BigQuery with some data, waits for the job to complete, and then checks whether the table exists.
gbq.write_gbq(df, dataset_id, table_name, project_id=project_id, block=True)
assert table_name in gbq.list_tables(dataset_id, project_id=project_id) # fails
FYI: block=True runs wait_for_job, so it waits for the job to complete.
Yes, the table should be ready for usage just after creation.
But I suspect that the issue is not with BigQuery.
Notice that in the docs, the tables.list() operation has a nextPageToken parameter. You will probably have to use it in order to retrieve all the tables in your dataset.
Basically, as long as a pageToken comes back, not all tables have been listed yet; a sketch of the paging loop follows below.
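Here is a minimal sketch of that loop with the google-cloud-bigquery client (not the pandas-gbq wrapper used in the question); the project and dataset ids are placeholders.

# Hedged sketch of paging through tables.list() explicitly;
# project and dataset ids are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="your-project")

table_ids = []
page_token = None
while True:
    # One page per pass; next_page_token is set once the page is fetched.
    iterator = client.list_tables(
        "your_dataset", page_size=50, page_token=page_token
    )
    page = next(iterator.pages, None)
    if page is not None:
        table_ids.extend(table.table_id for table in page)
    page_token = iterator.next_page_token
    if not page_token:  # no token left: every table has been listed
        break

In practice, simply iterating over client.list_tables("your_dataset") follows next_page_token automatically; the explicit loop above just makes the token handling visible.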
We are Sqooping data from Netezza into non-partitioned Hadoop tables, and then from the non-partitioned tables into partitioned tables using the INSERT OVERWRITE method. After this we run COMPUTE INCREMENTAL STATS databasename.tablename on the partitioned tables, but this query fails for some of the partitions with the error:
Could not execute command: compute incremental stats, and "No such file or directory" for some file in the partitioned directory.
You can run a REFRESH statement before computing stats to refresh the metadata right away. It may be necessary to wait a few seconds before computing stats even if the REFRESH statement's return code is 0, as past experience has shown that metadata can still be refreshing even after the return code comes back. You typically won't see this issue unless a script executes these commands sequentially.
REFRESH yourTableName;
COMPUTE STATS yourTableName;
As of Impala 2.3 you can also use ALTER TABLE ... RECOVER PARTITIONS instead of refreshing the metadata or repairing the table.
ALTER TABLE yourTableName RECOVER PARTITIONS;
COMPUTE STATS yourTableName;
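If you are running these statements from a script, here is a minimal sketch with the impyla client; the host, port, pause length, and table name are placeholders.

# Hedged sketch of running the two statements back to back,
# using the impyla client; connection details are placeholders.
import time
from impala.dbapi import connect

cursor = connect(host="impala-host", port=21050).cursor()

cursor.execute("REFRESH yourTableName")
time.sleep(5)  # metadata can lag briefly even after REFRESH returns 0
cursor.execute("COMPUTE INCREMENTAL STATS yourTableName")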
If configuration.load.writeDisposition is set to WRITE_TRUNCATE during a load job, is there a period of time when querying the table would raise an error?
For the whole period while the job is marked PENDING and/or RUNNING?
Or for a brief moment when the table is replaced at the end of the load job?
What would be the error? status.errors[].reason => "notFound"?
WRITE_TRUNCATE is atomic and is applied at the end of the load job, so any query that runs during that time will see either only the old data or all of the new data. There should be no case where you'd get an error querying the table.
If the load fails, the table is left unchanged; if it succeeds, all of the new data appears in the table at once.
If the table didn't already exist and the load job specified CREATE_IF_NEEDED, then querying the table would return "not found" until the load job completed.
We're working on a doc rewrite that will make this more clear.
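For reference, a minimal sketch of such a load with the google-cloud-bigquery Python client; the bucket, file, and table ids are placeholders, and the truncate is applied atomically when the job completes.

# Hedged sketch of a WRITE_TRUNCATE load; source URI and table id are
# placeholders. Concurrent queries see the old data until job.result()
# returns, then all of the new data at once.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
job = client.load_table_from_uri(
    "gs://your-bucket/data.json",            # placeholder source URI
    "your-project.your_dataset.your_table",  # placeholder table id
    job_config=job_config,
)
job.result()  # after this returns, queries see only the new data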