I am using Spark and executing HQL queries through HiveContext. I want to delete the table student from the database test in Hive. So, will the command "delete test.student" run in HiveContext?
It works in Teradata, so can I use it as-is in Hive (in Spark)?
No, "DELETE <database_name>.<table_name>" is not supported in Hive, nor in Spark. Instead, you can do the following:
sqlContext.sql("DROP TABLE IF EXISTS <db_name>.<table_name>");
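If the goal is only to remove the rows while keeping the table definition, Hive also supports TRUNCATE for managed (non-external) tables. A sketch, using the table names from the question:

```sql
-- Remove the table entirely (metadata and, for managed tables, data)
DROP TABLE IF EXISTS test.student;

-- Alternatively, delete all rows but keep the table definition
-- (works on managed tables only)
TRUNCATE TABLE test.student;
```

Either statement can be passed to sqlContext.sql(...) from Spark.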
I want to insert JSON into a Hive database.
I am trying to transform the JSON to SQL using the ConvertJsonToSQL NiFi processor. How can I include a PARTITION (...) clause in my query?
Can I do this, or should I use the ReplaceText processor to build the query?
What version of Hive are you using? There are Hive 1.2 and Hive 3 versions of PutHiveStreaming and PutHive3Streaming (respectively) that let you put the data directly into Hive without having to issue HiveQL statements. For external Hive tables in ORC format, there are also ConvertAvroToORC (for Hive 1.2) and PutORC (for Hive 3) processors.
Assuming those don't work for your use case, you may also consider ConvertRecord with a FreeFormTextRecordSetWriter that generates the HiveQL, including the PARTITION clause. That gives a lot more flexibility than trying to patch a SQL statement to turn it into HiveQL for a partitioned table.
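As a sketch of that second approach (the table, column, and partition names here are assumptions, not from the question), the writer's template could emit one HiveQL statement per record, something like:

```sql
-- Hypothetical HiveQL produced per record by a FreeFormTextRecordSetWriter
-- template; ${id}, ${name}, and ${load_date} would be replaced with the
-- corresponding record field values
INSERT INTO TABLE mydb.events PARTITION (load_date = '${load_date}')
VALUES ('${id}', '${name}');
```

The resulting flow file content could then be executed with a PutHiveQL processor.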
EDIT: I forgot to mention that the Hive 3 NAR/components are not included with the NiFi release due to space reasons. You can find the Hive 3 NAR for NiFi 1.11.4 here.
I want to use a Hive table as the source and an Oracle table as the target in my Informatica Developer tool mapping, with the Hive environment. That is, I want to run my Informatica Developer tool mapping in Hive mode. Is this possible? If yes, please let me know the steps.
Thanks
Yes, it is possible; you just need to change the execution engine.
Refer to the Informatica BDM help documentation for the detailed steps.
You can create an Oracle connection via Sqoop and use it with any of the Hadoop engines in Informatica BDM.
Yes, this is possible using Sqoop for Oracle. But there is a limitation: with Sqoop you can only insert into Oracle.
You can use PowerExchange for Hadoop to load data from PowerCenter into new Hive tables.
Sources or targets will appear in PowerCenter as ODBC data sources or ODBC targets.
Currently we drop the table daily and run the script that loads the data into the tables. The script takes 3-4 hours, during which the data is unavailable. So our aim now is to keep the old Hive data available to analysts until the new data load completes.
I achieve this in an HQL script by loading daily data into Hive tables partitioned on load_year, load_month, and load_day, and removing yesterday's data by dropping its partition.
But what is the option for achieving the same in a Pig script? Can we alter the table through a Pig script? I don't want to execute a separate HQL script to drop the partition after Pig.
Thanks
Since HDP 2.3 you can use HCatalog commands inside Pig scripts. Therefore, you can use the HCatalog command to drop a Hive table partition. The following is an example of dropping a Hive partition:
-- Set the correct hcat path
set hcat.bin /usr/bin/hcat;
-- Drop a table partition or execute any other HCatalog command
sql ALTER TABLE midb1.mitable1 DROP IF EXISTS PARTITION(activity_id = "VENTA_ALIMENTACION",transaction_month = 1);
Another way is to use sh command execution inside the Pig script. However, I had some problems escaping special characters in ALTER commands, so the first is the best option in my opinion.
Regards,
Roberto Tardío
What is the best (least expensive) equivalent of the SQL Server UPDATE ... SET command in Hive?
For example, consider the case in which I want to convert the following query:
UPDATE TABLE employee
SET visaEligibility = 'YES'
WHERE experienceMonths > 36
to equivalent Hive query.
I'm assuming you have a table without partitions, in which case you should be able to do the following command:
INSERT OVERWRITE TABLE employee
SELECT employeeId, employeeName, experienceMonths, salary,
       CASE WHEN experienceMonths > 36 THEN 'YES' ELSE visaEligibility END AS visaEligibility
FROM employee;
There are other ways, but they are much more convoluted; I think the way Bejoy described is the most efficient.
(source: Bejoy KS blog)
Note that if you have to do this on a partitioned table (which is likely if you have a lot of data), you would probably need to overwrite your partition when doing this.
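For a partitioned table, the overwrite can be scoped to the affected partition. A hedged sketch, assuming a partition column load_day that is not part of the original question:

```sql
-- Rewrite only one partition; load_day is an assumed partition column
INSERT OVERWRITE TABLE employee PARTITION (load_day = '2015-01-01')
SELECT employeeId, employeeName, experienceMonths, salary,
       CASE WHEN experienceMonths > 36 THEN 'YES' ELSE visaEligibility END AS visaEligibility
FROM employee
WHERE load_day = '2015-01-01';
```

Note that in a static-partition insert like this, the SELECT list contains only the non-partition columns.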
You can create an external table and use INSERT OVERWRITE LOCAL DIRECTORY; if you want to change column values, you can use CASE WHEN, IF, or other conditional expressions. Then copy the output file back to the HDFS location.
You can upgrade your Hive to 0.14.0.
Starting from 0.14.0, Hive supports the UPDATE operation.
To use it, the Hive tables must be created with an ACID-supporting storage format, and additional properties must be set in hive-site.xml.
How to do CRUD operations in Hive
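The table setup the answer refers to looks roughly like this. This is a sketch for Hive 0.14+; the table and column names are illustrative, and the exact property names should be checked against the Hive transactions documentation:

```sql
-- Client-side settings needed for ACID operations (a transactional
-- metastore configuration in hive-site.xml is also required)
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- ACID tables must be bucketed, stored as ORC, and marked transactional
CREATE TABLE employee_acid (
  employeeId INT,
  employeeName STRING,
  visaEligibility STRING
)
CLUSTERED BY (employeeId) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- With that in place, UPDATE works directly
UPDATE employee_acid SET visaEligibility = 'YES' WHERE employeeId = 42;
```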
I have changed the Hive metastore from Derby to MySQL as given in this guide:
https://ccp.cloudera.com/display/CDHDOC/Hive+Installation
Please tell me how I can verify that it has been changed to MySQL.
You can query the metastore schema in your MySQL database.
Something like:
SELECT * FROM TBLS;
on your MySQL database should show you the names of your Hive tables.
Add a new table and verify that the above query returns updated results.
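To also see which database each Hive table belongs to, the query can be joined with the DBS table (column names per the standard Hive metastore schema):

```sql
-- List Hive tables together with their databases
SELECT d.NAME AS db_name, t.TBL_NAME
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID;
```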