Is there a way to locate the hdfs/local path of a particular UDF's jar/class file?
When I run "show functions" I am able to see this UDF, but I want to find out its location.
Hive has two types of functions: permanent (built-in) and temporary.
Permanent: built-in functions are part of hive-exec*.jar under the package "org.apache.hadoop.hive.ql.udf". The jar is located at HIVE_HOME/lib/hive-exec*.jar.
Temporary: these functions are added manually. For a temporary function you will find the details in your Hive HQL file, e.g.
ADD JAR xyz.jar;
CREATE TEMPORARY FUNCTION temp AS 'com.example.hive.udf.Temp';
In Hive 0.13+, a UDF can also be added permanently using a plugin.
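For example, a minimal sketch of such a permanent registration, reusing the class from the temporary example above and assuming the jar was uploaded to a hypothetical HDFS location:
-- the HDFS path is hypothetical; point it at wherever the jar is stored
CREATE FUNCTION temp AS 'com.example.hive.udf.Temp'
USING JAR 'hdfs:///user/hive/udfs/xyz.jar';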
To find out details about a Hive function:
DESCRIBE FUNCTION EXTENDED function_name;
DESCRIBE FUNCTION function_name;
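For example, for the built-in upper function (depending on the Hive version, the extended output also lists the implementing class):
DESCRIBE FUNCTION EXTENDED upper;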
Related
I am new to Hive and I am working on a project where I need to create a few UDFs for data wrangling. During my research, I came across two syntaxes for creating a UDF from added jars:
CREATE FUNCTION country AS 'com.hiveudf.employeereview.Country';
CREATE TEMPORARY FUNCTION country AS 'com.hiveudf.employeereview.Country';
I am not able to find any difference between the above two ways. Can someone explain it to me or point me to the right material?
The main difference between CREATE FUNCTION and CREATE TEMPORARY FUNCTION is this:
In Hive 0.13 or later, functions can be registered to the metastore, so they can be referenced in a query without having to create a temporary function each session.
If we use CREATE TEMPORARY FUNCTION, we will have to recreate the function every time we start a new session.
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/ReloadFunction
CREATE TEMPORARY FUNCTION creates a new function which you can use in Hive queries only for as long as the session lasts. This is faster, as we don't need to register the function in the metastore.
CREATE FUNCTION, on the other hand, acts more permanently. These functions are registered in the metastore, so they can be referenced in a query without having to create a temporary function in each session.
When to use:
Intermediate functions, which exist just to compute something, can be created as TEMPORARY and can later be used by any permanent functions.
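As an illustrative sketch (the jar paths are hypothetical; the class name is the one from the question), the two forms differ mainly in how long the registration lives:
-- temporary: visible only in the current session
ADD JAR /tmp/country-udf.jar;
CREATE TEMPORARY FUNCTION country AS 'com.hiveudf.employeereview.Country';
-- permanent (Hive 0.13+): registered in the metastore and usable in later sessions
CREATE FUNCTION country AS 'com.hiveudf.employeereview.Country'
USING JAR 'hdfs:///user/hive/udfs/country-udf.jar';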
I wrote a custom UDF in Java and packed it in a jar file. Then I added it in Hive using:
create temporary function isstopword as 'org.dennis.udf.IsStopWord';
Everything worked fine. But after I updated a small part of the UDF and repeated the previous steps, Hive obviously still used the old version of the UDF.
How can I make Hive use the updated version of the UDF?
I tried deleting the old jar file in HDFS and dropping the UDF with:
DROP TEMPORARY FUNCTION IF EXISTS isstopword;
Then I recreated the function with the same name, but it still used the older version of the UDF.
I solved it by following this document: http://bdlabs.edureka.co/static/help/topics/cm_mc_hive_udf.html#concept_zb2_rxr_lw_unique_1
Generally, the steps were the following:
Add the following property to hive-site.xml, then restart the Hive server.
<property>
<name>hive.reloadable.aux.jars.path</name>
<value>/user/hive/udf</value>
</property>
Delete the old jar file in HDFS and upload the new jar file.
DROP TEMPORARY FUNCTION IF EXISTS isstopword;
In the Hive console, run list jar; to check the local jar files.
It will print something like this:
/tmp/83ce8586-7311-4e97-813f-f2fbcec63a55_resources/isstopwordudf.jar
Then delete those files from your server's local file system.
Create the temporary function again:
create temporary function isstopword as 'org.dennis.udf.IsStopWord';
With the above steps, it worked for me!
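Put together, the Hive-side part of the refresh is roughly the following sketch (the jar deletion and upload in HDFS and on the local file system happen outside Hive):
-- drop the stale function definition
DROP TEMPORARY FUNCTION IF EXISTS isstopword;
-- show the jars cached in the session's local resource directory,
-- so the stale local copies can be found and deleted on the server
LIST JARS;
-- recreate the function so it resolves against the refreshed jar
CREATE TEMPORARY FUNCTION isstopword AS 'org.dennis.udf.IsStopWord';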
All the jars you add and the temporary functions you create are specific to that particular Hive session. Once you exit that session, all the temporary functions are lost.
Did you try closing the session and repeating the steps?
I'd like to create a BigQuery view that uses a query which invokes a user-defined function. How do I tell BigQuery where to find the code files for the UDF?
Views can reference UDF resources stored in Google Cloud Storage, inline code blobs, or local files (contents will be loaded into inline code blobs).
To create a view with a UDF using the BigQuery UI, just fill out the UDF resources as you would when running the query normally, and save as a view. (In other words, no special actions are required).
To specify these during view creation from the command-line client, use the --view_udf_resource flag:
bq mk --view="SELECT foo FROM myUdf(table)" \
--view_udf_resource="gs://my-bucket/my-code.js"
In the above example, gs://my-bucket/my-code.js would contain the definition of myUdf(). You can provide the --view_udf_resource flag multiple times if you need to reference multiple code files in your view query.
You may specify gs:// URIs or local files. If you specify a local file, then the code will be read once and packed into an inline code resource.
Via the API, this is a repeated field named userDefinedFunctionResources. It is a sibling of the query field that contains the view SQL.
I encounter weird behaviour when using Hive 0.13.1 on Amazon EMR.
This happens when I try to both use a UDF and run an external shell command that runs hive -e "...".
We have been using shell scripts to add partitions dynamically to a table and never encountered any problems in Hive 0.11.
However, in Hive 0.13.1 the following simplified example breaks:
add jar myjar;
create temporary function myfunc as '...';
create external table mytable...
!hive -e "";
select myfunc(someCol) from mytable;
This results in: The UDF implementation class '...' is not present in the class path
Removing the shell command (!hive -e "") makes the error disappear.
Adding the jar and the function again after the shell command also makes the error disappear (adding just the function without the jar does not get rid of the error).
Is this known behavior or a bug? Can I do anything besides reloading the jar and the function before every usage?
AFAIK, this is the way it has always been. One Hive shell cannot pass the additional jars added to its classpath on to a child shell, and definitely not the function definitions.
We provide Hive/Hadoop etc. as a service at Qubole and have the notion of a Hive bootstrap that is used, for cases like this, to capture common statements required for all queries. This is used extensively by most users. (Caveat: I am one of Qubole's and Hive's founders, but I would recommend using Qubole over EMR for Hive.)
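Based on the questioner's own observation, a minimal workaround sketch is to re-register the jar and recreate the function right after the child shell call (jar, class, and table names are the placeholders from the question):
add jar myjar;
create temporary function myfunc as '...';
create external table mytable...
!hive -e "";
-- re-add the jar and recreate the function after the child shell,
-- since they are no longer visible in the classpath at this point
add jar myjar;
create temporary function myfunc as '...';
select myfunc(someCol) from mytable;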
I would like to execute the SQL files generated by Service Builder, but the problem is that the SQL files contain types like LONG, VARCHAR, etc.
Some of these types don't exist in PostgreSQL (for example, LONG corresponds to BIGINT).
Is there a simple way to convert the SQL files' structures so that they can be run on PostgreSQL?
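For illustration only (a hypothetical table, not taken from an actual Service Builder file), the kind of change needed looks like this:
-- Liferay pseudo-SQL as generated by Service Builder
create table Foo_ (fooId LONG not null primary key, name VARCHAR(75) null);
-- PostgreSQL equivalent: LONG becomes bigint
create table Foo_ (fooId bigint not null primary key, name varchar(75) null);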
Execute ant build-db on the plugin and you will find an sql folder with various vendor-specific scripts.
Daniele is right: using the build-db task is obviously correct and the right way to do it.
But... I remember a similar situation some time ago: I had only the Liferay pseudo-SQL file and needed to create proper DDL. I managed to do this in the following way:
You need to have Liferay running on your desktop (or on the machine where the source SQL file is), as this operation requires the portal Spring context to be fully wired.
Go to Configuration -> Server Administration -> Script
Change the language to Groovy
Run the following script:
import com.liferay.portal.kernel.dao.db.DB
import com.liferay.portal.kernel.dao.db.DBFactoryUtil
DB db = DBFactoryUtil.getDB(DB.TYPE_POSTGRESQL)
db.buildSQLFile("/path/to/folder/with/your/sql", "filename")
The first parameter is the path and the second is the file name without the .sql extension; the file on disk must have the proper extension, i.e. it must be called filename.sql.
This will produce a tables folder next to your filename.sql, containing a single tables-postgresql.sql file with your PostgreSQL DDL.
As far as I remember, Service Builder uses the same method to generate database-specific code.