How To Add A JAR To Hive

I have been banging my head against the wall for a while now trying to get a Hive equivalent of MS SQL's IDENTITY column added to a table and auto-incremented. I have found many references to org.apache.hadoop.hive.contrib.udf.UDFRowSequence, but I have no idea where that is in my HortonWorks 2.3 install of my cluster, or where to start. I have seen a Java file here which I assume I have to compile, but once I have a .jar, where does it go? I have tried using a SerDe JAR for another task and could never get Hive to see/use it (see my question on this here).
I have tried to follow along with this case study on creating a custom UDF here. However, I can find no path like the one they describe in my Hortonworks install (the path looks something like ql/src/java/org/apache/hadoop/hive/ql/udf/generic/).
It seems like every tutorial/guide/reference on creating a UDF assumes some knowledge that I do not have yet. How can I create/use the UDFRowSequence functionality in a Hortonworks install of Hive?

To use the UDF inside Hive, compile the Java code and package the compiled UDF class into a JAR file.
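For reference, here is a minimal sketch of what such a UDF class can look like. This is not the exact contrib UDFRowSequence source, just an illustration of the same idea; the package and class names are my own placeholders.

package com.example.hive.udf;   // hypothetical package name

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.io.LongWritable;

// Returns an increasing sequence number on each call, similar in spirit to
// the contrib UDFRowSequence. The counter is per task, so force a single
// mapper/reducer if you need globally unique values.
@Description(name = "row_sequence", value = "_FUNC_() - returns an increasing sequence number")
@UDFType(deterministic = false, stateful = true)
public class RowSequence extends UDF {

    private final LongWritable result = new LongWritable(0);

    public LongWritable evaluate() {
        result.set(result.get() + 1);
        return result;
    }
}

Compile it against the hive-exec JAR that ships with your HDP install and package the resulting class into a JAR.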
Then open your Hive session and add the JAR to the classpath:
hive> ADD JAR <full path to jar file>;
The path needs to be the full path on the local filesystem where you have placed your JAR file.
and then use a CREATE FUNCTION statement to define a function that uses the Java class:
hive> CREATE TEMPORARY FUNCTION <functionname>
    > AS '<classname with full package name>';
You can then call the function name in your Hive session.
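Putting it together, an end-to-end example using the placeholder names from the sketch above (the JAR path, function name, class name and table are all hypothetical, not something that ships with HDP):
hive> ADD JAR /home/me/udfs/myudfs.jar;
hive> CREATE TEMPORARY FUNCTION row_sequence AS 'com.example.hive.udf.RowSequence';
hive> SELECT row_sequence(), name FROM my_table;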
The path ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ from the case study actually refers to the Java package of the class inside the Hive source tree, not to a directory that exists in your Hortonworks install.

Alternatively, copy the JAR file into $HIVE_HOME/lib and restart the Hive client:
cp full_path_to_jar_file $HIVE_HOME/lib/

Related

How to dynamically map functions to serverless.yml

My team has been working with Serverless and we're trying to establish a new company standard for a file organization that eases collaborative development.
We plan to isolate each lambda/function handler in its own folder, alongside the function's .yml file and other necessary configs.
Example expected directory structure (lean):
-- /app
--- /functions
---- /func_a
----- func_a.py
----- func_a.yml
---- /func_b
----- func_b.py
----- func_b.yml
- serverless.yml
The problem so far is that we have to manually declare all the external function config files in the serverless.yml file, which defeats the whole purpose of the idea.
The question is: is there a way to automate this import?
What we've searched so far:
Wildcard path - does not work for file variables, e.g. ${file(./app/functions/*.yml)} or ${file(./app/functions/{any+})}
Extending configuration using custom plugins - does not seem to be able to modify the functions list. We only found information about defineTopLevelProperty, defineCustomProperties, defineFunctionEvent, defineFunctionEventProperties, defineFunctionProperties, defineProvider.
Info from here: Serverless Doc (the base schema link is broken, so there is no information beyond what is on that page).
What we thought to be options:
Maybe there is a plugin that does this? We didn't find any.
Create an isolated custom function (Python) that is called before running sls deploy and generates the final serverless.yml file from a template by traversing all the folders.
What is the best and most natural approach to this?
I think that for your use case you might be better off using the JS/TS configuration file format instead of YAML. That allows you to use regular JS/TS to define your config, which makes importing parts of the configuration much easier. See the TS template for an example of how to use it: https://github.com/serverless/serverless/blob/master/lib/plugins/create/templates/aws-nodejs-typescript/serverless.ts
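As a rough sketch of what that could look like with your layout (the paths, service name and the js-yaml dependency below are my assumptions, not a drop-in solution): a serverless.ts that walks app/functions and builds the functions map itself.

// serverless.ts - assembled config; the folder layout is the one from the question
import * as fs from 'fs';
import * as path from 'path';
import * as yaml from 'js-yaml';   // assumed dependency for reading the per-function .yml files

const functionsDir = path.join(__dirname, 'app', 'functions');

// Read every func_x/func_x.yml and merge them into a single functions object.
const functions = fs.readdirSync(functionsDir).reduce((acc, name) => {
  const configPath = path.join(functionsDir, name, `${name}.yml`);
  if (fs.existsSync(configPath)) {
    Object.assign(acc, yaml.load(fs.readFileSync(configPath, 'utf8')) as Record<string, unknown>);
  }
  return acc;
}, {} as Record<string, unknown>);

const serverlessConfiguration = {
  service: 'my-service',            // hypothetical service name
  frameworkVersion: '3',
  provider: {
    name: 'aws',
    runtime: 'python3.9',
  },
  functions,
};

module.exports = serverlessConfiguration;

With this approach, adding a new function folder requires no change to the top-level config at all.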

How can we get the exact Java business code from a TBO deployed in Documentum?

Currently we need to customize one of our existing TBOs to add some functionality. However, we are unable to find the exact code of the TBO we deployed last time.
Is there any way we can get it back from repository itself?
Usually we package our code into a single JAR file and add that to the jardef, and we deploy the DAR file to the Documentum repository. So is there a way to retrieve that JAR file?
Yes, there is a way. Navigate to
System/Modules/TBO/<your TBO name>
The jardef should be there as an object of type dmc_jar.
You can also query for dmc_module or dmc_jar objects to find them all, though I think they should all be under the TBO folder level.
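If navigating the folders is awkward, a plain DQL query along these lines should list the deployed JAR objects (r_object_id and object_name are standard sysobject attributes; filter further as needed):
SELECT r_object_id, object_name FROM dmc_jar;
SELECT r_object_id, object_name FROM dmc_module;
Exporting the content of the matching dmc_jar object (for example from Documentum Administrator) should give you the deployed JAR back.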

U-SQL: How to merge two U-SQL files with the same import statement

I want to deploy multiple table-creation scripts as one ADLA job to save on cost. I am using packages to get the set of defined partition keys for all tables. When I try to deploy the merged script, it complains that the import statement is declared multiple times and fails.
While I can still deploy the scripts one by one, I wanted to see if we can merge them for faster deployment.
Thanks
Amit
I am not sure I completely get your scenario. If you want to deploy a single object by itself, then that file needs to include all the dependencies (e.g., your package). If you want to deploy several objects, you should include the dependencies only once.
You should probably set up something that generates your script from the underlying "fragments". One fragment would be the reference to the package; each of the other fragments would create one object. Your deployment system would then concatenate the files as needed.
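For instance (the file names here are purely hypothetical), if the package import lives in its own fragment and each table has its own fragment, the merge step could be as simple as:
cat 00_import_package.usql 10_create_table_a.usql 20_create_table_b.usql > deploy_all.usql
so the import statement ends up in the merged script exactly once.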

FlywayDB ignore sub-folder in migration

I have a situation where I would like to ignore specific folders inside of where Flyway is looking for the migration files.
Example
/db/Migration
2.0-newBase.sql
/oldScripts
1.1-base.sql
1.2-foo.sql
I want to ignore everything inside of the 'oldScripts' sub folder. Is there a flag that I can set in Flyway configs like ignoreFolder=SOME_FOLDER or scanRecursive=false?
An example of why I would do this: say I have 1000 scripts in my migration folder. If we onboard a new member, instead of having them run the migration on 1000 files, they could just run the one script (the new base) and proceed from there. The alternative would be to never sync those files in the first place, but then people would need to remember to check source control for prior migrations instead of just looking on their local drive.
This is not currently supported directly. You could put both directories at the same level in the hierarchy (without nesting them) and selectively configure flyway.locations to achieve the same thing.
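For example (the directory names below follow your layout and are only placeholders), with the old scripts moved out to a sibling directory, a new team member could point Flyway at just the new base:
flyway.locations=filesystem:db/migration
while existing environments keep both locations:
flyway.locations=filesystem:db/migration,filesystem:db/oldScripts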
Since Flyway 6.4.0, wildcards are supported in flyway.locations. Examples:
db/**/test
db/release1.*
db/release1.?
More info at https://flywaydb.org/blog/organising-your-migrations

How to deploy multiple databases in a Team Build

I tried the following in MSBuild Arguments but this didn't work out.
/t:Build;Publish /p:SqlPublishProfilePath=McMData1.local.publish.xml
/p:SqlPublishProfilePath McMData1.local.publish.xml
You can only specify one SqlPublishProfilePath value. You can create another MSBuild command task to publish the other database project separately, or create another build definition.
Also, you could make the database properties inside the .xml file command-line driven, but you can still only specify one .xml file and have one deployment per invocation.
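For example, as two separate MSBuild invocations (the project file names are placeholders; the profile names mirror your example):
msbuild McMData1.sqlproj /t:Build;Publish /p:SqlPublishProfilePath=McMData1.local.publish.xml
msbuild McMData2.sqlproj /t:Build;Publish /p:SqlPublishProfilePath=McMData2.local.publish.xml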