Deploying jars for User Defined Functions - hive

https://cwiki.apache.org/confluence/display/Hive/HivePlugins
Hive provides a way to register user defined functions using the 'add jar' command. How should an application register these jars programmatically?
If a class definition in the user defined function changes, should I run 'add jar' again, or is there a different command for this?

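The easiest way to add a Hive UDF jar from Java code is:
Connection con = DriverManager.getConnection(connectionUrl, "", "");
Statement stmt = con.createStatement();
// "add jar" does not produce a result set, so execute() is safer than executeQuery(),
// which some Hive JDBC driver versions reject for non-query statements
stmt.execute("add jar /user/jars/HiveUdf.jar");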

Get the JDBC Providers for the Cell using wsadmin

I am trying to list the JDBC providers at cell scope, but the list also includes the JDBC providers at node and server scope. How can I exclude the node- and server-scoped providers from the list?
AdminConfig.list('JDBCProvider', AdminConfig.getid( '/Cell:CellV70A/'))
output:
'"DB2 Universal JDBC Driver Provider(cells/CellV70A/nodes/nodename|resources.xml#JDBCProvider_1302300228086)"\n"DB2 Universal JDBC Driver Provider(cells/CellV70A|resources.xml#JDBCProvider_1263590015775)"\n"WebSphere embedded ConnectJDBC driver for MS SQL Server(cells/CellV70A|resources.xml#JDBCProvider_1272027151294)"'
If you look at the help for the AdminConfig.list command:
wsadmin>print AdminConfig.help('list')
WASX7056I: Method: list
...
Method: list
Arguments: type, scope
Description: Lists all the configuration objects of the type named
by "type" within the scope of the configuration object named by "scope."
...
It says "within the scope". Since node and server-scoped JDBCProviders are within the scope of the cell, they are returned by your command. If you list all JDBCProviders at cell scope using the Admin Console and then look at the Command Assistance, you'll see something like:
Note that scripting list commands may generate more information than is displayed by the administrative console because the console generally filters with respect to scope, templates, and built-in entries.
AdminConfig.list('JDBCProvider', AdminConfig.getid('/Cell:MyCell/'))
So you'll need to filter your return list similarly. You could throw together a very simple script to do so:
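# splitlines() copes with both '\n' and '\r\n' separators
jdbcProviders = AdminConfig.list('JDBCProvider', AdminConfig.getid('/Cell:MyCell/')).splitlines()
for jdbcProvider in jdbcProviders:
    # Each substring needs its own "in" test; a bare "/nodes/" is always true
    if "/nodes/" in jdbcProvider or "/servers/" in jdbcProvider:
        continue
    print jdbcProvider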
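The filtering idea can also be tried outside wsadmin. The following is a minimal standalone Python sketch, assuming the configuration IDs look like the output shown in the question. The helper name `cell_scoped_only` is hypothetical, and the sample string is copied from that output rather than fetched through AdminConfig.

```python
def cell_scoped_only(listing):
    """Return the config IDs that are not under a node or server scope."""
    kept = []
    for config_id in listing.splitlines():
        if not config_id:
            # Skip empty lines, e.g. from a trailing separator
            continue
        # Node- and server-scoped IDs carry these path segments before the '|'
        if "/nodes/" in config_id or "/servers/" in config_id:
            continue
        kept.append(config_id)
    return kept


# Sample data copied from the question's output (not fetched from a live cell).
sample = (
    '"DB2 Universal JDBC Driver Provider(cells/CellV70A/nodes/nodename|resources.xml#JDBCProvider_1302300228086)"\n'
    '"DB2 Universal JDBC Driver Provider(cells/CellV70A|resources.xml#JDBCProvider_1263590015775)"\n'
    '"WebSphere embedded ConnectJDBC driver for MS SQL Server(cells/CellV70A|resources.xml#JDBCProvider_1272027151294)"'
)

for config_id in cell_scoped_only(sample):
    print(config_id)
```

Inside wsadmin the same function could be given the string returned by AdminConfig.list. Other scopes, such as clusters, are not filtered here and would need their own check.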

Calling java code from Apache Derby

I've written a simple method in Java:
package com.fidel.extensions;

public class Extensions {
    public static String capitalize(String input) {
        return input.toUpperCase();
    }
}
I then registered it as a function in Apache Derby.
create function capitalize(inputString varchar(255))
returns varchar(255)
parameter style JAVA
no sql language JAVA
external name 'com.fidel.extensions.Extensions.capitalize'
In order to give the database access to that code, this page suggests I have two choices:
Install the jar into the database
Add the jar to CLASSPATH
This is the text from that article:
The compiled Java for a procedure (or function) may be stored in the
database using the standard SQL procedure SQLJ.INSTALL_JAR or may be
stored outside the database in the class path of the application.
If I use the INSTALL_JAR approach to embed the jar into the database, my queries work fine. For example:
select capitalize('hello') from SYSIBM.SYSDUMMY1
However I don't actually want to store the jar in the database. I would like derby to look in my CLASSPATH variable to find it.
So I've added it to my CLASSPATH using the following:
export CLASSPATH=${CLASSPATH}:/home/fidel/dev/DbExtensions/extensions.jar
But when I run the same query, I get this error message:
The class 'com.fidel.extensions.Extensions' does not exist or is
inaccessible.
I'm using the NetBeans SQL editor, which I assume would pick up the CLASSPATH I've set.
Has anyone managed to reference code in an external jar, via the CLASSPATH?
P.S. I know I can use the UCASE/UPPER functions; the code above is just an example.
P.P.S. I am able to get the query to work by adding the jar to the driver list, but I don't think that's the correct thing to do.
Services -> Drivers -> Java DB (Embedded) -> Customize -> Add

Setting user credentials on aws instance using jclouds

I am trying to create an AWS instance using jclouds 1.9.0 and then run a script on it (via SSH). I am following the example located here, but I get authentication failed errors when the client (a Java program) tries to connect to the instance. The AWS console shows that the instance is up and running.
The example creates a LoginCredentials object
String user = System.getProperty("user.name");
String privateKey = Files.toString(new File(System.getProperty("user.home") + "/.ssh/id_rsa"), UTF_8);
return LoginCredentials.builder().user(user).privateKey(privateKey).build();
which is later used by the SSH client
responses = compute.runScriptOnNodesMatching(
    inGroup(groupName), // predicate used to select nodes
    exec(command), // what you actually intend to run
    overrideLoginCredentials(login) // use my local user & ssh key
        .runAsRoot(false) // don't attempt to run as root (sudo)
        .wrapInInitScript(false));
Some login information is injected into the instance with the following commands
Statement bootInstructions = AdminAccess.standard();
templateBuilder.options(runScript(bootInstructions));
Since I am on a Windows machine, the creation of LoginCredentials 'fails', so I changed its code to
String user = "ec2-user";
String privateKey = "-----BEGIN RSA PRIVATE KEY-----.....-----END RSA PRIVATE KEY-----";
return LoginCredentials.builder().user(user).privateKey(privateKey).build();
I also tried to define the credentials while building the template, as described in the "EC2: In Depth" guide, but with no luck.
An alternative is to build instance and inject the keypair as follows, but this implies that I need to have the ssh key stored in my AWS console, which is not currently the case and also breaks the functionality of running a script (via ssh) since I can not infer the NodeMetadata from a RunningInstance object.
RunInstancesOptions options = RunInstancesOptions.Builder.asType("t2.micro").withKeyName(keypair).withSecurityGroup(securityGroup).withUserData(script.getBytes());
Any suggestions?
Note: While I am currently testing this on aws, I want to keep the code as decoupled from the provider as possible.
Update 26/10/2015
Based on @Ignasi Barrera's answer, I changed my implementation by adding .init(new MyAdminAccessConfiguration()) while creating the bootInstructions
Statement bootInstructions = AdminAccess.standard().init(new MyAdminAccessConfiguration());
templateBuilder.options(runScript(bootInstructions));
Where MyAdminAccessConfiguration is my own implementation of the AdminAccessConfiguration interface, as @Ignasi Barrera described.
I think the issue lies in the fact that the jclouds code runs on a Windows machine and jclouds makes some Unix assumptions by default.
There are two different things here: first, the AdminAccess.standard() is used to configure a user in the deployed node once it boots, and later the LoginCredentials object passed to the run script method is used to authenticate against the user that has been created with the previous statement.
The issue here is that AdminAccess.standard() reads the "current user" information and assumes a Unix system. That user information is provided by this Default class, and in your case I'm pretty sure it will fall back to the catch block and return an auto-generated SSH key pair. That means AdminAccess.standard() is creating a user in the node with an auto-generated (random) SSH key, but the LoginCredentials you are building don't match those keys, hence the authentication failure.
Since the AdminAccess entity is immutable, the better and cleaner approach to fix this is to create your own implementation of the AdminAccessConfiguration interface. You can just copy the entire Default class and change the Unix specific bits to accommodate the SSH setup in your Windows machine. Once you have the implementation class, you can inject it by creating a Guice module and passing it to the list of modules provided when creating the jclouds context. Something like:
// Create the custom module to inject your implementation
Module windowsAdminAccess = new AbstractModule() {
    @Override protected void configure() {
        bind(AdminAccessConfiguration.class).to(YourCustomWindowsImpl.class).in(Scopes.SINGLETON);
    }
};
// Provide the module in the module list when creating the context
ComputeServiceContext context = ContextBuilder.newBuilder("aws-ec2")
    .credentials("api-key", "api-secret")
    .modules(ImmutableSet.<Module> of(windowsAdminAccess, new SshjSshClientModule()))
    .buildView(ComputeServiceContext.class);

Applying SSIS Package Configuration to multiple packages

I have about 85 SSIS packages that are using the same connection manager.
I understand that each package has its own connection manager.
I am trying to decide what would be the best configurations approach to simply set the connectionstring of the connection manager based on the server the packages are residing on.
I have visited all kinds of suggestions online, but cannot find anywhere the practice where I can simply copy the configuration from one package to the rest of the packages.
There are obviously many approaches such as XML file, SQL Server, Environment Variable, etc.
All the articles out there point to using an indirect method with the XML or SQL approach. Why would using an environment variable just to hold a connection string be such a bad approach?
Any suggestions are highly appreciated.
Thanks!
Why would using an environment variable just to hold a connection string be such a bad approach?
I find the environment variable or registry key configuration approach to be severely limited by the fact that it can only configure one item at a time. For a connection string, you'd need to define an environment variable for each catalog on a given server. Maybe it's only 2 or 3 and that's manageable. We had a good 30+ per database instance and we had multi-instanced machines so you can see how quickly this problem explodes into a maintenance nightmare. Contrast that with a table or xml based approach which can hold multiple configuration items for a given configuration key.
...best configurations approach to simply set the connectionstring of the connection manager based on the server the packages are residing on.
If you go this route, I'd propose creating a variable, ConnectionString and using it to configure the property. It's an extra step but again I find it's easier to debug a complex expression on a variable versus a complex expression on a property. With a variable, you can always pop a breakpoint on the package and look at the locals window to see the current value.
After creating a variable named ConnectionString, I right-click on it, select Properties, set EvaluateAsExpression to True, and set the Expression property to something like "Data Source="+ @[System::MachineName] +"\\DEV2012;Initial Catalog=FOO;Provider=SQLNCLI11.1;Integrated Security=SSPI;"
When that is evaluated, it'd fill in the current machine's name (DEVSQLA) and I'd have a valid OLE DB connection string that connects to a named instance DEV2012.
Data Source=DEVSQLA\DEV2012;Initial Catalog=FOO;Provider=SQLNCLI11.1;Integrated Security=SSPI;
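To make the evaluation concrete, here is a minimal Python sketch that mimics what the SSIS expression does with the machine name. It is only an illustration outside SSIS: `build_connection_string` is a hypothetical helper, and the instance and catalog defaults are the example values from this answer.

```python
def build_connection_string(machine_name, instance="DEV2012", catalog="FOO"):
    """Mimic the SSIS expression: machine name + named instance + fixed options."""
    template = (
        "Data Source={0}\\{1};Initial Catalog={2};"
        "Provider=SQLNCLI11.1;Integrated Security=SSPI;"
    )
    return template.format(machine_name, instance, catalog)


# On a server named DEVSQLA this yields the string shown above.
print(build_connection_string("DEVSQLA"))
```

The same expression on a different server produces a connection string for that server, which is why no per-server configuration entry is needed for this one property.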
If you have more complex configuration needs than just the one variable, then I could see you using this to configure a connection manager to a sql table that holds the full repository of all the configuration keys and values.
...cannot find anywhere the practice where I can simply copy the configuration from one package to the rest of the packages
I'd go about modifying all 80-something packages through a programmatic route. We received a passel of packages from a third party and they had not followed our procedures for configuration and logging. The code wasn't terribly hard and if you describe exactly the types of changes you'd make to solve your need, I'd be happy to toss some code onto this answer. It could be as simple as the following. Calling the function modifies a package by adding a SQL Server configuration that uses the SSISDB OLE DB connection manager, a table called dbo.sysdtsconfig, and a filter named Default.2008.Sales.
using Microsoft.SqlServer.Dts.Runtime;

public static void CleanUpPackages(string currentPackage)
{
    // Load the package from disk
    Application app = new Application();
    Package p = app.LoadPackage(currentPackage, null);

    // Apply configuration Default.2008.Sales
    // ConfigurationString => "SSISDB";"[dbo].[sysdtsconfig]";"Default.2008.Sales"
    // Name => SalesConfiguration
    Configuration c = p.Configurations.Add();
    c.Name = "SalesConfiguration";
    c.ConfigurationType = DTSConfigurationType.SqlServer;
    c.ConfigurationString = @"""SSISDB"";""[dbo].[sysdtsconfig]"";""Default.2008.Sales""";

    // Save the modified package back to the same file
    app.SaveToXml(currentPackage, p, null);
}

// Example call
string currentPackage = @"C:\Src\Package1.dtsx";
CleanUpPackages(currentPackage);
Adding a variable in to the packages would not take much more code. Inside the cleanup proc, add code like this to add a new variable into your package that has an expression like the above.
string variableName = string.Empty;
bool readOnly = false;
string nameSpace = "User";
string variableValue = string.Empty;
string literalExpression = string.Empty;
variableName = "ConnectionString";
literalExpression = @"""Data Source=""+ @[System::MachineName] +""\\DEV2012;Initial Catalog=FOO;Provider=SQLNCLI11.1;Integrated Security=SSPI;""";
p.Variables.Add(variableName, readOnly, nameSpace, variableValue);
p.Variables[variableName].EvaluateAsExpression = true;
p.Variables[variableName].Expression = literalExpression;
Let me know if I missed anything or you'd like clarification on any points.

Need information about JPA based transaction for dynamic SQL table

Firstly, I would like to state our environment details.
We are trying to use EJB and Hibernate with SQL Azure to create apps on the Azure cloud using Eclipse.
We needed to create and transact on databases dynamically. We are able to create databases dynamically. However, on trying to transact on these we are getting an error:
"java.sql.SQLException: No suitable driver found for connection url"
When we transacted statically using JPA, there was no problem. However, dynamic transactions fail: the EntityManager object is created but cannot connect to the database.
Could someone help us and explain how we can handle transactions using JPA for dynamically created databases.
Thanks,
Saugata
[edit] We are using the following persistence.xml:
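<provider>org.hibernate.ejb.HibernatePersistence</provider>
<!-- <jta-data-source>java:jboss/EDS</jta-data-source> -->
<class>net.oauth.database.Co</class>
<class>net.oauth.database.Cr</class>
<property name="hibernate.transaction.factory_class" value="org.hibernate.transaction.JTATransactionFactory" />
<property name="hibernate.transaction.manager_lookup_class" value="org.hibernate.transaction.JBossTransactionManagerLookup" />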
Our code to connect to the db is as follows:
Map configOverrides = new HashMap();
configOverrides.put("hibernate.connection.password", "");
configOverrides.put("hibernate.connection.username", "");
configOverrides.put("hibernate.connection.driver_class","com.microsoft.sqlserver.jdbc.SQLServerDriver");
configOverrides.put("hibernate.connection.url", "jdbc:sqlsever://;" + "databaseName=;user=;password=");
EntityManagerFactory factory = Persistence.createEntityManagerFactory(ENTERPRISE_UNIT_NAME, configOverrides);
Please note that we are trying to create and connect to the database dynamically, so the database is not created statically in advance.
For this we are getting the error:
"java.sql.SQLException: No suitable driver found for connection url"
Create a persistence.xml with a persistence unit and put everything that is static there (e.g. database dialect, logging parameters, etc.).
Then use the following method to create the entity manager:
javax.persistence.Persistence.createEntityManagerFactory(String persistenceUnitName, Map properties);
Supply the variable parameters in the map, like this:
properties.put("hibernate.connection.url", "jdbc:postgresql://127.0.0.1/test");
properties.put("hibernate.connection.username", "joe");
properties.put("hibernate.connection.password", "pass");