Execute SQL Procedures in Hive - hive

Hi,
Below is the scenario we have; please suggest possible solutions.
We have an existing reporting solution (SAP) that executes a procedure in an RDBMS (over a JDBC connection) to generate reports based on user input.
Now we are planning to move from the RDBMS to Hive as our data source.
Is there a way to connect to Hive and execute a procedure (HPL/SQL, or a UDF doing the equivalent job of the Oracle procedure) using a JDBC connection?
Or is there any alternative way to run a procedure or a program in Hive or Spark using JDBC?
Thanks

It is possible to connect to Hive using JDBC:
https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-JDBC
However, if your cluster is secured, you will need to connect to HiveServer2:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC
In addition, stored procedures (HPL/SQL) are supported in Hive starting from version 2.0.0:
https://issues.apache.org/jira/browse/HIVE-11055
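As a rough sketch (not a definitive implementation), connecting over JDBC and running the Hive query or UDF call that replaces the Oracle procedure could look like the following; the host, database, user, UDF and table names are placeholder assumptions, and it presumes the hive-jdbc driver is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcReport {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver must be on the classpath (hive-jdbc plus dependencies).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // For a Kerberos-secured HiveServer2 the URL also needs the principal,
        // e.g. jdbc:hive2://host:10000/default;principal=hive/_HOST@EXAMPLE.COM
        String url = "jdbc:hive2://hiveserver2-host:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "report_user", "");
             Statement stmt = conn.createStatement();
             // Placeholder query: a SELECT invoking the UDF that carries the
             // logic of the old Oracle procedure.
             ResultSet rs = stmt.executeQuery(
                     "SELECT report_udf(col1, col2) FROM report_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}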

Related

How to run a T-SQL query daily on an Azure database

I am trying to migrate a database from SQL Server into Azure. This database has two rather simple T-SQL scripts that insert data. Since SQL Server Agent does not exist on Azure, I am trying to find an alternative.
I see the Automation offering, but it seems really complex for something as simple as running SQL scripts. Is there any better, or at least easier, way to do this?
I was under the impression that there was a scheduler for that, but I can't find it.
Thanks
There are several ways to run a scheduled task/job against an Azure SQL database for your use case:
If you are comfortable using the existing on-premises SQL Server Agent, you can connect to your Azure SQL DB (using linked servers) and execute jobs the same way as on an on-premises SQL Server.
Use an Automation Account/runbooks to create SQL jobs. If you look in the marketplace you can find several examples for Azure SQL DB (backup, restore, indexing jobs, etc.). I guess you have already tried this and it does not seem a feasible solution to you.
Another less well-known way is to use WebJobs (under an App Service web app) to schedule tasks (you can use PowerShell scripts here). The disadvantage is that you cannot change anything once you create a WebJob.
As #jayendran suggested, Azure Functions is definitely an option for this use case.
If none of these lets you work with the SQL directly, there is also the "Scheduler Job Collection" available in Azure to schedule invocation of HTTP endpoints, and the SQL operation could be abstracted/implemented behind that endpoint. This is only useful for lighter SQL operations; if the operation takes longer, chances are it will time out.
You can use Azure Functions to run the T-SQL queries on a schedule using a Timer Trigger.
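As a rough sketch of that approach, and assuming a Java function app with the SQL Server JDBC driver on its classpath, a timer-triggered function could run the script daily; the app-setting name (AZURE_SQL_JDBC_URL), the CRON expression and the stored procedure name below are placeholder assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.FunctionName;
import com.microsoft.azure.functions.annotation.TimerTrigger;

public class DailyTSqlJob {
    // NCRONTAB schedule (sec min hour day month day-of-week): every day at 02:00.
    @FunctionName("DailyTSqlJob")
    public void run(
            @TimerTrigger(name = "timer", schedule = "0 0 2 * * *") String timerInfo,
            final ExecutionContext context) throws Exception {
        // JDBC connection string kept in an app setting; the setting name is a
        // placeholder (value like jdbc:sqlserver://<server>.database.windows.net:1433;...).
        String url = System.getenv("AZURE_SQL_JDBC_URL");
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Placeholder for the insert script / stored procedure to run daily.
            stmt.execute("EXEC dbo.InsertDailyData");
            context.getLogger().info("Daily T-SQL script executed.");
        }
    }
}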
You can use Microsoft Flow (https://flow.microsoft.com) to create a scheduled flow with a SQL Server connector. In the connector you set the Azure SQL server, database name, username and password.
There are many options in the SQL Server connector, but the ones you can use to run a T-SQL query daily are these:
Execute a SQL query
Execute stored procedure
You can also edit your connection info in the Data --> Connections menu.

Is there a way to extract from Redshift via SSIS using variables

I am currently migrating a SQL Server DB to Redshift and have a downstream ETL process which uses variable SQL scripts in SSIS to extract data from various schemas and load it back to a SQL Server.
I am currently trying to refactor the SSIS component to also use variables, but this does not seem to be an option for ODBC connections.
Does anyone know if there is a way I can achieve this?

Stored procedures in Hive

In my use case, I am trying to migrate a SQL-based traditional data warehousing application onto big data infrastructure. I chose Hive and I think it is doing well. However, what I could not find is stored procedures. Are they available in Hive? I am using Apache Hue to write and execute Hive queries.
No, stored procedures are not yet available. However, there are plans to make them available in the future.
Please refer to HPL/SQL; I am looking for the same solution but have not tried it yet.

Spark SQL - SQL scripts processing

I'm new to Spark and would like to know if there is any way to pass Spark a SQL script for processing.
My goal is to bring data from both MySQL (through JDBC) and Cassandra into Spark and pass it a SQL script file with no, or minimal, modifications. The reason I'm saying minimal modifications is that I have a lot of SQL scripts (similar in structure to stored procedures) which I don't want to convert manually to RDDs.
The main purpose is to process the data (execute these SQL scripts) through Spark, thus taking advantage of its capabilities and speed.
This guy found a pretty general way to run SQL scripts; just pass in the connection to your database:
https://github.com/syncany/syncany/blob/15dc5344696a800061e8b363f94986e821a0b362/syncany-lib/src/main/java/org/syncany/util/SqlRunner.java
One limitation is that each of the statements in your SQL script has to be delimited with a semicolon. It basically just parses the script like a text document and executes each statement as it goes. You could probably modify it to take advantage of Spark's SQLContext instead of using a Connection.
In terms of performance, it probably won't be as fast as a stored procedure because you're bottlenecked by the InputStream, but it is a workaround.
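A minimal sketch of that adaptation, assuming the MySQL and Cassandra tables have already been registered as temporary views in the SparkSession, and using a placeholder script path; it splits on semicolons just like the SqlRunner example and hands each statement to Spark SQL:

import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlScriptRunner {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("SqlScriptRunner")
                .getOrCreate();

        // Assumes the MySQL (spark.read().format("jdbc")) and Cassandra sources
        // have already been loaded and registered as temporary views, so the
        // table names in the script resolve inside Spark SQL.
        String script = new String(
                Files.readAllBytes(Paths.get("/path/to/report.sql")));

        // Same convention as the SqlRunner example: statements are delimited
        // by semicolons and executed one at a time, but through Spark SQL
        // instead of a JDBC Connection.
        for (String statement : script.split(";")) {
            String sql = statement.trim();
            if (!sql.isEmpty()) {
                Dataset<Row> result = spark.sql(sql);
                result.show(); // or write the result out as needed
            }
        }
        spark.stop();
    }
}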

Is it possible in Oracle SQLDeveloper to prefetch certain tables’ metadata and keep it cached locally?

I am working on a remote database which has several master tables. The metadata and the actual data in these tables change rarely.
When querying the DB involving these tables and using certain features (e.g. Ctrl+Space to auto-complete a table/column name), it takes too long to query the remote DB to fetch this data since it's not cached locally.
Is there any extension/plug-in/configuration in SQL Developer to do this?
(Oracle SQLDeveloper Version 1.5.1 Build MAIN-5440)
Try updating to version 2.1.
Use a tool like SQuirreL to build your queries and then copy them into SQLDeveloper. SQuirreL caches the metadata.
You could create a view on the local DB; that would keep the metadata local.