Converting SQL stored procedures into Hadoop UDFs - sql

I am new to Hadoop and have been given the task of coming up with plans and ideas for converting our existing DB2 stored procedures into Hadoop UDFs. Since Hive doesn't support stored procedures, I am stuck.
I did a POC for a simple SELECT statement and it worked:
1. The POC takes a given SQL file as input and first converts it into an intermediate JSON file.
2. It reads the JSON file; if any built-in function is found, it gets the equivalent Hive function from a DB2-to-Hive functions lookup file.
3. It creates a Hive SQL file from the intermediate JSON file.
Currently it works for a simple SELECT with a WHERE clause; a sketch of the kind of rewrite it produces is shown below.
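For illustration only (the table, the columns, and the particular function mappings here are made up; the real DB2-to-Hive lookup file drives the actual mapping), a DB2 statement like this:

SELECT EMP_ID, UCASE(EMP_NAME), VARCHAR_FORMAT(HIRE_DATE, 'YYYY-MM-DD')
FROM EMP
WHERE DEPT_ID = 10;

comes out the other end as the Hive equivalent:

SELECT emp_id, upper(emp_name), date_format(hire_date, 'yyyy-MM-dd')
FROM emp
WHERE dept_id = 10;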
But I am unable to replicate this for procedures.
Please suggest an approach and/or examples for how to proceed.
Thanks in advance.

Related

How to execute a SQL file (with both DDL and DML statements separated by ;) stored in GCS against BigQuery

I have a file on Google Cloud Storage that contains a number of queries (CREATE TABLE, TRUNCATE/DELETE TABLE, INSERT, MERGE, SELECT, etc.). I need to execute all the statements against BigQuery in the sequence in which they appear in the file. How do I do that?
Currently there is no way to achieve this directly. You might follow this procedure:
1. Split your file so that the DDL statements follow the correct syntax, and run them.
2. Create a CSV and import the data into BigQuery, following the documented load procedure.
If your database is huge, you may want to do the import using the API.
Also see the documentation for DML syntax.
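For illustration only (the dataset and table names are made up), the split-out statements would then be run individually, DDL first and DML once the objects exist:

CREATE TABLE mydataset.mytable (
  id INT64,
  name STRING
);

INSERT INTO mydataset.mytable (id, name)
VALUES (1, 'a'), (2, 'b');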

ADF Copy into SQL table without creating source file

I have a scenario where I need to copy the output of a Get Metadata activity into a SQL table. Can I do this directly, without using a Databricks notebook?
You can make use of the Lookup activity:
GetMetadata -> Lookup
Then write an INSERT SQL statement in the Lookup's query, or use a stored procedure.
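For example, assuming a target table dbo.FileMetadata and a Get Metadata activity named 'Get Metadata1' with Item name and Last modified selected in its field list (all of these names are placeholders), the Lookup query could look like this; the trailing SELECT is only there to give the Lookup a row to return:

INSERT INTO dbo.FileMetadata (item_name, last_modified)
VALUES ('@{activity('Get Metadata1').output.itemName}',
        '@{activity('Get Metadata1').output.lastModified}');
SELECT 1 AS rows_inserted;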

What is the easiest way to query a CSV file in Oracle SQL Developer?

I have a fairly simple CSV file that I would like to use within a SQL query. I'm using Oracle SQL Developer, but none of the solutions I have found on the web so far seem to have worked. I don't need to store the data (unless I can use temp tables?), just query it and show the results.
Thank you!
You need to create an EXTERNAL TABLE. This essentially maps a CSV (or indeed any flat file) to a table. You can then use that table in queries. You will not be able to perform DML on the external table.
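A rough sketch (the directory path, file name, and columns are placeholders, and the CSV has to sit on the database server where the directory object points, not on your local machine):

-- one-time setup: a directory object pointing at the folder holding the CSV
CREATE DIRECTORY csv_dir AS '/data/csv';

-- map the CSV to a read-only table; SKIP 1 skips the header row
CREATE TABLE csv_ext (
  id   NUMBER,
  name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY csv_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  )
  LOCATION ('my_file.csv')
)
REJECT LIMIT UNLIMITED;

-- then query it like any other table (SELECT only, no DML)
SELECT * FROM csv_ext WHERE name LIKE 'A%';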

Copy blob data into an on-premise SQL table

My problem statement is that I have a CSV blob and I need to import that blob into a SQL table. Is there a utility to do that?
I was thinking of one approach: first copy the blob to the on-premise SQL Server using the AzCopy utility, and then import that file into the SQL table using the bcp utility. Is this the right approach? I am also looking for a 1-step solution to copy the blob into the SQL table.
Regarding your question about the availability of a utility which will import data from blob storage to a SQL Server, AFAIK there's none. You would need to write one.
Your approach seems OK to me, though you may want to write a batch file or something similar to automate the whole process. In this batch file, you would first download the file to your computer and then run the BCP utility to import the CSV into SQL Server. Other alternatives to writing a batch file are:
Do the whole thing in PowerShell.
Write some C# code which uses the storage client library to download the blob and, once the blob is downloaded, start the BCP process from your code.
To pull a blob file into an Azure SQL Server, you can use this example syntax (this actually works, I use it):
BULK INSERT MyTable
FROM 'container/folder/folder/file'
WITH (DATA_SOURCE = 'ds_blob', BATCHSIZE = 10000, FIRSTROW = 2);
MyTable has to have identical columns (or it can be a view against a table that yields identical columns).
In this example, ds_blob is an external data source which needs to be created beforehand (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql)
The external data source needs to use a database scoped credential, which uses a SAS key that you need to generate beforehand from Blob Storage (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql).
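A rough sketch of those two prerequisites (the credential name, storage account URL, and SAS token are placeholders; ds_blob matches the name used in the BULK INSERT above):

-- needed once per database before a scoped credential can be created
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- the SAS token goes in without the leading '?'
CREATE DATABASE SCOPED CREDENTIAL cred_blob
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS token>';

CREATE EXTERNAL DATA SOURCE ds_blob
WITH (
  TYPE = BLOB_STORAGE,
  LOCATION = 'https://mystorageaccount.blob.core.windows.net',
  CREDENTIAL = cred_blob
);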
The only downside to this method is that you have to know the filename beforehand - there's no way to enumerate the blobs from inside SQL Server.
I get around this by running PowerShell inside Azure Automation that enumerates the blobs and writes them into a queue table beforehand.

Querying (SQL) Oracle/Toad dump files without importing them

We're doing a monthly data dump of our databases using Toad for Oracle's Export function. We've got some SQL queries to create statistics about the data. I'd like to compare the results of the current state with the last few dumps.
I can open the files with the Export File Browser in Toad (v11) and sort/filter the data using the GUI, but that's not powerful enough. Is there a way to query the dump files with SQL without having to take extra steps like creating a new schema and importing it?
By far the best way would be to reimport the data.