How to create a stored procedure in AWS Athena from S3? - amazon-s3

I need to create procedural logic using data stored in AWS S3, from Athena or Glue.
Actually, I am migrating a stored procedure from SQL Server to AWS, but I don't know which AWS service to do it with or where to do it; it doesn't use a database, just S3 tables.
Thank you very much for guiding me on how to do it.

Athena doesn't support stored procedures; however, you can leverage UDFs to implement the same logic as in your source stored procedure.
Below is the syntax for a UDF; refer to the Athena UDF documentation for more information:
USING EXTERNAL FUNCTION UDF_name(variable1 data_type[, variable2 data_type][,...])
RETURNS data_type
LAMBDA 'lambda_function'
SELECT [...] UDF_name(expression) [...]
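For example, a minimal sketch of a query that calls a UDF (the function name, Lambda function name, database, and table below are hypothetical placeholders, not taken from the question):
-- Declare the UDF and the Lambda function that implements it, then use it in the SELECT
USING EXTERNAL FUNCTION lowercase(input_value VARCHAR)
RETURNS VARCHAR
LAMBDA 'my-athena-udf-lambda'
SELECT lowercase(product_name)
FROM mydatabase.mytable
LIMIT 10;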

Related

How to execute a SQL file (with both DDL and DML statements separated by ;) stored in GCS against BigQuery

I have a file on Google Cloud Storage that contains a number of queries (CREATE TABLE, TRUNCATE/DELETE, INSERT, MERGE, SELECT, etc.). I need to execute all statements in sequence, as they appear in the file, against BigQuery. How do I do that?
Currently there is no way to achieve this directly. You might follow this procedure:
1. Separate your file so that the DDL instructions follow the correct syntax, and run them.
2. Create a CSV and import the data into BigQuery, following the load procedure.
If your database is huge, you may want to do the import using the API.
Also, here is the documentation for DML syntax.
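As an illustration of the split, a hedged sketch of what a separated DDL statement and a follow-up DML statement might look like (the dataset and table names are made up for this example):
-- DDL: create the target table first
CREATE TABLE my_dataset.sales (
  id INT64,
  amount NUMERIC
);
-- DML: run after the CSV has been loaded into a staging table
MERGE my_dataset.sales AS t
USING my_dataset.sales_staging AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET amount = s.amount
WHEN NOT MATCHED THEN INSERT (id, amount) VALUES (s.id, s.amount);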

Unioning tables from EC2 with AWS Glue

I have two MySQL databases, each on their own EC2 instance. Each database has a table 'report' under a schema 'product'. I use a crawler to get the table schemas into the AWS Glue Data Catalog in a database called db1. Then I'm using AWS Glue to copy the tables from the EC2 instances into an S3 bucket, and I'm querying the tables with Redshift. I get the external schema into Redshift from the AWS crawler using the script below in the query editor.
I would like to union the two tables together into one table and add a column 'source' with a flag to indicate the original table each record came from. Does anyone know if it's possible to do that with AWS Glue during the ETL process? Or can you suggest another solution? I know I could just union them with SQL in Redshift, but my end goal is to create an ETL pipeline that does that before the data gets to Redshift.
script:
create external schema schema1 from data catalog
database 'db1'
iam_role 'arn:aws:iam::228276743211:role/madeup'
region 'us-west-2';
You can create a view that unions the two tables using Athena; that view will then be available in Redshift Spectrum.
CREATE OR REPLACE VIEW db1.combined_view AS
SELECT col1, col2, col3 FROM db1.mysql_table_1
UNION ALL
SELECT col1, col2, col3 FROM db1.mysql_table_2
;
Run the above using Athena (not Redshift).
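To also add the 'source' flag column the question asks for, a variant of the same view (assuming the same column and table names as the snippet above) could look like:
CREATE OR REPLACE VIEW db1.combined_view AS
SELECT col1, col2, col3, 'mysql_table_1' AS source FROM db1.mysql_table_1
UNION ALL
SELECT col1, col2, col3, 'mysql_table_2' AS source FROM db1.mysql_table_2
;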

Azure SQL External table of Azure Table storage data

Is it possible to create an external table in Azure SQL of the data residing in Azure Table storage?
The answer is no.
I am currently facing a similar issue, and this is my research so far:
Azure SQL Database doesn't allow Azure Table storage as an external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are copying from Hadoop (Data Lake/Hive, ...), Blob storage (text files, CSV), or an RDBMS (another SQL Server). Azure Table storage is not listed.
The possible external data formats are only variations of text files/Hadoop formats: Delimited Text, Hive RCFile, Hive ORC, Parquet.
Note - even copying from Blob storage in JSON format requires implementing a custom data format.
Workaround:
1. Create a copy pipeline with Azure Data Factory.
2. Create a copy function/script with Azure Functions using C# and manually transfer the data.
Yes, there are a couple of options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
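For the Elastic Database query flavor (which, as noted above, still cannot point at Azure Table storage), a minimal sketch looks roughly like the following; the server, database, credential, table, and column names are all placeholders:
-- Master key and credential used to connect to the remote Azure SQL database
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE DATABASE SCOPED CREDENTIAL RemoteCredential
WITH IDENTITY = 'remote_user', SECRET = '<remote password>';
-- External data source of type RDBMS points at another Azure SQL database
CREATE EXTERNAL DATA SOURCE RemoteDb
WITH (
    TYPE = RDBMS,
    LOCATION = 'remoteserver.database.windows.net',
    DATABASE_NAME = 'RemoteDatabase',
    CREDENTIAL = RemoteCredential
);
-- External table mirrors the schema of the remote table
CREATE EXTERNAL TABLE dbo.RemoteOrders (
    OrderId INT NOT NULL,
    CustomerId INT NOT NULL
)
WITH (
    DATA_SOURCE = RemoteDb
);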
What is your use case?

Redshift UDF boto sql query and S3

Is there a way to use a UDF in Redshift, execute a SQL query, and upload the result to AWS S3? Would really appreciate it if someone knows how to do this.
Thanks
To create a UDF in Redshift, you can use Python. You can then call the function in a SQL SELECT statement. To output the results of a query to a file in S3, you can use the UNLOAD statement.
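A minimal sketch of both pieces (the function, table, column, bucket, and IAM role names below are made-up placeholders):
-- Scalar Python UDF, callable from any SELECT
CREATE OR REPLACE FUNCTION f_normalize_price (price FLOAT, fx_rate FLOAT)
RETURNS FLOAT
STABLE
AS $$
    return price * fx_rate if price is not None else None
$$ LANGUAGE plpythonu;
-- Export the result of a query that uses the UDF to S3
UNLOAD ('SELECT order_id, f_normalize_price(price, fx_rate) FROM sales')
TO 's3://my-bucket/exports/sales_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-unload-role'
FORMAT AS CSV;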

Converting SQL stored procedures into Hadoop UDFs

I am new to Hadoop and I have been given the task of coming up with plans and ideas on how to convert the existing DB2 stored procedures into Hadoop UDFs. Since Hive doesn't support stored procedures, I am stuck.
I did a POC for a simple SELECT statement and it worked:
1. The POC takes a given SQL file as input and first converts it into an intermediate JSON file.
2. It reads the JSON file - if any built-in function is found, it gets the equivalent Hive function from a DB2-to-Hive functions lookup file.
3. It creates a Hive SQL file from the intermediate JSON file.
Currently, it works for a simple SELECT with a WHERE clause, but I am unable to replicate it for procedures.
Please suggest an approach and/or examples for how to proceed.
Thanks in advance.