SQL Server Serverless - OPENROWSET - metadata functions?

I am using OPENROWSET with OPENJSON to query files in a data lake using a Synapse SQL Serverless database.
I have found the filename() and filepath() file metadata functions.
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-specific-files#filepath
Are there more functions or is that it?
I would really like to be able to get the file modified timestamp.

File metadata such as the modified timestamp is not exposed through serverless SQL pool functions.
You can use a Synapse Spark notebook to query the blob metadata instead. You can find more information in the BlobProperties class documentation.

Related

Azure Synapse Serverless - SQL query to return rows in directory for each file

I have an Azure Data Lake Gen2 container in which I have several JSON files. I would like to write a query that returns a record for each file. I am not interested in parsing the files; I just want to know what files are there and have this returned in a view. Does anyone have any tips on how I might do this? Everything I have found is about how to parse/read the files. I am going to let Power BI do that since the JSON format is not standard. In this case I just need a listing of files. Thanks!
You can use the filepath() and filename() function in Azure Synapse Analytics serverless SQL pools to return those. You can even GROUP BY them to return aggregated results. A simple example:
SELECT
    [result].filepath() AS filepath,
    [result].filename() AS filename,
    COUNT(*) AS records
FROM OPENROWSET(
    BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=2019/puMonth=4/*.parquet',
    FORMAT = 'PARQUET'
) AS [result]
GROUP BY [result].filepath(), [result].filename()
See the documentation for further examples.
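Since the question asks for the listing to come back as a view, the same pattern can be wrapped in CREATE VIEW. A minimal sketch, assuming a user database in the serverless pool (views cannot be created in master), a hypothetical storage path, and that access to the container is already set up (private storage would also need a database scoped credential). The CSV format with unlikely terminators reads the raw file content without trying to parse the JSON, which is enough for a per-file listing:
CREATE VIEW dbo.JsonFileListing
AS
SELECT
    [result].filepath() AS filepath,
    [result].filename() AS filename
FROM OPENROWSET(
    -- hypothetical path; replace with your container and folder
    BULK 'https://yourlake.dfs.core.windows.net/yourcontainer/yourfolder/*.json',
    -- CSV format with dummy terminators returns the file content as raw text,
    -- so the JSON never has to be parsed here
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b'
) WITH (doc NVARCHAR(MAX)) AS [result]
GROUP BY [result].filepath(), [result].filename();
The GROUP BY collapses the result to one row per file, which is all the listing needs.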

Is it possible to export a single table from Azure SQL Database and save it directly into Azure Blob Storage without downloading it to local disk?

I've tried to use SSMS, but it requires a temporary local location for the BACPAC file. I don't want to download it locally; I would like to export a single table directly to Azure Blob Storage.
In my experience, we can import table data from a CSV file stored in Blob Storage, but I didn't find a way to export table data to Blob Storage as a CSV file directly.
You could think about using Data Factory.
It can achieve that; please reference the tutorials below:
Copy and transform data in Azure SQL Database by using Azure Data Factory
Copy and transform data in Azure Blob storage by using Azure Data Factory
Use Azure SQL Database as the source and choose the table as the source dataset.
Use Blob Storage as the sink and choose DelimitedText as the sink format.
Run the pipeline and you will get the CSV file in Blob Storage.
Also, thanks for the tutorials @Hemant Halwai provided for us.

Presto query engine with Azure Data Lake

I have a requirement to deploy a Presto server which can help me query data stored in ADLS in Avro file format.
I have gone through this tutorial and it seems that Hive is used as a catalogue/connector in Presto to query ADLS. Can I bypass Hive and have any connector to extract data from ADLS?
Can I bypass Hive and have any connector to extract data from ADLS?
No.
Hive plays two roles here:
- Storage for metadata. It contains information like:
  - schema and table name
  - columns
  - data format
  - data location
- Execution:
  - it is capable of reading data from distributed file systems (like HDFS, S3, ADLS)
  - it tells how execution can be distributed.
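For illustration, a minimal sketch of what that metadata looks like when registered through Presto's Hive connector over ADLS. The hive catalog name, lake schema, column list, and abfs path below are hypothetical, and the connector/filesystem configuration is assumed to already be in place:
-- Register an Avro table over files already sitting in ADLS.
-- The Hive metastore stores the schema, format, and location;
-- Presto itself reads the files and executes the query.
CREATE TABLE hive.lake.events (
    event_id   varchar,
    event_time timestamp,
    payload    varchar
)
WITH (
    format = 'AVRO',
    external_location = 'abfs://container@account.dfs.core.windows.net/events/'
);

SELECT event_id, count(*) AS event_count
FROM hive.lake.events
GROUP BY event_id;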

I have a Lambda function fetching a CSV file from S3; how can I run a SQL query on that data in Node.js?

I have a Lambda function in which I am fetching a CSV file from S3. Now I want to run a SQL query on that CSV, or on JSON (after converting the CSV into JSON). Which is the best and easiest approach for this in Node.js? Since I want to use a GROUP BY query, S3 Select is not possible.
I found the "querycsv" module in Python, so I changed the code environment to Python. https://pythonhosted.org/querycsv/
Take a look at AWS Athena, which lets you run more complex queries (including GROUP BY) on the files in S3.
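As a sketch of what that looks like (the table, columns, and s3://my-bucket/orders/ path are hypothetical), Athena lets you declare an external table over the CSV files in S3 and then query it with standard SQL, including GROUP BY, without moving the data:
-- External table over CSV files in S3 (schema and location are placeholders)
CREATE EXTERNAL TABLE orders (
    order_id  string,
    customer  string,
    amount    double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/orders/'
TBLPROPERTIES ('skip.header.line.count' = '1');

-- GROUP BY works as in any SQL engine
SELECT customer, SUM(amount) AS total_amount
FROM orders
GROUP BY customer;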

Copy blob data into an on-premise SQL table

My problem statement is that I have a CSV blob and I need to import that blob into a SQL table. Is there a utility to do that?
I was thinking of one approach: first copy the blob to the on-premise SQL Server machine using the AzCopy utility, and then import that file into the SQL table using the bcp utility. Is this the right approach? I am also looking for a 1-step solution to copy the blob into a SQL table.
Regarding your question about the availability of a utility which will import data from blob storage into a SQL Server, AFAIK there is none. You would need to write one.
Your approach seems OK to me, though you may want to write a batch file or something like that to automate the whole process. In this batch file, you would first download the file to your computer and then run the BCP utility to import the CSV into SQL Server. Other alternatives to writing a batch file are:
Do the whole thing in PowerShell.
Write some C# code which uses the storage client library to download the blob and, once the blob is downloaded, start the BCP process in your code.
To pull a blob file into an Azure SQL Server, you can use this example syntax (this actually works, I use it):
BULK INSERT MyTable
FROM 'container/folder/folder/file'
WITH (DATA_SOURCE = 'ds_blob', BATCHSIZE = 10000, FIRSTROW = 2);
MyTable has to have identical columns (or it can be a view against a table that yields identical columns)
In this example, ds_blob is an external data source which needs to be created beforehand (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql).
The external data source needs to use a database scoped credential, which uses a SAS token that you need to generate beforehand from blob storage (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql).
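A minimal sketch of that setup, with a hypothetical storage account and a SAS token you have already generated; the BULK INSERT above then references the data source by name and uses a path relative to its LOCATION:
-- A master key must exist before a database scoped credential can be created
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Credential holding the SAS token (omit the leading '?' from the token)
CREATE DATABASE SCOPED CREDENTIAL cred_blob
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token>';

-- External data source; BULK INSERT paths such as 'container/folder/folder/file'
-- are resolved relative to this location
CREATE EXTERNAL DATA SOURCE ds_blob
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://yourstorageaccount.blob.core.windows.net',
    CREDENTIAL = cred_blob
);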
The only downside to this method is that you have to know the filename beforehand - there's no way to enumerate the blobs from inside SQL Server.
I get around this by running PowerShell inside Azure Automation that enumerates the blobs and writes them into a queue table beforehand.
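For completeness, a sketch of the consuming side under that approach, assuming a hypothetical dbo.BlobQueue table that the Automation runbook fills with relative blob paths; dynamic SQL then runs the BULK INSERT for each pending file:
-- Hypothetical queue table populated by the Azure Automation runbook
CREATE TABLE dbo.BlobQueue (
    BlobPath  nvarchar(400) NOT NULL PRIMARY KEY,
    Processed bit NOT NULL DEFAULT 0
);

-- Import every unprocessed file listed in the queue
DECLARE @path nvarchar(400), @sql nvarchar(max);

DECLARE file_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT BlobPath FROM dbo.BlobQueue WHERE Processed = 0;

OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @path;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build and run the BULK INSERT for this file, escaping any quotes in the path
    SET @sql = N'BULK INSERT MyTable FROM ''' + REPLACE(@path, '''', '''''') +
               N''' WITH (DATA_SOURCE = ''ds_blob'', BATCHSIZE = 10000, FIRSTROW = 2);';
    EXEC sys.sp_executesql @sql;

    UPDATE dbo.BlobQueue SET Processed = 1 WHERE BlobPath = @path;

    FETCH NEXT FROM file_cursor INTO @path;
END

CLOSE file_cursor;
DEALLOCATE file_cursor;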