Validate a file name in an S3 bucket through SAP Data Intelligence 3.1 - sap-data-services

I need to validate the file name format of a .csv file that is available in an S3 bucket through SAP DI.
What is the operator used in SAP DI to perform this task?
Can I use the 'Monitor Files' operator to achieve this?

Related

How can I export data from an Azure storage table to a .csv file in .NET Core C#?

Is there an Azure API to import/export an existing collection from Azure Table Storage as .csv?
The Table Storage REST API does not provide a CSV response directly, so it is always necessary to transform the data accordingly, as Azure Storage Explorer does, for example, using an older version of AzCopy (v7.3).
I've built a little C# library that basically does the same. It currently caches all rows in memory to build the CSV headers, so that's something to be aware of.
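For illustration, here is a minimal Python sketch of the same idea using the azure-data-tables package and the csv module (the connection-string variable and table name are placeholders, and it has the same keep-everything-in-memory caveat):

import csv
import os

from azure.data.tables import TableServiceClient

# Placeholder connection string and table name.
conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
service = TableServiceClient.from_connection_string(conn_str)
table_client = service.get_table_client("MyTable")

# Pull every entity into memory so the header can cover all property names.
rows = [dict(entity) for entity in table_client.list_entities()]
fieldnames = sorted({key for row in rows for key in row})

with open("mytable.csv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)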

How to copy IoT Hub stored blobs to an Azure SQL using Data Factory

We are using the IoT Hub routing feature to store messages into an Azure Blob container. By default it stores the messages in a hierarchical manner - creating a folder structure for year, month, day and so on. Within the folder for each day, it creates multiple block blob binary files. Each file may contain multiple JSON objects, each representing a unique IoT telemetry message.
How can I use Azure Data Factory to copy each of these messages into an Azure SQL database?
Screenshot from Azure Storage Explorer
A sample blob file containing multiple messages
It seems that all the files have the same JSON schema, so you can follow my steps.
I created a folder named csv in my container with several files containing JSON data:
Source dataset: the data in the files is JSON, so I chose the JSON format.
Choose the container: test
Import the schema (.json).
Source settings: use a wildcard file path to pick up all folders and files in the container.
Sink settings:
Mapping:
Run the pipeline and check the result in the sink table:
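For readers who prefer code over the Copy activity UI, the same flattening can be sketched in Python with the azure-storage-blob and pyodbc packages (the container name, target table, and one-JSON-object-per-line assumption are mine, not from the question):

import json
import os

import pyodbc
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
container = blob_service.get_container_client("iot-telemetry")  # placeholder container

sql_conn = pyodbc.connect(os.environ["AZURE_SQL_CONNECTION_STRING"])
cursor = sql_conn.cursor()

# list_blobs walks the year/month/day folder hierarchy created by IoT Hub routing.
for blob in container.list_blobs():
    body = container.download_blob(blob.name).readall().decode("utf-8")
    for line in body.splitlines():
        if not line.strip():
            continue
        message = json.loads(line)  # assumes one JSON telemetry message per line
        cursor.execute(
            "INSERT INTO dbo.Telemetry (Body) VALUES (?)",  # placeholder table/column
            (json.dumps(message),),
        )

sql_conn.commit()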

SSIS: sending source OLE DB data to S3 buckets in Parquet file format

My source is SQL Server and I am using SSIS to export data to S3 buckets, but now my requirement is to send the files in Parquet format.
Can you guys give some clues on how to achieve this?
Thanks,
Ven
For folks stumbling on this answer, Apache Parquet is a project that specifies a columnar file format employed by Hadoop and other Apache projects.
Unless you find a custom component or write some .NET code to do it, you're not going to be able to export data from SQL Server to a Parquet file. KingswaySoft's SSIS Big Data Components might offer one such custom component, but I have no familiarity with it.
If you were exporting to Azure, you'd have two options:
Use the Flexible File Destination component (part of the Azure feature pack), which exports to a Parquet file hosted in Azure Blob or Data Lake Gen2 storage.
Leverage PolyBase, a SQL Server feature. It lets you export to a Parquet file via the external table feature. However, that file has to be hosted in one of the locations mentioned here, and unfortunately S3 isn't an option.
If it were me, I'd move the data to S3 as a CSV file and then use Athena to convert the CSV to Parquet. There is a nifty article here that walks through the Athena piece:
https://www.cloudforecast.io/blog/Athena-to-transform-CSV-to-Parquet/
Net-net, you'll need to spend a little money, get creative, switch to Azure, or do the conversion in AWS.
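As a rough sketch of that Athena conversion step (Python with boto3; the database, table names and S3 locations are placeholders), a CREATE TABLE AS SELECT statement rewrites the CSV-backed table as Parquet:

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# CTAS query: read the existing CSV-backed table and write a Parquet copy.
ctas = """
CREATE TABLE my_db.sales_parquet
WITH (format = 'PARQUET',
      external_location = 's3://my-bucket/parquet/sales/')
AS SELECT * FROM my_db.sales_csv
"""

response = athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "my_db"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print(response["QueryExecutionId"])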

Exporting table from Amazon RDS into a csv file using Golang API

I am looking for a way to directly export SQL query results to a CSV file from AWS Lambda. I have found this similar question - Exporting table from Amazon RDS into a csv file - but it does not work with the AWS Golang API.
Actually, I want to schedule a Lambda function that will query some views/tables from RDS (SQL Server) daily and put the results into an S3 bucket in CSV format. So I want to produce the query results as CSV directly in the Lambda and then upload them to S3.
I have also found the AWS Data Pipeline service for copying RDS data to S3 directly, but I am not sure if I can make use of it here.
It would be helpful if anyone could suggest the right process and references to implement it.
You can transfer files between a DB instance running Amazon RDS for SQL Server and an Amazon S3 bucket. By doing this, you can use Amazon S3 with SQL Server features such as BULK INSERT. For example, you can download .csv, .xml, .txt, and other files from Amazon S3 to the DB instance host and import the data from D:\S3\ into the database. All files are stored in D:\S3\ on the DB instance.
Reference:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/User.SQLServer.Options.S3-integration.html
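For what it's worth, the direct in-Lambda route the question describes looks roughly like this in Python (the question asks for Go, so treat this only as an outline; host, credentials, query, and bucket are placeholders):

import csv
import io
import os

import boto3
import pymssql  # would need to be packaged with the Lambda deployment

def handler(event, context):
    conn = pymssql.connect(
        server=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM dbo.DailyReportView")  # placeholder view

    # Build the CSV in memory: header row from cursor metadata, then the data rows.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())

    boto3.client("s3").put_object(
        Bucket="my-report-bucket",  # placeholder bucket
        Key="exports/daily_report.csv",
        Body=buffer.getvalue().encode("utf-8"),
    )
    conn.close()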

Importing data from AWS Athena to RDS instance

Currently I'm listening to events from AWS Kinesis and writing them to S3. Then I query them using AWS Glue and Athena.
Is there a way to import that data, possibly with some transformation, to an RDS instance?
There are several general approaches to take for that task.
Read data from an Athena query into a custom ETL script (using a JDBC connection) and load it into the database
Mount the S3 bucket holding the data to a file system (perhaps using s3fs-fuse), read the data using a custom ETL script, and push it to the RDS instance(s)
Download the data to a local filesystem using the AWS CLI or the SDK, process it locally, and then push it to RDS
As you suggest, use AWS Glue to import the data from Athena to the RDS instance (a sketch follows below). If you are building an application that is tightly coupled with AWS, and if you are using Kinesis and Athena you are, then such a solution makes sense.
When connecting Glue to RDS, there are a couple of things to keep in mind (mostly on the networking side):
Ensure that DNS hostnames are enabled in the VPC hosting the target RDS instance
You'll need to set up a self-referencing rule in the Security Group associated with the target RDS instance
For some examples of code targeting a relational database, see the tutorials in the AWS Glue documentation.
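As a rough illustration of the Glue route mentioned above (the catalog database, table, connection name and target table are assumptions), a Glue PySpark job can read the table that Athena queries from the Glue Data Catalog and write it to RDS over JDBC:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: the catalog database/table the Kinesis data on S3 is registered under.
events = glue_context.create_dynamic_frame.from_catalog(
    database="kinesis_events",   # placeholder catalog database
    table_name="raw_events",     # placeholder catalog table
)

# Any transformations (drop fields, remap types, etc.) would go here.

# Sink: a Glue connection pointing at the RDS instance.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=events,
    catalog_connection="rds-connection",  # placeholder Glue connection
    connection_options={"dbtable": "events", "database": "analytics"},
)

job.commit()
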
One approach for Postgres:
Install the S3 extension in Postgres:
psql=> CREATE EXTENSION aws_s3 CASCADE;
Run the query in Athena and find the CSV result file location in S3 (the S3 output location is set in the Athena settings; you can also inspect the "Download results" button to get the S3 path).
Create your table in Postgres
Import from S3:
SELECT aws_s3.table_import_from_s3(
    'newtable',                   -- target table
    '',                           -- column list (empty means all columns)
    '(format csv, header true)',  -- COPY options
    aws_commons.create_s3_uri('bucketname', 'reports/Unsaved/2021/05/10/aa9f04b0-d082-328g-5c9d-27982d345484.csv', 'us-east-1')
);
If you want to convert empty values to null, you can use this: (format csv, FORCE_NULL (columnname), header true)
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Procedural.Importing.html
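To tie those steps together, here is a hedged Python sketch that runs the Athena query with boto3, reads back the S3 location of the CSV result, and then triggers aws_s3.table_import_from_s3 through psycopg2 (the bucket, query, table, and connection details are placeholders):

import time

import boto3
import psycopg2

athena = boto3.client("athena", region_name="us-east-1")

execution = athena.start_query_execution(
    QueryString="SELECT * FROM my_db.events",  # placeholder Athena query
    ResultConfiguration={"OutputLocation": "s3://bucketname/reports/"},
)
query_id = execution["QueryExecutionId"]

# Poll until Athena finishes, then grab the CSV result path.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
    if state["Status"]["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

# Crude split: everything after the bucket name is the object key.
result_key = state["ResultConfiguration"]["OutputLocation"].split("bucketname/")[1]

conn = psycopg2.connect("dbname=mydb user=postgres host=myrds.example.com")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT aws_s3.table_import_from_s3("
        "  'newtable', '', '(format csv, header true)',"
        "  aws_commons.create_s3_uri('bucketname', %s, 'us-east-1'))",
        (result_key,),
    )
conn.close()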