There is no local machine on which to store an intermediate file. I need to read data from a blob into a string variable, modify the text, and then upload the content to a destination Azure blob using PowerShell in an Azure Automation account. Everything needs to be done in the Azure Automation account.
Read data from Blob to string variable
If the blob is large, output its content to a stream; otherwise, output its content as a string.
$StorageAccount = Get-AzureRmStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName
$Blob = Get-AzureStorageBlob -Context $StorageAccount.Context -Container $container -Blob $Filename
$Text = $blob.ICloudBlob.DownloadText()
Write-Output $Text
To get the contents of the blob as a string and pipe it to convertfrom-json directly:
$blobs | Get-AzureStorageBlobContent -AsString | ConvertFrom-Json ...
Different download options are available for Text, Byte, File and Stream. Running ($Blob.ICloudBlob | Get-Member | where {$_.Name -like "Download*"} | select Name) lists them.
After modifying the blob content, you can upload it to the destination blob with Azure Automation and PowerShell. The references below should help:
Provision Azure Storage account, automatic file upload using PowerShell
Upload Files to Azure Blob Storage using Power Automate Desktop
I'm using Azcopy within a shell script to copy blobs within a container from one storage account to another on Azure.
Using the following command -
azcopy copy "https://$$container_name/?$source_sas" "https://$$container_name/?$dest_sas" --recursive
I'm generating the SAS token for both source and destination accounts and passing them as parameters in the command above along with the storage account and container names.
On execution, I keep getting this error ->
failed to parse user input due to error: the inferred source/destination combination could not be identified, or is currently not supported
When I manually enter the storage account names, container name and SAS tokens, the command executes successfully and storage data gets transferred as expected. However, when I use parameters in the azcopy command I get the error.
Any suggestions on this would be greatly appreciated.
You can use the below PowerShell Script
param (
    [string] $source_storage_name,
    [string] $source_container_name,
    [string] $dest_storage_name,
    [string] $dest_container_name,
    [string] $source_sas,
    [string] $dest_sas
)
.\azcopy.exe copy "https://$$source_container_name/?$source_sas" "https://$$container_name/?$dest_sas" --recursive=true
To execute the above script you can run the below command.
.\ScriptFileName.ps1 -source_storage_name "<XXXXX>" -source_container_name "<XXXXX>" -source_sas "<XXXXXX>" -dest_storage_name "<XXXXX>" -dest_container_name "<XXXXXX>" -dest_sas "<XXXXX>"
I generated the SAS tokens for both storage accounts from here. Make sure to check all the boxes, as I did in the picture.
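When the parameterized command fails but the hand-typed one works, it can help to look at what the URLs become after the variables are expanded. The following is a minimal Python sketch, not part of the original answer, that builds the two container URLs and the azcopy argument list from parameters. The helper names are hypothetical, and the `blob.core.windows.net` endpoint suffix is an assumption that only holds for the public Azure cloud.

```python
from typing import List


def container_url(account: str, container: str, sas: str) -> str:
    """Build a container URL with the SAS token appended as the query string."""
    # Tolerate tokens that were copied with a leading "?" or stray whitespace.
    sas = sas.strip().lstrip("?")
    # An empty value here is what an unset or misspelled variable would produce.
    if not account or not container or not sas:
        raise ValueError("account, container and SAS token must all be non-empty")
    # Assumption: public-cloud blob endpoint; other clouds use a different suffix.
    return f"https://{account}.blob.core.windows.net/{container}?{sas}"


def azcopy_copy_args(src_url: str, dst_url: str) -> List[str]:
    """Return the argument list for `azcopy copy <src> <dst> --recursive`."""
    return ["azcopy", "copy", src_url, dst_url, "--recursive"]
```

Printing the list returned by `azcopy_copy_args` before running it shows the exact strings azcopy would receive, which makes an empty account or container name easy to spot.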
I am trying to export the list of VMs from Azure Update Management using the Azure REST API. Below is my code.
$url = "$($SubscriptionId)/resourceGroups/$($resourceGroup)/providers/Microsoft.Automation/automationAccounts/$($automationAccount)/softwareUpdateConfigurations/$($UpdateScheduleName)?api-version=" + $apiversion
$RestMethod = (Invoke-RestMethod -Uri $url -Headers $headerParams -Method Get)
$NonAzComputerList = ($RestMethod).Properties.updateConfiguration.nonAzureComputerNames
Write-Output $NonAzComputerList
$NonAzComputerList | Export-Csv "VMList.csv" -NoTypeInformation
On the console, I get the correct output with the VM names, but in the CSV file I get numbers instead of VM names.
I tried ConvertFrom-Json as well, but it fails with the error "convertfrom-json : Invalid JSON primitive".
GetType() shows System.Object[].
On the console, I get the correct VM names.
In the CSV file, I get numbers (equal to the number of characters in each VM name).
OK, figured it out: I used Out-File instead of Export-Csv. Export-Csv writes the properties of each object, and the only property of a string is Length, which explains the numbers.
I'm trying to write matplotlib figures to the Azure blob storage using the method provided here:
Saving Matplotlib Output to DBFS on Databricks.
However, when I replace the path in the code with
path = 'wasbs://'
I get this error
[Errno 2] No such file or directory: 'wasbs://'
I don't understand the problem...
As per my research, you cannot save Matplotlib output to Azure Blob Storage directly.
You may follow the below steps to save Matplotlib output to Azure Blob Storage:
Step 1: First save the output to the Databricks File System (DBFS), then copy it to Azure Blob Storage.
Saving Matplotlib output to Databricks File System (DBFS): We are using the below command to save the output to DBFS: plt.savefig('/dbfs/myfolder/Graph1.png')
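import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'fruits':['apple','banana'], 'count': [1,2]})
df.set_index('fruits',inplace = True)
df.plot.bar()  # assumed plot type; any plotting call works here
plt.savefig('/dbfs/myfolder/Graph1.png')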
Step 2: Copy the file from Databricks File System to Azure Blob Storage.
There are two methods to copy a file from DBFS to Azure Blob Storage.
Method 1: Access Azure Blob storage directly
Access Azure Blob Storage directly by setting the account key with spark.conf.set, then copy the file from DBFS to Blob Storage.
spark.conf.set("< Blob Storage Name>", "<Azure Blob Storage Key>")
Use dbutils.fs.cp to copy file from DBFS to Azure Blob Storage:
dbutils.fs.cp('dbfs:/myfolder/Graph1.png', 'wasbs://<Container>#<Storage Name>')
Method 2: Mount Azure Blob storage containers to DBFS
You can mount a Blob storage container or a folder inside a container to Databricks File System (DBFS). The mount is a pointer to a Blob storage container, so the data is never synced locally.
dbutils.fs.mount(
    source = "wasbs://",
    mount_point = "/mnt/chepra",
    extra_configs = {"":dbutils.secrets.get(scope = "azurestorage", key = "azurestoragekey")})
Use dbutils.fs.cp to copy the file to the Azure Blob Storage container:
dbutils.fs.cp('dbfs:/myfolder/Graph1.png', 'dbfs:/mnt/chepra')
By following Method 1 or Method 2 you can successfully save the output to Azure Blob Storage.
For more details, refer to "Databricks - Azure Blob Storage".
Hope this helps. Do let us know if you have any further queries.
You can write with .savefig() directly to Azure Blob Storage; you just need to mount the blob container first.
The following works for me, where I had mounted the blob container as /mnt/mydatalakemount
Documentation on mounting blob container is here.
This is what I also came up with so far. In order to reload the image from blob and display it as png in a databricks notebook again I use the following code:
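from io import BytesIO
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

blob_path = ...
dbfs_path = ...
dbutils.fs.cp( blob_path, dbfs_path )
# Read the copied file into an in-memory buffer
with open( dbfs_path, "rb" ) as f:
    im = BytesIO( f.read() )
img = mpimg.imread( im )
imgplot = plt.imshow( img )
display( imgplot.figure )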
I didn't succeed using dbutils, which cannot be correctly created.
But I did succeed by mounting the file-shares to a Linux path, like this:
I am trying to transfer data from an AWS S3 bucket (e.g. s3://mySrcBkt) to a GCS location (a folder under a bucket, such as gs://myDestBkt/myDestination). I could not find this option in the interface, as it only lets me specify a bucket and not a subfolder. I also could not find a similar provision in the storagetransfer API. Here is my code snippet:
String SOURCE_BUCKET = .... ;
String ACCESS_KEY = .....;
String SECRET_ACCESS_KEY = .....;
String DESTINATION_BUCKET = .......;
TransferJob transferJob =
new TransferJob()
new TransferSpec()
.setObjectConditions(new ObjectConditions()
.setTransferOptions(new TransferOptions()
new AwsS3Data()
new AwsAccessKey()
new GcsData()
new Schedule()
Unfortunately, I could not find anywhere to specify the destination folder for this transfer. I know gsutil rsync offers something similar, but scale and data integrity are a concern. Can anyone point me to a way or workaround to achieve this?
Since only a bucket, not a subdirectory, is available as the destination for a data transfer, the workaround is to transfer to your bucket and then run an rsync operation between the bucket and the subdirectory. Keep in mind that you should first run gsutil -m rsync -r -d -n to verify what it will do, as you could otherwise delete data accidentally.
I've built an application using DynamoDB Local and now I'm at the point where I want to set it up on AWS. I've gone through numerous tools but have had no success finding a way to take my local DB, set up the schema, and migrate the data into AWS.
For example, I can get the data into a CSV format but AWS has no way to recognize that. It seems that I'm forced to create a Data Pipeline... Does anyone have a better way to do this?
Thanks in advance
As was mentioned earlier, DynamoDB Local is there for testing purposes. However, you can still migrate your data if you need to. One approach would be to save the data in some format, like JSON or CSV, store it in S3, and then use something like Lambda functions or your own server to read from S3 and save into your new DynamoDB table. As for setting up the schema, you can use the same code you used to create your local table to create the remote table via the AWS SDK.
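As a rough illustration of the "read the export and save it into DynamoDB" step, here is a minimal Python sketch that turns CSV text into PutRequest entries in DynamoDB JSON and groups them into batches. It makes no AWS calls. The function names are hypothetical, and it assumes every attribute can be stored as a string ("S"), which will not fit tables with numeric or binary keys.

```python
import csv
import io
from typing import Dict, Iterator, List

BATCH_LIMIT = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def csv_to_put_requests(csv_text: str) -> List[Dict]:
    """Turn CSV rows into DynamoDB-JSON PutRequest entries.

    Assumption: every attribute is written as a string ("S"); empty cells are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    requests = []
    for row in reader:
        # Skip empty or missing cells, and any overflow columns without a header.
        item = {name: {"S": value} for name, value in row.items() if name and value}
        requests.append({"PutRequest": {"Item": item}})
    return requests


def batches(requests: List[Dict], size: int = BATCH_LIMIT) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` requests."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]
```

Each batch would then be sent as the value for the table name in the RequestItems parameter of a BatchWriteItem call, and any UnprocessedItems in the response should be retried.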
You can create a standalone application that gets the list of tables from the local DynamoDB and creates them in your AWS account. After that, you can read all the data for each table and save it.
I'm not sure which language you are familiar with, but I will explain some APIs in Java that might help you.
Here is an example of how to create a table using the above API:
try {
    ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput(1L, 1L);
    CreateTableRequest groupTableRequest = mapper.generateCreateTableRequest(Group.class); //1
    groupTableRequest.setProvisionedThroughput(provisionedThroughput); //2
    // groupTableRequest.getGlobalSecondaryIndexes().forEach(index -> index.setProvisionedThroughput(provisionedThroughput)); //3
    Table groupTable = client.createTable(groupTableRequest); //4
    groupTable.waitForActive(); //5 (throws InterruptedException)
} catch (ResourceInUseException e) {
    log.debug("Group table already exists");
}
1- Create the table request from the mapped class.
2- Set the provisioned throughput; this will vary depending on your requirements.
3- If the table has a global secondary index, you can use this line (optional).
4- The actual table is created here.
5- The thread is blocked until the table becomes active.
I didn't mention the APIs related to data access (insert, etc.); I assume you're familiar with them since you already use them with your local DynamoDB.
I did a little work setting up my local dev environment. I use SAM to create the dynamodb tables in AWS. I didn't want to do the work twice so I ended up copying the schema from AWS to my local instance. The same approach can work the other way around.
aws dynamodb describe-table --table-name chess_lobby \
| jq '.Table' \
| jq 'del(.TableArn)' \
| jq 'del(.TableSizeBytes)' \
| jq 'del(.TableStatus)' \
| jq 'del(.TableId)' \
| jq 'del(.ItemCount)' \
| jq 'del(.CreationDateTime)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexSizeBytes)' \
| jq 'del(.ProvisionedThroughput.NumberOfDecreasesToday)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexStatus)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexArn)' \
| jq 'del(.GlobalSecondaryIndexes[].ItemCount)' \
| jq 'del(.GlobalSecondaryIndexes[].ProvisionedThroughput.NumberOfDecreasesToday)' > chess_lobby.json
aws dynamodb create-table \
--cli-input-json file://chess_lobby.json \
--endpoint-url http://localhost:8000
The top command uses the AWS CLI's describe-table to get the schema JSON. Then I use jq to delete all unneeded keys, since create-table is strict with its parameter validation. Then I can use create-table to create the table in the local environment by using the --endpoint-url option.
You can use the --endpoint-url parameter on the top command instead to fetch your local schema and then use the create-table without the --endpoint-url parameter to create it directly in AWS.
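For anyone who would rather not chain jq calls, the same key stripping can be sketched in Python. This is only an illustration of the approach above: the function name is hypothetical, and the set of removed keys simply mirrors the jq commands rather than an exhaustive list of read-only fields. One difference is that it tolerates tables without global secondary indexes, whereas the `del(.GlobalSecondaryIndexes[]...)` filters fail when that key is absent.

```python
import copy

# Read-only keys removed by the jq pipeline above.
TABLE_KEYS = ("TableArn", "TableSizeBytes", "TableStatus", "TableId",
              "ItemCount", "CreationDateTime")
INDEX_KEYS = ("IndexSizeBytes", "IndexStatus", "IndexArn", "ItemCount")


def to_create_table_input(describe_output: dict) -> dict:
    """Convert `describe-table` output into input suitable for `create-table`."""
    # Work on a copy so the caller's data is left untouched.
    table = copy.deepcopy(describe_output["Table"])
    for key in TABLE_KEYS:
        table.pop(key, None)
    table.get("ProvisionedThroughput", {}).pop("NumberOfDecreasesToday", None)
    # Tables without global secondary indexes simply skip this loop.
    for index in table.get("GlobalSecondaryIndexes", []):
        for key in INDEX_KEYS:
            index.pop(key, None)
        index.get("ProvisionedThroughput", {}).pop("NumberOfDecreasesToday", None)
    return table
```

Loading the describe-table output with json.load, passing it through this function, and writing the result with json.dump would produce a file usable with --cli-input-json in the same way as chess_lobby.json.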