How to rename a Data Lake Gen2 folder using the Azure CLI? - azure-data-lake

I'm using Azure Data Lake Gen2 and I have a folder named myfolder with thousands of files. Is there a command in the Azure Storage CLI for renaming the folder and/or moving the entire folder to another location in the ADLS Gen2 account?
Inside Azure Databricks I can easily leverage the Linux mv command:
mv myfolder newname for renaming myfolder
mv myfolder /dbfs/mount/myadls/target/ for moving myfolder to a target folder.
Is there a simple way of doing the same with the Azure CLI?

According to my research, managing Data Lake Gen2 directories is currently only possible through the Azure Data Lake Storage Gen2 REST API. For more details, please refer to the documentation.
For example, to rename your folder, you can call the REST API as follows:
PUT https://<your account name>.dfs.core.windows.net/<file system name>/<new folder name>
Header:
x-ms-rename-source : /<file system name>/<original folder name>
Authorization : Bearer <access token>
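For illustration, a minimal curl sketch of the same call, assuming $TOKEN already holds a valid Azure AD access token (the account, file system, and folder names are placeholders):
curl -X PUT "https://<your account name>.dfs.core.windows.net/<file system name>/<new folder name>" \
  -H "x-ms-rename-source: /<file system name>/<original folder name>" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Length: 0"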
Regarding how to call the REST API, please refer to the following steps:
1. Create a service principal
az login
az ad sp create-for-rbac --name ServicePrincipalName
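This prints a JSON blob whose appId, password, and tenant values are used in the steps below; it looks roughly like this (all values here are placeholders):
{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "ServicePrincipalName",
  "password": "xxxxxxxx",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}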
2. Assign a role to the service principal
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee <your service principal app id> \
--scope "/subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
3. Call the REST API
az login --service-principal --username <your service principal app id> --password <your service principal password> --tenant <your tenant id>
az rest --method put --uri https://testadls05.dfs.core.windows.net/test/testFolder --resource https://storage.azure.com --headers x-ms-rename-source=/test/testFolder1

UPDATE: An ADLS Gen2 CLI is now available
We can rename or move a directory by using the az storage fs directory move command.
Example 1: Renaming a directory from the name my-directory to the name my-new-directory in the same file system:
az storage fs directory move -n my-directory -f my-file-system --new-directory "my-file-system/my-new-directory" --account-name mystorageaccount --auth-mode login
Example 2: This example moves a directory to a file system named my-second-file-system.
az storage fs directory move -n my-directory -f my-file-system --new-directory "my-second-file-system/my
For more info, here's the official documentation.

Related

net use command asking for a username and password

I set up Azure Files SMB access on-premises with private endpoints, but when I use the net use command to mount the drive:
C:\>net use Z: \\myshare.file.core.windows.net\testshare
it keeps asking for a username and password:
Enter the user name for 'myacc.file.core.windows.net':
Azure file share with On-prem AD Authentication
The net use command fails if the storage account key contains a forward slash.
Try the PowerShell cmdlet below:
New-SmbMapping -LocalPath z: -RemotePath \\StorageAccountName.file.core.windows.net\sharename -UserName StorageAccountName -Password "AccountPassword"
This article on mapping a network drive to an Azure file share using domain credentials will help.
I would also recommend reviewing the prerequisites in On-premises Active Directory Domain Services authentication over SMB for Azure file shares.
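If you are mounting with the storage account key rather than domain credentials, a minimal sketch of the net use form would be (the account, share, and key are placeholders):
net use Z: \\myacc.file.core.windows.net\testshare /user:Azure\myacc <storage-account-key>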

Can I pass a key file as parameter to connect to Google Cloud Storage using gsutil?

When I use gsutil to connect to my bucket on Google Cloud Storage, I usually use the following command:
gcloud auth activate-service-account --key-file="pathKeyFile"
What should I do if two scripts that are running on the same machine at the same time need two different Service Accounts?
I would like to use a command such as:
gsutil ls mybucket --key-file="mykeyspath"
I ask because if another script changes the currently active Service Account while my script is running, my script would no longer have permission to access the bucket.
You can do this with a BOTO file. You can create one as explained in the documentation.
Then you can specify which file to use when you run your gsutil command (here is an example on Linux):
# if you have several GSUTIL command to run
export BOTO_CONFIG=/path/to/.botoMyBucket
gsutil ls myBucket
# For only one command, you can define an env var inline like this
BOTO_CONFIG=/path/to/.botoMyBucket2 gsutil ls myBucket2
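For reference, each of those .boto files can point gsutil at a different service-account JSON key, roughly like this (the key path is an assumption about your setup):
[Credentials]
gs_service_key_file = /path/to/serviceAccount1-key.json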

Gcloud auth login saves to legacy_credentials folder

I have no idea why, but when I run gcloud auth login (I have also tried the beta and application-default variants), none of them create the file ~/.config/gcloud/credentials; instead I find ~/.config/gcloud/legacy_credentials.
The issue I am having is that the library I am using does not want legacy_credentials, and renaming the folder did not work.
Here are my settings:
Google Cloud SDK 183.0.0
alpha 2017.12.08
beta 2017.12.08
bq 2.0.27
core 2017.12.08
gsutil 4.28
Also I am using Ubuntu 16.04.3 LTS on digitalocean. I will be glad to supply any other information I can think of.
The credentials in the legacy folder contain:
" ============================================================================
" Netrw Directory Listing (netrw v155)
" /root/.config/gcloud/legacy_credentials/matt#mindbrainhive.org
" Sorted by name
" Sort sequence: [\/]$,\<core\%(\.\d\+\)\=\>,\.h$,\.c$,\.cpp$,\~\=\*$,*,\.o$,\.obj$,\.info$,\.swp$,\.bak$,\~$
" Quick Help: <F1>:help -:go up dir D:delete R:rename s:sort-by x:special
" ==============================================================================
../
./
.boto
adc.json
gcloud no longer uses ~/.config/gcloud/credentials; instead, it stores credentials in a sqlite3 database at ~/.config/gcloud/credentials.db.
These credential files are considered internal to gcloud and can change at any time. You should not be using them directly. What you likely want to use is
gcloud auth application-default login
instead of gcloud auth login. The former creates the ~/.config/gcloud/application_default_credentials.json key file for the logged-in user account.
That said, depending on what you are trying to do, you probably want to use service accounts (instead of a user account). You can create a key file via
gcloud iam service-accounts keys create
See the documentation for more info, or use the Google Cloud Platform Console to create the key file.
Once you obtain the JSON key file, you can use it in your application as Application Default Credentials; see https://developers.google.com/identity/protocols/application-default-credentials#howtheywork
You can also use this key in gcloud via the gcloud auth activate-service-account command.
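A minimal sketch of those two steps, assuming a service account named my-sa in project my-project (both names are placeholders):
gcloud iam service-accounts keys create key.json --iam-account my-sa@my-project.iam.gserviceaccount.com
gcloud auth activate-service-account --key-file key.json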

Exporting data from Google Cloud Storage to Amazon S3

I would like to transfer data from a table in BigQuery, into another one in Redshift.
My planned data flow is as follows:
BigQuery -> Google Cloud Storage -> Amazon S3 -> Redshift
I know about Google Cloud Storage Transfer Service, but I'm not sure it can help me. From Google Cloud documentation:
Cloud Storage Transfer Service
This page describes Cloud Storage Transfer Service, which you can use
to quickly import online data into Google Cloud Storage.
I understand that this service can be used to import data into Google Cloud Storage and not to export from it.
Is there a way I can export data from Google Cloud Storage to Amazon S3?
You can use gsutil to copy data from a Google Cloud Storage bucket to an Amazon bucket, using a command such as:
gsutil -m rsync -rd gs://your-gcs-bucket s3://your-s3-bucket
Note that the -d option above will cause gsutil rsync to delete objects from your S3 bucket that aren't present in your GCS bucket (in addition to adding new objects). You can leave off that option if you just want to add new objects from your GCS to your S3 bucket.
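If you only want to add objects and never delete anything on the S3 side, the same sync without -d would be:
gsutil -m rsync -r gs://your-gcs-bucket s3://your-s3-bucket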
Go to any instance or Cloud Shell in GCP.
First of all, configure your AWS credentials on that machine:
aws configure
If the aws command is not recognized, install the AWS CLI by following this guide: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
To configure the AWS CLI, follow this URL:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
Then use gsutil:
gsutil -m rsync -rd gs://storagename s3://bucketname
16 GB of data transferred in a few minutes.
Using Rclone (https://rclone.org/); a usage sketch follows the list below.
Rclone is a command-line program to sync files and directories to and from:
Google Drive
Amazon S3
Openstack Swift / Rackspace cloud files / Memset Memstore
Dropbox
Google Cloud Storage
Amazon Drive
Microsoft OneDrive
Hubic
Backblaze B2
Yandex Disk
SFTP
The local filesystem
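For example, a minimal sketch, assuming you have already run rclone config and created two remotes named gcs and s3 (both remote names are assumptions about your setup):
rclone copy gcs:my-gcs-bucket s3:my-s3-bucket --progress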
Using the gsutil tool we can do a wide range of bucket and object management tasks, including:
Creating and deleting buckets.
Uploading, downloading, and deleting objects.
Listing buckets and objects. Moving, copying, and renaming objects.
We can copy data from a Google Cloud Storage bucket to an Amazon S3 bucket using the gsutil rsync and gsutil cp operations, where:
gsutil rsync collects all metadata from the bucket and syncs the data to S3:
gsutil -m rsync -r gs://your-gcs-bucket s3://your-s3-bucket
gsutil cp copies the files one by one; the transfer rate is good, roughly 1 GB per minute:
gsutil cp -r gs://<gcs-bucket>/* s3://<s3-bucket-name>
If you have a large number of files with a high data volume, use the bash script below and run it in the background (e.g. in multiple sessions under the screen command) on an Amazon or GCP instance that has AWS credentials configured and GCP auth verified.
Before running the script, list all the files, redirect the output to a file, and have the script read that file as its input:
gsutil ls gs://<gcs-bucket> > file_list_part.out
Bash script:
#!/bin/bash
# Copies each GCS object listed in file_list_part.out to the S3 bucket, one file at a time.
echo "start processing"
input="file_list_part.out"
while IFS= read -r line
do
  now=$(date +"%Y-%m-%d %H:%M:%S")   # timestamp for the log line
  command="gsutil cp ${line} s3://<bucket-name>"
  echo "command :: $command :: $now"
  eval "$command"
  retVal=$?
  if [ $retVal -ne 0 ]; then
    echo "Error copying file"
    exit 1
  fi
  echo "Copy completed successfully"
done < "$input"
echo "completed processing"
Execute the bash script and write the output to a log file so you can check the progress of completed and failed files:
bash file_copy.sh > /root/logs/file_copy.log 2>&1
I needed to transfer 2 TB of data from a Google Cloud Storage bucket to an Amazon S3 bucket.
For the task, I created a Google Compute Engine instance with 8 vCPUs and 30 GB of RAM.
Allowed SSH login to the Compute Engine instance.
Once logged in, I created an empty .boto configuration file and added the AWS credential information to it, taking the reference from the mentioned link.
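The AWS section of that .boto file might look roughly like this (both values are placeholders):
[Credentials]
aws_access_key_id = YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key = YOUR_AWS_SECRET_ACCESS_KEY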
Then run the command:
gsutil -m rsync -rd gs://your-gcs-bucket s3://your-s3-bucket
The data transfer rate is ~1GB/s.
Hope this helps.
(Do not forget to terminate the compute instance once the job is done)
For large amounts of large files (100 MB+) you might get issues with broken pipes and other annoyances, probably due to the multipart upload requirement (as Pathead mentioned).
For that case you're left with simply downloading all files to your machine and uploading them back. Depending on your connection and the amount of data, it might be more effective to create a VM instance to take advantage of a high-speed connection and the ability to run it in the background on a machine other than yours.
Create the VM (make sure the service account has access to your buckets), connect via SSH, install the AWS CLI (apt install awscli), and configure access to S3 (aws configure).
Run these two lines, or make them a bash script, if you have many buckets to copy ($1 is the bucket name, assumed to be the same on both sides):
gsutil -m cp -r "gs://$1" ./
aws s3 cp --recursive "./$1" "s3://$1"
(It's better to use rsync in general, but cp was faster for me)
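If you do wrap them in a script, a minimal sketch might be (it takes the bucket name as its only argument):
#!/bin/bash
# usage: ./copy_bucket.sh <bucket-name>   (assumes the GCS and S3 buckets share the same name)
set -e
gsutil -m cp -r "gs://$1" ./
aws s3 cp --recursive "./$1" "s3://$1"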
Tools like gsutil and aws s3 cp won't use multipart uploads/downloads, so will have poor performance for large files.
Skyplane is a much faster alternative for transferring data between clouds (up to 110x for large files). You can transfer data with the command:
skyplane cp -r s3://aws-bucket-name/ gcs://google-bucket-name/
(disclaimer: I am a contributor)

gsutil - How to copy/download all files from Google private cloud?

Google Play Developer account reports are stored in a private Google Cloud Storage bucket.
Every Google Play Developer account has a Google Cloud Storage bucket ID.
To access it, I have installed gsutil on my Windows machine.
Now I am using this command to copy all files from the bucket:
gsutil cp -r dir gs://[bucket_id]
it says
CommandException: No URLs matched
When I list all directories on the bucket, this command works:
gsutil ls gs://[bucket_id]
Can anyone help me understand the gsutil exception?
This exception occurs because the source URL does not match anything: dir does not exist locally, and to download from the bucket the source and destination need to be swapped.
It should be like...
gsutil cp -r gs://[bucket_id] [local_destination_dir]