Azure Blob Storage - How to read source string: wasbs://training@dbtrainsouthcentralus.blob.core.windows.net - dataframe

I am doing a lab for an Azure Data course and there was some code to run from within Azure Databricks.
I noticed that it seemed to mount something from the following location:
wasbs://training@dbtrainsouthcentralus.blob.core.windows.net
So I am trying to figure out how to deconstruct the above string
wasbs looks to mean "windows azure storage blob"
The string training@dbtrainsouthcentralus.blob.core.windows.net looks like it means "container name"@"account name", which I would think should be something in my Azure Data Lake.
I dug around in my ADLS and was not able to find anything related to "training@dbtrainsouthcentralus.blob.core.windows.net"
So I was wondering, where on earth did this come from? How can I trace back to where this path came from?

The url is indeed constructed as follows:
wasbs://[container-name]@[storage-account-name].blob.core.windows.net/[directory-name] (source)
I dug around in my ADLS ...
You won't find it in ADLS; it is a separate resource in your subscription. There should be a storage account named dbtrainsouthcentralus.
Note: it could also be a publicly accessible storage account in some training subscription you do not have access to, provided by Microsoft for training purposes.
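For illustration, a minimal Databricks (PySpark) sketch of how such a wasbs:// path is typically mounted and read. The container and account names come from the question; the mount point, SAS token, and folder name below are placeholders, not values from the lab.

```python
# Runs in a Databricks notebook, where dbutils, spark and display are provided globals.
# Container ("training") and account ("dbtrainsouthcentralus") are from the question;
# the SAS token, mount point and folder name are placeholders.
source_uri = "wasbs://training@dbtrainsouthcentralus.blob.core.windows.net/"

dbutils.fs.mount(
    source=source_uri,
    mount_point="/mnt/training",
    extra_configs={
        "fs.azure.sas.training.dbtrainsouthcentralus.blob.core.windows.net": "<sas-token>"
    },
)

# Once mounted, the blobs can be read like any other DBFS path.
df = spark.read.parquet("/mnt/training/<some-folder>")
display(df)
```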

Related

Query blob storage with Get-AzDataLakeGen2ChildItem?

Our PowerShell test harness used to use Get-AzDataLakeGen2ChildItem to list blobs found in non-Data Lake storage accounts. Today I updated the PowerShell and Az module versions they were locked at, and now when issuing the command (specifying a Filesystem container and context), the following error is returned:
Get-AzDataLakeGen2ChildItem: Input string was not in a correct format.
I'm assuming something has changed and this function can no longer process results from non-Data Lake storage.
For one reason or another, a while back we changed away from using Get-AzStorageBlob. So I'm interested to know whether there's any way to keep working with this call, rather than having to deviate from Get-AzDataLakeGen2ChildItem where required.
One workaround is to list the subdirectories and files in a directory or filesystem of an Azure storage account using Get-AzDataLakeGen2ChildItem.
To do that, the storage account must have Hierarchical Namespace enabled; then the command returns the child items as expected.
NOTE: if you are using an existing storage account that does not have Hierarchical Namespace enabled, you need to upgrade that storage account first.
For more information, please refer to the MS docs for Get-AzDataLakeGen2ChildItem and Get-AzStorageBlob, and this SO thread for a similar issue.
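As a cross-check outside PowerShell, here is a rough Python sketch of the same listing with the azure-storage-file-datalake package, assuming Hierarchical Namespace is enabled; the account name, key, filesystem, and directory below are placeholders.

```python
# Rough Python equivalent of Get-AzDataLakeGen2ChildItem
# (requires the azure-storage-file-datalake package).
# Account, key, filesystem and directory names are placeholders.
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account-name>.dfs.core.windows.net",
    credential="<account-key>",
)
filesystem = service.get_file_system_client("<filesystem-name>")

# List children of a directory recursively, marking directories vs. files.
for item in filesystem.get_paths(path="<directory-name>", recursive=True):
    kind = "dir " if item.is_directory else "file"
    print(kind, item.name)
```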

Cloud file storage with file tagging and search by tags/filename

My project needs to meet the following requirements:
store a large amount of files for a reasonable price
tag individual files with custom tags
have API method to search files by name (contains) and tags (exact)
do it all via JS SDK (keep project serverless)
I did some work with Amazon S3 and it turned out:
no search method in JS SDK http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjectsV2-property
listObjects accepts a Prefix param (i.e. filename starts with), so there is no way to search by "contains"
no param to search by tag at all; I can only get tags for an individual file with getObjectTagging
So the question is: what stable service can I use for file storage with the functionality described above?
Azure? Google Cloud? Backblaze B2? something else?
thanks!
If you use Azure blob storage, you can use Azure Search blob indexer to index both the metadata and textual content of your blobs. For a walkthrough of setting this up, see Build and query your first Azure Search index in the portal.
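To give a feel for the query side, here is a rough Python sketch against an index populated by a blob indexer (the question asks about a JS SDK, but equivalent calls exist there too). The search service name, index name, custom tag field, and keys below are assumptions, not real values.

```python
# Hypothetical query against an Azure Cognitive Search index filled by a blob indexer.
# Service name, index name, key, and the "department" tag field are assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="blob-index",
    credential=AzureKeyCredential("<query-key>"),
)

# "contains"-style match on name/content plus an exact filter on a custom tag field.
results = client.search(
    search_text="report",
    filter="department eq 'finance'",
    select=["metadata_storage_name", "metadata_storage_path"],
)
for doc in results:
    print(doc["metadata_storage_name"])
```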

Transferring Storage Accounts Table into Data Lake using Data Factory

I am trying to use Data Factory to transfer a table from Storage Accounts into Data Lake. Microsoft claims that one can "store files of arbitrary sizes and formats into Data Lake". I use the online wizard and try to create a pipeline. The pipeline gets created, but then I always get an error saying:
Copy activity encountered a user error: ErrorCode=UserErrorTabularCopyBehaviorNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=CopyBehavior property is not supported if the source is tabular data source.,Source=Microsoft.DataTransfer.ClientLibrary,'.
Any suggestions what I can do to be able to use Data Factory to transfer data from Storage Accounts table into Data Lake?
Thanks.
Your case is supported by ADF. As for the error you hit, there is a known defect: in some cases the copy wizard mis-generates a "CopyBehavior" property which is not applicable. We are fixing that now.
To work around it, go to the Azure portal -> Author and deploy -> select that pipeline -> find the "CopyBehavior": "MergeFiles" under AzureDataLakeStoreSink and remove that line (see the sketch below) -> then deploy and rerun the activity.
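For reference, the sink section of the generated pipeline JSON looks roughly like the sketch below; property names other than the "CopyBehavior" line are illustrative, and deleting that line is what the workaround above amounts to.

```json
{
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "AzureTableSource" },
    "sink": {
      "type": "AzureDataLakeStoreSink",
      "CopyBehavior": "MergeFiles"
    }
  }
}
```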
If you happened to author a run-once pipeline, please re-author it as a scheduled one, since the former is hard to update via JSON.
Thanks,
Linda

Google Cloud Logging export to Big Query does not seem to work

I am using the Google Cloud Logging web UI to export Google Compute Engine logs to a BigQuery dataset. According to the docs, you can even create the BigQuery dataset from this web UI (it simply asks you to give the dataset a name). It also automatically sets up the correct permissions on the dataset.
It seems to save the export configuration without errors, but a couple of hours have passed and I don't see any tables created for the dataset. According to the docs, exporting the logs will stream the logs to BigQuery and will create the table with the following template:
my_bq_dataset.compute_googleapis_com_activity_log_YYYYMMDD
https://cloud.google.com/logging/docs/export/using_exported_logs#log_entries_in_google_bigquery
I can't think of anything else that might be wrong. I am the owner of the project and the dataset is created in the correct project (I only have one project).
I also tried exporting the logs to a Google Cloud Storage bucket and still no luck there. I set the permissions correctly using gsutil according to this:
https://cloud.google.com/logging/docs/export/configure_export#setting_product_name_short_permissions_for_writing_exported_logs
And finally I made sure that the 'source' I am trying to export actually has some log entries.
Thanks for the help!
Have you ingested any log entries since configuring the export? Cloud Logging only exports entries to BigQuery or Cloud Storage that arrive after the export configuration is set up. See https://cloud.google.com/logging/docs/export/using_exported_logs#exported_logs_availability.
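As a quick way to see whether the export has started creating tables, a small Python check with the google-cloud-bigquery client could look like this; the project and dataset names are placeholders.

```python
# List tables in the export dataset to see whether any
# compute_googleapis_com_activity_log_YYYYMMDD tables have appeared yet.
# Project and dataset names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="<project-id>")
for table in client.list_tables("<my_bq_dataset>"):
    print(table.table_id)
```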
You might not have given edit permission to 'cloud-logs@google.com' in the BigQuery console. Refer to this.

Can Someone Help Me Troubleshoot Error In BQ "does not contain valid backup metadata."

I keep trying to upload a new table to my company's BQ, but I keep getting the error you see in the title ("does not contain valid backup metadata.").
For reference, I'm uploading a .csv file that has been saved to our Google Cloud Storage. It's being uploaded as a native table.
Can anyone help me troubleshoot this?
It sounds like you are specifying the file type DATASTORE_BACKUP. When you specify that file type, BigQuery will take whatever URI you provide (even if it has a .csv suffix) and search for Cloud Datastore backup files relative to that URI.
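For comparison, a hedged sketch of loading the same file with the source format set explicitly to CSV using the Python BigQuery client; the bucket, file, and table names are placeholders.

```python
# Load a CSV from Cloud Storage with source_format set to CSV (not DATASTORE_BACKUP).
# Bucket, file, project, dataset and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # assumes the file has a header row
    autodetect=True,       # infer the schema from the file
)

load_job = client.load_table_from_uri(
    "gs://<bucket>/<file>.csv",
    "<project>.<dataset>.<table>",
    job_config=job_config,
)
load_job.result()  # wait for the load to complete
```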