How do you import a custom Python library onto an Apache Spark pool with Azure Synapse Analytics? - python-wheel

According to Microsoft's documentation, it is possible to upload a Python wheel file so that you can use custom libraries in Synapse Analytics.
Here is that documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries
I created a simple library with just a hello-world function, and I was able to install it with pip on my own computer, so I know my wheel file works.
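For reference, a minimal sketch of such a package (the names are hypothetical placeholders), which python setup.py bdist_wheel builds into the .whl that gets uploaded:

# setup.py -- a minimal sketch of the hello-world library described above.
# "helloworld" is a hypothetical package name.
from setuptools import setup, find_packages

setup(
    name="helloworld",
    version="0.1.0",
    packages=find_packages(),
)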
I uploaded my wheel file to the location Microsoft's documentation says to upload it to.
I also found a YouTube video of someone doing exactly what I am trying to do.
Here is the video: https://www.youtube.com/watch?v=t4-2i1sPD4U
Microsoft's documentation mentions this, "Custom packages can be added or modified between sessions. However, you will need to wait for the pool and session to restart to see the updated package."
As far as I can tell there is no way to restart a pool, and I also do not know how to tell if the pool is down or has restarted.
When I try to use the library in a notebook I get a module not found error.

Scaling the pool up or down will force the cluster to restart.

Making changes to the Spark pool's scale settings does restart the pool, as HimanshuSinha-msft suggested. That was not my problem, though.
The actual problem was that I needed the Storage Blob Data Contributor role on the Data Lake Storage account the files were stored in. I assumed that because I already had owner permissions, and because I could create a folder and upload files there, I had all the permissions I needed. Once I got the Storage Blob Data Contributor role, though, everything worked.
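Once the role is granted and the pool has restarted, a quick notebook cell confirms the package resolved; a minimal sketch, assuming the hypothetical helloworld package above:

# Run in a Synapse notebook cell after the pool restarts.
import helloworld  # hypothetical package name from the uploaded wheel

print(helloworld.__file__)  # shows where the module was resolved from
helloworld.hello()          # the hypothetical hello-world function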

Related

How do I access files generated during Cloud Run function execution?

I'm running a very simple program in Cloud Run that takes screenshots of a page using Selenium. I know that Cloud Run is stateless and I cannot access the screenshot after the program finishes executing, but I wanted to know where and how I can access these files right after the screenshot is taken and read them, so I can store a reference to them in my Cloud Storage bucket too.
You have several solutions:
Store the screenshots locally, then upload them to Cloud Storage (you can write a script for that using the client libraries); see the sketch after this list. A good evolution is to make a tar (optionally gzipped as well) so you upload only one file, which is faster.
Use the Cloud Run second-generation execution environment and mount a bucket into your Cloud Run instance with GCS Fuse. That way, a file written to the mounted directory is written directly to Cloud Storage. Despite the good tutorial, this solution requires solid container skills.
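A minimal sketch of the first option, assuming the google-cloud-storage client library and hypothetical bucket and file names:

from google.cloud import storage

def upload_screenshot(local_path, bucket_name, blob_name):
    # Upload a locally saved screenshot and return its gs:// URI.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return "gs://{}/{}".format(bucket_name, blob_name)

# e.g. after driver.save_screenshot("/tmp/page.png"):
# uri = upload_screenshot("/tmp/page.png", "my-bucket", "screenshots/page.png")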

VMware resize disk size using vCenter API

I have been trying to solve this for the past week.
I'm using the vCenter API to add a new disk to an existing VM:
https://vdc-repo.vmware.com/vmwb-repository/dcr-public/1cd28284-3b72-4885-9e31-d1c6d9e26686/71ef7304-a6c9-43b3-a3cd-868b2c236c81/doc/operations/com/vmware/vcenter/vm/hardware/disk.create-operation.html
and was able to do it successfully.
But I cannot figure out how to resize an existing VM disk.
https://vdc-repo.vmware.com/vmwb-repository/dcr-public/1cd28284-3b72-4885-9e31-d1c6d9e26686/71ef7304-a6c9-43b3-a3cd-868b2c236c81/doc/operations/com/vmware/vcenter/vm/hardware/disk.update-operation.html
This disk update operation does not allow updating the "capacity" attribute, so I'm not sure how to resolve this unless I use an SDK.
Can someone please point me in the right direction?
I'm not 100% up to speed on the latest version, but there are several things the REST API cannot do compared to the "old" SDK, which is based on SOAP/WSDL.
The documentation on that page also states that the call only "Updates the configuration of a virtual disk. An update operation can be used to detach the existing VMDK file and attach another VMDK file to the virtual machine." So there's no mention of changing the size (which is pretty lame, I have to say...).
So unfortunately it seems like you either:
wait for a new version and hope this will be included, or
use the good old SDK (a sketch of that route is below).
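For the SDK route, a minimal sketch using pyVmomi (the SOAP-based Python SDK), assuming you already have a connected vim.VirtualMachine object; the disk label and sizes are illustrative:

from pyVmomi import vim

def grow_disk(vm, disk_label, new_capacity_gb):
    # Locate the virtual disk by its label, e.g. "Hard disk 1".
    disk = next(
        dev for dev in vm.config.hardware.device
        if isinstance(dev, vim.vm.device.VirtualDisk)
        and dev.deviceInfo.label == disk_label
    )
    disk.capacityInKB = new_capacity_gb * 1024 * 1024
    change = vim.vm.device.VirtualDeviceSpec()
    change.operation = vim.vm.device.VirtualDeviceSpec.Operation.edit
    change.device = disk
    spec = vim.vm.ConfigSpec(deviceChange=[change])
    return vm.ReconfigVM_Task(spec=spec)  # returns a Task you can wait on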

AzureFileShareConfiguration mount drive disconnected

I am trying to create a pool using Azure Batch. I have uploaded content to Azure Storage using file shares.
I would like my pool to mount this Azure file share as a virtual file system (ref: https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount#mount-a-virtual-file-system-on-a-pool).
I am creating the AzureFileShareConfiguration object with this code:
mount_configuration = batchmodels.MountConfiguration(
    azure_file_share_configuration=batchmodels.AzureFileShareConfiguration(
        account_name="mystorage",
        azure_file_url="https://mystorage.file.core.windows.net/my-share1",
        account_key="mystorage/key==",
        relative_mount_path="S"
    )
)
Using this, I get "CMDKEY: Credentials added successfully" in fsmounts. But when I RDP to the node in the pool, the S drive appears "Disconnected".
My Azure batch package versions are:
azure-batch==8.0.0
azure-common==1.1.24
Can you please help diagnose the issue or suggest the right usage?
Thanks in Advance!
I think this is a Windows VM you are using, judging by the drive letter :).
The key issue is that RDP permissions are different from the Batch-level model under which your code runs and mounts the share.
At the Batch level, when you mount the drive and can see it via your start task, it is working; that is the Batch-level permissioning model. When you RDP into the node, you are logged in as a separate "user". If you want to see the drive from your RDP login, re-run the command as that user so it has the key to see the drive.
Having said that, do try it with /persistent:Yes as mount_options.
The best test is this: mount the drive and, from your start task, access the mounted directory (e.g. read S:\\Whatever_file.txt, or just dir it), which will put the result in the node's stdout.txt.
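A minimal sketch of that test, assuming azure-batch 8.0.0 and hypothetical pool and VM settings:

pool = batchmodels.PoolAddParameter(
    id="mypool",                              # hypothetical pool id
    vm_size="standard_d2_v3",
    virtual_machine_configuration=vm_config,  # assumed defined elsewhere
    mount_configuration=[mount_configuration],
    start_task=batchmodels.StartTask(
        # List the mounted share; the output lands in the node's stdout.txt.
        command_line='cmd /c "dir %AZ_BATCH_NODE_MOUNTS_DIR%\\S"',
        wait_for_success=True,
    ),
)
batch_client.pool.add(pool)  # batch_client assumed created elsewhere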
The rest is extra detail:
Try it with the mount_options value below.
This will also help for SMB version support et al.: https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-windows and I think you already know this one: https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount#azure-files-share
"In order to use an Azure file share outside of the Azure region it is hosted in, such as on-premises or in a different Azure region, the OS must support SMB 3.0."
So add this to your API call and give it a try:
MountOptions = "/persistent:Yes", i.e. mount_options = "/persistent:Yes"
Also: the key needs to be the storage account key, i.e. it should not start with mystorage/key :). It could be that you are just redacting it, so this is just a mention, fyi.
Sample code (I think the SDK you have is Python?):
mount_configuration = batchmodels.MountConfiguration(
    azure_file_share_configuration=batchmodels.AzureFileShareConfiguration(
        account_name="mystorage",
        azure_file_url="https://mystorage.file.core.windows.net/my-share1",
        account_key="mystorage/key==",
        relative_mount_path="S",
        mount_options="/persistent:Yes"
    )
)
hope this helps!
relative_mount_path: The relative path on the compute node where the file system will be mounted. All file systems are mounted relative to the Batch mounts directory, accessible via the AZ_BATCH_NODE_MOUNTS_DIR environment variable.
Azure Files is the standard Azure cloud file system offering. To learn more about how to get any of the parameters in the mount configuration code sample, see Use an Azure Files share.

How do I create a SQL connection to my app and upload it to Google Cloud?

Thanks for getting back to me, and sorry for the late reply; it was bed-time here. I need to connect the Cloud SQL database that I created to my application in App Engine. I tried to follow the online tutorials, but when I apply that info and then run gcloud app deploy, it returns a connection error. Please help. Also, clarify this: when I execute the gcloud app deploy command, I assume it takes my local files to Google Cloud, where I would see the entire folder and files of the project I was deploying, but I am seeing the old version of my project while the presentation has changed to the latest version. One last thing: how can I link a domain name from http://domain.google.com to my app in http://cloud.google.com? Please help, I am dying of stress here.
Given that you haven't provided any information about which settings you are using or what error was returned, it is impossible to know what kind of problem you are running into.
I suggest taking a look at the "Connecting to App Engine" page here. It should answer a lot of your questions around connecting from an App Engine app.
I see two questions here.
1.
"I need to connect the Cloud SQL database that I have created to my application that is in App Engine. I tried to follow the online tutorials but when I do apply such info I would get then gcloud app deploy it return a connection error. Please help. Also clarify here: When I execute the gcloud app deploy command I suppose it takes my local file to Google Cloud where I would see the entire folder and files of my project on the project I was deploying but I am seeing the old version of my project while presentation has changed to the latest version."
I see your problem here to be with Cloud SQL and GAE connectivity. Depending on whether you use GAE standard or flex, and Cloud SQL MySQL or Postgres, the steps vary, but the documentation is quite clear on this.
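For instance, a minimal sketch for GAE standard with Cloud SQL MySQL over the unix socket, following the pattern in the linked docs (hypothetical instance and credential names; PyMySQL assumed in requirements.txt):

import pymysql

def get_connection():
    # All connection values below are hypothetical placeholders.
    return pymysql.connect(
        unix_socket="/cloudsql/PROJECT_ID:REGION:INSTANCE_NAME",
        user="db_user",
        password="db_password",
        db="my_database",
    )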
2.
"Also last one how can I link domain nam from http://domain.google.com to my app in http://cloud.google.com. Please help I am dying with stress I have been trying in here"
This is going to be super simple: go to the GCP Cloud Console, navigate to GAE --> Settings --> Custom Domains, click "Add a custom domain", and enter the domain name you want to link. When you click continue, you will be shown the steps for verifying domain ownership and pointing the DNS to GAE.
This is documented properly by the GCP folks at https://cloud.google.com/appengine/docs/standard/python/mapping-custom-domains
If you are using GAE standard or flex, a possible result of the gcloud app deploy command is: "An app.yaml (or appengine-web.xml) file is required to deploy this directory as an App Engine App." Check these links:
https://cloud.google.com/appengine/docs/flexible/python/configuring-your-app-with-app-yaml
https://cloud.google.com/appengine/docs/flexible/python/writing-application-logs
MySQL and Postgres connections:
https://cloud.google.com/sql/docs/mysql/connect-app-engine
https://cloud.google.com/sql/docs/postgres/connect-app-engine
Sometimes it is easiest to share the app.yaml so the app can be replicated correctly.

AppEngine Backup from one app to another

I can't seem to restore my AppEngine backups to a new app as listed in the documentation.
We are using the cron backup as listed in the documentation.
I get through all the stages and launch the restore job successfully, but when it kicks off, all the shards fail with 503 errors.
I tried this with multiple backup files and the experience is the same.
any advice?
(Java runtime)
I'm posting this hoping it will help someone, as there is a real lack of resources in Google's documentation and on the web in general about this.
While the App Engine documentation says this can be done, I actually found the piece of code that forbids it inside the datastore admin app.
I managed to connect through the Python remote API shell, read an entity from the backup, and tried saving it to the datastore, but the datastore.Put(entity) operation yielded "BadRequestError: app s~app_a cannot access app s~app_b's data", so the restriction seems to be at an even lower level.
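For illustration, a minimal sketch of that attempt using the legacy Python remote API (hypothetical host name; the backup-reading helper is a placeholder):

from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.api import datastore

# Connect the shell to the destination app (hypothetical hostname).
remote_api_stub.ConfigureRemoteApiForOAuth(
    "app_b.appspot.com", "/_ah/remote_api")

entity = read_entity_from_backup()  # hypothetical helper; backup parsing elided

# The entity's key still belongs to app_a, so the write is rejected:
# BadRequestError: app s~app_a cannot access app s~app_b's data
datastore.Put(entity)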
In the end, I decided to restore only a specific namespace to the same app, which was also a tedious task, but it did save the day.
I managed to pull my backup locally through gsutil, set up the Python remote API on my app, accessed the interactive shell, and wrote this script:
https://gist.github.com/Shuky/ed8728f8eb6187475b9a
Hope this helps.
Shuky