How do I access files generated during Cloud Run function execution? - selenium

I'm running a very simple program getting screenshots of a page using Selenium in Cloud Run. I know that Cloud Run is stateless and I cannot access the screenshot that is generated after the program finishes executing, but I wanted to know where/how can I access these files right after the screenshot is taken and read them, so I can store a reference to them in my Cloud Storage bucket too

You have several solutions:
Store the screenshots locally, then upload them to Cloud Storage (you can write a script for that, use the client libraries, ...). A good refinement is to bundle them into a tar archive (optionally gzipped) so that you upload only one file, which is faster.
Use the Cloud Run second-generation execution environment and mount a bucket into your Cloud Run instance with GCSFuse. That way, a file written to the mounted directory is written directly to Cloud Storage. Despite the good tutorial, this solution requires solid container skills.
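A minimal sketch of the first option, assuming the screenshots are .png files in one local directory. The function names and the injected upload callable are hypothetical; the comment shows how the callable might be wired to the google-cloud-storage client, with a placeholder bucket name.

```python
import tarfile
from pathlib import Path
from typing import Callable, List


def bundle_screenshots(screenshot_dir: str, archive_path: str) -> List[str]:
    """Pack every .png in screenshot_dir into one gzipped tar archive.

    Returns the file names stored in the archive, sorted.
    """
    files = sorted(Path(screenshot_dir).glob("*.png"))
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            # arcname drops the local directory prefix from the archive entry.
            archive.add(str(path), arcname=path.name)
    return [path.name for path in files]


def bundle_and_upload(
    screenshot_dir: str,
    archive_path: str,
    upload: Callable[[str], None],
) -> List[str]:
    """Bundle the screenshots, then hand the archive to `upload`.

    `upload` is injected so the sketch stays independent of any client.
    With google-cloud-storage it could be (bucket name is a placeholder):
        lambda p: storage.Client().bucket("my-bucket")
                         .blob("screenshots.tar.gz").upload_from_filename(p)
    """
    names = bundle_screenshots(screenshot_dir, archive_path)
    if names:  # skip the upload when no screenshot was produced
        upload(archive_path)
    return names
```

Calling bundle_and_upload right after the Selenium screenshot calls, before the request handler returns, keeps the upload inside the lifetime of the instance.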

Related

Mount S3 bucket as an NFS share on an EC2 instance

Long-time reader here. I've usually been able to find the answers I've been looking for in existing posts, but this time I haven't.
I am essentially teaching myself AWS CDK from scratch and have only just started with it, so not finding anything that helps may be a result of not knowing enough yet to ask the right questions... so please bear with me.
Thus far I've used the AWS CDK with Python to create a stack which creates an S3 bucket and also fires up an EC2 instance with an AWS file storage gateway AMI loaded on it (so running Amazon Linux). This deploys and runs fine. However, now I'd like to programmatically set up the S3 bucket to be accessed via an NFS share on the EC2 instance. From what I've seen I'd assumed it is, or should be, fairly trivial, but I keep getting a bit lost in documentation and internet hunts, and I'm not sure I'm looking in the right places or asking search engines the right questions.
It looks like I should be able to script something up to make it happen when the instance starts, using user-data, but I'm a bit lost. Is anyone able to throw me some crumbs to follow to find a good way of achieving this, or a better way of achieving what I want (which is basically accessing the S3 bucket contents as though they are files on an EC2 instance), if not tell me how to do it if it's trivial enough?
Much appreciated :)
Dan
You are on the right track. user_data can be used for that.
I don't have full code to give you as it's use-case specific (e.g. which OS are you using?), but the user_data would have to download and install s3fs:
s3fs allows Linux and macOS to mount an S3 bucket via FUSE. s3fs preserves the native object format for files, allowing use of other tools like AWS CLI.
However, S3 is an object storage system, and it can't really be mounted on an instance the way NFS or EBS storage can. With s3fs-fuse you can mimic that behavior, and for some use cases it will be sufficient.
So what you can do is set up the user_data script through the console, verify that it works, and then copy and paste it into CDK. It's more of a trial-and-error approach, but this is the best way to learn.
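As a rough sketch of what such a user_data script might contain, the helper below builds the shell commands as a list of strings. It assumes an Amazon Linux 2 instance (where s3fs-fuse is installed from EPEL) and an instance role that is allowed to access the bucket; the function name, bucket name and mount point are placeholders.

```python
from typing import List


def s3fs_user_data_commands(bucket: str, mount_point: str = "/mnt/s3") -> List[str]:
    """Return shell commands that install s3fs and mount `bucket` at boot.

    user_data runs as root, so the commands do not use sudo.
    """
    return [
        # Assumption: Amazon Linux 2, where s3fs-fuse comes from EPEL.
        "amazon-linux-extras install epel -y",
        "yum install -y s3fs-fuse",
        f"mkdir -p {mount_point}",
        # iam_role=auto takes credentials from the instance profile,
        # so no access keys need to be written to disk.
        f"s3fs {bucket} {mount_point} -o iam_role=auto -o allow_other",
    ]


if __name__ == "__main__":
    # Print the script so it can be pasted into the console for a first test.
    print("\n".join(["#!/bin/bash"] + s3fs_user_data_commands("my-bucket")))
```

In a CDK stack the same list could be passed on with ec2.UserData.for_linux() followed by add_commands(*commands). Note that this yields a FUSE mount rather than a true NFS share.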

AzureFileShareConfiguration mount drive disconnected

I am trying to create a Pool using Azure Batch. I have uploaded content to Azure Storage using File Shares.
I would like my Pool to mount this Azure File Share as virtual file system (ref: https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount#mount-a-virtual-file-system-on-a-pool ).
I am creating AzureFileShareConfiguration object using code:
mount_configuration=batchmodels.MountConfiguration(
    azure_file_share_configuration=batchmodels.AzureFileShareConfiguration(
        account_name="mystorage",
        azure_file_url="https://mystorage.file.core.windows.net/my-share1",
        account_key="mystorage/key==",
        relative_mount_path="S"
    )
)
Using this, I get "CMDKEY: Credentials added successfully" in fsmounts. But when I RDP to the node in the pool, the S drive appears "Disconnected".
My Azure batch package versions are:
azure-batch==8.0.0
azure-common==1.1.24
Can you please help diagnose the issue or suggest the right usage?
Thanks in Advance!
I think this is a Windows VM you are trying, just by looking at the drive letter :).
The key issue is that the permissions of your RDP session are different from the Batch-level model under which your code runs and mounts the share.
At the Batch level, if you mount the drive and can see it from your start task, then it is working: that is the Batch-level permission model. When you RDP into the node, you are logged in as a different "user". If you want to see the drive from the RDP session, re-run the command from your RDP login so that this user also has the key for the drive.
Having said that, try it with /persistent:Yes as mount_options.
The best test is to mount the drive and, from your start task, access the mounted directory: read a mounted file such as S:\\Whatever_file.txt, or simply run dir on it. The result will appear in the stdout.txt of the Batch node.
The rest below is extra detail.
Try the mount_options value given further down.
This page will also help with support for the various SMB versions: https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-windows and I think you already know this one: https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount#azure-files-share
In order to use an Azure file share outside of the Azure region it is
hosted in, such as on-premises or in a different Azure region, the OS
must support SMB 3.0.
So add this to your API and give it a try:
MountOptions = "/persistent:Yes" i.e. mount_options = "/persistent:Yes"
Also: the key needs to be the storage account key, i.e. it should not start with mystorage/key :). It could be that you are just hiding it, so this is only an FYI.
Sample code:
I think the SDK you are using is Python?
mount_configuration=batchmodels.MountConfiguration(
    azure_file_share_configuration=batchmodels.AzureFileShareConfiguration(
        account_name="mystorage",
        azure_file_url="https://mystorage.file.core.windows.net/my-share1",
        account_key="mystorage/key==",
        relative_mount_path="S",
        mount_options="/persistent:Yes"
    )
)
hope this helps!
relative_mount_path: The relative path on the compute node where the file system will be mounted. All file systems are mounted relative to the Batch mounts directory, accessible via the AZ_BATCH_NODE_MOUNTS_DIR environment variable.
Azure Files is the standard Azure cloud file system offering. To learn more about how to get any of the parameters in the mount configuration code sample, see Use an Azure Files share.
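The start-task check suggested above could look roughly like the sketch below. It assumes the layout described in the quoted documentation, where the share sits under the directory named by AZ_BATCH_NODE_MOUNTS_DIR; on a Windows node where the share shows up as a drive letter, pass the drive path to list_share directly instead. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List


def batch_mount_path(relative_mount_path: str) -> Path:
    """Resolve a share's location relative to the Batch mounts directory."""
    # AZ_BATCH_NODE_MOUNTS_DIR is set by Batch on the compute node.
    mounts_dir = os.environ["AZ_BATCH_NODE_MOUNTS_DIR"]
    return Path(mounts_dir) / relative_mount_path


def list_share(share_path: Path) -> List[str]:
    """Return the sorted entry names found in the mounted share."""
    return sorted(entry.name for entry in Path(share_path).iterdir())


if __name__ == "__main__":
    # Run from the start task: the listing ends up in the node's stdout.txt.
    for name in list_share(batch_mount_path("S")):
        print(name)
```

If this listing works from the start task, the mount itself is fine and the "Disconnected" state is specific to the RDP user session.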

When to use s3cmd over accessing the S3 API programmatically?

I've been having difficulty understanding when to use the s3cmd program over using the Java API. A vendor has documentation on accessing S3 with s3cmd. It is unclear to me, as the bucket names appear to be dynamic and no region is specified. Additionally, I'm reaching out over an endpoint. I've tried writing some Java code to interact with S3 the same way that s3cmd does, but I haven't been able to connect. Overall, it appears to be quite a bit different.
To me s3cmd seems to be a utility to manipulate these files or quickly get at them. Integrating this utility into a Java program seems meaningless.
Anyone have any resources or can help me understand this better?
S3cmd (s3cmd) is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc.
S3cmd is written in Python. It's an open source project available under GNU Public License v2 (GPLv2) and is free for both commercial and private use. You will only have to pay Amazon for using their storage.
Lots of features and options have been added to S3cmd, since its very first release in 2008.... we recently counted more than 60 command line options, including multipart uploads, encryption, incremental backup, s3 sync, ACL and Metadata management, S3 bucket size, bucket policies, and more!

AppEngine Backup from one app to another

I can't seem to restore my AppEngine backups to a new app as listed in the documentation.
We are using the cron backup as listed in the documentation.
I get through all the stages to launch the restore job successfully, but when it kicks off, all the shards fail with 503 errors.
I tried this with multiple backup files and the experience is the same.
any advice?
(Java runtime)
I'm posting this hoping it will help someone, as there is a real lack of resources on this in Google's documentation and on the web in general.
While the appengine documentation says this can be done, I actually found the piece of code that forbids this inside the data_storeadmin app.
I managed to connect through python remote-api shell, read an entity from the backup and tried saving to the datastore, but datastore.Put(entity) operation yielded: "BadRequestError: app s~app_a cannot access app s~app_b's data" so it seems to be on an even lower level.
In the end, I decided to restore only a specific namespace to the same app which was also a tedious task - but it did save the day.
I managed to pull my backup locally through gsutil, installed a python-remote-api version on my app, accessed the interactive shell and wrote this script:
https://gist.github.com/Shuky/ed8728f8eb6187475b9a
Hope this helps.
Shuky

Fastest / best way copy data between S3 to EC2?

I have a fairly large amount of data (~30G, split into ~100 files) I'd like to transfer between S3 and EC2: when I fire up the EC2 instances I'd like to copy the data from S3 to EC2 local disks as quickly as I can, and when I'm done processing I'd like to copy the results back to S3.
I'm looking for a tool that'll do a fast / parallel copy of the data back and forth. I have several scripts hacked up, including one that does a decent job, so I'm not looking for pointers to basic libraries; I'm looking for something fast and reliable.
Unfortunately, Adam's suggestion won't work as his understanding of EBS is wrong (although I wish he was right and often thought myself it should work that way)... as EBS has nothing to do with S3, but it will only give you an "external drive" for EC2 instances that are separate, but connectable to the instances. You still have to do copying between S3 and EC2, even though there are no data transfer costs between the two.
You didn't mention an operating system of your instance, so I cannot give tailored information. A popular command line tool I use is http://s3tools.org/s3cmd ... it is based on Python and therefore, according to info on its website it should work on Win as well as Linux, although I use it ALL the time on Linux. You could easily whip up a quick script that uses its built in "sync" command that works similar to rsync, and have it triggered every time you're done processing your data. You could also use the recursive put and get commands to get and put data only when needed.
There are also graphical tools like Cloudberry Pro that have some command line options for Windows, so you can set up scheduled commands. http://s3tools.org/s3cmd is probably the easiest.
By now, there is a sync command in the AWS Command line tools, that should do the trick: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
On startup:
aws s3 sync s3://mybucket /mylocalfolder
before shutdown:
aws s3 sync /mylocalfolder s3://mybucket
Of course, the details are always fun to work out, e.g. how parallel it is (and whether you can make it more parallel, and whether that is any faster given the virtual nature of the whole setup).
Btw hope you're still working on this... or somebody is. ;)
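On the question of making the copy more parallel, one possible approach is to fan the per-file copies out over a thread pool, as in the sketch below. The copy function is injected, so it is an assumption about how a single file is transferred; the comment shows one way to do it with the AWS CLI, reusing the placeholder bucket and folder from the commands above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def copy_in_parallel(
    keys: Iterable[str],
    copy_one: Callable[[str], None],
    max_workers: int = 8,
) -> Dict[str, BaseException]:
    """Run copy_one(key) for every key on a thread pool.

    Returns a dict mapping each failed key to the exception it raised,
    so the caller can retry only the failures.

    copy_one could shell out to the AWS CLI, for example:
        lambda key: subprocess.run(
            ["aws", "s3", "cp", f"s3://mybucket/{key}", "/mylocalfolder/"],
            check=True,
        )
    """
    failures: Dict[str, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_one, key): key for key in keys}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures[futures[future]] = error
    return failures
```

Threads are enough here because each copy spends its time waiting on the network. Whether 8 workers is faster than 4 or 32 depends on the instance type and file sizes, so it is worth measuring.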
I think you might be better off using an Elastic Block Store to store your files instead of S3. An EBS is akin to a 'drive' on S3 that can be mounted into your EC2 instance without having to copy the data each time, thereby allowing you to persist your data between EC2 instances without having to write to or read from S3 each time.
http://aws.amazon.com/ebs/
Install s3cmd Package as
yum install s3cmd
or
sudo apt-get install s3cmd
depending on your OS
then copy data with this
s3cmd get s3://tecadmin/file.txt
also ls can list the files.
for more details see this
For me the best form is:
wget http://s3.amazonaws.com/my_bucket/my_folder/my_file.ext
from PuTTY