MLflow artifacts on S3 but not in UI - amazon-s3

I'm running mlflow on my local machine and logging everything through a remote tracking server with my artifacts going to an S3 bucket. I've confirmed that they are present in S3 after a run but when I look at the UI the artifacts section is completely blank. There's no error, just empty space.
Any idea why this is? I've included a picture from the UI.

You should see the 500 response in your artifacts request to the MLflow tracking server e.g. by clicking on the model of interest page (in the browser console). The UI service wouldnt know the location (since you set that to be an S3 bucket) and tries to load the defaults.
You need to specify the --artifacts-destination s3://yourPathToArtifacts argument to your mlflow server command. Also, when running the server in your environment dont forget to supply some common AWS credentials provider(s) (such as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env variables) as well as the MLFLOW_S3_ENDPOINT_URL env variable to point to your S3 endpoint.

I had the same issue with mlflow running on an ec2 instance. I logged into the server and noticed that it was overloaded and no disk space was left. I deleted a few temp files and the mlflow UI started displaying the files again. It seems like mlflow stores tons of tmp files, but that is a separate issue.

Related

Read bucket object in CDK

In terraform to read an object from s3 bucket at the time of deployment I can use data source
data aws_s3_bucket_object { }
Is there a similar concept in CDK? I've seen various methods of uploading assets to s3, as well as importing an existing bucket, but not getting an object from the bucket. I need to read a configuration file from the bucket that will affect further deployment.
Its important to remember that CDK itself is not a deployment option. it can deploy, but the code you are writing in a cdk stack is the definition of your resources - not a method for deployment.
So, you can do one of a few things.
Use your SDK for your language to make a call to the s3 bucket and load the data directly. This is perfectly acceptable and an understood way to gather information you need before deployment - each time the stack Synths (which it does before every cdk deploy that code will run and will pull your data.
Use a CodePipeline to set up a proper pipeline, and give it two sources - one your version control repo and the second your s3 bucket:
https://docs.aws.amazon.com/codebuild/latest/userguide/sample-multi-in-out.html
The preferred way - drop the json file, and use Parameter Store. CDK contains modules that will create a token version of this parameter on synth, and when it deploys it will reference that properly back to the Systems Manager Parameter store
https://docs.aws.amazon.com/cdk/v2/guide/get_ssm_value.html
If your parameters change after deployment, you can have that as part of your cdk stack pretty easily (using cfn outputs). If they change in the middle/during deployment, you really need to be using a CodePipeline to manage these steps instead of just CDK.
Because remember: The cdk deploy option is just a convenience. It will execute everything and has no way to pause in the middle and execute specific steps. (other than a very basic, this depends on this resources)

How to make browser download html when its content changed in s3?

I am using s3 bucket to host my web site. Whenever I release a new version of my web site, I want all clients download it from s3 instead of reading from their browser cache. I know I can set up an expire time for the object saved on s3 bucket but it is not an idea solution since users have to use the cached content for a period of time. Is there a way to force browser to download the content if they are changed in s3 bucket?
Irrespective of whether you are using s3 bucket for hosting or any other hosting server, caching can be controlled by appending hash number to file name.
For example your js file bundle name should be like bundle.7e2c49a622975ebd9b7e.js.
When you deploy it again it will change to some other hash value bundle.205199ab45963f6a62ec.js.
By doing this, browser automatically knows that, new file has arrived and should be downloaded again.
This can be easily done using any popular bundlers like grunt, gulp, webpack.
webpack example

Backing up a Serverless Framework deployment

I'm familiar with Terraform and its terraform.tfstate file where it keeps track of which local resource identifiers map to which remote resources. I've noticed that there is a .serverless directory on my machine which seems to contain files such as CloudFormation templates and ZIP files containing Lambda code.
Suppose I create and deploy a project from my laptop, and Serverless spins up fooxyz.cloudfront.net which points to a Lambda function arn:aws:lambda:us-east-1:123456789012:function:handleRequest456. If I naively try to run Serverless again from another machine (or if I git clean my working directory), it'll spin up a new CloudFront endpoint since it doesn't know that fooxyz.cloudfront.net already represents the same application. I'm looking to back up the state it keeps internally, so that it modifies an existing resource rather than creates a new one. (The equivalent in Terraform would be to back up the terraform.tfstate file.)
If I wished to back up or restore a Serverless deployment state, which files would I back up? In the case of AWS, it seems like I should be backing up the CloudFormation templates; I don't want to back up the Lambda code since it's directly generated from the source. However, I'm likely going to use more than just AWS in the future, and so don't want to "special-case" the CloudFormation templates if at all possible.
How can I back up only the files I cannot regenerate?
I think what you are asking is If I or a colleague checks out the serverless code from git on a different machine, will we still be able to deploy and update the same lambda functions and the same API gateway endpoints?
And the answer to that is yes! Serverless keeps track of all of that for you within their files. Unless you run serverless destroy - no operation will create a new lambda or api endpoint.
My team and I are using this method: we commit all code to a git repo and one of us checks it out and deploys a function or the entire thing and it updates the existing set of functions properly. If you setup an environment file - that's all you need to worry about really. And I recommend leaving it outside of git entirely.
For AWS; Serverless Framework keeps track of your deployment via Cloudformation (CF) parameters/identifiers which are specific to an account/region. The CF stack templates are uploaded to an (auto-generated) S3 bucket so it's already backed up for you.
So all you really need to have is the original deployment code in a git repo and have access to your keys. Everything else is already backed up for you.

Access files in s3n://elasticmapreduce/samples/wordcount/input

How I can I access the file sitting in the following folder of S3 which is own by someone else
s3n://elasticmapreduce/samples/wordcount/input
The files in s3n://elasticmapreduce/samples/wordcount/input are public, and made available as input by Amazon to the sample word count Hadoop program. The best way to fetch them is to
Start a new Amazon Elastic MapReduce Job Flow (it doesn't matter which one) from the Amazon Web Services console, and make sure that you keep the the job alive with the Keep Alive option
Once the EC2 machines have started, find the instances on EC2 from the Amazon Web Services console
ssh into one of the running EC2 instances, using the hadoop user, for example
ssh -i keypair.pem hadoop#ec2-IPADDRESS.compute-1.amazonaws.com
Obtain the files you need, using hadoop dfs -copyToLocal s3://elasticmapreduce/samples/wordcount/input/0002 .
sftp the files to your local system
You can access wordSplitter.py here:
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/wordSplitter.py
You can access the input files here:
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0012
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0011
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0010
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0009
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0008
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0007
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0006
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0005
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0004
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0003
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0002
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0001
The owner of the folder (most likely a file in the folder) must have made it accessible to anonymous reader.
If that is the case, s3n://x/y... is translated to
http://s3.amazonaws.com/x/y...
or
http://x.s3.amazonaws.com/y...
x is the name of the bucket.
y... is the path wihtin the bucket.
If you want to make sure the file exists, e.g. if you suspect the name was misspelled, you can in your browser to open
http://s3.amazonaws.com/x
and you'll see XML describing "files" that is S3 objects, available.
Try this:
http://s3.amazonaws.com/elasticmapreduce
I tried this, and seems that the path you want is not public.
AWS EBS documentation quotes s3://elasticmapreduce/samples/wordcount/input in one of the "getting started" examples. But s3 is different from s3n, so input might be available to EMR, but not to HTTP access.
In Amazon S3, there is no concept of folders, a bucket it just a flat collection of objects. But you can list all the files you are interested in a browser with the following URL:
s3.amazonaws.com/elasticmapreduce?prefix=samples/wordcount/input/
Then you can download them by specifying the whole name, e.g.
s3.amazonaws.com/elasticmapreduce/samples/wordcount/input/0001

How to upload files directly to Amazon S3 from a remote server?

Is it possible to upload a file to S3 from a remote server?
The remote server is basically a URL based file server. Example, using http://example.com/1.jpg, it serves the image. It doesn't do anything else and can't run code on this server.
It is possible to have another server telling S3 to upload a file from http://example.com/1.jpg
upload from http://example.com/1.jpg
server -------------------------------------------> S3 <-----> example.com
If you can't run code on the server or execute requests then, no, you can't do this. You will have to download the file to a server or computer that you own and upload from there.
You can see the operations you can perform on amazon S3 at http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html
Checking the operations for both the REST and SOAP APIs you'll see there's no way to give Amazon S3 a remote URL and have it grab the object for you. All of the PUT requests require the object's data to be provided as part of the request. Meaning the server or computer that is initiating the web request needs to have the data.
I have had a similar problem in the past where I wanted to download my users' Facebook Thumbnails and upload them to S3 for use on my site. The way I did it was to download the image from Facebook into Memory on my server, then upload to Amazon S3 - the full thing took under 2 seconds. After the upload to S3 was complete, write the bucket/key to a database.
Unfortunately there's no other way to do it.
I think the suggestion provided is quite good, you can SCP the file to S3 Bucket. Giving the pem file will be a password less authentication, via PHP file you can validate the extensions. PHP file can pass the file, as argument to SCP command.
The only problem with this solution is, you must have your instance in AWS. You can't use this solution if your website is hosted in other Hosting Providers and you are trying to upload files straight to S3 Bucket.
Technically it's possible, using AWS Signature Version 4, Assuming your remote server is the customer in the image below, you could prepare a form in the main server, and send the form fields to the remote server, for it to curl it. Detailed example here.
you can use scp command from Terminal.
1)using terminal, go to the place where there is that file you want to transfer to the server
2) type this:
scp -i yourAmazonKeypairPath.pem fileNameThatYouWantToTransfer.php ec2-user#ec2-00-000-000-15.us-west-2.compute.amazonaws.com:
N.B. Add "ec2-user#" before your ec2blablbla stuffs that you got from the Ec2 website!! This is such a picky error!
3) your file will be uploaded and the progress will be shown. When it is 100%, you are done!