I've recently become involved in a project using GCP ML Engine. It's already set up and ready, so my task is to use its predict command to leverage the API. The whole project lives in a VM instance, so I want to know: does that help to get an access token in a more concise way? I mean an SDK or something like that, because I didn't find anything useful. If not, what are my options here? JWT?
You might find this useful. https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/ml_engine/online_prediction/predict.py
Especially these lines:
import googleapiclient.discovery

# Create the ML Engine service object.
# To authenticate set the environment variable
# GOOGLE_APPLICATION_CREDENTIALS=<path_to_service_account_file>
service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}'.format(project, model)
if version is not None:
    name += '/versions/{}'.format(version)
response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()
You can create the service account key file from the project's IAM page and download it (the JSON key file, not a token) onto the VM.
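The resource name passed to predict follows a fixed pattern, so it can be assembled by a small helper. This is only a minimal sketch: `model_resource_name` and `predict_body` are hypothetical helpers (not part of the Google client library), and the project, model, and version values are placeholders.

```python
def model_resource_name(project, model, version=None):
    """Build the `name` argument used by service.projects().predict().

    Hypothetical helper; it only mirrors the string formatting from the
    sample above.
    """
    name = 'projects/{}/models/{}'.format(project, model)
    if version is not None:
        # Without a version, the model's default version is addressed.
        name += '/versions/{}'.format(version)
    return name


def predict_body(instances):
    """Wrap the input instances in the request body shape used above."""
    return {'instances': list(instances)}


if __name__ == '__main__':
    # Placeholder values, for illustration only.
    print(model_resource_name('my-project', 'my-model', 'v1'))
    print(predict_body([[1.0, 2.0]]))
```

The two results would then be passed as `name=` and `body=` to `service.projects().predict(...)`.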
I have an AWS setup that requires me to assume a role and get the corresponding credentials in order to write to S3. For example, to write with the AWS CLI, I need to use the --profile readwrite flag. If I write code myself with boto, I'd assume the role via STS, get credentials, and create a new session.
However, there are a bunch of applications and packages relying on boto3's configuration; e.g., internal code runs like this:
s3 = boto3.resource('s3')
result_s3 = s3.Object(bucket, s3_object_key)
result_s3.put(
    Body=value.encode(content_encoding),
    ContentEncoding=content_encoding,
    ContentType=content_type,
)
From the documentation, boto3 can be set to use a default profile via (among others) the AWS_PROFILE env variable, and it clearly "works" in the sense that boto3.Session().profile_name does match the variable, but the applications still won't write to S3.
What would be the cleanest/correct way to set them properly? I tried to pull credentials from sts, and write them as AWS_SECRET_TOKEN etc, but that didn't work for me...
Have a look at the answer here:
How to choose an AWS profile when using boto3 to connect to CloudFront
You can get boto3 to use the other profile like so:
rw = boto3.session.Session(profile_name='readwrite')
s3 = rw.resource('s3')
I think the correct answer to my question is the one shared by Nathan Williams in the comments.
In my specific case, given that I had to initiate the code from Python and was a bit worried about setting AWS settings that might spill into other operations, I used the fact that boto3 has a DEFAULT_SESSION singleton, used each time, and just overwrote it with a session that assumed the proper role:
hook = S3Hook(aws_conn_id=aws_conn_id)
boto3.DEFAULT_SESSION = hook.get_session()
(Here, S3Hook is Airflow's S3 handling object.) After that, in the same runtime, everything worked perfectly.
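If the override should not outlive one block of code, the same idea can be wrapped in a context manager that restores the previous value afterwards. This is a rough sketch, not something from the answer above: `temporary_attr` is a hypothetical helper, and the boto3 usage in the trailing comment assumes boto3 is installed and that `session` is a session that has already assumed the role.

```python
import contextlib


@contextlib.contextmanager
def temporary_attr(obj, attr_name, value):
    """Set obj.<attr_name> to value inside the with-block, then restore it."""
    previous = getattr(obj, attr_name)
    setattr(obj, attr_name, value)
    try:
        yield value
    finally:
        # Restore even if the body raised, so the override cannot leak.
        setattr(obj, attr_name, previous)


# Intended use with boto3 (not executed here):
#
#     import boto3
#     with temporary_attr(boto3, "DEFAULT_SESSION", session):
#         s3 = boto3.resource("s3")   # built from `session`
#     # outside the block, boto3.DEFAULT_SESSION is back to its old value
```

Clients and resources created inside the block keep using the role session; only code that calls `boto3.resource(...)` or `boto3.client(...)` after the block goes back to the previous default.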
I am trying to use the "google-api-nodejs-client" (https://github.com/googleapis/google-api-nodejs-client) with a JSON Web Token in a flowground connector implementation. Is there a way to make the environment variable "GOOGLE_APPLICATION_CREDENTIALS" point to a configurable JWT file that the user can upload into a flow?
Example of client usage from the library page:
// This method looks for the GCLOUD_PROJECT and GOOGLE_APPLICATION_CREDENTIALS
// environment variables.
const auth = new google.auth.GoogleAuth({
  scopes: ['https://www.googleapis.com/auth/cloud-platform']
});
Let's see if I understand correctly what you want to do:
You want to create a flow that can be triggered from outside and accesses any Google API via the google-api-nodejs-client module.
Every time you trigger the flow, you will post a valid JWT for accessing any Google API.
You want to store the JWT in the local file system; the mentioned environment variable contains the path to the persisted JWT.
Generally speaking, this is a valid approach for the moment.
You can create a file in the local file-system:
fs.writeFile(process.env.HOME + '/jwt.token', ...)
Sebastian already explained how to define the needed environment variables.
Please keep in mind that writing and reading the JWT file must take place in the same step of flow execution. There is no persistence of this file after finishing execution of this step.
Why is this a valid approach for the moment only?
I assume that we will prevent writing in the local file-system in the near future. This will prevent the described solution as well.
From my point of view, the better solution would be to use the OAuth2 mechanism built into flowground.
For more information regarding this approach, see:
https://github.com/googleapis/google-api-nodejs-client#oauth2-client
https://doc.flowground.net/getting-started/credential.html
You can set environment variables in flowground on the "ENV vars" page for your connector.
I want to make a simple http rest call to a google machine learning predict endpoint, but I can't find any information on how to do that. As far as I can tell from the limited documentation, you have to use either the Java or Python library (or figure out how to properly encrypt everything when using the REST auth endpoints) and get a credentials object. Then the instructions end and I have no idea how to actually use my credentials object. This is my code so far:
import urllib2
from google.oauth2 import service_account
# Constants
ENDPOINT_URL = 'ml.googleapis.com/v1/projects/{project}/models/{model}:predict?access_token='
SCOPES = ['https://www.googleapis.com/auth/cloud-platform']
SERVICE_ACCOUNT_FILE = 'service.json'
credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES)
access_token=credentials.token
opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request(ENDPOINT_URL + access_token)
request.get_method = lambda: 'POST'
result = opener.open(request).read()
print(str(result))
If I print credentials.valid it returns False, so I think there is an issue with the credentials object init but I don't know what since no errors are reported, the fields are all correct inside the credentials object, and I did everything according to the instructions. Also my service.json is the same one our mobile team is successfully using to get an access token so I know the json file has the correct data.
How do I get an access token for the machine learning service that I can use to call the predict endpoint?
It turns out the best way to do a simple query is to use the gcloud command-line tool. I ended up following the instructions here to set up my environment: https://cloud.google.com/sdk/docs/quickstart-debian-ubuntu
Then I followed the instructions here to actually hit the endpoint (with some help from the person who originally set up the model):
https://cloud.google.com/sdk/gcloud/reference/ml-engine/predict
It was way easier than trying to use the python library and I highly recommend it to anyone trying to just hit the predict endpoint.
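For readers who still want the plain REST call from the question, here is a hedged sketch using only the Python 3 standard library. Two things differ from the question's code: the URL includes the `https://` scheme, and the token is sent as a Bearer header. As far as I know, a google-auth credentials object has no token (and reports `valid` as False) until `refresh()` has been called on it, which would explain the behaviour described in the question. `build_predict_request` is a hypothetical helper, and the token is passed in rather than fetched so the example stays self-contained.

```python
import json
import urllib.request

# Endpoint pattern taken from the question, with the scheme added.
PREDICT_URL = 'https://ml.googleapis.com/v1/projects/{project}/models/{model}:predict'


def build_predict_request(project, model, access_token, instances):
    """Build (but do not send) a POST request for the predict endpoint."""
    url = PREDICT_URL.format(project=project, model=model)
    body = json.dumps({'instances': instances}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={
            'Authorization': 'Bearer ' + access_token,
            'Content-Type': 'application/json',
        },
        method='POST',
    )


# Obtaining the token (assumes google-auth is installed; not executed here):
#
#     import google.auth.transport.requests
#     credentials.refresh(google.auth.transport.requests.Request())
#     access_token = credentials.token
#
# Sending the request:
#
#     result = urllib.request.urlopen(request).read()
```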
We are creating an online platform and exposing a Julia API via an embedded code editor. The user can access the API and run some analysis on our web app. I have a question related to controlling access to the API and objects.
The API right now contains a database handle and other objects that are exposed to the user and can be used to hack the internal system.
Below is the current architecture:
UserProgram.jl
function doanalysis()
    data = getdata()
    # some analysis on data
end
InternalProgram.jl
const client = MongoClient()
const collection = MongoCollection(client,"dbname","collectionName")
function getdata()
    data = #some function to get data from collection
    return data
end
#after parsing the user program
doanalysis()
To run the user analysis, we pass the user program as a command-line argument (using the ArgParse module) and run the internal program as follows:
$ julia InternalProgram.jl --file UserProgram.jl
With this architecture, user potentially gets access to "client" and "collection" and can modify internal databases.
Is there a better way to solve this problem without exposing the objects?
I hope someone has an answer to this.
You will be exposing yourself to multiple types of vulnerabilities. As a general rule, executing user-supplied code is a VERY BAD IDEA.
1/ As you said, you'll potentially allow users to execute arbitrary code against your database.
2/ Your users will have access to all the power of Julia to do things on your server (download files they can later execute, for example, or access other servers and services on the server [MySQL, email, etc.]). Depending on the level of access of the Julia process, think unauthorized access to your file system, installing key loggers, running spam servers, etc.
3/ Users will be able to use Julia packages and get you into a lot of trouble; for example, they could add and use the Requests.jl package and execute DoS attacks on other servers.
If you really want to go this way, I recommend that you:
A/ set proper (minimal) permissions for the MongoDB user configured to be used in the app (ex: http://blog.mlab.com/2016/07/mongodb-tips-tricks-collection-level-access-control/)
B/ execute each user's code in a separate sandbox / container that only exposes the minimum necessary software
C/ have your containers running on a managed platform where tooling exists (firewalls) to monitor incoming and outgoing traffic (for example to block spam or DoS attacks)
In order to achieve B/ and C/, my recommendation is to use JuliaBox. I haven't used it myself, but it seems to be exactly what you need: https://github.com/JuliaCloud/JuliaBox
Once you get that running, you can also use https://github.com/JuliaWeb/JuliaWebAPI.jl
I want to read an S3 file from my (local) machine, through Spark (pyspark, really). Now, I keep getting authentication errors like
java.lang.IllegalArgumentException: AWS Access Key ID and Secret
Access Key must be specified as the username or password
(respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId
or fs.s3n.awsSecretAccessKey properties (respectively).
I looked everywhere here and on the web, tried many things, but apparently S3 has been changing over the last year or months, and all methods failed but one:
pyspark.SparkContext().textFile("s3n://user:password@bucket/key")
(note the s3n [s3 did not work]). Now, I don't want to use a URL with the user and password because they can appear in logs, and I am also not sure how to get them from the ~/.aws/credentials file anyway.
So, how can I read locally from S3 through Spark (or, better, pyspark) using the AWS credentials from the now standard ~/.aws/credentials file (ideally, without copying the credentials there to yet another configuration file)?
PS: I tried os.environ["AWS_ACCESS_KEY_ID"] = … and os.environ["AWS_SECRET_ACCESS_KEY"] = …, it did not work.
PPS: I am not sure where to "set the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties" (Google did not come up with anything). However, I did try many ways of setting these: SparkContext.setSystemProperty(), sc.setLocalProperty(), and conf = SparkConf(); conf.set(…); conf.set(…); sc = SparkContext(conf=conf). Nothing worked.
Yes, you have to use s3n instead of s3. s3 is some weird abuse of S3 the benefits of which are unclear to me.
You can pass the credentials to the sc.hadoopFile or sc.newAPIHadoopFile calls:
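rdd = sc.hadoopFile('s3n://my_bucket/my_file',
    'org.apache.hadoop.mapred.TextInputFormat',  # input format class
    'org.apache.hadoop.io.LongWritable',  # key class
    'org.apache.hadoop.io.Text',  # value class
    conf={
        'fs.s3n.awsAccessKeyId': '...',
        'fs.s3n.awsSecretAccessKey': '...',
    })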
The problem was actually a bug in Amazon's boto Python module, related to the fact that the MacPorts version is old: installing boto through pip solved the problem, and ~/.aws/credentials was correctly read.
Now that I have more experience, I would say that in general (as of the end of 2015) Amazon Web Services tools and Spark/PySpark have patchy documentation and can have some serious bugs that are very easy to run into. For problems like the first one, I would recommend updating the aws command line interface, boto and Spark every time something strange happens: this has "magically" solved a few issues for me already.
Here is a solution on how to read the credentials from ~/.aws/credentials. It makes use of the fact that the credentials file is an INI file which can be parsed with Python's configparser.
import os
import configparser
config = configparser.ConfigParser()
config.read(os.path.expanduser("~/.aws/credentials"))
aws_profile = 'default' # your AWS profile to use
access_id = config.get(aws_profile, "aws_access_key_id")
access_key = config.get(aws_profile, "aws_secret_access_key")
See also my gist at https://gist.github.com/asmaier/5768c7cda3620901440a62248614bbd0.
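To connect this to Spark, the parsed values can be put into a dict keyed by the `fs.s3n.*` property names from the error message, which is the shape the `conf` argument of `sc.hadoopFile` takes. This is a minimal sketch under that assumption; `s3n_conf_from_credentials` is a hypothetical helper name.

```python
import configparser
import os


def s3n_conf_from_credentials(path="~/.aws/credentials", profile="default"):
    """Read one profile from an AWS credentials file and return a dict
    keyed by the fs.s3n.* Hadoop property names."""
    config = configparser.ConfigParser()
    # config.read() returns the list of files it managed to parse.
    if not config.read(os.path.expanduser(path)):
        raise FileNotFoundError(path)
    return {
        "fs.s3n.awsAccessKeyId": config.get(profile, "aws_access_key_id"),
        "fs.s3n.awsSecretAccessKey": config.get(profile, "aws_secret_access_key"),
    }


# Intended use (not executed here; assumes an existing SparkContext `sc`):
#
#     conf = s3n_conf_from_credentials()
#     rdd = sc.hadoopFile("s3n://my_bucket/my_file", ..., conf=conf)
```

This keeps the keys out of the URL and out of the source code, and avoids copying them into a second configuration file.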
Environment variables setup could help.
The Spark FAQ, under the question "How can I access data in S3?", suggests setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
I cannot say much about the Java objects you have to give to the hadoopFile function, only that this function already seems to be deprecated in favor of "newAPIHadoopFile". The documentation on this is quite sketchy, and I feel like you need to know Scala/Java to really get to the bottom of what everything means.
In the meantime, I figured out how to actually get some S3 data into pyspark, and I thought I would share my findings.
The Spark API documentation says that it uses a dict that gets converted into a Java configuration (XML). I found the configuration for Java, which should reflect the values you put into the dict (see "How to access S3/S3n from local hadoop installation"):
bucket = "mycompany-mydata-bucket"
prefix = "2015/04/04/mybiglogfile.log.gz"
filename = "s3n://{}/{}".format(bucket, prefix)
config_dict = {"fs.s3n.awsAccessKeyId": "FOOBAR",
               "fs.s3n.awsSecretAccessKey": "BARFOO"}
# TextInputFormat yields (byte offset, line) pairs, so the key class is
# LongWritable and the value class is Text.
rdd = sc.hadoopFile(filename,
                    'org.apache.hadoop.mapred.TextInputFormat',
                    'org.apache.hadoop.io.LongWritable',
                    'org.apache.hadoop.io.Text',
                    conf=config_dict)
This code snippet loads the file from the bucket and prefix (file path in the bucket) specified on the first two lines.