Enable Cloud Vision API to access a file on Cloud Storage - authentication

I have already seen some similar questions, but none of them actually provides a full answer.
Since I cannot comment in that thread, I am opening a new one.
How do I address Brandon's comment below?
"...
In order to use the Cloud Vision API with a non-public GCS object,
you'll need to send OAuth authentication information along with your
request for a user or service account which has permission to read the
GCS object."?
I have the JSON file the system gave me, as described here, when I created the service account.
I am trying to run the API from a Python script.
It is not clear how to use it.

I'd recommend using the Vision API client library for Python to perform the call. You can install it on your machine (ideally in a virtualenv) by running the following command:
pip install --upgrade google-cloud-vision
Next, you'll need to set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. For example, on a Linux machine you'd do it like this:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
Finally, you just have to call the Vision API client method you need (here, for example, the label_detection method):
from google.cloud import vision
from google.cloud.vision import types

def detect_labels():
    """Detects labels in the file located in Google Cloud Storage."""
    client = vision.ImageAnnotatorClient()
    image = types.Image()
    image.source.image_uri = "gs://bucket_name/path_to_image_object"

    response = client.label_detection(image=image)
    labels = response.label_annotations

    print('Labels:')
    for label in labels:
        print(label.description)
By initializing the client with no parameters, the library will automatically look for the GOOGLE_APPLICATION_CREDENTIALS environment variable you've previously set and run on behalf of that service account. If you granted it permission to access the file, it will run successfully.
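If you'd rather not rely on the environment variable, you can also pass the key to the client explicitly. A minimal sketch, reusing the example key path from above:

from google.cloud import vision
from google.oauth2 import service_account

# Load the service account key explicitly instead of relying on
# GOOGLE_APPLICATION_CREDENTIALS (the path is the example one from above).
credentials = service_account.Credentials.from_service_account_file(
    "/home/user/Downloads/service-account-file.json")
client = vision.ImageAnnotatorClient(credentials=credentials)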

Related

Trigger DAG Run - 403

I am following this tutorial to build a Cloud Function that triggers a DAG run. I have run into a permission issue. Upon the function being triggered and thus trying to run the DAG, I get a permission error message. It reads as follows:
Service account does not have permission to access the IAP-protected application.
I have followed the recommendation in the tutorial to have a service account with the Composer User role. What am I missing?
Note: I am calling Airflow version 2's Stable REST API and my Composer is version 1.
-Diana
I found what may be a duplicate question here:
Receiving HTTP 401 when accessing Cloud Composer's Airflow Rest API
As Seng Cheong noted in their answer, the reason for this error is that Google Cloud seems to have issues adding service account IDs longer than 64 characters to the Airflow list of users. After changing my service account ID to one of 64 characters or fewer, I was able to trigger the DAG successfully. If you can't make your service account ID shorter, the Google documentation suggests adding the "numeric user id" corresponding to your service account directly. The steps for doing so can be found here: https://cloud.google.com/composer/docs/access-airflow-api#access_airflow_rest_api_using_a_service_account
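For reference, that numeric user ID can be looked up with gcloud; as far as I know it is exposed as the service account's oauth2ClientId field (the account email below is a placeholder):

gcloud iam service-accounts describe my-sa@my-project.iam.gserviceaccount.com --format="value(oauth2ClientId)"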
Best of luck, friend.

Calling an API that runs on another GCP project with Airflow Composer

I'm running a task with SimpleHTTPOperator on Airflow Composer. This task calls an API that runs on a Cloud Run service living in another project. This means I need a service account in order to access that project.
When I try to make a call to the API, I get the following error:
{secret_manager_client.py:88} ERROR - Google Cloud API Call Error (PermissionDenied): No access for Secret ID airflow-connections-call_to_api.
Did you add 'secretmanager.versions.access' permission?
What's a solution to such an issue?
Context : Cloud Composer and Cloud Run live in 2 different Projects
This specific error is unrelated to the cross-project scenario. It seems that you have configured Composer/Airflow to use Secret Manager as the primary backend for connections and variables. However, according to the error message, the service account used by Composer is missing the secretmanager.versions.access permission needed to access the connection (call_to_api) you have configured for the API.
Check this part of the documentation.
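In practice that usually means granting the service account used by Composer the Secret Manager Secret Accessor role (which includes secretmanager.versions.access) on the secret named in the error message, for example (the service account email is a placeholder):

gcloud secrets add-iam-policy-binding airflow-connections-call_to_api --member="serviceAccount:composer-sa@your-project.iam.gserviceaccount.com" --role="roles/secretmanager.secretAccessor"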

Configuring Google cloud bucket as Airflow Log folder

We just started using Apache Airflow in our project for our data pipelines. While exploring its features, we came to know about configuring a remote folder as the log destination in Airflow. For that we:
Created a Google Cloud Storage bucket.
Created a new GS connection from the Airflow UI.
I am not able to understand all the fields. I just created a sample GS bucket under my project from the Google console and gave that project ID to this connection, leaving the key file path and scopes blank.
Then I edited the airflow.cfg file as follows:
remote_base_log_folder = gs://my_test_bucket/
remote_log_conn_id = test_gs
After these changes I restarted the web server and scheduler, but my DAGs are still not writing logs to the GS bucket. I can see the logs being created in base_log_folder, but nothing is created in my bucket.
Is there any extra configuration needed on my side to get this working?
Note: Using Airflow 1.8. (I faced the same issue with Amazon S3 as well.)
Update on 20/09/2017:
Tried the GS method (screenshot attached), but I am still not getting logs in the bucket.
Thanks
Anoop R
I advise you to use a DAG to connect Airflow to GCP instead of the UI.
First, create a service account on GCP and download the JSON key.
Then execute this DAG (you can modify the scope of your access):
import json
from datetime import datetime

from airflow import DAG, settings
from airflow.models import Connection
from airflow.operators.python_operator import PythonOperator

def add_gcp_connection(ds, **kwargs):
    """Add an Airflow connection for GCP."""
    new_conn = Connection(
        conn_id='gcp_connection_id',
        conn_type='google_cloud_platform',
    )
    scopes = [
        "https://www.googleapis.com/auth/pubsub",
        "https://www.googleapis.com/auth/datastore",
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/devstorage.read_write",
        "https://www.googleapis.com/auth/logging.write",
        "https://www.googleapis.com/auth/cloud-platform",
    ]
    conn_extra = {
        "extra__google_cloud_platform__scope": ",".join(scopes),
        "extra__google_cloud_platform__project": "<name_of_your_project>",
        "extra__google_cloud_platform__key_path": "<path_to_your_json_key>",
    }
    conn_extra_json = json.dumps(conn_extra)
    new_conn.set_extra(conn_extra_json)

    session = settings.Session()
    if not (session.query(Connection)
                   .filter(Connection.conn_id == new_conn.conn_id)
                   .first()):
        session.add(new_conn)
        session.commit()
    else:
        msg = '\n\tA connection with `conn_id`={conn_id} already exists\n'
        msg = msg.format(conn_id=new_conn.conn_id)
        print(msg)

dag = DAG('add_gcp_connection', start_date=datetime(2016, 1, 1),
          schedule_interval='@once')

# Task to add the connection
AddGCPCreds = PythonOperator(
    dag=dag,
    task_id='add_gcp_connection_python',
    python_callable=add_gcp_connection,
    provide_context=True)
Thanks to Yu Ishikawa for this code.
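Once that DAG has run and the connection exists, you can point the remote logging settings in airflow.cfg at it. A minimal sketch, assuming the same [core] keys and bucket used in the question:

[core]
remote_base_log_folder = gs://my_test_bucket/
remote_log_conn_id = gcp_connection_id

Then restart the web server and scheduler as you already did.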
Yes, you need to provide additional information for both the S3 and the GCP connection.
S3
Configuration is passed via the extra field as JSON. You can provide only a profile
{"profile": "xxx"}
or credentials
{"profile": "xxx", "aws_access_key_id": "xxx", "aws_secret_access_key": "xxx"}
or path to config file
{"profile": "xxx", "s3_config_file": "xxx", "s3_config_format": "xxx"}
In case of the first option, boto will try to detect your credentials.
Source code - airflow/hooks/S3_hook.py:107
GCP
You can either provide key_path and scope (see Service account credentials) or credentials will be extracted from your environment in this order:
Environment variable GOOGLE_APPLICATION_CREDENTIALS pointing to a file with stored credentials information.
Stored "well known" file associated with gcloud command line tool.
Google App Engine (production and testing)
Google Compute Engine production environment.
Source code - airflow/contrib/hooks/gcp_api_base_hook.py:68
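For example, the Extra field of the GCP connection could look like this, using the same extra__google_cloud_platform__ keys as the DAG in the other answer (the project name and key path are placeholders):

{"extra__google_cloud_platform__project": "my-project", "extra__google_cloud_platform__key_path": "/path/to/key.json", "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform"}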
The reason for logs not being written to your bucket could be related to the service account rather than the Airflow configuration itself. Make sure it has access to the bucket in question. I had the same problems in the past.
Try adding more generous permissions to the service account, e.g. even project-wide Editor, and then narrow them down. You could also try using the GCS client with that key and see if you can write to the bucket, as in the sketch below.
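This is just a minimal sketch assuming the google-cloud-storage library; the key path and bucket name are placeholders:

from google.cloud import storage

# Authenticate with the same service account key the Airflow connection uses
# (the key path and bucket name below are placeholders).
client = storage.Client.from_service_account_json("/path/to/key.json")
bucket = client.bucket("my_test_bucket")

# If this upload succeeds, the account can write to the bucket.
bucket.blob("airflow-log-write-test.txt").upload_from_string("test")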
For me personally this scope works fine for writing logs: "https://www.googleapis.com/auth/cloud-platform"

SCOPES_WARNING in BigQuery when accessed from a Cloud Compute instance

Every time I use bq on a Cloud Compute instance, I get this:
/usr/local/share/google/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73: UserWarning: You have requested explicit scopes to be used with a GCE service account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.
warnings.warn(_SCOPES_WARNING)
This is a default f1-micro instance with Debian 8. I gave this instance access to all Cloud APIs, and its service account is also an owner of the project. I ran gcloud init, but this warning persists.
Is there something wrong?
I noticed that this warning did not appear on an older instance running SDK version 0.9.85; however, I now get it when creating a new instance or upgrading to the latest gcloud SDK.
The scopes warning can be safely ignored, as it's just telling you that the only scopes that will be used are the ones specified at instance creation time, which is the expected behavior of the default GCE service account.
It seems the 'bq' tool doesn't distinguish between the default service account on GCE and a regular service account and always tries to set the scopes explicitly. The warning comes from oauth2client, and it looks like it didn't display this warning in versions prior to v2.0.0.
I've created a public issue to track this, which you can star to get updates:
https://code.google.com/p/google-bigquery/issues/detail?id=557
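If you actually need different scopes, they have to be set when the instance is created, for example with something like this (the instance name is a placeholder):

gcloud compute instances create my-instance --scopes=cloud-platform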

Windows Azure Console for Worker Role Cloud Service

I have a worker role cloud service that I have recently developed on my local machine. The service exposes a WCF interface that receives a file as a byte array, recompiles the file, converts it to the appropriate format, then stores it in Azure Storage. I managed to get everything working using the Azure Compute Emulator on my machine and published the service to Azure and... nothing. Running it on my machine again, it works as expected. When I was working on it on my computer, the Azure Compute Emulator's console output was essential in getting the application running.
Is there similar functionality that can be tapped into on the Cloud Service via RDP, such as starting/restarting the role at the command prompt or in PowerShell? If not, what is the best way to debug/log what the worker role is doing (without using IntelliTrace)? I have diagnostics enabled in the project, but it doesn't seem to be giving me the same level of detail as the Compute Emulator console. I've rerun the role and the corresponding .NET application on localhost and was unable to find any errors in the console.
Edit: The Next Best Thing
Falling back to manual logging, I implemented a class that would feed text files into my Azure Storage account. Here's the code:
public class EventLogger
{
    public static void Log(string message)
    {
        CloudBlobContainer cbc;
        cbc = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("StorageClientAccount"))
            .CreateCloudBlobClient()
            .GetContainerReference("errors");
        cbc.CreateIfNotExist();
        cbc.GetBlobReference(string.Format("event-{0}-{1}.txt", RoleEnvironment.CurrentRoleInstance.Id, DateTime.UtcNow.Ticks)).UploadText(message);
    }
}
Calling EventLogger.Log() will create a new text file and record whatever message you pass in. I found an example in the answer below.
There is no console for worker roles that I'm aware of. If diagnostics isn't giving you any help, then you need to get a little hacky. Try tracing out messages and errors to blob storage yourself. Steve Marx has a good example of this here http://blog.smarx.com/posts/printf-here-in-the-cloud
As he notes in the article, this is not for production, just to help you find your problem.