Google Cloud Dataflow permission issues

Beginner in GCP here. I'm testing GCP Dataflow as part of an IoT project to move data from Pub/Sub to BigQuery. I created a Dataflow job from the "Export to BigQuery" button on the topic's page.
Apart from the fact that I can't delete a Dataflow job, I am hitting the following problem:
As soon as the job starts, I get this error:
Workflow failed. Causes: There was a problem refreshing your credentials. Please check: 1. Dataflow API is enabled for your project. 2. Make sure both the Dataflow service account and the controller service account have sufficient permissions. If you are not specifying a controller service account, ensure the default Compute Engine service account [PROJECT_NUMBER]-compute@developer.gserviceaccount.com exists and has sufficient permissions. If you have deleted the default Compute Engine service account, you must specify a controller service account. For more information, see: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#security_and_permissions_for_pipelines_on_google_cloud_platform. , There is no cloudservices robot account for your project. Please ensure that the Dataflow API is enabled for your project.
Here's where it's funny:
Dataflow API is definitely enabled, since I am looking at this from the Dataflow portion of the console.
Dataflow is using the default Compute Engine service account, which exists. The link the error points to says that this account is created automatically and has broad access to the project's resources. Well, does it?
Dataflow jobs also elude me: how can I restart, edit, or delete a Dataflow job?

Please verify the checklist below:
The Dataflow API should be enabled; check under APIs & Services. If you have just enabled it, wait some time for the change to propagate.
The [project-number]-compute@developer.gserviceaccount.com and service-[project-number]@dataflow-service-producer-prod.iam.gserviceaccount.com service accounts should both exist. If the dataflow-service-producer-prod account was not created, you can contact Dataflow support, or create it yourself and assign it the Cloud Dataflow Service Agent role. If you are using a Shared VPC, create it in the host project and assign the Compute Network User role.

Related

Not able to get Azure SQL Server Extended Events to work when Blob Storage is set to Enabled from selected virtual networks and IP addresses

So I have an Azure Database and want to test extended events with the database.
I was able to set up my Blob Storage container and got Extended Events from the Azure database to work as long as the Blob Storage network setting Public network access is set to Enabled from all networks. If I instead set it to Enabled from selected virtual networks and IP addresses, with Microsoft network routing checked and a Resource type of Microsoft.Sql/servers with its value set to All in current subscription, it still doesn't work.
I'm not exactly sure what I'm doing wrong and I'm not able to find any documentation on how to make it work without opening up to all networks.
The error I'm getting is:
The target, "5B2DA06D-898A-43C8-9309-39BBBE93EBBD.package0.event_file", encountered a configuration error during initialization. Object cannot be added to the event session. (null) (Microsoft SQL Server, Error: 25602)
Edit - Steps to fix the issue
@Imran: Your answer led me to get everything working. The information you gave and the link provided were enough for me to figure it out.
However, for anyone in the future I want to give better instructions.
The first step I had to do was:
All I had to do was run Set-AzSqlServer -ResourceGroupName [ResourceGroupName] -ServerName [AzureSQLServerName] -AssignIdentity.
This assigns the SQL Server an Azure Active Directory identity. After running the above command, you can see your new identity in Azure Active Directory under Enterprise applications: where you see the Application type == Enterprise Applications filter, click it, change it to Managed Identities, and click Apply. You should see your new identity.
The next step is to give your new identity the Storage Blob Data Contributor role on your container in Blob Storage. Go to your container and click Access Control (IAM) => Role assignments => Add => Add role assignment => Storage Blob Data Contributor => Managed identity => Select members => select your new identity, click Select, and then Review + assign.
The last step is to get SQL Server to use an identity when connecting to Blob Storage.
You do that by running the command below on your Azure SQL Server database.
CREATE DATABASE SCOPED CREDENTIAL [https://<mystorageaccountname>.blob.core.windows.net/<mystorageaccountcontainername>]
WITH IDENTITY = 'Managed Identity';
GO
You can see your new credentials when running
SELECT * FROM sys.database_scoped_credentials
The last thing I want to mention is that when creating Extended Events for an Azure SQL Server using SSMS, it gives you this link. That only works if you want your Blob Storage wide open. I think this is a disservice, and I wish there were instructions for the case where you don't want your Blob Storage wide open, using RBAC instead of SAS.
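As a side note, if you want to sanity-check that RBAC access to the container works (independently of SQL Server), a minimal sketch along these lines, assuming the Azure.Identity and Azure.Storage.Blobs NuGet packages and the same placeholder names as above, lists the blobs in the container using an Azure AD identity instead of a SAS token:
using System;
using Azure.Identity;
using Azure.Storage.Blobs;

// Sketch only: verify Azure AD / RBAC access to the container, with no SAS token involved.
// Run it as an identity that has the Storage Blob Data Contributor role on the container.
var serviceClient = new BlobServiceClient(
    new Uri("https://<mystorageaccountname>.blob.core.windows.net"),
    new DefaultAzureCredential());
var container = serviceClient.GetBlobContainerClient("<mystorageaccountcontainername>");
await foreach (var blob in container.GetBlobsAsync())
{
    // The .xel files written by the event session will show up here once it runs.
    Console.WriteLine(blob.Name);
}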
I tried to reproduce the same setup in my environment and got it working. To resolve this issue:
Check whether your account type is StorageV2 (general-purpose v2). If you have a general-purpose v1 or Blob Storage account, upgrade it: in the storage account, under Settings -> Configuration -> Upgrade.
Check whether you have selected Allow trusted Microsoft services to access this storage account under Exceptions. I also added the client IP address range and the VNet to the firewall.
Make sure you have the Microsoft.Authorization/roleAssignments/write permission on your storage account.
After enabling the firewall, write access to the storage account and audit logs is lost, so re-saving the audit settings from the portal is required for auditing to function.
Note: Auditing to storage behind firewalls using the user-managed identity authentication type is not presently supported.
When I tried to connect, it succeeded.
Reference:
Configure extended events in SQL Azure to the blob storage with Private Endpoint - Microsoft Community Hub by Sakshi Gupta

What is the correct Cloud SQL connection string syntax for dotnetcore app with Cloud Run?

I want to set up a .NET Core web application on Cloud Run with a Google Cloud SQL database. I easily deployed the database, which has a public IP, on Cloud SQL, and my web application as a Docker container on Cloud Run. I can access the database with SQL Server Management Studio without any difficulties, and the web app is up and running as expected. The only piece missing is the link between them that allows them to connect.
In my web app, I have a connection string in this format:
Data Source=***;Initial Catalog=***;User ID=***;Password=***;Pooling=true;Trusted_Connection=false;Connection Timeout=60;Integrated Security=false;Persist Security Info={0};Encrypt=true;TrustServerCertificate=true;MultipleActiveResultSets=true;
Now that I have the public IP and the connection name from Cloud SQL, what exactly should the connection string be, and/or what are the next steps?
Furthermore, in the connections tab under Cloud Run Service, I added the Cloud SQL connection. This is supposed to configure a Cloud SQL Proxy for me.
In order to connect to Cloud SQL from Cloud Run, you must follow this guide
You have already made some configurations in the Connections tab as stated in the Configuring Cloud Run section. You can check the guide for the Public IP since you configured your instance that way, to be sure that all steps were followed.
Briefly, the steps are:
Configure the service account for your service. Make sure that the service account has the appropriate Cloud SQL roles and permissions to connect to Cloud SQL.
The service account for your service needs one of the following IAM roles:
Cloud SQL Client (preferred)
Cloud SQL Admin
If the authorizing service account belongs to a different project than the Cloud SQL instance, the Cloud SQL Admin API and IAM permissions will need to be added for both projects.
Like any configuration change, setting a new configuration for the Cloud SQL connection leads to the creation of a new Cloud Run revision. Subsequent revisions will also automatically get this Cloud SQL connection, unless you make explicit updates to change it.
Go to Cloud Run
Configure the service:
If you are adding Cloud SQL connections to an existing service:
Click on the service name.
Click on the Connections tab.
Click Deploy.
Enable connecting to a Cloud SQL instance:
Click Advanced Settings.
Click on the Connections tab.
If you are adding a connection to a Cloud SQL instance in your project, select the desired Cloud SQL instance from the dropdown menu.
If you are deleting a connection, hover your cursor to the right of the connection to display the Trash icon, and click it.
Click Create or Deploy.
After you've double-checked the steps above, you can continue with the section Connecting to Cloud SQL, following the steps on the Public IP tab.
Connect with Unix sockets
Once correctly configured, you can connect your service to your Cloud SQL instance's Unix domain socket accessed on the environment's filesystem at the following path: /cloudsql/INSTANCE_CONNECTION_NAME.
The INSTANCE_CONNECTION_NAME can be found on the Overview page for your instance in the Google Cloud Console or by running the following command:
gcloud sql instances describe [INSTANCE_NAME].
These connections are automatically encrypted without any additional configuration.
The code samples shown below are extracts from more complete examples on the GitHub site. To see this snippet in the context of a web application, view the README on GitHub.
// Equivalent connection string:
// "Server=<dbSocketDir>/<INSTANCE_CONNECTION_NAME>;Uid=<DB_USER>;Pwd=<DB_PASS>;Database=<DB_NAME>;Protocol=unix"
String dbSocketDir = Environment.GetEnvironmentVariable("DB_SOCKET_PATH") ?? "/cloudsql";
String instanceConnectionName = Environment.GetEnvironmentVariable("INSTANCE_CONNECTION_NAME");
var connectionString = new MySqlConnectionStringBuilder()
{
    // The Cloud SQL proxy provides encryption between the proxy and instance.
    SslMode = MySqlSslMode.None,
    // Remember - storing secrets in plain text is potentially unsafe. Consider using
    // something like https://cloud.google.com/secret-manager/docs/overview to help keep
    // secrets secret.
    Server = String.Format("{0}/{1}", dbSocketDir, instanceConnectionName),
    UserID = Environment.GetEnvironmentVariable("DB_USER"),     // e.g. 'my-db-user'
    Password = Environment.GetEnvironmentVariable("DB_PASS"),   // e.g. 'my-db-password'
    Database = Environment.GetEnvironmentVariable("DB_NAME"),   // e.g. 'my-database'
    ConnectionProtocol = MySqlConnectionProtocol.UnixSocket
};
connectionString.Pooling = true;
// Specify additional properties here.
return connectionString;
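For completeness, here is a minimal sketch of how the builder returned above might be used to open a connection; the GetMysqlConnectionString name is hypothetical and stands in for whatever method wraps the snippet above:
// Hypothetical usage of the builder above (a sketch, not part of the official sample).
MySqlConnectionStringBuilder builder = GetMysqlConnectionString();
using (var connection = new MySqlConnection(builder.ConnectionString))
{
    connection.Open();
    using (var command = new MySqlCommand("SELECT 1;", connection))
    {
        // A trivial query just to confirm the Unix-socket connection works.
        var result = command.ExecuteScalar();
    }
}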
Google recommends that you use Secret Manager to store sensitive information such as SQL credentials. You can pass secrets as environment variables or mount them as a volume with Cloud Run.
After creating a secret in Secret Manager, update an existing service with the following command:
gcloud run services update SERVICE_NAME \
--add-cloudsql-instances=INSTANCE_CONNECTION_NAME \
--update-env-vars=INSTANCE_CONNECTION_NAME=INSTANCE_CONNECTION_NAME_SECRET \
--update-secrets=DB_USER=DB_USER_SECRET:latest \
--update-secrets=DB_PASS=DB_PASS_SECRET:latest \
--update-secrets=DB_NAME=DB_NAME_SECRET:latest
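Inside the container, a secret exposed this way is just an environment variable, or a file if you chose to mount it as a volume. A minimal sketch, reusing the variable names from the command above (the /secrets/DB_PASS path is only a hypothetical mount point):
using System;
using System.IO;

// Secrets passed with --update-secrets arrive as ordinary environment variables.
string dbUser = Environment.GetEnvironmentVariable("DB_USER");
string dbPass = Environment.GetEnvironmentVariable("DB_PASS");
string dbName = Environment.GetEnvironmentVariable("DB_NAME");

// If a secret is mounted as a volume instead, read the file at the (example) mount path.
string dbPassFromFile = File.ReadAllText("/secrets/DB_PASS").Trim();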
See also:
GoogleCloudPlatform/dotnet-docs-samples on GitHub

Log Analytics - Pricing tier doesn't match the subscriptions billing model

I have a Log Analytics resource set up on the PerGB pricing tier, and I am trying to deploy a solution that uses an Azure Automation account.
When deploying, I see this error on my Log Analytics resource: Pricing tier doesn't match the subscription's billing model.
It is my understanding that something recently changed in OMS that may cause this. I have already tried to install the Upgrade Readiness solution but that didn't solve the problem.
Use the SKU PerGB2018 instead of Standard or Free.

Google ML Engine - Unable to log objective metric due to exception <HttpError 403>

I am running a TensorFlow application on the Google ML Engine with hyper-parameter tuning and I've been running into some strange authentication issues.
My Data and Permissions Setup
My trainer code supports two ways of obtaining input data for my model:
Getting a table from BigQuery.
Reading from a .csv file.
For my IAM permissions, I have two members set up:
My user account:
Assigned to the following IAM roles:
Project Owner (roles/owner)
BigQuery Admin (roles/bigquery.admin)
Credentials were created automatically when I used gcloud auth application-default login
A service account:
Assigned to the following IAM roles:
BigQuery Admin (roles/bigquery.admin)
Storage Admin (roles/storage.admin)
PubSub Admin (roles/pubsub.admin)
Credentials were downloaded to a .json file when I created it in the Google Cloud Platform interface.
The Problem
When I run my trainer code on the Google ML Engine using my user account credentials and reading from a .csv file, everything works fine.
However, if I try to get my data from BigQuery, I get the following error:
Forbidden: 403 Insufficient Permission (GET https://www.googleapis.com/bigquery/v2/projects/MY-PROJECT-ID/datasets/MY-DATASET-ID/tables/MY-TABLE-NAME)
This is the reason why I created a service account, but the service account has a separate set of issues. When using the service account, I am able to read from both a .csv file and from BigQuery, but in both cases, I get the following error at the end of each trial:
Unable to log objective metric due to exception <HttpError 403 when requesting https://pubsub.googleapis.com/v1/projects/MY-PROJECT-ID/topics/ml_MY-JOB-ID:publish?alt=json returned "User not authorized to perform this action.">.
This doesn't cause the job to fail, but it prevents the objective metric from being recorded, so the hyper-parameter tuning does not provide any helpful output.
The Question
I'm not sure why I'm getting these permission errors when my IAM members are assigned to what I'm pretty sure are the correct roles.
My trainer code works in every case when I run it locally (although PubSub is obviously not being used when running locally), so I'm fairly certain it's not a bug in the code.
Any suggestions?
Notes
There was one point at which my service account was getting the same error as my user account when trying to access BigQuery. The solution I stumbled upon is a strange one. I decided to remove all roles from my service account and add them again, and this fixed the BigQuery permission issue for that member.
Thanks for the very detailed question.
To explain what happened here: in the first case, Cloud ML Engine used an internal service account (the one that is added to your project with the Cloud ML Service Agent role). Due to some internal security considerations, that service account is restricted from accessing BigQuery, hence the first 403 error that you saw.
Now, when you replaced the machine credentials with your own service account using the .json credentials file, that restriction went away. However, your service account didn't have access to all of the internal systems, such as the Pub/Sub service used internally by the hyperparameter tuning mechanism. Hence the Pub/Sub error in the second case.
There are a few possible solutions to this problem:
On the Cloud ML Engine side, we're working on better BigQuery support out of the box, although we don't have an ETA at this point.
Your approach with a custom service account might work as a short-term solution as long as you don't use hyperparameter tuning. However, this is obviously fragile because it depends on implementation details in Cloud ML Engine, so I wouldn't recommend relying on it long-term.
Finally, consider exporting data from BigQuery to GCS first and using GCS to read training data (a sketch of the export step follows below). This scenario is well supported in Cloud ML Engine. Besides, you'll get performance gains on large datasets compared to reading from BigQuery directly: the current implementation of BigQueryReader in TensorFlow has suboptimal performance characteristics, which we're also working to improve.
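For the export route, a minimal sketch using the Google.Cloud.BigQuery.V2 .NET client is shown below; the bucket name and file pattern are placeholders, and the same export can be done with the bq extract command or the Python client:
using Google.Cloud.BigQuery.V2;

// Sketch: export a BigQuery table to GCS so training reads CSV shards from GCS instead.
var client = BigQueryClient.Create("MY-PROJECT-ID");
var table = client.GetTableReference("MY-DATASET-ID", "MY-TABLE-NAME");
// A wildcard URI lets BigQuery shard a large table into multiple files.
var job = client.CreateExtractJob(table, "gs://my-training-bucket/data/part-*.csv");
job.PollUntilCompleted().ThrowOnAnyError();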

Why Elastic MapReduce job flow failed in AWS MapReduce?

I created a job flow in AWS Elastic MapReduce: a Contextual Advertising (Hive Script) job flow with 'Start interactive Hive Session', m1.small instances, no VPC subnet id, and Configure Hadoop under Configure Bootstrap Actions.
Now the job flow goes into the STARTING state, and after 15-20 minutes it goes into the FAILED state; it never reaches the WAITING state.
It shows "Last State Change Reason: User account is not authorized to call EC2 "
I gave myself PowerUserAccess through IAM, and I have also attached the following policies to my user:
1. AmazonEC2FullAccess
2. AmazonElasticMapReduceFullAccess
3. IAMFullAccess
After attaching all these policies, it still shows "User account is not authorized to call EC2".
Please guide. Thanks.
EMR builds on other AWS services that you also need to be subscribed to. Granting IAM privileges to call EC2, S3, and EMR is not sufficient.