How can I execute multiple bq commands for different projects using gcloud configurations - google-bigquery

On my Linux server, I have a 'load_app' user account. My plan was to write a generic shell script to load data into BigQuery using the bq tool. gcloud is installed and I have multiple configurations in my environment. These configurations point to different service and user accounts for different projects. I am trying to set it up such that I can run multiple executions of this script at the same time for different configurations.
I tried setting CLOUDSDK_ACTIVE_CONFIG_NAME in my script; the configuration name is passed in to the script as a parameter. This setting should switch the active configuration; however, I am getting the following error:
ERROR: (bq) There was a problem refreshing your current auth tokens: invalid_grant: Bad Request
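
For reference, a minimal sketch of the kind of script described above; the configuration, dataset, and table names here are hypothetical:

#!/bin/bash
# Usage: ./bq_load.sh <config-name> <source-file>
# Pin this run to one named gcloud configuration so concurrent runs
# (with different config names) do not depend on the global active config.
export CLOUDSDK_ACTIVE_CONFIG_NAME="$1"

# Load the file into a placeholder dataset.table in whichever project
# and account the named configuration points at.
bq load --source_format=CSV my_dataset.my_table "$2"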

Related

Automating Zope5 Database Pack

I tried asking on the Plone forums but no one had any good responses.
I am running Zope5, no ZeoServer, no Plone, with Apache as a frontend proxy.
In the old Zope2 there was a script called zodb-pack that could pack the database from the command line. This is no longer included with Zope5 and I am searching for a way to pack the db from the command line.
Also, Apache is set up for client certificate authentication, so I cannot do something like:
curl -X POST https://username:password@zope.domain.com
I also don't want to hardcode that type of curl statement because of the need to include the username and password.
My Zope is running in a Docker container, so I thought about doing something like:
source /zope5/bin/activate
python scriptname
with a Python script along the lines of:
from ZODB.config import databaseFromString
from ZODB.serialize import referencesf  # reference-finding function required by pack()
import time

# Build the database object from the same <zodb_config> section that Zope uses
db = databaseFromString("<zodb_config>")
storage = db.storage

# Pack everything up to the current time, then close cleanly
storage.pack(time.time(), referencesf)
db.close()
but I'm not sure that's the correct way to do this. Basically, I just want the bash script that automates the server backups to pack the Zope DB before backing it up, but I need a command-line way to do so.
I cannot use any solution that requires me to modify how Zope runs, nor requires me to stop Zope to perform the pack.
Of course I can manually go to the ZMI's Control Panel and click Pack, but like I said, I was trying to automate it so it could run during off-peak hours.
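
If the Python approach above pans out, the backup script could call it inside the container along these lines; the container name and script path are hypothetical:

# Run the pack script inside the Zope container with the container's own
# virtualenv, just before the backup step (names are placeholders).
docker exec zope_container /zope5/bin/python /zope5/scripts/pack_zodb.py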

SSIS flat file folder permission error when NOT running from SQL Server Agent

Setup: A pretty standard data export SSIS package (SQL Server 2016 compatible), created in VS2019/Data Tools and deployed using the SSIS Project Deployment model to the Integration Services Catalog of a SQL Server 2016 instance. The package creates files in a network folder before sending the file out via FTP and putting a copy of the file in a Sent folder.
The project requirements include having the package running on a schedule using "default" parameter values, as well as allowing users to manually run the package using "non-default" parameter values from within a stand-alone application.
Current behavior: the package behaves correctly when run from a SQL Server Agent Job that is configured with a SQL proxy and credentials mapped to a domain login with the proper permissions for the network folder.
Problem: the Data Flow task fails to create the file with a "Cannot open the datafile" error when running the package directly using any of the following methods (even when the "current" session is using the same credentials as the SQL Server Credentials/Proxy used by the SQL Server Agent Job):
Using SSMS to right-click on the package and selecting Execute
Using the DTEXEC command-line utility
Using the SSISDB.catalog.start_execution SQL Server stored procedure
As far as I'm aware, these are the only methods capable of starting a SSIS package and changing the package's parameter values. I either need to get one of the latter 2 methods to work, find another option that allows for changing the parameter values while launching the package, or use one of 2 techniques I'm aware of (detailed below) that would add yet another failure point to the process as well as other potential issues.
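
For reference, this is the style of dtexec invocation meant above, overriding a project parameter at launch; the server, catalog path, and parameter name are all hypothetical:

dtexec /ISServer "\SSISDB\ExportFolder\ExportProject\ExportPackage.dtsx" /Server "SQLSERVER01" /Par "$Project::OutputFolder";"\\fileserver\exports\manual"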
Note: If the process is changed to initially create the file on the SQL Server's local harddrive, then the Data Flow task succeeds, but the later copy to Sent folder task fails with a very similar permissions error.
Alternative #1: this technique requires creating a new table, loading the parameter values into the table, and changing the package to check the table and potentially set its parameters/variables based on what it finds. The package can then be launched using a SQL Server Agent Job (for which there are multiple methods of manual launch), and if the calling object has correctly populated the table, the package will behave as if its parameters were changed at runtime; otherwise it will run with the default values.
Alternative #2: Change all folders used by the package to point to folders local to the SQL Server instance and then create a separate scheduled task/application/whatever, with the valid credentials, that would synchronize or move the files to their proper network folders.
even when the "current" session is using the same credentials as the SQL Server Credentials/Proxy used by the SQL Server Agent Job
This is probably because the account is not logged on locally at the SQL Server, and so it's a Double-Hop Impersonation scenario, and would require Kerberos Constrained Delegation to be configured.
And you are correct in assessing the options. The general solution is to invoke catalog.start_execution from a session running on the SQL Server, and an Agent Job is the simplest built-in way to do this (the others being xp_cmdshell, Service Broker Activation, or SQL CLR).
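
As a rough sketch of that Agent Job route combined with Alternative #1, the stand-alone application could populate the parameter table and then start the job from any session that can reach the server; the table, column, and job names below are placeholders:

# Write the desired runtime value into the (hypothetical) parameter table,
# then start the Agent Job; the job runs under its proxy, avoiding the double hop.
sqlcmd -S SQLSERVER01 -d ExportDb -Q "INSERT INTO dbo.PackageRunParameters (OutputFolder) VALUES (N'\\fileserver\exports\manual');"
sqlcmd -S SQLSERVER01 -Q "EXEC msdb.dbo.sp_start_job @job_name = N'Run Export Package';"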

Run single SQL build pipeline at a time

TLDR
I want to run only one instance at a time of a build that shares a resource which can only handle one execution at a time, while still being able to run other builds on multiple agents at once. Another option is to run an Azure SQL database instance within the build, if that is even possible.
1.) How can I restrict builds that share a resource to one agent at a time? I was looking for something like a named Azure agent that can be limited, or some sort of namespace that can only be used by one build at a time.
2.) What would be a better approach to testing a SQL install script as part of a build pipeline?
Details
I have several build pipelines and release pipelines set up in Azure. One of my pipelines tests a large SQL script for initializing new instances of a database. This is performed using the Azure SQL Execute Query task. Any errors encountered when running the SQL are supposed to be reported back to GitHub as a failed build. However, when I increased the number of agents from 1 to 2, I occasionally encounter an issue where a build is triggered before the previous one finishes. This breaks the first build.
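
As a rough sketch of the alternative mentioned in the TLDR (a throwaway Azure SQL database per build), an Azure CLI script step in the pipeline could do something like the following; the resource group, logical server, and database names are hypothetical, and the logical server is assumed to already exist:

# Create a database just for this build, run the install script against it,
# then drop it so parallel builds never share the same instance.
az sql db create --resource-group build-rg --server build-sqlserver --name "install_test_$BUILD_BUILDID" --service-objective S0
# ... run the SQL initialization script against install_test_$BUILD_BUILDID here ...
az sql db delete --resource-group build-rg --server build-sqlserver --name "install_test_$BUILD_BUILDID" --yes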
Here are the agents I am using
To limit builds to one agent, if I understand you correctly, you can try the steps below.
First, go to your build pipeline and edit it. Specify the demands for your pipeline; the pipeline will then only run on agents that satisfy those demands.
For example, if I specify a demand on the agent name, then the pipeline will only run on the agent named agentname.
You can also find the detailed capabilities of your agent and define custom capabilities: go to Agent pools under Pipelines on your Project Settings page.
Select your agent pool, select your agent, and use Add a new capability to add custom capabilities. You can then use the custom-defined capabilities as demands.
Update:
In a YAML-style pipeline, you can specify which agent runs your pipeline by defining the pool (its name, demands, or vmImage). Check here for details:
pool:
  name: string # name of the pool to run this job in
  demands: string | [ string ] # see below
  vmImage: string # name of the VM image you want to use
You can also try using the Azure Pipelines agent pool, where you can point a specific agent to run your pipeline.
Hope you find this helpful.

Google cloud dataproc failing to create new cluster with initialization scripts

I am using the below command to create a Dataproc cluster:
gcloud dataproc clusters create informetis-dev \
  --initialization-actions "gs://dataproc-initialization-actions/jupyter/jupyter.sh,gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh,gs://dataproc-initialization-actions/hue/hue.sh,gs://dataproc-initialization-actions/ipython-notebook/ipython.sh,gs://dataproc-initialization-actions/tez/tez.sh,gs://dataproc-initialization-actions/oozie/oozie.sh,gs://dataproc-initialization-actions/zeppelin/zeppelin.sh,gs://dataproc-initialization-actions/user-environment/user-environment.sh,gs://dataproc-initialization-actions/list-consistency-cache/shared-list-consistency-cache.sh,gs://dataproc-initialization-actions/kafka/kafka.sh,gs://dataproc-initialization-actions/ganglia/ganglia.sh,gs://dataproc-initialization-actions/flink/flink.sh" \
  --image-version 1.1 --master-boot-disk-size 100GB --master-machine-type n1-standard-1 \
  --metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance" \
  --num-preemptible-workers 2 --num-workers 2 --preemptible-worker-boot-disk-size 1TB \
  --properties hive:hive.metastore.warehouse.dir=gs://informetis-dev/hive-warehouse \
  --worker-machine-type n1-standard-2 --zone asia-east1-b --bucket info-dev
But Dataproc failed to create the cluster, with the following errors in the failure file:
+ mysql -u hive -phive-password -e ''
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)
+ mysql -e 'CREATE USER '\''hive'\'' IDENTIFIED BY '\''hive-password'\'';'
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)
Does anyone have any idea what is behind this failure?
It looks like you're missing the --scopes sql-admin flag as described in the initialization action's documentation, which will prevent the CloudSQL proxy from being able to authorize its tunnel into your CloudSQL instance.
Additionally, aside from just the scopes, you need to make sure the default Compute Engine service account has the right project-level permissions in whichever project holds your CloudSQL instance. Normally the default service account is a project editor in the GCE project, so that should be sufficient when combined with the sql-admin scopes to access a CloudSQL instance in the same project, but if you're accessing a CloudSQL instance in a separate project, you'll also have to add that service account as a project editor in the project which owns the CloudSQL instance.
You can find the email address of your default compute service account under the IAM page for the project deploying Dataproc clusters, with the name "Compute Engine default service account"; it should look something like <number>@project.gserviceaccount.com.
I am assuming that you already created the Cloud SQL instance with something like this, correct?
gcloud sql instances create g-test-1022 \
--tier db-n1-standard-1 \
--activation-policy=ALWAYS
If so, then it looks like the error is in how the argument for the metadata is formatted. You have this:
--metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance”
Unfortunately, the zone looks to be incomplete (asia-east1 instead of asia-east1-b).
Additionally, with that many initialization actions running, you'll want to provide a pretty generous initialization action timeout so the cluster does not assume something has failed while your actions take a while to install. You can do that by specifying:
--initialization-action-timeout 30m
That will allow the cluster to give the initialization actions 30 minutes to bootstrap.
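
Putting the suggestions above together, the create command would gain flags along these lines; everything else stays as in the original command from the question:

gcloud dataproc clusters create informetis-dev \
  --scopes sql-admin \
  --initialization-action-timeout 30m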
Around the time you reported this, an issue had been detected with the Cloud SQL proxy initialization action. It is most probable that this issue is what affected you.
Nowadays, it shouldn't be an issue.

Mac terminal restriction

I'm building a web API that uses a command-line tool on the server.
This command does certain build tasks and it has many optional arguments.
I'm not so happy about building a REST API to handle all the arguments and escape/validate every security risk there is...
Is there any way to whitelist only one command so that it can be run by a certain process/program or user?
Or can it be as easy as validating the command: if it contains ';', consider it unsafe to run?
Or can you create a little sandbox?
You could create a user (login account) on your server that runs your whitelisted command instead of a regular shell when he logs in, if that's what you mean.
For example, I have worked at some sites where there is a user account called "reset" and another called "mountall". So, if you know the password of that account, when you log in, certain databases get reset and you get logged out, or all filesystems in /etc/fstab get mounted and then you get logged out.
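
A minimal sketch of that idea on macOS, assuming a dedicated account (here called builder) already exists and /usr/local/bin/run_build.sh is the single whitelisted command; both names are placeholders:

# Make the whitelisted script this account's login shell, so logging in as
# 'builder' can only ever run that one command.
sudo dscl . -create /Users/builder UserShell /usr/local/bin/run_build.sh
sudo chmod 755 /usr/local/bin/run_build.sh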