Creating a BigQuery connection for Airflow using a config file - google-bigquery

I am trying to create a BigQuery connection. The config below is present in a YAML file:
gcp-conn:
  conn_type: google_cloud_platform
  conn_extra: '{ "extra__google_cloud_platform__key_path":"/usr/local/airflow/key.json", "extra__google_cloud_platform__project": "<project_name>", "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform"}'
Command: inv create-airflow-connections --env-file <yml_file>
The connection gets created, but when I browse it from the UI, it leads me to an "Oops" page with this error:
Error:
File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 3054, in on_form_prefill
value = d.get(field, '')
AttributeError: 'str' object has no attribute 'get'
Any idea why this is happening?

I believe it wants conn_extra as a nested mapping rather than a JSON string, something like:
- conn_id: bigquery-warehouse
  conn_type: google_cloud_platform
  conn_extra:
    extra__google_cloud_platform__project: "my_google_cloud_project_id"
    extra__google_cloud_platform__key_path: "usr/local/airflow/service-account.json"
    extra__google_cloud_platform__scope: "https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive"
- conn_id: google_cloud_default
  conn_type: google_cloud_platform
  conn_extra:
    extra__google_cloud_platform__project: "my_google_cloud_project_id"
    extra__google_cloud_platform__key_path: "usr/local/airflow/service-account.json"
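For what it's worth, the traceback is consistent with conn.extra ending up JSON-encoded twice: the YAML value is already a JSON string, and the loading tool may serialize it again before storing it, so the UI's extra parsing yields a plain string instead of a dict. A quick sketch of that failure mode using only the standard library (the field name is just an example):
import json

extra = {"extra__google_cloud_platform__project": "my_google_cloud_project_id"}

stored_once = json.dumps(extra)         # a JSON object; parsing it back gives a dict
stored_twice = json.dumps(stored_once)  # a JSON string wrapping that string; parsing it back gives a str

print(json.loads(stored_once).get("extra__google_cloud_platform__project"))  # works
json.loads(stored_twice).get("extra__google_cloud_platform__project")        # AttributeError: 'str' object has no attribute 'get'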

Related

DBT: How to fix Database Error Expecting Value?

I was running into trouble today while running Airflow and airflow-dbt-python. I tried to debug using the logs, and the error shown there was this one:
[2022-12-27, 13:53:53 CET] {functions.py:226} ERROR - 12:53:53.642186 [error] [MainThread]: Encountered an error:
Database Error
Expecting value: line 2 column 5 (char 5)
Quite a weird one.
Possibly check the credentials file that allows DBT to run queries on your DB (in our case we run DBT with BigQuery); in our case the credentials file was empty. We even tried running DBT directly in the worker instead of through Airflow, and got exactly the same error. Unfortunately, this error is not very explicit.
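"Expecting value: line N column M" is the wording of Python's JSON decoder, so a quick way to confirm this is to try parsing the keyfile yourself (the path below is illustrative):
import json

# Illustrative path to the service-account keyfile your dbt profile points at
keyfile = "/usr/local/airflow/service-account.json"

with open(keyfile) as f:
    json.load(f)  # raises json.JSONDecodeError ("Expecting value ...") if the file is empty or truncated
print("keyfile parses as valid JSON")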

SQLite3 database is locked in Azure

I have a Flask server running on Azure App Service with SQLite3 as the database. I am unable to update SQLite3 because it reports that the database is locked.
2018-11-09T13:21:53.854367947Z [2018-11-09 13:21:53,835] ERROR in app: Exception on /borrow [POST]
2018-11-09T13:21:53.854407246Z Traceback (most recent call last):
2018-11-09T13:21:53.854413046Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
2018-11-09T13:21:53.854417846Z response = self.full_dispatch_request()
2018-11-09T13:21:53.854422246Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
2018-11-09T13:21:53.854427146Z rv = self.handle_user_exception(e)
2018-11-09T13:21:53.854431646Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
2018-11-09T13:21:53.854436146Z reraise(exc_type, exc_value, tb)
2018-11-09T13:21:53.854440346Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
2018-11-09T13:21:53.854444746Z raise value
2018-11-09T13:21:53.854448846Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
2018-11-09T13:21:53.854453246Z rv = self.dispatch_request()
2018-11-09T13:21:53.854457546Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
2018-11-09T13:21:53.854461846Z return self.view_functions[rule.endpoint](**req.view_args)
2018-11-09T13:21:53.854466046Z File "/home/site/wwwroot/application.py", line 282, in borrow
2018-11-09T13:21:53.854480146Z cursor.execute("UPDATE books SET stock = stock - 1 WHERE bookid = ?",(bookid,))
2018-11-09T13:21:53.854963942Z sqlite3.OperationalError: database is locked
Here is the route -
@app.route('/borrow', methods=["POST"])
def borrow():
    # import pdb; pdb.set_trace()
    body = request.get_json()
    user_id = body["userid"]
    bookid = body["bookid"]
    conn = sqlite3.connect("database.db")
    cursor = conn.cursor()
    date = datetime.now()
    expiry_date = date + timedelta(days=30)
    cursor.execute("UPDATE books SET stock = stock - 1 WHERE bookid = ?", (bookid,))
    # conn.commit()
    cursor.execute("INSERT INTO borrowed (issuedate,returndate,memberid,bookid) VALUES (?,?,?,?)", ("xxx", "xxx", user_id, bookid))
    conn.commit()
    cursor.close()
    conn.close()
    return json.dumps({"status": 200, "conn": "working with datess update"})
I tried checking the database integrity using a PRAGMA; there was no integrity loss. So I don't know what might be causing that error. Any help is appreciated :)
I use Azure App Service on Docker on Linux, and have the same issue. If you are using Azure App Service on Windows, the problem is different from mine.
The problem is that /home is mounted as a CIFS filesystem, which cannot handle SQLite3's locking.
My workaround is to copy the db.sqlite3 file to some directory other than /home, and to set the permissions and ownership of the db.sqlite3 file and its directory properly, then let my project read/write it. However, this workaround is pretty awkward and I don't recommend it.
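A minimal sketch of that workaround in Python (paths and permissions are illustrative; note that directories outside /home are typically not persisted across App Service restarts):
import os
import shutil
import sqlite3

SRC = "/home/site/wwwroot/database.db"  # original file on the CIFS-mounted /home share
DST = "/tmp/appdata/database.db"        # any non-CIFS local directory; illustrative path

os.makedirs(os.path.dirname(DST), exist_ok=True)
shutil.copy(SRC, DST)
os.chmod(DST, 0o664)                    # make sure the app user can read and write the copy

conn = sqlite3.connect(DST)             # point the app at the copy instead of the original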
Presumably this solution is not safe for production workloads but at least I got it working by executing the following command:
sqlite3 <database-file> 'PRAGMA journal_mode=wal;'
After running the above command, my database stored on an Azure File share works inside a container Web App.
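If you would rather set it from the application itself, here is a minimal sketch using Python's built-in sqlite3 module (reusing the database.db file from the question); WAL mode is stored in the database file, so this only has to run once:
import sqlite3

conn = sqlite3.connect("database.db")
conn.execute("PRAGMA journal_mode=WAL;")  # switch the database file to write-ahead logging
conn.close()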
I got it working by setting up the Azure mount options with the following configuration:
dir_mode=0777,file_mode=0777,uid=0,gid=0,mfsymlinks,nobrl,cache=strict
But the real solution is to add the nobrl flag (no byte-range locks).
Here is a StorageClass example for Kubernetes:
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azureclass
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - nobrl
  - cache=strict
parameters:
  skuName: Standard_LRS
This answer appears toward the top of a typical Google search for this issue so I thought I'd add a couple of additional tips:
For those running JavaScript and using Sequelize as the interface to your SQLite DB, running
await sequelize.query('PRAGMA journal_mode=WAL;')
prior to creating your database will allow you to read/write the DB file in an Azure web app running under a Linux service plan. I have a separate script that creates the database via a call to sequelize.sync(). I'm storing the DB file in a separate directory under /home within the Linux container's file system. It seems to run fine, and my workload is expected to be very light. Note that you don't need to set the journal mode again when your app starts and connects to the database; that mode is stored in the file itself (this wasn't obvious from the SQLite docs).

Error in bq shell loading a Datastore backup with write_disposition as WRITE_APPEND

1. I tried to load into an existing table [using a Datastore backup file].
2. The bq shell asked me to add write_disposition=WRITE_APPEND to load into the existing table.
3. When I do that, it throws the following error:
load --source_format=DATASTORE_BACKUP --write_disposition=WRITE_append --allow_jagged_rows=None sample_red.t1estchallenge_1 gs://test.appspot.com/bucket/ahFzfnZpcmdpbi1yZWQtdGVzdHJBCxIcX0FFX0RhdGFzdG9yZUFkbWluX09wZXJhdGlvbhiBwLgCDAsSFl9BRV9CYWNrdXBfSW5mb3JtYXRpb24YAQw.entity.backup_info
Error parsing command: flag --allow_jagged_rows=None: ('Non-boolean argument to boolean flag',None)
I tried --allow_jagged_rows=0 and --allow_jagged_rows=None; nothing works, I just get the same error.
Please advise on this.
UPDATE: As Mosha suggested, --allow_jagged_rows=false has worked. It should come before --write_disposition=WRITE_TRUNCATE. But this has led to another issue with encoding. Can anyone say what the encoding type should be for DATASTORE_BACKUP? I tried both --encoding=UTF-8 and --encoding=ISO-8859.
load --source_format=DATASTORE_BACKUP --allow_jagged_rows=false --write_disposition=WRITE_TRUNCATE sample_red.t1estchallenge_1 gs://test.appspot.com/STAGING/ahFzfnZpcmdpbi1yZWQtdGVzdHJBCxIcX0FFX0RhdGFzdG9yZUFkbWluX09wZXJhdGlvbhiBwLgCDAsSFl9BRV9CYWNrdXBfSW5mb3JtYXRpb24YAQw.entityname.backup_info
Please advise.
You should use "false" (or "true") with boolean arguments, i.e.
--allow_jagged_rows=false
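If it helps, the same load can also be expressed with the google-cloud-bigquery Python client instead of the bq shell, where the source format and write disposition are enum values rather than command-line flags (the destination table and URI below are illustrative):
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,  # append to the existing table
)

# Illustrative destination table and backup_info URI
load_job = client.load_table_from_uri(
    "gs://my-bucket/path/to/backup.backup_info",
    "sample_red.t1estchallenge_1",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes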

pyhs2/hive No files matching path file and file Exists

Using the hive or beeline client, I have no problem executing this statement:
hive -e "LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2"
The data from the file is loaded successfully into hive.
However, when using pyhs2 from the same machine, the file is not found:
import pyhs2

conn_str = {'authMechanism': 'NOSASL', 'host': 'azus'}
conn = pyhs2.connect(**conn_str)
with conn.cursor() as cur:
    cur.execute("LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2")
Throws exception:
Traceback (most recent call last):
File "data_access/hs2.py", line 38, in write
cur.execute("LOAD DATA LOCAL INPATH '%s' INTO TABLE %s" % (csv_file.name, table_name))
File "/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.py", line 63, in execute
raise Pyhs2Exception(res.status.errorCode, res.status.errorMessage)
pyhs2.error.Pyhs2Exception: "Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpBKe_Mc'': No files matching path file:/tmp/tmpBKe_Mc"
I've seen similar questions posted about this problem, and the usual answer is that the query is running on a different server that doesn't have the local file '/tmp/tmpBKe_Mc' stored on it. However, if that is the case, why would running the command directly from the CLI work but using pyhs2 not work?
(Secondary question: how can I show which server is trying to handle the query? I've tried cur.execute("set"), which returns all configuration parameters but when grepping for "host" the returned parameters don't seem to contain a real hostname.)
Thanks!
This happens because pyhs2 is trying to find the file on the cluster: the hive CLI runs the statement locally, while pyhs2 goes through HiveServer2, so a LOCAL path is resolved on the server rather than on your client machine.
The solution is to have your source saved in a related HDFS location instead of /tmp.
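A minimal sketch of that approach (the HDFS path is hypothetical, and LOAD DATA INPATH without LOCAL reads from HDFS rather than the local filesystem):
import pyhs2

# Connection details copied from the question
conn = pyhs2.connect(host='azus', authMechanism='NOSASL')
with conn.cursor() as cur:
    # '/user/myuser/tmpBKe_Mc' is a hypothetical HDFS path, uploaded beforehand,
    # e.g. with: hdfs dfs -put /tmp/tmpBKe_Mc /user/myuser/tmpBKe_Mc
    cur.execute("LOAD DATA INPATH '/user/myuser/tmpBKe_Mc' INTO TABLE unit_test_hs2")
conn.close()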

Registering a new Command Line Option in RYU App

I need to be able to read in a path file from my simple_switch.py application. I have added the following code to my simple_switch.py in Python.
LOG = logging.getLogger(__name__)

CONF = cfg.CONF
CONF.register_cli_opts([
    cfg.StrOpt('path-file', default='test.txt',
               help='path-file')
])
I attempt to start the application as follows.
bin/ryu-manager --observe-links --path-file test.txt ryu/app/simple_switch.py
However, I get the following error:
usage: ryu-manager [-h] [--app-lists APP_LISTS] [--ca-certs CA_CERTS]
[--config-dir DIR] [--config-file PATH]
[--ctl-cert CTL_CERT] [--ctl-privkey CTL_PRIVKEY]
[--default-log-level DEFAULT_LOG_LEVEL] [--explicit-drop]
[--install-lldp-flow] [--log-config-file LOG_CONFIG_FILE]
[--log-dir LOG_DIR] [--log-file LOG_FILE]
[--log-file-mode LOG_FILE_MODE]
[--neutron-admin-auth-url NEUTRON_ADMIN_AUTH_URL]
[--neutron-admin-password NEUTRON_ADMIN_PASSWORD]
[--neutron-admin-tenant-name NEUTRON_ADMIN_TENANT_NAME]
[--neutron-admin-username NEUTRON_ADMIN_USERNAME]
[--neutron-auth-strategy NEUTRON_AUTH_STRATEGY]
[--neutron-controller-addr NEUTRON_CONTROLLER_ADDR]
[--neutron-url NEUTRON_URL]
[--neutron-url-timeout NEUTRON_URL_TIMEOUT]
[--noexplicit-drop] [--noinstall-lldp-flow]
[--noobserve-links] [--nouse-stderr] [--nouse-syslog]
[--noverbose] [--observe-links]
[--ofp-listen-host OFP_LISTEN_HOST]
[--ofp-ssl-listen-port OFP_SSL_LISTEN_PORT]
[--ofp-tcp-listen-port OFP_TCP_LISTEN_PORT] [--use-stderr]
[--use-syslog] [--verbose] [--version]
[--wsapi-host WSAPI_HOST] [--wsapi-port WSAPI_PORT]
[--test-switch-dir TEST-SWITCH_DIR]
[--test-switch-target TEST-SWITCH_TARGET]
[--test-switch-tester TEST-SWITCH_TESTER]
[app [app ...]]
ryu-manager: error: unrecognized arguments: --path-file
It does look like I need to register a new command line option somewhere before I can use it. Can someone point out to me how to do that? Also, can someone explain how to access the file (test.txt) inside the program?
You're on the right track; however, the CONF entry that you are creating actually needs to be loaded before your app is loaded, otherwise ryu-manager has no way of knowing it exists!
The file you are looking for is flags.py, under the ryu directory of the source tree (or under the root installation directory).
This is how the ryu/tests/switch/tester.py Ryu app defines its own arguments, so you might use that as your reference:
CONF.register_cli_opts([
    # tests/switch/tester
    cfg.StrOpt('target', default='0000000000000001', help='target sw dp-id'),
    cfg.StrOpt('tester', default='0000000000000002', help='tester sw dp-id'),
    cfg.StrOpt('dir', default='ryu/tests/switch/of13',
               help='test files directory')
], group='test-switch')
Following this format, CONF.register_cli_opts takes a list of config options exactly as you have done (see ryu/cfg.py for the different option types available).
You'll notice that when you run the ryu-manager help, i.e.
ryu-manager --help
the list that comes up is sorted by application (e.g. the group of arguments under 'test-switch options'). For that reason, you will want to specify a group name for your set of options.
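Applied to your snippet, registering under a group might look like this (a minimal sketch; 'my-app' is just an illustrative group name, and as noted above the registration has to run before your app is loaded, e.g. from flags.py):
from ryu import cfg

CONF = cfg.CONF
CONF.register_cli_opts([
    cfg.StrOpt('path-file', default='test.txt',
               help='path to the file to read')
], group='my-app')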
Now let's say you used the group name 'my-app' and have an argument named 'path-file' in that group: the command line argument will be --my-app-path-file (this can get a little long), and you can access it in your application like this:
from ryu import cfg
CONF = cfg.CONF
path_file = CONF['my-app']['path_file']
Note the use of dashes on the command line versus underscores when accessing the value in code.
Cheers!