pyhs2/hive: No files matching path, but file exists - hive

Using the hive or beeline client, I have no problem executing this statement:
hive -e "LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2"
The data from the file is loaded successfully into hive.
However, when using pyhs2 from the same machine, the file is not found:
import pyhs2
conn_str = {'authMechanism': 'NOSASL', 'host': 'azus'}
conn = pyhs2.connect(**conn_str)
with conn.cursor() as cur:
    cur.execute("LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2")
Throws exception:
Traceback (most recent call last):
File "data_access/hs2.py", line 38, in write
cur.execute("LOAD DATA LOCAL INPATH '%s' INTO TABLE %s" % (csv_file.name, table_name))
File "/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.py", line 63, in execute
raise Pyhs2Exception(res.status.errorCode, res.status.errorMessage)
pyhs2.error.Pyhs2Exception: "Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpBKe_Mc'': No files matching path file:/tmp/tmpBKe_Mc"
I've seen similar questions posted about this problem, and the usual answer is that the query is running on a different server that doesn't have the local file '/tmp/tmpBKe_Mc' stored on it. However, if that is the case, why would running the command directly from the CLI work but using pyhs2 not work?
(Secondary question: how can I show which server is trying to handle the query? I've tried cur.execute("set"), which returns all configuration parameters but when grepping for "host" the returned parameters don't seem to contain a real hostname.)
Thanks!

This happens because pyhs2 sends the statement to HiveServer2, which looks for the file on the host where HiveServer2 runs rather than on your client machine (the hive CLI, by contrast, resolves LOCAL paths on the machine you run it from, which is why it works there).
The solution is to save your source file in a suitable HDFS location instead of the local /tmp and load it from there.
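A minimal sketch of that approach, assuming the hdfs CLI is available on the client machine and that /user/hive/staging is a writable HDFS directory (both are assumptions here): copy the file into HDFS first, then load it without the LOCAL keyword so HiveServer2 reads it from HDFS.
import subprocess
import pyhs2

local_file = '/tmp/tmpBKe_Mc'
hdfs_path = '/user/hive/staging/tmpBKe_Mc'  # assumed staging directory

# Copy the local file into HDFS so the HiveServer2 host can see it.
subprocess.check_call(['hdfs', 'dfs', '-put', '-f', local_file, hdfs_path])

conn = pyhs2.connect(host='azus', authMechanism='NOSASL')
with conn.cursor() as cur:
    # No LOCAL keyword: the path now refers to HDFS, not the server's local disk.
    cur.execute("LOAD DATA INPATH '%s' INTO TABLE unit_test_hs2" % hdfs_path)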

Related

Streaming job failure-State Schema not Compatible issue

My streaming job is now failing with the below error. The streaming job worked fine for almost 2 months; it is a completely stateless transformation that just needs to append the new rows to the destination Delta table. Before streaming, I'm manually providing the schema for the CSV files, and I have verified that the streaming job schema and the downstream table schema match perfectly, including the data types.
I'm not sure why I'm getting the below error even in a stateless transformation. Any help would be appreciated.
File "/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2442, in _call_proxy
return_value = getattr(self.pool[obj_id], method)(*params)
File "/databricks/spark/python/pyspark/sql/utils.py", line 195, in call
raise e
File "/databricks/spark/python/pyspark/sql/utils.py", line 192, in call
self.func(DataFrame(jdf, self.sql_ctx), batch_id)
File "<command-422857213447422>", line 2, in write_to_managed_table
print(f"inside foreachBatch for batch_id:{batchId}, rows in passed dataframe: {micro_batch_df.count()}")
File "/databricks/spark/python/pyspark/sql/dataframe.py", line 670, in count
return int(self._jdf.count())
File "/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
return_value = get_return_value(
File "/databricks/spark/python/pyspark/sql/utils.py", line 110, in deco
return f(*a, **kw)
File "/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o433.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 13792.0
failed 4 times, most recent failure: Lost task 28.3 in stage 13792.0 (TID 752198)
(10.139.64.13 executor 45):
org.apache.spark.sql.execution.streaming.state.StateSchemaNotCompatible: Provided schema
doesn't match to the schema for existing state! Please note that Spark allow difference of
field name: check count of fields and data type of each field.
There might be a problem with the CSV file; it could be corrupted.
You can ignore this CSV file by setting the "mode" option to "PERMISSIVE" or "DROPMALFORMED".
mode (default PERMISSIVE): allows a mode for dealing with corrupt records during parsing.
PERMISSIVE : sets other fields to null when it meets a corrupted record. When a schema is set by user, it sets null for extra fields.
DROPMALFORMED : ignores the whole corrupted records.
FAILFAST : throws an exception when it meets corrupted records.
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/DataStreamReader.html#csv(path:String):org.apache.spark.sql.DataFrame
spark.read.format("csv")
.option("header,"true")
.option("path","your.csv")
.option("mode","DROPMALFORMED")
.schema(csvSchema)
.load()
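For reference, here is a rough PySpark version of the same read; the schema and path below are placeholders, so substitute the schema you already provide manually:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema; replace with the one you already build for the stream.
csv_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
])

df = (spark.read.format("csv")
      .option("header", "true")
      .option("mode", "DROPMALFORMED")  # drop rows that fail to parse
      .schema(csv_schema)
      .load("/path/to/your.csv"))       # placeholder path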

Creating bigquery connection for airflow using config file

I am trying to create a BigQuery connection. The below config is present in a YAML file:
gcp-conn:
  conn_type: google_cloud_platform
  conn_extra: '{ "extra__google_cloud_platform__key_path":"/usr/local/airflow/key.json", "extra__google_cloud_platform__project": "<project_name>", "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform"}'
Command: inv create-airflow-connections --env-file <yml_file>
The connection gets created, but when I browse it from the UI, it leads me to an "Oops" page.
Error:
File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 3054, in on_form_prefill
value = d.get(field, '')
AttributeError: 'str' object has no attribute 'get'
Any idea why this is happening?
I believe it wants something like
- conn_id: bigquery-warehouse
  conn_type: google_cloud_platform
  conn_extra:
    extra__google_cloud_platform__project: "my_google_cloud_project_id"
    extra__google_cloud_platform__key_path: "usr/local/airflow/service-account.json"
    extra__google_cloud_platform__scope: "https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive"
- conn_id: google_cloud_default
  conn_type: google_cloud_platform
  conn_extra:
    extra__google_cloud_platform__project: "my_google_cloud_project_id"
    extra__google_cloud_platform__key_path: "usr/local/airflow/service-account.json"
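If the YAML keeps being mis-serialized, a fallback is to create the connection programmatically. This is only a sketch, assuming an Airflow 1.10-style deployment where the ORM session is available; the IDs, paths, and scopes are placeholders:
import json

from airflow import settings
from airflow.models import Connection

extra = {
    "extra__google_cloud_platform__project": "my_google_cloud_project_id",
    "extra__google_cloud_platform__key_path": "/usr/local/airflow/service-account.json",
    "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform",
}

conn = Connection(
    conn_id="bigquery-warehouse",
    conn_type="google_cloud_platform",
    extra=json.dumps(extra),  # the extras must be stored as valid JSON
)

session = settings.Session()
session.add(conn)
session.commit()
session.close()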

SQLite3 database is Locked in Azure

I have a Flask server running on Azure App Service with SQLite3 as the database. I am unable to update the SQLite3 database; it reports that the database is locked:
2018-11-09T13:21:53.854367947Z [2018-11-09 13:21:53,835] ERROR in app: Exception on /borrow [POST]
2018-11-09T13:21:53.854407246Z Traceback (most recent call last):
2018-11-09T13:21:53.854413046Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
2018-11-09T13:21:53.854417846Z response = self.full_dispatch_request()
2018-11-09T13:21:53.854422246Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
2018-11-09T13:21:53.854427146Z rv = self.handle_user_exception(e)
2018-11-09T13:21:53.854431646Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
2018-11-09T13:21:53.854436146Z reraise(exc_type, exc_value, tb)
2018-11-09T13:21:53.854440346Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
2018-11-09T13:21:53.854444746Z raise value
2018-11-09T13:21:53.854448846Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
2018-11-09T13:21:53.854453246Z rv = self.dispatch_request()
2018-11-09T13:21:53.854457546Z File "/home/site/wwwroot/antenv/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
2018-11-09T13:21:53.854461846Z return self.view_functions[rule.endpoint](**req.view_args)
2018-11-09T13:21:53.854466046Z File "/home/site/wwwroot/application.py", line 282, in borrow
2018-11-09T13:21:53.854480146Z cursor.execute("UPDATE books SET stock = stock - 1 WHERE bookid = ?",(bookid,))
2018-11-09T13:21:53.854963942Z sqlite3.OperationalError: database is locked
Here is the route -
@app.route('/borrow', methods=["POST"])
def borrow():
    # import pdb; pdb.set_trace()
    body = request.get_json()
    user_id = body["userid"]
    bookid = body["bookid"]
    conn = sqlite3.connect("database.db")
    cursor = conn.cursor()
    date = datetime.now()
    expiry_date = date + timedelta(days=30)
    cursor.execute("UPDATE books SET stock = stock - 1 WHERE bookid = ?", (bookid,))
    # conn.commit()
    cursor.execute("INSERT INTO borrowed (issuedate,returndate,memberid,bookid) VALUES (?,?,?,?)", ("xxx", "xxx", user_id, bookid))
    conn.commit()
    cursor.close()
    conn.close()
    return json.dumps({"status": 200, "conn": "working with datess update"})
I tried checking the database integrity using PRAGMA; there was no integrity loss, so I don't know what might be causing this error. Any help is appreciated :)
I use Azure App Service on Docker on Linux and have the same issue. If you are using Azure App Service on Windows, the problem is different from mine.
The problem is that /home is mounted as a CIFS filesystem, which cannot handle SQLite3 locking.
My workaround is to copy the db.sqlite3 file to some directory other than /home, properly set the permissions and ownership of the db.sqlite3 file and its directory, and then let my project read/write it there. However, this workaround is pretty awkward, and I don't recommend it.
Presumably this solution is not safe for production workloads, but at least I got it working by executing the following command:
sqlite3 <database-file> 'PRAGMA journal_mode=wal;'
After running the above command, my database stored on an Azure File share works inside a container Web App.
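The same pragma can also be set from Python when the database is created; a minimal sketch (database.db stands in for your database file):
import sqlite3

conn = sqlite3.connect("database.db")
# Switch to write-ahead logging; WAL mode is persisted in the database file itself,
# so it does not need to be set again on later connections.
conn.execute("PRAGMA journal_mode=WAL;")
conn.commit()
conn.close()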
I got it working by setting up the Azure mount options with the following configuration:
dir_mode=0777,file_mode=0777,uid=0,gid=0,mfsymlinks,nobrl,cache=strict
But the real solution is to add the nobrl flag (disable byte-range locks).
Here is a StorageClass example for Kubernetes:
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azureclass
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - nobrl
  - cache=strict
parameters:
  skuName: Standard_LRS
This answer appears toward the top of a typical Google search for this issue, so I thought I'd add a couple of additional tips:
For those running JavaScript and using Sequelize as the interface to your SQLite DB, running
await sequelize.query('PRAGMA journal_mode=WAL;')
prior to creating your database will allow you to read/write the DB file in an Azure web app running under a Linux service plan. I have a separate script that creates the database via a call to sequelize.sync(). I'm storing the DB file in a separate directory under /home within the file system for the Linux container. It seems to run fine, and my workload is expected to be very light. Note that you don't need to set the journal mode again when your app starts and you connect to the database; that mode is stored in the file itself (this wasn't obvious from the SQLite docs).

LOAD DATA LOCAL INPUT PATH doesn't exist

I am new to Spark and Scala. I'm getting the following exception while trying to load a file from the local file system into a table using Spark.
Spark version: 2.0, Scala version: 2.11
scala> sqlContext.sql("LOAD DATA LOCAL INPATH 'file.txt' INTO TABLE student")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: file.txt
Please try giving the complete path, as file:/<complete path to the file>.
In the above case:
sqlContext.sql("LOAD DATA LOCAL INPATH 'file:/complete path to the file.txt' INTO TABLE student")
~Kedar
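The same fully qualified form works from PySpark as well; a small sketch with a placeholder path and a Hive-enabled session:
# PySpark (Spark 2.x) equivalent; the path below is a placeholder absolute path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("LOAD DATA LOCAL INPATH 'file:///home/user/file.txt' INTO TABLE student")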

How to upgrade odoo 8 to odoo 9 database?

I am trying to upgrade an odoo installation from 8.0 to 9.0. What I've done so far is the following:
Backup the odoo database from the production system
Installed the backup DB as test in my current system
Copied the odoo folder in a folder on my system
Checked if everything works. It works!
Updated to the latest v8.0 version, still works
Did a git checkout 9.0 followed by a git pull.
Started odoo 9.0 with the command ./openerp-server -d testDB -u all
This command breaks with the following error and does not update my database:
LINE 1: select model, transient from ir_model where state='manual'
^
, in query select model, transient from ir_model where state=%s
2015-10-26 00:37:29,823 4501 CRITICAL testDB openerp.service.server:
Failed to initialize database `testDB`.
Traceback (most recent call last):
File "/opt/odoo/openerp/service/server.py", line 885, in preload_registries
registry = RegistryManager.new(dbname, update_module=update_module)
File "/opt/odoo/openerp/modules/registry.py", line 385, in new
openerp.modules.load_modules(registry._db, force_demo, status, update_module)
File "/opt/odoo/openerp/modules/loading.py", line 279, in load_modules
loaded_modules, processed_modules = load_module_graph(cr, graph, status, perform_checks=update_module, report=report)
File "/opt/odoo/openerp/modules/loading.py", line 136, in load_module_graph
registry.setup_models(cr, partial=True)
File "/opt/odoo/openerp/modules/registry.py", line 185, in setup_models
cr.execute('select model, transient from ir_model where state=%s', ('manual',))
File "/opt/odoo/openerp/sql_db.py", line 139, in wrapper
return f(self, *args, **kwargs)
File "/opt/odoo/openerp/sql_db.py", line 215, in execute
res = self._obj.execute(query, params)
ProgrammingError: column "transient" does not exist
LINE 1: select model, transient from ir_model where state='manual'
Are there any steps I have to follow to upgrade the database, or does everything have to be done by hand? And if so, what should I do? Obviously it failed because that specific column does not exist in my database, but is there any update script? I fear that if I change this by hand, the next error will just be waiting for me.
Thanks in advance.
You can ask the Odoo company to do that task for you, but they will charge money for it. If you want to do it yourself, here is the documentation on how to do that:
https://doc.therp.nl/openupgrade/intro.html
Option 2: use pgAdmin (a PostgreSQL GUI tool). Select your database name; at the top you will see the SQL button enabled. Click it and issue an SQL query to display all the data (you must know the name of the table you want to retrieve), then export it. The exported file contains all the data with column headings; you may have to rearrange the columns to match the Odoo 9 DB. Once that is done, select the Odoo 9 database, right-click on the table you want to import data into, and select the import option. It may take a while, and it should report "data imported successfully".
I found the answer on Github.
The trick is to create a column called transient, a Boolean with the default value false, in the table ir_model.
As I expected, this is not the complete solution, as there are other problems with the database that need adjustments.
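A sketch of that workaround with psycopg2 (the table and column names come from the error above; the connection parameters are placeholders):
import psycopg2

# Placeholder credentials; use the ones for your test database.
conn = psycopg2.connect(dbname="testDB", user="odoo", host="localhost")
cur = conn.cursor()

# Add the column the 9.0 code base expects, defaulting to false as described above.
cur.execute("ALTER TABLE ir_model ADD COLUMN transient boolean DEFAULT false")

conn.commit()
cur.close()
conn.close()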
You are trying to run an Odoo 8.0 database on Odoo 9.0.
The column 'transient' is in the 9.0 code base and not in the 8.0 code base, so the 8.0 database is being run on the 9.0 code base. In other words, the database has not been upgraded properly.
As stated in the previous answer, you can either get Odoo to do the upgrade or do it yourself.