When I run a query against a table of around 20 million records, the job succeeds in other environments, but when I run the same query from the Windows command line I get an error saying: You have encountered a bug in the BigQuery CLI.
Query : bq --quiet --format=csv --project_id="proj_id" query
--max_rows=999999999 " select ids from dataset.table group each by id"
Does the execution change depending on the OS? The same query works for 5 million records.
Error log
== UTC timestamp ==
2015-02-19 08:33:16
== Error trace ==
File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform/bq\bq.py", line 719, in RunSafely
return_value = self.RunWithArgs(*args, **kwds)
File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform/bq\bq.py", line 1086, in RunWithArgs
max_rows=self.max_rows)
File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\bq\bigquery_client.py", line 846, in ReadSchemaAndJobRows
return reader.ReadSchemaAndRows(start_row, max_rows)
File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\bq\bigquery_client.py", line 2200, in ReadSchemaAndRows
rows.append(self._ConvertFromFV(schema, row))
========================================
I am working with SQL Server 2019 and have a problem reading an Excel file from a shared path using Python 3.10.
SQL Server runs on server 7.7, and the shared files I need to read are on the same server.
Reading the Excel file from the local path D:\ExportExcel\testData.xlsx on the server works.
But when I try to read the Excel file from a shared path, as below:
EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'import pandas as pd
df = pd.read_excel(r"\\192.168.7.7\ExportExcel\testData.xlsx", sheet_name = "Sheet1")
print(df)';
I get an error:
Msg 39004, Level 16, State 20, Line 48
A 'Python' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 2, Line 48
An external script error occurred:
Error in execution. Check the output for more information.
Traceback (most recent call last):
File "", line 5, in
File "D:\ProgramData\MSSQLSERVER\Temp-PY\Appcontainer1\9D383F5D-F77E-444E-9A82-B8839C8801E3\sqlindb_0.py", line 31, in transform
df = pd.read_excel(r"\192.168.7.7\ExportExcel\testData.xlsx", sheet_name = "Sheet1")
File "D:\SQL Data\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\util_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "D:\SQL Data\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\util_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "D:\SQL Data\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\io\excel.py", line 307, in read_excel
io = ExcelFile(io, engine=engine)
File "D:\SQL Data\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\io\excel.py", line 394, in init
Msg 39019, Level 16, State 2, Line 48
An external script error occurred:
self.book = xlrd.open_workbook(self.io)
File "D:\SQL Data\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\xlrd_init.py", line 111, in open_workbook
with open(filename, "rb") as f:
PermissionError: [Errno 13] Permission denied: '\\192.168.7.7\ExportExcel\testData.xlsx'
SqlSatelliteCall error: Error in execution. Check the output for more information.
STDOUT message(s) from external script:
SqlSatelliteCall function failed. Please see the console output for more information.
Traceback (most recent call last):
File "D:\SQL Data\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py", line 605, in rx_sql_satellite_call
rx_native_call("SqlSatelliteCall", params)
File "D:\SQL Data\MSSQL15.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\revoscalepy\RxSerializable.py", line 375, in rx_native_call
ret = px_call(functionname, params)
RuntimeError: revoscalepy function failed.
How can I solve the issue above?
What I tried:
I opened the shared path from Run; I can open it, create a new file, and read and write on the same path.
I tried another way of reading the file, OPENROWSET:
select *
from OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml;Database=\\192.168.7.7\ExportExcel\testData.xlsx;HDR=YES','select * FROM [Sheet1$]')
and it read the Excel file successfully.
The folder and the file grant full control to Network Service, Owner, Administrators, Authenticated Users, All Application Packages, and Everyone.
What could be the issue?
I also ran the Python script from PyCharm, and it read the file from the shared path successfully.
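Since the same read works from PyCharm but not inside sp_execute_external_script, one thing worth checking is which account the external script runs as and what that account can see. The traceback path contains `Appcontainer1`, which suggests (as an assumption, not something confirmed here) that the script runs in a sandboxed context rather than as the interactive user. A minimal diagnostic sketch follows; the helper name `describe_access` is hypothetical.

```python
import getpass
import os


def describe_access(path):
    """Report who the current process runs as and whether it can open `path`."""
    try:
        user = getpass.getuser()  # account the Python process runs under
    except Exception:
        user = None  # some sandboxed environments do not expose a user name

    info = {
        "user": user,
        # False can also mean "not visible to this account", not only "missing"
        "exists": os.path.exists(path),
        "open_error": None,
    }
    try:
        # Same call that fails in the traceback: open the file for binary read
        with open(path, "rb") as handle:
            handle.read(1)
    except OSError as exc:
        info["open_error"] = repr(exc)
    return info
```

Printing `describe_access(r"\\192.168.7.7\ExportExcel\testData.xlsx")` from inside the @script body and again from PyCharm would show whether the two runs use different accounts.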
My streaming job is now failing with the error below. It worked fine for almost 2 months. It is a completely stateless transformation that just needs to append the new rows to the destination Delta table. Before streaming, I manually provide the schema for the CSV files, and I have verified that the streaming job schema and the downstream table schema match exactly, including the data types.
I'm not sure why I'm getting the error below even with a stateless transformation. Any help would be appreciated.
File "/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2442, in _call_proxy
return_value = getattr(self.pool[obj_id], method)(*params)
File "/databricks/spark/python/pyspark/sql/utils.py", line 195, in call
raise e
File "/databricks/spark/python/pyspark/sql/utils.py", line 192, in call
self.func(DataFrame(jdf, self.sql_ctx), batch_id)
File "<command-422857213447422>", line 2, in write_to_managed_table
print(f"inside foreachBatch for batch_id:{batchId}, rows in passed dataframe: {micro_batch_df.count()}")
File "/databricks/spark/python/pyspark/sql/dataframe.py", line 670, in count
return int(self._jdf.count())
File "/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
return_value = get_return_value(
File "/databricks/spark/python/pyspark/sql/utils.py", line 110, in deco
return f(*a, **kw)
File "/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o433.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 13792.0
failed 4 times, most recent failure: Lost task 28.3 in stage 13792.0 (TID 752198)
(10.139.64.13 executor 45):
org.apache.spark.sql.execution.streaming.state.StateSchemaNotCompatible: Provided schema
doesn't match to the schema for existing state! Please note that Spark allow difference of
field name: check count of fields and data type of each field.
There might be a problem with the CSV file; it could be corrupted.
You can tolerate corrupt records in this CSV file by setting the "mode" option to "PERMISSIVE" or "DROPMALFORMED".
mode (default PERMISSIVE): allows a mode for dealing with corrupt records during parsing.
PERMISSIVE : sets other fields to null when it meets a corrupted record. When a schema is set by user, it sets null for extra fields.
DROPMALFORMED : ignores the whole corrupted records.
FAILFAST : throws an exception when it meets corrupted records.
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/DataStreamReader.html#csv(path:String):org.apache.spark.sql.DataFrame
spark.read.format("csv")
.option("header","true")
.option("path","your.csv")
.option("mode","DROPMALFORMED")
.schema(csvSchema)
.load()
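To make the difference between the three modes concrete, here is a small pure-Python imitation of their behaviour on comma-separated lines. It is a simplified sketch, not Spark's implementation: the function name `parse_rows` and the notion of "malformed" used here (wrong field count or a failed type conversion) are assumptions made for illustration.

```python
def parse_rows(lines, converters, mode="PERMISSIVE"):
    """Parse comma-separated lines using one converter per column.

    mode is one of PERMISSIVE, DROPMALFORMED, FAILFAST (anything else is
    treated like PERMISSIVE in this sketch).
    """
    rows = []
    for line in lines:
        fields = line.split(",")
        try:
            if len(fields) != len(converters):
                raise ValueError("wrong field count: %r" % line)
            row = [conv(value) for conv, value in zip(converters, fields)]
        except ValueError:
            if mode == "FAILFAST":
                raise  # stop at the first corrupted record
            if mode == "DROPMALFORMED":
                continue  # skip the whole corrupted record
            # PERMISSIVE: keep the record, use None where a field is
            # missing or cannot be converted
            row = []
            for index, conv in enumerate(converters):
                try:
                    row.append(conv(fields[index]))
                except (IndexError, ValueError):
                    row.append(None)
        rows.append(row)
    return rows
```

For example, with `converters=[int, str]` and the lines `"1,a"`, `"x,b"`, `"3"`, DROPMALFORMED keeps only the first record, while PERMISSIVE keeps all three and fills the unparseable or missing fields with None.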
I am running a Snakefile on a cluster using this default config:
"__default__":
    "account" : "myAccount"
    "queue" : "myQueue"
    "nCPUs" : "16"
    "memory" : 20000
    "resources" : "\"select[mem>20000] rusage[mem=20000] span[hosts=1]\""
    "name" : "JOBNAME.{rule}.{wildcards}"
    "output" : "{rule}.{wildcards}.out"
    "error" : "{rule}.{wildcards}.err"
    "time" : "24:00:00"
It runs fine for some rules, but it raises this error for one of the rules.
Traceback (most recent call last):
File "home/conda3_64/lib/python3.5/site-packages/snakemake/__init__.py", line 537, in snakemake
report=report)
File "home/conda3_64/lib/python3.5/site-packages/snakemake/workflow.py", line 653, in execute
success = scheduler.schedule()
File "home/conda3_64/lib/python3.5/site-packages/snakemake/scheduler.py", line 286, in schedule
self.run(job)
File "home/conda3_64/lib/python3.5/site-packages/snakemake/scheduler.py", line 302, in run
error_callback=self._error)
File "home/conda3_64/lib/python3.5/site-packages/snakemake/executors.py", line 638, in run
jobscript = self.get_jobscript(job)
File "home/conda3_64/lib/python3.5/site-packages/snakemake/executors.py", line 496, in get_jobscript
cluster=self.cluster_wildcards(job))
File "home/conda3_64/lib/python3.5/site-packages/snakemake/executors.py", line 556, in cluster_wildcards
return Wildcards(fromdict=self.cluster_params(job))
File "home/conda3_64/lib/python3.5/site-packages/snakemake/executors.py", line 547, in cluster_params
cluster.update(self.cluster_config.get(job.name, dict()))
ValueError: need more than 1 value to unpack
This is how I run snakemake:
snakemake -j 20 --cluster-config ./config.yaml --cluster "qsub -A {cluster.account} -l walltime={cluster.time} -q {cluster.queue} -l nodes=1:ppn={cluster.nCPUs},mem={cluster.memory}" -p
If I run plain snakemake without a cluster, it runs normally.
A similar error is discussed in "ValueError: need more than 1 value to unpack python", but I could not relate it to my case.
I cannot test it right now, but based on the error I suspect the issue is that your JOBNAME placeholder is not explicitly specified in the call to qsub.
Adding -N some_name to your qsub arguments should resolve it.
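Separately, the last frame of the traceback is `cluster.update(self.cluster_config.get(job.name, dict()))`. In plain Python, `dict.update` raises a ValueError when it is handed something that is neither a mapping nor a sequence of pairs, for example a string. Whether that is what happens here is an assumption, since only part of the config is shown. The sketch below just shows one way to spot top-level config entries that are not mappings once the file has been loaded into a dict; the function name `non_mapping_entries` and the sample rule name `align` are hypothetical.

```python
from collections.abc import Mapping


def non_mapping_entries(cluster_config):
    """Return the names of top-level entries whose value is not a mapping."""
    return sorted(
        name
        for name, value in cluster_config.items()
        if not isinstance(value, Mapping)
    )


# Hypothetical configs: the second one has a rule entry that is a plain string.
good = {"__default__": {"queue": "myQueue"}, "align": {"nCPUs": "4"}}
bad = {"__default__": {"queue": "myQueue"}, "align": "nCPUs=4"}

print(non_mapping_entries(good))
print(non_mapping_entries(bad))
```

If the check reports the name of the rule that fails, giving that entry its own indented key/value block would be the first thing to try.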
So I have my BigQuery tables split up by day: each table has that day's worth of data.
When running a select statement, it seemed to work for some tables (for example adroit.raw_data_2013_12_09) but not for tables that were created more than 3 days ago (for example adroit.raw_data_2013_12_05).
Here is the error readout:
bigquery service returned an invalid reply in query operation: pagetoken missing
for table '123856490061:_2863529bd240bcbd666b3debc039d3c62827fd67.anon64863979ca
e70f1bf67661ed0fafb2a8c5bb36e1'.
please make sure you are using the latest version of the bq tool and try again.
if this problem persists, you may have encountered a bug in the bigquery client.
google engineers monitor and answer questions on stack overflow, with the tag
google-bigquery:
http://stackoverflow.com/questions/ask?tags=google-bigquery
please include a brief description of the steps that led to this issue, as well
as the following information:
========================================
== platform ==
cpython:2.7.5:linux-2.6.26-2-xen-amd64-x86_64-with-debian-5.0.8
== bq version ==
v2.0.12
== command line ==
['/usr/local/bin/bq', 'query', '--max_rows=100000', '--format=csv', '-q', 'select count(*) as views, template_id, string_id, state_id, reel_1,reel_2,reel_3,reel_4,reel_5,reel_6,reel_7,reel_8,reel_9,reel_10 from adroit.raw_data_2013_12_03 where operation_type in (1,3) and template_id in (2659,2660,2661) group by template_id, string_id, state_id, reel_1,reel_2,reel_3,reel_4,reel_5,reel_6,reel_7,reel_8,reel_9,reel_10']
== utc timestamp ==
2013-12-10 23:15:43
== error trace ==
file "build/bdist.linux-x86_64/egg/bq.py", line 652, in runsafely
return_value = self.runwithargs(*args, **kwds)
file "build/bdist.linux-x86_64/egg/bq.py", line 932, in runwithargs
max_rows=self.max_rows)
file "build/bdist.linux-x86_64/egg/bq.py", line 383, in _printtable
fields, rows = client.readschemaandrows(table_dict, **extra_args)
file "build/bdist.linux-x86_64/egg/bigquery_client.py", line 668, in readschemaandrows
self.readtablerows(table_dict, max_rows))
file "build/bdist.linux-x86_64/egg/bigquery_client.py", line 649, in readtablerows
apiclienthelper.tablereference.create(**table_dict),))
========================================
unexpected exception in query operation: pagetoken missing for table '1238564900
61:_2863529bd240bcbd666b3debc039d3c62827fd67.anon64863979cae70f1bf67661ed0fafb2a
8c5bb36e1'
So I tried easy_install update and the script says that my bq client is up to date. I'm at a loss as to why I'm getting this error. Thanks.
Looks like you're using an old version of bq. The latest version is 2.0.17.
$ bq version
This is BigQuery CLI v2.0.17
When you ran easy_install, did you use the --upgrade flag? If not, it doesn't actually update the version that you're using.
You can see the latest version at the download page here.
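To check whether an installed version is behind, compare the version numbers as tuples of integers rather than as strings (as strings, "2.0.9" would sort after "2.0.12"). A minimal sketch, with hypothetical helper names `parse_version` and `is_outdated`, using the two versions mentioned in this thread:

```python
def parse_version(text):
    """Turn 'v2.0.12' or '2.0.17' into a tuple of ints such as (2, 0, 12)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_outdated(installed, latest):
    """True when the installed version is strictly older than the latest one."""
    return parse_version(installed) < parse_version(latest)


print(is_outdated("v2.0.12", "2.0.17"))
```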
I ran an overnight job using a script.
It worked for a lot of tables, and then about 4 hours ago (approx. 7am IST) it started behaving weirdly.
Now even single commands give the same error:
bq load --max_bad_records=10 tbl163.a_V3_14Jun2012 a_V3_14Jun2012.log.gz ../schema/analyze.schema
Error:
BigQuery error in load operation: Could not connect with BigQuery server, http
response status: 502
Update: I have received the following error just now
You have encountered a bug in the BigQuery CLI. Please send an email to bigquery-team@google.com to report this, with the following information:
========================================
== Platform ==
CPython:2.7.3:Linux-3.2.0-25-virtual-x86_64-with-Ubuntu-12.04-precise
== bq version ==
v2.0.6
== Command line ==
['/usr/local/bin/bq', 'load', '--max_bad_records=10', 'vizvrm299.analyze_VIZVRM299_26Jun2012', 'analyze_VIZVRM299_26Jun2012.log.gz', '../schema/analyze.schema']
== Error trace ==
File "build/bdist.linux-x86_64/egg/bq.py", line 614, in RunSafely
self.RunWithArgs(*args, **kwds)
File "build/bdist.linux-x86_64/egg/bq.py", line 791, in RunWithArgs
job = client.Load(table_reference, source, schema=schema, **opts)
File "build/bdist.linux-x86_64/egg/bigquery_client.py", line 1473, in Load
upload_file=upload_file, **kwds)
File "build/bdist.linux-x86_64/egg/bigquery_client.py", line 1228, in ExecuteJob
job_id=job_id)
File "build/bdist.linux-x86_64/egg/bigquery_client.py", line 1214, in RunJobSynchronously
upload_file=upload_file, job_id=job_id)
File "build/bdist.linux-x86_64/egg/bigquery_client.py", line 1208, in StartJob
projectId=project_id).execute()
File "build/bdist.linux-x86_64/egg/bigquery_client.py", line 184, in execute
return super(BigqueryHttp, self).execute(**kwds)
File "build/bdist.linux-x86_64/egg/apiclient/http.py", line 644, in execute
_, body = self.next_chunk(http)
File "build/bdist.linux-x86_64/egg/apiclient/http.py", line 708, in next_chunk
raise ResumableUploadError("Failed to retrieve starting URI.")
========================================
Unexpected exception in load operation: Failed to retrieve starting URI.
Just to close out this question, are you still observing these errors? This looks like a network configuration issue, not directly an issue with the bq command line tool. However, AFAIK, bq doesn't provide functions for resuming job insertion if there is a problem with the network.
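If the failures are transient network errors, a script that drives bq overnight can wrap each command in its own retry loop. The following is a generic sketch of retry with exponential backoff, not a feature of bq; the function name `retry` and its parameters are hypothetical.

```python
import time


def retry(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or `attempts` calls have failed.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception when every attempt has failed.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Possible use with the load command from the question (requires bq on PATH):
# import subprocess
# retry(lambda: subprocess.run(
#     ["bq", "load", "--max_bad_records=10", "tbl163.a_V3_14Jun2012",
#      "a_V3_14Jun2012.log.gz", "../schema/analyze.schema"],
#     check=True))
```

Before retrying a load blindly, it is worth checking whether a failed attempt still started a job on the server, since rerunning it could load the same data twice.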