When calling the load endpoint with a query larger than roughly 1,700 bytes, we receive a 413 (Request Entity Too Large) error. We have narrowed the failure threshold to somewhere between 1,706 and 1,758 bytes.
Steps to reproduce the behavior:
POST a large query to <host>:<port>/cubejs-api/v1/load
Receive a 413 response
Removing one or two entries from "dimensions" causes the query to work as expected (standard response JSON and a 200 status).
Version: 0.31.14
The smallest failing query we have is:
{
"query": {
"measures": [
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.count",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalDiscAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalCopayAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalDedctAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalRejAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalExGrtaAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsGrsAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsStlAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRbnsAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsIbnrAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsIncrAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRskStlAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRskRbnsAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRskIbnrAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRskIncrAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsReinsAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsReinsRbnsAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsReinsIbnrAmt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsReinsIncrAmt"
],
"dimensions": [
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.clntCoCd",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.trtyClntGrpNm",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.grpNbr",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.primLfNbr",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.insLfNbr",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.rskIncptnDt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.rskRnwlDt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.trtmtStrtDt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.trtmtEndDt",
"rpt_clm_clm_990_clab9f47d0000356e2szw7p19.admsnDt"
]
},
"queryType": "multi"
}
I tried submitting an issue on the Cube.js GitHub, but they marked it as a question and asked that I post it here. I have also searched their docs and have not found any configuration related to this. It looks like the max payload size is hard-coded to 1M (see link), but here we are failing at 1,758 bytes.
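For what it's worth, a limit tripping at ~1.7 KB rather than anywhere near 1 MB smells like a URL or header-size limit in a client or proxy in front of Cube.js, not the hard-coded body limit. One way to isolate it is to send the same query as an explicit POST body from a plain HTTP client; a minimal sketch (the host, port, and trimmed-down query are placeholders for your deployment, and no auth header is included):

```python
import json
import urllib.request

# Placeholder host/port and a trimmed query; adjust to your deployment.
# The key point: sending the query as a POST body avoids any URL-length
# limit that applies when a client serializes the query into a GET URL.
url = "http://localhost:4000/cubejs-api/v1/load"
query = {
    "query": {
        "measures": ["rpt_clm_clm_990_clab9f47d0000356e2szw7p19.count"],
        "dimensions": ["rpt_clm_clm_990_clab9f47d0000356e2szw7p19.clntCoCd"],
    },
    "queryType": "multi",
}
body = json.dumps(query).encode("utf-8")
req = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

If this POST succeeds where the original request fails, the limit is being applied to the serialized query string (e.g. a GET URL) somewhere upstream rather than by Cube.js itself.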
Related
I'm running a stress test against two endpoints of my API: /api/register and /api/verify_tac/.
The request body for /api/register is:
{
"provider_id": "lifecare.com.my",
"user_id": ${random},
"secure_word": "Aa123456",
"id_type": "0",
"id_number": "${id_number}",
"full_name": "test",
"gender": "F",
"dob": "2009/11/11",
"phone_number": ${random},
"nationality": "MY"
}
where ${random} and ${id_number} are lists from a CSV Data Set Config.
The request body for /api/verify_tac is:
{
"temp_token": "${temp_token}",
"tac":"123456"
}
${temp_token} is extracted from the /api/register response body.
I ran six tests; the first five completed without errors:
100 users with a 60-second ramp-up period. All successful.
200 users with a 60-second ramp-up period. All successful.
300 users with a 60-second ramp-up period. All successful.
400 users with a 60-second ramp-up period. All successful.
500 users with a 60-second ramp-up period. All successful.
600 users with a 60-second ramp-up period. Most of the /api/register response bodies are empty, so /api/verify_tac returns an error. The /api/verify_tac request data that returns an error is:
{
"temp_token": "NotFound",
"tac":"123456"
}
How can test number 6 return errors while the other five do not, given the same parameters? Does this mean my API is overloaded with requests, or is my test configuration wrong?
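One observation on the failing data: "NotFound" is typically not something the API returned, but the Default Value configured on a JMeter JSON Extractor, which is substituted whenever extraction fails, e.g. when the /api/register response body comes back empty. A minimal sketch of that substitution logic (assuming your extractor's default is set to NotFound):

```python
import json

def extract_temp_token(response_body, default="NotFound"):
    """Mimic a JMeter JSON Extractor: return the extracted value,
    or the configured default when the body is empty or not JSON."""
    try:
        value = json.loads(response_body).get("temp_token")
    except (ValueError, TypeError):
        return default
    return value if value else default

# A normal register response yields a real token...
print(extract_temp_token('{"temp_token": "abc123"}'))  # abc123
# ...while an empty body (the overloaded case) yields the default:
print(extract_temp_token(""))                          # NotFound
```

So the verify_tac errors are a downstream symptom; the real failure is the empty register responses at 600 users.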
If the response body is empty for 600 users, then my expectation is that your application simply gets overloaded and cannot handle 600 concurrent users.
You can add a listener like the Simple Data Writer; this way you will be able to see request and response details for failing requests. If you untick the Errors box, JMeter will store request and response details for all requests. This way you will be able to see the response message, headers, body, etc. for the preceding request and determine the failure reason.
Also it would be good to:
Monitor the essential resource usage (CPU, RAM, disk, network, swap, etc.) on the application-under-test side; this can be done using e.g. the JMeter PerfMon Plugin
Check your application logs for any suspicious entries
Re-run your test with a profiler tool for .NET such as YourKit; this way you will be able to see the most "expensive" functions, identify where the application spends most of its time, and find the root cause of the problems
From various other StackOverflow posts I understand I can do a Zeppelin API call to run and get the output from a paragraph using the URL:
https://[zeppelin url]:[port]/api/notebook/run/[note ID]/[paragraph ID]
but this gives me:
HTTP ERROR 405
Problem accessing /api/notebook/run/2GG52SU6/2025492809-066545_207456631. Reason:
Method Not Allowed
Is there a way of fixing this? Other API calls work fine and the paragraph runs fine in the Zeppelin Web UI (it just does a simple Impala query). I just want to get the output via a REST API so I can call it from an Angular paragraph and manipulate the results before display.
Thanks!
The documentation of the run-paragraph API states that it is a POST request; if you send a GET request, it fails with 405 Method Not Allowed.
curl -X POST http://localhost:8000/zeppelin/api/notebook/run/2GUEWJDQ4/paragraph_1642773079113_366171993|jq
{
"status": "OK",
"body": {
"code": "SUCCESS",
"msg": [
{
"type": "TEXT",
"data": "common.cmd\ncommon.sh\nfunctions.cmd\nfunctions.sh\ninstall-interpreter.sh\ninterpreter.cmd\ninterpreter.sh\nstop-interpreter.sh\nupgrade-note.sh\nzeppelin-daemon.sh\nzeppelin-systemd-service.sh\nzeppelin.cmd\nzeppelin.sh\n"
}
]
}
}
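The same call from Python's standard library, using the note and paragraph IDs from the question (the host and port are placeholders); the essential detail is the POST method:

```python
import json
import urllib.request

# Placeholders: adjust to your Zeppelin host/port.
base = "http://localhost:8080"
note_id = "2GG52SU6"
paragraph_id = "2025492809-066545_207456631"
url = f"{base}/api/notebook/run/{note_id}/{paragraph_id}"

req = urllib.request.Request(url, method="POST")  # a GET here yields HTTP 405

def run_paragraph(request):
    """POST the run request and return the paragraph output messages."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["body"]["msg"]

# run_paragraph(req)  # uncomment against a live Zeppelin instance
```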
I am trying to upload big JSON records to BigQuery: records of about 1.5 MB each, with a complex schema nested up to seven levels deep.
For simplicity, I started by loading a file with a single record on one line.
At first I tried to have BigQuery autodetect my schema, but that resulted in a table that was unresponsive and could not be queried, although it claimed to contain at least one record.
Then, assuming my schema might be too hard for the loader to infer, I wrote the schema myself and tried to load my single-record file.
At first I got a simple error saying just "invalid":
bq load --source_format=NEWLINE_DELIMITED_JSON invq_data.test_table my_single_json_record_file
Upload complete.
Waiting on bqjob_r5a4ce64904bbba9d_0000015e14aba735_1 ... (3s) Current status: DONE
BigQuery error in load operation: Error processing job 'invq-test:bqjob_r5a4ce64904bbba9d_0000015e14aba735_1': JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
Checking the job's error details gave me just the following:
"status": {
"errorResult": {
"location": "file-00000000",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
},
"errors": [
{
"location": "file-00000000",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
}
],
"state": "DONE"
},
Then, after a couple more attempts at creating new tables, the load actually started to succeed on the command line without reporting errors:
bq load --max_bad_records=1 --source_format=NEWLINE_DELIMITED_JSON invq_data.test_table_4 my_single_json_record_file
Upload complete.
Waiting on bqjob_r368f1dff98600a4b_0000015e14b43dd5_1 ... (16s) Current status: DONE
with no error from the status checker:
"statistics": {
"creationTime": "1503585955356",
"endTime": "1503585973623",
"load": {
"badRecords": "0",
"inputFileBytes": "1494390",
"inputFiles": "1",
"outputBytes": "0",
"outputRows": "0"
},
"startTime": "1503585955723"
},
"status": {
"state": "DONE"
},
But no actual records are added to my tables.
I tried the same from the web UI, with the same result: the job completes green, but no records are added.
Is there anything else I can do to check where the data is going? Maybe some more logs?
I can imagine I might be at the edge of the 2 MB JSON row size limit, but if so, shouldn't this be reported as an error?
Thanks in advance for the help!!
EDIT:
It turned out the complexity of my schema was the devil here.
My JSON files were valid, but my complex schema had several errors.
I had to simplify the schema anyway: I got a new batch of data where single JSON instances were more than 30 MB, and I had to restructure the data in a more relational way, with smaller rows to insert into the database.
Funnily enough, once the schema was split across multiple entities (i.e. simplified), the actual errors/inconsistencies in the schema started to show up in the returned errors, and they were easier to fix. (Mostly it was new, undocumented nested data I was not aware of anyway... but still my bad.)
The lesson here is that when a table schema is too complex (I didn't experiment to find precisely how complex is too complex), BigQuery just hides behind "too many errors to show".
At that point you should consider simplifying the schema (and structure) of your data.
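For this kind of silent failure, a cheap pre-flight check of the newline-delimited file can surface invalid JSON and oversized rows before the loader swallows the detail. A sketch (the 2 MB default below echoes the row-size limit mentioned above; check current quotas before relying on it):

```python
import json

def check_ndjson(path, max_row_bytes=2 * 1024 * 1024):
    """Yield (line_number, problem) for rows that exceed the byte
    limit or fail to parse as JSON."""
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            raw = line.encode("utf-8")
            if len(raw) > max_row_bytes:
                yield lineno, f"row is {len(raw)} bytes (limit {max_row_bytes})"
            try:
                json.loads(line)
            except ValueError as exc:
                yield lineno, f"invalid JSON: {exc}"

# for lineno, problem in check_ndjson("my_single_json_record_file"):
#     print(lineno, problem)
```

This does not validate the file against your table schema, but it rules out the two failure modes that bq reports least helpfully.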
I'm trying to insert data into BigQuery using the BigQuery Api C# Sdk.
I created a new Job with Json Newline Delimited data.
When I load:
100 lines of input: OK
250 lines of input: OK
500 lines of input: KO (fails)
2,500 lines of input: KO (fails)
The error encountered is:
"status": {
"state": "DONE",
"errorResult": {
"reason": "invalid",
"message": "Too many errors encountered. Limit is: 0."
},
"errors": [
{
"reason": "internalError",
"location": "File: 0",
"message": "Unexpected. Please try again."
},
{
"reason": "invalid",
"message": "Too many errors encountered. Limit is: 0."
}
]
}
The same file loads fine when I use the bq command-line tool with:
bq load --source_format=NEWLINE_DELIMITED_JSON dataset.datatable pathToJsonFile
Something seems to be wrong on the server side, or maybe in how I transmit the file, but I cannot get any more detail than "internal server error".
Does anyone have more information on this?
Thank you.
"Unexpected. Please try again." could either indicate that the contents of the files you provided had unexpected characters, or it could mean that an unexpected internal server condition occurred. There are several questions which might help shed some light on this:
does this consistently happen no matter how many times you retry?
does this directly depend on the lines in the file, or can you construct a simple upload file which doesn't trigger the error condition?
One option to potentially avoid these problems is to send the load job request with configuration.load.maxBadRecords higher than zero.
Feel free to comment with more info and I can maybe update this answer.
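For illustration, the suggested setting expressed as a load-job configuration body (a sketch of the REST request shape; the project, dataset, table, and the chosen limit of 10 are placeholders):

```python
# Sketch of a load-job request body with maxBadRecords raised above zero,
# as suggested above. Project/dataset/table IDs are placeholders.
job_body = {
    "configuration": {
        "load": {
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "maxBadRecords": 10,  # tolerate up to 10 bad rows instead of 0
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
        }
    }
}
```

With a non-zero limit, a handful of problematic rows no longer fails the whole job, and the per-row errors reported in the job status point at the offending lines.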
Streaming data into BigQuery keeps failing due to the following error, which occurs more frequently recently:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
{
"code" : 503,
"errors" : [ {
"domain" : "global",
"message" : "Connection error. Please try again.",
"reason" : "backendError"
} ],
"message" : "Connection error. Please try again."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:312)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1049)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
Relevant question references:
Getting high rate of 503 errors with BigQuery Streaming API
BigQuery - BackEnd error when loading from JAVA API
We (the BigQuery team) are looking into your report of increased connection errors. According to internal monitoring, there hasn't been a global spike in connection errors in the last several days. However, that doesn't mean your tables, specifically, weren't affected.
Connection errors can be tricky to chase down, because they can be caused by failures before requests reach the BigQuery servers or after responses leave them. The more information you can provide, the easier it is for us to diagnose the issue.
The best practice for streaming input is to handle temporary errors like this by retrying the request. It can be a little tricky, since when you get a connection error you don't actually know whether the insert succeeded. If you include a unique insertId with your data (see the documentation here), you can safely resend the request (within the deduplication window, which I think is 15 minutes) without worrying that the same row will be added multiple times.
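That retry-with-insertId pattern can be sketched as follows (insert_fn is a stand-in for the actual tabledata.insertAll call; the key point is that the insertIds are generated once, outside the retry loop, so every retry resends the same IDs and the server can deduplicate):

```python
import time
import uuid

def insert_with_retry(insert_fn, rows, max_attempts=5, base_delay=1.0):
    """Retry a streaming insert on connection errors with exponential
    backoff. insert_fn stands in for the real insertAll transport call.
    Each row carries a stable insertId so retried requests inside the
    deduplication window do not double-insert."""
    payload = [{"insertId": str(uuid.uuid4()), "json": row} for row in rows]
    for attempt in range(max_attempts):
        try:
            return insert_fn(payload)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Note that building the payload before the loop is what makes the retry safe; regenerating insertIds per attempt would defeat the deduplication.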