Per row size limits in BigQuery data? - google-bigquery

Is there a limit to the amount of data that can be put in a single row in BigQuery? Is there a limit on the size of a single column entry (cell)? (in bytes)
Is there a limitation when importing from Cloud Storage?

The largest single row allowed is 1 MB for CSV and 2 MB for JSON. There is no separate limit on individual field (cell) sizes, but each field obviously has to fit within the row size limit.
These limits are described here.
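For reference, a minimal sketch of loading newline-delimited JSON from Cloud Storage with the Python client; the bucket, dataset, and table names below are placeholders, and the load fails with a row-size error if any row exceeds the limits above:

    # Hypothetical sketch: load newline-delimited JSON from Cloud Storage into BigQuery.
    # Bucket, dataset, and table names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    )

    load_job = client.load_table_from_uri(
        "gs://my-bucket/data/*.json",      # placeholder GCS path
        "my_project.my_dataset.my_table",  # placeholder table ID
        job_config=job_config,
    )
    load_job.result()  # waits for the job; raises if a row exceeds the size limit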

Related

Why is Spark reading more data than I expect when using a read schema?

In my Spark job, I'm reading a huge Parquet table with more than 30 columns. To limit the amount of data read, I specify a schema with only one column (the only one I need). Unfortunately, the Spark UI reports that the size of files read equals 1123.8 GiB, while the total filesystem read data size equals 417.0 GiB. I was expecting that if I read one of the 30 columns, the total filesystem read size would be around 1/30 of the initial size, not almost half.
Could you explain why this is happening?
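For comparison, a minimal PySpark sketch of the two usual ways to restrict the columns read from Parquet; the path and column names are placeholders, and this only illustrates column pruning, not an explanation of the byte counts reported above:

    # Hypothetical sketch: two ways to limit the columns Spark reads from Parquet.
    # Paths and column names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("column-pruning-sketch").getOrCreate()

    # Option 1: supply an explicit read schema containing only the needed column.
    schema = StructType([StructField("needed_col", StringType())])
    df_schema = spark.read.schema(schema).parquet("gs://my-bucket/huge_table/")

    # Option 2: read normally and select the column; Spark pushes the pruning
    # down to the Parquet reader, so only that column's chunks should be scanned.
    df_select = spark.read.parquet("gs://my-bucket/huge_table/").select("needed_col")

    df_select.show(5)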

Google BigQuery Payload size limit of 10485760 bytes

We encountered an error while trying to stream data into a BigQuery table: "payload size limit of 10485760 bytes". Does anyone have any idea what causes this? According to the third-party integration vendor we use to move data from SQL Server to BigQuery, it is an issue on the BigQuery side.
Thanks.
Best regards,
BigQuery has maximum limits and quota policies, as you can see here.
The limitations for Streaming are:
If you do not populate the insertId field when you insert rows:
Maximum rows per second: 1,000,000
Maximum bytes per second: 1 GB
If you populate the insertId field when you insert rows:
Maximum rows per second: 100,000
Maximum bytes per second: 100 MB
The following additional streaming quotas apply whether or not you populate the insertId field:
Maximum row size: 1 MB
HTTP request size limit: 10 MB
Maximum rows per request: 10,000
Maximum insertId field length: 128 characters
I hope this helps.
Indeed, the streaming limit is 10 MB per request.
The row size limit is 1 MB, according to https://cloud.google.com/bigquery/quotas
What you need to do is parallelize the streaming jobs; BigQuery supports up to 1,000,000 rows per second.
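A minimal sketch of keeping each streaming request under the per-request limit by chunking rows with the Python client; the table ID is a placeholder and 500 rows per request is just an illustrative batch size, not a recommendation:

    # Hypothetical sketch: stream rows in batches so each HTTP request stays
    # well under the 10 MB request limit. Table ID and batch size are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my_project.my_dataset.my_table"  # placeholder

    def stream_in_batches(rows, batch_size=500):
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            errors = client.insert_rows_json(table_id, batch)
            if errors:
                raise RuntimeError(f"Streaming insert failed: {errors}")

    rows = [{"id": i, "payload": "example"} for i in range(10_000)]
    stream_in_batches(rows)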

BQ Load error : Avro parsing error in position 893786302. Size of data block 27406834 is larger than the maximum allowed value 16777216

To BigQuery experts,
I am working on a process that requires us to represent a customer's shopping history by concatenating the last 12 months of transactions into a single column, for Solr faceting using prefixes.
While trying to load this data into BigQuery, we get the row-limit-exceeded error below. Is there any way to get around this? The actual tuple size is around 64 MB, whereas the Avro limit is 16 MB.
[ ~]$ bq load --source_format=AVRO --allow_quoted_newlines --max_bad_records=10 "syw-dw-prod":"MAP_ETL_STG.mde_golden_tbl" "gs://data/final/tbl1/tbl/part-m-00005.avro"
Waiting on bqjob_r7e84784c187b9a6f_0000015ee7349c47_1 ... (5s) Current status: DONE
BigQuery error in load operation: Error processing job 'syw-dw-prod:bqjob_r7e84784c187b9a6f_0000015ee7349c47_1': Avro parsing error in position 893786302. Size of data
block 27406834 is larger than the maximum allowed value 16777216.
Update: This is no longer true, the limit has been lifted.
BigQuery's limit on a loaded Avro file's block size is 16 MB (https://cloud.google.com/bigquery/quotas#import). Unless each individual row is actually greater than 16 MB, you should be able to split the rows across more blocks to stay within the 16 MB block limit. Using a compression codec may also reduce the block size.
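If you control the Avro writer, one way to shrink the blocks is to rewrite the file with a smaller sync interval and a compression codec. A sketch using fastavro; the file paths are placeholders and the exact sync_interval is a tuning knob, not a prescription:

    # Hypothetical sketch: rewrite an Avro file with smaller data blocks using fastavro.
    # File paths are placeholders; sync_interval controls roughly how many bytes of
    # records accumulate in each block before a sync marker is written.
    from fastavro import reader, writer

    with open("part-m-00005.avro", "rb") as src:
        avro_reader = reader(src)
        schema = avro_reader.writer_schema
        records = list(avro_reader)

    with open("part-m-00005-small-blocks.avro", "wb") as dst:
        writer(
            dst,
            schema,
            records,
            codec="deflate",         # compression can further reduce block size
            sync_interval=4_000_000  # ~4 MB per block, well under the 16 MB limit
        )

Note that a single record larger than 16 MB still ends up in one oversized block, which is why the answer above only applies when individual rows fit within the limit.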

Best Firebird blob size page size relation

I have a small Firebird 2.5 database with a blob field called "note" declared as this:
BLOB SUB_TYPE 1 SEGMENT SIZE 80 CHARACTER SET UTF8
The database page size is:
16,384 (which I suspect is too high)
I ran this select in order to find the average size of the existing blob fields:
select avg(octet_length(items.note)) from items
and got this result:
2,671
As a beginner, I would like to know the best segment size for this blob field and the best database page size, in your opinion (I know this depends on other factors, but I still don't know how to figure it out).
Blobs in Firebird are stored in separate pages of your database. The exact storage format depends on the size of your blob. As described in Blob Internal Storage:
Blobs are created as part of a data row, but because a blob could be of unlimited length, what is actually stored with the data row is a blobid, the data for the blob is stored separately on special blob pages elsewhere in the database.
[..]
A blob page stores data for a blob. For large blobs, the blob page could actually be a blob pointer page, i.e. be used to store pointers to other blob pages. For each blob that is created a blob record is defined, the blob record contains the location of the blob data, and some information about the blobs contents that will be useful to the engine when it is trying to retrieve the blob. The blob data could be stored in three slightly different ways. The storage mechanism is determined by the size of the blob, and is identified by its level number (0, 1 or 2). All blobs are initially created as level 0, but will be transformed to level 1 or 2 as their size increases.
A level 0 blob is a blob that can fit on the same page as the blob header record, for a data page of 4096 bytes, this would be a blob of approximately 4052 bytes (Page overhead - slot - blob record header).
In other words, if your average blob size is 2671 bytes (and most larger ones are still smaller than roughly 4000 bytes), then a page size of 4096 is likely optimal, as it will reduce the wasted space from on average 16340 - 2671 = 13669 bytes to 4052 - 2671 = 1381 bytes.
However, for performance itself this is likely hardly going to matter, and smaller page sizes have other effects that you will need to take into account. For example, a smaller page size will also reduce the maximum size of a CHAR/VARCHAR index key, indexes might become deeper (more levels), and fewer records fit in a single page (or wider records become split across multiple pages).
Without measuring and testing it is hard to say if using 4096 for the page size is the right size for your database.
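To make the wasted-space arithmetic above concrete, here is a rough sketch; the ~44-byte per-page overhead is inferred from the 4096 vs. 4052 figures in the quoted documentation and is only an approximation:

    # Rough sketch of the wasted-space estimate for level-0 blobs, assuming the
    # per-page overhead implied by the quote above (4096 - 4052 = 44 bytes).
    PAGE_OVERHEAD = 44  # approximate: page overhead + slot + blob record header

    def wasted_bytes(page_size, avg_blob_size):
        capacity = page_size - PAGE_OVERHEAD  # usable bytes for a level-0 blob
        return capacity - avg_blob_size

    for page_size in (4096, 8192, 16384):
        print(page_size, wasted_bytes(page_size, 2671))
    # 4096  -> 1381 wasted bytes on average
    # 8192  -> 5477
    # 16384 -> 13669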
As to segment size: it is a historical artifact that is best ignored (and left off). Sometimes applications or drivers incorrectly assume that blobs need to be written or read in the specified segment size; in those rare cases, specifying a larger segment size might improve performance. If you leave it off, Firebird will default to a value of 80.
From Binary Data Types:
Segment Size: Specifying the BLOB segment is throwback to times past, when applications for working with BLOB data were written in C (Embedded SQL) with the help of the gpre pre-compiler. Nowadays, it is effectively irrelevant. The segment size for BLOB data is determined by the client side and is usually larger than the data page size, in any case.

What is BigQuery's maximum row size?

When trying to load data into a BigQuery table, I get an error telling me a row is larger than the maximum allowed size. I could not find this limitation anywhere in the documentation. What is the limit, and is there a workaround?
The file is compressed JSON and is 360 MB.
2018 update: 100 MB maximum row size. https://cloud.google.com/bigquery/quotas
Original answer: the maximum row size is 64 KB. See: https://developers.google.com/bigquery/docs/import#import
The limit for JSON will likely increase soon.
2013 update: the maximum row size is 1 MB, and 20 MB for JSON.
See: https://developers.google.com/bigquery/preparing-data-for-bigquery#dataformats
2017 update: 10 MB for CSV & JSON. https://cloud.google.com/bigquery/quotas#import
Except 1 MB when streaming: https://cloud.google.com/bigquery/quotas#streaminginserts
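As a workaround, one option is to find and trim the offending rows before loading. A hedged sketch that scans a gzipped newline-delimited JSON file and reports rows above a chosen limit; the file name is a placeholder and the 100 MB threshold simply matches the most recent figure above:

    # Hypothetical sketch: find rows in a gzipped newline-delimited JSON file
    # that exceed a given serialized size. File name and threshold are placeholders.
    import gzip

    LIMIT_BYTES = 100 * 1024 * 1024  # e.g. the 100 MB row limit cited above

    with gzip.open("data.json.gz", "rb") as f:
        for line_no, line in enumerate(f, start=1):
            size = len(line)
            if size > LIMIT_BYTES:
                print(f"row {line_no}: {size} bytes exceeds the limit")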