BigQuery's query is extremely slow - google-bigquery

I have a table with 1.6 billion rows. I am running a query that groups by a field with over 5 million unique values, sorts by the sum of another integer column in descending order, and returns only the top 10. After more than an hour, the query is still stuck in the running state.
I created this big table using "bq cp -a ". The source tables were themselves created with "bq cp" from 1000 smaller tables, and each of those tables was loaded from over 12 compressed CSV load files.
I searched related questions and found "Google BigQuery is running queries slowly", which mentions slowness caused by fragmentation from many small ingestions. Does my approach to data ingestion count as loading data in pieces that are too small, and did that cause fragmentation?
Is it possible that 5 million unique values is too many and that this is the root cause of the slow response?

We had a latency spike yesterday, and a smaller one today. Can you give the project id and job ids of the query jobs that took longer than you expected?

Related

BigQuery extremely slow for small table queries with many WHERE IN parameters

I'd like to understand why a query such as the one below on an 8mb table in BigQuery takes 10-15 seconds.
If I remove most of the WHERE IN parameters it performs as expected.
When comparing performance locally vs BigQuery with all the WHERE IN parameters it's about 20 times faster on my old netbook than it is on BigQuery.
SELECT id
FROM `example.example.emails`
WHERE id IN (104281,100567,100701,100941,101087,101157,101337,101391,101513,101831,101967,101973,102111,102403,102613,102653,102671,102843,103597,103613,103623,103729,104145,104163,104315,104325,104537,105165,105179,105257,105275,105291,105351,105363,105401,105797,105811,105893,106007,106315,106329,106363,106743,106829,107213,107371,107499,107501,107561,107677,107759,107873,107899,108029,108061,108093,108153,108537,108561,108599,108839,108861,108863,108895,108907,109011,109113,109227,109845,109931,110061,110221,110227,110267,110325,110471,110521,110553,110685,110977,110989,111179,111249,111291,111733,112015,112107,112139,112141,112431,112461,112641,112835,112971,113101,113123,113227,113281,113449,113549,113609,113661,113685,113905,113961,114015,114119,114427,114459,114609,114713,114839,114973,115187,115265,115453,115491,115497,115911,115915,116099,116497,116553,116825,116869,117005,117059,117089,117105,117235,117423,117597,117787,117929,118147,118297,118535,118573,118643,118841,118987,119147,119195,119273,119335,119389,119469,119555,119997,120369,120761,120875,120977,121029,121063,121107,121127,121155,121159,121187,121251,121265,121317,121337,121339,121363,121371,121427,121457,121467,121637,121683,121701,121827,121915,121949,122119,122441,122619,122663,122669,122737,122867,122919,122999,123121,123139,123257,123655,123741,123889,124123,124145,124221,124309,124833,124837,124909,124925,124991,125015,125019,125073,125391,125467,125725,125743,125827,125851,125895,126033,126069,126107,126593,126687,127005,127017,127059,127235,127473,128137,128201,128267,128323,128417,128539,128741,128751,129049,129245,129325,129691,129877,130329,130337,130769,131231,131317,131389,131413,131505,131591,131725,131961,131971,132081,132155,132277,132297,132299,132359,132361,132363,132365,132367,132375,132599,132613,132625,132635,132721,132747,132925,132947,133401,133493,133525,134283,134787,134801,134803,134807,134837,134845,135161,135195,135199,135271,135283,135285,135303,135311,135347,135361
,135707,135869,135889,135913,136155,136205,136523,136621,136751,136757,136845,136875,136877,137127,137143,137155,137509,137623,137637,137693,137719,137739,138013,138279,138343,138553,138731,138761,138847,138929,139413,139489,139507,139995,140031,140105,140725,140813,140867,140883,140917,140993,141109,141135,141197,141265,141721,141729,141763,141849,141907,141939,141971,141973,142033,14244,142779,142797,14290,142931,143029,143153,143403,143639,143707,143711,143713,143737,143739,143745,143773,143779,143783,143791,143793,143807,143827,143829,143833,143849,143851,143853,143855,143863,143883,143887,143889,143967,144219,144311,14445,144569,144745,145357,145439,145545,145679,145681,145757,146133,146391,146407,146475,146605,146751,146989,147089,147161,147381,147543,147679,147693,14771,14783,147889,147969,14817,148533,14861,148761,148781,149101,149157,149231,149331,149333,14936,149517,149525,149725,149745,149857,15001,150079,150139,150167,150259,150345,150697,15084,151037,151115,151341,151399,15140,151423,151431,151503,151513,151543,151603,151607,151617,152079,15228,152345,152413,152529,152635,152805,152889,152901,152933,152955,152965,152977,153003,153099,153165,153189,153277,153533,153749,15401,154307,15436,154419,154667,154687,15471,154863,154883,15506,155115,155215,155321,155467,155491,155529,155549,155697,155923,155971,156279,156311,156315,156319,156389,156525,156535,156539,156617,156643,156685,156689,156695,156715,15689,157009,157551,157607,157765,157839,157945,157997,158015,15805,158059,15825,15827,158393,158413,15850,158537,158641,158669,15874,158889,159141,159455,159565,159637,15965,159795,159821,159843,160063,160095,160105,160355,160443,160535,160537,160549,16083,16109,161173,161201,161371,161563,162009,162123,162175,162281,162513,162661,162675,162765,162801,16304,163043,16345,164021,16444,164659,164831,164967,165541,165551,165563,165631,165639,165755,165883,165959,165985,165991,166009,166027,166029,166407,166591,166727,166769,166809,166847,166973,167125,167459,1675
51,167573,167655,16791,16797,168371,16856,168981,169729,169743,169865,16992,170079,170149,17050,170559,170587,170847,17098,171047,171199,17122,171253,171281,171379,171543,17157,171593,171713,171837,172005,172015,172041,17223,17240,17248,172503,172709,172953,173423,173437,173497,173527,173555,173655,173693,173711,173743,173765,173805,173823,173825,173897,174007,174225,174261,174337,174447,174599,17472,174805,174853,174899,175263,175339,17540,175489,17550,175607,175655,175813,175853,176213,176247,176355,176423,176429,176561,176683,176851,176857,17688,177021,17722,17726,177431,177543,177797,178135,178235,178409,178677,178725,178855,178907,178981,179053,17906,179507,179557,179625,179857,180249,180269,181049,181067,181097,181305,181429,181499,181565,181701,181771,182017,182169,182321,182689,18276,183061,183261,183589,183895,183897,183947,184091,184171,184191,184627,184635,184653,184659,184797,184907,185291,185905,185981,186513,186519,186541,186567,186665,18676,186799,187243,187247,18752,187585,187591,187617,187941,187949,188123,188281,18831,188581,188587,188607,188743,188965,189221,189393,189513,18969,189743,189787,189869,190249,190539,190557,190769,190777,190801,190903,190949,191359,191417,191447,191561,191819,191823,191873,191879,191975,192001,192111,19231,192497,192617,192737,192851,192909,19297,19300,193069,193095,193155,19328,193459,193645,193971,193979,194027,194053,194657,194837,195225,19527,195271,195321,195503,196013,196107,196673,196843,196863,196865,196959,197043,197261,197489,197569,197765,198085,198135,198233,198399,198513,19887,20163,20214,20222,20250,20346,20396,20468,20632,20664,20679,20736,20764,20766,20780,20789,20809,20813,20847,20861,20899,20906,20914,20951,21071,21107,21132,21171,21176,21348,21428,21475,21631,21937,22043,22107,22241,22297,22729,22750,22755,22787,22814,22865,23034,23173,23206,23260,23276,23311,23350,23355,23395,23420,23445,23545,23726,23839,23899,23907,24402,24619,24643,24692,24712,24781,24786,25098,25112,25236,25266,25393,25428,25439
,25502,25512,25553,25734,25735,25753,25789,25800,25814,25960,26007,26068,26083,26084,26085,26152,26164,26212,26231,26260,26283,26376,26411,26617,26652,26664,26741,26758,26773,26790,26885,26949,26976,27100,27193,27287,27325,27400,27467,27528,27587,27714,27814,27840,27911,28020,28102,28239,28246,28262,28279,28306,28316,28364,28387,28447,28533,28555,28602,28709,28824,28870,29031,29076,29095,29119,29159,29161,29331,29340,29420,29464,29532,29643,29781,29898,30017,30020,30033,30084,30123,30124,30140,30150,30158,30176,30216,30249,30276,30305,30343,30350,30397,30403,30581,30595,30731,30787,30793,30842,31063,31153,31182,31186,31193,31247,31390,31437,31460,31548,31559,31683,31703,31722,31736,31837,31918,31977,32006,32021,32142,32247,32262,32283,32297,32442,32532,32591,32607,32662,32687,32724,32764,32821,32904,32905,32921,32928,32993,33172,33185,33240,33261,33299,33303,33334,33336,33341,33452,33490,33539,33584,33627,33635,33698,33724,33740,33945,33964,33971,33982,34082,34211,34242,34244,34390,34397,34402,34438,34471,34654,34684,34759,34886,34920,34978,34981,35015,35068,35165,35175,35276,35390,35438,35486,35628,35668,35777,35819,35863,36008,36035,36062,36070,36130,36157,36170,36190,36336,36413,36420,36450,36469,36475,36575,36608,36612,36651,36825,36826,36875,37008,37081,37106,37124,37213,37326,37402,37688,37898,38020,38079,38123,38170,38198,38241,38646,38883,38990,39030,39050,39126,39141,39165,39300,39316,39355,39387,39431,39447,39480,39481,39505,39559,39642,39717,39776,39783,39787,39896,40006,40041,40055,40138,40217,40243,40256,40286,40299,40313,40344,40391,40419,40452,40547,40560,40564,40670,40699,40726,40756,40825,40897,40948,41024,41078,41091,41131,41276,41283,41316,41336,41471,41476,41506,41677,41698,41771,41835,41880,42104,42248,42251,42292,42316,42324,42430,42439,42443,42444,42460,42517,42530,42607,42668,42688,42750,42755,42792,42846,42951,42987,43077,43212,43313,43327,43471,43500,43550,43577,43582,43599,43654,43734,43774,43788,43810,43857,43859,43914,44211,44408,44503,4
4644,44823,44839,44852,44912,44939,44943,44981,44997,45089,45137,45186,45189,45455,45586,45670,45785,45864,45903,46001,46032,46139,46147,46294,46392,46508,46518,46545,46548,46575,46837,46840,46842,46860,46890,46907,46935,47000,47066,47180,47211,47216,47217,47251,47281,47660,47698,47879,48166,48207,48381,48415,48466,48541,48567,48674,48765,48779,48799,48824,48893,49018,49052,49115,49326,49331,49342,49345,49439,49468,49729,49743,49750,49789,49977,50038,50080,50148,50313,50315,50325,50374,50388,50410,50414,50430,50573,50696,50743,50769,50795,50803,50858,50891,50895,50964,50990,51010,51082,51114,51132,51152,51291,51324,51366,51575,51875,51955,51961,52028,52071,52184,52236,52239,52277,52502,52666,52710,52772,52783,52855,52933,52959,52974,53021,53053,53098,53182,53259,53320,53324,53416,53475,53660,53662,53676,53706,53787,53921,53995,54063,54077,54190,54256,54263,54374,54377,54408,54554,54563,54570,54586,54590,54607,54983,55014,55079,55101,55152,55248,55264,55275,55282,55290,55351,55453,55510,55635,55649,55760,55773,55788,55968,56208,56242,56358,56447,56459,56529,56536,56537,56619,56704,56736,56942,56991,57009,57064,57150,57152,57153,57216,57241,57372,57435,57535,57593,57792,57816,57854,57903,57928,57934,57945,58209,58214,58272,58273,58307,58320,58325,58461,58467,58512,58522,58593,58604,58683,58758,58938,58969,59074,59202,59220,59271,59278,59286,59297,59324,59341,59377,59378,59406,59453,59458,59468,59505,59515,59519,59527,59553,59568,59594,59627,59651,59919,59920,59979,60013,60038,60163,60360,60390,60414,60489,60506,60606,60708,60718,60720,60805,60885,60934,60935,60941,60981,61030,61042,61206,61303,61344,61516,61537,61709,61725,61743,62167,62293,62343,62514,62630,62634,62708,62763,62771,63053,63076,63183,63411,63483,63493,63542,63583,63631,63695,63700,63723,63741,63762,63772,63846,63872,63893,63911,63947,63963,63971,63986,64076,64082,64106,64126,64211,64232,64249,64461,64473,64482,64483,64583,64594,64664,64733,64760,64785,64866,64873,64974,65009,65010,65043,65089,65093,652
17,65259,65342,65397,65489,65539,65682,66261,66265,66272,66436,66634,66751,67165,67252,67363,67472,67487,67626,67856,67901,67914,68085,68117,68402,68404,68623,68674,68757,68770,68794,68834,68845,68867,68903,69107,69175,69209,69247,69315,69486,69652,69755,69801,69809,70027,70239,70299,70361,70782,70834,70851,70884,70952,71112,71139,71154,71411,71557,71615,71711,71718,71806,71871,71964,71986,72066,72083,72603,72709,72999,73063,73287,73293,73406,73606,73726,73728,73734,73766,73778,73899,74326,74421,74727,74835,74859,75047,75127,75207,75583,76031,76061,76069,76217,76259,76333,76423,76553,76679,76687,76699,76723,76847,76877,76925,76951,77192,77299,77307,77411,77681,77819,77847,77933,77981,78005,78099,78431,78745,78955,79009,79093,79121,79159,79165,79239,79403,79787,79909,80001,80037,80069,80279,80449,80521,80717,80849,80941,80997,81006,81125,81265,81267,81355,81409,81560,81609,81717,81723,81935,81953,82019,82331,82461,82487,82631,82655,82839,82969,83001,83061,83137,83331,83413,83437,83603,83665,83929,83937,83941,83947,83949,83953,83987,84031,84093,84127,84141,84171,84185,84189,84191,84201,84309,84331,84343,84361,84383,84435,84439,84489,84501,84523,84535,84573,84613,84665,84779,84833,84932,84991,85015,85347,85367,85445,85495,85559,85571,85797,85807,85973,86157,86203,86223,86275,86447,86523,86615,86897,86915,87039,87103,87281,87475,87795,87873,89075,89197,89205,89241,89251,89261,89695,89721,89741,89783,90325,90425,90845,90861,90915,90995,91009,91091,91131,91165,91169,91197,91201,91233,91307,91429,91477,91479,91491,91511,91567,91649,91999,92339,92457,92617,92759,92765,92907,93035,93417,93525,93691,93763,93805,93917,94043,94051,94193,94303,94847,95041,95433,95475,95545,95557,95645,95749,95885,96153,96211,96417,96479,96567,96639,96651,96673,96773,96791,96853,96921,96943,97053,97073,97495,97597,97625,97681,97697,97727,98241,98377,98519,98817,98895,98937,98991,98993,99007,99067,99555,99569,99643,99647,99657,99781,137953)
I don't believe my table to be anything unusual.
Just in case anyone asks why I'm running BigQuery on an 8mb table...this 8mb table is part of a data set that contains much larger tables 1TB+ and is far easier for me/my application if it's all in one place.
EDIT
Here is the execution data for a similar query with 3140 INs on a 14mb table. This takes 50 seconds.
BigQuery team flagged it as a bug. The workaround is to use WHERE id IN UNNEST([1,2,3,...]) instead of WHERE id IN (1,2,3,...).
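The workaround above can be applied when the query text is generated by an application. The following is a minimal sketch, not an official client API: the helper name build_in_unnest_query is hypothetical, and it only builds the SQL string in the IN UNNEST([...]) form described in the answer.

```python
from typing import Iterable


def build_in_unnest_query(table: str, column: str, ids: Iterable[int]) -> str:
    """Return a standard SQL query that filters with IN UNNEST([...]).

    Hypothetical helper for illustration; it only formats the query text.
    """
    # int() ensures only integer literals are spliced into the SQL text.
    id_list = ", ".join(str(int(i)) for i in ids)
    return (
        f"SELECT {column}\n"
        f"FROM `{table}`\n"
        f"WHERE {column} IN UNNEST([{id_list}])"
    )


# Table and column names are taken from the question; the ids are its first three.
query = build_in_unnest_query("example.example.emails", "id", [104281, 100567, 100701])
print(query)
```

The table and column names are formatted into the string as-is, so they should come from trusted code rather than user input.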

Bigquery Tier 1 exceeded for partitioned table but not for by day tables

We have two tables in bigquery: one is large (a couple billion rows) and on a 'table-per-day' basis, the other is date partitioned, has the exact same schema but a small subset of the rows (~100 million rows).
I wanted to run a (standard-sql) query with a subselect in form of a join (same when subselect is in the where clause) on the small partitioned dataset. I was not able to run it because tier 1 was exceeded.
If I run the same query on the big dataset (that contains the data I need and a lot of other data) it runs fine.
I do not understand the reason for this.
Is it because:
1. Partitioned tables need more resources to query
2. BigQuery has some internal rules that the ratio of data processed to resources needed must meet a certain threshold, i.e. I was not paying enough when I queried the small dataset given the amount of resources I needed.
If 1. is true, we could simply make the small dataset also be on a 'table-per-day' basis in order to solve the issue. But before we do that though we would like to know if it is really going to solve our problem.
Details on the queries:
Big dataset
queries 11 GB, runs 50 secs, Job Id remilon-study:bquijob_2adced71_15bf9a59b0a
Small dataset
Job Id remilon-study:bquijob_5b09f646_15bf9acd941
I'm an engineer on BigQuery and I just took a look at your jobs but it looks like your second query has an additional filter with a nested clause that your first query does not. It is likely that that extra processing is making your query exceed your tier. I would recommend running the queries in the BigQuery UI and looking at the Explanation tab to see how the queries differ in the query plan.
If you try running the exact same query (modifying only the partition syntax) for both tables and still get the same error I would recommend filing a bug.

Loading huge flatfiles to SQL table is too slow via SSIS package

I receive about 8 huge delimited flat files to be loaded into a SQL Server (2012) table once every week. The total number of rows across all the files is about 150 million, and each file has a different number of rows. I have a simple SSIS package which loads data from the flat files (using a foreach container) into a history table. A select query then runs on this history table to select the current week's data and load it into a staging table.
We ran into problems as the history table grew very large (8 billion rows), so I decided to back up the data in the history table and truncate it. Before truncation, the package execution time grew from 15 hrs to 63 hrs. We hoped that after truncation it would go back to 15 hrs or less. But to my surprise, even after 20+ hours the package is still running. The worst part is that it is still loading the history table. The latest count is around 120 million. It still has to load the staging data, and that might take just as long.
Neither history table nor staging tables have any indexes, which is why select query on the history table used to take most of the execution time. But loading from all the flatfiles to history table was always under 3 hrs.
I hope I'm making sense. Can someone help me understand what could be the reason behind this unusual execution time this week? Thanks.
Note: The biggest file (8 GB) was read at the flat file source in 3 minutes, so I'm thinking the source is not the bottleneck here.
There's no good reason, IMHO, why that server should take that long to load that much data. Are you saying that the process which used to take 3 hours, now takes 60+? Is it the first (data-load) or the second (history-table) portion that has suddenly become slow? Or, both at once?
I think the first thing that I would do is to "trust, but verify" that there are no indexes at play here. The second thing I'd look at is the storage allocation for this tablespace ... is it running out of room, such that the SQL server is having to do a bunch of extra calisthenics to obtain and to maintain storage? How does this process COMMIT? After every row? Can you prove that the package definition has not changed in the slightest, recently?
Obviously, "150 million rows" is not a lot of data, these days; neither is 8GB. If you were "simply" moving those rows into an un-indexed table, "3 hours" would be a generous expectation. Obviously, the only credible root-cause of this kind of behavior is that the disk-I/O load has increased dramatically, and I am healthily suspicious that "excessive COMMITs" might well be part of the cause: re-writing instead of "lazy-writing," re-reading instead of caching.
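To make the "excessive COMMITs" suspicion concrete, here is a small sketch of the difference between committing after every row and committing in batches. It is an illustration only: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in (not SQL Server or SSIS), and the table name, columns, and load_rows helper are hypothetical.

```python
import sqlite3


def load_rows(conn, rows, commit_every):
    """Insert rows, committing once per `commit_every` rows.

    Returns the number of COMMITs issued (hypothetical helper for illustration).
    """
    cur = conn.cursor()
    commits = 0
    for n, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO history (id, payload) VALUES (?, ?)", row)
        if n % commit_every == 0:
            conn.commit()
            commits += 1
    conn.commit()  # final commit covers any trailing partial batch
    return commits + 1


conn = sqlite3.connect(":memory:")  # stand-in database, assumption for the sketch
conn.execute("CREATE TABLE history (id INTEGER, payload TEXT)")
rows = [(i, f"row-{i}") for i in range(10_000)]

per_row = load_rows(conn, rows, commit_every=1)      # one COMMIT per row
batched = load_rows(conn, rows, commit_every=5_000)  # one COMMIT per 5,000 rows
count = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(per_row, batched, count)
```

Both calls load the same rows; only the number of commits differs. On a disk-backed database each commit typically forces a write to stable storage, which is why the commit frequency is worth checking in the package.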

BigQuery performance: Is this correct?

Folks, I'm using BigQuery as a superfast database for my analytics queries, but I'm very disappointed with its performance.
Let me show you the numbers:
Just one Table at "from" clause
Select about 15 fields with group by each, about 5 fields with SUM()
Total table rows: 3.7 million
Total rows returned: 830K
When I execute this query on BigQuery's console, it takes about 1 minute to process. Is this ok for you? I was expecting that it will return in about 2 seconds... If I execute this query on a columnar database, like Sybase IQ, it takes less than 2 seconds.
BigQuery is a highly scalable database first and a "super fast" database second. It's designed to process HUGE amounts of data by distributing the processing among many machines, using a technology named Dremel. Because it's designed for many machines and parallel processing, you should expect super-scalability with good performance, rather than the lowest possible latency on small tables.
For example: analyzing all the wikipedia revisions in 5-10 seconds isn't bad, is it? But even a much smaller table would take about the same time.
Sybase IQ is often installed in a single database and it doesn't use Dremel. That said, it's going to be faster than BigQuery in many scenarios...as designed.
Cheers!
Since you are returning 830k rows and BQ always creates a temporary result table, creating that table takes longer than it would for a small result.
Have you turned on large results?
We are working in a shared environment, and sometimes loads (table creation) take a while.
Certainly the performance differs from a dedicated environment. You can get a dedicated environment for $20K a month.

How big is too big for a PostgreSQL table?

I'm working on the design for a RoR project for my company, and our development team has already run into a bit of a debate about the design, specifically the database.
We have a model called Message that needs to be persisted. It's a very, very small model with only three db columns other than the id, however there will likely be A LOT of these models when we go to production. We're looking at as much as 1,000,000 insertions per day. The models will only ever be searched by two foreign keys on them which can be indexed. As well, the models never have to be deleted, but we also don't have to keep them once they're about three months old.
So, what we're wondering is if implementing this table in Postgres will present a significant performance issue? Does anyone have experience with very large SQL databases to tell us whether or not this will be a problem? And if so, what alternative should we go with?
Rows per table won't be an issue on its own.
So roughly speaking 1 million rows a day for 90 days is 90 million rows. I see no reason Postgres can't deal with that, without knowing all the details of what you are doing.
Depending on your data distribution you can use a mixture of indexes, filtered indexes, and table partitioning of some kind to speed things up once you see what performance issues you may or may not have. Your problem will be the same on any other RDBMS that I know of. If you only need 3 months' worth of data, design in a process to prune off the data you don't need any more. That way you will have a consistent volume of data in the table. You're lucky that you know how much data will exist: test it at your volume and see what you get. Testing one table with 90 million rows may be as easy as:
select x,1 as c2,2 as c3
from generate_series(1,90000000) x;
https://wiki.postgresql.org/wiki/FAQ
Limit                        Value
Maximum Database Size        Unlimited
Maximum Table Size           32 TB
Maximum Row Size             1.6 TB
Maximum Field Size           1 GB
Maximum Rows per Table       Unlimited
Maximum Columns per Table    250 - 1600 depending on column types
Maximum Indexes per Table    Unlimited
Another way to speed up your queries significantly on a table with > 100 million rows is to cluster the table on the index that is most often used in your queries. Do this in your database's "off" hours. We have a table with > 218 million rows and have found 30X improvements.
Also, for a very large table, it's a good idea to create an index on your foreign keys.
EXAMPLE:
Assume we have a table named investment in a database named ccbank.
Assume the index most used in our queries is (bankid,record_date)
Here are the steps to create and cluster an index:
psql -c "drop index investment_bankid_rec_dt_idx;" ccbank
psql -c "create index investment_bankid_rec_dt_idx on investment(bankid, record_date);" ccbank
psql -c "cluster investment using investment_bankid_rec_dt_idx;" ccbank
vacuumdb -d ccbank -z -v -t investment
In steps 1-2 we replace the old index with a new, optimized one. In step 3 we cluster the table: this physically reorders the table's rows to match the index, so rows that are adjacent in the index are also adjacent on disk and a query over a range of index values reads fewer pages. In step 4 we vacuum and analyze the table to refresh the statistics for the query planner.