BigQuery performance: Is this correct? - google-bigquery

Folks, I'm using BigQuery as a superfast database for my analytics queries, but I'm very disappointed with its performance.
Let me show you the numbers:
Just one table in the FROM clause
Select about 15 fields with group by each, about 5 fields with SUM()
Total table rows: 3.7 million
Total rows returned: 830K
When I execute this query in the BigQuery console, it takes about 1 minute to process. Does that seem right to you? I was expecting it to return in about 2 seconds. If I execute this query on a columnar database such as Sybase IQ, it takes less than 2 seconds.

BigQuery is a highly scalable database first, and a "super fast" database second. It's designed to process huge amounts of data by distributing the processing across many machines, using a technology named Dremel. Because it's designed around many machines and parallel processing, you should expect very high scalability with good performance, not the lowest possible latency on small tables.
For example: analyzing all the Wikipedia revisions in 5-10 seconds isn't bad, is it? But even a much smaller table would take about the same time.
Sybase IQ is often installed on a single server and it doesn't use Dremel. That said, it's going to be faster than BigQuery in many scenarios, as designed.
Cheers!

You are returning 830K rows, and BigQuery always writes query results to a temporary result table, so creating that table costs more than it would for a small result.
Have you turned on large results?
BigQuery is a shared environment, and sometimes loads (table creation) take a while.
Performance will certainly differ from a dedicated environment. You can get a dedicated environment for $20K a month.

Related

BigQuery extremely slow for small table queries with many WHERE IN parameters

I'd like to understand why a query such as the one below, on an 8 MB table in BigQuery, takes 10-15 seconds.
If I remove most of the WHERE IN parameters it performs as expected.
With all the WHERE IN parameters in place, the same query runs about 20 times faster locally on my old netbook than it does on BigQuery.
SELECT id
FROM `example.example.emails`
WHERE id IN (104281,100567,100701,100941,101087,101157,101337,101391,101513,101831,101967,101973,102111,102403,102613,102653,102671,102843,103597,103613,103623,103729,104145,104163,104315,104325,104537,105165,105179,105257,105275,105291,105351,105363,105401,105797,105811,105893,106007,106315,106329,106363,106743,106829,107213,107371,107499,107501,107561,107677,107759,107873,107899,108029,108061,108093,108153,108537,108561,108599,108839,108861,108863,108895,108907,109011,109113,109227,109845,109931,110061,110221,110227,110267,110325,110471,110521,110553,110685,110977,110989,111179,111249,111291,111733,112015,112107,112139,112141,112431,112461,112641,112835,112971,113101,113123,113227,113281,113449,113549,113609,113661,113685,113905,113961,114015,114119,114427,114459,114609,114713,114839,114973,115187,115265,115453,115491,115497,115911,115915,116099,116497,116553,116825,116869,117005,117059,117089,117105,117235,117423,117597,117787,117929,118147,118297,118535,118573,118643,118841,118987,119147,119195,119273,119335,119389,119469,119555,119997,120369,120761,120875,120977,121029,121063,121107,121127,121155,121159,121187,121251,121265,121317,121337,121339,121363,121371,121427,121457,121467,121637,121683,121701,121827,121915,121949,122119,122441,122619,122663,122669,122737,122867,122919,122999,123121,123139,123257,123655,123741,123889,124123,124145,124221,124309,124833,124837,124909,124925,124991,125015,125019,125073,125391,125467,125725,125743,125827,125851,125895,126033,126069,126107,126593,126687,127005,127017,127059,127235,127473,128137,128201,128267,128323,128417,128539,128741,128751,129049,129245,129325,129691,129877,130329,130337,130769,131231,131317,131389,131413,131505,131591,131725,131961,131971,132081,132155,132277,132297,132299,132359,132361,132363,132365,132367,132375,132599,132613,132625,132635,132721,132747,132925,132947,133401,133493,133525,134283,134787,134801,134803,134807,134837,134845,135161,135195,135199,135271,135283,135285,135303,135311,135347,135361
,135707,135869,135889,135913,136155,136205,136523,136621,136751,136757,136845,136875,136877,137127,137143,137155,137509,137623,137637,137693,137719,137739,138013,138279,138343,138553,138731,138761,138847,138929,139413,139489,139507,139995,140031,140105,140725,140813,140867,140883,140917,140993,141109,141135,141197,141265,141721,141729,141763,141849,141907,141939,141971,141973,142033,14244,142779,142797,14290,142931,143029,143153,143403,143639,143707,143711,143713,143737,143739,143745,143773,143779,143783,143791,143793,143807,143827,143829,143833,143849,143851,143853,143855,143863,143883,143887,143889,143967,144219,144311,14445,144569,144745,145357,145439,145545,145679,145681,145757,146133,146391,146407,146475,146605,146751,146989,147089,147161,147381,147543,147679,147693,14771,14783,147889,147969,14817,148533,14861,148761,148781,149101,149157,149231,149331,149333,14936,149517,149525,149725,149745,149857,15001,150079,150139,150167,150259,150345,150697,15084,151037,151115,151341,151399,15140,151423,151431,151503,151513,151543,151603,151607,151617,152079,15228,152345,152413,152529,152635,152805,152889,152901,152933,152955,152965,152977,153003,153099,153165,153189,153277,153533,153749,15401,154307,15436,154419,154667,154687,15471,154863,154883,15506,155115,155215,155321,155467,155491,155529,155549,155697,155923,155971,156279,156311,156315,156319,156389,156525,156535,156539,156617,156643,156685,156689,156695,156715,15689,157009,157551,157607,157765,157839,157945,157997,158015,15805,158059,15825,15827,158393,158413,15850,158537,158641,158669,15874,158889,159141,159455,159565,159637,15965,159795,159821,159843,160063,160095,160105,160355,160443,160535,160537,160549,16083,16109,161173,161201,161371,161563,162009,162123,162175,162281,162513,162661,162675,162765,162801,16304,163043,16345,164021,16444,164659,164831,164967,165541,165551,165563,165631,165639,165755,165883,165959,165985,165991,166009,166027,166029,166407,166591,166727,166769,166809,166847,166973,167125,167459,1675
51,167573,167655,16791,16797,168371,16856,168981,169729,169743,169865,16992,170079,170149,17050,170559,170587,170847,17098,171047,171199,17122,171253,171281,171379,171543,17157,171593,171713,171837,172005,172015,172041,17223,17240,17248,172503,172709,172953,173423,173437,173497,173527,173555,173655,173693,173711,173743,173765,173805,173823,173825,173897,174007,174225,174261,174337,174447,174599,17472,174805,174853,174899,175263,175339,17540,175489,17550,175607,175655,175813,175853,176213,176247,176355,176423,176429,176561,176683,176851,176857,17688,177021,17722,17726,177431,177543,177797,178135,178235,178409,178677,178725,178855,178907,178981,179053,17906,179507,179557,179625,179857,180249,180269,181049,181067,181097,181305,181429,181499,181565,181701,181771,182017,182169,182321,182689,18276,183061,183261,183589,183895,183897,183947,184091,184171,184191,184627,184635,184653,184659,184797,184907,185291,185905,185981,186513,186519,186541,186567,186665,18676,186799,187243,187247,18752,187585,187591,187617,187941,187949,188123,188281,18831,188581,188587,188607,188743,188965,189221,189393,189513,18969,189743,189787,189869,190249,190539,190557,190769,190777,190801,190903,190949,191359,191417,191447,191561,191819,191823,191873,191879,191975,192001,192111,19231,192497,192617,192737,192851,192909,19297,19300,193069,193095,193155,19328,193459,193645,193971,193979,194027,194053,194657,194837,195225,19527,195271,195321,195503,196013,196107,196673,196843,196863,196865,196959,197043,197261,197489,197569,197765,198085,198135,198233,198399,198513,19887,20163,20214,20222,20250,20346,20396,20468,20632,20664,20679,20736,20764,20766,20780,20789,20809,20813,20847,20861,20899,20906,20914,20951,21071,21107,21132,21171,21176,21348,21428,21475,21631,21937,22043,22107,22241,22297,22729,22750,22755,22787,22814,22865,23034,23173,23206,23260,23276,23311,23350,23355,23395,23420,23445,23545,23726,23839,23899,23907,24402,24619,24643,24692,24712,24781,24786,25098,25112,25236,25266,25393,25428,25439
,25502,25512,25553,25734,25735,25753,25789,25800,25814,25960,26007,26068,26083,26084,26085,26152,26164,26212,26231,26260,26283,26376,26411,26617,26652,26664,26741,26758,26773,26790,26885,26949,26976,27100,27193,27287,27325,27400,27467,27528,27587,27714,27814,27840,27911,28020,28102,28239,28246,28262,28279,28306,28316,28364,28387,28447,28533,28555,28602,28709,28824,28870,29031,29076,29095,29119,29159,29161,29331,29340,29420,29464,29532,29643,29781,29898,30017,30020,30033,30084,30123,30124,30140,30150,30158,30176,30216,30249,30276,30305,30343,30350,30397,30403,30581,30595,30731,30787,30793,30842,31063,31153,31182,31186,31193,31247,31390,31437,31460,31548,31559,31683,31703,31722,31736,31837,31918,31977,32006,32021,32142,32247,32262,32283,32297,32442,32532,32591,32607,32662,32687,32724,32764,32821,32904,32905,32921,32928,32993,33172,33185,33240,33261,33299,33303,33334,33336,33341,33452,33490,33539,33584,33627,33635,33698,33724,33740,33945,33964,33971,33982,34082,34211,34242,34244,34390,34397,34402,34438,34471,34654,34684,34759,34886,34920,34978,34981,35015,35068,35165,35175,35276,35390,35438,35486,35628,35668,35777,35819,35863,36008,36035,36062,36070,36130,36157,36170,36190,36336,36413,36420,36450,36469,36475,36575,36608,36612,36651,36825,36826,36875,37008,37081,37106,37124,37213,37326,37402,37688,37898,38020,38079,38123,38170,38198,38241,38646,38883,38990,39030,39050,39126,39141,39165,39300,39316,39355,39387,39431,39447,39480,39481,39505,39559,39642,39717,39776,39783,39787,39896,40006,40041,40055,40138,40217,40243,40256,40286,40299,40313,40344,40391,40419,40452,40547,40560,40564,40670,40699,40726,40756,40825,40897,40948,41024,41078,41091,41131,41276,41283,41316,41336,41471,41476,41506,41677,41698,41771,41835,41880,42104,42248,42251,42292,42316,42324,42430,42439,42443,42444,42460,42517,42530,42607,42668,42688,42750,42755,42792,42846,42951,42987,43077,43212,43313,43327,43471,43500,43550,43577,43582,43599,43654,43734,43774,43788,43810,43857,43859,43914,44211,44408,44503,4
4644,44823,44839,44852,44912,44939,44943,44981,44997,45089,45137,45186,45189,45455,45586,45670,45785,45864,45903,46001,46032,46139,46147,46294,46392,46508,46518,46545,46548,46575,46837,46840,46842,46860,46890,46907,46935,47000,47066,47180,47211,47216,47217,47251,47281,47660,47698,47879,48166,48207,48381,48415,48466,48541,48567,48674,48765,48779,48799,48824,48893,49018,49052,49115,49326,49331,49342,49345,49439,49468,49729,49743,49750,49789,49977,50038,50080,50148,50313,50315,50325,50374,50388,50410,50414,50430,50573,50696,50743,50769,50795,50803,50858,50891,50895,50964,50990,51010,51082,51114,51132,51152,51291,51324,51366,51575,51875,51955,51961,52028,52071,52184,52236,52239,52277,52502,52666,52710,52772,52783,52855,52933,52959,52974,53021,53053,53098,53182,53259,53320,53324,53416,53475,53660,53662,53676,53706,53787,53921,53995,54063,54077,54190,54256,54263,54374,54377,54408,54554,54563,54570,54586,54590,54607,54983,55014,55079,55101,55152,55248,55264,55275,55282,55290,55351,55453,55510,55635,55649,55760,55773,55788,55968,56208,56242,56358,56447,56459,56529,56536,56537,56619,56704,56736,56942,56991,57009,57064,57150,57152,57153,57216,57241,57372,57435,57535,57593,57792,57816,57854,57903,57928,57934,57945,58209,58214,58272,58273,58307,58320,58325,58461,58467,58512,58522,58593,58604,58683,58758,58938,58969,59074,59202,59220,59271,59278,59286,59297,59324,59341,59377,59378,59406,59453,59458,59468,59505,59515,59519,59527,59553,59568,59594,59627,59651,59919,59920,59979,60013,60038,60163,60360,60390,60414,60489,60506,60606,60708,60718,60720,60805,60885,60934,60935,60941,60981,61030,61042,61206,61303,61344,61516,61537,61709,61725,61743,62167,62293,62343,62514,62630,62634,62708,62763,62771,63053,63076,63183,63411,63483,63493,63542,63583,63631,63695,63700,63723,63741,63762,63772,63846,63872,63893,63911,63947,63963,63971,63986,64076,64082,64106,64126,64211,64232,64249,64461,64473,64482,64483,64583,64594,64664,64733,64760,64785,64866,64873,64974,65009,65010,65043,65089,65093,652
17,65259,65342,65397,65489,65539,65682,66261,66265,66272,66436,66634,66751,67165,67252,67363,67472,67487,67626,67856,67901,67914,68085,68117,68402,68404,68623,68674,68757,68770,68794,68834,68845,68867,68903,69107,69175,69209,69247,69315,69486,69652,69755,69801,69809,70027,70239,70299,70361,70782,70834,70851,70884,70952,71112,71139,71154,71411,71557,71615,71711,71718,71806,71871,71964,71986,72066,72083,72603,72709,72999,73063,73287,73293,73406,73606,73726,73728,73734,73766,73778,73899,74326,74421,74727,74835,74859,75047,75127,75207,75583,76031,76061,76069,76217,76259,76333,76423,76553,76679,76687,76699,76723,76847,76877,76925,76951,77192,77299,77307,77411,77681,77819,77847,77933,77981,78005,78099,78431,78745,78955,79009,79093,79121,79159,79165,79239,79403,79787,79909,80001,80037,80069,80279,80449,80521,80717,80849,80941,80997,81006,81125,81265,81267,81355,81409,81560,81609,81717,81723,81935,81953,82019,82331,82461,82487,82631,82655,82839,82969,83001,83061,83137,83331,83413,83437,83603,83665,83929,83937,83941,83947,83949,83953,83987,84031,84093,84127,84141,84171,84185,84189,84191,84201,84309,84331,84343,84361,84383,84435,84439,84489,84501,84523,84535,84573,84613,84665,84779,84833,84932,84991,85015,85347,85367,85445,85495,85559,85571,85797,85807,85973,86157,86203,86223,86275,86447,86523,86615,86897,86915,87039,87103,87281,87475,87795,87873,89075,89197,89205,89241,89251,89261,89695,89721,89741,89783,90325,90425,90845,90861,90915,90995,91009,91091,91131,91165,91169,91197,91201,91233,91307,91429,91477,91479,91491,91511,91567,91649,91999,92339,92457,92617,92759,92765,92907,93035,93417,93525,93691,93763,93805,93917,94043,94051,94193,94303,94847,95041,95433,95475,95545,95557,95645,95749,95885,96153,96211,96417,96479,96567,96639,96651,96673,96773,96791,96853,96921,96943,97053,97073,97495,97597,97625,97681,97697,97727,98241,98377,98519,98817,98895,98937,98991,98993,99007,99067,99555,99569,99643,99647,99657,99781,137953)
I don't believe my table is anything unusual.
In case anyone asks why I'm running BigQuery on an 8 MB table: this table is part of a data set that contains much larger tables (1 TB+), and it is far easier for me and my application if it's all in one place.
EDIT
Here is the execution data for a similar query with 3140 INs on a 14 MB table. This takes 50 seconds.
BigQuery team flagged it as a bug. The workaround is to use WHERE id IN UNNEST([1,2,3,...]) instead of WHERE id IN (1,2,3,...).
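As a rough illustration of that workaround, the sketch below builds the UNNEST form of the query from a list of ids. The helper name build_in_unnest_query is hypothetical, and the table name is just the placeholder from the question. Each id is passed through int() so that nothing other than a number can end up in the SQL text.

```python
from typing import Iterable


def build_in_unnest_query(table: str, ids: Iterable[int]) -> str:
    """Return a query that filters on id using IN UNNEST([...])."""
    # int() rejects anything that is not a plain number.
    id_list = ",".join(str(int(i)) for i in ids)
    return f"SELECT id FROM `{table}` WHERE id IN UNNEST([{id_list}])"


query = build_in_unnest_query("example.example.emails", [104281, 100567, 100701])
print(query)
```

The only difference from the original query is that the ids are wrapped in an array literal and passed to UNNEST.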

BigQuery Tier 1 exceeded for partitioned table but not for table-per-day tables

We have two tables in BigQuery: one is large (a couple of billion rows) and organised on a 'table-per-day' basis; the other is date partitioned and has the exact same schema, but holds only a small subset of the rows (~100 million rows).
I wanted to run a (standard SQL) query with a subselect in the form of a join (the same happens when the subselect is in the WHERE clause) on the small partitioned dataset. I was not able to run it because tier 1 was exceeded.
If I run the same query on the big dataset (that contains the data I need and a lot of other data) it runs fine.
I do not understand the reason for this.
Is it because:
1. Partitioned tables need more resources to query
2. BigQuery has some internal rule that the ratio of data processed to resources needed must meet a certain threshold, i.e. I was not paying enough when I queried the small dataset, given the amount of resources I needed.
If 1. is true, we could simply put the small dataset on a 'table-per-day' basis as well in order to solve the issue. Before we do that, though, we would like to know whether it is really going to solve our problem.
Details on the queries:
Big dataset
queries 11 GB, runs 50 secs, Job Id remilon-study:bquijob_2adced71_15bf9a59b0a
Small dataset
Job Id remilon-study:bquijob_5b09f646_15bf9acd941
I'm an engineer on BigQuery and I just took a look at your jobs. It looks like your second query has an additional filter with a nested clause that your first query does not. It is likely that this extra processing is making your query exceed your tier. I would recommend running the queries in the BigQuery UI and looking at the Explanation tab to see how the queries differ in the query plan.
If you try running the exact same query (modifying only the partition syntax) for both tables and still get the same error I would recommend filing a bug.

libpq very slow for large (20 million record) database

I am new to SQL/RDBMS.
I have an application which adds rows with 10 columns to a PostgreSQL server using the libpq library. Right now, the server is running on the same machine as my Visual C++ application.
I have added around 15-20 million records. A simple query to get the total count takes 4-5 minutes using select count(*) from <tableName>;.
I have indexed my table on the time at which the data is entered (timecode). Most of the time I need a count with different WHERE / AND clauses added.
Is there any way to make things faster? I need it to be as fast as possible, because once the server moves onto the network things will become much slower.
Thanks
I don't think network latency will be a large factor in how long your query takes. All the processing is being done on the PostgreSQL server.
The PostgreSQL MVCC design means each row in the table - not just the index(es) - must be walked to calculate the count(*) which is an expensive operation. In your case there are a lot of rows involved.
There is a good wiki page on this topic here http://wiki.postgresql.org/wiki/Slow_Counting with suggestions.
Here are two suggestions from that page. The first is to count an indexed column:
select count(index-col) from ...;
... though this only works under some circumstances.
If you have more than one index see which one has the least cost by using:
EXPLAIN ANALYZE select count(index-col) from ...;
If you can live with an approximate value, another is to use a Postgres specific function for an approximate value like:
select reltuples from pg_class where relname='mytable';
How good this approximation is depends on how often autovacuum is set to run and many other factors; see the comments.
Consider pg_relation_size('tablename') and divide it by the seconds spent in
select count(*) from tablename
That will give the throughput of your disk(s) when doing a full scan of this table. If it's too low, you want to focus on improving that in the first place.
Having a good I/O subsystem and well performing operating system disk cache is crucial for databases.
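The throughput estimate described above (table size divided by the time count(*) takes) can be worked through as follows. This is only a sketch: the function name is hypothetical and the numbers are made up for illustration. In practice the size would come from pg_relation_size and the time from timing the query.

```python
def scan_throughput_mb_per_s(relation_size_bytes: int, count_seconds: float) -> float:
    """Approximate full-scan throughput in MB/s (1 MB = 1024 * 1024 bytes)."""
    if count_seconds <= 0:
        raise ValueError("count_seconds must be positive")
    return relation_size_bytes / (1024 * 1024) / count_seconds


# Hypothetical numbers: a 3 GB table whose count(*) took 4 minutes.
throughput = scan_throughput_mb_per_s(3 * 1024**3, 240.0)
print(f"{throughput:.1f} MB/s")
```

A figure this low would point at the disk or the cache configuration before the query itself.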
The default postgres configuration is meant to not consume too much resources to play nice with other applications. Depending on your hardware and the overall utilization of the machine, you may want to adjust several performance parameters way up, like shared_buffers, effective_cache_size or work_mem. See the docs for your specific version and the wiki's performance optimization page.
Also note that the speed of select count(*)-style queries has nothing to do with libpq or the network, since only one resulting row is retrieved. It happens entirely server-side.
You don't state what your data is, but normally the way to handle tables with a very large amount of data is to partition the table. http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
This will not speed up your select count(*) from <tableName>; query, and might even slow it down, but if you are normally only interested in a portion of the data in the table this can be helpful.

How long should a query that returns 5 million records take?

I realise the answer should probably be 'as little time as possible' but I'm trying to learn how to optimise databases and I have no idea what an acceptable time is for my hardware.
For a start, I'm using my local machine with a copy of SQL Server 2008 Express. I have a dual-core processor, 2 GB of RAM and a 64-bit OS (if that makes a difference). I'm only using a simple table with about 6 varchar fields.
At first I queried the data without any indexing. This took a ridiculously long amount of time so I cancelled and added a clustered index (using the PK) to the table. This cut the time down to 1 minute 14 sec. I have no idea if this is the best I can get or whether I'm still able to cut this down even further?
Am I limited by my hardware or is there anything else I can do to my table/database/queries to get results faster?
FYI I'm only using a standard SELECT * FROM <Table> to retrieve my results.
EDIT: Just to clarify, I'm only doing this for testing purposes. I don't NEED to pull out all the data, I'm just using that as a consistent test to see if I can cut down the query times.
I suppose what I'm asking is: Is there anything I can do to speed up the performance of my queries other than a) upgrading hardware and b) adding indexes (assuming the schema is already good)?
I think you are asking the wrong question.
First of all - why do you need so many records at one time on the local machine? What do you want to do with them? I'm asking because I think you want to transfer this data somewhere, in which case you should be measuring how long it takes to transfer the data.
Some advice:
Your applications should not select 5 million records at a time. Try to split your query and get the data in smaller sets.
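One way to fetch the data in smaller sets is to page through the table by key. The sketch below is an assumption about how that could look, not the answerer's method. It uses Python's built-in sqlite3 module and a hypothetical items table so that it runs as-is; SQL Server would use TOP rather than LIMIT.

```python
import sqlite3


def fetch_in_batches(conn, batch_size):
    """Yield lists of (id, name) rows, batch_size rows at a time, ordered by id."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last key seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 26)],
)

batch_sizes = [len(batch) for batch in fetch_in_batches(conn, 10)]
print(batch_sizes)  # [10, 10, 5]
```

Filtering on the last key seen lets each batch start from an index seek instead of skipping over rows that were already read.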
UPDATE:
Because you are doing this for testing, I suggest the following to improve performance:
Remove * from your query - it takes SQL Server some time to resolve this.
Put your data in temporary storage; try using a VIEW or a temporary table for this.
Use plan caching on your server.
But even if you're just testing, I still don't understand why you would need such tests if your application would never use such a query. Testing just for the sake of testing is a bad use of time.
Look at the query execution plan. If your query is doing a table scan, it will obviously take a long time. The query execution plan can help you decide what kind of indexing you would need on the table. Also, creating table partitions can help sometimes in cases where the data is partitioned by a condition (usually date and time).
I did 5.5 million in 20 seconds. That's taking over 100k schedules with different frequencies and forecasting them for the next 25 years. Just max scenario testing, but proves the speed you can achieve in a scheduling system as an example.
The best optimization depends on the indexing strategy you choose. Like many of the answers above, I would say that partitioning the table sometimes helps. It is also not best practice to query all the records in one go; you will get much better results if you query the data in parts, iteratively. You may also want to check the minimum hardware and software requirements for SQL Server 2008.
When fetching 5 million rows you are almost certainly going to spool to tempdb. You should try to optimize tempdb by adding additional files. If you have multiple drives on separate disks, you should split the table data into different .ndf files located on separate disks. Partitioning won't help when querying all the data on the disk.
You can also use the MAXDOP query hint to force parallelism; this will increase CPU utilization. Ensure that the columns contain as few NULLs as possible, and rebuild your indexes and statistics.

Long running Jobs Performance Tips

I have been working with SQL Server for a while and have used a lot of performance techniques to fine-tune many queries. Most of these queries had to execute within a few seconds, or maybe minutes.
I am now working with a job which loads around 100K of data and runs for around 10 hours.
What are the things I need to consider while writing or tuning such query? (e.g. memory, log size, other things)
Make sure you have good indexes defined on the columns you are querying on.
Ultimately, the best thing to do is to actually measure and find the source of your bottlenecks. Figure out which queries in a stored procedure or what operations in your code take the longest, and focus on slimming those down, first.
I am actually working on a similar problem right now, on a job that performs complex business logic in Java for a large number of database records. I've found that the key is to process records in batches, and make as much of the logic as possible operate on a batch instead of operating on a single record. This minimizes roundtrips to the database, and causes certain queries to be much more efficient than when I run them for one record at a time. Limiting the batch size prevents the server from running out of memory when working on the Java side. Since I am using Hibernate, I also call session.clear() after every batch, to prevent the session from keeping copies of objects I no longer need from previous batches.
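The batching idea can be sketched in a few lines. The answer above uses Java and Hibernate; the version below is a stand-in that uses Python and the built-in sqlite3 module so that it runs on its own. The results table, the score calculation and the batch size of 250 are all hypothetical.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (record_id INTEGER, score INTEGER)")

records = range(1, 1001)  # hypothetical input records
batch_count = 0
for batch in batched(records, 250):
    # One statement per batch; against a networked database this would be
    # one round trip per batch instead of one per record.
    conn.executemany(
        "INSERT INTO results (record_id, score) VALUES (?, ?)",
        [(record_id, record_id * 2) for record_id in batch],
    )
    conn.commit()
    batch_count += 1

row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(batch_count, row_count)  # 4 1000
```

Only one batch is held in memory at a time, which is the same reason the answer clears the Hibernate session after each batch.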
Also, an RDBMS is optimized for working with large sets of data; use normal SQL operations whenever possible. Avoid things like cursors and a lot of procedural programming. As other people have said, make sure you have your indexes set up correctly.
It's impossible to say without looking at the query. Just because you have indexes doesn't mean they are being used. You'll have to look at the execution plan to see whether they are. It might show that they aren't useful for this query.
You can start with looking at the estimated execution plan. If the job actually completes, you can wait for the actual execution plan. Look at parameter sniffing. Also, I had an extremely odd case on SQL Server 2005 where
SELECT * FROM l LEFT JOIN r ON r.ID = l.ID WHERE r.ID IS NULL
would not complete, yet
SELECT * FROM l WHERE l.ID NOT IN (SELECT r.ID FROM r)
worked fine - but only for particular tables. Problem was never resolved.
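When swapping one of these forms for the other, note that they are only equivalent while r.ID contains no NULLs. The sketch below uses Python's built-in sqlite3 module and tiny made-up tables to show both cases; it selects l.ID rather than * so the two result sets can be compared directly. It illustrates the semantics only and says nothing about why the original query hung.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (ID INTEGER);
    CREATE TABLE r (ID INTEGER);
    INSERT INTO l (ID) VALUES (1), (2), (3);
    INSERT INTO r (ID) VALUES (2);
""")

LEFT_JOIN = ("SELECT l.ID FROM l LEFT JOIN r ON r.ID = l.ID "
             "WHERE r.ID IS NULL ORDER BY l.ID")
NOT_IN = ("SELECT l.ID FROM l WHERE l.ID NOT IN (SELECT r.ID FROM r) "
          "ORDER BY l.ID")


def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# With no NULLs in r.ID, both forms return the unmatched rows of l.
before = (ids(LEFT_JOIN), ids(NOT_IN))

# A single NULL in r.ID makes NOT IN evaluate to NULL for every
# non-matching row, so that form returns nothing at all.
conn.execute("INSERT INTO r (ID) VALUES (NULL)")
after = (ids(LEFT_JOIN), ids(NOT_IN))

print(before)  # ([1, 3], [1, 3])
print(after)   # ([1, 3], [])
```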
Make sure your statistics are up to date.
If possible, post your query here so there is something to look at. I recall a query someone built with joins to 12 different tables, dealing with around 4 million records, that took around a day to run. I was able to tune it to run within 30 minutes by eliminating the unnecessary joins. Where possible, try to reduce the datasets you are joining before returning your results. Use temp tables, views, etc. if you need to.
For large datasets with conditions, try to apply your conditions up front, through a view, before your joins, to reduce the number of records.
100k joining 100k is a lot bigger than 2k joining 3k.
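To put rough numbers on that last point, the sketch below counts the row pairs a join would have to consider in the worst case. It assumes a naive nested-loop join with no usable index, which a real optimizer will usually avoid, so treat it as an upper bound rather than a prediction.

```python
def nested_loop_pairs(left_rows: int, right_rows: int) -> int:
    """Worst-case number of row pairs compared by a naive nested-loop join."""
    return left_rows * right_rows


large = nested_loop_pairs(100_000, 100_000)
small = nested_loop_pairs(2_000, 3_000)
print(large, small, large // small)  # 10000000000 6000000 1666
```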