We have a couple of scheduled queries in BigQuery running daily to append data to two dataset tables (one for each query).
However, I've noticed today that some days are missing from both tables.
I can't find any error logs for this, and both scheduled queries report a successful status with no errors, yet random days are missing from the tables (4 missing in the last 14 days).
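For illustration, a per-day row count along these lines (the project, dataset, table, and timestamp column names below are placeholders, not our real ones) is enough to make the gaps visible:

-- Illustrative only: table and column names are placeholders.
-- Counts rows per day for the last 14 days; any day the scheduled query skipped simply won't appear.
SELECT
  DATE(event_time) AS day,
  COUNT(*) AS row_count
FROM `my_project.my_dataset.daily_append_table`
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
GROUP BY day
ORDER BY day;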
Has anyone experienced this kind of issue before?
Thanks for the support,
Filipe
I've been trying to figure out a performance issue for a while and would appreciate it if someone could help me understand it.
Our application is connected to Oracle 11g. We have a very big table in which we keep data for the last two months. We do millions of inserts every half hour and a big bulk delete at the end of each day. Two of our columns are indexed, and we definitely have skewed columns.
The problem is that we are facing many slow responses when reading from this table. I've done some research, as I am not a DB expert. I know about bind variable peeking and cursor sharing. The issue is that even for one specific query with specific parameters, we see different execution times!
There is no LOB column in the table, and the query we use to read data is not complex: it looks for all rows with a specific name (the column is indexed) within a specific range (also indexed).
I am wondering: could the large number of insertions/deletions we do be causing any issues?
Is there any type of analysis we could consider to get more input on this issue?
I can see several possible causes of the inconsistency in your query times.
The number of updates being done while your query is running: as long as there are locks on the tables your query uses, it has to wait for them to be released.
The statistics on the table can get badly out of sync with this much data manipulation. I would try two things. First, find out when the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC job runs and make sure the bulk delete is performed before that job each night. If this does not help, ask the DBA to set up DBMS_MONITOR on your database to help you troubleshoot the issue.
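If stale statistics do turn out to be the culprit, gathering them manually right after the nightly bulk delete is another option to discuss with the DBA. A minimal sketch, assuming a schema APP_OWNER and table BIG_TABLE (both placeholders):

-- APP_OWNER and BIG_TABLE are placeholders for the real schema and table names.
-- Refreshes optimizer statistics, including histograms on the indexed (skewed) columns.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'APP_OWNER',
    tabname          => 'BIG_TABLE',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt       => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
    cascade          => TRUE);  -- also refresh statistics on the indexes
END;
/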
Why do recently added records seem to be quicker to retrieve than older records?
I have a Postgres table that is being added to at a rate of about 2 million rows per day. It contains an indexed column called 'fixTime' of type 'timestamp', which records the moment the record was added to the database.
I have noticed that retrieving a set of records (filtered on 'fixTime') added in the last 2-3 days takes a couple of seconds, but retrieving a similar set of records from 6 months ago can take 3-4 minutes. Why the big difference, and what can I do to get similar performance for my older records?
Some background info:
Version is "PostgreSQL 9.5.7 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit".
The records are distributed approximately evenly throughout the day (about 20 new records added every second).
My select queries are identical except for the time period specified. Both the fast (recent) and slow (old) queries return a similar number of records (typically a few hundred).
I accepted all defaults when creating my database, so ANALYZE and VACUUM should be on (I don't understand what they do, but I gather they're important!).
I've tried EXPLAIN (but don't understand the results). I have noticed that the query plan is different depending on how long ago the data was added.
I had thought that all data was 'equal' and that the speed to return the data should be (mostly) the same irrespective of when/how it was added. Any ideas on what the issue might be and how to diagnose it?
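For reference, the EXPLAIN I've been running looks roughly like this (the table name fixes is a placeholder; only the 'fixTime' column is real):

-- "fixes" is a placeholder table name; "fixTime" is the indexed timestamp column.
-- ANALYZE executes the query and reports actual row counts and timings;
-- BUFFERS shows how many pages came from PostgreSQL's cache versus had to be read in.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM fixes
WHERE "fixTime" BETWEEN '2017-01-01 00:00:00' AND '2017-01-01 01:00:00';

Running it once for a recent window and once for a six-month-old window is how I noticed the two plans differ.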
We've been experiencing timeouts and long-running queries with BigQuery. The table is 1 row (8 bytes), and the query is taking 50 seconds on average.
This causes issues with our applications, which time out after 10 seconds; they don't expect a query over 1 row to take that long.
Is there something wrong with BigQuery today?
Some example job ids:
job_TlXh5ISWUc6QXpf3HQ3KC-iRdRw
job_4OLNwAFN_j6mqMsnw2q8UUAJ528
job_3bZjpkVOfb55PZbldCwot53HqWA
job_V_EJzsuM9hjikBM-zQ_hCoEJNu8
job_66awPpXPPfd7WrDuRzYf7o3bKPs
There was a temporary issue yesterday afternoon where some queries experienced added latency. I looked at several of your queries, and they all appear to have run during that time.
The issue was related to a blocking call we were making to a monitoring service that was timing out. We've since fixed it, and we're conducting an internal post-mortem to figure out how to prevent problems like this in the future.
I'm investigating a data correctness issue in a regularly-running job that I wrote, and the problem seems to be caused by BigQuery overwriting the same table twice in a non-atomic way. More specifically, I had two copies of the same query running at the same time (due to retry logic), both set to overwrite the same table (using the WRITE_TRUNCATE option), and the resulting table had two copies of every row. I was expecting one query to write a table with the query results and the other query to overwrite it with the same results, rather than ending up with a double-sized table.
My understanding when designing the system was that all BigQuery actions are atomic (based on "Atomic inserts in BigQuery", "Can I safely query a BigQuery table being replaced with WRITE_TRUNCATE", and "Views are failing when their underlying table is repopulated"). Is the issue I'm running into a bug, or am I misunderstanding the exact guarantees I can expect?
Looking through the history, it looks like this has happened in at least 4 separate cases in the past week.
Here's the timeline of what causes this to happen (with the specific details applying to the most noticeable case):
At about 18:07 UTC on April 30th, my code submitted 82 queries at the same time. Each one queried a table ending in conversions_2014_04_30_14 (plus one other table) and wrote to a table ending in conversions_2014_04_30_16 (specifying WRITE_TRUNCATE).
About 25 minutes later, 25 of the queries were still not finished (which is more than usual), so this triggered our "retry" logic, which gives up on all queries still running and just submits them again (this works around an issue I've seen where queries would stay pending for hours without being run, which I mentioned here: https://code.google.com/p/google-bigquery/issues/detail?id=83&can=1 ). This meant that 50 queries were outstanding at once: two copies of each of the 25 queries that hadn't finished yet.
After all queries finished, 6 of the 82 resulting tables were twice as big as they should be.
Here's one example:
First query job: 124072386181:job_tzqbfxfLmZv_QMYL6ozlQpWlG5U
Second query job: 124072386181:job_j9_7uJEjtvYbyeVmEVP0u2er9Lk
The resulting table: 124072386181:bigbingo_history.video_task_companions_conversions_2014_04_30_16
And another example:
First query job: 124072386181:job_TQJzGabFT9FtHI05ftTkD5O8KKU
Second query job: 124072386181:job_5hogbjnLX_5a2opEJl9Jacnn53s
Table: 124072386181:bigbingo_history.Item_repetition__Elimination_conversions_2014_04_27_16
The tables haven't been touched since these queries ran (aside from a schema addition for the first table), so they still contain the duplicate rows. One way to confirm this is to see that the queries all had "GROUP BY alternative, bingo_id", but the tables have two of each (alternative, bingo_id) pair.
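Concretely, a check along these lines (sketched in standard SQL against the first table above; only alternative and bingo_id come from the real schema) surfaces the doubled pairs:

-- Lists (alternative, bingo_id) pairs that appear more than once; after the original
-- query's GROUP BY alternative, bingo_id, each pair should appear exactly once.
SELECT alternative, bingo_id, COUNT(*) AS copies
FROM `bigbingo_history.video_task_companions_conversions_2014_04_30_16`
GROUP BY alternative, bingo_id
HAVING COUNT(*) > 1
LIMIT 10;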
We had a bug in which write-truncate could end up appending in certain cases. We released the fix yesterday (May 22), and haven't seen any further instances of the problem since then.
I have a query that pulls information from one table. That table is rather large at 1.8 million rows and growing every week. The query takes quite a while to run and is problematic when it has to be run multiple times. Is there any process that might speed up a query in a database with this many rows or more? I have another table with around 5 million rows... The query is rather basic, using a prompt to pull the rows relevant to the site number and a prompt for a date range.
Arrival_ID criteria = [Select Arrival ID]
Week criteria = Between [Select week begin:] And [Select week end:]
Any help or direction pointing would be greatly appreciated.
Indexes on the columns Arrival_ID and Week might help.
Unless you're selecting a lot of columns from a very wide table, you should get fairly quick performance from Access on 1.8 million rows, as long as your indexes are selective.
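To sketch the index suggestion as plain SQL DDL (ArrivalsTable and the index name are placeholders; in Access you can equally add this from the table's Design view / Indexes window):

-- ArrivalsTable and IX_Arrivals_ArrivalID_Week are placeholder names.
-- A composite index on the two criteria columns lets the engine seek to one
-- Arrival_ID and then range-scan the Week values instead of scanning the whole table.
CREATE INDEX IX_Arrivals_ArrivalID_Week
ON ArrivalsTable (Arrival_ID, Week);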
I agree with Kieren Johnstone - can you store the data in SQL and then use Access to run the queries?
Do double-check the indexes.
When you compact/repair, do it twice - make it a habit. The second pass clears up any issues left over from the first one.