I have a scheduled query that refreshes an existing BQ table.
BQ says the job runs, and confirms the time it finished.
However, the rows never actually get appended.
No errors of any kind are being raised.
The table even says it was last modified at the same time that the scheduled query runs.
The write type is write append.
Has anyone experienced this issue?
Thank you
I frequently run BigQuery jobs in the web GUI that take 30 minutes or more, saving the results into another table to view later.
Since I'm not waiting for the results to come back right away, and I'm not storing them in my computer's memory, it would be great if I could start a query, turn off my computer, and come back the next day to look at the results in the destination table.
Will this work?
The same applies if my computer crashes, my browser runs out of memory, or anything else causes me to lose my connection to BigQuery while the job is running.
The simple answer is yes: the processing takes place in the cloud, not in your browser. As long as you set a destination table, the results will be saved there; if they aren't, you can check the query history to see whether any issue prevented them from being produced.
If you don't set a destination table, the results are saved to a temporary table, which may no longer be available if you don't return in time.
I'm sure someone can give you a much more detailed answer.
Even if you have not defined a destination table, you can still access the result of the query by checking the Query History. Locate your query in the list of queries, expand the respective item, and find the value of Destination Table.
Note: this is not a regular table but rather a so-called anonymous table that remains available for about 24 hours after the query was executed.
So, knowing that table, you can use it however you want - for example, simply query it as below:
SELECT *
FROM `yourproject._1e65a8880ba6772f612fbe6ff0eee22c939f1a47.anon9139110fa21b95d8c8729cf0bb6e4bb6452946d4`
Note: the anonymous table is "saved" in a "system" dataset whose name starts with an underscore, so you will not be able to see it in the UI. Also, the table name starts with 'anon', which I believe stands for 'anonymous'.
I have a SQL Server job that has run for almost 2 years.
It connects to an unreliable Oracle database that keeps disconnecting, and it always fails because of that. When I run it again after 10 or 15 minutes, it succeeds. I'm getting tired of checking it every day...
Is there a way to make the job keep retrying the connection to that Oracle source until it succeeds, or to have another job that watches this job's status and, if it failed, reruns it until it succeeds?
A solution we are using is something like this:
Wrap your Oracle query in an SSIS package, and after the query, have the package update a SQL table that keeps either a history of executions, or just a single row that tracks the last time the job ran successfully. In short, if the Oracle query was successful, then put something in a table saying the query ran successfully today. If it was not successful, then don't put anything in the table for today.
Then at the beginning of the package, BEFORE the Oracle query, check to see if the query has been run successfully today. If it has already run successfully, then do nothing and exit the package. If it has not run successfully today, then go ahead and try to run it, following the post-query steps described above. If you have any other conditions about when the package should run (like "only after 10 am" or anything like that) you would include that logic here.
Finally, create a job that calls the package and schedule it to run every 15 minutes, or however often you like. It will try every 15 minutes until it runs successfully, and after that it will stop doing anything until the next day.
As a bonus, you can use this same package and job to initiate all tasks that you want handled the same way. You just need to keep metadata about all these tasks in your history/metadata table.
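For illustration, here is a minimal T-SQL sketch of the tracking table and the two steps described above; the table, column, and job names are just placeholders, not from the original setup:

CREATE TABLE dbo.JobRunHistory (
    JobName   varchar(100) NOT NULL,
    RunDate   date         NOT NULL,
    Succeeded bit          NOT NULL,
    CONSTRAINT PK_JobRunHistory PRIMARY KEY (JobName, RunDate)
);

-- At the start of the package (e.g. in an Execute SQL Task): has today's run already succeeded?
-- If this returns 1, the package simply exits without touching Oracle.
SELECT COUNT(*) AS AlreadySucceededToday
FROM dbo.JobRunHistory
WHERE JobName = 'OracleExtract'
  AND RunDate = CAST(GETDATE() AS date)
  AND Succeeded = 1;

-- After the Oracle query completes successfully: record today's success.
INSERT INTO dbo.JobRunHistory (JobName, RunDate, Succeeded)
VALUES ('OracleExtract', CAST(GETDATE() AS date), 1);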
An alternative is to create the job step but leave it unscheduled, and create an SSIS job that acts as the master of all your jobs. It runs every minute, checks a config table for any job steps that have yet to succeed today, and executes any it finds using sp_start_job.
If they do run successfully, log the stats to a log table; this prevents them from ever being launched again until the next day. This saves you from having to schedule every job every 15 minutes - they launch as soon as possible - and you can add extra logic to handle dependencies, the number running in parallel, importance level, start time, latest start time, maximum number of retries, and so on.
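As a rough sketch of what the master job's step could look like (the config and log tables here are hypothetical; sp_start_job is the standard msdb procedure for starting an Agent job):

DECLARE @job sysname;

DECLARE job_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT c.JobName
    FROM dbo.JobConfig AS c
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.JobRunLog AS l
                      WHERE l.JobName = c.JobName
                        AND l.RunDate = CAST(GETDATE() AS date)
                        AND l.Succeeded = 1);

OPEN job_cursor;
FETCH NEXT FROM job_cursor INTO @job;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Start the Agent job asynchronously; its own final step logs success to dbo.JobRunLog.
    EXEC msdb.dbo.sp_start_job @job_name = @job;
    FETCH NEXT FROM job_cursor INTO @job;
END
CLOSE job_cursor;
DEALLOCATE job_cursor;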
I am new to BODS. At present I have configured a job to execute every 2 minutes to extract transactions from a MySQL server and load them into HANA tables.
But sometimes, when the data volume in MySQL is too large to transform and load into HANA within 2 minutes, the job is still executing when the next iteration of the same job starts, which results in a BODS failure.
My question is: is there any option in BODS to check the execution status of the scheduled job between runs?
Please help me out with this.
You can create a control/audit table to keep a history of each run of the BODS job. The table should contain fields like ExtractionStart, ExtractionEnd, EndTime, etc. You then need to change the job so that it reads the status of the previous run from this table before starting the load-to-HANA data flow. If the previous run has not finished, the job can raise an exception.
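A minimal sketch of such a control table and the pre-run check, using made-up names (the BODS job would run the check in a script or conditional before the data flow that loads HANA):

CREATE TABLE JOB_RUN_CONTROL (
    JOB_NAME              VARCHAR(100),
    EXTRACTION_START_TIME TIMESTAMP,
    EXTRACTION_END_TIME   TIMESTAMP,   -- NULL while the run is still in progress
    STATUS                VARCHAR(20)  -- e.g. 'RUNNING' or 'FINISHED'
);

-- Before starting the load to HANA: is the previous run still running?
-- If this returns a count greater than 0, raise an exception instead of starting the load.
SELECT COUNT(*)
FROM JOB_RUN_CONTROL
WHERE JOB_NAME = 'MYSQL_TO_HANA'
  AND STATUS = 'RUNNING';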
Let me know if this has been helpful or if you need more information.
I have a series of workflows in Oozie that periodically fail silently by simply not filling the target table. The failures are the result of, among other things, a change in the input such as a non-ASCII character or a double escape sneaking into the data, that kind of thing. However, the job actually finishes successfully. I would like the jobs to fail if the table does not fill. Is there any easy way to do this directly in Oozie, or with a simple Hive query that will fail on an empty table?
Oozie doesn't fail the action because it sees that the Hive query executed successfully; it doesn't care about anything else.
A workaround for your case:
A Hive action that loads the table.
Another Hive action that checks the row count of the table and captures its output (see the sketch below).
A decision node that kills the workflow if the captured count is 0.
Hope this workaround helps.
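For example, the count-check action could run a Hive query like this (target_table is a placeholder); the workflow captures the single value it returns, and the decision node kills the workflow when that value is 0:

-- Returns 0 when the load silently produced an empty table
SELECT COUNT(*) AS row_cnt
FROM target_table;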
I'm running a BigQuery query using bq that selects a subset of rows from one table into a destination table.
Our command looks like:
bq --format=none query --destination_table=dpm_legacy.unique_test [query]
On the command line I get:
Waiting on job_cda83335e0a4416ea9d4a2a0262d1ec7 ... (0s) Current status: RUNNING
Waiting on job_cda83335e0a4416ea9d4a2a0262d1ec7 ... (10s) Current status: DONE
But then the process hangs for a while, and its CPU and memory usage begin to creep up until it finally exits with no output.
Empirically, it seems like the amount of time the tool hangs is directly proportional to how large the destination table is, so is it possible that even with the --format=none flag it is still returning data?
Thanks!
bq does try to read the whole table in the reply, even if the format is set to none. One way to prevent this is to use --nosync, which will exit immediately and not wait for the query to complete. I'm in the process of adding a --max_rows flag that will allow you to specify how many rows you want in the result (so if you want none, you can just specify 0).
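For example, the original command could be run asynchronously like this (same query and destination table as above; you can then check on the job later from the query history or with bq wait):

bq --nosync --format=none query --destination_table=dpm_legacy.unique_test [query]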