How can I schedule a script in BigQuery?

BigQuery finally supports using ; in queries, so I can write more than one statement in one "block" if I separate them with semicolons.
If I run the code manually, it works, but I cannot schedule it.
When I want to schedule, I have two choices:
(New) Web UI: I must give a destination table; if I don't, I cannot save the scheduled query. But all my queries are updates and inserts with different "destination tables", like these:
UPDATE project.exampledataset.a
SET date = current_date()
WHERE TRUE
;
INSERT INTO project.otherdataset.b
SELECT c,d
FROM project.otherdataset.c
So I cannot even make a scheduling in the Web UI.
Classic UI: I tried this because the official documentation states that I should leave the "destination table" blank, and the Classic UI allows it. I can set up the schedule, but it doesn't run when it should. I get this error message by email: "Error status: Dataset specified in the query ('') is not consistent with Destination dataset 'exampledataset'."
AFAIK scripting (and using semicolons) is a very new feature in BigQuery, but I hope someone can help me.
Yes, I know that I could schedule every query one by one, but I would like to solve this with one big script.

It looks like the scheduled query was defined earlier with a destination dataset and an APPEND/TRUNCATE write preference. When the same scheduled query is updated to a DML query, the GUI doesn't reset the dataset and table name fields to NULL, so the error comes from the dataset and table name previously stored in the scheduled query.
Hence the fix is to delete the scheduled query and create it from scratch with the DML query option. It worked for me.

Scripting is now supported in scheduled queries. However, a scripting query, when scheduled, does not currently support setting a destination table; you still need to use DDL/DML to make changes to an existing table.
E.g.:
CREATE OR REPLACE TABLE destinationTable AS
SELECT *
FROM sourceTable
WHERE date >= maxDate
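In the same spirit, the multi-statement script from the question should be schedulable as-is once no destination table is set, with each DML statement writing to its own table (a sketch reusing the question's table names):
-- Scheduled script: no destination table configured.
UPDATE project.exampledataset.a
SET date = CURRENT_DATE()
WHERE TRUE;

INSERT INTO project.otherdataset.b
SELECT c, d
FROM project.otherdataset.c;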

As of 2022, the BQ Console UI will let you create a new scheduled query without a destination dataset, but it won't let you update a prior SELECT-based schedule to use DDL/DML block syntax. However, you can use the BigQuery Data Transfer API to update the destinationDatasetId field, via transferConfigs.patch. Use transferConfigs.list to get the configId for a given scheduled query.
Note that you can either use the in-browser API Explorer, if you have the appropriate credentials, or write a programmatic solution. This also seems useful for setting or updating any other fields, including renaming scheduled queries.

Related

Update after a Copy Data Activity in Azure Data Factory

I've got this doubt in Azure Data Factory. My pipeline has a Copy Data activity, and after loading the information into the table I need to update a field in that destination based on a parameter. It is a simple update, but given that we do not have a SQL task (present in SSIS), I do not know what to use. Creating a stored procedure for this does not seem to be the most appropriate solution; besides, modifying the database is complicated. I thought the "Use Query" option in the Lookup activity could be a solution, but it does not allow me to create a SQL query with a parameter, as a Source does.
What could be a possible workaround?
You are on the right track with the Lookup; that is definitely the way to go. The query field there will allow you to create dynamic SQL just like you did within the copy activity. You just need to reference the variable/parameter properly.
Also, the Lookup will always expect something returned. You don't have to do anything with that returned value, just ignore it, but the Lookup will not work without returning something. So that query field would contain something like:
UPDATE dbo.MyTable SET IsComplete = 1 WHERE RunId = @{pipeline().parameters.runId};
SELECT 0 AS DummyValue; -- Necessary for Lookup to work

capture executed sql from input table in pentaho pdi

I am using Pentaho for data migration testing. I have set up a "table input" step where many parts of the query inside the "table input" are variables. I have been looking for a way to capture that query after it gets executed at runtime.
I was wondering if there is any specific system log variable for SQL, or whether it is to do with metadata. Need help! Thanks.
Maybe the following approach will help:
We assume a transformation reading a CSV file to get the dynamic portion of the SELECT statement (e.g. the columns) and setting the variable columns with it.
The second transformation uses this variable to generate the SELECT statement and stores it in the variable sql_statement.
In the main transformation we use ${sql_statement} as the SELECT statement of the table input and write the data to an output file (that's the business process, so to say). From the same input we copy the output to another path. There we add the current time as a field (using the "Get system data" step) and the generated SQL statement, join them as a Cartesian product, and group the result by sql_statement. That way we can compute the first and last time the statement was used. These results are written to a text file.
The last thing we need is a job calling the three transformations sequentially.
This is a sample output:
sql_statement;min_time;max_time
SELECT my_column FROM test_table;2014/05/08 00:41:21.143;2014/05/08 00:41:21.144
Thank you Marcus! I did something similar and it works, awesome.
I gathered parts of queries from the table field where they were kept and formed a full query in JavaScript. That full query is then sent as a parameter to a transformation that runs and logs the query.

How to get the query displayed when a change is made to a table or a field in a table in Postgresql?

I have used MySQL for some projects and recently moved to PostgreSQL. In MySQL, when I alter a table or a field, the corresponding query is displayed on the page. I haven't found such a feature in PostgreSQL (kindly excuse me if I'm wrong). Since the query was readily available, it was very helpful for testing something in the local database (without explicitly typing the query), copying the printed query, and running it on the server. Now it seems I have to do all of that manually. Even though I'm familiar with the query operations, at times it can be a pretty time-consuming process. Can anybody help me? How can I get the corresponding query displayed in PostgreSQL (like in MySQL) whenever a change is made to a table?
If you use SELECT * FROM ..., there should not be any reason for your output not to include newly added columns, no matter how you get your results, whether via psql on the command line, PgAdmin3, or any other IDE.
After you add new columns, it is possible that these changes are still in an open transaction in another window or SQL session; be sure to COMMIT such a transaction. Note that your changes to data or schema will not be visible to any other database clients until the transaction commits.
If your IDE still does not show the changes, you may need to refresh the list of tables, or, if that option is not available, restart your IDE. If that still does not work, maybe you should use a better IDE.
If you have used SELECT field1, field2, ... FROM ..., then you must add the new fields to your SELECT statement(s), but this would be true for any other SQL implementation, MySQL included.
You could use the LISTEN / NOTIFY mechanism in PostgreSQL to notify your client on altering the database schema.
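A minimal sketch of that idea, assuming PostgreSQL 9.3+ for event trigger support (the function and channel names here are made up):
-- Fire a NOTIFY on every completed DDL command.
CREATE OR REPLACE FUNCTION notify_ddl_change() RETURNS event_trigger AS $$
BEGIN
  -- TG_TAG holds the command tag, e.g. 'ALTER TABLE'.
  PERFORM pg_notify('ddl_events', TG_TAG);
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER ddl_notify ON ddl_command_end
  EXECUTE PROCEDURE notify_ddl_change();

-- In the client session that wants to hear about schema changes:
LISTEN ddl_events;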

Data Driven Subscriptions SSRS Standard Edition 2008

I'm fairly new to MSSQL and SSRS.
I'm trying to create a data-driven subscription in MSSQL 2008 Standard SSRS that does the following:
Email the results of the report to an email address found within the report.
Run daily.
For Example:
Select full_name, email_address from users where (full_name = 'Mark Price')
This would use the email_address column to figure out who to email; it must also work for multiple results with multiple email addresses.
The way I'm thinking of doing this is making a subscription that runs the query; if no result is found, nothing happens.
But if a result is found, the report changes the row in the Subscriptions table to run the report again in the next minute or so with the correct email information found in the results.
Is this a silly idea or not?
I've found a couple of blog posts claiming this works, but I couldn't understand their code well enough to know what it does.
So, any suggestions on how to go about this? Or can you suggest something already out there on the internet, with a brief description?
This takes me back to my old job where I wrote a solution to a problem using data-driven subscriptions on our SQL Server 2005 Enterprise development box and then discovered to my dismay that our customer only had Standard.
I bookmarked this post at the time and it looked very promising, but I ended up moving jobs before I had a chance to implement it.
Of course, it is targeted at 2005, but one of the comments seems to suggest it works in 2008 as well.
I've implemented something like this on SQL Server Standard to avoid having to pay for Enterprise. First, I built a report called “Schedule a DDR” (Data Driven Report). That report has these parameters:
Report to schedule: the name of the SSRS report (including folder) that you want to trigger if the data test is met. E.g. "/Accounting/Report1".
Parameter set: a string that will be used to look up the parameters to use in the report. E.g. "ABC".
Query to check if report should be run: a SQL query that will return a single value, either zero or non-zero. Zero will be interpreted as "do not run this report".
Email recipients: a list of semicolon-separated email recipients that will receive the report, if it is run.
Note that the “Schedule a DDR” report is the report we’re actually running here, and it will send its output to me; what it does is run another report – in this case it’s “/Accounting/Report1” and it’s that report that needs these email addresses. So “Schedule a DDR” isn’t really a report, although it’s scheduled and runs like one – it’s a gadget to build and run a report.
I also have a table in SQL defined as follows:
CREATE TABLE [dbo].[ParameterSet](
[ID] [varchar](50) NULL,
[ParameterName] [varchar](50) NULL,
[Value] [varchar](2000) NULL
) ON [PRIMARY]
Each parameter set – "ABC" in this case – has a set of records in the table. In this case the records might be ABC/placecode/AA and ABC/year/2013, meaning that there are two parameters in ABC: placecode and year, and they have values "AA" and "2013".
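For example, the ABC set described above could be loaded like this:
INSERT INTO dbo.ParameterSet (ID, ParameterName, Value)
VALUES ('ABC', 'placecode', 'AA'),
       ('ABC', 'year', '2013');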
The dataset for the "Schedule a DDR" report in SSRS is
DDR.dbo.DDR3 @reportName, @parameterSet, @nonZeroQuery, @toEmail;
DDR3 is a stored procedure:
CREATE PROCEDURE [dbo].[DDR3]
@reportName nvarchar(200),
@parameterSet nvarchar(200),
@nonZeroQuery nvarchar(2000),
@toEmail nvarchar(2000)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SELECT ddr.dbo.RunADDR(@reportName, @parameterSet, @nonZeroQuery, @toEmail) AS DDRresult;
END
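For reference, a manual invocation might look like this (the nonZeroQuery and email address are made-up examples):
EXEC DDR.dbo.DDR3
    @reportName   = '/Accounting/Report1',
    @parameterSet = 'ABC',
    @nonZeroQuery = 'SELECT COUNT(*) FROM dbo.SomeTable',  -- hypothetical check
    @toEmail      = 'someone@example.com';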
RunADDR is a CLR function. Here's an outline of how it works; I can post some code if anyone wants it.
Set up credentials.
Select all the parameters in the ParameterSet table whose ID field matches the parameter set name passed in from the Schedule a DDR report (a query sketch follows this outline).
For each of those parameters
    Add it to the parameters array used to run the report. (This is how you use the table to fill in parameters dynamically.)
End for
If there's a "nonZeroQuery" value passed in from Schedule a DDR
    Run the nonZeroQuery and exit if you got zero rows back. (This is how you prevent execution if some condition is not met; any query that returns something other than zero will allow the report to run.)
End if
Now ask SSRS to run the report, using the parameters we just extracted from the table and the report name passed in from Schedule a DDR.
Get the output and write it to a local file.
Email the file to whatever email addresses were passed in from Schedule a DDR.
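The parameter lookup step might look like this (a sketch against the ParameterSet table defined above):
SELECT ParameterName, Value
FROM dbo.ParameterSet
WHERE ID = @parameterSet;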
Instead of creating a subscription to modify the subscriptions table, I would put that piece somewhere else, such as in a SQL Agent job. But the idea is the same: a regularly running piece of SQL can add or change rows in the subscription table.
A Google search for "SSRS subscription table" returns a few helpful results. Here's an article based on 2005, but the principles should be the same for 2008; this article for 2008 is really close to what you are describing as well.
I would just look at the fields one by one in the Subscriptions table and determine what you need for each. Try creating a row by hand (a manual INSERT statement) to send yourself a subscription.
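To see what you'd be working with, a query like the following lists existing subscriptions alongside their reports (a sketch assuming the default ReportServer catalog tables):
SELECT s.SubscriptionID, c.Name AS ReportName, s.LastStatus, s.LastRunTime
FROM ReportServer.dbo.Subscriptions s
JOIN ReportServer.dbo.Catalog c ON s.Report_OID = c.ItemID;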
R-Tag supports SSRS data-driven reports with SQL Server Standard edition.
You can use SQL-RD, a third-party solution, to create and run data-driven schedules without having to upgrade to SQL enterprise. It also comes with event-based scheduling (triggers the report on events including database changes, file changes, emails received and so on).

How to remove a tuple from an SQL table after a timeout?

I am faced with a peculiar requirement which is as follows:
A network-intensive operation is triggered on a server by multiple clients through a web interface. However, only one operation is allowed at a time, and hence an entry (tuple) is made in an SQL table to indicate that the operation is in progress. Once the operation is complete (irrespective of success or failure), the appropriate result is displayed back to the client(s), and the corresponding tuple is removed from the SQL table.
Since the operation is network-intensive, a scenario where the operation needs to be "considered" cancelled after some timeout (10 minutes) has to be introduced.
Is there ANY way the lifetime of a row in SQL can be associated with a timeout value, so that it is deleted after a certain time? My application is primarily written in Java 1.5 and EJB 3.0, using JPA/Hibernate to access an Oracle 10g DB engine.
I would suggest that you try using a timestamp column containing the start time of the task.
A before trigger can then be made to delete the old, timed-out rows before a new one is inserted.
If you want to have multiple tasks with different timeouts, you can even add a column with the timeout in seconds; just code your trigger accordingly.
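Since the asker is on Oracle, a sketch of that trigger might look like this (the table and column names are made up; started_at is a DATE, timeout_seconds a NUMBER):
CREATE OR REPLACE TRIGGER trg_purge_timed_out
BEFORE INSERT ON pending_ops
BEGIN
  -- Statement-level trigger (no FOR EACH ROW), so it may delete from
  -- the same table without raising a mutating-table error.
  DELETE FROM pending_ops
  WHERE started_at < SYSDATE - (timeout_seconds / 86400);  -- seconds to days
END;
/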
I don't know whether Oracle has this kind of facility, but I think no DB engine does.
If you want to do it at the DB level:
Have a datetime column, e.g. CreatedDate, in the table, holding the datetime when the record was created.
Write a procedure and put it in a scheduled job. This job will run every 10 minutes and remove the records that are more than 10 minutes old. The query will be like this (T-SQL; please convert it according to your DB engine):
DELETE FROM yourtable WHERE CreatedDate < DATEADD(mi, -10, GETDATE())
This will delete all records older than 10 minutes from the table.
This is just to give you an idea of a scheduled job. The example is for SQL Server; I don't know about Oracle.
step_by_step_guide_to_add_a_sql_job_in_sql_server_2005
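For completeness, an Oracle 10g equivalent could use DBMS_SCHEDULER (a sketch; the job name is made up, and the table and column names follow the T-SQL example above):
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'purge_stale_rows',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          DELETE FROM yourtable
                          WHERE CreatedDate < SYSDATE - INTERVAL ''10'' MINUTE;
                          COMMIT;
                        END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=MINUTELY; INTERVAL=10',
    enabled         => TRUE);
END;
/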
It sounds like you're implementing a mutex using the database; take a look at this question and see if it helps. Transactional access to a flag table should solve this for you, as long as you catch both success and failure states in your server code.