In my app I have a Budget model with a daily_avg column.
There are also BudgetIncome and BudgetSpending models.
daily_avg = (BudgetIncome.sum(:debit) - BudgetSpending.sum(:credit))/Time.days_in_month(current_month)
I need to update this column on the 1st day of every month, because each month has a different number of days.
So I'll write the script, but I don't know:
Where should I put this script?
How do I run it on the 1st day of every month?
I'm using PostgreSQL and deploying my app on Heroku.
Thanks for any help.
This is the first time I'm working with scripts for the DB, so I don't know what information I need to provide for you to help me with this.
Heroku has a scheduler you can use:
https://devcenter.heroku.com/articles/scheduler
"Scheduler is an add-on for running jobs on your app at scheduled time intervals, much like cron in a traditional server environment."
Install it with: $ heroku addons:create scheduler:standard
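Scheduler just runs a command on a one-off dyno, so a common place for the script is a rake task (e.g. in lib/tasks) that Scheduler invokes. It can run a job every 10 minutes, every hour, or every day, but not monthly, so one approach is to schedule the task daily and let a date guard make it a no-op except on the 1st. Roughly, the SQL that task could run might look like the sketch below; the table names (budgets, budget_incomes, budget_spendings) and the budget_id foreign key are assumptions about your schema, so adjust them to match your app.

-- PostgreSQL sketch: recalculate daily_avg, but only on the 1st of the month
UPDATE budgets b
SET daily_avg = (
      (SELECT COALESCE(SUM(bi.debit), 0)
         FROM budget_incomes bi
        WHERE bi.budget_id = b.id)
    - (SELECT COALESCE(SUM(bs.credit), 0)
         FROM budget_spendings bs
        WHERE bs.budget_id = b.id)
  ) / EXTRACT(DAY FROM (date_trunc('month', CURRENT_DATE)
                        + INTERVAL '1 month' - INTERVAL '1 day'))  -- days in the current month
WHERE EXTRACT(DAY FROM CURRENT_DATE) = 1;  -- only do the work on the 1st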
I am using ADF to keep an Azure SQL DB in sync with an on-prem DB. The on-prem DB is read only and the direction is one-way, from the Azure SQL DB to the on-prem DB.
My source table in the Azure SQL cloud DB is quite large (tens of millions of rows), so I have the pipeline set to use an UPSERT (a merge, trying to create a differential merge). I am using a filter on the source table, and the Filter Query has a WHERE condition that looks like this:
[HistoryDate] >= '#{formatDateTime(pipeline().parameters.windowStart, 'yyyy-MM-dd HH:mm' )}'
AND [HistoryDate] < '#{formatDateTime(pipeline().parameters.windowEnd, 'yyyy-MM-dd HH:mm' )}'
The HistoryDate column is auto-maintained in the source table with a getUTCDate() type approach. New records will always get a higher value and be included in the WHERE condition.
This works well, but here is my question: I am testing on my local machine before deploying to the client. When I am not working, my laptop hibernates and the pipeline rightfully fails because my local SQL instance is "offline" during that run. When I move this to production that particular issue (the computer hibernating) should go away, but what happens if the client's connection is temporarily lost (i.e., the client loses internet for a time)? Because my pipeline has a WHERE condition on the source to reduce the upsert to a practical number of rows, any failure would mean losing the data created during that 5-minute window.
A failed pipeline can be rerun, but the run time would be different by then, so I would effectively miss the block of records that would have been picked up if the pipeline had run on time: pipeline().parameters.windowStart and pipeline().parameters.windowEnd will now be different.
As an FYI, I have this running every 5 minutes to keep the local copy in sync as close to real-time as possible.
Am I approaching this correctly? I'm sure others have this scenario and it's likely I am missing something obvious. :-)
Thanks...
Sorry to answer my own question, but to potentially help others in the future, it seems there was a better way to deal with this.
ADF offers a "Metadata-driven Copy Task" utility/wizard on the home screen that creates a pipeline. When I used it, it offered a "Delta Load" option for tables which takes a "watermark". The watermark is a column such as an incrementing IDENTITY, an increasing date, or a timestamp. At the end of the wizard, it lets you download a script that builds a control table and a corresponding stored procedure that maintains the value of each parameter after each run. For example, if I base my delta load on an IDENTITY column, it stores the max value of that column seen during a particular pipeline run. The next time a run happens (trigger), it uses this as the MIN value (minus 1) together with the current MAX value of the IDENTITY column to pick up only the records added since the last run.
I was going to approach things this way, but it seems like ADF already does this heavy lifting for us. :-)
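For anyone who wants to see the shape of this pattern, here is a hand-rolled sketch of the watermark idea in T-SQL. The object names (dbo.WatermarkTable, dbo.usp_UpdateWatermark) are placeholders, and the script the wizard actually generates will differ in naming and detail.

CREATE TABLE dbo.WatermarkTable
(
    TableName      sysname      NOT NULL PRIMARY KEY,
    WatermarkValue datetime2(3) NOT NULL   -- or bigint if the watermark is an IDENTITY
);
GO

CREATE PROCEDURE dbo.usp_UpdateWatermark
    @TableName sysname,
    @NewValue  datetime2(3)
AS
BEGIN
    -- Called by the pipeline after a successful copy to record the new high-water mark.
    UPDATE dbo.WatermarkTable
    SET    WatermarkValue = @NewValue
    WHERE  TableName = @TableName;
END
GO

-- The copy activity's source query then selects only the rows added since the last
-- successful run, so a failed or late run is simply picked up by the next one:
-- SELECT * FROM dbo.SourceTable
-- WHERE HistoryDate >  (SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'SourceTable')
--   AND HistoryDate <= @CurrentHighWatermark;  -- captured at the start of the run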
I've been developing a data pipeline in SSIS on an on-premises VM during my internship, and was tasked with gathering data from Marketo (re: https://www.marketo.com/ ). The package runs without error: it starts with a Truncate Table Execute SQL task, followed by 5 Data Flow tasks that gather data from different sources within Marketo and move it to staging tables in SQL Server, and concludes with an Execute SQL task that loads the processing tables with only the new data.
The problem I'm having: my project lead wants this process to run automatically every day. I have found plenty of resources online that show how to automate an SSIS package, but my package requires user input for the Marketo source, namely a time frame from which to gather data.
Is it possible to automate this package to run daily even though user input is required? I was thinking there may be a way to increment the start and end dates by one each day (so the start date could be 2018-07-01 and the end date 2018-07-02, each moving forward by one day), so the package can run by itself. Thank you in advance for any help!
As you are automating your extract, this suggests that you have a predefined schedule on which to pull that data. From this schedule, you should be able to work out your start and end dates based on the date that the package was run.
In SSIS there are numerous ways to achieve this depending on the data source and your connection methods. If you are using a Script Task, you can simply calculate the required dates in your .NET code. An alternative is to use calculated variables that return the result of an expression, such as:
DATEADD("Month", -1, GETDATE())
Assuming you schedule your extract to run on the first day of the month, the expression above would return the first day of the previous month.
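For the daily window in the question (start = yesterday, end = today), the same idea in T-SQL could look like the sketch below, e.g. run from an Execute SQL Task whose single-row result set is mapped to package variables; the names here are placeholders:

DECLARE @StartDate date = CAST(DATEADD(DAY, -1, GETDATE()) AS date);  -- yesterday, e.g. 2018-07-01
DECLARE @EndDate   date = CAST(GETDATE() AS date);                    -- today,     e.g. 2018-07-02
SELECT @StartDate AS StartDate, @EndDate AS EndDate;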
I have a SQL Server job that has run for almost 2 years.
It connects to an unreliable Oracle database that keeps disconnecting, so the job always fails because of that. When I run it again after 10 or 15 minutes, it works successfully. I'm getting tired of checking it every day...
Is there a way to make the job keep retrying the connection to that Oracle source until it succeeds, or to have another job watch this job's status and, if it failed, rerun it until it succeeds?
A solution we are using is something like this:
Wrap your Oracle query in an SSIS package, and after the query, have the package update a SQL table that keeps either a history of executions, or just a single row that tracks the last time the job ran successfully. In short, if the Oracle query was successful, then put something in a table saying the query ran successfully today. If it was not successful, then don't put anything in the table for today.
Then at the beginning of the package, BEFORE the Oracle query, check to see if the query has been run successfully today. If it has already run successfully, then do nothing and exit the package. If it has not run successfully today, then go ahead and try to run it, following the post-query steps described above. If you have any other conditions about when the package should run (like "only after 10 am" or anything like that) you would include that logic here.
Finally, create a job that calls the package and schedule it to run every 15 minutes, or however often you like. It will try every 15 minutes until it runs successfully, and after that it will do nothing until the next day.
As a bonus, you can use this same package and job to initiate all tasks that you want handled the same way. You just need to keep metadata about all of these tasks in your history/metadata table.
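A minimal T-SQL sketch of that history table and the two checks (the table name dbo.JobRunHistory and the task name 'OracleExtract' are placeholders; you would map AlreadyRanToday to a package variable that drives your precedence constraints):

CREATE TABLE dbo.JobRunHistory
(
    TaskName sysname  NOT NULL,
    RunDate  date     NOT NULL,
    RunTime  datetime NOT NULL DEFAULT (GETDATE()),
    CONSTRAINT PK_JobRunHistory PRIMARY KEY (TaskName, RunDate)
);
GO

-- At the start of the package: has the task already succeeded today?
IF EXISTS (SELECT 1 FROM dbo.JobRunHistory
           WHERE TaskName = 'OracleExtract' AND RunDate = CAST(GETDATE() AS date))
    SELECT 1 AS AlreadyRanToday;   -- exit the package without doing anything
ELSE
    SELECT 0 AS AlreadyRanToday;   -- go ahead and run the Oracle query

-- At the end of the package, only after the Oracle query succeeds:
INSERT INTO dbo.JobRunHistory (TaskName, RunDate)
VALUES ('OracleExtract', CAST(GETDATE() AS date));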
An alternative is to create the job and leave it unscheduled, and create an SSIS job that acts as the master of all your jobs. It runs every minute, checks a config table for job steps that have yet to succeed today, and executes any it finds using sp_start_job (see the sketch below).
If they run successfully, log the stats to a log table; this prevents them from ever being launched again until the next day. It also avoids all your jobs needing to be scheduled every 15 minutes or so: they launch as soon as possible, and you can add extra logic to handle dependencies, the number running in parallel, importance level, start time, latest start time, the maximum number of retries, and so on.
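A rough T-SQL sketch of that master pass; the table and column names (dbo.JobConfig, dbo.JobRunLog, Succeeded) are assumptions, and each job would insert its own success row into the log table when it finishes:

DECLARE @JobName sysname;

DECLARE job_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT c.JobName
    FROM   dbo.JobConfig c
    WHERE  c.Enabled = 1
      AND  NOT EXISTS (SELECT 1
                       FROM  dbo.JobRunLog l
                       WHERE l.JobName = c.JobName
                         AND l.RunDate = CAST(GETDATE() AS date)
                         AND l.Succeeded = 1);   -- not yet successful today

OPEN job_cursor;
FETCH NEXT FROM job_cursor INTO @JobName;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_start_job @job_name = @JobName;  -- starts the job asynchronously
    FETCH NEXT FROM job_cursor INTO @JobName;
END
CLOSE job_cursor;
DEALLOCATE job_cursor;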
Good Morning all
I want to create a table which essentially acts like a view but only updates once a week. I'm not sure how to start this process... I'm looking at scheduled tasks.
I was thinking it would essentially be a clearing process that clears the table at the start of the week and re-runs the initial SQL command to repopulate it... but I'm unsure and need some advice, please.
The reason I am looking at a table rather than a view is so I can update the tables periodically and build/show trend analysis over time.
Thanks in advance
You could also look at using an SSIS package and then scheduling it as a task.
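Whichever scheduler you use (SQL Agent job, SSIS, or a scheduled task), the refresh itself can be a stored procedure along these lines. This is only a sketch: dbo.TrendSnapshot, the column list, and dbo.SourceTable stand in for your own table and the query behind your "view".

CREATE PROCEDURE dbo.usp_RefreshTrendSnapshot
AS
BEGIN
    SET NOCOUNT ON;

    TRUNCATE TABLE dbo.TrendSnapshot;            -- clear last week's rows

    INSERT INTO dbo.TrendSnapshot (Col1, Col2)   -- re-run the original query
    SELECT Col1, Col2
    FROM   dbo.SourceTable;
END
GO

Since the goal is trend analysis over time, you might instead keep the old rows and append each week's results with a snapshot date column rather than truncating.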
I am facing an issue with SQL Agent: after changing the server date, a job that is scheduled daily at a fixed time does not start at its scheduled time, and no logs are produced for this. The issue also occurs when the system date is changed back to its real date. I have to restart SQL Agent after that to get the job to run at its scheduled time again.
SQL Agent scheduled jobs have a last run date/time and a next run date/time, and these values are updated each time a job runs. Look in MSDB and you will see these details (a query sketch is below); you can also check the job history.
When you manually reset the system clock to a date later than now, your next run date/time ends up in the past. As you mentioned, bouncing the Agent service should start the jobs again.
Raj
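For reference, a sketch of where those values live in MSDB (standard Agent catalog tables; the date and time columns are stored as integers in yyyymmdd / hhmmss form):

SELECT  j.name AS JobName,
        s.next_run_date,        -- e.g. 20180701
        s.next_run_time,        -- e.g. 83000 = 08:30:00
        js.last_run_date,
        js.last_run_time,
        js.last_run_outcome     -- 1 = succeeded
FROM    msdb.dbo.sysjobs         j
JOIN    msdb.dbo.sysjobschedules s  ON s.job_id  = j.job_id
JOIN    msdb.dbo.sysjobservers   js ON js.job_id = j.job_id;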