Salt Stack - Best way to get schedule output

I'm using saltstack on my servers. I did a simple schedule:
job1:
  schedule.present:
    - function: state.apply
    - seconds: 1800
    - splay: 5
Now I want to get the output of the scheduled job back on my master
(or on my minion, but I'd just like to know the best way).
I don't really know how to use the Salt mine or returners, or which is best suited to my needs.
thank you :)

Job data return and job metadata
By default, data about job runs from the Salt scheduler is returned to the master.
It can therefore be useful to include specific data to differentiate a job from other jobs. Using the metadata parameter, a value can be associated with a scheduled job; although these metadata values aren't used in the execution of the job, they can be used to search for a specific job later.
job1:
  schedule.present:
    - function: state.apply
    - seconds: 1800
    - splay: 5
    - metadata:
        foo: bar
An example job search on the salt-master might look similar to this:
salt-run jobs.list_jobs search_metadata='{"foo": "bar"}'
Take a look at the salt.runners.jobs documentation for more examples.
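If you prefer to do the same lookup from Python on the master, here is a rough sketch using Salt's runner client (this assumes the default master config path and that the script runs on the master itself):

import salt.config
import salt.runner

# Load the master configuration and build a runner client
opts = salt.config.master_config("/etc/salt/master")
runner = salt.runner.RunnerClient(opts)

# Equivalent of: salt-run jobs.list_jobs search_metadata='{"foo": "bar"}'
jobs = runner.cmd("jobs.list_jobs", kwarg={"search_metadata": {"foo": "bar"}})
for jid, info in jobs.items():
    print(jid, info.get("Function"), info.get("StartTime"))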
Scheduler with returner
The schedule.present function includes a returner parameter that sets a returner to be used for the results of the scheduled job. You could, for example, use the pushover returner:
job1:
  schedule.present:
    - function: state.apply
    - seconds: 1800
    - splay: 5
    - returner: pushover
Take a look at the returners documentation for a full list of built-in returners and usage information.
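If none of the built-in returners fit, a returner is just a Python module that exposes a returner(ret) function and that you drop into the _returners directory of your file roots. A minimal sketch, assuming a made-up log path:

# /srv/salt/_returners/local_file_return.py
# Minimal custom returner sketch: appends each job return as a JSON line.
import json

def returner(ret):
    # `ret` is the job return document Salt passes to every returner;
    # it contains keys such as "jid", "id", "fun" and "return".
    with open("/var/log/salt/schedule_returns.jsonl", "a") as fh:
        fh.write(json.dumps(ret) + "\n")

After syncing it to the minions with salt '*' saltutil.sync_returners, it can be referenced from the schedule above as - returner: local_file_return.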


Setting Cloudwatch Event Cron Job

I'm a little bit confused by the cron job documentation for CloudWatch Events. My goal is to create a cron job that runs every day at 9am, 5pm, and 11pm EST. Does this look correct, or did I do it wrong? It seems like CloudWatch uses UTC military time, so I tried to convert from EST to UTC.
I thought I was right, but I got the following error when trying to deploy the CloudFormation template via sam deploy:
Parameter ScheduleExpression is not valid. (Service: AmazonCloudWatchEvents; Status Code: 400; Error Code: ValidationException
What is wrong with my cron job? I appreciate any help!
(SUN-SAT, 4,0,6)
UPDATED:
The following also gets the same error, Parameter ScheduleExpression is not valid:
Events:
  CloudWatchEvent:
    Type: Schedule
    Properties:
      Schedule: cron(0 0 9,17,23 ? * * *)
MemorySize: 128
Timeout: 100
You have to specify a value for all six required cron fields (AWS expects Minutes Hours Day-of-month Month Day-of-week Year).
This should satisfy all your requirements:
0 4,14,22 ? * * *
Generated using:
https://www.freeformatter.com/cron-expression-generator-quartz.html
There are a lot of other cron generators you can find online.
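If you want to sanity-check a schedule expression without running a full sam deploy each time, one option is to ask the Events API directly; a rough sketch with boto3 (the rule name is just a throwaway for illustration):

import boto3

events = boto3.client("events")

# put_rule validates the ScheduleExpression server-side; an invalid
# expression raises the same ValidationException seen during deploy.
events.put_rule(
    Name="schedule-expression-smoke-test",  # throwaway rule, removed below
    ScheduleExpression="cron(0 4,14,22 ? * * *)",
    State="DISABLED",
)
events.delete_rule(Name="schedule-expression-smoke-test")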

How to set up job dependencies in google bigquery?

I have a few jobs: say one is loading a text file from a Google Cloud Storage bucket into a BigQuery table, and another is a scheduled query to copy data from one table to another table with some transformation. I want the second job to depend on the success of the first one. How do we achieve this in BigQuery, if it is possible to do so at all?
Many thanks.
Best regards,
Right now a developer needs to put together the chain of operations.
It can be done either using Cloud Functions (supports Node.js, Go, Python) or via a Cloud Run container (supports the gcloud API, any programming language).
Basically you need to (see the sketch after this list):
issue a job
get the job id
poll for the job id
once the job is finished, trigger the next steps
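A rough sketch of that loop in Python with the google-cloud-bigquery client (the bucket, dataset and table names below are made up):

from google.cloud import bigquery

client = bigquery.Client()

# Step 1: issue the load job (GCS file -> BigQuery table)
load_job = client.load_table_from_uri(
    "gs://my-bucket/data.csv",                      # hypothetical bucket/object
    "my_project.my_dataset.raw_table",              # hypothetical destination
    job_config=bigquery.LoadJobConfig(source_format="CSV"),
)

# Steps 2/3: the job id is available on the job object; result() polls
# until the job is done and raises if it failed.
print("waiting for", load_job.job_id)
load_job.result()

# Step 4: only reached if the load succeeded -> issue the dependent query
query_job = client.query(
    "CREATE OR REPLACE TABLE my_dataset.clean_table AS "
    "SELECT * FROM my_dataset.raw_table WHERE value IS NOT NULL"
)
query_job.result()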
If using Cloud Functions:
place the file into a dedicated GCS bucket
set up a GCF that monitors that bucket; when a new file is uploaded it executes a function that imports the file into BigQuery - wait until the operation ends
at the end of the GCF you can trigger other functions for the next step
Another use case with Cloud Functions (sketched below):
A: a trigger starts the GCF
B: the function executes the query (copies data to another table)
C: it gets a job id and fires another function with a bit of delay
I: a function gets the job id
J: polls whether the job is ready
K: if not ready, fires itself again with a bit of delay
L: if ready, triggers the next step - this could be a dedicated function or a parameterized function
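A hedged sketch of steps I-L as a Pub/Sub-triggered Cloud Function in Python (the topic names are invented, and the "bit of delay" is a simple sleep before re-publishing):

import base64
import json
import time

from google.cloud import bigquery, pubsub_v1

publisher = pubsub_v1.PublisherClient()
POLL_TOPIC = "projects/my-project/topics/bq-job-poller"  # hypothetical topic
DONE_TOPIC = "projects/my-project/topics/bq-job-done"    # hypothetical topic


def poll_bq_job(event, context):
    # Background Cloud Function: receives {"job_id": ...} via Pub/Sub.
    payload = json.loads(base64.b64decode(event["data"]))
    job = bigquery.Client().get_job(payload["job_id"])

    if job.state != "DONE":
        # Not finished yet: wait a bit and re-publish the same message (step K).
        time.sleep(30)
        publisher.publish(POLL_TOPIC, json.dumps(payload).encode("utf-8"))
        return

    if job.error_result:
        raise RuntimeError("Job %s failed: %s" % (job.job_id, job.error_result))

    # Step L: the job is done, notify whatever comes next (this could also be
    # a direct call to another parameterized function).
    publisher.publish(DONE_TOPIC, json.dumps(payload).encode("utf-8"))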
It is possible to address your scenario either with Cloud Functions (CF) or with a scheduler (Airflow). The first approach is event-driven, getting your data crunched immediately. With the scheduler, expect a data availability delay.
As has been stated, once you submit a BigQuery job you get back a job ID that needs to be checked until it completes. Then, based on the status, you can handle success or failure post-actions respectively.
If you were to develop a CF, note that there are certain limitations like execution time (max 9 min), which you would have to address in case the BigQuery job takes more than 9 min to complete. Another challenge with CF is idempotency: making sure that if the same data file event arrives more than once, the processing does not result in data duplicates.
Alternatively, you can consider using some event-driven serverless open source projects like BqTail - Google Cloud Storage BigQuery Loader with post-load transformation.
Here is an example of the bqtail rule.
rule.yaml
When:
  Prefix: "/mypath/mysubpath"
  Suffix: ".json"
Async: true
Batch:
  Window:
    DurationInSec: 85
Dest:
  Table: bqtail.transactions
  Transient:
    Dataset: temp
    Alias: t
  Transform:
    charge: (CASE WHEN type_id = 1 THEN t.payment + f.value WHEN type_id = 2 THEN t.payment * (1 + f.value) END)
  SideInputs:
    - Table: bqtail.fees
      Alias: f
      'On': t.fee_id = f.id
OnSuccess:
  - Action: query
    Request:
      SQL: SELECT
        DATE(timestamp) AS date,
        sku_id,
        supply_entity_id,
        MAX($EventID) AS batch_id,
        SUM(payment) payment,
        SUM((CASE WHEN type_id = 1 THEN t.payment + f.value WHEN type_id = 2 THEN t.payment * (1 + f.value) END)) charge,
        SUM(COALESCE(qty, 1.0)) AS qty
        FROM $TempTable t
        LEFT JOIN bqtail.fees f ON f.id = t.fee_id
        GROUP BY 1, 2, 3
      Dest: bqtail.supply_performance
      Append: true
    OnFailure:
      - Action: notify
        Request:
          Channels:
            - "#e2e"
          Title: Failed to aggregate data to supply_performance
          Message: "$Error"
    OnSuccess:
      - Action: query
        Request:
          SQL: SELECT CURRENT_TIMESTAMP() AS timestamp, $EventID AS job_id
          Dest: bqtail.supply_performance_batches
          Append: true
  - Action: delete
You want to use an orchestration tool, especially if you want to set up these tasks as recurring jobs.
We use Google Cloud Composer, which is a managed service based on Airflow, to do workflow orchestration, and it works great. It comes with automatic retries, monitoring, alerting, and much more.
You might want to give it a try.
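For reference, a minimal sketch of how the two-step dependency from the question might look as a Composer/Airflow DAG (bucket, dataset and table names are placeholders, and the operator import paths assume a recent Google provider package):

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG("load_then_transform", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:

    # Job 1: load the text file from GCS into a BigQuery table
    load = GCSToBigQueryOperator(
        task_id="load_raw",
        bucket="my-bucket",                                  # placeholder
        source_objects=["incoming/data.csv"],                # placeholder
        destination_project_dataset_table="my_dataset.raw",  # placeholder
        source_format="CSV",
    )

    # Job 2: the transformation query, which only runs if the load succeeded
    transform = BigQueryInsertJobOperator(
        task_id="transform",
        configuration={
            "query": {
                "query": "CREATE OR REPLACE TABLE my_dataset.clean AS "
                         "SELECT * FROM my_dataset.raw WHERE value IS NOT NULL",
                "useLegacySql": False,
            }
        },
    )

    load >> transform  # the dependency: transform waits for a successful load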
Basically you can use Cloud Logging to observe almost all kinds of operations in GCP.
BigQuery is no exception: when a query job completes, you can find the corresponding log entry in the Logs Viewer.
The next question is how to anchor the exact query you want; one way to achieve this is to use a labeled query (that is, attach labels to your query) [1].
For example, you can use the bq command below to issue a query with the foo:bar label:
bq query \
--nouse_legacy_sql \
--label foo:bar \
'SELECT COUNT(*) FROM `bigquery-public-data`.samples.shakespeare'
Then, when you go to the Logs Viewer and apply the log filter below, you will find exactly the log entry generated by the above query:
resource.type="bigquery_resource"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.labels.foo="bar"
The next question is how to emit an event based on this log entry for the next workload. This is where Cloud Pub/Sub comes into play.
Two ways to publish an event based on a log pattern are:
Log Routers: set a Pub/Sub topic as the destination [2]
Log-based Metrics: create an alert policy whose notification channel is Pub/Sub [3]
So the next workload can subscribe to the Pub/Sub topic and be triggered when the previous query has completed (a sketch follows below).
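As a hedged illustration, the subscribing side could be another Pub/Sub-triggered Cloud Function that simply starts the second query (the table names and label are assumptions):

from google.cloud import bigquery


def run_next_step(event, context):
    # Triggered by the Pub/Sub topic fed by the log router / alert above.
    client = bigquery.Client()
    # Second stage: copy/transform data now that the labeled query has finished.
    client.query(
        "INSERT INTO my_dataset.reporting "        # placeholder tables
        "SELECT * FROM my_dataset.staging",
        job_config=bigquery.QueryJobConfig(labels={"foo": "bar-step2"}),
    ).result()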
Hope this helps ~
[1] https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfiguration
[2] https://cloud.google.com/logging/docs/routing/overview
[3] https://cloud.google.com/logging/docs/logs-based-metrics

Jenkins Job information

How can I get the job information mentioned below by using the Jenkins API or some other command-line option?
Time-stamp of last job that succeeded.
Time-stamp of last job that failed.
I looked into this API, but it gives only build info, not the timestamp, i.e. at what time and date the build failed or succeeded.
http://javadoc.jenkins-ci.org/hudson/model/Job.html
You do this by using the methods getLastSuccessfulBuild() and getLastFailedBuild() on Job and then asking each returned build for its timestamp, i.e. there is no single method for doing this directly on the Job node; instead you need to combine multiple methods.
So, using, for example, the XML API, it would look something like this:
https://<JENKINS_URL>/job/<JOB_NAME>/api/xml?tree=lastSuccessfulBuild[timestamp],lastFailedBuild[timestamp]
In my case this gives me:
<freeStyleProject _class="hudson.model.FreeStyleProject">
  <lastFailedBuild _class="hudson.model.FreeStyleBuild">
    <timestamp>1484291786712</timestamp>
  </lastFailedBuild>
  <lastSuccessfulBuild _class="hudson.model.FreeStyleBuild">
    <timestamp>1486285440897</timestamp>
  </lastSuccessfulBuild>
</freeStyleProject>
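If you'd rather script it, here is a small sketch using the JSON flavour of the same API (the Jenkins URL, job name and credentials are placeholders; the timestamps come back as milliseconds since the epoch):

from datetime import datetime, timezone

import requests

JENKINS_URL = "https://jenkins.example.com"  # placeholder
JOB_NAME = "my-job"                          # placeholder

resp = requests.get(
    f"{JENKINS_URL}/job/{JOB_NAME}/api/json",
    params={"tree": "lastSuccessfulBuild[timestamp],lastFailedBuild[timestamp]"},
    auth=("user", "api-token"),              # only if your instance requires auth
)
resp.raise_for_status()
data = resp.json()

for key in ("lastSuccessfulBuild", "lastFailedBuild"):
    build = data.get(key)
    if build:
        ts = datetime.fromtimestamp(build["timestamp"] / 1000, tz=timezone.utc)
        print(key, ts.isoformat())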

How to define every five minutes to run jobs

The play.jobs.Every documentation gives an example, every("1h"), to run a job every hour, but I want to run a job every 5 minutes; how do I define this?
I tried every("5m") and every("0.1h"), but Play reports an internal error.
Short answer:
You can use either of the following
#Every("5mn")
#Every("5min")
Long answer:
The @Every annotation uses the play.libs.Time class, and specifically its parseDuration method, to determine how often the job is scheduled.
If you look at the source code, the Javadoc states...
/**
 * Parse a duration
 * @param duration 3h, 2mn, 7s
 * @return The number of seconds
 */
This would suggest that you should specify your code as @Every("5mn")
If you look deeper into the code, it determines that the time is in minutes by using the following regular expression.
"^([0-9]+)mi?n$"
So, this states that either of the following are valid
#Every("5mn")
#Every("5min")
Try using "5min" in your annotation instead:
#Every("5min")

Celery task schedule (Celery, Django and RabbitMQ)

I want to have a task that will execute every 5 minutes, but it will wait for the last execution to finish and only then start counting those 5 minutes. (This way I can also be sure that there is only one task running.) The easiest way I found is to open a shell with the Django application's manage.py shell and run this:
while True:
    result = task.delay()
    result.wait()
    sleep(5)
but for each task that I want to execute this way I have to run its own shell. Is there an easier way to do it? Maybe some kind of custom Django Celery scheduler?
Wow, it's amazing how no one understands this person's question. They are asking not about running tasks periodically, but how to ensure that Celery does not run two instances of the same task simultaneously. I don't think there's a way to do this with Celery directly, but what you can do is have the task acquire a lock right when it begins, and if it fails, retry in a few seconds (using retry). The task would release the lock right before it returns; you can make the lock auto-expire after a few minutes in case the task ever crashes or times out.
For the lock you can probably just use your database or something like Redis (see the sketch below).
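A minimal sketch of that pattern using Django's cache as the lock store (the key name and timings are arbitrary, and do_the_actual_work is a placeholder for the real task body):

from celery import shared_task
from django.core.cache import cache

LOCK_KEY = "my-task-lock"   # arbitrary key name
LOCK_EXPIRE = 10 * 60       # auto-expire after 10 minutes in case of a crash


@shared_task(bind=True, max_retries=None)
def my_task(self):
    # cache.add is atomic on backends such as Redis/memcached: it only sets
    # the key if it does not already exist, so it doubles as a lock.
    if not cache.add(LOCK_KEY, "locked", LOCK_EXPIRE):
        # Another instance is still running: try again in a few seconds.
        raise self.retry(countdown=10)
    try:
        do_the_actual_work()    # placeholder for the real work
    finally:
        cache.delete(LOCK_KEY)  # release the lock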
You may be interested in this simpler method that requires no changes to the celery conf.
@celery.decorators.periodic_task(run_every=datetime.timedelta(minutes=5))
def my_task():
    # Insert fun-stuff here
    pass
All you need to do is specify in the celery conf which task you want to run periodically and with which interval.
Example: run the tasks.add task every 30 seconds:
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    "runs-every-30-seconds": {
        "task": "tasks.add",
        "schedule": timedelta(seconds=30),
        "args": (16, 16),
    },
}
Remember that you have to run celery in beat mode with the -B option:
python manage.py celeryd -B
You can also use the crontab style instead of a time interval (an example follows below); check out this:
http://ask.github.com/celery/userguide/periodic-tasks.html
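For instance, a sketch of running a task every 5 minutes with a crontab-style schedule (reusing the same placeholder task name as above):

from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    "runs-every-5-minutes": {
        "task": "tasks.add",
        "schedule": crontab(minute="*/5"),
        "args": (16, 16),
    },
}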
If you are using django-celery, remember that you can also use the Django DB as the scheduler for periodic tasks; this way you can easily add new periodic tasks through the django-celery admin panel.
To do that you need to set the celerybeat scheduler in settings.py like this:
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
To expand on @MauroRocco's post, from http://docs.celeryproject.org/en/v2.2.4/userguide/periodic-tasks.html:
Using a timedelta for the schedule means the task will be executed 30 seconds after celerybeat starts, and then every 30 seconds after the last run. A crontab like schedule also exists, see the section on Crontab schedules.
So this will indeed achieve the goal you want.
Because celery.decorators is deprecated, you can use the periodic_task decorator like this:
from celery.task.base import periodic_task
from datetime import timedelta

@periodic_task(run_every=timedelta(seconds=5))
def my_background_process():
    # insert code
    pass
Add that task to a separate queue, and then use a separate worker for that queue with the concurrency option set to 1.
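A sketch of that routing in the Django settings (the queue and task names are illustrative); you would then start a dedicated worker for that queue with something like celery worker -Q single_runner --concurrency=1:

# settings.py (old-style django-celery settings; names are illustrative)
CELERY_ROUTES = {
    "tasks.my_task": {"queue": "single_runner"},  # route only this task here
}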