pushgateway or node exporter - how to add string data? - sql

I have a cron job that runs an SQL query every day and gives me an important integer.
And I have to expose that integer to the Prometheus server.
As far as I can see, I have two options: use the Pushgateway or the Node Exporter.
But the metric (integer) that I get from the SQL query also needs some extra information attached to it (like the company name and the database it came from).
What would be the better way?
For instance, this is what I made for my metric:
count = some_number  # the integer returned by the SQL query
registry = CollectorRegistry()
g = Gauge('machine_number', 'machfoobarine_stat', registry=registry)
g.set(count)  # Gauge.set() returns None, so it shouldn't be chained onto the constructor
push_to_gateway('localhost:9091', job='batchA', registry=registry)
So how do I add key-value pairs to my metric above?
And do I have to change the job name ('batchA') for every single SQL count that I expose to the Pushgateway? Right now I can only see the last one.
Thanks,
Tom

The best way is to give your metric a general name, for example animal_count, and then specialize it with labels. Here is some pseudocode (in the style of the Java client):
g = Gauge.build("animal_count", "Number of animal in zoo")
.labelsName("sex", "classes")
.create();
g.labels("male", "mammals")
.set(count);
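In the asker's Python client the same idea looks roughly like the sketch below. The company/database label names, their values, and the grouping_key are only illustrative; what matters is that the metric name stays general and the labels carry the varying metadata.
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

count = 42  # the integer returned by the SQL query

registry = CollectorRegistry()
g = Gauge('machine_number', 'machfoobarine_stat',
          ['company', 'database'], registry=registry)
g.labels(company='acme', database='sales_db').set(count)

# grouping_key keeps pushes for different companies/databases from
# overwriting each other under the same job name
push_to_gateway('localhost:9091', job='batchA',
                grouping_key={'company': 'acme', 'database': 'sales_db'},
                registry=registry)
With labels (plus the grouping key on the Pushgateway side) there is no need to invent a new job name for every count; each labelled series shows up separately in Prometheus.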


How to set up a pr.job_type.Murnaghan job that for each volume reads the structure output of another Murnaghan job?

For the energy-volume (Ene-Vol) calculations of non-cubic structures, one has to relax the structures at all volumes.
Suppose that I start with a pr.job_type.Murnaghan() job whose ref_job_relax performs a cell-shape and internal-coordinates relaxation. Let's call this Murnaghan job R1, with 7 volumes, i.e. R1-V1, ..., R1-V7.
After one or more rounds of relaxation (R1...RN), one has to perform a static calculation to acquire a precise energy. Let's call the final static round S.
For the final round, I want to create a pr.job_type.Murnaghan() job that reads all the required setup configuration from ref_job_static except the input structures.
Then for each volume S-Vn it should read the corresponding output structure of RN-Vn, e.g. R1-V1 --> S-V1, ..., R1-V7 --> S-V7 if there was only one round of relaxation.
I am looking for an implementation like below:
murn_relax = pr.create_job(pr.job_type.Murnaghan, 'R1')
murn_relax.ref_job = ref_job_relax
murn_relax.run()
murn_static = pr.create_job(pr.job_type.Murnaghan, 'S', continuation=True)
murn_static.ref_job = ref_job_static
murn_static.structures_from(prev_job='R1')
murn_static.run()
The Murnaghan object has two relevant functions:
get_structure() https://github.com/pyiron/pyiron_atomistics/blob/master/pyiron_atomistics/atomistics/master/murnaghan.py#L829
list_structures() https://github.com/pyiron/pyiron_atomistics/blob/master/pyiron_atomistics/atomistics/master/murnaghan.py#L727
The first returns the predicted equilibrium structure and the second returns the structures at the different volumes.
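For example, once murn_relax has finished, the per-volume structures should be retrievable straight from the job; a hedged one-liner, assuming list_structures() takes no arguments:
volume_structures = murn_relax.list_structures()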
In addition you can get the IDs of the children and iterate over those:
structure_lst = [
    pr.load(job_id).get_structure()
    for job_id in murn_relax.child_ids
]
to get a list of converged structures.
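If it is acceptable to skip a second Murnaghan wrapper for the static round, one rough workaround is to run one plain static job per relaxed structure, reusing structure_lst from the snippet above. This is only a sketch: pr.job_type.Vasp and the 'S_V...' job names are placeholders, and copying the static settings from ref_job_static is left as a manual step rather than the continuation API asked about.
for i, structure in enumerate(structure_lst):
    # placeholder job type; use whatever code ref_job_static is based on
    job = pr.create_job(pr.job_type.Vasp, 'S_V{}'.format(i + 1))
    job.structure = structure  # relaxed structure for this volume
    # static settings (k-points, convergence, ...) would have to be copied
    # over from ref_job_static by hand in this sketch
    job.run()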

How to do math operations with columns on grouped rows

I have an Event model with the following attributes (I quote only the problem-related ones); the model is filled periodically by an API call to an external service (Google Calendar):
colorid: number # (0-11)
event_start: datetime
event_end: datetime
I need to sum the durations of events grouped by colorid. I have an Event instance method to calculate a single event's duration:
def event_duration
  ((event_end.to_datetime - event_start.to_datetime) * 24 * 60).to_i
end
Now, I need to do something like this:
event = Event.group(:colorid).sum(event_duration)
But this does not work for me: I get an error saying that an event_duration column does not exist. My idea is to add one more attribute, event_duration, to the Event model and compute and store it when the Event record is created; then I would have a column called event_duration and might be able to use sum on it. But I am not sure this is a good, systematic solution, since I would like the model data to reflect the raw data received from the API call and do all math and statistics on top of it.
event_duration is an instance method, not a column name; the error is raised because Event.sum can only calculate the sum of an actual column. In your case, I think it would be easier to use enumerable methods:
duration_by_color_id = {}
grouped_events = Event.all.group_by(&:colorid)
grouped_events.each do |colorid, events|
  duration_by_color_id[colorid] = events.collect(&:event_duration).sum
end
Sources:
Enumerable's group_by
Enumerable's collect

Django ORM Cross Product

I have three models:
class Customer(models.Model):
    pass

class IssueType(models.Model):
    pass

class IssueTypeConfigPerCustomer(models.Model):
    customer = models.ForeignKey(Customer)
    issue_type = models.ForeignKey(IssueType)

    class Meta:
        unique_together = [('customer', 'issue_type')]
How can I find all tuples of (customer, issue_type) for which there is no IssueTypeConfigPerCustomer object?
I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
Background: for every customer and for every issue-type, there should be a config in the DB.
If you can afford to make one database trip for each issue type, try something like this untested snippet:
def lacking_configs():
    for issue_type in IssueType.objects.all():
        # exclude() skips customers that already have a config for this issue type
        for customer in Customer.objects.exclude(
            issuetypeconfigpercustomer__issue_type=issue_type
        ):
            yield customer, issue_type
missing = list(lacking_configs())
This is probably OK unless you have a lot of issue types or if you are doing this several times per second, but you may also consider having a sensible default instead of making a config object mandatory for each combination of issue type and customer (IMHO it is a bit of a design-smell).
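As an aside, the "sensible default" alternative can be as small as a lookup helper that falls back to a shared default when no row exists; DEFAULT_CONFIG and get_config below are hypothetical names, not part of the question's code:
DEFAULT_CONFIG = {"severity": "normal"}  # whatever a sane default means here

def get_config(customer, issue_type):
    # fall back to the default instead of requiring a row for every pair
    try:
        return IssueTypeConfigPerCustomer.objects.get(
            customer=customer, issue_type=issue_type
        )
    except IssueTypeConfigPerCustomer.DoesNotExist:
        return DEFAULT_CONFIG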
[update]
I updated the question: I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
In Django, every QuerySet yields either model instances or dicts (values() querysets), so it is impossible to return the format you want (a list of tuples of model instances) without some Python (and possibly multiple trips to the database).
The closest thing to a cross product would be using the "extra" method without a where parameter, but it involves raw SQL and knowing the underlying table name for the other model:
missing = Customer.objects.extra(
    select={"issue_type_id": 'appname_issuetype.id'},
    tables=['appname_issuetype']
)
As a result, each Customer object will have an extra attribute, "issue_type_id", containing the id of one IssueType. You can use the where parameter to filter based on NOT EXISTS (SELECT 1 FROM appname_issuetypeconfigpercustomer WHERE issuetype_id=appname_issuetype.id AND customer_id=appname_customer.id). Using the values method you can have something close to what you want - this is probably enough information to verify the rule and create the missing records. If you need other fields from IssueType just include them in the select argument.
In order to assemble a list of (Customer, IssueType) you need something like:
cross_product = [
    (customer, IssueType.objects.get(pk=customer.issue_type_id))
    for customer in Customer.objects.extra(
        select={"issue_type_id": 'appname_issuetype.id'},
        tables=['appname_issuetype'],
        where=["""
            NOT EXISTS (
                SELECT 1
                FROM appname_issuetypeconfigpercustomer
                WHERE issuetype_id=appname_issuetype.id
                AND customer_id=appname_customer.id
            )
        """]
    )
]
Not only does this require the same number of trips to the database as the "generator" based version, but IMHO it is also less portable, less readable and violates DRY. I guess you can lower the number of database queries to a couple using something like this:
missing = Customer.objects.extra(
    select={"issue_type_id": 'appname_issuetype.id'},
    tables=['appname_issuetype'],
    where=["""
        NOT EXISTS (
            SELECT 1
            FROM appname_issuetypeconfigpercustomer
            WHERE issuetype_id=appname_issuetype.id
            AND customer_id=appname_customer.id
        )
    """]
)
issue_list = dict(
    (issue.id, issue)
    for issue in IssueType.objects.filter(
        pk__in=set(m.issue_type_id for m in missing)
    )
)
cross_product = [(c, issue_list[c.issue_type_id]) for c in missing]
Bottom line: in the best case you make two queries at the cost of legibility and portability. Having sensible defaults is probably a better design compared to mandatory config for each combination of Customer and IssueType.
This is all untested, sorry if some homework was left for you.

order_by() method not working in peewee

I am using a SQLite backend with a simple show - season - episode schema:
class Show(BaseModel):
    name = CharField()

class Season(BaseModel):
    show = ForeignKeyField(Show, related_name='seasons')
    season_number = IntegerField()

class Episode(BaseModel):
    season = ForeignKeyField(Season, related_name='episodes')
    episode_number = IntegerField()
and I need the following query:
seasons = (Season.select(Season, Episode)
           .join(Episode)
           .where(Season.show == SHOW_ID)
           .order_by(Season.season_number.desc(), Episode.episode_number.desc())
           .aggregate_rows())
SHOW_ID being the id of the show for which I want the list of seasons.
But when I iterate over the query with the following code:
for season in seasons:
    for episode in season.episodes:
        print(episode.episode_number)
... I get something which is not ordered at all, and which does not even follow the order I would get without using order_by(), i.e. the insertion order.
I activated the debug logs to see the outgoing query, and the query does contain the ORDER BY clause, and manually applying it returns the proper descending order.
I am new to peewee, and I have seen so many examples making use of a join() combined with an order_by(), but I still cannot find out what I am doing wrong.
This was due to a bug in the processing of nested collections in the aggregate query result wrapper.
The github issue is: https://github.com/coleifer/peewee/issues/519
The fix has been merged here: https://github.com/coleifer/peewee/commit/ec0e87f1a480695d98bf1f0d7f2e63aed8dfc440
So, to get the fix you'll need to either clone master or wait until the next release, which should be out in the next week or two (2.4.7).

Rails show different object every day

I want to match my user to a different user in his/her community every day. Currently, I use code like this:
@matched_user = User.near(@user).order("RANDOM()").first
But I want to have a different #matched_user on a daily basis. I haven't been able to find anything in Stack or in the APIs that has given me insight on how to do it. I feel it should be simpler than having to resort to a rake task with cron. (I'm on postgres.)
Whenever I find myself hankering for shared 'memory' or transient state, I think to myself "this is what (distributed) caches were invented for".
@matched_user = Rails.cache.fetch(@user.cache_key + '/daily_match', expires_in: 1.day) {
  User.near(@user).order("RANDOM()").first
}
NOTE: While specifying a TTL for a cache entry tells Rails/the cache system to try to keep that value for the given timeframe, there is NO guarantee that it will. In particular, a cache that aggressively tries to reclaim memory may expire an entry well before its desired expires_in time.
For this particular use case it shouldn't be a big deal, but where the business/domain logic demands periodically generated values that are durable, you really have to factor that into your database design.
How about using PostgreSQL's SETSEED function? I used the date as the seed so that the seed changes every day but stays consistent within a day:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
You may want to re-seed with a random value afterwards so that future calls to RANDOM() aren't biased:
random = User.connection.execute("SELECT RANDOM()").to_a.first["random"]
# Same code as above:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
# Use the random value saved before seeding to create a new seed:
User.connection.execute "SELECT SETSEED(#{random})"
I have split these steps into separate sections just for readability; you can optimise the query later.
1) Find all user records created before this morning, so that the count stays fixed throughout the day.
users_till_today_morning = User.where("created_at < ?", DateTime.now.in_time_zone(Time.zone).beginning_of_day)
2) Pluck all IDs
user_ids = users_till_today_morning.pluck(:id)
3) Today's day of the month falls in the range 1..31 but remains constant throughout the day.
day_today = Time.now.day
4) Select the same ID for the day
todays_user_id = user_ids[day_today % user_ids.count]
#matched_user = User.find(todays_user_id)
So it will give you a pseudo-random user record while keeping the same record throughout the day!