YouTrack - Historical issue snapshots

The new YouTrack API is missing the old Issue history /rest/issue/{issue}/history endpoint, which our code heavily depends on. There's only the Issue activities /api/issues/{issueID}/activities endpoint, which returns only the deltas between changes, drawn from a long list of diff/activity categories.
Is there some simple way to get a list of an issue's historical snapshots, or do I actually have to parse all these activity categories and somehow merge them together to (re)implement this whole thing myself?

The old /history endpoint didn't provide full snapshots either, but /activities does output much more data. Still, that's the way to do it: traverse the activity items and build snapshots based on the provided timestamps.
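For reference, here is a minimal sketch of that approach in Python, assuming a permanent token, a reachable base URL, and that only custom-field changes matter; the attribute names requested from the activities endpoint (timestamp, field, added, removed) are a best guess and may need adjusting for your YouTrack version:

```python
import requests

BASE = "https://youtrack.example.com"   # assumption: your YouTrack base URL
TOKEN = "perm:..."                      # assumption: a permanent token
ISSUE = "PRJ-123"                       # assumption: the issue to reconstruct

# Ask only for custom-field changes and the data needed to replay them.
params = {
    "categories": "CustomFieldCategory",
    "fields": "timestamp,field(name),added(name),removed(name)",
    "$top": 1000,
}
resp = requests.get(
    f"{BASE}/api/issues/{ISSUE}/activities",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params=params,
)
resp.raise_for_status()
activities = sorted(resp.json(), key=lambda a: a["timestamp"])

# Replay the deltas oldest-first, emitting one snapshot per change.
state, snapshots = {}, []
for act in activities:
    field = (act.get("field") or {}).get("name")
    if not field:
        continue
    added = act.get("added")
    # 'added' may be a list of values or a scalar, depending on the field type.
    if isinstance(added, list):
        added = ", ".join(v.get("name", "") for v in added if isinstance(v, dict))
    state[field] = added
    snapshots.append({"timestamp": act["timestamp"], **state})

for snap in snapshots:
    print(snap)
```

This only reconstructs custom-field values; other categories (summary, description, links, comments) would need the same replay logic with their own added/removed handling.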

Related

Most efficient way to clean data in BigQuery

I need some help cleaning my data...
I have a BigQuery table where I receive new entries from my back end; these data are recorded to BigQuery, and I'm using Google Data Studio to present them.
My problem is that I have a field named sessions that sometimes has duplicates. I can't solve that directly in my back end, because a user can send different data from the same session, so I can't just stop recording duplicates.
I've managed the problem by creating a view that selects the newest of each set of duplicate records, and I'm using this view as the data source for my report. The problem with this approach is that I lose the "real-time report" feature, which is important in this case. Another problem is that I also lose "Accelerated by BigQuery BI Engine", and I would like to keep that feature too.
Is this the best solution for my problem, so that I'll need to accept this outcome, or is there another way?
Many thanks in advance, kind regards.
Using the view should work for BI Engine acceleration. Can you please share more details on BI Engine? It should show you the reason the query wasn't accelerated, likely mentioning one of the limitations. If you hover over the "not accelerated" sign, it should give you more details on why your query wasn't supported. Feel free to share it here and I will be happy to help.
Another way you can clean up the data: have a scheduled job preprocess it. That means the data may not be the most recent, but it gives you the ability to clean up and aggregate the data.
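For what it's worth, a deduplicating view along those lines usually looks something like the sketch below, here issued through the Python client; the project, dataset, table, and ingested_at column names are placeholders, and sessions stands in for the duplicated field from the question. The same SELECT could also feed a scheduled query if you go the preprocessing route.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical names: replace project, dataset, table, and column names with yours.
sql = """
CREATE OR REPLACE VIEW `my_project.my_dataset.events_dedup` AS
SELECT * EXCEPT(rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY sessions         -- the field that gets duplicated
      ORDER BY ingested_at DESC     -- assumed ingestion timestamp column
    ) AS rn
  FROM `my_project.my_dataset.events`
)
WHERE rn = 1
"""
client.query(sql).result()  # runs the DDL; the view can then be used as a Data Studio source
```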

How to pre-process BigQuery data coming from Stackdriver

I am currently exporting logs from Stackdriver to BigQuery using sinks, but I am only interested in the jsonPayload. I would like to ignore pretty much everything else.
But since the table creation and data insertion happen automatically, I could not do this.
Is there a way to preprocess data coming from the sink to store only what matters?
If the answer is no, is there a way to run a cron job each day to copy yesterday's data into a separate table and then remove it? (The tables are named using timestamps, which makes it possible to query them by day.)
As far as I know, both options mentioned are currently not possible on the GCP platform. On my end, I've also tried to reproduce your setup internally and noticed that there isn't a way to filter for only the jsonPayload.
I would therefore suggest creating a feature request regarding your ask on the public issue tracker. Note that feature requests do not have an ETA as to when they'll be processed or whether they'll be implemented.

What is the best way to structure this database?

So I am in the process of building a database from my client's data. Each month they create roughly 25 CSVs, which are unique in their topic and attributes, but they all have one thing in common: a registration number.
The registration number is the only common variable across all of these CSVs.
My task is to move all of this into a database, for which I am leaning towards Postgres (if anyone believes NoSQL would be best for this, then please shout out!).
The big problem is structuring this within a database. Should I create one table per month that houses all the data, with column 1 being the registration number and columns 2-200 being the attributes? Or should I put all the CSVs into Postgres as they are, and then join them later?
I'm struggling to get my head around how to structure this when there will be monthly updates to every registration, and we don't want to destroy historical data - we want to keep it for future benchmarks.
I hope this makes sense - I welcome all suggestions!
Thank you.
In some ways your question is too broad and asks for an opinion (SQL vs. NoSQL).
However, the gist of the question is whether you should load your data one month at a time or into a well-developed data model. Definitely the latter.
My recommendation is the following.
First, design the data model around how the data needs to be stored in the database, rather than how it is being provided. There may be one table per CSV file. I would be a bit surprised, though. Data often wants to be restructured.
Second, design the archive framework for the CSV files.
You should archive all the incoming files in a nice directory structure with files from each month. This structure should be able to accommodate multiple uploads per month, either for all the files or some of them. Mistakes happen and you want to be sure the input data is available.
Third, COPY (this is the Postgres command) the data into staging tables. This is the beginning of the monthly process.
Fourth, process the data -- including doing validation checks to load it into your data model.
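As a rough illustration of steps three and four, assuming the files land in a dated archive directory and that a staging table mirrors the CSV layout; every name below (connection string, paths, tables, columns) is hypothetical:

```python
import psycopg2

# Hypothetical connection string, archive path, and table/column names.
conn = psycopg2.connect("dbname=clientdata user=loader")
with conn, conn.cursor() as cur:
    # Step three: bulk-load the raw CSV into a staging table with COPY.
    with open("/archive/2024-05/registrations.csv") as f:
        cur.copy_expert(
            "COPY staging_registrations FROM STDIN WITH (FORMAT csv, HEADER true)",
            f,
        )

    # Step four: validate and move the rows into the real data model,
    # keeping history by stamping each load with its month.
    cur.execute("""
        INSERT INTO registrations (registration_no, attr_a, attr_b, load_month)
        SELECT registration_no, attr_a, attr_b, DATE '2024-05-01'
        FROM staging_registrations
        WHERE registration_no IS NOT NULL      -- minimal validation check
    """)
```

Keeping a load_month (or load timestamp) column on the modeled tables is one simple way to preserve the monthly history the question asks about.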
There may be tweaks to this process, based on questions such as:
Does the data need to be available 24/7 even during the upload process?
Does a validation failure in one part of the data prevent uploading any data?
Are SQL checks (referential integrity and check constraints) sufficient for validating the data?
Do you need to be able to "rollback" the system to any particular update?
These are just questions that can guide your implementation. They are not intended to be answered here.

Data modeling Issue for Moqui custom application

We are working on a custom project management application on top of the Moqui framework. Our requirement is that we need to notify the developers associated with a project by email about any changes to a ticket.
Currently we are using the WorkEffortParty entity to store all parties associated with the project and the PartyContactMech entity to store their email addresses. We need to iterate through WorkEffortParty and PartyContactMech every time to fetch all the email addresses to which we need to send notifications about ticket changes.
To avoid these iterations, we are now thinking of adding a feature to store comma-separated email addresses at the project level. A project admin could add the email addresses of associated parties, or a mailing list address, to which ticket-change notifications should be sent.
For this requirement, we studied the data model but didn't find the right place to store this information. Do we need to extend an entity for this, or is there a best practice? This requirement is very useful in any project management application. We appreciate any help on this data modeling problem.
The best practice is to use existing data model elements as they are available. Having a normalized data model involves more work in querying data, but also more flexibility in addressing a wide variety of requirements without changes to the data structures.
In this case with a joined query you can get a list of email addresses in a single query based on the project's workEffortId. If you are dealing with massive data and message volumes there are better solutions than denormalizing source data, but I doubt that's the case... unless you're dealing with more than thousands of projects and millions of messages per day the basic query and iterate approach will work just fine.
If you need to go beyond that the easiest approach with Moqui is to use a DataDocument and DataFeed to send updates on the fly to ElasticSearch, and then use it for your high volume queries and filtering (with arbitrarily complex filtering, etc requirements).
Your question is too open to answer directly; data modeling is a complex topic, and without a good understanding of the context and intended usage there are no good answers. In general it's best to start with a data model based on decades of experience and used in a large number of production systems. The Mantle UDM is one such model.

What is this vague accusation of RRD data loss about?

I want to use CollectD to gather some statistics (about storage) and have Graphite display them nicely. Apparently this can be done either by
having CollectD store the data as RRD files and pointing Graphite at those, or
using a CollectD plugin to push the data to Graphite's Carbon API, which will store the data in a Whisper database (which is similar to RRD but not compatible).
I think I want to go with RRDs, but I found this statement in the Whisper docs that concerns me:
In many cases (depending on configuration) if an update is made to an RRD series but is not followed up by another update soon, the original update will be lost.
Hmmm. That's a bit scary, but the accusation is so vague that I don't know what to make of it. What is the configuration they are talking about, and the situation in which it causes data loss?
My situation is that the metrics data I am gathering will be available in chunks -- periodically I will go get the latest data and make as many entries into the database as there are new samples available. So, for example, I might grab some data and update the database with the values from 3 minutes ago, 2 minutes ago, and 1 minute ago, one right after the other. In fact, I might have dozens of new samples to put in the database at once. Does using RRD this way have anything to do with the Whisper accusation?
NOTE: I do not need to back-fill data; I will always be adding newer data than what has already been stored.
One scenario where I can see this happening would be if you have an AVERAGE RRA set up and have the xff (xfiles factor) value set to a low percentage. When the data is consolidated over time, you could end up with an unknown value and lose all the data that was averaged. If you are using an RRD for what it was designed for, and have it set up with the proper type and settings, I wouldn't think you will run into a problem.
I would recommend taking an in-depth look at the RRDtool documentation to answer questions about how RRDs and RRAs handle the data, and the different storage techniques that are available to you.
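To make the xff point concrete, here is a rough sketch, driving the rrdtool CLI from Python, that creates an RRD with a 60-second step and an explicit xff of 0.5, then feeds several queued samples in one update, matching the batched usage described in the question; the DS and RRA parameters are only illustrative:

```python
import subprocess
import time

# Create an RRD with a 60 s step. The 0.5 in each RRA is the xff (xfiles factor):
# at most 50% of the primary data points in a consolidation window may be
# unknown, otherwise the consolidated value becomes UNKNOWN.
subprocess.run(
    [
        "rrdtool", "create", "storage.rrd",
        "--step", "60",
        "DS:used_bytes:GAUGE:120:0:U",   # heartbeat 120 s, no upper bound
        "RRA:AVERAGE:0.5:1:1440",        # 1-minute resolution for one day
        "RRA:AVERAGE:0.5:60:720",        # 1-hour averages for 30 days
    ],
    check=True,
)

# Feed several samples at once (e.g. values from 3, 2, and 1 minutes ago).
now = int(time.time())
samples = [(now - 180, 101), (now - 120, 102), (now - 60, 103)]
subprocess.run(
    ["rrdtool", "update", "storage.rrd"]
    + [f"{ts}:{value}" for ts, value in samples],
    check=True,
)
```

As long as the batched updates keep arriving within the heartbeat window, this kind of multi-sample update should not trigger the data-loss scenario described above.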