How to create a diff object from a patch? - libgit2

I have a git patch stored in a database. How can I convert it to a diff object?
Here https://github.com/libgit2/rugged#diffs I can see how to get a patch from a diff with diff.patch. I want to do the opposite operation.

"I have a git patch stored in a database"
Do you mean you have a diff stored in text format? In that case you cannot convert it to a git_diff, as there is no parser for unidiff. It wouldn't help much anyway, since there's nothing libgit2 would be able to do with such an object.
There will likely be one at some point, as it's needed for some versions of rebase, but for now I'd suggest storing which objects you diffed and recreating the diff from there.
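The same round trip, sketched with pygit2 (libgit2's Python bindings) purely as an illustration of that suggestion; the repository path and the two commit SHAs stand in for whatever you store in your database instead of the raw patch text:

# Sketch with pygit2; the repo path and commit SHAs are placeholders for
# whatever you persist instead of the patch text.
import pygit2

repo = pygit2.Repository("/path/to/repo")
old_commit = repo.revparse_single("<old-commit-sha>")
new_commit = repo.revparse_single("<new-commit-sha>")

diff = repo.diff(old_commit, new_commit)   # a real diff object again
print(diff.patch)                          # and back to unidiff text if needed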


Updating Parquet datasets where the schema changes over time

I have a single parquet file that I have been incrementally building every day for several months. The file size is around 1.1GB now, and when read into memory it approaches my PC's memory limit. So, I would like to split it up into several files based on the year and month combination (i.e. Data_YYYYMM.parquet.snappy) that will all be in a directory.
My current process reads in the daily csv that I need to append, reads in the historical parquet file with pyarrow and converts it to pandas, concats the new and historical data in pandas (pd.concat([df_daily_csv, df_historical_parquet])) and then writes back to a single parquet file. Every few weeks the schema of the data can change (i.e. a new column). With my current method this is not an issue, since the concat in pandas can handle the different schemas and I am overwriting the file each time.
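A rough sketch of that workflow with the monthly split applied; the input file names and the "date" column are placeholder assumptions based on the process described above:

# Sketch of the append-then-split-by-month write; input file names and the
# "date" column are placeholder assumptions.
import pandas as pd

df_daily_csv = pd.read_csv("daily.csv", parse_dates=["date"])
df_historical_parquet = pd.read_parquet("historical.parquet")

# pandas aligns the differing schemas here, filling missing columns with nulls.
df = pd.concat([df_daily_csv, df_historical_parquet], ignore_index=True)

# One file per year-month instead of a single ever-growing file.
for ym, part in df.groupby(df["date"].dt.strftime("%Y%m")):
    part.to_parquet(f"Data_{ym}.parquet.snappy", compression="snappy", index=False)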
By switching to this new setup I am worried about having inconsistent schemas between months and then being unable to read in data over multiple months. I have tried this already and gotten errors due to non-matching schemas. I thought I might be able to specify this with the schema parameter in pyarrow.parquet.Dataset. From the doc it looks like it takes a type of pyarrow.parquet.Schema. When I try using this I get AttributeError: module 'pyarrow.parquet' has no attribute 'Schema'. I also tried taking the schema of a pyarrow Table (table.schema) and passing that to the schema parameter but got an error msg (sorry, I forget the error right now and can't connect to my workstation, so I can't reproduce it - I will update with this info when I can).
I've seen some mention of schema normalization in the context of the broader Arrow/Datasets project, but I'm not sure if my use case fits what that covers, and the Datasets feature is also experimental, so I don't want to use it in production.
I feel like this is a pretty common use case, and I wonder if I am missing something or if parquet just isn't meant for schema changes over time like I'm experiencing. I've considered inspecting the schema of the new file, comparing it against the historical one, and, if there is a change, deserializing, updating the schema, and reserializing every file in the dataset, but I'm really hoping to avoid that.
So my questions are:
Will using a pyarrow parquet Dataset (or something else in the pyarrow API) allow me to read in all of the data in multiple parquet files even if the schema is different? To be specific, my expectation is that the new column would be appended and the values prior to when this column was available would be null. If so, how do you do this? (See the rough sketch after the resource links below.)
If the answer to 1 is no, is there another method or library for handling this?
Some resources I've been going through:
https://arrow.apache.org/docs/python/dataset.html
https://issues.apache.org/jira/browse/ARROW-2659
https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow.parquet.ParquetDataset
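For what it's worth, one pattern that seems to fit question 1 (treat it as a sketch rather than a definitive answer) is to unify the per-file schemas yourself and hand the result to the Datasets API, which then fills columns missing from older files with nulls. File names and the pyarrow version are assumptions:

# Sketch, assuming pyarrow >= 1.0 and files named Data_YYYYMM.parquet.snappy
# in the current directory; not tested against your exact data.
import glob
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

paths = sorted(glob.glob("Data_*.parquet.snappy"))

# Merge the schemas of all files; the result has the union of all columns.
unified = pa.unify_schemas([pq.read_schema(p) for p in paths])

# Read everything against the unified schema; files written before a column
# existed come back with nulls in that column.
dataset = ds.dataset(paths, schema=unified, format="parquet")
df = dataset.to_table().to_pandas()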

What ABAP objects have been changed today?

Some functionality in a big project is broken on the development system.
Pretty sure it worked a few hours ago.
How do I know which ABAP objects have been changed lately?
(I think I can guess the transport and the package that contains the change if that helps)
The nearest answer that I found is table VRSD.
It contains the date of the version of an object.
This doesn't help, since you need to export the transport or create a manual version to get an entry in this table.
So which objects have been changed without creating a new version?
(Yes we will find the change with functional checks, but knowing the changed objects would be a nice shortcut)
For code, table TRDIR has a changed-on date (UDAT) that is updated when code is activated.
For data dictionary objects check the DD* tables. I know DD01L is domains and DD02L is tables. Both of these will have a change date. I'm sure there are others for the other data types.
There is also the table REPOLOAD, which contains the generated ABAP byte code. It has three fields, UDAT, UTIME and UNAME, for the date, time and user of the last generation (PS: don't be confused by the SDAT and STIME fields).
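If you want to pull this from outside the SAP GUI, here is a hedged sketch using SAP's pyrfc library and the standard RFC_READ_TABLE function module against TRDIR; the connection parameters and the date literal are placeholders:

# Sketch only: assumes the pyrfc package and an RFC-enabled user; all
# connection parameters and the date value are placeholders.
from pyrfc import Connection

conn = Connection(ashost="sap-host", sysnr="00", client="100",
                  user="DEVELOPER", passwd="secret")

# TRDIR.UDAT is the "changed on" date mentioned above (format YYYYMMDD).
result = conn.call(
    "RFC_READ_TABLE",
    QUERY_TABLE="TRDIR",
    DELIMITER="|",
    OPTIONS=[{"TEXT": "UDAT = '20240115'"}],
    FIELDS=[{"FIELDNAME": "NAME"}, {"FIELDNAME": "UNAM"}, {"FIELDNAME": "UDAT"}],
)

for row in result["DATA"]:
    print(row["WA"])   # program name | changed by | changed on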

How to parse an Aerospike backup file to regenerate data?

In the backup file there are a lot of encoded values. How do I get back the original data?
For example there is
+ d q+LsiGs1gD9duJDbzQSXytajtCY=
which is of the format ["+"] [SP] ["d"] [SP] [{digest}] [LF] where q+LsiGs1gD9duJDbzQSXytajtCY= is the key digest. How would I get the primary key from this?
Also Map and List values are represented as opaque byte values. How do we restore the original Map and List?
I would currently need to do all this if I wanted to make a CSV dump out of the backup.
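For what it's worth, the digest field is plain base64 and decodes to the 20-byte record digest; as far as I know that digest is a one-way hash of the set name and key, so the original primary key cannot be recovered from it unless it was stored with the record.

# Decoding the digest line from the example above; this yields the 20-byte
# hash, not the original primary key.
import base64

digest = base64.b64decode("q+LsiGs1gD9duJDbzQSXytajtCY=")
print(len(digest), digest.hex())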
asbackup is an open source tool, as is asrestore, and the backup file format is described in the aerospike/aerospike-tools-backup repo on GitHub.
Alternatively, you could use the Kafka connector to move data from Aerospike to another database via Kafka.
The easiest way to do what you're looking for is still to write a program that scans the target namespace and writes each record out in CSV format. You can use predicate filtering to only get records whose last-update-time is greater than a specific timestamp, giving you the progressive backup you want. See the PredExp class of the Java client and its examples.
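A bare-bones sketch of that scan-to-CSV approach with the Aerospike Python client; the namespace, set and output file names are placeholders, and the primary key only appears if records were originally written with the key-send policy:

# Sketch, assuming the `aerospike` Python client; namespace, set and output
# path are placeholders. The last-update-time filter mentioned above is
# omitted here.
import csv
import aerospike

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
scan = client.scan("test", "demo")

with open("dump.csv", "w", newline="") as f:
    writer = csv.writer(f)

    def write_record(record):
        (namespace, setname, userkey, digest), meta, bins = record
        # userkey is None unless the record was written with send_key=True.
        writer.writerow([userkey, digest.hex() if digest else "", bins])

    scan.foreach(write_record)

client.close()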

Best data structure to store temperature readings over time

I used to work with SQL databases like MySQL, Postgres or MSSQL.
Now I want to play with Redis. I'm working on a little home project that I think is a good fit for starting to use Redis.
I have a machine that reads temperature (indoor and outdoor) and humidity. I need to store the readings into Redis. Can you help me to understand the best data structure to do so?
Along with this data I need to store the time (e.g. a Unix timestamp) of each temperature reading, for use in plotting a graph.
I installed Redis and read the documentation, so I understand the commands and data types.
Since this is your first Redis project and it's a home project, I'd be careful about being too careful. Here are a couple of ways to consider designing it (NOTE: I only dug into Redis this past weekend, so hopefully others will weigh in).
IDEA 1:
Four sorted sets
KEY for sets are "indoor_temps", "outdoor_temps", "indoor_humidity", "outdoor_humidity"
VALUES are the temperatures / humidities
SCORE is the date stored as EPOCH
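A minimal redis-py sketch of IDEA 1 (connection details and sample values are placeholders; note that a sorted set keeps one entry per distinct member, so two identical readings would overwrite each other's score):

# Sketch of IDEA 1 with redis-py; host, port and sample values are placeholders.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

now = int(time.time())
r.zadd("indoor_temps", {"22.5": now})    # member = reading, score = epoch time
r.zadd("outdoor_temps", {"27.2": now})

# All indoor readings from the last 24 hours, oldest first, with timestamps.
readings = r.zrangebyscore("indoor_temps", now - 86400, now, withscores=True)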
IDEA 2:
Four types of keys (best shown by example)
datetime_key = /year:2014/month:07/day:12/hour:07/minute:32/second:54
type_keys = [indoor_temps, outdoor_temps, indoor_humidity, outdoor_humidity]
keys are of the form type + "/" + datetime_key
values are the temp and humidity itself
You probably want to implement some initial design and then work with the data immediately - graph it, do stats, etc. Whatever you plan to do with it. That will expose flaws and if they are major, flush the database and try again. These designs should really only take ~1 hour to implement since the only thing you're really changing is a few Redis commands and some string manipulation to convert the data to keys.
I like Tony's suggestions, but I'll also throw out another possibility.
4 lists
keys are "indoor_temps", "outdoor_temps", "indoor_humidity", "outdoor_humidity"
values are of the form <timestamp>_<reading>, e.g. "1403197981_27.2"
Push items onto the front of the list using LPUSH. Get a set of readings using LRANGE. The list will always be ordered by the time of the reading. Obviously split the value on "_" to get your time and reading...
In all honesty, this will give the same properties as Tony's first example, with slightly worse lookup performance but better memory usage. I'm guessing that for this project you'll be neither memory- nor CPU-constrained, so the choice is probably not an issue. That said, if you expect to be saving hundreds of thousands or more readings, I would suggest the lists, unless you want to consume a large portion of your system's memory.
Also, it's a good idea to call EXPIRE on your entries with some reasonable TTL that encompasses the length of time you want to save the readings for. If your plan is to have them live in perpetuity then you may want to look at backing them up to a disk DB over time, and just use Redis as a quick lookup cache for recent readings.
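A short redis-py sketch of this list layout (connection details and the sample reading are placeholders):

# Sketch of the list-based layout; the sample reading is a placeholder.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

reading = f"{int(time.time())}_27.2"          # <timestamp>_<reading>
r.lpush("outdoor_temps", reading)

# Most recent 100 readings, newest first because of LPUSH.
for item in r.lrange("outdoor_temps", 0, 99):
    ts, value = item.decode().split("_")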
Thanks to all the answers; I chose this structure:
4 lists: tempIN, tempOut, humidIN and humidOUT
values are: [value]:[timestamp]. For example: "25.4:1403615247"
As suggested by wallacer, I want to back up old entries out of Redis.
For the main frontend I only need the last two days of samples.
For example, I can create a Redis RDB file snapshot and "trim" the live lists. But this solution is not convenient if, in the future, I want to recover old values.
Do you have any tips on what kind of procedure to adopt to store the old data? Maybe use an SQLite DB?
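Not an authoritative answer, but one possible shape for that archiving step is a periodic job that copies entries older than two days into SQLite and trims them from the live list. The key name and the value:timestamp format come from the structure above; everything else (database path, table layout) is an assumption:

# Sketch: move readings older than two days from the tempIN list into SQLite.
# The "value:timestamp" format follows the structure described above; the
# database path and table layout are assumptions.
import sqlite3
import time
import redis

r = redis.Redis(host="localhost", port=6379)
db = sqlite3.connect("readings_archive.db")
db.execute("CREATE TABLE IF NOT EXISTS temp_in (ts INTEGER, value REAL)")

cutoff = int(time.time()) - 2 * 86400
keep = []
for item in r.lrange("tempIN", 0, -1):
    value, ts = item.decode().split(":")
    if int(ts) < cutoff:
        db.execute("INSERT INTO temp_in (ts, value) VALUES (?, ?)",
                   (int(ts), float(value)))
    else:
        keep.append(item)
db.commit()

# Rewrite the live list with only the recent readings. Readings pushed between
# the LRANGE above and this rewrite would be lost; a Lua script would close
# that race if it matters.
pipe = r.pipeline()
pipe.delete("tempIN")
if keep:
    pipe.rpush("tempIN", *keep)
pipe.execute()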

querying generation_time on mongo ids

John Nunemaker has a blog post with some nice tips about Mongo ObjectIds -- http://mongotips.com/b/a-few-objectid-tricks/ -- in particular I was interested in the tip about generation_time. He suggests it's not necessary to explicitly store the created_at time in Mongo documents because you can always pull it from the ID, which caught my attention. The problem is I can't figure out how to generate Mongo queries in MongoMapper to find documents based on creation time if all I have is the id.
If I store a key :created_at as part of the document I can do a query in mongomapper to get all documents created since Dec 1st like this:
Foo.where(:created_at.gt=>Time.parse("2011-12-01"))
(which maps to:
{created_at: {"$gt"=>Thu Dec 01 06:00:00 UTC 2011}})
I can't figure out how to make the equivalent query using the ObjectId. I imagine it'd look something like this (though obviously generation_time is a Ruby method; is there an equivalent I can use on the ObjectId in the context of a Mongo query?):
Foo.where('$where'=>"this.id.generation_time > new Date('2011-12-01')")
{$where: "this.id.generation_time > new Date('2011-12-01')"}
One further question: if I forgo storing separate timestamps, will I lose the timestamp metadata if I dump and restore my database using mongodump? Are there recommended backup/restore techniques that preserve ObjectIds?
This is JavaScript code that would be run in the shell, but generation_time is a Ruby-side (MongoMapper/BSON) method, so it doesn't make sense in the code you have.
In Rails you would get the creation time from the id by saying something like
created_at = self.id.generation_time.in_time_zone(Time.zone)
Where self refers to an instance of Foo.
And you would query by saying
Foo.where('_id' => {'$gte' => BSON::ObjectId.from_time(created_at)}).count
Why bother though... the hassle isn't worth it, just store the time.
Regarding the backup/restore techniques: unless you are manually reading and re-inserting documents, mongodump/mongorestore and similar tools will preserve the ObjectIds, so you have nothing to worry about there.
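For anyone doing the same from Python, the equivalent of BSON::ObjectId.from_time in pymongo's bson package looks like this (shown purely as an illustration; database and collection names are made up):

# Illustration with pymongo; database and collection names are placeholders.
from datetime import datetime, timezone
from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
foos = client["mydb"]["foos"]

# Build an ObjectId whose embedded timestamp is 2011-12-01 ...
cutoff = ObjectId.from_datetime(datetime(2011, 12, 1, tzinfo=timezone.utc))

# ... and count documents created after that date.
count = foos.count_documents({"_id": {"$gte": cutoff}})

# Reading the timestamp back out of an id:
doc = foos.find_one()
if doc:
    created_at = doc["_id"].generation_time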