Google diff-match-patch : How to unpatch to get Original String? - google-diff-match-patch

I am using Google diff-match-patch JAVA plugin to create patch between two JSON strings and storing the patch to database.
diff_match_patch dmp = new diff_match_patch();
LinkedList<Patch> diffs = dmp.patch_make(latestString, originalString);
String patch = dmp.patch_toText(diffs); // Store patch to DB
Now is there any way to use this patch to re-create the originalString by passing the latestString?
I google about this and found this very old comment # Google diff-match-patch Wiki saying,
Unpatching can be done by just looping through the diff, swapping
DIFF_INSERT with DIFF_DELETE, then applying the patch.
But i did not find any useful code that demonstrates this. How could i achieve this with my existing code ? Any pointers or code reference would be appreciated.
Edit:
The problem i am facing is, in the front-end i am showing a revisions module that shows all the transactions of a particular fragment (take for example an employee details), like which user has updated what details etc. Now i am recreating the fragment JSON by reverse applying each patch to get the current transaction data and show it as a table (using http://marianoguerra.github.io/json.human.js/). But some JSON data are not valid JSON and I am getting JSON.parse error.

I was looking to do something similar (in C#) and what is working for me with a relatively simple object is the patch_apply method. This use case seems somewhat missing from the documentation, so I'm answering here. Code is C# but the API is cross language:
static void Main(string[] args)
{
var dmp = new diff_match_patch();
string v1 = "My Json Object;
string v2 = "My Mutated Json Object"
var v2ToV1Patch = dmp.patch_make(v2, v1);
var v2ToV1PatchText = dmp.patch_toText(v2ToV1Patch); // Persist text to db
string v3 = "Latest version of JSON object;
var v3ToV2Patch = dmp.patch_make(v3, v2);
var v3ToV2PatchTxt = dmp.patch_toText(v3ToV2Patch); // Persist text to db
// Time to re-hydrate the objects
var altV3ToV2Patch = dmp.patch_fromText(v3ToV2PatchTxt);
var altV2 = dmp.patch_apply(altV3ToV2Patch, v3)[0].ToString(); // .get(0) in Java I think
var altV2ToV1Patch = dmp.patch_fromText(v2ToV1PatchText);
var altV1 = dmp.patch_apply(altV2ToV1Patch, altV2)[0].ToString();
}
I am attempting to retrofit this as an audit log, where previously the entire JSON object was saved. As the audited objects have become more complex the storage requirements have increased dramatically. I haven't yet applied this to the complex large objects, but it is possible to check if the patch was successful by checking the second object in the array returned by the patch_apply method. This is an array of boolean values, all of which should be true if the patch worked correctly. You could write some code to check this, which would help check if the object can be successfully re-hydrated from the JSON rather than just getting a parsing error. My prototype C# method looks like this:
private static bool ValidatePatch(object[] patchResult, out string patchedString)
{
patchedString = patchResult[0] as string;
var successArray = patchResult[1] as bool[];
foreach (var b in successArray)
{
if (!b)
return false;
}
return true;
}

Related

Materialized view to use different Serde

Version used: Kafka 3.1.1, Confluent 7.1.0, Avro 1.11.0
I’m creating a REST controller which is “searching” for AVRO objects in a topic. The objects in the topic are serialized using SpecificAvroSerde<>. Each topic has assigned two AVRO schemas. One for the key (with several fields of various types) and one for the value (multiple fields and types).
I’ve done this several times whereby I’m consuming the topic in a KTable and then materialize it. There is only one pair of serdes involved and the serialized format is the same for both the topic and the materialized view (RocksaltDb). The REST controller then can look up the store and either perform a get with a key or do a range scan between two keys. This all works as expected.
private final static String TOPIC_NAME = "input-topic";
private final static String VIEW_NAME = "materialized-view";
private final SpecificAvroSerde<ProductXrefKey> productXrefKeySerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXref> productXrefSerde = new SpecificAvroSerde<>();
final Map<String, Object> props = this.kafkaProperties.buildStreamsProperties();
productXrefKeySerde.configure(props, true);
productXrefSerde.configure(props, false);
KTable<ProductXrefKey, ProductXref> productXrefTable = builder
.table(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde),
Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
.withKeySerde(productXrefKeySerde)
.withValueSerde(productXrefSerde));
<…>
final ReadOnlyKeyValueStore<ProductXrefKey, ProductXref> store =
streamsBuilderFactoryBean.getKafkaStreams().store(fromNameAndType(VIEW_NAME, keyValueStore()));
try (KeyValueIterator<ProductXrefKey, ProductXref> range = store.range(fromKey, toKey)) {
if (range != null) {
range.forEachRemaining(kv -> {
<…>
});
} else {
log.info("Could not find {} in local ReadOnlyKeyValueStore {}", fromKey, viewName);
}
}
I now want to change this using a prefix scan instead. Since the key contains multiple fields there is no way to only serialize first part (i.e. first few fields) of the key I need a specialized serializer. This also means I have to use a different serializer for the materialized view itself (SpecificAvroSerde puts the magic number and schema ID at the beginning of the byte array) as otherwise the serialized output for the prefix and the key in the materialized view can’t be compared. Hence I created a specialised Serde which serializes the key using the same logic as when used for serializing the prefix but omitting the fields not required for the scan (i.e. omitting the last field). Above code now looks
private final static String TOPIC_NAME = "input-topic";
private final static String VIEW_NAME = "materialized-view";
private final SpecificAvroSerde<ProductXrefKey> productXrefKeySerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXref> productXrefSerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXrefKey> materializedProductXrefKeySerde = new ProductXrefKeySerde();
// for the value part we can still used standard serde as no change in serialization logic needed
private final SpecificAvroSerde<ProductXref> materializedProductXrefSerde = new SpecificAvroSerde<>();
// telling the serializer to cut off last field
private final SpecificAvroSerde<ProductXref> prefixScanProductXrefSerde = new ProductXrefKeySerde(true);
final Map<String, Object> props = this.kafkaProperties.buildStreamsProperties();
productXrefKeySerde.configure(props, true);
productXrefSerde.configure(props, false);
KTable<ProductXrefKey, ProductXref> productXrefTable = builder
.table(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde),
Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
.withKeySerde(materializedProductXrefKeySerde)
.withValueSerde(materializedProductXrefSerde));
<…>
final ReadOnlyKeyValueStore<ProductXrefKey, ProductXref> store =
streamsBuilderFactoryBean.getKafkaStreams().store(fromNameAndType(VIEW_NAME, keyValueStore()));
try (KeyValueIterator<ProductXrefKey, ProductXref> range = store.prefixScan(prefixKey, prefixScanProductXrefSerde)){
if (range != null) {
range.forEachRemaining(kv -> {
<…>
});
} else {
log.info("Could not find {} in local ReadOnlyKeyValueStore {}", prefixKey, viewName);
}
}
My assumption was, that the topic gets deserialized using the SpecificAvroSerde and then gets serialized for the view using my ProductXrefKeySerde. The problem is, that the content in the materialized view is still serialized using the same logic as in the original topic. It appears that the serializer is never used during the topic being processed and stored in the materialized view. I can verify that also on the file system and see that the keys in the RocksaltDb files are serialized with the magic byte and schema ID and hence prefixScan wont be able to fine anything.
How can I change the serialization format for the materialized view?
Or is there a better way for serializing a prefix AVRO object?
It appears that there is some optimization happening which avoids deserialization/serialization if KTable is directly materialized. I've changed the logic such that it consumes it as a KStream and then creates the KTable (toTable(...))
KTable<ProductXrefKey, ProductXref> productXrefStream = builder
.stream(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde))
.toTable(Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
.withKeySerde(productXrefKeySerde)
.withValueSerde(productXrefSerde));
With this small change, data now gets deserialized (using SpecificAvroSerde<>) and serialized again using the provided ProductXrefKeySerde. Now also the prefix scan works and returns the records as expected.

Update Document with external object

i have a database containing Song objects. The song class has > 30 properties.
My Music Tagging application is doing changes on a song on the file system.
It then does a lookup in the database using the filename.
Now i have a Song object, which i created in my Tagging application by reading the physical file and i have a Song object, which i have just retrieved from the database and which i want to update.
I thought i just could grab the ID from the database object, replace the database object with my local song object, set the saved id and store it.
But Raven claims that i am replacing the object with a different object.
Do i really need to copy every single property over, like this?
dbSong.Artist = songfromFilesystem.Artist;
dbSong.Album = songfromFileSystem.Album;
Or are there other possibilities.
thanks,
Helmut
Edit:
I was a bit too positive. The suggestion below works only in a test program.
When doing it in my original code i get following exception:
Attempted to associate a different object with id 'TrackDatas/3452'
This is produced by following code:
try
{
originalFileName = Util.EscapeDatabaseQuery(originalFileName);
// Lookup the track in the database
var dbTracks = _session.Advanced.DocumentQuery<TrackData, DefaultSearchIndex>().WhereEquals("Query", originalFileName).ToList();
if (dbTracks.Count > 0)
{
track.Id = dbTracks[0].Id;
_session.Store(track);
_session.SaveChanges();
}
}
catch (Exception ex)
{
log.Error("UpdateTrack: Error updating track in database {0}: {1}", ex.Message, ex.InnerException);
}
I am first looking up a song in the database and get a TrackData object in dbTracks.
The track object is also of type TrackData and i just put the ID from the object just retrieved and try to store it, which gives the above error.
I would think that the above message tells me that the objects are of different types, which they aren't.
The same error happens, if i use AutoMapper.
any idea?
You can do what you're trying: replace an existing object using just the ID. If it's not working, you might be doing something else wrong. (In which case, please show us your code.)
When it comes to updating existing objects in Raven, there are a few options:
Option 1: Just save the object using the same ID as an existing object:
var song = ... // load it from the file system or whatever
song.Id = "Songs/5"; // Set it to an existing song ID
DbSession.Store(song); // Overwrites the existing song
Option 2: Manually update the properties of the existing object.
var song = ...;
var existingSong = DbSession.Load<Song>("Songs/5");
existingSong.Artist = song.Artist;
existingSong.Album = song.Album;
Option 3: Dynamically update the existing object:
var song = ...;
var existingSong = DbSession.Load<Song>("Songs/5");
existingSong.CopyFrom(song);
Where you've got some code like this:
// Inside Song.cs
public virtual void CopyFrom(Song other)
{
var props = typeof(Song)
.GetProperties(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Instance)
.Where(p => p.CanWrite);
foreach (var prop in props)
{
var source = prop.GetValue(other);
prop.SetValue(this, source);
}
}
If you find yourself having to do this often, use a library like AutoMapper.
Automapper can automatically copy one object to another with a single line of code.
Now that you've posted some code, I see 2 things:
First, is there a reason you're using the Advanced.DocumentQuery syntax?
// This is advanced query syntax. Is there a reason you're using it?
var dbTracks = _session.Advanced.DocumentQuery<TrackData, DefaultSearchIndex>().WhereEquals("Query", originalFileName).ToList();
Here's how I'd write your code using standard LINQ syntax:
var escapedFileName = Util.EscapeDatabaseQuery(originalFileName);
// Find the ID of the existing track in the database.
var existingTrackId = _session.Query<TrackData, DefaultSearchIndex>()
.Where(t => t.Query == escapedFileName)
.Select(t => t.Id);
if (existingTrackId != null)
{
track.Id = existingTrackId;
_session.Store(track);
_session.SaveChanges();
}
Finally, #2: what is track? Was it loaded via session.Load or session.Query? If so, that's not going to work, and it's causing your problem. If track is loaded from the database, you'll need to create a new object and save that:
var escapedFileName = Util.EscapeDatabaseQuery(originalFileName);
// Find the ID of the existing track in the database.
var existingTrackId = _session.Query<TrackData, DefaultSearchIndex>()
.Where(t => t.Query == escapedFileName)
.Select(t => t.Id);
if (existingTrackId != null)
{
var newTrack = new Track(...);
newTrack.Id = existingTrackId;
_session.Store(newTrack);
_session.SaveChanges();
}
This means you already have a different object in the session with the same id. The fix for me was to use a new session.

Sitecore Glass mapper GetItem<TypeName>(guid) always return null

I saw a related question:
Sitecore Glass Mapper always null
But unfortunately it does not give a solution for my case.
Here goes a code snippet:
var db = Factory.GetDatabase("master");
var context = new SitecoreContext();
// the ID of Needed item
var g = new Guid("{F21C04FE-8826-41AB-9F3C-F7BDF5B35C76}");
// just to test if it's possible to fetch item using db.GetItem
var i = db.GetItem(new ID(g), Language.Current, Sitecore.Data.Version.Latest);
// Grab item
var t = context.GetItem<Article>(g);
In the code above:
i is not null
t is null
Article is the simple class like:
[SitecoreType(TemplateId = "{4C4EC1DA-EB77-4001-A7F9-E4C2F61A9BE9}")]
public class Article
{
[SitecoreField(FieldName = "Title")]
public string Title { get; set; }
}
There are only one language installed in Sitecore - en, it has been specified in the web.config in the items as well.
Also I have added GlassMapperSc.Start(); to Application_Start in the Global.asax.cs and added my assembly to the list of included assemblies via var attributes = new AttributeConfigurationLoader(new[] { "Assembly.Name" }); and I succeeded to find my class in the SitecoreContext mappings.
It does not looks like a language issue, as stated in the link provided in the very beginning. And I'm struggling with it already for a pretty long time, but no luck...
Thank You!
I just noticed that you are using master db for the Sitecore DB and SitecoreContext for Glass.
The SitecoreContext class will use the database that is defined by the Sitecore.Context.Database property at runtime. This probably means that it is using the web database.
Can you check that you have published the item to the web database or instead using:
var context = new SitecoreService("master");

RavenDB IsOperationAllowedOnDocument not supported in Embedded Mode

RavenDB throws InvalidOperationException when IsOperationAllowedOnDocument is called using embedded mode.
I can see in the IsOperationAllowedOnDocument implementation a clause checking for calls in embedded mode.
namespace Raven.Client.Authorization
{
public static class AuthorizationClientExtensions
{
public static OperationAllowedResult[] IsOperationAllowedOnDocument(this ISyncAdvancedSessionOperation session, string userId, string operation, params string[] documentIds)
{
var serverClient = session.DatabaseCommands as ServerClient;
if (serverClient == null)
throw new InvalidOperationException("Cannot get whatever operation is allowed on document in embedded mode.");
Is there a workaround for this other than not using embedded mode?
Thanks for your time.
I encountered the same situation while writing some unit tests. The solution James provided worked; however, it resulted in having one code path for the unit test and another path for the production code, which defeated the purpose of the unit test. We were able to create a second document store and connect it to the first document store which allowed us to then access the authorization extension methods successfully. While this solution would probably not be good for production code (because creating Document Stores is expensive) it works nicely for unit tests. Here is a code sample:
using (var documentStore = new EmbeddableDocumentStore
{ RunInMemory = true,
UseEmbeddedHttpServer = true,
Configuration = {Port = EmbeddedModePort} })
{
documentStore.Initialize();
var url = documentStore.Configuration.ServerUrl;
using (var docStoreHttp = new DocumentStore {Url = url})
{
docStoreHttp.Initialize();
using (var session = docStoreHttp.OpenSession())
{
// now you can run code like:
// session.GetAuthorizationFor(),
// session.SetAuthorizationFor(),
// session.Advanced.IsOperationAllowedOnDocument(),
// etc...
}
}
}
There are couple of other items that should be mentioned:
The first document store needs to be run with the UseEmbeddedHttpServer set to true so that the second one can access it.
I created a constant for the Port so it would be used consistently and ensure use of a non reserved port.
I encountered this as well. Looking at the source, there's no way to do that operation as written. Not sure if there's some intrinsic reason why since I could easily replicate the functionality in my app by making a http request directly for the same info:
HttpClient http = new HttpClient();
http.BaseAddress = new Uri("http://localhost:8080");
var url = new StringBuilder("/authorization/IsAllowed/")
.Append(Uri.EscapeUriString(userid))
.Append("?operation=")
.Append(Uri.EscapeUriString(operation)
.Append("&id=").Append(Uri.EscapeUriString(entityid));
http.GetStringAsync(url.ToString()).ContinueWith((response) =>
{
var results = _session.Advanced.DocumentStore.Conventions.CreateSerializer()
.Deserialize<OperationAllowedResult[]>(
new RavenJTokenReader(RavenJToken.Parse(response.Result)));
}).Wait();

Uploading with multipart/form-data using OpenRasta and IMultipartHttpEntity

I'm trying to post some files using OpenRasta. I've gotten as far as getting my handler called, but by all appearances the stream in the entity is empty. Here's my handler:
public OperationResult Post( IEnumerable<IMultipartHttpEntity> entities)
{
var foo = entities.ToList();
foreach (var entity in foo)
{
if (entity.Stream != null && entity.ContentType != null)
{
var memoryStream = new MemoryStream();
entity.Stream.CopyTo(memoryStream);
}
}
return new OperationResult.Created();
}
Each time through the loop memoryStream has a length of 0. What am I doing wrong?
Nothing like posting on StackOverflow to make the answer immediately obvious. Apparently you only get one enumeration of the entities in order to grab the stream. I had added the "foo" variable above to make debugging easier, but it was causing the streaming to fail. As I stored the stream to the database, I had also failed to reset memoryStream to the beginning before writing it. Fixing these two issues got the file to upload correctly.