Graph traversal name to graph name mapping - TinkerPop

Is there any API with which I can get the graphTraversalName-to-graphName mapping defined in the script?
I am using the messy code below, but it's error-prone if both graphs use the same underlying storage.
Map<String, String> graphNameTraversalMap = new ConcurrentHashMap<>();
Map<String, String> graphTraversalToNameMap = new ConcurrentHashMap<>();

Iterator<String> traversalSourceIterator = graphManager.getTraversalSourceNames().iterator();
while (traversalSourceIterator.hasNext()) {
    String traversalSource = traversalSourceIterator.next();
    String currentGraphString =
            ((GraphTraversalSource) graphManager.getAsBindings().get(traversalSource)).getGraph().toString();
    graphNameTraversalMap.put(currentGraphString, traversalSource);
}

Iterator<String> graphNamesIterator = graphManager.getGraphNames().iterator();
while (graphNamesIterator.hasNext()) {
    String graphName = graphNamesIterator.next();
    String currentGraphString = graphManager.getGraph(graphName).toString();
    String traversalSource = graphNameTraversalMap.get(currentGraphString);
    graphTraversalToNameMap.put(traversalSource, graphName);
}
Does gremlinExecutor.getScriptEngineManager().getBindings().entrySet() provide an ordering guarantee? If so, I could iterate over it and populate my map.

Is there any API with which I can get the graphTraversalName-to-graphName mapping defined in the script?
No. They share the same namespace in Gremlin Server, so the relationship is lost programmatically. You would need to do something like what you are doing, but I wouldn't rely on the toString() of a Graph for equality. Perhaps use the Graph instance itself? Although that might not work either, depending on your situation and what you want equality to mean: you could have two different Graph configurations pointed at the same data and want to resolve those as the same graph. I'm also not sure that any approach will work generally for all graph systems. Anyway, I'd experiment with using a Map<Graph, String> graphTraversalToNameMap for your case and see how that goes.
Does gremlinExecutor.getScriptEngineManager().getBindings().entrySet() provide an ordering guarantee?
No, as it is backed by a ConcurrentHashMap. You would have to impose your own ordering.
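A minimal sketch of the Map<Graph, String> idea above, keyed on the Graph instances the GraphManager already holds (whether instance identity is the right notion of equality depends on your graph provider, as discussed above):

Map<Graph, String> graphToTraversalSourceMap = new HashMap<>();
for (String traversalSourceName : graphManager.getTraversalSourceNames()) {
    Graph graph = graphManager.getTraversalSource(traversalSourceName).getGraph();
    graphToTraversalSourceMap.put(graph, traversalSourceName);
}

Map<String, String> traversalSourceToGraphNameMap = new HashMap<>();
for (String graphName : graphManager.getGraphNames()) {
    String traversalSourceName = graphToTraversalSourceMap.get(graphManager.getGraph(graphName));
    if (traversalSourceName != null) {
        traversalSourceToGraphNameMap.put(traversalSourceName, graphName);
    }
}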

Underlying storage details can be obtained from the graph's configuration object and used for the mapping. Sample code:
public class GraphTraversalMappingUtil {

    // StorageBackendKey wraps the storage-specific configuration properties and
    // implements equals()/hashCode() over them (see the pastebin link below).
    private static final Map<String, String> graphTraversalToNameMap = new ConcurrentHashMap<>();

    public static void populateGraphTraversalToNameMapping(GraphManager graphManager) {
        if (!graphTraversalToNameMap.isEmpty()) {
            return;
        }

        Map<StorageBackendKey, String> storageKeyToTraversalMap = new HashMap<>();
        Iterator<String> traversalSourceIterator = graphManager.getTraversalSourceNames().iterator();
        while (traversalSourceIterator.hasNext()) {
            String traversalSource = traversalSourceIterator.next();
            StorageBackendKey key = new StorageBackendKey(
                    graphManager.getTraversalSource(traversalSource).getGraph().configuration());
            storageKeyToTraversalMap.put(key, traversalSource);
        }

        Iterator<String> graphNamesIterator = graphManager.getGraphNames().iterator();
        while (graphNamesIterator.hasNext()) {
            String graphName = graphNamesIterator.next();
            StorageBackendKey key = new StorageBackendKey(
                    graphManager.getGraph(graphName).configuration());
            graphTraversalToNameMap.put(storageKeyToTraversalMap.get(key), graphName);
        }
    }
}
For full code, refer: https://pastebin.com/7m8hi53p
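For reference, a StorageBackendKey along these lines might look like the sketch below. This is only an illustrative guess (the actual class is in the pastebin above); the chosen property names are hypothetical and should be whatever uniquely identifies a storage backend for your graph provider:

import java.util.Objects;
import org.apache.commons.configuration2.Configuration; // commons-configuration 1.x on older TinkerPop versions

public class StorageBackendKey {
    // Hypothetical identifying properties; adjust to your provider's configuration keys.
    private final String backend;
    private final String hostname;
    private final String keyspace;

    public StorageBackendKey(Configuration configuration) {
        this.backend = configuration.getString("storage.backend");
        this.hostname = configuration.getString("storage.hostname");
        this.keyspace = configuration.getString("storage.keyspace");
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StorageBackendKey)) return false;
        StorageBackendKey other = (StorageBackendKey) o;
        return Objects.equals(backend, other.backend)
                && Objects.equals(hostname, other.hostname)
                && Objects.equals(keyspace, other.keyspace);
    }

    @Override
    public int hashCode() {
        return Objects.hash(backend, hostname, keyspace);
    }
}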

Materialized view to use different Serde

Version used: Kafka 3.1.1, Confluent 7.1.0, Avro 1.11.0
I'm creating a REST controller which is "searching" for Avro objects in a topic. The objects in the topic are serialized using SpecificAvroSerde<>. Each topic has two Avro schemas assigned: one for the key (with several fields of various types) and one for the value (multiple fields and types).
I've done this several times, consuming the topic into a KTable and then materializing it. There is only one pair of serdes involved, and the serialized format is the same for both the topic and the materialized view (RocksDB). The REST controller can then look up the store and either perform a get with a key or do a range scan between two keys. This all works as expected.
private final static String TOPIC_NAME = "input-topic";
private final static String VIEW_NAME = "materialized-view";

private final SpecificAvroSerde<ProductXrefKey> productXrefKeySerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXref> productXrefSerde = new SpecificAvroSerde<>();

final Map<String, Object> props = this.kafkaProperties.buildStreamsProperties();
productXrefKeySerde.configure(props, true);
productXrefSerde.configure(props, false);

KTable<ProductXrefKey, ProductXref> productXrefTable = builder
        .table(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde),
                Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
                        .withKeySerde(productXrefKeySerde)
                        .withValueSerde(productXrefSerde));

<…>

final ReadOnlyKeyValueStore<ProductXrefKey, ProductXref> store =
        streamsBuilderFactoryBean.getKafkaStreams().store(fromNameAndType(VIEW_NAME, keyValueStore()));

try (KeyValueIterator<ProductXrefKey, ProductXref> range = store.range(fromKey, toKey)) {
    if (range != null) {
        range.forEachRemaining(kv -> {
            <…>
        });
    } else {
        log.info("Could not find {} in local ReadOnlyKeyValueStore {}", fromKey, viewName);
    }
}
I now want to change this to use a prefix scan instead. Since the key contains multiple fields and there is no way to serialize only the first part (i.e. the first few fields) of the key, I need a specialized serializer. This also means I have to use a different serializer for the materialized view itself (SpecificAvroSerde puts the magic byte and schema ID at the beginning of the byte array), as otherwise the serialized output for the prefix and the key in the materialized view can't be compared. Hence I created a specialized Serde which serializes the key with the same logic as used for serializing the prefix, but omitting the fields not required for the scan (i.e. the last field). The code above now looks like this:
private final static String TOPIC_NAME = "input-topic";
private final static String VIEW_NAME = "materialized-view";

private final SpecificAvroSerde<ProductXrefKey> productXrefKeySerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXref> productXrefSerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXrefKey> materializedProductXrefKeySerde = new ProductXrefKeySerde();
// for the value part we can still use the standard serde as no change in serialization logic is needed
private final SpecificAvroSerde<ProductXref> materializedProductXrefSerde = new SpecificAvroSerde<>();
// telling the serializer to cut off the last field
private final SpecificAvroSerde<ProductXrefKey> prefixScanProductXrefSerde = new ProductXrefKeySerde(true);

final Map<String, Object> props = this.kafkaProperties.buildStreamsProperties();
productXrefKeySerde.configure(props, true);
productXrefSerde.configure(props, false);

KTable<ProductXrefKey, ProductXref> productXrefTable = builder
        .table(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde),
                Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
                        .withKeySerde(materializedProductXrefKeySerde)
                        .withValueSerde(materializedProductXrefSerde));

<…>

final ReadOnlyKeyValueStore<ProductXrefKey, ProductXref> store =
        streamsBuilderFactoryBean.getKafkaStreams().store(fromNameAndType(VIEW_NAME, keyValueStore()));

try (KeyValueIterator<ProductXrefKey, ProductXref> range = store.prefixScan(prefixKey, prefixScanProductXrefSerde)) {
    if (range != null) {
        range.forEachRemaining(kv -> {
            <…>
        });
    } else {
        log.info("Could not find {} in local ReadOnlyKeyValueStore {}", prefixKey, viewName);
    }
}
My assumption was that the topic gets deserialized using the SpecificAvroSerde and then gets serialized for the view using my ProductXrefKeySerde. The problem is that the content in the materialized view is still serialized using the same logic as in the original topic. It appears that the serializer is never invoked while the topic is processed and stored in the materialized view. I can also verify that on the file system: the keys in the RocksDB files are serialized with the magic byte and schema ID, and hence prefixScan won't be able to find anything.
How can I change the serialization format for the materialized view?
Or is there a better way of serializing an Avro object to use as a prefix?
It appears that there is an optimization which avoids the deserialization/serialization round trip if the KTable is materialized directly from the topic. I've changed the logic so that it consumes the topic as a KStream and then creates the KTable from it (toTable(...)):
KTable<ProductXrefKey, ProductXref> productXrefTable = builder
        .stream(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde))
        .toTable(Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
                .withKeySerde(materializedProductXrefKeySerde)
                .withValueSerde(materializedProductXrefSerde));
With this small change, the data now gets deserialized (using SpecificAvroSerde<>) and serialized again using the provided ProductXrefKeySerde. The prefix scan now works and returns the records as expected.
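For completeness, a serializer along the lines described in the question could look roughly like the sketch below. This is only an assumption-laden illustration: it implements a plain Kafka Serializer<ProductXrefKey> (rather than extending SpecificAvroSerde), writes raw Avro binary so no magic byte or schema ID is included, and the accessors getProductId()/getXrefType() are made-up field names to be replaced with the real ProductXrefKey fields:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

// Writes the key fields as plain Avro binary (no Confluent magic byte / schema ID),
// optionally dropping the last field so the same bytes can be used as a scan prefix.
public class ProductXrefKeyPrefixSerializer implements Serializer<ProductXrefKey> {

    private final boolean prefixOnly;

    public ProductXrefKeyPrefixSerializer(boolean prefixOnly) {
        this.prefixOnly = prefixOnly;
    }

    @Override
    public byte[] serialize(String topic, ProductXrefKey key) {
        if (key == null) {
            return null;
        }
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            encoder.writeString(key.getProductId().toString());    // hypothetical first field
            if (!prefixOnly) {
                encoder.writeString(key.getXrefType().toString()); // hypothetical last field, omitted for prefix scans
            }
            encoder.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new SerializationException("Failed to serialize ProductXrefKey", e);
        }
    }
}

The materialized view would additionally need a matching Deserializer (and a Serde wrapping both) so the store can read keys back into ProductXrefKey objects.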

Use Java8 Stream on JDBCTemplate Results from HIVE

I am using JdbcTemplate to query Hive and then write the results to a .csv file. I basically just generate a list of objects and then stream the list to write each record to the file.
I would like to stream the results as they come back from Hive and write them to the file, instead of waiting for the whole result set and then processing it. Can anyone point me in the right direction? Thanks!
private List<Avs> queryAvsData(String asSql) {
    List<Avs> llistAvs = new ArrayList<Avs>();
    List<Map<String, Object>> rows = hiveJdbcTemplate.queryForList(asSql);
    Iterator<Map<String, Object>> it = rows.iterator();
    while (it.hasNext()) {
        Map<String, Object> row = it.next();
        Avs laAvs = Avs.builder()
                .make((String) row.get("make"))
                .model((String) row.get("model"))
                .build();
        llistAvs.add(laAvs);
    }
    return llistAvs;
}
It doesn't look like there's a built-in solution, but you can do it. Basically, you wrap the existing functionality in an iterator, and use a spliterator to turn it into a stream. Here's a blog post on the subject:
The code implements Spring’s ResultSetExtractor interface, which is a Single Abstract Method (SAM) interface, allowing the use of a lambda expression to implement it.
The implementation wraps the SQL ResultSet in an iterator, constructs a stream using the Spliterators and StreamSupport utility classes, and applies that to a Function taking a stream of row sets and returning a generic result.
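A rough sketch of that approach applied to the code from the question is shown below; writeCsvLine is a hypothetical CSV helper, and note that the stream must be fully consumed inside the extractor, while the connection is still open:

import java.io.Writer;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
import org.springframework.jdbc.core.ResultSetExtractor;

private void streamAvsToCsv(String asSql, Writer csvWriter) {
    hiveJdbcTemplate.query(asSql, (ResultSetExtractor<Void>) rs -> {
        // Wrap the ResultSet in an iterator that advances lazily, one row at a time.
        Iterator<Avs> it = new Iterator<Avs>() {
            private boolean advanced;
            private boolean hasRow;

            @Override
            public boolean hasNext() {
                if (!advanced) {
                    try {
                        hasRow = rs.next();
                    } catch (SQLException e) {
                        throw new RuntimeException(e);
                    }
                    advanced = true;
                }
                return hasRow;
            }

            @Override
            public Avs next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                advanced = false;
                try {
                    return Avs.builder()
                            .make(rs.getString("make"))
                            .model(rs.getString("model"))
                            .build();
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
        };

        // Turn the iterator into a stream and write each record as it arrives.
        Stream<Avs> avsStream = StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
        avsStream.forEach(avs -> writeCsvLine(csvWriter, avs)); // hypothetical CSV helper
        return null;
    });
}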
It's possible to stream values from JdbcTemplate. The following example is a service based on Spring Boot 2.4.8.
As I ran into problems (a connection leak) using queryForStream, I'll put demo code here as a reminder that the stream must be closed after use.
import lombok.RequiredArgsConstructor;
import org.springframework.jdbc.core.SingleColumnRowMapper;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.stereotype.Service;

import java.util.Map;
import java.util.stream.Stream;

@Service
@RequiredArgsConstructor
public class DataCleaningService {

    private final NamedParameterJdbcTemplate jdbcTemplate;

    public void doSomeStreaming() {
        String nativeQuery = "SELECT string_value FROM my_table WHERE column = :valueToFilter";
        Map<String, Object> queryParameters = Map.of("valueToFilter", "my value");
        SingleColumnRowMapper<String> stringRowMapper = SingleColumnRowMapper.newInstance(String.class);
        try (Stream<String> stringValueStream = jdbcTemplate.queryForStream(nativeQuery, queryParameters, stringRowMapper)) {
            stringValueStream.forEach(stringValue -> {
                // do the needed action with the value
                // ...
                System.out.printf("My cool value: %s", stringValue);
            });
        }
    }
}

Ignite CacheJdbcPojoStoreFactory using Enum fields

I am using the CacheJdbcPojoStoreFactory.
I want to have a VARCHAR field in the database which maps to an Enum in Java.
The way I am trying to achieve this is something like the following. I want the application code to work with the enum, but the persistence to use the string so that it is human readable in the database. I do not want to use int values in the database.
This seems to work fine for creating new objects, but not for reading them out. It seems that it tries to set the field directly, and the setter (setSideAsString) is not called. Of course there is no field called sideAsString. Should this work? Any suggestions?
Here is the code excerpt
In some application code I would do something like
trade.setSide(OrderSide.Buy)
And this will persist fine. I can read "Buy" in the side column as a VARCHAR.
In Trade
private OrderSide side; // OrderSide is an enum with Buy, Sell

public OrderSide getSide() {
    return side;
}

public void setSide(OrderSide side) {
    this.side = side;
}

public String getSideAsString() {
    return this.side.name();
}

public void setSideAsString(String s) {
    this.side = OrderSide.valueOf(s);
}
Now when configuring the store, I do this
Collection<JdbcTypeField> vals = new ArrayList<>();
vals.add(new JdbcTypeField(Types.VARCHAR, "side", String.class, "sideAsString"));
After a clean start, if I query Trade using an Ignite SQL query and call trade.getSide(), it is null. Other (directly mapped) columns are fine.
Thanks,
Gordon
BinaryMarshaller deserializes only the fields which are used in the query.
Please try to use OptimizedMarshaller:
IgniteConfiguration cfg = new IgniteConfiguration();
...
cfg.setMarshaller(new OptimizedMarshaller());
Here's the ticket for supporting enum mapping in CacheJdbcPojoStore.
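For context, here is a sketch of how that marshaller switch could sit next to the JdbcTypeField mapping from the question. The cache name, table name, and key column are hypothetical placeholders; only the "side"/"sideAsString" mapping is taken from the question:

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setMarshaller(new OptimizedMarshaller());

// Describe how Trade maps to its table, including the enum-as-string field.
JdbcType tradeType = new JdbcType();
tradeType.setCacheName("tradeCache");        // hypothetical cache name
tradeType.setDatabaseTable("TRADE");         // hypothetical table name
tradeType.setKeyType(Long.class);
tradeType.setValueType(Trade.class);
tradeType.setKeyFields(new JdbcTypeField(Types.BIGINT, "id", Long.class, "id"));
tradeType.setValueFields(new JdbcTypeField(Types.VARCHAR, "side", String.class, "sideAsString"));

CacheJdbcPojoStoreFactory<Long, Trade> storeFactory = new CacheJdbcPojoStoreFactory<>();
storeFactory.setTypes(tradeType);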

Google diff-match-patch : How to unpatch to get Original String?

I am using the Google diff-match-patch Java library to create a patch between two JSON strings and store the patch in a database.
diff_match_patch dmp = new diff_match_patch();
LinkedList<Patch> diffs = dmp.patch_make(latestString, originalString);
String patch = dmp.patch_toText(diffs); // Store patch to DB
Now is there any way to use this patch to re-create the originalString by passing the latestString?
I googled this and found a very old comment on the Google diff-match-patch wiki saying:
Unpatching can be done by just looping through the diff, swapping DIFF_INSERT with DIFF_DELETE, then applying the patch.
But I did not find any useful code that demonstrates this. How could I achieve this with my existing code? Any pointers or code references would be appreciated.
Edit:
The problem I am facing is that in the front end I am showing a revisions module that lists all the transactions of a particular fragment (for example, an employee's details), like which user updated which details, etc. I am recreating the fragment JSON by reverse-applying each patch to get the data for each transaction and showing it as a table (using http://marianoguerra.github.io/json.human.js/). But some of the JSON data is not valid JSON and I am getting a JSON.parse error.
I was looking to do something similar (in C#), and what is working for me with a relatively simple object is the patch_apply method. This use case seems to be somewhat missing from the documentation, so I'm answering here. The code is C#, but the API is cross-language:
static void Main(string[] args)
{
    var dmp = new diff_match_patch();

    string v1 = "My Json Object";
    string v2 = "My Mutated Json Object";
    var v2ToV1Patch = dmp.patch_make(v2, v1);
    var v2ToV1PatchText = dmp.patch_toText(v2ToV1Patch); // Persist text to db

    string v3 = "Latest version of JSON object";
    var v3ToV2Patch = dmp.patch_make(v3, v2);
    var v3ToV2PatchTxt = dmp.patch_toText(v3ToV2Patch); // Persist text to db

    // Time to re-hydrate the objects
    var altV3ToV2Patch = dmp.patch_fromText(v3ToV2PatchTxt);
    var altV2 = dmp.patch_apply(altV3ToV2Patch, v3)[0].ToString(); // .get(0) in Java I think
    var altV2ToV1Patch = dmp.patch_fromText(v2ToV1PatchText);
    var altV1 = dmp.patch_apply(altV2ToV1Patch, altV2)[0].ToString();
}
I am attempting to retrofit this as an audit log, where previously the entire JSON object was saved. As the audited objects have become more complex, the storage requirements have increased dramatically. I haven't yet applied this to the large, complex objects, but it is possible to check whether the patch was successful by inspecting the second object in the array returned by the patch_apply method. This is an array of boolean values, all of which should be true if the patch worked correctly. You could write some code to check this, which would help verify that the object can be successfully re-hydrated from the JSON rather than just getting a parsing error. My prototype C# method looks like this:
private static bool ValidatePatch(object[] patchResult, out string patchedString)
{
    patchedString = patchResult[0] as string;
    var successArray = patchResult[1] as bool[];
    foreach (var b in successArray)
    {
        if (!b)
            return false;
    }
    return true;
}
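Since the question is about the Java library, a rough Java equivalent of the re-hydration step, using the variables from the question, might look like this (the patch was created as patch_make(latestString, originalString), so applying it to latestString should give back originalString; collection types may differ slightly between library versions):

diff_match_patch dmp = new diff_match_patch();

// patchText is the text previously stored in the DB via patch_toText()
LinkedList<diff_match_patch.Patch> patches = new LinkedList<>(dmp.patch_fromText(patchText));

Object[] result = dmp.patch_apply(patches, latestString);
String recoveredOriginal = (String) result[0];
boolean[] applied = (boolean[]) result[1]; // each entry should be true if the corresponding hunk applied cleanly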

Does this saving/loading pattern have a name?

There's a variable persistence concept I have integrated multiple times:
// Standard initialization
boolean save = true;
Map<String, Object> dataHolder;

// Variables to persist
int number = 10;
String text = "I'm saved";

// Use the variables in various ways in the project
void useVariables() { ... number ... text ... }

// Function to save the variables into a data structure and, for example, write them to a file
public Map<String, Object> getVariables()
{
    Map<String, Object> data = new LinkedHashMap<String, Object>();
    persist(data);
    return data;
}

// Function to load the variables from the data structure
public void setVariables(Map<String, Object> data)
{
    persist(data);
}

void persist(Map<String, Object> data)
{
    // If the given data structure is empty, it means data should be saved
    save = data.isEmpty();
    dataHolder = data;

    number = (Integer) handleVariable("theNumber", number);
    text = (String) handleVariable("theText", text);
    ...
}

private Object handleVariable(String name, Object value)
{
    // If currently saving
    if (save)
        dataHolder.put(name, value);    // Just add to the data structure
    else                                // If currently loading
        return dataHolder.get(name);    // Read and return from the data structure
    return value;                       // Return the given variable (no change)
}
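For illustration, usage of the two entry points might look like this (file I/O omitted, as it isn't part of the pattern itself):

// Saving: persist() runs with an empty map, so every handleVariable() call
// writes the current value into the map.
Map<String, Object> snapshot = getVariables();
// ... write 'snapshot' to a file, send it over the network, etc. ...

// Loading: persist() runs with a populated map, so every handleVariable() call
// returns the stored value, which is assigned back to the variable.
setVariables(snapshot);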
The main benefit of this principle is that there is only a single function where you have to mention new variables you add during development, and it's one simple line per variable.
Of course you can move the handleVariable() function to a different class which also contains the "save" and "dataHolder" variables, so they won't clutter the main application.
Additionally, you could attach meta-information needed for persisting the data structure to a file (or similar) by storing, for each variable, a small wrapper class containing this information plus the value, instead of the object itself.
Performance could be improved by recording the order of the variables (in another data structure during the first run through persist()) and using an array-based "dataHolder" instead of a lookup-based map (i.e. an index instead of a name string).
However, this is the first time I have to document it, and so I wondered whether this function-reuse principle has a name.
Does anyone recognize this idea?
Thank you very much!