Hazelcast, Kryo, JsonNode Serializer - jackson

I am implementing a Hazelcast application using a distributed map with JsonNode as the entry value. My JsonNodeSerializer looks like this:
private final ObjectReader jsonNodeReader;
private final ObjectWriter jsonNodeWriter;
@Override
public void write(ObjectDataOutput out, JsonNode jsonNode)
throws IOException {
out.write(jsonNodeWriter.writeValueAsBytes(jsonNode));
}
@Override
public JsonNode read(ObjectDataInput in)
throws IOException {
return jsonNodeReader.readTree(in);
}
However, I would like to use Kryo instead of the Jackson ObjectReader/ObjectWriter to save some space and improve performance.
I tried Kryo, but I am not able to read a JsonNode/ObjectNode because these classes have no no-args constructor.
@Override
public void write(ObjectDataOutput out, JsonNode jsonNode)
throws IOException {
Kryo kryo = KRYO_THREAD_LOCAL.get();
Output output = new Output((OutputStream) out);
kryo.writeObject(output, jsonNode);
output.flush();
//out.write(jsonNodeWriter.writeValueAsBytes(jsonNode));
}
@Trace(dispatcher = true)
@Override
public JsonNode read(ObjectDataInput in)
throws IOException {
InputStream inputStream = (InputStream) in;
Input input = new Input(inputStream);
Kryo kryo = KRYO_THREAD_LOCAL.get();
return kryo.readObject(input, ObjectNode.class);
// return jsonNodeReader.readTree(in);
}
I am not sure whether my approach with the ObjectReader/ObjectWriter is optimal, or whether using Kryo would make my solution better.
My goal is to save space and improve performance.
Any suggestions to point me in the right direction are welcome.
Thanks

I am not sure whether Kryo is actually able to write those JSON nodes. I think there are multiple possible options:
You stay with Kryo, but that means you have to read and write the node contents as separate values; then you can recreate the JsonNode instances using constructor parameters.
If you are going to write independent values anyway, you might want to write them directly into the ObjectDataOutput and read them back using the ObjectDataInput.
From my point of view, though, the best way is to use Jackson. You might want to have a look at the CBOR data format, which is binary, very concise and directly available for Jackson. In addition, you won't lose the schemaless, dynamic nature of JSON (https://github.com/FasterXML/jackson-dataformats-binary/tree/master/cbor).
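To make the second option a bit more concrete, here is a minimal sketch that writes a flat object as independent values. It only relies on java.io.DataOutput/DataInput, which Hazelcast's ObjectDataOutput/ObjectDataInput extend. FlatObjectCodec and its methods are hypothetical names, and a real JsonNode tree would additionally need type tags and recursion for nested nodes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hazelcast or Jackson.
class FlatObjectCodec {
    // Writes the field count, then each name/value pair as an independent value.
    static void write(DataOutput out, Map<String, String> fields) throws IOException {
        out.writeInt(fields.size());
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeUTF(entry.getValue());
        }
    }

    // Reads the pairs back in the order they were written.
    static Map<String, String> read(DataInput in) throws IOException {
        int size = in.readInt();
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            fields.put(key, in.readUTF());
        }
        return fields;
    }

    // Round-trips through an in-memory buffer, standing in for Hazelcast's streams.
    static Map<String, String> roundTrip(Map<String, String> fields) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer), fields);
            return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside a Hazelcast serializer, the same write/read methods could be called with the ObjectDataOutput/ObjectDataInput arguments directly.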

In addition to the good points and suggestions by @noctarius, there is another binary JSON alternative besides CBOR, called Smile. It can be found in the same binary dataformats module:
https://github.com/FasterXML/jackson-dataformats-binary
In your case I do not think using Kryo makes sense if and when you are dealing with a JSON tree (or tree models in general): Kryo works best with POJOs, where it can take full advantage of exact knowledge of the structure. Tree models require the field names to be included, which eliminates the size benefit that formats like Kryo, Avro, Protobuf and Thrift otherwise have.
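A rough way to see the effect of field names on size, using only java.io (this is an illustration, not how Kryo or Jackson encode data; EncodingSizeDemo is a hypothetical name): the same two values are written once positionally, as a schema-based format could, and once with their names, as a tree model has to.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class EncodingSizeDemo {
    // Schema-based style: both sides know the field order, so only values are written.
    static int positionalSize(int id, String name) {
        return encode(false, id, name);
    }

    // Tree-model style: every field carries its name.
    static int namedSize(int id, String name) {
        return encode(true, id, name);
    }

    private static int encode(boolean withNames, int id, String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            if (withNames) out.writeUTF("id");
            out.writeInt(id);
            if (withNames) out.writeUTF("name");
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }
}
```

For id 7 and name "Ann", the positional form takes 9 bytes and the named form 19 bytes, so for small entries the names dominate.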


How to implement ClassFileTransformer#transform with byte buddy?

Is there a way to use byte buddy to implement ClassFileTransformer#transform?
At the moment my implementation uses javassist, but I want to replace it with byte buddy as it has better generics support.
So far my implementation looks like this:
public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
ProtectionDomain protectionDomain, byte[] classfileBuffer)
{
if (className.startsWith("my.package."))
{
try {
final CtClass ctClass = classPool.makeClass(new ByteArrayInputStream(classfileBuffer));
try {
/* class manipulation */
return ctClass.toBytecode();
} finally {
// always remove the class from the class pool afterwards
// (a statement placed after the return would be unreachable)
ctClass.detach();
}
} catch(final Exception ex) {
logger.error("failed to analyse/transform class {}", className, ex);
}
}
return classfileBuffer;
}
Is something similar possible with byte buddy? Are there ways to feed byte buddy with the byte code provided in parameter classfileBuffer?
The ClassFileTransformer implementation is configured into the Spring Load Time Weaver. So I already have the "infrastructure" available. Therefore I would rather not install another byte buddy agent to solve this problem.
Yes, look into AgentBuilder.Default. It offers a DSL for implementing Java agents. You do not need to implement your own class file transformer using it, just specify the transformations you want to make.

How to provide a custom MessageBodyWriter for Strings in CXF

I have registered a custom MessageBodyWriter<Object> implementation in my JAX-RS application. This writer can convert various types, including strings.
The custom converter is successfully used for other types, but for strings, CXF does not consider it: It does not even call isWriteable. (This was different in CXF 2.x, so there seems to have been a regression in CXF 3.x.)
Stepping through the CXF 3.1.11 code, I see that the ProviderFactory.messageWriters list has two entries (StringTextProvider, JAXBElementTypedProvider) before my custom provider. The first one wants to convert strings and, being first in the list, is preferred by CXF.
How can I change this to make my provider the preferred provider for strings? E.g. is it possible to drop the StringTextProvider? Or is it possible to reorder the list so that my provider comes first?
I found out that subclassing StringTextProvider and registering that class works:
@Provider
@Produces(MediaType.APPLICATION_JSON)
public class CustomeStringProvider extends StringTextProvider {
@Override
public void writeTo(String object, Class<?> type, Type genType, Annotation[] annotations, MediaType mediaType,
MultivaluedMap<String, Object> httpHeaders, OutputStream outputStream) throws IOException {
// ...
}
}
I got the idea for this approach from looking at the implementation of ProviderFactory.MessageBodyWriterComparator, which checks class hierarchies for ordering converters.
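To illustrate why subclassing helps, here is a toy comparator that prefers the more specific class. This is only a sketch of the idea of hierarchy-based ordering, not CXF's actual MessageBodyWriterComparator code, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ProviderOrderingSketch {
    static class BaseStringWriter {}
    static class CustomStringWriter extends BaseStringWriter {}

    // Subclasses sort before their superclasses; unrelated classes keep their order.
    static final Comparator<Object> MOST_SPECIFIC_FIRST = (a, b) -> {
        Class<?> ca = a.getClass();
        Class<?> cb = b.getClass();
        if (ca == cb) return 0;
        if (cb.isAssignableFrom(ca)) return -1; // a is a subclass of b
        if (ca.isAssignableFrom(cb)) return 1;  // b is a subclass of a
        return 0;
    };

    // Returns the provider that would be consulted first after sorting.
    static Object preferred(List<Object> providers) {
        List<Object> sorted = new ArrayList<>(providers);
        sorted.sort(MOST_SPECIFIC_FIRST);
        return sorted.get(0);
    }
}
```

With such an ordering, a subclass of the built-in provider ends up ahead of it even if it was registered later.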

How to deal with hard to express requirements for dependencies?

When doing IoC, I (think that I) understand its use for getting the desired application level functionality by composing the right parts, and the benefits for testability. But at the microlevel, I don't quite understand how to make sure that an object gets dependencies injected that it can actually work with. My example for this is a BackupMaker for a database.
To make a backup, the database needs to be exported in a specific format, compressed using a specific compression algorithm, and then packed together with some metadata to form the final binary. Doing all of these tasks seems to be far from a single responsibility, so I ended up with two collaborators: a DatabaseExporter and a Compressor.
The BackupMaker doesn't really care how the database is exported (e.g. using IPC to a utility that comes with the database software, or by doing the right API calls) but it does care a lot about the result, i.e. it needs to be a this-kind-of-database backup in the first place, in the transportable (version agnostic) format, either of which I don't really know how to wrap in a contract. Neither does it care if the compressor does the compression in memory or on disk, but it has to be BZip2.
If I give the BackupMaker the wrong kinds of exporter or compressor, it will still produce a result, but it will be corrupt - it'll look like a backup, but it won't have the format that it should have. It feels like no other part of the system can be trusted to give it those collaborators, because the BackupMaker won't be able to guarantee to do the right thing itself; its job (from my perspective) is to produce a valid backup and it won't if the circumstances ain't right, and worse, it won't know about it. At the same time, even when writing this, it seems to me that I'm saying something stupid now, because the whole point of single responsibilities is that every piece should do its job and not worry about the jobs of others. If it were that simple though, there would be no need for contracts - J.B. Rainsberger just taught me there is. (FYI, I sent him this question directly, but I haven't got a reply yet and more opinions on the matter would be great.)
Intuitively, my favorite option would be to make it impossible to combine classes/objects in an invalid way, but I don't see how to do that. Should I write horrendously specific interface names, like IDatabaseExportInSuchAndSuchFormatProducer and ICompressorUsingAlgorithmXAndParametersY and assume that no classes implement these if they don't behave as such, and then call it a day since nothing can be done about outright lying code? Should I go as far as the mundane task of dissecting the binary format of my database's exports and compression algorithms to have contract tests to verify not only syntax but behavior as well, and then be sure (but how?) to use only tested classes? Or can I somehow redistribute the responsibilities to make this issue go away? Should there be another class whose responsibility it is to compose the right lower level elements? Or am I even decomposing too much?
Rewording
I notice that much attention is given to this very particular example. My question is more general than that, however. Therefore, for the final day of the bounty, I will try to summarize it as follows.
When using dependency injection, by definition, an object depends on other objects for what it needs. In many book examples, the way to indicate compatibility - the capability to provide that need - is by using the type system (e.g. implementing an interface). Beyond that, and especially in dynamic languages, contract tests are used. The compiler (if present) checks the syntax, and the contract tests (that the programmer needs to remember about) verify the semantics. So far, so good. However, sometimes the semantics are still too simple to ensure that some class/object is usable as a dependency to another, or too complicated to be described properly in a contract.
In my example, my class with a dependency on a database exporter considers anything that implements IDatabaseExportInSuchAndSuchFormatProducer and returns bytes as valid (since I don't know how to verify the format). Is very specific naming and such a very rough contract the way to go or can I do better than that? Should I turn the contract test into an integration test? Perhaps (integration) test the composition of all three? I'm not really trying to be generic but am trying to keep responsibilities separate and maintain testability.
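For reference, a behavioral contract test of the kind discussed above could look roughly like the following Java sketch. It uses GZIP as a stand-in because the JDK ships no BZip2 implementation, and the names (CompressorContractSketch, honoursGzipContract) are hypothetical. The check goes beyond the type system: it verifies the format's magic number and that the output decompresses to the input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressorContractSketch {
    interface Compressor {
        byte[] compress(byte[] data);
    }

    // Example implementation under test.
    static class GzipCompressor implements Compressor {
        public byte[] compress(byte[] data) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
                gzip.write(data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buffer.toByteArray();
        }
    }

    // True only if the output starts with the GZIP magic number (0x1f 0x8b)
    // and decompresses back to the original input.
    static boolean honoursGzipContract(Compressor compressor) {
        byte[] input = "sample export".getBytes(StandardCharsets.UTF_8);
        byte[] output = compressor.compress(input);
        if (output.length < 2 || (output[0] & 0xff) != 0x1f || (output[1] & 0xff) != 0x8b) {
            return false;
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(output))) {
            return Arrays.equals(input, in.readAllBytes());
        } catch (IOException e) {
            return false;
        }
    }
}
```

A pass-through "compressor" that returns its input unchanged satisfies the interface but fails this check, which is exactly the kind of mismatch the type system cannot catch.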
What you have discovered in your question is that you have 2 classes that have an implicit dependency on one another. So, the most practical solution is to make the dependency explicit.
There are a number of ways you could do this.
Option 1
The simplest option is to make one service depend on the other, and make the dependent service explicit in its abstraction.
Pros
Few types to implement and maintain.
The compression service could be skipped for a particular implementation just by leaving it out of the constructor.
The DI container is in charge of lifetime management.
Cons
May force an unnatural dependency into a type where it is not really needed.
public class MySqlExporter : IExporter
{
private readonly IBZip2Compressor compressor;
public MySqlExporter(IBZip2Compressor compressor)
{
this.compressor = compressor;
}
public void Export(byte[] data)
{
byte[] compressedData = this.compressor.Compress(data);
// Export implementation
}
}
Option 2
Since you want to make an extensible design that doesn't directly depend on a specific compression algorithm or database, you can use an Aggregate Service (which implements the Facade Pattern) to abstract the more specific configuration away from your BackupMaker.
As pointed out in the article, you have an implicit domain concept (coordination of dependencies) that needs to be realized as an explicit service, IBackupCoordinator.
Pros
The DI container is in charge of lifetime management.
Leaving compression out of a particular implementation is as easy as passing the data through the method.
Explicitly implements a domain concept that you are missing, namely coordination of dependencies.
Cons
Many types to build and maintain.
BackupMaker requires 3 dependencies, instead of 2, to be registered with the DI container.
Generic Interfaces
public interface IBackupCoordinator
{
void Export(byte[] data);
byte[] Compress(byte[] data);
}
public interface IBackupMaker
{
void Backup();
}
public interface IDatabaseExporter
{
void Export(byte[] data);
}
public interface ICompressor
{
byte[] Compress(byte[] data);
}
Specialized Interfaces
Now, to make sure the pieces only plug together one way, you need to make interfaces that are specific to the algorithm and database used. You can use interface inheritance to achieve this (as shown) or you can just hide the interface differences behind the facade (IBackupCoordinator).
public interface IBZip2Compressor : ICompressor
{}
public interface IGZipCompressor : ICompressor
{}
public interface IMySqlDatabaseExporter : IDatabaseExporter
{}
public interface ISqlServerDatabaseExporter : IDatabaseExporter
{}
Coordinator Implementation
The coordinators are what do the job for you. The subtle difference between implementations is that the interface dependencies are explicitly called out so you cannot inject the wrong type with your DI configuration.
public class BZip2ToMySqlBackupCoordinator : IBackupCoordinator
{
private readonly IMySqlDatabaseExporter exporter;
private readonly IBZip2Compressor compressor;
public BZip2ToMySqlBackupCoordinator(
IMySqlDatabaseExporter exporter,
IBZip2Compressor compressor)
{
this.exporter = exporter;
this.compressor = compressor;
}
public void Export(byte[] data)
{
this.exporter.Export(data);
}
public byte[] Compress(byte[] data)
{
return this.compressor.Compress(data);
}
}
public class GZipToSqlServerBackupCoordinator : IBackupCoordinator
{
private readonly ISqlServerDatabaseExporter exporter;
private readonly IGZipCompressor compressor;
public GZipToSqlServerBackupCoordinator(
ISqlServerDatabaseExporter exporter,
IGZipCompressor compressor)
{
this.exporter = exporter;
this.compressor = compressor;
}
public void Export(byte[] data)
{
this.exporter.Export(data);
}
public byte[] Compress(byte[] data)
{
return this.compressor.Compress(data);
}
}
BackupMaker Implementation
The BackupMaker can now be generic as it accepts any type of IBackupCoordinator to do the heavy lifting.
public class BackupMaker : IBackupMaker
{
private readonly IBackupCoordinator backupCoordinator;
public BackupMaker(IBackupCoordinator backupCoordinator)
{
this.backupCoordinator = backupCoordinator;
}
public void Backup()
{
// Get the data from somewhere
byte[] data = new byte[0];
// Compress the data
byte[] compressedData = this.backupCoordinator.Compress(data);
// Backup the data
this.backupCoordinator.Export(compressedData);
}
}
Note that even if your services are used in other places than BackupMaker, this neatly wraps them into one package that can be passed to other services. You don't necessarily need to use both operations just because you inject the IBackupCoordinator service. The only place where you might run into trouble is if using named instances in the DI configuration across different services.
Option 3
Much like Option 2, you could use a specialized form of Abstract Factory to coordinate the relationship between the concrete IDatabaseExporter and ICompressor, which will fill the role of the dependency coordinator.
Pros
Few types to maintain.
Only 1 dependency to register in the DI container, making it simpler to deal with.
Moves lifetime management into the BackupMaker service, which makes it impossible to misconfigure DI in a way that will cause a memory leak.
Explicitly implements a domain concept that you are missing, namely coordination of dependencies.
Cons
Leaving compression out of a particular implementation requires you implement the Null object pattern.
The DI container is not in charge of lifetime management and each dependency instance is per request, which may not be ideal.
If your services have many dependencies, it may become unwieldy to inject them through the constructor of the CoordinationFactory implementations.
Interfaces
I am showing the factory implementation with a Release method for each type. This is to follow the Register, Resolve, and Release pattern which makes it effective for cleaning up dependencies. This becomes especially important if 3rd parties could implement the ICompressor or IDatabaseExporter types because it is unknown what kinds of dependencies they may have to clean up.
Do note however, that the use of the Release methods is totally optional with this pattern and excluding them will simplify the design quite a bit.
public interface IBackupCoordinationFactory
{
ICompressor CreateCompressor();
void ReleaseCompressor(ICompressor compressor);
IDatabaseExporter CreateDatabaseExporter();
void ReleaseDatabaseExporter(IDatabaseExporter databaseExporter);
}
public interface IBackupMaker
{
void Backup();
}
public interface IDatabaseExporter
{
void Export(byte[] data);
}
public interface ICompressor
{
byte[] Compress(byte[] data);
}
BackupCoordinationFactory Implementation
public class BZip2ToMySqlBackupCoordinationFactory : IBackupCoordinationFactory
{
public ICompressor CreateCompressor()
{
return new BZip2Compressor();
}
public void ReleaseCompressor(ICompressor compressor)
{
IDisposable disposable = compressor as IDisposable;
if (disposable != null)
{
disposable.Dispose();
}
}
public IDatabaseExporter CreateDatabaseExporter()
{
return new MySqlDatabaseExporter();
}
public void ReleaseDatabaseExporter(IDatabaseExporter databaseExporter)
{
IDisposable disposable = databaseExporter as IDisposable;
if (disposable != null)
{
disposable.Dispose();
}
}
}
public class GZipToSqlServerBackupCoordinationFactory : IBackupCoordinationFactory
{
public ICompressor CreateCompressor()
{
return new GZipCompressor();
}
public void ReleaseCompressor(ICompressor compressor)
{
IDisposable disposable = compressor as IDisposable;
if (disposable != null)
{
disposable.Dispose();
}
}
public IDatabaseExporter CreateDatabaseExporter()
{
return new SqlServerDatabaseExporter();
}
public void ReleaseDatabaseExporter(IDatabaseExporter databaseExporter)
{
IDisposable disposable = databaseExporter as IDisposable;
if (disposable != null)
{
disposable.Dispose();
}
}
}
BackupMaker Implementation
public class BackupMaker : IBackupMaker
{
private readonly IBackupCoordinationFactory backupCoordinationFactory;
public BackupMaker(IBackupCoordinationFactory backupCoordinationFactory)
{
this.backupCoordinationFactory = backupCoordinationFactory;
}
public void Backup()
{
// Get the data from somewhere
byte[] data = new byte[0];
// Compress the data
byte[] compressedData;
ICompressor compressor = this.backupCoordinationFactory.CreateCompressor();
try
{
compressedData = compressor.Compress(data);
}
finally
{
this.backupCoordinationFactory.ReleaseCompressor(compressor);
}
// Backup the data
IDatabaseExporter exporter = this.backupCoordinationFactory.CreateDatabaseExporter();
try
{
exporter.Export(compressedData);
}
finally
{
this.backupCoordinationFactory.ReleaseDatabaseExporter(exporter);
}
}
}
Option 4
Create a guard clause in your BackupMaker class to prevent non-matching types from being allowed, and throw an exception in the case they are not matched.
In C#, you can do this with attributes (which apply custom metadata to the class). Support for this option may or may not exist in other platforms.
Pros
Seamless - no extra types to configure in DI.
The logic for comparing whether types match could be expanded to include multiple attributes per type, if needed. So a single compressor could be used for multiple databases, for example.
100% of invalid DI configurations will cause an error (although you may wish to make the exception specify how to make the DI configuration work).
Cons
Leaving compression out of a particular backup configuration requires you implement the Null object pattern.
The business logic for comparing types is implemented in a static extension method, which makes it testable but impossible to swap with another implementation.
If the design is refactored so that ICompressor or IDatabaseExporter are not dependencies of the same service, this will no longer work.
Custom Attribute
In .NET, an attribute can be used to attach metadata to a type. We make a custom DatabaseTypeAttribute whose database type name can be compared across two different types to ensure they are compatible.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false)]
public class DatabaseTypeAttribute : Attribute
{
public DatabaseTypeAttribute(string databaseType)
{
this.DatabaseType = databaseType;
}
public string DatabaseType { get; set; }
}
Concrete ICompressor and IDatabaseExporter Implementations
[DatabaseType("MySql")]
public class MySqlDatabaseExporter : IDatabaseExporter
{
public void Export(byte[] data)
{
// implementation
}
}
[DatabaseType("SqlServer")]
public class SqlServerDatabaseExporter : IDatabaseExporter
{
public void Export(byte[] data)
{
// implementation
}
}
[DatabaseType("MySql")]
public class BZip2Compressor : ICompressor
{
public byte[] Compress(byte[] data)
{
// implementation
}
}
[DatabaseType("SqlServer")]
public class GZipCompressor : ICompressor
{
public byte[] Compress(byte[] data)
{
// implementation
}
}
Extension Method
We roll the comparison logic into an extension method so every implementation of IBackupMaker automatically includes it.
public static class BackupMakerExtensions
{
public static bool DatabaseTypeAttributesMatch(
this IBackupMaker backupMaker,
Type compressorType,
Type databaseExporterType)
{
// Use .NET Reflection to get the metadata
DatabaseTypeAttribute compressorAttribute = (DatabaseTypeAttribute)compressorType
.GetCustomAttributes(attributeType: typeof(DatabaseTypeAttribute), inherit: true)
.SingleOrDefault();
DatabaseTypeAttribute databaseExporterAttribute = (DatabaseTypeAttribute)databaseExporterType
.GetCustomAttributes(attributeType: typeof(DatabaseTypeAttribute), inherit: true)
.SingleOrDefault();
// Types with no attribute are considered invalid even if they implement
// the corresponding interface
if (compressorAttribute == null) return false;
if (databaseExporterAttribute == null) return false;
return compressorAttribute.DatabaseType.Equals(databaseExporterAttribute.DatabaseType);
}
}
BackupMaker Implementation
A guard clause ensures that 2 classes with non-matching metadata are rejected before the type instance is created.
public class BackupMaker : IBackupMaker
{
private readonly ICompressor compressor;
private readonly IDatabaseExporter databaseExporter;
public BackupMaker(ICompressor compressor, IDatabaseExporter databaseExporter)
{
// Guard to prevent against nulls
if (compressor == null)
throw new ArgumentNullException("compressor");
if (databaseExporter == null)
throw new ArgumentNullException("databaseExporter");
// Guard to prevent against non-matching attributes
// Extension methods must be invoked on an instance, hence "this."
if (!this.DatabaseTypeAttributesMatch(compressor.GetType(), databaseExporter.GetType()))
{
throw new ArgumentException(compressor.GetType().FullName +
" cannot be used in conjunction with " +
databaseExporter.GetType().FullName);
}
this.compressor = compressor;
this.databaseExporter = databaseExporter;
}
public void Backup()
{
// Get the data from somewhere
byte[] data = new byte[0];
// Compress the data
byte[] compressedData = this.compressor.Compress(data);
// Backup the data
this.databaseExporter.Export(compressedData);
}
}
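On the JVM, a comparable guard can probably be built with runtime annotations. The following is a minimal Java sketch under that assumption; AnnotationGuardSketch, DatabaseType and the placeholder classes are hypothetical names. Note that, unlike the inherit: true lookup above, Java class annotations are only inherited by subclasses if the annotation type is marked @Inherited.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class AnnotationGuardSketch {
    // Runtime-visible metadata naming the database a type belongs to.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DatabaseType {
        String value();
    }

    @DatabaseType("MySql")
    static class MySqlExporter {}

    @DatabaseType("MySql")
    static class BZip2Compressor {}

    @DatabaseType("SqlServer")
    static class GZipCompressor {}

    // Types without the annotation never match, even if they implement the right interface.
    static boolean databaseTypesMatch(Class<?> compressorType, Class<?> exporterType) {
        DatabaseType compressor = compressorType.getAnnotation(DatabaseType.class);
        DatabaseType exporter = exporterType.getAnnotation(DatabaseType.class);
        return compressor != null && exporter != null
                && compressor.value().equals(exporter.value());
    }
}
```

A constructor guard would then call databaseTypesMatch(compressor.getClass(), exporter.getClass()) and throw an IllegalArgumentException on a mismatch.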
If you decide on one of these options, I would appreciate if you left a comment as to which one you go with. I have a similar situation in one of my projects, and I am leaning toward Option 2.
Response to your Update
Is very specific naming and such a very rough contract the way to go or can I do better than that? Should I turn the contract test into an integration test? Perhaps (integration) test the composition of all three? I'm not really trying to be generic but am trying to keep responsibilities separate and maintain testability.
Creating an integration test is a good idea, but only if you are certain that you are testing the production DI configuration. Although it also makes sense to test it all as a unit to verify it works, it doesn't do you much good for this use case if the code that ships is configured differently than the test.
Should you be specific? I believe I have already given you a choice in that matter. If you go with the guard clause, you don't have to be specific at all. If you go with one of the other options, you have a good compromise between specific and generic.
I know you stated that you are not intentionally trying to be generic, and it is good to draw the line somewhere to ensure a solution is not over-engineered. On the other hand, if the solution has to be redesigned because an interface was not generic enough that is not a good thing either. Extensibility is always a requirement whether it is specified up front or not because you never really know how business requirements will change in the future. So, having a generic BackupMaker is definitely the best way to go. The other classes can be more specific - you just need one seam to swap implementations if future requirements change.
My first suggestion would be to think critically about whether you need to be that generic: you have a concrete problem to solve, namely backing up a very specific database into a specific format. Is there any benefit in solving the problem for arbitrary databases and arbitrary formats? What you surely get from a generic solution is boilerplate code and increased complexity (people understand concrete problems, not generic ones).
If this applies to you, then my suggestion would be to not let your DatabaseExporter accept interfaces, but only concrete implementations. There are enough modern tools out there that allow you to mock concrete classes, so testability is not an argument for using interfaces here either.
On the other hand, if you do have to back up several databases with different strategies, then I would probably introduce something like a
class BackupPlan {
public DatabaseExporter exporter() {/**...*/}
public Compressor compressor() {/** ... */}
}
Then your BackupMaker gets passed one BackupPlan, which specifies which database is to be compressed with which algorithm.
Your question is emphasizing the fact that object composition is very important and that the entity that is responsible for such composition (wiring) has a big responsibility.
Since you already have a generic BackupMaker, I would suggest that you keep it this way and push the big responsibility of making sure that the right composition of objects (to solve the specific problem) is done into the composition root.
Readers of your application source code (you and your team members), would have a single place (the composition root) to understand how you compose your objects to solve your specific problem by using the generic classes (e.g. BackupMaker).
Put in other words, the composition root is where you decide on the specifics. It's where you use the generic to create the specific.
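As a rough Java sketch of that idea (all names are hypothetical, and the exporter and compressor are placeholders rather than real MySQL or BZip2 code): BackupMaker only sees interfaces, while a single factory method acting as the composition root names the concrete classes and therefore decides the pairing.

```java
import java.nio.charset.StandardCharsets;

class CompositionRootSketch {
    interface Exporter { byte[] export(); }
    interface Compressor { byte[] compress(byte[] data); }

    // Generic: works with any exporter/compressor pair it is handed.
    static class BackupMaker {
        private final Exporter exporter;
        private final Compressor compressor;

        BackupMaker(Exporter exporter, Compressor compressor) {
            this.exporter = exporter;
            this.compressor = compressor;
        }

        byte[] backup() {
            return compressor.compress(exporter.export());
        }
    }

    // Placeholder: a real exporter would talk to the database.
    static class MySqlExporter implements Exporter {
        public byte[] export() {
            return "mysql-dump".getBytes(StandardCharsets.UTF_8);
        }
    }

    // Placeholder: only prepends a marker, it does NOT actually compress.
    static class FakeBzip2Compressor implements Compressor {
        public byte[] compress(byte[] data) {
            byte[] marker = {'B', 'Z', 'h'};
            byte[] out = new byte[marker.length + data.length];
            System.arraycopy(marker, 0, out, 0, marker.length);
            System.arraycopy(data, 0, out, marker.length, data.length);
            return out;
        }
    }

    // Composition root: the only place that names concrete classes.
    static BackupMaker mySqlBackupMaker() {
        return new BackupMaker(new MySqlExporter(), new FakeBzip2Compressor());
    }
}
```

An automated test can call mySqlBackupMaker() and inspect the bytes returned by backup(), which checks the composition itself rather than each part in isolation.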
To reply on the comment:
which should know what about those dependencies?
The composition root needs to know about everything (all the dependencies) since it is creating all the objects in the application and wiring them together. The composition root knows what each piece of the puzzle does and it connects them together to create a meaningful application.
For the BackupMaker, it should only care about just enough to be able to do its single responsibility. In your example, its single (simple) responsibility (as it seems to me) is to orchestrate the consumption of other objects to create a backup.
As long as you are using DI, a class will never be sure that its collaborator will behave correctly, only the composition root will. Consider this simple and extreme example of an IDatabaseExporter implementation (assume that the developer actually gave this class this name, and that he intentionally implemented it this way):
public class StupidDisastrousDatabaseExporter : IDatabaseExporter
{
public ExportedData Export()
{
DoSomethingStupidThatWillDeleteSystemDataAndMakeTheEnterpriseBroke();
...
}
private void DoSomethingStupidThatWillDeleteSystemDataAndMakeTheEnterpriseBroke()
{
//do it
...
}
}
Now, the BackupMaker will never know that it is consuming a stupid and disastrous database exporter, only the composition root does. We can never blame the programmer that wrote the BackupMaker class for this disastrous mistake (or the programmer who designed the IDatabaseExporter contract). But the programmer(s) that are composing the application in the composition root are blamed if they inject a StupidDisastrousDatabaseExporter instance into the constructor of BackupMaker.
Of course, no one should have written the StupidDisastrousDatabaseExporter class in the first place, but I gave you an extreme example to show you that a contract (interface) can never (and should never) guarantee every aspect about its implementors. It should just say enough.
Is there a way to express IDatabaseExporter in such a way that guarantees that implementors of such interface will not make stupid or disastrous actions? No.
Please note that while the BackupMaker is dealing with contracts (no 100% guarantees), the composition root is actually dealing with concrete implementation classes. This gives it the great power (and thus the great responsibility) to guarantee the composition of the correct object graph.
how do I make sure that I'm composing in a sensible way?
You should create automated end-to-end tests for the object graph created by the composition root. Here you are making sure that the composition root has done its big responsibility of composing the objects in a correct way. Here you can test the exact details that you wanted (like that the backup result was in some exact format/details).
Take a look at this article for an approach to automated testing using the Composition Root.
I believe this may be a problem that occurs when focusing too much on object models, at the exclusion of function compositions. Consider the first step in a naive function decomposition (function as in f : a -> b):
exporter: data -> (format, memory), or exception
compressor: memory -> memory, or exception
writer: memory -> side-effect, or exception
backup-maker: (data, exporter, compressor, writer) -> backup-result
So backup-maker, the last function, can be parameterized with those three functions, assuming I've understood your use case correctly, and provided the three parameters have the same input and output types (e.g. format and memory) regardless of their implementation.
Now, "the guts", or a possible decomposition (read right to left) of backup-maker, with all functions bound, taking data as the argument, and using the composition operator ".":
backup-maker: intermediate-computation . writer . intermediate-computation . compressor . intermediate-computation . exporter
I especially want to note that this model of architecture can be expressed later as either object interfaces, or as first-class functions, e.g. c++ std::function.
Edit: It can also be refined to terms of generics, where memory is a generic type argument, to provide type safety where wanted. E.g.
backup-maker<type M>: (data, exporter<M>, compressor<M>, writer<M>) -> ..
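In Java terms, the generic signature above might be sketched with java.util.function.Function as follows. BackupPipelineSketch and the stand-in stages in example are hypothetical; the point is only that the shared type parameter M makes mismatched stages a compile-time error.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Function;

class BackupPipelineSketch {
    // M is the shared intermediate ("memory") type of all three stages.
    static <D, M, R> Function<D, R> backupMaker(
            Function<D, M> exporter,
            Function<M, M> compressor,
            Function<M, R> writer) {
        // Applied left to right: export, then compress, then write.
        return exporter.andThen(compressor).andThen(writer);
    }

    // Example wiring with stand-in stages: String -> byte[] -> byte[] -> Integer.
    static int example(String data) {
        Function<String, Integer> maker =
                BackupPipelineSketch.<String, byte[], Integer>backupMaker(
                        s -> s.getBytes(StandardCharsets.UTF_8),                  // "exporter"
                        bytes -> Arrays.copyOf(bytes, Math.min(bytes.length, 3)), // stand-in for compression
                        bytes -> bytes.length);                                   // "writer" reports bytes written
        return maker.apply(data);
    }
}
```

Passing a compressor of type Function<String, String> to a pipeline whose exporter produces byte[] would not compile, which is the type safety mentioned above.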
More information about the technique and benefits of Function Decomposition can be found here:
http://jfeltz.com/posts/2015-08-30-cost-decreasing-software-architecture.html
Your requirements seem contradictory:
You want to be specific (allowing only a subset (or only one ?) of combinations)
But you also want to be generic by using interfaces, DI, etc.
My advice is to keep things simple (in your case, that means not trying to be generic) until your code evolves.
Only when your code does evolve should you refactor it in a more generic way. The code below shows a compromise between generic and specific:
public interface ICompressor {
public byte[] compress(byte[] source); //Note: the return type and parameter type may not be relevant, just for demonstration purposes
}
public interface IExporter {
public File export(String connectionString); //Note: the return type and type argument may not be revelant, just for demonstration purpose
}
public final class Bzip2 implements ICompressor {
#Override
public final byte[] compress(byte[] source) {
//TODO
}
}
public final class MySQL implements IExporter {
#Override
public final File export(String connnectionString) {
//TODO
}
}
public abstract class ABackupStrategy {
private final ICompressor compressor;
private final IExporter exporter;
public ABackupStrategy(final ICompressor compressor, final IExporter exporter) {
this.compressor = compressor;
this.exporter = exporter;
}
public final void makeBackup() {
//TODO: compose with exporter and compressor to make your backup
}
}
public final class MyFirstBackupStrategy extends ABackupStrategy {
public MyFirstBackupStrategy(final Bzip2 compressor, final MySQL exporter) {
super(compressor, exporter);
}
}
With ICompressor and IExporter, you can easily add other compression algorithms and other databases to export from.
With ABackupStrategy, you can easily define a new allowed combination of concrete compressor and exporter by inheriting from it.
Drawback: I had to make ABackupStrategy abstract without declaring any abstract method, which goes against OOP principles.
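One possible way around that drawback, which is not part of the answer above, is a final class with a private constructor and one named factory method per allowed combination. The sketch below rests on several assumptions: the class name BackupStrategy, the factory mySqlWithBzip2, and the Fake* placeholder implementations are all hypothetical, and IExporter returns byte[] here instead of File so that the example is self-contained.

```java
import java.nio.charset.StandardCharsets;

interface ICompressor {
    byte[] compress(byte[] source);
}

interface IExporter {
    byte[] export(String connectionString); // byte[] instead of File, for this sketch only
}

// Placeholder: returns a copy of the input and performs no real compression.
final class FakeBzip2 implements ICompressor {
    @Override
    public byte[] compress(byte[] source) {
        return source.clone();
    }
}

// Placeholder: pretends to dump a database identified by the connection string.
final class FakeMySql implements IExporter {
    @Override
    public byte[] export(String connectionString) {
        return ("dump of " + connectionString).getBytes(StandardCharsets.UTF_8);
    }
}

final class BackupStrategy {
    private final ICompressor compressor;
    private final IExporter exporter;

    // Private: callers cannot build arbitrary combinations.
    private BackupStrategy(ICompressor compressor, IExporter exporter) {
        this.compressor = compressor;
        this.exporter = exporter;
    }

    // Each allowed combination gets its own named factory method.
    static BackupStrategy mySqlWithBzip2() {
        return new BackupStrategy(new FakeBzip2(), new FakeMySql());
    }

    byte[] makeBackup(String connectionString) {
        return compressor.compress(exporter.export(connectionString));
    }

    public static void main(String[] args) {
        byte[] backup = BackupStrategy.mySqlWithBzip2().makeBackup("jdbc:mysql://localhost/db");
        System.out.println(backup.length + " bytes");
    }
}
```

The trade-off is that adding a combination means editing BackupStrategy rather than adding a subclass.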

How can I serialize/deserialize java.util.stream.Stream using Jackson?

Assuming I have the following object
public class DataObjectA {
    private Stream<DataObjectB> dataObjectBStream;
}
How can I serialize them using Jackson?
As others have pointed out, you can only iterate once over a stream. If that works for you, you can use this to serialize:
new ObjectMapper().writerFor(Iterator.class).writeValueAsString(dataObjectBStream.iterator())
If you're using a Jackson version prior to 2.5, use writerWithType() instead of writerFor().
See https://github.com/FasterXML/jackson-modules-java8/issues/3 for the issue about adding java.util.stream.Stream support to Jackson; it includes a preliminary version of the code. (Edit: this has since been merged and is supported as of 2.9.0.)
Streaming support feels like it would work naturally and safely if the stream is the top-level object you are (de)serializing, e.g. returning a java.util.stream.Stream<T> from a JAX-RS resource, or reading a Stream from a JAX-RS client.
A Stream as a member variable of a (de)serialized object, as you have in your example, is trickier, because it's mutable and single use:
private Stream<DataObjectB> dataObjectBStream;
Assuming it were supported, all of the caveats around storing references to streams would apply. You wouldn't be able to serialize the object more than once. And once you had deserialized the wrapping object, its stream member would presumably retain a live connection back through the JAX-RS client and the HTTP connection, which could create surprises.
You don’t.
A Stream is a single-use chain of operations and was never meant to be persistent. Even storing it in an instance field, as in your question, indicates a misunderstanding of its purpose. Once a terminal operation has been applied to the stream, it is useless, and streams can't be cloned. Thus, there is no point in keeping the unusable stream in a field.
Since the only operations offered by Stream are chaining more operations onto the pipeline and finally evaluating it, there is no way to query its state in a manner that would allow creating a behaviorally equivalent stream. Therefore, no persistence framework can store it. The only thing a framework could do is traverse the elements produced by the stream operation and store those, but that effectively means storing a kind of collection of objects rather than the Stream. Besides that, the single-use nature of a Stream also implies that a storage framework traversing the stream in order to store its elements has the side effect of making the stream unusable at the same time.
If you want to store elements, resort to an ordinary Collection.
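The single-use behavior described above can be observed directly. The following is a minimal sketch: the class names SingleUseStreamDemo and Holder are made up for illustration, and the IllegalStateException on reuse is what the OpenJDK implementation throws (the Stream documentation says an implementation may throw it when it detects reuse).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SingleUseStreamDemo {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails(Stream<String> stream) {
        stream.collect(Collectors.toList());     // first terminal operation consumes the stream
        try {
            stream.collect(Collectors.toList()); // reuse attempt
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Hypothetical holder that stores the elements rather than a Stream.
    static final class Holder {
        private final List<String> items;

        Holder(List<String> items) {
            this.items = items;
        }

        // A fresh stream is created on every call, so it can be traversed repeatedly.
        Stream<String> stream() {
            return items.stream();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTraversalFails(Stream.of("a", "b")));
        Holder holder = new Holder(Arrays.asList("a", "b"));
        System.out.println(holder.stream().count());
        System.out.println(holder.stream().count());
    }
}
```

A holder like this serializes as an ordinary list with any framework, and callers still get Stream semantics through stream().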
On the other hand, if you really want to store behavior, you’ll end up storing an object instance whose actual class implements the behavior. This still works with Streams as you can store an instance of a class which has a factory method producing the desired stream. Of course, you are not really storing the behavior but a symbolic reference to it, but this is always the case when you use an OO storage framework to store behavior rather than data.
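To make the idea of storing a stream-producing object concrete, here is a small sketch using plain Java serialization as the storage mechanism. The class EvenNumbers, its limit field, and the roundTrip helper are hypothetical names chosen for this example; what gets stored is the class name plus the limit value, not the stream itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical class whose behavior is "produce the even numbers from 0 to limit".
final class EvenNumbers implements Supplier<Stream<Integer>>, Serializable {
    private static final long serialVersionUID = 1L;
    private final int limit;

    EvenNumbers(int limit) {
        this.limit = limit;
    }

    // Factory method: builds a new stream each time it is called.
    @Override
    public Stream<Integer> get() {
        return IntStream.rangeClosed(0, limit).filter(i -> i % 2 == 0).boxed();
    }
}

final class StoredBehaviorDemo {

    // Serializes the value to bytes and reads it back, as a stand-in for real storage.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EvenNumbers restored = roundTrip(new EvenNumbers(6));
        System.out.println(restored.get().collect(Collectors.toList()));
    }
}
```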
I had the class below with two members, one of which was a Stream. I had to annotate the getter for the stream with @JsonSerialize and override the serialize method in a custom JsonSerializer; this writes the stream as a JSON array in my API response:
import static java.util.stream.Collectors.toList;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map.Entry;
import java.util.stream.Stream;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.annotation.JsonSerialize;

// ResultBean, ComponentBean, DataSet, DataPoint, DataStructureComponent and ScalarValue are project classes.
public class DataSetResultBean extends ResultBean
{
    private static final long serialVersionUID = 1L;

    private final List<ComponentBean> structure;
    private final Stream<DataPoint> datapoints;

    private static class DataPointSerializer extends JsonSerializer<Stream<DataPoint>>
    {
        @Override
        public void serialize(Stream<DataPoint> stream, JsonGenerator gen, SerializerProvider serializers) throws IOException
        {
            gen.writeStartArray();
            try
            {
                // The lambda cannot throw IOException, so it is tunnelled as UncheckedIOException.
                stream.forEach(dp -> serializeSingle(gen, dp));
            }
            catch (UncheckedIOException e)
            {
                throw (IOException) e.getCause();
            }
            finally
            {
                stream.close();
            }
            gen.writeEndArray();
        }

        public synchronized void serializeSingle(JsonGenerator gen, DataPoint dp) throws UncheckedIOException
        {
            try
            {
                gen.writeStartObject();
                for (Entry<DataStructureComponent<?, ?, ?>, ScalarValue<?, ?, ?>> entry: dp.entrySet())
                {
                    gen.writeFieldName(entry.getKey().getName());
                    gen.writeRawValue(entry.getValue().toString());
                }
                gen.writeEndObject();
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
        }
    }

    public DataSetResultBean(DataSet dataset)
    {
        super("DATASET");
        structure = dataset.getMetadata().stream().map(ComponentBean::new).collect(toList());
        datapoints = dataset.stream();
    }

    public List<ComponentBean> getStructure()
    {
        return structure;
    }

    @JsonSerialize(using = DataPointSerializer.class)
    public Stream<DataPoint> getDatapoints()
    {
        return datapoints;
    }
}

Serialization in Hadoop - Writable

This is the class that implements Writable ..
public class Test implements Writable {
    List<AtomicWritable> atoms = new ArrayList<AtomicWritable>();

    public void write(DataOutput out) throws IOException {
        IntWritable size = new IntWritable(atoms.size());
        size.write(out);
        for (AtomicWritable atom : atoms)
            atom.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        atoms.clear();
        IntWritable size = new IntWritable();
        size.readFields(in);
        int n = size.get();
        while (n-- > 0) {
            AtomicWritable atom = new AtomicWritable();
            atom.readFields(in);
            atoms.add(atom);
        }
    }
}
I would really appreciate it if someone could help me understand how to invoke the write and readFields methods.
Basically, I'm failing to understand how to construct a Test object in this case. Once the object is written to a DataOutput, how do we restore it from a DataInput? This may sound silly, but I am a newbie to Hadoop and have been assigned a project that uses it. Please help.
Thanks!!!
Basically, I'm failing to understand how to construct a Test object in this case.
Yup, you're missing the point. If you need to construct an instance of Test and populate atoms, then you need to add a constructor to Test:
public Test(List<AtomicWritable> atoms) {
    this.atoms = atoms;
}
or you need to use the default constructor and add a method or a setter that lets you add items to atoms or set the value of atoms. If you do add the constructor above, keep a no-argument constructor as well, because Hadoop creates Writable instances reflectively before calling readFields. The default-constructor-plus-set-method pattern is actually pretty common in the Hadoop framework; see, for example, Text.set.
You don't call readFields and write; the Hadoop framework does that for you when it needs to serialize and deserialize inputs and outputs to and from map and reduce.
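If you want to see the round trip outside the framework, for example in a unit test, you can drive write and readFields yourself with the standard java.io data streams, since DataOutputStream implements DataOutput and DataInputStream implements DataInput. The sketch below avoids a Hadoop dependency by declaring a stand-in interface, SimpleWritable, with the same two method signatures as Hadoop's Writable; SimpleWritable, IntListWritable and WritableRoundTrip are hypothetical names used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.hadoop.io.Writable so the sketch compiles without Hadoop.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Same shape as the Test class in the question: a size prefix followed by the elements.
final class IntListWritable implements SimpleWritable {
    private final List<Integer> values = new ArrayList<>();

    void add(int value) {
        values.add(value);
    }

    List<Integer> values() {
        return values;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.size());
        for (int value : values) {
            out.writeInt(value);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        values.clear(); // instances may be reused, so reset state first
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readInt());
        }
    }
}

final class WritableRoundTrip {

    // Serializes by calling write() on an in-memory DataOutput.
    static byte[] toBytes(SimpleWritable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            value.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserializes by calling readFields() on an already constructed, empty instance.
    static void fromBytes(byte[] bytes, SimpleWritable target) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            target.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IntListWritable original = new IntListWritable();
        original.add(1);
        original.add(2);
        IntListWritable restored = new IntListWritable(); // created empty, then filled
        fromBytes(toBytes(original), restored);
        System.out.println(restored.values());
    }
}
```

Note the pattern on the reading side: the object is first created empty and then populated by readFields, which is why a no-argument constructor matters.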