Serialization in Hadoop - Writable


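This is the class that implements Writable:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;
public class Test implements Writable {
    List<AtomicWritable> atoms = new ArrayList<AtomicWritable>();
    public void write(DataOutput out) throws IOException {
        // Write the element count first, then each element
        IntWritable size = new IntWritable(atoms.size());
        size.write(out);
        for (AtomicWritable atom : atoms)
            atom.write(out);
    }
    public void readFields(DataInput in) throws IOException {
        // Hadoop may reuse instances, so reset state before reading
        atoms.clear();
        IntWritable size = new IntWritable();
        size.readFields(in);
        int n = size.get();
        while (n-- > 0) {
            AtomicWritable atom = new AtomicWritable();
            atom.readFields(in);
            atoms.add(atom);
        }
    }
}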
I would really appreciate it if someone could help me understand how to invoke the write and readFields methods.
Basically, I'm failing to understand how to construct a Test object in this case. Once the object is written to a DataOutput object, how do we restore it from a DataInput object? This may sound silly, but I am a newbie to Hadoop and have been assigned a project that uses Hadoop. Please help.
Thanks!

Basically I'm failing to understand how to construct Test object in this case.
Yup, you're missing the point. If you need to construct an instance of Test and populate atoms, then you need to add a constructor to Test. Keep a public no-argument constructor as well, because Hadoop creates Writable instances reflectively and then calls readFields on them:
public Test() {
}
public Test(List<AtomicWritable> atoms) {
    this.atoms = atoms;
}
Alternatively, use the default constructor and add a method or setter that lets you add items to atoms or set the value of atoms. The latter is actually pretty common in the Hadoop framework: a default constructor plus a set method (cf., e.g., Text.set).
You don't call readFields and write; the Hadoop framework does that for you when it needs to serialize and deserialize inputs and outputs to and from map and reduce.
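To see what the framework does with these two methods, you can drive them by hand, for example in a unit test: write into a byte buffer, then call readFields on a fresh instance built with the no-argument constructor. The sketch below is written so it runs without Hadoop on the classpath, so several names are hypothetical: SimpleWritable is a stand-in that mirrors the two methods of org.apache.hadoop.io.Writable, Atom replaces the asker's AtomicWritable with a single int, and the count is written with DataOutput.writeInt instead of an IntWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical replacement for AtomicWritable: wraps a single int.
class Atom implements SimpleWritable {
    int value;
    Atom() {}                                   // no-arg constructor, used when deserializing
    Atom(int value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class AtomList implements SimpleWritable {
    final List<Atom> atoms = new ArrayList<>();

    public void write(DataOutput out) throws IOException {
        out.writeInt(atoms.size());             // length prefix first
        for (Atom atom : atoms) atom.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        atoms.clear();                          // instances may be reused
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            Atom atom = new Atom();
            atom.readFields(in);
            atoms.add(atom);
        }
    }

    // Serialize to bytes, then restore into a fresh, empty instance.
    static AtomList roundTrip(AtomList original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            original.write(out);
        }
        AtomList copy = new AtomList();         // created empty, then populated by readFields
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        AtomList list = new AtomList();
        list.atoms.add(new Atom(7));
        list.atoms.add(new Atom(42));
        AtomList restored = roundTrip(list);
        System.out.println(restored.atoms.size() + " " + restored.atoms.get(1).value); // prints "2 42"
    }
}
```

In a real job you never write roundTrip yourself; the framework plays both roles, which is why the no-argument constructor and the clear() at the top of readFields matter.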

Related

How to programmatically register extensions in Junit5

Say, a test needs a parameter that is only known when the tests are about to run.
@ExtendWith(MyParameterExtension.class)
public class MyTest {
    protected final MyParameter p;
    public MyTest(MyParameter p) { this.p = p; }
    @Test
    public void test() { assertSuccess(TestedCode.doComplexThing(p)); }
}
The specific contents of the MyParameter instance can only be determined just before the tests are executed. So I can have a resolver extension that simply pastes that parameter value where needed:
class MyParameterExtension implements ParameterResolver {
private final MyParameter myParameter;
public MyParameterExtension(MyParameter p) {
myParameter = p;
}
@Override
public boolean supportsParameter(ParameterContext parameterContext, ExtensionContext extensionContext) {
return (parameterContext.getParameter().getType() == MyParameter.class);
}
@Override
public MyParameter resolveParameter(ParameterContext parameterContext, ExtensionContext extensionContext) {
return myParameter;
}
}
I run the tests by starting Junit5 from my own code. That's when I can determine what the corresponding parameter values are. Let's say these parameters drive the behavior of the tests, and a user can specify (i.e., over a CLI) the values that a run should use.
How do I register the extension with the test run, as I'm about to commence it?
void launchSuite(List<DiscoverySelector> selectors, Object something) {
// The input to this are all the necessary selectors.
LauncherDiscoveryRequest ldr = LauncherDiscoveryRequestBuilder.request()
.selectors(selectors).build();
Launcher launcher = LauncherFactory.create();
TestPlan plan = launcher.discover(ldr);
MyParameter myParameter = new MyParameter(something);
MyParameterExtension ext = new MyParameterExtension(myParameter);
// $TODO: how do I register my extension with the test run
// before starting it?
launcher.execute(plan);
}
Auto-registering extensions doesn't help me (how would that process know the value of MyParameter?).
Using @RegisterExtension in the test code doesn't help me (a static block in the test code won't know the proper input for constructing instances of MyParameter).
Looking at the mechanics of launching the test, I don't see anything that lets me register those extensions in advance.
I considered using a ThreadLocal field in an extension registered statically but AFAIU, this won't (reliably) work because JUnit may create its own threads at least in certain cases.
I considered sticking the value of MyParameter in the "extension context", but I don't see a way to grab a hold of that before the test execution starts either. The root context is created in JupiterEngineDescriptor that is, if nothing else, all internal API.
The obvious solution is to stick the parameter in a static field somewhere, but that would preclude me from running tests with different parameters in parallel, unless I resort to loading tests into isolated class loaders, which sounds too cumbersome for something that I believe should be simpler. After all, all of the contexts of a test run are otherwise fully isolated.
What I'm ultimately trying to do, then, is to make something like this possible:
// ...
new Thread(()->launchSuite(selectors, "assume Earth gravity")).start();
new Thread(()->launchSuite(selectors, "assume Mars gravity")).start();
So what are reasonable ways to wire something like this together?

Let's start with the one thing that does not work: Using the launcher API.
The launcher API is a platform feature, whereas extensions are Jupiter-related. That's why there is no mechanism to register an extension in the API.
What should work, though, is @RegisterExtension, although you claim it would not. As the documentation shows, it is not restricted to static fields. Therefore, whatever you do here:
MyParameter myParameter = new MyParameter(something);
MyParameterExtension ext = new MyParameterExtension(myParameter);
could be done in a static method to instantiate an extension during runtime:
public class MyTest {
    private static MyParameterExtension createExtension() {
        MyParameter myParameter = new MyParameter(something);
        return new MyParameterExtension(myParameter);
    }
    // @RegisterExtension fields must not be private
    @RegisterExtension
    MyParameterExtension my = createExtension();
    @Test
    public void test(MyParameter p) {
        assertSuccess(TestedCode.doComplexThing(p));
    }
}
If that doesn't work in your case, some information is missing from your problem statement IMO.
Update
If your extension creation code requires parameters that can only be determined at launch time, you have the option of adding configuration parameters to the discovery request:
LauncherDiscoveryRequest ldr = LauncherDiscoveryRequestBuilder.request()
.configurationParameter("selectors", "assume Earth gravity")
.selectors(selectors).build();
This parameter can then be retrieved within the extension:
class MyParameterExtension implements ParameterResolver {
...
@Override
public MyParameter resolveParameter(ParameterContext parameterContext, ExtensionContext extensionContext) {
var selectors = extensionContext.getConfigurationParameter("selectors").orElse("");
return new MyParameter(selectors);
}
}

How can I serialize/deserialize java.util.stream.Stream using Jackson?

Assuming I have the following object
public class DataObjectA {
private Stream<DataObjectB> dataObjectBStream;
}
How can I serialize them using Jackson?

As others have pointed out, you can only iterate once over a stream. If that works for you, you can use this to serialize:
new ObjectMapper().writerFor(Iterator.class).writeValueAsString(dataObjectBStream.iterator())
If you're using a Jackson version prior to 2.5, use writerWithType() instead of writerFor().

See https://github.com/FasterXML/jackson-modules-java8/issues/3 for the open issue to add java.util.stream.Stream support to Jackson. There's a preliminary version of the code included. (Edit: this is now merged and supported in 2.9.0.)
Streaming support feels like it would work naturally/safely if the stream is the top level object you were (de)serializing, eg returning a java.util.stream.Stream<T> from a JAX-RS resource, or reading a Stream from a JAX-RS client.
A Stream as a member variable of a (de)serialized object, as you have in your example, is trickier, because it's mutable and single use:
private Stream<DataObjectB> dataObjectBStream;
Assuming it were supported, all of the caveats around storing references to streams would apply. You wouldn't be able to serialize the object more than once, and once you deserialized the wrapping object, its stream member would presumably retain a live connection back through the JAX-RS client and HTTP connection, which could create surprises.

You don’t.
A Stream is a single-use chain of operations and was never meant to be persistent. Even storing it into an instance field, as in your question, indicates a misunderstanding of its purpose. Once a terminal operation has been applied to the stream, it is useless, and streams can’t be cloned. Thus, there is no point in keeping the unusable stream in a field.
Since the only operations offered by Stream are chaining more operations to the pipeline and finally evaluating it, there is no way of querying its state that would allow creating a stream with equivalent behavior. Therefore, no persistence framework can store it. The only thing a framework could do is traverse the resulting elements of the stream operation and store them, but that effectively means storing a kind of collection of objects rather than the Stream. Besides that, the single-use nature of a Stream also implies that a storage framework traversing the stream in order to store the elements would have the side effect of making the stream unusable at the same time.
If you want to store elements, resort to an ordinary Collection.
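A small sketch of both points above, using only the JDK: a second terminal operation on the same Stream fails with IllegalStateException, while collecting into a List gives you something that can be traversed, and therefore serialized, as often as you like. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SingleUseStream {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("a", "b", "c");
        stream.count();                      // first terminal operation consumes the pipeline
        try {
            stream.count();                  // "stream has already been operated upon or closed"
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Store the elements instead: a List can be traversed any number of times.
    static List<String> materialize(Stream<String> stream) {
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());                  // prints "true"
        List<String> items = materialize(Stream.of("x", "y"));
        System.out.println(items + " " + items);               // prints "[x, y] [x, y]"
    }
}
```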
On the other hand, if you really want to store behavior, you’ll end up storing an object instance whose actual class implements the behavior. This still works with Streams as you can store an instance of a class which has a factory method producing the desired stream. Of course, you are not really storing the behavior but a symbolic reference to it, but this is always the case when you use an OO storage framework to store behavior rather than data.
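The "store an object whose class has a factory method producing the stream" idea can be sketched like this. EvenNumbersRecipe and its stream() method are hypothetical: the only persisted state is the plain int limit, which any serialization framework can handle, and every call to stream() builds a new, unconsumed pipeline from it. What gets stored is the class reference plus data, not the Stream itself.

```java
import java.io.Serializable;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical "recipe": plain data that knows how to produce the stream on demand.
class EvenNumbersRecipe implements Serializable {
    private static final long serialVersionUID = 1L;
    final int limit;                                  // the only state that gets persisted

    EvenNumbersRecipe(int limit) { this.limit = limit; }

    // Factory method: each call returns a fresh stream, so single use is not a problem.
    Stream<Integer> stream() {
        return IntStream.range(0, limit).filter(i -> i % 2 == 0).boxed();
    }

    public static void main(String[] args) {
        EvenNumbersRecipe recipe = new EvenNumbersRecipe(10);
        List<Integer> first = recipe.stream().collect(Collectors.toList());
        long second = recipe.stream().count();        // a second, independent stream
        System.out.println(first + " " + second);     // prints "[0, 2, 4, 6, 8] 5"
    }
}
```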

I had the class below with two elements, one of which was a Stream. I had to annotate the stream getter with @JsonSerialize and write a JsonSerializer that overrides serialize; this produces a stream of JSON in my response API:
public class DataSetResultBean extends ResultBean
{
private static final long serialVersionUID = 1L;
private final List<ComponentBean> structure;
private final Stream<DataPoint> datapoints;
private static class DataPointSerializer extends JsonSerializer<Stream<DataPoint>>
{
@Override
public void serialize(Stream<DataPoint> stream, JsonGenerator gen, SerializerProvider serializers) throws IOException, JsonProcessingException
{
gen.writeStartArray();
try
{
stream.forEach(dp -> serializeSingle(gen, dp));
}
catch (UncheckedIOException e)
{
throw (IOException) e.getCause();
}
finally
{
stream.close();
}
gen.writeEndArray();
}
public synchronized void serializeSingle(JsonGenerator gen, DataPoint dp) throws UncheckedIOException
{
try
{
gen.writeStartObject();
for (Entry<DataStructureComponent<?, ?, ?>, ScalarValue<?, ?, ?>> entry: dp.entrySet())
{
gen.writeFieldName(entry.getKey().getName());
gen.writeRawValue(entry.getValue().toString());
}
gen.writeEndObject();
}
catch (IOException e)
{
throw new UncheckedIOException(e);
}
}
}
public DataSetResultBean(DataSet dataset)
{
super("DATASET");
structure = dataset.getMetadata().stream().map(ComponentBean::new).collect(toList());
datapoints = dataset.stream();
}
public List<ComponentBean> getStructure()
{
return structure;
}
@JsonSerialize(using = DataPointSerializer.class)
public Stream<DataPoint> getDatapoints()
{
return datapoints;
}
}

Looking for a Ninject scope that behaves like InRequestScope

In my service layer, I have injected a UnitOfWork and 2 repositories in the constructor. The unit of work and the repositories have an instance of a DbContext that I want to share between them. How can I do that with Ninject? Which scope should be considered?
I am not in a web application, so I can't use InRequestScope.
I am trying to do something similar, and I am using DI; however, I need my UoW to be created and disposed like this:
using (IUnitOfWork uow = UnitOfWorkFactory.Create())
{
_testARepository.Insert(a);
_testBRepository.Insert(b);
uow.SaveChanges();
}
EDIT: I just want to be sure I understand. After looking at https://github.com/ninject/ninject.extensions.namedscope/wiki/InNamedScope, I thought about my current console application architecture, which actually uses Ninject.
Let's say:
Class A is a service layer class.
Class B is a unit of work which takes an interface (IContextFactory) as a parameter.
Class C is a repository which takes an interface (IContextFactory) as a parameter.
The idea here is to be able to do context operations on 2 or more repositories and use the unit of work to apply the changes.
Class D is a context factory (Entity Framework) which provides an instance of the context (kept in a container) that is shared between classes B and C (and would be for other repositories as well).
The context factory keeps the instance in its container, so I don't want to reuse this instance all the time, since the context needs to be disposed at the end of the service operation. Is that actually the main purpose of InNamedScope?
This would be my solution, but I am not at all sure I am doing it right. The service instances are going to be transient, which means they are never actually disposed?
Bind<IScsContextFactory>()
.To<ScsContextFactory>()
.InNamedScope("ServiceScope")
.WithConstructorArgument(
"connectionString",
ConfigurationUtility.GetConnectionString());
Bind<IUnitOfWork>().To<ScsUnitOfWork>();
Bind<IAccountRepository>().To<AccountRepository>();
Bind<IBlockedIpRepository>().To<BlockedIpRepository>();
Bind<IAccountService>().To<AccountService>().DefinesNamedScope("ServiceScope");
Bind<IBlockedIpService>().To<BlockedIpService>().DefinesNamedScope("ServiceScope");

UPDATE: This approach works against the current NuGet release, but relies on an anomaly in the InCallScope implementation which has been fixed in the current Unstable NuGet packages. I'll be tweaking this answer in a few days to reflect the best approach after some mulling over. NB: the high-level way of structuring things will stay pretty much identical; just the exact details of the Bind<DbContext>() scoping will change. (Hint: CreateNamedScope in unstable would work, or one could set up the Command Handler as DefinesNamedScope. The reason I don't just do that is that I want something that composes/plays well with InRequestScope.)
I highly recommend reading the Ninject.Extensions.NamedScope integration tests (seriously, find them and read and re-read them)
The DbContext is a Unit Of Work so no further wrapping is necessary.
As you want to be able to have multiple 'requests' in flight and want to have a single Unit of Work shared between them, you need to:
Bind<DbContext>()
.ToMethod( ctx =>
new DbContext(
nameOrConnectionString: ConfigurationUtility.GetConnectionString() ))
.InCallScope();
The InCallScope() means that:
for a given object graph composed for a single kernel.Get() Call (hence In Call Scope), everyone that requires an DbContext will get the same instance.
the IDisposable.Dispose() will be called when a Kernel.Release() happens for the root object (or a Kernel.Components.Get<ICache>().Clear() happens for the root if it is not .InCallScope())
There should be no reason to use InNamedScope() and DefinesNamedScope(); You don't have long-lived objects you're trying to exclude from the default pooling / parenting / grouping.
If you do the above, you should be able to:
var command = kernel.Get<ICommand>();
try {
command.Execute();
} finally {
kernel.Components.Get<ICache>().Clear( command ); // Dispose of DbContext happens here
}
The Command implementation looks like:
class Command : ICommand {
readonly IAccountRepository _ar;
readonly IBlockedIpRepository _br;
readonly DbContext _ctx;
public Command(IAccountRepository ar, IBlockedIpRepository br, DbContext ctx){
_ar = ar;
_br = br;
_ctx = ctx;
}
void ICommand.Execute(){
_ar.Insert(a);
_br.Insert(b);
_ctx.SaveChanges();
}
}
Note that in general, I avoid having an implicit Unit of Work in this way, and instead surface its creation and disposal. This makes a Command look like this:
class Command : ICommand {
readonly IAccountService _as;
readonly IBlockedIpService _bs;
readonly Func<DbContext> _createContext;
public Command(IAccountService @as, IBlockedIpService bs, Func<DbContext> createContext){
_as = @as;
_bs = bs;
_createContext = createContext;
}
void ICommand.Execute(){
using(var ctx = _createContext()) {
_as.InsertA(ctx);
_bs.InsertB(ctx);
ctx.SaveChanges();
}
}
}
This involves no usage of .InCallScope() on the Bind<DbContext>() (but does require the presence of Ninject.Extensions.Factory's FactoryModule to synthesize the Func<DbContext> from a straightforward Bind<DbContext>()).

As discussed in the other answer, InCallScope is not a good approach to solving this problem.
For now I'm dumping some code that works against the latest NuGet Unstable / Include PreRelease / Install-Package -Pre editions of Ninject.Web.Common without a clear explanation. I have started to write a walkthrough of this technique in the Ninject.Extensions.NamedScope wiki's CreateNamedScope/GetScope article.
Possibly some bits will become Pull Request(s) at some stage too (hat tip to @Remo Gloor, who supplied me the outline code). The associated tests and learning tests are in this gist for now, pending packaging in a proper released format (TBD).
The exec summary is you Load the Module below into your Kernel and use .InRequestScope() on everything you want created / Disposed per handler invocation and then feed requests through via IHandlerComposer.ComposeCallDispose.
If you use the following Module:
public class Module : NinjectModule
{
public override void Load()
{
Bind<IHandlerComposer>().To<NinjectRequestScopedHandlerComposer>();
// Wire it up so InRequestScope will work for Handler scopes
Bind<INinjectRequestHandlerScopeFactory>().To<NinjectRequestHandlerScopeFactory>();
NinjectRequestHandlerScopeFactory.NinjectHttpApplicationPlugin.RegisterIn( Kernel );
}
}
This wires in a factory and a NinjectHttpApplicationPlugin. The factory exposes:
public interface INinjectRequestHandlerScopeFactory
{
NamedScope CreateRequestHandlerScope();
}
Then you can use this Composer to Run a Request InRequestScope():
public interface IHandlerComposer
{
void ComposeCallDispose( Type type, Action<object> callback );
}
Implemented as:
class NinjectRequestScopedHandlerComposer : IHandlerComposer
{
readonly INinjectRequestHandlerScopeFactory _requestHandlerScopeFactory;
public NinjectRequestScopedHandlerComposer( INinjectRequestHandlerScopeFactory requestHandlerScopeFactory )
{
_requestHandlerScopeFactory = requestHandlerScopeFactory;
}
void IHandlerComposer.ComposeCallDispose( Type handlerType, Action<object> callback )
{
using ( var resolutionRoot = _requestHandlerScopeFactory.CreateRequestHandlerScope() )
foreach ( object handler in resolutionRoot.GetAll( handlerType ) )
callback( handler );
}
}
The Ninject Infrastructure stuff:
class NinjectRequestHandlerScopeFactory : INinjectRequestHandlerScopeFactory
{
internal const string ScopeName = "Handler";
readonly IKernel _kernel;
public NinjectRequestHandlerScopeFactory( IKernel kernel )
{
_kernel = kernel;
}
NamedScope INinjectRequestHandlerScopeFactory.CreateRequestHandlerScope()
{
return _kernel.CreateNamedScope( ScopeName );
}
/// <summary>
/// When plugged in as a Ninject Kernel Component via <c>RegisterIn(IKernel)</c>, makes the Named Scope generated during IHandlerComposer.ComposeCallDispose available for use via the Ninject.Web.Common's <c>.InRequestScope()</c> Binding extension.
/// </summary>
public class NinjectHttpApplicationPlugin : NinjectComponent, INinjectHttpApplicationPlugin
{
readonly IKernel kernel;
public static void RegisterIn( IKernel kernel )
{
kernel.Components.Add<INinjectHttpApplicationPlugin, NinjectHttpApplicationPlugin>();
}
public NinjectHttpApplicationPlugin( IKernel kernel )
{
this.kernel = kernel;
}
object INinjectHttpApplicationPlugin.GetRequestScope( IContext context )
{
// TODO PR for TryGetScope
try
{
return NamedScopeExtensionMethods.GetScope( context, ScopeName );
}
catch ( UnknownScopeException )
{
return null;
}
}
void INinjectHttpApplicationPlugin.Start()
{
}
void INinjectHttpApplicationPlugin.Stop()
{
}
}
}

How to prevent dead code being optimized by JVM?

public class A
{
public String getText()
{
Marker.start();
...
...
Marker.end();
}
}
public class Marker
{
public static void start()
{
long now = System.currentTimeMillis();
}
public static void end()
{
long now = System.currentTimeMillis();
}
}
I want to use JPDA (Java Platform Debugger Architecture) to detect the occurrence of Marker.start() and Marker.end() from external application. However I think the code may be optimized / eliminated away by JVM. How to prevent dead code being optimized by JVM?

You could, for example, create a dummy static int field in the class Marker and increment/decrement its value in the start() and end() methods (the field has to be static, since both methods are static). I don't think any optimizer could remove a field from a class, even if its value is not used anywhere. After all, someone could always inject new agent code into the JVM and ask for the value. This means calls to start() and end() shouldn't get optimized out, either.
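A minimal sketch of that suggestion follows. It keeps the counter in a static field because start() and end() are static, and uses an AtomicInteger (an assumption, not part of the answer) so concurrent callers don't lose updates. It only illustrates the pattern; the code itself cannot demonstrate what the JIT does or does not eliminate.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: start()/end() update shared state, so each call has an observable side effect.
class Marker {
    private static final AtomicInteger DEPTH = new AtomicInteger();

    public static void start() {
        DEPTH.incrementAndGet();   // side effect other threads (or a debugger) can read
    }

    public static void end() {
        DEPTH.decrementAndGet();
    }

    // Read access, e.g. for tests or for a tool inspecting the value.
    public static int depth() {
        return DEPTH.get();
    }

    public static void main(String[] args) {
        Marker.start();
        System.out.println(Marker.depth());   // prints 1
        Marker.end();
        System.out.println(Marker.depth());   // prints 0
    }
}
```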

Changing Class Variables in runtime?

Let me give an idea of what I wish to do: I have a structure or class called student, which contains variables like
int roll_no
and
int reg_no
If the user wishes to add a new variable like char name at run time how can it be done?

Based on the word "Structure" and the variable declarations, I'm going to guess this question is about some flavor of C. How exactly to do this will depend on the language, but as a general rule, if the language is compiled (e.g. C/C++, Java), this is not possible. If the language is interpreted (e.g. Python), this might sort of be possible, like this:
class MyObj:
message = "Hi there"
a = MyObj() # Creating a new instance
a.name = "Bill" # Adding a new attribute
Here we've added the name attribute to the a object only, and not the entire class. I'm not sure how you'd go about that for the whole class.
But really, the answer to your question is "Don't". You should think about your program and the objects you're using enough to know what fields you will and won't need. If you'll want to have a name field at some point in your program, put it in the class declaration. If you don't want it to have a value on object creation, use a sensible default like null.
Edit
Based on your comments, there are a couple of ways to approach this. I'm still not entirely clear on what you want, but I think one of these cases should cover it. Of the languages I know, Python is the most flexible at runtime:
Python
In Python, a class is just another kind of object. Class variables (check out this question too) belong to the class itself, and are inherited by any instances you create:
class MyObj:
a = 2 # A class variable
b = "a string" # Another one
ObjInstance = MyObj() # Creating an instance of this class
print(ObjInstance.a) # Output: "2"
ObjInstance.a = 3 # You can access and change the value of class variables *for this instance*
print(MyObj.a, ObjInstance.a) # Outputs "2 3". We've changed the value of a for the instance
MyObj.c = (3,4) # You can add a new class variable at runtime
# Any instance objects inherit the new variable, whether they already exist or not.
print(MyObj.c, ObjInstance.c) # Outputs "(3, 4) (3, 4)"
You can use this to add attributes to every instance of your class, but they will all have the same value until you change them. If you want to add an attribute to just one instance, you can do this:
ObjInstance.d = "I belong to ObjInstance!"
print(ObjInstance.d) # Output: "I belong to ObjInstance!"
print(MyObj.d) # Raises "AttributeError: type object 'MyObj' has no attribute 'd'"
One drawback to using Python is that it can be kinda slow. If you want to use a compiled language it will be slightly more complicated, and it will be harder to get the same functionality that I mentioned above. However, I think it's doable. Here's how I would do it in Java. The implementation in C/C++ will be somewhat different.
Java
Java's class attributes (and methods) are called (and declared) static:
class MyObj {
public static int a = 2;
public static String b = "a string";
}
static variables are normally accessed through the class name, as in Python. You can get at them through an instance, but I believe that generates a warning:
System.out.println(MyObj.a); //Outputs "2"
MyObj ObjInst = new MyObj();
System.out.println(ObjInst.a); //Outputs "2" with a warning. Probably.
You can't add attributes to a Java object at runtime:
ObjInst.c = "This will break"; // Compile-time error: MyObj has no field c
However, you can have a HashMap attribute, static or not, which you can add entries to at runtime that act like attributes. (This is exactly what Python does, behind the scenes.) For example:
class MyObj {
private HashMap<String, Object> att = new HashMap<String, Object>();
public void setAttribute(String name, Object value) {
att.put(name, value);
}
public Object getAttribute(String name) {
return att.get(name);
}
}
And then you can do things like:
ObjInst.setAttribute("name", "Joe");
System.out.println(ObjInst.getAttribute("name"));
Notice that I did not declare att static above, so in this case each instance of the MyObj class has this attribute, but the class itself does not. If I had declared it static, the class itself would have one copy of this hash. If you want to get really fancy, you can combine the two cases:
class MyObj {
private static HashMap<String, Object> classAtt = new HashMap<String, Object>();
private HashMap<String, Object> instAtt = new HashMap<String, Object>();
public static void setClassAttribute(String name, Object value) {
classAtt.put(name, value);
}
public void setInstAttribute(String name, Object value) {
instAtt.put(name, value);
}
public Object getAttribute(String name) {
// Check if this instance has the attribute first
if (this.instAtt.containsKey(name)) {
return instAtt.get(name);
}
// Get the class value if not
else {
return classAtt.get(name);
}
}
}
There are a few details I've left out, like handling the case of the HashMaps not having the value you're asking for, but you can figure out what to do there. As one last note, you can do in Python exactly what I did here in Java with a dict, and that might be a good idea if the attribute names will be strings. If you do want real attributes whose names are strings, Python's built-in setattr() and getattr() functions take the attribute name as a string; look at the documentation on reflection for more info.
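One way to fill in the missing-value case left open above is to return an Optional, so callers decide what "not set" means. This is a sketch under that assumption; DynamicAttributes and the example attribute names are hypothetical, and it keeps the same lookup order as the answer: instance value first, then the class-wide value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch: per-instance attributes that fall back to class-wide values.
class DynamicAttributes {
    private static final Map<String, Object> classAtt = new HashMap<>();
    private final Map<String, Object> instAtt = new HashMap<>();

    static void setClassAttribute(String name, Object value) { classAtt.put(name, value); }

    void setInstAttribute(String name, Object value) { instAtt.put(name, value); }

    // Instance value wins; otherwise the class value; otherwise Optional.empty().
    Optional<Object> getAttribute(String name) {
        if (instAtt.containsKey(name)) {
            return Optional.ofNullable(instAtt.get(name));
        }
        return Optional.ofNullable(classAtt.get(name));
    }

    public static void main(String[] args) {
        DynamicAttributes.setClassAttribute("school", "Central");
        DynamicAttributes s = new DynamicAttributes();
        s.setInstAttribute("name", "Joe");
        System.out.println(s.getAttribute("name").orElse("?"));    // prints Joe
        System.out.println(s.getAttribute("school").orElse("?"));  // prints Central
        System.out.println(s.getAttribute("age").orElse("?"));     // prints ?
    }
}
```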
Good luck!
