HdfsFileStatus and FileStatus difference - Apache Hadoop

What is the main difference between the two classes?
Mainly, in what situation would I use one and not the other?
org.apache.hadoop.hdfs.protocol package
http://www.sching.com/javadoc/hadoop/org/apache/hadoop/hdfs/protocol/HdfsFileStatus.html
org.apache.hadoop.fs package
https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileStatus.html

HdfsFileStatus is marked with the @InterfaceAudience.Private and @InterfaceStability.Evolving annotations (check the source code). The first annotation means it is intended only for internal Hadoop implementation use. The second means the class may still change, so backwards-compatible support is not guaranteed between releases. Basically, you should not use HdfsFileStatus in your code; use the public FileStatus from org.apache.hadoop.fs instead.

Related

How can I have a "private" Erlang module?

I prefer working with files that are less than 1000 lines long, so I am thinking of breaking up some Erlang modules into more bite-sized pieces.
Is there a way of doing this without expanding the public API of my library?
What I mean is, any time there is a module, any user can do module:func_exported_from_the_module. The only way to really have something be private that I know of is to not export it from any module (and even then holes can be poked).
So if there is technically no way to accomplish what I'm looking for, is there a convention?
For example, there are no private methods in Python classes, but the convention is to use a leading _ in _my_private_method to mark it as private.
I accept that the answer may be, "no, you must have 4K LOC files."
The closest thing to a convention is to use EDoc tags, like @private and @hidden.
From the docs:
@hidden
Marks the function so that it will not appear in the documentation (even if "private" documentation is generated). Useful for debug/test functions, etc. The content can be used as a comment; it is ignored by EDoc.
@private
Marks the function as private (i.e., not part of the public interface), so that it will not appear in the normal documentation. (If "private" documentation is generated, the function will be included.) Only useful for exported functions, e.g. entry points for spawn. (Non-exported functions are always "private".) The content can be used as a comment; it is ignored by EDoc.
Please note that this answer started as a comment to @legoscia's answer
Different visibility levels for different functions are not currently supported.
The current convention, if you want to call it that way, is to have one (or several) 'facade' my_lib.erl module(s) that export the public API of your library/application. Calling any internal module of the library is playing with fire and should be avoided (call them at your own risk).
There are some very nice features in the BEAM VM that rely on being able to call exported functions from any module, such as
Callbacks (funs/anonymous funs), MFA, erlang:apply/3: The calling code does not need to know anything about the library, just that it's something that needs to be called
Behaviours such as gen_server need the previous point to work
Hot reloading: You can upgrade the bytecode of any module without stopping the VM. The code server inside the VM maintains at most two versions of the bytecode for any module, redirecting external calls (those with the Module:) to the most recent version and the internal calls to the current version. That's why you may see some ?MODULE: calls in long-running servers, to be able to upgrade the code
You could argue that these points would still work with more fine-grained, BEAM-oriented visibility levels, true. But I don't think that would solve anything that isn't already solved with facade modules, and it would complicate other parts of the VM/code a great deal.
Bonus
Something similar applies to records and opaque types: records only exist at compile time, and opaque types only at Dialyzer time. Nothing stops you from accessing their internals anywhere, but you'll only find problems if you go that way:
You insert a new field in the record and, suddenly, all your {record_name,...} = matches break
You use a library that returns an opaque_adt(), you know that it's a list and use it as such. The library is upgraded so that opaque_adt() is now a tuple() that also includes the size of the list, and chaos ensues
Only those functions that are specified in the -export attribute are visible to other modules, i.e. "public" functions. All other functions are private. Only if you have specified -compile(export_all) are all functions in the module visible from outside. Using -compile(export_all) is not recommended.
I don't know of any existing convention for Erlang, but why not adopt the Python convention? Let's say that "library-private" functions are prefixed with an underscore. You'll need to quote function names with single quotes for that to work:
-module(bar).
-export(['_my_private_function'/0]).
'_my_private_function'() ->
foo.
Then you can call it as:
> bar:'_my_private_function'().
foo
To me, that communicates clearly that I shouldn't be calling that function unless I know what I'm doing. (and probably not even then)

ClassImp preprocessor macro in ROOT - Is it really needed?

Do I really have to use the ClassImp macro to benefit from the automatic dictionary and streamer generation in ROOT? Some online tutorials and examples mention it, but I noticed that simply adding the ClassDef(MyClass, <ver>) macro to MyClass.h and processing it with rootcint/rootcling already generates most of that code.
I did look at Rtypes.h, where these macros are defined, but following preprocessor macros calling each other is not easy, so it would be nice if experts could confirm the role of ClassImp. I am specifically interested in recent versions of ROOT (>= 5.34).
Here is the answer I got on roottalk mailing list confirming that the usage of ClassImp is essentially outdated.
ClassImp is used to register in the TClass the name of the source file for the class. This was used in particular by THtml (which has now been deprecated in favor of Doxygen). So unless your code/framework needs to know the name of the source files, it is no longer necessary to have ClassImp.
ClassDef is necessary for classes inheriting from TObject (or from any class that has a ClassDef). In other cases, it provides an accelerator that makes the I/O slightly faster (and thus is technically not compulsory). It also assigns a version number to the schema layout, which simplifies writing schema evolution rules (on the other hand, there are other alternatives for assigning a version number to the schema layout).
What exactly are you trying to do? The ClassImp and ClassDef macros add members to the class that provide Run-Time Type Information and allow the class to be written to root files. If you are not interested in that, then don't bother with these macros.
I never use them.

Keep around a piece of context built during compile time for later use at runtime?

I'm aware this might be a broad question (there's no specific code for you to look at), but I'm hoping I'd get some insights as to what to do, or how to approach the problem.
To keep things simple, suppose the compiler that I'm writing performs these three steps:
parse (and bind all variables)
typecheck
codegen
Also, the language that I'm building the compiler for wants to support late analysis/late binding (i.e., it has a function that takes a String, which is to be compiled and executed as a piece of source code at runtime).
Now during parse-phase, I have a piece of context that I need to keep around till run-time for the sole benefit of the aforementioned function (because it needs to parse and typecheck its argument in that context).
So the question, how should I do this? What do other compilers do?
Should I just serialise the context object to disk (codegen for it) and resurrect it during run-time or something?
Thanks
Yes, you'll need to emit the type information (or other context, you weren't very specific) in your object/executable files, so that your eval can read it at runtime. You might look at Java's .class file format for inspiration; Java doesn't have eval as such, but you can dynamically spin new bytecode at runtime that must be linked in a type-safe manner. David Conrad's comment is spot-on: this information can also be used to implement reflection, if your language has such a feature.
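As a minimal sketch (the class, property names, and the JSON-on-disk format are all assumptions, not anything from the question), the idea is simply to serialise the slice of context that eval needs next to the generated code and reload it at runtime:
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
// Hypothetical slice of the parse/typecheck context that the runtime
// eval(String) function needs in order to re-bind and typecheck its argument.
public sealed class CompileTimeContext
{
    public Dictionary<string, string> VariableTypes { get; set; } = new Dictionary<string, string>();
    public List<string> ImportedModules { get; set; } = new List<string>();
    // Compiler side (codegen phase): persist the context next to the output.
    public void Save(string path)
    {
        File.WriteAllText(path, JsonSerializer.Serialize(this));
    }
    // Runtime side: eval reloads it before parsing/typechecking its argument.
    public static CompileTimeContext Load(string path)
    {
        return JsonSerializer.Deserialize<CompileTimeContext>(File.ReadAllText(path));
    }
}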
That's as much as I can help you without more specifics.

Class versioning

I'm looking for a clean way to make incremental updates to my code library, without breaking backwards compatibility. This could mean adding new members to classes, or changing existing members to provide additional functionality. Sometimes I am required to change a member in such a way that it would break existing code (e.g. renaming a method or changing its return type), so I'd rather not touch any of my existing types once they are shipped.
The way I currently set this up is through inheritance and polymorphism by creating a new class that extends the previous "version" of that class.
The way this works is by creating the appropriate version of StatusResult (e.g. StatusResultVersion3), based on the actual value of the ProtocolVersion property, and returning it as an instance of CommandResult.
Because .NET does not seem to have a concept of class versioning, I had to come up with my own: appending the version number to the end of the class name. This will no doubt make you cringe. I could easily imagine you scratching your eyes out after zooming in on the diagram. But it works. I can add new members and override existing members, without introducing any code-breaking changes.
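A rough sketch of the pattern described above (the class names come from the question; the members and the factory are assumptions, since the original diagram is not reproduced here):
public abstract class CommandResult
{
    public int ProtocolVersion { get; set; }
}
public class StatusResult : CommandResult
{
    public virtual string Describe() { return "v1"; }
}
public class StatusResultVersion2 : StatusResult
{
    public override string Describe() { return "v2, with extra members"; }
}
public class StatusResultVersion3 : StatusResultVersion2
{
    public override string Describe() { return "v3, with even more members"; }
}
public static class StatusResultFactory
{
    // Pick the newest class the negotiated protocol version allows,
    // but hand it back through the stable CommandResult base type.
    public static CommandResult Create(int protocolVersion)
    {
        if (protocolVersion >= 3) return new StatusResultVersion3 { ProtocolVersion = protocolVersion };
        if (protocolVersion == 2) return new StatusResultVersion2 { ProtocolVersion = protocolVersion };
        return new StatusResult { ProtocolVersion = protocolVersion };
    }
}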
Is there a better way to version my classes?
There are typically two approaches when considering existing code and assembly updates:
Regression Testing
This is a great approach for non-breaking changes, where you can simply overload functions to provide new parameters, etc. Visual Studio has some very advanced unit testing capabilities to make your regression testing relatively easy and automated.
Assembly Versions
If your changes are going to start breaking things, like rewriting the way some utility works, then it's time for a new assembly version. .NET is very good about working with assembly versions. You can deploy the versioned assemblies to different folders so that existing code can continue to reference the old version while new code can take advantage of the features in the new version.
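As a small illustration (the version numbers are only examples), the side-by-side story starts with giving each line of the library its own assembly version, e.g. in AssemblyInfo.cs of the new line:
// The old line keeps [assembly: AssemblyVersion("1.0.0.0")], so both
// assemblies can be deployed and referenced side by side.
using System.Reflection;
[assembly: AssemblyVersion("2.0.0.0")]
[assembly: AssemblyFileVersion("2.0.0.0")]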
The problem with interfaces is that once published they're largely set in stone. To quote Anders Hejlsberg:
... It's like adding a method to an interface. After you publish an interface, it is for all practical purposes immutable, because any implementation of it might have the methods that you want to add in the next version. So you've got to create a new interface instead.
So you can never just update an interface, you need to create a completely new one. Of course, you can have a single class implement both interfaces so your maintainability effort is fairly low compared with (say) polymorphic classes where your code will become spread out between multiple classes over time.
Multiple interfaces also allow you to remove methods in a way that classes do not (sure, you can deprecate them, but that can result in very noisy IntelliSense after a few iterations).
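A minimal sketch of the "new interface plus one class implementing both" idea (the interface and member names are illustrative, not taken from the question):
// The published v1 contract stays frozen; v2 is a new interface instead of an edit.
public interface IStatusResult
{
    string GetStatus();
}
public interface IStatusResult2
{
    string GetStatus();
    int GetStatusCode();
}
// One class implements both, so existing callers keep working while
// new callers get the extra member through IStatusResult2.
public class StatusResult : IStatusResult, IStatusResult2
{
    public string GetStatus() { return "OK"; }
    public int GetStatusCode() { return 200; }
}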
I personally lean towards having entirely stand-alone versions of the interface in each assembly version.
That is to say...
v 0.1.0.0
interface IExample
{
    String DoSomething();
}
v 0.2.0.0
interface IExample
{
    void DoSomethingElse();
}
How you implement them behind the scenes is up to you, but most likely it'll be the same classes with slightly different methods doing similar jobs (otherwise, why use the same interface?)
All the old code should be referencing 0.1.x.x and new code will reference 0.2.x.x. About the only issue is when you find (say) a security flaw and the fix needs to be back-ported to an earlier version. This is where a decent VCS comes in (Personal preference is TFS but SVN or anything else which supports branching/merging will do).
Merge the fixes from the 0.2 branch back into the 0.1 branch and then do a recompile to result in (say) 0.1.1.0.
As long as you stick to a process like this:
Major or Minor build will increment if there are any breaking changes (aka signatures will not change on Build/Revision increments)
Use publisher policies if the new Major/Minor version should be used by older programs (equivalent to guaranteeing nothing broke so use the new version anyway)
References in client apps should point at a Major/Minor version but not specify revision/build
This gives you:
A clean codebase without legacy clutter
Allows clients to use the latest version with no code changes if nothing has broken
Prevents clients using newer versions of an assembly which do have breaking changes until they recompile (and, one hopes, update their code as appropriate to take advantage of the new features.)
Allows you to release security patches for previous versions
The OP solved his problem as indicated by this comment:
In the end, I went with the interfaces idea because it allows me to keep multiple versions of a class member in a single class file. When I need to update the class, I'll just add the new interface, shadowing the member that has been changed, and change the return type on some of my methods. This works without breaking backwards compatibility because of polymorphism.
If this is mainly for serialization, it can be achieved in .NET using DataContractSerializer and data contract attributes. Different versions of the same class can be deserialized into the same object, with any properties that can't be mapped simply left at their default values.
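For example (the contract and member names are made up), a property added in a newer version is simply left at its default when an older payload is deserialized, and IExtensibleDataObject lets older code round-trip data written by newer versions:
using System.Runtime.Serialization;
[DataContract]
public class StatusResult : IExtensibleDataObject
{
    [DataMember]
    public string Status { get; set; }
    // Added in a later version; old XML without this element still
    // deserializes, leaving the property at its default (null).
    [DataMember(IsRequired = false)]
    public string ExtraInfo { get; set; }
    // Holds elements this version does not know about, so they survive a round trip.
    public ExtensionDataObject ExtensionData { get; set; }
}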

DynamicMethod in Cecil

Is there anything similar to Reflection.Emit.DynamicMethod in Cecil? Thanks.
Edit:
What about for the following things?
EmitCall, e.g.:
IL.EmitCall(OpCodes.Callvirt, GetBuildKey, null);
IL.Emit(OpCodes.Unbox_Any, dependencyType);
LocalBuilder, e.g.:
LocalBuilder resolving = ilContext.IL.DeclareLocal(typeof(bool));
System.Reflection.Emit.Label, e.g.:
Label existingObjectNotNull = buildContext.IL.DefineLabel(); // Do I have to use TextMap?
ILGenerator.BeginCatchBlock, e.g.:
ilContext.IL.BeginCatchBlock(typeof(Exception));
ILGenerator.MarkLabel, e.g.:
ilContext.IL.MarkLabel(parameterResolveFailed);
ILGenerator.EndExceptionBlock(), e.g.:
ilContext.IL.EndExceptionBlock();
There's no way to create a DynamicMethod with Cecil, nor does it have an equivalent.
A DynamicMethod is strongly tied to the runtime, while Cecil is completely decoupled from it. The two have completely separate type systems. DynamicMethods are meant to be, well, dynamic, and as such have to use the System.Reflection type system, as it's the one available at runtime. Mono.Cecil has another representation of this type system, suitable for static analysis, that doesn't require actually loading the assembly at runtime. So if you want to use a DynamicMethod, you have to use it along with its environment.
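That said, when you build a MethodDefinition statically, Cecil's ILProcessor model does have rough counterparts for the constructs listed in the edit. A sketch using the current Mono.Cecil API, where module and method are assumed to come from an assembly you are building and method returns void:
using System;
using Mono.Cecil;
using Mono.Cecil.Cil;
static class CecilIlSketch
{
    public static void FillBody(ModuleDefinition module, MethodDefinition method)
    {
        var body = method.Body;
        var il = body.GetILProcessor();
        body.InitLocals = true;
        // LocalBuilder resolving = IL.DeclareLocal(typeof(bool));
        var resolving = new VariableDefinition(module.TypeSystem.Boolean);
        body.Variables.Add(resolving);
        // Label end = IL.DefineLabel(); -- a Cecil "label" is just an Instruction
        // created up front and Append()ed later (the MarkLabel step).
        Instruction end = il.Create(OpCodes.Ret);
        // Protected block (there is no BeginExceptionBlock; the ranges are recorded below).
        Instruction tryStart = il.Create(OpCodes.Ldc_I4_1);
        il.Append(tryStart);
        il.Emit(OpCodes.Stloc, resolving);
        il.Emit(OpCodes.Leave, end);
        // BeginCatchBlock(typeof(Exception)): this handler just discards the exception.
        Instruction handlerStart = il.Create(OpCodes.Pop);
        il.Append(handlerStart);
        il.Emit(OpCodes.Leave, end);
        // MarkLabel(end)
        il.Append(end);
        // EndExceptionBlock(): describe the try/catch ranges explicitly.
        body.ExceptionHandlers.Add(new ExceptionHandler(ExceptionHandlerType.Catch)
        {
            CatchType = module.ImportReference(typeof(Exception)),
            TryStart = tryStart,
            TryEnd = handlerStart,
            HandlerStart = handlerStart,
            HandlerEnd = end,
        });
    }
}
EmitCall has no special counterpart either: ILProcessor.Emit(OpCodes.Callvirt, someImportedMethodReference) covers it.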
This question was originally asked, iirc, in the context of runtimes without DynamicMethod or SRE altogether, like the Compact Framework, where Cecil can be used to emit code at runtime.
Of course it's possible, but then you have to pay the price of loading the assembly, which is no small price on CF devices. That means that even if you could somehow emulate a DynamicMethod by creating an assembly containing just one static method with Cecil, it would be a terrible idea: those assemblies would not be collectable (DynamicMethods are), making it a giant memory leak.
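To make that trade-off concrete, here is a sketch of the emulation (all names are made up): Cecil writes a one-method assembly to disk, which then has to be loaded through reflection and, unlike a DynamicMethod, is never collected:
using System;
using System.Reflection;
using Mono.Cecil;
using Mono.Cecil.Cil;
using MethodAttributes = Mono.Cecil.MethodAttributes;
using TypeAttributes = Mono.Cecil.TypeAttributes;
static class EmulatedDynamicMethod
{
    public static void Run()
    {
        var asm = AssemblyDefinition.CreateAssembly(
            new AssemblyNameDefinition("Emitted", new Version(1, 0)), "Emitted", ModuleKind.Dll);
        var module = asm.MainModule;
        var type = new TypeDefinition("Emitted", "Ops",
            TypeAttributes.Public | TypeAttributes.Abstract | TypeAttributes.Sealed,
            module.TypeSystem.Object);
        module.Types.Add(type);
        // static int Add(int a, int b) { return a + b; }
        var add = new MethodDefinition("Add",
            MethodAttributes.Public | MethodAttributes.Static, module.TypeSystem.Int32);
        add.Parameters.Add(new ParameterDefinition(module.TypeSystem.Int32));
        add.Parameters.Add(new ParameterDefinition(module.TypeSystem.Int32));
        var il = add.Body.GetILProcessor();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);
        type.Methods.Add(add);
        asm.Write("Emitted.dll");
        // Loading the assembly is the expensive, non-collectable part.
        var loaded = Assembly.LoadFrom("Emitted.dll");
        var result = loaded.GetType("Emitted.Ops").GetMethod("Add").Invoke(null, new object[] { 2, 3 });
        Console.WriteLine(result); // 5
    }
}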
If you need to emit code at runtime on the Compact Framework, emit as little as possible, and in as few assemblies as possible.