HDF parsing using Apache Tika


How can I parse HDF files (.h5) using Apache Tika?
Apache Tika provides a parser for .h5 files, but I'm not able to extract the data with it.
Parser parser = new HDFParser();
Metadata metadata = new Metadata();
ContentHandler handler = new BodyContentHandler();
FileInputStream fileInputStream = new FileInputStream(path + h5File);
parser.parse(fileInputStream, handler, metadata, new ParseContext());
I can see the file's metadata, but I can't get any content through the handler.
If anyone has done this, please help.

You simply can't, because of the nature of the HDF file format.
You have to use metadata.get(field-name-in-string-format); to retrieve the information you want.
Alternatively, you can use the NetCDF Java library directly (Tika uses it under the hood).

Related

Karate DSL: Support for Apache Thrift [duplicate]

We have a requirement to send an .avro file as the input request to our APIs, and we're really stuck at this point. A detailed example would be much appreciated.

Just use Java interop: https://github.com/intuit/karate#calling-java
You need to write a helper (start with a static method) to convert JSON to Avro and vice versa. I know teams using this for gRPC. Read this thread for tips: https://github.com/intuit/karate/issues/412
Also there is even a "karate-grpc" project: https://github.com/pecker-io/karate-grpc
Also see:
https://twitter.com/KarateDSL/status/1128170638223364097
https://twitter.com/KarateDSL/status/1417023536082812935

Webmethods - Retrieve PDF file from SFTP and encode

I am new to webMethods and need some guidance. I have been tasked with reading a PDF document from an SFTP server and base64-encoding it. I have managed to GET the file from the SFTP location and am now struggling to encode it. Below is the flow that reads the file:
INVOKE pub.client.sftp:login
INVOKE pub.client.sftp:cd
INVOKE pub.client.sftp:ls
INVOKE pub.client.sftp:get
MAP
I later modified the flow to include base64 encoding, but no output is produced. The flow is shown below:
INVOKE pub.client.sftp:login
INVOKE pub.client.sftp:cd
INVOKE pub.client.sftp:ls
INVOKE pub.client.sftp:get
INVOKE pub.string:base64Encode
MAP
When I debug the flow, it executes the SFTP GET and stops. There is no information about the encoding step.
Please guide me on how to achieve this. Thank you in advance for your help.

Try something like this: after sftp:get, invoke pub.io:streamToBytes and then invoke pub.string:base64Encode.
Don't forget to map the fields in the pipeline.

pub.string:base64Encode expects a byte[] input; you can convert the stream to a byte[] with pub.io:streamToBytes.
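For anyone who finds the two steps easier to follow in code, here is a minimal plain-Java sketch of the same idea: drain the stream into a byte[] first, then Base64-encode the bytes. This is not webMethods code. The class and method names (StreamBase64, streamToBytes, base64Encode) are hypothetical and only mirror the roles of the built-in services mentioned above, and the in-memory stream stands in for whatever stream sftp:get hands back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; it is not part of webMethods.
class StreamBase64 {

    // Step 1: read the whole stream into a byte[]
    // (the role pub.io:streamToBytes plays in the flow).
    static byte[] streamToBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    // Step 2: Base64-encode the bytes
    // (the role pub.string:base64Encode plays in the flow).
    static String base64Encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file content; a real flow would use the stream from the SFTP GET.
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(base64Encode(streamToBytes(in)));
    }
}
```

The point of the sketch is the ordering: the encoder works on bytes, not on a stream, so the conversion step has to come first.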

How to transform a Java POJO to CSV using Smooks?

The title says it all. I couldn't find an example of a Smooks config mapping a POJO to a CSV.

OOTB, Smooks doesn't have a writer that is specific to CSV. Use Smooks's template cartridge (i.e., Freemarker or StringTemplate) or develop a visitor to read from the SAX stream and generate CSV.

Load *.vtl file from remote location

Is it possible in dotCMS to parse a vtl file from a remote source?
// this doesn't seem to work
#dotParse("https://some.random.source.com/vtl/index.vtl")

Not normally, but it would be very easy to write a plugin to do it. Without writing any Java code, it should be possible to do something like this:
#set($code = $import.read("https://some.random.source.com/vtl/index.vtl"))
$render.eval($context, $code)
The downside is that it will probably be much slower than just rendering a vtl in dotCMS.

.NET ZipPackage vs DotNetZip when getting streams to entries

I have been using the ZipPackage-class in .NET for some time and I really like the simple and intuitive API it has. When reading from an entry I do entry.GetStream() and I read from this stream. When writing/updating an entry I do entry.GetStream(FileAccess.ReadWrite) and write to this stream. Very simple and useful because I can hand over the reading/writing to some other code not knowing where the Stream comes from originally.
Since the ZipPackage API doesn't support entry properties such as LastModified, I have been looking into other zip APIs such as DotNetZip, but I'm a bit confused about how to use it. For instance, to read from an entry I first have to extract the entire entry into a MemoryStream, seek to the beginning and hand this stream over to my other code. And to write to an entry I have to supply a stream that the ZipEntry itself can read from. This seems very backwards to me. Am I using this API the wrong way?
Isn't it possible for the ZipEntry to deliver the file straight from the disk where it is stored and extract it as the reader reads it? Does it really need to be fully extracted into memory first? I'm no expert but it seems wrong to me.

Using the DotNetZip library does not require you to read the entire zip file into a memory stream. When you create an instance of ZipFile as shown below, the library only reads the zip file headers, which contain properties such as last modified. Here is an example of opening a zip file; the DotNetZip library reads the headers and constructs a list of all entries in the zip:
using (Ionic.Zip.ZipFile zipFile = Ionic.Zip.ZipFile.Read(this.FileAbsolutePath))
{
...
}
It's then up to you to extract entries to a stream, to the file system, etc. In the example below, I'm using the string indexer on zipFile to get the entry named SomeFile.txt. The matching ZipEntry object is then extracted to a memory stream.
MemoryStream memStr = new MemoryStream();
zipFile["SomeFile.txt"].Extract(memStr); // Response.OutputStream);
Zip entries must be read into the .NET process space in order to be decompressed; there's no way to bypass that by going straight to the filesystem. This is similar to how a Windows Explorer shell zip extractor works: the shell extensions for 7zip, or Windows' built-in Compressed Folders, have to read entries into memory and then write them to the file system before you can open an entry.

OK, I'm answering this myself because I found the answers. DotNetZip apparently has methods for both of the things I wanted. To open a read stream, use myZipEntry.OpenReader(); to open a write stream, use myZipFile.UpdateEntry(e, (fn, obj) => Serialize(obj)). This works fine.
