Best tool to identify encoding of streams (e.g. protobuf, avro, thrift, capnproto, etc.)? - serialization

I'm interested in playing with AI/ML tools and was wondering whether this is a good problem to solve with that kind of tool. It would also be pretty useful in my day job if it actually worked.
The idea: if you have a bunch of binary messages of unknown origin, could some form of pattern recognition determine that a message was encoded with a particular tool like protobuf/capnproto/avro/etc.?
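For example, I imagine hand-rolled validity checks, one per wire format, could serve as a baseline (or as features for a classifier) before any ML is involved. A purely illustrative Python sketch for protobuf's wire format; the function and its heuristics are my own, not from any library:

def looks_like_protobuf(data: bytes) -> bool:
    # Heuristic: walk the buffer as if it were protobuf wire format,
    # where each field starts with a varint tag (field_number << 3 | wire_type).
    # False positives are possible; many short byte strings happen to parse.
    def read_varint(pos):
        shift = value = 0
        while pos < len(data) and shift <= 63:
            byte = data[pos]
            value |= (byte & 0x7F) << shift
            pos += 1
            if not byte & 0x80:
                return value, pos
            shift += 7
        return None, pos  # ran off the end, or varint implausibly long

    i = 0
    while i < len(data):
        tag, i = read_varint(i)
        if tag is None or tag >> 3 == 0:  # field numbers start at 1
            return False
        wire_type = tag & 7
        if wire_type == 0:  # varint value
            value, i = read_varint(i)
            if value is None:
                return False
        elif wire_type == 1:  # fixed 64-bit
            i += 8
        elif wire_type == 2:  # length-delimited (strings, bytes, submessages)
            length, i = read_varint(i)
            if length is None or i + length > len(data):
                return False
            i += length
        elif wire_type == 5:  # fixed 32-bit
            i += 4
        else:  # 3 and 4 are deprecated group markers; treat as "not protobuf"
            return False
    return i == len(data)

Formats with magic bytes are even easier; Avro object container files, for instance, start with the bytes "Obj" followed by 0x01.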

Related

Is it common to have RESTful endpoint returning Protobuf strings?

Instead of having a gRPC server (say, due to platform restrictions), you have a REST endpoint that returns data.SerializeToString() as the payload. Of course, any clients of this endpoint would have the appropriate proto files for each response, so they can ParseFromString(data) and be on their way. Reasons for doing this include the benefits of Protobufs.
Improved understanding of the question: is it common to use PBs for purposes other than gRPC transport?
Yes, it is totally common and reasonable. PBs are really nothing more than a data serialization format. gRPC just uses them as its message interchange format (a natural choice, as both are Google creations). Let the answer be the description from Google itself:
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
Google's basic tutorial saves it to disk. You can do with it anything you would do with any other binary blob (JPEG, MP3, ...).
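A minimal sketch of that round trip, assuming addressbook_pb2 was generated from the addressbook.proto in Google's Python tutorial (protoc --python_out=. addressbook.proto):

import addressbook_pb2

person = addressbook_pb2.Person()
person.name = "Alice"
person.id = 1234

# SerializeToString() yields a plain binary blob: write it to disk,
# return it as an HTTP body, put it on a queue, whatever you like.
blob = person.SerializeToString()
with open("person.bin", "wb") as f:
    f.write(blob)

# Any client with the same .proto parses it back the same way.
restored = addressbook_pb2.Person()
with open("person.bin", "rb") as f:
    restored.ParseFromString(f.read())
assert restored.name == "Alice"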
BUT! If serialization speed is really critical for you, don't assume anything. Today's JSON libs may perform unexpectedly well; it depends on your specific platforms and dominant message characteristics, so do your own performance tests. If JSON really does prove slower, there are also libs with faster serialization than PB. To name a couple: Google's less popular PB sibling FlatBuffers, and Simple Binary Encoding, which was developed for high-frequency trading... which speaks for itself.
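As a starting point for such a test, a rough micro-benchmark sketch, reusing the tutorial's Person message from above; numbers depend heavily on message shape, so substitute your real payloads:

import json
import timeit

import addressbook_pb2

person = addressbook_pb2.Person(name="Alice", id=1234, email="alice@example.com")
as_dict = {"name": "Alice", "id": 1234, "email": "alice@example.com"}

# Time 100k serializations of the same small record in each format.
pb_s = timeit.timeit(person.SerializeToString, number=100_000)
js_s = timeit.timeit(lambda: json.dumps(as_dict), number=100_000)
print(f"protobuf: {pb_s:.3f}s   json: {js_s:.3f}s")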

Text-to-speech: prosody transfer

Probably "prosody transfer" isn't the right term, but I don't know what would be.
I'm looking for some solution, most likely a software library, that would allow me to:
Record an utterance.
Also provide a text transcription.
Play it back using a different voice, but with intonation, stress, rhythm, etc. preserved from the original recording.
Alternatively, I'm also interested in ways of annotating the text with prosody information before feeding it into a text-to-speech system.
Could you please suggest some software or libraries (preferably free and open-source, or at least cheap), or even just pointers on where to read up on such topics? Thanks.
(Note that I'm a software engineer, but new to TTS.)
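For the annotation route specifically: SSML (the W3C Speech Synthesis Markup Language) is a standard way to mark up text with prosody hints, and many TTS engines accept it. A small illustrative Python snippet (the attribute values here are made up):

# SSML prosody markup, built as a plain string and passed to a
# TTS engine's synthesis call in place of raw text.
ssml = """
<speak>
  <prosody rate="slow" pitch="+2st">This clause is slower and higher,</prosody>
  while this one uses the voice's defaults,
  <break time="300ms"/>
  with a pause and <emphasis level="strong">stress</emphasis> at the end.
</speak>
"""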

Difference between properties file, YAML & JSON?

I'm a beginner in software testing, working with Selenium and the Page Object design pattern. I want to keep the test data separate, but I'm confused about how to do it.
I want to know the difference between using a properties file, YAML, and JSON. Which is most useful in software testing?
Which should I choose for keeping test data separate: YAML, a properties file, or JSON? Which do most people use nowadays? As a tester, is it better to know all three well, or to follow one particular pattern because it's easier? What's your suggestion?
XML (Extensible Markup Language) offers flexible and powerful markup capabilities. It is often used in configuration and preference files, like those used by the Eclipse IDE. Most web browsers have XML viewers, although XML is designed for structured data, so viewing it raw is a bit like looking at the internals of a database.
JavaScript Object Notation (JSON) is used with JavaScript, of course. It will be familiar to web developers who use it for client/server communication.
YAML stands for YAML Ain't Markup Language. It uses lines and whitespace as delimiters instead of the explicitly marked blocks of XML and JSON; the same approach is used by some programming languages, notably Python.
So it comes down to YAML or JSON:
Technically YAML is a superset of JSON. That is, in theory at least, a YAML parser can understand JSON, but not necessarily the other way around.
In general, there are certain things I like about YAML that are not available in JSON.
1) YAML is visually easier to look at. In fact, the YAML homepage is itself valid YAML, yet it is easy for a human to read.
2) YAML has the ability to reference other items within a YAML file using "anchors," so it can handle relational information as one might find in a MySQL database (illustrated in the sketch after this answer).
3) YAML is more robust about embedding other serialization formats, such as JSON or XML, within a YAML file.
4) YAML, depending on how you use it, can be more readable than JSON.
5) JSON is often faster and is probably still interoperable with more systems.
6) Duplicate keys, which are potentially valid JSON, are definitely invalid YAML.
7) YAML has a ton of features, including comments and relational anchors; its syntax is accordingly quite complex and can be hard to understand.
8) YAML can be used directly for complex tasks like grammar definitions, and is often a better choice than inventing a new language.
If you don't need any features that YAML has and JSON doesn't, I would prefer JSON, because it is very simple and widely supported (it has a lot of libraries in many languages). YAML is more complex and has less support. I don't think parsing speed or memory use will differ much, and it is probably not a big part of your program's performance. But JSON is the winner for performance (if relevant) and interoperability, while YAML is better for human-maintained files. So basically, choose based on your requirements, not on what most people are using.
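To make the superset claim and the anchor feature (points 2 and 7 above) concrete, a small Python sketch assuming PyYAML (pip install pyyaml); the data is made up:

import yaml  # PyYAML

doc = """
defaults: &base              # anchor: name this mapping "base"
  timeout: 30
  retries: 3
staging:
  <<: *base                  # merge the anchored mapping in
  host: staging.example.com
"""
data = yaml.safe_load(doc)
print(data["staging"]["retries"])  # 3, inherited via the anchor

# Because YAML is (essentially) a superset of JSON,
# the same parser also reads plain JSON:
print(yaml.safe_load('{"user": 1234, "active": true}'))  # {'user': 1234, 'active': True}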

Should I move from REST-HTTP to Rabbitmq-RPC for synchronous call?

I've read a lot, and many people suggest not using AMQP-RPC for synchronous calls. My response data is about 4 MB, and REST over HTTP is taking too much time to send the data from server to client, so we decided to move to RPC.
Can someone please suggest whether I should move from REST over HTTP to AMQP-RPC, or to some other RPC method such as Apache Avro, Thrift, or Google Protocol Buffers, for sending bigger data?
You could do worse than take a look at Cap'n Proto. It's an interesting take on serialisation, in that it endeavours to remove the need for it at all whilst still making things sane in application code. It's written by one of the guys who did Google Protocol Buffers v2. They're doing a sneaky thing with RPC too, allowing some time saving if the result of one RPC call is merely the input to a subsequent RPC call.
GPB isn't too bad either, nor is ASN.1, etc. Anything (apart from Cap'n Proto) that has a binary wire format is probably going to be about the same - they all have to marshal bits and bytes to and from a local representation. Avro, of course, includes its own schema with each message - a pity if the schema is bigger than the message being sent.
Anything binary is probably way better than anything text (JSON, XML, etc).

Posting an image and text-based data to a WCF service

I have a requirement to write a web service that allows me to post an image to a server along with some additional information about that image.
I'm completely new to developing web services (normally client side dev) so I'm a little stumped as to what I need to look into and try.
How do you post binary data and plain text to a service?
What RequestFormat should I use?
It looks like my options are XML or JSON. Can I use either of these?
Bit of a waffly question but I just need some direction rather than a solution as I can't seem to find much online.
After reading this guide to building RESTful services, I figured I was going about the problem the wrong way. The image and text are actually two separate resources and so should probably be handled separately. I now have a service that uploads an image and returns a URI to that image, and a separate service to post textual data relating to that image along with the URI to that image.
Though I don't have experience with WCF, I can tell you that a painless way to handle POSTing/PUTting binary data in a REST API (especially one with a mix of text and binary) is to encode the binary data as base64 and treat it much like any other text data in your API.
Yes, there is a slight overhead with base64 in terms of size and an additional encode/decode step; however, base64 output is only about 1.33x (4/3) the size of the raw binary, or roughly 1.37x with line breaks.
I find in many cases the overhead is well worth avoiding the pain that can be involved with binary data in APIs, especially when you need to POST/PUT a combination of binary and text data. If you wanted to POST an image and additional meta/text data, you could easily do so with a JSON string ("image" would be your base64-encoded image):
{
"image":"aGVsbG8...gd29ybGQ=",
"user" : 1234,
"sub_title": "A picture from my trip to Pittsburgh"
}
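A client-side sketch in Python, for instance (the endpoint URL and field names are hypothetical):

import base64
import requests

# Read the raw image and encode it as base64 text.
with open("trip.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

payload = {
    "image": encoded,  # the binary, travelling as ordinary JSON text
    "user": 1234,
    "sub_title": "A picture from my trip to Pittsburgh",
}
resp = requests.post("https://api.example.com/photos", json=payload)
resp.raise_for_status()

The server side then just base64-decodes the "image" field before writing the bytes out.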
This is by no means the best solution for all cases, and as I said, I'm not a WCF expert, but it's certainly something to consider in the general case to make your life easier.
If you are using WebServiceHost in WCF 3.5 then you need to read this. If you have to use WCF to do HTTP based stuff then try and get onto .Net 4. I believe they have made a whole lot of things much easier.
If you are stuck with 3.5, welcome to a world of pain. Find everything you can written by Aaron Skonnard on the subject. And as I suggested in the comments of the other question, learn how to use SvcTrace.