Do invalid samples count towards a DataReader's history depth QoS? - data-distribution-service

The KEEP_LAST_HISTORY QoS setting for a DataReader limits the number of recently received samples kept by the DataReader on a per-instance basis. As documented, for example, by RTI:
For a DataReader: Connext DDS attempts to keep the most recent depth DDS samples received for each instance (identified by a unique key) until the application takes them via the DataReader's take() operation.
Besides valid data samples, in DDS a DataReader will also receive invalid samples, e.g. to indicate a change in liveliness or to indicate the disposal of an instance. My questions concern how the history QoS settings affect these samples:
Are invalid samples treated the same as valid samples when it comes to a KEEP_LAST_HISTORY setting? For example, say I use the default setting of keeping only the latest sample (history depth of 1), and a DataWriter sends a valid data sample and then immediately disposes the instance. Do I risk missing either of the samples, or will the invalid sample(s) be handled specially in any way (e.g. in a separate buffer)?
In either case, can anyone point me to where the standard provides a definitive answer?
Assuming the history depth setting affects all (valid and invalid) samples, what would be a good history depth setting on a keyed (and Reliable) topic, to make sure I miss neither the last datum nor the disposal event? Is this then even possible in general without resorting to KEEP_ALL_HISTORY?
Just in case there are any (unexpected) implementation-specific differences, note that I am using RTI Connext 5.2.0 via the modern C++ API.

I could not verify it since I don't have a license for Connext anymore, and I haven't found any explicit specification in the user or API manual either. But to answer your first question: I think valid and invalid samples are treated equally when it comes to the history QoS. The reason I think so is the following code in the on_data_available callback for DataReaders:
retcode = fooDataReader_take(
    foo_data_reader,
    &data_seq,
    &info_seq,
    DDS_LENGTH_UNLIMITED,
    DDS_ANY_SAMPLE_STATE,
    DDS_ANY_VIEW_STATE,
    DDS_ANY_INSTANCE_STATE);
You can explicitly specify which sample states you wish to receive (in this case, any sample state). Additionally, the SampleInfo for each sample read with the DataReader indicates whether the sample is valid or not. Again, I'm not 100% sure since I couldn't verify it, but I think there is no special or automatic handling of invalid samples; you handle them like the valid ones, through the sample, view, and instance states.
Regarding a "good value" for the history QoS: this depends on your application and on how frequently data is exchanged and accessed. You'll have to figure it out by experimenting.
Hope this helps at least a little bit.
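To make the risk in the question concrete, here is a plain-Python sketch (NOT the Connext API; all names are invented) of a per-instance KEEP_LAST cache, assuming invalid samples do share the history with valid ones:

```python
from collections import defaultdict, deque

# Hypothetical sketch, not the Connext API: if invalid samples share the
# per-instance history with valid ones, a KEEP_LAST depth of 1 lets a
# dispose notification evict the last datum before the application takes it.
class KeepLastCache:
    def __init__(self, depth):
        self.depth = depth
        # One bounded history per instance; the newest sample evicts the oldest.
        self.history = defaultdict(lambda: deque(maxlen=depth))

    def on_sample(self, instance, payload, valid):
        self.history[instance].append((payload, valid))

    def take(self, instance):
        samples = list(self.history[instance])
        self.history[instance].clear()
        return samples

cache = KeepLastCache(depth=1)
cache.on_sample("key1", {"value": 42}, valid=True)   # data sample
cache.on_sample("key1", None, valid=False)           # dispose notification
print(cache.take("key1"))  # [(None, False)] - the datum was evicted

deep = KeepLastCache(depth=2)
deep.on_sample("key1", {"value": 42}, valid=True)
deep.on_sample("key1", None, valid=False)
print(deep.take("key1"))   # both samples survive with depth 2
```

Under that assumption, a depth of 2 would be the minimum to keep both the last datum and the dispose notification between takes, though a burst of further invalid samples could still evict data.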

Related

Dropping samples with tagged streams

I'm trying to accomplish the following behavior:
I have a continuous stream of symbols, part of which is pilot and part is data, periodically. I have the Correlation Estimator block that tags the locations of the pilots in the stream. Now, I would like to filter out the pilots such that following blocks will receive data only, and the pilots will be discarded by the tag received from the Correlation Estimator block.
Are there any existing blocks that allow me to achieve this? I've tried to search but am a bit lost.
Hm, technically, the packet-header Demux could do that, but it's a complex beast and the things you need to do to satisfy its input requirements might be a bit complicated.
So, instead, simply write your own (general) block! That's pretty easy: just save your current state (PASSING or DROPPING) in a member of the block class, and change it based on the tags you see (when in PASSING mode, you look for the correlator tag), or whether you've dropped enough symbols (in DROPPING mode). A classical finite state machine!
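The state machine described above can be sketched in plain Python, outside of GNU Radio; the tag key "corr_est" and the fixed pilot length are assumptions for illustration only:

```python
# Plain-Python sketch of the PASSING/DROPPING finite state machine described
# above. The tag key "corr_est" and the fixed pilot length are invented names,
# not from any GNU Radio block.
def drop_pilots(symbols, tags, pilot_len):
    """symbols: sequence of values; tags: dict mapping index -> tag key."""
    out = []
    state = "PASSING"
    dropped = 0
    for i, sym in enumerate(symbols):
        if state == "PASSING" and tags.get(i) == "corr_est":
            state = "DROPPING"      # a pilot block starts at this symbol
            dropped = 0
        if state == "DROPPING":
            dropped += 1
            if dropped >= pilot_len:
                state = "PASSING"   # pilot fully consumed; resume passing
        else:
            out.append(sym)
    return out

# Pilot of length 2 tagged at index 1: only the data symbols survive.
print(drop_pilots(["d0", "p0", "p1", "d1"], {1: "corr_est"}, 2))  # ['d0', 'd1']
```

In a real general block the same two-state logic would live in the work function, with the tag lookup replaced by get_tags_in_window and the counter carried across calls in a member variable.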

Length extension attack doubts

So I've been studying the concept of length extension attacks, and there are a few things I noticed during my study that are not very clear to me.
1. Research papers explain how you can append some data to the end and form a new message. For example:
Desired New Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo&waffle=liege
(notice the two waffles). My question is: if a parser function on the server side can track duplicate attributes, would that make the entire length extension attack pointless? The server would notice the duplicate attributes. Is a proper parser that checks for duplicates a good defense against length extension attacks? I'm aware of the HMAC approach and other protections, but I'm asking specifically about parsers here.
2. Research says that only H(key|message) is vulnerable. It claims that H(message|key) won't work for the attacker because we would have to append a new key (which we obviously don't know). My question is: why would we have to append a new key? We don't do that when attacking H(key|message). Why can't we rely on the fact that we will pass the verification test (we would create the correct hash), and that if the parser tries to extract the key, it would take the only key in the block we send and resume from there? Why would we have to send two keys? Why doesn't the attack against H(message|key) work?
My question is: if a parser function on the server side can track duplicate attributes, would that make the entire length extension attack pointless?
You are talking about a well-written parser. Writing software is hard and writing correct software is very hard.
In that example, you have seen an overwritten attribute. Can you say whether a good parser must take the first one or the last one? What is the rule? There can be situations where the last one must be taken! So this is an attack that may or may not apply, depending on the situation. Considering that knowledge of the length extension attack goes back to the 1990s, it should amaze someone that applicable targets could still be found: it was exploited in the wild against the Flickr API in 2009, almost 20 years later;
Flickr's API Signature Forgery by Thai Duong and Juliano Rizzo, published Sep. 28, 2009.
Why would we have to append a new key? We don't do it when we are attacking H(key|message). Why can't we rely on the fact that we will pass the verification test, and that if the parser tries to extract the key, it would take the only key in the block we send and resume from there? Why would we have to send two keys? Why doesn't the attack against H(message|key) work?
The attack is a signature forgery. The key is not known to the attacker, but they can still forge new signatures. The new message and signature (the extended hash) are sent to the server; the server then prepends the key to the message and recomputes the hash in the canonical way. If the result matches, the signature is considered valid.
The parser doesn't extract the key; it already knows the key. The point is whether you can tell that the data has been extended. The padding rule is simple: append a 1 bit, then enough zeroes so that the last 64 (or 128) bits encode the message length (very simplified; for SHA-256, for example, the padded length must be a multiple of 512 bits). To see that there is another padding inside, you would have to check every block, and only then could you claim that there is an extension attack. Yes, you can do this; however, one of the aims of cryptography is to reduce such dependencies, too. If we can create a better signature scheme that eliminates the checking, we recommend it over the others. That lets software developers write secure implementations more easily.
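The padding rule sketched above can be written out for SHA-256; this is a standalone illustration of the padding only, not an attack implementation:

```python
import struct

# The SHA-256 padding rule described above, written out: append 0x80 (a 1 bit),
# then zeroes, then the 64-bit big-endian bit length, so the total is a
# multiple of 64 bytes (512 bits).
def sha256_pad(message: bytes) -> bytes:
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)

msg = b"count=10&lat=37.351&user_id=1"
padded = sha256_pad(msg)
print(len(padded) % 64)  # 0 - always a whole number of 512-bit blocks
# sha256(msg) is, by construction, the compression-function state after
# processing `padded`; that state is exactly what a length-extension
# attacker resumes from without knowing the key.
```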
Why doesn't attack against H(message|key) work?
Simple: you take the extended message message|extended and send the extended hash H(message|key|extended) to the server. The server takes the message message|extended, appends the key to get message|extended|key, and hashes it as H(message|extended|key). Clearly this is not equal to the extended hash H(message|key|extended).
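A toy illustration of that mismatch (key and messages are made up, and the internal padding is ignored for simplicity):

```python
import hashlib

# Toy illustration (invented key and messages, padding ignored) of why
# extending H(message|key) fails: the server puts the key AFTER the
# attacker's extension, so the two hash inputs can never line up.
key = b"secret-key"
message = b"count=10&lat=37.351&user_id=1"
extension = b"&waffle=liege"

# What a length extension of H(message|key) would effectively commit to:
attacker_view = hashlib.sha256(message + key + extension).hexdigest()

# What the server actually recomputes over the extended message:
server_view = hashlib.sha256(message + extension + key).hexdigest()

print(attacker_view == server_view)  # False - the forged signature is rejected
```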
Note that the truncated versions of the SHA-2 series, like SHA-512/256, are resistant to length extension attacks. SHA-3 is immune by design, which enables the simple KMAC signature scheme. BLAKE2 is also immune, since it is designed with the HAIFA construction.

Proprietary handling/collecting of user defined errors

I do not know how to implement a custom procedure for handling user-defined errors (which stop the routine/algorithm) or warning messages (after which the routine/algorithm can proceed) without using exceptions (i.e. failwith or the standard System exceptions).
Example: I have a module with a series of functions that use a lot of input data, which must be checked and then used to calculate the thickness of a pressure vessel component.
The calculation procedure is complex and iterative, and there are a lot of checks to be performed before getting a result; these checks can generate "user-defined errors" that stop the procedure/routine/algorithm, or "warning messages" after which it proceeds.
I need to collect these errors and messages so they can later be shown to the user in a dedicated form (WPF or Windows Forms).
Note: every time I read a book on F#, C#, or Visual Basic, or an article on the Internet, I find the same philosophy/warning: raising system or user-defined exceptions should be limited as much as possible. Exceptions are for unmanageable, unpredictable exceptional events, and they add overhead.
I do not know which handling philosophy to implement. I'm confused, and there are limited sources available on the Internet on this particular topic.
At the moment I'm planning to adopt the approach from https://fsharpforfunandprofit.com/posts/recipe-part2/. It sounds good to me... complex, but good. I was not able to find other references on this topic.
Question: are there other philosophies I could consider for this custom handling/collecting of user-defined errors? Any books or articles to read?
My decision will have a big impact on how I design and write my code (splitting the problem into several functions; building an "engine" that runs functions in sequence or composes them in different ways depending on results; where to check for errors/warnings; and how to store error and warning messages so I can understand what is going on, where the errors/warnings are generated, and by which function).
Many thanks in advance.
The F# way is to encode the errors in the types as much as possible. The simplest example is an option type, where you return None if the operation failed and Some value when it succeeded. Surprisingly, very often this is enough! If not, you can encode the different kinds of errors AND the success "state" in a discriminated union, e.g.
[<Measure>]
type psi

type VesselPressureResult =
    | PressureOk
    | WarningApproachingLimit
    | ErrorOverLimitBy of int<psi>
and then you use pattern matching to "decide" what to do in each case. If you need to add more variants, e.g. ErrorTooLow, you add them to the DU, and the compiler will then "tell" you about all the places where you need to fix the logic.
Here is the perfect source with detailed information: https://fsharpforfunandprofit.com/series/designing-with-types.html

Input Validation WCF

It seems that if you mark a DataMember property on an object you create and use the IsRequired attribute, you are only telling the consumer that the tag for this property needs to be present in the input schema. I need to tell the consumer that it not only needs to be in the input schema, it needs to be populated with a value. And going even further, why not have a regular expression to check it against?
Can someone give me a sample of how to communicate to the consumer of a WCF method the input validation for the values being passed?
The best approach to input validation in WCF is to use a custom schema validator. Microsoft has a tutorial on the subject here:
http://msdn.microsoft.com/en-us/library/ff647820.aspx
Note: as RQDQ mentioned, this is non-trivial. However, the approach outlined in the link above is at the very least fairly modular.
There is no such mechanism at the current time in WCF (at least that I know of).
What you're describing is very non-trivial. For example, the same data control might be used by more than one operation. Each operation might specify a different set of requirements for what is valid input. Those requirements may be very complicated (e.g. some fields are required given the value of other fields on that or another DataContract).
There is no free lunch here - API documentation is the only way I know of to specify this level of information.

How to handle complex availability of information in OOP from a RESTful API

My issue is that I'm dealing with a RESTful API that returns information about objects, and when writing classes to represent them, I'm not sure how best to handle all the possibilities of the status of each variable's availability. From what I can tell, there are 5 possibilities: The information
is available
has not been requested
is currently being requested (asynchronously)
is unavailable
is not applicable
So with these, having an object represent its data with a value or null doesn't cut it. To give a more concrete example, I'm working with an API about the United States Congress, so the problem goes as thus:
I request information about a bill, and it contains a stub about the sponsoring legislator.
I eventually need to request all the information about that legislator. Not all the legislators will have all the information. Those in the House of Representatives won't have a senate class (Senators' six-year terms are staggered so a third expire every two years, the House is entirely re-elected every two years). Some won't have a twitter id, just because they don't have one. And, of course, if I have already requested information, I shouldn't try to request it again.
There's a couple options I see:
I can create a Legislator object and fill it with what information I have, but then I have to have some mechanism of tracking information availability with the getters and setters. This is kind of what I'm doing right now, but it requires a lot of repeated code.
I could create a separate class for abbreviated objects and replace them when I get more with immutable "complete" objects, but then I have to be really careful about replacing all references to them and also go through a bunch of hoops for unavailable, and especially, not applicable information.
So, I'm just wondering what other people's take on this issue is. Are there other (better?) ways of handling this complexity? What are the advantages and drawbacks of different approaches? What should I consider about what I'm trying to do in choosing an approach?
[Note: I'm working in Objective-C, but this isn't necessarily specific to that language.]
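One way to make those five states explicit instead of overloading null is a small enum-based wrapper. This is a sketch in Python rather than Objective-C, with invented names, just to show the shape of the idea:

```python
from enum import Enum, auto

# Sketch with invented names (not any framework's API): make the five
# availability states explicit instead of overloading null.
class Availability(Enum):
    AVAILABLE = auto()
    NOT_REQUESTED = auto()
    REQUESTING = auto()      # an async request is in flight
    UNAVAILABLE = auto()
    NOT_APPLICABLE = auto()

class Field:
    """A value together with its availability state."""
    def __init__(self, state=Availability.NOT_REQUESTED, value=None):
        self.state = state
        self.value = value

    def resolve(self, value):
        self.state = Availability.AVAILABLE
        self.value = value

class Legislator:
    def __init__(self, chamber):
        self.twitter_id = Field()
        # House members have no senate class at all.
        self.senate_class = Field(
            Availability.NOT_APPLICABLE if chamber == "house"
            else Availability.NOT_REQUESTED)

rep = Legislator("house")
rep.twitter_id.resolve("@example")
print(rep.twitter_id.state.name)    # AVAILABLE
print(rep.senate_class.state.name)  # NOT_APPLICABLE
```

A getter can then refuse to re-request a field whose state is AVAILABLE, REQUESTING, or NOT_APPLICABLE, which removes the repeated bookkeeping from each property.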
If you want to treat those remote resources as objects on the client side, then do yourself a huge favour and forget about the REST buzzword. You will drive yourself crazy. Just accept that you are doing HTTP RPC and move on as you would on any other RPC project.
However, if you really want to do REST, you need to understand what is meant by the "State Transfer" part of the REST acronym and you need to read about HATEOAS. It is a huge mental shift for building clients, but it does have a bunch of benefits. But maybe you don't need those particular benefits.
What I do know is that if you are trying to use a "REST API" to retrieve objects over the wire, you are going to come to the conclusion that REST is a load of crap.
It's an interesting question, but I think you're probably overthinking this a bit.
Firstly, I think you're considering the possible states of the information a bit too much; the more basic question is that you either have the information or you don't. WHY you don't have the information doesn't really matter, except in one case. Let me explain: if the information about a certain bill or legislator is not applicable, you shouldn't be requesting it or needing it, so that "state" is irrelevant. Similarly, if the information is in the process of being requested, then it is simply not yet available. The only state you really care about is whether you have the information or you do not yet have it.
If you start worrying about further depths of the request process, you risk getting into a deep, endless cycle of managing state; has the information changed between when I got it and now? All you can know about the information is if you've been told what it is. This is fundamental to the REST process; you're getting REPRESENTATION of the underlying data, but there's no mistake about it; the representation is NOT the underlying data, any more than a congressman's name is the congressman himself.
Second, don't worry about information availability. If an object has a subobject, when you query the object, query for the subobject. If you get back data, great. If you get back that the data isn't available, that too is a representation of the subobject's data; it's just a different representation than you were hoping for, but it's equally valid. I'd represent that as an object with a null value; the object exists (was instantiated because it belonged to the parent), but you have no valid data about it (the representation returned was empty due to some reason; lack of availability, server down, data changed; whatever).
Finally, the real key here is to remember that a RESTful structure is driven by hypermedia: a request for an object that does not return the full object's data should return a URI for requesting the subobject's data, and so forth. The key is that those structures aren't static, as your object structure seems to hope; they're dynamic, and it's up to the server to determine the representation (i.e., the interrelationships). Attempting to set that in stone with a concrete object representation ahead of time means dealing with the system in a way that REST was never meant to be dealt with.