HL7 parsing using XML as message definition - xsd-validation

I have written a simple HL7 message parser that can parse any message type. What it can't do is validate the message.
The result of parsing is a tree that is easy to traverse.
Now I want to improve the parser so it can apply validation rules to the received message. I am thinking of using XML as the message definition, but I am stuck on which approach to take and am not sure this even makes sense.
Have you ever written a parser (not necessarily for HL7 messages) where you had to apply a schema to the message? How did you do that?
Thank you

You could roll your own solution. However, there is a library out there that already does this quite well: https://hapifhir.github.io/hapi-hl7v2/
Home-made solutions usually run into all kinds of problems with the intricacies of HL7 (quoting, non-standard delimiters, etc.).
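To make the intricacies concrete, here is a minimal sketch of the kind of tree-building parse the question describes. It is not HAPI and not a validator; the sample message, the hard-coded `\r` segment terminator, and the fact that only the field separator (not the component/repetition/escape characters from MSH-2) is honored are all simplifying assumptions.

```python
def parse_hl7(message):
    """Parse a pipe-delimited HL7 v2 message into a list of segments.

    Minimal sketch: reads the field separator from the character after
    'MSH' but ignores the other encoding characters, which a real
    parser such as HAPI handles for you.
    """
    segments = []
    for line in message.strip().split("\r"):
        if not line:
            continue
        # MSH-1 is the field separator itself; assume '|' elsewhere.
        sep = line[3] if line.startswith("MSH") else "|"
        segments.append(line.split(sep))
    return segments

msg = ("MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|202401010000||ADT^A01|MSG001|P|2.5\r"
       "PID|1||12345||Doe^John")
tree = parse_hl7(msg)
```

Validation against an XML message definition would then be a separate walk over `tree`, checking segment order, cardinality, and field types.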

Related

Proprietary handling/collecting of user defined errors

I do not know how to implement custom handling of user-defined errors (which stop the routine/algorithm) and warning messages (which let the routine/algorithm proceed) without using exceptions (i.e. failwith ... or a standard System exception).
Example: I have a module with a series of functions that takes a lot of input data, which must be checked and then used to calculate the thickness of a pressure vessel component.
The calculation procedure is complex and iterative, and there are many checks to perform before getting a result. These checks can generate user-defined errors, which stop the procedure/routine/algorithm, or warning messages, which let it proceed.
I need to collect these errors and messages and show them to the user later, at the end, in a dedicated form (WPF or Windows Forms).
Note: every time I read an F#, C#, or Visual Basic book or an article on the internet, I find the same philosophy/warning: raising system or user-defined exceptions should be limited as much as possible; exceptions are for unmanageable, unpredictable events and put "overhead" on the system.
I do not know which handling philosophy to implement, and I'm confused; there are limited sources available on the internet about this particular topic.
At the moment I'm planning to adopt the approach from https://fsharpforfunandprofit.com/posts/recipe-part2/. It sounds good to me (complex, but good), and I was not able to find other references on this topic.
Question: are there other philosophies I can consider for this custom handling/collecting of user-defined errors? Any books or articles to read?
My decision will have a big impact on how I design and write my code: how to split the problem into several functions, whether to build an engine that runs functions in sequence or composes them differently depending on results, where to check for errors/warnings, and how to store the error and warning messages so I can understand what is going on and which function generated each one.
Many thanks in advance.
The F# way is to encode the errors in the types as much as possible. The easiest example is an option type, where you return None if the operation failed and Some value when it succeeded. Surprisingly, very often this is enough! If not, you can encode the different types of errors AND a success "state" in a discriminated union, e.g.
[<Measure>]
type psi

type VesselPressureResult =
    | PressureOk
    | WarningApproachingLimit
    | ErrorOverLimitBy of int<psi>
and then you will use pattern matching to "decide" what to do in each case. If you need to add more variants, e.g. ErrorTooLow, then you would add that to the DU and then the compiler will "tell" you about all places where you need to fix the logic.
Here is the perfect source with detailed information: https://fsharpforfunandprofit.com/series/designing-with-types.html

Reorder token stream in XText lexer

I am trying to lex/parse an existing language using XText. I have already decided I will use a custom ANTLRv3 lexer, as the language cannot be lexed in a context-free way. Fortunately, I do not need parser information; knowing the previously encountered tokens is enough to decide the lexing mode.
The target language has an InputSection that can be described as follows: InputSection: INPUT_SECTION A=ID B=ID;. However, it can be specified in two different ways.
; The canonical way
$InputSection Foo Bar
$SomeOtherSection Fonzie
; The haphazard way
$InputSection Foo
$SomeOtherSection Fonzie
$InputSection Bar
Could I use TokenStreamRewriter to reorder all tokens in the canonical way, before passing this on to the parser? Or will this generate issues in XText later?
After a lot of investigation, I have come to the conclusion that editor tools themselves are truly not fit for this type of problem.
If you started typing in one section, the editor would have to take AST context from subsequent sections into account to auto-complete correctly. At the same time, this would be very confusing for the user.
In the end, I will therefore simply not support this obscure feature of the language. Instead, the AST will be constructed so that a section (reasonably) divided between two parts will still parse correctly.
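For reference, the reordering idea from the question could also be implemented as a plain preprocessing pass before the lexer ever sees the input, rather than with TokenStreamRewriter. A hedged sketch, using the section syntax from the example above (this is line-based and far cruder than a real token rewrite):

```python
def canonicalize(lines):
    """Merge repeated '$InputSection' lines into one canonical line.

    Sketch of the 'reorder before parsing' idea: collect the arguments
    of every $InputSection occurrence, emit them as a single line at
    the position of the first occurrence, and keep other lines as-is.
    """
    merged_args, out, insert_at = [], [], None
    for line in lines:
        parts = line.split()
        if parts and parts[0] == "$InputSection":
            if insert_at is None:
                insert_at = len(out)
                out.append(None)  # placeholder for the merged line
            merged_args.extend(parts[1:])
        else:
            out.append(line)
    if insert_at is not None:
        out[insert_at] = " ".join(["$InputSection"] + merged_args)
    return out
```

As the answer notes, this works for batch parsing but does nothing to help the editor tooling, which has to operate on the text the user actually typed.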

Why do integer error codes exist in APIs?

In almost every kind of API there are integer error codes (e.g. 123) that indicate the error type. I was wondering whether it wouldn't be better to use descriptive string codes like user_not_found or invalid_request. In my opinion they are much more practical: say you come back to your code after months, and you can easily read through the error-handling parts without looking up error codes in the documentation.
Why do integer error codes still exist in APIs?
In an API, clients are usually computers that test for response codes using conditions.
It is much faster to test against integers than against strings; that's all.
Moreover, error codes have a certain logic: APIs often use HTTP status codes, so when you (as a human) read them, you know that 2xx codes indicate success, 4xx client-side errors, and 5xx server-side errors, even if you don't know them all by heart.
EDIT:
Your question made me think about this answer, about how loading times in websites affect profits. You should read it to convince yourself that even a few milliseconds sometimes matter.
But there are nice names for most errors. Standard C, POSIX, and Windows all have names for their error codes. Most of these names are preprocessor macros, but there are also functions to get a readable string or message from these codes.
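Python exposes exactly this split between the wire-level integer, the symbolic name, and the human-readable message, so a small illustration is easy (the printed message text varies by platform):

```python
import errno
import os

# The integer is what travels in the API, but the platform also ships a
# symbolic name and a human-readable message for each code.
code = errno.ENOENT                      # symbolic name for an integer code
name = errno.errorcode[code]             # -> "ENOENT"
message = os.strerror(code)              # e.g. "No such file or directory"
print(code, name, message)
```

So both camps get what they want: programs compare integers, and humans can always recover the descriptive form.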

vCard Parsing different parameters

I need to write a vCard Parser.
Now the problem is that the vCards I get can have any number of parameters,
like, say:
TEL;CELL:123
or
TEL;CELL;VOICE:123
or
TEL:HOME;CELL;VOICE:123
How I get this format really depends on my sources (which are diverse and many).
I need to write a generic reader that can recognize that all these different sets of parameters map to a single field (in this case, the mobile number), even though the way this information is sent varies across sources (Google, Microsoft, Nokia).
Can someone please give any suggestions on how to handle this situation?
vCard is a bloody mess to parse, especially since almost nothing out there produces RFC 2426-compliant output. For similar reasons I ended up writing a vCard parser / validator which you can use to massage the data into compliance. I use it daily to keep my own vCards (a few hundred people/companies) compliant, and the result has for example been that Gmail now imports all of them properly, address, phones, images and all.
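One workable normalization tactic for the TEL lines in the question is to treat every token before the final colon as a parameter, regardless of which (possibly non-compliant) separator the source used. A minimal Python sketch; the parameter handling is a deliberate simplification and does not attempt full RFC 2426 compliance:

```python
def parse_tel(line):
    """Parse a vCard TEL line into (type_params, number).

    Minimal sketch: treats everything between 'TEL' and the last ':'
    as parameters, which papers over the non-compliant variants the
    question shows (some sources use ':' where ';' is expected).
    """
    head, _, number = line.rpartition(":")
    params = {p.upper() for p in head.replace(":", ";").split(";")[1:] if p}
    return params, number

def is_mobile(line):
    """A TEL maps to the mobile-number field if CELL appears anywhere."""
    params, _ = parse_tel(line)
    return "CELL" in params
```

With this, all three variants from the question (`TEL;CELL:123`, `TEL;CELL;VOICE:123`, `TEL:HOME;CELL;VOICE:123`) resolve to the same mobile-number field; a real parser would still need the validator-style massaging the answer describes for quoting, folding, and escaping.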

removing dead variables using antlr

I am currently servicing an old VBA (Visual Basic for Applications) application. I have a legacy tool that analyzes the application and prints out dead variables. As there are more than 2000 of them, I do not want to remove them by hand.
Therefore I had the idea to transform the separate code files that contain the dead variables (according to the aforementioned tool) into ASTs and remove the variables that way.
My question: Is there a recommended way to do this?
I do not want to use StringTemplate, as I would need to create templates for all rules, and if I had a comment on the hidden channel it would be lost, right?
All I need is to remove parts of the code and print out the rest exactly as it was read in.
Does anyone have any recommendations?
Some theory
I assume that regular expressions are not enough for your task: that is, the notion of a dead-code section cannot be defined in a regular language, and you need the context-free language described by an ANTLR grammar to express it.
The algorithm
The following algorithm can be suggested:
Tokenize the source code with a lexer.
Since you want to preserve all the correct code, don't skip or hide its tokens. Make sure to define separate tokens for the parts that may be removed, or that will be used to detect the dead code; all other characters can be collected under a single token type. Here you can use the output of your auxiliary tool in predicates to reduce the number of tokens generated. ANTLR's tokenization (like any other tokenization) is limited to a regular language, so you can't remove all the dead code in this step.
Construct AST with a parser.
Here the full power of a context-free language can be applied: define dead-code sections in the parser's rules and remove them from the AST being constructed.
Convert the AST back to source code. You could use a tree parser here, but I suspect there is an easier way, which can be found by looking at toString and similar methods of the tree type returned by the parser.
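The pass-through idea behind the algorithm (emit everything verbatim except recognized dead declarations) can be illustrated outside ANTLR. A hedged Python sketch; the VBA handling is drastically simplified (only single-variable `Dim x As Type` lines, no multi-variable declarations, no line continuations):

```python
import re

def remove_dead_dims(source, dead_vars):
    """Drop 'Dim <name> As <type>' lines for known-dead variables.

    Greatly simplified stand-in for the tokenize/parse/re-emit pipeline:
    every line that is not a dead declaration is passed through verbatim,
    which is how comments and formatting survive (the 'hidden channel'
    concern from the question).
    """
    dead = set(dead_vars)
    out = []
    for line in source.splitlines():
        m = re.match(r"\s*Dim\s+(\w+)\s+As\s+\w+\s*$", line, re.IGNORECASE)
        if m and m.group(1) in dead:
            continue  # this declaration is dead; skip it
        out.append(line)
    return "\n".join(out)
```

In the real ANTLR pipeline the "pass through verbatim" role is played by the catch-all token type from step 1, and the "skip it" decision happens in the parser rules of step 2.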