equivalent of normalize function of DOM in JDOM - jdom

Can someone tell me which JDOM function is similar to DOM's normalize()? I want to normalize the XML content and serialize it through XMLSerializer.
Thank You
Sam

Sandeep.
JDOM does not have a direct 'normalize' concept. Writing one would not be particularly hard, though. On the other hand, your intention is to output the XML in some format, and all the JDOM Output mechanisms will normalize the data for you.
So, for example, if you want to output the JDOM document as plain XML text, you can use the XMLOutputter class in org.jdom2.output and use an appropriate org.jdom2.output.Format instance (say, Format.getPrettyFormat() - do not use getRawFormat() as the raw formatter will not normalize the output at all).
In addition to outputting the JDOM document as text-based XML, you can also output to a DOM document, a SAX event stream, and even StAX streams. Each of these will produce a 'Normalized' output.
So, what you want to do (probably), is to:
Document mydoc = .....;
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(mydoc, somestream);
Rolf

Tabulator - formatting print and PDF output

I am a relatively new user of Tabulator so please forgive me if I am asking anything that, perhaps, should be obvious.
I have a Tabulator report that I am able to print and create as a PDF, but the report's formatting (as shown on the screen) is not used in either output.
For printing I have used printAsHtml and printStyled=true, but this doesn't produce a printout that matches what is on the screen. I have formatted number fields (with comma separators) and these are showing correctly, but the number columns should be right-aligned but all of the columns appear as left-aligned.
I am also using Tree View where the tree rows are coloured differently to the main table, but when I print the report with a tree open it colours the whole table with the tree colours and not just the tree.
For the PDF none of the Tabulator formatting is being used. I've looked for anything similar to the printStyled option, but I can't see anything. I've also looked at the autoTable option, but I am struggling to find what to use.
I want to format the print and PDF outputs so that they look as close to the screen representation as possible.
Is there anywhere I could look that would provide examples of how to achieve the above? The Tabulator documentation is very good, but the provided examples don't appear to explain what I am trying to do.
Perhaps there are CSS classes that I am missing or even misusing? I have tried including .tabulator-print-table in my CSS, but I am probably not using it correctly. I also couldn't find anything equivalent for producing PDFs. Some examples would help immensely.
Thank you in advance for any advice or assistance.
Formatting is deliberately not included in these outputs. Below I will outline why:
Downloaders
Downloaded files do not contain formatted data, only the raw data. This is because a lot of the formatters create visual elements (progress bar, star formatter etc) that cannot be replicated sensibly in downloaded files.
If you want to change the format of data in the download you will need to use an accessor, the accessorDownload option is the one you want to use in this case. The accessors transform the data as it is leaving the table.
For instance we could create an accessor that prepended "Mr " to the front of every name in a column:
var mrAccessor = function(value, data, type, params, column, row){
    return "Mr " + value;
}
Assign it to a column's definition:
{title:"Name", field:"name", accessorDownload:mrAccessor}
Printing
Printing also does not include the formatters. This is because when you print a Tabulator table, the whole table is actually rebuilt as a standard HTML table, which allows the printer to work out how to lay out everything across multiple pages with column headers etc. The downside of this is that it is only loosely styled like a Tabulator, and so formatted contents generated inside Tabulator cells will likely break when added to a normal td element.
For this reason there is also an accessorPrint option that works in the same way as the download accessor but for printing.
If you want to use the same accessor for both occasions, you can assign the function once to the accessor option and it will be applied in both instances.
Check out the Accessor Documentation for full details.

How to serialize data in example-in-example format for tensorflow-ranking?

I'm building a ranking model with tensorflow-ranking. I'm trying to serialize a data set in the TFRecord format and read it back at training time.
The tutorial doesn't show how to do this. There is some documentation here on an example-in-example data format but it's hard for me to understand: I'm not sure what the serialized_context or serialized_examples fields are or how they fit into examples and I'm not sure what the Serialize() function in the code block is.
Concretely, how can I write and read data in example-in-example format?
The context is a map from feature name to tf.train.Feature. The examples list is a list of maps from feature name to tf.train.Feature. Once you have these, the following code will create an "example-in-example":
import tensorflow as tf

context = {...}
examples = [{...}, {...}, ...]
# Serialize the context features as a single tf.train.Example.
serialized_context = tf.train.Example(features=tf.train.Features(feature=context)).SerializeToString()
# Serialize each example and collect the results in one BytesList.
serialized_examples = tf.train.BytesList()
for example in examples:
    tf_example = tf.train.Example(features=tf.train.Features(feature=example))
    serialized_examples.value.append(tf_example.SerializeToString())
# Wrap both serialized parts inside a parent tf.train.Example.
example_in_example = tf.train.Example(features=tf.train.Features(feature={
    'serialized_context': tf.train.Feature(bytes_list=tf.train.BytesList(value=[serialized_context])),
    'serialized_examples': tf.train.Feature(bytes_list=serialized_examples)
}))
To read the examples back, you may call
tfr.data.parse_from_example_in_example(
    example_pb,
    context_feature_spec=context_feature_spec,
    example_feature_spec=example_feature_spec)
where context_feature_spec and example_feature_spec are maps from feature name to tf.io.FixedLenFeature or tf.io.VarLenFeature.
First of all, I recommend reading this article to ensure that you know how to create a tf.Example as well as tf.SequenceExample (which by the way, is the other data format supported by TF-Ranking):
Tensorflow Records? What they are and how to use them
In the second part of this article, you will see that a tf.SequenceExample has two components: 1) Context and 2) Sequence (or examples). This is the same idea that Example-in-Example is trying to implement. Basically, the context is the set of features that are independent of the items you want to rank (a search query in the case of search, or user features in the case of a recommendation system), and the sequence part is a list of items (aka examples). This could be a list of documents (in search) or movies (in recommendation).
Once you are comfortable with tf.Example, Example-in-Example will be easier to understand. Take a look at this piece of code for how to create an EIE instance:
https://www.gitmemory.com/issue/tensorflow/ranking/95/518480361
1) bundle context features together in a tf.Example object and serialize it
2) bundle the sequence (example) features (each of which could contain a list of values) in another tf.Example object and serialize this one too.
3) wrap these inside a parent tf.Example
4) (if you're writing to tfrecords) serialize the parent tf.Example object and write to your tfrecord file.

Can we do string manipulation and conditional check in smooks?

I want to use Smooks to manipulate a large file that arrives as plain text. The file contains a large number of lines, and I have to split each line by character position to extract information from it.
E.g. I do the following in Java:
row.substring(0, 4)
row.substring(4, 64)
I have to convert the text content to CSV file.
Can we do exactly the same string manipulation in Smooks too, that is, in the Smooks configuration? I believe I can use fixed-length processing for that?
How can I add an IF/ELSE condition in the Smooks configuration?
Like in Java:
if (row.length() == 900) {
    // DO
} else {
    // DO
}
We can do the string manipulation using the fixed-length reader [1], but I still cannot find a way to do a condition check.
E.g. if/else
[1]http://www.smooks.org/mediawiki/index.php?title=V1.4:Smooks_v1.4_User_Guide#XML
If the format does not fit the flatfile reader, then you might be able to use the regex reader: https://github.com/smooks/smooks/tree/v1.5.1/smooks-examples/flatfile-to-xml-regex/
As for the conditional stuff... you really need to bind the data fragments into a Java model of some sort (real or virtual) and then conditionally process those fragments by either adding elements on the visitors being applied, or process the fragments by routing them to another process that processes them in parallel, which is a far better way of processing a huge data stream.

How to write to a file in Go

I have seen How to read/write from/to file using golang? and http://golang.org/pkg/os/#File.Write but could not find an answer.
Is there a way to write an array of float/int directly to a file, or do I have to convert it to bytes/a string first? Thanks.
You can use the functions in the encoding/binary package for this purpose.
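As far as writing an entire array at once goes, binary.Write accepts a fixed-size value or a slice of fixed-size values, so a slice such as []float64 or []int32 can be written in a single call; alternatively you can iterate over the array and write each element individually. A plain int has no fixed size, so convert it to int32 or int64 first. Ideally, you should prefix the elements with a single integer denoting the length of the array.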
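A minimal sketch of the length-prefix idea, assuming little-endian byte order and a uint32 count (both are arbitrary choices, not something the answer prescribes). The helper names writeFloats, readFloats and roundTrip are made up for illustration. It relies on binary.Write and binary.Read accepting a slice of fixed-size values, so the payload goes out in one call rather than an explicit loop:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeFloats writes a uint32 element count followed by the values,
// all in little-endian byte order (an arbitrary choice for this sketch).
func writeFloats(w io.Writer, values []float64) error {
	if err := binary.Write(w, binary.LittleEndian, uint32(len(values))); err != nil {
		return err
	}
	// binary.Write accepts a slice of fixed-size values, so the whole
	// slice is written in one call.
	return binary.Write(w, binary.LittleEndian, values)
}

// readFloats reads back data produced by writeFloats.
func readFloats(r io.Reader) ([]float64, error) {
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	values := make([]float64, n)
	if err := binary.Read(r, binary.LittleEndian, values); err != nil {
		return nil, err
	}
	return values, nil
}

// roundTrip encodes values into an in-memory buffer and decodes them again.
func roundTrip(values []float64) ([]float64, error) {
	var buf bytes.Buffer
	if err := writeFloats(&buf, values); err != nil {
		return nil, err
	}
	return readFloats(&buf)
}

func main() {
	got, err := roundTrip([]float64{1.5, -2.25, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Because the helpers take an io.Writer and an io.Reader, the same code should work with a file returned by os.Create or os.Open in place of the in-memory buffer.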
If you want a higher level solution, you can try the encoding/gob package:
Package gob manages streams of gobs - binary values exchanged between an Encoder (transmitter) and a Decoder (receiver). A typical use is transporting arguments and results of remote procedure calls (RPCs) such as those provided by package "rpc".
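As a rough illustration of the gob route, the sketch below encodes a slice of ints and decodes it again. The function name gobRoundTrip is hypothetical, and an in-memory buffer stands in for the file; gob records the type and length itself, so no manual length prefix is needed:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// gobRoundTrip encodes a slice of ints with gob and decodes it back.
func gobRoundTrip(values []int) ([]int, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(values); err != nil {
		return nil, err
	}
	var decoded []int
	if err := gob.NewDecoder(&buf).Decode(&decoded); err != nil {
		return nil, err
	}
	return decoded, nil
}

func main() {
	got, err := gobRoundTrip([]int{10, 20, 30})
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

The trade-off is that the output is a Go-specific format, so this suits files that only Go programs will read.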

Take input from function parameters, not a file

All of the examples I have seen read in a file, then lex & parse it.
I need to have a function which takes a string (char *, I'm generating C code) as parameter, and acts upon that.
How can I do that best? I thought of writing the string to a stream, then feeding that to the lexer, but it doesn't feel right. Is there any better way?
Thanks in advance
You would need to use the antlr3NewAsciiStringInPlaceStream method.
You didn't say what version of Antlr you were using so I'll assume Antlr v3.
The inputs to this method are the string to parse and its length; you can probably use NULL for the last input.
This produces an input stream similar to the antlr3AsciiFileStreamNew that you would use to parse a file.
I see that you mentioned writing the input to a stream. If you can use C++ then that's the best method you'll probably come by.
This is the barebones code I normally use:
std::istringstream issInput(sInput); // sInput: the std::string to parse (hypothetical name); make this an ifstream if you want to parse a file
Lexer oLexer(issInput);
Parser oParser(oLexer);
// The factory must be declared before it is handed to the parser.
antlr::ASTFactory oFactory("CommonASTWithHiddenTokens", &antlr::CommonASTWithHiddenTokens::factory);
oParser.initializeASTFactory(oFactory);
oParser.setASTFactory(&oFactory);
oParser.main();
antlr::RefAST ast = oParser.getAST();
if (ast)
{
    TreeWalker oTreeWalker;
    oTreeWalker.main(ast, rPCode);
}
I think you should feed it to a stream. You could feed it to stdin if you'd like. That way, your code shouldn't differ too much from reading strings from a file.