How can I use the StreamWriteAsText() to write data of the Number type? - dm-script

My ultimate goal is to write a file of image data and the time it was taken, for multiple times. This could be used to produce time vs intensity plots.
To do this, I am trying to write a 1D image to a file stream repeatedly in time using the ImageWriteImageDataToStream() function. I go about this by attaching a Listener object to the camera view I am reading out and this listener executes a function that writes the image to a file stream using ImageWriteImageDataToStream() every time the data changes (messagemap = "data_changed:MyFunctiontoExecute") .
My question is, is there a way to also write a time stamp to this same file stream?
All I can find is StreamWriteAsText(), which takes a String data type. Can I convert time which is a Number type to a String type?
Does anyone have a better way to do this?
My solution at the moment is to create a separate file at the same time and record the timing using WriteFile(), so not using a file stream.
//MyFunctiontoExecute, where Img is the 1D image at the current time
My_file_stream.StreamSetPos(2,0)
ImageWriteImageDataToStream(Img, My_file_stream, 0)
//Write the time to the same file
Number tmp_time = GetHighResTickCount() - start_time
My_file_stream.StreamSetPos(2,0)
My_file_stream.StreamWriteAsText(0,tmp_time) //does not work
//instead using a different file
WriteFile(My_extrafileID,tmp_time+"\n")

I think you have misunderstood how streaming works. When you write to a stream, each write command leaves the stream position at the end of the data just written, so you do not need to set the position yourself.
Your script essentially tells the computer to set the stream back to that starting position and then to write the text, overwriting the data.
You only need the StreamSetPos() command when you want to jump over sections while reading, for example when writing import scripts for specific file formats or when extracting only specific subsets from a file.
If all you want to do is stream out some raw data, do exactly that and call the commands one after the other:
void WriteDataPlusDateToStream( object fStream, image img, string dateStr )
{
number endian = 0
number encoding = 0
img.ImageWriteImageDataToStream(fStream,endian)
fStream.StreamWriteAsText(encoding,dateStr)
}
Similarly, you just "stream-in" by just following the same sequence:
void ReadDataPlusDateFromStream( object fStream, image img, string &dateStr )
{
number endian = 0
number encoding = 0
img.ImageReadImageDataFromStream(fStream,endian)
fStream.StreamReadTextLine(encoding,dateStr)
}
Two things are important here:
In ImageReadImageDataFromStream, it is the size and data type of the image img that define how many bytes are read from the stream and how they are interpreted. Therefore img must have been created beforehand with the matching size and data type.
In StreamReadTextLine, the stream continues to read text until it encounters the end-of-line character (\n) or the end of the stream. Therefore make sure to write this end-of-line character when streaming out. Alternatively, make sure that the strings always have a fixed size and use StreamReadAsText with that length specified.
Using the two methods above, you can use the following test-script as a starting point:
void WriteDataPlusDateToStream( object fStream, image img, string dateStr )
{
number endian = 0
number encoding = 0
img.ImageWriteImageDataToStream(fStream,endian)
fStream.StreamWriteAsText(encoding,dateStr)
}
void ReadDataPlusDateFromStream( object fStream, image img, string &dateStr )
{
number endian = 0
number encoding = 0
img.ImageReadImageDataFromStream(fStream,endian)
fStream.StreamReadTextLine(encoding,dateStr)
}
void writeTest(string path)
{
Result("\n Writing to :" + path )
image testImg := RealImage("Test",4,100)
string dateStr;
number loop = 5;
number doAutoClose = 1
object fStream = NewStreamFromFileReference( CreateFileForWriting(path), doAutoClose )
for( number i=0; i<loop; i++ )
{
testImg = icol * random()
dateStr = GetDate(1)+"#"+GetTime(1)+"|"+Format(GetHighResTickCount(),"%.f") + "\n"
fStream.WriteDataPlusDateToStream(testImg,dateStr)
sleep(0.33)
}
}
void readTest(string path)
{
Result("\n Reading from :" + path )
image testImg := RealImage("Test",4,100)
string dateStr;
number doAutoClose = 1
object fStream = NewStreamFromFileReference( OpenFileForReading(path), doAutoClose )
while ( fStream.StreamGetPos() < fStream.StreamGetSize() )
{
fStream.ReadDataPlusDateFromStream(testImg,dateStr)
result("\n time:"+dateStr)
testImg.ImageClone().ShowImage()
}
}
string path = "C:/test.dat"
ClearResults()
writeTest(path)
readTest(path)
Note that when streaming "binary data" like this, it is you who defines the file format. You must make sure that the writing and reading code match up.
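To make the "you define the file format" point concrete outside of dm-script, here is a minimal Python sketch of the same record layout: a fixed-size block of binary values followed by a newline-terminated text stamp. It uses only the standard library; N_PIXELS, write_record and read_record are hypothetical names chosen for illustration, and an in-memory buffer stands in for the file.

```python
import io
import struct

N_PIXELS = 4  # hypothetical image width; writer and reader must agree on it


def write_record(stream, pixels, stamp):
    """Append one record: N_PIXELS little-endian float32 values, then a text line."""
    stream.write(struct.pack("<%df" % N_PIXELS, *pixels))
    stream.write((stamp + "\n").encode("ascii"))


def read_record(stream):
    """Read one record back in the same order it was written."""
    raw = stream.read(4 * N_PIXELS)  # byte count follows from size and data type
    pixels = struct.unpack("<%df" % N_PIXELS, raw)
    stamp = stream.readline().decode("ascii").rstrip("\n")  # reads up to the line break
    return pixels, stamp


buf = io.BytesIO()
write_record(buf, [0.0, 1.0, 2.0, 3.0], "t=0.33")
write_record(buf, [0.5, 1.5, 2.5, 3.5], "t=0.66")

buf.seek(0)  # rewind once before reading; no repositioning between records
records = []
while buf.tell() < len(buf.getvalue()):
    records.append(read_record(buf))
```

As in the dm-script version, the reader only works because it knows the block size and data type in advance and because every text stamp ends with a line break.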

Related

How to get an image from a long string data in dm-script

I would like to create an image from a string of comma-separated values.
The script below works, but it is slow.
(The actual string is much longer than in the example below.)
I suspect the per-pixel addressing in the for loop is what takes the time.
image str2img(string str)
{
image img:=RealImage("",4,10,1)
string tempstr=str
for(number i=0;i<10;i++)
{
if(find(tempstr,",")!=-1)
{
img[i,0]=tempstr.left(find(tempstr,",")).val()
tempstr=tempstr.right(tempstr.len()-find(tempstr,",")-1)
result(tempstr+"\n")
}else
{
img[i,0]=tempstr.val()
}
}
return img
}
string input="1,2,3,4,5,6,7,8,9,10"
image output=str2img(input)
output.showimage()
Then I wrote the following script to use a stream instead.
However, I got the error message 'Non-numeric text encountered'.
image str2img(string str)
{
TagGroup Tg=NewTagGroup()
Tg.TagGroupSetTagAsString("data",str)
object fstream=NewStreamFromBuffer(0)
TagGroupWriteTagDataToStream(Tg,"data",fstream,0)
fstream.StreamSetPos(0,0)
number bLinesAreRows=1
number bSizeByCount=1
number dtype=2 //2 for real4 (float)
object imgSizeObj = Alloc( "ImageData_ImageDataSize" )
image img := ImageImportTextData( "Imag Name " , fstream , dtype , imgSizeObj , bLinesAreRows , bSizeByCount )
return img
}
string input="1,2,3,4,5,6,7,8,9,10"
image output=str2img(input)
output.showimage()
Is the "ImageImportTextData()" function only valid for reading a saved file?
Or is there an efficient way to obtain an image from a long string?
Very good question, and I like the way you went about it.
No, ImageImportTextData() works for any stream, as you will see in the example below.
However, the command requires each text line to be terminated by a line break if you want it to count, and there seems to be an issue with streaming string tags. I never use that approach, as there are dedicated commands to stream text.
So, your fixed script looks like this:
image str2img(string str)
{
object fstream=NewStreamFromBuffer(0)
fStream.StreamWriteAsText(0,str) // Write text to stream directly
fstream.StreamSetPos(0,0)
number bLinesAreRows=1
number bSizeByCount=1
number dtype=2 //2 for real4 (float)
object imgSizeObj = Alloc( "ImageData_ImageDataSize" )
image img := ImageImportTextData( "Imag Name " , fstream , dtype , imgSizeObj , bLinesAreRows , bSizeByCount )
return img
}
string input="1,2,3,4,5,6,7,8,9,10\n" // Note final line-break if you want to count.
image output=str2img(input)
output.showimage()
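For comparison, the "one text line per image row, size determined by counting" idea can be sketched in plain Python. This is only an illustration of the parsing logic, not of the DigitalMicrograph API; text_to_rows is a hypothetical helper. It splits each line once instead of repeatedly re-slicing the remaining string, which is what made the original loop slow.

```python
from array import array


def text_to_rows(text):
    """Parse comma-separated numbers; each line of text becomes one row."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            rows.append(array("f", (float(v) for v in line.split(","))))
    return rows


rows = text_to_rows("1,2,3,4,5\n6,7,8,9,10\n")
width, height = len(rows[0]), len(rows)  # size found by counting, as with bSizeByCount
```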

What are the advantages of using tf.train.SequenceExample over tf.train.Example for variable length features?

Recently I read this guide on undocumented features in TensorFlow, as I needed to pass variable length sequences as input. However, I found the protocol for tf.train.SequenceExample relatively confusing (especially due to lack of documentation), and managed to build an input pipe using tf.train.Example just fine instead.
Are there any advantages to using tf.train.SequenceExample? Using the standard example protocol when there is a dedicated one for variable length sequences seems like a cheat, but does it bear any consequence?
Here are the definitions of the Example and SequenceExample protocol buffers, and all the protos they may contain:
message BytesList { repeated bytes value = 1; }
message FloatList { repeated float value = 1 [packed = true]; }
message Int64List { repeated int64 value = 1 [packed = true]; }
message Feature {
oneof kind {
BytesList bytes_list = 1;
FloatList float_list = 2;
Int64List int64_list = 3;
}
};
message Features { map<string, Feature> feature = 1; };
message Example { Features features = 1; };
message FeatureList { repeated Feature feature = 1; };
message FeatureLists { map<string, FeatureList> feature_list = 1; };
message SequenceExample {
Features context = 1;
FeatureLists feature_lists = 2;
};
An Example contains a Features, which contains a mapping from feature name to Feature, which contains either a bytes list, or a float list or an int64 list.
A SequenceExample also contains a Features, but it also contains a FeatureLists, which contains a mapping from list name to FeatureList, which contains a list of Feature. So it can do everything an Example can do, and more. But do you really need that extra functionality? What does it do?
Since each Feature contains a list of values, a FeatureList is a list of lists. And that's the key: if you need lists of lists of values, then you need SequenceExample.
For example, if you handle text, you can represent it as one big string:
from tensorflow.train import BytesList
BytesList(value=[b"This is the first sentence. And here's another."])
Or you could represent it as a list of words and tokens:
BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b".", b"And", b"here",
b"'s", b"another", b"."])
Or you could represent each sentence separately. That's where you would need a list of lists:
from tensorflow.train import BytesList, Feature, FeatureList
s1 = BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
s2 = BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
fl = FeatureList(feature=[Feature(bytes_list=s1), Feature(bytes_list=s2)])
Then create the SequenceExample:
from tensorflow.train import SequenceExample, FeatureLists
seq = SequenceExample(feature_lists=FeatureLists(feature_list={
"sentences": fl
}))
And you can serialize it and perhaps save it to a TFRecord file.
data = seq.SerializeToString()
Later, when you read the data, you can parse it using tf.io.parse_single_sequence_example().
The link you provided lists some benefits. You can see how parse_single_sequence_example is used here https://github.com/tensorflow/magenta/blob/master/magenta/common/sequence_example_lib.py
If you managed to get the data into your model with Example, it should be fine. SequenceExample just gives a little more structure to your data and some utilities for working with it.

How to handle text file with multiple spaces as delimiter

I have a source data set which consists of text files where the columns are separated by one or more spaces, depending on the width of the column value. The data is right adjusted, i.e. the spaces are added before the actual data.
Can I use one of the built-in extractors or do I have to implement a custom extractor?
@wBob's solution works if your row fits into a string (128kB). Otherwise, write a custom extractor that does fixed-width extraction. Depending on what information you have on the format, you can write it by using input.Split() to split into rows and then split the rows based on your whitespace rules as shown below (full example for Extractor pattern is here), or you could write one similar to the one described in this blog post.
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow outputrow)
{
foreach (Stream current in input.Split(this._row_delim))
{
using (StreamReader streamReader = new StreamReader(current, this._encoding))
{
int num = 0;
string[] array = streamReader.ReadToEnd().Split(new string[]{this._col_delim}, StringSplitOptions.None).Where(x => !String.IsNullOrWhiteSpace(x)).ToArray();
for (int i = 0; i < array.Length; i++)
{
// Now write your code to convert array[i] into the extract schema
}
}
yield return outputrow.AsReadOnly();
}
}
}
You could create a custom extractor or, more simply, import the data as one row and then split and clean it using the C# methods available to you within U-SQL, like Split and IsNullOrWhiteSpace, something like this:
My right-aligned sample data
// Import the row as one column to be split later; NB use a delimiter that will NOT be in the import file
@input =
EXTRACT rawString string
FROM "/input/input.txt"
USING Extractors.Text(delimiter : '|');
// Add a row number to the line and remove white space elements
@working =
SELECT ROW_NUMBER() OVER() AS rn, new SqlArray<string>(rawString.Split(' ').Where(x => !String.IsNullOrWhiteSpace(x))) AS columns
FROM @input;
// Prepare the output, referencing the column's position in the array
@output =
SELECT rn,
columns[0] AS id,
columns[1] AS firstName,
columns[2] AS lastName
FROM @working;
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);
My results:
HTH
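The split-then-drop-empty-elements step used in both answers can be sketched in a few lines of Python, purely to illustrate the logic outside of U-SQL. The sample rows are made up, and parse_right_aligned is a hypothetical helper name; the three fields correspond to the id, firstName and lastName columns above.

```python
def parse_right_aligned(text):
    """Split each line on runs of spaces, discarding the empty pieces."""
    rows = []
    for line in text.splitlines():
        fields = line.split()  # no argument: splits on any run of whitespace
        if fields:
            rows.append(fields)
    return rows


# Made-up right-aligned sample data
sample = (
    "    1   Alice     Smith\n"
    "   22     Bob     Jones\n"
    "  333   Carol  Williams\n"
)
rows = parse_right_aligned(sample)
```

Calling split() without an argument does in one step what Split(' ') followed by the IsNullOrWhiteSpace filter does in the U-SQL script.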

Split text file into several parts by character

I apologise in advance if there is already an answer to this problem; if so please just link it (I have looked, btw! I just didn't find anything relating to my specific example) :)
I have a text (.txt) file which contains data in the form 1.10.100.0.200 where 1, 10, 100, 0 and 200 are numbers storing the map terrain layout of a game. This file has multiple lines of 1.10.100.0.200 where each line represents an item of terrain in the map.
Here is what I would like to know:
How do I find out how many lines there are, so I know how many items of terrain to create when I read the map file?
What is the method I should use to get each of 1, 10, 100, 0 and 200:
E.g. when I am translating the file into a map terrain at runtime I might use the terrainitem1.Location = New Point(x, y) or terrainitem1.Size = New Size(p, q) commands, where x, y, p and q are integers or doubles relating to the terrain's location or size. Where would I then find x, y etc. out of 1, 10, 100, 0 and 200, if say x is equal to 1, y to 10 and so on?
I am sorry if this isn't clear, please just ask me and I'll try to explain.
N.B. I am using VB.NET WinForms
There is no way to know how many lines a file has without opening the file and reading its contents.
You didn't indicate how far you've got on this. Do you know how to open a file?
Here's some basic code to do what you want. (Sorry, this is C# but the idea is the same in VB.)
string line;
using (TextReader reader = File.OpenText(@"C:\filename.txt"))
{
// Read each line from the file (until null returned)
while ((line = reader.ReadLine()) != null)
{
// Get each number in line (as string)
string[] values = line.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
// Convert each number to integer
id = int.Parse(values[0]);
height = int.Parse(values[1]);
width = int.Parse(values[2]);
x = int.Parse(values[3]);
y = int.Parse(values[4]);
}
}
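The same read-split-convert loop, including the line count the question asks about, might look like this as a Python sketch. An in-memory StringIO stands in for the map file, read_terrain is a hypothetical helper, and the field order (id, height, width, x, y) simply follows the C# snippet above.

```python
import io


def read_terrain(lines):
    """Return one list of integers per non-empty line of dot-separated numbers."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        items.append([int(v) for v in line.split(".")])
    return items


# Stand-in for the real map file; the second line is made up
map_file = io.StringIO("1.10.100.0.200\n2.20.50.300.400\n")
items = read_terrain(map_file)

count = len(items)  # number of terrain items is known only after reading
item_id, height, width, x, y = items[0]
```

The number of terrain items falls out of the read itself, so there is no need for a separate pass just to count lines.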

Lucene Highlighter class: highlight different words in different colors

Probably most people reading the title who know a bit about Lucene won't need much further explanation. NB I use Jython but I think most Java users will understand the Java equivalent...
It's a classic thing to want to do: you have more than one term in your search string... in Lucene terms this returns a BooleanQuery. Then you use something like this code to highlight (NB I am a Lucene newbie, this is all closely tweaked from Net examples):
yellow_highlight = SimpleHTMLFormatter( '<b style="background-color:yellow">', '</b>' )
green_highlight = SimpleHTMLFormatter( '<b style="background-color:green">', '</b>' )
...
stream = FrenchAnalyzer( Version.LUCENE_46 ).tokenStream( "both", StringReader( both ) )
scorer = QueryScorer( fr_query, "both" )
fragmenter = SimpleSpanFragmenter(scorer)
highlighter = Highlighter( yellow_highlight, scorer )
highlighter.setTextFragmenter(fragmenter)
best_fragments = highlighter.getBestTextFragments( stream, both, True, 5 )
if best_fragments:
for best_frag in best_fragments:
print "=== best frag: %s, type %s" % ( best_frag, type( best_frag ))
html_text += "&bull %s<br>\n" % unicode( best_frag )
... and then the html_text is put in a JTextPane for example.
But how would you make the first word in your query highlight with a yellow background and the second word highlight with a green background? I have tried to understand the various classes in org.apache.lucene.search... to no avail. So my only way of learning was googling. I couldn't find any clues...
I asked this question four years ago... At the time I did manage to implement a solution using javax.swing.text.html.HTMLDocument. There's also the interface org.w3c.dom.html.HTMLDocument in the standard Java library. This way is hard work.
But for anyone interested, there's a far simpler solution. It takes advantage of the fact that Lucene's SimpleHTMLFormatter returns about the simplest imaginable "marked up" piece of text: chosen words are highlighted with the HTML B tag. That's it. It's not even a "proper" HTML fragment, just a String with <B>s and </B>s in it.
A multi-word query generates a BooleanQuery... from which you can extract multiple TermQuerys by going booleanQuery.clauses() ... getQuery()
I'm working in Groovy. The colouring I want to apply is console codes, as per BASH (or Cygwin). Other types of colouring can be worked out on this model.
So you set up a map before to hold your "markup details":
def markupDetails = [:]
Then for each TermQuery, you call this, with the same text param each time, stipulating a different colour param for each term. NB I'm using Lucene 6.
def createHighlightAndAnalyseMarkup( TermQuery tq, String text, String colour ) {
def termQueryScorer = new QueryScorer( tq )
def termQueryHighlighter = new Highlighter( formatter, termQueryScorer )
TokenStream stream = TokenSources.getTokenStream( fieldName, null, text, analyser, -1 )
String[] frags = termQueryHighlighter.getBestFragments( stream, text, 999999 )
// not sure under what circs you get > 1 fragment...
assert frags.size() <= 1
// NB you don't always get all terms in all returned LDocuments...
if( frags.size() ) {
String highlightedFrag = frags[ 0 ]
Matcher boldTagMatcher = highlightedFrag =~ /<\/?B>/
def pos = 0
def previousEnd = 0
while( boldTagMatcher.find()) {
pos += boldTagMatcher.start() - previousEnd
previousEnd = boldTagMatcher.end()
markupDetails[ pos ] = boldTagMatcher.group() == '<B>'? colour : ConsoleColors.RESET
}
}
}
As I said, I wanted to colourise console output. The colour parameter in the method here is per the console colour codes as found here, for example. E.g. yellow is \033[033m. ConsoleColors.RESET is \033[0m and marks the place where each coloured bit of text stops.
... after you've finished doing this with all TermQuerys you will have a nice map telling you where individual colours begin and end. You work backwards from the end of the text so as to insert the "markup" at the right position in the String. NB here text is your original unmarked-up String:
markupDetails.sort().reverseEach{ pos, markup ->
String firstPart = text.substring( 0, pos )
String secondPart = text.substring( pos )
text = firstPart + markup + secondPart
}
... at the end of which text contains your marked-up String: print to console. Lovely.
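The position bookkeeping and the insert-from-the-end step can be tried without Lucene. The Python sketch below is only an illustration: the two highlighted strings are hand-written stand-ins for what a per-term highlighter might return, and tag_positions is a hypothetical helper mirroring the Groovy method above.

```python
import re

RESET = "\033[0m"
YELLOW = "\033[33m"
GREEN = "\033[32m"


def tag_positions(highlighted, colour):
    """Map offsets in the untagged text to the console code to insert there."""
    positions = {}
    pos = 0
    previous_end = 0
    for m in re.finditer(r"</?B>", highlighted):
        pos += m.start() - previous_end  # count only the text between tags
        previous_end = m.end()
        positions[pos] = colour if m.group() == "<B>" else RESET
    return positions


text = "the quick brown fox"
markup = {}
# Hand-written stand-ins for per-term highlighter output (assumption)
markup.update(tag_positions("the <B>quick</B> brown fox", YELLOW))
markup.update(tag_positions("the quick brown <B>fox</B>", GREEN))

# Insert from the end so earlier offsets stay valid
for pos in sorted(markup, reverse=True):
    text = text[:pos] + markup[pos] + text[pos:]
```

Working backwards matters because each insertion lengthens the string; inserting from the front would shift every later offset.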