I just found a weird behavior when attempting to extract a string from the Binary table in the MSI.
I have a file containing "Hello world", but the data I get back is "???Hello world" (literal question marks).
Is this as intended?
Will it always be exactly 3 characters in the beginning?
Sample code:
[CustomAction]
public static ActionResult CustomAction2(Session session)
{
    View v = session.Database.OpenView("SELECT `Name`,`Data` FROM `Binary`");
    v.Execute();
    Record r = v.Fetch();
    int datalen = r.GetDataSize("Data");
    System.IO.Stream strm = r.GetStream("Data");
    byte[] rawData = new byte[datalen];
    int res = strm.Read(rawData, 0, datalen);
    strm.Close();
    String s = System.Text.Encoding.ASCII.GetString(rawData);
    // s == "???Hello World"
    return ActionResult.Success;
}
Wild guess, but if you created the file using Notepad, couldn't that just be your byte order mark?
Try
String s = System.Text.Encoding.UTF8.GetString(rawData);
if (s.Length > 0 && s[0] == '\uFEFF')
{
    s = s.Substring(1);
}
instead of String s = System.Text.Encoding.ASCII.GetString(rawData);
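If you'd rather drop the marker at the byte level before decoding, something like this should work (a minimal sketch; DecodeBinaryData is just an illustrative helper name):
// Strips a UTF-8 byte order mark (EF BB BF), if present, before decoding.
static string DecodeBinaryData(byte[] rawData)
{
    byte[] bom = System.Text.Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }
    int offset = 0;
    if (rawData.Length >= bom.Length &&
        rawData[0] == bom[0] && rawData[1] == bom[1] && rawData[2] == bom[2])
    {
        offset = bom.Length; // skip the BOM instead of decoding it as '\uFEFF'
    }
    return System.Text.Encoding.UTF8.GetString(rawData, offset, rawData.Length - offset);
}
And no, it won't always be exactly 3 characters at the beginning: the extra bytes are only there when the file was saved with a UTF-8 BOM, so don't rely on a fixed prefix length.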
I'm writing data to BigQuery and it gets written there successfully, but I'm concerned about the format in which it is being written.
Below is the format in which the data is shown when I execute any query in BigQuery:
Look at the first row: the value of SalesComponent should be CPS_H, but it shows 'BeamRecord [dataValues=[CPS_H', and the ModelIteration value ends with a square bracket.
Below is the code that is used to push data to BigQuery from BeamSql:
TableSchema tableSchema = new TableSchema().setFields(ImmutableList.of(
    new TableFieldSchema().setName("SalesComponent").setType("STRING").setMode("REQUIRED"),
    new TableFieldSchema().setName("DuetoValue").setType("STRING").setMode("REQUIRED"),
    new TableFieldSchema().setName("ModelIteration").setType("STRING").setMode("REQUIRED")
));

TableReference tableSpec = BigQueryHelpers.parseTableSpec("beta-194409:data_id1.tables_test");

System.out.println("Start Bigquery");

final_out.apply(MapElements.into(TypeDescriptor.of(TableRow.class)).via(
        (MyOutputClass elem) -> new TableRow()
            .set("SalesComponent", elem.SalesComponent)
            .set("DuetoValue", elem.DuetoValue)
            .set("ModelIteration", elem.ModelIteration)))
    .apply(BigQueryIO.writeTableRows()
        .to(tableSpec)
        .withSchema(tableSchema)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(WriteDisposition.WRITE_TRUNCATE));

p.run().waitUntilFinish();
EDIT
I have transformed BeamRecord into the MyOutputClass type using the code below, and this also doesn't work:
PCollection<MyOutputClass> final_out = join_query.apply(ParDo.of(new DoFn<BeamRecord, MyOutputClass>() {
    private static final long serialVersionUID = 1L;

    @ProcessElement
    public void processElement(ProcessContext c) {
        BeamRecord record = c.element();
        String[] strArr = record.toString().split(",");
        MyOutputClass moc = new MyOutputClass();
        moc.setSalesComponent(strArr[0]);
        moc.setDuetoValue(strArr[1]);
        moc.setModelIteration(strArr[2]);
        c.output(moc);
    }
}));
It looks like your MyOutputClass is constructed incorrectly (with incorrect values). If you look at it, BigQueryIO is able to create rows with the correct fields just fine, but those fields have the wrong values, which means that by the time you call .set("SalesComponent", elem.SalesComponent) you already have incorrect data in elem.
My guess is the problem is in some previous step, when you convert from BeamRecord to MyOutputClass. You would get a result similar to what you're seeing if you did something like this (or some other conversion logic did this for you behind the scenes):
convert BeamRecord to string by calling beamRecord.toString();
if you look at BeamRecord.toString() implementation you can see that you're getting exactly that string format;
split this string by , getting an array of strings;
construct MyOutputClass from that array;
Pseudocode for this is something like:
PCollection<MyOutputClass> final_out =
    beamRecords
        .apply(
            ParDo.of(new DoFn<BeamRecord, MyOutputClass>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    BeamRecord record = c.element();
                    String[] fields = record.toString().split(",");
                    MyOutputClass elem = new MyOutputClass();
                    elem.SalesComponent = fields[0];
                    elem.DuetoValue = fields[1];
                    ...
                    c.output(elem);
                }
            })
        );
The correct way to do something like this is to call the getters on the record instead of splitting its string representation, along these lines (pseudocode):
PCollection<MyOutputClass> final_out =
    beamRecords
        .apply(
            ParDo.of(new DoFn<BeamRecord, MyOutputClass>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    BeamRecord record = c.element();
                    MyOutputClass elem = new MyOutputClass();
                    // get field value by name
                    elem.SalesComponent = record.getString("CPS_H...");
                    // get another field value by name
                    elem.DuetoValue = record.getInteger("...");
                    ...
                    c.output(elem);
                }
            })
        );
You can verify something like this by adding a simple ParDo where you either put a breakpoint and look at the elements in the debugger, or output the elements somewhere else (e.g. console).
I was able to resolve this issue using the method below:
PCollection<MyOutputClass> final_out = record40.apply(ParDo.of(new DoFn<BeamRecord, MyOutputClass>() {
    private static final long serialVersionUID = 1L;

    @ProcessElement
    public void processElement(ProcessContext c) throws ParseException {
        BeamRecord record = c.element();
        String strArr = record.toString();
        String strArr1 = strArr.substring(24);
        String xyz = strArr1.replace("]", "");
        String[] strArr2 = xyz.split(",");
        MyOutputClass moc = new MyOutputClass();
        moc.setSalesComponent(strArr2[0]);
        moc.setDuetoValue(strArr2[1]);
        moc.setModelIteration(strArr2[2]);
        c.output(moc);
    }
}));
I'm trying to save tables from Excel sheets as pictures. Is there a way to just put that table on the clipboard and save it? This is what I've got so far, but the library call I'm referencing doesn't seem to be there.
Thank you in advance!
-Rueben Ramirez
Public Sub extract_excelTable(ByRef data_file As String, ByRef app1 As excel.Application, ByRef sheet_name As String)
    'defining new app to prevent out of scope open applications
    Dim temp_app As excel.Application = app1
    Dim workbook As excel.Workbook = temp_app.Workbooks.Open(Path.GetFullPath(data_file))
    temp_app.Visible = False

    For Each temp_table As excel.DataTable In workbook.Worksheets(sheet_name)
        temp_table.Select()
        'temp_app.Selection.CopyAsPicture?
    Next
End Sub
I'm not going to write any code here, but I will outline a solution for you that will work. Note that this will not reproduce the formatting of the Excel document; it simply gets the data from it and puts it on an image in the same column/row order as the Excel file.
STEP 1:
My solution to this problem would be to read the data from the Excel file using an OLEDB connection, as outlined in the second example of this post: Reading values from an Excel File
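For reference, the OLEDB route looks roughly like this (a sketch only; the provider string, path handling and sheet name are placeholder assumptions, see the linked post for the full example):
using System.Data;
using System.Data.OleDb;

// Reads an entire worksheet into a DataTable via OLEDB.
static DataTable ReadSheet(string path, string sheetName)
{
    string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
                     ";Extended Properties='Excel 12.0 Xml;HDR=YES'";
    using (OleDbConnection conn = new OleDbConnection(connStr))
    using (OleDbDataAdapter adapter =
        new OleDbDataAdapter("SELECT * FROM [" + sheetName + "$]", conn))
    {
        DataTable table = new DataTable();
        adapter.Fill(table); // the adapter opens and closes the connection itself
        return table;
    }
}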
Alternatively, you may need to open the document in Excel and re-save it as a CSV if it's too large to fit in your computer's memory. I have some code that reads a CSV into a string list in C# that may help you:
static void Main(string[] args)
{
    string Path = "C:/File.csv";
    System.IO.StreamReader reader = new System.IO.StreamReader(Path);
    // Ignore the header line
    reader.ReadLine();
    string ReadText;
    string[] vals;
    while (!reader.EndOfStream)
    {
        ReadText = reader.ReadLine();
        vals = SplitLine(ReadText);
        // Do some work here
    }
    reader.Close();
}

// Splits one CSV line into (up to) 42 fields, handling quoted fields that contain commas.
// Adjust the field count to match your file.
private static string[] SplitLine(string Line)
{
    string[] vals = new string[42];
    string Temp = Line;
    for (int i = 0; i < 42; i++)
    {
        if (Temp.Contains(","))
        {
            if (Temp.Substring(0, Temp.IndexOf(",")).Contains("\""))
            {
                // Quoted field: take everything up to the closing quote-comma pair
                vals[i] = Temp.Substring(1, Temp.IndexOf("\",", 1) - 1);
                Temp = Temp.Substring(Temp.IndexOf("\",", 1) + 2);
            }
            else
            {
                vals[i] = Temp.Substring(0, Temp.IndexOf(","));
                Temp = Temp.Substring(Temp.IndexOf(",") + 1);
            }
        }
        else
        {
            // Last field on the line
            vals[i] = Temp.Trim();
            Temp = string.Empty;
        }
    }
    return vals;
}
STEP 2:
Create a Bitmap object for the image, then use a for loop to draw all of the data from the Excel document onto it. This post has an example of using the DrawString method to do so: how do i add text to image in c# or vb.net
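A rough sketch of that drawing step, assuming the data from step 1 ended up in a string[,] (the cell size, font and output format are arbitrary choices for illustration):
using System.Drawing;
using System.Drawing.Imaging;

// Draws a grid of string values onto a bitmap and saves it as a PNG.
static void SaveTableAsImage(string[,] cells, string outputPath)
{
    const int cellWidth = 120, cellHeight = 20;
    int rows = cells.GetLength(0), cols = cells.GetLength(1);
    using (Bitmap bmp = new Bitmap(cols * cellWidth, rows * cellHeight))
    using (Graphics g = Graphics.FromImage(bmp))
    using (Font font = new Font("Arial", 9))
    {
        g.Clear(Color.White);
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                g.DrawString(cells[r, c] ?? "", font, Brushes.Black, c * cellWidth, r * cellHeight);
        bmp.Save(outputPath, ImageFormat.Png);
    }
}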
I am trying to perform a "translation" of sorts of a stream of text. More specifically, I need to tokenize the input stream, look up every term in a specialized dictionary and output the corresponding "translation" of the token. However, I also want to preserve all the original whitespace, stopwords etc. from the input so that the output is formatted in the same way as the input instead of ending up as a bare stream of translations. So if my input is
Term1: Term2 Stopword! Term3
Term4
then I want the output to look like
Term1': Term2' Stopword! Term3'
Term4'
(where Termi' is translation of Termi) instead of simply
Term1' Term2' Term3' Term4'
Currently I am doing the following:
PatternAnalyzer pa = new PatternAnalyzer(Version.LUCENE_31,
    PatternAnalyzer.WHITESPACE_PATTERN,
    false,
    WordlistLoader.getWordSet(new File(stopWordFilePath)));
TokenStream ts = pa.tokenStream(null, in);
CharTermAttribute charTermAttribute = ts.getAttribute(CharTermAttribute.class);
while (ts.incrementToken()) { // loop over tokens
    String termIn = charTermAttribute.toString();
    ...
}
but this, of course, loses all the whitespace etc. How can I modify it to be able to re-insert the whitespace into the output? Thanks much!
============ UPDATE!
I tried splitting the original stream into "words" and "non-words". It seems to work fine. Not sure whether it's the most efficient way, though:
public ArrayList<Token> splitToWords(String sIn)
{
    if (sIn == null || sIn.length() == 0) {
        return null;
    }

    char[] c = sIn.toCharArray();
    ArrayList<Token> list = new ArrayList<Token>();
    int tokenStart = 0;
    boolean curIsLetter = Character.isLetter(c[tokenStart]);
    for (int pos = tokenStart + 1; pos < c.length; pos++) {
        boolean newIsLetter = Character.isLetter(c[pos]);
        if (newIsLetter == curIsLetter) {
            continue;
        }
        TokenType type = TokenType.NONWORD;
        if (curIsLetter) {
            type = TokenType.WORD;
        }
        list.add(new Token(new String(c, tokenStart, pos - tokenStart), type));
        tokenStart = pos;
        curIsLetter = newIsLetter;
    }

    // emit the trailing token
    TokenType type = TokenType.NONWORD;
    if (curIsLetter) {
        type = TokenType.WORD;
    }
    list.add(new Token(new String(c, tokenStart, c.length - tokenStart), type));
    return list;
}
Well, it doesn't really lose the whitespace; you still have your original text. :)
So I think you should make use of OffsetAttribute, which gives you the startOffset() and endOffset() of each term in your original text. This is what Lucene uses, for example, to highlight snippets of search results from the original text.
I wrote up a quick test (uses EnglishAnalyzer) to demonstrate:
The input is:
Just a test of some ideas. Let's see if it works.
The output is:
just a test of some idea. let see if it work.
// just for example purposes, not necessarily the most performant.
public void testString() throws Exception {
    String input = "Just a test of some ideas. Let's see if it works.";
    EnglishAnalyzer analyzer = new EnglishAnalyzer(Version.LUCENE_35);
    StringBuilder output = new StringBuilder(input);
    // in some cases, the analyzer will make terms longer or shorter.
    // because of this we must track how much we have adjusted the text so far
    // so that the offsets returned will still work for us via replace()
    int delta = 0;

    TokenStream ts = analyzer.tokenStream("bogus", new StringReader(input));
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
        String term = termAtt.toString();
        int start = offsetAtt.startOffset();
        int end = offsetAtt.endOffset();
        output.replace(delta + start, delta + end, term);
        delta += (term.length() - (end - start));
    }
    ts.close();

    System.out.println(output.toString());
}
I want to save the content of a RichTextBox to a varbinary (= byte array) column in XamlPackage format.
I need technical advice on how to do it.
What I actually need to know is how to convert between a FlowDocument and a byte array.
Is it even recommended to store it as varbinary, or is this a bad idea?
Update
Code snippet:
// Load
byte[] document = GetDocumentFromDataBase();
RichTextBox tb = new RichTextBox();
TextRange tr = new TextRange(tb.Document.ContentStart, tb.Document.ContentEnd);
tr.Load(--------------------------); // Load from the byte array.

// Save
int maxAllowed = 1024;
byte[] document;
RichTextBox tb = new RichTextBox();
// User entered text and designs in the rich text
TextRange tr = new TextRange(tb.Document.ContentStart, tb.Document.ContentEnd);
tr.Save(--------------------------); // Save to byte array
if (document.Length > maxAllowed)
{
    MessageBox.Show((document.Length - maxAllowed) + " Exceeding limit.");
    return;
}
SaveToDataBase();
I can't find my full example right now, but you can use XamlReader and XamlWriter to get the document into and out of a string. From there, you can use UnicodeEncoding, AsciiEncoding or whatever encoder you want to get it into and out of bytes.
My shorter example for setting the document from a string...
docReader is my flow document reader
private void SetDetails(string detailsString)
{
    if (docReader == null)
        return;

    if (String.IsNullOrEmpty(detailsString))
    {
        this.docReader.Document = null;
        return;
    }

    using (StringReader stringReader = new StringReader(detailsString))
    {
        using (System.Xml.XmlReader reader = System.Xml.XmlReader.Create(stringReader))
        {
            this.docReader.Document = XamlReader.Load(reader) as FlowDocument;
        }
    }
}
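For the byte-array part of the question, here is a minimal sketch of the round trip described above (the helper names are mine; this uses plain XAML via XamlWriter/XamlReader rather than the XamlPackage format, and Encoding.Unicode is an arbitrary choice, just use the same encoding in both directions):
using System.IO;
using System.Text;
using System.Windows.Documents;
using System.Windows.Markup;

public static byte[] DocumentToBytes(FlowDocument doc)
{
    // Serialize the document to a XAML string, then encode it to bytes.
    string xaml = XamlWriter.Save(doc);
    return Encoding.Unicode.GetBytes(xaml);
}

public static FlowDocument BytesToDocument(byte[] data)
{
    // Decode the bytes back into a XAML string and parse it into a FlowDocument.
    string xaml = Encoding.Unicode.GetString(data);
    using (StringReader stringReader = new StringReader(xaml))
    using (System.Xml.XmlReader xmlReader = System.Xml.XmlReader.Create(stringReader))
    {
        return XamlReader.Load(xmlReader) as FlowDocument;
    }
}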
I am trying to download a file I have uploaded to an image field in my MS-SQL database. The problem is that when I try to open the file it just says System.Byte[] instead of containing the actual content.
UploadFiles is my class which contains the filename, id, filedata etc.
public void DownloadUploadedFile(Page sender, UploadFiles uf)
{
    sender.Response.Clear();
    sender.Response.ContentType = uf.FileType;
    sender.Response.AddHeader("Content-Disposition",
        "attachment; filename=" + uf.FileName);
    sender.Response.BinaryWrite(uf.FileData); // the binary data
    sender.Response.End();
}
Here I retrieve the data from my database:
while (reader.Read())
{
    UploadFiles uf = new UploadFiles();
    uf.FileData = encoding.GetBytes(reader["filedata"].ToString());
    uf.FileName = reader["name"].ToString();
    uf.FileType = reader["filetype"].ToString();
    uf.FileId = Convert.ToInt32(reader["id"]);
    return uf;
}
The
uf.FileData = encoding.GetBytes(reader["filedata"].ToString());
should be
uf.FileData = (byte[])reader["filedata"];
The data returned is already a byte array, and you call ToString() on it, which just defaults to returning the class name (System.Byte[]), which you then encode back into a byte array. It should just be cast directly.
Try uf.FileName.ToString(), otherwise you're getting the object type, not the FileName property text.