I want to save content of a RichTextBox to varbinary (= byte array) in XamlPackage format.
I need technicial advise on how to it.
I actually need to know how to convert between FlowDocument to byte array.
Is it even recommended to store it as varbinary, or this is a bad idea?
Update
Code snippet:
///Load
byte[] document = GetDocumentFromDataBase();
RickTextBox tb = new RickTextBox();
TextRange tr = new TextRange(tb.Document.ContentStart, tb.Document.ContentEnd)
tr.Load(--------------------------) //Load from the byte array.
///Save
int maxAllowed = 1024;
byte[] document;
RichTextBox tb = new RichTextBox();
//User entered text and designs in the rich text
TextRange tr = new TextRange(tb.Document.ContentStart, tb.Document.ContentEnd)
tr.Save(--------------------------) //Save to byte array
if (document.Length > maxAllowed)
{
MessageBox.Show((document.Length - maxAllowed) + " Exceeding limit.");
return;
}
SaveToDataBase();
TextRange
I can't find my full example right now, but you can use XamlReader and XamlWriter to get the document into and out of a string. From there, you can use UnicodeEncoding, AsciiEncoding or whatever encoder you want to get it into and out of bytes.
My shorter example for setting the document from a string...
docReader is my flow document reader
private void SetDetails(string detailsString)
{
if (docReader == null)
return;
if (String.IsNullOrEmpty(detailsString))
{
this.docReader.Document = null;
return;
}
using (
StringReader stringReader = new StringReader(detailsString))
{
using (System.Xml.XmlReader reader = System.Xml.XmlReader.Create(stringReader))
{
this.docReader.Document = XamlReader.Load(reader) as FlowDocument;
}
}
}
Related
Trying to use Apache PDFBox version 2.0.2 for a text replace (with the below code) produces an output where few of the characters would not be displayed, mostly the capital Case Character. For example a replacement with "ABCDEFGHIJKLMNOPQRSTUVWXYZ" the output appears in pdf as "ABCDEF HIJKLM OP RST W Y ". Is this some bug ?? or we have some workaround to handle these character .
public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
if (StringUtils.isEmpty(searchString) || StringUtils.isEmpty(replacement)) {
return document;
}
PDPageTree pages = document.getDocumentCatalog().getPages();
for (PDPage page : pages) {
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
List tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++) {
Object next = tokens.get(j);
if (next instanceof Operator) {
Operator op = (Operator) next;
//Tj and TJ are the two operators that display strings in a PDF
if (op.getName().equals("Tj")) {
// Tj takes one operator and that is the string to display so lets update that operator
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
string = string.replaceFirst(searchString, replacement);
previous.setValue(string.getBytes());
} else if (op.getName().equals("TJ")) {
COSArray previous = (COSArray) tokens.get(j - 1);
for (int k = 0; k < previous.size(); k++) {
Object arrElement = previous.getObject(k);
if (arrElement instanceof COSString) {
COSString cosString = (COSString) arrElement;
String string = cosString.getString();
string = StringUtils.replaceOnce(string, searchString, replacement);
cosString.setValue(string.getBytes());
}
}
}
}
}
// now that the tokens are updated we will replace the page content stream.
PDStream updatedStream = new PDStream(document);
OutputStream out = updatedStream.createOutputStream();
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens(tokens);
page.setContents(updatedStream);
out.close();
}
return document;
}
Quoting from
https://pdfbox.apache.org/2.0/migration.html
Why was the ReplaceText example removed?
The ReplaceText example has been removed as it gave the incorrect illusion that text can be replaced easily. Words are often split, as seen by this excerpt of a content stream:
[ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ
Other problems will appear with font subsets: for example, if only the glyphs for a, b and c are used, these would be encoded as hex 0, 1 and 2, so you won’t find “abc”. Additionally, you can’t replace “c” with “d” because it isn’t part of the subset.
You could also have problems with ligatures, e.g. “ff”, “fl”, “fi”, “ffi”, “ffl”, which can be represented by a single code in many fonts. To understand this yourself, view any file with PDFDebugger and have a look at the “Contents” entry of a page.
======================================================================
Your description suggests that the initial file has been using a font subset, that is missing the characters G, N, Q, V and Y.
And no, there is no easy workaround. You would have to delete the text you don't want from the content stream, and then append a new content stream with the text you want with a new font at the correct place.
P.S. the current PDFBox version is 2.0.7, not 2.0.2.
I'm trying to save tables from excel sheets as pictures. Is there a way to just put that table on the clipboard and save it? This is what I've got so far but the library referenced is not there?
Thank you in advance!
-Rueben Ramirez
Public Sub extract_excelTable(ByRef data_file As String, ByRef app1 As excel.Application, ByRef sheet_name As String)
'defining new app to prevent out of scope open applications
Dim temp_app As excel.Application = app1
Dim workbook As excel.Workbook = temp_app.Workbooks.Open(Path.GetFullPath(data_file))
temp_app.Visible = False
For Each temp_table As excel.DataTable In workbook.Worksheets(sheet_name)
temp_table.Select()
'temp_app.Selection.CopyAsPicture?
Next
End Sub
I'm not going to write any code here, but I will outline a solution for you that will work. Note that this will not reproduce the formatting of the excel document, just simply get the data from it, and put it on an image in the same column/row order as the excel file.
STEP 1:
My solution to this problem would be to read the data from the excel file using an OLEDB connection as outlined in the second example of this post: Reading values from an Excel File
Alternatively, you may need to open the document in excel and re-save it as a CSV if it's too large to fit in your computer's memory. I have some code that reads a CSV into a string list in C# that may help you:
static void Main(string[] args)
{
string Path = "C:/File.csv";
System.IO.StreamReader reader = new System.IO.StreamReader(Path);
//Ignore the header line
reader.ReadLine();
string[] vals;
while (!reader.EndOfStream)
{
ReadText = reader.ReadLine();
vals = SplitLine(ReadText);
//Do some work here
}
}
private static string[] SplitLine(string Line)
{
string[] vals = new string[42];
string Temp = Line;
for (int i = 0; i < 42; i++)
{
if (Temp.Contains(","))
{
if (Temp.Substring(0, Temp.IndexOf(",")).Contains("\""))
{
vals[i] = Temp.Substring(1, Temp.IndexOf("\",", 1) - 1);
Temp = Temp.Substring(Temp.IndexOf("\",", 1) + 2);
}
else {
vals[i] = Temp.Substring(0, Temp.IndexOf(","));
Temp = Temp.Substring(Temp.IndexOf(",") + 1);
}
}
else
{
vals[i] = Temp.Trim();
}
}
return vals;
}
STEP 2:
Create a bitmap object to create an image, then use a for loop to draw all of the data from the excel document onto the image. This post had an example of using the drawstring method to do so: how do i add text to image in c# or vb.net
I am trying to perform a "translation" of sorts of a stream of text. More specifically, I need to tokenize the input stream, look up every term in a specialized dictionary and output the corresponding "translation" of the token. However, i also want to preserve all the original whitespaces, stopwords etc from the input so that the output is formatted in the same way as the input instead of ended up being a stream of translations. So if my input is
Term1: Term2 Stopword! Term3
Term4
then I want the output to look like
Term1': Term2' Stopword! Term3'
Term4'
(where Termi' is translation of Termi) instead of simply
Term1' Term2' Term3' Term4'
Currently I am doing the following:
PatternAnalyzer pa = new PatternAnalyzer(Version.LUCENE_31,
PatternAnalyzer.WHITESPACE_PATTERN,
false,
WordlistLoader.getWordSet(new File(stopWordFilePath)));
TokenStream ts = pa.tokenStream(null, in);
CharTermAttribute charTermAttribute = ts.getAttribute(CharTermAttribute.class);
while (ts.incrementToken()) { // loop over tokens
String termIn = charTermAttribute.toString();
...
}
but this, of course, loses all the whitespaces etc. How can I modify this to be able to re-insert them into the output? thanks much!
============ UPDATE!
I tried splitting the original stream into "words" and "non-words". It seems to work fine. Not sure whether it's the most efficient way, though:
public ArrayList splitToWords(String sIn)
{
if (sIn == null || sIn.length() == 0) {
return null;
}
char[] c = sIn.toCharArray();
ArrayList<Token> list = new ArrayList<Token>();
int tokenStart = 0;
boolean curIsLetter = Character.isLetter(c[tokenStart]);
for (int pos = tokenStart + 1; pos < c.length; pos++) {
boolean newIsLetter = Character.isLetter(c[pos]);
if (newIsLetter == curIsLetter) {
continue;
}
TokenType type = TokenType.NONWORD;
if (curIsLetter == true)
{
type = TokenType.WORD;
}
list.add(new Token(new String(c, tokenStart, pos - tokenStart),type));
tokenStart = pos;
curIsLetter = newIsLetter;
}
TokenType type = TokenType.NONWORD;
if (curIsLetter == true)
{
type = TokenType.WORD;
}
list.add(new Token(new String(c, tokenStart, c.length - tokenStart),type));
return list;
}
Well it doesn't really lose whitespace, you still have your original text :)
So I think you should make use of OffsetAttribute, which contains startOffset() and endOffset() of each term into your original text. This is what lucene uses, for example, to highlight snippets of search results from the original text.
I wrote up a quick test (uses EnglishAnalyzer) to demonstrate:
The input is:
Just a test of some ideas. Let's see if it works.
The output is:
just a test of some idea. let see if it work.
// just for example purposes, not necessarily the most performant.
public void testString() throws Exception {
String input = "Just a test of some ideas. Let's see if it works.";
EnglishAnalyzer analyzer = new EnglishAnalyzer(Version.LUCENE_35);
StringBuilder output = new StringBuilder(input);
// in some cases, the analyzer will make terms longer or shorter.
// because of this we must track how much we have adjusted the text so far
// so that the offsets returned will still work for us via replace()
int delta = 0;
TokenStream ts = analyzer.tokenStream("bogus", new StringReader(input));
CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
ts.reset();
while (ts.incrementToken()) {
String term = termAtt.toString();
int start = offsetAtt.startOffset();
int end = offsetAtt.endOffset();
output.replace(delta + start, delta + end, term);
delta += (term.length() - (end - start));
}
ts.close();
System.out.println(output.toString());
}
I would like to make a simple code that counts the top three most recurring lines/ text in a txt file then saves that line/ text to another text file (this in turn will be read into AutoCAD’s variable system).
Forgetting the AutoCAD part which I can manage how do I in VB.net save the 3 most recurring lines of text each to its own text file see example below:
Text file to be read reads as follows:
APG
BTR
VTS
VTS
VTS
VTS
BTR
BTR
APG
PNG
The VB.net program would then save the text VTS to mostused.txt BTR to 2ndmostused.txt and APG to 3rdmostused.txt
How can this be best achieved?
Since I'm C# developer, I'll use it:
var dict = new Dictionary<string, int>();
using(var sr = new StreamReader(file))
{
var line = string.Empty;
while ((line = sr.ReadLine()) != null)
{
var words = line.Split(' '); // get the words
foreach(var word in words)
{
if(!dict.Contains(word)) dict.Add(word, 0);
dict[word]++; // count them
}
}
}
var query = from d in dict select d order by d.Value; // now you have it sorted
int counter = 1;
foreach(var pair in query)
{
using(var sw = new StreamWriter("file" + counter + ".txt"))
sw.writer(pair.Key);
}
I am trying to download a file I have uploaded to an image field in my MS-SQL database. The problem is that when I try to open the file it just says System.Byte[] instead of containing the actual content.
UploadFiles is my class which contains the filename, id, filedata etc.
public void DownloadUploadedFile(Page sender, UploadFiles uf)
{
sender.Response.Clear();
sender.Response.ContentType = uf.FileType;
sender.Response.AddHeader("Content-Disposition",
"attachment; filename=" + uf.FileName);
sender.Response.BinaryWrite(uf.FileData); // the binary data
sender.Response.End();
}
Here I retrieve the data from my database:
while (reader.Read())
{
UploadFiles uf = new UploadFiles();
uf.FileData = encoding.GetBytes(reader["filedata"].ToString());
uf.FileName = reader["name"].ToString();
uf.FileType = reader["filetype"].ToString();
uf.FileId = Convert.ToInt32(reader["id"]);
return uf;
}
The
uf.FileData = encoding.GetBytes(reader["filedata"].ToString());
should be
uf.FileData = (byte[])reader["filedata"];
The data returned is a byte array, and you call ToString() on the byte array, which just defaults to returning the class name (system.byte[]) - which you then convert to a byte array. It should just be cast right away
Try uf.FileName.ToString(), otherwise you're getting the object type, not the FileName property text.