R Markdown - Output lines too long in pdf - pdf

I have the following problem. I have a report that I wrote in RMarkdown. It works great, but when I look at the PDF output (the HTML output is fine), some output lines are far too long and run off the page. You can see an example in the image. I simply print some strings from a data.frame, nothing fancy. The strings are long, but they don't get wrapped. Does anyone have any idea? I found this question: Pandoc: Long tablerows in Markdown->PDF documents do not get linewrap. I checked the docs, but I could not really understand what to change in my markdown code.

Related

Trying to scrape item from website

I was attempting to create a simple program that would pull a text item from a website and add it to a textbox. I'm just experimenting and thought I could do it, but it is not that easy for me. I know how to get the entire source code of a website (below). The item has an id that I know, but it does not have a tag name, so I'm not really sure how to read through the text and keep only the part next to the id. Or would it be better to use a WebBrowser control and try to get the text item that way? I'm just trying to do whatever is faster. I think my first option is better because it would use less of the computer's RAM. Starting from the code below, I don't know what to add next.
Dim request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("Website")
Dim response As System.Net.HttpWebResponse = request.GetResponse()
Dim sr As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream())
Dim source As String = sr.ReadToEnd()
Let's say the id is "name", for example. Viewing the source of the page, this is what that part looks like (below). How can I parse the source, which is a string, find this section, get the name Brandon, and add it to the textbox?
<span id="name">Brandon</span>
There are a few ways to go about this. I'm not going to write any source code though since I haven't used Visual Basic in a very long time. But if you Google for how to do any of the following you should find many tutorials and documents on it.
Regular Expressions
Using a Regular Expression on the full source code can help you find the element by searching for the ID attribute, which should be unique. Regular Expressions can sometimes be very slow, so they should be avoided if you have to perform a lot of searches on large sections of text.
/<([a-z0-9]+)\sid="name"(.*?)>(.*?)<\// -> Not Tested, but might help you
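To make the regular expression idea concrete, here is a minimal sketch in Python rather than Visual Basic. It is a variant of the untested pattern above, not the same expression: it allows other attributes before the id and requires a matching closing tag. The function name text_by_id is hypothetical, and the sketch assumes the id attribute is written with double quotes and that the element does not contain a nested element of the same tag.

```python
import re

def text_by_id(source, element_id):
    """Return the inner text of the first element with the given id, or None."""
    pattern = re.compile(
        # <tag ... id="element_id" ...> inner text </tag>
        r'<([a-z0-9]+)\b[^>]*\bid="' + re.escape(element_id) + r'"[^>]*>(.*?)</\1>',
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(source)
    return match.group(2) if match else None

page = '<p>Hello</p><span id="name">Brandon</span>'
print(text_by_id(page, "name"))
```

The same pattern should carry over to the .NET Regex class with little change, but that has not been checked here.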
String Position
Using a function that finds the position of a substring in a string would be useful. In C it's strstr and in PHP it's strpos. These types of functions give you the starting position of a string, which in your case would be id="name". Once you find that, find the position of the end of the opening tag and then the closing tag for that element. You then perform a substring operation that returns the text starting at position X for the length you specify, which would be the closing tag position minus the end of the opening tag position.
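The string position steps can be sketched as follows, again in Python rather than Visual Basic (str.find plays the role of strstr or strpos). The function name text_after_id is hypothetical, and the sketch assumes the element contains plain text with no nested tags.

```python
def text_after_id(source, element_id):
    """Return the text between the opening tag carrying the id and the next closing tag."""
    marker = 'id="' + element_id + '"'
    marker_pos = source.find(marker)           # position of id="name"
    if marker_pos == -1:
        return None
    open_end = source.find(">", marker_pos)    # end of the opening tag
    if open_end == -1:
        return None
    close_start = source.find("</", open_end)  # start of the closing tag
    if close_start == -1:
        return None
    # Substring from just after the opening tag up to the closing tag.
    return source[open_end + 1:close_start]

page = '<p>Hello</p><span id="name">Brandon</span>'
print(text_after_id(page, "name"))
```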
HTML / XML Library
There are probably a ton of HTML / XML libraries that will parse the document into some sort of object or array. You can then loop through these elements until you find the one you are looking for. Some of these libraries may even have functions that search by element ID, similar to how JavaScript can look up a specific element.
These libraries may be hard to get started with, but they will offer you a lot of options in the future if you need to continue finding more HTML elements.
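As one illustration of the library approach, the sketch below uses the html.parser module from Python's standard library; a .NET program would use a different library. The class name IdTextFinder, the function find_text_by_id, and the short list of void tags are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag (a partial list, enough for this sketch).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class IdTextFinder(HTMLParser):
    """Collects the text inside the first element whose id matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # greater than 0 while inside the target element
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

def find_text_by_id(source, element_id):
    finder = IdTextFinder(element_id)
    finder.feed(source)
    finder.close()
    return "".join(finder.parts) if finder.found else None

print(find_text_by_id('<div><span id="name">Brandon</span> other</div>', "name"))
```

Unlike the regular expression and string position approaches, this keeps working when the target element contains nested tags.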

How do I hard code HTML in VB for the Body of an email?

Sorry if that title wasn't very easy to get the gist of. This is my first post.
Basically, within my program is the option to send an email. It's an E-Ticket. I have set 'IsBodyHtml' to true. It sends fine. No problems at all.
Within the HTML code however I want to insert some fields that are relevant to each customer.
When I set ETicket.Body to the HTML code, I get a number of errors because words such as 'Width' and 'Height' are being picked up as VB words.
As a short term fix so I could test that the HTML body actually works I put the code into a rich text box and then set ETicket.Body = RichTextBox1.Text . It works, but doesn't have the data in it that I want.
The data relevant to each customer is held in an array. Any idea how I can get the HTML code to be accepted by VB? Or how I can get my data from the array into the relevant position in the rich text box?
Thank you!
Joe
This is likely due to the double quotes in the HTML markup. Try doing a find and replace on the HTML, replacing double quotes (") with single ones (').
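As a rough illustration of combining the single-quote markup with per-customer fields, here is a minimal sketch in Python rather than VB. The template text, the field names, and the [name, seat] array layout are assumptions and are not taken from the original program.

```python
import html
from string import Template

# HTML attributes use single quotes, so the markup sits inside an
# ordinary double-quoted string literal without any escaping.
TICKET_HTML = Template(
    "<table width='400' height='200'>"
    "<tr><td>Name:</td><td>$name</td></tr>"
    "<tr><td>Seat:</td><td>$seat</td></tr>"
    "</table>"
)

def build_body(customer):
    """Fill the template from a [name, seat] array (hypothetical layout)."""
    name, seat = customer
    # Escape the values so characters such as < or & cannot break the markup.
    return TICKET_HTML.substitute(name=html.escape(name), seat=html.escape(seat))

print(build_body(["Joe", "12A"]))
```

In VB the same idea would be a string with placeholders that are replaced from the array before the result is assigned to ETicket.Body.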

How do I grab particular google results with vb.net? to listbox

I know how to make the vb program go to Google. I even know how to navigate around, but I don't know how to manipulate the results.
Basically I want the program to grab search results from Google and output them to a listbox. So if the user searches for burgers, then the search results would be output to a listbox. Does anyone know how to do this?
Here's my code so far:
Public Class Form1
    Dim look, retrieve As String
    Private Sub Search_Click(sender As Object, e As EventArgs) Handles Search.Click
        look = InputBox("What are you looking for?")
        look = look.Replace(" ", "+")
        Dim G1 As String = "http://www.google.co.uk/#hl=en&tbo=d&output=search&sclient=psy-ab&q="
        WebBrowser1.Navigate(G1 + look)
        retrieve = InputBox("What links do you want to retrieve?")
    End Sub
End Class
I know it is easier to use the Google API, but it is also a lot slower. I've used the API in the past and have seen performance issues. I've just seen in another thread how to download a website's source pretty quickly. I just don't know how to grab the URLs from the downloaded source. Is anyone here any good with string manipulation?
Code so far:
sourcecode = ((New Net.WebClient).DownloadString(G1 + look))
If you look into XPATH and are not averse to using open source third-party tools, the HTML Agility Pack (Code Examples) is supposed to be a great tool for parsing HTML.
Another option, which can be a pain, is to convert the source HTML string into a valid XML document and then parse it using VB's XML namespace. I have done this in an application I use to parse YouTube playlists. The issue with this approach is that it takes a bit of manual cleaning of the HTML string before you can turn it into an XML document.
Lastly you could try to digest the html string using string methods only, however this is going to be error prone and will again depend very largely on the structure of the document.
Whichever parsing method you choose, Google search results currently contain a div with the ID 'search'. From a purely string standpoint, you could search for it in your source string like this:
dim searchTerm as string = "<div id=""search"""
dim searchLoc as integer = 0
searchLoc = sourceCode.indexOf(searchTerm)
Once you know where the search results section starts, you can search first for "<li class=""g""" tokens and then for "<h3 class=""r""" tokens inside those. The result text is inside the h3. You would consume up to the first </h3> and </li> respectively to get the tokens.
Once you have this text, you would need to sanitize it by searching through it and removing the HTML tags. You could easily write an algorithm to consume just the link text by looping through the indexes of key characters.
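The token scanning described above can be sketched as follows, in Python rather than VB. The function name extract_titles is hypothetical, the markup is assumed to match the description in this answer (Google's real markup may differ or have changed), and nested li elements inside a result are not handled.

```python
import re

TAG = re.compile(r"<[^>]+>")  # crude tag stripper for the sanitizing step

def extract_titles(source):
    """Return the text of each <h3 class="r"> inside each <li class="g">."""
    titles = []
    pos = source.find('<div id="search"')
    if pos == -1:
        return titles
    while True:
        item = source.find('<li class="g"', pos)
        if item == -1:
            break
        item_end = source.find("</li>", item)
        if item_end == -1:
            item_end = len(source)
        h3 = source.find('<h3 class="r"', item, item_end)
        if h3 != -1:
            open_end = source.find(">", h3, item_end)   # end of the <h3 ...> tag
            close = source.find("</h3>", h3, item_end)  # start of </h3>
            if open_end != -1 and close != -1:
                inner = source[open_end + 1:close]
                titles.append(TAG.sub("", inner).strip())
        pos = item_end  # continue after this result
    return titles

sample = (
    '<div id="search"><ol>'
    '<li class="g"><h3 class="r"><a href="http://a.example/">Best <em>burgers</em></a></h3></li>'
    '<li class="g"><h3 class="r"><a href="http://b.example/">Burger recipes</a></h3></li>'
    '</ol></div>'
)
for title in extract_titles(sample):
    print(title)
```

Each title could then be added to the listbox; extracting the href values would follow the same find-and-slice pattern.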
The whole point is to break it down into smaller pieces incrementally and then digest the smaller pieces. No matter how you approach it you are going to be doing this. However using a parser of some kind and utilizing the power of XPATH selector expressions would make it much easier than manually generating the tokens.
The pure string way is going to be the most difficult and also the slowest way to try and accomplish this. I would highly recommend trying to find a way to do it with some form of HTML parser otherwise you may go mad before you get a working solution.
As a final note, it looks like you are using a WebBrowser control on your form. You can use this control and its related classes to parse the HTML of the pages it retrieves. I have done this before, and while it is not the most efficient way of scraping the web, it can be very easy. Look into the HtmlDocument class for methods that work with the objects this control returns.

Convert text to image in Microsoft Word

I have a large book written in Microsoft Word and want to create a macro that will find all text using a predefined style and convert that text to an inline image. This text will be in Arabic and generally no longer than 4-5 lines. Is this possible?
UPDATE: Here's an example to show what I'm referring to:
I want to replace that entire line in Arabic with an image (as if I cropped this attached image to only include the Arabic and then replaced the line in Arabic with the image).
The reason I want a macro or script to do this is because there are hundreds of such lines and updating them one by one is cumbersome plus that will make modifications difficult later on.
UPDATE2: I found an interesting option here: http://windowssecrets.com/forums/showthread.php/31344-Convert-Text-to-an-Image-of-Text-in-VBA-(Office-2000-Sr1a)
It looks like you can cut a piece of text and then "Paste Special" as an image. So if there's a way to automate that that might work.
This is not an answer although I hope it will grow into a community answer. At the moment it is an exploration of what is required to solve the problem.
I know from the discussion when this question was posted on Super User that Abdullah wishes to publish his book on Kindle. So the question is really about how to get a document in English and Arabic ready for publication as an e-Book.
The Kindle does not support Arabic. The number of languages it does support is slowly increasing but there is no evidence I can find that Amazon has plans to add Arabic in the foreseeable future.
The format behind an Amazon e-Book is a cut-down version of HTML. If a Word document containing Arabic letters is exported to HTML, the Arabic letters are included as character entities; for example: “&#64336; &#64337; &#64338; &#64339;”. Importing the original Word or the HTML version to Kindle results in the leading bits being discarded, so these characters are displayed as P, Q, R and S instead of “ﭐ ﭑ ﭒ ﭓ” (Alef Wasla isolated form, Alef Wasla final form, Beeh Wasla isolated form and Beeh Wasla final form).
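The P, Q, R, S effect is consistent with only the low byte of each code point surviving. The short Python sketch below illustrates that arithmetic; the assumption that exactly the low eight bits are kept is mine and is not something confirmed about the Kindle importer, and the function name keep_low_byte is hypothetical.

```python
import html

def keep_low_byte(text):
    """Discard everything above the low 8 bits of each character's code point."""
    return "".join(chr(ord(ch) & 0xFF) for ch in text)

entities = "&#64336;&#64337;&#64338;&#64339;"
arabic = html.unescape(entities)   # the four Arabic presentation forms
for ch in arabic:
    print(hex(ord(ch)), "->", keep_low_byte(ch))
print(keep_low_byte(arabic))       # PQRS
```

For example, 64336 is 0xFB50, and 0x50 is the code for the letter P.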
I have tried Abdullah’s idea of saving some Arabic letters in a PNG file and creating an HTML file containing <p> … </p> <img src="Arabic.png"> <p> … </p>. The appearance of this file on my Kindle 2 is perfectly acceptable, so this has the potential to be a solution. The question is: how can the necessary conversions be performed?
We need to extract each Arabic string from either the Word document or its HTML equivalent and import it into a program that can convert them to PNG files.
The only way that I know of automating this would be to copy each string to a slide within PowerPoint. With PowerPoint’s SaveAs option it is possible to save each slide as a separate PNG file. The slides are named: SLIDE1.PNG, SLIDE2.PNG, SLIDE3.PNG and so on in sequence which would allow a macro to relate the results to the original strings. It would then be possible to replace the Arabic strings in the HTML file with the image elements. None of this would be too difficult to automate but there is a problem with the slides all being the size of the PowerPoint page. The page could be made smallish but what we need is for each slide to be cropped to just bigger than that slide’s text. I cannot think of any way of automating this cropping.
Does anyone have a better approach than converting each Arabic phrase to a PNG file?
I have been looking for PNG editors with some sort of command line interface but can find nothing that would be easier than using PowerPoint. Does anyone know of an alternative to PowerPoint?
Does anyone have any suggestions for automating the cropping of each image? When a string is placed in a PowerPoint slide it is possible to set its width to, say, 6.5cm (which looks good on my Kindle) and get the height determined by PowerPoint. This could be saved for later use if anyone knows how to use it.
Implementing solution
Pending any suggestions for improving the approach described above, the following outlines how I would implement it.
I would not attempt to process the Word document. I would save it as a Web Page, Filtered HTML file, which is a required step on the way to creating a Kindle eBook, and process that.
Within the HTML file created from my test document, the Arabic phrase comes out as:
<p class="MsoNormal"></p>
<p class="MsoNormal" align="center" style="text-align:center"><span dir="RTL"
style="font-size:24.0pt;font-family:Arial">
&#64336;&#64337;&#64338;&#64339;&#64340;&#64341;
&#64342;&#64343;&#65153;&#65154;&#65276;&#65275;
&#65274;&#65273;&#65246;&#65226;&#65227;&#65228;
</span><span style="font-size:24.0pt"></span></p>
<p class="MsoNormal"></p>
<p class="MsoNormal"></p>
I assume Abdullah's document will result in something similar. Note 1: the above is a random collection of Arabic letters. Note 2: they are held left-to-right in reading sequence even though, when displayed or printed, they are read right-to-left.
The whole of this block will have to be replaced with something like:
<br><img src="xxxx.png"><br>
where the file xxxx.png holds an image of the Arabic text.
The file names, such as xxxx.png, could be systematic (A001.png, A002.png, ...) but I would have thought that transliterating the first ten or twenty characters of the phrase from the Arabic to English alphabets and using the result, with a numeric suffix, as the file name would be more convenient.
I would hold the records necessary to manage the process in an Excel worksheet. I would place the VBA code in the same workbook.
The steps in the conversion process that I envisage are:
1. VBA macro to extract Arabic strings from latest HTML file and add new strings to the Excel worksheet. (More about the Excel worksheet later.)
2. VBA macro to create PowerPoint file, with one slide per new string, and use SaveAs in PNG format to create one PNG file per slide before discarding the PowerPoint file.
3. Human to crop each PNG file. (There appears to be no way of automating the cropping so this task will be minimised by use of data in the Excel worksheet.)
4. VBA macro to rename each slide from SLIDEnnn.PNG to its permanent name and to record the permanent name in the Excel worksheet.
5. VBA macro to update the latest HTML file by replacing the block containing the Arabic phrase with the appropriate HTML IMG element.
The Excel worksheet needs two columns: Arabic phrase and PNG file name. If there is any risk of the worksheet being sorted between steps 2 and 4, we may need a sequence number as well.
Macro 1 will extract an Arabic phrase from the HTML file, look down the list in the worksheet for this phrase and add the phrase at the bottom if it is not already present.
Macro 2 will look for phrases in the worksheet that do not have a PNG file name. These new phrases are the ones to be written to the PowerPoint presentation. That is, a phrase only goes into this process once.
Task 3, cropping each PNG file, will be a pain. All I can say is that it will only be once per phrase.
Macro 4 will assume that the SLIDE001.PNG, SLIDE002.PNG, … are in the sequence of phrases without PNG files in the worksheet. If this might not be true (because the worksheet has been sorted) we will either need a sequence number or to retain the PowerPoint file. The macro will assign a unique name to each new phrase, record this name in the worksheet and rename the PNG file.
Macro 5 creates a new copy of the latest HTML file using the contents of the worksheet to determine which phrase to replace with which PNG file.
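The extraction in Macro 1 and the replacement in Macro 5 can be sketched as follows. This is in Python rather than VBA, a plain dictionary stands in for the Excel worksheet, and the function names, the pattern, and the file name A001.png are all assumptions for illustration. The pattern assumes each Arabic phrase sits in a span with dir="RTL" that is the first element of its paragraph, as in the test document above.

```python
import re

# A paragraph whose first child is an RTL span; group 1 is the Arabic phrase.
RTL_BLOCK = re.compile(
    r'<p\b[^>]*>\s*<span\s+dir="RTL"[^>]*>(.*?)</span>.*?</p>',
    re.IGNORECASE | re.DOTALL,
)

def normalise(phrase):
    """Collapse line breaks and runs of spaces so a phrase can be used as a key."""
    return " ".join(phrase.split())

def extract_phrases(html_text):
    """Macro 1 (sketch): list each distinct Arabic phrase in document order."""
    phrases = []
    for match in RTL_BLOCK.finditer(html_text):
        phrase = normalise(match.group(1))
        if phrase not in phrases:
            phrases.append(phrase)
    return phrases

def replace_phrases(html_text, png_names):
    """Macro 5 (sketch): swap each known phrase block for an IMG element."""
    def swap(match):
        name = png_names.get(normalise(match.group(1)))
        if name is None:
            return match.group(0)  # phrase not in the worksheet yet: leave as-is
        return '<br><img src="' + name + '"><br>'
    return RTL_BLOCK.sub(swap, html_text)

doc = (
    '<p class="MsoNormal"></p>\n'
    '<p class="MsoNormal" align="center"><span dir="RTL"\n'
    'style="font-size:24.0pt">\n&#64336;&#64337;\n&#64338;\n'
    '</span><span style="font-size:24.0pt"></span></p>\n'
    '<p class="MsoNormal">English</p>'
)
phrases = extract_phrases(doc)
print(replace_phrases(doc, {phrases[0]: "A001.png"}))
```

Paragraphs without an RTL span, and phrases with no PNG file name recorded yet, are left unchanged.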
This process is not ideal but it will achieve the desired result and has no obvious complications. Any suggestions for improving it?
Before you begin these instructions, press record in the Microsoft Word macro editor, so you can see what the VBA code is.
I'm wondering whether this would be easier if you converted the docx file to .rtf (rich text format) and replaced that line with an image. Go to File > Save As.. and name it "old.rtf", then replace the line with an image, Save As.. again and name it "new.rtf", and then use Beyond Compare or your favorite diff program to see what changed. It should be easy to do this programmatically if you choose to. I think working in text would be easier than Microsoft's binary format unless you can find a good library to modify their doc or docx formats.
Sub CopySelPasteAsPicture()
    ' Take a picture of a selection and paste it at the
    ' document end
    With Selection
        .CopyAsPicture
    End With
    ActiveDocument.Content.Select
    With Selection
        .Collapse Direction:=wdCollapseEnd
        .TypeParagraph
        .TypeParagraph
        .PasteSpecial DataType:=wdPasteMetafilePicture
    End With
End Sub

VB.NET: Preserve image metadata when moved to clipboard as an image

Visual Studio 2010, .NET 4, VB.NET
Hello,
I am writing a little program to convert LaTeX snippets to images which can be pasted into whatever program one can paste images into. It's working alright but the next obvious step is to include the source LaTeX code as a piece of metadata in the image so that the results can be modified without having to retype everything.
I have succeeded in adding a title PropertyItem with the latex encoded as an ASCII byte array as its value (id=800, type=2, value=System.Text.Encoding.ASCII.GetBytes(codestring)). I verify that the PropertyItem is really there before trying to put the image on the clipboard.
Then I do Clipboard.SetImage(myImage). The result is all of the PropertyItems are removed (my title plus anything else that was there)! I check this by doing MsgBox(Clipboard.GetImage.PropertyItems.Count.ToString) which gives zero.
This makes me very sad. Anyone know what's up?
Thanks in advance!
Brian
Update: I have figured out how to move the image onto the clipboard and then back off while preserving the PropertyItems like so:
Dim Format As DataFormats.Format = DataFormats.GetFormat(GetType(Image).FullName)
Dim dataObject As New DataObject
dataObject.SetData(Format.Name, image)
Clipboard.SetDataObject(dataObject)
Dim copiedImage As Image = CType(Clipboard.GetDataObject.GetData(Format.Name), Image)
This way, the copiedImage has the same PropertyItems as the original. However, new problem:
Other programs don't recognize what's on the clipboard as an image anymore, which defeats the whole purpose. I.e., if I put an image on the clipboard this way, when I try pasting into some context that accepts pasted images, nothing happens.
What to do?!
I believe the Windows clipboard image has no metadata. If you change the format of the image to add metadata, it is no longer a clipboard image. If the other programs can accept it, you could copy and paste the image file (instead of the image) to the clipboard, and the metadata will of course be intact when it's read by the target app.
Have you tried Clipboard.SetData or Clipboard.SetDataObject? SetImage only copies the image in bitmap format, so I am not surprised that it strips the property items. You might try:
Clipboard.SetData(DataFormats.EnhancedMetafile, myImage)
or
Clipboard.SetData(DataFormats.MetafilePict, myImage)