I have been developing an application in Julia using the Genie framework and Stipple. Its main task is to apply the Sobel and Prewitt operators to an image. The problem I am struggling with is the uploader component. I can upload an image, and on a button click the image is transformed, but when I upload another image and try to display its transformed version, the output is still the old image. While looking for the cause I noticed that the QUploader API has methods that might help, reset() and removeUploadedFiles(), but I do not know how to call them from Julia. Are there any solutions available?
const FILE_PATH = "public/sample.jpg"
const FINAL_PATH = "final.jpg"
#const IMGPATH = "demo.png"
model = Model |> init
on(model.process_s3) do _
model.imageurl[] = ""
#info "Working"
img = FileIO.load(FILE_PATH)
img_gray = Gray.(img)
#info img_gray
sobel_image = convert(Array{Float64}, img_gray)
lastImage = clamp01nan.(sobel(sobel_image, sobel3_kernel_x, sobel3_kernel_y))
save(joinpath(@__DIR__, "public", FINAL_PATH), lastImage)
model.imageurl[] = "/$FINAL_PATH#$(Base.time())" * string(rand())
#info model.imageurl[]
if (model.process_s3[])
model.process_s3[] = false
end
end
function ui(model)
[
page( model,
class = "container",
title = "Card Demo",
partial = true,
[
row( # row takes a tuple of cells. Creates a `div` HTML element with a CSS class named `row`.
cell([h1("Edge Detection Project")]),
)
row(
[
cell(class="st-module", [
h2("Initial Image"),
card(
class = "q-pa-md row items-start q-gutter-md",
uploader(
label = "Upload Image",
method = "POST",
:multiple,
url = "http://localhost:8000/upload",
field__name = "img",
:finish="finished",
ref="uploader"
),
),
btn("Sobel 3x3",color="primary", @click("process_s3 = true")),
])
cell(class="st-module", [
h2("Transformed Image"),
card(
class = "q-pa-md row items-start q-gutter-md",
#quasar(:img, src=:imageurl, spinner__color="white", style="height: 300px; max-width: 350px")
imageview(src=:imageurl, spinner__color="white", style="height: 250px; max-width: 250px")
),
])
],
)
],
),
]
end
route("/") do
html(ui(model), context = @__MODULE__)
end
route("/upload", method = POST) do
if infilespayload(:img)
#info Requests.filename(filespayload(:img))
open(FILE_PATH, "w") do io
write(io, filespayload(:img).data)
#info File
end
else
#info "No image uploaded"
end
Genie.Renderer.redirect(:get)
end
# isrunning(:webserver) || up()
Replace:
"/$FINAL_PATH#$(Base.time())"
with
"/$(FINAL_PATH)?t=$(Base.time())"
Explanation:
The # only adds a fragment (anchor) to the URL. The browser never sends the fragment to the server, so from its point of view the image URL has not changed and it has no reason to download the file again; it simply serves the cached copy.
Adding a query string with ? changes the URL that is actually requested, so every new value is treated as a different resource. The cache is therefore not used and a fresh copy is requested.
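To make the difference concrete, here is a small JavaScript sketch. The helper names cacheBust and requestTarget are invented for this example, and example.test is only a placeholder host. It appends a changing query parameter and shows that a fragment never becomes part of what is requested, while a query string does.

```javascript
// Hypothetical helper: append a changing query parameter so the browser
// treats each newly generated image as a different resource.
function cacheBust(path, stamp = Date.now()) {
  const sep = path.includes("?") ? "&" : "?";
  return `${path}${sep}t=${stamp}`;
}

// What is actually requested from the server: path plus query string.
// The fragment (everything after #) stays in the browser.
function requestTarget(url) {
  const u = new URL(url, "http://example.test"); // placeholder base URL
  return u.pathname + u.search;
}

console.log(requestTarget("/final.jpg#1700000000"));             // /final.jpg
console.log(requestTarget(cacheBust("/final.jpg", 1700000000))); // /final.jpg?t=1700000000
```

The same idea carries over to the Julia string interpolation above: only the part before # takes part in the request.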
I finally got html2pdf to render my web page in the PDF exactly the way I want (no other size displayed correctly, so I kept adjusting the format size until everything fit). The end result looks right, EXCEPT that, although the aspect ratio is correct for landscape, the PDF uses a very large image and is not standard Letter size (or A4, for that matter); it is the custom size I set. This makes the PDF larger than necessary, and it does not print well unless we adjust it for the printer. I basically want this exact image converted to A4 or Letter size to get a smaller PDF. If I don't use the size I set, though, things are cut off.
Is there any way to take the generated PDF and resize it to A4 (still fitting the image on the page)? Everything I try fails, and I feel like I am missing something simple.
const el = document.getElementById("test");
var opt = {
margin: [10, 10, 10, 10],
filename: label,
image: { type: "jpeg", quality: 0.98 },
//pagebreak: { mode: ["avoid-all", "css"], after: ".newPage" },
pagebreak: {
mode: ["css"],
avoid: ["tr"],
// mode: ["legacy"],
after: ".newPage",
before: ".newPrior"
},
/*pagebreak: {
before: ".newPage",
avoid: ["h2", "tr", "h3", "h4", ".field"]
},*/
html2canvas: {
scale: 2,
logging: true,
dpi: 192,
letterRendering: true
},
jsPDF: {
unit: "mm",
format: [463, 600],
orientation: "landscape"
}
};
var doc = html2pdf()
.from(el)
.set(opt)
.toContainer()
.toCanvas()
.toImg()
.toPdf()
.save()
I struggled with this a lot as well, and in the end I was able to resolve it. What did the trick for me was setting the width property in html2canvas. My application has a fixed width, and setting the html2canvas width to the width of my application scaled the PDF to fit on an A4 page.
html2canvas: { width: element_width},
Try adding the option above to see if it works: find the width of your print area in pixels and replace element_width with that value.
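As a rough aid for choosing that width, the arithmetic of fitting a fixed-width element onto a standard page can be sketched as below. This is plain geometry, not a description of how html2pdf works internally; the function names and the 1385 px example width are assumptions made for illustration.

```javascript
// Standard page sizes in millimetres, given as portrait [width, height].
const PAGE_MM = { a4: [210, 297], letter: [215.9, 279.4] };

// Printable width once equal left and right margins are removed.
function printableWidthMm(format, orientation, marginMm) {
  const [w, h] = PAGE_MM[format];
  const pageWidth = orientation === "landscape" ? h : w;
  return pageWidth - 2 * marginMm;
}

// Millimetres covered by one CSS pixel when an element of the given pixel
// width is scaled to fill the printable width.
function mmPerPixel(elementWidthPx, format, orientation, marginMm) {
  return printableWidthMm(format, orientation, marginMm) / elementWidthPx;
}

console.log(printableWidthMm("a4", "landscape", 10)); // 277
console.log(mmPerPixel(1385, "a4", "landscape", 10)); // about 0.2 mm per pixel
```

If the resulting millimetres per pixel make the text too small to read, the layout probably needs a narrower print width rather than a larger page format.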
For completeness: I am using Plotly Dash to create web user interfaces. My interface includes a button that generates a PDF report of the dashboard when clicked. The code I used is below, in case anybody is looking for a Dash solution. To get this working in Dash, download html2pdf.bundle.min.js and copy it to the assets/ folder. The PDF file is downloaded to the browser's default downloads folder (the browser might show a download prompt, although it did not for me).
from dash import Dash, Input, Output, html, clientside_callback
import dash_bootstrap_components as dbc
# Define your Dash app in the regular way
app = Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
# In the layout define a component that will trigger the download of the
# PDF report. In this example a button will be responsible.
app.layout = html.Div(
id='main_container',
children = [
dbc.Button(
id='button_download_report',
children='Download PDF report',
className='me-1')
])
# Clientside callbacks allow you to directly insert Javascript code in your
# dashboards. There are also other ways, like including your own js files
# in the assets/ directory.
clientside_callback(
'''
function (button_clicked) {
if (button_clicked > 0) {
// Get the element that you want to print. In this example the
// whole dashboard is printed
var element = document.getElementById("main_container")
// create a date-time string to use for the filename
const d = new Date();
const pad = (n) => n.toString().padStart(2, "0");
// getDate() is the day of the month; getDay() would return the weekday (0-6)
let text = d.getFullYear().toString() + pad(d.getMonth() + 1) + pad(d.getDate()) + '-' + pad(d.getHours()) + pad(d.getMinutes());
// Set the options to be used when printing the PDF
// offsetWidth is the rendered width in pixels; style.width is a string that is empty unless set inline
var main_container_width = element.offsetWidth;
var opt = {
margin: 10,
filename: text + '_my-dashboard.pdf',
image: { type: 'jpeg', quality: 0.98 },
html2canvas: { scale: 3, width: main_container_width, dpi: 300 },
jsPDF: { unit: 'mm', format: 'A4', orientation: 'p' },
// Set pagebreaks if you like. It didn't work out well for me.
// pagebreak: { mode: ['avoid-all'] }
};
// Execute the save command.
html2pdf().from(element).set(opt).save();
}
}
''',
Output(component_id='button_download_report', component_property='n_clicks'),
Input(component_id='button_download_report', component_property='n_clicks')
)
I am trying to mimic the autosave function of GMS v3 so that I can use it in versions 1 and 2. I would first like to acknowledge that the bulk of the script originates from Dr Bernhard Schaffer's "How to script... Digital Micrograph Scripting Handbook". I have modified it a bit so that any new image recorded by the camera is automatically saved to a file. However, I ran into a problem: if I click on the live-view image and move it around, or use live FFT, the live-view image or the FFT image is saved as well. One idea is to use tag group information such as "Acquisition:Parameters:Parameter Set Name", because for live view or live FFT this would be either Search or Focus mode. Another idea is to use the document ID, e.g. iDocID = idoc.ImageDocumentGetID(), to locate the live image. However, I do not know how to use this information to exclude those images from autosaving. Can anyone point me to how I can proceed with this script?
Below is the script
Class PeriodicAutoSave : Object
{
Number output
PeriodicAutoSave(Object self) Result("\n Object ID"+self.ScriptObjectGetID()+" created.")
~PeriodicAutoSave(Object self) Result("\n Object ID"+self.ScriptObjectGetID()+" destroyed")
Void Init2(Object self, Number op)
output=op
Void AutoSave_SaveAll(Object self)
{
String path, name, targettype, targettype1, area, mag, mode, search, result1
ImageDocument idoc
Number nr_idoc, count, index_i, index, iDocID, iDocID_search
path = "c:\\path\\"
name = "test"
targettype = "Gatan Format (*.dm4)"
targettype1 = "dm4"
If (output) Result("\n AutoSave...")
nr_idoc = CountImageDocuments()
For (count = 1; count<nr_idoc; count++)
{
idoc = GetImageDocument(count) //imagedocument
index = 1 // user decide the index to start with
index_i= nr_idoc - index
If (idoc.ImageDocumentIsDirty())
{
idoc = getfrontimagedocument()
iDocID = idoc.ImageDocumentGetID()
TagGroup tg = ImageGetTagGroup(idoc.ImageDocumentGetImage(0)) // cannot declare an 'img' for this line as it will prompt an error?
tg.TagGroupGetTagAsString("Microscope Info:Formatted Indicated Mag", mag)
Try{
{
idoc.ImageDocumentSavetoFile( "Gatan Format", path+index_i+"-"+name+"-"+mag+".dm4")
idoc.ImageDocumentSetName(index_i + "-"+name+"-"+mag+".dm4")
idoc.ImageDocumentClean()
}
If (Output) Result("\n\t saving: "+idoc.ImageDocumentGetCurrentFile())
}
Catch{
Result("\n image cannot be saved at the moment:" + GetExceptionString())
Break
}
Result("\n Continue autosave...")
}
}
}
}
Void LaunchAutoSave()
{
Object obj = Alloc(PeriodicAutoSave)
obj.Init2(2)
Number task_id = obj.AddMainThreadPeriodicTask("AutoSave_SaveALL",6)
//Sleep(10)
while(!shiftdown()) 1==2
RemoveMainThreadTask(task_id)
}
LaunchAutoSave()
Thank you very much for your pointers! I have tried it and it works very well with my script. Because TagGroupDoesTagExist only checks whether the tag exists, I modified it further to also test the tag values I want to filter on, e.g. "Search" or "Focus", and it seems to work well. My modification to your script is below:
If (idoc.ImageDocumentIsDirty())
{
//now find out what is a filter condition and skip if it is true
skip = 0
TagGroup tg = idoc.ImageDocumentGetImage(0).ImageGetTagGroup()
tg.TagGroupGetTagAsString("Microscope Info:Formatted Indicated Mag", mag)
tg.TagGroupGetTagAsString("Acquisition:Parameters:Parameter Set Name", mode)
skip = tg.TagGroupDoesTagExist("Acquisition:Parameters:Parameter Set Name")
if(skip && (mode == "Search" || mode== "Focus")) continue
Your idea of filtering is a good one, but there is something strange about your for loop.
In
nr_idoc = CountImageDocuments()
For (count = 1; count<nr_idoc; count++)
{
idoc = GetImageDocument(count) //imagedocument
you iterate over all currently open imageDocuments (except the first one!?) and get them one by one, but then in
If (idoc.ImageDocumentIsDirty())
{
idoc = getfrontimagedocument()
you actually get the front-most (selected) document instead each time. Why are you doing this?
Why not go with:
number nr_idoc = CountImageDocuments()
for (number count = 0; count<nr_idoc; count++)
{
imagedocument idoc = GetImageDocument(count)
If (idoc.ImageDocumentIsDirty())
{
// now find out what is a filter condition and skip if it is true
number skip = 0
TagGroup tg = idoc.ImageDocumentGetImage(0).ImageGetTagGroup()
skip = tg.TagGroupDoesTagExist("Microscope Info:Formatted Indicated Mag")
if (skip) continue
// do saving
}
}
For years, I have been using Google Cloud Print (GCP) to print standardized labels in our laboratories on campus, using a Google Apps Script (GAS) custom HtmlService form.
Now that GCP is being deprecated, I am searching for a replacement. I have found a few options but am struggling to convert the file to PDF, as these other vendors require.
Currently, when you submit a text/html blob to the GCP servers from GAS, the backend converts the blob to application/pdf (as evidenced by the job details in the GCP panel in Chrome under 'content type').
Because these other cloud print services require PDF input, I have tried for some time to have GAS convert the file to PDF before sending it to GCP, and I always get a strange result. Below I show some of the strategies I have used, with pictures of one of our simple labels generated by the different functions.
The following is the base code for the ticket and payload that has worked for years with GCP:
//BUILD PRINT JOB FOR NARROW TAPES
var ticket = {
version: "1.0",
print: {
color: {
type: "STANDARD_COLOR",
vendor_id: "Color"
},
duplex: {
type: "NO_DUPLEX"
},
copies: {copies: parseFloat(quantity)},
media_size: {
width_microns: 27940,
height_microns:40960
},
page_orientation: {
type: "LANDSCAPE"
},
margins: {
top_microns:0,
bottom_microns:0,
left_microns:0,
right_microns:0
},
page_range: {
interval:
[{start:1,
end:1}]
},
}
};
var payload = {
"printerid" : QL710,
"title" : "Blank Template Label",
"content" : HtmlService.createHtmlOutput(html).getBlob(),
"contentType": 'text/html',
"ticket" : JSON.stringify(ticket)
};
This generates the expected printout:
When I try to convert to PDF using the following code:
var blob = HtmlService.createTemplate(html).evaluate().getContent();
var newBlob = Utilities.newBlob(html, "text/html", "text.html");
var pdf = newBlob.getAs("application/pdf").setName('tempfile');
var file = DriveApp.getFolderById("FOLDER ID").createFile(pdf);
var payload = {
"printerid" : QL710,
"title" : "Blank Template Label",
"content" : pdf,//HtmlService.createHtmlOutput(html).getBlob(),
"contentType": 'text/html',
"ticket" : JSON.stringify(ticket)
};
an unexpected result occurs:
This comes out the same way for direct coding in the 'content' field with and without .getBlob():
"content" : HtmlService.createHtmlOutput(html).getAs('application/pdf'),
Note the createFile line in the code above, which I used to test the PDF. This file is created as expected, but of course with the wrong dimensions for label printing (I am not sure how to convert to PDF with the appropriate margins and page size); see below.
I have now tried to adopt Yuri's ideas; however, the conversion from html to document loses formatting.
var blob = HtmlService.createHtmlOutput(html).getBlob();
var docID = Drive.Files.insert({title: 'temp-label'}, blob, {convert: true}).id
var file = DocumentApp.openById(docID);
file.getBody().setMarginBottom(0).setMarginLeft(0).setMarginRight(0).setMarginTop(0).setPageHeight(79.2).setPageWidth(172.8);
This produces a document that looks like this (the picture also shows the expected output in my hand).
Does anyone have insights into:
How to format the converted PDF so that it has the appropriate height, width and margins.
How to convert to PDF in a way that would print correctly.
Here is minimal code to give a better sense of the context: https://script.google.com/d/1yP3Jyr_r_FIlt6_aGj_zIf7HnVGEOPBKI0MpjEGHRFAWztGzcWKCJrD0/edit?usp=sharing
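When cross-checking label dimensions between the print ticket (microns) and DocumentApp page sizes (points), a small conversion helper can be handy. The sketch below is plain JavaScript with invented function names; it relies only on 1 inch = 25400 microns = 72 points.

```javascript
const MICRONS_PER_INCH = 25400;
const POINTS_PER_INCH = 72;

// Print-ticket microns to page-size points.
function micronsToPoints(microns) {
  return (microns / MICRONS_PER_INCH) * POINTS_PER_INCH;
}

// Page-size points to print-ticket microns, rounded to whole microns.
function pointsToMicrons(points) {
  return Math.round((points / POINTS_PER_INCH) * MICRONS_PER_INCH);
}

console.log(micronsToPoints(27940).toFixed(1)); // 79.2
console.log(pointsToMicrons(172.8));            // 60960
```

The first value matches the 79.2 passed to setPageHeight above; the second shows what 172.8 points would be when expressed in ticket units.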
I've made the template (80 x 40 mm -- sorry, I don't know your size):
https://docs.google.com/document/d/1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY/edit?usp=sharing
And there is the script:
function myFunction() {
// input data
var matName = '<b>testing this to <u>see</u></b> if it <i>actually</i> works <i>e.coli</i>'
var disposeWeek = 'end of semester'
var prepper = 'John Ruppert';
var className = 'Cell and <b>Molecular</b> Biology <u>Fall 2020</u> a few exercises a few exercises a few exercises a few exercises';
var hazards = 'Lots of hazards';
// make a temporary Doc from the template
var copyFile = DriveApp.getFileById('1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY').makeCopy();
var doc = DocumentApp.openById(copyFile.getId());
var body = doc.getBody();
// replace placeholders with data
body.replaceText('{matName}', matName);
body.replaceText('{disposeWeek}', disposeWeek);
body.replaceText('{prepper}', prepper);
body.replaceText('{className}', className);
body.replaceText('{hazards}', hazards);
// make Italics, Bold and Underline
handle_tags(['<i>', '</i>'], body);
handle_tags(['<b>', '</b>'], body);
handle_tags(['<u>', '</u>'], body);
// save the temporary Doc
doc.saveAndClose();
// make a PDF
var docblob = doc.getBlob().setName('Label.pdf');
DriveApp.createFile(docblob);
// delete the temporary Doc
copyFile.setTrashed(true);
}
// this function applies formatting to text inside the tags
function handle_tags(tags, body) {
var start_tag = tags[0].toLowerCase();
var end_tag = tags[1].toLowerCase();
var found = body.findText(start_tag);
while (found) {
var elem = found.getElement();
var start = found.getEndOffsetInclusive();
var end = body.findText(end_tag, found).getStartOffset()-1;
switch (start_tag) {
case '<b>': elem.setBold(start, end, true); break;
case '<i>': elem.setItalic(start, end, true); break;
case '<u>': elem.setUnderline(start, end, true); break;
}
found = body.findText(start_tag, found);
}
body.replaceText(start_tag, ''); // remove tags
body.replaceText(end_tag, '');
}
The script just changes the {placeholders} with the data and saves the result as a PDF file (Label.pdf). The PDF looks like this:
One thing I am not sure is possible is changing the text size dynamically to fit the cells, as is done in your 'autosize.html'. Roughly, you could take the length of the text in a cell and, if it exceeds some number, make the font size a bit smaller. You could probably use the jQuery textfill function from 'autosize.html' to find an optimal size and apply that size in the document.
I'm not sure I understood you correctly. Do you need to make a PDF and save it on Google Drive? You can do that in Google Docs.
As example:
Make a new document with your table and text. Something like this
Add this script into your doc:
function myFunction() {
var copyFile = DriveApp.getFileById(ID).makeCopy();
var newFile = DriveApp.createFile(copyFile.getAs('application/pdf'));
newFile.setName('label');
copyFile.setTrashed(true);
}
Every time you run this script it makes the file 'label.pdf' on your Google Drive.
The size of this pdf will be the same as the page size of your Doc. You can make any size of page with add-on: Page Sizer https://webapps.stackexchange.com/questions/129617/how-to-change-the-size-of-paper-in-google-docs-to-custom-size
If you need to change the text in your label before generating the PDF, and/or change the name of the generated file, you can do that via script as well.
Here is a variant of the script that changes a font size in one of the cells if the label doesn't fit into one page.
function main() {
// input texts
var text = {};
text.matName = '<b>testing this to <u>see</u></b> if it <i>actually</i> works <i>e.coli</i>';
text.disposeWeek = 'end of semester';
text.prepper = 'John Ruppert';
text.className = 'Cell and <b>Molecular</b> Biology <u>Fall 2020</u> a few exercises a few exercises a few exercises a few exercises';
text.hazards = 'Lots of hazards';
// initial max font size for the 'matName'
var size = 10;
var doc_blob = set_text(text, size);
// if we got more than 1 page, reduce the font size and repeat
while ((size > 4) && (getNumPages(doc_blob) > 1)) {
size = size-0.5;
doc_blob = set_text(text, size);
}
// save pdf
DriveApp.createFile(doc_blob);
}
// this function takes texts and a size and put the texts into fields
function set_text(text, size) {
// make a copy
var copyFile = DriveApp.getFileById('1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY').makeCopy();
var doc = DocumentApp.openById(copyFile.getId());
var body = doc.getBody();
// replace placeholders with data
body.replaceText('{matName}', text.matName);
body.replaceText('{disposeWeek}', text.disposeWeek);
body.replaceText('{prepper}', text.prepper);
body.replaceText('{className}', text.className);
body.replaceText('{hazards}', text.hazards);
// set font size for 'matName'
body.findText(text.matName).getElement().asText().setFontSize(size);
// make Italics, Bold and Underline
handle_tags(['<i>', '</i>'], body);
handle_tags(['<b>', '</b>'], body);
handle_tags(['<u>', '</u>'], body);
// save the doc
doc.saveAndClose();
// delete the copy
copyFile.setTrashed(true);
// return blob
return doc.getBlob().setName('Label.pdf');
}
// this function formats the text between html tags
function handle_tags(tags, body) {
var start_tag = tags[0].toLowerCase();
var end_tag = tags[1].toLowerCase();
var found = body.findText(start_tag);
while (found) {
var elem = found.getElement();
var start = found.getEndOffsetInclusive();
var end = body.findText(end_tag, found).getStartOffset()-1;
switch (start_tag) {
case '<b>': elem.setBold(start, end, true); break;
case '<i>': elem.setItalic(start, end, true); break;
case '<u>': elem.setUnderline(start, end, true); break;
}
found = body.findText(start_tag, found);
}
body.replaceText(start_tag, '');
body.replaceText(end_tag, '');
}
// this function takes the saved doc and returns its number of pages
function getNumPages(doc) {
var blob = doc.getAs('application/pdf');
var data = blob.getDataAsString();
var pages = parseInt(data.match(/ \/N (\d+) /)[1], 10);
Logger.log("pages = " + pages);
return pages;
}
It looks rather awful and hopeless. It turns out that Google Docs has no page counter, so you need to convert the document to a PDF and count the pages of the PDF file. Gross!
The next problem: even if you manage to count the pages, you have no clue which cell overflowed. This script takes just one cell, changes its font size, counts pages, changes the font size again, and so on. That does not guarantee success, because another cell may also contain long text. You could reduce the font size of all the texts, but that does not look like a great idea either.
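The shrink-until-it-fits loop itself can be separated from Google Docs and tried out in isolation. In the sketch below, render and countPages are hypothetical callbacks standing in for set_text and getNumPages, and the stand-in renderer is a made-up model in which one page holds 400 units of text.

```javascript
// Reduce the font size until the rendered label fits on one page or the
// minimum size is reached. `render` and `countPages` are injected so the
// loop can be exercised without Google Docs.
function shrinkToFit(render, countPages, startSize, minSize, step) {
  let size = startSize;
  let output = render(size);
  while (size > minSize && countPages(output) > 1) {
    size -= step;
    output = render(size);
  }
  // `fits` tells the caller whether the loop succeeded or simply gave up.
  return { size, output, fits: countPages(output) <= 1 };
}

// Stand-in renderer: pretend the text needs (length * size) units of space
// and that one page holds 400 units.
const text = "a fairly long material name for a small label";
const render = (size) => ({ units: text.length * size });
const countPages = (doc) => Math.ceil(doc.units / 400);

console.log(shrinkToFit(render, countPages, 10, 4, 0.5));
```

Returning the fits flag makes the failure case explicit: when the minimum size is reached and the label still overflows, the caller can decide to shrink another cell or truncate the text.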
EDIT: OK, it was my CSS file, which had a rule on path because I use SVG a lot. I removed that rule and the problem was gone!
I'm facing something pretty annoying that I do not understand.
I'm using amCharts to make an XY chart with multiple series. Not that hard.
The thing is, I can't customize my series! Bullets and legend are OK, but not the series.
Here's a screenshot for better understanding:
MyWeirdChart (new OP can't embed images, sorry)
As you can see, my custom bullets are applied to my series and my legend is exactly what I want for my chart, BUT the series themselves stay unchanged.
Here is my JS draw function:
function drawChart(dateArray, casesArray, deathsArray, healedArray, hospitalizationsArray, reanimationsArray) {
am4core.useTheme(am4themes_animated);
var chart = am4core.create("chartdiv", am4charts.XYChart);
chart.data = generateChartData(dateArray, casesArray, deathsArray, healedArray, hospitalizationsArray, reanimationsArray);
var dateAxis = chart.xAxes.push(new am4charts.DateAxis());
var valueAxis = chart.yAxes.push(new am4charts.ValueAxis());
function pushSeries(field, name, color) {
let series = chart.series.push(new am4charts.LineSeries());
series.dataFields.valueY = field;
series.dataFields.dateX = "date";
series.name = name;
series.tooltipText = name + ": [b]{valueY}[/]";
series.stroke = am4core.color(color);
series.strokeWidth = 3;
series.fill = am4core.color(color);
series.fillOpacity = 0.5;
let bullet = series.bullets.push(new am4charts.CircleBullet());
bullet.circle.stroke = am4core.color(color);
bullet.circle.strokeWidth = 2;
bullet.circle.fill = am4core.color(color);
bullet.circle.fillOpacity = 0.5;
bullet.circle.radius = 3;
}
pushSeries("cases", "Cas confirmés", "#32B3E3");
pushSeries("healed", "Guéris", "#00C750");
pushSeries("hospitalizations", "Hospitalisations", "#FFBB33");
pushSeries("reanimations", "Réanimations", "#FE3446");
pushSeries("deaths", "Morts", "black");
chart.cursor = new am4charts.XYCursor();
chart.scrollbarX = new am4core.Scrollbar();
chart.legend = new am4charts.Legend();
chart.cursor.maxTooltipDistance = 0;
}
Did I miss something? I have searched forums and the documentation and I'm now stuck.
My code is in my webpack app.js file, but I include amCharts with HTML script tags,
<script src="https://www.amcharts.com/lib/4/core.js"></script>
<script src="https://www.amcharts.com/lib/4/charts.js"></script>
<script src="https://www.amcharts.com/lib/4/themes/animated.js"></script>
not with a webpack import. But I guess that if this were the problem, I would not be able to draw a chart at all.
OK, it was my CSS file, which had a rule on path because I use SVG a lot. amCharts draws each series as SVG path elements, so that global rule overrode the series styling. I removed the rule and the problem was gone!
I'm wondering if it is possible to export some annotations as images. I already know how to export highlighted text as text, but this doesn't work well with equations. If equations were denoted by an annotation, such as a box encircling them, could I convert them all at once to images using a pdf snapshot tool?
It is easy to do each one individually by hand with the pdf snapshot tool. Do any pdf libraries or programs have any tools that let you make image snapshots programmatically, not of whole pages, but of individual equations that are marked somehow with an annotation?
For the purposes of the question, they don't necessarily have to be free programs.
Thanks.
I came up with a full ruby based solution here, using the ruby gems pdf-reader and rmagick (along with an installation of imagemagick).
require 'pdf-reader'
require 'RMagick'
pdf_file_name='statmech' #without extension
doc = PDF::Reader.new(File.expand_path(pdf_file_name+".pdf"))
$objects = doc.objects
def convertpagetojpgandcrop(filename,pagenum,croprect,imgname)
pagename = filename+".pdf[#{pagenum-1}]"
#higher density used for quality purposes (otherwise fuzzy)
pageim = Magick::Image.read(pagename){ |opts| opts.density = 216}.first
#factors of 3 needed because higher density TODO: generalize to pdf density!=72
#SouthWestGravity puts coordinate origin in bottom left to match pdf coords
eqim = pageim.crop(Magick::SouthWestGravity,
3*croprect[0], 3*croprect[1], 3*croprect[2]-3*croprect[0], 3*croprect[3]-3*croprect[1])
eqim.write(imgname)
end
def is_square?(object)
object[:Type] == :Annot && object[:Subtype] == :Square
end
def is_highlight?(object)
object[:Type] == :Annot && object[:Subtype] == :Highlight
end
def annots_on_page(page)
references = (page.attributes[:Annots] || [])
lookup_all(references).flatten
end
def lookup_all(refs)
refs = *refs
refs.map { |ref| lookup(ref) }
end
def lookup(ref)
object = $objects[ref]
return object unless object.is_a?(Array)
lookup_all(object)
end
def highlights_on_page(page)
all_annots = annots_on_page(page)
all_annots.select { |a| is_highlight?(a) }
end
def squares_on_page(page)
all_annots = annots_on_page(page)
all_annots.select { |a| is_square?(a) }
end
def restricted_annots_on_page(page)
all_annots = annots_on_page(page)
all_annots.select { |a| is_square?(a)||is_highlight?(a) }
end
#This block exports a jpg for each 'square' annotation in pdf
doc.pages.each do |page|
eqnum=0
all_squares = squares_on_page(page)
all_squares.each do |annot|
eqnum = eqnum+1
puts "#{annot[:Rect]}"
convertpagetojpgandcrop(pdf_file_name, page.number, annot[:Rect],
pdf_file_name+"page#{page.number}eq#{eqnum}.jpg")
end
end
#This block gives the text of the highlights and wikilinks to the images
#TODO:(needs to go in text file)
doc.pages.each do |page|
eqnum = 0
annots = restricted_annots_on_page(page)
if annots.length>0
puts "# Page #{page.number}"
end
annots.each do |annot|
if is_square?(annot)
eqnum = eqnum+1
puts "{{wiki:#{pdf_file_name}page#{page.number}eq#{eqnum}.jpg}}"
else
puts "#{annot[:Contents]}"
end
end
end
This code expands upon example code for the pdf-reader and rmagick gems found online. Few of the lines are original.
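The hard-coded factor of 3 in the crop call comes from rendering at 216 DPI while PDF coordinates default to 72 units per inch. A possible generalisation of that arithmetic is sketched below. It is written in JavaScript only to keep the example easy to run on its own, rectToCrop is an invented name, and it assumes the annotation's Rect is in default PDF user space.

```javascript
// Convert a PDF Rect ([x1, y1, x2, y2] in points, origin at bottom-left)
// into pixel crop arguments for a page rasterised at `density` DPI.
function rectToCrop(rect, density) {
  const scale = density / 72; // default PDF user space: 72 units per inch
  const [x1, y1, x2, y2] = rect;
  // The two corners are not guaranteed to be ordered, so normalise them.
  const left = Math.min(x1, x2);
  const bottom = Math.min(y1, y2);
  return {
    x: Math.round(left * scale),
    y: Math.round(bottom * scale),
    width: Math.round(Math.abs(x2 - x1) * scale),
    height: Math.round(Math.abs(y2 - y1) * scale),
  };
}

console.log(rectToCrop([10, 20, 110, 70], 216));
// { x: 30, y: 60, width: 300, height: 150 }
```

With a density of 216 this reproduces the factor of 3 used in the Ruby script, and the four values map directly onto the x, y, width and height arguments of the crop call with bottom-left gravity.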
This code sample uses Amyuni PDF Creator .Net. It exports the page with only one annotation visible at a time:
using System;
using System.IO;
using System.Drawing;
using Amyuni.PDFCreator;
using System.Collections;
//open a pdf document
FileStream testfile = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read);
IacDocument document = new IacDocument(null);
document.SetLicenseKey("your license", "your code");
document.Open(testfile, "");
document.CurrentPageNumber = 1;
IacAttribute attribute = document.CurrentPage.AttributeByName("Objects");
// listobj is an array list of objects
ArrayList listobj = (System.Collections.ArrayList)attribute.Value;
ArrayList annotations = new ArrayList();
foreach (Amyuni.PDFCreator.IacObject iacObj in listobj)
{
if ((bool)iacObj.AttributeByName("Annotation").Value)
{
annotations.Add(iacObj);
// Put the annotation out of sight
iacObj.Coordinates = Rectangle.FromLTRB(
-iacObj.Coordinates.Left,
-iacObj.Coordinates.Top,
-iacObj.Coordinates.Right,
-iacObj.Coordinates.Bottom);
}
else
iacObj.Delete(false);
}
ArrayList images = new ArrayList();
int i = 0;
foreach (Amyuni.PDFCreator.IacObject iacObj in annotations)
{
// Back on sight
iacObj.Coordinates = Rectangle.FromLTRB(
-iacObj.Coordinates.Left,
-iacObj.Coordinates.Top,
-iacObj.Coordinates.Right,
-iacObj.Coordinates.Bottom);
//Draw the page
Bitmap bmp = new Bitmap(1000, 1000);
Graphics gr = Graphics.FromImage(bmp);
IntPtr hdc = gr.GetHdc();
document.DrawCurrentPage(hdc.ToInt32(), true);
gr.ReleaseHdc();
images.Add(bmp);
bmp.Save("c:\\temp\\image" + i + ".png"); // Bitmap.Save without a format writes PNG data, so use a matching extension
iacObj.Delete(false); // object not needed anymore
i++;
}
If needed, you can extract the part of the resulting image that corresponds to the annotation by using the Coordinates property of the annotation object.
If you want to extract all objects from a rectangular area (annotations or otherwise) you can replace the loop that collects annotations with a call to the method IacDocument.GetObjectsInRectangle
Usual disclaimer applies