Fiddler: Programmatically add word to Query string - scripting

Please be kind, I'm new to Fiddler.
My purpose: I want to use Fiddler as a Google search filter.
Summary:
I'm tired of manually adding "-dog" every time I use Google. I do not want "dog" appearing in my search results.
For example:
//www.google.com/search?q=cat+-dog
//www.google.com/search?q=baseball+-dog
CODE:
In the script below, "-dog" is replaced with "-torrent -watch -download".
// ==UserScript==
// @name Tamper with Google Results
// @namespace http://superuser.com/users/145045/krowe
// @version 0.1
// @description This just modifies google results to exclude certain things.
// @match http://*.google.com
// @match https://*.google.com
// @copyright 2014+, KRowe
// ==/UserScript==
function GM_main () {
window.onload = function () {
var targ = window.location;
if(targ && targ.href && targ.href.match('https?:\/\/www.google.com/.+#q=.+') && targ.href.search("/+-torrent/+-watch/+-download")==-1) {
targ.href = targ.href +"+-torrent+-watch+-download";
}
};
}
//-- This is a standard-ish utility function:
function addJS_Node(text, s_URL, funcToRun, runOnLoad) {
var D=document, scriptNode = D.createElement('script');
if(runOnLoad) scriptNode.addEventListener("load", runOnLoad, false);
scriptNode.type = "text/javascript";
if(text) scriptNode.textContent = text;
if(s_URL) scriptNode.src = s_URL;
if(funcToRun) scriptNode.textContent = '(' + funcToRun.toString() + ')()';
var targ = D.getElementsByTagName('head')[0] || D.body || D.documentElement;
targ.appendChild(scriptNode);
}
addJS_Node (null, null, GM_main);
At first I was going to go with Tampermonkey userscripts, because I did not know about Fiddler.
==================================================================================
Now, let's focus on Fiddler.
Before Request:
I want Fiddler to add text at the end of the Google query string.
Someone suggested that I use:
static function OnBeforeRequest(oSession: Session) {
if (oSession.uriContains("targetString")) {
var sText = "Enter a string to append to a URL";
oSession.fullUrl = oSession.fullUrl + sText;
}
}
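The suggested handler appends the text to the end of the whole URL, which only works when q happens to be the last parameter, and it appends again on every request. A minimal sketch of the rewriting logic in plain JavaScript (not FiddlerScript, so it would need porting before use in OnBeforeRequest); addExclusions and the EXCLUDED list are hypothetical names chosen for illustration:

```javascript
// Hypothetical list of words to exclude; substitute your own.
const EXCLUDED = ["dog"];

// Returns the URL with "-word" terms added to the q parameter.
// URLs without a q parameter are returned unchanged.
function addExclusions(fullUrl, excluded = EXCLUDED) {
  const url = new URL(fullUrl);
  const q = url.searchParams.get("q");
  if (q === null) return fullUrl; // not a search request

  // Skip terms that are already present so repeated calls do not stack them.
  const present = new Set(q.split(/\s+/));
  const missing = excluded
    .map((word) => "-" + word)
    .filter((term) => !present.has(term));
  if (missing.length === 0) return fullUrl;

  url.searchParams.set("q", [q, ...missing].join(" "));
  return url.toString();
}

console.log(addExclusions("https://www.google.com/search?q=cat"));
// https://www.google.com/search?q=cat+-dog
```

Editing only the q parameter keeps any other parameters in the URL intact.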
Before Response:
This is where my problem lies.
I totally love the HTML response. Now I just want to remove/hide the added word in the search box without changing the search results. How can it be done? Any ideas?
http://i.stack.imgur.com/4mUSt.jpg
Could you please take the above information and help me fix the problem?
Thank you

Based on the goal described above, I believe you can achieve better results with your own free Google Custom Search Engine (GCSE), because it gives you control over fine-tuning the results returned by regular Google search.
Links:
https://www.google.com/cse/all
https://developers.google.com/custom-search/docs/structured_search

HTTP request won't get data from API. Gamemaker Studio 1.4.9

I'm trying to figure out how to get information from a dictionary API in GameMaker Studio 1.4.9.
I'm lost, since I can't figure out how to get around what looks like a block on the API's server. All my request returns is a blank result.
Step Event:
if(keyboard_check_pressed(vk_space)){
http_get("https://api.dictionaryapi.dev/api/v2/entries/en/test");
}
HTTP Event:
var requestResult = ds_map_find_value(async_load, "result");
var resultMap = json_decode(requestResult);
if(resultMap == -1)
{
show_message("Invalid result");
exit;
}
if(ds_map_exists(resultMap,"word")){
var name= ds_map_find_value(resultMap, "word");
show_message("The word name is "+name);
}
Maybe my formatting is wrong? It's supposed to say the word test in the show_message function, but again, all I get returned is a blank result.
Any help would be appreciated, thanks!
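You can see in the debugger that the data does arrive from the server, but your code does not look for the word in the right place. The API returns a JSON array at the top level, and json_decode stores a top-level array as a ds_list under the key "default", so the word sits in the first entry of that list.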
https://imgur.com/a/icQSnnx
This code retrieves the word:
show_debug_message("http received")
var requestResult = ds_map_find_value(async_load, "result");
var resultMap = json_decode(requestResult);
if(resultMap == -1)
{
show_message("Invalid result");
exit;
}
if(ds_map_exists(resultMap,"default")){
var defaultList = ds_map_find_value(resultMap, "default")
var Map = ds_list_find_value(defaultList, 0)
var name= ds_map_find_value(Map, "word");
show_message("The word name is "+name);
}
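The shape of the response is easier to see outside GML. A minimal sketch in JavaScript, assuming an abbreviated sample body (real responses carry more fields than shown here):

```javascript
// Abbreviated, assumed sample of the response body: an array of entries.
const body = '[{"word":"test","meanings":[]}]';

const parsed = JSON.parse(body);

// The top level is an array, so there is no "word" key on it directly.
// The word lives on the first entry, which is what the "default" list
// lookup in the GML code above reaches.
const entries = Array.isArray(parsed) ? parsed : [parsed];
const word = entries.length > 0 ? entries[0].word : undefined;

console.log(word); // test
```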

Printing to pdf from Google Apps Script HtmlOutput

For years, I have been using Google Cloud Print to print labels in our laboratories on campus (to standardize) using a Google Apps Script custom HtmlService form.
Now that GCP is being deprecated, I am searching for a replacement. I have found a few options but am struggling to get the file to convert to a pdf, as these other vendors require.
Currently, when you submit a text/html blob to the GCP servers in GAS, the backend converts the blob to application/pdf (as evidenced by looking at the job details in the GCP panel on Chrome under 'content type').
That said, because these other cloud print services require pdf printing, I have tried for some time now to have GAS change the file to pdf format before sending to GCP and I always get a strange result. Below, I'll show some of the strategies that I have used and include pictures of one of our simple labels generated with the different functions.
The following is the base code for the ticket and payload that has worked for years with GCP:
//BUILD PRINT JOB FOR NARROW TAPES
var ticket = {
version: "1.0",
print: {
color: {
type: "STANDARD_COLOR",
vendor_id: "Color"
},
duplex: {
type: "NO_DUPLEX"
},
copies: {copies: parseFloat(quantity)},
media_size: {
width_microns: 27940,
height_microns:40960
},
page_orientation: {
type: "LANDSCAPE"
},
margins: {
top_microns:0,
bottom_microns:0,
left_microns:0,
right_microns:0
},
page_range: {
interval:
[{start:1,
end:1}]
},
}
};
var payload = {
"printerid" : QL710,
"title" : "Blank Template Label",
"content" : HtmlService.createHtmlOutput(html).getBlob(),
"contentType": 'text/html',
"ticket" : JSON.stringify(ticket)
};
This generates the expected printout:
The following is the code I used to convert to pdf:
var blob = HtmlService.createTemplate(html).evaluate().getContent();
var newBlob = Utilities.newBlob(html, "text/html", "text.html");
var pdf = newBlob.getAs("application/pdf").setName('tempfile');
var file = DriveApp.getFolderById("FOLDER ID").createFile(pdf);
var payload = {
"printerid" : QL710,
"title" : "Blank Template Label",
"content" : pdf,//HtmlService.createHtmlOutput(html).getBlob(),
"contentType": 'text/html',
"ticket" : JSON.stringify(ticket)
};
With it, an unexpected result occurs:
The result is the same when converting directly in the 'content' field, with or without .getBlob():
"content" : HtmlService.createHtmlOutput(html).getAs('application/pdf'),
Note the createFile line in the code above, which I used to test the pdf. This file is created as expected, though of course with the wrong dimensions for label printing (I'm not sure how to convert to pdf with the appropriate margins and page size); see below.
I have now tried to adopt Yuri's ideas; however, the conversion from html to document loses formatting.
var blob = HtmlService.createHtmlOutput(html).getBlob();
var docID = Drive.Files.insert({title: 'temp-label'}, blob, {convert: true}).id
var file = DocumentApp.openById(docID);
file.getBody().setMarginBottom(0).setMarginLeft(0).setMarginRight(0).setMarginTop(0).setPageHeight(79.2).setPageWidth(172.8);
This produces a document that looks like this (the picture also shows the expected output in my hand).
Does anyone have insights into:
How to format the converted pdf to contain the appropriate height, width and margins.
How to convert to pdf in a way that would print correctly.
Here is a minimal script to give a better sense of context: https://script.google.com/d/1yP3Jyr_r_FIlt6_aGj_zIf7HnVGEOPBKI0MpjEGHRFAWztGzcWKCJrD0/edit?usp=sharing
I've made the template (80 x 40 mm -- sorry, I don't know your size):
https://docs.google.com/document/d/1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY/edit?usp=sharing
And there is the script:
function myFunction() {
// input data
var matName = '<b>testing this to <u>see</u></b> if it <i>actually</i> works <i>e.coli</i>'
var disposeWeek = 'end of semester'
var prepper = 'John Ruppert';
var className = 'Cell and <b>Molecular</b> Biology <u>Fall 2020</u> a few exercises a few exercises a few exercises a few exercises';
var hazards = 'Lots of hazards';
// make a temporary Doc from the template
var copyFile = DriveApp.getFileById('1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY').makeCopy();
var doc = DocumentApp.openById(copyFile.getId());
var body = doc.getBody();
// replace placeholders with data
body.replaceText('{matName}', matName);
body.replaceText('{disposeWeek}', disposeWeek);
body.replaceText('{prepper}', prepper);
body.replaceText('{className}', className);
body.replaceText('{hazards}', hazards);
// make Italics, Bold and Underline
handle_tags(['<i>', '</i>'], body);
handle_tags(['<b>', '</b>'], body);
handle_tags(['<u>', '</u>'], body);
// save the temporary Doc
doc.saveAndClose();
// make a PDF
var docblob = doc.getBlob().setName('Label.pdf');
DriveApp.createFile(docblob);
// delete the temporary Doc
copyFile.setTrashed(true);
}
// this function applies formatting to text inside the tags
function handle_tags(tags, body) {
var start_tag = tags[0].toLowerCase();
var end_tag = tags[1].toLowerCase();
var found = body.findText(start_tag);
while (found) {
var elem = found.getElement();
var start = found.getEndOffsetInclusive();
var end = body.findText(end_tag, found).getStartOffset()-1;
switch (start_tag) {
case '<b>': elem.setBold(start, end, true); break;
case '<i>': elem.setItalic(start, end, true); break;
case '<u>': elem.setUnderline(start, end, true); break;
}
found = body.findText(start_tag, found);
}
body.replaceText(start_tag, ''); // remove tags
body.replaceText(end_tag, '');
}
The script just changes the {placeholders} with the data and saves the result as a PDF file (Label.pdf). The PDF looks like this:
One thing I'm not sure is possible: changing the size of the text dynamically so that it fits into the cells, as is done in your 'autosize.html'. Roughly, you can take the length of the text in a cell and, if it is bigger than some number, make the font size a bit smaller. You could probably also use the jquery texfill function from 'autosize.html' to get an optimal size and apply that size in the document.
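The rough length-based idea could be sketched like this. It is only a heuristic under assumed numbers: estimateFontSize, stripTags and the defaults (10 pt base, 4 pt minimum, 40 characters per cell) are hypothetical and would need tuning against the real template:

```javascript
// Remove the <b>, <i>, <u> markers so they do not count as visible text.
function stripTags(text) {
  return text.replace(/<\/?[biu]>/gi, "");
}

// Pick a font size from the visible text length alone.
// All defaults are assumptions, not measured values.
function estimateFontSize(text, opts = {}) {
  const { baseSize = 10, minSize = 4, maxChars = 40, step = 0.5 } = opts;
  const length = stripTags(text).length;
  if (length <= maxChars) return baseSize;

  // Shrink in proportion to the overflow, rounded down to the step size.
  const scaled = baseSize * (maxChars / length);
  const stepped = Math.floor(scaled / step) * step;
  return Math.max(minSize, stepped);
}

console.log(estimateFontSize("Lots of hazards")); // 10
console.log(estimateFontSize("a".repeat(80)));    // 5
```

The result could then be passed to setFontSize on the cell's text element. Character count ignores glyph width and line wrapping, so this only approximates a real fit.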
I'm not sure if I got you right. Do you need to make a PDF and save it on Google Drive? You can do that in Google Docs.
As an example:
Make a new document with your table and text, something like this.
Add this script to your doc:
function myFunction() {
var copyFile = DriveApp.getFileById(ID).makeCopy();
var newFile = DriveApp.createFile(copyFile.getAs('application/pdf'));
newFile.setName('label');
copyFile.setTrashed(true);
}
Every time you run this script it creates a PDF file named 'label' on your Google Drive.
The size of this pdf will be the same as the page size of your Doc. You can set any page size with the Page Sizer add-on: https://webapps.stackexchange.com/questions/129617/how-to-change-the-size-of-paper-in-google-docs-to-custom-size
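The page size can also be set from a script with setPageHeight and setPageWidth, as in the question. Those methods take points, while the GCP ticket uses microns. A minimal sketch of the conversion, assuming 72 points and 25400 microns per inch; the helper names are hypothetical:

```javascript
const MICRONS_PER_INCH = 25400;
const POINTS_PER_INCH = 72;

// Convert a length in microns (as in the GCP ticket) to points.
function micronsToPoints(microns) {
  return (microns / MICRONS_PER_INCH) * POINTS_PER_INCH;
}

// Convert a length in points (as used by setPageHeight/setPageWidth) to microns.
function pointsToMicrons(points) {
  return (points / POINTS_PER_INCH) * MICRONS_PER_INCH;
}

console.log(micronsToPoints(27940).toFixed(1)); // 79.2
console.log(pointsToMicrons(172.8).toFixed(0)); // 60960
```

79.2 points matches the 27940 microns in the ticket, while 172.8 points corresponds to 60960 microns, so it may be worth comparing that against the 40960 listed in the ticket.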
If you need to change the text in your label before generating the pdf, or change the name of the generated file, you can do that via script as well.
Here is a variant of the script that changes a font size in one of the cells if the label doesn't fit into one page.
function main() {
// input texts
var text = {};
text.matName = '<b>testing this to <u>see</u></b> if it <i>actually</i> works <i>e.coli</i>';
text.disposeWeek = 'end of semester';
text.prepper = 'John Ruppert';
text.className = 'Cell and <b>Molecular</b> Biology <u>Fall 2020</u> a few exercises a few exercises a few exercises a few exercises';
text.hazards = 'Lots of hazards';
// initial max font size for the 'matName'
var size = 10;
var doc_blob = set_text(text, size);
// if we got more than 1 page, reduce the font size and repeat
while ((size > 4) && (getNumPages(doc_blob) > 1)) {
size = size-0.5;
doc_blob = set_text(text, size);
}
// save pdf
DriveApp.createFile(doc_blob);
}
// this function takes texts and a size and put the texts into fields
function set_text(text, size) {
// make a copy
var copyFile = DriveApp.getFileById('1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY').makeCopy();
var doc = DocumentApp.openById(copyFile.getId());
var body = doc.getBody();
// replace placeholders with data
body.replaceText('{matName}', text.matName);
body.replaceText('{disposeWeek}', text.disposeWeek);
body.replaceText('{prepper}', text.prepper);
body.replaceText('{className}', text.className);
body.replaceText('{hazards}', text.hazards);
// set font size for 'matName'
body.findText(text.matName).getElement().asText().setFontSize(size);
// make Italics, Bold and Underline
handle_tags(['<i>', '</i>'], body);
handle_tags(['<b>', '</b>'], body);
handle_tags(['<u>', '</u>'], body);
// save the doc
doc.saveAndClose();
// delete the copy
copyFile.setTrashed(true);
// return blob
return doc.getBlob().setName('Label.pdf');
}
// this function formats the text between html tags
function handle_tags(tags, body) {
var start_tag = tags[0].toLowerCase();
var end_tag = tags[1].toLowerCase();
var found = body.findText(start_tag);
while (found) {
var elem = found.getElement();
var start = found.getEndOffsetInclusive();
var end = body.findText(end_tag, found).getStartOffset()-1;
switch (start_tag) {
case '<b>': elem.setBold(start, end, true); break;
case '<i>': elem.setItalic(start, end, true); break;
case '<u>': elem.setUnderline(start, end, true); break;
}
found = body.findText(start_tag, found);
}
body.replaceText(start_tag, '');
body.replaceText(end_tag, '');
}
// this function takes the saved doc blob and returns its number of pages
function getNumPages(doc) {
var blob = doc.getAs('application/pdf');
var data = blob.getDataAsString();
var pages = parseInt(data.match(/ \/N (\d+) /)[1], 10);
Logger.log("pages = " + pages);
return pages;
}
It looks rather awful and hopeless. It turns out that Google Docs has no page counter: you need to convert the document into a PDF and count the pages of the PDF file. Gross!
The next problem: even if you manage to count the pages, you have no clue which cell overflowed. This script takes just one cell, changes its font size, counts the pages, changes the font size again, and so on. That does not guarantee success, because another cell may contain long text as well. You could reduce the font size of all the texts, but that doesn't look like a great idea either.

Paypal Php Sdk - NotifyUrl is not a fully qualified URL Error

I have this code
$product_info = array();
if(isset($cms['site']['url_data']['product_id'])){
$product_info = $cms['class']['product']->get($cms['site']['url_data']['product_id']);
}
if(!isset($product_info['id'])){
/*
echo 'No product info.';
exit();
*/
header_url(SITE_URL.'?subpage=user_subscription#xl_xr_page_my%20account');
}
$fee = $product_info['yearly_price_end'] / 100 * $product_info['fee'];
$yearly_price_end = $product_info['yearly_price_end'] + $fee;
$fee = ($product_info['setup_price_end'] / 100) * $product_info['fee'];
$setup_price_end = $product_info['setup_price_end'] + $fee;
if(isset($_SESSION['discountcode_amount'])){
$setup_price_end = $setup_price_end - $_SESSION['discountcode_amount'];
unset($_SESSION['discountcode_amount']);
}
$error = false;
$plan_id = '';
$approvalUrl = '';
$ReturnUrl = SITE_URL.'payment/?payment_type=paypal&payment_page=process_agreement';
$CancelUrl = SITE_URL.'payment/?payment_type=paypal&payment_page=cancel_agreement';
$now = $cms['date'];
$now->modify('+5 minutes');
$apiContext = new \PayPal\Rest\ApiContext(
new \PayPal\Auth\OAuthTokenCredential(
$cms['options']['plugin_paypal_clientid'], // ClientID
$cms['options']['plugin_paypal_clientsecret'] // ClientSecret
)
);
use PayPal\Api\ChargeModel;
use PayPal\Api\Currency;
use PayPal\Api\MerchantPreferences;
use PayPal\Api\PaymentDefinition;
use PayPal\Api\Plan;
use PayPal\Api\Patch;
use PayPal\Api\PatchRequest;
use PayPal\Common\PayPalModel;
use PayPal\Api\Agreement;
use PayPal\Api\Payer;
use PayPal\Api\ShippingAddress;
// Create a new instance of Plan object
$plan = new Plan();
// # Basic Information
// Fill up the basic information that is required for the plan
$plan->setName($product_info['name'])
->setDescription($product_info['desc_text'])
->setType('fixed');
// # Payment definitions for this billing plan.
$paymentDefinition = new PaymentDefinition();
// The possible values for such setters are mentioned in the setter method documentation.
// Just open the class file. e.g. lib/PayPal/Api/PaymentDefinition.php and look for setFrequency method.
// You should be able to see the acceptable values in the comments.
$setFrequency = 'Year';
//$setFrequency = 'Day';
$paymentDefinition->setName('Regular Payments')
->setType('REGULAR')
->setFrequency($setFrequency)
->setFrequencyInterval("1")
->setCycles("999")
->setAmount(new Currency(array('value' => $yearly_price_end, 'currency' => $cms['session']['client']['currency']['iso_code'])));
// Charge Models
$chargeModel = new ChargeModel();
$chargeModel->setType('SHIPPING')
->setAmount(new Currency(array('value' => 0, 'currency' => $cms['session']['client']['currency']['iso_code'])));
$paymentDefinition->setChargeModels(array($chargeModel));
$merchantPreferences = new MerchantPreferences();
// ReturnURL and CancelURL are not required and used when creating billing agreement with payment_method as "credit_card".
// However, it is generally a good idea to set these values, in case you plan to create billing agreements which accepts "paypal" as payment_method.
// This will keep your plan compatible with both the possible scenarios on how it is being used in agreement.
$merchantPreferences->setReturnUrl($ReturnUrl)
->setCancelUrl($CancelUrl)
->setAutoBillAmount("yes")
->setInitialFailAmountAction("CONTINUE")
->setMaxFailAttempts("0")
->setSetupFee(new Currency(array('value' => $setup_price_end, 'currency' => $cms['session']['client']['currency']['iso_code'])));
$plan->setPaymentDefinitions(array($paymentDefinition));
$plan->setMerchantPreferences($merchantPreferences);
// ### Create Plan
try {
$output = $plan->create($apiContext);
} catch (Exception $ex){
die($ex);
}
echo $output->getId().'<br />';
echo $output.'<br />';
I have been working with the PayPal PHP SDK for some days now, and my code stopped working.
So I went back to basics and I am still getting the same error.
I am trying to create a plan for a subscription but get the following error:
"NotifyUrl is not a fully qualified URL"
I have no idea how to fix this, as I don't use NotifyUrl in my code.
It would be really nice if anyone had an idea how to fix this problem :)
Thanks
PayPal made an update to their API last night which has caused a problem within their SDK.
They are sending back null values in their responses.
I MUST stress that the error is not in sending the request to PayPal, but in processing their response.
Bug report: https://github.com/paypal/PayPal-PHP-SDK/issues/1151
Pull request: https://github.com/paypal/PayPal-PHP-SDK/pull/1152
Hope this helps, but their current SDK is throwing exceptions.
Use the simple fix below.
Replace the function below in vendor\paypal\rest-api-sdk-php\lib\PayPal\Api\MerchantPreferences.php:
public function setNotifyUrl($notify_url)
{
if(!empty($notify_url)){
UrlValidator::validate($notify_url, "NotifyUrl");
}
$this->notify_url = $notify_url;
return $this;
}
If you get the same error for return_url/cancel_url, add the same if condition there.
Note: This is not a permanent solution; use it only until PayPal ships an update.
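The idea behind the patch is independent of PHP: a setter that also runs while a response is being deserialized should validate only when a value is actually present. A language-neutral sketch in JavaScript, where the class and validateUrl are hypothetical stand-ins and not the SDK's code:

```javascript
// Hypothetical validator: throws when the value is not an absolute URL.
function validateUrl(url, name) {
  try {
    new URL(url);
  } catch (e) {
    throw new TypeError(name + " is not a fully qualified URL");
  }
}

// Stand-in for a model class whose setters are also used when
// a response from the server is mapped back onto the object.
class MerchantPreferences {
  setNotifyUrl(notifyUrl) {
    // Validate only non-empty values, so a null coming back in a
    // response no longer raises an exception.
    if (notifyUrl !== null && notifyUrl !== undefined && notifyUrl !== "") {
      validateUrl(notifyUrl, "NotifyUrl");
    }
    this.notifyUrl = notifyUrl;
    return this;
  }
}

const prefs = new MerchantPreferences().setNotifyUrl(null);
console.log(prefs.notifyUrl); // null
```

A value that is present but malformed still throws, so the guard does not hide real configuration mistakes.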
From the GitHub repo for the PayPal PHP SDK, I see that the error you mentioned is thrown when MerchantPreferences is not given a valid NotifyUrl. I see you're setting the CancelUrl and ReturnUrl, but not the NotifyUrl. You may simply need to set that as well, i.e.:
$NotifyUrl = (some url goes here);
$merchantPreferences->setNotifyUrl($NotifyUrl);
The reason behind it:
The error comes from this check in
vendor\paypal\rest-api-sdk-php\lib\PayPal\Validation\UrlValidator.php:
if (filter_var($url, FILTER_VALIDATE_URL) === false) {
throw new \InvalidArgumentException("$urlName is not a fully qualified URL");
}
According to PHP's FILTER_VALIDATE_URL filter:
INVALID URL: "http://cat_n.domain.net.in/" (the host contains an underscore)
VALID URL: "http://cat-n.domain.net.in/" (the host is separated with a dash)
To see which URL is being validated, dump it in
vendor\paypal\rest-api-sdk-php\lib\PayPal\Validation\UrlValidator.php:
public static function validate($url, $urlName = null)
{
var_dump($url);
}
Then test the dumped URL here: https://www.w3schools.com/PHP/phptryit.asp?filename=tryphp_func_validate_url
There you can find out which character makes the URL invalid.

Crunchbase Data API v3.1 to Google Sheets

I'm trying to pull data from the Crunchbase Open Data Map into a Google Spreadsheet. I'm following Ben Collins's script, but it no longer works since the upgrade from v3 to v3.1. Has anyone had any luck modifying the script so that it works again?
var USER_KEY = 'insert your API key in here';
// function to retrieve organizations data
function getCrunchbaseOrgs() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getSheetByName('Organizations');
var query = sheet.getRange(3,2).getValue();
// URL and params for the Crunchbase API
var url = 'https://api.crunchbase.com/v/3/odm-organizations?query=' + encodeURI(query) + '&user_key=' + USER_KEY;
var json = getCrunchbaseData(url,query);
if (json[0] === "Error:") {
// deal with error with fetch operation
sheet.getRange(5,1,sheet.getLastRow(),2).clearContent();
sheet.getRange(6,1,1,2).setValues([json]);
}
else {
if (json[0] !== 200) {
// deal with error from api
sheet.getRange(5,1,sheet.getLastRow(),2).clearContent();
sheet.getRange(6,1,1,2).setValues([["Error, server returned code:",json[0]]]);
}
else {
// correct data comes back, filter down to match the name of the entity
var data = json[1].data.items.filter(function(item) {
return item.properties.name == query;
})[0].properties;
// parse into array for Google Sheet
var outputData = [
["Name",data.name],
["Homepage",data.homepage_url],
["Type",data.primary_role],
["Short description",data.short_description],
["Country",data.country_code],
["Region",data.region_name],
["City name",data.city_name],
["Blog url",data.blog_url],
["Facebook",data.facebook_url],
["Linkedin",data.linkedin_url],
["Twitter",data.twitter_url],
["Crunchbase URL","https://www.crunchbase.com/" + data.web_path]
];
// clear any old data
sheet.getRange(5,1,sheet.getLastRow(),2).clearContent();
// insert new data
sheet.getRange(6,1,12,2).setValues(outputData);
// add image with formula and format that row
sheet.getRange(5,2).setFormula('=image("' + data.profile_image_url + '",4,50,50)').setHorizontalAlignment("center");
sheet.setRowHeight(5,60);
}
}
}
This code no longer pulls data as expected.
I couldn't confirm the error messages you got when you ran the script, so I would like to point out one clear difference. It seems that the endpoint was changed from https://api.crunchbase.com/v/3/ to https://api.crunchbase.com/v3.1/. So how about this modification?
From :
var url = 'https://api.crunchbase.com/v/3/odm-organizations?query=' + encodeURI(query) + '&user_key=' + USER_KEY;
To :
var url = 'https://api.crunchbase.com/v3.1/odm-organizations?query=' + encodeURI(query) + '&user_key=' + USER_KEY;
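Keeping the base URL in one constant makes a future version change a one-line edit. A minimal sketch, where buildOdmOrganizationsUrl and API_BASE are hypothetical names; it also swaps encodeURI for encodeURIComponent, since encodeURI leaves characters such as & unescaped and a company name containing one would break the query string:

```javascript
// Base endpoint taken from the modification above.
const API_BASE = "https://api.crunchbase.com/v3.1/";

// Build the odm-organizations request URL for a search term.
function buildOdmOrganizationsUrl(query, userKey) {
  return API_BASE + "odm-organizations" +
    "?query=" + encodeURIComponent(query) +
    "&user_key=" + encodeURIComponent(userKey);
}

console.log(buildOdmOrganizationsUrl("Johnson & Johnson", "KEY"));
// https://api.crunchbase.com/v3.1/odm-organizations?query=Johnson%20%26%20Johnson&user_key=KEY
```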
Note :
I also couldn't find query in your script. So if the script doesn't work even after you modify the endpoint, please check that as well. The differences are described in "API v3 Compared to API v3.1".
References :
API v3 Compared to API v3.1
Using the API
If this was not useful for you, I'm sorry.

Get pdf-attachments from Gmail as text

I searched around the web and Stack Overflow but didn't find a solution. What I am trying to do is the following: I receive certain attachments by mail that I would like to have as plain text for further processing. My script looks like this:
function MyFunction() {
var threads = GmailApp.search ('label:templabel');
var messages = GmailApp.getMessagesForThreads(threads);
for (i = 0; i < messages.length; ++i)
{
j = messages[i].length;
var messageBody = messages[i][0].getBody();
var messageSubject = messages [i][0].getSubject();
var attach = messages [i][0].getAttachments();
var attachcontent = attach.getContentAsString();
GmailApp.sendEmail("mail", messageSubject, "", {htmlBody: attachcontent});
}
}
Unfortunately this doesn't work. Does anybody here have an idea how I can do this? Is it even possible?
Thank you very much in advance.
Best, Phil
Edit: Updated for DriveApp, as DocsList deprecated.
I suggest breaking this down into two problems. The first is how to get a pdf attachment from an email, the second is how to convert that pdf to text.
As you've found out, getContentAsString() does not magically change a pdf attachment to plain text or html. We need to do something a little more complicated.
First, we'll get the attachment as a Blob, a utility class used by several Services to exchange data.
var blob = attachments[0].getAs(MimeType.PDF);
So with the second problem separated out, and maintaining the assumption that we're interested in only the first attachment of the first message of each thread labeled templabel, here is how myFunction() looks:
/**
* Get messages labeled 'templabel', and send myself the text contents of
* pdf attachments in new emails.
*/
function myFunction() {
var threads = GmailApp.search('label:templabel');
var threadsMessages = GmailApp.getMessagesForThreads(threads);
for (var thread = 0; thread < threadsMessages.length; ++thread) {
var message = threadsMessages[thread][0];
var messageBody = message.getBody();
var messageSubject = message.getSubject();
var attachments = message.getAttachments();
var blob = attachments[0].getAs(MimeType.PDF);
var filetext = pdfToText( blob, {keepTextfile: false} );
GmailApp.sendEmail(Session.getActiveUser().getEmail(), messageSubject, filetext);
}
}
We're relying on a helper function, pdfToText(), to convert our pdf blob into text, which we'll then send to ourselves as a plain text email. This helper function has a variety of options; by setting keepTextfile: false, we've elected to just have it return the text content of the PDF file to us, and leave no residual files in our Drive.
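The function above assumes the first attachment exists and is a pdf. A minimal sketch of a guard that could be used in place of attachments[0]; firstPdfAttachment is a hypothetical helper, and the fake objects below only imitate the getName() and getContentType() methods of a real attachment so that the logic can run outside Apps Script:

```javascript
// Return the first attachment whose content type is PDF, or null if none.
function firstPdfAttachment(attachments) {
  for (const attachment of attachments) {
    if (attachment.getContentType() === "application/pdf") {
      return attachment;
    }
  }
  return null;
}

// Stand-ins for attachments, for illustration only.
const fake = (name, type) => ({
  getName: () => name,
  getContentType: () => type,
});

const found = firstPdfAttachment([
  fake("logo.png", "image/png"),
  fake("invoice.pdf", "application/pdf"),
]);
console.log(found ? found.getName() : "no pdf attached"); // invoice.pdf
```

Inside the loop, a message for which the helper returns null can simply be skipped with continue.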
pdfToText()
This utility is available as a gist. Several examples are provided there.
A previous answer indicated that it was possible to use the Drive API's insert method to perform OCR, but it didn't provide code details. With the introduction of Advanced Google Services, the Drive API is easily accessible from Google Apps Script. You do need to switch on and enable the Drive API from the editor, under Resources > Advanced Google Services.
pdfToText() uses the Drive service to generate a Google Doc from the content of the PDF file. Unfortunately, this contains the "pictures" of each page in the document - not much we can do about that. It then uses the regular DocumentService to extract the document body as plain text.
/**
* See gist: https://gist.github.com/mogsdad/e6795e438615d252584f
*
* Convert pdf file (blob) to a text file on Drive, using built-in OCR.
* By default, the text file will be placed in the root folder, with the same
* name as source pdf (but extension 'txt'). Options:
* keepPdf (boolean, default false) Keep a copy of the original PDF file.
* keepGdoc (boolean, default false) Keep a copy of the OCR Google Doc file.
* keepTextfile (boolean, default true) Keep a copy of the text file.
* path (string, default blank) Folder path to store file(s) in.
* ocrLanguage (ISO 639-1 code) Default 'en'.
* textResult (boolean, default false) If true and keepTextfile true, return
* string of text content. If keepTextfile
* is false, text content is returned without
* regard to this option. Otherwise, return
* id of textfile.
*
* @param {blob} pdfFile Blob containing pdf file
* @param {object} options (Optional) Object specifying handling details
*
* @returns {string} id of text file (default) or text content
*/
function pdfToText ( pdfFile, options ) {
// Ensure Advanced Drive Service is enabled
try {
Drive.Files.list();
}
catch (e) {
throw new Error( "To use pdfToText(), first enable 'Drive API' in Resources > Advanced Google Services." );
}
// Set default options
options = options || {};
options.keepTextfile = options.hasOwnProperty("keepTextfile") ? options.keepTextfile : true;
// Prepare resource object for file creation
var parents = [];
if (options.path) {
parents.push( getDriveFolderFromPath (options.path) );
}
var pdfName = pdfFile.getName();
var resource = {
title: pdfName,
mimeType: pdfFile.getContentType(),
parents: parents
};
// Save PDF to Drive, if requested
if (options.keepPdf) {
var file = Drive.Files.insert(resource, pdfFile);
}
// Save PDF as GDOC
resource.title = pdfName.replace(/pdf$/, 'gdoc');
var insertOpts = {
ocr: true,
ocrLanguage: options.ocrLanguage || 'en'
}
var gdocFile = Drive.Files.insert(resource, pdfFile, insertOpts);
// Get text from GDOC
var gdocDoc = DocumentApp.openById(gdocFile.id);
var text = gdocDoc.getBody().getText();
// We're done using the Gdoc. Unless requested to keepGdoc, delete it.
if (!options.keepGdoc) {
Drive.Files.remove(gdocFile.id);
}
// Save text file, if requested
if (options.keepTextfile) {
resource.title = pdfName.replace(/pdf$/, 'txt');
resource.mimeType = MimeType.PLAIN_TEXT;
var textBlob = Utilities.newBlob(text, MimeType.PLAIN_TEXT, resource.title);
var textFile = Drive.Files.insert(resource, textBlob);
}
// Return result of conversion
if (!options.keepTextfile || options.textResult) {
return text;
}
else {
return textFile.id
}
}
The conversion to DriveApp is helped with this utility from Bruce McPherson:
// From: http://ramblings.mcpher.com/Home/excelquirks/gooscript/driveapppathfolder
function getDriveFolderFromPath (path) {
return (path || "/").split("/").reduce ( function(prev,current) {
if (prev && current) {
var fldrs = prev.getFoldersByName(current);
return fldrs.hasNext() ? fldrs.next() : null;
}
else {
return current ? null : prev;
}
},DriveApp.getRootFolder());
}
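How the reduce walks the path is easier to follow on a plain object tree. The sketch below mirrors the same logic with a hypothetical resolvePath function and nested children objects standing in for Drive folders:

```javascript
// Same reduce as above, but over plain objects instead of Drive folders.
function resolvePath(root, path) {
  return (path || "/").split("/").reduce(function (prev, current) {
    if (prev && current) {
      // Descend into the named child, or give up with null.
      const children = prev.children;
      return Object.prototype.hasOwnProperty.call(children, current)
        ? children[current]
        : null;
    }
    // An empty segment (leading or trailing slash) keeps the current folder;
    // a named segment after a failed lookup stays null.
    return current ? null : prev;
  }, root);
}

// Hypothetical folder tree.
const root = {
  name: "root",
  children: {
    Reports: { name: "Reports", children: { "2020": { name: "2020", children: {} } } },
  },
};

console.log(resolvePath(root, "/Reports/2020").name); // 2020
console.log(resolvePath(root, "/Missing/2020"));      // null
```

Once a segment is not found, every later step returns null, so a missing folder anywhere in the path yields null for the whole lookup.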