Exceeded maximum execution time on Google App script with Google Big Query - sql

How can I extend the execution time of my code below? Essentially, I use Google Apps Script to query data from our BigQuery database and export the data to a Google spreadsheet.
The following is my code:
function Weekly_Metric(){
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheetName = "Budget";
var sheet = ss.getSheetByName(sheetName);
ss.setActiveSheet(sheet);
var sql = ' bigqueryscript ';
var results = GSReport.runQueryAsync(sql);
var resultsValues = GSReport.parseBigQueryAPIResponse(results);
sheet.clear();
ss.appendRow(["Label1", "Label2", "Label3"]);
for ( var i = 0 ; i < resultsValues.length ; i++ ) {
ss.appendRow(resultsValues[i]);
}
}

Always reduce the number of calls to Google Apps Script services as much as you can.
In this case, the loop containing appendRow() can be replaced with JavaScript array operations and a single call to setValues().
...
sheet.clear();
var data = [];
data.push(["Label1", "Label2", "Label3"]);
for ( var i = 0 ; i < resultsValues.length ; i++ ) {
data.push(resultsValues[i]);
}
// getRange(row, column, numRows, numColumns) is a Sheet method, so call it on sheet, not ss
sheet.getRange(1,1,data.length,data[0].length).setValues(data);
...
Alternatively, if resultsValues is an array of rows already, you only need to add the labels:
...
sheet.clear();
resultsValues.unshift(["Label1", "Label2", "Label3"]);
// getRange(row, column, numRows, numColumns) is a Sheet method, so call it on sheet, not ss
sheet.getRange(1,1,resultsValues.length,resultsValues[0].length).setValues(resultsValues);
...
If that doesn't do the trick, then you should look at your GSReport object's methods.
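One thing to watch with a single setValues() call: the data must match the range dimensions exactly, so every row needs the same number of columns. The following is a minimal sketch, assuming resultsValues might contain rows of uneven length. The name toRectangular is a hypothetical helper, not part of Apps Script or GSReport.

```javascript
// Hypothetical helper: pads every row with empty strings so that all rows
// end up with the same length, which setValues() requires.
function toRectangular(rows) {
  // Width of the widest row
  var width = rows.reduce(function (max, row) {
    return Math.max(max, row.length);
  }, 0);
  return rows.map(function (row) {
    var padded = row.slice(); // copy, so the input rows are not modified
    while (padded.length < width) {
      padded.push("");
    }
    return padded;
  });
}

// Usage inside the script above (Apps Script only, shown as comments):
// var data = toRectangular([["Label1", "Label2", "Label3"]].concat(resultsValues));
// sheet.getRange(1, 1, data.length, data[0].length).setValues(data);
```

If the query always returns exactly three columns per row, the padding is a no-op and the helper can be dropped.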

Related

Is it possible to batch process range protections in Google Apps Script?

I have to create a dozen protected ranges in a sheet. I have code that works but is very slow because it contacts the server for each range. I know it's possible to work on a local copy of the data if there's some cell processing involved. Is it possible for range protections also?
If it's not, would caching help?
The below code uses the username from the first row as an editor for a bunch of rows in the same column.
var spreadSheet = SpreadsheetApp.getActiveSpreadsheet();
var sheets = spreadSheet.getSheets();
//Set protections per column, we start from the 4th.
for (var i = 4; i <= sheets[3].getLastColumn(); i++){
///Get the username.
var editor = sheets[3].getRange(1, i).getDisplayValue();
//Set the protection.
var protection = sheets[3].getRange(3, i, 22, 1).protect();
protection.setDescription(editor);
//Handle the case of deleted/unknown usernames.
try{
protection.addEditor(editor + '@domain.com');
} catch(error){
protection.addEditor('user@domain.com');
}
}
I've found a solution for a similar issue https://stackoverflow.com/a/37820854 but when I try to apply it to my case I get an error "TypeError: Cannot find function getRange in object Range" so I must be doing something wrong.
var test = [];
for (var i = 4; i <= sheets[3].getLastColumn(); i++){
test.push(sheets[3].getRange(3, i, 22, 1));
}
var editor;
for (var i = 0; i<test.length; i++){
var editor = test[i].getRange(1, 1).getDisplayValue();
}
The syntax for the method getRange() is getRange(row, column, numRows, numColumns), while your counter variable i loops through the columns instead of the rows.
If your intention is to loop through all columns and add an editor to each one, it should be something like
for (var i = 4; i <= sheets[3].getLastColumn(); i++){
///Get the username.
var editor = sheets[3].getRange(1, i).getDisplayValue();
//Set the protection.
var protection = sheets[3].getRange(startRow, i, rowNumber, columnNumber).protect();
protection.setDescription(editor);
//Handle the case of deleted/unknown usernames.
try{
protection.addEditor(editor + '@domain.com');
} catch(error){
protection.addEditor('user@domain.com');
}
}
It's possible to do batch processing,
but you'll have to use Advanced Google Services. Check out the Sheets Advanced service and the Sheets API documentation.
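As a rough sketch of what that could look like: the function below builds one addProtectedRange request per column, so that all protections can be sent in a single batchUpdate call. The function name buildProtectionRequests and its parameters are hypothetical. The Apps Script calls are shown only as comments and assume that the Sheets advanced service is enabled for the project.

```javascript
// Builds one addProtectedRange request per column (plain JavaScript, no API calls).
// sheetId: numeric ID of the sheet; editors: usernames read from row 1,
// where editors[0] belongs to column `firstColumn` (1-based, as in the question).
function buildProtectionRequests(sheetId, editors, firstColumn, startRow, numRows, domain) {
  return editors.map(function (editor, offset) {
    var column = firstColumn + offset; // 1-based column number
    return {
      addProtectedRange: {
        protectedRange: {
          range: {
            sheetId: sheetId,
            startRowIndex: startRow - 1,         // zero-based, inclusive
            endRowIndex: startRow - 1 + numRows, // exclusive
            startColumnIndex: column - 1,        // zero-based, inclusive
            endColumnIndex: column               // exclusive
          },
          description: editor,
          editors: { users: [editor + "@" + domain] }
        }
      }
    };
  });
}

// In Apps Script (assumes the Sheets advanced service is enabled):
// var sheet = sheets[3];
// var lastColumn = sheet.getLastColumn();
// var editors = sheet.getRange(1, 4, 1, lastColumn - 3).getDisplayValues()[0];
// var requests = buildProtectionRequests(sheet.getSheetId(), editors, 4, 3, 22, "domain.com");
// Sheets.Spreadsheets.batchUpdate({ requests: requests }, spreadSheet.getId());
```

Note that a batchUpdate is applied as a whole, so the per-column try/catch fallback from the question does not carry over directly. Unknown usernames would probably need to be filtered out or replaced before the requests are built.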

Merging many spreadsheets into report file exceeds maximum execution time

I am using the following script to loop over student files and add a row to a Google spreadsheet for each student whose credits are less than x. The script was working well, but as data is added to the spreadsheet daily, it now throws an "Exceeded maximum execution time" error (we have more than 2000 files). As I am new to scripting, I don't know how to optimize the code.
Could someone help me optimize the code, or suggest a solution so that the execution takes less than 5 minutes? Each email has to be compared against many other emails. Please help!
function updated() {
//Final file data (Combined)
var filecombined = SpreadsheetApp.openById("XXXXXXXXXX");
var sheet2 = filecombined.getSheets();
//Folder with all the files
var parentFolder = DriveApp.getFolderById("YYYYYYYYYYYY");
var files = parentFolder.getFiles();
//Current Date
var fecha = new Date();
//Path for each file in the folder
while (files.hasNext()) {
var idarchivo = files.next().getId();
var sps = SpreadsheetApp.openById(idarchivo);
var sheet = sps.getSheetByName('STUDENT PROFILE');
var data = sheet.getDataRange().getValues();
var credits = data[5][1];
//Flag; bandera:1 (new row), bandera:2 (update row)
var bandera = 1;
//Take data from final file (Combined)
var data2 = sheet2[0].getDataRange().getValues();
//If credits are less than X: write
if (credits < 120) {
var email = data[2][1];
var lastrow = filecombined.getLastRow();
var u = 0;
//comparison loop by email, if found it, update and exit the loop
while (u < lastrow) {
u = u + 1;
if (email == data2[u - 1][1]) {
sheet2[0].getRange(u, 3).setValue(credits);
sheet2[0].getRange(u, 4).setValue(fecha);
u = lastrow;
bandera = 2;
}
}
//if that email does not exist, write a new row
if (bandera == 1) {
var nombre = data[0][1];
sheet2[0].getRange(lastrow + 1, 1).setValue(nombre);
sheet2[0].getRange(lastrow + 1, 2).setValue(email);
sheet2[0].getRange(lastrow + 1, 3).setValue(credits);
sheet2[0].getRange(lastrow + 1, 4).setValue(fecha);
}
}
}
SpreadsheetApp.flush();
}
The questioner's code is taking more than 4-6 minutes to run and fails with the error "Exceeded maximum execution time".
The following answer is based solely on the code provided by the questioner. We don't have any information about the 'filecombined' spreadsheet, its size and triggers. We are also in the dark about the various student spreadsheets, their size, etc, except that we know that there are 2,000 of these files. We don't know how often this routine is run, nor how many students have credits <120.
getValues() and setValues() statements are very costly; typically 0.2 seconds each. The questioner's code includes a variety of such statements. Some are unavoidable, but others are not.
In looking at optimising this code, I made two major changes.
1) I moved line 27, var data2 = sheet2[0].getDataRange().getValues();
This line need only be executed once, so I relocated it to the top of the code, just after the various "filecombined" commands. As it stood, this line was being executed once for every student spreadsheet; this alone may have contributed several minutes of execution time.
2) I converted certain setvalue commands to an array, and then updated the "filecombined" spreadsheet from the array once only, at the end of the processing. Depending on the number of students with low credits and who are not already on the "filecombined" sheet, this may represent a substantial saving.
The code affected was lines 47 to 50.
line47: sheet2[0].getRange(lastrow+1, 1).setValue(nombre);
line48: sheet2[0].getRange(lastrow+1, 2).setValue(email);
line49: sheet2[0].getRange(lastrow+1, 3).setValue(credits);
line50: sheet2[0].getRange(lastrow+1, 4).setValue(fecha);
There are setvalue commands also executed at lines 38 and 39 (if the student is already on the "filecombined" spreadsheet), but I chose to leave these as-is. As noted above, we don't know how many such students there might be, and the cost of these setvalue commands may be minor or not. Until this is clear, and in the light of other time savings, I chose to leave them as-is.
function updated() {
//Final file data (Combined)
var filecombined = SpreadsheetApp.openById("XXXXXXXXXX");
var sheet2 = filecombined.getSheets();
//Take data from final file (Combined)
var data2 = sheet2[0].getDataRange().getValues();
// create some arrays
var Newdataarray = [];
var Masterarray = [];
//Folder with all the files
var parentFolder = DriveApp.getFolderById("YYYYYYYYYYYY");
var files = parentFolder.getFiles();
//Current Date
var fecha = new Date();
//Path for each file in the folder
while (files.hasNext()) {
var idarchivo = files.next().getId();
var sps = SpreadsheetApp.openById(idarchivo);
var sheet = sps.getSheetByName('STUDENT PROFILE');
var data = sheet.getDataRange().getValues();
var credits = data[5][1];
//Flag; bandera:1 (new row), bandera:2 (update row)
var bandera = 1;
//If credits are less than X: write
if (credits < 120){
var email = data[2][1];
var lastrow = filecombined.getLastRow();
var u = 0;
//comparison loop by email, if found it, update and exit the loop
while (u < lastrow) {
u = u + 1;
if (email == data2[u-1][1]){
sheet2[0].getRange(u, 3).setValue(credits);
sheet2[0].getRange(u, 4).setValue(fecha);
u = lastrow;
bandera = 2;
}
}
//if that email does not exist, write a new row
if(bandera == 1){
var nombre = data[0][1];
Newdataarray = [];
Newdataarray.push(nombre);
Newdataarray.push(email);
Newdataarray.push(credits);
Newdataarray.push(fecha);
Masterarray.push(Newdataarray);
}
}
}
// update the target sheet with the contents of the array
// these are all adding new rows
lastrow = filecombined.getLastRow();
// getRange() fails for zero rows, so only write when there is something to add
if (Masterarray.length > 0) {
sheet2[0].getRange(lastrow+1, 1, Masterarray.length, 4).setValues(Masterarray);
}
SpreadsheetApp.flush();
}
As I mentioned in my comment, the biggest issue you have is that you repeatedly search an array for a value, when you could use a much faster lookup function.
// Create an object that maps an email address to the (last) array
// index of that email in the `data2` array.
const knownEmails = data2.reduce(function (acc, row, index) {
var email = row[1]; // email is the 2nd element of the inner array (Column B on a spreadsheet)
acc[email] = index;
return acc;
}, {});
Then you can determine if an email existed in data2 by trying to obtain the value for it:
// Get this email's index in `data2`:
var index = knownEmails[email];
if (index === undefined) {
// This is a new email we didn't know about before
...
} else {
// This is an email we knew about already.
var u = ++index; // Convert the array index into a worksheet row (assumes `data2` is from a range that started at Row 1)
...
}
To understand how we are constructing knownEmails from data2, you may find the documentation on Array#reduce helpful.
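Putting the lookup to work, here is a minimal sketch that merges student records into an in-memory copy of the combined data, so that the sheet only needs to be written once at the end. The helper mergeStudents and the { nombre, email, credits } record shape are assumptions for illustration, and the sketch assumes the combined sheet has exactly four columns (name, email, credits, date). It uses Object.create(null) instead of {} so that unusual keys such as "constructor" cannot collide with inherited properties.

```javascript
// Maps each email (2nd element of a row, Column B) to its last array index.
function buildEmailIndex(rows) {
  return rows.reduce(function (acc, row, index) {
    acc[row[1]] = index;
    return acc;
  }, Object.create(null));
}

// Hypothetical helper: applies student records to a copy of the combined data.
// Known emails are updated in place; unknown ones are appended as new rows.
function mergeStudents(rows, students, fecha) {
  var merged = rows.map(function (row) { return row.slice(); }); // copy rows
  var knownEmails = buildEmailIndex(merged);
  students.forEach(function (s) {
    var index = knownEmails[s.email];
    if (index === undefined) {
      // New email: remember its position and append a row
      knownEmails[s.email] = merged.length;
      merged.push([s.nombre, s.email, s.credits, fecha]);
    } else {
      // Known email: update credits (Column C) and date (Column D)
      merged[index][2] = s.credits;
      merged[index][3] = fecha;
    }
  });
  return merged;
}

// In Apps Script, after collecting the students with credits < 120:
// var merged = mergeStudents(data2, students, fecha);
// sheet2[0].getRange(1, 1, merged.length, 4).setValues(merged);
```

With this approach the per-row setValue() calls for existing students disappear as well, at the cost of rewriting the whole range once.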

Filter cached sqlJdbc query in Pentaho CE

I use an sqlJdbc query as a data provider for my CCC controls. My query makes a geospatial request, which is why I cache the results (Cache=True); otherwise the request takes a long time.
It works fine. However I have to use parameters in my query to filter resulting rows:
SELECT ...
FROM ...
WHERE someField IN (${aoi_param})
Is there some way to cache the full set of rows and then apply WHERE to the cached results, without building a new cache for each set of values in ${aoi_param}?
What is the best practice?
So, I am not really sure that it is the best practice, but I solved my problem this way:
First, I added aoi_param to the Listeners and Parameters of my chart control.
Then I filtered the data set in Post Fetch:
function f(data){
var _aoi_param = this.dashboard.getParameterValue('${p:aoi_param}');
function isInArray(myValue, myArray) {
var arrayLength = myArray.length;
for (var i = 0; i < arrayLength; i++) {
if (myValue == myArray[i]) return true;
}
return false;
}
function getFiltered(cdaData, filterArray) {
var allCdaData = cdaData;
cdaData = {
metadata: allCdaData.metadata,
resultset: allCdaData.resultset.filter(function(row){
// 2nd column is an AOI id in my dataset
// row[2] (zero-based index 2) holds the AOI id
return isInArray(row[2], filterArray);
})
};
return cdaData;
}
var dataFiltered = getFiltered(data, _aoi_param);
return dataFiltered;
}
Finally, I removed WHERE someField IN (${aoi_param}) from the query of my sql over sqlJdbc component.

BigQuery UDF memory exceeded error on multiple rows but works fine on single row

I'm writing a UDF to process Google Analytics data, and getting the "UDF out of memory" error message when I try to process multiple rows. I downloaded the raw data and found the largest record and tried running my UDF query on that, with success. Some of the rows have up to 500 nested hits, and the size of the hit record (by far the largest component of each row of the raw GA data) does seem to have an effect on how many rows I can process before getting the error.
For example, the query
select
user.ga_user_id,
ga_session_id,
...
from
temp_ga_processing(
select
fullVisitorId,
visitNumber,
...
from [79689075.ga_sessions_20160201] limit 100)
returns the error, but
from [79689075.ga_sessions_20160201] where totals.hits = 500 limit 1)
does not.
I was under the impression that any memory limitations were per-row? I've tried several techniques, such as setting row = null; before emit(return_dict); (where return_dict is the processed data) but to no avail.
The UDF itself doesn't do anything fancy; I'd paste it here but it's ~45 kB in length. It essentially does a bunch of things along the lines of:
function temp_ga_processing(row, emit) {
topic_id = -1;
hit_numbers = [];
first_page_load_hits = [];
return_dict = {};
return_dict["user"] = {};
return_dict["user"]["ga_user_id"] = row.fullVisitorId;
return_dict["ga_session_id"] = row.fullVisitorId.concat("-".concat(row.visitNumber));
for(i=0;i<row.hits.length;i++) {
hit_dict = {};
hit_dict["page"] = {};
hit_dict["time"] = row.hits[i].time;
hit_dict["type"] = row.hits[i].type;
hit_dict["page"]["engaged_10s"] = false;
hit_dict["page"]["engaged_30s"] = false;
hit_dict["page"]["engaged_60s"] = false;
add_hit = true;
for(j=0;j<row.hits[i].customMetrics.length;j++) {
if(row.hits[i].customDimensions[j] != null) {
if(row.hits[i].customMetrics[j]["index"] == 3) {
metrics = {"video_play_time": row.hits[i].customMetrics[j]["value"]};
hit_dict["metrics"] = metrics;
metrics = null;
row.hits[i].customDimensions[j] = null;
}
}
}
hit_dict["topic"] = {};
hit_dict["doctor"] = {};
hit_dict["doctor_location"] = {};
hit_dict["content"] = {};
if(row.hits[i].customDimensions != null) {
for(j=0;j<row.hits[i].customDimensions.length;j++) {
if(row.hits[i].customDimensions[j] != null) {
if(row.hits[i].customDimensions[j]["index"] == 1) {
hit_dict["topic"] = {"name": row.hits[i].customDimensions[j]["value"]};
row.hits[i].customDimensions[j] = null;
continue;
}
if(row.hits[i].customDimensions[j]["index"] == 3) {
if(row.hits[i].customDimensions[j]["value"].search("doctor") > -1) {
return_dict["logged_in_as_doctor"] = true;
}
}
// and so on...
}
}
}
if(row.hits[i]["eventInfo"]["eventCategory"] == "page load time" && row.hits[i]["eventInfo"]["eventLabel"].search("OUTLIER") == -1) {
elre = /(?:onLoad|pl|page):(\d+)/.exec(row.hits[i]["eventInfo"]["eventLabel"]);
if(elre != null) {
if(parseInt(elre[0].split(":")[1]) <= 60000) {
first_page_load_hits.push(parseFloat(row.hits[i].hitNumber));
if(hit_dict["page"]["page_load"] == null) {
hit_dict["page"]["page_load"] = {};
}
hit_dict["page"]["page_load"]["sample"] = 1;
page_load_time_re = /(?:onLoad|pl|page):(\d+)/.exec(row.hits[i]["eventInfo"]["eventLabel"]);
if(page_load_time_re != null) {
hit_dict["page"]["page_load"]["page_load_time"] = parseFloat(page_load_time_re[0].split(':')[1])/1000;
}
}
// and so on...
}
}
row = null;
emit(return_dict);
}
The job ID is realself-main:bquijob_4c30bd3d_152fbfcd7fd
Update Aug 2016 : We have pushed out an update that will allow the JavaScript worker to use twice as much RAM. We will continue to monitor jobs that have failed with JS OOM to see if more increases are necessary; in the meantime, please let us know if you have further jobs failing with OOM. Thanks!
Update: this issue was related to limits we had on the size of the UDF code. It looks like V8's optimize+recompile pass of the UDF code generates a data segment that was bigger than our limits, but this was only happening when the UDF runs over a "sufficient" number of rows. I'm meeting with the V8 team this week to dig into the details further.
@Grayson - I was able to run your job over the entire 20160201 table successfully; the query takes 1-2 minutes to execute. Could you please verify that this works on your side?
We've gotten a few reports of similar issues that seem related to # rows processed. I'm sorry for the trouble; I'll be doing some profiling on our JavaScript runtime to try to find if and where memory is being leaked. Stay tuned for the analysis.
In the meantime, if you're able to isolate any specific rows that cause the error, that would also be very helpful.
A UDF will fail on anything but very small datasets if it has a lot of if/then levels, such as:
if () {
.... if() {
.........if () {
etc
We had to track down and remove the deepest if/then statement.
But that is not enough. In addition, when you pass the data into the UDF, run a "GROUP EACH BY" on all the variables. This will force BQ to send the output to multiple "workers"; otherwise it will also fail.
I've wasted 3 days of my life on this annoying bug. Argh.
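To illustrate what removing nested if levels can look like, here is a small sketch based on the custom-dimension loop from the question. Both functions are hypothetical examples written for this illustration and return the same result; the second replaces the nested ifs with a guard clause. Whether flattening like this avoids the out-of-memory error is this answer's observation, not something the sketch verifies.

```javascript
// Deeply nested version, in the style of the UDF in the question.
function topicNameNested(hit) {
  var name = null;
  if (hit.customDimensions != null) {
    for (var j = 0; j < hit.customDimensions.length; j++) {
      if (hit.customDimensions[j] != null) {
        if (hit.customDimensions[j]["index"] == 1) {
          name = hit.customDimensions[j]["value"];
        }
      }
    }
  }
  return name;
}

// Flattened version: a default value and a guard clause replace the nested ifs.
function topicNameFlat(hit) {
  var dims = hit.customDimensions || [];
  var name = null;
  for (var j = 0; j < dims.length; j++) {
    var dim = dims[j];
    if (dim == null || dim["index"] != 1) continue; // skip non-matching entries
    name = dim["value"];
  }
  return name;
}
```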
I love the concept of parsing my logs in BigQuery, but I've got the same problem, I get
Error: Resources exceeded during query execution.
The Job Id is bigquery-looker:bquijob_260be029_153dd96cfdb, if that at all helps.
I wrote a very basic parser that does a simple match and returns rows. It works just fine on a 10K row data set, but I run out of resources when trying to run it against a 3M row logfile.
Any suggestions for a workaround?
Here is the javascript code.
function parseLogRow(row, emit) {
r = (row.logrow ? row.logrow : "") + (typeof row.l2 !== "undefined" ? " " + row.l2 : "") + (row.l3 ? " " + row.l3 : "")
ts = null
category = null
user = null
message = null
db = null
found = false
if (r) {
m = r.match(/^(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d\d\d (\+|\-)\d\d\d\d) \[([^|]*)\|([^|]*)\|([^\]]*)\] :: (.*)/ )
if( m){
ts = new Date(m[1])/1000
category = m[3] || null
user = m[4] || null
db = m[5] || null
message = m[6] || null
found = true
}
else {
message = r
found = false
}
}
emit({
ts: ts,
category: category,
user: user,
db: db,
message: message,
found: found
});
}
bigquery.defineFunction(
'parseLogRow', // Name of the function exported to SQL
['logrow',"l2","l3"], // Names of input columns
[
{'name': 'ts', 'type': 'timestamp'}, // Output schema
{'name': 'category', 'type': 'string'},
{'name': 'user', 'type': 'string'},
{'name': 'db', 'type': 'string'},
{'name': 'message', 'type': 'string'},
{'name': 'found', 'type': 'boolean'},
],
parseLogRow // Reference to JavaScript UDF
);

Why can I not use Continuation when using a proxy class to access MS CRM 2013?

So I have a standard service reference proxy class for MS CRM 2013 (i.e. right-click, add reference, etc.). I then found that CRM data calls are limited to 50 results, and I wanted to get the full list of results. I found two methods; one looks more correct but doesn't seem to work. I was wondering why it doesn't, and whether I'm doing something incorrectly.
Basic setup and process
crmService = new CrmServiceReference.MyContext(new Uri(crmWebServicesUrl));
crmService.Credentials = System.Net.CredentialCache.DefaultCredentials;
var accountAnnotations = crmService.AccountSet.Where(a => a.AccountNumber == accountNumber).Select(a => a.Account_Annotation).FirstOrDefault();
Using Continuation (something I want to work, but looks like it doesn't)
while (accountAnnotations.Continuation != null)
{
accountAnnotations.Load(crmService.Execute<Annotation>(accountAnnotations.Continuation.NextLinkUri));
}
Using that method, .Continuation is always null and accountAnnotations.Count is always 50 (but there are more than 50 records).
After struggling with .Continuation for a while, I've come up with the following alternative method (but it seems "not good"):
var accountAnnotationData = accountAnnotations.ToList();
var accountAnnotationFinal = accountAnnotations.ToList();
var index = 1;
while (accountAnnotationData.Count == 50)
{
accountAnnotationData = (from a in crmService.AnnotationSet
where a.ObjectId.Id == accountAnnotationData.First().ObjectId.Id
select a).Skip(50 * index).ToList();
accountAnnotationFinal = accountAnnotationFinal.Union(accountAnnotationData).ToList();
index++;
}
So the second method seems to work, but for any number of reasons it doesn't seem like the best. Is there a reason .Continuation is always null? Is there some setup step I'm missing or some nice way to do this?
The way to get the records from CRM is to use paging. Here is an example with a query expression, but you can also use FetchXML if you want:
// Query using the paging cookie.
// Define the paging attributes.
// The number of records per page to retrieve.
int fetchCount = 3;
// Initialize the page number.
int pageNumber = 1;
// Initialize the number of records.
int recordCount = 0;
// Define the condition expression for retrieving records.
ConditionExpression pagecondition = new ConditionExpression();
pagecondition.AttributeName = "address1_stateorprovince";
pagecondition.Operator = ConditionOperator.Equal;
pagecondition.Values.Add("WA");
// Define the order expression to retrieve the records.
OrderExpression order = new OrderExpression();
order.AttributeName = "name";
order.OrderType = OrderType.Ascending;
// Create the query expression and add condition.
QueryExpression pagequery = new QueryExpression();
pagequery.EntityName = "account";
pagequery.Criteria.AddCondition(pagecondition);
pagequery.Orders.Add(order);
pagequery.ColumnSet.AddColumns("name", "address1_stateorprovince", "emailaddress1", "accountid");
// Assign the pageinfo properties to the query expression.
pagequery.PageInfo = new PagingInfo();
pagequery.PageInfo.Count = fetchCount;
pagequery.PageInfo.PageNumber = pageNumber;
// The current paging cookie. When retrieving the first page,
// pagingCookie should be null.
pagequery.PageInfo.PagingCookie = null;
Console.WriteLine("#\tAccount Name\t\t\tEmail Address");
while (true)
{
// Retrieve the page.
EntityCollection results = _serviceProxy.RetrieveMultiple(pagequery);
if (results.Entities != null)
{
// Retrieve all records from the result set.
foreach (Account acct in results.Entities)
{
// Print name before email so the values line up with the header row.
Console.WriteLine("{0}.\t{1}\t\t{2}",
++recordCount,
acct.Name,
acct.EMailAddress1);
}
}
// Check for more records, if it returns true.
if (results.MoreRecords)
{
// Increment the page number to retrieve the next page.
pagequery.PageInfo.PageNumber++;
// Set the paging cookie to the paging cookie returned from current results.
pagequery.PageInfo.PagingCookie = results.PagingCookie;
}
else
{
// If no more records are in the result nodes, exit the loop.
break;
}
}