RESTful API list files with 'setFields()' modifier gives unexpected results - google-drive-android-api

In an Android app, I'm trying to list files from Google Drive using the RESTful API.
https://developers.google.com/drive/v2/reference/files/list
I have 332 files (the number is just an example) in Google Drive, and I run the following code snippet (taken directly from the doc above) to list them:
private com.google.api.services.drive.Drive _svc;
...
Files.List rqst = _svc.files().list(); //.setFields("items(title, id)").setMaxResults(90);
do {
    try {
        FileList list = rqst.execute();
        Log.i("TAG", "count: " + list.getItems().size());
        rqst.setPageToken(list.getNextPageToken());
    } catch (IOException e) {
        rqst.setPageToken(null);
    }
} while (rqst.getPageToken() != null && rqst.getPageToken().length() > 0);
If I use the plain request:
_svc.files().list();
the loop produces paged results as expected, i.e. 100 + 100 + 100 + 32 files.
Adding an additional modifier:
_svc.files().list().setFields("items(title, id)");
gives me only the first page of 100 results. getNextPageToken() returns null, indicating that no more data is available.
Needless to say, the same thing happens with the setMaxResults() modifier. For instance, if I add setMaxResults(90),
_svc.files().list().setMaxResults(90);
produces 90+90+90+62 = 332 files, but
_svc.files().list().setMaxResults(90).setFields("items(title, id)");
yields only 90 files, with no more pages available.
I assume I must be doing something wrong, since I find it unlikely that I would discover a previously unreported bug (after googling for hours) in an interface that has been around for ages now.

You need to request the nextPageToken as one of the fields you want to receive. Otherwise, it will appear to be null since it is not included in the response.
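A minimal sketch of the corrected request, with nextPageToken added to the fields selector (variable names match the snippet in the question):
Files.List rqst = _svc.files().list()
    .setFields("nextPageToken, items(title, id)")
    .setMaxResults(90);
// The do/while paging loop from the question can then be reused unchanged;
// list.getNextPageToken() will now be populated while more pages remain.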

OutOfMemory on custom extractor

I have stitched a lot of small XML files into one file, and then made a custom extractor to return rows with one byte array that corresponds to each file.
Run on remote/master:
For one file (gzipped, 11 MB), it works fine.
For more than one file, I get a System.OutOfMemoryException.
Run on local/master:
For one or more files (gzipped, 500+ MB), it works fine.
Extractor looks like this:
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
{
    using (var stream = new StreamReader(input.BaseStream))
    {
        var xml = stream.ReadToEnd();
        // Clean stitched XML
        xml = UtilsXml.CleanXml(xml);
        // Get nodes - one for each stitched file
        var d = new XmlDocument();
        d.LoadXml(xml);
        var root = d.FirstChild;
        for (int i = 0; i < root.ChildNodes.Count; i++)
        {
            output.Set<object>(1, Encoding.ASCII.GetBytes(root.ChildNodes[i].OuterXml.ToString()));
            yield return output.AsReadOnly();
        }
        yield break;
    }
}
and the error message looks like this:
==== Caught exception System.OutOfMemoryException
at System.Xml.XmlDocument.CreateTextNode(String text)
at System.Xml.XmlLoader.LoadAttributeNode()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at Microsoft.Analytics.Tools.Formats.Text.XmlByteArrayRowExtractor.<Extract>d__0.MoveNext()
at ScopeEngine.SqlIpExtractor<ScopeEngine::GZipInput,Extract_0_Data0>.GetNextRow(SqlIpExtractor<ScopeEngine::GZipInput\,Extract_0_Data0>* , Extract_0_Data0* output) in d:\data\ccs\jobs\bc367467-ef86-43d2-a937-46ba2d4cc524_v0\sqlmanaged.h:line 1924
So what am I doing wrong? And how do I debug this on remote?
Thanks!
Unfortunately, the local run does not enforce memory limits, so you would have to check memory usage yourself in local vertex debug.
Looking at your code above, I see that you are loading XML documents into a DOM. Please note that an XML DOM can explode the data size from the string representation up to a factor of 10 or more (I have seen 2 to 12 in my times as the resident SQL XML guru).
Each UDO today only gets 1/2 GB of RAM to play with. So what I assume is that your XML DOM document(s) start going beyond that.
The normal recommendation is that you use the XmlReader interface (there is a reader extractor in the samples on http://usql.io as well) and scan through the document(s) to find the information you are looking for.
If your documents are always small enough (e.g., <20MB), you may want to make sure that you release the memory of the other documents and operate one document at a time.
We do have plans to allow you to annotate your UDO with memory needs, but that is still a bit out.

Unpredictable behaviour with Selenium and jUnit

I am working on a website and trying to test it with Selenium and jUnit. I'm getting race conditions between the test and the site, despite my best efforts.
The front end of the site is HTML and jQuery. The back end (via AJAX) is PHP.
The site
I have two required text input fields (year and age), plus some others that I'm not changing in the tests that give problems. As soon as both text inputs are non-empty, an AJAX call is made to the back end. This will return 0+ results. If 0 results are returned, a results div on the screen gets some text saying that there were no results. If >0 results are returned, a table is written to the results div showing the results.
I don't want the site to wait until e.g. 4 digits' worth of year is entered before doing the AJAX call as it could be looking at ancient history (yes, really). So, as soon as both are non-empty the call should be made. If you type slowly, this means that entering e.g. 2015 will trigger calls for year=2, year=20, year=201 and year=2015. (This is OK.)
The test
I'm using page objects - one for the inputs and one for the output. At the start of the test, I wait for a prompt to be present on the screen (please enter some data) as that is generated by JavaScript that checks the state of the input fields - so I know that the page has loaded and JavaScript has run.
The wait for a prompt is made immediately after the page object is created for the output. This is the relevant method in the page object:
// Wait until the prompt / help text is displayed. Assumes that the prompt text always contains the word "Please"
public void waitForText() {
    wait.until(ExpectedConditions.textToBePresentInElementLocated(By.id("resultContainer"), "Please"));
}
The method for setting the year is
public void setYear(String year) {
    WebElement yearField = driver.findElement(By.id(yearInputId));
    if (yearField == null) {
        // This should never happen
        Assert.fail("Can't find year input field using id " + yearInputId);
    } else {
        yearField.sendKeys(new String[] {year});
        driver.findElement(By.id(ageInputId)).click(); // click somewhere else
    }
}
and there's a corresponding one for age.
I have a series of methods that wait for things to happen, which don't seem to have prevented the problem (below). These do things like wait for the current result values to be different from a previous snapshot of them, wait for a certain number of results to be returned etc.
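For example, the wait for a certain number of results is along these lines (simplified; the row selector here is only illustrative of my markup, and wait is the page object's WebDriverWait, using org.openqa.selenium.support.ui.ExpectedCondition):
// Wait until the results table shows the expected number of rows
// (the "#resultContainer tr" selector is an assumption about the markup).
public void waitForResultCount(final int expectedCount) {
    wait.until(new ExpectedCondition<Boolean>() {
        @Override
        public Boolean apply(WebDriver driver) {
            return driver.findElements(By.cssSelector("#resultContainer tr")).size() == expectedCount;
        }
    });
}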
I create a driver for Chrome as follows:
import org.openqa.selenium.chrome.ChromeDriver;
// ...
case CHROME: {
    System.setProperty("webdriver.chrome.driver", "C:\\path\\chromedriver.exe");
    result = new ChromeDriver();
    break;
}
The problem
Some of the time, things work OK. Some of the time, both inputs are filled in with sensible values by the test, but the "there are 0 results" message is displayed. Some of the time, the test hangs part-way through filling in the inputs. It seems to be fine when I'm testing with Firefox, but Chrome often fails.
The fact that there is unpredictable behaviour suggests that I'm not controlling all the things I need to (and/or my attempts to control things are wrong). I can't see that I'm doing anything particularly weird, so someone must have hit these kinds of issues before.
Is there a browser issue I'm not addressing?
Is there something I'm doing wrong in setting the values?
Is there something I'm doing wrong in my test choreography?
It could be that the script is still loading when you start typing, or that there is a pending AJAX call when you start handling the next field or validation.
You could try to synchronize the calls with a low-level script:
final String JS_WAIT_NO_AJAX =
    "var callback = arguments[0]; (function fn(){ " +
    "  if(window.$ && window.$.active == 0) " +
    "    return callback(); " +
    "  setTimeout(fn, 60); " +
    "})();";

JavascriptExecutor js = (JavascriptExecutor) driver;
driver.manage().timeouts().setScriptTimeout(20, TimeUnit.SECONDS);

js.executeAsyncScript(JS_WAIT_NO_AJAX);
driver.findElement(By.id("...")).sendKeys("...");

js.executeAsyncScript(JS_WAIT_NO_AJAX);
driver.findElement(By.id("...")).click();
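In addition to waiting for jQuery to go idle, it can also help to wait explicitly for the results div to be (re)rendered before asserting on it. A sketch, assuming the positive case renders a table inside the resultContainer div described in the question:
WebDriverWait wait = new WebDriverWait(driver, 10);
// Wait for the results table to appear before reading the results
// (the 10-second timeout is arbitrary for this sketch).
wait.until(ExpectedConditions.presenceOfElementLocated(
    By.cssSelector("#resultContainer table")));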

Running Google Dataflow pipeline from a Google App Engine app?

I am creating a dataflow job using DataflowPipelineRunner. I tried the following scenarios.
Without specifying any machineType
With a g1-small machine
With n1-highmem-2
In all the above scenarios, the input is a very small file (KB size) from GCS and the output is a BigQuery table.
I got an out-of-memory error in all the scenarios.
The size of my compiled code is 94 MB. I am only trying the word count example, and it did not read any input (it fails before the job starts). Please help me understand why I am getting this error.
Note: I am using App Engine to start the job.
Note: The same code works with beta version 0.4.150414.
EDIT 1
As per the suggestions in the answer, I tried the following:
Switched from automatic scaling to basic scaling.
Used machine type B2, which provides 256 MB of memory.
After this configuration, the Java heap memory problem is solved, but it then tries to upload a JAR of more than 10 MB to the staging location, and that fails.
It logs the following exception:
com.google.api.client.http.HttpRequest execute: exception thrown while executing request
com.google.appengine.api.urlfetch.RequestPayloadTooLargeException: The request to https://www.googleapis.com/upload/storage/v1/b/pwccloudedw-stagging-bucket/o?name=appengine-api-L4wtoWwoElWmstI1Ia93cg.jar&uploadType=resumable&upload_id=AEnB2Uo6HCfw6Usa3aXlcOzg0g3RawrvuAxWuOUtQxwQdxoyA0cf22LKqno0Gu-hjKGLqXIo8MF2FHR63zTxrSmQ9Yk9HdCdZQ exceeded the 10 MiB limit.
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.convertApplicationException(URLFetchServiceImpl.java:157)
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.fetch(URLFetchServiceImpl.java:45)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.fetchResponse(URLFetchServiceStreamHandler.java:543)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getInputStream(URLFetchServiceStreamHandler.java:422)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getResponseCode(URLFetchServiceStreamHandler.java:275)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
at java.util.concurrent.FutureTask.run(FutureTask.java:260)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1168)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:605)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$1$1.run(ApiProxyImpl.java:1152)
at java.security.AccessController.doPrivileged(Native Method)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$1.run(ApiProxyImpl.java:1146)
at java.lang.Thread.run(Thread.java:745)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$2$1.run(ApiProxyImpl.java:1195)
I tried directly uploading the JAR file appengine-api-1.0-sdk-1.9.20.jar, but it still tries to upload the JAR appengine-api-L4wtoWwoElWmstI1Ia93cg.jar, which I don't know anything about. Any idea what this JAR is would be appreciated.
Please help me to fix this issue.
The short answer is that if you use App Engine on a Managed VM you will not encounter the App Engine sandbox limits (OOM when using an F1 or B1 instance class, execution time limit issues, whitelisted JRE classes). If you really want to run within the App Engine sandbox, then your use of the Dataflow SDK must conform to its limits. Below I explain common issues and what people have done to conform to the App Engine sandbox limits.
The Dataflow SDK requires an App Engine instance class which has enough memory to execute the user's application to construct the pipeline, stage any resources, and send the job description to the Dataflow service. Typically we have seen that users need an instance class with more than 128 MB of memory to avoid OOM errors.
Constructing a pipeline and submitting it to the Dataflow service typically takes less than a couple of seconds if the required resources for your application are already staged. Uploading your JARs and any other resources to GCS can take longer than 60 seconds. This can be solved manually by pre-staging your JARs in GCS beforehand (the Dataflow SDK will skip staging them again if it detects they are already there), as sketched below, or by using a task queue to get a 10-minute limit (note that for large applications, 10 minutes may not be enough to stage all your resources).
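A sketch of the pre-staging approach: point the pipeline at a fixed GCS staging location so JARs already uploaded there are detected and skipped (the bucket path below is hypothetical):
DataflowPipelineOptions options =
    PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
options.setRunner(DataflowPipelineRunner.class);
// Reusing a fixed staging location lets the SDK skip JARs it has
// already uploaded there (hypothetical bucket path).
options.setStagingLocation("gs://my-bucket/dataflow-staging");
Pipeline p = Pipeline.create(options);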
Finally, within the AppEngine sandbox environment, you and all your dependencies are limited to using only whitelisted classes within the JRE or you'll get an exception like:
java.lang.SecurityException:
java.lang.IllegalAccessException: YYY is not allowed on ZZZ
...
EDIT 1
We compute a hash of the contents of the JARs on the classpath and upload them to GCS with a modified filename. App Engine runs a sandboxed environment with its own JARs; appengine-api-L4wtoWwoElWmstI1Ia93cg.jar refers to appengine-api.jar, which is a JAR that the sandboxed environment adds. You can see from our PackageUtil#getUniqueContentName(...) that we just append -$HASH before .jar.
We are working to figure out why you are seeing the RequestPayloadTooLarge exception; in the meantime it is recommended that you set the filesToStage option and filter out the JARs not required to execute your Dataflow job to get around the issue you face. You can see how we build the files to stage with DataflowPipelineRunner#detectClassPathResourcesToStage(...).
I had the same problem with the 10 MB limit. What I did was filter out the JAR files bigger than that limit (instead of specific files), and then set the remaining files in the DataflowPipelineOptions with setFilesToStage.
So I just copied the method detectClassPathResourcesToStage from the Dataflow SDK and changed it slightly:
private static final long FILE_BYTES_THRESHOLD = 10 * 1024 * 1024; // 10 MB

protected static List<String> detectClassPathResourcesToStage(ClassLoader classLoader) {
    if (!(classLoader instanceof URLClassLoader)) {
        String message = String.format("Unable to use ClassLoader to detect classpath elements. "
            + "Current ClassLoader is %s, only URLClassLoaders are supported.", classLoader);
        throw new IllegalArgumentException(message);
    }

    List<String> files = new ArrayList<>();
    for (URL url : ((URLClassLoader) classLoader).getURLs()) {
        try {
            File file = new File(url.toURI());
            if (file.length() < FILE_BYTES_THRESHOLD) {
                files.add(file.getAbsolutePath());
            }
        } catch (IllegalArgumentException | URISyntaxException e) {
            String message = String.format("Unable to convert url (%s) to file.", url);
            throw new IllegalArgumentException(message, e);
        }
    }
    return files;
}
And then when I'm creating the DataflowPipelineOptions:
DataflowPipelineOptions dataflowOptions = PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
...
dataflowOptions.setFilesToStage(detectClassPathResourcesToStage(DataflowPipelineRunner.class.getClassLoader()));
Here's a version of Helder's 10MB-filtering solution that will adapt to the default file-staging behavior of DataflowPipelineOptions even if it changes in a future version of the SDK.
Instead of duplicating the logic, it passes a throwaway copy of the DataflowPipelineOptions to DataflowPipelineRunner to see which files it would have staged, then removes any that are too big.
Note that this code assumes that you've defined a custom PipelineOptions class named MyOptions, along with a java.util.Logger field named logger.
// The largest file size that can be staged to the dataflow service.
private static final long MAX_STAGED_FILE_SIZE_BYTES = 10 * 1024 * 1024;

/**
 * Returns the list of .jar/etc files to stage based on the
 * Options, filtering out any files that are too large for
 * DataflowPipelineRunner.
 *
 * <p>If this accidentally filters out a necessary file, it should
 * be obvious when the pipeline fails with a runtime link error.
 */
private static ImmutableList<String> getFilesToStage(MyOptions options) {
    // Construct a throw-away runner with a copy of the Options to see
    // which files it would have wanted to stage. This could be an
    // explicitly-specified list of files from the MyOptions param, or
    // the default list of files determined by DataflowPipelineRunner.
    List<String> baseFiles;
    {
        DataflowPipelineOptions tmpOptions =
            options.cloneAs(DataflowPipelineOptions.class);
        // Ignore the result; we only care about how fromOptions()
        // modifies its parameter.
        DataflowPipelineRunner.fromOptions(tmpOptions);
        baseFiles = tmpOptions.getFilesToStage();
        // Some value should have been set.
        Preconditions.checkNotNull(baseFiles);
    }

    // Filter out any files that are too large to stage.
    ImmutableList.Builder<String> filteredFiles = ImmutableList.builder();
    for (String file : baseFiles) {
        long size = new File(file).length();
        if (size < MAX_STAGED_FILE_SIZE_BYTES) {
            filteredFiles.add(file);
        } else {
            logger.info("Not staging large file " + file + ": length " + size
                + " >= max length " + MAX_STAGED_FILE_SIZE_BYTES);
        }
    }
    return filteredFiles.build();
}

/** Runs the processing pipeline with given options. */
public void runPipeline(MyOptions options)
        throws IOException, InterruptedException {
    // DataflowPipelineRunner can't stage large files;
    // remove any from the list.
    DataflowPipelineOptions dpOpts =
        options.as(DataflowPipelineOptions.class);
    dpOpts.setFilesToStage(getFilesToStage(options));

    // Run the pipeline as usual using "options".
    // ...
}

Knowledge & Connect PHP API, Found object(Account or Answer) but contains only null fields

I'm facing some strange issues when I try to fetch (Connect PHP API) / searchContent (Knowledge Foundation API) following the tutorials/documentation.
Behaviour and output
Following the documentation, we initialize the API. The function error_get_last() (called after the fetch) states that the core read-only file (we are not allowed to modify it) contains an error:
Array ( [type] => 8 [message] => Undefined index: REDIRECT_URL [file] => /cgi-bin/${interface_name}.cfg/scripts/cp/core/framework/3.2.4/init.php [line] => 246 )
After initialization, we call the fetch function to retrieve an account. If we give a wrong ID, it returns an error:
Invalid ID: No such Account with ID = 32
Otherwise, furnishing a correct ID returns an Account object with all fields populated as NULL:
object(RightNow\Connect\v1_2\Account)#22 (25) {
["ID"]=>
NULL
["LookupName"]=>
NULL
["CreatedTime"]=>
NULL
["UpdatedTime"]=>
NULL
["AccountHierarchy"]=>
NULL
["Attributes"]=>
NULL
["Country"]=>
NULL
["CustomFields"]=>
NULL
["DisplayName"]=>
NULL
["DisplayOrder"]=>
NULL
["EmailNotification"]=>
NULL
["Emails"]=>
NULL
["Login"]=>
NULL
/* [...] */
["StaffGroup"]=>
NULL
}
Attempts, workaround and troubleshooting information
Configuration: The account used by InitConnectAPI() has the required permissions.
Initialization: The call to InitConnectAPI() does not throw any exception (I added a try-catch block).
Call to the fetch function: As said above, the call to RNCPHP\Account::fetch($act_id) finds the account (an invalid ID returns an error) but doesn't manage to populate the fields.
No exception is thrown on the RNCPHP::fetch($correct_id) call
The behaviour is the same when I try to retrieve an answer following a sample example from the Knowledge Foundation API : $token = \RNCK::StartInteraction(...) ; \RNCK::searchContent($token, 'lorem ipsum');
Using PHP's SoapClient, I manage to retrieve populated objects. However, it's not part of the standard, and a self-call to a local web service is not good practice.
Code reproducing the issue
error_reporting(E_ALL);
require_once(get_cfg_var('doc_root') . '/include/ConnectPHP/Connect_init.phph');
InitConnectAPI();
use RightNow\Connect\v1_2 as RNCPHP;
/* [...] */
try {
    $fetched_acct = RNCPHP\Account::fetch($correct_usr_id);
} catch (\Exception $e) {
    echo ($e->getMessage());
}
// Dump part
echo ("<pre>");
var_dump($fetched_acct);
echo ("</pre>");
// The core's error on which I have no control
print_r(error_get_last());
Questions:
Have any of you faced the same issue? What workaround/fix would help me solve it?
According to the behaviour of the RNCPHP\Account::fetch($correct_usr_id) function, we can surmise that the issue comes from the 'fields populating' step, which might be part of the core (over which I have no control). How am I supposed to deal with this (fetch is static and Account doesn't seem abstract)?
I tried to use the debug_backtrace() function in order to have some visibility on what may go wrong, but it doesn't output relevant information. Is there any way I can get more debug information?
Thanks in advance,
Oracle Service Cloud uses lazy loading to populate the object variables from queried data using the Connect for PHP APIs. When you output the result of an object, it will appear as though each variable is empty, as in your example. However, once you access the property, it becomes available. This is only an issue when you try to print your object, as in this example. Accessing the data should be immediate.
To print your object, like in your example, you would need to iterate through the object variables and access each one first. You could build a helper class to do that through reflection. But, to illustrate with a single field, do the following:
$acct = RNCPHP\Account::fetch($correctId);
$acct->ID;
print_r($acct); // Will now "show" ID, but none of the other fields have been loaded.
In the real world, you probably just want to operate on the data. So, even though you cannot "see" the data in the object, it's there. In the example below, we're accessing the updated time of the account and then performing an action on the object if it meets a condition.
//Set to disabled if last updated < 90 days ago
$acct = RNCPHP\Account::fetch($correctId);
$chkDate = time() - 7776000;
if($acct->UpdatedTime < $chkDate){
$acct->Attributes->PermanentlyDisabled = true;
$acct->save(RNCPHP\RNObject::SuppressAll);
}
If you were to print_r the object after the if condition, then you would see the UpdatedTime variable data because it was loaded at the condition check.

API Client 1.3 (rev89) - Error 500 "No individual errors" when using Fields Filter

Today (10:00 AM GMT+2) the code deployed in a production environment started throwing an increasing number of errors while requesting file lists from a Google Drive folder; the error was always 500 "No Individual Errors".
After 2 hours, all the requests failed.
The code regarding the file list request is the following:
'Search for a specific file name
oListReq.Q = "mimeType = 'application/vnd.google-apps.folder' and title = '" + ParentFolder + "' and trashed=false"
oListReq.Fields = "items/id" 'MSO - 20130621 - only ID is needed
oListReq.MaxResults = 10 'Max 10 files (too many I Expect only 1)
'Get the results
oFileList = oListReq.Fetch()
Testing the same requests with the API Explorer there is no problem and only the ID is returned.
Going step by step to identify the problem, it turns out that all the requests with the Fields property specified generated a 500 error (other requests in the code use "items(id,alternateLink)", but the result is the same as for the code above).
I temporarily fixed the code by commenting out those lines.
Could you please investigate why these filters are no longer working with the .NET client library?
Sorry for that. This error has been reproduced and Google is investigating. For now, please turn off the fields filter.
It seems the issue is now fixed. We had the same issue with one of our production applications and had to produce a hot fix, but I performed a test a few minutes ago and it looks like it works again.