BigQuery: Encountered an error while globbing file pattern

I queried a federated table with data in a Google spreadsheet. Following the recommendations in issue 720
https://code.google.com/p/google-bigquery/issues/detail?id=720
I've created the following code:
Set<String> scopes = new HashSet<>();
scopes.add(BigqueryScopes.BIGQUERY);
scopes.add("https://www.googleapis.com/auth/drive");
scopes.add("https://www.googleapis.com/auth/spreadsheets");
final HttpTransport transport = new NetHttpTransport();
final JsonFactory jsonFactory = new JacksonFactory();
GoogleCredential credential = new GoogleCredential.Builder()
        .setTransport(transport).setJsonFactory(jsonFactory)
        .setServiceAccountId(GC_CREDENTIALS_ACCOUNT_EMAIL)
        .setServiceAccountScopes(scopes)
        .setServiceAccountPrivateKey(getPrivateKey())
        .build();
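// Not shown in the original post: the `bigquery` client used below would be
// built from this credential, roughly along these lines (the application name
// is a placeholder):
Bigquery bigquery = new Bigquery.Builder(transport, jsonFactory, credential)
        .setApplicationName("federated-table-query")
        .build();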
String omgsql = "SELECT * FROM [<myproject>:<mydataset>.failures] LIMIT 1000";
JobReference jobIdomg = startQuery(bigquery, "<myproject>", omgsql);
// Poll for Query Results, return result output
Job completedJobomg = checkQueryResults(bigquery, "<myproject>", jobIdomg);
GetQueryResultsResponse queryResultomg = bigquery.jobs()
        .getQueryResults(
                "<myproject>", completedJobomg
                        .getJobReference()
                        .getJobId()
        ).execute();
List<TableRow> rowsomg = queryResultomg.getRows();
Without the https://www.googleapis.com/auth/drive scope the job fails immediately after being inserted; with it, the job fails on completion.
Inserting Query Job: SELECT * FROM [<myproject>:<mydataset>.failures] LIMIT 1000
Job ID of Query Job is: job_S3-fY5jrb4P3UhVgNGeRkDYQofg
Job status (194ms) job_S3-fY5jrb4P3UhVgNGeRkDYQofg: RUNNING
Job status (1493ms) job_S3-fY5jrb4P3UhVgNGeRkDYQofg: RUNNING
Job status (2686ms) job_S3-fY5jrb4P3UhVgNGeRkDYQofg: RUNNING
...
Job status (29881ms) job_S3-fY5jrb4P3UhVgNGeRkDYQofg: DONE
Exception in thread "main" com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "location" : "/gdrive/id/1T4qNgi9vFJF4blK4jddYf8XlfT6uDiqNpTExWf1NMyY",
    "locationType" : "other",
    "message" : "Encountered an error while globbing file pattern.",
    "reason" : "invalid"
  } ],
  "message" : "Encountered an error while globbing file pattern."
}
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
So the question is: what else am I missing? Or is it just a BigQuery bug?

OK, after a day of experiments: the account you use to obtain the GoogleCredential must have access to the file(s) on which the external table is defined. Hope this helps someone.
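In practice that means sharing the spreadsheet with the service account's email (GC_CREDENTIALS_ACCOUNT_EMAIL), either via the Share button in Google Sheets or programmatically. Below is a minimal sketch using the Drive v3 Java client (com.google.api.services.drive); ownerCredential is a placeholder for a credential of an account that already has access to the sheet, not something from the code above:

// Share the sheet backing the external table with the service account.
// The file id is the one that appears in the error's "location" field.
Drive drive = new Drive.Builder(transport, jsonFactory, ownerCredential)
        .setApplicationName("share-sheet-with-service-account")
        .build();
String fileId = "1T4qNgi9vFJF4blK4jddYf8XlfT6uDiqNpTExWf1NMyY";
Permission permission = new Permission()
        .setType("user")
        .setRole("reader")                              // or "writer" if the query needs write access
        .setEmailAddress(GC_CREDENTIALS_ACCOUNT_EMAIL); // the service account's email
drive.permissions().create(fileId, permission).execute();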

Related

Gatling feeder/parameter issue - Exception in thread "main" java.lang.UnsupportedOperationException

I just got involved in a new project doing API testing for our service using Gatling. At this point, I want to run a search query; below is the code:
def chnSendToRender(testData: FeederBuilderBase[String]): ChainBuilder = {
  feed(testData)
  exec(api.AdvanceSearch.searchAsset(s"{\"all\":[{\"all:aggregate:text\":{\"contains\":\"#{edlAssetId}_Rendered\"}}]}", "#{authToken}")
    .check(status.is(200).saveAs("searchStatus"))
    .check(jsonPath("$..asset:id").findAll.optional.saveAs("renderedAssetList"))
  )
  .doIf(session => session("searchStatus").as[Int] == 200) {
    exec { session =>
      printConsoleLog("Rendered Asset ID List: " + session("renderedAssetList").as[String], "INFO")
      session
    }
  }
}
I have already declared the feeder in the simulation Scala file:
class GVRERenderEditor_new extends Simulation {
  private val edlToRender = csv("data/render/edl_asset_ids.csv").queue
  private val chnPostRender = components.notifications.notice.JobsPolling_new.chnSendToRender(edlToRender)
  private val scnSendEDLForRender = scenario("Search Post Render")
    .exitBlockOnFail(exec(preSimAuth))
    .exec(chnPostRender)

  setUp(
    scnSendEDLForRender.inject(atOnceUsers(1)).protocols(httpProtocol)
  )
    .maxDuration(sessionDuration.seconds)
    .assertions(global.successfulRequests.percent.is(100))
}
But the Gatling test failed to run, showing this error: Exception in thread "main" java.lang.UnsupportedOperationException: There were no requests sent during the simulation, reports won't be generated
If I hardcode #{edlAssetId} (put the real edlAssetId in that query), I get a result. I think I am passing the parameter incorrectly in this case. I've tried to print the output in the console log, but no luck. What's wrong with this code? I would appreciate your help. Thanks!
feed(testData)
exec(api.AdvanceSearch.searchAsset(s"{\"all\":[{\"all:aggregate:text\":{\"contains\":\"#{edlAssetId}_Rendered\"}}]}", "#{authToken}")
  .check(status.is(200).saveAs("searchStatus"))
  .check(jsonPath("$..asset:id").findAll.optional.saveAs("renderedAssetList"))
)
You're missing a . (dot) before the exec to attach it to the feed.
As a result, your method returns only the last instruction, i.e. the exec.
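Applied to the snippet from the question, the corrected chain (unchanged apart from the added dot) would look like this:

def chnSendToRender(testData: FeederBuilderBase[String]): ChainBuilder = {
  feed(testData)
    .exec(api.AdvanceSearch.searchAsset(s"{\"all\":[{\"all:aggregate:text\":{\"contains\":\"#{edlAssetId}_Rendered\"}}]}", "#{authToken}")
      .check(status.is(200).saveAs("searchStatus"))
      .check(jsonPath("$..asset:id").findAll.optional.saveAs("renderedAssetList"))
    )
    .doIf(session => session("searchStatus").as[Int] == 200) {
      exec { session =>
        printConsoleLog("Rendered Asset ID List: " + session("renderedAssetList").as[String], "INFO")
        session
      }
    }
}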

How do I programmatically track error messages?

I have a Lookup activity with a Failure output that executes a Stored Procedure activity. The Stored Procedure activity logs the failure. How do I programmatically get the name of the failing Lookup activity, and the error message, as input parameters to the Stored Procedure activity? Thanks.
You could follow the example SDK code below to get the error messages from the activities that ran in the pipeline.
1. Run the Lookup activity pipeline.
CreateRunResponse runResponse = client.Pipelines.CreateRunWithHttpMessagesAsync(resourceGroup, dataFactoryName, pipelineName).Result.Body;
Console.WriteLine("Pipeline run ID: " + runResponse.RunId);
2. Get the error message if the run hit an issue.
List<ActivityRun> activityRuns = client.ActivityRuns.ListByPipelineRun(
    resourceGroup, dataFactoryName, runResponse.RunId,
    DateTime.UtcNow.AddMinutes(-10), DateTime.UtcNow.AddMinutes(10)).ToList();
// fetch the pipeline run itself so its status can be checked
PipelineRun pipelineRun = client.PipelineRuns.Get(resourceGroup, dataFactoryName, runResponse.RunId);
if (pipelineRun.Status == "Succeeded")
    Console.WriteLine(activityRuns.First().Output);
else
    Console.WriteLine(activityRuns.First().Error);
3. Then run the Stored Procedure activity pipeline with the above message as a parameter.
Dictionary<string, object> parameters = new Dictionary<string, object>
{
    { "errorMessage", activityRuns.First().Error }
};
CreateRunResponse runResponse = client.Pipelines.CreateRunWithHttpMessagesAsync(resourceGroup, dataFactoryName, pipelineName, parameters: parameters).Result.Body;
Console.WriteLine("Pipeline run ID: " + runResponse.RunId);

Not able to establish database connection -dberror(Connection.prepareStatement): 258 - insufficient privilege: Not authorized

I am new to the SAP HANA cloud environment and am trying to learn sentiment analysis using the HANA Cloud Platform. I am using the following code in my .xsjs script:
var body = "error";
var data = {
result : 0
};
var id = Number($.request.parameters.get("id"));
var word = $.request.parameters.get("word");
if(word.length!==0) {
try {
var conn = $.db.getConnection();
var query = 'call \"com.hana.cloud.platform.TwitterSenitmentAnalysis.DatabaseStore::update\"(?,?)';
var cst = conn.prepareCall(query);
cst.setString(1, word);
cst.setInteger(2, id);
var rs = cst.execute();
conn.commit();
rs = cst.getResultSet();
while(rs.next()) {
data.result = rs.getInteger(1);
}
body = JSON.stringify(data);
rs.close();
cst.close();
conn.close();
} catch (e) {
body = e.stack + e.message;
$.response.status = $.net.http.BAD_REQUEST;
conn.close();
}
}
I am able to use another .xsjs service to connect to the database and perform a SELECT; however, when I try to perform the update, it gives me the following error:
Not able to establish database connection -dberror(Connection.prepareStatement): 258 - insufficient privilege: Not authorized
The schema I am working with is called AMRIT, and the user is called AMRIT as well. While trying to grant object privileges on the AMRIT schema for the update, I get the following error in the HANA database cockpit:
8:07:22 PM (Security Editor) Changing 'AMRIT' user failed: 404 - Granting privilege 'UPDATE' on SCHEMA 'AMRIT' failed: insufficient privilege: Not authorized
Please advise on how to solve this. Should I be giving any additional privilege to the system user?
Thanks and regards
Please grant the INSERT privilege on your schema to the user _SYS_REPO.
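For example, connected as the schema owner (AMRIT) in the SQL console, something along these lines; treat it as a sketch, since the exact privilege to grant (INSERT, UPDATE, SELECT) depends on what the activated procedure does:

-- repository-activated objects run as the technical user _SYS_REPO,
-- so it needs the privilege on the catalog schema
GRANT INSERT ON SCHEMA "AMRIT" TO _SYS_REPO;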

Unloading data from Amazon Redshift to Amazon S3

I am trying to use the following code to unload data into an S3 bucket. It works, but after the unload it throws an error.
Properties props = new Properties();
props.setProperty("user", MasterUsername);
props.setProperty("password", MasterUserPassword);
conn = DriverManager.getConnection(dbURL, props);
stmt = conn.createStatement();
String sql;
sql = "unload('select * from part where p_partkey in (select p_partkey from
part limit 10)') to"
+ " 's3://redshiftdump.****' "
+ " DELIMITER AS ','"
+ "ADDQUOTES "
+ "NULL AS ''"
+ "credentials 'aws_access_key_id=****;aws_secret_access_key=***' "
+ "parallel off" +
";";
boolean i = stmt.execute(sql);
stmt.close();
conn.close();
The unload works and creates a file in the bucket, but it gives me this error:
java.sql.SQLException: dataengine.impl.DSISimpleRowCountResult cannot be cast to com.amazon.dsi.dataengine.interfaces.IResultSet
    at com.amazon.redshift.core.jdbc42.PGJDBC42Statement.createResultSet(Unknown Source)
    at com.amazon.jdbc.common.SStatement.executeQuery(Unknown Source)
What is this error and how do I avoid it? Is there any way to dump the table in CSV format? Right now it is dumping the file in FILE format.
You say the UNLOAD works but you receive this error; that suggests to me that you are connecting successfully, but that there is a problem in the way your code interacts with the JDBC driver when the query completes.
We provide an example that may be helpful in our documentation, on the page "Connect to Your Cluster Programmatically".
Regarding the output file format, you will get whatever is specified in your UNLOAD SQL, but the filename will have a suffix (for example "000" or "6411_part_00") to indicate which part of the UNLOAD it is.
Use executeUpdate.
def runQuery(sql: String) = {
  Class.forName("com.amazon.redshift.jdbc.Driver")
  val connection = DriverManager.getConnection(url, username, password)
  var statement: Statement = null
  try {
    statement = connection.createStatement()
    statement.setQueryTimeout(redshiftTimeoutInSeconds)
    val result = statement.executeUpdate(sql)
    logger.info(s"statement response code : ${result}")
  } catch {
    case e: Exception => {
      logger.error(s"statement.isCloseOnCompletion :${e.getMessage} ::: ${e.printStackTrace()}")
      throw new IngestionException(e.getMessage)
    }
  } finally {
    if (statement != null) statement.close()
    connection.close()
  }
}
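For the Java code in the question, the equivalent minimal change would be along these lines (a sketch): run the UNLOAD with executeUpdate, since it produces a row count rather than a result set.

// UNLOAD does not return a result set, so use executeUpdate instead of execute()
int affectedRows = stmt.executeUpdate(sql);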

Export data from BigQuery to Cloud Storage (PHP client library) - there is one extra empty line in the Cloud Storage file

I followed this sample
https://cloud.google.com/bigquery/docs/exporting-data
public function exportDailyRecordsToCloudStorage($date, $tableId)
{
    $validTableIds = ['table1', 'table2'];
    if (!in_array($tableId, $validTableIds))
    {
        die("Wrong TableId");
    }
    $date = date("Ymd", date(strtotime($date)));
    $datasetId = $date;
    $dataset = $this->bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    // load the storage object
    $storage = $this->storage;
    $bucketName = 'mybucket';
    $objectName = "daily_records/{$tableId}_" . $date;
    $destinationObject = $storage->bucket($bucketName)->object($objectName);
    // create the export job
    $format = 'NEWLINE_DELIMITED_JSON';
    $options = ['jobConfig' => ['destinationFormat' => $format]];
    $job = $table->export($destinationObject, $options);
    // poll the job until it is complete
    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($job) {
        print('Waiting for job to complete' . PHP_EOL);
        $job->reload();
        if (!$job->isComplete()) {
            //throw new Exception('Job has not yet completed', 500);
        }
    });
    // check if the job has errors
    if (isset($job->info()['status']['errorResult'])) {
        $error = $job->info()['status']['errorResult']['message'];
        printf('Error running job: %s' . PHP_EOL, $error);
    } else {
        print('Data exported successfully' . PHP_EOL);
    }
}
I have 37670 rows in my table1, and the Cloud Storage file has 37671 lines.
And I have 388065 rows in my table2, and the Cloud Storage file has 388066 lines.
The last line in both Cloud Storage files is an empty line.
Is this a Google BigQuery feature improvement request, or did I do something wrong in my code above?
What you described seems like an unexpected outcome. The output file should generally have the same number of lines as the source table.
Your PHP code looks fine and shouldn't be the cause of the issue.
I tried to reproduce it but was unable to. Could you double-check whether the last empty line is somehow added by another tool, like a text editor? How are you counting the lines of the resulting output?
If you have ruled that out and are sure the newline is indeed added by the BigQuery export feature, please consider opening a bug in the BigQuery Issue Tracker as suggested by xuejian, and include your job ID so that we can investigate further.
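One quick way to rule out the editor (a sketch reusing the $storage, $bucketName and $objectName variables from your code above) is to count the newline characters in the exported object directly; note that when the last record is terminated by \n, some editors display one extra, empty final line even though no extra record exists.

// download the exported object and count newline-terminated records
$contents = $storage->bucket($bucketName)->object($objectName)->downloadAsString();
printf('newline-terminated records: %d' . PHP_EOL, substr_count($contents, "\n"));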