BigQueryIO returns TypedRead<TableRow> instead of PCollection<TableRow>. How to get the real data?

I have a problem retrieving data from a BigQuery table inside a DoFn. I can't find an example of extracting values from a TypedRead.
This is a simplified pipeline. I would like to check whether a record with the target SSN exists in a BigQuery table. In the real pipeline the target SSN will be received via Pub/Sub; here I have replaced it with an array of strings.
final BigQueryIoTestOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(BigQueryIoTestOptions.class);
final List<String> SSNs = Arrays.asList("775-89-3939");
Pipeline p = Pipeline.create(options);
PCollection<String> ssnCollection = p.apply("GetSSNParams", Create.of(SSNs)).setCoder(StringUtf8Coder.of());
ssnCollection.apply("SelectFromBQ", ParDo.of(new DoFn<String, TypedRead<TableRow>>() {
@ProcessElement
public void processElement(ProcessContext c) throws Exception {
TypedRead<TableRow> tr =
BigQueryIO.readTableRows()
.fromQuery("SELECT pid19PatientSSN FROM dataset.table where pid19PatientSSN = '" + c.element() + "' LIMIT 1");
c.output(tr);
}
}))
.apply("ParseResponseFromBigQuery", ParDo.of(new DoFn<TypedRead<TableRow>, Void>() {
@ProcessElement
public void processElement(ProcessContext c) throws Exception {
System.out.println(c.element().toString());
}
}));
p.run();

BigQuery returns a PCollection only; you can get the result as an entry set, as in the example below, or serialize it to objects as mentioned here.
If you want to query BigQuery in the middle of your pipeline, use the BigQuery client library instead of BigQueryIO, as mentioned here.
BigQueryIO Example:
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Pipeline pipeline = Pipeline.create(options);
PCollection<TableRow> result = pipeline.apply(BigQueryIO.readTableRows()
.fromQuery("SELECT id, name FROM [project-test:test_data.test] LIMIT 1"));
result.apply(MapElements.via(new SimpleFunction<TableRow, Void>() {
@Override
public Void apply(TableRow obj) {
System.out.println("***" + obj);
obj.entrySet().forEach(
(k)-> {
System.out.println(k.getKey() + " :" + k.getValue());
}
);
return null;
}
}));
pipeline.run().waitUntilFinish();
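If only a single column is needed, note that TableRow also behaves like a Map, so a field can be read directly instead of iterating the entry set. A minimal sketch (the pid19PatientSSN column name is taken from the question; STRING columns typically come back as String values):
result.apply(MapElements.via(new SimpleFunction<TableRow, String>() {
  @Override
  public String apply(TableRow row) {
    // TableRow extends a Map<String, Object>, so fields can be fetched by column name.
    return (String) row.get("pid19PatientSSN");
  }
}));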
BigQuery Example:
// [START bigquery_simple_app_client]
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
// [END bigquery_simple_app_client]
// [START bigquery_simple_app_query]
QueryJobConfiguration queryConfig =
QueryJobConfiguration.newBuilder(
"SELECT "
+ "CONCAT('https://stackoverflow.com/questions/', CAST(id as STRING)) as url, "
+ "view_count "
+ "FROM `bigquery-public-data.stackoverflow.posts_questions` "
+ "WHERE tags like '%google-bigquery%' "
+ "ORDER BY favorite_count DESC LIMIT 10")
// Use standard SQL syntax for queries.
// See: https://cloud.google.com/bigquery/sql-reference/
.setUseLegacySql(false)
.build();
// Create a job ID so that we can safely retry.
JobId jobId = JobId.of(UUID.randomUUID().toString());
Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());
// Wait for the query to complete.
queryJob = queryJob.waitFor();
// Check for errors
if (queryJob == null) {
throw new RuntimeException("Job no longer exists");
} else if (queryJob.getStatus().getError() != null) {
// You can also look at queryJob.getStatus().getExecutionErrors() for all
// errors, not just the latest one.
throw new RuntimeException(queryJob.getStatus().getError().toString());
}
// [END bigquery_simple_app_query]
// [START bigquery_simple_app_print]
// Get the results.
QueryResponse response = bigquery.getQueryResults(jobId);
TableResult result = queryJob.getQueryResults();
// Print all pages of the results.
for (FieldValueList row : result.iterateAll()) {
String url = row.get("url").getStringValue();
long viewCount = row.get("view_count").getLongValue();
System.out.printf("url: %s views: %d%n", url, viewCount);
}
// [END bigquery_simple_app_print]
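For the original question (looking up an SSN per element in the middle of a pipeline), the two pieces above can be combined: use the google-cloud-bigquery client from inside a DoFn. This is only a rough sketch, not code from either example: the dataset, table and column names are taken from the question, it assumes a reasonably recent google-cloud-bigquery version, and error handling, batching and deduplication of lookups are omitted.
ssnCollection.apply("LookupSsnInBQ", ParDo.of(new DoFn<String, String>() {
  private transient BigQuery bigquery;

  @Setup
  public void setup() {
    // One client per DoFn instance, created lazily on the worker.
    bigquery = BigQueryOptions.getDefaultInstance().getService();
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(
            "SELECT pid19PatientSSN FROM dataset.table WHERE pid19PatientSSN = @ssn LIMIT 1")
        .addNamedParameter("ssn", QueryParameterValue.string(c.element()))
        .setUseLegacySql(false) // named parameters require standard SQL
        .build();
    // Runs the query and waits for it; emits the SSN only if a matching row exists.
    for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
      c.output(row.get("pid19PatientSSN").getStringValue());
    }
  }
}));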

Related

Passing Variables into Dapper as query parameters... "A request to send or receive data was disallowed because the socket is not connected

Ok , I"m doing what looks like a simple Dapper query
but if I pass in my studId parameter, it blows up with this low level networking exception:
{"A request to send or receive data was disallowed because the socket is not connected and (when sending on a datagram socket using a sendto call) no address was supplied."}
if I comment out the line of sql that uses it, fix the where clause, ,and comment out the line where it's added the the parameters object. It retrieves rows as expected.
I've spent the last 2.5 days trying everything I could think of, changing the names to match common naming patterns, changing type to a string (just gave a error converting string to number), yadda yadda yadda..
I'm at a complete loss as to why it doesn't like that parameter, I look at other code that works and they pass Id's just fine...
At this point I figure it has to be an ID-10-t that's staring me in the face and I'm just assuming my way right past it with out seeing it.
Any help is appreciated
public List<StudDistLearnSchedRawResponse> GetStudDistanceLearningScheduleRaw( StudDistLearnSchedQueryParam inputs )
{
var aseSqlConnectionString = Configuration.GetConnectionString( "SybaseDBDapper" );
string mainSql = " SELECT " +
" enrollment.stud_id, " +
" sched_start_dt, " +
" sched_end_dt, " +
" code.code_desc, " +
" student_schedule_dl.enrtype_id, " +
" student_schedule_dl.stud_sched_dl_id, " +
" dl_correspond_cd, " +
" course.course_name, " +
" stud_course_sched_dl.sched_hours, " +
" actual_hours, " +
" course_comments as staff_remarks, " + // note this column rename - EWB
" stud_course_sched_dl.sched_item_id , " +
" stud_course_sched_dl.stud_course_sched_dl_id " +
" from stud_course_sched_dl " +
" join student_schedule_dl on student_schedule_dl.stud_sched_dl_id = stud_course_sched_dl.stud_sched_dl_id " +
" join course on stud_course_sched_dl.sched_item_id = course.sched_item_id " +
" left join code on student_schedule_dl.dl_correspond_cd = code.code_id " +
" join enrollment_type on student_schedule_dl.enrtype_id = enrollment_type.enrtype_id " +
" join enrollment on enrollment_type.enr_id = enrollment.enr_id " +
" where enrollment.stud_id = #studId " +
" and sched_start_dt >= #startOfWeek" +
" and sched_end_dt <= #startOfNextWeek";
DapperTools.DapperCustomMapping<StudDistLearnSchedRawResponse>();
//string sql = query.ToString();
DateTime? startOfWeek = StartOfWeek( inputs.weekStartDateTime, DayOfWeek.Monday );
DateTime? startOfNextWeek = StartOfWeek( inputs.weekStartDateTime.Value.AddDays( 7 ) , DayOfWeek.Monday );
try
{
using ( IDbConnection db = new AseConnection( aseSqlConnectionString ) )
{
db.Open();
var arguments = new
{
studId = inputs.StudId, // it chokes and gives a low level networking error - EWB
startOfWeek = startOfWeek.Value.ToShortDateString(),
startOfNextWeek = startOfNextWeek.Value.ToShortDateString(),
};
List<StudDistLearnSchedRawResponse> list = new List<StudDistLearnSchedRawResponse>();
list = db.Query<StudDistLearnSchedRawResponse>( mainSql, arguments ).ToList();
return list;
}
}
catch (Exception ex)
{
Trace.WriteLine(ex.ToString());
return null;
}
}
Here is the input object
public class StudDistLearnSchedQueryParam
{
public Int64 StudId;
public DateTime? weekStartDateTime;
}
Here is the dapper tools object which just abstracts some ugly code to look nicer.
namespace EricSandboxVue.Utilities
{
public interface IDapperTools
{
string ASEConnectionString { get; }
AseConnection _aseconnection { get; }
void ReportSqlError( ILogger DalLog, string sql, Exception errorFound );
void DapperCustomMapping< T >( );
}
public class DapperTools : IDapperTools
{
public readonly string _aseconnectionString;
public string ASEConnectionString => _aseconnectionString;
public AseConnection _aseconnection
{
get
{
return new AseConnection( _aseconnectionString );
}
}
public DapperTools( )
{
_aseconnectionString = Environment.GetEnvironmentVariable( "EIS_ASESQL_CONNECTIONSTRING" );
}
public void ReportSqlError( ILogger DalLog, string sql, Exception errorFound )
{
DalLog.LogError( "Error in Sql" );
DalLog.LogError( errorFound.Message );
//if (env.IsDevelopment())
//{
DalLog.LogError( sql );
//}
throw errorFound;
}
public void DapperCustomMapping< T >( )
{
// custom mapping
var map = new CustomPropertyTypeMap(
typeof( T ),
( type, columnName ) => type.GetProperties( ).FirstOrDefault( prop => GetDescriptionFromAttribute( prop ) == columnName )
);
SqlMapper.SetTypeMap( typeof( T ), map );
}
private string GetDescriptionFromAttribute( System.Reflection.MemberInfo member )
{
if ( member == null ) return null;
var attrib = (Dapper.ColumnAttribute) Attribute.GetCustomAttribute( member, typeof(Dapper.ColumnAttribute), false );
return attrib == null ? null : attrib.Name;
}
}
}
If I change the SQL string building to this (below), but leave everything else the same (including StudId in the args struct), it doesn't crash and retrieves rows, so it's clearly about the substitution of @studId...
// " where enrollment.stud_id = @studId " +
" where sched_start_dt >= @startOfWeek" +
" and sched_end_dt <= @startOfNextWeek";
You named your data members wrong. I had no idea starting a variable name with @ was possible.
The problem is here:
var arguments = new
{
@studId = inputs.StudId, // it chokes and gives a low level networking error - EWB
@startOfWeek = startOfWeek.Value.ToShortDateString(),
@startOfNextWeek = startOfNextWeek.Value.ToShortDateString(),
};
It should have been:
var arguments = new
{
studId = inputs.StudId, // it chokes and gives a low level networking error - EWB
startOfWeek = startOfWeek.Value.ToShortDateString(),
startOfNextWeek = startOfNextWeek.Value.ToShortDateString(),
};
The @ is just a hint to Dapper that it should substitute the corresponding member name.
@@ has special meaning in some SQL dialects; that's probably what causes the trouble.
So here's what I found out.
The Sybase implementation has a hard time with arguments.
It especially has a hard time with arguments of type Int64 (this existed way before .NET Core).
So if you change the type of the passed-in argument from Int64 to Int32, everything works fine.
You can cast it, or just change the type of the method parameter.

BigQuery Java API not returning all rows when we execute query through it

We are facing an intermittent issue where, when we execute a query through the BigQuery Java API, the number of rows we get doesn't match what we get when we execute the same query through the BigQuery UI.
In our code we use a QueryResponse object to execute the query, and we check whether the query has completed via the GetQueryResultsResponse.getJobComplete() flag. We also have a mechanism to pull more records if the query does not return all rows in one shot:
while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf(queryResult.getRows().size())) > 0) {
Following is the piece of code which we use for executing the query:
int retryCount = 0;
long waitTime = Constant.BASE_WAIT_TIME;
Bigquery bigquery = cloudPlatformConnector.connectBQ();
QueryRequest queryRequest = new QueryRequest();
queryRequest.setUseLegacySql(useLegacyDialect);
GetQueryResultsResponse queryResult = null;
GetQueryResultsResponse queryPaginationResult = null;
String pageToken;
do{
try{
QueryResponse query = bigquery.jobs().query(this.projectId, queryRequest.setQuery(querySql)).execute();
queryResult = bigquery.jobs().getQueryResults(query.getJobReference().getProjectId(), query.getJobReference().getJobId()).execute();
if(queryResult != null ){
if(!queryResult.getJobComplete()){
LOGGER.info("JobId for the query : "+ query.getJobReference().getJobId() + " is Job Completed : "+ queryResult.getJobComplete());
if(queryResult.getErrors() != null){
for( ErrorProto err: queryResult.getErrors() ){
LOGGER.info("Errors in query, Reason : "+ err.getReason()+ " Location : "+ err.getLocation() +" Message : "+ err.getMessage());
}
}
LOGGER.info("Query not completed : "+querySql);
throw new IOException("Query is failing retrying it");
}
}
LOGGER.info("JobId for the query : "+ query.getJobReference().getJobId() + " is Job Completed : "+ queryResult.getJobComplete() + " Total rows from query : " + queryResult.getTotalRows());
pageToken = queryResult.getPageToken();
while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf((queryResult.getRows().size()))) > 0) {
LOGGER.info("Inside the Pagination code block, Page Token : "+pageToken);
queryPaginationResult = bigquery.jobs().getQueryResults(projectId,query.getJobReference().getJobId()).setPageToken(pageToken).setStartIndex(BigInteger.valueOf(queryResult.getRows().size())).execute();
queryResult.getRows().addAll(queryPaginationResult.getRows());
pageToken = queryPaginationResult.getPageToken();
LOGGER.info("Inside the Pagination code block, total size : "+ queryResult.getTotalRows() + " Current Size : "+ queryResult.getRows().size());
}
}catch(IOException ex){
retryCount ++;
LOGGER.info("BQ Connection Attempt "+retryCount +" failed, Retrying in " + waitTime + " seconds");
if (retryCount == Constant.MAX_RETRY_LIMIT) {
LOGGER.info("BQ Connection Error", ex);
throw ex;
}
try {
Thread.sleep(waitTime);
} catch (InterruptedException e) {
LOGGER.info("Thread Error");
}
waitTime *= 2;
}
}while((queryResult == null && retryCount < Constant.MAX_RETRY_LIMIT ) || (!queryResult.getJobComplete() && retryCount < Constant.MAX_RETRY_LIMIT));
return queryResult.getRows();
The query for which I am not getting all rows doesn't have any LIMIT clause in it.
Currently we are using version 0.5.0 of google-cloud-bigquery.
Thanks in advance!
I think that on subsequent calls to getQueryResults you need to call setPageToken with the pageToken returned by the previous page. Otherwise getQueryResults will just keep returning the rows from the first page.
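A minimal sketch of that loop against the same API as in the question (bigquery and projectId as in the question, jobId being query.getJobReference().getJobId(), and it assumes getJobComplete() has already returned true), so treat it as an illustration rather than a drop-in fix:
List<TableRow> allRows = new ArrayList<>();
String pageToken = null;
do {
    GetQueryResultsResponse page = bigquery.jobs()
        .getQueryResults(projectId, jobId)
        .setPageToken(pageToken)     // null on the first call, then the token from the previous page
        .execute();
    if (page.getRows() != null) {
        allRows.addAll(page.getRows());
    }
    pageToken = page.getPageToken(); // becomes null once the last page has been fetched
} while (pageToken != null);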

How to avoid fetching the list of followers of the same Twitter user that was displayed before

I'm very new to coding and I'm having some issues. I'd like to display the followers of followers of ... of followers of some specific users on Twitter. I have coded this and I can set a limit for the depth. But while running the code with a small sample, I saw that I run into the same users again and my code re-displays the followers of these users. How can I avoid this and skip to the next user? You can find my code below.
By the way, while running my code I encounter a 401 error. In the list I'm working on there's a private user, and when my code reaches that user it stops. How can I deal with this issue as well? I'd like to skip such users and prevent my code from stopping.
Thank you for your help in advance!
PS: I know that I'll encounter a 429 error when working with a large sample. After fixing these issues, I'm planning to review the relevant discussions to deal with it.
public class mainJava {
public static Twitter twitter = buildConfiguration.getTwitter();
public static void main(String[] args) throws Exception {
ArrayList<String> rootUserIDs = new ArrayList<String>();
Scanner s = new Scanner(new File("C:\\Users\\ecemb\\Desktop\\rootusers1.txt"));
while (s.hasNextLine()) {
rootUserIDs.add(s.nextLine());
}
s.close();
for (String rootUserID : rootUserIDs) {
User rootUser = twitter.showUser(rootUserID);
List<User> userList = getFollowers(rootUser, 0);
}
}
public static List<User> getFollowers(User parent, int depth) throws Exception {
List<User> userList = new ArrayList<User>();
if (depth == 2) {
return userList;
}
IDs followerIDs = twitter.getFollowersIDs(parent.getScreenName(), -1);
long[] ids = followerIDs.getIDs();
for (long id : ids) {
twitter4j.User child = twitter.showUser(id);
userList.add(child);
getFollowers(child, depth + 1);
System.out.println(depth + "th user: " + parent.getScreenName() + " Follower: " + child.getScreenName());
}
return userList;
}
}
I guess graph search algorithms can be applied to this particular issue. I chose the Breadth-First Search algorithm because visiting the root user's followers first is preferable. You can check this link for additional information about the algorithm.
Here is my implementation for your problem:
public List<User> getFollowers(User parent, int startDepth, int finalDepth) {
List<User> userList = new ArrayList<User>();
Queue<Long> queue = new LinkedList<Long>();
HashMap<Long, Integer> discoveredUserId = new HashMap<Long, Integer>();
try {
queue.add(parent.getId());
discoveredUserId.put(parent.getId(), 0);
while (!queue.isEmpty()) {
long userId = queue.remove();
int discoveredDepth = discoveredUserId.get(userId);
if (discoveredDepth == finalDepth) {
continue;
}
User user = twitter.showUser(userId);
handleRateLimit(user.getRateLimitStatus());
if (user.isProtected()) {
System.out.println(user.getScreenName() + "'s account is protected. Can't access followers.");
continue;
}
IDs followerIDs = null;
followerIDs = twitter.getFollowersIDs(user.getScreenName(), -1);
handleRateLimit(followerIDs.getRateLimitStatus());
long[] ids = followerIDs.getIDs();
for (int i = 0; i < ids.length; i++) {
if (!discoveredUserId.containsKey(ids[i])) {
discoveredUserId.put(ids[i], discoveredDepth + 1);
User child = twitter.showUser(ids[i]);
handleRateLimit(child.getRateLimitStatus());
userList.add(child);
if (discoveredDepth >= startDepth && discoveredDepth < finalDepth) {
System.out.println(discoveredDepth + ". user: " + user.getScreenName() + " has " + user.getFollowersCount() + " follower(s) " + (i + 1) + ". Follower: " + child.getScreenName());
}
queue.add(ids[i]);
} else {//prints to console but does not check followers. Just for data consistency
User child = twitter.showUser(ids[i]);
handleRateLimit(child.getRateLimitStatus());
if (discoveredDepth >= startDepth && discoveredDepth < finalDepth) {
System.out.println(discoveredDepth + ". user: " + user.getScreenName() + " has " + user.getFollowersCount() + " follower(s) " + (i + 1) + ". Follower: " + child.getScreenName());
}
}
}
}
} catch (TwitterException e) {
e.printStackTrace();
}
return userList;
}
//There definitely are more methods for handling rate limits but this worked for me well
private void handleRateLimit(RateLimitStatus rateLimitStatus) {
//throws NPE here sometimes so I guess it is because rateLimitStatus can be null and add this conditional expression
if (rateLimitStatus != null) {
int remaining = rateLimitStatus.getRemaining();
int resetTime = rateLimitStatus.getSecondsUntilReset();
int sleep = 0;
if (remaining == 0) {
sleep = resetTime + 1; //adding 1 more second
} else {
sleep = (resetTime / remaining) + 1; //adding 1 more second
}
try {
Thread.sleep(sleep * 1000 > 0 ? sleep * 1000 : 0);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
In this code, the HashMap<Long, Integer> discoveredUserId is used to prevent the program from checking the same users repeatedly and to store the depth at which each user was encountered.
As for private users, there is the isProtected() method in the twitter4j library.
Hope this implementation helps.
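A rough usage sketch: FollowerCrawler is a made-up name here for whatever class ends up holding the getFollowers and handleRateLimit methods above together with the Twitter instance; only the call pattern is the point.
// Hypothetical driver, mirroring the question's main() loop over root users.
Twitter twitter = buildConfiguration.getTwitter();        // same helper as in the question
FollowerCrawler crawler = new FollowerCrawler(twitter);   // made-up wrapper class for the methods above
User root = twitter.showUser("someRootScreenName");       // placeholder screen name
List<User> followers = crawler.getFollowers(root, 0, 2);  // startDepth 0, finalDepth 2 matches the original depth limit
System.out.println("Collected " + followers.size() + " unique users");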

How to create a purchase order using BAPI: BAPI_PO_CREATE1

I am trying to create a purchase order using JCo 3. I am able to execute my function without any error, but I am not sure what's wrong: the system is not throwing any error, yet it is also not creating the PO in the SAP system.
JCoDestination destination = JCoDestinationManager.getDestination("ABAP_AS_WITHOUT_POOL");
JCoFunction createPurchaseOrderFunction = destination.getRepository().getFunction("BAPI_PO_CREATE1");
JCoFunction functionTransactionCommit = destination.getRepository().getFunction("BAPI_TRANSACTION_COMMIT");
// Input Header
JCoStructure poOrderHeader = createPurchaseOrderFunction.getImportParameterList().getStructure("POHEADER");
System.out.println("Header Structure" + poOrderHeader);
poOrderHeader.setValue("COMP_CODE", "0001");
poOrderHeader.setValue("DOC_TYPE", "NB");
//Date today = new Date();
SimpleDateFormat dateFormat = new SimpleDateFormat("dd.MM.yyyy");
//String date = dateFormat.format(today);
String dateinString = "20.08.2015";
Date date = dateFormat.parse(dateinString);
System.out.println("Date is: " + date);
poOrderHeader.setValue("CREAT_DATE",date);
poOrderHeader.setValue("VENDOR", "V544100170");
poOrderHeader.setValue("LANGU", "EN");
poOrderHeader.setValue("PURCH_ORG", "0005");
poOrderHeader.setValue("PUR_GROUP", "001");
poOrderHeader.setValue("CURRENCY", "INR");
// PO Items
JCoTable poItems = createPurchaseOrderFunction.getTableParameterList().getTable("POITEM");
poItems.appendRow();
poItems.setValue("PO_ITEM", "1");
poItems.setValue("MATERIAL", "ZZMT_TEST2");
poItems.setValue("PLANT", "Z111");
poItems.setValue("QUANTITY", "100");
poItems.setValue("NET_PRICE", "150");
try
{
JCoContext.begin(destination);
createPurchaseOrderFunction.execute(destination);
functionTransactionCommit.execute(destination);
functionTransactionCommit.getImportParameterList().setValue("WAIT", 10);
JCoContext.end(destination);
}
catch(Exception e)
{
throw e;
}
// Print the Return Structure Message
// JCoStructure returnStructure = createPurchaseOrderFunction.getExportParameterList().getStructure("RETURN");
// if (! (returnStructure.getString("TYPE").equals("")||returnStructure.getString("TYPE").equals("S")) )
// {
// throw new RuntimeException(returnStructure.getString("MESSAGE"));
// }
JCoTable table = createPurchaseOrderFunction.getTableParameterList().getTable("POITEM");
// Iterate over table and print JCOFiled
for(JCoField field : table)
{
System.out.println("Name: "+field.getName() + "----" + "Value:" + field.getValue());
System.out.println("------------------------------------------------------------------");
}
System.out.println("---------------------------------------------------------------------------------");
System.out.println("Table" + table);
You need to call BAPI_TRANSACTION_COMMIT at the end to complete the transaction
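For what it's worth, a sketch of the commit sequence (same JCo objects as in the question; the key points are setting the WAIT import before executing the commit and reading BAPI_PO_CREATE1's RETURN table, which reports why nothing was posted instead of raising an exception):
JCoContext.begin(destination);

createPurchaseOrderFunction.execute(destination);

// BAPI_PO_CREATE1 reports problems in its RETURN table rather than throwing errors.
JCoTable ret = createPurchaseOrderFunction.getTableParameterList().getTable("RETURN");
for (int i = 0; i < ret.getNumRows(); i++) {
    ret.setRow(i);
    System.out.println(ret.getString("TYPE") + " " + ret.getString("NUMBER") + ": " + ret.getString("MESSAGE"));
}

// Set WAIT before executing the commit, not after, so the commit waits for the update task.
functionTransactionCommit.getImportParameterList().setValue("WAIT", "X");
functionTransactionCommit.execute(destination);

JCoContext.end(destination);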

Google BigQuery returns only partial table data with C# application using .NET client library

I am trying to execute a query (a basic SELECT statement with 10 fields). My table contains more than 500k rows. The C# application returns a response with only 4260 rows, whereas the Web UI returns all the records.
Why does my code return only partial data? What is the best way to select all the records and load them into a C# DataTable? A code snippet would be even more helpful to me.
using Google.Apis.Auth.OAuth2;
using System.IO;
using System.Threading;
using Google.Apis.Bigquery.v2;
using Google.Apis.Bigquery.v2.Data;
using System.Data;
using Google.Apis.Services;
using System;
using System.Security.Cryptography.X509Certificates;
namespace GoogleBigQuery
{
public class Class1
{
private static void Main()
{
try
{
Console.WriteLine("Start Time: {0}", DateTime.Now.ToString());
String serviceAccountEmail = "SERVICE ACCOUNT EMAIL";
var certificate = new X509Certificate2(@"KeyFile.p12", "notasecret", X509KeyStorageFlags.Exportable);
ServiceAccountCredential credential = new ServiceAccountCredential(
new ServiceAccountCredential.Initializer(serviceAccountEmail)
{
Scopes = new[] { BigqueryService.Scope.Bigquery, BigqueryService.Scope.BigqueryInsertdata, BigqueryService.Scope.CloudPlatform, BigqueryService.Scope.DevstorageFullControl }
}.FromCertificate(certificate));
BigqueryService Service = new BigqueryService(new BaseClientService.Initializer()
{
HttpClientInitializer = credential,
ApplicationName = "PROJECT NAME"
});
string query = "SELECT * FROM [publicdata:samples.shakespeare]";
JobsResource j = Service.Jobs;
QueryRequest qr = new QueryRequest();
string ProjectID = "PROJECT ID";
qr.Query = query;
qr.MaxResults = Int32.MaxValue;
qr.TimeoutMs = Int32.MaxValue;
DataTable DT = new DataTable();
int i = 0;
QueryResponse response = j.Query(qr, ProjectID).Execute();
string pageToken = null;
if (response.JobComplete == true)
{
if (response != null)
{
int colCount = response.Schema.Fields.Count;
if (DT == null)
DT = new DataTable();
if (DT.Columns.Count == 0)
{
foreach (var Column in response.Schema.Fields)
{
DT.Columns.Add(Column.Name);
}
}
pageToken = response.PageToken;
if (response.Rows != null)
{
foreach (TableRow row in response.Rows)
{
DataRow dr = DT.NewRow();
for (i = 0; i < colCount; i++)
{
dr[i] = row.F[i].V;
}
DT.Rows.Add(dr);
}
}
Console.WriteLine("No of Records are Readed: {0} # {1}", DT.Rows.Count.ToString(), DateTime.Now.ToString());
while (true)
{
int StartIndexForQuery = DT.Rows.Count;
Google.Apis.Bigquery.v2.JobsResource.GetQueryResultsRequest SubQR = Service.Jobs.GetQueryResults(response.JobReference.ProjectId, response.JobReference.JobId);
SubQR.StartIndex = (ulong)StartIndexForQuery;
//SubQR.MaxResults = Int32.MaxValue;
GetQueryResultsResponse QueryResultResponse = SubQR.Execute();
if (QueryResultResponse != null)
{
if (QueryResultResponse.Rows != null)
{
foreach (TableRow row in QueryResultResponse.Rows)
{
DataRow dr = DT.NewRow();
for (i = 0; i < colCount; i++)
{
dr[i] = row.F[i].V;
}
DT.Rows.Add(dr);
}
}
Console.WriteLine("No of Records are Readed: {0} # {1}", DT.Rows.Count.ToString(), DateTime.Now.ToString());
if (null == QueryResultResponse.PageToken)
{
break;
}
}
else
{
break;
}
}
}
else
{
Console.WriteLine("Response is null");
}
}
int TotalCount = 0;
if (DT != null && DT.Rows.Count > 0)
{
TotalCount = DT.Rows.Count;
}
else
{
TotalCount = 0;
}
Console.WriteLine("End Time: {0}", DateTime.Now.ToString());
Console.WriteLine("No. of records readed from google bigquery service: " + TotalCount.ToString());
}
catch (Exception e)
{
Console.WriteLine("Error Occurred: " + e.Message);
}
Console.ReadLine();
}
}
}
In this sample query I get the results from a public data set. The table contains 164656 rows, but the response returns only 85000 rows the first time, and then I query again to get the second set of results. (But I don't know whether this is the only way to get all the results.)
This sample contains only 4 fields, and even so it does not return all rows. In my case the table contains more than 15 fields; I get a response of ~4000 rows out of ~10k rows and need to query again and again to get the remaining results. Selecting 1000 rows takes up to 2 minutes with my methodology, so I am looking for the best way to select all the records within a single response.
Answer from user @Pentium10:
There is no way to run a query and get a large response in a single shot. You can either paginate the results, or you can create a job to export to files and then use the generated files in your app. Exporting is free.
Steps to run a large query and export the results to files stored on GCS:
1) Set allowLargeResults to true in your job configuration. You must also specify a destination table with the allowLargeResults flag.
Example:
"configuration":
{
"query":
{
"allowLargeResults": true,
"query": "select uid from [project:dataset.table]"
"destinationTable": [project:dataset.table]
}
}
2) Now your data is in a destination table you set. You need to create a new job, and set the export property to be able to export the table to file(s). Exporting is free, but you need to have Google Cloud Storage activated to put the resulting files there.
3) In the end you download your large files from GCS.
Now it's my turn to design the solution for better results.
Hoping this might help someone: you can retrieve the next set of paginated results using PageToken. Here is sample code showing how to use PageToken, although I liked the idea of exporting for free. Here I write the rows to a flat file, but you could add them to your DataTable. Obviously, it is a bad idea to keep a large DataTable in memory, though.
public void ExecuteSQL(BigqueryService bqservice, String ProjectID)
{
string sSql = "SELECT r.Dealname, r.poolnumber, r.loanid FROM [MBS_Dataset.tblRemitData] R left join each [MBS_Dataset.tblOrigData] o on R.Dealname = o.Dealname and R.Poolnumber = o.Poolnumber and R.LoanID = o.LoanID Order by o.Dealname, o.poolnumber, o.loanid limit 100000";
QueryRequest _r = new QueryRequest();
_r.Query = sSql;
QueryResponse _qr = bqservice.Jobs.Query(_r, ProjectID).Execute();
string pageToken = null;
if (_qr.JobComplete != true)
{
//job not finished yet! expecting more data
while (true)
{
var resultReq = bqservice.Jobs.GetQueryResults(_qr.JobReference.ProjectId, _qr.JobReference.JobId);
resultReq.PageToken = pageToken;
var result = resultReq.Execute();
if (result.JobComplete == true)
{
WriteRows(result.Rows, result.Schema.Fields);
pageToken = result.PageToken;
if (pageToken == null)
break;
}
}
}
else
{
List<string> _fieldNames = _qr.Schema.Fields.ToList().Select(x => x.Name).ToList();
WriteRows(_qr.Rows, _qr.Schema.Fields);
}
}
The Web UI automatically flattens the data. This means that you see multiple rows for each nested field.
When you run the same query via the API, it won't be flattened, and you get fewer rows, as the nested fields are returned as objects. You should check whether this is the case for you.
The other point is that you do indeed need to paginate through the results. Paging through list results explains this.
If you want to do only one job, then you should write your query output to a table, then export the table as JSON, and download the export from GCS.