BigQuery Java API not returning all rows when we execute query through it - google-bigquery

We are facing an intermittent issue: when we execute a query through the BigQuery Java API, the number of rows we get back doesn't match what we get when we execute the same query through the BigQuery UI.
In our code, we use a QueryResponse object to execute the query, and we check whether the query has completed via the GetQueryResultsResponse.getJobComplete() flag. We also have a mechanism to pull more records when the query doesn't return all rows in one shot:
while (queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf(queryResult.getRows().size())) > 0) { ... }
Following is the piece of code which we use for executing the query:
int retryCount = 0;
long waitTime = Constant.BASE_WAIT_TIME;
Bigquery bigquery = cloudPlatformConnector.connectBQ();
QueryRequest queryRequest = new QueryRequest();
queryRequest.setUseLegacySql(useLegacyDialect);
GetQueryResultsResponse queryResult = null;
GetQueryResultsResponse queryPaginationResult = null;
String pageToken;
do {
    try {
        QueryResponse query = bigquery.jobs().query(this.projectId, queryRequest.setQuery(querySql)).execute();
        queryResult = bigquery.jobs().getQueryResults(query.getJobReference().getProjectId(), query.getJobReference().getJobId()).execute();
        if (queryResult != null) {
            if (!queryResult.getJobComplete()) {
                LOGGER.info("JobId for the query : " + query.getJobReference().getJobId() + " is Job Completed : " + queryResult.getJobComplete());
                if (queryResult.getErrors() != null) {
                    for (ErrorProto err : queryResult.getErrors()) {
                        LOGGER.info("Errors in query, Reason : " + err.getReason() + " Location : " + err.getLocation() + " Message : " + err.getMessage());
                    }
                }
                LOGGER.info("Query not completed : " + querySql);
                throw new IOException("Query is failing, retrying it");
            }
        }
        LOGGER.info("JobId for the query : " + query.getJobReference().getJobId() + " is Job Completed : " + queryResult.getJobComplete() + " Total rows from query : " + queryResult.getTotalRows());
        pageToken = queryResult.getPageToken();
        while (queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf(queryResult.getRows().size())) > 0) {
            LOGGER.info("Inside the Pagination code block, Page Token : " + pageToken);
            queryPaginationResult = bigquery.jobs().getQueryResults(projectId, query.getJobReference().getJobId()).setPageToken(pageToken).setStartIndex(BigInteger.valueOf(queryResult.getRows().size())).execute();
            queryResult.getRows().addAll(queryPaginationResult.getRows());
            pageToken = queryPaginationResult.getPageToken();
            LOGGER.info("Inside the Pagination code block, total size : " + queryResult.getTotalRows() + " Current Size : " + queryResult.getRows().size());
        }
    } catch (IOException ex) {
        retryCount++;
        LOGGER.info("BQ Connection Attempt " + retryCount + " failed, Retrying in " + waitTime + " seconds");
        if (retryCount == Constant.MAX_RETRY_LIMIT) {
            LOGGER.info("BQ Connection Error", ex);
            throw ex;
        }
        try {
            Thread.sleep(waitTime);
        } catch (InterruptedException e) {
            LOGGER.info("Thread Error");
        }
        waitTime *= 2;
    }
} while ((queryResult == null && retryCount < Constant.MAX_RETRY_LIMIT) || (!queryResult.getJobComplete() && retryCount < Constant.MAX_RETRY_LIMIT));
return queryResult.getRows();
The query for which I am not getting all rows doesn't have any LIMIT clause in it.
Currently we are using 0.5.0 version of google-cloud-bigquery.
Thanks in Advance!

I think on subsequent calls of getQueryResults, you need to call setPageToken properly with the pageToken returned from the previous page. Otherwise getQueryResults would just return the rows from the first page.
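For illustration, a pagination loop driven only by the page token might look like this (a sketch against the same low-level google-api-services-bigquery client; bigquery, projectId, and jobId are taken from the surrounding code):
// Sketch: follow page tokens until the service stops returning one.
List<TableRow> allRows = new ArrayList<>();
String pageToken = null;
do {
    GetQueryResultsResponse page = bigquery.jobs()
            .getQueryResults(projectId, jobId)
            .setPageToken(pageToken) // null on the first call fetches the first page
            .execute();
    if (page.getRows() != null) {
        allRows.addAll(page.getRows());
    }
    pageToken = page.getPageToken(); // null once the last page has been read
} while (pageToken != null);
Note that the original loop also sets setStartIndex on every page; the token alone should be enough to advance through the result set.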

Related

DB2 - ERRORCODE=-4229, SQLSTATE=null

I'm using a batch class in EJB to INSERT more than 100 rows in the same commit, using executeBatch against DB2.
When I execute it, the command shows this error: ERRORCODE=-4229, SQLSTATE=null.
The ID is generated with an IDENTITY clause in the CREATE TABLE.
Table:
CREATE TABLE table (col1 INT,
col2 DOUBLE,
col3 INT NOT NULL GENERATED ALWAYS AS IDENTITY)
Does anyone have any idea?
ERROR:
Caused by: nested exception is: com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][102][10040][4.24.97] Batch failure. The batch was submitted, but at least one exception occurred in an individual batch member.
Use getNextException() to retrieve exceptions for specific batch elements. ERRORCODE=-4229, SQLSTATE=null
This is not a full answer, but a suggestion on how to handle Db2 exceptions so that you are able to deal with such errors.
If you are unable to rewrite your error handling, the only thing you can do is enable the JDBC trace on the client and/or set the Db2 dbm cfg DIAGLEVEL parameter to 4.
PreparedStatement pst = null;
try
{
    pst = ...;
    ...
    int[] updateCounts = pst.executeBatch();
    System.out.println("Batch results:");
    for (int i = 0; i < updateCounts.length; i++)
        System.out.println("  Statement " + i + ":" + updateCounts[i]);
} catch (SQLException ex)
{
    while (ex != null)
    {
        if (ex instanceof com.ibm.db2.jcc.DB2Diagnosable)
        {
            com.ibm.db2.jcc.DB2Diagnosable db2ex = (com.ibm.db2.jcc.DB2Diagnosable) ex;
            com.ibm.db2.jcc.DB2Sqlca sqlca = db2ex.getSqlca();
            if (sqlca != null)
            {
                System.out.println("SQLCODE: " + sqlca.getSqlCode());
                System.out.println("MESSAGE: " + sqlca.getMessage());
            }
            else
            {
                System.out.println("Error code: " + ex.getErrorCode());
                System.out.println("Error msg : " + ex.getMessage());
            }
        }
        else
        {
            System.out.println("Error code (no db2): " + ex.getErrorCode());
            System.out.println("Error msg (no db2): " + ex.getMessage());
        }
        if (ex instanceof BatchUpdateException)
        {
            System.out.println("Contents of BatchUpdateException:");
            System.out.println(" Update counts: ");
            System.out.println("  Statement.SUCCESS_NO_INFO: " + Statement.SUCCESS_NO_INFO);
            System.out.println("  Statement.EXECUTE_FAILED : " + Statement.EXECUTE_FAILED);
            BatchUpdateException buex = (BatchUpdateException) ex;
            int[] updateCounts = buex.getUpdateCounts();
            for (int i = 0; i < updateCounts.length; i++)
                System.out.println("  Statement " + i + ":" + updateCounts[i]);
        }
        ex = ex.getNextException();
    }
}
...
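If rewriting the error handling is not possible and you go the trace route, the JCC client trace can be switched on through connection URL properties — a minimal sketch, assuming the IBM Data Server (JCC) driver's traceFile/traceLevel properties and a made-up host and database:
// Sketch: enable the JCC client-side JDBC trace via URL properties.
// traceLevel -1 corresponds to DB2BaseDataSource.TRACE_ALL (trace everything).
String url = "jdbc:db2://dbhost:50000/MYDB"
        + ":traceFile=/tmp/jcctrace.log"
        + ";traceLevel=-1;";
Connection cn = DriverManager.getConnection(url, "user", "password");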

BigQueryIO returns TypedRead<TableRow> instead of PCollection<TableRow>. How to get the real data?

I have a problem retrieving data from a BigQuery table inside a DoFn. I can't find an example of extracting values from a TypedRead.
This is a simplified pipeline. I would like to check whether a record with the target SSN exists in a BigQuery table. The target SSN will be received via Pub/Sub in the real pipeline; I have replaced it with an array of strings here.
final BigQueryIoTestOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(BigQueryIoTestOptions.class);
final List<String> SSNs = Arrays.asList("775-89-3939");
Pipeline p = Pipeline.create(options);
PCollection<String> ssnCollection = p.apply("GetSSNParams", Create.of(SSNs)).setCoder(StringUtf8Coder.of());
ssnCollection.apply("SelectFromBQ", ParDo.of(new DoFn<String, TypedRead<TableRow>>() {
    @ProcessElement
    public void processElement(ProcessContext c) throws Exception {
        TypedRead<TableRow> tr =
                BigQueryIO.readTableRows()
                        .fromQuery("SELECT pid19PatientSSN FROM dataset.table where pid19PatientSSN = '" + c.element() + "' LIMIT 1");
        c.output(tr);
    }
}))
.apply("ParseResponseFromBigQuery", ParDo.of(new DoFn<TypedRead<TableRow>, Void>() {
    @ProcessElement
    public void processElement(ProcessContext c) throws Exception {
        System.out.println(c.element().toString());
    }
}));
p.run();
BigQueryIO returns a PCollection only; we can get the result as an entry set, as in the example below, or we can serialize it to objects as well, as mentioned here.
If you want to query BigQuery in the middle of your pipeline, use the BigQuery client library instead of BigQueryIO, as mentioned here.
BigQueryIO Example:
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Pipeline pipeline = Pipeline.create(options);
PCollection<TableRow> result = pipeline.apply(BigQueryIO.readTableRows()
        .fromQuery("SELECT id, name FROM [project-test:test_data.test] LIMIT 1"));
result.apply(MapElements.via(new SimpleFunction<TableRow, Void>() {
    @Override
    public Void apply(TableRow obj) {
        System.out.println("***" + obj);
        obj.entrySet().forEach(
                (k) -> {
                    System.out.println(k.getKey() + " :" + k.getValue());
                }
        );
        return null;
    }
}));
pipeline.run().waitUntilFinish();
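To serialize the rows into objects instead of printing them, the same MapElements pattern works — a short sketch, where Person is a hypothetical POJO (not part of the original answer) and numeric TableRow fields are assumed to come back as strings:
// Hypothetical POJO; must be serializable so Beam can encode it.
class Person implements Serializable {
    long id;
    String name;
    Person(long id, String name) { this.id = id; this.name = name; }
}

PCollection<Person> people = result
        .apply(MapElements.via(new SimpleFunction<TableRow, Person>() {
            @Override
            public Person apply(TableRow row) {
                // TableRow values are plain objects; integer fields typically arrive as strings.
                return new Person(Long.parseLong((String) row.get("id")), (String) row.get("name"));
            }
        }))
        .setCoder(SerializableCoder.of(Person.class));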
BigQuery Example:
// [START bigquery_simple_app_client]
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
// [END bigquery_simple_app_client]
// [START bigquery_simple_app_query]
QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(
                "SELECT "
                        + "CONCAT('https://stackoverflow.com/questions/', CAST(id as STRING)) as url, "
                        + "view_count "
                        + "FROM `bigquery-public-data.stackoverflow.posts_questions` "
                        + "WHERE tags like '%google-bigquery%' "
                        + "ORDER BY favorite_count DESC LIMIT 10")
                // Use standard SQL syntax for queries.
                // See: https://cloud.google.com/bigquery/sql-reference/
                .setUseLegacySql(false)
                .build();
// Create a job ID so that we can safely retry.
JobId jobId = JobId.of(UUID.randomUUID().toString());
Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());
// Wait for the query to complete.
queryJob = queryJob.waitFor();
// Check for errors
if (queryJob == null) {
throw new RuntimeException("Job no longer exists");
} else if (queryJob.getStatus().getError() != null) {
// You can also look at queryJob.getStatus().getExecutionErrors() for all
// errors, not just the latest one.
throw new RuntimeException(queryJob.getStatus().getError().toString());
}
// [END bigquery_simple_app_query]
// [START bigquery_simple_app_print]
// Get the results.
TableResult result = queryJob.getQueryResults();
// Print all pages of the results.
for (FieldValueList row : result.iterateAll()) {
String url = row.get("url").getStringValue();
long viewCount = row.get("view_count").getLongValue();
System.out.printf("url: %s views: %d%n", url, viewCount);
}
// [END bigquery_simple_app_print]

Getting both try and catch messages after deleting a row with SQL

Hello guys, whenever I try to delete a row using a radio button I get both the try and catch messages, when it is supposed to be just one of them. I have this code:
Here's my calling button method
if (request.getParameter("btnEliminar") != null)
{
    String valor;
    int codParse;
    OC_DAO objDAO = new OC_DAO();
    valor = request.getParameter("rbSel");
    codParse = Integer.parseInt(valor);
    objDAO.DeleteRow(codParse);
}
Here's my java code
public void DeleteRow(int codDet)
{
    try
    {
        cn = Conexion.getConexion();
        pt = cn.prepareStatement("DELETE "
                + "FROM detalleProd "
                + "WHERE codDet = ?");
        pt.setInt(1, codDet);
        pt.executeUpdate();
        System.out.println("ROW DELETED ON CODDET: " + codDet);
        rs.close();
        pt.close();
        cn.close();
    }
    catch (Exception exc)
    {
        System.out.println("Error while deleting");
        System.out.println(exc.toString());
    }
}
And here's my log
Información: ROW DELETED ON CODDET: 48
Información: Error while deleting
Información: java.lang.NullPointerException
The reason is the call to rs.close(): you never assigned a value to rs, so it is null and cannot be closed. You just need to remove that line of code.
Your code also looks strange: I don't see where you declare rs, so it should give a compile error in your IDE.
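For illustration, a cleaned-up version of DeleteRow — a minimal sketch using try-with-resources, keeping the Conexion helper from the question and assuming it returns a standard java.sql.Connection:
public void deleteRow(int codDet)
{
    // try-with-resources closes the connection and statement automatically,
    // and there is no ResultSet at all for a DELETE statement.
    try (Connection cn = Conexion.getConexion();
         PreparedStatement pt = cn.prepareStatement(
                 "DELETE FROM detalleProd WHERE codDet = ?"))
    {
        pt.setInt(1, codDet);
        int deleted = pt.executeUpdate();
        System.out.println("ROWS DELETED ON CODDET " + codDet + ": " + deleted);
    }
    catch (SQLException exc)
    {
        System.out.println("Error while deleting");
        System.out.println(exc.toString());
    }
}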

Lucene 6 Payloads

I am trying to work with payloads in Lucene 6, but I am having trouble. The idea is to index payloads and use them in a CustomScoreQuery to check whether the payload of a query term matches the payload of the document term.
Here is my payload filter:
@Override
public final boolean incrementToken() throws IOException {
    if (!this.input.incrementToken()) {
        return false;
    }
    // get the current token
    final char[] token = Arrays.copyOfRange(this.termAtt.buffer(), 0, this.termAtt.length());
    String stoken = String.valueOf(token);
    String[] parts = stoken.split(Constants.PAYLOAD_DELIMITER);
    if (parts.length > 1 && parts.length == 2) {
        termAtt.setLength(parts[0].length());
        // the rest is the payload
        BytesRef br = new BytesRef(parts[1]);
        System.out.println(br);
        payloadAtt.setPayload(br);
    } else if (parts.length > 1) {
        // skip
    } else {
        // no payload here
        payloadAtt.setPayload(null);
    }
    return true;
}
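For reference, a payload filter like this would typically be wired into a custom Analyzer chain — a minimal sketch against the Lucene 6 API, where PayloadDelimiterFilter is a hypothetical name for the filter shown above:
// Hypothetical wiring; PayloadDelimiterFilter stands in for the filter above.
Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();
        TokenStream sink = new PayloadDelimiterFilter(source);
        return new TokenStreamComponents(source, sink);
    }
};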
It seems to be adding the payload; however, when I try to access the payload in the CustomScoreQuery, it just keeps returning null.
public float determineBoost(int doc) throws IOException {
    float boost = 1f;
    LeafReader reader = this.context.reader();
    System.out.println("Has payloads:" + reader.getFieldInfos().hasPayloads());
    // loop through each location of the term and boost if the location matches the payload
    if (reader != null) {
        PostingsEnum posting = reader.postings(new Term(this.field, term.getTerm()), PostingsEnum.POSITIONS);
        System.out.println("Term: " + term.getTerm());
        if (posting != null) {
            // move to the document we are currently looking at
            posting.advance(doc);
            int count = 0;
            while (count < posting.freq()) {
                BytesRef load = posting.getPayload();
                System.out.println(posting);
                System.out.println(posting.getClass());
                System.out.println(posting.attributes());
                System.out.println("Load: " + load);
                // if the location matches the term location, boost the term by the boost factor
                try {
                    if (load != null && term.containLocation(new Payload(load))) {
                        boost = boost * this.boost;
                    }
                } catch (PayloadException e) {
                    // do not care too much, the payload is unrecognized
                    // this is not going to change the boost factor
                }
                posting.nextPosition();
                count += 1;
            }
        }
    }
    return boost;
}
In both of my tests the load keeps coming back null. Any suggestions or help?

How to avoid to fetch a list of followers of the same Twitter user that was displayed before

I'm very new at coding and I'm having some issues. I'd like to display the followers of followers of ..... of followers of some specific users on Twitter. I have coded this, and I can set a limit for the depth. But while running the code with a small sample, I saw that I run into the same users again, and my code re-displays the followers of those users. How can I avoid this and skip to the next user? You can find my code below.
By the way, while running my code I encounter a 401 error. In the list I'm working on there's a private user, and when my code reaches that user, it stops. Additionally, how can I deal with this issue? I'd like to skip such users and prevent my code from stopping.
Thank you for your help in advance!
PS: I know that I'll encounter a 429 error when working with a large sample. After fixing these issues, I'm planning to review the relevant discussions to deal with it.
public class mainJava {
    public static Twitter twitter = buildConfiguration.getTwitter();

    public static void main(String[] args) throws Exception {
        ArrayList<String> rootUserIDs = new ArrayList<String>();
        Scanner s = new Scanner(new File("C:\\Users\\ecemb\\Desktop\\rootusers1.txt"));
        while (s.hasNextLine()) {
            rootUserIDs.add(s.nextLine());
        }
        s.close();
        for (String rootUserID : rootUserIDs) {
            User rootUser = twitter.showUser(rootUserID);
            List<User> userList = getFollowers(rootUser, 0);
        }
    }

    public static List<User> getFollowers(User parent, int depth) throws Exception {
        List<User> userList = new ArrayList<User>();
        if (depth == 2) {
            return userList;
        }
        IDs followerIDs = twitter.getFollowersIDs(parent.getScreenName(), -1);
        long[] ids = followerIDs.getIDs();
        for (long id : ids) {
            twitter4j.User child = twitter.showUser(id);
            userList.add(child);
            getFollowers(child, depth + 1);
            System.out.println(depth + "th user: " + parent.getScreenName() + " Follower: " + child.getScreenName());
        }
        return userList;
    }
}
I guess graph search algorithms can be applied to this particular issue. I chose the Breadth First Search algorithm because visiting the root user's followers first seemed better. You can check this link for additional information about the algorithm.
Here is my implementation for your problem:
public List<User> getFollowers(User parent, int startDepth, int finalDepth) {
    List<User> userList = new ArrayList<User>();
    Queue<Long> queue = new LinkedList<Long>();
    HashMap<Long, Integer> discoveredUserId = new HashMap<Long, Integer>();
    try {
        queue.add(parent.getId());
        discoveredUserId.put(parent.getId(), 0);
        while (!queue.isEmpty()) {
            long userId = queue.remove();
            int discoveredDepth = discoveredUserId.get(userId);
            if (discoveredDepth == finalDepth) {
                continue;
            }
            User user = twitter.showUser(userId);
            handleRateLimit(user.getRateLimitStatus());
            if (user.isProtected()) {
                System.out.println(user.getScreenName() + "'s account is protected. Can't access followers.");
                continue;
            }
            IDs followerIDs = null;
            followerIDs = twitter.getFollowersIDs(user.getScreenName(), -1);
            handleRateLimit(followerIDs.getRateLimitStatus());
            long[] ids = followerIDs.getIDs();
            for (int i = 0; i < ids.length; i++) {
                if (!discoveredUserId.containsKey(ids[i])) {
                    discoveredUserId.put(ids[i], discoveredDepth + 1);
                    User child = twitter.showUser(ids[i]);
                    handleRateLimit(child.getRateLimitStatus());
                    userList.add(child);
                    if (discoveredDepth >= startDepth && discoveredDepth < finalDepth) {
                        System.out.println(discoveredDepth + ". user: " + user.getScreenName() + " has " + user.getFollowersCount() + " follower(s) " + (i + 1) + ". Follower: " + child.getScreenName());
                    }
                    queue.add(ids[i]);
                } else { // prints to console but does not check followers. Just for data consistency
                    User child = twitter.showUser(ids[i]);
                    handleRateLimit(child.getRateLimitStatus());
                    if (discoveredDepth >= startDepth && discoveredDepth < finalDepth) {
                        System.out.println(discoveredDepth + ". user: " + user.getScreenName() + " has " + user.getFollowersCount() + " follower(s) " + (i + 1) + ". Follower: " + child.getScreenName());
                    }
                }
            }
        }
    } catch (TwitterException e) {
        e.printStackTrace();
    }
    return userList;
}

// There definitely are more methods for handling rate limits, but this worked well for me
private void handleRateLimit(RateLimitStatus rateLimitStatus) {
    // this throws an NPE here sometimes, so I guess rateLimitStatus can be null; added this conditional expression
    if (rateLimitStatus != null) {
        int remaining = rateLimitStatus.getRemaining();
        int resetTime = rateLimitStatus.getSecondsUntilReset();
        int sleep = 0;
        if (remaining == 0) {
            sleep = resetTime + 1; // adding 1 more second
        } else {
            sleep = (resetTime / remaining) + 1; // adding 1 more second
        }
        try {
            Thread.sleep(sleep * 1000 > 0 ? sleep * 1000 : 0);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
In this code, the HashMap<Long, Integer> discoveredUserId is used to prevent the program from checking the same users repeatedly, and it stores the depth at which we first encountered each user.
As for private users, there is the isProtected() method in the twitter4j library.
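If a protected account still triggers an error (for example, a 401 from showUser itself), the HTTP status on the TwitterException can be checked and the user skipped — a small sketch, assuming twitter4j's getStatusCode():
// Sketch: inside the follower loop, skip ids that come back 401
// (unauthorized, typically a protected account) instead of stopping.
try {
    User child = twitter.showUser(id);
    userList.add(child);
} catch (TwitterException e) {
    if (e.getStatusCode() == 401) {
        System.out.println("Skipping user " + id + ": not authorized (likely protected).");
        continue; // move on to the next follower id
    }
    throw e; // anything else is unexpected, so rethrow
}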
Hope this implementation helps.