Fetch time series from Google BigQuery - google-bigquery

I am trying to fetch a list of prices from google big query using the following query :
query_request = service.jobs()
query_data = {
'query': (
'''
SELECT
open
FROM
timeseries.price_2015
''')
}
query_response = query_request.query(
projectId=project_id,
body=query_data).execute()
The table contains 370000 records, but the query loads only the first 100000. I guess I am hitting some limit? Can you tell how I can fetch all records for the 'price' column?

The number of rows returned is limited by the lesser of either the maximum page size or the maxResults property. See more in Paging Through list Results
Consider using Jobs: getQueryResults or Tabledata: list where you can call those API in loop passing PageToken from previous response to next call and collecting whole set on client side

Related

Results DataSet from DynamoDB Query using GSI is not returning correct results

I have a dynamo DB table where I am currently storing all the events that are happening in my system with respect to every product. There is a primary key with a Hash combination of productid,eventtype and eventcategory and Sort Key as Creation Time on the main table. The table was created and data was added into it.
Later I added a new GSI on the table with the attributes being Secondary Hash (which is just the combination of eventcategory and eventtype (excluding productid) and CreationTime as Sort Key. This was added so that I can query for multiple products at once.
The GSI seems to work fine, However only later I realized the data being returned is incorrect
Here is the scenario. (I am running all these queries against the newly created index)
I was querying for products with in the last 30 days and the Query returns 312 records, However, when I run the same query for last 90 days, it returns me only 128 records (which is wrong, should be atleast equal or greater than last 30 days records)
I have the pagination logic already embedded in my code, so that the lastEvaluatedKey is verified every-time, to loop and fetch the next set of records and after the loop, all the results are combined.
Not sure if I am missing something.
ANy suggestions would be appreciated.
var limitPtr *int64
if limit > 0 {
limit64 := int64(limit)
limitPtr = &limit64
}
input := dynamodb.QueryInput{
ExpressionAttributeNames: map[string]*string{
"#sch": aws.String("SecondaryHash"),
"#pkr": aws.String("CreationTime"),
},
ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
":sch": {
S: aws.String(eventHash),
},
":pkr1": {
N: aws.String(strconv.FormatInt(startTime, 10)),
},
":pkr2": {
N: aws.String(strconv.FormatInt(endTime, 10)),
},
},
KeyConditionExpression: aws.String("#sch = :sch AND #pkr BETWEEN :pkr1 AND :pkr2"),
ScanIndexForward: &scanForward,
Limit: limitPtr,
TableName: aws.String(ddbTableName),
IndexName: aws.String(ddbIndexName),
}
You reached the maximum number of items to evaluate (not necessarily the number of matching items). The limit is 1 MB.
The response will contain a LastEvaluatedKey parameter, it is the last item's id. You have to perform a new query with an extra ExclusiveStartKey parameter. (ExclusiveStartKey should be equal with LastEvaluatedKey's value.)
When the LastEvaluatedKey is empty you reached the end of the table.

How to remove collection from model [duplicate]

I have a query, which selects documents to be removed. Right now, I remove them manually, like this (using python):
for id in mycoll.find(query, fields={}):
mycoll.remove(id)
This does not seem to be very efficient. Is there a better way?
EDIT
OK, I owe an apology for forgetting to mention the query details, because it matters. Here is the complete python code:
def reduce_duplicates(mydb, max_group_size):
# 1. Count the group sizes
res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response = True)
# 2. For each entry from the filter scratch collection having count > max_group_size
deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
key = entry['_id']
group_size = int(entry['value'])
# 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
for id in mydb.static.find(key, limit = group_size - max_group_size, **deleteFindArgs):
mydb.static.remove(id)
return res['counts']['input']
So, what does it do? It reduces the number of duplicate keys to at most max_group_size per key value, leaving only the newest records. It works like this:
MR the data to (key, count) pairs.
Iterate over all the pairs with count > max_group_size
Query the data by key, while sorting it ascending by the timestamp (the oldest first) and limiting the result to the count - max_group_size oldest records
Delete each and every found record.
As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. So, the last two steps are foreach-found-remove and this is the important detail of my question, that changes everything and I had to be more specific about it - sorry.
Now, about the collection remove command. It does accept query, but mine include sorting and limiting. Can I do it with remove? Well, I have tried:
mydb.static.find(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])
This attempt fails miserably. Moreover, it seems to screw mongo.Observe:
C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database
Needless to say, that the foreach-found-remove approach works and yields the expected results.
Now, I hope I have provided enough context and (hopefully) have restored my lost honour.
You can use a query to remove all matching documents
var query = {name: 'John'};
db.collection.remove(query);
Be wary, though, if number of matching documents is high, your database might get less responsive. It is often advised to delete documents in smaller chunks.
Let's say, you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than 1 query that deletes all 100k documents.
You can remove it directly using MongoDB scripting language:
db.mycoll.remove({_id:'your_id_here'});
Would deleteMany() be more efficient? I've recently found that remove() is quite slow for 6m documents in a 100m doc collection. Documentation at (https://docs.mongodb.com/manual/reference/method/db.collection.deleteMany)
db.collection.deleteMany(
<filter>,
{
writeConcern: <document>,
collation: <document>
}
)
I would recommend paging if large number of records.
First: Get the count of data you want to delete:
-------------------------- COUNT --------------------------
var query= {"FEILD":"XYZ", 'DATE': {$lt:new ISODate("2019-11-10")}};
db.COL.aggregate([
{$match:query},
{$count: "all"}
])
Second: Start deleting chunk by chunk:
-------------------------- DELETE --------------------------
var query= {"FEILD":"XYZ", 'date': {$lt:new ISODate("2019-11-10")}};
var cursor = db.COL.aggregate([
{$match:query},
{ $limit : 5 }
])
cursor.forEach(function (doc){
db.COL.remove({"_id": doc._id});
});
and this should be faster:
var query={"FEILD":"XYZ", 'date': {$lt:new ISODate("2019-11-10")}};
var ids = db.COL.find(query, {_id: 1}).limit(5);
db.tags.deleteMany({"_id": { "$in": ids.map(r => r._id)}});
Run this query in cmd
db.users.remove( {"_id": ObjectId("5a5f1c472ce1070e11fde4af")});
If you are using node.js write this code
User.remove({ _id: req.body.id },, function(err){...});

Limit 1 should be used with queryRow in Yii?

I am usually using queryRow to get a single record. eg:-
$lastReport = Yii::app()->db->createCommand(
'SELECT * FROM report ORDER BY created DESC'
)->queryRow();
I looked the MySQL log to know which query is used for it.
SELECT tbl_report.* FROM report ORDER BY created DESC
It seems that Yii is retrieving all the records from the table and return the first record.
So I think we should use LIMIT 1 whenever we are using queryRow. eg:-
$lastReport = Yii::app()->db->createCommand(
'SELECT * FROM report ORDER BY created DESC LIMIT 1'
)->queryRow();
Since the queryRow is returning "the first row (in terms of an array) of the query result", Yii should automatically add the limit. otherwise user will use this query to get a single record and that will cause to performance degradation.
Is my understanding is correct or I missed something?
Yii should not add limit 1 to query, because queryRow is designed to get results row by row, for example in while. Yii has limited functionality with raw SQL code, but Query Builder is available:
$user = Yii::app()->db->createCommand()
->select('id, username, profile')
->from('tbl_user u')
->join('tbl_profile p', 'u.id=p.user_id')
->where('id=:id', array(':id'=>$id))
->queryRow();
More information available here: http://www.yiiframework.com/doc/guide/1.1/en/database.query-builder
You should rather use ActiveRecordClassName::model()->findByPk($id); or ActiveRecordClassName::model()->find($criteria), because it uses defaultScope() and other improvements.

How to get Google CSE RESTFull API Result's Next Page & confusion on daily request limit?

I am using Google CSE Restlful API. And my code to get results is
Google.Apis.Customsearch.v1.CseResource.ListRequest listRequest = svc.Cse.List(query);
listRequest.Cx = cx;
Google.Apis.Customsearch.v1.Data.Search search = listRequest.Fetch();
foreach (Google.Apis.Customsearch.v1.Data.Result result in search.Items)
{
//do something with items
}
It returns me 10 results out of total 100 . To see results of next 10 records I have to
listRequest.Start = 11;
search = listRequest.Fetch();
And now I my 'search.Items' have results from 11-20 .
Now I have 2 questions:
1- Is it right way to get the results of next page ( next 10 records) ?
2- And doing so would it mean that I have consumed 2 request out of 100 allowed requests per day ?
If this is correct then effectively user can only get total of 1000 results per day from Google CSE API.
So it means if I have to see all 100 results of my first query I would have to make 10 requests.
Thanks,
Wasim
Yes it's the right way: setting the start parameter to the next index will request the next paginated results from your query.
You are right also on the second question, each request (paginated or non paginated) is counted between the max of 100 allowed per day, resulting of a total of 1000 max results per day.

Total count for a limited result

So i implemented the paging for dojo.store.jsonRest to use as store in the dojox.grid.DataGrid. In the server im using Symfony 2 and as ORM Doctrine, im new to this two frameworks.
For Dojo jsonRest the response of the server must have a header Content-Range containing the result offset, limit and the total number of records (without the limit).
So for a response with a Content-Range: items 0-24/66 header, if the user where to scroll the grid records to the 24 row, it will make a async request with Range: 24-66 header, then the response header should have a Content-Range: items 24-66/66. This is done so Dojo can know how many request it can make for the paginated data and the records range for the presented and subsequent request.
So my problem is that to get the total number of records without the limit, i had to make a COUNT query using the same query that has the offset and limit. I don't like this.
I want to know if there is a way i can get the total count and the limited result without making two queries.
public function getByTextCount($text)
{
$dql = "SELECT COUNT(s.id) FROM Bundle:Something s WHERE s.text LIKE :text";
$query = $this->getEntityManager()->createQuery($dql);
$query->setParameter('text', '%'.$text.'%');
return $query->getSingleScalarResult();
}
-
public function getByText($text, $offset=0, $limit=24)
{
$dql = "SELECT r FROM Bundle:Something s WHERE s.text LIKE :text";
$query = $this->getEntityManager()->createQuery($dql);
$query->setParameter('text', '%'.$text.'%');
$query->setFirstResult($offset);
$query->setMaxResults($limit);
return $query->getArrayResult();
}
If you're using MySQL, you can do a SELECT FOUND_ROWS().
From the documentation.
A SELECT statement may include a LIMIT clause to restrict the number
of rows the server returns to the client. In some cases, it is
desirable to know how many rows the statement would have returned
without the LIMIT, but without running the statement again. To obtain
this row count, include a SQL_CALC_FOUND_ROWS option in the SELECT
statement, and then invoke FOUND_ROWS() afterward:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
If you want to use Doctrine only (i.e. to avoid vendor-specific SQL) you can always reset part of the query after you have selected the entities:
// $qb is a Doctrine Query Builder
// $query is the actual DQL query returned from $qb->getQuery()
// and then updated with the ->setFirstResult(OFFSET) and ->setMaxResults(LIMIT)
// Get the entities as an array ready for JSON serialization
$entities = $query->getArrayResult();
// Reset the query and get the total records ready for the Range header
// 'e' in count(e) is the alias for the entity specified in the Query Builder
$count = $qb->resetDQLPart('orderBy')
->select('COUNT(e)')
->getQuery()
->getSingleScalarResult();