eBay Fulfillment API getOrders returns "total" 1 larger than the size of the orders array - ebay-api

I'm using the eBay Fulfillment API to retrieve orders in JSON format. I've been running into an issue with the "total" field included in the response: it's consistently 1 too large when my date range includes a specific time (around 10:07 UTC today).
Here's a PHP/Xdebug var_dump of a json_decode'd example response, trimmed of the actual (irrelevant) order data. Note the difference between 'total' and the array size:
object(stdClass)[22]
  public 'href' => string 'https://api.ebay.com/sell/fulfillment/v1/order?filter=lastmodifieddate:%5B2020-10-12T14:44:56.000Z..%5D,orderfulfillmentstatus:%7BIN_PROGRESS%7CNOT_STARTED%7D&limit=200&offset=0' (length=177)
  public 'total' => int 177
  public 'limit' => int 200
  public 'offset' => int 0
  public 'orders' =>
    array (size=176)
      ...
I've tried using the defaults for limit and offset; in that case the last page has one order too few (offset + size(array) == total - 1). I've also run this quite a few times with different starting dates to make sure it's not a weird timing issue. As long as the lastmodifieddate is more recent than 10:08 UTC today, the numbers match; otherwise they don't.
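For reference, here's a minimal sketch of the kind of check described above (TypeScript, assuming Node 18+ with a global fetch and a valid OAuth token; this is not official eBay SDK code): page through with limit/offset and compare the accumulated order count against "total".

// Sketch: page through getOrders and compare the number of orders actually
// returned with the reported "total". Error handling and token refresh omitted.
const BASE = 'https://api.ebay.com/sell/fulfillment/v1/order';
const FILTER = 'lastmodifieddate:%5B2020-10-12T14:44:56.000Z..%5D,orderfulfillmentstatus:%7BIN_PROGRESS%7CNOT_STARTED%7D';

async function checkTotals(accessToken: string): Promise<void> {
  const limit = 200;
  let offset = 0;
  let fetched = 0;
  let total = 0;

  do {
    const url = `${BASE}?filter=${FILTER}&limit=${limit}&offset=${offset}`;
    const res = await fetch(url, { headers: { Authorization: `Bearer ${accessToken}` } });
    const page = await res.json();
    total = page.total;
    fetched += page.orders?.length ?? 0;
    offset += limit;
  } while (offset < total);

  // With a start date before ~10:08 UTC this prints e.g. "fetched 176 of total 177"
  console.log(`fetched ${fetched} of total ${total}`);
}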
Is this just a bug in the eBay API, or is there some deeper meaning to it? I would expect this number to be the total number of orders matched by my current query (disregarding limit and offset), but this inconsistency makes me suspect it counts orders that aren't actually included in the response (such as fulfilled or unpaid orders).
Weirdly enough, the response actually includes a "next" field when I tweak the limit and offset to fit the actual array size exactly, and using that field's value to retrieve the "next page" (which should contain the last order) results in this:
object(stdClass)[22]
  public 'href' => string 'https://api.ebay.com/sell/fulfillment/v1/order?filter=lastmodifieddate:%5B2020-10-13T09:29:03.000Z..%5D,orderfulfillmentstatus:%7BIN_PROGRESS%7CNOT_STARTED%7D&limit=78&offset=78' (length=177)
  public 'total' => int 79
  public 'prev' => string 'https://api.ebay.com/sell/fulfillment/v1/order?filter=lastmodifieddate:%5B2020-10-13T09:29:03.000Z..%5D,orderfulfillmentstatus:%7BIN_PROGRESS%7CNOT_STARTED%7D&limit=78&offset=0' (length=176)
  public 'limit' => int 78
  public 'offset' => int 78
  public 'orders' =>
    array (size=0)
      empty
Is there any explanation for this behaviour?
Side note: I haven't tagged the question as php because the issue is with the json response I get from the REST API, irrespective of the language I use to process it.

Related

Streaming query with mssql and node, very slow the first time

I am using Node.js 10.16.0 and the node-mssql module to connect to a DB. The connection works fine and my simple queries work fine.
If I try to stream data from a query, using the node-mssql example, the first time I execute it it's very slow. It doesn't throw a timeout error, but it takes about a minute or more to complete.
According to the console log, it brings in the first 55 rows and then stops for a while. It looks like it takes some time between the "sets" of data as I divide them, according to my code below. If I execute the same query a second or third time, it takes only a second to complete. The total number of rows is about 25,000 or more.
How can I make my stream queries faster, at least the first time?
Here is my code. Following the example, the idea is: start streaming, collect 1,000 rows, pause the stream, process those rows, send them back over WebSockets, empty all arrays, then resume streaming until done.
let skate = [];
let leather = [];
let waterproof = [];
// accumulators for the current batch, the processed output and the row counter
let rowsToProcess = [];
let measurementsData = [];
let rowc = 0;
let stream_start = new Date();

const request = new sql.Request(pool);
request.stream = true; // enable streaming so rows are emitted one by one

request
    .input('id_param', sql.Int, parseInt(id))
    .input('start_date_param', sql.VarChar(50), startDate)
    .input('stop_date_param', sql.VarChar(50), stopDate)
    .query('SELECT skate, leather, waterproof FROM shoes WHERE id = @id_param AND CAST(startTime AS date) BETWEEN @start_date_param AND @stop_date_param');

request.on('row', row => {
    rowc++; console.log(rowc);
    rowsToProcess.push(row);
    if (rowsToProcess.length >= 1000) {
        // pause the stream while the current batch is processed
        request.pause();
        processRows();
    }
});

const processRows = () => {
    rowsToProcess.forEach((item, index) => {
        skate.push(item.skate);
        leather.push(item.leather);
        waterproof.push(item.waterproof);
    });
    measurementsData.push(
        { title: 'Skate shoes', data: skate },
        { title: 'Leather shoes', data: leather },
        { title: 'Waterproof shoes', data: waterproof }
    );
    console.log('another processRows done');
    //ws.send(JSON.stringify({ message: measurementsData }));
    rowsToProcess = [];
    skate = [];
    leather = [];
    waterproof = [];
    measurementsData = [];
    request.resume(); // continue streaming the next batch
};

request.on('done', () => {
    console.log('rowc , ', rowc);
    console.log('stream start , ', stream_start);
    console.log('stream done , ', new Date());
    processRows(); // flush the remaining (< 1000) rows
});
I would try to improve the indexing of the shoes table. From what I see, there are two possible issues with your query/indexing:
You filter on the datetime column startTime, but there is an index only on the id column (according to the comments).
You cast the datetime to date within the WHERE clause of the query.
Indexes
Since you're filtering only on the date, without the time part, I'd suggest creating a new column startDate that holds startTime converted to date, creating an index on it, and then using this indexed column in the query.
Also, since you select only the skate, leather and waterproof columns, including them in the index could give better performance. Read about indexes with included columns.
If you are always selecting data that is newer or older than a certain date, you may also look into filtered indexes.
Avoid CAST in WHERE
Even though CAST is generally cheap, using it within the WHERE clause can keep SQL Server from making efficient use of the indexes, so you should avoid it.
If you create a new column with just the date part and index it as described above, you don't need the CAST here:
WHERE id = @id_param AND startDate BETWEEN @start_date_param AND @stop_date_param
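If it helps, here's a hedged sketch of that one-time setup, run through the same node-mssql pool as in the question (TypeScript; the computed column name startDate, the index name, and the included columns are assumptions, not something taken from the question's schema):

import * as sql from 'mssql';

// Sketch: add a persisted computed date column and index it, so the WHERE
// clause can filter on an indexed date value without wrapping startTime in CAST().
async function createDateIndex(pool: sql.ConnectionPool): Promise<void> {
  // Two separate batches, because the index references the column created in the first one.
  await pool.request().batch(
    'ALTER TABLE shoes ADD startDate AS CAST(startTime AS date) PERSISTED'
  );
  await pool.request().batch(
    'CREATE INDEX IX_shoes_id_startDate ON shoes (id, startDate) INCLUDE (skate, leather, waterproof)'
  );
}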
When a query runs slow the first time but fast in subsequent executions, as someone suggested earlier, it's generally due to caching. The performance is quite likely related to the storage device that the database is operating on.
I expect the execution plan does not change between executions.
You should remove the CAST in the WHERE clause, or create an index on a computed column (if that's possible in your DB).
Operations on a column in the WHERE clause can hurt your query, so avoid them if possible.
Instead, try setting your WHERE parameters to cover the whole days:
@start_date_param to yyyy-mm-dd 00:00:00
@stop_date_param to yyyy-mm-dd 23:59:59
AND startTime BETWEEN @start_date_param AND @stop_date_param
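A minimal sketch of that suggestion with node-mssql (TypeScript; the pool is assumed to be already connected, and sql.DateTime replaces the VarChar parameters from the question):

import * as sql from 'mssql';

// Pass datetime boundaries covering the whole days so the query can compare
// against startTime directly, without wrapping the column in CAST().
async function queryShoes(pool: sql.ConnectionPool, id: number, startDate: string, stopDate: string) {
  const request = new sql.Request(pool);
  return request
    .input('id_param', sql.Int, id)
    .input('start_date_param', sql.DateTime, new Date(`${startDate}T00:00:00`))
    .input('stop_date_param', sql.DateTime, new Date(`${stopDate}T23:59:59.997`))
    .query('SELECT skate, leather, waterproof FROM shoes ' +
           'WHERE id = @id_param AND startTime BETWEEN @start_date_param AND @stop_date_param');
}

If the column has higher precision than datetime, a half-open range (startTime >= @start_date_param AND startTime < the start of the next day) avoids the 23:59:59.997 edge case.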

Results DataSet from DynamoDB Query using GSI is not returning correct results

I have a DynamoDB table where I am currently storing all the events that happen in my system, for every product. On the main table, the primary key's hash key is a combination of productid, eventtype and eventcategory, and the sort key is CreationTime. The table was created and data was added to it.
Later I added a new GSI to the table, with SecondaryHash (the combination of eventcategory and eventtype, excluding productid) as the hash key and CreationTime as the sort key. This was added so that I can query for multiple products at once.
The GSI seems to work fine; however, I only later realized that the data being returned is incorrect.
Here is the scenario. (I am running all these queries against the newly created index)
I queried for products within the last 30 days and the query returned 312 records. However, when I run the same query for the last 90 days, it returns only 128 records (which is wrong; it should be at least equal to or greater than the 30-day count).
I already have pagination logic embedded in my code, so that LastEvaluatedKey is checked every time in order to loop and fetch the next set of records; after the loop, all the results are combined.
Not sure if I am missing something.
Any suggestions would be appreciated.
var limitPtr *int64
if limit > 0 {
    limit64 := int64(limit)
    limitPtr = &limit64
}

input := dynamodb.QueryInput{
    ExpressionAttributeNames: map[string]*string{
        "#sch": aws.String("SecondaryHash"),
        "#pkr": aws.String("CreationTime"),
    },
    ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
        ":sch": {
            S: aws.String(eventHash),
        },
        ":pkr1": {
            N: aws.String(strconv.FormatInt(startTime, 10)),
        },
        ":pkr2": {
            N: aws.String(strconv.FormatInt(endTime, 10)),
        },
    },
    KeyConditionExpression: aws.String("#sch = :sch AND #pkr BETWEEN :pkr1 AND :pkr2"),
    ScanIndexForward:       &scanForward,
    Limit:                  limitPtr,
    TableName:              aws.String(ddbTableName),
    IndexName:              aws.String(ddbIndexName),
}
You reached the maximum amount of data a single Query call will evaluate (not necessarily the number of matching items); the limit is 1 MB of data read per call.
The response will contain a LastEvaluatedKey parameter; it is the key of the last item evaluated. You have to perform a new query with an extra ExclusiveStartKey parameter (ExclusiveStartKey should be set to LastEvaluatedKey's value).
When LastEvaluatedKey is no longer returned, you have reached the end of the results.
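For illustration, here's a minimal sketch of that pagination loop, shown in TypeScript with the AWS SDK for JavaScript v3 rather than the Go SDK used in the question; the table name, index name, and the default client configuration are placeholders:

import { AttributeValue, DynamoDBClient, QueryCommand } from "@aws-sdk/client-dynamodb";

// Sketch: keep querying with ExclusiveStartKey until LastEvaluatedKey is no
// longer returned, accumulating the items from every page.
async function queryAllEvents(eventHash: string, startTime: number, endTime: number) {
  const client = new DynamoDBClient({});
  const items: Record<string, AttributeValue>[] = [];
  let exclusiveStartKey: Record<string, AttributeValue> | undefined;

  do {
    const out = await client.send(new QueryCommand({
      TableName: "events",              // placeholder table name
      IndexName: "SecondaryHash-index", // placeholder GSI name
      KeyConditionExpression: "#sch = :sch AND #pkr BETWEEN :pkr1 AND :pkr2",
      ExpressionAttributeNames: { "#sch": "SecondaryHash", "#pkr": "CreationTime" },
      ExpressionAttributeValues: {
        ":sch": { S: eventHash },
        ":pkr1": { N: String(startTime) },
        ":pkr2": { N: String(endTime) },
      },
      ExclusiveStartKey: exclusiveStartKey, // undefined on the first call
    }));
    items.push(...(out.Items ?? []));
    exclusiveStartKey = out.LastEvaluatedKey; // present only while more data remains
  } while (exclusiveStartKey);

  return items;
}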

search within an array with a condition

I have two arrays I'm trying to compare at many levels. Both have the same structure, with 3 "columns".
The first column contains the polygon's ID, the second an area type, and the third the percentage of that area type for the polygon.
So, for many rows, it will compare, for example: ID: 1, Type: aaa, %: 100.
But for some elements, I have many rows for the same ID. For example, I'll have ID 2, Type aaa, 25% --- ID 2, type bbb, 25% --- ID 2, type ccc, 50%. And in the second array, I'll have ID 2, Type aaa, 25% --- ID 2, type bbb, 10% --- ID 2, type eee, 38% --- ID 2, type fff, 27%.
So, my function has to compare these two arrays and send me an email if there are differences.
(I won't show you the real code because it is 811 lines long.) The first "If" condition is:
If array1.id = array2.id Then
    If array1.type = array2.type Then
        If array1.percent = array2.percent Then
            zone_verification = True
        Else
            zone_verification = False
The problem is that there are more than 50,000 rows in each array. So when I run the function, for each array1.id it searches through the 50,000 rows of array2. 50,000 searches over 50,000 rows... it takes very long to run!
I'm looking for a way to make it run faster by making the search more specific. Example: I have many rows with id 2 in array1. If there are rows with id 2 in array2, find them and push them into a "sub array" (or something like that), and search only within those specific rows. That way I'd have just X rows of array1 to compare with X rows of array2, not with 50,000. And when every "id 2" in array1 is done, do the same thing for "id 4", then "id 5"...
Hope it's clear. It's almost the first time I've used VB.NET, and I have this big function to get running.
Thanks
EDIT
Here's what I want to do.
I have two different layers in a geospatial database. Both layers have the same structure. They are a "spatial join" of the land parcels (55,000) and the land use layer. The first layer is the current one, and the second layer is the one we'll use after 2015.
So, for each land parcel, I have the percentage of each land use. For a land parcel (ID 7580-80-2532) I can have 50% farming use (TYPE FAR-23) and 50% residential use (RES-112). In the first array, I'll have 2 rows with the same ID (7580-80-2532), but each one will have a different type (FAR-23, RES-112) and a different %.
In the second layer, the municipal zoning (land use) has changed. So the same land parcel will now be 40% residential use (RES-112), 20% commercial (COM-54) and 40% of a new farming use (FAR-33).
So, I want to know if there are differences. Some land parcels will be exactly the same. Some parcels will keep the same land use types, but not the same percentage of each. And for some land parcels, there will be more or fewer land use types, with different percentages of each.
I want this script to compare these two layers and send me an email when there are differences between them for the same land parcel ID.
The script is already working, but it takes too much time.
The problem is, I think, that the script goes through all of array2 for each row in array1.
What I want is, when there is more than one row with the same ID in array1, to take only that ID in both arrays.
Maybe if I order them by ID, I could write a condition: something like "once you've found what you're looking for, stop searching as soon as you hit a different value"?
It's hard to explain clearly because I've only been using VB since last week, and English isn't my first language! ;)
If you just want to find out if there are any differences between the first and second array, you could do:
Dim diff = New HashSet(Of Polygon)(array1)
diff.SymmetricExceptWith(array2)
diff will contain any Polygon which is unique to array1 or array2. If you want to do other types of comparisons, maybe you should explain what you're trying to do exactly.
UPDATE:
You could use grouping and lookups like this:
'Create lookup with first array, for fast access by ID
Dim lookupByID = array1.ToLookup(Function(p) p.id)

'Loop through each group of items with same ID in array2
For Each secondArrayValues In array2.GroupBy(Function(p) p.id)
    Dim currentID As Integer = secondArrayValues.Key 'Current ID is the grouping key

    'Retrieve values with same ID in array1
    'Use a hashset to easily compare for equality
    Dim firstArrayValues As New HashSet(Of Polygon)(lookupByID(currentID))

    'Check for differences between the two sets of data, for this ID
    If Not firstArrayValues.SetEquals(secondArrayValues) Then
        'Data has changed, do something
        Console.WriteLine("Differences for ID " & currentID)
    End If
Next
I am answering this question based on the first part you wrote (that is, without the EDIT section). The correct answer should explain a good algorithm, but I suggest using DB capabilities, because databases have optimized many queries of this kind.
Put all the records into two DB tables - O(n) time. If the records are static you don't need to perform this step every time.
Table 1
id type percent
Table 2
id type percent
Then use a DB query, something like this:
select count(*) from table1 t1, table2 t2 where t1.id!=t2.id and t1.type!=t2.type
(You can use better queries; the point is to let the DB perform this operation.)
Then retrieve the result in your code and perform the necessary operations.
EDIT
1) You can sort both arrays in O(n log n) time based on ID + type + percent and then perform a binary search.
2) Store the first array's records in a hash map with an appropriate key (ID only, or ID + type); building it takes O(n) time, and lookups, if the key is chosen correctly, take constant time (see the sketch below).
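A minimal sketch of that hash-map idea, written in TypeScript rather than VB.NET (the Row shape and the "type|percent" key are assumptions that mirror the three columns described in the question):

type Row = { id: string; type: string; percent: number };

// Group rows by ID; each ID maps to the set of "type|percent" strings it contains.
function groupById(rows: Row[]): Map<string, Set<string>> {
  const map = new Map<string, Set<string>>();
  for (const r of rows) {
    if (!map.has(r.id)) map.set(r.id, new Set());
    map.get(r.id)!.add(`${r.type}|${r.percent}`);
  }
  return map;
}

// Return the IDs whose set of (type, percent) pairs differs between the two arrays.
function findDifferences(array1: Row[], array2: Row[]): string[] {
  const map1 = groupById(array1);
  const map2 = groupById(array2);
  const changed: string[] = [];
  for (const id of new Set([...map1.keys(), ...map2.keys()])) {
    const s1 = map1.get(id) ?? new Set<string>();
    const s2 = map2.get(id) ?? new Set<string>();
    const same = s1.size === s2.size && [...s1].every(v => s2.has(v));
    if (!same) changed.push(id);
  }
  return changed;
}

Building the two maps is O(n), and each per-ID comparison only touches the handful of rows sharing that ID, instead of scanning all 50,000 rows again.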
You need to define a structure to store this data. We'll store all the data in a LandParcel class, which will have a HashSet<ParcelData>:
public class ParcelData
{
    public ParcelType Type { get; set; } // This can be an enum, string, etc.
    public int Percent { get; set; }
    // Redefine Equals and GetHashCode conveniently
}

public class LandParcel
{
    public ID Id { get; set; } // Whatever the type of the ID is...
    public HashSet<ParcelData> Data { get; set; }
}
Now you have to build your data structure, with something like this:
Dictionary<ID, LandParcel> data1 = new Dictionary<ID, LandParcel>();
foreach (var item in array1)
{
    LandParcel p;
    if (!data1.TryGetValue(item.id, out p))
        data1[item.id] = p = new LandParcel(item.id); // assumes a constructor that sets Id and initializes Data
    // Can this data be repeated?
    p.Data.Add(new ParcelData(item.type, item.percent));
}
You do the same with a data2 dictionary for the second array. Now you iterate over all the items in data2 and compare each one with the item that has the same id in data1.
foreach (var parcel2 in data2.Values)
{
    var parcel1 = data1[parcel2.Id]; // Beware with exceptions here !!!
    if (!parcel1.Data.SetEquals(parcel2.Data))
    {
        // You have different parcels
    }
}
(Now that I look at it, we are practically doing a small database query here, kind of smelly code ...)
Sorry for the C# code, since I don't feel very comfortable with VB, but it should be fairly straightforward to translate.

Total count for a limited result

So I implemented paging for dojo.store.jsonRest to use it as the store in a dojox.grid.DataGrid. On the server I'm using Symfony 2 with Doctrine as the ORM; I'm new to these two frameworks.
For Dojo jsonRest, the server response must have a Content-Range header containing the result offset, limit and the total number of records (without the limit).
So for a response with a Content-Range: items 0-24/66 header, if the user were to scroll the grid down to row 24, it would make an async request with a Range: 24-66 header, and the response should then have a Content-Range: items 24-66/66 header. This is done so Dojo can know how many requests it can make for the paginated data, and the record range for the current and subsequent requests.
My problem is that, to get the total number of records without the limit, I had to make a separate COUNT query built from the same query that has the offset and limit. I don't like this.
I want to know if there is a way to get the total count and the limited result without making two queries.
public function getByTextCount($text)
{
    $dql = "SELECT COUNT(s.id) FROM Bundle:Something s WHERE s.text LIKE :text";
    $query = $this->getEntityManager()->createQuery($dql);
    $query->setParameter('text', '%'.$text.'%');
    return $query->getSingleScalarResult();
}
-
public function getByText($text, $offset=0, $limit=24)
{
    $dql = "SELECT s FROM Bundle:Something s WHERE s.text LIKE :text";
    $query = $this->getEntityManager()->createQuery($dql);
    $query->setParameter('text', '%'.$text.'%');
    $query->setFirstResult($offset);
    $query->setMaxResults($limit);
    return $query->getArrayResult();
}
If you're using MySQL, you can do a SELECT FOUND_ROWS().
From the documentation:
A SELECT statement may include a LIMIT clause to restrict the number of rows the server returns to the client. In some cases, it is desirable to know how many rows the statement would have returned without the LIMIT, but without running the statement again. To obtain this row count, include a SQL_CALC_FOUND_ROWS option in the SELECT statement, and then invoke FOUND_ROWS() afterward:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
If you want to use Doctrine only (i.e. to avoid vendor-specific SQL) you can always reset part of the query after you have selected the entities:
// $qb is a Doctrine Query Builder
// $query is the actual DQL query returned from $qb->getQuery()
// and then updated with the ->setFirstResult(OFFSET) and ->setMaxResults(LIMIT)
// Get the entities as an array ready for JSON serialization
$entities = $query->getArrayResult();
// Reset the query and get the total records ready for the Range header
// 'e' in count(e) is the alias for the entity specified in the Query Builder
$count = $qb->resetDQLPart('orderBy')
            ->select('COUNT(e)')
            ->getQuery()
            ->getSingleScalarResult();

lucene group by

Hi, I have indexed simple documents with 2 fields:
1. profileId as long
2. profileAttribute as long
I need to know how many profileIds have a certain set of attributes.
For example, I index:
doc1: profileId:1 , profileAttribute = 55
doc2: profileId:1 , profileAttribute = 57
doc3: profileId:2 , profileAttribute = 55
and I want to know how many profiles have both attribute 55 and attribute 57.
In this example the answer is 1, because only profile id 1 has both attributes.
Thanks in advance for your help.
You can search for profileAttribute:(57 OR 55), then iterate over the results, group the matching attribute values by their profileId property, and count the profileIds that end up with both attributes (a plain set of profileIds would also count profiles that have only one of them).
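For illustration, here's a sketch of that counting step, shown in TypeScript over an in-memory list of hits rather than through the Lucene API (the Hit shape simply mirrors the two stored fields from the question):

type Hit = { profileId: number; profileAttribute: number };

// Count the profiles whose hits cover every attribute in `wanted`.
function countProfilesWithAll(hits: Hit[], wanted: number[]): number {
  const byProfile = new Map<number, Set<number>>();
  for (const h of hits) {
    if (!byProfile.has(h.profileId)) byProfile.set(h.profileId, new Set());
    byProfile.get(h.profileId)!.add(h.profileAttribute);
  }
  let count = 0;
  for (const attrs of byProfile.values()) {
    if (wanted.every(a => attrs.has(a))) count++;
  }
  return count;
}

// With the example docs from the question:
// countProfilesWithAll(
//   [{ profileId: 1, profileAttribute: 55 },
//    { profileId: 1, profileAttribute: 57 },
//    { profileId: 2, profileAttribute: 55 }],
//   [55, 57]) // => 1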
But you need to know that Lucene will perform poorly at this compared to, say, an RDBMS. This is because Lucene is an inverted index, meaning it is very good at retrieving the top documents that match a query; it is, however, not very good at iterating over the stored fields of a large number of documents.
However, if profileId is single-valued and indexed, you can get its values using Lucene's FieldCache, which avoids costly disk accesses. The drawback is that this field cache will use a lot of memory (depending on the size of your index) and takes time to load every time you (re-)open your index.
If changing the index format is acceptable, this solution can be improved by making profileIds unique; your index would then have the following format:
doc1: profileId: [1], profileAttribute: [55, 57]
doc2: profileId: [2], profileAttribute: [55]
The difference is that profileIds are unique and profileAttribute is now a multi-valued field. To count the number of profileIds for a given set of profileAttribute values, you now only need to query with all the required attribute values as mandatory clauses (e.g. +profileAttribute:55 +profileAttribute:57) and use a TotalHitCountCollector.