I have a Cosmos DB database/collection with the partition key on /id and spatial indexing enabled using the Geography configuration. When I query for objects with a LineString property within a given LineString or Polygon, the query retrieves every document in the collection before returning the ones that are within the LineString/Polygon (the retrieved count is greater than the output count). The RUs consumed grow as the number of items in the collection grows, which tells me it is essentially doing a scan and the index is not being used.
CosmosDB documentation states the following:
Azure Cosmos DB supports indexing of Points, LineStrings, Polygons, and MultiPolygons
However, the documentation has no examples that use anything other than the Point type, and I have been unable to hit the index with any permutation of queries that use exclusively non-Point types.
To confirm that spatial indexing is working at all, the item has an additional Start property whose value is the first Point of the LineString, and I can query whether this Point is within the Polygon at a constant RU cost.
Here is the index:
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ],
    "spatialIndexes": [
        {
            "path": "/*",
            "types": [
                "Point",
                "LineString",
                "Polygon",
                "MultiPolygon"
            ]
        }
    ]
}
Here is the needle. The haystack is about 1,000 objects with random LineStrings.
{
    "id": "test",
    "Start": {
        "type": "Point",
        "coordinates": [ 1, 3 ]
    },
    "Points": {
        "type": "LineString",
        "coordinates": [ [ 1, 3 ], [ 1, 4 ], [ 1, 5 ] ]
    }
}
Here is the search within a Polygon:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Points, {
"type":"Polygon",
"coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
})
---
Request Charge: 127.4 RUs
Retrieved document count: 992
Retrieved document size: 1219980 bytes
Output document count: 1
Output document size: 441 bytes
Index hit document count: 0
Index lookup time: 3.77 ms
Here is the search within a LineString:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Points, {
"type":"LineString",
"coordinates": [[1, 3], [1, 4], [1, 5]]
})
---
Request Charge: 122.53 RUs
Retrieved document count: 992
Retrieved document size: 1219980 bytes
Output document count: 1
Output document size: 441 bytes
Index hit document count: 0
Index lookup time: 3.0100000000000002 ms
Here is the search for a Start within the same Polygon as above, showing that spatial indexing is enabled and working:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Start, {
"type":"Polygon",
"coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
})
---
Request Charge: 8.1 RUs
Retrieved document count: 1
Retrieved document size: 343 bytes
Output document count: 1
Output document size: 392 bytes
Index hit document count: 1
Index lookup time: 2.79 ms
I created a container and added your sample document, but my results differ from yours.
First sql result:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Points, {
"type":"Polygon",
"coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
})
---
Request Charge: 10.53 RUs
Retrieved document count: 1
Retrieved document size: 349 bytes
Output document count: 1
Output document size: 398 bytes
Index hit document count: 1
Index lookup time: 1.6800000000000002 ms
Second sql result:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Points, {
"type":"LineString",
"coordinates": [[1, 3], [1, 4], [1, 5]]
})
---
Request Charge: 7.24 RUs
Retrieved document count: 1
Retrieved document size: 349 bytes
Output document count: 1
Output document size: 398 bytes
Index hit document count: 1
Index lookup time: 1.1399000000000001 ms
Third sql result:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Start, {
"type":"Polygon",
"coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
})
---
Request Charge: 10.53 RUs
Retrieved document count: 1
Retrieved document size: 349 bytes
Output document count: 1
Output document size: 398 bytes
Index hit document count: 1
Index lookup time: 1.6500000000000001 ms
According to my test, each SQL query hits the index.
By the way, my indexing policy is the same as yours and the geospatial configuration is Geography. Please try again, and if you still get the results you showed above, share more detail, such as the SDK you use or the full document (I tested this in the Azure portal).
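If you are not testing in the portal, it is also worth reading the container's properties back from code to confirm the indexing policy and geospatial configuration actually in effect. Below is a minimal sketch using the Python SDK (azure-cosmos); the account URL, key, and database/container names are placeholders, not values from the question:
from azure.cosmos import CosmosClient

# Sketch only: verify the effective indexing policy and geospatial configuration.
# Endpoint, key, and database/container names below are placeholders.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("<container>")

props = container.read()  # container metadata as a dict
print(props.get("geospatialConfig"))                          # expected: {'type': 'Geography'}
print(props.get("indexingPolicy", {}).get("spatialIndexes"))  # expected: the spatial index paths/types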
Hello, I have followed all the steps to run inference and it works successfully with the model from this link: https://pjreddie.com/media/files/yolov3.weights
But when I tried it on a model I trained with Darknet, I get this error:
[ INFO ] Creating Inference Engine...
[ INFO ] Loading network files:
newyolo.xml
newyolo.bin
[ INFO ] Preparing inputs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference...
To close the application, press 'CTRL+C' here or switch to the output window and press ESC key
To switch between sync/async modes, press TAB key in the output window
yolo_original.py:280: DeprecationWarning: shape property of IENetLayer is deprecated. Please use shape property of DataPtr instead objects returned by in_data or out_data property to access shape of input or output data on corresponding ports
out_blob = out_blob.reshape(net.layers[net.layers[layer_name].parents[0]].shape)
[ INFO ] Layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion parameters:
[ INFO ] classes : 10
[ INFO ] num : 3
[ INFO ] coords : 4
[ INFO ] anchors : [55.0, 56.0, 42.0, 87.0, 68.0, 81.0]
Traceback (most recent call last):
File "yolo_original.py", line 363, in <module>
sys.exit(main() or 0)
File "yolo_original.py", line 286, in main
args.prob_threshold)
File "yolo_original.py", line 153, in parse_yolo_region
h_scale=orig_im_h, w_scale=orig_im_w))
File "yolo_original.py", line 99, in scale_bbox
xmin = int((x - w / 2) * w_scale)
ValueError: cannot convert float NaN to integer
Note that I have provided the right input shape and changed yolo_v3.json to match my model.
Here is the content of my yolo_v3.json:
[
    {
        "id": "TFYOLOV3",
        "match_kind": "general",
        "custom_attributes": {
            "classes": 10,
            "anchors": [18, 22, 31, 33, 33, 50, 55, 56, 42, 87, 68, 81, 111, 98, 73, 158, 156, 202],
            "coords": 4,
            "num": 9,
            "masks": [[6, 7, 8], [3, 4, 5], [0, 1, 2]],
            "entry_points": ["detector/yolo-v3/Reshape", "detector/yolo-v3/Reshape_4", "detector/yolo-v3/Reshape_8"]
        }
    }
]
I have tried multiple things to debug this, such as not providing the JSON file, etc.
PS: yolo_original.py is the same demo that ships with OpenVINO, just renamed.
I'm using OpenVINO version 2020.1.
Converting the NaN values to floats or skipping the values containing NaN didn't solve the problem.
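One way to narrow this down is to check whether the NaN values are already present in the raw output blobs before parse_yolo_region runs, i.e. whether the model itself produces them or the post-processing does. A minimal sketch (report_nans is a hypothetical helper; out_blob stands for the array the demo obtains for each YOLO output layer):
import numpy as np

# Sketch only: report NaNs in a raw output blob before any box decoding happens.
def report_nans(layer_name, out_blob):
    blob = np.asarray(out_blob)
    nan_count = np.count_nonzero(np.isnan(blob))
    print("%s: shape=%s NaNs=%d min=%s max=%s"
          % (layer_name, blob.shape, nan_count, np.nanmin(blob), np.nanmax(blob)))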
Similar questions asked here before:
Count items for a single key: jq count the number of items in json by a specific key
Calculate the sum of object values:
How do I sum the values in an array of maps in jq?
Question
How can one emulate the COUNT aggregate function so that it behaves like its SQL original? Let's extend this question even further to include other regular SQL functions:
COUNT
SUM / MAX / MIN / AVG
ARRAY_AGG
The last one is not a standard SQL function - it's from PostgreSQL but is quite useful.
The input is a stream of valid JSON objects. For demonstration, let's pick a simple story of owners and their pets.
Model and data
Base relation: Owner
id  name   age
1   Adams  25
2   Baker  55
3   Clark  40
4   Davis  31
Base relation: Pet
id  name   litter  owner_id
10  Bella  4       1
20  Lucy   2       1
30  Daisy  3       2
40  Molly  4       3
50  Lola   2       4
60  Sadie  4       4
70  Luna   3       4
Source
From the above we get a derived relation, Owner_Pet (the result of an SQL JOIN of the relations above), presented in JSON format for our jq queries (the source data):
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy", "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola", "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna", "litter": 3 }
Requests
Here are sample requests and their expected output:
COUNT the number of pets per owner:
{ "owner_id": 1, "owner": "Adams", "age": 25, "pets_count": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pets_count": 1 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pets_count": 1 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pets_count": 3 }
SUM up the number of whelps per owner and get their MAX (MIN/AVG):
{ "owner_id": 1, "owner": "Adams", "age": 25, "litter_total": 6, "litter_max": 4 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "litter_total": 3, "litter_max": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "litter_total": 4, "litter_max": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "litter_total": 9, "litter_max": 4 }
ARRAY_AGG pets per owner:
{ "owner_id": 1, "owner": "Adams", "age": 25, "pets": [ "Bella", "Lucy" ] }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pets": [ "Daisy" ] }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pets": [ "Molly" ] }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pets": [ "Lola", "Sadie", "Luna" ] }
Here's an alternative using only basic jq, without any custom functions. (I took the liberty of removing redundant parts of the question.)
Count
In> jq -s 'group_by(.owner_id) | map({ owner_id: .[0].owner_id, count: map(.pet) | length})'
Out> [{"owner_id":1,"count":2}, ...]
Sum
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, sum: map(.litter) | add})'
Out> [{"owner_id":1,"sum":6}, ...]
Max
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, max: map(.litter) | max})'
Out> [{"owner_id":1,"max":4}, ...]
Aggregate
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, agg: map(.pet) })'
Out> [{"owner_id":1,"agg":["Bella","Lucy"]}, ...]
Sure, these might not be the most efficient implementations, but they show nicely how to build such aggregations yourself. All that changes between the different functions is the expression inside the last map and the function after the pipe | (length, add, max).
The outer map iterates over the groups, taking the owner_id from the first item of each group and using an inner map to iterate over that group's items. Not as pretty as SQL, but not terribly more complicated.
I learned jq today and managed to do this already, so this should be encouraging for anyone getting started. jq is neither like sed nor like SQL, but it's not terribly hard either.
Extended jq solution:
Custom count() function:
jq -sc 'def count($k): group_by(.[$k])[] | length as $l | .[0]
| .pets_count = $l
| del(.pet_id, .pet, .litter);
count("owner_id")' source.data
The output:
{"owner_id":1,"owner":"Adams","age":25,"pets_count":2}
{"owner_id":2,"owner":"Baker","age":55,"pets_count":1}
{"owner_id":3,"owner":"Clark","age":40,"pets_count":1}
{"owner_id":4,"owner":"Davis","age":31,"pets_count":3}
Custom sum() function:
jq -sc 'def sum($k): group_by(.[$k])[] | map(.litter) as $litters | .[0]
| . + {litter_total: $litters | add, litter_max: $litters | max}
| del(.pet_id, .pet, .litter);
sum("owner_id")' source.data
The output:
{"owner_id":1,"owner":"Adams","age":25,"litter_total":6,"litter_max":4}
{"owner_id":2,"owner":"Baker","age":55,"litter_total":3,"litter_max":3}
{"owner_id":3,"owner":"Clark","age":40,"litter_total":4,"litter_max":4}
{"owner_id":4,"owner":"Davis","age":31,"litter_total":9,"litter_max":4}
Custom array_agg() function:
jq -sc 'def array_agg($k): group_by(.[$k])[] | map(.pet) as $pets | .[0]
| .pets = $pets | del(.pet_id, .pet, .litter);
array_agg("owner_id")' source.data
The output:
{"owner_id":1,"owner":"Adams","age":25,"pets":["Bella","Lucy"]}
{"owner_id":2,"owner":"Baker","age":55,"pets":["Daisy"]}
{"owner_id":3,"owner":"Clark","age":40,"pets":["Molly"]}
{"owner_id":4,"owner":"Davis","age":31,"pets":["Lola","Sadie","Luna"]}
This is a nice exercise, but SO is not a programming service, so I will focus here on some key concepts for generic solutions in jq that are efficient, even for very large collections.
GROUPS_BY
The key to efficiency here is avoiding the built-in group_by, as it requires sorting. Since jq is fundamentally stream-oriented, the following definition of GROUPS_BY is likewise stream-oriented. It takes advantage of the efficiency of key-based lookups, while avoiding calling tojson on strings:
# emit a stream of the groups defined by f
def GROUPS_BY(stream; f):
reduce stream as $x ({};
($x|f) as $s
| ($s|type) as $t
| (if $t == "string" then $s else ($s|tojson) end) as $y
| .[$t][$y] += [$x] )
| .[][] ;
distinct and count_distinct
# Emit an array of the distinct entities in `stream`, without sorting
def distinct(stream):
reduce stream as $x ({};
($x|type) as $t
| (if $t == "string" then $x else ($x|tojson) end) as $y
| if (.[$t] | has($y)) then . else .[$t][$y] += [$x] end )
| [.[][]] | add ;
# Emit the number of distinct items in the given stream
def count_distinct(stream):
def sum(s): reduce s as $x (0;.+$x);
reduce stream as $x ({};
($x|type) as $t
| (if $t == "string" then $x else ($x|tojson) end) as $y
| .[$t][$y] = 1 )
| sum( .[][] ) ;
Convenience function
def owner: {owner_id,owner,age};
Example: "COUNT the number of pets per owner"
GROUPS_BY(inputs; .owner_id)
| (.[0] | owner) + {pets_count: count_distinct(.[]|.pet_id)}
Invocation: jq -nc -f program1.jq input.json
Output:
{"owner_id":1,"owner":"Adams","age":25,"pets_count":2}
{"owner_id":2,"owner":"Baker","age":55,"pets_count":1}
{"owner_id":3,"owner":"Clark","age":40,"pets_count":1}
{"owner_id":4,"owner":"Davis","age":31,"pets_count":3}
Example: "SUM up the number of whelps per owner and get their MAX"
GROUPS_BY(inputs; .owner_id)
| (.[0] | owner)
+ {litter_total: (map(.litter) | add)}
+ {litter_max: (map(.litter) | max)}
Invocation: jq -nc -f program2.jq input.json
Output: as given.
Example: "ARRAY_AGG pets per owner"
GROUPS_BY(inputs; .owner_id)
| (.[0] | owner) + {pets: distinct(.[]|.pet)}
Invocation: jq -nc -f program3.jq input.json
Output:
{"owner_id":1,"owner":"Adams","age":25,"pets":["Bella","Lucy"]}
{"owner_id":2,"owner":"Baker","age":55,"pets":["Daisy"]}
{"owner_id":3,"owner":"Clark","age":40,"pets":["Molly"]}
{"owner_id":4,"owner":"Davis","age":31,"pets":["Lola","Sadie","Luna"]}
I tried to run org.apache.ignite.examples.datastructures.IgniteSetExample on a cluster (2 nodes) after adding some debug code of my own. Part of its source code looks like the following:
CollectionConfiguration setCfg = new CollectionConfiguration();
setCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
setCfg.setCacheMode(CacheMode.PARTITIONED);

// Initialize new set.
IgniteSet<String> set = ignite.set(setName, setCfg);
System.out.println("Set size before initializing: " + set.size()); // added by myself

// Initialize set items.
for (int i = 0; i < 10; i++) {
    set.add(Integer.toString(i));
    System.out.println("Set: " + Arrays.toString(set.toArray())); // added by myself
}
System.out.println("Set size after initializing: " + set.size());
In my opinion, the size of the Ignite set should be 10 after adding the data, but I got a number greater than 10, typically around 15. I found that some duplicate entries had been added to the set. The log is here:
[19:53:16] Topology snapshot [ver=29, servers=2, clients=0, CPUs=8, heap=3.4GB]
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Topology snapshot [ver=29, servers=2, clients=0, CPUs=8, heap=3.4GB]
>>> Ignite set example started.
Set size before initializing: 0
Set: [0]
Set: [1, 1, 0]
Set: [2, 1, 2, 1, 0]
Set: [2, 1, 3, 2, 1, 0, 3]
Set: [2, 1, 3, 2, 1, 0, 4, 3]
Set: [2, 1, 3, 2, 1, 0, 5, 4, 3]
Set: [2, 1, 3, 2, 1, 0, 6, 5, 4, 3]
Set: [7, 2, 1, 3, 7, 2, 1, 0, 6, 5, 4, 3]
Set: [7, 2, 1, 3, 8, 7, 2, 1, 0, 6, 5, 4, 3]
Set: [7, 2, 1, 3, 9, 8, 7, 2, 1, 0, 6, 5, 4, 3]
Set size after initializing: 14
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Class locally deployed: class org.apache.ignite.examples.datastructures.IgniteSetExample$SetClosure
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Class locally deployed: class org.apache.ignite.configuration.CollectionConfiguration
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Class locally deployed: class org.apache.ignite.cache.CacheAtomicityMode
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Class locally deployed: class org.apache.ignite.cache.CacheMode
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_0
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_1
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_2
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_3
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_4
Set size after writing [expected=20, actual=30]
Iterate over set.
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_1
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_3
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_3
Set item: 7
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_4
Set item: 2
Set item: 1
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_1
Set item: 3
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_2
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_3
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_4
Set item: 2
Set item: 1
Set item: 0
Set item: 6
Set item: 5
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_0
Set item: 4
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_1
Set item: 3
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_2
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_1
Set item: 9
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_2
Set item: 8
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_3
Set item: 7
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_4
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_0
Set size before clearing: 30
Set size after clearing: 0
Set was removed: true
Expected exception - Set has been removed from cache: GridCacheSetImpl [cache=GridDhtAtomicCache [defRes=org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$1#482d776b, near=null, super=GridDhtCacheAdapter [multiTxHolder=java.lang.ThreadLocal#186978a6, stopping=false, super=GridDistributedCacheAdapter [super=GridCacheAdapter [locMxBean=org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl#631e06ab, clusterMxBean=org.apache.ignite.internal.processors.cache.CacheClusterMetricsMXBeanImpl#2a3591c5, aff=org.apache.ignite.internal.processors.cache.affinity.GridCacheAffinityImpl#34a75079, igfsDataCache=false, mongoDataCache=false, mongoMetaCache=false, igfsDataCacheSize=null, asyncOpsSem=java.util.concurrent.Semaphore#346a361[Permits = 500], name=datastructures_ATOMIC_PARTITIONED_0#default-ds-group, size=0]]]], name=03bbdb45-72ce-45aa-b75f-00b7b6134dc6, id=d55a844ae51-baeb6ba4-cb04-4d72-b0d8-188f21bc5ac5, collocated=false, hdrPart=961, rmvd=true, binaryMarsh=true, compute=org.apache.ignite.internal.IgniteComputeImpl#4052274f]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Command protocol successfully stopped: TCP binary
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Stopped cache [cacheName=ignite-sys-cache]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Stopped cache [cacheName=datastructures_TRANSACTIONAL_PARTITIONED_0#default-ds-group, group=default-ds-group]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Stopped cache [cacheName=datastructures_ATOMIC_PARTITIONED_0#default-ds-group, group=default-ds-group]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Stopped cache [cacheName=ignite-sys-atomic-cache#default-ds-group, group=default-ds-group]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Removed undeployed class: GridDeployment [ts=1505994796165, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader#73d16e93, clsLdrId=355a844ae51-7aa983e1-c358-4876-b58f-4f3b7bfa65f3, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=true, usage=0]
[19:53:17] Ignite node stopped OK [uptime=00:00:00:778]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info:
>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 2.1.0#20170721-sha1:a6ca5c8a97e9a4c9d73d40ce76d1504c14ba1940 stopped OK
>>> +---------------------------------------------------------------------------------+
>>> Grid uptime: 00:00:00:778
Ignite set example finished.
Only when I set the "collocated" attribute of the CollectionConfiguration instance to true is the size of the IgniteSet 10 as expected. But according to the official documentation, "false" is the recommended setting for the "collocated" attribute when an IgniteSet holds a lot of data. So what's wrong here?
You can populate the IgniteSet from a node started in client mode; I have tested this and it works. Like this: Ignition.setClientMode(true);
Looks like IgniteSet has a bug. Thank you for the report.
For now you can use a cache directly instead of a set. The same example would look like this:
import java.util.ArrayList;
import java.util.List;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheMode.PARTITIONED;

public class IgniteSetExample {
    // Shared dummy value; the cache keys act as the set elements.
    static final Object DUMMY = new Object();

    public static void main(String[] args) throws Exception {
        Ignite ignite = Ignition.start("examples/config/example-ignite.xml");

        CacheConfiguration<String, Object> cacheCfg = new CacheConfiguration<>("setCache");
        cacheCfg.setAtomicityMode(TRANSACTIONAL);
        cacheCfg.setCacheMode(PARTITIONED);

        IgniteCache<String, Object> cache = ignite.getOrCreateCache(cacheCfg);
        System.out.println("Set size before init: " + cache.size());

        for (int i = 0; i < 10; i++) {
            cache.put(Integer.toString(i), DUMMY);
            System.out.println("Set elements: " + getKeys(cache));
        }

        System.out.println("Set size after init: " + cache.size());
    }

    // Collects all cache keys, i.e. the emulated set's elements.
    static <T> List<T> getKeys(IgniteCache<T, ?> cache) {
        List<T> keys = new ArrayList<>(cache.size());
        for (Cache.Entry<T, ?> e : cache)
            keys.add(e.getKey());
        return keys;
    }
}
I just want to monitor my running spider's stats. I got the latest scrapy-plugins/scrapy-jsonrpc and configured the spider as follows:
EXTENSIONS = {
'scrapy_jsonrpc.webservice.WebService': 500,
}
JSONRPC_ENABLED = True
JSONRPC_PORT = [60853]
but when I browse http://localhost:60853/, it just returns
{"resources": ["crawler"]}
and I can only get the running spiders' names, without the stats.
Can anyone tell me what I have set up wrong? Thanks!
http://localhost:60853/ returns the resources available, /crawler being the only top-level one.
If you want to get stats for a spider, you'll need to query the /crawler/stats endpoint and call get_stats().
Here's an example using python-jsonrpc: (here I configured the webservice to listen on localhost and port 6024)
>>> import pyjsonrpc
>>> http_client = pyjsonrpc.HttpClient('http://localhost:6024/crawler/stats')
>>> http_client.call('get_stats', 'httpbin')
{u'log_count/DEBUG': 4, u'scheduler/dequeued': 4, u'log_count/INFO': 9, u'downloader/response_count': 2, u'downloader/response_status_count/200': 2, u'log_count/WARNING': 1, u'scheduler/enqueued/memory': 4, u'downloader/response_bytes': 639, u'start_time': u'2016-09-28 08:49:57', u'scheduler/dequeued/memory': 4, u'scheduler/enqueued': 4, u'downloader/request_bytes': 862, u'response_received_count': 2, u'downloader/request_method_count/GET': 4, u'downloader/request_count': 4}
>>> http_client.call('get_stats')
{u'log_count/DEBUG': 4, u'scheduler/dequeued': 4, u'log_count/INFO': 9, u'downloader/response_count': 2, u'downloader/response_status_count/200': 2, u'log_count/WARNING': 1, u'scheduler/enqueued/memory': 4, u'downloader/response_bytes': 639, u'start_time': u'2016-09-28 08:49:57', u'scheduler/dequeued/memory': 4, u'scheduler/enqueued': 4, u'downloader/request_bytes': 862, u'response_received_count': 2, u'downloader/request_method_count/GET': 4, u'downloader/request_count': 4}
>>> from pprint import pprint
>>> pprint(http_client.call('get_stats'))
{u'downloader/request_bytes': 862,
u'downloader/request_count': 4,
u'downloader/request_method_count/GET': 4,
u'downloader/response_bytes': 639,
u'downloader/response_count': 2,
u'downloader/response_status_count/200': 2,
u'log_count/DEBUG': 4,
u'log_count/INFO': 9,
u'log_count/WARNING': 1,
u'response_received_count': 2,
u'scheduler/dequeued': 4,
u'scheduler/dequeued/memory': 4,
u'scheduler/enqueued': 4,
u'scheduler/enqueued/memory': 4,
u'start_time': u'2016-09-28 08:49:57'}
>>>
You can also use jsonrpc_client_call from scrapy_jsonrpc.jsonrpc.
>>> from scrapy_jsonrpc.jsonrpc import jsonrpc_client_call
>>> jsonrpc_client_call('http://localhost:6024/crawler/stats', 'get_stats', 'httpbin')
{u'log_count/DEBUG': 5, u'scheduler/dequeued': 4, u'log_count/INFO': 11, u'downloader/response_count': 3, u'downloader/response_status_count/200': 3, u'log_count/WARNING': 1, u'scheduler/enqueued/memory': 4, u'downloader/response_bytes': 870, u'start_time': u'2016-09-28 09:01:47', u'scheduler/dequeued/memory': 4, u'scheduler/enqueued': 4, u'downloader/request_bytes': 862, u'response_received_count': 3, u'downloader/request_method_count/GET': 4, u'downloader/request_count': 4}
This is what you get "on the wire" for a request made with a modified example-client.py (see the code a bit below; the example in https://github.com/scrapy-plugins/scrapy-jsonrpc is outdated as I write these lines):
POST /crawler/stats HTTP/1.1
Accept-Encoding: identity
Content-Length: 73
Host: localhost:6024
Content-Type: application/x-www-form-urlencoded
Connection: close
User-Agent: Python-urllib/2.7
{"params": ["httpbin"], "jsonrpc": "2.0", "method": "get_stats", "id": 1}
And the response
HTTP/1.1 200 OK
Content-Length: 504
Access-Control-Allow-Headers: X-Requested-With
Server: TwistedWeb/16.4.1
Connection: close
Date: Tue, 27 Sep 2016 11:21:43 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PATCH, PUT, DELETE
Content-Type: application/json
{"jsonrpc": "2.0", "result": {"log_count/DEBUG": 5, "scheduler/dequeued": 4, "log_count/INFO": 11, "downloader/response_count": 3, "downloader/response_status_count/200": 3, "log_count/WARNING": 3, "scheduler/enqueued/memory": 4, "downloader/response_bytes": 870, "start_time": "2016-09-27 11:16:25", "scheduler/dequeued/memory": 4, "scheduler/enqueued": 4, "downloader/request_bytes": 862, "response_received_count": 3, "downloader/request_method_count/GET": 4, "downloader/request_count": 4}, "id": 1}
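You can reproduce that request with nothing but the standard library as well. A minimal sketch, assuming Python 3 and the same localhost:6024 endpoint and "httpbin" spider used in this answer:
import json
from urllib.request import Request, urlopen

# Sketch only: raw JSON-RPC 2.0 call against the /crawler/stats resource.
payload = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "get_stats",
    "params": ["httpbin"],  # spider name; can be omitted, as in the earlier get_stats() call
}).encode("utf-8")

req = Request("http://localhost:6024/crawler/stats", data=payload)
response = json.loads(urlopen(req).read().decode("utf-8"))
print(response["result"])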
Here's the modified client to query /crawler/stats, which I called with ./example-client.py -H localhost -P 6024 get-spider-stats httpbin (for a running "httpbin" spider, JSONRPC_PORT being 6024 for me)
#!/usr/bin/env python
"""
Example script to control a Scrapy server using its JSON-RPC web service.

It only provides a reduced functionality as its main purpose is to illustrate
how to write a web service client. Feel free to improve or write you own.

Also, keep in mind that the JSON-RPC API is not stable. The recommended way for
controlling a Scrapy server is through the execution queue (see the "queue"
command).
"""

from __future__ import print_function
import sys, optparse, urllib, json
from six.moves.urllib.parse import urljoin

from scrapy_jsonrpc.jsonrpc import jsonrpc_client_call, JsonRpcError


def get_commands():
    return {
        'help': cmd_help,
        'stop': cmd_stop,
        'list-available': cmd_list_available,
        'list-running': cmd_list_running,
        'list-resources': cmd_list_resources,
        'get-global-stats': cmd_get_global_stats,
        'get-spider-stats': cmd_get_spider_stats,
    }


def cmd_help(args, opts):
    """help - list available commands"""
    print("Available commands:")
    for _, func in sorted(get_commands().items()):
        print("  ", func.__doc__)


def cmd_stop(args, opts):
    """stop <spider> - stop a running spider"""
    jsonrpc_call(opts, 'crawler/engine', 'close_spider', args[0])


def cmd_list_running(args, opts):
    """list-running - list running spiders"""
    for x in json_get(opts, 'crawler/engine/open_spiders'):
        print(x)


def cmd_list_available(args, opts):
    """list-available - list name of available spiders"""
    for x in jsonrpc_call(opts, 'crawler/spiders', 'list'):
        print(x)


def cmd_list_resources(args, opts):
    """list-resources - list available web service resources"""
    for x in json_get(opts, '')['resources']:
        print(x)


def cmd_get_spider_stats(args, opts):
    """get-spider-stats <spider> - get stats of a running spider"""
    stats = jsonrpc_call(opts, 'crawler/stats', 'get_stats', args[0])
    for name, value in stats.items():
        print("%-40s %s" % (name, value))


def cmd_get_global_stats(args, opts):
    """get-global-stats - get global stats"""
    stats = jsonrpc_call(opts, 'crawler/stats', 'get_stats')
    for name, value in stats.items():
        print("%-40s %s" % (name, value))


def get_wsurl(opts, path):
    return urljoin("http://%s:%s/" % (opts.host, opts.port), path)


def jsonrpc_call(opts, path, method, *args, **kwargs):
    url = get_wsurl(opts, path)
    return jsonrpc_client_call(url, method, *args, **kwargs)


def json_get(opts, path):
    url = get_wsurl(opts, path)
    return json.loads(urllib.urlopen(url).read())


def parse_opts():
    usage = "%prog [options] <command> [arg] ..."
    description = "Scrapy web service control script. Use '%prog help' " \
        "to see the list of available commands."
    op = optparse.OptionParser(usage=usage, description=description)
    op.add_option("-H", dest="host", default="localhost", \
        help="Scrapy host to connect to")
    op.add_option("-P", dest="port", type="int", default=6080, \
        help="Scrapy port to connect to")
    opts, args = op.parse_args()
    if not args:
        op.print_help()
        sys.exit(2)
    cmdname, cmdargs, opts = args[0], args[1:], opts
    commands = get_commands()
    if cmdname not in commands:
        sys.stderr.write("Unknown command: %s\n\n" % cmdname)
        cmd_help(None, None)
        sys.exit(1)
    return commands[cmdname], cmdargs, opts


def main():
    cmd, args, opts = parse_opts()
    try:
        cmd(args, opts)
    except IndexError:
        print(cmd.__doc__)
    except JsonRpcError as e:
        print(str(e))
        if e.data:
            print("Server Traceback below:")
            print(e.data)


if __name__ == '__main__':
    main()
In the example command above, I got this:
log_count/DEBUG 5
scheduler/dequeued 4
log_count/INFO 11
downloader/response_count 3
downloader/response_status_count/200 3
log_count/WARNING 3
scheduler/enqueued/memory 4
downloader/response_bytes 870
start_time 2016-09-27 11:16:25
scheduler/dequeued/memory 4
scheduler/enqueued 4
downloader/request_bytes 862
response_received_count 3
downloader/request_method_count/GET 4
downloader/request_count 4