SeaweedFS - Added new volume server but not able to add new files

I have one master (x.x.x.61), one volume server (x.x.x.63), and one filer + S3 API (x.x.x.62), set up on 3 separate machines.
I added a new volume server (x.x.x.64) because I had maxed out the storage space on the first volume server.
But I'm still not able to add new files through the filer UI (http://x.x.x.62:8888).
In my filer logs, I noticed that it's still trying to write to the first volume server, the one that's out of space. Am I missing a configuration for it to use the new volume server?
E1221 11:09:48.027930 upload_content.go:351 unmarshal http://x.x.x.63:8080/7,2bafadaa4666: {"error":"failed to write to local disk: write data/chrisDir_7.dat: no space left on device"}{"name":"app_progress4.apk","size":2353734,"eTag":"92b10892"}
W1221 11:09:48.027950 upload_content.go:168 uploading 2 to http://x.x.x.63:8080/7,2bafadaa4666: unmarshal http://x.x.x.63:8080/7,2bafadaa4666: invalid character '{' after top-level value
E1221 11:09:48.027965 filer_server_handlers_write_upload.go:209 upload error: unmarshal http://x.x.x.63:8080/7,2bafadaa4666: invalid character '{' after top-level value
I1221 11:09:48.028022 common.go:70 response method:POST URL:/buckets/chrisDir/ with httpStatus:500 and JSON:{"error":"unmarshal http://x.x.x.63:8080/2,2ba84b2894a7: invalid character '{' after top-level value"}
In the master log, I can see that the second volume server was added successfully and that the master.toml maintenance scripts were executed to rebalance:
I1221 11:36:09.522690 node.go:225 topo:DefaultDataCenter:DefaultRack adds child x.x.x.64:8080
I1221 11:36:09.522716 node.go:225 topo:DefaultDataCenter:DefaultRack:x.x.x.64:8080 adds child
I1221 11:36:09.522724 master_grpc_server.go:138 added volume server 0: x.x.x.64:8080 [3caad049-38a6-43f6-8192-d1082c5e838b]
I1221 11:36:09.522744 master_grpc_server.go:49 found new uuid:x.x.x.64:8080 [3caad049-38a6-43f6-8192-d1082c5e838b] , map[x.x.x.63:8080:[5005b287-c812-4dba-ba41-9b5a6a022f12] x.x.x.64:8080:[3caad049-38a6-43f6-8192-d1082c5e838b]]
I1221 11:36:09.522866 volume_layout.go:393 Volume 11 becomes writable
I1221 11:36:09.522880 master_grpc_server.go:199 master see new volume 11 from x.x.x.64:8080
I1221 11:38:33.481721 master_server.go:323 executing: lock []
I1221 11:38:33.482821 master_server.go:323 executing: ec.encode [-fullPercent=95 -quietFor=1h]
I1221 11:38:33.483925 master_server.go:323 executing: ec.rebuild [-force]
I1221 11:38:33.484372 master_server.go:323 executing: ec.balance [-force]
I1221 11:38:33.484777 master_server.go:323 executing: volume.balance [-force]
2022/12/21 11:38:48 copying volume 21 from x.x.x.63:8080 to x.x.x.64:8080
I1221 11:38:48.486778 volume_layout.go:407 Volume 21 has 0 replica, less than required 1
I1221 11:38:48.486798 volume_layout.go:380 Volume 21 becomes unwritable
I1221 11:38:48.494998 volume_layout.go:393 Volume 21 becomes writable
2022/12/21 11:38:48 tailing volume 21 from x.x.x.63:8080 to x.x.x.64:8080
2022/12/21 11:38:58 deleting volume 21 from x.x.x.63:8080
....
How I start the master
./weed master -mdir='.'
How I start the volume server
./weed volume -max=100 -mserver="x.x.x.61:9333" -dir="$dataDir"
How I start the filer and S3
./weed filer -master="x.x.x.61:9333" -s3
What's in $HOME/.seaweedfs
drwxrwxr-x 2 seaweedfs seaweedfs 4096 Dec 20 16:01 .
drwxr-xr-x 20 seaweedfs seaweedfs 4096 Dec 20 16:01 ..
-rw-r--r-- 1 seaweedfs seaweedfs 2234 Dec 20 15:57 master.toml
Content of the master.toml file
# Put this file to one of the location, with descending priority
# ./master.toml
# $HOME/.seaweedfs/master.toml
# /etc/seaweedfs/master.toml
# this file is read by master
[master.maintenance]
# periodically run these scripts are the same as running them from 'weed shell'
scripts = """
lock
ec.encode -fullPercent=95 -quietFor=1h
ec.rebuild -force
ec.balance -force
volume.deleteEmpty -quietFor=24h -force
volume.balance -force
volume.fix.replication
s3.clean.uploads -timeAgo=24h
unlock
"""
sleep_minutes = 7 # sleep minutes between each script execution
[master.sequencer]
type = "raft" # Choose [raft|snowflake] type for storing the file id sequence
# when sequencer.type = snowflake, the snowflake id must be different from other masters
sequencer_snowflake_id = 0 # any number between 1~1023
# configurations for tiered cloud storage
# old volumes are transparently moved to cloud for cost efficiency
[storage.backend]
[storage.backend.s3.default]
enabled = false
aws_access_key_id = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
region = "us-east-2"
bucket = "your_bucket_name" # an existing bucket
endpoint = ""
storage_class = "STANDARD_IA"
# create this number of logical volumes if no more writable volumes
# count_x means how many copies of data.
# e.g.:
# 000 has only one copy, copy_1
# 010 and 001 has two copies, copy_2
# 011 has only 3 copies, copy_3
[master.volume_growth]
copy_1 = 7 # create 1 x 7 = 7 actual volumes
copy_2 = 6 # create 2 x 6 = 12 actual volumes
copy_3 = 3 # create 3 x 3 = 9 actual volumes
copy_other = 1 # create n x 1 = n actual volumes
# configuration flags for replication
[master.replication]
# any replication counts should be considered minimums. If you specify 010 and
# have 3 different racks, that's still considered writable. Writes will still
# try to replicate to all available volumes. You should only use this option
# if you are doing your own replication or periodic sync of volumes.
treat_replication_as_minimums = false
System status
curl http://localhost:9333/dir/assign?pretty=y
{
  "fid": "9,2bb2fd75d706",
  "url": "x.x.x.63:8080",
  "publicUrl": "x.x.x.63:8080",
  "count": 1
}
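One thing worth checking (not in the original post): where the master assigns writes for each specific collection, since the filer bucket maps to the chrisDir collection seen in the layouts further below. A minimal sketch, assuming the master's /dir/assign endpoint accepts a collection query parameter:

# Hypothetical check: ask the master for one assignment per collection and
# print which volume server it points at.
import json
import urllib.request

MASTER = "http://x.x.x.61:9333"

for collection in ("", "chrisDir", "chrisDir2"):
    url = "%s/dir/assign?collection=%s" % (MASTER, collection)
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read())
    print(collection or "(default)", "->", data.get("url"), data.get("fid"), data.get("error"))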
curl http://x.x.x.61:9333/cluster/status?pretty=y
{
"IsLeader": true,
"Leader": "x.x.x.61:9333",
"MaxVolumeId": 21
}
curl "http://x.x.x.61:9333/dir/status?pretty=y"
{
  "Topology": {
    "Max": 200,
    "Free": 179,
    "DataCenters": [
      {
        "Id": "DefaultDataCenter",
        "Racks": [
          {
            "Id": "DefaultRack",
            "DataNodes": [
              {
                "Url": "x.x.x.63:8080",
                "PublicUrl": "x.x.x.63:8080",
                "Volumes": 20,
                "EcShards": 0,
                "Max": 100,
                "VolumeIds": " 1-10 12-21"
              },
              {
                "Url": "x.x.x.64:8080",
                "PublicUrl": "x.x.x.64:8080",
                "Volumes": 1,
                "EcShards": 0,
                "Max": 100,
                "VolumeIds": " 11"
              }
            ]
          }
        ]
      }
    ],
    "Layouts": [
      {
        "replication": "000",
        "ttl": "",
        "writables": [6, 1, 2, 7, 3, 4, 5],
        "collection": "chrisDir"
      },
      {
        "replication": "000",
        "ttl": "",
        "writables": [16, 19, 17, 21, 15, 18, 20],
        "collection": "chrisDir2"
      },
      {
        "replication": "000",
        "ttl": "",
        "writables": [8, 12, 13, 9, 14, 10, 11],
        "collection": ""
      }
    ]
  },
  "Version": "30GB 3.37 438146249f50bf36b4c46ece02a430f44152777f"
}
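To make the underlying situation easier to see, here is a small sketch (not part of the original post) that joins the /dir/status output above with the per-collection layouts and prints which data node hosts each collection's writable volumes. With the data above, the chrisDir collection's writable volumes (1-7) are all on x.x.x.63, the server that is out of space:

# Sketch: map each collection's writable volume ids to the data nodes that
# host them, assuming the endpoint and JSON shape match the output above.
import json
import urllib.request

MASTER = "http://x.x.x.61:9333"

with urllib.request.urlopen(MASTER + "/dir/status") as resp:
    topology = json.loads(resp.read())["Topology"]

# Volume id -> data node URL, parsed from VolumeIds strings like " 1-10 12-21".
volume_to_node = {}
for dc in topology["DataCenters"]:
    for rack in dc["Racks"]:
        for node in rack["DataNodes"]:
            for part in node["VolumeIds"].split():
                lo, _, hi = part.partition("-")
                for vid in range(int(lo), int(hi or lo) + 1):
                    volume_to_node[vid] = node["Url"]

for layout in topology["Layouts"]:
    nodes = sorted({volume_to_node.get(v, "?") for v in layout["writables"]})
    print(layout["collection"] or "(default)", "writable volumes on:", nodes)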

How to index and query complex spatial types in CosmosDB?

I have a CosmosDB database/collection with the partition key on /id and spatial indexing enabled using the Geography configuration. When I query for objects with a LineString property within a given LineString or Polygon, the query retrieves all of the documents in the collection before returning the ones that are within the LineString/Polygon (the retrieved count is greater than the output count). The RUs consumed grow as the number of items in the collection grows, which signals to me that it's basically doing a scan and the index is not being used.
CosmosDB documentation states the following:
Azure Cosmos DB supports indexing of Points, LineStrings, Polygons, and MultiPolygons
However, the documentation does not have any examples that don't use the Point type, and I am unable to hit the index with any permutation of exclusively non-Point types.
To verify that spatial indexing works at all, I added an extra Start property to the item holding the first Point of the LineString, and I can query whether it is within the Polygon at a constant RU consumption.
Here is the index:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/\"_etag\"/?" }
  ],
  "spatialIndexes": [
    {
      "path": "/*",
      "types": [ "Point", "LineString", "Polygon", "MultiPolygon" ]
    }
  ]
}
Here is the needle. The haystack is about 1,000 objects with random LineStrings.
{
  "id": "test",
  "Start": {
    "type": "Point",
    "coordinates": [ 1, 3 ]
  },
  "Points": {
    "type": "LineString",
    "coordinates": [ [ 1, 3 ], [ 1, 4 ], [ 1, 5 ] ]
  }
}
Here is the search within a Polygon:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Points, {
"type":"Polygon",
"coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
})
---
Request Charge: 127.4 RUs
Retrieved document count: 992
Retrieved document size: 1219980 bytes
Output document count: 1
Output document size: 441 bytes
Index hit document count: 0
Index lookup time: 3.77 ms
Here is the search within a LineString:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Points, {
"type":"LineString",
"coordinates": [[1, 3], [1, 4], [1, 5]]
})
---
Request Charge: 122.53 RUs
Retrieved document count: 992
Retrieved document size: 1219980 bytes
Output document count: 1
Output document size: 441 bytes
Index hit document count: 0
Index lookup time: 3.0100000000000002 ms
Here is the search for a Start within the same Polygon as above, showing that spatial indexing is enabled and working:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Start, {
"type":"Polygon",
"coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
---
Request Charge: 8.1 RUs
Retrieved document count: 1
Retrieved document size: 343 bytes
Output document count: 1
Output document size: 392 bytes
Index hit document count: 1
Index lookup time: 2.79 ms
I created a container and added your sample document, but my results are different from yours.
First SQL result:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Points, {
"type":"Polygon",
"coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
})
---
Request Charge: 10.53 RUs
Retrieved document count: 1
Retrieved document size: 349 bytes
Output document count: 1
Output document size: 398 bytes
Index hit document count: 1
Index lookup time: 1.6800000000000002 ms
Second SQL result:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Points, {
"type":"LineString",
"coordinates": [[1, 3], [1, 4], [1, 5]]
})
---
Request Charge: 7.24 RUs
Retrieved document count: 1
Retrieved document size: 349 bytes
Output document count: 1
Output document size: 398 bytes
Index hit document count: 1
Index lookup time: 1.1399000000000001 ms
Third SQL result:
SELECT *
FROM items i
WHERE ST_WITHIN(i.Start, {
"type":"Polygon",
"coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
})
---
Request Charge: 10.53 RUs
Retrieved document count: 1
Retrieved document size: 349 bytes
Output document count: 1
Output document size: 398 bytes
Index hit document count: 1
Index lookup time: 1.6500000000000001 ms
According to my test, each SQL query hit the index.
By the way, my index is the same as yours and the geospatial configuration is Geography. You can try again, and if the result is still like yours above, please share more detail, such as the SDK you use or the details of your documents (I tested this in the Azure portal).
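If it helps to compare results outside the portal, here is a minimal sketch (not from the original post; endpoint, key, database, and container names are placeholders) that runs the first ST_WITHIN query with the azure-cosmos Python SDK and prints the request charge reported by the service, so both sides can measure from the same code path:

# Hypothetical reproduction script; endpoint, key, database and container
# names are placeholders, and reading the charge header relies on SDK internals.
from azure.cosmos import CosmosClient

ENDPOINT = "https://<your-account>.documents.azure.com:443/"
KEY = "<your-key>"

client = CosmosClient(ENDPOINT, credential=KEY)
container = client.get_database_client("<database>").get_container_client("<container>")

query = """
SELECT * FROM items i
WHERE ST_WITHIN(i.Points, {
  "type": "Polygon",
  "coordinates": [[[0, 10], [0, 0], [2, 0], [2, 10], [0, 10]]]
})
"""

# Cross-partition query, since the partition key is /id.
results = list(container.query_items(query=query, enable_cross_partition_query=True))
print("Output document count:", len(results))

# Request charge of the last page, as reported by the service.
charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
print("Request charge (last page):", charge)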

OpenVINO YOLO-v3 inference error: cannot convert float NaN to integer

Hello, I have followed all the steps to run inference and successfully ran it on the model from this link: https://pjreddie.com/media/files/yolov3.weights
But when I tried it on a model I trained with darknet, I get this error:
[ INFO ] Creating Inference Engine...
[ INFO ] Loading network files:
newyolo.xml
newyolo.bin
[ INFO ] Preparing inputs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference...
To close the application, press 'CTRL+C' here or switch to the output window and press ESC key
To switch between sync/async modes, press TAB key in the output window
yolo_original.py:280: DeprecationWarning: shape property of IENetLayer is deprecated. Please use shape property of DataPtr instead objects returned by in_data or out_data property to access shape of input or output data on corresponding ports
out_blob = out_blob.reshape(net.layers[net.layers[layer_name].parents[0]].shape)
[ INFO ] Layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion parameters:
[ INFO ] classes : 10
[ INFO ] num : 3
[ INFO ] coords : 4
[ INFO ] anchors : [55.0, 56.0, 42.0, 87.0, 68.0, 81.0]
Traceback (most recent call last):
File "yolo_original.py", line 363, in <module>
sys.exit(main() or 0)
File "yolo_original.py", line 286, in main
args.prob_threshold)
File "yolo_original.py", line 153, in parse_yolo_region
h_scale=orig_im_h, w_scale=orig_im_w))
File "yolo_original.py", line 99, in scale_bbox
xmin = int((x - w / 2) * w_scale)
ValueError: cannot convert float NaN to integer
Note that I have provided the right input shape and changed yolo_v3.json to match my model.
Here is the content of my yolo_v3.json:
[
  {
    "id": "TFYOLOV3",
    "match_kind": "general",
    "custom_attributes": {
      "classes": 10,
      "anchors": [18, 22, 31, 33, 33, 50, 55, 56, 42, 87, 68, 81, 111, 98, 73, 158, 156, 202],
      "coords": 4,
      "num": 9,
      "masks": [[6, 7, 8], [3, 4, 5], [0, 1, 2]],
      "entry_points": ["detector/yolo-v3/Reshape", "detector/yolo-v3/Reshape_4", "detector/yolo-v3/Reshape_8"]
    }
  }
]
I have tried multiple things to debug this, like not providing the JSON file, etc.
PS: yolo_original.py is the same demo that comes with OpenVINO, just renamed.
I'm using OpenVINO version 2020.1.
Converting NaN to float or skipping values with NaN didn't solve the problem.
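Not part of the original post, but one diagnostic step would be to check whether the NaNs are already present in the raw output blobs returned by the Inference Engine, before parse_yolo_region ever runs. A minimal sketch, assuming the demo's exec_net.infer(...) call returns a dict mapping output layer names to numpy arrays:

# Diagnostic sketch (hypothetical placement inside the demo's main loop).
# `output` is assumed to be the dict returned by exec_net.infer(...).
import numpy as np

def report_nan(output):
    for layer_name, blob in output.items():
        nan_count = int(np.isnan(blob).sum())
        if nan_count:
            print("Layer %s: %d NaN values out of %d" % (layer_name, nan_count, blob.size))
        else:
            print("Layer %s: no NaN values" % layer_name)

If NaNs already show up in the raw blobs, the problem is in the converted model itself rather than in the parsing code; if they don't, the mismatch is more likely in how the anchors/masks are applied during parsing.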

SQL-style GROUP BY aggregate functions in jq (COUNT, SUM, etc.)

Similar questions have been asked here before:
Count items for a single key: jq count the number of items in json by a specific key
Calculate the sum of object values:
How do I sum the values in an array of maps in jq?
Question
How can I emulate the COUNT aggregate function so that it behaves like its SQL original? Let's extend this question further to include other common SQL aggregate functions:
COUNT
SUM / MAX/ MIN / AVG
ARRAY_AGG
The last one is not a standard SQL function - it's from PostgreSQL but is quite useful.
The input is a stream of valid JSON objects. For demonstration, let's pick a simple story of owners and their pets.
Model and data
Base relation: Owner
id name age
1 Adams 25
2 Baker 55
3 Clark 40
4 Davis 31
Base relation: Pet
id name litter owner_id
10 Bella 4 1
20 Lucy 2 1
30 Daisy 3 2
40 Molly 4 3
50 Lola 2 4
60 Sadie 4 4
70 Luna 3 4
Source
From the above we get a derived relation Owner_Pet (the result of an SQL JOIN of the two relations), presented in JSON format as the source data for our jq queries:
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy", "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola", "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna", "litter": 3 }
Requests
Here are sample requests and their expected output:
COUNT the number of pets per owner:
{ "owner_id": 1, "owner": "Adams", "age": 25, "pets_count": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pets_count": 1 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pets_count": 1 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pets_count": 3 }
SUM up the number of whelps per owner and get their MAX (MIN/AVG):
{ "owner_id": 1, "owner": "Adams", "age": 25, "litter_total": 6, "litter_max": 4 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "litter_total": 3, "litter_max": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "litter_total": 4, "litter_max": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "litter_total": 9, "litter_max": 4 }
ARRAY_AGG pets per owner:
{ "owner_id": 1, "owner": "Adams", "age": 25, "pets": [ "Bella", "Lucy" ] }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pets": [ "Daisy" ] }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pets": [ "Molly" ] }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pets": [ "Lola", "Sadie", "Luna" ] }
Here's an alternative using basic jq, without any custom functions. (I took the liberty of dropping the redundant parts of the expected output.)
Count
In> jq -s 'group_by(.owner_id) | map({ owner_id: .[0].owner_id, count: map(.pet) | length})'
Out> [{"owner_id": 1, "count": 2}, ...]
Sum
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, sum: map(.litter) | add})'
Out> [{"owner_id": 1, "sum": 6}, ...]
Max
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, max: map(.litter) | max})'
Out> [{"owner_id": 1, "max": 4}, ...]
Aggregate
In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, agg: map(.pet) })'
Out> [{"owner_id": 1, "agg": ["Bella","Lucy"]}, ...]
Sure, these might not be the most efficient implementations, but they show nicely how to build such functions oneself. All that changes between the different queries is what's inside the last map and the function after the pipe | (length, add, max).
The first map iterates over the groups, taking the owner_id from the first item of each group, and using map again to iterate over that group's items. Not as pretty as SQL, but not terribly more complicated.
I learned jq today and managed to do this already, so this should be encouraging for anyone getting started. jq is neither like sed nor like SQL, but not terribly hard either.
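For comparison only (not part of the original answer), the same grouping logic can be sketched in Python; mapping the jq pipeline onto an explicit loop may help if you're coming from a general-purpose language:

# Sketch: group the Owner_Pet records by owner_id and compute the same
# aggregates as the jq examples above (count, sum, max, array_agg).
# Reads one JSON object per line from stdin, e.g.: python3 group.py < source.data
import json
import sys
from collections import defaultdict

groups = defaultdict(list)
for line in sys.stdin:
    if line.strip():
        rec = json.loads(line)
        groups[rec["owner_id"]].append(rec)

for owner_id, recs in groups.items():
    head = recs[0]
    print(json.dumps({
        "owner_id": owner_id,
        "owner": head["owner"],
        "age": head["age"],
        "pets_count": len(recs),
        "litter_total": sum(r["litter"] for r in recs),
        "litter_max": max(r["litter"] for r in recs),
        "pets": [r["pet"] for r in recs],
    }))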
Extended jq solution:
Custom count() function:
jq -sc 'def count($k): group_by(.[$k])[] | length as $l | .[0]
| .pets_count = $l
| del(.pet_id, .pet, .litter);
count("owner_id")' source.data
The output:
{"owner_id":1,"owner":"Adams","age":25,"pets_count":2}
{"owner_id":2,"owner":"Baker","age":55,"pets_count":1}
{"owner_id":3,"owner":"Clark","age":40,"pets_count":1}
{"owner_id":4,"owner":"Davis","age":31,"pets_count":3}
Custom sum() function:
jq -sc 'def sum($k): group_by(.[$k])[] | map(.litter) as $litters | .[0]
| . + {litter_total: $litters | add, litter_max: $litters | max}
| del(.pet_id, .pet, .litter);
sum("owner_id")' source.data
The output:
{"owner_id":1,"owner":"Adams","age":25,"litter_total":6,"litter_max":4}
{"owner_id":2,"owner":"Baker","age":55,"litter_total":3,"litter_max":3}
{"owner_id":3,"owner":"Clark","age":40,"litter_total":4,"litter_max":4}
{"owner_id":4,"owner":"Davis","age":31,"litter_total":9,"litter_max":4}
Custom array_agg() function:
jq -sc 'def array_agg($k): group_by(.[$k])[] | map(.pet) as $pets | .[0]
| .pets = $pets | del(.pet_id, .pet, .litter);
array_agg("owner_id")' source.data
The output:
{"owner_id":1,"owner":"Adams","age":25,"pets":["Bella","Lucy"]}
{"owner_id":2,"owner":"Baker","age":55,"pets":["Daisy"]}
{"owner_id":3,"owner":"Clark","age":40,"pets":["Molly"]}
{"owner_id":4,"owner":"Davis","age":31,"pets":["Lola","Sadie","Luna"]}
This is a nice exercise, but SO is not a programming service, so I will focus here on some key concepts for generic solutions in jq that are efficient, even for very large collections.
GROUPS_BY
The key to efficiency here is avoiding the built-in group_by, as it requires sorting. Since jq is fundamentally stream-oriented, the following definition of GROUPS_BY is likewise stream-oriented. It takes advantage of the efficiency of key-based lookups, while avoiding calling tojson on strings:
# emit a stream of the groups defined by f
def GROUPS_BY(stream; f):
reduce stream as $x ({};
($x|f) as $s
| ($s|type) as $t
| (if $t == "string" then $s else ($s|tojson) end) as $y
| .[$t][$y] += [$x] )
| .[][] ;
distinct and count_distinct
# Emit an array of the distinct entities in `stream`, without sorting
def distinct(stream):
reduce stream as $x ({};
($x|type) as $t
| (if $t == "string" then $x else ($x|tojson) end) as $y
| if (.[$t] | has($y)) then . else .[$t][$y] += [$x] end )
| [.[][]] | add ;
# Emit the number of distinct items in the given stream
def count_distinct(stream):
def sum(s): reduce s as $x (0;.+$x);
reduce stream as $x ({};
($x|type) as $t
| (if $t == "string" then $x else ($x|tojson) end) as $y
| .[$t][$y] = 1 )
| sum( .[][] ) ;
Convenience function
def owner: {owner_id,owner,age};
Example: "COUNT the number of pets per owner"
GROUPS_BY(inputs; .owner_id)
| (.[0] | owner) + {pets_count: count_distinct(.[]|.pet_id)}
Invocation: jq -nc -f program1.jq input.json
Output:
{"owner_id":1,"owner":"Adams","age":25,"pets_count":2}
{"owner_id":2,"owner":"Baker","age":55,"pets_count":1}
{"owner_id":3,"owner":"Clark","age":40,"pets_count":1}
{"owner_id":4,"owner":"Davis","age":31,"pets_count":3}
Example: "SUM up the number of whelps per owner and get their MAX"
GROUPS_BY(inputs; .owner_id)
| (.[0] | owner)
+ {litter_total: (map(.litter) | add)}
+ {litter_max: (map(.litter) | max)}
Invocation: jq -nc -f program2.jq input.json
Output: as given.
Example: "ARRAY_AGG pets per owner"
GROUPS_BY(inputs; .owner_id)
| (.[0] | owner) + {pets: distinct(.[]|.pet)}
Invocation: jq -nc -f program3.jq input.json
Output:
{"owner_id":1,"owner":"Adams","age":25,"pets":["Bella","Lucy"]}
{"owner_id":2,"owner":"Baker","age":55,"pets":["Daisy"]}
{"owner_id":3,"owner":"Clark","age":40,"pets":["Molly"]}
{"owner_id":4,"owner":"Davis","age":31,"pets":["Lola","Sadie","Luna"]}

Unexpected behavior of IgniteSet in official example

I tried to run org.apache.ignite.examples.datastructures.IgniteSetExample on a cluster (2 nodes) after adding some debug code of my own. Part of its source code looks like the following:
CollectionConfiguration setCfg = new CollectionConfiguration();
setCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
setCfg.setCacheMode(CacheMode.PARTITIONED);
// Initialize new set.
IgniteSet<String> set = ignite.set(setName, setCfg);
System.out.println("Set size before initializing: " + set.size()); // added by myself
// Initialize set items.
for (int i = 0; i < 10; i++) {
    set.add(Integer.toString(i));
    System.out.println("Set: " + Arrays.toString(set.toArray())); // added by myself
}
System.out.println("Set size after initializing: " + set.size());
In my opinion, the size of the Ignite set should be 10 after adding the data, but I got a number greater than 10, typically 15. I found that some duplicate entries had been added to the set. The log is here:
[19:53:16] Topology snapshot [ver=29, servers=2, clients=0, CPUs=8, heap=3.4GB]
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Topology snapshot [ver=29, servers=2, clients=0, CPUs=8, heap=3.4GB]
>>> Ignite set example started.
Set size before initializing: 0
Set: [0]
Set: [1, 1, 0]
Set: [2, 1, 2, 1, 0]
Set: [2, 1, 3, 2, 1, 0, 3]
Set: [2, 1, 3, 2, 1, 0, 4, 3]
Set: [2, 1, 3, 2, 1, 0, 5, 4, 3]
Set: [2, 1, 3, 2, 1, 0, 6, 5, 4, 3]
Set: [7, 2, 1, 3, 7, 2, 1, 0, 6, 5, 4, 3]
Set: [7, 2, 1, 3, 8, 7, 2, 1, 0, 6, 5, 4, 3]
Set: [7, 2, 1, 3, 9, 8, 7, 2, 1, 0, 6, 5, 4, 3]
Set size after initializing: 14
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Class locally deployed: class org.apache.ignite.examples.datastructures.IgniteSetExample$SetClosure
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Class locally deployed: class org.apache.ignite.configuration.CollectionConfiguration
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Class locally deployed: class org.apache.ignite.cache.CacheAtomicityMode
Sep 21, 2017 7:53:16 PM org.apache.ignite.logger.java.JavaLogger info
Info: Class locally deployed: class org.apache.ignite.cache.CacheMode
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_0
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_1
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_2
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_3
Set item has been added: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_4
Set size after writing [expected=20, actual=30]
Iterate over set.
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_1
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_3
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_3
Set item: 7
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_4
Set item: 2
Set item: 1
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_1
Set item: 3
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_2
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_3
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_4
Set item: 2
Set item: 1
Set item: 0
Set item: 6
Set item: 5
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_0
Set item: 4
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_1
Set item: 3
Set item: 7aa983e1-c358-4876-b58f-4f3b7bfa65f3_2
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_1
Set item: 9
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_2
Set item: 8
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_3
Set item: 7
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_4
Set item: 292c99a6-137b-433c-97d9-40ce0f8c0abc_0
Set size before clearing: 30
Set size after clearing: 0
Set was removed: true
Expected exception - Set has been removed from cache: GridCacheSetImpl [cache=GridDhtAtomicCache [defRes=org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$1#482d776b, near=null, super=GridDhtCacheAdapter [multiTxHolder=java.lang.ThreadLocal#186978a6, stopping=false, super=GridDistributedCacheAdapter [super=GridCacheAdapter [locMxBean=org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl#631e06ab, clusterMxBean=org.apache.ignite.internal.processors.cache.CacheClusterMetricsMXBeanImpl#2a3591c5, aff=org.apache.ignite.internal.processors.cache.affinity.GridCacheAffinityImpl#34a75079, igfsDataCache=false, mongoDataCache=false, mongoMetaCache=false, igfsDataCacheSize=null, asyncOpsSem=java.util.concurrent.Semaphore#346a361[Permits = 500], name=datastructures_ATOMIC_PARTITIONED_0#default-ds-group, size=0]]]], name=03bbdb45-72ce-45aa-b75f-00b7b6134dc6, id=d55a844ae51-baeb6ba4-cb04-4d72-b0d8-188f21bc5ac5, collocated=false, hdrPart=961, rmvd=true, binaryMarsh=true, compute=org.apache.ignite.internal.IgniteComputeImpl#4052274f]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Command protocol successfully stopped: TCP binary
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Stopped cache [cacheName=ignite-sys-cache]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Stopped cache [cacheName=datastructures_TRANSACTIONAL_PARTITIONED_0#default-ds-group, group=default-ds-group]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Stopped cache [cacheName=datastructures_ATOMIC_PARTITIONED_0#default-ds-group, group=default-ds-group]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Stopped cache [cacheName=ignite-sys-atomic-cache#default-ds-group, group=default-ds-group]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info: Removed undeployed class: GridDeployment [ts=1505994796165, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader#73d16e93, clsLdrId=355a844ae51-7aa983e1-c358-4876-b58f-4f3b7bfa65f3, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=true, usage=0]
[19:53:17] Ignite node stopped OK [uptime=00:00:00:778]
Sep 21, 2017 7:53:17 PM org.apache.ignite.logger.java.JavaLogger info
Info:
>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 2.1.0#20170721-sha1:a6ca5c8a97e9a4c9d73d40ce76d1504c14ba1940 stopped OK
>>> +---------------------------------------------------------------------------------+
>>> Grid uptime: 00:00:00:778
Ignite set example finished.
Only if I set "collocated" on the CollectionConfiguration instance to true is the size of the IgniteSet 10 as expected. But according to the official documentation, false is the recommended value for the "collocated" attribute when there is a lot of data in an IgniteSet. So what's wrong here?
You can load your data into the IgniteSet from a node in client mode; I have tested this and it works. Like this: Ignition.setClientMode(true);
Looks like IgniteSet has a bug. Thank you for the report.
For now, you can use a cache directly instead of a set. The same example will look like this:
import java.util.ArrayList;
import java.util.List;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheMode.PARTITIONED;

public class IgniteSetExample {
    static final Object DUMMY = new Object();

    public static void main(String[] args) throws Exception {
        Ignite ignite = Ignition.start("examples/config/example-ignite.xml");
        CacheConfiguration<String, Object> cacheCfg = new CacheConfiguration<>("setCache");
        cacheCfg.setAtomicityMode(TRANSACTIONAL);
        cacheCfg.setCacheMode(PARTITIONED);
        IgniteCache<String, Object> cache = ignite.getOrCreateCache(cacheCfg);
        System.out.println("Set size before init: " + cache.size());
        for (int i = 0; i < 10; i++) {
            cache.put(Integer.toString(i), DUMMY);
            System.out.println("Set elements: " + getKeys(cache));
        }
        System.out.println("Set size after init: " + cache.size());
    }

    static <T> List<T> getKeys(IgniteCache<T, ?> cache) {
        List<T> keys = new ArrayList<>(cache.size());
        for (Cache.Entry<T, ?> e : cache)
            keys.add(e.getKey());
        return keys;
    }
}

Why can't scrapy-plugins/scrapy-jsonrpc get the spider's stats?

I just want to monitor my running spider's stats. I got the latest scrapy-plugins/scrapy-jsonrpc and configured the spider as follows:
EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}
JSONRPC_ENABLED = True
JSONRPC_PORT = [60853]
but when I browse http://localhost:60853/, it just returns
{"resources": ["crawler"]}
and I can only get the running spiders' names, without the stats.
Can anyone tell me what I've set up wrong? Thanks!
http://localhost:60853/ returns the resources available, /crawler being the only top-level one.
If you want to get stats for a spider, you'll need to query the /crawler/stats endpoint and call get_stats().
Here's an example using python-jsonrpc: (here I configured the webservice to listen on localhost and port 6024)
>>> import pyjsonrpc
>>> http_client = pyjsonrpc.HttpClient('http://localhost:6024/crawler/stats')
>>> http_client.call('get_stats', 'httpbin')
{u'log_count/DEBUG': 4, u'scheduler/dequeued': 4, u'log_count/INFO': 9, u'downloader/response_count': 2, u'downloader/response_status_count/200': 2, u'log_count/WARNING': 1, u'scheduler/enqueued/memory': 4, u'downloader/response_bytes': 639, u'start_time': u'2016-09-28 08:49:57', u'scheduler/dequeued/memory': 4, u'scheduler/enqueued': 4, u'downloader/request_bytes': 862, u'response_received_count': 2, u'downloader/request_method_count/GET': 4, u'downloader/request_count': 4}
>>> http_client.call('get_stats')
{u'log_count/DEBUG': 4, u'scheduler/dequeued': 4, u'log_count/INFO': 9, u'downloader/response_count': 2, u'downloader/response_status_count/200': 2, u'log_count/WARNING': 1, u'scheduler/enqueued/memory': 4, u'downloader/response_bytes': 639, u'start_time': u'2016-09-28 08:49:57', u'scheduler/dequeued/memory': 4, u'scheduler/enqueued': 4, u'downloader/request_bytes': 862, u'response_received_count': 2, u'downloader/request_method_count/GET': 4, u'downloader/request_count': 4}
>>> from pprint import pprint
>>> pprint(http_client.call('get_stats'))
{u'downloader/request_bytes': 862,
u'downloader/request_count': 4,
u'downloader/request_method_count/GET': 4,
u'downloader/response_bytes': 639,
u'downloader/response_count': 2,
u'downloader/response_status_count/200': 2,
u'log_count/DEBUG': 4,
u'log_count/INFO': 9,
u'log_count/WARNING': 1,
u'response_received_count': 2,
u'scheduler/dequeued': 4,
u'scheduler/dequeued/memory': 4,
u'scheduler/enqueued': 4,
u'scheduler/enqueued/memory': 4,
u'start_time': u'2016-09-28 08:49:57'}
>>>
You can also use jsonrpc_client_call from scrapy_jsonrpc.jsonrpc.
>>> from scrapy_jsonrpc.jsonrpc import jsonrpc_client_call
>>> jsonrpc_client_call('http://localhost:6024/crawler/stats', 'get_stats', 'httpbin')
{u'log_count/DEBUG': 5, u'scheduler/dequeued': 4, u'log_count/INFO': 11, u'downloader/response_count': 3, u'downloader/response_status_count/200': 3, u'log_count/WARNING': 1, u'scheduler/enqueued/memory': 4, u'downloader/response_bytes': 870, u'start_time': u'2016-09-28 09:01:47', u'scheduler/dequeued/memory': 4, u'scheduler/enqueued': 4, u'downloader/request_bytes': 862, u'response_received_count': 3, u'downloader/request_method_count/GET': 4, u'downloader/request_count': 4}
This is what you get "on the wire" for a request made with a modified example-client.py (see code a bit below, the example in https://github.com/scrapy-plugins/scrapy-jsonrpc is outdated as I write these lines):
POST /crawler/stats HTTP/1.1
Accept-Encoding: identity
Content-Length: 73
Host: localhost:6024
Content-Type: application/x-www-form-urlencoded
Connection: close
User-Agent: Python-urllib/2.7
{"params": ["httpbin"], "jsonrpc": "2.0", "method": "get_stats", "id": 1}
And the response
HTTP/1.1 200 OK
Content-Length: 504
Access-Control-Allow-Headers: X-Requested-With
Server: TwistedWeb/16.4.1
Connection: close
Date: Tue, 27 Sep 2016 11:21:43 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PATCH, PUT, DELETE
Content-Type: application/json
{"jsonrpc": "2.0", "result": {"log_count/DEBUG": 5, "scheduler/dequeued": 4, "log_count/INFO": 11, "downloader/response_count": 3, "downloader/response_status_count/200": 3, "log_count/WARNING": 3, "scheduler/enqueued/memory": 4, "downloader/response_bytes": 870, "start_time": "2016-09-27 11:16:25", "scheduler/dequeued/memory": 4, "scheduler/enqueued": 4, "downloader/request_bytes": 862, "response_received_count": 3, "downloader/request_method_count/GET": 4, "downloader/request_count": 4}, "id": 1}
Here's the modified client to query /crawler/stats, which I called with ./example-client.py -H localhost -P 6024 get-spider-stats httpbin (for a running "httpbin" spider, JSONRPC_PORT being 6024 for me)
#!/usr/bin/env python
"""
Example script to control a Scrapy server using its JSON-RPC web service.

It only provides a reduced functionality as its main purpose is to illustrate
how to write a web service client. Feel free to improve or write your own.

Also, keep in mind that the JSON-RPC API is not stable. The recommended way for
controlling a Scrapy server is through the execution queue (see the "queue"
command).
"""
from __future__ import print_function

import sys, optparse, urllib, json
from six.moves.urllib.parse import urljoin

from scrapy_jsonrpc.jsonrpc import jsonrpc_client_call, JsonRpcError


def get_commands():
    return {
        'help': cmd_help,
        'stop': cmd_stop,
        'list-available': cmd_list_available,
        'list-running': cmd_list_running,
        'list-resources': cmd_list_resources,
        'get-global-stats': cmd_get_global_stats,
        'get-spider-stats': cmd_get_spider_stats,
    }


def cmd_help(args, opts):
    """help - list available commands"""
    print("Available commands:")
    for _, func in sorted(get_commands().items()):
        print("  ", func.__doc__)


def cmd_stop(args, opts):
    """stop <spider> - stop a running spider"""
    jsonrpc_call(opts, 'crawler/engine', 'close_spider', args[0])


def cmd_list_running(args, opts):
    """list-running - list running spiders"""
    for x in json_get(opts, 'crawler/engine/open_spiders'):
        print(x)


def cmd_list_available(args, opts):
    """list-available - list name of available spiders"""
    for x in jsonrpc_call(opts, 'crawler/spiders', 'list'):
        print(x)


def cmd_list_resources(args, opts):
    """list-resources - list available web service resources"""
    for x in json_get(opts, '')['resources']:
        print(x)


def cmd_get_spider_stats(args, opts):
    """get-spider-stats <spider> - get stats of a running spider"""
    stats = jsonrpc_call(opts, 'crawler/stats', 'get_stats', args[0])
    for name, value in stats.items():
        print("%-40s %s" % (name, value))


def cmd_get_global_stats(args, opts):
    """get-global-stats - get global stats"""
    stats = jsonrpc_call(opts, 'crawler/stats', 'get_stats')
    for name, value in stats.items():
        print("%-40s %s" % (name, value))


def get_wsurl(opts, path):
    return urljoin("http://%s:%s/" % (opts.host, opts.port), path)


def jsonrpc_call(opts, path, method, *args, **kwargs):
    url = get_wsurl(opts, path)
    return jsonrpc_client_call(url, method, *args, **kwargs)


def json_get(opts, path):
    url = get_wsurl(opts, path)
    return json.loads(urllib.urlopen(url).read())


def parse_opts():
    usage = "%prog [options] <command> [arg] ..."
    description = "Scrapy web service control script. Use '%prog help' " \
                  "to see the list of available commands."
    op = optparse.OptionParser(usage=usage, description=description)
    op.add_option("-H", dest="host", default="localhost",
                  help="Scrapy host to connect to")
    op.add_option("-P", dest="port", type="int", default=6080,
                  help="Scrapy port to connect to")
    opts, args = op.parse_args()
    if not args:
        op.print_help()
        sys.exit(2)
    cmdname, cmdargs, opts = args[0], args[1:], opts
    commands = get_commands()
    if cmdname not in commands:
        sys.stderr.write("Unknown command: %s\n\n" % cmdname)
        cmd_help(None, None)
        sys.exit(1)
    return commands[cmdname], cmdargs, opts


def main():
    cmd, args, opts = parse_opts()
    try:
        cmd(args, opts)
    except IndexError:
        print(cmd.__doc__)
    except JsonRpcError as e:
        print(str(e))
        if e.data:
            print("Server Traceback below:")
            print(e.data)


if __name__ == '__main__':
    main()
In the example command above, I got this:
log_count/DEBUG 5
scheduler/dequeued 4
log_count/INFO 11
downloader/response_count 3
downloader/response_status_count/200 3
log_count/WARNING 3
scheduler/enqueued/memory 4
downloader/response_bytes 870
start_time 2016-09-27 11:16:25
scheduler/dequeued/memory 4
scheduler/enqueued 4
downloader/request_bytes 862
response_received_count 3
downloader/request_method_count/GET 4
downloader/request_count 4