Fiware Entities and STH

I am using the Orion Context Broker, an IoT Agent and Cygnus to handle and persist the data of several devices into a MongoDB. It's working, but I don't know if I'm doing it the FIWARE way, because after reading the documentation I am still confused about a few things:
I don't completely understand the difference between an Entity and an IoT Entity (or device?). My guess is that it is a matter of how they provide context data and the nature of the entity being modelled, but I would be grateful if someone could clarify it. I am especially confused because the creation of each entity type is different (it seems that I can't initialize an IoT entity at creation time, which I can do with a regular Entity).
I can only persist the data of IoT Entities. Is it possible to have a Short Term History of a regular Entity?
I don't understand why the STH data repeats attributes that have not changed. If I have an IoT Entity with two attributes, 'a' and 'b', and I modify both of them, an STH entry is created for each one, which is fine. However, if I then change the value of attribute 'b', two more records are created: one for 'a' (which hasn't changed and still reflects the value it already had) and one for 'b'. Could someone explain this behaviour to me?

1. Entities vs IoT Entities
I assume that what you mean by an IoT entity is the entry made by the IoT Agent upon receiving a sensor reading from a provisioned device.
Logically there is no difference between an entity created and maintained by an IoT Agent and an entity created and maintained by any other service making NGSI requests to the context broker.
Your so-called IoT Entity is merely a construct where an IoT Agent does all the heavy lifting for you and converts data coming from a device in a proprietary format into the NGSI standard.
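To make the distinction concrete, here is a minimal sketch of the two creation paths (the host names, ports, service name, entity ids and attribute names are illustrative assumptions, with Orion assumed on localhost:1026 and the IoT Agent's North Port on its usual 4041). A regular entity is created directly in Orion with an NGSI v2 request, whereas an "IoT Entity" is the result of provisioning a device on the IoT Agent, which then creates and maintains the corresponding entity for you:
# Regular entity: created directly in Orion via NGSI v2
curl -iX POST 'http://localhost:1026/v2/entities' \
  -H 'Content-Type: application/json' \
  -H 'fiware-service: openiot' \
  -H 'fiware-servicepath: /' \
  -d '{
    "id": "urn:ngsi-ld:Motion:001",
    "type": "Motion",
    "count": { "type": "Integer", "value": 0 }
  }'

# "IoT Entity": provision a device on the IoT Agent instead;
# the agent creates the entity and updates it from device readings
curl -iX POST 'http://localhost:4041/iot/devices' \
  -H 'Content-Type: application/json' \
  -H 'fiware-service: openiot' \
  -H 'fiware-servicepath: /' \
  -d '{
    "devices": [{
      "device_id": "motion001",
      "entity_name": "urn:ngsi-ld:Motion:002",
      "entity_type": "Motion",
      "attributes": [{ "object_id": "c", "name": "count", "type": "Integer" }]
    }]
  }'
As you can see, the device provisioning payload only declares attribute mappings; the measured attributes typically only get values once the device starts sending data, which is why an IoT entity cannot be initialised the way a plain NGSI entity can.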
2. Short Term History of a regular Entity
To create a Short Term History you will need a separate Generic Enabler such as STH-Comet or QuantumLeap. Both of these enablers receive updates from Orion using the subscriptions mechanism. If you set your IoT data up under one fiware-service header and your non-IoT data under another, you can easily create a subscription that differentiates between the two.
e.g. the following subscription:
curl -iX POST \
'http://localhost:1026/v2/subscriptions/' \
-H 'Content-Type: application/json' \
-H 'fiware-service: iotdata' \
-H 'fiware-servicepath: /' \
-d '<body>'
will only apply to entities within the iotdata service, which would be created when you provision your IoT service.
3. Repeating attributes that have not changed.
The <body> of the subscription can be used to narrow down the conditions under which the historical data is persisted.
The entities and condition within the subject, and the attrs within the notification, are the important parts:
subject": {
"entities": [
{
"idPattern": "Motion.*"
}
],
"condition": {
"attrs": [
"count"
]
}
},
"notification": {
"http": {
"url": "http://quantumleap:8668/v2/notify"
},
"attrs": [
"count"
],
"metadata": ["dateCreated", "dateModified"]
},
"throttling": 1
}'
The subscription defined above will only fire when the count attribute changes, and it will only persist the count attribute. If you do not limit the notification attrs, then multiple rows will be persisted to the database. Similarly, if you do not limit the condition, then additional entries for count will be persisted whenever other attributes are updated.
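For illustration, with the subscription above Orion would send QuantumLeap a notification shaped roughly like the following whenever count changes (the subscriptionId and values here are made up), so only count ends up in the short term history:
{
  "subscriptionId": "xxxxxxxxxxxxxxxxxxxxxxxx",
  "data": [
    {
      "id": "Motion:001",
      "type": "Motion",
      "count": {
        "type": "Integer",
        "value": 7,
        "metadata": {
          "dateModified": { "type": "DateTime", "value": "2018-03-20T10:00:00.00Z" }
        }
      }
    }
  ]
}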

Related

Data Factory copy pipeline from API

We use an Azure Data Factory copy pipeline to transfer data from REST APIs to an Azure SQL Database, and it is doing some strange things. Because we loop over a set of APIs that need to be transferred, the mapping in the copy activity is left empty.
But for one API the automatic mapping goes wrong. The destination table is created with all the needed columns and correct data types based on the received metadata, yet when we run the pipeline for that specific API, the following message is shown.
{ "errorCode": "2200", "message": "ErrorCode=SchemaMappingFailedInHierarchicalToTabularStage,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to process hierarchical to tabular stage, error message: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks.\r\nParameter name: ticks,Source=Microsoft.DataTransfer.ClientLibrary,'", "failureType": "UserError", "target": "Copy data1", "details": [] }
As a test, we did the mapping for that API manually by using the "Import Schema" option on the Mapping page. There we see that all the fields are correctly mapped. We executed the pipeline again using that mapping and everything worked fine.
But of course, we don't want to use a manual mapping, because the pipeline is used in a loop for different APIs as well.

What if access control rule defined for participant/asset contradicts access control rule for transaction?

I have a question regarding access control.
Specifically, the question is about the relationship between access control rules defined for participants or assets on the one hand, and access control rules defined for the transactions that access those participants/assets on the other.
Here is an example:
Assume a Hyperledger Fabric network is used to create some kind of social network for employees of a company.
The following rule states that an employee has write access to his own data:
rule EmployeesHaveWriteAccessToTheirOwnData {
  description: "Allow employees write access to their own data"
  participant(p): "org.company.biznet.Employee"
  operation: UPDATE
  resource(r): "org.company.biznet.Employee"
  condition: (p.getIdentifier() == r.getIdentifier())
  action: ALLOW
}
Let's assume that the write access is facilitated through a transaction called "UpdateTransaction". Further assume that (maybe by accident) the action value of the access control rule for the transaction "UpdateTransaction" is set to DENY:
rule EmployeeCanSubmitTransactionsToUpdateData {
  description: "Allow employees to update their data"
  participant: "org.company.biznet.Employee"
  operation: CREATE
  resource: "org.company.biznet.UpdateTransaction"
  action: DENY
}
Now there is the following situation:
Each employee is (through rule 1) given the right to change his/her data.
At the same time employees are not allowed to submit the transaction "UpdateTransaction" to change the data (see rule 2).
Is it now impossible for employees to change their data? Or are employees still able to change their data without submitting the transaction "UpdateTransaction"?
Put differently: is there a way for participants to access data (for which they have access rights) without using any of the transactions defined in the .cto-file?
I think the answer is, it depends.
In your example, denying access to the org.company.biznet.UpdateTransaction transaction would result in org.company.biznet.Employee participants being unable to use that transaction to update their data, even though they would otherwise be allowed.
Having said that, you should keep the system transactions in mind since they provide another potential route for org.company.biznet.Employee participants to update their own data.
For example, I tried that out on the basic-sample-network by replacing the EverybodyCanSubmitTransactions rule with
rule NobodyCanSubmitTransactions {
  description: "Do not allow all participants to submit transactions"
  participant: "org.example.basic.SampleParticipant"
  operation: CREATE
  resource: "org.example.basic.SampleTransaction"
  action: DENY
}
That business network includes an OwnerHasFullAccessToTheirAssets rule, and I was still able to use the org.hyperledger.composer.system.UpdateAsset transaction to make updates for participants that owned an asset, using the command,
composer transaction submit -d "$(cat txn.json)" -c party1#basic-sample-network
Where txn.json contained,
{
  "$class": "org.hyperledger.composer.system.UpdateAsset",
  "resources": [
    {
      "$class": "org.example.basic.SampleAsset",
      "assetId": "ASSET1",
      "owner": "resource:org.example.basic.SampleParticipant#PARTY1",
      "value": "5000"
    }
  ],
  "targetRegistry": "resource:org.hyperledger.composer.system.AssetRegistry#org.example.basic.SampleAsset"
}
That wouldn't work if you had locked down the system namespace in your ACL rules though. (ACLs need a lot of thought!)
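For example, a rule along these lines would close that particular route by blocking participants from submitting the system UpdateAsset transaction directly. This is only a sketch: Composer evaluates ACL rules in order, so it would need to sit above any broader ALLOW rule for the system namespace, and you would still have to leave intact whatever system access the rest of the network needs:
rule ParticipantsCannotSubmitSystemUpdateAsset {
  description: "Sketch: block direct submission of the system UpdateAsset transaction"
  participant: "org.example.basic.SampleParticipant"
  operation: CREATE
  resource: "org.hyperledger.composer.system.UpdateAsset"
  action: DENY
}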
The other important thing to remember about ACLs is that they do not apply if you use the getNativeAPI method to access data via the Hyperledger Fabric APIs in your transaction processor functions.
Check out the system namespace reference along with the ACL reference, plus there is an ACL tutorial which may be of interest if you haven't seen it.

Properly Configuring Kafka Connect S3 Sink TimeBasedPartitioner

I am trying to use the TimeBasedPartitioner of the Confluent S3 sink. Here is my config:
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "file": "test.sink.txt",
    "topics": "xxxxx",
    "s3.region": "yyyyyy",
    "s3.bucket.name": "zzzzzzz",
    "s3.part.size": "5242880",
    "flush.size": "1000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
    "schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "timestamp.extractor": "Record",
    "timestamp.field": "local_timestamp",
    "path.format": "YYYY-MM-dd-HH",
    "partition.duration.ms": "3600000",
    "schema.compatibility": "NONE"
  }
}
The data is binary and I use an Avro schema for it. I want to use the actual record field local_timestamp, which is a UNIX timestamp, to partition the data, say into hourly files.
I start the connector with the usual REST API call
curl -X POST -H "Content-Type: application/json" --data @s3-config.json http://localhost:8083/connectors
Unfortunately, the data is not partitioned as I wish. I also tried removing flush.size because it might interfere, but then I got the error
{"error_code":400,"message":"Connector configuration is invalid and contains the following 1 error(s):\nMissing required configuration \"flush.size\" which has no default value.\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}%
Any idea how to properly set up the TimeBasedPartitioner? I could not find a working example.
Also, how can one debug such a problem or gain further insight into what the connector is actually doing?
I would greatly appreciate any help or further suggestions.
After studying the code at TimeBasedPartitioner.java and the logs with
confluent log connect tail -f
I realized that both timezone and locale are mandatory, although this is not specified as such in the Confluent S3 connector documentation. The following config fields solve the problem and let me upload the records to S3 buckets, properly partitioned:
"flush.size": "10000",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
"schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
"locale": "US",
"timezone": "UTC",
"partition.duration.ms": "3600000",
"timestamp.extractor": "RecordField",
"timestamp.field": "local_timestamp",
Note two more things: first, a value for flush.size is still necessary; files are eventually split into chunks containing no more than flush.size records. Second, the path.format is better chosen as displayed above, so that a proper tree structure is generated.
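To illustrate the resulting tree structure: assuming the connector's default topics.dir of topics and its default file naming of <topic>+<kafkaPartition>+<startOffset>, the objects end up under keys such as
topics/xxxxx/year=2018/month=03/day=01/hour=15/xxxxx+0+0000000000.avro
so each hour gets its own "directory" in the bucket.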
I am still not 100% sure whether the record field local_timestamp is really used to partition the records.
Any comments or improvements are greatly welcome.
Indeed your amended configuration seems correct.
Specifically, setting timestamp.extractor to RecordField allows you to partition your files based on a timestamp field of your records, which you identify via the property timestamp.field.
When one sets timestamp.extractor=Record instead, the time-based partitioner uses the Kafka timestamp of each record.
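Side by side, the two alternatives look like this in the connector config (local_timestamp being the field name from your schema):
Partition by a field inside the record value:
"timestamp.extractor": "RecordField",
"timestamp.field": "local_timestamp"
Partition by the Kafka record timestamp:
"timestamp.extractor": "Record"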
Regarding flush.size, setting this property to a high value (e.g. Integer.MAX_VALUE) is practically equivalent to ignoring it.
Finally, schema.generator.class is no longer required in the most recent versions of the connector.

How to fetch fieldname and structure for additional properties in operation, managedObject etc

I am trying to figure out which fragments are related to each of the following:
operation
managedObject
event
measurement
alarm
So, is there a way to get all these fragments?
Also, there are additional properties whose field name is defined as * and whose value can be an object or anything else (*). I have gone through the device management library and the sensor library in the Cumulocity documentation, but found that they do not contain all the possible fragments, and there is no clarity as to which object a fragment belongs in, i.e. does it go in the operation or the managedObject, or both?
Since every user, device and application can contribute such fragments, there is no "global list" of them that you could refer to. Normally a client (application or device) knows what data it sends or requests, so in most cases such a list is not needed anyway.
Regarding the relationship between operations and managed objects, there are some typical design patterns. Let's say you want to configure something in a device, like a polling interval:
"mydevice_Configuration": { "pollingRate": 60 }
What your application would do is to send that fragment as an operation to a device:
POST /devicecontrol/operations HTTP/1.1
...
{
  "deviceId": "12345",
  "mydevice_Configuration": { "pollingRate": 60 }
}
The device would accept the operation (http://cumulocity.com/guides/rest/device-integration/#step-6-finish-operations-and-subscribe) and change its configuration. When it does that successfully, it will update its managed object to contain the new configuration:
PUT /inventory/managedObjects/12345 HTTP/1.1
{
  "mydevice_Configuration": { "pollingRate": 60 }
}
This way, your inventory reflects as closely as possible the true state of devices.
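For completeness, the device would also typically mark the operation itself as executed, so that the operation's lifecycle matches the inventory. A sketch, where the operation id 78901 is made up and the usual authentication headers are omitted:
PUT /devicecontrol/operations/78901 HTTP/1.1
...
{
  "status": "SUCCESSFUL"
}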
Hope that helps ...

RESTful API: get/set one "global" single resource without id

TL;DR
In my current API, I've got two endpoints to handle the context:
GET /context/get
POST /context/set '{"id": "123"}'
What's the recommended way of having this global, id-less state accessible from RESTful API?
Please assume that the context concept can't be changed.
Background
Let's say I've got a user that is logged in. He's by default assigned to a context that he can change.
After the context change, all the subsequent API calls will return different data, according to the context.
Example:
// Backend
Context = "Poland"
then
$ curl -X GET http://api.myapp.com/cities
will respond:
{
  "cities": [{
    "id": "1",
    "name": "Warszawa"
  }, {
    "id": "2",
    "name": "Wrocław"
  }]
}
However, if you change the context:
// Backend
Context = "USA"
then, the same URL:
$ curl -X GET http://api.myapp.com/cities
should return the different set of data:
{
  "cities": [{
    "id": "3",
    "name": "New York City"
  }, {
    "id": "4",
    "name": "Boston"
  }]
}
Question
As the context is just a global state on the backend side, it doesn't have an id. It doesn't belong to any collection either. Still, I want it to be accessible in the API. There are three possible solutions I see:
Solution #1 - existing
Set a context
$ curl -X POST http://api.myapp.com/context/set '{"id": "123"}'
Get a context
$ curl -X GET http://api.myapp.com/context/get
This one doesn't really feel like a RESTful API, and on the frontend side I still have to mock the id (using ember-data). Also, the resource name is singular instead of plural.
Solution #2 - mocking the id
Set a context
$ curl -X POST http://api.myapp.com/context/1 '{"contextId": "123"}'
Get a context
$ curl -X GET http://api.myapp.com/context/1
Here I mock the id to always equal 1, but I feel that it's super hacky and certainly not self-explanatory... Moreover, I've got a name conflict: id vs contextId. And the resource name is singular instead of plural.
Solution #3 - actions
Set a context
$ curl -X POST http://api.myapp.com/context/actions/set '{"id": "123"}'
Get a context
$ curl -X GET http://api.myapp.com/context/actions/get
This is very similar to the first one, but using actions that could be a part of my whole API design (taken from e.g. GoCardless). Still, I'll have a problem modelling it nicely on the frontend side. And the resource name is singular instead of plural again.
Is there any #4 option? How should I address this problem?
Thanks!
Your three solutions are RPC, not REST. Not only are they not stateless, but setting a resource via some other resource by sending an id is very RPC-ish.
A RESTful solution, if you really want to go that way, is to set the context in a header. The client should send a header like X-ContextId or something like that, and you determine the request context you need from that.
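For example (a sketch reusing the cities endpoint from the question; X-ContextId is just an illustrative header name):
$ curl -X GET http://api.myapp.com/cities -H 'X-ContextId: 123'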
However, don't worry too much about being RESTful if that's not what your application requires. I recommend reading the answer here: SOAP vs REST (differences)
What's the recommended way of having this global, id-less state accessible from RESTful API?
A RESTful API is by definition stateless; no client context should be stored on the server between requests.
If you want your API to be RESTful, you'll have to pass this id with each request.