How to model storage capacity in BPMN?

I am currently modeling a warehouse with import and export processes. My problem is that I do not know how to model the capacity of the different storage places in the warehouse. There are processes in which vehicles arrive with different loads, and all of the goods need to be stored in a warehouse with limited capacity. Otherwise the arriving goods have to be declined.
I am modeling this process in a BPM Suite and was thinking about using Python to approach this problem. I thought that I could simply use variables and if clauses to check the capacity of each storage place. But when I simulate the process with this approach, the variables are re-instantiated with their start values each time and do not hold the actual value, because the script is included in the model as a script task.
Does anyone have other ideas for modeling capacity in BPMN?

Have you considered not using BPMN, as it clearly adds more complexity than benefit in your case? Look at Cadence Workflow, which lets you specify orchestration logic in normal code and would support your requirements directly without any ugly workarounds.
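As a rough illustration of the capacity check from the question written as ordinary code, the sketch below keeps the fill level of each storage place in an object that outlives a single delivery, which is exactly what a script task that is re-initialized on every run cannot do. It does not use any Cadence API; the class `Warehouse`, the method `try_store`, and the storage names are hypothetical, and in a real engine the state would live in a process variable, a database, or the workflow state rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Tracks used capacity per storage place across many deliveries."""
    capacity: dict[str, int]                          # max units per storage place
    used: dict[str, int] = field(default_factory=dict)

    def try_store(self, place: str, units: int) -> bool:
        """Store the load if it fits; otherwise decline it and change nothing."""
        current = self.used.get(place, 0)
        if current + units > self.capacity[place]:
            return False                              # declined: not enough room
        self.used[place] = current + units
        return True


# Hypothetical storage places and loads, for illustration only.
warehouse = Warehouse(capacity={"A": 100, "B": 50})
results = [
    warehouse.try_store("A", 60),   # fits
    warehouse.try_store("A", 50),   # would exceed 100, so it is declined
    warehouse.try_store("B", 50),   # fills B exactly
]
```

The point of the sketch is only that the counter is updated in one place and persists between arrivals; the gateway "store or decline" then reduces to the boolean returned by `try_store`.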


How to implement a dual write system for SQL database and Elastic Search

I did my own research and found that there are several ways to do this, but the most accurate is Change Data Capture. However, I don't see its benefits compared to the asynchronous method, for example:
Synchronous double-write: Elasticsearch is updated synchronously when
the DB is updated. This technical solution is the simplest, but it
faces the largest number of problems, including data conflicts, data
overwriting, and data loss. Make your choice carefully.
Asynchronous double-write: When the DB is updated, an MQ is recorded and used to
notify the consumer. This allows the consumer to backward query DB
data so that the data is ultimately updated to Elasticsearch. This
technical solution is highly coupled with business systems. Therefore,
you need to compile programs specific to the requirements of each
business. As a result, rapid response is not possible.
Change Data Capture (CDC): Change data is captured from the DB, pushed to an
intermediate program, and synchronously pushed to Elasticsearch by
using the logic of the intermediate program. Based on the CDC
mechanism, accurate data is returned at an extremely fast speed in
response to queries. This solution is less coupled to application
programs. Therefore, it can be abstracted and separated from business
systems, making it suitable for large-scale use. This is illustrated
in the following figure.
Another article said that the asynchronous approach is also risky if one data source is down, because we cannot easily roll back.
So my question is: should I use CDC to perform persistence operations for multiple data sources? Why is CDC better than the asynchronous approach, given that both are based on the same principle?
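To make the difference more concrete, here is a minimal sketch under stated assumptions. Real CDC tools read the database's own transaction log; the sketch approximates that with a hypothetical `change_log` table written in the same transaction as the business row (closer to what is usually called a transactional outbox). A plain dict named `search_index` stands in for Elasticsearch, and the table and function names are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE change_log ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, product_id INTEGER, name TEXT)"
)

search_index = {}   # stand-in for Elasticsearch (assumption)
last_applied = 0    # log position the relay has already processed


def save_product(product_id, name):
    """Write the row and its change record in ONE transaction."""
    with db:  # commits both statements, or rolls both back on error
        db.execute(
            "INSERT OR REPLACE INTO product (id, name) VALUES (?, ?)",
            (product_id, name),
        )
        db.execute(
            "INSERT INTO change_log (product_id, name) VALUES (?, ?)",
            (product_id, name),
        )


def relay_changes():
    """Replay unapplied changes, in commit order, into the search index."""
    global last_applied
    rows = db.execute(
        "SELECT seq, product_id, name FROM change_log WHERE seq > ? ORDER BY seq",
        (last_applied,),
    ).fetchall()
    for seq, product_id, name in rows:
        search_index[product_id] = name
        last_applied = seq
    return len(rows)


save_product(1, "chair")
save_product(1, "armchair")
save_product(2, "table")
applied = relay_changes()
```

Because the application only writes to the database, an outage of the search side cannot leave the two stores disagreeing permanently: the relay simply resumes from `last_applied` once the index is reachable again, and changes are applied in the order they were committed.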

Data Consolidation for ETL pipeline

I am currently planning to move some data sources to one place for later analysis.
Currently I have many data sources (databases) such as:
Cassandra will be used for analytics in a big data pipeline. What is the best way to migrate any source to a Cassandra cluster?
I would highly recommend using NiFi for this use case. Some of the benefits that I can outline right away:
Inbuilt "Processors" available for reading the data from all listed data sources and writing to Cassandra.
Very high throughput with low latency.
Rapid data acquisition pipeline development without writing a lot of code.
Ability to do "Change Data Capture" very easily later in your project, if needed.
Provides a highly concurrent model without a developer having to worry about the typical complexities of concurrency.
Is inherently asynchronous, which allows for very high throughput and natural buffering even as processing and flow rates fluctuate.
The resource-constrained connections make critical functions such as back-pressure and pressure release very natural and intuitive.
The points at which data enters and exits the system, as well as how it flows through, are well understood and easily tracked.
And biggest of all, OPEN SOURCE.
You can refer to the Apache NiFi homepage for more information.
Hope that helps!

Data model design guidelines with GEODE

We are soon going to start a project with GEODE involving reference data, and I would like some guidelines for it.
As you know, in the financial reference data world there are complex relationships between reference data entities such as Instrument, Account, and Client, which may be stored in a database in 3NF.
If my queries are mostly read-intensive and require joins across tables (2-5 tables), what is the best way to handle them in an in-memory data grid?
Case 1:
Separate regions for all tables in your database and then do a similar join using OQL as you do in database?
Even if you do so, you have to design it with great care so that related entities are always co-located within the same partition.
Model 1-to-many and many-to-many relationships using an object graph?
Case 2:
If you know what your join queries look like, create a view model per join query that has equi-join characteristics.
(1) I have 1 join query requiring Employee, Department using emp.deptId = dept.deptId [OK fantastic, 1 region with such a view model exists]
(2) I have another join query requiring Employee, Department, Salary, and Address joins to address a different requirement
So again I have to create a view model to address (2), which will contain similar Employee and Department data as (1). This may soon reach the memory threshold.
Changes in the database can still be managed by event listeners, but what are the recommendations for that?
I think your general question is pretty broad, and there isn't just one recommended approach that covers all use cases (primarily all the analytical views/models of your data required by your application(s)).
Such questions involve many factors, such as the size of individual data elements, the volume of data, the frequency of access or access patterns originating from the application or applications, the timely delivery of information, how accurate the data needs to be, the size of your cluster, the physical resources of each (virtual) machine, and so on. Thus, any given approach will undoubtedly require application tuning, tuning GemFire accordingly and JVM tuning regardless of your data model. Still, a carefully crafted data model can determine the extent of such tuning.
In GemFire specifically, such tuning will involve different configuration such as, but not limited to: data management policies, eviction (Overflow) and expiration (LRU, or perhaps custom) settings along with different eviction/expiration thresholds, maybe storing data in Off-Heap memory, employing different partition strategies (PartitionResolver), and so on and so forth.
For example, if your Address information is relatively static, unchanging (i.e. actual "reference" data) then you might consider storing Address data in a REPLICATE Region. Data that is written to frequently (typically "transactional" data) is better off in a PARTITION Region.
Of course, as you know, any PARTITION data (managed in separate Regions) you "join" in a query (using OQL) must be collocated. GemFire/Geode does not currently support distributed joins.
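The collocation requirement can be pictured with a small, purely conceptual sketch. It is not Geode/GemFire API code: the node count, the routing function `node_for`, and the two in-memory "regions" are assumptions made for illustration. The idea is that Employee and Department entries are both routed by department id, so every join can be answered from data held on a single node.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(dept_id: int) -> int:
    """Route by department id so related entries land on the same node."""
    return dept_id % NUM_NODES


# Each node holds its own slice of both "regions".
departments = defaultdict(dict)  # node -> {dept_id: dept_name}
employees = defaultdict(list)    # node -> [(emp_name, dept_id)]


def put_department(dept_id, name):
    departments[node_for(dept_id)][dept_id] = name


def put_employee(emp_name, dept_id):
    # Routed by dept_id, NOT by employee id, to stay next to its department.
    employees[node_for(dept_id)].append((emp_name, dept_id))


def local_join(node):
    """Join employees to departments using only data held on one node."""
    return [(emp, departments[node][dept_id]) for emp, dept_id in employees[node]]


put_department(10, "Sales")
put_department(11, "Ops")
put_employee("Ann", 10)
put_employee("Bob", 11)
put_employee("Cy", 10)

# The full result is just the union of the per-node joins.
joined = sorted(pair for node in range(NUM_NODES) for pair in local_join(node))
```

If employees were routed by their own id instead, `local_join` would hit missing departments, which is the situation a distributed join would have to solve and the reason the routing key has to be chosen with the join in mind.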
Additionally, certain nodes could host certain Regions, thus dividing your cluster into "transactional" vs. "analytical" nodes, where the analytical-based nodes are updated from CacheListeners on Regions in transactional nodes (be careful of this), or perhaps better yet, asynchronously using an AEQ with AsyncEventListeners. AEQs can be separately made highly available and durable as well. This transactional vs analytical approach is the basis for CQRS.
The size of your data is also impacted by the form in which it is stored, i.e. serialized vs. not serialized, and GemFire's proprietary serialization format (PDX) is quite optimal compared with Java Serialization. It all depends on how "portable" your data needs to be and whether you can keep your data in serialized form.
Also, you might consider how expensive it is to join the data on-the-fly. Meaning, if you are able to aggregate, transform and enrich data at runtime relatively cheaply (compute vs. memory/storage), then you might consider using GemFire's Function Execution service, bringing your logic to the data rather than the data to your logic (the fundamental basis of MapReduce).
You should know, and I am sure you are aware, GemFire is a Key-Value store, therefore mapping a complex object graph into separate Regions is not a trivial problem. Dividing objects up by references (especially many-to-many) and knowing exactly when to eagerly vs. lazily load them is an overloaded problem, especially in a distributed, replicated data store such as GemFire where consistency and availability tradeoffs exist.
There are different APIs and frameworks to simplify persistence and querying with GemFire. One of the more notable approaches is Spring Data GemFire's extension of Spring Data Commons Repository abstraction.
It also might be a matter of using the right data model for the job. If you have very complex data relationships, then perhaps creating analytical models using a graph database (such as Neo4j) would be a simpler option. Spring also provides great support for Neo4j, led by the Neo4j team.
No doubt any design choice you make will involve a hybrid approach. Often the path is not clear, since it really "depends" (i.e. on the application and data access patterns, load, all that).
But one thing is for certain: make sure you have a good knowledge and understanding of the underlying data store and its data management capabilities, particularly as it pertains to consistency and availability, beginning with this.
Note, there is also a GemFire slack channel as well as an Apache DEV mailing list you can use to reach out to the GemFire experts and community of (advanced) GemFire/Geode users if you have more specific problems as you proceed down this architectural design path.

RDBMS or NoSQL for complex schedule data with loops and ordered actions?

I'm trying to figure out how best to store (and model) the scheduling component of an automation system. I need to store schedules on which things will be executed (lights, pumps, etc.). I'm not trying to manage the state of the schedule as it executes, yet. For now I'm interested in how to store schedules and let the user work with schedule creation.
I have not determined what to use to store the data (RDBMS, MongoDB, Cassandra, etc.). I think I'd prefer an RDBMS because much of the other data is relational-like and I want to take advantage of joins and transactions and be confident of the data resiliency. I'm aware that RDBMSs are not the only game in town that gives me those, but the majority of my experience is with SQL and, recently, a MongoDB project. That said, I'm finding it difficult to describe the model in terms of tables and relationships. Something like the below is what I have so far.
What would be the best storage method, and second to that, any tips on a workable model would be appreciated.
Some needs
loops of actions
order of execution
transactions a big plus
Example Schedule
1) [step1] [action] room 1 lights on
2) [step2] [loop] 5x
2.1 [step2.1] [action] tv on (ignoring duration of steps)
2.2 [step2.2] [action] tv off
3) [step3] [action] pool pump on
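One way to picture the example schedule above, independent of the storage choice, is as a small tree in which a step is either an action or a loop containing further steps. The sketch below is only an assumption about how such a model could look; the names `Action`, `Loop`, and `expand` are hypothetical. In a relational store the same shape could be kept in a single step table with a parent reference and a position column, which is also an assumption rather than something stated in the question.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Action:
    name: str


@dataclass
class Loop:
    times: int
    steps: list["Step"]   # nested steps, kept in execution order


Step = Union[Action, Loop]


def expand(steps):
    """Yield action names in execution order, unrolling loops."""
    for step in steps:
        if isinstance(step, Action):
            yield step.name
        else:
            for _ in range(step.times):
                yield from expand(step.steps)


# The example schedule: step1, a 5x loop of tv on/off, then step3.
schedule = [
    Action("room 1 lights on"),
    Loop(times=5, steps=[Action("tv on"), Action("tv off")]),
    Action("pool pump on"),
]
order = list(expand(schedule))
```

Ordering comes from list position and loops from nesting, so both of the stated needs are covered without storing the unrolled sequence.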
In any reasonably complex automation system I think you'll find that you really need a state machine and a way to store state.
Storing state is fairly easy; storing the state machine and executing it is not so easy. One approach is to define the state machine in code and have it calculate the next time it needs to execute, you then put a record in the database for that next execution time. Your main loop then simply pulls events from the database in order, waits until it has one due for execution, loads the appropriate state machine and executes it. During execution the state machine can change the state of any variable (including external devices on/off state) and can re-schedule itself for execution one or more times in the event list.
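The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the linked posts: an in-memory heap stands in for the database table of due events, the event time doubles as the clock instead of actually waiting, and the `Blinker` state machine and device names are hypothetical.

```python
import heapq
import itertools


class Blinker:
    """Tiny state machine: toggles a device and re-schedules itself."""

    def __init__(self, device, interval, repeats):
        self.device = device
        self.interval = interval
        self.remaining = repeats
        self.on = False

    def run(self, now):
        """Execute one step; return the next due time, or None when finished."""
        self.on = not self.on
        self.remaining -= 1
        return now + self.interval if self.remaining > 0 else None


events = []                    # stand-in for the table of pending events
counter = itertools.count()    # tie-breaker so machines are never compared


def schedule(due, machine):
    heapq.heappush(events, (due, next(counter), machine))


def main_loop():
    log = []
    while events:
        due, _, machine = heapq.heappop(events)   # earliest event first
        # A real loop would wait until `due`; the sketch treats it as "now".
        next_due = machine.run(due)
        log.append((due, machine.device, machine.on))
        if next_due is not None:
            schedule(next_due, machine)           # machine re-schedules itself
    return log


schedule(0, Blinker("tv", interval=10, repeats=4))
schedule(5, Blinker("pool pump", interval=100, repeats=1))
log = main_loop()
```

The main loop knows nothing about lights or pumps; all behavior lives in the state machines, which is what makes it possible to add conditional or history-based logic later without touching the scheduler.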
I linked a couple of blog posts describing the way my automation system works. My requirements are a lot more complex than simple timing loops; they involve conditional logic, logic based on the history of what has happened previously, and suchlike, but I suspect that sooner or later you will want to move beyond simple loops.

What are ASO and BSO, and what is the advantage of using them?

What are ASO and BSO, and what is the difference between aggregate storage and block storage?
When should aggregate storage be used, and when block storage?
Oracle Answers
In short: if you have a very sparse cube and users do not need to update values in cells, use ASO.
A very fundamental and frequent question which appears in all Essbase interviews is what the difference is between ASO and BSO applications.
Here are a few differences between ASO and BSO.
The Essbase system has two distinct storage options, Aggregate Storage Option (ASO) and Block Storage Option (BSO), and each one has its own unique significance.
Characteristics of ASO:
High dimensionality.
No Calculation scripts.
Only one database can be created under one application.
Application names must follow the naming conventions: they must not be metadata, temp, log, or default.
Dynamic time series and Time balance properties are not available.
If the dimension build process adds a new member, the data is erased; otherwise the data is retained.
Only one type of partition is available (Transparent).
There is no concept of Sparse and Dense dimensions.
No Boolean attribute tag.
Only the store data, never share, and label only data storage properties are available.
Characteristics of BSO:
Fewer dimensions, but they reflect the business model.
Special functionalities for Accounts and Time dimensions, such as Dynamic time series, Time balance, and Variance reporting.
3 types of partitions: Replicated, Transparent, Linked.
Currency conversion is possible.
There is no restriction on the number of databases under one application, but additional databases cost performance.
Complex calculations can be achieved using calc scripts.
In ASO we can load data only at level 0, whereas in BSO we can load data at any level...
I know BSO a little, so I want to talk about ASO.
ASO (App Store Optimization) is the process of optimizing mobile apps to rank higher in an app store’s search results. The higher your app ranks in an app store’s search results, the more visible it is to potential customers. That increased visibility tends to translate into more traffic to your app’s page in the app store.