Performance issues with large datasets - resolvejs

Is there any way of filtering the events in a projection associated with a read model by the aggregateId?
In the tests carried out we always receive all registered events. Is it possible to apply filters in a previous stage?
We have 100,000 aggregateIds, and each id has 15,000 associated events. Since we are unable to filter by aggregateId, our projections have to iterate over all events.

So you have 100,000 aggregates with 15,000 events each.
You can use ReadModel or ViewModel:
Read Model:
A read model can be seen as a read database for your app. So if you want to store some data about each aggregate, you should insert/update a row or entry in some table for each aggregate; see the Hacker News example read model code.
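For illustration, here is a minimal read-model projection sketch in that spirit. The table name, field names, and event type are made up for this example, and the store API follows the shape used in the Hacker News example; check the docs for your reSolve version:
export default {
  Init: async store => {
    // one row per aggregate, keyed by aggregateId (hypothetical table/fields)
    await store.defineTable('AggregateSummaries', {
      indexes: { id: 'string' },
      fields: ['eventCount']
    })
  },
  // hypothetical event type; reSolve delivers each event to this handler once,
  // so the projection never has to scan all events per aggregate at query time
  SOMETHING_HAPPENED: async (store, event) => {
    await store.update(
      'AggregateSummaries',
      { id: event.aggregateId },
      { $inc: { eventCount: 1 } },
      { upsert: true }
    )
  }
}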
It is important to understand that reSolve read models are built on demand, on the first query. If you have a lot of events, this may take some time.
Another thing to consider: a newly created reSolve app is configured to use an in-memory database for read models, so it will be rebuilt on each app start.
If you have a lot of events and don't want to wait for read models to rebuild each time you start the app, you have to configure a real database storage for your read models.
Configuring adapters is not well documented; we'll fix this. Here is what you need to write in the relevant config file for MongoDB:
readModelAdapters: [
  {
    name: 'default',
    module: 'resolve-readmodel-mongo',
    options: {
      url: 'mongodb://127.0.0.1:27017/MyDatabaseName'
    }
  }
]
Since you have a database engine, you can use it for an event store too:
storageAdapter: {
  module: 'resolve-storage-mongo',
  options: {
    url: 'mongodb://127.0.0.1:27017/MyDatabaseName',
    collectionName: 'Events'
  }
}
View Model:
A view model is built on the fly during the query. It does not require storage, but it reads all events for the given aggregateId.
reSolve view models use snapshots. So if you have 15,000 events for a given aggregate, then on the first request all of those events will be applied to calculate the view's state for the first time. After this, that state is saved, and all subsequent requests read the snapshot plus any later events. By default a snapshot is taken every 100 events, so on the second query reSolve would read the snapshot for this view model and apply no more than 100 events to it.
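As a rough sketch, a view model projection is just a reducer keyed by event type; the event name and payload shape below are made up:
export default {
  // initial state for a single aggregate's view model
  Init: () => ({ items: [] }),
  // hypothetical event type; events are folded into state in order
  ITEM_ADDED: (state, { payload }) => ({
    ...state,
    items: [...state.items, payload.item]
  })
}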
Again, keep in mind that if you want snapshot storage to be persistent, you should configure a snapshot adapter:
snapshotAdapter: {
  module: 'resolve-snapshot-lite',
  options: {
    pathToFile: 'path/to/file',
    bucketSize: 100
  }
}
A view model has one more benefit: if you use the resolve-redux middleware on the client, it will be kept up to date there, reactively applying the events the app receives via WebSockets.

Related

Use mobx or redux with repository pattern and persistent local storage (realm or sqlite)?

Mobx and Redux will normally not persist any data. They will maintain a temporary global state while the app is running.
I know there are redux-persist and mobx-persist packages within both communities. But unfortunately these persisting solutions do not seem good at all. They only stringify or serialize a global state tree and persist it using some sort of key-value storage. Right?
The problem:
When such an app is opened again, the stringified store will be parsed and structured back into its original data structure (JSON, for instance) and then fully loaded into RAM. Am I right?
If yes, this is a problem. It is not good to always have a full "database", aka "global state", loaded in memory. It will probably never be faster to filter data within a long array in my global state... compared to querying a table on SQLite, right?
I have been looking for some repository-like solution for persisting global state for either redux or mobx. I am yearning for some solution for persisting and querying data on some well-known mobile database like SQLite or others.
Any answers will be very much appreciated.
Indeed, you can use the repository pattern.
In your repository, you may have a save method:
save(group: GroupLocalStorageModel): Promise<boolean> {
  // persist the entity inside a Realm write transaction
  this._localStorage.write(() => {
    this._localStorage.create<GroupLocalStorageModel>("Group", group);
  });
  return Promise.resolve(true);
}
This method will literally save your entity to whatever local storage you set up. In the example above, we are saving a group object into a Group collection; collections are like tables. We are using Realm, which is NoSQL.
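The store example below also calls getAllPaginated on the repository; here is one possible sketch of it. The page size is an arbitrary assumption, and it relies on Realm's objects() query and indexed access on the lazy results:
getAllPaginated(page: number, pageSize: number = 20): Promise<GroupLocalStorageModel[]> {
  // Realm results are lazy; copy out only the requested window
  const all = this._localStorage.objects<GroupLocalStorageModel>("Group");
  const start = page * pageSize;
  const items: GroupLocalStorageModel[] = [];
  for (let i = start; i < Math.min(start + pageSize, all.length); i++) {
    items.push(all[i]);
  }
  return Promise.resolve(items);
}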
Once you have your repository, if you are using either redux or mobx, you will probably call your save method from an action. Both redux and mobx work with actions, right?
export const GroupStoreModel = types
  .model("GroupStore")
  .props({
    groups: types.optional(types.array(GroupModel), []),
  })
  .extend(withEnvironment)
  .actions((self) => ({
    _addGroupToStore(group: GroupLocalStorageModel) {
      self.groups.push(group)
    },
    _deleteAllFromStore() {
      self.groups.clear()
    },
    _addGroupsToStoreBatch(groups: GroupLocalStorageModel[]) {
      // push mutates the observable array; concat would only return a new copy
      self.groups.push(...groups)
    },
  }))
  /* Async actions */
  .actions((self) => {
    const groupRepository = self.environment.groupRepository
    return {
      addGroup(group: GroupLocalStorageModel) {
        groupRepository.save(group).then(() => self._addGroupToStore(group))
      },
      getAllGroupsPaginated(page: number) {
        groupRepository.getAllPaginated(page).then((groups) => self._addGroupsToStoreBatch(groups))
      },
      deleteAll() {
        groupRepository.deleteAll()
        self._deleteAllFromStore()
      },
    }
  })
In this example, we are using mobx-state-tree. The addGroup action first updates our database and then also updates the global state.
We still want to use our global state so our views are re-rendered automatically, via connect for redux or observer for mobx.
See more information in this repository:
https://github.com/Hadajung/poc-react-native-database-example
AFAIK, there are two options for using SQLite with redux-persist.
redux-persist-sqlite-storage: In the maintainer's own words:
By default redux-persist uses AsyncStorage as the storage engine in react-native. This is a drop-in replacement for AsyncStorage.
The library is inspired by react-native-sqlite-storage.
Please remember that to use this, you need to install an additional package, react-native-sqlite-storage. (A configuration sketch appears after this list of options.)
redux-persist-sqlite: In the maintainer's own words:
A redux-persist storage adapter that writes to sqlite.
This is adapted from https://github.com/prsn/redux-persist-sqlite-storage, but uses Node.js sqlite3 rather than react-native.
Great for Electron apps that are backed by Redux.
UPDATE: react-native-mmkv: This is built on MMKV, developed by WeChat. As it says in its about section:
An extremely fast key/value storage library for React Native. ~30x faster than AsyncStorage!
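For the first option, here is a hedged configuration sketch of wiring redux-persist to the SQLite-backed storage engine. The SQLiteStorage factory call follows the package's README, but verify the exact signature against the version you install:
import { createStore } from 'redux';
import { persistStore, persistReducer } from 'redux-persist';
import SQLite from 'react-native-sqlite-storage';
import SQLiteStorage from 'redux-persist-sqlite-storage';
import rootReducer from './reducers'; // your app's root reducer (placeholder path)

// storage engine backed by SQLite instead of AsyncStorage
const storage = SQLiteStorage(SQLite);

const persistedReducer = persistReducer({ key: 'root', storage }, rootReducer);

export const store = createStore(persistedReducer);
export const persistor = persistStore(store);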
I'm not really sure what you need, but if I understood you correctly, you need to persist large amounts of data and also load that same data, but only in batches.
I believe that this kind of problem can be solved with a repository pattern and SOLID design principles.
You will need:
store class (mobx store) that holds your business logic.
repository class which is responsible for retrieving and persisting data.
The store gets the repository injected into it via the constructor.
Then when you call the initialize method on your store, it talks to the repository and retrieves the initial data. Now, initial data can be only a subset of all the data that is persisted. And you can implement some kind of paging on the store and repository, to retrieve data in batches as needed. Later you can call other methods to load and save additional data as needed.
pseudo code:
class Repository {
  initialize()   // load the first batch
  load(count)    // load the next `count` models
  save(data)
}
class Store {
  constructor(repository)
  initialize() {
    repository.initialize()
  }
  load() {
    repository.load()
  }
  save() {
    repository.save()
  }
}
Now your application data shouldn't be one giant object; rather, it should consist of multiple stores, where each store is responsible for a part of the data. For example, you would have one store and repository for handling todos, and another pair that handles address book contacts, etc.
Addendum:
The reason the repository is injected into the store is so you could easily swap it for some other implementation (the store doesn't care how the data is persisted and retrieved) and also, unit testing is very easy.
You could also have a root store that would hold all other stores, so in essence you have your complete state in one place. So if you call serialize on the root store, it serializes all stores and returns one big object.
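A tiny sketch of that root-store idea (the store names are made up):
// Hypothetical root store that aggregates the feature stores.
class RootStore {
  constructor(todoStore, contactStore) {
    this.todoStore = todoStore;
    this.contactStore = contactStore;
  }
  serialize() {
    // each sub-store serializes only its own slice of the state
    return {
      todos: this.todoStore.serialize(),
      contacts: this.contactStore.serialize(),
    };
  }
}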
I think the best solution would be a hydrated bloc or hydrated cubit from the hydrated_bloc package (part of the flutter_bloc family).
https://pub.dev/packages/hydrated_bloc
In the background it uses Hive DB, a very performant DB, and only keys are stored in memory, so it should not add huge bloat to the app like SQLite.
If you can put all your app logic in blocs/cubits, then the extra DB calls would be irrelevant.

Bulk insert of devices / measurements

Do you plan to allow the creation of multiple objects in a single call? For example, currently if I want to create 50 devices (by import), I need to call the API 50 times.
I think this loads the server more unnecessarily than if all objects were contained in the same call.
For a project we don't want to communicate the measurements in real time (every second) but rather postpone their storage in Cumulocity. So potentially we need to create ~4,000 measurements at once every hour. Is this approach realistic?
Sure, there's no problem with this approach. It also permits you to optimise your mobile bandwidth if you send the data over a mobile data channel. POST a measurement collection instead of a single measurement, i.e., use
Content-Type: application/vnd.com.nsn.cumulocity.measurementCollection+json
and in the body, use
{ "measurements": [ { ... first measurement ... }, { ... second measurement ... }, ... ] }
If you plan to create a large number of measurements at the same time and on a regular basis on our public production system, we would appreciate advance notice for capacity provisioning.
There's currently no bulk API for creating multiple managed objects in the same call. It's not been a bottleneck for our customers in practical roll-out scenarios.
However, there's an API for bulk registration of devices. Maybe that helps? It's used by the upload button on the device registration page, and is described here: https://cumulocity.com/guides/reference/device-credentials/ ("Bulk device credentials")
Cheers,
André

Flux without data caching?

Almost all examples of Flux involve caching data on the client side; however, I don't think I would be able to do this for much of my application.
In the system where I am thinking about using React/Flux, a single user can have hundreds of thousands of records of the main piece of data we store (and one record probably has at least 75 data properties). Caching this much data on the client side seems like a bad idea and would probably make things more complex.
If I were not using Flux, I would just have an ORM-like system that talks to a REST API, in which case a request like userRepository.getById(123) would always hit the API regardless of whether I had requested that data on a previous page. My idea is to just have the store expose these methods.
Does Flux consider it bad if a request for data always hits the API and never pulls from a local cache? Can I use Flux in a way where the majority of data retrieval requests always hit an API?
The closest you can sanely get to no caching is to reset any store state to null or [] when an action requesting new data comes in. If you do this you must emit a change event, or else you invite race conditions.
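A minimal sketch of that reset-on-request approach with a classic Flux store (the action type names and the dispatcher module are made up):
import { EventEmitter } from 'events';
import dispatcher from './dispatcher'; // assumed app-wide Flux dispatcher instance

let users = null; // no long-lived cache; holds only the latest response

const userStore = Object.assign(new EventEmitter(), {
  getUsers() { return users; }
});

dispatcher.register(action => {
  switch (action.type) {
    case 'USERS_REQUESTED':
      users = null;                // throw away stale data immediately...
      userStore.emit('change');    // ...and emit change to avoid race conditions
      break;
    case 'USERS_RECEIVED':
      users = action.users;
      userStore.emit('change');
      break;
  }
});

export default userStore;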
As an alternative to flux, you can simply use promises and a simple mixin with an api to modify state. For example, with bluebird:
var promiseStateMixin = {
  thenSetState: function(updates, initialUpdates){
    // promisify setState
    var setState = this.setState.bind(this);
    var setStateP = function(changes){
      return new Promise(function(resolve){
        setState(changes, resolve);
      });
    };
    // if we have initial updates, apply them and ensure the state change happens
    return Promise.resolve(initialUpdates ? setStateP(initialUpdates) : null)
      // wait for our main updates (an object whose values may be promises) to resolve
      .then(function(){
        return Promise.props(updates);
      })
      // apply our unwrapped updates
      .then(function(resolvedUpdates){
        return setStateP(resolvedUpdates);
      })
      .bind(this);
  }
};
And in your components:
handleRefreshClick: function(){
  this.thenSetState(
    // users is Promise<User[]>
    {users: Api.Users.getAll(), loading: false},
    // we can't do our own setState here due to unlikely race conditions;
    // instead we supply the initial updates as this second (optional) argument.
    // Don't worry, the getAll request is already running.
    {users: [], loading: true}
  ).catch(function(error){
    // `error` is the rejection reason of the getAll promise;
    // `this` is our component instance here, so handle the failure as needed
    console.error(error);
  });
}
Of course this doesn't prevent you from using Flux when/where it makes sense in your application. For example, react-router is used in many, many React projects, and it uses Flux internally. React and related libraries/patterns are designed to only help where desired, and never to control how you write each component.
I think the biggest advantage of using Flux in this situation is that the rest of your app doesn't have to care that data is never cached, or that you're using a specific ORM system. As far as your components are concerned, data lives in stores, and data can be changed via actions. Your actions or stores can choose to always go to the API for data or cache some parts locally, but you still win by encapsulating this magic.
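As a sketch of that encapsulation, here is an action creator that always goes to the API while components keep talking only to the store (Api, dispatcher, and the action names are hypothetical):
const UserActions = {
  loadUser(id) {
    // components never know whether this data was cached or freshly fetched
    dispatcher.dispatch({ type: 'USER_REQUESTED', id });
    Api.Users.getById(id).then(user => {
      dispatcher.dispatch({ type: 'USER_RECEIVED', user });
    });
  }
};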

Firebase lazy loading

Can I make firebase load my data lazily? Let's say my app consists of a tree with a million nodes that the user can collapse, expand and modify. The vast majority of those nodes will stay collapsed the vast majority of the time so it doesn't make sense to keep everything in memory. As far as I can tell, firebase transfers everything in the database to the client on launch and is not meant to be used in any other way. Is that correct?
That's incorrect. Firebase synchronizes data only as you request it. To accomplish something like this, it's all about how you store the data.
For instance, a simplistic example would be this structure, which achieves the desired result:
/records/root/record1
/records/root/record2
/records/record1/record1-1
/records/record2/record2-1
Now you do ref.child('records/root').on('child_added', ...) for your starting point. When a node is expanded, run a child_added listener on that child's path.
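A sketch of that flow with the Firebase JS API of the time (the URL and the renderNode helper are placeholders):
var ref = new Firebase('https://<your-firebase>.firebaseio.com');

// initially, load only the root level
ref.child('records/root').on('child_added', function (snapshot) {
  renderNode(snapshot.val()); // renderNode is a hypothetical UI helper
});

// when the user expands a node, fetch just that node's children
function expandNode(recordId) {
  ref.child('records/' + recordId).on('child_added', function (snapshot) {
    renderNode(snapshot.val());
  });
}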
You could also use priorities, storing all the records in the same path and loading only those you need based on the priority:
/records/record1 (priority null)
/records/record2 (priority null)
/records/record1-1 (priority 'record1')
/records/record2-1 (priority 'record2')
Now, to retrieve your root records, you use:
ref.child('records').startAt(null).endAt(null)
When a node is expanded, you use the following:
ref.child('records').startAt(parentId).endAt(parentId)

Ncqrs recreate the complete ReadModel

Using Ncqrs, is there a way to replay every single event ever happened (all aggregate types) and feed these through my denormalizers in order to recreate the whole read model from scratch?
Edit:
I thought it'd be good to provide a more specific use case. I'm building this inside an ASP.NET MVC application and using Entity Framework (Code First) for working with the read models. In order to speed up development (and because I'm lazy), I want to use a database initializer that recreates the database schemas whenever any read model changes, and then use the initializer's seed method to repopulate them.
There is unfortunately nothing built in to do this for you (though I haven't updated the version of Ncqrs I use in quite a while, so perhaps that's changed). It is also somewhat non-trivial, since it depends on exactly what you want to do.
The way I would do it (up to this point I have not had a need) would be to:
Call to the event store to get all relevant events
Depending on what you are doing this could be all events or just the events for one aggregate root, or a subset of events for one or more aggregate roots.
Re-create the read-model in memory from scratch (to save slow and unnecessary writing)
Store the re-created read-model in place of the existing one
Call to the event store one more time to get any events that may have been missed
Repeat until there are no new events being returned
One thing to note: if you are recreating the entire read-model database from scratch, I would take the service offline temporarily or queue up new events until you finish.
Again, there are different ways you could approach this problem; your architecture and scenarios will probably dictate how best to do it.
We use a MsSqlServerEventStore; to replay all the events I implemented the following code:
var myEventBus = NcqrsEnvironment.Get<IEventBus>();
if (myEventBus == null) throw new Exception("EventBus is not found in NcqrsEnvironment");
var myEventStore = NcqrsEnvironment.Get<IEventStore>() as MsSqlServerEventStore;
if (myEventStore == null) throw new Exception("MsSqlServerEventStore is not found in NcqrsEnvironment");
var myEvents = myEventStore.GetEventsAfter(GetFirstEventIdFromEventStore(), int.MaxValue);
myEventBus.Publish(myEvents);
This will push all the events onto the event bus, and the denormalizers will process all of them. The function GetFirstEventIdFromEventStore just queries the event store and returns the first Id (where SequentialId = 1).
What I ended up doing is the following. At the service startup, before any commands are being processed, if the read model has changed, I throw it away and recreate it from scratch by processing all past events in my denormalizers. This is done in the database initializer's seed method.
This was a trivial task using the MS SQL event store, as there was a method for retrieving all events. However, I'm not sure about other event stores.