OrientDB maximum number of classes - schema

The OrientDB documentation doesn't say anything about limits on the number of classes, but in practice a large number of classes seems to limit functionality.
I have a database whose schema is the buildingSMART IFC class hierarchy, which means I have a lot of classes. Every time a connection to the database is made, the server sends the complete list of classes (clusters). As a result, opening my database takes too much time.
Is there a way to tell OrientDB not to send the list? I already know the internal class structure of the database, so I don't really need the list.

The maximum number of classes supported by OrientDB is a function of the number of clusters in the database; refer to the clustering section of the documentation.
Starting from v2.2, OrientDB may automatically create multiple clusters per class to improve parallelism. The number of clusters created per class is equal to the number of CPU cores available on the server. You can also add more clusters per class. The limit on the number of clusters in a database is 32,767 (2^15 - 1).
For a 4-core CPU, therefore, assuming each class keeps the default number of clusters, the total class count in the database would be roughly 32,767 / 4 ≈ 8,190.
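As a rough check of that figure, here is a minimal sketch (the core counts are arbitrary examples; the 32,767 cluster cap and the one-cluster-per-core default are the figures quoted above):

```cpp
#include <cstdio>

int main() {
    // Hard cap on clusters per database, as quoted above (2^15 - 1).
    const int maxClusters = 32767;

    // From v2.2 the default is one cluster per class per CPU core,
    // so the rough ceiling on classes is maxClusters / cores.
    const int coreCounts[] = {1, 2, 4, 8, 16};
    for (int cores : coreCounts)
        std::printf("%2d cores -> ~%d classes with default clusters\n",
                    cores, maxClusters / cores);
    return 0;
}
```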

Related

Number of databases per instance

Is there any limit on the number of RedisGraph databases that I can create in a single Redis instance?
GRAPH.QUERY database1 "Cypher"
Are there any performance issues with having a high number of these keys?
Thanks
The number of graphs on a single Redis server is limited by the number of keys the server can accommodate, since each graph is associated with a key.
In terms of performance, the RedisGraph module initialises a single global thread pool that serves all of the graph keys. Each graph obviously has its own memory footprint, but this is no different from having multiple keys of other types: list, set, hash.
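To illustrate that each graph is simply a key, here is a minimal sketch using the hiredis client against a local Redis with the RedisGraph module loaded (the graph names and the Cypher snippet are made up for illustration):

```cpp
#include <cstdio>
#include <hiredis/hiredis.h>

int main() {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == nullptr || c->err) {
        std::fprintf(stderr, "connect failed\n");
        return 1;
    }

    // Each GRAPH.QUERY against a new name creates a new graph, i.e. a new key.
    const char *graphs[] = {"graph1", "graph2", "graph3"};
    for (const char *g : graphs) {
        redisReply *r = (redisReply *)redisCommand(
            c, "GRAPH.QUERY %s %s", g, "CREATE (:Node {name:'a'})");
        freeReplyObject(r);
    }

    // DBSIZE counts keys: every graph created above shows up as one key.
    redisReply *r = (redisReply *)redisCommand(c, "DBSIZE");
    std::printf("keys in instance: %lld\n", r->integer);
    freeReplyObject(r);

    redisFree(c);
    return 0;
}
```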

Max number of concurrent connections to DB2, biggest table size?

What is the maximum number of simultaneous connections that DB2 can handle? The connections may come from a single app or from several different applications.
Also, how much data can DB2 hold in its tables, both in a single table and across all tables? What is the approximate maximum number of records, and the maximum size of the data?
The answer typically depends on your system resources. The DB2 documentation ("Knowledge Center") has a section on "SQL and XML Limits". There you will find the specific system limits, including maximum concurrency and table sizes.
Without any additional software, the maximum concurrency (parallel connections) is 64,000.
The maximum table size requires some math. It depends on tablespaces, table organization, page size, number of columns and more. How many petabytes of storage does your system have available...?
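As a rough illustration of that math (a sketch only; the page size and page-count cap below are placeholder values, not the actual DB2 limits, which you should take from the "SQL and XML Limits" page for your version and tablespace type):

```cpp
#include <cstdio>

int main() {
    // Placeholder figures for illustration only; look up the real limits
    // for your DB2 version, page size and tablespace type.
    const long long pageSizeBytes   = 32LL * 1024;          // assumed 32 KB pages
    const long long maxPagesPerTbsp = 512LL * 1024 * 1024;  // assumed page-count cap

    const long long maxBytes = pageSizeBytes * maxPagesPerTbsp;
    std::printf("rough upper bound: %.1f TB per table space\n",
                maxBytes / (1024.0 * 1024 * 1024 * 1024));
    return 0;
}
```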

Is there a max limit for objects per process?

In C++, the map class is very convenient. Instead of using a separate database, I want to store all the rows as objects and create a map over the columns I need to search. I am concerned about the maximum number of objects a process can handle. Also, is using a map to retrieve one object among, say, 10 million objects, if Linux permits it, a good choice? I'm not worried about persisting the data.
What you are looking for is std::map::max_size, quoting from the reference:
...reflects the theoretical limit on the size of the container. At runtime, the size of the container may be limited to a value smaller than max_size() by the amount of RAM available.
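For instance, a minimal sketch (the Row type and the ten-million count are made up for illustration) that prints max_size() and then looks one row up among millions of entries:

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical "row" type standing in for a database record.
struct Row {
    int         id;
    std::string name;
};

int main() {
    std::map<int, Row> rows;

    // Theoretical container limit; the practical limit is available RAM.
    std::printf("max_size: %zu\n", rows.max_size());

    // Insert millions of rows keyed by id (scale to taste / available RAM).
    for (int id = 0; id < 10'000'000; ++id)
        rows.emplace(id, Row{id, "row-" + std::to_string(id)});

    // Lookup is O(log n): roughly 23 comparisons for 10 million entries.
    auto it = rows.find(5'000'000);
    if (it != rows.end())
        std::printf("found: %s\n", it->second.name.c_str());
    return 0;
}
```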
No, there is no maximum number of objects per process. Objects (as in, C++ objects) are an abstraction which the OS is unaware of. The only meaningful limit in this regard is the amount of memory used.
You can completely fill your RAM using as much map as it takes, I promise.
As you can see in the reference documentation, the map::max_size() member function will give you the numbers.
This should be 2^31 - 1 on 32-bit x86 hardware/OS and 2^64 - 1 on amd64 hardware with a 64-bit OS.
Possibly additional information here.
An object is a programming-language concept; processes are not really aware of objects. With enough RAM, you can allocate as many objects as you need in your program.
As for your second question: which data structure to choose depends on the problem you want to solve. A map is a suitable data structure for quickly accessing objects, testing existence, and so on, but it is not good at maintaining the objects' insertion order.

SOLR: one collection (core) vs. many

I have multiple entities from a MySQL database that will be indexed in SOLR.
What is the best method for getting the best performance (query time):
Using a single SOLR collection (core) with a field for the entity type, or
Having a collection (core) for every entity type?
Thanks
I would add a few more factors for you to consider (mostly discouraging the one-core-per-entity approach, and not just for the performance reasons you are specifically asking about):
More cores mean more endpoints. Your application will need to be aware of all of them, and you may find it difficult to run a query across cores. For example, if you are searching by a common attribute, say name, you would have to run a separate query against each core and aggregate the results, and you lose the relevancy ranking you get out of the box when querying a single core.
Consider making minimal requests to your database. N+1 JDBC queries drastically slow down indexing; instead, try to aggregate your results in a view, and if you can fire a single query, your indexing will be much faster.
Range queries on common attributes will not be possible across cores. For example, if the prices of books and music CDs are stored in different cores, you can't get all products between X and Y in one price-range query.
Faceting will also be compromised.
So, while you may see some indexing-time gain from parallelizing with one core per entity, I feel it reduces the features you can benefit from. The sketch below shows the single-collection alternative.
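For illustration, a minimal sketch of the single-collection approach using libcurl (the host, the products collection name, and the entity_type field are assumptions, not part of the original question): one endpoint, with the entity selected by a filter query rather than by a separate core.

```cpp
#include <cstdio>
#include <string>
#include <curl/curl.h>

// Collect the HTTP response body into a std::string.
static size_t collect(char *data, size_t size, size_t nmemb, void *out) {
    static_cast<std::string *>(out)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    // One collection ("products"), one endpoint; the entity is just a filter.
    // Assumed URL and field names: adjust to your own schema.
    const std::string url =
        "http://localhost:8983/solr/products/select"
        "?q=name:guitar&fq=entity_type:music_cd&wt=json";

    std::string body;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    if (curl_easy_perform(curl) == CURLE_OK)
        std::printf("%s\n", body.c_str());

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```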

Solandra Sharding: Insider Thoughts

Just got started with Solandra and am trying to understand the second-level details of Solandra sharding.
AFAIK, Solandra creates the configured number of shards (the "solandra.shards.at.once" property), where each shard holds up to "solandra.maximum.docs.per.shard" documents. Within each shard it then creates slots, whose count is "solandra.maximum.docs.per.shard" / "solandra.index.id.reserve.size".
From the data model of the SchemaInfo CF, I understand that inside a particular shard there are slots owned by different physical nodes, and there is a race between nodes to claim these slots.
My questions are:
1. If I request a write on a particular Solr node, e.g. .....solandra/abc/dataimport?command=full-import, does this request get distributed to all possible nodes? Is this a distributed write? Because until that happens, how would other nodes be competing for slots inside a particular shard? Ideally the code for writing a doc or set of docs would be executed on a single physical JVM.
2. With sharding we tried to write some docs to a single physical node, but if writes go to slots owned by different physical nodes, what did we actually achieve, since we again need to fetch results from different nodes? I understand that the write throughput is maximized.
3. Can we look into tuning these numbers: "solandra.maximum.docs.per.shard", "solandra.index.id.reserve.size", "solandra.shards.at.once"?
4. If I have just one shard and a replication factor of 5 in a single-DC, 6-node setup, I saw that the endpoints of this shard contain 5 endpoints, as per the replication factor. But what happens to the 6th node? I saw through nodetool that the remaining 6th node doesn't really get any data. If I increase the replication factor to 6 while keeping the cluster up, and then run repair etc., will this solve the problem, or is there a better way?
Overall, the shards.at.once parameter is used to control the parallelism of indexing: the higher the number, the more shards are written to at once. If you set it to one, you will always write to only one shard. Normally this should be set about 20% higher than the number of nodes in the cluster, so for a four-node cluster set it to five.
The higher the reserve size, the less coordination between the nodes is needed, so if you know you have lots of documents to write, raise it.
The higher docs.per.shard is, the slower queries on a given shard become. In general this should be 1-5M at most.
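As a back-of-envelope helper for those rules of thumb (a sketch only; the node count and document total below are made-up inputs, and the 20% and 1-5M figures are just the guidelines above):

```cpp
#include <cstdio>

int main() {
    // Made-up cluster/workload figures; substitute your own.
    const int       nodes     = 4;
    const long long totalDocs = 20'000'000;

    // solandra.shards.at.once: ~20% above the node count (4 nodes -> 5).
    const int shardsAtOnce = static_cast<int>(nodes * 1.2 + 0.5);

    // solandra.maximum.docs.per.shard: keep each shard in the 1-5M range.
    const long long docsPerShard = 5'000'000;
    const long long shardsNeeded = (totalDocs + docsPerShard - 1) / docsPerShard;

    std::printf("solandra.shards.at.once         ~ %d\n", shardsAtOnce);
    std::printf("solandra.maximum.docs.per.shard ~ %lld\n", docsPerShard);
    std::printf("shards filled by %lld docs ~ %lld\n", totalDocs, shardsNeeded);
    return 0;
}
```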
To answer your points:
1. This will only import from one node, but it will index across many nodes, depending on shards.at.once.
2. I think the question is: should you write across all nodes? Yes.
3. Yes, see above.
4. If you increase shards.at.once, that node will be populated quickly.