Maximum number of DDS Topics that may be created in a single DDS Domain

Is there a limit to the number of Topics that may be created for a particular Domain in DDS? Is this implementation-dependent?
What is the maximum for RTI Connext DDS 5.0.0? I don't see it specified in the documentation.

The 'magic' limit of 240 you recalled was most likely either the maximum number of DomainParticipants that can run on a single computer on the same domain ID (which is 120), or the maximum number of DDS domain IDs (which is 233). See http://community.rti.com/kb/what-maximum-number-participants-domain
As Reinier mentioned, there are no intrinsic limits on the number of endpoints.
Gerardo
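For intuition on where 120 and 233 come from, here is a back-of-the-envelope Java sketch using the default DDSI-RTPS well-known port parameters. The constants are the spec defaults; the arithmetic is an illustrative simplification, not RTI's actual implementation:
public class RtpsPortMath {
    // Default DDSI-RTPS well-known port mapping parameters.
    static final int PB = 7400; // port base
    static final int DG = 250;  // port gain per domain ID
    static final int PG = 2;    // port gain per participant ID
    static final int D3 = 11;   // offset for unicast user traffic

    public static void main(String[] args) {
        // Highest domain ID whose unicast user port still fits below 65536.
        int maxDomainId = (65535 - PB - D3) / DG;    // 232, i.e. 233 domain IDs (0..232)
        // Highest participant ID that stays inside one domain's DG-wide port window.
        int maxParticipantId = (DG - D3 - 1) / PG;   // 119, i.e. 120 participants (0..119)
        System.out.println("usable domain IDs: " + (maxDomainId + 1));
        System.out.println("participants per host per domain: " + (maxParticipantId + 1));
    }
}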

With Connext, the limiting factor is not so much the number of Topics, but more the number of DataReaders and DataWriters created in a particular Domain. Of course, each DataReader and DataWriter is associated with exactly one Topic, so indirectly there is a dependency on the number of Topics.
With regard to the maximum number of DataReaders and DataWriters in a Domain (collectively referred to as Endpoints), the practical limitations depend on the resources in your system. Memory consumption for administering the topology of your DDS system will increase with the number of Endpoints, but there is no hard-coded limit on the number of Endpoints.
If you have any particular scale in mind, I could indicate where you are in comparison to other users of the product.
This answer is indeed implementation dependent. My remarks apply to RTI Connext DDS and are not necessarily true for other DDS implementations.

Related

Why is consistent hashing mentioned in load balancing literature?

I've been reviewing load balancing and I don't understand why the topic of consistent hashing has come up several times. If the goal is to ensure uniform allocation of traffic across the appropriate servers, why not just generate a random number in [1, M] and assign the request to that server? (Source: educative.io, behind a paywall.)
I have read that priority queues can be useful too when certain tasks must be executed in a sequence that is not the order in which they arrived. Yet this doesn't (at least on the surface) seem related to consistent hashing either.
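For what it's worth, the usual motivation is this: random (or modulo) assignment spreads load evenly, but when a server joins or leaves, nearly every key's assignment changes, which destroys caches and session affinity. Consistent hashing remaps only the keys adjacent to the change. A minimal Java sketch of a hash ring (one virtual node per server for brevity; server names are placeholders):
import java.util.*;

public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addServer(String server)    { ring.put(server.hashCode(), server); }
    void removeServer(String server) { ring.remove(server.hashCode()); }

    String serverFor(String key) {
        // Walk clockwise to the first server at or after the key's hash position.
        Map.Entry<Integer, String> e = ring.ceilingEntry(key.hashCode());
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        for (String s : List.of("server-1", "server-2", "server-3")) ring.addServer(s);
        System.out.println(ring.serverFor("user-42"));
        ring.removeServer("server-2"); // only keys that mapped to server-2 move
        System.out.println(ring.serverFor("user-42"));
    }
}
Real implementations hash each server to many points on the ring so load stays close to uniform even with a small number of servers.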

Is there a way to find out the maximum memory of a database?

I'm writing a Web Application in Oracle Apex. Is there any table where I can find out the maximum usable memory? It needs to be an SQL Statement.
I've searched on Google but couldn't find the right answer for my problem.
Basically, Oracle memory is arranged into two chunks:
The System Global Area (SGA)
The Program Global Area (PGA)
These chunks are subdivided to support different types of database operations; the Oracle documentation covers all this in greater detail.
For your specific purpose, the maximum size of the SGA and PGA will give you the "maximum usable memory". This query provides that information:
select name, value
from v$parameter
where name in ('sga_max_size', 'pga_aggregate_limit')
For these parameters, value is reported in bytes. You may wish to sum them together, but the two areas are not fungible, so it's probably more useful to know the two allocations separately.
Note that v$parameter is not exposed to non-DBA users by default. So you may need to get privileges granted to your user before you can build an Apex screen over it.

What is a 'Partition' in Apache Helix

I am learning Apache Helix. I came across the keyword 'Partitions'.
According to the definition at http://helix.apache.org/Concepts.html, each subtask (of a main task) is referred to as a partition in Helix.
When I went through the Distributed Lock Manager recipe, partitions appeared to be nothing but instances of a resource (increasing numPartitions increases the number of locks):
// From the recipe: one partition per lock, rebalanced automatically by Helix.
final int numPartitions = 12;
admin.addResource(clusterName, lockGroupName, numPartitions, "OnlineOffline",
        RebalanceMode.FULL_AUTO.toString());
Can someone explain, with a simple example, what exactly a partition in Apache Helix is?
I think you're right that a partition is essentially an instance of a resource. As is the case in other distributed systems, partitions are used to achieve parallelism. A resource with only one instance can only run on one machine. Partitions simply provide the construct necessary to split a single resource among many machines by, well, partitioning the resource.
This is a pattern found in a large portion of distributed systems. The difference, though, is that while, e.g., distributed databases explicitly define partitions as subsets of some larger data set that can fit on a single node, Helix is more generic: partitions don't have one definite meaning or use case, but many potential meanings and use cases.
One of these use cases in a system with which I'm very familiar is Apache Kafka's topic partitions. In Kafka, each topic - essentially a distributed log - is broken into a number of partitions. While the topic data can be spread across many nodes in the cluster, each partition is constrained to a single log on a single node. Kafka provides scalability by adding new partitions to new nodes. When messages are produced to a Kafka topic, internally they're hashed to some specific partition on some specific node. When messages are consumed from a topic, the consumer switches between partitions - and thus nodes - as it consumes from the topic.
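As a rough illustration of that key-to-partition step (Kafka's actual default partitioner applies murmur2 to the serialized key bytes; plain hashCode is used here only to keep the sketch short):
public class TopicPartitioner {
    // Map a message key onto one of numPartitions partitions.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the index is non-negative, then reduce into range.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 12;
        System.out.println("order-1001 -> partition " + partitionFor("order-1001", numPartitions));
        System.out.println("order-1002 -> partition " + partitionFor("order-1002", numPartitions));
    }
}
The same key always lands on the same partition, which is what gives Kafka per-key ordering within a topic.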
This pattern generally applies to many scalability problems and is found in almost any HA distributed database (e.g. DynamoDB, Hazelcast), map/reduce framework (e.g. Hadoop, Spark), and other data- or task-driven systems.
The LinkedIn blog post about Helix actually gives a bunch of useful examples of the relationships between resources and partitions as well.

Own process manager based on Hydra (MPICH)

How would you assess the difficulty of writing your own process manager based on the Hydra (MPICH) sources, say on a scale of 1 to 100? The change would be to the part responsible for assigning processes to computers.
This shouldn't be too hard, but Hydra already implements several rank allocation strategies, so you might not even need to write any code.
You can already provide a user-specified rank allocation. Based on the provided configuration, Hydra can use the hwloc library to obtain hardware topology information and bind processes to cores.
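For example (host names are placeholders), Hydra's mpiexec accepts a machine file that pins a given number of processes to each host:
node01:2
node02:2
mpiexec -f machinefile -n 4 ./my_app
Options such as -bind-to core then handle the hwloc-based core binding, so a custom allocation policy may reduce to generating this file rather than modifying Hydra itself.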

What is database throughput?

Well, not much to ask apart from the question: what do you mean when you say an OLTP DB must have high throughput?
From Wikipedia:
"In communication networks, such as
Ethernet or packet radio, throughput
or network throughput is the average
rate of successful message delivery
over a communication channel. This
data may be delivered over a physical
or logical link, or pass through a
certain network node. The throughput
is usually measured in bits per second
(bit/s or bps), and sometimes in data
packets per second or data packets per
time slot."
So does this mean OLTP databases need a high/quick insertion rate (i.e., avoiding deadlocks, etc.)?
I was always under the impression that a database for, say, the airline industry must have quick insertion, but at the same time quick response times, since both are critical to its operation. And shouldn't this in many ways also depend on the protocol involved in delivering the message/data to the database?
I am not trying to single out the "only" characteristic of OLTP systems. In general, I would like to understand what characteristics are inherent to an OLTP system.
Cheers!
In general, when you're talking about the "throughput" of an OLTP database, you're talking about the number of transactions per second: how many orders the system can take per second, how many web page requests it can service, how many customer inquiries it can handle. That tends to go hand-in-hand with discussions about how the OLTP system scales: if you double the number of customers hitting your site every month because the business is taking off, will the OLTP systems be able to handle the increased throughput?
That is in contrast to OLAP/DSS systems, which are designed to run a relatively small number of transactions over much larger data volumes. There, you're worried far less about the number of transactions you can do than about how those transactions slow down as you add more data. If you're that wildly successful company, you probably want the same number and frequency of product-sales-by-region reports out of your OLAP system even as you generate exponentially more sales. But you now have exponentially more data to crunch, which requires tuning the database just to keep report performance constant.
Throughput doesn't have a single, fixed meaning in this context. Loosely, it means the number of transactions per second, but "write" transactions are different than "read" transactions, and sustained rates are different than peak rates. (And, of course, a 10-byte row is different than a 1000-byte row.)
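As a crude, hypothetical illustration of measuring "transactions per second" (doWork is a placeholder for whatever one transaction means in your system; real benchmarks also separate reads from writes and track sustained vs. peak rates):
public class ThroughputProbe {
    public static void main(String[] args) {
        long windowMillis = 10_000; // measure over a fixed 10-second window
        long deadline = System.currentTimeMillis() + windowMillis;
        long completed = 0;
        while (System.currentTimeMillis() < deadline) {
            doWork();
            completed++;
        }
        System.out.printf("throughput: %.1f tx/s%n", completed * 1000.0 / windowMillis);
    }

    static void doWork() {
        // Placeholder: replace with one real transaction against the database.
    }
}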
I stumbled on Performance Metrics & Benchmarks: Berkeley DB the other day when I was looking for something else. It's not a bad introduction to the different ways of measuring "how fast". Also, this article on database benchmarks is an entertaining read.