How many Vertica Databases can run on a Host at the same time?

I know that in Oracle I can have multiple homes running on the same host.
Can this be done in Vertica too? I am running the CE version of Vertica and it seems I cannot do this.

They don't allow multiple databases within a single instance of Vertica to be active, so it makes sense that they wouldn't allow multiple instances of Vertica, resulting in multiple databases, to be active at the same time.
EDIT: Reasons I say it makes sense: Vertica can be resource intensive. It is designed to deal with A LOT of data. Having multiple 'Verticas' fighting for disk, CPU, and bandwidth is going to negatively impact performance for all of them.

Also remember, Vertica is a distributed database, unlike Oracle. Therefore you have an instance on each of your cluster nodes, and the cluster has access to larger disk storage. Distributed databases work best when the data of various apps stays in a single cluster, as each instance takes up a lot of CPU for data compression, read/write optimized stores, and delivering performance.

You cannot run multiple databases in a Vertica cluster. That said, there are no set limits on the number of schemas you can run in a single database. Considering how many users and how much data can be handled by a single Vertica cluster (with one database), I would need a very compelling reason why multiple databases are necessary. It is rather like complaining that a house doesn't have three kitchens, one each for breakfast, lunch, and dinner.

Related

Does splitting data between databases on the same instance increase the performance of Redis searches?

My question is this:
I have a single instance of Redis with multiple databases (one for each service).
If more than one service used the same database, would the prefix search be slower? (That is, having the data of all the services in one place and having to go through all of it, as opposed to only going through the selected database.)

Partitioning in Redis serves two main goals:
- It allows for much larger databases, using the sum of the memory of many computers. Without partitioning you are limited to the amount of memory a single computer can support.
- It allows scaling the computational power to multiple cores and multiple computers, and the network bandwidth to multiple computers and network adapters.
Ref

Integration through database should be avoided, read more here: https://martinfowler.com/bliki/IntegrationDatabase.html
If you use KEYS command for it, then read this from documentation:
Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code. If you're looking for a way to find keys in a subset of your keyspace, consider using SCAN or sets.
Redis doesn't support a prefix index, but you can use a sorted set to do prefix search to some degree.
Redis is an in-memory store, so all reads are pretty fast as long as you model it the right way.
It is better to query multiple times than to do integration through the db...
Btw if you have multiple services owning the same data, then you should probably model your services differently...
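
The sorted-set idea mentioned above works because members that share the same score are kept in lexicographic order, so a prefix lookup becomes a range query (ZRANGEBYLEX in Redis). Here is a minimal sketch of that idea in plain Python, using a sorted list and `bisect` as a stand-in for the sorted set. The key names are made up for illustration and nothing here talks to a Redis server.

```python
import bisect

def prefix_range(sorted_keys, prefix):
    """Return every key in sorted_keys that starts with prefix.

    sorted_keys must already be sorted; it stands in for a Redis
    sorted set whose members all share the same score.
    """
    lo = bisect.bisect_left(sorted_keys, prefix)
    # "\uffff" sorts after any character we expect in a key, so
    # prefix + "\uffff" acts as an upper bound for the range (an
    # assumption that holds for ordinary ASCII key names).
    hi = bisect.bisect_right(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]

# Hypothetical keys from two services sharing one keyspace.
keys = sorted(["billing:1", "billing:2", "orders:1", "orders:10", "orders:2"])
orders = prefix_range(keys, "orders:")
```

The lookup costs two binary searches plus the size of the result, so keys belonging to other services are never visited, unlike a KEYS-style scan over the whole keyspace.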

In general, I would avoid key prefix searching. Redis isn't a standard database and key searches are slow.
Since Redis is a key/value store, it's optimized as such.
To take advantage of Redis you want to hit the desired key directly.
I expect key search time to increase with the total number of keys, so splitting the database would potentially reduce the key search time.
However, if you're doing key searches, I would put the desired keys into a key list and just do a direct lookup there.
desired_prefix = ["desired_prefix-a", "desired_prefix-b", ...]
lpush "prefix_x_keys" "x-a"

If you run multiple databases on a single instance, they will all try to acquire and use the same resources and memory, and each will also run its own core process, which will not help you.
Think of it like this: two operating systems running on the same machine will never match the performance of a single OS utilizing all the resources of the machine.
What you can do to increase performance is to make more tables or use the partition concept. This avoids putting too much data in a single table, so searches will run faster.

Accessing Multiple Redis Shards

Hi I'm going to be using multiple Redis instances and some sharding between instances.
My question is: will performance suffer [a noticeable amount] if loading a webpage requires accessing multiple shards?
My basic plan is to load balance between multiple Redis shards (see the footnote below), possibly using Twemproxy for this, and to keep everything pertaining to a particular user's data on only one shard (for things like 'likes', 'user-information', 'save-list', etc.). I will also have multiple instances of Redis containing objects (which many different users will access) and data about those objects, which will be loaded for users as well. I will not need Redis operations on multiple keys in different databases, but I will need Redis instances to return m keys from n instances in real time.
To come completely clean with you I'm also planning on using something like this https://github.com/mpalmer/redis/blob/nds-2.6/README.nds so that I can use Redis while saving many keys to disc when not in use.
FOOTNOTE: (I am aware of Redis's Master-Slave replication, but prefer sharding for the extra storage in place of just more access)
Please, if your only comment is along the lines of "don't bother to shard until you absolutely have to", keep it to yourself. I'm not interested in hearing responses that sharding is only important for a certain percentage of sites. That may be your opinion and that may even be fact but that is not what I am asking here.

IMO, if you're going to perform multiple reads from multiple shards instead of a single instance, you're most likely to get better performance as long as:
1. The sharding layer isn't slowing you down
2. The app can pull the data from the different shards asynchronously
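
The second condition can be sketched as follows. This is only an illustration with `asyncio` from the standard library: `fetch_from_shard` is a hypothetical stand-in that sleeps instead of making a real network call, and the shard ids and keys are invented.

```python
import asyncio
import time

async def fetch_from_shard(shard_id, key, delay=0.05):
    """Stand-in for one network round trip to a shard (hypothetical)."""
    await asyncio.sleep(delay)
    return f"shard{shard_id}:{key}"

async def fetch_all(requests):
    """Issue all shard reads concurrently and wait for every result."""
    tasks = [fetch_from_shard(shard_id, key) for shard_id, key in requests]
    # gather returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)

def timed_fetch(requests):
    """Run the concurrent fetch and report how long it took overall."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(requests))
    return results, time.perf_counter() - start

results, elapsed = timed_fetch([(0, "user:42"), (1, "object:7"), (2, "object:9")])
```

Because the reads overlap, the total wait should be close to the slowest single round trip rather than the sum of all of them.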

Single logical SQL Server possible from multiple physical servers?

With Microsoft SQL Server 2005, is it possible to combine the processing power of multiple physical servers into a single logical sql server? Is it possible on SQL Server 2008?
I'm thinking, if the database files were located on a SAN and somehow one of the sql servers acted as a kind of master, then processing could be spread out over multiple physical servers, for instance even allowing simultaneous updates where there was no overlap, and in the case of read-only queries on unlocked tables no limit.
We have an application that is limited by the speed of our sql server, and probably stuck with server 2005 for now. Is the only option to get a single more powerful physical server?
Sorry I'm not an expert, I'm not sure if the question is a stupid one.
TIA

Before rushing out and buying new hardware, find out where your bottlenecks really are. Many locking problems can be solved with the appropriate indexes for your workload.
For example, I've seen instances where placing tempDB on SSD solved performance issues and saved the client buying an expensive new server.
Analyse your workload: How Can I Log and Find the Most Expensive Queries?
With SQL Server 2008 you can utilise the Management Data Warehouse (MDW) to capture your workload.
White Paper: SQL Server 2008 Performance and Scale
Also: please be aware that a SAN solution is not necessarily a faster I/O solution than directly attached storage. It depends on the SAN, the number of physical disks in a LUN, LUN subscription and usage, the speed of the HBAs and several other hardware factors...

Optimizing the app may be a big job of going through all the business logic and lines of code. But looking for the most expensive queries can easily locate the bottleneck area. Maybe it only involves a couple of the biggest tables, views or stored procedures. Adding or fine-tuning an index may help right away. If bumping up the RAM is possible, try that option as well. That is cheap and easy to configure.
Good luck.

You might want to google for "sql server scalable shared database". Yes you can store your db files on a SAN and use multiple servers, but you're going to have to meet some pretty rigid criteria for it to be a performance boost or even useful (high ratio of reads to writes, small enough dataset to fit in memory or a fast enough SAN, multiple concurrent accessors, etc, etc).
Clustering is complicated and probably much more expensive in the long run than a bigger server, and far less effective than properly optimized application code. You should definitely make sure your app is well optimized.

Redis as a database

I want to use Redis as a database, not a cache. From my (limited) understanding, Redis is an in-memory datastore. What are the risks of using Redis, and how can I mitigate them?

You can use Redis as an authoritative store in a number of different ways:
- Turn on AOF (Append-only File store); see the AOF docs. This will keep a log of all Redis commands made against your dataset in real-time.
- Run Redis using Master-Slave replication; see the replication docs. This will allow you to provide high-availability if one of your instances fails.
- If you're running on something like EC2 you can EBS back your Redis partition to provide another layer of protection against instance failure.
- On the horizon is Redis Cluster - this is specifically designed as a way to run Redis in a way that should help with HA and scalability. However, this won't appear for at least another six months or so.

Redis is an in-memory store which can also write the data back to disk. You can specify how often to do an fsync to make Redis safer (but also slower, so it is a trade-off).
But I am still not certain whether Redis is in a state yet to really store (mission) critical data. If, for example, it is not a huge problem when one or more tweets (twitter.com) or something similar get lost, then I would certainly use Redis. There is also a lot of information available about persistence on Redis's own website.
You should also be aware of some persistence problems which could occur, as described in a blog article by antirez (the Redis maintainer). You should read his blog because he has some interesting articles.

I would like to share a few things that we have learned by using Redis as a primary database in our service. We chose Redis since we had data that could not be partitioned. We wanted to get the best performance we could get out of one box.
Pros:
Redis was unbeatable in raw performance. We got 10K transactions per second out of the box (note that one transaction involved multiple Redis commands). We were able to hit a rate of 25K+ transactions per second after a few optimizations, along with Lua scripts. So when it comes to performance per box, Redis is unmatched.
Redis is very simple to setup and has a very small learning curve as opposed to other SQL and NoSQL datastores.
Cons:
Redis supports only a few primitive data structures like Hashes, Sets, Lists etc. and operations on these data structures. These are more than sufficient when you are using Redis as a cache, but if you want to use Redis as a full-fledged primary data store, you will feel constrained. We had a tough time modelling our data requirements using these simple types.
The biggest problem we have seen with Redis was the lack of flexibility. Once you have settled on the structure of your data, any modification to storage requirements or access patterns virtually requires rethinking the entire solution. Not sure if this is the case with all NoSQL data stores though (I have heard MongoDB is more flexible, but haven't used it myself).
Since Redis is single threaded, CPU utilization is very low. You can't put multiple Redis instances on the same machine to improve CPU utilization, as they will compete for the same disk, making the disk the bottleneck.
Lack of horizontal scalability is a problem as mentioned by other answers.

As Redis is an in-memory store, you cannot store data that won't fit in your machine's memory. Redis usually performs very badly when the data it stores is larger than 1/3 of the RAM size. So this is the fatal limitation of using Redis as a database.
Certainly, you can distribute your big data across several Redis instances, but you have to do it all manually on your own. The operation is usually done like this (assuming you have only 1 instance from the start):
1. Use its master-slave mechanism to replicate data to the second machine. Now you have 2 copies of the same data.
2. Cut off the connection between master and slave.
3. Delete the first half (split by hashing, etc.) of the data on the first machine, and delete the second half of the data on the second machine.
4. Tell all clients (PHP, C, etc.) to operate on the first machine if the specified keys are on that machine, and otherwise to operate on the second machine.
This is how Redis scales! You also have to stop your service to prevent any writes during the migration.
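
The client-side routing in the last step could look roughly like the sketch below. It assumes the split was done by hashing the key; the two addresses are hypothetical, and `zlib.crc32` is just one possible stable hash, not something the procedure above prescribes.

```python
import zlib

def shard_for(key, shards):
    """Pick the shard that owns key using a stable hash of the key."""
    # zlib.crc32 is deterministic across processes, unlike the built-in
    # hash(), which is randomized per process for strings.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]

# Hypothetical addresses for the two machines after the split.
SHARDS = ["redis-a:6379", "redis-b:6379"]

# Every client must apply the same function to reach the right machine.
placement = {key: shard_for(key, SHARDS) for key in ["user:1", "user:2", "cart:1"]}
```

With a modulo scheme like this, changing the number of machines remaps most keys, which is one reason the migration described above needs writes to be stopped.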
From our experience, we reached this conclusion about Redis: Redis is not the right choice for storing more than 30G of data, Redis is not scalable, and Redis is quite suitable for prototype development.
We later found an alternative to Redis, SSDB (https://github.com/ideawu/ssdb), a leveldb server that supports nearly all the APIs of Redis. It is suitable for storing more than 1TB of data; the limit depends only on the size of your hard disk.

Redis is a database, which means we can use it to persist information for any kind of app: user accounts, blog posts, comments and so on. After storing information we can retrieve it later by writing queries.
Now this behavior is similar to just about every other database, but what is the difference? Or rather why would we use it over any other database?
Redis is fast.
Redis is not fast because it's written in a special programming language or anything like that, it's fast because all data is stored in-memory.
Most databases split their information between the memory of a computer and the hard drive. Accessing data in memory is fast, but reading it from a hard disk is relatively slow.
So rather than storing data on the hard disk, Redis stores it in memory.
Now, the downside is that working with data that is larger than the amount of memory your computer has is not going to work.
That may sound like a tremendous problem, but Redis has clear strategies for working around this limitation.
The above is just the first reason why Redis is so fast.
The second reason is that Redis stores all of its data or rather organizes all of its data in simple data structures such as Doubly Linked Lists, Sorted Sets and so on.
These data structures have well-known and well-understood performance characteristics. So as developers we can decide exactly how our information is organized and how to efficiently query data.
It's also very fast because Redis is simple in nature. It's not feature heavy; feature-heavy datastores like Postgres pay performance penalties.
So to use Redis as a database you have to know how to store data in limited space, how to organize it into the simple data structures mentioned above, and how to work around the limited feature set.
So as far as mitigating risks goes, the way to start is to think in terms of a Redis design methodology and not a SQL database design methodology. What do I mean?
With SQL it's: Step 1. Put the data in tables. Step 2. Figure out how we will query it.
With Redis it's more:
Step 1. Figure out what queries we need to answer.
Step 2. Structure data to best answer those queries.
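
As a small illustration of that query-first approach, suppose the two queries we need are "fetch a post by id" and "list the titles of one author's posts". The sketch below uses plain Python dicts and sets as stand-ins for Redis hashes and sets (roughly what HSET and SADD would write); the key naming scheme and the sample data are assumptions made for the example, and no Redis server is involved.

```python
from collections import defaultdict

# Stand-ins for Redis structures: a dict of dicts plays the role of
# hashes, a dict of sets plays the role of sets.
hashes = {}
sets = defaultdict(set)

def save_post(post_id, author, title):
    """Write a post so that both target queries are direct key lookups."""
    hashes[f"post:{post_id}"] = {"author": author, "title": title}  # serves query 1
    sets[f"author:{author}:posts"].add(post_id)                     # serves query 2

def get_post(post_id):
    """Query 1: fetch a post by id."""
    return hashes.get(f"post:{post_id}")

def titles_by_author(author):
    """Query 2: list the titles of every post by one author, by post id."""
    post_ids = sorted(sets.get(f"author:{author}:posts", set()))
    return [hashes[f"post:{pid}"]["title"] for pid in post_ids]

save_post(1, "ann", "Hello")
save_post(2, "bob", "Sharding notes")
save_post(3, "ann", "Persistence")
```

Each query hits a known key directly, at the cost of writing the data twice. A new query (say, posts by date) would need another structure maintained at write time.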

How many tables are recommended in a SQL Server Express database?

I'm a beginner. How many tables are recommended in a SQL Server Express database? My main objective is the best performance in terms of speed. Is it generally recommended to use two databases rather than one for a single application?

SQL Express databases have a limit of 4GB in size. Within that limit, any number of tables is fair game. The number of tables has absolutely no impact on performance. The only thing that drives performance of the application vis-a-vis the database is the proper design of the tables, both as a logical model and as a physical database structure (i.e. correct choices of clustered indexes, non-clustered indexes, constraints, defaults, data types etc.), and the proper querying and updating of the database, i.e. queries that can be satisfied (covered, efficiently) by the existing indexes.
Splitting an application database into multiple distinct databases is a bad idea. You are losing consistency of the recovery unit (you can't backup/restore the two databases in a consistent state) and you need to replicate all the infrastructure around the database twice (security roles and permissions, maintenance activities and procedures etc.). Also, splitting an application database into distinct databases gives absolutely no performance advantage.

What you can do to make things speedier:
- Break up your databases so that they use multiple files across multiple fast drives
- Federate (not really something you'll do if you're running Express)
- Install memory, memory, memory
- I can't remember Express's limitations and I don't care to look them up, but on the configuration screen where you can assign the number of CPUs to dedicate to SQL, give it as many as you can. You should also be able to set affinity there (if not, then in Task Manager)
- Don't run anything you don't need (scheduler, report engine, Server, DHCP Client) if you don't have to
