If some libraries are shared between two processes, will they be stored as one copy or two copies in SE mode? - gem5

I have a question: under SE mode, I run two processes on two different cores (ARM DerivO3CPU, if these details are relevant). By using --redirect, both processes can use the same dynamically linked library. However, will that library's data be stored as a single copy or as two copies in the cache and main memory?
Also, if the answer is two copies, would deduplication happen in FS mode?
Thank you very much for answering my questions!

Related

Multi System Database structure based copying/updating best practice

After searching and not finding similar cases, I want to open a new question.
So here is the case:
We are working with a large database with a very complicated data structure, and we work on multiple systems to ensure stability (development, testing, quality and production), so it's always a struggle to move data between those systems. As I said, the data structure is very large, and there is also a lot of logic inside the database. Customers are able to add new data parts as configuration, and there is also a steady influx of data used for statistics and monitoring. So let me explain the problem with a small example:
Let's take this database as an example. We have some families holding contests with each other, and they create statistics about the points they score.
The Purple Tables are fixed configurations. They are created once and they can only be changed via an Operator. Those changes will be done and tested in the development system first.
The Yellow Tables are changing configurations. Each Family is able to create or delete multiple Contests and assign their kids.
The Red Table is just plain data. Each time a kid scores points, a new row is added with the amount, the current time, and the relation to the kid and the contest.
This table will be the base for the later statistics.
This database is developed on two systems: a production system used by the families, and a development system used by the programmers/operators.
While developing, the programmers add test data like kids, families, contests and points. Meanwhile, the families create new contests, assign new kids and fill up the points table.
It's necessary to copy new/tested/fixed families from the development system to the production system.
It's also necessary to copy contests, contest-kid assignments and points from the production system to the development system to find new errors.
It must also be possible to change the table structure on the development system and transmit this change to the production system. (This shouldn't be the main topic here; sometimes the changes are so large that there just is no easy way, so let's keep this point simple but keep it in mind.)
I want to copy parts of the tables to another system but be able to ignore some tables (for example: Points), and I want to make sure not to copy kids without their parent family, so there is no "parentless" object in the database.
Question: What would be a good and safe way to do this?
I don't need a solution for a specific database type or some scripts; I'm looking for tools, libraries or good practice. (But just as a note: we're using MSSQL.)
We are currently building a tool for this problem (it's not going well: unstable, overly complicated, slow, and possibly reinventing the wheel).
Also, a lot of devs I know just copy the whole database (making a backup and restoring it onto another server). But this causes problems too: users are copied along, their GUIDs change, and so they lose permissions, etc. I don't think this is a good solution. The database is also down for quite a long time, and it's never a smooth process.
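As a side note on that permissions symptom: after restoring a backup onto another server, the database users keep the old server's login SIDs. Here is a minimal sketch of the usual re-link, assuming same-named logins exist on the target server; pyodbc, the query and the connection string are placeholders, not the poster's setup:

```python
# Hypothetical fix-up for orphaned users after restoring a backup onto
# another server: re-link each database user to the same-named login.
import pyodbc

conn = pyodbc.connect("...")  # placeholder connection string
orphans = conn.execute(
    "SELECT dp.name FROM sys.database_principals dp "
    "WHERE dp.type = 'S' AND dp.sid IS NOT NULL "
    "  AND dp.name NOT IN ('guest', 'dbo') "
    "  AND NOT EXISTS (SELECT 1 FROM sys.server_principals sp "
    "                  WHERE sp.sid = dp.sid)")
for (name,) in orphans:
    # ALTER USER ... WITH LOGIN rewrites the user's SID to the login's SID
    conn.execute(f"ALTER USER [{name}] WITH LOGIN = [{name}]")
conn.commit()
```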
Doing it manually is sometimes the easiest way, but considering the size of our data structure, it's not just a huge amount of work; there is also a large potential for mistakes.
So I'm hoping someone knows a tool or something similar to help me out.
Welcome to the pains of development for a stateful entity like a database. :) RedGate makes a tool called SQL Source Control that is good for moving changed data and schema into production, and it can interface with source control solutions such as Git. It's a bit pricey, but it's the best I've found. One option for keeping dev up to date with prod data and dev changes is one I concocted at my last place of employment, which was... not 100% perfect, but better than nothing, and free. It was developed in PowerShell, and it went something like this:
1. Create pre-restore, pre-dacpac and post-dacpac SQL scripts to store data and permission diffs between dev and prod.
2. Use SqlPackage.exe to make a DacPac of dev (a DacPac is basically an XML schema of the DB, no data).
3. Execute the pre-restore script (often copying out test data that needs to be persisted).
4. Restore prod over dev.
5. Execute the pre-dacpac script (any DDL that could cause data loss may need to go here).
6. Use SqlPackage.exe to apply the DacPac made in step 2 to the newly restored database.
7. Execute the post-dacpac script (permissions, restoration of the data copied in step 3).
Again, like I said, it worked and automated the restoration of prod data into our dev environment while keeping our dev changes intact, but it required a good bit of upkeep and maintenance. Also, keep in mind, once your DB reaches a certain size, doing a nightly restore is no longer a viable option due to the time it takes to restore.
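For illustration only, here is a minimal sketch of how the steps above could be driven, assuming SqlPackage.exe and sqlcmd are on the PATH; the server, database, backup path and script names are all placeholders, not the exact setup described:

```python
# Hypothetical driver for the steps above. SqlPackage.exe and sqlcmd are
# assumed to be on the PATH; all names and paths are placeholders.
import subprocess

DEV_SERVER, DEV_DB = "devserver", "AppDb"          # made-up names
PROD_BACKUP = r"\\backups\AppDb_prod.bak"          # made-up path

def run(*args):
    subprocess.run(args, check=True)               # abort on first failure

def sqlcmd(script):
    run("sqlcmd", "-S", DEV_SERVER, "-d", DEV_DB, "-b", "-i", script)

# Step 2: snapshot the dev schema as a DacPac (schema only, no data).
run("SqlPackage.exe", "/Action:Extract",
    f"/SourceServerName:{DEV_SERVER}", f"/SourceDatabaseName:{DEV_DB}",
    "/TargetFile:dev.dacpac")
sqlcmd("pre_restore.sql")      # step 3: copy out test data to persist
run("sqlcmd", "-S", DEV_SERVER, "-d", "master", "-b", "-Q",  # step 4
    f"RESTORE DATABASE [{DEV_DB}] FROM DISK = N'{PROD_BACKUP}' WITH REPLACE")
sqlcmd("pre_dacpac.sql")       # step 5: DDL that could cause data loss
run("SqlPackage.exe", "/Action:Publish", "/SourceFile:dev.dacpac",  # step 6
    f"/TargetServerName:{DEV_SERVER}", f"/TargetDatabaseName:{DEV_DB}")
sqlcmd("post_dacpac.sql")      # step 7: permissions, restore persisted data
```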

SQL files management

Most of my day is spent writing SQL queries to perform small tasks, mainly getting information from the database and manipulating it somehow for data visualization and building reports for others.
At the end of the day I try to keep a nice folder scheme to help me reuse code and so on, but it's becoming harder to handle so many files and keep track of everything I've done so far. I don't want huge SQL files, because I might want to reuse only parts of them. In the end it's hard to avoid a war zone on my desktop and in these folders, and it's a mess to handle so many folders and files.
For version control we're using a Git server, but there is plenty of code that is not in production that we would like to keep track of and reuse somehow.
We're using IPython Notebook, RStudio and SSMS to build our code, and I wonder if there are more efficient ways to work. There must be an efficient way to work out there. What do you use to keep track of your (SQL) code? And, more importantly, how do you reuse it?
Thanks in advance,
Rafael
I just use a folder system. I keep the "shell scripts", so to speak, as the first files (the generic code to do X), and the specific code, where I take X and apply dates and other conditions, in the bottom half of the folder.
What do you use to keep track of your (SQL) code? And, more importantly, how do you reuse it?
For ease of reuse, I have all my running SQL code backed up on an SQL server through routine INFORMATION_SCHEMA dumps. For all development code that I need to reuse with others, I have a Git server that gets automatic updates throughout the day. For reuse on my laptop itself, I have a local backup through Time Machine.
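As an illustration of such a dump, here is a minimal sketch using pyodbc against SQL Server; the connection string and output folder are placeholders:

```python
# Hypothetical INFORMATION_SCHEMA dump: write each routine's definition to
# its own .sql file so Git (or a desktop indexer) can track and search it.
import pathlib
import pyodbc

conn = pyodbc.connect("...")   # placeholder connection string
out = pathlib.Path("sql_backup")
out.mkdir(exist_ok=True)

rows = conn.execute(
    "SELECT ROUTINE_SCHEMA, ROUTINE_NAME, ROUTINE_DEFINITION "
    "FROM INFORMATION_SCHEMA.ROUTINES "
    "WHERE ROUTINE_DEFINITION IS NOT NULL")
for schema, name, definition in rows:
    # note: ROUTINE_DEFINITION is capped at 4000 characters on SQL Server;
    # OBJECT_DEFINITION(OBJECT_ID(...)) returns the full text if needed.
    (out / f"{schema}.{name}.sql").write_text(definition)
```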
As for directory or folder structure, all code starts as project based and eventually I migrate the best and most useful code to a personal folder structure that is topic based (date arithmetic, indexing, etc.). No matter how they are stored, all these folders are indexed using local and remote indexing features so I can search and retrieve them with just a few keystrokes when needed. Ultimately what's needed for optimum reuse is ease of retrieval. The quicker I can retrieve, the more reuse I get.
Lastly, it's not just SQL code, but all the supporting documents that led to that code solution. Sometimes this collection may include code from other languages, code from other servers, emails, text documents, images, workflows, etc. Keeping them all together enhances the value of reuse.

Database objects version control (not schema)

I'm trying to figure out how to implement version control in an environment where we have two DBs: one Testing and one Production.
In Testing, there is an arbitrary number of tasks being tested. These have no constraints on the number of objects manipulated or on complexity, meaning we can have a 3-day task that changes 2 package bodies and one trigger, and we can have a 3-month task that changes 100 different objects, including C source files and binary objects.
My main concern is the text-based objects of the DB. We need to version the Testing and Production code, but any task can go from Testing to Production in no defined order whatsoever.
This means that right now we have to manually track the changes in the files, selecting inside each file which lines of code go from Testing to Production. We use a very rudimentary solution: writing a sequence of comments in the header with a file-based version number, and adding tags with that sequence in the code to delimit the change.
I'm struggling to implement SVN because I wanted to create Testing as a branch of Production, with branches in Testing to isolate each task, but I find that this can lead to many Testing tasks being ported to Production during merges.
This said, my questions are:
Is there a way to resolve this automatically?
Are there any database-specific version control solutions?
How can I "link" both environments if the code base is so different?
I used SVN for source control of DB scripts.
I don't have a technological solution to your problem, but I can explain the methodology we used.
We had two sets of scripts: one for incremental changes and another for the complete declaration of database objects and procedures.
During development we recorded only the incremental changes, in a script that was eventually used during deployment; during test rounds we kept updating that script.
Finally, after running the script on production, we updated the second set of scripts containing the full declarations. The full scripts were used as a reference and to create a DB from scratch.
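As a sketch of how the incremental set can be applied repeatably, here is a minimal runner, assuming numbered .sql files and a tracking table; all names are hypothetical:

```python
# Hypothetical incremental-script runner: applies numbered change scripts
# in filename order and records each one in a schema_version table.
import pathlib
import pyodbc

conn = pyodbc.connect("...")   # placeholder connection string
conn.execute(
    "IF OBJECT_ID('schema_version') IS NULL "
    "CREATE TABLE schema_version (script NVARCHAR(260) PRIMARY KEY, "
    "applied_at DATETIME2 DEFAULT SYSUTCDATETIME())")
applied = {row.script for row in
           conn.execute("SELECT script FROM schema_version")}

for path in sorted(pathlib.Path("incremental").glob("*.sql")):
    if path.name in applied:
        continue                         # already run on this environment
    # note: scripts containing GO batch separators need splitting first
    conn.execute(path.read_text())
    conn.execute("INSERT INTO schema_version (script) VALUES (?)", path.name)
    conn.commit()
```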

What's the Point of Multiple Redis Databases?

So, I've come to a place where I wanted to segment the data I store in Redis into separate databases, as I sometimes need to use the KEYS command on one specific kind of data, and I wanted to separate it to make that faster.
If I segment into multiple databases, everything is still single threaded, and I still only get to use one core. If I just launch another instance of Redis on the same box, I get to use an extra core. On top of that, I can't name Redis databases, or give them any sort of more logical identifier. So, with all of that said, why/when would I ever want to use multiple Redis databases instead of just spinning up an extra instance of Redis for each extra database I want? And relatedly, why doesn't Redis try to utilize an extra core for each extra database I add? What's the advantage of being single threaded across databases?
You don't want to use multiple databases in a single Redis instance. As you noted, multiple instances let you take advantage of multiple cores. If you use database selection, you will have to refactor when upgrading. Monitoring and managing multiple instances is neither difficult nor painful.
Indeed, you would get far better metrics on each db by segregation based on instance. Each instance would have stats reflecting that segment of data, which can allow for better tuning and more responsive and accurate monitoring. Use a recent version and separate your data by instance.
As Jonaton said, don't use the keys command. You'll find far better performance if you simply create a key index. Whenever adding a key, add the key name to a set. The keys command is not terribly useful once you scale up since it will take significant time to return.
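A minimal sketch of that key-index pattern with redis-py; the key names and index set are invented for illustration:

```python
# Key-index pattern: track key names in a set so they can be enumerated
# with SMEMBERS instead of the O(n) KEYS command.
import redis

r = redis.Redis()

def save_user(user_id, data):
    key = f"user:{user_id}"
    r.set(key, data)
    r.sadd("index:user", key)    # record the key in the index set

def delete_user(user_id):
    key = f"user:{user_id}"
    r.delete(key)
    r.srem("index:user", key)    # keep the index in sync

all_user_keys = r.smembers("index:user")   # instead of r.keys("user:*")
```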
Let the access pattern determine how to structure your data, rather than storing it the way you think works and then figuring out how to access and mince it later. You will see far better performance, and you will often find that the data-consuming code is much cleaner and simpler.
Regarding single-threading, consider that Redis is designed for speed and atomicity. Sure, actions modifying data in one DB need not wait on another DB, but what if that action is saving to the dump file, or processing transactions on slaves? At that point you start getting into the weeds of concurrent programming.
By using multiple instances, you turn multi-threading complexity into a simpler message-passing-style system.
In principle, Redis databases on the same instance are no different than schemas in RDBMS database instances.
So, with all of that said, why/when would I ever want to use multiple Redis databases instead of just spinning up an extra instance of Redis for each extra database I want?
There's one clear advantage of using redis databases in the same redis instance, and that's management. If you spin up a separate instance for each application, and let's say you've got 3 apps, that's 3 separate redis instances, each of which will likely need a slave for HA in production, so that's 6 total instances. From a management standpoint, this gets messy real quick because you need to monitor all of them, do upgrades/patches, etc. If you don't plan on overloading redis with high I/O, a single instance with a slave is simpler and easier to manage provided it meets your SLA.
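For what it's worth, here is a small redis-py sketch of the single-instance layout, one logical database per app; the apps and db numbers are made up:

```python
# One logical Redis database per application on a single instance.
import redis

app_dbs = {"web_cache": 0, "job_queue": 1, "sessions": 2}   # made-up map

clients = {name: redis.Redis(host="localhost", port=6379, db=db)
           for name, db in app_dbs.items()}

clients["sessions"].set("session:abc123", "user42")
clients["web_cache"].set("page:/home", "<html>...</html>")

# Separate keyspaces, so identical key names cannot collide:
clients["job_queue"].set("page:/home", "not the cached page")
```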
Even Salvatore Sanfilippo (creator of Redis) thinks it's a bad idea to use multiple DBs in Redis. See his comment here:
https://groups.google.com/d/topic/redis-db/vS5wX8X4Cjg/discussion
I understand how this can be useful, but unfortunately I consider Redis multiple database errors my worst decision in Redis design at all... without any kind of real gain, it makes the internals a lot more complex. The reality is that databases don't scale well for a number of reason, like active expire of keys and VM. If the DB selection can be performed with a string I can see this feature being used as a scalable O(1) dictionary layer, that instead it is not. With DB numbers, with a default of a few DBs, we are communication better what this feature is and how can be used I think. I hope that at some point we can drop the multiple DBs support at all, but I think it is probably too late as there is a number of people relying on this feature for their work.
I don't really know any benefits of having multiple databases on a single instance. I guess it's useful if multiple services use the same database server(s), so you can avoid key collisions.
I would not recommend building around using the KEYS command, since it's O(n) and that doesn't scale well. What are you using it for that you can accomplish in another way? Maybe redis isn't the best match for you if functionality like KEYS is vital.
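If you do need to walk the keyspace, SCAN is the usual non-blocking alternative; a short redis-py sketch (the pattern is illustrative):

```python
# SCAN walks the keyspace in small batches instead of blocking the server
# the way KEYS does on a large dataset.
import redis

r = redis.Redis()
for key in r.scan_iter(match="user:*", count=100):
    print(key)
```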
I think they mention the benefits of a single-threaded server in their FAQ, but the main thing is simplicity: you don't have to bother with concurrency in any real way. Every action is blocking, so no two things can alter the database at the same time. Ideally you would have one (or more) instances per core on each server, and use a consistent hashing algorithm (or a proxy) to divide the keys among them. Of course, you'll lose some functionality: pipelining will only work for things on the same server, sorts become harder, etc.
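As a toy illustration of dividing keys among instances (a real deployment would use a proper consistent-hash ring or a proxy; the ports here are made up):

```python
# Toy sharding: hash each key to pick one of several Redis instances.
# A real consistent-hash ring would remap only a fraction of the keys
# when a node is added or removed; this modulo scheme remaps most of them.
import hashlib
import redis

shards = [redis.Redis(port=p) for p in (6379, 6380, 6381)]  # made-up ports

def shard_for(key: str) -> redis.Redis:
    digest = hashlib.md5(key.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]

shard_for("user:42").set("user:42", "alice")
value = shard_for("user:42").get("user:42")  # routed to the same shard
```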
Redis databases can be used in the rare cases of deploying a new version of the application, where the new version requires working with different entities.
I know this question is years old, but there's another reason multiple databases may be useful.
If you use a "cloud Redis" from your favourite cloud provider, you probably have a minimum memory size and will pay for what you allocate. If however your dataset is smaller than that, then you'll be wasting a bit of the allocation, and so wasting a bit of money.
Using databases you could use the same Redis cloud-instance to provide service for (say) dev, UAT and production, or multiple instances of your application, or whatever else - thus using more of the allocated memory and so being a little more cost-effective.
A use-case I'm looking at has several instances of an application which use 200-300K each, yet the minimum allocation on my cloud provider is 1M. We can consolidate 10 instances onto a single Redis without really making a dent in any limits, and so save about 90% of the Redis hosting cost. I appreciate there are limitations and issues with this approach, but thought it worth mentioning.
I am using Redis to implement a blacklist of email addresses, and I have different TTL values for different levels of blacklisting, so having different DBs on the same instance helps me a lot.
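A minimal redis-py sketch of that setup; the levels, TTLs and db numbers are invented for illustration:

```python
# One logical DB per blacklisting level, each entry written with its own
# TTL via SETEX.
import redis

levels = {           # made-up levels: (db number, TTL in seconds)
    "soft": (1, 3600),         # expires after an hour
    "hard": (2, 30 * 86400),   # expires after 30 days
}

def blacklist(email, level):
    db, ttl = levels[level]
    redis.Redis(db=db).setex(f"blacklist:{email}", ttl, level)

blacklist("spammer@example.com", "hard")
```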
Using multiple databases in a single instance may be useful in the following scenario:
Different copies of the same database can be used for production, development or testing with real-time data. People may use a replica to clone a Redis instance to achieve the same purpose. However, the former approach is easier for existing running programs, which can just select the right database to switch to the intended mode.
Our motivation has not been mentioned above. We use multiple databases because we routinely need to delete a large set of a certain type of data, and FLUSHDB makes that easy. For example, we can clear all cached web pages, using FLUSHDB on database 0, without affecting all of our other uses of Redis.
There is some discussion here but I have not found definitive information about the performance of this vs scan and delete:
https://github.com/StackExchange/StackExchange.Redis/issues/873
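For illustration, a small redis-py sketch of that FLUSHDB-per-category layout; the db assignments are made up:

```python
# Hypothetical layout: cached web pages isolated in db 0 so they can be
# dropped wholesale without touching other data in db 1.
import redis

page_cache = redis.Redis(db=0)   # cached web pages only
other_data = redis.Redis(db=1)   # everything else

page_cache.set("page:/home", "<html>...</html>")
other_data.set("user:42", "alice")

page_cache.flushdb()             # clears every cached page at once
assert other_data.get("user:42") == b"alice"  # untouched
```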

Erlang ETS tables versus message passing: Optimization concerns?

I'm coming into an existing (game) project whose server component is written entirely in Erlang. At times, it can be excruciating to get a piece of data out of this system (say, how many widgets player 56 has) from the process that owns it. Assuming I can find the process that owns the data, I can pass a message to that process and wait for it to pass a message back, but this does not scale well to multiple machines, and it kills response time.
I have been considering replacing many of the tasks that exist in this game with a system where information that is frequently accessed by multiple processes would be stored in a protected ets table. The table's owner would do nothing but receive update messages (the player has just spent five widgets) and update the table accordingly. It would catch all exceptions and simply go on to the next update message. Any process that wanted to know if the player had sufficient widgets to buy a fooble would need only to peek at the table. (Yes, I understand that a message might be in the buffer that reduces the number of widgets, but I have that issue under control.)
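ETS specifics aside, here is a language-neutral sketch of that single-writer/many-readers shape, written in Python purely for illustration (all names are invented):

```python
# One writer drains an update queue and owns the shared table; any number
# of readers may peek at the table without sending a message.
import queue
import threading

widgets = {}                  # the "table": player id -> widget count
updates = queue.Queue()       # update messages sent to the owner

def table_owner():
    while True:
        player, delta = updates.get()   # receive an update message
        try:
            widgets[player] = widgets.get(player, 0) + delta
        except Exception:
            pass                        # swallow bad updates, carry on
        updates.task_done()

threading.Thread(target=table_owner, daemon=True).start()

updates.put((56, 5))          # player 56 gains five widgets
updates.put((56, -2))         # ...and spends two
updates.join()                # wait for the owner to apply them
print(widgets.get(56, 0))     # any reader may peek: prints 3
```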
I'm afraid that my question is less of a question and more of a request for comments. I'll upvote anything that is both helpful and sufficiently explained or referenced.
What are the likely drawbacks of such an implementation? I'm interested in the details of lock contention that I am likely to see in having one-writer-multiple-readers, what sort of problems I'll have distributing this across multiple machines, and especially: input from people who've done this before.
First of all, default ETS behaviour is consistent, as you can see in the documentation: Erlang ETS.
It provides atomicity and isolation, and also multiple updates/reads if done in the same function (remember that in Erlang a function call is roughly equivalent to a reduction, the unit of measure the Erlang scheduler uses to share time between processes, so an ETS operation spanning multiple functions could be split into parts, creating a possible race condition).
If you are interested in a multiple-node ETS architecture, maybe you should take a look at Mnesia, which gives you out-of-the-box multiple-node concurrency on top of ETS: Mnesia.
(Hint: I'm talking specifically about ram_copies tables and the add_table_copy and change_config functions.)
That being said, I don't understand the problem with a process (possibly backed by an unnamed ETS table).
Let me explain better: the main problem with your project is the first, basic assumption.
It's simple: you don't have a single writing process!
Every time a player takes an object, hits another player and so on, a non-side-effect-free function is called to update the game state, so even if you have a single process managing the game state, it must also tell the other players' clients "hey, you remember that object there? Just forget it!". This is why the main problem with many multiplayer games is lag: when networking is not the main issue, lag is often due to blocking send/receive routines.
From this point of view, directly using an ETS table, a persistent table, a process dictionary (BAD!!!) and so on is all the same thing, because you have to consider synchronization issues, just like in object-oriented programming languages using shared memory (Java, anyone?).
In the end, you should consider just ONE main concern when developing your application: consistency.
Only after a consistent application has been developed should you concern yourself with performance tuning.
Hope it helps!
Note: I've talked about something like an MMORPG server because I thought you were working on something similar.
An ETS table would not solve your problems in that regard. Your code (that wants to get or set the player widget count) will always run in a process and the data must be copied there.
Whether that is from a process heap or an ETS table makes little difference (that said, reading from ETS is often faster, because it's well optimized and doesn't perform any work other than getting and setting data), especially when getting the data from a remote node. For multiple readers, ETS is most likely faster, since a process would handle the requests sequentially.
What would make a difference, however, is whether the data is cached on the local node or not. That's where self-replicating database systems, such as Mnesia, Riak or CouchDB, come in. Mnesia is in fact implemented using ETS tables.
As for locking, the latest version of Erlang comes with enhancements to ETS which enable multiple readers to read from a table simultaneously, plus one writer that writes. The only locked element is the row being written to (thus better concurrent performance than a normal process, if you expect many simultaneous reads of one data point).
Note, however, that all interaction with ETS tables is non-transactional! That means you cannot rely on writing a value based on a previous read, because the value might have changed in the meantime. Mnesia handles that using transactions. You can still use the dirty_* functions in Mnesia to squeeze near-ETS performance out of most operations, if you know what you're doing.
It sounds like you have a bunch of things that can happen at any time, and you need to aggregate the data in a safe, uniform way. Take a look at the Generic Event behaviour. I'd recommend using this to create an event server, and have all these processes share their information via events to your server; at that point you can choose to log it or store it somewhere (like an ETS table). As an aside, ETS tables are not good for persistent data like how many "widgets" a player has; consider Mnesia, or an excellent crash-only DB like CouchDB. Both of these replicate very well across machines.
You bring up lock contention: you shouldn't have any locks. Messages are processed sequentially, in the order they are received by each process. In fact, the entire point of the message-passing semantics built into the language is to avoid shared-state concurrency.
To summarize: normally you communicate with messages, from process to process. This is hairy for you, because you need information from processes scattered all over the place, so my recommendation is based on the idea of concentrating all information that is "interesting" outside of the originating processes into a single, real-time source.