Apc user cache and php-fpm - apc

I have a fairly basic question about dirty reads/writes to a value stored in APC when php-fpm is used to manage the processes.
I am looking at storing a counter in APC that I want to be shared across all the php-fpm processes. The counter is used for the lo part of the hi/lo algorithm. Since the APC user cache is shared across all the fpm child processes, I can increment this counter without having to worry about each process having its own copy of it. But if the cache is shared, wouldn't you have to worry about thread safety? If yes, how do you go about handling it, and if no, why not?
Thanks!

I didn't figure out how to do it with APC, but I have decided to use shared memory with semaphores to store the counter: http://in2.php.net/sem.
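
For anyone who wants to see the idea in code, here is a minimal sketch of a lock-protected shared counter. It is written in Python rather than PHP, so it only illustrates the pattern and not the sem_* functions linked above: multiprocessing's Value places the integer in shared memory, and its built-in lock plays the role of the semaphore. The names bump and run_workers are made up for this example, and the "fork" start method assumes a Unix-like host.

```python
import multiprocessing as mp

# Assumption: a Unix-like host, where the "fork" start method is available.
_ctx = mp.get_context("fork")


def bump(counter, times):
    """Increment the shared counter `times` times, one locked step at a time."""
    for _ in range(times):
        # `counter.value += 1` is a read-modify-write, so it has to run
        # while holding the lock (the "semaphore" in this sketch).
        with counter.get_lock():
            counter.value += 1


def run_workers(workers, times):
    """Start `workers` processes that all bump one shared counter."""
    counter = _ctx.Value("l", 0)  # C long living in shared memory
    procs = [
        _ctx.Process(target=bump, args=(counter, times))
        for _ in range(workers)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return counter.value
```

With the lock in place, run_workers(4, 500) should always come back as 2000. If the lock is dropped, increments from different processes can overwrite each other, which is exactly the dirty write the question is worried about.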

Redis Database Vs Redis Cache

Could you please answer these two questions and correct me if I am wrong?
I assume both Redis Database and Redis Cache are stored in memory and not on disk. Am I correct?
If yes, what are the major differences between the two? I am assuming both are stored in memory, so it should not make much difference which one is used; the speed should be the same since both are in memory. Do we still need a cache on top?
Could you please tell me the differences between the two and the advantages of each?
Second question: can a server restart remove all data in the Redis database? I believe the cache will certainly be wiped.
Thanks
I'm not sure what you mean.
Redis is first of all a product: it's an in-memory data structure store.
Depending on its configurations it can be targeted to different use cases:
Database
Cache
Even message broker
If you're coming from the cloud world, cloud providers may call this a "Cache", which means they offer a Redis that is pre-configured to be used as a cache (removing the oldest records when memory is close to being fully utilized, etc.).
But after all, you will work with some kind of Redis client that interacts with a remote Redis server.
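
As a rough illustration of what "pre-configured as a cache" can mean, the sketch below sets the two Redis settings that control eviction, maxmemory and maxmemory-policy. It assumes a redis-py client object is passed in; the function name configure_as_cache and the choice of the allkeys-lru policy are mine, not something a particular cloud provider is known to use.

```python
def configure_as_cache(client, max_bytes, policy="allkeys-lru"):
    """Cap memory and evict keys instead of rejecting writes when full.

    `client` is assumed to be a redis-py `redis.Redis` instance.
    """
    # maxmemory: upper bound, in bytes, on the memory Redis uses for data.
    client.config_set("maxmemory", max_bytes)
    # maxmemory-policy: what to evict once that bound is reached;
    # "allkeys-lru" evicts (approximately) the least recently used keys.
    client.config_set("maxmemory-policy", policy)
```

The same two settings can also be written in redis.conf. A Redis used as a database would normally leave eviction off and rely on persistence instead.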

Redis stream 50k consumer support parallel - capacity requirement

What are the Redis capacity requirements to support 50k consumers within one consumer group consuming and processing messages in parallel? I am looking to test an infrastructure for this scenario and need to understand the considerations.
Disclaimer: I worked at a company that used Redis at a somewhat large scale (probably fewer consumers than in your case, but our consumers were very active). I wasn't on the infrastructure team, but I was involved in some DevOps tasks.
I don't think you will find an exact number, so I'll try to share some tips and tricks to help you:
Be sure to read the entire Redis Admin page. There's a lot of useful information there. I'll highlight some of the tips from there:
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set a high net.core.somaxconn (RabbitMQ suggests 4096). Check the documentation of tcp-backlog config in redis.conf for an explanation about this.
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set vm.overcommit_memory = 1. Read below for a detailed explanation.
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set fs.file-max. This is very important for your use case. The Open File Handles / File Descriptors Limit is essentially the maximum number of file descriptors (each client represents a file descriptor) the OS can handle. Please check the Redis documentation on this. The RabbitMQ documentation also presents some useful information about it.
If you edit the /etc/sysctl.conf file, run sysctl -p to reload it.
"Make sure to disable Linux kernel feature transparent huge pages, it will affect greatly both memory usage and latency in a negative way. This is accomplished with the following command: echo never > /sys/kernel/mm/transparent_hugepage/enabled." Add this command also to /etc/rc.local to make it permanent over reboot.
In my experience Redis is not very resource-hungry, so I believe you won't have issues with CPU. Memory usage is directly related to how much data you intend to store in it.
If you set up a server with many cores, consider using more than one Redis Server. Redis is (mostly) single-threaded and will not use all your CPU resources if you use a single instance in a multicore environment.
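
To double-check a host against the kernel settings listed above, something like the following sketch could be used. It only reads values; it changes nothing. The function names are made up for this example, the 4096 threshold is simply the figure quoted above, and the sysctl-name-to-path mapping assumes the usual /proc/sys layout.

```python
from pathlib import Path


def read_sysctl(name, proc_sys="/proc/sys"):
    """Read a sysctl such as 'net.core.somaxconn' from the /proc/sys tree."""
    path = Path(proc_sys, *name.split("."))
    return path.read_text().strip()


def active_thp_mode(text):
    """Return the bracketed choice from e.g. 'always madvise [never]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def check_host(proc_sys="/proc/sys",
               thp_file="/sys/kernel/mm/transparent_hugepage/enabled",
               min_somaxconn=4096):
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []
    if int(read_sysctl("net.core.somaxconn", proc_sys)) < min_somaxconn:
        warnings.append("net.core.somaxconn is below %d" % min_somaxconn)
    if read_sysctl("vm.overcommit_memory", proc_sys) != "1":
        warnings.append("vm.overcommit_memory is not 1")
    if active_thp_mode(Path(thp_file).read_text()) != "never":
        warnings.append("transparent huge pages are not disabled")
    return warnings
```

Calling check_host() with no arguments on the Redis machine prints nothing by itself; it returns the list of warnings so you can log or assert on it.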
Redis server also warns about wrong/risky configurations on startup.
Explanation on Overcommit Memory (vm.overcommit_memory)
Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis [from Redis FAQ]
There are three possible settings for vm.overcommit_memory.
0 (zero): Heuristic overcommit (the default). The kernel estimates whether enough memory is available and, if so, allows the allocation. Obvious overcommits are denied and an error is returned to the application.
1 (one): The kernel's equivalent of "all bets are off": a setting of 1 tells the kernel to always return success to an application's request for memory. This is as weird and scary as it sounds, but it is the setting Redis asks for.
2 (two): Never overcommit. The total amount of memory that can be committed is limited to swap plus a percentage of physical RAM, as defined by vm.overcommit_ratio. For instance, a vm.overcommit_ratio of 50 and 1 GB of RAM would mean the kernel would permit up to 0.5 GB, plus swap, of memory to be allocated before a request failed.
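
According to the Linux kernel's overcommit documentation, vm.overcommit_ratio only matters when vm.overcommit_memory is 2, and the limit is swap plus that percentage of RAM. A small worked example, ignoring huge pages and using made-up sizes (1 GiB of RAM, 2 GiB of swap):

```python
def commit_limit_bytes(ram_bytes, swap_bytes, overcommit_ratio):
    """Commit limit in mode 2: swap + ratio% of RAM (huge pages ignored)."""
    return swap_bytes + ram_bytes * overcommit_ratio // 100


GIB = 1024 ** 3
# Hypothetical host: 1 GiB of RAM, 2 GiB of swap, vm.overcommit_ratio = 50.
limit = commit_limit_bytes(1 * GIB, 2 * GIB, 50)
print(limit / GIB)  # 2.5
```

None of this applies to the value 1 recommended for Redis, where the kernel does not enforce a commit limit at all.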

What about redis EVAL atomicity regarding keys with TTL?

As far as I know, Redis is a single-threaded solution from the client's point of view.
But what about the general architecture?
Assume we have some Lua script that is going to execute several commands on keys that have a TTL.
How does Redis key expiration ("garbage collection") work? Could it interrupt the EVAL execution and evict some value, or do internal tasks share the single thread with user tasks?
Lua is majik, and because that is the case time stops when Redis is doing Lua. Put differently, expiration stops once you start running the script in the sense that time does not advance. However, if a key expired before the script started, it will not be available for the script to use.
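
To make that concrete, here is a small sketch of a script that reads a key and then refreshes its TTL. Going by the explanation above, the key cannot expire between the two commands once the script has started. It assumes a redis-py client is passed in; the script and the name touch_if_present are invented for this example.

```python
# Lua source that Redis runs as one unit: per the answer above, no other
# command and no expiry takes effect between the GET and the PEXPIRE.
TOUCH_IF_PRESENT = """
local value = redis.call('GET', KEYS[1])
if value then
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
end
return value
"""


def touch_if_present(client, key, ttl_ms):
    """Return the value at `key` and refresh its TTL, or None if it is gone."""
    # redis-py signature: eval(script, numkeys, *keys_and_args)
    return client.eval(TOUCH_IF_PRESENT, 1, key, ttl_ms)
```

If the key had already expired before the script began, the GET returns nothing and the function returns None, which matches the last sentence of the answer.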

how to design multi-process program using redis in python

I just started to use the Redis cache in Python. I read the tutorial but still feel confused about the concepts of "connectionpool", "connection", etc.
I am trying to write a program that will be invoked multiple times from the console, in different processes. They will all get and set values in the same shared in-memory Redis cache, using the same set of keys.
So to make it thread(process) safe, should I have one global connectionpool and get connections from the pool in different processes? Or should I have one global connection? What's the right way to do it?
Thanks,
Each instance of the program should spawn its own ConnectionPool. But this has nothing to do with thread safety. Whether or not your code is thread safe will depend on the type of operations you will be executing, and if you have multiple instances which may read and write concurrently, you need to look into using transactions, which are built into redis.
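
A minimal sketch of both points, assuming the redis-py package: each process builds its own pool and client, and a read-modify-write goes through a WATCH/MULTI/EXEC transaction. The function names, the host/port defaults, and the counter example are placeholders of mine.

```python
def make_client(host="localhost", port=6379):
    """Build one pool and one client for this process (placeholder address)."""
    import redis  # third-party redis-py package

    pool = redis.ConnectionPool(host=host, port=port)
    return redis.Redis(connection_pool=pool)


def add_to_counter(client, key, amount):
    """Read-modify-write `key` safely while other processes do the same."""

    def update(pipe):
        # The key is already WATCHed here, so this GET runs immediately.
        current = int(pipe.get(key) or 0)
        pipe.multi()  # from now on commands are queued for EXEC
        pipe.set(key, current + amount)

    # transaction() WATCHes `key`, calls update(), runs EXEC, and retries
    # the whole callable if another client changed the key in between.
    client.transaction(update, key)
```

For a plain increment, INCRBY on its own is already atomic; the transaction form is only needed when the new value depends on logic that runs in your program.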

how to keep visited urls and maintain the job queue when writing a crawler

I'm writing a crawler. I keep the visited URLs in a Redis set and maintain the job queue using a Redis list. As the data grows, memory gets used up; I have 4 GB of memory. How can I maintain these without Redis? I have no idea; if I store them in files, they also need to be in memory.
If I use MySQL to store them, I think it may be much slower than Redis.
I have 5 machines with 4 GB of memory each, so if anyone has some material on setting up a Redis cluster, that would also help a lot. I have some material on setting up a cluster for failover, but what I need is a load-balanced cluster.
Thanks
If you are just doing the basic operations of adding/removing from sets and lists, take a look at twemproxy/nutcracker. With it you can use all of the nodes.
Regarding your usage pattern itself, are you removing or expiring jobs and URLs? How much repetition is there in the system? For example, are you repeatedly crawling the same URLs? If so, perhaps you only need a mapping of URLs to their last crawl time, and instead of a job queue you pull URLs that are new or outside a given window since their last run.
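
As a sketch of that idea, the plain-Python version below keeps a single entry per URL (its last crawl time) and derives the work list from it instead of keeping a separate queue. The names due_urls and mark_crawled, and the use of None for "never crawled", are assumptions for the example.

```python
import time


def due_urls(last_crawled, window_seconds, now=None):
    """Return URLs never crawled or last crawled over `window_seconds` ago."""
    now = time.time() if now is None else now
    return sorted(
        url
        for url, stamp in last_crawled.items()
        if stamp is None or now - stamp > window_seconds
    )


def mark_crawled(last_crawled, url, now=None):
    """Record a crawl; one entry per URL, so repeats do not grow the map."""
    last_crawled[url] = time.time() if now is None else now
```

In Redis, the same mapping could live in a sorted set scored by last crawl time, so that the URLs that are due can be fetched with a range query by score.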
Without the details on how your crawler actually runs or interacts with Redis, that is about what I can offer. If memory grows continually, it likely means you aren't cleaning up the DB.