Sometimes redis keys are not deleted

I have implemented caching the results of a database query (a list of articles) in Redis to speed up data loading on the site. When a CMS user performs any operation on articles (create, delete, or edit), the keys in Redis are dropped and fresh data is loaded from the database.
But sometimes one or two of these operations do not drop their keys, and old data remains in Redis. The network was available and nothing was turned off. Why does this happen? Are there typical causes I should know about?
Do I need to lock the database so that there are no concurrent connections? But Redis seems to be single-threaded. What am I doing wrong? The function that drops the keys is very simple:
function articlesRedisDrop($redis, $prefix)
{
    // $redis is assumed to be a connected client (e.g. phpredis) passed in by the caller
    $keys = $redis->keys($prefix."*");
    if (count($keys) > 0) {
        foreach ($keys as $key) {
            $redis->del($key);
        }
    }
}

I guess this is an atomicity problem. After $redis->keys($prefix."*") and before $redis->del($key), another connection may have added cached data back into Redis; those keys are not in the list you fetched, so they survive the deletion.
You can try combining the KEYS and DEL operations into a single Lua script, which Redis executes atomically:
-- delete every key that starts with the prefix passed as KEYS[1]
local keys = redis.call("keys", string.format("%s*", KEYS[1]))
for _, key in ipairs(keys) do
    redis.call("del", key)
end
return #keys
Then run the script with the EVAL command, passing the prefix as a parameter. If you run into performance problems with the KEYS command, you can try SCAN instead, or store all keys for a prefix in a set and then read and delete them from there.
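For example, if the script above is saved as drop_prefix.lua and the keys use a prefix such as article: (both names are only placeholders here), the call from the command line would be:
redis-cli EVAL "$(cat drop_prefix.lua)" 1 "article:"
The 1 is the numkeys argument; it tells Redis that one key-style parameter follows, which the script reads as KEYS[1].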

Incrementally importing data to a PostgreSQL database

Situation:
I have a PostgreSQL-database that is logging data from sensors in a field-deployed unit (let's call this the source database). The unit has a very limited hard-disk space, meaning that if left untouched, the data-logging will cause the disk where the database is residing to fill up within a week. I have a (very limited) network link to the database (so I want to compress the dump-file), and on the other side of said link I have another PostgreSQL database (let's call that the destination database) that has a lot of free space (let's just, for argument's sake, say that the source is very limited with regard to space, and the destination is unlimited with regard to space).
I need to take incremental backups of the source database, append the rows that have been added since last backup to the destination database, and then clean out the added rows from the source database.
Now the source database might or might not have been cleaned since a backup was last taken, so the destination database needs to be able to import only the new rows in an automated (scripted) process, but pg_restore fails miserably when trying to restore from a dump that has the same primary key numbers as the destination database.
So the question is:
What is the best way to restore only the rows from a source that are not already in the destination database?
The only solution that I've come up with so far is to pg_dump the database and restore the dump to a new secondary database on the destination side with pg_restore, then use simple SQL to sort out which rows already exist in my main destination database. But it seems like there should be a better way...
(extra question: Am I completely wrong in using PostgreSQL in such an application? I'm open to suggestions for other data-collection alternatives...)
A good way to start would probably be to use the --inserts option to pg_dump. From the documentation (emphasis mine):
Dump data as INSERT commands (rather than COPY). This will make
restoration very slow; it is mainly useful for making dumps that can
be loaded into non-PostgreSQL databases. However, since this option
generates a separate command for each row, an error in reloading a row
causes only that row to be lost rather than the entire table contents.
Note that the restore might fail altogether if you have rearranged
column order. The --column-inserts option is safe against column order
changes, though even slower.
I don't have the means to test it right now with pg_restore, but this might be enough for your case.
You could also use the fact that from version 9.5, PostgreSQL provides ON CONFLICT DO ... for INSERT. Use a simple scripting language to add these to the dump and you should be fine. Unfortunately, I haven't found an option for pg_dump to add them automatically.
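For example, a dumped row rewritten that way could look like the following (table and column names are only placeholders):
INSERT INTO topic (topic_id, topic_date, topic_body)
VALUES (42, '2021-06-01 12:00:00+00', 'example row')
ON CONFLICT DO NOTHING;
With ON CONFLICT DO NOTHING, rows whose primary key already exists in the destination are silently skipped instead of aborting the restore.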
You might google "sporadically connected database synchronization" to see related solutions.
It's not a neatly solved problem as far as I know - there are some common work-arounds, but I am not aware of a database-centric out-of-the-box solution.
The most common way of dealing with this is to use a message bus to move events between your machines. For instance, if your "source database" is just a data store, with no other logic, you might get rid of it, and use a message bus to say "event x has occurred", and point the endpoint of that message bus at your "destination machine", which then writes that to your database.
You might consider Apache ActiveMQ or read "Enterprise Integration Patterns".
#!/bin/sh
PSQL=/opt/postgres-9.5/bin/psql
TARGET_HOST=localhost
TARGET_DB=mystuff
TARGET_SCHEMA_IMPORT=copied
TARGET_SCHEMA_FINAL=final
SOURCE_HOST=192.168.0.101
SOURCE_DB=slurpert
SOURCE_SCHEMA=public
########
create_local_stuff()
{
${PSQL} -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG0
CREATE SCHEMA IF NOT EXISTS ${TARGET_SCHEMA_IMPORT};
CREATE SCHEMA IF NOT EXISTS ${TARGET_SCHEMA_FINAL};
CREATE TABLE IF NOT EXISTS ${TARGET_SCHEMA_FINAL}.topic
( topic_id INTEGER NOT NULL PRIMARY KEY
, topic_date TIMESTAMP WITH TIME ZONE
, topic_body text
);
CREATE TABLE IF NOT EXISTS ${TARGET_SCHEMA_IMPORT}.tmp_topic
( topic_id INTEGER NOT NULL PRIMARY KEY
, topic_date TIMESTAMP WITH TIME ZONE
, topic_body text
);
OMG0
}
########
find_highest()
{
${PSQL} -q -t -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG1
SELECT MAX(topic_id) FROM ${TARGET_SCHEMA_IMPORT}.tmp_topic;
OMG1
}
########
fetch_new_data()
{
watermark=${1-0}
echo ${watermark}
${PSQL} -h ${SOURCE_HOST} -U postgres ${SOURCE_DB} <<OMG2
\COPY (SELECT topic_id, topic_date, topic_body FROM ${SOURCE_SCHEMA}.topic WHERE topic_id >${watermark}) TO '/tmp/topic.dat';
OMG2
}
########
insert_new_data()
{
${PSQL} -h ${TARGET_HOST} -U postgres ${TARGET_DB} <<OMG3
DELETE FROM ${TARGET_SCHEMA_IMPORT}.tmp_topic WHERE 1=1;
COPY ${TARGET_SCHEMA_IMPORT}.tmp_topic(topic_id, topic_date, topic_body) FROM '/tmp/topic.dat';
INSERT INTO ${TARGET_SCHEMA_FINAL}.topic(topic_id, topic_date, topic_body)
SELECT topic_id, topic_date, topic_body
FROM ${TARGET_SCHEMA_IMPORT}.tmp_topic src
WHERE NOT EXISTS (
SELECT *
FROM ${TARGET_SCHEMA_FINAL}.topic nx
WHERE nx.topic_id = src.topic_id
);
OMG3
}
########
delete_below_watermark()
{
watermark=${1-0}
echo ${watermark}
${PSQL} -h ${SOURCE_HOST} -U postgres ${SOURCE_DB} <<OMG4
-- delete not yet activated; COUNT(*) instead
-- DELETE
SELECT COUNT(*)
FROM ${SOURCE_SCHEMA}.topic WHERE topic_id <= ${watermark}
;
OMG4
}
######## Main
#create_local_stuff
watermark="`find_highest`"
echo 'Highest:' ${watermark}
fetch_new_data ${watermark}
insert_new_data
echo 'Delete below:' ${watermark}
delete_below_watermark ${watermark}
# Eof
This is just an example. Some notes:
I assume a non-decreasing serial PK for the table; in most cases it could also be a timestamp
for simplicity, all the queries are run as user postgres; you might need to change this
the watermark method will guarantee that only new records are transmitted, minimising bandwidth usage
the method is atomic: if the script crashes, nothing is lost
only one table is fetched here, but you could add more
because I'm paranoid, I use a different name for the staging table and put it into a separate schema
the whole script does two queries on the remote machine (one to fetch, one to delete); you could combine these,
but there is only one script involved (executing from the local = target machine)
The DELETE is not yet active; it only does a COUNT(*)

Redis increment several fields in several hsets

I have data of several users in redis e.g.
hset - user111; field - dayssincelogin .....
I want to periodically update dayssincelogin for all users, one way to do it is
KEYS user*
HINCRBY ${key from above} dayssincelogin 1
Is it possible to do this in a single call? If not, what's the most optimal way? I'm using Redis Cluster and a Java client.
You can't do multiple increments in one command but you can bulk your commands together for performance gains.
Use Redis pipelining or scripting.
In Jedis I don't think Lua is supported (if someone could confirm that :) )
As @mp911de suggested, use eval for Lua scripting,
and you can also use pipelining to execute your bulk methods faster.
Have a read-up on pipelining for more information.
And here is sample code using Jedis pipelining:
Pipeline p = jedis.pipelined();
p.multi();
Response<Long> r1 = p.hincrBy("a", "f1", -1);
Response<Long> r2 = p.hincrBy("a", "f1", -2);
Response<List<Object>> r3 = p.exec();
List<Object> result = p.syncAndReturnAll();
Edit: Redis Cluster allows multi-key operations only when the keys are present in the same shard. You should arrange your keys with hash tags to ensure data affinity; for example, key1.{foo} and key5678.{foo} will reside on the same server.
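For what it's worth, a rough sketch of the scripting route with Jedis could look like the code below. This is not from the answer above: the key names, field name and connection details are placeholders, and on a cluster all keys passed to one script invocation must share a hash tag so they map to the same slot.
import redis.clients.jedis.Jedis;
import java.util.Arrays;
import java.util.List;

public class BulkHincrBy {
    // Lua: increment the field named in ARGV[1] by ARGV[2] on every key passed in
    private static final String SCRIPT =
        "for i = 1, #KEYS do "
      + "  redis.call('hincrby', KEYS[i], ARGV[1], ARGV[2]) "
      + "end "
      + "return #KEYS";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            List<String> userKeys = Arrays.asList("user111", "user222", "user333");
            Object updated = jedis.eval(SCRIPT, userKeys, Arrays.asList("dayssincelogin", "1"));
            System.out.println("hashes updated: " + updated);
        }
    }
}
The whole loop runs atomically on the server, so it behaves like one call from the client's point of view.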

How do I completely clear a SQLite3 database without deleting the database file?

For unit testing purposes I need to completely reset/clear SQLite3 databases. All databases are created in memory rather than on the file system when running the test suite so I can't delete any files. Additionally, several instances of a class will be referencing the database simultaneously, so I can't just create a new database in memory and assign it to a variable.
Currently my workaround for clearing a database is to read all the table names from sqlite_master and drop them. This is not the same as completely clearing the database though, since meta data and other things I don't understand will probably remain.
Is there a clean and simple way, like a single query, to clear a SQLite3 database? If not, what would have to be done to an existing database to make it identical to a completely new database?
In case it's relevant, I'm using Ruby 2.0.0 with sqlite3-ruby version 1.3.7 and SQLite3 version 3.8.2.
This works without deleting the file and without closing the db connection:
PRAGMA writable_schema = 1;
DELETE FROM sqlite_master;
PRAGMA writable_schema = 0;
VACUUM;
PRAGMA integrity_check;
Another option, if you are able to call the C API directly, is to use SQLITE_DBCONFIG_RESET_DATABASE:
sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 1, 0);
sqlite3_exec(db, "VACUUM", 0, 0, 0);
sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 0, 0);
Here is the reference
The simple and quick way
If you use an in-memory database, the fastest and most reliable way is to close and re-establish the SQLite connection. That flushes all database data and also the per-connection settings.
If you want to have some kind of "reset" function, you must assume that no other threads can interrupt that function - otherwise any method will fail. So even if you have multiple threads working on that database, there needs to be a "stop the world" mutex (or something like that) so the reset can be performed. And while you have exclusive access to the database connection - why not close and re-open it?
The hard way
If there are some other limitations and you cannot do it the way described above, then you were already pretty close to having a complete solution. If your threads don't touch pragmas explicitly, then only the "schema_version" pragma can change silently; but if your threads can change pragmas, well, then you have to go through the list at http://sqlite.org/pragma.html#toc and write a "reset" function which sets each and every pragma back to its initial value (you need to read the default values at the beginning).
Note that pragmas in SQLite can be divided into 3 groups:
defined initially, immutable, or very limited mutability
defined dynamically, per connection, mutable
defined dynamically, per database, mutable
Group 1 are for example page_size, page_count, encoding, etc. Those are defined at database creation time and usually cannot be modified later, with some exceptions. For example, page_size can be changed prior to a "VACUUM", so the new page size will be set then. page_count cannot be changed by the user, but it changes automatically when adding data (obviously). The encoding is defined at creation time and cannot be modified later.
You should not need to reset pragmas from group 1.
Group 2 are for example cache_size, recursive_triggers, journal_mode, foreign_keys, busy_timeout, etc. These pragmas are always set to defaults when opening a new connection to the database. If you don't disconnect, you will need to reset them to defaults manually.
Group 3 are for example schema_version, user_version, maybe some others, you need to look it up. Those will also need manual reset. If you disconnect from in-memory database, the database gets destroyed, so then you don't need to reset those.
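As a rough illustration of that manual reset for groups 2 and 3, it could look like the block below. The values shown are common defaults, but they differ between SQLite versions and builds, so read the defaults of your own connection first and substitute them.
PRAGMA foreign_keys = OFF;
PRAGMA recursive_triggers = OFF;
PRAGMA busy_timeout = 0;
PRAGMA cache_size = -2000;
PRAGMA user_version = 0;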
Create an empty memory database.
Use the backup API to copy that database over the actual database.
In the case of sqlite3-ruby, see test/test_backup.rb for an example.
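Untested sketch of that approach with sqlite3-ruby (db is assumed to be your existing in-memory connection; the helper name is made up):
require 'sqlite3'

def reset_database(db)
  blank = SQLite3::Database.new(':memory:')          # pristine empty database
  backup = SQLite3::Backup.new(db, 'main', blank, 'main')
  backup.step(-1)                                    # -1 copies everything in one step
  backup.finish
  blank.close
end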
You can list the tables with
SELECT * FROM dbname.sqlite_master WHERE type='table';
and then run a DROP TABLE for each name returned.
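In Ruby that could be wrapped up roughly like this (a sketch; db is assumed to be the SQLite3::Database handle, and the internal sqlite_ tables are skipped because they cannot be dropped):
tables = db.execute("SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'").flatten
tables.each do |name|
  db.execute("DROP TABLE IF EXISTS #{name}")   # names come from sqlite_master, not from user input
end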

Redis move all keys from one database to another

Is there a command to move Redis keys from one database to another, or is it possible only with Lua scripting?
This type of question has been asked previously (redis move all keys), but the answers are not appropriate and convincing for a beginner like me.
You can use MOVE to move one key to another Redis database;
the text below is from redis.io:
MOVE key db
Move key from the currently selected database (see SELECT) to the specified destination database. When key already exists in the destination database, or it does not exist in the source database, it does nothing. It is possible to use MOVE as a locking primitive because of this.
Return value
Integer reply, specifically:
1 if key was moved.
0 if key was not moved.
I think this will do the job:
redis-cli keys '*' | xargs -I % redis-cli move % 1 > /dev/null
(1 is the new database number, and redirection to /dev/null is in order to avoid getting millions of '1' lines - since it will move the keys one by one and return 1 each time)
Beware that with many keys this runs one redis-cli process per key, so it might run out of connections and then display tons of such errors:
Could not connect to Redis at 127.0.0.1:6379: Cannot assign requested address
So it could be better (and much faster) to just dump the database and then import it into the new one.
If you have a big database with millions of keys, you can use the SCAN command to select all the keys (without blocking like the dangerous KEYS command that even Redis authors do not recommend).
SCAN gives you the keys by "pages" one by one and the idea is to start at page 0 (formally called CURSOR 0) and then continue with the next page/cursor until you hit the end (the stop signal is when you get the CURSOR 0 again).
You may use any popular language for this, like Ruby or Scala. Here is a draft using Bash scripting:
#!/bin/bash -e
REDIS_HOST=localhost
PAGE_SIZE=10000
KEYS_TO_QUERY="*"
SOURCE_DB=0
TARGET_DB=1
TOTAL=0
while [[ "$CURSOR" != "0" ]]; do
CURSOR=${CURSOR:-0}
>&2 echo $TOTAL:$CURSOR
KEYS=$(redis-cli -h $REDIS_HOST -n $SOURCE_DB scan $CURSOR match "$KEYS_TO_QUERY" count $PAGE_SIZE)
unset CURSOR
for KEY in $KEYS; do
if [[ -z $CURSOR ]]; then
CURSOR=$KEY
else
TOTAL=$(($TOTAL + 1))
redis-cli -h $REDIS_HOST -n $SOURCE_DB move $KEY $TARGET_DB
fi
done
done
IMPORTANT: As usual, please do not copy and paste scripts without understanding what is doing, so here some details:
The while loop selects the keys page by page with the SCAN command and then runs the MOVE command for every key.
The SCAN command returns the next cursor in the first line, and the rest of the lines are the found keys. The while loop starts with the variable CURSOR undefined and then defines it in the first iteration (this is a little trick so the loop stops the next time it sees CURSOR 0, which signals the end of the scan).
PAGE_SIZE controls how many keys each SCAN query asks for; lower values put very little load on the server but are slow, bigger values make the server "sweat" but finish faster... the network is affected as well, so try to find a sweet spot around 10000 or even 50000 (ironically, values of 1 or 2 may also stress the server, because of the per-query network overhead).
KEYS_TO_QUERY: a pattern for the keys you want to query; for example "*balance*" will select the keys that include balance in their name (don't forget the quotes to avoid syntax errors). Alternatively you can do the filtering on the script side: just query all keys with "*" and add a Bash if conditional. This will be slower, but it helps if you cannot find a pattern for your key selection.
REDIS_HOST: localhost by default; change it to any server you like (if you are using a custom port other than the default 6379, add a -p option to the redis-cli calls).
SOURCE_DB: the database ID you want the keys moved from (by default 0).
TARGET_DB: the database ID you want the keys moved to (by default 1).
You can use this script to execute other commands or checks with the keys, just replace the MOVE command call for anything you may need.
NOTE: To move keys from one Redis server to another Redis server (this is not only moving between internal databases) you can use redis-utils-cli from the NPM packages here -> https://www.npmjs.com/package/redis-utils-cli

handling long running large transactions with perl dbi

I've got a large transaction consisting of getting lots of data from database A, doing some manipulations with this data, then inserting the manipulated data into database B. I've only got permission to select in database A, but I can create tables and insert/update etc. in database B.
The manipulation and insertion part is written in Perl and already in use for loading data into database B from other data sources, so all that's required is to get the necessary data from database A and use it to initialize the Perl classes.
How can I go about doing this so I can easily track back and pick up from where the error happened if any error occurs during the manipulation or insertion procedures (database disconnection, problems with class initialization because of invalid values, hard disk failure etc.)? Doing the transaction in one go doesn't seem like a good option because the amount of data from database A means it would take at least a day or two for data manipulation and insertion into database B.
The data from database A can be grouped into around 1000 groups using unique keys, with each key covering thousands of rows. One way I thought I could do this is to write a script that commits per group, meaning I've got to track which groups have already been inserted into database B. The only ways I can think of to track which groups have been processed are a log file or a table in database B. A second way I thought could work is to dump all the fields needed for loading the classes into a flatfile, read the file to initialize the classes, and insert into database B. This also means I've got to do some logging, but it should narrow any error down to the exact row in the flatfile. The script will look something like this:
use strict;
use warnings;
use DBI;
#connect to database A
my $dbh = DBI->connect('dbi:oracle:my_db', $user, $password, { RaiseError => 1, AutoCommit => 0 });
#statement to get data based on group unique key
my $sth = $dbh->prepare($my_sql);
my @groups; # I have a list of these already
open my $fh, '>>', 'my_logfile' or die "can't open logfile $!";
eval {
    foreach my $g (@groups) {
        # subroutine to check if the group has already been processed,
        # either from the log file or from a database table
        next if is_processed($g);
        $sth->execute($g);
        my $data = $sth->fetchall_arrayref;
        # manipulate $data, then use it to load perl classes for insertion into database B
        # .
        # .
        # .
        print $fh "$g\n";   # record the group only after it has been loaded
    }
};
if ($@) {
    $dbh->rollback;
    die "something wrong...rollback";
}
So if any errors do occur, I can just run this script again and it should skip the groups or rows that have been processed and continue.
Both these methods are just variations on the same theme, and both require going back to wherever I've been tracking my progress (in a table or a file), skipping the groups or rows that have already been committed to database B, and processing the remaining data.
I'm sure there's a better way of doing this but am struggling to think of other solutions. Is there another way of handling large transactions between databases that require data manipulation between getting data out from one and inserting into another? The process doesn't need to be all in Perl, as long as I can reuse the perl classes for manipulating and inserting the data into the database.
Sorry to say so, but I really don't see how you could possibly solve this problem by taking a shortcut. To me it sounds like you've already thought about the most reasonable ways:
Save the state in some temp table/file (I'd look into "perldoc -f tie", or SQLite) at each step
Handle errors properly (TryCatch.pm, eval, or whatever you prefer)
Log your errors properly, i.e. structured logs you can read back in
Add some "resume" flag to your script which reads in the previous log and data and tries again
This is probably along the lines you've been thinking, but as I said, I don't think there's a general "right" way to handle your problem.
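For illustration only, the "state in a table" idea from that list might be sketched as below. The load_progress table, its columns, and the helper names are made up; $dbh_b is assumed to be the database B handle with AutoCommit turned off.
# Created once in database B, e.g.:
#   CREATE TABLE load_progress (group_key VARCHAR(64) PRIMARY KEY,
#                               loaded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)

sub is_processed {
    my ($dbh_b, $group) = @_;
    my ($seen) = $dbh_b->selectrow_array(
        'SELECT COUNT(*) FROM load_progress WHERE group_key = ?', undef, $group);
    return $seen;
}

sub mark_processed {
    my ($dbh_b, $group) = @_;
    $dbh_b->do('INSERT INTO load_progress (group_key) VALUES (?)', undef, $group);
    $dbh_b->commit;   # commits the group's rows and the progress marker together
}
On a re-run the main loop calls is_processed() first and skips finished groups; mark_processed() is called right after a group's data has been inserted, so a crash in the middle of a group leaves it unmarked and that group is simply redone.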