BigTable: 2 Writes to the same key, but 3 versions

Sometimes, when I write multiple versions to the same row key across multiple column families within batched mutations (each version is batched together with multiple writes), I end up with more versions than writes, e.g. two writes to the same key but three versions.
Is this expected behavior due to data compaction? Will the extra version be removed over time?

The issue here is that you're putting the two columns in two separate entries in the batch, which means that even if they have the same row they won't be applied atomically.
Batch entries can succeed or fail individually, and the client will then retry just the failed entries. If, for example, one entry succeeds and the other times out but later succeeds silently, a retry of the "failed" entry can lead to the partial write results you're seeing.
In Python you should therefore do something like the following (adapted from cloud.google.com/bigtable/docs/samples-python-hello):
print('Writing some greetings to the table.')
greetings = ['Hello World!', 'Hello Cloud Bigtable!', 'Hello Python!']
rows = []
column1 = 'greeting1'.encode()
column2 = 'greeting2'.encode()
for i, value in enumerate(greetings):
    # Note: This example uses sequential numeric IDs for simplicity,
    # but this can result in poor performance in a production
    # application. Since rows are stored in sorted order by key,
    # sequential keys can result in poor distribution of operations
    # across nodes.
    #
    # For more information about how to design a Bigtable schema for
    # the best performance, see the documentation:
    #
    # https://cloud.google.com/bigtable/docs/schema-design
    row_key = 'greeting{}'.format(i).encode()
    row = table.row(row_key)
    # Multiple calls to 'set_cell()' are allowed on the same batch
    # entry. Each entry will be applied atomically, but a separate
    # 'row' in the same batch will be applied separately even if it
    # shares its row key with another entry.
    row.set_cell(column_family_id,
                 column1,
                 value,
                 timestamp=datetime.datetime.utcnow())
    row.set_cell(column_family_id,
                 column2,
                 value,
                 timestamp=datetime.datetime.utcnow())
    rows.append(row)
table.mutate_rows(rows)
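As an aside on the second part of the question (whether the extra versions go away over time): old cell versions are only removed according to the column family's garbage-collection policy, which Bigtable applies during compaction, so if you want only the latest versions kept you need a GC rule on the family. A minimal sketch with the Python client, assuming a fresh table (project, instance, and table names are placeholders):
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project='my-project', admin=True)  # placeholder project
instance = client.instance('my-instance')                   # placeholder instance
table = instance.table('my-table')                          # placeholder table
# Keep at most 2 versions per cell; older versions are garbage-collected
# (physically removed during compaction).
max_versions_rule = column_family.MaxVersionsGCRule(2)
table.create(column_families={'cf1': max_versions_rule})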

Related

Redis SISMEMBER performance: Lua script on multiple keys vs single transaction

I have a dozen Redis keys of type SET, say
PUBSUB_USER_SET-1-1668985588478915880,
PUBSUB_USER_SET-2-1668985588478915880,
PUBSUB_USER_SET-3-1668988644477632747,
...
PUBSUB_USER_SET-10-1668983464477632083
Each set contains user IDs, and the problem statement is to check whether a given user is present in any of the sets or not.
The solution I tried is to fetch all the keys, join them with a comma delimiter, and pass the result as an argument to a Lua script, which splits the keys again with gmatch and runs SISMEMBER on each one until there is a hit.
local vals = KEYS[1]
for match in (vals..","):gmatch("(.-)"..",") do
    local exist = redis.call('sismember', match, KEYS[2])
    if (exist == 1) then
        return 1
    end
end
return 0
Now, as the number of keys grows to PUBSUB_USER_SET-20 or PUBSUB_USER_SET-30, I see an increase in latency and a drop in throughput.
Is this a reasonable way to do it, or is it better to batch the Lua script calls (passing 10 keys at a time instead of all 30 and returning as soon as the user is found), or is there a better way to do this?
I would propose a different solution: instead of storing users across many sets arbitrarily, store each one in a single, deterministic set and query only that set to check whether it is there or not.
Let's say we have N sets numbered s-0, s-1, s-2, ..., s-19.
You put each user into one of these sets based on its hash, which means you need to query only one set instead of checking all of them. You can use any hashing algorithm.
To make it more interesting, you can try consistent hashing.
You can also use a Redis pipeline with batching (10 keys per iteration) to improve performance.
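A minimal sketch of this bucketing idea in Python with redis-py (the bucket count, the key naming scheme without the timestamp suffix, and the helper names are illustrative assumptions, not part of the original setup):
import zlib

import redis

NUM_BUCKETS = 20
r = redis.Redis()

def bucket_for(user_id):
    # Any stable hash works; crc32 keeps the sketch dependency-free.
    return zlib.crc32(user_id.encode()) % NUM_BUCKETS

def add_user(user_id):
    # Each user lands in exactly one bucketed set.
    r.sadd('PUBSUB_USER_SET-{}'.format(bucket_for(user_id)), user_id)

def is_user_present(user_id):
    # A single SISMEMBER call instead of looping over every set.
    return bool(r.sismember('PUBSUB_USER_SET-{}'.format(bucket_for(user_id)), user_id))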

Multi-process Pandas Groupby Failing to do Anything After Processes Kick Off

I have a fairly large pandas dataframe that is about 600,000 rows by 50 columns. I would like to perform a groupby.agg(custom_function) to get the resulting data. The custom_function is a function that takes the first value of non-null data in the series or returns null if all values in the series are null. (My dataframe is hierarchically sorted by data quality; the first occurrence of a unique key has the most accurate data, but in the event of null data in the first occurrence I want to take values in the second occurrence... and so on.)
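For illustration, a minimal sketch of what such an aggregation might look like (this is an assumption about the shape of custom_function, not the asker's actual code):
import pandas as pd

def first_non_null(series):
    # First non-null value in the series, or None if every value is null.
    non_null = series.dropna()
    return non_null.iloc[0] if len(non_null) else None

# The slow path described in the question:
#   combined.groupby('ID').agg(first_non_null)
# Note that pandas' built-in groupby('ID').first() also returns the first
# non-null value per column within each group.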
I have found the basic groupby.agg(custom_function) syntax is slow, so I have implemented multi-processing to speed up the computation. When this code is applied over a dataframe that is ~10,000 rows long, the computation takes a few seconds; however, when I try to use the entirety of the data, the process seems to stall out. Multiple processes kick off, but memory and CPU usage stay about the same and nothing gets done.
Here is the trouble portion of the code:
import concurrent.futures
import tqdm

# Create list of individual dataframes to feed map/multiprocess function
grouped = combined.groupby(['ID'])
grouped_list = [group for name, group in grouped]
length = len(grouped)

# Multi-process execute single pivot function
print('\nMulti-Process Pivot:')
with concurrent.futures.ProcessPoolExecutor() as executor:
    with tqdm.tqdm(total=length) as progress:
        futures = []
        for df in grouped_list:
            future = executor.submit(custom_function, df)
            future.add_done_callback(lambda p: progress.update())
            futures.append(future)
        results = []
        for future in futures:
            result = future.result()
            results.append(result)
I think the issue has something to do with the multi-processing (maybe queuing up a job this large is the issue?). I don't understand why a fairly small job creates no issues for this code, but increasing the size of the input data seems to hang it up rather than just execute more slowly. If there is a more efficient way to take the first value in each column per unique ID, I'd be interested to hear it.
Thanks for your help.

How to trim a list/set in ScyllaDB to a specific size?

Is there a way to trim a list/set to a specific size (in terms of number of elements)?
Something similar to LTRIM command on Redis (https://redis.io/commands/ltrim).
The goal is to insert an element to a list/set but ensuring that its final size is always <= X (discarding old entries).
Example of what I would like to be able to do:
CREATE TABLE images (
    name text PRIMARY KEY,
    owner text,
    tags set<text>  // A set of text values
);

-- single command
UPDATE images SET tags = ltrim(tags + { 'gray', 'cuddly' }, 10) WHERE name = 'cat.jpg';

-- two commands (Redis style)
UPDATE images SET tags = tags + { 'gray', 'cuddly' } WHERE name = 'cat.jpg';
UPDATE images SET tags = ltrim(tags, 10) WHERE name = 'cat.jpg';
No, there is no such operation in Scylla (or in Cassandra).
The first reason is efficiency: as you may be aware, one reason why writes in Scylla are so efficient is that they do not do a read: appending an element to a list just writes this single item to a sequential file (a so-called "sstable"). It does not need to read the existing list and check which elements it already has. The operation you propose would need to read the existing item before writing, slowing it down significantly.
The second reason is consistency: what happens if multiple operations like the one you propose are done in parallel, reaching different coordinators and replicas in different orders? What happens if, after earlier problems, one of the replicas is missing one of the values? There is no magic way to solve these problems, and the general solution that Scylla offers for concurrent read-modify-write operations is LWT (Lightweight Transactions). You can emulate your ltrim operation using LWT, but it will be significantly slower than ordinary writes: you will need to read the list to the client, modify it (append, ltrim, etc.) and then write it back with an LWT (with the extra condition that it still has its old value, or using an additional "version number" column).
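A rough sketch of that read-modify-write loop using the DataStax Python driver (which also works against Scylla); the contact point, keyspace name, the assumption that the row already exists, and the choice of which elements to discard are all illustrative, since a CQL set has no insertion order:
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('images_ks')  # hypothetical contact point and keyspace

def add_tags_trimmed(name, new_tags, max_size=10):
    # Read the current set (this sketch assumes the row for `name` already exists).
    row = session.execute('SELECT tags FROM images WHERE name = %s', (name,)).one()
    old = set(row.tags or [])
    # "ltrim": keep at most max_size elements; here simply the alphabetically first
    # ones, since a CQL set has no notion of "oldest" entries.
    trimmed = set(sorted(old | set(new_tags))[:max_size])
    # Conditional (LWT) write: only applied if the set is unchanged since the read.
    applied = session.execute(
        'UPDATE images SET tags = %s WHERE name = %s IF tags = %s',
        (trimmed, name, old or None),
    ).one()[0]
    return applied  # if False, another writer won; re-read and retry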

How to synchronise multiple writers on Redis?

I have multiple writers overwriting the same key in Redis. How do I guarantee that only the chosen one writes last?
Can I perform write synchronisation in Redis without synchronising the writers first?
Background:
In my system a single dispatcher sends work to various workers. Each worker then writes its result to Redis, overwriting the same key. I need to be sure that only the last worker that received work from the dispatcher writes to Redis.
Use an ordered set (ZSET): add your entry with a score equal to the unix timestamp, then delete all but the top rank.
A Redis Ordered set is a set, where each entry also has a score. The set is ordered according to the score, and the position of an element in the ordered set is called Rank.
In order:
Remove all the entries with a score equal to or less than the one you are adding (zremrangebyscore). Since you are adding to a set, a duplicate value would otherwise be ignored, and you want to keep the entry with the highest rank instead.
Add your value to the zset (zadd).
Delete by rank all the entries except the one with the HIGHEST rank (zremrangebyrank).
You should do it inside a transaction (pipeline).
Example in python:
# timestamp contains the time when the dispatcher sent a message to this worker
key = "key_zset:%s"%id
pipeline = self._redis_connection.db.pipeline(transaction=True)
pipeline.zremrangebyscore(key, 0, t)  # Avoid duplicate Scores and identical data
pipeline.zadd(key, t, "value")
pipeline.zremrangebyrank(key, 0, -2)
pipeline.execute(raise_on_error=True)
If I were you, I would use redlock.
Before you write to that key, you acquire the lock for it, then update it and then release the lock.
I use Node.js so it would look something like this, not actually correct code but you get the idea.
Promise.all(startPromises)
    .bind(this)
    .then(acquireLock)
    .then(withLock)
    .then(releaseLock)
    .catch(handleErr)

function acquireLock(key) {
    return redis.rl.lock(`locks:${key}`, 3000)
}

function withLock(lock) {
    this.lock = lock
    // do stuff here after get the lock
}

function releaseLock() {
    this.lock.unlock()
}
You can use a Redis pipeline with a transaction.
Redis is a single-threaded server and executes commands synchronously. When a pipeline with a transaction is used, the server executes all commands in the pipeline atomically.
Transactions
MULTI, EXEC, DISCARD and WATCH are the foundation of transactions in Redis. They allow the execution of a group of commands in a single step, with two important guarantees:
All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.
A simple example in Python (using redis-py's WATCH/MULTI pattern, since reads inside a transactional pipeline are only buffered until execute):
with redis_client.pipeline(transaction=True) as pipe:
    pipe.watch("mykey")                # switch to immediate execution and watch the key
    val = int(pipe.get("mykey") or 0)  # reads run immediately while watching
    val = val * val % 10
    pipe.multi()                       # start buffering the transactional part
    pipe.set("mykey", val)
    pipe.execute()                     # raises WatchError if "mykey" changed in the meantime

Mapreduce Table Diff

I have two versions (old/new) of a database table with about 100,000,000 records. They are in files:
trx-old
trx-new
The structure is:
id  date  amount  memo
1   5/1   100     slacks
2   5/1   50      wine
id is the simple primary key, other fields are non-key. I want to generate three files:
trx-removed (ids of records present in trx-old but not in trx-new)
trx-added (records from trx-new whose ids are not present in trx-old)
trx-changed (records from trx-new whose non-key values have changed since trx-old)
I need to do this operation every day in a short batch window. And actually, I need to do this for multiple tables and across multiple schemas (generating the three files for each) so the actual app is a bit more involved. But I think the example captures the crux of the problem.
This feels like an obvious application for mapreduce. Having never written a mapreduce application my questions are:
is there some EMR application that already does this?
is there an obvious Pig or maybe Cascading solution lying about?
is there some other open source example that is very close to this?
PS I saw the diff between tables question but the solutions over there didn't look scalable.
PPS Here is a little Ruby toy that demonstrates the algorithm: Ruby dbdiff
I think it would be easiest just to write your own job, mostly because you'll want to use MultipleOutputs to write to the three separate files from a single reduce step when the typical reducer only writes to one file. You'd need to use MultipleInputs to specify a mapper for each table.
This seems like the perfect problem to solve in Cascading. You have mentioned that you have never written an MR application, and if the intent is to get started quickly (assuming you are familiar with Java) then Cascading is the way to go, IMHO. I'll touch more on this in a second.
It is possible to use Pig or Hive but these aren't as flexible if you want to perform additional analysis on these columns or change schemas since you can build your Schema on the fly in Cascading by reading from the column headers or from a mapping file you create to denote the Schema.
In Cascading you would:
Set up your incoming Taps : Tap trxOld and Tap trxNew (These point to your source files)
Connect your taps to Pipes: Pipe oldPipe and Pipe newPipe
Set up your outgoing Taps : Tap trxRemoved, Tap trxAdded and Tap trxChanged
Build your Pipe analysis (this is where the fun (hurt) happens)
trx-removed:
trx-added:
Pipe trxOld = new Pipe("old-stuff");
Pipe trxNew = new Pipe("new-stuff");
// smallest size Pipe on the right in CoGroup
Pipe oldNnew = new CoGroup("old-N-new", trxOld, new Fields("id1"),
                                        trxNew, new Fields("id2"),
                                        new OuterJoin());
The outer join gives us NULLS where ids are missing in the other Pipe (your source data), so we can use FilterNotNull or FilterNull in the logic that follows to get us final pipes that we then connect to Tap trxRemoved and Tap trxAdded accordingly.
trx-changed
Here I would first concatenate the fields that you are looking for changes in using FieldJoiner then use an ExpressionFilter to give us the zombies (cause they changed), something like:
Pipe valueChange = new Pipe("changed", oldNnew);
valueChange = new Each(valueChange, new Fields("oldValues", "newValues"),
        new ExpressionFilter("oldValues.equals(newValues)", String.class));
What this does is it filters out Fields with the same value and keeps the differences. Moreover, if the expression above is true it gets rid of that record. Finally, connect your valueChange pipe to your Tap trxChanged and your will have three outputs with all the data you are looking for with code that allows for some added analysis to creep in.
As @ChrisGerken suggested, you would have to use MultipleOutputs and MultipleInputs in order to generate multiple output files and associate custom mappers with each input file type (old/new).
The mapper would output:
key: primary key (id)
value: record from input file with additional flag (new/old depending on the input)
The reducer would iterate over all records R for each key and output:
to removed file: if only a record with flag old exists.
to added file: if only a record with flag new exists.
to changed file: if records in R differ.
As this algorithm scales with the number of reducers, you'd most likely need a second job, which would merge the results to a single file for a final output.
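A minimal Hadoop Streaming-style sketch of that mapper/reducer logic in Python (the tab-separated record layout, the old/new flag handling, and how the mapper learns which file it is reading are illustrative assumptions, not part of the answer above):
import itertools
import sys

def mapper(flag):
    # Tag every record with its origin ('old' or 'new'); the id is the first field.
    for line in sys.stdin:
        record = line.rstrip('\n')
        rec_id = record.split('\t', 1)[0]
        print('{}\t{}\t{}'.format(rec_id, flag, record))

def reducer():
    # Records arrive grouped by id; decide removed / added / changed per key.
    def parse(lines):
        for line in lines:
            rec_id, flag, record = line.rstrip('\n').split('\t', 2)
            yield rec_id, flag, record

    for rec_id, group in itertools.groupby(parse(sys.stdin), key=lambda x: x[0]):
        rows = {flag: record for _, flag, record in group}
        if 'new' not in rows:
            print('removed\t{}'.format(rec_id))       # id only, as in trx-removed
        elif 'old' not in rows:
            print('added\t{}'.format(rows['new']))    # whole new record
        elif rows['old'] != rows['new']:
            print('changed\t{}'.format(rows['new']))  # new record whose values differ
In a real job the mapper would be invoked once per input file (for example with the flag passed as a command-line argument), and the three branches would go to MultipleOutputs or three separate output paths rather than a single tagged stream, as described above.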
What comes to my mind is this:
Consider that your tables look like this:
Table_old
1 other_columns1
2 other_columns2
3 other_columns3
Table_new
2 other_columns2
3 other_columns3
4 other_columns4
Append "a" to table_old's elements and "b" to table_new's elements.
When you merge both files, if an element exists in the first file but not in the second, it has been removed:
table_merged
1a other_columns1
2a other_columns2
2b other_columns2
3a other_columns3
3b other_columns3
4a other_columns4
From that file you can do your operations easily.
Also, let's say your IDs are n digits and you have 10 clusters + 1 master. Your key would be the first digit of the ID; therefore, you divide the data evenly across the clusters. You would do grouping + partitioning so your data will be sorted.
For example:
table_old
1...0 data
1...1 data
2...2 data
table_new
1...0 data
2...2 data
3...2 data
Your key is the first digit and you group according to that digit, and you partition according to the rest of the ID. Then your data is going to arrive at your clusters as:
worker1
1...0b data
1...0a data
1...1a data
worker2
2...2a data
2...2b data
and so on.
Note that a and b don't have to be sorted.
EDIT
The merge is going to look like this:
FileInputFormat.addInputPath(job, new Path("trx-old"));
FileInputFormat.addInputPath(job, new Path("trx-new"));
MR will get two inputs and the two files will be merged.
For the appending part, you should create two more jobs before the main MR job, each of which has only a map phase. The first map will append "a" to every element in the first list and the second will append "b" to the elements of the second list. The third job (the one we are using now, the main one) will only have a reduce phase to collect them. So you will have Map-Map-Reduce.
Appending can be done like this:
//you have key:Text
new Text(String.valueOf(key.toString()+"a"))
but I think there may be different ways of appending, some of which may be more efficient when working with Hadoop Text objects.
Hope this is helpful.