LinkedHashMap behaviour with Redis Hashes? - redis

I want to use the Hashes data structure in Redis (via the Jedis client) but also want to maintain insertion order, something like LinkedHashMap in Java. I am totally new to Redis and have gone through all the data structures and commands, but somehow I am not able to think of any straightforward solution. Any help or suggestions will be appreciated.

Hashes in Redis do not maintain insertion order. You can achieve the same effect by using a Sorted Set and a counter to keep track of order. Here is a simple example (in Ruby, sorry):
items = { foo: "bar", yin: "yang", some_key: "some_value" }

items.each do |key, value|
  # bump the counter to get the next position
  count = redis.incr :my_hash_counter
  # store the value in the hash, and the key's position in the sorted set
  redis.hset :my_hash, key, value
  redis.zadd :my_hash_order, count, key
end
Retrieving the values in order would look something like this:
ordered_keys = redis.zrange :my_hash_order, 0, -1
ordered_hash = Hash[
  ordered_keys.map { |key| [key, redis.hget(:my_hash, key)] }
]
# => {"foo"=>"bar", "yin"=>"yang", "some_key"=>"some_value"}

No need to use a Sorted Set or a counter. Just use a List (https://redis.io/commands#list), because it keeps insertion order:
HSET my_hash foo bar
RPUSH my_ordered_keys foo
HSET my_hash yin yang
RPUSH my_ordered_keys yin
HSET my_hash some_key some_value
RPUSH my_ordered_keys some_key
LRANGE my_ordered_keys 0 10
1) "foo"
2) "yin"
3) "some_key"

Related

How to get IDs from a list that are NOT in a table?

I am receiving a list of IDs. Most of these already exist in a table. I need to find which IDs are NOT in the table. This question has nothing to do with joins.
My API will receive a list of IDs, such as: [1, 2, 3, 4, 5]
Let's say there are three records in the table: [2, 3, 4]
The result I'm looking for is the array: [1, 5]
Our SQL brains jump quickly to something like the following, but clearly that's not what we need:
select * from widgets where id not in [list]
We don't need the records not in the list, we need the part of the list not in the records!
My fallback is to retrieve all records in the list and subtract from the list, something like this:
existing_ids = Widget.where(id: id_list).pluck(:id)
new_ids = id_list - existing_ids
That will work...but feels heavy-handed. Particularly if id_list has 100,000 records, and the table has 99,999 of those records.
I've searched around, and the only similar result is ID from list that is not in a table ... which did not find a viable solution.
Is there any way to do this in a single SQL query? (Bonus points for an ActiveRecord solution!)
To compare the lists to each other, either the input list needs to go into the database or the list of existing IDs needs to come out of the database. The latter you already tried and didn't like, so here's an alternative:
SELECT "id" FROM unnest('{1,2,3,4,5}'::integer[]) AS "id" WHERE "id" NOT IN (SELECT "id" FROM "widgets");
Not sure about performance.
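Since the question asked for an ActiveRecord solution: one way this query might be wired up from Ruby, sketched below. The select_values call and the interpolation are my assumptions, not part of the answer (map(&:to_i) guards the interpolated array):

# hypothetical wiring for the unnest query above;
# assumes ids is a non-empty array of integers
ids = [1, 2, 3, 4, 5]

new_ids = Widget.connection.select_values(<<-SQL).map(&:to_i)
  SELECT "id"
  FROM unnest(ARRAY[#{ids.map(&:to_i).join(',')}]) AS "id"
  WHERE "id" NOT IN (SELECT "id" FROM "widgets")
SQL
# => [1, 5]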
Depending how many records are in your database, the simplest thing might just be to select all of the IDs and then subtract the existing ones in Ruby.
from_api = [1,2,3,4,5]
existing = Widget.pluck(:id) # => [2,3,4]
from_api.difference(existing) # => [1,5]
Obviously, if you have a substantial dataset, this will be less than optimal.
This should work.
from_api = [1,2,3,4,5]
existing = Widget.order(:id).ids # => [2,3,4]

new_ids = []
from_api.each { |n| new_ids << n unless existing.include?(n) }
new_ids # => [1,5]
or (note this one-liner assumes both lists are sorted ascending and every existing id appears in from_api):
from_api.map { |n| n == existing.first ? (nil if existing = existing.drop(1)) : n }.compact # => [1,5]
Balancing the complexity (to current and future developers) of the unnest approach, I decided for my project that the simpler approach was warranted. While I didn't profile performance, I believe any gains would be minimal, if any.
Here is the solution I ended up with:
class Widget < ApplicationRecord
  def self.absent(names)
    uniq_names = names.uniq
    uniq_names - where(name: uniq_names).pluck(:name)
  end
end
And tests:
describe '.absent' do
  subject { described_class.absent(names) }

  let!(:widget1) { create(:widget, name: 'old-1') }
  let!(:widget2) { create(:widget, name: 'old-2') }

  let(:names) { %w[new-2 old-2 new-1 old-1 new-1 old-1] }

  it { is_expected.to eq %w[new-2 new-1] }
end

Difference between storing data as a key and as property of a hash object

Right now, I'm storing user objects as follows:
user1 = { id: 1, name: "bob" }
user2 = { id: 2, name: "steve" }
HMSET "user:1", user1
HMSET "user:2", user2
HGETALL "user:1" would return the user1 object
HGETALL "user:2" would return the user2 object
I'm wondering if there would be any significant difference (performance or other) if I did:
user1 = { id: 1, name: "bob" }
user2 = { id: 2, name: "steve" }
HSET "USER", 1, JSON.stringify(user1)
HSET "USER", 2, JSON.stringify(user2)
HGET "USER", 1 would give me the string representation of user1 object
HGET "USER", 2 woudl give me the string representation of user2 object
There's not a huge difference either way. It's mostly going to boil down to a design decision based on what you're doing, although whichever you use you should stay consistent throughout the project to avoid confusion.
Here are some pros to method 2:
using JSON could help maintain type consistency
Redis will use less memory and may be a tiny bit faster, since it doesn't have to store or lookup those extra keys
might be easier to think about and work with in code
The main negative for method 2 shows up when you update a single field: you have to read, parse, and rewrite the whole JSON blob (and that read-modify-write is not atomic), whereas method 1 can update one field in place. Say you need to update a user's name. Here's how you would do it with each method.
// Method 1:
HMSET user:1 name newname
// Method 2:
result = JSON.parse(HGET user 1)
result.name = newname
HSET user 1 JSON.stringify(result)
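For completeness, here is how the two methods might look with the redis-rb gem. This is a sketch of my own, not from the original answer; the client setup is assumed:

require "redis"
require "json"

redis = Redis.new
user1 = { id: 1, name: "bob" }

# Method 1: one hash per user; single fields update in place
redis.hmset("user:1", "id", user1[:id], "name", user1[:name])
redis.hgetall("user:1")                 # => {"id"=>"1", "name"=>"bob"}
redis.hset("user:1", "name", "robert")  # update just the name

# Method 2: one hash for all users, JSON-encoded values;
# updating one field means rewriting the whole blob
redis.hset("USER", "1", user1.to_json)
user = JSON.parse(redis.hget("USER", "1"))
user["name"] = "robert"
redis.hset("USER", "1", user.to_json)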

Ways to select values by keys list?

For example, I have the key structure entity:product:[id], where id is an integer [0-n].
So I could fetch all of them with the pattern entity:product:* (e.g. with the KEYS command), but I don't know how much load that query puts on the Redis server.
Another solution is to create one list key that stores the IDs of the entity:products:
RPUSH entity:products:ids 1
RPUSH entity:products:ids 2
RPUSH entity:products:ids 3
RPUSH entity:products:ids 4
And then (pseudo-code):
entityProducts = redis.LRANGE('entity:products:ids', 0, -1)
foreach (entityProducts as id)
{
    redis.GET('entity:product:' + id)
}
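In Ruby with the redis gem, this second approach could look like the sketch below; MGET (a batching swap-in on my part, not in the pseudo-code) fetches all the values in a single round trip instead of one GET per id:

require "redis"

redis = Redis.new

# ids in insertion order (assumes the list is non-empty)
ids = redis.lrange("entity:products:ids", 0, -1)

# one MGET instead of N GETs: a single round trip for all values
products = redis.mget(*ids.map { |id| "entity:product:#{id}" })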
Which is the better way? Which will be faster and put less load on the Redis server?

Rails/Sql - order/group search results such that repetition of entities occurs only after appearance of others

In my application, say, animals have many photos. I'm querying photos of animals such that I want all photos of all animals to be displayed. However, I want each animal to appear as a photo before repetition occurs.
Example:
animal instance 1, 'cat', has four photos,
animal instance 2, 'dog', has two photos:
photos should appear ordered like so:
#photo belongs to #animal
tiddles.jpg, cat
fido.jpg, dog
meow.jpg, cat
rover.jpg, dog
puss.jpg, cat
felix.jpg, cat (no more dogs so two consecutive cats)
Pagination is required, so I can't order on an array.
Filename structure/convention provides no help, though the animal_id exists on each photo.
Though there are two types of animal in this example, this is an ActiveRecord model with hundreds of records.
Animals may be selectively queried.
If this isn't possible with active_record then I'll happily use sql; I'm using postgresql.
My brain is frazzled so if anyone can come up with a better title, please go ahead and edit it or suggest in comments.
Here is a PostgreSQL specific solution:
batch_id_sql = "RANK() OVER (PARTITION BY animal_id ORDER BY id ASC)"
Photo.paginate(
:select => "DISTINCT photos.*, (#{batch_id_sql}) batch_id",
:order => "batch_id ASC, photos.animal_id ASC",
:page => 1)
Here is a DB agnostic solution:
batch_id_sql = "
SELECT COUNT(bm.*)
FROM photos bm
WHERE bm.animal_id = photos.animal_id AND
bm.id <= photos.id
"
Photo.paginate(
:select => "photos.*, (#{batch_id_sql}) batch_id",
:order => "batch_id ASC, photos.animal_id ASC",
:page => 1)
Both queries work even when you have a WHERE condition. Benchmark the query with a realistic data set to check that it meets your throughput and latency requirements.
Reference
PostgreSQL Window function
I have no experience with ActiveRecord, so using plain PostgreSQL I would try something like this:
Define a window function over all previous rows that counts how many times the current animal has appeared, then order by this count.
SELECT
  filename,
  animal_id,
  COUNT(*) OVER (PARTITION BY animal_id ORDER BY filename) AS cnt
FROM photos
ORDER BY cnt, animal_id, filename
Filtering on certain animal_ids will work. This will always order the same way. I don't know if you want something random in there, but it should be easy to add.
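If you did want to run this from ActiveRecord, one option (my assumption, not part of this answer) is find_by_sql with the query above:

# hypothetical wrapper; the SQL is the query from this answer
photos = Photo.find_by_sql(<<-SQL)
  SELECT filename, animal_id,
         COUNT(*) OVER (PARTITION BY animal_id ORDER BY filename) AS cnt
  FROM photos
  ORDER BY cnt, animal_id, filename
SQL

photos.each { |photo| puts photo.filename }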
New solution
Add an integer column called batch_id to the photos table.
class AddBatchIdToPhotos < ActiveRecord::Migration
  def self.up
    add_column :photos, :batch_id, :integer
    set_batch_id
    change_column :photos, :batch_id, :integer, :null => false
    add_index :photos, :batch_id
  end

  def self.down
    remove_column :photos, :batch_id
  end

  def self.set_batch_id
    # set the batch id on existing rows
    # implement this
  end
end
Now add callbacks on the Photo model to set and maintain the batch id.
class Photo < ActiveRecord::Base
  belongs_to :animal

  before_create :batch_photo_add
  after_update  :batch_photo_update
  after_destroy :batch_photo_remove

  private

  def batch_photo_add
    self.batch_id = next_batch_id_for_animal(animal_id)
    true
  end

  def batch_photo_update
    return true unless animal_id_changed?
    batch_photo_remove(batch_id, animal_id_was)
    batch_photo_add
  end

  def batch_photo_remove(b_id = batch_id, a_id = animal_id)
    # close the gap left in this animal's batch sequence
    Photo.update_all("batch_id = batch_id - 1",
                     ["animal_id = ? AND batch_id > ?", a_id, b_id])
    true
  end

  def next_batch_id_for_animal(a_id)
    (Photo.maximum(:batch_id, :conditions => {:animal_id => a_id}) || 0) + 1
  end
end
Now you can get the desired result by issuing a simple paginate command:
@animal_photos = Photo.paginate(:page => 1, :per_page => 10,
                                :order => :batch_id)
How does this work?
Let's say we have the data set given below (ids renumbered sequentially):

id    Photo Description    Batch Id
1     Cat_photo_1          1
2     Cat_photo_2          2
3     Dog_photo_1          1
4     Cat_photo_3          3
5     Dog_photo_2          2
6     Lion_photo_1         1
7     Cat_photo_4          4

Now if we execute a query ordered by batch_id, we get this:
# batch 1 (cat, dog, lion)
Cat_photo_1
Dog_photo_1
Lion_photo_1
# batch 2 (cat, dog)
Cat_photo_2
Dog_photo_2
# batch 3,4 (cat)
Cat_photo_3
Cat_photo_4
The batch distribution is not random; the animals are filled from the top. The number of photos displayed on a page is governed by the per_page parameter passed to paginate (not the batch size).
Old solution
Have you tried this?
If you are using the will_paginate gem:
# assuming you want to order by animal name
animal_photos = Photo.paginate(:include => :animal, :page => 1,
                               :order => "animals.name")

animal_photos.each do |animal_photo|
  puts animal_photo.file_name
  puts animal_photo.animal.name
end
I'd recommend something hybrid/corrected based on KandadaBoggu's input.
First off, the correct way to do it on paper is with row_number() over (partition by animal_id order by id). The suggested rank() will generate a global row number, but you want the one within its partition.
Using a window function is also the most flexible solution (in fact, the only solution) if you want to plan to change the sort order here and there.
Take note that this won't necessarily scale well, however, because in order to sort the results you'll need to:
fetch the whole result set that matches your criteria
sort the whole result set to create the partitions and obtain a rank_id
top-n sort/limit over the result set a second time to get them in their final order
The correct way to do this in practice, if your sort order is immutable, is to maintain a pre-calculated rank_id. KandadaBoggu's other suggestion points in the correct direction in this sense.
When it comes to deletes (and possibly updates, if you don't want them sorted by id), you may run into issues because you end up trading faster reads for slower writes. If deleting the cat with an index of 1 leads to updating the next 50k cats, you're going to be in trouble.
If you have very small sets, the overhead might be very acceptable (don't forget to index animal_id).
If not, there's a workaround if you find the order in which specific animals appear is irrelevant. It goes like this:
Start a transaction.
If the rank_id is going to change (i.e. insert or delete), obtain an advisory lock to ensure that two sessions can't impact the rank_id of the same animal class, e.g.:
SELECT pg_try_advisory_lock('the_table'::regclass, the_animal_id);
(Sleep for .05s if you don't obtain it.)
On insert, find max(rank_id) for that animal_id. Assign it rank_id + 1. Then insert it.
On delete, select the animal with the same animal_id and the largest rank_id. Delete your animal, and assign its old rank_id to the fetched animal (unless you were deleting the last one, of course).
Release the advisory lock.
Commit the work.
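A rough Ruby/ActiveRecord sketch of the insert path described above; the lock-key derivation, the retry loop, and the column names are my assumptions, not part of the answer:

# hypothetical insert path; assumes a photos table with (animal_id, rank_id)
Photo.transaction do
  conn = Photo.connection

  # take the per-animal advisory lock, sleeping briefly until we get it
  lock_sql = "SELECT pg_try_advisory_lock('photos'::regclass::int, #{animal_id.to_i})"
  sleep 0.05 until [true, "t"].include?(conn.select_value(lock_sql))

  begin
    # next rank_id within this animal's partition
    max_rank = Photo.where(animal_id: animal_id).maximum(:rank_id) || 0
    Photo.create!(animal_id: animal_id, filename: filename,
                  rank_id: max_rank + 1)
  ensure
    conn.execute(
      "SELECT pg_advisory_unlock('photos'::regclass::int, #{animal_id.to_i})")
  end
end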
Note that the above will make good use of an index on (animal_id, rank_id) and can be done using plpgsql triggers:
create trigger "__animals_rank_id__ins"
  before insert on animals
  for each row execute procedure lock_animal_id_and_assign_rank_id();

create trigger "_00_animals_rank_id__ins"
  after insert on animals
  for each row execute procedure unlock_animal_id();

create trigger "__animals_rank_id__del"
  before delete on animals
  for each row execute procedure lock_animal_id();

create trigger "_00_animals_rank_id__del"
  after delete on animals
  for each row execute procedure reassign_rank_id_and_unlock_animal_id();
You can then create a multi-column index on your sort criteria if you're not joining all over the place, e.g. (rank_id, name). And you'll end up with a snappy site for reads and writes.
You should be able to get the pictures (or filenames, anyway) using ActiveRecord, ordered by name.
Then you can use Enumerable#group_by and Enumerable#zip to zip all the arrays together.
If you give me more information about how your filenames are really arranged (i.e., are they all for sure with an underscore before the number and a constant name before the underscore for each "type"? etc.), then I can give you an example. I'll write one up momentarily showing how you'd do it for your current example.
You could run two sorts and build one array as follows:
result1 = the first of each animal type only; use the Ruby "find" method for this search.
result2 = all animals, sorted by group; use "find" again to locate the first occurrence of each animal, then use "drop" to remove those first occurrences from result2.
Then:
markCustomResult = result1 + result2
You can then use will_paginate on markCustomResult.
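A rough Ruby sketch of that idea (my interpretation; the model, the ordering, and uniq-with-a-block standing in for the repeated "find" calls are all assumptions):

photos = Photo.order(:animal_id, :id).to_a

result1 = photos.uniq(&:animal_id)  # first photo of each animal
result2 = photos - result1          # the rest, still grouped by animal

mark_custom_result = result1 + result2
# mark_custom_result can then be fed to will_paginate's array support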

Rails (or maybe SQL): Finding and deleting duplicate AR objects

ActiveRecord objects of the class 'Location' (representing the db-table Locations) have the attributes 'url', 'lat' (latitude) and 'lng' (longitude).
Lat-lng-combinations on this model should be unique. The problem is, that there are a lot of Location-objects in the database having duplicate lat-lng-combinations.
I need help in doing the following:

Find objects that share the same lat-lng-combination.
If the 'url' attribute of the object isn't empty, keep this object and delete the other duplicates. Otherwise just choose the oldest object (by checking the attribute 'created_at') and delete the other duplicates.
As this is a one-time-operation, solutions in SQL (MySQL 5.1 compatible) are welcome too.
If it's a one time thing then I'd just do it in Ruby and not worry too much about efficiency. I haven't tested this thoroughly, check the sorting and such to make sure it'll do exactly what you want before running this on your db :)
keep = []
locations = Location.find(:all)

locations.each do |loc|
  # get all Locations with the same coords as this one
  same_coords = locations.select { |l| l.lat == loc.lat and l.lng == loc.lng }
  with_urls = same_coords.select { |l| !l.url.empty? }

  # decide which list to use depending on whether there were any urls
  same_coords = with_urls.any? ? with_urls : same_coords

  # pick the best one: the oldest (earliest created_at)
  keep << same_coords.sort { |a, b| a.created_at <=> b.created_at }.first.id
end

# only keep unique ids
keep.uniq!

# now we just delete all the rows we didn't decide to keep
locations.each do |loc|
  loc.destroy unless keep.include?(loc.id)
end
Now like I said, this is definitely poor, poor code. But sometimes just hacking out the thing that works is worth the time saved in thinking up something 'better', especially if it's just a one-off.
If you have two MySQL columns, you can use the CONCAT function. A separator avoids false matches (e.g. lat 1 / lng 23 vs. lat 12 / lng 3):
SELECT * FROM table1 GROUP BY CONCAT(column_lat, ',', column_lng)
If you need to know the total:
SELECT COUNT(*) AS total FROM table1 GROUP BY CONCAT(column_lat, ',', column_lng)
Or, you can combine both:
SELECT COUNT(*) AS total, table1.* FROM table1
GROUP BY CONCAT(column_lat, ',', column_lng)
(Adding HAVING total > 1 would restrict the output to the actual duplicates.)
But if you can explain more in your question, perhaps we can give more relevant answers.