I am iterating over a large data set using the thin client, and I only need the list of keys from the Ignite cache.
Is there a way to do it?
The values are very heavy as they are actual data files, and the key is a UUID.
If you enable SQL support for your table, you can run a query against the "virtual" column _key (this also works from the thin client via ClientCache#query):
try (QueryCursor<List<?>> cur = cache2.query(new SqlFieldsQuery("select _key from table"))) {
    for (List<?> r : cur) {
        UUID key = (UUID) r.get(0); // only the keys are transferred; the heavy values stay on the server
    }
}
Using Ignite 2.9.x.
I have two BinaryObject caches: one serves as the parent and the other as the child, with an affinity key to the parent; this way I can colocate all relevant records on the same node.
I then want to select all child records based on the parent key. At the moment, on a 3-node cluster, the search time grows linearly as I add new parent and child records (the number of child records per parent is fixed at 1,000).
I wonder if there is a way to add an index on the child cache's parent property (asset) so the scan will scale more efficiently.
Production ..> Asset (asset)
Can I define an index on the Production cache using the parent key asset?
what would this config look like?
How would the query change?
Should I use the AffinityKey in this case (how)?
IgniteCache<Integer, BinaryObject> productions = ignite.cache("productions").withKeepBinary();

Map<LocalDate, Double> totals = new HashMap<>();
try (QueryCursor<Cache.Entry<BinaryObject, BinaryObject>> cursor =
        productions.query(
            new ScanQuery<BinaryObject, BinaryObject>().setLocal(true).setFilter(this::apply))) {
    for (Cache.Entry<BinaryObject, BinaryObject> entry : cursor) {
        OffsetDateTime timestamp = entry.getValue().field("timestamp");
        double productionMwh = entry.getValue().field("productionMwh");
        totals.computeIfPresent(
            timestamp.toLocalDate().withDayOfMonth(1),
            (localDate, aDouble) -> aDouble + productionMwh);
        totals.computeIfAbsent(
            timestamp.toLocalDate().withDayOfMonth(1), localDate -> productionMwh);
    }
}

private boolean apply(BinaryObject id, BinaryObject production) {
    return key == ((Integer) production.field("asset"));
}
The only way to implement an efficient parent->child link is to define a SQL table and a SQL index for your data. You can still colocate such data on an affinity column.
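A rough sketch of what that could look like, assuming the child cache stores a binary type named Production with an integer asset field referencing the parent key (cache, type and field names here are illustrative, not taken from the question):

import java.util.Collections;
import java.util.LinkedHashMap;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.configuration.CacheConfiguration;

public class ProductionCacheConfig {

    // Creates (or returns) the child cache with SQL metadata and an index on "asset".
    static IgniteCache<Integer, BinaryObject> productionsCache(Ignite ignite) {
        QueryEntity entity = new QueryEntity(Integer.class.getName(), "Production");

        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        fields.put("asset", Integer.class.getName());                 // parent key
        fields.put("timestamp", java.sql.Timestamp.class.getName());
        fields.put("productionMwh", Double.class.getName());
        entity.setFields(fields);

        // Secondary index on the parent key, so "where asset = ?" no longer scans the whole cache.
        entity.setIndexes(Collections.singletonList(new QueryIndex("asset")));

        CacheConfiguration<Integer, BinaryObject> cfg =
            new CacheConfiguration<Integer, BinaryObject>("productions")
                .setQueryEntities(Collections.singletonList(entity));

        return ignite.getOrCreateCache(cfg).withKeepBinary();
    }
}

The scan-and-filter loop can then be replaced by an indexed lookup, e.g. new SqlFieldsQuery("select timestamp, productionMwh from Production where asset = ?").setArgs(assetKey). Colocation is unaffected: keep routing child entries with the same affinity key as before.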
I am storing data in Redis using JCA (Java Caching API), where the key is a String and the value is an Object which is a JSON string.
I have a requirement to perform a partial update of the cached value, instead of retrieving the cached value by key, modifying an attribute, and performing a put with the modified value.
{
  "attribute1": "value1",
  "attribute2": [
    {
      "attribute3": "value3"
    }
  ]
}
Above is a sample JSON format. As explained above, is it possible to update the value of attribute1 from value1 to value2 without first getting the cached value by key from Redis?
Assuming you are using the JCache API (i.e. JSR-107), you can use Cache#invoke(K key, EntryProcessor<K,V,T> entryProcessor, Object... arguments) to perform an update in place instead of get-then-put. According to the EntryProcessor javadoc, Cache#invoke is executed atomically on the key, so you don't have to worry about concurrent modifications to the same cache entry.
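A minimal sketch of that approach, assuming the cache is a Cache<String, String> holding the raw JSON text and that Jackson is available for the JSON handling (class, key and attribute names here are illustrative, not from the question):

import javax.cache.processor.EntryProcessor;
import javax.cache.processor.EntryProcessorException;
import javax.cache.processor.MutableEntry;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class JsonFieldUpdater implements EntryProcessor<String, String, String>, java.io.Serializable {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public String process(MutableEntry<String, String> entry, Object... arguments) {
        if (!entry.exists()) {
            return null; // nothing to update for this key
        }
        String attribute = (String) arguments[0]; // e.g. "attribute1"
        String newValue = (String) arguments[1];  // e.g. "value2"
        try {
            // Parse the stored JSON, change the one attribute, and write the document back.
            ObjectNode json = (ObjectNode) MAPPER.readTree(entry.getValue());
            json.put(attribute, newValue);
            String updated = MAPPER.writeValueAsString(json);
            entry.setValue(updated);
            return updated;
        } catch (Exception e) {
            throw new EntryProcessorException(e);
        }
    }
}

Usage would be something like cache.invoke("someKey", new JsonFieldUpdater(), "attribute1", "value2"). Depending on the provider, the processor may be executed on the process that owns the entry, so implementing Serializable (as above) is usually a safe bet.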
You can use a Lua script so that, using the cjson Lua library, you update the item server-side. I have shared a similar example in How to nest a list into a structure in Redis to reduce top level?
I'm not familiar with JCA, so I'm not sure whether your client makes it simple to send an EVAL command.
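To illustrate the idea only (the JCA client from the question is not shown, so this sketch uses the Jedis client; key name and script are illustrative): the script decodes the stored JSON with cjson, patches attribute1 on the server, and writes the document back in a single atomic EVAL.

import java.util.Collections;

import redis.clients.jedis.Jedis;

public class PartialJsonUpdate {

    // Lua executed inside Redis: GET the JSON, patch one attribute, SET it back.
    private static final String SCRIPT =
        "local raw = redis.call('GET', KEYS[1]) " +
        "if not raw then return nil end " +
        "local doc = cjson.decode(raw) " +
        "doc['attribute1'] = ARGV[1] " +
        "local updated = cjson.encode(doc) " +
        "redis.call('SET', KEYS[1], updated) " +
        "return updated";

    public static void main(String[] args) {
        try (Jedis redis = new Jedis("localhost", 6379)) {
            Object updated = redis.eval(SCRIPT,
                Collections.singletonList("someKey"), // KEYS[1]
                Collections.singletonList("value2")); // ARGV[1]
            System.out.println(updated);
        }
    }
}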
How do I find all the records in Redis?
USER_EXCHANGE = table
USER_ID = User ID (primary key)
UID = relationship
The key is stored with the following structure:
USER_EXCHANGE:USER_ID:4030:UID:63867a4c6948e9405f4dd73bd9eaf8782b7a6667063dbd85014bd02046f6cc2e
I am trying to find all the records of the user 4030...
using (var redisClient = new RedisClient())
{
    List<object> ALL_UID = redisClient.Get<List<object>>("USER_EXCHANGE:USER_ID:4030:UID:*");
}
What am I doing wrong? Thank you all for your help.
Hi, as you're trying to fetch all keys matching a pattern, you should use KEYS.
GET won't match patterns; it only retrieves a single key by its complete, full name.
Caution: KEYS is a debug command, not a production one.
doc: https://redis.io/commands/keys
A simple production-ready solution, recommended for you, is:
store the list of your keys in a Redis LIST
USER_EXCHANGE:USER_ID:4030 => [ uid1, uid2, uid3 ....]
get the list of UIDs for a specific user ID by reading that list.
This is a good practice in Redis.
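A short sketch of that pattern (the question uses ServiceStack.Redis in C#; this example uses the Jedis Java client, but the underlying commands, RPUSH and LRANGE, are the same):

import java.util.List;

import redis.clients.jedis.Jedis;

public class UserUidIndex {

    public static void main(String[] args) {
        try (Jedis redis = new Jedis("localhost", 6379)) {
            String listKey = "USER_EXCHANGE:USER_ID:4030";

            // Whenever a UID is created for the user, append it to that user's list.
            redis.rpush(listKey, "63867a4c6948e9405f4dd73bd9eaf8782b7a6667063dbd85014bd02046f6cc2e");

            // Later, fetch every UID for the user in one call instead of scanning keys.
            List<String> uids = redis.lrange(listKey, 0, -1);
            System.out.println(uids);
        }
    }
}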
I am trying to get a grip on the ServiceStack Redis example, and on Redis itself, and I now have some questions.
Question 1:
I see some static indexes defined, e.g.:
static class TagIndex
{
    public static string Questions(string tag) { return "urn:tags>q:" + tag.ToLower(); }
    public static string All { get { return "urn:tags"; } }
}
What does that '>' (greater than) sign do? Is this some kind of convention?
Question 2:
public User GetOrCreateUser(User user)
{
    var userIdAliasKey = "id:User:DisplayName:" + user.DisplayName.ToLower();
    using (var redis = RedisManager.GetClient())
    {
        var redisUsers = redis.As<User>();
        var userKey = redis.GetValue(userIdAliasKey);
        if (userKey != null) return redisUsers.GetValue(userKey);

        if (user.Id == default(long)) user.Id = redisUsers.GetNextSequence();
        redisUsers.Store(user);
        redis.SetEntry(userIdAliasKey, user.CreateUrn());
        return redisUsers.GetById(user.Id);
    }
}
As far as I can understand, first a user is stored with a unique id. Is this necessary when using the client (I know it isn't necessary for Redis itself)? For my model I have a meaningful string id (like an email address) which I would like to use. I also see that a SetEntry is done. What does SetEntry do exactly? I think it creates an extra key just to set a relation between the id and a searchable key. I guess this is not necessary when storing the object itself under a meaningful key, e.g. user.Id = "urn:someusername". And how is SetEntry stored: as a Redis Set, or just as an extra key?
Question 3:
This is more Redis-related, but I am trying to figure out how everything is stored in Redis in order to get a grip on the example, so I did the following:
Started redis-cli.exe in a console
Typed 'keys *', which shows all keys
Typed 'get id:User:DisplayName:joseph', which showed 'urn:user:1'
Typed 'get urn:user:1', which shows the user
Now I also see keys like 'urn:user>q:1' or 'urn:tags'. If I do a 'get urn:tags' I get the error 'ERR Operation against a key holding the wrong kind of value'. I also tried other Redis commands like smembers, but I cannot find the right query commands.
Question 1: return "urn:tags>q:" + tag.ToLower(); gives you the key (a string) for a given tag; the ">" has no meaning for Redis, it's a convention of the developer of the example, and could have been any other character.
Question 3: use the TYPE command to determine the type of the key; then you'll find the right command in the Redis documentation to read the values.
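For example, from redis-cli (the exact type depends on how the example stores the tag data, so only one of the read commands will apply):

TYPE urn:tags
SMEMBERS urn:tags
ZRANGE urn:tags 0 -1 WITHSCORES

SMEMBERS applies if TYPE reports "set"; ZRANGE applies if it reports "zset".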
I'm working on a web app that collects traffic information for websites that use my service. Think Google Analytics, but far more visual. I'm using SQL Server 2012 for the backbone of my app and am considering using MongoDB for the data-gathering, analytics side of the site.
If I have 100 users with an average of 20,000 hits a month on their site, that's 2,000,000 records in a single collection that will be getting queried.
Should I use MongoDB to store this information (I'm new to it and new things are intimidating)?
Should I dynamically create new collections/tables for every new user?
Thanks!
With MongoDB the collection (aka SQL table) can get quite big without much issue; that is largely what it is designed for. The "Mongo" is part of "huMONGOus" (pretty clever, eh?). This is a good fit for MongoDB, which is great at storing point-in-time information.
Options:
1. New Collection for each Client
Very easy to do; I use a GetCollectionSafe method for this:
public class MongoStuff
{
    private static MongoDatabase GetDatabase()
    {
        var databaseName = "dbName";
        var connectionString = "connStr";
        var client = new MongoClient(connectionString);
        var server = client.GetServer();
        return server.GetDatabase(databaseName);
    }

    public static MongoCollection<T> GetCollection<T>(string collectionName)
    {
        return GetDatabase().GetCollection<T>(collectionName);
    }

    public static MongoCollection<T> GetCollectionSafe<T>(string collectionName)
    {
        var db = GetDatabase();
        if (!db.CollectionExists(collectionName))
        {
            db.CreateCollection(collectionName);
        }
        return db.GetCollection<T>(collectionName);
    }
}
Then you can call it with:
var collection = MongoStuff.GetCollectionSafe<Record>("ClientName");
Running this script
static void Main(string[] args)
{
    var times = new List<long>();
    for (int i = 0; i < 1000; i++)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        MongoStuff.GetCollectionSafe<Person>(String.Format("Mark{0:000}", i));
        watch.Stop();
        Console.WriteLine(watch.ElapsedMilliseconds);
        times.Add(watch.ElapsedMilliseconds);
    }
    Console.WriteLine(String.Format("Max : {0} \nMin : {1} \nAvg : {2}", times.Max(f => f), times.Min(f => f), times.Average(f => f)));
    Console.ReadKey();
}
Gave me (on my laptop)
Max : 180
Min : 1
Avg : 6.635
Benefits:
Ease of splitting data if one client needs to go on their own
Might match your mental model of the problem
Cons:
Almost impossible to aggregate data over all collections
Hard to find collections in management studios (like Robomongo)
2. One Large Collection
Use one collection for everything, and access it this way:
var coll = MongoStuff.GetCollection<Record>("Records");
Put an index on the table (the index will make reads orders of magnitude quicker)
coll.EnsureIndex(new IndexKeysBuilder().Ascending("ClientId"));
This only needs to be run once (per collection, per index).
Benefits:
One simple place to find data
Aggregating over all clients is possible
A more traditional MongoDB setup
Cons:
All clients' data is intermingled
May not map to your mental model as well
Just as a reference, the MongoDB size limits are documented here:
http://docs.mongodb.org/manual/reference/limits/
3. Store only aggregated data
If you are never intending to break the data down to individual records, just save the aggregates themselves.
Page Loads:
#    Page          Total Time    Average Time
15   Default.html  1545          103
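For instance, one pre-aggregated document per client, page and period could look roughly like this (field names and the period are illustrative; the figures mirror the row above):

{
  "clientId": 1,
  "page": "Default.html",
  "month": "2013-01",
  "loads": 15,
  "totalTimeMs": 1545,
  "averageTimeMs": 103
}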
I will let someone else tackle the MongoDB side of your question as I don't feel I'm the best person to comment on it, but I would point out that MongoDB is a very different animal and you'll lose a lot of the referential integrity you enjoy in SQL.
In terms of SQL design, I would not use a different schema for each customer. Your database schema and backups could grow uncontrollably, and maintaining a dynamically growing schema will be a nightmare.
I would suggest one of two approaches:
First, you can create a new database for each customer:
This is more secure, as users cannot access each other's data (just use different credentials), and users are easier to manage, migrate and separate.
However, many hosting providers charge per database, it will cost more to run and maintain, and should you wish to compare data across users it becomes much more challenging.
Your second approach is to simply host all users in a single DB; your tables will grow large (although 2 million rows is not over the top for a well-maintained SQL DB). You would simply use a UserID column to discriminate.
The emphasis will be on you to get the performance you need through proper indexing.
Users' data will exist in the same system, and there's no SQL defense against users accessing each other's data - your code will have to be good!