I am starting to use Redis as a cache and I am not sure of the best structure for my application. The overall structure is a tree: data sources contain users, users have queries, and each query has its cached results.
Some things I need to be able to do are:
Flushing everything for a data source (all users and all queries).
Flushing everything for a particular user (e.g. I would need to remove user 1 and its queries from data source 1 and data source 2).
Everything listed in the tree is meant to be part of the key used to access the results of running the query for the user on the specific data source. I am new to Redis and have been going back and forth between using Hashes and Sets.
Option 1 (Sets):
DataSource1 => user1, user2, user3, user4
DataSource1:user1 => Query1, Query2, Query3
DataSource1:user1:Query1 => Results
Flushing things would be expensive because I would have to find all keys that match user1 or DataSource1.
Partial Option 2 (Hashes):
users:user1 => { DataSource1:Query1 => Results1, DataSource2:Query1 => Results2 }
Still not sure how flushing data sources or users would work here.
Does anyone have other thoughts/modifications?
Based on the information given, I would have a set for DataSource -> users and a hash for each user's queries. That way flushing a user is just deleting the hash, and flushing a data source (which you probably do less often) is looping through the set and deleting each user's hash, then deleting the data source -> users set.
If you go with the first option of sets, it'll be more operations every time you flush queries from users, but it really depends on how the data is accessed.
Also, depending on how many queries you store per user, hashes can be more efficient than sets.
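One way to read that layout, sketched with the redis-py client (the ds:* key names are made up for illustration): a set of users per data source, plus one hash per data source and user, whose fields are queries and whose values are the cached results:

import redis

r = redis.Redis(decode_responses=True)

def cache_result(source, user, query, results):
    r.sadd(f"ds:{source}:users", user)                  # remember this user under the data source
    r.hset(f"ds:{source}:user:{user}", query, results)  # query -> serialized results

def flush_user(source, user):
    r.delete(f"ds:{source}:user:{user}")                # flushing the user is just deleting the hash
    r.srem(f"ds:{source}:users", user)

def flush_data_source(source):
    for user in r.smembers(f"ds:{source}:users"):       # loop through the data source's user set
        r.delete(f"ds:{source}:user:{user}")
    r.delete(f"ds:{source}:users")                      # then drop the data source -> users set

Flushing a user from every data source (the second requirement above) is then just a call to flush_user for each known data source.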
Looking for some help with one scenario.
I have one OLTP application (app1) which frequently reads a metadata table. This table has only 2k records.
Another application (app2) needs to refresh (remove and insert the 2k records again) the metadata table a few times a day.
Since there is no downtime window, how do I refresh the metadata table?
One option I can think of is:
Add an "ACTIVE" column to the metadata table (the initial load sets Active = 1)
app1: while reading, always use Active = 1
app2: 1st, insert the new metadata with Active = 0
app2: 2nd, delete the rows with Active = 1
app2: 3rd, update all rows to Active = 1
app2: finally, commit
I'm not very convinced by the above option. Is there any alternative?
What is the problem that would require downtime? You shouldn't need to do anything special here unless, say, you have 2,000 rows of data but each row is hundreds of MB in size or you have some queries in App1 that run for several hours (which seems unlikely in an OLTP application).
So long as you do the delete and insert as part of a single transaction, which you say you are, you should cause no problems for App1. Any queries App1 runs before the refresh commits will see the state of the data before the refresh. Any queries App1 runs after the refresh commits will see the state of the data after the refresh. Readers never block writers and writers never block readers in Oracle so App1 isn't going to care that App2 is in the middle of a refresh (well, it may have to do slightly more work to read the data if it needs to use the data in the undo segments to reconstruct a read-consistent view of the data but it's unlikely that will have a meaningful impact on performance).
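For illustration only, here is a sketch of app2's refresh as a single transaction, assuming the cx_Oracle driver and a staging table (metadata_staging) holding the new 2k rows; the names and connection details are placeholders:

import cx_Oracle

conn = cx_Oracle.connect(user="app2", password="...", dsn="dbhost/orcl")
cur = conn.cursor()

# Delete and reload inside a single transaction. app1's readers keep seeing
# the old rows until the commit, thanks to Oracle's read consistency.
cur.execute("DELETE FROM metadata")
cur.execute("INSERT INTO metadata SELECT * FROM metadata_staging")
conn.commit()  # the refreshed rows become visible to app1 atomically here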
I have a program I wrote for a group that is constantly writing, reading, and deleting rows per session per user on our SQL Server. I do not need to worry about what is being written or deleted, as the data written/deleted by one individual will never be needed by another. Each user's writes are separated by a unique ID and all queries are based on that unique ID.
I want multiple users to be able to write/delete rows in the same table at the same time. I know I can set up the session to be able to read while data is being written using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED.
The question:
That said, can I have multiple users write/delete from the same table at the same time?
Currently my only idea, if this is not possible, is to set up the tool to write to temp tables per user session. But I don't think it is efficient to constantly create and delete temp tables hundreds of times a day.
Yes, you can make this multi-tenant approach work fine; a rough sketch of these points follows the list below.
Ensure the leading column of all indexes is UserId so a query for one user never needs to scan rows belonging to a different user.
Ensure all queries have an equality predicate on UserId, and verify the execution plans to confirm that they are seeking on it.
Ensure no use of the serializable isolation level, as this can take range locks affecting adjacent users.
Ensure that row locking is not disabled on any indexes, and restrict DML operations to <= 5,000 rows (to prevent lock escalation).
Consider using read committed snapshot isolation for your reading queries.
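For illustration, here is that sketch, with invented table and index names (dbo.SessionData, IX_SessionData_UserId_CreatedAt) and shown via pyodbc, which is an assumption about your client stack:

import pyodbc

# placeholder connection string
conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=app")
cur = conn.cursor()

# Lead every index with UserId so one user's seeks never touch another user's rows.
cur.execute("""
    CREATE INDEX IX_SessionData_UserId_CreatedAt
    ON dbo.SessionData (UserId, CreatedAt)
""")
conn.commit()

def purge_old_rows(user_id, cutoff):
    # Equality predicate on UserId, and DML batched below 5,000 rows so lock
    # escalation never kicks in.
    while True:
        cur.execute(
            "DELETE TOP (4000) FROM dbo.SessionData WHERE UserId = ? AND CreatedAt < ?",
            user_id, cutoff,
        )
        conn.commit()
        if cur.rowcount < 4000:
            break

# Read committed snapshot isolation is a one-off, database-level setting:
#   ALTER DATABASE app SET READ_COMMITTED_SNAPSHOT ON;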
We are trying to implement logging or statistics with Aerospike. We have logged-in users and anonymous users making queries to our main database, and we want to store every request that is made.
Our best approach so far is to store records with the UserID as the key and the query keywords as a list, like this:
{
  Key: 'alacret',
  Bins: {
    searches: [
      "something to search 1",
      "something to search 2",
      "something to search 3",
      ...
    ]
  }
}
As the application architect, reviewing this I see several performance/design pitfalls:
1) Retrieving and storing are two operations; getting the whole list, appending, and then putting it back seems inefficient or suboptimal.
2) Doing two operations means I have to wrap both in a transaction to prevent race conditions, which I think would kill Aerospike performance.
3) The documentation states that lists are data structures for size-bounded data, so if I understand correctly this is not going to scale well, especially for anonymous users, whose lists would grow very quickly.
As an alternative, I'm proposing to move the userID into a bin and generate a key that prevents race conditions, keeping the save as a single operation rather than several in a transaction.
So, what I'm looking for are opinions and validations.
Greetings
You can append to the list or prepend to it. You can also limit it by trimming it if, beyond a certain limit, you don't care to store the search items, i.e. you only want to store, say, the 100 most recent items in your userID search list. You can do the append and trim, then read back the updated list, all under one record lock. If you are storing on disk, the record size is limited to 1 MB including all overhead. You can store much larger records if storing data only in RAM (storage-engine memory). Does that suit your application's needs?
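For example, here is a rough sketch using the Aerospike Python client's operate() call; the namespace, set and bin names and the 100-item cap are assumptions, not part of the question:

import aerospike
from aerospike_helpers.operations import list_operations, operations

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

def log_search(user_id, term):
    key = ("test", "searches", user_id)                    # namespace, set, user key
    ops = [
        list_operations.list_insert("searches", 0, term),  # newest term goes to the front
        list_operations.list_trim("searches", 0, 100),     # keep only the 100 most recent
        operations.read("searches"),                       # read the updated list back
    ]
    _, _, bins = client.operate(key, ops)                  # all applied under one record lock
    return bins["searches"]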
We have an application that is installed on premises for many clients. We are trying to collect information that will be sent to us at a point in the future. We want to ensure that we can detect if any of our data is modified and if any data was deleted.
To detect data being modified, we currently hash table rows and send the hashes with the data. However, we are struggling to detect whether data has been deleted. For example, if we insert 10 records in a table and hash each row, the user won't be able to modify a record without us detecting it, but if they drop all the records then we can't distinguish this from the initial installation.
Constraints:
Clients will have admin roles on the DB.
The application and DB will be behind a DMZ and won't be able to connect to external services.
Clients will be able to profile any SQL commands and replicate any initial setup we do (to clarify, clients can also drop/recreate tables).
Although clients can drop data and tables, there are some sets of data and tables that, if dropped or deleted, would be obvious to us during audits because they should always be accumulating data; missing or truncated data would stand out. We want to be able to detect deletion and fraud in the remaining tables.
We're working under the assumption that clients will not be able to reverse engineer our code base or hash/encrypt data themselves.
Clients will send us all data collected every month and the system will be audited by us once a year.
Also consider that the client can take backups of the DB or snapshots of the VM in a 'good' state and then roll back to that 'good' state if they want to destroy data. We don't want to do any detection of VM snapshot or DB backup rollbacks directly.
So far the only solution we have is encrypting the install date (which could be modified) and the instance name. Then every minute we 'increment' the encrypted data. When we add data to the system, we hash the data row and stick the hash in the encrypted data, then continue to 'increment' the data. Then, when the monthly data is sent, we'd be able to see if they are deleting data and rolling the DB back to just after installation, because the encrypted value wouldn't have any increments, or would have extra hashes that don't belong to any data.
Thanks
Have you looked into Event Sourcing? This could possibly be used with write-once media as secondary storage, if performance is good enough that way. That would then guarantee transaction integrity even against DB or OS admins. I'm not sure whether it's feasible to do Event Sourcing with real write-once media and still keep reasonable performance.
Let's say you have an md5() or similar function in your code, and you want to keep control over modifications to the "id" field of the table "table1". You can do something like this (read_rows is just a placeholder for however you fetch the rows, in a stable order):
import hashlib

accumulatedIds = "secretkey-only-in-your-program"    # secret that exists only in your program
for record in read_rows("table1"):                   # placeholder: fetch the rows in a stable order
    accumulatedIds += "." + str(record.id)
tableHash = hashlib.md5(accumulatedIds.encode()).hexdigest()
# update hash_control set hash = :tableHash where table_name = 'table1'
Run this after every authorized change to the information in table "table1". Nobody could make changes outside this system without being noticed.
If somebody changes some of the ids, you will notice because the hash wouldn't be the same.
If somebody wants to recreate your table then, unless they recreate exactly the same information, they won't be able to produce the hash again, because they don't know the "secretkey-only-in-your-program".
If somebody deletes a record, that can also be discovered, because the "accumulatedIds" wouldn't match. The same applies if somebody adds a record.
The user can delete the row in the hash_control table, but they can't reconstruct the hash properly without the secret key, so you will notice that as well.
What am I missing??
I would like to develop a Forum from scratch, with special needs and customization.
I would like to prepare my forum for intensive usage, and I am wondering how to cache things like user post counts and user reply counts.
Having only three tables, tblForum, tblForumTopics, and tblForumReplies, what is the best approach to caching the user topic and reply counts?
Think of a simple scenario: a user presses a link, opens the Replies.aspx?id=x&page=y page, and starts reading replies. On the HTTP request, the server runs an SQL command which fetches all replies for that page, also inner joining with tblForumReplies to find out the number of replies for each user that replied:
select
tblForumReplies.*,
tblFR.TotalReplies
from
tblForumReplies
inner join
(
select IdRepliedBy, count(*) as TotalReplies
from tblForumReplies
group by IdRepliedBy
) as tblFR
on tblFR.IdRepliedBy = tblForumReplies.IdRepliedBy
Unfortunately this approach is very CPU intensive, and I would like to see your ideas on how to cache things like table counts.
If I count replies for each user on insert/delete and store the count in a separate field, how do I synchronize it with manual data changes? Suppose I manually delete replies in SQL.
These are the three approaches I'd be thinking of:
1) Maybe SQL Server performance will be good enough that you don't need to cache. You might be underestimating how well SQL Server can do its job. If you do your joins right, it's just one query to get all the counts of all the users that are in that thread. If you are thinking of this as one query per user, that's wrong.
2) Don't cache. Redundantly store the user counts on the user table. Update the user row whenever a post is inserted or deleted.
3) If you have thousands of users, even many thousands, but not millions, you might find it practical to cache users and their counts in the web layer's memory - for ASP.NET, the "Application" cache.
I would not bother with caching until I know I need it for sure. In my experience there is no way to predict the places that will require caching. Try an iterative approach: implement without a cache, then gather statistics, and then implement the right kind of caching (there are many kinds, like content, data, aggregates, distributed, and so on).
BTW, I do not think your query is CPU intensive. SQL Server will optimize that stuff and COUNT(*) will run in ticks...
First, tbl prefixes suck, as much as Replies.aspx?id=x&page=y URIs do. Consider ASP.NET MVC, or just its routing part.
Second, do not optimize prematurely. However, if you really need to, denormalize your data: add a TotalReplies column to your ForumTopics table and either rely on your DAL/BL to keep this field up to date (possibly with a scheduled task to resync it), or use triggers.
For each reply you need to keep TotalReplies and TotalDirectReplies. That way you can support a tree-like structure of replies and keep the counts updated throughout the entire hierarchy without needing to count each time.
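To make that denormalization concrete, here is a sketch; the tblForumUsers table, its IdUser and TotalReplies columns, the reply columns, and the pyodbc connection are all invented for illustration. The counter is bumped in the same transaction as the insert, and a scheduled task resyncs it so manual deletes in SQL eventually get corrected:

import pyodbc

conn = pyodbc.connect("DSN=forum")   # placeholder connection
cur = conn.cursor()

def add_reply(topic_id, user_id, body):
    # Insert the reply and bump the denormalized counter in one transaction.
    cur.execute(
        "INSERT INTO tblForumReplies (IdTopic, IdRepliedBy, Body) VALUES (?, ?, ?)",
        topic_id, user_id, body,
    )
    cur.execute(
        "UPDATE tblForumUsers SET TotalReplies = TotalReplies + 1 WHERE IdUser = ?",
        user_id,
    )
    conn.commit()

def resync_reply_counts():
    # Scheduled task: recompute the counters so manual changes get corrected.
    cur.execute("""
        UPDATE u SET u.TotalReplies = ISNULL(r.Cnt, 0)
        FROM tblForumUsers u
        LEFT JOIN (SELECT IdRepliedBy, COUNT(*) AS Cnt
                   FROM tblForumReplies
                   GROUP BY IdRepliedBy) r ON r.IdRepliedBy = u.IdUser
    """)
    conn.commit()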