I'm working on a migration project. Currently the data look-up is done against an in-memory cache, and we are switching to Redis, so I'm looking for a Redis-based solution.
Right now in our table there are records with min and max values. The table columns are as below:
Min Max vendor_name
1000 2000 VENDOR_1
2000 3000 VENDOR_2
A request comes from a client whose accountNo falls within a range (Min - Max); from the example above, if the client accountNo is 1500 then the response will be VENDOR_1. Min and Max values are unique and there will not be duplicates, and there will be millions of min-max (range) records in the DB. I'm trying to insert these records into Redis. I went through the Redis documentation and did some online reading; there are many references to Redis ZRANGEBYSCORE, but I could not finalize any solution. If anyone has implemented something similar in Redis, please share. Highly appreciated.
I tried the following, which did not give the desired result:
zadd 123456 999.1 vendor1 10000.1 vendor2 10010.1 vendor3 15000.1 vendor4 16000.1 vendor5
zrangebyscore 123456 16000 +inf limit 0 1
"vendor5"
I am trying to improve the performance of updating only about 60K rows with data coming from different rows in the same table. At about 2 minutes, it's not terrible, but it's not great either, and my application really doesn't work if you have to wait so long between recalculations.
The app generates a set of financial statements for a business, where it calculates basic formulas on 1300 line items, like Rent, or Direct Labor, or Inventory costs, all of which roll up to totals that mimic the Balance Sheet, P&Ls, Cash Flow etc. Many of the line items need to be calculated on a month-by-month basis, where for instance it has to figure out April's On Hand Inventory before knowing what April's Inventory Value is. So the total program ends up looping through 48 months over 30 calculation passes, requiring about 8000 SQL statements. (Fortunately it figures it all out by itself!) Each SQL statement takes only a few milliseconds, but it adds up.
I'm pretty sure I can't reduce the number of loops, so I keep trying to figure out how to make each SQL quicker. The basic structure is as follows:
LI: Line item table that holds the basic info of each item, primary key LID
LID Name
123 Sales_1
124 Sales_2
200 Total Sales
Formula: Master/Detail tables that create any formula from the line items
Total sales=Sales_1 + Sales_2
or
{200}={123}+{124}
(I use curly braces to be able to find and replace the LIDs within the formula, as shown in the SQL below)
FC: Formula Calculation table: all line items by month, about 1300 items x 48 months=62K records. Primary key FID
FID   SQL_ID  LID  LID_brace  LIN          OutputMonth  Formula      Amount
3232  25      123  {123}      Sales_1      1                         1200
3255  26      124  {124}      Sales_2      1                         1500
5454  177     200  {200}      Total Sales  1            {123}+{124}
DMO: Operand join table, which links a formula to its detail lines within the same table. Once Sales_1 is calculated, it can find the Total Sales record and update it, which then evaluates and sends its amount up the chain to the other LIDs that depend on it, such as Total Income. It locates the record to update based on the SQL_ID, which is set per calculation pass and month. It's complex to set up, but pretty straightforward once things are running.
Master_FID Detail_FID
5454 3232 (links total sales to sales_1)
5454 3255 (links total sales to sales_2)
SQL1:
UPDATE (FC INNER JOIN DMO ON FC.FID = DMO.Master_FID)
INNER JOIN FC AS FC2 ON DMO.Detail_FID = FC2.FID
SET FC.Formula = Replace(FC.Formula, FC2.LID_brace, FC2.Amount)
WHERE FC.SQL_ID = 177;
The above will change {123} + {124} to 1200+1500 which will then evaluate to 2700 when I run the following
SQL2:
UPDATE FC SET FC.Amount = Eval([FC].[Formula]) WHERE FC.SQL_ID = 177;
So those two sql statements are run over and over again, with the only thing changing is the SQL_id.
There are indexes on the SQL_ID, LID, FID etc
When measuring, the time per record can range from 0.04 ms when many records are included (~10K for some passes) up to 10 or 15 ms when just one record is updated. Perhaps it is the setup of the query causing a whole lot of overhead, because it doesn't seem to be a function of the actual number of records updated? It's also not very consistent: some runs take 20+ ms compared to less than 3 ms when the same statement runs again.
I know this is a complex question I'm asking that probably doesn't have a simple answer, but I'm just looking for directions on what might help. For instance, would a parameter query help if there isn't a whole lot of change between runs? Does Access have an easier time running a query it knows about in advance, i.e. a named query with parameters vs dynamic SQL? Or am I just doomed because it still needs to run those 8000 queries?
Also, is there inherently a problem with trying to update the same table through a secondary join table, and/or is there a better way to do it?
Is it also because string replacing isn't efficient this way? If I tried RegEx would that be quicker? I would have to make a function that could do that within a query, but it seems like that's going to be slower.
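To make the "named query with parameters" idea concrete, a saved, parameterized version of SQL1 could look roughly like the following in Access SQL (just a sketch; prmSqlID is a made-up parameter name, and whether a saved query actually beats dynamic SQL here is exactly what would need measuring):
PARAMETERS prmSqlID Long;
UPDATE (FC INNER JOIN DMO ON FC.FID = DMO.Master_FID)
INNER JOIN FC AS FC2 ON DMO.Detail_FID = FC2.FID
SET FC.Formula = Replace(FC.Formula, FC2.LID_brace, FC2.Amount)
WHERE FC.SQL_ID = [prmSqlID];
The query text never changes between passes; only the parameter value does, so Access can, in principle, reuse one saved query plan instead of re-parsing 8000 SQL strings.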
Thanks in advance, this has been a most vexing problem!!!
I want to use redis to store data that is sourced from a sql db. In the db, each row has an ID, date, and value, where the ID and date make up a composite key (there can only be one value for a particular ID and date). An example is below:
ID Date Value
1 01/01/2001 1.2
1 02/01/2001 1.5
1 04/23/2002 1.5
2 05/05/2009 0.4
Users should be able to query this data in redis given a particular ID and date range. For example, they might want all values for 2019 with ID 45. If the user does not specify a start or end time, we use the system's Date.Min or Date.Max respectively. We also want to support refreshing redis data from the database using the same parameters (ID and date range).
Initially, I used a zset:
zset key zset member score
1 01/01/2001_1.2 20010101
1 02/01/2001_1.5 20010201
1 04/23/2002_1.5 20020423
2 05/05/2009_0.4 20090505
The problem is: what happens if the value field changes in the db? For instance, ID 1 and date 01/01/2001 might change to 1.3 later on. I would want the original value to be updated, but instead a new member will be inserted. So I would need to first check whether a member for a particular score exists, and delete it if it does, before inserting the new member. I imagine this could get expensive when refreshing, for example, 10 years' worth of data.
I thought of two possible fixes to this:
1.) Use a zset and string key-value:
zset key zset value score
1 1_01/01/2001 20010101
1 1_02/01/2001 20010201
1 1_04/23/2002 20020423
2 2_05/05/2009 20090505
string key string value
1_01/01/2001 1.2
1_02/01/2001 1.5
1_04/23/2002 1.5
2_05/05/2009 0.4
This allows me to easily update the value, and query for a date range, but adds some complexity as now I need to use two redis data structures instead of 1.
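As a quick sketch of option 1 with the sample data above (keys and members exactly as in the tables), a refresh can simply re-issue the same commands, because ZADD on an existing member only updates its score and SET overwrites the old value:
zadd 1 20010101 1_01/01/2001
set 1_01/01/2001 1.2
set 1_01/01/2001 1.3
zrangebyscore 1 20010101 20011231
mget 1_01/01/2001
The second SET replaces 1.2 with 1.3 with no existence check needed, and a date-range query becomes a ZRANGEBYSCORE on the ID's zset followed by an MGET of the returned members.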
2.) Use a hash table:
hash key sub-key value
1 01/01/2001 1.2
1 02/01/2001 1.5
1 04/23/2002 1.5
2 05/05/2009 0.4
This is nice because I only have to use one data structure, and although it is O(N) to get all values for a particular hash key, solution 1 has the same drawback when fetching the values for all of the string keys returned from the zset.
However, with this solution, I now need to generate all sub-keys between a given start and end date in my calling code, and not every date may have a value. There are also some edge cases that I now need to handle (what if the user wants all values up until today? Do I use HGETALL and just remove the ones in the future I don't care about? At what date range size should I use HGETALL rather than HMGET?)
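For comparison, a sketch of option 2 with the same sample rows (hash key 1, date sub-keys as above): HSET likewise just overwrites on refresh, and the calling code generates the sub-keys it wants for an HMGET:
hset 1 01/01/2001 1.2
hset 1 01/01/2001 1.3
hmget 1 01/01/2001 02/01/2001 03/01/2001
hgetall 1
Sub-keys with no data (03/01/2001 here) simply come back as nil from HMGET, so gaps aren't a correctness problem, only a question of how many candidate dates you are willing to generate before HGETALL plus client-side filtering becomes the cheaper call.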
In my view, there are pros and cons to each solution, and I'm not sure which one will be easier to maintain in the long term. Does anyone have thoughts as to which structure they would choose in this situation?
I am using SQL Server 2005.
I have a site where people can vote on awesome motorcycles. Each time a user votes, there is one vote for the first bike and one vote against the second bike, so two votes are stored in the database. The vote table looks like this:
VoteID VoteDate BikeID Vote
1 2012-01-12 123 1
2 2012-01-12 125 0
3 2012-01-12 126 0
4 2012-01-12 129 1
I want to tally the votes for each bike quite frequently, say each hour. My idea is to store the tally as a percentage of contests won versus lost on the bike table as an attribute of the bike. So, if a bike won 10 contests and lost 20 contests, it would have a score (tally) of 33. I would tally up daily, weekly, and monthly scores.
BikeID BikeName DailyTally WeeklyTally MonthlyTally
1 Big Dog 5 10 50
2 Big Cat 3 15 40
3 Small Dog 9 8 0
4 Fish Face 19 21 0
Right now, there are about 500 votes per day being cast. We anticipate 2500 - 5000 per day in the next month or so.
What is the best way to tally the data and what is the best way to store it? Should the tallies be on their own table? Should a trigger be used to run a new tally each time a bike is voted on? Should a stored procedure be run hourly to get all tallies?
Any ideas would be very helpful!
Store your VoteDate as a datetime value instead of just date.
For your tallies, you can just make that a view and calculate it on the fly. This should be very simple to do using GROUP BY and DATEPART functions. If you need exact code for how to do this, please open a new question.
For that low volume of rows it doesn't make any sense to store aggregations in a table when you can just calculate them whenever you want to see them and get accurate and immediate results that are up-to-date.
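For illustration only (the vote table isn't named in the question, so the table name Vote and the exact win-percentage math below are assumptions), a daily-tally view along those lines could look something like this:
CREATE VIEW dbo.BikeDailyTally AS
SELECT BikeID,
       100 * SUM(CASE WHEN Vote = 1 THEN 1 ELSE 0 END) / COUNT(*) AS DailyTally  -- integer percent, e.g. 33
FROM dbo.Vote
WHERE VoteDate >= DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)                   -- votes since midnight today
GROUP BY BikeID;
Weekly and monthly tallies are the same query with a wider date window (or a GROUP BY on DATEPART), and at a few thousand votes per day this aggregates effectively instantly.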
I agree with @JNK: try a view or just a normal stored proc to calculate the outputs on the fly. If you find it becomes too slow as your data grows, I would investigate other routes then (like caching the data in another table, etc.). It's probably worth keeping it simple to start with; you can always reuse the logic from the SP/view later if you do want to set up a scheduled task.
Edit:
Removed the indexed view as per @Damien_The_Unbeliever's comments; it's not deterministic and I'm stupid :)
First of all, I am new to optimizing MySQL. The fact is that in my web application (around 400 queries per second) I have a query that uses a GROUP BY that I can't avoid, and it is the cause of temporary tables being created. My configuration was:
max_heap_table_size = 16M
tmp_table_size = 32M
The result: about 12.5% of temporary tables went to disk.
Then I changed my settings, according to this post
max_heap_table_size = 128M
tmp_table_size = 128M
The result: about 18% of temporary tables went to disk.
The results were not what I expected, and I don't understand why.
Is it wrong to set tmp_table_size = max_heap_table_size?
Shouldn't increasing the sizes have helped?
Query
SELECT images, id
FROM classifieds_ads
WHERE parent_category = '1' AND published='1' AND outdated='0'
GROUP BY aux_order
ORDER BY date_lastmodified DESC
LIMIT 0, 100;
EXPLAIN
id: 1
select_type: SIMPLE
table: classifieds_ads
type: ref
possible_keys: parent_category, published, combined_parent_oudated_published, oudated
key: combined_parent_oudated_published
key_len: 7
ref: const,const,const
rows: 67552
Extra: Using where; Using temporary; Using filesort
"Using temporary" in the EXPLAIN report does not tell us that the temp table was on disk. It only tells us that the query expects to create a temp table.
The temp table will stay in memory if its size is less than tmp_table_size and less than max_heap_table_size.
Max_heap_table_size is the largest a table can be in the MEMORY storage engine, whether that table is a temp table or non-temp table.
Tmp_table_size is the largest a table can be in memory when it is created automatically by a query. But this can't be larger than max_heap_table_size anyway. So there's no benefit to setting tmp_table_size greater than max_heap_table_size. It's common to set these two config variables to the same value.
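For example, to raise both limits together at runtime you could do something like the following (64M is purely an illustration, not a recommendation; put the same values in my.cnf so they survive a restart, and note that sessions opened before the change keep their old values):
SET GLOBAL tmp_table_size = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;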
You can monitor how many temp tables were created, and how many on disk like this:
mysql> show global status like 'Created%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 20 |
| Created_tmp_files | 6 |
| Created_tmp_tables | 43 |
+-------------------------+-------+
Note in this example, 43 temp tables were created, but only 20 of those were on disk.
When you increase the limits of tmp_table_size and max_heap_table_size, you allow larger temp tables to exist in memory.
You may ask, how large do you need to make it? You don't necessarily need to make it large enough for every single temp table to fit in memory. You might want 95% of your temp tables to fit in memory and only the remaining rare tables go on disk. Those last 5% might be very large -- a lot larger than the amount of memory you want to use for that.
So my practice is to increase tmp_table_size and max_heap_table_size conservatively. Then watch the ratio of Created_tmp_disk_tables to Created_tmp_tables to see if I have met my goal of making 95% of them stay in memory (or whatever ratio I want to see).
Unfortunately, MySQL doesn't have a good way to tell you exactly how large the temp tables were. That will vary per query, so the status variables can't show that, they can only show you a count of how many times it has occurred. And EXPLAIN doesn't actually execute the query so it can't predict exactly how much data it will match.
An alternative is Percona Server, which is a distribution of MySQL with improvements. One of these is to log extra information in the slow-query log. Included in the extra fields is the size of any temp tables created by a given query.
For ten years we've been using the same custom sorting on our tables, and I'm wondering if there is another solution that involves fewer updates, especially since today we'd like to have a replication/publication date and wouldn't want our replication to replicate unnecessary entries. I had a look into nested sets, but they don't seem to do the job for us.
Base table:
id | a_sort
---+-------
1 10
2 20
3 30
After inserting an entry that should end up at the second position:
insert into table (a_sort) values(15)
the table looks like this:
id | a_sort
---+-------
1 10
2 20
3 30
4 15
Ordering the table with:
select * from table order by a_sort
and then resorting all the a_sort entries, updating at least ids 2, 3 and 4,
will of course produce the desired output:
id | a_sort
---+-------
1 10
4 20
2 30
3 40
The column names, the column count, datatypes, a possible join, possible triggers, and the way the resorting is done are irrelevant to the problem. We've also found some pretty neat ways to do this task fast.
The only question is: how the heck can we reduce the updates in the DB to 1 or 2 max?
Seems like an awfully common problem.
The captain obvious in me once thought "use an a_sort float(53), insert using a fixed value of ordervaluefirstentry+abs(ordervaluefirstentry-ordervaluenextentry)/2".
But this would only allow around 1040 "in between" entries - so never resorting seems a bit problematic ;)
You really didn't describe what you're doing with this data, so forgive me if this is a crazy idea for your situation:
You could make a sort of 'linked list' where instead of a column of values, you have a column for the 'next highest valued' id. This would decrease the number of updates to a maximum of 2.
You can make it doubly linked and also have a column for next lowest, which would bring the maximum number of updates to 3.
See:
http://en.wikipedia.org/wiki/Linked_list
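A minimal sketch of the singly linked variant in SQL (the table and column names below are invented; the question's schema only has id and a_sort): each row stores the id of the row that follows it in sort order, so an insert in the middle costs one INSERT plus one UPDATE.
create table items (id int primary key, next_id int null);        -- next_id points at the following row
insert into items (id, next_id) values (1, 2), (2, 3), (3, null); -- initial order: 1, 2, 3
insert into items (id, next_id) values (4, 2);                    -- new row goes between 1 and 2
update items set next_id = 4 where id = 1;                        -- the single extra update
The trade-off is on the read side: getting the rows back in sorted order now means walking the chain from the head row (or using a recursive query), rather than a plain ORDER BY a_sort.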