Redis conditional retrieve and delete records - redis

We are using MySQL as a message queue, and now we want to move from MySQL to Redis.
We are facing a few difficulties implementing the same MySQL logic in Redis.
In MySQL the process is as follows:
Bulk insert into the MySQL database using LOAD DATA INFILE.
In another PHP script we select the records ordered by priority with conditions, delete those records from the database, and process them.
How can we achieve the same in Redis?
In Redis we are able to insert bulk data using a pipe, with LPUSH on a key and JSON-encoded data.
How can we get the data from a Redis key ordered by priority with some conditions, and then delete those records from Redis?
We have table structure in mysql as below:
CREATE TABLE `message_queue` (
`sql_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`msgdata` text,
`pid` tinyint(4) DEFAULT NULL,
`receiver` varchar(20) DEFAULT NULL,
`time` bigint(20) DEFAULT '0',
`udhdata` varchar(100) DEFAULT NULL,
PRIMARY KEY (`sql_id`),
KEY `pid` (`pid`),
KEY `time` (`time`)
) ENGINE=MyISAM
SELECT and DELETE queries:
SELECT * FROM message_queue WHERE (time = 0 OR time <= UNIX_TIMESTAMP()) ORDER BY pid DESC, sql_id ASC limit 500;
DELETE FROM message_queue WHERE sql_id in(ids_list)

To satisfy most of the requirements you'll need to decompose your data into several data structures.
Insert:
Every record from your message_queue table should be stored in a Hash - sql_id looks like a good key name candidate.
Keep a Sorted Set, e.g. message_by_time where each member is the sql_id and the score is time.
Query:
Use ZRANGEBYSCORE message_by_time 0 0 and ZRANGEBYSCORE message_by_time -inf <replace-with-timestamp> to get the initial range.
Delete:
Call DEL and ZREM for each element.
Your requirements also specify the need to sort and limit - these can also be done in Redis, but I recommend handling them in the code instead.
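A minimal command-level sketch of the above, assuming hash keys named msg:<sql_id> and the field layout from the message_queue table (the key naming is only illustrative; HSET with multiple field/value pairs needs Redis 4.0+, on older servers use HMSET):
Insert (pipeline both commands per record):
HSET msg:1001 msgdata "<json-encoded message>" pid 3 receiver "<receiver>" time 0 udhdata ""
ZADD message_by_time 0 1001
Query (a single range from -inf to now also covers time = 0, since 0 <= the current timestamp; LIMIT caps the batch at 500):
ZRANGEBYSCORE message_by_time -inf <replace-with-timestamp> LIMIT 0 500
Delete each processed sql_id from both structures:
ZREM message_by_time 1001
DEL msg:1001
The pid-based ordering of the fetched batch can then be done in PHP, as suggested above.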

Related

How to generate a unique numeric ID in SQL Server (not using identity)?

I need a unique number id for my table. Usually I would use Identity in Sql Server, but there is a catch to my use case. I would like to know the id before the row is created (to be able to reference it in other records in memory, before committing everything to the database).
I don't know whether this is possible with Identity; I could not figure it out.
So my next best guess is that I need a table that stores one value, keeps incrementing it, and returns a new value for the id. Access would have to be locked so that no two operations can get the same value.
I am thinking of using e.g. sp_getapplock @Resource = 'MyUniqueId' to prevent the same number from being returned to two callers. Perhaps I could use ordinary locking in transactions for that as well.
Is there any better approach to the problem?
You can create a SEQUENCE object that produces incrementing values. A SEQUENCE can be used independently or as a default value for one or more tables.
You can create a sequence with CREATE SEQUENCE:
CREATE SEQUENCE Audit.EventCounter
AS int
START WITH 1
INCREMENT BY 1 ;
You can retrieve the next value atomically with NEXT VALUE FOR and use it in multiple statements, e.g.:
DECLARE @NextID int;
SET @NextID = NEXT VALUE FOR Audit.EventCounter;
Rolling back a transaction doesn't affect a SEQUENCE. From the docs:
Sequence numbers are generated outside the scope of the current transaction. They are consumed whether the transaction using the sequence number is committed or rolled back.
You can use NEXT VALUE FOR as a default in multiple tables. In the documentation example, three different types of event table use the same SEQUENCE allowing all events to get unique numbers:
CREATE TABLE Audit.ProcessEvents
(
EventID int PRIMARY KEY CLUSTERED
DEFAULT (NEXT VALUE FOR Audit.EventCounter),
EventTime datetime NOT NULL DEFAULT (getdate()),
EventCode nvarchar(5) NOT NULL,
Description nvarchar(300) NULL
) ;
GO
CREATE TABLE Audit.ErrorEvents
(
EventID int PRIMARY KEY CLUSTERED
DEFAULT (NEXT VALUE FOR Audit.EventCounter),
EventTime datetime NOT NULL DEFAULT (getdate()),
EquipmentID int NULL,
ErrorNumber int NOT NULL,
EventDesc nvarchar(256) NULL
) ;
GO
CREATE TABLE Audit.StartStopEvents
(
EventID int PRIMARY KEY CLUSTERED
DEFAULT (NEXT VALUE FOR Audit.EventCounter),
EventTime datetime NOT NULL DEFAULT (getdate()),
EquipmentID int NOT NULL,
StartOrStop bit NOT NULL
) ;
GO
One option here would be to use a UUID to represent each unique record. Should you want to generate the UUID within SQL Server, you could use the NEWID() function (see the documentation for more information). If the value is generated in your application code, you can convert it to the uniqueidentifier type within SQL Server using CONVERT.
For reference, a UUID is a 16-byte unique identifier. It is extremely unlikely that your application or SQL Server would ever generate the same UUID more than once. They look like this:
773c1570-1076-4e19-b728-6d7b0b20895a
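As a quick illustrative sketch (the table and column names below are made up, not from the question), the id can be generated up front and referenced before the insert:
DECLARE @NewId uniqueidentifier = NEWID();
-- @NewId is known before the row exists, so related in-memory records can reference it
INSERT INTO dbo.MyRecords (Id, Name) VALUES (@NewId, N'example');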
If you want behaviour that matches that of an IDENTITY column, try:
CREATE SEQUENCE dbo.mysequence;
And then, repeatedly:
SELECT NEXT VALUE FOR dbo.mysequence;
And, if you want to play with it some more, see here:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-ver15
happy playing ...

Postgres : Deadlock with 2 update queries

I am trying to wrap my head around this deadlock issue in our production environment, and now I really need some help.
PostgreSQL 9.5.10
Query 1: (Updating 1000 records)
update entitlements.stream_general sg
set stream_offset_id =nextval( 'entitlements.stream_general_stream_offset_id_seq' ),
should_update_offset_id = false
from (select id, topic, stream_id from entitlements.stream_general where should_update_offset_id = true limit 1000) sg2
where sg.id=sg2.id and sg.topic=sg2.topic and sg.stream_id = sg2.stream_id
Query 2: (Updating a single record)
update entitlements.stream_general set stream_action = $1::entitlements.stream_action_type, update_dt = now(), should_update_offset_id = true where stream_id = $2 and topic = $3 and id = $4
Exception :
Process 60563 waits for ShareLock on transaction 3603536083; blocked
by process 60701. Process 60701 waits for ShareLock on transaction
3603536039; blocked by process 60563.
Since there are only two transactions involved in the deadlocked processes, how can one UPDATE be in a deadlock with another UPDATE? According to my understanding, after the first UPDATE there will be row-level exclusive locks on all those rows, and the second UPDATE should simply block. How can there be a DEADLOCK?
stream_general table schema :
CREATE TABLE entitlements.stream_general (
stream_id int4 NOT NULL,
id varchar NOT NULL,
topic varchar NOT NULL,
stream_offset_id int8 NOT NULL DEFAULT '-1'::integer,
create_dt timestamptz NOT NULL DEFAULT now(),
update_dt timestamptz NOT NULL DEFAULT now(),
stream_action stream_action_type NOT NULL,
should_update_offset_id bool NULL,
PRIMARY KEY (stream_id, topic, id),
FOREIGN KEY (stream_id) REFERENCES entitlements.stream(stream_id) ON DELETE CASCADE
)
WITH (
OIDS=FALSE
) ;
CREATE INDEX stream_general_id_idx ON entitlements.stream_general USING btree (id, topic) ;
CREATE INDEX stream_general_should_update_offset_id_index ON entitlements.stream_general USING btree (should_update_offset_id) ;
CREATE INDEX stream_general_stream_id_idx ON entitlements.stream_general USING btree (stream_id, topic, stream_offset_id) ;
Note : stream_id is the foreign key.
The only culprit I can think of is the subquery in Query 1, but I am not able to figure out how that SELECT could be problematic. Or maybe something is up with the foreign key constraints.
The first query takes only read locks in its subquery and acquires the row-level write locks later, in the outer UPDATE. Could that cause a deadlock when the second query is also waiting for a write lock?
Try FOR UPDATE so the subquery acquires the write locks up front:
update ...
from (
select ...
from ...
FOR UPDATE
) sg2
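Applied to the statement from the question, that would look roughly like this (the same query, with FOR UPDATE added inside the subquery so the selected rows are write-locked up front):
update entitlements.stream_general sg
set stream_offset_id = nextval('entitlements.stream_general_stream_offset_id_seq'),
    should_update_offset_id = false
from (select id, topic, stream_id
      from entitlements.stream_general
      where should_update_offset_id = true
      limit 1000
      for update) sg2
where sg.id = sg2.id and sg.topic = sg2.topic and sg.stream_id = sg2.stream_id;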

MySQL query slow when selecting VARCHAR

I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?
The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in the base table are not in the same physical order as the rows in the index, so the read of the base table can involve considerable disk-thrashing.
You can think of the first query as a kind of optimization at work -- the disk-thrashing is avoided because the engine is smart enough to take the values requested in the SELECT clause straight from the index; it has already read that index into RAM for the WHERE clause, so it takes advantage of that fact.
In addition to Tim's answer: an index in MySQL can only be used left-to-right, which means it can use the columns of your index in your WHERE clause only up to the first one you skip.
Currently, your UNIQUE index is keyword_id, search_engine_id, rank, indexed_at. It can filter on the columns keyword_id and search_engine_id, but it still has to scan the remaining rows to filter on indexed_at.
If you change it to keyword_id, search_engine_id, indexed_at, rank (only the order changes), it will be able to filter on keyword_id, search_engine_id and indexed_at.
I believe it will then be able to use that index fully to read the appropriate part of your table.
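A sketch of that reordering, using the table and index names from the question (dropping and re-adding a unique index on a 250M-row table rebuilds it, so plan for the runtime and test first):
ALTER TABLE `search_engine_rankings`
  DROP INDEX `unique_ranking`,
  ADD UNIQUE KEY `unique_ranking` (`keyword_id`, `search_engine_id`, `indexed_at`, `rank`);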
I know it's an old post, but I was experiencing the same situation and couldn't find an answer.
This really happens in MySQL: selecting varchar columns can add a lot of processing time. My query took about 20 seconds to process 1.7M rows and now takes about 1.9 seconds.
OK, first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, same query but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;
TL;DR: I solved this by running OPTIMIZE TABLE on the table.
I experienced the same just now. Even lookups on the primary key selecting just a few rows were slow. Testing a bit, I found it was not limited to the varchar column; selecting an int also took a considerable amount of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The accepted answer here is to just make an index spanning someint as well, and it will be fast, as MySQL can fetch all the information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround - something is clearly wrong; it should not take three seconds to fetch three rows from a table! Besides, most applications just do a "select * from mytable", and making changes on the application side is not always trivial.
After OPTIMIZE TABLE, both queries take <10ms.
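For reference, the statement is simply (mytable being the placeholder table name used above):
OPTIMIZE TABLE mytable;
On InnoDB this maps to a table rebuild plus an analyze, which recreates the indexes and refreshes the index statistics.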

Mysql concurrent select and insert slow the database

I have a large MySQL table (about 5M rows) into which I frequently insert data.
This is the same table I have to read data from, and sometimes the entire database gets slow because of selects running while there are many pending inserts.
I put indexes on each field I use in the WHERE clause, so I really don't know why the select gets so slow.
Could anyone provide me a hint to solve this problem?
Here is the SQL of the table and the query:
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_id` int(10) unsigned NOT NULL default '0',
`dest` varchar(20) character set latin1 default NULL,
`body` text character set latin1,
`sent_on` timestamp NOT NULL default CURRENT_TIMESTAMP,
`md5` varchar(32) character set latin1 NOT NULL default '',
`interface` enum('mobile','desktop') default NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `md5` (`md5`),
FULLTEXT KEY `dest` (`dest`,`body`),
FULLTEXT KEY `body` (`body`)
) ENGINE=MyISAM AUTO_INCREMENT=7074256 DEFAULT CHARSET=utf8
and here the query:
EXPLAIN SELECT SQL_CALC_FOUND_ROWS id, sent_on, dest AS who, body,interface FROM messages WHERE user_id = 2 ORDER BY sent_on DESC LIMIT 0,50 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: messages
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 13997
Extra: Using where; Using filesort
1 row in set (0.00 sec)
Note the following in your EXPLAIN output:
Extra: Using where; Using filesort
Using filesort means that MySQL cannot use an index to return the rows in the requested order, so it has to sort the matching rows itself (in memory or in a temporary file) before it can return the top 50.
While I'm no expert, I think you could optimize this by providing an index that satisfies both the selection criteria and the sort order in one go; then the selection and ordering can be determined by an index scan alone, without having to sort the result set every time.
In this case, your WHERE is on user_id and your ORDER BY is on sent_on. So, in theory, if you provide a single index on those two columns (in that order), the engine will be able to use the first part of the index to filter the results, and because the second part of the index is on the sent_on column, the index entries will already be in order by that column, allowing MySQL to simply retrieve the first 50 results from the index. No additional sorting required.
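A sketch of such an index, using the table and column names from the question:
ALTER TABLE `messages` ADD INDEX `user_id_sent_on` (`user_id`, `sent_on`);
With that index in place, the WHERE user_id = 2 filter and the ORDER BY sent_on DESC can both be served by the index (MySQL can scan it in reverse), so the filesort should disappear from the EXPLAIN output.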
Disclaimer: I'm not a DBA. I may be completely wrong.
See Also: Mysql.com: Multiple Column Indexes
Maybe you have disabled Concurrent Inserts?
Could the ORDER BY be slowing you down? I don't know if it's a good idea to index sent_on; it would depend on the SELECT vs INSERT frequency.

MySQL 1 millon row query speed

I'm having trouble getting a decent query time out of a large MySQL table; currently it's taking over 20 seconds. The problem lies in the GROUP BY, as MySQL needs to run a filesort, but I don't see how I can get around this.
QUERY:
SELECT play_date, COUNT(DISTINCT(email)) AS count
FROM log
WHERE type = 'play'
AND play_date BETWEEN '2009-02-23'
AND '2009-02-24'
GROUP BY play_date
ORDER BY play_date desc
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL type,type_2 NULL NULL NULL 530892 Using where; Using filesort
TABLE STRUCTURE
CREATE TABLE IF NOT EXISTS `log` (
`id` int(11) NOT NULL auto_increment,
`email` varchar(255) NOT NULL,
`type` enum('played','reg','friend') NOT NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
`play_date` date NOT NULL,
`email_refer` varchar(255) NOT NULL,
`remote_addr` varchar(15) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `type` (`type`),
KEY `email_refer` (`email_refer`),
KEY `type_2` (`type`,`timestamp`,`play_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=707859 ;
If anyone knows how I could improve the speed I would be very grateful.
Tom
EDIT
I've added the new index with just play_date and type, but MySQL refuses to use it:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL play_date NULL NULL NULL 801647 Using where; Using filesort
This index was created using ALTER TABLE log ADD INDEX (type, play_date);
You need to create an index on the fields type AND play_date.
Like this:
ALTER TABLE `log` ADD INDEX (`type`, `play_date`);
Or, alternatively, you can rearrange your last key like this:
KEY `type_2` (`type`,`play_date`,`timestamp`)
so MySQL can use its left part as a key.
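If you go the rearranging route, a sketch of the change using the names from the question (this rebuilds the index):
ALTER TABLE `log` DROP INDEX `type_2`, ADD INDEX `type_2` (`type`, `play_date`, `timestamp`);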
You should add an index on the fields that you base your search on.
In your case, that's play_date and type.
You're not taking advantage of the key named type_2. It is a composite key on type, timestamp and play_date, but you're filtering by type and play_date, skipping timestamp. Because of that gap, the engine can only use the type part of that key.
You should create an index on the fields type and play_date, or remove timestamp from the key type_2.
Or you could try to incorporate timestamp into your current query as a filter. But judging from your current query I don't think that is logical.
Does there need to be an index on play_date, or should its position in the composite index be moved to second place?
The fastest option would be this:
ALTER TABLE `log` ADD INDEX (`type`, `play_date`, `email`);
It would turn this index into a "covering index", which means the query can be answered from the index alone without even having to go to the table rows on disk.
The DESC parameter is causing MySQL not to use the index for the ORDER BY. You can leave it ASC and iterate the resultset in reverse on the client side (?).