Suppose I am using HMSET myhash field1 "Hello" field2 "World" to set two fields in my Redis instance, and someone else is using HGETALL myhash to read all the fields. In this case, is there a chance they get only field1? Or can the result only be either null or both field1 and field2?
Redis guarantees that both HMSET and HGETALL are executed atomically, so a concurrent HGETALL will return either a null (empty) reply, if the hash does not exist yet, or all the fields of myhash; it can never see only field1.
A table has a primary key Id and an integer field thefield, whose value is currently 100.
Two queries run at the same time on different connections, without an explicit "LOCK ROW" statement:
UPDATE TABLE SET thefield = thefield + 50 WHERE id = 12345
and
UPDATE TABLE SET thefield = thefield - 30 WHERE id = 12345
Does it matter whether I explicitly lock the row for the update?
I don't think it does. I think the result will always be 120 regardless of which query gets there first or if they both hit at the same second. I am pretty sure this gets evaluated as an atomic operation.
Or does it depend on the vendor and table engine?
Is there a potential for a collision where thefield is out of sync before the SET part of the query increments the value?
I thought that was the way it worked, and these will always have a consistent result. Am I a dummy who is that much out of practice?
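To make the comparison concrete, here is a sketch of the two forms in question; the table name thetable is a placeholder, and the explicit-lock variant assumes an engine with row-level locking and SELECT ... FOR UPDATE support (e.g. InnoDB):
-- Single-statement form: the increment is evaluated inside one UPDATE statement.
UPDATE thetable SET thefield = thefield + 50 WHERE id = 12345;

-- Explicit-lock form: the same change spelled out as read-then-write inside a transaction,
-- with SELECT ... FOR UPDATE holding a row lock between the read and the write.
START TRANSACTION;
SELECT thefield FROM thetable WHERE id = 12345 FOR UPDATE;
UPDATE thetable SET thefield = thefield + 50 WHERE id = 12345;
COMMIT;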
Is it good practice to sort an internal table (which is already sorted by three fields) by one field for the purpose of a READ TABLE with BINARY SEARCH?
for example:
SORT itab by field1 field2 field3.
READ TABLE itab WITH KEY field1 = 'X'
field2 = 'Y'
field3 = 'Z' BINARY SEARCH.
SORT itab by field1.
READ TABLE itab WITH KEY field1 = 'X' BINARY SEARCH.
Is it okay if I sort the internal table once again by field1 to serve the second READ TABLE statement?
It is very bad practice.
You would get the same result by leaving out the second SORT, but much faster, as the internal table is already in the right order:
SORT itab by field1 field2 field3.
READ TABLE itab WITH KEY field1 = 'X'
field2 = 'Y'
field3 = 'Z' BINARY SEARCH.
READ TABLE itab WITH KEY field1 = 'X' BINARY SEARCH.
Even if the second READ TABLE is by field2, you should leave out the SORT (and of course the BINARY SEARCH too).
Scanning a table from start to finish is linear in the number of lines, but SORT + BINARY SEARCH costs at least n + log(n) even in the best case (the SORT alone is typically n·log(n)), so re-sorting just to do one read can never beat the plain linear scan.
It is a bad practice to sort a table to only read one record.
The effort needed for the sort and the read combined is always more than the effort needed to do the one read on an unsorted table.
I have to update about 300 rows in a large table (600m rows) and I'm trying to make it faster.
The query I am using is a bit tricky:
UPDATE my_table
SET name = CASE WHEN (event_name in ('event_1', 'event_2', 'event_3'))
THEN 'deleted' ELSE name END
WHERE uid IN ('id_1', 'id_2')
I tried to use EXPLAIN on this query and I got:
XN Seq Scan on my_table (cost=0.00..103935.76 rows=4326 width=9838)
Filter: (((uid)::text = 'id_1'::text) OR ((uid)::text = 'id_2'::text))
I have an interleaved sortkey, and uid is one of the columns included in this sortkey.
The reason the query looks like this is that in the real context the number of columns in SET (along with name) might vary, but it probably won't be more than 10.
The basic idea is that I don't want a cross join (the update rules are specific to each column, and I don't want to mix them together).
For example in future there will be a query like:
UPDATE my_table
SET name = CASE WHEN (event_name in ('event_1', 'event_2', 'event_3')) THEN 'deleted' ELSE name END,
    address = CASE WHEN (event_name in ('event_1', 'event_4')) THEN 'deleted' ELSE address END
WHERE uid IN ('id_1', 'id_2')
Anyway, back to the first query, it runs for a very long time (about 45 minutes) and takes 100% CPU.
I tried to check an even simpler query:
explain UPDATE my_table SET name = 'deleted' WHERE uid IN ('id_1', 'id_2')
XN Seq Scan on my_table (cost=0.00..103816.80 rows=4326 width=9821)
Filter: (((uid)::text = 'id_1'::text) OR ((uid)::text = 'id_2'::text))
I don't know what else I can add to the question to make it more clear, would be happy to hear any advice.
Have you tried removing the interleaved sort key and replacing it with a simple sort key on uid or a compound sort key with uid as the first column?
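If you go the compound-sort-key route, a rough sketch of a deep copy could look like the following; the column names and types here are placeholders, since the real table definition isn't shown:
-- Recreate the table with a compound sort key led by uid, copy the data, then swap names.
CREATE TABLE my_table_new (
    uid        varchar(64),
    event_name varchar(64),
    name       varchar(256)
    -- ... remaining columns ...
)
COMPOUND SORTKEY (uid);

INSERT INTO my_table_new SELECT * FROM my_table;
ALTER TABLE my_table RENAME TO my_table_old;
ALTER TABLE my_table_new RENAME TO my_table;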
Also, the name uid makes me think that you may be using a GUID/UUID as the value. I would suggest that this is an anti-pattern for an id value in Redshift, and especially for a sort key.
Problems with GUID/UUID ids:
They do not occur in a predictable sequence
They often trigger a full sequential scan
New rows always disrupt the sort
They compress very poorly
They require more disk space for storage
They require more data to be read when queried
An UPDATE in Redshift is a delete followed by an insert. By design, Redshift only marks the deleted rows as deleted instead of removing them physically (ghost rows). An explicit VACUUM DELETE ONLY <table_name> is required to reclaim the space.
The sequential scan is affected by these ghost rows. I would suggest running the command above and then checking the performance of the query again.
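For example, with the table name from the question:
-- Reclaim the space held by ghost rows (DELETE ONLY skips the re-sort phase), then re-test the UPDATE.
VACUUM DELETE ONLY my_table;
-- Optionally refresh the table statistics afterwards.
ANALYZE my_table;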
I want to delete a row in my database, and I have two options: first, to use a normal column to identify the row; second, to use the primary key.
I know that the primary key is better, but why?
On MySQL you can face strange locking behaviour in a multiuser environment when deleting/updating rows using non-primary-key columns.
Here is an example - two sessions trying to delete rows (autocommit is disabled).
C:\mysql\bin>mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.29-log MySQL Community Server (GPL)
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
mysql> create table test(
-> id int primary key,
-> val int
-> );
Query OK, 0 rows affected (0.02 sec)
......
mysql> select * from test;
+----+------+
| id | val |
+----+------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+----+------+
6 rows in set (0.00 sec)
Now in session 1 we will delete row #5 using the primary key:
mysql> delete from test where id = 5;
Query OK, 1 row affected (0.00 sec)
and then in session 2 we delete row #2, also using the PK:
mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from test where id = 2;
Query OK, 1 row affected (0.00 sec)
Everything looks OK: row #5 was deleted by session 1 and row #2 was deleted by session 2.
And now look at what happens when we try to delete rows using a non-primary-key column.
Session 1
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from test where val = 5;
Query OK, 1 row affected (0.00 sec)
and session 2
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from test where val = 2;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql>
The delete command in session 2 "hangs", and after a minute or so it throws an error: Lock wait timeout exceeded.
Let's try to delete other rows:
mysql> delete from test where val = 4;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql> delete from test where val = 6;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql>
Session 1 deletes only row #5, and, logically, a lock should be placed only on the record #5 being deleted; but as you can see in these examples, when not using the primary key, MySQL placed locks on all rows of the table. So it is safer to delete rows using only the primary key (at least on MySQL).
The primary key is better because you are sure which row you are deleting: although technically you can update a primary key column, it is not normal practice to do so. Other columns, however, are changeable, which could lead to situations like this:
You have a table with a PK and another unique identifier, say, email
You read a row with email sample_email@gmail.com, and decide to delete it
The row gets modified concurrently, with the e-mail updated to simple_email@gmail.com
You execute DELETE FROM users WHERE email = 'sample_email@gmail.com'
The DELETE command does not delete anything, because the e-mail was changed before you managed to run your command. Since the PK is not supposed to change, this situation would not be possible under normal circumstances. Of course your code can detect that the deletion did not happen, redo the read, and re-issue the command, but that is a lot of work compared to using the primary key.
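A minimal sketch of the safer pattern, with made-up table and column names following the e-mail example above:
-- Look the row up by the changeable attribute, but delete by the immutable primary key.
SELECT id FROM users WHERE email = 'sample_email@gmail.com';

-- Suppose the SELECT returned id = 42: the delete now targets exactly the row that was read.
DELETE FROM users WHERE id = 42;

-- If the e-mail was changed concurrently, the SELECT finds no row, and the code
-- can redo the read instead of issuing a DELETE that silently affects nothing.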
I assume that you mean a statement like this:
delete from table
where column = value
When the column is a primary key, it automatically has a unique index on it (at least in the databases I know of). This makes it fast to find the record to be deleted.
Another column with an index would be almost as fast, because it could use an index lookup.
A column without an index would be much slower, because the query would have to do a full table scan.
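For example (all names here are hypothetical), the difference between the cases looks like this:
-- Primary key: the lookup uses the PK's unique index, so the row is found directly.
DELETE FROM orders WHERE order_id = 42;

-- Non-indexed column: every row must be scanned to evaluate the predicate.
DELETE FROM orders WHERE customer_ref = 'C-1001';

-- Adding an index lets the second form use an index lookup as well.
CREATE INDEX ix_orders_customer_ref ON orders (customer_ref);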
Rows are uniquely identified by superkeys, including candidate keys. The primary key is one candidate key, but not necessarily the only one.
There's no fundamental reason why the primary key must always be a "better" way to specify an update, delete or other operation. Use whichever attributes best express the intended update, especially bearing in mind that some attribute values may be subject to change. Very often a primary key is chosen as such because it is a candidate key thought unlikely to change or alternatively because it is the "preferred" identifier for updates.
Suppose a table has two keys: j and k, where k is designated the primary key. If a user actually wanted to perform a delete based on the value of j (DELETE ... WHERE j=123;), then depending on the effective transaction isolation level and the stability of either attribute, that delete could express a quite different intention from a similar delete based on the corresponding value of k at some point in time. This is true whichever attribute is subject to change. So if you are concerned about the effect of changing key values, you should consider which choice of key(s) best expresses the user's intended operation. Assuming candidate key value changes are rare anyway, using the designated primary key for all updates is typically the "default" assumption, because consistently using the same key makes coding simpler.
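As a concrete sketch (the table and values are invented): k is the designated primary key, j is another candidate key, and the two deletes below can express different intentions if either value is ever reassigned:
CREATE TABLE t (
    k int PRIMARY KEY,       -- designated primary key
    j int NOT NULL UNIQUE    -- another candidate key
);

-- "Delete whichever row currently has j = 123": follows j if it is reassigned.
DELETE FROM t WHERE j = 123;

-- "Delete the row identified by k = 7": stable if k, by convention, never changes.
DELETE FROM t WHERE k = 7;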
My schema has 3 columns: ID1, ID2, Status
Each of the above columns is a string.
I would like to create a constraint which is the following:
There cannot be multiple records which have the same ID1 and ID2 which are in the 'UNPROCESSED' state. It is ok if there are multiple records which have the same ID1 and ID2 which are not in the UNPROCESSED state.
Is it possible to do this in SQL Server?
Assuming you're using SQL Server 2008 or later, you can apply a filtered index:
CREATE UNIQUE INDEX UQ_Unprocessed_IDs ON UnnamedTable (ID1,ID2)
WHERE (Status='Unprocessed')
I don't believe you can do that with a constraint. You need to implement a trigger on insert/update operations. The problem with SQL Server is that triggers are 'AFTER' triggers; there's no such thing as a 'BEFORE' trigger (though there is an 'INSTEAD OF' trigger type).
Hence, you need to do all the work to perform the transaction, vet it and roll it back if the constraint fails, rather than simply checking to see if the transaction would cause the constraint to be violated.
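A sketch of what such an AFTER trigger could look like; the table and column names follow the question, and THROW assumes SQL Server 2012 or later (use RAISERROR on older versions):
CREATE TRIGGER trg_UnnamedTable_Unprocessed
ON UnnamedTable
AFTER INSERT, UPDATE
AS
BEGIN
    -- Vet the data after the fact and undo the statement if the rule is violated.
    IF EXISTS (
        SELECT ID1, ID2
        FROM UnnamedTable
        WHERE Status = 'UNPROCESSED'
        GROUP BY ID1, ID2
        HAVING COUNT(*) > 1
    )
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'More than one UNPROCESSED row for the same ID1/ID2 pair.', 1;
    END
END;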
I would do something like having a column called ProcessId (instead of Status) and assigning values based on whether the ID1 and ID2 pair has been processed or not. To clarify, see below:
[ID1, ID2, ProcessId]
SomeId1, SomeId2, -1
SomeOtherId1, SomeOtherId2, 100304
So any unprocessed set of IDs will always have a ProcessId of -1, blocking any duplicates, and any PROCESSED set of IDs will be assigned some sort of sequential number, allowing duplicates. Make sense?
So, continuing with my example above, if the record set came in again unprocessed, it would have a ProcessId of -1 and cause a PK violation.
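In other words, the uniqueness is enforced by making ProcessId part of the primary key; a sketch with assumed column types:
CREATE TABLE UnnamedTable (
    ID1       varchar(50) NOT NULL,
    ID2       varchar(50) NOT NULL,
    ProcessId int         NOT NULL,  -- -1 = unprocessed, otherwise a unique process number
    CONSTRAINT PK_UnnamedTable PRIMARY KEY (ID1, ID2, ProcessId)
);
-- Two unprocessed rows with the same ID1/ID2 both carry ProcessId = -1 and violate the PK;
-- processed rows get distinct ProcessId values, so repeated (ID1, ID2) pairs are allowed.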
You can do this using a view that is a union of two underlying tables, one for processed records and one for unprocessed records.
However, the view itself cannot be made updatable, and if records ever change from processed back to unprocessed, and do so very often, this will perform poorly.
It will also make your thinking about parallel processing scenarios more complex.
Both of the limitations above are because you are replacing some updates by delete+insert.
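A rough sketch of that layout (all names invented); the uniqueness rule only needs to live on the unprocessed table:
CREATE TABLE ProcessedRecords (
    ID1 varchar(50) NOT NULL,
    ID2 varchar(50) NOT NULL
    -- duplicates of (ID1, ID2) are allowed here
);

CREATE TABLE UnprocessedRecords (
    ID1 varchar(50) NOT NULL,
    ID2 varchar(50) NOT NULL,
    CONSTRAINT UQ_Unprocessed UNIQUE (ID1, ID2)
);
GO

CREATE VIEW AllRecords AS
    SELECT ID1, ID2, 'PROCESSED'   AS Status FROM ProcessedRecords
    UNION ALL
    SELECT ID1, ID2, 'UNPROCESSED' AS Status FROM UnprocessedRecords;
Moving a record between the two states then becomes a DELETE from one table plus an INSERT into the other, which is the delete+insert the limitations above refer to.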
A filtered index is typically a better solution if you have at least SQL Server 2008.