How to pick transaction isolation levels? - sql

I have a table in a database that is responsible for storing ordered/reorderable lists. It has the following shape:
| id | listId | index | title | ... |
where id is the primary key, listId is a foreign key that identifies which list the item belongs to, and title and the other columns hold the item's contents. The index column is the position of the item in its list: an integer counter (starting at 0) that is unique within a list but may repeat across lists. Example data:
| id | listId | index | title | ...
---------------------------------------------
| "item1" | "list1" | 0 | "title1" | ...
| "item2" | "list1" | 1 | "title2" | ...
| "item3" | "list1" | 2 | "title3" | ...
| "item4" | "list2" | 0 | "title4" | ...
| "item5" | "list2" | 1 | "title5" | ...
Users can create/delete items, move them inside the list or across lists.
To ensure consistency of indexes when running these operations, I do the following:
Create item:
Count items within this list
SELECT COUNT(DISTINCT "Item"."id") as "cnt"
FROM "item" "Item"
WHERE "Item"."listId" = ${listId}
Insert new item, with index set to count from step 1:
INSERT INTO "item"("id", "listId", "index", "title", ...)
VALUES (${id}, ${listId}, ${count}, ${title})
This way index grows with each item inserted into the list.
Move item:
Retrieve item's current listId and index:
SELECT "Item"."listId" AS "Item_listId", "Item"."index" AS "Item_index"
FROM "item" "Item"
WHERE "Item"."id" = ${id}
Change index of "shifted" items if necessary, so that order is consistent, e.g. given the item is moved forward, all items between its current position (exclusively) and its next position (inclusively) need to have their index decreased by 1:
UPDATE "item"
SET "index" = "index" - 1
WHERE "listId" = ${listId}
AND "index" BETWEEN ${sourceIndex + 1} AND ${destinationIndex}
I'll omit the variation with movement across lists because it is very similar.
Update the item itself:
UPDATE "item"
SET "index" = ${destinationIndex}
WHERE "id" = ${id}
Delete item:
Retrieve item's index and listId
Move all items in same list that are next to this item 1 step back, to remove the gap
UPDATE "item"
SET "index" = "index" - 1
WHERE "listId" = ${listId}
AND "index" > ${itemIndex}
Delete item:
DELETE FROM "item"
WHERE "id" = ${id}
Question is:
What transaction isolation levels should I provide for each of these operations? It is very important for me to keep index column consistent, no gaps and most importantly - no duplicates. Am I getting it right that create item operation is subject to phantom reads, because it counts items by some criteria, and it should be serializable? What about other operations?

Without knowing more about your specific application, the safest bet is indeed to use serializable as the isolation level whenever you access that table, but even that level may not be sufficient for your specific case.
A unique constraint on (listId, index) would prevent duplicates (what about the title? Can it be repeated in the same list?). Carefully crafted "watchdog" queries can further mitigate issues, and database sequences or stored procedures can ensure that there are no gaps, but the truth is that the mechanism itself seems fragile.
Knowing only so much of your specific problem, what you appear to have is a concurrency problem at the user level, in the sense that several users can access the same objects at the same time and make changes to them. Assuming this is your typical web application with a stateless back end (hence inherently distributed), this has many implications for user experience, which in turn affect the architecture and even the functional requirements. Say, for example, that user Foo moves item Car to List B, which is currently being worked on by user Bar. It is legitimate to assume that Bar will need to see item Car as soon as the operation is completed, but that will not happen unless there is some mechanism in place to immediately notify users of List B of the change. The more users you have working on the same set of lists, the worse it becomes, even with notifications: there will be more and more of them, up to the point where users see things changing all the time and just can't keep up.
Anyone has to make a lot of assumptions to give you an answer. My own lead me to say that you probably need to revise the requirements for that application, or make sure that management is aware of the limitations and accepts them.
This type of problem is pretty common in distributed applications. Usually "locks" on certain sets of data are placed (either through database or shared memory pools) so that only one user can alter them at any given time or, alternatively, a workflow is provided to manage conflicting operations (much like versioning systems). When neither is done, a log of operations is kept to understand what happened and rectify problems later on should they be detected.
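A minimal sketch of the "only one user can alter the data at a time" idea, assuming SQLite through Python's sqlite3 module purely for illustration (the question does not name a database). BEGIN IMMEDIATE takes SQLite's write lock before the first read, so the read-then-shift steps of a move cannot interleave with another writer. The function name move_item and the sample rows are hypothetical; on other databases the same effect would need something like SELECT ... FOR UPDATE or a serializable transaction.

```python
import sqlite3


def move_item(conn, item_id, dest_index):
    """Move an item within its list while holding the database write lock."""
    conn.execute("BEGIN IMMEDIATE")  # SQLite-specific: lock before reading
    try:
        list_id, src = conn.execute(
            'SELECT "listId", "index" FROM item WHERE id = ?', (item_id,)
        ).fetchone()
        if dest_index > src:
            # Moving forward: items in (src, dest] step back by one.
            conn.execute(
                'UPDATE item SET "index" = "index" - 1 '
                'WHERE "listId" = ? AND "index" BETWEEN ? AND ?',
                (list_id, src + 1, dest_index),
            )
        elif dest_index < src:
            # Moving backward: items in [dest, src) step forward by one.
            conn.execute(
                'UPDATE item SET "index" = "index" + 1 '
                'WHERE "listId" = ? AND "index" BETWEEN ? AND ?',
                (list_id, dest_index, src - 1),
            )
        conn.execute(
            'UPDATE item SET "index" = ? WHERE id = ?', (dest_index, item_id)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    'CREATE TABLE item (id TEXT PRIMARY KEY, "listId" TEXT, '
    '"index" INTEGER, title TEXT)'
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?, ?)",
    [
        ("item1", "list1", 0, "title1"),
        ("item2", "list1", 1, "title2"),
        ("item3", "list1", 2, "title3"),
        ("item4", "list2", 0, "title4"),
    ],
)

move_item(conn, "item1", 2)
order = [
    row[0]
    for row in conn.execute(
        'SELECT id FROM item WHERE "listId" = ? ORDER BY "index"', ("list1",)
    )
]
print(order)
```

This sketch deliberately leaves out a unique index on (listId, index): a row-by-row shift can collide with such an index partway through the UPDATE, depending on the database and the order in which rows are visited.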

Given your constraints, you can create a unique index on the two columns (listId, index). That will prevent duplicates.
Additionally, to detect gaps I would recommend running
select "listId", "index", "nextIndex"
from (
  select i1."listId", i1."index",
    (select min(i2."index") from "item" i2
     where i2."listId" = i1."listId" and i2."index" > i1."index") as "nextIndex"
  from "item" i1
  where i1."listId" = :listId
) gaps
where "nextIndex" - "index" > 1
at the end of each transaction. (The computed column is wrapped in a derived table because a column alias cannot be referenced in the WHERE clause of the same SELECT.)
Together with the "Repeatable Read" transaction isolation level, and rolling back and retrying the transaction if either the unique constraint fails or the statement above returns a row, this should meet your requirements.
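The recipe of unique index, gap check, and retry can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite via Python's sqlite3 module, where there is no "Repeatable Read" setting (SQLite transactions are serializable), and append_item, GAP_CHECK and the retry count are hypothetical names and values, not part of the question.

```python
import sqlite3

# Rows returned here are positions followed by a hole in the numbering.
GAP_CHECK = """
SELECT idx, next_idx FROM (
    SELECT i1."index" AS idx,
           (SELECT MIN(i2."index") FROM item i2
             WHERE i2."listId" = i1."listId"
               AND i2."index" > i1."index") AS next_idx
    FROM item i1
    WHERE i1."listId" = ?
) AS g
WHERE next_idx - idx > 1
"""


def append_item(conn, item_id, list_id, title, attempts=3):
    """Insert at the end of the list; roll back and retry on a conflict."""
    for _ in range(attempts):
        conn.execute("BEGIN")
        try:
            (count,) = conn.execute(
                'SELECT COUNT(*) FROM item WHERE "listId" = ?', (list_id,)
            ).fetchone()
            conn.execute(
                "INSERT INTO item VALUES (?, ?, ?, ?)",
                (item_id, list_id, count, title),
            )
            if conn.execute(GAP_CHECK, (list_id,)).fetchone() is not None:
                raise sqlite3.IntegrityError("gap detected")
            conn.execute("COMMIT")
            return count
        except sqlite3.IntegrityError:
            # Unique index violated or gap found: undo and try again.
            conn.execute("ROLLBACK")
    raise RuntimeError("could not append item")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    'CREATE TABLE item (id TEXT PRIMARY KEY, "listId" TEXT, '
    '"index" INTEGER, title TEXT)'
)
conn.execute('CREATE UNIQUE INDEX item_list_index ON item ("listId", "index")')

indexes = [
    append_item(conn, f"item{n}", "list1", f"title{n}") for n in range(1, 4)
]
print(indexes)
```

Note that the gap check, like the query it is based on, only finds holes between existing items; it would not notice a list whose smallest index is greater than 0.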

Designing Database to Support Multi-Language

I am attempting to introduce multi-language support into the back end of an application, and trying to figure out a strategy to efficiently implement this into the current schema.
I currently have multiple tables, all of which include standard English values.
My idea is that each table with English values would include a foreign key (Lang_ID) that identifies one piece of text and relates it to multiple translated values in a table holding the translations.
For example:
Table-1
Value | Lang_ID
"This is a sentence" | 1
"This is also a sentence" | 2
"Translate this" | 3
Table-2
Value | Lang_ID
"This is a sentence from another table" | 4
"This table is different from table-1" | 5
Language-Table
Lang_ID | Lang_Code | Value
1 | "ZHO" | "这是一句话"
1 | "SPA" | "esta es una frase"
2 | "FRA" | "c'est aussi une phrase"
3 | "SPA" | "traduce esto"
4 | "FRA" | "ceci est une phrase d'un autre tableau"
....
My thinking is you would then just need to query Language-Table by WHERE Lang_ID=? AND Lang_Code=? to get the translation for that specific value.
I'm wondering a couple things
1) Is this a good practice?
2) How do I generate foreign keys that don't exist yet over multiple tables and keep each one unique (so as not to have two translations of different text values fall under the same Lang_ID)
Using another table to store the translations is a good idea. I have used it at work as well as in personal projects. In my case, I designed the schema to be translatable from the beginning, so I have no string values in my tables; even the default English values are in the translation table.
I have also put the language codes in a separate table, though I have never found a situation where I needed it, so I suggest you keep them where they are.
For the unique id, I suggest using a sequence like so (shamelessly stolen here):
create sequence sequence_name
start 1
increment 1
NO MAXVALUE
CACHE 1;
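And use nextval('sequence_name') (the PostgreSQL spelling, which matches the syntax above; other databases name this function differently) whenever inserting a new value to translate.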
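To make the lookup and the shared-ID idea concrete, here is a small sketch. It assumes SQLite via Python's sqlite3 module, which has no sequences, so a hypothetical one-column table text_key with AUTOINCREMENT stands in for the sequence; the helper names new_lang_id and translate are also made up for illustration. Because every table draws its Lang_ID from the same source, two different texts cannot end up under the same ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for a sequence: one shared source of Lang_ID values.
    CREATE TABLE text_key (lang_id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE table_1 (value TEXT, lang_id INTEGER REFERENCES text_key);
    CREATE TABLE table_2 (value TEXT, lang_id INTEGER REFERENCES text_key);
    CREATE TABLE language (
        lang_id   INTEGER REFERENCES text_key,
        lang_code TEXT,
        value     TEXT,
        PRIMARY KEY (lang_id, lang_code)  -- one translation per language
    );
    """
)


def new_lang_id(conn):
    """Allocate a fresh Lang_ID that no other table row is using."""
    return conn.execute("INSERT INTO text_key DEFAULT VALUES").lastrowid


def translate(conn, lang_id, lang_code, default):
    """Look up a translation, falling back to the default text."""
    row = conn.execute(
        "SELECT value FROM language WHERE lang_id = ? AND lang_code = ?",
        (lang_id, lang_code),
    ).fetchone()
    return row[0] if row else default


first = new_lang_id(conn)
conn.execute("INSERT INTO table_1 VALUES (?, ?)", ("This is a sentence", first))
second = new_lang_id(conn)
conn.execute(
    "INSERT INTO table_2 VALUES (?, ?)",
    ("This is a sentence from another table", second),
)
conn.execute(
    "INSERT INTO language VALUES (?, ?, ?)", (first, "SPA", "esta es una frase")
)

spanish = translate(conn, first, "SPA", "This is a sentence")
fallback = translate(conn, second, "SPA", "This is a sentence from another table")
print(spanish, "|", fallback)
```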

How to represent unique attribute in Z-notation without quantifiers?

Full disclosure, this is for a university course. I don't expect an answer outright, but help would be appreciated.
I need to model an Item entity using Z-notation. This is the description:
Item: Every item has a name and a unique ID which can be used to uniquely describe the item. An item also has a price (positive float) and a category.
Part of the requirement is modelling these entities without quantifiers.
This is what I ended up with, but I'm not sure that it's correct:
Schema for Item
The idea being that the name is some combination of strings, the ID is a tuple of a positive integer and said name, and both the price and the category are mapped with total functions.
The first predicate is to ensure a positive price, the second is to ensure the uniqueness of the ID, i.e. reduce the domain to all integers that are not already assigned. I don't think this is correct, though.
The main issue with your approach is that you try to put information about the whole system (or part of it) into the description of a single item. E.g. you specified the price as a mapping from the id to a float - which is fine in principle - but you do not have such a function for each item.
There are many ways to specify this; I will show two approaches:
You have two schemas: E.g. Item and Database
+-- Item -----
| id: ℕ
| name: String
| price: ℝ
| category: String
|----
| price ≥ 0
+----------
+-- Database -----
| items: ℕ +-> Item
|----------
This way you have the ID of each item moved from the item itself. When each item has also a field id, it would be complicated to state without quantifiers the fact that items should map an id to an item with the same id. Or when you just use a set of items it would be complicated to describe without quantifiers that two items must have distinct identifiers.
The uniqueness of the id for each item is guaranteed by items being a function.
Or just use several functions for each aspect of an item:
+-- Items -----
| ids: ℕ
| name: ids --> String
| price: ids --> ℝ
| category: ids --> String
+----------
But stating that all prices must be non-negative without quantifiers would be hard. Maybe it can be done by replacing ℝ with { x:ℝ | x≥0 }.
A general remark: do you need to compute with your IDs? If not, you could introduce a given type [ID] instead. The same applies to the category (e.g. [CATEGORY]).
And is the name not just a single string? I don't think it should be a set of (unordered) strings.

Create new dimension using values from another dimension in SQL?

I currently have a SQL table that looks something like this:
RuleName | RuleGroup
---------------------------
Backdated task | DRFHA
Incorrect Num | FRCLSR
Incomplete close | CFPBDO
Appeal close | CFPBDO
Needs letter | CFPBCRE
Plan ND | DO
B7IND | CORE
I am currently writing a stored procedure in SSMS that pulls these dimensions from the existing table. However, I also want the procedure to create a new "SuperGroup" dimension for each rule based on the text in its RuleGroup (with an "Other" value for the rest). For example:
RuleName | RuleGroup | SuperGroup
--------------------------------------------
Backdated task | DRFHA | Other
Incorrect Num | FRCLSR | Fore
Incomplete close | CFPBDO | DefaultOp
Appeal close | CFPBDO | DefaultOp
Needs letter | CFPBCRE | Core
Plan ND | DO | DefaultOp
B7IND | CORE | Core
So far I have tried GROUP BY, as well as SELECT with several LIKE conditions. However, the issue is that this needs to be scalable: although I only have 21 groups right now, I want new groups to be sorted automatically as they are added.
Here is the SSMS procedure as well:
CREATE PROCEDURE [Rules].[PullRulesSpecifics]
AS
BEGIN
SELECT
ru.RuleName,
ru.RuleGroup
FROM RuleData.groupings ru
WHERE 1=1
AND ru.ActiveRule = 1
AND ru.RuleOpen >= '2015-01-01'
END
Option 1: (the Normalized option)
Assuming that your database is well normalized, you should have a foreign-key constraint on your RuleGroup column that prevents users from entering whatever they like in there. This way, only valid RuleGroup values can be entered into the table. If this is the case (which I suspect it is not), then you can add a column to the referenced table (the one that holds the list of valid RuleGroup values) that indicates which SuperGroup the RuleGroup belongs to. (The SuperGroup column would ideally have an FK constraint on it as well, referencing another table that contains all of the valid SuperGroup values.) If you use this approach, then there is no coding involved whenever a new SuperGroup is added. It maintains itself.
Option 2: (Not a best practice, try option #1 if you can)
Create a new SuperGroups table with 2 columns: SuperGroup and MatchingCriteria. Then you can join on the new SuperGroups table. (Note that this assumes that each MatchingCriteria is going to be mutually exclusive. If not, then you could match more than 1 SuperGroup and get results you might not have intended. Either that or you will have to find some other way to limit the results to a single SuperGroup.) The Query would look something like this:
SELECT
ru.RuleName,
ru.RuleGroup,
sg.SuperGroup
FROM RuleData.groupings ru
JOIN RuleData.SuperGroups sg ON ru.RuleGroup LIKE sg.MatchingCriteria
WHERE ru.ActiveRule = 1
AND ru.RuleOpen >= '2015-01-01'
I removed the WHERE 1=1 code. It was unnecessary and was probably just there to help you debug your problem.
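Here is a small runnable sketch of Option 2, assuming SQLite via Python's sqlite3 module instead of SQL Server, so the RuleData schema prefix and the ActiveRule/RuleOpen filters are left out. The MatchingCriteria patterns are hypothetical ones I picked to fit the sample rows, and I use a LEFT JOIN with COALESCE (a variation on the inner join above) so that unmatched groups fall into "Other".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE groupings (RuleName TEXT, RuleGroup TEXT);
    CREATE TABLE SuperGroups (SuperGroup TEXT, MatchingCriteria TEXT);

    INSERT INTO groupings VALUES
        ('Backdated task',   'DRFHA'),
        ('Incorrect Num',    'FRCLSR'),
        ('Incomplete close', 'CFPBDO'),
        ('Appeal close',     'CFPBDO'),
        ('Needs letter',     'CFPBCRE'),
        ('Plan ND',          'DO'),
        ('B7IND',            'CORE');

    -- Hypothetical, mutually exclusive patterns; adding a new group is
    -- a data change here, not a code change.
    INSERT INTO SuperGroups VALUES
        ('Fore',      'FRCL%'),
        ('DefaultOp', '%DO'),
        ('Core',      'CFPBCRE'),
        ('Core',      'CORE');
    """
)

rows = conn.execute(
    """
    SELECT ru.RuleName,
           ru.RuleGroup,
           COALESCE(sg.SuperGroup, 'Other') AS SuperGroup
    FROM groupings ru
    LEFT JOIN SuperGroups sg ON ru.RuleGroup LIKE sg.MatchingCriteria
    ORDER BY ru.RuleName
    """
).fetchall()

for row in rows:
    print(row)
```

If two patterns ever match the same RuleGroup, the rule shows up twice in the result, which is the mutual-exclusivity caveat mentioned above.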

Create a Parent–Children Web Part Page

I am trying to figure out how to do something that I would think is commonplace, but I cannot find out how to do it.
Given two Custom Lists, one with a field that is essentially a primary key, and the other with what is essentially a foreign key, I want to show all the rows from the first in one area of the display, and the related records for the selected row of the first, in a second part of the screen.
I am thinking this would be side–by–side web parts on a web-part page.
So:
ID pkID Data ID fkID Data
___________________ ______________________________
| 1 100 Row one. | | 8 100 Related one/one |
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ | 9 100 Related one/two |
2 113 Row two. | 10 100 Related one/three |
3 118 Row n. ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
11 113 Related two/one
12 113 Related two/two
13 118 Related n/one
(That is my attempt to show what is established between the two lists. Top row selected on the left, related records from the other row on the right.)
Surely this is common enough that there is a way to readily do this?
I suppose I might need to create a means of asserting that a row is 'selected.'
You will note that I am not using the ID field that "belongs" to SharePoint.
You can create lookup fields to establish that relationship. SharePoint 2010 even allows you to enforce the relationship as in a SQL database, so for instance you can declare what happens if you try to delete a parent that has children (Cascade, Prevent, etc.).
Have a read here:
http://office.microsoft.com/en-au/sharepoint-server-help/create-list-relationships-by-using-unique-and-lookup-columns-HA101729901.aspx
As for displaying them visually, you might have to create some web parts for it, as the only out-of-the-box support is the link to the child entity from the main entity on the parent list.

How should I go about implementing an "autonumber" field in SQL Server 2005?

I'm aware of IDENTITY fields but I have a feeling that I couldn't use one to solve my problem.
Let's say I have multiple clients. Each client has multiple orders. Each client needs to have their orders numbered sequentially, specific to them.
Example table structure:
Orders:
OrderID | ClientID | ClientOrderID | etc...
Some example rows for this table would be:
OrderID | ClientID | ClientOrderID | etc...
1 | 1 | 1 | ...
2 | 1 | 2 | ...
3 | 2 | 1 | ...
4 | 3 | 1 | ...
5 | 1 | 3 | ...
6 | 2 | 2 | ...
I know the naive way would be to take the MAX ClientOrderID for any client and use that value for INSERTs, but that would be subject to concurrency issues. I was considering using a transaction, but I'm not quite sure what the broadest isolation scope that can be used for this is. I'll be using LINQ to SQL, but I have a feeling that isn't relevant.
Somebody correct me if I'm wrong, but as long as your MAX() call is in the same step as your insert, you won't have a problem with concurrency.
So, you could not do
select @newOrderID = isnull(max(ClientOrderID), 0) + 1
from orders
where clientid = @myClientID;
insert into orders (ClientID, ClientOrderID, ...)
values (@myClientID, @newOrderID, ...);
But you can do
insert into orders (ClientID, ClientOrderID, ...)
select @myClientID, isnull(max(ClientOrderID), 0) + 1, ...
from orders
where clientid = @myClientID;
I'm assuming OrderID is an identity column.
Again, if I'm incorrect on this, please let me know. Preferably with a URL
You could use a Repository pattern to handle your Orders and let it control the order numbering for each client. If you implement the OrderRepository correctly, it could control the concurrency and number the order before saving it to the database (let the repository and not the db set the number).
Repository pattern: http://martinfowler.com/eaaCatalog/repository.html
One possibility (though I don't like to do this) is to have a lookup table that would tell you the greatest Order Number given for each vendor. Inside of a transaction, you'd fetch the most recent one from VendorOrderNumber, save your new order, increment the value in VendorOrderNumber, commit transaction.
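A rough sketch of that lookup-table approach, assuming SQLite via Python's sqlite3 module rather than SQL Server 2005. The table name ClientOrderNumber (the text above calls it VendorOrderNumber), the function create_order, and the UNIQUE constraint are my own illustrative choices. BEGIN IMMEDIATE takes the write lock first, so two concurrent calls cannot read the same counter value.

```python
import sqlite3


def create_order(conn, client_id):
    """Bump the client's counter and save the order in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # SQLite-specific: lock before reading
    try:
        # Make sure the client has a counter row, starting at 0.
        conn.execute(
            "INSERT OR IGNORE INTO ClientOrderNumber (ClientID, LastNumber) "
            "VALUES (?, 0)",
            (client_id,),
        )
        conn.execute(
            "UPDATE ClientOrderNumber SET LastNumber = LastNumber + 1 "
            "WHERE ClientID = ?",
            (client_id,),
        )
        (number,) = conn.execute(
            "SELECT LastNumber FROM ClientOrderNumber WHERE ClientID = ?",
            (client_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO Orders (ClientID, ClientOrderID) VALUES (?, ?)",
            (client_id, number),
        )
        conn.execute("COMMIT")
        return number
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE ClientOrderNumber (
        ClientID   INTEGER PRIMARY KEY,
        LastNumber INTEGER NOT NULL
    );
    CREATE TABLE Orders (
        OrderID       INTEGER PRIMARY KEY AUTOINCREMENT,
        ClientID      INTEGER NOT NULL,
        ClientOrderID INTEGER NOT NULL,
        UNIQUE (ClientID, ClientOrderID)  -- safety net against duplicates
    );
    """
)

# Same client sequence as the example rows in the question.
numbers = [create_order(conn, client) for client in (1, 1, 2, 3, 1, 2)]
print(numbers)
```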
This is an odd way to store data, but assuming you need it, there is nothing built-in that you can use.
Your suggestion of Max(ClientOrderID) is straightforward and pretty easy to implement (follow John MacIntyre's advice). It will probably work acceptably well on tables with a few thousand orders. As the table grows, this approach will of course slow down.
Nick DeVore's suggestion of a lookup table is a little messier to implement but won't substantially be affected by data growth.
Depending on where/when you actually need the ClientOrderID, you could calculate the id when needed like this:
SELECT *,
ROW_NUMBER() OVER(ORDER BY OrderID) AS ClientOrderID
FROM Orders
WHERE ClientID = 1
This assumes that the ClientOrderIDs are in the same sequence as the OrderID. Without actually persisting the ID, it is awkward to use as a key to anything else. This approach should not be affected by data growth.
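As a quick check of the computed-on-demand idea, the sketch below numbers every client's orders in one query by adding PARTITION BY ClientID, which is my variation on the single-client query above. It assumes SQLite 3.25 or newer (for window functions) via Python's sqlite3 module, and reuses the example rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ClientID INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3), (5, 1), (6, 2)],
)

# The numbering restarts at 1 for each client and follows OrderID order.
rows = conn.execute(
    """
    SELECT OrderID,
           ClientID,
           ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY OrderID)
               AS ClientOrderID
    FROM Orders
    ORDER BY OrderID
    """
).fetchall()

for row in rows:
    print(row)
```

One thing to keep in mind with this approach: deleting an order renumbers every later order of that client, since nothing is persisted.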