Database design for a step by step wizard - sql

I am designing a system built around logical steps, each with some associated actions. (The actions are not part of this question, but they are crucial for each step in the list.)
The thing is that I need a way to define all the logical steps in order, so that I can retrieve the list with a query and also modify it later.
Does anyone have experience with this kind of database design?
I have been thinking of having a table named wizard_steps (or something similar) and using a priority column to define the order, but I feel this design will fail at some point: items could end up with the same priority, adding a new item would mean renumbering the rest, and so forth.
Another design I have considered is a "next item" column in the wizard_steps table, but I don't feel this is the right approach either.
To summarize: I am trying to build a list of elements where the order is crucial, and the design should be open enough to support multiple lists.
What should the database look like?
Thanks!
EDIT: I found this Yii component, which I will check out: http://www.yiiframework.com/extension/simpleworkflow/
It might be a good solution!

If I understand you correctly, your main concern is to create a schema that supports ordered lists and allows easy insertion and reordering of items.
The following table design:
id_list item_priority foreign_itemdef_id
1 1 245
1 2 32
1 3 45
2 1 156
2 2 248
2 3 127
coupled with an item definition table, is easy to query but difficult to maintain, especially for insertions.
This alternative:
id_list first_item_id
1 45
2 38
coupled to the linked list:
item_id next_item foreign_itemdef_id
45 381 56
381 NULL 59
38 39 89
39 42 78
42 NULL 45
is difficult both to query and to update (you must update the linked list inside a transaction, otherwise it can become corrupted).
I would prefer the first solution for its simplicity.
Depending on your update frequency, you may consider leaving large gaps between item_priority values to make insertion easier:
id_list item_priority foreign_itemdef_id
1 1000 245
1 2000 32
1 3000 45
2 1000 156
2 2000 248
2 3000 127
1 2500 46 -- late insertion
1 2750 47 -- late insertion
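The gapped-priority idea above can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table name wizard_step and the helper insert_between are made up for the example, and a real system would also need a renumbering step once a gap is used up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the columns shown above.
conn.execute(
    "CREATE TABLE wizard_step ("
    " id_list INTEGER NOT NULL,"
    " item_priority INTEGER NOT NULL,"
    " foreign_itemdef_id INTEGER NOT NULL,"
    " PRIMARY KEY (id_list, item_priority))"
)
conn.executemany(
    "INSERT INTO wizard_step VALUES (?, ?, ?)",
    [(1, 1000, 245), (1, 2000, 32), (1, 3000, 45)],
)


def insert_between(conn, id_list, before, after, itemdef_id):
    """Insert a step between two existing priorities using their midpoint."""
    new_priority = (before + after) // 2
    if new_priority in (before, after):
        # No integer left between the neighbours: the list must be renumbered.
        raise ValueError("gap exhausted; renumber the list first")
    conn.execute(
        "INSERT INTO wizard_step VALUES (?, ?, ?)",
        (id_list, new_priority, itemdef_id),
    )
    return new_priority


first = insert_between(conn, 1, 2000, 3000, 46)   # midpoint of 2000 and 3000
second = insert_between(conn, 1, first, 3000, 47)  # midpoint of 2500 and 3000

ordered = [
    row[0]
    for row in conn.execute(
        "SELECT foreign_itemdef_id FROM wizard_step"
        " WHERE id_list = ? ORDER BY item_priority",
        (1,),
    )
]
```

Only the new row is written; no existing row changes until a gap runs out.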
EDIT:
Here's a query that should make room for an insertion: it increments the priority of every row at or above the target position, so the new item does not collide with the row already there.
$query_make_room_for_new_item = "UPDATE item_priority_table SET item_priority = item_priority + 1 WHERE item_priority >= " . (int) $new_item_position_priority . " AND id_list = " . (int) $id_list;
Then insert your item with priority $new_item_position_priority.
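For what it's worth, here is how the shift-then-insert step might look with bound parameters and a single transaction. It is a sketch under assumptions: it uses Python's sqlite3 module rather than PHP, the helper name insert_at is invented, and the table deliberately has no unique constraint on (id_list, item_priority), because some databases check uniqueness row by row while the UPDATE is running.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_priority_table ("
    " id_list INTEGER NOT NULL,"
    " item_priority INTEGER NOT NULL,"
    " foreign_itemdef_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO item_priority_table VALUES (?, ?, ?)",
    [(1, 1, 245), (1, 2, 32), (1, 3, 45), (2, 1, 156)],
)
conn.commit()


def insert_at(conn, id_list, position, itemdef_id):
    """Shift rows at or after `position` by one, then insert the new row."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE item_priority_table"
            " SET item_priority = item_priority + 1"
            " WHERE id_list = ? AND item_priority >= ?",
            (id_list, position),
        )
        conn.execute(
            "INSERT INTO item_priority_table VALUES (?, ?, ?)",
            (id_list, position, itemdef_id),
        )


insert_at(conn, 1, 2, 99)  # 99 is a made-up item definition id

list_1 = conn.execute(
    "SELECT item_priority, foreign_itemdef_id FROM item_priority_table"
    " WHERE id_list = 1 ORDER BY item_priority"
).fetchall()
list_2 = conn.execute(
    "SELECT item_priority, foreign_itemdef_id FROM item_priority_table"
    " WHERE id_list = 2 ORDER BY item_priority"
).fetchall()
```

Binding the values as parameters avoids building the SQL string by concatenation, and the transaction keeps the list consistent if the insert fails after the shift.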

Related

Second highest column

I have seen a similar question asked How to get second highest value among multiple columns in SQL ... however the solution won't work for Microsoft Access (Row_Number/Over Partition isn't valid in Access).
My Access query includes dozens of fields. I would like to create a new field/column that would return the second highest value of 10 specific columns that are included in the query, I will call this field "Cover". Something like this:
Product Bid1 Bid2 Bid3 Bid4 Cover
Watch 104 120 115 108 115
Shoe 65 78 79 76 78
Hat 20 22 19 20 20
I can do a really long SWITCH formula such as the following equivalent Excel formula:
IF( AND(Bid1> Bid2, Bid1 > Bid3, Bid1 > Bid4), Bid1,
AND(Bid2> Bid1, Bid2 > Bid3, Bid2 > Bid4), Bid2,
.....
But there must be a more efficient solution. A MAXIF equivalent would work perfectly if MS-Access Query had such a function.
Any ideas? Thank you in advance.
This would be easier if the data were laid out in a more normalized way. The clue is the numbered field names.
Your data is currently organized as a pivot (known in Access as a crosstab), but it can easily be unpivoted.
In this case, the normalized layout would be:
Product Bid Amount
--------- ----- --------
Watch 1 104
Watch 2 120
Watch 3 115
Watch 4 108
Shoe 1 65
Shoe 2 78
Shoe 3 79
Shoe 4 76
Hat 1 20
Hat 2 22
Hat 3 19
Hat 4 20
This way querying becomes simpler.
It looks like you want the maximum of the bids, grouped by Product, so:
select Product, max(amount) as maxAmount
from myTable
group by product
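The query above returns the highest bid per product. For the second-highest value the question asks about, one possible sketch on the normalized layout is to take the largest amount that is strictly below the maximum. The example below runs on SQLite through Python's sqlite3 module; the names bids_wide and bids are invented for the illustration. It relies only on UNION ALL, MAX and a correlated subquery rather than window functions, so it should port to Access, though that has not been tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Wide layout as in the question (only four bid columns for brevity).
conn.execute(
    "CREATE TABLE bids_wide"
    " (Product TEXT, Bid1 INTEGER, Bid2 INTEGER, Bid3 INTEGER, Bid4 INTEGER)"
)
conn.executemany(
    "INSERT INTO bids_wide VALUES (?, ?, ?, ?, ?)",
    [("Watch", 104, 120, 115, 108), ("Shoe", 65, 78, 79, 76), ("Hat", 20, 22, 19, 20)],
)

# Unpivot: one row per (Product, Bid, Amount).
conn.execute(
    """
    CREATE VIEW bids AS
    SELECT Product, 1 AS Bid, Bid1 AS Amount FROM bids_wide
    UNION ALL SELECT Product, 2, Bid2 FROM bids_wide
    UNION ALL SELECT Product, 3, Bid3 FROM bids_wide
    UNION ALL SELECT Product, 4, Bid4 FROM bids_wide
    """
)

# Second highest = the largest amount strictly below the product's maximum.
cover_sql = """
    SELECT b.Product, MAX(b.Amount) AS Cover
    FROM bids AS b
    WHERE b.Amount < (SELECT MAX(m.Amount) FROM bids AS m
                      WHERE m.Product = b.Product)
    GROUP BY b.Product
"""
cover = dict(conn.execute(cover_sql).fetchall())
```

Note the assumption about ties: if the two highest bids are equal, this returns the next distinct value below them, which may or may not be what "Cover" should mean.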
Really, we shouldn't be repeating text values at all, so Product should be an ID number, with the associated product names stored once in a separate table instead of several times in this one, like:
ProdID ProdName
-------- ----------
1 Watch
2 Shoe
3 Hat
... but that's another lesson.
Generally speaking, repetition of anything should be avoided... that's pretty much the purpose of a database... but the links below will explain it better than I can. :)
Quackit : Microsoft Access Tutorial
YouTube : DB Planning
Microsoft : Database Design Basics
Microsoft : Database Normalization Basics
Wikipedia : Database Normalization

SQL UPDATE SET interchanges values

I update a view to put the same value in two columns, but it appears to swap the two values instead of just setting them. My view UpdateADAuftrag2 (reduced for SO) is this:
SELECT dbo.CSDokument.AD1, dbo.UpdateAS400zuSellingBenutzer2.BenutzerNr
FROM dbo.AS400Auftrag
INNER JOIN
dbo.CSDokument ON dbo.AS400Auftrag.Angebotsnummer = dbo.CSDokument.Angebotsnummer
INNER JOIN
dbo.UpdateAS400zuSellingBenutzer2 ON dbo.AS400Auftrag.AD = dbo.UpdateAS400zuSellingBenutzer2.SchluesselWert
AND
dbo.CSDokument.AD1 <> dbo.UpdateAS400zuSellingBenutzer2.BenutzerNr
WHERE (dbo.AS400Auftrag.AD IS NOT NULL)
The important part is dbo.CSDokument.AD1 <> dbo.UpdateAS400zuSellingBenutzer2.BenutzerNr
AD1 is the user number for external workers, and BenutzerNr means user number. For example, Charlie Brown is an external worker with user number 31. When AD1 is 31, Charlie Brown is the external worker for this document (an order in this case).
The UPDATE statement looks like this:
UPDATE [dbo].[UpdateADAuftrag2]
SET [AD1] = [BenutzerNr]
I have for example these values
AD1 | BenutzerNr
31 | 54
99 | 384
112 | 93
after the update the result is this
AD1 | BenutzerNr
54 | 31
384 | 99
93 | 112
Why not this?
AD1 | BenutzerNr
54 | 54
384 | 384
93 | 93
edit: UpdateAS400zuSellingBenutzer is also a View, but as far as I can see it includes only BenutzerNr and not AD1.
Firstly, you're never going to see your expected results in the view. Your UPDATE statement is effectively a DELETE statement (as far as the view is concerned). Rows only appear in the view if AD1 <> BenutzerNr, but you're setting them to be equal.
However, the documentation for updatable views states "Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table." Your update statement references columns from more than one table.
https://msdn.microsoft.com/en-us/library/ms187956.aspx#Updatable Views
I'm not sure what you're trying to achieve here, but in my experience it's usually easier to issue the UPDATE statement against the base tables directly.
There were two bugs. Bug 1: the view UpdateAS400zuSellingBenutzer2 sometimes returned two rows for one entry in CSDokument. Bug 2: there were two entries in the table AS400Auftrag, and the update switched between these two entries. So it only looked as if SET swapped the two values; it was just by chance. Thanks for reading.
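As a rough illustration of updating the base table directly and of checking for the duplicate matches described above, here is a sketch on a heavily simplified schema. It runs on SQLite through Python's sqlite3 module, not SQL Server; the table Benutzer is a hypothetical stand-in for the view UpdateAS400zuSellingBenutzer2, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CSDokument (Angebotsnummer INTEGER, AD1 INTEGER);
    CREATE TABLE AS400Auftrag (Angebotsnummer INTEGER, AD TEXT);
    -- Hypothetical stand-in for the view UpdateAS400zuSellingBenutzer2.
    CREATE TABLE Benutzer (SchluesselWert TEXT, BenutzerNr INTEGER);
    INSERT INTO CSDokument VALUES (1001, 31), (1002, 99), (1003, 7);
    INSERT INTO AS400Auftrag VALUES (1001, 'A'), (1002, 'B'), (1003, NULL);
    INSERT INTO Benutzer VALUES ('A', 54), ('B', 384);
    """
)

# Check first: documents with more than one candidate BenutzerNr would make
# the result of the update depend on which match happens to be picked.
dupes = conn.execute(
    """
    SELECT a.Angebotsnummer
    FROM AS400Auftrag AS a
    JOIN Benutzer AS b ON b.SchluesselWert = a.AD
    GROUP BY a.Angebotsnummer
    HAVING COUNT(DISTINCT b.BenutzerNr) > 1
    """
).fetchall()

# Update the base table directly instead of going through the view.
conn.execute(
    """
    UPDATE CSDokument
    SET AD1 = (SELECT b.BenutzerNr
               FROM AS400Auftrag AS a
               JOIN Benutzer AS b ON b.SchluesselWert = a.AD
               WHERE a.Angebotsnummer = CSDokument.Angebotsnummer)
    WHERE EXISTS (SELECT 1
                  FROM AS400Auftrag AS a
                  JOIN Benutzer AS b ON b.SchluesselWert = a.AD
                  WHERE a.Angebotsnummer = CSDokument.Angebotsnummer
                    AND b.BenutzerNr <> CSDokument.AD1)
    """
)
rows = conn.execute(
    "SELECT Angebotsnummer, AD1 FROM CSDokument ORDER BY Angebotsnummer"
).fetchall()
```

If the duplicate check returns any rows, the join needs to be made unambiguous before running the update.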

Counting number of occurrences of tuples in an m:n relationship

I'd like to know if there's an efficient way to count the number of occurrences of a combination of entities from one side of an m:n relationship. Hopefully, the next example will illustrate what I mean:
Let's imagine a database with people and events of some sort. People can organize multiple events, and events can be organized by more than one person. What I'd like to know is whether a certain tuple of people has already organized an event together or if it's their first time. My first idea is to add an attribute to the m:n relationship:
PeopleID | EventID | TimesOrganized
100 1 1
200 1 1
300 2 1
400 3 1
Now, there's an event no. 4 that's again organized by persons 200 and 100 (let's say they should be added in that order). The new table should look like:
PeopleID | EventID | TimesOrganized
100 1 2
200 1 2
300 2 1
400 3 1
200 4 2
100 4 2
Now, if I added an event organized by persons 200 and 300 it would look like this:
PeopleID | EventID | TimesOrganized
100 1 2
200 1 2
300 2 1
400 3 1
200 4 2
100 4 2
200 5 1
300 5 1
How would I go about keeping the third column updated properly and what are my options?
I should also add that this is part of a larger project we have for one of our classes, and we'll be implementing an application that uses the database in some way, so I might as well move this to application logic if there's no easy way.
I wouldn't recommend tracking a TimesOrganized column as you suggest.
You can simply query it as needed using a COUNT(EventId)..GROUP BY PeopleID.
If you do feel you need to maintain the value somewhere it probably is better normalized to the (presumed) table People. Something like People.TimesOrganized. But then you have to increment it as you go instead of just recalculating as needed.
If you want to count how many times someone has organized an event, the problem is not m:n but 1:m. Just count the events grouped by person; you don't really need that column in the table unless the value is needed very often.
That said, I find your table a little confusing: detail and aggregation are mixed, and the third table is downright wrong, since PeopleID 200 has organized 3 events and PeopleID 300 has organized 2.
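Both kinds of count can be computed on demand from the plain link table, without a TimesOrganized column. The sketch below uses Python's sqlite3 module; the table name organizers and the helper times_organized_together are made up for the example, and "organized together" is assumed to mean that the event's organizers are exactly the given set of people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organizers ("
    " PeopleID INTEGER NOT NULL,"
    " EventID INTEGER NOT NULL,"
    " PRIMARY KEY (PeopleID, EventID))"
)
conn.executemany(
    "INSERT INTO organizers VALUES (?, ?)",
    [(100, 1), (200, 1), (300, 2), (400, 3),
     (200, 4), (100, 4), (200, 5), (300, 5)],
)

# Events organized per person: the COUNT ... GROUP BY from the answer.
per_person = dict(
    conn.execute(
        "SELECT PeopleID, COUNT(EventID) FROM organizers GROUP BY PeopleID"
    )
)


def times_organized_together(conn, people):
    """Count events whose set of organizers is exactly `people`."""
    people = sorted(set(people))
    placeholders = ",".join("?" * len(people))
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT EventID
            FROM organizers
            GROUP BY EventID
            HAVING COUNT(*) = ?                              -- same group size
               AND SUM(PeopleID IN ({placeholders})) = ?    -- all are members
        ) AS matching
    """
    params = (len(people), *people, len(people))
    return conn.execute(sql, params).fetchone()[0]


pair_100_200 = times_organized_together(conn, [200, 100])
pair_200_300 = times_organized_together(conn, [200, 300])
```

The order in which people are passed does not matter here, since the comparison is on sets.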

custom sorting or ordering a table without resorting the whole shebang

For ten years we've been using the same custom sorting on our tables. I'm wondering if there is another solution which involves fewer updates, especially since we'd now like to have a replication/publication date and don't want our replication to replicate unnecessary entries. I had a look into nested sets, but they don't seem to do the job for us.
Base table:
id | a_sort
---+-------
1 10
2 20
3 30
After inserting:
insert into table (a_sort) values(15)
An entry at the second position.
id | a_sort
---+-------
1 10
2 20
3 30
4 15
Ordering the table with:
select * from table order by a_sort
and resorting all the a_sort entries, updating at least id=(2,3,4)
will of course produce the desired output:
id | a_sort
---+-------
1 10
4 20
2 30
3 40
The column names, the column count, the datatypes, a possible join, possible triggers and the way the resorting is done are all irrelevant to the problem. We've also found some pretty neat ways to do this task fast.
The only question is: how the heck can we reduce the updates in the db to 1 or 2 at most?
It seems like an awfully common problem.
The Captain Obvious in me once thought: "use an a_sort float(53), insert using a fixed value of ordervaluefirstentry+abs(ordervaluefirstentry-ordervaluenextentry)/2".
But this would only allow around 1040 "in between" entries, so never resorting seems a bit problematic ;)
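To get a feel for how quickly the midpoint approach runs out of room, here is a small experiment in Python, whose float is an IEEE 754 double like float(53). It assumes the worst case where every new entry is inserted directly after the same first entry; the function name is invented, and the exact limit depends on the starting values and the insertion pattern.

```python
def count_midpoint_insertions(low, high):
    """Count how often a new value still fits strictly between low and high
    when each insert uses low + abs(low - high) / 2 and lands next to low."""
    count = 0
    while True:
        mid = low + abs(low - high) / 2
        if mid <= low or mid >= high:
            # The midpoint rounded onto a neighbour: no room left.
            return count
        high = mid  # the next insert goes between low and the new entry
        count += 1


n = count_midpoint_insertions(10.0, 20.0)
print(n)
```

Between 10 and 20 this worst case is exhausted after roughly 50 insertions, because each insert halves the remaining gap and a double carries only 53 bits of precision.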
You really didn't describe what you're doing with this data, so forgive me if this is a crazy idea for your situation:
You could make a sort of 'linked list' where instead of a column of values, you have a column for the 'next highest valued' id. This would decrease the number of updates to a maximum of 2.
You can make it doubly linked and also have a column for next lowest, which would bring the maximum number of updates to 3.
See:
http://en.wikipedia.org/wiki/Linked_list
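A singly linked version of this idea could look roughly like the sketch below, using Python's sqlite3 module. The table items, the column next_id and the helpers are all invented for the illustration. Inserting touches two rows (the new row and its predecessor), and reading the list in order is done with a recursive query, which is where the extra query complexity shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " next_id INTEGER REFERENCES items(id))"
)
# Initial order: 1 -> 2 -> 3
conn.executemany(
    "INSERT INTO items (id, next_id) VALUES (?, ?)",
    [(1, 2), (2, 3), (3, None)],
)


def insert_after(conn, prev_id):
    """Insert a new item after prev_id: one INSERT plus one UPDATE."""
    with conn:  # both writes succeed or neither does
        (old_next,) = conn.execute(
            "SELECT next_id FROM items WHERE id = ?", (prev_id,)
        ).fetchone()
        cur = conn.execute("INSERT INTO items (next_id) VALUES (?)", (old_next,))
        conn.execute(
            "UPDATE items SET next_id = ? WHERE id = ?", (cur.lastrowid, prev_id)
        )
        return cur.lastrowid


def in_order(conn, head_id):
    """Walk the list from head_id by following next_id."""
    sql = """
        WITH RECURSIVE walk(id, next_id, pos) AS (
            SELECT id, next_id, 0 FROM items WHERE id = ?
            UNION ALL
            SELECT i.id, i.next_id, w.pos + 1
            FROM items AS i JOIN walk AS w ON i.id = w.next_id
        )
        SELECT id FROM walk ORDER BY pos
    """
    return [row[0] for row in conn.execute(sql, (head_id,))]


new_id = insert_after(conn, 1)
order = in_order(conn, 1)
```

The trade-off is that a plain ORDER BY no longer works, and the head of each list has to be known or stored somewhere.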

Sparse data: efficient storage and retrieval in an RDBMS

I have a table representing values of source file metrics across project revisions, like the following:
Revision FileA FileB FileC FileD FileE ...
1 45 3 12 123 124
2 45 3 12 123 124
3 45 3 12 123 124
4 48 3 12 123 124
5 48 3 12 123 124
6 48 3 12 123 124
7 48 15 12 123 124
(The relational view of the above data is different. Each row contains the following columns: Revision, FileId, Value. The files and their revisions from which the data is calculated are stored in Subversion repositories, so we're trying to represent the repository's structure in a relational schema.)
There can be up to 23750 files in 10000 revisions (this is the case for the ImageMagick drawing program). As you can see, most values are the same between successive revisions, so the table's useful data is quite sparse. I am looking for a way to store the data that
1. avoids replication and uses space efficiently (currently the non-sparse representation requires 260 GB (data+index) for less than 10% of the data I want to store)
2. allows me to retrieve efficiently the values for a specific revision using an SQL query (without explicitly looping through revisions or files)
3. allows me to retrieve efficiently the revision for a specific metric value.
Ideally, the solution should not depend on a particular RDBMS and should be compatible with Hibernate. If this is not possible, I can live with using Hibernate, MySQL or PostgreSQL-specific features.
This is how I might model it. I've left out the Revisions table and Files table as those should be pretty self-explanatory.
CREATE TABLE Revision_Files
(
start_revision_number INT NOT NULL,
end_revision_number INT NOT NULL,
file_number INT NOT NULL,
value INT NOT NULL,
CONSTRAINT PK_Revision_Files PRIMARY KEY CLUSTERED (start_revision_number, file_number),
CONSTRAINT CHK_Revision_Files_start_before_end CHECK (start_revision_number <= end_revision_number)
)
GO
To get all of the values for files of a particular revision you could use the following query. Joining to the files table with an outer join would let you get those that have no defined value for that revision.
SELECT
REV.revision_number,
RF.file_number,
RF.value
FROM
Revisions REV
INNER JOIN Revision_Files RF ON
RF.start_revision_number <= REV.revision_number AND
RF.end_revision_number >= REV.revision_number
GO
Assuming that I understand correctly what you want in your third point, this will let you get all of the revisions for which a particular file has a certain value:
SELECT
REV.revision_number
FROM
Revision_Files RF
INNER JOIN Revisions REV ON
REV.revision_number BETWEEN RF.start_revision_number AND RF.end_revision_number
WHERE
RF.file_number = @file_number AND
RF.value = @value
GO
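To make the range-based model concrete, here is a small runnable sketch that loads the sample data from the question and runs both lookups. It uses SQLite through Python's sqlite3 module rather than SQL Server, maps FileA, FileB and FileC to the assumed file numbers 1, 2 and 3, and the helper names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Revisions (revision_number INTEGER PRIMARY KEY);
    CREATE TABLE Revision_Files (
        start_revision_number INTEGER NOT NULL,
        end_revision_number   INTEGER NOT NULL,
        file_number           INTEGER NOT NULL,
        value                 INTEGER NOT NULL,
        PRIMARY KEY (start_revision_number, file_number),
        CHECK (start_revision_number <= end_revision_number)
    );
    """
)
conn.executemany("INSERT INTO Revisions VALUES (?)", [(r,) for r in range(1, 8)])
# One row per run of identical values: 5 rows instead of 7 x 3 = 21.
conn.executemany(
    "INSERT INTO Revision_Files VALUES (?, ?, ?, ?)",
    [(1, 3, 1, 45), (4, 7, 1, 48),   # FileA
     (1, 6, 2, 3), (7, 7, 2, 15),    # FileB
     (1, 7, 3, 12)],                 # FileC
)


def values_at(conn, revision):
    """All file values for one revision, without looping over revisions."""
    return dict(
        conn.execute(
            "SELECT file_number, value FROM Revision_Files"
            " WHERE ? BETWEEN start_revision_number AND end_revision_number",
            (revision,),
        )
    )


def revisions_with(conn, file_number, value):
    """All revisions in which a file has the given value."""
    sql = """
        SELECT REV.revision_number
        FROM Revision_Files AS RF
        JOIN Revisions AS REV
          ON REV.revision_number
             BETWEEN RF.start_revision_number AND RF.end_revision_number
        WHERE RF.file_number = ? AND RF.value = ?
        ORDER BY REV.revision_number
    """
    return [row[0] for row in conn.execute(sql, (file_number, value))]


at_rev_5 = values_at(conn, 5)
revs_a_48 = revisions_with(conn, 1, 48)
```

When a value changes at a new revision, the current row for that file would have its end_revision_number closed off and a new row would be started; that maintenance step is left out here.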