Versioning Lookup Tables - SQL

I'm creating some enterprise master lookup tables in SQL Server 2008 R2. I'm required to keep a separate table for each domain; for example, Gender would be one table and City another. In each table I keep Start and End dates to track history.
Now I'm being asked to version these, so you would know, for example, that V1 had these codes and V2 had these updated codes. Versioning a single lookup table is pretty straightforward. Using Gender as an example, you could have a Gender table, a GenderVersion table, and a GenderToGenderVersion linking table. That requires two extra tables per lookup. With just a few lookups that is manageable, but I'm hoping to find a better approach that works with many lookup tables.
A few requirements:
1. As stated, lookup tables must be domain specific. I cannot combine them.
2. Junior-level programmers must be able to build a SELECT statement to find all the lookup values for a specific version.
Any ideas? I hate the idea of having 3x the required tables.
[Edit]
Additional Details
We need to be able to determine exactly what codes were in a particular version.
We need codes that are not edited or removed from a previous version to be part of the next version.
We do maintain start and end dates in the codes. The version labels are in addition to the start and end dates.
Here is a sample model using Race, with a RaceVersion table and a RaceToRaceVersion linking table:

RaceVersion table:
    RaceVersionID
    VersionLabel
    StartDate
    EndDate

RaceToRaceVersion table:
    RaceID
    RaceVersionID

Race table:
    RaceID (used just for the master code DB)
    RaceCode (used for all other systems that use these codes)
    RaceDescription
    StartDate
    EndDate
Both the codes and the versions carry date-searchable effective dates.
You can find a version and then relate it back to its codes.
Whenever there is a change to the codes, in this case Race, you can add a new version, then find all the rows that are currently effective and populate the linking table.
This works well for a few sets of codes. I'm hoping for functionality like this that scales better.
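For example, the kind of SELECT a junior developer would need to write against this model might look like the following (a minimal sketch using the sample names above; 'V2' is just an example version label):
-- All Race codes that belong to a given version
SELECT r.RaceCode, r.RaceDescription
FROM Race r
INNER JOIN RaceToRaceVersion rrv ON rrv.RaceID = r.RaceID
INNER JOIN RaceVersion rv ON rv.RaceVersionID = rrv.RaceVersionID
WHERE rv.VersionLabel = 'V2'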

You could avoid multiple tables if you added a version column to each lookup table. Each time a new version of the data is entered, increment the version. When selecting data from the lookup, you'll always have to specify the correct version.
id | version | name
---------------------------------
1 | 1 | Lookup Name for V1
1 | 2 | Lookup Name for V2
SELECT name
FROM lookup
WHERE id = 1
AND version = 2
If you want to easily select the latest version, you'll need to create a view:
CREATE VIEW latest_lookup AS  -- view name is just an example
SELECT id, name
FROM lookup lu
WHERE version = (
    SELECT MAX(version)
    FROM lookup lu2
    WHERE lu.id = lu2.id
)
However, this view would have a performance cost that may outweigh the cost of maintaining more verbose select statements that include the version every time.

The first thing that popped into my mind is a versionId field in all these tables and a version table describing what each versionId means. Same number of tables, more records in each.
Edit starts here
The second thing that popped into my head was to also add an ApplicationId column and an Application table. That way not all applications have to be on the same version.
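A rough sketch of what that could look like (all names here are hypothetical):
-- One shared version table
CREATE TABLE CodeVersion (
    VersionID    int PRIMARY KEY,
    VersionLabel varchar(20),
    StartDate    date,
    EndDate      date
);

-- Each domain-specific lookup simply carries the version it belongs to
CREATE TABLE Gender (
    GenderID    int PRIMARY KEY,
    GenderCode  varchar(10),
    Description varchar(50),
    VersionID   int NOT NULL REFERENCES CodeVersion (VersionID)
);

-- An Application table could then map each application to the version it uses,
-- so not every application has to move to a new version at the same time.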


SQL Best way to return data from one table along with mapped data from another table

I have the following problem.
I have a table Entries that contains 2 columns:
EntryID - unique identifier
Name - some name
I have another EntriesMapping table (a many-to-many mapping table) that contains 2 columns:
EntryID that refers to the EntryID of the Entries table
PartID that refers to a PartID in a separate Parts table.
I need to write an SP that will return all data from the Entries table, but for each row I also want to provide a list of all PartIDs that are registered in the EntriesMapping table.
My question is how best to approach the design of the solution, given that the results of the SP will regularly be processed by an app, so performance is quite important.
1. Do I write an SP that selects multiple rows per entry, so that if more than one PartID is registered for a given entry, I return multiple rows each having the same EntryID and Name but a different PartID?
OR
2. Do I write an SP that selects one row per entry in the Entries table, with a field that is a string/XML/JSON containing all the different PartIDs?
OR
3. Is there some other solution that I am not thinking of?
Solution 1 seems to me to be the better way to go, but I will be passing lots of repeating data.
Solution 2 won't pass extra data, but the string/JSON/XML would need additional processing, resulting in more CPU time per item.
PS: I feel like this is quite a common problem to solve, but I was unable to find any resource that can provide common solutions or some pros/cons to different approaches.
I think you need a simple JOIN:
SELECT e.EntryId, e.Name, em.PartId
FROM Entries e
JOIN EntriesMapping em ON e.EntryId = em.EntryId
This will return what you want; there is no need for a stored procedure for that.
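If a stored procedure wrapper is still wanted, it can be as thin as the following (the procedure name is hypothetical):
CREATE PROCEDURE dbo.GetEntriesWithParts
AS
BEGIN
    SET NOCOUNT ON;

    -- One row per Entry/Part pair; the app groups the rows by EntryId
    SELECT e.EntryId, e.Name, em.PartId
    FROM Entries e
    JOIN EntriesMapping em ON e.EntryId = em.EntryId
    ORDER BY e.EntryId;
END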

How to force an ID column to remain sequential even if a record has been deleted, in SQL Server?

I don't know the best wording for the question, but I have a table that has 2 columns: ID and NAME.
When I delete a record from the table, its ID is deleted with it and the sequence is broken.
Take this example:
if I delete row number 2, the sequence of the ID column will be: 1, 3, 4
How do I make it: 1, 2, 3?
ID's are meant to be unique for a reason. Consider this scenario:
Customers
id | value
1  | John
2  | Jackie

Accounts
id | customer_id | balance
1  | 1           | $500
2  | 2           | $1000
In a relational database, say you were to delete "John" and renumber, so that Jackie takes on the customer_id of 1. When Jackie goes in to check her balance, she will now show up $500 short.
Granted, you could go through and update all of her other records, but A) this would be a massive pain in the ass. B) It would be very easy to make mistakes, especially in a large database.
Ids (primary keys in this case) are meant to be the rock that holds your relational database together, and you should always be able to rely on that value regardless of the table.
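To make that dependency concrete, the example above corresponds to a foreign key along these lines (column types are assumptions):
CREATE TABLE Customers (
    id    int PRIMARY KEY,
    value varchar(50)
);

CREATE TABLE Accounts (
    id          int PRIMARY KEY,
    customer_id int NOT NULL REFERENCES Customers (id),
    balance     money
);
-- Renumbering Customers.id would mean rewriting every
-- Accounts.customer_id that points at it.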
As JohnFx pointed out, should you want a value that shows the order of the user, consider using a built-in function when querying.
In SQL Server identity columns are not guaranteed to be sequential. You can use the ROW_NUMBER function to generate a sequential list of ids when you query the data from the database:
SELECT
ROW_NUMBER() OVER (ORDER BY Id) AS SequentialId,
Id As UniqueId,
Name
FROM dbo.Details
If you want sequential numbers don't store them in the database. That is just a maintenance nightmare, and I really can't think of a very good reason you'd even want to bother.
Just generate them dynamically using T-SQL's ROW_NUMBER function when you query the data.
The whole point of an Identity column is creating a reliable identifier that you can count on pointing to that row in the DB. If you shift them around you undermine the main reason you WANT an ID.
In a real world example, how would you feel if the IRS wanted to change your SSN every week so they could keep the Social Security Numbers sequential after people died off?

Turn two database tables into one?

I am having a bit of trouble modelling a relational database for an inventory management system. For now, it only has 3 simple tables:
Product
ID | Name | Price
Receivings
ID | Date | Quantity | Product_ID (FK)
Sales
ID | Date | Quantity | Product_ID (FK)
As Receivings and Sales are identical, I was considering a different approach:
Product
ID | Name | Price
Receivings_Sales (the name doesn't matter)
ID | Date | Quantity | Type | Product_ID (FK)
The Type column would identify whether the row is a receiving or a sale.
Can anyone help me choose the best option, pointing out the advantages and disadvantages of either approach?
The first one seems reasonable because I am thinking in an ORM way.
Thanks!
Personally I prefer the first option, that is, separate tables for Sales and Receiving.
The two biggest disadvantages of option 2, merging the two tables into one, are:
1) Inflexibility
2) Unnecessary filtering during use
First, on inflexibility. If your requirements expand (or you simply overlooked something), you will have to break up your schema or you will end up with unnormalized tables. For example, let's say your sales now need to record the sales clerk/person who handled the transaction; obviously that has nothing to do with receiving. And if you do retail or wholesale sales, how would you accommodate that in your merged table? How about discounts or promos? I am just identifying the obvious here. Now, let's go to receiving. What if we want to tie our receiving to our purchase order? Purchase order details like P.O. Number, P.O. Date, Supplier Name, etc. would not belong under Sales but relate to Receiving.
Secondly, on unnecessary filtering during use. If you have a merged table and you only want the Sales (or Receiving) portion, you have to filter out the other portion either in your back-end or your front-end program. With separate tables you just deal with one table at a time.
Additionally, you mentioned ORM; the first option fits that best, because each entity/object should be distinct from the others.
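To illustrate the flexibility point, with separate tables each side can grow its own columns over time. A rough sketch, where the extra columns are hypothetical examples taken from the scenarios above:
-- Assumes the Product (ID, Name, Price) table already exists
CREATE TABLE Sales (
    ID            int PRIMARY KEY,
    [Date]        date,
    Quantity      int,
    Product_ID    int REFERENCES Product (ID),
    SalesClerk_ID int              -- sales-only detail
);

CREATE TABLE Receivings (
    ID           int PRIMARY KEY,
    [Date]       date,
    Quantity     int,
    Product_ID   int REFERENCES Product (ID),
    PONumber     varchar(20),      -- receiving-only details
    PODate       date,
    SupplierName varchar(100)
);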
If the tables really are and always will be identical (and I have my doubts), then name the unified table something more generic, like "InventoryTransaction", and then use negative numbers for one of the transaction types: probably sales, since that would correctly mark your inventory in terms of keeping track of stock on hand.
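A sketch of that unified approach, reusing the column names from the question and the table name suggested above:
-- Assumes the Product (ID, Name, Price) table already exists
CREATE TABLE InventoryTransaction (
    ID         int PRIMARY KEY,
    [Date]     date,
    Quantity   int,              -- positive for receivings, negative for sales
    Product_ID int REFERENCES Product (ID)
);

-- Stock on hand then falls straight out of a sum
SELECT Product_ID, SUM(Quantity) AS OnHand
FROM InventoryTransaction
GROUP BY Product_ID;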
The fact that headings are the same is irrelevant. Seeking to use a single table because headings are the same is misconceived.
-- person [source] loves person [target]
LOVES(source,target)
-- person [source] hates person [target]
HATES(source,target)
Every base table has a corresponding predicate aka fill-in-the-[named-]blanks statement describing the application situation. A base table holds the rows that make a true statement.
Every query expression combines base table names via JOIN, UNION, SELECT, EXCEPT, WHERE condition, etc and has a corresponding predicate that combines base table predicates via (respectively) AND, OR, EXISTS, AND NOT, AND condition, etc. A query result holds the rows that make a true statement.
Such a set of predicate-satisfying rows is a relation. There is no other reason to put rows in a table.
(The other answers here address, as they must, proposals for and consequences of the predicate that your one table could have. But if you didn't propose the table because of its predicate, why did you propose it at all? The answer is, since not for the predicate, for no good reason.)

Query - If more than one record ID# (non primary key) matches, use later date or larger primary key

I built an MS Access database that takes survey data and creates a custom report. The survey application that was used does not give us the reports we need, so I usually grab the data (Excel), import it into Access, and build the reports the way we need them.
For the first time, we have people redoing the survey because they are updating something or they forgot to add something. I need to be able to grab the most recent survey's data so we don't get a duplicate when we run the report. (My main report is composed of several subreports. Some subreports are not visible if null, and any questions not answered are hidden and shrunk to prevent bulky reports with unnecessary whitespace.)
record ID (PK) | FName | LName | IDNum | Completed
1 | Bob | Smith | 57 | 3/31/2013 5:00pm
2 | Bob | Smith | 57 | 3/31/2013 7:00pm
I want record ID 2, the one that was completed at 7pm.
The queries and reports are already completed, so I have been trying to find a way to add a line of code in the criteria line of my query to grab the most recent record when the IDNum matches more than one record.
I have been trying to find the best way to do it for the past several hours. I don't think modifying my table into a 'table without duplicates' is an option, because after the database is complete someone less technical will be using it. All they are going to do is import a new Excel file to overwrite the table, and the queries do everything to build the report. I don't want to manually delete the duplicate records either.
I know I need to do something along the lines of
IIF(count(IDNum)>1, *something, *something)
I get stuck on the true and false parts. How do I tell Access that it needs to check within the table again to find the record with the larger primary key?
I thought this was going to be easy but I guess I was wrong.
I am fairly new to MS Access, so I know I am not using its full potential and I might be coming at this from the wrong angle. Any advice would be appreciated greatly.
I'm a student going into Info Systems, so I would really like to learn how to do this.
I believe the query you are looking for is
SELECT t1.*
FROM YourTable t1
INNER JOIN (
    SELECT IDNum, MAX(Completed) AS MaxOfCompleted
    FROM YourTable
    GROUP BY IDNum
) t2 ON t1.IDNum = t2.IDNum AND t1.Completed = t2.MaxOfCompleted;
When you are using an if function, it should be IIf, not IFF.
I'd recommend a correlated subquery, such as the following:
SELECT
Data.RecordID
, Data.FName
, Data.LName
, Data.IDNum
, Data.Completed
FROM
Data
WHERE
Data.Completed IN
(
SELECT TOP 1
DataSQ.Completed
FROM
Data as DataSQ
WHERE
DataSQ.IDNum = Data.IDNum
GROUP BY
DataSQ.Completed
ORDER BY
DataSQ.Completed DESC
)
GROUP BY
Data.RecordID
, Data.FName
, Data.LName
, Data.IDNum
, Data.Completed
;
Explanation
Instead of using a function such as Max or IIF, you can embed another SELECT query within the WHERE clause of your main query. The nested query is used to determine the most recent Completed date for every IDNum. Unlike selecting the most recent survey directly from your table with SELECT TOP 1 + ORDER BY, which would only return one record, the WHERE clause in your nested query refers back to the main query and produces a result for each IDNum. This is known as the Top N per Group pattern, and I've found it to be very useful. Note that in the nested query you will need to use a table name alias so that Access will be able to differentiate between the two queries.
Also, I'd generally recommend against trying to use a table PK to perform sorts. There are many cases when the PK order value will not be a good indicator of the values of related fields.
This code worked when tested on dummy data - best of luck!

Search across Columns and replace text

I have an Access database of information where I need to replace text that may reside in 1 of 10 columns. I have a number of different requests for find and replace that need to be done. I need to do this twice a day.
These are the details. We receive a download of data twice a day that has course information in it. A record can have 10 courses in it. Some of the courses need to be combined. For instance
Course 1 is 12345, and there are two other courses that are the same, so Course 2 (01234) and Course 3 (34566) both need to be changed to 12345. I also need to combine other courses in a similar fashion. Since I need to do this twice a day, ideally I would like to have a table with just find and replace columns, and reference it in my SQL code to pick up the changes.
An easy way to do this is the key!
Have you considered a cross-reference table, something like:
Table1
MCourse | Subcourse
12345   | 01234
12345   | 34566
Then you can do updates like
UPDATE MainTable INNER JOIN Table1
    ON MainTable.DesiredField = Table1.Subcourse
SET MainTable.DesiredField = Table1.MCourse;
Or you can create a query that uses the cross-reference table to select the desired value and make a new table from that.
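Since the code can sit in any of the ten course columns, the same update has to be repeated once per column. A rough sketch in Access SQL, where MainTable and the Course1 column name are assumptions:
UPDATE MainTable INNER JOIN Table1
    ON MainTable.Course1 = Table1.Subcourse
SET MainTable.Course1 = Table1.MCourse;
The same statement would be repeated for Course2 through Course10 (for example from a small VBA loop or a macro that runs ten saved update queries), and the whole batch run twice a day.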