How to design a SQL table where a field has many descriptions

I would like to create a product table. Each product has a unique part number. However, each part number has a varying number of previous part numbers, and can be used in a varying number of machines.
For example, the description for part no AA1007:
Previous part no's: AA1001, AA1002, AA1004, AA1005,...
Machine brand: Bosch, Indesit, Samsung, HotPoint, Sharp,...
Machine brand models: Bosch A1, Bosch A2, Bosch A3, Indesit A1, Indesit A2,....
I would like to create a table for this, but I am not sure how to proceed. What I have come up with so far is to create individual tables for Previous Part No, Machine Brand, and Machine Brand Model.
Question: what is the proper way to design these tables?

There are of course various ways to design the tables. A very basic way would be the tables below. I added the columns ValidFrom and ValidTill to identify the period during which a part was active/in use. Whether the date datatype is enough, or you need datetime for more precision, depends on your data.
CREATE TABLE Parts
(
ID bigint NOT NULL
,PartNo varchar(100)
,PartName varchar(100)
,ValidFrom date
,ValidTill date
)
CREATE TABLE Brands
(
ID bigint NOT NULL
,Brand varchar(100)
)
CREATE TABLE Models
(
ID bigint NOT NULL
,BrandsID bigint NOT NULL
,ModelName varchar(100)
)
CREATE TABLE ModelParts
(
ModelsID bigint NOT NULL
,PartID bigint NOT NULL
)
Fill your data like this (note that all five Parts rows deliberately share ID 1: they represent the same logical part under successive part numbers, distinguished by their validity windows):
INSERT INTO Parts VALUES
(1,'AA1007', 'Screw HyperFuturistic', '2017-08-09', '9999-12-31'),
(1,'AA1001', 'Screw Iron', '1800-01-01', '1918-06-30'),
(1,'AA1002', 'Screw Steel', '1918-07-01', '1945-05-08'),
(1,'AA1004', 'Screw Titanium', '1945-05-09', '1983-10-05'),
(1,'AA1005', 'Screw Futurium', '1983-10-06', '2017-08-08')
INSERT INTO Brands VALUES
(1,'Bosch'),
(2,'Indesit'),
(3,'Samsung'),
(4,'HotPoint'),
(5,'Sharp')
INSERT INTO Models VALUES
(1,1,'A1'),
(2,1,'A2'),
(3,1,'A3'),
(4,2,'A1'),
(5,2,'A2')
INSERT INTO ModelParts VALUES
(1,1)
To select all parts valid on a certain date (in this case 2013-03-03) for the "Bosch A1":
DECLARE @ReportingDate date = '2013-03-03'
SELECT B.Brand
,M.ModelName
,P.PartNo
,P.PartName
,P.ValidFrom
,P.ValidTill
FROM Brands B
INNER JOIN Models M
ON M.BrandsID = B.ID
INNER JOIN ModelParts MP
ON MP.ModelsID = M.ID
INNER JOIN Parts P
ON P.ID = MP.PartID
WHERE B.Brand = 'Bosch'
AND M.ModelName = 'A1'
AND P.ValidFrom <= @ReportingDate
AND P.ValidTill >= @ReportingDate
Of course there are several ways to handle historization of data. ValidFrom and ValidTill (ValidTo) is one of my favourites, as it makes historical reports easy.
Unfortunately you have to manage the historization yourself: when inserting a new row (for your screw, for example) you have to "close" the old record by setting its ValidTill column before inserting the new one. Furthermore you have to develop logic to handle deletes...
Well, that's quite a large topic, and you will find plenty of information on the web.
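As a sanity check, the schema and query above can be run almost verbatim on SQLite from Python. This is just a sketch using the answer's sample data; since all five Parts rows share ID 1, the validity window is what selects exactly one of them for the reporting date:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Parts (ID INTEGER NOT NULL, PartNo TEXT, PartName TEXT,
                    ValidFrom TEXT, ValidTill TEXT);
CREATE TABLE Brands (ID INTEGER NOT NULL, Brand TEXT);
CREATE TABLE Models (ID INTEGER NOT NULL, BrandsID INTEGER NOT NULL, ModelName TEXT);
CREATE TABLE ModelParts (ModelsID INTEGER NOT NULL, PartID INTEGER NOT NULL);
""")
cur.executemany("INSERT INTO Parts VALUES (?,?,?,?,?)", [
    (1, 'AA1007', 'Screw HyperFuturistic', '2017-08-09', '9999-12-31'),
    (1, 'AA1001', 'Screw Iron',            '1800-01-01', '1918-06-30'),
    (1, 'AA1002', 'Screw Steel',           '1918-07-01', '1945-05-08'),
    (1, 'AA1004', 'Screw Titanium',        '1945-05-09', '1983-10-05'),
    (1, 'AA1005', 'Screw Futurium',        '1983-10-06', '2017-08-08'),
])
cur.executemany("INSERT INTO Brands VALUES (?,?)",
                [(1, 'Bosch'), (2, 'Indesit'), (3, 'Samsung'), (4, 'HotPoint'), (5, 'Sharp')])
cur.executemany("INSERT INTO Models VALUES (?,?,?)",
                [(1, 1, 'A1'), (2, 1, 'A2'), (3, 1, 'A3'), (4, 2, 'A1'), (5, 2, 'A2')])
cur.execute("INSERT INTO ModelParts VALUES (1, 1)")

# ISO-8601 date strings compare correctly as text, so the range filter
# behaves the same way it does with the date type in SQL Server.
row = cur.execute("""
    SELECT B.Brand, M.ModelName, P.PartNo, P.PartName
    FROM Brands B
    JOIN Models M      ON M.BrandsID = B.ID
    JOIN ModelParts MP ON MP.ModelsID = M.ID
    JOIN Parts P       ON P.ID = MP.PartID
    WHERE B.Brand = 'Bosch' AND M.ModelName = 'A1'
      AND P.ValidFrom <= ? AND P.ValidTill >= ?
""", ('2013-03-03', '2013-03-03')).fetchone()
print(row)  # → ('Bosch', 'A1', 'AA1005', 'Screw Futurium')
```

Only the part number valid on the reporting date comes back, even though the Bosch A1 is linked to the part only once.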

For the part number table, you can consider the following suggestion:
id | part_no | time_created
1 | AA1007 | 2017-08-08
1 | AA1001 | 2017-07-01
1 | AA1002 | 2017-06-10
1 | AA1004 | 2017-03-15
1 | AA1005 | 2017-01-30
In other words, you can add a datetime column which versions each part number. Note that I added an id column here which is invariant over time and keeps track of each part even though the part number may change (since the id repeats across versions, the key would effectively be the combination of id and time_created).
For time-independent queries, you would join this table using the id column. However, the part number might also serve as a foreign key. Off the top of my head, if you were generating an invoice from a previous date, you might look up the part number appropriate at that time, and then join out to one or more tables using that part number.
For the other tables you mentioned, I do not see a similar requirement.
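The "look up the appropriate part number at that time" step can be sketched like this on SQLite, using the sample rows above (the helper name part_no_as_of is invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE part_numbers (id INTEGER, part_no TEXT, time_created TEXT)")
cur.executemany("INSERT INTO part_numbers VALUES (?,?,?)", [
    (1, 'AA1007', '2017-08-08'),
    (1, 'AA1001', '2017-07-01'),
    (1, 'AA1002', '2017-06-10'),
    (1, 'AA1004', '2017-03-15'),
    (1, 'AA1005', '2017-01-30'),
])

def part_no_as_of(cur, part_id, as_of):
    """Latest part number created on or before as_of for this logical part."""
    row = cur.execute(
        """SELECT part_no FROM part_numbers
           WHERE id = ? AND time_created <= ?
           ORDER BY time_created DESC LIMIT 1""",
        (part_id, as_of)).fetchone()
    return row[0] if row else None

print(part_no_as_of(cur, 1, '2017-06-15'))  # → AA1002
```

The same ORDER BY ... DESC LIMIT 1 (or TOP 1 in T-SQL) pattern works for an invoice dated in the past.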

Related

Replace specific part in my data

I would like to know the most efficient and secure way to replace some numbers. In my table I have two columns: Nummer and Vater. In the Nummer column I store article numbers. The one with .1 at the end is the 'main' article and the rest are its combinations (sometimes a main article has no combinations); together they make up a concrete product with its combinations. Numbers always consist of 3 parts separated by dots. The Vater for all of them is always the main article number, as shown below:
Example 1:
Nummer | Vater
-------------------------------
003.10TT032.1 | 003.10TT032.1
003.10TT032.2L | 003.10TT032.1
003.10TT032.UY | 003.10TT032.1
Nummer column = varchar
Vater column = varchar
I want to be able to change the first two parts (n.n).
For example, I want to send a SQL query saying that they should be replaced with 9.4R53. Based on our example, the final results should then be as follows:
Nummer | Vater
----------------------
9.4R53.1 | 9.4R53.1
9.4R53.2L | 9.4R53.1
9.4R53.UY | 9.4R53.1
Example 2:
Current:
Nummer | Vater
-------------------------------
12.90D.1 | 12.90D.1
12.90D.089 | 12.90D.1
12.90D.2 | 12.90D.1
Replace to: 829.12
Result should be:
Nummer | Vater
-------------------------------
829.12.1 | 829.12.1
829.12.089 | 829.12.1
829.12.2 | 829.12.1
I made queries as follows:
Example 1 query:
update temp SET Nummer = replace(Nummer, '003.10TT032.', '9.4R53.'),
Vater = replace(Vater, '003.10TT032.1', '9.4R53.1')
WHERE Vater = '003.10TT032.1'
Example 2 query:
update temp SET Nummer = replace(Nummer, '12.90D.', '829.12.'),
Vater = replace(Vater, '12.90D.1', '829.12.1')
WHERE Vater = '12.90D.1 '
In my database I have thousands of records, so I want to be sure this query is fine and contains nothing that could potentially produce wrong results. Please advise whether it can stay like this or not.
Therefore, my questions:
Is this query fine given how my articles are stored? (I want to avoid wrong replacements, which could make a mess of the production database.)
Is there a better solution?
To answer your questions: yes, your solution works, and yes, there is something you can do to make it bullet-proof. Making it bullet-proof and reversible is what I suggest below. You will sleep better if you know you can:
A. Answer any question from angry people who ask you "what did you do to my product table".
B. Reverse any change you've made to this table (without restoring a backup), including other people's mistakes (like wrong instructions).
So if you really have to be 100% confident of the output, I would not run it in one go. I suggest preparing the queries in a separate table, then running them in a loop with dynamic SQL.
It is a little cumbersome, but you can do it like this: create a dedicated table with the columns you need (batch_id, insert_date, etc.) and a column named execute_query NVARCHAR(MAX).
Then load that table by running a SELECT DISTINCT of the section you need to replace in your source table (using CHARINDEX to locate the second dot: start the CHARINDEX search from the position of the first dot + 1).
In other words: you prepare all the queries (like the ones in your examples) one by one and store them in a table.
If you want to be totally safe, the update queries can include a WHERE source_table_id BETWEEN n AND n' (which you build with a GROUP BY on the source table). This ensures you can track which records you have updated if you have to answer questions later.
Once this is done, you run a loop which executes each line one by one.
The advantage of this approach is that you keep track of your changes; you can also prepare the rollback query at the same time as the update query. Then you know you can safely revert any change you have ever made to your product table.
Never truncate that table; it is your audit table. If someone asks what you did to the product catalogue, you can answer any question, even 5 years from now.
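A rough sketch of the prepare-then-execute idea, on SQLite from Python instead of dynamic T-SQL. The table and column names (query_batch, execute_query, rollback_query) are invented for the demo; the point is that the update and its rollback are stored before anything runs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE temp (Nummer TEXT, Vater TEXT);
-- hypothetical audit table: every prepared statement and its inverse
CREATE TABLE query_batch (batch_id INTEGER, execute_query TEXT, rollback_query TEXT);
""")
cur.executemany("INSERT INTO temp VALUES (?,?)", [
    ('003.10TT032.1',  '003.10TT032.1'),
    ('003.10TT032.2L', '003.10TT032.1'),
    ('003.10TT032.UY', '003.10TT032.1'),
])

old, new = '003.10TT032.', '9.4R53.'
update_sql = ("UPDATE temp SET Nummer = REPLACE(Nummer, '%s', '%s'), "
              "Vater = REPLACE(Vater, '%s', '%s') WHERE Vater = '%s1'"
              ) % (old, new, old, new, old)
# the rollback is the same statement with old/new swapped
rollback_sql = ("UPDATE temp SET Nummer = REPLACE(Nummer, '%s', '%s'), "
                "Vater = REPLACE(Vater, '%s', '%s') WHERE Vater = '%s1'"
                ) % (new, old, new, old, new)
cur.execute("INSERT INTO query_batch VALUES (1, ?, ?)", (update_sql, rollback_sql))

# the "loop": execute every prepared statement, keeping the audit rows
for (stmt,) in cur.execute("SELECT execute_query FROM query_batch "
                           "ORDER BY batch_id").fetchall():
    cur.execute(stmt)
print(cur.execute("SELECT Nummer FROM temp ORDER BY Nummer").fetchall())
```

Executing the stored rollback_query restores the original rows, which is exactly the "reversible without a backup" property the answer is after.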
This is a separate answer to show how to split the product ID into separate sections. If you have to update sections of the product ID, I think it is better to store them in separate columns:
DECLARE @ProductRef TABLE
(ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
SrcNummer VARCHAR(255), DisplayNummer VARCHAR(255), SrcVater VARCHAR(255), DisplayVater VARCHAR(255),
NummerSectionA VARCHAR(255), NummerSectionB VARCHAR(255), NummerSectionC VARCHAR(255),
VaterSectionA VARCHAR(255), VaterSectionB VARCHAR(255), VaterSectionC VARCHAR(255) )
INSERT INTO @ProductRef (SrcNummer, SrcVater) VALUES ('003.10TT032.1','003.10TT032.1')
INSERT INTO @ProductRef (SrcNummer, SrcVater) VALUES ('003.10TT032.2L','003.10TT032.1')
INSERT INTO @ProductRef (SrcNummer, SrcVater) VALUES ('003.10TT032.UY','003.10TT032.1')
DECLARE @Separator CHAR(1)
SET @Separator = '.'
;WITH SeparatorPosition (ID, SrcNummer, NumFirstSeparator, NumSecondSeparator, SrcVater, VatFirstSeparator, VatSecondSeparator)
AS ( SELECT
ID,
SrcNummer,
CHARINDEX(@Separator,SrcNummer,0) AS NumFirstSeparator,
CHARINDEX(@Separator,SrcNummer, (CHARINDEX(@Separator,SrcNummer,0))+1 ) AS NumSecondSeparator,
SrcVater,
CHARINDEX(@Separator,SrcVater,0) AS VatFirstSeparator,
CHARINDEX(@Separator,SrcVater, (CHARINDEX(@Separator,SrcVater,0))+1 ) AS VatSecondSeparator
FROM @ProductRef )
UPDATE T
SET
NummerSectionA = SUB.NummerSectionA , NummerSectionB = SUB.NummerSectionB , NummerSectionC = SUB.NummerSectionC ,
VaterSectionA = SUB.VaterSectionA , VaterSectionB = SUB.VaterSectionB , VaterSectionC = SUB.VaterSectionC
FROM @ProductRef T
JOIN
(
SELECT
t.ID,
t.SrcNummer,
SUBSTRING (t.SrcNummer,0,s.NumFirstSeparator) AS NummerSectionA,
SUBSTRING (t.SrcNummer,s.NumFirstSeparator+1,(s.NumSecondSeparator-s.NumFirstSeparator-1) ) AS NummerSectionB,
RIGHT (t.SrcNummer,(LEN(t.SrcNummer)-s.NumSecondSeparator)) AS NummerSectionC,
t.SrcVater,
SUBSTRING (t.SrcVater,0,s.VatFirstSeparator) AS VaterSectionA,
SUBSTRING (t.SrcVater,s.VatFirstSeparator+1,(s.VatSecondSeparator-s.VatFirstSeparator-1) ) AS VaterSectionB,
RIGHT (t.SrcVater,(LEN(t.SrcVater)-s.VatSecondSeparator)) AS VaterSectionC
FROM @ProductRef t
JOIN SeparatorPosition s
ON t.ID = s.ID
) SUB
ON T.ID = SUB.ID
Then you only work on the correct product ID section.
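The CHARINDEX arithmetic in the CTE can be hard to follow, so here is the same first-dot/second-dot logic sketched in Python (str.index plays the role of CHARINDEX, and the function name is made up for the demo):

```python
def split_sections(product_id, sep='.'):
    # first and second separator positions, like the two CHARINDEX calls
    first = product_id.index(sep)
    second = product_id.index(sep, first + 1)
    # SectionA = text before the first dot, SectionB = between the dots,
    # SectionC = everything after the second dot
    return product_id[:first], product_id[first + 1:second], product_id[second + 1:]

print(split_sections('003.10TT032.2L'))  # → ('003', '10TT032', '2L')
```

Once the sections live in separate columns, "replace the first two parts" is a plain UPDATE of SectionA and SectionB, with no string surgery at all.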

Get all missing values between two limits in SQL table column

I am trying to assign ID numbers to records that are being inserted into an SQL Server 2005 database table. Since these records can be deleted, I would like these records to be assigned the first available ID in the table. For example, if I have the table below, I would like the next record to be entered at ID 4 as it is the first available.
| ID | Data |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 5 | ... |
The way that I would prefer this to be done is to build up a list of available ID's via an SQL query. From there, I can do all the checks within the code of my application.
So, in summary, I would like an SQL query that retrieves all available ID's between 1 and 99999 from a specific table column.
First build a table of all N IDs.
declare @allPossibleIds table (id integer)
declare @currentId integer
select @currentId = 1
while @currentId < 1000000
begin
insert into @allPossibleIds
select @currentId
select @currentId = @currentId+1
end
Then, left join that table to your real table. You can select MIN if you want, or you could limit allPossibleIds to be less than the max id in your table.
select a.id
from @allPossibleIds a
left outer join YourTable t
on a.id = t.Id
where t.id is null
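The same left-join trick can be checked on SQLite; since SQLite has no WHILE loop, a recursive CTE stands in for the @allPossibleIds build-up (the upper bound 10 is just for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE YourTable (Id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO YourTable VALUES (?)", [(1,), (2,), (3,), (5,)])

# generate candidate ids 1..10, then keep those with no match in YourTable
gaps = [r[0] for r in cur.execute("""
    WITH RECURSIVE all_ids(id) AS (
        SELECT 1 UNION ALL SELECT id + 1 FROM all_ids WHERE id < 10
    )
    SELECT a.id
    FROM all_ids a
    LEFT JOIN YourTable t ON a.id = t.Id
    WHERE t.Id IS NULL
    ORDER BY a.id
""")]
print(gaps)  # → [4, 6, 7, 8, 9, 10]
```

The first element is the answer to "first available ID"; taking MIN in SQL gives the same result without materializing the whole list.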
Don't go for identity.
Let me give you an easy option while I work on a proper one:
Store the integers from 1-999999 in a table, say Insert_sequence.
Then write an SP for the insertion: you can easily identify the minimum value that is present in Insert_sequence but not in your main table, store this value in a variable, and insert the row with the ID taken from that variable.
You could also loop through the keys, and when you hit a free one, select it and exit the loop.
DECLARE @intStart INT, @loop bit
SET @intStart = 1
SET @loop = 1
WHILE (@loop = 1)
BEGIN
IF NOT EXISTS(SELECT [Key] FROM [Table] WHERE [Key] = @intStart)
BEGIN
SELECT @intStart as 'FreeKey'
SET @loop = 0
END
SET @intStart = @intStart + 1
END
GO
From there you can use the key as you please. Adding an @intStop variable to limit the loop would be no problem.
Why do you need a table from 1..999999? All the information you need is in your source table. Here is a query which gives you the minimal ID to insert into a gap.
It works for all combinations:
(2,3,4,5) - > 1
(1,2,3,5) - > 4
(1,2,3,4) - > 5
select min(t1.id)+1 from
(
select id from t
union
select 0
)
t1
left join t as t2 on t1.id=t2.id-1
where t2.id is null
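A quick check of the min(id)+1 query against the three combinations listed above, run on SQLite (the wrapper function name is made up for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (id INTEGER)")

def min_free_id(cur):
    # union in 0 so the query also works when id 1 itself is free,
    # then find the smallest id whose successor is missing
    return cur.execute("""
        SELECT MIN(t1.id) + 1 FROM
          (SELECT id FROM t UNION SELECT 0) t1
        LEFT JOIN t AS t2 ON t1.id = t2.id - 1
        WHERE t2.id IS NULL
    """).fetchone()[0]

for ids, expected in [((2, 3, 4, 5), 1), ((1, 2, 3, 5), 4), ((1, 2, 3, 4), 5)]:
    cur.execute("DELETE FROM t")
    cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in ids])
    assert min_free_id(cur) == expected
print("all three cases match")
```

The UNION SELECT 0 row is what makes the (2,3,4,5) → 1 case work: it gives the query a row whose successor (id 1) is missing.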
Many people use an auto-incrementing integer or long value for the primary key of their tables, and it is often called ID or MyEntityID or something similar. This column, being just an auto-incrementing integer, often has nothing to do with the data being stored itself.
These types of "primary keys" are called surrogate keys. They have no meaning. Many people like these IDs to be sequential because it is "aesthetically pleasing", but this is a waste of time and resources. The database couldn't care less about which IDs are in use and which are not.
I would highly suggest you forget trying to do this and just leave the ID column to auto-increment. You should also create an index on your table made up of those (subset of) columns that uniquely identify each record (and even consider using this index as your primary key index). In rare cases where you would need all columns to accomplish that, an auto-incrementing primary key ID is extremely useful, because it may not be performant to create an index over all columns in the table. Even so, the database engine couldn't care less about this ID (e.g. which ones are in use and which are not).
Also consider that a signed 32-bit integer ID allows roughly 2.1 billion positive values (about 4.2 billion across the full range). It is quite unlikely that you'll exhaust the supply of integer IDs in any short amount of time, which further bolsters the argument that this sort of thing is a waste of time and resources.

What is wrong with this SQL statement for inserting concatenated IDS from a table into a field?

INSERT INTO Activity_Feed (userID,Type,DataIDs,PodID)
VALUES ( 1437
, 'eventattend'
, (SELECT LEFT(EventID, LEN(eventID) - 1 as nvarchar)
FROM (
SELECT EventiD + ', '
FROM events
FOR XML PATH ('')) c (EventID))
, 5)
Basically I want to take a bunch of IDs from a table and insert them as a comma delimited string into a varchar field.
E.g.
Activity_Feed (table)
activityID 1
userID 2
DataIDs 1,3,4,56,367 // This would be the (Select Ids FROM bit)
I want to take a bunch of RSVP IDs from a table and stick their IDs in the field...
To explain further: I wanted to avoid normalizing because of the nature of this query. Let me know if I should still separate out the data...
The activity feed works like this:
I create a new entry in activity with a type of event_attending (which is an event I am attending).
I timestamp it and enter the ID for the event in the dataIDs field. Any new activity matching event_attending within a 6-hour period fires an update rather than an insert:
I keep the timestamp the same but update the IDs associated with that time period, so the IDs reflect the latest event attendance within that window.
I thought normalizing seemed like overkill :D Is there such a thing as overkill with normalization?
Always normalize your database.
This is not strictly wrong, but it is very poor database design.
Reasons why it is poor:
it makes tables hard to join
it makes values hard to find
Why not create tables like this:
Activity
ActivityID
ActivityName
Feed
FeedID
FeedName
Activity_Feed
FeedID
ActivityID
so Activity_Feed table will contain something like this
FeedID ActivityID
=====================
1 1
1 2
1 3
2 1
2 3
3 2
3 1
and you can now join the tables,
SELECT a.ActivityName, c.FeedName
FROM Activity a
INNER JOIN Activity_Feed b
ON a.ActivityID = b.ActivityID
INNER JOIN Feed c
ON b.FeedID = c.FeedID
-- WHERE a.ActivityName = 1 -- or something like this
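The junction-table design above can be exercised like this on SQLite; the activity and feed names are invented for the demo, and the Activity_Feed rows match the answer's sample pairs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Activity (ActivityID INTEGER PRIMARY KEY, ActivityName TEXT);
CREATE TABLE Feed (FeedID INTEGER PRIMARY KEY, FeedName TEXT);
CREATE TABLE Activity_Feed (FeedID INTEGER, ActivityID INTEGER);
""")
cur.executemany("INSERT INTO Activity VALUES (?,?)",
                [(1, 'attend'), (2, 'host'), (3, 'share')])
cur.executemany("INSERT INTO Feed VALUES (?,?)",
                [(1, 'main'), (2, 'events'), (3, 'social')])
# the junction rows from the answer's example table
cur.executemany("INSERT INTO Activity_Feed VALUES (?,?)",
                [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 2), (3, 1)])

rows = cur.execute("""
    SELECT a.ActivityName, f.FeedName
    FROM Activity a
    JOIN Activity_Feed af ON a.ActivityID = af.ActivityID
    JOIN Feed f           ON af.FeedID = f.FeedID
    WHERE f.FeedName = 'main'
    ORDER BY a.ActivityName
""").fetchall()
print(rows)  # → [('attend', 'main'), ('host', 'main'), ('share', 'main')]
```

Compare this with parsing a comma-delimited DataIDs string: here "which activities are in feed 1" is a plain join, no string splitting required.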

Table Variables: Is There a Cleaner Way?

I have a table that stores various clients I have done work for, separated by Government and Commercial columns. The problem is that there could be an uneven number of clients in either column. When I do a SELECT, I end up with NULL values in irregular places because I can't think of a clean way to order the result set. For example, the following is possible with a straight SELECT (no ORDER BY):
Government | Commercial
DOD | IBM
DOS | Microsoft
| Novell
DVA | Oracle
As you can see, there is a NULL value in the Government column because there are more commercial clients than government. This could change at any time and there's no guarantee which column will have more values. To eliminate rendering with a blank value in the middle of a result set, I decided to perform two separate SELECTs into table variables (one for the Government clients and another for the Commercial) and then SELECT one final time, joining them back together:
DECLARE #Government TABLE
(
Row int,
PortfolioHistoryId uniqueidentifier,
Government varchar(40),
GovernmentPortfolioContentId uniqueidentifier
)
DECLARE #Commercial TABLE
(
Row int,
PortfolioHistoryId uniqueidentifier,
Commercial varchar(40),
CommercialPortfolioContentId uniqueidentifier
)
INSERT INTO #Government
SELECT
(ROW_NUMBER() OVER (ORDER BY Government)) AS Row,
PortfolioHistoryId,
Government,
GovernmentPortfolioContentId
FROM dbo.PortfolioHistory
WHERE Government IS NOT NULL
INSERT INTO #Commercial
SELECT
(ROW_NUMBER() OVER (ORDER BY Commercial)) AS Row,
PortfolioHistoryId,
Commercial,
CommercialPortfolioContentId
FROM dbo.PortfolioHistory
WHERE Commercial IS NOT NULL
SELECT
g.Government,
c.Commercial,
g.GovernmentPortfolioContentId,
c.CommercialPortfolioContentId
FROM #Government AS g
FULL OUTER JOIN #Commercial AS c ON c.Row = g.Row
I'm not necessarily unhappy with this query (maybe I should be), but is there a cleaner way to implement this?
From a design point of view, I do not see why you need two tables, or even the two columns Government/Commercial. You could just have a single Customer table with a classifier column for OrganizationType. For example:
CREATE TABLE Customer (
ID int
,Name varchar(50)
,OrganizationType char(1)
,Phone varchar(12)
)
For OrganizationType you could use: G=gov, B=business/commercial, C=charity. When querying the table use OrganizationType in ORDER BY and GROUP BY.
If there are some specific columns for gov and business clients, then keep all common columns in the Customer table and move specific columns in separate sub-type Government and Commercial tables as in this example. In the example, book and magazine are types of publication -- in your example government and commercial are types of customer.
You could have a governID which connects the Government table to the Commercial table. When inserting values into the Commercial table, you must also insert the governID of the government row you want the commercial row to belong to:
Government              Commercial
governID | Name         commID | com_govID | Name
1        | BIR          1      | 1         | Netopia
                        2      | 1         | SM Mall
So if you query it:
SELECT
g.Name AS Government,
c.Name AS Commercial
FROM Government AS g
INNER JOIN Commercial AS c
ON c.com_govID = g.governID
SELECT
Government,
Commercial,
GovernmentPortfolioContentId,
CommercialPortfolioContentId
FROM dbo.PortfolioHistory
ORDER BY
CASE WHEN Government is null THEN 2 ELSE 1 END,
CASE WHEN Commercial is null THEN 2 ELSE 1 END,
Government,
Commercial
Aside: in your table variables, the Row column should be declared as PRIMARY KEY to gain the advantages of clustered indexing.
Generally speaking, rendering issues should be handled in the application layer. Do a straight SELECT, then process it into the format you want.
So:
SELECT DISTINCT client, client_type FROM clients;
Then in your application layer (I'll use quick and dirty PHP):
while($row = $result->fetch_assoc()) {
    if($row['client_type']=='gov') {
        $gov[] = $row['client'];
    } else {
        $com[] = $row['client'];
    }
}
$limit = (count($gov) > count($com)) ? count($gov) : count($com);
echo '<table><tr><th>Gov</th><th>Com</th></tr>';
for($i = 0; $i < $limit; $i++) {
    echo "<tr><td>{$gov[$i]}</td><td>{$com[$i]}</td></tr>\n";
}
echo '</table>';

Tricky SQL statement over 3 tables

I have 3 different transaction tables, which look very similar but have slight differences. This comes from the fact that there are 3 different transaction types; the columns change depending on the transaction type, so to get them into 3NF I need to keep them in separate tables (right?).
As an example:
t1:
date,user,amount
t2:
date,user,who,amount
t3:
date,user,what,amount
Now I need a query that gets me all transactions in each table for the same user, something like
select * from t1,t2,t3 where user='me';
(which of course doesn't work).
I am studying JOIN statements but haven't figured out the right way to do this. Thanks.
EDIT: Actually I then need all of the columns from every table, not just the ones they share.
EDIT #2: Yeah, having a transaction_type column doesn't break 3NF, of course; so maybe my design is utterly wrong. Here is what really happens (it's an alternative currency system):
- Transactions are between users, like mutual credit. So units get swapped between users.
- Inventarizations are physical stuff brought into the system; a user gets units for this.
- Consumations are physical stuff consumed; a user has to pay units for this.
|--------------------------------------------------------------------------|
| type | transactions | inventarizations | consumations |
|--------------------------------------------------------------------------|
| columns | date | date | date |
| | creditor(FK user) | creditor(FK user) | |
| | debitor(FK user) | | debitor(FK user) |
| | service(FK service)| | |
| | | asset(FK asset) | asset(FK asset) |
| | amount | amount | amount |
| | | | price |
|--------------------------------------------------------------------------|
(Note that 'amount' is in different units; these are the entries, and calculations are made on those amounts. It is outside the scope to explain why, but these are the fields.) So the question changes to: "Can/should this be in one table, or in multiple tables (as I have it now)?"
I need the previously described SQL statement to display running balances.
(Should this now become a new question altogether, or is it OK to EDIT?)
EDIT #3: As EDIT #2 actually transforms this to a new question, I also decided to post a new question. (I hope this is ok?).
You can supply defaults as constants in the select statements for columns where you have no data, so:
SELECT Date, User, Amount, 'NotApplicable' as Who, 'NotApplicable' as What from t1 where user = 'me'
UNION
SELECT Date, User, Amount, Who, 'NotApplicable' from t2 where user = 'me'
UNION
SELECT Date, User, Amount, 'NotApplicable', What from t3 where user = 'me'
which assumes that Who and What are string-typed columns. You could use NULL as well, but some kind of placeholder is needed. (Consider UNION ALL instead of UNION if two identical transactions can legitimately occur, since UNION removes duplicate rows.)
I think that placing your additional information in a separate table and keeping all transactions in a single table will work better for you though, unless there is some other detail I've missed.
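The placeholder-column UNION can be verified on SQLite with one throwaway row per table (the dates, counterparties, and amounts are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE t1 (date TEXT, user TEXT, amount REAL);
CREATE TABLE t2 (date TEXT, user TEXT, who TEXT, amount REAL);
CREATE TABLE t3 (date TEXT, user TEXT, what TEXT, amount REAL);
INSERT INTO t1 VALUES ('2024-01-01', 'me', 10);
INSERT INTO t2 VALUES ('2024-01-02', 'me', 'alice', 20);
INSERT INTO t3 VALUES ('2024-01-03', 'me', 'apples', 30);
""")
# each branch pads the columns it lacks with a constant placeholder,
# so all three SELECTs have the same shape and can be UNIONed
rows = cur.execute("""
    SELECT date, user, amount, 'NotApplicable' AS who, 'NotApplicable' AS what
      FROM t1 WHERE user = 'me'
    UNION ALL
    SELECT date, user, amount, who, 'NotApplicable' FROM t2 WHERE user = 'me'
    UNION ALL
    SELECT date, user, amount, 'NotApplicable', what FROM t3 WHERE user = 'me'
    ORDER BY date
""").fetchall()
for r in rows:
    print(r)
```

The result is one row per transaction, ordered by date across all three source tables, which is exactly the shape needed for a running balance.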
I think the meat of your question is here:
depending on the transaction types the columns change, so to get them in 3NF I need to have them in separate tables (right?).
I'm no 3NF expert, but I would approach your schema a little differently (which might clear up your SQL a bit).
It looks like your data elements are these: date, user, amount, who, and what. With that in mind, a more normalized schema might look something like this:
User
----
id, user info (username, etc)
Who
---
id, who info
What
----
id, what info
Transaction
-----------
id, date, amount, user_id, who_id, what_id
Your foreign key constraint verbiage will vary based on database implementation, but this is a little clearer (and extendable).
You should consider an STI "architecture" (single table inheritance), i.e. put all the different columns into one table under one primary key.
In addition you may want to add indexes to the other columns you select on.
What is the result schema going to look like? If you only want the minimal set of columns present in all 3 tables, then it's easy: you would just UNION the results:
SELECT Date, User, Amount from t1 where user = 'me'
UNION
SELECT Date, User, Amount from t2 where user = 'me'
UNION
SELECT Date, User, Amount from t3 where user = 'me'
Or you could 'SubClass' them
Create Table Transaction
(
TransactionId Integer Primary Key Not Null,
TransactionDateTime DateTime Not Null,
TransactionType Integer Not Null
-- Other columns all transactions share
)
Create Table Type1Transactions
(
TransactionId Integer Primary Key Not Null
-- Type 1 specific columns
)
ALTER TABLE Type1Transactions WITH CHECK ADD CONSTRAINT
[FK_Type1Transaction_Transaction] FOREIGN KEY([TransactionId])
REFERENCES [Transaction] ([TransactionId])
Repeat for other types of transactions...
What about simply leaving the unnecessary columns null and adding a TransactionType column? This would result in a simple SELECT statement.
select *
from (
select user from t1
union
select user from t2
union
select user from t3
) u
left outer join t1 on u.user=t1.user
left outer join t2 on u.user=t2.user
left outer join t3 on u.user=t3.user
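Run on sample data, the union-of-users plus three LEFT JOINs produces one wide row per user. This sketch uses invented rows; note the caveat in the comment, which is worth knowing before using this on real data:

```python
import sqlite3

# Caveat: if a user has several rows in one source table, the three LEFT
# JOINs fan out (every t1 row pairs with every t2 and t3 row for that user).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE t1 (date TEXT, user TEXT, amount REAL);
CREATE TABLE t2 (date TEXT, user TEXT, who TEXT, amount REAL);
CREATE TABLE t3 (date TEXT, user TEXT, what TEXT, amount REAL);
INSERT INTO t1 VALUES ('2024-01-01', 'me', 10);
INSERT INTO t2 VALUES ('2024-01-02', 'me', 'alice', 20);
INSERT INTO t3 VALUES ('2024-01-03', 'me', 'apples', 30);
""")
rows = cur.execute("""
    SELECT *
    FROM (SELECT user FROM t1
          UNION
          SELECT user FROM t2
          UNION
          SELECT user FROM t3) u
    LEFT OUTER JOIN t1 ON u.user = t1.user
    LEFT OUTER JOIN t2 ON u.user = t2.user
    LEFT OUTER JOIN t3 ON u.user = t3.user
""").fetchall()
print(rows)  # one wide row for 'me', combining all three tables' columns
```

With exactly one transaction per table this yields a single combined row; with many transactions per user, the UNION-of-SELECTs approach earlier in the thread is usually the better fit.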