SQL Server dynamic sequences - sql

Here is a theoretical scenario,
Suppose I have a client table and an invoice table.
1 client can have many invoices.
Now I want each invoice to have an invoice number that is unique to that client
i.e.
ClientId | InvoiceNo
--------------------
1 | IN0001
2 | IN0001
2 | IN0002
2 | IN0003
3 | IN0001
Currently I am controlling this in my application by looking at max values etc but this is obviously not a great solution. I would much rather get my database to do this for me, as it should remove the risk of creating duplicate invoice numbers for a single client (race condition?)
I have been reading up on SQL Server 2012's SEQUENCE, which sounds great, but the problem is that I would still need a separate sequence per client
i.e.
CREATE SEQUENCE InvoiceNum_Client1.....
CREATE SEQUENCE InvoiceNum_Client2.....
CREATE SEQUENCE InvoiceNum_Client3.....
but something feels very dirty about making db meta changes from my app every time a new client is created. Also, my trigger would then have to do something like this (which I wouldn't even begin to know how to do)
NEXT VALUE FOR InvoiceNum_Client + @ClientId
etc
So my next thought was to have a "sequence" table, i.e.
ClientID | INSequenceNumber
---------------------------
1 | 1
2 | 3
3 | 3
And in my trigger grab the InSequenceNumber for a given client, use it to make my invoiceNumber, and then update the sequence table, incrementing InSequenceNumber by 1 for the same client. It should have the same effect, but I am just not sure about the inner workings of transactions and scope etc
So my questions are
Is there any big disadvantage to my self rolled sequence method?
Is there another solution that I am perhaps overlooking?
Thanks!

Is there a specific reason for requiring that the invoice numbers are only unique per client? In most ERP systems, invoice numbers are typically globally unique, making implementation much easier. No matter what, you should have an Invoice table that contains a primary key (and you shouldn't use composite primary keys - that's just downright bad data modelling).
This leaves us with a scenario where you might not need to store the "per-client-invoice-number" in the database at all. Assuming you have a table called "Invoices" containing the following data:
Id | ClientId
---------------
1 | 1
2 | 1
3 | 2
4 | 1
5 | 3
6 | 2
Here, Id is the Primary Key of the Invoices table, and ClientId is a foreign key. A query like this:
SELECT
ClientId,
'IN' + RIGHT('0000' +
CONVERT(VARCHAR(10), ROW_NUMBER() OVER (PARTITION BY ClientId
ORDER BY Id)), 4) AS InvoiceNo,
Id
FROM Invoices
ORDER BY ClientId, InvoiceNo
Would return:
ClientId | InvoiceNo | Id
---------------------------
1 | IN0001 | 1
1 | IN0002 | 2
1 | IN0003 | 4
2 | IN0001 | 3
2 | IN0002 | 6
3 | IN0001 | 5
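To try the idea without a SQL Server instance, here is a minimal sketch that runs the same numbering against an in-memory SQLite database from Python. It is an illustration under assumptions, not the answer's exact query: SQLite (3.25 or later, for window functions) uses || and substr where T-SQL uses + and RIGHT, and the helper name per_client_invoice_numbers is made up for this example.

```python
import sqlite3


def make_db():
    """Build the sample Invoices table from the answer in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Invoices (Id INTEGER PRIMARY KEY, ClientId INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Invoices (Id, ClientId) VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 2), (4, 1), (5, 3), (6, 2)],
    )
    conn.commit()
    return conn


def per_client_invoice_numbers(conn):
    """Return (ClientId, InvoiceNo, Id) rows, numbering each client's invoices by Id."""
    return conn.execute(
        """
        SELECT ClientId,
               -- substr(x, -4) keeps the last four characters, like RIGHT(x, 4)
               'IN' || substr('0000' || ROW_NUMBER() OVER (
                   PARTITION BY ClientId ORDER BY Id), -4) AS InvoiceNo,
               Id
        FROM Invoices
        ORDER BY ClientId, InvoiceNo
        """
    ).fetchall()


if __name__ == "__main__":
    for row in per_client_invoice_numbers(make_db()):
        print(row)
```

Because the number is derived at query time, nothing has to be kept in sync on insert. The trade-off is that deleting an invoice renumbers the later ones for that client.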

Why do the clients have to have their own sequences? Have a global sequence number. Then, if you want to get the client sequences in order, use row_number():
select i.*, row_number() over (partition by clientid order by invoiceno) as ClientInvoiceSequence
from invoices i;
Note: you might want the order by to be by another field such as date.
If you start storing this information in the database, you will need to do a lot of bookkeeping and careful transaction management:
What happens when two invoices are entered "at the same time"?
What happens when an invoice is deleted?
What happens when an invoice is modified in such a way that it might change the sequence?
You are much better off using an identity column and calculating the sequence when you need it.
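For comparison, if the per-client number really does have to be stored, the bookkeeping described above could look roughly like the sketch below. It uses Python with SQLite purely for illustration. The table InvoiceCounters, its columns and the helper next_invoice_number are hypothetical names, and SQL Server would need its own locking and transaction handling.

```python
import sqlite3


def make_db():
    """Create an empty per-client counter table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE InvoiceCounters ("
        " ClientId INTEGER PRIMARY KEY,"
        " LastNumber INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def next_invoice_number(conn, client_id):
    """Reserve and return the next invoice number for one client.

    The increment and the read happen inside a single transaction, so two
    callers cannot be handed the same number for the same client.
    """
    with conn:  # commits on success, rolls back on an exception
        # Make sure the client has a counter row; a no-op if it already exists.
        conn.execute(
            "INSERT OR IGNORE INTO InvoiceCounters (ClientId, LastNumber)"
            " VALUES (?, 0)",
            (client_id,),
        )
        conn.execute(
            "UPDATE InvoiceCounters SET LastNumber = LastNumber + 1"
            " WHERE ClientId = ?",
            (client_id,),
        )
        (number,) = conn.execute(
            "SELECT LastNumber FROM InvoiceCounters WHERE ClientId = ?",
            (client_id,),
        ).fetchone()
    return "IN%04d" % number
```

Even in this small form the questions from the list still apply: a deleted invoice leaves a gap, and the counter must be updated in the same transaction as the invoice insert, otherwise a rollback of one can leave the other behind.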

Related

Is there a way to insert a record in SQL server if it does not match the latest version of the record based on three of the columns?

Consider the following table named UserAttributes:
+----+--------+----------+-----------+
| Id | UserId | AttrName | AttrValue |
+----+--------+----------+-----------+
| 4 | 1 | FavFood | Apples |
| 3 | 2 | FavFood | Burgers |
| 2 | 1 | FavShape | Circle |
| 1 | 1 | FavFood | Chicken |
+----+--------+----------+-----------+
I would like to insert a new record in this table only if the new value does not match the latest version of that attribute for the user.
What I mean by the latest is, for example, if I was to do:
SELECT TOP(1) * FROM [UserAttributes] WHERE [UserId] = 1 AND [AttrName] = 'FavFood' ORDER BY [Id] DESC
I will be able to see that user ID 1's current favorite food is "Apples".
Is there a query safe for concurrency that will only insert a new favorite food if it doesn't match the current favorite food for this user?
I tried using a MERGE query with HOLDLOCK, but the problem is that WHEN MATCHED/WHEN NOT MATCHED matches against any earlier row, so it only works if I never want to insert a new record after a user has previously set their favorite food (in this example) to the new value. It does not consider that a user might switch to a new favorite food, then subsequently change back to their old favorite food. I would like to maintain all the changes as a historical record.
In the data set above, I would like to insert a new record if the user ID 1's new favorite food is "Burgers", but I do not want to insert a record if their new favorite food is "Apples" (since that is their current favorite food). I would also like to make this operation safe for concurrency.
Thank you for your help!
EDIT: I should probably also mention that when I split this operation into two queries (ie: first select their current favorite food, then do an insert query only if there is a new food detected) it works under normal conditions. However, we are observing race conditions (and therefore duplicates) since (as you may have guessed) the data set above is simply an example and there are many threads operating on this table at the same time.
A bit ugly, but to do it in one command, you could insert the user's (new) favorite food but filter with an EXCEPT of their current values.
e.g., (assuming the user's new data is in @UserID, @FavFood)
; WITH LatestFavFood AS
(SELECT TOP(1) UserID, AttrName, AttrValue
FROM [UserAttributes]
WHERE [UserId] = @UserID AND [AttrName] = 'FavFood'
ORDER BY [Id] DESC
)
INSERT INTO UserAttributes (UserID, AttrName, AttrValue)
SELECT @UserID, 'FavFood', @FavFood
EXCEPT
SELECT UserID, AttrName, AttrValue
FROM LatestFavFood
Here's a DB_Fiddle with three runs.
EDIT: I have changed the above to assume varchar types for AttrName rather than nvarchar. The fiddle has a mixture. Would be good to ensure you get them correct (especially food as it may have special characters).
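The insert-unless-unchanged logic can be exercised with a small sketch in Python and SQLite. This only demonstrates the EXCEPT filter. It says nothing about SQL Server's behaviour under concurrent writers, where lock hints or a stricter isolation level may still be needed. The helper set_attribute is a hypothetical name, and LIMIT 1 stands in for TOP(1).

```python
import sqlite3


def make_db():
    """Load the sample UserAttributes rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserAttributes ("
        " Id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " UserId INTEGER NOT NULL,"
        " AttrName TEXT NOT NULL,"
        " AttrValue TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO UserAttributes (Id, UserId, AttrName, AttrValue)"
        " VALUES (?, ?, ?, ?)",
        [
            (1, 1, "FavFood", "Chicken"),
            (2, 1, "FavShape", "Circle"),
            (3, 2, "FavFood", "Burgers"),
            (4, 1, "FavFood", "Apples"),
        ],
    )
    conn.commit()
    return conn


def set_attribute(conn, user_id, name, value):
    """Append a history row unless it equals the user's latest value.

    Returns True when a row was inserted, False when nothing changed.
    """
    before = conn.total_changes
    with conn:
        conn.execute(
            """
            INSERT INTO UserAttributes (UserId, AttrName, AttrValue)
            SELECT ?, ?, ?
            EXCEPT
            SELECT UserId, AttrName, AttrValue FROM (
                -- the latest stored version of this attribute for this user
                SELECT UserId, AttrName, AttrValue
                FROM UserAttributes
                WHERE UserId = ? AND AttrName = ?
                ORDER BY Id DESC
                LIMIT 1
            )
            """,
            (user_id, name, value, user_id, name),
        )
    return conn.total_changes > before
```

Setting user 1's favorite food to "Apples" is a no-op on the sample data, while switching to "Burgers" and then back to "Apples" adds two rows, so the history of changes is preserved.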

optimizing child/parent structure in one table with a lot of data

I have a table which has a simple parent child structure
products:
- id
- product_id
- time_created
- ... a few other columns
It is a parent if product_id IS NULL. Product id behaves here like parent_id. Data inside looks like this:
id | product_id
---------------
1 | NULL
2 | 1
3 | 1
4 | NULL
4 | 4
This table is updated every night as new versions are added.
Every user uses a lot of these products, but only one version of each. A user is notified if new rows are added for a product_id.
He can stop using id:2 and start using id:3. Another user will continue using id:2 etc.
The products table is updated every night and it grows pretty fast. There are around 500,000 rows at the moment and every night adds around 20,000, so probably 5,000,000-7,000,000 changes (new rows) per year.
Is there a way to optimize this database/table structure? Should I change anything? Is it a problem to have so much data in one table?
Your question is not clear. The sample data is suggesting that the parent-child relationship is only one level deep. If so, this is not a particularly hard problem. You can create a query to look up the most recent product id for each product -- and I'm assuming this is the one with the maximum id:
select id, product_id,
max(id) over (partition by coalesce(product_id, id)) as biggest_id
from products t;
This is then a lookup table, to get the biggest id. It would produce:
id | product_id | biggest_id
----------------------------
1 | NULL | 3
2 | 1 | 3
3 | 1 | 3
4 | NULL | 4
4 | 4 | 4
If your table has deeper hierarchies, you can solve the problem using recursive CTEs, or by doing the calculation when the table is updated.
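A runnable sketch of the one-level lookup, using Python and SQLite (3.25+ for window functions), might look like this. The sample rows are an assumption: the last row uses id 5 with product_id 4 rather than repeating id 4, so the second product also has a newer version. The helper name latest_versions is hypothetical.

```python
import sqlite3


def make_db():
    """Small products table; product_id NULL marks a parent row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, product_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products (id, product_id) VALUES (?, ?)",
        # Assumed sample data: id 5 is a new version of product 4.
        [(1, None), (2, 1), (3, 1), (4, None), (5, 4)],
    )
    conn.commit()
    return conn


def latest_versions(conn):
    """Return (id, product_id, biggest_id) for every row.

    COALESCE(product_id, id) groups a parent with its children, and
    MAX(id) over that group is taken to be the most recent version.
    """
    return conn.execute(
        """
        SELECT id, product_id,
               MAX(id) OVER (PARTITION BY COALESCE(product_id, id)) AS biggest_id
        FROM products
        ORDER BY id
        """
    ).fetchall()
```

A user on id 2 can then be told that id 3 exists by comparing the row's id with its biggest_id.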

Multiple records in a table matched with a column

The architecture of my DB involves records in a Tags table. Each record in the Tags table has a string which is a Name and a foreign key to the PrimaryID's of records in another Worker table.
Records in the Worker table have tags. Every time we create a Tag for a worker, we add a new row in the Tags table with the inputted Name and foreign key to the worker's PrimaryID. Therefore, we can have multiple Tags with different names per same worker.
Worker Table
ID | Worker Name | Other Information
__________________________________________________________________
1 | Worker1 | ..........................
2 | Worker2 | ..........................
3 | Worker3 | ..........................
4 | Worker4 | ..........................
Tags Table
ID |Foreign Key(WorkerID) | Name
__________________________________________________________________
1 | 1 | foo
2 | 1 | bar
3 | 2 | foo
5 | 3 | foo
6 | 3 | bar
7 | 3 | baz
8 | 1 | qux
My goal is to filter WorkerID's based on an inputted table of strings. I want to get the set of WorkerID's that have the same tags as the inputted ones. For example, if the inputted strings are foo and bar, I would like to return WorkerID's 1 and 3. Any idea how to do this? I was thinking something to do with GROUP BY or JOINING tables. I am new to SQL and can't seem to figure it out.
This is a variant of relational division. Here's one attempt:
select workerid
from tags
where name in ('foo', 'bar')
group by workerid
having count(distinct name) = 2
You can use the following:
select WorkerID
from tags where name in ('foo', 'bar')
group by WorkerID
having count(*) = 2
and this will retrieve your desired result.
Regards.
This article is an excellent resource on the subject.
While the answer from @Lennart works fine in Query Analyzer, you're not going to be able to duplicate that in a stored procedure or from a consuming application without opening yourself up to SQL injection attacks. To extend the solution, you'll want to look into passing your list of tags as a table-valued parameter, since SQL Server doesn't support arrays.
Essentially, you create a custom type in the database that mimics a table with only one column:
CREATE TYPE list_of_tags AS TABLE (t varchar(50) NOT NULL PRIMARY KEY)
Then you populate an instance of that type in memory:
DECLARE @mylist list_of_tags
INSERT @mylist (t) VALUES('foo'),('bar')
Then you can select against that as a join using the GROUP BY/HAVING described in the previous answers:
select workerid
from tags inner join @mylist on name = t
group by workerid
having count(distinct name) = (select count(*) from @mylist)
*Note: I'm not at a computer where I can test the query. If someone sees a flaw in my query, please let me know and I'll happily correct it and thank them.
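From a consuming application, the same relational division can be written with bound parameters instead of string concatenation. The sketch below uses Python and SQLite for illustration. The function workers_with_all_tags is a hypothetical helper, and generating one ? placeholder per tag is the client-side stand-in for a table-valued parameter.

```python
import sqlite3


def make_db():
    """Load the sample Tags rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Tags ("
        " ID INTEGER PRIMARY KEY,"
        " WorkerID INTEGER NOT NULL,"
        " Name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Tags (ID, WorkerID, Name) VALUES (?, ?, ?)",
        [
            (1, 1, "foo"), (2, 1, "bar"), (3, 2, "foo"), (5, 3, "foo"),
            (6, 3, "bar"), (7, 3, "baz"), (8, 1, "qux"),
        ],
    )
    conn.commit()
    return conn


def workers_with_all_tags(conn, tag_names):
    """Return the WorkerIDs that carry every tag in tag_names."""
    wanted = sorted(set(tag_names))  # duplicates would break the count check
    if not wanted:
        return []
    # Only "?" markers are interpolated; the tag values themselves are bound.
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT WorkerID FROM Tags"
        f" WHERE Name IN ({placeholders})"
        " GROUP BY WorkerID"
        " HAVING COUNT(DISTINCT Name) = ?"
        " ORDER BY WorkerID"
    )
    return [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]
```

Comparing the distinct count with the number of requested tags, rather than a hard-coded 2, lets the same query serve any list length.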

Tricky SQL statement over 3 tables

I have 3 different transaction tables, which look very similar, but have slight differences. This comes from the fact that there are 3 different transaction types; depending on the transaction types the columns change, so to get them in 3NF I need to have them in separate tables (right?).
As an example:
t1:
date,user,amount
t2:
date,user,who,amount
t3:
date,user,what,amount
Now I need a query that will get me all transactions in each table for the same user, something like
select * from t1,t2,t3 where user='me';
(which of course doesn't work).
I am studying JOIN statements but haven't got around the right way to do this. Thanks.
EDIT: Actually, I need all of the columns from every table, not just the ones that are the same.
EDIT #2: Yeah, having transaction_type doesn't break 3NF, of course - so maybe my design is utterly wrong. Here is what really happens (it's an alternative currency system):
- Transactions are between users, like mutual credit. So units get swapped between users.
- Inventarizations are physical stuff brought into the system; a user gets units for this.
- Consumations are physical stuff consumed; a user has to pay units for this.
|--------------------------------------------------------------------------|
| type | transactions | inventarizations | consumations |
|--------------------------------------------------------------------------|
| columns | date | date | date |
| | creditor(FK user) | creditor(FK user) | |
| | debitor(FK user) | | debitor(FK user) |
| | service(FK service)| | |
| | | asset(FK asset) | asset(FK asset) |
| | amount | amount | amount |
| | | | price |
|--------------------------------------------------------------------------|
(Note that 'amount' is in different units; these are the entries, and calculations are made on those amounts. It is outside the scope to explain why, but these are the fields.) So the question changes to "Can/should this be in one table or be multiple tables (as I have it for now)?"
I need the previously described SQL statement to display running balances.
(Should this now become a new question altogether or is that OK to EDIT?).
EDIT #3: As EDIT #2 actually transforms this to a new question, I also decided to post a new question. (I hope this is ok?).
You can supply defaults as constants in the select statements for columns where you have no data;
so
-- UNION ALL keeps every row; plain UNION would collapse identical transactions
SELECT Date, User, Amount, 'NotApplicable' as Who, 'NotApplicable' as What from t1 where user = 'me'
UNION ALL
SELECT Date, User, Amount, Who, 'NotApplicable' from t2 where user = 'me'
UNION ALL
SELECT Date, User, Amount, 'NotApplicable', What from t3 where user = 'me'
which assumes that Who And What are string type columns. You could use Null as well, but some kind of placeholder is needed.
I think that placing your additional information in a separate table and keeping all transactions in a single table will work better for you though, unless there is some other detail I've missed.
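Here is a small runnable sketch of the placeholder approach, in Python with SQLite. Several details are assumptions made for the example: the columns are named tx_date and username to stay clear of reserved words, NULL is used as the placeholder, a source column records which table each row came from, and the sample rows are invented.

```python
import sqlite3


def make_db():
    """Three transaction tables with slightly different columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t1 (tx_date TEXT, username TEXT, amount INTEGER);
        CREATE TABLE t2 (tx_date TEXT, username TEXT, who TEXT, amount INTEGER);
        CREATE TABLE t3 (tx_date TEXT, username TEXT, what TEXT, amount INTEGER);
        -- Invented sample rows; the two identical t1 rows are deliberate.
        INSERT INTO t1 VALUES ('2024-01-01', 'me', 10);
        INSERT INTO t1 VALUES ('2024-01-01', 'me', 10);
        INSERT INTO t1 VALUES ('2024-01-02', 'you', 5);
        INSERT INTO t2 VALUES ('2024-01-03', 'me', 'bob', 7);
        INSERT INTO t3 VALUES ('2024-01-04', 'me', 'bread', 3);
        """
    )
    return conn


def all_transactions(conn, username):
    """Return one combined, date-ordered list of a user's transactions."""
    return conn.execute(
        """
        SELECT 't1' AS source, tx_date, username, amount,
               NULL AS who, NULL AS what
        FROM t1 WHERE username = ?
        UNION ALL
        SELECT 't2', tx_date, username, amount, who, NULL
        FROM t2 WHERE username = ?
        UNION ALL
        SELECT 't3', tx_date, username, amount, NULL, what
        FROM t3 WHERE username = ?
        ORDER BY tx_date, source
        """,
        (username, username, username),
    ).fetchall()
```

UNION ALL matters here: the two identical t1 rows are both real transactions, and a plain UNION would silently merge them, which would throw off a running balance.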
I think the meat of your question is here:
depending on the transaction types the columns change, so to get them in 3NF I need to have them in separate tables (right?).
I'm no 3NF expert, but I would approach your schema a little differently (which might clear up your SQL a bit).
It looks like your data elements are as such: date, user, amount, who, and what. With that in mind, a more normalized schema might look something like this:
User
----
id, user info (username, etc)
Who
---
id, who info
What
----
id, what info
Transaction
-----------
id, date, amount, user_id, who_id, what_id
Your foreign key constraint verbiage will vary based on database implementation, but this is a little clearer (and extendable).
You should consider an STI "architecture" (single table inheritance), i.e. put all the different columns into one table, and put them all under one index.
In addition, you may want to add indexes on the other columns you filter on.
What is the result schema going to look like? - If you only want the minimal columns that are in all 3 tables, then it's easy, you would just UNION the results:
SELECT Date, User, Amount from t1 where user = 'me'
UNION ALL
SELECT Date, User, Amount from t2 where user = 'me'
UNION ALL
SELECT Date, User, Amount from t3 where user = 'me'
Or you could 'SubClass' them
Create Table [Transaction]
(
TransactionId Integer Primary Key Not Null,
TransactionDateTime DateTime Not Null,
TransactionType Integer Not Null
-- Other columns all transactions share
)
Create Table Type1Transactions
(
TransactionId Integer Primary Key Not Null
-- Type 1 specific columns
)
ALTER TABLE Type1Transactions WITH CHECK ADD CONSTRAINT
[FK_Type1Transaction_Transaction] FOREIGN KEY([TransactionId])
REFERENCES [Transaction] ([TransactionId])
Repeat for other types of transactions...
What about simply leaving the unnecessary columns null and adding a TransactionType column? This would result in a simple SELECT statement.
select *
from (
select user from t1
union
select user from t2
union
select user from t3
) u
left outer join t1 on u.user=t1.user
left outer join t2 on u.user=t2.user
left outer join t3 on u.user=t3.user

Database structure for items with varying attributes

I am developing a clothes web application and would appreciate advice on how to structure the data in my mysql database.
Every product (item of clothing) will be photographed in a number of ways, let's call them 'modes'. For example a shirt would be photographed buttoned or unbuttoned, and/or tucked in/not tucked in. A pair of trousers would have a different set of possible attributes. I want to store information on the way these items are photographed so I can later use that information to display the item of clothing in a particular way.
So one method would be just to store all the possible attributes in a single table, something like:
productId (FK,PK)
modeId (PK)
isLoose
isTuckedIn
Size
HasSmthUnderneath
Where the attributes could be a value or a code defined in another table or NULL if it does not apply to a particular mode.
Then given a particular productId and modeId, I imagine I could filter out the NULL values for attributes which do not apply and use only the relevant ones.
However, I am not sure if that is the ideal way to store this kind of value, as I would have a lot of NULL values, for example in a pair of trousers which are only photographed in one way. I've heard of the EAV model; is this appropriate?
It's probably worth noting that the number of attributes will be decided by me and not the user and should not change considerably; and that my end goal is to extract the attributes of a particular mode so I can use that data in my application.
Sorry if anything is unclear!
I would be tempted to have the following normalized schema design
Mode Table
id | mode_style
---------------
1 | buttoned
2 | unbuttoned
3 | tucked in
4 | untucked
Clothes Table
id | name | description
----------------------------
1 | shirt | mans shirt...
2 | dress | short sleeve
Clothes_mm_Mode Table (Junction/Map table)
mode_id | clothes_id
--------------------
1 | 1
1 | 2
3 | 3
Then you can easily query those clothes that have an unbuttoned display
SELECT
c.id,
c.name,
c.description
FROM
Clothes c
INNER JOIN
Clothes_mm_Mode cm
ON c.id = cm.clothes_id
WHERE
cm.mode_id = 2
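The junction-table design can be tried out with a short Python and SQLite sketch. The Mode and Clothes rows follow the tables above, but the mappings in Clothes_mm_Mode are made up for this example (the shirt gets modes 1 to 3, the dress gets mode 1), and clothes_with_mode is a hypothetical helper.

```python
import sqlite3


def make_db():
    """Mode, Clothes and the junction table linking them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Mode (id INTEGER PRIMARY KEY, mode_style TEXT NOT NULL);
        CREATE TABLE Clothes (
            id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT);
        CREATE TABLE Clothes_mm_Mode (
            mode_id INTEGER NOT NULL REFERENCES Mode (id),
            clothes_id INTEGER NOT NULL REFERENCES Clothes (id),
            PRIMARY KEY (mode_id, clothes_id));
        INSERT INTO Mode VALUES
            (1, 'buttoned'), (2, 'unbuttoned'), (3, 'tucked in'), (4, 'untucked');
        INSERT INTO Clothes VALUES
            (1, 'shirt', 'mans shirt'), (2, 'dress', 'short sleeve');
        -- Assumed mappings, chosen only to make the query return something.
        INSERT INTO Clothes_mm_Mode VALUES (1, 1), (2, 1), (3, 1), (1, 2);
        """
    )
    return conn


def clothes_with_mode(conn, mode_id):
    """Return (id, name, description) of clothes photographed in a given mode."""
    return conn.execute(
        """
        SELECT c.id, c.name, c.description
        FROM Clothes c
        INNER JOIN Clothes_mm_Mode cm ON c.id = cm.clothes_id
        WHERE cm.mode_id = ?
        ORDER BY c.id
        """,
        (mode_id,),
    ).fetchall()
```

The composite primary key on the junction table keeps the same mode from being attached to an item twice.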
If certain types of clothes are always displayed in the same way i.e. all shirts always have a buttoned and unbuttoned display, you could take out the Clothes_mm_Mode Table and introduce a Common Mode table that maps Modes to a Common Mode id
Common_Modes Table
id | name | description
--------------------------------------------------
1 | Men's Shirt | Common Modes for a Mens shirt
2 | Women's Shirt | Common Modes for a Womens shirt
Common_Modes_mm_Mode Table (Junction/Map table)
common_mode_id | mode_id
--------------------------------------------------
1 | 1
1 | 2
2 | 1
2 | 2
and then associate each item of Clothing with a Common Mode type
Clothing_Common_Modes Table
clothing_id | common_mode_id
----------------------------
1 | 1
The advantage of this design would be that when adding a new item of clothing, only one record need be entered into the Common Modes table to associate that item of clothing with the Modes common to the clothing type. Of course this could be handled without a common modes table by having a procedure that inserts the appropriate records into the original Clothes_mm_Mode Table for a new item of clothing, but by having the relationship in the database, it will be more prominent, visible and easier to maintain.
I think your design is fine. It would be possible to apply database normalization to it, which may give you the following designs alternatively:
have one table per property, each with (id, propvalue) pairs. Only add rows into these tables for items where the property actually applies.
have generic tables (id, propname, propvalue), perhaps one such table per property datatype (boolean, number, string).
With your description, I feel that either is overkill. The only exception would be cases where properties are multi-valued (e.g. a list of available colors).
I personally think plain old key/value pairs for this type of thing are underrated, so if you're happy to control it more in the application itself you could also do something like this:
create table ProductStates
(
ProductId int not null,
ModeState nvarchar(200) not null,
primary key (ProductId, ModeState)
)
Nice and simple in my mind. You get no redundant null values; if the product has that mode then there's a row, if not there's no row. Also means no schema changes required if there's a new state. If you wanted to you could have ModeState instead link out to a ModeStates lookup table, if you think integrity is going to be a problem.
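create table ModeStates
(
ModeStateId int not null primary key,
ModeStateDescription nvarchar(500)
-- ...whatever else you might need here
)
create table ProductStates
(
ProductId int not null,
ModeStateId int not null references ModeStates (ModeStateId),
primary key (ProductId, ModeStateId)
)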
... though that's probably redundant.
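As a rough illustration of the plain key/value variant, the following Python and SQLite sketch stores one row per (product, mode) pair and reads back the modes for a product. The sample rows and the helper modes_for are hypothetical, and TEXT stands in for nvarchar.

```python
import sqlite3


def make_db():
    """ProductStates holds a row only where a mode applies to a product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProductStates ("
        " ProductId INTEGER NOT NULL,"
        " ModeState TEXT NOT NULL,"
        " PRIMARY KEY (ProductId, ModeState))"
    )
    conn.executemany(
        "INSERT INTO ProductStates (ProductId, ModeState) VALUES (?, ?)",
        # Invented sample rows for two products.
        [(1, "buttoned"), (1, "tucked in"), (2, "untucked")],
    )
    conn.commit()
    return conn


def modes_for(conn, product_id):
    """Return the mode states recorded for one product, alphabetically."""
    rows = conn.execute(
        "SELECT ModeState FROM ProductStates"
        " WHERE ProductId = ? ORDER BY ModeState",
        (product_id,),
    )
    return [row[0] for row in rows]
```

A product with no modes simply has no rows, so there are no NULL columns to filter out, and the composite primary key rejects a duplicate (product, mode) pair.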
Just an alternative, not sure if I'd do it that way myself (depends on the brief(s)). Did I get the specification right?