Basic SQL Insert statement approach - sql

Given that I have two tables
Customer (id int, username varchar)
Order (customer_id int, order_date datetime)
Now I want to insert into Order table based on customer information which is available in Customer table.
There are a couple of ways I can approach this problem.
First - I can query the customer information into a variable and then use it in an INSERT statement.
DECLARE @Customer_ID int
SELECT @Customer_ID = id FROM Customer WHERE username = 'john.smith'
INSERT INTO Orders (customer_id, order_date) VALUES (@Customer_ID, GETDATE())
The second approach is to combine INSERT and SELECT in a single statement.
INSERT INTO Orders (customer_id, order_date)
SELECT id, GETDATE() FROM Customer
WHERE username = 'john.smith'
So my question is: which is the better way to proceed in terms of speed and overhead, and why? I know that if a lot of information is being queried from the Customer table, then the second approach is much better.
p.s. I was asked this question in one of the technical interviews.

The second approach is better.
The first approach will misbehave if the customer is not found: the variable stays NULL, so the INSERT either fails (if customer_id does not allow NULLs) or writes an order with no customer. No check is being done to make sure a customer id has been returned.
The second approach will do nothing if the customer is not found.
From an overhead standpoint, why create variables if they are not needed? Set-based SQL is usually the better approach.
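To make the difference concrete, here is a minimal sketch of both approaches. It assumes SQLite (via Python's sqlite3) as a stand-in for SQL Server, a Python variable in place of the T-SQL variable, and two hypothetical tables orders_a and orders_b so the results can be compared side by side.

```python
import sqlite3

def demo_insert_approaches(username):
    """Return the customer_id values each approach inserted for this username."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE orders_a (customer_id INTEGER, order_date TEXT);  -- hypothetical
        CREATE TABLE orders_b (customer_id INTEGER, order_date TEXT);  -- hypothetical
        INSERT INTO customer (id, username) VALUES (1, 'john.smith');
    """)

    # Approach 1: look the id up first, then insert whatever was found.
    row = conn.execute(
        "SELECT id FROM customer WHERE username = ?", (username,)
    ).fetchone()
    customer_id = row[0] if row else None  # stays None (NULL) when nothing matches
    conn.execute(
        "INSERT INTO orders_a VALUES (?, datetime('now'))", (customer_id,)
    )

    # Approach 2: one set-based statement; no matching customer means no row.
    conn.execute(
        """INSERT INTO orders_b (customer_id, order_date)
           SELECT id, datetime('now') FROM customer WHERE username = ?""",
        (username,),
    )

    a = [r[0] for r in conn.execute("SELECT customer_id FROM orders_a")]
    b = [r[0] for r in conn.execute("SELECT customer_id FROM orders_b")]
    conn.close()
    return a, b
```

With an unknown username, the first approach leaves an order row with a NULL customer_id, while the second inserts nothing.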

In a typical real-world order-entry system, the user has already looked the Customer up via a Search interface, or has chosen the customer from a list of customers displayed alphabetically; so your client program, when it goes to insert an order for that customer, already knows the CustomerID.
Furthermore, the order date is typically defaulted to getdate() as part of the ORDERS table definition, and your query can usually ignore that column.
But to handle multiple line items on an order, your insert into ORDER_HEADER needs to return the order header id so that it can be inserted into the ORDER DETAIL line item(s) child rows.
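A rough sketch of that header/detail flow, assuming SQLite through Python's sqlite3 rather than SQL Server (where SCOPE_IDENTITY() would play the role of lastrowid). The table layout and the helper names make_db and create_order are made up for illustration.

```python
import sqlite3

def make_db():
    """Create hypothetical header/detail tables; order_date defaults to now."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE order_header (
            order_id    INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_id INTEGER NOT NULL,
            order_date  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE order_detail (
            order_id INTEGER NOT NULL REFERENCES order_header(order_id),
            product  TEXT NOT NULL,
            quantity INTEGER NOT NULL
        );
    """)
    return conn

def create_order(conn, customer_id, items):
    """Insert the header, then reuse its generated id for every line item."""
    cur = conn.execute(
        "INSERT INTO order_header (customer_id) VALUES (?)", (customer_id,)
    )
    order_id = cur.lastrowid  # id generated for the header row just inserted
    conn.executemany(
        "INSERT INTO order_detail (order_id, product, quantity) VALUES (?, ?, ?)",
        [(order_id, product, qty) for product, qty in items],
    )
    return order_id
```

Note that the insert never mentions order_date; the column default fills it in.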

I don't recommend either approach. Why do you have the customer name and not the id in the first place? Don't you have a user interface that maintains a reference to the current customer by holding the ID in its state? Doing the lookup by name exposes you to potentially selecting the wrong customer.
If you must do this for reasons unknown to me, the 2nd approach is certainly more efficient because it only contains one statement.

Make the customer id in the Order table a foreign key that references the Customer table.
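As a small illustration of what the foreign key buys you, here is a sketch assuming SQLite via Python's sqlite3 (where enforcement must be switched on with PRAGMA foreign_keys). The helper names make_db and insert_order are hypothetical.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE orders (
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            order_date  TEXT NOT NULL
        );
        INSERT INTO customer (id, username) VALUES (1, 'john.smith');
    """)
    return conn

def insert_order(conn, customer_id):
    """Return True if the order was stored, False if the FK rejected it."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, order_date) VALUES (?, datetime('now'))",
            (customer_id,),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

An order for a customer id that does not exist is rejected by the database instead of being stored silently.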

Related

Manually inserting record into linked table MS ACCESS

New to VBA and Access so hope I can explain this correctly.
I have two tables, Orders and Deliveries.
Orders include OrderNo, CustomerName, CustomerAddress, CustomerContact and so on.
Deliveries include DeliveryNo, OrderNo, DeliveryDate, DeliveryType and so on.
I have created a relationship between these two tables linking them between OrderNo as I require the CustomerName when creating a Delivery from an Order.
However sometimes a Delivery is manually inserted into the table using a form. A CustomerName is required for this record but there is now no corresponding OrderNo.
I am not sure how to set up my table to accommodate manual entries.
Appreciate the help, thanks
It sounds like the business requirement is to be able to insert deliveries into the delivery table without a corresponding order number. If this is the case, you need to relax the constraints on the table and remove relationship between Delivery.OrderNo and the Orders table. Otherwise, you could populate this field with a special number (0 or -1) to indicate no order in this circumstance if you need to enforce the foreign key relationship for other reasons. It all depends on how you wish to enforce the business logic.
What if you just added a CustomerName field to the form?
From what I understood of your question, you use the Orders table to generate the Deliveries, so for each order in the Orders table you would add/generate a record in the Deliveries table. Now, when you add a delivery manually (using the form), I suppose there is no order associated with it, hence no CustomerName. In that case the only solution would be to add a field called CustomerName to that form (I suppose the person who is manually inserting the Delivery knows the name of the customer), plus any other fields you need. If your business logic requires it, you can also create a "virtual" Order with the same OrderNo as the inserted delivery.
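One way to combine the suggestions above, sketched with SQLite through Python's sqlite3 rather than Access: keep the relationship but let Deliveries.OrderNo be empty, and give Deliveries its own CustomerName for manual entries. The sample rows and the function name delivery_customers are made up for illustration.

```python
import sqlite3

def delivery_customers():
    """List each delivery with the customer name, wherever it is stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL);
        CREATE TABLE Deliveries (
            DeliveryNo   INTEGER PRIMARY KEY,
            OrderNo      INTEGER REFERENCES Orders(OrderNo),  -- NULL for manual entries
            CustomerName TEXT                                 -- filled in for manual entries
        );
        INSERT INTO Orders VALUES (100, 'Acme Ltd');
        INSERT INTO Deliveries VALUES (1, 100, NULL);                -- created from an order
        INSERT INTO Deliveries VALUES (2, NULL, 'Walk-in Customer'); -- entered by hand
    """)
    rows = conn.execute("""
        SELECT d.DeliveryNo, COALESCE(d.CustomerName, o.CustomerName)
        FROM Deliveries d
        LEFT JOIN Orders o ON o.OrderNo = d.OrderNo
        ORDER BY d.DeliveryNo
    """).fetchall()
    conn.close()
    return rows
```

The LEFT JOIN keeps deliveries that have no order, and COALESCE picks whichever name is available.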

Why Is SQL Trigger Not Inserting Rows In Sequential Order?

Recently I inherited a new ASP web application that merely allows customers to pay their outstanding invoices online. The application was poorly designed and did not have a payment history table. The entire payments table was deleted by the web service that transports the payment records to the accounting system of record.
I just created a simple trigger on the Payments table that simply copies the data from the Payments table into a Payment_Log table. Initially, the trigger just did a select * on inserted to copy the data. However, I just modified the trigger to insert the date of the payment into the Payment_Log table since one of our customers is having some issues that I need to debug. The new trigger is below. My question is that I have noticed that with this new version of the trigger, the rows are being inserted into the middle of the table (i.e. not at the end). Can someone explain why this is happening?
ALTER trigger [dbo].[PaymentHistory] on [dbo].[Payments]
for insert as
Declare @InvoiceNo nvarchar(255),
@CustomerName nvarchar(255),
@PaymentAmount float,
@PaymentRefNumber nvarchar(255),
@BulkPaid bit,
@PaymentType nvarchar(255),
@PaymentDate datetime
Select @InvoiceNo = InvoiceNo,
@CustomerName = CustomerName,
@PaymentAmount = PaymentAmount,
@PaymentRefNumber = PaymentRefNumber,
@BulkPaid = BulkPaid,
@PaymentType = PaymentType
from inserted
Set @PaymentDate = GETDATE()
Insert into Payment_Log
values (@InvoiceNo, @CustomerName, @PaymentAmount, @PaymentRefNumber, @BulkPaid, @PaymentType, @PaymentDate)
Below is a screenshot of SQL Server Management Studio that shows the rows being inserted into the middle of the table data. Thanks in advance for the help guys.
Datasets don't have an order. This means that SELECT * FROM x can return the results in a different order every time.
The only time that data is guaranteed to come back in the same order is when you specify an ORDER BY clause.
That said, there are circumstances that make the data normally come back in a certain order. The most visible one is with a clustered index.
This makes me wonder if the two tables have a Primary Key or not. Check all the indexes on each table and, at the very least, enforce a Primary Key.
As an aside, triggers in SQL Server are not fired once per row, but once per statement. This means that the inserted table can contain more than just one row. (For example, when bulk inserting test data, or re-loading a large batch of transactions.)
For this reason, copying the data into variables is not a standard practice, since only one row's values would end up in them. Instead, you could just do the following...
ALTER trigger [dbo].[PaymentHistory] on [dbo].[Payments]
for insert as
INSERT INTO
Payment_Log
SELECT
InvoiceNo, CustomerName, PaymentAmount, PaymentRefNumber,
BulkPaid, PaymentType, GetDate()
FROM
inserted
OK, this question has the same answer as "why do rows come back in different orders when I do not use an ORDER BY clause in my SQL query?". If you ask SQL to process rows in any way, it will process them in the fastest way it can: cached rows first, those nearest the read head on the hard drive next, and finally the rest.
Put another way: you would be annoyed if queries took 10 times longer to return rows in order rather than on a first come, first served basis. SQL does what you ask as quickly as it can.
Hope this helps
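A tiny sketch of the practical fix, assuming SQLite via Python's sqlite3 and a cut-down Payment_Log with only three of the columns: whatever order the rows were written in, ask for the order you want when you read them.

```python
import sqlite3

def logged_invoices_in_order(rows):
    """Insert log rows in arbitrary order, then read them back sorted by date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Payment_Log (InvoiceNo TEXT, PaymentAmount REAL, PaymentDate TEXT)"
    )
    conn.executemany("INSERT INTO Payment_Log VALUES (?, ?, ?)", rows)
    # Without ORDER BY the engine may return rows in any order it likes.
    ordered = conn.execute(
        "SELECT InvoiceNo FROM Payment_Log ORDER BY PaymentDate, InvoiceNo"
    ).fetchall()
    conn.close()
    return [r[0] for r in ordered]
```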

Best way to implement SQL Many-to-Many joins

I have three tables, and their relevant columns are:
tPerson
-> PersonID
tPersonStatusHistory
-> PersonStatusHistoryID
-> PersonID
-> StatusID
-> PersonStatusDate
Status
-> StatusID
I want to store a full history of all the Statuses that a Person has ever had. But I also want easy access to the current status.
Query to get the current status:
SELECT TOP 1 StatusID FROM tPersonStatusHistory
WHERE PersonID = ? ORDER BY PersonStatusDate DESC
What I want is a query that will fetch me a list of Person records, with their most recent StatusID as a column in the query.
We have tried the following approaches:
Including the above query as a sub-query in the select.
Adding a CurrentPersonStatusHistoryID column to the tPerson table and maintaining it using a computed column that calls a User-Defined-Function.
Maintaining the CurrentPersonStatusHistoryID column using a trigger on the tPersonStatusHistory table.
The query to pull up the Person records is quite high use, so I don't want to have to look up the History table each time. The trigger approach is closest to what I want, since the data is persisted in the Person table and is only changed when an update is made (which is by comparison not very often).
I find triggers difficult to maintain and I would prefer to stay away from them. I've also found that when doing an Insert-Select, or an Update query involving multiple records, the trigger is only called on the first record and not the others.
What I really want is to put some logic into the column definition of CurrentPersonStatusHistoryID, press Save and have it persisted and updated behind the scenes without my intervention.
Given that Many-to-Many relationships are common I was wondering if anyone else had come across a similar situation and had some insight into the highest performance, and preferably least hassle, way of implementing this.
Another approach is to use something like the following query, perhaps as a view. It will give you the most recent StatusID for each Person.
SELECT PersonID, StatusID
FROM (
SELECT PersonID, StatusID,
rank() OVER(PARTITION BY PersonID ORDER BY PersonStatusDate DESC) as rnk
FROM tPersonStatusHistory
) A
WHERE rnk = 1
I'm not sure that this satisfies your requirement for performance, but it's something you could look into.
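For anyone who wants to try the idea, here is a runnable sketch assuming SQLite 3.25 or later (for window functions) via Python's sqlite3. It deliberately swaps rank() for ROW_NUMBER() with a tie-breaker on PersonStatusHistoryID, because rank() returns several rows for a person whose latest statuses share the same date. The function name current_statuses is made up.

```python
import sqlite3

def current_statuses(history):
    """history: iterable of (PersonID, StatusID, PersonStatusDate) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE tPersonStatusHistory (
            PersonStatusHistoryID INTEGER PRIMARY KEY,
            PersonID INTEGER, StatusID INTEGER, PersonStatusDate TEXT
        )
    """)
    conn.executemany(
        "INSERT INTO tPersonStatusHistory (PersonID, StatusID, PersonStatusDate) "
        "VALUES (?, ?, ?)",
        history,
    )
    rows = conn.execute("""
        SELECT PersonID, StatusID
        FROM (
            SELECT PersonID, StatusID,
                   ROW_NUMBER() OVER (
                       PARTITION BY PersonID
                       ORDER BY PersonStatusDate DESC, PersonStatusHistoryID DESC
                   ) AS rn
            FROM tPersonStatusHistory
        ) AS latest
        WHERE rn = 1          -- exactly one row per person
        ORDER BY PersonID
    """).fetchall()
    conn.close()
    return rows
```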

Database structure for storing historical data

Preface:
I was thinking the other day about a new database structure for a new application and realized that we needed a way to store historical data in an efficient way. I was wanting someone else to take a look and see if there are any problems with this structure. I realize that this method of storing data may very well have been invented before (I am almost certain it has) but I have no idea if it has a name and some google searches that I tried didn't yield anything.
Problem:
Let's say you have a table for orders, and orders are related to a customer table for the customer that placed the order. In a normal database structure you might expect something like this:
orders
------
orderID
customerID
customers
---------
customerID
address
address2
city
state
zip
Pretty straightforward: the orders table has a customerID foreign key, which is the primary key of the customers table. But if we were to run a report over the orders table, we would join the customers table to the orders table, which brings back the current record for that customer ID. What if the customer's address was different when the order was placed and has subsequently been changed? Now our order no longer reflects that customer's address at the time the order was placed. Basically, by changing the customer record, we just changed all history for that customer.
Now there are several ways around this, one of which would be to copy the record when an order was created. What I have come up with though is what I think would be an easier way to do this that is perhaps a little more elegant, and has the added bonus of logging anytime a change is made.
What if I did a structure like this instead:
orders
------
orderID
customerID
customerHistoryID
customers
---------
customerID
customerHistoryID
customerHistory
--------
customerHistoryID
customerID
address
address2
city
state
zip
updatedBy
updatedOn
Please forgive the formatting, but I think you can see the idea. Basically, the idea is that anytime a customer is changed, by insert or update, a new customerHistory row is written with an incremented customerHistoryID, and the customers table is updated with the latest customerHistoryID. The order table now not only points to the customerID (which allows you to see all revisions of the customer record), but also to the customerHistoryID, which points to a specific revision of the record. Now the order reflects the state of the data at the time the order was created.
By adding an updatedby and updatedon column to the customerHistory table, you can also see an "audit log" of the data, so you could see who made the changes and when.
One potential downside could be deletes, but I am not really worried about that for this need as nothing should ever be deleted. But even still, the same effect could be achieved by using an activeFlag or something like it depending on the domain of the data.
My thought is that all tables would use this structure. Anytime historical data is being retrieved, it would be joined against the history table using the customerHistoryID to show the state of data for that particular order.
Retrieving a list of customers is easy, it just takes a join to the customer table on the customerHistoryID.
Can anyone see any problems with this approach, either from a design standpoint, or performance reasons why this is bad. Remember, no matter what I do I need to make sure that the historical data is preserved so that subsequent updates to records do not change history. Is there a better way? Is this a known idea that has a name, or any documentation on it?
Thanks for any help.
Update:
This is a very simple example of what I am really going to have. My real application will have "orders" with several foreign keys to other tables. Origin/destination location information, customer information, facility information, user information, etc. It has been suggested a couple of times that I could copy the information into the order record at that point, and I have seen it done this way many times, but this would result in a record with hundreds of columns, which really isn't feasible in this case.
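The customerHistory structure described in the question could be sketched roughly as follows. This assumes SQLite via Python's sqlite3, trims the address columns down to address and city, and uses hypothetical helper names (save_customer, place_order, order_address).

```python
import sqlite3

SCHEMA = """
CREATE TABLE customerHistory (
    customerHistoryID INTEGER PRIMARY KEY AUTOINCREMENT,
    customerID INTEGER NOT NULL,
    address TEXT, city TEXT,
    updatedBy TEXT,
    updatedOn TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE customers (
    customerID INTEGER PRIMARY KEY,
    customerHistoryID INTEGER REFERENCES customerHistory(customerHistoryID)
);
CREATE TABLE orders (
    orderID INTEGER PRIMARY KEY AUTOINCREMENT,
    customerID INTEGER,
    customerHistoryID INTEGER
);
"""

def save_customer(conn, customer_id, address, city, user):
    """Every insert or update writes a new revision and repoints customers."""
    cur = conn.execute(
        "INSERT INTO customerHistory (customerID, address, city, updatedBy) "
        "VALUES (?, ?, ?, ?)",
        (customer_id, address, city, user),
    )
    history_id = cur.lastrowid
    conn.execute(
        "INSERT OR REPLACE INTO customers (customerID, customerHistoryID) VALUES (?, ?)",
        (customer_id, history_id),
    )
    return history_id

def place_order(conn, customer_id):
    """Copy the customer's current revision id onto the order (customer must exist)."""
    cur = conn.execute(
        "INSERT INTO orders (customerID, customerHistoryID) "
        "SELECT customerID, customerHistoryID FROM customers WHERE customerID = ?",
        (customer_id,),
    )
    return cur.lastrowid

def order_address(conn, order_id):
    """Address as it was when the order was placed."""
    return conn.execute(
        "SELECT h.address FROM orders o "
        "JOIN customerHistory h ON h.customerHistoryID = o.customerHistoryID "
        "WHERE o.orderID = ?",
        (order_id,),
    ).fetchone()[0]
```

An order placed before an address change keeps pointing at the old revision, and the history rows double as the audit log.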
When I've encountered such problems, one alternative is to make the order the history table. It functions the same, but it's a little easier to follow.
orders
------
orderID
customerID
address
City
state
zip
customers
---------
customerID
address
City
state
zip
EDIT: if the number of columns gets too high for your liking, you can separate it out however you like.
If you do go with the other option and use history tables, you should consider using bitemporal data, since you may have to deal with the possibility that historical data needs to be corrected. For example, a customer changed his current address from A to B, but you also have to correct the address on an existing order that is currently being fulfilled.
Also if you are using MS SQL Server you might want to consider using indexed views. That will allow you to trade a small incremental insert/update perf decrease for a large select perf increase. If you're not using MS SQL server you can replicate this using triggers and tables.
When you are designing your data structures, be very careful to store the correct relationships, not something that is similar to the correct relationships. If the address for an order needs to be maintained, then that is because the address is part of the order, not the customer. Also, unit prices are part of the order, not the product, etc.
Try an arrangement like this:
Customer
--------
CustomerId (PK)
Name
AddressId (FK)
PhoneNumber
Email
Order
-----
OrderId (PK)
CustomerId (FK)
ShippingAddressId (FK)
BillingAddressId (FK)
TotalAmount
Address
-------
AddressId (PK)
AddressLine1
AddressLine2
City
Region
Country
PostalCode
OrderLineItem
-------------
OrderId (PK) (FK)
OrderItemSequence (PK)
ProductId (FK)
UnitPrice
Quantity
Product
-------
ProductId (PK)
Price
etc.
If you truly need to store history for something, like tracking changes to an order over time, then you should do that with a log or audit table, not with your transaction tables.
Normally orders simply store the information as it is at the time of the order. This is especially true of things like part numbers, part names and prices, as well as customer address and name. Then you don't have to join to five or six tables to get the information that can be stored in one. This is not denormalization, as you actually need the information as it existed at the time of the order. I think having this information in the order and order detail (which stores the individual items ordered) tables is also less risky in terms of accidental changes to the data.
Your order table would not have hundreds of columns. You would have an order table and an order detail table due to one-to-many relationships. The order table would include order no., customer id (so you can search for everything this customer has ever ordered even if the name changed), customer name, customer address (note you don't need city, state, zip etc.; put the address in one field), order date and possibly a few other fields that relate directly to the order at a top level. Then you have an order detail table that has order number, detail_id, part number, part description (this can be a consolidation of a bunch of fields like size, color etc., or you can separate out the most common), no. of items, unit type, price per unit, taxes, total price, ship date, status. You put one entry in for each item ordered.
If you are genuinely interested in such problems, I can only suggest you take a serious look at "Temporal Data and the Relational Model".
Warning1 : there is no SQL in there and almost anything you think you know about the relational model will be claimed a falsehood. With good reason.
Warning2 : you are expected to think, and think hard.
Warning3 : the book is about what the solution for this particular family of problems ought to look like, but as the introduction says, it is not about any technology available today.
That said, the book is genuine enlightenment. At the very least, it helps to make it clear that the solution for such problems will not be found in SQL as it stands today, or in ORMs as those stand today, for that matter.
What you want is called a data warehouse. Since data warehouses are OLAP and not OLTP, it is recommended to have as many columns as you need in order to achieve your goals. In your case the orders table in the data warehouse will have 11 fields, giving you a 'snapshot' of each order as it comes in, regardless of later updates to user accounts.
Wiley -The Data Warehouse Toolkit, Second Edition
It's a good start.
Our payroll system uses effective dates in many tables. The ADDRESSES table is keyed on EMPLID and EFFDT. This allows us to track every time an employee's address changes. You could use the same logic to track historical addresses for customers. Your queries would simply need to include a clause that compares the order date to the customer address date that was in effect at the time of the order. For example
select o.orderID, c.customerID, c.address, c.city, c.state, c.zip
from orders o, customers c
where c.customerID = o.customerID
and c.effdt = (
select max(c1.effdt) from customers c1
where c1.customerID = c.customerID and c1.effdt <= o.orderdt
)
The objective is to select the most recent row in customers having an effective date that is on or before the date of the order. This same strategy could be used to keep historical information on product prices.
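Here is a small runnable version of the effective-date lookup, assuming SQLite via Python's sqlite3, ISO-formatted date strings (so that text comparison matches date order), and made-up sample rows. The function name address_at_order_time is hypothetical.

```python
import sqlite3

def address_at_order_time():
    """Return (orderID, address in effect on the order date) for each order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customerID INTEGER, effdt TEXT, address TEXT,
            PRIMARY KEY (customerID, effdt)      -- one row per address change
        );
        CREATE TABLE orders (orderID INTEGER PRIMARY KEY, customerID INTEGER, orderdt TEXT);
        INSERT INTO customers VALUES (1, '2020-01-01', '1 Old Rd');
        INSERT INTO customers VALUES (1, '2021-06-01', '2 New Ave');
        INSERT INTO orders VALUES (10, 1, '2020-03-15');
        INSERT INTO orders VALUES (11, 1, '2021-07-04');
    """)
    rows = conn.execute("""
        SELECT o.orderID, c.address
        FROM orders o
        JOIN customers c ON c.customerID = o.customerID
        WHERE c.effdt = (
            SELECT MAX(c1.effdt) FROM customers c1
            WHERE c1.customerID = c.customerID AND c1.effdt <= o.orderdt
        )
        ORDER BY o.orderID
    """).fetchall()
    conn.close()
    return rows
```

The order from March 2020 resolves to the old address and the order from July 2021 to the new one, without storing anything extra on the order itself.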
I myself like to keep it simple. I would use two tables: a customer table and a customer history table. If you have the key (e.g. CustomerID) in the history table there is no reason to make a joining table, a select on that key will give you all records.
You also don't have audit information (e.g. date modified, who modified etc) in the history table as you show it, I expect you want this.
So mine would look something like this:
CustomerTable (this contains current customer information)
CustomerID (distinct non null)
...all customer information fields
CustomerHistoryTable
CustomerID (not distinct non null)
...all customer information fields
DateOfChange
WhoChanged
The DateOfChange field is the date on which the customer record was changed from the values in this history row to newer values (those in a more recent history row, or in the CustomerTable itself).
Your orders table just needs a CustomerID; if you need to find the customer information as of the time of the order, it is a simple select.

Maintaining a metadata table in SQL

Can someone give me some direction on how to tackle a scenario like this?
There is a User table which contains all the user information; UserID is the primary key on the User table. I have another table called, for example, Comments, which holds all the comments created by any user. The Comments table contains UserID as the foreign key. Now I have to rank the users based on the number of comments they have added. The more comments a user has added, the higher the ranking. I am trying to see what the best way to do this would be.
I would prefer to have another table, which basically contains all the attributes or statistics of a user (there might be more attributes in future; right now there is only rank, based on comment count), rather than adding another column to the User table itself.
If I create another table called UserStats, with UserID as the foreign key and another column called Rank, there is a possibility that every time a user adds a comment we might need to update the ranks. How do I write an SP that does this? I'm not even sure if this is the right way to do this.
This is not the right way to do this.
You don't want to be materializing those kinds of computed values until there is a performance problem - and you have options like Indexed Views to help you well before you get to the point of doing what you suggested.
Just create a View called UserRankings and have it look like:
SELECT c.UserId, COUNT(c.CommentId) [Ranking]
FROM Comments c
GROUP BY c.UserId
Not sure how you want to do your rankings, but you can also look at the RANK() and DENSE_RANK() functions in T-SQL: Ranking Functions (Transact-SQL)
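A sketch of such a view with an actual rank column, assuming SQLite 3.25 or later (for RANK()) through Python's sqlite3. The column names CommentCount and Ranking, the Body column, and the function name user_rankings are made up here.

```python
import sqlite3

def user_rankings(comments):
    """comments: iterable of (UserId, Body). Returns (UserId, CommentCount, Ranking)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Comments (CommentId INTEGER PRIMARY KEY, UserId INTEGER NOT NULL, Body TEXT)"
    )
    conn.executemany("INSERT INTO Comments (UserId, Body) VALUES (?, ?)", comments)
    # The view is computed on demand, so there is nothing to keep in sync.
    conn.execute("""
        CREATE VIEW UserRankings AS
        SELECT UserId, CommentCount,
               RANK() OVER (ORDER BY CommentCount DESC) AS Ranking
        FROM (
            SELECT UserId, COUNT(CommentId) AS CommentCount
            FROM Comments
            GROUP BY UserId
        )
    """)
    rows = conn.execute(
        "SELECT UserId, CommentCount, Ranking FROM UserRankings ORDER BY Ranking, UserId"
    ).fetchall()
    conn.close()
    return rows
```

Users with the same comment count share a rank; DENSE_RANK() would avoid the gap that RANK() leaves after a tie.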
You could do this from a query
SELECT UserID,
COUNT(UserID) CntOfUserID
FROM UserComments
GROUP BY UserID
ORDER BY COUNT(UserID) DESC
You could also do this using a ROW_NUMBER
DECLARE @Comments TABLE(
UserID INT,
Comment VARCHAR(MAX)
)
INSERT INTO @Comments SELECT 3, 'Foo'
INSERT INTO @Comments SELECT 3, 'Bar'
INSERT INTO @Comments SELECT 3, 'Tada'
INSERT INTO @Comments SELECT 2, 'T'
INSERT INTO @Comments SELECT 2, 'G'
SELECT UserID,
ROW_NUMBER() OVER (ORDER BY COUNT(UserID) DESC) ID
FROM @Comments
GROUP BY UserID
Storing that kind of information is actually a bad idea. The count of comments per user is something that can be calculated at any given time quickly and easily. And if your columns are properly indexed (on the foreign key), the count operation ought to happen very quickly.
The only reason you might want to persist metadata is if the load on your database is fast and furious and you simply cannot afford to run select queries with counts per request. And that load will also inform whether you simply add a column to your user table or create a whole separate table. (The latter solution being the one for the most extreme server loads.)
A few comments:
Yes, I think you should keep the "score" metadata somewhere, otherwise, you'd have to run the scoring calc each time, which could ultimately get expensive.
Second, I don't think you should calculate an actual "rank" (vs other users). Just calculate a "score" (based on the number of comments posted), then your query can determine "rank" by retrieving scores in descending order.
Third, I would probably make a trigger that updates the "score" in the metadata table, based on each insert into the comments table.
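If you do go the trigger route, a minimal sketch might look like this. It assumes SQLite via Python's sqlite3, where triggers run once per inserted row (unlike SQL Server, where a trigger fires once per statement and has to handle the whole inserted set). The trigger name and the Score column are made up.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
        CREATE TABLE Comments (
            CommentID INTEGER PRIMARY KEY,
            UserID INTEGER NOT NULL REFERENCES Users(UserID),
            Body TEXT
        );
        CREATE TABLE UserStats (
            UserID INTEGER PRIMARY KEY REFERENCES Users(UserID),
            Score  INTEGER NOT NULL DEFAULT 0
        );
        -- Keep a running score; rank is derived at query time, never stored.
        CREATE TRIGGER trg_comment_insert AFTER INSERT ON Comments
        BEGIN
            INSERT OR IGNORE INTO UserStats (UserID, Score) VALUES (NEW.UserID, 0);
            UPDATE UserStats SET Score = Score + 1 WHERE UserID = NEW.UserID;
        END;
    """)
    return conn

def leaderboard(conn):
    """Rank falls out of ordering by score, as suggested above."""
    return conn.execute(
        "SELECT UserID, Score FROM UserStats ORDER BY Score DESC, UserID"
    ).fetchall()
```

A matching AFTER DELETE trigger would be needed if comments can be removed; it is left out here to keep the sketch short.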