Student Seeking Advice for a CSC Exam - sql

I'm a student taking a course on SQL and DB. My question is this: how does one get good at hand writing queries? Our final exam will consist of many of these questions, and I want to do well. We aren't allowed any sort of reference sheet either, just fyi.
I suppose what I'm asking is: how would you approach this?

In short, you need practice, i.e. hands-on SQL.

You will probably get many opinions on this from others. Aside from practice and reading, make sure you understand the absolute basics and the sequence of clauses in a query.
Always use table.column or alias.column to help prevent any ambiguity of where something is coming from.
Know the overall basic segments of writing a query such as
select
[all your alias.columns comma separated]
from
[your primary and/or JOIN/LEFT JOIN/etc tables]
where
[what is the criteria you are looking for]
AND [use proper parentheses to prevent ambiguity if needed]
group by
[any columns if doing aggregates such as count, min, max, avg, etc]
[you need to list all NON-AGGREGATE alias.columns]
having
[if any, such as count(*) > someValue]
order by
[any specific columns and ascending or descending order]
[such as orderDate DESC to put most recent order at top]
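To see every segment of that skeleton working together, here is a minimal sketch that runs one query through Python's built-in sqlite3 module. The tables, columns, and rows are invented purely for illustration and are not from any particular course database.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Total REAL);
    INSERT INTO Customers VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO Orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 10.0),
                              (13, 3, 5.0), (14, 3, 7.0), (15, 3, 9.0);
""")

query = """
    SELECT   c.CustomerID, c.LastName, COUNT(*) AS OrderCount, SUM(o.Total) AS Spent
    FROM     Orders o
             JOIN Customers c ON o.CustomerID = c.CustomerID
    WHERE    o.Total > 6                  -- row filter, applied before grouping
    GROUP BY c.CustomerID, c.LastName     -- every non-aggregate column in the SELECT
    HAVING   COUNT(*) > 1                 -- group filter, applied after grouping
    ORDER BY Spent DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data, Baker drops out in the HAVING step (only one qualifying order) and the 5.0 order for Clark drops out in the WHERE step, leaving Adams with 75.0 ahead of Clark with 16.0.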
In my opinion, getting your FROM clause right is one of the most important steps, and I always try to write my JOIN conditions as first table/alias = second table/alias. Indentation helps here so you can see how you get from one table to the next. At this point, do not think about your filtering (YET), just HOW the tables are related. Then you can add "AND" criteria for something you are specifically looking for from that source.
An example with orders: looking for customers who ordered in the last 30 days. Start with the source of that criterion as your first FROM table, and hang everything else off of it. So I start with the orders because I care about WHEN something was ordered. I can then join to customers to get their names.
select
      c.LastName,
      c.FirstName,
      o.OrderDate
   from
      Orders o
         JOIN Customers c
            on o.CustomerID = c.CustomerID
   where
      o.OrderDate > [sql-specific current date - 30 days]
   order by
      c.LastName,
      c.FirstName
Another example: customers who ordered a specific item in the last 30 days. In this case, I could reverse the order of the tables and start from the items, since one specific item is a smaller set than all orders. So, altering the above:
select
      c.LastName,
      c.FirstName,
      o.OrderDate
   from
      Items i
         JOIN OrderDetails od
            on i.ItemID = od.ItemID
            JOIN Orders o
               on od.OrderID = o.OrderID
               AND o.OrderDate > [sql-specific current date - 30 days]
               JOIN Customers c
                  on o.CustomerID = c.CustomerID
   where
      i.ItemDescription = 'SomeThing'
   order by
      c.LastName,
      c.FirstName
Notice my indentation nesting. It is a personal style preference, but at least you can see how alias i relates to od, od to o, and o to c. In my opinion, it is easier to see the trail of tables and how each is directly related. I also added the "AND" clause directly in the JOIN to the orders table to keep only orders within the last 30 days.
With LEFT JOINs, I do the same and keep the criteria directly at the JOIN level. If you put a criterion on the left-joined table into the WHERE clause (without explicitly handling NULL OR [condition]), it turns the left join into an [INNER] join.
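The LEFT JOIN point is easy to check for yourself. Below is a small sketch using Python's built-in sqlite3 module. The tables and rows are hypothetical, and a fixed cutoff date stands in for the "current date - 30 days" expression so the result is repeatable.

```python
import sqlite3

# Hypothetical data: Clark has no orders at all, Baker only has an old one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customers VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO Orders VALUES (10, 1, '2024-03-01'), (11, 2, '2023-01-15');
""")

# Criteria kept in the ON clause: every customer survives,
# and customers without a qualifying order get NULL order columns.
in_join = conn.execute("""
    SELECT c.LastName, o.OrderID
    FROM Customers c
         LEFT JOIN Orders o
           ON  c.CustomerID = o.CustomerID
           AND o.OrderDate >= '2024-01-01'
    ORDER BY c.LastName
""").fetchall()

# Same criteria moved to WHERE: rows failing the test, including the
# NULL-extended ones, are dropped, so this behaves like an INNER JOIN.
in_where = conn.execute("""
    SELECT c.LastName, o.OrderID
    FROM Customers c
         LEFT JOIN Orders o
           ON c.CustomerID = o.CustomerID
    WHERE o.OrderDate >= '2024-01-01'
    ORDER BY c.LastName
""").fetchall()

print(in_join)
print(in_where)
```

The first query returns all three customers (two of them with None for the order), while the second returns only Adams.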
Hope this basic guidance helps you get more comfortable as you get more into writing your own queries and course/test preparation.

Related

SQL-92 Selecting with the dot operator

Before it is marked as a duplicate: I am not asking if I have to specify it fully, I am asking why it does not matter if it is specified. Hope that clears that up. Now to the question.
I'm new to SQL so I'm not sure if there is some technical term for this.
Say I have a database with tables: Orders and Customers.
Orders has categories: OrderID, CustomerID, and OrderDate
Customers has categories: CustomerID, CustomerName, ContactName, and Country
I then have a SQL Query:
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
So I am selecting Orders.OrderID, Customers.CustomerName, and Orders.OrderDate FROM Orders table. If it is from the Orders table, why specify Orders. before the OrderID and OrderDate in select ? This is an example from a website, and does not explain this. I am not sure if it has to do with join (which is in the example) so that's why I also put it there and in the tags.
-Thanks
Sometimes the column name is found in both tables and your DBMS will throw an error that the column is ambiguous. It's usually a good idea to explicitly declare which table you want the item to come from.
Using an alias often makes the code easier to read and write:
SELECT ord.OrderID, cus.CustomerName, ord.OrderDate
FROM Orders ord
INNER JOIN Customers cus ON ord.CustomerID=cus.CustomerID;
These table names are pretty short, but you can see how useful aliases can be when the table names become longer and more complicated.
One benefit to explicitly declaring the table is that you can tell at a glance what table the data is coming from. Once you have data coming from many sources, through joins or not, it can be difficult to tell exactly which table a field is coming from, if you do not show the table in the select statements.
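A quick way to see the ambiguity rule in action is the sketch below, which uses Python's built-in sqlite3 module with a cut-down version of the two tables from the question. The sample row is invented, and the exact error text varies between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customers VALUES (1, 'Alfreds');
    INSERT INTO Orders VALUES (100, 1, '1996-07-04');
""")

join = " FROM Orders INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID"

# OrderID and CustomerName each exist in only one table,
# so the unqualified names resolve without any prefix.
unqualified_ok = conn.execute("SELECT OrderID, CustomerName" + join).fetchall()

# CustomerID exists in both tables, so the unqualified name is rejected.
try:
    conn.execute("SELECT CustomerID" + join)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualifying the column with its table name removes the ambiguity.
qualified = conn.execute("SELECT Orders.CustomerID" + join).fetchall()

print(unqualified_ok, error_message, qualified)
```

So the prefix is optional for OrderID and OrderDate in the original example only because no other joined table has columns with those names. Writing it anyway protects the query if such a column is added later.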

SQL query on Northwind multiple tables

From the Northwind database I want to get the total revenue generated by employee sales
Employee -> Orders -> "Order Details"
I am not sure if my solution gives the right data (it was partly guessing)
SELECT
Employees.FirstName, Employees.LastName,
SUM(CONVERT(MONEY, ("Order Details".UnitPrice * Quantity * (1 - Discount) / 100)) * 100) AS ExtendedPrice
FROM
((Orders
INNER JOIN
"Order Details" ON Orders.OrderID = "Order Details".OrderID)
INNER JOIN
Employees ON Orders.EmployeeID = Employees.EmployeeID)
GROUP BY
LastName, FirstName;
Northwind database structure can be found here
Thank you in advance. It would be great to have a nice explanation as well
Chris, this is a pretty good first effort; there are just a few things to change.
You don't need to divide by 100 and then multiply by 100. The discount is already stored as a fraction (for example, 0.05 for 5%), so (1 - Discount) is the right multiplier on its own. Your operation just cuts precision off the numbers. I would avoid doing this too early in a process, as it introduces rounding errors. It is better to keep numbers raw and preserve their precision for as long as you can. It is OK to display numbers as money in the GUI (i.e. to 2 decimals), but not to round them in intermediate calculations.
In SQL Server, table names and field names with spaces are best delimited with [] rather than double quotes. That makes misspellings easier to spot, so use [Order Details].
When grouping and summing, make sure you group by the keys. A name is not a key, so use EmployeeID if you want to group individual employees. In real datasets you may have two employees with the same name, and your code would incorrectly add their sales together.
Try this course/book, it is a good intro to querying databases. https://www.microsoft.com/en-au/learning/exam-70-461.aspx
Why does this work? The SELECT syntax is: SELECT [fieldlist] FROM [table] INNER JOIN [jointable] ON [join fields] GROUP BY [grouping fields]. The fieldlist can contain calculations as well as actual field names to display. INNER JOIN means you want only those orders, order details, and employees that have matching data in each other, which is correct in your scenario. [table] and [jointable] are the actual tables that contain your data in a relational sense.
There is obviously a lot here to learn in one go. I would work through some of the SQL Server querying courses that you can find online.
Here's a revised version of the code:
SELECT Employees.EmployeeID, Employees.FirstName, Employees.LastName, Sum([Order Details].UnitPrice * Quantity * (1 - Discount)) AS ExtendedPrice
FROM Orders
INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
group by Employees.EmployeeID, Employees.FirstName, Employees.LastName
order by Employees.FirstName, Employees.LastName;
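To illustrate why grouping by the key matters, here is a minimal sketch run through Python's built-in sqlite3 module. It uses a stripped-down, hypothetical stand-in for the three Northwind tables, with two different employees who happen to share a name.

```python
import sqlite3

# Hypothetical mini-schema: two distinct employees who share the same name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    CREATE TABLE [Order Details] (OrderID INTEGER, UnitPrice REAL, Quantity INTEGER, Discount REAL);
    INSERT INTO Employees VALUES (1, 'Ann', 'Lee'), (2, 'Ann', 'Lee');
    INSERT INTO Orders VALUES (10, 1), (11, 2);
    INSERT INTO [Order Details] VALUES (10, 10.0, 2, 0.0), (11, 20.0, 1, 0.5);
""")

joins = """
    FROM Orders
         INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
         INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
"""
revenue = "SUM([Order Details].UnitPrice * Quantity * (1 - Discount)) AS ExtendedPrice"

# Grouping by name only: both employees collapse into a single row.
by_name = conn.execute(
    "SELECT Employees.FirstName, Employees.LastName, " + revenue + joins +
    " GROUP BY Employees.LastName, Employees.FirstName"
).fetchall()

# Grouping by the key (plus the name columns for display): one row per employee.
by_key = conn.execute(
    "SELECT Employees.EmployeeID, Employees.FirstName, Employees.LastName, " + revenue + joins +
    " GROUP BY Employees.EmployeeID, Employees.FirstName, Employees.LastName"
    " ORDER BY Employees.EmployeeID"
).fetchall()

print(by_name)
print(by_key)
```

Grouping by name reports a single Ann Lee with 30.0 in sales, while grouping by EmployeeID keeps the two people apart with 20.0 and 10.0.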

Writing a query that can be used to find which orders shipped to a different country from the customer

So I am currently in a Database 2 class at my University. We are using the Northwinds db. I haven't used SQL in a few years so I am a little rusty.
I changed a few of the pieces in the Orders table so that instead of 'Germany' it was 'Tahiti'. Now I need to write a query to find which orders shipped where.
I know that I will need to use a JOIN but I am not exactly sure how. I have gone to W3Schools and looked at the Joins SQL page but still haven't found the correct answer I am looking for.
This is what I have currently (which I am also not sure if it is correct):
SELECT Customers.Country
FROM Customer
WHERE Customer.Country = 'Germany'
INNER JOIN
SELECT Orders.ShipCountry
FROM Orders
WHERE Orders.ShipCountry = 'Tahiti'
So if anyone could give me help I would really appreciate it.
EDIT
So this is the actual question I was given which I think is also kind of poorly worded.
"Suspicious e-commerce transactions include orders placed by a customer in one country that are shipped to another country. In fact there are no such orders in the Northwind db, so create a few by modifying some of the "Germany" shipcountry entries in the orders table to "Tahiti". Then write a query which finds orders shipped to a different country from the customer. Hint: in order to do this, you will need to join the Customers and Orders table."
Is this what you are looking for:
SELECT *
FROM Customer AS C
INNER JOIN
Orders AS O
ON C.CustomerID = O.CustomerID
WHERE C.Country = 'Germany' AND O.ShipCountry = 'Tahiti';
The above query is based on the Schema as defined in CodePlex
I hope this helps:
SELECT Orders.*
FROM Orders -- table name; an alias can also be used
INNER JOIN Customer -- table name; an alias can also be used
ON Orders.CustomerID = Customer.CustomerID -- the join must name the columns that relate the two tables
WHERE Orders.ShipCountry <> Customer.Country -- keep only orders shipped to a different country from the customer
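For the assignment as quoted (orders shipped to a different country from the customer), a general version does not need to hard-code 'Germany' or 'Tahiti' at all. Here is a minimal sketch using Python's built-in sqlite3 module. The table is named Customers here as an assumption, the columns are a small subset, and the rows are invented.

```python
import sqlite3

# Hypothetical subset of the Customers and Orders tables, with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Country TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, ShipCountry TEXT);
    INSERT INTO Customers VALUES ('ALFKI', 'Germany'), ('BONAP', 'France');
    INSERT INTO Orders VALUES (1, 'ALFKI', 'Germany'),
                              (2, 'ALFKI', 'Tahiti'),
                              (3, 'BONAP', 'France');
""")

# Join each order to its own customer, then compare the two country columns.
suspicious = conn.execute("""
    SELECT o.OrderID, c.Country, o.ShipCountry
    FROM Orders o
         INNER JOIN Customers c ON o.CustomerID = c.CustomerID
    WHERE o.ShipCountry <> c.Country
    ORDER BY o.OrderID
""").fetchall()

print(suspicious)
```

Only order 2 comes back, because it is the only one whose ship country differs from its customer's country.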

I am getting too many solutions when I need only one

I use a query that returns the name of the city with the highest number of orders placed.
This is what I have:
SELECT MAX(o.OrderID) AS [Number of Orders], od.ShipCity
FROM Orders o, [Order Details] od
GROUP BY o.ShipCity
ORDER BY [Number of Orders] DESC
I got all of the cities and their orders instead of just the one city with the most orders.
What happened?
Yeah, there's a couple of things wrong with your query. First, you're getting the max order id , which is presumably some autoincrement column. It's like Karl's answer is first, mine is second, SELECT MAX(answerid) FROM this_discussion = 2.... but that doesn't mean I have more answers than he does.
Rnofx5 is also right... you need to tell your query what to join ON, because right now it's creating a Cartesian product. If you're not sure what that is, for now accept that it's a horrible, evil, wicked thing to do and then Google it after we're done fixing the query.
So, we have orders and order details. Presumably orders does not contain City, so we need order details
SELECT count(o.OrderID), od.ShipCity
FROM orders AS o
INNER JOIN [Order Details] AS od
ON o.{a variable that both Orders and Order Details have in common} = od.{the same variable}
GROUP BY od.ShipCity
ORDER BY count(o.OrderID) DESC
LIMIT 1;
Okay, so we're joining Orders with Order Details. In order to do that, we need to associate every order with something in Order Details. I don't know your schema, but from the sounds of it probably each order has a corresponding record in Order Details. In that case, you join these two tables using their ID. Something like
ON o.OrderID = od.OrderID
Now, we are counting all of the orders associated with a particular city, and we are sorting them by our count in descending order. Then we keep only the very first record that's returned (LIMIT 1).
Depending on your SQL implementation, you may need TOP 1 instead of LIMIT 1. You tagged mysqli, so presumably this is MySQL, and in that case you'd want LIMIT, not TOP. But be aware that this is a syntax variation you may encounter at some point.
You are getting the highest OrderID (MAX) per ship city rather than the count.
You instead need COUNT(o.OrderID)
And in MySQL you need to use LIMIT 1 on the end to get only the top most result.
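The three problems discussed in these answers (the missing join condition, MAX versus COUNT, and keeping only the top row) can be reproduced on a toy data set. The sketch below uses Python's built-in sqlite3 module, which also uses LIMIT. It assumes ShipCity lives on the Orders table, as the GROUP BY in the question suggests, and the rows are invented.

```python
import sqlite3

# Hypothetical data: Berlin has three orders, Lyon has one (with the highest id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ShipCity TEXT);
    CREATE TABLE [Order Details] (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO Orders VALUES (1, 'Berlin'), (2, 'Berlin'), (3, 'Berlin'), (4, 'Lyon');
    INSERT INTO [Order Details] VALUES (1, 1), (1, 2), (2, 1), (3, 1), (4, 1);
""")

# No join condition: every order is paired with every detail row (4 * 5 rows).
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM Orders o, [Order Details] od"
).fetchone()[0]

# MAX(OrderID) is just the largest id per city; it says nothing about volume.
by_max = conn.execute("""
    SELECT ShipCity, MAX(OrderID)
    FROM Orders
    GROUP BY ShipCity
    ORDER BY MAX(OrderID) DESC
""").fetchall()

# COUNT gives the number of orders per city; LIMIT 1 keeps only the top row.
top_city = conn.execute("""
    SELECT ShipCity, COUNT(OrderID) AS NumberOfOrders
    FROM Orders
    GROUP BY ShipCity
    ORDER BY NumberOfOrders DESC
    LIMIT 1
""").fetchone()

print(cross_rows, by_max, top_city)
```

Sorting by MAX puts Lyon first even though it has a single order, while COUNT with LIMIT 1 returns Berlin with 3.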

When do you give up set operations in SQL and go procedural?

I was once given this task to do in an RDBMS:
Given tables customer, order, orderlines and product. Everything done with the usual fields and relationships, with a comment memo field on the orderline table.
For one customer retrieve a list of all products that customer has ever ordered with product name, year of first purchase, dates of three last purchases, comment of the latest order, sum of total income for that product-customer combination last 12 months.
After a couple of days I gave up doing it as a Query and opted to just fetch every orderline for a customer, and every product and run through the data procedurally to build the required table clientside.
I regard this a symptom of one or more of the following:
I'm a lazy idiot and should have seen how to do it in SQL
Set operations are not as expressive as procedural operations
SQL is not as expressive as it should be
Did I do the right thing? Did I have other options?
You definitely should be able to do this exercise without doing the work equivalent to a JOIN in application code, i.e. by fetching all rows from both orderlines and products and iterating through them. You don't have to be an SQL wizard to do that one. JOIN is to SQL what a loop is to a procedural language -- in that both are fundamental language features that you should know how to use.
One trap people fall into is thinking that the whole report has to be produced in a single SQL query. Not true! Most reports don't fit into a rectangle, as Tony Andrews points out. There are lots of rollups, summaries, special cases, etc., so it's both simpler and more efficient to fetch parts of the report in separate queries. Likewise, in a procedural language you wouldn't try to do all your computation in a single line of code, or even in a single function (hopefully).
Some reporting tools insist that a report is generated from a single query, and you have no opportunity to merge in multiple queries. If so, then you need to produce multiple reports (and if the boss wants it on one page, then you need to do some paste-up manually).
To get a list of all products ordered (with product name), dates of last three purchases, and comment on latest order is straightforward:
SELECT o.*, l.*, p.*
FROM Orders o
JOIN OrderLines l USING (order_id)
JOIN Products p USING (product_id)
WHERE o.customer_id = ?
ORDER BY o.order_date;
It's fine to iterate over the result row-by-row to extract the dates and comments on the latest orders, since you're fetching those rows anyway. But make it easy on yourself by asking the database to return the results sorted by date.
Year of first purchase is available from the previous query, if you sort by the order_date and fetch the result row-by-row, you'll have access to the first order. Otherwise, you can do it this way:
SELECT YEAR(MIN(o.order_date)) FROM Orders o WHERE o.customer_id = ?;
Sum of product purchases for the last 12 months is best calculated by a separate query:
SELECT SUM(l.quantity * p.price)
FROM Orders o
JOIN OrderLines l USING (order_id)
JOIN Products p USING (product_id)
WHERE o.customer_id = ?
AND o.order_date > CURDATE() - INTERVAL 1 YEAR;
edit: You said in another comment that you'd like to see how to get the dates of the last three purchases in standard SQL:
SELECT o1.order_date
FROM Orders o1
LEFT OUTER JOIN Orders o2
ON (o1.customer_id = o2.customer_id AND (o1.order_date < o2.order_date
OR (o1.order_date = o2.order_date AND o1.order_id < o2.order_id)))
WHERE o1.customer_id = ?
GROUP BY o1.order_id, o1.order_date
-- count only real later orders; COUNT(*) would also count the NULL row of the latest order
HAVING COUNT(o2.order_id) < 3;
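The self-join technique can be checked against a plain ORDER BY ... LIMIT on a toy table. This is a sketch using Python's built-in sqlite3 module with invented rows, including two orders on the same date to exercise the tie-break on order_id. It counts o2.order_id rather than *, so that the NULL-extended row of the most recent order (which has no later order to match) is not counted as one.

```python
import sqlite3

# Hypothetical orders: customer 7 has five orders, two of them on the same date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO Orders VALUES
        (1, 7, '2024-01-05'), (2, 7, '2024-02-10'), (3, 7, '2024-02-10'),
        (4, 7, '2024-03-01'), (5, 7, '2024-04-20'), (6, 8, '2024-05-01');
""")

# For each order, count how many orders of the same customer come after it;
# the last three purchases are the ones with fewer than three later orders.
last_three = conn.execute("""
    SELECT o1.order_id, o1.order_date
    FROM Orders o1
         LEFT OUTER JOIN Orders o2
           ON  o1.customer_id = o2.customer_id
           AND (o1.order_date < o2.order_date
                OR (o1.order_date = o2.order_date AND o1.order_id < o2.order_id))
    WHERE o1.customer_id = ?
    GROUP BY o1.order_id, o1.order_date
    HAVING COUNT(o2.order_id) < 3
    ORDER BY o1.order_date DESC, o1.order_id DESC
""", (7,)).fetchall()

# Vendor-specific shortcut for comparison.
with_limit = conn.execute("""
    SELECT order_id, order_date
    FROM Orders
    WHERE customer_id = ?
    ORDER BY order_date DESC, order_id DESC
    LIMIT 3
""", (7,)).fetchall()

print(last_three)
print(with_limit)
```

Both queries return orders 5, 4, and 3 for customer 7.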
If you can use a wee bit of vendor-specific SQL features, you can use Microsoft/Sybase TOP n, or MySQL/PostgreSQL LIMIT:
SELECT TOP 3 order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC;
SELECT order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC
LIMIT 3;
Set operations are not as expressive as procedural operations
Perhaps more like: "Set operations are not as familiar as procedural operations to a developer used to procedural languages" ;-)
Doing it iteratively as you have done now is fine for small sets of data, but simply doesn't scale the same way. The answer to whether you did the right thing depends on whether you are satisfied with the performance right now and/or don't expect the amount of data to increase much.
If you could provide some sample code, we might be able to help you find a set-based solution, which will be faster to begin with and scale far, far better. As GalacticCowboy mentioned, techniques such as temporary tables can help make the statements far more readable while largely retaining the performance benefits.
In most RDBMS you have the option of temporary tables or local table variables that you can use to break up a task like this into manageable chunks.
I don't see any way to easily do this as a single query (without some nasty subqueries), but it still should be doable without dropping out to procedural code, if you use temp tables.
This problem may not have been solvable by one query. I see several distinct parts...
For one customer
Get a list of all products ordered (with product name)
Get year of first purchase
Get dates of last three purchases
Get comment on latest order
Get sum of product purchases for the last 12 months
Your procedure is steps 1 - 5 and SQL gets you the data.
Sounds like a data warehouse project to me. If you need things like "three most recent things" and "sum of something over the last 12 months" then store them i.e. denormalize.
EDIT: This is a completely new take on the solution, using no temp tables or strange sub-sub-sub queries. However, it will ONLY work on SQL Server 2005 or newer, as it uses the PIVOT operator that is new in that version.
The fundamental problem is the desired pivot from a set of rows (in the data) into columns in the output. While noodling on the issue, I recalled that SQL Server now has a PIVOT operator to deal with this.
This requires SQL Server 2005 or newer and uses the Northwind sample data.
-- This could be a parameter to a stored procedure
-- I picked this one because he has products that he ordered 4 or more times
declare @customerId nchar(5)
set @customerId = 'ERNSH'
select c.CustomerID, p.ProductName, products_ordered_by_cust.FirstOrderYear,
    latest_order_dates_pivot.LatestOrder1 as LatestOrderDate,
    latest_order_dates_pivot.LatestOrder2 as SecondLatestOrderDate,
    latest_order_dates_pivot.LatestOrder3 as ThirdLatestOrderDate,
    'If I had a comment field it would go here' as LatestOrderComment,
    isnull(last_year_revenue_sum.ItemGrandTotal, 0) as LastYearIncome
from
    -- Find all products ordered by customer, along with first year product was ordered
    (
        select c.CustomerID, od.ProductID,
            datepart(year, min(o.OrderDate)) as FirstOrderYear
        from Customers c
            join Orders o on o.CustomerID = c.CustomerID
            join [Order Details] od on od.OrderID = o.OrderID
        group by c.CustomerID, od.ProductID
    ) products_ordered_by_cust
    -- Find the grand total for product purchased within last year - note fudged date below (Northwind)
    join (
        select o.CustomerID, od.ProductID,
            sum(cast(round((od.UnitPrice * od.Quantity) - ((od.UnitPrice * od.Quantity) * od.Discount), 2) as money)) as ItemGrandTotal
        from
            Orders o
            join [Order Details] od on od.OrderID = o.OrderID
        -- The Northwind database only contains orders from 1998 and earlier, otherwise I would just use getdate()
        where datediff(yy, o.OrderDate, dateadd(year, -10, getdate())) = 0
        group by o.CustomerID, od.ProductID
    ) last_year_revenue_sum on last_year_revenue_sum.CustomerID = products_ordered_by_cust.CustomerID
        and last_year_revenue_sum.ProductID = products_ordered_by_cust.ProductID
    -- THIS is where the magic happens. I will walk through the individual pieces for you
    join (
        select CustomerID, ProductID,
            max([1]) as LatestOrder1,
            max([2]) as LatestOrder2,
            max([3]) as LatestOrder3
        from
            (
                -- For all orders matching the customer and product, assign them a row number based on the order date, descending
                -- So, the most recent is row # 1, next is row # 2, etc.
                select o.CustomerID, od.ProductID, o.OrderID, o.OrderDate,
                    row_number() over (partition by o.CustomerID, od.ProductID order by o.OrderDate desc) as RowNumber
                from Orders o join [Order Details] od on o.OrderID = od.OrderID
            ) src
            -- Now, produce a pivot table that contains the first three row #s from our result table,
            -- pivoted into columns by customer and product
            pivot
            (
                max(OrderDate)
                for RowNumber in ([1], [2], [3])
            ) as pvt
        group by CustomerID, ProductID
    ) latest_order_dates_pivot on products_ordered_by_cust.CustomerID = latest_order_dates_pivot.CustomerID
        and products_ordered_by_cust.ProductID = latest_order_dates_pivot.ProductID
    -- Finally, join back to our other tables to get more details
    join Customers c on c.CustomerID = products_ordered_by_cust.CustomerID
    join Orders o on o.CustomerID = products_ordered_by_cust.CustomerID and o.OrderDate = latest_order_dates_pivot.LatestOrder1
    join [Order Details] od on od.OrderID = o.OrderID and od.ProductID = products_ordered_by_cust.ProductID
    join Products p on p.ProductID = products_ordered_by_cust.ProductID
where c.CustomerID = @customerId
order by CustomerID, p.ProductID
SQL queries return results in the form of a single "flat" table of rows and columns. Reporting requirements are often more complex than this, demanding a "jagged" set of results like your example. There is nothing wrong with "going procedural" to solve such requirements, or using a reporting tool that sits on top of the database. However, you should use SQL as far as possible to get the best performance from the database.