SQL command to filter based on multiple tables and criteria - sql

I am trying to learn sql, its driving me nuts. I cannot seem to grasp the proper syntax to achieve my desired output. I am watching videos on udemy and reading books on basic sql trying to teach myself, but it seems they all fall short in helping me bridge this gap I seem to not be able to over come.
I have a pretty good handle on the basics of the SELECT, FROM, WHEN commands. I seem to be gaining knowledge on using aggregate functions, but I am by no means an expert.
I have two tables, "Orders" and "OrderDet". "Orders" contains the CustomerName and the OrderNo, and OrderDet contains everything else, like PartNo, DateFinished, OrderNo, etc.
I have a situation where I can have multiple customers order the same part number. I want to show all the last orders all customers placed.
For example
SELECT Orders.CustDesc, OrderDet.OrderNo, OrderDet.PartNo, OrderDet.DateFinished
FROM Orders
JOIN OrderDet ON Orders.OrderNo = OrderDet.OrderNo
ORDER BY OrderDet.PartNo, OrderDet.DateFinished
This query returns:
Customer OrderNo PartNo Date Finished
--------------------------------------------------------
Cust 1 5032 12345678-1 NULL
Cust 2 10032 12345678-1 2019-06-05 14:54:25.853
Cust 2 1048 12345678-1 2019-07-08 00:00:00.000
Cust 1 5028 12345678-1 2019-09-30 11:45:45.960
Cust 1 5029 12345678-1 2019-09-30 12:49:35.713
Cust 1 5030 12345678-1 2019-09-30 13:04:57.333
Cust 1 5031 12345678-1 2019-10-10 13:58:22.653
I'm still learning when and how to use aggregate function but seem to not be able to fully grasp the concept. I tried to use a MAX on the Date column and GROUP BY the Customer and PartNo, but unless I remove the Order Number, the output never collapses down to what I want.
For example I used:
SELECT Orders.CustDesc, OrderDet.PartNo, MAX(OrderDet.DateFinished)
FROM Orders
JOIN OrderDet ON Orders.OrderNo = OrderDet.OrderNo
GROUP BY Orders.CustDesc, OrderDet.PartNo
ORDER BY OrderDet.PartNo
Removing OrderDet.OrderNo from SELECT, and OrderDet.DateFinished from the Order By.
This returns the row output I desire, but lacking all the columns I want.
Customer PartNo Date Finished
--------------------------------------------
Cust 2 12345678-1 2019-07-08 00:00:00.000
Cust 1 12345678-1 2019-10-10 13:58:22.653
As soon as I try and add the OrderNo back into the mix, I get the same output as the first. I think I understand why this is happening because all the OrderNo's are unique and cannot get grouped, but I cant grasp how to over come this.
I understand this is a basic SQL command but I cannot seem to understand how to get the output I desire. In this example I wanted to only see the two rows of unique Customers based on the last date the PartNo was finished, but have the entire rows contents shown. Not just three columns.
Again, I am trying to learn this stuff and I can only read and re-read the same basic content to learn how to do this for so long. Everything I read seems to lack the info my brain seems to require for that "AH HA" moment.
Perhaps someone could help bridge this gap?

I am interpreting your question as wanting the most recent order for a given customer for each part that customer has ordered.
For this, I would recommend window functions:
select CustDesc, OrderNo, od.DateFinished
from (select o.custdesc, od.orderno, od.partno, od.datefinished,
row_number() over (partition by o.custdesc, od.partno order by od.datefinished desc) as seqnum
from Orders o join
orderdet od
on o.OrderNo = od.OrderNo
) od
where seqnum = 1;
order by od.PartNo, od.DateFinished

Related

POSTGRESQL - Finding specific product when

I've attempted to write a query but I've not managed to get it working correctly.
I'm attempting to retrieve where a specific product has been bought but where it also has been bought with other products. In the case below, I want to find where product A01 has been bought but also when it was bought with other products.
Data (extracted from tables for illustration):
Order | Product
123456 | A01
123457 | A01
123457 | B02
123458 | C03
123459 | A01
123459 | C03
Query which will return all orders with product A01 without showing other products:
SELECT
O.NUMBER
O.DATE
P.NUMBER
FROM
ORDERS O
JOIN PRODUCTS P on P.ID = O.ID
WHERE
P.NUMBER = 'A01'
I've tried to create a sub query which brings back just orders of product A01 but I don't know how to place it in the query for it to return all orders containing product A01 as well as any other product ordered with it.
Any help on this would be very grateful.
Thanks in advance.
You can use conditional SUM to detect if one ORDER group have one ore more 'A01'
CREATE TABLE orders
("Order" int, "Product" varchar(3))
;
INSERT INTO orders
("Order", "Product")
VALUES
(123456, 'A01'),
(123457, 'A01'),
(123457, 'B02'),
(123458, 'C03'),
(123459, 'A01'),
(123459, 'C03')
;
SELECT "Order"
FROM orders
GROUP BY "Order"
HAVING SUM(CASE WHEN "Product" = 'A01' THEN 1 ELSE 0 END) > 0
I appreciated Juan's including the DDL to create the database on my system. By the time I saw it, I'd already done all the same work, except that I got around the reserved word problem by naming that field Order1.
Sadly, I didn't consider that either of the offered queries worked on my system. I used MySQL.
The first one returned the A01 lines of the two orders on which other products were ordered too. I took Alex's purpose to include seeing all items of all orders that included A01. (Perhaps he wants to tell future customers what other products other customers have ordered with A01, and generate sales that way.)
The second one returned the three A01 lines.
Maybe Alex wants:
select *
from orders
where Order1 in (select Order1
from orders
where Product = 'A01')
It outputs all lines of all orders that include A01. The subquery makes a list of all orders with A01. The first query returns all lines of those orders.
In a big database, you might not want to run two queries, but this is the only way I see to get the result I understood Alex wanted. If that is what he wanted, he would have to run a second query once armed with output from the queries offered, so there's no real gain.
Good discussion. Thanks to all!
Use GROUP BY clause along with HAVING like
select "order", Product
from data
group by "order"
having count(distinct product) > 1;

DB2 Select from two tables when one table requires sum

In a DB2 Database, I want to do the following simple mathematics using a SQL query:
AvailableStock = SupplyStock - DemandStock
SupplyStock is stored in 1 table in 1 row, let's call this table the Supply table.
So the Supply table has this data:
ProductID | SupplyStock
---------------------
109 10
244 7 edit: exclude this product from the search
DemandStock is stored in a separate table Demand, where demand is logged as each customer logs demand during a customer order journey. Example data from the Demand table:
ProductID | DemandStock
------------------------
109 1
244 4 edit: exclude this product
109 6
109 2
So in our heads, if I want to calculate the AvailableStock for product '109', Supply is 10, Demand for product 109 totals to 9, and so Available stock is 1.
How do I do this in one select query in DB2 SQL?
The knowledge I have so far of some of the imagined steps in PseudoCode:
I select SupplyStock where product ID = '109'
I select sum(DemandStock) where product ID = '109'
I subtract SupplyStock from DemandStock
I present this as a resulting AvailableStock
The results will look like this:
Product ID | AvailableStock
109 9
I'd love to get this selected in one SQL select query.
Edit: I've since received an answer (that was almost perfect) and realised the question missed out some information.
This information:
We need to exclude data from products we don't want to select data for, and we also need to specifically select product 109.
My apologies, this was omitted from the original question.
I've since added a 'where' to select the product and this works for me. But for future sake, perhaps the answer should include this information too.
You do this using a join to bring the tables together and group by to aggregate the results of the join:
select s.ProductId, s.SupplyStock, sum(d.DemandStock),
(s.SupplyStock - sum(d.DemandStock)) as Available
from Supply s left join
Demand d
on s.ProductId = d.ProductId
where s.ProductId = 109
group by s.ProductId, s.SupplyStock;

Retrieving the latest transaction for an item

I have a table that lists lots of item transactions. I need to get the last record for each item (the one dated the latest).
For example my table looks like this:
Item Date TrxLocation
XXXXX 1/1/13 WAREHOUSE
XXXXX 1/2/13 WAREHOUSE
XXXXX 1/3/13 WAREHOUSE
aaaa 1/1/13 warehouse
aaaa 2/1/13 WAREHOUSE
I want the data to come back as follows:
XXXXX 1/3/13 WAREHOUSE
AAAA 2/1/13 WAREHOUSE
I tried doing something like this but it is bringing back the wrong date
select Distinct ITEMNMBR
TRXLOCTN,
DATERECD
from TEST
where DateRecd = (select max(DATERECD)
from TEST)
Any help is appreciated.
You're on the right track. You just need to change your subquery to a correlated subquery, which means that you give it some context to the outer query. If you just run your subquery (select max(DATERECD) from TEST) by itself, what do you get? You get a single date that is the latest date in the whole table, regardless of item. You need to tie the subquery to the outer query by linking on the ITEMNMBR column, like this:
SELECT ITEMNMBR, TRXLOCTN, DATERECD
FROM TEST t
WHERE DateRecd = (
SELECT MAX (DATERECD)
FROM TEST tMax
WHERE tMax.ITEMNMBR = t.ITEMNMBR)
No need for subquery. You are querying a single table and need to select MAX(date) and GROUP BY item and TrxLocation.
SELECT Item, max(DATERECD) AS max_dt_recd, TrxLocation
FROM test
GROUP BY Item, TrxLocation
/

SQL SUM with Repeating Sub Entries - Best Practice?

I hit this issue regularly but here is an example....
I have a Order and Delivery Tables. Each order can have one to many Deliveries.
I need to report totals based on the Order Table but also show deliveries line by line.
I can write the SQL and associated Access Report for this with ease ....
SELECT xxx
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
until I get to the summing element. I obviously only want to sum each Order once, not the 1-many times there are deliveries for that order.
e.g. The SQL might return the following based on 2 Orders (ignore the banalness of the report, this is very much simplified)
Region OrderNo Value Delivery Date
North 1 £100 12-04-2012
North 1 £100 14-04-2012
North 2 £73 01-05-2012
North 2 £73 03-05-2012
North 2 £73 07-05-2012
South 3 £50 23-04-2012
I would want to report:
Total Sales North - £173
Delivery 12-04-2012
Delivery 14-04-2012
Delivery 01-05-2012
Delivery 03-05-2012
Delivery 07-05-2012
Total Sales South - £50
Delivery 23-04-2012
The bit I'm referring to is the calculation of the £173 and £50 which the first of which obviously shouldn't be £419!
In the past I've used things like MAX (for a given Order) but that seems like a fudge.
Surely there must be a regular answer to this seemingly common problem but I can't find one.
I don't necessarily need the code - just a helpful point in the right direction.
Many thanks,
Chris.
A roll up operator may not look pretty. However, it would do the regular aggregates that you see now, and it show the subtotals of the order. This is what you're looking for.
SELECT xxx
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
GROUP BY xxx
WITH ROLLUP;
I'm not exactly sure how the rest of your query is set up, but it would look something like this:
Region OrderNo Value Delivery Date
North 1 £100 12-04-2012
North 1 £100 14-04-2012
North 2 £73 01-05-2012
North 2 £73 03-05-2012
North 2 £73 07-05-2012
NULL NULL f419 NULL
I believe what you want is called a windowing function for your aggregate operation. It looks like the following:
SELECT xxx, SUM(Value) OVER (PARTITION BY Order.Region) as OrderTotal
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
Here's the MSDN article. The PARTITION BY tells the SUM to be done separately for each distinct Order.Region.
Edit: I just noticed that I missed what you said about orders being counted multiple times. One thing you could do is SUM() the values before joining, as a CTE (guessing at your schema a bit):
WITH RegionOrders AS (
SELECT Region, OrderNo, SUM(Value) OVER (PARTITION BY Region) AS RegionTotal
FROM Order
)
SELECT Region, OrderNo, Value, DeliveryDate, RegionTotal
FROM RegionOrders RO
INNER JOIN Delivery D on D.OrderNo = RO.OrderNo

Use Access SQL to do a grouped ranking

How do I rank salespeople by # customers grouped by department (with ties included)?
For example, given this table, I want to create the Rank column on the right. How should I do this in Access?
SalesPerson Dept #Customers Rank
Bill DeptA 20 1
Ted DeptA 30 2
Jane DeptA 40 3
Bill DeptB 50 1
Mary DeptB 60 2
I already know how to do a simple ranking with this SQL code. But I don't know how to rework this to accept grouping.
Select Count(*) from [Tbl] Where [#Customers] < [Tblx]![#Customers] )+1
Also, there's plenty of answers for this using SQL Server's Rank() function, but I need to do this in Access. Suggestions, please?
SELECT *, (select count(*) from tbl as tbl2 where
tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl
Just add the dept field to the subquery...
Great solution with subquery! Except for huge recordsets, the subquery solution gets very slow. Its better(quicker) to use a Self JOIN, look at the folowing solution: self join
SELECT tbl1.SalesPerson , count(*) AS Rank
FROM tbl AS tbl1 INNER JOIN tbl AS tbl2 ON tbl1.DEPT = tbl2.DEPT
AND tbl1.#Customers < tbl2.#Customers
GROUP BY tbl1.SalesPerson
I know this is an old thread. But since I spent a great deal of time on a very similar problem and was greatly helped by the former answers given here, I would like to share what I have found to be a MUCH faster way. (Beware, it is more complicated.)
First make another table called "Individualizer". This will have one field containing a list of numbers 1 through the-highest-rank-that-you-need.
Next create a VBA module and paste this into it:
'Global Declarations Section.
Option Explicit
Global Cntr
'*************************************************************
' Function: Qcntr()
'
' Purpose: This function will increment and return a dynamic
' counter. This function should be called from a query.
'*************************************************************
Function QCntr(x) As Long
Cntr = Cntr + 1
QCntr = Cntr
End Function
'**************************************************************
' Function: SetToZero()
'
' Purpose: This function will reset the global Cntr to 0. This
' function should be called each time before running a query
' containing the Qcntr() function.
'**************************************************************
Function SetToZero()
Cntr = 0
End Function
Save it as Module1.
Next, create Query1 like this:
SELECT Table1.Dept, Count(Table1.Salesperson) AS CountOfSalesperson
FROM Table1
GROUP BY Table1.Dept;
Create a MakeTable query called Query2 like this:
SELECT SetToZero() AS Expr1, QCntr([ID]) AS Rank, Query1.Dept,
Query1.CountOfSalesperson, Individualizer.ID
INTO Qtable1
FROM Query1
INNER JOIN Individualizer
ON Query1.CountOfSalesperson >= Individualizer.ID;
Create another MakeTable query called Query3 like this:
SELECT SetToZero() AS Expr1, QCntr([Identifier]) AS Rank,
[Salesperson] & [Dept] & [#Customers] AS Identifier, Table1.Salesperson,
Table1.Dept, Table1.[#Customers]
INTO Qtable2
FROM Table1;
If you have another field already that uniquely identifies every row you wouldn't need to create an Identifier field.
Run Query2 and Query3 to create the tables.
Create a fourth query called Query4 like this:
SELECT Qtable2.Salesperson, Qtable2.Dept, Qtable2.[#Customers], Qtable1.ID AS Rank
FROM Qtable1
INNER JOIN Qtable2 ON Qtable1.Rank = Qtable2.Rank;
Query4 returns the result you are looking for.
Practically, you would want to write a VBA function to run Query2 and Query3 and then call that function from a button placed in a convenient location.
Now I know this sounds ridiculously complicated for the example you gave. But in real life, I am sure your table is more complicated than this. Hopefully my examples can be applied to your actual situation. In my database with over 12,000 records this method is by FAR the fastest (as in: 6 seconds with 12,000 records compared to over 1 minute with 262 records ranked with the subquery method).
The real secret for me was the MakeTable query because this ranking method is useless unless you immediately output the results to a table. But, this does limit the situations that it can be applied to.
P.S. I forgot to mention that in my database I was not pulling results directly from a table. The records had already gone through a string of queries and multiple calculations before they needed to be ranked. This probably contributed greatly to the huge difference in speed between the two methods in my situation. If you are pulling records directly from a table, you might not notice nearly as big an improvement.
You need to do some math. I typically take advantage of the combination of a counter field and an "offset" field. You're aiming for a table which looks like this (#Customers isn't necessary, but will give you a visual that you're doing it properly):
SalesPerson Dept #Customers Ctr Offset
Bill DeptA 20 1 1
Ted DeptA 30 2 1
Jane DeptA 40 3 1
Bill DeptB 50 4 4
Mary DeptB 60 5 4
So, to give rank, you'd do [Ctr]-[Offset]+1 AS Rank
build a table with SalesPerson, Dept, Ctr, and Offset
insert into that table, ordered by Dept and #Customers (so that they're all sorted properly)
Update Offset to be the MIN(Ctr), grouping on Dept
Perform your math calculation to determine Rank
Clear out the table so you're ready to use it again next time.
To add to this and any other related Access Ranking or Rank Tie Breaker how-tos for other versions of Access, ranking should not be performed on crosstab queries if your FROM clause happens to NOT contain a table but a query that is either a crosstab query or a query that contains within it elsewhere a crosstab query.
The code referenced above where a SELECT statement within a SELECT statment is used (sub query),
"SELECT *, (select count(*) from tbl as tbl2 where tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl"
will not work and will always fail expressing a error on portion of the code where "tbl.customers > tbl2.customers" cannot be found.
In my situation on a past project, I was referencing a query instead of a table and within that query I had referenced a crosstab query thus failing and producing an error. I was able to resolve this by creating a table from the crosstab query first, and when I referenced the newly created table in the FROM clause, it started working for me.
So in final, normally you can reference a query or table in the FROM clause of the SELECT statement as what was shared previously above to do ranking, but be carefull as to if you are referencing a query instead of a table, that query must Not be a crosstab query or reference another query that is a crosstab query.
Hope this helps anyone else that may have had problems looking for a possible reason if you happen to reference the statements above and you are not referencing a table in your FROM clause within your own project. Also, performing subqueries on aliases with crosstab queries in Access probably isn't good idea or best practice either so stray away from that if/when possible.
If you found this useful, and wish that Access would allow the use of a scrolling mouse in a passthru query editor, give me a like please.
I normally pick tips and ideas from here and sometimes end up building amazing things from it!
Today, (well let’s say for the past one week), I have been tinkering with Ranking of data in Access and to the best of my ability, I did not anticipate what I was going to do something so complex as to take me a week to figure it out! I picked titbits from two main sites:
https://usefulgyaan.wordpress.com/2013/04/23/ranking-in-ms-access/ (seen that clever ‘>=’ part, and the self joins? Amazing… it helped me to build my solution from just one query, as opposed to the complex method suggested above by asonoftheMighty (not discrediting you… just didn’t want to try it for now; may be when I get to large data I might want to try that as well…)
Right here, from Paul Abott above ( ‘and tbl.dept = tbl2.dept’)… I was lost after ranking because I was placing AND YearID = 1, etc, then the ranking would end up happening only for sub-sets, you guessed right, when YearID = 1! But I had a lot of different scenarios…
Well, I gave that story partly to thank the contributors mentioned, because what I did is to me one of the most complex of the ranking that I think can help you in almost any situation, and since I benefited from others, I would like to share here what I hope may benefit others as well.
Forgive me that I am not able to post my table structures here, it is a lot of related tables. I will only post the query, so if you need to you may develop your tables to end up with that kind of query. But here is my scenario:
You have students in a school. They go through class 1 to 4, can either be in stream A or B, or none when the class is too small. They each take 4 exams (this part is not important now), so you get the total score for my case. That’s it. Huh??
Ok. Lets rank them this way:
We want to know the ranking of
• all students who ever passed through this school (best ever student)
• all students in a particular academic year (student of the year)
• students of a particular class (but remember a student will have passed through all classes, so basically his/her rank in each of those classes for the different years) this is the usual ranking that appears in report cards
• students in their streams (above comment applies)
• I would also like to know the population against which we ranked this student in each category
… all in one table/query. Now you get the point?
(I normally like to do as much of my 'programming' in the database/queries to give me visuals and to reduce the amount of code I will later have to right. I actually won't use this query in my application :), but it let's me know where and how to send my parameters to the query it came from, and what results to expect in my rdlc)
Don't you worry, here it is:
SELECT Sc.StudentID, Sc.StudentName, Sc.Mark,
(SELECT COUNT(Sch.Mark) FROM [StudentScoreRankTermQ] AS Sch WHERE (Sch.Mark >= Sc.Mark)) AS SchoolRank,
(SELECT Count(s.StudentID) FROM StudentScoreRankTermQ AS s) As SchoolTotal,
(SELECT COUNT(Yr.Mark) FROM [StudentScoreRankTermQ] AS Yr WHERE (Yr.Mark >= Sc.Mark) AND (Yr.YearID = Sc.YearID) ) AS YearRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS Yt WHERE (Yt.YearID = Sc.YearID) ) AS YearTotal,
(SELECT COUNT(Cl.Mark) FROM [StudentScoreRankTermQ] AS Cl WHERE (Cl.Mark >= Sc.Mark) AND (Cl.YearID = Sc.YearID) AND (Cl.TermID = Sc.TermID) AND (Cl.ClassID=Sc.ClassID)) AS ClassRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS C WHERE (C.YearID = Sc.YearID) AND (C.TermID = Sc.TermID) AND (C.ClassID = Sc.ClassID) ) AS ClassTotal,
(SELECT COUNT(Str.Mark) FROM [StudentScoreRankTermQ] AS Str WHERE (Str.Mark >= Sc.Mark) AND (Str.YearID = Sc.YearID) AND (Str.TermID = Sc.TermID) AND (Str.ClassID=Sc.ClassID) AND (Str.StreamID = Sc.StreamID) ) AS StreamRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS St WHERE (St.YearID = Sc.YearID) AND (St.TermID = Sc.TermID) AND (St.ClassID = Sc.ClassID) AND (St.StreamID = Sc.StreamID) ) AS StreamTotal,
Sc.CalendarYear, Sc.Term, Sc.ClassNo, Sc.Stream, Sc.StreamID, Sc.YearID, Sc.TermID, Sc.ClassID
FROM StudentScoreRankTermQ AS Sc
ORDER BY Sc.Mark DESC;
You should get something like this:
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| StudentID | StudentName | Mark | SchoolRank | SchoolTotal | YearRank | YearTotal | ClassRank | ClassTotal | StreamRank | StreamTotal | Year | Term | Class | Stream |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| 1 | Jane | 200 | 1 | 20 | 2 | 12 | 1 | 9 | 1 | 5 | 2017 | I | 2 | A |
| 2 | Tom | 199 | 2 | 20 | 1 | 12 | 3 | 9 | 1 | 4 | 2016 | I | 1 | B |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
Use the separators | to reconstruct the result table
Just an idea about the tables, each student will be related to a class. Each class relates to years. Each stream relates to a class. Each term relates to a year. Each exam relates to a term and student and a class and a year; a student can be in class 1A in 2016 and moves on to class 2b in 2017, etc…
Let me also add that this a beta result, I have not tested it well enough and I do not yet have an opportunity to create a lot of data to see the performance. My first glance at it told me that it is good. So if you find reasons or alerts you want to point my way, please do so in comments so I may keep learning!