I know it seems like I am beating a dead horse here, but I have messed with this for an hour, trying all the examples I can find, and nothing seems to work for me. Below is a very dumbed-down version of what I am going after. In my real-world solution I am querying about 14 columns, with 2 joins and only about 3 conditions.
select distinct
d.rental_ticket,
i.Invoice_Number
from HP_View_DEL_Ticket_Header_Master as d
join CSView_INVC_Header_Master as i
on d.Rental_Ticket = i.Rental_Ticket_or_Tag_Number
where d.Ticket_Month <= '6'
and d.Ticket_Year = 2014
order by Rental_Ticket
I get something like this
Rental Invoice
3023 3127
3146 3074
3215 3103
3235 3167
3245 3054 -- dup
3245 3055 -- dup
3249 3081
3251 3214
3255 3102
3261 3099
3267 3098
3276 3056
I know that since I am using distinct with multiple columns, it filters down to all distinct combinations. Like many others, I just need to see the rental number once, no matter how many invoices it has.
In my live query, I am using a condition that looks for a code, CRT. I only want to see one line of data per rental number, no matter whether there is one CRT code present or 10.
I threw this in based on another person's example, but it seemed to do nothing:
where d.Rental_Ticket in (select max(Rental_Ticket) as rental_ticket from HP_View_DEL_Ticket_Header_Master as d group by d.Rental_Ticket)
any help will be greatly appreciated!!
UPDATE:
select d.rental_ticket, max(i.invoice_number) as Invoice_Number,
d.Reference_Location1 as Rig, max(d.Rental_Ticket)
from HP_View_DEL_Ticket_Header_Master as d
join CSView_INVC_Header_Master as i
on d.Rental_Ticket = i.Rental_Ticket_or_Tag_Number
where d.Ticket_Month <= '6'
and d.Ticket_Year = 2014
group by d.Rental_Ticket, d.Reference_Location1
order by Rental_Ticket
This gives me 4 columns, when really I only need 2 (Rental_Ticket and Rig).
Thanks, BD
Replace distinct with group by and that will give you a whole bunch of options:
select d.rental_ticket,
MIN(i.Invoice_Number) as Invoice_Number
from HP_View_DEL_Ticket_Header_Master as d
join CSView_INVC_Header_Master as i
on d.Rental_Ticket = i.Rental_Ticket_or_Tag_Number
where d.Ticket_Month <= '6'
and d.Ticket_Year = 2014
group by d.rental_ticket
order by Rental_Ticket
I would do this:
select d.rental_ticket, MAX(i.Invoice_Number)
from HP_View_DEL_Ticket_Header_Master as d
join CSView_INVC_Header_Master as i
on d.Rental_Ticket = i.Rental_Ticket_or_Tag_Number
where d.Ticket_Month <= '6'
and d.Ticket_Year = 2014
GROUP BY d.rental_ticket
order by d.rental_ticket
Basically you want to get data for each unique Rental Ticket. The problem is that the server knows that you could have several invoice numbers for each rental ticket. So you group by the Rental Ticket to get only unique values.
For all the other columns you need to use an aggregate function: something that takes all those instances of invoice numbers and returns just one for each grouping of rental tickets.
In my example I used MAX, which gives you 3055 as the invoice number for rental ticket 3245.
If you don't want to use Group By then these answers have some alternatives.
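To make the difference between DISTINCT and GROUP BY concrete, here is a minimal sketch that runs both shapes of query against an in-memory SQLite database. The table names tickets and invoices and their columns are hypothetical stand-ins for the views in the question, and only a few of the sample rows are loaded.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the two views in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets  (rental_ticket INTEGER);
    CREATE TABLE invoices (rental_ticket INTEGER, invoice_number INTEGER);
    INSERT INTO tickets  VALUES (3235), (3245), (3249);
    INSERT INTO invoices VALUES (3235, 3167), (3245, 3054), (3245, 3055), (3249, 3081);
""")

# DISTINCT applies to the whole row, so ticket 3245 still appears twice.
distinct_rows = conn.execute("""
    SELECT DISTINCT t.rental_ticket, i.invoice_number
    FROM tickets AS t
    JOIN invoices AS i ON i.rental_ticket = t.rental_ticket
    ORDER BY t.rental_ticket, i.invoice_number
""").fetchall()

# GROUP BY collapses to one row per ticket; MAX picks one invoice per group.
grouped_rows = conn.execute("""
    SELECT t.rental_ticket, MAX(i.invoice_number) AS invoice_number
    FROM tickets AS t
    JOIN invoices AS i ON i.rental_ticket = t.rental_ticket
    GROUP BY t.rental_ticket
    ORDER BY t.rental_ticket
""").fetchall()

print(distinct_rows)
print(grouped_rows)
```

If the invoice number is not needed at all, dropping it from the select list and keeping only the grouped column gives the same one-row-per-ticket result.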
This question already has answers here:
Two SQL LEFT JOINS produce incorrect result
I am having some troubles with a count function. The problem is given by a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales,
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name | Product_code | Store1_weekly_sales | Store2_weekly_sales
Luigi | 120012 | 4 | 8
James | 100022 | 6 | 10
But instead, I get:
Customer_name | Product_code | Store1_weekly_sales | Store2_weekly_sales
Luigi | 120012 | 290 | 60
James | 100022 | 290 | 60
It works when I use COUNT(DISTINCT product_code) instead of COUNT(product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I count distinct values over more than 1 week of data, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
The reason, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider a single customer purchasing one item from two stores. The first store has 3 purchases and the second store has 5. Your total count is 3 * 5 = 15, because each entry in the first store is joined to every entry for the same customer id in the second. So the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and so on, and you can see the bloat. If instead each store is pre-queried to aggregate per customer, each pre-query has AT MOST one record per customer per store (and per product, as per your desired outcome).
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
No group by or having is needed on the outer query, since each pre-aggregate returns at most 1 record per unique customer and product combination. As for filtering by date ranges, I would just add a WHERE clause inside each of the AllCustProducts, PQStore1, and PQStore2 sub-queries.
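The 3 * 5 bloat and the effect of pre-aggregating can be reproduced with a small sketch in an in-memory SQLite database. The tables store1 and store2 are hypothetical reductions of the store datasets, and they assume a customerid column exists, as the answer's query does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store1 (customerid INTEGER, product_code INTEGER);
    CREATE TABLE store2 (customerid INTEGER, product_code INTEGER);
""")
# One customer, one product: 3 purchases in store 1 and 5 in store 2.
conn.executemany("INSERT INTO store1 VALUES (?, ?)", [(1, 120012)] * 3)
conn.executemany("INSERT INTO store2 VALUES (?, ?)", [(1, 120012)] * 5)

# Joining the raw rows pairs every store-1 row with every store-2 row.
naive = conn.execute("""
    SELECT COUNT(s1.product_code), COUNT(s2.product_code)
    FROM store1 AS s1
    JOIN store2 AS s2
      ON s2.customerid = s1.customerid
     AND s2.product_code = s1.product_code
""").fetchone()

# Aggregating each store first leaves one row per customer and product,
# so the join can no longer multiply the counts.
pre_aggregated = conn.execute("""
    SELECT a.n, b.n
    FROM (SELECT customerid, product_code, COUNT(*) AS n
          FROM store1 GROUP BY customerid, product_code) AS a
    JOIN (SELECT customerid, product_code, COUNT(*) AS n
          FROM store2 GROUP BY customerid, product_code) AS b
      ON b.customerid = a.customerid
     AND b.product_code = a.product_code
""").fetchone()

print(naive)           # both counts are inflated to 3 * 5
print(pre_aggregated)  # the true per-store counts
```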
I have these two tables and I'm trying to get the DateIndex of the last time that the company was rated below a B+. For example, DateIndex = 19941 means 1994, quarter 1.
This selects all the companies that have B+ or above in q2 2020:
SELECT DISTINCT mr.name, mc.rating, mr.DateIndex
FROM [Model].[rating]mc inner join [Model].[RawHist]mr
ON mc.BankId=mr.BankId
WHERE mc.Rating in ('A+','A','A-','B+') AND mr.DateIndex in('20202')
And it yields the following
How can I add the DateIndex of the last time it was below B+? The result would have those three fields plus two more, the last grade below B+ and its date index, for 5 fields in total.
This is what I have so far with the results
It's giving me way too many rows.
I have these two tables and im trying to get the dateindex of the last time that the company was rated below a B+.
This sounds like aggregation:
SELECT mr.name, MAX(mr.DateIndex)
FROM [Model].[rating] mc JOIN
[Model].[RawHist]mr
ON mc.BankId = mr.BankId
WHERE mc.Rating NOT IN ('A+', 'A', 'A-','B+')
GROUP BY mr.name;
This assumes that "less than B+" means that it is not one of the listed ratings.
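Since the question also asks for the grade that went with that date, one possible extension is to join the aggregate back to the history rows. The sketch below uses a single hypothetical table rating_history (bank, rating, dateindex) in SQLite rather than the two real tables, whose exact layout is not shown, and it assumes at most one rating per bank per DateIndex.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical single-table layout; the real schema is not shown in the thread.
    CREATE TABLE rating_history (bank TEXT, rating TEXT, dateindex INTEGER);
    INSERT INTO rating_history VALUES
        ('Bank A', 'B',  20181),
        ('Bank A', 'C',  20193),
        ('Bank A', 'A-', 20202),
        ('Bank B', 'B+', 20194),
        ('Bank B', 'A',  20202);
""")

# Inner query: latest DateIndex per bank with a rating outside the B+ or better list.
# Outer join: fetch the rating that was in force at that DateIndex.
last_below = conn.execute("""
    SELECT h.bank, h.rating, h.dateindex
    FROM rating_history AS h
    JOIN (SELECT bank, MAX(dateindex) AS last_idx
          FROM rating_history
          WHERE rating NOT IN ('A+', 'A', 'A-', 'B+')
          GROUP BY bank) AS m
      ON m.bank = h.bank
     AND m.last_idx = h.dateindex
    ORDER BY h.bank
""").fetchall()

print(last_below)
```

Bank B never drops below B+ in this sample data, so it produces no row; a LEFT JOIN from the list of current ratings would be needed to keep such banks with NULLs.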
I was wondering if it is possible to get the stock levels of my different articles with 1 SQL statement instead of querying each part individually. This would reduce the amount of communication with the server and be more efficient.
The starting point is the following statement:
SELECT SUM(STOCKIN.QUANTITY)- SUM(STOCKOUT.QUANTITY)
FROM STOCKIN
INNER JOIN STOCKOUT ON STOCKIN.FK_LOT=STOCKOUT.FK_LOT
WHERE STOCKIN.FK_LOT = 123456789
For article 123456789, this gives the difference between the 2 tables (StockIN and StockOUT). This is the stock level.
SELECT SUM(STOCKIN.QUANTITY)- SUM(STOCKOUT.QUANTITY)
FROM STOCKIN
INNER JOIN STOCKOUT ON STOCKIN.FK_LOT=STOCKOUT.FK_LOT
WHERE STOCKIN.FK_LOT IN (1234567,4567,654321,2345)
This one gives the difference between the tables (StockIN and StockOUT) for a couple of articles combined. The result will be 1 number.
What I am looking for is the amount of stock for each article in 1 SQL statement:
1234567 = A
4567 = B
654321 = C
2345 = D
Is that possible or do I have to execute the first SQL a lot of times for all the different articles?
EDIT: (I do not know whether I have to do it like this on this forum or whether I may use the reply button... I know the moderation on this forum is strict.)
I have added GROUP BY and that works. But...
Other strange things happen:
I understand that the SQL below is not logical, but it is a reduction of my initial SQL. It just gives a strange result, and therefore my big SQL goes wrong.
Even when reducing the SQL to:
SELECT
SUM(R_STOCKIN.QUANTITY)
From R_STOCKIN INNER Join R_STOCKOUT ON R_STOCKIN.FK_LOT=R_STOCKOUT.FK_LOT
WHERE R_STOCKIN.FK_LOT =1350
gives a different result than:
SELECT sum(QUANTITY)
FROM [Speeltuin].[dbo].[R_STOCKIN] WHERE FK_LOT = 1350
It is a bigger number, but it does not add the QUANTITY of the stock-out table... I cannot find out what it is doing.
Sum of stock in: 144
Sum of stock out: 122
Result of combined query: 864
Does anybody have an idea?
It probably has to do with the fact that STOCKOUT also has a key FK_STOCKIN.
Stockout has 6 results and stockin has 2 results. It combines them into 12 results.
But how do I overcome this? Does anybody have an idea?
Does it need to be done without the JOIN statement? If yes, how?
Simply GROUP BY:
SELECT STOCKIN.FK_LOT, SUM(STOCKIN.QUANTITY)- SUM(STOCKOUT.QUANTITY)
FROM STOCKIN
INNER JOIN STOCKOUT ON STOCKIN.FK_LOT=STOCKOUT.FK_LOT
WHERE STOCKIN.FK_LOT IN (1234567, 4567, 654321, 2345)
GROUP BY STOCKIN.FK_LOT
Edit: Do a UNION ALL instead, use negative QUANTITY values for STOCKOUT. GROUP BY the result:
select FK_LOT, SUM(QUANTITY)
from
(
select FK_LOT, QUANTITY from STOCKIN
UNION ALL
select FK_LOT, -QUANTITY from STOCKOUT
) dt
group by FK_LOT
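The numbers in the question fit this explanation: with 2 stock-in rows and 6 stock-out rows for the same lot, the join repeats every stock-in row 6 times, so the stock-in sum becomes 144 * 6 = 864. The sketch below reproduces that in an in-memory SQLite database and then applies the UNION ALL approach. The individual quantities are invented so that they add up to the totals from the question, and lot 4567 is an extra hypothetical lot with no stock-out rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stockin  (fk_lot INTEGER, quantity INTEGER);
    CREATE TABLE stockout (fk_lot INTEGER, quantity INTEGER);
    -- Lot 1350: 2 stock-in rows (sum 144) and 6 stock-out rows (sum 122).
    INSERT INTO stockin  VALUES (1350, 100), (1350, 44), (4567, 10);
    INSERT INTO stockout VALUES (1350, 20), (1350, 20), (1350, 20),
                                (1350, 20), (1350, 20), (1350, 22);
""")

# The join produces 2 * 6 = 12 rows, so each stock-in quantity is counted 6 times.
joined_sum = conn.execute("""
    SELECT SUM(i.quantity)
    FROM stockin AS i
    JOIN stockout AS o ON o.fk_lot = i.fk_lot
    WHERE i.fk_lot = 1350
""").fetchone()[0]

# UNION ALL stacks the rows instead of pairing them, so nothing is multiplied.
levels = conn.execute("""
    SELECT fk_lot, SUM(quantity)
    FROM (
        SELECT fk_lot, quantity FROM stockin
        UNION ALL
        SELECT fk_lot, -quantity FROM stockout
    ) AS dt
    GROUP BY fk_lot
    ORDER BY fk_lot
""").fetchall()

print(joined_sum)
print(levels)
```

A side effect worth noting is that a lot with stock-in rows but no stock-out rows still gets a stock level here, whereas an inner join would drop it.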
I have a transaction fact table, with product, time, and location as dimension tables. The fact table is sparse, so if no Pizzas were sold in January there is no record for Pizza in the fact table.
When I drill down by product, Pizza is missing from the aggregated results. But I want it included with zero values, i.e. units_sold = 0.
A solution is to join the product table to the fact table with a left outer join. Then I get the desired result.
But when I cut by another dimension such as location or time, those products are missing from the result again.
The outer join yields NULLs for the other dimensions' foreign keys, so the WHERE clause removes those rows again.
How can I solve the problem? (I use ROLAP)
Using a join condition is a good idea, as some people answered. But I need a more general solution.
For example,
Table1
person | birth year | death year
a | 1950 | 2006
b | 1952 | 2008
c | 1960 | 2007
d | 1953 | 1990
I want a year-by-year count of the people who were born between 1950 and 1953 and died between 2006 and 2008.
Like
birth = 1950 death = 2006 count = 1
birth = 1951 death = 2006 count = 0
...
Can we handle this scenario by using join conditions and where conditions appropriately?
You want LEFT JOIN, and then LEFT JOIN again. The conditions then go in the ON clause rather than the WHERE clause. Something like this:
select . . .
from products p left join
fact f
on p.product_id = f.product_id left join
timedim td
on f.time_id = td.time_id and
td.month = 'January'
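Why the condition belongs in the ON clause can be seen in a small sketch using an in-memory SQLite database. For brevity it assumes a hypothetical fact table that stores the month directly instead of going through a time dimension; the names products, fact, month, and units_sold are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER, name TEXT);
    CREATE TABLE fact     (product_id INTEGER, month TEXT, units_sold INTEGER);
    INSERT INTO products VALUES (1, 'Pizza'), (2, 'Pasta');
    -- No Pizza rows for January: the fact table is sparse.
    INSERT INTO fact VALUES (2, 'January', 5), (1, 'February', 3);
""")

# Filter in WHERE: Pizza's only row fails the test and the product disappears.
where_filtered = conn.execute("""
    SELECT p.name, COALESCE(SUM(f.units_sold), 0)
    FROM products AS p
    LEFT JOIN fact AS f ON f.product_id = p.product_id
    WHERE f.month = 'January'
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

# Filter in ON: non-matching fact rows are dropped before the outer join
# pads unmatched products with NULLs, so Pizza survives with 0.
on_filtered = conn.execute("""
    SELECT p.name, COALESCE(SUM(f.units_sold), 0)
    FROM products AS p
    LEFT JOIN fact AS f
      ON f.product_id = p.product_id
     AND f.month = 'January'
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(where_filtered)
print(on_filtered)
```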
I'm going over some practise questions for an exam that I have coming up and I'm having a problem fully understanding group by. I see GROUP BY as the following: group the result set by one or more columns.
I have the following database schema
My query
SELECT orders.customer_numb, sum(order_lines.cost_line), customers.customer_first_name, customers.customer_last_name
FROM orders
INNER JOIN customers ON customers.customer_numb = orders.customer_numb
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb, order_lines.cost_line, customers.customer_first_name, customers.customer_last_name
ORDER BY order_lines.cost_line DESC
What I'm struggling to understand
Why can't I simply use just GROUP BY orders.cost_line and group the data by cost_line?
What I'm trying to achieve
I'd like to achieve the name of the customer who has spent the most money. I just don't fully understand how to achieve this. I understand how joins work, I just can't seem to get my head around why I can't simply GROUP BY customer_numb and cost_line (with sum() used to calculate the amount spent). I seem to always get "not a GROUP BY expression", if someone could explain what I'm doing wrong (not just give me the answer), that would be great - I'd really appreciate that, and of course any resources that you have for using GROUP by properly.
Sorry for the long essay and If I've missed anything I apologise. Any help would be greatly appreciated.
I just can't seem to get my head around why I can't simply GROUP BY
customer_numb and cost_line (with sum() used to calculate the amount
spent).
When you say group by customer_numb you know that customer_numb uniquely identifies a row in the customers table (assuming customer_numb is either a primary or alternate key), so that any given customers.customer_numb will have one and only one value for customers.customer_first_name and customers.customer_last_name. But at parse time Oracle does not know that, or at least acts like it does not. And it says, in a bit of panic, "What do I do if a single customer_numb has more than one value for customer_first_name?"
Roughly the rule is, expressions in the select clause can use expressions in the group by clause and/or use aggregate functions. (As well as constants and system variables that don't depend on the base tables, etc.) And by "use" I mean be the expression or part of the expression. So once you group on first name and last name, customer_first_name || customer_last_name would be a valid expression also.
When you have a table like customers and are grouping by its primary key, or by a column with a unique key and a not null constraint, you can safely add that table's other columns to the group by clause without changing the groups. In this particular instance: group by customers.customer_numb, customers.customer_first_name, customers.customer_last_name.
Also note that the order by in the first query will fail once cost_line is no longer in the group by, since order_lines.cost_line doesn't have a single value per group. You can order on sum(order_lines.cost_line), or use a column alias in the select clause and order on that alias:
SELECT orders.customer_numb,
sum(order_lines.cost_line),
customers.customer_first_name,
customers.customer_last_name
FROM orders
INNER JOIN customers ON customers.customer_numb = orders.customer_numb
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb,
customers.customer_first_name,
customers.customer_last_name
ORDER BY sum(order_lines.cost_line)
or
SELECT orders.customer_numb,
sum(order_lines.cost_line) as sum_cost_line,
. . .
ORDER BY sum_cost_line
Note: I've heard that some RDBMSes will imply additional expressions for the grouping without them being explicitly stated. Oracle is not one of those RDBMSes.
As for grouping by both customer_numb and cost_line: consider a DB with two customers, 1 and 2, each with two orders of one line each:
Customer Number | Cost Line
1 | 20.00
1 | 20.00
2 | 35.00
2 | 30.00
select customer_number, cost_line, sum(cost_line)
FROM ...
group by customer_number, cost_line
order by sum(cost_line) desc
Customer Number | Cost Line | sum(cost_line)
1 | 20.00 | 40.00
2 | 35.00 | 35.00
2 | 30.00 | 30.00
The first row with highest sum(cost_line) is not the customer who spent the most.
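The same four rows can be run through both groupings to check this. The sketch below uses an in-memory SQLite database and, as a simplifying assumption, a single flattened table named order_lines that carries customer_numb directly instead of joining through orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flattened for brevity: the real schema reaches customer_numb via orders.
    CREATE TABLE order_lines (customer_numb INTEGER, cost_line REAL);
    INSERT INTO order_lines VALUES (1, 20.0), (1, 20.0), (2, 35.0), (2, 30.0);
""")

# Grouping by customer AND cost_line only merges rows with identical costs.
by_both = conn.execute("""
    SELECT customer_numb, cost_line, SUM(cost_line)
    FROM order_lines
    GROUP BY customer_numb, cost_line
    ORDER BY SUM(cost_line) DESC
""").fetchall()

# Grouping by customer alone gives one total per customer.
by_customer = conn.execute("""
    SELECT customer_numb, SUM(cost_line)
    FROM order_lines
    GROUP BY customer_numb
    ORDER BY SUM(cost_line) DESC
""").fetchall()

print(by_both)      # customer 1 comes first with 40.0
print(by_customer)  # customer 2 comes first with 65.0
```

Customer 2 spent 35.00 + 30.00 = 65.00 in total, which only shows up when cost_line is left out of the grouping.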
I understand how joins work, I just can't seem to get my head around
why I can't simply GROUP BY customer_numb and cost_line (with sum()
used to calculate the amount spent).
This should give you the sum for every customer.
SELECT orders.customer_numb, sum(order_lines.cost_line)
FROM orders
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb
Note that every column in the SELECT clause that's not an argument to an aggregate function is also a column in the GROUP BY clause.
Now you can join that with other tables to get more detail. Here's one way using a common table expression. (There are other ways to express what you want.)
with customer_sums as (
-- We give the columns useful aliases here.
SELECT orders.customer_numb as customer_numb,
sum(order_lines.cost_line) as total_orders
FROM orders
INNER JOIN order_lines ON order_lines.order_numb = orders.order_numb
GROUP BY orders.customer_numb
)
select c.customer_numb, c.customer_first_name, c.customer_last_name, cs.total_orders
from customers c
inner join customer_sums cs
on cs.customer_numb = c.customer_numb
order by cs.total_orders desc
Why can't I simply use just GROUP BY orders.cost_line and group the
data by cost_line?
Applying GROUP BY to order_lines.cost_line will give you one row for each distinct value in order_lines.cost_line. (The column orders.cost_line doesn't exist.) Here's what that data might look like.
OL.ORDER_NUMB | OL.COST_LINE | O.CUSTOMER_NUMB | C.CUSTOMER_FIRST_NAME | C.CUSTOMER_LAST_NAME
1 | 1.45 | 2014 | Julio | Savell
1 | 2.33 | 2014 | Julio | Savell
1 | 1.45 | 2014 | Julio | Savell
2 | 1.45 | 2014 | Julio | Savell
2 | 1.45 | 2014 | Julio | Savell
3 | 13.00 | 2014 | Julio | Savell
You can group by order_lines.cost_line, but it won't give you any useful information. This query
select order_lines.cost_line, orders.customer_numb
from order_lines
inner join orders on orders.order_numb = order_lines.order_numb
group by order_lines.cost_line, orders.customer_numb;
should return something like this.
OL.COST_LINE | O.CUSTOMER_NUMB
1.45 | 2014
2.33 | 2014
13.00 | 2014
Not terribly useful.
If you're interested in the sum of the order line items, you need to decide what column or columns to group (summarize) by. If you group (summarize) by order number, you'll get three rows. If you group (summarize) by customer number, you'll get one row.
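The row counts mentioned above can be checked against the sample data with a short sketch in an in-memory SQLite database. The table name lines is hypothetical and holds the already-joined rows from the sample, so no joins are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The six already-joined sample rows: order number, line cost, customer.
    CREATE TABLE lines (order_numb INTEGER, cost_line REAL, customer_numb INTEGER);
    INSERT INTO lines VALUES
        (1, 1.45, 2014), (1, 2.33, 2014), (1, 1.45, 2014),
        (2, 1.45, 2014), (2, 1.45, 2014),
        (3, 13.00, 2014);
""")

# One row per order.
per_order = conn.execute("""
    SELECT order_numb, ROUND(SUM(cost_line), 2)
    FROM lines
    GROUP BY order_numb
    ORDER BY order_numb
""").fetchall()

# One row per customer.
per_customer = conn.execute("""
    SELECT customer_numb, ROUND(SUM(cost_line), 2)
    FROM lines
    GROUP BY customer_numb
""").fetchall()

print(per_order)
print(per_customer)
```

The grouping columns decide how many rows come back; the aggregate only decides what is computed within each of those rows.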