Group By a Specific Column - sql

I had this query that was working fine...
select a.action_id, a.request_date, a.customer_id from customer.actions a
inner join customer.customers c on c.id = a.customer_id
inner join customer.entities e on e.id = c.entity_id
where request_url like '%/test%'
order by a.request_date desc;
This returned +- 500 lines. Now I wanted to inner join with another table but it's returning like 4000 lines..
select a.action_id, a.customer_id, a.request_date, mp.payment_type from customer.actions a
inner join customer.customers c on c.id = a.customer_id
inner join customer.entities e on e.id = c.entity_id
inner join customer.mobile_payments mp on mp.customer_id = a.customer_id
where request_url like '%/test%'
order by a.request_date desc;
I get duplicate records for every record on the mobile_payments table.
Group by does not work
How do I remove the duplicate action_ids ? DISTINCT (a.action_id) doesn't work
EDIT:
I just want the records from the customer.actions filtered by where request_url like '%/test%' and I want to get the payment type from the mobile_payments table. 1 for each action_id. When I do that I get duplicate actions ids.

As accurately pointed out by J0eBl4ck in comment, the issue is that you are joining to the payments table on the sole condition of matching customer. Think of it like this. You have a credit card for purchases over 10 years. Each month you make a payment, so you have 120 payments made under your account.
Is there some other basis you are joining to the payment table? You have an action ID? Does that mean something? Do you intend to get a payment associated to that specific action? Is there such a relationship where it would be
customer -> payments joined on customer_id AND action_id?
A secondary option is to apply a sub-query from the payments table grouped by the customer id so it only returns a single row, this way it does not distort your overall record count. You may need to adjust the underlying criteria some, but this should help you look into final resolution to your needs.
select
a.action_id,
a.customer_id,
a.request_date,
mp.payment_type
from
customer.actions a
inner join customer.customers c
on a.customer_id = c.id
inner join customer.entities e
on c.entity_id = e.id
inner join
-- this "PQ" Pre-Query is by customer ID and example
-- to get the payment type based on whatever the most recent
-- transaction in first position
( select mp.customer_id,
mp.payment_type,
row_number() over (partition by mp.customer_id
order by mp.WhateverPrimaryKeyOrDateIs DESC ) as rowSeq
from
customer.mobile_payments mp ) PQ
-- so here, we are only joining on the FIRST (most recent due to descending)
-- payment type applicable for the customer ID in question joined to
on a.customer_id = PQ.customer_id
AND PQ.rowSeq = 1
where
request_url like '%/test%'
order by
a.request_date desc;
Again, you WILL probably need to adjust the pre-query as we dont get the context association of payment to a given action. Hopefully it will guide you in your final goal though.

Related

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even if my SQL skill are close to none it looks like it's doing its job. But now I would like to be able to show 0 whenever the one customer don't have sales for a particular week (weeks are just integers). And I wonder if somehow I should get distinct values of the weeks in the Sales table, and then loop through them (not sure how)
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON c.CustomerId = a.CustomerId LEFT JOIN
Sales s
ON s.week = w.week AND s.AddressId = a.AddressId
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
You should generate a week numbers, rather than using DISTINCT. This is better in terms of performance and reliability. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeek.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

Excluding multiple results in specific column (SQL JOIN)

I'm taking my first steps in terms of practical SQL use in real life.
I have a few tables with contractual and financial information and the query works exactly as I need - to a certain point. It looks more or less like that:
SELECT /some columns/ from CONTRACTS
Linked 3 extra tables with INNER JOIN to add things like department names, product information etc. This all works but they all have simplish one-to-one relationship (one contract related to single department in Department table, one product information entry in the corresponding table etc).
Now this is my challenge:
I also need to add contract invoicing information doing something like:
inner join INVOICES on CONTRACTS.contnoC = INVOICES.contnoI
(and selecting also the Invoice number linked to the Contract number, although that's partly optional)
The problem I'm facing is that unlike with other tables where there's always one-to-one relationship when joining tables, INVOICES table can have multiple (or none at all) entries that correspond to a single contract no. The result is that I will get multiple query results for a single contract no (with different invoice numbers presented), needlessly crowding the query results.
Essentially I'm looking to add INVOICES table to a query to just identify if the contract no is present in the INVOICES table (contract has been invoiced or not). Invoice number itself could be presented (it is with INNER JOIN), however it's not critical as long it's somehow marked. Invoice number fields remains blank in the result with the INNER JOIN function, which is also necessary (i.e. to have the row presented even if the match is not found in INVOICES table).
SELECT DISTINCT would look to do what I need, but I seemed to face the problem that I need to levy DISTINCT criteria only for column representing contract numbers, NOT any other column (there can be same values presented, but all those should be presented).
Unfortunately I'm not totally aware of what database system I am using.
Seems like the question is still getting some attention and in an effort to provide some explanation here are a few techniques.
If you just want any contract with details from the 1 to 1 tables you can do it similarily to what you have described. the key being NOT to include any column from Invoices table in the column list.
SELECT
DISTINCT Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
INNER JOIN Invoices i
ON c.contnoC = i.contnoI
Perhaps a Little cleaner would be to use IN or EXISTS like so:
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
EXISTS (SELECT 1 FROM Invoices i WHERE i.contnoI = c.contnoC )
SELECT
Contract, Department, ProductId .....(nothing from Invoices Table!!!)
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
WHERE
contnoC IN (SELECT contnoI FROM Invoices)
Don't use IN if the SELECT ... list can return a NULL!!!
If you Actually want all of the contracts and just know if a contract has been invoiced you can use aggregation and a case expression:
SELECT
Contract, Department, ProductId, CASE WHEN COUNT(i.contnoI) = 0 THEN 0 ELSE 1 END as Invoiced
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
LEFT JOIN Invoices i
ON c.contnoC = i.contnoI
GROUP BY
Contract, Department, ProductId
Then if you actually want to return details about a particular invoice you can use a technique similar to that of cybercentic87 if your RDBMS supports or you could use a calculated column with TOP or LIMIT depending on your system.
SELECT
Contract, Department, ProductId, (SELECT TOP 1 InvoiceNo FROM invoices i WHERE c.contnoC = i.contnoI ORDER BY CreateDate DESC) as LastestInvoiceNo
FROM
Contracts c
INNER JOIN Departments D
ON c.departmentId = d.Department
INNER JOIN Product p
ON c.ProductId = p.ProductId
GROUP BY
Contract, Department, ProductId
I would do it this way:
with mainquery as(
<<here goes you main query>>
),
invoices_rn as(
select *,
ROW_NUMBER() OVER (PARTITION BY contnoI order by
<<some column to decide which invoice you want to take eg. date>>) as rn
)
invoices as (
select * from invoices_rn where rn = 1
)
select * from mainquery
left join invoices i on contnoC = i.contnoI
This gives you an ability to get all of the invoice details to your query, also it gives you full control of which invoice you want see in your main query. Please read more about CTEs; they are pretty handy and much easier to understand / read than nested selects.
I still don't know what database you are using. If ROW_NUMBER is not available, I will figure out something else :)
Also with a left join you should use COALESCE function for example:
COALESCE(i.invoice_number,'0')
Of course this gives you some more possibilities, you could for example in your main select do:
CASE WHEN i.invoicenumber is null then 'NOT INVOICED'
else 'INVOICED'
END as isInvoiced
You can use
SELECT ..., invoiced = 'YES' ... where exists ...
union
SELECT ..., invoiced = 'NO' ... where not exists ...
or you can use a column like "invoiced" with a subquery into invoices to set it's value depending on whether you get a hit or not

Selecting Data from 2 tables based on another

I am using Oracle via the SQL+ command line, i'm trying to display data from two different tables, but i require the use of a third table to determine what to display. Below is an image of my sample 3 tables.
Stack won't let me show my images in the question so here is a link:
I want to display the "Name","O_ID" and "Date" for each order. I'm quite new to SQL and this may have been answered before but i could not find it.
JOIN the tables:
SELECT
c.Name,
o.O_ID,
od.Date
FROM Customer AS c
INNER JOIN "Order" AS o ON c.C_ID = o.C_ID
INNER JOIN OrderDate AS od ON o.O_ID = od.O_ID;
What you are attempting to do is a very common practice and requires a INNER JOIN.
From a design perspective, the Order Date table shouldn't even exist. The date column should just reside within the Order table. I wrote the query based on that design:
SELECT
o.O_ID,
o.Date,
c.Name,
FROM
customer AS c
INNER JOIN order AS o ON c.C_ID = o.C_ID
Edit:
More on the issue of your design: A natural ordering of your data would be that there is 1 date for an order, not many dates for an order. Introducing another table to simply store the date allows for there to potentially be many dates associated with an order, which is simply unnatural.
Do a JOIN
select c.Name, o.O_ID, od.Date
from Customer c
inner join Order o on o.O_ID = c.C_ID
inner join OrderDate od on o.O_ID = od_ID
Doc: http://www.w3schools.com/sql/sql_join.asp

SQL List the representative that handles the most customers

I have 2 tables SALESREP and CUSTOMER
I need to find out which salesrep has most customers
I have the following code:
select rep_lname, count(cust_num)
from customer inner join salesrep
on customer.REP_NUM = SALESREP.REP_NUM
group by rep_lname
This gives me all the rows with the number of customers each salesrep has, instead I need only one row that has the most customers.
How can I find the row with MAX num of customers?
select rep_lname, count(cust_num)
from customer inner join salesrep
on customer.REP_NUM = SALESREP.REP_NUM
group by rep_lname order by count(cust_num) desc limit 1;
I'm sure there's another way using having, but I can't seem to figure it out at the moment. Perhaps somebody else will chime in with it?
SELECT TOP 1 WITH TIES rep_lname, COUNT(cust_num)
FROM customer inner join salesrep
ON customer.REP_NUM = SALESREP.REP_NUM
GROUP BY rep_lname
ORDER BY count(cust_num) DESC

SQL query for join with condition

I have these two tables:
Customers: Id, Name
Orders: Id, CustomerId, Time, Status
I want to get a list of customers for which the LAST order does not have a status of 'Wrong'.
I know how to use a LEFT JOIN to get a count of orders for each customer, but I don't know how I can use this statement for what I want. Maybe a JOIN is not the right thing to use too, I'm not sure.
It's possible that customers do not have any order, and they should be returned.
I'm abstracting the real tables here, but the scenario is for a windows phone app sending notifications. I want to get all clients for which their last notification does not have a 'Dropped' status. I can sort their notifications (orders) by the 'Time' field. Thanks for the help, while I continue experimenting with subqueries in the where clause.
Select ...
From Customers As C
Where Not Exists (
Select 1
From Orders As O1
Join (
Select O2.CustomerId, Max( O2.Time ) As Time
From Orders As O2
Group By O2.CustomerId
) As LastOrderTime
On LastOrderTime.CustomerId = O1.CustomerId
And LastOrderTime.Time = O1.Time
Where O1.Status = 'Dropped'
And O1.CustomerId = C.Id
)
There are obviously alternatives based on the actual database product and version. For example, in SQL Server one could use the TOP command or a CTE perhaps. However, without knowing what specific product is being used, the above solution should produce the results you want in almost any database product.
Addition
If you were using a product that supported ranking functions (which database product and version isn't mentioned) and common-table expressions, then an alternative solution might be something like so:
With RankedOrders As
(
Select O.CustomerId, O.Status
, Row_Number() Over( Partition By CustomerId Order By Time Desc ) As Rnk
From Orders As O
)
Select ...
From Customers
Where Not Exists (
Select 1
From RankedOrders As O1
Where O1.CustomerId = C.Id
And O1.Rnk = 1
And O1.Status = 'Dropped'
)
Assuming Last order refers to the Time column here is my query:
SELECT C.Id,
C.Name,
MAX(O.Time)
FROM
Customers C
INNER JOIN Orders O
ON C.Id = O.CustomerId
WHERE
O.Status != 'Wrong'
GROUP BY C.Id,
C.Name
EDIT:
Regarding your table configuration. You should really consider revising the structure to include a third table. They would look like this:
Customer
CustomerId | Name
Order
OrderId | Status | Time
CompletedOrders
CoId | CustomerId | OrderId
Now what you do is store the info about a customer or order in their respective tables ... then when an order is made you just create a CompletedOrders entry with the ids of the 2 individual records. This will allow for a 1 to Many relationship between customer and orders.
Didn't check it out, but something like this?
SELECT c.CustmerId, c.Name, MAX(o.Time)
FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId
WHERE o.Status <> 'Wrong'
GROUP BY c.CustomerId, C.Name
You can get list of customers with the LAST order which has status of 'Wrong' with something like
select customerId from orders where status='Wrong'
group by customerId
having time=max(time)