SQL Query to evaluate multiple rows - sql

I would like to retrieve only the customers from an order table who have paid for all the orders. (Paid = 'Y').
The order table looks like this:
Cus # Order # Paid
111 1 Y
111 2 Y
222 3 Y
222 4 N
333 5 N
In this example the query should only return customer 111.
The query
Select * from order where Paid = 'Y';
returns customers that have paid and unpaid orders (ex. customer 222) in addition to customers who have paid for all of their orders (customer 111).
How do I structure the query to evaluate all the orders for a customer and only return information for a customer that has paid for all the orders?

Looking the problem a different way, you need only the customers who don't have any unpaid order.
sel cus from order group by Cus having min(Paid) = 'Y';
The above query also utilizes the fact that 'Y' > 'N'.
SQL Fiddle:
http://sqlfiddle.com/#!4/f6022/1
If you need to select all different orders for eligible customers, you may use OLAP functions:
select cus,order,paid from (select cus,order,paid,min(paid)
over (partition by cus)minz from order)dt where minz='Y';

With Oracle, you can also do select customer from your_table where paid='Y' minus select customer from your_table where paid='N', although I don't know if this is faster and, of course, you don't get the other fields in this case.

Don't know Oracle but often work in SQL Server, I would write query something like this
Select Cus from order group by Cus having count(*) = sum(case when Paid = ‘Y’ then 1 else 0);
Basically you retrieve customers where total order count equals to sum or paid orders.
Again, apologize for not giving proper Oracle syntax but hopefully that will point you to the right direction.

SELECT *
FROM orders oo
WHERE NOT EXISTS
(
SELECT 1
FROM orders io
WHERE oo.cus# = io.cus#
AND io.paid = 'N'
)
;

Related

using correlated subquery in the case statement

I’m trying to use a correlated subquery in my sql code and I can't wrap my head around what I'm doing wrong. A brief description about the code and what I'm trying to do:
The code consists of a big query (ALIASED AS A) which result set looks like a list of customer IDs, offer IDs and response status name ("SOLD","SELLING","IRRELEVANT","NO ANSWER" etc.) of each customer to each offer. The customers IDs and the responses in the result set are non-unique, since more than one offer can be made to each customer, and a customer can have different response for different offers.
The goal is to generate a list of distinct customer IDs and to mark each ID with 0 or 1 flag :
if the ID has AT LEAST ONE offer with status name is "SOLD" or "SELLING" the flag should be 1 otherwise 0. Since each customer has an array of different responses, what I'm trying to do is to check if "SOLD" or "SELLING" appears in this array for each customer ID, using correlated subquery in the case statement and aliasing the big underlying query named A with A1 this time:
select distinct
A.customer_ID,
case when 'SOLD' in (select distinct A1.response from A as A1
where A.customer_ID = A1.customer_ID) OR
'SELLING' in (select distinct A1.response from A as A1
where A.customer_ID = A1.customer_ID)
then 1 else 0 end as FLAG
FROM
(select …) A
What I get is a mistake alert saying there is no such object as A or A1.
Thanks in advance for the help!
You can use exists with cte :
with cte as (
<query here>
)
select c.*,
(case when exists (select 1
from cte c1
where c1.customer_ID = c.customer_ID and
c1.response in ('sold', 'selling')
)
then 1 else 0
end) as flag
from cte c;
You can also do aggregation :
select customer_id,
max(case when a.response in ('sold', 'selling') then 1 else 0 end) as flag
from < query here > a;
group by customer_id;
With statement as suggested by Yogesh is a good option. If you have any performance issues with "WITH" statement. you can create a volatile table and use columns from volatile table in your select statement .
create voltaile table as (select response from where response in ('SOLD','SELLING').
SELECT from customer table < and join voltaile table>.
The only disadvantge here is volatile tables cannot be accessed after you disconnect from session.

Conditional left join on max date and where clause in second table

I am attempting to join a customer table with sales table where I show the list of all customers in database and any paid sale the customer might have in the sales tables. Now a customer can have multiple sales rows in the sales table.
This is an example sales record of one customer with multiple sales in the sale tables
while extracting this record I would like to get only the MAX (q_saledatetime) WHERE the q_paidamount is > 0.
as in show me the last time this customer made a payment to us. So in this case row 2 where they paid 8.90 is what I would like to get for that customer. If a customer has no record in the sales table, show their name/details on the list either way.
My failure at the moment is how to include the where clause of the paid amount + max date column.
ATTEMPT A
select DISTINCT ON (q_customer.q_code)
q_customer.q_code, q_customer.q_name, -- customer info
MAX(q_saleheader.q_saledatetime) AS latestDate, q_saleheader.q_paidamount -- saleheader info
FROM q_customer
LEFT JOIN q_saleheader ON (q_customer.q_code = q_saleheader.q_customercode)
group by q_customer.q_code, q_customer.q_name , q_saleheader.q_saledatetime, q_saleheader.q_paidamount
order by q_customer.q_code ASC
which results in
so for Fred Blogg is picking up details from row 4 instead of 2 (first image). As there's no rule for q_paidamount at this point
ATTEMPT B
SELECT
customer.q_code, customer.q_name, -- customer info
sale.q_saledatetime, sale.q_paidamount -- sale info
FROM q_customer customer
LEFT JOIN (SELECT * FROM q_saleheader WHERE q_saledatetime =
(SELECT MAX(q_saledatetime) FROM q_saleheader b1 where q_paidamount > 0 ))
sale ON sale.q_customercode = customer.q_code
which results in
This doesnt seem to be getting any information from the sale table at all.
Update:
After having a closer look at my first attempt I amended the statement and came up with this solution which achieves the same results as Michal's answer. I just curious to know is there any pitfalls or perfomance disadvantages with the following way.
select DISTINCT ON (q_customer.q_code)
q_customer.q_code, q_customer.q_name, -- customer info
q_saleheader.q_saledatetime, q_saleheader.q_paidamount -- saleheader info
FROM q_customer
LEFT JOIN q_saleheader ON (q_customer.q_code = q_saleheader.q_customercode AND
q_saleheader.q_paidamount > 0 )
group by q_customer.q_code, q_customer.q_name , q_saleheader.q_saledatetime,
q_saleheader.q_paidamount
order by q_customer.q_code ASC, q_saleheader.q_saledatetime DESC
main change was adding AND q_saleheader.q_paidamount > 0 on the join and q_saleheader.q_saledatetime DESC to make sure are getting the top row of that related data. As mentioned both Michal's answer and this solution achieve the same results. Just curious about pitfalls in either of the two ways.
Try this query:
SELECT c.q_code,
c.q_name,
CASE WHEN q_saledatetime <> '1900-01-01 00:00:00.000' THEN q_saledatetime END q_saledatetime,
q_paidamount
FROM (
SELECT c.q_code,
c.q_name,
coalesce(s.q_saledatetime, '1900-01-01 00:00:00.000') q_saledatetime, --it will indicate customer with no data
s.q_paidamount,
ROW_NUMBER() OVER (PARTITION BY c.q_code ORDER BY COALESCE(s.q_saledatetime, '1900-01-01') DESC) rn
FROM q_customer c
LEFT JOIN (SELECT q_saledatetime,
q_paidamount
FROM q_saleheader
WHERE q_paidamount > 0) s
ON c.q_code = s.q_customercode
) c WHERE rn = 1

SELECT TOP 10 rows

I have built an SQL Query that returns me the top 10 customers which have the highest outstanding. The oustanding is on product level (each product has its own outstanding).
Untill now everything works fine, my only problem is that if a certain customer has more then 1 product then the second product or more should be categorized under the same customer_id like in the second picture (because the first product that has the highest outstanding contagions the second product that may have a lower outstanding that the other 9 clients of top 10).
How can I modify my query in order to do that? Is it possible in SQL Server 2012?
My query is:
select top 10 CUSTOMER_ID
,S90T01_GROSS_EXPOSURE_THSD_EUR
,S90T01_COGNOS_PROD_NAME
,S90T01_DPD_C
,PREVIOUS_BUCKET_DPD_REP
,S90T01_BUCKET_DPD_REP
from [dbo].[DM_07MONTHLY_DATA]
where S90T01_CLIENT_SEGMENT = 'PI'
and YYYY_MM = '2017_01'
group by CUSTOMER_ID
,S90T01_GROSS_EXPOSURE_THSD_EUR
,S90T01_COGNOS_PROD_NAME
,S90T01_DPD_C
,PREVIOUS_BUCKET_DPD_REP
,S90T01_BUCKET_DPD_REP
order by S90T01_GROSS_EXPOSURE_THSD_EUR desc;
You need to calculate the top Customers first, then pull out all their products. You can do this with a Common Table Expression.
As you haven't provided any test data this is untested, but I think it will work for you:
with top10 as
(
select top 10 CUSTOMER_ID
,sum(S90T01_GROSS_EXPOSURE_THSD_EUR) as TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR
from [dbo].[DM_07MONTHLY_DATA]
where S90T01_CLIENT_SEGMENT = 'PI'
and YYYY_MM = '2017_01'
group by CUSTOMER_ID
order by TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR desc
)
select m.CUSTOMER_ID
,m.S90T01_GROSS_EXPOSURE_THSD_EUR
,m.S90T01_COGNOS_PROD_NAME
,m.S90T01_DPD_C
,m.PREVIOUS_BUCKET_DPD_REP
,m.S90T01_BUCKET_DPD_REP
from [dbo].[DM_07MONTHLY_DATA] m
join top10 t
on m.CUSTOMER_ID = t.CUSTOMER_ID
order by t.TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR desc
,m.S90T01_GROSS_EXPOSURE_THSD_EUR;

SQL aggregate functions and sorting

I am still new to SQL and getting my head around the whole sub-query aggregation to display some results and was looking for some advice:
The tables might look something like:
Customer: (custID, name, address)
Account: (accountID, reward_balance)
Shop: (shopID, name, address)
Relational tables:
Holds (custID*, accountID*)
With (accountID*, shopID*)
How can I find the store that has the least reward_balance?
(The customer info is not required at this point)
I tried:
SELECT accountID AS ACCOUNT_ID, shopID AS SHOP_ID, MIN(reward_balance) AS LOWEST_BALANCE
FROM Account, Shop, With
WHERE With.accountID = Account.accountID
AND With.shopID=Shop.shopID
GROUP BY
Account.accountID,
Shop.shopID
ORDER BY MIN(reward_balance);
This works in a way that is not intended:
ACCOUNT_ID | SHOP_ID | LOWEST_BALANCE
1 | 1 | 10
2 | 2 | 40
3 | 3 | 100
4 | 4 | 1000
5 | 4 | 5000
As you can see Shop_ID 4 actually has a balance of 6000 (1000+5000) as there are two customers registered with it. I think I need to SUM the lowest balance of the shops based on their balance and display it from low-high.
I have been trying to aggregate the data prior to display but this is where I come unstuck:
SELECT shopID AS SHOP_ID, MIN(reward_balance) AS LOWEST_BALANCE
FROM (SELECT accountID, shopID, SUM(reward_balance)
FROM Account, Shop, With
WHERE
With.accountID = Account.accountID
AND With.shopID=Shop.shopID
GROUP BY
Account.accountID,
Shop.shopID;
When I run something like this statement I get an invalid identifier error.
Error at Command Line : 1 Column : 24
Error report -
SQL Error: ORA-00904: "REWARD_BALANCE": invalid identifier
00904. 00000 - "%s: invalid identifier"
So I figured I might have my joining condition incorrect and the aggregate sorting incorrect, and would really appreciate any general advice.
Thanks for the lengthy read!
Approach this problem one step at time.
We're going to assume (and we should probably check this) that by least reward_balance, that refers to the total of all reward_balance associated with a shop. And we're not just looking for the shop that has the lowest individual reward balance.
First, get all of the individual "reward_balance" for each shop. Looks like the query would need to involve three tables...
SELECT s.shop_id
, a.reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
That will get us the detail rows, every shop along with the individual reward_balance amounts associated with the shop, if there are any. (We're using outer joins for this query, because we don't see any guarantee that a shops is going to be related to at least one account. Even if it's true for this use case, that's not always true in the more general case.)
Once we have the individual amounts, the next step is to total them for each shop. We can do that using a GROUP BY clause and a SUM() aggregate.
SELECT s.shop_id
, SUM(a.reward_balance) AS tot_reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
GROUP BY s.shop_id
At this point, with MySQL we could add an ORDER BY clause to arrange the rows in ascending order of tot_reward_balance, and add a LIMIT 1 clause if we only want to return a single row. We can also handle the case when tot_reward_balance is NULL, assigning a zero in place of the NULL.
SELECT s.shop_id
, IFNULL(SUM(a.reward_balance),0) AS tot_reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
GROUP BY s.shop_id
ORDER BY tot_reward_amount ASC, s.shop_id ASC
LIMIT 1
If there are two (or more) shops with the same least value of tot_reward_amount, this query returns only one of those shops.
Oracle doesn't have the LIMIT clause like MySQL, but we can get equivalent result using analytic function (which is not available in MySQL). We also replace the MySQL IFNULL() function with the Oracle equivalent NVL() function...
SELECT v.shop_id
, v.tot_reward_balance
, ROW_NUMBER() OVER (ORDER BY v.tot_reward_balance ASC, v.shop_id ASC) AS rn
FROM (
SELECT s.shop_id
, NVL(SUM(a.reward_balance),0) AS tot_reward_balance
FROM shop s
LEFT
JOIN with w
ON w.shop_id = s.shop_id
LEFT
JOIN account a
ON a.account_id = w.account_id
GROUP BY s.shop_id
) v
HAVING rn = 1
Like the MySQL query, this returns at most one row, even when two or more shops have the same "least" total of reward_balance.
If we want to return all of the shops that have the lowest tot_reward_balance, we need to take a slightly different approach.
The best approach to building queries is step wise refinement; in this case, start by getting all of the individual reward_amount for each shop. Next step is to aggregate the individual reward_amount into a total. The next steps is to pickout the row(s) with the lowest total reward_amount.
In SQL Server, You can try using a CTE:
;with cte_minvalue as
(
select rank() over (order by Sum_Balance) as RowRank,
ShopId,
Sum_Balance
from (SELECT Shop.shopID, SUM(reward_balance) AS Sum_Balance
FROM
With
JOIN Shop ON With.ShopId = Shop.ShopId
JOIN Account ON With.AccountId = Account.AccountId
GROUP BY
Shop.shopID)ShopSum
)
select ShopId, Sum_Balance from cte_minvalue where RowRank = 1

How can I choose the closest match in SQL Server 2005?

In SQL Server 2005, I have a table of input coming in of successful sales, and a variety of tables with information on known customers, and their details. For each row of sales, I need to match 0 or 1 known customers.
We have the following information coming in from the sales table:
ServiceId,
Address,
ZipCode,
EmailAddress,
HomePhone,
FirstName,
LastName
The customers information includes all of this, as well as a 'LastTransaction' date.
Any of these fields can map back to 0 or more customers. We count a match as being any time that a ServiceId, Address+ZipCode, EmailAddress, or HomePhone in the sales table exactly matches a customer.
The problem is that we have information on many customers, sometimes multiple in the same household. This means that we might have John Doe, Jane Doe, Jim Doe, and Bob Doe in the same house. They would all match on on Address+ZipCode, and HomePhone--and possibly more than one of them would match on ServiceId, as well.
I need some way to elegantly keep track of, in a transaction, the 'best' match of a customer. If one matches 6 fields, and the others only match 5, that customer should be kept as a match to that record. In the case of multiple matching 5, and none matching more, the most recent LastTransaction date should be kept.
Any ideas would be quite appreciated.
Update: To be a little more clear, I am looking for a good way to verify the number of exact matches in the row of data, and choose which rows to associate based on that information. If the last name is 'Doe', it must exactly match the customer last name, to count as a matching parameter, rather than be a very close match.
for SQL Server 2005 and up try:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
s.*,c.*
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.s_PK=s.PK_ID AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
EDIT
I hate to write so much actual code when there was no shema given, because I can't actually run this and be sure it works. However to answer the question of the how to handle ties using the last transaction date, here is a newer version of the above code:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
*
FROM (SELECT
s.*,c.*,row_number() over(partition by s.PK_ID order by s.PK_ID ASC,c.LastTransaction DESC) AS RankValue
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.s_PK=s.PK_ID AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
) dt2
WHERE dt2.RankValue=1
Here's a fairly ugly way to do this, using SQL Server code. Assumptions:
- Column CustomerId exists in the Customer table, to uniquely identify customers.
- Only exact matches are supported (as implied by the question).
SELECT top 1 CustomerId, LastTransaction, count(*) HowMany
from (select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.ServiceId = sa.ServiceId
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.EmailAddress = sa.EmailAddress
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.Address = sa.Address
and cu.ZipCode = sa.ZipCode
union all [etcetera -- repeat for each possible link]
) xx
group by CustomerId, LastTransaction
order by count(*) desc, LastTransaction desc
I dislike using "top 1", but it is quicker to write. (The alternative is to use ranking functions and that would require either another subquery level or impelmenting it as a CTE.) Of course, if your tables are large this would fly like a cow unless you had indexes on all your columns.
Frankly I would be wary of doing this at all as you do not have a unique identifier in your data.
John Smith lives with his son John Smith and they both use the same email address and home phone. These are two people but you would match them as one. We run into this all the time with our data and have no solution for automated matching because of it. We identify possible dups and actually physically call and find out id they are dups.
I would probably create a stored function for that (in Oracle) and oder on the highest match
SELECT * FROM (
SELECT c.*, MATCH_CUSTOMER( Customer.Id, par1, par2, par3 ) matches FROM Customer c
) WHERE matches >0 ORDER BY matches desc
The function match_customer returns the number of matches based on the input parameters... I guess is is probably slow as this query will always scan the complete customer table
For close matches you can also look at a number of string similarity algorithms.
For example, in Oracle there is the UTL_MATCH.JARO_WINKLER_SIMILARITY function:
http://www.psoug.org/reference/utl_match.html
There is also the Levenshtein distance algorithym.