SQL - Unique results in column A based on a specific value in column B being the most frequent value


So I have the following challenge:
I'm trying to get unique results from all the clients (Column A) that made most of their purchases at store 103 (Column B).
The store is defined in the first 3 digits of the ticket number. The challenge is that I'm also getting every ticket for each client. And I just need SQL to calculate and filter the results, based on all the unique clients that made most of their purchases at store 103.
The information in Column A comes from Table 1 and the information in column B comes from Table 2.
I've been trying the following:
SELECT DISTINCT Table_1.Full_Name, Table_2.Ticket_#
FROM Table_2
LEFT OUTER JOIN Table_1
ON Table_2.Customer_Number = Table_1.Customer_Number;
I know I'm missing either the group by or order by keywords, but I don't know how to use them properly in this particular case.
Thank you very much in advance.

Here are three options.
-- Option 1: join to a derived table of the qualifying customers
SELECT customers.Full_Name, tickets."Ticket_#"
FROM Table_2 tickets
INNER JOIN Table_1 customers
    ON customers.Customer_Number = tickets.Customer_Number
INNER JOIN (
    SELECT Customer_Number
    FROM Table_2
    GROUP BY Customer_Number
    HAVING COUNT(CASE WHEN LEFT("Ticket_#", 3) = '103' THEN 1 END)
         > COUNT(CASE WHEN LEFT("Ticket_#", 3) <> '103' THEN 1 END)
) AS m ON m.Customer_Number = customers.Customer_Number;
-- Option 2: IN with a correlated subquery
SELECT customers.Full_Name, tickets."Ticket_#"
FROM Table_2 tickets
INNER JOIN Table_1 customers
    ON customers.Customer_Number = tickets.Customer_Number
WHERE customers.Customer_Number IN (
    SELECT t1.Customer_Number
    FROM Table_2 t1
    WHERE t1."Ticket_#" LIKE '103%'
    GROUP BY t1.Customer_Number
    HAVING COUNT(*) > (
        SELECT COUNT(*)
        FROM Table_2 t2
        WHERE t2.Customer_Number = t1.Customer_Number
        AND t2."Ticket_#" NOT LIKE '103%'
    )
);
-- Option 3: window functions
WITH data AS (
    SELECT customers.Full_Name, tickets."Ticket_#",
        COUNT(CASE WHEN LEFT(tickets."Ticket_#", 3) = '103' THEN 1 END)
            OVER (PARTITION BY customers.Customer_Number) AS MatchCount,
        COUNT(CASE WHEN LEFT(tickets."Ticket_#", 3) <> '103' THEN 1 END)
            OVER (PARTITION BY customers.Customer_Number) AS NonmatchCount
    FROM Table_2 tickets
    INNER JOIN Table_1 customers
        ON customers.Customer_Number = tickets.Customer_Number
)
SELECT * FROM data WHERE MatchCount > NonmatchCount;
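If you only want one row per client rather than every ticket, the same conditional-count comparison can go straight into a GROUP BY ... HAVING. Here is a minimal sketch you can run with Python's built-in sqlite3 module. The sample rows and ticket format are made up for illustration, and SQLite has no LEFT(), so substr() stands in for it; this is an assumption-laden demo, not your actual schema.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_1 (Customer_Number INTEGER PRIMARY KEY, Full_Name TEXT);
CREATE TABLE Table_2 (Customer_Number INTEGER, "Ticket_#" TEXT);
INSERT INTO Table_1 VALUES (1, 'Ann Lee'), (2, 'Bob Ray'), (3, 'Cy Poe');
INSERT INTO Table_2 VALUES
    (1, '103-0001'), (1, '103-0002'), (1, '200-0003'),  -- 2 of 3 at store 103
    (2, '103-0004'), (2, '200-0005'), (2, '305-0006'),  -- 1 of 3 at store 103
    (3, '103-0007');                                    -- 1 of 1 at store 103
""")

# One row per customer: keep those with more store-103 tickets than other tickets.
query = """
SELECT c.Full_Name
FROM Table_1 AS c
JOIN Table_2 AS t ON t.Customer_Number = c.Customer_Number
GROUP BY c.Customer_Number, c.Full_Name
HAVING COUNT(CASE WHEN substr(t."Ticket_#", 1, 3) = '103' THEN 1 END)
     > COUNT(CASE WHEN substr(t."Ticket_#", 1, 3) <> '103' THEN 1 END)
ORDER BY c.Full_Name
"""
majority_103 = [row[0] for row in conn.execute(query)]
print(majority_103)
```

Note that this treats "most" as "more at store 103 than at all other stores combined", which is what the three options above do as well.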

Related

Re-writing EXISTS as JOIN or a subquery in Oracle

I have a query which is very costly and taking more than an hour to execute. I tried converting the EXISTS clause to join but I am stuck, can anyone help?
The purpose is to find duplicate product within a unique space id. FLAT_TABLE consists of 5 million records.
Query:
select
tbl1.product,
tbl1.status,
tbl1.reservation,
tbl1.unique_space_id
FROM
schema1.flat_table tbl1
WHERE
tbl1.status = 'Active'
AND tbl1.product = 'Cage'
AND EXISTS
(SELECT 1
FROM schema1.flat_table tbl2
WHERE tbl2.product = 'Cage'
AND tbl2.status = 'Active'
AND tbl2.reservation <> 'Space Reserved'
AND tbl1.unique_space_id = tbl2.unique_space_id
GROUP BY tbl2.unique_space_id
HAVING COUNT (1) > 1
);

You can use analytical function count as follows:
select * from
(select tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id,
count(case when tbl1.reservation <> 'Space Reserved' then 1 end)
over(partition by tbl1.unique_space_id) as cnt
FROM schema1.flat_table tbl1
WHERE tbl1.status = 'Active' AND tbl1.product = 'Cage')
where cnt > 1

You could rewrite your query as an inner join to the current exists subquery. The join would have the effect of filtering in the same way the exists clause was behaving.
SELECT DISTINCT
tbl1.product,
tbl1.status,
tbl1.reservation,
tbl1.unique_space_id
FROM schema1.flat_table tbl1
INNER JOIN
(
SELECT unique_space_id
FROM schema1.flat_table
WHERE product = 'Cage' AND
status = 'Active' AND
reservation <> 'Space Reserved'
GROUP BY unique_space_id
HAVING COUNT(*) > 1
) tbl2
ON tbl2.unique_space_id = tbl1.unique_space_id
WHERE
tbl1.status = 'Active' AND
tbl1.product = 'Cage';
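Here is a more concise version using COUNT as an analytic function, along with a QUALIFY clause (a vendor extension found in Teradata and Snowflake, among others, so check that your database accepts it):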
SELECT DISTINCT product, status, reservation, unique_space_id
FROM schema1.flat_table
WHERE status = 'Active' AND product = 'Cage'
QUALIFY COUNT(CASE WHEN reservation <> 'Space Reserved' THEN 1 END)
OVER (PARTITION BY unique_space_id) > 1;
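If you want to convince yourself that the analytic COUNT rewrite returns the same rows as the original EXISTS query, a small side-by-side check helps. This sketch uses Python's sqlite3 module rather than Oracle, drops the schema1 prefix, and uses invented sample rows, so treat it as an illustration of the logic only (it says nothing about Oracle performance).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flat_table (product TEXT, status TEXT, reservation TEXT, unique_space_id INTEGER);
INSERT INTO flat_table VALUES
    ('Cage', 'Active',   'Open',           1),
    ('Cage', 'Active',   'Held',           1),
    ('Cage', 'Active',   'Space Reserved', 1),
    ('Cage', 'Active',   'Open',           2),
    ('Cage', 'Active',   'Space Reserved', 2),
    ('Cage', 'Inactive', 'Open',           2),
    ('Rack', 'Active',   'Open',           3),
    ('Rack', 'Active',   'Held',           3);
""")

# Original shape: correlated EXISTS with GROUP BY / HAVING.
exists_sql = """
SELECT t1.product, t1.status, t1.reservation, t1.unique_space_id
FROM flat_table t1
WHERE t1.status = 'Active' AND t1.product = 'Cage'
  AND EXISTS (SELECT 1 FROM flat_table t2
              WHERE t2.product = 'Cage' AND t2.status = 'Active'
                AND t2.reservation <> 'Space Reserved'
                AND t2.unique_space_id = t1.unique_space_id
              GROUP BY t2.unique_space_id
              HAVING COUNT(*) > 1)
ORDER BY 4, 3
"""

# Rewrite: conditional COUNT as a window function, filtered in an outer query.
window_sql = """
SELECT product, status, reservation, unique_space_id
FROM (SELECT f.*,
             COUNT(CASE WHEN reservation <> 'Space Reserved' THEN 1 END)
                 OVER (PARTITION BY unique_space_id) AS cnt
      FROM flat_table f
      WHERE status = 'Active' AND product = 'Cage')
WHERE cnt > 1
ORDER BY 4, 3
"""

exists_rows = conn.execute(exists_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()
print(exists_rows == window_rows, len(window_rows))
```

Only space 1 qualifies here, because it is the only space with more than one active Cage row that is not 'Space Reserved'; its 'Space Reserved' row is still returned, exactly as in the EXISTS version.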

identify which foreign key is being used ORACLE SQL

I have two tables, and I want to get the average quality_score for quality_score_A and quality_score_B.
This is what I have tried, but it gives me the same value for quality_score_a and quality_score_b:
SELECT AVG (quality_score),AVG (quality_score)
FROM REVIEW
JOIN Score_table on score.quality_score_A
JOIN Score_table on score.quality_score_B
WHERE PRODUCT_ID = 2
GROUP BY PRODUCT_ID;
Refer to the attachment for the table layout and desired outcome.

You need to specify the condition you want to join on, using a separate alias for each join to the score table:
SELECT r.product_id, AVG(a.quality_score), AVG(b.quality_score)
FROM review r
JOIN score_table a ON r.quality_score_a = a.score_id
JOIN score_table b ON r.quality_score_b = b.score_id
WHERE r.product_id = 2
GROUP BY r.product_id;
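To see why each foreign key needs its own alias of the lookup table, here is a small runnable sketch using Python's sqlite3 module. The table layout is an assumption (the attachment from the question is not available): score_table is keyed by score_id, and review holds two foreign keys into it. The sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed layout: two foreign keys in review point at the same lookup table.
CREATE TABLE score_table (score_id INTEGER PRIMARY KEY, quality_score REAL);
CREATE TABLE review (product_id INTEGER, quality_score_a INTEGER, quality_score_b INTEGER);
INSERT INTO score_table VALUES (1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0);
INSERT INTO review VALUES (2, 1, 4), (2, 3, 4), (5, 2, 2);
""")

# Alias "a" resolves quality_score_a, alias "b" resolves quality_score_b.
row = conn.execute("""
SELECT r.product_id, AVG(a.quality_score), AVG(b.quality_score)
FROM review r
JOIN score_table a ON r.quality_score_a = a.score_id
JOIN score_table b ON r.quality_score_b = b.score_id
WHERE r.product_id = 2
GROUP BY r.product_id
""").fetchone()
print(row)
```

With this data the two averages differ (2.0 for the A scores, 4.0 for the B scores), which is the behaviour the original attempt was missing.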

The simplest way, I think, is to unpivot the results:
select r.product_id,
avg(case when which = 'a' then s.quality_score end) as a_avg,
avg(case when which = 'b' then s.quality_score end) as b_avg
from ((select r.product_id, quality_score_a as score_id, 'a' as which
from review r
) union all
(select r.product_id, quality_score_b as score_id, 'b' as which
from review r
)
) r join
score_table s
on r.score_id = s.score_id
group by r.product_id

How to apply WHERE clause to multiple SELECT statements in SQL Server

I am creating a query that selects data from multiple tables. The query itself is complete, but now I need to apply a WHERE clause to the whole query.
I have 9 SELECT statements, and they are working fine. The data is selected from different tables. Now I want to declare a date range and have all of the data filtered by the dates provided. I am using the query below:
SELECT
    (SELECT COUNT(DISTINCT OrderItems.ProductID)
     FROM OrderItems) AS 'TotalSoldItemsDistinct',
    (SELECT COUNT(OrderItems.ProductID)
     FROM OrderItems) AS 'TotalSoldItemsInDistinct',
    (SELECT COUNT(Orders.OrderID)
     FROM Orders) AS 'TotalOrders',
    (SELECT COUNT(Orders.OrderID)
     FROM Orders
     WHERE Orders.OrderStatusID = @CompleteOStatusID) AS 'CompleteOrders',
    (SELECT COUNT(Orders.OrderID)
     FROM Orders
     WHERE Orders.OrderStatusID = @PendingOStatusID) AS 'PendingOrders',
    (SELECT COUNT(Orders.ClientID)
     FROM Orders
     WHERE Orders.ClientID != @WalkingCustID) AS 'namedcustomers',
    (SELECT COUNT(Orders.ClientID)
     FROM Orders
     WHERE Orders.ClientID = @WalkingCustID) AS 'WalkingCustomers',
    (SELECT SUM(OrderItems.PurchasePrice)
     FROM OrderItems) AS 'TotalPurchasePrice',
    (SELECT SUM(OrderItems.SalePrice)
     FROM OrderItems) AS 'TotalSalePrice'
I am selecting data from two tables, 'Orders' and 'OrderItems'. The 'Orders' table has a TransactionDate column and the 'OrderItems' table has an OrderDate column, and I want to filter on those. Can anybody suggest how to apply the filter to the whole query?

You could try this
;WITH tempOrderItems AS
(
    SELECT
        COUNT(DISTINCT ori.ProductID) AS 'TotalSoldItemsDistinct',
        COUNT(ori.ProductID) AS 'TotalSoldItemsInDistinct',
        SUM(ori.PurchasePrice) AS 'TotalPurchasePrice',
        SUM(ori.SalePrice) AS 'TotalSalePrice'
    FROM OrderItems ori
    WHERE ori.OrderDate BETWEEN xxx AND yyy -- xxx and yyy are placeholders for your date range
)
, tempOrders AS
(
    SELECT
        COUNT(o.OrderID) AS 'TotalOrders',
        SUM(CASE WHEN o.OrderStatusID = @CompleteOStatusID THEN 1 ELSE 0 END) AS 'CompleteOrders',
        SUM(CASE WHEN o.OrderStatusID = @PendingOStatusID THEN 1 ELSE 0 END) AS 'PendingOrders',
        SUM(CASE WHEN o.ClientID != @WalkingCustID THEN 1 ELSE 0 END) AS 'namedcustomers',
        SUM(CASE WHEN o.ClientID = @WalkingCustID THEN 1 ELSE 0 END) AS 'WalkingCustomers'
    FROM Orders o
    WHERE o.TransactionDate BETWEEN xxx AND yyy
)
SELECT * FROM tempOrderItems
CROSS JOIN tempOrders
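The idea is that each CTE collapses its table to a single summary row with its own date filter, and the CROSS JOIN glues the two rows together. Here is a cut-down, runnable sketch of that pattern using Python's sqlite3 module instead of SQL Server. The sample rows, the status and client id values, and the named parameters (:start, :end, :complete, :walk_in) are all made up for the demo, and only a few of the nine measures are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER, OrderStatusID INTEGER, ClientID INTEGER, TransactionDate TEXT);
CREATE TABLE OrderItems (ProductID INTEGER, PurchasePrice REAL, SalePrice REAL, OrderDate TEXT);
INSERT INTO Orders VALUES
    (1, 1, 10, '2015-02-10'), (2, 2, 99, '2015-02-12'), (3, 1, 99, '2015-03-01');
INSERT INTO OrderItems VALUES
    (7, 5.0, 8.0, '2015-02-10'), (7, 5.0, 8.0, '2015-02-12'),
    (8, 2.0, 3.0, '2015-02-12'), (9, 1.0, 2.0, '2015-03-01');
""")

# Hypothetical parameter values; the same date range is reused in both CTEs.
params = {"start": "2015-02-01", "end": "2015-02-28", "complete": 1, "walk_in": 99}

summary = conn.execute("""
WITH item_stats AS (
    SELECT COUNT(DISTINCT ProductID) AS distinct_items,
           COUNT(ProductID)          AS total_items,
           SUM(SalePrice)            AS total_sale_price
    FROM OrderItems
    WHERE OrderDate BETWEEN :start AND :end
),
order_stats AS (
    SELECT COUNT(OrderID) AS total_orders,
           SUM(CASE WHEN OrderStatusID = :complete THEN 1 ELSE 0 END) AS complete_orders,
           SUM(CASE WHEN ClientID = :walk_in THEN 1 ELSE 0 END)       AS walk_in_orders
    FROM Orders
    WHERE TransactionDate BETWEEN :start AND :end
)
SELECT * FROM item_stats CROSS JOIN order_stats
""", params).fetchone()
print(summary)
```

Because each CTE returns exactly one row, the cross join also returns exactly one row, and the March rows are excluded from every measure.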

It is not fully clear what result you want, but here are two approaches.
To select data from both tables at the same time (replace the date with your own criteria):
SELECT * FROM Orders AS o INNER JOIN OrderItems AS i ON o.TransactionDate = i.OrderDate WHERE o.TransactionDate = '2015-02-12';
SELECT * returns all columns from both tables. The ON condition pairs orders with order items from the same date, and the WHERE clause keeps only the rows for the date you define. SQL Server requires an ON clause with INNER JOIN, which is why the date match is written there.
To select only the order items whose date matches the date of a specific order:
SELECT i.* FROM Orders AS o INNER JOIN OrderItems AS i ON o.TransactionDate = i.OrderDate WHERE o.OrderID = '12345';
SELECT i.* tells the query to return only the columns of OrderItems. The condition o.TransactionDate = i.OrderDate ensures that only order items from the same date as the order with OrderID '12345' are returned (the order is picked by WHERE o.OrderID = '12345'). This works provided you have an OrderID field on your Orders table and you want to use it as a criterion.

Remove grouped data set when total of count is zero with subquery

I'm generating a data set that looks like this
category user total
1 jonesa 0
2 jonesa 0
3 jonesa 0
1 smithb 0
2 smithb 0
3 smithb 5
1 brownc 2
2 brownc 3
3 brownc 4
Where a particular user has 0 records in all categories, is it possible to remove their rows from the set? If a user has some activity, as smithb does, I'd like to keep all of their records, even the zero rows. I'm not sure how to go about that. I thought a CASE statement might help, but this is pretty complicated for me. Here is my query:
SELECT DISTINCT c.category,
u.user_name,
CASE WHEN (
SELECT COUNT(e.entry_id)
FROM category c1
INNER JOIN entry e1
ON c1.category_id = e1.category_id
WHERE c1.category_id = c.category_id
AND e.user_name = u.user_name
AND e1.entered_date >= TO_DATE ('20140625','YYYYMMDD')
AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')) > 0 -- I know this won't work
THEN 'Yes'
ELSE NULL
END AS TOTAL
FROM user u
INNER JOIN role r
ON u.id = r.user_id
AND r.id IN (1,2),
category c
LEFT JOIN entry e
ON c.category_id = e.category_id
WHERE c.category_id NOT IN (19,20)
I realise the case statement won't work, but it was an attempt on how this might be possible. I'm really not sure if it's possible or the best direction. Appreciate any guidance.

Try this:
delete from t1
where user in (
select user
from t1
group by user
having count(distinct category) = sum(case when total=0 then 1 else 0 end) )
The subquery finds all the users that meet your removal requirement.
count(distinct category) gets the number of categories a user has.
sum(case when total=0 then 1 else 0 end) gets the number of rows with no activity (total = 0) for that user. When the two are equal, every category is zero for that user.
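Here is the same DELETE run against the sample data from the question, as a quick sketch using Python's sqlite3 module. It assumes the generated data set has been stored in a table named t1 (as in the statement above) with columns category, user and total; "user" is quoted because it is a reserved word in several databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t1 (category INTEGER, "user" TEXT, total INTEGER)')
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "jonesa", 0), (2, "jonesa", 0), (3, "jonesa", 0),
     (1, "smithb", 0), (2, "smithb", 0), (3, "smithb", 5),
     (1, "brownc", 2), (2, "brownc", 3), (3, "brownc", 4)],
)

# Remove users whose number of categories equals their number of zero-total rows.
conn.execute("""
DELETE FROM t1
WHERE "user" IN (
    SELECT "user"
    FROM t1
    GROUP BY "user"
    HAVING COUNT(DISTINCT category) = SUM(CASE WHEN total = 0 THEN 1 ELSE 0 END)
)
""")

remaining = conn.execute(
    'SELECT "user", COUNT(*) FROM t1 GROUP BY "user" ORDER BY "user"'
).fetchall()
print(remaining)
```

jonesa is removed entirely, while smithb keeps all three rows, including the two zero rows.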

There are a number of ways to do this, but the less verbose the SQL is, the harder it may be for you to follow along with the logic. For that reason, I think that using multiple Common Table Expressions will avoid the need to use redundant joins, while being the most readable.
-- assuming user_name and category_name are unique on [user] and [category] respectively.
WITH valid_categories (category_id, category_name) AS
(
-- get set of valid categories
SELECT c.category_id, c.category AS category_name
FROM category c
WHERE c.category_id NOT IN (19,20)
),
valid_users ([user_name]) AS
(
-- get set of users who belong to valid roles
SELECT u.[user_name]
FROM [user] u
WHERE EXISTS (
SELECT *
FROM [role] r
WHERE u.id = r.[user_id] AND r.id IN (1,2)
)
),
valid_entries (entry_id, [user_name], category_id, entry_count) AS
(
-- provides a flag of 1 for easier aggregation
SELECT e.[entry_id], e.[user_name], e.category_id, CAST( 1 AS INT) AS entry_count
FROM [entry] e
WHERE e.entered_date BETWEEN TO_DATE('20140625','YYYYMMDD') AND TO_DATE('20140731', 'YYYYMMDD')
-- determines if entry is within date range
),
user_categories ([user_name], category_id, category_name) AS
( SELECT u.[user_name], c.category_id, c.category_name
FROM valid_users u
-- get the cartesian product of users and categories
CROSS JOIN valid_categories c
-- get only users with a valid entry
WHERE EXISTS (
SELECT *
FROM valid_entries e
WHERE e.[user_name] = u.[user_name]
)
)
/*
You can use these for testing.
SELECT COUNT(*) AS valid_categories_count
FROM valid_categories
SELECT COUNT(*) AS valid_users_count
FROM valid_users
SELECT COUNT(*) AS valid_entries_count
FROM valid_entries
SELECT COUNT(*) AS users_with_entries_count
FROM valid_users u
WHERE EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT COUNT(*) AS users_without_entries_count
FROM valid_users u
WHERE NOT EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT uc.[user_name], uc.[category_name], e.[entry_count]
FROM user_categories uc
INNER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
*/
-- Finally, the results:
SELECT uc.[user_name], uc.[category_name], SUM(NVL(e.[entry_count],0)) AS [entry_count]
FROM user_categories uc
LEFT OUTER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
GROUP BY uc.[user_name], uc.[category_name]

Here's another method:
WITH totals AS (
SELECT
c.category,
u.user_name,
COUNT(e.entry_id) AS total,
SUM(COUNT(e.entry_id)) OVER (PARTITION BY u.user_name) AS user_total
FROM
user u
INNER JOIN
role r ON u.id = r.user_id
CROSS JOIN
category c
LEFT JOIN
entry e ON c.category_id = e.category_id
AND u.user_name = e.user_name
AND e.entered_date >= TO_DATE ('20140625', 'YYYYMMDD')
AND e.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')
WHERE
r.id IN (1, 2)
AND c.category_id NOT IN (19, 20)
GROUP BY
c.category,
u.user_name
)
SELECT
category,
user_name,
total
FROM
totals
WHERE
user_total > 0
;
The totals derived table calculates the totals per user and category as well as totals across all categories per user (using SUM() OVER ...). The main query returns only rows where the user total is greater than zero.
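As a rough illustration of the per-user total idea, here is a stripped-down sketch using Python's sqlite3 module. It leaves out the role and date filters, renames the user table to users, uses invented rows, and splits the calculation into two CTEs (count per user and category first, then a windowed SUM per user) instead of nesting COUNT inside SUM() OVER, purely to keep each step easy to follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_name TEXT);
CREATE TABLE category (category_id INTEGER, category TEXT);
CREATE TABLE entry (entry_id INTEGER, category_id INTEGER, user_name TEXT);
INSERT INTO users VALUES ('jonesa'), ('smithb');
INSERT INTO category VALUES (1, 'one'), (2, 'two');
INSERT INTO entry VALUES (100, 2, 'smithb'), (101, 2, 'smithb');
""")

rows = conn.execute("""
WITH per_cell AS (
    -- one row per user and category, zero when there are no entries
    SELECT u.user_name, c.category, COUNT(e.entry_id) AS total
    FROM users u
    CROSS JOIN category c
    LEFT JOIN entry e
           ON e.category_id = c.category_id AND e.user_name = u.user_name
    GROUP BY u.user_name, c.category
),
totals AS (
    -- repeat each user's overall total on every one of their rows
    SELECT per_cell.*, SUM(total) OVER (PARTITION BY user_name) AS user_total
    FROM per_cell
)
SELECT category, user_name, total
FROM totals
WHERE user_total > 0
ORDER BY category
""").fetchall()
print(rows)
```

jonesa has no entries at all and disappears, while smithb keeps both rows, including the zero row for category 'one'.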

Inner join that ignore singlets

I have to do a self-join on a table. I am trying to return several columns showing how many of each type of drug test were performed on the same day (MM/DD/YYYY), for days on which at least two tests were done and at least one of them had a result code of 'UN'.
I am joining other tables to get the information, as shown below. The problem is that I do not understand how to exclude someone who has a single result row: they had a 'UN' result on a day but no other tests that day.
Query Results (Columns)
County, DrugTestID, ID, Name, CollectionDate, DrugTestType, Results, Count(DrugTestType)
I have several rows for ID 12345, which are correct. But ID 12346 has a single row with a count of 1: they had a 'UN' result on this day but no other tests that day. I want to exclude this row.
I tried the following query
select
c.desc as 'County',
dt.pid as 'PID',
dt.id as 'DrugTestID',
p.id as 'ID',
bio.FullName as 'Participant',
CONVERT(varchar, dt.CollectionDate, 101) as 'CollectionDate',
dtt.desc as 'Drug Test Type',
dt.result as Result,
COUNT(dt.dru_drug_test_type) as 'Count Of Test Type'
from
dbo.Test as dt with (nolock)
join dbo.History as h on dt.pid = h.id
join dbo.Participant as p on h.pid = p.id
join BioData as bio on bio.id = p.id
join County as c with (nolock) on p.CountyCode = c.code
join DrugTestType as dtt with (nolock) on dt.DrugTestType = dtt.code
inner join
(
select distinct
dt2.pid,
CONVERT(varchar, dt2.CollectionDate, 101) as 'CollectionDate'
from
dbo.DrugTest as dt2 with (nolock)
join dbo.History as h2 on dt2.pid = h2.id
join dbo.Participant as p2 on h2.pid = p2.id
where
dt2.result = 'UN'
and dt2.CollectionDate between '11-01-2011' and '10-31-2012'
and p2.DrugCourtType = 'AD'
) as derived
on dt.pid = derived.pid
and convert(varchar, dt.CollectionDate, 101) = convert(varchar, derived.CollectionDate, 101)
group by
c.desc, dt.pid, p.id, dt.id, bio.fullname, dt.CollectionDate, dtt.desc, dt.result
order by
c.desc ASC, Participant ASC, dt.CollectionDate ASC

This is a little complicated because your query returns a separate row for each test. You need to use window/analytic functions to get the information you want. These let you calculate aggregates while keeping the values on each row.
The following query starts with your query. It then calculates the number of UN results and the total number of tests on each date for each participant, and applies the appropriate filter to get what you want:
with base as (<your query here>)
select b.*
from (select b.*,
sum(IsUN) over (partition by Participant, CollectionDate) as NumUNs,
count(*) over (partition by Participant, CollectionDate) as NumTests
from (select b.*,
(case when Result = 'UN' then 1 else 0 end) as IsUN
from base b
) b
) b
where NumUNs <> 1 or NumTests <> 1
Without the with clause or window functions, you can create a particularly ugly query to do the same thing:
select b.*
from (<your query>) b join
(select Participant, CollectionDate, count(*) as NumTests,
sum(case when result = 'UN' then 1 else 0 end) as NumUNs
from (<your query>) b
group by Participant, CollectionDate
) bsum
on b.Participant = bsum.Participant and
b.CollectionDate = bsum.CollectionDate
where NumUNs <> 1 or NumTests <> 1
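Here is a small runnable sketch of the window-function approach, using Python's sqlite3 module and a single simplified table. The table name, the test types and the 'OK' result code are invented for the demo. Unlike your query, this table is not pre-filtered to days that have a 'UN' result, so the filter is written out directly as "at least one UN and at least two tests" instead of the <> 1 form above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tests (participant TEXT, collection_date TEXT, test_type TEXT, result TEXT);
INSERT INTO tests VALUES
    ('12345', '2012-01-05', 'Urine',  'UN'),
    ('12345', '2012-01-05', 'Breath', 'OK'),
    ('12346', '2012-01-05', 'Urine',  'UN'),   -- only test that day: exclude
    ('12347', '2012-01-06', 'Urine',  'OK'),
    ('12347', '2012-01-06', 'Breath', 'OK');   -- two tests, no UN: exclude
""")

rows = conn.execute("""
SELECT participant, collection_date, test_type, result
FROM (SELECT t.*,
             SUM(CASE WHEN result = 'UN' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY participant, collection_date) AS num_uns,
             COUNT(*)
                 OVER (PARTITION BY participant, collection_date) AS num_tests
      FROM tests t)
WHERE num_uns >= 1 AND num_tests >= 2
ORDER BY test_type
""").fetchall()
print(rows)
```

Only participant 12345 survives: they had two tests on the same day and one of them was 'UN'. The singleton row for 12346 is dropped.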

If I understand the problem, the basic pattern for this sort of query is simply to include negating or exclusionary conditions in your join, i.e., a self-join where column A matches but columns B and C do not:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
and t1.PkId != t2.PkId
and t1.category != t2.category
)
Put the conditions in the WHERE clause if it benchmarks better:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
And it's often easiest to start with the self-join, treating it as a "base table" on which to join all related information:
select
[columns]
from
(select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
) bt
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
This can allow you to focus on getting that self-join right, without interference from other tables.
