SQL Prevent Multiple Counting - sql

I found some similar SO questions about my issue, and they were solved with sub-queries, but I can't seem to apply that approach to my situation.
Goal
My goal is to count the animals that have been a breeder at least once in their lifespan.
I have 2 tables to keep track of when an animal became a breeder. Here's a simplified look at how the tables are structured:
Animals (a)
id    name
-------------------
100   Mouse
101   Cow
102   Pig
103   Dog
Breeding History (bh)
id    animal_id   code   date
--------------------------------------------
500   100         B      2016-01-12
501   100         A      2016-01-25
502   101         B      2016-01-28
503   102         B      2016-02-02
504   100         B      2016-02-05
505   100         A      2016-02-08
In this scenario, my current query counts correctly for both 101 | Cow and 102 | Pig, since each became a breeder (Code: B) only once. The count for an animal that never became a breeder is also correct, so that case is not a problem. However, an animal that became a breeder more than once in its lifespan, e.g. 100 | Mouse, is counted once for every time it became a breeder.
Query
SELECT
a.name,
COUNT(CASE WHEN bh.code IN ('B') THEN 1 ELSE NULL END) AS breeder_count
FROM animals a
LEFT OUTER JOIN breeding_history bh
ON a.id = bh.animal_id
GROUP BY a.name
Result
name    breeder_count
--------------------------
Mouse   2
Cow     1
Pig     1
Dog     0
The result shows that there are 2 mice that became a breeder when actually it was the same animal and should only be counted once.

You can use the DISTINCT keyword, so as to count a 'B' just once:
SELECT
a.name,
COUNT(DISTINCT CASE WHEN bh.code IN ('B') THEN 1 END) AS breeder_count
FROM animals a
LEFT OUTER JOIN breeding_history bh
ON a.id = bh.animal_id
GROUP BY a.name
As a side note, ELSE NULL is redundant and has been removed from the CASE expression.
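The DISTINCT approach can be checked against the sample data from the question. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python's sqlite3 module (the question does not say which database is in use, so the dialect is an assumption). The second query, which returns a single overall total, is an illustrative addition and not part of the answer above.

```python
import sqlite3

# In-memory SQLite database seeded with the sample rows from the question.
# (SQLite is an assumption; the original database engine is not stated.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE breeding_history (
        id INTEGER PRIMARY KEY, animal_id INTEGER, code TEXT, date TEXT
    );
    INSERT INTO animals VALUES
        (100, 'Mouse'), (101, 'Cow'), (102, 'Pig'), (103, 'Dog');
    INSERT INTO breeding_history VALUES
        (500, 100, 'B', '2016-01-12'), (501, 100, 'A', '2016-01-25'),
        (502, 101, 'B', '2016-01-28'), (503, 102, 'B', '2016-02-02'),
        (504, 100, 'B', '2016-02-05'), (505, 100, 'A', '2016-02-08');
""")

# COUNT(DISTINCT ...) collapses the repeated 'B' rows of one animal into a
# single value, so every animal yields either 0 or 1.
per_animal = dict(conn.execute("""
    SELECT a.name,
           COUNT(DISTINCT CASE WHEN bh.code = 'B' THEN 1 END) AS breeder_count
    FROM animals a
    LEFT JOIN breeding_history bh ON a.id = bh.animal_id
    GROUP BY a.name
""").fetchall())

# Illustrative extra: one overall total, counting each animal at most once.
total_breeders = conn.execute("""
    SELECT COUNT(DISTINCT animal_id)
    FROM breeding_history
    WHERE code = 'B'
""").fetchone()[0]

print(per_animal)
print(total_breeders)
```

With the sample rows, Mouse is reported once even though it has two 'B' entries.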

Related

SQL Query: Join (or select) 2 columns from 1 table with 1 column from another table for a view without extra join columns

This is my very first Stack Overflow post, so I apologize if I am not formatting my question correctly. I'm pounding my head against the wall with what I'm sure is a simple problem. I have a table with a bunch of event information, about 10 columns, like so:
Table: event_info
date location_id lead_user_id colead_user_id attendees start end <and a few more...>
------------------------------------------------------------------------------------------------
2020-10-10 1 3 1 26 2100 2200 .
2020-10-11 3 2 4 18 0600 0700
2020-10-12 2 5 6 6 0800 0900
And another table with user information:
Table: users
user_id user_name display_name email phone city
----------------------------------------------------------------------
1 Joe S goofball ...
2 John T schmoofball ...
3 Jack U aloofball ...
4 Jim V poofball ...
5 Joy W tootball ...
6 George A boring ...
I want to create a view that has only a subset of the information, not full table joins. The event table lead_user_id and colead_user_id columns both refer to the user_id column in the users table.
I want to create a view like this:
date Location Lead Name CoLead Name attendees
---------------------------------------------------------------------
2020-10-10 1 Jack U Joe S 26
2020-10-11 3 John T Jim V 18
2020-10-12 2 Joy W George A 6
I have tried the following and several iterations like it to no avail...
SELECT
E.date, E.location,
U1.display_name AS Lead Name,
U2.display_name AS CoLead Name.
E.attendees
FROM
users U1, event_info E
INNER JOIN
event_info E ON U1.user_id = E.lead_user_id
INNER JOIN
users U2 ON U2.user_id = E.colead_user_id
And I get the dreaded
You have an error in your SQL Syntax
message. I'm not surprised, as I've really only ever used joins on single columns or nested select statements... this two columns pointing to one is throwing me for a loop. Help!
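Here is a corrected query. Aliases that contain a space must be quoted (backticks in MySQL), the location column is named location_id, and event_info should appear only once in the FROM clause:
SELECT
E.date, E.location_id AS Location,
U1.display_name AS `Lead Name`,
(SELECT display_name FROM users WHERE user_id = E.colead_user_id) AS `CoLead Name`,
E.attendees
FROM
event_info E
INNER JOIN
users U1 ON U1.user_id = E.lead_user_id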
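An alternative to the scalar sub-query is to join the users table twice under two different aliases, one per foreign key. The sketch below assumes SQLite via Python's sqlite3 module rather than the MySQL server in the question, so the aliases are quoted with double quotes instead of backticks. It selects user_name, because that is the column whose values appear in the desired output; the users table is reduced to the two columns needed here.

```python
import sqlite3

# Sample data from the question, trimmed to the columns the view needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE event_info (
        date TEXT, location_id INTEGER,
        lead_user_id INTEGER, colead_user_id INTEGER, attendees INTEGER
    );
    INSERT INTO users VALUES
        (1, 'Joe S'), (2, 'John T'), (3, 'Jack U'),
        (4, 'Jim V'), (5, 'Joy W'), (6, 'George A');
    INSERT INTO event_info VALUES
        ('2020-10-10', 1, 3, 1, 26),
        ('2020-10-11', 3, 2, 4, 18),
        ('2020-10-12', 2, 5, 6, 6);
""")

# users is joined twice: u1 resolves the lead, u2 resolves the co-lead.
rows = conn.execute("""
    SELECT e.date,
           e.location_id AS "Location",
           u1.user_name  AS "Lead Name",
           u2.user_name  AS "CoLead Name",
           e.attendees
    FROM event_info e
    JOIN users u1 ON u1.user_id = e.lead_user_id
    JOIN users u2 ON u2.user_id = e.colead_user_id
    ORDER BY e.date
""").fetchall()

for row in rows:
    print(row)
```

Note that an inner join drops events whose lead or co-lead id has no matching user; a LEFT JOIN would keep them with a NULL name.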

SQL How can this happen? - Query which normally returns 1 result alone actually resulted in multiple results when put inside WHERE clause

Question brief
I'm doing this practice exercise on w3resource and I can't understand why the solution works. I've only been learning SQL for 2 days, so I'd appreciate it very much if someone could explain it to me.
I have 2 tables, COMPANY(com_id, com_name) and PRODUCT(pro_name, pro_price, com_id). Each company has several products with different prices. Now I need to write a query to display companies' name together with their most expensive products respectively.
The sample answer on the practice is like this
SELECT c.com_name, p.pro_name, p.pro_price
FROM product p
INNER JOIN company c ON p.com_id = c.com_id
AND p.pro_price =
( SELECT MAX(p.pro_price)
FROM product p
WHERE p.com_id = c.com_id );
The query returned expected result.
com_name pro_name pro_price
--------- --------- -----------
Samsung Monitor 5000.00
iBall DVD drive 900.00
Epsion Printer 2600.00
Zebronics ZIP drive 250.00
Asus Mother Board 3200.00
Frontech Speaker 550.00
But I cannot understand how, especially the part inside the bottom sub-query. Isn't SELECT MAX(p.pro_price) supposed to return only 1 highest price of all companies together?
I also tried running this sub-query on its own, like this
SELECT MAX(p.pro_price)
FROM product p
INNER JOIN company c ON p.com_id = c.com_id
WHERE p.com_id = c.com_id;
... and it only returned 1 maximum value.
max(p.pro_price)
-----
5000.00
So how does the final result of the whole query include more than 1 record? There's no GROUP BY or anything.
By the way, the query seemed to use 2 conditions for INNER JOIN. But I also tried swapping the 2nd condition into a WHERE clause and it still worked the same. This is one more thing I don't understand.
The databases involved
COMPANY table
COM_ID | COM_NAME
----------------
11 | Samsung
12 | iBall
13 | Epsion
14 | Zebronics
15 | Asus
16 | Frontech
PRODUCT table
PRO_NAME PRO_PRICE COM_ID
-------------------- ---------- ---------
Mother Board 3200 15
Key Board 450 16
ZIP drive 250 14
Speaker 550 16
Monitor 5000 11
DVD drive 900 12
CD drive 800 12
Printer 2600 13
Refill cartridge 350 13
Mouse 250 12
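The sub-query is a correlated sub-query: it refers to c.com_id from the outer query, so it is evaluated once for each row of the outer query, and MAX returns the highest price for that row's company only. Your standalone version has no outer row to correlate with, so it aggregates over all products and returns a single value. For an INNER JOIN, a condition produces the same result whether it is written in the ON clause or in WHERE. The correlation is this condition:
WHERE p.com_id = c.com_id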
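One way to see the "once per outer row" behaviour is to run the inner query by hand for each company and compare the outcome with the correlated version. This is a minimal sketch, assuming SQLite via Python's sqlite3 module and the sample rows from the question. The inner alias is renamed to p2 here only to make clear which product table each column refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (com_id INTEGER PRIMARY KEY, com_name TEXT);
    CREATE TABLE product (pro_name TEXT, pro_price REAL, com_id INTEGER);
    INSERT INTO company VALUES
        (11, 'Samsung'), (12, 'iBall'), (13, 'Epsion'),
        (14, 'Zebronics'), (15, 'Asus'), (16, 'Frontech');
    INSERT INTO product VALUES
        ('Mother Board', 3200, 15), ('Key Board', 450, 16),
        ('ZIP drive', 250, 14), ('Speaker', 550, 16),
        ('Monitor', 5000, 11), ('DVD drive', 900, 12),
        ('CD drive', 800, 12), ('Printer', 2600, 13),
        ('Refill cartridge', 350, 13), ('Mouse', 250, 12);
""")

# Manual version: run the inner query once per company, passing the
# company's id in as a parameter, which mimics the correlation.
manual = {}
for com_id, com_name in conn.execute(
        "SELECT com_id, com_name FROM company").fetchall():
    manual[com_name] = conn.execute(
        "SELECT MAX(pro_price) FROM product WHERE com_id = ?", (com_id,)
    ).fetchone()[0]

# Correlated version: the database does the same thing for each outer row.
correlated = {
    com_name: pro_price
    for com_name, _pro_name, pro_price in conn.execute("""
        SELECT c.com_name, p.pro_name, p.pro_price
        FROM product p
        JOIN company c ON p.com_id = c.com_id
        WHERE p.pro_price = (SELECT MAX(p2.pro_price)
                             FROM product p2
                             WHERE p2.com_id = c.com_id)
    """).fetchall()
}

print(manual == correlated)
```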

Concatenating data from one row into the results from another

I have a SQL Server database of orders I'm struggling with. For a normal order a single table provides the following results:
Orders:
ID Customer Shipdate Order ID
-----------------------------------------------------------------
1 Tom 2015-01-01 100
2 Bob 2014-03-20 200
At some point they needed orders that were placed by more than one customer. So they created a row for each customer and split the record over multiple rows.
Orders:
ID Customer Shipdate Order ID
-----------------------------------------------------------------
1 Tom 2015-01-01 100
2 Bob 2014-03-20 200
3 John
4 Dan
5 2014-05-10 300
So there is another table I can join on to make sense of this which relates the three rows which are actually one order.
Joint.Orders:
ID Related ID
-----------------------------------------------------------------
5 3
5 4
I'm a little new to SQL. I can join on the other table and filter to get only the data relating to Order ID 300, but what I'd really like is to concatenate the customers, and after searching for a while I can't see how to do this. What I'd really like to achieve is this as an output:
ID Customer Shipdate Order ID
----------------------------------------------------------------
1 Tom 2015-01-01 100
2 Bob 2014-03-20 200
5 John, Dan 2014-05-10 300
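You should consider changing the schema first. The query below might give you a feel for how it can be done with your current design. It uses FOR XML PATH('') to concatenate the related customers and STUFF to strip the leading separator:
Select ID, Customer, ShipDate, OrderID
From Orders
Where IsNull(Customer, '') <> ''
And ID Not In (Select RelatedID From JointOrders)
Union All
Select O.ID,
Customer = Stuff((Select ', ' + OI.Customer From Orders OI Where OI.ID In (Select JO.RelatedID From JointOrders JO Where JO.ID = O.ID) For Xml Path('')), 1, 2, '')
,O.ShipDate, O.OrderID
From Orders O Where IsNull(O.Customer, '') = ''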
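The same idea can be tried out on the sample data. The sketch below is not SQL Server code: it assumes SQLite via Python's sqlite3 module and uses SQLite's group_concat aggregate for the concatenation. The snake_case table and column names are hypothetical stand-ins for the ones in the question, and blank cells are assumed to be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical names standing in for Orders and Joint.Orders.
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer TEXT, shipdate TEXT, order_id INTEGER
    );
    CREATE TABLE joint_orders (id INTEGER, related_id INTEGER);
    INSERT INTO orders VALUES
        (1, 'Tom', '2015-01-01', 100),
        (2, 'Bob', '2014-03-20', 200),
        (3, 'John', NULL, NULL),
        (4, 'Dan', NULL, NULL),
        (5, NULL, '2014-05-10', 300);
    INSERT INTO joint_orders VALUES (5, 3), (5, 4);
""")

# Keep only the "real" order rows (those never referenced as a related row)
# and fill a missing customer with the concatenated related customers.
rows = conn.execute("""
    SELECT o.id,
           COALESCE(o.customer,
                    (SELECT group_concat(c.customer, ', ')
                     FROM joint_orders jo
                     JOIN orders c ON c.id = jo.related_id
                     WHERE jo.id = o.id)) AS customer,
           o.shipdate,
           o.order_id
    FROM orders o
    WHERE o.id NOT IN (SELECT related_id FROM joint_orders)
    ORDER BY o.id
""").fetchall()

for row in rows:
    print(row)
```

SQLite does not guarantee the order in which group_concat joins the names, so the combined value may come out as either "John, Dan" or "Dan, John".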

SQL: How do I count the number of clients that have already bought the same product?

I have a table like the one below. It is a record of daily featured products and the customers that purchased them (similar to a daily deal site). A given client can only purchase a product one time per feature, but they may purchase the same product if it is featured multiple times.
FeatureID | ClientID | FeatureDate | ProductID
1 1002 2011-05-01 500
1 2333 2011-05-01 500
1 4458 2011-05-01 500
2 8888 2011-05-10 700
2 2333 2011-05-10 700
2 1111 2011-05-10 700
3 1002 2011-05-20 500
3 4444 2011-05-20 500
4 4444 2011-05-30 500
4 2333 2011-05-30 500
4 1002 2011-05-30 500
I want to count by FeatureID the number of clients that purchased FeatureID X AND who purchased the same productID during a previous feature.
For the table above the expected result would be:
FeatureID | CountofReturningClients
1 0
2 0
3 1
4 3
Ideally I would like to do this with SQL, but am also open to doing some manipulation in Excel/PowerPivot. Thanks!!
If you join your table to itself, you can find the data you're looking for. Be careful, because this query can take a long time if the table has a lot of data and is not indexed well.
SELECT t_current.FEATUREID, COUNT(DISTINCT t_prior.CLIENTID)
FROM table_name t_current
LEFT JOIN table_name t_prior
ON t_current.FEATUREDATE > t_prior.FEATUREDATE
AND t_current.CLIENTID = t_prior.CLIENTID
AND t_current.PRODUCTID = t_prior.PRODUCTID
GROUP BY t_current.FEATUREID
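The self-join can be verified against the expected result from the question. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; the table name purchases is a hypothetical placeholder, and the dates are stored as ISO strings so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE purchases (
        FeatureID INTEGER, ClientID INTEGER, FeatureDate TEXT, ProductID INTEGER
    )
""")
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", [
    (1, 1002, "2011-05-01", 500), (1, 2333, "2011-05-01", 500),
    (1, 4458, "2011-05-01", 500), (2, 8888, "2011-05-10", 700),
    (2, 2333, "2011-05-10", 700), (2, 1111, "2011-05-10", 700),
    (3, 1002, "2011-05-20", 500), (3, 4444, "2011-05-20", 500),
    (4, 4444, "2011-05-30", 500), (4, 2333, "2011-05-30", 500),
    (4, 1002, "2011-05-30", 500),
])

# Each current purchase is matched to earlier purchases of the same product
# by the same client. COUNT(DISTINCT ...) ignores the NULLs produced by the
# LEFT JOIN, so features without returning clients report 0.
returning = conn.execute("""
    SELECT cur.FeatureID, COUNT(DISTINCT prev.ClientID) AS CountofReturningClients
    FROM purchases cur
    LEFT JOIN purchases prev
           ON prev.ClientID = cur.ClientID
          AND prev.ProductID = cur.ProductID
          AND prev.FeatureDate < cur.FeatureDate
    GROUP BY cur.FeatureID
    ORDER BY cur.FeatureID
""").fetchall()

print(returning)  # [(1, 0), (2, 0), (3, 1), (4, 3)]
```

DISTINCT matters for feature 4: client 1002 bought product 500 in both earlier features and would otherwise be counted twice.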
"Per feature, count the clients who match for any earlier Features with the same product"
SELECT
Curr.FeatureID,
COUNT(DISTINCT Prev.ClientID) AS CountofReturningClients
FROM
MyTable Curr
LEFT JOIN
MyTable Prev ON Curr.FeatureID > Prev.FeatureID
AND Curr.ClientID = Prev.ClientID
AND Curr.ProductID = Prev.ProductID
GROUP BY
Curr.FeatureID
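Assumptions: You have a table called Features that is:
FeatureID, FeatureDate, ProductID
If not then you could always create one on the fly with a temporary table, CTE or view. The purchase rows themselves are in a table called Purchases.
Then, for each feature, count the clients who bought that feature and also bought the same product on an earlier date:
SELECT
f.FeatureID
, (
SELECT COUNT(DISTINCT prev.ClientID)
FROM Purchases cur
INNER JOIN Purchases prev ON prev.ClientID = cur.ClientID
WHERE cur.FeatureID = f.FeatureID AND prev.ProductID = f.ProductID AND prev.FeatureDate < f.FeatureDate
) AS CountOfReturningClients
FROM Features f
ORDER BY f.FeatureID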
New to this, but wouldn't the following work?
SELECT FeatureID, (CASE WHEN COUNT(clientid) > 1 THEN COUNT(clientid) ELSE 0 END)
FROM table
GROUP BY featureID

Group by with count

Say I have a table like this in my SQL Server 2005 database
Apples
+ Id
+ Brand
+ HasWorms
Now I want an overview of the number of apples that have worms in them per brand.
Actually even better would be a list of all the apple brands with a flag if they are unspoiled or not.
So if I had the data
ID| Brand | HasWorms
---------------------------
1 | Granny Smith | 1
2 | Granny Smith | 0
3 | Granny Smith | 1
4 | Jonagold | 0
5 | Jonagold | 0
6 | Gala | 1
7 | Gala | 1
I want to end up with
Brand | IsUnspoiled
--------------------------
Granny Smith | 0
Jonagold | 1
Gala | 0
I figure I should first
select brand, numberOfSpoiles =
case
when count([someMagic]) > 0 then 1
else 0
end
from apples
group by brand
I can't use a HAVING clause, because then brands without valid entries would disappear from my list (I wouldn't see the entry Gala).
Then I thought a subquery of some kind should do it, but then I can't link the apple id of the outer (grouped) query to the inner (count) query...
Any ideas?
select brand, case when sum(cast(hasworms as int)) > 0 then 0 else 1 end as IsUnspoiled
from apples
group by brand
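The grouped CASE expression can be checked against the sample rows. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module instead of SQL Server 2005; SQLite has no bit type, so HasWorms is stored as a plain integer and needs no cast. It also runs the equivalent 1 - MAX(...) formulation for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apples (id INTEGER, brand TEXT, has_worms INTEGER)")
conn.executemany("INSERT INTO apples VALUES (?, ?, ?)", [
    (1, "Granny Smith", 1), (2, "Granny Smith", 0), (3, "Granny Smith", 1),
    (4, "Jonagold", 0), (5, "Jonagold", 0),
    (6, "Gala", 1), (7, "Gala", 1),
])

# A brand is unspoiled only if none of its apples has worms.
via_sum = dict(conn.execute("""
    SELECT brand,
           CASE WHEN SUM(has_worms) > 0 THEN 0 ELSE 1 END AS is_unspoiled
    FROM apples
    GROUP BY brand
""").fetchall())

# Same flag: MAX is 1 as soon as one apple of the brand is spoiled.
via_max = dict(conn.execute("""
    SELECT brand, 1 - MAX(has_worms) AS is_unspoiled
    FROM apples
    GROUP BY brand
""").fetchall())

print(via_sum)
print(via_sum == via_max)
```

Because the flag is computed per group rather than filtered with HAVING, every brand stays in the result, including Gala.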
SQL Server version. I computed IsSpoiled instead of IsUnspoiled so that I could use the SIGN function and keep the code shorter.
Table and data (DDL + DML)
create table Apples(id int,brand varchar(20),HasWorms bit)
insert Apples values(1,'Granny Smith',1)
insert Apples values(2,'Granny Smith',0)
insert Apples values(3,'Granny Smith',1)
insert Apples values(4,'Jonagold',0)
insert Apples values(5,'Jonagold',0)
insert Apples values(6,'Gala',1)
insert Apples values(7,'Gala',1)
Query
select brand, IsSpoiled = sign(sum(convert(int,hasworms)))
from apples
group by brand
Output
brand IsSpoiled
----------------------
Gala 1
Granny Smith 1
Jonagold 0
SELECT Brand,
1-MAX(CONVERT(int, HasWorms)) AS IsUnspoiled
FROM apples
GROUP BY Brand
SELECT brand,
COALESCE(
(
SELECT TOP 1 0
FROM apples ai
WHERE ai.brand = ao.brand
AND hasWorms = 1
), 1) AS isUnspoiled
FROM (
SELECT DISTINCT brand
FROM apples
) ao
If you have an index on (brand, hasWorms), this query will be very fast, since it does not compute aggregates but instead searches for the first spoiled apple within each brand and stops.
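The "stop at the first spoiled apple" idea can also be written with EXISTS. This is a sketch under assumptions: it uses SQLite via Python's sqlite3 module, where TOP is not available, and the index name idx_apples_brand_worms is hypothetical. Whether the index is actually used depends on the query planner, and no performance claim is made for this tiny data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apples (id INTEGER, brand TEXT, has_worms INTEGER)")
conn.executemany("INSERT INTO apples VALUES (?, ?, ?)", [
    (1, "Granny Smith", 1), (2, "Granny Smith", 0), (3, "Granny Smith", 1),
    (4, "Jonagold", 0), (5, "Jonagold", 0),
    (6, "Gala", 1), (7, "Gala", 1),
])
# Hypothetical index matching the (brand, hasWorms) suggestion above.
conn.execute("CREATE INDEX idx_apples_brand_worms ON apples (brand, has_worms)")

# NOT EXISTS only needs to find one spoiled apple per brand to decide the
# flag; SQLite returns the result as the integer 0 or 1.
flags = dict(conn.execute("""
    SELECT b.brand,
           NOT EXISTS (SELECT 1
                       FROM apples a
                       WHERE a.brand = b.brand
                         AND a.has_worms = 1) AS is_unspoiled
    FROM (SELECT DISTINCT brand FROM apples) AS b
""").fetchall())

print(flags)
```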
I haven't tested this, and maybe I'm missing something. But wouldn't this work?
SELECT Brand, SUM(CONVERT(int, HasWorms)) AS SpoiledCount
FROM Apples
GROUP BY Brand
ORDER BY SpoiledCount DESC
I assume HasWorms is a bit field, hence the CONVERT statement. This should return a list of brands with the count of spoiled apples per brand. You should see the worst (most spoiled) at the top and the best at the bottom.
There are many ways to skin this cat. Depending on your RDBMS, different queries will give you the best results. On our Oracle box, this query performs faster than all the others listed, assuming that you have an index on Brand in the Apples table (an index on Brand, HasWorms is even faster, but that may not be likely; depending on your data distribution, an index on just HasWorms may be the fastest of all). It also assumes you have a table "BrandTable", which just has the brands:
SELECT Brand
, 1 IsSpoiled
FROM BrandTable b
WHERE EXISTS
( SELECT 1
FROM Apples a
WHERE a.brand = b.brand
AND a.HasWorms = 1
)
UNION
SELECT Brand
, 0
FROM BrandTable b
WHERE NOT EXISTS
( SELECT 1
FROM Apples a
WHERE a.brand = b.brand
AND a.HasWorms = 1
)
ORDER BY 1;
SELECT CASE WHEN SUM(CAST(HasWorms AS int)) > 0 THEN 0 ELSE 1 END AS IsUnspoiled, Brand
FROM apples
GROUP BY Brand