Creating variable in SQL and using in WHERE clause - sql

I want to create a variable that counts the number of times each customer ID appears in the CSV, and then I want the output to be all customer IDs that appear 0, 1, or 2 times. Here is my code so far:
SELECT Customers.customer_id , COUNT(*) AS counting
FROM Customers
LEFT JOIN Shopping_cart ON Customers.customer_id = Shopping_cart.customer_id
WHERE counting = '0'
OR counting = '1'
OR counting = '2'
GROUP BY Customers.customer_id;

SELECT Customers.customer_id, COUNT(Shopping_cart.customer_id) AS counting
FROM Customers LEFT JOIN Shopping_cart ON Customers.customer_id = Shopping_cart.customer_id
GROUP BY Customers.customer_id
HAVING COUNT(Shopping_cart.customer_id) < 3;
The query groups the rows by customer ID, and COUNT() returns the number of rows in each group. An aggregate cannot be tested in WHERE, so the condition goes in HAVING, which runs after GROUP BY: keep only the groups whose count is smaller than 3, which covers 0, 1, and 2. Counting Shopping_cart.customer_id rather than * lets customers with no cart rows come out as 0. You can repeat the COUNT() expression in HAVING.

You're probably thinking of HAVING, not WHERE.
For example:
select JOB, COUNT(JOB) from SCOTT.EMP
group by JOB
HAVING count(JOB) > 1 ;
While a tad odd, you can also test for specific values in the HAVING condition(s):
HAVING count(JOB) = 2 or count(JOB) = 4
Note: the WHERE clause filters rows and is applied to each individual row, while the HAVING clause filters groups.

You can apply a filter after the aggregation with the HAVING clause.
Please note that count(*) counts all rows, including the all-NULL row that the LEFT JOIN produces for a customer without any shopping cart, so you cannot use it to detect such customers; you have to count the non-NULL values in some column instead:
SELECT customer_id,
count(Shopping_cart.some_id) AS counting
FROM Customers
LEFT JOIN Shopping_cart USING (customer_id)
GROUP BY customer_id
HAVING count(Shopping_cart.some_id) BETWEEN 0 and 2;
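To see the difference between count(*) and count(column) after a LEFT JOIN, here is a minimal sketch that runs the query against an in-memory SQLite database through Python's sqlite3 module. SQLite, the cart_id column, and the sample rows are assumptions made for illustration; the original question does not say which database or columns are in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY);
    -- cart_id is a hypothetical key column standing in for "some_id"
    CREATE TABLE Shopping_cart (cart_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO Customers VALUES (1), (2), (3);
    -- customer 1 has no carts, customer 2 has one, customer 3 has three
    INSERT INTO Shopping_cart (customer_id) VALUES (2), (3), (3), (3);
""")

rows = conn.execute("""
    SELECT c.customer_id,
           COUNT(*)         AS star_count,  -- counts the all-NULL row too
           COUNT(s.cart_id) AS cart_count   -- counts only real cart rows
    FROM Customers c
    LEFT JOIN Shopping_cart s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id
    HAVING COUNT(s.cart_id) BETWEEN 0 AND 2
    ORDER BY c.customer_id
""").fetchall()

print(rows)  # customer 1 shows star_count 1 but cart_count 0
```

Customer 1 is kept with a cart_count of 0 even though COUNT(*) reports 1 for it, and customer 3 is filtered out by the HAVING clause.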

Related

sql error ORA-00934: Group function is not allowed here

When I run the following query, I get ORA-00934: group function is not allowed here
What is the problem?
Select cust_name
from Customers
where
state = 'California' AND
cust_id in(
select cust_id
from Orders
where
count(cust_id) >= 1 AND
book_id in(select book_id from Books where category = 'Computers')
group by cust_id
)
You wrote:
where
count(cust_id) >= 1 AND
You cannot use COUNT, MIN, MAX, AVG or any other aggregate function in a WHERE clause, because at the time the WHERE is evaluated the GROUP BY has not yet been done, so there is nothing to aggregate. The clauses of a query are logically evaluated in the following order:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Subqueries are evaluated in that order before main queries are evaluated in that order. Main queries cannot access anything inside a subquery unless the subquery emits it (your subqueries emit lists of values used by IN).
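The evaluation order can be checked with a small experiment. This is only a sketch: it uses SQLite through Python's sqlite3 module rather than Oracle, so the error text differs from ORA-00934, and the Orders rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, book_id INTEGER);
    INSERT INTO Orders (cust_id, book_id) VALUES (1, 10), (1, 11), (2, 10);
""")

# WHERE is evaluated per row, before any groups exist, so the aggregate is rejected.
try:
    conn.execute(
        "SELECT cust_id FROM Orders WHERE COUNT(cust_id) > 1 GROUP BY cust_id"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# HAVING is evaluated per group, after GROUP BY, so the same condition is allowed.
having_rows = conn.execute(
    "SELECT cust_id, COUNT(cust_id) FROM Orders "
    "GROUP BY cust_id HAVING COUNT(cust_id) > 1"
).fetchall()

print(where_error)   # the database's complaint about the aggregate in WHERE
print(having_rows)   # only customer 1, who has two orders
```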
So, you can't use COUNT in your WHERE, but let's look at what you're trying to do:
where
count(cust_id) >= 1 AND
"Where the count of cust_id is at least one.."
It's highly likely this is redundant. The only way to get a count of 0 is to have no data for that cust_id, but because you're grouping and counting just one table, you never get a 0 count out of it: to show up in the result set a row has to be present, which means the count is always at least 1. Other than having NULL in cust_id, there is no way to make this query return 0 for any row:
SELECT cust_id, count(cust_id)
FROM t
GROUP BY cust_id
And if you're looking to eliminate NULLs, you'd just say WHERE cust_id IS NOT NULL. If Orders has a NOT NULL constraint on cust_id (is it logical to have an order that has no customer?) then there wouldn't be any need to specify it.
Further, because you're then using the results in an IN, even if a NULL was selected, it gets discarded by the IN anyway. Nothing is ever equal to NULL, not even another NULL, so saying
WHERE x IN (1,2,3,NULL)
just gives you rows where x is 1, 2 or 3; you don't get any rows with x as NULL. IN also doesn't care about duplicated values, so this is the same as above:
WHERE x IN (1,1,2,2,2,3,NULL)
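Both IN lists can be compared directly. The following is a quick sketch against SQLite via Python's sqlite3 module; the table t and its rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (1), (2), (3), (4), (NULL);
""")

# The NULL in the list never matches anything, not even the row where x is NULL.
with_null = conn.execute(
    "SELECT x FROM t WHERE x IN (1, 2, 3, NULL) ORDER BY x"
).fetchall()

# Repeating values in the list does not change the result.
with_dupes = conn.execute(
    "SELECT x FROM t WHERE x IN (1, 1, 2, 2, 2, 3, NULL) ORDER BY x"
).fetchall()

print(with_null)
print(with_dupes)
```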
All in all, there is no need for the clause you've put, and it can be removed. I suppose the question you're answering is "get the names of all customers from California who have ordered at least one book about computers". The "at least one" is a red herring; there won't be an order for them if they haven't, so you can ignore it:
select cust_name
from Customers
where
state = 'California' AND
cust_id in(
select cust_id
from Orders
where
book_id in(select book_id from Books where category = 'Computers')
)
If however the assignment is "at least two books" then you will need to exclude customers with only a single matching order. That is done with HAVING, which works like a WHERE clause that runs after the GROUP BY:
Select cust_name
from Customers
where
state = 'California' AND
cust_id in(
select cust_id
from Orders
where
book_id in(select book_id from Books where category = 'Computers')
group by cust_id
having count(cust_id) > 1
)
Note the use of > rather than >=
Personally, rather than nesting IN I would use JOINs and keep it all on the same level:
SELECT cust_name
FROM
Customers c
INNER JOIN Orders o on c.cust_id = o.cust_id
INNER JOIN Books b on o.book_id = b.book_id
WHERE
c.state = 'California' AND
b.category = 'Computers'
GROUP BY c.cust_id, c.cust_name
HAVING COUNT(*) > 1
If you're going to use this latter form for "at least one book", remove the HAVING but keep the GROUP BY rather than using DISTINCT, as grouping by cust_id prevents different customers with the same name from being merged into one row.
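The difference between DISTINCT and GROUP BY on the key only shows up when two customers share a name. Here is a small sketch of that case using SQLite through Python's sqlite3 module; the sample customers, books, and orders are invented, and the order_id column is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT, state TEXT);
    CREATE TABLE Books (book_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, book_id INTEGER);
    -- customers 1 and 2 are different people who happen to share a name
    INSERT INTO Customers VALUES
        (1, 'Ann Lee', 'California'),
        (2, 'Ann Lee', 'California'),
        (3, 'Bob Ray', 'Texas');
    INSERT INTO Books VALUES (10, 'Computers'), (11, 'Cooking');
    INSERT INTO Orders (cust_id, book_id) VALUES (1, 10), (2, 10), (2, 10), (3, 10), (1, 11);
""")

# Shared FROM/WHERE part of both queries.
joined = """
    FROM Customers c
    INNER JOIN Orders o ON c.cust_id = o.cust_id
    INNER JOIN Books b ON o.book_id = b.book_id
    WHERE c.state = 'California' AND b.category = 'Computers'
"""

# DISTINCT on the name collapses the two customers into one row.
distinct_rows = conn.execute("SELECT DISTINCT c.cust_name" + joined).fetchall()

# Grouping by the key keeps one row per customer.
grouped_rows = conn.execute(
    "SELECT c.cust_name" + joined + "GROUP BY c.cust_id, c.cust_name"
).fetchall()

print(distinct_rows)
print(grouped_rows)
```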
There seems to be no need for GROUP BY here.
Try this SQL statement instead, with the COUNT condition removed:
Select cust_name from Customers
where state = 'California'
AND cust_id in
(select cust_id from Orders
where book_id in
(select book_id from Books where category = 'Computers')
)
You could use DISTINCT in the subquery instead of GROUP BY, but it is not needed there either, because IN ignores duplicate values.

Why does adding GROUP BY cause a seemingly unrelated error?

The following code works fine:
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id)
FROM items;
However, when I add
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id)
FROM items
GROUP BY name;
I get ERROR: subquery uses ungrouped column "items.id" from outer query
Can anyone tell me why this is happening? Thanks!
If you GROUP BY name then any other columns you select from items must have an aggregate function applied. That's what GROUP BY means.
In your case, you are using another column from items -- id -- in a correlated scalar subquery. That's not an aggregate function, and id is not in the GROUP BY clause, so you get an error.
You could instead GROUP BY name, id. That should give you the same results as the first query, and is probably pointless.
If you actually have multiple rows in items with the same value for name, and you want to group the results of the scalar subquery for those values, you need to specify how to group them. Perhaps you want the total of the subquery results for each value of name. If so, I think you could do:
SELECT name, SUM((SELECT count(item_id) FROM bids WHERE item_id = items.id))
FROM items
GROUP BY name;
(I'm not positive about the specific syntax as I don't have a Postgres instance to test against.)
A clearer way to express it might be:
SELECT name, SUM(bid_count)
FROM (
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id) AS bid_count
FROM items
) AS item_counts
GROUP BY name;
Join the tables then perform the GROUP BY:
select i.name, count(b.item_id)
from items i
inner join bids b
on b.item_id = i.id
group by i.name
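One thing to check with the join approach is what happens to items that have no bids. The sketch below uses SQLite through Python's sqlite3 module instead of Postgres, with made-up rows and an assumed bid_id column. It compares the INNER JOIN version with a LEFT JOIN variant, which is my own addition and not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bids (bid_id INTEGER PRIMARY KEY, item_id INTEGER);
    -- two items share the name 'lamp'; 'desk' has no bids at all
    INSERT INTO items VALUES (1, 'lamp'), (2, 'lamp'), (3, 'desk');
    INSERT INTO bids (item_id) VALUES (1), (1), (2);
""")

# INNER JOIN: items without bids drop out of the result entirely.
inner_rows = conn.execute("""
    SELECT i.name, COUNT(b.item_id)
    FROM items i
    INNER JOIN bids b ON b.item_id = i.id
    GROUP BY i.name
    ORDER BY i.name
""").fetchall()

# LEFT JOIN: every item name is kept, with a count of 0 when there are no bids.
left_rows = conn.execute("""
    SELECT i.name, COUNT(b.item_id)
    FROM items i
    LEFT JOIN bids b ON b.item_id = i.id
    GROUP BY i.name
    ORDER BY i.name
""").fetchall()

print(inner_rows)
print(left_rows)
```

If the original per-item query listed items with zero bids, the LEFT JOIN form is the one that preserves them.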

SQL Server clause issue

Select Name, contact, and postal code of the customer who has done MAXIMUM transactions in the month of June.
SELECT
Customer.customer_name,
Customer.customer_email,
Customer.customer_postcode
FROM
Customer
INNER JOIN
Sales on Customer.customer_id = Sales.customer_id
WHERE
MAX(Sales.customer_id) IN (SELECT COUNT((sales.customer_id)) AS 'transactions'
FROM sales
GROUP BY (sales.customer_id))
AND MONTH(date_purchased) = 6;
But I get this error:
Msg 147, Level 15, State 1, Line 4
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference
You're taking the MAX of the customer_id, but what you want is the customer_id with the highest number of transactions. Start with your inner query, and get the top customers using ORDER BY..DESC.
SELECT Sales.customer_id, count(Sales.customer_id) as transactions
FROM Sales
GROUP BY Sales.customer_id
ORDER BY transactions DESC;
Now that you have the top customer_id, you should be able to join that result to the Customer table (using this as a CTE or an inner query) to get the name, contact, and postal code.
Your current query has a number of issues:
Aggregates such as MAX cannot be used in the WHERE clause; they must go in the HAVING clause.
Even if you change it to HAVING, the subquery is wrong because it doesn't filter on June.
A much simpler method is to just join the tables, group, sort by COUNT, and take the first row.
The outer June filter should use start and end dates rather than the MONTH function, to improve performance.
You should use proper table aliasing.
SELECT TOP (1)
c.customer_name,
c.customer_email,
c.customer_postcode
FROM
Customer c
INNER JOIN
Sales s ON c.customer_id = s.customer_id
WHERE
s.date_purchased >= '20200601' AND s.date_purchased < '20200701'
-- note the use of half open interval >= AND <
GROUP BY
c.customer_name,
c.customer_email,
c.customer_postcode
ORDER BY COUNT(*) DESC;
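The same join, group, sort, take-first-row idea can be tried out without a SQL Server instance. This sketch runs on SQLite through Python's sqlite3 module, so it uses LIMIT 1 where SQL Server uses TOP (1) and stores dates as ISO text. The sample rows, the sale_id column, and the year 2020 are assumptions, and I group by customer_id as well so that two customers with identical details would stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                           customer_email TEXT, customer_postcode TEXT);
    CREATE TABLE Sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        date_purchased TEXT);
    INSERT INTO Customer VALUES
        (1, 'Ann', 'ann@example.com', '1000'),
        (2, 'Bob', 'bob@example.com', '2000');
    -- Ann has two June sales, Bob has one; the other rows fall outside June
    INSERT INTO Sales (customer_id, date_purchased) VALUES
        (1, '2020-06-03'), (1, '2020-06-30'), (1, '2020-07-01'),
        (2, '2020-06-15'), (2, '2020-05-31'), (2, '2020-07-02');
""")

top = conn.execute("""
    SELECT c.customer_name, c.customer_email, c.customer_postcode,
           COUNT(*) AS transactions
    FROM Customer c
    INNER JOIN Sales s ON c.customer_id = s.customer_id
    -- half-open interval: includes all of June, excludes July 1st
    WHERE s.date_purchased >= '2020-06-01' AND s.date_purchased < '2020-07-01'
    GROUP BY c.customer_id, c.customer_name, c.customer_email, c.customer_postcode
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()

print(top)
```

Note that taking a single row hides ties: if two customers had the same top count, only one of them would be returned.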

PostgreSQL Query to JOIN and SUM

I have 2 tables:
orders
orderItems
The sum total (total product price) of each order is saved in the total field of the orders table. I need to join these two tables and get the sum of the totals and the count of orders from the values saved in the orders table. An example is below:
SELECT
count(orders.id), sum(orders.total)
FROM
orders
INNER JOIN orderItems ON orderItems.order_no = orders.order_no
AND orders.order_no LIKE 'P%' AND orderItems.pCode IN ('1','2','3','4')
How do I get the sum and count from single query?
This is a stab in the dark, but based on your updated comments I think I might know what you are dealing with. It seems like you are doing a sum and count on the order header level from the "orders" table, but by joining to the lines table you are getting multiple records, thus getting a seemingly arbitrary multiplication of both aggregates.
If this is the case, where you only want to sum and count the order header if there is one or more lines that meet your criteria (pCode in 1, 2, 3, 4) then what you want is a semi-join, using the exists clause.
SELECT
count(o.id), sum(o.total)
FROM
orders o
where
o.order_no like 'P%' and
exists (
select null
from orderItems i
where
o.order_no = i.order_no and
i.pCode in ('1', '2', '3', '4')
)
This way, even if multiple lines meet your condition(s), each header is still summed only once. The syntax takes some getting used to, but the construct itself is very useful and efficient. The alternative would be a subquery "in" list, which on PostgreSQL may not run as efficiently for large datasets.
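The multiplication effect is easy to reproduce. Below is a minimal sketch on SQLite via Python's sqlite3 module (not PostgreSQL), with invented orders and lines and an assumed item_id column, comparing the plain join with the EXISTS semi-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_no TEXT, total INTEGER);
    CREATE TABLE orderItems (item_id INTEGER PRIMARY KEY, order_no TEXT, pCode TEXT);
    INSERT INTO orders VALUES (1, 'P001', 100), (2, 'P002', 50), (3, 'X003', 70);
    -- order P001 has two lines that match the pCode filter
    INSERT INTO orderItems (order_no, pCode) VALUES
        ('P001', '1'), ('P001', '2'), ('P001', '9'),
        ('P002', '9'),
        ('X003', '1');
""")

# Plain join: the P001 header is repeated once per matching line.
join_result = conn.execute("""
    SELECT COUNT(o.id), SUM(o.total)
    FROM orders o
    INNER JOIN orderItems i ON i.order_no = o.order_no
    WHERE o.order_no LIKE 'P%' AND i.pCode IN ('1', '2', '3', '4')
""").fetchone()

# Semi-join: each header is counted once if at least one line matches.
exists_result = conn.execute("""
    SELECT COUNT(o.id), SUM(o.total)
    FROM orders o
    WHERE o.order_no LIKE 'P%'
      AND EXISTS (
          SELECT 1 FROM orderItems i
          WHERE i.order_no = o.order_no AND i.pCode IN ('1', '2', '3', '4')
      )
""").fetchone()

print(join_result)    # header counted twice
print(exists_result)  # header counted once
```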
If that's not what you meant, please edit your question with the sample data and what you expect to see for the final output.
If you want to use aggregates (e.g. SUM, COUNT) across values (e.g. pCode) then you need to use a GROUP BY clause on the non-aggregated columns:
SELECT
orderItems.pCode,
COUNT(orders.id) AS order_count,
SUM(orders.total) AS order_total
FROM orders
INNER JOIN orderItems
ON orderItems.order_no = orders.order_no
WHERE orders.order_no LIKE 'P%'
AND orderItems.pCode IN ('1','2','3','4')
GROUP BY
orderItems.pCode
Note how orderItems.pCode is in both the SELECT clause and the GROUP BY clause. If you wanted to list by orders.order_no as well then you would add that column to both clauses too.

SQLPLUS (Oracle) - Get MAX COUNT of GROUPBY

I need to identify which month has the most entries. I've used the TO_CHAR function to format the date column as just the month. With SELECT COUNT(*) in combination with the GROUP BY clause, I am able to return every month together with its count.
However, I need to return only the one row that has the MAX of the COUNT. I've attempted to do so by adding a HAVING clause, but it returns an error. I suspect I need a subquery in here somewhere but am unsure how to go about it.
SELECT TO_CHAR(P.DATEREGISTERED,'MONTH') MONTH, COUNT(*) COUNT
FROM PET P
GROUP BY TO_CHAR(P.DATEREGISTERED,'MONTH')
HAVING COUNT = MAX(COUNT);
Another Attempt:
SELECT TO_CHAR(P.DATEREGISTERED,'MONTH') MONTH, COUNT(*) COUNT
FROM PET P
GROUP BY TO_CHAR(P.DATEREGISTERED,'MONTH')
HAVING COUNT(*) = (SELECT MAX(TO_CHAR(P.DATEREGISTERED,'MONTH')) FROM PET P);
In your second attempt, you group by month and count the records, and then check whether that count equals the maximum of the date values converted to a month string. The two sides of that comparison are not even of the same type.
The query that you provided in your answer correctly compares counts on both sides.
Another way to rewrite the query would be
select * from
(SELECT TO_CHAR(P.DATEREGISTERED,'MONTH') MONTH, COUNT(*) COUNT
FROM PET P
GROUP BY TO_CHAR(P.DATEREGISTERED,'MONTH') order by count(*) desc )
where rownum=1
Here we order the rows in the subquery by descending count and then take the first row.
The code below works and returns the correct result. It is unclear to me why it works when the attempts above (with aliases) do not.
SELECT TO_CHAR(P.DATEREGISTERED,'MONTH') MONTH, COUNT(*) COUNT
FROM PET P
GROUP BY TO_CHAR(P.DATEREGISTERED,'MONTH')
HAVING COUNT(*) = (SELECT MAX(COUNT(*)) FROM PET P GROUP BY TO_CHAR(P.DATEREGISTERED,'MONTH'));
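For comparison, here are both approaches in a form that runs outside Oracle. This is only a sketch on SQLite via Python's sqlite3 module: SQLite has no TO_CHAR, ROWNUM, or nested MAX(COUNT(*)), so it uses strftime, LIMIT 1, and a derived table in their place. The pet_id column and the sample dates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PET (pet_id INTEGER PRIMARY KEY, dateregistered TEXT);
    INSERT INTO PET (dateregistered) VALUES
        ('2021-01-05'),
        ('2021-03-10'), ('2021-03-11'), ('2021-03-29'),
        ('2021-07-04'), ('2021-07-20');
""")

# One row per month with its number of registrations.
per_month = """
    SELECT strftime('%m', dateregistered) AS month, COUNT(*) AS cnt
    FROM PET
    GROUP BY strftime('%m', dateregistered)
"""

# Approach 1: keep every group whose count equals the largest group count.
# The right-hand side is a count too, so both sides have the same type.
having_rows = conn.execute(
    per_month + "HAVING COUNT(*) = (SELECT MAX(cnt) FROM (" + per_month + "))"
).fetchall()

# Approach 2: sort the groups by count and keep only the first row.
top_row = conn.execute(per_month + "ORDER BY cnt DESC LIMIT 1").fetchone()

print(having_rows)
print(top_row)
```

The two approaches differ on ties: the HAVING version returns every month that shares the maximum, while the sorted version returns just one of them.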