I'm trying to query my database for my class to find out which customer has placed the most orders. The table I'm searching is a three attribute table that has the customerID, orderID, and the placedDate.
The query I thought would work is:
select cid from placed order by sum(oid);
But I keep getting an error saying cid is "not a single-group group function". The oid is the primary key and is also a foreign key that references another table. Is that what the issue is?
If you want to count the number of orders you should do a count instead of a SUM:
SELECT cid,COUNT(*)
FROM placed
GROUP BY cid
ORDER BY COUNT(*) DESC
This will give you the list of customers and their respective number of orders, sorted by the number of orders in descending order.
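As a quick sanity check, here is a minimal sketch of the same GROUP BY / COUNT query run against an in-memory SQLite database from Python. The sample rows are made up, and SQLite only stands in for whatever DBMS you actually use; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE placed (cid INTEGER, oid INTEGER PRIMARY KEY, placedDate TEXT)")
conn.executemany(
    "INSERT INTO placed (cid, oid, placedDate) VALUES (?, ?, ?)",
    [
        (1, 100, "2020-01-01"),
        (2, 101, "2020-01-02"),
        (2, 102, "2020-01-03"),
        (3, 103, "2020-01-04"),
        (2, 104, "2020-01-05"),
        (3, 105, "2020-01-06"),
    ],
)

# One row per customer, busiest customer first (cid breaks ties deterministically).
order_counts = conn.execute(
    "SELECT cid, COUNT(*) AS order_count "
    "FROM placed GROUP BY cid "
    "ORDER BY order_count DESC, cid"
).fetchall()

print(order_counts)
```

The first row of the result is the customer with the most orders.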
If you want just the customer with the most orders, you have to limit the result to the first record. How you do that varies with the DBMS (e.g. MySQL uses LIMIT 1, SQL Server uses TOP 1), so you have to tell us which one you use:
In Oracle, you can do:
SELECT * FROM (
SELECT cid,COUNT(*)
FROM placed
GROUP BY cid
ORDER BY COUNT(*) DESC
) a
WHERE rownum = 1
In case there are one or more customers having the maximum number of orders:
select * from orders o, customer c where o.cusId = c.cusId and o.cusId IN (select cusId from orders group by cusId having count(*) = (select count(*) from orders o2 group by o2.cusId order by count(*) desc limit 1));
This solution is for MySQL, as I have used LIMIT. It can be changed as per the DBMS.
I also used = rather than IN for the innermost subquery, since MySQL does not allow LIMIT inside an IN subquery.
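A variation on the tie-handling idea, sketched with SQLite from Python: instead of ORDER BY ... LIMIT 1, it compares each customer's count with the MAX over the per-customer counts, which avoids LIMIT altogether. This is a swapped-in technique, not the query above, and the rows are invented for illustration; only the orders/cusId names come from the answer (ordId is a hypothetical column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ordId INTEGER PRIMARY KEY, cusId INTEGER)")
# Customers 1 and 2 are tied with two orders each; customer 3 has one.
conn.executemany(
    "INSERT INTO orders (ordId, cusId) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3)],
)

# Keep every customer whose order count equals the highest count.
top_customers = conn.execute(
    """
    SELECT cusId, COUNT(*) AS order_count
    FROM orders
    GROUP BY cusId
    HAVING COUNT(*) = (
        SELECT MAX(cnt)
        FROM (SELECT COUNT(*) AS cnt FROM orders GROUP BY cusId) AS per_customer
    )
    ORDER BY cusId
    """
).fetchall()

print(top_customers)
```

Both tied customers come back, which is the behaviour you want when "the customer with the most orders" is not unique.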
Related
I have an oracle sql database consisting of three tables and I was wondering,
What is the most efficient subquery that can be written to retrieve the information of the customer stored in the table customer_info who has performed the highest number of purchases in total? (The purchase data is in the table purchase_logs.) That is, the number of transactions one customer has performed, NOT the quantity of the items purchased.
In other words, my aim is to retrieve the details of the customer with the highest number of purchases.
I have 3 tables one for the customer_info, one as the purchase_logs and the last one being the item_info.
My current Approach
SELECT * FROM customer_info
WHERE customer_id = (SELECT cust_id
FROM purchase_logs
GROUP BY cust_id
ORDER BY COUNT(*)
DESC LIMIT 1);
This doesn't seem to give me any results at all unfortunately.
This is my Database Schema along with the Sample Data of purchase_logs, customer_info, item_info and the Expected Output
I would really appreciate any help in understanding what the proper approach to solving this problem would be.
There is no LIMIT 1 in Oracle SQL; use the row limiting clause instead (available from Oracle 12c; FETCH FIRST in the example below):
SELECT *
FROM
(SELECT cust_id, count(*) cnt
FROM purchase_logs
GROUP BY cust_id
ORDER BY cnt desc
fetch first 1 row with ties
) vc
join customer_info
on customer_id = vc.cust_id;
When I run the following query, I get ORA-00934: group function is not allowed here
What is the problem?
Select cust_name
from Customers
where
state = 'California' AND
cust_id in(
select cust_id
from Orders
where
count(cust_id) >= 1 AND
book_id in(select book_id from Books where category = 'Computers')
group by cust_id
)
You wrote:
where
count(cust_id) >= 1 AND
You cannot use COUNT, MIN, MAX, AVG or any other aggregate function in a WHERE clause, because at the time the WHERE is evaluated the GROUP BY has not yet been done, so there is no aggregation. SQL queries are logically evaluated in the following order:
FROM
WHERE
GROUP
SELECT
Subqueries execute in that order before main queries execute in that order. Main queries cannot access anything inside a sub query unless the sub query emits it (your sub queries emit lists of values used by IN)
So, you can't use COUNT in your WHERE, but let's look at what you're trying to do:
where
count(cust_id) >= 1 AND
"Where the count of cust_id is at least one.."
It's highly likely this is redundant; the only way to get count to return 0 is to have no data for that cust_id, but because you're grouping and counting just one table you never get a 0 count out of it. In order to show up in a result set a row has to be present, which means the count is always at least 1. Other than having NULL in cust_id there is no way to make this query return 0 for any row:
SELECT cust_id, count(cust_id)
FROM t
GROUP BY cust_id
And if you're looking to eliminate nulls, you'd just say WHERE cust_id IS NOT NULL. If Orders has a NOT NULL constraint on cust_id (is it logical to have an order that has no customer?) then there wouldn't be any need to specify it.
Further, because you're then using the results in an IN, even if a NULL was selected, it gets discarded by the IN anyway. Nothing is ever equal to a NULL, even another NULL, so saying
WHERE x IN (1,2,3,NULL)
just gives you rows where x is 1, 2 or 3; you don't get any rows with x as NULL. IN also doesn't care about duplicated values, so this is the same as above:
WHERE x IN (1,1,2,2,2,3,NULL)
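You can check this behaviour quickly. A small sketch using SQLite from Python (the table t and its rows are made up for the demonstration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
# Includes a value outside the list (4) and a NULL row.
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (3,), (4,), (None,)])

# A NULL inside the IN list never matches anything, not even the NULL row.
with_null = conn.execute(
    "SELECT x FROM t WHERE x IN (1, 2, 3, NULL) ORDER BY x"
).fetchall()

# Duplicated values in the list make no difference to the result.
with_dupes = conn.execute(
    "SELECT x FROM t WHERE x IN (1, 1, 2, 2, 2, 3, NULL) ORDER BY x"
).fetchall()

print(with_null, with_dupes)
```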
All in all, there is no need for the clause you've put, and it can be removed. I suppose the question you're answering is "get the names of all customers from California who have ordered at least one book about computers". The "at least one" is a red herring; there won't be an order for them if they haven't ordered anything, so you can ignore it:
select cust_name
from Customers
where
state = 'California' AND
cust_id in(
select cust_id
from Orders
where
book_id in(select book_id from Books where category = 'Computers')
)
If however the assignment is "at least two books" then you will need to exclude the single orders. That is done with HAVING which is a where clause that runs after a GROUP BY...
Select cust_name
from Customers
where
state = 'California' AND
cust_id in(
select cust_id
from Orders
where
book_id in(select book_id from Books where category = 'Computers')
group by cust_id
having count(cust_id) > 1
)
Note the use of > rather than >=
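To see the WHERE versus HAVING difference in isolation, here is a minimal sketch run on SQLite from Python. SQLite is only a stand-in (its error text differs from Oracle's ORA-00934), and the Orders rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (cust_id INTEGER, book_id INTEGER)")
# Customer 1 has two orders, customer 2 has one.
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])

# An aggregate in WHERE is rejected: no groups exist yet when WHERE is evaluated.
try:
    conn.execute(
        "SELECT cust_id FROM Orders WHERE COUNT(cust_id) > 1 GROUP BY cust_id"
    )
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)

# The same condition in HAVING is applied after GROUP BY, so it works.
having_rows = conn.execute(
    "SELECT cust_id FROM Orders GROUP BY cust_id HAVING COUNT(cust_id) > 1"
).fetchall()

print(where_error)
print(having_rows)
```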
Personally, rather than nesting IN I would use JOINs and keep it all on the same level:
SELECT cust_name
FROM
Customers c
INNER JOIN Orders o on c.cust_id = o.cust_id
INNER JOIN Books b on o.book_id = b.book_id
WHERE
c.state = 'California' AND
b.category = 'Computers'
GROUP BY c.cust_id, c.cust_name
HAVING COUNT(*) > 1
If you're going to use this latter form for "at least one book", remove the HAVING but keep the GROUP BY rather than using DISTINCT, as it will prevent different customers with the same name coalescing into one
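A small sketch of why GROUP BY on cust_id is safer than DISTINCT here, run on SQLite from Python. The rows are hypothetical: two different Californian customers happen to share the name 'Ann Lee' and both ordered a computer book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT, state TEXT);
    CREATE TABLE Books (book_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE Orders (cust_id INTEGER, book_id INTEGER);
    INSERT INTO Customers VALUES
        (1, 'Ann Lee', 'California'), (2, 'Ann Lee', 'California'), (3, 'Bob Ray', 'Texas');
    INSERT INTO Books VALUES (10, 'Computers'), (11, 'Cooking');
    INSERT INTO Orders VALUES (1, 10), (2, 10), (3, 10), (1, 11);
""")

# Shared FROM/WHERE part of both queries.
joined = """
    FROM Customers c
    INNER JOIN Orders o ON c.cust_id = o.cust_id
    INNER JOIN Books b ON o.book_id = b.book_id
    WHERE c.state = 'California' AND b.category = 'Computers'
"""

# Grouping by the key keeps one row per customer, even with identical names.
grouped_names = conn.execute(
    "SELECT c.cust_name " + joined + " GROUP BY c.cust_id, c.cust_name"
).fetchall()

# DISTINCT on the name alone merges the two customers into one row.
distinct_names = conn.execute("SELECT DISTINCT c.cust_name " + joined).fetchall()

print(len(grouped_names), len(distinct_names))
```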
There seems to be no need to use GROUP BY.
Try the SQL statement:
Select cust_name from Customers
where state = 'California'
AND cust_id in
(select cust_id from Orders
where book_id in
(select book_id from Books where category = 'Computers')
)
If you do need to remove duplicates, you can use DISTINCT instead of GROUP BY. But DISTINCT is not needed in a subquery that is only used with IN.
I am trying to write a SQL query that returns the name and purchase amount of the five customers in each state who have spent the most money.
Table schemas
customers
|_state
|_customer_id
|_customer_name
transactions
|_customer_id
|_transact_amt
Attempts look something like this
SELECT state, Sum(transact_amt) AS HighestSum
FROM (
SELECT name, transactions.transact_amt, SUM(transactions.transact_amt) AS HighestSum
FROM customers
INNER JOIN customers ON transactions.customer_id = customers.customer_id
GROUP BY state
) Q
GROUP BY transact_amt
ORDER BY HighestSum
I'm lost. Thank you.
Expected results are the names of customers with the top 5 highest transactions in each state.
ERROR: table name "customers" specified more than once
SQL state: 42712
First, you need for your JOIN to be correct. Second, you want to use window functions:
SELECT ct.*
FROM (SELECT c.customer_id, c.customer_name, c.state, SUM(t.transact_amt) AS total,
ROW_NUMBER() OVER (PARTITION BY c.state ORDER BY SUM(t.transact_amt) DESC) as seqnum
FROM customers c JOIN
transactions t
ON t.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name, c.state
) ct
WHERE seqnum <= 5;
You seem to have several issues with SQL. I would start with understanding aggregation functions. You have a SUM() with the alias HighestSum. It is simply the total per customer.
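For illustration, a small self-contained sketch of the per-state top-N pattern, run on SQLite from Python (window functions need SQLite 3.25 or later). The sample rows are invented, the totals are computed in a CTE before ranking, and it keeps the top 2 per state rather than 5 to stay short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT, state TEXT);
    CREATE TABLE transactions (customer_id INTEGER, transact_amt INTEGER);
    INSERT INTO customers VALUES (1, 'Ann', 'CA'), (2, 'Bob', 'CA'), (3, 'Cy', 'CA'), (4, 'Di', 'NY');
    INSERT INTO transactions VALUES (1, 50), (1, 25), (2, 100), (3, 10), (4, 5);
""")

top_per_state = conn.execute("""
    WITH totals AS (
        -- total spent per customer
        SELECT c.customer_id, c.customer_name, c.state, SUM(t.transact_amt) AS total
        FROM customers c
        JOIN transactions t ON t.customer_id = c.customer_id
        GROUP BY c.customer_id, c.customer_name, c.state
    )
    SELECT state, customer_name, total, seqnum
    FROM (
        -- number the customers within each state, biggest spender first
        SELECT totals.*,
               ROW_NUMBER() OVER (PARTITION BY state ORDER BY total DESC) AS seqnum
        FROM totals
    ) ct
    WHERE seqnum <= 2
    ORDER BY state, seqnum
""").fetchall()

print(top_per_state)
```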
You can get them using aggregation and then by using the RANK() window function. For example:
select
state,
rk,
customer_name
from (
select
*,
rank() over(partition by state order by total desc) as rk
from (
select
c.customer_id,
c.customer_name,
c.state,
sum(t.transact_amt) as total
from customers c
join transactions t on t.customer_id = c.customer_id
group by c.customer_id, c.customer_name, c.state
) x
) y
where rk <= 5
order by state, rk
There are two valid answers already. Here's a third:
SELECT *
FROM (
SELECT c.state, c.customer_name, t.*
, row_number() OVER (PARTITION BY c.state ORDER BY t.transact_sum DESC NULLS LAST, customer_id) AS rn
FROM (
SELECT customer_id, sum(transact_amt) AS transact_sum
FROM transactions
GROUP BY customer_id
) t
JOIN customers c USING (customer_id)
) sub
WHERE rn < 6
ORDER BY state, rn;
Major points
When aggregating all or most rows of a big table, it's typically substantially faster to aggregate before the join. Assuming referential integrity (FK constraints), we won't be aggregating rows that would be filtered otherwise. This might change from nice-to-have to a pure necessity when joining to more aggregated tables. Related:
Why does the following join increase the query time significantly?
Two SQL LEFT JOINS produce incorrect result
Add additional ORDER BY item(s) in the window function to define which rows to pick from ties. In my example, it's simply customer_id. If you have no tiebreaker, results are arbitrary in case of a tie, which may be OK. But every other execution might return different results, which typically is a problem. Or you include all ties in the result. Then we are back to rank() instead of row_number(). See:
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
As long as transact_amt can be NULL (this has not been ruled out), any sum may end up being NULL as well. With an unsuspecting ORDER BY t.transact_sum DESC those customers come out on top, as NULL comes first in descending order. Use DESC NULLS LAST to avoid this pitfall. (Or define the column transact_amt as NOT NULL.)
PostgreSQL sort by datetime asc, null first?
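The tie-breaking point can be sketched as follows, using SQLite from Python (3.25 or later for window functions) as a stand-in for PostgreSQL. The totals table and its rows are hypothetical: two customers in the same state are tied on the highest sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE totals (state TEXT, customer_id INTEGER, customer_name TEXT, transact_sum INTEGER);
    INSERT INTO totals VALUES ('TX', 1, 'Ann', 100), ('TX', 2, 'Bob', 100), ('TX', 3, 'Cy', 50);
""")

# row_number() gets customer_id as a tiebreaker; rank() gives tied rows the same rank.
ranked = """
    SELECT customer_name FROM (
        SELECT customer_name, customer_id,
               ROW_NUMBER() OVER (PARTITION BY state ORDER BY transact_sum DESC, customer_id) AS rn,
               RANK()       OVER (PARTITION BY state ORDER BY transact_sum DESC) AS rk
        FROM totals
    ) sub
    WHERE {condition}
    ORDER BY customer_id
"""

# Exactly one row per state, picked deterministically by the tiebreaker.
top_by_row_number = [row[0] for row in conn.execute(ranked.format(condition="rn <= 1"))]

# All rows sharing the top value are included.
top_by_rank = [row[0] for row in conn.execute(ranked.format(condition="rk <= 1"))]

print(top_by_row_number, top_by_rank)
```

Which of the two you want depends on whether ties should be cut off or all be reported.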
i have a table with a bunch of customer IDs. a customer table also contains these IDs, but each ID can appear on multiple records for the same customer. i want to select the most recently used record, which i can get by doing order by <my_field> desc
say i have 100 customer IDs in this table and the customers table has 120 records with these IDs (some are duplicates). how can i apply my order by condition to get only the most recent matching records?
dbms is sql server 2000.
table is basically like this:
loc_nbr and cust_nbr are primary keys
a customer shops at location 1. they get assigned loc_nbr = 1 and cust_nbr = 1
then a customer_id of 1.
they shop again but this time at location 2. so they get assigned loc_nbr = 2 and cust_Nbr = 1. then the same customer_id of 1 based on their other attributes like name and address.
because they shopped at location 2 AFTER location 1, it will have a more recent rec_alt_ts value, which is the record i would want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY rec_alt_ts DESC) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudocode for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field desc)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
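If you need the whole latest row rather than just the timestamp, one option that does not need ROW_NUMBER() is to join the table back to its per-customer MAX. A minimal sketch on SQLite from Python; the table name customers, the cust_name column, and the rows are assumptions, while loc_nbr, cust_nbr, customer_id and rec_alt_ts come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        loc_nbr INTEGER, cust_nbr INTEGER, customer_id INTEGER,
        rec_alt_ts TEXT, cust_name TEXT,
        PRIMARY KEY (loc_nbr, cust_nbr)
    );
    -- customer 1 shopped at location 1 first, then at location 2
    INSERT INTO customers VALUES
        (1, 1, 1, '2020-01-01', 'Ann'),
        (2, 1, 1, '2020-03-01', 'Ann'),
        (1, 2, 2, '2020-02-01', 'Bob');
""")

latest_records = conn.execute("""
    SELECT c.customer_id, c.loc_nbr, c.cust_nbr, c.rec_alt_ts
    FROM customers c
    JOIN (
        -- most recent timestamp per customer
        SELECT customer_id, MAX(rec_alt_ts) AS latest_ts
        FROM customers
        GROUP BY customer_id
    ) m ON m.customer_id = c.customer_id AND m.latest_ts = c.rec_alt_ts
    ORDER BY c.customer_id
""").fetchall()

print(latest_records)
```

If two records of the same customer share the same rec_alt_ts, both are returned, so this relies on the timestamp being unique per customer.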
You can use the date and time to retrieve the most recent record for the customer.
Order by the column that holds the date and time for the customer.
eg:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');
I'm trying to write a SQL Query for DB2 Version 8 which retrieves the most recent order of a specific part for a list of users. The query receives a parameter which contains a list of customerId numbers and the partId number. For example,
Order Table
OrderID
PartID
CustomerID
OrderTime
I initially wanted to try:
Select * from Order
where
OrderId = (
Select orderId
from Order
where
partId = #requestedPartId# and customerId = #customerId#
Order by orderTime desc
fetch first 1 rows only
);
The problem with the above query is that it only works for a single user and my query needs to include multiple users.
Does anyone have a suggestion about how I could expand the above query to work for multiple users? If I remove my "fetch first 1 rows only," then it will return all rows instead of the most recent. I also tried using Max(OrderTime), but I couldn't find a way to return the OrderId from the sub-select.
Thanks!
Note: DB2 Version 8 does not support the SQL "TOP" function.
Try the following one. I didn't test it. The idea is that you first find all orders for all your specified customers. These are grouped, and you find the latest order time for each customer (a combination of GROUP BY and MAX). This is the foo query, which identifies the records that you need. Then you join it with your order table to retrieve the necessary information for these orders.
select o.*
from order o inner join
(select customerId, max(orderTime) as orderTime
from order o
where customerId in ( #customerIds#)
and partId = #requestedPartId#
group by customerId) foo
on o.customerId = foo.customerId
and o.orderTime = foo.orderTime
EDIT: The above query gives you the most recent order for each customer you specified, provided that there is only one order per customer and orderTime. Getting only a single order is slightly different. The following example assumes that orderTime is unique, meaning no two orders in the database have the same time. This is generally true if orderTime is recorded in milliseconds.
select o.*
from order o inner join
(select max(orderTime) as orderTime
from order o
where customerId in ( #customerIds#)
and partId = #requestedPartId#) foo
on o.orderTime = foo.orderTime
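A runnable sketch of the per-customer version, using SQLite from Python as a stand-in for DB2. Assumptions: the table is called orders here (ORDER is a reserved word), the rows are invented, and the #customerIds# / #requestedPartId# parameters are replaced by bound placeholders. It also repeats the partId filter in the outer query, so an order for a different part at the same timestamp cannot slip in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (orderId INTEGER PRIMARY KEY, partId INTEGER, "
    "customerId INTEGER, orderTime TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 7, 10, "2020-01-01 10:00"),
        (2, 7, 10, "2020-02-01 10:00"),  # latest order of part 7 for customer 10
        (3, 8, 10, "2020-03-01 10:00"),  # newer, but a different part
        (4, 7, 20, "2020-01-15 09:00"),  # latest order of part 7 for customer 20
        (5, 7, 30, "2020-01-20 09:00"),  # customer not in the requested list
    ],
)

customer_ids = [10, 20]
part_id = 7

# One "?" per requested customer id, so the values are bound rather than concatenated.
placeholders = ", ".join("?" for _ in customer_ids)
sql = f"""
    SELECT o.orderId, o.customerId, o.orderTime
    FROM orders o
    JOIN (
        SELECT customerId, MAX(orderTime) AS latestTime
        FROM orders
        WHERE customerId IN ({placeholders}) AND partId = ?
        GROUP BY customerId
    ) latest
      ON o.customerId = latest.customerId AND o.orderTime = latest.latestTime
    WHERE o.partId = ?
    ORDER BY o.customerId
"""
latest_orders = conn.execute(sql, customer_ids + [part_id, part_id]).fetchall()

print(latest_orders)
```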