Most rented movie by title PostgreSQL sample database - sql

I am working on getting the most popular movies rented per month by title. So far I have been able to get the titles of the movies and the dates they were rented but the count column and date column are giving individual results. Here is the query I am using. Any help would be much appreciated.
SELECT x.rental_date, x.title, x.count FROM(
SELECT ren.rental_date,fil.title,COUNT(ren.rental_id)
FROM rental AS ren
JOIN inventory AS inv ON ren.inventory_id = inv.inventory_id
JOIN film AS fil ON inv.film_id = fil.film_id
GROUP BY title, rental_date) AS x
ORDER BY x.count,x.rental_date;

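Grouping by the full rental_date timestamp puts almost every rental in its own group, which is why the counts come back as individual rows. Aggregate by the year and month of each date using to_char instead. The subquery is also not necessary.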
SELECT TO_CHAR(ren.rental_date, 'YYYY-MM') AS rental_month,
fil.title,
COUNT(ren.rental_id) AS rental_count
FROM rental AS ren
JOIN inventory AS inv
ON ren.inventory_id = inv.inventory_id
JOIN film AS fil
ON inv.film_id = fil.film_id
GROUP BY TO_CHAR(ren.rental_date, 'YYYY-MM'),
fil.title
ORDER BY rental_month,
rental_count DESC
Consider also date_part for year and month extraction or date_trunc to normalize dates to first day of month to keep the timestamp type:
SELECT DATE_PART('year', ren.rental_date) AS rental_year,
DATE_PART('month', ren.rental_date) AS rental_month,
...
SELECT DATE_TRUNC('month', ren.rental_date) AS rental_month,
...
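For anyone who wants to watch the grouping work, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The rows are invented sample data, not the real sample database, and SQLite's strftime('%Y-%m', ...) stands in for PostgreSQL's TO_CHAR(..., 'YYYY-MM'), so take it as an illustration rather than the exact PostgreSQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, rental_date TEXT, inventory_id INTEGER);

-- Invented sample rows (assumption): two films, three copies, six rentals.
INSERT INTO film VALUES (1, 'ALPHA'), (2, 'BETA');
INSERT INTO inventory VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO rental VALUES
  (1, '2005-05-24 22:53:30', 10),
  (2, '2005-05-25 10:00:00', 11),
  (3, '2005-05-26 09:15:00', 20),
  (4, '2005-06-01 12:00:00', 20),
  (5, '2005-06-02 08:30:00', 20),
  (6, '2005-06-03 18:45:00', 10);
""")

# Group by month and title, not by the full timestamp.
monthly_rentals = conn.execute("""
SELECT strftime('%Y-%m', ren.rental_date) AS rental_month,
       fil.title,
       COUNT(ren.rental_id) AS rental_count
FROM rental AS ren
JOIN inventory AS inv ON ren.inventory_id = inv.inventory_id
JOIN film AS fil ON inv.film_id = fil.film_id
GROUP BY rental_month, fil.title
ORDER BY rental_month, rental_count DESC
""").fetchall()

for row in monthly_rentals:
    print(row)
```

Within each month, the title with the highest count comes first, so the most rented movie of the month is the first row of each month's group.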

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 is sometimes repeated because some clients make several transactions at different times (a client can make a transaction in the first month and then another one the following year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
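As a concrete illustration of the two levels of aggregation, here is a small sketch using SQLite from Python. The table name t and column names follow the query above; the rows are invented, and strftime replaces year()/month() because SQLite has no such functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (customerid INTEGER, date TEXT);
-- Invented sample rows (assumption): customer 1 comes back in later months.
INSERT INTO t VALUES
  (1, '2019-01-05'),
  (2, '2019-01-20'),
  (1, '2019-02-11'),
  (3, '2019-02-14'),
  (1, '2020-01-03'),
  (4, '2020-01-09');
""")

# Inner query: first date per customer. Outer query: count firsts per year/month.
new_customers = conn.execute("""
SELECT strftime('%Y', min_date) AS yr,
       strftime('%m', min_date) AS mon,
       COUNT(*) AS num_firsts
FROM (SELECT customerid, MIN(date) AS min_date
      FROM t
      GROUP BY customerid) AS c
GROUP BY yr, mon
ORDER BY yr, mon
""").fetchall()

for row in new_customers:
    print(row)
```

Customer 1 appears three times in the data but is counted only once, in the month of the first transaction.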
You can assign a rank to each customer_id's transactions in date order, so that rank 1 marks the first order for that customer_id.
The ranking is done in an inline view, and the inline view is then queried to give you the month and the count of customer IDs for that month ONLY where rank = 1.
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate every ID that is completely new with the year and month of its first appearance, then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
The result will contain the new customer count, the year and the month.

SQLite - Use a CTE to divide a query

quick question for those SQL experts out there. I feel a bit stupid because I have the feeling I am close to reaching the solution but have not been able to do so.
If I have these two tables, how can I use the former one to divide a column of the second one?
WITH month_usage AS
(SELECT strftime('%m', starttime) AS month, SUM(slots) AS total
FROM Bookings
GROUP BY month)
SELECT strftime('%m', b.starttime) AS month, f.name, SUM(slots) AS usage
FROM Bookings as b
LEFT JOIN Facilities as f
ON b.facid = f.facid
GROUP BY name, month
ORDER BY month
The first one computes the total for each month
The second one is the one I want to divide the usage column by the total of each month to get the percentage
When I JOIN both tables using month as an id it messes up the content, any suggestion?
I want to divide the usage column by the total of each month to get the percentage
Just use window functions:
SELECT
strftime('%m', b.starttime) AS month,
f.name,
SUM(slots) AS usage,
1.0 * SUM(slots)
/ SUM(SUM(slots)) OVER(PARTITION BY strftime('%m', b.starttime)) ratio
FROM Bookings as b
LEFT JOIN Facilities as f
ON b.facid = f.facid
GROUP BY name, month
ORDER BY month
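Here is a minimal sketch of the window-function approach run through Python's sqlite3 module. The Bookings and Facilities rows are invented, and it assumes the bundled SQLite is 3.25 or newer, since older versions have no window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Facilities (facid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Bookings (facid INTEGER, starttime TEXT, slots INTEGER);
-- Invented sample rows (assumption).
INSERT INTO Facilities VALUES (0, 'Tennis Court'), (1, 'Pool Table');
INSERT INTO Bookings VALUES
  (0, '2012-07-03 11:00:00', 3),
  (1, '2012-07-04 09:00:00', 1),
  (0, '2012-08-01 15:00:00', 2),
  (1, '2012-08-02 10:00:00', 2),
  (1, '2012-08-05 17:00:00', 4);
""")

# The inner SUM is the per-group aggregate; the outer SUM ... OVER adds those
# group totals up across all facilities that share the same month.
usage_share = conn.execute("""
SELECT strftime('%m', b.starttime) AS month,
       f.name,
       SUM(b.slots) AS usage,
       1.0 * SUM(b.slots)
           / SUM(SUM(b.slots)) OVER (PARTITION BY strftime('%m', b.starttime)) AS ratio
FROM Bookings AS b
LEFT JOIN Facilities AS f ON b.facid = f.facid
GROUP BY f.name, month
ORDER BY month, f.name
""").fetchall()

for row in usage_share:
    print(row)
```

The ratios within each month add up to 1, and no join against a separate monthly-total CTE is needed.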

SQL: aggregation (group by like) in a column

I have a select that groups customers' spending over the past two months by customer id and date. What I need to do is associate with each row the total amount spent by that customer in the whole first week of the two-month period (of course it would be repeated on every row for one customer, but for some reason that's OK). Do you know how to do that without using a subquery as a column?
I was thinking using some combination of OVER PARTITION, but could not figure out how...
Thanks a lot in advance.
Raffaele
Query:
select customer_id, date, sum(sales)
from transaction_table
group by customer_id, date
If it's a specific first week (e.g. you always want the first week of the year, and your data set normally includes January and February spending), you could use sum(case...):
select distinct customer_id, date, sum(sales) over (partition by customer_ID, date)
, sum(case when date between '1/1/15' and '1/7/15' then Sales end)
over (partition by customer_id) as FirstWeekSales
from transaction_table
In response to the comments below; I'm not sure if this is what you're looking for, since it involves a subquery, but here's my best shot:
select distinct a.customer_id, date
, sum(sales) over (partition by a.customer_ID, date)
, sum(case when date between mindate and dateadd(DD, 7, mindate)
then Sales end)
over (partition by a.customer_id) as FirstWeekSales
from transaction_table a
left join
(select customer_ID, min(date) as mindate
from transaction_table group by customer_ID) b
on a.customer_ID = b.customer_ID
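A rough sketch of the same idea in SQLite, run from Python, in case it helps to see the output. The rows are invented, date(mindate, '+6 days') replaces the SQL Server dateadd call, and it assumes "first week" means the seven calendar days starting on the customer's first transaction date (hence +6 with an inclusive BETWEEN). Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_table (customer_id INTEGER, date TEXT, sales INTEGER);
-- Invented sample rows (assumption): customer 1 buys twice on the first day.
INSERT INTO transaction_table VALUES
  (1, '2015-01-02', 10),
  (1, '2015-01-02', 5),
  (1, '2015-01-05', 20),
  (1, '2015-01-20', 7),
  (2, '2015-01-10', 8),
  (2, '2015-02-01', 4);
""")

# Per-day total via one window, first-week total via a second window
# partitioned by customer only; DISTINCT collapses the duplicate rows.
first_week = conn.execute("""
SELECT DISTINCT a.customer_id,
       a.date,
       SUM(a.sales) OVER (PARTITION BY a.customer_id, a.date) AS day_sales,
       SUM(CASE WHEN a.date BETWEEN b.mindate AND date(b.mindate, '+6 days')
                THEN a.sales END)
           OVER (PARTITION BY a.customer_id) AS first_week_sales
FROM transaction_table AS a
JOIN (SELECT customer_id, MIN(date) AS mindate
      FROM transaction_table
      GROUP BY customer_id) AS b
  ON a.customer_id = b.customer_id
ORDER BY a.customer_id, a.date
""").fetchall()

for row in first_week:
    print(row)
```

Every row for a customer carries the same first_week_sales value, which is the repetition the question asks for.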

COUNT and GROUP BY over time

I have a need to create sales reports by day, week, month, etc. in PostgreSQL. I have the following tables setup:
tbl_products:
id INT
name VARCHAR
tbl_purchase_order:
id INT
order_timestamp TIMESTAMP
tbl_purchase_order_items:
id INT
product_id INT (FK to tbl_products.id)
order_id (FK to tbl_purchase_order.id)
I need to create a SQL query that returns the number of times a given product has been purchased within a given time frame. That is, I need to query the number of times a given product ID appears in a purchase order item in a specific month, day, year, etc. In an earlier question I learned how to use date_trunc() to truncate my TIMESTAMP column to the period of time I'm concerned about. Now I'm faced with how to perform the COUNT and GROUP BY properly.
I've tried several queries using various combinations of COUNT(XXX) and GROUP BY XXX but never seem to come up with what I'm expecting. Can someone give me guidance as to how to construct this query? I'm more of a Java developer, so I'm still getting up to speed on SQL queries. Thanks for any help you can provide.
Count per year:
SELECT oi.product_id,
extract(year from po.order_timestamp) as order_year,
count(*)
FROM tbl_purchase_order_items oi
JOIN tbl_purchase_order po ON po.id = oi.order_id
GROUP BY oi.product_id,
extract(year from po.order_timestamp)
Count per month:
SELECT oi.product_id,
extract(month from po.order_timestamp) as order_month,
extract(year from po.order_timestamp) as order_year,
count(*)
FROM tbl_purchase_order_items oi
JOIN tbl_purchase_order po ON po.id = oi.order_id
GROUP BY oi.product_id,
extract(year from po.order_timestamp),
extract(month from po.order_timestamp)
See the postgres datetime functions http://www.postgresql.org/docs/8.1/static/functions-datetime.html
I would suggest that you use the extract function to split the year, month and day into discrete columns in the result set, and then group by them as per your requirements.

Can I limit the amount of rows to be used for a group in a GROUP BY statement

I'm having an odd problem
I have a table with the columns product_id, sales and day
Not all products have sales every day. I'd like to get the average number of sales that each product had in the last 10 days where it had sales
Usually I'd get the average like this
SELECT product_id, AVG(sales)
FROM table
GROUP BY product_id
Is there a way to limit the amount of rows to be taken into consideration for each product?
I'm afraid it's not possible but I wanted to check if someone has an idea
Update to clarify:
Product may be sold on days 1,3,5,10,15,17,20.
Since I don't want the average over all days but only over the days where the product actually got sold, doing something like
SELECT product_id, AVG(sales)
FROM table
WHERE day > '01/01/2009'
GROUP BY product_id
won't work
If you want the last 10 calendar days up to each product's most recent sale:
SELECT t.product_id, AVG(t.sales)
FROM table t
JOIN (
SELECT product_id, MAX(sales_date) as max_sales_date
FROM table
GROUP BY product_id
) t_max ON t.product_id = t_max.product_id
AND DATEDIFF(day, t.sales_date, t_max.max_sales_date) < 10
GROUP BY t.product_id;
DATEDIFF is SQL Server specific; you'd have to replace it with your server's syntax for date difference functions.
To get the last 10 days when the product had any sale:
SELECT product_id, AVG(sales)
FROM (
SELECT product_id, sales, DENSE_RANK() OVER
(PARTITION BY product_id ORDER BY sales_date DESC) AS rn
FROM Table
) As t_rn
WHERE rn <= 10
GROUP BY product_id;
This assumes sales_date is a date, not a datetime. You'd have to extract the date part if the field is a datetime.
And finally, a version free of windowing functions:
SELECT product_id, AVG(sales)
FROM Table t
WHERE sales_date IN (
SELECT TOP(10) sales_date
FROM Table s
WHERE t.product_id = s.product_id
ORDER BY sales_date DESC)
GROUP BY product_id;
Again, sales_date is assumed to be a date, not a datetime. Use other limiting syntax if TOP is not supported by your server.
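For servers without TOP, the same correlated-subquery idea can be written with LIMIT. Below is a minimal sketch in SQLite via Python. The table name product_sales and its rows are made up for illustration, and DISTINCT is added in the subquery so that several rows on the same day still count as one day.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_sales (product_id INTEGER, sales INTEGER, sales_date TEXT)"
)

# Invented data (assumption): product 1 sells on 12 different days with
# sales 1..12; product 2 sells on only 3 days.
start = date(2009, 1, 1)
rows = [(1, n, (start + timedelta(days=2 * n)).isoformat()) for n in range(1, 13)]
rows += [(2, 5, "2009-01-03"), (2, 10, "2009-01-09"), (2, 15, "2009-02-20")]
conn.executemany("INSERT INTO product_sales VALUES (?, ?, ?)", rows)

# For each row, keep it only if its date is among that product's
# 10 most recent days with sales, then average per product.
avg_last_10 = conn.execute("""
SELECT t.product_id, AVG(t.sales) AS avg_sales
FROM product_sales AS t
WHERE t.sales_date IN (
    SELECT DISTINCT s.sales_date
    FROM product_sales AS s
    WHERE s.product_id = t.product_id
    ORDER BY s.sales_date DESC
    LIMIT 10
)
GROUP BY t.product_id
ORDER BY t.product_id
""").fetchall()

for row in avg_last_10:
    print(row)
```

Product 1 is averaged over its 10 most recent sale days only, while product 2, which has fewer than 10 sale days, is averaged over all of them.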
Give this a whirl. The sub-query selects the last ten days of a product where there was a sale, the outer query does the aggregation.
SELECT t1.product_id, SUM(t1.sales) / COUNT(*)
FROM table t1
WHERE t1.day IN (
SELECT TOP 10 t2.day
FROM table t2
WHERE (t2.product_ID=t1.Product_ID)
ORDER BY t2.day DESC
)
GROUP BY t1.product_id
BTW: This approach uses a correlated subquery, which may not be very performant, but it should work in theory.
I'm not sure if I understand correctly, but if you'd like to get the average sales for the last 10 days for your products, you can do as follows:
SELECT ProductId,Sum(Sales)/Count(*) FROM (SELECT ProductId,Sales FROM Table WHERE SaleDate>=#Date) t GROUP BY ProductId HAVING Count(*)>0
Or you can use the AVG aggregate function, which is easier:
SELECT ProductId,AVG(Sales) FROM (SELECT ProductId,Sales FROM Table WHERE SaleDate>=#Date) t GROUP BY ProductId
Updated
Now I see what you meant. As far as I know, it is not possible to do this in one query. It would be possible if we could do something like this (Northwind database):
select a.CustomerId,count(a.OrderId)
from Orders a INNER JOIN(SELECT CustomerId,OrderDate FROM Orders Order By OrderDate) AS b ON a.CustomerId=b.CustomerId GROUP BY a.CustomerId Having count(a.OrderId)<10
but you can't use ORDER BY in subqueries unless you use TOP, which is not suitable for this case. But maybe you can do it as follows:
SELECT ProductId,Sales INTO #temp FROM table Order By Day
select a.ProductId,Sum(a.Sales) /Count(a.Sales)
from table a INNER JOIN #temp AS b ON a.ProductId=b.ProductId GROUP BY a.ProductId Having count(a.Sales)<=10
If this is a table of sales transactions, then there should not be any rows in there for days on which there were no sales, i.e., if ProductId 21 had no sales on 1 June, then this table should not have any rows with productId = 21 and day = '1 June'. Therefore you should not have to filter anything out, because there should not be anything to filter out.
Select ProductId, Avg(Sales) AvgSales
From Table
Group By ProductId
should work fine. So if it's not, then you have not explained the problem completely or accurately.
Also, in your question, you show Avg(Sales) in the example SQL query but then in the text you mention "average number of sales that each product ... " Do you want the average sales amount, or the average count of sales transactions? And do you want this average by Product alone (i.e., one output value reported for each product) or do you want the average per product per day?
If you want the average per product alone, is it for just those sales in the ten days prior to now, or in the ten days prior to the date of the last sale for each product?
If the latter then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where day > (Select Max(Day) - 10
From Table
Where ProductId = T.ProductID)
Group By ProductId
If you want the average per product alone, for just those sales in the ten days with sales prior to the date of the last sale for each product, then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where (Select Count(Distinct day) From Table
Where ProductId = T.ProductID
And Day > T.Day) <= 10
Group By ProductId