COUNT and GROUP BY over time - sql

I need to create sales reports by day, week, month, etc. in PostgreSQL. I have the following tables set up:
tbl_products:
id INT
name VARCHAR
tbl_purchase_order:
id INT
order_timestamp TIMESTAMP
tbl_purchase_order_items:
id INT
product_id INT (FK to tbl_products.id)
order_id INT (FK to tbl_purchase_order.id)
I need to create a SQL query that returns the number of times a given product has been purchased within a given time frame. That is, I need to query the number of times a given product ID appears in a purchase order item in a specific month, day, year, etc. In an earlier question I learned how to use date_trunc() to truncate my TIMESTAMP column to the period of time I'm concerned about. Now I'm faced with how to perform the COUNT and GROUP BY properly.
I've tried several queries using various combinations of COUNT(XXX) and GROUP BY XXX but never seem to come up with what I'm expecting. Can someone give me guidance as to how to construct this query? I'm more of a Java developer, so I'm still getting up to speed on SQL queries. Thanks for any help you can provide.

Count per year:
SELECT oi.product_id,
extract(year from po.order_timestamp) as order_year,
count(*) as times_purchased
FROM tbl_purchase_order_items oi
JOIN tbl_purchase_order po ON po.id = oi.order_id
GROUP BY oi.product_id,
extract(year from po.order_timestamp)
Count per month:
SELECT oi.product_id,
extract(month from po.order_timestamp) as order_month,
extract(year from po.order_timestamp) as order_year,
count(*) as times_purchased
FROM tbl_purchase_order_items oi
JOIN tbl_purchase_order po ON po.id = oi.order_id
GROUP BY oi.product_id,
extract(year from po.order_timestamp),
extract(month from po.order_timestamp)

See the PostgreSQL date/time functions: http://www.postgresql.org/docs/8.1/static/functions-datetime.html
I would suggest that you use the extract function to split the year, month and day into discrete columns in the result set, and then group by those columns as your requirements dictate.
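As a runnable sketch of the per-product, per-month count, the snippet below uses Python's built-in sqlite3 module with an in-memory database so it can be tried without a PostgreSQL server. That choice is an assumption made for illustration: SQLite has no date_trunc() or extract(), so strftime('%Y-%m', ...) stands in for them, and the sample rows are invented.

```python
import sqlite3

# In-memory database with the three-table layout from the question
# (only the two tables the query needs are created here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_purchase_order (
        id INTEGER PRIMARY KEY,
        order_timestamp TEXT
    );
    CREATE TABLE tbl_purchase_order_items (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        order_id INTEGER REFERENCES tbl_purchase_order(id)
    );
    -- Made-up sample data: two orders in January, one in February.
    INSERT INTO tbl_purchase_order VALUES
        (1, '2009-01-05 10:00:00'),
        (2, '2009-01-20 12:30:00'),
        (3, '2009-02-03 09:15:00');
    INSERT INTO tbl_purchase_order_items VALUES
        (1, 100, 1), (2, 100, 2), (3, 200, 2), (4, 100, 3);
""")

# Every non-aggregated column in the SELECT list also appears in GROUP BY.
monthly_counts = conn.execute("""
    SELECT oi.product_id,
           strftime('%Y-%m', po.order_timestamp) AS order_month,
           count(*) AS times_purchased
    FROM tbl_purchase_order_items oi
    JOIN tbl_purchase_order po ON po.id = oi.order_id
    GROUP BY oi.product_id, strftime('%Y-%m', po.order_timestamp)
    ORDER BY order_month, oi.product_id
""").fetchall()

for row in monthly_counts:
    print(row)
```

In PostgreSQL the same shape should work with date_trunc('month', po.order_timestamp) in place of the strftime() call, in both the SELECT list and the GROUP BY.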

Related

Most rented movie by title PostgreSQL sample database

I am working on getting the most popular movies rented per month by title. So far I have been able to get the titles of the movies and the dates they were rented but the count column and date column are giving individual results. Here is the query I am using. Any help would be much appreciated.
SELECT x.rental_date, x.title, x.count FROM(
SELECT ren.rental_date,fil.title,COUNT(ren.rental_id)
FROM rental AS ren
JOIN inventory AS inv ON ren.inventory_id = inv.inventory_id
JOIN film AS fil ON inv.film_id = fil.film_id
GROUP BY title, rental_date) AS x
ORDER BY x.count,x.rental_date;
Simply aggregate by the year/month of the rental dates using to_char. Also, the subquery is not necessary.
SELECT TO_CHAR(ren.rental_date, 'YYYY-MM') AS rental_month,
fil.title,
COUNT(ren.rental_id) AS rental_count
FROM rental AS ren
JOIN inventory AS inv
ON ren.inventory_id = inv.inventory_id
JOIN film AS fil
ON inv.film_id = fil.film_id
GROUP BY TO_CHAR(ren.rental_date, 'YYYY-MM'),
fil.title
ORDER BY rental_month,
rental_count DESC
Consider also date_part for year and month extraction or date_trunc to normalize dates to first day of month to keep the timestamp type:
SELECT DATE_PART('year', ren.rental_date) AS rental_year,
DATE_PART('month', ren.rental_date) AS rental_month,
...
SELECT DATE_TRUNC('month', ren.rental_date) AS rental_month,
...
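The monthly counts can be taken one step further to pick the most rented title in each month, which is what the question asks for. The following is only a sketch under stated assumptions: it runs on Python's built-in sqlite3 module rather than PostgreSQL, uses strftime('%Y-%m', ...) where PostgreSQL would use to_char or date_trunc, and the three tables hold a handful of invented rows with only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE rental (
        rental_id INTEGER PRIMARY KEY,
        rental_date TEXT,
        inventory_id INTEGER
    );
    -- Made-up sample data, not the real sample database contents.
    INSERT INTO film VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO inventory VALUES (10, 1), (11, 2);
    INSERT INTO rental VALUES
        (1, '2005-05-24 22:53:30', 10),
        (2, '2005-05-25 08:10:00', 10),
        (3, '2005-05-26 14:00:00', 11),
        (4, '2005-06-02 09:30:00', 11);
""")

# Step 1 (CTE): rentals per title per month.
# Step 2: keep only the rows whose count equals the month's maximum.
# Titles tied for first place in a month are all returned.
top_titles = conn.execute("""
    WITH monthly AS (
        SELECT strftime('%Y-%m', ren.rental_date) AS rental_month,
               fil.title AS title,
               COUNT(ren.rental_id) AS rental_count
        FROM rental AS ren
        JOIN inventory AS inv ON ren.inventory_id = inv.inventory_id
        JOIN film AS fil ON inv.film_id = fil.film_id
        GROUP BY strftime('%Y-%m', ren.rental_date), fil.title
    )
    SELECT m.rental_month, m.title, m.rental_count
    FROM monthly AS m
    JOIN (
        SELECT rental_month, MAX(rental_count) AS top_count
        FROM monthly
        GROUP BY rental_month
    ) AS best
      ON best.rental_month = m.rental_month
     AND best.top_count = m.rental_count
    ORDER BY m.rental_month, m.title
""").fetchall()

for row in top_titles:
    print(row)
```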

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 is sometimes repeated, because some clients make several transactions at different times (a client can make a transaction in the first month and then another in the next year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
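The two-level aggregation can be tried end to end with the sketch below. It assumes Python's built-in sqlite3 module, a hypothetical table t with columns customerid and txn_date (renamed from date for clarity), and invented rows; strftime('%Y-%m', ...) replaces the year()/month() calls, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (customerid INTEGER, txn_date TEXT);
    -- Made-up data: customers 1 and 2 return later, customer 3 appears once.
    INSERT INTO t VALUES
        (1, '2019-01-10'),
        (1, '2019-03-02'),
        (2, '2019-01-15'),
        (3, '2019-02-01'),
        (2, '2020-01-05');
""")

# Inner query: first transaction date per customer.
# Outer query: count those first dates per year-month.
new_per_month = conn.execute("""
    SELECT strftime('%Y-%m', min_date) AS first_month,
           count(*) AS num_firsts
    FROM (
        SELECT customerid, min(txn_date) AS min_date
        FROM t
        GROUP BY customerid
    ) AS c
    GROUP BY strftime('%Y-%m', min_date)
    ORDER BY first_month
""").fetchall()

for row in new_per_month:
    print(row)
```

Because each customer contributes exactly one row to the inner result, repeat transactions in later months never inflate the counts.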
You can do the following, which assigns a rank to each transaction within its customer_id (rank 1 therefore means that it is the first order for that customer_id).
The ranking is done in an inline view, and the inline view is then queried to give, for each month, the count of customer IDs ONLY where their rank = 1.
I have tested this on Oracle and it works as expected.
SELECT
EXTRACT(YEAR FROM date_of_transaction) AS year,
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id) AS new_customers
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(YEAR FROM date_of_transaction),
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(YEAR FROM date_of_transaction) ASC,
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate every ID with the year and month in which it is completely new, then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
Your results will contain the new customer count, the year, and the month.

SQL - Aggregate dates from different columns into Month/Year table

So I have an 'Orders' table that lists the 'Ordered' and 'Shipped' dates for each order.
These are custom products and it takes 1 week to fill orders.
This is pretty representative of the table I have:
I want to aggregate this into a table so that I can see how many orders were ordered and shipped for each month during the date range specified when the report is run, and I want the Months and years to automatically populate without me having to hardcode for each month and year:
What's the best way to do this with SQL?
I eventually want to place the aggregated table into an SSRS report so that you can expand/collapse each year, if needed.
Date/time functions are notoriously database dependent. Here is a typical approach, though:
select yyyy, mm, sum(num_ordered), sum(num_shipped)
from ((select year(ordered) as yyyy, month(ordered) as mm, count(*) as num_ordered, 0 as num_shipped
from orders
group by year(ordered), month(ordered)
) union all
(select year(shipped) as yyyy, month(shipped) as mm, 0 as num_ordered, count(*) as num_shipped
from orders
group by year(shipped), month(shipped)
)
) ym
group by yyyy, mm;
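The UNION ALL approach can be checked on a small data set with the sketch below. It is an illustration only, assuming Python's built-in sqlite3 module, an orders table with hypothetical id, ordered and shipped columns, and invented dates. SQLite lacks year()/month(), so a single strftime('%Y-%m', ...) key is used instead, and the two branches are written without surrounding parentheses because SQLite does not accept them in compound selects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, ordered TEXT, shipped TEXT);
    -- Made-up data: each order ships about a week after it is placed.
    INSERT INTO orders VALUES
        (1, '2015-01-28', '2015-02-04'),
        (2, '2015-02-10', '2015-02-17'),
        (3, '2015-02-26', '2015-03-05');
""")

# Each branch counts one kind of event per month and pads the other
# column with 0; the outer query adds the two branches together.
per_month = conn.execute("""
    SELECT ym,
           SUM(num_ordered) AS num_ordered,
           SUM(num_shipped) AS num_shipped
    FROM (
        SELECT strftime('%Y-%m', ordered) AS ym,
               COUNT(*) AS num_ordered,
               0 AS num_shipped
        FROM orders
        GROUP BY strftime('%Y-%m', ordered)
        UNION ALL
        SELECT strftime('%Y-%m', shipped) AS ym,
               0 AS num_ordered,
               COUNT(*) AS num_shipped
        FROM orders
        GROUP BY strftime('%Y-%m', shipped)
    ) AS per_event
    GROUP BY ym
    ORDER BY ym
""").fetchall()

for row in per_month:
    print(row)
```

Months appear automatically because they come from the data itself; a month with neither an order nor a shipment would not be listed.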

Multiple Counts Over Multiple Dates

I am essentially doing the following query (edited):
Select count(orders)
From Orders_Table
Where Order_Open_Date<=##/##/####
and Order_Close_Date>=##/##/####
Here ##/##/#### is the same date in both conditions, so in essence this gives the number of 'open' orders for a given day. However, I want this same count for every single day of a year and don't want to write a separate query for each day. I'm sorry, this is probably really simple, but I am new to SQL and don't know how to search for an answer to this question, since my searches have come up with nothing. Thanks for any help you can offer.
Why not:
select Order_Date, count(orders) from Orders_Table group by Order_Date
And for the last year:
select Order_Date, count(orders) from Orders_Table where Order_Date > DATE_SUB(CURDATE(), INTERVAL 1 YEAR) group by Order_Date;
SELECT CONVERT(VARCHAR, Order_Date, 110), count(orders)
FROM Orders_Table
WHERE Order_Date BETWEEN #A AND #B
GROUP BY CONVERT(VARCHAR, Order_Date, 110)
If you want to have every day of the year, including those with no orders, you will need to generate a temporary table or similar containing every date in the range and left/right join it to the Orders_Table data. This depends upon which RDBMS you're using. In SQL Server I have done this using a user defined function which returns a table variable.
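One way to sketch the "table of every date, then left join" idea is shown below. It assumes Python's built-in sqlite3 module, where a recursive common table expression generates the calendar (PostgreSQL offers generate_series for the same purpose, and other systems may need a real calendar table). The Orders_Table layout with an id column, the five-day range, and the rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders_Table (
        id INTEGER PRIMARY KEY,
        Order_Open_Date TEXT,
        Order_Close_Date TEXT
    );
    -- Made-up data: no order is open on 2020-01-04.
    INSERT INTO Orders_Table VALUES
        (1, '2020-01-01', '2020-01-03'),
        (2, '2020-01-02', '2020-01-02'),
        (3, '2020-01-05', '2020-01-08');
""")

# calendar: one row per day from the start date up to the end date.
# LEFT JOIN keeps days with no open orders; COUNT(o.id) is 0 for them
# because it ignores the NULLs produced by the outer join.
open_per_day = conn.execute("""
    WITH RECURSIVE calendar(day) AS (
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(day, '+1 day') FROM calendar WHERE day < '2020-01-05'
    )
    SELECT c.day, COUNT(o.id) AS open_orders
    FROM calendar AS c
    LEFT JOIN Orders_Table AS o
           ON o.Order_Open_Date <= c.day
          AND o.Order_Close_Date >= c.day
    GROUP BY c.day
    ORDER BY c.day
""").fetchall()

for row in open_per_day:
    print(row)
```

Widening the two literal dates to a full year gives one row per day of that year with a single query.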

Can I limit the amount of rows to be used for a group in a GROUP BY statement

I'm having an odd problem.
I have a table with the columns product_id, sales and day.
Not all products have sales every day. I'd like to get the average number of sales that each product had in the last 10 days on which it had sales.
Usually I'd get the average like this:
SELECT product_id, AVG(sales)
FROM table
GROUP BY product_id
Is there a way to limit the number of rows taken into consideration for each product?
I'm afraid it's not possible, but I wanted to check if someone has an idea.
Update to clarify:
A product may be sold on days 1, 3, 5, 10, 15, 17, 20.
Since I don't want the average over all days, but only the average over the days on which the product was actually sold, doing something like
SELECT product_id, AVG(sales)
FROM table
WHERE day > '01/01/2009'
GROUP BY product_id
won't work
If you want the last 10 calendar days up to each product's most recent sale:
SELECT t.product_id, AVG(t.sales)
FROM table t
JOIN (
SELECT product_id, MAX(sales_date) as max_sales_date
FROM table
GROUP BY product_id
) t_max ON t.product_id = t_max.product_id
AND DATEDIFF(day, t.sales_date, t_max.max_sales_date) < 10
GROUP BY t.product_id;
The date difference function is SQL Server specific; you'd have to replace it with your server's syntax for date differences.
To get the last 10 days when the product had any sale:
SELECT product_id, AVG(sales)
FROM (
SELECT product_id, sales, DENSE_RANK() OVER
(PARTITION BY product_id ORDER BY sales_date DESC) AS rn
FROM Table
) As t_rn
WHERE rn <= 10
GROUP BY product_id;
This assumes sales_date is a date, not a datetime. You'd have to extract the date part if the field is a datetime.
And finally, a version without window functions:
SELECT product_id, AVG(sales)
FROM Table t
WHERE sales_date IN (
SELECT TOP(10) sales_date
FROM Table s
WHERE t.product_id = s.product_id
ORDER BY sales_date DESC)
GROUP BY product_id;
Again, sales_date is assumed to be a date, not a datetime. Use other limiting syntax if TOP is not supported by your server.
Give this a whirl. The subquery selects the last ten days on which a product had a sale; the outer query does the aggregation.
SELECT t1.product_id, SUM(t1.sales) / COUNT(*)
FROM table t1
WHERE t1.day IN (
SELECT TOP 10 t2.day
FROM table t2
WHERE t2.product_id = t1.product_id
ORDER BY t2.day DESC
)
GROUP BY t1.product_id
BTW: This approach uses a correlated subquery, which may not be very performant, but it should work in theory.
I'm not sure I understand correctly, but if you'd like the average sales over the last 10 days for your products, you can do the following:
SELECT Product_Id, Sum(Sales)/Count(*) FROM (SELECT Product_Id, Sales FROM Table WHERE SaleDate >= #Date) t GROUP BY Product_Id HAVING Count(*) > 0
Or you can use the AVG aggregate function, which is easier:
SELECT Product_Id, AVG(Sales) FROM (SELECT Product_Id, Sales FROM Table WHERE SaleDate >= #Date) t GROUP BY Product_Id
Updated
Now I see what you meant. As far as I know, it is not possible to do this in one query. It could be possible if we could do something like this (Northwind database):
select a.CustomerId,count(a.OrderId)
from Orders a INNER JOIN(SELECT CustomerId,OrderDate FROM Orders Order By OrderDate) AS b ON a.CustomerId=b.CustomerId GROUP BY a.CustomerId Having count(a.OrderId)<10
But you can't use ORDER BY in subqueries unless you use TOP, which is not suitable for this case. Maybe you can do it as follows:
SELECT ProductId, Sales INTO #temp FROM table ORDER BY Day
select a.ProductId,Sum(a.Sales) /Count(a.Sales)
from table a INNER JOIN #temp AS b ON a.ProductId=b.ProductId GROUP BY a.ProductId Having count(a.Sales)<=10
If this is a table of sales transactions, then there should not be any rows in it for days on which there were no sales. I.e., if ProductId 21 had no sales on 1 June, then this table should not have any rows with productId = 21 and day = '1 June'. Therefore you should not have to filter anything out; there should not be anything to filter out.
Select ProductId, Avg(Sales) AvgSales
From Table
Group By ProductId
should work fine. If it doesn't, then you have not explained the problem completely or accurately.
Also, in your question, you show Avg(Sales) in the example SQL query, but then in the text you mention "average number of sales that each product ... ". Do you want the average sales amount, or the average count of sales transactions? And do you want this average by product alone (i.e., one output value reported for each product), or do you want the average per product per day?
If you want the average per product alone, is it for just those sales in the ten days prior to now, or in the ten days prior to the date of the last sale for each product?
If the latter, then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where day > (Select Max(Day) - 10
From Table
Where ProductId = T.ProductID)
Group By ProductId
If you want the average per product alone, for just those sales in the ten days with sales prior to the date of the last sale for each product, then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where (Select Count(Distinct day) From Table
Where ProductId = T.ProductID
And Day > T.Day) < 10
Group By ProductId
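The count-of-later-days filter needs neither TOP nor window functions, so it is easy to try on a small data set. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module, a hypothetical product_sales table with the question's product_id, day and sales columns, invented rows, and a window of 3 sale days instead of 10 to keep the data short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_sales (product_id INTEGER, day TEXT, sales INTEGER);
    -- Made-up data: product 1 sold on four days, product 2 on two days.
    INSERT INTO product_sales VALUES
        (1, '2009-01-01', 10),
        (1, '2009-01-03', 20),
        (1, '2009-01-05', 30),
        (1, '2009-01-10', 40),
        (2, '2009-01-02', 5),
        (2, '2009-01-04', 15);
""")

LAST_N_SALE_DAYS = 3  # the question uses 10; 3 keeps the sample small

# A row is kept when fewer than N distinct later sale days exist for the
# same product, i.e. when its day is among that product's N most recent.
avg_recent = conn.execute("""
    SELECT t.product_id, AVG(t.sales) AS avg_sales
    FROM product_sales AS t
    WHERE (
        SELECT COUNT(DISTINCT s.day)
        FROM product_sales AS s
        WHERE s.product_id = t.product_id
          AND s.day > t.day
    ) < ?
    GROUP BY t.product_id
    ORDER BY t.product_id
""", (LAST_N_SALE_DAYS,)).fetchall()

for row in avg_recent:
    print(row)
```

Product 1's oldest sale day is excluded because three later sale days exist for it, while product 2 has fewer than three sale days in total, so all of its rows count.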