SQL - Aggregate dates from different columns into Month/Year table

So I have an 'Orders' table that lists the 'Ordered' and 'Shipped' dates for each order.
These are custom products and it takes 1 week to fill orders.
This is pretty representative of the table I have:
I want to aggregate this into a table showing how many orders were ordered and how many were shipped in each month of the date range specified when the report is run. The months and years should populate automatically, without my having to hardcode each month and year:
What's the best way to do this with SQL?
I eventually want to place the aggregated table into an SSRS report so that you can expand/collapse each year, if needed.

Date/time functions are notoriously database dependent. Here is a typical approach, though:
select yyyy, mm, sum(num_ordered) as num_ordered, sum(num_shipped) as num_shipped
from ((select year(ordered) as yyyy, month(ordered) as mm, count(*) as num_ordered, 0 as num_shipped
       from orders
       group by year(ordered), month(ordered)
      ) union all
      (select year(shipped) as yyyy, month(shipped) as mm, 0 as num_ordered, count(*) as num_shipped
       from orders
       group by year(shipped), month(shipped)
      )
     ) ym
group by yyyy, mm
order by yyyy, mm;
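As a minimal sketch of the union-all approach above, the following runs the same query shape against an in-memory SQLite database from Python. The sample rows and the table layout are invented for illustration, and SQLite's strftime() stands in for year() and month(), which SQLite does not provide.

```python
import sqlite3

# Hypothetical sample data; the real Orders table will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, ordered TEXT, shipped TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2015-01-05", "2015-01-12"),
        (2, "2015-01-28", "2015-02-04"),
        (3, "2015-02-10", "2015-02-17"),
        (4, "2015-12-29", "2016-01-05"),
    ],
)

# One branch counts by ordered month, the other by shipped month;
# the outer query adds the two counts together per year and month.
query = """
SELECT yyyy, mm, SUM(num_ordered) AS num_ordered, SUM(num_shipped) AS num_shipped
FROM (
    SELECT strftime('%Y', ordered) AS yyyy, strftime('%m', ordered) AS mm,
           COUNT(*) AS num_ordered, 0 AS num_shipped
    FROM orders
    GROUP BY strftime('%Y', ordered), strftime('%m', ordered)
    UNION ALL
    SELECT strftime('%Y', shipped) AS yyyy, strftime('%m', shipped) AS mm,
           0 AS num_ordered, COUNT(*) AS num_shipped
    FROM orders
    GROUP BY strftime('%Y', shipped), strftime('%m', shipped)
) ym
GROUP BY yyyy, mm
ORDER BY yyyy, mm
"""
monthly_counts = conn.execute(query).fetchall()
for row in monthly_counts:
    print(row)
```

A month appears in the result if at least one order was ordered or shipped in it, so a month with shipments but no new orders still gets a row.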

Related

SQL Pivot table, with multiple pivots on criteria

Here is my dataset,
It has a reservation (unique ID), a reservation_dt, a fiscal year (all the same year for the most part), the month both as a number and as a name, a reservation status, the total number reserved, and a counter (basically 1 for each reservation row).
These are my guidelines (they need to be turned into columns by month):
Requested (count of all distinct reservations)
Num_Requested (sum of total_number_requested by month)
Booked (count of all distinct reservations where status is order created)
Num_Booked (sum of total_number_requested by month where status is order created)
Not_Booked (count of all distinct reservations where status is unfulfilled)
Not_Num_Booked (sum of total_number_requested by month where status is unfulfilled)
I am looking to translate this into a pivot table. This is what I've got so far, and I can't figure out why it's not working.
I figured I would turn each of the above guidelines into a column, using either sum(total_number_requested) or count(total_requested) where the reservation status is a given value.
I'm open to any other ideas of how to make this simpler and make it work.
SELECT [month_name],
fyear AS fyear,
Requested,
Num_Requested
FROM (SELECT reservation,
reservation_status,
total_number_requested,
fyear,
[month_name],
[month],
total_requested
FROM #temp2) SourceTable
PIVOT (SUM(total_number_requested)
FOR reservation_status IN ([Requested])) PivotNumbRequested PIVOT(COUNT(reservation)
FOR total_requested IN ([Num_Requested])) PivotCountRequested
WHERE [month] = 7
ORDER BY fyear,
[month];
Use conditional expressions to emulate a pivot. Example:
SELECT fyear, [month], [month_name], Count(*) AS CountAll, Sum(total_number_requested) AS TotNum,
Sum(IIf(reservation_status = 'Order Created', total_number_requested, Null)) AS SumCreated
FROM tablename
GROUP BY fyear, [month], [month_name]
More info:
SQLServer - Multiple PIVOT on same columns
Crosstab Query on multiple data points
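To illustrate how conditional aggregation could cover all six guideline columns, here is a rough sketch run against in-memory SQLite from Python. It uses CASE expressions rather than IIf, since CASE is portable across databases. The table name reservations, the sample rows, and the status literal 'Unfulfilled' are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    "reservation INTEGER, fyear INTEGER, month INTEGER, month_name TEXT, "
    "reservation_status TEXT, total_number_requested INTEGER)"
)
# Hypothetical rows; 'Unfulfilled' is an assumed spelling of that status.
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?, ?, ?, ?)",
    [
        (101, 2017, 7, "July", "Order Created", 3),
        (102, 2017, 7, "July", "Unfulfilled", 2),
        (103, 2017, 7, "July", "Order Created", 5),
        (104, 2017, 8, "August", "Unfulfilled", 4),
    ],
)

# Each CASE returns a value only for rows with the wanted status, so the
# surrounding COUNT(DISTINCT ...) or SUM(...) sees just those rows.
query = """
SELECT fyear, month, month_name,
       COUNT(DISTINCT reservation) AS Requested,
       SUM(total_number_requested) AS Num_Requested,
       COUNT(DISTINCT CASE WHEN reservation_status = 'Order Created'
                           THEN reservation END) AS Booked,
       SUM(CASE WHEN reservation_status = 'Order Created'
                THEN total_number_requested ELSE 0 END) AS Num_Booked,
       COUNT(DISTINCT CASE WHEN reservation_status = 'Unfulfilled'
                           THEN reservation END) AS Not_Booked,
       SUM(CASE WHEN reservation_status = 'Unfulfilled'
                THEN total_number_requested ELSE 0 END) AS Not_Num_Booked
FROM reservations
GROUP BY fyear, month, month_name
ORDER BY fyear, month
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Because every measure is its own expression in a single GROUP BY, there is no need to chain several PIVOT clauses.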

Find the maximum average over a specific period, two tables

My task sounds like this: "Select sales territory (name) with sales in May 2013 higher than the average monthly sales per sales territory (Use SalesTerritory, SalesHeader tables)." As I understand it, logically, I need to find what territory was the maximum average for May 2013, while I need to link two tables (the "name" field in the "salesterritory" table, the rest of the data in the second, but the "name" must be present).
I tried to divide the task into parts, and find at least a territory by id without a name, here is my code:
SELECT TerritoryID, MAX(avga.sal)
from (select YEAR(OrderDate) AS 'Year', MONTH(OrderDate) AS 'Month', TerritoryID, AVG(TotalDue) AS 'sal'
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate), MONTH(OrderDate), TerritoryID
having YEAR(OrderDate)=2013) as avga
group by TerritoryID
This result does not appear to be correct even at this stage. How can I do this correctly, at least without the second table?
Can you try these steps:
Separate this query into small queries that each collect part of the data you want and make sense to you. For example: a query to select the sales territory (name) with sales in May 2013; another query that returns the average monthly sales by sales territory; and so on. This will help you understand the parts of the main query that you will create.
You can now try this in one query. Perhaps common table expressions are an easier approach. Here are some examples: CTE
I believe you need the average per territory in May 2013, but also the average across all territories for the same month. Note the use of OVER() in the query below. This clause enables the calculation of an average across multiple rows, which is ideal in this situation because we need to return only those territories whose figures are higher than the overall average.
select
      yyyy
    , mm
    , TerritoryID
    , territory_av
    , month_av
from (
    SELECT
          yyyy
        , mm
        , TerritoryID
        , territory_av
        , AVG(territory_av) OVER() AS month_av
    FROM (
        SELECT
              YEAR(OrderDate) AS yyyy
            , MONTH(OrderDate) AS mm
            , TerritoryID
            , AVG(TotalDue) AS territory_av
        FROM Sales.SalesOrderHeader
        WHERE YEAR(OrderDate) = 2013
          AND MONTH(OrderDate) = 5
        GROUP BY
              YEAR(OrderDate)
            , MONTH(OrderDate)
            , TerritoryID
    ) AS derive1
) AS derive2
WHERE territory_av > month_av
;
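The OVER() idea can be tried on a small scale with the sketch below, which runs against in-memory SQLite from Python. It assumes SQLite 3.25 or later (the first version with window functions); the table sales_order_header and its rows are hypothetical stand-ins for Sales.SalesOrderHeader.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_order_header ("
    "territory_id INTEGER, order_date TEXT, total_due REAL)"
)
# Hypothetical orders; the June row must not affect the May figures.
conn.executemany(
    "INSERT INTO sales_order_header VALUES (?, ?, ?)",
    [
        (1, "2013-05-03", 400.0),
        (1, "2013-05-20", 600.0),
        (2, "2013-05-11", 100.0),
        (3, "2013-05-15", 450.0),
        (2, "2013-06-02", 9999.0),
    ],
)

# Inner query: one average per territory for May 2013.
# Middle query: AVG(...) OVER () repeats the average of those
# per-territory averages on every row.
# Outer query: keep territories above that overall figure.
query = """
SELECT territory_id, territory_av, month_av
FROM (
    SELECT territory_id, territory_av,
           AVG(territory_av) OVER () AS month_av
    FROM (
        SELECT territory_id, AVG(total_due) AS territory_av
        FROM sales_order_header
        WHERE order_date >= '2013-05-01'
          AND order_date <  '2013-06-01'
        GROUP BY territory_id
    ) AS per_territory
) AS with_overall
WHERE territory_av > month_av
ORDER BY territory_id
"""
above_average = conn.execute(query).fetchall()
for row in above_average:
    print(row)
```

The window function cannot be referenced in the WHERE clause of the same query level, which is why the comparison sits one level further out.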
Don't use having as an alternative for where. Use where to filter table data which reduces the data processed by group by. Use having to filter aggregated values which happens after group by.
Regarding filtering for May 2013, it is more efficient to NOT use functions on data to assist filtering in a where clause. A more generic way to select a date range (that does not require changing data via functions) is like this:
WHERE OrderDate >= '2013-05-01'
AND OrderDate < '2013-06-01'
Syntax for dates differs among databases; you might need to convert the date literals into a date (or timestamp):
WHERE OrderDate >= to_date('2013-05-01','yyyy-mm-dd')
AND OrderDate < to_date('2013-06-01','yyyy-mm-dd')
or, in SQL Server you could use this:
WHERE OrderDate >= '20130501'
AND OrderDate < '20130601'
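If the month is chosen at run time, the two bounds of the half-open range can be computed instead of typed in. The sketch below is one way to do that from Python against in-memory SQLite; month_bounds is a hypothetical helper, and the table and rows are invented so that the boundary timestamps are easy to check.

```python
import sqlite3
from datetime import date
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[str, str]:
    """Return ISO dates for the first day of the month and of the next month."""
    start = date(year, month, 1)
    # After December, roll over to January of the following year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_order_header (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO sales_order_header VALUES (?, ?)",
    [
        (1, "2013-04-30 23:59:59"),
        (2, "2013-05-01 00:00:00"),
        (3, "2013-05-31 23:59:59"),
        (4, "2013-06-01 00:00:00"),
    ],
)

start, end = month_bounds(2013, 5)
# Inclusive lower bound, exclusive upper bound: the last second of May is
# kept and the first instant of June is left out.
may_order_ids = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM sales_order_header "
        "WHERE order_date >= ? AND order_date < ? ORDER BY order_id",
        (start, end),
    )
]
print(may_order_ids)
```

The column itself is never wrapped in a function, so the filter stays in the form described above regardless of which month is requested.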

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 is sometimes repeated because some clients make several transactions at different times (a client can make a transaction in the first month and then another in the following year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
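Here is a small runnable sketch of the two-level aggregation, using in-memory SQLite from Python. The table transactions, the column txn_date, and the rows are made up for the example, and strftime() replaces year() and month() because SQLite lacks them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customerid TEXT, txn_date TEXT)")
# Hypothetical transactions; customers A and B come back in later months.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("A", "2019-01-10"),
        ("B", "2019-01-20"),
        ("A", "2019-02-05"),
        ("C", "2019-02-15"),
        ("B", "2020-01-03"),
        ("D", "2020-01-09"),
    ],
)

# Inner level: first transaction date per customer.
# Outer level: count those first dates per year and month.
query = """
SELECT strftime('%Y', min_date) AS yyyy,
       strftime('%m', min_date) AS mm,
       COUNT(*) AS num_firsts
FROM (
    SELECT customerid, MIN(txn_date) AS min_date
    FROM transactions
    GROUP BY customerid
) c
GROUP BY strftime('%Y', min_date), strftime('%m', min_date)
ORDER BY yyyy, mm
"""
new_customers = conn.execute(query).fetchall()
for row in new_customers:
    print(row)
```

Customer B's January 2020 transaction is not counted as new because B was first seen in January 2019.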
You can do the following, which assigns a rank to each transaction within its customer_id (rank 1 therefore means that it is the first order for that customer_id).
The ranking is done in an inline view, and the inline view is then queried to give you the month and the count of customer ids for that month ONLY where rank = 1.
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate every ID with the year and month in which it is completely new, then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
Your results will contain the new customer count, the year, and the month.

group by year month in postgresql

customer  Date       location
1         25Jan2018  texas
2         15Jan2018  texas
3         12Feb2018  Boston
4         19Mar2017  Boston
I am trying to find the count of customers grouped by the year and month of the Date column. The Date column is of text data type.
E.g., in Jan2018 the count is 2.
I would do something like the following:
SELECT
date_part('year', formattedDate) as Year
,date_part('month', formattedDate) as Month
,count(*) as CustomerCountByYearMonth
FROM
(SELECT to_date(Date,'DDMonYYYY') as formattedDate from <table>) as tbl1
GROUP BY
date_part('year', formattedDate)
,date_part('month', formattedDate)
Any additional formatting for dates could be done on the inner query that will allow for adjustments in case some single digit days need to be padded or a month has four letters instead of three etc.
By converting to date type, you can properly order by date type and not alphabetical etc.
Optionally:
SELECT
Year
,Month
,count(*) as CustomerCountByYearMonth
FROM
(SELECT
date_part('year', to_date(Date,'DDMonYYYY')) as Year
,date_part('month', to_date(Date,'DDMonYYYY')) as Month
FROM <table>) as tbl1
GROUP BY
Year
,Month
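The convert-then-group idea can also be checked outside the database. The sketch below is plain Python rather than PostgreSQL: datetime.strptime plays the role of to_date, using the four sample rows from the question. It assumes English month abbreviations, which is what %b matches under Python's default locale.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the question: (customer, Date as text, location).
rows = [
    (1, "25Jan2018", "texas"),
    (2, "15Jan2018", "texas"),
    (3, "12Feb2018", "Boston"),
    (4, "19Mar2017", "Boston"),
]


def year_month(text_date):
    """Parse a 'DDMonYYYY' string and return (year, month) as integers."""
    parsed = datetime.strptime(text_date, "%d%b%Y")
    return parsed.year, parsed.month


# Count customers per (year, month); sorting the tuples gives
# chronological order, which the raw text values would not.
counts = Counter(year_month(text_date) for _, text_date, _ in rows)
customers_by_month = sorted(counts.items())
for (year, month), n in customers_by_month:
    print(year, month, n)
```

Sorting on the parsed (year, month) pair puts March 2017 ahead of January 2018, which is the ordering benefit of converting to a real date type.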
You shouldn't store dates in a text column...
select substring(Date, length(Date)-6), count(*)
from tablename
group by substring(Date, length(Date)-6)
I thought @Jarlh asked a good question -- what about dates like January 1? Is it 01Jan2019 or 1Jan2019? If it can be either, perhaps a regex would work.
select
substring (date from '\d+(\D{3}\d{4})') as month,
count (distinct customer)
from t
group by month
The 'distinct customer' also presupposes you may have the same customer listed in the same month, but you only want to count it once. If that's not the case, just remove 'distinct.'
And, if you wanted the output in date format:
select
to_date (substring (date from '\d+(\D{3}\d{4})'), 'monyyyy') as month,
count (distinct customer)
from t
group by month
If it is a date column, you can truncate the date:
select date_trunc('month', date) as yyyymm, count(*)
from t
group by yyyymm
order by yyyymm;
I had read the type as date. For a string, just use string functions:
select substr(date, 3, 7) as mmmyyyy, count(*)
from t
group by mmmyyyy;
Unfortunately, ordering doesn't work in this case. You should really be storing dates using the proper type.

Calculate the sum of a column on weekly basis in hive

I have a table, say testTable, in Hive (with data for 3 years) with the following columns:
retailers, order_total, order_total_qty, order_date
I have to create a new table with these columns:
'source_name' as source, sum(retailers), sum(order_total), sum(order_total_qty)
for each week from the starting order_date.
I am stuck with this. How can I group the following data so that it is summed on a weekly basis?
Use the WEEKOFYEAR() function to aggregate on a weekly basis.
select
'source_name' source,
sum(retailers) sum_retailers,
sum(order_total) sum_order_total,
sum(order_total_qty) sum_order_total_qty,
WEEKOFYEAR(order_date) week,
year(order_date) year
from testTable
where order_date >= '2015-01-01' --start_date
group by WEEKOFYEAR(order_date), year(order_date)
order by year, week; --order if necessary
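For checking weekly totals on a small extract, a rough Python equivalent of the grouping is sketched below. It is not Hive code: date.isocalendar() is used as a stand-in for WEEKOFYEAR(), and the sample rows are invented. Note one deliberate difference: isocalendar() returns an ISO year together with the ISO week, so the key here is (ISO year, ISO week) rather than (calendar year, week). It is worth verifying how your Hive output behaves for dates around a year boundary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (order_date, retailers, order_total, order_total_qty).
rows = [
    ("2015-01-01", 2, 100.0, 10),
    ("2015-01-04", 1, 50.0, 5),
    ("2015-01-05", 3, 70.0, 7),
    ("2015-01-11", 1, 30.0, 3),
]

# Accumulate the three sums per (ISO year, ISO week).
totals = defaultdict(lambda: [0, 0.0, 0])
for order_date, retailers, order_total, order_total_qty in rows:
    iso = date.fromisoformat(order_date).isocalendar()
    key = (iso[0], iso[1])  # ISO year, ISO week number
    totals[key][0] += retailers
    totals[key][1] += order_total
    totals[key][2] += order_total_qty

weekly = [
    ("source_name", year, week, *sums)
    for (year, week), sums in sorted(totals.items())
]
for row in weekly:
    print(row)
```

ISO weeks run Monday to Sunday, so 2015-01-04 (a Sunday) falls in week 1 and 2015-01-05 (a Monday) starts week 2.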