SQL counts of tests given in year

I'm trying to make a sum of tests given in a year from a specific table. This is what I have so far:
SELECT DISTINCT TO_CHAR(test_date, 'YYYY') AS Year, SUM(yearCount)
FROM(
SELECT COUNT(test_date) AS yearCount
FROM test_record
), test_record
GROUP BY test_record.test_date
ORDER BY Year ASC;
Which gives me the output:
YEAR SUM(YEARCOUNT)
---- --------------
1958 12
1991 12
1996 12
1998 12
2000 12
2001 12
2010 12
2012 12
2013 12
Now, I understand my problem lies here: SELECT COUNT(test_date) AS yearCount , because I have 12 entries in the table so it's obviously giving the count of the number of entries in the table. I need the count of tests given in each year, i.e. the output should look like this:
YEAR SUM(YEARCOUNT)
---- --------------
1958 1
1991 1
1996 1
1998 1
2000 1
2001 1
2010 1
2012 1
2013 4
So basically my question boils down to: How do I count by year in a date column?
(I'm using ORACLE 7 I believe)
EDIT: Thanks to the below help I was able to get my desired output, but they were both a little "wrong", so I didn't accept them (sorry if that's a Faux pas). Here is my script:
SELECT TO_CHAR(test_date, 'YYYY') AS Year, COUNT(test_date)
FROM test_record
GROUP BY TO_CHAR(test_date, 'YYYY')
ORDER BY Year ASC;

You want to group by year and not by test date.
SELECT COUNT(*), TO_CHAR(test_date, 'YYYY') AS year
FROM test_record
GROUP BY TO_CHAR(test_date, 'YYYY')

You can do this with GROUP BY alone; there is no need for a subquery.
SELECT TO_CHAR(test_date, 'YYYY') AS Year, COUNT(test_date)
FROM test_record
GROUP BY TO_CHAR(test_date, 'YYYY')
ORDER BY Year ASC;
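As a minimal sketch of the same GROUP BY idea, here is a version that runs on SQLite through Python's sqlite3 module. SQLite has no TO_CHAR, so strftime('%Y', ...) stands in for it, and the sample dates are made up for illustration (they are not the asker's data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_record (test_date TEXT)")

# Hypothetical sample rows: one test in 1958 and 1991, four in 2013.
dates = ["1958-03-01", "1991-06-15",
         "2013-01-10", "2013-02-11", "2013-05-20", "2013-09-30"]
conn.executemany("INSERT INTO test_record VALUES (?)", [(d,) for d in dates])

# Group on the extracted year, not on the full date.
rows = conn.execute(
    """
    SELECT strftime('%Y', test_date) AS year, COUNT(test_date) AS tests
    FROM test_record
    GROUP BY strftime('%Y', test_date)
    ORDER BY year
    """
).fetchall()

for year, tests in rows:
    print(year, tests)
```

The key point is that the expression in GROUP BY is the same year expression that appears in the SELECT list, so each year collapses to one row.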

Related

SQL - use only clients that are present in all months

I have a dataset with different clients and their sales counts. Over time, some clients get added to and deleted from the data. How do I make sure that when I look at the sales counts, I am only using the clients that were in the data set the whole time? I.e. if a client doesn't have a record for 2018-03, then I don't want that client to be part of the query at all. If a client does not have a record in 2020-03, then I also do not want this client to be part of the query.
For example, the following query:
select DATE_PART (y, sold_date)as year, DATE_PART (mm, sold_date) as month, count(distinct(client))
from sales_data
where sold_date > '2018-01-01'
group by year, month
order by year,month
Yields
year month count
2018 1 78
2018 2 83
2018 3 80
2018 4 83
2018 5 84
2018 6 81
2018 7 83
2018 8 90
2018 9 89
2018 10 95
2018 11 94
2018 12 97
2019 1 102
2019 2 103
2019 3 102
2019 4 105
2019 5 103
2019 6 104
2019 7 104
2019 8 106
2019 9 106
2019 10 108
2019 11 109
2019 12 104
2020 1 104
2020 2 102
2020 3 103
2020 4 98
2020 5 97
2020 6 79
So I want to use only the clients that are present in all months. There should be no more than 78 of them, because there cannot be more clients than in the smallest month (2018-1).
FYI, I am using Amazon Redshift here but I am OK with a query that's rdbms agnostic or works for SQL-Server/Oracle/MySQL/PostgreSQL, I am just interested in a pattern on how to solve this issue effectively.
If I'm understanding what you want correctly, and if this is just a one-off query, you could use a correlated subquery in the where clause:
SELECT
DATE_PART(y, s.sold_date) AS year,
DATE_PART(mm, s.sold_date) AS month,
COUNT(DISTINCT s.client)
FROM
sales_data AS s
WHERE
EXISTS (
SELECT sd.client
FROM sales_data AS sd
WHERE DATE_PART(y, sd.sold_date) = 2018
AND DATE_PART(mm, sd.sold_date) = 1
AND sd.client = s.client
) AND
s.sold_date > '2018-01-01'
GROUP BY
year,
month
ORDER BY
DATE_PART(y, s.sold_date),
DATE_PART(mm, s.sold_date)
Presence in all months can be checked with a two-step aggregation:
1. group the sales data by customer ID, keeping only customers that have all months
2. group the sales data joined to (1) by year and month
Like this (the = 12 can be a dynamic expression, depending on the amount of history you have):
with
stable_customers as (
select customer_id
from sales_data
group by 1
having count(distinct date_trunc('month', sold_date)) = 12
)
select
DATE_PART (y, sold_date) as year
,DATE_PART (mm, sold_date) as month
,count(1)
from sales_data
join stable_customers
using (customer_id)
where sold_date > '2018-01-01'
group by year, month
order by year, month
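Here is a small runnable sketch of the two-step aggregation, using SQLite through Python's sqlite3 module rather than Redshift. The sample rows are hypothetical, strftime replaces DATE_PART/date_trunc, and the required number of months is computed from the data instead of being hard-coded, which is one possible way to make the "= 12" dynamic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, sold_date TEXT)")

# Hypothetical data: only client "a" has a sale in every month.
sales = [
    ("a", "2018-01-05"), ("a", "2018-02-07"), ("a", "2018-03-09"),
    ("b", "2018-01-11"), ("b", "2018-03-02"),   # b misses 2018-02
    ("c", "2018-02-20"), ("c", "2018-03-21"),   # c misses 2018-01
]
conn.executemany("INSERT INTO sales_data VALUES (?, ?)", sales)

query = """
WITH months AS (
    -- number of distinct months present anywhere in the data
    SELECT COUNT(DISTINCT strftime('%Y-%m', sold_date)) AS n
    FROM sales_data
),
stable_clients AS (
    -- step 1: clients that appear in every one of those months
    SELECT client
    FROM sales_data
    GROUP BY client
    HAVING COUNT(DISTINCT strftime('%Y-%m', sold_date)) = (SELECT n FROM months)
)
-- step 2: monthly counts restricted to the stable clients
SELECT strftime('%Y', sold_date) AS yr,
       strftime('%m', sold_date) AS mon,
       COUNT(DISTINCT client) AS clients
FROM sales_data
JOIN stable_clients USING (client)
GROUP BY yr, mon
ORDER BY yr, mon
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this approach every month reports the same client count, because the set of clients is fixed before the monthly grouping happens.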
Use window functions. Unfortunately, SQL Server does not support count(distinct) as a window function. Fortunately, there is a simple work-around using dense_rank():
select year, month, count(distinct client)
from (select sd.*, year, month,
(dense_rank() over (order by year, month) +
dense_rank() over (order by year desc, month desc)
) as num_months,
(dense_rank() over (partition by client order by year, month) +
dense_rank() over (partition by client order by year desc, month desc)
) as num_months_client
from sales_data sd cross apply
(values (year(sold_date), month(sold_date))) v(year, month)
where sd.sold_date > '2018-01-01'
) sd
where num_months_client = num_months
group by year, month
order by year, month;
Note: This looks at all months that are in the data. If all clients are missing 2019-03, then that month is not considered at all.
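To see why the dense_rank() work-around counts distinct values, here is a small sketch on SQLite via Python's sqlite3 module. The table layout (a pre-computed ym column) and the rows are assumptions for illustration. The ascending rank plus the descending rank is always the number of distinct values plus one; the query above skips the "- 1" because it appears on both sides of the comparison and cancels out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: ym is the year-month of each sale.
conn.execute("CREATE TABLE sales_data (client TEXT, ym TEXT)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("a", "2018-01"), ("a", "2018-01"), ("a", "2018-02"), ("b", "2018-02")],
)

# asc rank + desc rank - 1 equals the number of distinct ym per client,
# i.e. a windowed COUNT(DISTINCT ym) without using COUNT(DISTINCT).
rows = conn.execute(
    """
    SELECT DISTINCT client,
           DENSE_RANK() OVER (PARTITION BY client ORDER BY ym)
         + DENSE_RANK() OVER (PARTITION BY client ORDER BY ym DESC)
         - 1 AS distinct_months
    FROM sales_data
    ORDER BY client
    """
).fetchall()
print(rows)
```

Client "a" has two rows in 2018-01, but they share a dense rank, so the duplicate does not inflate the count.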

Oracle : SUM values between max and min dates

I am struggling with a query and think I need some help.
An example of the data I use :
I need to sum the values between the max DATE of each year that have the 0 (zero) TYPE.
The results should be :
YEAR - SUM
2015 - 10
2016 - 20
2017 - 20
2018 - 35
The sum must include lines with TYPE 1 or 2 but these lines cannot be used in the identification of the MAX DATE by year.
To be clear, in the data I provided, if we had
then the max DATE for 2018 would be 05/07/2018 and the results of the query :
YEAR - SUM
2015 - 10
2016 - 20
2017 - 20
2018 - 30
Identifying max value per year isn't a problem but excluding 1 and 2 TYPEs and then summing data is where I'm stuck at.
thanks in advance.
EDIT : correction of the results.
EDIT2 : another example
The expected results
YEAR - SUM
2015 - 10
2016 - 20
2017 - 20
2018 - 30
The expected results
YEAR - SUM
2015 - 10
2016 - 20
2017 - 20
2018 - 38
I don't know what "max date" has to do with this. You seem to want:
select to_char(date, 'YYYY') as yr, sum(value)
from t
where type = 0
group by to_char(date, 'YYYY')
order by to_char(date, 'YYYY');
EDIT:
If you want both the max date and the sum, then use conditional aggregation:
select to_char(date, 'YYYY') as yr, max(date) as max_date,
sum(case when type = 0 then value end) as sum_type0
from t
group by to_char(date, 'YYYY')
order by to_char(date, 'YYYY');
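A rough sketch of how conditional aggregation behaves, run on SQLite through Python's sqlite3 module. The question's sample data is not available here, so the rows below are invented, the date column is renamed dt (a hypothetical name), and strftime replaces to_char. It also applies the same CASE to MAX, which is one way to keep TYPE 1 and 2 rows out of the max date; whether that matches the asker's intent is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# dt is a hypothetical column name standing in for the question's DATE column.
conn.execute("CREATE TABLE t (dt TEXT, type INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2017-03-01", 0, 20),
        ("2018-05-07", 0, 30),
        ("2018-09-01", 1, 5),   # type 1: ignored by the CASE expressions
    ],
)

rows = conn.execute(
    """
    SELECT strftime('%Y', dt) AS yr,
           MAX(CASE WHEN type = 0 THEN dt END)    AS max_date_type0,
           SUM(CASE WHEN type = 0 THEN value END) AS sum_type0,
           SUM(value)                             AS sum_all
    FROM t
    GROUP BY strftime('%Y', dt)
    ORDER BY yr
    """
).fetchall()
print(rows)
```

A CASE without ELSE yields NULL for non-matching rows, and aggregates skip NULLs, which is what lets one GROUP BY produce both filtered and unfiltered totals.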

Multiple Group by in SQL

I have a table like this.
Year Month TenDays Pay
========================================
2015 8 2 12
2016 8 1 43
2016 8 2 11
2016 9 1 22
2016 9 2 33
2016 9 3 4
2016 9 3 25
I want to have SQL query that calculate sum of 'Pay' as 'TotalTenDays' group by 'year' and 'Month' and 'TenDays'
and also calculate sum of 'Pay' as 'TotalMonth' group by 'year' and 'Month'.
I can do that with "union all" but I am searching for a way without using union and 'with cte as()'.
Is it Possible?
Expected table must be like this:
Year Month TenDays TotalTenDays TotalMonth
====================================================================
2015 8 2 12 12
2016 8 1 43 54
2016 8 2 11 54
2016 9 1 22 84
2016 9 2 33 84
2016 9 3 29 84
The answer depends on the database dialect.
The first 4 columns are standard SQL GROUP BY logic, i.e. GROUP BY Year, Month, TenDays with a SUM(Pay) AS TotalTenDays result column.
The TotalMonth column is best done with a windowing function using the OVER clause, but that's only if the SQL dialect supports it.
E.g. for SQL Server, you can do this:
SELECT Year, Month, TenDays
, SUM(Pay) AS TotalTenDays
, SUM(SUM(Pay)) OVER (PARTITION BY Year, Month) AS TotalMonth
FROM MyTable
GROUP BY Year, Month, TenDays
ORDER BY Year, Month, TenDays
See SQL Fiddle for running query using MS SQL Server 2017.
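The nested SUM(SUM(Pay)) OVER (...) can be read as two stages: the inner SUM is the GROUP BY aggregate, and the outer SUM is a window function over the already grouped rows. The sketch below spells those stages out with a derived table, using SQLite through Python's sqlite3 module and the sample table from the question. It is meant as an illustration of the evaluation order, not as the SQL Server query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Year INT, Month INT, TenDays INT, Pay INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(2015, 8, 2, 12), (2016, 8, 1, 43), (2016, 8, 2, 11), (2016, 9, 1, 22),
     (2016, 9, 2, 33), (2016, 9, 3, 4), (2016, 9, 3, 25)],
)

rows = conn.execute(
    """
    SELECT Year, Month, TenDays, TotalTenDays,
           -- stage 2: window SUM over the grouped rows of each month
           SUM(TotalTenDays) OVER (PARTITION BY Year, Month) AS TotalMonth
    FROM (
        -- stage 1: ordinary GROUP BY aggregate
        SELECT Year, Month, TenDays, SUM(Pay) AS TotalTenDays
        FROM MyTable
        GROUP BY Year, Month, TenDays
    ) AS g
    ORDER BY Year, Month, TenDays
    """
).fetchall()

for row in rows:
    print(row)
```

The result matches the expected table from the question, with the two TenDays = 3 rows of 2016-9 merged into 29.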
If the SQL dialect doesn't support windowing functions, then the suggestion in the comment by Jonathan Leffler is a good alternative:
You need to create two sub-queries that do the aggregations, and then join the results of those two sub-queries. Each sub-query will have a different GROUP BY clause.
SELECT a.Year, a.Month, a.TenDays, a.TotalTenDays, b.TotalMonth
FROM ( SELECT Year, Month, TenDays
, SUM(Pay) AS TotalTenDays
FROM MyTable
GROUP BY Year, Month, TenDays
) a
JOIN ( SELECT Year, Month
, SUM(Pay) AS TotalMonth
FROM MyTable
GROUP BY Year, Month
) b ON b.Year = a.Year
AND b.Month = a.Month
ORDER BY a.Year, a.Month, a.TenDays
See SQL Fiddle for running query using MySQL 5.6.

Oracle PARTITION BY GROUPING_ID with SUM

I'm trying to implement a simple data warehouse analytic query, dealing with 'YEAR_VALUE', 'MONTH_VALUE' and a 'INVOICE_COST'
SELECT YEAR_VALUE, MONTH_VALUE, SUM (INVOICE_VALUE) AS TOTAL_INVOICE,
RANK () OVER (PARTITION BY GROUPING_ID (YEAR_VALUE, MONTH_VALUE) ORDER BY SUM (INVOICE_VALUE) DESC) AS YEAR_RANK,
RANK () OVER (PARTITION BY YEAR_VALUE, GROUPING_ID (MONTH_VALUE) ORDER BY SUM (INVOICE_VALUE) DESC) AS MONTH_RANK
FROM FACT_WH
JOIN TIME_WH ON TIME_WH.TIME_ID = FACT_WH.TIME_ID
GROUP BY (YEAR_VALUE, MONTH_VALUE);
The output is :
Output
'YEAR_RANK' should express year's total invoice value compared to other years, 2016 has a YEAR_RANK=1 and 2015 has a YEAR_RANK=2
The problem is that 'YEAR_RANK' has the values 1,2,3,4,5 it should be 1,1,2,2,1
I can't find the problem in my code, It's maybe in line #2, I tried everything and wasted much time already.
Thanks in advance.
A good approach, especially when the query is complex or delivers confusing results, is to divide the whole query into subqueries that each solve a particular task.
In your case I'd recommend first tackling the join of the fact and dimension tables and the GROUP BY on year and month to calculate the total_invoice.
You get results such as
YEAR_VALUE MONTH_VALUE TOTAL_INVIOCE
---------- ----------- -------------
2016 3 29960
2016 1 10700
2015 11 5100
2015 8 1680
2016 2 800
Note that you don't need any GROUP BY extension such as GROUPING_ID; you can solve everything using analytic functions.
In the next step (using the previous result as a factored subquery) you calculate the year and month totals using the analytic version of SUM.
In the last step you calculate the RANK. Note that for the year you need a DENSE_RANK, because otherwise you get 'skipped' ranks such as 1,3 (due to repeated records for one year).
The year_rank is not partitioned at all; the month_rank is partitioned on YEAR as you order the months within a year.
with data as (
-- perform join and group by in this subquery
select 2016 year_value, 3 month_value, 29960 total_invioce from dual union all
select 2016 year_value, 1 month_value, 10700 total_invioce from dual union all
select 2015 year_value, 11 month_value, 5100 total_invioce from dual union all
select 2015 year_value, 8 month_value, 1680 total_invioce from dual union all
select 2016 year_value, 2 month_value, 800 total_invioce from dual),
year_month as (
-- perform year and month summary here
select
year_value, month_value, total_invioce,
sum(total_invioce) over (partition by year_value) total_invoice_year,
sum(total_invioce) over (partition by month_value) total_invoice_month
from data
)
-- perform ranking here
select year_value, month_value, total_invioce,
dense_rank() OVER (ORDER BY total_invoice_year DESC) year_rank,
rank() OVER (partition by year_value ORDER BY total_invoice_month DESC) month_rank
from year_month
order by total_invioce desc;
YEAR_VALUE MONTH_VALUE TOTAL_INVIOCE YEAR_RANK MONTH_RANK
---------- ----------- ------------- ---------- ----------
2016 3 29960 1 1
2016 1 10700 1 2
2015 11 5100 2 1
2015 8 1680 2 2
2016 2 800 1 3
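To make the RANK versus DENSE_RANK point concrete, here is a short sketch on SQLite through Python's sqlite3 module, using the same five rows as above. The table name data and the column name total_invoice are choices made for this example. Both rankings are computed side by side over the yearly total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (year_value INT, month_value INT, total_invoice INT)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(2016, 3, 29960), (2016, 1, 10700), (2015, 11, 5100),
     (2015, 8, 1680), (2016, 2, 800)],
)

rows = conn.execute(
    """
    SELECT year_value, month_value, total_invoice,
           DENSE_RANK() OVER (ORDER BY total_invoice_year DESC) AS dense_year_rank,
           RANK()       OVER (ORDER BY total_invoice_year DESC) AS plain_year_rank
    FROM (
        -- yearly total repeated on every row of that year
        SELECT year_value, month_value, total_invoice,
               SUM(total_invoice) OVER (PARTITION BY year_value)
                   AS total_invoice_year
        FROM data
    ) AS ym
    ORDER BY total_invoice DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Because 2016 occupies three rows, plain RANK jumps from 1 to 4 for 2015, while DENSE_RANK gives the 1, 2 numbering the question asks for.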

Grouping data on SQL Server

I have this table in SQL Server:
Year Month Quantity
----------------------------
2015 January 10
2015 February 20
2015 March 30
2014 November 40
2014 August 50
How can I add two more columns, one that numbers each distinct year as a group and one that numbers the months within each year sequentially, as in this example?
Year Month Quantity Group Subgroup
------------------------------------------------
2015 January 10 1 1
2015 February 20 1 2
2015 March 30 1 3
2014 November 40 2 1
2014 August 50 2 2
You can use DENSE_RANK to calculate the groups for you:
SELECT t1.*, DENSE_RANK() OVER (ORDER BY Year DESC) AS [Group],
DENSE_RANK() OVER (PARTITION BY Year ORDER BY DATEPART(month, Month + ' 01 2010')) AS [SubGroup]
FROM t1
ORDER BY 4, 5
See this fiddle.
To associate group and subgroup with a number you can do this:
WITH RankedTable AS (
SELECT year, month, quantity,
ROW_NUMBER() OVER (partition by year order by Month) AS rn
FROM yourtable)
SELECT year, month, quantity,
SUM (CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY YEAR) as year_group,
rn AS subgroup
FROM RankedTable
Here ROW_NUMBER() OVER clause calculates rank of a month within a year.
And SUM() ... OVER calculates running SUM for the months with rank 1.
SQL Fiddle
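A small sketch of the ROW_NUMBER plus running SUM idea, run on SQLite through Python's sqlite3 module. To keep it short, the month is stored as a number in a hypothetical month_no column instead of a month name, and the table name t is also an assumption. The running sum adds 1 each time a new year starts (rn = 1), which is what turns the flags into a group number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# month_no is a hypothetical numeric month, replacing the month-name column.
conn.execute("CREATE TABLE t (year INT, month_no INT, quantity INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(2015, 1, 10), (2015, 2, 20), (2015, 3, 30), (2014, 11, 40), (2014, 8, 50)],
)

rows = conn.execute(
    """
    WITH ranked AS (
        SELECT year, month_no, quantity,
               ROW_NUMBER() OVER (PARTITION BY year ORDER BY month_no) AS rn
        FROM t
    )
    SELECT year, month_no,
           -- rows of the same year are peers, so they share one running total
           SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY year)
               AS year_group,
           rn AS subgroup
    FROM ranked
    ORDER BY year, month_no
    """
).fetchall()

for row in rows:
    print(row)
```

With ORDER BY year the oldest year becomes group 1; ordering the window by year DESC instead would number the most recent year first, as in the question's example.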