I'm trying to implement a simple data warehouse analytic query dealing with 'YEAR_VALUE', 'MONTH_VALUE' and 'INVOICE_VALUE':
SELECT YEAR_VALUE, MONTH_VALUE, SUM (INVOICE_VALUE) AS TOTAL_INVOICE,
RANK () OVER (PARTITION BY GROUPING_ID (YEAR_VALUE, MONTH_VALUE) ORDER BY SUM (INVOICE_VALUE) DESC) AS YEAR_RANK,
RANK () OVER (PARTITION BY YEAR_VALUE, GROUPING_ID (MONTH_VALUE) ORDER BY SUM (INVOICE_VALUE) DESC) AS MONTH_RANK
FROM FACT_WH
JOIN TIME_WH ON TIME_WH.TIME_ID = FACT_WH.TIME_ID
GROUP BY (YEAR_VALUE, MONTH_VALUE);
'YEAR_RANK' should rank each year's total invoice value against the other years: 2016 should have YEAR_RANK=1 and 2015 should have YEAR_RANK=2.
The problem is that 'YEAR_RANK' has the values 1, 2, 3, 4, 5 when it should be 1, 1, 2, 2, 1.
I can't find the problem in my code. It may be in line #2 (the YEAR_RANK expression); I've tried everything and have already wasted a lot of time.
Thanks in advance.
A good approach, especially when a query is complex or delivers confusing results, is to divide it into subqueries that each solve one particular task.
In your case, I'd recommend first tackling the join of the fact and dimension tables, grouping by year and month to calculate the total invoice.
You get results such as:
YEAR_VALUE MONTH_VALUE TOTAL_INVIOCE
---------- ----------- -------------
2016 3 29960
2016 1 10700
2015 11 5100
2015 8 1680
2016 2 800
Note that you don't need any GROUP BY extension such as GROUPING_ID; everything can be solved with analytic functions.
In the next step (using the previous result as a factored subquery), you calculate the year and month totals using the analytic version of SUM.
In the last step, you calculate the ranks. Note that for the year you need DENSE_RANK; with plain RANK you get 'skipped' ranks (here 2016 occupies three rows that all rank 1, so 2015 would get rank 4 instead of 2).
YEAR_RANK is not partitioned at all; MONTH_RANK is partitioned by YEAR_VALUE because you order the months within a year.
with data as (
-- perform join and group by in this subquery
select 2016 year_value, 3 month_value, 29960 total_invioce from dual union all
select 2016 year_value, 1 month_value, 10700 total_invioce from dual union all
select 2015 year_value, 11 month_value, 5100 total_invioce from dual union all
select 2015 year_value, 8 month_value, 1680 total_invioce from dual union all
select 2016 year_value, 2 month_value, 800 total_invioce from dual),
year_month as (
-- perform year and month summary here
select
year_value, month_value, total_invioce,
sum(total_invioce) over (partition by year_value) total_invoice_year,
sum(total_invioce) over (partition by year_value, month_value) total_invoice_month
from data
)
-- perform ranking here
select year_value, month_value, total_invioce,
dense_rank() OVER (ORDER BY total_invoice_year DESC) year_rank,
rank() OVER (partition by year_value ORDER BY total_invoice_month DESC) month_rank
from year_month
order by total_invioce desc;
YEAR_VALUE MONTH_VALUE TOTAL_INVIOCE YEAR_RANK MONTH_RANK
---------- ----------- ------------- ---------- ----------
2016 3 29960 1 1
2016 1 10700 1 2
2015 11 5100 2 1
2015 8 1680 2 2
2016 2 800 1 3
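To try the three steps without an Oracle instance, here is a minimal sketch that runs them in SQLite (3.25+ is assumed, for window functions) through Python's `sqlite3` module. The table name `data` and the function `rank_invoices` are hypothetical, and the `FROM dual` rows are loaded as ordinary inserts. Because each row is already one month, the month total equals `total_invoice`, so the month rank orders by that column directly.

```python
import sqlite3

# Sample rows from the answer: (year, month, monthly invoice total).
# In the real warehouse these come from the FACT_WH/TIME_WH join + GROUP BY.
ROWS = [
    (2016, 3, 29960),
    (2016, 1, 10700),
    (2015, 11, 5100),
    (2015, 8, 1680),
    (2016, 2, 800),
]

QUERY = """
WITH year_month AS (
    -- step 2: yearly totals with the analytic (window) SUM
    SELECT year_value, month_value, total_invoice,
           SUM(total_invoice) OVER (PARTITION BY year_value) AS total_invoice_year
    FROM data
)
-- step 3: rank years (unpartitioned DENSE_RANK) and months within each year
SELECT year_value, month_value, total_invoice,
       DENSE_RANK() OVER (ORDER BY total_invoice_year DESC) AS year_rank,
       RANK() OVER (PARTITION BY year_value ORDER BY total_invoice DESC) AS month_rank
FROM year_month
ORDER BY total_invoice DESC
"""


def rank_invoices(rows):
    """Return (year, month, total, year_rank, month_rank) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE data (year_value INT, month_value INT, total_invoice INT)"
        )
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in rank_invoices(ROWS):
    print(row)
```

The printed rows should match the result table above: YEAR_RANK 1, 1, 2, 2, 1 and MONTH_RANK 1, 2, 1, 2, 3.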
I am using AWS Athena (Presto based) and I have this table named base:
id category year month
-- -------- ---- -----
1  a        2021 6
1  b        2022 8
1  a        2022 11
2  a        2022 1
2  a        2022 4
2  b        2022 6
I would like to craft a query that counts the distinct values of the categories per id, cumulatively per month and year, but retaining the original columns:
id category year month sumC
-- -------- ---- ----- ----
1  a        2021 6     1
1  b        2022 8     2
1  a        2022 11    2
2  a        2022 1     1
2  a        2022 4     1
2  b        2022 6     2
I've tried the following query, without success:
SELECT id,
category,
year,
month,
COUNT(category) OVER (PARTITION BY id ORDER BY year, month) AS sumC FROM base;
This results in 1, 2, 3, 1, 2, 3, which is not what I'm looking for. What I need is something like COUNT(DISTINCT) inside a window function, but that construct isn't supported.
I also tried the DENSE_RANK trick:
DENSE_RANK() OVER (PARTITION BY id ORDER BY category ASC)
+ DENSE_RANK() OVER (PARTITION BY id ORDER BY category DESC)
- 1 as sumC
However, because it doesn't take the year/month ordering into account, it just returns 2, 2, 2, 2, 2, 2.
Any help is appreciated!
One option is to first create a new column that flags when each "category" is seen for the first time (partitioning on "id" and "category", ordering on "year", "month"), and then compute a running sum over this column, partitioned by "id" only and ordered the same way.
WITH cte AS (
  SELECT *,
         -- 1 the first time a category appears for an id, else 0
         CASE WHEN ROW_NUMBER() OVER(
                     PARTITION BY id, category
                     ORDER BY year, month) = 1
              THEN 1
              ELSE 0
         END AS rn1
  FROM base
)
SELECT id,
       category,
       year,
       month,
       -- running total of "first seen" flags = distinct categories so far
       SUM(rn1) OVER(
         PARTITION BY id
         ORDER BY year, month
       ) AS sumC
FROM cte
ORDER BY id, year, month
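The first-seen-flag plus running-sum idea can be checked against the question's expected sumC column. Below is a minimal sketch that assumes SQLite 3.25+ (via Python's `sqlite3`) as a stand-in for Athena/Presto; the function name `distinct_running_count` is hypothetical.

```python
import sqlite3

# Rows from the question: (id, category, year, month)
BASE = [
    (1, "a", 2021, 6),
    (1, "b", 2022, 8),
    (1, "a", 2022, 11),
    (2, "a", 2022, 1),
    (2, "a", 2022, 4),
    (2, "b", 2022, 6),
]

QUERY = """
WITH cte AS (
    SELECT *,
           -- 1 on the earliest row of each (id, category), else 0
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY id, category
                                        ORDER BY year, month) = 1
                THEN 1 ELSE 0 END AS first_seen
    FROM base
)
SELECT id, category, year, month,
       -- running count of categories seen so far for this id
       SUM(first_seen) OVER (PARTITION BY id ORDER BY year, month) AS sumC
FROM cte
ORDER BY id, year, month
"""


def distinct_running_count(rows):
    """Return (id, category, year, month, sumC) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE base (id INT, category TEXT, year INT, month INT)")
        conn.executemany("INSERT INTO base VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in distinct_running_count(BASE):
    print(row)
```

Because the running SUM uses the default RANGE frame, two rows of the same id with an identical (year, month) would receive the same sumC.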
I have a table like the one below:
ID_NUMBER SALEDATA   SALEAMOUNT
--------- ---------- ----------
1         2020-09-07 47,000
2         2020-03-25 51,470
3         2021-06-12 32,000
4         2018-10-12 37,560
I want to select only the rows with the two most recent dates, so my desired output would be:
ID_NUMBER SALEDATA   SALEAMOUNT
--------- ---------- ----------
1         2020-09-07 47,000
3         2021-06-12 32,000
Can someone please guide me on where to start with this in SQL? I tried using MAX(), but it only gives me the most recent one.
Thank you!
In Standard SQL, you would use:
select t.*
from t
order by saledata desc
offset 0 row fetch first 2 row only;
Not all databases support FETCH FIRST. Depending on your database, it might be spelled LIMIT, SELECT TOP, or something else.
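For example, SQLite uses the LIMIT spelling. Here is a minimal sketch, run through Python's `sqlite3`, that assumes dates are stored as ISO `YYYY-MM-DD` text so that they sort correctly as strings. The table `t` and the function `two_most_recent` are hypothetical.

```python
import sqlite3

# Rows from the question (amounts without thousands separators)
SALES = [
    (1, "2020-09-07", 47000),
    (2, "2020-03-25", 51470),
    (3, "2021-06-12", 32000),
    (4, "2018-10-12", 37560),
]


def two_most_recent(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id_number INT, saledata TEXT, saleamount INT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        # SQLite's equivalent of FETCH FIRST 2 ROWS ONLY
        return conn.execute(
            "SELECT * FROM t ORDER BY saledata DESC LIMIT 2"
        ).fetchall()
    finally:
        conn.close()


print(two_most_recent(SALES))
```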
Another option uses the RANK analytic function. The sample data ends at line #7 and the query begins at line #9; see the comments within the code.
SQL> with test (id_number, saledata, saleamount) as
2 -- sample data
3 (select 1, date '2020-09-07', 47000 from dual union all
4 select 2, date '2020-03-25', 51470 from dual union all
5 select 3, date '2021-06-12', 32000 from dual union all
6 select 4, date '2018-10-12', 37560 from dual
7 )
8 -- sort them by date in descending order, fetch the first two rows
9 select id_number, saledata, saleamount
10 from (select t.*,
11 rank() over (order by saledata desc) rn
12 from test t
13 )
14 where rn <= 2
15 order by saledata;
ID_NUMBER SALEDATA SALEAMOUNT
---------- ---------- ----------
1 2020-09-07 47000
3 2021-06-12 32000
SQL>
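Unlike LIMIT/FETCH FIRST, the RANK filter keeps every row tied on the second-most-recent date, so it can return more than two rows. The sketch below demonstrates the difference in SQLite via Python's `sqlite3`. Row 5 is a hypothetical addition, not part of the question, that shares a date with row 1, and the helper `top_dates` is also hypothetical.

```python
import sqlite3

# The question's rows plus a HYPOTHETICAL row 5 that ties with row 1 on date.
SALES = [
    (1, "2020-09-07", 47000),
    (2, "2020-03-25", 51470),
    (3, "2021-06-12", 32000),
    (4, "2018-10-12", 37560),
    (5, "2020-09-07", 10000),  # hypothetical tie, not in the question
]

# {fn} is filled with a fixed window-function name below, never user input.
RANKED = """
SELECT id_number, saledata, saleamount
FROM (SELECT t.*, {fn}() OVER (ORDER BY saledata DESC) AS rn FROM t)
WHERE rn <= 2
ORDER BY saledata, id_number
"""


def top_dates(rows, fn):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id_number INT, saledata TEXT, saleamount INT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(RANKED.format(fn=fn)).fetchall()
    finally:
        conn.close()


print(top_dates(SALES, "RANK"))        # both 2020-09-07 rows share rank 2
print(top_dates(SALES, "ROW_NUMBER"))  # exactly two rows; the surviving tied row is arbitrary
```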
In SQL Server, select the top 2 rows from your data set, ordered by the saledata column descending:
select top 2 *
from t
order by saledata desc;
YEAR MONTH BALANCE SSN
2016 1 3175 34/1043/03T
2016 1 2984 93/1194/07T
2016 1 2269 39/3149/00T
2015 12 3172 36/1011/03T
2015 12 2984 22/1224/07T
2015 12 2169 12/3143/00T
For example, I have this table, but with rows for every month of every year, and I need to pick the best SSN and balance (the highest balance) for each month of each year. For this example, I would like my query to return:
YEAR MONTH BALANCE SSN
2016 1 3175 34/1043/03T
2015 12 3172 36/1011/03T
What can I do?
You can do this in several ways. A very Oracle-ish way is to use KEEP:
select year, month,
max(balance) as balance,
max(SSN) keep (dense_rank first order by balance desc) as ssn
from t
group by year, month;
Like most DBMSes, Oracle also supports ROW_NUMBER/RANK:
select *
from
(
select year, month, balance, SSN,
row_number()
over (partition by year, month
order by balance desc) as rn
from tab
) dt
where rn = 1
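KEEP is Oracle-specific, but the ROW_NUMBER version is portable. Here is a minimal sketch that runs it on the question's sample rows in SQLite (3.25+ assumed) via Python's `sqlite3`. The table `tab` is taken from the answer, and the function `best_per_month` is hypothetical.

```python
import sqlite3

# Rows from the question: (year, month, balance, ssn)
ROWS = [
    (2016, 1, 3175, "34/1043/03T"),
    (2016, 1, 2984, "93/1194/07T"),
    (2016, 1, 2269, "39/3149/00T"),
    (2015, 12, 3172, "36/1011/03T"),
    (2015, 12, 2984, "22/1224/07T"),
    (2015, 12, 2169, "12/3143/00T"),
]

QUERY = """
SELECT year, month, balance, ssn
FROM (
    -- number rows within each (year, month), highest balance first
    SELECT year, month, balance, ssn,
           ROW_NUMBER() OVER (PARTITION BY year, month
                              ORDER BY balance DESC) AS rn
    FROM tab
) AS dt
WHERE rn = 1
ORDER BY year DESC, month DESC
"""


def best_per_month(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tab (year INT, month INT, balance INT, ssn TEXT)")
        conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(best_per_month(ROWS))
```

ROW_NUMBER keeps exactly one row per month even when balances tie. Swapping in RANK would keep all tied rows.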
I have this table in SQL Server:
Year Month Quantity
----------------------------
2015 January 10
2015 February 20
2015 March 30
2014 November 40
2014 August 50
How can I identify the different years and months by adding two more columns, one that numbers each year as a group and one that numbers the months within that year sequentially, as in this example?
Year Month Quantity Group Subgroup
------------------------------------------------
2015 January 10 1 1
2015 February 20 1 2
2015 March 30 1 3
2014 November 40 2 1
2014 August 50 2 2
You can use DENSE_RANK to calculate the groups for you:
SELECT t1.*, DENSE_RANK() OVER (ORDER BY Year DESC) AS [Group],
DENSE_RANK() OVER (PARTITION BY Year ORDER BY DATEPART(month, Month + ' 01 2010')) AS [SubGroup]
FROM t1
ORDER BY 4, 5
To associate the group and subgroup with numbers, you can do this:
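WITH RankedTable AS (
SELECT year, month, quantity,
-- order months by calendar position, not alphabetically by name
ROW_NUMBER() OVER (partition by year order by DATEPART(month, Month + ' 01 2010')) AS rn
FROM yourtable)
SELECT year, month, quantity,
-- running count of each year's first-month row, newest year first
SUM (CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY YEAR DESC) as year_group,
rn AS subgroup
FROM RankedTable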
Here, ROW_NUMBER() OVER calculates the rank of a month within its year, and SUM() ... OVER calculates a running sum over the rows with rank 1. Since each year has exactly one such row, the running sum numbers the years.
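The running-sum numbering can be tried in SQLite via Python's `sqlite3`. SQLite cannot parse month names the way SQL Server's DATEPART does, so this minimal sketch assumes a hypothetical `month_names` lookup table to sort months by calendar position. The names `data` and `group_rows` are also hypothetical. Because months are numbered in calendar order, 2014 comes out as August=1, November=2, not in the order the question lists them.

```python
import sqlite3

# Rows from the question: (year, month name, quantity)
ROWS = [
    (2015, "January", 10),
    (2015, "February", 20),
    (2015, "March", 30),
    (2014, "November", 40),
    (2014, "August", 50),
]

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

QUERY = """
WITH ranked AS (
    SELECT d.year, d.month, d.quantity,
           ROW_NUMBER() OVER (PARTITION BY d.year ORDER BY m.num) AS rn
    FROM data AS d
    JOIN month_names AS m ON m.name = d.month
)
SELECT year, month, quantity,
       -- one rn = 1 row per year, so this running sum numbers the years
       SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY year DESC) AS year_group,
       rn AS subgroup
FROM ranked
ORDER BY year_group, subgroup
"""


def group_rows(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE data (year INT, month TEXT, quantity INT)")
        conn.execute("CREATE TABLE month_names (name TEXT, num INT)")
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
        conn.executemany("INSERT INTO month_names VALUES (?, ?)",
                         [(name, i) for i, name in enumerate(MONTHS, start=1)])
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in group_rows(ROWS):
    print(row)
```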
I'm trying to count the tests given in each year from a specific table. This is what I have so far:
SELECT DISTINCT TO_CHAR(test_date, 'YYYY') AS Year, SUM(yearCount)
FROM(
SELECT COUNT(test_date) AS yearCount
FROM test_record
), test_record
GROUP BY test_record.test_date
ORDER BY Year ASC;
This gives me the output:
YEAR SUM(YEARCOUNT)
---- --------------
1958 12
1991 12
1996 12
1998 12
2000 12
2001 12
2010 12
2012 12
2013 12
Now, I understand my problem lies here: SELECT COUNT(test_date) AS yearCount. Because there are 12 entries in the table, it gives the count of all entries in the table. I need the count of tests given in each year, i.e. the output should look like this:
YEAR SUM(YEARCOUNT)
---- --------------
1958 1
1991 1
1996 1
1998 1
2000 1
2001 1
2010 1
2012 1
2013 4
So basically my question boils down to: How do I count by year in a date column?
(I believe I'm using Oracle 7.)
EDIT: Thanks to the help below, I was able to get my desired output, but both answers were a little "wrong", so I didn't accept them (sorry if that's a faux pas). Here is my script:
SELECT TO_CHAR(test_date, 'YYYY') AS Year, COUNT(test_date)
FROM test_record
GROUP BY TO_CHAR(test_date, 'YYYY')
ORDER BY Year ASC;
You want to group by year and not test date.
Select count(*), to_char(test_date, 'YYYY') as year
From test_record
Group by to_char(test_date, 'YYYY')
You can do it with GROUP BY alone; there's no need for a subquery:
SELECT TO_CHAR(test_date, 'YYYY') AS Year, COUNT(test_date)
FROM test_record
GROUP BY TO_CHAR(test_date, 'YYYY')
ORDER BY Year ASC;