I have a table in Oracle with columns: [DATEID date, COUNT_OF_PHOTOS int]
This table basically represents how many photos were uploaded per day.
I have a query that summarizes the number of photos uploaded per month:
select extract(year from dateid) as year, extract(month from dateid) as month, count(1) as Photos
from picture_table
group by extract(year from dateid), extract(month from dateid)
order by 1, 2
This does what I want, but I would like to run this query at the beginning of each month, let's say 07-02-2012, and have all data EXCLUDING the current month. How would I add a WHERE clause that ignores all entries whose date falls in the current year and month?
Here is one way:
where to_char(dateid, 'YYYY-MM') <> to_char(sysdate, 'YYYY-MM')
To preserve any indexing strategy you may have on dateid:
select extract(year from dateid) as year, extract(month from dateid) as month, count(1) as Photos
from picture_table
WHERE (dateid < TRUNC(SYSDATE,'MM') OR dateid >= ADD_MONTHS(TRUNC(SYSDATE,'MM'),1))
group by extract(year from dateid), extract(month from dateid)
order by 1, 2
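Since the question says the query runs at the beginning of each month, and assuming there are no future-dated rows (an assumption on my part), the filter can usually be reduced to just the lower bound while staying index-friendly:
WHERE dateid < TRUNC(SYSDATE, 'MM')   -- everything strictly before the first day of the current month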
Sample contents are:
id | created_dt          | data
---+---------------------+---------------------------------
 1 | 2023-01-14 11:52:41 | {"customers": 1, "payments": 2}
 2 | 2023-01-15 11:53:43 | {"customers": 1, "payments": 2}
 3 | 2023-01-18 11:51:45 | {"customers": 1, "payments": 2}
 4 | 2023-01-15 11:50:48 | {"customers": 1, "payments": 2}
Rows 2 and 4 fall on the same day (2023-01-15), so only one of them should be counted.
I want to get a result as follows:
year | week | customers | payments
-----+------+-----------+---------
2023 |    2 |         2 |        4
2023 |    3 |         1 |        2
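For reference, a minimal setup that reproduces the sample above might look like this (the column types are my assumption):
CREATE TABLE analytics (
    id         serial PRIMARY KEY,
    created_dt timestamp NOT NULL,
    data       jsonb NOT NULL
);

INSERT INTO analytics (created_dt, data) VALUES
    ('2023-01-14 11:52:41', '{"customers": 1, "payments": 2}'),
    ('2023-01-15 11:53:43', '{"customers": 1, "payments": 2}'),
    ('2023-01-18 11:51:45', '{"customers": 1, "payments": 2}'),
    ('2023-01-15 11:50:48', '{"customers": 1, "payments": 2}');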
I solved this problem in this way:
SELECT
date_part('year', sq.created_dt) AS year,
date_part('week', sq.created_dt) AS week,
sum((sq.data->'customers')::int) AS customers,
sum((sq.data->'payments')::int) AS payments
FROM
(SELECT DISTINCT ON (created_dt::date) created_dt, data
FROM analytics) sq
GROUP BY
year, week
ORDER BY
year, week;
However, that subquery greatly complicates the query. Is there a better method?
I need to group the data by each week, but I also need to remove duplicate days.
Using generate_series to build a join table of week start dates would solve the problem:
SELECT sum((a.data->'customers')::int) AS customers,
       sum((a.data->'payments')::int) AS payments,
       date_part('year', dategroup) AS year,
       date_part('week', dategroup) AS week
FROM generate_series(current_date, current_date + interval '1 month', interval '1 week') AS dategroup
JOIN analytics AS a
  ON a.created_dt >= dategroup
 AND a.created_dt < dategroup + interval '1 week'
GROUP BY dategroup
ORDER BY dategroup
First of all, I think your query is quite simple and understandable.
Here is the query rewritten with a WITH query (CTE), which in my opinion adds some readability:
WITH unique_days_data AS (
  SELECT DISTINCT created_dt::date AS created_dt, data
  FROM analytics)
SELECT
  date_part('year', ud.created_dt) as year,
  date_part('week', ud.created_dt) as week,
  sum((ud.data->'customers')::int) as customers,
  sum((ud.data->'payments')::int) as payments
FROM unique_days_data ud
GROUP BY year, week
ORDER BY year, week;
The difference is that this version uses the DISTINCT clause, not the DISTINCT ON clause.
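To illustrate that difference with the sample data (a minimal sketch, not taken from either answer): DISTINCT ON keeps exactly one whole row per calendar day, while plain DISTINCT only collapses rows whose selected columns are all identical.
-- DISTINCT ON: one row per day (which one is arbitrary unless the ORDER BY says otherwise)
SELECT DISTINCT ON (created_dt::date) created_dt, data
FROM analytics
ORDER BY created_dt::date, created_dt;

-- plain DISTINCT: duplicates are removed only when both the day and the data value match
SELECT DISTINCT created_dt::date AS created_dt, data
FROM analytics;
With the sample data, where the JSON payload repeats, both produce the same rows, which is why the CTE version gives the same totals.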
You can simplify it by also grouping on "created_dt::date", then filtering the last aggregated record for each week using FETCH FIRST n ROWS WITH TIES.
SELECT date_part('year', created_dt) AS year,
date_part('week', created_dt) AS week,
SUM((data->>'customers')::int) AS customers,
SUM((data->>'payments')::int) AS payments
FROM analytics
GROUP BY year, week, created_dt::date
ORDER BY ROW_NUMBER() OVER(
PARTITION BY date_part('week', created_dt)
ORDER BY created_dt::date DESC
)
FETCH FIRST 1 ROWS WITH TIES
I am trying to find the month on month growth rate of orders for the past 3 months for each country.
So far I have tried:
select date_part('month', order_date) as mnth,
       country_id,
       100 * (count(*) - lag(count(*), 1) over (order by order_date)) / lag(count(*), 1) over (order by order_date) as growth
from orders
where order_date >= DATEADD(DAY, -90, GETDATE())
group by country_id;
When we GROUP BY country_id, we produce one result row per country.
The aggregate COUNT then operates on one group per country, and the subsequent window function (LAG) won't see more than one row for each country.
In this context, there is no way LAG can be used to obtain data for a prior month for the same country.
GROUP BY country_id, date_part('month', order_date) is one approach that could be used. Be sure to LAG OVER a PARTITION for each country, ordered by date.
Here's a small change in your SQL that might help (just a starting point).
Note: I used SQL Server to test the version below. Convert datepart to date_part (and DATEADD/GETDATE) as needed.
WITH cte AS (
SELECT *, datepart(month, order_date) AS mnth
FROM orders
WHERE order_date >= DATEADD(DAY, -90, GETDATE())
)
SELECT mnth
, country_id
, 100 * (COUNT(*) - LAG(COUNT(*)) OVER (PARTITION BY country_id ORDER BY mnth)) / LAG(COUNT(*)) OVER (PARTITION BY country_id ORDER BY mnth) AS growth
FROM cte
GROUP BY country_id, mnth
;
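One caveat worth noting (my observation, not verified in a fiddle): with an integer constant, SQL Server performs integer division, so the growth percentage is truncated; writing the constant as 100.0 keeps the fractional part, e.g.:
, 100.0 * (COUNT(*) - LAG(COUNT(*)) OVER (PARTITION BY country_id ORDER BY mnth)) / LAG(COUNT(*)) OVER (PARTITION BY country_id ORDER BY mnth) AS growth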
I have the following scenario (the sample data was posted as an image and is not reproduced here).
For each year, I would like to display the month with the highest number of projects that have ended.
I have tried the following so far:
SELECT COUNT(proj.projno) nr_proj, extract(month from proj.end_date) month
, extract(year from proj.end_date) year
FROM PROJ
GROUP BY extract(month from proj.end_date)
,extract(year from proj.end_date)
I am getting the information about the number of projects per month, per year.
Could anyone give me hints on how, for each year, I would select only the records with the highest count of projects?
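Since the sample data was only posted as an image, here is an assumed minimal shape of the table (column names taken from the query; any other columns are omitted):
CREATE TABLE proj (
    projno   NUMBER,
    end_date DATE
);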
You can use this solution: the MAX analytic function gets the maximum nr_proj value per year (the PARTITION BY clause), and then only the rows where nr_proj = mx are kept.
select t.nr_proj, t.month, t.year
from (
SELECT COUNT(proj.projno) nr_proj
, extract(month from proj.end_date) month
, extract(year from proj.end_date) year
, max( COUNT(proj.projno) ) over(partition by extract(year from proj.end_date)) mx
FROM PROJ
GROUP BY extract(month from proj.end_date), extract(year from proj.end_date)
) t
where nr_proj = mx
;
I think the following will give you what you are after (if I understood the requirements). It first counts the projects for each month, then ranks the months within each year, and finally selects the first rank.
select dt "Most Projects Month", cnt "Monthly Projects"
from ( -- Rank month Count by Year
select to_char( dt, 'yyyy-mm') dt
, cnt
, rank() over (partition by extract(year from dt)
order by cnt desc) rnk
from (-- count number of in month projects for each year
select trunc(end_date,'mon') dt, count(*) cnt
from proj
group by trunc(end_date,'mon')
)
)
where rnk = 1
order by dt;
NOTE: Not tested, no data supplied. In future do not post images, see Why No Images.
I have a table which has two fields, timestamp and count. The table has data since November 2016.
I have to set up a query that will, on a daily basis, aggregate the YTD sum(count) for all the years. I am not using the calendar-year definition but rather November through October of the next year. Ideally this shouldn't change the logic:
2017: 11/01/2016-10/31/2017;
2018: 11/01/2017-10/31/2018;
2019: 11/01/2018-10/31/2019;
2020: 11/01/2019-10/31/2020
I want a query that, on any given day, will calculate the aggregate YTD with November 1st as the start date. I tried this query:
select ytd_bucket
,sum(count_field) sum
from
(
select
timestamp_field,
count_field,
CASE
WHEN DATE(timestamp_field,"America/Los_Angeles") >= '2019-11-01' THEN '2020'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2018-11-01' AND CAST(CONCAT('2019-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2019'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2017-11-01' AND CAST(CONCAT('2018-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2018'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2016-11-01' AND CAST(CONCAT('2017-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2017'
ELSE NULL END as YTD_bucket
from table
)
group by 1
The above query does not aggregate the numbers at a YTD level. For the years prior to 2020 (ytd_bucket), the query is aggregating the entire year's count.
Start by aggregating per day:
select date(timestamp_field, 'America/Los_Angeles') as dte,
count(*)
from table
group by dte;
Then, for the YTD, shift the date forward by two months so that November and December roll into the next calendar year, and partition the running sum by that shifted year:
select dte,
count(*),
       sum(count(*)) over (partition by extract(year from date_add(dte, interval 2 month))
order by min(timestamp_field)
) as running_cnt
from (select t.*,
date(timestamp_field, 'America/Los_Angeles') as dte
from t
) t
group by dte;
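As a quick sanity check on the two-month shift (a sketch of mine, in BigQuery syntax): both ends of fiscal 2017 (November 2016 through October 2017) land in calendar year 2017 after shifting.
select extract(year from date_add(date '2016-11-15', interval 2 month)) as nov_side,  -- 2017
       extract(year from date_add(date '2017-10-15', interval 2 month)) as oct_side;  -- 2017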
SELECT EVENT_DT - ((EVENT_DT -DATE'1900-01-07') MOD 7) AS dates,
CLSFD_USER_ID AS user_id,
COUNT(DISTINCT CLSFD_USER_ID) AS number_of_user_ids,
COUNT(DISTINCT CLSFD_CAS_AD_ID) AS number_of_ads,
SUM(IMPRSN_CNT) AS number_of_impressions
FROM clsfd_access_views.CLSFD_CAS_AD_HST
WHERE CLSFD_SITE_ID = 3001
AND datum >= '2017-01-01'
GROUP BY 1,2
I want to have the total number of unique users during each month of the year 2017. I tried:
GROUP BY EXTRACT(MONTH FROM datum), 2
But this returns an error. What would be the most efficient code to retrieve the total number of user ids, ads, and impressions per month?
It doesn't make sense to me to be aggregating by users, since they are what you are trying to count. Try grouping by the month and year alone:
SELECT
EXTRACT(YEAR FROM EVENT_DT) || '-' || EXTRACT(MONTH FROM EVENT_DT) AS month,
COUNT(DISTINCT CLSFD_USER_ID) AS number_of_user_ids,
COUNT(DISTINCT CLSFD_CAS_AD_ID) AS number_of_ads,
SUM(IMPRSN_CNT) AS number_of_impressions
FROM clsfd_access_views.CLSFD_CAS_AD_HST
WHERE
CLSFD_SITE_ID = 3001 AND
datum >= '2017-01-01' AND datum < '2018-01-01'
GROUP BY
EXTRACT(YEAR FROM EVENT_DT) || '-' || EXTRACT(MONTH FROM EVENT_DT);
Note that I changed your restriction on datum to also exclude any year greater than 2017.
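A small variant of the above (my suggestion, not part of the original answer): keeping year and month as two numeric columns avoids the string concatenation and sorts chronologically without extra effort.
SELECT
    EXTRACT(YEAR FROM EVENT_DT) AS yr,
    EXTRACT(MONTH FROM EVENT_DT) AS mth,
    COUNT(DISTINCT CLSFD_USER_ID) AS number_of_user_ids,
    COUNT(DISTINCT CLSFD_CAS_AD_ID) AS number_of_ads,
    SUM(IMPRSN_CNT) AS number_of_impressions
FROM clsfd_access_views.CLSFD_CAS_AD_HST
WHERE
    CLSFD_SITE_ID = 3001 AND
    datum >= '2017-01-01' AND datum < '2018-01-01'
GROUP BY
    EXTRACT(YEAR FROM EVENT_DT),
    EXTRACT(MONTH FROM EVENT_DT)
ORDER BY yr, mth;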
If you want these values to be included in the current query, then you should use analytic functions. For example, "total number of unique users during each month" would be something like:
select count(distinct user_id) over(partition by EXTRACT(MONTH FROM datum))
Be aware that those values will be repeated for each user.
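Depending on the platform, COUNT(DISTINCT ...) may not be allowed as a window function; a portable alternative (a sketch of mine, untested, reusing the question's column names) joins the monthly aggregate back onto the detail rows:
SELECT h.EVENT_DT,
       h.CLSFD_USER_ID AS user_id,
       m.number_of_user_ids
FROM clsfd_access_views.CLSFD_CAS_AD_HST h
JOIN (
    SELECT EXTRACT(YEAR FROM EVENT_DT) AS yr,
           EXTRACT(MONTH FROM EVENT_DT) AS mth,
           COUNT(DISTINCT CLSFD_USER_ID) AS number_of_user_ids
    FROM clsfd_access_views.CLSFD_CAS_AD_HST
    WHERE CLSFD_SITE_ID = 3001
    GROUP BY 1, 2
) m
  ON  EXTRACT(YEAR FROM h.EVENT_DT) = m.yr
  AND EXTRACT(MONTH FROM h.EVENT_DT) = m.mth
WHERE h.CLSFD_SITE_ID = 3001;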