I have a table in Presto with this schema:
created_at Record
timestamp String
created_at has records from 2020 to 2022.
What's the best way to get the total number of records by month, like this output:
Date N_Records
2020-01 1000
2020-02 1500
----
2022-03 3000
What I did so far:
select date_format(created_at, '%b') month, count(*) count
from table
group by date_format(created_at, '%b')
order by 1 asc
Problems with my code:
The output does not include the year, and the results are not sorted by month in ascending order.
Can someone help me improve my query?
You can use a format string that includes the 4-digit year and the 2-digit month:
select date_format(created_at, '%Y-%m') Date, count(*) N_Records
from table
group by 1
order by 1 asc
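The same grouping can be tried out locally. This is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for Presto: SQLite's strftime('%Y-%m', ...) plays the role of date_format(created_at, '%Y-%m'). The table name records, the helper count_by_month, and the sample rows are made up for illustration.

```python
import sqlite3


def count_by_month(rows):
    """Return (year-month, count) pairs in chronological order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (created_at TEXT, record TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT strftime('%Y-%m', created_at) AS month, COUNT(*) AS n_records
        FROM records
        GROUP BY month
        ORDER BY month
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data, not taken from the question.
sample = [
    ("2020-01-15 10:00:00", "a"),
    ("2020-01-20 11:30:00", "b"),
    ("2020-02-01 09:00:00", "c"),
    ("2022-03-05 08:45:00", "d"),
]
monthly_counts = count_by_month(sample)
print(monthly_counts)
```

Because the label starts with the year and zero-pads the month, plain string ordering matches chronological ordering, which is why ordering by the formatted column is enough.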
PostgreSQL 13
Given a simplified table plans like the following, assume that there is at least 1 row for every month and sometimes multiple rows on the same day:
| id          | first_published_at      |
|-------------|-------------------------|
| 12345678910 | 2022-10-01 03:58:55.118 |
| abcd1234efg | 2022-10-03 03:42:55.118 |
| jhsdf894hld | 2022-10-03 17:34:55.118 |
| aslb83nfys5 | 2022-09-12 08:17:55.118 |
My simplified query:
SELECT TO_CHAR(plans.first_published_at, 'YYYY-MM') AS publication_date, COUNT(*)
FROM plans
WHERE plans.first_published_at IS NOT NULL
GROUP BY TO_CHAR(plans.first_published_at, 'YYYY-MM');
This gives me the following result:
| publication_date | count |
|------------------|-------|
| 2022-10          | 3     |
| 2022-09          | 1     |
But the result I would need for October is 4.
For every month, the count should be an aggregation of the current month and ALL previous months. I would appreciate any insight on how to approach this.
I would use your query as a CTE and run a select that uses cumulative sum as a window function.
with t as
(
SELECT TO_CHAR(plans.first_published_at, 'YYYY-MM') AS publication_date,
COUNT(*) AS cnt
FROM plans
WHERE plans.first_published_at IS NOT NULL
GROUP BY publication_date
)
select publication_date,
sum(cnt) over (order by publication_date) as "count"
from t
order by publication_date desc;
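To check the running total against the four sample rows from the question, here is a small sketch. It assumes SQLite 3.25 or newer (for window functions) through Python's sqlite3 module, with strftime('%Y-%m', ...) standing in for PostgreSQL's TO_CHAR(..., 'YYYY-MM'). The output column is named running_count here purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plans (id TEXT, first_published_at TEXT)")
conn.executemany(
    "INSERT INTO plans VALUES (?, ?)",
    [
        ("12345678910", "2022-10-01 03:58:55.118"),
        ("abcd1234efg", "2022-10-03 03:42:55.118"),
        ("jhsdf894hld", "2022-10-03 17:34:55.118"),
        ("aslb83nfys5", "2022-09-12 08:17:55.118"),
    ],
)

running_totals = conn.execute(
    """
    WITH t AS (
        -- per-month counts, as in the original query
        SELECT strftime('%Y-%m', first_published_at) AS publication_date,
               COUNT(*) AS cnt
        FROM plans
        WHERE first_published_at IS NOT NULL
        GROUP BY publication_date
    )
    -- the window sum accumulates counts from the earliest month onward
    SELECT publication_date,
           SUM(cnt) OVER (ORDER BY publication_date) AS running_count
    FROM t
    ORDER BY publication_date DESC
    """
).fetchall()
print(running_totals)
```

The ORDER BY inside OVER controls the direction of accumulation, while the outer ORDER BY only controls how the finished rows are displayed.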
I am trying to optimize the query below to fetch all customers who placed 4 or more orders in each of the last three months.
| Customer ID | Feb | Mar | Apr |
|-------------|-----|-----|-----|
| 0001        | 4   | 5   | 6   |
| 0002        | 3   | 2   | 4   |
| 0003        | 4   | 2   | 3   |
In the table above, only the customer with Customer ID 0001 should be picked, as they consistently have 4 or more orders in every month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 over the last 90 days, but it does not check that the customer placed 4 or more orders in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I am not sure whether they are needed in this case.
I would count the orders per month instead of computing the average, and then keep the customers whose count is 4 or more in each of the last three months.
Also, I think you should select your interval using "month(CURRENT_DATE()) - 3" instead of a 90-day window. If needed, you should handle the case where the current date falls in January, February, or March, and in that case go back to October, November, or December of the previous year.
I'm not familiar with Google BigQuery, so I can't write your query, but I hope this helps.
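The year wrap-around mentioned above is easy to get wrong, so here is a small sketch of the month arithmetic in Python. The helper name last_full_months is hypothetical, and it assumes "the last three months" means the three complete calendar months before the current one.

```python
from datetime import date


def last_full_months(today, n=3):
    """Return the first day of each of the n calendar months before
    today's month, oldest first, rolling back into the previous year
    when needed."""
    months = []
    year, month = today.year, today.month
    for _ in range(n):
        month -= 1
        if month == 0:  # stepped back past January
            year, month = year - 1, 12
        months.append(date(year, month, 1))
    return months[::-1]


spring = last_full_months(date(2022, 5, 10))
winter = last_full_months(date(2022, 2, 10))
print(spring)
print(winter)
```

The first element of the result can serve as the inclusive lower bound of the date filter, and the first day of the current month as the exclusive upper bound.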
So I've found the solution to this using a WITH clause, as below:
WITH filtered_orders AS (
select
distinct customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select distinct ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
One option is to first count the orders by month and then keep only the users whose purchases are at or above your threshold in every month:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
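To see the "every month meets the threshold" idea at work on the question's Feb/Mar/Apr table, here is a minimal sketch using SQLite through Python's sqlite3 module. It is not the BigQuery query itself: SQLite has no COUNTIF, so the sketch swaps in HAVING COUNT(*) = 3 AND MIN(n_orders) >= 4, which additionally requires all three months to be present. The table order_lines, the column order_date, and the generated order rows are assumptions made for illustration.

```python
import sqlite3

# Orders per customer for Feb, Mar, Apr 2022, taken from the question's table.
monthly_orders = {
    "0001": (4, 5, 6),
    "0002": (3, 2, 4),
    "0003": (4, 2, 3),
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines ("
    "order_id INTEGER PRIMARY KEY, customer_id TEXT, order_date TEXT)"
)
# One synthetic order row per counted order; order_id is auto-assigned.
rows = [
    (customer_id, f"2022-{month:02d}-{day:02d}")
    for customer_id, counts in monthly_orders.items()
    for month, n in zip((2, 3, 4), counts)
    for day in range(1, n + 1)
]
conn.executemany(
    "INSERT INTO order_lines (customer_id, order_date) VALUES (?, ?)", rows
)

consistent_customers = [
    row[0]
    for row in conn.execute(
        """
        WITH orders_by_month AS (
            SELECT customer_id,
                   strftime('%Y-%m', order_date) AS purchase_month,
                   COUNT(order_id) AS n_orders
            FROM order_lines
            WHERE order_date >= '2022-02-01' AND order_date < '2022-05-01'
            GROUP BY customer_id, purchase_month
        )
        SELECT customer_id
        FROM orders_by_month
        GROUP BY customer_id
        -- all three months present, and the weakest month still has >= 4
        HAVING COUNT(*) = 3 AND MIN(n_orders) >= 4
        ORDER BY customer_id
        """
    )
]
print(consistent_customers)
```

Checking the minimum monthly count is equivalent to checking that every month clears the threshold, and the COUNT(*) = 3 condition guards against customers who ordered in only one or two of the months.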
I have this query that shows the number of flights each month, but the month appears as a number and I want to display it as text instead.
Here is the query:
select to_char(f.departuretime, 'yyyy-mm') month, count(*) numberofflights
from flight f
group by to_char(f.departuretime, 'yyyy-mm')
order by numberofflights desc;
output:
MONTH NUMBEROFFLIGHTS
2022-05 7
2022-11 5
2022-08 3
... ...
I want to display the months like "2022-MAY" or just "MAY" and so on.
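You can use the MONTH format element instead of MM to get the month's name instead of its number. Note that MONTH is blank-padded to nine characters; MON gives the three-letter abbreviation (e.g. MAY), and the fm modifier (as in 'yyyy-fmMONTH') suppresses the padding: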
select to_char(f.departuretime, 'yyyy-MONTH') month, count(*) numberofflights
from flight f
group by to_char(f.departuretime, 'yyyy-MONTH')
order by numberofflights desc;
I am looking to compute a moving sum day by day over a date range, i.e. for each row, sum all values on dates greater than or equal to that row's date. I know that a window function is needed, but I need some help with the actual function.
** I need to compute the sum of values on or after each row's date. Notice that on 2017-08-02 I do not count the value from the day before.
Example data:
2017-08-1, 1
2017-08-2, 5
2017-08-3, 4
2017-08-4, 3
2017-08-5, 2
Desired Result:
2017-08-1, 15
2017-08-2, 14
2017-08-3, 9
2017-08-4, 5
2017-08-5, 2
Here is what I have to produce this data.
SELECT DATE_TRUNC('day', created_at),
COUNT(*)
FROM table
GROUP BY 1
ORDER BY 1 DESC
Just use cumulative sums:
SELECT DATE_TRUNC('day', created_at),
COUNT(*),
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', created_at) DESC) as sum_greater_than
FROM table
GROUP BY 1
ORDER BY 1 DESC;
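The example data from the question can be used to check the descending running sum. This is a sketch, assuming SQLite 3.25+ through Python's sqlite3 module, with date(created_at) standing in for DATE_TRUNC('day', created_at). The daily counts are computed in a CTE first for clarity rather than nesting COUNT(*) inside the window sum, and the table name events is made up.

```python
import sqlite3

# Number of rows per day, matching the question's example data.
daily_counts = {
    "2017-08-01": 1,
    "2017-08-02": 5,
    "2017-08-03": 4,
    "2017-08-04": 3,
    "2017-08-05": 2,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [
        (f"{day} 12:00:{i:02d}",)
        for day, n in daily_counts.items()
        for i in range(n)
    ],
)

reverse_running = conn.execute(
    """
    WITH daily AS (
        SELECT date(created_at) AS day, COUNT(*) AS n
        FROM events
        GROUP BY day
    )
    -- ordering the window by day DESC accumulates from the latest date
    -- backwards, so each row sums itself plus all later days
    SELECT day, n, SUM(n) OVER (ORDER BY day DESC) AS sum_greater_than
    FROM daily
    ORDER BY day DESC
    """
).fetchall()
print(reverse_running)
```

Reversing the window's sort direction is all that distinguishes this from an ordinary running total.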
I have a table similar to this:
| id(INTEGER) | id2(INTEGER) | time(DATETIME) | value(REAL) |
|-------------|--------------|----------------------|-------------|
| 1 | 2000 | 2004-01-01 00:00:00 | 1000 |
which I query with Visual Basic. Now I want to sum all entries between the years 2004 and 2010 so the result looks like this:
2004 11,000
2005 35,000
2006 46,000
If I do it inside Visual Basic, it can be achieved with a few loops, but unfortunately this does not perform well.
Is it possible to create a query which yields the result, say, between two years grouped by year? Or between two months (within one year) grouped by month, and likewise by days (within one month), hours (within one day), or minutes (within one hour)?
EDIT:
Query for year interval:
SELECT STRFTIME('%Y', time) AS year, SUM(value) AS cumsum
FROM mytable
WHERE year >= STRFTIME('%Y', '2005-01-01 00:00:00')
AND year <= STRFTIME('%Y', '2010-01-01 00:00:00')
GROUP BY STRFTIME('%Y', mytable.time)
ORDER BY year;
Now I need an idea for months, days and hours.
This is an aggregation query, but you need to extract the year from the date:
select strftime('%Y', time) as yr, sum(value)
from table
group by strftime('%Y', time)
order by yr;
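The same pattern extends to months, days, hours, and minutes by changing only the strftime format string. Below is a minimal sketch using Python's sqlite3 module; the helper sum_by, the FORMATS mapping, and the sample rows are hypothetical, while the table and column names follow the question.

```python
import sqlite3

# One strftime format per granularity; each keeps the coarser parts so
# that buckets from different years or days never collide.
FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
}


def sum_by(conn, granularity, start, end):
    """Sum value per time bucket for start <= time < end."""
    return conn.execute(
        """
        SELECT strftime(?, time) AS bucket, SUM(value) AS total
        FROM mytable
        WHERE time >= ? AND time < ?
        GROUP BY bucket
        ORDER BY bucket
        """,
        (FORMATS[granularity], start, end),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, id2 INTEGER, time DATETIME, value REAL)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 2000, "2004-01-01 00:00:00", 1000),
        (2, 2000, "2004-06-15 12:30:00", 500),
        (3, 2000, "2005-03-10 08:00:00", 250),
        (4, 2000, "2005-03-10 08:45:00", 750),
    ],
)

by_year = sum_by(conn, "year", "2004-01-01", "2006-01-01")
by_hour = sum_by(conn, "hour", "2005-03-10", "2005-03-11")
by_minute = sum_by(conn, "minute", "2005-03-10", "2005-03-11")
print(by_year, by_hour, by_minute)
```

Filtering on the raw time column with a half-open range, rather than on the formatted string, keeps the range check independent of the chosen granularity.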
Below is the code for both SQL and SQLite. In SQLite we can group by the year alias instead of calling the strftime function again; grouping by the raw time column would produce one row per timestamp rather than one per year.
SQLITE:
select strftime('%Y', time) as year, sum(value) value
from attendence
group by year
SQL:
select EXTRACT(YEAR FROM time) as year, sum(value) value
from attendence
group by EXTRACT(YEAR FROM time)