How do I count the values in a column by each month? - sql

In my data frame, I am trying to count the number of values in a column per month, where the month comes from the time column, using SQL. I know I can use a WHERE clause to count for a single month and repeat that for all 12 months, but I was wondering whether there is a more efficient way.
Here's the inefficient example:
SELECT
count(column1) AS Total
FROM DataFrame
WHERE MONTH(date) == 1
GROUP by column1 ORDER by count(column1) DESC LIMIT 10

I am trying to count distinct values of a column per month, given by the time column in SQL.
You seem to want GROUP BY:
SELECT MONTH(date) as mon, approx_count_distinct(column1, 0.1) AS Total
FROM DataFrame
GROUP by MONTH(date)
ORDER by Total DESC
LIMIT 10;
Note that using MONTH() without YEAR() or a filter on the date is highly suspicious.
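To illustrate the note about MONTH() without YEAR(), here is a minimal sketch that groups by year and month together. It runs the SQL through Python's built-in sqlite3 module so it can be executed as-is. The table name `events` and its rows are invented for illustration, and SQLite's strftime('%Y-%m', ...) stands in for YEAR(date) and MONTH(date), which SQLite does not provide. An exact COUNT(DISTINCT ...) is used here in place of approx_count_distinct.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, column1 TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2020-01-05", "a"),
        ("2020-01-20", "a"),
        ("2020-01-25", "b"),
        ("2020-02-03", "c"),
        ("2021-01-10", "d"),
    ],
)

# strftime('%Y-%m', ...) is SQLite's stand-in for YEAR(date) plus MONTH(date),
# so January 2020 and January 2021 end up in separate groups.
monthly_counts = conn.execute(
    """
    SELECT strftime('%Y-%m', date) AS year_month,
           COUNT(DISTINCT column1) AS total
    FROM events
    GROUP BY year_month
    ORDER BY year_month
    """
).fetchall()

for year_month, total in monthly_counts:
    print(year_month, total)
```

Grouping by MONTH(date) alone would have merged the two Januaries into a single row.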

Related

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 is sometimes repeated, because some clients make several transactions at different times (a client can make a transaction in the first month and then another one the following year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
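The two-level aggregation can be tried out with the following sketch, which runs the query through Python's built-in sqlite3 module. The sample rows are invented for illustration, and strftime('%Y', ...) and strftime('%m', ...) are SQLite's substitutes for year() and month(); your database will likely use different functions.

```python
import sqlite3

# Hypothetical transactions: customer 1 and customer 2 return in later months.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customerid INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "2019-01-15"),
        (1, "2020-01-03"),
        (2, "2019-01-20"),
        (3, "2019-02-02"),
        (2, "2019-02-10"),
    ],
)

# Inner query: first date per customer. Outer query: count firsts per year/month.
first_seen = conn.execute(
    """
    SELECT strftime('%Y', min_date) AS yr,
           strftime('%m', min_date) AS mon,
           COUNT(*) AS num_firsts
    FROM (SELECT customerid, MIN(date) AS min_date
          FROM t
          GROUP BY customerid) c
    GROUP BY yr, mon
    ORDER BY yr, mon
    """
).fetchall()

for yr, mon, num_firsts in first_seen:
    print(yr, mon, num_firsts)
```

Customer 1's transaction in January 2020 is not counted again, because only each customer's earliest date reaches the outer query.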
You can do the following, which assigns a rank to each transaction within its customer_id (rank 1 therefore means it is the first order for that customer_id).
The ranking is done in an inline view, and the inline view is then queried to give you the month and the count of customer ids for that month ONLY where rank = 1.
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate every ID with the year and month in which it appears for the very first time, then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
Your results will contain the new customer count, the year and the month.

Postgres SQL: Sum of ids greater than a day, computed day by day over a series

I am looking to compute a moving sum day by day over a date range, i.e. for each row, sum all values on or after that row's date. I know that a window function is needed, but I need some help with the actual function.
For each row, I need the sum of the values from that date onwards. Notice that for 2017-08-02 I do not count the value from the day before.
Example data:
2017-08-1, 1
2017-08-2, 5
2017-08-3, 4
2017-08-4, 3
2017-08-5, 2
Desired Result:
2017-08-1, 15
2017-08-2, 14
2017-08-3, 9
2017-08-4, 5
2017-08-5, 2
Here is what I have to produce this data.
SELECT DATE_TRUNC('day', created_at),
COUNT(*)
FROM table
GROUP BY 1
ORDER BY 1 DESC
Just use cumulative sums:
SELECT DATE_TRUNC('day', created_at),
COUNT(*),
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', created_at) DESC) as sum_greater_than
FROM table
GROUP BY 1
ORDER BY 1 DESC;
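The cumulative sum can be checked against the example data from the question with the sketch below. It runs through Python's built-in sqlite3 module and assumes a SQLite version with window function support (3.25 or later). To keep it short, the daily counts are loaded pre-aggregated into a hypothetical table `daily` instead of being produced by GROUP BY, and the dates are written in zero-padded ISO form so that they sort correctly as text.

```python
import sqlite3

# The question's example data, stored as already-aggregated daily counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [
        ("2017-08-01", 1),
        ("2017-08-02", 5),
        ("2017-08-03", 4),
        ("2017-08-04", 3),
        ("2017-08-05", 2),
    ],
)

# Ordering the window by day DESC makes the running total accumulate from the
# latest day backwards, so each row sums its own day and all later days.
running = conn.execute(
    """
    SELECT day,
           SUM(cnt) OVER (ORDER BY day DESC) AS sum_greater_than
    FROM daily
    ORDER BY day
    """
).fetchall()

for day, total in running:
    print(day, total)
```

The totals come out as 15, 14, 9, 5 and 2, matching the desired result in the question.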

How to use a sum() function without group by?

I just have to omit those records whose sum of sales in all 53 weeks is 0 and would need the output without group by
You cannot really get that in one query.
To leave out the years whose sales add up to 0, you first have to sum the sales per year.
That is:
Firstly:
select YEAR(date) from YourTable group by YEAR(date) having sum(sales) > 0
Then:
select * from YourTable where YEAR(date) in (<firstquery>)
order by <anydatecolumn>
If you are using MS SQL Server, you can do that in one query using the OVER clause with PARTITION BY.
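A minimal sketch of the OVER / PARTITION BY idea follows. It is run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) and not through SQL Server, although SUM(...) OVER (PARTITION BY ...) is written the same way in both. The table layout with separate `yr` and `week` columns and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical data: every week of 2019 has zero sales, 2020 has some sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (yr INTEGER, week INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (2019, 1, 0),
        (2019, 2, 0),
        (2020, 1, 5),
        (2020, 2, 0),
    ],
)

# The window SUM attaches the yearly total to every row without collapsing
# rows the way GROUP BY would; the outer query then drops zero-total years.
kept_rows = conn.execute(
    """
    SELECT yr, week, sales
    FROM (SELECT yr, week, sales,
                 SUM(sales) OVER (PARTITION BY yr) AS yr_total
          FROM YourTable) s
    WHERE yr_total > 0
    ORDER BY yr, week
    """
).fetchall()

for row in kept_rows:
    print(row)
```

Note that the row for week 2 of 2020 is kept even though its own sales are 0, because the filter looks at the total for the whole partition.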

Account for missing values in group by month

I'm trying to retrieve the average number of records added to the database each month. However for months that no records were added, the row is missing and therefore not being calculated into the average.
Here is the query:
SELECT AVG(a.count) AS AVG
FROM ( SELECT COUNT(*) AS count, MONTH(InsertedTimestamp) AS Month
FROM Certificates
WHERE InsertedTimestamp >= '9/19/2014'
AND InsertedTimestamp <= '7/1/2015'
GROUP BY MONTH(InsertedTimestamp)
) AS a
When I run just the inner query, only results from months 9,10,11 are showing, because there are no records for months 12,1,2,3,4,5,6,7. How can I add these missing rows to the table in order to get the correct monthly average?
Thanks!
This is easy enough to fix: divide the total count by the number of months in the range:
SELECT COUNT(*) / (TIMESTAMPDIFF(month, '2014-09-19', '2015-07-01' ) + 1)
FROM Certificates
WHERE InsertedTimestamp >= '2014-09-19' AND
InsertedTimestamp <= '2015-07-01' ;
You don't even need the subquery.
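The same idea, dividing the total count by the number of months in the range, can be sketched as follows using Python's built-in sqlite3 module. SQLite has no TIMESTAMPDIFF, so this sketch counts the calendar months touched by the range (September 2014 through July 2015); that is an assumption about what "number of months" should mean here and may differ from what TIMESTAMPDIFF returns. The sample rows are invented.

```python
import sqlite3

# Hypothetical rows: 11 certificates, all inserted in Sep, Oct and Nov 2014.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Certificates (InsertedTimestamp TEXT)")
sample_days = [
    "2014-09-20", "2014-09-22", "2014-09-25", "2014-09-30",
    "2014-10-02", "2014-10-15", "2014-10-28",
    "2014-11-03", "2014-11-11", "2014-11-12", "2014-11-30",
]
conn.executemany(
    "INSERT INTO Certificates VALUES (?)", [(d,) for d in sample_days]
)

# Months touched = year difference * 12 + month difference + 1.
# Empty months still count in the divisor, which is the point of the fix.
params = {"first_day": "2014-09-19", "last_day": "2015-07-01"}
months, avg_per_month = conn.execute(
    """
    SELECT (CAST(strftime('%Y', :last_day) AS INTEGER)
              - CAST(strftime('%Y', :first_day) AS INTEGER)) * 12
           + CAST(strftime('%m', :last_day) AS INTEGER)
           - CAST(strftime('%m', :first_day) AS INTEGER) + 1 AS months,
           COUNT(*) * 1.0 /
           ((CAST(strftime('%Y', :last_day) AS INTEGER)
               - CAST(strftime('%Y', :first_day) AS INTEGER)) * 12
            + CAST(strftime('%m', :last_day) AS INTEGER)
            - CAST(strftime('%m', :first_day) AS INTEGER) + 1) AS avg_per_month
    FROM Certificates
    WHERE InsertedTimestamp >= :first_day
      AND InsertedTimestamp <= :last_day
    """,
    params,
).fetchone()

print(months, avg_per_month)
```

Averaging only over the three months that have rows would give 11 / 3 instead, which is the overestimate described in the question.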

Minimum temp and maximum temp display using SQL script

I have a table in which minimum and maximum temperature is stored per order no and date. I want to pick the minimum temperature and maximum temperature for each day. This should be done using SQL script.
You have to use a GROUP BY clause with the aggregate functions MIN and MAX, as below:
select date, min(temperature), max(temperature)
from table
group by date
This works if your date column holds only the year, month and day (for example 01/11/2012).
In Oracle:
SELECT TO_CHAR(DATE_VAL,'DD-MM-YYYY'), MAX(temperature),
MIN(temperature) FROM table_name group by TO_CHAR(DATE_VAL,'DD-MM-YYYY');
In MySQL:
SELECT DATE_FORMAT(DATE_VAL, '%d-%m-%Y'), max(temperature),
min(temperature) from table_name group by DATE_FORMAT(DATE_VAL, '%d-%m-%Y');
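When the date column also carries a time of day, it has to be reduced to the day before grouping, which is what TO_CHAR and DATE_FORMAT do above. The sketch below shows the same idea through Python's built-in sqlite3 module, where date() plays that role. The table name `readings`, its columns and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical readings: two orders per day, each with a time of day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (order_no INTEGER, date_val TEXT, temperature REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2012-11-01 08:00:00", 3.5),
        (2, "2012-11-01 14:30:00", 9.0),
        (3, "2012-11-02 07:15:00", -1.0),
        (4, "2012-11-02 16:45:00", 6.5),
    ],
)

# date(date_val) strips the time part so all rows of one day share a group.
daily_extremes = conn.execute(
    """
    SELECT date(date_val) AS day,
           MIN(temperature) AS min_temp,
           MAX(temperature) AS max_temp
    FROM readings
    GROUP BY day
    ORDER BY day
    """
).fetchall()

for day, min_temp, max_temp in daily_extremes:
    print(day, min_temp, max_temp)
```

Grouping by the raw date_val here would produce one group per reading, since every timestamp is different.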