Count distinct dates within timestamp by Month SQL - sql

I would like to count the number of distinct dates within each month.
I have a data set that looks like this:
TIMESTAMP
------------------
2017-10-25 14:39:51
2017-10-25 15:00:51
2017-11-10 02:39:42
2018-09-24 14:39:55
2018-09-25 13:25:01
2019-02-12 12:23:44
...
So my expected output would be:
year_month | count
2017-10 2
2018-09 2
2019-02 1
I have tried the following code so far, but it is returning incorrect results:
WITH F AS(
SELECT concat(YEAR(TIMESTAMP), MONTH(TIMESTAMP)) AS year_month
FROM tbl
WHERE TYPE = 'Site'
AND TO_SITE = 'location'
)
SELECT count(year_month), year_month
FROM F
GROUP BY year_month
I do not need to worry about the time of day. I just want to count the distinct days in each month. Thank you in advance for your help.

In SQL Server, I would recommend one of these approaches:
select year(timestamp), month(timestamp), count(distinct convert(date, timestamp))
from t
group by year(timestamp), month(timestamp);
Or:
select format(timestamp, 'yyyy-MM'), count(distinct convert(date, timestamp))
from t
group by format(timestamp, 'yyyy-MM');
I see no need for subqueries or CTEs.
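To check the logic without a SQL Server instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It is only an approximation of the queries above: the table name tbl and column name ts are made up for the example, strftime('%Y-%m', ...) stands in for format(timestamp, 'yyyy-MM'), and date(...) stands in for convert(date, timestamp).

```python
import sqlite3

# Hypothetical in-memory table holding the sample timestamps from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ts TEXT)")
conn.executemany(
    "INSERT INTO tbl (ts) VALUES (?)",
    [
        ("2017-10-25 14:39:51",),
        ("2017-10-25 15:00:51",),
        ("2017-11-10 02:39:42",),
        ("2018-09-24 14:39:55",),
        ("2018-09-25 13:25:01",),
        ("2019-02-12 12:23:44",),
    ],
)

# Group by the year-month part and count each calendar day only once.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m', ts)    AS year_month,
           COUNT(DISTINCT date(ts)) AS distinct_days
    FROM tbl
    GROUP BY year_month
    ORDER BY year_month
    """
).fetchall()

for year_month, distinct_days in rows:
    print(year_month, distinct_days)
```

The two timestamps on 2017-10-25 fall on the same day, so October 2017 counts as one distinct day here.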

From your code I assume you are using SQL Server, in which case you can do something like this:
with cte as
(
select
left(convert(varchar, myCol,112),6) as yyyy_mm,
convert(date, myCol) as date
from myTable
)
select
yyyy_mm,
count(distinct date) as count
from cte
group by
yyyy_mm
output:
| yyyy_mm | count |
*-----------------*
| 201710 | 1 |
| 201711 | 1 |
| 201809 | 2 |
| 201902 | 1 |

Using date_trunc() in PostgreSQL it can simply be:
SELECT date_trunc('month', timestamp)
, count(DISTINCT date_trunc('day', timestamp))
FROM tbl
GROUP BY 1;
Various performance optimizations are possible, depending on the details of the setup.

Related

I'm getting unexpected results when I use SUM(CASE WHEN .... THEN "column_name") in sqlite3

I am trying to find the number of orders I got in the month of April. I have 3 orders but my query gets the result 0. What could be the problem?
Here's the table:
id | first | middle | last | product_name | numberOut | Date
1 | Muhammad | Sameer | Khan | Macbook | 1 | 2020-04-01
2 | Chand | Shah | Khurram | Dell Optiplex | 1 | 2020-04-02
3 | Sultan | | Chohan | HP EliteBook | 1 | 2020-03-31
4 | Express | Eva | Plant | Dell Optiplex | 1 | 2020-03-11
5 | Rana | Faryad | Ali | HP EliteBook | 1 | 2020-04-02
And here's the query:
SELECT SUM(CASE WHEN strftime('%m', oDate) = '04' THEN 'id' END) FROM orders;
If you want all Aprils, then you can just look at the month. For April 2020 specifically, I would recommend:
select count(*)
from orders o
where o.date >= '2020-04-01' and o.date < '2020-05-01';
Note that this compares the date column directly to valid date literals in the where clause.
The problem with your code is this:
THEN 'id'
You are using the aggregate function SUM() to sum over the string literal 'id'. Because that string cannot be converted to a number, it is implicitly treated as 0, so the result is 0.
Even if you remove the single quotes you will not get the result that you want because you will get the sum of the ids.
But if you used:
THEN 1 ELSE 0
then you would get the correct result.
But with SQLite you can write it more simply:
SELECT SUM(strftime('%m', oDate) = '04') FROM orders;
without the CASE expression.
Or since you just want to count the orders then COUNT() will do it:
SELECT COUNT(*) FROM orders WHERE strftime('%m', oDate) = '04';
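The variants above can be compared side by side with Python's built-in sqlite3 module. This is just a sketch: the orders table below is a cut-down, assumed version of the one in the question (only id and oDate), loaded with the five sample dates.

```python
import sqlite3

# Assumed minimal schema: only the columns the queries actually touch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, oDate TEXT)")
conn.executemany(
    "INSERT INTO orders (id, oDate) VALUES (?, ?)",
    [
        (1, "2020-04-01"),
        (2, "2020-04-02"),
        (3, "2020-03-31"),
        (4, "2020-03-11"),
        (5, "2020-04-02"),
    ],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Original query: sums the text 'id', which SQLite treats as 0.
string_sum = scalar(
    "SELECT SUM(CASE WHEN strftime('%m', oDate) = '04' THEN 'id' END) FROM orders"
)
# Corrected CASE expression: adds 1 per April row.
case_sum = scalar(
    "SELECT SUM(CASE WHEN strftime('%m', oDate) = '04' THEN 1 ELSE 0 END) FROM orders"
)
# SQLite shortcut: a comparison evaluates to 1 or 0 and can be summed directly.
bool_sum = scalar("SELECT SUM(strftime('%m', oDate) = '04') FROM orders")
# Plain filter and count.
plain_count = scalar("SELECT COUNT(*) FROM orders WHERE strftime('%m', oDate) = '04'")

print(string_sum, case_sum, bool_sum, plain_count)
```

The first value comes back as zero, while the other three agree on the number of April orders.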
Edit.
If you want to count the orders for all the months then group by month:
SELECT strftime('%Y-%m', oDate) AS month,
COUNT(*) AS number_of_orders
FROM orders
GROUP BY month;
SELECT SUM(CASE WHEN strftime('%m', oDate) = '04' THEN 1 ELSE 0 END) FROM orders;
if you need to use SUM
There is a problem with your query. You do not need to do that aggregation operation.
SELECT COUNT(*) from table_name WHERE strftime('%m', Date) = '04';
I would use explicit date comparisons rather than date functions. This makes the query SARGable, i.e. it may benefit from an existing index.
The most efficient approach, with a filter in the where clause:
select count(*) cnt
from orders
where oDate >= '2020-04-01' and oDate < '2020-05-01'
Alternatively, if you want a result of 0 even when there are no orders in April you can do conditional aggregation, as you originally intended:
select sum(case when oDate >= '2020-04-01' and oDate < '2020-05-01' then 1 else 0 end) cnt
from orders
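One way to see the difference for yourself is to ask SQLite for its query plan. The sketch below assumes a cut-down orders table and a hypothetical index named idx_orders_odate on oDate; the exact wording of EXPLAIN QUERY PLAN output varies between SQLite versions, so treat the printed text as indicative only. Typically the range filter is reported as a SEARCH on the index, while the strftime() filter is reported as a SCAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, oDate TEXT)")
# Hypothetical index; the question does not say whether one exists.
conn.execute("CREATE INDEX idx_orders_odate ON orders (oDate)")
conn.executemany(
    "INSERT INTO orders (id, oDate) VALUES (?, ?)",
    [
        (1, "2020-04-01"),
        (2, "2020-04-02"),
        (3, "2020-03-31"),
        (4, "2020-03-11"),
        (5, "2020-04-02"),
    ],
)

range_sql = (
    "SELECT COUNT(*) FROM orders "
    "WHERE oDate >= '2020-04-01' AND oDate < '2020-05-01'"
)
func_sql = "SELECT COUNT(*) FROM orders WHERE strftime('%m', oDate) = '04'"


def plan(sql):
    """Return the human-readable detail column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


range_plan = plan(range_sql)
func_plan = plan(func_sql)
range_count = conn.execute(range_sql).fetchone()[0]
func_count = conn.execute(func_sql).fetchone()[0]

print("range filter:   ", range_plan, range_count)
print("function filter:", func_plan, func_count)
```

Both queries return the same count on this data; only the access path differs.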

Get Max And Min dates for consecutive values in T-SQL

I have a log table like below and want to simplify it by getting the min start date and max end date for consecutive Status values for each Id. I tried many window function combinations but had no luck.
This is what I have:
This is what I want to see:
This is a typical gaps-and-islands problem. You want to aggregate groups of consecutive records that have the same Id and Status.
No need for recursion, here is one way to solve it using window functions:
select
Id,
Status,
min(StartDate) StartDate,
max(EndDate) EndDate
from (
select
t.*,
row_number() over(partition by id order by StartDate) rn1,
row_number() over(partition by id, status order by StartDate) rn2
from mytable t
) t
group by
Id,
Status,
rn1 - rn2
order by Id, min(StartDate)
The query works by ranking records over two different partitions (by Id, and by Id and Status). The difference between the ranks gives you the group each record belongs to. You can run the subquery independently to see what it returns and understand the logic.
Demo on DB Fiddle:
Id | Status | StartDate | EndDate
-: | :----- | :------------------ | :------------------
1 | B | 07/02/2019 00:00:00 | 18/02/2019 00:00:00
1 | C | 18/02/2019 00:00:00 | 10/03/2019 00:00:00
1 | B | 10/03/2019 00:00:00 | 01/04/2019 00:00:00
2 | A | 05/02/2019 00:00:00 | 22/04/2019 00:00:00
2 | D | 22/04/2019 00:00:00 | 05/05/2019 00:00:00
2 | A | 05/05/2019 00:00:00 | 30/06/2019 00:00:00
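To make the rn1 - rn2 trick concrete, here is a small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later). The input rows are hypothetical, since the original table was posted as an image; they are only chosen so that two consecutive B rows should collapse into one island. The output aliases first_start and last_end are also just names picked for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (Id INTEGER, Status TEXT, StartDate TEXT, EndDate TEXT)"
)
# Hypothetical log rows: B, B, C, B for a single Id.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "B", "2019-02-07", "2019-02-10"),
        (1, "B", "2019-02-10", "2019-02-18"),
        (1, "C", "2019-02-18", "2019-03-10"),
        (1, "B", "2019-03-10", "2019-04-01"),
    ],
)

# rn1 numbers rows per Id, rn2 numbers rows per (Id, Status).
# Within a run of consecutive rows with the same Status, rn1 - rn2 stays constant.
islands = conn.execute(
    """
    SELECT Id, Status, MIN(StartDate) AS first_start, MAX(EndDate) AS last_end
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY StartDate) AS rn1,
               ROW_NUMBER() OVER (PARTITION BY Id, Status ORDER BY StartDate) AS rn2
        FROM mytable t
    ) AS ranked
    GROUP BY Id, Status, rn1 - rn2
    ORDER BY Id, first_start
    """
).fetchall()

for row in islands:
    print(row)
```

For these rows the differences are 0, 0, 2 and 1, so the first two B rows share a group while the later B row gets its own.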
Try the following query. First order the data by StartDate and generate a sequence (rid). Then use the recursive CTE to get the first row (rid=1) for each group (id, status), and recursively get the next row and compare the start and end dates.
;WITH cte_r(id,[Status],StartDate,EndDate,rid)
AS
(
SELECT id,[Status],StartDate,EndDate, ROW_NUMBER() OVER(PARTITION BY Id,[Status] ORDER BY StartDate) AS rid
FROM log_table
),
cte_range(id,[Status],StartDate,EndDate,rid)
AS
(
SELECT id,[Status],StartDate,EndDate,rid
FROM cte_r
WHERE rid=1
UNION ALL
SELECT p.id, p.[Status], CASE WHEN c.StartDate<p.EndDate THEN p.StartDate ELSE c.StartDate END AS StartDate, c.EndDate,c.rid
FROM cte_range p
INNER JOIN cte_r c
ON p.id=c.id
AND p.[Status]=c.[Status]
AND p.rid+1=c.rid
)
SELECT id,[Status],StartDate,MAX(EndDate) AS EndDate FROM cte_range GROUP BY id,[Status],StartDate ;

Flat Big Query Rows

Please help me to resolve this task. I have Google Big Query table like this:
| name | startDate | endDate |
| Bob | 2018-01-01 | 2018-01-01 |
| Nick | 2017-12-29 | 2017-12-31 |
and as a result I need to get something like this:
| name | date |
| Bob | 2018-01-01 |
| Nick | 2017-12-29 |
| Nick | 2017-12-30 |
| Nick | 2017-12-31 |
Is it possible? Thank you in advance.
WITH CTE as (
SELECT 'bob' name, date('2018-01-01') startDate, date('2018-01-01') endDate
UNION ALL SELECT 'Nick', date '2017-12-29' startDate, date('2017-12-31') endDate
),
CTE2 AS (
SELECT name, GENERATE_DATE_ARRAY(startDate, endDate, INTERVAL 1 DAY) AS date
FROM CTE
)
SELECT name, date
FROM CTE2,
UNNEST(date) as date
Or just simply
#standardSQL
SELECT name, date
FROM `project.dataset.table`,
UNNEST(GENERATE_DATE_ARRAY(startDate, endDate)) date
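If it helps to see what the UNNEST(GENERATE_DATE_ARRAY(...)) combination does conceptually, here is a rough Python equivalent. It is not BigQuery code: generate_date_array below is a hypothetical helper written for this illustration that builds an inclusive list of days, and the list comprehension plays the role of UNNEST by emitting one output row per generated day.

```python
from datetime import date, timedelta


def generate_date_array(start, end):
    """Return every date from start to end inclusive, one day apart.

    Yields an empty list when end is before start.
    """
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


# The two sample rows from the question.
rows = [
    ("Bob", date(2018, 1, 1), date(2018, 1, 1)),
    ("Nick", date(2017, 12, 29), date(2017, 12, 31)),
]

# One output row per (name, day), like joining each row to its unnested array.
flattened = [
    (name, day)
    for name, start, end in rows
    for day in generate_date_array(start, end)
]

for name, day in flattened:
    print(name, day.isoformat())
```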
You may make use of a calendar table here:
WITH dates AS (
SELECT DATE '2017-12-29' AS date_val UNION ALL
SELECT DATE '2017-12-30' UNION ALL
SELECT DATE '2017-12-31' UNION ALL
SELECT DATE '2018-01-01'
-- and maybe other dates
)
SELECT
t2.name,
t1.date_val
FROM dates t1
INNER JOIN yourTable t2
ON t1.date_val BETWEEN t2.startDate AND t2.endDate
ORDER BY
t2.name,
t1.date_val;
If your version of BigQuery does not support CTE, you may just inline the CTE as a subquery. That is, replace dates with the body of the CTE itself.
In practice, you might want to generate a date series, or possibly maintain a dedicated calendar table in your database. The above just shows what the query itself might look like.

Users that returned

I have a table with [EVENT_DATE] and [USER_ID], I would like to know how to count the users who came back in the same month.
Table_01
-------------------------------------------
EVENT_DATE | USER_ID
-------------------------------------------
2017-03-28 00:00:25.000 | 0006235012201
2017-03-04 23:00:00.000 | 0006235012201
2017-03-19 00:25:15.000 | 0006235012201
2017-02-03 10:00:02.000 | 0006235012202
2017-01-18 00:15:00.000 | 0006235012202
2017-03-28 11:00:15.000 | 0006235012202
2017-03-23 15:20:02.000 | 0006235012203
2017-02-18 12:00:06.000 | 0006235012203
2017-03-21 16:05:09.000 | 0006235012203
The answer is 2, because users 0006235012201 and 0006235012203 both came back within the same month.
EDIT: Sorry
I am looking to get the count by month.
-----------------------
Month | Users Returned
-----------------------
01/17 | 70
02/17 | 60
03/17 | 10
This is what I have, but it isn't correct as it seems to be listing users.
SELECT A.[USER_ID], A.[EVENT_DATE], COUNT(*)
FROM(
SELECT [USER_ID], [EVENT_DATE], COUNT(*)
FROM Table_01
GROUP BY [USER_ID], [EVENT_DATE]
HAVING COUNT(*) > 1
) A
GROUP BY A.[USER_ID], A.[EVENT_DATE]
Microsoft SQL Server 2016. Compatibility level: SQL Server 2005 (90)
select user_id
from your_table
group by user_id
having count(distinct year(event_date) * 100 + month(event_date)) > 1
Try the below script
SELECT DISTINCT USER_ID
FROM Table_01
GROUP BY USER_ID, (YEAR(EVENT_DATE)*100)+MONTH(EVENT_DATE)
HAVING COUNT(*) > 1
Use below query to get users returned in same month
SELECT USER_ID
FROM #Table
GROUP BY USER_ID,(YEAR(EVENT_DATE)*100)+MONTH(EVENT_DATE)
HAVING COUNT(*) > 1
Using this you can get month wise result
select month(EVENT_DATE),year(EVENT_DATE), USER_ID from tableName
where year(EVENT_DATE)=2017
Group by USER_ID,year(EVENT_DATE),month(EVENT_DATE)
HAVING COUNT(*) > 1
If you want a specific month, you can add a where condition like month(EVENT_DATE) = 3.
You need two levels of aggregation. First to get the users who have more than one row in a month. Then grouping by year,month to get the counts of users for a specific month.
select cast(mth as varchar(2))+'/'+cast(yr as varchar(4)) as mth,count(*)
from (select user_id,month(event_date) as mth,year(event_date) as yr
from tablename
group by user_id,month(event_date),year(event_date)
having count(*) > 1
) t
group by cast(mth as varchar(2))+'/'+cast(yr as varchar(4))
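Here is a quick way to sanity-check the two-level aggregation against the sample data, sketched with Python's sqlite3 module rather than SQL Server. The month is built with strftime('%Y-%m', ...) instead of month() and year(), and the aliases per_user and users_returned are just names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_01 (EVENT_DATE TEXT, USER_ID TEXT)")
conn.executemany(
    "INSERT INTO Table_01 (EVENT_DATE, USER_ID) VALUES (?, ?)",
    [
        ("2017-03-28 00:00:25.000", "0006235012201"),
        ("2017-03-04 23:00:00.000", "0006235012201"),
        ("2017-03-19 00:25:15.000", "0006235012201"),
        ("2017-02-03 10:00:02.000", "0006235012202"),
        ("2017-01-18 00:15:00.000", "0006235012202"),
        ("2017-03-28 11:00:15.000", "0006235012202"),
        ("2017-03-23 15:20:02.000", "0006235012203"),
        ("2017-02-18 12:00:06.000", "0006235012203"),
        ("2017-03-21 16:05:09.000", "0006235012203"),
    ],
)

# Inner level: keep (user, month) pairs with more than one event.
# Outer level: count how many such users each month has.
returned = conn.execute(
    """
    SELECT mth, COUNT(*) AS users_returned
    FROM (
        SELECT USER_ID, strftime('%Y-%m', EVENT_DATE) AS mth
        FROM Table_01
        GROUP BY USER_ID, mth
        HAVING COUNT(*) > 1
    ) AS per_user
    GROUP BY mth
    ORDER BY mth
    """
).fetchall()

print(returned)
```

Months in which no user has more than one event do not appear in the result at all; a calendar table would be needed to show them with a count of 0.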

Select top 2 rows different from each other

I have this table
| date | sum |
|--------------|-------|
| 2015-02-19 | 10000 |
| 2015-02-19 | 10000 |
| 2015-02-20 | 15000 |
| 2015-02-20 | 15000 |
| 2015-02-21 | 18000 |
| 2015-02-21 | 18000 |
I want to select top 2 rows from the table, but only different ones, meaning my result should return 2015-02-20 and 2015-02-21.
SELECT TOP 2 distinct date
FROM stock
Using this gives me an error:
Incorrect syntax near the keyword 'distinct'.
Help would be highly appreciated.
You can try like this
select top 2 * from
(
select distinct date FROM stock
) as d
Try something like:
SELECT TOP 2 date
FROM stock
GROUP BY date
I think Distinct and Top should switch places in your query:
SELECT DISTINCT TOP 2 date FROM stock ORDER BY date DESC
try
select distinct top 2 date from stock
You can use GROUP BY:
SELECT TOP 2 date
FROM stock
GROUP BY date
ORDER BY date DESC
Sample result:
DATE
2015-02-21
2015-02-20
See result in SQL Fiddle.
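The same GROUP BY plus ORDER BY idea can be tried locally with Python's sqlite3 module. Note the assumption: SQLite has no TOP keyword, so LIMIT 2 is used in its place, and the column names are quoted because date and sum double as function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE stock ("date" TEXT, "sum" INTEGER)')
conn.executemany(
    'INSERT INTO stock ("date", "sum") VALUES (?, ?)',
    [
        ("2015-02-19", 10000),
        ("2015-02-19", 10000),
        ("2015-02-20", 15000),
        ("2015-02-20", 15000),
        ("2015-02-21", 18000),
        ("2015-02-21", 18000),
    ],
)

# GROUP BY collapses duplicate dates; ORDER BY ... DESC makes "top 2" mean the latest two.
top_two = [
    row[0]
    for row in conn.execute(
        'SELECT "date" FROM stock GROUP BY "date" ORDER BY "date" DESC LIMIT 2'
    )
]

print(top_two)
```

Without the ORDER BY, which two dates come back is not guaranteed.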
Try this :
WITH cte AS
( SELECT DISTINCT date ,
DENSE_RANK() OVER (ORDER BY date DESC)
AS rn
FROM stock
)
SELECT date
FROM cte
WHERE rn <= 2
ORDER BY rn ;
Try this:
SELECT TOP 2 date FROM stock group by date