MySQL query: count the dates

In the table below:
+----+---------------------+
| id | timestamp           |
+----+---------------------+
| 1  | 2010-06-10 14:35:30 |
| 2  | 2010-06-10 15:27:35 |
| 3  | 2010-06-10 16:39:36 |
| 4  | 2010-06-11 14:55:30 |
| 5  | 2010-06-11 18:45:31 |
| 6  | 2010-06-12 20:25:31 |
+----+---------------------+
I want to count the dates (time of day ignored), so the output should look like this:
+----+------------+-------+
| id | type       | count |
+----+------------+-------+
| 1  | 2010-06-10 | 3     |
| 2  | 2010-06-11 | 2     |
| 3  | 2010-06-12 | 1     |
+----+------------+-------+
What would be the query for this?

This works if you can live without the id column in the result:
SELECT DATE(timestamp) AS type, COUNT(*) AS `count`
FROM sometable
GROUP BY DATE(timestamp)
ORDER BY DATE(timestamp)
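If you do need the sequential id column from the desired output, here is a sketch using ROW_NUMBER() (an assumption: this requires MySQL 8.0+; the function does not exist in older versions):

SELECT ROW_NUMBER() OVER (ORDER BY type) AS id, type, `count`
FROM (
    SELECT DATE(timestamp) AS type, COUNT(*) AS `count`
    FROM sometable
    GROUP BY DATE(timestamp)
) d
ORDER BY type;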

SELECT DATE(timestamp), COUNT(*)
FROM My_Table
GROUP BY DATE(timestamp)
This doesn't give you a row number for each row; I don't know whether (or why) that's important.

select date(timestamp) as type, count(*)
from your_table
group by type;

This might work:
select DATE(timestamp), count(timestamp)
from _table
group by DATE(timestamp)
order by count(timestamp) desc

SELECT COUNT(DATE_FORMAT(timestamp, '%Y-%m-%d')), DATE_FORMAT(timestamp, '%Y-%m-%d')
FROM tablename
GROUP BY DATE_FORMAT(timestamp, '%Y-%m-%d');
You cannot include the id in the select, or the count will be off.

Related

Find the first order of a supplier in a day using SQL

I am trying to write a query that returns the supplier ID (sup_id), the order date, and the order ID of each supplier's first order on each day (based on the earliest order_ts).
+---------+--------+-------+-------+----------------+
| orderid | sup_id | items | sales | order_ts       |
+---------+--------+-------+-------+----------------+
| 1111132 | 3      | 1     | 27,0  | 24/04/17 13:00 |
| 1111137 | 3      | 2     | 69,0  | 02/02/17 16:30 |
| 1111147 | 1      | 1     | 87,0  | 25/04/17 08:25 |
| 1111153 | 1      | 3     | 82,0  | 05/11/17 10:30 |
| 1111155 | 2      | 1     | 29,0  | 03/07/17 02:30 |
| 1111160 | 2      | 2     | 44,0  | 30/01/17 20:45 |
| ....... | ...    | ...   | ...   | ...            |
+---------+--------+-------+-------+----------------+
Output I am looking for:
+--------+------------+----------+
| sup_id | date       | order_id |
+--------+------------+----------+
| ...    | ...        | ...      |
+--------+------------+----------+
I tried using a subquery in the join clause, as below, but didn't know how to join it back without having selected order_id.
SELECT sup_id, date(order_ts), order_id
FROM sales s
JOIN (
    SELECT sup_id, date(order_ts) as date, min(time(order_ts))
    FROM sales
    GROUP BY sup_id, date
) m
    ON ...
Kindly assist.
You can use not exists:
select *
from sales
where not exists (
    -- find an order for the same supplier, earlier on the same day
    select *
    from sales as older
    where older.sup_id = sales.sup_id
      and older.order_ts < sales.order_ts
      and older.order_ts >= cast(sales.order_ts as date)
)
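If window functions are available (MySQL 8+, Postgres, SQL Server), an equivalent sketch ranks orders within each supplier-day; column names are taken from the question:

select sup_id, cast(order_ts as date) as order_date, orderid as order_id
from (
    select s.*,
           -- rank each supplier's orders within each calendar day, earliest first
           row_number() over (partition by sup_id, cast(order_ts as date)
                              order by order_ts) as rn
    from sales s
) t
where rn = 1;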
The query below might not be the fastest in the world, but it should give you all information you need.
select order_id, sup_id, items, sales, order_ts
from sales s
where order_ts = (
    select min(order_ts)
    from sales m
    where m.sup_id = s.sup_id
      and cast(m.order_ts as date) = cast(s.order_ts as date)
)
select sup_id, min(order_ts), min(order_id)
from sales
where date(order_ts) = '2022-03-15'
group by sup_id
This assumes order_id is an identity / auto-increment column, so min(order_id) lines up with the earliest order.

Get max dates for each customer

Let's say I have a customer table like so:
id | start_date | created_at
-----------------------------
 1 | 2020-1-15  | 2020-1-15
 1 | 2020-1-16  | 2020-1-15
 1 | 2020-1-16  | 2020-1-16
 2 | 2020-1-15  | 2020-1-15
 2 | 2020-1-16  | 2020-1-15
I want to get one row per customer id with the max(start_date) and, if the start_date ties, the max(created_at).
Result should look like this:
id | start_date | created_at
-----------------------------
 1 | 2020-1-16  | 2020-1-16
 2 | 2020-1-16  | 2020-1-15
I'm having a hard time with window functions: I thought a partition by id would work, but I have two date columns. Maybe I should use a group by?
Please try this one; you can order by two columns:
SELECT *
FROM (
    SELECT id, start_date, created_at,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date DESC, created_at DESC) AS r
    FROM #date
) a
WHERE a.r = 1
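If window functions are not an option, a hedged alternative with NOT EXISTS keeps a row only when no other row for the same customer beats it on (start_date, created_at); the table name customer is an assumption:

SELECT *
FROM customer t
WHERE NOT EXISTS (
    SELECT 1
    FROM customer x
    WHERE x.id = t.id
      -- a "better" row: later start_date, or same start_date with later created_at
      AND (x.start_date > t.start_date
           OR (x.start_date = t.start_date AND x.created_at > t.created_at))
);

Note this returns duplicates if a customer has two identical (start_date, created_at) pairs, which the sample data does not.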

Get users who took ride for 3 or more consecutive dates

I have the table below; it shows user_id and ride_date.
+---------+------------+
| user_id | ride_date  |
+---------+------------+
| 1       | 2019-11-01 |
| 1       | 2019-11-03 |
| 1       | 2019-11-05 |
| 2       | 2019-11-03 |
| 2       | 2019-11-04 |
| 2       | 2019-11-05 |
| 2       | 2019-11-06 |
| 3       | 2019-11-03 |
| 3       | 2019-11-04 |
| 3       | 2019-11-05 |
| 3       | 2019-11-06 |
| 4       | 2019-11-05 |
| 4       | 2019-11-07 |
| 4       | 2019-11-08 |
| 4       | 2019-11-09 |
| 5       | 2019-11-11 |
| 5       | 2019-11-13 |
+---------+------------+
I want the user_ids who took rides on 3 or more consecutive days, along with the days on which they took those consecutive rides.
The desired result is as below
+---------+-----------------------+
| user_id | consecutive_ride_date |
+---------+-----------------------+
| 2       | 2019-11-03            |
| 2       | 2019-11-04            |
| 2       | 2019-11-05            |
| 2       | 2019-11-06            |
| 3       | 2019-11-03            |
| 3       | 2019-11-04            |
| 3       | 2019-11-05            |
| 3       | 2019-11-06            |
| 4       | 2019-11-07            |
| 4       | 2019-11-08            |
| 4       | 2019-11-09            |
+---------+-----------------------+
SQL Fiddle
With LAG() and LEAD() window functions:
with cte as (
    select *,
        datediff(
            day,
            lag([ride_date]) over (partition by [user_id] order by [ride_date]),
            [ride_date]
        ) prev1,
        datediff(
            day,
            lag([ride_date], 2) over (partition by [user_id] order by [ride_date]),
            [ride_date]
        ) prev2,
        datediff(
            day,
            [ride_date],
            lead([ride_date]) over (partition by [user_id] order by [ride_date])
        ) next1,
        datediff(
            day,
            [ride_date],
            lead([ride_date], 2) over (partition by [user_id] order by [ride_date])
        ) next2
    from Table1
)
select [user_id], [ride_date]
from cte
where
    (prev1 = 1 and prev2 = 2) or
    (prev1 = 1 and next1 = 1) or
    (next1 = 1 and next2 = 2)
See the demo.
Results:
user_id | ride_date
--------+-----------
      2 | 03/11/2019
      2 | 04/11/2019
      2 | 05/11/2019
      2 | 06/11/2019
      3 | 03/11/2019
      3 | 04/11/2019
      3 | 05/11/2019
      3 | 06/11/2019
      4 | 07/11/2019
      4 | 08/11/2019
      4 | 09/11/2019
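To see how the three where branches cover a run, take user 2's row for 2019-11-04, worked by hand from the sample data:

-- prev1 = datediff(day, '2019-11-03', '2019-11-04') = 1
-- next1 = datediff(day, '2019-11-04', '2019-11-05') = 1
-- so (prev1 = 1 and next1 = 1) keeps this middle-of-run row;
-- the first row of a run is kept by (next1 = 1 and next2 = 2),
-- and the last row by (prev1 = 1 and prev2 = 2).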
Here is one way to address this gaps-and-islands problem:
first, assign a rank to each user ride with row_number(), and recover the previous ride_date (aliased lag_ride_date);
then, compare the previous ride date to the current one in a conditional sum that increases whenever the dates are consecutive; comparing this sum with the ride rank yields groups (aliased grp) that represent runs of rides one day apart;
next, do a window count of how many records belong to each group (aliased cnt);
finally, filter on records whose window count is 3 or more.
Query:
select user_id, ride_date
from (
    select t.*,
        count(*) over(partition by user_id, grp) cnt
    from (
        select t.*,
            rn1 - sum(case when ride_date = dateadd(day, 1, lag_ride_date) then 1 else 0 end)
                  over(partition by user_id order by ride_date) grp
        from (
            select t.*,
                row_number() over(partition by user_id order by ride_date) rn1,
                lag(ride_date) over(partition by user_id order by ride_date) lag_ride_date
            from Table1 t
        ) t
    ) t
) t
where cnt >= 3
Demo on DB Fiddle
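To trace the grp computation, here are the intermediate values for user 4 (rides on 2019-11-05, 07, 08, 09), worked by hand from the sample data:

-- ride_date    rn1   flag(prev day consecutive)   running sum   grp = rn1 - sum
-- 2019-11-05   1     0 (no previous ride)         0             1
-- 2019-11-07   2     0 (gap after 11-05)          0             2
-- 2019-11-08   3     1                            1             2
-- 2019-11-09   4     1                            2             2

The rows for the 7th, 8th and 9th share grp = 2, so their window count cnt is 3 and they pass the filter.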
This is a typical gaps-and-islands problem.
We can solve it as follows:
with data as (
    select user_id,
           ride_date,
           dateadd(day,
                   -row_number() over(partition by user_id order by ride_date asc),
                   ride_date) as grp_field
    from Table1
),
consecutive_days as (
    select user_id,
           ride_date,
           count(*) over(partition by user_id, grp_field) as cnt
    from data
)
select *
from consecutive_days
where cnt >= 3
order by user_id, ride_date
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=7bb851d9a12966b54afb4d8b144f3d46
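To see why subtracting the row number groups consecutive days, here is grp_field for user 2, worked by hand:

-- ride_date    row_number   grp_field = ride_date - row_number days
-- 2019-11-03   1            2019-11-02
-- 2019-11-04   2            2019-11-02
-- 2019-11-05   3            2019-11-02
-- 2019-11-06   4            2019-11-02

All four rows collapse onto the same grp_field, so cnt = 4 and the whole run survives the cnt >= 3 filter.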
There is no need to apply gaps-and-islands methodologies to this problem. The problem is much simpler to solve.
You can return the users and first date just by using LEAD():
SELECT t1.*
FROM (SELECT t1.*,
             LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
      FROM table1 t1
     ) t1
WHERE ride_date_2 = DATEADD(day, 2, ride_date);
If you want the actual dates, you can unpivot the results:
SELECT DISTINCT t1.user_id, v.ride_date
FROM (SELECT t1.*,
             LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
      FROM table1 t1
     ) t1 CROSS APPLY
     (VALUES (t1.ride_date),
             (DATEADD(day, 1, t1.ride_date)),
             (DATEADD(day, 2, t1.ride_date))
     ) v(ride_date)
WHERE t1.ride_date_2 = DATEADD(day, 2, t1.ride_date)
ORDER BY t1.user_id, v.ride_date;
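The DISTINCT matters because runs longer than three days produce overlapping windows. Hand-worked for user 2 (2019-11-03 through 06):

-- start 2019-11-03 qualifies (its lead-2 is 11-05) and unpivots to 03, 04, 05
-- start 2019-11-04 qualifies (its lead-2 is 11-06) and unpivots to 04, 05, 06
-- 04 and 05 appear twice; DISTINCT collapses the duplicates.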

Rolling 90 days active users in BigQuery, improving performance (DAU/MAU/WAU)

I'm trying to get the number of unique events on a specific date, rolling 90/30/7 days back. I've got this working on a limited number of rows with the query below, but for large data sets I get memory errors because the aggregated string becomes massive.
I'm looking for a more effective way of achieving the same result.
Table looks something like this:
+----+------------+--------+
|    | date       | userid |
+----+------------+--------+
| 1  | 2013-05-14 | xxxxx  |
| 2  | 2017-03-14 | xxxxx  |
| 3  | 2018-01-24 | xxxxx  |
| 4  | 2013-03-21 | xxxxx  |
| 5  | 2014-03-19 | xxxxx  |
| 6  | 2015-09-03 | xxxxx  |
| 7  | 2014-02-06 | xxxxx  |
| 8  | 2014-10-30 | xxxxx  |
| .. | ...        | ...    |
+----+------------+--------+
Format of the desired result:
+----+------------+---------------------+----------------------+
|    | date       | active_users_7_days | active_users_90_days |
+----+------------+---------------------+----------------------+
| 1  | 2013-05-14 | 1240                | 34339                |
| 2  | 2017-03-14 | 4334                | 54343                |
| 3  | 2018-01-24 | .....               | .....                |
| 4  | 2013-03-21 | .....               | .....                |
| 5  | 2014-03-19 | .....               | .....                |
| 6  | 2015-09-03 | .....               | .....                |
| 7  | 2014-02-06 | .....               | .....                |
| 8  | 2014-10-30 | .....               | .....                |
| .. | ...        | .....               | .....                |
+----+------------+---------------------+----------------------+
My query looks like this:
#standardSQL
WITH T1 AS (
    SELECT date,
           STRING_AGG(DISTINCT userid) AS IDs
    FROM `consumer.events`
    GROUP BY date
),
T2 AS (
    SELECT date,
           STRING_AGG(IDs) OVER(ORDER BY UNIX_DATE(date)
                                RANGE BETWEEN 90 PRECEDING AND CURRENT ROW) AS IDs
    FROM T1
)
SELECT date,
       (SELECT COUNT(DISTINCT userid)
        FROM UNNEST(SPLIT(IDs)) AS userid) AS NinetyDays
FROM T2
Counting unique users requires a lot of resources, even more if you want results over a rolling window. For a scalable solution, look into approximate algorithms like HLL++:
https://medium.freecodecamp.org/counting-uniques-faster-in-bigquery-with-hyperloglog-5d3764493a5a
For an exact count, this would work (but gets slower as the window gets larger):
#standardSQL
SELECT DATE_SUB(date, INTERVAL i DAY) date_grp
, COUNT(DISTINCT owner_user_id) unique_90_day_users
, COUNT(DISTINCT IF(i<31,owner_user_id,null)) unique_30_day_users
, COUNT(DISTINCT IF(i<8,owner_user_id,null)) unique_7_day_users
FROM (
SELECT DATE(creation_date) date, owner_user_id
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE EXTRACT(YEAR FROM creation_date)=2017
GROUP BY 1, 2
), UNNEST(GENERATE_ARRAY(1, 90)) i
GROUP BY 1
ORDER BY date_grp
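The trick is the cross join with UNNEST(GENERATE_ARRAY(1, 90)): it fans every (date, user) row out to the 90 window labels the row belongs to, so the rolling distinct count becomes an ordinary GROUP BY, at the cost of a 90x larger intermediate result. Worked for one row:

-- (date = 2017-06-15, owner_user_id = u1) expands to
--   i = 1  ->  date_grp = 2017-06-14
--   i = 2  ->  date_grp = 2017-06-13
--   ...
--   i = 90 ->  date_grp = 2017-03-17
-- and the user is counted once in each of those 90 groups.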
The approximate solution produces results much faster (14 s vs. 366 s), at the cost of the counts being approximate:
#standardSQL
SELECT DATE_SUB(date, INTERVAL i DAY) date_grp
, HLL_COUNT.MERGE(sketch) unique_90_day_users
, HLL_COUNT.MERGE(DISTINCT IF(i<31,sketch,null)) unique_30_day_users
, HLL_COUNT.MERGE(DISTINCT IF(i<8,sketch,null)) unique_7_day_users
FROM (
SELECT DATE(creation_date) date, HLL_COUNT.INIT(owner_user_id) sketch
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE EXTRACT(YEAR FROM creation_date)=2017
GROUP BY 1
), UNNEST(GENERATE_ARRAY(1, 90)) i
GROUP BY 1
ORDER BY date_grp
Updated query that gives correct results, removing rows whose window covers fewer than 90 days (works when no dates are missing):
#standardSQL
SELECT DATE_SUB(date, INTERVAL i DAY) date_grp
, HLL_COUNT.MERGE(sketch) unique_90_day_users
, HLL_COUNT.MERGE(DISTINCT IF(i<31,sketch,null)) unique_30_day_users
, HLL_COUNT.MERGE(DISTINCT IF(i<8,sketch,null)) unique_7_day_users
, COUNT(*) window_days
FROM (
SELECT DATE(creation_date) date, HLL_COUNT.INIT(owner_user_id) sketch
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE EXTRACT(YEAR FROM creation_date)=2017
GROUP BY 1
), UNNEST(GENERATE_ARRAY(1, 90)) i
GROUP BY 1
HAVING window_days=90
ORDER BY date_grp
You can aggregate per user and then sum. What is the aggregation? Take each user's most recent date:
select count(*) as num_users,
       sum(case when date > date_sub(current_date, interval 30 day) then 1 else 0 end) as num_users_30days,
       sum(case when date > date_sub(current_date, interval 60 day) then 1 else 0 end) as num_users_60days,
       sum(case when date > date_sub(current_date, interval 90 day) then 1 else 0 end) as num_users_90days
from (select user_id, max(date) as date
      from `consumer.events` e
      group by user_id
     ) e;
If the most recent date for the user is in the period, then the user should be counted.
You can get this "as-of" a particular date by using a where clause in the subquery.
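A sketch of that as-of variant, with the cutoff date '2018-01-24' as a placeholder to replace with the date of interest:

select count(*) as num_users,
       sum(case when date > date_sub(date '2018-01-24', interval 30 day) then 1 else 0 end) as num_users_30days,
       sum(case when date > date_sub(date '2018-01-24', interval 90 day) then 1 else 0 end) as num_users_90days
from (select user_id, max(date) as date
      from `consumer.events`
      where date <= date '2018-01-24'   -- only consider activity up to the as-of date
      group by user_id
     ) e;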

Select latest values for group of related records

I have a table holding data that is logically groupable by multiple properties (a foreign key, for example). The data is sequential over a continuous time interval, i.e. it is time-series data. What I am trying to achieve is to select only the latest values for each combination of groups.
Here is example data:
+------+-------+------------+-------------+
| code | value | date       | relation_id |
+------+-------+------------+-------------+
| A    | 1     | 01.01.2016 | 1           |
| A    | 2     | 02.01.2016 | 1           |
| A    | 3     | 03.01.2016 | 1           |
| A    | 4     | 01.01.2016 | 2           |
| A    | 5     | 02.01.2016 | 2           |
| A    | 6     | 03.01.2016 | 2           |
| B    | 1     | 01.01.2016 | 1           |
| B    | 2     | 02.01.2016 | 1           |
| B    | 3     | 03.01.2016 | 1           |
| B    | 4     | 01.01.2016 | 2           |
| B    | 5     | 02.01.2016 | 2           |
| B    | 6     | 03.01.2016 | 2           |
+------+-------+------------+-------------+
And here is example of desired output:
+------+-------+------------+-------------+
| code | value | date       | relation_id |
+------+-------+------------+-------------+
| A    | 3     | 03.01.2016 | 1           |
| A    | 6     | 03.01.2016 | 2           |
| B    | 3     | 03.01.2016 | 1           |
| B    | 6     | 03.01.2016 | 2           |
+------+-------+------------+-------------+
To put this in perspective: for every related object I want to select each code with its latest date.
Here is the select I came up with; I used the ROW_NUMBER() OVER (PARTITION BY ...) approach:
SELECT indicators.code, indicators.dimension, indicators.unit, x.value, x.date, x.ticker, x.name
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY indicator_id ORDER BY date DESC) AS r,
           t.indicator_id, t.value, t.date, t.company_id, companies.sic_id,
           companies.ticker, companies.name
    FROM fundamentals t
    INNER JOIN companies ON companies.id = t.company_id
    WHERE companies.sic_id = 89
) x
INNER JOIN indicators ON indicators.id = x.indicator_id
WHERE x.r <= (SELECT count(*) FROM companies WHERE sic_id = 89)
It works, but the problem is that it is painfully slow: on about 5% of production data, roughly 3 million fundamentals records, this select takes about 10 seconds to finish. My guess is that the subselect pulling huge numbers of records first is to blame.
Is there any way to speed this query up, or am I digging in the wrong direction by doing it this way?
Postgres offers the convenient distinct on for this purpose:
select distinct on (relation_id, code) t.*
from t
order by relation_id, code, date desc;
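Mapped onto the question's actual tables, a sketch (column names are taken from the posted query, not verified against the real schema):

select distinct on (t.indicator_id, t.company_id)
       indicators.code, indicators.dimension, indicators.unit,
       t.value, t.date, companies.ticker, companies.name
from fundamentals t
join companies on companies.id = t.company_id
join indicators on indicators.id = t.indicator_id
where companies.sic_id = 89
order by t.indicator_id, t.company_id, t.date desc;

An index on fundamentals (indicator_id, company_id, date desc) should let Postgres pick each latest row without sorting the whole set.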
Your query uses different column names than your sample data, so it's hard to tell, but it looks like you just want to group by everything except the date. Assuming you don't have multiple rows sharing the most recent date, something like this should work: instead of the window function, use a plain group by, and your engine should optimize the query better.
SELECT mytable.code,
       mytable.value,
       mytable.date,
       mytable.relation_id
FROM mytable
JOIN (
    SELECT code,
           max(date) as date,
           relation_id
    FROM mytable
    GROUP BY code, relation_id
) Q1
    ON Q1.code = mytable.code
   AND Q1.date = mytable.date
   AND Q1.relation_id = mytable.relation_id
Other option:
SELECT DISTINCT Code,
       Relation_ID,
       FIRST_VALUE(Value) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Value,
       FIRST_VALUE(Date)  OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Date
FROM mytable
This returns the top value for whatever you partition by, ordered by whatever you order by.
I believe we can try something like this (note that grouping by date as well would return every date, so the date has to be aggregated; to also pull the matching value, join this back to the table as in the earlier answer):
SELECT code, relation_id, MAX(date) AS date
FROM mytable
GROUP BY code, relation_id