I want to get the running total of registered users per day in an SQL database from this data:
| userID | date_registered |
| -------- | --------------- |
| 10012 | 2021-03-01 |
| 10043 | 2021-03-01 |
| 10065 | 2021-03-04 |
| 10087 | 2021-03-05 |
| 10091 | 2021-03-05 |
| 10123 | 2021-03-05 |
| 10231 | 2021-03-06 |
| 10421 | 2021-03-09 |
So for 2021-03-01, there are currently 2 registered users.
For 2021-03-04, there are currently 3 registered users (including registrations from previous dates).
For 2021-03-05, there are currently 6 registered users (including registrations from previous dates),
and so on...
So the expected result should be
| total_user | date |
| ---------- | --------------- |
| 2 | 2021-03-01 |
| 3 | 2021-03-04 |
| 6 | 2021-03-05 |
| 7 | 2021-03-06 |
| 8 | 2021-03-09 |
Is it possible to get this result with an SQL query in BigQuery?
Any help is much appreciated.
In BigQuery, or any database that supports window functions, we can aggregate by date and then use SUM as an analytic function over the per-day counts:
SELECT
SUM(COUNT(*)) OVER (ORDER BY date_registered) AS total_user,
date_registered AS date
FROM yourTable
GROUP BY
date_registered
ORDER BY
date_registered;
Note that if the same user might be reported more than once on a given date, then use COUNT(DISTINCT userID) instead of COUNT(*).
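If you want to try the idea locally, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery (that substitution is my assumption, and it needs an SQLite build with window functions, 3.25 or later). It splits the aggregation into a subquery and then takes the running SUM over the per-day counts, which should be equivalent to the nested SUM(COUNT(*)) form above. The alias per_day and the column daily_count are names made up for the sketch.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (userID INTEGER, date_registered TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [
        (10012, "2021-03-01"), (10043, "2021-03-01"),
        (10065, "2021-03-04"),
        (10087, "2021-03-05"), (10091, "2021-03-05"), (10123, "2021-03-05"),
        (10231, "2021-03-06"),
        (10421, "2021-03-09"),
    ],
)

# Step 1 (subquery): count registrations per day.
# Step 2 (outer query): running sum of those counts, ordered by date.
query = """
SELECT
    SUM(daily_count) OVER (ORDER BY date_registered) AS total_user,
    date_registered AS date
FROM (
    SELECT date_registered, COUNT(*) AS daily_count
    FROM yourTable
    GROUP BY date_registered
) AS per_day
ORDER BY date_registered
"""
running_totals = conn.execute(query).fetchall()

for total_user, date in running_totals:
    print(total_user, date)
```

The printed rows can be compared line by line with the expected result table in the question.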
You can use a correlated subquery like this, although MySQL 8+ offers more practical options such as window functions:
SELECT e1.date_registered, (SELECT COUNT(e2.userID) FROM example e2
WHERE e2.date_registered <= e1.date_registered) AS count_ FROM example e1
GROUP BY date_registered
Here's some more detail, since it's a bit hard to clearly ask this question in a sentence:
Basically, I have a table with some of the following fields:
| ID | date | start_date | amount_paid | last_amount_paid | field |
| -------- | ---------------------| ----------------------| ----------- | ---------------- | ---------- |
| ID_00001 | 2020-08-01 00:00:00 | 2019-11-06 20:23:36 | 0 | 0 | cosmetics |
| ID_00002 | 2020-08-02 00:00:00 | 2018-10-06 10:34:21 | 10 | 0 | finance |
| ... | ... | ... | ... | ... | ... |
| ID_99999 | 2021-11-06 00:00:00 | 2020-08-01 11:54:47 | 15 | 10 | software |
What I want is to add a "months" column that counts the number of months between the start date and date for each ID, for example:
| ID | date | start_date | ... | months |
| -------- | ---------------------| ----------------------| ---- | ---------- |
| ID_00001 | 2020-08-01 00:00:00 | 2019-11-06 20:23:36 | ... | 9 |
| ID_00002 | 2020-08-02 00:00:00 | 2018-10-06 10:34:21 | ... | 22 |
| ... | ... | ... | ... | ... |
| ID_99999 | 2021-11-06 00:00:00 | 2020-08-01 11:54:47 | ... | 15 |
I then want to group all IDs that have started (first start date) at the same time together (i.e. I want to group users by number of months).
I'm having a difficult time wrapping my mind around doing this in Snowflake SQL.
The goal here is basically to track revenue by cohorts based on when they joined. Please let me know if my approach is wrong and how you would go about implementing that.
Much appreciated!
Use a computed (generated) column with DATEDIFF:
CREATE OR REPLACE TABLE t(id TEXT,
date DATE,
start_date DATE,
months INT AS (DATEDIFF(MONTH, start_date, date))
);
Sample data:
INSERT INTO t(id, date, start_date)
SELECT 'ID_00001', '2020-08-01 00:00:00', '2019-11-06 20:23:36'
UNION SELECT 'ID_00002', '2020-08-02 00:00:00', '2018-10-06 10:34:21';
SELECT * FROM t;
Alternatively, wrap the table in a view:
CREATE VIEW t_vw
AS
SELECT t.id, t.start_date, t.date, DATEDIFF(MONTH, start_date, date) AS months
FROM t;
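To make it clearer what a month difference of this kind counts, here is a small Python sketch. It assumes that DATEDIFF with MONTH counts month boundaries crossed and ignores the day and time parts, which is my understanding of Snowflake's behaviour. The helper months_between is a hypothetical name, not a Snowflake function.

```python
from datetime import datetime


def months_between(start: datetime, end: datetime) -> int:
    """Count month boundaries crossed from start to end, ignoring day and time."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# (id, date, start_date) taken from the sample rows in the question.
rows = [
    ("ID_00001", "2020-08-01 00:00:00", "2019-11-06 20:23:36"),
    ("ID_00002", "2020-08-02 00:00:00", "2018-10-06 10:34:21"),
    ("ID_99999", "2021-11-06 00:00:00", "2020-08-01 11:54:47"),
]

fmt = "%Y-%m-%d %H:%M:%S"
months_by_id = {
    row_id: months_between(datetime.strptime(start, fmt), datetime.strptime(date, fmt))
    for row_id, date, start in rows
}

print(months_by_id)
```

Because only year and month are compared, a start on the last day of one month and a date on the first day of the next still count as one month.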
I'm trying to write a SQL statement that includes another statement, and build a view from the result. I have one data table (see the main table below). What I'm trying to do is create a view that selects each distinct date once and, for every selected date, sums the price of all rows with that date.
For example: the Main table
+----+--------------+---------------+------------+
| id | article_name | article_price | date |
+----+--------------+---------------+------------+
| 1 | T-Shirt | 10 | 2020-11-16 |
| 2 | Shoes | 25 | 2020-11-16 |
| 3 | Pullover | 35 | 2020-11-17 |
| 4 | Pants | 10 | 2020-11-18 |
+----+--------------+---------------+------------+
What I'm expecting is 3 rows (because the first 2 rows have the same date):
+------------+-----+
| date | sum |
+------------+-----+
| 2020-11-16 | 35 |
| 2020-11-17 | 35 |
| 2020-11-18 | 10 |
+------------+-----+
I'm having a hard time thinking of an "algorithm" to solve this.
Any ideas?
Use group by!
select date, sum(article_price) as sum_article_price
from mytable
group by date
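Since the question asks for a view, here is a minimal sketch of wrapping that GROUP BY in one. It runs against an in-memory SQLite database through Python's sqlite3 module, which is my assumption because the question does not name a database. The view name daily_totals is hypothetical.

```python
import sqlite3

# In-memory SQLite database; the actual database in the question is unknown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, article_name TEXT, article_price INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "T-Shirt", 10, "2020-11-16"),
        (2, "Shoes", 25, "2020-11-16"),
        (3, "Pullover", 35, "2020-11-17"),
        (4, "Pants", 10, "2020-11-18"),
    ],
)

# The view stores the query, not the data: it yields one row per distinct date.
conn.execute(
    """
    CREATE VIEW daily_totals AS
    SELECT date, SUM(article_price) AS sum_article_price
    FROM mytable
    GROUP BY date
    """
)

daily_totals = conn.execute(
    "SELECT date, sum_article_price FROM daily_totals ORDER BY date"
).fetchall()

for date, total in daily_totals:
    print(date, total)
```

GROUP BY already returns each date once, so no separate DISTINCT step or nested statement is needed.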
I have a set of data that tells me the owner for something for each date, sample data below. There are some breaks in the date column.
| owner | date |
|-------------+-------------+
| Samantha | 2010-01-02 |
| Max | 2010-01-03 |
| Max | 2010-01-04 |
| Max | 2010-01-06 |
| Max | 2010-01-07 |
| Conor | 2010-01-08 |
| Conor | 2010-01-09 |
| Conor | 2010-01-10 |
| Conor | 2010-01-11 |
| Abigail | 2010-01-12 |
| Abigail | 2010-01-13 |
| Abigail | 2010-01-14 |
| Abigail | 2010-01-15 |
| Max | 2010-01-17 |
| Max | 2010-01-18 |
| Abigail | 2010-01-20 |
| Conor | 2010-01-21 |
I am trying to write a query that captures the date range of each owner's interval, such as:
| owner | start | end |
|-------------+------------+------------+
| Samantha | 2010-01-02 | 2010-01-02 |
| Max | 2010-01-03 | 2010-01-04 |
| Max | 2010-01-06 | 2010-01-07 |
| Conor | 2010-01-08 | 2010-01-11 |
| Abigail | 2010-01-12 | 2010-01-15 |
| Max | 2010-01-17 | 2010-01-18 |
| Abigail | 2010-01-20 | 2010-01-20 |
| Conor | 2010-01-21 | 2010-01-21 |
I tried to think of this using min() and max(), but I am stuck. I feel like I need to use lead() and lag(), but I'm not sure how to use them to get the output I want. Any ideas? Thanks in advance!
This is a typical gaps-and-islands problem. Here is one way to solve it using row_number():
select owner, min(date) as start_date, max(date) as end_date
from (
select
owner,
date, -- needed by min() and max() in the outer query
row_number() over(order by date) rn1,
row_number() over(partition by owner order by date) rn2
from mytable
) t
group by owner, rn1 - rn2
order by min(date)
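This works by ranking records by date over two different partitions (within the whole table and within groups having the same owner). The difference between the ranks gives you the group each record belongs to. You can run the inner query and look at the results to understand the logic. Note that this groups consecutive rows, not consecutive dates: when a break in the dates has the same owner on both sides (such as Max on 2010-01-04 and 2010-01-06), both sides end up in the same group.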
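To look at the inner query's results without setting up a server, here is a minimal sketch using Python's built-in sqlite3 module (an assumption on my part, since the question does not name a database, and it needs SQLite 3.25 or later for window functions). It loads the sample data from the question and prints both rankings next to their difference.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (owner TEXT, date TEXT)")
sample = [
    ("Samantha", "2010-01-02"),
    ("Max", "2010-01-03"), ("Max", "2010-01-04"),
    ("Max", "2010-01-06"), ("Max", "2010-01-07"),
    ("Conor", "2010-01-08"), ("Conor", "2010-01-09"),
    ("Conor", "2010-01-10"), ("Conor", "2010-01-11"),
    ("Abigail", "2010-01-12"), ("Abigail", "2010-01-13"),
    ("Abigail", "2010-01-14"), ("Abigail", "2010-01-15"),
    ("Max", "2010-01-17"), ("Max", "2010-01-18"),
    ("Abigail", "2010-01-20"),
    ("Conor", "2010-01-21"),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?)", sample)

# rn1 ranks rows across the whole table, rn2 ranks them within each owner.
inner_rows = conn.execute(
    """
    SELECT owner, date,
           ROW_NUMBER() OVER (ORDER BY date) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY owner ORDER BY date) AS rn2
    FROM mytable
    ORDER BY date
    """
).fetchall()

for owner, date, rn1, rn2 in inner_rows:
    print(f"{owner:<9} {date}  rn1={rn1:<2} rn2={rn2:<2} diff={rn1 - rn2}")
```

Rows that share both the owner and the diff value are the ones the outer query collapses into a single range.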
This is a gaps-and-islands problem. You want to solve it by subtracting a sequential value from the date and aggregating:
select owner, min(date), max(date)
from (select t.*,
row_number() over (partition by owner order by date) as seqnum
from t
) t
group by owner, (date - seqnum * interval '1 day')
order by min(date);
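The magic is that the date minus the sequence number stays constant as long as the dates for an owner increase by exactly one day at a time, so each run of consecutive dates shares one value, and a break in the dates starts a new one.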
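Here is a minimal sketch of the same date-minus-sequence idea that can be run locally with Python's built-in sqlite3 module. SQLite has no interval type, so the sketch swaps in julianday(date) - seqnum as the grouping key, which is my substitution rather than the syntax above. The alias numbered and the output column names are made up for the sketch.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (owner TEXT, date TEXT)")
sample = [
    ("Samantha", "2010-01-02"),
    ("Max", "2010-01-03"), ("Max", "2010-01-04"),
    ("Max", "2010-01-06"), ("Max", "2010-01-07"),
    ("Conor", "2010-01-08"), ("Conor", "2010-01-09"),
    ("Conor", "2010-01-10"), ("Conor", "2010-01-11"),
    ("Abigail", "2010-01-12"), ("Abigail", "2010-01-13"),
    ("Abigail", "2010-01-14"), ("Abigail", "2010-01-15"),
    ("Max", "2010-01-17"), ("Max", "2010-01-18"),
    ("Abigail", "2010-01-20"),
    ("Conor", "2010-01-21"),
]
conn.executemany("INSERT INTO t VALUES (?, ?)", sample)

# julianday(date) - seqnum is constant within a run of consecutive dates
# for one owner, so it serves as the island key.
islands = conn.execute(
    """
    SELECT owner, MIN(date) AS start_date, MAX(date) AS end_date
    FROM (
        SELECT owner, date,
               ROW_NUMBER() OVER (PARTITION BY owner ORDER BY date) AS seqnum
        FROM t
    ) AS numbered
    GROUP BY owner, julianday(date) - seqnum
    ORDER BY MIN(date)
    """
).fetchall()

for owner, start_date, end_date in islands:
    print(owner, start_date, end_date)
```

On the sample data this keeps Max's two early runs separate, because the missing 2010-01-05 shifts the key by one day.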
I was wondering if someone could please help me out with a query I'm trying to write. I'm using SQL Server.
I'm trying to group these rows by Group and have the minimum date and the maximum date associated to that group also show up in my output.
I'm sorry for not having code to offer. I just really wasn't able to come up with anything. This one went beyond my understanding of SQL. I wouldn't mind if some explanation or theory were added to the solution as well, for learning.
Again thanks to anyone in advance who can help me.
Here's the example.
Data Sample:
+----------------+-------------+-------------------+-----------------+
| Name | Group | CheckInDate | CheckOutDate |
+----------------+-------------+-------------------+-----------------+
| Rogue | Group 1 | 01/03/2019 | 01/08/2019 |
+----------------+-------------+-------------------+-----------------+
| Larry | Group 3 | 01/01/2019 | 01/07/2019 |
+----------------+-------------+-------------------+-----------------+
| Jorge | Group 2 | 01/02/2019 | 01/04/2019 |
+----------------+-------------+-------------------+-----------------+
| Tara | Group 1 | 01/02/2019 | 01/07/2019 |
+----------------+-------------+-------------------+-----------------+
| Luca | Group 2 | 01/03/2019 | 01/08/2019 |
+----------------+-------------+-------------------+-----------------+
Query Output
+----------------+-----------------+-------------------+-----------------+
| Description | Count | CheckInDate | CheckOutDate |
+----------------+-----------------+-------------------+-----------------+
| Group 1 | 2 | 01/02/2019 | 01/08/2019 |
+----------------+-----------------+-------------------+-----------------+
| Group 2 | 2 | 01/02/2019 | 01/08/2019 |
+----------------+-----------------+-------------------+-----------------+
| Group 3 | 1 | 01/01/2019 | 01/07/2019 |
+----------------+-----------------+-------------------+-----------------+
You need to group by the Group, and do a min and max on the date columns:
SELECT
-- Group is a reserved word in SQL Server, so it must be bracketed
[Group] AS Description, COUNT(*) AS [Count],
MIN(CheckInDate) AS CheckInDate,
MAX(CheckOutDate) AS CheckOutDate
FROM Dates
GROUP BY [Group];
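Here is a minimal sketch of that grouping, run with Python's built-in sqlite3 module as a stand-in for SQL Server (my assumption for the sake of a runnable example; SQLite also accepts square brackets around identifiers). The dates are stored as ISO yyyy-mm-dd text in this sketch so that MIN and MAX compare them chronologically. If the real columns hold mm/dd/yyyy strings rather than a date type, they would need converting first.

```python
import sqlite3

# In-memory SQLite database; dates are rewritten in ISO format (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Dates (Name TEXT, [Group] TEXT, CheckInDate TEXT, CheckOutDate TEXT)"
)
conn.executemany(
    "INSERT INTO Dates VALUES (?, ?, ?, ?)",
    [
        ("Rogue", "Group 1", "2019-01-03", "2019-01-08"),
        ("Larry", "Group 3", "2019-01-01", "2019-01-07"),
        ("Jorge", "Group 2", "2019-01-02", "2019-01-04"),
        ("Tara", "Group 1", "2019-01-02", "2019-01-07"),
        ("Luca", "Group 2", "2019-01-03", "2019-01-08"),
    ],
)

# One output row per group: member count, earliest check-in, latest check-out.
summary = conn.execute(
    """
    SELECT [Group] AS Description,
           COUNT(*) AS [Count],
           MIN(CheckInDate) AS CheckInDate,
           MAX(CheckOutDate) AS CheckOutDate
    FROM Dates
    GROUP BY [Group]
    ORDER BY [Group]
    """
).fetchall()

for row in summary:
    print(row)
```

GROUP BY collapses all rows with the same group value into one, and each aggregate (COUNT, MIN, MAX) is then evaluated over just the rows of that group.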
I need to query the following table to return the row with the maximum date for each code, and also calculate deb - cred for that row only.
How would I do this?
code | date | deb | cred
-----------------------------------
4 | 2018-01-01 | 100,00 | 200,00
4 | 2017-12-28 | 100,00 | 500,00
6 | 2018-01-23 | 350,00 | 400,00
6 | 2018-04-28 | 140,00 | 678,00
8 | 2018-01-12 | 156,00 | 256,00
8 | 2016-02-28 | 134,00 | 598,00
The result must be
4 | 2018-01-01 | -100,00
6 | 2018-04-28 | -538,00
8 | 2018-01-12 | -100,00
PostgreSQL's DISTINCT ON in combination with ORDER BY will return the first row per group:
SELECT DISTINCT ON (code)
code, date, deb - cred
FROM your_table
ORDER BY code, date DESC;
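DISTINCT ON is specific to PostgreSQL. For databases without it, a common alternative is ROW_NUMBER() partitioned by code. Here is a minimal sketch of that swapped-in technique, run through Python's built-in sqlite3 module (SQLite 3.25 or later). The amounts are stored as plain numbers instead of the comma-decimal strings shown in the question, and the aliases ranked, rn and balance are names made up for the sketch.

```python
import sqlite3

# In-memory SQLite database with the sample rows; amounts stored as REAL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (code INTEGER, date TEXT, deb REAL, cred REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (4, "2018-01-01", 100.0, 200.0),
        (4, "2017-12-28", 100.0, 500.0),
        (6, "2018-01-23", 350.0, 400.0),
        (6, "2018-04-28", 140.0, 678.0),
        (8, "2018-01-12", 156.0, 256.0),
        (8, "2016-02-28", 134.0, 598.0),
    ],
)

# Rank rows within each code from newest to oldest, then keep only the newest.
latest = conn.execute(
    """
    SELECT code, date, deb - cred AS balance
    FROM (
        SELECT code, date, deb, cred,
               ROW_NUMBER() OVER (PARTITION BY code ORDER BY date DESC) AS rn
        FROM your_table
    ) AS ranked
    WHERE rn = 1
    ORDER BY code
    """
).fetchall()

for code, date, balance in latest:
    print(code, date, balance)
```

The subtraction is evaluated only for the row that survives the rn = 1 filter, so each code's balance comes from its most recent date.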