Get rows within current month - sql

Get the rows from a table filtered by the current month in PostgreSQL.
Example table -
+----+------+------------------------+
| id | name | registration_date      |
+----+------+------------------------+
| 1  | abc  | 2018-12-31 18:30:00+00 |
| 2  | pqr  | 2018-11-30 18:30:00+00 |
| 3  | lmn  | 2020-07-10 18:30:00+00 |
+----+------+------------------------+
Result after the query, assuming the current month is July 2020:
+----+------+------------------------+
| id | name | registration_date      |
+----+------+------------------------+
| 3  | lmn  | 2020-07-10 18:30:00+00 |
+----+------+------------------------+

The usual approach is a range query, which allows an index on registration_date to be used:
select *
from the_table
where registration_date >= date_trunc('month', current_date)
and registration_date < date_trunc('month', current_date) + interval '1 month';

Another option uses the extract function (see the PostgreSQL documentation on extract for details):
SELECT *
FROM the_table
WHERE extract(month from registration_date) = extract(month from current_date)
  AND extract(year from registration_date) = extract(year from current_date);
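Wrapping the column in extract() prevents a plain index on registration_date from being used, which is why the range version above is usually preferred on large tables. As a minimal sketch (PostgreSQL; the index name is illustrative), the index the range query can take advantage of is simply:
CREATE INDEX the_table_registration_date_idx ON the_table (registration_date);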

Related

Previous Month MTD

Is there any way I can find the previous month's MTD?
My data is at day level, and I need to find the current MTD and the previous month's MTD.
D_date       product   TOTAL_UNIT
01/AUG/2020  A         10
01/AUG/2020  B         20
02/AUG/2020  A         15
02/AUG/2020  B         25
29/JUL/2020  A         5
29/JUL/2020  B         0
30/JUL/2020  A         2
31/JUL/2020  B         30
I can get the current month MTD using the SQL below (Oracle):
SUM(TOTAL_UNIT) OVER (PARTITION BY PRODUCT, TRUNC(D_DATE,'MM') ORDER BY D_DATE) AS MTD
However, when I apply ADD_MONTHS(..., -1) to get the PMTD, it still shows the current MTD. I tried:
SUM(TOTAL_UNIT) OVER (PARTITION BY PRODUCT, TRUNC(ADD_MONTHS(D_DATE,-1),'MM') ORDER BY D_DATE) AS MTD
Another option would be a self-join, but I would like to avoid that for performance reasons.
Changing your partition-by clause to TRUNC(ADD_MONTHS(D_DATE,-1),'MM') - or ADD_MONTHS(TRUNC(D_DATE,'MM'),-1) - gives you a different value for that partitioning but exactly the same groups as plain TRUNC(D_DATE,'MM').
If you want to get the last MTD before the current month you can put your existing query as a subquery and use last_value():
select d_date, product, total_unit, m_date, mtd,
last_value(mtd) over (partition by product order by m_date range between unbounded preceding and 1 preceding) as prev_mtd
from (
select d_date, product, total_unit,
TRUNC(D_DATE,'MM') m_date,
SUM(TOTAL_UNIT) OVER (PARTITION BY PRODUCT, TRUNC(D_DATE,'MM') ORDER BY D_DATE) AS MTD
from your_table
)
order by product, d_date;
D_DATE    | PRODUCT | TOTAL_UNIT | M_DATE    | MTD | PREV_MTD
:-------- | :------ | ---------: | :-------- | --: | -------:
29-JUL-20 | A       |          5 | 01-JUL-20 |   5 |     null
30-JUL-20 | A       |          2 | 01-JUL-20 |   7 |     null
01-AUG-20 | A       |         10 | 01-AUG-20 |  10 |        7
02-AUG-20 | A       |         15 | 01-AUG-20 |  25 |        7
29-JUL-20 | B       |          0 | 01-JUL-20 |   0 |     null
31-JUL-20 | B       |         30 | 01-JUL-20 |  30 |     null
01-AUG-20 | B       |         20 | 01-AUG-20 |  20 |       30
02-AUG-20 | B       |         25 | 01-AUG-20 |  45 |       30
That is because you're using ADD_MONTHS inside the analytic function: you are calculating the sum, partitioned by the month, as if it were the previous month. While that last part doesn't make much sense to me, it still gives you the correct SUM; you just didn't tell SQL to also show the previous month.
If you want D_DATE to be shown as the previous month, add another column: TRUNC(ADD_MONTHS(d_date, -1), 'MM') AS pmtd
select trunc( ADD_MONTHS( sysdate, -1), 'MM') from dual;
This will always get you the first date of the previous month.
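Putting that together with the original MTD expression, a rough sketch (Oracle; your_table stands in for the real table, as in the query above):
SELECT d_date, product, total_unit,
       TRUNC(ADD_MONTHS(d_date, -1), 'MM') AS pmtd,
       SUM(total_unit) OVER (PARTITION BY product, TRUNC(d_date, 'MM') ORDER BY d_date) AS mtd
FROM your_table;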

30 day rolling count of distinct IDs

After looking at what seems to be a commonly asked question and not being able to get any of the solutions to work for me, I decided I should ask for myself.
I have a data set with two columns: session_start_time, uid
I am trying to generate a rolling 30 day tally of unique sessions
It is simple enough to query the number of unique uids over the last 30 days:
SELECT
COUNT(DISTINCT(uid))
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - interval '30 days'
It is also relatively simple to calculate the daily unique uids over a date range:
SELECT
DATE_TRUNC('day',session_start_time) AS "date"
,COUNT(DISTINCT uid) AS "count"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY DATE_TRUNC('day',session_start_time)
I then tried several ways to do a rolling 30-day unique count over a time interval:
SELECT
DATE(session_start_time) AS "running30day"
,COUNT(distinct(
case when date(session_start_time) >= running30day - interval '30 days'
AND date(session_start_time) <= running30day
then uid
end)
) AS "unique_30day"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - interval '3 months'
GROUP BY date(session_start_time)
Order BY running30day desc
I really thought this would work, but looking into the results it appears I'm getting the same numbers as the daily unique count rather than the unique count over 30 days.
I am writing this query in Metabase using the SQL query editor; the underlying tables are in Redshift.
If you read this far, thank you; your time has value, and I appreciate that you have spent some of it reading my question.
EDIT:
As rightfully requested, I added an example of the data set I'm working with and the desired outcome.
+-----+-------------------------------+
| UID | SESSION_START_TIME            |
+-----+-------------------------------+
| 10  | 2020-01-13T01:46:07.000-05:00 |
| 5   | 2020-01-13T01:46:07.000-05:00 |
| 3   | 2020-01-18T02:49:23.000-05:00 |
| 9   | 2020-03-06T18:18:28.000-05:00 |
| 2   | 2020-03-06T18:18:28.000-05:00 |
| 8   | 2020-03-31T23:13:33.000-04:00 |
| 3   | 2020-08-28T18:23:15.000-04:00 |
| 2   | 2020-08-28T18:23:15.000-04:00 |
| 9   | 2020-08-28T18:23:15.000-04:00 |
| 3   | 2020-08-28T18:23:15.000-04:00 |
| 8   | 2020-09-15T16:40:29.000-04:00 |
| 3   | 2020-09-21T20:49:09.000-04:00 |
| 1   | 2020-11-05T21:31:48.000-05:00 |
| 6   | 2020-11-05T21:31:48.000-05:00 |
| 8   | 2020-12-12T04:42:00.000-05:00 |
| 8   | 2020-12-12T04:42:00.000-05:00 |
| 5   | 2020-12-12T04:42:00.000-05:00 |
+-----+-------------------------------+
Below is what the desired result looks like:
+------------+---------------------+
| DATE       | UNIQUE 30 DAY COUNT |
+------------+---------------------+
| 2020-01-13 | 3                   |
| 2020-01-18 | 1                   |
| 2020-03-06 | 3                   |
| 2020-03-31 | 1                   |
| 2020-08-28 | 4                   |
| 2020-09-15 | 2                   |
| 2020-09-21 | 1                   |
| 2020-11-05 | 2                   |
| 2020-12-12 | 2                   |
+------------+---------------------+
Thank you
You can approach this by keeping a counter of when users are counted and then uncounted -- 30 (or perhaps 31) days later. Then determine the "islands" of being counted, and aggregate. This involves:
Unpivoting the data to get an "enters count" and a "leaves count" for each session.
Accumulating the count so that, on each day and for each user, you know whether they are counted or not.
Determining where the resulting "islands" of counting start and stop -- getting rid of all the detritus in between.
Finally, doing a cumulative sum on each date to determine the 30-day count.
In SQL, this looks like:
with t as (
select uid, date_trunc('day', session_start_time) as s_day, 1 as inc
from users_sessions
union all
select uid, date_trunc('day', session_start_time) + interval '31 day' as s_day, -1
from users_sessions
),
tt as ( -- increment the ins and outs to determine whether a uid is in or out on a given day
select uid, s_day, sum(inc) as day_inc,
sum(sum(inc)) over (partition by uid order by s_day rows between unbounded preceding and current row) as running_inc
from t
group by uid, s_day
),
ttt as ( -- find the beginning and end of the islands
select tt.uid, tt.s_day,
(case when running_inc > 0 then 1 else -1 end) as in_island
from (select tt.*,
lag(running_inc) over (partition by uid order by s_day) as prev_running_inc,
lead(running_inc) over (partition by uid order by s_day) as next_running_inc
from tt
) tt
where running_inc > 0 and (prev_running_inc = 0 or prev_running_inc is null) or
running_inc = 0 and (next_running_inc > 0 or next_running_inc is null)
)
select s_day,
sum(sum(in_island)) over (order by s_day rows between unbounded preceding and current row) as active_30
from ttt
group by s_day;
I'm pretty sure the easier way to do this is to use a join. This creates a list of all the distinct users who had a session on each day and a list of all the distinct dates in the data, then one-to-many joins the user list to the date list and counts the distinct users. The key here is the expanded join criteria, which matches a range of dates to a single date via a pair of inequalities.
with users as
(select
distinct uid,
date_trunc('day',session_start_time) AS dt
from <table>
where session_start_time >= '2021-05-01'),
dates as
(select
distinct date_trunc('day',session_start_time) AS dt
from <table>
where session_start_time >= '2021-05-01')
select
count(distinct uid),
dates.dt
from users
join
dates
on users.dt >= dates.dt - 29
and users.dt <= dates.dt
group by dates.dt
order by dt desc
;
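The join condition is what makes this a rolling window: each dates.dt row picks up every user-day from the preceding 29 days through the day itself. A hedged alternative spelling with an explicit interval (assuming the same CTEs), which should define the same 30-day window:
on users.dt between dates.dt - interval '29 days' and dates.dt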

How can I insert data from one table into another table, provided some specific conditions are satisfied?

Logic: If today is Monday (per the 'time' table), the data in S should be inserted into M (along with a sent_day column holding today's date).
If today is not Monday, the dates of the current week (same week_id) should be checked in M. If any of those dates are already present in M, then S should not be inserted into M; if none of them are present, S should be inserted into M.
time
+------------+-----------+---------+
| cal_dt     | cal_day   | week_id |
+------------+-----------+---------+
| 2020-03-23 | Monday    | 123     |
| 2020-03-24 | Tuesday   | 123     |
| 2020-03-25 | Wednesday | 123     |
| 2020-03-26 | Thursday  | 123     |
| 2020-03-27 | Friday    | 123     |
| 2020-03-30 | Monday    | 124     |
| 2020-03-31 | Tuesday   | 124     |
+------------+-----------+---------+
M
+------------+----------+-------+
| sent_day   | item     | price |
+------------+----------+-------+
| 2020-03-11 | pen      | 10    |
| 2020-03-11 | book     | 50    |
| 2020-03-13 | Eraser   | 5     |
| 2020-03-13 | sharpner | 5     |
+------------+----------+-------+
S
+----------+-------+
| item     | price |
+----------+-------+
| pen      | 25    |
| book     | 20    |
| Eraser   | 10    |
| sharpner | 3     |
+----------+-------+
This is what I have tried so far:
Insert INTO M
SELECT
CASE WHEN(SELECT cal_day FROM time WHERE cal_dt = current_date) = 'Monday' THEN s.*
ELSE
(CASE WHEN(SELECT cal_dt FROM time WHERE wk_id =(SELECT wk_id FROM time WHERE cal_dt = current_date ) NOT IN (SELECT DISTINCT sent_day FROM M) THEN 1 ELSE 0 END)
THEN s.* ELSE END
FROM s
I would do this in two separate INSERT statements:
The first condition ("if today is monday") is quite easy:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where exists (select *
from "time"
where cal_dt = current_date
and cal_day = 'Monday');
I find storing the date and the week day a bit confusing as the week day can easily be extracted from the day. For the test "if today is Monday" it's actually not necessary to consult the "time" table at all:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) = 1;
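In PostgreSQL, extract(dow from ...) numbers Sunday as 0 and Monday as 1, which is what the = 1 test relies on. A quick sanity check against one of the sample dates:
select extract(dow from date '2020-03-23');  -- returns 1; the "time" table lists 2020-03-23 as Monday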
The second part is a bit more complicated, but if I understand it correctly, it should be something like this:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) <> 1
  and not exists (select *
                  from m
                  where m.sent_day in (select t.cal_dt
                                       from "time" t
                                       where t.week_id = (select week_id
                                                          from "time"
                                                          where cal_dt = current_date)));
If you just want a single INSERT statement, you could simply do a UNION ALL between the two selects:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) = 1
union all
select current_date, item, price
from s
where extract(dow from current_date) <> 1
  and not exists (select *
                  from m
                  where m.sent_day in (select t.cal_dt
                                       from "time" t
                                       where t.week_id = (select week_id
                                                          from "time"
                                                          where cal_dt = current_date)));

Group by month, day, hour + gaps and islands problem

I need to calculate (as a percentage) how long the status was true during a day, hour, or month (working_time).
I have simplified my table to this one:
| date                     | status |
|--------------------------|--------|
| 2018-11-05T19:04:21.125Z | true   |
| 2018-11-05T19:04:22.125Z | true   |
| 2018-11-05T19:04:23.125Z | true   |
| 2018-11-05T19:04:24.125Z | false  |
| 2018-11-05T19:04:25.125Z | true   |
....
Depending on a parameter, I need to get a result like this:
for hours:
| date                     | working_time |
|--------------------------|--------------|
| 2018-11-05T00:00:00.000Z | 14           |
| 2018-11-05T01:00:00.000Z | 15           |
| 2018-11-05T02:00:00.000Z | 32           |
| ...                      | ...          |
| 2018-11-05T23:00:00.000Z | 13           |
for months:
| date                     | working_time |
|--------------------------|--------------|
| 2018-01-01T00:00:00.000Z | 14           |
| 2018-02-01T00:00:00.000Z | 15           |
| 2018-03-01T00:00:00.000Z | 32           |
| ...                      | ...          |
| 2018-12-01T00:00:00.000Z | 13           |
My SQL query looks like this:
SELECT date_trunc('month', date) as date,
round((EXTRACT(epoch from sum(time_diff)) / 25920) :: numeric, 2) as working_time
FROM (SELECT date,
status as current_status,
(lag(status, 1) OVER (ORDER BY date)) AS previous_status,
(date -(lag(date, 1) OVER (ORDER BY date))) AS time_diff
FROM table
) as raw_data
WHERE current_status = TRUE AND previous_status = TRUE
GROUP BY date_trunc('month', date)
ORDER BY date;
It works OK, but it is really slow. Any ideas about optimisation? Maybe using the row_number() function?
Try this:
SELECT t.month_reference as date,
       round((sum(case when t_aux.status then 1 else 0 end)::numeric / 25920), 2) as working_time
       -- I assume you use 25920 because it is the uptime of the system, 60*18*24;
       -- I would use the total seconds in the month, 60*60*24*day(last_day(t.month_reference)),
       -- if I wanted a percentage of the whole month
FROM (SELECT DISTINCT date_trunc('month', date) as month_reference
      FROM table
     ) as t
LEFT JOIN table t_aux
       ON t.month_reference = date_trunc('month', t_aux.date)
      -- so when we group by month, the sum() only counts the rows that are true
      -- and belong to the referenced month
      AND t_aux.date < (select t1.date
                        from table t1
                        where t.month_reference = date_trunc('month', t1.date)
                          and t1.status = false
                        order by t1.date asc
                        limit 1)
      -- this extra condition only selects the rows that are true until the first
      -- row with status false in the same reference month
GROUP BY t.month_reference
ORDER BY t.month_reference;
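For the hour and day breakdowns shown at the top of the question, the original lag()-based query can stay as it is; only the date_trunc unit and the divisor change. A rough sketch (PostgreSQL; the_table is a placeholder, and the divisor assumes working_time should be a percentage of each hour):
SELECT date_trunc('hour', date) AS date,
       round((extract(epoch from sum(time_diff)) / 3600 * 100)::numeric, 2) AS working_time
FROM (SELECT date,
             status AS current_status,
             lag(status) OVER (ORDER BY date) AS previous_status,
             date - lag(date) OVER (ORDER BY date) AS time_diff
      FROM the_table
     ) AS raw_data
WHERE current_status = TRUE AND previous_status = TRUE
GROUP BY date_trunc('hour', date)
ORDER BY date;
-- for days, use date_trunc('day', ...) and divide by 86400 instead of 3600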

SQL sum over partition for preceding period

I have the following table, which represents customers for each day:
+----------+-----------+
| Date     | Customers |
+----------+-----------+
| 1/1/2014 | 4         |
| 1/2/2014 | 7         |
| 1/3/2014 | 5         |
| 1/4/2014 | 5         |
| 1/5/2014 | 10        |
| 2/1/2014 | 7         |
| 2/2/2014 | 4         |
| 2/3/2014 | 1         |
| 2/4/2014 | 5         |
+----------+-----------+
I would like to add 2 additional columns:
Summary of the customers for the current month
Summary of the customers for the preceding month
Here's the desired outcome:
+----------+-----------+----------------------+------------------------+
| Date     | Customers | Sum_of_Current_month | Sum_of_Preceding_month |
+----------+-----------+----------------------+------------------------+
| 1/1/2014 | 4         | 31                   | 0                      |
| 1/2/2014 | 7         | 31                   | 0                      |
| 1/3/2014 | 5         | 31                   | 0                      |
| 1/4/2014 | 5         | 31                   | 0                      |
| 1/5/2014 | 10        | 31                   | 0                      |
| 2/1/2014 | 7         | 17                   | 31                     |
| 2/2/2014 | 4         | 17                   | 31                     |
| 2/3/2014 | 1         | 17                   | 31                     |
| 2/4/2014 | 5         | 17                   | 31                     |
+----------+-----------+----------------------+------------------------+
I have managed to calculate the third column with a simple SUM OVER (PARTITION BY ...):
Select
Date,
Customers,
Sum(Customers) over (Partition by Month(Date)||Year(Date) Order by 1) as Sum_of_Current_month
From table
However, I can't find a way to calculate the Sum_of_preceding_month column.
Appreciate your support.
Asaf
The previous month is a bit tricky. What's your Teradata release? TD14.10 supports LAST_VALUE:
SELECT
dt,
customers,
Sum_of_Current_month,
-- return the previous sum
COALESCE(LAST_VALUE(x ignore NULLS)
OVER (ORDER BY dt
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
,0) AS Sum_of_Preceding_month
FROM
(
SELECT
dt,
Customers,
SUM(Customers) OVER (PARTITION BY TRUNC(dt,'mon')) AS Sum_of_Current_month,
CASE -- keep the number only for the last day in month
WHEN ROW_NUMBER()
OVER (PARTITION BY TRUNC(dt,'mon')
ORDER BY dt)
= COUNT(*)
OVER (PARTITION BY TRUNC(dt,'mon'))
THEN Sum_of_Current_month
END AS x
FROM tab
) AS dt
I think this might be easier by using lag() and an aggregation sub-query. The ANSI Standard syntax is:
Select t.*, tt.sumCustomers, tt.prev_sumCustomers
From table t join
(select extract(year from date) as yyyy, extract(month from date) as mm,
sum(Customers) as sumCustomers,
lag(sum(Customers)) over (order by extract(year from date), extract(month from date)
) as prev_sumCustomers
from table t
group by extract(year from date), extract(month from date)
) tt
on extract(year from t.date) = tt.yyyy and extract(month from t.date) = tt.mm;
In Teradata, this would be written as:
Select t.*, tt.sumCustomers, tt.prev_sumCustomers
From table t join
(select extract(year from date) as yyyy, extract(month from date) as mm,
sum(Customers) as sumCustomers,
min(sum(Customers)) over (order by extract(year from date), extract(month from date)
rows between 1 preceding and 1 preceding
) as prev_sumCustomers
from table t
group by extract(year from date), extract(month from date)
) tt
on extract(year from t.date) = tt.yyyy and extract(month from t.date) = tt.mm;
Try this:
SELECT
    tbl.[Date],
    tbl.[Customers],
    (SELECT SUM(t.[Customers]) FROM [table] t
     WHERE MONTH(t.[Date]) = MONTH(tbl.[Date])) AS Sum_of_Current_month,
    ISNULL((SELECT SUM(t.[Customers]) FROM [table] t
            WHERE MONTH(t.[Date]) = MONTH(DATEADD(MONTH, -1, tbl.[Date]))), 0) AS Sum_of_Preceding_month
FROM [table] tbl;
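One caveat with this last version: the correlated subqueries match on MONTH() alone, so data from the same month of different years would be lumped together. A hedged variant that also matches the year (same assumed table and column names):
SELECT
    tbl.[Date],
    tbl.[Customers],
    (SELECT SUM(t.[Customers]) FROM [table] t
     WHERE YEAR(t.[Date]) = YEAR(tbl.[Date])
       AND MONTH(t.[Date]) = MONTH(tbl.[Date])) AS Sum_of_Current_month,
    ISNULL((SELECT SUM(t.[Customers]) FROM [table] t
            WHERE YEAR(t.[Date]) = YEAR(DATEADD(MONTH, -1, tbl.[Date]))
              AND MONTH(t.[Date]) = MONTH(DATEADD(MONTH, -1, tbl.[Date]))), 0) AS Sum_of_Preceding_month
FROM [table] tbl;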