Teradata query for extracting data based on a time interval (10 minutes) - sql

Can someone help me with a Teradata SQL query that extracts all users who have more than 3 transactions within a 10-minute window? Below is an extract of the table in question.

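You can use lag() and a time comparison. To get the rows that have two earlier transactions by the same user within the preceding 10 minutes (that is, three transactions in a 10-minute window; use an offset of 3 if you need more than three):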
select t.*
from t
qualify transaction_timestamp < lag(transaction_timestamp, 2) over (partition by userid order by transaction_timestamp) + interval '10' minute
If you only want the users:
select distinct userid
from (select t.*
from t
qualify transaction_timestamp < lag(transaction_timestamp, 2) over (partition by userid order by transaction_timestamp) + interval '10' minute
) t
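Here is a minimal sketch for trying the lag() idea locally. It assumes Python's built-in sqlite3 module as a stand-in for Teradata. SQLite has no QUALIFY clause or INTERVAL literal, so this version computes lag() in a derived table, filters in the outer WHERE, and uses datetime(..., '+10 minutes') for the interval. The table t, its columns userid and transaction_timestamp (stored as 'YYYY-MM-DD HH:MM:SS' text), the function name, and the sample rows are all hypothetical.

```python
import sqlite3


def users_with_burst(rows, min_count=3, minutes=10):
    """Return the user IDs that have at least `min_count` transactions within
    a `minutes`-minute window, using the lag() comparison from the answer.

    Hypothetical schema: t(userid, transaction_timestamp), timestamps stored
    as 'YYYY-MM-DD HH:MM:SS' text so string comparison orders them correctly.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (userid INTEGER, transaction_timestamp TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    offset = int(min_count) - 1  # current row plus `offset` earlier rows
    sql = f"""
        SELECT DISTINCT userid
        FROM (SELECT t.*,
                     lag(transaction_timestamp, {offset})
                         OVER (PARTITION BY userid
                               ORDER BY transaction_timestamp) AS prev_ts
              FROM t) x
        WHERE prev_ts IS NOT NULL
          AND transaction_timestamp < datetime(prev_ts, ?)  -- prev_ts + N minutes
        ORDER BY userid
    """
    result = [r[0] for r in con.execute(sql, (f"+{int(minutes)} minutes",))]
    con.close()
    return result


# Hypothetical sample data
burst_rows = [
    (1, "2024-01-01 10:00:00"), (1, "2024-01-01 10:03:00"), (1, "2024-01-01 10:07:00"),
    (2, "2024-01-01 10:00:00"), (2, "2024-01-01 10:06:00"), (2, "2024-01-01 10:12:00"),
    (3, "2024-01-01 10:00:00"), (3, "2024-01-01 10:01:00"),
]
print(users_with_burst(burst_rows))               # [1]
print(users_with_burst(burst_rows, min_count=4))  # []
```

Passing min_count=4 matches the literal "more than 3 transactions" wording of the question.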

Related

Day-wise rolling 30-day unique user count in BigQuery

I am trying to generate a day-by-day rolling 30-day unique user count with this query. The problem is that I have to run it once for each day. I need the rolling 30-day count for every day of August in one script. Please help.
SELECT max(date),count(DISTINCT user_id) as MAU
FROM user_data
WHERE date between DATE_SUB('2020-08-31' ,INTERVAL 29 DAY) and '2020-08-31';
BigQuery doesn't support rolling windows with count(distinct). One approach is brute force: generate every date and count the distinct users for each one in a correlated subquery:
select dte,
(select count(distinct ud.user_id)
from user_data ud
where ud.date between DATE_SUB(dte, INTERVAL 29 DAY) and dte
) as num_users
from unnest(generate_date_array(date('2020-08-01'), date('2020-08-31'))) dte
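The brute-force idea can be sketched in SQLite through Python's sqlite3 module. This sketch assumes a recursive CTE as a stand-in for GENERATE_DATE_ARRAY and date(d, '-29 days') as a stand-in for DATE_SUB. The user_data(user_id, date) table mirrors the question; the function name and the sample rows are hypothetical.

```python
import sqlite3


def rolling_distinct_users(rows, start, end, window_days=30):
    """For each day from start to end (inclusive), count the distinct user_id
    values whose date falls in the window_days-day window ending on that day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_data (user_id INTEGER, date TEXT)")
    con.executemany("INSERT INTO user_data VALUES (?, ?)", rows)
    sql = """
        WITH RECURSIVE days(dte) AS (          -- stand-in for GENERATE_DATE_ARRAY
            SELECT date(:start)
            UNION ALL
            SELECT date(dte, '+1 day') FROM days WHERE dte < date(:end)
        )
        SELECT dte,
               (SELECT COUNT(DISTINCT ud.user_id)   -- correlated brute-force count
                FROM user_data ud
                WHERE ud.date BETWEEN date(dte, :back) AND dte) AS num_users
        FROM days
        ORDER BY dte
    """
    params = {"start": start, "end": end, "back": f"-{int(window_days) - 1} days"}
    result = con.execute(sql, params).fetchall()
    con.close()
    return result


# Hypothetical sample data; the July row shows the window reaching back
# before the first reported day.
daily_rows = [(1, "2020-08-01"), (2, "2020-08-01"), (1, "2020-08-02"),
              (3, "2020-08-03"), (4, "2020-07-20")]
print(rolling_distinct_users(daily_rows, "2020-08-01", "2020-08-03"))
# [('2020-08-01', 3), ('2020-08-02', 3), ('2020-08-03', 4)]
```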
Gordon's approach works great.
If you need to calculate more metrics, cross join the data instead:
SELECT
date_gen,
COUNT(DISTINCT IF(ud.date BETWEEN DATE_SUB(date_gen ,INTERVAL 29 DAY) AND date_gen,ud.user_id,NULL)) as MAU
FROM
UNNEST(GENERATE_DATE_ARRAY(DATE_SUB('2020-08-31' ,INTERVAL 29 DAY), date('2020-08-31'))) date_gen,
(SELECT * FROM user_data WHERE date BETWEEN DATE_SUB('2020-08-31' ,INTERVAL 60 DAY) AND '2020-08-31') AS ud
GROUP BY 1
ORDER BY 1 DESC
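The cross-join variant can be sketched the same way in SQLite via Python's sqlite3. In this sketch, COUNT(DISTINCT CASE WHEN ... END) plays the role of BigQuery's COUNT(DISTINCT IF(...)), and a recursive CTE stands in for GENERATE_DATE_ARRAY. The second 7-day metric (wau) is a hypothetical example of the extra numbers you could add to the same GROUP BY. The function name and the sample rows are also hypothetical.

```python
import sqlite3


def rolling_metrics_cross_join(rows, start, end):
    """Cross join a generated calendar with pre-filtered data and compute
    several rolling distinct counts in one GROUP BY via conditional aggregation."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_data (user_id INTEGER, date TEXT)")
    con.executemany("INSERT INTO user_data VALUES (?, ?)", rows)
    sql = """
        WITH RECURSIVE days(date_gen) AS (     -- stand-in for GENERATE_DATE_ARRAY
            SELECT date(:start)
            UNION ALL
            SELECT date(date_gen, '+1 day') FROM days WHERE date_gen < date(:end)
        )
        SELECT date_gen,
               COUNT(DISTINCT CASE WHEN ud.date BETWEEN date(date_gen, '-29 days')
                                                    AND date_gen
                                   THEN ud.user_id END) AS mau,
               COUNT(DISTINCT CASE WHEN ud.date BETWEEN date(date_gen, '-6 days')
                                                    AND date_gen
                                   THEN ud.user_id END) AS wau  -- hypothetical extra metric
        FROM days
        CROSS JOIN (SELECT * FROM user_data
                    WHERE date BETWEEN date(:end, '-60 days') AND date(:end)) AS ud
        GROUP BY date_gen
        ORDER BY date_gen DESC
    """
    result = con.execute(sql, {"start": start, "end": end}).fetchall()
    con.close()
    return result


metric_rows = [(1, "2020-08-01"), (2, "2020-08-01"), (1, "2020-08-02"),
               (3, "2020-08-03"), (4, "2020-07-20")]
print(rolling_metrics_cross_join(metric_rows, "2020-08-01", "2020-08-03"))
# [('2020-08-03', 4, 3), ('2020-08-02', 3, 2), ('2020-08-01', 3, 2)]
```

Pre-filtering the data to the last 60 days before the cross join keeps the joined row count bounded, which matters more than the choice of metrics.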
With DECLARE and SET (BigQuery scripting), you can avoid repeating the date literal multiple times.
The following is for BigQuery Standard SQL:
#standardSQL
SELECT date, (SELECT COUNT(DISTINCT id) FROM t.users AS id) AS MAU
FROM (
SELECT date, ARRAY_AGG(user_id) OVER(mau_win) users
FROM `project.dataset.user_data`
WINDOW mau_win AS (
ORDER BY UNIX_DATE(date) DESC RANGE BETWEEN CURRENT ROW AND 29 FOLLOWING
)
) t
The query above assumes the project.dataset.user_data table has entries for every day in the period you are interested in.
If that is not the case and your data has gaps, you can use the following instead:
#standardSQL
SELECT date, (SELECT COUNT(DISTINCT id) FROM t.users AS id) AS MAU
FROM (
SELECT date, ARRAY_AGG(user_id) OVER(mau_win) users
FROM UNNEST(GENERATE_DATE_ARRAY('2020-08-01', '2020-08-31')) AS date
LEFT JOIN `project.dataset.user_data`
USING(date)
WINDOW mau_win AS (
ORDER BY UNIX_DATE(date) DESC RANGE BETWEEN CURRENT ROW AND 29 FOLLOWING
)
) t

Appending the query result in BigQuery

I am writing a query in BigQuery where each day's result should include (append) the data from the previous dates.
So the result for today should be higher than yesterday's, because the data accumulates day by day.
So far, I have only managed to get the data per day, where the number of IDs declines instead of accumulating from the previous day. The result looks like this:
What should I change in the query so that each day's result includes the data from the previous days?
Code:
WITH
table1 AS (
SELECT
ID,
...
FROM t
WHERE date BETWEEN DATE_SUB('2020-01-31', INTERVAL 31 DAY) AND '2020-01-31'  -- "date" column name assumed
),
table2 AS (
SELECT
ID,
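COUNTIF(rating < 7) as bad,
COUNTIF(rating >= 7 AND SAFE_CAST(NPS_Rating as INT64) < 9) as intermediate,
COUNTIF(rating >= 9) as good  -- assumed condition; the original line was truncated
FROM
t
WHERE date BETWEEN DATE_SUB('2020-01-31', INTERVAL 31 DAY) AND '2020-01-31'  -- "date" column name assumed
GROUP BY ID
)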
SELECT
DATE_SUB('2020-01-31', INTERVAL 31 DAY) as date,
*
FROM table1
FULL OUTER JOIN table2 USING (ID)
If you have counts that you want to accumulate, then you want a cumulative sum. The query would look something like this:
select datecol, count(*), sum(count(*)) over (order by datecol)
from t
group by datecol
order by datecol;
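To check the cumulative-sum pattern quickly, here is a sketch in SQLite via Python's sqlite3. SQLite also accepts an aggregate inside a window function after GROUP BY. The table t(id, datecol), the function name, and the sample rows are hypothetical.

```python
import sqlite3


def cumulative_counts(rows):
    """Return (datecol, daily count, running total) per day, mirroring
    sum(count(*)) over (order by datecol) from the answer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, datecol TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    sql = """
        SELECT datecol,
               COUNT(*) AS daily,
               SUM(COUNT(*)) OVER (ORDER BY datecol) AS running_total
        FROM t
        GROUP BY datecol
        ORDER BY datecol
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


event_rows = [(1, "2020-01-01"), (2, "2020-01-01"), (3, "2020-01-02"),
              (4, "2020-01-03"), (5, "2020-01-03"), (6, "2020-01-03")]
print(cumulative_counts(event_rows))
# [('2020-01-01', 2, 2), ('2020-01-02', 1, 3), ('2020-01-03', 3, 6)]
```

This accumulates row counts. If an ID can appear on several days and you want cumulative distinct IDs, first count each ID only on its first date (for example, with min(datecol) per ID), then take the running sum.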

Vertica Analytic function to count instances in a window

Let's say I have a dataset with two columns: ID and timestamp. My goal is to return the IDs that have at least n timestamps in any 30-day window.
Here is an example:
ID Timestamp
1 '2019-01-01'
2 '2019-02-01'
3 '2019-03-01'
1 '2019-01-02'
1 '2019-01-04'
1 '2019-01-17'
So, let's say I want to return a list of IDs that have 3 timestamps in any 30-day window.
Given the data above, my result set would be just ID = 1. I think some kind of window function would accomplish this, but I'm not sure.
Any chance you could help me write a query that accomplishes this?
A relatively simple way to do this involves lag()/lead():
select t.*
from (select t.*,
lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
from t
) t
where datediff(day, timestamp, timestamp_2) <= 30;
The lead() looks two rows ahead, at the third timestamp in the series. The where clause checks whether that timestamp is within 30 days of the current one. The result is the rows where this occurs.
If you just want the ids, then:
select distinct id
from (select t.*,
lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
from t
) t
where datediff(day, timestamp, timestamp_2) <= 30;
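The same lead() logic can be checked on the question's sample data with a sketch in SQLite via Python's sqlite3. Assumptions: the column is renamed ts (hypothetical) to avoid the keyword-like name timestamp, and julianday(ts_n) - julianday(ts) stands in for Vertica's datediff(day, ...), which gives the same result for date-only values. Generalizing the offset to n - 1 is an extension of the answer, not part of it.

```python
import sqlite3


def ids_with_n_in_window(rows, n=3, days=30):
    """Return IDs with at least n timestamps inside some `days`-day window:
    lead(ts, n - 1) is the (n-1)-th later timestamp for the same ID, and the
    ID qualifies if it is at most `days` days after the current one."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, ts TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    sql = f"""
        SELECT DISTINCT id
        FROM (SELECT t.*,
                     lead(ts, {int(n) - 1}) OVER (PARTITION BY id ORDER BY ts) AS ts_n
              FROM t) x
        WHERE julianday(ts_n) - julianday(ts) <= ?   -- stand-in for datediff(day, ...)
        ORDER BY id
    """
    result = [r[0] for r in con.execute(sql, (days,))]
    con.close()
    return result


# Sample data from the question
sample_rows = [(1, "2019-01-01"), (2, "2019-02-01"), (3, "2019-03-01"),
               (1, "2019-01-02"), (1, "2019-01-04"), (1, "2019-01-17")]
print(ids_with_n_in_window(sample_rows))         # [1]
print(ids_with_n_in_window(sample_rows, n=5))    # []
```

Rows with no (n-1)-th successor get a NULL ts_n. The comparison then evaluates to NULL, so those rows drop out without an extra filter.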

How to query a database for rows from the next 5 days

How can I write a query in SQL Server that returns all rows for the next 5 days?
The catch is that they have to be days that have records, so the "next 5 days" might be something like today, tomorrow, some day next month, etc.
Basically I want to query the database for the records for the next non empty X days.
The table has a column called Date, which is what I want to filter.
Why not split the search into two queries? The first finds the dates, and the second uses that result to find the records on the dates returned by the first query.
@Anagha is close; with a small modification it works:
SELECT *
FROM TABLE
WHERE DATE IN (
SELECT DISTINCT TOP 5 DATE
FROM TABLE
WHERE DATE >= referenceDate
ORDER BY DATE
)
You can use the following SQL query, which first fetches 5 distinct dates and then returns all rows for those dates:
declare @n int = 5;
select *
from myData
where
cast(datecol as date) in (
SELECT distinct top (@n) cast(datecol as date) as datecol
FROM myData
WHERE datecol >= '20180101'
ORDER BY datecol
)
Try this:
select * from table where date in (select distinct top 5 date
from table where date >= getdate() order by date)
If your values are dates, you can use dense_rank():
select t.*
from (select t.*, dense_rank() over (order by datecol) as seqnum
from t
where datecol >= cast(getdate() as date)
) t
where seqnum <= 5;
If the column has a time component and you still want to define days by midnight-to-midnight (as suggested by the question), just convert to date:
select t.*
from (select t.*,
dense_rank() over (order by cast(datetimecol as date)) as seqnum
from t
where datetimecol >= cast(getdate() as date)
) t
where seqnum <= 5;
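The dense_rank() version with a time component can be sketched in SQLite via Python's sqlite3. The sketch assumes datetimecol holds 'YYYY-MM-DD HH:MM:SS' text, so that date(datetimecol) plays the role of cast(datetimecol as date). A fixed reference date replaces getdate() to keep the output deterministic. The table t(id, datetimecol), the function name, and the sample rows are hypothetical.

```python
import sqlite3


def rows_for_next_n_days(rows, ref_date, n=5):
    """Return all rows from the first n distinct calendar days on or after
    ref_date that actually have data, ranking days with dense_rank()."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, datetimecol TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    sql = """
        SELECT id, datetimecol
        FROM (SELECT t.*,
                     dense_rank() OVER (ORDER BY date(datetimecol)) AS seqnum
              FROM t
              WHERE datetimecol >= date(?)) x   -- from midnight of ref_date
        WHERE seqnum <= ?
        ORDER BY datetimecol, id
    """
    result = con.execute(sql, (ref_date, n)).fetchall()
    con.close()
    return result


schedule_rows = [
    (1, "2020-01-04 23:00:00"),  # before the reference day, excluded
    (2, "2020-01-05 09:00:00"),
    (3, "2020-01-05 17:30:00"),
    (4, "2020-01-07 08:00:00"),  # 2020-01-06 has no rows, so it is skipped
    (5, "2020-02-10 12:00:00"),
    (6, "2020-03-01 00:00:00"),
]
print(rows_for_next_n_days(schedule_rows, "2020-01-05", n=2))
# [(2, '2020-01-05 09:00:00'), (3, '2020-01-05 17:30:00'), (4, '2020-01-07 08:00:00')]
```

dense_rank() gives every row on the same day the same rank without gaps. Filtering seqnum <= n therefore keeps whole days, unlike row_number(), which would cut a day in the middle.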

SQL: get records from a specific day

I am looking to get only the records from exactly 3 days ago, not the records from now back to 3 days ago. I only want the records for that 24-hour period, but when I use > it selects everything from now until then.
This is my query:
SELECT user_id, nickname, user_email, date_of_register, verification_code, user_photo_url FROM `tbl_users` WHERE verified='N' AND date_of_register > DATE_SUB(CURDATE(), INTERVAL 3 DAY) ORDER BY date_of_register ASC
The column date_of_register is a DATETIME, in this format:
2016-08-26 08:57:52
Compare dates instead of dates and times: CAST date_of_register to DATE.
WHERE verified='N' AND
CAST( date_of_register AS DATE) = CAST(DATE_SUB(CURDATE(), INTERVAL 3 DAY) AS DATE)
Try:
$date = "2016-08-26";
SELECT * FROM `tbl_users` WHERE `verified`='N' AND DATE(`date_of_register`) = '{$date}' ORDER BY `date_of_register` ASC
Setting a $date variable is the better approach, because changing one variable is quicker than searching through numerous lines of code to find your query.
Rather than selecting a few specific columns, why not just use SELECT *?
You have to match the date and ignore the time to get the result.
Use DATE(date_of_register) to get the records from exactly 3 days ago:
SELECT user_id, nickname, user_email, date_of_register, verification_code, user_photo_url FROM `tbl_users` WHERE verified='N' AND DATE(date_of_register) = DATE_SUB(CURDATE(), INTERVAL 3 DAY) ORDER BY date_of_register ASC
I'm adding this because the currently accepted solution wraps date_of_register in a function, which prevents an index on that column from being used:
SET @date = DATE_SUB(CURDATE(), INTERVAL 3 DAY);
SELECT user_id, nickname, user_email, date_of_register, verification_code, user_photo_url
FROM `tbl_users`
WHERE verified='N'
AND date_of_register >= @date
AND date_of_register < DATE_ADD(@date, INTERVAL 1 DAY)
ORDER BY date_of_register ASC
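The half-open range idea can be sketched in SQLite via Python's sqlite3. The sketch assumes date_of_register is stored as 'YYYY-MM-DD HH:MM:SS' text, and a fixed today value replaces CURDATE() so the result is reproducible. Because the column is compared directly instead of being wrapped in DATE(), an index on it remains usable. The reduced tbl_users schema, the index name, the function name, and the sample rows are hypothetical.

```python
import sqlite3


def unverified_from_day(rows, today, days_ago=3):
    """Return unverified users registered on the calendar day `days_ago` days
    before `today`, using the half-open range [day, day + 1 day)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_users (user_id INTEGER, verified TEXT, "
                "date_of_register TEXT)")
    con.execute("CREATE INDEX idx_register ON tbl_users (date_of_register)")
    con.executemany("INSERT INTO tbl_users VALUES (?, ?, ?)", rows)
    # Equivalent of DATE_SUB(CURDATE(), INTERVAL 3 DAY), with a fixed "today"
    day = con.execute("SELECT date(?, ?)",
                      (today, f"-{int(days_ago)} days")).fetchone()[0]
    sql = """
        SELECT user_id, date_of_register
        FROM tbl_users
        WHERE verified = 'N'
          AND date_of_register >= ?                  -- start of the target day
          AND date_of_register <  date(?, '+1 day')  -- start of the next day
        ORDER BY date_of_register
    """
    result = con.execute(sql, (day, day)).fetchall()
    con.close()
    return result


register_rows = [
    (1, "N", "2016-08-25 23:59:59"),
    (2, "N", "2016-08-26 00:00:00"),  # midnight is included by >=
    (3, "N", "2016-08-26 08:57:52"),
    (4, "Y", "2016-08-26 12:00:00"),  # already verified
    (5, "N", "2016-08-27 00:00:00"),  # next day, excluded by <
    (6, "N", "2016-08-28 10:00:00"),
]
print(unverified_from_day(register_rows, "2016-08-29"))
# [(2, '2016-08-26 00:00:00'), (3, '2016-08-26 08:57:52')]
```

Using >= on the start and < on the next midnight covers exactly one day. The range neither drops rows stamped at 00:00:00 nor picks up the following midnight.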