Grab abandoned carters from the last hour in Oracle Responsys


I'm trying to grab people out of a table who have an abandon date between 20 minutes ago and 2 hours ago. This seems to grab the right span of time, but the results are all 4 hours old:
SELECT *
FROM $A$
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(SELECT EMAIL_ADDRESS_ FROM $B$ WHERE ORDERDATE >= sysdate - 4)
Also, it returns every record for everyone, and I only want the most recent product abandoned (highest ABANDONDATE) for each email address. I can't seem to figure this one out.

If the results are EXACTLY four hours old, there may be a time zone mismatch. What is the EXACT data type of ABANDONDATE in your database? Perhaps TIMESTAMP WITH TIME ZONE? Four hours looks like the difference between UTC and EDT (the Eastern U.S. offset during daylight saving time).
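To make the time zone theory concrete, here is a small Python sketch (plain Python, not Responsys or Oracle code). It assumes, hypothetically, that ABANDONDATE holds UTC wall-clock values while SYSDATE returns the server's local clock running on EDT (UTC-4). Under that assumption the window from SYSDATE - 2 hours to SYSDATE - 20 minutes matches rows that are really about 4 hours older than intended.

```python
from datetime import datetime, timedelta

# Hypothetical setup: ABANDONDATE stores UTC wall-clock times, but SYSDATE
# returns the database server's local clock, which runs on EDT (UTC-4).
UTC_TO_EDT = timedelta(hours=-4)


def matched_ages(utc_now, abandon_times_utc):
    """Return the true age of every row the query's filter would match."""
    sysdate = utc_now + UTC_TO_EDT            # what SYSDATE reports (local EDT)
    lower = sysdate - timedelta(hours=2)      # SYSDATE - INTERVAL '2' HOUR
    upper = sysdate - timedelta(minutes=20)   # SYSDATE - INTERVAL '20' MINUTE
    # Naive comparison mixes the two clocks, just like the SQL filter would.
    return [utc_now - t for t in abandon_times_utc if lower <= t < upper]


utc_now = datetime(2024, 6, 1, 16, 0)  # 16:00 UTC is 12:00 EDT
# Rows abandoned 30, 90, 270 and 330 minutes ago, stored in UTC.
rows = [utc_now - timedelta(minutes=m) for m in (30, 90, 270, 330)]
ages = matched_ages(utc_now, rows)
print([age.total_seconds() / 3600 for age in ages])  # [4.5, 5.5]
```

The rows abandoned 30 and 90 minutes ago, which are the ones you actually want, are missed. The matches are the 4.5 and 5.5 hour old rows, which is the intended window shifted back by the 4-hour offset.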
For your other question: did you EXPECT your query to pick up only the most recent product abandoned? Which part of your query would do that? Instead, you need to add row_number() over (partition by [whatever identifies clients etc.] order by abandondate desc), turn the resulting query into a subquery, and wrap it in an outer query whose WHERE clause filters on rn = 1. We can help with this if you show us the table structure: the name and data type of the relevant columns, including which column or columns form the primary key.

Try
SELECT * FROM (
SELECT t.*,
row_number()
over (PARTITION BY EMAIL_ADDRESS_ ORDER BY ABANDONDATE DESC) As RN
FROM $A$ t
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(
SELECT EMAIL_ADDRESS_ FROM $B$
WHERE ORDERDATE >= sysdate - 4)
)
WHERE rn = 1
Another approach:
SELECT *
FROM $A$
WHERE (EMAIL_ADDRESS_, ABANDONDATE) IN (
SELECT EMAIL_ADDRESS_, MAX( ABANDONDATE )
FROM $A$
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(
SELECT EMAIL_ADDRESS_ FROM $B$
WHERE ORDERDATE >= sysdate - 4)
GROUP BY EMAIL_ADDRESS_
)
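Both answers can be checked without a Responsys account. The sketch below is a hypothetical SQLite stand-in run from Python's built-in sqlite3 module. $A$ and $B$ become tables named abandons and orders, the product column and all rows are made up, and a fixed :now parameter replaces SYSDATE (SQLite spells SYSDATE - INTERVAL '2' HOUR as datetime(:now, '-2 hours')). Both queries should return only the latest in-window abandon per email.

```python
import sqlite3

# Hypothetical stand-ins: $A$ -> abandons, $B$ -> orders; NOW replaces SYSDATE.
NOW = {"now": "2024-06-01 12:00:00"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE abandons (email_address_ TEXT, product TEXT, abandondate TEXT);
CREATE TABLE orders (email_address_ TEXT, orderdate TEXT);
INSERT INTO abandons VALUES
  ('a@example.com', 'shoes', '2024-06-01 10:30:00'),
  ('a@example.com', 'hat',   '2024-06-01 11:15:00'),
  ('b@example.com', 'scarf', '2024-06-01 10:45:00'),
  ('c@example.com', 'belt',  '2024-06-01 11:00:00'),
  ('d@example.com', 'socks', '2024-06-01 08:00:00');
INSERT INTO orders VALUES ('c@example.com', '2024-05-30 09:00:00');
""")

# Shared filter: abandoned 20 minutes to 2 hours ago, no order in the last 4 days.
FILTER = """
  abandondate >= datetime(:now, '-2 hours')
  AND abandondate < datetime(:now, '-20 minutes')
  AND email_address_ NOT IN (
    SELECT email_address_ FROM orders
    WHERE orderdate >= datetime(:now, '-4 days'))
"""

# Approach 1: number each email's rows newest-first and keep rn = 1.
latest_rn = conn.execute(f"""
SELECT email_address_, product, abandondate FROM (
  SELECT t.*, ROW_NUMBER() OVER (
           PARTITION BY email_address_ ORDER BY abandondate DESC) AS rn
  FROM abandons t
  WHERE {FILTER}
)
WHERE rn = 1
ORDER BY email_address_
""", NOW).fetchall()

# Approach 2: keep rows matching each (email, MAX(abandondate)) pair.
latest_max = conn.execute(f"""
SELECT email_address_, product, abandondate FROM abandons
WHERE (email_address_, abandondate) IN (
  SELECT email_address_, MAX(abandondate) FROM abandons
  WHERE {FILTER}
  GROUP BY email_address_)
ORDER BY email_address_
""", NOW).fetchall()

print(latest_rn)
```

Here c@example.com drops out because of its recent order, d@example.com is outside the window, and a@example.com keeps only its newer 'hat' row. Note that the MAX approach can return more than one row per email if two abandons share the same timestamp, while ROW_NUMBER always keeps exactly one.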

Related

Rewrite PostgreSQL query using CTE:

I have the following code to pull records from a date range in PostgreSQL; it works as intended. The "end date" is the "date" column of the last record, and the "start date" is calculated by subtracting a 7-day interval from the "end date".
SELECT date
FROM files
WHERE daterange((
(SELECT date FROM files ORDER BY date DESC LIMIT 1) - interval '7 day')::date, -- "start date"
(SELECT date FROM files ORDER BY date DESC LIMIT 1)::date, -- "end date"
'(]') @> date::date
ORDER BY date ASC
I'm trying to rewrite this query using CTEs, so I can replace those subqueries with values such as end_date and start_date. Is this possible using this method or should I look for other alternatives like variables? I'm still learning SQL.
WITH end_date AS
(
SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS
(
SELECT date FROM end_date - INTERVAL '7 day'
)
SELECT date
FROM files
WHERE daterange(
start_date::date,
end_date::date,
'(]') @> date::date
ORDER BY date ASC
Right now I'm getting the following error:
ERROR: syntax error at or near "-"
LINE 7: SELECT date FROM end_date - INTERVAL '7 day'

You do not need two CTEs; one is enough, and it can be joined to filter the data.
WITH RECURSIVE files AS (
SELECT CURRENT_DATE date, 1 some_value
UNION ALL
SELECT (date + interval '1 day')::date, some_value + 1 FROM files
WHERE date < (CURRENT_DATE + interval '1 month')::date
),
dates AS (
SELECT
(MAX(date) - interval '7 day')::date from_date,
MAX(date) to_date
FROM files
)
SELECT f.* FROM files f
JOIN dates d ON daterange(d.from_date, d.to_date, '(]') @> f.date
You can even build the daterange in the CTE itself and use it later, like this:
WITH dates AS (
SELECT
daterange((MAX(date) - interval '7 day')::date, MAX(date), '(]') range
FROM files
)
SELECT f.* FROM files f
JOIN dates d ON d.range @> f.date
Here the first CTE is used just to generate some data.
It returns all rows of files with dates in the last week, excluding from_date and including to_date:
date        some_value
2022-09-26  25
2022-09-27  26
2022-09-28  27
2022-09-29  28
2022-09-30  29
2022-10-01  30
2022-10-02  31
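The output above can be reproduced in SQLite through Python's sqlite3 as a rough check. This is a stand-in, not PostgreSQL: SQLite has no daterange type or @> operator, so the '(]' bounds are written out as > from_date AND <= to_date. CURRENT_DATE is pinned to 2022-09-02, which is an assumption inferred from the sample output (2022-09-26 carries some_value 25).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
WITH RECURSIVE files(date, some_value) AS (
  -- Assumed CURRENT_DATE of 2022-09-02, generating one row per day for a month.
  SELECT '2022-09-02', 1
  UNION ALL
  SELECT date(date, '+1 day'), some_value + 1 FROM files
  WHERE date < date('2022-09-02', '+1 month')
),
dates AS (
  SELECT date(MAX(date), '-7 days') AS from_date, MAX(date) AS to_date
  FROM files
)
SELECT f.date, f.some_value
FROM files f
-- '(]' range: exclude from_date, include to_date.
JOIN dates d ON f.date > d.from_date AND f.date <= d.to_date
ORDER BY f.date
""").fetchall()

for day, value in rows:
    print(day, value)
```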

I think this is what you want:
WITH end_date AS
(
SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS
(
SELECT date - INTERVAL '7 day' as date
FROM end_date
)
SELECT F.date, S.date startDate, E.date endDate
FROM files F
JOIN start_date S on F.date >= S.date
JOIN end_date E on F.date <= E.date
ORDER BY date ASC;

I hope I'm not repeating anything, but if I understand your problem correctly, I think this will work:
with cte as (
select max (date)::date as max_date from files
)
select date
from files
cross join cte
where date >= max_date - 7
Or perhaps even:
select date
from files
where date >= (select max (date)::date - 7 from files)
Since the CTE already holds the max date, there is no need to bound the query further with BETWEEN, <= or a range. You can simply select everything on or after that date minus 7 days.
The error in your code above is because you want this:
SELECT date - INTERVAL '7 day' as date FROM end_date
And not this:
SELECT date FROM end_date - INTERVAL '7 day'
You were subtracting the interval from the CTE itself, which is a table, and that doesn't make sense.
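The asker's two-CTE layout works once the subtraction is applied to the column. Here is a hedged SQLite sketch of that corrected shape, run from Python's sqlite3 against a hypothetical files table holding one row per day. SQLite spells date - INTERVAL '7 day' as date(date, '-7 days'), and the filter keeps the question's original '(]' semantics by excluding the start date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (date TEXT)")
# Hypothetical sample data: one row per day from 2022-09-20 to 2022-10-02.
days = [(f"2022-09-{d:02d}",) for d in range(20, 31)]
days += [("2022-10-01",), ("2022-10-02",)]
conn.executemany("INSERT INTO files VALUES (?)", days)

rows = conn.execute("""
WITH end_date AS (
  SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS (
  -- Subtract from the column, not from the CTE name.
  SELECT date(date, '-7 days') AS date FROM end_date
)
SELECT f.date
FROM files f
CROSS JOIN start_date s
CROSS JOIN end_date e
WHERE f.date > s.date AND f.date <= e.date
ORDER BY f.date
""").fetchall()

print([r[0] for r in rows])
```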

Postgresql Distinct Statement

How can I get distinct minute values from a timestamp column?
For example, if the table contains 100 records within one minute, I want the count of records (or whether any are present) per minute.
For example,
SELECT DISTINCT(timestamp) FROM customers WHERE DATE(timestamp) = CURRENT_DATE
The result should look like:
timestamp record
30-12-2019 11:30 5
30-12-2019 11:31 8

One option is a ::date conversion of the timestamp column in the WHERE clause, combined with GROUP BY:
SELECT timestamp, count(*)
FROM tab
WHERE timestamp::date = current_date
GROUP BY timestamp
timestamp::date could be replaced with date(timestamp), as in your query.
Update: if the table contains data with precision up to microseconds, then
SELECT to_char(timestamp,'YYYY-MM-DD HH24:MI'), count(*)
FROM tab
WHERE date(timestamp) = current_date
GROUP BY to_char(timestamp,'YYYY-MM-DD HH24:MI')
might be considered.

Try something like the following:
SELECT DATE_TRUNC('minute', timestamp) as timestamp, COUNT(*) as record
FROM customers
WHERE DATE(timestamp) = CURRENT_DATE
GROUP BY DATE_TRUNC('minute', timestamp)
ORDER BY DATE_TRUNC('minute', timestamp)
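Both answers come down to grouping on the timestamp truncated to the minute. As a runnable sketch: SQLite has no DATE_TRUNC, so this stand-in groups on strftime('%Y-%m-%d %H:%M', ts), which plays the role of to_char(timestamp, 'YYYY-MM-DD HH24:MI'). The column is renamed ts and the microsecond-precision rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ts TEXT)")
# Hypothetical rows with microsecond precision.
conn.executemany("INSERT INTO customers VALUES (?)", [
    ("2019-12-30 11:30:05.123456",),
    ("2019-12-30 11:30:41.000001",),
    ("2019-12-30 11:31:00.500000",),
    ("2019-12-30 11:31:30.250000",),
    ("2019-12-30 11:31:59.999999",),
])

rows = conn.execute("""
SELECT strftime('%Y-%m-%d %H:%M', ts) AS minute,  -- truncate to the minute
       COUNT(*) AS record
FROM customers
WHERE date(ts) = :day
GROUP BY minute
ORDER BY minute
""", {"day": "2019-12-30"}).fetchall()

for minute, record in rows:
    print(minute, record)
```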

fails in user acceptance

create table Minutes(Minute varchar2(5));
create table orders(OrderID varchar(54), Orderplaced TIMESTAMP ,
Ordercompleted TIMESTAMP);
insert into orders
VALUES
('#1',TO_TIMESTAMP('2018-01-15 00:12:20', 'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP( '2018-01-15 00:12:42', 'YYYY-MM-DD HH24:MI:SS'));
insert into orders
VALUES
('#2',TO_TIMESTAMP('2018-01-15 01:15:20', 'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP( '2018-01-15 02:56:20', 'YYYY-MM-DD HH24:MI:SS'));
insert into orders
VALUES
('#3',TO_TIMESTAMP('2018-01-15 01:20:20', 'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP( '2018-01-15 03:00:20', 'YYYY-MM-DD HH24:MI:SS'));
insert into Minutes (Minute)
select to_char(trunc(sysdate) + interval '1' minute * (level - 1),
'HH24:MI') as minute
from dual
connect by level <= 1440;
select a.Minute, nvl(count(b.OrderID),0) as orders
from Minutes a
left join orders b
on a.Minute between to_char(cast( b.Orderplaced as date),'hh24:mi:ss') and
to_char(cast( b.Ordercompleted as date),'hh24:mi:ss')
where
a.Minute <= (select to_char(cast (sysdate as date),'hh24:mi:ss') from dual)
group by a.Minute
order by 1;
The processing time is too long, and no result is ever delivered.
It works fine in integration testing. Please take a look.

I ran your code, and it works OK for those test tables. However, I'd suggest a few modifications:
- you don't have to CAST values from the ORDERS table
- even worse is CAST(SYSDATE AS DATE): SYSDATE is a function that already returns the DATE data type
- there's no need to select it from DUAL; you can use it "as is"
- COUNT(column) returns 0 when there is nothing to count, so you can omit the NVL function
Here's the modified SELECT statement:
SELECT a.minute, COUNT (b.orderid) AS orders
FROM minutes a
LEFT JOIN orders b
ON a.minute BETWEEN TO_CHAR (b.orderplaced, 'hh24:mi:ss')
AND TO_CHAR (b.ordercompleted, 'hh24:mi:ss')
WHERE a.minute <= TO_CHAR (SYSDATE, 'hh24:mi:ss')
GROUP BY a.minute
ORDER BY 1;
What does that mean for you? I don't know; as I said, it works OK here. The explain plan shows full table scans of both the MINUTES and ORDERS tables, so if those tables hold a huge number of rows, it might make a difference.
Consider creating indexes on columns you use; as you extract only time from the ORDERS table, those two would be function-based ones.
CREATE INDEX i1_min
ON minutes (minute);
CREATE INDEX i2_plac
ON orders (TO_CHAR (orderplaced, 'hh24:mi:ss'));
CREATE INDEX i3_compl
ON orders (TO_CHAR (ordercompleted, 'hh24:mi:ss'));
Then try again; hopefully, you'll see some improvement.

You said you're trying to get "the number of orders count per minute on a particular day", and later clarified that should be the current day. Your query is only looking at times - converted to strings - so it's looking at the same time slot across all records in your orders table. Really you want to restrict the found orders to the day you're interested in. Presumably your UAT environment just has much more data, across more days, than you created in IT.
You could just add a filter to restrict it to orders placed today:
select a.Minute, nvl(count(b.OrderID),0) as orders
from Minutes a
left join orders b
on a.Minute between to_char(cast( b.Orderplaced as date),'hh24:mi:ss') and
to_char(cast( b.Ordercompleted as date),'hh24:mi:ss')
and b.Orderplaced > trunc(sysdate) -- added this filter
where
a.Minute <= (select to_char(cast (sysdate as date),'hh24:mi:ss') from dual)
group by a.Minute
order by 1;
You don't need any of the casting, the subquery or nvl(), though, as @Littlefoot mentioned, so you can simplify that a bit to:
select a.Minute, count(b.OrderID) as orders
from Minutes a
left join orders b
on a.Minute between to_char(b.Orderplaced,'hh24:mi:ss') and
to_char(b.Ordercompleted,'hh24:mi:ss')
and b.Orderplaced > trunc(sysdate)
where a.Minute <= to_char(sysdate,'hh24:mi:ss')
group by a.Minute
order by 1;
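To see why the time-only join misbehaves on a larger data set, here is a small SQLite sketch (Python sqlite3, hypothetical rows; strftime('%H:%M:%S', ...) stands in for Oracle's to_char(..., 'hh24:mi:ss')). Two of the orders are from earlier days, yet the string comparison still counts them at 01:30 until the orderplaced filter is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE minutes (minute TEXT);  -- 'HH:MM', like the Minutes table
CREATE TABLE orders (orderid TEXT, orderplaced TEXT, ordercompleted TEXT);
INSERT INTO minutes VALUES ('01:30');
INSERT INTO orders VALUES
  ('#2', '2018-01-15 01:15:20', '2018-01-15 02:56:20'),
  ('#3', '2018-01-15 01:20:20', '2018-01-15 03:00:20'),
  ('#4', '2018-01-14 01:10:00', '2018-01-14 02:00:00'),  -- previous day
  ('#5', '2018-01-13 01:25:00', '2018-01-13 01:45:00');  -- two days earlier
""")

# {extra} is filled from fixed strings below, never from user input.
QUERY = """
SELECT a.minute, COUNT(b.orderid) AS orders
FROM minutes a
LEFT JOIN orders b
  ON a.minute BETWEEN strftime('%H:%M:%S', b.orderplaced)
                  AND strftime('%H:%M:%S', b.ordercompleted)
  {extra}
GROUP BY a.minute
"""

# Time-of-day only: matches the same slot on every day in the table.
time_only = conn.execute(QUERY.format(extra="")).fetchall()
# Restricted to orders placed on the day of interest.
today_only = conn.execute(
    QUERY.format(extra="AND b.orderplaced >= :today"),
    {"today": "2018-01-15"},
).fetchall()

print(time_only, today_only)
```

With more days of history, as in a UAT environment, every extra day adds rows to each minute's join, which is consistent with the query slowing down as data grows.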
You're still doing a lot of conversions and comparing strings rather than dates/timestamps. It might be simpler to generate the minutes for that specific day in a CTE instead of a permanent table, and to join using those values as well, without any further data conversions:
with minutes (minute) as (
select cast(trunc(sysdate) as timestamp) + interval '1' minute * (level - 1)
from dual
connect by level <= (sysdate - trunc(sysdate)) * 1440
)
select to_char(m.minute, 'HH24:MI') as minute, count(o.orderid) as orders
from minutes m
left join orders o
on o.orderplaced >= cast(trunc(sysdate) as timestamp)
and o.orderplaced <= m.minute
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
group by m.minute
order by m.minute;
I've included rows with no ordercompleted date, though it isn't clear if you want to count those.
You could also join on just the orderplaced date being today, which looks a bit odd, and do a conditional count:
with minutes (minute) as (
select cast(trunc(sysdate) as timestamp) + interval '1' minute * (level - 1)
from dual
connect by level <= (sysdate - trunc(sysdate)) * 1440
)
select to_char(m.minute, 'HH24:MI') as minute,
count(case when o.orderplaced <= m.minute
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
then o.orderid end) as orders
from minutes m
left join orders o
on o.orderplaced >= cast(trunc(sysdate) as timestamp)
group by m.minute
order by m.minute;
Either way this assumes you have an index on orderplaced.
Look at the execution plans for your original query and these options and any others suggested, and test with realistic data, to see which is the best approach for your actual data and requirements.
To look for records for a different, full day, change the sysdate references to a date/timestamp literal like timestamp '2018-01-15 00:00:00' or something relative like trunc(sysdate-1), include an end date on the orderplaced filter, and remove the end-time filter in the CTE, e.g.:
with minutes (minute) as (
select cast(trunc(sysdate - 1) as timestamp) + interval '1' minute * (level - 1)
from dual
connect by level <= 1440
)
select to_char(m.minute, 'HH24:MI') as minute, count(o.orderid) as orders
from minutes m
left join orders o
on o.orderplaced >= cast(trunc(sysdate - 1) as timestamp)
and o.orderplaced < cast(trunc(sysdate - 1) as timestamp) + interval '1' day
and o.orderplaced <= m.minute
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
group by m.minute
order by m.minute;
or
with minutes (minute) as (
select timestamp '2018-01-15 00:00:00' + interval '1' minute * (level - 1)
from dual
connect by level <= 1440
)
select to_char(m.minute, 'HH24:MI') as minute, count(o.orderid) as orders
from minutes m
left join orders o
on o.orderplaced >= timestamp '2018-01-15 00:00:00'
and o.orderplaced < timestamp '2018-01-16 00:00:00'
and o.orderplaced <= m.minute
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
group by m.minute
order by m.minute;
If you want to include rows where the placed and completed times are in the same minute, but still otherwise want to exclude rows from the minute they were placed, you'll need a bit more logic; maybe something like:
with minutes (minute) as (
select timestamp '2018-01-15 00:00:00' + interval '1' minute * (level - 1)
from dual
connect by level <= 1440
)
select to_char(m.minute, 'HH24:MI') as minute, count(o.orderid) as orders
from minutes m
left join orders o
on o.orderplaced >= timestamp '2018-01-15 00:00:00'
and o.orderplaced < timestamp '2018-01-16 00:00:00'
and ((trunc(o.ordercompleted, 'MI') > trunc(o.orderplaced, 'MI')
and o.orderplaced <= m.minute)
or (trunc(o.ordercompleted, 'MI') = trunc(o.orderplaced, 'MI')
and o.orderplaced < m.minute + interval '1' minute))
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
group by m.minute
order by m.minute;
If you need further refinements you'll need to modify the clauses to suit, which might need a bit of experimentation.
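Here is a rough SQLite translation of the full-day version and of the same-minute refinement, runnable from Python's sqlite3 with the three test orders from the question. It is a stand-in, not Oracle: the minutes come from a recursive CTE instead of CONNECT BY, datetime(m.minute, '+1 minute') replaces m.minute + interval '1' minute, and strftime('%Y-%m-%d %H:%M', ...) plays the role of trunc(..., 'MI').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderid TEXT, orderplaced TEXT, ordercompleted TEXT);
INSERT INTO orders VALUES
  ('#1', '2018-01-15 00:12:20', '2018-01-15 00:12:42'),
  ('#2', '2018-01-15 01:15:20', '2018-01-15 02:56:20'),
  ('#3', '2018-01-15 01:20:20', '2018-01-15 03:00:20');
""")

# One row per minute of 2018-01-15 (1440 rows), replacing CONNECT BY LEVEL.
MINUTES = """
WITH RECURSIVE minutes(minute) AS (
  SELECT '2018-01-15 00:00:00'
  UNION ALL
  SELECT datetime(minute, '+1 minute') FROM minutes
  WHERE minute < '2018-01-15 23:59:00'
)
"""

# Full-day query: placed at or before the minute, not yet completed at it.
BASIC = """
      o.orderplaced <= m.minute
  AND (o.ordercompleted IS NULL OR o.ordercompleted >= m.minute)
"""

# Refinement: orders placed and completed in the same minute count there.
REFINED = """
      ((strftime('%Y-%m-%d %H:%M', o.ordercompleted)
          > strftime('%Y-%m-%d %H:%M', o.orderplaced)
        AND o.orderplaced <= m.minute)
    OR (strftime('%Y-%m-%d %H:%M', o.ordercompleted)
          = strftime('%Y-%m-%d %H:%M', o.orderplaced)
        AND o.orderplaced < datetime(m.minute, '+1 minute')))
  AND (o.ordercompleted IS NULL OR o.ordercompleted >= m.minute)
"""


def per_minute(condition):
    """Return {'HH:MM': open order count} for every minute of the day."""
    sql = MINUTES + f"""
SELECT strftime('%H:%M', m.minute), COUNT(o.orderid)
FROM minutes m
LEFT JOIN orders o
  ON o.orderplaced >= '2018-01-15 00:00:00'
 AND o.orderplaced <  '2018-01-16 00:00:00'
 AND {condition}
GROUP BY m.minute
ORDER BY m.minute
"""
    return dict(conn.execute(sql).fetchall())


basic = per_minute(BASIC)
refined = per_minute(REFINED)
print(basic["00:12"], basic["01:30"], refined["00:12"])  # 0 2 1
```

With the basic join, order #1 (placed and completed within 00:12) is never counted; the refined version counts it at 00:12 only and otherwise gives the same per-minute counts.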

Redshift: Running query using GETDATE() at specified list of times

So, I have a query that uses GETDATE() in WHERE and HAVING clauses:
SELECT GETDATE(), COUNT(*) FROM (
SELECT 1 FROM events
WHERE (event_time > (GETDATE() - interval '25 hours'))
GROUP BY id
HAVING MAX(event_time) BETWEEN (GETDATE() - interval '25 hours') AND (GETDATE() - interval '24 hours')
)
I'm basically trying to find the number of unique ids that have their latest event_time between 25 and 24 hours ago with respect to the current time.
The problem: I have another table, query_dts, which contains one column of timestamps. Instead of running the above query on the current time, using GETDATE(), I need to run it on the timestamp of every entry in the query_dts table. Any ideas?
Note: I'm not really storing query_dts anywhere. I've created it like this:
WITH query_dts AS (
SELECT (
DATEADD(hour,-(row_number() over (order by true)), getdate())
) as n
FROM events LIMIT 48
),
which I got from here

How about avoiding the generator altogether and instead just splitting the intervals:
SELECT
dateadd(hour, -distance, getdate()),
count(0) AS event_count
FROM (
SELECT
id,
datediff(hour, max(event_time), getdate()) AS distance
FROM events
WHERE event_time > getdate() - INTERVAL '2 days'
GROUP BY id) AS events_with_distance
GROUP BY distance;

You can use a JOIN to combine the two queries. Then you just need to substitute the values for your date expression. I think this is the logic:
WITH query_dts AS (
SELECT DATEADD(hour, -(row_number() over (order by true)), getdate()) as n
FROM events
LIMIT 48
)
SELECT i.n, COUNT(*)
FROM (SELECT d.n, e.id
FROM events e JOIN
query_dts d
ON e.event_time > d.n - interval '25 hours'
GROUP BY d.n, e.id
HAVING MAX(e.event_time) BETWEEN d.n - interval '25 hours' AND d.n - interval '24 hours'
) i
GROUP BY i.n;
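The general pattern is to replace GETDATE() with a column from a joined table of reference times and to add that column to every GROUP BY. Below is a small SQLite sketch of it, run from Python's sqlite3. The events rows, the fixed :now and the three hourly reference times are hypothetical (the question uses 48), and datetime(d.n, '-25 hours') stands in for d.n - interval '25 hours'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id TEXT, event_time TEXT);
INSERT INTO events VALUES
  ('x', '2024-06-01 11:30:00'),
  ('y', '2024-06-01 10:30:00'),
  ('z', '2024-06-01 09:15:00'), ('z', '2024-06-01 10:45:00'),
  ('w', '2024-06-01 09:30:00'), ('w', '2024-06-02 08:00:00'),
  ('v', '2024-06-01 09:40:00');
""")

rows = conn.execute("""
WITH RECURSIVE query_dts(n, k) AS (
  -- Reference times: now, 1 hour ago and 2 hours ago.
  SELECT :now, 0
  UNION ALL
  SELECT datetime(n, '-1 hours'), k + 1 FROM query_dts WHERE k < 2
)
SELECT i.n, COUNT(*) AS ids
FROM (
  -- GETDATE() is replaced by d.n, and d.n is added to the GROUP BY.
  SELECT d.n, e.id
  FROM events e
  JOIN query_dts d ON e.event_time > datetime(d.n, '-25 hours')
  GROUP BY d.n, e.id
  HAVING MAX(e.event_time) BETWEEN datetime(d.n, '-25 hours')
                               AND datetime(d.n, '-24 hours')
) i
GROUP BY i.n
ORDER BY i.n DESC
""", {"now": "2024-06-02 12:00:00"}).fetchall()

print(rows)
```

Each reference time gets its own 24 to 25 hour window. Id 'w' is never counted because its latest event is more recent than every window. As in the Redshift version, reference times with no matching ids simply produce no row.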

Here's what I ended up doing:
WITH max_time_table AS
(
SELECT id, max(event_time) AS max_time
FROM events
WHERE (event_time > GETDATE() - interval '74 hours')
GROUP BY id
),
query_dts AS
(
SELECT (DATEADD(hour,-(row_number() over (ORDER BY TRUE) - 1), getdate()) ) AS n
FROM events LIMIT 48
)
SELECT query_dts.n, COUNT(*)
FROM max_time_table JOIN query_dts
ON max_time_table.max_time BETWEEN (query_dts.n - interval '25 hours') AND (query_dts.n - interval '24 hours')
GROUP BY query_dts.n
ORDER BY query_dts.n DESC
Here, I selected 74 hours because I needed to cover 48 hours plus the 25-hour lookback, i.e. 73 hours.
The problem is that this isn't a general-purpose way of doing this. It's a highly specific solution for this particular problem. Can someone think of a more general way of running a query dependent on GETDATE() using a column of dates in another table?

Postgresql get max row group by column

I am trying to get the max row from the sum of daily counts in a table. I have looked at several posts that look similar, but it doesn't seem to work. I have tried to follow "Get MAX row for GROUP in MySQL", but it doesn't work in Postgres. Here's what I have:
select source, SUM(steps) as daily_steps, to_char("endTime"::date, 'MM/DD/YYYY') as step_date
from activities
where user_id = 1
and "endTime" <= CURRENT_TIMESTAMP + INTERVAL '1 day'
and "endTime" >= CURRENT_TIMESTAMP - INTERVAL '7 days'
group by source, to_char("endTime"::date, 'MM/DD/YYYY')
This returns the following
source, daily_steps, step_date
"walking";750;"11/17/2015"
"walking";821;"11/22/2015"
"walking";106;"11/20/2015"
"running";234;"11/21/2015"
"running";600;"11/24/2015"
I would like the result to return only the rows that have the max value for daily_steps by source. The result should look like
source, daily_steps, step_date
"walking";821;"11/22/2015"
"running";600;"11/24/2015"

Postgres offers the convenient distinct on syntax:
select distinct on (a.source) a.*
from (select source, SUM(steps) as daily_steps, to_char("endTime"::date, 'MM/DD/YYYY') as step_date
from activities a
where user_id = 1 and
"endTime" <= CURRENT_TIMESTAMP + INTERVAL '1 day' and
"endTime" >= CURRENT_TIMESTAMP - INTERVAL '7 days'
group by source, to_char("endTime"::date, 'MM/DD/YYYY')
) a
order by a.source, daily_steps desc;
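DISTINCT ON is PostgreSQL-specific. A portable equivalent is ROW_NUMBER() partitioned by source and ordered by daily_steps DESC, keeping rn = 1. The sketch below runs that in SQLite from Python's sqlite3. It feeds in the five aggregated rows from the question directly, as a hypothetical daily table, instead of the activities table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily (source TEXT, daily_steps INTEGER, step_date TEXT)")
# The aggregated rows shown in the question.
conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", [
    ("walking", 750, "11/17/2015"),
    ("walking", 821, "11/22/2015"),
    ("walking", 106, "11/20/2015"),
    ("running", 234, "11/21/2015"),
    ("running", 600, "11/24/2015"),
])

rows = conn.execute("""
SELECT source, daily_steps, step_date
FROM (
  -- Rank each source's days by step count, highest first.
  SELECT d.*, ROW_NUMBER() OVER (
           PARTITION BY source ORDER BY daily_steps DESC) AS rn
  FROM daily d
)
WHERE rn = 1
ORDER BY source
""").fetchall()

print(rows)
```

As with DISTINCT ON, if two days tie on daily_steps, only one of them is kept.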
