Orders-per-minute query fails in user acceptance - SQL

create table Minutes(Minute varchar2(5));
create table orders(OrderID varchar(54), Orderplaced TIMESTAMP ,
Ordercompleted TIMESTAMP);
insert into orders
VALUES
('#1',TO_TIMESTAMP('2018-01-15 00:12:20', 'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP( '2018-01-15 00:12:42', 'YYYY-MM-DD HH24:MI:SS'));
insert into orders
VALUES
('#2',TO_TIMESTAMP('2018-01-15 01:15:20', 'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP( '2018-01-15 02:56:20', 'YYYY-MM-DD HH24:MI:SS'));
insert into orders
VALUES
('#3',TO_TIMESTAMP('2018-01-15 01:20:20', 'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP( '2018-01-15 03:00:20', 'YYYY-MM-DD HH24:MI:SS'));
insert into Minutes (Minute)
select to_char(trunc(sysdate) + interval '1' minute * (level - 1),
'HH24:MI') as minute
from dual
connect by level <= 1440;
select a.Minute, nvl(count(b.OrderID),0) as orders
from Minutes a
left join orders b
on a.Minute between to_char(cast( b.Orderplaced as date),'hh24:mi:ss') and
to_char(cast( b.Ordercompleted as date),'hh24:mi:ss')
where
a.Minute <= (select to_char(cast (sysdate as date),'hh24:mi:ss') from dual)
group by a.Minute
order by 1;
The processing time is too long and no result is ever returned. It works fine in integration testing. Please have a look.

I ran your code; it works OK for those test tables. However, I'd suggest a few modifications:
- you don't have to CAST values from the ORDERS table
- even worse is CAST(SYSDATE AS DATE) - SYSDATE is a function that returns the DATE data type anyway
- there's no need to select it from DUAL - you can use it "as is"
- COUNT will return 0 even if there's nothing to count, so you can omit the NVL function
Here's the modified SELECT statement:
SELECT a.minute, COUNT (b.orderid) AS orders
FROM minutes a
LEFT JOIN orders b
ON a.minute BETWEEN TO_CHAR (b.orderplaced, 'hh24:mi:ss')
AND TO_CHAR (b.ordercompleted, 'hh24:mi:ss')
WHERE a.minute <= TO_CHAR (SYSDATE, 'hh24:mi:ss')
GROUP BY a.minute
ORDER BY 1;
What does it mean for you? I don't know. As I said, it works OK for me. The explain plan says it performs a full table scan of both the MINUTES and ORDERS tables, so if there are zillions of rows in those tables, it might make a difference.
Consider creating indexes on the columns you use; as you extract only the time portion from the ORDERS table, those two would be function-based indexes.
CREATE INDEX i1_min
ON minutes (minute);
CREATE INDEX i2_plac
ON orders (TO_CHAR (orderplaced, 'hh24:mi:ss'));
CREATE INDEX i3_compl
ON orders (TO_CHAR (ordercompleted, 'hh24:mi:ss'));
Then try again; hopefully, you'll see some improvement.
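To check whether the optimizer actually picks the new indexes up, you can display the execution plan yourself; a minimal sketch using the standard EXPLAIN PLAN statement and DBMS_XPLAN package:
EXPLAIN PLAN FOR
SELECT a.minute, COUNT (b.orderid) AS orders
FROM minutes a
LEFT JOIN orders b
ON a.minute BETWEEN TO_CHAR (b.orderplaced, 'hh24:mi:ss')
AND TO_CHAR (b.ordercompleted, 'hh24:mi:ss')
WHERE a.minute <= TO_CHAR (SYSDATE, 'hh24:mi:ss')
GROUP BY a.minute
ORDER BY 1;

SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY);
INDEX RANGE SCAN operations in the output, instead of TABLE ACCESS FULL, would mean the indexes are being used.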

You said you're trying to get "the number of orders count per minute on a particular day", and later clarified that should be the current day. Your query is only looking at times - converted to strings - so it's looking at the same time slot across all records in your orders table. Really you want to restrict the found orders to the day you're interested in. Presumably your UAT environment just has much more data, across more days, than you created in IT.
You could just add a filter to restrict it to orders placed today:
select a.Minute, nvl(count(b.OrderID),0) as orders
from Minutes a
left join orders b
on a.Minute between to_char(cast( b.Orderplaced as date),'hh24:mi:ss') and
to_char(cast( b.Ordercompleted as date),'hh24:mi:ss')
and b.Orderplaced > trunc(sysdate) -- added this filter
where
a.Minute <= (select to_char(cast (sysdate as date),'hh24:mi:ss') from dual)
group by a.Minute
order by 1;
though you don't need any of the casting or the subquery or nvl(), as @Littlefoot mentioned, so you can simplify that a bit to:
select a.Minute, count(b.OrderID) as orders
from Minutes a
left join orders b
on a.Minute between to_char(b.Orderplaced,'hh24:mi:ss') and
to_char(b.Ordercompleted,'hh24:mi:ss')
and b.Orderplaced > trunc(sysdate)
where a.Minute <= to_char(sysdate,'hh24:mi:ss')
group by a.Minute
order by 1;
You're still doing a lot of conversions and comparing strings rather than dates/timestamps. It might be simpler to generate the minutes for that specific day in a CTE instead of a permanent table, and join using those values as well, without doing any further data conversions:
with minutes (minute) as (
select cast(trunc(sysdate) as timestamp) + interval '1' minute * (level - 1)
from dual
connect by level <= (sysdate - trunc(sysdate)) * 1440
)
select to_char(m.minute, 'HH24:MI') as minute, count(o.orderid) as orders
from minutes m
left join orders o
on o.orderplaced >= cast(trunc(sysdate) as timestamp)
and o.orderplaced <= m.minute
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
group by m.minute
order by m.minute;
I've included rows with no ordercompleted date, though it isn't clear if you want to count those.
You could also join on just the orderplaced date being today, which looks a bit odd, and do a conditional count:
with minutes (minute) as (
select cast(trunc(sysdate) as timestamp) + interval '1' minute * (level - 1)
from dual
connect by level <= (sysdate - trunc(sysdate)) * 1440
)
select to_char(m.minute, 'HH24:MI') as minute,
count(case when o.orderplaced <= m.minute
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
then o.orderid end) as orders
from minutes m
left join orders o
on o.orderplaced >= cast(trunc(sysdate) as timestamp)
group by m.minute
order by m.minute;
Either way this assumes you have an index on orderplaced.
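If that index doesn't exist yet it's a plain single-column one; the index name here is just illustrative:
CREATE INDEX orders_placed_i
ON orders (orderplaced);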
Look at the execution plans for your original query and these options and any others suggested, and test with realistic data, to see which is the best approach for your actual data and requirements.
To look for records for a different, full day, change the sysdate references to a date/timestamp literal like timestamp '2018-01-15 00:00:00' or something relative like trunc(sysdate - 1); include an end date on orderplaced; and remove the end-time filter in the CTE, e.g.:
with minutes (minute) as (
select cast(trunc(sysdate - 1) as timestamp) + interval '1' minute * (level - 1)
from dual
connect by level <= 1440
)
select to_char(m.minute, 'HH24:MI') as minute, count(o.orderid) as orders
from minutes m
left join orders o
on o.orderplaced >= cast(trunc(sysdate - 1) as timestamp)
and o.orderplaced < cast(trunc(sysdate - 1) as timestamp) + interval '1' day
and o.orderplaced <= m.minute
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
group by m.minute
order by m.minute;
or
with minutes (minute) as (
select timestamp '2018-01-15 00:00:00' + interval '1' minute * (level - 1)
from dual
connect by level <= 1440
)
select to_char(m.minute, 'HH24:MI') as minute, count(o.orderid) as orders
from minutes m
left join orders o
on o.orderplaced >= timestamp '2018-01-15 00:00:00'
and o.orderplaced < timestamp '2018-01-16 00:00:00'
and o.orderplaced <= m.minute
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
group by m.minute
order by m.minute;
If you want to include rows where the placed and completed times are in the same minute, but still otherwise want to exclude rows from the minute they were placed, you'll need a bit more logic; maybe something like:
with minutes (minute) as (
select timestamp '2018-01-15 00:00:00' + interval '1' minute * (level - 1)
from dual
connect by level <= 1440
)
select to_char(m.minute, 'HH24:MI') as minute, count(o.orderid) as orders
from minutes m
left join orders o
on o.orderplaced >= timestamp '2018-01-15 00:00:00'
and o.orderplaced < timestamp '2018-01-16 00:00:00'
and ((trunc(o.ordercompleted, 'MI') > trunc(o.orderplaced, 'MI')
and o.orderplaced <= m.minute)
or (trunc(o.ordercompleted, 'MI') = trunc(o.orderplaced, 'MI')
and o.orderplaced < m.minute + interval '1' minute))
and (o.ordercompleted is null or o.ordercompleted >= m.minute)
group by m.minute
order by m.minute;
If you need further refinements you'll need to modify the clauses to suit, which might need a bit of experimentation.


How to get date time difference in PostgreSQL
I am using the syntax below:
select id, A_column,B_column,
(SELECT count(*) AS count_days_no_weekend
FROM generate_series(B_column ::timestamp , A_column ::timestamp, interval '1 day') the_day
WHERE extract('ISODOW' FROM the_day) < 5) * 24 + DATE_PART('hour', B_column::timestamp-A_column ::timestamp ) as hrs
FROM table req where id='123';
If A_column = 2020-05-20 00:00:00 and B_column = 2020-05-15 00:00:00, I want to get 72 (in hours).
Is there any way to skip weekends (Saturday and Sunday) in the first part, i.e. to get the result as 72 hours, excluding the weekend hours?
Currently I am getting 0, but I need to get 72 hours.
And if A_column = 2020-08-15 12:00:00 and B_column = 2020-08-15 00:00:00, I want to get 12 (in hours).
One option uses a lateral join and generate_series() to enumerate each and every hour between the two timestamps, while filtering out weekends:
select t.a_column, t.b_column, h.count_hours_no_weekend
from mytable t
cross join lateral (
select count(*) count_hours_no_weekend
from generate_series(t.b_column::timestamp, t.a_column::timestamp, interval '1 hour') s(col)
where extract('isodow' from s.col) < 6 -- isodow 6 = Saturday, 7 = Sunday
) h
where id = 123
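If you want to sanity-check the hour enumeration without touching a real table, the same lateral join runs against literal values. A standalone sketch - note that generate_series includes both endpoints, so the upper bound is trimmed by one hour here to count whole elapsed hours:
select h.count_hours_no_weekend
from (values (timestamp '2020-05-15 00:00:00',
timestamp '2020-05-20 00:00:00')) t(b_column, a_column)
cross join lateral (
select count(*) as count_hours_no_weekend
from generate_series(t.b_column, t.a_column - interval '1 hour', interval '1 hour') s(col)
where extract('isodow' from s.col) < 6
) h;
This returns 72 for the question's first example: 24 hours each for the Friday, Monday and Tuesday, with all Saturday and Sunday hours filtered out.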
I would attack this by calculating the weekend hours to let the database deal with daylight savings time. I would then subtract the intervening weekend hours from the difference between the two date values.
with weekend_days as (
select *, date_part('isodow', ddate) as dow
from table1
cross join lateral
generate_series(
date_trunc('day', b_column),
date_trunc('day', a_column),
interval '1 day') as gs(ddate)
where date_part('isodow', ddate) in (6, 7)
), weekend_time as (
select id,
sum(
least(ddate + interval '1 day', a_column) -
greatest(ddate, b_column)
) as we_ival
from weekend_days
group by id
)
select t.id,
a_column - b_column as raw_difference,
coalesce(we_ival, interval '0') as adjustment,
a_column - b_column -
coalesce(we_ival, interval '0') as adj_difference
from table1 t
left join weekend_time w on w.id = t.id; -- drive from table1 so ids with no weekend rows still appear
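To try it without a real table, you can stub table1 with the question's first sample row (the table and column names just follow the query above):
create table table1 (id int, a_column timestamp, b_column timestamp);
insert into table1
values (123, timestamp '2020-05-20 00:00:00', timestamp '2020-05-15 00:00:00');
For id 123 this gives a raw_difference of 5 days, an adjustment of 2 days (the intervening Saturday and Sunday), and an adj_difference of 3 days, i.e. the 72 hours the question asks for.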

Speed up query where results with count(*) = 0 are included

I have a table squitters with, amongst others, a column parsed_time. I want to know the number of records per hour for the last two days and used this query:
SELECT date_trunc('hour', parsed_time) AS hour , count(*)
FROM squitters
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
GROUP BY hour
ORDER BY hour DESC;
This works, but hours with zero records do not appear in the result. I want to have hours
with zero records also in the result with a count equal to zero, so I wrote this query using the generate_series function:
SELECT bins.hour, count(squitters.parsed_time)
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(hour)
LEFT OUTER JOIN squitters ON bins.hour = date_trunc('hours', squitters.parsed_time)
GROUP BY bins.hour
ORDER BY bins.hour DESC;
This works (the results now include hour bins with counts equal to zero), but it is considerably slower.
How can I have the speed of the first query with the count=zero results of the second query?
(btw. there is an index on parsed_time)
You could try changing the join condition so that no date function is applied to the column parsed_time:
SELECT b.hour, COUNT(s.parsed_time) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
LEFT OUTER JOIN squitters s
ON s.parsed_time >= b.hour
AND s.parsed_time < b.hour + interval '1 hour'
GROUP BY b.hour
ORDER BY b.hour DESC;
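With the index on parsed_time in place, you can confirm this rewrite lets the planner use it by prefixing the query with EXPLAIN (standard PostgreSQL); an index scan on squitters in place of a sequential scan is what you are hoping to see:
EXPLAIN (ANALYZE)
SELECT b.hour, COUNT(s.parsed_time) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
LEFT OUTER JOIN squitters s
ON s.parsed_time >= b.hour
AND s.parsed_time < b.hour + interval '1 hour'
GROUP BY b.hour
ORDER BY b.hour DESC;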
Alternatively, you could also try using a correlated subquery (or a lateral join) instead of a left join - this avoids the need for outer aggregation:
SELECT
b.hour,
(
SELECT COUNT(*)
FROM squitters s
WHERE s.parsed_time >= b.hour AND s.parsed_time < b.hour + interval '1 hour'
) cnt
FROM generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') b(hour)
ORDER BY b.hour desc
You could take advantage of Common Table Expressions to divide your problem into small chunks:
WITH cte AS (
--First query your table
SELECT date_trunc('hour', parsed_time) AS sq_hour , count(*)
FROM squitters
WHERE parsed_time > date_trunc('hour', now()) - interval '2 day'
GROUP BY sq_hour
ORDER BY sq_hour DESC
), series AS (
--Create the series without the data returned from 1st query
SELECT
bins.series_hour,
0
FROM
generate_series(date_trunc('hour', now() - interval '2 day'), now(), '1 hour') bins(series_hour)
WHERE
series_hour not in (SELECT sq_hour FROM cte)
)
--Union the result
SELECT * FROM cte
UNION
SELECT * FROM series
ORDER BY 1
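Since the series branch explicitly excludes every hour already present in cte, the two branches can never overlap, so UNION ALL would return the same rows while skipping the de-duplication step; a minor tweak, assuming that disjointness holds:
--Union the result; UNION ALL is safe here because the branches are disjoint
SELECT * FROM cte
UNION ALL
SELECT * FROM series
ORDER BY 1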

get count of records in every hour in the last 24 hour

I need the number of records in each hour of the last 24 hours, and I need my query to show 0 if there are no records in a particular hour. At the moment I can only get data for the hours that exist in the table.
SELECT TRUNC(systemdate,'HH24') + (trunc(to_char(systemdate,'mi')/10)*10)/24/60 AS date1,
count(*) AS txncount
FROM transactionlog
GROUP BY TRUNC(systemdate,'HH24') + (trunc(to_char(systemdate,'mi')/10)*10)/24/60 order by date1 desc;
What should I do to get data in each hour of the last 24 hours?
Expected data: a record count for each hour of the last 24 hours, starting from the current date/time; if no record exists in a particular hour, 0 is shown.
The following might be what you need. It seems to work when I run it against the all_objects view.
WITH date_range
AS (SELECT TRUNC(sysdate - (rownum/24),'HH24') as the_hour
FROM dual
CONNECT BY ROWNUM <= 1000),
the_data
AS (SELECT TRUNC(created, 'HH24') as cr_ddl, count(*) as num_obj
FROM all_objects
GROUP BY TRUNC(created, 'HH24'))
SELECT TO_CHAR(dr.the_hour,'DD/MM/YYYY HH:MI AM'), NVL(num_obj,0)
FROM date_range dr LEFT OUTER JOIN the_data ao
ON ao.cr_ddl = dr.the_hour
ORDER BY dr.the_hour DESC
The 'date_range' CTE generates a record for each hour going back from sysdate (1,000 hours in this demo; the version tailored to your table below uses 24).
The 'the_data' CTE counts the records in your target table, grouped by the date truncated to the hour.
The main query then outer joins the two of them showing the date and the count from the sub-query.
I prefer both parts of the query in their own CTE because it makes the actual query very obvious and 'clean'.
In terms of your query, you want this:
WITH date_range
AS (SELECT TRUNC(sysdate - (rownum/24),'HH24') as the_hour
FROM dual
CONNECT BY ROWNUM <= 24),
the_data
AS (SELECT TRUNC(systemdate, 'HH24') as log_date, count(*) as num_obj
FROM transactionlog
GROUP BY TRUNC(systemdate, 'HH24'))
SELECT TO_CHAR(dr.the_hour,'DD/MM/YYYY HH:MI AM'), NVL(trans_log.num_obj,0)
FROM date_range dr LEFT OUTER JOIN the_data trans_log
ON trans_log.log_date = dr.the_hour
ORDER BY dr.the_hour DESC
You could use this:
WITH transactionlog AS
(
SELECT TO_DATE('03/05/2018 01:12','dd/mm/yyyy hh24:mi') AS systemdate, 60 AS value
FROM dual UNION ALL
SELECT TO_DATE('03/05/2018 01:32','dd/mm/yyyy hh24:mi'), 35 FROM dual UNION ALL
SELECT TO_DATE('03/05/2018 09:44','dd/mm/yyyy hh24:mi'), 31 FROM dual UNION ALL
SELECT TO_DATE('03/05/2018 08:56','dd/mm/yyyy hh24:mi'), 24 FROM dual UNION ALL
SELECT TO_DATE('03/05/2018 08:02','dd/mm/yyyy hh24:mi'), 98 FROM dual
)
, time_range AS
(
SELECT TRUNC(sysdate, 'hh24') - 23/24 + (ROWNUM - 1) / 24 AS time1
FROM all_objects
WHERE ROWNUM <= 24
)
SELECT TO_CHAR(r.time1, 'mm/dd/yyyy hh:mi AM') AS date1,
COUNT(t.systemdate) AS txncount
FROM time_range r
LEFT JOIN transactionlog t
ON r.time1 = TRUNC(t.systemdate, 'hh24') --+ 1/24
GROUP BY r.time1
ORDER BY r.time1;
If a record at 01:12 AM should be counted in the 02:00 AM bucket (rather than 01:00 AM), uncomment the + 1/24.
Reference: Generating Dates between two date ranges (AskTOM)
Edited: For OP, you only need this:
WITH time_range AS
(
SELECT TRUNC(sysdate, 'hh24') - 23/24 + (ROWNUM - 1) / 24 AS time1
FROM all_objects
WHERE ROWNUM <= 24
)
SELECT TO_CHAR(r.time1, 'mm/dd/yyyy hh:mi AM') AS date1,
COUNT(t.systemdate) AS txncount
FROM time_range r
LEFT JOIN transactionlog t
ON r.time1 = TRUNC(t.systemdate, 'hh24') --+ 1/24
GROUP BY r.time1
ORDER BY r.time1;
You need to build a last-24-hours calendar table, then LEFT JOIN it to the original table:
- count(t.systemdate): count t.systemdate rather than *, because t.systemdate is NULL for hours with no matching rows
- connect by builds the last-24-hours calendar table
- the ON clause uses TO_CHAR(t.systemdate,'YYYY/MM/DD hh24','nls_language=american') to make sure the date format language is the same on both sides
You can try this.
WITH Hours as
(
select sysdate - (level/24) dates -- step back one hour per row
from dual
connect by level <= 24
)
SELECT TO_CHAR(h.dates,'YYYY-MM-DD HH24') AS dateHour, count(t.systemdate) AS totalcount
FROM Hours h
LEFT JOIN transactionlog t
on TO_CHAR(t.systemdate,'YYYY/MM/DD hh24','nls_language=american')
= TO_CHAR(h.dates,'YYYY/MM/DD hh24','nls_language=american')
GROUP BY h.dates
ORDER BY h.dates
sqlfiddle:http://sqlfiddle.com/#!4/73db7/2
Recursive CTE version
You can also use a recursive CTE to build the calendar table:
WITH Hours(dates,i) as
(
SELECT sysdate,1
FROM DUAL
UNION ALL
SELECT sysdate - (i/24), i+1 -- step back one hour per row
FROM Hours
WHERE i<24
)
SELECT TO_CHAR(h.dates,'YYYY-MM-DD HH24') AS dateHour, count(t.systemdate) AS totalcount
FROM Hours h
LEFT JOIN transactionlog t
on TO_CHAR(t.systemdate,'YYYY/MM/DD hh24','nls_language=american')
= TO_CHAR(h.dates,'YYYY/MM/DD hh24','nls_language=american')
GROUP BY h.dates
ORDER BY h.dates
sqlfiddle:http://sqlfiddle.com/#!4/73db7/7

Grab abandoned carts from the last hour in Oracle Responsys

I'm trying to grab people out of a table who have an abandon date between 20 minutes ago and 2 hours ago. This seems to grab the right window of time, but the results are all 4 hours old:
SELECT *
FROM $A$
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(SELECT EMAIL_ADDRESS_ FROM $B$ WHERE ORDERDATE >= sysdate - 4)
Also, it grabs every record for everyone, and I only want the most recently abandoned product (highest ABANDONDATE) for each email address. I can't seem to figure this one out.
If the results are EXACTLY four hours old, it is possible that there is a time zone mismatch. What is the EXACT data type of ABANDONDATE in your database? Perhaps TIMESTAMP WITH TIME ZONE? Four hours seems like the difference between UTC and EDT (Eastern U.S. with the daylight saving time offset).
For your other question, did you EXPECT your query to only pick up the most recent product abandoned? Which part of your query would do that? Instead, you need to add row_number() over (partition by [whatever identifies clients etc.] order by abandondate), make the resulting query into a subquery and wrap it within an outer query where you filter by (WHERE clause) rn = 1. We can help with this if you show us the table structure (name and data type of columns in the table - only the relevant columns - including which is or are Primary Key).
Try
SELECT * FROM (
SELECT t.*,
row_number()
over (PARTITION BY email_address_ ORDER BY ABANDONDATE DESC) AS RN
FROM $A$ t
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(
SELECT EMAIL_ADDRESS_ FROM $B$
WHERE ORDERDATE >= sysdate - 4)
)
WHERE rn = 1
another approach
SELECT *
FROM $A$
WHERE (EMAIL_ADDRESS_, ABANDONDATE) IN (
SELECT EMAIL_ADDRESS_, MAX( ABANDONDATE )
FROM $A$
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(
SELECT EMAIL_ADDRESS_ FROM $B$
WHERE ORDERDATE >= sysdate - 4)
GROUP BY EMAIL_ADDRESS_
)
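A caveat that applies to both versions: in Oracle, NOT IN returns no rows at all if the subquery produces even one NULL EMAIL_ADDRESS_. If that can happen in $B$, NOT EXISTS is the safer pattern; a sketch based on the first query:
SELECT * FROM (
SELECT t.*,
row_number()
over (PARTITION BY email_address_ ORDER BY ABANDONDATE DESC) AS RN
FROM $A$ t
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND NOT EXISTS (
SELECT 1 FROM $B$ b
WHERE b.EMAIL_ADDRESS_ = t.EMAIL_ADDRESS_
AND b.ORDERDATE >= sysdate - 4)
)
WHERE rn = 1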

Postgresql get max row group by column

I am trying to get the row with the max sum of daily counts in a table. I have looked at several posts that look similar, and tried to follow
Get MAX row for GROUP in MySQL
but it doesn't work in Postgres. Here's what I have:
select source, SUM(steps) as daily_steps, to_char("endTime"::date, 'MM/DD/YYYY') as step_date
from activities
where user_id = 1
and "endTime" <= CURRENT_TIMESTAMP + INTERVAL '1 day'
and "endTime" >= CURRENT_TIMESTAMP - INTERVAL '7 days'
group by source, to_char("endTime"::date, 'MM/DD/YYYY')
This returns the following:
source, daily_steps, step_date
"walking";750;"11/17/2015"
"walking";821;"11/22/2015"
"walking";106;"11/20/2015"
"running";234;"11/21/2015"
"running";600;"11/24/2015"
I would like the result to contain only the rows that have the max value of daily_steps for each source. The result should look like:
source, daily_steps, step_date
"walking";821;"11/22/2015"
"running";600;"11/24/2015"
Postgres offers the convenient distinct on syntax:
select distinct on (a.source) a.*
from (select source, SUM(steps) as daily_steps, to_char("endTime"::date, 'MM/DD/YYYY') as step_date
from activities a
where user_id = 1 and
"endTime" <= CURRENT_TIMESTAMP + INTERVAL '1 day' and
"endTime" >= CURRENT_TIMESTAMP - INTERVAL '7 days'
group by source, to_char("endTime"::date, 'MM/DD/YYYY')
) a
order by a.source, daily_steps desc;
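If you ever need to keep ties, or want something portable beyond Postgres, a window function gives the same one row per source; a sketch equivalent to the distinct on version (swap row_number() for rank() if you want to keep ties):
select source, daily_steps, step_date
from (select source, SUM(steps) as daily_steps,
to_char("endTime"::date, 'MM/DD/YYYY') as step_date,
row_number() over (partition by source order by SUM(steps) desc) as rn
from activities
where user_id = 1 and
"endTime" <= CURRENT_TIMESTAMP + INTERVAL '1 day' and
"endTime" >= CURRENT_TIMESTAMP - INTERVAL '7 days'
group by source, to_char("endTime"::date, 'MM/DD/YYYY')
) t
where rn = 1;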