Retrieve mean time between failures within a given date range parameter - SQL

Below is what I have tried:
select machine_id, count(incident_id) "No_Incident",
(fail_date BETWEEN (24 * to_date('&From_date_', 'DDMMYYYY') AND to_date('&To_Date','DDMMYYYY') / count(incident_id))) "MTBF"
from mytable;

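This should work for you (Oracle):
SELECT machine_id, COUNT(incident_id) "No_Incident",
ROUND(ABS(TO_DATE('01-02-2019','DD-MM-YYYY') -
TO_DATE('03-02-2019 23:59:59','DD-MM-YYYY HH24:MI:SS')) * 24)
/ COUNT(incident_id) AS MTBF
FROM mytable
WHERE fail_date BETWEEN TO_DATE('01-02-2019','DD-MM-YYYY')
AND TO_DATE('03-02-2019 23:59:59','DD-MM-YYYY HH24:MI:SS')
GROUP BY machine_id;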
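The arithmetic in this query divides the length of the reporting period, in hours, by the number of incidents. A minimal Python sketch of that calculation, assuming the period bounds are already parsed into datetime values (the function name mtbf_for_period is hypothetical):

```python
from datetime import datetime


def mtbf_for_period(start: datetime, end: datetime, incidents: int) -> float:
    """Hours in the period (rounded to the nearest hour) divided by the incident count."""
    if incidents <= 0:
        raise ValueError("incidents must be positive")
    # Like Oracle's (date - date) * 24; Python's round() is round-half-to-even,
    # which differs from Oracle's ROUND only at exact .5 values
    period_hours = round(abs((end - start).total_seconds()) / 3600)
    return period_hours / incidents


# 01-02-2019 00:00:00 to 03-02-2019 23:59:59 is just under 72 hours, which rounds to 72
print(mtbf_for_period(datetime(2019, 2, 1), datetime(2019, 2, 3, 23, 59, 59), 3))  # 24.0
```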

You may consider the following examples:
Since machine_id is not aggregated (e.g. counted), I have used GROUP BY to return the Mean Time Between Failures (MTBF) for each machine. If you want the MTBF across all machines, simply remove machine_id from the SELECT clause and drop the GROUP BY machine_id.
Your query had a syntax error because you were filtering the date range with BETWEEN inside the SELECT clause. I have moved this condition into a WHERE clause. How parameters are handled may differ depending on how you execute the query (e.g. which SQL client you use).
Using Oracle
SELECT
machine_id,
count(incident_id) as "No_Incidents",
(
EXTRACT(
HOUR FROM
CAST(MAX(fail_date) AS TIMESTAMP) - CAST(MIN(fail_date) AS TIMESTAMP)
) +
EXTRACT(
DAY FROM
CAST(MAX(fail_date) AS TIMESTAMP) - CAST(MIN(fail_date) AS TIMESTAMP)
) * 24
)/count(incident_id) as "MTBF"
FROM
mytable
WHERE
fail_date BETWEEN to_date('&From_date_', 'DDMMYYYY') AND to_date('&To_Date','DDMMYYYY')
GROUP BY
machine_id
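In the Oracle example, I cast the dates to TIMESTAMP before taking the difference, because subtracting two TIMESTAMP values yields an INTERVAL DAY TO SECOND (subtracting two DATE values would give a plain number of days instead). Extracting the hours and the days with EXTRACT and computing hours + days * 24 gives the total number of hours; any remaining minutes and seconds are ignored.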
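To see what the EXTRACT arithmetic computes, here is a Python sketch that mirrors it with a timedelta: the days component times 24 plus the hours component. It assumes a non-negative interval (MAX(fail_date) >= MIN(fail_date)), and like the Oracle expression it drops minutes and seconds; interval_hours is a hypothetical name.

```python
from datetime import datetime, timedelta


def interval_hours(delta: timedelta) -> int:
    """Mirror EXTRACT(DAY ...) * 24 + EXTRACT(HOUR ...) for a non-negative interval."""
    if delta < timedelta(0):
        raise ValueError("expected MAX(fail_date) >= MIN(fail_date)")
    days = delta.days              # like EXTRACT(DAY FROM interval)
    hours = delta.seconds // 3600  # like EXTRACT(HOUR FROM interval), always 0..23
    return days * 24 + hours       # minutes and seconds are dropped


first_failure = datetime(2019, 2, 1, 8, 0)
last_failure = datetime(2019, 2, 3, 10, 30)
print(interval_hours(last_failure - first_failure))  # 2 days 2 h 30 min -> 50
```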
Using MySQL
SELECT
machine_id,
count(incident_id) as "No_Incidents",
(
TIMESTAMPDIFF(HOUR,min(fail_date),max(fail_date))
)/count(incident_id) as "MTBF"
FROM
mytable
WHERE
fail_date BETWEEN STR_TO_DATE('&From_date_', '%d%m%Y') AND STR_TO_DATE('&To_Date', '%d%m%Y')
GROUP BY
machine_id
In MySQL, I used the TIMESTAMPDIFF function to get the difference between the dates in hours.
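To try the per-machine GROUP BY logic without an Oracle or MySQL server, here is a sketch using Python's built-in sqlite3 module. SQLite has neither TIMESTAMPDIFF nor interval EXTRACT, so this swaps in SQLite's julianday() to get the span in fractional days; the table layout and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (incident_id INTEGER, machine_id INTEGER, fail_date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 1, "2019-02-01 00:00:00"),
        (2, 1, "2019-02-02 00:00:00"),
        (3, 1, "2019-02-03 00:00:00"),
        (4, 1, "2019-03-01 00:00:00"),  # outside the range below, removed by WHERE
        (5, 2, "2019-02-01 06:00:00"),
        (6, 2, "2019-02-01 18:00:00"),
    ],
)

# Same shape as the queries above: filter by date range, then aggregate per machine.
# julianday() returns fractional days, so multiplying by 24 gives hours.
rows = conn.execute(
    """
    SELECT machine_id,
           COUNT(incident_id) AS No_Incidents,
           (julianday(MAX(fail_date)) - julianday(MIN(fail_date))) * 24
               / COUNT(incident_id) AS MTBF
    FROM mytable
    WHERE fail_date BETWEEN ? AND ?
    GROUP BY machine_id
    ORDER BY machine_id
    """,
    ("2019-02-01 00:00:00", "2019-02-28 23:59:59"),
).fetchall()

# With the sample rows: machine 1 -> 48 h / 3 = 16.0, machine 2 -> 12 h / 2 = 6.0
for machine_id, incidents, mtbf in rows:
    print(machine_id, incidents, mtbf)
```

Storing fail_date as ISO 8601 text keeps the BETWEEN comparison correct, because such strings sort in chronological order.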

Related

How to get difference between two dates in days in postgres

The following is a query in Oracle:
SELECT start_date - TO_DATE('1900-01-01','YYYY-MM-DD') FROM start_table
In Oracle it returns 44680.3646, where start_date is 01-MAY-22.
What query would I need to write to get the same output in EDB and PostgreSQL?
If you want the fractional part of a day, convert the difference to a number of seconds using EXTRACT(EPOCH FROM ...) and divide by 86400 (the number of seconds in one day):
SELECT extract(epoch from '2022-05-01 11:44:16'::timestamp - '1900-05-02'::timestamp) / 86400 as date
Result: 44559.489074074074
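Note that this example subtracts '1900-05-02'. To reproduce the Oracle query, subtract '1900-01-01' instead, e.g. extract(epoch from start_date - '1900-01-01'::timestamp) / 86400.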
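The same fractional-day arithmetic can be checked in Python: subtract two datetime values and divide the elapsed seconds by 86400. This sketch reuses the timestamps from the answer above and the base date from the Oracle query; days_between is a hypothetical helper.

```python
from datetime import datetime


def days_between(later: datetime, earlier: datetime) -> float:
    """Fractional days between two timestamps, like EXTRACT(EPOCH FROM later - earlier) / 86400."""
    return (later - earlier).total_seconds() / 86400


# Values from the PostgreSQL answer above (base date 1900-05-02)
print(days_between(datetime(2022, 5, 1, 11, 44, 16), datetime(1900, 5, 2)))
# Base date used by the Oracle query: whole days from 1900-01-01 to 2022-05-01
print(days_between(datetime(2022, 5, 1), datetime(1900, 1, 1)))  # 44680.0
```

The second line matches the integer part of the Oracle result, 44680; the .3646 there comes from the time of day stored in start_date.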

Converting date format number to date and taking difference in SQL

I have a data set as below:
The values are dates in "YYYYMMDD" format. I want to convert the columns to dates and take the difference between them.
I used the code below:
SELECT to_date(statement_date_key::text, 'yyyymmdd') AS statement_date,
to_date(paid_date_key::text, 'yyyymmdd') AS paid_date,
statement_date - paid_date AS Diff_in_days
FROM Table
WHERE Diff_in_days >= 90
;
The idea is to convert both columns to dates, take the difference between them, and keep the rows where the difference is 90 days or more.
Later I was informed that the server runs HiveSQL, which does not support the "::" cast syntax or date time types, and that temp tables cannot be created.
I'm currently stuck on how to proceed given these constraints.
Help would be much appreciated.
Sample data for reference is provided in a dbfiddle.
Hive is a little convoluted in its use of dates. You can use unix_timestamp() and work from there:
SELECT datediff(to_date(from_unixtime(unix_timestamp(cast(statement_date_key as string), 'yyyyMMdd'))),
to_date(from_unixtime(unix_timestamp(cast(paid_date_key as string), 'yyyyMMdd')))
) as diff_in_days
FROM Table;
Note that you need to use a subquery if you want to use diff_in_days in a where clause.
Also, if you have date keys, then presumably you also have a calendar table, which should make this much simpler.
The query below uses SQL Server (T-SQL) syntax, so it will not run as-is on Hive:
select * from (
select convert(date, cast(statement_date_key as char(8))) AS statement_date,
convert(date, cast(paid_date_key as char(8))) AS paid_date,
datediff(day, convert(date, cast(statement_date_key as char(8))),
convert(date, cast(paid_date_key as char(8)))) as Diff_in_days
from Table
) qry
where Diff_in_days >= 90
Simple way: the function unix_timestamp(string, pattern) converts a string in the given format to the number of seconds since the Unix epoch. Calculate the difference in seconds, then divide by 86400 (60*60*24) to get the difference in days.
select * from
(
select t.*,
(unix_timestamp(string(paid_date_key), 'yyyyMMdd') -
unix_timestamp(string(statement_date_key), 'yyyyMMdd'))/86400 as Diff_in_days
from Table t
) t
where Diff_in_days>=90
You may want to add abs() if the difference can be negative.
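Here is the same epoch-seconds idea as a Python sketch: parse each yyyyMMdd key, convert it to seconds since the Unix epoch, and divide the difference by 86400. It assumes the keys are integers such as 20190729 and treats them as UTC midnights (Hive's unix_timestamp uses the session time zone instead); the function names and sample rows are hypothetical.

```python
import calendar
from datetime import datetime


def key_to_epoch_seconds(date_key: int) -> int:
    """Like unix_timestamp(string(date_key), 'yyyyMMdd'), but evaluated in UTC."""
    parsed = datetime.strptime(str(date_key), "%Y%m%d")
    return calendar.timegm(parsed.timetuple())


def diff_in_days(paid_date_key: int, statement_date_key: int) -> float:
    """(paid - statement) in seconds, divided by 86400 as in the query above."""
    return (key_to_epoch_seconds(paid_date_key) - key_to_epoch_seconds(statement_date_key)) / 86400


rows = [(20190101, 20190501), (20190301, 20190315)]  # (statement_date_key, paid_date_key)
late = [r for r in rows if diff_in_days(r[1], r[0]) >= 90]
print(late)  # [(20190101, 20190501)]
```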
One more method using regexp_replace:
select * from
(
select t.*,
datediff(date(regexp_replace(string(paid_date_key), '(\\d{4})(\\d{2})(\\d{2})','$1-$2-$3')),
date(regexp_replace(string(statement_date_key), '(\\d{4})(\\d{2})(\\d{2})','$1-$2-$3'))) as Diff_in_days
from Table t
) t
where Diff_in_days>=90
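The regexp_replace variant works by reshaping yyyyMMdd into yyyy-MM-dd so that it can be read as a date. A Python sketch of that step, using re.sub with the same pattern (key_to_date is a hypothetical helper):

```python
import re
from datetime import date


def key_to_date(date_key: int) -> date:
    # Insert dashes: 20190729 -> "2019-07-29", mirroring regexp_replace(..., '$1-$2-$3')
    iso = re.sub(r"(\d{4})(\d{2})(\d{2})", r"\1-\2-\3", str(date_key))
    return date.fromisoformat(iso)


# Hive's datediff(enddate, startdate) counts whole days from startdate to enddate
print((key_to_date(20190501) - key_to_date(20190101)).days)  # 120
```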

How to calculate the average number of records added per day in BigQuery?

I have a table in BigQuery with a column Published_date of type "Timestamp". I want to calculate the average number of rows added per day (for a specific month) in that table. I have the following query:
SELECT AVG(Num_Rows)
FROM (SELECT [Day]=DAY( Published_Date ), Num_Rows=COUNT(*)
FROM `mytable`
WHERE Published_Date BETWEEN '20190729' AND '20190729 '
GROUP BY DAY( Published_Date ) ) AS Z
But it generates the following error:
Could not cast literal "20190729" to type TIMESTAMP
How should I deal with the timestamp, given that I only need the date from the timestamp column?
I want to calculate avg no of rows added per day (for a specific month) in that table
Below is an example for BigQuery Standard SQL:
#standardSQL
SELECT AVG(Num_Rows) AS avg_rows_per_day
FROM (
SELECT DATE(Published_Date) AS day, COUNT(*) AS Num_Rows
FROM `project.dataset.mytable`
WHERE DATE(Published_Date) BETWEEN '2019-07-01' AND '2019-07-31'
GROUP BY day
)
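For intuition, the nested query counts rows per calendar day and then averages those counts. A Python sketch of the same two steps over in-memory timestamps, assuming timezone-aware values bucketed by UTC date (as DATE(Published_Date) does by default); the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import date, datetime, timezone
from statistics import mean
from typing import Iterable


def avg_rows_per_day(published: Iterable[datetime], first_day: date, last_day: date) -> float:
    """Average number of rows per UTC calendar day within [first_day, last_day]."""
    # Inner query: DATE(Published_Date) AS day, COUNT(*) AS Num_Rows ... GROUP BY day
    utc_days = (ts.astimezone(timezone.utc).date() for ts in published)
    per_day = Counter(d for d in utc_days if first_day <= d <= last_day)
    # Outer query: AVG(Num_Rows). Days without rows form no group, so they are not averaged in.
    return mean(per_day.values())


published = [
    datetime(2019, 7, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2019, 7, 1, 17, 30, tzinfo=timezone.utc),
    datetime(2019, 7, 3, 12, 0, tzinfo=timezone.utc),
    datetime(2019, 8, 1, 0, 5, tzinfo=timezone.utc),  # outside July, filtered out
]
print(avg_rows_per_day(published, date(2019, 7, 1), date(2019, 7, 31)))  # 1.5
```

As in the SQL, a month with no rows at all has nothing to average; here statistics.mean raises StatisticsError in that case.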
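Use explicit conversion, with a half-open range so that the whole day is included (BETWEEN with the same timestamp on both sides would only match rows at exactly midnight):
WHERE Published_Date >= TIMESTAMP('2019-07-29') AND Published_Date < TIMESTAMP('2019-07-30')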
Note that you have a column called "_date", but the error is saying that the value is a timestamp. I find this confusing. We use a convention of using _ts in columns that are timestamps (and _dt for datetime and _date for date).
Why is this important? The timestamp is in UTC, so you might need to be careful about time zones and time components, which is not obvious from a column called Published_Date.

How can I extract just the hour of a timestamp using standardSQL

I've tried everything and no function works. The problem is that I have to extract the hour from a column whose values are in the following format: 2018-07-09T02:40:23.652Z
If I use a literal date, it works, but if I use the column I get the error below:
Syntax error: Expected ")" but got identifier "searchIntention" at [4:32]
Here is the query:
#standardSQL
select TOTAL, dia, hora FROM
(SELECT cast(replace(replace(searchIntention.createdDate,'T',' '),'Z','')as
DateTime) AS DIA,
FORMAT_DATETIME("%k", DATETIME searchIntention.createdDate) as HORA,
count(searchintention.id) as Total
from `searchs.searchs2016626`
GROUP BY DIA)
Please, help me. :(
How can I extract just the hour of a timestamp using standardSQL?
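Below is for BigQuery Standard SQL.
You can use EXTRACT(HOUR FROM yourTimeStampColumn). If the column holds STRING values such as 2018-07-09T02:40:23.652Z, convert it first with TIMESTAMP(yourStringColumn).
For example: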
SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP())
or
SELECT EXTRACT(HOUR FROM TIMESTAMP '2018-07-09T02:40:23.652Z')
or
SELECT EXTRACT(HOUR FROM TIMESTAMP('2018-07-09T02:40:23.652Z'))
In BigQuery Standard SQL, you can use the EXTRACT timestamp function to return an INT64 value corresponding to the part of the timestamp that you want to retrieve.
The full list of available parts is in the EXTRACT documentation; for your use case, you can use the HOUR part to retrieve the hour of a TIMESTAMP field as an INT64.
#standardSQL
# Create a table
WITH table AS (
SELECT TIMESTAMP("2018-07-09T02:40:23.652Z") time
)
# Extract values from a Timestamp expression
SELECT
EXTRACT(DAY FROM time) as day,
EXTRACT(MONTH FROM time) as month,
EXTRACT(YEAR FROM time) as year,
EXTRACT(HOUR FROM time) AS hour,
EXTRACT(MINUTE FROM time) as minute,
EXTRACT(SECOND from time) as second
FROM
table
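The asker's createdDate values look like ISO 8601 strings. As an illustration outside BigQuery, Python can parse that exact format and read the hour; this assumes Python 3.7+, where the %z directive accepts a trailing Z, and hour_of is a hypothetical helper.

```python
from datetime import datetime


def hour_of(created_date: str) -> int:
    # "%f" reads the fractional seconds, "%z" reads the trailing "Z" (UTC) in Python 3.7+
    parsed = datetime.strptime(created_date, "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.hour


print(hour_of("2018-07-09T02:40:23.652Z"))  # 2
```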

PostgreSQL: SELECT * from table WHERE timestamp IS WITHIN THIS MONTH

For some reason I'm kind of lost on how to achieve:
SELECT * FROM table WHERE timestamp IS WITHIN THIS MONTH;
I've looked at https://www.postgresql.org/docs/9.4/static/functions-datetime.html, but I am only able to select X days back.
I'm running PostgreSQL 9.4
... WHERE date_trunc('month', timestamp)
= date_trunc('month', current_timestamp);
Alternatively:
... WHERE timestamp >= date_trunc('month', current_timestamp)
AND timestamp < date_trunc('month', current_timestamp) + INTERVAL '1 month';
The second version can use an index on timestamp, the first would need one on the expression date_trunc('month', timestamp).
Why don't you just filter the month with a range?
Pass the start of this month as one variable and the start of next month as another:
SELECT * FROM table WHERE
timestamp >= __month_start AND timestamp < __next_month_start
e.g.
SELECT * FROM table
WHERE
(
timestamp >= '20170701'::timestamp
AND
timestamp < '20170801'::timestamp
)
Unlike using functions in the where-clause, this maintains sargability.
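The two bounds can be computed on the client before passing them as query parameters. A minimal Python sketch, assuming naive datetimes in the same time zone as the column; month_bounds is a hypothetical helper, and the placeholder syntax in the comment depends on your driver.

```python
from datetime import datetime
from typing import Tuple


def month_bounds(now: datetime) -> Tuple[datetime, datetime]:
    """Return (start of this month, start of next month) for a half-open range."""
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if month_start.month == 12:
        next_month_start = month_start.replace(year=month_start.year + 1, month=1)
    else:
        next_month_start = month_start.replace(month=month_start.month + 1)
    return month_start, next_month_start


start, end = month_bounds(datetime(2017, 7, 15, 13, 5))
# e.g. WHERE timestamp >= <start> AND timestamp < <end>, passed as bound parameters
print(start, end)  # 2017-07-01 00:00:00 2017-08-01 00:00:00
```

Using the start of next month as an exclusive upper bound avoids having to know how many days the month has or how precise the stored timestamps are.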
What Laurenz Albe suggested will work; however, with the date_trunc('month', timestamp) version you pay a performance penalty, because a plain index on that column cannot be used. You either have to index the expression you are going to query (apparently PostgreSQL allows this: https://www.postgresql.org/docs/current/static/indexes-expressional.html) or create a separate column that stores yyyy-mm values and query that.