How to know the frequency of persisted rows in a database table - sql

I'm working on a web application with a PostgreSQL database. We have a table that contains rows persisted by another app.
These rows arrive from Kafka every day.
This is the structure of this table:
creation_date status
...
...
20/07/2021 11:47 CREATED
20/07/2021 11:47 CREATED
20/07/2021 11:47 TREATED
20/07/2021 11:45 TREATED
20/07/2021 11:45 TREATED
20/07/2021 11:44 TREATED
20/07/2021 11:44 TREATED
...
...
19/07/2021 16:20 TREATED
19/07/2021 16:16 TREATED
...
...
19/07/2021 08:10 TREATED
19/07/2021 08:08 ERROR
19/07/2021 08:05 TREATED
19/07/2021 08:04 TREATED
19/07/2021 08:02 ERROR
...
...
So new rows are persisted every day.
My goal is to use these rows to find the time interval in which we receive rows each day
(e.g. between 7am and 9am, or between 8am and 10am).
I want to know this interval so that I can run an SQL query every day that counts the rows in this interval.
If the count is 0, I will send an alert saying there is a problem and no events were received today; otherwise everything is okay.
Do you have any idea how I can determine this time interval from the existing data?
Regards.
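One way to find such an interval from the history is to check, for each hour of the day, on how many distinct days rows arrived in that hour, and keep only the hours in which rows arrived on every day. Below is a minimal Python sketch under a few assumptions: the creation_date values have already been fetched as strings in the dd/mm/yyyy HH:MM format shown above, and the function names (hours_with_daily_arrivals, count_in_window) and the sample data are hypothetical. The same idea could also be expressed in SQL by grouping on the date and the hour of creation_date.

```python
from collections import defaultdict
from datetime import date, datetime

FMT = "%d/%m/%Y %H:%M"  # assumed format of creation_date, as in the table above

def hours_with_daily_arrivals(timestamps):
    """Return the hours of day (0-23) in which rows arrived on every observed day."""
    days_per_hour = defaultdict(set)  # hour -> days with at least one row in that hour
    all_days = set()
    for ts in timestamps:
        dt = datetime.strptime(ts, FMT)
        days_per_hour[dt.hour].add(dt.date())
        all_days.add(dt.date())
    return sorted(h for h, days in days_per_hour.items() if days == all_days)

def count_in_window(timestamps, day, start_hour, end_hour):
    """Count rows received on `day` between start_hour (inclusive) and end_hour (exclusive)."""
    count = 0
    for ts in timestamps:
        dt = datetime.strptime(ts, FMT)
        if dt.date() == day and start_hour <= dt.hour < end_hour:
            count += 1
    return count

# Hypothetical sample: rows arrive around 08:00 on both days.
sample = [
    "19/07/2021 08:02",
    "19/07/2021 08:10",
    "19/07/2021 16:20",
    "20/07/2021 08:05",
    "20/07/2021 11:47",
]
print(hours_with_daily_arrivals(sample))  # [8]
if count_in_window(sample, date(2021, 7, 21), 8, 10) == 0:
    print("ALERT: no rows received in the 08:00-10:00 window")
```

Once a window has been chosen this way, the daily check is just a count over that window, and a count of 0 triggers the alert. A longer history gives a more reliable window than a couple of days.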

Related

Averaging the data across two calendar years and defining the beginning month

I have data for the period from December 2013 to November 2018. I converted it into a data frame as shown here.
Date 0.1 0.2 0.3 0.4 0.5 0.6
2013-12-01 301.04 297.4 296.63 295.76 295.25 295.25
2013-12-04 297.96 297.15 296.25 295.25 294.43 293.45
2013-12-05 298.4 297.61 296.65 295.81 294.75 293.89
2013-12-08 298.82 297.95 297.15 296.25 295.45 294.41
2013-12-09 298.65 297.65 296.95 296.02 295.13 294.05
2013-12-12 299.05 297.33 296.65 295.81 294.85 293.85
2013-12-16 301.05 300.28 299.38 298.45 297.65 296.51
....
2014-01-10 301.65 297.45 296.46 295.52 294.65 293.56
2014-01-11 301.99 298.95 298.39 297.15 296.05 295.11
2014-01-12 299.86 298.65 297.73 296.82 296.35 295.37
2014-01-13 299.25 298.15 297.3 296.43 295.26 294.31
I want to compute the monthly mean and the seasonal mean of this data.
For the monthly mean, I tried
df.resample('M').mean()
And it worked well.
For seasons, I would like to split this data into four three-month seasons (Dec-Feb, Mar-May, Jun-Aug and Sep-Nov). I tried resampling with a three-month interval, i.e.
df.resample('3M').mean()
However, this did not work well: it gives the average for the starting December month separately and then uses calendar-year intervals (i.e. January to March, and so on).
I would like to know whether there is a way to avoid this by specifying the month in which the period of consideration begins.
I would also like to know whether these seasons can be defined beforehand and the data grouped accordingly, so that the averages are easier to get.
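You can define the origin in resample:
df.resample('M', origin=pd.Timestamp('2013-12-01')).mean()
Note, however, that the pandas documentation says origin only takes effect for fixed (Tick) frequencies such as days, hours and minutes, not for months or quarters. For three-month seasons starting in December, a quarterly frequency anchored so that quarters start in December gives the Dec-Feb, Mar-May, Jun-Aug and Sep-Nov bins:
df.resample('QS-DEC').mean()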
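For the second part of the question, defining the seasons beforehand and grouping by them, the sketch below maps each month to a season label in plain Python (standard library only, so it runs without pandas). The convention that December is counted with the following January and February, and the (year, season) keys, are assumptions of this sketch. In pandas, the same keys could be built from the index and passed to groupby.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Month -> season label: DJF = Dec-Feb, MAM = Mar-May, JJA = Jun-Aug, SON = Sep-Nov.
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF",
           3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA",
           9: "SON", 10: "SON", 11: "SON"}

def season_key(d):
    """Return (season_year, label); December counts towards the next year's DJF (assumed convention)."""
    year = d.year + 1 if d.month == 12 else d.year
    return (year, SEASONS[d.month])

def seasonal_means(rows):
    """rows: iterable of (date, value) pairs; returns {(season_year, label): mean value}."""
    groups = defaultdict(list)
    for d, value in rows:
        groups[season_key(d)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}

# A few values from the 0.1 column above; all belong to the Dec 2013 - Feb 2014 season.
rows = [
    (date(2013, 12, 1), 301.04),
    (date(2013, 12, 4), 297.96),
    (date(2014, 1, 10), 301.65),
    (date(2014, 1, 11), 301.99),
]
print(seasonal_means(rows))
```

Keying on the season year keeps December 2013 together with January and February 2014 instead of starting a separate bin, which is the behaviour the question asks for.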

Create additional date, time and timezone columns based on the date and state columns in a Python dataframe

I have 3 columns called 'customer_state', 'call_date' and 'call_time' in my dataframe, and I want to create 3 new columns: 'customer_timezone', 'customer_date' and 'customer_time'.
Possible values for the timezone are:
Eastern Standard Time (EST)
Central Standard Time (CST)
Mountain Standard Time (MST)
Pacific Standard Time (PST)
Note: call_time is in Mountain Standard Time and in 24-hour format.
My dataframe looks like this:
call_date call_time customer_state
2019-11-01 13:46 MD
My resulting dataframe should look like this:
call_date call_time customer_state customer_timezone customer_date customer_time
2019-11-01 13:46 MD EST 2019-11-01 16:46
Any help is appreciated!
Additional note: to simplify the solution, my data only has 'call_time' values between 6am and 4pm MST, so I don't have to worry about the date changing (for instance, 9pm MST would be 12am the next day in EST). I do not have to handle these edge cases. In fact, 'call_date' and 'customer_date' are always the same in my scenario. I just need to add 3 hours to the time.
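A minimal sketch of the row-level logic in plain Python, assuming a hand-written state-to-timezone table. The table is hypothetical and partial, and only the +3 hours for EST comes from the question (13:46 becomes 16:46 for MD); the offsets for the other zones are left for you to fill in for your reference clock. With pandas, the same dictionaries could be applied column-wise with Series.map.

```python
from datetime import datetime, timedelta

# Hypothetical, partial state -> timezone mapping; extend it for your data.
STATE_TO_TZ = {"MD": "EST", "NY": "EST", "IL": "CST", "CO": "MST", "CA": "PST"}

# Hours to add to call_time for each customer timezone.
TZ_OFFSET_HOURS = {
    "EST": 3,  # taken from the question (13:46 -> 16:46)
    # "CST": ..., "MST": ..., "PST": ...  fill these in for your setup
}

def add_customer_columns(row):
    """Return a copy of `row` with customer_timezone, customer_date and customer_time added."""
    tz = STATE_TO_TZ[row["customer_state"]]
    call_dt = datetime.strptime(f"{row['call_date']} {row['call_time']}", "%Y-%m-%d %H:%M")
    # Adding a timedelta also rolls the date over, should that ever be needed.
    customer_dt = call_dt + timedelta(hours=TZ_OFFSET_HOURS[tz])
    return {
        **row,
        "customer_timezone": tz,
        "customer_date": customer_dt.strftime("%Y-%m-%d"),
        "customer_time": customer_dt.strftime("%H:%M"),
    }

print(add_customer_columns({"call_date": "2019-11-01", "call_time": "13:46", "customer_state": "MD"}))
```

Working on a full datetime rather than on the time string alone means the edge cases mentioned in the note are handled for free if the data ever extends beyond 6am-4pm.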

Stored time as number. Problem with formatting - 00 as seconds

Time is stored as a number in my database - NUMBER(6)
For times just after midnight, the value is stored like this: 1800, i.e. 00:18:00.
I'm using lpad to pad it to 6 characters, lpad(TIME,6,0), which gives me 001800.
However, when I cast it to a timestamp with TO_TIMESTAMP(lpad(TIME,6,0), 'HH24MISS'), I only get the first day of the month:
2019-07-01
When the time is 000713 (00:07:13), it is converted without problems:
2019-07-01 00:07:13
What seems to be the issue?
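To confirm that the stored number itself decodes unambiguously, here is a minimal Python sketch (not Oracle code) of the same steps: zero-pad to six digits, as lpad(TIME,6,0) does, then split into hours, minutes and seconds. The function name number_to_time is hypothetical. It only illustrates the arithmetic on the stored value; it does not reproduce how TO_TIMESTAMP results are displayed.

```python
from datetime import time

def number_to_time(n):
    """Decode an HHMMSS value stored as a number, e.g. 1800 -> 00:18:00."""
    padded = f"{n:06d}"  # zero-pad to 6 digits, like lpad(TIME, 6, 0): 1800 -> "001800"
    hours, minutes, seconds = int(padded[:2]), int(padded[2:4]), int(padded[4:])
    return time(hours, minutes, seconds)

print(number_to_time(1800))  # 00:18:00
print(number_to_time(713))   # 00:07:13
```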

Salesforce REST API get a list of updated records

I want to get a list of Account records updated between two particular times.
For example, this is my API query:
https://ax1.salesforce.com/services/data/v29.0/sobjects/Account/updated/?start=2015-06-30T06%3A49%3A00%2B00%3A00&end=2015-06-30T16%3A30%3A26%2B00%3A00
In my Salesforce org, the time zone I have chosen is Indian Standard Time, which is UTC+5:30.
I created an account at 16:45 on 30 June, Indian time (Salesforce shows this time in the Created By field of the account).
However, for the query above, where I chose a start time of 06:49 and an end time of 16:30, I still got the ID of the record I added at 16:45 Indian time, even though it shouldn't appear in the response.
The following is the response:
{
"ids": [
"0019000001QeOINAA3"
],
"latestDateCovered": "2015-06-30T09:00:00.000+0000"
}
Also, latestDateCovered says only 09:00.
I don't understand this behaviour.
Could somebody explain how it works?
The REST API works with UTC DateTime values.
So you searched for records between:
2015-06-30 06:49:00 UTC
2015-06-30 16:30:26 UTC
Which I make to be:
2015-06-30 12:19:00 IST
2015-06-30 22:00:26 IST
So it makes sense that a record created at 16:45 IST on 30 June appears in the results: 16:45 IST falls between 12:19 and 22:00 IST.
Try checking the SystemModstamp and LastModifiedDate fields in the API for the record in question, as those values are also in UTC.
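Since the start and end parameters are interpreted as UTC, a window expressed in Indian time has to be shifted back by 5:30 before it goes into the URL. A minimal Python sketch using only the standard library; the helper name to_utc_param is hypothetical, the path is the one from the question, and the URL-encoding mirrors the format of the query above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

IST = timezone(timedelta(hours=5, minutes=30))  # Indian Standard Time, UTC+5:30

def to_utc_param(local_dt):
    """Convert an aware datetime to UTC and URL-encode it like 2015-06-30T06%3A49%3A00%2B00%3A00."""
    return quote(local_dt.astimezone(timezone.utc).isoformat(), safe="")

# The window the question intended, expressed in IST.
start = datetime(2015, 6, 30, 6, 49, 0, tzinfo=IST)
end = datetime(2015, 6, 30, 16, 30, 26, tzinfo=IST)
path = "/services/data/v29.0/sobjects/Account/updated/"
print(f"{path}?start={to_utc_param(start)}&end={to_utc_param(end)}")

# The other direction, as worked out in the answer: 06:49 UTC is 12:19 IST.
print(datetime(2015, 6, 30, 6, 49, tzinfo=timezone.utc).astimezone(IST).strftime("%H:%M"))  # 12:19
```

With this conversion, 06:49-16:30 IST becomes 01:19-11:00 UTC, so a record created at 16:45 IST (11:15 UTC) would fall outside the requested window.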

Is there any way to generate dynamic timetable test data?

I'm working on a project for a railway company and want to test my application with some data. The problem is that the data I need is a timetable like the one below:
departure_location arrival_location departure_time arrival_time
a b 2015-04-01 08:00 2015-04-01 08:30
b c 2015-04-01 09:10 2015-04-01 09:44
c d 2015-04-01 10:05 2015-04-01 12:00
...
The difficulty is that the generated timetable should be logical: the arrival time of each stage should be later than its departure time but earlier than the departure time of the next stage. I also want to be able to specify a time range and have the timetable generated neatly within it. Ideally, I could also configure how long a train takes from source to destination and how long it stays at the arrival location before departing again.
The BullSheet Generator, a PHP tool, can work with "logical" events.
There is a code example here:
https://github.com/lingtalfi/BullSheet#the-populator-script-a-full-database-example-demo
(Look for the "the_events" expression).
The workflow is called timelines and is explained here: https://github.com/lingtalfi/BullSheet/blob/master/docs/timelines.md
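If a PHP tool is not an option, the core constraint (each arrival after its departure and before the next departure) is simple to enforce directly. A minimal Python sketch, not based on BullSheet: the function name generate_timetable, the cyclic station order, and the fixed travel and dwell durations are all assumptions. Random variation could be added to either duration as long as it stays positive.

```python
from datetime import datetime, timedelta

def generate_timetable(stations, start, end, travel, dwell):
    """Build legs (from, to, departure, arrival) that fit within [start, end].

    The train visits `stations` in order and wraps around to the first one; each leg
    takes `travel`, and the train waits `dwell` at each arrival location before departing again.
    """
    if travel <= timedelta(0) or dwell <= timedelta(0):
        raise ValueError("travel and dwell must be positive")
    legs = []
    departure = start
    i = 0
    while departure + travel <= end:
        arrival = departure + travel
        origin = stations[i % len(stations)]
        destination = stations[(i + 1) % len(stations)]
        legs.append((origin, destination, departure, arrival))
        departure = arrival + dwell  # next stage departs strictly after this arrival
        i += 1
    return legs

for leg in generate_timetable(["a", "b", "c", "d"],
                              datetime(2015, 4, 1, 8, 0), datetime(2015, 4, 1, 12, 0),
                              timedelta(minutes=30), timedelta(minutes=10)):
    print(leg[0], leg[1], leg[2].strftime("%Y-%m-%d %H:%M"), leg[3].strftime("%Y-%m-%d %H:%M"))
```

Because every departure is computed from the previous arrival plus a positive dwell time, the ordering constraint holds by construction rather than having to be checked afterwards.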