Reset a countdown column to its initial value in PostgreSQL - sql

I'm running pgAdmin v5.2 over PostgreSQL v13.3.
So I hit a wall with this one...
I'm running a query on a flight log table which is constantly updated with new flight data.
It has an engn_hrs_contdwn column which uses a window SUM function to accumulate the hours of each individual flight (hobs_total) and deducts that running total from 1200, the number of hours at which an engine MUST be replaced.
This is the query that I run:
SELECT fleet.fleet_id,
       (flt_log.date || ' ' || flt_log.tkof_01_time)::timestamp AS date,
       flt_log.hobs_total,
       1200 - SUM(hobs_total) OVER (
           PARTITION BY fleet_id
           ORDER BY (flt_log.date || ' ' || flt_log.tkof_01_time)::timestamp
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS engn_hrs_contdwn
FROM flt_log, fleet
WHERE flt_log.aircraft_id = fleet.fleet_id
  AND fleet_id = 2
;
fleet_id  date                 hobs_total  engn_hrs_contdwn
2         2020-08-09 08:49:00  0.20        1199.80
2         2020-08-09 11:17:00  3.70        1196.10
2         2020-08-09 15:42:00  0.70        1195.40
2         2020-08-09 17:54:00  2.40        1193.00
2         2020-08-12 07:21:00  0.50        1192.50
2         2020-08-13 06:50:00  2.40        1190.10
2         2020-08-13 15:11:00  1.50        1188.60
2         2020-08-13 20:35:00  0.70        1187.90
2         2020-08-14 09:17:00  2.40        1185.50
This query works fine for calculating the remaining hours, but once the countdown reaches 0 it returns negative values, which are of course useless for calculating the countdown for a new engine.
My problem is how to reset the countdown back to its initial value of 1200 every time engn_hrs_contdwn drops below 0, so that the column starts the countdown for the new engine, and so on.
Being a novice at PostgreSQL (and programming in general...), I researched this issue on the web and came across recursive queries and the CASE expression, which I think may be the direction I should take to tackle this issue.
But quite honestly, I got completely lost going over tutorials on these subjects and have failed so far in my efforts.
Any guidance will be much appreciated.
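To pin down what the reset should produce, the rule described above can be sketched outside SQL first. This is only a minimal Python illustration of the intended row-by-row logic, not a PostgreSQL solution. The function name is hypothetical, and it assumes that the flight which would push the countdown below 0 is charged to the new engine (the question does not say which engine that flight belongs to).

```python
from decimal import Decimal


def countdown_with_reset(flight_hours, limit=Decimal("1200")):
    """Return the remaining engine hours after each flight.

    Assumption (not stated in the question): when a flight would push
    the countdown below 0, the engine is considered replaced and that
    flight is counted against the new engine.
    """
    remaining = limit
    result = []
    for hours in flight_hours:
        remaining -= hours
        if remaining < 0:
            # Start a fresh countdown for the new engine.
            remaining = limit - hours
        result.append(remaining)
    return result


# Small limit so the reset is visible within a few rows.
demo = countdown_with_reset([Decimal(4)] * 4, limit=Decimal(10))
print(demo)
```

Each row depends on the previous row's result after a possible reset, which is why a plain running SUM cannot express it and why a recursive query is a natural direction to explore.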

Related

How do you iterate through a data frame based on the value in a row

I have a data frame that I am trying to iterate through, not based on time but on an increase in a column's value, for example 10.
Column A  Column B
12:05     1
13:05     6
14:05     11
15:05     16
So in this case it would return a new data frame with the rows containing 1 and 11. How can I do this? The methods I have tried, such as asfreq and resample, don't seem to work; they report an invalid frequency, which I think is because the selection is not time based. What function lets me select rows based on a numerical step such as 10 or 7 rather than on time? I don't want every nth row, but a row every time the column value has changed by 10 from the last selected value. For example, 1 to 11; then, if the next values were 12, 15, 17, 21, it would be 21.
Here is one way to do it:
# take the remainder after dividing by 10 and keep rows where it is zero
# offset by the first value to make the calculation simpler
first_val = df['Column B'].iloc[0]
df.loc[((df['Column B'] - first_val) % 10).eq(0)]
Column A Column B
0 12:05 1
2 14:05 11
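The remainder test only matches values that land exactly on a multiple of 10 from the first value. If "changes by 10 from the last selected value" is read as "has increased by at least 10 since the last selected row" (an assumption about the asker's intent), a single pass that remembers the last selected value is needed. A minimal sketch in plain Python, with a hypothetical function name:

```python
def select_by_step(values, step=10):
    """Return positions whose value is at least `step` above the
    previously selected value. The first position is always selected."""
    positions = []
    last = None
    for i, value in enumerate(values):
        if last is None or value - last >= step:
            positions.append(i)
            last = value
    return positions


positions = select_by_step([1, 6, 11, 16, 17, 21])
print(positions)

# With pandas, the positions can be fed back into the data frame:
# df.iloc[select_by_step(df['Column B'].tolist())]
```

Because each decision depends on the previously selected row, this does not reduce to a simple vectorized comparison the way the remainder approach does.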

Pandas group by date and get count while removing duplicates

I have a data frame that looks like this:
maid date hour count
0 023f1f5f-37fb-4869-a957-b66b111d808e 2021-08-14 13 2
1 023f1f5f-37fb-4869-a957-b66b111d808e 2021-08-14 15 1
2 0589b8a3-9d33-4db4-b94a-834cc8f46106 2021-08-13 23 14
3 0589b8a3-9d33-4db4-b94a-834cc8f46106 2021-08-14 0 1
4 104010f8-5f57-4f7c-8ad9-5fc3ec0f9f39 2021-08-11 14 2
5 11947b4a-ccf8-48dc-a6a3-925836b3c520 2021-08-13 7 1
I am trying to get a count of maids for each date in such a way that if a maid is included on one day, it is not included on any subsequent day. For example, 0589b8a3-9d33-4db4-b94a-834cc8f46106 is present on both the 13th and the 14th. I want to include the maid in the count for the 13th but not for the 14th, as it is already included in the 13th.
I have written the following code and it works for small data frames:
import pandas as pd
df=pd.read_csv('/home/ubuntu/uniqueSiteId.csv')
umaids=[]
tdf=[]
df['date']=pd.to_datetime(df.date)
df=df.sort_values('date')
df=df[['maid','date']]
df=df.drop_duplicates(['maid','date'])
dts=df['date'].unique()
for dt in dts:
    if not umaids:
        df1=df[df['date']==dt]
        k=df1['maid'].unique()
        umaids.extend(k)
        dff=df1
        fdf=df1.values.tolist()
    elif umaids:
        dfs=df[df['date']==dt]
        df2=dfs[~dfs['maid'].isin(umaids)]
        umaids.extend(df2['maid'].unique())
        sdf=df2.values.tolist()
        tdf.append(sdf)
ftdf = [item for t in tdf for item in t]
ndf=fdf+ftdf
ndf=pd.DataFrame(ndf,columns=['maid','date'])
print(ndf)
Since I have thousands of data frames, and my data frames often have more than a million rows, the above takes a long time to run. Is there a better way to do this?
The expected output is this:
maid date
0 104010f8-5f57-4f7c-8ad9-5fc3ec0f9f39 2021-08-11
1 0589b8a3-9d33-4db4-b94a-834cc8f46106 2021-08-13
2 11947b4a-ccf8-48dc-a6a3-925836b3c520 2021-08-13
3 023f1f5f-37fb-4869-a957-b66b111d808e 2021-08-14
As per the discussion in the comments, the solution is quite simple: sort the data frame by date and then drop duplicates by maid only. This keeps the first occurrence of each maid, which is also its earliest occurrence in time since we sorted by date. Then do the groupby as usual.
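The same three steps (sort by date, keep each maid's first occurrence, count per date) can be sketched in plain Python to show what the result looks like. The function name is hypothetical and the maid IDs are abbreviated for readability; in pandas the equivalent chain would be along the lines of df.sort_values('date').drop_duplicates('maid').groupby('date')['maid'].count().

```python
from collections import Counter


def first_seen_counts(rows):
    """Count maids per date, attributing each maid to its earliest date.

    `rows` is an iterable of (maid, date) pairs where date is an ISO
    string (YYYY-MM-DD), so string order equals chronological order.
    """
    seen = set()
    counts = Counter()
    for maid, date in sorted(rows, key=lambda row: row[1]):
        if maid not in seen:  # first (earliest) occurrence of this maid
            seen.add(maid)
            counts[date] += 1
    return dict(counts)


# Abbreviated IDs standing in for the full UUIDs in the question.
rows = [
    ("023f1f5f", "2021-08-14"),
    ("023f1f5f", "2021-08-14"),
    ("0589b8a3", "2021-08-13"),
    ("0589b8a3", "2021-08-14"),
    ("104010f8", "2021-08-11"),
    ("11947b4a", "2021-08-13"),
]
counts = first_seen_counts(rows)
print(counts)
```

This is a single sort followed by one pass with a set lookup, so it avoids the repeated per-date filtering and list membership checks of the original loop.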

Creating a Nested/Loop Calculation in Vertica (?)

So maybe I'm just way overthinking things, but is there any way to replicate a nested/loop calculation in Vertica with just SQL syntax?
Explanation:
In column AP I have remaining values per month by an attribute key; in column CHANGE_1M I have an attribution value to apply.
The goal is to fill in the future AP values: within a partition, each future row's AP is the preceding row's AP plus that AP multiplied by the current row's CHANGE_1M.
For reference, I have 15,000 keys per period and 60 periods per year in the full data set.
Sample calculation:
Period 5 = (Period4_AP * Period5_CHANGE_1M) + Period4_AP
Period 6 = (((Period4_AP * Period5_CHANGE_1M) + Period4_AP) * Period6_CHANGE_1M) + ((Period4_AP * Period5_CHANGE_1M) + Period4_AP)
etc.
Sample Data on Top
Expected Results below
Vertica does not have (yet?) the RECURSIVE WITH clause, which you would need for the recursive calculation you seem to be needing here.
The only possible workaround would be tedious: write (or generate, using Perl or Python, for example) as many nested queries as you need iterations.
I'll only detail this if you want to go down that path.
Long time no see - I should have returned to answer this question earlier.
I got so stuck on thinking of a programmatic way to solve this issue that I forgot it is a math equation, and where you have math functions you have solutions.
Basically, this question revolves around multiplying values down the rows of a table.
The solution is simply to use the LOG/LN functions to turn the multiplication into a sum, and convert back using EXP.
Hope this helps other lost souls: don't forget your math background, or you may spiral into a whirlpool of self-defeat.
Snippet of the simple solve:
EXP(SUM(LN(DEGREDATION)) OVER (ORDER BY PERIOD_NUMBER ASC ROWS UNBOUNDED PRECEDING)) AS DEGREDATION_RATE
** Add a PARTITION BY on whatever factors/attributes you need the data stratified by.
Basically, instead of starting from the retention PX/P0, I back into it with the degradation P1/P0, P2/P1, etc.
PERIOD_NUMBER  DEGRADATION  DEGREDATION_RATE  DEGREDATION_RATE x 100000
0              100.00%      100.00%           100000.00
1              57.72%       57.72%            57715.18
2              60.71%       35.04%            35036.59
3              70.84%       24.82%            24820.66
4              76.59%       19.01%            19009.17
5              79.29%       15.07%            15071.79
6              83.27%       12.55%            12550.59
7              82.08%       10.30%            10301.94
8              86.49%       8.91%             8910.59
9              89.60%       7.98%             7984.24
10             86.03%       6.87%             6868.79
11             86.00%       5.91%             5907.16
12             90.52%       5.35%             5347.00
13             91.89%       4.91%             4913.46
14             89.86%       4.41%             4414.99
15             91.96%       4.06%             4060.22
16             89.36%       3.63%             3628.28
17             90.63%       3.29%             3288.13
18             92.45%       3.04%             3039.97
19             94.95%       2.89%             2886.43
20             92.31%       2.66%             2664.40
21             92.11%       2.45%             2454.05
22             93.94%       2.31%             2305.32
23             89.66%       2.07%             2066.84
24             94.12%       1.95%             1945.26
25             95.83%       1.86%             1864.21
26             92.31%       1.72%             1720.81
27             96.97%       1.67%             1668.66
28             90.32%       1.51%             1507.18
29             90.00%       1.36%             1356.46
30             94.44%       1.28%             1281.10
31             94.12%       1.21%             1205.74
32             100.00%      1.21%             1205.74
33             90.91%       1.10%             1096.13
34             90.00%       0.99%             986.52
35             94.44%       0.93%             931.71
36             100.00%      0.93%             931.71
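The identity behind the EXP(SUM(LN(...))) trick is that a product of positive numbers equals the exponential of the sum of their logarithms. A minimal Python sketch of the same running product, using the rounded percentages from the first rows of the table as sample input (the function name is hypothetical):

```python
import math
from itertools import accumulate


def running_product(factors):
    """Cumulative product computed as exp(cumulative sum of ln(x)).

    Every factor must be strictly positive, because the logarithm is
    undefined for zero and negative values.
    """
    log_sums = accumulate(math.log(f) for f in factors)
    return [math.exp(s) for s in log_sums]


# Rounded DEGRADATION values for periods 0 to 3.
degradation = [1.0, 0.5772, 0.6071, 0.7084]
rates = running_product(degradation)
print([round(r, 4) for r in rates])
```

The same restriction applies in SQL: a zero or negative factor makes LN fail or return no usable value, so such rows would need separate handling.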

Conditional formatting in webi Rich Client 4.1 of multiple values

I'm in BO 4.1 using a crosstab table. It is summary data based off specific detail information. Example:
Area  Days Late  Order #  Reason
1     5          12345    Lost
1     2          843254   Lost
2     4          7532384  Lost
1     7          12353    Not home
So the output would be
          Area 1  Area 2
Lost      2       1
Not home  1       0
Now for the conditional formatting part, I want it to highlight the Area 1 Lost cell in red because two of the orders are greater than 3 days late.
For whatever reason it seems not to be doing it, because it's getting hung up on line item 2, which is less than 3 days late.
Thank you!
I cheated and created a new object and then summed and did an if statement. Thanks for looking at this.

Daemon to monitor query and send mail conditionally in SQL Server

I've been melting my brains over a peculiar request: execute a certain query every two minutes and, if it returns rows, send an e-mail with them. This was already done and delivered, so far so good. The result set of the query looks like this:
+----+---------------------+
| ID | last_update         |
+----+---------------------+
| 21 | 2011-07-20 13:03:21 |
| 32 | 2011-07-20 13:04:31 |
| 43 | 2011-07-20 13:05:27 |
| 54 | 2011-07-20 13:06:41 |
+----+---------------------+
The trouble starts when the user asks me to modify the solution so that, e.g., the first time ID 21 is caught being more than 5 minutes old, the e-mail is sent to a particular set of recipients; the second time, when ID 21 is between 5 and 10 minutes old, another set of recipients is chosen. So far it's OK. The gotcha for me is from the third time onwards: the e-mails are now sent every half hour instead of every five minutes.
How should I keep track of the status of Mr. ID = 43? How would I know whether he has already received one e-mail, two, or three? And how do I ensure that from the third e-mail onwards, the mails are sent every half hour instead of the usual 5 minutes?
I get the impression that you think this can be solved with a simple mathematical formula. And it probably can be, as long as your system is reliable.
Each thirty-minute period can be seen as 360 degrees, or 2 pi radians, on the graph of a harmonic function. That makes 1 minute = 12 degrees. Let's take cosine, for instance:
f(x) = cos(x)
f(x) = cos(elapsedMinutes * 12 degrees)
Here elapsedMinutes is the time since the first 30-minute update was due to go out, which is the value of last_update plus a constant number of minutes.
Since you have a two-minute window of error, it is time to transmit the 30-minute update if the value of f(x) above is greater than the value you would get one minute before or after the scheduled update, which is cos(1 * 12 degrees) = 0.9781476007338056379285667478696.
Bringing it all together, it's time to send a thirty minute update if this SQL expression is true:
COS(RADIANS( 12 * DATEDIFF(minute,
    DATEADD(minute, constantNumberOfMinutesBetweenSecondAndThirdUpdate, last_update),
    CURRENT_TIMESTAMP))) > 0.9781476007338056379285667478696
If you need a wider window than exactly two minutes, just lower this number slightly.
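The cosine test can be checked outside the database. The following is a minimal Python sketch of the same comparison; the function and parameter names are hypothetical, and the half-window is exposed as a parameter so the effect of lowering the threshold is visible.

```python
import math


def thirty_minute_update_due(elapsed_minutes, half_window_minutes=1.0):
    """Return True when elapsed_minutes is within half_window_minutes
    of a multiple of 30.

    30 minutes map to 360 degrees, so one minute is 12 degrees and the
    threshold is the cosine at the edge of the window.
    """
    threshold = math.cos(math.radians(12 * half_window_minutes))
    return math.cos(math.radians(12 * elapsed_minutes)) > threshold


print(thirty_minute_update_due(30))        # on schedule
print(thirty_minute_update_due(15))        # halfway between updates
print(thirty_minute_update_due(31, 2.0))   # wider window
```

One thing to keep in mind: DATEDIFF returns a whole number of minutes, so with the strict comparison against cos(12 degrees) only elapsed values that are exact multiples of 30 are guaranteed to pass. With a job that runs every two minutes, a somewhat lower threshold is probably the safer choice.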