While loop keeps iterating when the condition is false

I'm working with a list of stock candles and creating filler candles for the minutes that don't have any volume. To do this, I compare the current candle's time with the previous candle's time plus one minute, and while the current candle is later than that, I loop, duplicating the previous minute and changing its time. When the code runs, the while loop executes once on every iteration, so every minute appears twice. I can't figure out what's wrong with this method.
```python
clean_candles.append(data.pop(0))
for candle in data:
    while candle["t"] > clean_candles[-1]["t"] + epoch_minute:
        if to_datetime(clean_candles[-1]["t"], time=True)[-5:] == "20:00":
            candle["t"] == to_epoch(to_datetime(candle["t"]) + ' 4:00', time=True)
            print("made 4am Candle")
            break
        clean_candles.append(candle)
        clean_candles[-1]["t"] = clean_candles[-2]["t"] + epoch_minute
        clean_candles[-1]["v"] = 0
        clean_candles[-1]["n"] = 0
        print(f"added minute filler {clean_candles[-1]['t']}")
    clean_candles.append(candle)
    print(f"added minute {clean_candles[-1]['t']}")
clean_candles = pd.DataFrame(clean_candles)
clean_candles.head(40)
```
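The duplicates come from aliasing: inside the while loop, `clean_candles.append(candle)` appends the very same dict that is appended again after the loop, and the next line rewrites that dict's "t" (and zeroes its volume). After the first filler, `clean_candles[-1]` is `candle` itself, so the while condition becomes false and the final append adds the same object a second time. A minimal sketch of a fix copies the previous candle instead. It assumes each candle is a dict sorted by an epoch-seconds "t" with "v" and "n" keys; `fill_missing_minutes` is a hypothetical name, and the 20:00 to 4:00 overnight jump from the original is left out. (Separately, `candle["t"] == to_epoch(...)` uses `==`, which only compares and discards the result; assignment needs `=`.)

```python
EPOCH_MINUTE = 60  # assumption: "t" is a Unix timestamp in seconds


def fill_missing_minutes(data):
    """Return a new list where each missing minute between two candles is
    filled with a zero-volume copy of the preceding candle.

    Assumes `data` is a list of dicts sorted by "t", each with "v" and "n" keys.
    The input list itself is not modified (no pop()).
    """
    if not data:
        return []
    clean = [data[0]]
    for candle in data[1:]:
        # Add fillers while the next real candle is more than one minute ahead
        while candle["t"] > clean[-1]["t"] + EPOCH_MINUTE:
            filler = dict(clean[-1])  # a NEW dict, not a reference to `candle`
            filler["t"] += EPOCH_MINUTE
            filler["v"] = 0
            filler["n"] = 0
            clean.append(filler)
        clean.append(candle)
    return clean
```

`pd.DataFrame(fill_missing_minutes(data))` then gives exactly one row per minute, with each real candle appearing once.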

Related

Using Pandas and Numpy to search for conditions within binned data in 2 data frames

Python newbie here. Here's a simplified example of my problem. I have 2 pandas dataframes.
One dataframe lightbulb_df has data on whether a light is on or off and looks something like this:
| Light_Time | Light On? |
|---|---|
| 5790.76 | 0 |
| 5790.76 | 0 |
| 5790.771 | 1 |
| 5790.779 | 1 |
| 5790.779 | 1 |
| 5790.782 | 0 |
| 5790.783 | 1 |
| 5790.783 | 1 |
| 5790.784 | 0 |
Time is in seconds since the start of the day; 1 means the lightbulb is on and 0 means it is off.
The second dataframe, sensor_df, shows whether a sensor detected the lightbulb; it has different timestamps and a different sampling rate.
| Sensor_Time | Sensor Detect? |
|---|---|
| 5790.8 | 0 |
| 5790.9 | 0 |
| 5791.0 | 1 |
| 5791.1 | 1 |
| 5791.2 | 1 |
| 5791.3 | 0 |
Both dataframes are very large, with hundreds of thousands of rows. The lightbulb turns on for a few minutes, then off, then back on, and so on.
Using the .diff function, I compared each row to its predecessor and, depending on whether the result was 1 or -1, created a truth table of simplified on and off times, which I appended to lightbulb_df.
```python
import pandas as pd

# use .diff() to compare each row with the previous row (current - previous)
lightbulb_df['light_diff'] = lightbulb_df['Light On?'].diff()
# the light-on start times are where
# .diff is greater than 0 (1 - 0 = 1)
light_start = lightbulb_df.loc[lightbulb_df['light_diff'] > 0]
# the light-off start times (first times the light turns off)
# are where .diff is less than 0 (0 - 1 = -1)
light_off = lightbulb_df.loc[lightbulb_df['light_diff'] < 0]
# concatenate them into a single changed-state df
# that only captures the rows where the lightbulb changes
lightbulb_changes = pd.concat((light_start, light_off)).sort_values(by=['Light_Time'])
```
So I end up with a dataframe of on start times, a dataframe of off start times, and a changed-state dataframe that looks like this:
| Light_Time | Light On? | light_diff |
|---|---|---|
| 5790.771 | 1 | 1 |
| 5790.782 | 0 | -1 |
| 5790.783 | 1 | 1 |
| 5790.784 | 0 | -1 |
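With the example data, the sign convention is easy to check: diff() subtracts the previous row from the current one, so a switch-on shows up as +1 and a switch-off as -1. Because the rows are already in time order, a single boolean filter (a simplification of the concat-and-sort step, not the asker's code) reproduces the changed-state table:

```python
import pandas as pd

# Example data from the first table above
lightbulb_df = pd.DataFrame({
    "Light_Time": [5790.76, 5790.76, 5790.771, 5790.779, 5790.779,
                   5790.782, 5790.783, 5790.783, 5790.784],
    "Light On?": [0, 0, 1, 1, 1, 0, 1, 1, 0],
})

# diff = current - previous: +1 when the light switches on, -1 when it switches off
lightbulb_df["light_diff"] = lightbulb_df["Light On?"].diff()

# Keep only the rows where the state changed (the first row's NaN diff is dropped too);
# the rows are already in time order, so no concat/sort is needed
lightbulb_changes = lightbulb_df[lightbulb_df["light_diff"].fillna(0) != 0]
print(lightbulb_changes)
```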
Now my goal is to search sensor_df during each light-on period in the changed-state dataframe (in the example above, 5790.771 to 5790.782 and 5790.783 to 5790.784) in 1-second intervals to see whether the sensor detected the lightbulb. For each of the many light-on periods, I want the number of seconds the lightbulb was on and the number of seconds the sensor detected it, so that I can compute the percentage correctly detected.
Whenever I try to plan this out, I end up with lots of nested for loops or while loops, which I know will be really slow with hundreds of thousands of rows. I thought about using the .cut function to divide the dataframe into 1-second intervals. I wrote a for loop that cycles through each time in the changed-state dataframe, with a nested while loop that steps through 1-second intervals, but that seems like it would be really slow.
I know Python has a lot of built-in functions that could help, but I'm having trouble knowing what to google to find the right one.
Any advice would be appreciated.
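One vectorized option, offered as a sketch rather than a confirmed answer, is to locate each sensor sample's light-on period with numpy.searchsorted instead of looping over periods and seconds. It assumes both frames are sorted by time and that the light is off in the first row of lightbulb_df (so every switch-on has a diff of +1); `detection_by_period` is a hypothetical name, and each period is treated as the half-open interval [on time, off time). It counts sensor samples rather than seconds: with a uniform sampling rate (0.1 s in the example), the detected fraction of samples approximates the fraction of on-time during which the sensor saw the light.

```python
import numpy as np
import pandas as pd


def detection_by_period(lightbulb_df, sensor_df):
    """For each light-on period, count sensor samples, detections and the fraction detected.

    Assumptions: both frames are sorted by time, and the light is off in the first
    row of lightbulb_df, so every switch-on appears as a diff of +1.
    """
    diff = lightbulb_df["Light On?"].diff()
    on_times = lightbulb_df.loc[diff > 0, "Light_Time"].to_numpy()
    off_times = lightbulb_df.loc[diff < 0, "Light_Time"].to_numpy()

    # End of each period: the first off time after its on time (inf if it never turns off)
    off_idx = np.searchsorted(off_times, on_times, side="right")
    ends = np.append(off_times, np.inf)[off_idx]

    # For each sensor sample, the index of the latest on time at or before it (-1 if none)
    t = sensor_df["Sensor_Time"].to_numpy()
    period = np.searchsorted(on_times, t, side="right") - 1

    # A sample belongs to a period only if it also falls before that period's end
    inside = np.zeros(len(t), dtype=bool)
    valid = period >= 0
    inside[valid] = t[valid] < ends[period[valid]]

    # Group the detections of the samples inside a period by period index
    grouped = sensor_df.loc[inside, "Sensor Detect?"].groupby(period[inside])
    all_periods = range(len(on_times))
    result = pd.DataFrame({"start": on_times, "end": ends})
    result["samples"] = grouped.size().reindex(all_periods, fill_value=0).to_numpy()
    result["detected"] = grouped.sum().reindex(all_periods, fill_value=0).to_numpy()
    result["fraction"] = result["detected"] / result["samples"].replace(0, np.nan)
    return result
```

Both searchsorted calls are O(n log m), so this scales to hundreds of thousands of rows without Python-level loops; periods with no sensor samples get a fraction of NaN.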

Comparing timedelta fields

I am looking at file delivery times and can't work out how to compare two timedelta fields in an if statement inside a for loop.
time_diff is the difference between cob_date and last_update_time
average_diff is based on the average for a particular file
I want to find the delay for each row.
I have been able to produce a delay column using time_diff - average_diff.
However, when time_diff - average_diff < 0, I just want delay = 0, as this is not a delay.
I have written a for loop, but it isn't working and I don't know why. I'm sure the answer is simple, but I can't get there.
```python
test_pv_import_v2['delay2'] = pd.to_timedelta('0')
for index, row in test_pv_import_v2.iterrows():
    if test_pv_import_v2['time_diff'] > test_pv_import_v2['average_diff']:
        test_pv_import_v2['delay2'] = test_pv_import_v2['time_diff'] - test_pv_import_v2['average_diff']
```
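The loop fails because the if statement compares the whole time_diff and average_diff columns rather than the current row's values (pandas raises "The truth value of a Series is ambiguous"), and the assignment would overwrite the entire column anyway. Use Series.where to set a zero Timedelta wherever the condition is false: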
```python
mask = test_pv_import_v2['time_diff'] > test_pv_import_v2['average_diff']
s = (test_pv_import_v2['time_diff'] - test_pv_import_v2['average_diff'])
test_pv_import_v2['delay2'] = s.where(mask, pd.to_timedelta('0'))
```
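A self-contained version on hypothetical sample data shows the vectorized answer next to a corrected row-by-row loop, for comparison only. The loop works once it compares the row's own values (`row[...]`) and writes to that single row with `.at`, but it is much slower than `where` on large frames.

```python
import pandas as pd

# Hypothetical sample data: durations stored as timedeltas
df = pd.DataFrame({
    "time_diff": pd.to_timedelta(["2h", "30min", "1h"]),
    "average_diff": pd.to_timedelta(["1h", "1h", "1h"]),
})

# Vectorized: keep the difference only where the file was later than average
mask = df["time_diff"] > df["average_diff"]
s = df["time_diff"] - df["average_diff"]
df["delay2"] = s.where(mask, pd.Timedelta(0))

# Row-by-row equivalent of the original loop (slower):
# compare the row's values and write only to that row with .at
df["delay_loop"] = pd.Timedelta(0)
for index, row in df.iterrows():
    if row["time_diff"] > row["average_diff"]:
        df.at[index, "delay_loop"] = row["time_diff"] - row["average_diff"]

print(df)
```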

How to get the lowest low of a series in PineScript

I'm trying to get the lowest low of a series of candles after a condition, but it always returns the last candle of the condition. I tried min(), lowest() and a for loop, but none of them works. I also tried using blackCandle[] and min(ThreeinARow)/lowest(ThreeinARow); sometimes it returns the last candle and other times it gives me a compilation error.
```
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
SL = ThreeinARow ? min(low[1], low[2], low[3]) : na
```
```
//@version=4
study("Help (low after 3DownBar)", overlay=true, max_bars_back=100)
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
bar_ind = barssince(ThreeinARow)
//SL = lowest(max(1, nz(bar_ind))) // the lowest low of a series of candles after the condition
SL = lowest(max(1, nz(bar_ind)+1)) // the lowest low of a series of candles since the condition
plot(SL, style=plot.style_cross, linewidth=3)
bgcolor(ThreeinARow ? color.silver : na)
```
See also the second solution in the commented-out line.
It seems I was misinterpreting it: min() does return the minimum of a series of candles. The catch is that I must pass the specific candles used to calculate the minimum, which for now is not a problem for me. In the end, this is how I wrote it:
```
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
Lowest_Low = if ThreeinARow
    min(low[1], low[2], low[3])
plot(Lowest_Low, color=color.red)
```
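For readers prototyping the same rule outside TradingView, the final Pine logic can be sketched in pandas: Pine's blackCandle[1], [2] and [3] refer to the previous one, two and three bars, which correspond to shift(1), shift(2) and shift(3). This assumes a DataFrame with open, close and low columns in bar order; `three_black_stop` is a hypothetical name, and NaN plays the role of Pine's na.

```python
import pandas as pd


def three_black_stop(df):
    """When the three previous candles all closed below their open,
    return the lowest low of those three candles; NaN otherwise."""
    black = df["close"] < df["open"]
    # blackCandle[k] in Pine is the value k bars back, i.e. shift(k); missing history counts as False
    three = (black.shift(1, fill_value=False)
             & black.shift(2, fill_value=False)
             & black.shift(3, fill_value=False))
    # min(low[1], low[2], low[3]) as a row-wise minimum of the shifted lows
    lowest3 = pd.concat([df["low"].shift(k) for k in (1, 2, 3)], axis=1).min(axis=1)
    return lowest3.where(three)
```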

Plot the number of occurrences by half hour of a datetime column

I have this df:
3272 8711600410367 2019-03-11T20:23:45.415Z d7ec8e9c5b5df11df8ec7ee130552944 home 2019-03-11T20:23:45.415Z DISPLAY None
3273 8711600410367 2019-03-11T20:23:51.072Z d7ec8e9c5b5df11df8ec7ee130552944 home 2019-03-11T20:23:51.072Z DISPLAY None
and I would like to make a graph of how many rows I have per half hour, without including the day: just the number of occurrences per half hour, regardless of date.
Here is my attempt:
```python
df["Created"] = pd.to_datetime(df["Created"])
df.groupby(df.Created.dt.hour).size().plot()
```
But it's not by half hour, and I would like to show every half hour on my graph.
One way to do this is to encode hours and half hours separately and then combine them. To illustrate, I extended your data example a bit:
```python
import pandas as pd
df = pd.DataFrame({'Created': ['2019-03-11T20:23:45.415Z', '2019-03-11T20:23:51.072Z', '2019-03-11T20:33:03.072Z', '2019-03-11T21:10:10.072Z']})
df["Created"] = pd.to_datetime(df["Created"])
```
First create an 'Hours' column:
```python
df['Hours'] = df.Created.dt.hour
```
Then create a column that codes half hours: if the minute is 30 or more, count it as a half hour.
```python
df['HalfHours'] = [0.5 if x >= 30 else 0 for x in df.Created.dt.minute]
```
Then combine them:
```python
df['Hours_and_HalfHours'] = df['Hours'] + df['HalfHours']
```
Finally, count the number of rows per group with groupby, and plot:
```python
df.groupby(df['Hours_and_HalfHours']).size().plot()
```
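The groupby above only plots half hours that actually contain rows. To show every half hour, as the question asks, one option (a sketch, not part of the original answer) is to floor each timestamp to its 30-minute slot with Series.dt.floor, format it as HH:MM so the date is dropped, and reindex the counts over all 48 slots of the day. The fifth row, on 2019-03-12, is invented sample data showing that different days land in the same slot.

```python
import pandas as pd

df = pd.DataFrame({"Created": ["2019-03-11T20:23:45.415Z", "2019-03-11T20:23:51.072Z",
                               "2019-03-11T20:33:03.072Z", "2019-03-11T21:10:10.072Z",
                               "2019-03-12T20:05:00.000Z"]})
df["Created"] = pd.to_datetime(df["Created"])

# Floor each timestamp to its half-hour slot and keep only the time of day
slot = df["Created"].dt.floor("30min").dt.strftime("%H:%M")

# All 48 half-hour labels of a day ("00:00" ... "23:30"); the date itself is irrelevant
all_slots = pd.date_range("2000-01-01", periods=48, freq="30min").strftime("%H:%M")

# Count rows per slot, filling empty slots with 0 so every half hour appears
counts = slot.value_counts().reindex(all_slots, fill_value=0)
print(counts[counts > 0])
```

Calling `counts.plot(kind="bar")` (matplotlib required) then draws one bar per half hour, including the empty ones.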

Update stimulus attribute every ... ms or frame in PsychoPy

I'm trying to update the orientation of a GratingStim every 100 ms or so in the PsychoPy Coder. Currently, I'm updating the attribute (or trying to) with these lines:
```python
orientationArray = orientation.split(',')  # reading csv line as a list
selectOri = 0  # my tool to select the searched value in the list
gabor.ori = int(orientationArray[selectOri])  # select value as function of "selectOri", in this case always the first one
continueroutine = True
while continueroutine:
    if timer == 0.1:  # This doesn't work, but it shows you what is planned
        selectOri = selectOri + 1  # update value
        gabor.ori = int(orientationArray[selectOri])  # update value
    win.flip()
```
I can't find a proper way to update it at a fixed time interval.
A neat way to do something every x frames is to use the modulo operation in combination with a loop containing win.flip(). So if you want to do something every 6 frames (100 ms on a 60 Hz monitor), do this on every frame:
```python
frame = 0  # the current frame number
while continueroutine:
    if frame % 6 == 0:  # % is modulo: true on every sixth frame
        if selectOri >= len(orientationArray):
            break  # no orientations left
        gabor.ori = int(orientationArray[selectOri])
        selectOri += 1  # advance to the next orientation
    # Run this every iteration to synchronize the while-loop with the monitor's frames.
    gabor.draw()
    win.flip()
    frame += 1
```
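Frame counting ties the interval to the monitor's refresh rate, so it is only as exact as that rate allows, and dropped frames stretch it. As a small PsychoPy-independent sketch (the helper names are hypothetical, and the refresh rate is assumed to be known), the number of frames per update and the orientation to show can be computed directly from the frame number, which also avoids running past the end of the list:

```python
def frames_per_update(interval_s, refresh_hz):
    """Whole number of frames closest to the requested interval (at least 1)."""
    return max(1, round(interval_s * refresh_hz))


def orientation_for_frame(frame, orientations, interval_s=0.1, refresh_hz=60.0):
    """Orientation to show on a given frame, or None once the list is used up."""
    step = frame // frames_per_update(interval_s, refresh_hz)
    return orientations[step] if step < len(orientations) else None
```

Inside the routine, call `orientation_for_frame(frame, orientationArray)` before `gabor.draw()`, set `gabor.ori` to the `int(...)` of the result, and end the routine when it returns None.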