How to drop rows of dataframe with datetime not increasing - pandas

I have a dataframe indexed on datetime, with the following output:
2022-04-08 21:59:49 7651.8 7655.8
2022-04-08 21:59:50 7651.7 7655.7
2022-04-08 21:59:54 7651.7 7655.7
2022-04-08 21:59:55 7651.8 7655.8
2022-04-08 09:47:00 7544.9 7545.9
A row is valid when its datetime value is the same as or greater than that of the previous row (and the first row is always valid).
Therefore, in the extract above, the only invalid row is the last one, as its datetime doesn't meet the above condition.
I have managed to remove the offending row by:
df.drop(df.loc[df.index.to_series().diff() < pd.to_timedelta('0 seconds')].index, inplace=True)
But this looks a little convoluted. Is there a simpler way to achieve this?

df.index.to_series().diff() < pd.to_timedelta('0 seconds') returns a boolean Series, so you can use boolean indexing to select the rows you want to keep directly:
df = df.loc[~(df.index.to_series().diff() < pd.to_timedelta('0 seconds'))]
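A minimal sketch of that approach, using made-up data shaped like the extract in the question (the bid/ask column names are an assumption):
import pandas as pd

idx = pd.to_datetime([
    '2022-04-08 21:59:49', '2022-04-08 21:59:50',
    '2022-04-08 21:59:54', '2022-04-08 21:59:55',
    '2022-04-08 09:47:00',  # out of order, should be dropped
])
df = pd.DataFrame({'bid': [7651.8, 7651.7, 7651.7, 7651.8, 7544.9],
                   'ask': [7655.8, 7655.7, 7655.7, 7655.8, 7545.9]}, index=idx)

# diff() is NaT for the first row, and NaT < 0 seconds evaluates to False,
# so the negated mask always keeps the first row
mask = ~(df.index.to_series().diff() < pd.to_timedelta('0 seconds'))
df = df[mask]
print(df)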

Related

Date column shifts and is no longer callable

I am using pandas groupby to group rows with duplicate dates and average their pm25 values. However, when I use the groupby function, the structure of my dataframe changes and I can no longer call the 'Date' column.
Using groupby also changes the structure of my data: instead of being sorted by 1/1/19, 1/2/19, it is sorted by 1/1/19, 1/10/19, 1/11/19.
Here is my current code:
Before using df.groupby my df looks like:
[screenshot: df before groupby]
I use groupby:
df.groupby('Date').mean('pm25')
print(df)
[screenshot: df after groupby]
And afterwards, I can no longer call the 'Date' column or sort by it:
print(df['Date'])
Returns just
KeyError: 'Date'
Please help, or please let me know what else I can provide.
Using groupby also changes the structure of my data: instead of being sorted by 1/1/19, 1/2/19, it is sorted by 1/1/19, 1/10/19, 1/11/19.
This is because your Date column's type is string, not datetime. In string comparison, the third character 1 of 1/10/19 is smaller than the third character 2 of 1/2/19. If you want to keep the original sequence, you can do the following:
df['Date'] = pd.to_datetime(df['Date']) # Convert Date column to datetime type
df['Date'] = df['Date'].dt.strftime('%m/%d/%y') # Convert datetime to other formats (but the dtype of column will be string)
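For example, a small sketch of the difference (the sample dates are made up):
import pandas as pd

# String dates sort character by character, so '1/10/19' comes before '1/2/19'
s = pd.Series(['1/2/19', '1/10/19', '1/1/19'])
print(s.sort_values().tolist())                   # ['1/1/19', '1/10/19', '1/2/19']

# Converted to datetime, the same values sort chronologically: Jan 1, Jan 2, Jan 10
print(pd.to_datetime(s).sort_values().tolist())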
And after, I cannot call the 'Date' column anymore or sort the column
This is because after grouping by the Date column, the returned dataframe uses the Date values as its index to represent each group:
          pm25
Date
01/01/19  8.50
01/02/19  9.20
01/03/19  7.90
01/04/19  8.90
01/05/19  6.00
After doing df.groupby('Date').mean('pm25'), the dataframe above shows that the mean pm25 value of the 01/01/19 group is 8.50, and so on.
If you want to retrieve the Date column back from the index, you can call reset_index() after the groupby,
df.groupby('Date').mean('pm25').reset_index()
which gives
Date pm25
0 01/01/19 8.50
1 01/02/19 9.20
2 01/03/19 7.90
3 01/04/19 8.90
4 01/05/19 6.00
5 01/06/19 6.75
6 01/11/19 8.50
7 01/12/19 9.20
8 01/21/19 9.20
Or set the as_index argument of pandas.DataFrame.groupby() to False:
df.groupby('Date', as_index=False).mean('pm25')
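Putting both fixes together, a minimal sketch (the sample data is made up; only the 'Date' and 'pm25' column names come from the question):
import pandas as pd

# made-up sample data with duplicate dates stored as strings
df = pd.DataFrame({'Date': ['1/2/19', '1/1/19', '1/10/19', '1/2/19'],
                   'pm25': [9.2, 8.5, 8.5, 9.0]})

df['Date'] = pd.to_datetime(df['Date'])                   # real dates sort chronologically
out = df.groupby('Date', as_index=False)['pm25'].mean()   # 'Date' stays a regular column
print(out['Date'])                                        # no KeyError; groups come back in date order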

My last upload of data, the timestamp is off by 2 days, how can I fix it?

So I have a table in Snowflake that I uploaded, and I made the mistake of uploading some data with the incorrect datetime. The first few columns of the table are different JSON results that correspond to the date they were fetched.
Example:
name date
0 [{},{},{},{},{}] 8/20/2019
1 [{}] 12/22/2019
2 [{},{},{},{}] 11/15/2019
3 [{},{},{}] 1/10/2019
4 [{},{},{},{}] 12/1/2019
The INSERT was pretty straightforward: I staged the file and inserted it into an already created table.
I want to go in and change the date by two days, and I will make a note to change how I am generating the date and data so I don't have to do this manually anymore.
Is it possible to correct the column for a certain set of ids instead of a date range like I did below?
ALTER TABLE json_date ALTER COLUMN date Where date >"12-01-2019" and date < "12-30-2019" dateadd(day,2,date);
If this is simply a column, then you would use update:
update json_date
set date = dateadd(day, 2, date)
where date > '2019-12-01';

Sqlite second date is greater than first date

In my database I have patient records, and the table has a column named Registered_Date in yyyy-mm-dd format. When I execute a query like
SELECT Patient_ID,First_Name,Middle_Name,Last_Name
FROM Patient_Records
WHERE Registered_Date BETWEEN '2019-01-25' AND '2018-10-01'
There is no result...
As you can see, the first date is greater than the second date and there is no result, but if the first date is less than the second date, it returns all the patients who registered within the selected dates.
Everything works as intended.
From the SQLite documentation:
The BETWEEN operator is logically equivalent to a pair of comparisons. x BETWEEN y AND z is equivalent to x >= y AND x <= z.
So if the first date is greater than the second (i.e. y > z), the condition will always evaluate to false, no matter what value x is given.
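A small sketch of the behaviour using Python's built-in sqlite3 module (the table and column names come from the question; the rows are made up):
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE Patient_Records (Patient_ID INTEGER, Registered_Date TEXT)')
con.executemany('INSERT INTO Patient_Records VALUES (?, ?)',
                [(1, '2018-11-05'), (2, '2019-01-10')])

# BETWEEN '2019-01-25' AND '2018-10-01' expands to
# Registered_Date >= '2019-01-25' AND Registered_Date <= '2018-10-01',
# which no row can satisfy, so nothing comes back
print(con.execute("SELECT Patient_ID FROM Patient_Records "
                  "WHERE Registered_Date BETWEEN '2019-01-25' AND '2018-10-01'").fetchall())  # []

# With the smaller date first the range behaves as expected
print(con.execute("SELECT Patient_ID FROM Patient_Records "
                  "WHERE Registered_Date BETWEEN '2018-10-01' AND '2019-01-25'").fetchall())  # [(1,), (2,)]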

MS Sql extract the day value in a date

In MS SQL I am trying to get the value of the day in the current date, e.g. today is 25/08/2016, so this value would be the number 25. I would also like to find the value of the day in an earlier date (e.g. 06/06/2016, so this number would be 06). I am then looking to subtract the second value from the first and determine whether the result is positive or negative. If it is positive it should do one thing, e.g. print testA, and if it is negative it should do something else, e.g. print testB.
I am new to MS SQL and really have no clue how to implement this in the language. Does anyone have any pointers? Much appreciated. Please see my pseudo code:
change = value of days in current date - value of days in previous date
if change is a positive value : print "testA"
if change is 0 or a negative value: print "testB"
I am doing this in Excel 2010. I have three columns, with dates in colA & colB, and testA or testB should be printed in colC depending on whether the value is positive or negative.
Eg data:
colA: 12/02/2016, 06/06/2016, 12/02/2016
colB: 12/05/2016, 12/02/2015, 28/06/2016
If you are able to change your date format to MM/DD/YYYY, then the script below may help you:
DECLARE @D1 DATETIME = '08/25/2016'
DECLARE @D2 DATETIME = '06/06/2016'
IF DATEPART(D, @D1) - DATEPART(D, @D2) > 0
PRINT 'testA'
ELSE
PRINT 'testB'
Or use the below script (day-of-month of today minus day-of-month of yesterday, as an example comparison):
SELECT CASE WHEN (DATEPART(dd, GETDATE()) - DATEPART(dd, GETDATE() - 1)) > 0 THEN 'testA'
ELSE 'testB' END AS Result
Some additional info.
If you need a particular day-of-week name or number, use the scripts below:
SELECT DATENAME(dw,'12/22/2016') --Thursday
SELECT DATEPART(dw,'12/22/2016') --5
If you need the day-of-month value, use the script below:
SELECT DATEPART(dd,'12/22/2016') --22

NULL Date Calculation SQL Server

We have 6 columns in a SQL Server table SLA.
I tried to add a computed column QASLA as below.
The Created and Closed columns are datetime and NOT NULL.
The [EsculationDate], [EsculationFeedback], [InternalEsculationReplay] and [InternalEsculationDate] columns are datetime and can contain NULL values.
alter table SLA
add QASLA as
iif((Closed=null),datediff(dd,getdate(),Created),
(datediff(dd,Closed,Created))-datediff(dd,IIF(COALESCE ([EsculationDate],0)>COALESCE ([InternalEsculationDate],0),COALESCE ([InternalEsculationDate],0),COALESCE ([EsculationDate],0)),
IIF(COALESCE ([EsculationFeedback],0)>COALESCE ([InternalEsculationReplay],0),COALESCE ([EsculationFeedback],0),COALESCE ([InternalEsculationReplay],0))))
When I try to insert a new record:
insert into [dbo].[SLA]
([Created],[EsculationDate],[EsculationFeedback],[Closed])
values('10-Jun-15','10-Jun-15','15-Jun-15','15-Jun-15')
QASLA Result = -42173
I need to get the value 0, as the Created date = 10 Jun and the Closed date = 15 Jun, minus the escalation period (EsculationDate to EsculationFeedback).
I tried to use ISNULL as well.
Not quite sure I understand your question, but your IIF is useless here because Closed=null will never evaluate to true. You cannot compare a value to NULL with = and must use IS NULL:
alter table SLA
add QASLA as
iif((Closed is null),datediff(dd,getdate(),Created),
(datediff(dd,Closed,Created))-datediff(dd,IIF(COALESCE ([EsculationDate],0)>COALESCE ([InternalEsculationDate],0),COALESCE ([InternalEsculationDate],0),COALESCE ([EsculationDate],0)),
IIF(COALESCE ([EsculationFeedback],0)>COALESCE ([InternalEsculationReplay],0),COALESCE ([EsculationFeedback],0),COALESCE ([InternalEsculationReplay],0))))
Some of your DATEDIFF statements appear to have the dates the "wrong way around". You should have the earlier date as the second argument and the later date as the third argument to return the positive days between them.
Also, you are treating NULL dates as 0 in many cases, where 0 will be interpreted as a "very early date".
I would have expected your handling of NULLs to work very differently here. You can't use an equality comparison between a column and NULL; you need to use IS NULL instead, e.g. "Closed = NULL" becomes "Closed IS NULL".
Finally, you say Closed Date is NOT NULL, but the very first thing you do in your script is to compare it to NULL.