I'm new to SQL. I use pandas a lot, but my boss has asked me to replace a lot of my pandas code with SQL.
I have a table my_table:
case_id first_created last_paid submitted_time
3456 2021-01-27 2021-01-29 2021-01-26 21:34:36.566023+00:00
7891 2021-08-02 2021-09-16 2022-10-26 19:49:14.135585+00:00
1245 2021-09-13 None 2022-10-31 02:03:59.620348+00:00
9073 None None 2021-09-12 10:25:30.845687+00:00
6891 2021-08-03 2021-09-17 None
First I need to create two variables:
create_duration = first_created - submitted_time
paid_duration = last_paid - submitted_time
If submitted_time is None, skip that row (leave both durations null). Otherwise, if create_duration or paid_duration
is negative, convert it to 0. The unit should be days.
The ideal output should look something like this:
case_id first_created last_paid submitted_time create_duration paid_duration
3456 2021-01-27 2021-01-29 2021-01-26 21:34:36.566023+00:00 1 3
7891 2021-08-02 2021-09-16 2022-10-26 19:49:14.135585+00:00 0 0
1245 2021-09-13 None 2022-10-31 02:03:59.620348+00:00 0 null
9073 None None 2021-09-12 10:25:30.845687+00:00 null null
6891 2021-08-03 2021-09-17 null null null
My code:
select * from my_table
first_created-submitted_time as create_duration
last_paid-submitted_time as paid_duration
I have to say I'm quite bad at SQL and have no idea how to continue. Can anyone help?
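One possible way to get the output described above, sketched with Python's built-in sqlite3 module. The question does not say which database is in use, so SQLite is an assumption here: julianday(), date() and the two-argument MAX() are SQLite functions, and other engines spell the date arithmetic and the clamping differently. The time part of submitted_time is dropped before subtracting, which is also an assumption, made because it reproduces the 1 and 3 in the expected output.

```python
import sqlite3

# Sample rows copied from the question; None stands for a missing value.
rows = [
    (3456, "2021-01-27", "2021-01-29", "2021-01-26 21:34:36.566023+00:00"),
    (7891, "2021-08-02", "2021-09-16", "2022-10-26 19:49:14.135585+00:00"),
    (1245, "2021-09-13", None, "2022-10-31 02:03:59.620348+00:00"),
    (9073, None, None, "2021-09-12 10:25:30.845687+00:00"),
    (6891, "2021-08-03", "2021-09-17", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "case_id INTEGER, first_created TEXT, last_paid TEXT, submitted_time TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)

query = """
SELECT case_id,
       -- date() drops the time part; julianday() turns a date into a day count
       MAX(0, CAST(julianday(first_created) - julianday(date(submitted_time)) AS INTEGER))
           AS create_duration,
       MAX(0, CAST(julianday(last_paid) - julianday(date(submitted_time)) AS INTEGER))
           AS paid_duration
FROM my_table
"""

# Map case_id -> (create_duration, paid_duration); SQL NULL comes back as None.
durations = {case_id: (created, paid) for case_id, created, paid in conn.execute(query)}
for case_id, pair in durations.items():
    print(case_id, pair)
```

Any NULL input makes the subtraction NULL, and SQLite's two-argument MAX() returns NULL when either argument is NULL, so rows with missing dates keep null durations while negative differences are clamped to 0.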
Related
I have a simple SELECT script that generates the following audit table.
SELECT *
FROM Mytable
WHERE File = '123456A'
Output:
ID File StatusA StatusB User UpdateDate
1 123456A A 0 Tom 2021-01-01
12 123456A B 0 Jack 2021-01-05
19 123456A A 1 Alicia 2021-02-09
56 123456A B 1 Jason 2021-03-09
87 123456A A 1 Jason 2021-03-10
107 123456A B 0 Ellie 2021-03-26
203 123456A A 0 lucy 2021-04-08
239 123456A B 1 Ellie 2021-04-16
I am trying to retrieve only the rows where column StatusB has changed, so the query would generate a table like this.
SELECT *
FROM Mytable
WHERE File = '123456A'
-- AND StatusB is changed
ID File StatusA StatusB User UpdateDate
1 123456A A 0 Tom 2021-01-01
19 123456A A 1 Alicia 2021-02-09
107 123456A B 0 Ellie 2021-03-26
239 123456A B 1 Ellie 2021-04-16
In this case, I can see that Alicia and Ellie changed the column StatusB. I am still working out how to accomplish this.
Thanks,
-Ming
You can use lag():
select t.*
from (select t.*,
             lag(statusB) over (order by updatedate) as prev_statusB
      from Mytable t
      where File = '123456A'
     ) t
where prev_statusB is null or prev_statusB <> statusB;
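To check the lag() approach against the sample data, here is a small sketch that loads the audit rows into an in-memory SQLite database. SQLite is an assumption made only so the query can be run (the question does not name the database), and File and User are double-quoted as a precaution because some engines reserve those words. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Audit rows copied from the question.
audit_rows = [
    (1, "123456A", "A", 0, "Tom", "2021-01-01"),
    (12, "123456A", "B", 0, "Jack", "2021-01-05"),
    (19, "123456A", "A", 1, "Alicia", "2021-02-09"),
    (56, "123456A", "B", 1, "Jason", "2021-03-09"),
    (87, "123456A", "A", 1, "Jason", "2021-03-10"),
    (107, "123456A", "B", 0, "Ellie", "2021-03-26"),
    (203, "123456A", "A", 0, "lucy", "2021-04-08"),
    (239, "123456A", "B", 1, "Ellie", "2021-04-16"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Mytable (ID INTEGER, "File" TEXT, StatusA TEXT, '
    'StatusB INTEGER, "User" TEXT, UpdateDate TEXT)'
)
conn.executemany("INSERT INTO Mytable VALUES (?, ?, ?, ?, ?, ?)", audit_rows)

query = """
SELECT ID
FROM (SELECT t.*,
             LAG(StatusB) OVER (ORDER BY UpdateDate) AS prev_statusB
      FROM Mytable AS t
      WHERE "File" = '123456A'
     ) AS t
-- keep the first row (no previous value) and every row where StatusB flipped
WHERE prev_statusB IS NULL OR prev_statusB <> StatusB
ORDER BY UpdateDate
"""

changed_ids = [row[0] for row in conn.execute(query)]
print(changed_ids)
```

On the sample data this keeps IDs 1, 19, 107 and 239, which are the four rows in the desired output.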
I am trying to get the difference between two date columns. The script and data are below, but I am getting the same result for all three rows.
df = pd.read_csv(r'Book1.csv',encoding='cp1252')
df
Out[36]:
Start End DifferenceinDays DifferenceinHrs
0 10/26/2013 12:43 12/15/2014 0:04 409 9816
1 2/3/2014 12:43 3/25/2015 0:04 412 9888
2 5/14/2014 12:43 7/3/2015 0:04 409 9816
I expect the results shown in the DifferenceinDays column, which was calculated in Excel, but in Python I get the same value for all three rows. The code I used is below. Can anyone tell me how to calculate the difference between two date columns? I am trying to get the number of hours between them.
df["Start"] = pd.to_datetime(df['Start'])
df["End"] = pd.to_datetime(df['End'])
df['hrs']=(df.End-df.Start)
df['hrs']
Out[38]:
0 414 days 11:21:00
1 414 days 11:21:00
2 414 days 11:21:00
Name: hrs, dtype: timedelta64[ns]
IIUC, divide the difference by np.timedelta64(1,'h') to get hours.
Additionally, it looks like Excel calculates the hours differently; I'm unsure why.
import numpy as np
df['hrs'] = (df['End'] - df['Start']) / np.timedelta64(1,'h')
print(df)
Start End DifferenceinHrs hrs
0 2013-10-26 12:43:00 2014-12-15 00:04:00 9816 9947.35
1 2014-02-03 12:43:00 2015-03-25 00:04:00 9888 9947.35
2 2014-05-14 12:43:00 2015-07-03 00:04:00 9816 9947.35
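The identical values are not a pandas bug: for these three rows the intervals really are the same length (414 days 11:21:00 each). A quick cross-check using only the standard library is sketched below. The format string is my assumption about how the CSV dates are laid out (month first, 24-hour clock).

```python
from datetime import datetime

# Assumed layout of the CSV dates: month/day/year, 24-hour clock.
FMT = "%m/%d/%Y %H:%M"

# Start/End pairs copied from the question.
pairs = [
    ("10/26/2013 12:43", "12/15/2014 0:04"),
    ("2/3/2014 12:43", "3/25/2015 0:04"),
    ("5/14/2014 12:43", "7/3/2015 0:04"),
]

hours = []
for start, end in pairs:
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    # total_seconds() covers the days part too, unlike delta.seconds
    hours.append(delta.total_seconds() / 3600)

print([round(h, 2) for h in hours])
```

All three entries agree with the 9947.35 hours that the pandas version reports.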
I have a data frame as shown below
Tenancy_ID Start_Date Cancelled_Date
1 2011-10-02 07:18:16 2011-12-02 08:15:16
2 2012-10-22 07:18:17 NaT
1 2013-06-02 07:14:12 NaT
3 2016-10-02 07:18:16 2017-03-02 08:18:15
From the above I would like to create a new column named Cancelled_status, based on whether a cancellation date is present in Cancelled_Date.
Expected Output:
Tenancy_ID Start_Date Cancelled_Date Cancelled_status
1 2011-10-02 07:18:16 2011-12-02 08:15:16 Cancelled
2 2012-10-22 07:18:17 NaT Not_Cancelled
1 2013-06-02 07:14:12 NaT Not_Cancelled
3 2016-10-02 07:18:16 2017-03-02 08:18:15 Cancelled
Use numpy.where with Series.isna:
df['Cancelled_status'] = np.where(df['Cancelled_Date'].isna(), 'Not_Cancelled', 'Cancelled')
Alternative with Series.notna:
df['Cancelled_status'] = np.where(df['Cancelled_Date'].notna(), 'Cancelled', 'Not_Cancelled')
May I ask for some help?
I need to calculate the months between the order dates for the same product ID.
I have the following data set
ORDER_NUM PRODUCT_ID ORDER_DATE
111111 222222 2015-05-20 18:30:38
111112 222223 2015-12-03 19:25:23
111113 222224 2015-12-30 18:16:25
111114 222225 2015-10-30 12:32:06
111115 222226 2015-12-26 16:14:33
111116 222227 2016-03-08 10:23:39
111117 222224 2015-10-01 09:04:56
111118 222223 2015-04-21 11:48:03
111119 222228 2015-11-14 10:00:38
111120 222229 2016-03-22 10:42:32
111121 222230 2015-11-10 12:14:41
111122 222231 2015-11-24 10:05:40
111123 222222 2015-12-05 12:18:28
111124 222232 2015-12-07 11:23:53
111125 222233 2015-07-17 10:47:54
111126 222234 2016-02-08 11:59:30
111127 222235 2015-11-08 15:40:08
111128 222223 2015-09-24 11:16:03
111129 222236 2015-11-09 12:30:04
where ORDER_NUM is a unique value, while PRODUCT_ID may appear many times, at different times.
I need the result to be like:
ORDER_NUM PRODUCT_ID MONTHS_BETWEEN
111111 222222 0
111112 222223 2
111113 222224 3
111114 222225 0
111115 222226 0
111116 222227 0
111117 222224 0
111118 222223 0
111119 222228 0
111120 222229 0
111121 222230 0
111122 222231 0
111123 222222 7
111124 222232 0
111125 222233 0
111126 222234 0
111127 222235 0
111128 222223 5
111129 222236 0
The first appearance of a PRODUCT_ID should have the value “0” in MONTHS_BETWEEN, and each subsequent appearance should have the number of months between the current order and the previous one.
I am not sure that I managed to explain very well …
Please help…
You can use months_between() and lead():
select t.*,
       months_between(lead(order_date) over (partition by product_id order by order_date),
                      order_date
                     ) as MonthsBetween
from t;
Notes:
This returns a number with decimal places. You might want to use trunc() or round() to get an integer.
This returns NULL when there is no "next" order. You can use COALESCE() to convert that to 0 (or something else) if you like.
To be honest, I can't tell if you want lead() or lag() (time to the next order or from the previous one). Your data is not ordered by date, making it hard to figure out the right ordering. But, you want one or the other.
I am working with a Raiser's Edge database using SQL Server 2005. I have written SQL that will produce a temporary table containing details of direct debit instalments. Below is a small table containing the key variables for the question I'm going to ask, with some fictional data:
Donor_ID Instalment_ID Instalment_Date Amount
1234 1111 01/01/2011 £5.00
1234 1112 01/02/2011 £0.00
1234 1113 01/03/2011 £5.00
1234 1114 01/04/2011 £5.00
1234 1115 01/05/2011 £0.00
1234 1116 01/06/2011 £0.00
2345 2111 01/01/2011 £0.00
2345 2112 01/02/2011 £5.00
2345 2113 01/03/2011 £5.00
2345 2114 01/04/2011 £0.00
2345 2115 01/05/2011 £0.00
2345 2116 01/06/2011 £0.00
As you will see, some of the values in the Amount column are £0.00. This can occur when a donor has insufficient funds in their account, for example.
What I'd like to do is write a SQL query that will create a field containing an incremental count of consecutive £0.00 payments that resets after a non-£0.00 payment or after a change in Donor_ID. I have reproduced the above data below, with the field I'd like to see.
Donor_ID Instalment_ID Instalment_Date Amount New_Field
1234 1111 01/01/2011 £5.00
1234 1112 01/02/2011 £0.00 1
1234 1113 01/03/2011 £5.00
1234 1114 01/04/2011 £5.00
1234 1115 01/05/2011 £0.00 1
1234 1116 01/06/2011 £0.00 2
2345 2111 01/01/2011 £0.00 1
2345 2112 01/02/2011 £5.00
2345 2113 01/03/2011 £5.00
2345 2114 01/04/2011 £0.00 1
2345 2115 01/05/2011 £0.00 2
2345 2116 01/06/2011 £0.00 3
To help clarify what I'm looking for, I think what I'm looking to do would be similar to a winning streak field on a list of a football team's results. For example:
Opponent Score Winning_Streak
Arsenal 1-0 1
Liverpool 0-0
Swansea 3-1 1
Chelsea 2-1 2
Fulham 4-0 3
Stoke 0-0
Man Utd 1-3
Reading 2-1 1
I've considered various options, but have made no progress. Unless I've missed something obvious, I think that a solution more advanced than my current SQL programming level might be required.
If I am thinking about this problem correctly, I believe that you want a row number when the Amount is 0.00 pounds.
Select 0 As InsufficientCount
    , Donor_ID
    , Instalment_ID
    , Amount
From [Table]
Where Amount > 0.00
Union
Select Row_Number() Over (Partition By Donor_ID Order By Instalment_ID)
    , Donor_ID
    , Instalment_ID
    , Amount
From [Table]
Where Amount = 0.00
This union select should only give you 'ranks' where the Amount equals 0.
I am calling your new field streakAmount:
ALTER TABLE instalments ADD streakAmount int NULL;
Then, to update the value:
UPDATE instalments
SET streakAmount =
    (SELECT COUNT(*)
     FROM instalments streak
     WHERE streak.donor_id = instalments.donor_id
       AND streak.instalment_date <= instalments.instalment_date
       AND streak.instalment_date >
           -- find the previous non-zero instalment date, if any exists
           COALESCE(
               (SELECT MAX(instalment_date)
                FROM instalments prev
                WHERE prev.donor_id = instalments.donor_id
                  AND prev.amount > 0
                  AND prev.instalment_date < instalments.instalment_date)
               -- otherwise min date (datetime, since the date type needs SQL Server 2008+)
               , CAST('1753-1-1' AS datetime))
    )
WHERE amount = 0;
http://sqlfiddle.com/#!6/a571f/18
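For anyone who wants to try the correlated-subquery update without a SQL Server instance, here is a rough sketch of the same idea run against an in-memory SQLite database. The SQLite port is my own assumption, not part of the answer: dates are stored as ISO text (converted from the question's day/month/year values), and the fallback "min date" becomes the string '0001-01-01'.

```python
import sqlite3

# (donor_id, instalment_id, instalment_date, amount) from the question,
# with dates rewritten as ISO strings so text comparison orders them correctly.
rows = [
    (1234, 1111, "2011-01-01", 5.0),
    (1234, 1112, "2011-02-01", 0.0),
    (1234, 1113, "2011-03-01", 5.0),
    (1234, 1114, "2011-04-01", 5.0),
    (1234, 1115, "2011-05-01", 0.0),
    (1234, 1116, "2011-06-01", 0.0),
    (2345, 2111, "2011-01-01", 0.0),
    (2345, 2112, "2011-02-01", 5.0),
    (2345, 2113, "2011-03-01", 5.0),
    (2345, 2114, "2011-04-01", 0.0),
    (2345, 2115, "2011-05-01", 0.0),
    (2345, 2116, "2011-06-01", 0.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instalments (donor_id INTEGER, instalment_id INTEGER, "
    "instalment_date TEXT, amount REAL, streakAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO instalments (donor_id, instalment_id, instalment_date, amount) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# For each zero payment, count the donor's rows after the last non-zero payment
# up to and including the current row; that count is the streak length.
conn.execute("""
UPDATE instalments
SET streakAmount = (
    SELECT COUNT(*)
    FROM instalments AS streak
    WHERE streak.donor_id = instalments.donor_id
      AND streak.instalment_date <= instalments.instalment_date
      AND streak.instalment_date > COALESCE(
            (SELECT MAX(prev.instalment_date)
             FROM instalments AS prev
             WHERE prev.donor_id = instalments.donor_id
               AND prev.amount > 0
               AND prev.instalment_date < instalments.instalment_date),
            '0001-01-01')
)
WHERE amount = 0
""")

# Map instalment_id -> streak; rows with a non-zero amount stay None.
streaks = dict(
    conn.execute("SELECT instalment_id, streakAmount FROM instalments ORDER BY instalment_id")
)
for instalment_id, streak in streaks.items():
    print(instalment_id, streak)
```

On the sample data the counter restarts after every non-zero payment and for each new donor, matching the New_Field column in the question.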