Pandas: slice a date-like string and rearrange it into another date format - pandas

A column in my dataframe holds strings in DD/MM/YYYY format.
I want to slice each string and rearrange it to MM/DD/YYYY (for calculation).
I have tried:
import pandas as pd
from io import StringIO
csvfile = StringIO("""
DD/MM/YYYY
01/05/2020
21/02/2021
19/06/2021
05/06/2021
11/06/2021
10/05/2021
""")
df = pd.read_csv(csvfile, sep = ',', engine='python')
df['DD/MM/YYYY'] = df['DD/MM/YYYY'].astype(str)
df['MM/DD/YYYY'] = df['DD/MM/YYYY'][3:5] + '/' + df['DD/MM/YYYY'][:2] + '/' + df['DD/MM/YYYY'][-4:]
# df['MM/DD/YYYY'] = pd.to_datetime(df['DD/MM/YYYY'][3:5] + '/' + df['DD/MM/YYYY'][:2] + '/' + df['DD/MM/YYYY'][-4:])
print (df)
But it doesn't work. What would be the right way to write it? Thank you!

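Slicing the Series directly, as in df['DD/MM/YYYY'][3:5], selects rows rather than characters. Use the .str accessor to slice each string: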
df['MM/DD/YYYY'] = df['DD/MM/YYYY'].str[3:5] + '/' + df['DD/MM/YYYY'].str[:2] + '/' + df['DD/MM/YYYY'].str[-4:]
Alternatively, parse the strings to datetimes by passing the original format as format='%d/%m/%Y', then convert back to strings with Series.dt.strftime:
df['MM/DD/YYYY'] = pd.to_datetime(df['DD/MM/YYYY'], format='%d/%m/%Y').dt.strftime('%m/%d/%Y')
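For comparison, here is a minimal self-contained sketch of both approaches on three of the values from the question. The variable names (col, sliced, parsed, formatted, days_since_first) are only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the question's column (values copied from the question).
df = pd.DataFrame({"DD/MM/YYYY": ["01/05/2020", "21/02/2021", "19/06/2021"]})
col = df["DD/MM/YYYY"]

# Approach 1: slice each string with the .str accessor.
sliced = col.str[3:5] + "/" + col.str[:2] + "/" + col.str[-4:]

# Approach 2: parse with the known format, then format back to text.
parsed = pd.to_datetime(col, format="%d/%m/%Y")
formatted = parsed.dt.strftime("%m/%d/%Y")

# For real date arithmetic, work on the datetime column rather than on strings.
days_since_first = (parsed - parsed.min()).dt.days
```

Both approaches should give the same strings here. If the goal is calculation, keeping the parsed datetime column is probably more useful than the reformatted text.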

Related

numpy/pandas - why are the elements selected from a list by random.choice all the same

There is a list which contains integer values.
list=[1,2,3,.....]
Then I use the np.random.choice function to select a random element and append it to an existing dataframe column; please refer to the code below:
df.message = df.message.astype(str) + "rowNumber=" + '"' + str(np.random.choice(list)) + '"'
But the element selected by np.random.choice and appended to the message column is always the same for every row.
What is the issue here?
The expected result is that the selected element differs from row to row.
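np.random.choice(list) is called only once and returns a single value, which is then concatenated to every row. Pass the size parameter to np.random.choice to draw one value per row, and convert the values to strings: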
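import numpy as np
import pandas as pd
df = pd.DataFrame({'message': ['aa', 'bb', 'cc']})
L = [1, 2, 3, 4, 5]
# draw one random element per row, then cast to str for concatenation
df.message = (df.message.astype(str) + "rowNumber=" + '"' +
              np.random.choice(L, size=len(df)).astype(str) + '"')
print(df)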
message
0 aarowNumber="4"
1 bbrowNumber="2"
2 ccrowNumber="5"
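To make the difference visible, here is a small sketch that contrasts a single draw with one draw per row. The seeded Generator (np.random.default_rng(0)) is my own assumption for repeatability and is not in the answer above; np.random.choice with size=len(df) works the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"message": ["aa", "bb", "cc"]})
L = [1, 2, 3, 4, 5]

# One draw: a single scalar, so every row receives the same suffix.
one_value = str(np.random.choice(L))
same_for_all = df["message"] + 'rowNumber="' + one_value + '"'

# One draw per row: size=len(df) returns an array with one value per row.
rng = np.random.default_rng(0)  # seeded Generator (assumption, for repeatability)
per_row = pd.Series(rng.choice(L, size=len(df)).astype(str), index=df.index)
different = df["message"] + 'rowNumber="' + per_row + '"'
```

Note that independent draws can still coincide by chance, especially with a short list; the point is only that each row now gets its own draw.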

From pandas dataframe save to csv file without double quotes

I have an Excel file with special characters. I want to write the DataFrame without double quotes, but I receive an error. Help is very much appreciated.
The goal is to generate operation commands from Excel in text format:
from pandas import DataFrame
import pandas as pd
filename = r'In_file.xlsx'
df = pd.read_excel(filename, header=None)
df1 = df[0] + ' ' + df[1] + ' ' + df[2]
df1.to_csv('out_file3.txt', index=False, header=False, quoting=csv.QUOTE_NONE)
Error:
NameError Traceback (most recent call last)
<ipython-input-9-70ff5701bfb8> in <module>
9 df1 = df[0] + ' ' + df[1] + ' ' + df[2]
10
---> 11 df1.to_csv('out_file3.txt', index=False, header=False, quoting=csv.QUOTE_NONE)
> NameError: name 'csv' is not defined
You're missing the import of the csv module:
import csv # <- HERE!
from pandas import DataFrame
import pandas as pd
filename = r'In_file.xlsx'
df = pd.read_excel(filename, header=None)
df1 = df[0] + ' ' + df[1] + ' ' + df[2]
df1.to_csv('out_file3.txt', index=False, header=False, quoting=csv.QUOTE_NONE)
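Here is a small sketch of what quoting=csv.QUOTE_NONE changes, written to in-memory buffers so it runs without the Excel file. The command strings and the tab separator are my own assumptions for illustration, not taken from the question.

```python
import csv
import io

import pandas as pd

# Hypothetical command strings standing in for the rows built from the Excel file.
commands = pd.Series(["ADD USER alice", "SET A=1,B=2"])

# Default quoting (csv.QUOTE_MINIMAL) wraps fields that contain the separator.
default_buf = io.StringIO()
commands.to_csv(default_buf, index=False, header=False)

# QUOTE_NONE writes fields verbatim. A tab separator (an assumption here)
# avoids a clash with the commas inside the data.
plain_buf = io.StringIO()
commands.to_csv(plain_buf, index=False, header=False,
                sep="\t", quoting=csv.QUOTE_NONE)

default_lines = default_buf.getvalue().splitlines()
plain_lines = plain_buf.getvalue().splitlines()
```

One caveat: with QUOTE_NONE, a field that contains the separator or the quote character cannot be written unless you also pass escapechar, so picking a separator that never appears in the data is usually the simpler route.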

Create Dataframe name from 2 strings or variables pandas

I am extracting selected pages from a PDF file and want to name each dataframe after the page it was extracted from:
file = "abc"
selected_pages = ['10','11'] #can be any combination eg ['6','14','20']
for i in selected_pages():
    df{str(i)} = read_pdf(path + file + ".pdf",encoding = 'ISO-8859-1', stream = True,area = [100,10,740,950],pages= (i), index = False)
    print (df{str(i)} )
The idea, ultimately, as in the above example, is to have dataframes df10 and df11. I have tried "df" + str(i), "df" & str(i), and df{str(i)}; however, all of them give the error message: SyntaxError: invalid syntax
Any better way of doing it is most welcome. Thanks.
This is where a dictionary would be a much better option.
Also note the error you have at the start of the loop. selected_pages is a list, so you can't do selected_pages().
file = "abc"
selected_pages = ['10','11'] #can be any combination eg ['6','14','20']
df = {}
for i in selected_pages:
    df[i] = read_pdf(path + file + ".pdf",encoding = 'ISO-8859-1', stream = True, area = [100,10,740,950], pages= (i), index = False)
# after the loop i is still '11', so this will bring it to 10
i = int(i) - 1
dfB = df[str(i)]
#select row number to drop: 0:4
dfB.drop(dfB.index[0:4],axis =0, inplace = True)
dfB.columns = ['col1','col2','col3','col4','col5']
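The dictionary pattern can be tried without a PDF at hand. In this sketch, load_page is a hypothetical stand-in for the read_pdf call above, and frames, df10 and combined are illustrative names only.

```python
import pandas as pd


def load_page(page: str) -> pd.DataFrame:
    """Hypothetical stand-in for read_pdf(...); returns a tiny frame per page."""
    return pd.DataFrame({"page": [page] * 3, "value": range(3)})


selected_pages = ["10", "11"]

# One dictionary entry per page instead of variables named df10, df11, ...
frames = {page: load_page(page) for page in selected_pages}

# Look a frame up by its page key.
df10 = frames["10"]

# Or stack everything, keeping the page key as the outer index level.
combined = pd.concat(frames, names=["page_key", "row"])
```

Keeping the frames in a dictionary means the same code works for any combination of pages, and pd.concat can merge them later if a single table is more convenient.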

How do I get the current date in GML?

I need to be able to get the current date; the format doesn't really matter. Is there a function, or perhaps an API I can use?
You can get the current date in a number of ways in GML. The easiest is probably to use the built-in variables current_second, current_minute, current_hour, current_day, current_weekday, current_month, and current_year.
Here's an example that draws the day, month, and year.
draw_text(32, 32, "Today is " + string(current_day) + "/" + string (current_month) + "/" + string(current_year) +".");
You can change the timezone using date_set_timezone(timezone);
The available timezones are timezone_utc and timezone_local.
Another way to get the date is using date_current_datetime();
myhour = date_get_hour(date_current_datetime());
myday = date_get_day(date_current_datetime());
There are several ways to do it. If you only need to show the current date and time, you can use this:
show_message("Today is " + string(current_day) + "/" + string(current_month) + "/" + string(current_year) + " - " + string(current_hour) + ":" + string(current_minute) + ":" + string(current_second) + ".");
This will show something like: "Today is 3/6/2017 - 23:40:15."

Python 3.4: Loop and Append: Why Not Working With cx_Oracle and Pandas?

For some reason, the following code only returns a dataframe of 10 rows instead of 20 (there are millions of rows in the SQL view).
When I viewed the output from print(data2), it showed the first 10 rows as a DataFrame, but the next DataFrame was empty.
import cx_Oracle as cx
import pandas as pd
conn = cx.Connection("username/pwd@server")
data = pd.DataFrame([])
SQL1 = '''SELECT * FROM TABLE_MV where rownum between '''
for i in range(1, 20, 10):
    lower = i
    upper = i+9
    SQL3 = SQL1 + str(lower) + ' and ' + str(upper)
    data2 = pd.read_sql(SQL3, conn)
    print(data2)
    data = data.append(data2)