As of pandas 1.4.0, read_sql emits a warning when it is given a psycopg2 connection directly and recommends using SQLAlchemy instead. While attempting that migration, I cannot work out how to pass a tuple as one of the query parameters. For example, this presently works:
import pandas as pd
import psycopg2
pd.read_sql(
"SELECT * from news where id in %s",
psycopg2.connect("dbname=mydatabase"),
params=[(1, 2, 3),],
)
Attempting to migrate this to SQLAlchemy like so:
import pandas as pd
pd.read_sql(
"SELECT * from news where id in %s",
"postgresql://localhost/mydatabase",
params=[(1, 2, 3),],
)
results in:
...snipped...
File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
cursor.execute(statement, parameters)
TypeError: not all arguments converted during string formatting
So how do I pass a tuple as a params argument within pandas read_sql?
Wrap your query with a SQLAlchemy text object, use named parameters and pass the parameter values as a dictionary:
import pandas as pd
from sqlalchemy import text
pd.read_sql(
text("SELECT * from news where id in :ids"),
"postgresql://localhost/mydatabase",
# the dictionary key must match the named parameter :ids in the query
params={'ids': (1, 2, 3),},
)
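The tuple form above relies on psycopg2 adapting a Python tuple into a SQL list. As a driver-independent variation (a swapped-in technique, not part of the answer above), SQLAlchemy's expanding bind parameter can be used instead. The sketch below is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the PostgreSQL one, and the `news` rows are made up.

```python
import pandas as pd
from sqlalchemy import bindparam, create_engine, text

# Assumption: in-memory SQLite stands in for postgresql://localhost/mydatabase.
engine = create_engine("sqlite://")

# Made-up sample rows so the query has something to return.
pd.DataFrame({"id": [1, 2, 3, 4], "title": ["a", "b", "c", "d"]}).to_sql(
    "news", engine, index=False
)

# expanding=True tells SQLAlchemy to render one placeholder per list element.
query = text("SELECT * FROM news WHERE id IN :ids").bindparams(
    bindparam("ids", expanding=True)
)

result = pd.read_sql(query, engine, params={"ids": [1, 2, 3]})
print(result)
```

With an expanding parameter the value is passed as a list, and the same statement should work unchanged against other database drivers.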
I have a DataFrame df_test and I want to copy all of its columns into a new DataFrame.
I also want to modify the contents of one column with an if/else condition.
I tried this:
import pyspark
import pandas as pd
from pyspark.sql import SparkSession
df_cast= df_test.withColumn('account_id', when(col("account_id") == 8, "teo").when(col("account_id") == 9, "liza").otherwise(' '))
But it gives me this error:
NameError: name 'when' is not defined
Thanks in advance
At the start of your code, you should import the pyspark sql functions. The following, for example, would work:
import pyspark.sql.functions as F
import pyspark
import pandas as pd
from pyspark.sql import SparkSession
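# chain the second condition with .when(...) on the result of F.when(...); col also comes from F
df_cast = df_test.withColumn('account_id', F.when(F.col("account_id") == 8, "teo").when(F.col("account_id") == 9, "liza").otherwise(' '))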
I have a question about seaborn's barplot.
I have a dataset returned from BigQuery and converted to a DataFrame, as below.
Sample data from `df.sort_values(by=['dep_delay_in_minutes']).to_csv(csv_file)`:
,dep_delay_in_minutes,arrival_delay_in_minutes,numflights
1,-50.0,-38.0,2
2,-49.0,-59.5,4
3,-46.0,-28.5,4
4,-45.0,-44.0,4
5,-43.0,-53.0,4
6,-42.0,-35.0,6
7,-40.0,-26.0,4
8,-39.0,-33.5,4
9,-38.0,-21.5,4
10,-37.0,-37.666666666666664,12
11,-36.0,-35.0,2
12,-35.0,-32.57142857142857,14
13,-34.0,-30.0,18
14,-33.0,-26.200000000000003,10
15,-32.0,-34.8,10
16,-31.0,-28.769230769230766,26
17,-30.0,-34.93749999999999,32
18,-29.0,-31.375000000000004,48
19,-28.0,-24.857142857142854,70
20,-27.0,-28.837209302325583,86
I wrote the code below, but the negative values are plotted on the right-hand side.
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import numpy as np
import google.datalab.bigquery as bq
import warnings
# Disable warnings
warnings.filterwarnings('ignore')
sql="""
SELECT
DEP_DELAY as dep_delay_in_minutes,
AVG(ARR_DELAY) AS arrival_delay_in_minutes,
COUNT(ARR_DELAY) AS numflights
FROM flights.simevents
GROUP BY DEP_DELAY
ORDER BY DEP_DELAY
"""
df = bq.Query(sql).execute().result().to_dataframe()
df = df.sort_values(['dep_delay_in_minutes'])
ax = sb.barplot(data=df, x='dep_delay_in_minutes', y='numflights', order=df['dep_delay_in_minutes'])
ax.set_xlim(-50, 0)
How can I display the x axis in numeric order, with the negative values on the left-hand side?
I would appreciate any advice.
Passing both a left and a right limit to ax.set_xlim() did not work here. The plot displayed correctly when only one limit was given:
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import numpy as np
df = df.sort_values(['dep_delay_in_minutes'])
ax = sb.barplot(x='dep_delay_in_minutes', y='numflights', data=df, order=df['dep_delay_in_minutes'])
ax.set_xlim(0.0)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45)
plt.show()
The same call can also be written as:
ax.set_xlim(0.0,)
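A possible reason the two-sided limit misbehaves is that seaborn's barplot treats x as categorical and draws the bars at integer positions 0, 1, 2, ..., so limits given in delay minutes do not line up with the bars. As an alternative (a swapped-in technique, not the approach above), matplotlib's own ax.bar places bars at their numeric x values. The sketch below uses the first five rows of the sample data; the bar width and axis limits are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the sample data from the question.
df = pd.DataFrame({
    "dep_delay_in_minutes": [-50.0, -49.0, -46.0, -45.0, -43.0],
    "numflights": [2, 4, 4, 4, 4],
})

fig, ax = plt.subplots()
# ax.bar uses the numeric x values directly, so -50 ends up left of -43.
ax.bar(df["dep_delay_in_minutes"], df["numflights"], width=0.8)
ax.set_xlim(-51, 0)  # limits are now in minutes, matching the data
ax.set_xlabel("dep_delay_in_minutes")
ax.set_ylabel("numflights")
fig.savefig("delays.png")
```

Because the axis is numeric here, gaps in the data (for example no flights at -48 minutes) show up as gaps between bars rather than being closed up.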
This is the code I used to plot, except that I have removed the API key.
import datetime
import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
from pandas import DataFrame
from alpha_vantage.foreignexchange import ForeignExchange
cc = ForeignExchange(key=' ',output_format='pandas')
data, meta_data = cc.get_currency_exchange_daily(from_symbol='USD',to_symbol='EUR', outputsize='full')
print(data)
data['4. close'].plot()
plt.tight_layout()
plt.title('Intraday USD/Eur')
If I insert ylabel('label'), I get the following error: TypeError: 'str' object is not callable
This problem was outlined here. If I restart my Jupyter kernel, the ylabel shows up once, but if I rerun the same code I get the same error again. Is this a bug, or is the problem on my end?
I am not sure if it is relevant, but the DataFrame looks like this:
Open High Low Close
date
2019-11-15 0.9072 0.9076 0.9041 0.9043
2019-11-14 0.9081 0.9097 0.9065 0.9070
2019-11-13 0.9079 0.9092 0.9071 0.9082
2019-11-12 0.9062 0.9085 0.9056 0.9079
2019-11-11 0.9071 0.9074 0.9052 0.9062
... ... ... ... ...
2014-11-28 0.8023 0.8044 0.8004 0.8028
2014-11-27 0.7993 0.8024 0.7983 0.8022
2014-11-26 0.8014 0.8034 0.7980 0.7993
2014-11-25 0.8037 0.8059 0.8007 0.8014
2014-11-24 0.8081 0.8085 0.8032 0.8036
1570 rows × 4 columns
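One common cause of this error, offered here as an assumption because the offending line is not shown in the question, is an earlier assignment such as plt.ylabel = 'label'. That replaces the function with a string for the rest of the kernel session, which would match the described behaviour: the label works once after a restart and fails on every rerun. The sketch below reproduces the error and restores the function; the plotted values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

original_ylabel = plt.ylabel   # keep a reference so the function can be restored

plt.ylabel = "label"           # the mistake: assignment overwrites the function with a str
try:
    plt.ylabel("label")        # this now tries to call a string
except TypeError as exc:
    message = str(exc)
    print(message)             # 'str' object is not callable

plt.ylabel = original_ylabel   # restoring it; a kernel restart has the same effect
plt.plot([0.9043, 0.9070, 0.9082])
label = plt.ylabel("Close")    # correct usage: call the function
```

If this is the cause, removing the assignment from the notebook and restarting the kernel once should make the error go away for good.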
import numpy as np
import pandas as pd
from pandas_datareader import data as wb
from yahoofinancials import YahooFinancials
sympol = [input()]
abc = YahooFinancials(sympol)
l=abc.get_financial_stmts('annual', 'cash')
df=pd.concat([pd.DataFrame(key) for key in l['cashflowStatementHistory']['FB']],axis=1,sort=True).reset_index().rename(columns={'index':'Time'})
Thank you, it works, but only with a specific symbol. How can I make it general so that I can enter any symbol? What should I write instead of 'FB'? I tried [[sympol]], but it gives me an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
7 l=abc.get_financial_stmts('annual', 'cash')
----> 8 df=pd.concat([pd.DataFrame(key) for key in l['cashflowStatementHistory'][[sympol]]],axis=1,sort=True).reset_index().rename(columns={'index':'Time'})
TypeError: unhashable type: 'list'
What do you think the problem is, and how can I fix it?
Thank you for your help.
I think the simplest approach is a list comprehension that transposes each DataFrame with DataFrame.T, followed by concat.
If you need to work with the data as a time series later, it is also best to create a DatetimeIndex:
df = pd.concat([pd.DataFrame(x).T for x in l['cashflowStatementHistory']['FB']], sort=True)
df.index = pd.to_datetime(df.index)
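To use a symbol entered at runtime instead of the hard-coded 'FB', the dictionary key has to be a plain string. A list such as [sympol] cannot be a key, which is what the "unhashable type: 'list'" error reports. The sketch below is a minimal illustration under stated assumptions: `statements` is a hypothetical stand-in for the dictionary returned by get_financial_stmts, its nesting is inferred from the code above, the numbers are made up, and `cashflow_frame` is a helper name introduced only for this example.

```python
import pandas as pd

# Hypothetical stand-in for abc.get_financial_stmts('annual', 'cash');
# the nesting is inferred from the question and the values are made up.
statements = {
    "cashflowStatementHistory": {
        "AAPL": [
            {"2019-09-28": {"netIncome": 100, "capitalExpenditures": -10}},
            {"2018-09-29": {"netIncome": 90, "capitalExpenditures": -8}},
        ]
    }
}

def cashflow_frame(statements, symbol):
    """Build one row per reporting date for a single ticker symbol."""
    # symbol must be a string; a list is unhashable and cannot be a dict key.
    yearly = statements["cashflowStatementHistory"][symbol]
    df = pd.concat([pd.DataFrame(x).T for x in yearly], sort=True)
    df.index = pd.to_datetime(df.index)
    return df

symbol = "AAPL"  # in the original script this could be: symbol = input().strip()
df = cashflow_frame(statements, symbol)
print(df)
```

In the original script this would mean keeping the input as a string, passing it to YahooFinancials, and indexing with that same string.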