After importing pandasql as sqldf, getting 'module' object is not callable error - pandasql

I imported the below :
import pandasql as sqldf
import pandas as pd
import numpy as np
from pandasql import load_meat, load_births
pysqldf = lambda q: sqldf(q, globals())
meat=load_meat()
When I run the query pysqldf("SELECT * FROM meat LIMIT 5;"), it gives:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-60-d1c81773e718> in <module>()
----> 1 pysqldf("SELECT * FROM meat LIMIT 5;")
<ipython-input-57-ad9f322e1336> in <lambda>(q)
1 import pandasql as sqldf
----> 2 pysqldf = lambda q: sqldf(q, globals())
3 import pandas as pd
4 import numpy as np
5 from pandasql import load_meat, load_births
TypeError: 'module' object is not callable

I just checked the pandasql documentation: you need to import the sqldf function from pandasql, not import the module under that name.
Replace your line import pandasql as sqldf with from pandasql import sqldf.
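The error itself is generic Python behaviour and can be reproduced with any module; a minimal stdlib sketch, using json purely as a stand-in for pandasql:

```python
import json

# Calling the module object itself reproduces the traceback's error:
try:
    json('{}')
except TypeError as e:
    err = str(e)   # "'module' object is not callable"

# The fix mirrors the pandasql case: import the callable from the module.
from json import loads
parsed = loads('{"a": 1}')
```

In the question's code, from pandasql import sqldf makes sqldf(q, globals()) call the function rather than the module.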

Related

How to migrate pandas read_sql from psycopg2 to sqlalchemy with a tuple as one of the query params

With pandas=1.4.0, read_sql emits a warning that psycopg2 connections should not be used directly and that SQLAlchemy should be used instead. While attempting this migration, I cannot work out how to pass a tuple as one of the query parameters. For example, this presently works:
import pandas as pd
import psycopg2
pd.read_sql(
"SELECT * from news where id in %s",
psycopg2.connect("dbname=mydatabase"),
params=[(1, 2, 3),],
)
attempting to migrate this to sqlalchemy like so:
import pandas as pd
pd.read_sql(
"SELECT * from news where id in %s",
"postgresql://localhost/mydatabase",
params=[(1, 2, 3),],
)
results in
...snipped...
File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
cursor.execute(statement, parameters)
TypeError: not all arguments converted during string formatting
So how do I pass a tuple as a params argument within pandas read_sql?
Wrap your query with a SQLAlchemy text object, use named parameters and pass the parameter values as a dictionary:
import pandas as pd
from sqlalchemy import text
pd.read_sql(
text("SELECT * from news where id in :ids"),
"postgresql://localhost/mydatabase",
params={'ids': (1, 2, 3)},
)
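A runnable sketch of the same pattern against an in-memory SQLite database (a stand-in for the Postgres URL in the question); in SQLAlchemy 1.4+ an expanding bindparam is the documented way to bind a whole sequence to an IN clause:

```python
import pandas as pd
from sqlalchemy import bindparam, create_engine, text

# In-memory SQLite stand-in for the Postgres database in the question.
engine = create_engine("sqlite://")
pd.DataFrame({"id": [1, 2, 3, 4], "title": ["a", "b", "c", "d"]}).to_sql(
    "news", engine, index=False)

# expanding=True tells SQLAlchemy to expand the sequence bound to :ids
# into the right number of placeholders at execution time.
stmt = text("SELECT * FROM news WHERE id IN :ids").bindparams(
    bindparam("ids", expanding=True))
df = pd.read_sql(stmt, engine.connect(), params={"ids": [1, 2, 3]})
```

The table name, columns, and values are made up for the sketch; only the text()/bindparam pattern is the point.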

Transfer a df to a new one and change the context of a column

I have a dataframe df_test and I want to copy all of its columns into a new dataframe, modifying the contents of one column with if/else logic.
Tried this:
import pyspark
import pandas as pd
from pyspark.sql import SparkSession
df_cast= df_test.withColumn('account_id', when(col("account_id") == 8, "teo").when(col("account_id") == 9, "liza").otherwise(' '))
But it gives me this error:
NameError: name 'when' is not defined
Thanks in advance
At the start of your code, you should import the pyspark sql functions. The following, for example, would work:
import pyspark.sql.functions as F
import pyspark
import pandas as pd
from pyspark.sql import SparkSession
df_cast = df_test.withColumn('account_id', F.when(F.col("account_id") == 8, "teo").when(F.col("account_id") == 9, "liza").otherwise(' '))
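For reference, the same conditional recoding can be sketched in plain pandas (which the question also imports), with numpy.select playing the role of the chained F.when(...).when(...).otherwise(...); the df_test contents here are a hypothetical stand-in:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the df_test in the question.
df_test = pd.DataFrame({"account_id": [8, 9, 10]})

# np.select takes a list of conditions, a list of matching choices,
# and a default -- the analogue of when/when/otherwise.
df_cast = df_test.assign(
    account_id=np.select(
        [df_test["account_id"] == 8, df_test["account_id"] == 9],
        ["teo", "liza"],
        default=" ",
    )
)
# df_cast["account_id"] -> ["teo", "liza", " "]
```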

How to display negative x values on the left side for barplot?

I would like to ask a question about barplot in seaborn.
I have a dataset returned from BigQuery and converted to a dataframe as below.
Sample data from `df.sort_values(by=['dep_delay_in_minutes']).to_csv(csv_file)`
,dep_delay_in_minutes,arrival_delay_in_minutes,numflights
1,-50.0,-38.0,2
2,-49.0,-59.5,4
3,-46.0,-28.5,4
4,-45.0,-44.0,4
5,-43.0,-53.0,4
6,-42.0,-35.0,6
7,-40.0,-26.0,4
8,-39.0,-33.5,4
9,-38.0,-21.5,4
10,-37.0,-37.666666666666664,12
11,-36.0,-35.0,2
12,-35.0,-32.57142857142857,14
13,-34.0,-30.0,18
14,-33.0,-26.200000000000003,10
15,-32.0,-34.8,10
16,-31.0,-28.769230769230766,26
17,-30.0,-34.93749999999999,32
18,-29.0,-31.375000000000004,48
19,-28.0,-24.857142857142854,70
20,-27.0,-28.837209302325583,86
I wrote the code below, but the negative values are plotted on the right-hand side.
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import numpy as np
import google.datalab.bigquery as bq
import warnings
# Disable warnings
warnings.filterwarnings('ignore')
sql="""
SELECT
DEP_DELAY as dep_delay_in_minutes,
AVG(ARR_DELAY) AS arrival_delay_in_minutes,
COUNT(ARR_DELAY) AS numflights
FROM flights.simevents
GROUP BY DEP_DELAY
ORDER BY DEP_DELAY
"""
df = bq.Query(sql).execute().result().to_dataframe()
df = df.sort_values(['dep_delay_in_minutes'])
ax = sb.barplot(data=df, x='dep_delay_in_minutes', y='numflights', order=df['dep_delay_in_minutes'])
ax.set_xlim(-50, 0)
How can I display the x axis in numeric order, with negative values on the left-hand side?
I would appreciate any advice.
Specifying both left and right limits with ax.set_xlim() doesn't work here; it displayed correctly when only one limit was given.
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import numpy as np
df = df.sort_values(['dep_delay_in_minutes'])
ax = sb.barplot(x='dep_delay_in_minutes', y='numflights', data=df, order=df['dep_delay_in_minutes'])
ax.set_xlim(0.0)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45)
plt.show()
A different notation was also possible.
ax.set_xlim(0.0,)
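The underlying issue is that seaborn's barplot treats x as categorical, so the axis follows row order rather than numeric order. A minimal matplotlib-only sketch (with hypothetical sample rows standing in for the BigQuery result) that keeps x numeric, so negative delays land on the left automatically:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows standing in for the BigQuery result.
df = pd.DataFrame({"dep_delay_in_minutes": [-50, -40, -30, -20],
                   "numflights": [2, 4, 32, 86]})

fig, ax = plt.subplots()
# With numeric x values matplotlib orders the axis itself, so the most
# negative delays appear on the left with no set_xlim() juggling.
ax.bar(df["dep_delay_in_minutes"], df["numflights"], width=5)
left, right = ax.get_xlim()
```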

Unable to label axis in Jupyter: TypeError

This is the code I used to plot, except that I have removed the API key.
import datetime
import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
from pandas import DataFrame
from alpha_vantage.foreignexchange import ForeignExchange
cc = ForeignExchange(key=' ',output_format='pandas')
data, meta_data = cc.get_currency_exchange_daily(from_symbol='USD',to_symbol='EUR', outputsize='full')
print(data)
data['4. close'].plot()
plt.tight_layout()
plt.title('Intraday USD/Eur')
If I insert plt.ylabel('label') I get the following error: TypeError: 'str' object is not callable
This problem was outlined here; if I restart my Jupyter kernel the ylabel shows up once, but if I rerun the same code I get the same error again. Is this a bug, or is there a problem on my end?
I am not sure if it is relevant, but the dataframe looks like this:
Open High Low Close
date
2019-11-15 0.9072 0.9076 0.9041 0.9043
2019-11-14 0.9081 0.9097 0.9065 0.9070
2019-11-13 0.9079 0.9092 0.9071 0.9082
2019-11-12 0.9062 0.9085 0.9056 0.9079
2019-11-11 0.9071 0.9074 0.9052 0.9062
... ... ... ... ...
2014-11-28 0.8023 0.8044 0.8004 0.8028
2014-11-27 0.7993 0.8024 0.7983 0.8022
2014-11-26 0.8014 0.8034 0.7980 0.7993
2014-11-25 0.8037 0.8059 0.8007 0.8014
2014-11-24 0.8081 0.8085 0.8032 0.8036
1570 rows × 4 columns
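One cause consistent with the restart behaviour described above: an earlier cell that assigns to plt.ylabel instead of calling it shadows the function until the kernel restarts. A sketch reproducing the reported error under that assumption:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

plt.ylabel = 'Intraday USD/Eur'   # accidental assignment, not a call
try:
    plt.ylabel('rate')            # now calls the string, not the function
except TypeError as e:
    err = str(e)                  # "'str' object is not callable"
```

If this is the cause, the fix is to remove the assignment and restart the kernel so the real function is restored.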

Pandas Dataframe

import numpy as np
import pandas as pd
from pandas_datareader import data as wb
from yahoofinancials import YahooFinancials
sympol = [input()]
abc = YahooFinancials(sympol)
l=abc.get_financial_stmts('annual', 'cash')
df=pd.concat([pd.DataFrame(key) for key in l['cashflowStatementHistory']['FB']],axis=1,sort=True).reset_index().rename(columns={'index':'Time'})
Thank you, it works, but only with a specific symbol. How do I customise it so that I can enter any symbol? What should I write instead of 'FB'? I tried [[sympol]] but it gives me an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
7 l=abc.get_financial_stmts('annual', 'cash')
----> 8 df=pd.concat([pd.DataFrame(key) for key in l['cashflowStatementHistory'][[sympol]]],axis=1,sort=True).reset_index().rename(columns={'index':'Time'})
TypeError: unhashable type: 'list'
What do you think is the problem, and how can I fix it?
Thanks for the help.
I think the simplest approach is a list comprehension with a transpose by DataFrame.T, then concat.
If you need to work with time series later, it is best to also create a DatetimeIndex:
df = pd.concat([pd.DataFrame(x).T for x in l['cashflowStatementHistory']['FB']], sort=True)
df.index = pd.to_datetime(df.index)
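The unhashable type: 'list' error in the follow-up arises because sympol is a list (input() wrapped in brackets), and a list cannot be used as a dict key; indexing with the string inside it works. A sketch with a hypothetical payload shaped like the YahooFinancials result:

```python
import pandas as pd

# Hypothetical payload shaped like the YahooFinancials result in the thread.
l = {'cashflowStatementHistory': {
    'FB': [{'2019-12-31': {'netIncome': 18485000000}}]}}

sympol = ['FB']     # input() wrapped in a list, as in the question
# l[...][[sympol]] raises TypeError: a list is unhashable, so it cannot
# be a dict key. Index with the string inside the list instead:
df = pd.concat([pd.DataFrame(x).T
                for x in l['cashflowStatementHistory'][sympol[0]]],
               sort=True)
df.index = pd.to_datetime(df.index)
```

The statement dates and figures are made up for the sketch; only the sympol[0] indexing is the point.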