I'm trying to extract keywords from a SQL database, and CONTAINS and LIKE are not working.
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
df = pd.read_csv('data.csv')
sql = df.to_sql('table1', con=engine,index=True)
q1 = engine.execute('SELECT * FROM table1 WHERE features LIKE "Swimming" ').fetchall()
q2 = engine.execute('SELECT * FROM table1 WHERE CONTAINS(features, "Swimming") ').fetchall()
I've tried both ways but I'm not getting the answer, and I get this error:
OperationalError: (sqlite3.OperationalError) no such function: CONTAINS [SQL: 'SELECT * FROM table1 WHERE CONTAINS(features, "Swimming") '] (Background on this error at: http://sqlalche.me/e/e3q8)
You need to use double quotes around your SQL statement and single quotes inside it:
q1 = engine.execute("SELECT * FROM table1 WHERE features LIKE '%Swimming%' ").fetchall()
Note the change from 'Swimming' to '%Swimming%': the % wildcards let the pattern match any value that contains Swimming. (SQLite has no built-in CONTAINS function, which is why the second query raises OperationalError.)
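A minimal end-to-end sketch of the working pattern (with made-up data, since data.csv is not shown; engine.execute is the legacy SQLAlchemy 1.x API used above):
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
# Stand-in for data.csv, which is not shown in the question
df = pd.DataFrame({'features': ['Swimming pool', 'Gym', 'Swimming lessons']})
df.to_sql('table1', con=engine, index=True)
# Single quotes for the SQL string literal, double quotes around the
# Python string; the % wildcards match any text around 'Swimming'.
rows = engine.execute("SELECT * FROM table1 WHERE features LIKE '%Swimming%'").fetchall()
print(rows)  # the two rows whose features contain 'Swimming'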
I'm in the process of converting some SAS code to PySpark, and we previously used a macro variable for the WHERE clause in this code. In adapting it to PySpark, I'm trying to pass a list of dates to the WHERE clause, but I keep getting errors. I want the SQL code to pull all data from those three months. Any pointers?
month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN '{}')
) as table1""".format(month_list)
Pass the list as a tuple to get the right SQL syntax:
month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN {})
) as table1""".format(tuple(month_list))
And you don't need the apostrophes around the IN list.
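For reference, a quick sketch of what the tuple formatting produces:
month_list = ['202107', '202108', '202109']
print("IN {}".format(tuple(month_list)))
# IN ('202107', '202108', '202109')
# Caveat: a single-element list renders as ('202107',) with a trailing
# comma, which is not valid SQL, so guard that case if the length varies.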
I would like to create a Temporary View from the results of a SQL Query - which sounds like a basic thing to do, but I just couldn't make it work and don't understand what is wrong.
This is my SQL query - which works fine and returns Col1.
%sql
SELECT
Col1
FROM
Table1
WHERE EXISTS (
select *
from TempView1)
I would like to write the results to another table which I can query. Therefore I do this:
df = spark.sql("""
SELECT
Col1
FROM
Table1
WHERE EXISTS (
select *
from TempView1)""")
OK
df
Out[28]: DataFrame[Col1: bigint]
df.createOrReplaceTempView("df_tmp_view")
OK
%sql
select * from df_tmp_view
Error in SQL statement: AnalysisException: Table or view not found: df_tmp_view; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [df_tmp_view], [], false
display(affected_customers_tmp_view)
NameError: name 'df_tmp_view' is not defined
What am I doing wrong?
I don't understand the error saying that the name is not defined, although I defined it just one command above. Also, the SQL query works and returns data, so what am I missing?
Thanks!
You need to register the view as a global temporary view and read it through the global temp database; for example, in your case:
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(table(global_temp_db + "." + 'df_tmp_view'))
See the documentation for details. A full example:
import pandas as pd

df_pd = pd.DataFrame(
    {
        'Name': [231232, 12312321, 3213231],
    }
)
df = spark.createDataFrame(df_pd)

# A *global* temporary view lives in the global temp database and is
# visible across sessions, unlike a plain temporary view.
df.createOrReplaceGlobalTempView('test_tmp_view')

global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(table(global_temp_db + "." + 'test_tmp_view'))
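Once registered this way, the view should also be reachable from a %sql cell through the same database (global_temp under the default Spark configuration):
%sql
select * from global_temp.test_tmp_view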
Trying to use the pandas.read_sql function.
I created a ClickHouse table and filled it:
create table regions
(
date DateTime Default now(),
region String
)
engine = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY tuple()
SETTINGS index_granularity = 8192;
insert into regions (region) values ('Asia'), ('Europe')
Then python code:
import pandas as pd
from sqlalchemy import create_engine
uri = 'clickhouse://default:@localhost/default'
engine = create_engine(uri)
query = 'select * from regions'
pd.read_sql(query, engine)
As the result, I expected to get a dataframe with columns date and region, but all I get is an empty dataframe:
Empty DataFrame
Columns: [2021-01-08 09:24:33, Asia]
Index: []
UPD. It turned out that using the clickhouse+native driver solves the problem.
Can it be solved without +native?
There is an old issue, https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/10, with a hint suggesting adding FORMAT TabSeparatedWithNamesAndTypes at the end of the query. So the initial query will look like this:
select *
from regions
FORMAT TabSeparatedWithNamesAndTypes
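A minimal sketch of that workaround with the original HTTP URI (assuming the regions table above; whether it is needed depends on the clickhouse-sqlalchemy version):
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('clickhouse://default:@localhost/default')
# The FORMAT clause makes the HTTP interface return column names and
# types, so pandas can build the DataFrame correctly.
query = 'select * from regions FORMAT TabSeparatedWithNamesAndTypes'
print(pd.read_sql(query, engine))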
I'm trying to set the schema.table_name parts of the query from variables, but I also want to pass an integer. Query 1 works fine when I run it in the Python console, but when I call it through a GUI it crashes, and only when the %s is present. How can I handle this kind of issue?
In the supplied example below, query1 falters but query2 works.
As you can see, I've imported the two necessary libraries.
import psycopg2
from psycopg2 import sql
sample = 10
conn = psycopg2.connect("<details>")
cur = conn.cursor()
print('tables dropped')
query1 = sql.SQL("""CREATE TABLE {}.{} AS
(SELECT * FROM {}.{} TABLESAMPLE BERNOULLI (%s));""").format(
*map(sql.Identifier, (schema, table_name_top, schema, top_point_cloud))), sample
query2 = sql.SQL("""CREATE TABLE {}.{} AS
(SELECT * FROM {}.{} TABLESAMPLE BERNOULLI (10));""").format(
*map(sql.Identifier, (schema, table_name_base, schema, base_point_cloud)))
print('query created')
cur.execute(query1)
cur.execute(query2)
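One likely culprit (my reading, not from the original post): the closing parenthesis in query1 leaves ", sample" outside the format() call, so query1 is actually a tuple and the %s placeholder never receives a value. A sketch of the usual psycopg2 pattern, with the value passed to execute() instead (schema, table_name_top, etc. assumed defined as in the question):
query1 = sql.SQL("""CREATE TABLE {}.{} AS
    (SELECT * FROM {}.{} TABLESAMPLE BERNOULLI (%s));""").format(
    *map(sql.Identifier, (schema, table_name_top, schema, top_point_cloud)))
# Identifiers are composed with format(); the %s placeholder is
# filled by execute(), which expects a sequence of values.
cur.execute(query1, (sample,))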
I have read dozens of similar posts and tried everything, but I still get an error message when trying to pass a parameter to a simple query using pyodbc. Apologies if there is an answer to this elsewhere, but I cannot find it.
I have a very simple table:
select * from Test
yields
a
b
c
This works fine:
import pyodbc
import pandas
connection = pyodbc.connect('DSN=HyperCube SYSTEST',autocommit=True)
result = pandas.read_sql("""select * from Test where value = 'a'""",connection,params=None)
print(result)
result:
value
0 a
However, if I try to do the WHERE clause with a parameter, it fails.
result = pandas.read_sql("""select * from Test where value = ?""",connection,params='a')
yields
Error: ('01S02', '[01S02] Unknown column/parameter value (9001) (SQLPrepare)')
I also tried this
cursor = connection.cursor()
cursor.execute("""select * from Test where value = ?""",['a'])
pyodbcResults = cursor.fetchall()
and still received the same error
Does anyone know what is going on? Could it be an issue with the database I am querying?
PS. I looked at the following post, and the syntax in the first part of answer 9, where dates are passed as strings, looks identical to what I am doing:
pyodbc the sql contains 0 parameter markers but 1 parameters were supplied' 'hy000'
Thanks
pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)
(docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html)
params : list, tuple or dict, optional, default: None
Example:
cursor.execute("select * from Test where value = %s", ['a'])
Or, with named arguments:
result = pandas.read_sql('select * from Test where value = %(par)s',
                         db, params={"par": 'p'})
(The %s and %(name)s placeholders apply to drivers that use those paramstyles; pyodbc itself uses ?.)
In pyodbc, write the parameters directly after the sql argument:
cursor.execute(sql, *parameters)
for example:
onepar = 'a'
cursor.execute("select * from Test where value = ?", onepar)
cursor.execute("select a from tbl where b=? and c=?", x, y)