Retrieve df from spark.sql: [PARSE_SYNTAX_ERROR] Syntax error at or near 'SELECT'

I'm using a Databricks notebook and I'd like to retrieve a dataframe from a SQL execution in Spark. I have:
statement = f""" USER {db}; SELECT * FROM {table}
"""
df = spark.sql(statement)
display(df)
However, unlike when I run the same statement in a SQL cell in the notebook, I get the following error:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'SELECT': extra input 'SELECT'(line 1...
Where am I going wrong?

I tried to reproduce the same in my environment and got the results below:
This is my sample demo table, Persons.
Create the dataframe with this code:
df = sqlContext.sql("select * from Persons")
display(df)
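For the original question above, the parse error most likely comes from passing two statements to one spark.sql() call (plus USER, which would need to be the USE keyword), since spark.sql() executes a single statement at a time. A minimal sketch of one way around it, assuming db and table hold your database and table names:
# spark and display are the usual Databricks notebook globals.
# Run USE and SELECT as separate statements, or qualify the table name.
db = "my_db"        # hypothetical database name
table = "my_table"  # hypothetical table name
spark.sql(f"USE {db}")                        # switch the current database first
df = spark.sql(f"SELECT * FROM {table}")      # then run the query on its own
# or, as a single statement:
df = spark.sql(f"SELECT * FROM {db}.{table}")
display(df)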

Related

Error in SQL statement: AnalysisException: Table or view not found:

I've just started with Hive. I'm working on Databricks Community Edition. I usually write in Python, but I wanted to write something in SQL and there is an error I cannot understand. I cannot see anything wrong in my code. Please help me.
spark.sql("create table happiness_perm as select * from happiness_tmp");
%sql
select Country, count(*) from happiness_perm group by Country
I tried to use my dataframe df_happiness instead of happiness_perm and I still receive this:
Error in SQL statement: AnalysisException: Table or view not found: happiness_perm; line 1 pos 30;
'Aggregate ['Country], ['Country, unresolvedalias(count(1), None)]
+- 'UnresolvedRelation [happiness_perm], [], false
I would really appreciate your help!
Try this:
df = spark.sql("select * from happiness_tmp")
df.createOrReplaceTempView("happiness_perm")
First you get your data into a dataframe, then you register the dataframe as a temporary view.
You can then query that view from SQL.
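As a minimal sketch, the registered view can then be queried in the same session, reusing the Country column from the question:
# happiness_perm is now a session-scoped temporary view, so it can be
# queried from Python (or from a %sql cell) in the same notebook session.
result = spark.sql("SELECT Country, count(*) FROM happiness_perm GROUP BY Country")
display(result)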

Databricks notebook - return all values from a table

For test purposes, I have an empty DB into which I load a tiny amount of data, extracted and transformed from a JSON file.
I would like to create a notebook using Scala which gets all values from all columns from a given table, and exits the notebook returning this result as a string.
I've tried variations of the following:
val result = spark.sql("select * from table.DB").as[String];
dbutils.notebook.exit(result)
However, the first command fails with this error:
AnalysisException: Try to map struct<Version:bigint,metadataInformation:struct<metadataID:string... etc ...> to Tuple1, but failed as the number of fields does not line up.;
However, something like the following works to retrieve the value of a specific field from a column:
val result = spark.sql("select column.jsonfield from table.DB").as[String].first();
dbutils.notebook.exit(result)
How can I return the content of all columns?
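One approach is to collect all rows on the driver, flatten each Row into its individual values, and join everything into a single comma-separated string: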
val result = spark.sql("SELECT x FROM y").collect().toList.flatMap(x => x.toSeq).mkString(",")
dbutils.notebook.exit(result)
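Note that collect() pulls the entire result back to the driver, so this only makes sense for small tables like the test data described above.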

Writing results of SQL query to Temp View in Databricks

I would like to create a Temporary View from the results of a SQL Query - which sounds like a basic thing to do, but I just couldn't make it work and don't understand what is wrong.
This is my SQL query - which works fine and returns Col1.
%sql
SELECT
Col1
FROM
Table1
WHERE EXISTS (
select *
from TempView1)
I would like to write the results to another table which I can query. Therefore I do this:
df = spark.sql("""
SELECT
Col1
FROM
Table1
WHERE EXISTS (
select *
from TempView1)""")
OK
df
Out[28]: DataFrame[Col1: bigint]
df.createOrReplaceTempView("df_tmp_view")
OK
%sql
select * from df_tmp_view
Error in SQL statement: AnalysisException: Table or view not found: df_tmp_view; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [df_tmp_view], [], false
display(affected_customers_tmp_view)
NameError: name 'df_tmp_view' is not defined
What am I doing wrong?
I don't understand the error saying that the name is not defined, although I defined it just one command above. Also, the SQL query is working and returning data... so what am I missing?
Thanks!
You need to get the global context of the view; for example, in your case:
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(spark.table(global_temp_db + "." + 'df_tmp_view'))
documentation
for example:
import pandas as pd

df_pd = pd.DataFrame(
    {
        'Name': [231232, 12312321, 3213231],
    }
)
df = spark.createDataFrame(df_pd)
df.createOrReplaceGlobalTempView('test_tmp_view')
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(spark.table(global_temp_db + "." + 'test_tmp_view'))
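Since a global temporary view is registered under the global temp database (global_temp by default), it can also be read with a fully qualified name, for example:
# Global temp views live in the reserved global_temp database,
# so a fully qualified name works as well.
display(spark.sql("SELECT * FROM global_temp.test_tmp_view"))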

I'm trying to use RMySQL to perform a search on imdb, but I don't seem to get the tables right

I'm new to RMySQL. I'm trying to do an assignment based on the code on this page: https://beanumber.github.io/sds192/lab-sql.html and got stuck after I ran the first few lines:
library(mdsr)
library(RMySQL)
db <- dbConnect_scidb(dbname = "imdb")
class(db)
db %>%
dbGetQuery("SELECT * FROM kind_type;")
Below is my output:
[1] "MySQLConnection"
attr(,"package")
[1] "RMySQL"
> db %>%
+ dbGetQuery("SELECT * FROM kind_type;")
Error in .local(conn, statement, ...) :
could not run statement: Table 'imdb.kind_type' doesn't exist
I also tried to list the tables in db and below is the output:
> dbListTables(db)
character(0)
I greatly appreciate help on this issue.

R - Using sqldf to query multiple values from one column in dataframe

I'm pretty new to R and trying to use the sqldf package to query a dataset. I have constructed the following query, which works perfectly and displays the correct data:
sqldf("select AreaName, TimePeriod, Value from df2 where Indicator == 'Obese children (Year 6)' AND AreaName == 'Barking and Dagenham'",
row.names = TRUE)
But I would like to pull the data for 'Richmond upon Thames' as well as Barking and Dagenham. I have tried this:
AND AreaName == 'Barking and Dagenham', 'Richmond upon Thames'
Which gives me the following error:
Error in sqliteSendQuery(con, statement, bind.data) : error in statement: near ",": syntax error
And I have also tried:
AND AreaName == 'Barking and Dagenham' AND AreaName == 'Richmond upon Thames'
This creates the new dataframe as expected, but when I view it, it is empty. I know it is not an issue with the name 'Richmond upon Thames', as I have entered this into the first statement by itself instead of 'Barking and Dagenham' and it works perfectly.
Could anybody help me with what the correct structure should be?
Many thanks