Error in SQL statement: AnalysisException: Table or view not found

I've just started with Hive. I'm working on Databricks Community Edition. I usually write Python, but I wanted to write something in SQL and there is an error I cannot understand. I cannot see anything wrong in my code. Please help me.
spark.sql("create table happiness_perm as select * from happiness_tmp");
%sql
select Country, count(*) from happiness_perm group by Country
I tried using my DataFrame df_happiness instead of happiness_perm and I still receive this:
Error in SQL statement: AnalysisException: Table or view not found: happiness_perm; line 1 pos 30;
'Aggregate ['Country], ['Country, unresolvedalias(count(1), None)]
+- 'UnresolvedRelation [happiness_perm], [], false
I would really appreciate your help!

Try this:
df = spark.sql("select * from happiness_tmp")
df.createOrReplaceTempView("happiness_perm")
First you get your data into a DataFrame, then you register the DataFrame as a temporary view in the current Spark session.
You can then query that view with SQL.
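For example, here is a minimal end-to-end sketch (assuming the view happiness_tmp already exists in the same notebook session; the column name Country is taken from the question):

df = spark.sql("select * from happiness_tmp")
df.createOrReplaceTempView("happiness_perm")

# The temp view is now visible to SQL in this session, whether issued
# via spark.sql(...) or a %sql cell.
spark.sql(
    "select Country, count(*) as cnt from happiness_perm group by Country"
).show()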

Related

Retrieve df from spark.sql: [PARSE_SYNTAX_ERROR] Syntax error at or near 'SELECT'

I'm using a Databricks notebook and I'd like to retrieve a DataFrame from a SQL execution in Spark. I have:
statement = f""" USER {db}; SELECT * FROM {table}
"""
df = spark.sql(statement)
display(df)
However, unlike when I fire off the same statement in an SQL cell in the notebook, I get the following error:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'SELECT': extra input 'SELECT'(line 1...
Where am I going wrong?
I tried to reproduce the same in my environment and got the results below.
This is my sample demo table, Persons.
Create a DataFrame by using this code:
df = sqlContext.sql("select * from Persons")
display(df)
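As a hedged note beyond the answer above: spark.sql() executes a single statement at a time, and USER in the question looks like a typo for USE, so the combined string cannot be parsed. A sketch of one way around this, reusing the db and table placeholders from the question:

# Run the statements one at a time; spark.sql() does not accept
# multiple semicolon-separated statements.
spark.sql(f"USE {db}")                    # switch the current database first
df = spark.sql(f"SELECT * FROM {table}")  # then run the query on its own
display(df)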

Writing results of SQL query to Temp View in Databricks

I would like to create a Temporary View from the results of a SQL Query - which sounds like a basic thing to do, but I just couldn't make it work and don't understand what is wrong.
This is my SQL query - which works fine and returns Col1.
%sql
SELECT
Col1
FROM
Table1
WHERE EXISTS (
select *
from TempView1)
I would like to write the results to another table which I can query. Therefore I do this:
df = spark.sql("""
SELECT
Col1
FROM
Table1
WHERE EXISTS (
select *
from TempView1)""")
OK
df
Out[28]: DataFrame[Col1: bigint]
df.createOrReplaceTempView("df_tmp_view")
OK
%sql
select * from df_tmp_view
Error in SQL statement: AnalysisException: Table or view not found: df_tmp_view; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [df_tmp_view], [], false
display(affected_customers_tmp_view)
NameError: name 'df_tmp_view' is not defined
What am I doing wrong?
I don't understand the error saying that the name is not defined, although I defined it just one command above. Also, the SQL query is working and returning data... so what am I missing?
Thanks !
You need to query the view through the global temp database (this requires creating the view with createOrReplaceGlobalTempView rather than createOrReplaceTempView). For example, in your case:
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(spark.table(global_temp_db + "." + 'df_tmp_view'))
documentation
For example:
import pandas as pd

df_pd = pd.DataFrame(
    {
        'Name': [231232, 12312321, 3213231],
    }
)
df = spark.createDataFrame(df_pd)
df.createOrReplaceGlobalTempView('test_tmp_view')
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(spark.table(global_temp_db + "." + 'test_tmp_view'))
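A possible follow-up, as a sketch: global temporary views are registered in the global_temp database, so the same view can also be read back with plain SQL from the notebook (the view name test_tmp_view comes from the example above):

# Query the global temp view through its qualified name.
spark.sql("select * from global_temp.test_tmp_view").show()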

How to dump full SQL of failing Django queryset?

I am trying out a complex query in the Django shell:
qs.annotate(rn=Window(expression=RowNumber(), order_by=F('date').desc(), partition_by=[F('name')]))
This is failing with:
ProgrammingError: syntax error at or near "DESC"
LINE 1: ...ion"."storage_name", ROW_NUMBER() OVER (ORDER BY DESC) OVER...
I need to debug this. I would like to see the full SQL, before it is even sent to Postgres (since it is failing). How can I do this?
From a working queryset, I would simply do:
In [60]: qs = Consumption.objects.values('name')
In [61]: print(qs.query)
SELECT "consumption_consumption"."name" FROM "consumption_consumption"

I'm trying to use RMySQL to perform a search on imdb, but it does not seem to get the tables right

I'm new to RMySQL. I'm trying to do an assignment based on the code on this page: https://beanumber.github.io/sds192/lab-sql.html and got stuck after I ran the first few lines:
library(mdsr)
library(RMySQL)
db <- dbConnect_scidb(dbname = "imdb")
class(db)
db %>%
dbGetQuery("SELECT * FROM kind_type;")
Below is my output:
[1] "MySQLConnection"
attr(,"package")
[1] "RMySQL"
> db %>%
+ dbGetQuery("SELECT * FROM kind_type;")
Error in .local(conn, statement, ...) :
could not run statement: Table 'imdb.kind_type' doesn't exist
I also tried to list the tables in db and below is the output:
> dbListTables(db)
character(0)
I greatly appreciate help on this issue.

Converting a Spark SQL query to DataFrame transformations

I am trying to rewrite a Spark SQL query as a DataFrame transformation using groupBy and aggregate. Below is the original Spark SQL query.
result = spark.sql(
"select date, Full_Subcategory, Budget_Type, SUM(measure_value) AS planned_sales_inputs FROM lookups GROUP BY date, Budget_Type, Full_Subcategory")
Below is the DataFrame transformation that I am trying to do.
df_lookups.groupBy('Full_Subcategory','Budget_Type','date').agg(col('measure_value'),sum('measure_value')).show()
But I keep getting the error below.
Py4JJavaError: An error occurred while calling o2475.agg.
: org.apache.spark.sql.AnalysisException: cannot resolve '`measure_value`' given input columns: [Full_Subcategory, Budget_Type, date];;
'Aggregate [Full_Subcategory#278, Budget_Type#279, date#413], [Full_Subcategory#278, Budget_Type#279, date#413, 'measure_value, sum('measure_value) AS sum(measure_value)#16168]
I am pretty sure this has something to do with grouping by columns and those columns being present in the select clause.
Kindly help.
I think it's because you are passing col('measure_value') inside the agg function, which does not make sense, because you are not aggregating that column in any way.
Just remove col('measure_value') from agg and you will get the right result.
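For example, a sketch of the corrected transformation (assuming df_lookups contains all the columns used in the original Spark SQL query):

from pyspark.sql import functions as F

result_df = (
    df_lookups
    .groupBy('date', 'Full_Subcategory', 'Budget_Type')
    .agg(F.sum('measure_value').alias('planned_sales_inputs'))
)
result_df.show()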