I've just started with Hive and I'm working on Databricks Community Edition. I usually write in Python, but I wanted to write something in SQL and got an error I cannot understand. I cannot see anything wrong in my code. Please help me.
spark.sql("create table happiness_perm as select * from happiness_tmp");
%sql
select Country, count(*) from happiness_perm group by Country
I tried using my DataFrame df_happiness instead of happiness_perm, and I still receive this:
Error in SQL statement: AnalysisException: Table or view not found: happiness_perm; line 1 pos 30;
'Aggregate ['Country], ['Country, unresolvedalias(count(1), None)]
+- 'UnresolvedRelation [happiness_perm], [], false
I would really appreciate your help!
Try this:
df = spark.sql("select * from happiness_tmp")
df.createOrReplaceTempView("happiness_perm")
First you load the data into a DataFrame, then you register that DataFrame as a temporary view. This is a session-scoped view, not a persistent table in the catalog.
You can then query the view from a %sql cell in the same notebook.
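You may want happiness_perm to outlive the session instead of being a temporary view. One way is to write the view's contents out with `DataFrameWriter.saveAsTable`. The sketch below makes these assumptions:

- `spark` is an active SparkSession.
- Overwriting an existing table with the same name is acceptable.
- The helper name `persist_view_as_table` is hypothetical.

```python
def persist_view_as_table(spark, source_view, target_table):
    """Copy a temporary view into a managed table (hypothetical helper).

    `spark` is assumed to be an active pyspark.sql.SparkSession.
    """
    df = spark.table(source_view)  # read the temp view as a DataFrame
    # mode("overwrite") replaces the target table if it already exists
    df.write.mode("overwrite").saveAsTable(target_table)
    return spark.table(target_table)  # the persisted table, visible to %sql cells
```

For example, call `persist_view_as_table(spark, "happiness_tmp", "happiness_perm")`. After that, `select Country, count(*) from happiness_perm group by Country` in a %sql cell should resolve the table.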
Related
I'm using a databricks notebook and I'd like to retrieve a dataframe from an SQL execution in Spark. I have:
statement = f""" USER {db}; SELECT * FROM {table}
"""
df = spark.sql(statement)
display(df)
However, unlike when I fire off the same statement in an SQL cell in the notebook, I get the following error:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'SELECT': extra input 'SELECT'(line 1...
Where am I going wrong?
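spark.sql() parses a single statement, so the semicolon-separated "USE ...; SELECT ..." string fails with the parse error above. A %sql cell, by contrast, splits its contents into separate statements. (Note also that the keyword is USE, not USER.) Either run USE as its own spark.sql() call, or qualify the table name and send only the SELECT.
I reproduced this in my environment with a sample table named Persons. Creating the DataFrame with a single statement works:
df = spark.sql("select * from Persons")
display(df)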
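Because spark.sql() accepts one statement per call, a script of several statements has to be sent one statement at a time. The sketch below makes these assumptions:

- `spark` is an active SparkSession.
- The helper `run_sql_script` is hypothetical.
- The naive split on ";" is acceptable. It will break on semicolons inside string literals or comments.

```python
def run_sql_script(spark, script):
    """Run semicolon-separated SQL statements one at a time (hypothetical helper).

    Returns the DataFrame produced by the last statement. The split on ';'
    is naive and does not handle semicolons inside quotes or comments.
    """
    statements = [s.strip() for s in script.split(";") if s.strip()]
    if not statements:
        raise ValueError("no SQL statements found")
    result = None
    for stmt in statements:
        result = spark.sql(stmt)  # each call receives exactly one statement
    return result
```

For example, `df = run_sql_script(spark, f"USE {db}; SELECT * FROM {table}")` followed by `display(df)`. Qualifying the table instead, as in `spark.sql(f"SELECT * FROM {db}.{table}")`, avoids the extra statement altogether.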
I would like to create a Temporary View from the results of a SQL Query - which sounds like a basic thing to do, but I just couldn't make it work and don't understand what is wrong.
This is my SQL query - which works fine and returns Col1.
%sql
SELECT
Col1
FROM
Table1
WHERE EXISTS (
select *
from TempView1)
I would like to write the results to another table that I can query, so I do this:
df = spark.sql("""
SELECT
Col1
FROM
Table1
WHERE EXISTS (
select *
from TempView1)""")
OK
df
Out[28]: DataFrame[Col1: bigint]
df.createOrReplaceTempView("df_tmp_view")
OK
%sql
select * from df_tmp_view
Error in SQL statement: AnalysisException: Table or view not found: df_tmp_view; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [df_tmp_view], [], false
display(df_tmp_view)
NameError: name 'df_tmp_view' is not defined
What am I doing wrong?
I don't understand why the name is reported as not defined, since I define it just one command above. The SQL query itself works and returns data, so what am I missing?
Thanks!
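df_tmp_view is the name of a view in the Spark catalog, not a Python variable. That is why display(df_tmp_view) raises NameError; read it with spark.table("df_tmp_view") instead. If the %sql cell still cannot see the view, register it as a global temporary view and read it through the global temp database. For example, in your case:
df.createOrReplaceGlobalTempView("df_tmp_view")
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(spark.table(global_temp_db + "." + "df_tmp_view"))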
See the Spark SQL documentation on global temporary views. A complete example:
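import pandas as pd

# Build a small pandas DataFrame and convert it to a Spark DataFrame
df_pd = pd.DataFrame(
    {
        'Name': [231232, 12312321, 3213231],
    }
)
df = spark.createDataFrame(df_pd)
# Global temp views are visible across sessions of the same Spark application
df.createOrReplaceGlobalTempView('test_tmp_view')
global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
display(spark.table(global_temp_db + "." + 'test_tmp_view'))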
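The global-temp-view lookup in the example can be wrapped in a small helper. It reads the database name from the spark.sql.globalTempDatabase setting rather than hard-coding global_temp. The sketch below makes these assumptions:

- `spark` is an active SparkSession.
- The view was registered with createOrReplaceGlobalTempView.
- The name `read_global_temp_view` is hypothetical.

```python
def read_global_temp_view(spark, view_name):
    """Return a global temporary view as a DataFrame (hypothetical helper)."""
    # Global temp views live in a system database (`global_temp` by default),
    # so they must be referenced with that database as a qualifier.
    global_temp_db = spark.conf.get("spark.sql.globalTempDatabase")
    return spark.table(f"{global_temp_db}.{view_name}")
```

In a %sql cell, the same view is queried as `select * from global_temp.test_tmp_view` when the default database name is in use.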
I am trying out a complex query in the Django shell:
qs.annotate(rn=Window(expression=RowNumber(), order_by=F('date').desc(), partition_by=[F('name')]))
This is failing with:
ProgrammingError: syntax error at or near "DESC"
LINE 1: ...ion"."storage_name", ROW_NUMBER() OVER (ORDER BY DESC) OVER...
I need to debug this. I would like to see the full SQL, before it is even sent to Postgres (since it is failing). How can I do this?
From a working queryset, I would simply do:
In [60]: qs = Consumption.objects.values('name')
In [61]: print(qs.query)
SELECT "consumption_consumption"."name" FROM "consumption_consumption"
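The same approach should work for the failing queryset, because compiling a queryset to SQL does not execute it. Postgres only rejects the statement when it is run.

The compiler's `as_sql()` returns the SQL with `%s` placeholders plus the parameters separately. This is more faithful than `print(qs.query)`, whose parameter interpolation is only approximate. The sketch below makes these assumptions:

- `qs` is the annotated queryset from the question.
- The helper `debug_sql` is hypothetical.

```python
def debug_sql(qs):
    """Return (sql, params) for a queryset without executing it (hypothetical helper)."""
    # Use the compiler for the queryset's own database alias.
    compiler = qs.query.get_compiler(using=qs.db)
    # as_sql() builds the SQL string (with %s placeholders) and the params tuple.
    return compiler.as_sql()
```

For example, run `sql, params = debug_sql(qs)` and then `print(sql)` and `print(params)`. This should show the OVER (ORDER BY ...) clause exactly as Django generated it.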
I'm new to RMySQL. I'm trying to do an assignment based on the code in this page: https://beanumber.github.io/sds192/lab-sql.html and got stuck after I ran the first few lines:
library(mdsr)
library(RMySQL)
db <- dbConnect_scidb(dbname = "imdb")
class(db)
db %>%
dbGetQuery("SELECT * FROM kind_type;")
Below is my output:
[1] "MySQLConnection"
attr(,"package")
[1] "RMySQL"
> db %>%
+ dbGetQuery("SELECT * FROM kind_type;")
Error in .local(conn, statement, ...) :
could not run statement: Table 'imdb.kind_type' doesn't exist
I also tried to list the tables in db and below is the output:
> dbListTables(db)
character(0)
I would greatly appreciate help with this issue.
I am trying to rewrite a Spark SQL query as a DataFrame transformation using groupBy and agg. Below is the original Spark SQL query.
result = spark.sql(
"select date, Full_Subcategory, Budget_Type, SUM(measure_value) AS planned_sales_inputs FROM lookups GROUP BY date, Budget_Type, Full_Subcategory")
Below is the DataFrame transformation I am trying:
df_lookups.groupBy('Full_Subcategory','Budget_Type','date').agg(col('measure_value'),sum('measure_value')).show()
But I keep getting the error below:
Py4JJavaError: An error occurred while calling o2475.agg.
: org.apache.spark.sql.AnalysisException: cannot resolve '`measure_value`' given input columns: [Full_Subcategory, Budget_Type, date];;
'Aggregate [Full_Subcategory#278, Budget_Type#279, date#413], [Full_Subcategory#278, Budget_Type#279, date#413, 'measure_value, sum('measure_value) AS sum(measure_value)#16168]
I am pretty sure this has something to do with the grouping columns and which columns appear in the select clause.
Kindly help.
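I think it's because you pass col('measure_value') to agg, and that is not an aggregation. After groupBy, each output row represents a whole group. A bare column that is neither grouped nor aggregated therefore has no single value per row, which is what the "cannot resolve 'measure_value'" error is saying.
Just remove col('measure_value') from agg and you will get the right result. Adding an alias reproduces the column name from the SQL version (with sum imported from pyspark.sql.functions):
df_lookups.groupBy('date', 'Full_Subcategory', 'Budget_Type').agg(sum('measure_value').alias('planned_sales_inputs')).show()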
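Plain Python can show why a bare measure_value cannot sit next to the aggregate. This is not Spark; `group_sum` is a hypothetical illustration of what GROUP BY ... SUM computes.

Every row in a group is folded into one sum, and only the grouping keys and the aggregate survive in the output. The raw measure_value of an individual row has nowhere to go.

```python
from collections import defaultdict

def group_sum(rows, keys, value, alias="planned_sales_inputs"):
    """Mimic SELECT keys..., SUM(value) AS alias ... GROUP BY keys (illustration only)."""
    totals = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)  # grouping key for this row
        totals[group] += row[value]          # all rows of a group fold into one sum
    # One output row per group: the grouping keys plus the aggregate, nothing else.
    return [dict(zip(keys, group), **{alias: total})
            for group, total in totals.items()]
```

For example, `group_sum(rows, ["date", "Full_Subcategory", "Budget_Type"], "measure_value")` returns one dict per distinct (date, Full_Subcategory, Budget_Type) combination. This mirrors one output row per group in the Spark result.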