val martial = sqlContext.sql("select martial, count(*) as number from marketing1 where y='yes'group by martial order by number desc ")
Error:
org.apache.spark.sql.AnalysisException: cannot resolve 'martial' given input columns: [default, balance, education, duration, previous, age, loan, contact, campaign, poutcome, job, pdays, housing, y, marital, day, month]; line 1 pos 95
Why can't the martial column be resolved?
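The query misspells the column: the list of input columns in the error contains marital, not martial. Use the exact column name; if a name contains special characters, wrap it in backticks, as in `column-name`.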
I have the following tables:
Students(id, name, surname)
Courses(course id)
Course_Signup(id, student_id, course_id, year)
Grades(signup_id, mark)
I want to display all the students (id, name, surname) with their final grade (where final grade = avg of the grades of all courses), but only for the students that have passed all the courses they signed up for in the current year.
This is what I tried:
SELECT s."id", s."name", s."surname", AVG(g."mark") AS "finalGrade"
FROM "STUDENT" s,
"course sign-up" csn
join "GRADES" g
on csn."id" = g."signup_id"
WHERE csn."year" >= '01-01-2022'
HAVING "finalGrade" >= 5.00
GROUP BY s."id"
However, after adding the last 2 lines, regarding the finalGrade condition, I get an invalid identifier error. Why is that?
Uh, oh. Did you really create the tables with lowercase names enclosed in double quotes? If so, get rid of them (the sooner, the better) because they only cause problems.
Apart from that, use joins consistently: in your FROM clause the student table isn't joined to any other table, which results in a cross join.
Don't compare dates to strings; use a date literal (as I did), or the to_date function with an appropriate format model.
As for the error you got: you can't reference an expression's alias ("finalGrade") as-is in the HAVING clause; use the whole expression instead.
Also, GROUP BY should contain all non-aggregated columns from the select column list.
This "fixes" the error you got, but I suggest you consider everything I said:
SELECT s."id", s."name", s."surname", AVG(g."mark") AS "finalGrade"
FROM "STUDENT" s,
"course sign-up" csn
join "GRADES" g
on csn."id" = g."signup_id"
WHERE csn."year" >= date '2022-01-01'
GROUP BY s."id", s."name", s."surname"
HAVING AVG(g."mark") >= 5.00
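The advice above (explicit joins everywhere, unquoted identifiers, every non-aggregated column in GROUP BY, the full expression in HAVING) can be sketched as follows. This is only an illustration run against SQLite through Python's sqlite3 module, not the asker's database: the table names students, course_signup and grades, the sample rows, and the ISO-8601 text dates are all assumptions made for the demo.

```python
import sqlite3

# In-memory stand-in database; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE course_signup (id INTEGER PRIMARY KEY, student_id INTEGER,
                            course_id INTEGER, year TEXT);
CREATE TABLE grades (signup_id INTEGER, mark REAL);
INSERT INTO students VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO course_signup VALUES
    (10, 1, 100, '2022-02-01'), (11, 1, 101, '2022-03-01'),
    (12, 2, 100, '2022-02-01'), (13, 2, 101, '2021-03-01');
INSERT INTO grades VALUES (10, 6.0), (11, 8.0), (12, 4.0), (13, 9.0);
""")

# Every table is joined explicitly, so there is no accidental cross join.
# SQLite has no DATE type; ISO-8601 text compares correctly as a string here.
# On a database with real dates, use a date literal as in the answer.
query = """
SELECT s.id, s.name, s.surname, AVG(g.mark) AS final_grade
FROM students s
JOIN course_signup csn ON csn.student_id = s.id
JOIN grades g ON g.signup_id = csn.id
WHERE csn.year >= '2022-01-01'
GROUP BY s.id, s.name, s.surname
HAVING AVG(g.mark) >= 5.00
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data only Ann is returned: Bob's single 2022 grade averages below 5, and his 2021 sign-up is filtered out by the WHERE clause.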
I'm trying to do a simple count in PySpark programmatically but keep running into errors. .count() works at the end of the statement if I drop AS (count(city)), but I need the count to appear inside the query, not on the outside.
result = spark.sql("SELECT city AS (count(city)) AND business_id FROM business WHERE city = 'Reading'")
One of many errors
Py4JJavaError: An error occurred while calling o24.sql.
: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '(' expecting ')'(line 1, pos 21)
== SQL ==
SELECT city AS (count(city)) AND business_id FROM business WHERE city = 'Reading'
---------------------^^^
Your syntax is incorrect. Maybe you want to do this instead:
result = spark.sql("""
SELECT
count(city) over(partition by city),
business_id
FROM business
WHERE city = 'Reading'
""")
You need to provide a window if you use count without group by. In this case, you probably want a count for each city.
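To see what the windowed count returns, here is a small sketch that runs the same shape of query against SQLite via Python's sqlite3 module instead of Spark. The business rows are made up for the illustration, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (business_id TEXT, city TEXT);
INSERT INTO business VALUES ('b1', 'Reading'), ('b2', 'Reading'), ('b3', 'Luton');
""")

# The window repeats the per-city count on every row instead of collapsing rows.
windowed = conn.execute("""
SELECT COUNT(city) OVER (PARTITION BY city) AS city_count, business_id
FROM business
WHERE city = 'Reading'
ORDER BY business_id
""").fetchall()
print(windowed)
```

Each matching row keeps its business_id and carries the count for its city alongside it, which is the difference from a plain count(*) that returns a single row.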
Just my solution to the problem I'm trying to solve. The solution above is where I would like to be at.
result = spark.sql("SELECT count(*) FROM business WHERE city='Reading'")
Using relevant Hive DML statements and summary functions to generate reports that summarise the data.
year,town,taxi_co2,bus_co2
2013,luton,1,1
2013,manchester,3,2
2013,london,2,1
2014,luton,1,3
2014,london,3,1
2015,luton,4,1
2014,manchester,6,7
2016,london,2,2
2015,luton,4,1
2015,manchester,1,8
2014,london,3,1
2015,luton,3,1
2015,manchester,1,8
2015,london,3,1
2016,luton,6,5
2016,manchester,4,2
2016,london,3,2
2015,luton,4,1
2013,luton,1,2
2015,london,7,8
2013,manchester,3,2
2015,manchester,1,8
2015,london,7,8
The result I want is to filter for the year 2013 only, and then show the total CO2 per town plus a horizontal total:
town, total taxi co2, total bus co2, total (both taxi and bus)
luton, x, x, x
manchester, x, x, x
london, x, x, x
I have tried the HQL below, but I can't work out how to complete it or whether it is even correct. I'm not getting the desired result. :)
SELECT town,
sum(taxi_co2) AS Taxi,
sum(bus_co2) AS Bus
FROM <table>
WHERE year == '2013'
GROUP BY town;
SELECT town,
sum(taxi_co2) as Taxi,
sum(bus_co2) as Bus,
sum(taxi_co2)+sum(bus_co2) as Total
FROM <table>
WHERE year = '2013'
GROUP BY town;
If sum() for some town can be NULL, use NVL() to convert to 0:
nvl(sum(taxi_co2),0)+nvl(sum(bus_co2),0) as Total
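As a quick check of the query above, this sketch loads the 2013 rows from the sample data (plus one 2014 row to exercise the filter) into SQLite through Python's sqlite3 module. It is a stand-in for Hive, so two things are assumptions: the table name emissions replaces the <table> placeholder, and COALESCE is used where the answer uses NVL, because SQLite has no NVL function.

```python
import sqlite3

# The five 2013 rows from the question's sample data, plus one 2014 row.
sample = [
    (2013, "luton", 1, 1), (2013, "manchester", 3, 2), (2013, "london", 2, 1),
    (2013, "luton", 1, 2), (2013, "manchester", 3, 2), (2014, "luton", 1, 3),
]

conn = sqlite3.connect(":memory:")
# "emissions" is a hypothetical table name.
conn.execute(
    "CREATE TABLE emissions (year INTEGER, town TEXT, taxi_co2 INTEGER, bus_co2 INTEGER)"
)
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?, ?)", sample)

# COALESCE turns a NULL sum into 0 before adding, like NVL in the answer.
totals = conn.execute("""
SELECT town,
       SUM(taxi_co2) AS taxi,
       SUM(bus_co2) AS bus,
       COALESCE(SUM(taxi_co2), 0) + COALESCE(SUM(bus_co2), 0) AS total
FROM emissions
WHERE year = 2013
GROUP BY town
ORDER BY town
""").fetchall()

for row in totals:
    print(row)
```

The ORDER BY is only there to make the printed order predictable; the 2014 row does not contribute to any total.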
I have different mobile types in a table: types like iPhone, android, windows, etc. I want to get the individual counts of each type using a single query. I used the query below to get the count of one type.
select type,
count(1) AS total
from mobile_types where type = 'iPhone'
group by type;
I got the required output using this for one type.
iPhone 1000
But when I try it for multiple types I get an error. I used the following for multiple types.
select type,
count(1) AS total
from mobile_types where type = 'iPhone'
from mobile_types where type = 'windows'
group by type;
The error I got was "ParseException line 5:0 missing EOF at 'from' near ''iPhone''".
And is there a way to get the output in the format below, with the types as column names and the counts in a row underneath?
|iPhone|windows|android|
|1000  |1500   |900    |
UPDATE
I was able to get the individual counts using the script below.
select type,
count(1) AS total
from mobile_types where type = 'iPhone' OR type = 'android' OR type = 'windows'
group by type;
But I still need the output format mentioned above. The current output format is:
iphone 1000
android 900
windows 1500
Any suggestions?
To get the output format you're looking for:
select
sum(if(type = 'iphone', 1, 0)) as n_iphone,
sum(if(type = 'android', 1, 0)) as n_android,
sum(if(type = 'windows', 1, 0)) as n_windows
from mobile_types;
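The conditional-sum pivot above can be tried out with this small sketch, run against SQLite via Python's sqlite3 module rather than Hive. The rows are invented for the demo, and CASE WHEN stands in for Hive's if(), which older SQLite versions lack. Note that the comparison in this sketch is case-sensitive, so each literal has to match the spelling stored in the table.

```python
import sqlite3

# Hypothetical rows: two iPhone, one android, three windows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mobile_types (type TEXT);
INSERT INTO mobile_types VALUES
    ('iPhone'), ('iPhone'), ('android'), ('windows'), ('windows'), ('windows');
""")

# One pass over the table; each SUM counts only the rows matching its type.
row = conn.execute("""
SELECT
    SUM(CASE WHEN type = 'iPhone'  THEN 1 ELSE 0 END) AS n_iphone,
    SUM(CASE WHEN type = 'android' THEN 1 ELSE 0 END) AS n_android,
    SUM(CASE WHEN type = 'windows' THEN 1 ELSE 0 END) AS n_windows
FROM mobile_types
""").fetchone()
print(row)
```

The result is a single row with one column per type, which is the layout the question asks for.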
You don't need the where clause to get the type. This will show a list of all types with the count of each in the total column.
select type,
count(*) AS total
from mobile_types
group by type;
AFTER COMMENT
SELECT type,
COUNT(*) AS total
FROM mobile_types
WHERE type IN ('IPhone','Android', 'Other Phone Names')
GROUP BY type;
The first table, DailyOil, has the fields DayofMonth, Month, Year, EXPL, AB,…
The second table, HPower, has the fields HourEnding, Day, Month, Year, MWh1, MWh2, ….
I want to create a new table with HourEnding, Day, Month, Year, MWh1, MWh2, EXPL, AB.
Note that the second table has an additional time field, so it is 24 times longer than the oil table.
The R code:
library(sqldf)
df4 <- sqldf("SELECT HP.Month, HP.Day, HP.Year, HP.Express_Avg, HP.Platte_Avg, HP.Full_Avg, HP.CasperToGurley_Avg, HP.OgallallatoEthlyn_Avg, OD.EXPL, OD.PLATTE, OD.CASPERtoGUERNSEY
FROM HPower HP
LEFT JOIN DailyOil OD
on HP.Day = OD.DayofMonth and HP.Month = OD.Month and HP.Year = OD.Year")
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "FROM": syntax error)