Counting number in Spark SQL - apache-spark-sql

I'm new to Spark SQL and my task is to count the number of airports by state. I have prepared my code, but after execution the error below appears. Could you please point out what is wrong here?
result_number_of_airports = spark.sql(
"SELECT state, COUNT(DISTINCT airport) AS number_of_airports \
FROM airports \
GROUP BY state \
ORDER BY number_of_airports DESC")
Schema
airports:pyspark.sql.dataframe.DataFrame
IATA_CODE:string
AIRPORT:string
CITY:string
STATE:string
COUNTRY:string
LATITUDE:string
LONGITUDE:string
Error message
AnalysisException: cannot resolve '`state`' given input columns: [airports.ActualElapsedTime, airports.AirTime, airports.AirlineID, airports.ArrDel15, airports.ArrDelay, airports.ArrDelayMinutes, airports.ArrTime, airports.ArrTimeBlk, airports.ArrivalDelayGroups, airports.CRSArrTime, airports.CRSDepTime, airports.CRSElapsedTime, airports.CancellationCode, airports.Cancelled, airports.Carrier, airports.CarrierDelay, airports.DayOfWeek, airports.DayofMonth, airports.DepDel15, airports.DepDelay, airports.DepDelayMinutes, airports.DepTime, airports.DepTimeBlk, airports.DepartureDelayGroups, airports.Dest, airports.DestAirportID, airports.DestAirportSeqID, airports.DestCityMarketID, airports.DestCityName, airports.DestState, airports.DestStateFips, airports.DestStateName, airports.DestWac, airports.Distance, airports.DistanceGroup, airports.Div1Airport, airports.Div1AirportID, airports.Div1AirportSeqID, airports.Div1LongestGTime, airports.Div1TailNum, airports.Div1TotalGTime, airports.Div1WheelsOff, airports.Div1WheelsOn, airports.Div2Airport, airports.Div2AirportID, airports.Div2AirportSeqID, airports.Div2LongestGTime, airports.Div2TailNum, airports.Div2TotalGTime, airports.Div2WheelsOff, airports.Div2WheelsOn, airports.Div3Airport, airports.Div3AirportID, airports.Div3AirportSeqID, airports.Div3LongestGTime, airports.Div3TailNum, airports.Div3TotalGTime, airports.Div3WheelsOff, airports.Div3WheelsOn, airports.Div4Airport, airports.Div4AirportID, airports.Div4AirportSeqID, airports.Div4LongestGTime, airports.Div4TailNum, airports.Div4TotalGTime, airports.Div4WheelsOff, airports.Div4WheelsOn, airports.Div5Airport, airports.Div5AirportID, airports.Div5AirportSeqID, airports.Div5LongestGTime, airports.Div5TailNum, airports.Div5TotalGTime, airports.Div5WheelsOff, airports.Div5WheelsOn, airports.DivActualElapsedTime, airports.DivAirportLandings, airports.DivArrDelay, airports.DivDistance, airports.DivReachedDest, airports.Diverted, airports.FirstDepTime, 
airports.FlightDate, airports.FlightNum, airports.Flights, airports.LateAircraftDelay, airports.LongestAddGTime, airports.Month, airports.NASDelay, airports.Origin, airports.OriginAirportID, airports.OriginAirportSeqID, airports.OriginCityMarketID, airports.OriginCityName, airports.OriginState, airports.OriginStateFips, airports.OriginStateName, airports.OriginWac, airports.Quarter, airports.SecurityDelay, airports.TailNum, airports.TaxiIn, airports.TaxiOut, airports.TotalAddGTime, airports.UniqueCarrier, airports.WeatherDelay, airports.WheelsOff, airports.WheelsOn, airports.Year, airports._c109]; line 1 pos 87;

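The temp views were not assigned correctly: the input columns listed in the error are flight columns, which means the airports view was registered from the flights data.
I have changed from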
temp_table_name = "flights"
flights.createOrReplaceTempView(temp_table_name)
temp_table_name = "airlines"
airlines.createOrReplaceTempView(temp_table_name)
temp_table_name = "airports"
airports.createOrReplaceTempView(temp_table_name)
to
temp_table_name1 = "flights"
flights.createOrReplaceTempView(temp_table_name1)
temp_table_name2 = "airlines"
airlines.createOrReplaceTempView(temp_table_name2)
temp_table_name3 = "airports"
airports.createOrReplaceTempView(temp_table_name3)
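One way to catch this kind of mix-up before running the query is to compare the columns a view actually exposes with the columns the query needs. The sketch below is plain Python and does not start Spark; the helper name missing_columns is hypothetical, and the two column lists are abbreviated from the schema and the error message above. With a live session, the available list could come from spark.table("airports").columns.

```python
def missing_columns(available, required):
    """Return the required column names that are absent from `available`.

    The comparison ignores case, which matches Spark SQL's default
    (case-insensitive) column resolution.
    """
    present = {name.lower() for name in available}
    return [name for name in required if name.lower() not in present]


# Columns from the airports schema shown in the question.
airports_columns = ["IATA_CODE", "AIRPORT", "CITY", "STATE",
                    "COUNTRY", "LATITUDE", "LONGITUDE"]
# A few of the columns listed in the AnalysisException (flight data).
wrong_view_columns = ["AirTime", "Carrier", "DestState", "OriginState", "Year"]

# Columns the query in the question refers to.
needed = ["state", "airport"]

print(missing_columns(airports_columns, needed))    # []
print(missing_columns(wrong_view_columns, needed))  # ['state', 'airport']
```

An empty list means the view has everything the query references; a non-empty list points at a view that was registered from the wrong DataFrame.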

Related

Cannot resolve logical: syntax error for sql

Not sure how to correct this logical syntax error, help would be appreciated!
Traceback (most recent call last):
File "c:\Users\M\Desktop\Coding\Course4wk4sql.py", line 7, in <module>
cur.executescript('''
sqlite3.OperationalError: near "#logical": syntax error
PS C:\Users\M\Desktop\Coding> sqlite3.OperationalError: near "#logical": syntax error
Here is the code:
import json
import sqlite3
conn = sqlite3.connect ('rosterdb.sqlite')
cur = conn.cursor()
cur.executescript('''
DROP TABLE IF EXISTS User;
DROP TABLE IF EXISTS Member;
DROP TABLE IF EXISTS Course;
CREATE TABLE User (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
name TEXT UNIQUE #logical key
);
CREATE TABLE Course (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
title TEXT UNIQUE
);
CREATE TABLE Member (
user_id INTEGER,
course_id INTEGER,
role INTEGER,
PRIMARY KEY (user_id, course_id) #Going to force combination of these two to be unique
)
''')
filename = "roster_data.json"
jsondata = open(filename)
data = json.load(jsondata)
for entry in data:
    user = entry[0]
    course = entry[1]
    instructor = entry[2]
    user_statement = """INSERT OR IGNORE INTO User(name) VALUES ( ? )"""
    SQLparams = (user, )
    cur.execute(user_statement, SQLparams)
    course_statement = """INSERT OR IGNORE INTO Course(title) VALUES ( ? )"""
    SQLparams = (course, )
    cur.execute(course_statement, SQLparams)
    courseID_statement = """SELECT id FROM Course WHERE title = ?"""
    SQLparams = (course, )
    cur.execute(courseID_statement, SQLparams)
    courseID = cur.fetchone()[0]
    userID_statement = """SELECT id FROM User WHERE name = ?"""
    SQLparams = (user, )
    cur.execute(userID_statement, SQLparams)
    userID = cur.fetchone()[0]
    member_statement = """INSERT INTO Member(user_id, course_id, role)
    VALUES(?, ?, ?)"""
    SQLparams = (userID, courseID, instructor)
    cur.execute(member_statement, SQLparams)
conn.commit()
test_statement = """
SELECT hex(User.name || Course.title || Member.role ) AS X FROM
User JOIN Member JOIN Course
ON User.id = Member.user_id AND Member.course_id = Course.id
ORDER BY X
"""
cur.execute(test_statement)
result = cur.fetchone()
print("RESULT: " + str(result))
#Closing the connection
cur.close()
conn.close()
You are using non-SQL comments, e.g. #logical key, inside the SQL script. # starts a comment in Python, but SQLite uses -- for single-line comments and /* ... */ for multi-line comments.
That is why you get a syntax error: everything inside the triple-quoted string is sent to SQLite, which does not treat # as a comment marker. Either remove the Python-style comments or replace them with SQLite-style comments.
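A minimal sketch of the difference, assuming an in-memory database and a made-up user name ("Alice") instead of rosterdb.sqlite and the roster data from the question. Only two of the tables are recreated here, with the comments rewritten in SQLite syntax.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A Python-style comment inside the SQL text is rejected by SQLite.
try:
    cur.executescript("CREATE TABLE Broken (name TEXT UNIQUE #logical key\n);")
    hash_comment_failed = False
except sqlite3.OperationalError:
    hash_comment_failed = True

# The same kind of schema with SQLite comments: -- and /* ... */.
cur.executescript('''
CREATE TABLE User (
    id   INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name TEXT UNIQUE  -- logical key
);
CREATE TABLE Member (
    user_id   INTEGER,
    course_id INTEGER,
    role      INTEGER,
    /* the combination of these two columns must be unique */
    PRIMARY KEY (user_id, course_id)
);
''')

# Python comments are fine here because they are outside the SQL string.
cur.execute("INSERT OR IGNORE INTO User(name) VALUES ( ? )", ("Alice",))
cur.execute("INSERT OR IGNORE INTO User(name) VALUES ( ? )", ("Alice",))
cur.execute("SELECT COUNT(*) FROM User")
user_count = cur.fetchone()[0]

print(hash_comment_failed)  # True
print(user_count)           # 1, the duplicate insert is ignored
conn.close()
```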

SQL query to fetch count on the association tables

I'm trying to write the following SQL Server query, where I also need to fetch the count of VehicleId values that are referenced in each child table:
select
BV.[BaseVehicleId], BV.[MakeId], MK.[MakeName], BV.[ModelId],
MD.[ModelName], V.[VehicleId], V.[SubModelId], SubMD.[SubModelName],
V.[RegionId], V.PublicationStageId,
V.[LastUpdateDate], V.[InsertDate], V2BedCount, V2BodyCount,
V2BrakeCount, V2DriveCount, V2EngineCount,V2MfrCount, V2SpringCount,
V2SteeringCount, V2TransmissionCount, V2WheelCount
from
[dbo].[BaseVehicle] BV,
[dbo].[Make] MK,
[dbo].[Model] MD,
[dbo].[SubModel] SubMD,
[dbo].[VehicleToBedConfig] V2Bed,
[dbo].[VehicleToBodyStyleConfig] V2Body,
[dbo].[VehicleToBrakeConfig] V2Brake,
[dbo].[VehicleToDriveType] V2Drive,
[dbo].[VehicleToEngineConfig] V2Engine,
[dbo].[VehicleToMfrBodyCode] V2Mfr,
[dbo].[VehicleToSpringTypeConfig] V2Spring,
[dbo].[VehicleToSteeringConfig] V2Steering,
[dbo].[VehicleToTransmission] V2Transmission,
[dbo].[VehicleToWheelBase] V2Wheel,
[dbo].[Vehicle] V
where
V.[PublicationStageId] = '4'
and V.[DeleteDate] IS NULL
and BV.[BaseVehicleId] = V.[BaseVehicleId]
and MK.[MakeId] = BV.[MakeId]
and MD.[ModelId] = BV.[ModelId]
and V.[SubModelId] = SubMD.[SubModelId]
and V.[VehicleId] = V2Bed.[VehicleId]
and V.[VehicleId] = V2Body.[VehicleId]
and V.[VehicleId] = V2Brake.[VehicleId]
and V.[VehicleId] = V2Drive.[VehicleId]
and V.[VehicleId] = V2Engine.[VehicleId]
and V.[VehicleId] = V2Mfr.[VehicleId]
and V.[VehicleId] = V2Spring.[VehicleId]
and V.[VehicleId] = V2Steering.[VehicleId]
and V.[VehicleId] = V2Transmission.[VehicleId]
and V.[VehicleId] = V2Wheel.[VehicleId]
I'm looking for a way to push the details on these columns from the above query:
V2BedCount, V2BodyCount, V2BrakeCount,
V2DriveCount, V2EngineCount,
V2MfrCount, V2SpringCount, V2SteeringCount, V2TransmissionCount,
V2WheelCount
Here V2BedCount is the count of vehicle IDs that are mapped in the VehicleToBedConfig table, like
select COUNT(VehicleId) V2BedCount
from VehicleToBedConfig
group by VehicleId
Please let me know how to combine the second query with the first, so that one query populates the count details for all these columns:
V2BedCount, V2BodyCount, V2BrakeCount,
V2DriveCount, V2EngineCount,
V2MfrCount, V2SpringCount, V2SteeringCount, V2TransmissionCount,
V2WheelCount
Assuming Id is the primary key of VehicleToBedConfig, try
COUNT(DISTINCT V2Bed.Id) OVER(PARTITION BY V.VehicleId) V2BedCount,
...
I advise you to use explicit JOIN syntax.
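As far as I know, SQL Server does not accept DISTINCT inside a windowed aggregate (COUNT(DISTINCT ...) OVER ...), so an alternative is to count each child table in its own grouped subquery and join the results to Vehicle with explicit JOINs. Joining every child table directly would multiply the rows, which is why a plain COUNT in the original query would overcount. The sketch below shows the idea with Python's sqlite3 module rather than SQL Server; the table contents are made up, only two of the child tables are included, and LEFT JOIN is an assumption that vehicles without child rows should appear with a count of 0 (the original inner joins would drop them).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny stand-ins for Vehicle and two of its child tables (sample data is invented).
cur.executescript('''
CREATE TABLE Vehicle (VehicleId INTEGER PRIMARY KEY);
CREATE TABLE VehicleToBedConfig (Id INTEGER PRIMARY KEY, VehicleId INTEGER);
CREATE TABLE VehicleToBodyStyleConfig (Id INTEGER PRIMARY KEY, VehicleId INTEGER);

INSERT INTO Vehicle VALUES (1), (2);
INSERT INTO VehicleToBedConfig (VehicleId) VALUES (1), (1), (2);
INSERT INTO VehicleToBodyStyleConfig (VehicleId) VALUES (1);
''')

# Each child table is aggregated once, then joined on VehicleId,
# so the counts cannot inflate each other.
cur.execute('''
SELECT V.VehicleId,
       COALESCE(Bed.Cnt, 0)  AS V2BedCount,
       COALESCE(Body.Cnt, 0) AS V2BodyCount
FROM Vehicle V
LEFT JOIN (SELECT VehicleId, COUNT(*) AS Cnt
           FROM VehicleToBedConfig
           GROUP BY VehicleId) Bed
       ON Bed.VehicleId = V.VehicleId
LEFT JOIN (SELECT VehicleId, COUNT(*) AS Cnt
           FROM VehicleToBodyStyleConfig
           GROUP BY VehicleId) Body
       ON Body.VehicleId = V.VehicleId
ORDER BY V.VehicleId
''')
rows = cur.fetchall()
print(rows)  # [(1, 2, 1), (2, 1, 0)]
conn.close()
```

The same pattern extends to the remaining child tables by adding one grouped subquery per table.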

Oracle error: Not a group by function

Select EVENTPLAN.PLANNO, EVENTPLANLINE.LINENO, RESOURCETBL.RESNAME,
COUNT(EVENTPLANLINE.NUMBERFLD) AS NUMBEROFRESOURCES,
LOCATION.LOCNAME, EVENTPLANLINE.TIMESTART, EVENTPLANLINE.TIMEEND
FROM EVENTPLAN, RESOURCETBL, EVENTPLANLINE, LOCATION, FACILITY
WHERE EVENTPLAN.PLANNO = EVENTPLANLINE.PLANNO
AND EVENTPLANLINE.RESNO = RESOURCETBL.RESNO
AND EVENTPLANLINE.LOCNO = LOCATION.LOCNO
AND FACILITY.FACNO = LOCATION.FACNO
AND FACILITY.FACNAME = 'Basketball arena'
AND EVENTPLAN.ACTIVITY = 'Operation'
AND EVENTPLAN.WORKDATE BETWEEN '1-OCT-13' AND '31-DEC-13'
GROUP BY EVENTPLAN.PLANNO, EVENTPLANLINE.LINENO,
RESOURCETBL.RESNAME,EVENTPLANLINE.NUMBERFLD;
On running this query I am getting an error: Not a group by function. Can someone please tell me why I am getting this error? I have added all the fields to the GROUP BY clause.
When you use an aggregate function, every non-aggregated column in the SELECT list must also appear in the GROUP BY clause.
You have missed these:
LOCATION.LOCNAME, EVENTPLANLINE.TIMESTART, EVENTPLANLINE.TIMEEND
So, the right query will be:
SELECT
EVENTPLAN.PLANNO, EVENTPLANLINE.LINENO, RESOURCETBL.RESNAME,
COUNT(EVENTPLANLINE.NUMBERFLD) AS NUMBEROFRESOURCES,
LOCATION.LOCNAME, EVENTPLANLINE.TIMESTART, EVENTPLANLINE.TIMEEND
FROM EVENTPLAN, RESOURCETBL, EVENTPLANLINE, LOCATION, FACILITY
WHERE EVENTPLAN.PLANNO = EVENTPLANLINE.PLANNO
AND EVENTPLANLINE.RESNO = RESOURCETBL.RESNO
AND EVENTPLANLINE.LOCNO = LOCATION.LOCNO
AND FACILITY.FACNO = LOCATION.FACNO
AND FACILITY.FACNAME = 'Basketball arena'
AND EVENTPLAN.ACTIVITY = 'Operation'
AND EVENTPLAN.WORKDATE BETWEEN '1-OCT-13' AND '31-DEC-13'
GROUP BY EVENTPLAN.PLANNO, EVENTPLANLINE.LINENO, RESOURCETBL.RESNAME,
LOCATION.LOCNAME, EVENTPLANLINE.TIMESTART, EVENTPLANLINE.TIMEEND
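The rule can be tried out on a cut-down table. This is only a sketch using Python's sqlite3 module with invented rows and a simplified EventPlanLine table; note that SQLite is more lenient than Oracle and would not raise the error for a missing column, so the example only shows the correct form, where every selected column that is not aggregated is listed in GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified, hypothetical version of the table from the question.
cur.executescript('''
CREATE TABLE EventPlanLine (
    PlanNo TEXT, LineNo INTEGER, LocName TEXT, NumberFld INTEGER
);
INSERT INTO EventPlanLine VALUES
    ('P1', 1, 'Gate', 2),
    ('P1', 1, 'Gate', 3),
    ('P1', 2, 'Locker room', 1);
''')

# PlanNo, LineNo and LocName are selected without an aggregate,
# so all three are repeated in GROUP BY; NumberFld is only counted.
cur.execute('''
SELECT PlanNo, LineNo, LocName, COUNT(NumberFld) AS NumberOfResources
FROM EventPlanLine
GROUP BY PlanNo, LineNo, LocName
ORDER BY PlanNo, LineNo
''')
rows = cur.fetchall()
print(rows)  # [('P1', 1, 'Gate', 2), ('P1', 2, 'Locker room', 1)]
conn.close()
```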

group by in django aggregate function?

I have a problem writing a GROUP BY SQL query in a Django app. Can any of you Django users help me write this SQL as Django-friendly code? This is my model:
class Stock(models.Model):
    name = models.CharField("Stock's name", max_length=200)
    symbol = models.CharField("Stock's symbol", max_length=20)

class Dividend(models.Model):
    amount = models.FloatField(default=0)
    date = models.DateField('pay date')
    stock = models.ForeignKey(Stock)

class UserStock(models.Model):
    amount = models.FloatField('amount', default=0)
    date = models.DateField('buy date')
    price = models.FloatField('price', default=0)
    user = models.ForeignKey(User)
    stock = models.ForeignKey(Stock)
And this is sql code I want to write in django:
select stock_id, sum(price), sum(amount) as price from stocks_userstock group by stock_id;
I was trying to write something like this.
my_stock = UserStock.objects.filter(user=request.user)\
.annotate(sum_price = sum('price'), sum_amount = sum('amount'))
Thanks in advance, I hope it won't be a problem for some of you.
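I believe just adding a values call before annotate will do that. Use filter rather than get (get returns a single object, not a queryset), and Sum from django.db.models rather than the built-in sum:
from django.db.models import Sum
my_stock = UserStock.objects.filter(user=request.user)\
    .values('stock_id').annotate(sum_price=Sum('price'), sum_amount=Sum('amount'))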
and you will get back a list of dicts similar to
[
{'stock_id': 0, 'sum_price': 10, 'sum_amount': 25},
...
]
see here for more info
stock = Stock.objects.annotate(sum_price=Sum('userstock__price'), sum_amount=Sum('userstock__amount'))
It should work.
I've removed the User filter in my snippet to simplify but you can add it of course.

Problems with Left join Query LinqToSql

IBookingRepository bookingResp = new BookingRepository();
IQueryable<bookingTest> bookings = bookingResp.GetAllBookingsByView();
var grid = new System.Web.UI.WebControls.GridView();
grid.DataSource = from booking in bookings
join f in getallAttendees on booking.UserID equals f.UserID into fg
from fgi in fg.DefaultIfEmpty() //Where(f => f.EventID == booking.EventID)
where
booking.EventID == id
select new
{
EventID = booking.EventID,
UserID = booking.UserID,
TrackName = booking.Name,
BookingStatus = booking.StatusID,
AttendeeName = booking.FirstName,
// name = account.FirstName,
AmountPaid = booking.Cost,
AttendeeAddress = booking.DeliveryAdd1,
City = booking.DeliveryCity,
Postcode = booking.Postcode,
Date = booking.DateAdded,
hel = fgi == null ? null : fgi.HelmetsPurchased }// Product table
Hi, the above query doesn't execute; it gives an error: The specified LINQ expression contains references to queries that are associated with different contexts. Can anyone spot what the problem is with the query?
I think that your getallAttendees comes from a different context than bookings, so you won't be able to join them. To give a more exact answer, you need to show where bookings and getallAttendees come from.