pyspark is not recognizing table from outside sql subquery - apache-spark-sql

I have registered two tables as temp views
ldf.createOrReplaceTempView("loan")
mdf.createOrReplaceTempView("mkt")
df = spark.sql("SELECT * FROM loan join (select * from mkt where loan.id >= mkt.id) as m on loan.qtr = m.qtr limit 1")
df.show()
However, when I run this command, pyspark fails to recognize the loan view inside the subquery. The error makes it seem as if it can only see tables/views referenced inside the subquery; it does not look at the loan view at all.
AnalysisException: cannot resolve 'loan.id' given input columns: [mkt.id, mkt.lo, mkt.qtr]

The current query is the following,
SELECT *
FROM loan
JOIN (
SELECT *
FROM mkt
WHERE loan.id >= mkt.id
) AS m
ON loan.qtr = m.qtr
LIMIT 1
The nested SELECT statement that defines m is resolved on its own, so it does not know what loan is or what loan.id is. Therefore it is unable to compare loan.id with mkt.id.
I tested this query format in both MySQL and PrestoSQL, and both returned "Column cannot be resolved" error messages, which align with the Spark SQL error message you received.
Instead, you can compare columns from loan and mkt after joining the datasets together.
SELECT *
FROM loan
JOIN mkt ON loan.qtr = mkt.qtr
WHERE loan.id >= mkt.id
LIMIT 1
Run via Spark SQL, it looks like this:
df = spark.sql("SELECT * FROM loan JOIN mkt ON loan.qtr = mkt.qtr WHERE loan.id >= mkt.id LIMIT 1")
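If you want to reproduce the behaviour without a Spark session, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is my stand-in engine, not Spark, and the sample rows are made up; the table and column names follow the question. It only illustrates that a derived table cannot see loan, while the join-then-filter form works.

```python
import sqlite3

# In-memory stand-in for the two temp views (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (id INTEGER, qtr TEXT);
    CREATE TABLE mkt  (id INTEGER, lo REAL, qtr TEXT);
    INSERT INTO loan VALUES (5, '2020Q1'), (1, '2020Q2');
    INSERT INTO mkt  VALUES (3, 0.5, '2020Q1'), (4, 0.7, '2020Q2');
""")

# Original shape: the derived table refers to loan, which it cannot see.
try:
    conn.execute(
        "SELECT * FROM loan "
        "JOIN (SELECT * FROM mkt WHERE loan.id >= mkt.id) AS m "
        "ON loan.qtr = m.qtr"
    ).fetchall()
    derived_table_error = None
except sqlite3.OperationalError as exc:
    derived_table_error = str(exc)

# Rewritten shape: join first, then compare the columns in WHERE.
rows = conn.execute(
    "SELECT loan.id, mkt.id, loan.qtr "
    "FROM loan JOIN mkt ON loan.qtr = mkt.qtr "
    "WHERE loan.id >= mkt.id"
).fetchall()

print(derived_table_error)
print(rows)
```

The exact error text differs per engine, but the cause is the same as in the Spark message.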

Related

AnalysisException: Could not resolve column/field reference: 'transaction_nominal_value' in SQL Impala

I have the following SQL query where I have two datasets, CSDB and MMSR. I merge these two datasets into the Final dataset. So far everything works fine. Now I want to aggregate the data on trade_date, but I get the error message: AnalysisException: Could not resolve column/field reference: 'transaction_nominal_value'. I am working in Impala. How can I resolve the problem?
--CSDB
WITH CSDB AS (
SELECT isin, nominal_currency, amount_outstanding, issue_price, amount_issued, yield, original_maturity, residual_maturity
FROM csdb_pq
where nominal_currency = "EUR"
),
--MMSR
MMSR AS (
SELECT transaction_nominal_amount, maturity_days, deal_rate, collateral_haircut, collateral_isin, collateral_nominal_amount, trade_date
FROM datashop_store_business_mmsr.secured_vl_pq
WHERE collateral_isin IS NOT NULL
),
---Join
Final AS (
SELECT *
FROM MMSR
LEFT JOIN CSDB
ON MMSR.collateral_isin = CSDB.isin
)
--Aggregate Data
SELECT trade_date, AVG(transaction_nominal_amount)
FROM Final
GROUP BY trade_date;
The column should be transaction_nominal_amount, not transaction_nominal_value, per the columns selected in your CTE. It is just a simple column-name error; once corrected, the query should work.

SQL Math Operation In Correlated Subquery

I am working with three tables, basically, one is a bill of materials, one contains part inventory, and the last one contains work orders or jobs. I am trying to find out if it is possible to have a correlated subquery that can perform a math operation using a value from the outer query. Here's an example of what I'm trying to do:
SELECT A.work_order,A.assembly,A.job_quantity,
(SELECT COUNT(X.part_number)
FROM bom X
WHERE X.assembly = A.assembly
AND (X.quantity_required * A.job_quantity) >= (SELECT Y.quantity_available FROM inventory Y WHERE
Y.part_number = X.part_number)) AS negatives
FROM work_orders A
ORDER BY A.assembly ASC
I am attempting to find out, for a given work order, if there are parts that we do not have enough of to build the assembly. I'm currently getting an "Error correlating fields" error. Is it possible to do this kind of operation in a single query?
Try moving the subquery to a join, something like this:
SELECT a.work_order, a.assembly, a.job_quantity, n.negatives
FROM work_orders a JOIN (SELECT b.work_order, COUNT(x.part_number) as negatives
FROM bom x JOIN work_orders b
ON x.assembly = b.assembly
WHERE (x.quantity_required * b.job_quantity) >= (SELECT y.quantity_available
FROM inventory y WHERE
y.part_number = x.part_number)
GROUP BY b.work_order) n
ON a.work_order = n.work_order
ORDER BY a.assembly ASC
Or create a temporary cursor with the subquery and then use it to join the main table.
Hope this helps.
Luis
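As a hedged illustration of the join-based approach, here is a small sketch using Python's sqlite3 module. The engine and the sample rows are my assumptions; the table and column names come from the question. It uses LEFT JOINs so that work orders with no shortages still show up with a count of 0, which is a design choice of this sketch rather than something the answer above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_orders (work_order TEXT, assembly TEXT, job_quantity INTEGER);
    CREATE TABLE bom (assembly TEXT, part_number TEXT, quantity_required INTEGER);
    CREATE TABLE inventory (part_number TEXT, quantity_available INTEGER);
    -- invented sample data
    INSERT INTO work_orders VALUES ('WO1', 'A', 10), ('WO2', 'B', 1);
    INSERT INTO bom VALUES ('A', 'P1', 2), ('A', 'P2', 1), ('B', 'P1', 1);
    INSERT INTO inventory VALUES ('P1', 15), ('P2', 50);
""")

# The math on the outer value (job_quantity) moves into the join condition,
# so only inventory rows that are short get matched and counted.
shortages = conn.execute("""
    SELECT a.work_order, a.assembly, a.job_quantity,
           COUNT(y.part_number) AS negatives
    FROM work_orders a
    LEFT JOIN bom x
           ON x.assembly = a.assembly
    LEFT JOIN inventory y
           ON y.part_number = x.part_number
          AND x.quantity_required * a.job_quantity >= y.quantity_available
    GROUP BY a.work_order, a.assembly, a.job_quantity
    ORDER BY a.assembly
""").fetchall()

print(shortages)
```

The comparison keeps the >= from the question; switch it to > if having exactly enough stock should not count as a shortage.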

To pair up records in oracle sql

I'm just trying to pair up records (within one day, T_DATE = 20160720) that share the same TICKET_ID and return their T_TIME and T_LOCATION. This is my SQL:
select a.T_TIME, b.T_TIME, a.T_LOCATION, b.T_LOCATION
FROM TABLE a
CROSS APPLY
(select * from TABLE b where a.TICKET_ID = b.TICKET_ID having count(TICKET_ID) > 1) b
where (a.T_DATE=20160720);
When executed, it reports 00000 - "missing keyword" at the position of "CROSS APPLY".
Is the problem caused by using CROSS APPLY?
Ok, here is the problem I originally want to solve :)
The table looks like this:
T_TIME         | T_LOCATION | TICKET_ID | T_DATE
20160720091032 | ---0103    | 1A268F    | 20160720
20160720095842 | ---0115    | 63T37H    | 20160720
20160720133408 | ---0124    | 1A268F    | 20160720
20160721152400 | ---0116    | 598I3R    | 20160721
20160720125844 | ---0147    | 63T37H    | 20160720
I want to pair up the records with the same TICKET_ID. Each TICKET_ID is shared by two records, and I want the output to look like:
20160720091032|20160720133408|0103|0124|
20160720095842|20160720125844|0115|0147|
The table is very large; for T_DATE=20160720 alone there will be 200000 records in total.
One way of doing it would be:
select a.ticket_id, a.t_time, b.t_time, a.t_location, b.t_location
from the_table a
join the_table b on a.ticket_id = b.ticket_id and a.t_time < b.t_time
where a.t_date = 20160720;
The join condition "and a.t_time < b.t_time" ensures that the "other" version of a pair isn't in the result, e.g. you only get (0103, 0124) but not (0124, 0103).
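To see the self-join on the sample rows from the question, here is a quick sketch using Python's sqlite3 module. SQLite is my stand-in for Oracle, the table name the_table follows the answer, and I store the locations without the leading dashes, matching the desired output. The ORDER BY is only there to make the printed result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (t_time INTEGER, t_location TEXT, ticket_id TEXT, t_date INTEGER);
    INSERT INTO the_table VALUES
        (20160720091032, '0103', '1A268F', 20160720),
        (20160720095842, '0115', '63T37H', 20160720),
        (20160720133408, '0124', '1A268F', 20160720),
        (20160721152400, '0116', '598I3R', 20160721),
        (20160720125844, '0147', '63T37H', 20160720);
""")

# Self-join on ticket_id; a.t_time < b.t_time keeps one ordering per pair.
pairs = conn.execute("""
    SELECT a.t_time, b.t_time, a.t_location, b.t_location
    FROM the_table a
    JOIN the_table b
      ON a.ticket_id = b.ticket_id AND a.t_time < b.t_time
    WHERE a.t_date = 20160720
    ORDER BY a.t_time
""").fetchall()

for row in pairs:
    print(row)
```

With 200000 rows per day, an index on (ticket_id, t_time) would probably help the join, but that depends on your actual Oracle setup.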

Syntax error in Sql MS-Access

My question: the owners would like to know the revenue generated so far (i.e. where CheckOutDate < DATE()) for each room type in each hotel.
The calculation must be done in the SQL statement.
Determine the length of stay for each reservation (i.e. number of days) using the DateDiff function datediff('d', checkindate, checkoutdate) and multiply this value by the room rate.
Your output should be formatted as shown on the next page. Your Revenue totals may be different. Keep in mind, the Revenue amount may change on a daily basis, as we want to include only those reservations that are completed, not current or future reservations.
select
room.hotelID, room.roomtype,
datediff('d', Reservation.CheckOutDate, Reservation.CheckInDate) * ROOM_TYPE.RoomRate as Revenue
from
Reservation
inner join
Room on Room.hotelID = Reservation.HotelID
inner join
ROOM_TYPE on ROOM_TYPE.RoomType = Room.roomtype
group by
Room.HotelID, Room.roomtype;
I am getting a syntax error (statement missing) for this query.
How can I resolve this error in MS Access?
When using a Group By clause, any columns that are not part of the grouping must be aggregated. In your case, Room.HotelID and Room.RoomType are the grouping columns. So they are fine in your SELECT clause, as-is. But Revenue needs to be aggregated. I expect that you will want to use the SUM aggregation to sum up all of the Revenue values for each room type. Try this...
select room.hotelID,
room.roomtype,
SUM( DateDiff('d', Reservation.CheckInDate, Reservation.CheckOutDate) * ROOM_TYPE.RoomRate ) as Revenue
from Reservation
inner join Room on Room.hotelID=Reservation.HotelID
inner join ROOM_TYPE on ROOM_TYPE.RoomType=Room.roomtype
group by Room.HotelID, Room.roomtype;
Running the query below against your data in Access 2010 produced this result set:
hotelID roomtype Revenue
------- -------- ----------
1000 D $23,000.00
1000 F $23,100.00
1000 S $20,700.00
1111 D $36,500.00
1111 F $16,450.00
1111 S $15,300.00
SELECT
rm.hotelID,
rm.roomtype,
Sum(DateDiff('d', rs.CheckInDate, rs.CheckOutDate) * rt.RoomRate) AS Revenue
FROM
(
ROOM AS rm INNER JOIN RESERVATION AS rs
ON (rm.roomno = rs.RoomNo) AND (rm.hotelID = rs.HotelID)
)
INNER JOIN ROOM_TYPE AS rt
ON rm.roomtype = rt.RoomType
WHERE rs.CheckOutDate < Date()
GROUP BY rm.hotelID, rm.roomtype;
You should still learn how to use the Query Builder but I think the parens should look something like this:
select
Room.HotelID, Room.roomtype,
sum(
datediff('d',Reservation.CheckInDate,Reservation.CheckOutDate) *
ROOM_TYPE.RoomRate
) as Revenue
from
((Reservation inner join Room on Room.hotelID = Reservation.HotelID)
inner join ROOM_TYPE on ROOM_TYPE.RoomType = Room.roomtype)
group by
Room.HotelID, Room.roomtype;
So in summary:
Be careful with grouping columns and aggregates.
Access uses quotes around its DateDiff interval argument, unlike some other systems.
Nesting of joins needs parentheses.
In MS Access, multiple joins in the FROM clause have to be nested in parentheses (). Apart from that, the DateDiff expression has to be wrapped in an aggregate function.
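For readers without Access at hand, the aggregation logic can be checked with a rough sketch in Python's sqlite3 module. This is not Access syntax: SQLite has no DateDiff, so I substitute a julianday() difference, and SQLite does not need the parentheses around joins. Table and column names follow the answers above; the rows and rates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE room (hotelID INTEGER, roomno INTEGER, roomtype TEXT);
    CREATE TABLE room_type (roomtype TEXT, roomrate REAL);
    CREATE TABLE reservation (hotelID INTEGER, roomno INTEGER,
                              checkindate TEXT, checkoutdate TEXT);
    -- invented sample data; the last reservation lies in the future
    INSERT INTO room VALUES (1000, 101, 'D'), (1000, 102, 'S');
    INSERT INTO room_type VALUES ('D', 100.0), ('S', 80.0);
    INSERT INTO reservation VALUES
        (1000, 101, '2020-01-01', '2020-01-04'),
        (1000, 101, '2020-02-10', '2020-02-12'),
        (1000, 102, '2020-03-01', '2020-03-02'),
        (1000, 102, '2999-01-01', '2999-01-05');
""")

# Nights stayed = check-out minus check-in (note the argument order),
# multiplied by the rate and summed per hotel and room type.
revenue = conn.execute("""
    SELECT rm.hotelID, rm.roomtype,
           SUM((julianday(rs.checkoutdate) - julianday(rs.checkindate))
               * rt.roomrate) AS revenue
    FROM room rm
    JOIN reservation rs ON rm.roomno = rs.roomno AND rm.hotelID = rs.hotelID
    JOIN room_type rt ON rm.roomtype = rt.roomtype
    WHERE rs.checkoutdate < date('now')
    GROUP BY rm.hotelID, rm.roomtype
    ORDER BY rm.hotelID, rm.roomtype
""").fetchall()

print(revenue)
```

Swapping the two dates in the subtraction would flip the sign of every revenue figure, which is the same pitfall as putting CheckOutDate first in DateDiff.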

SQL SUM total using 2 tables

I have 2 tables: TBL_EQUIPMENTS and TBL_PROPOSAL.
TBL_PROPOSAL has 3 important columns:
id_proposal
date
discount
TBL_EQUIPMENTS has:
id_equipment
id_proposal
unit_price
quantity
Now I want to know how much (in €) my proposals for this year are worth. Let's say:
For each proposal with TBL_PROPOSAL.date > "2013-01-01", I want to use the formula:
result = (TBL_EQUIPMENTS.unit_price * TBL_EQUIPMENTS.quantity) * (100 - TBL_PROPOSAL.discount)
Can I do this with one SQL statement?
Yes you can:
select (e.unit_price * e.quantity) * (100 - p.discount)
from tbl_Proposal p join
tbl_Equipments e
on p.id_Proposal = e.id_proposal
where date >= '2013-01-01'
The basic syntax is for a join. The p and e are called table aliases. They make the query easier to read (the full table names are rather bulky).
Date operations differ among databases. The last statement should work in most databases. However, you might try one of the following as well:
where year(date) = 2013
where extract(year from date) = 2013
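Since the title asks for a total, one option is to wrap the expression in SUM. Here is a minimal sketch with Python's sqlite3 module; the engine and the sample rows are my assumptions, and the formula is kept exactly as written in the question. If discount is stored as a percentage, you would presumably also divide by 100, but the question does not say.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_proposal (id_proposal INTEGER, date TEXT, discount REAL);
    CREATE TABLE tbl_equipments (id_equipment INTEGER, id_proposal INTEGER,
                                 unit_price REAL, quantity INTEGER);
    -- invented sample data; proposal 2 is older than 2013 and gets filtered out
    INSERT INTO tbl_proposal VALUES (1, '2013-03-01', 10), (2, '2012-05-01', 0);
    INSERT INTO tbl_equipments VALUES
        (1, 1, 100.0, 2), (2, 1, 50.0, 1), (3, 2, 999.0, 1);
""")

# One statement: join the tables, filter by date, sum the per-row formula.
total = conn.execute("""
    SELECT SUM((e.unit_price * e.quantity) * (100 - p.discount))
    FROM tbl_proposal p
    JOIN tbl_equipments e ON p.id_proposal = e.id_proposal
    WHERE p.date >= '2013-01-01'
""").fetchone()[0]

print(total)
```

The string comparison on date works here only because the dates are stored as ISO-formatted text; with a real date column, use your database's date literal or one of the year filters above.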