Sales Dataframe:
transaction_id, name, customer_id, amount
Customer Dataframe:
customer_id, customer_name
I want to perform left join and on Sales Dataframe and then perform some transformations on Sales data frame rows, missing cusomter_ids. However when I do this,
sales.join(customer,Seq("customer_id"),"left_join").select(customer("cusomter_id"))
I still get that its ambiguous, I tried using Alias too, but failed, is there a better way to do the same?
Related
I have a distinct list of part numbers from one table. It is basically a table that contains a record of all the company's part numbers. I want to add columns that will pull data from different tables but only pertaining to the part number on that row of the distinct part list.
For example: if I have part A, B, C from the unique part list I want to add columns for Purchase quantity, repair quantity, loan quantity, etc... from three totally unique tables.
So it's almost like I need 3 subqueries that will sum of that data from the different tables for each part.
Can anybody steer me in the direction of how to do this? Please and thank you so much!
One method is correlated subqueries. Something like this:
select p.*,
(select count(*)
from purchases pu
where pu.part_id = p.part_id
) as num_purchases,
(select count(*)
from repairs r
where r.part_id = p.part_id
) as num_repairs,
(select count(*)
from loans l
where l.part_id = p.part_id
) as num_loans
from parts p;
Another option is joins with aggregation before the join. Or lateral joins (which are quite similar to correlated subqueries).
I tried this whole day but didn't got the result. Please go through the images on the link below
Here is my table 1 structure
and here is my table 2 structure
I want to multiply product_uom_qty from 1st table with cost in 2nd table and then group them by product_id which is there is both tables.
This is my query.
select sum(sale_order_line.product_uom_qty) * product_price_history.cost
from sale_order_line, product_price_history
group by product_id;
And result I am getting is
ERROR: column reference "product_id" is ambiguous. LINE 1: ...m
sale_order_line, product_price_history group by product_id...
please tell me where I am making mistake.
Presumably, you want something like this:
select product_id, sum(sol.product_uom_qty * pph.cost)
from sale_order_line sol join
product_price_history pph
using (product_id)
group by product_id;
That is, you need to join the tables on some condition.
Without sample data and desired results, it is unclear what you really want. But this at least is a sensible query that removes the syntax errors.
In this database I need to find the total amount that each customer paid for books in a category, and then sort them by their customer ID. The code appears to run correctly but I end up with approximately 20 extra rows than I should, although the sum appears to be correct in the right rows.
The customer ID is part of customer, but is not supposed to appear in the select clause, when I try and ORDER BY it, I get strange errors. The DB engine is DB2.
SELECT distinct customer.name, book.cat, sum(offer.price) AS COST
FROM offer
INNER JOIN purchase ON purchase.title=offer.title
INNER JOIN customer ON customer.cid=purchase.cid
INNER JOIN member ON member.cid=customer.cid
INNER JOIN book ON book.title=offer.title
WHERE
member.club=purchase.club
AND member.cid=purchase.cid AND purchase.club=offer.club
GROUP BY customer.name, book.cat;
You should fix your join conditions to include the ones in the where clause (between table relationships usually fit better into an on clause).
SELECT DISTINCT is almost never appropriate with a GROUP BY.
But those are not your question. You can use an aggregation function:
GROUP BY customer.name, book.cat
ORDER BY MIN(customer.id)
Currently have a single table with large amount of data in access, due to the size I couldn't easily work with it in Excel any more.
I'm partially there on a query to pull data from this table.
7 Column table
One column GL_GL_NUM contains a transaction number. ~ 75% of these numbers are pairs. I'm trying to pull the records (all columns information) for each unique transaction number in this column.
I have put together some code from googling that hypothetically should work but I think I'm missing something on the syntax or simply asking access to do what it cannot.
See below:
SELECT SOURCE_FUND, GLType, Contract, Status, Debit, Credit, GL_GL_NUM
FROM Suspense
JOIN (
SELECT TC_TXN_NUM TXN_NUM, COUNT(GL_GL_NUM) GL_NUM
FROM Suspense
GROUP BY TC_TXN_NUM HAVING COUNT(GL_GL_NUM) > 1 ) SUB ON GL_GL_NUM = GL_NUM
Hey Beth is this the suggested code? It says there is a syntax error in the FROM clause. Thanks.
SELECT * from SuspenseGL
JOIN (
SELECT TC_TXN_NUM, COUNT(GL_GL_NUM) GL_NUM
FROM Suspense
GROUP BY TC_TXN_NUM
HAVING COUNT(GL_GL_NUM) > 1
Do you want detailed results (all rows and columns) or aggregate results, with one row per tx number?
If you want an aggregate result, like the count of distinct transaction numbers, then you need to apply one or more aggregate functions to any other columns you include.
If you run
SELECT TC_TXN_NUM, COUNT(GL_GL_NUM) GL_NUM
FROM Suspense
GROUP BY TC_TXN_NUM
HAVING COUNT(GL_GL_NUM) > 1
you'll get one row for each distinct txn, but if you then join those results back with your original table, you'll have the same number of rows as if you didn't join them with distinct txns at all.
Is there a column you don't want included in your results? If not, then the only query you need to work with is
select * from suspense
Considering your column names, what you may want is:
SELECT SOURCE_FUND, GLType, Contract, Status, sum(Debit) as sum_debit,
sum(Credit) as sum_credit, count(*) as txCount
FROM Suspense
group by
SOURCE_FUND, GLType, Contract, Status
based on your comments, if you can't work with aggregate results, you need to work with them all:
Select * from suspense
What's not working? It doesn't matter if 75% of the txns are duplicates, you need to send out every column in every row.
OK, let's say
Select * from suspense
returns 8 rows, and
select GL_GL_NUM from suspense group by GL_GL_NUM
returns 5 rows, because 3 of them have duplicate GL_GL_NUMs and 2 of them don't.
How many rows do you want in your result set? if you want less than 8 rows back, you need to perform some sort of aggregate function on each column you want returned.
You could do something like the following:
SELECT S.* FROM
SUSPENSE AS S
INNER JOIN (SELECT DISTINCT GL_GL_NUM, MIN(ID) AS ID FROM SUSPENSE
GROUP BY GL_GL_NUM) AS S2
ON S.ID = S2.ID
AND S.GL_GL_NUM = S2.GL_GL_NUM
Which would return a single row for a unique gl_gl_num. However if the other rows have different data it will not be shown. You would have to either aggregate that data up using SUM(Credit), SUM(Debit) and then GROUP BY the gl_gl_num.
I have attached a SQL Fiddle to demonstrate my results and make this clearer.
http://sqlfiddle.com/#!3/8284f/2
I'm just starting out with SQL, I have been playing around with simple select queries and grouping data, now I want to pull some actually useful data out of our database for analysis. The data is organized as follows:
Access 2010 Database
I didn't set it up, I know it isn't set up as it should be
I can't change data, only poll
-Customers are kept in one table
-Closed orders are kept in another table (each line item is listed with invoice #, date closed, Customer ID as well as other info)
-Archived closed orders table keep sales records that are a year + old (table is laid out exactly the same as Closed order table)
I want to start with a simple query, list all the customers from a certain branch and their past year totals. Here's what I have tried:
SELECT CUSTOMERS.Company, CUSTOMERS.[Ship City], (SELECT SUM (CLOSEDORDERS.Quant*CLOSEDORDERS.SellPrice) FROM CLOSEDORDERS WHERE CUSTOMERS.ID = CLOSEDORDERS.CustID) AS LifeTotal
FROM CUSTOMERS, CLOSEDORDERS
WHERE CUSTOMERS.Branch=33;
When I run the query, it asks me to enter a parameter value for CLOSEDORDERS.Quant. What am I doing wrong?
I think this is what you're looking for with an OUTER JOIN:
SELECT CUSTOMERS.Company,
CUSTOMERS.[Ship City],
SUM(CLOSEDORDERS.Quant*CLOSEDORDERS.SellPrice) AS LifeTotal
FROM CUSTOMERS
LEFT JOIN CLOSEDORDERS ON CUSTOMERS.ID = CLOSEDORDERS.CustID
WHERE CUSTOMERS.Branch=33
GROUP BY CUSTOMERS.Company,
CUSTOMERS.[Ship City]
If you only want to return matching results from both tables, then use a standard INNER JOIN instead of the LEFT JOIN.