Alternative to MINUS in Hive

Can someone clearly explain how the query below works as an alternative to a MINUS query in Hive?
SELECT Customers.CustomerID
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
where Orders.CustomerID is null;

MINUS = present in the first set and not present in the second (in your case, customer IDs present in Customers but not in Orders).
Here is an explanation of your query.
Suppose the Customers table has the following data:
SNo,CustomerID
1,Customer1
2,Customer2
3,Customer3
The Orders table has the following data:
SNo,CustomerID
1,Customer1
2,Customer3
3,Customer4
Subtracting Orders from Customers gives Customer2 as the output.
A left join takes every customer ID from the Customers table and matches it against the customer IDs in the Orders table. When a customer ID is present in Customers but not in Orders, the Orders side of that row is NULL, and that is the row your WHERE clause keeps. In the join output below, only the row "Customer2,Null" passes the filter, and because the query selects just Customers.CustomerID, the final result is Customer2.
Joining Output :
Customer1,Customer1
Customer2,Null
Customer3,Customer3
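To check the explanation above, here is a minimal sketch that loads the sample rows and runs both the plain left join and the filtered query. It uses SQLite through Python's sqlite3 module purely so it can be run anywhere; that is an assumption of this sketch, not part of the original Hive setup, although the join logic itself is standard SQL.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (SNo INTEGER, CustomerID TEXT);
    CREATE TABLE Orders (SNo INTEGER, CustomerID TEXT);
    INSERT INTO Customers VALUES (1, 'Customer1'), (2, 'Customer2'), (3, 'Customer3');
    INSERT INTO Orders VALUES (1, 'Customer1'), (2, 'Customer3'), (3, 'Customer4');
""")

# Step 1: the left join alone keeps every customer; unmatched rows get NULL.
joined = conn.execute("""
    SELECT Customers.CustomerID, Orders.CustomerID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerID
""").fetchall()

# Step 2: filtering on the NULL side leaves only customers without an order.
minus_rows = conn.execute("""
    SELECT Customers.CustomerID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    WHERE Orders.CustomerID IS NULL
""").fetchall()

print(joined)
print(minus_rows)
```

Customer4 never shows up because a left join only starts from rows of the left table.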

Related

SQL Join Table A Column A to Table B Column A or B

I have 2 tables - A Spend Table and a Sales Table
Spend Table Schema:
Spend_ID,
Spend_amount
Sales Table Schema:
Sales_ID_A,
Sales_ID_B,
Sales_amount
I want to left join the spend table to the sales table. The join key from the spend table is Spend_ID, and I want to join when it matches the value in either Sales_ID_A or Sales_ID_B, i.e. my match key on the sales table is spread over two columns. In a way it is like applying an 'or' condition to the join on Sales_ID_A or Sales_ID_B: if a match is found in Sales_ID_A, there is no need to check Sales_ID_B; only check Sales_ID_B if no match is found in Sales_ID_A. How do I achieve this with SQL?
Sample Data Illustration:
See screenshot
The columns called sales_id_a and sales_id_b are actually spend IDs and should better be called spend_id_a and spend_id_b.
You want to join on the first ID, but if that is null, you want to join on the second ID. Use COALESCE for this:
select *
from sales
left join spend on spend.spend_id = coalesce(sales.sales_id_a, sales.sales_id_b)
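A small sketch of the COALESCE join, run in SQLite through Python's sqlite3 module. The rows are invented for illustration (the original screenshot is not available), and the column types are assumptions. Note that COALESCE only falls back to sales_id_b when sales_id_a is NULL, not when sales_id_a holds a value that has no match in spend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; not taken from the original question.
    CREATE TABLE spend (spend_id INTEGER, spend_amount REAL);
    CREATE TABLE sales (sales_id_a INTEGER, sales_id_b INTEGER, sales_amount REAL);
    INSERT INTO spend VALUES (1, 10.0), (2, 20.0);
    INSERT INTO sales VALUES (1, NULL, 100.0), (NULL, 2, 200.0), (NULL, 3, 300.0);
""")

# Join on sales_id_a when it is set, otherwise on sales_id_b.
rows = conn.execute("""
    SELECT sales.sales_id_a, sales.sales_id_b, sales.sales_amount,
           spend.spend_id, spend.spend_amount
    FROM sales
    LEFT JOIN spend
        ON spend.spend_id = COALESCE(sales.sales_id_a, sales.sales_id_b)
    ORDER BY sales.sales_amount
""").fetchall()

for row in rows:
    print(row)
```

The third sales row points at spend ID 3, which does not exist, so its spend columns come back as NULL.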

How to get two fields based off a most recent date attribute?

I have two tables:
A Billing table, and a Customer table.
The Billing table and customer table both share a common attribute of Customer Number.
Billing Table
I'm trying to create a view that will retrieve the customer code and bill number for the most recent invoice date. I'm having trouble ordering my query.
This is what I have so far.
CREATE VIEW RECENT_ORDER
AS
SELECT
c.Customer_Num, b.Bill_Num
FROM CUSTOMER c
INNER JOIN BILLING b ON c.Customer_Num = b.Customer_Num
WHERE c.Fname='Jess' AND c.Lname='Hanks'
HAVING MAX(b.Bill_Date);
I have also tried putting the 'HAVING' portion as a WHERE statement.
This should get your answer:
CREATE VIEW RECENT_ORDER
AS
SELECT c.customer_num, b.bill_num
FROM customer c
JOIN billing b ON c.customer_num = b.customer_num
WHERE b.bill_date =
(SELECT MAX(bill_date) FROM billing WHERE customer_num = b.customer_num)
AND c.Fname='Jess' AND c.Lname='Hanks'
Normally, though, I wouldn't expect a view to limit its results to a single customer. Remove the last line (AND c.Fname ...) if you intend to get the most recent result for all customers. Note that you may get multiple rows for a customer if they have more than one invoice on the same date.
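As a rough check of the correlated-subquery approach, the sketch below builds the view without the name filter and queries it. It runs on SQLite via Python's sqlite3 module; the sample rows, the second customer, and the use of ISO-formatted text dates are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; dates are ISO strings so MAX() sorts them correctly.
    CREATE TABLE customer (customer_num INTEGER, fname TEXT, lname TEXT);
    CREATE TABLE billing (bill_num INTEGER, customer_num INTEGER, bill_date TEXT);
    INSERT INTO customer VALUES (1, 'Jess', 'Hanks'), (2, 'Ann', 'Lee');
    INSERT INTO billing VALUES
        (101, 1, '2020-01-05'),
        (102, 1, '2020-03-10'),
        (103, 2, '2020-02-01'),
        (104, 2, '2020-04-15');

    -- Same shape as the view in the answer, minus the single-customer filter.
    CREATE VIEW recent_order AS
    SELECT c.customer_num, b.bill_num
    FROM customer c
    JOIN billing b ON c.customer_num = b.customer_num
    WHERE b.bill_date =
        (SELECT MAX(bill_date) FROM billing WHERE customer_num = b.customer_num);
""")

# One row per customer: the bill carrying that customer's latest date.
recent = conn.execute(
    "SELECT customer_num, bill_num FROM recent_order ORDER BY customer_num"
).fetchall()
print(recent)
```

If two bills for the same customer shared the latest date, both would appear, as noted above.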

Why do repeated values appear in SQL results

I have a question about joins. For example, using the example database dvdrental, take this query:
SELECT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
Some records appear repeatedly; for example, "342 Harold Martino" appears 3 times:
342 Harold Martino
342 Harold Martino
342 Harold Martino
Do you know why records are repeated, as in this example where the same record appears 3 times? Does this repetition mean that there are 3 records in the payment table where customer_id = 342? But the query "select * from payment where customer_id = 342" returns 32 records, so I don't properly understand how the join works.
There are many resources on this, so to be brief, your expression says this in plain English:
Get all the records from the customer table.
Then, for each of those records, get every payment record that has the same value in the customer_id field.
Return a single row for each payment record, with all the fields from the matching customer record duplicated onto it.
Finally, only return 3 columns:
the customer_id column from the customer table
the first_name column that is in one of the customer or payment table
the last_name column that is in one of the customer or payment table
Note that we didn't bring back any columns from the payment table... (I assume first_name and last_name are fields in the customer table...)
For contrast, a CROSS JOIN has no join condition: for every row on the left it returns a combination of that row with every row on the right. So the number of rows returned by a CROSS JOIN is the number of rows in the current table multiplied by the number of rows in the joined table.
In your query, the INNER JOIN produces a recordset that includes all the fields from the current table structure as well as the fields from the joined table.
The ON condition specifies that only matching combinations should be returned, so in this case only Payment records whose customer_id matches a row in the (currently unfiltered) customer table are returned.
Because customer_id is unique in the customer table, the number of resulting rows is the number of rows in the joined table that have a match in the current table.
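The row counts described above can be seen on a tiny dataset. This is only a sketch in SQLite via Python's sqlite3 module, not the real dvdrental database: the tables are cut down to a few columns, and every row except the name "Harold Martino" and ID 342 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature of the customer and payment tables.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (341, 'Mary', 'Smith'), (342, 'Harold', 'Martino');
    INSERT INTO payment VALUES (1, 342, 5.99), (2, 342, 2.99), (3, 342, 0.99), (4, 341, 4.99);
""")

# Inner join: one output row per payment that has a matching customer.
inner_rows = conn.execute("""
    SELECT customer.customer_id, first_name, last_name
    FROM customer
    INNER JOIN payment ON customer.customer_id = payment.customer_id
    ORDER BY customer.customer_id
""").fetchall()

# Cross join: every customer paired with every payment (2 * 4 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customer CROSS JOIN payment"
).fetchone()[0]

print(inner_rows)
print(cross_count)
```

Customer 342 has three payments here, so the inner join lists that customer three times even though no payment column is selected.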
If instead you want to query:
Select all the customers that have payments
then you can use a join, but you should also use a DISTINCT clause, to only return the unique records:
SELECT DISTINCT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
An alternative way to do this is to use a sub-query instead of a join:
SELECT customer_id
, first_name
, last_name
FROM customer
WHERE EXISTS (SELECT customer_id FROM payment WHERE payment.customer_id = customer.customer_id)
The rules on when to use one style of query over the other are pretty convoluted and very dependent on the number of rows in each table, the types of indexes that are available, and even the type or version of the RDBMS you are operating within.
To optimise, run both queries, compare the results and timing and use the one that fits your database better. If later performance becomes an issue, try the other one :)
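As a quick way to compare the two styles, the sketch below runs the DISTINCT join and the EXISTS sub-query against the same small dataset and checks that they agree. It uses SQLite via Python's sqlite3 module; the rows are hypothetical, including one customer with no payments at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: customer 343 has no payments.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES
        (341, 'Mary', 'Smith'), (342, 'Harold', 'Martino'), (343, 'No', 'Payments');
    INSERT INTO payment VALUES (1, 342), (2, 342), (3, 342), (4, 341);
""")

# Style 1: join, then collapse the duplicated customer rows with DISTINCT.
with_distinct = conn.execute("""
    SELECT DISTINCT customer.customer_id, first_name, last_name
    FROM customer
    INNER JOIN payment ON customer.customer_id = payment.customer_id
    ORDER BY customer.customer_id
""").fetchall()

# Style 2: keep each customer at most once and only test for a payment.
with_exists = conn.execute("""
    SELECT customer_id, first_name, last_name
    FROM customer
    WHERE EXISTS (SELECT customer_id FROM payment
                  WHERE payment.customer_id = customer.customer_id)
    ORDER BY customer_id
""").fetchall()

print(with_distinct)
print(with_exists)
```

Both queries return each paying customer once; which one is faster depends on the database, as described above.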

How to SQL distinct on only one column

I'm trying to apply DISTINCT on only one column.
The question is:
Who is ordering equipment where the description begins with "tennis" or "volleyball".
Include:
Customer number,
Stock number, and
Description
Do not repeat any rows.
This is what the tables look like: Items, Stock, Orders
This is my code:
select distinct
orders.customer_num, stock.stock_num, stock.description
from
orders
join
items on items.order_num = orders.order_num
join
stock on stock.stock_num = items.stock_num
where
stock.description like 'tennis%'
or stock.description like 'volleyball%';
The result is:
But I'm trying to get no repeating numbers on the CUSTOMER_NUM column.
Thank you..
It is possible that your join condition is wrong. Please try joining the items table to the customer table with the condition items.customer_num = customer.customer_num. I am not sure whether it will work, as we don't have the actual data for these tables.
I'm not sure you'll see this, but I don't believe the rows are repeating. Look at the description: each customer number is paired with a different item description in every row, so no full row repeats for that customer number. You can see this by adding 'order by orders.customer_num, stock.stock_num;' to the end.
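The point that DISTINCT applies to the whole selected row, not to a single column, can be illustrated with a small sketch. It runs on SQLite via Python's sqlite3 module, and all rows are hypothetical because the original table screenshots are not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: customer 101 orders stock item 5 twice and item 6 once.
    CREATE TABLE orders (order_num INTEGER, customer_num INTEGER);
    CREATE TABLE items (order_num INTEGER, stock_num INTEGER);
    CREATE TABLE stock (stock_num INTEGER, description TEXT);
    INSERT INTO orders VALUES (1001, 101), (1002, 101), (1003, 104);
    INSERT INTO items VALUES (1001, 5), (1002, 5), (1002, 6), (1003, 6), (1003, 7);
    INSERT INTO stock VALUES (5, 'tennis racquet'), (6, 'volleyball net'), (7, 'baseball bat');
""")

rows = conn.execute("""
    SELECT DISTINCT orders.customer_num, stock.stock_num, stock.description
    FROM orders
    JOIN items ON items.order_num = orders.order_num
    JOIN stock ON stock.stock_num = items.stock_num
    WHERE stock.description LIKE 'tennis%'
       OR stock.description LIKE 'volleyball%'
    ORDER BY orders.customer_num, stock.stock_num
""").fetchall()

for row in rows:
    print(row)
```

Customer 101 still appears on two rows because the two rows differ in stock number and description; only the duplicate (101, 5, 'tennis racquet') combination is collapsed.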

How can I apply the LIMIT statement in a SQLite query to a specific side of a join?

Here's my requirement:
I have 2 tables, orders and orderContents. For each row in the orders table, there are a certain number of rows in orderContents that describe the order. The id column serves as the foreign key.
What I want is to get all the details for each order (the details from orderContents, plus the id column from the orders table), but to limit the number of results based on the common column (the foreign key, id).
The problem is that LIMIT restricts the orderContents rows instead of the order rows.
How can I achieve desired effect?
EDIT: Updating tables and desired result set
Orders table:
OrderContents table:
Desired result on limiting number of records to 2:
I'm assuming you are trying to say that you want the results from both tables but only for the first X orders. If so, try this:
SELECT OC.*, O.* FROM OrderContents OC
INNER JOIN (SELECT *
FROM Orders
ORDER BY ID
LIMIT 2) O ON O.ID=OC.ID
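Here is a minimal sketch of the sub-query approach, run in SQLite through Python's sqlite3 module. The Customer and Item columns and all rows are hypothetical, since the original table screenshots are missing; only the ID join column comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: three orders with one or two content rows each.
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, Customer TEXT);
    CREATE TABLE OrderContents (ID INTEGER, Item TEXT);
    INSERT INTO Orders VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO OrderContents VALUES
        (1, 'pen'), (1, 'ink'), (2, 'paper'), (3, 'stapler'), (3, 'tape');
""")

# LIMIT sits inside the sub-query, so it caps the number of orders,
# not the number of joined content rows.
rows = conn.execute("""
    SELECT OC.ID, OC.Item, O.Customer
    FROM OrderContents OC
    INNER JOIN (SELECT * FROM Orders ORDER BY ID LIMIT 2) O ON O.ID = OC.ID
    ORDER BY OC.ID, OC.Item
""").fetchall()

for row in rows:
    print(row)
```

All content rows for orders 1 and 2 are returned, while order 3 is excluded entirely.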