Same Join on multiple fields in Hive


I have a metadata table CUST in Hive:
Cust_dec   Cust_det
buy        Interested
buy        Cheap
no_buy     Found Cheaper
no_buy     No Interest
no_buy     Other Faults
There's another table, Reg_cst_dtls, that needs to be joined with the metadata table above multiple times to derive multiple fields, as below:
item_id ca_brd_dec ne_brd_dec co_brd_dec ca_dtl ne_dtl co_dtl
1012 buy no_buy no_buy Interested Found Cheaper Other Faults
5278 buy buy Found Cheaper
1572 no_buy buy buy No Interest Cheap Cheap
6896 no_buy no_buy no_buy Other Faults Cheap Found Cheaper
Now, for each item_id in Reg_cst_dtls, I need to check whether ca_brd_dec matches Cust_dec and ca_dtl matches the corresponding Cust_det; if so, a new field ca_resp should equal ca_dtl, else null.
Similarly, if ne_brd_dec matches Cust_dec and ne_dtl matches the corresponding Cust_det, ne_resp should equal ne_dtl, else null.
And if co_brd_dec matches Cust_dec and co_dtl matches the corresponding Cust_det, co_resp should equal co_dtl, else null. The expected results are below:
item_id ca_brd_dec ne_brd_dec co_brd_dec ca_dtl ne_dtl co_dtl ca_resp ne_resp co_resp
1012 buy no_buy no_buy Interested Found Cheaper Other Faults Interested Found Cheaper Other Faults
5278 buy buy Found Cheaper
1572 no_buy buy buy No Interest Cheap Cheap No Interest
6896 no_buy no_buy no_buy Other Faults Cheap Found Cheaper Other Faults cheap Found Cheaper
Can anyone help with how this can be achieved in Hive?
Thanks...!

You can use the Hive query below. If your data should be matched case-insensitively, wrap the columns in upper() or lower(). A Hive UDF approach is also viable here.
SELECT reg.item_id,
reg.ca_brd_dec,
reg.ne_brd_dec,
reg.co_brd_dec,
reg.ca_dtl,
reg.ne_dtl,
reg.co_dtl,
-- a non-NULL lookup means the (decision, detail) pair exists in CUST
CASE
WHEN ca_j.cust_lookup IS NULL THEN NULL
ELSE reg.ca_dtl
END AS ca_resp,
CASE
WHEN ne_j.cust_lookup IS NULL THEN NULL
ELSE reg.ne_dtl
END AS ne_resp,
CASE
WHEN co_j.cust_lookup IS NULL THEN NULL
ELSE reg.co_dtl
END AS co_resp
FROM reg_cst_dtls reg
-- join on the concat expression itself: a SELECT-list alias cannot be used in an ON clause
-- the '|' separator keeps pairs such as ('ab', 'c') and ('a', 'bc') from colliding
-- DISTINCT stops duplicate CUST rows from multiplying the output
LEFT OUTER JOIN
(SELECT DISTINCT concat(cust_dec, '|', cust_det) AS cust_lookup
FROM cust) ca_j ON concat(reg.ca_brd_dec, '|', reg.ca_dtl) = ca_j.cust_lookup
LEFT OUTER JOIN
(SELECT DISTINCT concat(cust_dec, '|', cust_det) AS cust_lookup
FROM cust) ne_j ON concat(reg.ne_brd_dec, '|', reg.ne_dtl) = ne_j.cust_lookup
LEFT OUTER JOIN
(SELECT DISTINCT concat(cust_dec, '|', cust_det) AS cust_lookup
FROM cust) co_j ON concat(reg.co_brd_dec, '|', reg.co_dtl) = co_j.cust_lookup;
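To check the lookup logic outside Hive, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite, not Hive, runs it). Instead of concatenating, it joins CUST once per column pair on both columns directly, which expresses the same "pair must exist" test. Only rows 1012, 1572 and 6896 are loaded, because their column layout is unambiguous in the question.

```python
import sqlite3


def build_responses(conn):
    """Return (item_id, ca_resp, ne_resp, co_resp) per item: each *_resp is the
    detail when its (decision, detail) pair exists in cust, else NULL/None."""
    sql = """
        SELECT reg.item_id,
               CASE WHEN ca_j.cust_dec IS NULL THEN NULL ELSE reg.ca_dtl END,
               CASE WHEN ne_j.cust_dec IS NULL THEN NULL ELSE reg.ne_dtl END,
               CASE WHEN co_j.cust_dec IS NULL THEN NULL ELSE reg.co_dtl END
        FROM reg_cst_dtls reg
        LEFT JOIN cust ca_j
               ON reg.ca_brd_dec = ca_j.cust_dec AND reg.ca_dtl = ca_j.cust_det
        LEFT JOIN cust ne_j
               ON reg.ne_brd_dec = ne_j.cust_dec AND reg.ne_dtl = ne_j.cust_det
        LEFT JOIN cust co_j
               ON reg.co_brd_dec = co_j.cust_dec AND reg.co_dtl = co_j.cust_det
        ORDER BY reg.item_id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (cust_dec TEXT, cust_det TEXT)")
conn.executemany("INSERT INTO cust VALUES (?, ?)", [
    ("buy", "Interested"), ("buy", "Cheap"), ("no_buy", "Found Cheaper"),
    ("no_buy", "No Interest"), ("no_buy", "Other Faults"),
])
conn.execute("""CREATE TABLE reg_cst_dtls (item_id INTEGER, ca_brd_dec TEXT,
    ne_brd_dec TEXT, co_brd_dec TEXT, ca_dtl TEXT, ne_dtl TEXT, co_dtl TEXT)""")
conn.executemany("INSERT INTO reg_cst_dtls VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1012, "buy", "no_buy", "no_buy", "Interested", "Found Cheaper", "Other Faults"),
    (1572, "no_buy", "buy", "buy", "No Interest", "Cheap", "Cheap"),
    (6896, "no_buy", "no_buy", "no_buy", "Other Faults", "Cheap", "Found Cheaper"),
])

for row in build_responses(conn):
    print(row)
```

Applied literally, the stated rule keeps "Cheap" for both ne_resp and co_resp of item 1572 (buy/Cheap is a CUST pair) and drops ne_resp for item 6896 (no_buy/Cheap is not), which differs from the expected table in the question, so it is worth confirming which behaviour is intended.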

Related

How to fix a query that produces too many rows?

I'm designing a Firebird 3.0 database for service sales, e.g. for beauty salons.
The database has these tables:
Serv - list of services;
ServRecs - service sales records;
Docs - service documents;
Calc - service calculations, i.e. which raw materials are used in a specific service, the quantity of each raw material, etc.;
RecsOut - raw material output records (sales);
RecsIn - raw material input records;
Inventory - names and properties of raw materials and goods.
Serv: Id, name, qnt, Vat...
ServRecs: Id, serv_id, Doc_id, qnt...
Docs: doc_id, docN, DocDT, Summ, ...
Calc: Id, serv_id, RawMat_id, qnt, unit_id...
RecsOut: id, doc_id, good_id, RecsIn_id
RecsIn: id, good_id...
Inventory: id, name (name of the raw material or good)...
Let me explain with an example:
There is service document 323. It uses two services: serv_id=7 (hair cutting) and serv_id=8 (hair washing). As the qnt field of ServRecs shows, service 8 is used twice (i.e. two washes, before and after coloring) and service 7 once. According to Calc, service 7 uses 15 ml of raw material 11446 and 15 ml of raw material 11448, and service 8 uses 10 ml of raw material 11450. So in total: raw material 11446 - 15 ml, 11448 - 15 ml and 11450 - 20 ml (2 × 10 ml).
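The totals above are sr.qnt × c.qnt summed per raw material. A minimal sketch of that arithmetic, using Python's sqlite3 as a stand-in for Firebird and only the ServRecs and Calc columns listed in the question (the ml unit is implied by the data, not stored):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE servrecs (id INTEGER, serv_id INTEGER, doc_id INTEGER, qnt INTEGER);
    CREATE TABLE calc (id INTEGER, serv_id INTEGER, rawmat_id INTEGER, qnt INTEGER);
    -- document 323: one hair cut (service 7) and two washes (service 8)
    INSERT INTO servrecs VALUES (1, 7, 323, 1), (2, 8, 323, 2);
    -- raw material used by one execution of each service, in ml
    INSERT INTO calc VALUES (1, 7, 11446, 15), (2, 7, 11448, 15), (3, 8, 11450, 10);
""")

# Multiply service count by per-service usage, then add up per raw material.
totals = dict(conn.execute("""
    SELECT c.rawmat_id, SUM(sr.qnt * c.qnt)
    FROM servrecs sr
    JOIN calc c ON c.serv_id = sr.serv_id
    WHERE sr.doc_id = 323
    GROUP BY c.rawmat_id
""").fetchall())
print(totals)
```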
My query looks like this:
select
i.id,
i.name as UsedRawMaterialName,
s.name as ServiceName,
ro.doc_id as ServiceDoc_id,
ri.cost as CostofRawMaterial,
sr.qnt as ServiceQnt, --used service quantity, for example, 2 times washing
sr.qnt*c.qnt as UsedRawMaterialQnt, --used service quantity*rawmaterial's used for 1 service
i.unit_k
from Inventory I, RecsOut ro, RecsIn ri, calc c, servrecs sr, serv s, Docs d, unit u
where
d.doc_id= ro.doc_id and d.doc_id=sr.doc_id and d.doc_id=323 and
s.id=c.serv_id and sr.serv_id=c.serv_id and
c.rawmat_id=i.id and
ro.recsIn_id=ri.id and
i.unit_k=u.unit_k
My aim is to get a result like this:
However, the query returns redundant records and wrong values, like this:
What is wrong in my query?
Update 1:
I replaced the old-style join syntax with the new-style (explicit JOIN) syntax and easily found that the error was in the "Join RecsOut ro on ro.id=i.id" clause. The new-style join syntax is really much more visually informative than the old style.
select
i.id,
i.name as UsedRawMaterialName,
s.name as ServiceName,
ro.doc_id as ServiceDoc_id,
ri.cost as CostofRawMaterial,
sr.qnt as ServiceQnt, --used service quantity, for example, 2 times washing
sr.qnt*c.qnt as UsedRawMaterialQnt, --used service quantity*rawmaterial's used for 1 service
i.unit_k
from
Inventory I Join RecsOut ro on ro.id=i.id -- faulty clause identified above; ro.good_id = i.id is presumably the intended link
Join RecsIn ri on ro.recsin_id=ri.id
Join calc c on c.rawmat_id=i.id
join ServRecs sr on sr.serv_id=c.serv_id
Join serv s on s.id=c.serv_id
Join Docs d on d.doc_id=ro.doc_id and
d.doc_id=sr.doc_id and
d.doc_id=323
join unit u on i.unit_k=u.unit_k

@basti A major benefit of the "new-style join" is that you can bring in one table at a time during development and testing. With each table joined, it is very straightforward to see which relationship has generated more (or indeed fewer) records than you expect.
Translating your code to explicit joins shows me there could be a break somewhere. Thanks for replying to my comment.
from Inventory I
join RecsOut ro on ro.recsIn_id=ri.id
-- ??? join RecsIn ri, --- ??
join calc c on c.rawmat_id=i.id
join servrecs sr on sr.serv_id=c.serv_id
join serv s on s.id=c.serv_id
join Docs d on d.doc_id= ro.doc_id
and d.doc_id=sr.doc_id
and d.doc_id=323
join unit u on i.unit_k=u.unit_k
Don't forget to embrace inner, left and outer joins.
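The "one table at a time" check can be sketched as a row count after each added join. This uses Python's sqlite3 as a stand-in for Firebird, reduces the schema to ServRecs, Calc and RecsOut, and assumes hypothetical RecsOut rows (one per raw material issued for document 323) linked by good_id. Joining RecsOut on doc_id alone pairs every RecsOut row with every calculation row of the document, which is the kind of fan-out described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE servrecs (id INTEGER, serv_id INTEGER, doc_id INTEGER, qnt INTEGER);
    CREATE TABLE calc (id INTEGER, serv_id INTEGER, rawmat_id INTEGER, qnt INTEGER);
    CREATE TABLE recsout (id INTEGER, doc_id INTEGER, good_id INTEGER, recsin_id INTEGER);
    INSERT INTO servrecs VALUES (1, 7, 323, 1), (2, 8, 323, 2);
    INSERT INTO calc VALUES (1, 7, 11446, 15), (2, 7, 11448, 15), (3, 8, 11450, 10);
    -- hypothetical output records for document 323
    INSERT INTO recsout VALUES (1, 323, 11446, 1), (2, 323, 11448, 2), (3, 323, 11450, 3);
""")

# Each step adds one join (or one join condition); a jump in the count
# points at the relationship that multiplies rows.
steps = {
    "servrecs + calc": """
        SELECT COUNT(*) FROM servrecs sr
        JOIN calc c ON c.serv_id = sr.serv_id
        WHERE sr.doc_id = 323""",
    "+ recsout on doc_id only": """
        SELECT COUNT(*) FROM servrecs sr
        JOIN calc c ON c.serv_id = sr.serv_id
        JOIN recsout ro ON ro.doc_id = sr.doc_id
        WHERE sr.doc_id = 323""",
    "+ recsout on doc_id and good_id": """
        SELECT COUNT(*) FROM servrecs sr
        JOIN calc c ON c.serv_id = sr.serv_id
        JOIN recsout ro ON ro.doc_id = sr.doc_id AND ro.good_id = c.rawmat_id
        WHERE sr.doc_id = 323""",
}
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in steps.items()}
for name, n in counts.items():
    print(f"{name}: {n} rows")
```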

AdventureWorks - Selling Price Problem - Queries Microsoft SQL Server

I'm looking for someone who has downloaded the AdventureWorks data and run queries against it.
I'd like someone to explain the difference between list price and unit price.
I filtered to ProductID 749, and 83% of the time it is sold to the customer with ListPrice = UnitPrice.
I did some digging with the queries below to see whether there were any discounts etc., but they did not turn up an answer. Is there something I am missing?
select *
from sales.specialoffer
where SpecialOfferID = 1;
select SOH.customerID,
SOH.orderdate,
pp.listprice,
sod.unitprice,
sod.ProductID,
sod.SpecialOfferID,
SOD.UnitPriceDiscount,
sr.SalesReasonID,sr.name,
sr.ReasonType
from sales.SalesOrderHeader SOH
inner join sales.SalesOrderDetail SOD
on soh.SalesOrderID = sod.SalesOrderID
inner join production.Product PP
on SOD.ProductID= PP.ProductID
left join sales.SalesOrderHeaderSalesReason SOHSR
on soh.SalesOrderID = sohsr.SalesOrderID
left join sales.SalesReason SR
on SOHSR.SalesReasonID = SR.SalesReasonID
where standardcost >0
and PP.listprice != sod.unitprice
and pp.productid = 749
;

This is really an accounting question. List price, without any further attributes, is generally the "current" price as of now. This value will typically change over time. When you sell (or buy) something, you capture the price of each item sold (as well as other information) with the details of each sale - which is the price you find in the SOD table. Why? For very important accounting reasons.
So no - you aren't missing anything. BTW - did you notice a table called ProductListPriceHistory? So again - the difference you see is a current (or "now") fact versus an historical fact.
Lastly, don't expect a sample database to be completely consistent with respect to all the information it contains. The sample database was built to demonstrate various features of SQL Server and to serve as a learning platform. FWIW this database is quite dated. MS has developed WideWorldImporters as a replacement.
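The "current fact versus historical fact" point can be illustrated with a small sketch: the order line stores the price in effect on the order date, so it matches the list-price history row covering that date, not necessarily today's list price. It uses Python's sqlite3, and the table names, column names and prices are hypothetical simplifications, not the actual AdventureWorks schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER, list_price REAL);
    CREATE TABLE list_price_history (product_id INTEGER, start_date TEXT,
                                     end_date TEXT, list_price REAL);
    CREATE TABLE order_detail (order_id INTEGER, order_date TEXT,
                               product_id INTEGER, unit_price REAL);
    -- hypothetical prices: 100 during 2012, 120 from 2013 onward
    INSERT INTO product VALUES (749, 120.0);
    INSERT INTO list_price_history VALUES
        (749, '2012-01-01', '2012-12-31', 100.0),
        (749, '2013-01-01', NULL, 120.0);
    INSERT INTO order_detail VALUES
        (1, '2012-06-15', 749, 100.0),
        (2, '2013-03-02', 749, 120.0);
""")

# Compare each sale with today's list price and with the price valid on the
# order date (ISO date strings compare correctly as text).
rows = conn.execute("""
    SELECT od.order_id, p.list_price AS current_list,
           h.list_price AS list_at_sale, od.unit_price
    FROM order_detail od
    JOIN product p ON p.product_id = od.product_id
    JOIN list_price_history h
      ON h.product_id = od.product_id
     AND od.order_date >= h.start_date
     AND (h.end_date IS NULL OR od.order_date <= h.end_date)
    ORDER BY od.order_id
""").fetchall()
for row in rows:
    print(row)
```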

Oracle, SQL Conditional exclusion based on the items in the joined table

The following query excludes all FUEL products. However, I am trying to exclude a product only if R.OPERATING_UNIT = 'WP' and PRODUCT_CAT = 'FUEL' in the joined table. I don't know how to express that condition, and I'd like to know the most efficient way to do it. Below are the query, the RESOURCE and PRODUCT tables, and the desired result set. I simplified both tables and the query for the sake of explanation.
SELECT R.DEPTID,
R.FISCAL_YEAR,
sum(R.AMOUNT) total
FROM RESOURCE R
WHERE
R.PRODUCT_ID NOT IN (
SELECT PRODUCT_ID FROM PRODUCT WHERE PRODUCT_CAT='FUEL' )
group by R.FISCAL_YEAR,R.DEPTID
The RESOURCE table:
DEPTID  FISCAL_YEAR  OPERATING_UNIT  AMOUNT  PRODUCT
PTT     2017         WP              1200    31000
PTT     2017         SP              3000    32000
PTT     2017         GP              1000    31000
PTT     2017         WP              1000    32000
FPP     2017         WP              1000    32000
FPP     2018         GP              2000    33000
FPP     2017         SP              1000    32000
FPP     2018         WP              2200    31000
The PRODUCT table:
PRODUCT  PRODUCT_CAT
31000    FUEL
32000    NON-FUEL
33000    MATERIAL
Desired result set. Note that WP rows are ignored in the sum only when the product is FUEL:
FISCAL_YEAR  DEPTID  TOTAL
2017         PTT     5000  (ignored 1200 since the operating unit is WP and product 31000 is FUEL, but included the WP row for 32000)
2017         FPP     2000
2018         FPP     2000  (did not count the 2200 since the operating unit is WP and product 31000 is FUEL)

The WP filter works once you change the statement below
NOT IN (
SELECT PRODUCT_ID FROM PRODUCT WHERE PRODUCT_CAT='FUEL' )
into an IN, combine it with the operating-unit filter, and negate the combined condition:
SELECT R.DEPTID,
R.FISCAL_YEAR,
sum(R.AMOUNT) total
FROM RESOURCE R
WHERE
-- exclude a row only when it is BOTH operating unit WP AND a FUEL product
NOT (R.OPERATING_UNIT = 'WP' AND
R.PRODUCT_ID IN
(
SELECT PRODUCT_ID FROM PRODUCT WHERE PRODUCT_CAT='FUEL' ))
group by R.FISCAL_YEAR,R.DEPTID

Adding not-equal conditions joined by OR filters out exactly the rows where both OPERATING_UNIT = 'WP' and PRODUCT_CAT = 'FUEL':
SELECT r.DEPTID
,r.FISCAL_YEAR
,SUM(r.AMOUNT) AS TOTAL
FROM RESOURCE r
INNER JOIN PRODUCT p ON r.PRODUCT = p.PRODUCT
WHERE r.OPERATING_UNIT != 'WP'
OR p.PRODUCT_CAT != 'FUEL'
GROUP BY r.DEPTID
,r.FISCAL_YEAR
I used the following query to view the joined data and verify that the filter keeps the 6 of 8 rows I wanted.
SELECT *
FROM RESOURCE r
INNER JOIN PRODUCT p ON r.PRODUCT = p.PRODUCT
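The OR filter can be checked against the desired result set with a quick sketch. It uses Python's sqlite3 as a stand-in for Oracle (an assumption; `<>` is used, which both dialects accept) and loads the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (deptid TEXT, fiscal_year INTEGER, operating_unit TEXT,
                           amount INTEGER, product INTEGER);
    CREATE TABLE product (product INTEGER, product_cat TEXT);
    INSERT INTO resource VALUES
        ('PTT', 2017, 'WP', 1200, 31000), ('PTT', 2017, 'SP', 3000, 32000),
        ('PTT', 2017, 'GP', 1000, 31000), ('PTT', 2017, 'WP', 1000, 32000),
        ('FPP', 2017, 'WP', 1000, 32000), ('FPP', 2018, 'GP', 2000, 33000),
        ('FPP', 2017, 'SP', 1000, 32000), ('FPP', 2018, 'WP', 2200, 31000);
    INSERT INTO product VALUES (31000, 'FUEL'), (32000, 'NON-FUEL'), (33000, 'MATERIAL');
""")

# A row survives unless it is BOTH operating unit WP AND a FUEL product.
rows = conn.execute("""
    SELECT r.fiscal_year, r.deptid, SUM(r.amount) AS total
    FROM resource r
    JOIN product p ON r.product = p.product
    WHERE r.operating_unit <> 'WP' OR p.product_cat <> 'FUEL'
    GROUP BY r.fiscal_year, r.deptid
    ORDER BY r.fiscal_year, r.deptid
""").fetchall()
print(rows)
```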

For ease of writing - and reading - the exclusion condition, it would be nice if we could work with tuples. And we can. One benefit is that it will be easy, in the future, to add other pairs of operating unit and product category to the exclusion list, without having to write lengthy conditions with lots of OR and AND.
If you run a query like this, and then you take a look at the EXPLAIN PLAN for the query, you will see that the parser expanded the tuple condition to a long logical expression with OR (and AND, if more than one tuple is excluded) - so the end result is the same, but the code looks more natural.
select r.deptid, r.fiscal_year, sum(r.amount) as total
from resource r inner join product p on r.product = p.product
where (r.operating_unit, p.product_cat) not in ( ('WP', 'FUEL') )
group by r.deptid, r.fiscal_year
;
Regarding NULL: if either r.operating_unit or p.product_cat can be NULL, you need to decide how such rows should be handled. For example, if the operating unit is WP but the product category is NULL, the row is excluded by the query above. That may be the proper handling: the unit is definitely 'WP', and since the product category is unknown (it may be 'FUEL'; we just don't know for sure), we may choose to exclude the row. Obviously, if both columns are NOT NULL, this is not an issue.
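The NULL behaviour follows from three-valued logic once the tuple test is expanded to `operating_unit <> 'WP' OR product_cat <> 'FUEL'`. A small sketch with Python's sqlite3 (a stand-in for Oracle; the table `t` and the 'UNKNOWN' placeholder are hypothetical) shows the row being dropped, and one way to keep it with COALESCE if that is the preferred handling:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (operating_unit TEXT, product_cat TEXT);
    INSERT INTO t VALUES ('WP', 'FUEL'), ('WP', NULL), ('SP', NULL), ('WP', 'NON-FUEL');
""")

# Expanded form of (operating_unit, product_cat) NOT IN (('WP', 'FUEL')).
# ('WP', NULL) evaluates to FALSE OR NULL = NULL, so the row is dropped.
strict = conn.execute("""
    SELECT operating_unit, product_cat FROM t
    WHERE operating_unit <> 'WP' OR product_cat <> 'FUEL'
    ORDER BY rowid
""").fetchall()

# To keep rows whose category is unknown, map NULL to a placeholder first.
lenient = conn.execute("""
    SELECT operating_unit, product_cat FROM t
    WHERE operating_unit <> 'WP' OR COALESCE(product_cat, 'UNKNOWN') <> 'FUEL'
    ORDER BY rowid
""").fetchall()

print(strict)
print(lenient)
```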
Note - I hope you don't really have a table PRODUCT with a column PRODUCT; that will lead to confusion which almost always then leads to bugs.

Selecting only rows that have matching column values

So I have the task of returning a company's information if and only if ALL of its products have been discontinued.
I have a Suppliers and a Products table. The Suppliers table has a ProductID column and the Products table has a ProductID and a Discontinued column that stores a bit (1 being true or 0 being false).
If anyone has a solution to this, that would be a life saver.
EDIT: the query I'm working with would be something like this
select
s.CompanyName, p.ProductName, p.Discontinued
from
Suppliers s
join
Products p on s.SupplierID = p.SupplierID
and the output would be something like this
CompanyName ProductName Discontinued
-------------------------------------------------------------
Exotic Liquids Chai 0
Exotic Liquids Chang 0
Exotic Liquids Aniseed Syrup 0
New Orleans Cajun Delights Chef Anton's Cajun Seasoning 0
New Orleans Cajun Delights Chef Anton's Gumbo Mix 1
Grandma Kelly's Homestead Grandma's Boysenberry Spread 0
Grandma Kelly's Homestead Uncle Bob's Organic Dried Pears 0
Grandma Kelly's Homestead Northwoods Cranberry Sauce 0
Tokyo Traders Mishi Kobe Niku 1
Tokyo Traders Ikura 0
but I only want it to return the suppliers with all discontinued products

I'm going out on a limb here, but I think you're trying to JOIN two tables on the ProductID.
If that's the case, then your SQL query would look something like this:
SELECT a.ProductID, b.Discontinued FROM Suppliers a
LEFT JOIN Products b ON (a.productID = b.productID)
WHERE b.Discontinued = 1 -- bit column: SQL Server has no TRUE literal, so compare with 1

select Suppliers.SupplierID, CompanyName from Suppliers
inner join Products on
Products.SupplierID = Suppliers.SupplierID
where Discontinued = 1
and Products.SupplierID not in
(select SupplierID from Products where Discontinued = 0)
group by Suppliers.SupplierID, CompanyName

Become a better developer in every language you use. Stop trying to do everything at once. Break the problem into digestible pieces. And learn how to post good questions. You have a query question: how do you expect others to understand your issue and supply useful suggestions without knowing your schema and how you use it? Sure, it's fairly simple in this case, but a script that creates your tables, populates them with sample data, and includes whatever you have tried encourages others to respond.
So the first step is to answer the question "which suppliers provide only discontinued products?". Apparently you have all the information you need in Products. Something like:
select P.SupplierID from dbo.Products as P
group by P.SupplierID
having min(cast(P.Discontinued as tinyint)) = 1
order by P.SupplierID;
I think that is correct, but it is still a guess. Sometimes a bit column gets "used" in opposition to its name. What does the query do? For each group generated by the GROUP BY clause (supplier ID), it determines the minimum value of Discontinued (the cast is necessary since you cannot use MIN/MAX with bit). For a supplier whose products are all discontinued, that column is 1 in every associated row, so the minimum is 1; a single product with Discontinued = 0 pulls the minimum down to 0 (since 0 < 1). Grouping also gives us a distinct set of rows.
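A minimal sketch of the MIN trick, using Python's sqlite3 as a stand-in for SQL Server. SQLite has no bit type, so Discontinued is stored as INTEGER and no cast is needed. The supplier IDs and supplier 99 are hypothetical; none of the sample suppliers in the question has only discontinued products, so one is added to make the filter visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, SupplierID INTEGER, Discontinued INTEGER)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    ("Chai", 1, 0), ("Chang", 1, 0),
    ("Chef Anton's Cajun Seasoning", 2, 0), ("Chef Anton's Gumbo Mix", 2, 1),
    ("Mishi Kobe Niku", 4, 1), ("Ikura", 4, 0),
    # hypothetical supplier 99: every product discontinued
    ("Old Product A", 99, 1), ("Old Product B", 99, 1),
])

# MIN(Discontinued) is 1 only if no product of the supplier has Discontinued = 0.
all_discontinued = [row[0] for row in conn.execute("""
    SELECT SupplierID FROM Products
    GROUP BY SupplierID
    HAVING MIN(Discontinued) = 1
    ORDER BY SupplierID
""")]
print(all_discontinued)
```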
Now that you know how to generate a distinct set of SupplierID values, you should be able to apply that to any other query to get information about those suppliers. You can join, use IN, use exists - try all three if you want to really make this a learning experience.
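Here is a sketch of the three options side by side, again with Python's sqlite3 standing in for SQL Server and hypothetical suppliers 98 (no products) and 99 (only discontinued products). The EXISTS variant needs both an EXISTS and a NOT EXISTS, otherwise a supplier with no products at all would also qualify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT)")
conn.execute("CREATE TABLE Products (ProductName TEXT, SupplierID INTEGER, Discontinued INTEGER)")
conn.executemany("INSERT INTO Suppliers VALUES (?, ?)", [
    (2, "New Orleans Cajun Delights"), (4, "Tokyo Traders"),
    (98, "Supplier Without Products"),  # hypothetical
    (99, "Hypothetical Supplier"),      # hypothetical: only discontinued products
])
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    ("Chef Anton's Cajun Seasoning", 2, 0), ("Chef Anton's Gumbo Mix", 2, 1),
    ("Mishi Kobe Niku", 4, 1), ("Ikura", 4, 0), ("Old Product A", 99, 1),
])

# The grouped query, reused as a set of supplier IDs.
all_discontinued = """SELECT SupplierID FROM Products
                      GROUP BY SupplierID HAVING MIN(Discontinued) = 1"""
queries = {
    "join": f"""SELECT s.CompanyName FROM Suppliers s
                JOIN ({all_discontinued}) d ON d.SupplierID = s.SupplierID""",
    "in": f"""SELECT s.CompanyName FROM Suppliers s
              WHERE s.SupplierID IN ({all_discontinued})""",
    # EXISTS requires at least one product; NOT EXISTS rules out any active one.
    "exists": """SELECT s.CompanyName FROM Suppliers s
                 WHERE EXISTS (SELECT 1 FROM Products p
                               WHERE p.SupplierID = s.SupplierID)
                   AND NOT EXISTS (SELECT 1 FROM Products p
                                   WHERE p.SupplierID = s.SupplierID
                                     AND p.Discontinued = 0)""",
}
results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
print(results)
```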

Need to return multiple entries from a single field in One Table

So here is the problem: I have a requirement where I need a customer type to equal two different things.
To cover the requirement, I don't need the customer type to equal Client or NonClient, but to equal both Client and NonClient. Each Customer_No can have multiple customer types.
Here is an example of what I have worked on so far. If you know a better way of optimizing this as well as solving the problem please let me know.
The output should look like this:
CustomerID CustomerType CustomerType
--------------------------------------
2345 Client NonClient
Select TB1.Customer_ID, IB1.Customer_Type, AS Non_client IB1.Customer_Type AS Client
From Client TB1, Client_ReF XB1, Client_Instr IB1, Client_XREC FB1
Where XB1.Client_NO = TB1.Client_NO
AND FB1.Client_ACCT = TB1.ACCT
AND XB1.Client_Instruct_NO = IB1.Client_Instruct_NO
AND FB1.Customer_ID= TB1. Client_NO
AND IB1.Client = 'Client'
AND IB1.Non_Client = 'NonClient'
I have omitted a few other filters that I felt were unnecessary. This also may not make sense, but I changed the names of things so as to keep myself in compliance.

First, a small syntax error:
the comma must not come before "AS Non_client"; it belongs after that alias.
Then, what you are trying to do is make one value equal two different things in the same column, which can never be true:
IB1.Customer_Type for one record can never equal "Client" and "NonClient" simultaneously.
The key here is that 1 customer can have multiple records and the records can differ in the customer_type. So to use that we need to join those records together which is easy since they share a Customer_ID:
Select TB1.Customer_ID,
IB1.Customer_Type AS Client,
IB2.Customer_Type AS Non_client
From Client TB1,
Client_ReF XB1,
Client_Instr IB1,
Client_Instr IB2,
Client_XREC FB1
Where XB1.Client_NO = TB1.Client_NO
AND FB1.Client_ACCT = TB1.ACCT
AND XB1.Client_Instruct_NO = IB1.Client_Instruct_NO
AND FB1.Customer_ID= TB1.Client_NO
AND IB1.Client = 'Client'
AND XB1.Client_Instruct_NO = IB2.Client_Instruct_NO
AND IB2.Non_Client = 'NonClient';
The above may not actually work due to me not fully understanding your data and structures but should put you on the right path. Particularly around the join of IB2 with XB1, you might have to join IB2 with all the same tables as IB1.
A better way than that, however (and I'll leave you to research it), is to use the EXISTS predicate. The difference is that the query above joins all records for the same customer together, whereas EXISTS is satisfied as soon as there is at least one matching "NonClient" record.
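As a starting point for that research, here is a minimal EXISTS sketch. It uses Python's sqlite3 and a hypothetical, simplified table `customer_types(customer_id, customer_type)` in place of the Client/Client_Instr joins, since the real structure isn't shown. It starts from the Client rows and keeps a customer only if a NonClient row exists for the same customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified table: one row per (customer, type) instruction.
conn.execute("CREATE TABLE customer_types (customer_id INTEGER, customer_type TEXT)")
conn.executemany("INSERT INTO customer_types VALUES (?, ?)", [
    (2345, "Client"), (2345, "NonClient"), (2345, "Client"),
    (1111, "Client"), (2222, "NonClient"),
])

# DISTINCT collapses repeated Client rows for the same customer.
rows = conn.execute("""
    SELECT DISTINCT a.customer_id, a.customer_type AS client, 'NonClient' AS non_client
    FROM customer_types a
    WHERE a.customer_type = 'Client'
      AND EXISTS (SELECT 1 FROM customer_types b
                  WHERE b.customer_id = a.customer_id
                    AND b.customer_type = 'NonClient')
    ORDER BY a.customer_id
""").fetchall()
print(rows)
```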
