I have a Hive table with the following fields:
Table: USER_PRODUCT
user_id, product1_id, product2_id, product3_id, ... , product10_id
Each user_id can have anywhere from 1 to 10 products (for example, some user_ids have only product1_id and product2_id populated).
I want to remove the products that are invalid according to another table containing product details:
Table: PRODUCT_DEAILS
product_id, product_status
I want to achieve this with a Hive query.
Can someone help me write it? My main concern is how to iterate over all the product IDs for each user_id. In pseudocode:
for each row in USER_PRODUCT:
    for each product ID column from 1 to 10:
        check whether the product is valid based on product_status in PRODUCT_DEAILS
        if valid --> keep it as is
        else --> remove the product by setting the column to null
If PRODUCT_DEAILS is small enough, build an array of valid product IDs, cross join it with USER_PRODUCT, and use array_contains to check whether each product is valid:
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.mapjoin.smalltable.filesize=1000000000; --adjust to small table size
set hive.auto.convert.join.noconditionaltask.size=1000000000;
with valid_product as (
select collect_set(product_id) as list
from PRODUCT_DEAILS
where product_status='valid'
)
insert overwrite table USER_PRODUCT
select p.user_id,
case when array_contains(v.list, p.product1_id) then p.product1_id end product1_id,
case when array_contains(v.list, p.product2_id) then p.product2_id end product2_id,
case when array_contains(v.list, p.product3_id) then p.product3_id end product3_id,
case when array_contains(v.list, p.product4_id) then p.product4_id end product4_id,
case when array_contains(v.list, p.product5_id) then p.product5_id end product5_id,
case when array_contains(v.list, p.product6_id) then p.product6_id end product6_id,
case when array_contains(v.list, p.product7_id) then p.product7_id end product7_id,
case when array_contains(v.list, p.product8_id) then p.product8_id end product8_id,
case when array_contains(v.list, p.product9_id) then p.product9_id end product9_id,
case when array_contains(v.list, p.product10_id) then p.product10_id end product10_id
from USER_PRODUCT p
cross join valid_product v; --cross join with single row containing array
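The ten near-identical CASE lines do not have to be typed by hand. As a rough sketch (the helper name build_select_list is made up for illustration, and it only produces the text of the select list, nothing is run against Hive), a few lines of Python could generate them:

```python
def build_select_list(n_products: int = 10) -> str:
    """Build the select list that nulls out invalid product ids.

    Hypothetical helper: it only assembles SQL text for columns named
    product1_id .. product<n>_id, matching the table in the question.
    """
    lines = ["select p.user_id"]
    for i in range(1, n_products + 1):
        col = f"product{i}_id"
        lines.append(
            f"case when array_contains(v.list, p.{col}) then p.{col} end {col}"
        )
    return ",\n".join(lines)


# Assemble the body of the statement; the CTE and insert overwrite
# from the answer above would be placed in front of it.
query = build_select_list(10) + "\nfrom USER_PRODUCT p\ncross join valid_product v"
print(query)
```

This also answers the "how do I iterate" concern: the iteration happens while generating the query text, while the query itself stays a plain set-based statement.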
If PRODUCT_DEAILS is too big to fit in an array, use left joins instead:
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.mapjoin.smalltable.filesize=1000000000; --adjust to small table size
set hive.auto.convert.join.noconditionaltask.size=1000000000;
with valid_product as (
select distinct product_id --Get distinct IDs of valid products
from PRODUCT_DEAILS
where product_status='valid'
)
insert overwrite table USER_PRODUCT
select p.user_id,
case when v1.product_id is not null then p.product1_id end product1_id,
case when v2.product_id is not null then p.product2_id end product2_id,
case when v3.product_id is not null then p.product3_id end product3_id,
case when v4.product_id is not null then p.product4_id end product4_id,
case when v5.product_id is not null then p.product5_id end product5_id,
case when v6.product_id is not null then p.product6_id end product6_id,
case when v7.product_id is not null then p.product7_id end product7_id,
case when v8.product_id is not null then p.product8_id end product8_id,
case when v9.product_id is not null then p.product9_id end product9_id,
case when v10.product_id is not null then p.product10_id end product10_id
from USER_PRODUCT p
left join valid_product v1 on p.product1_id=v1.product_id
left join valid_product v2 on p.product2_id=v2.product_id
left join valid_product v3 on p.product3_id=v3.product_id
left join valid_product v4 on p.product4_id=v4.product_id
left join valid_product v5 on p.product5_id=v5.product_id
left join valid_product v6 on p.product6_id=v6.product_id
left join valid_product v7 on p.product7_id=v7.product_id
left join valid_product v8 on p.product8_id=v8.product_id
left join valid_product v9 on p.product9_id=v9.product_id
left join valid_product v10 on p.product10_id=v10.product_id;
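To check the join logic without a cluster, here is a small sketch that runs the same pattern in SQLite from Python. It assumes only three product columns and made-up sample rows, and SQLite stands in for Hive, so the set statements and the insert overwrite are left out. The function name clean_user_products is hypothetical.

```python
import sqlite3


def clean_user_products(rows, product_status):
    """Return USER_PRODUCT rows with invalid product ids replaced by None."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table USER_PRODUCT (user_id, product1_id, product2_id, product3_id)"
    )
    con.execute("create table PRODUCT_DEAILS (product_id, product_status)")
    con.executemany("insert into USER_PRODUCT values (?, ?, ?, ?)", rows)
    con.executemany("insert into PRODUCT_DEAILS values (?, ?)", product_status)
    result = con.execute(
        """
        with valid_product as (
            select distinct product_id
            from PRODUCT_DEAILS
            where product_status = 'valid'
        )
        select p.user_id,
               case when v1.product_id is not null then p.product1_id end,
               case when v2.product_id is not null then p.product2_id end,
               case when v3.product_id is not null then p.product3_id end
        from USER_PRODUCT p
        left join valid_product v1 on p.product1_id = v1.product_id
        left join valid_product v2 on p.product2_id = v2.product_id
        left join valid_product v3 on p.product3_id = v3.product_id
        order by p.user_id
        """
    ).fetchall()
    con.close()
    return result


# Sample data (invented): product 20 is invalid, user 1 has no third product.
cleaned = clean_user_products(
    rows=[(1, 10, 20, None), (2, 30, 10, 20)],
    product_status=[(10, "valid"), (20, "invalid"), (30, "valid")],
)
print(cleaned)
```

A missing product (NULL) and an invalid product both end up as NULL, because neither finds a match in valid_product.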
I am trying to get the HTS (harmonized code) from two different tables:
STKMP (purchased parts) and STKMM (manufactured parts).
When I run my query, some items are missing the HTS from STKMP. I would like to replace those NULLs
with the value from STKMM. I have tried CASE WHEN, but it gives me no results.
Select distinct
ltrim(rtrim(boldh.FEBOL#)) as BOL,
--ltrim(rtrim(bolh.FESCS#)) as ShipTo,
--ltrim(rtrim(bolh.FESNME)) as CustomerName,
--ltrim(rtrim(bolh.FGCPO#)) as CustPO,
--ltrim(rtrim(ocri.DDCSPI)) as CustLine,
ltrim(rtrim(bold.FGCPT#)) as CustPart,
ltrim(rtrim(bolh.FESNME)) as CustName,
ltrim(rtrim(bolh.FESAD1)) as CustStreet,
ltrim(rtrim(bolh.FESAD2)) as CustStreet1,
ltrim(rtrim(bolh.FESAD3)) as CustCityState,
ltrim(rtrim(stkmp.AWHARM)) as HTS,
case when STKMP.AWHARM is null then STKMM.AVHARM else stkmp.AWHARM end as HTTT,
ltrim(rtrim(V6CORG)) as COO,
ltrim(rtrim(awdes1)) as Descrip
--ltrim(rtrim([FGQSHO])) as QTY
FROM BOLH
left join bold on bolh.FEBOL# = bold.FGBOL#
left join ocri on bold.FGORD# = ocri.DDORD# and bold.FGITEM = ocri.DDITM#
left join STKA on ocri.DDPART = stka.v6part
left join STKMP on stka.V6PART = STKMP.AWPART
left join STKMM on STKMP.AWPART = STKMM.AVPART
Thanks
Use COALESCE, which returns the first non-NULL argument:
COALESCE ( expression [ ,...n ] )
Reference: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/coalesce-transact-sql?view=sql-server-ver15
COALESCE(STKMP.AWHARM, STKMM.AVHARM) AS HTTT
Alternatively, to check whether there is an issue with your joins or whether both values are NULL, you could take it one step further:
COALESCE(STKMP.AWHARM, STKMM.AVHARM, 'Both values are NULL') AS HTTT
There are a few other options as well.
Using ISNULL:
SELECT ISNULL(STKMP.AWHARM, STKMM.AVHARM) AS HTTT
Using IIF:
SELECT IIF(STKMP.AWHARM IS NOT NULL, STKMP.AWHARM, STKMM.AVHARM) AS HTTT
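One guess as to why the original CASE WHEN seemed to do nothing: if the missing codes are stored as blank strings rather than NULL, neither IS NULL nor COALESCE will fall through. The sketch below is only an illustration of that assumption, run in SQLite from Python with invented HTS values. It turns blanks into NULL with NULLIF before coalescing (in T-SQL the equivalent would be NULLIF(LTRIM(RTRIM(x)), '')).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table STKMP (AWPART text, AWHARM text);
    create table STKMM (AVPART text, AVHARM text);
    -- invented sample values: P2 is NULL, P3 is blank padding
    insert into STKMP values ('P1', '8471.30'), ('P2', NULL), ('P3', '   ');
    insert into STKMM values ('P1', '0000.00'), ('P2', '8504.40'), ('P3', '9403.20');
    """
)
hts_rows = con.execute(
    """
    select p.AWPART,
           coalesce(nullif(trim(p.AWHARM), ''),
                    nullif(trim(m.AVHARM), '')) as HTTT
    from STKMP p
    left join STKMM m on p.AWPART = m.AVPART
    order by p.AWPART
    """
).fetchall()
con.close()
print(hts_rows)
```

P1 keeps its STKMP code, while P2 (NULL) and P3 (blank) fall back to the STKMM code.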
I need to fetch the dimension key from the Region table in the following order of preference:
Fetch the dimension key based on the zip code of the physical address (PA).
If the first condition is not satisfied, fetch the dimension key based on the zip code of the mailing address (MA).
If the second condition is also not satisfied, fetch the dimension key based on the parish code of the physical address.
Otherwise, fetch the dimension key based on the parish code of the mailing address.
I am trying to use the query below, but it returns multiple records because all the left joins are evaluated. I don't want it to move on to the second condition if the first one is satisfied.
select REGION_DIM_SK, CASE_NUM
from (
select distinct COALESCE(RDIM.REGION_DIM_SK, RDIM1.REGION_DIM_SK, RDIM2.REGION_DIM_SK, RDIM3.REGION_DIM_SK) AS REGION_DIM_SK
, DC.CASE_NUM, ADDR_TYPE_CD
FROM rpt_dm_ee_intg.CASE_PERSON_ADDRESS dc
left join rpt_dm_ee_prsnt.REGION_DIM RDIM on dc.ZIP_CODE = RDIM.ZIP_CODE and RDIM.REGION_EFF_END_DT IS NULL and dc.addr_type_cd='PA' AND dc.EFF_END_DT IS NULL
left join rpt_dm_ee_prsnt.REGION_DIM RDIM1 ON dc.ZIP_CODE = RDIM1.ZIP_CODE AND RDIM1.REGION_EFF_END_DT IS NULL AND dc.addr_type_cd='MA' AND DC.EFF_END_DT IS NULL
left join (
select PARISH_CD, min(REGION_DIM_SK) as REGION_DIM_SK
from rpt_dm_ee_prsnt.REGION_DIM
where REGION_EFF_END_DT is null
group by PARISH_CD
) RDIM2 ON dc.addr_type_cd='PA' and dc.PARISH_CD = RDIM2.PARISH_CD AND DC.EFF_END_DT IS NULL
left join (
select PARISH_CD, min(REGION_DIM_SK) as REGION_DIM_SK
from rpt_dm_ee_prsnt.REGION_DIM
where REGION_EFF_END_DT is null
group by PARISH_CD
) RDIM3 ON dc.addr_type_cd='MA' and dc.PARISH_CD = RDIM3.PARISH_CD AND DC.EFF_END_DT IS NULL
) A
where REGION_DIM_SK is not null
) RD on RD.case_num = rpt_dm_ee_intg.CASE_PERSON_ELIGIBILITY.CASE_NUM
Use multiple left joins. Your query is rather hard to follow -- it has other tables and references not described in the problem.
But the idea is:
select t.*,
coalesce(rpa.dim_key, rm.dim_key, rpap.dim_key, rmp.dim_key) as dim_key
from t left join
dim_region rpa
on t.physical_address_zipcode = rpa.zipcode left join
dim_region rm
on t.mailing_address_zipcode = rm.zipcode and
rpa.zipcode is null left join
dim_region rpap
on t.physical_address_parishcode = rpap.parishcode and
rpa.zipcode is null and rm.zipcode is null left join
dim_region rmp
on t.mailing_address_parishcode = rmp.parishcode and
rpa.zipcode is null and rm.zipcode is null and rpap.parishcode is null
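Here is a small runnable sketch of that idea, using SQLite from Python. The table and column names (t, dim_region, dim_key and so on) are the placeholder names from the query above, and the rows are invented. Each later join is switched off once an earlier one has matched, so every case comes back exactly once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table dim_region (dim_key integer, zipcode text, parishcode text);
    insert into dim_region values (1, '70001', 'A'), (2, '70002', 'B'), (3, '70003', 'C');

    create table t (
        case_num integer,
        physical_address_zipcode text,
        mailing_address_zipcode text,
        physical_address_parishcode text,
        mailing_address_parishcode text
    );
    insert into t values
        (1, '70001', '70002', 'B', 'C'),   -- physical zip matches
        (2, '99999', '70002', 'A', 'A'),   -- falls back to mailing zip
        (3, '99999', '88888', 'C', 'A'),   -- falls back to physical parish
        (4, NULL,    NULL,    'Z', 'A'),   -- falls back to mailing parish
        (5, NULL,    NULL,    NULL, NULL); -- nothing matches
    """
)
region_keys = con.execute(
    """
    select t.case_num,
           coalesce(rpa.dim_key, rm.dim_key, rpap.dim_key, rmp.dim_key) as dim_key
    from t
    left join dim_region rpa
           on t.physical_address_zipcode = rpa.zipcode
    left join dim_region rm
           on t.mailing_address_zipcode = rm.zipcode
          and rpa.zipcode is null
    left join dim_region rpap
           on t.physical_address_parishcode = rpap.parishcode
          and rpa.zipcode is null and rm.zipcode is null
    left join dim_region rmp
           on t.mailing_address_parishcode = rmp.parishcode
          and rpa.zipcode is null and rm.zipcode is null
          and rpap.parishcode is null
    order by t.case_num
    """
).fetchall()
con.close()
print(region_keys)
```

If several dim_region rows can share a parish code, as in the question, the parish joins would still need the min(REGION_DIM_SK) subquery to stay at one row per case.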
The trick is to put the conditions in CASE WHEN:
SELECT *
FROM table1 a
JOIN table2 b
ON CASE
WHEN a.code is not null and a.code = b.code THEN 1
WHEN a.type = b.type THEN 1
ELSE 0
END = 1
For your example you can reduce the code to just two joins. It can't be done in one because you are joining two different tables.
SELECT CASE WHEN RDIM.REGION_DIM_SK IS NULL THEN RDIM2.REGION_DIM_SK ELSE RDIM.REGION_DIM_SK END AS REGION_DIM_SK
FROM rpt_dm_ee_intg.CASE_PERSON_ADDRESS dc
LEFT JOIN rpt_dm_ee_prsnt.REGION_DIM RDIM ON CASE
WHEN (dc.ZIP_CODE = RDIM.ZIP_CODE
AND RDIM.REGION_EFF_END_DT IS NULL
AND dc.addr_type_cd='PA'
AND dc.EFF_END_DT IS NULL) THEN 1
WHEN (dc.ZIP_CODE = RDIM.ZIP_CODE
AND RDIM.REGION_EFF_END_DT IS NULL
AND dc.addr_type_cd='MA'
AND DC.EFF_END_DT IS NULL) THEN 1
ELSE 0
END = 1
LEFT JOIN
(SELECT PARISH_CD,
min(REGION_DIM_SK) AS REGION_DIM_SK
FROM rpt_dm_ee_prsnt.REGION_DIM
WHERE REGION_EFF_END_DT IS NULL
GROUP BY PARISH_CD) RDIM2 ON CASE
WHEN (dc.addr_type_cd='PA'
AND dc.PARISH_CD = RDIM2.PARISH_CD
AND DC.EFF_END_DT IS NULL
AND RDIM.ZIP_CODE IS NULL) THEN 1
WHEN (dc.addr_type_cd='MA'
AND dc.PARISH_CD = RDIM2.PARISH_CD
AND DC.EFF_END_DT IS NULL
AND RDIM.ZIP_CODE IS NULL) THEN 1
ELSE 0
END = 1
Edit: if you don't want a match from RDIM2 when RDIM has already matched on zip code, add AND RDIM.ZIP_CODE IS NULL to the CASE WHEN conditions of the second join, as in the query above.
I am learning SQL case statements and have the following stored procedure.
Select PT.[ID] 'TransactionID', PT.BatchNumber, PT.SequenceNumber, PT.TransactionDate,
PT.TerminalID, PT.TotalAmount, PT.TransactionTypeID, TT.TransactionType,
PT.PAN 'EmbossLine',PT.PreBalanceAmount, PT.PostBalanceAmount, RefTxnID, SettlementDate,PaidCash, CreditAmount, DiscountAmount,
RefPAN, Remarks, PT.Product,
case PT.Product when 1 then 'Taxi' end 'ProductName'
case PT.Product when 2 then 'Airport Lounge' end 'ProductName'
into #Temp
from POS_Transactions PT inner join TransactionType TT on TT.TransactionTypeID = PT.TransactionTypeID
where
PT.[ID] not in (Select distinct isnull(TransactionID,0) from Testcards)
and (PT.TransactionDate >= @DateFrom)
and (PT.TransactionDate < @DateTo)
and (PT.TransactionTypeID = @TransactionTypeID or @TransactionTypeID = -999)
select T.*, C.EmbossLine+' ' as 'EmbossLine', C.EmbossLine as 'EmbossLine1',
C.EmbossName, PM.MerchantID, PM.MerchantName1, C.AccountNumber, C.VehicleNumber
from #Temp T
inner join Card C on C.EmbossLine= T.EmbossLine
inner join Terminal on Terminal.TerminalID = T.TerminalID
inner join Merchant PM on PM.MerchantID = Terminal.MerchantID
where C.Status <>'E3'
and C.CardID not in (Select distinct isnull(CardID,0) from Testcards)
and (PM.MerchantID = @MerchantID or @MerchantID = '-999')
and (C.EmbossLine like '%'+@EmbossLine+'%' or @EmbossLine like '-999')
and (C.EmbossName like '%'+@EmbossName+'%' or @EmbossName like '-999')
order by T.TransactionDate, MerchantName1, T.BatchNumber, T.SequenceNumber
drop table #Temp
When I create it, the command executes successfully. However, when I call it, it throws the following error:
Column names in each table must be unique. Column name 'ProductName'
in table '#Temp' is specified more than once.
I think the problem is in the syntax of these lines:
case PT.Product when 1 then 'Taxi' end 'ProductName'
case PT.Product when 2 then 'Airport Lounge' end 'ProductName'
Can someone identify the issue?
Those lines are indeed the problem.
I suspect you're after something like:
case PT.Product
when 1 then 'Taxi'
when 2 then 'Airport Lounge'
end 'ProductName'
Your syntax has two separate CASE expressions, so two columns are selected, both with the same name.
The single CASE above returns one of the two values in one column per row.
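A quick way to see the single-column behaviour is to run the combined CASE against a throwaway table. This is just a sketch in SQLite via Python, with a cut-down POS_Transactions table and invented rows. Product 3 is included to show that a value with no matching WHEN comes back as NULL unless an ELSE is added.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table POS_Transactions (ID integer, Product integer);
    insert into POS_Transactions values (1, 1), (2, 2), (3, 3);
    """
)
product_names = con.execute(
    """
    select ID,
           case Product
               when 1 then 'Taxi'
               when 2 then 'Airport Lounge'
           end as ProductName
    from POS_Transactions
    order by ID
    """
).fetchall()
con.close()
print(product_names)
```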
I have three tables I need to join in order to tell what documents a product needs. Not all documents are needed on each product.
There is a Document table, a Product table, and a DocTracking table that tracks the documents associated with products
Product Table
ProdID  ProdName
1       Ball
2       Wheel
DocTracking Table
ProdID  DocID
1       1
1       2
2       2
I want the join to look like this:
ProdID  ProdName  Needs Word Doc?  Needs Excel Doc?
1       Ball      Yes              Yes
2       Wheel     No               Yes
Any help would be appreciated. If I need to make this into a stored procedure, that is fine.
If you have only those documents and they are fixed, you can use this query:
SELECT ProdID, ProdName,
[Needs Word Doc] = CASE WHEN EXISTS(
SELECT 1 FROM Document d INNER JOIN DocTracking dt ON d.DocID=dt.DocID
WHERE dt.ProdID = p.ProdID AND d.[Doc Name] = 'Word Document'
) THEN 'Yes' ELSE 'No' END,
[Needs Excel Doc] = CASE WHEN EXISTS(
SELECT 1 FROM Document d INNER JOIN DocTracking dt ON d.DocID=dt.DocID
WHERE dt.ProdID = p.ProdID AND d.[Doc Name] = 'Excel Spreadsheet'
) THEN 'Yes' ELSE 'No' END
FROM dbo.Product p
Of course you could also use the DocID, then the query doesn't depend on the name.
select P.ProdID, P.ProdName,
case
when DW.DocID is not null then 'Yes'
else 'No'
end as NeedsWordDoc,
case
when DE.DocID is not null then 'Yes'
else 'No'
end as NeedsExcelDoc
from Product P
-- join tracking rows only when they point at the wanted document,
-- so each product yields a single row
left join (DocTracking DTW
    inner join Document DW on DW.DocID = DTW.DocID
    and DW.Name = 'Word Document')
on DTW.ProdId = P.ProdId
left join (DocTracking DTE
    inner join Document DE on DE.DocID = DTE.DocID
    and DE.Name = 'Excel Spreadsheet')
on DTE.ProdId = P.ProdId
This is a little more complicated than a typical pivot query. But, the only challenging part is determining which documents are included, and then getting 'Yes' or 'No' out.
The following does this with coalesce() and a conditional that checks for the presence of one type of document:
select pt.ProdId, pt.ProdName,
coalesce(MAX(case when dt.DocId = 1 then 'Yes' end), 'No') as "Needs Word Doc?",
coalesce(MAX(case when dt.DocId = 2 then 'Yes' end), 'No') as "Needs Excel Doc?"
from ProductTable pt left outer join
DocTracking dt
on pt.ProdId = dt.ProdId
group by pt.ProdId, pt.ProdName;
Note that SQL queries return a fixed number of columns. So, you cannot have a SQL query that simply returns a different number of columns based on what is present in the documents table. You can create a SQL query in a string and then use a database-specific command to run it.
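For a quick check of the conditional-aggregation approach, the sketch below runs it in SQLite from Python against the sample rows from the question. It assumes DocID 1 is the Word document and DocID 2 the Excel spreadsheet (the question does not show the Document table), and it adds one invented product, Rope, with no tracking rows to show what the left join does.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table Product (ProdID integer, ProdName text);
    create table DocTracking (ProdID integer, DocID integer);
    insert into Product values (1, 'Ball'), (2, 'Wheel'), (3, 'Rope');  -- Rope is invented
    insert into DocTracking values (1, 1), (1, 2), (2, 2);
    """
)
doc_needs = con.execute(
    """
    select p.ProdID, p.ProdName,
           coalesce(max(case when dt.DocID = 1 then 'Yes' end), 'No') as needs_word,
           coalesce(max(case when dt.DocID = 2 then 'Yes' end), 'No') as needs_excel
    from Product p
    left join DocTracking dt on dt.ProdID = p.ProdID
    group by p.ProdID, p.ProdName
    order by p.ProdID
    """
).fetchall()
con.close()
print(doc_needs)
```

Ball and Wheel match the output requested in the question, and the product with no documents still appears with No in both columns.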
Maybe this will help, using PIVOT:
select ProdId, ProdName,
case when isnull([Word Document],0)<>0 then 'Yes'
else 'No'
end as [Needs Word Doc?],
case when isnull([Excel Spreadsheet],0)<>0 then 'Yes'
else 'No'
end as [Needs Excel Spreadsheet?]
from
(
select p.ProdId,p.ProdName,d.DocId,d.DocName from
#Prod p left join
#Track t
on p.ProdId=t.ProdId
left join #Doc d
on t.DocId=d.DocId
)
as temp
pivot
(max(DocID)
For DocName in ([Word Document],[Excel Spreadsheet])
)
as pvt
I am trying to get a list of products and images from two tables. When I join them and use a CASE expression on the HasImage column from the products table, I get this error:
Error:
Subquery returned more than 1 value.
This is not permitted when the subquery follows =, !=, <, <= , >, >=
or when the subquery is used as an expression.
When a product does not have an image, I want to replace it with a default image.
Here is the select statement:
SELECT
P.[ProductId]
,P.[ProductName]
--If HasImage is false show the default.jpg
,Case P.[HasImage]
WHEN 'True' THEN (Select I.[FileName] as ProductImage FROM [ProductImages] I
INNER JOIN [Product] P on P.ProductId = I.ProductId
WHERE I.Sequence=0)
WHEN 'False' THEN 'default.jpg'
END
FROM [Product] P
LEFT JOIN [ProductImages] I
on P.ProductId = I.ProductId
The problem is in the Case When 'True'. That's what throws the error.
Product Table
ProductId  ProductName  HasImage
1          Coffee Mug   True
2          Pen          False
3          Pencil       False
Product Images Table
ProductId  Sequence  FileName
1          0         Mug_Image1.jpg
1          1         Mug_2.jpg
1          2         Mug_Img3.jpg
There are multiple images for ProductId=1, but I use Sequence = 0 to return only one.
The returned data I want should look like this:
ProductId  ProductName  ProductImage
1          Coffee Mug   Mug_Image1.jpg
2          Pen          default.jpg
3          Pencil       default.jpg
I have tried various combinations of COALESCE, NULLIF, left joins, and different statements, but I haven't gotten all three products to display as desired.
In addition to my comment under the question, this is the query I believe should have been written in the first place:
SELECT
P.[ProductId]
,P.[ProductName]
--If HasImage is false show the default.jpg
,Case P.[HasImage]
WHEN 'True'
THEN I.[FileName]
WHEN 'False'
THEN 'default.jpg'
END
FROM [Product] P
LEFT JOIN [ProductImages] I
ON P.ProductId = I.ProductId
-- Filter Sequence 0 only
-- All products will be retrieved
-- whether they have associated Image with Sequence = 0
AND I.Sequence = 0
Filtering the right side of a left join in the ON clause lets you keep the behavior of the left join while joining only the rows of interest. If HasImage only marks the existence of images and is not a business rule (show/don't show the image for this particular product), you could drop the CASE in favor of a simple isnull(I.FileName, 'default.jpg').
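The difference between filtering in ON and filtering in WHERE is easy to see with the sample data from the question. The sketch below uses SQLite from Python, so coalesce stands in for T-SQL's isnull, and it ignores the HasImage column on the assumption that it only mirrors whether an image exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    create table Product (ProductId integer, ProductName text, HasImage text);
    create table ProductImages (ProductId integer, Sequence integer, FileName text);
    insert into Product values
        (1, 'Coffee Mug', 'True'), (2, 'Pen', 'False'), (3, 'Pencil', 'False');
    insert into ProductImages values
        (1, 0, 'Mug_Image1.jpg'), (1, 1, 'Mug_2.jpg'), (1, 2, 'Mug_Img3.jpg');
    """
)

# Filter in the ON clause: products without a Sequence = 0 image are kept.
filter_in_on = con.execute(
    """
    select P.ProductId, P.ProductName,
           coalesce(I.FileName, 'default.jpg') as ProductImage
    from Product P
    left join ProductImages I
           on P.ProductId = I.ProductId
          and I.Sequence = 0
    order by P.ProductId
    """
).fetchall()

# Same filter in WHERE: rows where I.Sequence is NULL are discarded,
# so the left join effectively behaves like an inner join.
filter_in_where = con.execute(
    """
    select P.ProductId, P.ProductName,
           coalesce(I.FileName, 'default.jpg') as ProductImage
    from Product P
    left join ProductImages I
           on P.ProductId = I.ProductId
    where I.Sequence = 0
    order by P.ProductId
    """
).fetchall()
con.close()
print(filter_in_on)
print(filter_in_where)
```

Only the first query returns all three products with the default image filled in.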
Alternatively (SQL Server 2005 and newer), you could use OUTER APPLY to retrieve the images:
SELECT
P.[ProductId]
,P.[ProductName]
-- CASE sits outside the APPLY so products without an image row still get the default
,CASE P.[HasImage]
WHEN 'True'
THEN I.[FileName]
WHEN 'False'
THEN 'default.jpg'
END AS ProductImage
FROM [Product] P
OUTER APPLY
(
SELECT ProductImages.[FileName]
FROM [ProductImages]
WHERE P.ProductId = ProductImages.ProductId
AND ProductImages.Sequence = 0
) I
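Why do you need the INNER JOIN inside the subquery? It re-declares P there, so the subquery is no longer tied to the outer Product row and can return one image per product, which is what triggers the error.
If you are sure that an image exists whenever HasImage is True, correlate the subquery directly to the outer row instead.
Try this query: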
SELECT P.[ProductId],
P.[ProductName],
Case P.[HasImage]
WHEN 'True' THEN
(Select
TOP 1 -- SQL Server syntax; in MySQL drop TOP 1 and append LIMIT 1 after the ORDER BY
I.[FileName] as ProductImage FROM [ProductImages] I
WHERE P.ProductId = I.ProductId
ORDER BY I.Sequence
)
WHEN 'False' THEN 'default.jpg'
END
FROM [Product] P