Date filter in Hive while doing a left outer join - sql

I am building a query in Hive; the query is given below.
*
Select * from CSS407
LEFT OUTER JOIN PROD_CORE.SERV_ACCT_ISVC_LINK SASP
ON CSS407.TABLE_ABBRV_CODE = 'SACT'
AND CSS407.EVENT_ITEM_REF_NUM = SASP.Serv_Acct_Id
AND to_date(CSS407.EVENT_RTS_VAL) >= SASP.Acct_Serv_Pnt_Strt_Dt
AND to_date(CSS407.EVENT_RTS_VAL) < SASP.Acct_Serv_Pnt_End_Dt
LEFT OUTER JOIN PROD_CORE.CUST_ACCT_SA_LINK ASA
ON CSS407.TABLE_ABBRV_CODE = 'SACT'
AND CSS407.EVENT_ITEM_REF_NUM = ASA.Serv_Acct_Id
AND CSS407.EVENT_RTS_VAL_UTC_DTTM >= ASA.Acct_Relt_Strt_Dttm
AND CSS407.EVENT_RTS_VAL_UTC_DTTM < ASA.Acct_Relt_End_Dttm
LEFT OUTER JOIN PROD_CORE.CUST_SA_LINK ASAT
ON CSS407.TABLE_ABBRV_CODE = 'TACT'
AND CSS407.EVENT_ITEM_REF_NUM = ASAT.Serv_Acct_Id
AND CSS407.EVENT_RTS_VAL_UTC_DTTM >= ASAT.Acct_Relt_Strt_Dttm
AND CSS407.EVENT_RTS_VAL_UTC_DTTM < ASAT.Acct_Relt_End_Dttm
*
When I execute the above query in Hive, I get the following error:
"Both left and right aliases encountered in JOIN 'SASP'"
On further investigation I found that we cannot use a date range (between) filter in the join ON condition. Every post suggests moving that filter into the WHERE clause.
But in our case, if we move the date range filter to the WHERE clause, we do not get any data, since the left outer join semantics are no longer satisfied.
I get this issue only in Hive; the query works fine in Teradata and Oracle.
Please help.

Only equality (=) works in a join condition in Hive (at least in versions before 2.2.0). Move the >= and < comparisons to the WHERE clause.
I had a similar issue earlier. Please check the thread below.
Hive Select MAX() in Join Condition
Hope this helps.
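If the range filter has to stay out of the ON clause, one possible workaround is to apply it in an inner-joined subquery and then left join the base table back to that subquery on a unique row key. Below is a minimal sketch using Python's sqlite3 module purely for illustration (SQLite accepts the range condition in ON, so it serves as the reference result here). The tables events and links, their columns, and the unique key event_id are hypothetical and not taken from the original schema; the real tables would need a comparable unique key for this to work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id INTEGER PRIMARY KEY, ref_num INTEGER, event_date TEXT);
CREATE TABLE links (serv_acct_id INTEGER, start_dt TEXT, end_dt TEXT, pnt TEXT);
INSERT INTO events VALUES (1, 100, '2020-01-15'), (2, 100, '2021-06-01'), (3, 200, '2020-03-01');
INSERT INTO links VALUES (100, '2020-01-01', '2020-12-31', 'A'), (100, '2019-01-01', '2019-12-31', 'B');
""")

# Reference: range condition inside ON (what Teradata/Oracle accept).
reference = conn.execute("""
SELECT e.event_id, l.pnt
FROM events e
LEFT JOIN links l
  ON e.ref_num = l.serv_acct_id
 AND e.event_date >= l.start_dt
 AND e.event_date < l.end_dt
ORDER BY e.event_id
""").fetchall()

# Naive rewrite: range condition moved to WHERE. Unmatched rows disappear.
naive = conn.execute("""
SELECT e.event_id, l.pnt
FROM events e
LEFT JOIN links l ON e.ref_num = l.serv_acct_id
WHERE e.event_date >= l.start_dt AND e.event_date < l.end_dt
ORDER BY e.event_id
""").fetchall()

# Workaround: filter inside a subquery, then left join on equality only.
workaround = conn.execute("""
SELECT e.event_id, m.pnt
FROM events e
LEFT JOIN (
    SELECT e2.event_id, l.pnt
    FROM events e2
    JOIN links l ON e2.ref_num = l.serv_acct_id
    WHERE e2.event_date >= l.start_dt AND e2.event_date < l.end_dt
) m ON e.event_id = m.event_id
ORDER BY e.event_id
""").fetchall()

print(reference)
print(naive)
print(workaround)
```

The outer join in the workaround uses only an equality condition, which is the form older Hive versions accept, while events 2 and 3 are still returned with a NULL on the right-hand side.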

There might be some common column between CSS407 and SERV_ACCT_ISVC_LINK which might be creating this error.

Related

ACCESS SQL - Joining 2 tables on Datetime values

I have an issue joining 2 tables with datetime values in Access.
I tried to join the tables by simply setting
LEFT JOIN Table1.Datetime=Table2.Datetime
However, the output of my query is really off.
I then tried to join by splitting the dates:
LEFT JOIN YEAR(Table1.Datetime)=YEAR(Table2.Datetime)
AND MONTH(Table1.Datetime)=MONTH(Table2.Datetime)
AND DAY(Table1.Datetime)=DAY(Table2.Datetime)
AND HOUR(Table1.Datetime)=HOUR(Table2.Datetime)
Running it this way, the query seems to get stuck and I never get any results.
I then tried joining both tables on a condition like:
LEFT JOIN Table1.Datetime>=Table2.Datetime
AND Table1.Datetime<Table2.Datetime + 1/24
I'm running out of ideas for my join to effectively work, any help would be much appreciated !
DateTime is based on Double, and you can't just check such values for equality because of potential floating point errors.
Try something like this:
LEFT JOIN Abs(Table1.Datetime-Table2.Datetime) < #00:00:01#
or:
LEFT JOIN DateDiff("s", Table1.Datetime, Table2.Datetime) = 0
or:
LEFT JOIN Format(Table1.Datetime, "yyyymmddhhnnss") = Format(Table2.Datetime, "yyyymmddhhnnss")
These may be too slow, however. If so, join two simple select queries, one for each table, having:
Format(Table1.Datetime, "yyyymmddhhnnss") As TextTime - and
Format(Table2.Datetime, "yyyymmddhhnnss") As TextTime
and then join on
query1.TextTime = query2.TextTime
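To see why exact equality on Double-based date values can fail, here is a small Python sketch. The serial value and the size of the simulated error are made up for illustration; the helper to_second_key is hypothetical and only mimics the idea of comparing at whole-second resolution, as the DateDiff and Format variants above do.

```python
SECONDS_PER_DAY = 86_400

def to_second_key(serial):
    """Round a day-based serial date to a whole number of seconds."""
    return round(serial * SECONDS_PER_DAY)

a = 43831.4375   # day count plus a fractional day of 0.4375 (10:30)
b = a + 1e-9     # same instant with simulated floating-point noise

exact_match = (a == b)                              # strict equality
tolerance_match = abs(a - b) < 1 / SECONDS_PER_DAY  # difference under one second
key_match = to_second_key(a) == to_second_key(b)    # equal after rounding to seconds

print(exact_match, tolerance_match, key_match)
```

The strict comparison fails even though both values denote the same second, while the tolerance check and the rounded key both match.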

What are the possible ways to optimize the PostgreSQL code below?

I have written this SQL query to fetch data from a Greenplum data lake. The primary table has roughly 800,000 rows, which I am joining with other tables. The query below takes an extremely long time to return a result. What might be the reason for the long query time, and how can I resolve it?
select
a.pole,
t.country_name,
a.service_area,
a.park_name,
t.turbine_platform_name,
a.turbine_subtype,
a.pad as "turbine_name",
t.system_number as "turbine_id",
a.customer,
a.service_contract,
a.component,
c.vendor_mfg as "component_manufacturer",
a.case_number,
a.description as "case_description",
a.rmd_diagnosis as "case_rmd_diagnostic_description",
a.priority as "case_priority",
a.status as "case_status",
a.actual_rootcause as "case_actual_rootcause",
a.site_trends_feedback as "case_site_feedback",
a.added as "date_case_added",
a.start as "date_case_started",
a.last_flagged as "date_case_flagged_by_algorithm_latest",
a.communicated as "date_case_communicated_to_field",
a.field_visible_date as "date_case_field_visbile_date",
a.fixed as "date_anamoly_fixed",
a.expected_clse as "date_expected_closure",
a.request_closure_date as "date_case_request_closure",
a.validation_date as "date_case_closure",
a.production_related,
a.estimated_value as "estimated_cost_avoidance",
a.cms,
a.anomaly_category,
a.additional_information as "case_additional_information",
a.model,
a.full_model,
a.sent_to_field as "case_sent_to_field"
from app_pul.anomaly_stage a
left join ge_cfg.turbine_detail t on a.scada_number = t.system_number and a.added > '2017-12-31'
left join tbwgr_v.pmt_wmf_tur_component_master_t c on a.component = c.component_name
Your query is basically:
select . . .
from app_pul.anomaly_stage a left join
ge_cfg.turbine_detail t
on a.scada_number = t.system_number and
a.added > '2017-12-31' left join
tbwgr_v.pmt_wmf_tur_component_master_t c
on a.component = c.component_name
First, the condition on a does not filter any rows, because a is the first table in the left join and the condition is in the on clause. I assume you actually intend it to filter, so write the query as:
select . . .
from app_pul.anomaly_stage a left join
ge_cfg.turbine_detail t
on a.scada_number = t.system_number left join
tbwgr_v.pmt_wmf_tur_component_master_t c
on a.component = c.component_name
where a.added > '2017-12-31'
That might help with performance. Then in Postgres, you would want indexes on turbine_detail(system_number) and pmt_wmf_tur_component_master_t(component_name). It is doubtful that an index would help on the first table, because you are already selecting a large amount of data.
I'm not sure if indexes would be appropriate in Greenplum.
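The difference between putting the condition on the first table in the on clause and in the where clause can be checked on a tiny data set. This is only a sketch using Python's sqlite3 module; the table names anomaly and turbine and the sample rows are invented, with column names loosely following the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anomaly (case_number INTEGER, scada_number INTEGER, added TEXT);
CREATE TABLE turbine (system_number INTEGER, country_name TEXT);
INSERT INTO anomaly VALUES (1, 10, '2017-06-01'), (2, 10, '2018-02-01'), (3, 20, '2018-03-01');
INSERT INTO turbine VALUES (10, 'DE');
""")

# Condition on the left table inside ON: every anomaly row is still returned;
# the condition only decides whether a turbine row is attached.
on_rows = conn.execute("""
SELECT a.case_number, t.country_name
FROM anomaly a
LEFT JOIN turbine t
  ON a.scada_number = t.system_number AND a.added > '2017-12-31'
ORDER BY a.case_number
""").fetchall()

# Same condition in WHERE: rows of the left table are actually filtered.
where_rows = conn.execute("""
SELECT a.case_number, t.country_name
FROM anomaly a
LEFT JOIN turbine t ON a.scada_number = t.system_number
WHERE a.added > '2017-12-31'
ORDER BY a.case_number
""").fetchall()

print(on_rows)
print(where_rows)
```

Case 1 (added in 2017) survives in the first result with a NULL country, but is removed in the second.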
Verify that the joins use the respective primary and foreign keys.
Try executing the query while removing one left join after the other, so you can see where the problem is.
Try looking at the execution plan.

SQL triple left join query across three databases

I'm trying to run a query across three tables in three different databases. This query works but I'm pulling close to a billion records... Is there any solution to pull the distinct fields from smlog.requestor_type and arcust.maj_class for the following query?
SELECT
smreq.request_id AS ROIrequestID,
arcust.customer AS LAWcustID,
smlog.logid AS ESLlogID,
arcust.maj_class AS invoicetype,
smlog.requestor_type AS SMLrequestortype,
smlog.request_type as SMLrequesttype
FROM roi.sm_request_sp_data reqsp
LEFT JOIN smart.smlog#smartlog smlog ON smlog.logid = reqsp.logid
LEFT JOIN roi.sm_requests smreq ON smreq.request_id = reqsp.request_id
LEFT JOIN lawson.arcustomer#smart7 arcust ON arcust.customer =
smreq.customer_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02','yyyy/mm/dd')
GROUP BY smlog.requestor_type;
These are observations, not an answer.
SELECT
smreq.request_id AS ROIrequestID
FROM roi.sm_request_sp_data reqsp
LEFT JOIN roi.sm_requests smreq ON reqsp.request_id = smreq.request_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02', 'yyyy/mm/dd')
That LEFT JOIN is overridden completely by the where clause (any NULL produced from the left join is disallowed) so use an INNER JOIN instead.
For the where clause, it isn't clear whether you want one day's data ('2016/03/01') or two days' (both '2016/03/01' and '2016/03/02'). If you are expecting just one day, then don't use <= in the second predicate.
For the rest we really have no factual basis to make recommendations.
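The first observation can be verified on a small example: once the where clause tests a column of the right-hand table, the left join returns the same rows as an inner join. A sketch with Python's sqlite3 module follows; the table names reqsp and smreq are shortened stand-ins and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reqsp (logid INTEGER, request_id INTEGER);
CREATE TABLE smreq (request_id INTEGER, orig_dt TEXT);
INSERT INTO reqsp VALUES (11, 1), (12, 2), (13, 3);
INSERT INTO smreq VALUES (1, '2016-03-01'), (2, '2016-02-10');
""")

query = """
SELECT reqsp.request_id, smreq.orig_dt
FROM reqsp
{join} smreq ON reqsp.request_id = smreq.request_id
WHERE smreq.orig_dt >= '2016-03-01' AND smreq.orig_dt <= '2016-03-02'
ORDER BY reqsp.request_id
"""

# The WHERE predicate is never true for the NULLs a LEFT JOIN produces,
# so unmatched rows (request 3) are dropped either way.
left_filtered = conn.execute(query.format(join="LEFT JOIN")).fetchall()
inner_filtered = conn.execute(query.format(join="INNER JOIN")).fetchall()

print(left_filtered)
print(inner_filtered)
```

Both statements return only request 1, so writing the join as INNER JOIN states the intent more directly.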

MS Access INNER JOIN/LEFT JOIN problems

I have the following SQL string which tries to combine an INNER JOIN with a LEFT JOIN in the FROM section.
As you can see, I use the table VIP_APP_VIP_SCENARIO_DETAIL_LE to perform the query. When I use it against this table, Access gives me an "Invalid Operation" error.
Interestingly, when I use the EXACT same query using the VIP_APP_VIP_SCENARIO_DETAIL_BUDGET or VIP_APP_VIP_SCENARIO_DETAIL_ACTUALS table, it performs flawlessly.
So why would it work on two tables but not the other? All fields are in all tables and the data types are correct.
As a side note: on the query with the error, if I change the LEFT JOIN to an INNER JOIN, it runs with no problem! I really need a LEFT JOIN though.
SELECT
D.MATERIAL_NUMBER,
D.MATERIAL_DESCRIPTION,
D.PRODUCTION_LOT_SIZE,
D.STANDARDS_NAME,
D.WORK_CENTER,
S.OP_SHORT_TEXT,
S.OPERATION_CODE,
D.LINE_SPEED_UPM,
D.PERCENT_STD,
D.EQUIPMENT_SU,
D.EQUIPMENT_CU,
D.OPERATOR_NUM,
V.COSTING_LOT_SIZE,
V.VOL_TOTAL_ADJ
FROM
([STDS_SCENARIO: TEST] AS D INNER JOIN MASTER_SUMMARY AS S ON
D.MATERIAL_NUMBER = S.MATERIAL_NUMBER AND D.WORK_CENTER = S.WORK_CENTER)
LEFT JOIN
(SELECT ITEM_CODE, COSTING_LOT_SIZE, VOL_TOTAL_ADJ
FROM
VIP_APP_VIP_SCENARIO_DETAIL_LE
WHERE SCENARIO_ID = 16968) AS V ON D.MATERIAL_NUMBER = V.ITEM_CODE
ORDER BY D.MATERIAL_NUMBER, D.STANDARDS_NAME, S.OPERATION_CODE;
I tried to mock this up in SQL Server with some tables of my own, and the structure seemed to work. This follows the pattern referenced above (hopefully no syntax errors are left here).
SELECT * FROM (
select
D.MATERIAL_NUMBER,
D.MATERIAL_DESCRIPTION,
D.PRODUCTION_LOT_SIZE,
D.STANDARDS_NAME,
D.WORK_CENTER,
S.OP_SHORT_TEXT,
S.OPERATION_CODE,
D.LINE_SPEED_UPM,
D.PERCENT_STD,
D.EQUIPMENT_SU,
D.EQUIPMENT_CU,
D.OPERATOR_NUM
FROM [STDS_SCENARIO: TEST] D
INNER JOIN MASTER_SUMMARY S
ON D.MATERIAL_NUMBER = S.MATERIAL_NUMBER AND D.WORK_CENTER = S.WORK_CENTER) AS J
LEFT JOIN
(SELECT ITEM_CODE, COSTING_LOT_SIZE, VOL_TOTAL_ADJ
FROM
VIP_APP_VIP_SCENARIO_DETAIL_LE
WHERE SCENARIO_ID = 16968) AS V ON J.MATERIAL_NUMBER = V.ITEM_CODE
ORDER BY J.MATERIAL_NUMBER, J.STANDARDS_NAME, J.OPERATION_CODE;
I had help from a friend and we discovered that it was a casting problem between a linked Oracle table and the Access table. To fix the problem we cast both sides of the linked fields to a string:
CSTR(D.[MATERIAL_NUMBER]) = CSTR(V.[ITEM_CODE])
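The idea of normalizing both sides of the join key to text can be illustrated outside Access as well. The following Python sketch uses invented rows and a hypothetical left_join helper; note that Python does not raise an error here but silently fails to match, so it only shows why the two key types need to agree, not the exact Access behavior.

```python
# Hypothetical rows: one source stores the key as a number, the other as text.
details = [{"MATERIAL_NUMBER": 1001, "WORK_CENTER": "WC1"},
           {"MATERIAL_NUMBER": 1002, "WORK_CENTER": "WC2"}]
volumes = [{"ITEM_CODE": "1001", "VOL_TOTAL_ADJ": 250}]

def left_join(left, right, left_key, right_key, normalize=lambda v: v):
    """Left join two lists of dicts, comparing keys after normalize()."""
    index = {}
    for row in right:
        index.setdefault(normalize(row[right_key]), []).append(row)
    joined = []
    for row in left:
        # Keep the left row even when nothing matches (None on the right).
        for match in index.get(normalize(row[left_key]), [None]):
            joined.append((row[left_key], match["VOL_TOTAL_ADJ"] if match else None))
    return joined

# Keys compared as-is: the number 1001 never equals the text "1001".
raw = left_join(details, volumes, "MATERIAL_NUMBER", "ITEM_CODE")
# Keys converted to text on both sides, similar in spirit to CStr().
as_text = left_join(details, volumes, "MATERIAL_NUMBER", "ITEM_CODE", normalize=str)

print(raw)
print(as_text)
```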

SQL Query: Comparing two dates in returned record

I'm trying to come up with an automated solution for something I do manually now and I only have minimal, bare-bones SQL skill. I usually modify simple queries others have built or will build basic select queries. I have done some reading but don't know how to make it do what I need in this case. I need to come up with something others can use while I am out for a month (and which will save me time when I return).
What I need is to return the fields below where tblThree.EndDate is later than tblFive.ServiceEnd. I have to do a couple of other compares on the dates, but if I get a working query of the first one I can make it work with the others. We use MS SQL Server 2008.
I tried creating sub-queries with aliases and failed miserably at making it work.
These are the table and fields I am working with:
tblOne.ServiceID
tblOne.ServiceYear
tblOne.Status
tblTwo.AccountNbr
tblTwo.AcctName
tblThree.BeginDate (smalldatetime, null)
tblThree.EndDate (smalldatetime, null)
tblFour.ClientID
tblFour.ServiceName
tblFive.ContractID
tblFive.ServiceBegin (smalldatetime, null)
tblFive.ServiceEnd (smalldatetime, null)
This is how the tables are related:
tblOne.ServiceID = tblThree.ServiceID
tblOne.ContractID = tblFive.ContractID
tblOne.ClientID = tblFour.ClientID
tblTwo.AccountNbr = tblFour.Account
I used MS Access 2003 to generate the Join SQL:
SELECT tblOne.ServiceID, tblTwo.AccountNbr,
tblTwo.AcctName, tblFour.ServiceName, tblOne.Status,
tblThree.BeginDate, tblThree.EndDate,
tblOne.ServiceYear, tblFive.ServiceBegin,
tblFive.ServiceEnd
FROM ((tblTwo INNER JOIN tblFour
ON tblTwo.AccountNbr=tblFour.AccountNbr) INNER JOIN (tblThree INNER JOIN tblOne
ON tblThree.ServiceID=tblOne.ServiceID)
ON tblFour.ClientID=tblOne.ClientID) INNER JOIN tblFive
ON tblOne.ContractID=tblFive.ContractID;
Thanks for any help.
Just add a WHERE clause to get started:
SELECT tblOne.ServiceID, tblTwo.AccountNbr,
tblTwo.AcctName, tblFour.ServiceName, tblOne.Status,
tblThree.BeginDate, tblThree.EndDate,
tblOne.ServiceYear, tblFive.ServiceBegin,
tblFive.ServiceEnd
FROM ((tblTwo INNER JOIN tblFour
ON tblTwo.AccountNbr=tblFour.AccountNbr) INNER JOIN (tblThree INNER JOIN tblOne
ON tblThree.ServiceID=tblOne.ServiceID)
ON tblFour.ClientID=tblOne.ClientID) INNER JOIN tblFive
ON tblOne.ContractID=tblFive.ContractID
WHERE tblThree.EndDate > tblFive.ServiceEnd;
SELECT
tblOne.ServiceID,
tblOne.ServiceYear,
tblOne.Status,
tblTwo.AccountNbr,
tblTwo.AcctName,
tblThree.BeginDate,
tblThree.EndDate,
tblFour.ClientID,
tblFour.ServiceName,
tblFive.ContractID,
tblFive.ServiceBegin,
tblFive.ServiceEnd
FROM tblOne
INNER JOIN tblThree
ON tblOne.ServiceID = tblThree.ServiceID
INNER JOIN tblFive
ON tblOne.ContractID = tblFive.ContractID
INNER JOIN tblFour
ON tblOne.ClientID = tblFour.ClientID
INNER JOIN tblTwo
ON tblTwo.AccountNbr = tblFour.Account
WHERE tblThree.EndDate > tblFive.ServiceEnd