SQL triple left join query across three databases

I'm trying to run a query across three tables in three different databases. This query works but I'm pulling close to a billion records... Is there any solution to pull the distinct fields from smlog.requestor_type and arcust.maj_class for the following query?
SELECT
smreq.request_id AS ROIrequestID,
arcust.customer AS LAWcustID,
smlog.logid AS ESLlogID,
arcust.maj_class AS invoicetype,
smlog.requestor_type AS SMLrequestortype,
smlog.request_type as SMLrequesttype
FROM roi.sm_request_sp_data reqsp
LEFT JOIN smart.smlog#smartlog smlog ON smlog.logid = reqsp.logid
LEFT JOIN roi.sm_requests smreq ON smreq.request_id = reqsp.request_id
LEFT JOIN lawson.arcustomer#smart7 arcust ON arcust.customer = smreq.customer_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02','yyyy/mm/dd')
GROUP BY smlog.requestor_type;

These are observations, not an answer
SELECT
smreq.request_id AS ROIrequestID
FROM roi.sm_request_sp_data reqsp
LEFT JOIN roi.sm_requests smreq ON reqsp.request_id = smreq.request_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02', 'yyyy/mm/dd')
The WHERE clause completely overrides that LEFT JOIN: any row where the join produced NULLs for smreq fails the ORIG_DT predicates and is discarded, so the result is the same as an INNER JOIN. Use an INNER JOIN instead.
In the WHERE clause it isn't clear whether you want one day's data ('2016/03/01') or two days' (both '2016/03/01' and '2016/03/02'). If you expect just one day, use < rather than <= in the second predicate.
For the rest, we have no factual basis for making recommendations.
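The first observation can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module in place of the original databases; the table names, columns, and rows are made up for illustration. It runs the same filter once with a LEFT JOIN and once with an INNER JOIN and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for sm_request_sp_data and sm_requests.
    CREATE TABLE sp_data  (request_id INTEGER);
    CREATE TABLE requests (request_id INTEGER, orig_dt TEXT);
    INSERT INTO sp_data  VALUES (1), (2), (3);
    INSERT INTO requests VALUES (1, '2016-03-01'), (2, '2016-04-15');
""")

QUERY = """
    SELECT s.request_id
    FROM sp_data s
    {join_type} JOIN requests r ON r.request_id = s.request_id
    WHERE r.orig_dt >= '2016-03-01' AND r.orig_dt < '2016-03-02'
    ORDER BY s.request_id
"""

# Request 3 has no match, so r.orig_dt is NULL and the WHERE clause drops it.
left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()
inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()

print(left_rows, inner_rows)
```

Both queries return only request 1, which is why the LEFT JOIN buys nothing here. The example also uses a half-open range (>= start, < next day), which selects a single day.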

Related

ACCESS SQL - Joining 2 tables on Datetime values

I have an issue joining 2 tables with datetime values in Access.
I tried to join the tables by simply setting
LEFT JOIN Table1.Datetime=Table2.Datetime
However, the output of my query is really off.
I then tried to join by splitting the dates:
LEFT JOIN YEAR(Table1.Datetime)=YEAR(Table2.Datetime)
AND MONTH(Table1.Datetime)=MONTH(Table2.Datetime)
AND DAY(Table1.Datetime)=DAY(Table2.Datetime)
AND HOUR(Table1.Datetime)=HOUR(Table2.Datetime)
Running it this way, the query seems stuck and I never get any results.
I then tried joining both tables on a condition like:
LEFT JOIN Table1.Datetime>=Table2.Datetime
AND Table1.Datetime<Table2.Datetime + 1/24
I'm running out of ideas for my join to effectively work, any help would be much appreciated !

DateTime is based on Double, and you can't just check such values for equality because of potential floating point errors.
Try something like this:
LEFT JOIN Abs(Table1.Datetime-Table2.Datetime) < #00:00:01#
or:
LEFT JOIN DateDiff("s", Table1.Datetime, Table2.Datetime) = 0
or:
LEFT JOIN Format(Table1.Datetime, "yyyymmddhhnnss") = Format(Table2.Datetime, "yyyymmddhhnnss")
These may be too slow, however. If so, join two simple select queries, one for each table, having:
Format(Table1.Datetime, "yyyymmddhhnnss") As TextTime - and
Format(Table2.Datetime, "yyyymmddhhnnss") As TextTime
and then join on
query1.TextTime = query2.TextTime
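To see why exact equality fails and a tolerance works, here is a minimal sketch in plain Python rather than Access SQL. It assumes date/time values are stored as fractional days in a double, as described above; the function names and the sample values are hypothetical.

```python
ONE_SECOND = 1.0 / 86400.0  # one second expressed as a fraction of a day


def within_one_second(a, b, tolerance=ONE_SECOND):
    """Compare two day-based doubles with a tolerance instead of ==."""
    return abs(a - b) < tolerance


def left_join_on_datetime(left, right):
    """Pair each left value with every right value within the tolerance.

    Left values without a match are kept and paired with None,
    mirroring what a LEFT JOIN would return.
    """
    result = []
    for a in left:
        matches = [b for b in right if within_one_second(a, b)]
        if matches:
            result.extend((a, b) for b in matches)
        else:
            result.append((a, None))
    return result


table1 = [42430.5, 42431.25]
# The first value differs from 42430.5 only by floating point noise.
table2 = [42430.5 + 1e-9, 42440.0]

print(table1[0] == table2[0])  # exact equality misses the match
print(left_join_on_datetime(table1, table2))
```

The tolerance mirrors the Abs(...) < #00:00:01# condition; a different granularity (for example one hour) would only change the constant.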

What are the possible ways to optimize the below postgreSQL code?

I have written this SQL query to fetch data from a Greenplum data lake. The primary table has only about 800,000 rows, which I am joining with other tables. The query below takes an extremely long time to return a result. What might be the reason for the long query time, and how can I resolve it?
select
a.pole,
t.country_name,
a.service_area,
a.park_name,
t.turbine_platform_name,
a.turbine_subtype,
a.pad as "turbine_name",
t.system_number as "turbine_id",
a.customer,
a.service_contract,
a.component,
c.vendor_mfg as "component_manufacturer",
a.case_number,
a.description as "case_description",
a.rmd_diagnosis as "case_rmd_diagnostic_description",
a.priority as "case_priority",
a.status as "case_status",
a.actual_rootcause as "case_actual_rootcause",
a.site_trends_feedback as "case_site_feedback",
a.added as "date_case_added",
a.start as "date_case_started",
a.last_flagged as "date_case_flagged_by_algorithm_latest",
a.communicated as "date_case_communicated_to_field",
a.field_visible_date as "date_case_field_visbile_date",
a.fixed as "date_anamoly_fixed",
a.expected_clse as "date_expected_closure",
a.request_closure_date as "date_case_request_closure",
a.validation_date as "date_case_closure",
a.production_related,
a.estimated_value as "estimated_cost_avoidance",
a.cms,
a.anomaly_category,
a.additional_information as "case_additional_information",
a.model,
a.full_model,
a.sent_to_field as "case_sent_to_field"
from app_pul.anomaly_stage a
left join ge_cfg.turbine_detail t on a.scada_number = t.system_number and a.added > '2017-12-31'
left join tbwgr_v.pmt_wmf_tur_component_master_t c on a.component = c.component_name

Your query is basically:
select . . .
from app_pul.anomaly_stage a left join
ge_cfg.turbine_detail t
on a.scada_number = t.system_number and
a.added > '2017-12-31' left join
tbwgr_v.pmt_wmf_tur_component_master_t c
on a.component = c.component_name
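First, the condition on a does not filter anything: a is the first (preserved) table in the left join and the condition is in the on clause, so rows of a that fail it are still returned, just without a match from t. I assume you actually intend it to filter, so write the query as: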
select . . .
from app_pul.anomaly_stage a left join
ge_cfg.turbine_detail t
on a.scada_number = t.system_number left join
tbwgr_v.pmt_wmf_tur_component_master_t c
on a.component = c.component_name
where a.added > '2017-12-31'
That might help with performance. Then in Postgres, you would want indexes on turbine_detail(system_number) and pmt_wmf_tur_component_master_t(component_name). It is doubtful that an index would help on the first table, because you are already selecting a large amount of data.
I'm not sure if indexes would be appropriate in Greenplum.
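The difference between putting the date condition in the on clause and in the where clause can be seen on a toy data set. This is only a sketch using Python's sqlite3 module, not Greenplum, and the two small tables are hypothetical stand-ins for anomaly_stage and turbine_detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily reduced versions of the two tables.
    CREATE TABLE anomaly (scada_number INTEGER, added TEXT);
    CREATE TABLE turbine (system_number INTEGER, country_name TEXT);
    INSERT INTO anomaly VALUES (1, '2017-06-01'), (2, '2018-02-01');
    INSERT INTO turbine VALUES (1, 'DE'), (2, 'US');
""")

# Condition in ON: the old anomaly row is still returned, only unmatched.
on_rows = conn.execute("""
    SELECT a.scada_number, t.country_name
    FROM anomaly a
    LEFT JOIN turbine t
      ON a.scada_number = t.system_number AND a.added > '2017-12-31'
    ORDER BY a.scada_number
""").fetchall()

# Condition in WHERE: the old anomaly row is filtered out.
where_rows = conn.execute("""
    SELECT a.scada_number, t.country_name
    FROM anomaly a
    LEFT JOIN turbine t ON a.scada_number = t.system_number
    WHERE a.added > '2017-12-31'
    ORDER BY a.scada_number
""").fetchall()

print(on_rows)
print(where_rows)
```

With the condition in the on clause every anomaly row comes back, which is consistent with the original query returning (and transferring) far more rows than intended.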

Verify that the joins use the respective primary and foreign keys.
Try executing the query while removing one left join after another, so you can see where the problem lies.
Try looking at the execution plan.

Date filter in hive while doing left outer join

I am building a query in Hive; the query is given below.
Select * from CSS407
LEFT OUTER JOIN PROD_CORE.SERV_ACCT_ISVC_LINK SASP
ON CSS407.TABLE_ABBRV_CODE = 'SACT'
AND CSS407.EVENT_ITEM_REF_NUM = SASP.Serv_Acct_Id
AND to_date(CSS407.EVENT_RTS_VAL) >= SASP.Acct_Serv_Pnt_Strt_Dt
AND to_date(CSS407.EVENT_RTS_VAL) < SASP.Acct_Serv_Pnt_End_Dt
LEFT OUTER JOIN PROD_CORE.CUST_ACCT_SA_LINK ASA
ON CSS407.TABLE_ABBRV_CODE = 'SACT'
AND CSS407.EVENT_ITEM_REF_NUM = ASA.Serv_Acct_Id
AND CSS407.EVENT_RTS_VAL_UTC_DTTM >= ASA.Acct_Relt_Strt_Dttm
AND CSS407.EVENT_RTS_VAL_UTC_DTTM < ASA.Acct_Relt_End_Dttm
LEFT OUTER JOIN PROD_CORE.CUST_SA_LINK ASAT
ON CSS407.TABLE_ABBRV_CODE = 'TACT'
AND CSS407.EVENT_ITEM_REF_NUM = ASAT.Serv_Acct_Id
AND CSS407.EVENT_RTS_VAL_UTC_DTTM >= ASAT.Acct_Relt_Strt_Dttm
AND CSS407.EVENT_RTS_VAL_UTC_DTTM < ASAT.Acct_Relt_End_Dttm
When I execute the above query in Hive, I get the error below:
"Both left and right aliases encountered in JOIN 'SASP'"
On further investigation I found that we cannot use a date range filter in the join's ON condition. Every post suggests moving that filter to the WHERE clause.
But in our case, if we move the date range filter to the WHERE clause, we get no data, because the rows that the left outer join fails to match are filtered out.
I get this issue only in Hive; the query works fine in Teradata and Oracle.
Please help.

Only equality (=) works in a join condition in Hive. Move the >= and < comparisons to the WHERE clause.
I had a similar issue earlier. Please check the thread below:
Hive Select MAX() in Join Condition
Hope this helps.

There might be some common column between CSS407 and SERV_ACCT_ISVC_LINK which might be creating this error.
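One possible way to reconcile "only equality in ON" with the asker's need to keep unmatched rows is to apply the range filter inside an inner-joined subquery and then left join that subquery back on a key. The sketch below illustrates the idea with Python's sqlite3 module, not Hive, and has not been run on Hive. It assumes the event table has a unique key, here the hypothetical column event_id; all table names, column names, and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for CSS407 and one of the link tables.
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, code TEXT,
                         item_ref INTEGER, ts TEXT);
    CREATE TABLE links  (serv_acct_id INTEGER, start_dt TEXT,
                         end_dt TEXT, label TEXT);
    INSERT INTO events VALUES
        (1, 'SACT', 10, '2020-01-15'),
        (2, 'SACT', 10, '2021-06-01'),
        (3, 'TACT', 10, '2020-01-15'),
        (4, 'SACT', 99, '2020-01-15');
    INSERT INTO links VALUES
        (10, '2020-01-01', '2020-02-01', 'A'),
        (10, '2020-03-01', '2020-04-01', 'B');
""")

# Reference: range predicates inside ON (what Teradata and Oracle accept).
reference = conn.execute("""
    SELECT e.event_id, l.label
    FROM events e
    LEFT JOIN links l
      ON e.code = 'SACT' AND e.item_ref = l.serv_acct_id
     AND e.ts >= l.start_dt AND e.ts < l.end_dt
    ORDER BY e.event_id
""").fetchall()

# Naive move of the range predicates to WHERE: unmatched events vanish.
naive_where = conn.execute("""
    SELECT e.event_id, l.label
    FROM events e
    LEFT JOIN links l ON e.item_ref = l.serv_acct_id
    WHERE e.code = 'SACT' AND e.ts >= l.start_dt AND e.ts < l.end_dt
    ORDER BY e.event_id
""").fetchall()

# Rewrite: filter in an inner-joined subquery, then left join on equality only.
rewritten = conn.execute("""
    SELECT e.event_id, m.label
    FROM events e
    LEFT JOIN (
        SELECT e2.event_id, l.label
        FROM events e2
        JOIN links l ON e2.item_ref = l.serv_acct_id
        WHERE e2.code = 'SACT' AND e2.ts >= l.start_dt AND e2.ts < l.end_dt
    ) m ON e.event_id = m.event_id
    ORDER BY e.event_id
""").fetchall()

print(reference)
print(naive_where)
print(rewritten)
```

On this data the rewritten query returns the same rows as the reference, while the naive version keeps only the single matching event. Whether the approach is practical depends on the real table having a usable unique key.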

CRYSTAL REPORTS - Display all records where value appears multiple times in the table

For example, I have a report with twenty columns and I want to return ONLY the rows where the value in Column 9 appears more than once. I'm fairly new to this and so far I have not figured out an answer.
Other selection criteria includes date range, and service details. The end result would be a report of members who received service more than once during the identified period.
I have seen a couple of examples returning only one column but I do not know how to apply that logic to my scenario.
SELECT "MASTERS"."MEMBNAME", "MASTERS"."MEMBID", "MASTERS"."OPT", "MASTERS"."HPCODE", "MASTERS"."CLAIMNO", "MASTERS"."CROSSREF_ID", "DETAILS"."FROMDATESVC", "DETAILS"."TODATESVC", "MASTERS"."ADMDATE", "MASTERS"."DSCHDATE", "MASTERS"."DATERECD", "DETAILS"."DIAGCODE", "DIAG_CODES"."DIAGDESC", "MASTERS"."PLACESVC", "DETAILS"."PROCCODE", "DETAILS"."HSERVICECD", "DETAILS"."PROCDESC", "DETAILS"."HSERVICEDESC", "P_MASTERS"."FULLNAME", "V_MASTERS"."VENDORNM", "MASTERS"."SPEC", "P_MASTERS"."CLASS", "DETAILS"."BILLED", "DETAILS"."CONTRVAL", "DETAILS"."ADJUST", "DETAILS"."NET", "DETAILS"."INTEREST", "DETAILS"."QTY", "DETAILS"."ADJCODE", "MASTERS"."COMPANY_ID", "MEMB_COMPANY_V"."BIRTH", "ADJUST_CODES_V"."DESCR", "MEMB_COMPANY_V"."SEX", "P_MASTERS_1"."REV_FULLNAME", "MEMB_COMPANY_V"."OPFROMDT", "MEMB_COMPANY_V"."OPTHRUDT", "V_MASTERS"."VENDORID", "P_MASTERS"."CONTRACT", "ME_V"."MEMOLINE1", "DETAILS"."COPAY", "DETAILS"."SEQUENCE", "DETAILS"."DATEPAID", "DETAILS"."CHECKNO", "P_MASTERS_1"."ACCOUNT", "MASTERS"."ADMTYPE", "MASTERS"."ADMSOURCE", "MASTERS"."CONTRVAL", "MASTERS"."STATUS", "MASTERS"."DATEPAID", "MASTERS"."CHPREFIX", "MASTERS"."NET"
FROM ((((((("Datawarehouse"."dbo"."MASTERS" "MASTERS" INNER JOIN "Datawarehouse"."dbo"."DETAILS" "DETAILS" ON "MASTERS"."CLAIMNO"="DETAILS"."CLAIMNO") INNER JOIN "Datawarehouse"."dbo"."V_MASTERS" "V_MASTERS" ON "MASTERS"."VENDOR"="V_MASTERS"."VENDORID") INNER JOIN "Datawarehouse"."dbo"."P_MASTERS" "P_MASTERS" ON ("MASTERS"."COMPANY_ID"="P_MASTERS"."COMPANY_ID") AND ("MASTERS"."PROVID"="P_MASTERS"."PROVID")) INNER JOIN "Datawarehouse"."dbo"."MEMB_COMPANY_V" "MEMB_COMPANY_V" ON ("MASTERS"."COMPANY_ID"="MEMB_COMPANY_V"."COMPANY_ID") AND ("MASTERS"."MEMBID"="MEMB_COMPANY_V"."MEMBID")) LEFT OUTER JOIN "Datawarehouse"."dbo"."ME_V" "ME_V" ON ("MASTERS"."CLAIMNO"="ME_V"."CLAIMNO") AND ("MASTERS"."COMPANY_ID"="ME_V"."COMPANY_ID")) INNER JOIN "Datawarehouse"."dbo"."DIAG_CODES" "DIAG_CODES" ON "DETAILS"."DIAGCODE"="DIAG_CODES"."DIAGCODE") LEFT OUTER JOIN "Datawarehouse"."dbo"."ADJUST_CODES_V" "ADJUST_CODES_V" ON "DETAILS"."ADJCODE"="ADJUST_CODES_V"."CODE") LEFT OUTER JOIN "Datawarehouse"."dbo"."P_MASTERS" "P_MASTERS_1" ON ("MEMB_COMPANY_V"."COMPANY_ID"="P_MASTERS_1"."COMPANY_ID") AND ("MEMB_COMPANY_V"."PCP"="P_MASTERS_1"."PROVID")
WHERE ("MASTERS"."STATUS"='9' AND "MASTERS"."COMPANY_ID"='LWDLOM' AND ("DETAILS"."ADJCODE" IS NULL OR NOT ("DETAILS"."ADJCODE" LIKE 'D%' OR "DETAILS"."ADJCODE" LIKE 'KILL%')) AND "DETAILS"."NET"=0 AND "P_MASTERS"."CLASS"='51' AND ("MASTERS"."HPCODE"='GSMH' OR "MASTERS"."HPCODE"='HENS' OR "MASTERS"."HPCODE"='SCAS') AND "MASTERS"."CONTRVAL"<>0 OR "MASTERS"."CHPREFIX"=2 AND "MASTERS"."STATUS"='9' AND "MASTERS"."COMPANY_ID"='LWDLOM' AND ("DETAILS"."ADJCODE" IS NULL OR NOT ("DETAILS"."ADJCODE" LIKE 'D%' OR "DETAILS"."ADJCODE" LIKE 'KILL%')) AND "P_MASTERS"."CLASS"<>'51' AND ("MASTERS"."HPCODE"='GSMH' OR "MASTERS"."HPCODE"='HENS' OR "MASTERS"."HPCODE"='SCAS') AND "MASTERS"."NET"<>0) AND ("DETAILS"."FROMDATESVC">={ts '2014-01-01 00:00:00'} AND "DETAILS"."FROMDATESVC"<{ts '2015-12-31 00:00:01'}) AND "MASTERS"."DATEPAID"<{ts '2015-05-31 00:00:01'}
ORDER BY "MASTERS"."CLAIMNO", "DETAILS"."SEQUENCE"

This is a simplified query compared to yours. Select column 9 together with its COUNT(), and add the GROUP BY and HAVING after the WHERE clause and before the ORDER BY clause.
SELECT YourColumn9Name, COUNT(YourColumn9Name) AS Occurrences
FROM Masters
GROUP BY YourColumn9Name HAVING (COUNT(YourColumn9Name)>1)
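The grouped query returns one row per repeated value, whereas the question asks for every row whose value is repeated. One common way to get that is to use the grouped query as a subquery in an IN filter. The sketch below shows the pattern with Python's sqlite3 module; the claims table, its columns, and its rows are hypothetical and much smaller than the report's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical reduced claims table; membid plays the role of column 9.
    CREATE TABLE claims (claimno TEXT, membid TEXT, svc_date TEXT);
    INSERT INTO claims VALUES
        ('C1', 'M1', '2014-01-05'),
        ('C2', 'M2', '2014-02-01'),
        ('C3', 'M1', '2014-03-10'),
        ('C4', 'M3', '2014-04-20');
""")

# Keep every row whose membid occurs more than once in the table.
rows = conn.execute("""
    SELECT c.claimno, c.membid, c.svc_date
    FROM claims c
    WHERE c.membid IN (
        SELECT membid
        FROM claims
        GROUP BY membid
        HAVING COUNT(*) > 1
    )
    ORDER BY c.claimno
""").fetchall()

print(rows)
```

In the real report, the subquery would need the same date range and service criteria as the outer query, otherwise services outside the period would count towards "more than once".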

Having difficulty combining JET SQL queries

Warning: Here be beginner SQL! Be gentle...
I have two queries that independently give me what I want from the relevant tables in a reasonably timely fashion, but when I try to combine the two in a (fugly) union, things quickly fall to bits and the query either gives me duplicate records, takes an inordinately long time to run, or refuses to run at all quoting various syntax errors at me.
Note: I had to create a 'dummy' table (tblAllDates) with a single field containing dates from 1 Jan 2008 as I need the query to return a single record from each day, and there are days in both tables that have no data. This is the only way I could figure to do this, no doubt there is a smarter way...
Here are the queries:
SELECT tblAllDates.date, SUM(tblvolumedata.STT)
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date;
SELECT tblAllDates.date, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date;
The best result I have managed is the following:
SELECT tblAllDates.date, 0 AS STT, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date
UNION SELECT tblAllDates.date, SUM(tblvolumedata.STT) AS STT, 0 AS VA
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date;
This gives me the VA and STT data I want, but in two records where I have data from both in a single day, like this:
date STT VA
28/07/2008 0 54020
28/07/2008 33812 0
29/07/2008 0 53890
29/07/2008 33289 0
30/07/2008 0 51780
30/07/2008 30456 0
31/07/2008 0 52790
31/07/2008 31305 0
What I'm after is the STT and VA data in single row per day. How might this be achieved, and how far am I away from a query that could be considered optimal? (don't laugh, I only seek to learn!)

You could put all of that into one query like so
SELECT
dates.date,
SUM(volume.STT) AS STT,
SUM(NZ(timesheet.batching)+NZ(timesheet.categorisation)+NZ(timesheet.CDT)+NZ(timesheet.CSI)+NZ(timesheet.destruction)+NZ(timesheet.extraction)+NZ(timesheet.indexing)+NZ(timesheet.mail)+NZ(timesheet.newlodgement)+NZ(timesheet.recordedDeliveries)+NZ(timesheet.retrieval)+NZ(timesheet.scanning)) AS VA
FROM
tblAllDates dates
LEFT JOIN tblvolumedata volume
ON dates.date = volume.date
LEFT JOIN tblTimesheetData timesheet
ON
dates.date = timesheet.date
GROUP BY dates.date;
I've put the dates table first in the FROM clause and then LEFT JOINed the two other tables.
The jet database can be funny with more than one join in a query, so you may need to wrap one of the joins in parentheses (I believe this is referred to as Bill's SQL!) - I would recommend LEFT JOINing the tables in the query builder and then taking the SQL code view and modifying that to add in the SUMs, GROUP BY, etc.
EDIT:
Ensure that the date field in each table is indexed as you're joining each table on this field.
EDIT 2:
How about this -
SELECT date,
Sum(STT),
Sum(VA)
FROM
(SELECT dates.date, 0 AS STT, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates dates ON tblTimesheetData.date=dates.date
GROUP BY dates.date
UNION SELECT dates.date, SUM(tblvolumedata.STT) AS STT, 0 AS VA
FROM tblvolumedata RIGHT JOIN tblAllDates dates ON tblvolumedata.date=dates.date
GROUP BY dates.date
)
GROUP BY date;
Interestingly, when I ran my first statement against some test data, the figures for STT and VA had all been multiplied by 4 compared to the second statement. Very strange behaviour and certainly not what I expected.
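A likely explanation for the multiplied figures is join fan-out: when both detail tables have several rows per date, joining them on the date pairs every volume row with every timesheet row before the SUM runs. The sketch below reproduces the effect with Python's sqlite3 module instead of Jet, using hypothetical tables and a single timesheet column, and shows that aggregating each table per date before joining avoids it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical reduced versions of tblAllDates, tblvolumedata, tblTimesheetData.
    CREATE TABLE all_dates (d TEXT);
    CREATE TABLE volume    (d TEXT, stt INTEGER);
    CREATE TABLE timesheet (d TEXT, scanning INTEGER);
    INSERT INTO all_dates VALUES ('2008-07-28'), ('2008-07-29');
    INSERT INTO volume    VALUES ('2008-07-28', 100), ('2008-07-28', 200);
    INSERT INTO timesheet VALUES ('2008-07-28', 10), ('2008-07-28', 20),
                                 ('2008-07-28', 30);
""")

# Joining both detail tables directly: 2 x 3 = 6 rows for the first date,
# so each sum is inflated by the other table's row count.
fan_out = conn.execute("""
    SELECT a.d, SUM(v.stt), SUM(t.scanning)
    FROM all_dates a
    LEFT JOIN volume v    ON a.d = v.d
    LEFT JOIN timesheet t ON a.d = t.d
    GROUP BY a.d
    ORDER BY a.d
""").fetchall()

# Aggregating each table per date first keeps one row per date on each side.
pre_aggregated = conn.execute("""
    SELECT a.d, v.stt, t.va
    FROM all_dates a
    LEFT JOIN (SELECT d, SUM(stt) AS stt FROM volume GROUP BY d) v
           ON a.d = v.d
    LEFT JOIN (SELECT d, SUM(scanning) AS va FROM timesheet GROUP BY d) t
           ON a.d = t.d
    ORDER BY a.d
""").fetchall()

print(fan_out)
print(pre_aggregated)
```

Here the true totals for the first date are 300 and 60, but the direct join reports 900 and 120. The UNION-based statement avoids the problem for the same reason: each table is summed on its own before the results are combined.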

The table of dates is the best way.
Combine the joins in the FROM clause. Something like this:
SELECT d.date,
a.value,
b.value
FROM tableOfDates d
LEFT JOIN firstTable a
ON d.date = a.date
LEFT JOIN secondTable b
ON d.date = b.date

Turn the SQL into views and join them on the dates.
