How to use WHERE clauses for a complex JOIN query

I'm trying to make my data set smaller; I'm currently bringing in data from 8 different tables. To do this, I'd like to use the WHERE clause to filter out unnecessary data, but I'm not sure how to do that across all 8 tables. This is my current query:
--GroupA first, to join the hits and sessions tables
SELECT
GroupA_hits.session_id, GroupA_hits.hits_eventInfo_eventCategory, GroupA_hits.hits_eventInfo_eventAction, GroupA_hits.hits_eventInfo_eventLabel, GroupA_hits.cd126_hit_placeholder,
GroupA_sessions.session_id, GroupA_sessions.userId, GroupA_sessions.fullVisitorId, GroupA_sessions.visitNumber, GroupA_sessions.date,
GroupB_hits.session_id, GroupB_hits.hits_eventInfo_eventCategory, GroupB_hits.hits_eventInfo_eventAction, GroupB_hits.hits_eventInfo_eventLabel, GroupB_hits.cd126_hit_placeholder,
GroupB_sessions.session_id, GroupB_sessions.userId, GroupB_sessions.fullVisitorId, GroupB_sessions.visitNumber, GroupB_sessions.date,
GroupC_hits.session_id, GroupC_hits.hits_eventInfo_eventCategory, GroupC_hits.hits_eventInfo_eventAction, GroupC_hits.hits_eventInfo_eventLabel, GroupC_hits.cd126_hit_placeholder,
GroupC_sessions.session_id, GroupC_sessions.userId, GroupC_sessions.fullVisitorId, GroupC_sessions.visitNumber, GroupC_sessions.date,
GroupD_hits.session_id, GroupD_hits.hits_eventInfo_eventCategory, GroupD_hits.hits_eventInfo_eventAction, GroupD_hits.hits_eventInfo_eventLabel, GroupD_hits.cd126_hit_placeholder,
GroupD_sessions.session_id, GroupD_sessions.userId, GroupD_sessions.fullVisitorId, GroupD_sessions.visitNumber, GroupD_sessions.date
FROM `GroupA-bigquery.170369603.ga_flat_hits_202104*` GroupA_hits
LEFT JOIN `GroupA-bigquery.170369603.ga_flat_sessions_202104*` GroupA_sessions
ON (
GroupA_hits.session_id = GroupA_sessions.session_id
)
--Next, join GroupB to GroupA
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_hits_202104*` GroupB_hits
ON (
GroupB_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_sessions_202104*` GroupB_sessions
ON (
GroupB_sessions.session_id = GroupA_sessions.session_id
)
--Now, join GroupC to GroupA
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_hits_202104*` GroupC_hits
ON (
GroupC_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_sessions_202104*` GroupC_sessions
ON (
GroupC_sessions.session_id = GroupA_sessions.session_id
)
--Next, join GroupD to GroupA
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_hits_202104*` GroupD_hits
ON (
GroupD_hits.session_id = GroupA_hits.session_id
)
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_sessions_202104*` GroupD_sessions
ON (
GroupD_sessions.session_id = GroupA_sessions.session_id
)
I would also like to include the clauses below; they use the same column names in each of the different _hits tables. This is what I've tried, but I get "This query returned no results" back. I think it's because, the way this query is written, BigQuery is looking for a row where all of these exist in one hit (is my assumption), and there won't be any. I'd like it to look through these four tables and grab all matching rows.
WHERE GroupA_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupB_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupC_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupD_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupA_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupB_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupC_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupD_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupA_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupB_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupC_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupD_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupA_hits.cd126_hit_placeholder Is Not NULL
AND GroupB_hits.cd126_hit_placeholder Is Not NULL
AND GroupC_hits.cd126_hit_placeholder Is Not NULL
AND GroupD_hits.cd126_hit_placeholder Is Not NULL

Consider moving the WHERE conditions for the GroupB, GroupC and GroupD hits tables into their ON clauses, so that those tables are filtered during the LEFT JOIN operations. The GroupA_hits conditions have to stay in the WHERE clause: GroupA_hits is the driving (left-hand) table, and a condition on the left-hand table inside a LEFT JOIN's ON clause does not remove any of its rows.
...
FROM `GroupA-bigquery.170369603.ga_flat_hits_202104*` GroupA_hits
LEFT JOIN `GroupA-bigquery.170369603.ga_flat_sessions_202104*` GroupA_sessions
ON GroupA_hits.session_id = GroupA_sessions.session_id
--Next, join GroupB to GroupA
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_hits_202104*` GroupB_hits
ON GroupB_hits.session_id = GroupA_hits.session_id
AND GroupB_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupB_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupB_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupB_hits.cd126_hit_placeholder Is Not NULL
LEFT JOIN `GroupB-bigquery.170359716.ga_flat_sessions_202104*` GroupB_sessions
ON GroupB_sessions.session_id = GroupA_sessions.session_id
--Now, join GroupC to GroupA
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_hits_202104*` GroupC_hits
ON GroupC_hits.session_id = GroupA_hits.session_id
AND GroupC_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupC_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupC_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupC_hits.cd126_hit_placeholder Is Not NULL
LEFT JOIN `GroupC-bigquery.170726426.ga_flat_sessions_202104*` GroupC_sessions
ON GroupC_sessions.session_id = GroupA_sessions.session_id
--Next, join GroupD to GroupA
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_hits_202104*` GroupD_hits
ON GroupD_hits.session_id = GroupA_hits.session_id
AND GroupD_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupD_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupD_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupD_hits.cd126_hit_placeholder Is Not NULL
LEFT JOIN `GroupD-bigquery.170374765.ga_flat_sessions_202104*` GroupD_sessions
ON GroupD_sessions.session_id = GroupA_sessions.session_id
--Finally, filter the driving table itself
WHERE GroupA_hits.hits_eventInfo_eventCategory = 'rewards'
AND GroupA_hits.hits_eventInfo_eventAction = 'redeem points confirm'
AND GroupA_hits.hits_eventInfo_eventLabel = 'gas savings'
AND GroupA_hits.cd126_hit_placeholder Is Not NULL
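Where a filter sits relative to a LEFT JOIN changes the result, and this is easy to check on a toy example. The following is a minimal sketch using Python's built-in sqlite3 module, not BigQuery; the tables a_hits and b_hits and their rows are hypothetical stand-ins for two of the _hits tables. The outer-join semantics shown are standard SQL, so BigQuery should behave the same way, but that is worth confirming on the real data.

```python
import sqlite3

# Hypothetical stand-ins for two of the _hits tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_hits (session_id TEXT, category TEXT);
CREATE TABLE b_hits (session_id TEXT, category TEXT);
INSERT INTO a_hits VALUES ('s1', 'rewards'), ('s2', 'other');
INSERT INTO b_hits VALUES ('s1', 'other'), ('s2', 'rewards');
""")

# Filter on the right-hand table inside ON: every a_hits row survives,
# and b_hits columns are NULL where the filter rejected the match.
on_filter = conn.execute("""
SELECT a.session_id, b.category
FROM a_hits a
LEFT JOIN b_hits b
  ON b.session_id = a.session_id AND b.category = 'rewards'
ORDER BY a.session_id
""").fetchall()

# The same filter in WHERE discards rows where b.category is NULL,
# which makes the LEFT JOIN behave like an INNER JOIN.
where_filter = conn.execute("""
SELECT a.session_id, b.category
FROM a_hits a
LEFT JOIN b_hits b
  ON b.session_id = a.session_id
WHERE b.category = 'rewards'
ORDER BY a.session_id
""").fetchall()

# A condition on the left-hand (driving) table inside ON removes none of its rows;
# it only decides whether a right-hand match is attached.
left_in_on = conn.execute("""
SELECT a.session_id, b.category
FROM a_hits a
LEFT JOIN b_hits b
  ON b.session_id = a.session_id AND a.category = 'rewards'
ORDER BY a.session_id
""").fetchall()

print(on_filter)     # [('s1', None), ('s2', 'rewards')]
print(where_filter)  # [('s2', 'rewards')]
print(left_in_on)    # [('s1', 'other'), ('s2', None)]
```

The third query is the reason a condition on the FROM table belongs in WHERE if it is meant to shrink the result.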

BigQuery is looking for a row where all of these exist in one hit (is my assumption), which, there won't be any.
It sounds like you want the conditions for the four tables combined with OR rather than AND. A compact way to write that is:
WHERE
'rewards' IN (GroupA_hits.hits_eventInfo_eventCategory, GroupB_hits.hits_eventInfo_eventCategory, GroupC_hits.hits_eventInfo_eventCategory, GroupD_hits.hits_eventInfo_eventCategory)
AND 'redeem points confirm' IN (GroupA_hits.hits_eventInfo_eventAction, GroupB_hits.hits_eventInfo_eventAction, GroupC_hits.hits_eventInfo_eventAction, GroupD_hits.hits_eventInfo_eventAction)
AND 'gas savings' IN (GroupA_hits.hits_eventInfo_eventLabel, GroupB_hits.hits_eventInfo_eventLabel, GroupC_hits.hits_eventInfo_eventLabel, GroupD_hits.hits_eventInfo_eventLabel)
AND COALESCE(GroupA_hits.cd126_hit_placeholder, GroupB_hits.cd126_hit_placeholder, GroupC_hits.cd126_hit_placeholder, GroupD_hits.cd126_hit_placeholder) Is Not NULL
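Note that this compact form checks each condition independently, so one table can satisfy the category test while another satisfies the action test. If all four conditions must hold within the same hit, write one parenthesized AND group per table and combine the four groups with OR. I'm also making some assumptions about how BigQuery handles ANSI standard SQL, as I'm not a regular BigQuery user.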
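The difference between the IN-list form and per-table OR groups shows up clearly with a row where the category comes from one table and the action from another. This sketch uses Python's sqlite3 with a single pre-joined table whose a_* and b_* columns are hypothetical stand-ins for two of the _hits tables; the names and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One pre-joined row per case; a_* and b_* mimic two of the _hits tables.
conn.executescript("""
CREATE TABLE joined (id INTEGER, a_cat TEXT, a_act TEXT, b_cat TEXT, b_act TEXT);
INSERT INTO joined VALUES
  (1, 'rewards', 'redeem points confirm', NULL, NULL),       -- A alone matches
  (2, 'rewards', 'other', 'other', 'redeem points confirm'), -- split across A and B
  (3, 'other', 'other', 'other', 'other');                   -- no match
""")

# IN-list form: each condition may be satisfied by a different table.
in_form = [r[0] for r in conn.execute("""
SELECT id FROM joined
WHERE 'rewards' IN (a_cat, b_cat)
  AND 'redeem points confirm' IN (a_act, b_act)
ORDER BY id
""")]

# Per-table AND groups combined with OR: one table must satisfy every condition.
or_groups = [r[0] for r in conn.execute("""
SELECT id FROM joined
WHERE (a_cat = 'rewards' AND a_act = 'redeem points confirm')
   OR (b_cat = 'rewards' AND b_act = 'redeem points confirm')
ORDER BY id
""")]

print(in_form)    # [1, 2]
print(or_groups)  # [1]
```

Row 2 passes the IN-list version but not the grouped version, which is the case to watch for if a single hit must carry all the values.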


I want to output a table based on what the user inputs

The user is required to input an APP_ID, but MILESTONE_ID and TASK_ID are optional. If only MILESTONE_ID is entered, the output should include the milestone only, not the task.
For example, APP_ID: 1, MILESTONE_ID: 341, TASK_ID: (blank) = output should be app_id and milestone only.
When APP_ID: 1, MILESTONE_ID: (blank), TASK_ID: 441 = output should be app_id and task only.
Lastly, when the user inputs APP_ID: 1, MILESTONE_ID: 341, TASK_ID: 441 = output should be app_id, task and milestone.
My current query is below:
SELECT APPLICATION, MILESTONE_NAME, TASK_NAME, FIELD_NAME, FIELD_ALIAS
FROM TBL_APPLICATIONS A
INNER JOIN TBL_WORKFLOWS B ON B.APPLICATION_FK = A.APPLICATION_PK
INNER JOIN TBL_WORKFLOW_DEFINITION C ON C.WORKFLOW_FK = B.WORKFLOW_PK
INNER JOIN TBL_MILESTONE D ON D.MILESTONE_PK = C.START_MILESTONE_FK OR D.MILESTONE_PK = C.END_MILESTONE_FK
INNER JOIN TBL_TASK_FOR_MILESTONE E ON E.MILESTONE_FK = D.MILESTONE_PK
INNER JOIN TBL_TASK F ON F.TASK_PK = E.TASK_FK
INNER JOIN TBL_REQ_FOR_TASK G ON G.TASK_FK = F.TASK_PK
INNER JOIN TBL_TASK_REQUIREMENTS H ON H.TASK_REQUIREMENT_PK = G.TASK_REQUIREMENT_FK
WHERE APPLICATION_PK = :APPLICATION_ID
OR MILESTONE_PK = :MILESTONE_ID
OR TASK_PK = :TASK_ID
Output looks like this.
Your requirements are slightly unclear. If you want to change the projection of the query or even remove the tables from the FROM clause, that cannot be done in pure SQL. Dynamically assembling queries requires PL/SQL or some other client language.
But if your requirement is simply to suppress a column's value depending on whether a parameter is populated, that can be done by testing each parameter, like this:
SELECT APPLICATION
, case when :MILESTONE_ID is not null then MILESTONE_NAME end as MILESTONE_NAME
, case when :TASK_ID is not null then TASK_NAME end as TASK_NAME
, FIELD_NAME
, FIELD_ALIAS
FROM TBL_APPLICATIONS A
INNER JOIN TBL_WORKFLOWS B ON B.APPLICATION_FK = A.APPLICATION_PK
INNER JOIN TBL_WORKFLOW_DEFINITION C ON C.WORKFLOW_FK = B.WORKFLOW_PK
INNER JOIN TBL_MILESTONE D ON D.MILESTONE_PK = C.START_MILESTONE_FK OR D.MILESTONE_PK = C.END_MILESTONE_FK
INNER JOIN TBL_TASK_FOR_MILESTONE E ON E.MILESTONE_FK = D.MILESTONE_PK
INNER JOIN TBL_TASK F ON F.TASK_PK = E.TASK_FK
INNER JOIN TBL_REQ_FOR_TASK G ON G.TASK_FK = F.TASK_PK
INNER JOIN TBL_TASK_REQUIREMENTS H ON H.TASK_REQUIREMENT_PK = G.TASK_REQUIREMENT_FK
WHERE APPLICATION_PK = :APPLICATION_ID
and ( :MILESTONE_ID is not null or :TASK_ID is not null )
and ( :MILESTONE_ID is null or MILESTONE_PK = :MILESTONE_ID )
and ( :TASK_ID is null or TASK_PK = :TASK_ID )
Note that I have corrected the WHERE clause to include conditions that only filter on MILESTONE_ID or TASK_ID when they are populated.
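The optional-parameter pattern can be tried outside Oracle. This sketch uses Python's sqlite3, which also accepts :name bind parameters; the single table flat and its rows are hypothetical stand-ins for the result of the joins, and run() is a made-up helper that binds None for a blank parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened rows standing in for the joined workflow tables.
conn.executescript("""
CREATE TABLE flat (application_pk INT, milestone_pk INT, task_pk INT,
                   milestone_name TEXT, task_name TEXT);
INSERT INTO flat VALUES
  (1, 341, 441, 'M341', 'T441'),
  (1, 341, 442, 'M341', 'T442'),
  (1, 342, 441, 'M342', 'T441'),
  (2, 341, 441, 'M341', 'T441');
""")

# Same shape as the answer: a NULL parameter disables its own filter,
# and the CASE expressions blank out columns whose parameter was not given.
QUERY = """
SELECT application_pk,
       CASE WHEN :milestone_id IS NOT NULL THEN milestone_name END AS milestone_name,
       CASE WHEN :task_id IS NOT NULL THEN task_name END AS task_name
FROM flat
WHERE application_pk = :application_id
  AND (:milestone_id IS NOT NULL OR :task_id IS NOT NULL)
  AND (:milestone_id IS NULL OR milestone_pk = :milestone_id)
  AND (:task_id IS NULL OR task_pk = :task_id)
ORDER BY milestone_pk, task_pk
"""

def run(app, milestone=None, task=None):
    # Hypothetical helper: None plays the role of a blank input field.
    params = {"application_id": app, "milestone_id": milestone, "task_id": task}
    return conn.execute(QUERY, params).fetchall()

print(run(1, milestone=341))            # [(1, 'M341', None), (1, 'M341', None)]
print(run(1, task=441))                 # [(1, None, 'T441'), (1, None, 'T441')]
print(run(1, milestone=341, task=441))  # [(1, 'M341', 'T441')]
```

As the first two calls show, blanking a column can leave duplicate rows; adding DISTINCT would collapse them if that is the desired output.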

Get part of the result into columns instead of rows

I have a query about internal and external trainings. Every training has 1 or more speakers. What I want is to have a column for every speaker.
I looked at aggregate functions, but I couldn't find one that solves my problem.
select e.sid
, e.access_kz EXTERN1
, z.zuname VERTRETERPERSONAL1
from MDEV e
join MDEVCAL ec on (ec.klient_id = e.klient_id and ec.EV_sid = e.sid)
left outer join MDEVCALS ecs on (ecs.klient_id = e.klient_id and ecs.ev_sid = e.sid and ecs.evcal_sid = ec.sid)
left outer join MDZHD z on (z.klient_id = e.klient_id and z.sid = ecs.MDZHD_SID)
I use left outer joins because a training doesn't have to have a speaker, but it can have any number of different speakers.
What I want to have is something like this:
SID | Extern1 | VERTRETERPERSONAL1 | VERTRETERPERSONAL2 | VERTRETERPERSONAL3 ...
I think the following query will solve your problem, but only with a static list of trainers: Oracle's PIVOT needs the values in its IN list to be written into the query.
SELECT
*
FROM
(
SELECT
E.SID,
E.ACCESS_KZ EXTERN1,
Z.ZUNAME VERTRETERPERSONAL
FROM
MDEV E
JOIN MDEVCAL EC ON ( EC.KLIENT_ID = E.KLIENT_ID
AND EC.EV_SID = E.SID )
LEFT OUTER JOIN MDEVCALS ECS ON ( ECS.KLIENT_ID = E.KLIENT_ID
AND ECS.EV_SID = E.SID
AND ECS.EVCAL_SID = EC.SID )
LEFT OUTER JOIN MDZHD Z ON ( Z.KLIENT_ID = E.KLIENT_ID
AND Z.SID = ECS.MDZHD_SID )
) PIVOT (
MAX ( VERTRETERPERSONAL )
FOR VERTRETERPERSONAL
IN ( '<VERTRETERPERSONAL_NAME1>' AS VERTRETERPERSONAL1, '<VERTRETERPERSONAL_NAME2>' AS VERTRETERPERSONAL2, .... )
);
Cheers!!
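The PIVOT above creates one column per known speaker name. If the goal is instead the positional layout from the question (VERTRETERPERSONAL1, VERTRETERPERSONAL2, ...), a common alternative, not the answer's method, is to number the speakers per training with ROW_NUMBER() and then spread the numbers into columns with conditional aggregation; Oracle supports both. Below is a minimal sketch in Python's sqlite3 (window functions need SQLite 3.25 or later); the speakers table and its rows are hypothetical, and the number of columns is still fixed, here at three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened (training, speaker) rows; training 3 has no speaker,
# as the LEFT OUTER JOINs in the question allow.
conn.executescript("""
CREATE TABLE speakers (sid INT, extern1 TEXT, speaker TEXT);
INSERT INTO speakers VALUES
  (1, 'N', 'Huber'), (1, 'N', 'Maier'), (1, 'N', 'Gruber'),
  (2, 'J', 'Wolf'),
  (3, 'N', NULL);
""")

# Number the speakers within each training, then move numbers 1..3 into columns.
rows = conn.execute("""
WITH numbered AS (
  SELECT sid, extern1, speaker,
         ROW_NUMBER() OVER (PARTITION BY sid ORDER BY speaker) AS rn
  FROM speakers
)
SELECT sid, extern1,
       MAX(CASE WHEN rn = 1 THEN speaker END) AS vertreterpersonal1,
       MAX(CASE WHEN rn = 2 THEN speaker END) AS vertreterpersonal2,
       MAX(CASE WHEN rn = 3 THEN speaker END) AS vertreterpersonal3
FROM numbered
GROUP BY sid, extern1
ORDER BY sid
""").fetchall()

for row in rows:
    print(row)
# (1, 'N', 'Gruber', 'Huber', 'Maier')
# (2, 'J', 'Wolf', None, None)
# (3, 'N', None, None, None)
```

Speakers beyond the third would simply be dropped, so the column count should match the largest number of speakers expected per training.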

Remove row from SQL pivot when fields are null

The query below generates rows with blank values.
-- Simplify the pivot selection query by separating the query using a with clause
WITH pivot_data AS(
SELECT va.identity,
vc.units,
s.field_name "Sample ID",
s.id_text "Lab ID",
TO_CHAR(str.result_value, S_FORMATMASK_PACKAGE.s_FormatMask(vc.analysis, s.id_numeric))result_value
FROM sample s
LEFT OUTER JOIN client c ON c.id = s.client_id
LEFT OUTER JOIN samp_test_result str ON (s.id_numeric = str.id_numeric
AND s.id_text = str.id_text and str.result_value is not null)
LEFT OUTER JOIN versioned_analysis va ON (va.identity = str.analysis)
LEFT OUTER JOIN versioned_component vc ON (vc.analysis = va.identity
AND vc.analysis_version = va.analysis_version
AND vc.name = str.component_name)
WHERE s.fas_sample_type = sample_pkg.get_leaf_sample
AND s.status = sample_pkg.get_authorised_sample
AND s.flg_released = constant_pkg.get_true
AND vc.flg_report = constant_pkg.get_true
AND c.id = UPPER ('I000009')
AND s.ID_NUMERIC between TO_NUMBER(2126) and TO_NUMBER(12917) and str.result_value <> 0
)
SELECT pvt1.*
FROM(SELECT * FROM pivot_data PIVOT (MAX(result_value) result_value, MAX(units) units FOR identity IN('NIR_N' "Nitrogen",
'XRF_P' "Phosphorus",
'XRF_K' "Potassium",
'XRF_CA' "Calcium",
'XRF_MG' "Magnesium",
'XRF_S' "Sulphur",
'XRF_SI' "Silicon",
'XRF_ZN' "Zinc",
'XRF_MN' "Manganese",
'XRF_CU' "Copper",
'XRF_FE' "Iron"))) pvt1
How do I remove a row that contains null fields?

How to get fields from multiple tables

I want to get fields from 2 different tables. The last field, candidate_score_id, has a many-to-one relationship. So how should I join the 2 queries below?
1) To get candidate_score_id from the candidate_score table.
select candidate_score_id from candidate_score a where
a.assessment_id = NEW.assessment_id and
a.candidate_id = NEW.candidate_id and
a.attempt_Count = NEW.attempt_count;
2) To insert different fields into the candidate_score_details table. The candidate_score_id field in this table should be obtained by the query above.
insert into candidate_score_details(candidate_score_details_id, candidate_id, assessment_id, attempt_count, score_type, score_tag,correct, candidate_score_id)
select uuid();
select a.candidate_id, a.assessment_id,a.attempt_count,"BY-COMPLEXITY",
case c.complexity
when 1 then "HIGH"
when 2 then "MEDIUM"
when 3 then "LOW"
end, count(*) from candidate_answer a, answer_key b, question_meta_data c where a.candidate_id = NEW.candidate_id and
a.assessment_id = NEW.assessment_id and
a.attempt_count = NEW.attempt_count and
a.assessment_id = b.assessment_id and
a.question_id = b.question_number and
a.response = b.answer and
a.question_id = c.question_number
group by a.candidate_id, a.assessment_id, a.attempt_count, c.complexity;
Just looking at the SQL joining aspect of your question, you'll need to specify the table. I THINK you're aliasing a second table with the "NEW" reference. If that's the case, the query would be as follows (replace "OTHER_TABLE_NAME" with the name of the second table):
select a.candidate_score_id
from candidate_score a
left join OTHER_TABLE_NAME new on
a.assessment_id = NEW.assessment_id
and a.candidate_id = NEW.candidate_id
and a.attempt_Count = NEW.attempt_count
It seems that Query 1 applies the same 3 criteria to the "candidate_score" table that Query 2 applies to the "candidate_answer" table.
So how about adding a LEFT JOIN of "candidate_score" to "candidate_answer" on those 3 fields?
For example:
INSERT INTO candidate_score_details
(
candidate_score_details_id,
candidate_id,
assessment_id,
attempt_count,
score_type,
score_tag,
correct,
candidate_score_id
)
SELECT
uuid(),
answer.candidate_id,
answer.assessment_id,
answer.attempt_count,
'BY-COMPLEXITY' AS score_type,
(CASE meta.complexity
WHEN 1 THEN 'HIGH'
WHEN 2 THEN 'MEDIUM'
WHEN 3 THEN 'LOW'
END) AS score_tag,
COUNT(*) AS correct,
MAX(score.candidate_score_id) AS max_candidate_score_id
FROM candidate_answer AS answer
JOIN answer_key AS akey
ON (akey.assessment_id = answer.assessment_id AND akey.question_number = answer.question_id AND akey.answer = answer.response)
LEFT JOIN candidate_score AS score
ON (score.candidate_id = answer.candidate_id AND score.assessment_id = answer.assessment_id AND score.attempt_count = answer.attempt_count)
LEFT JOIN question_meta_data AS meta
ON meta.question_number = answer.question_id
WHERE answer.candidate_id = NEW.candidate_id
AND answer.assessment_id = NEW.assessment_id
AND answer.attempt_count = NEW.attempt_count
GROUP BY answer.candidate_id, answer.assessment_id, answer.attempt_count, meta.complexity;
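To check that the INSERT ... SELECT above behaves as intended, here is a sketch that runs it against SQLite through Python's sqlite3. Assumptions: the schemas and sample rows are invented, the :candidate_id, :assessment_id and :attempt_count parameters stand in for the trigger's NEW.* values, and uuid() is registered as a Python function because SQLite has no built-in one. The joins, CASE and MAX(...) follow the answer.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# SQLite has no uuid(); register a Python function under that name.
conn.create_function("uuid", 0, lambda: str(uuid.uuid4()))

# Invented schemas and rows, just enough to exercise the query.
conn.executescript("""
CREATE TABLE candidate_answer (candidate_id INT, assessment_id INT,
  attempt_count INT, question_id INT, response TEXT);
CREATE TABLE answer_key (assessment_id INT, question_number INT, answer TEXT);
CREATE TABLE question_meta_data (question_number INT, complexity INT);
CREATE TABLE candidate_score (candidate_score_id INT, candidate_id INT,
  assessment_id INT, attempt_count INT);
CREATE TABLE candidate_score_details (candidate_score_details_id TEXT,
  candidate_id INT, assessment_id INT, attempt_count INT, score_type TEXT,
  score_tag TEXT, correct INT, candidate_score_id INT);
INSERT INTO candidate_answer VALUES (7, 1, 1, 10, 'A'), (7, 1, 1, 11, 'B'), (7, 1, 1, 12, 'C');
INSERT INTO answer_key VALUES (1, 10, 'A'), (1, 11, 'B'), (1, 12, 'D');
INSERT INTO question_meta_data VALUES (10, 1), (11, 1), (12, 3);
INSERT INTO candidate_score VALUES (99, 7, 1, 1);
""")

# The named parameters play the role of NEW.candidate_id etc. in the trigger.
conn.execute("""
INSERT INTO candidate_score_details (candidate_score_details_id, candidate_id,
  assessment_id, attempt_count, score_type, score_tag, correct, candidate_score_id)
SELECT uuid(), answer.candidate_id, answer.assessment_id, answer.attempt_count,
       'BY-COMPLEXITY',
       CASE meta.complexity WHEN 1 THEN 'HIGH' WHEN 2 THEN 'MEDIUM' WHEN 3 THEN 'LOW' END,
       COUNT(*),
       MAX(score.candidate_score_id)
FROM candidate_answer AS answer
JOIN answer_key AS akey
  ON akey.assessment_id = answer.assessment_id
 AND akey.question_number = answer.question_id
 AND akey.answer = answer.response
LEFT JOIN candidate_score AS score
  ON score.candidate_id = answer.candidate_id
 AND score.assessment_id = answer.assessment_id
 AND score.attempt_count = answer.attempt_count
LEFT JOIN question_meta_data AS meta
  ON meta.question_number = answer.question_id
WHERE answer.candidate_id = :candidate_id
  AND answer.assessment_id = :assessment_id
  AND answer.attempt_count = :attempt_count
GROUP BY answer.candidate_id, answer.assessment_id, answer.attempt_count, meta.complexity
""", {"candidate_id": 7, "assessment_id": 1, "attempt_count": 1})

rows = conn.execute("""
SELECT candidate_id, assessment_id, attempt_count, score_type, score_tag,
       correct, candidate_score_id
FROM candidate_score_details
""").fetchall()
print(rows)  # [(7, 1, 1, 'BY-COMPLEXITY', 'HIGH', 2, 99)]
```

Only questions 10 and 11 are answered correctly and both have complexity 1, so a single HIGH row with a count of 2 is written, carrying the candidate_score_id picked up by the LEFT JOIN.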

Selecting fields from second table in left join

The fields c.Mcc and c.Mnc in the outer SELECT are not getting populated, even though I have those fields in bd:RawDebug.CarrierDetails.
Any help would be appreciated.
SELECT d.Id, d.DebugReason, d.DebugData, d.d1, d.d2, c.Mcc, c.Mnc
FROM
(SELECT
Id, DebugReason, DebugData,
INTEGER(SUBSTR(DebugData,0,3)) AS d1,
SUBSTR(REGEXP_REPLACE(DebugData,'[^a-zA-Z0-9]',' '),4,LENGTH(DebugData)-3) as d2
FROM TABLE_DATE_RANGE([bd:RawDebug.T],TIMESTAMP('2016-05-16'),TIMESTAMP('2016-05-16'))
WHERE DebugReason = 50013 and Id = 550661626 LIMIT 50) AS d
LEFT JOIN
(
SELECT Network, Mcc, STRING(Mnc) as Mnc from [bd:RawDebug.CarrierDetails]
) AS c
ON c.Mcc = d.d1 and c.Mnc = d.d2
LIMIT 50
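If c.Mcc and c.Mnc are NULL in your results, yet they have values in the [bd:RawDebug.CarrierDetails] table, then the only explanation is the LEFT JOIN criteria. Look again at the condition ON c.Mcc = d.d1 and c.Mnc = d.d2 and make sure it really does match your data in [bd:RawDebug.CarrierDetails]. For example, d2 is built with REGEXP_REPLACE, which turns every non-alphanumeric character into a space, so any such space left in d2 would also have to appear in Mnc for the rows to match.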
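To see how a subtle value mismatch produces exactly this symptom, here is a sketch in Python's sqlite3 rather than BigQuery legacy SQL. SQLite has no REGEXP_REPLACE, so REPLACE stands in for it, and the '310-260' DebugData format is a hypothetical example, not taken from the question: the separator turns into a leading space in d2, the LEFT JOIN silently finds no match, and the carrier columns come back NULL until the value is trimmed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: DebugData packs MCC and MNC around a '-' separator.
conn.executescript("""
CREATE TABLE debug (id INT, debugdata TEXT);
CREATE TABLE carrier (network TEXT, mcc INT, mnc TEXT);
INSERT INTO debug VALUES (1, '310-260');
INSERT INTO carrier VALUES ('ExampleNet', 310, '260');
""")

# REPLACE stands in for REGEXP_REPLACE; the separator becomes a space,
# so SUBSTR(..., 4) yields ' 260' with a leading space.
RAW_D2 = "SUBSTR(REPLACE(debugdata, '-', ' '), 4)"
TRIMMED_D2 = "TRIM(SUBSTR(REPLACE(debugdata, '-', ' '), 4))"

def join_with(d2_expr):
    # d2_expr is one of the fixed strings above, never user input.
    return conn.execute(f"""
        SELECT d.d1, d.d2, c.mcc, c.mnc
        FROM (SELECT CAST(SUBSTR(debugdata, 1, 3) AS INTEGER) AS d1,
                     {d2_expr} AS d2
              FROM debug) AS d
        LEFT JOIN carrier AS c ON c.mcc = d.d1 AND c.mnc = d.d2
    """).fetchall()

print(join_with(RAW_D2))      # [(310, ' 260', None, None)]
print(join_with(TRIMMED_D2))  # [(310, '260', 310, '260')]
```

Selecting d.d1 and d.d2 next to the joined columns, as done here, is a quick way to spot such differences in the real data.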