Use multiple LEFT JOINs from multiple datasets in SQL

I need to perform multiple JOINs: I am pulling data from multiple tables and joining on id. The tricky part is that I need to join one table twice. Here is the code:
(
SELECT
content.brand_identifier AS brand_name,
CAST(timestamp(furniture.date) AS DATE) AS order_date,
total_hearst_commission
FROM
`furniture_table` AS furniture
LEFT JOIN `content_table` AS content ON furniture.site_content_id = content.site_content_id
WHERE
(
timestamp(furniture.date) >= TIMESTAMP('2020-06-01 00:00:00')
)
)
UNION
(
SELECT
flowers.a_merchant_name AS merchant_name
FROM
`flowers_table` AS flowers
LEFT JOIN `content` AS content ON flowers.site_content_id = content.site_content_id
)
GROUP BY
1,
2,
3,
4
ORDER BY
4 DESC
LIMIT
500
I thought I could use UNION, but it gives me this error: Syntax error: Expected keyword ALL or keyword DISTINCT but got "("

I'm not able to comment, but as GHB states, the two queries do not return the same number of columns, so UNION will not work here.
I think it would help to know why the sub-queries are needed in the first place. I'm guessing the query below does not produce the results you want, so please elaborate on why that is.
select
f.a_merchant_name as merchant_name,
c.brand_identifier as brand_name,
CAST(timestamp(f.date) AS DATE) AS order_date,
total_hearst_commission
from furniture_table f
left join content_table c on c.site_content_id = f.site_content_id
where timestamp(f.date) >= TIMESTAMP('2020-06-01 00:00:00')
group by 1,2,3,4
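To illustrate the column-count point, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption made so the example runs anywhere; the table layouts and rows are invented). It shows that a set operation between a three-column and a one-column SELECT is rejected, while padding each branch with NULL placeholders is accepted. Separately, the syntax error quoted in the question arises because BigQuery expects `UNION ALL` or `UNION DISTINCT` rather than a bare `UNION`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified versions of the tables from the question.
conn.executescript("""
    CREATE TABLE furniture (site_content_id INTEGER, order_date TEXT, commission REAL);
    CREATE TABLE flowers (site_content_id INTEGER, a_merchant_name TEXT);
    INSERT INTO furniture VALUES (1, '2020-06-02', 10.0);
    INSERT INTO flowers VALUES (1, 'Acme Flowers');
""")

def run(sql):
    """Return the rows, or the error message if the statement is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"error: {exc}"

# Three columns on the left, one on the right: the engine refuses this.
mismatched = run("""
    SELECT site_content_id, order_date, commission FROM furniture
    UNION ALL
    SELECT a_merchant_name FROM flowers
""")

# Same number of columns in both branches, padded with NULLs.
padded = run("""
    SELECT NULL AS merchant_name, order_date, commission FROM furniture
    UNION ALL
    SELECT a_merchant_name, NULL, NULL FROM flowers
""")

print(mismatched)
print(padded)
```

Whether padding with NULLs is actually what the asker wants depends on the intended result, which the question does not make clear.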

Select other table as a column based on datetime in BigQuery [duplicate]

I have two tables that are related, and I want to group them based on time. Here are the tables:
I want to select a receipt as a column based on published_at, which must fall between pickup_time and drop_time, to get this result:
I tried a JOIN, but it seems to return only the rows where drop_time is NULL:
SELECT
t.source_id AS source_id,
t.pickup_time AS pickup_time,
t.drop_time AS drop_time,
ARRAY_AGG(STRUCT(r.source_id, r.receipt_id, r.published_at) ORDER BY r.published_at LIMIT 1)[SAFE_OFFSET(0)] AS receipt
FROM `my-project-gcp.data_source.trips` AS t
JOIN `my-project-gcp.data_source.receipts` AS r
ON
t.source_id = r.source_id
AND
r.published_at >= t.pickup_time
AND (
r.published_at <= t.drop_time
OR t.drop_time IS NULL
)
GROUP BY source_id, pickup_time, drop_time
I also tried a sub-query, but got this error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN
SELECT
t.source_id AS source_id,
t.pickup_time AS pickup_time,
t.drop_time AS drop_time,
ARRAY_AGG((
SELECT
STRUCT(r.source_id, r.receipt_id, r.published_at)
FROM `my-project-gcp.data_source.receipts` as r
WHERE
t.source_id = r.source_id
AND
r.published_at >= t.pickup_time
AND (
r.published_at <= t.drop_time
OR t.drop_time IS NULL
)
LIMIT 1
))[SAFE_OFFSET(0)] AS receipt
FROM `my-project-gcp.data_source.trips` as t
GROUP BY source_id, pickup_time, drop_time
Each source_id is a car, and only one driver can drive a car at a time, so we can partition by that column.
Your approach works for small tables. Since there is no unique join key, the join degenerates into a cross join and fails on large tables.
Here is a solution using UNION ALL and a look-back technique. It is quite fast and works for medium-sized tables in the range of a few GB. It avoids the cross join, but the script is rather long.
The table trips lists all drives by the drivers. The table receipts lists all fines.
We need a unique row identifier for each trip so that we can refer back to it later. We use the row number for this; see the table trips_with_rowid.
The table summery_tmp unions three tables. First we load the trips table and add an empty column for the fines. Then we load the trips table again to mark the times when no one was driving the car. Finally, we add the table receipts so that only the columns source_id, pickup_time and fines are filled.
In the table summery, this union is ordered by pickup_time within each source_id, so the fine entries sit below the entry of the driver who picked up the car. For the fine entries, the column row_id_new is filled with the row_id of that driver's trip.
Grouping by row_id_new and filtering out the unneeded entries does the job.
I changed the seconds of the entered times (laziness), so the output differs slightly from your expected result.
With trips as
(Select 1 source_id ,timestamp("2022-7-19 9:37:47") pickup_time, timestamp("2022-07-19 9:40:00") as drop_time, "jhon" driver_name
Union all Select 1 ,timestamp("2022-7-19 12:00:01"),timestamp("2022-7-19 13:05:11"),"doe"
Union all Select 1 ,timestamp("2022-7-19 14:30:01"),null,"foo"
Union all Select 3 ,timestamp("2022-7-24 08:35:01"),timestamp("2022-7-24 09:15:01"),"bar"
Union all Select 4 ,timestamp("2022-7-25 10:24:01"),timestamp("2022-7-25 11:14:01"),"jhon"
),
receipts as
(Select 1 source_id, 101 receipt_id, timestamp("2022-07-19 9:37:47") published_at,40 price
Union all Select 1,102, timestamp("2022-07-19 13:04:47"),45
Union all Select 1,103, timestamp("2022-07-19 15:23:00"),32
Union all Select 3,301, timestamp("2022-07-24 09:15:47"),45
Union all Select 4,401, timestamp("2022-07-25 11:13:47"),45
Union all Select 5,501, timestamp("2022-07-18 07:12:47"),45
),
trips_with_rowid as
(
SELECT 2*row_number() over (order by source_id,pickup_time) as row_id, * from trips
),
summery_tmp as
(
Select *, null as fines from trips_with_rowid
union all Select row_id+1,source_id,drop_time,null,concat("no driver, last one ",driver_name),null from trips_with_rowid
union all select null,source_id, published_at, null,null, R from receipts R
),
summery as
(
SELECT last_value(row_id ignore nulls) over (partition by source_id order by pickup_time ) row_id_new
,*
from summery_tmp
order by 1,2
)
select source_id,min(pickup_time) pickup_time, min(drop_time) drop_time,
any_value(driver_name) driver_name, array_agg(fines IGNORE NULLS) as fines_Sum
from summery
group by row_id_new,source_id
having fines_sum is not null or (pickup_time is not null and driver_name not like "no driver%")
order by 1,2
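The look-back idea can also be sketched outside SQL. The following is a rough Python illustration, not a translation of the query above: for each receipt it looks back to the latest trip of the same car that started at or before the receipt, then checks the drop time. The function name `match_receipts` and the tuple layouts are assumptions made for this example; the sample rows are taken from the CTEs above.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime

def match_receipts(trips, receipts):
    """Map trip index -> receipt ids that fall inside that trip.

    trips:    list of (source_id, pickup_time, drop_time or None)
    receipts: list of (source_id, receipt_id, published_at)
    """
    # Sorted pickup times per car, remembering the original trip index.
    starts_by_car = defaultdict(list)
    for idx, (car, pickup, _drop) in enumerate(trips):
        starts_by_car[car].append((pickup, idx))
    for starts in starts_by_car.values():
        starts.sort()

    matched = defaultdict(list)
    for car, receipt_id, published in receipts:
        starts = starts_by_car.get(car, [])
        # Look back: last trip whose pickup_time <= published_at.
        pos = bisect_right(starts, (published, float("inf"))) - 1
        if pos < 0:
            continue  # no trip of this car started before the receipt
        idx = starts[pos][1]
        drop = trips[idx][2]
        if drop is None or published <= drop:
            matched[idx].append(receipt_id)
    return dict(matched)

trips = [
    (1, datetime(2022, 7, 19, 9, 37, 47), datetime(2022, 7, 19, 9, 40, 0)),
    (1, datetime(2022, 7, 19, 12, 0, 1), datetime(2022, 7, 19, 13, 5, 11)),
    (1, datetime(2022, 7, 19, 14, 30, 1), None),
    (3, datetime(2022, 7, 24, 8, 35, 1), datetime(2022, 7, 24, 9, 15, 1)),
    (4, datetime(2022, 7, 25, 10, 24, 1), datetime(2022, 7, 25, 11, 14, 1)),
]
receipts = [
    (1, 101, datetime(2022, 7, 19, 9, 37, 47)),
    (1, 102, datetime(2022, 7, 19, 13, 4, 47)),
    (1, 103, datetime(2022, 7, 19, 15, 23, 0)),
    (3, 301, datetime(2022, 7, 24, 9, 15, 47)),
    (4, 401, datetime(2022, 7, 25, 11, 13, 47)),
    (5, 501, datetime(2022, 7, 18, 7, 12, 47)),
]
result = match_receipts(trips, receipts)
print(result)
```

Receipt 301 is published after the trip of car 3 has ended and car 5 has no trips at all, so neither is attached to a trip in this sketch.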

Query from two tables have two different date fields

I have two tables: one for receiving, "PO_RECVR_HIST", and one for sales, "PS_TKT_HIST_LIN". I want a query showing the receiving total and the sales total. The two dates are not related, and the results come out wrong. The two tables share the same vendor. I am using the following query:
SELECT P.VEND_NO,
sum(P.RECVR_TOT)AS RECV_TOT,
sum(S.CALC_EXT_PRC) AS SAL_TOT
FROM PO_RECVR_HIST P INNER JOIN
PS_TKT_HIST_LIN S
ON P.VEND_NO = S.ITEM_VEND_NO
WHERE P.RECVR_DAT > getdate()-7
GROUP BY P.VEND_NO, S.BUS_DAT
HAVING S.BUS_DAT > getdate()-7
ORDER BY P.VEND_NO
Any advice, please?
I think I get it. You want the sums in different columns. One approach is to do a UNION ALL to get the data together, and then aggregate:
SELECT VEND_NO,
SUM(RECVR_TOT)AS RECV_TOT,
SUM(CALC_EXT_PRC) AS SAL_TOT
FROM ((SELECT P.VEND_NO, P.RECVR_DAT as dte, P.RECVR_TOT, 0 as CALC_EXT_PRC
FROM PO_RECVR_HIST P
) UNION ALL
(SELECT S.ITEM_VEND_NO, S.BUS_DAT, 0, CALC_EXT_PRC
FROM PS_TKT_HIST_LIN S
)
) PS
WHERE DTE > GETDATE() - 7
GROUP BY VEND_NO
ORDER BY VEND_NO;
A JOIN is just not a good approach because it will throw off the aggregation.
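To make the aggregation problem concrete, here is a small sketch using Python's built-in sqlite3 module instead of SQL Server (an assumption for portability; the table names are shortened, the rows are invented, and the date filter is left out). Joining on the vendor pairs every receiving row with every sales row of that vendor, so both sums are inflated, whereas stacking the rows with UNION ALL and then aggregating counts each row once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE receiving (vend_no TEXT, recvr_tot REAL);
    CREATE TABLE sales (item_vend_no TEXT, calc_ext_prc REAL);
    -- Vendor A: two receivings totalling 300, three sales totalling 60.
    INSERT INTO receiving VALUES ('A', 100), ('A', 200);
    INSERT INTO sales VALUES ('A', 10), ('A', 20), ('A', 30);
""")

# JOIN: 2 x 3 = 6 joined rows, so every amount is counted several times.
joined = conn.execute("""
    SELECT r.vend_no, SUM(r.recvr_tot), SUM(s.calc_ext_prc)
    FROM receiving r
    JOIN sales s ON r.vend_no = s.item_vend_no
    GROUP BY r.vend_no
""").fetchall()

# UNION ALL then aggregate: each source row contributes exactly once.
unioned = conn.execute("""
    SELECT vend_no, SUM(recvr_tot), SUM(calc_ext_prc)
    FROM (
        SELECT vend_no, recvr_tot, 0 AS calc_ext_prc FROM receiving
        UNION ALL
        SELECT item_vend_no, 0, calc_ext_prc FROM sales
    )
    GROUP BY vend_no
""").fetchall()

print(joined)   # inflated totals
print(unioned)  # true totals
```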

LEFT JOIN on static list of items?

DBMS is intersystems-cache!
Motivation: I need to do a left join on a table so I can get the same list of message types every time, even if the result is zero or null. Unfortunately, this is a large table so including a SELECT DISTINCT() is prohibitively slow. These should never change, so I thought I'd get the list once and just join them statically.
Based on another SO question, here is what I have to replace the SELECT DISTINCT():
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
This returns results that look exactly as expected, identical to the Distinct query. However, when I plug this into my JOIN statement, all the counts come back as zero.
Failing Query
SELECT mh.MessageBodyClassName, count(l.MessageBodyClassName) as MessageCount FROM
(
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
) mh LEFT JOIN
(
SELECT messageBodyClassName FROM ens.messageheader WHERE TimeCreated > DATEADD(hh, -1, GETUTCDATE())
) l ON mh.MessageBodyClassName = l.MessageBodyClassName
GROUP BY mh.MessageBodyClassName
Failed results
MessageBodyClassName MessageCount
------------------------------------- ------------
HS.MESSAGE.GATEWAYREGISTRATIONREQUEST 0
HS.MESSAGE.MERGEPATIENTREQUEST 0
HS.MESSAGE.PATIENTSEARCHREQUEST 0
Working Query
SELECT mh.MessageBodyClassName, count(l.MessageBodyClassName) as MessageCount FROM
(
SELECT DISTINCT(MessageBodyClassName) FROM ens.messageheader
) mh LEFT JOIN
(
SELECT messageBodyClassName FROM ens.messageheader WHERE TimeCreated > DATEADD(hh, -1, GETUTCDATE())
) l ON mh.MessageBodyClassName = l.MessageBodyClassName
GROUP BY mh.MessageBodyClassName
Working and expected results
MessageBodyClassName MessageCount
------------------------------------- ------------
HS.MESSAGE.GATEWAYREGISTRATIONREQUEST 0
HS.MESSAGE.MERGEPATIENTREQUEST 0
HS.MESSAGE.PATIENTSEARCHREQUEST 54
For VKP: Why are the results different? How can I adjust the first query with literals to get the proper (same) results?
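For comparison, here is the same pattern run against SQLite through Python's built-in sqlite3 module (an assumption; this is not InterSystems Caché, the table is reduced to two columns, and the hypothetical `recent` flag stands in for the TimeCreated filter). In this engine the literal list joins as expected, which hints that the zero counts may come from something specific to how Caché compares the literals with the stored values rather than from the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messageheader (MessageBodyClassName TEXT, recent INTEGER);
    INSERT INTO messageheader VALUES
        ('HS.MESSAGE.PATIENTSEARCHREQUEST', 1),
        ('HS.MESSAGE.PATIENTSEARCHREQUEST', 1),
        ('HS.MESSAGE.MERGEPATIENTREQUEST', 0);
""")

# Static list of message types on the left, filtered rows on the right.
rows = conn.execute("""
    SELECT mh.MessageBodyClassName, COUNT(l.MessageBodyClassName) AS MessageCount
    FROM (
        SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' AS MessageBodyClassName
        UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
        UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
    ) mh
    LEFT JOIN (
        SELECT MessageBodyClassName FROM messageheader WHERE recent = 1
    ) l ON mh.MessageBodyClassName = l.MessageBodyClassName
    GROUP BY mh.MessageBodyClassName
    ORDER BY mh.MessageBodyClassName
""").fetchall()

for name, count in rows:
    print(name, count)
```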
The last thing I can think of is to run your DISTINCT query once into a permanent table in your database. That way the inner SELECT in your query will only have to process those three lines. The inner query would lose DISTINCT, like
SELECT MessageBodyClassName FROM ens.messageheader_permvals
EDIT: The answer below did not work
This may be a longshot, but if it doesn't work it might help you diagnose the problem. Instead of the UNION try
SELECT MessageBodyClassName FROM ens.messageheader
WHERE MessageBodyClassName in (
'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST',
'HS.MESSAGE.MERGEPATIENTREQUEST',
'HS.MESSAGE.PATIENTSEARCHREQUEST')
That should return records only if those values actually exist in the table and are compatible with the format of MessageBodyClassName, which we know works using the DISTINCT version. I don't know if the performance will be better this way, but hopefully it will shed some light on the issue.
EDIT: the answer below does not apply, as the OP was actually trying to select the literal quoted values
You don't have FROM clauses in your UNION query. Try
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
FROM ens.messageheader
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
FROM ens.messageheader
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
FROM ens.messageheader
The rest of the query looks right.
I agree with xQbert: the problem is the hard-coded values.
Try
SELECT T1.MessageBodyClassName, T2.MessageBodyClassName
FROM (
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
) as T1
LEFT JOIN (
SELECT DISTINCT(MessageBodyClassName) as MessageBodyClassName
FROM ens.messageheader
) as T2
ON T1.MessageBodyClassName = T2.MessageBodyClassName
Possible solution: create a separate table holding the distinct values
CREATE TABLE className as
SELECT DISTINCT(MessageBodyClassName) as MessageBodyClassName
FROM ens.messageheader

Oracle Left Join not returning all rows

I am using the following CTE. The first part collects all unique people and the second left joins the unique people with events during a particular time frame. I am expecting that all the rows be returned from my unique people table even if they don't have an event within the time frame. But this doesn't appear to be the case.
WITH DISTINCT_ATTENDING(ATTENDING) AS
(
SELECT DISTINCT ATTENDING
FROM PEOPLE
WHERE ATTENDING IS NOT NULL
), -- returns 62 records
EVENT_HISTORY(ATTENDING, TOTAL) AS
(
SELECT C.ATTENDING,
COUNT(C.ID)
FROM DISTINCT_ATTENDING D
LEFT JOIN PEOPLE C
ON C.ATTENDING = D.ATTENDING
AND TO_DATE(C.DATE, 'YYYYMMDD') < TO_DATE('20140101', 'YYYYMMDD')
GROUP BY C.ATTENDING
ORDER BY C.ATTENDING
)
SELECT * FROM EVENT_HISTORY; -- returns 49 rows
What am I doing wrong here?
The problem is the column "C.ATTENDING"; just change it to "D.ATTENDING":
SELECT D.ATTENDING,
COUNT(C.ID)
FROM DISTINCT_ATTENDING D
LEFT JOIN PEOPLE C
ON C.ATTENDING = D.ATTENDING
AND TO_DATE(C.DATE, 'YYYYMMDD') < TO_DATE('20140101', 'YYYYMMDD')
GROUP BY D.ATTENDING
ORDER BY D.ATTENDING
Your query seems too complicated. I think the following does the same thing:
SELECT P.ATTENDING,
SUM(CASE WHEN TO_DATE(P.DATE, 'YYYYMMDD') < TO_DATE('20140101', 'YYYYMMDD')
THEN 1 ELSE 0 END)
FROM PEOPLE P
WHERE P.ATTENDING IS NOT NULL
GROUP BY P.ATTENDING
ORDER BY P.ATTENDING ;
Your problem is that you are aggregating by a column in the second table of a left join. This is NULL when there is no match.
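The effect of grouping by the wrong side of a LEFT JOIN can be reproduced with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle (an assumption for portability), invented rows, a hypothetical `event_date` column in place of `DATE`, and plain string comparison instead of TO_DATE, which is safe here only because the dates are stored as YYYYMMDD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, attending TEXT, event_date TEXT);
    INSERT INTO people VALUES
        (1, 'Adams', '20130105'),
        (2, 'Adams', '20130210'),
        (3, 'Baker', '20140302'),
        (4, 'Clark', '20150101');
""")

# {key} is replaced by the table alias used for selecting and grouping.
query = """
    WITH distinct_attending AS (
        SELECT DISTINCT attending FROM people
    )
    SELECT {key}.attending, COUNT(c.id)
    FROM distinct_attending d
    LEFT JOIN people c
      ON c.attending = d.attending
     AND c.event_date < '20140101'
    GROUP BY {key}.attending
    ORDER BY {key}.attending
"""

# Grouping by the right-hand table: unmatched people collapse into one NULL group.
by_right = conn.execute(query.format(key="c")).fetchall()
# Grouping by the left-hand table: every person keeps a row, with a count of 0 if unmatched.
by_left = conn.execute(query.format(key="d")).fetchall()

print(by_right)
print(by_left)
```

Baker and Clark have no events before 2014, so with `c.attending` as the grouping key they are merged into a single NULL row, which is the same kind of row loss the question describes.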

Unpivot date columns to a single column of a complex query in Oracle

Hi guys, I am stuck with a stubborn problem that I am unable to solve. I am trying to compile a report in which all the dates coming from different tables need to go into a single date field. Of course, it is the max, i.e. the most recent, date across all these date columns that should go into that single date column. The report would be generated for multiple users across multiple branches/courses.
There are multiple blogs, and the result needs to be grouped by blog title, i.e. max(date_value) over the six date columns should give the latest date for that blog title.
Expected Result:
select u.batch_uid as ext_person_key, u.user_id, cm.batch_uid as ext_crs_key, cm.crs_id, ir.role_id as
insti_role, (CASE when b.JOURNAL_IND = 'N' then
'BLOG' else 'JOURNAL' end) as item_type, gm.title as item_name, gm.disp_title as ITEM_DISP_NAME, be.blog_pk1 as be_blogPk1, bc.blog_entry_pk1 as bc_blog_entry_pk1,bc.pk1,
b.ENTRY_mod_DATE as b_ENTRY_mod_DATE ,b.CMT_mod_DATE as BlogCmtModDate, be.CMT_mod_DATE as be_cmnt_mod_Date,
b.UPDATE_DATE as BlogUpDate, be.UPDATE_DATE as be_UPDATE_DATE,
bc.creation_date as bc_creation_date,
be.CREATOR_USER_ID as be_CREATOR_USER_ID , bc.creator_user_id as bc_creator_user_id,
b.TITLE as BlogTitle, be.TITLE as be_TITLE,
be.DESCRIPTION as be_DESCRIPTION, bc.DESCRIPTION as bc_DESCRIPTION
FROM users u
INNER JOIN insti_roles ir on u.insti_roles_pk1 = ir.pk1
INNER JOIN crs_users cu ON u.pk1 = cu.users_pk1
INNER JOIN crs_mast cm on cu.crsmast_pk1 = cm.pk1
INNER JOIN blogs b on b.crsmast_pk1 = cm.pk1
INNER JOIN blog_entry be on b.pk1=be.blog_pk1 AND be.creator_user_id = cu.pk1
LEFT JOIN blog_CMT bc on be.pk1=bc.blog_entry_pk1 and bc.CREATOR_USER_ID=cu.pk1
JOIN gradeledger_mast gm ON gm.crsmast_pk1 = cm.pk1 and b.grade_handler = gm.linkId
WHERE cu.ROLE='S' AND BE.STATUS='2' AND B.ALLOW_GRADING='Y' AND u.row_status='0'
AND u.available_ind ='Y' and cm.row_status='0' and u.batch_uid='userA_157'
I am getting a result set for the above query with multiple date columns, which I want to put into a single column. The dates have to be the most recent, i.e. the max of the dates in the date columns.
I have successfully done the unpivot by using a view to store the above result set and putting all the dates in one column. However, I do not want to use a view or a table to store the result set and then unpivot, simply because I cannot keep creating views for every user one would query for.
The max(date_value) from the date columns needs to be put in one single column. The columns are as follows:
1) b.entry_mod_date, 2) b.cmt_mod_date, 3) be.cmt_mod_date, 4) b.update_date, 5) be.update_date, 6) bc.creation_date
Apologies that I could not provide the desc of all the tables and the fields being used.
Any help to get the max of the dates from these multiple date columns into a single column without using a view or a table would be greatly appreciated.
It is not clear what results you want, but the easiest solution is to use greatest().
with t as (
YOURQUERYHERE
)
select t.*,
greatest(b_ENTRY_mod_DATE, BlogCmtModDate, be_cmnt_mod_Date, BlogUpDate,
be_UPDATE_DATE, bc_creation_date
) as greatestdate
from t;
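One caveat with this approach: Oracle's GREATEST returns NULL as soon as any argument is NULL, and in the question's query the comment table is LEFT JOINed, so its creation date can be NULL. The sketch below illustrates the effect with Python's built-in sqlite3 module, whose multi-argument max() has the same NULL behaviour (using SQLite as a stand-in is an assumption, the rows are invented, and the sentinel date '0001-01-01' is a hypothetical choice). Wrapping the nullable argument in COALESCE keeps the other dates in play.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (blog TEXT, entry_mod TEXT, update_date TEXT, cmt_created TEXT);
    INSERT INTO t VALUES
        ('with comment', '2014-01-05', '2014-02-01', '2014-03-09'),
        ('no comment',   '2014-01-05', '2014-02-01', NULL);
""")

# Row-wise maximum without any NULL handling.
naive = dict(conn.execute("""
    SELECT blog, max(entry_mod, update_date, cmt_created) FROM t
""").fetchall())

# Replace a missing comment date with a sentinel that can never win.
guarded = dict(conn.execute("""
    SELECT blog, max(entry_mod, update_date, COALESCE(cmt_created, '0001-01-01')) FROM t
""").fetchall())

print(naive)
print(guarded)
```

In Oracle the equivalent guard would be applied inside greatest(), with a sentinel of the column's DATE type rather than a string.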
select <columns>,
case
when greatest (b_ENTRY_mod_DATE) >= greatest (BlogCmtModDate) and greatest(b_ENTRY_mod_DATE) >= greatest(BlogUpDate)
then greatest( b_ENTRY_mod_DATE )
--<same implementation to compare each time BlogCmtModDate and BlogUpDate separately to get the greatest then 'date'>
end
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
case
when greatest (be_cmnt_mod_Date) >= greatest (be_UPDATE_DATE)
then greatest( be_cmnt_mod_Date )
when greatest (be_UPDATE_DATE) >= greatest (be_cmnt_mod_Date)
then greatest( be_UPDATE_DATE )
end
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
GREATEST(bc_creation_date)
,<columns>
FROM table
<rest of the query>