Multiple left joins with aggregation on same table cause huge performance hit in SAP HANA

I am joining two tables on HANA and, to get some statistics, I am LEFT joining the items table 3 times to get a total count, number of entries processed and number of errors, as shown below.
This is a dev system and the items table has only 1500 items. But the query below runs for 17 seconds.
When I remove any of the three aggregation terms (but leave the corresponding JOIN in place), the query executes almost immediately.
I have also tried adding indexes on the fields used in the specific JOINs, but that makes no difference.
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct rp2.guid ),
count( distinct rp3.guid )
from zbsbpi_rk as rk
left join zbsbpi_rp as rp
on rp.header = rk.guid
left join zbsbpi_rp as rp2
on rp2.header = rk.guid
and rp2.processed = 'X'
left join zbsbpi_rp as rp3
on rp3.header = rk.guid
and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by

I think you can rewrite your query to improve performance:
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct (CASE WHEN rp.processed = 'X' then rp.guid else null end) ),
count( distinct (CASE WHEN rp.result_status = 'E' then rp.guid else null end))
from zbsbpi_rk as rk
left join zbsbpi_rp as rp
on rp.header = rk.guid
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
I'm not entirely sure whether the count( distinct CASE ... ) construct works on HANA, but it is worth a try.
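As a rough check of the single-join idea, here is a minimal sketch in Python using the built-in sqlite3 module. It runs on SQLite, not HANA, so it says nothing about HANA's support for the construct; the tiny rk and rp tables and their rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature versions of the header (rk) and item (rp) tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table rk (guid text, run_id text);
    create table rp (guid text, header text, processed text, result_status text);
    insert into rk values ('H1', '0000000010'), ('H2', '0000000010');
    insert into rp values
        ('I1', 'H1', 'X', 'S'),
        ('I2', 'H1', 'X', 'E'),
        ('I3', 'H1', '',  '');
""")

# One join only. Each CASE yields NULL for rows that do not match,
# and COUNT ignores NULLs, so every column counts its own subset.
rows = con.execute("""
    select rk.guid,
           count(distinct rp.guid) as total,
           count(distinct case when rp.processed = 'X' then rp.guid end) as processed,
           count(distinct case when rp.result_status = 'E' then rp.guid end) as errors
    from rk
    left join rp on rp.header = rk.guid
    where rk.run_id = '0000000010'
    group by rk.guid
    order by rk.guid
""").fetchall()

print(rows)
```

With this sample data, header H1 gets 3 items, 2 processed and 1 error, and header H2 (no items at all) gets zeros in every column.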

My apologies, but I forgot that I had posted this question here. I had posted the same question at answers.sap.com after not getting any joy here: https://answers.sap.com/questions/172096/multiple-left-joins-with-aggregation-on-same-table.html
I eventually came up with the solution, which was a bit of a "doh!" moment:
select rk.guid, rk.run_id, rk.status, rk.created_at, rk.created_by,
count( distinct rp.guid ),
count( distinct rp2.guid ),
count( distinct rp3.guid )
from zbsbpi_rk as rk
join zbsbpi_rp as rp
on rp.header = rk.guid
left join zbsbpi_rp as rp2
on rp2.guid = rp.guid
and rp2.processed = 'X'
left join zbsbpi_rp as rp3
on rp3.guid = rp.guid
and rp3.result_status = 'E'
where rk.run_id = '0000000010'
group by rk.guid, run_id, status, created_at, created_by
The subsequent left joins needed only to be joined to the first join on the same table, as the first join contained a superset of all the records anyway.
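A likely explanation for the difference, sketched in Python with sqlite3 (SQLite, not HANA, and with made-up data): when each alias joins to the header independently, the rows produced before aggregation are the product of the three matches per header, whereas joining rp2 and rp3 to rp on guid keeps one row per item. If all 1500 items happened to sit under one header and all of them matched, that product would be 1500 * 1500 * 1500, about 3.4 billion rows. The source does not say how the items are distributed, so treat that figure as a worst case.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table rk (guid text)")
con.execute("create table rp (guid text, header text, processed text, result_status text)")
con.execute("insert into rk values ('H1')")
# Hypothetical worst case: 10 items under one header, all processed, all in error.
con.executemany(
    "insert into rp values (?, 'H1', 'X', 'E')",
    [(f"I{i}",) for i in range(10)],
)

# Original shape: every alias is joined to the header on its own.
fan_out = con.execute("""
    select count(*)
    from rk
    left join rp as rp1 on rp1.header = rk.guid
    left join rp as rp2 on rp2.header = rk.guid and rp2.processed = 'X'
    left join rp as rp3 on rp3.header = rk.guid and rp3.result_status = 'E'
""").fetchone()[0]

# Fixed shape: rp2 and rp3 are joined to the first item row by guid.
chained = con.execute("""
    select count(*)
    from rk
    join rp as rp1 on rp1.header = rk.guid
    left join rp as rp2 on rp2.guid = rp1.guid and rp2.processed = 'X'
    left join rp as rp3 on rp3.guid = rp1.guid and rp3.result_status = 'E'
""").fetchone()[0]

print(fan_out, chained)  # 1000 rows versus 10 rows before aggregation
```

Note that the fixed query uses an inner join for the first alias, so headers without any items drop out of the result, which the original left join would have kept.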

Related

Enter data for missing category in snowflake

I have a table like
For each keyword there are 2 devices, mobile and desktop. If an entry is found for only one device, the query should automatically create the entry for the other device, keeping the data in the rest of the columns the same. I am currently doing a full outer join, which works fine when one device category is missing but generates duplicates when both devices are present. For example,
my current query is giving the result as
select a.keyword, b.device, a.rating
from kw a full outer join kw b
on a.keyword=b.keyword and a.rating=b.rating
How do I get the result as
The first step will be to identify records that don't have a paired record. There's a couple of ways to do this, but the easiest is probably just a quick GROUP BY/HAVING:
SELECT keyword
FROM kw
GROUP BY keyword
HAVING COUNT(*) = 1
You can then join those results back into the original table to generate the new records that are needed:
SELECT sk.keyword,
CASE WHEN kw.device = 'mobile' THEN 'desktop' ELSE 'mobile' END as device,
kw.rating
FROM
(
SELECT keyword
FROM kw
GROUP BY keyword
HAVING COUNT(*) = 1
)sk
INNER JOIN kw ON kw.keyword = sk.keyword
Then you can UNION back in the original table to bring your new records and existing records into a single result set:
SELECT sk.keyword,
CASE WHEN kw.device = 'mobile' THEN 'desktop' ELSE 'mobile' END as device,
kw.rating
FROM
(
SELECT keyword
FROM kw
GROUP BY keyword
HAVING COUNT(*) = 1
)sk
INNER JOIN kw ON kw.keyword = sk.keyword
UNION ALL
SELECT * FROM kw;
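To see the three steps working together, here is a small sketch in Python with sqlite3. It runs on SQLite rather than Snowflake, and the sample rows are invented since the question's table was not preserved.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table kw (keyword text, device text, rating integer);
    -- Hypothetical rows: 'shoes' has both devices, 'hats' only mobile.
    insert into kw values
        ('shoes', 'mobile', 5),
        ('shoes', 'desktop', 4),
        ('hats', 'mobile', 3);
""")

# Generated rows for unpaired keywords, plus every original row.
rows = con.execute("""
    select sk.keyword,
           case when kw.device = 'mobile' then 'desktop' else 'mobile' end as device,
           kw.rating
    from (select keyword from kw group by keyword having count(*) = 1) sk
    inner join kw on kw.keyword = sk.keyword
    union all
    select keyword, device, rating from kw
    order by 1, 2
""").fetchall()

for row in rows:
    print(row)
```

Only 'hats' gains a row (desktop, with the mobile rating copied over); 'shoes' already had both devices and is returned unchanged, so no duplicates appear.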
Another option, which will scale if you add more devices, is to cross join all the potential device/keyword combinations and then left join to your original table:
SELECT
fe.keyword,
fe.device,
CASE WHEN kw.rating IS NULL THEN max(rating) OVER (PARTITION BY fe.keyword) ELSE kw.rating END AS rating
FROM
(
SELECT DISTINCT kw.keyword, kw2.device
FROM kw, kw kw2
) fe
LEFT OUTER JOIN kw ON kw.keyword = fe.keyword
AND kw.device = fe.device;

(probably) very simple SQL query needed

Having a slow day....could use some assistance writing a simple ANSI SQL query.
I have a list of individuals within families (first and last names), and a second table which lists a subset of those individuals. I would like to create a third table which flags every individual within a family if ANY of the individuals are not listed in the second table. The goal is essentially to flag "incomplete" families.
Below is an example of the two input tables, and the desired third table.
As I said...very simple...having a slow day. Thanks!
I think you want a left join and case expression:
select t1.*,
(case when t2.first_name is null then 'INCOMPLETE' else 'OK' end) as flag
from table1 t1 left join
table2 t2
on t1.first_name = t2.first_name and t1.last_name = t2.last_name;
Of course, this marks "Diane Thomson" as "OK", but I think that is an error in the question.
EDIT:
Oh, I see. The last name defines the family (that seems like a pretty big assumption). But you can do this with window functions:
select t1.*,
(case when count(t2.first_name) over (partition by t1.last_name) =
count(*) over (partition by t1.last_name)
then 'OK'
else 'INCOMPLETE'
end) as flag
from table1 t1 left join
table2 t2
on t1.first_name = t2.first_name and t1.last_name = t2.last_name;
That's not simple, at least not in SAS :-)
Standard SQL, when Windowed Aggregates are supported:
select ft.*,
-- counts differ when st.first_name is null due to the outer join
case when count(*) over (partition by ft.last_name)
= count(st.first_name) over (partition by ft.last_name)
then 'OK'
else 'INCOMPLETE'
end
from first_table as ft
left join second_table as st
on ft.first_name = st.first_name
and ft.last_name = st.last_name
Otherwise you need to do a standard aggregate and join back:
select ft.*, st.flag
from first_table as ft
join
(
select ft.last_name,
case when count(*)
= count(st.first_name)
then 'OK'
else 'INCOMPLETE'
end as flag
from first_table as ft
left join second_table as st
on ft.first_name = st.first_name
and ft.last_name = st.last_name
group by ft.last_name
) as st
on ft.last_name = st.last_name
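The aggregate-and-join-back version can be tried out with a sketch like the following, written in Python with sqlite3. The names and rows are made up, because the example tables from the question were not preserved, and the derived table is aliased fam here only to keep it apart from second_table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table first_table (first_name text, last_name text);
    create table second_table (first_name text, last_name text);
    -- Hypothetical data: Bob Smith is absent from second_table.
    insert into first_table values
        ('Ann', 'Smith'), ('Bob', 'Smith'), ('Cara', 'Jones'), ('Dan', 'Jones');
    insert into second_table values
        ('Ann', 'Smith'), ('Cara', 'Jones'), ('Dan', 'Jones');
""")

rows = con.execute("""
    select ft.first_name, ft.last_name, fam.flag
    from first_table as ft
    join (
        -- count(*) counts every family member; count(st.first_name)
        -- skips the NULLs produced by unmatched rows of the outer join.
        select ft.last_name,
               case when count(*) = count(st.first_name)
                    then 'OK' else 'INCOMPLETE' end as flag
        from first_table as ft
        left join second_table as st
          on ft.first_name = st.first_name
         and ft.last_name = st.last_name
        group by ft.last_name
    ) as fam
      on ft.last_name = fam.last_name
    order by ft.last_name, ft.first_name
""").fetchall()

for row in rows:
    print(row)
```

Both Smiths come back as INCOMPLETE even though Ann is listed in the second table, which is the per-family behaviour the question asks for.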
It is pretty easy to do in SAS if you want to take advantage of its non-ANSI SQL feature of automatically re-merging aggregate function results back onto detail records.
select
a.first
, a.last
, case when 1=max(missing(b.last)) then 'INCOMPLETE'
else 'OK'
end as flag
from table1 a left join table2 b
on a.last=b.last and a.first=b.first
group by 2
order by 2,1
;

Multiple Join count doesnt get 0

I've been trying to get data with joins, but the problem is that the result doesn't include records which have no data in the second or third table.
Here is the query:
SELECT AUDIT_CONFIG.TITLE,AUDIT_CONFIG.AUDITOR_POOL,AUDIT_CONFIG.FREQUENCE,
TO_CHAR(TO_DATE(AUDIT_CONFIG.START_DATE,'yyyymmdd'),'dd/mm/yyyy') AS "START",
AUDIT_CONFIG.AUDIT_ID, TO_CHAR(MAX(AUDIT_DATES.AUDIT_DATE), 'dd/mm/yyyy') AS "FINISH",
TRUNC(MAX(AUDIT_DATES.AUDIT_DATE) - SYSDATE) DAY_TO,
(SELECT COUNT(DISTINCT UNIQ_ID) FROM SENDED_AUDIT) AS SCHEDULED,
(SELECT COUNT(*) FROM AUDIT_RESULTS WHERE PASSORFAIL='P') AS PASS,
(SELECT COUNT(*) FROM AUDIT_RESULTS WHERE PASSORFAIL='F') AS FAIL
FROM AUDIT_CONFIG
RIGHT JOIN AUDIT_DATES ON AUDIT_DATES.AUDIT_ID = AUDIT_CONFIG.AUDIT_ID
RIGHT JOIN SENDED_AUDIT ON SENDED_AUDIT.AUDIT_ID=AUDIT_CONFIG.AUDIT_ID
RIGHT JOIN AUDIT_RESULTS ON AUDIT_RESULTS.AUDIT_ID=AUDIT_CONFIG.AUDIT_ID
GROUP BY AUDIT_CONFIG.TITLE, AUDIT_CONFIG.AUDITOR_POOL, AUDIT_CONFIG.FREQUENCE,
TO_CHAR(TO_DATE(AUDIT_CONFIG.START_DATE, 'yyyymmdd'), 'dd/mm/yyyy'), AUDIT_CONFIG.AUDIT_ID;
And here is an image for understanding the problem (my query returns just the first row):
So, any advice for getting the rows with a count of 0 as well? Thanks in advance.
EDIT for Thorsten Kettner:
Solved now :) thank you for your help and time
Your query looks overly complicated.
To start with: few people use right outer joins, because we find them less intuitive than left outer joins. It even seems the joins confused you and you really wanted left joins instead.
Another thing is the count subqueries that are not related to the records in the main query. I don't think this is on purpose, is it?
Then you join sended_audit and audit_results - the same tables you are using in the count subqueries, but you don't use these joined records in your query.
I guess you want:
select
ac.title,
ac.auditor_pool,
ac.frequence,
to_char(to_date(ac.start_date, 'yyyymmdd'), 'dd/mm/yyyy') as "start",
ac.audit_id,
to_char(ad.max_date, 'dd/mm/yyyy') as "finish",
trunc(ad.max_date - sysdate) as day_to,
nvl(sa.scheduled, 0) as scheduled,
nvl(ar.pass, 0) as pass,
nvl(ar.fail, 0) as fail
from audit_config ac
left join
(
select audit_id, max(audit_date) as max_date
from audit_dates
group by audit_id
) ad on ad.audit_id = ac.audit_id
left join
(
select audit_id, count(distinct uniq_id) as scheduled
from sended_audit
group by audit_id
) sa on sa.audit_id = ac.audit_id
left join
(
select
audit_id,
count(case when passorfail = 'P' then 1 end) as pass,
count(case when passorfail = 'F' then 1 end) as fail
from audit_results
group by audit_id
) ar on ar.audit_id = ac.audit_id;
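The pre-aggregate-then-left-join pattern can be checked on a cut-down schema. This is only a sketch in Python with sqlite3: it is SQLite rather than Oracle, so coalesce stands in for nvl, the tables carry just the columns needed, the rows are invented, and the aliases pass_count and fail_count are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table audit_config (audit_id integer, title text);
    create table audit_results (audit_id integer, passorfail text);
    -- Hypothetical data: audit 2 has no results at all.
    insert into audit_config values (1, 'Fire safety'), (2, 'Hygiene');
    insert into audit_results values (1, 'P'), (1, 'P'), (1, 'F');
""")

rows = con.execute("""
    select ac.title,
           coalesce(ar.pass_count, 0) as pass_count,
           coalesce(ar.fail_count, 0) as fail_count
    from audit_config ac
    left join (
        -- One row per audit, so the outer join cannot multiply rows.
        select audit_id,
               count(case when passorfail = 'P' then 1 end) as pass_count,
               count(case when passorfail = 'F' then 1 end) as fail_count
        from audit_results
        group by audit_id
    ) ar on ar.audit_id = ac.audit_id
    order by ac.audit_id
""").fetchall()

print(rows)
```

The audit without any results still appears, with both counts at 0, because the left join keeps it and coalesce replaces the resulting NULLs.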

Display Y/N column if record found in detail table

I'm trying to create a query so that I can have a column show Y/N if a particular item was ordered for a group of orders. The item I'm looking for would be OLI.id = '538'.
So my results would be:
Order#, Customer#, FreightPaid
12345, 00112233, Y
12346, 00112233, N
I cannot figure out if I need to use a subquery or the where exists function ?
Here's my current query:
SELECT distinct
OrderID,
Accountuid as Customerno
FROM [SMILEWEB_live].[dbo].[OrderLog] OL
inner join Orderlog_item OLI on OLI.orderlogkey = OL.[key]
inner join Account A on A.uid = OL.Accountuid
where A.GroupId = 'X9955'
and OL.CreateDate >= GETDATE() - 60
I would suggest an exists clause instead of a join:
select ol.OrderID, ol.Accountuid as Customerno,
(case when exists (select 1
from Orderlog_item OLI
where OLI.orderlogkey = OL.[key] and OLI.id = '538'
)
then 'Y' else 'N'
end) as FreightPaid
from [SMILEWEB_live].[dbo].[OrderLog] OL
inner join Account A on A.uid = OL.Accountuid
where A.GroupId = 'X9955'
and OL.CreateDate >= GETDATE() - 60;
This prevents a couple of problems. First, duplicate rows, which are caused when there are multiple matching rows (and select distinct adds unnecessary overhead). Second, missing rows, which happen when you use an inner join instead of an outer join.
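Both problems are easy to reproduce. Here is a minimal sketch in Python with sqlite3 (SQLite instead of SQL Server), using simplified, hypothetical tables order_log and order_item and the item id '538' from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table order_log (order_key integer, order_id text);
    create table order_item (order_key integer, item_id text);
    -- Hypothetical data: order 12345 holds item 538 twice,
    -- order 12346 holds another item, order 12347 has no items.
    insert into order_log values (1, '12345'), (2, '12346'), (3, '12347');
    insert into order_item values (1, '538'), (1, '538'), (1, '100'), (2, '100');
""")

# Inner join: one row per item, and the order without items is lost.
joined = con.execute("""
    select ol.order_id
    from order_log ol
    join order_item oli on oli.order_key = ol.order_key
""").fetchall()

# EXISTS: exactly one row per order, whatever the items look like.
flags = con.execute("""
    select ol.order_id,
           case when exists (select 1
                             from order_item oli
                             where oli.order_key = ol.order_key
                               and oli.item_id = '538')
                then 'Y' else 'N' end as freight_paid
    from order_log ol
    order by ol.order_id
""").fetchall()

print(len(joined), flags)
```

The join returns 4 rows for 3 orders and never mentions order 12347, while the exists version returns one Y/N row for each order.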

Produce result table from multiple tables

SQL Server 2008 R2
I have 3 tables containing data for 3 different types of events,
Type1, Type2, Type3 with two columns:
DatePoint ValuePoint
I want to produce result table which would look like that:
DatePoint TotalType1 TotalType2 TotalType3
I've started from that
SELECT [DatePoint]
,SUM(ValuePoint) as TotalType1
FROM [dbo].[Type1]
GROUP BY [DatePoint]
ORDER BY [DatePoint]
SELECT [DatePoint]
,SUM(ValuePoint) as TotalType2
FROM [dbo].[Type2]
GROUP BY [DatePoint]
ORDER BY [DatePoint]
SELECT [DatePoint]
,SUM(ValuePoint) as TotalType3
FROM [dbo].[Type3]
GROUP BY [DatePoint]
ORDER BY [DatePoint]
So I have three results, but I need to produce one (Date TotalType1 TotalType2 TotalType3). What do I need to do next to achieve my goal?
UPDATE
Forgot to mention that a DatePoint which exists in one type may or may not exist in another.
Here's my take. I assume that you don't have the same datetime values in every table (certainly, the stuff I get to work with is never so consistent). There should be an easier way to do this, but once you're past two outer joins things can get pretty tricky.
SELECT
dp.DatePoint
,isnull(t1.TotalType1, 0) TotalType1
,isnull(t2.TotalType2, 0) TotalType2
,isnull(t3.TotalType3, 0) TotalType3
from (-- Without "ALL", UNION will filter out duplicates
select DatePoint
from Type1
union select DatePoint
from Type2
union select DatePoint
from Type3) dp
left outer join (select DatePoint, sum(ValuePoint) TotalType1
from Type1
group by DatePoint) t1
on t1.DatePoint = dp.DatePoint
left outer join (select DatePoint, sum(ValuePoint) TotalType2
from Type2
group by DatePoint) t2
on t2.DatePoint = dp.DatePoint
left outer join (select DatePoint, sum(ValuePoint) TotalType3
from Type3
group by DatePoint) t3
on t3.DatePoint = dp.DatePoint
order by dp.DatePoint
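A DISTINCT might help somewhere, but the general idea should be the following (note that joining the raw tables multiplies rows, and so inflates the sums, whenever a DatePoint has several rows in more than one table):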
SELECT
t.[DatePoint],
SUM(t1.ValuePoint) as TotalType1,
SUM(t2.ValuePoint) as TotalType2,
SUM(t3.ValuePoint) as TotalType3
FROM
(
SELECT [DatePoint] FROM [dbo].[Type1]
UNION
SELECT [DatePoint] FROM [dbo].[Type2]
UNION
SELECT [DatePoint] FROM [dbo].[Type3]
) as t
LEFT JOIN
[dbo].[Type1] t1
ON
t1.[DatePoint] = t.[DatePoint]
LEFT JOIN
[dbo].[Type2] t2
ON
t2.[DatePoint] = t.[DatePoint]
LEFT JOIN
[dbo].[Type3] t3
ON
t3.[DatePoint] = t.[DatePoint]
GROUP BY
t.[DatePoint]
ORDER BY
t.[DatePoint]
To avoid all of the JOINs:
SELECT
SQ.DatePoint,
SUM(CASE WHEN SQ.type = 1 THEN SQ.ValuePoint ELSE 0 END) AS TotalType1,
SUM(CASE WHEN SQ.type = 2 THEN SQ.ValuePoint ELSE 0 END) AS TotalType2,
SUM(CASE WHEN SQ.type = 3 THEN SQ.ValuePoint ELSE 0 END) AS TotalType3
FROM (
SELECT
1 AS type,
DatePoint,
ValuePoint
FROM
dbo.Type1
UNION ALL
SELECT
2 AS type,
DatePoint,
ValuePoint
FROM
dbo.Type2
UNION ALL
SELECT
3 AS type,
DatePoint,
ValuePoint
FROM
dbo.Type3
) AS SQ
GROUP BY
DatePoint
ORDER BY
DatePoint
From the little information provided though, it seems like there are some flaws in the database design, which is probably part of the reason that querying the data is so difficult.
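To compare the approaches in this thread, here is a small sketch in Python with sqlite3 (SQLite, not SQL Server 2008 R2) on invented data. It runs the UNION ALL query, and also a two-table version of the query that joins the raw tables directly, to show what happens when one date has several rows in more than one table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table Type1 (DatePoint text, ValuePoint integer);
    create table Type2 (DatePoint text, ValuePoint integer);
    create table Type3 (DatePoint text, ValuePoint integer);
    -- Hypothetical data: 2024-01-01 has two rows in Type1 and two in Type2.
    insert into Type1 values ('2024-01-01', 1), ('2024-01-01', 2);
    insert into Type2 values ('2024-01-01', 10), ('2024-01-01', 20), ('2024-01-02', 5);
    insert into Type3 values ('2024-01-02', 100);
""")

# Stack the three tables, tag each row with its source, then pivot with CASE.
totals = con.execute("""
    select DatePoint,
           sum(case when type = 1 then ValuePoint else 0 end) as TotalType1,
           sum(case when type = 2 then ValuePoint else 0 end) as TotalType2,
           sum(case when type = 3 then ValuePoint else 0 end) as TotalType3
    from (
        select 1 as type, DatePoint, ValuePoint from Type1
        union all
        select 2, DatePoint, ValuePoint from Type2
        union all
        select 3, DatePoint, ValuePoint from Type3
    ) as sq
    group by DatePoint
    order by DatePoint
""").fetchall()

# Joining the raw tables: 2 x 2 rows for 2024-01-01, so each value is summed twice.
inflated = con.execute("""
    select t.DatePoint, sum(t1.ValuePoint), sum(t2.ValuePoint)
    from (select DatePoint from Type1
          union select DatePoint from Type2
          union select DatePoint from Type3) as t
    left join Type1 t1 on t1.DatePoint = t.DatePoint
    left join Type2 t2 on t2.DatePoint = t.DatePoint
    group by t.DatePoint
    order by t.DatePoint
""").fetchall()

print(totals)
print(inflated)
```

For 2024-01-01 the stacked query reports 3 and 30, while the raw join reports 6 and 60. Pre-aggregating each table in a subquery before joining, as in the first answer, avoids the inflation as well.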