Having trouble using COUNT with INTERSECT in Teradata

I am trying to run the code below in Teradata, but I keep getting an error when I try to count the rows in the intersection: Failed [2616 : 22003] Numeric overflow occurred during computation.
I tried wrapping the count in a CAST to BIGINT, but then the result comes back empty. When I run the intersect itself (without the COUNT), I can see the list of rows it returns. How can I count them?
select CAST(count(a.main_id) AS BIGINT) from second_database.tra_rock a
JOIN database.game_active b ON a.main_key=b.main_key AND description_detail LIKE 'AC'
JOIN database.release_day c ON a.release_key = c.release_key AND g_description = 'FW'
JOIN database.ft_feature d on a.main_id = d.main_id AND first_time >= 20200319
where action_date_key between 20200319 and 20200324 and a.main_id IN
(select a.main_id
From second_database.tra_rock a
JOIN database.game_active b ON a.main_key=b.main_key AND description_detail LIKE 'AC'
where action_date > 20200324 and release_key = 200)
INTERSECT
select a.main_id
From second_database.tra_rock a
JOIN database.game_active b ON a.main_key=b.main_key AND description_detail LIKE 'AC'
JOIN database.release_day c ON a.release_key = c.release_key AND g_description = 'FW'
JOIN database.ft_feature d on a.main_id = d.main_id AND DATE_KEY >= 20200319
where action_date_key between 20200319 and 20200324 and a.main_id IN
(select a.main_id
From second_database.tra_rock a
JOIN database.game_active b ON a.genome_key=b.genome_key AND description_detail <> 'AC'
where action_date > 20200324 and release_key = 200)

The COUNT is applied only to the first SELECT, so you end up intersecting that single count with the main_id values from the second SELECT.
To count the rows of the intersection, wrap the full query in a Derived Table or a Common Table Expression:
select cast(count(*) as bigint)
from
(
select a.main_id from second_database.tra_rock a
JOIN database.game_active b ON a.main_key=b.main_key AND description_detail LIKE 'AC'
JOIN database.release_day c ON a.release_key = c.release_key AND g_description = 'FW'
JOIN database.ft_feature d on a.main_id = d.main_id AND first_time >= 20200319
where action_date_key between 20200319 and 20200324 and a.main_id IN
(select a.main_id
From second_database.tra_rock a
JOIN database.game_active b ON a.main_key=b.main_key AND description_detail LIKE 'AC'
where action_date > 20200324 and release_key = 200)
INTERSECT
select a.main_id
From second_database.tra_rock a
JOIN database.game_active b ON a.main_key=b.main_key AND description_detail LIKE 'AC'
JOIN database.release_day c ON a.release_key = c.release_key AND g_description = 'FW'
JOIN database.ft_feature d on a.main_id = d.main_id AND DATE_KEY >= 20200319
where action_date_key between 20200319 and 20200324 and a.main_id IN
(select a.main_id
From second_database.tra_rock a
JOIN database.game_active b ON a.genome_key=b.genome_key AND description_detail <> 'AC'
where action_date > 20200324 and release_key = 200)
) as dt
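
To see the derived-table pattern in isolation, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The tables first_half and second_half and their contents are made up for illustration, and SQLite is only a stand-in for Teradata, so treat it as a demonstration of the idea rather than of Teradata's exact behaviour.

```python
import sqlite3

# Hypothetical toy tables standing in for the two halves of the INTERSECT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_half  (main_id INTEGER);
    CREATE TABLE second_half (main_id INTEGER);
    INSERT INTO first_half  VALUES (1), (2), (3), (4);
    INSERT INTO second_half VALUES (3), (4), (5);
""")

# Count the rows of the whole set operation by wrapping it in a derived table,
# instead of putting COUNT into the first SELECT of the INTERSECT.
count_sql = """
    SELECT COUNT(*)
    FROM (
        SELECT main_id FROM first_half
        INTERSECT
        SELECT main_id FROM second_half
    ) AS dt
"""
intersect_count = conn.execute(count_sql).fetchone()[0]
print(intersect_count)  # 2, because only main_id 3 and 4 appear in both tables
```

Because INTERSECT removes duplicates, the count here is the number of distinct main_id values shared by both sides.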

Related

SQL query to find only those customer ids which have 2 source values

I have 2 tables, one which stores the customer id and the other table which stores customer id along with the information about different sources which use that customer information. Example:
TABLE A
Customer Id
1
2
3
..
TABLE B
Customer Id Source
1 'AA'
2 'AA'
1 'AB'
2 'AB'
2 'AC'
3 'AA'
3 'AB'
3 'AE'
4 'AA'
4 'AB'
I want to write a SQL query which returns records which have only AA and AB as sources (no other sources)
I have written the below query, but it is not working correctly:
select a.customer_id
from A a, B b
where a.customer_id = b.customer_id
and b.source IN ('AA','AB')
group by a.customer_id
having count(*) = 2;

A rather efficient solution is a couple of exists subqueries:
select a.*
from a
where
exists(select 1 from b where b.customer_id = a.customer_id and b.source = 'AA')
and exists(select 1 from b where b.customer_id = a.customer_id and b.source = 'AB')
and not exists(select 1 from b where b.customer_id = a.customer_id and b.source not in ('AA', 'AB'))
With an index on b(customer_id, source), this should run quickly.
Another option is aggregation:
select customer_id
from b
group by customer_id
having
max(case when source = 'AA' then 1 else 0 end) = 1
and max(case when source = 'AB' then 1 else 0 end) = 1
and max(case when source not in ('AA', 'AB') then 1 else 0 end) = 0
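
As a quick check of the aggregation approach, the sketch below loads the sample rows from the question into an in-memory SQLite database (an assumption made purely so the example is runnable; the question does not name a database) and compares it with the original filter-then-count query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (customer_id INTEGER, source TEXT);
    INSERT INTO b VALUES
        (1,'AA'),(2,'AA'),(1,'AB'),(2,'AB'),(2,'AC'),
        (3,'AA'),(3,'AB'),(3,'AE'),(4,'AA'),(4,'AB');
""")

# The question's approach: filtering to AA/AB first hides the other sources,
# so every customer that has both AA and AB passes.
naive_sql = """
    SELECT customer_id FROM b
    WHERE source IN ('AA','AB')
    GROUP BY customer_id
    HAVING COUNT(*) = 2
    ORDER BY customer_id
"""
naive = [row[0] for row in conn.execute(naive_sql)]

# Conditional aggregation: require AA, require AB, and forbid anything else.
strict_sql = """
    SELECT customer_id FROM b
    GROUP BY customer_id
    HAVING MAX(CASE WHEN source = 'AA' THEN 1 ELSE 0 END) = 1
       AND MAX(CASE WHEN source = 'AB' THEN 1 ELSE 0 END) = 1
       AND MAX(CASE WHEN source NOT IN ('AA','AB') THEN 1 ELSE 0 END) = 0
    ORDER BY customer_id
"""
only_aa_ab = [row[0] for row in conn.execute(strict_sql)]

print(naive)       # [1, 2, 3, 4]
print(only_aa_ab)  # [1, 4]
```

Customers 2 and 3 drop out of the second result because they also have 'AC' and 'AE' respectively.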

This assumes that the customer_id/source combination has no duplicates:
select a.customer_id
from A a join B b
on a.customer_id = b.customer_id
group by a.customer_id
-- both 'AA' and 'AB', but no other
having sum(case when b.source IN ('AA','AB') then 1 else -1 end) = 2
It might be more efficient to aggregate before the join:
select a.customer_id
from A a join
( select customer_id
from B b
group by customer_id
-- both 'AA' and 'AB', but no other
having sum(case when source IN ('AA','AB') then 1 else -1 end) = 2
) b
on a.customer_id = b.customer_id

You can use aggregation:
select b.customer_id
from b
where b.source in ('AA', 'AB')
group by b.customer_id
having count(distinct b.source) = 2;
That said, your version should work. However, you should learn to use proper, explicit, standard, readable JOIN syntax. The join, however, is not needed in this case.
If you want only those two sources, you need to tweak the logic:
select b.customer_id
from b
group by b.customer_id
having sum(case when b.source = 'AA' then 1 else 0 end) > 0 and -- has AA
sum(case when b.source = 'AB' then 1 else 0 end) > 0 and -- has AB
count(distinct b.source) = 2;

I need to handle overlapping dates, but if the end date is null then it is to be assumed that the process has not stopped

I have a start date and an end date of a process from two different sources. These two sources will be merged, and the dates need to be handled in case of conflicts.
Dataset1
P_startDate P_EndDate
1-Jan-07 1-Jun-15
Dataset2
P_Start Date P_End Date
1-Mar-15 1-Jan-17
2-Jan-17 Null
Merged Dataset / Expected Dataset
| Process Start Date | Process End Date |
| 1-Jan-07 | 1-Mar-15 |
| 1-Mar-15 | 1-Jan-17 |
| 2-Jan-17 | Null |
I wrote some code, but its result does not take the null (no end date) condition into account, and my output comes out as:
| Process Start Date | Process End Date |
| 1-Jan-07 | 1-Mar-15 |
| 1-Mar-15 | 1-Jan-17 |
| 1-Jan-17 | 2-Jan-17 |
I have followed the guidelines from here
http://www.schemamania.org/sql/#overlapping.dates
with D (ID, bound) as (
select ID
, case T when 's' then StartDate else EndDate end as bound
from (
select ID, StartDate, EndDate from so.A
UNION
select ID, StartDate, EndDate from so.B
) as U
cross join (select 's' as T union select 'e') as T
)
select P.*
from (
select s.ID, s.bound as StartDate, min(e.bound) as EndDate
from D as s join D as e
on s.ID = e.ID
and s.bound < e.bound
group by s.ID, s.bound
) as P
left join so.A as a
on P.ID = a.ID
and a.StartDate <= P.StartDate and P.EndDate <= a.EndDate
left join so.B as b
on P.ID = b.ID
and b.StartDate <= P.StartDate and P.EndDate <= b.EndDate
order by P.ID, P.StartDate, P.EndDate

This looks more like a merge overlapping interval problem. Here is one solution that keeps a running count of starts and ends:
CREATE TABLE ds1 (P_STARTDATE DATE, P_ENDDATE DATE);
CREATE TABLE ds2 (P_STARTDATE DATE, P_ENDDATE DATE);
INSERT INTO ds1 VALUES
('2007-01-01', '2015-06-01');
INSERT INTO ds2 VALUES
('2015-03-01', '2017-01-01'),
('2017-01-02', NULL);
WITH cte1(d, v) AS (
SELECT P_startDate, +1 FROM ds1 UNION ALL
SELECT P_EndDate, -1 FROM ds1 UNION ALL
SELECT P_startDate, +1 FROM ds2 UNION ALL
SELECT P_EndDate, -1 FROM ds2
), cte2(d, c) AS (
SELECT d, SUM(SUM(v)) OVER (ORDER BY CASE WHEN d IS NULL THEN 2 ELSE 1 END, d)
FROM cte1
GROUP BY d
), cte3(d, c, f) AS (
SELECT d, c, CASE WHEN LAG(c) OVER (ORDER BY CASE WHEN d IS NULL THEN 2 ELSE 1 END, d) > 0 THEN 0 ELSE 1 END
FROM cte2
), cte4(d, c, g) AS (
SELECT d, c, SUM(f) OVER (ORDER BY CASE WHEN d IS NULL THEN 2 ELSE 1 END, d)
FROM cte3
)
SELECT MIN(d) AS FromDate, CASE WHEN COUNT(d) = COUNT(*) THEN MAX(d) END AS ToDate
FROM cte4
GROUP BY g;
Result:
| FromDate | ToDate |
| 01/01/2007 00:00:00 | 01/01/2017 00:00:00 |
| 02/01/2017 00:00:00 | NULL |
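
For comparison, the same merge can be sketched outside SQL. The Python below is not a translation of the running-count query above; it uses a plain sort-and-sweep, and it assumes that an end date of None means the process is still running. The function name merge_intervals is made up for this example.

```python
from datetime import date

def merge_intervals(intervals):
    """Merge overlapping (start, end) pairs; end=None means still running."""
    # Sort by start date so each interval only needs comparing with the last merged one.
    ordered = sorted(intervals, key=lambda iv: iv[0])
    merged = []
    for start, end in ordered:
        if merged:
            last_start, last_end = merged[-1]
            # Overlap if the previous interval is open-ended or ends on/after this start.
            if last_end is None or start <= last_end:
                if last_end is None or end is None:
                    new_end = None  # an open end stays open after merging
                else:
                    new_end = max(last_end, end)
                merged[-1] = (last_start, new_end)
                continue
        merged.append((start, end))
    return merged

ds1 = [(date(2007, 1, 1), date(2015, 6, 1))]
ds2 = [(date(2015, 3, 1), date(2017, 1, 1)), (date(2017, 1, 2), None)]
result = merge_intervals(ds1 + ds2)
for start, end in result:
    print(start, end)
```

With the sample data this yields one interval from 2007-01-01 to 2017-01-01 and one open-ended interval starting 2017-01-02. Note that this merges overlaps rather than splitting them at the conflict point, so it does not reproduce the three-row output the question asks for.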

Display 0 when count(*) is zero for a group by clause

I am trying to write a SQL query that displays 0 when there are no rows for the specified condition.
I have tried the following so far, but nothing seems to work:
coalesce(count(m.a),'0')
isnull(count(m.a),'0')
case when count(*) > 0 then count(*) else '0' end
select M.a, m.b, m.c, m.d, m.e,
--coalesce(count(m.a),'0') as CountOfRecords
--isnull(count(m.a),'0') as CountOfRecords
--case when count(*) > 0 then count(*) else '0' end
from my_table M
left join
(select a, b,c,d,e
from my_table
group by a, b,c,d,e
having count(*) >1 ) B
on M.b = B.b
and M.c = B.c
and M.d = B.d
and M.e = B.e
and m.a <> B.a
where M.a in (1,2)
and M.date<= '1/1/2019'
group by M.a, m.b, m.c, m.d, m.e
Expected Result
A B C D E count
1 1 1 1 1 10
2 2 2 2 2 0
Actual Result
A B C D E count
1 1 1 1 1 10

You need to use a nested query:
select coalesce(nb, 0) from (
select count(*) nb from my_table
group by my_table.a
) nested;

Are you looking for something like this?
select a, b, c, d, e,
sum(case when date <= '2019-01-01' then 1 else 0 end) as cnt
from my_table
where a in (1, 2)
group by a, b, c, d, e;
This keeps all rows in the original data that match the condition on a, but not necessarily the condition on the date. It then counts only the rows that match the date.
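
Here is a small runnable sketch of that idea, using an in-memory SQLite database from Python. The table contents are invented for illustration and only columns a, b and date are kept; the point is that a condition in WHERE removes the whole group, while the same condition inside CASE leaves the group in place with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (a INTEGER, b INTEGER, date TEXT);
    INSERT INTO my_table VALUES
        (1, 1, '2018-06-01'),
        (1, 1, '2018-12-31'),
        (2, 2, '2019-05-01');   -- only row for a = 2, and it fails the date test
""")

# The date condition lives inside the CASE, not in WHERE, so the a = 2 group survives.
query = """
    SELECT a, b,
           SUM(CASE WHEN date <= '2019-01-01' THEN 1 ELSE 0 END) AS cnt
    FROM my_table
    WHERE a IN (1, 2)
    GROUP BY a, b
    ORDER BY a, b
"""
counts = conn.execute(query).fetchall()
print(counts)  # [(1, 1, 2), (2, 2, 0)]
```

Moving `date <= '2019-01-01'` into the WHERE clause instead would return only the row for a = 1.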

SQL - SUM within subquery

I have the following code that looks at the SalesVol of different products and groups it by transaction_week
SELECT a.transaction_week,
SUM(CASE WHEN record_type IN (6,37,13) THEN quantity ELSE 0 END) as SalesVol
FROM table 1 a
LEFT JOIN table 2 b ON b.Date = a.transaction_date
LEFT JOIN table 3 c ON c.sku = a.product
WHERE series in (62,236,501,52)
GROUP BY a.transaction_week
ORDER BY a.transaction_week
| tw | SalesVol |
| 1 | 4768 |
| 2 | 4567 |
| 3 | 4354 |
| 4 | 4678 |
I want to be able to have multiple subqueries where I change the series numbers for example.
SELECT a.transaction_week,
(SELECT SUM(CASE WHEN record_type IN (6,37,13) THEN quantity ELSE 0 END) as SalesVol
FROM table 1 a
LEFT JOIN table 2 b ON b.Date = a.transaction_date
LEFT JOIN table 3 c ON c.sku = a.product
WHERE series in (62,236,501,52)) as personal care
(SELECT SUM(CASE WHEN record_type IN (6,37,13) THEN quantity ELSE 0 END) as SalesVol
FROM table 1 a
LEFT JOIN table 2 b ON b.Date = a.transaction_date
LEFT JOIN table 3 c ON c.sku = a.product
WHERE series in (37,202,203,456)) as white goods
FROM table 1 a
LEFT JOIN table 2 b ON b.Date = a.transaction_date
LEFT JOIN table 3 c ON c.sku = a.product
GROUP BY a.transaction_week
ORDER BY a.transaction_week
I can't get the subqueries to work: each one gives me the overall sum instead of a value grouped by transaction_week.

Instead of using subqueries, add series to the condition of the CASE statements:
SELECT a.transaction_week,
sum(CASE WHEN series IN (62,236,501,52) AND record_type IN (6,37,13)
THEN quantity ELSE 0 END) as personal_care,
sum(CASE WHEN series IN (37,202,203,456) AND record_type IN (6,37,13)
THEN quantity ELSE 0 END) as white_goods
FROM table 1 a
LEFT JOIN table 2 b ON b.Date = a.transaction_date
LEFT JOIN table 3 c ON c.sku = a.product
GROUP BY a.transaction_week
ORDER BY a.transaction_week;
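
A minimal runnable sketch of this conditional aggregation, using SQLite from Python. The single sales table and its rows are hypothetical and stand in for the three joined tables in the question, which are not shown in full.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (transaction_week INTEGER, series INTEGER,
                        record_type INTEGER, quantity INTEGER);
    INSERT INTO sales VALUES
        (1, 62, 6, 10),    -- personal care, counted
        (1, 37, 13, 5),    -- white goods, counted
        (1, 62, 99, 100),  -- record_type 99 is ignored
        (2, 236, 37, 7),   -- personal care
        (2, 202, 6, 3),    -- white goods
        (2, 456, 6, 4);    -- white goods
""")

# One pass over the data; each column sums only the rows matching its own series list.
query = """
    SELECT transaction_week,
           SUM(CASE WHEN series IN (62,236,501,52) AND record_type IN (6,37,13)
                    THEN quantity ELSE 0 END) AS personal_care,
           SUM(CASE WHEN series IN (37,202,203,456) AND record_type IN (6,37,13)
                    THEN quantity ELSE 0 END) AS white_goods
    FROM sales
    GROUP BY transaction_week
    ORDER BY transaction_week
"""
weekly = conn.execute(query).fetchall()
print(weekly)  # [(1, 10, 5), (2, 7, 7)]
```

Each extra category is just one more SUM(CASE ...) column, so the table is still scanned only once.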

You just missed the a.transaction_week correlation in your subqueries. The JOINs in the outer query are unnecessary.
SELECT a.transaction_week,
(
SELECT SUM(CASE WHEN record_type IN (6,37,13) THEN quantity ELSE 0 END) as SalesVol
FROM table 1 a2
LEFT JOIN table 2 b ON b.Date = a2.transaction_date
LEFT JOIN table 3 c ON c.sku = a2.product
WHERE series in (62,236,501,52) AND a2.transaction_week = a.transaction_week
) as personal_care,
(
SELECT SUM(CASE WHEN record_type IN (6,37,13) THEN quantity ELSE 0 END) as SalesVol
FROM table 1 a2
LEFT JOIN table 2 b ON b.Date = a2.transaction_date
LEFT JOIN table 3 c ON c.sku = a2.product
WHERE series in (37,202,203,456) AND a2.transaction_week = a.transaction_week
) as white_goods
FROM table 1 a
GROUP BY a.transaction_week
ORDER BY a.transaction_week

Try this; it should run fast and meet your requirement:
SELECT a.transaction_week ,
whitegoods.SalesVol AS 'White Goods' ,
personalcare.SalesVol1 AS 'Personal Care'
FROM table1 a
LEFT JOIN table2 b ON b.[Date] = a.transaction_date
LEFT JOIN table3 c ON c.sku = a.product
CROSS APPLY ( SELECT SUM(CASE WHEN record_type IN ( 6, 37, 13 )
THEN quantity
ELSE 0
END) AS SalesVol
FROM table1 a2
WHERE b.[Date] = a2.transaction_date
AND c.sku = a2.product
AND series IN ( 37, 202, 203, 456 )
AND a2.transaction_week = a.transaction_week
) whitegoods
CROSS APPLY ( SELECT SUM(CASE WHEN record_type IN ( 6, 37, 13 )
THEN quantity
ELSE 0
END) AS SalesVol1
FROM table1 a2
WHERE b.[Date] = a2.transaction_date
AND c.sku = a2.product
AND series IN ( 62, 236, 501, 52 )
AND a2.transaction_week = a.transaction_week
) personalcare
GROUP BY a.transaction_week
ORDER BY a.transaction_week

You could use the UNION ALL operator, which returns one row per week for each series list. Please refer to the query below:
select tbl1.transaction_week, tbl1.SalesVol from
(SELECT a.transaction_week as transaction_week,
SUM(CASE WHEN record_type IN (6,37,13) THEN quantity ELSE 0 END) as SalesVol
FROM table 1 a
LEFT JOIN table 2 b ON b.Date = a.transaction_date
LEFT JOIN table 3 c ON c.sku = a.product
WHERE series in (62,236,501,52)
GROUP BY a.transaction_week
UNION ALL
SELECT a.transaction_week as transaction_week,
SUM(CASE WHEN record_type IN (6,37,13) THEN quantity ELSE 0 END) as SalesVol
FROM table 1 a
LEFT JOIN table 2 b ON b.Date = a.transaction_date
LEFT JOIN table 3 c ON c.sku = a.product
WHERE series in (37,202,203,456)
GROUP BY a.transaction_week
) AS tbl1
ORDER BY tbl1.transaction_week

Find rows where one column value match and other does not

I have two tables A and B
Table A
CODE TYPE
A 1
A 2
A 3
B 1
C 1
C 2
Table B
CODE TYPE
A 1
A 2
A 4
B 2
C 1
C 3
I want to return rows where CODE is in both tables but TYPE is not, and where CODE has more than one TYPE in both tables. My result would be:
CODE TYPE SOURCE
A 3 Table A
A 4 Table B
C 2 Table A
C 3 Table B
Any help with this?

I think this covers both of your conditions.
select code, coalesce(typeA, typeB) as type, src
from
(
select
coalesce(a.code, b.code) as code,
a.type as typeA,
b.type as typeB,
case when b.type is null then 'A' when a.type is null then 'B' end as src,
count(a.code) over (partition by coalesce(a.code, b.code)) as countA,
count(b.code) over (partition by coalesce(a.code, b.code)) as countB
from
A a full outer join B b
on b.code = a.code and b.type = a.type
) T
where
countA >= 2 and countB >= 2
and (typeA is null or typeB is null)

You can use a full join to see if the code matches and check if the type is null on either of the tables.
select coalesce(a.code,b.code) code, coalesce(a.type,b.type) type,
case when b.type is null then 'A' when a.type is null then 'B' end src
from a
full join b on a.code = b.code and a.type = b.type
where a.type is null or b.type is null
To limit the results to codes which have more than one type, use
select x.code, coalesce(a.type,b.type) type,
case when b.type is null then 'Table A' when a.type is null then 'Table B' end src
from a
full join b on a.code = b.code and a.type = b.type
join (select a.code from a join b on a.code = b.code
group by a.code having count(*) > 1) x on x.code = a.code or x.code = b.code
where a.type is null or b.type is null
order by 1

Using union
with tu as (
select CODE, TYPE, src='Table A'
from TableA
union all
select CODE, TYPE, src='Table B'
from TableB
)
select CODE, TYPE, max(src)
from tu t1
where exists (select 1 from tu t2 where t2.CODE=t1.CODE and t2.src=t1.src and t1.TYPE <> t2.TYPE)
group by CODE, TYPE
having count(*)=1
order by CODE, TYPE
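
To check the logic against the sample data, here is a runnable sketch using SQLite from Python. It is a variation on the union idea rather than any of the queries above: it uses NOT EXISTS instead of a full join (older SQLite versions lack FULL OUTER JOIN), and the table names table_a and table_b are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (code TEXT, type INTEGER);
    CREATE TABLE table_b (code TEXT, type INTEGER);
    INSERT INTO table_a VALUES ('A',1),('A',2),('A',3),('B',1),('C',1),('C',2);
    INSERT INTO table_b VALUES ('A',1),('A',2),('A',4),('B',2),('C',1),('C',3);
""")

query = """
    WITH tu AS (
        SELECT code, type, 'Table A' AS src FROM table_a
        UNION ALL
        SELECT code, type, 'Table B' AS src FROM table_b
    )
    SELECT t1.code, t1.type, t1.src
    FROM tu AS t1
    WHERE NOT EXISTS (   -- this code/type pair is absent from the other table
              SELECT 1 FROM tu AS t2
              WHERE t2.code = t1.code AND t2.type = t1.type AND t2.src <> t1.src)
      -- the code must have more than one type in each table
      AND (SELECT COUNT(DISTINCT type) FROM table_a WHERE code = t1.code) > 1
      AND (SELECT COUNT(DISTINCT type) FROM table_b WHERE code = t1.code) > 1
    ORDER BY t1.code, t1.type
"""
mismatches = conn.execute(query).fetchall()
for row in mismatches:
    print(row)
```

With the sample data this prints A/3 and C/2 from Table A and A/4 and C/3 from Table B; code B is excluded because it has only one type in each table.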
