Long Running Query - Recommendations to improve performance in Redshift - sql

SELECT
A.load,
A.sender,
A.latlong,
COUNT(distinct B.load) as load_count,
COUNT(distinct B.sender) as sender_count
FROM TABLE_A A
JOIN TABLE_B B ON
A.sender <> B.sender AND
(
A.latlong = B.latlong
or
(
lower(A.address_line1) = lower(B.address_line1)
and lower(A.city) = lower(B.city)
and lower(A.state) = lower(B.state)
and lower(A.country) = lower(B.country)
)
)
GROUP BY A.load, A.sender, A.latlong ;
I am trying to run a query like the sample above. It runs for approximately 2 hours, which is far longer than expected. I tried splitting the query and combining the parts with UNION, but the result sets do not match.
Can you please help with options to improve this query's performance, or alternative ways to achieve this in AWS?
The tables hold approximately 1.5 million records.

I would suggest removing the lower() calls and sanitizing the data to lower case once, up front, so the join can compare the columns directly:
select
A.load, A.sender, A.latlong,
count(distinct B.load) as load_count,
count(distinct B.sender) as sender_count
from
TABLE_A A
join
TABLE_B B
on
A.sender <> B.sender and
(
A.latlong = B.latlong
or
(
A.address_line1 = B.address_line1
and A.city = B.city
and A.state = B.state
and A.country = B.country
))
group by
A.load, A.sender, A.latlong ;
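
The question mentions that a UNION rewrite returned different results. A minimal sketch of one rewrite that should match the OR version: build the matching pairs with one equality join per condition, combine them with UNION, and only then aggregate. It is written in Python with the standard sqlite3 module purely so it can be run anywhere; it is not Redshift, the sample rows are invented, the data is assumed to be already lower-cased, and the column is named load_id here instead of load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (load_id TEXT, sender TEXT, latlong TEXT,
                      address_line1 TEXT, city TEXT, state TEXT, country TEXT);
CREATE TABLE table_b (load_id TEXT, sender TEXT, latlong TEXT,
                      address_line1 TEXT, city TEXT, state TEXT, country TEXT);
-- Invented sample rows, already sanitized to lower case.
INSERT INTO table_a VALUES
  ('L1', 'S1', 'p1', '1 main st', 'austin', 'tx', 'us'),
  ('L2', 'S2', 'p2', '9 oak ave', 'dallas', 'tx', 'us');
INSERT INTO table_b VALUES
  ('L10', 'S2', 'p1', '5 elm rd',  'waco',   'tx', 'us'),
  ('L11', 'S3', 'p9', '1 main st', 'austin', 'tx', 'us'),
  ('L12', 'S3', 'p1', '1 main st', 'austin', 'tx', 'us'),
  ('L13', 'S1', 'p1', '1 main st', 'austin', 'tx', 'us'),
  ('L14', 'S1', 'p2', '7 pine ln', 'dallas', 'tx', 'us');
""")

# Shape of the original query: a single join with OR in the condition.
or_query = """
SELECT a.load_id, a.sender, a.latlong,
       COUNT(DISTINCT b.load_id) AS load_count,
       COUNT(DISTINCT b.sender)  AS sender_count
FROM table_a a
JOIN table_b b
  ON a.sender <> b.sender
 AND (a.latlong = b.latlong
      OR (a.address_line1 = b.address_line1 AND a.city = b.city
          AND a.state = b.state AND a.country = b.country))
GROUP BY a.load_id, a.sender, a.latlong
ORDER BY a.load_id
"""

# Rewrite: two equality joins, UNION to remove pairs found by both,
# and the aggregation applied to the combined pairs.
union_query = """
WITH pairs AS (
  SELECT a.load_id AS a_load, a.sender AS a_sender, a.latlong AS a_latlong,
         b.load_id AS b_load, b.sender AS b_sender
  FROM table_a a
  JOIN table_b b ON a.latlong = b.latlong AND a.sender <> b.sender
  UNION
  SELECT a.load_id, a.sender, a.latlong, b.load_id, b.sender
  FROM table_a a
  JOIN table_b b ON a.address_line1 = b.address_line1 AND a.city = b.city
                AND a.state = b.state AND a.country = b.country
                AND a.sender <> b.sender
)
SELECT a_load, a_sender, a_latlong,
       COUNT(DISTINCT b_load)   AS load_count,
       COUNT(DISTINCT b_sender) AS sender_count
FROM pairs
GROUP BY a_load, a_sender, a_latlong
ORDER BY a_load
"""

or_rows = conn.execute(or_query).fetchall()
union_rows = conn.execute(union_query).fetchall()
print(or_rows == union_rows)
```

A likely cause of mismatched results is aggregating inside each UNION branch: the counts from the two branches cannot simply be combined, because a pair matched by both conditions would be counted twice.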

How do I update a column on a table where duplicate records are found

I have this SQL script that selects duplicate records. I want to convert it to an UPDATE statement that sets a particular column called STATUS in the table to the string "Duplicated".
The DBMS is Oracle.
Select * From (
select * from VisaNnsMainTable
Where TRN_TYPE = '500' OR TRN_TYPE = '2500') A
inner join (
select ARN,WAN_NUM, ABS(TRN_AMOUNT) AS AMOUNT
from VisaNnsMainTable
Where TRN_TYPE = '500' OR TRN_TYPE = '2500'
group by ARN, WAN_NUM,ABS(TRN_AMOUNT)
having count(*) > 1
) B on A.ARN = B.ARN
and A.WAN_NUM = B.WAN_NUM
and ABS(A.TRN_AMOUNT) = B.AMOUNT
ORDER BY A.WAN_NUM
Please help me review this. I think it does what I want, but I am not sure whether I wrote it well.
UPDATE VisaNnsMainTable
SET EXCEPTION = 'Duplicate'
where CONCAT(ARN,WAN_NUM,ABS(TRN_AMOUNT)) in
( select CONCAT(ARN,WAN_NUM,ABS(TRN_AMOUNT))
from VisaNnsMainTable
group by CONCAT(ARN,WAN_NUM,ABS(TRN_AMOUNT))
having count(*) > 1 ) AND (TRN_TYPE = '500' OR TRN_TYPE = '2500')

Should a subquery on a join use tables from an outer query in the where clause?

I need to add a subquery to a join, because one payment can have more than one allotment, so I only need to account for the first match (where rownum = 1).
However, I'm not sure if adding pmt from the outer query to the subquery on the allotment join is best.
Should I be doing this differently to avoid performance hits?
SELECT
pmt.payment_uid,
alt.allotment_uid
FROM
payment pmt
/* HERE: is the reference to pmt.pay_key and pmt.client_id
incorrect in the below subquery? */
INNER JOIN allotment alt ON alt.allotment_uid = (
SELECT
allotment_uid
FROM
allotment
WHERE
pay_key = pmt.pay_key
AND
pay_code = 'xyz'
AND
deleted = 'N'
AND
client_id = pmt.client_id
AND
ROWNUM = 1
)
WHERE
pmt.deleted = 'N'
AND
pmt.date_paid >= TO_DATE('2017-07-01')
AND
pmt.date_paid < TO_DATE('2017-10-01') + 1;
It's difficult to identify the performance issue in your query without seeing the explain plan output. Your query does seem to run an additional SELECT on the allotment table for every record from the main query.
Here is a version that doesn't use a correlated subquery. I haven't been able to test it. It does a simple join and then filters out all but one of the allotments for each payment. Hope this helps.
WITH v_payment
AS
(
SELECT
pmt.payment_uid,
alt.allotment_uid,
ROW_NUMBER() OVER (PARTITION BY pmt.payment_uid ORDER BY alt.allotment_uid) r_num -- number the allotments within each payment
FROM
payment pmt JOIN allotment alt
ON (pmt.pay_key = alt.pay_key AND
pmt.client_id = alt.client_id)
WHERE pmt.deleted = 'N' AND
pmt.date_paid >= TO_DATE('2017-07-01') AND
pmt.date_paid < TO_DATE('2017-10-01') + 1 AND
alt.pay_code = 'xyz' AND
alt.deleted = 'N'
)
SELECT payment_uid,
allotment_uid
FROM v_payment
WHERE r_num = 1;
Let us know how this performs!
You can phrase the query that way. I would be more likely to do:
SELECT . . .
FROM payment p INNER JOIN
(SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
ORDER BY allotment_uid
) as seqnum
FROM allotment a
WHERE pay_code = 'xyz' AND deleted = 'N'
) a
ON a.pay_key = p.pay_key AND a.client_id = p.client_id AND
seqnum = 1
WHERE p.deleted = 'N' AND
p.date_paid >= DATE '2017-07-01' AND
p.date_paid < (DATE '2017-10-01') + 1;
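
To see the "first match per payment" idea in isolation, here is a minimal sketch of the ROW_NUMBER() join above, run against SQLite from Python so it is self-contained. The sample rows are invented, the date filter is left out for brevity, and window functions are assumed to be available (SQLite 3.25 or newer); the Oracle plan and performance will of course differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment   (payment_uid INTEGER, pay_key TEXT, client_id TEXT, deleted TEXT);
CREATE TABLE allotment (allotment_uid INTEGER, pay_key TEXT, client_id TEXT,
                        pay_code TEXT, deleted TEXT);
-- Invented sample rows.
INSERT INTO payment VALUES
  (1, 'k1', 'c1', 'N'),
  (2, 'k2', 'c1', 'N'),
  (3, 'k3', 'c1', 'Y');          -- deleted payment, must not appear
INSERT INTO allotment VALUES
  (10, 'k1', 'c1', 'xyz', 'N'),
  (11, 'k1', 'c1', 'xyz', 'N'),  -- second match for payment 1, must be dropped
  (12, 'k2', 'c1', 'abc', 'N'),  -- wrong pay_code
  (13, 'k2', 'c1', 'xyz', 'N'),
  (14, 'k1', 'c1', 'xyz', 'Y'),  -- deleted allotment
  (15, 'k3', 'c1', 'xyz', 'N');
""")

first_allotments = conn.execute("""
SELECT p.payment_uid, a.allotment_uid
FROM payment p
JOIN (SELECT al.*,
             ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
                                ORDER BY allotment_uid) AS seqnum
      FROM allotment al
      WHERE pay_code = 'xyz' AND deleted = 'N') a
  ON a.pay_key = p.pay_key AND a.client_id = p.client_id AND a.seqnum = 1
WHERE p.deleted = 'N'
ORDER BY p.payment_uid
""").fetchall()

print(first_allotments)
```

The ORDER BY inside the window makes the choice of "first" allotment deterministic, which ROWNUM = 1 without an ordering does not guarantee.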

Recursive CTE and SELECT

I have written a fairly simple recursive CTE statement. Its purpose is to walk up a structure and return the top-level item.
Here is the code:
WITH cte_BOM(parent_serial_id, serial_id, serial_batch_no, sort)
AS (SELECT BOM.parent_serial_id, BOM.serial_id, p.serial_batch_no, 1
FROM serial_status AS BOM
INNER JOIN item_serial_nos p ON BOM.parent_serial_id = p.serial_id
WHERE BOM.serial_id = '16320' AND BOM.is_current = 'Y'
UNION ALL
SELECT
BOM1.parent_serial_id, bom1.serial_id, p1.serial_batch_no, cte_BOM.sort + 1
FROM cte_BOM
INNER JOIN serial_status AS BOM1 ON cte_BOM.parent_serial_id =
BOM1.serial_id
INNER JOIN item_serial_nos p1 ON BOM1.parent_serial_id = p1.serial_id
WHERE BOM1.is_current = 'Y'
)
SELECT TOP 1
cte_BOM.*
FROM
cte_BOM
ORDER BY sort desc
As you can see, I just hard-code the serial_id at the moment. What I now need to accomplish is to run this CTE against a subset of data, and I'm stuck on how to do this.
I would produce a list of serial_ids by means of another SELECT statement and then, for each row, use that serial_id in place of the one that is currently hard-coded and return the first record. Importantly, a serial_id that has no parent should still return a row.
The second SELECT would be this:
SELECT serial_id
FROM
item_serial_nos
WHERE
item_serial_nos.item_id = '15683'
Any suggestions appreciated. (Using SQL Server 2008 R2.)
If I understand your goal correctly, you can just remove the hardcoded serial_id from the cte and add the start serial_id:
WITH cte_BOM(parent_serial_id, serial_id, serial_batch_no, sort, Original_Serial_id)
AS (
SELECT BOM.parent_serial_id
, BOM.serial_id
, p.serial_batch_no
, 1
, BOM.serial_id
FROM serial_status AS BOM
INNER JOIN item_serial_nos p
ON BOM.parent_serial_id = p.serial_id
WHERE BOM.is_current = 'Y'
UNION ALL
SELECT BOM1.parent_serial_id
, bom1.serial_id
, p1.serial_batch_no
, cte_BOM.sort + 1
, cte_BOM.Original_Serial_id
FROM cte_BOM
INNER JOIN serial_status AS BOM1
ON cte_BOM.parent_serial_id = BOM1.serial_id
INNER JOIN item_serial_nos p1
ON BOM1.parent_serial_id = p1.serial_id
WHERE BOM1.is_current = 'Y'
)
SELECT cte_BOM.*
FROM cte_BOM
INNER JOIN (
SELECT cte_BOM.Original_Serial_id
, MAX(sort) sort_max
FROM cte_BOM
WHERE cte_BOM.Original_Serial_id IN (
SELECT serial_id
FROM item_serial_nos
WHERE item_serial_nos.item_id = '15683'
)
GROUP BY cte_BOM.Original_Serial_id
) max_cte
ON max_cte.Original_Serial_id = cte_BOM.Original_Serial_id
AND max_cte.sort_max = cte_BOM.sort
The recursive CTE is only executed when it's called from the last select, so it is only executed for those records that are selected in the IN query. So you shouldn't suffer a performance hit because of this.
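
The same "carry the starting id through the recursion" idea can be tried on a small data set. The sketch below is a variation, not the query above: it seeds the recursion from the item list itself at depth 0, so a serial_id with no parent still produces a row, which the question asks for. It runs on SQLite through Python (hence WITH RECURSIVE rather than SQL Server syntax), uses a simplified schema, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_serial_nos (serial_id INTEGER, item_id TEXT);
CREATE TABLE serial_status   (serial_id INTEGER, parent_serial_id INTEGER, is_current TEXT);
-- Invented sample rows: 1 -> 3 -> 5 is the current chain, 2 has no parent.
INSERT INTO item_serial_nos VALUES
  (1, '15683'), (2, '15683'), (3, 'other'), (4, 'other'), (5, 'other');
INSERT INTO serial_status VALUES
  (1, 3, 'Y'),
  (3, 5, 'Y'),
  (1, 4, 'N');   -- old link that is no longer current
""")

top_level = conn.execute("""
WITH RECURSIVE chain(original_id, current_id, depth) AS (
  -- Anchor: every serial of the item starts as its own top-level candidate.
  SELECT serial_id, serial_id, 0
  FROM item_serial_nos
  WHERE item_id = '15683'
  UNION ALL
  -- Step: move from the current serial to its current parent.
  SELECT c.original_id, s.parent_serial_id, c.depth + 1
  FROM chain c
  JOIN serial_status s ON s.serial_id = c.current_id
  WHERE s.is_current = 'Y' AND s.parent_serial_id IS NOT NULL
)
SELECT c.original_id, c.current_id AS top_level_id
FROM chain c
WHERE c.depth = (SELECT MAX(c2.depth) FROM chain c2
                 WHERE c2.original_id = c.original_id)
ORDER BY c.original_id
""").fetchall()

print(top_level)
```

Seeding from the filtered item list also keeps the recursion limited to the serials of interest, regardless of how the optimizer treats an outer IN filter.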

Find duplicates in SQL Server database where one of the columns must differ

I'm trying to write a SQL query to find duplicates. What I can't manage to do is make the query select only those duplicates where the value of one particular column differs. In other words, I want to find all rows where every other column is the same but that one column's value is different.
What I've got at the moment:
SELECT
a.1, underlag.1, f.1, f.2, f.3, f.4, f.5, f.6, f.7, f.8,
COUNT(*) TotalCount
FROM
f
JOIN
a ON a.Id = f.Id
JOIN
underlag ON underlag.Id = f.Id
GROUP BY
a.1, underlag.1, f.1, f.2, f.3, f.4, f.5, f.6, f.7, f.8
HAVING
COUNT(*) > 1
ORDER BY
underlag.1
The column that I want to differ is f.9 but I've no clue on how to do this. Any help or pointers in the right direction would be great!
SELECT *
FROM (
SELECT
a1 = a.[1]
, underlag1 = underlag.[1]
, f.[1], f.[2], f.[3], f.[4], f.[5], f.[6], f.[7], f.[8], f.[9]
, val = SUM(1) OVER (PARTITION BY CHECKSUM(f.[1], f.[2], f.[3], f.[4], f.[5], f.[6], f.[7], f.[8]))
FROM f
JOIN a on a.Id = f.Id
JOIN underlag on underlag.Id = f.Id
) t
WHERE t.val > 1
ORDER BY underlag1
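
As an alternative sketch (a different technique from the CHECKSUM window above, named here plainly): group on the columns that must match and keep only the groups that contain more than one distinct value in the column that must differ. The example runs on SQLite through Python with an invented, cut-down table where c1 and c2 stand in for f.1 to f.8 and c9 stands in for f.9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE f (id INTEGER, c1 TEXT, c2 TEXT, c9 TEXT);
-- Invented sample rows.
INSERT INTO f VALUES
  (1, 'a', 'b', 'x'),
  (2, 'a', 'b', 'y'),   -- same c1, c2 as row 1, different c9: wanted
  (3, 'c', 'd', 'z'),
  (4, 'c', 'd', 'z'),   -- full duplicate of row 3, c9 identical: not wanted
  (5, 'e', 'f', 'q');   -- no duplicate at all
""")

dup_rows = conn.execute("""
SELECT f.id, f.c1, f.c2, f.c9
FROM f
JOIN (SELECT c1, c2
      FROM f
      GROUP BY c1, c2
      HAVING COUNT(DISTINCT c9) > 1) d   -- at least two different c9 values
  ON d.c1 = f.c1 AND d.c2 = f.c2
ORDER BY f.id
""").fetchall()

dup_ids = [row[0] for row in dup_rows]
print(dup_ids)
```

COUNT(DISTINCT c9) > 1 already implies more than one row in the group, so a separate COUNT(*) > 1 check is not needed. Note that COUNT(DISTINCT ...) ignores NULLs, so rows whose c9 is NULL would need separate handling.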

How to replace a NULL when a COUNT(*) returns NULL in DB2

I have a query:
SELECT A.AHSHMT AS SHIPMENT, A.AHVNAM AS VENDOR_NAME, D.UNITS_SHIPPED, D.ADPON AS PO, B.NUMBER_OF_CASES_ON_TRANSIT, C.NUMBER_OF_CASES_RECEIVED FROM AHASNF00 A
INNER JOIN (SELECT IDSHMT, COUNT(*) AS NUMBER_OF_CASES_ON_TRANSIT FROM IDCASE00 WHERE IDSTAT = '01' GROUP BY IDSHMT) B
ON (A.AHSHMT = B.IDSHMT)
LEFT JOIN (SELECT IDSHMT, COUNT(*) AS NUMBER_OF_CASES_RECEIVED FROM IDCASE00 WHERE IDSTAT = '10' GROUP BY IDSHMT) C
ON (A.AHSHMT = C.IDSHMT)
INNER JOIN (SELECT ADSHMT, ADPON, SUM(ADUNSH) AS UNITS_SHIPPED FROM ADASNF00 GROUP BY ADSHMT, ADPON) D
ON (A.AHSHMT = D.ADSHMT)
WHERE A.AHSHMT = '540041134';
On the first JOIN statement I have a COUNT(*), and this count sometimes comes back as NULL. I need to replace that NULL with 0 (zero). I think I know how to do it in SQL Server:
ISNULL(COUNT(*), 0)
But this doesn't work for DB2. How can I accomplish this? All your help is really appreciated.
Wrap a COALESCE around each of the nullable totals in your SELECT list:
SELECT A.AHSHMT AS SHIPMENT,
A.AHVNAM AS VENDOR_NAME,
COALESCE( D.UNITS_SHIPPED, 0 ) AS UNITS_SHIPPED,
D.ADPON AS PO,
COALESCE( B.NUMBER_OF_CASES_ON_TRANSIT, 0 ) AS NUMBER_OF_CASES_ON_TRANSIT,
COALESCE( C.NUMBER_OF_CASES_RECEIVED, 0 ) AS NUMBER_OF_CASES_RECEIVED
FROM ...
The inner joins you're using for expressions B and D mean that you will only receive rows from A that have one or more cases in transit (expression B) and have one or more POs in expression D. Is that the way you want your query to work?
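
A small sketch of why the NULL appears and what COALESCE does about it: COUNT(*) itself never returns NULL, but a LEFT JOIN to a grouped subquery yields NULL for shipments that have no row in that subquery. The example uses SQLite through Python so it can be run as is; the table and column names are simplified stand-ins for the ones in the question, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipment (id TEXT);
CREATE TABLE cases    (shipment_id TEXT, status TEXT);
-- Invented sample rows: S2 has a case in transit but none received.
INSERT INTO shipment VALUES ('S1'), ('S2');
INSERT INTO cases VALUES
  ('S1', '01'), ('S1', '10'), ('S1', '10'),
  ('S2', '01');
""")

joins = """
FROM shipment s
LEFT JOIN (SELECT shipment_id, COUNT(*) AS n FROM cases
           WHERE status = '01' GROUP BY shipment_id) t ON t.shipment_id = s.id
LEFT JOIN (SELECT shipment_id, COUNT(*) AS n FROM cases
           WHERE status = '10' GROUP BY shipment_id) r ON r.shipment_id = s.id
ORDER BY s.id
"""

# Without COALESCE the missing group surfaces as NULL (None in Python).
raw_rows = conn.execute("SELECT s.id, t.n, r.n " + joins).fetchall()

# With COALESCE in the outer SELECT list the NULL becomes 0.
fixed_rows = conn.execute(
    "SELECT s.id, COALESCE(t.n, 0), COALESCE(r.n, 0) " + joins
).fetchall()

print(raw_rows)
print(fixed_rows)
```

This is also why wrapping the COUNT(*) inside the subquery has no effect: the NULL is produced by the outer join, so the COALESCE has to sit in the outer SELECT list.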
Instead of using ISNULL(COUNT(*), 0), try using COALESCE(COUNT(*), 0).
Use IFNULL(COUNT(*), 0) for DB2.