Teradata Query optimization - INDEX and COLLECT STATISTICS

I'm trying to create a stored procedure and I'm running into a spool space issue. My IT department has confirmed that I'm at the maximum allocation.
I can see where the query fails: it is the step where I grab over 10 million rows, apply several CASE expressions, and then insert the result into a separate table. Apologies for the anonymized names, but I can't show the real column headers.
How can I use an INDEX or COLLECT STATISTICS after an INSERT INTO XYZ.TableFinal statement to optimize this query?
INSERT INTO XYZ.TableFinal
SEL
b.Column1,
b.Column2,
b.Column3,
b.Column4,
b.Column5,
b.Column6,
CASE WHEN b.Column7 = 'A' THEN 'A1'
WHEN b.Column7 = 'B' THEN 'B1'
WHEN b.Column7 = 'C' THEN 'C1'
ELSE NULL end AS Column7,
b.Column8,
CASE WHEN b.Column9 = 'AA' THEN 100
WHEN b.Column9 = 'BB' THEN 200
WHEN b.Column9 = 'CC' THEN 300
ELSE 400 end AS Column9,
a.Column10
FROM Table1 a -----This table has millions upon millions of rows
JOIN Table2 b
ON a.Column1 = b.Column1
AND a.Column2 = b.Column2
AND a.Column3 = b.Column3
WHERE b.Column3 > 0 AND b.Column2 > 0 ----This helps narrow it down a bit, but I still need to pull and analyze 10m+ rows before aggregating them in the following queries.
;
I attempted to create an index prior to the select along with collecting stats on the tables, but did not have any success.

Related

What is the most optimal approach for a rate lookup in single SQL select query?

I am working on a platform that executes SQL select statements to get data for reporting.
I need to write a query that returns all columns from table A, plus a few calculated columns.
The calculated columns are products of a column from table A and the Rate column from table B, repeated a few times for different B rows, each identified by a constant ID, i.e.
SELECT
A.Column1,
A.Column2,
A.Column1 * B1.Rate as Column1_Rate1,
A.Column2 * B1.Rate as Column2_Rate1,
A.Column1 * B2.Rate as Column1_Rate2,
A.Column2 * B2.Rate as Column2_Rate2
FROM TableA A
LEFT JOIN TableB B1 on B1.ID = 'ID1'
LEFT JOIN TableB B2 on B2.ID = 'ID2'
My question is: what would be the best way to write such a query, considering that:
I am working with MSSQL 2019.
I cannot use a stored procedure (if I could, I would move rate lookup into a separate statement, which I think would be the most optimal).
The query will become a sub-query of another select statement, that will only pick a subset of columns from it, e.g.
SELECT Column1, Column1_Rate1 FROM (^ABOVE QUERY^)

I can suggest the following approach:
Because the rates table is not linked to the data table, you can fetch the rates with a single query and then cross join the result:
SELECT
A.Column1,
A.Column2,
A.Column1 * Rate1 as Column1_Rate1,
A.Column2 * Rate1 as Column2_Rate1,
A.Column1 * Rate2 as Column1_Rate2,
A.Column2 * Rate2 as Column2_Rate2
FROM TableA A
CROSS JOIN (
SELECT
MIN(CASE WHEN ID = 'ID1' THEN Rate END) Rate1,
MIN(CASE WHEN ID = 'ID2' THEN Rate END) Rate2
FROM TableB WHERE ID IN ('ID1', 'ID2')
) Rates
Or, using a CTE:
WITH Rates AS (
SELECT
MIN(CASE WHEN ID = 'ID1' THEN Rate END) Rate1,
MIN(CASE WHEN ID = 'ID2' THEN Rate END) Rate2
FROM TableB WHERE ID IN ('ID1', 'ID2')
) SELECT
A.Column1,
A.Column2,
A.Column1 * Rate1 as Column1_Rate1,
A.Column2 * Rate1 as Column2_Rate1,
A.Column1 * Rate2 as Column1_Rate2,
A.Column2 * Rate2 as Column2_Rate2
FROM Rates, TableA A;
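The cross-join variant above can be tried on a small sample. The following is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in for MSSQL 2019; the sample rows and rate values are made up for illustration, and nothing here measures performance on the real platform.

```python
import sqlite3

# In-memory stand-in database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Column1 REAL, Column2 REAL);
    CREATE TABLE TableB (ID TEXT PRIMARY KEY, Rate REAL);
    INSERT INTO TableA VALUES (10, 20), (30, 40);
    INSERT INTO TableB VALUES ('ID1', 2.0), ('ID2', 0.5), ('ID3', 9.0);
""")

# Pivot the two rates into a single row, then cross join it to every TableA row.
query = """
    SELECT A.Column1,
           A.Column2,
           A.Column1 * Rates.Rate1 AS Column1_Rate1,
           A.Column2 * Rates.Rate1 AS Column2_Rate1,
           A.Column1 * Rates.Rate2 AS Column1_Rate2,
           A.Column2 * Rates.Rate2 AS Column2_Rate2
    FROM TableA A
    CROSS JOIN (
        SELECT MIN(CASE WHEN ID = 'ID1' THEN Rate END) AS Rate1,
               MIN(CASE WHEN ID = 'ID2' THEN Rate END) AS Rate2
        FROM TableB
        WHERE ID IN ('ID1', 'ID2')
    ) Rates
    ORDER BY A.Column1
"""
rate_rows = conn.execute(query).fetchall()
for row in rate_rows:
    print(row)
```

Because the inner query is an aggregate without GROUP BY, it returns exactly one row even when an ID is missing (the corresponding rate is then NULL), so no TableA rows are lost, which matches the behavior of the original LEFT JOINs.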

Need help in Exists clause with or keyword

I am trying to replace an IN clause with an EXISTS clause, but I am not getting the desired results.
Old script:
delete from ABC a
where column1 IN (select DISTINCT column1
from BCD b
where b.column2 = a.column2
and b.column3 = 'N')
or a.column3 = 'Y';
When I change it to:
delete from ABC a
where EXISTS (select column1
from BCD b
where b.column2 = a.column2
and b.column3 = 'N')
or a.column3 = 'Y';
I do not get the desired results. I suspect this is due to the "or" condition at the end.
I need help resolving this.

The problem is not due to the OR condition.
Your old script compares each row's column1 against the values returned by the subquery:
where column1 IN (select DISTINCT column1 from BCD b where b.column2=a.column2 and b.column3= 'N')
The other script only checks whether the subquery returns any rows at all:
where EXISTS (select column1 from BCD b where b.column2=a.column2 and b.column3= 'N')
To summarize, the query with IN verifies that column1 is among the returned values, while EXISTS only checks whether the subquery returns at least one row.

You missed the column1 filter. Remember that the projected column in an EXISTS subquery isn't doing anything: you're only checking that the subquery returns rows, not what they are. I typically use select null in (not) exists to make this obvious to the reader.
delete from ABC a
where EXISTS
(select null
from BCD b
where b.column2 = a.column2
and b.column1 = a.column1
and b.column3 = 'N')
or a.column3 = 'Y';
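The difference between the three DELETE variants discussed above can be checked on a small sample. This is a minimal sketch using SQLite through Python's sqlite3 module; the sample rows and the helper name remaining_after are made up for illustration, and the outer table is referenced by its name instead of the alias a so the statement stays within syntax I am sure SQLite accepts.

```python
import sqlite3

def remaining_after(delete_sql):
    """Run delete_sql on a fresh sample database and return the surviving column1 values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ABC (column1 INTEGER, column2 TEXT, column3 TEXT);
        CREATE TABLE BCD (column1 INTEGER, column2 TEXT, column3 TEXT);
        INSERT INTO ABC VALUES (1, 'x', 'N'), (2, 'x', 'N'), (3, 'z', 'Y'), (4, 'q', 'N');
        INSERT INTO BCD VALUES (1, 'x', 'N');
    """)
    conn.execute(delete_sql)
    rows = conn.execute("SELECT column1 FROM ABC ORDER BY column1").fetchall()
    conn.close()
    return [r[0] for r in rows]

# Original IN version: column1 must be among the values the subquery returns.
in_result = remaining_after("""
    DELETE FROM ABC
    WHERE column1 IN (SELECT b.column1 FROM BCD b
                      WHERE b.column2 = ABC.column2 AND b.column3 = 'N')
       OR column3 = 'Y'
""")

# Naive EXISTS rewrite: the column1 comparison is lost, so row 2 is deleted too.
exists_result = remaining_after("""
    DELETE FROM ABC
    WHERE EXISTS (SELECT NULL FROM BCD b
                  WHERE b.column2 = ABC.column2 AND b.column3 = 'N')
       OR column3 = 'Y'
""")

# Corrected EXISTS rewrite: the column1 comparison moves inside the subquery.
fixed_result = remaining_after("""
    DELETE FROM ABC
    WHERE EXISTS (SELECT NULL FROM BCD b
                  WHERE b.column2 = ABC.column2
                    AND b.column1 = ABC.column1
                    AND b.column3 = 'N')
       OR column3 = 'Y'
""")

print(in_result, exists_result, fixed_result)
```

On this sample the IN version and the corrected EXISTS version leave the same rows, while the naive rewrite deletes one row more.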

How to calculate the percentage of records when comparing hive tables?

There are two Hive tables, table1 and table2, and I have the row count of each. I created a third table, abc, containing the non-matching records from table1 and table2. How can I get the number of records in abc as a percentage of the combined count of table1 and table2?
1. select count(*) from table1 A
2. select count(*) from table2 B
3. create table dbo.abc as
select A.column1, A.columnb from table1 A
inner join table2 B
where A.column3 <> B.column3
4. How do I get the percentage of records in table abc? For example:
(count(*) from abc) / (count(*) from A + count(*) from B) * 100
Expected output, for example:
number_of_non_matching_records = 20%

Are you trying to do this in one statement?
select count(*) as combos_in_ab,
sum(case when a.column3 <> b.column3 then 1 else 0 end) as combos_in_3,
avg(case when a.column3 <> b.column3 then 100.0 else 0 end) as percent_in_3
from table1 a cross join
table2 b;
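If the goal is literally the formula from the question (rows in abc divided by the combined row count of table1 and table2), one possible sketch is to cross join three single-row count subqueries. The example below runs on SQLite through Python's sqlite3 module, not on Hive, and the sample rows are invented so that the result comes out at 20%; the query has not been tested on Hive itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER, column3 TEXT);
    CREATE TABLE table2 (column1 INTEGER, column3 TEXT);
    CREATE TABLE abc    (column1 INTEGER, column3 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');  -- 3 rows
    INSERT INTO table2 VALUES (1, 'a'), (2, 'x');            -- 2 rows
    INSERT INTO abc    VALUES (2, 'b');                      -- 1 non-matching row
""")

# Each subquery yields exactly one row, so the cross join yields one row too.
pct_sql = """
    SELECT 100.0 * c.n / (a.n + b.n) AS pct_non_matching
    FROM (SELECT COUNT(*) AS n FROM abc) c
    CROSS JOIN (SELECT COUNT(*) AS n FROM table1) a
    CROSS JOIN (SELECT COUNT(*) AS n FROM table2) b
"""
pct = conn.execute(pct_sql).fetchone()[0]
print(pct)  # 1 row in abc out of 3 + 2 rows -> 20.0
```

Multiplying by 100.0 before dividing keeps the arithmetic in floating point, which avoids integer division truncating the result to 0.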

Compare results from column1 from column2 using SQL

Backstory:
My company runs redundant call recording servers, each with a list of extensions.
We query these using SQL. I can see there is a difference of 20+ extensions between the two servers. The extensions and server names are columns in the same table, so essentially I need to do the following:
Compare the column1 values where column2 = 'server1' in table system.name with the column1 values where column2 = 'server2' in the same table, and display those that exist on one server or the other but not on both.

Based on what I can understand from your question:
Select column1
from table1 a
where column2 = 'server1'
and not exists
(select *
from table1 b
where a.column1 = b.column1
and b.column2 = 'server2'
)
UNION
Select column1
from table1 a
where column2 = 'server2'
and not exists
(select *
from table1 b
where a.column1 = b.column1
and b.column2 = 'server1'
)
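The query above can be checked against a handful of rows. The sketch below uses SQLite through Python's sqlite3 module with invented extension numbers. It also shows a single-pass GROUP BY alternative that is not part of the answer above, included only as an assumption-laden variant that additionally reports which server holds the extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER, column2 TEXT);  -- extension, server name
    INSERT INTO table1 VALUES
        (100, 'server1'), (101, 'server1'), (102, 'server1'),
        (101, 'server2'), (102, 'server2'), (103, 'server2');
""")

# The answer's approach: two NOT EXISTS anti-joins combined with UNION.
union_sql = """
    SELECT column1 FROM table1 a
    WHERE column2 = 'server1'
      AND NOT EXISTS (SELECT * FROM table1 b
                      WHERE a.column1 = b.column1 AND b.column2 = 'server2')
    UNION
    SELECT column1 FROM table1 a
    WHERE column2 = 'server2'
      AND NOT EXISTS (SELECT * FROM table1 b
                      WHERE a.column1 = b.column1 AND b.column2 = 'server1')
"""
only_on_one = sorted(r[0] for r in conn.execute(union_sql))

# Alternative (not from the answer): keep extensions seen on exactly one server.
group_sql = """
    SELECT column1, MIN(column2) AS only_on
    FROM table1
    WHERE column2 IN ('server1', 'server2')
    GROUP BY column1
    HAVING COUNT(DISTINCT column2) = 1
    ORDER BY column1
"""
with_server = conn.execute(group_sql).fetchall()

print(only_on_one)
print(with_server)
```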

SQL Statement Performance Issue on Informix

I have this Informix SQL statement which takes ages to run. Does anybody see any way to optimize it so it wouldn't take so long?
SELECT * FROM OriginalTable WHERE type = 'S' AND flag <> 'S' INTO TEMP TempTableA;
SELECT * FROM OriginalTable WHERE type = 'Z' AND flag <> 'S' INTO TEMP TempTableB;
UPDATE OriginalTable SET flag = 'D' WHERE Serialnumber in
(
select Serialnumber from TempTableA
WHERE NOT EXISTS(SELECT * FROM TempTableB
WHERE TempTableB.Col1 = TempTableA.Col1
AND TempTableB.Col2 = TempTableA.Col2)
)
I have in my OriginalTable around 300 million rows, TempTableA 93K rows, and TempTableB 58K rows.

Update OriginalTable
Set flag = 'D'
Where Type = 'S'
And Flag <> 'S'
And Not Exists (
Select 1
From OriginalTable As T1
Where T1.Type = 'Z'
And T1.flag <> 'S'
And T1.Col1 = OriginalTable.Col1
And T1.Col2 = OriginalTable.Col2
)

This is similar to the approach @tombom described. Pre-query only the columns you care about to keep the temp table smaller. If you are dealing with a table of 60 columns, you are storing far more than the 3-4 columns you need when your primary concern is the set of valid serial numbers. Test the query first to make sure it gives you the set you expect, then apply it to your SQL update.
Here, the inner query returns the rows you do not want. Since you were comparing against only Col1 and Col2 from this table, that is all I pre-query. I then LEFT JOIN to this inner result set on Col1 and Col2. Because you want to exclude the rows found in this result set, the outer WHERE clause includes "AND ExcludeThese.Col1 IS NULL". Rows from OT1 with no match in the subquery come through the left join with NULLs and are kept; rows that were found have a match on Col1 and Col2, and the IS NULL condition excludes them.
SELECT OT1.SerialNumber
FROM OriginalTable OT1
LEFT JOIN ( select OT2.Col1,
OT2.Col2
FROM OriginalTable OT2
where OT2.type = 'Z'
AND OT2.flag <> 'S' ) ExcludeThese
ON OT1.Col1 = ExcludeThese.Col1
AND OT1.Col2 = ExcludeThese.Col2
WHERE OT1.type = 'S'
AND OT1.flag <> 'S'
AND ExcludeThese.Col1 IS NULL
ORDER BY
OT1.SerialNumber
INTO
TEMP TempTableA;
Again, test this query by itself to make sure you are getting the records you expect. To help clarify the records returned, change the above select to include more columns as a sanity check, such as
SELECT OT1.SerialNumber,
OT1.Col1,
OT1.Col2,
ExcludeThese.Col1 JoinedCol1,
ExcludeThese.Col2 JoinedCol2
from <keep rest of query intact>
Now you'll be able to see the serial number and the column values that would or would not be joined to the "ExcludeThese" result set. Try again, but remove only the "AND ExcludeThese.Col1 IS NULL" clause, and you'll see the other rows and why they are being excluded, in case you had any questions about the content.
Once you are satisfied with the pre-query, which returns only the single SerialNumber column, you can build an index on the temp table it populates and then apply your update.
UPDATE OriginalTable
SET flag = 'D'
WHERE Serialnumber in ( select Serialnumber from TempTableA );
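The two anti-join formulations in this thread (NOT EXISTS and LEFT JOIN ... IS NULL) should select the same serial numbers. The sketch below checks that on a few invented rows, using SQLite through Python's sqlite3 module as a stand-in for Informix. For that reason it uses CREATE TEMP TABLE ... AS in place of Informix's INTO TEMP, and it says nothing about performance on a 300-million-row table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OriginalTable (
        Serialnumber INTEGER PRIMARY KEY, type TEXT, flag TEXT, Col1 TEXT, Col2 TEXT
    );
    INSERT INTO OriginalTable VALUES
        (1, 'S', 'N', 'a', 'b'),  -- has a qualifying 'Z' partner (row 4): left alone
        (2, 'S', 'N', 'c', 'd'),  -- its only 'Z' partner (row 5) has flag 'S': updated
        (3, 'S', 'S', 'e', 'f'),  -- flag is already 'S': never a candidate
        (4, 'Z', 'N', 'a', 'b'),
        (5, 'Z', 'S', 'c', 'd');
""")

not_exists_sql = """
    SELECT OT1.Serialnumber
    FROM OriginalTable OT1
    WHERE OT1.type = 'S' AND OT1.flag <> 'S'
      AND NOT EXISTS (SELECT 1 FROM OriginalTable T1
                      WHERE T1.type = 'Z' AND T1.flag <> 'S'
                        AND T1.Col1 = OT1.Col1 AND T1.Col2 = OT1.Col2)
    ORDER BY OT1.Serialnumber
"""
left_join_sql = """
    SELECT OT1.Serialnumber
    FROM OriginalTable OT1
    LEFT JOIN (SELECT OT2.Col1, OT2.Col2
               FROM OriginalTable OT2
               WHERE OT2.type = 'Z' AND OT2.flag <> 'S') ExcludeThese
           ON OT1.Col1 = ExcludeThese.Col1 AND OT1.Col2 = ExcludeThese.Col2
    WHERE OT1.type = 'S' AND OT1.flag <> 'S'
      AND ExcludeThese.Col1 IS NULL
    ORDER BY OT1.Serialnumber
"""
via_not_exists = [r[0] for r in conn.execute(not_exists_sql)]
via_left_join = [r[0] for r in conn.execute(left_join_sql)]

# Materialize the serial numbers first, then run the update against that small set.
conn.execute("CREATE TEMP TABLE TempTableA AS " + left_join_sql)
conn.execute("""
    UPDATE OriginalTable SET flag = 'D'
    WHERE Serialnumber IN (SELECT Serialnumber FROM TempTableA)
""")
flagged = [r[0] for r in conn.execute(
    "SELECT Serialnumber FROM OriginalTable WHERE flag = 'D' ORDER BY Serialnumber")]

print(via_not_exists, via_left_join, flagged)
```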

I haven't tested this with data, but maybe this would work?
SELECT Serialnumber, col1, col2,
CASE WHEN type = 'S' THEN 1
WHEN type = 'Z' THEN 2 END AS filteredType
FROM OriginalTable WHERE (type = 'S' OR type = 'Z') AND flag <> 'S' INTO TEMP TempTable;
UPDATE OriginalTable SET flag = 'D' WHERE Serialnumber IN
(
SELECT t1.Serialnumber FROM TempTable t1
LEFT JOIN TempTable t2 ON (t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t2.filteredType = 2)
WHERE t1.filteredType = 1
AND t2.Serialnumber IS NULL
)
That way you can skip loading one of the temp tables. On the other hand, there will be no index on the new column filteredType.
Also, I have no experience with Informix. Hope it helps anyway.
