Query Performance Netezza SQL

What is the best way to express the logic below in a single Netezza SQL statement? I implemented it in a for loop, but for my data set the loop takes a long time in Netezza (about 47 minutes to complete). I have two tables: “TABLE - A” (Sector_ID | Value), and “TABLE B”, which records which sector_id intersects with which other sector_id.
TABLE-A is sorted in descending order of Value. Starting from the highest value, I need to take each sector_id from TABLE-A and eliminate every sector_id that TABLE-B lists as intersecting it.
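For example:
TABLE – A (After Sorting)
+-----------+--------------+--------------+
| SECTOR_ID | VALUE (DESC) | DELETED ROWS |
+-----------+--------------+--------------+
| 6         | 150          |              |
| 1         | 140          | DELETED      |
| 4         | 50           |              |
| 2         | 45           | DELETED      |
| 3         | 15           |              |
+-----------+--------------+--------------+
TABLE – B
+-----------+----------------+--------------+
| SECTOR_ID | INTERSECTED_ID | DELETED ROWS |
+-----------+----------------+--------------+
| 6         | 6              |              |
| 6         | 1              | DELETED      |
| 6         | 2              | DELETED      |
| 1         | 1              | DELETED      |
| 1         | 4              | DELETED      |
| 1         | 2              | DELETED      |
| 4         | 4              |              |
| 4         | 1              | DELETED      |
| 2         | 6              | DELETED      |
| 2         | 1              | DELETED      |
| 2         | 2              | DELETED      |
| 3         | 3              |              |
+-----------+----------------+--------------+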
Now the remaining values in TABLE – A will be the desired output. Please suggest. The DB I am using is Netezza.
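For reference, here is one reading of the loop described above, as a minimal Python sketch. The function name `surviving_sectors` and the in-memory data structures are assumptions made for illustration only; they are not the asker's actual Netezza procedure.

```python
def surviving_sectors(values, intersections):
    """values: {sector_id: value}; intersections: iterable of (sector_id, intersected_id)."""
    # Group intersected ids by sector, ignoring self-pairs such as (6, 6).
    neighbours = {}
    for sector_id, intersected_id in intersections:
        if sector_id != intersected_id:
            neighbours.setdefault(sector_id, set()).add(intersected_id)

    eliminated = set()
    kept = []
    # Visit sectors from the highest value to the lowest.
    for sector_id in sorted(values, key=values.get, reverse=True):
        if sector_id in eliminated:
            continue
        kept.append(sector_id)
        # Mark everything that intersects a kept sector; ids already visited are unaffected.
        eliminated.update(neighbours.get(sector_id, ()))
    return kept


# Sample data copied from the tables in the question.
table_a = {6: 150, 1: 140, 4: 50, 2: 45, 3: 15}
table_b = [(6, 6), (6, 1), (6, 2), (1, 1), (1, 4), (1, 2),
           (4, 4), (4, 1), (2, 6), (2, 1), (2, 2), (3, 3)]

remaining = surviving_sectors(table_a, table_b)
print(remaining)  # [6, 4, 3]
```

On the sample data this keeps sectors 6, 4 and 3, which matches the rows not flagged DELETED in TABLE – A.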

I'm going to attempt to restate your problem; if it is not accurate, let me know in the comments so that we can reformulate it (and this answer).
You need to remove a record from table_a when its sector_id appears among the intersected_ids of an earlier table_b row, where table_b is ordered by table_a.value descending.
Solution
Note that this solution is not specific to Netezza; it relies only on relational algebra. Also, as far as I know, this will be faster than any cursor- or loop-based approach on any RDBMS.
The biggest chore is setting up the list of sector_ids that need to be deleted from table_a. See in-line comments for descriptions.
create temporary table table_b_extended as
with tba as ( --Enhance table_a to include a row number.
    select
        row_number() over (order by sector_value desc) rwn
        ,*
    from
        table_a
), tab as ( --Join tables A and B together to attach the sorting key.
    select
        tba.rwn table_a_rwn
        ,row_number() over (order by tba.rwn) table_b_rwn
        ,tbb.*
        ,case
            when tbb.sector_id = tbb.intersected_id
                then 1
            else 0
        end sector_is_intersected
    from
        tba
        join table_b tbb on
            tba.sector_id = tbb.sector_id
)
select * from tab
distribute on (sector_id);

-- Find out the row where the intersected id first appears.
create temporary table table_b_first_appearance as
select
    intersected_id sector_id
    ,min(table_b_rwn) first_appearance
from
    table_b_extended
where
    sector_id <> intersected_id
group by 1
distribute on random;

create temporary table table_a_deletes as
with pid as ( --Get all previous intersected_ids.
    select distinct
        tab.*
        ,case --See if this row is after the intersected_id's first appearance.
            when app.first_appearance < tab.table_b_rwn then 1
            else 0
        end sector_in_previous
    from
        table_b_extended tab
        left outer join table_b_first_appearance app using (sector_id)
), vld as ( --Select records that qualify to delete from table_a.
    select distinct
        intersected_id
    from
        pid
    where
        --If it hasn't been seen and isn't equal to the intersected_id, delete it.
        sector_is_intersected + sector_in_previous = 0
)
select
    *
from
    vld
distribute on random;
Given your initial input for table_b:
+------------+----------------+
| sector_id | intersected_id |
+------------+----------------+
| 6 | 6 |
| 1 | 2 |
| 1 | 1 |
| 1 | 4 |
| 4 | 4 |
| 4 | 1 |
| 2 | 6 |
| 2 | 1 |
| 2 | 2 |
| 3 | 3 |
| 6 | 1 |
| 6 | 2 |
+------------+----------------+
This generates a table, table_a_deletes, with two values: 1 and 2. Then deleting from table_a is simple.
delete from table_a tbl
where tbl.sector_id in (select intersected_id from table_a_deletes);
And I'm not sure if those DELETED flags in table_b need to be replicated or not, but if so:
delete from table_b tbl
where
    tbl.sector_id in (select intersected_id from table_a_deletes)
    or tbl.sector_id <> tbl.intersected_id;
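To try the three steps on the sample data without a Netezza system, here is a rough port to SQLite driven from Python. This is a sketch under stated assumptions: SQLite 3.25 or later (for window functions), the `distribute on` clauses dropped because they are Netezza-specific, and a column named `sector_value` in table_a as in the query above. Note that table_a_deletes exposes a single column, intersected_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (sector_id INTEGER, sector_value INTEGER);
INSERT INTO table_a VALUES (6, 150), (1, 140), (4, 50), (2, 45), (3, 15);

CREATE TABLE table_b (sector_id INTEGER, intersected_id INTEGER);
INSERT INTO table_b VALUES
    (6, 6), (1, 2), (1, 1), (1, 4), (4, 4), (4, 1),
    (2, 6), (2, 1), (2, 2), (3, 3), (6, 1), (6, 2);

-- table_b rows numbered in table_a's descending-value order.
CREATE TEMP TABLE table_b_extended AS
WITH tba AS (
    SELECT row_number() OVER (ORDER BY sector_value DESC) AS rwn, sector_id
    FROM table_a
)
SELECT
    row_number() OVER (ORDER BY tba.rwn) AS table_b_rwn,
    tbb.sector_id,
    tbb.intersected_id,
    CASE WHEN tbb.sector_id = tbb.intersected_id THEN 1 ELSE 0 END
        AS sector_is_intersected
FROM tba
JOIN table_b tbb ON tba.sector_id = tbb.sector_id;

-- First row in which each id shows up as an intersected_id.
CREATE TEMP TABLE table_b_first_appearance AS
SELECT e.intersected_id AS sector_id, min(e.table_b_rwn) AS first_appearance
FROM table_b_extended e
WHERE e.sector_id <> e.intersected_id
GROUP BY e.intersected_id;

-- Ids intersected by a sector that was not itself seen earlier.
CREATE TEMP TABLE table_a_deletes AS
SELECT DISTINCT e.intersected_id
FROM table_b_extended e
LEFT JOIN table_b_first_appearance app ON app.sector_id = e.sector_id
WHERE e.sector_is_intersected = 0
  AND CASE WHEN app.first_appearance < e.table_b_rwn THEN 1 ELSE 0 END = 0;

DELETE FROM table_a
WHERE sector_id IN (SELECT intersected_id FROM table_a_deletes);
""")

deleted = sorted(
    row[0] for row in conn.execute("SELECT intersected_id FROM table_a_deletes")
)
remaining = [
    row[0]
    for row in conn.execute("SELECT sector_id FROM table_a ORDER BY sector_value DESC")
]
print(deleted)    # [1, 2]
print(remaining)  # [6, 4, 3]
```

The port only checks the logic on twelve rows; it says nothing about how the statements perform on Netezza.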
Performance
On an 8 SPU test system, including the delete steps:
Test 1
table_a size 14976
table_b size 179427095
Runtime 2:50
Test 2
table_a size 14976
table_b size 196240063
Runtime 3:16
Test 3
table_a size 19919
table_b size 317428924
Runtime 5:28
By far the longest time is creating the _extended table. So if you can use some other comparator rather than establishing a new id field, that would be best.
Extra Cases
Here are some different cases to show that it works in a variety of situations. In all cases, I have just modified table_b, since changing table_a is always the trivial case.
Case 1
table_b
+--+-------------+------------------+--+
| | sector_id | intersected_id | |
+--+-------------+------------------+--+
| | 6 | 6 | |
| | 1 | 1 | |
| | 1 | 2 | |
| | 4 | 4 | |
| | 4 | 1 | |
| | 2 | 6 | |
| | 2 | 1 | |
| | 2 | 2 | |
| | 3 | 3 | |
+--+-------------+------------------+--+
Deletes 1 and 2.
Case 2
table_b
+--+-------------+------------------+--+
| | sector_id | intersected_id | |
+--+-------------+------------------+--+
| | 6 | 6 | |
| | 6 | 1 | |
| | 1 | 1 | |
| | 1 | 2 | |
| | 4 | 4 | |
| | 4 | 1 | |
| | 2 | 6 | |
| | 2 | 1 | |
| | 2 | 2 | |
| | 3 | 3 | |
+--+-------------+------------------+--+
Deletes 1.
Case 3
table_b
+--+-------------+------------------+--+
| | sector_id | intersected_id | |
+--+-------------+------------------+--+
| | 6 | 6 | |
| | 1 | 1 | |
| | 1 | 4 | |
| | 4 | 4 | |
| | 4 | 1 | |
| | 2 | 6 | |
| | 2 | 1 | |
| | 2 | 2 | |
| | 3 | 3 | |
+--+-------------+------------------+--+
Deletes 1, 4, and 6.

Related

Count without using functions (like count) oracle

I have two tables:
TABLE A :
CREATE TABLE z_ostan ( id NUMBER PRIMARY KEY,
name VARCHAR2(30) NOT NULL CHECK (upper(name)=name)
);
TABLE B:
CREATE TABLE z_shahr ( id NUMBER PRIMARY KEY,
name VARCHAR2(30) NOT NULL CHECK (upper(name)=name),
ref_ostan NUMBER,
CONSTRAINT fk_ref_ostan FOREIGN KEY (ref_ostan) REFERENCES z_ostan(id)
);
How can I find the ids from Table A that rank second and third among those least referenced in Table B, without using predefined functions such as "count()"?
This only processes existing references to Table A.
Updated for Oracle (tested on 12c).
Without using any aggregate or window functions:
Sample data for Table: tblb
+----+---------+---------+
| id | name | tbla_id |
+----+---------+---------+
| 1 | TBLB_01 | 1 |
| 2 | TBLB_02 | 1 |
| 3 | TBLB_03 | 1 |
| 4 | TBLB_04 | 1 | 4 rows
| 5 | TBLB_05 | 2 |
| 6 | TBLB_06 | 2 |
| 7 | TBLB_07 | 2 | 3 rows
| 8 | TBLB_08 | 3 |
| 9 | TBLB_09 | 3 |
| 10 | TBLB_10 | 3 |
| 11 | TBLB_11 | 3 |
| 12 | TBLB_12 | 3 |
| 13 | TBLB_13 | 3 | 6 rows
| 14 | TBLB_14 | 4 |
| 15 | TBLB_15 | 4 |
| 16 | TBLB_16 | 4 | 3 rows
| 17 | TBLB_17 | 5 | 1 row
| 18 | TBLB_18 | 6 |
| 19 | TBLB_19 | 6 | 2 rows
| 20 | TBLB_20 | 7 | 1 row
+----+---------+---------+
There are many ways to express this logic. Here it is step by step with CTE terms.
The intent is (for each set of tbla_id rows in tblb):
1. Generate a row_number (n) for the rows in each partition. We would normally use window functions for this, but I assume these are not allowed.
2. Use this row_number (n) to determine the count of rows in each tbla_id partition. To find that count per partition, find the last row in each partition (from step 1).
3. Order the results of step 2 by n of these last rows.
4. Choose the 2nd and 3rd row of this result.
Done.
WITH first AS ( -- Find the first row per tbla_id
    SELECT t1.*
    FROM tblb t1
    LEFT JOIN tblb t2
        ON t1.id > t2.id
        AND t1.tbla_id = t2.tbla_id
    WHERE t2.id IS NULL
)
, rnum (id, name, tbla_id, n) AS ( -- Generate a row_number (n) for each tbla_id partition
    SELECT f.*, 1 FROM first f UNION ALL
    SELECT n.id, n.name, n.tbla_id, c.n+1
    FROM rnum c
    JOIN tblb n
        ON c.tbla_id = n.tbla_id
        AND c.id < n.id
    LEFT JOIN tblb n2
        ON n.tbla_id = n2.tbla_id
        AND c.id < n2.id
        AND n.id > n2.id
    WHERE n2.id IS NULL
)
, last AS ( -- Find the last row in each partition to obtain the count of tbla_id references
    SELECT t1.*
    FROM rnum t1
    LEFT JOIN rnum t2
        ON t1.id < t2.id
        AND t1.tbla_id = t2.tbla_id
    WHERE t2.id IS NULL
)
SELECT * FROM last
ORDER BY n, tbla_id OFFSET 1 ROWS FETCH NEXT 2 ROWS ONLY
;
Final Result, where n is the count of references to tbla:
+------+---------+---------+------+
| id | name | tbla_id | n |
+------+---------+---------+------+
| 20 | TBLB_20 | 7 | 1 |
| 19 | TBLB_19 | 6 | 2 |
+------+---------+---------+------+
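The same idea can be tried outside Oracle. The sketch below ports the query to SQLite through Python's sqlite3 module, with a few assumptions worth stating: the CTE names are changed to first_row and last_row (hypothetical names, chosen to avoid keyword clashes), `LIMIT 2 OFFSET 1` stands in for Oracle's `OFFSET ... FETCH`, and the sample rows are generated to match the distribution in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblb (id INTEGER PRIMARY KEY, name TEXT, tbla_id INTEGER)")

# Same distribution as the sample data: tbla_id -> number of referencing rows.
rows_per_parent = {1: 4, 2: 3, 3: 6, 4: 3, 5: 1, 6: 2, 7: 1}
next_id = 1
for tbla_id, how_many in rows_per_parent.items():
    for _ in range(how_many):
        conn.execute(
            "INSERT INTO tblb VALUES (?, ?, ?)",
            (next_id, f"TBLB_{next_id:02d}", tbla_id),
        )
        next_id += 1

query = """
WITH RECURSIVE first_row AS (            -- first row per tbla_id
    SELECT t1.id, t1.name, t1.tbla_id
    FROM tblb t1
    LEFT JOIN tblb t2 ON t1.id > t2.id AND t1.tbla_id = t2.tbla_id
    WHERE t2.id IS NULL
),
rnum (id, name, tbla_id, n) AS (         -- hand-rolled row number per partition
    SELECT f.id, f.name, f.tbla_id, 1 FROM first_row f
    UNION ALL
    SELECT nx.id, nx.name, nx.tbla_id, c.n + 1
    FROM rnum c
    JOIN tblb nx ON c.tbla_id = nx.tbla_id AND c.id < nx.id
    LEFT JOIN tblb n2
        ON nx.tbla_id = n2.tbla_id AND c.id < n2.id AND nx.id > n2.id
    WHERE n2.id IS NULL
),
last_row AS (                            -- last row per partition carries the count
    SELECT t1.id, t1.name, t1.tbla_id, t1.n
    FROM rnum t1
    LEFT JOIN rnum t2 ON t1.id < t2.id AND t1.tbla_id = t2.tbla_id
    WHERE t2.id IS NULL
)
SELECT id, name, tbla_id, n
FROM last_row
ORDER BY n, tbla_id
LIMIT 2 OFFSET 1
"""

result = conn.execute(query).fetchall()
print(result)  # [(20, 'TBLB_20', 7, 1), (19, 'TBLB_19', 6, 2)]
```

No aggregate or window function appears in the query; the counting is done entirely by the recursive step that walks each partition one row at a time.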
Some intermediate results...
last CTE term result. The 2nd and 3rd rows of this become the final result.
+------+---------+---------+------+
| id | name | tbla_id | n |
+------+---------+---------+------+
| 17 | TBLB_17 | 5 | 1 |
| 20 | TBLB_20 | 7 | 1 |
| 19 | TBLB_19 | 6 | 2 |
| 7 | TBLB_07 | 2 | 3 |
| 16 | TBLB_16 | 4 | 3 |
| 4 | TBLB_04 | 1 | 4 |
| 13 | TBLB_13 | 3 | 6 |
+------+---------+---------+------+
rnum CTE term result. This provides the row_number over tbla_id partitions ordered by id
+------+---------+---------+------+
| id | name | tbla_id | n |
+------+---------+---------+------+
| 1 | TBLB_01 | 1 | 1 |
| 2 | TBLB_02 | 1 | 2 |
| 3 | TBLB_03 | 1 | 3 |
| 4 | TBLB_04 | 1 | 4 |
| 5 | TBLB_05 | 2 | 1 |
| 6 | TBLB_06 | 2 | 2 |
| 7 | TBLB_07 | 2 | 3 |
| 8 | TBLB_08 | 3 | 1 |
| 9 | TBLB_09 | 3 | 2 |
| 10 | TBLB_10 | 3 | 3 |
| 11 | TBLB_11 | 3 | 4 |
| 12 | TBLB_12 | 3 | 5 |
| 13 | TBLB_13 | 3 | 6 |
| 14 | TBLB_14 | 4 | 1 |
| 15 | TBLB_15 | 4 | 2 |
| 16 | TBLB_16 | 4 | 3 |
| 17 | TBLB_17 | 5 | 1 |
| 18 | TBLB_18 | 6 | 1 |
| 19 | TBLB_19 | 6 | 2 |
| 20 | TBLB_20 | 7 | 1 |
+------+---------+---------+------+
There are a few other ways to tackle this problem in just SQL.

How to write a sql script that cursors through a table and inserts into a different table

I am new to SQL Server. I have the following table structure, which contains more than a thousand rows.
For example purposes, this is what it looks like:
Table Import
+------+---------+------------+------------+------------+------------+------------+
| Name | Code | SocksTotal | GlovesTotal| JeansTotal | ShirtsTotal| shoesTotal |
+------+---------+------------+------------+------------+------------+------------+
| OT | 45612 | 2 | 1 | 0 | 1 | 4 |
| OT | 1234 | 0 | 1 | 0 | 0 | 0 |
| US | 45896| 0 | 0 | 0 | 0 | 0 |
+------+---------+------------+------------+------------+------------+------------+
A second table, called Items, follows:
+------+---------+
| ID | Item |
+------+---------+
| 1 | socks |
| 2 | Gloves|
| 3 | Jeans |
| 4 | Shirts|
| 5 | shoes |
+------+---------+
From the above tables I need to write a script that inserts rows into a different table called ImportItems_Summary.
The expected output is:
+------+---------+------------+------------+
| Id | Code | Items_id |Import_total|
+------+---------+------------+------------+
| 1 | 45612 | 1 | 2 |
| 2 | 45612 | 2 | 1 |
| 3 | 45612 | 4 | 1 |
| 4 | 45612 | 5 | 4 |
| 5 | 1234 | 2 | 1 |
+------+---------+------------+------------+
As you can see, code 45612 now has 4 entries in the ImportItems_Summary table, one for each item whose total is not equal to 0, and Items_id is linked to the ID column of the Items table.
How can I achieve the above output? I read that a cursor might help, but I am not sure how to implement this.
One method uses cross apply to unpivot the columns of the unnormalized table to rows, then brings in the items table with a join, filters out the zero totals, and finally inserts into the target table:
insert into ImportItems_Summary (code, items_id, import_total)
select im.code, it.id, x.import_total
from import im
cross apply (values
    ('socks', sockstotal),
    ('gloves', glovestotal),
    ('jeans', jeanstotal),
    ('shirts', shirtstotal),
    ('shoes', shoestotal)
) x(item, import_total)
inner join items it on it.item = x.item
where x.import_total <> 0
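For readers without SQL Server at hand, the unpivot-then-join idea can be sketched in SQLite via Python. SQLite has no `cross apply`, so this sketch swaps in a `UNION ALL` of one select per column; that substitution, the `lower()` comparison (SQLite compares text case-sensitively by default) and the `src` ordering column are assumptions of the example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Import (Name TEXT, Code INTEGER, SocksTotal INTEGER, GlovesTotal INTEGER,
                     JeansTotal INTEGER, ShirtsTotal INTEGER, shoesTotal INTEGER);
INSERT INTO Import VALUES ('OT', 45612, 2, 1, 0, 1, 4),
                          ('OT', 1234, 0, 1, 0, 0, 0),
                          ('US', 45896, 0, 0, 0, 0, 0);

CREATE TABLE Items (ID INTEGER PRIMARY KEY, Item TEXT);
INSERT INTO Items VALUES (1, 'socks'), (2, 'Gloves'), (3, 'Jeans'),
                         (4, 'Shirts'), (5, 'shoes');

CREATE TABLE ImportItems_Summary (Id INTEGER PRIMARY KEY AUTOINCREMENT,
                                  Code INTEGER, Items_id INTEGER, Import_total INTEGER);

-- Unpivot with UNION ALL (stand-in for CROSS APPLY), join to Items, skip zero totals.
INSERT INTO ImportItems_Summary (Code, Items_id, Import_total)
SELECT x.Code, it.ID, x.import_total
FROM (
    SELECT rowid AS src, Code, 'socks' AS item, SocksTotal AS import_total FROM Import
    UNION ALL SELECT rowid, Code, 'gloves', GlovesTotal FROM Import
    UNION ALL SELECT rowid, Code, 'jeans',  JeansTotal  FROM Import
    UNION ALL SELECT rowid, Code, 'shirts', ShirtsTotal FROM Import
    UNION ALL SELECT rowid, Code, 'shoes',  shoesTotal  FROM Import
) x
JOIN Items it ON lower(it.Item) = x.item
WHERE x.import_total <> 0
ORDER BY x.src, it.ID;
""")

summary = conn.execute(
    "SELECT Code, Items_id, Import_total FROM ImportItems_Summary"
).fetchall()
for row in sorted(summary):
    print(row)
```

The five inserted rows carry the same (Code, Items_id, Import_total) combinations as the expected output in the question; code 45896 contributes nothing because all of its totals are 0.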

Update where value pair matches in SQL

I need to update this table:
Centers:
+-----+------------+---------+--------+
| id | country | process | center |
+-----+------------+---------+--------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 3 | 1 | 3 | 1 |
| 4 | 2 | 1 | 1 |
| 5 | 2 | 2 | 1 |
| 6 | 2 | 3 | 1 |
| 7 | 3 | 1 | 1 |
| 8 | 3 | 2 | 1 |
| 9 | 3 | 3 | 1 |
+-----+------------+---------+--------+
During a selection process I retrieve two tempTables:
TempCountries:
+-----+------------+
| id | country |
+-----+------------+
| 1 | 1 |
| 2 | 3 |
+-----+------------+
And TempProcesses:
+-----+------------+
| id | process |
+-----+------------+
| 1 | 2 |
| 2 | 3 |
+-----+------------+
In a subquery I get all possible combinations of the values:
SELECT TempCountries.countryId, TempProcesses.processesId FROM TempCenterCountries,TempCenterProcesses
This returns:
+-----+------------+---------+
| id | country | process |
+-----+------------+---------+
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 3 | 2 |
| 4 | 3 | 3 |
+-----+------------+---------+
During the selection process the user chooses a center for these combinations. Let’s say center = 7.
Now I need to update the center value in the Centers table where the combinations of the subquery are present.
So,
UPDATE Centers SET center = 7 WHERE ?
So I get:
+-----+------------+---------+--------+
| id | country | process | center |
+-----+------------+---------+--------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 7 |
| 3 | 1 | 3 | 7 |
| 4 | 2 | 1 | 1 |
| 5 | 2 | 2 | 1 |
| 6 | 2 | 3 | 1 |
| 7 | 3 | 1 | 1 |
| 8 | 3 | 2 | 7 |
| 9 | 3 | 3 | 7 |
+-----+------------+---------+--------+
Not all SQL implementations allow a FROM clause in an UPDATE. Fortunately, because you are taking a Cartesian product to get all the combinations, there is no constraint tying the two values together, so each column can be checked independently:
UPDATE Centers
SET center = 7
WHERE country IN (SELECT country FROM TempCountries)
AND process IN (SELECT process FROM TempProcesses)
Try this standard SQL:
Update Centers
set center = 7
where country in (select country from TempCenterCountries)
and process in (select process from TempCenterProcesses)
You need an exact match on both country and process before you run the update. Something like the query below would achieve that: it updates the column only if a matching record exists.
WITH TempTables AS (SELECT c.countryId, p.processesId
    FROM TempCenterCountries c
    CROSS JOIN TempCenterProcesses p)
UPDATE Centers
SET center = 7
WHERE EXISTS (SELECT 1
    FROM TempTables tmp
    WHERE Centers.country = tmp.countryId AND Centers.process = tmp.processesId
);
The idea is to update the record if both country and process match a pair you have already fetched into the temporary table.
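A small runnable check of the EXISTS approach, using SQLite through Python. The table and column names follow the tables shown in the question (TempCountries.country, TempProcesses.process); since the question's own query uses different names, treat these as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Centers (id INTEGER PRIMARY KEY, country INTEGER,
                      process INTEGER, center INTEGER);
INSERT INTO Centers VALUES (1, 1, 1, 1), (2, 1, 2, 1), (3, 1, 3, 1),
                           (4, 2, 1, 1), (5, 2, 2, 1), (6, 2, 3, 1),
                           (7, 3, 1, 1), (8, 3, 2, 1), (9, 3, 3, 1);

CREATE TABLE TempCountries (id INTEGER PRIMARY KEY, country INTEGER);
INSERT INTO TempCountries VALUES (1, 1), (2, 3);

CREATE TABLE TempProcesses (id INTEGER PRIMARY KEY, process INTEGER);
INSERT INTO TempProcesses VALUES (1, 2), (2, 3);

-- Update only rows whose (country, process) pair is in the Cartesian product.
UPDATE Centers
SET center = 7
WHERE EXISTS (
    SELECT 1
    FROM TempCountries tc
    CROSS JOIN TempProcesses tp
    WHERE tc.country = Centers.country
      AND tp.process = Centers.process
);
""")

updated_ids = [
    row[0] for row in conn.execute("SELECT id FROM Centers WHERE center = 7 ORDER BY id")
]
print(updated_ids)  # [2, 3, 8, 9]
```

Rows 2, 3, 8 and 9 are the ones shown with center = 7 in the expected result of the question.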
Use an update join.
For SQL Server:
update c
set center = 7
from Centers c
join (
    select tc.countryId, tp.processesId
    from TempCenterCountries tc
    cross join TempCenterProcesses tp
) A on c.country = A.countryId and c.process = A.processesId
For MySQL:
update Centers c
join (
    select tc.countryId, tp.processesId
    from TempCenterCountries tc
    cross join TempCenterProcesses tp
) A on c.country = A.countryId and c.process = A.processesId
set c.center = 7

Limit a sorted number of rows joined

I have two tables, A and B, and a join table M. For each A.id, I want to get the top 2 B.ids, sorted on the value in table M, producing the results below. This is running on an Azure SQL database.
Table A Table M Table B
+-----+ +-----+-----+-------+ +-----+
| Id | | AId | BId | Value | | Id |
+-----+ +-----+-----+-------+ +-----+
| 1 | | 1 | 3 | 4 | | 1 |
| 2 | | 1 | 2 | 3 | | 2 |
| 3 | | 3 | 2 | 3 | | 3 |
| 4 | | 3 | 5 | 6 | | 4 |
+-----+ | 3 | 3 | 4 | | 5 |
| 4 | 1 | 2 | +-----+
| 4 | 2 | 1 |
| 4 | 4 | 3 |
+-----+-----+-------+
Result
+-----+-----+-------+
| AId | BId | Value |
+-----+-----+-------+
| 1 | 3 | 4 |
| 1 | 2 | 3 |
| 3 | 5 | 6 |
| 3 | 3 | 4 |
| 4 | 1 | 2 |
| 4 | 4 | 3 |
+-----+-----+-------+
I know that I can select all the M.AId rows where they equal 1, sort it, and limit by 2, but I need to do this for every row in Table A. I've made an attempt to use group by, but I wasn't sure how to sort and limit it. I've also tried to search for resources associated with this issue but I couldn't find any resources.
(I also wasn't sure how to word the title for this issue)
You can just use ROW_NUMBER:
SELECT
    AId, BId, Value
FROM (
    SELECT *,
        Rn = ROW_NUMBER() OVER(PARTITION BY AId ORDER BY Value DESC)
    FROM M
) t
WHERE Rn <= 2
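The query can be checked against the sample data with SQLite through Python (a sketch assuming SQLite 3.25 or later for window functions). The `Rn = ...` alias form is T-SQL specific, so the sketch uses `AS Rn`, and it adds an outer ORDER BY purely to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE M (AId INTEGER, BId INTEGER, Value INTEGER);
INSERT INTO M VALUES (1, 3, 4), (1, 2, 3),
                     (3, 2, 3), (3, 5, 6), (3, 3, 4),
                     (4, 1, 2), (4, 2, 1), (4, 4, 3);
""")

top_two = conn.execute("""
    SELECT AId, BId, Value
    FROM (
        SELECT AId, BId, Value,
               ROW_NUMBER() OVER (PARTITION BY AId ORDER BY Value DESC) AS Rn
        FROM M
    ) t
    WHERE Rn <= 2
    ORDER BY AId, Value DESC
""").fetchall()

for row in top_two:
    print(row)
# (1, 3, 4)
# (1, 2, 3)
# (3, 5, 6)
# (3, 3, 4)
# (4, 4, 3)
# (4, 1, 2)
```

A.id 2 has no rows in M, so it does not appear. If two rows in a partition tie on Value, ROW_NUMBER picks between them arbitrarily unless the ORDER BY includes a tie-breaker such as BId.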

Create a combined list from two tables

I have a table with CostCenter_ID (int) and a second table with Process_ID (int).
I'd like to combine the results of both tables so that each cost center ID is assigned to all process IDs, like so:
|CostCenterID | ProcessID |
---------------------------
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
I've done it before but I'm drawing a blank. I've tried this:
SELECT CostCenter_ID,NULL FROM dbo.Cost_Centers
UNION ALL
SELECT NULL,Process_ID FROM dbo.Processes
which returns this:
|CostCenterID | ProcessID |
---------------------------
| 1 | NULL |
| NULL | 1 |
| NULL | 2 |
| NULL | 3 |
Try:
select a.CostCenter_ID, b.Process_ID
from dbo.Cost_Centers a
cross join dbo.Processes b
or:
select a.CostCenter_ID, b.Process_ID
from dbo.Cost_Centers a
    ,dbo.Processes b
NB: cross join is the better method, as it makes your intention clearer to the reader.
More info (with pics) here: http://www.w3resource.com/sql/joins/cross-join.php
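As a quick illustration, the sketch below runs a cross join in SQLite from Python and compares it with `itertools.product`, which computes the same Cartesian product in plain Python. The three ids per table are assumed sample values taken from the desired output in the question.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cost_Centers (CostCenter_ID INTEGER);
INSERT INTO Cost_Centers VALUES (1), (2), (3);
CREATE TABLE Processes (Process_ID INTEGER);
INSERT INTO Processes VALUES (1), (2), (3);
""")

# Every cost center paired with every process.
pairs = conn.execute("""
    SELECT a.CostCenter_ID, b.Process_ID
    FROM Cost_Centers a
    CROSS JOIN Processes b
    ORDER BY a.CostCenter_ID, b.Process_ID
""").fetchall()

print(len(pairs))  # 9
print(pairs == list(itertools.product([1, 2, 3], [1, 2, 3])))  # True
```

With 3 rows in each table the result has 3 x 3 = 9 rows, which is why a cross join on large tables should be used with care.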