Pivoting on table with huge number of records - sql

I have the following tables:
Create Table A1
(rpt_id number,
Acct_id number,
type varchar2(10));
Create Table A2
(rpt_id number,
Acct_id number,
tp varchar2(10),
information varchar2(100));
Insert into A1 (RPT_ID,ACCT_ID,TYPE) values (1,11,'type1');
Insert into A1 (RPT_ID,ACCT_ID,TYPE) values (2,22,'type2');
Insert into A2 (RPT_ID,ACCT_ID,TP,INFORMATION) values (1,11,'billnum','2341');
Insert into A2 (RPT_ID,ACCT_ID,TP,INFORMATION) values (1,11,'billname','abcd');
I need to get the information out as below:
RPT_ID ACCT_ID billnum billname
------ ------- ------- --------
1      11      2341    abcd
These tables will hold a huge amount of data: around 200,000 records in A1, with related records in A2 - at least 4 to 5 rows for each RPT_ID.
Should I pivot directly off the join of these two tables to improve performance?
So far I have used this approach:
Insert into t3
select a2.*
from a1
join a2 on a1.rpt_id = a2.rpt_id and a1.ACCT_ID = a2.ACCT_ID
where a1.type = 'type1';
Then I pivot t3 into the following structure and insert the result into t4, to use later in the code.
RPT_ID ACCT_ID billnum billname
------ ------- ------- --------
1      11      2341    abcd
This does a full scan of the A2 table. Are there any ways to avoid the full scan? Will pivot have performance issues with this much data?

"This is doing a full scan of the A2 table; is there anything we can do to avoid the full scan?"
Have you created any indexes on the tables in question? If not, then a full table scan is the only option!
And remember: a full table scan can be the fastest way to get the rows. To see if that's the case, you need to get the execution plan for your query.
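For example, with DBMS_XPLAN (a standard approach; the query below is the join from your own process):
explain plan for
  select a2.*
  from   a1
  join   a2
  on     a1.rpt_id  = a2.rpt_id
  and    a1.acct_id = a2.acct_id
  where  a1.type = 'type1';

select * from table ( dbms_xplan.display() );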
That said, the current process of loading the join into a third table, then pivoting the results into a fourth, is convoluted. It's also likely to be a lot slower than just running the query.
If you want to pre-compute the pivot, you're better off with a materialized view. This stores the result of your query. And - provided you can make it fast refresh on commit - the database will update it after you run DML.
For example:
Create Table A1 (
rpt_id number,
Acct_id number,
type varchar2(10)
);
Create Table A2 (
rpt_id number,
Acct_id number,
tp varchar2(10),
information varchar2(100)
);
Insert into A1 (RPT_ID,ACCT_ID,TYPE) values (1,11,'type1');
Insert into A1 (RPT_ID,ACCT_ID,TYPE) values (2,22,'type2');
Insert into A2 (RPT_ID,ACCT_ID,TP,INFORMATION) values (1,11,'billnum','2341');
Insert into A2 (RPT_ID,ACCT_ID,TP,INFORMATION) values (1,11,'billname','abcd');
commit;
create materialized view log on a1
with rowid, sequence ( rpt_id,acct_id,type )
including new values;
create materialized view log on a2
with rowid, sequence ( rpt_id,acct_id,tp,information )
including new values;
create materialized view mv
refresh fast on commit
as
with rws as (
select a1.type, a2.*
from a1
join a2 on a1.rpt_id = a2.rpt_id
and a1.ACCT_ID = a2.ACCT_ID
)
select type, rpt_id, acct_id,
max ( case when tp = 'billnum' then information end ) billnum,
max ( case when tp = 'billname' then information end ) billname,
count(*)
from rws
group by type, rpt_id, acct_id;
Insert into A2 (RPT_ID,ACCT_ID,TP,INFORMATION) values (2,22,'billname','abcd');
commit;
select * from mv;
TYPE   RPT_ID  ACCT_ID  BILLNUM  BILLNAME  COUNT(*)
-----  ------  -------  -------  --------  --------
type1       1       11  2341     abcd             2
type2       2       22  <null>   abcd             1
If necessary you can create indexes on the materialized view itself, further improving performance.
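For example (an illustrative index; the useful columns depend on your queries):
create index mv_rpt_acct_i on mv ( rpt_id, acct_id );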
NB - Oracle Database does have a pivot clause, but this doesn't work with fast refresh on commit. You need the old-fashioned version.
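For reference, a pivot-clause version of the same query (fine as a standalone query or in a complete-refresh materialized view, just not with fast refresh on commit) would look something like this:
select *
from (
  select a1.type, a2.rpt_id, a2.acct_id, a2.tp, a2.information
  from   a1
  join   a2
  on     a1.rpt_id  = a2.rpt_id
  and    a1.acct_id = a2.acct_id
)
pivot (
  max ( information )
  for tp in ( 'billnum' as billnum, 'billname' as billname )
);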

Update Oracle table from another table value on matching case

I have a main table (say tableA, with columns tab_a_id, field_code, field_id) and another table (say tableB, with columns area_id, area_code). tab_a_id is the primary key of tableA. I want to update field_id of tableA based on field_code. field_code of tableA and area_code of tableB match, but not row for row: field_code also has values that have no counterpart in the area_code column. I want to set field_id = area_id where field_code = area_code; where there is no match, it should be set to the default value -1, which is the 'unknown' field. I tried a subquery bulk update (e.g. UPDATE tableA SET field_id = (SELECT area_id FROM tableB WHERE area_code = field_code)). This worked for a limited set of data, but I have 3 million matching records, which means 3 million subqueries. Another problem is that there are 7 million records in total, which results in 4 million unmatched records and useless subqueries.
Is there an optimal way to update these records in minimum time and with better efficiency? I tried the MERGE command, but it performed poorly compared to a FORALL loop.
Updating 3 out of 7 million rows seems to be the problem here.
I've created a test set in a database on a small machine, and the fastest way to get your results is to create a new table (CTAS) with the desired data and swap the names afterwards. I have left out the primary key column tab_a_id to simplify the answer.
CREATE TABLE a (field_id NUMBER, field_code VARCHAR2(30)) NOLOGGING;
CREATE TABLE b (area_id NUMBER, area_code VARCHAR2(30)) NOLOGGING;
Using MERGE and UPDATE is quite slow (15 minutes), presumably because of the sheer number of changed rows:
UPDATE a SET field_id=-1 WHERE field_code NOT IN (SELECT area_code FROM b);
5,599,989 rows updated. (560 seconds)
MERGE INTO a USING b ON (a.field_code=b.area_code)
WHEN MATCHED THEN UPDATE SET a.field_id = b.area_id;
2,400,011 rows merged. (232 seconds)
However, creating a new table with the changed data is 20 times faster and takes only 38 seconds:
CREATE TABLE x NOLOGGING AS
SELECT NVL(b.area_id, -1) AS field_id, a.field_code
FROM a LEFT JOIN b ON a.field_code = b.area_code;
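The name swap mentioned above is then a quick metadata operation; a sketch (in practice you would also carry over indexes, constraints and grants before dropping the old table):
ALTER TABLE a RENAME TO a_old;
ALTER TABLE x RENAME TO a;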
Here is the test data generation:
INSERT /*+ APPEND */ INTO a (field_id, field_code) SELECT id, to_char(id) from (select level as id from dual connect by rownum <= 1000000); COMMIT;
INSERT /*+ APPEND */ INTO a (field_id, field_code) SELECT field_id+1000000, to_char(field_id+1000000) from a; COMMIT;
INSERT /*+ APPEND */ INTO a (field_id, field_code) SELECT field_id+2000000, to_char(field_id+2000000) from a; COMMIT;
INSERT /*+ APPEND */ INTO a (field_id, field_code) SELECT field_id+4000000, to_char(field_id+4000000) from a; COMMIT;
EXEC dbms_stats.gather_table_stats(null, 'a');
INSERT /*+ APPEND */ INTO b (area_id, area_code) SELECT -field_id, field_code FROM a SAMPLE (30);
exec dbms_stats.gather_table_stats(null, 'b');

Add column to ensure composite key is unique

I have a table which needs to have a composite primary key based on 2 columns (Material number, Plant).
For example, this is how it is currently (note that these rows are not unique):
MATERIAL_NUMBER PLANT NUMBER
------------------ ----- ------
000000000000500672 G072 1
000000000000500672 G072 1
000000000000500672 G087 1
000000000000500672 G207 1
000000000000500672 G207 1
However, I'll need to add the additional column (NUMBER) to the composite key such that each row is unique, and it must work like this:
For each MATERIAL_NUMBER, for each PLANT, let NUMBER start at 1 and increment by 1 for each duplicate record.
This would be the desired output:
MATERIAL_NUMBER PLANT NUMBER
------------------ ----- ------
000000000000500672 G072 1
000000000000500672 G072 2
000000000000500672 G087 1
000000000000500672 G207 1
000000000000500672 G207 2
How would I go about achieving this, specifically in SQL Server?
Best Regards!
SOLVED.
See below:
SELECT MATERIAL_NUMBER, PLANT, (ROW_NUMBER() OVER (PARTITION BY MATERIAL_NUMBER, PLANT ORDER BY VALID_FROM)) as NUMBER
FROM Table_Name
Will output the table in question, with the NUMBER column properly defined
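If you need to persist the value rather than just compute it at query time, one option is to update through the numbering CTE. This is a sketch assuming the same Table_Name and VALID_FROM ordering column as above:
WITH numbered AS (
    SELECT [NUMBER],
           ROW_NUMBER() OVER (PARTITION BY MATERIAL_NUMBER, PLANT ORDER BY VALID_FROM) AS rn
    FROM Table_Name
)
UPDATE numbered SET [NUMBER] = rn;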
Suppose this is the actual table:
create table #temp1 (MATERIAL_NUMBER varchar(30), PLANT varchar(30), NUMBER int)
Suppose you want to insert only a single record. Then:
declare @Num int
select @Num = isnull(max(NUMBER), 0) from #temp1 where MATERIAL_NUMBER = '000000000000500672' and PLANT = 'G072'
insert into #temp1 (MATERIAL_NUMBER, PLANT, NUMBER)
values ('000000000000500672', 'G072', @Num + 1)
Suppose you want to insert records in bulk. Say your sample bulk data looks like this:
create table #temp11(MATERIAL_NUMBER varchar(30),PLANT varchar(30))
insert into #temp11 (MATERIAL_NUMBER,PLANT)values
('000000000000500672','G072')
,('000000000000500672','G072')
,('000000000000500672','G087')
,('000000000000500672','G207')
,('000000000000500672','G207')
You want to insert `#temp11` into `#temp1` while maintaining the NUMBER sequence:
insert into #temp1 (MATERIAL_NUMBER, PLANT, NUMBER)
select t11.MATERIAL_NUMBER, t11.PLANT,
       row_number() over (partition by t11.MATERIAL_NUMBER, t11.PLANT order by (select null)) + isnull(maxnum, 0) as NUMBER
from #temp11 t11
outer apply (select MATERIAL_NUMBER, PLANT, max(NUMBER) maxnum
             from #temp1 t
             where t.MATERIAL_NUMBER = t11.MATERIAL_NUMBER
               and t.PLANT = t11.PLANT
             group by MATERIAL_NUMBER, PLANT) t
select * from #temp1
drop table #temp1
drop table #temp11
The main question is: why do you need the NUMBER column at all? In most cases you don't; you can use ROW_NUMBER() over (partition by t11.MATERIAL_NUMBER, t11.PLANT order by (select null)) to compute it wherever you need to display it. This will be more efficient.
Otherwise, tell us the actual situation and the number of rows involved where you will need the NUMBER column.

Get the row values as column in SQL

I have the table below and need to get its row values as columns in the output.
This is part of a view in an Oracle database.
I need to get the output using SQL as below; name, address and region are taken from another table by referring to ID.
I'm looking for a reasonably simple way, since the full query already has more than 15 columns and the values below also need to be added as columns.
Thanks.
"Looking for much simple way since full query have more than 15 columns"
Sorry, you can have a complex query or no query at all :)
The problem is that the structure of the posted table mandates a complex query. That's because it uses a so-called "generic data model", which is actually a data anti-model. The time saved by not modelling the requirement and just smashing values into the table is time you will have to spend writing horrible queries to get those values out again.
I assume you need to drive off the other table you referred to, and the posted table contains attributes supplementary to the core record.
select ano.id
, ano.name
, ano.address
, ano.region
, t1.value as alt_id
, t2.value as birth_date
, t3.value as contact_no
from another_table ano
left outer join ( select id, value
from generic_table
where key = 'alt_id' ) t1
on ano.id = t1.id
left outer join ( select id, value
from generic_table
where key = 'birth_date' ) t2
on ano.id = t2.id
left outer join ( select id, value
from generic_table
where key = 'contact_no' ) t3
on ano.id = t3.id
Note the need to use outer joins: one of the problems with generic data models is the enforcement of integrity constraints. Weak data typing can also be an issue (say if you wanted to convert the birth_date string into an actual date).
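For instance, pulling birth_date out as a real date needs an explicit conversion. A sketch, assuming the strings are stored consistently as YYYY-MM-DD (any malformed value will raise a conversion error):
select id
     , to_date ( value, 'YYYY-MM-DD' ) as birth_date
from   generic_table
where  key = 'birth_date';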
The PIVOT concept fits these types of problems well:
SQL> create table person_info(id int, key varchar2(25), value varchar2(25));
SQL> create table person_info2(id int, name varchar2(25), address varchar2(125), region varchar2(25));
SQL> insert into person_info values(4150521,'alt_id','12345678V');
SQL> insert into person_info values(4150521,'birth_date',date '1989-04-21');
SQL> insert into person_info values(4150521,'contact_no',772289317);
SQL> insert into person_info values(4150522,'alt_id','98745612V');
SQL> insert into person_info values(4150522,'birth_date',date '1990-04-21');
SQL> insert into person_info values(4150522,'contact_no',777894561);
SQL> insert into person_info2 values(4150521,'ABC','AAAAAA','ASD');
SQL> insert into person_info2 values(4150522,'XYZ','BBBBB','WER');
SQL> select p1.id, name, address, region, alt_id, birth_date, contact_no
from person_info
pivot
(
max(value) for key in ('alt_id' as alt_id,'birth_date' as birth_date,'contact_no' as contact_no)
) p1 join person_info2 p2 on (p1.id = p2.id);
ID      NAME ADDRESS REGION ALT_ID    BIRTH_DATE CONTACT_NO
------- ---- ------- ------ --------- ---------- ----------
4150521 ABC  AAAAAA  ASD    12345678V 21-APR-89  772289317
4150522 XYZ  BBBBB   WER    98745612V 21-APR-90  777894561

Help With SQL - Combining Two Rows Into One Row

I have an interesting SQL problem that I need help with.
Here is the sample dataset:
Warehouse DateStamp TimeStamp ItemNumber ID
--------- --------- --------- ---------- --
A         8/1/2009  10001     abc        1
B         8/1/2009  10002     abc        1
A         8/3/2009  12144     qrs        5
C         8/3/2009  12143     qrs        5
D         8/5/2009  6754      xyz        6
B         8/5/2009  6755      xyz        6
This dataset represents inventory transfers between two warehouses. There are two records that represent each transfer, and these two transfer records always have the same ItemNumber, DateStamp, and ID. The TimeStamp values for the two transfer records always have a difference of 1, where the smaller TimeStamp represents the source warehouse record and the larger TimeStamp represents the destination warehouse record.
Using the sample dataset above, here is the query result set that I need:
Warehouse_Source Warehouse_Destination ItemNumber DateStamp
---------------- --------------------- ---------- ---------
A                B                     abc        8/1/2009
C                A                     qrs        8/3/2009
D                B                     xyz        8/5/2009
I can write code to produce the desired result set, but I was wondering if this record combination was possible through SQL. I am using SQL Server 2005 as my underlying database. I also need to add a WHERE clause to the SQL, so that for example, I could search on Warehouse_Source = A. And no, I can't change the data model ;).
Any advice is greatly appreciated!
Regards,
Mark
SELECT source.Warehouse as Warehouse_Source
     , dest.Warehouse as Warehouse_Destination
     , source.ItemNumber
     , source.DateStamp
FROM [table] source
JOIN [table] dest ON source.ID = dest.ID
 AND source.ItemNumber = dest.ItemNumber
 AND source.DateStamp = dest.DateStamp
 AND dest.TimeStamp = source.TimeStamp + 1
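To search on Warehouse_Source as you describe, just filter the source side; for example:
WHERE source.Warehouse = 'A'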
Mark,
Here is how you can do this with row_number and PIVOT. With a clustered index or primary key on the columns as I suggest, it will use a straight-line query plan with no Sort operation, and thus be particularly efficient.
create table T(
Warehouse char,
DateStamp datetime,
TimeStamp int,
ItemNumber varchar(10),
ID int,
primary key(ItemNumber,DateStamp,ID,TimeStamp)
);
insert into T values ('A','20090801',10001,'abc',1);
insert into T values ('B','20090801',10002,'abc',1);
insert into T values ('A','20090803',12144,'qrs',5);
insert into T values ('C','20090803',12143,'qrs',5);
insert into T values ('D','20090805',6754,'xyz',6);
insert into T values ('B','20090805',6755,'xyz',6);
with Tpaired(Warehouse,DateStamp,TimeStamp,ItemNumber,ID,rk) as (
select
Warehouse,DateStamp,TimeStamp,ItemNumber,ID,
row_number() over (
partition by ItemNumber,DateStamp,ID
order by TimeStamp
)
from T
)
select
max([1]) as Warehouse_Source,
max([2]) as Warehouse_Destination,
ItemNumber,
DateStamp
from Tpaired
pivot (
max(Warehouse) for rk in ([1],[2])
) as P
group by ItemNumber, DateStamp, ID;
go
drop table T;

Adding Row Numbers To a SELECT Query Result in SQL Server Without Using the Row_Number() Function

I need to add row numbers to a SELECT query result without using the Row_Number() function,
and without using user-defined functions or stored procedures.
Select (obtain the row number) as [Row], field1, field2, fieldn from aTable
UPDATE
I am using the SAP B1 DI API to make a query; this system does not allow the use of the Row_Number() function in the select statement.
Bye.
I'm not sure if this will work for your particular situation or not, but can you execute this query with a stored procedure? If so, you can:
A) Create a temp table with all your normal result columns, plus a Row column as an auto-incremented identity.
B) Select-Insert your original query, sans the row column (SQL will fill this in automatically for you)
C) Select * on the temp table for your result set.
Not the most elegant solution, but it will accomplish the row numbering you want.
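A minimal sketch of that approach, assuming the placeholder table aTable and columns field1/field2 from the question (the types are made up):
CREATE TABLE #numbered (
    [Row] INT IDENTITY(1,1),   -- auto-incremented row number
    field1 INT,
    field2 VARCHAR(10)
);
-- Note: without an ORDER BY on the INSERT...SELECT, the numbering order is not guaranteed
INSERT INTO #numbered (field1, field2)
SELECT field1, field2 FROM aTable;
SELECT [Row], field1, field2 FROM #numbered;
DROP TABLE #numbered;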
This query will give you the row_number,
SELECT
(SELECT COUNT(*) FROM #table t2 WHERE t2.field <= t1.field) AS row_number,
field,
otherField
FROM #table t1
but there are some restrictions when you want to use it. You must have a column in your table (in the example it is field) that is unique and numeric, which you can use as a reference. For example:
CREATE TABLE #table
(
field INT,
otherField VARCHAR(10)
)
INSERT INTO #table(field,otherField) VALUES (1,'a')
INSERT INTO #table(field,otherField) VALUES (4,'b')
INSERT INTO #table(field,otherField) VALUES (6,'c')
INSERT INTO #table(field,otherField) VALUES (7,'d')
SELECT * FROM #table
returns
field | otherField
------------------
1 | a
4 | b
6 | c
7 | d
and
SELECT
(SELECT COUNT(*) FROM #table t2 WHERE t2.field <= t1.field) AS row_number,
field,
otherField
FROM #table t1
returns
row_number | field | otherField
-------------------------------
1 | 1 | a
2 | 4 | b
3 | 6 | c
4 | 7 | d
This is the solution without functions and stored procedures, but as I said there are the restrictions. But anyway, maybe it is enough for you.
RRUZ, you might be able to hide the use of a function by wrapping your query in a View. It would be transparent to the caller. I don't see any other options, besides the ones already mentioned.
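For example, a minimal sketch of the view idea, assuming aTable from the question and a hypothetical sortable column field1 to define the numbering order:
CREATE VIEW aTableNumbered AS
SELECT ROW_NUMBER() OVER (ORDER BY field1) AS [Row],
       field1, field2
FROM aTable;
The caller then runs a plain SELECT [Row], field1, field2 FROM aTableNumbered, with no window function appearing in their own statement.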