I am trying to insert data into P_TABLE, and it is taking a lot of time (~5-6 hrs). It's a simple insert joining big tables. Is there any way to reduce the time? It's a truncate-and-load process.
I have provided the necessary information, including the explain plan.
P_TABLE -- partitioned on TEAM
WH_TAB -- total count = 2,222,000,000
UNIQUE INDEX ON (EX_ID, PROD_CD, CAM_CD, SEG_CD, LIST_CD, MAIL_DT)
PARTITION BY RANGE (MAIL_DT)
REF_TAB -- total count = 240,000,000
ACT_TAB -- total count = 31,239,890
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ append */ INTO P_TABLE
(
V_CODE,
CST_ID,
EX_ID,
PROD_CD,
CAM_CD,
SEG_CD,
LIST_CD,
MAIL_DT
)
SELECT
'ABC',
COALESCE(REF.CST_ID, WH.CST_ID),
WH.EX_ID,
PROD_CD,
CAM_CD,
SEG_CD,
LIST_CD,
F.TEAM
FROM WH_TAB WH
LEFT OUTER JOIN
(
SELECT EX_ID, CST_ID, ACCT_ID, row_number() over(partition by EX_ID order by CST_ID asc) RN
FROM REF_TAB
) REF
LEFT OUTER JOIN ACT_TAB F
on F.CST_ID=REF.CST_ID
ON REF.RN=1 AND REF.EX_ID=WH.EX_ID
WHERE TRUNC(MAIL_DT) >= add_months(TRUNC(sysdate),-13)
AND WH.CAM_CD NOT LIKE 'ORD%';
COMMIT;
I'd suggest you run the SELECT part of the statement without the INSERT to establish whether the slow part is the query or the insert.
It's likely that the SELECT is the problem, so the table structures, including indexes, and the explain output are really needed to say much more.
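For example, a COUNT(*) wrapper produces the whole result set without the cost of writing any rows (a sketch against the tables from the question; time it with SET TIMING ON in SQL*Plus or in your client):
SELECT COUNT(*)
FROM WH_TAB WH
LEFT OUTER JOIN
(
    SELECT EX_ID, CST_ID,
           ROW_NUMBER() OVER (PARTITION BY EX_ID ORDER BY CST_ID) RN
    FROM REF_TAB
) REF
    ON REF.RN = 1 AND REF.EX_ID = WH.EX_ID
LEFT OUTER JOIN ACT_TAB F
    ON F.CST_ID = REF.CST_ID
WHERE TRUNC(WH.MAIL_DT) >= ADD_MONTHS(TRUNC(SYSDATE), -13)
  AND WH.CAM_CD NOT LIKE 'ORD%';
If this alone takes hours, look at the joins and at the TRUNC(MAIL_DT) predicate, which may prevent partition pruning on WH_TAB's MAIL_DT range partitions; the insert itself is probably not the bottleneck.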
I need to update an EDW_END_DATE column in a dimension table using the LEAD() function. The table has 3 million records, and the Oracle query seems to run forever.
UPDATE Edwstu.Stu_Class_D A
SET EDW_END_DATE =
(
    SELECT Edw_New_End_Dt
    FROM
    (
        SELECT
            LEAD(Edw_Begin_Date - 1, 1, DATE '2099-12-31') OVER (
                PARTITION BY Acad_Term_Code, Class_Number
                ORDER BY Edw_Begin_Date ASC
            ) AS Edw_New_End_Dt,
            STU_CLASS_KEY
        FROM Edwstu.Stu_Class_D
    ) B
    WHERE A.STU_CLASS_KEY = B.STU_CLASS_KEY
);
Try updating it with a MERGE statement:
MERGE INTO EDWSTU.STU_CLASS_D A
USING (
SELECT
LEAD(EDW_BEGIN_DATE - 1, 1, DATE '2099-12-31') OVER(
PARTITION BY ACAD_TERM_CODE, CLASS_NUMBER
ORDER BY
EDW_BEGIN_DATE ASC
) AS EDW_NEW_END_DT,
STU_CLASS_KEY
FROM
EDWSTU.STU_CLASS_D
)
B ON ( A.STU_CLASS_KEY = B.STU_CLASS_KEY )
WHEN MATCHED THEN
UPDATE SET A.EDW_END_DATE = B.EDW_NEW_END_DT;
Cheers!!
You are updating all the rows in the table. This is generally an expensive operation due to locking and logging.
You might consider regenerating the table entirely. Note: before doing this, back up the table.
-- create the table with the results you want
create table temp_stu_class_d as
select d.*,
       lead(Edw_Begin_Date - 1, 1, DATE '2099-12-31') over (
           partition by Acad_Term_Code, Class_Number
           order by Edw_Begin_Date
       ) as Edw_New_End_Dt
from Edwstu.Stu_Class_D d;
-- remove the contents of the current table
truncate table Edwstu.Stu_Class_D;
-- reload it with the corrected end dates
insert into Edwstu.Stu_Class_D ( . . . , Edw_End_Dt) -- list the columns here
select . . . , Edw_New_End_Dt -- and here
from temp_stu_class_d;
The insert is generally much more efficient than logging each update.
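If downtime allows, the truncate-and-reinsert step can even be replaced by a rename swap (a sketch; it assumes temp_stu_class_d was created in the Edwstu schema and that no grants, indexes, constraints, or dependent objects need to be recreated on the new table):
-- swap the rebuilt table in by renaming instead of reinserting
alter table Edwstu.Stu_Class_D rename to Stu_Class_D_old;
alter table Edwstu.temp_stu_class_d rename to Stu_Class_D;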
I cannot figure out how to get this done: I have a main select list in which I want to use values that I select in a subquery in the where clause. My query has join statements as well. Loosely, the code will look like this:
if object_id('tempdb..#tdata') is not null drop table #tdata;
go
create table #tdata(
machine_id varchar(12),
temestamp datetime,
commit_count int,
amount decimal(6,2)
);
if object_id('tempdb..#tsubqry') is not null drop table #tsubqry;
go
--Edit: this is just to elaborate the question; it will be a query that
--returns data which I want to use as if it were a temp table,
--based upon a condition in the where clause. Hope that makes sense.
create table #tsubqry(
machine_id varchar(12),
temestamp datetime,
amount1 decimal(6,2),
amount2 decimal(6,2)
);
insert into #tdata select 'Machine1','2018-01-02 13:03:18.000',1,3.95;
insert into #tdata select 'Machine1','2018-01-02 02:11:19.000',1,3.95;
insert into #tdata select 'Machine1','2018-01-01 23:18:16.000',1,3.95;
select m1.machine_id, m1.commit_count, m1.amount, tsub.amount1, tsub.amount2
from #tdata m1, (select amount1, amount2 from #tsubqry where machine_id=#tdata.machine_id) as tsub
left join sometable1 m2 on m1.machine_id=m2.machine_id;
Edit: I have tried a join, but I am getting "m1.temestamp could not be bound", as I need to compare these dates as well. Here is my join statement:
from #tdata m1
left join (
select amount1,amount2 from #tsubqry where cast(temestamp as date)<=cast(m1.temestamp as date)
) tt on m1.machine_id=tt.machine_id
The problem is that I want to use some values which have to be brought in from another table, matching a criterion of the main query, and on top of that those values from the other table have to be in the column list of the main query.
Hope it made some sense.
Thanks in advance.
There seem to be several things wrong here, but I think I see where you are trying to go with this.
The first thing I think you are missing is the temestamp column on the #tsubqry table. Since you are referencing it later, I'm assuming it should be there. So your table definition needs to include that field:
create table #tsubqry(
machine_id varchar(12),
amount1 decimal(6,2),
amount2 decimal(6,2),
temestamp datetime
);
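The question only populated #tdata, so a couple of hypothetical rows for #tsubqry make the query below testable:
insert into #tsubqry select 'Machine1', 1.00, 2.00, '2018-01-01 10:00:00.000';
insert into #tsubqry select 'Machine1', 0.50, 1.25, '2018-01-02 09:30:00.000';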
Now, in your query I think you were trying to use some fields from #tdata in your subquery... fine in a where clause, but not in a from clause (although CROSS/OUTER APPLY can do that; see the sketch after the query below).
Also, I'm thinking you will not want to duplicate all the data from #tdata for each matching #tsubqry row, so you probably want to group by. Based on these assumptions, I think your query needs to look something like this:
select m1.machine_id, m1.commit_count, m1.amount, sum(tt.amount1), sum(tt.amount2)
from #tdata m1
left join #tsubqry tt on m1.machine_id=tt.machine_id
and cast(tt.temestamp as date)<=cast(m1.temestamp as date)
group by m1.machine_id, m1.commit_count, m1.amount
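If you really do want the derived table to reference the outer row, as in your original attempt, OUTER APPLY supports that kind of correlation in the from clause. A close alternative sketch (it keeps one output row per #tdata row instead of grouping):
select m1.machine_id, m1.commit_count, m1.amount, tt.amount1, tt.amount2
from #tdata m1
outer apply (
    select sum(s.amount1) as amount1, sum(s.amount2) as amount2
    from #tsubqry s
    where s.machine_id = m1.machine_id
      and cast(s.temestamp as date) <= cast(m1.temestamp as date)
) tt;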
MS SQL Server actually has a built-in programming construct that I think would be useful here, as an alternative solution to joining on a subquery:
-- # ###
-- # Legend
-- # ###
-- #
-- # Table name and primary-key changes (if machine_id is NOT the primary key
-- # in table 2, I suggest adding one and keeping the machine_id column as an
-- # index column).
-- #
-- # #tdata   --> table_A
-- # #tsubqry --> table_B
-- #
-- =====
-- SOLUTION 1 :: JOIN on Subquery
SELECT
    m1.machine_id,
    m1.commit_count,
    m1.amount,
    m2.amount1,
    m2.amount2
FROM table_A m1
INNER JOIN (
    SELECT machine_id, amount1, amount2, time_stamp
    FROM table_B
) AS m2 ON m1.machine_id = m2.machine_id
WHERE CAST(m2.time_stamp AS DATE) <= CAST(m1.time_stamp AS DATE);
-- SOLUTION 2 :: Use a CTE (common table expression), a named temporary result set in MS SQL Server
WITH table_subqry AS
(
SELECT machine_id, amount1, amount2, time_stamp
FROM table_B
)
SELECT
m1.machine_id,
m1.commit_count,
m1.amount,
m2.amount1,
m2.amount2
FROM table_A m1
LEFT JOIN table_subqry AS m2
    ON m1.machine_id = m2.machine_id
    -- the date filter belongs in the ON clause; putting it in a WHERE
    -- clause would turn the LEFT JOIN back into an inner join
    AND CAST(m2.time_stamp AS DATE) <= CAST(m1.time_stamp AS DATE);
Also, I created an SQLFiddle in case it's helpful. I don't know what all your data looks like, but at least this fiddle has your schema and runs the CTE query without any errors.
Let me know if you need any more help!
SQL Fiddle
Source: Compare Time SQL Server
SQL SERVER Using a CTE
Cheers.
The question I asked yesterday was simplified, but I realize that I have to tell the whole story.
I have to extract data from 4 different tables in a Firebird 2.5 database, and the following query works:
SELECT
PRODUZIONE_T.CODPRODUZIONE,
PRODUZIONE_T.NUMEROCOMMESSA as numeroco,
ANGCLIENTIFORNITORI.RAGIONESOCIALE1,
PRODUZIONE_T.DATACONSEGNA,
PRODUZIONE_T.REVISIONE,
ANGUTENTI.NOMINATIVO,
ORDINI_T.DATA
FROM PRODUZIONE_T
LEFT OUTER JOIN ORDINI_T ON PRODUZIONE_T.CODORDINE=ORDINI_T.CODORDINE
INNER JOIN ANGCLIENTIFORNITORI ON ANGCLIENTIFORNITORI.CODCLIFOR=ORDINI_T.CODCLIFOR
LEFT OUTER JOIN ANGUTENTI ON ANGUTENTI.IDUTENTE = PRODUZIONE_T.RESPONSABILEUC
ORDER BY right(numeroco,2) DESC, left(numeroco,3) desc
rows 1 to 500;
However, the query returns duplicate (or more) rows due to the REVISIONE column.
How do I select only the rows of each NUMEROCOMMESSA with the maximum REVISIONE value?
This should work:
select t.cod, t.n_order, t.s_date, t.revision
FROM TAB1 t
JOIN
(
select n_order, MAX(revision) as revision
FROM TAB1
Group By n_order
) m on m.n_order = t.n_order and m.revision = t.revision
Here you go - http://sqlfiddle.com/#!6/ce7cf/4
Sample data (as you set it in your original question):
create table TAB1 (
cod integer primary key,
n_order varchar(10) not null,
s_date date not null,
revision integer not null );
alter table tab1 add constraint UQ1 unique (n_order,revision);
insert into TAB1 values ( 1, '001/18', '2018-02-01', 0 );
insert into TAB1 values ( 2, '002/18', '2018-01-31', 0 );
insert into TAB1 values ( 3, '002/18', '2018-01-30', 1 );
The query:
select *
from tab1 d
join ( select n_ORDER, MAX(REVISION) as REVISION
FROM TAB1
Group By n_ORDER ) m
on m.n_ORDER = d.n_ORDER and m.REVISION = d.REVISION
Suggestions:
Google and read the classic book: "Understanding SQL" by Martin Gruber
Read Firebird SQL reference: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25.html
Here is yet one more solution, using the windowed functions introduced in Firebird 3 - http://sqlfiddle.com/#!6/ce7cf/13
I do not have Firebird 3 at hand, so I cannot actually check whether there is some sudden incompatibility; try it at home :-D
SELECT * FROM
(
SELECT
TAB1.*,
ROW_NUMBER() OVER (
PARTITION BY n_order
ORDER BY revision DESC
) AS rn
FROM TAB1
) d
WHERE rn = 1
Read documentation
https://community.modeanalytics.com/sql/tutorial/sql-window-functions/
https://www.firebirdsql.org/file/documentation/release_notes/html/en/3_0/rnfb30-dml-windowfuncs.html
Which of the three solutions (including Gordon's) would be faster depends upon the specific database: the real data, the existing indexes, the selectivity of the indexes.
While window functions allow a join-less query, I am not sure it would be faster on real data: the engine may just ignore indexes on the (n_order, revision) pair and do a full scan instead, before the rn = 1 condition is applied, while the first solution would most probably use indexes to get the maximums without actually reading every row in the table.
The Firebird-support mailing list suggested a way around this, using only a single query: the trick is using both window functions and a CTE (common table expression): http://sqlfiddle.com/#!18/ce7cf/2
WITH TMP AS (
SELECT
*,
MAX(revision) OVER (
PARTITION BY n_order
) as max_REV
FROM TAB1
)
SELECT * FROM TMP
WHERE revision = max_REV
If you want the max revision number in Firebird:
select t.*
from tab1 t
where t.revision = (select max(t2.revision) from tab1 t2 where t2.n_order = t.n_order);
For performance, you want an index on tab1(n_order, revision). With such an index, performance should be competitive with any other approach.
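In the sample schema above, the UQ1 unique constraint on (n_order, revision) already provides exactly that; declared as a plain index it would be (a sketch):
create index idx_tab1_n_order_revision on tab1 (n_order, revision);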
There is a table which looks like this:
I need to retrieve only the highlighted records, and I need a query which will work on a bigger table where millions of records exist.
Criteria:
There are 4 sets; the 1st and 3rd have similar values, but the 2nd and 4th sets have different values.
Edit:
I made a slight modification to the table (an ID column was added). How can we achieve the same with the ID column?
Return only sets in which one or more different values exist in the set.
create table #ab
(
id int,
col1a int,
colb char(2)
)
insert into #ab
values
(1,1,'a'),
(2,1,'a'),
(3,1,'a'),
(4,2,'b'),
(5,2,'c'),
(6,2,'c')
select id,col1a,colb
from #ab
where col1a in (
Select col1a from #ab group by col1a having count (distinct colb)>1)
Regarding the performance over millions of rows, I would probably check the execution plan and deal with it. With my sample data set and my query, the Distinct Sort takes nearly 40% of the cost. With millions of rows, it can probably spill to tempdb as well, so I suggest the index below, which can eliminate more rows:
create index nci on #ab(colb)
include(col1a)
You can also achieve it using an INNER JOIN instead of IN, as it is a million-row query.
SELECT f.colA,f.colB
FROM
filtertable f
INNER JOIN
(
SELECT colA
FROM filtertable
GROUP BY colA
HAVING COUNT(DISTINCT colB)>1
) f1
ON f.colA = f1.colA
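As one more alternative sketch that avoids the self-join entirely: a colA group has more than one distinct colB exactly when its MIN(colB) and MAX(colB) differ, and both can be computed with window functions in a single pass over the table (names as in the join version above):
SELECT colA, colB
FROM
(
    SELECT colA, colB,
           MIN(colB) OVER (PARTITION BY colA) AS min_colB,
           MAX(colB) OVER (PARTITION BY colA) AS max_colB
    FROM filtertable
) t
WHERE min_colB <> max_colB;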
I am currently testing a partitioning configuration, using the actual execution plan to identify RunTimePartitionSummary/PartitionsAccessed info.
When a query is run with a literal against the partitioning column, partition elimination works fine (using = and <=). However, if the query is joined to a lookup table, with the partitioning column <= a column in the lookup table, and the lookup table is restricted by another criterion (so that only one row is returned, the same as if it were a literal), elimination does not occur.
This only seems to happen if the join criterion is <= rather than =, even though the result is the same. Reversing the logic and using BETWEEN does not work either, nor does using a CROSS APPLY function.
Edit: (Repro Steps)
OK here you go!
--Create sample function
CREATE PARTITION FUNCTION pf_Test(date) AS RANGE RIGHT FOR VALUES ('20110101','20110102','20110103','20110104','20110105')
--Create sample scheme
CREATE PARTITION SCHEME ps_Test AS PARTITION pf_Test ALL TO ([PRIMARY])
--Create sample table
CREATE TABLE t_Test
(
RowID int identity(1,1)
,StartDate date NOT NULL
,EndDate date NULL
,Data varchar(50) NULL
)
ON ps_Test(StartDate)
--Insert some sample data
INSERT INTO t_Test(StartDate,EndDate,Data)
VALUES
('20110101','20110102','A')
,('20110103','20110104','A')
,('20110105',NULL,'A')
,('20110101',NULL,'B')
,('20110102','20110104','C')
,('20110105',NULL,'C')
,('20110104',NULL,'D')
--Check partition allocation
SELECT *,$PARTITION.pf_Test(StartDate) AS PartitionNumber FROM t_Test
--Run simple test (include actual execution plan)
SELECT
*
,$PARTITION.pf_Test(StartDate)
FROM t_Test
WHERE StartDate <= '20110103' AND ISNULL(EndDate,getdate()) >= '20110103'
--<PartitionRange Start="1" End="4" />
--Run test with join to a lookup (with a CTE for simplicity, but it doesn't work with a table either)
WITH testCTE AS
(
SELECT convert(date,'20110101') AS CalendarDate,'A' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110102') AS CalendarDate,'B' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110103') AS CalendarDate,'C' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110104') AS CalendarDate,'D' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110105') AS CalendarDate,'E' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110106') AS CalendarDate,'F' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110107') AS CalendarDate,'G' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110108') AS CalendarDate,'H' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110109') AS CalendarDate,'I' AS SomethingInteresting
)
SELECT
C.CalendarDate
,T.*
,$PARTITION.pf_Test(StartDate)
FROM t_Test T
INNER JOIN testCTE C
ON T.StartDate <= C.CalendarDate AND ISNULL(T.EndDate,getdate()) >= C.CalendarDate
WHERE C.SomethingInteresting = 'C' --<PartitionRange Start="1" End="6" />
--So all 6 partitions are scanned despite only 2,3,4 being required, as per the simple select.
--edited to make resultant ranges identical to ensure fair test
It makes sense for the query to scan all the partitions.
All partitions are involved in the predicate T.StartDate <= C.CalendarDate, because the query planner can't possibly know which values C.CalendarDate might take.
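A common workaround (a sketch, assuming the lookup really does return a single row; Lookup here is a hypothetical table standing in for the CTE) is to fetch the lookup value into a variable first, so that at execution time the predicate compares the partitioning column against one known value and dynamic partition elimination can kick in, as in the simple literal test:
DECLARE @CalendarDate date;

SELECT @CalendarDate = CalendarDate
FROM Lookup                 -- hypothetical table holding the CTE's rows
WHERE SomethingInteresting = 'C';

SELECT *, $PARTITION.pf_Test(StartDate)
FROM t_Test
WHERE StartDate <= @CalendarDate
  AND ISNULL(EndDate, GETDATE()) >= @CalendarDate;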