SQL Partition Elimination


I am testing a partitioning configuration, using the actual execution plan to read the RunTimePartitionSummary/PartitionsAccessed information.
When a query compares the partitioning column to a literal, partition elimination works fine (with both = and <=). However, if the query joins to a lookup table with the partitioning column <= a column in the lookup table, and the lookup table is restricted by another criterion so that only one row is returned (the same as if it were a literal), elimination does not occur.
This only seems to happen when the join condition is <= rather than =, even though the result is the same. Reversing the logic and using BETWEEN does not work either, nor does using a cross-applied function.
Edit: (Repro Steps)
OK here you go!
--Create sample function
CREATE PARTITION FUNCTION pf_Test(date) AS RANGE RIGHT FOR VALUES ('20110101','20110102','20110103','20110104','20110105')
--Create sample scheme
CREATE PARTITION SCHEME ps_Test AS PARTITION pf_Test ALL TO ([PRIMARY])
--Create sample table
CREATE TABLE t_Test
(
RowID int identity(1,1)
,StartDate date NOT NULL
,EndDate date NULL
,Data varchar(50) NULL
)
ON ps_Test(StartDate)
--Insert some sample data
INSERT INTO t_Test(StartDate,EndDate,Data)
VALUES
('20110101','20110102','A')
,('20110103','20110104','A')
,('20110105',NULL,'A')
,('20110101',NULL,'B')
,('20110102','20110104','C')
,('20110105',NULL,'C')
,('20110104',NULL,'D')
--Check partition allocation
SELECT *,$PARTITION.pf_Test(StartDate) AS PartitionNumber FROM t_Test
--Run simple test (include actual execution plan)
SELECT
*
,$PARTITION.pf_Test(StartDate)
FROM t_Test
WHERE StartDate <= '20110103' AND ISNULL(EndDate,getdate()) >= '20110103'
--<PartitionRange Start="1" End="4" />
--Run test with join to a lookup (with CTE for simplicity, but it doesn't work with a table either)
WITH testCTE AS
(
SELECT convert(date,'20110101') AS CalendarDate,'A' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110102') AS CalendarDate,'B' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110103') AS CalendarDate,'C' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110104') AS CalendarDate,'D' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110105') AS CalendarDate,'E' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110106') AS CalendarDate,'F' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110107') AS CalendarDate,'G' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110108') AS CalendarDate,'H' AS SomethingInteresting
UNION ALL
SELECT convert(date,'20110109') AS CalendarDate,'I' AS SomethingInteresting
)
SELECT
C.CalendarDate
,T.*
,$PARTITION.pf_Test(StartDate)
FROM t_Test T
INNER JOIN testCTE C
ON T.StartDate <= C.CalendarDate AND ISNULL(T.EndDate,getdate()) >= C.CalendarDate
WHERE C.SomethingInteresting = 'C' --<PartitionRange Start="1" End="6" />
--So all 6 partitions are scanned despite only 2,3,4 being required, as per the simple select.
--edited to make resultant ranges identical to ensure fair test

It makes sense for the query to scan all the partitions.
Any partition can satisfy the predicate T.StartDate <= C.CalendarDate, because when the plan is built the query planner cannot know which values C.CalendarDate will take.
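To make that reasoning concrete, here is a minimal Python sketch. It is not SQL Server's actual algorithm, only a model of how a RANGE RIGHT partition function maps values to partition numbers, and of which partitions a StartDate <= X predicate can touch when X is a known literal versus unknown at plan time. The function names are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Boundary values from pf_Test. With RANGE RIGHT, each boundary value
# belongs to the partition on its right.
BOUNDARIES = [date(2011, 1, d) for d in range(1, 6)]


def partition_number(value):
    """Model of $PARTITION.pf_Test(value) for a RANGE RIGHT function."""
    return bisect_right(BOUNDARIES, value) + 1


def partitions_for_upper_bound(upper=None):
    """Partitions that StartDate <= upper can touch.

    upper=None models a comparison value that is unknown when the plan
    is built, such as a column coming from the joined lookup table.
    """
    last = len(BOUNDARIES) + 1 if upper is None else partition_number(upper)
    return list(range(1, last + 1))


print(partitions_for_upper_bound(date(2011, 1, 3)))  # literal known up front
print(partitions_for_upper_bound())                  # value unknown at plan time
```

With the literal, the model stops at partition 4, which matches the PartitionRange reported for the simple test; without it, all 6 partitions remain candidates.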

Related

SQL Using subquery in where clause and use values in select

I cannot figure out how to build a query whose main select list uses values that come from a subquery filtered by a condition on the main query. My query has join statements as well. Loosely, the code looks like this:
if object_id('tempdb..#tdata') is not null drop table #tdata;
go
create table #tdata(
machine_id varchar(12),
temestamp datetime,
commit_count int,
amount decimal(6,2)
);
if object_id('tempdb..#tsubqry') is not null drop table #tsubqry;
go
--Edit:this is just to elaborate question, it will be a query that
--will return data which I want to use as if it was a temp table
--based upon condition in where clause..hope makes sense
create table #tsubqry(
machine_id varchar(12),
temestamp datetime,
amount1 decimal(6,2),
amount2 decimal(6,2)
);
insert into #tdata select 'Machine1','2018-01-02 13:03:18.000',1,3.95;
insert into #tdata select 'Machine1','2018-01-02 02:11:19.000',1,3.95;
insert into #tdata select 'Machine1','2018-01-01 23:18:16.000',1,3.95;
select m1.machine_id, m1.commit_count,m1.amount,tsub.amount1,tsub.amount2
from #tdata m1, (select amount1,amount2 from #tsubqry where machine_id=#tdata.machine_id) as tsub
left join sometable1 m2 on m1.machine_id=m2.machine_id;
Edit: I have tried a join, but I get an error saying m1.temestamp could not be bound, and I need to compare these dates as well. Here is my join statement:
from #tdata m1
left join (
select amount1,amount2 from #tsubqry where cast(temestamp as date)<=cast(m1.temestamp as date)
) tt on m1.machine_id=tt.machine_id
The problem is that I need values from another table that match a criterion from the main query, and those values must also appear in the column list of the main query.
Hope it made some sense.
Thanks in advance

There seem to be several things wrong here, but I think I see where you are trying to go with this.
The first thing I think you are missing is the temestamp column on the #tsubqry table. Since you are referencing it later, I'm assuming it should be there. So, your table definition needs to include that field:
create table #tsubqry(
machine_id varchar(12),
amount1 decimal(6,2),
amount2 decimal(6,2),
temestamp datetime
);
Now, in your query I think you were trying to use some fields from #tdata in your subquery. That works in a WHERE clause, but a derived table in the FROM clause cannot reference columns of another table in the same FROM clause.
Also, I'm thinking you will not want to duplicate all the data from #tdata for each matching #tsubqry row, so you probably want to group by. Based on these assumptions, I think your query needs to look something like this:
select m1.machine_id, m1.commit_count, m1.amount, sum(tt.amount1), sum(tt.amount2)
from #tdata m1
left join #tsubqry tt on m1.machine_id=tt.machine_id
and cast(tt.temestamp as date)<=cast(m1.temestamp as date)
group by m1.machine_id, m1.commit_count, m1.amount
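As a quick check of the pattern (date comparison inside the ON clause of a LEFT JOIN, then GROUP BY), here is a small sketch that runs it in SQLite through Python's sqlite3 module. The #tsubqry rows are hypothetical because the question supplied none, SQLite's date() stands in for cast(... as date), and the sketch groups by temestamp rather than commit_count and amount so that the sample #tdata rows, which share those values, stay distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tdata (machine_id text, temestamp text, commit_count int, amount real);
create table tsubqry (machine_id text, temestamp text, amount1 real, amount2 real);
insert into tdata values ('Machine1', '2018-01-02 13:03:18', 1, 3.95);
insert into tdata values ('Machine1', '2018-01-01 23:18:16', 1, 3.95);
-- Hypothetical rows: the question did not include data for #tsubqry.
insert into tsubqry values ('Machine1', '2018-01-01 08:00:00', 1.00, 2.00);
insert into tsubqry values ('Machine1', '2018-01-02 09:00:00', 0.50, 0.25);
""")

# The date filter lives in the ON clause, so tdata rows without a match
# would still be returned (with NULL sums) instead of being dropped.
rows = conn.execute("""
select m1.machine_id, m1.temestamp, sum(tt.amount1), sum(tt.amount2)
from tdata m1
left join tsubqry tt
  on m1.machine_id = tt.machine_id
 and date(tt.temestamp) <= date(m1.temestamp)
group by m1.machine_id, m1.temestamp
order by m1.temestamp
""").fetchall()

for row in rows:
    print(row)
```

Each #tdata row is paired only with the #tsubqry rows dated on or before it, and the amounts are summed per #tdata row.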

MS SQL Server actually has a built-in programming construct that I think would be useful here, as an alternative solution to joining on a subquery:
-- # ###
-- # Legends
-- # ###
-- #
-- # Table name and primary key changes (if machine_id is NOT the primary key in table 2,
-- # I suggest adding one and keeping machine_id as an indexed column).
-- #
-- #
-- # #tdata --> table_A
-- # #tsubqry --> table_B
-- #
-- =====
-- SOLUTION 1 :: JOIN on Subquery
SELECT
m1.machine_id,
m1.commit_count,
m1.amount,
m2.amount1,
m2.amount2
FROM table_A m1
INNER JOIN (
SELECT machine_id, amount1, amount2, time_stamp
FROM table_B
) AS m2 ON m1.machine_id = m2.machine_id
WHERE CAST(m2.time_stamp AS DATE) <= CAST(m1.time_stamp AS DATE); -- machine_id is already matched in the ON clause
-- SOLUTION 2 :: Use a CTE (a named query expression scoped to one statement, not a temporary table)
WITH table_subqry AS
(
SELECT machine_id, amount1, amount2, time_stamp
FROM table_B
)
SELECT
m1.machine_id,
m1.commit_count,
m1.amount,
m2.amount1,
m2.amount2
FROM table_A m1
LEFT JOIN table_subqry AS m2 ON m1.machine_id = m2.machine_id
-- The date filter stays in the ON clause: in a WHERE clause it would discard unmatched rows and turn the LEFT JOIN into an INNER JOIN
AND CAST(m2.time_stamp AS DATE) <= CAST(m1.time_stamp AS DATE);
Also, I created an SQLFiddle in case it's helpful. I don't know what all your data looks like, but at least this fiddle has your schema and runs the CTE query without any errors.
Let me know if you need any more help!
SQL Fiddle
Source: Compare Time SQL Server
SQL SERVER Using a CTE
Cheers.

Query Optimization with millions of row in table

I have a table with 4 columns:
PKID, OutMailID, JobMailingDate, InsertDatetime
This is how the data gets inserted into the table:
PKID is the primary key of the table.
For a single OutMailID with a JobMailingDate there are on average 3 records in the table, each with a
different InsertDatetime. The table has millions of records.
I have many other tables that hold the same kind of data, but each pertains to a different category.
Now I would like to:
1) Find all OutMailIDs whose InsertDatetime falls within the parameter date range.
2) Once I have the list of OutMailIDs, find the minimum InsertDatetime for each of them, where this minimum date falls between Param1 and Param2.
The data in the table looks like this:
Select 1 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/01' as InsertDatetime
UNION ALL
Select 2 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/02' as InsertDatetime
UNION ALL
Select 3 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/03' as InsertDatetime
UNION ALL
Select 4 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/04' as InsertDatetime
I want to perform both of the steps above in a single query, so my query is something like this:
Select
T.OutMailID,Min(T.InsertDatetime)
from
Table T
INNER JOIN
(
Select
OutMailID
from
Table
Where
InsertDatetime Between @Param1 and @Param2
) as T1 On (T1.OutMailID = T.OutMailID)
Group by
T.OutMailID
Having Min(T.InsertDatetime) Between @Param1 and @Param2
But this is not performing well. Can anyone suggest a better way of doing this?
The second problem: once I have the output of the first query, I run the same query for each other category to find the minimum InsertDatetime in that category. Once I have the minimum date for every category, I have to find the minimum insert date across all categories.
Can you please help me with this?
Thanks
Atul

Does this query give you the desired results?
Select T.OutMailID, Min(T.InsertDatetime)
from Table T
INNER JOIN Table T1 On T1.OutMailID = T.OutMailID
And T1.InsertDatetime Between @Param1 and @Param2
Group by T.OutMailID

How about using a WITH clause (a common table expression) for this? It works like an inline view that you name once and reference later in the same statement. Here is an example:
with Table1 as (
Select OutMailID from Table Where InsertDatetime Between @Param1 and @Param2
),
Table2 as (
Select 4 as PKID,1 as OutMailID,'2010/01/01' as JobMailingDate,'2010/01/04' as InsertDatetime
)
select T.OutMailID, min(T.InsertDatetime) as MinInsertDatetime
from Table as T
inner join Table1 as T1 on T1.OutMailID = T.OutMailID
group by T.OutMailID
That way you can reference Table1 multiple times without repeating its definition. Note that SQL Server does not cache a CTE, so each reference may be evaluated again.

I think a simpler way to express your requirement is that you want all OutMailId whose first InsertDateTime is in the period specified.
It turns out that the JOIN is not necessary at all for this. This is a simpler version of your query:
Select t.OutMailID, Min(InsertDatetime)
from Table T
Group by OutMailID
Having Min(InsertDatetime) Between @Param1 and @Param2;
Many databases could take advantage of an index on Table(OutMailId, InsertDateTime) for this query.
Now, this query might not be super efficient, particularly if the range is small relative to the entire data. So, sticking with the above index, the following might work better:
select t.*
from (select OutMailId, min(InsertDatetime) as min_InsertDatetime
from table t
where InsertDatetime Between @Param1 and @Param2
group by OutMailId
) t
where not exists (select 1
from table t2
where t2.OutMailId = t.OutMailId and
t2.InsertDateTime < @Param1
);
This should use the index for the first subquery, limiting the number of ids. It should use the same index for the not exists, on a reduced number of rows.
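The two formulations can be compared on a toy data set. The sketch below runs both in SQLite through Python's sqlite3 module; the table name mail, the sample rows, and the parameter values are assumptions made for illustration, and it says nothing about how SQL Server would actually plan either query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table mail (PKID integer primary key, OutMailID int, InsertDatetime text);
create index ix_mail on mail (OutMailID, InsertDatetime);
-- Hypothetical sample rows.
insert into mail values (1, 1, '2010-01-01'), (2, 1, '2010-01-03'),
                        (3, 2, '2010-01-02'), (4, 2, '2010-01-04'),
                        (5, 3, '2010-01-05');
""")
params = {"p1": "2010-01-02", "p2": "2010-01-04"}

# Version 1: aggregate every id, then filter on the minimum.
simple = conn.execute("""
select OutMailID, min(InsertDatetime)
from mail
group by OutMailID
having min(InsertDatetime) between :p1 and :p2
order by OutMailID
""", params).fetchall()

# Version 2: restrict to rows in the range first, then discard ids that
# also have an earlier row.
filtered = conn.execute("""
select t.OutMailID, t.min_InsertDatetime
from (select OutMailID, min(InsertDatetime) as min_InsertDatetime
      from mail
      where InsertDatetime between :p1 and :p2
      group by OutMailID) t
where not exists (select 1
                  from mail t2
                  where t2.OutMailID = t.OutMailID
                    and t2.InsertDatetime < :p1)
order by t.OutMailID
""", params).fetchall()

print(simple)
print(filtered)
```

Only OutMailID 2 has its first insert inside the range, and both versions return it.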

Numeric Overflow in Recursive Query : Teradata

I'm new to teradata. I want to insert numbers 1 to 1000 into the table test_seq, which is created as below.
create table test_seq(
seq_id integer
);
After searching on this site, I came up with a recursive query to insert the numbers.
insert into test_seq(seq_id)
with recursive cte(id) as (
select 1 from test_dual
union all
select id + 1 from cte
where id + 1 <= 1000
)
select id from cte;
test_dual is created as follows and it contains just a single value. (something like DUAL in Oracle)
create table test_dual(
test_dummy varchar(1)
);
insert into test_dual values ('X');
But, when I run the insert statement, I get the error, Failure 2616 Numeric overflow occurred during computation.
What did I do wrong here? Isn't the integer datatype enough to hold numeric value 1000?
Also, is there a way to write the query so that i can do away with test_dual table?

When you simply write 1 the parser assigns the best matching datatype to it, which is a BYTEINT. The valid range of values for BYTEINT is -128 to 127, so just add a typecast to INT :-)
Usually you don't need a dummy DUAL table in Teradata, "SELECT 1;" is valid, but in some cases the parser still insists on a FROM (don't ask me why). This trick should work:
SEL * FROM (SELECT 1 AS x) AS dt;
You can create a view on this:
REPLACE VIEW oDUAL AS SELECT * FROM (SELECT 'X' AS dummy) AS dt;
The Explain output for "SELECT 1 FROM oDUAL;" is a bit silly, so a real table might be better. But to get efficient access (= single AMP/single row) it must be defined as follows:
CREATE TABLE dual_tbl(
dummy VARCHAR(1) CHECK ( dummy = 'X')
) UNIQUE PRIMARY INDEX(dummy); -- i remember having fun when you inserted another row in Oracle's DUAL :_)
INSERT INTO dual_tbl VALUES ('X');
REPLACE VIEW oDUAL AS SELECT dummy FROM dual_tbl WHERE dummy = 'X';
insert into test_seq(seq_id)
with recursive cte(id) as (
select cast(1 as int) from oDUAL
union all
select id + 1 from cte
where id + 1 <= 1000
)
select id from cte;
But recursion is not an appropriate way to get a range of numbers, as it's sequential and always an "all-AMP step", even if the data resides on a single AMP as in this case.
If it's less than 73414 values (201 years) better use sys_calendar.calendar (or any other table with a known sequence of numbers) :
SELECT day_of_calendar
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 1000;
Otherwise use CROSS joins, e.g. to get numbers from 1 to 1,000,000:
WITH cte (i) AS
( SELECT day_of_calendar
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 1000
)
SELECT
(t2.i - 1) * 1000 + t1.i
FROM cte AS t1 CROSS JOIN cte AS t2;
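The arithmetic in that cross join can be checked outside the database. This short Python sketch, in which range(1, 1001) stands in for the 1000 calendar rows, confirms that (t2.i - 1) * 1000 + t1.i yields every number from 1 to 1,000,000 exactly once.

```python
from itertools import product

# Stand-in for: day_of_calendar BETWEEN 1 AND 1000
base = range(1, 1001)

# product(base, base) plays the role of the CROSS JOIN of the CTE with itself.
numbers = sorted((t2 - 1) * 1000 + t1 for t1, t2 in product(base, base))

print(numbers[0], numbers[-1], len(numbers))
```

t1.i supplies the position within each block of 1000, and (t2.i - 1) * 1000 selects the block, so no two pairs collide.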

finding a mismatch while iterating rows in sql

I have this issue where date keys were just inserted into a table through SQL Server. They are populated iteratively in the fashion shown below:
20130501
20130502
20130503
...
I am currently trying to find any row where one of the dates was skipped, i.e:
20130504
20130506
20130507
I'm still a rookie in SQL Server. I have looked at CURSOR, but I'm having some trouble understanding how to go about querying this. Any help would be appreciated. Thanks.

This uses some tricks from the Itzik Ben-Gan school of thought. The easiest way to find gaps is with a tally table. Here is a way to create a small one in a table variable, but I would recommend creating a persisted Numbers table because they're really handy for this kind of thing. You can find a bunch of examples on how to do that here.
First create a number table
DECLARE @Numbers TABLE ( [Number] INT );
INSERT INTO @Numbers
(
Number
)
SELECT TOP 1000
ROW_NUMBER() OVER (ORDER BY [s1].[object_id]) AS Number
FROM sys.objects s1
CROSS JOIN sys.objects s2
Next I needed to create a table variable to recreate your example
DECLARE @ExampleDates TABLE ( [RecordDateKey] INT );
INSERT INTO @ExampleDates
( [RecordDateKey] )
VALUES ( 20130501 ),
( 20130502 ),
( 20130503 ),
( 20130504 ),
( 20130506 ),
( 20130507 ),
( 20130508 ),
( 20130511 );
This multi-row VALUES syntax only works in SQL Server 2008 and later, but since I'm just staging data it's not really a big deal. I'm leaving this note for other people testing this example.
Finally we need to do some conversion work.
For larger sets, it might be beneficial to materialize this data, but for this small example, a CTE sufficed.
WITH date_convert
AS (
SELECT [RecordDateKey]
, CONVERT(DATETIME, CAST([RecordDateKey] AS VARCHAR(50)), 112) [RecordDate]
FROM @ExampleDates ed
) ,
date_range
AS (
SELECT DATEDIFF(DAY, MIN([RecordDate]), MAX([RecordDate])) [Range]
, MIN([RecordDate]) [StartDate]
FROM [date_convert]
) ,
all_dates
AS (
SELECT CONVERT(INT, CONVERT(VARCHAR(8), DATEADD(DAY, num.[Number], [StartDate]), 112)) AS [RecordDateKey]
, DATEADD(DAY, num.[Number], [StartDate]) [RecordDate]
FROM @Numbers num
CROSS JOIN [date_range] dr
WHERE num.[Number] <= dr.[Range]
)
SELECT [RecordDateKey]
, [RecordDate]
FROM all_dates ad
WHERE NOT EXISTS ( SELECT 1
FROM [date_convert] dc
WHERE ad.[RecordDate] = dc.RecordDate )
date_convert: changes the key you provided to a datetime for easy comparison and for dateadd.
date_range: finds the range of dates, and where the range starts.
all_dates: finds all of the dates that should have existed in your range.
The final select finds the dates in the generated set that are missing from your data.
Using this code, this was my output. This should find gaps regardless of gap size. Which appeared to be the issue with the current accepted answer.
RecordDateKey RecordDate
------------- ----------
20130505 2013-05-05 00:00:00.000
20130509 2013-05-09 00:00:00.000
20130510 2013-05-10 00:00:00.000
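The same three steps (convert the keys to dates, generate every date in the range, keep the ones that are absent) can be sketched in a few lines of Python for comparison. The function name missing_date_keys is hypothetical, and this is an illustration of the approach rather than a replacement for the T-SQL.

```python
from datetime import datetime, timedelta


def missing_date_keys(keys):
    """Return the yyyymmdd integer keys absent between min(keys) and max(keys)."""
    # date_convert: turn each integer key into a real date.
    dates = {datetime.strptime(str(k), "%Y%m%d").date() for k in keys}
    # date_range: the first date and the number of days covered.
    start, end = min(dates), max(dates)
    # all_dates: every calendar day that should exist in the range.
    all_days = (start + timedelta(days=n) for n in range((end - start).days + 1))
    # Final select: generated days that are not present in the data.
    return [int(d.strftime("%Y%m%d")) for d in all_days if d not in dates]


example = [20130501, 20130502, 20130503, 20130504,
           20130506, 20130507, 20130508, 20130511]
print(missing_date_keys(example))
```

Because the comparison is done on real dates, gaps of any length and gaps that span a month boundary are handled the same way.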

SELECT *
FROM table
WHERE date - 1 NOT IN (SELECT date FROM table)
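It's probably not super efficient, but it should work when date is a real date column: it returns each row whose previous day is absent, that is, the first row after every gap (plus the earliest row in the table). With integer keys such as 20130501, date - 1 breaks at month boundaries.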

SQL Server: row present in one query, missing in another

Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).
I am writing an accounting system that works on the double-entry principle -- money always moves between accounts, a transaction is 2 or more TransactionParts rows decrementing one account and incrementing another.
Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:
create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
from Accounting.TransactionParts p
group by p.[Transaction], p.Account
I then have a view to calculate the running balance column, as follows:
create view Accounting.TransactionBalanceView as
with cte as
(
select ROW_NUMBER() over (order by t.[Date]) AS RowNumber,
t.ID as [Transaction], p.Amount, p.Account
from Accounting.Transactions t
inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
coalesce(sum(a.Amount), 0) as Balance
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running
select * from Accounting.TransactionBalanceView where [Transaction]=30
This gave me the following result:
RowNumber Transaction Account Balance
-------------------- ----------- ------- ---------------------
72 30 23 143.80
As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and run a query to see if there's anything else missing:
select [Transaction], count(*)
from Accounting.TransactionBalanceView
group by [Transaction]
having count(*) < 2
This query returns no results -- not even for Transaction 30! Thinking I must be an idiot I run the following query:
select [Transaction]
from Accounting.TransactionBalanceView
where [Transaction]=30
It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?
Thanks a lot if you've stuck with me this far!
Edit:
Here are the execution plans:
select *
select [Transaction]
1000 lines each, hence finding somewhere else to host.
Edit 2:
For completeness, here are the tables I used:
create table Accounting.Accounts
(
ID smallint identity primary key,
[Name] varchar(50) not null
constraint UQ_AccountName unique,
[Type] tinyint not null
constraint FK_AccountType foreign key references Accounting.AccountTypes
);
create table Accounting.Transactions
(
ID int identity primary key,
[Date] date not null default getdate(),
[Description] varchar(50) not null,
Reference varchar(20) not null default '',
Memo varchar(1000) not null
);
create table Accounting.TransactionParts
(
ID int identity primary key,
[Transaction] int not null
constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account smallint not null
constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount money not null,
VatRelated bit not null default 0
);

Demonstration of possible explanation.
Create table Script
SELECT *
INTO #T
FROM master.dbo.spt_values
CREATE NONCLUSTERED INDEX [IX_T] ON #T ([name] DESC,[number] DESC);
Query one (Returns 35 results)
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results)
;
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type] ,c2.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Why?
The order in which ROW_NUMBER() numbers rows with duplicate NAME values isn't specified, so SQL Server uses whatever order falls out of the best execution plan for the required output columns. In the second query the plan is the same for both CTE references; in the first it chooses a different access path for each, so the row numbering differs between them.
Suggested Solution
You are self joining the CTE on ROW_NUMBER() over (order by t.[Date]).
Contrary to what you may have expected, the CTE will likely not be materialised, which would have ensured consistency for the self join. You are therefore assuming a correlation between ROW_NUMBER() on both sides that may well not exist for rows that share a [Date].
What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row numbering is in a guaranteed consistent order? (Or some other column or combination of columns that can differentiate records if id won't do it.)
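The effect of the tie-breaker can be modelled in a few lines of Python. This is only an analogy, assuming that two references to the CTE may read the rows in different physical orders: a stable sort on the date alone keeps whatever order the ties arrived in, while sorting on (date, id) gives the same numbering regardless. All names and rows here are hypothetical.

```python
from datetime import date

rows = [
    {"id": 1, "date": date(2013, 5, 1)},
    {"id": 2, "date": date(2013, 5, 2)},
    {"id": 3, "date": date(2013, 5, 2)},  # same date as id 2
]


def row_numbers(scan_order, key):
    """Return {id: row_number} after ordering the scanned rows by key."""
    ordered = sorted(scan_order, key=key)  # stable: ties keep scan order
    return {r["id"]: n for n, r in enumerate(ordered, start=1)}


forward = rows
backward = list(reversed(rows))  # models a different access path


def by_date(r):
    return r["date"]


def by_date_id(r):
    return (r["date"], r["id"])


print(row_numbers(forward, by_date), row_numbers(backward, by_date))
print(row_numbers(forward, by_date_id), row_numbers(backward, by_date_id))
```

With the date-only key the two scans disagree about which of the tied rows is number 2, which is the kind of mismatch that makes a self join on the row number unreliable; with the id added, both scans agree.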

If the purpose of this part of the view is just to make sure that the same row isn't joined to itself
where a.RowNumber <= b.RowNumber
then how does changing this part to
where a.RowNumber <> b.RowNumber
affect the results?

It seems you are reading dirty data (someone else is deleting or inserting rows at the same time).
Try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
I tried this code (which seems equivalent to yours):
IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)
INSERT #t
SELECT 1, 2, 70
UNION ALL SELECT 2, 3, 70
;with cte as
(
select ROW_NUMBER() over (order by t.i) AS RowNumber,
t.val as [Transaction], t.acc Account
from #t t
)
select b.RowNumber, b.[Transaction], a.Account
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
and got two rows
RowNumber Transaction Account
1 2 70
2 3 70
