How to get COUNT(*) from one partition of a table in SQL Server 2012? - sql

My table has 7 million records, and I split it into 14 parts by ID; each partition includes 5 million records and is 40 GB in size. I want to run a query to get the count for one partition, but it scans all partitions and the query takes a very long time.
SELECT COUNT(*)
FROM Item
WHERE IsComplated = 0
AND ID Between 1 AND 5000000
How can I run my query on one partition only, without scanning the other partitions?

Refer to the $PARTITION documentation at http://msdn.microsoft.com/en-us/library/ms188071.aspx. Two of its examples:
B. Getting the number of rows in each nonempty partition of a partitioned table or index
The following example returns the number of rows in each partition of table TransactionHistory that contains data. The TransactionHistory table uses partition function TransactionRangePF1 and is partitioned on the TransactionDate column.
To execute this example, you must first run the PartitionAW.sql script against the AdventureWorks2012 sample database. For more information, see PartitioningScript.
USE AdventureWorks2012;
GO
SELECT $PARTITION.TransactionRangePF1(TransactionDate) AS Partition,
COUNT(*) AS [COUNT] FROM Production.TransactionHistory
GROUP BY $PARTITION.TransactionRangePF1(TransactionDate)
ORDER BY Partition ;
GO
C. Returning all rows from one partition of a partitioned table or index
The following example returns all rows that are in partition 5 of the table TransactionHistory.
SELECT * FROM Production.TransactionHistory
WHERE $PARTITION.TransactionRangePF1(TransactionDate) = 5 ;
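Applied to the Item table from the question, the same technique could look like the sketch below. The partition function name PF_ItemID and the dbo schema are assumptions, since the question does not name them; the first query looks up the real function name. The last query reads row counts from metadata, which avoids scanning data but cannot apply the IsComplated filter and may be slightly out of date.

```sql
-- Find the partition function behind the table (dbo.Item is an assumed name).
SELECT pf.name AS partition_function
FROM sys.indexes AS i
JOIN sys.partition_schemes AS ps ON ps.data_space_id = i.data_space_id
JOIN sys.partition_functions AS pf ON pf.function_id = ps.function_id
WHERE i.object_id = OBJECT_ID(N'dbo.Item')
  AND i.index_id IN (0, 1);          -- heap or clustered index

-- Count the matching rows in partition 1 only.
-- PF_ItemID is a hypothetical name; substitute the one returned above.
SELECT COUNT(*)
FROM dbo.Item
WHERE $PARTITION.PF_ItemID(ID) = 1
  AND IsComplated = 0;

-- Approximate total rows per partition from metadata (no filter possible).
SELECT p.partition_number, SUM(p.rows) AS row_count
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.Item')
  AND p.index_id IN (0, 1)
GROUP BY p.partition_number
ORDER BY p.partition_number;
```

Check the actual execution plan afterwards to confirm that only one partition is accessed.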

Related

How can I see the number of partitions in an impala table

Is it possible to see the total number of partitions of a table in impala?
For example db.table has 40.500 partitions
Use the SHOW PARTITIONS statement:
SHOW PARTITIONS [database_name.]table_name
It prints the partition list, and you can count the rows in the output minus the header (3 rows) and the footer (1 row). Unfortunately, there is no command that returns a precomputed partition count, except for Kudu tables: SHOW TABLE STATS prints the number of partitions in a Kudu table.
Of course, you can execute select count(distinct part_col1, part_col2...) from table, but it is not as efficient as SHOW PARTITIONS.
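As a rough sketch of the two options above, with my_db.my_table, part_col1 and part_col2 as placeholder names that are not from the question:

```sql
-- Option 1: list the partitions and count the rows of the output on the client side.
SHOW PARTITIONS my_db.my_table;

-- Option 2: count distinct partition key combinations by scanning the data.
-- This only sees partitions that actually contain rows.
SELECT COUNT(*) AS partition_count
FROM (
  SELECT DISTINCT part_col1, part_col2
  FROM my_db.my_table
) AS parts;
```

The second form is an equivalent way to write the count(distinct ...) query and shares its cost: it has to read the table rather than just the metadata.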

select millions records from table sql server

I have more than 7,000,000 records in my temp table
and I want to select all of them in less than 3 minutes.
My query is
SELECT referrals.*,
ROW_NUMBER() OVER ( PARTITION BY Donorid ORDER BY startdate asc ) AS 'RowNumber'
FROM #tempReferrals as referrals
WHERE referrals.startdate IS NOT NULL
Alternatively, I only want to run:
SELECT id
FROM #tempReferrals WITH (NOLOCK)
Even so, it takes more than 5 minutes. Please suggest a solution.
I also have an index on id in my table.
Make sure you have enough memory on your server to keep your temp table and the result table in main memory at the same time. As soon as the instance needs to start moving data to hard disk, you will have trouble meeting your time constraints.

Oracle PL/SQL: SELECT DISTINCT FIELD1, FIELD2 FROM TWO ADJACENT PARTITIONS OF A PARTITIONED TABLE

I have a partitioned table with a field MY_DATE which is always and only the first day of EVERY month from year 1999 to year 2017.
For example, it contains records dated 01/01/2015, 01/02/2015, ..., 01/12/2015, as well as 01/01/1999, 01/02/1999, and so on.
The field MY_DATE is the partitioning field.
I would like to copy, IN THE MOST EFFICIENT WAY, the distinct values of field2 and field3 from two adjacent partitions (month M and month M-1) to another table, in order to find the distinct pairs of (field2, field3) across both dates.
Exchange Partition works only if destination table is not partitioned, but when copying the data of the second, adjacent partition, I receive the error,
"ORA-14099: all rows in table do not qualify for specified partition".
I am using the statement:
ALTER TABLE MY_USER.MY_PARTITIONED_TABLE EXCHANGE PARTITION PART_P201502 WITH TABLE MY_USER.MY_TABLE
Of course MY_PARTITIONED_TABLE and MY_TABLE have the same fields, but the first is partitioned as described above.
Please suppose that MY_PARTITIONED_TABLE is a huge table with about 500 million records.
The goal is to find the distinct pairs of (field2, field3) values in the two adjacent partitions.
My approach was: copy the data of partition M, copy the data of partition M-1, and then SELECT DISTINCT FIELD2, FIELD3 from DESTINATION_TABLE.
Thank you very much for considering my request.
I would like to copy, ...
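Please note that EXCHANGE PARTITION does not copy anything; it exchanges. That is, the contents of the partition of the big table and of the temporary table are swapped.
If you perform this twice for two different partitions with the same temp table, you get exactly the error you received: after the first exchange the temp table holds the rows of the first partition, and the second exchange tries to move those rows into the second partition, where they do not qualify.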
To copy (extract the data without changing the big table) you may use
create table tab1 as
select * from bigtable partition (partition_name1);
create table tab2 as
select * from bigtable partition (partition_name2);
Your source table is unchanged; when you are done, simply drop the two temp tables. You only need additional space for the two partitions.
Maybe you can even run your query without copying the data:
with tmp as (
select * from bigtable partition (partition_name1)
union all
select * from bigtable partition (partition_name2)
)
select ....
from tmp;
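For the stated goal (distinct pairs of field2 and field3 across two adjacent partitions), a minimal sketch could look like this. The partition name PART_P201501 is an assumption modeled on PART_P201502 from the question, and MY_USER.DESTINATION_TABLE is assumed to have exactly the two columns field2 and field3.

```sql
-- UNION (without ALL) removes duplicates within and across both partitions,
-- so no separate SELECT DISTINCT step is needed.
SELECT field2, field3
FROM my_user.my_partitioned_table PARTITION (part_p201501)  -- assumed name for month M-1
UNION
SELECT field2, field3
FROM my_user.my_partitioned_table PARTITION (part_p201502); -- month M

-- To keep the result, wrap the same query in an INSERT.
INSERT INTO my_user.destination_table (field2, field3)
SELECT field2, field3
FROM my_user.my_partitioned_table PARTITION (part_p201501)
UNION
SELECT field2, field3
FROM my_user.my_partitioned_table PARTITION (part_p201502);
```

Only the two named partitions are read, and the big table is left untouched.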
Good luck!

Create a unique index on a non-unique column

Not sure if this is possible in PostgreSQL 9.3+, but I'd like to create a unique index on a non-unique column. For a table like:
CREATE TABLE data (
id SERIAL
, day DATE
, val NUMERIC
);
CREATE INDEX data_day_val_idx ON data (day, val);
I'd like to be able to [quickly] query only the distinct days. I know I can use data_day_val_idx to help perform the distinct search, but it seems this adds extra overhead if the number of distinct values is substantially smaller than the number of rows the index covers. In my case, about 1 in 30 days is distinct.
Is my only option to create a relational table to only track the unique entries? Thinking:
CREATE TABLE days (
day DATE PRIMARY KEY
);
And update this with a trigger every time we insert into data.
An index can only index actual rows, not aggregated rows. So, yes, as far as the desired index goes, creating a table with unique values like you mentioned is your only option. Enforce referential integrity with a foreign key constraint from data.day to days.day. This might also be best for performance, depending on the complete situation.
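A minimal sketch of the lookup table approach, assuming PostgreSQL 9.3 (so no INSERT ... ON CONFLICT). The function, trigger and constraint names are made up for illustration. It only covers INSERT on data; updates and deletes would need their own handling, and two sessions inserting the same new day at the same moment can still hit a unique violation.

```sql
CREATE TABLE days (
  day date PRIMARY KEY
);

-- Hypothetical trigger function: register a day the first time it appears.
CREATE OR REPLACE FUNCTION data_track_day()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO days (day)
   SELECT NEW.day
   WHERE  NEW.day IS NOT NULL
   AND    NOT EXISTS (SELECT 1 FROM days d WHERE d.day = NEW.day);
   RETURN NEW;
END
$func$;

CREATE TRIGGER data_track_day_trg
BEFORE INSERT ON data
FOR EACH ROW EXECUTE PROCEDURE data_track_day();

-- Referential integrity from data.day to days.day, as suggested above.
-- On a table that already holds rows, fill days first:
--   INSERT INTO days SELECT DISTINCT day FROM data WHERE day IS NOT NULL;
ALTER TABLE data
  ADD CONSTRAINT data_day_fkey FOREIGN KEY (day) REFERENCES days (day);
```

With this in place, SELECT day FROM days returns the distinct days without touching data.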
However, since this is about performance, there is an alternative solution: you can use a recursive CTE to emulate a loose index scan:
WITH RECURSIVE cte AS (
( -- parentheses required
SELECT day FROM data ORDER BY 1 LIMIT 1
)
UNION ALL
SELECT (SELECT day FROM data WHERE day > c.day ORDER BY 1 LIMIT 1)
FROM cte c
WHERE c.day IS NOT NULL -- exit condition
)
SELECT day FROM cte
WHERE day IS NOT NULL; -- the last iteration adds one NULL row; drop it
Parentheses around the first SELECT are required because of the attached ORDER BY and LIMIT clauses. See:
Combining 3 SELECT statements to output 1 table
This only needs a plain index on day.
There are various variants, depending on your actual queries:
Optimize GROUP BY query to retrieve latest row per user
Unused index in range of dates query
Select first row in each GROUP BY group?
More in my answer to your follow-up question:
Counting distinct rows using recursive cte over non-distinct index

Range partition skip check

We have a large amount of data partitioned on a year value using range partitioning in Oracle. Each partition contains data for only one year. When we write a query targeting a specific year, Oracle fetches the information from that partition but still checks whether the year is the one we specified. Since the year column is not part of the index, it fetches the year from the table and compares it. We have seen that whenever the query has to fetch table data, it becomes too slow.
Can we somehow avoid oracle comparing the year values since we for sure know that the partition contains information for only one year.
Update:
The year data type on which partition is performed is of type number.
We are not selecting any additional columns. I am just performing a count(*) and no columns are being selected.
If we remove the condition and target the query at a specific partition, as in
select count(*) from table_name partition(part_2004)
it is faster, while
select count(*) from table
where year = 2004
is way slower.
The partition is on year column which is a number and is done something like below
year less than 2005 part_2004
year less than 2006 part_2005
year less than 2007 part_2006
...so on
Without the explain plan or the table definition it's really hard to tell what is going on. My first guess is that you have LOCAL partitioned indexes without the year column. They help with the COUNT(*) on a partition, but they don't seem to be used when you query a single year (at least on 10.2.0.3).
Here is a small example that reproduces your finding (and a workaround):
SQL> CREATE TABLE DATA (
2 YEAR NUMBER NOT NULL,
3 ID NUMBER NOT NULL,
4 extra CHAR(1000)
5 ) PARTITION BY RANGE (YEAR) (
6 PARTITION part1 VALUES LESS THAN (2010),
7 PARTITION part2 VALUES LESS THAN (2011)
8 );
Table created
SQL> CREATE INDEX ix_id ON DATA (ID) LOCAL;
Index created
SQL> INSERT INTO DATA
2 (SELECT 2009+MOD(ROWNUM, 2), ROWNUM, 'A' FROM DUAL CONNECT BY LEVEL <=1e4);
10000 rows inserted
SQL> EXEC dbms_stats.gather_table_stats(USER, 'DATA', CASCADE=>TRUE);
PL/SQL procedure successfully completed
Now compare the two explain plans:
SQL> SELECT COUNT(*) FROM DATA WHERE YEAR=2010;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=197 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 PARTITION RANGE (SINGLE) (Cost=197 Card=5000 Bytes=20000)
3 2 TABLE ACCESS (FULL) OF 'DATA' (TABLE) (Cost=197 Card=5000...)
SQL> SELECT COUNT(*) FROM DATA PARTITION (part1);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=11 Card=1)
1 0 SORT (AGGREGATE)
2 1 PARTITION RANGE (SINGLE) (Cost=11 Card=5000)
3 2 INDEX (FULL SCAN) OF 'IX_ID' (INDEX) (Cost=11 Card=5000)
As you can see the index is not used when you query the year directly. When you add the year to the LOCAL index it will be used. I used the COMPRESS 1 instruction to tell Oracle to compress the first column. The resulting index is nearly the same size as the original index (thanks to compression) so performance shouldn't be impacted.
SQL> DROP INDEX ix_id;
Index dropped
SQL> CREATE INDEX ix_id ON DATA (year, ID) LOCAL COMPRESS 1;
Index created
SQL> SELECT COUNT(*) FROM DATA WHERE YEAR=2010;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=12 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 PARTITION RANGE (SINGLE) (Cost=12 Card=5000 Bytes=20000)
3 2 INDEX (RANGE SCAN) OF 'IX_ID' (INDEX) (Cost=12 Card=5000...)
Are you sure that it goes to the table just for checking the year? Maybe there are other columns involved?
Was the query supposed to work only on (partitioned) indexes?
If it needs to go to the table anyway, that extra check is not costing much (if the partition is right).
Can you post the query and execution plan?
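One common way to produce the plan for posting is EXPLAIN PLAN together with DBMS_XPLAN. The sketch below reuses the placeholder table_name and the year = 2004 predicate from the question.

```sql
-- Store the optimizer's plan for the slow query in the plan table.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM table_name
WHERE year = 2004;

-- Print the stored plan in readable form.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

For a partitioned table, the Pstart and Pstop columns of the output show which partitions are accessed, and the operation lines show whether the table or an index is read.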