Tracking row numbers in Oracle - sql

Is there a way I could keep track of modified tables in Oracle?
Is there a master table that keeps track of every other table's rows? For example, if I add a row to table1, it would update the row count, stating that table1 now has 5 rows.
I was thinking of tracking either dba_tables or all_tables or user_tables but I'm not sure which one actually counts the number of rows each table has.

You can improve on just querying the user/all/dba statistics views by combining them with information gathered by table monitoring.
The views DBA/ALL/USER_TAB_MODIFICATIONS are populated with the number of inserts, updates, deletes and truncates on each table since statistics were last gathered. The views are populated asynchronously, so call DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO to flush the latest in-memory data to them.
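For example, you can combine the statistics snapshot with the monitoring deltas to get an approximate current count. This is a minimal sketch; the join condition assumes non-partitioned tables (USER_TAB_MODIFICATIONS carries extra per-partition rows for partitioned ones):
-- Flush the in-memory monitoring data so USER_TAB_MODIFICATIONS is current.
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
-- Approximate current count = stats-time count + inserts - deletes since then.
SELECT t.table_name,
       t.num_rows + NVL(m.inserts, 0) - NVL(m.deletes, 0) AS approx_num_rows
FROM   user_tables t
LEFT   JOIN user_tab_modifications m
       ON  m.table_name = t.table_name
       AND m.partition_name IS NULL;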
Bear in mind that the statistics themselves may be estimated. Although the accuracy is pretty good on most tables, even at surprisingly low estimation percentages (down to 5% or below), if you need accurate numbers you'll have to query the tables themselves with count(*). You can put together a pipelined function to do this for multiple tables with a single query.
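For instance, here is a rough sketch of such a pipelined function (the names t_row_count, t_row_count_tab and exact_row_counts are made up for illustration):
-- SQL object types so the function's output can be queried like a table.
CREATE TYPE t_row_count AS OBJECT (table_name VARCHAR2(128), row_count NUMBER);
/
CREATE TYPE t_row_count_tab AS TABLE OF t_row_count;
/
-- Loop over your tables and count each one with dynamic SQL.
CREATE OR REPLACE FUNCTION exact_row_counts RETURN t_row_count_tab PIPELINED IS
  l_count NUMBER;
BEGIN
  FOR r IN (SELECT table_name FROM user_tables) LOOP
    EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM "' || r.table_name || '"' INTO l_count;
    PIPE ROW (t_row_count(r.table_name, l_count));
  END LOOP;
  RETURN;
END;
/
-- Exact counts for every table you own, in a single query.
SELECT * FROM TABLE(exact_row_counts);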

SELECT TABLE_NAME, NUM_ROWS
FROM USER_TABLES
I highly doubt you're actually using Oracle 3.1. This query works at least in 11g (I don't have other instances to test at the moment).
Keep in mind that this is a data dictionary view, and it won't update automatically after you insert a row into a schema table. The gather-statistics procedure must be run to update these records.
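For example (a minimal illustration; TABLE1 stands in for your table name):
-- Refresh the statistics for one table, then re-check NUM_ROWS.
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TABLE1');
SELECT table_name, num_rows FROM user_tables WHERE table_name = 'TABLE1';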

The only difference among dba|user|all_tables is scope. user_tables limits the output to the tables you own, all_tables is basically user_tables plus tables in other schemas you've been granted access to, and dba_tables is everything that exists in the database.
num_rows is a valid option for tracking the number of rows in a table. Unfortunately, it is not calculated in real time but as part of the statistics-gathering operation. There is no out-of-the-box option for tracking row counts in real time that I am aware of.

Related

Oracle efficient way of updating non-indexed and non-partitioned table?

Is there an efficient way to update rows of a table that has no indexes and no partitions (and ~50 million rows)?
I have a date field LOAD_DTTM and values of this field for the rows that require the update (around 2000 distinct dates).
Will the update be faster if I specify a date in a WHERE clause along with the UNIQUE_ID of a row?
If you want to update all, or a large number, of the rows then the quickest way is:
create table my_table_copy as
select ... -- all the columns, updating values as required
from my_table;
drop table my_table;
rename my_table_copy to my_table;
If your table had any indexes, constraints or triggers you would now need to re-add them - but it seems you don't have that issue!
You could create a PL/SQL procedure that loops, updating and committing the table every n rows -- say, every 20,000 rows. I do not advise updating the full table in a single statement, as it will hold a lock for a very long time and expose you to losing all the work if external factors intervene.
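A minimal sketch of that idea, using the LOAD_DTTM column from the question (the table name, the target value and the batch size are assumptions):
-- Update in batches of 20,000 rows, committing after each batch so locks
-- stay short-lived. The STATUS <> 1 predicate makes each pass skip rows
-- already done, so the loop terminates.
DECLARE
  l_rows INTEGER;
BEGIN
  LOOP
    UPDATE my_table
    SET    status = 1
    WHERE  load_dttm = DATE '2019-12-09'
    AND    status <> 1
    AND    ROWNUM <= 20000;
    l_rows := SQL%ROWCOUNT;
    COMMIT;
    EXIT WHEN l_rows = 0;
  END LOOP;
END;
/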
The answer is NO.
Even if you specify both conditions in your WHERE clause as you stated, it won't help you avoid a full scan of your table.
Even if one of your criteria uniquely identifies the row, it still won't help.
Here is a real example, tested on Oracle 12c Release 2, similar to your case: no indexes, no partitions, nothing -- just a plain table with 4 columns.
I have a table with 18 million records.
I also have CUSTOMER_ID which is a UNIQUE identifier for a row.
I also have an ORDER_DATE column there.
Even if I run the query that you mentioned,
update hit set status = 1 where customer_id = 408518625844 and order_date = '09-DEC-19';
it won't help me avoid a full table scan, as the execution plan shows. Therefore, under the conditions you've specified, you will always get the slowest execution time possible. A full table scan on 50 million rows is essentially the worst-case scenario.
And pay attention to the cost: it is 26539 on 18 million rows.
So with 50 million rows you can expect a much higher cost for your query.

Index to get row count of read-only (immutable) PostgreSQL table?

I have a script that runs several times a day, which records the row counts of several PostgreSQL tables.
Some of the tables though are read-only and never change. (No rows are added or removed, nor are any values changed.)
Is there a way I could quickly get the row count from PostgreSQL? E.g., could I create an index on select count(*) from some_table;?
I'd prefer not to cache this in the script. If I were to cache it in the script, I haven't found a reliable way to determine whether a table has changed since the last time the script ran.
Unfortunately, in PostgreSQL, SELECT COUNT(*) is often slower than in MySQL, to which it often gets compared.
You can use the following query as an alternative to SELECT COUNT(*).
SELECT reltuples FROM pg_class WHERE relname = 'mytable';
This is not always 100% up to date, but for immutable tables it will be accurate every time once the table has been analyzed. And it is instant. For very large tables the percentage error will be very small, and thus well worth the massive saving in time.
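A slightly more robust variant casts the estimate to an integer and schema-qualifies the table via regclass, so you don't pick up a same-named table from another schema ('public.mytable' is an assumed name):
-- Estimated row count, resolved against a specific schema.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'public.mytable'::regclass;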
If it does matter and the table does not contain nulls, you can use
SELECT COUNT(primary_key_column) FROM table
and this will be significantly faster than SELECT COUNT(*)

Trouble in displaying number of rows in table, Oracle express 11g DB

I know two ways to display the number of rows: one using count(*) - slower - and the other using user_tables - quicker.
select table_name, num_rows from user_tables;
displays NULL for 4 tables:
TABLE_NAME   NUM_ROWS
TABLEP
TABLEU
TABLEN
TABLE1
TRANSLATE    26
but
select count(*) from tableu
gives,
COUNT(*)
6
What is the problem here? What should I do so that user_tables is updated to show the exact number of rows? I have already tried issuing a COMMIT statement.
num_rows is not accurate, since it depends on when the DBMS_STATS package was last run:
exec dbms_stats.gather_schema_stats('OWNER_NAME');
Run stats like above and then re-run your query.
You should not assume or expect that num_rows in user_tables is an accurate row count. The only way to get an accurate row count would be to do a count(*) against the table.
num_rows is used by the cost-based optimizer (CBO) to provide estimates that drive query plans. The actual value does not need to be particularly accurate for those estimates to generate reasonable plans: if the optimizer's guess at the number of rows an operation will produce is off by a factor of 3 or 4, that is usually still accurate enough. The estimate is generated when statistics are gathered on the table. Generally, that happens late at night, and only for tables whose statistics are either missing (num_rows is NULL) or stale (generally meaning that roughly 20% of the rows have been inserted or updated since statistics were last gathered). Even then, the values generated are normally only estimates; they are not intended to be 100% accurate.
It is possible to call dbms_stats.gather_table_stats to force num_rows to be populated immediately before querying it, and to pass parameters that generate a completely accurate value. Of course, that means gather_table_stats is doing a count(*) under the covers (plus additional work to gather other statistics), so it would be easier and more efficient to do the count(*) directly in the first place.
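For completeness, a hedged example of forcing an exact, immediate refresh (TABLEU is taken from the question):
-- A 100% sample makes NUM_ROWS exact as of this moment -- at the price of
-- a full scan, i.e. the same work as a direct COUNT(*).
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLEU', estimate_percent => 100);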

Alternatives to UPDATE statement Oracle 11g

I'm currently using Oracle 11g and let's say I have a table with the following columns (more or less)
Table1
ID varchar(64)
Status int(1)
Transaction_date date
tons of other columns
And this table has about 1 billion rows. I want to update the status column with a specific WHERE clause, let's say
where transaction_date = somedatehere
What other alternatives can I use rather than just the normal UPDATE statement?
Currently what I'm trying to do is use CTAS or INSERT INTO ... SELECT to get the rows that I want to update and put them into another table, using AS COLUMN_NAME so the values are already updated in the new/temporary table. It looks something like this:
INSERT INTO TABLE1_TEMPORARY (
ID,
STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS)
SELECT
ID,
3 AS STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS
FROM TABLE1
WHERE
TRANSACTION_DATE = SOMEDATE
So far everything seems to work faster than the normal UPDATE statement. The problem now is that I also need the remaining data from the original table: the rows I do not need to update but that must still be included in my updated table/list.
What I tried at first was to DELETE from the original table using the same WHERE clause, so that in theory everything left in that table would be the data I do not need to update, leaving me with two tables:
TABLE1 --which now contains the rows that i did not need to update
TABLE1_TEMPORARY --which contains the data I updated
But the DELETE statement in itself is also too slow, or as slow as the original UPDATE statement, which without the DELETE leaves me at this point:
TABLE1 --which contains BOTH the data that I want to update and do not want to update
TABLE1_TEMPORARY --which contains the data I updated
What other alternatives can I use to get the data that is the opposite of my WHERE clause? (Note that the WHERE clause in this example has been simplified, so I'm not looking for an answer involving NOT EXISTS/NOT IN/NOT EQUALS; those clauses are slower than positive clauses anyway.)
I have ruled out deletion by partition since the data I need to update and not update can exist in different partitions, as well as TRUNCATE since I'm not updating all of the data, just part of it.
Is there some kind of JOIN I can use between TABLE1 and TABLE1_TEMPORARY to filter out the data that does not need to be updated?
I would also like to achieve this with as little REDO/UNDO/logging as possible.
Thanks in advance.
I'm assuming this is not a one-time operation, but you are trying to design for a repeatable procedure.
Partition/subpartition the table so that the rows touched are not spread across all partitions but confined to a few of them.
Ensure your transactions wouldn't use these partitions for now.
For each partition/subpartition you would normally UPDATE, perform a CTAS of all the rows (meaning even the rows which stay the same go into TABLE1_TEMPORARY). Then EXCHANGE PARTITION and rebuild the index partitions.
At the end, rebuild the global indexes (a sketch follows below).
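A rough sketch of the per-partition step, assuming the table has already been partitioned; the partition name P_2019_12, the date and only a few of the columns are made up for illustration:
-- Build the replacement data for one partition, with the update applied.
CREATE TABLE table1_temporary NOLOGGING AS
SELECT id,
       CASE WHEN transaction_date = DATE '2019-12-09' THEN 3 ELSE status END AS status,
       transaction_date
       -- plus all the other columns; the structure must match exactly
FROM   table1 PARTITION (p_2019_12);
-- Swap the new segment in: a data dictionary operation, not a row-by-row copy.
ALTER TABLE table1
  EXCHANGE PARTITION p_2019_12 WITH TABLE table1_temporary
  INCLUDING INDEXES WITHOUT VALIDATION;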
If you don't have Oracle Enterprise Edition, you will need either to CTAS the entire billion rows (followed by ALTER TABLE RENAME instead of ALTER TABLE EXCHANGE PARTITION) or to prepare some kind of "poor man's partitioning" using a view (SELECT UNION ALL SELECT UNION ALL SELECT, etc.) over a bunch of tables.
There is some chance that this mess would actually be faster than UPDATE.
I'm not saying that this is elegant or optimal, I'm saying that this is the canonical way of speeding up large UPDATE operations in Oracle.
How about keeping the UPDATE on the same table, but breaking it into multiple small chunks?
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 0000000 and 0999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 1000000 and 1999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 2000000 and 2999999
COMMIT
This could help if the total workload is manageable and doing it all in one chunk is the problem; this approach breaks it into modest-sized pieces.
Doing it this way could, for example, enable other applications to keep running and give other workloads a look-in, and it would avoid needing a single humongous transaction in the logfile.

What's the best way to select fields from multiple tables with a common prefix?

I have sensor data from a client that is still being acquired. Every week we get a table of new data (about one million rows each), and each table has the same prefix. I'd like to run a query and select some columns across all of these tables.
What would be the best way to go about this?
I have seen some solutions that use dynamic SQL, and I was considering writing a stored procedure that would form a dynamic SQL statement and execute it for me. But I'm not sure this is the best way.
I see you are using PostgreSQL. This is an ideal case for partitioning with constraint exclusion based on dates. You create one master table without data, and the other tables added each week inherit from it. In your case, you don't even have to worry about the nuisance of triggers on INSERT; it sounds like there is never any insertion other than the weekly bulk creation of a new table. See the PostgreSQL partitioning documentation for the full details.
Queries can be run against the parent table, and Postgres takes care of looking in all the child tables, plus it is smart enough to skip child tables ruled out by WHERE criteria.
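A minimal sketch of that layout (the table and column names are invented for illustration):
-- Master table: holds no data itself.
CREATE TABLE sensor_data (
    reading_ts timestamptz NOT NULL,
    sensor_id  integer     NOT NULL,
    value      double precision
);
-- One child per weekly delivery; the CHECK constraint lets constraint
-- exclusion skip this child when the WHERE clause rules it out.
CREATE TABLE sensor_data_2019w49 (
    CHECK (reading_ts >= '2019-12-02' AND reading_ts < '2019-12-09')
) INHERITS (sensor_data);
-- Query the parent; PostgreSQL scans only the matching children.
SELECT sensor_id, avg(value)
FROM sensor_data
WHERE reading_ts >= '2019-12-02' AND reading_ts < '2019-12-04'
GROUP BY sensor_id;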
You could query the metadata for tables with the same prefix:
select table_name from information_schema.tables where table_name like 'week%'
Then you could use union all to combine queries like
select * from week001
union all
select * from week002
[...]
However, I suggest appending new records to one single table and using an index on the timestamp column. This would especially speed up queries that span multiple weeks, and it will simplify your queries a lot if you only have to deal with one table. If the table gets too large, you could partition it by date, so there should be no need to partition manually by keeping multiple tables.
You are correct, sometimes you have to write dynamic SQL to handle cases such as this.
If all of your tables are loaded you can query for table names within your stored procedure. Something like this:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
Play with that to get the specific table names you need.
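As a hedged example, you can even have the database generate the UNION ALL text for you (the 'week' prefix and the public schema are assumptions):
-- Produces the text of the combined query; execute the result separately.
SELECT string_agg(format('SELECT * FROM %I', table_name), E'\nUNION ALL\n')
FROM information_schema.tables
WHERE table_schema = 'public'
  AND table_name LIKE 'week%';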
How are the table names differentiated? By date? Some incrementing ID?