Splitting table into two parts: old data and recent data

I have a table records with the following structure:
userId
messageId
message
timestamp
This table grows pretty large, so I want to split it into two tables:
records, which will hold only the data for the last 30 days
records_history, which will hold all the data
That way, most queries will hit only the records table.
What is the best way to achieve this in Oracle? Should I write a trigger, or is there a better approach?
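To make the trigger-based variant of this split concrete, here is a minimal sketch. It is not Oracle code: it uses Python's built-in sqlite3 module as a stand-in database, and the trigger name records_to_history and the helper purge_old are hypothetical. A trigger copies every inserted row into records_history, and a periodic purge removes rows older than 30 days from records.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumption: SQLite stands in for Oracle; column types are illustrative.
SCHEMA = """
CREATE TABLE records (
    userId    INTEGER,
    messageId INTEGER,
    message   TEXT,
    timestamp TEXT  -- ISO-8601 text, so string order matches time order
);
CREATE TABLE records_history (
    userId    INTEGER,
    messageId INTEGER,
    message   TEXT,
    timestamp TEXT
);
-- Copy every new row into the full history as it arrives.
CREATE TRIGGER records_to_history AFTER INSERT ON records
BEGIN
    INSERT INTO records_history (userId, messageId, message, timestamp)
    VALUES (NEW.userId, NEW.messageId, NEW.message, NEW.timestamp);
END;
"""

def purge_old(conn, now, days=30):
    """Delete rows older than `days` from records; records_history keeps them."""
    cutoff = (now - timedelta(days=days)).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM records WHERE timestamp < ?", (cutoff,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

now = datetime(2024, 1, 31)  # fixed "current time" so the demo is repeatable
rows = [
    (1, 1, "old message", (now - timedelta(days=45)).isoformat()),
    (1, 2, "recent message", (now - timedelta(days=2)).isoformat()),
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
conn.commit()

removed = purge_old(conn, now)
in_records = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
in_history = conn.execute("SELECT COUNT(*) FROM records_history").fetchone()[0]
```

The purge would be run on a schedule (for example nightly). The trade-off of this design is that every insert is written twice, which is part of why partitioning is often suggested as an alternative for this kind of problem.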

Related

How do I get the last update time of a sequence of tables in BigQuery?

A BigQuery best practice is to split timeseries in daily tables (as "NAME_yyyyMMdd") and then use Table Wildcards to query one or more of these tables.
Sometimes it is useful to get the last update time on a certain set of data (e.g. to check the correctness of the ingestion procedure). How do I get the last update time over a set of tables organized like that?

A good way to achieve that is to use the __TABLES__ meta-table. Here is a generic query I use in several projects:
SELECT
  MAX(last_modified_time) LAST_MODIFIED_TIME,
  IF(REGEXP_MATCH(RIGHT(table_id,8),"[0-9]{8}"),LEFT(table_id,LENGTH(table_id) - 8),table_id) AS TABLE_ID
FROM
  [my_dataset.__TABLES__]
GROUP BY
  TABLE_ID
It will return the last update time of every table in my_dataset. For tables organized with a daily-split structure, it will return a single value (the update time of the latest table), with the initial part of their name as TABLE_ID.
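The grouping logic of that query can be mirrored in a few lines of Python, which may help when checking what the IF/REGEXP_MATCH expression does. This is only a sketch of the same idea outside BigQuery: the function names and the sample rows are hypothetical, and the integers merely stand in for last_modified_time values.

```python
import re

def base_name(table_id):
    """Strip a trailing 8-digit date suffix (yyyyMMdd), if there is one."""
    if re.fullmatch(r"[0-9]{8}", table_id[-8:]):
        return table_id[:-8]
    return table_id

def last_modified_by_group(tables):
    """tables: iterable of (table_id, last_modified_time) pairs.

    Returns a dict mapping each base name to the latest modification time
    seen among the tables that share it.
    """
    latest = {}
    for table_id, modified in tables:
        name = base_name(table_id)
        latest[name] = max(modified, latest.get(name, modified))
    return latest

# Hypothetical metadata rows; the numbers stand in for last_modified_time.
sample = [
    ("events_20240101", 100),
    ("events_20240102", 250),
    ("users", 180),
]
result = last_modified_by_group(sample)
```

As in the query, only the last 8 characters are removed, so a daily table such as events_20240102 is grouped under events_ (with the trailing underscore).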

Solution for Google BigQuery, using the INFORMATION_SCHEMA.PARTITIONS view:
SELECT *
FROM project_name.data_set_name.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'my_table';

See the number of new rows added to PostgreSQL tables each day

How can I see the number of new rows added to each of my database's tables in the past day?
Example result:
table_name new_rows
---------- -----------
users 32
questions 150
answers 98
...
I'm not seeing any table that stores this information in the PostgreSQL statistics collector: http://www.postgresql.org/docs/9.1/static/monitoring-stats.html
The only solution I can think of is to create a database table that stores the row_count of each table at midnight each day.
Edit: I need this to work with any table, regardless of whether it has a "created_at" or other timestamp column. Many of the tables whose growth rate I would like to see do not have timestamp columns and can't have one added.
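The snapshot idea described above (store each table's row count once a day, then compare consecutive days) could look roughly like this. It is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for PostgreSQL, the table row_count_snapshot and the helpers take_snapshot and growth are hypothetical names, and the table list comes from SQLite's own catalog rather than PostgreSQL's.

```python
import sqlite3

def take_snapshot(conn, day):
    """Store the current row count of every user table under the label `day`."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' AND name <> 'row_count_snapshot'")]
    with conn:
        for t in tables:
            n = conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            conn.execute(
                "INSERT INTO row_count_snapshot VALUES (?, ?, ?)", (day, t, n))

def growth(conn, prev_day, day):
    """Difference in row counts between two snapshots, per table."""
    query = """
        SELECT cur.table_name,
               cur.row_count - COALESCE(prev.row_count, 0) AS new_rows
        FROM row_count_snapshot AS cur
        LEFT JOIN row_count_snapshot AS prev
               ON prev.table_name = cur.table_name AND prev.day = ?
        WHERE cur.day = ?
        ORDER BY cur.table_name
    """
    return conn.execute(query, (prev_day, day)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (name TEXT);
CREATE TABLE questions (title TEXT);
CREATE TABLE row_count_snapshot (day TEXT, table_name TEXT, row_count INTEGER);
""")

conn.executemany("INSERT INTO users VALUES (?)", [("a",), ("b",)])
take_snapshot(conn, "2024-01-01")

conn.executemany("INSERT INTO users VALUES (?)", [("c",), ("d",)])
conn.executemany("INSERT INTO questions VALUES (?)", [("q1",), ("q2",), ("q3",)])
take_snapshot(conn, "2024-01-02")

result = growth(conn, "2024-01-01", "2024-01-02")
```

Note that this measures net growth: rows deleted during the day reduce the number, so it equals the count of new rows only for tables that are never deleted from.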

The easiest way is to add a column to your table that keeps track of the insert/update date.
Then, to retrieve the rows, you can do a simple select for the last day.
To my knowledge, and I have also done some research to make sure, there is no built-in functionality that lets you do this without creating such a field.

H2 incrementally update counts from another table?

With the H2 database, suppose there is a SUMS table that has a key and several count fields and there is an UPDATES table which has the same key and count fields. The keys in the UPDATES table may or may not exist in the SUMS table.
What is the most efficient way to add all the counts for each key from the UPDATES table to the SUMS table, or insert a row with those counts if the SUMS table does not yet have it?
Of course I could always process the result set of a select on the UPDATES table and then one-by-one update or insert into the SUMS table, but this feels like there should be a more efficient way to do it.
If it is not possible in H2 but possible in some other Java-embeddable solution I would be interested in this too, because this processing is just an intermediate step for processing a larger number of these counts (a couple of dozen million keys and a couple of billion rows for updating them).
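For comparison, this is how a set-based "add or insert" can be expressed in one statement in SQLite, driven from Python. It is a sketch, not H2 syntax: H2 documents its own MERGE statement for this kind of operation, and the column names item_key, count_a and count_b as well as the helper apply_updates are hypothetical. It assumes an SQLite version with upsert support (3.24 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sums    (item_key TEXT PRIMARY KEY, count_a INTEGER, count_b INTEGER);
CREATE TABLE updates (item_key TEXT, count_a INTEGER, count_b INTEGER);
INSERT INTO sums    VALUES ('x', 10, 1), ('y', 5, 0);
INSERT INTO updates VALUES ('x', 2, 3), ('x', 1, 1), ('z', 7, 7);
""")

def apply_updates(conn):
    """Fold all pending rows in updates into sums, then clear updates."""
    with conn:  # one transaction for the upsert and the cleanup
        conn.execute("""
            INSERT INTO sums (item_key, count_a, count_b)
            SELECT item_key, SUM(count_a), SUM(count_b)
            FROM updates
            WHERE true            -- needed so ON CONFLICT parses after a SELECT
            GROUP BY item_key     -- pre-aggregate: one row per key
            ON CONFLICT(item_key) DO UPDATE SET
                count_a = count_a + excluded.count_a,
                count_b = count_b + excluded.count_b
        """)
        conn.execute("DELETE FROM updates")

apply_updates(conn)
result = conn.execute(
    "SELECT item_key, count_a, count_b FROM sums ORDER BY item_key").fetchall()
```

Aggregating the updates per key before merging means each key in the sums table is touched at most once, which matters when there are far more update rows than keys.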

How to use sql to ignore a part of the table

I have an orders table with 500,000 records.
The database has been in use for the past 5 years, and about 100,000 records are added every year.
The table has about 30 fields, one of which is "OrderDate".
The query only needs records from the last few months, at most the past 12 months,
so all the records before that are useless to it and just slow it down.
The query is slow and takes 3-4 seconds; the same query was almost instant a few years ago.
I have to load and print all the columns at once.
Can I make SQL ignore part of the records, for example records with OrderDate before 2013 or the first 400,000 records, without deleting them?

As far as I can see, you have two options:
1. Create a new table identical to the old one, insert into it the rows that you want to ignore, and then delete those rows from the original table. This solution is good if those rows are "useless" to every query (and, if it's viable, you can update the other queries that make use of those rows).
2. Index the OrderDate column.
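The indexing option can be tried out quickly. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, since the question does not say which database is in use; the index name idx_orders_orderdate, the Customer column and the generated data are all hypothetical. It creates an index on OrderDate and asks the planner how it would run a date-range query.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (OrderId INTEGER PRIMARY KEY, OrderDate TEXT, Customer TEXT)")

# Hypothetical data: one order per day, starting on 2010-01-01.
start = date(2010, 1, 1)
conn.executemany(
    "INSERT INTO orders (OrderDate, Customer) VALUES (?, ?)",
    (((start + timedelta(days=i)).isoformat(), f"customer {i}") for i in range(1800)),
)

# The index lets the database jump to the first matching date
# instead of reading every row.
conn.execute("CREATE INDEX idx_orders_orderdate ON orders (OrderDate)")
conn.commit()

query = "SELECT * FROM orders WHERE OrderDate >= ?"
cutoff = "2014-01-01"

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (cutoff,)))
recent = conn.execute(query, (cutoff,)).fetchall()
print(plan)
```

If the printed plan mentions the index, the old rows are being skipped without having been deleted or moved. Most databases offer a comparable way to inspect the plan, though the command differs.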

This is a classic use of table partitioning, but we don't know what type of SQL you're using so we don't know if it supports it. Add a tag for the version of SQL (SQL Server? Oracle?)

SQL Server 2008 partition table based on insert date

My question is about table partitioning in SQL Server 2008.
I have a program that loads data into a table every 10 minutes or so, approximately 40 million rows per day.
The data is bcp'ed into the table and needs to be able to be loaded very quickly.
I would like to partition this table based on the date the data is inserted into the table. Each partition would contain the data loaded in one particular day.
The table should hold the last 50 days of data, so every night I need to drop any partitions older than 50 days.
I would like to have a process that aggregates data loaded into the current partition every hour into some aggregation tables. The summary will only ever run on the latest partition (since all other partitions will already be summarised) so it is important it is partitioned on insert_date.
Generally when querying the data, the insert date is specified (or multiple insert dates). The detailed data is queried by drilling down from the summarised data and as this is summarised based on insert date, the insert date is always specified when querying the detailed data in the partitioned table.
Can I create a column "Insert_date" in the table with a default value of Getdate() and then partition on it somehow?
OR
Should I create a column "insert_date" in the table and populate it with a hard-coded value of today's date?
What would the partition function look like?
Would separate tables and a partitioned view be better suited?

I have tried both, and even though I think partitioned tables are cooler, after trying to teach others how to maintain the code, partitioning just wasn't justified. In that scenario we used a hard-coded date field that was set in the insert statement.
Now I use different tables (31 days / 31 tables) plus an aggregation table, and there is an ugly UNION ALL query that joins together the monthly data.
Advantages: super simple SQL and simple C# code for bcp, and nobody has complained about complexity.
But if you have the infrastructure and a gaggle of .NET / SQL gurus, I would choose the partitioning strategy.
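The "one table per day plus a UNION ALL" layout described in this answer can be sketched as follows. This is an illustration only: it uses SQLite via Python's sqlite3 module rather than SQL Server, and the naming scheme load_yyyyMMdd, the view all_loads and the helper functions are hypothetical. Dropping a whole day is a cheap DROP TABLE, and the view is rebuilt so queries keep a single entry point.

```python
import sqlite3
from datetime import date, timedelta

def table_for(day):
    """Name of the table holding one day's load, e.g. load_20240131."""
    return f"load_{day:%Y%m%d}"

def day_tables(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'load_%' ORDER BY name")
    return [r[0] for r in rows]

def rebuild_view(conn):
    """Recreate all_loads as a UNION ALL over every existing day table."""
    conn.execute("DROP VIEW IF EXISTS all_loads")
    tables = day_tables(conn)
    if tables:
        union = " UNION ALL ".join(f"SELECT * FROM {t}" for t in tables)
        conn.execute(f"CREATE VIEW all_loads AS {union}")

def add_day(conn, day):
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table_for(day)} (insert_date TEXT, payload TEXT)")
    rebuild_view(conn)

def drop_old_days(conn, today, keep_days):
    """Keep the most recent `keep_days` day tables and drop the rest."""
    oldest_kept = table_for(today - timedelta(days=keep_days - 1))
    conn.execute("DROP VIEW IF EXISTS all_loads")  # drop before its tables go away
    for t in day_tables(conn):
        if t < oldest_kept:  # yyyyMMdd suffix sorts chronologically
            conn.execute(f"DROP TABLE {t}")
    rebuild_view(conn)

conn = sqlite3.connect(":memory:")
today = date(2024, 1, 31)
for offset in range(4):  # four days of hypothetical loads
    day = today - timedelta(days=offset)
    add_day(conn, day)
    conn.execute(
        f"INSERT INTO {table_for(day)} VALUES (?, ?)", (day.isoformat(), "row"))

drop_old_days(conn, today, keep_days=2)
conn.commit()

remaining = day_tables(conn)
total = conn.execute("SELECT COUNT(*) FROM all_loads").fetchone()[0]
```

With keep_days set to 50 this would match the retention described in the question; the small value here just keeps the demo short.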
