How can I see the number of new rows added to each of my database's tables in the past day?
Example result:
table_name   new_rows
----------   --------
users        32
questions    150
answers      98
...
I'm not seeing any table that stores this information in the PostgreSQL statistics collector: http://www.postgresql.org/docs/9.1/static/monitoring-stats.html
The only solution I can think of is to create a database table that stores the row count of each table at midnight each day.
Edit: I need this to work with any table, regardless of whether it has a "created_at" or other timestamp column. Many of the tables I would like to see the growth rate of do not have timestamp columns and can't have one added.
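A rough sketch of that snapshot idea (table and column names here are only placeholders; n_live_tup in pg_stat_user_tables is an estimate, so an exact count(*) per table would be more precise but slower):
CREATE TABLE daily_row_counts (
    snapshot_date date   NOT NULL DEFAULT current_date,
    table_name    text   NOT NULL,
    row_count     bigint NOT NULL,
    PRIMARY KEY (snapshot_date, table_name)
);

-- run this from a nightly cron job (e.g. at midnight)
INSERT INTO daily_row_counts (table_name, row_count)
SELECT relname, n_live_tup
FROM pg_stat_user_tables;

-- new rows per table in the past day = difference between the two latest snapshots
SELECT t.table_name,
       t.row_count - COALESCE(y.row_count, 0) AS new_rows
FROM daily_row_counts t
LEFT JOIN daily_row_counts y
       ON y.table_name = t.table_name
      AND y.snapshot_date = t.snapshot_date - 1
WHERE t.snapshot_date = current_date;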
The easiest way is to add a column to your table that keeps track of the insert/update date.
Then to retrieve the rows, you can do a simple select for the last day.
As far as I know (and I did some research to make sure), there is no built-in functionality that allows you to do this without creating such a field.
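A minimal sketch of that approach, assuming you are able to add such a column (names follow the users table from the question):
ALTER TABLE users ADD COLUMN created_at timestamptz NOT NULL DEFAULT now();

-- rows inserted in the past day
SELECT count(*) AS new_rows
FROM users
WHERE created_at >= now() - interval '1 day';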
I have a table with a unique key and one additional column for each day of December 2014 (e.g. named D20141226 for data from 26/12/2014), so the table consists of 32 columns (key + 31 days). Each daily column indicates whether a customer had a transaction on that specific day; no transaction is indicated by a 0.
Now I want to execute the same query on a daily basis, producing a list of unique keys that had a transaction on that specific day. I used this simple script:
CREATE TABLE C01012015 AS
SELECT DISTINCT CALLING_ISDN AS A_PARTY
FROM CDRICC_012015
WHERE CALL_STA_TIME ::date = '2015-01-01'
Now my question is, how can I add the content of the new daily table to the existing table with the 31 days, making it effectively a table with 32 days of data (and then continue to do so on a daily basis to store up to 360 days of data)?
Please note that new customers make transactions every day, hence there will be unique keys in the daily table that aren't in the big table holding all the previous days.
It would be ideal if those new rows automatically got a 0 instead of a NULL, but I can work around a NULL value (I'm just not sure how to make sure they get a 0 instead).
I thought that a FULL OUTER JOIN would be the solution but that would mean that I have to list all variables in the select statement, which becomes quite large as I add one more column each day. Is there a more elegant way to do this?
Or is SQL just not suited to this, and would a programming language like R be much better at it?
If you have the option to change your schema completely, you should unpivot your table so that your columns are something like CUSTOMER_ID INTEGER, D DATE, DID_TRANSACTION BOOLEAN. There's a post on the Enzee Community website that suggests using a user-defined table function (UDTF) to do this. If you change your schema in this way, a simple insert will work just fine and there will be no need to add columns dynamically.
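A rough sketch of what such an unpivoted table and its daily load could look like (the table name is only illustrative; the column names follow the suggestion above):
CREATE TABLE CUSTOMER_TRANSACTIONS (
    CUSTOMER_ID     INTEGER,
    D               DATE,
    DID_TRANSACTION BOOLEAN
);

-- the daily job then becomes a plain insert; no schema change is ever needed
INSERT INTO CUSTOMER_TRANSACTIONS (CUSTOMER_ID, D, DID_TRANSACTION)
SELECT DISTINCT CALLING_ISDN, CAST('2015-01-01' AS DATE), TRUE
FROM CDRICC_012015
WHERE CALL_STA_TIME::date = '2015-01-01';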
If you can't change your schema that much but you're still able to add columns, you could add a column for every day of the year up front with a default value of FALSE (assuming it's a boolean column representing whether the customer had a transaction or not on that day). You probably want to script this.
ALTER TABLE table_with_daily_columns MODIFY COLUMN (D20140101 BOOLEAN DEFAULT FALSE);
ALTER TABLE table_with_daily_columns MODIFY COLUMN (D20140102 BOOLEAN DEFAULT FALSE);
-- etc
ALTER TABLE table_with_daily_columns ADD COLUMN (D20150101 BOOLEAN DEFAULT FALSE);
GROOM TABLE table_with_daily_columns VERSIONS;
When you alter a table like this, Netezza creates a new table and an internal view that does a UNION of the new table and the old. You need to GROOM the table to merge the tables back into a single one for improved performance.
If you really must keep one column per day, then you'll have to use the method you described to pivot the data from your daily transaction table. Set the default value for each of your columns to 0 or FALSE as described above, then:
INSERT INTO table_with_daily_columns (cust_id, D20150101)
SELECT
    cust_id,
    TRUE AS D20150101
FROM C01012015;
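Note that this inserts a new row for every key in the daily table. If customers that already have a row in the big table should be updated rather than duplicated, a rough sketch would be (assuming the key column is called cust_id in both tables):
-- flag the day for customers that already exist
UPDATE table_with_daily_columns
   SET D20150101 = TRUE
 WHERE cust_id IN (SELECT cust_id FROM C01012015);

-- add brand-new customers; their other daily columns pick up the FALSE defaults
INSERT INTO table_with_daily_columns (cust_id, D20150101)
SELECT cust_id, TRUE
FROM C01012015
WHERE cust_id NOT IN (SELECT cust_id FROM table_with_daily_columns);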
My question is about table partitioning in SQL Server 2008.
I have a program that loads data into a table every 10 mins or so. Approx 40 million rows per day.
The data is bcp'ed into the table and needs to be able to be loaded very quickly.
I would like to partition this table based on the date the data is inserted into the table. Each partition would contain the data loaded in one particular day.
The table should hold the last 50 days of data, so every night I need to drop any partitions older than 50 days.
I would like to have a process that aggregates data loaded into the current partition every hour into some aggregation tables. The summary will only ever run on the latest partition (since all other partitions will already be summarised) so it is important it is partitioned on insert_date.
Generally when querying the data, the insert date is specified (or multiple insert dates). The detailed data is queried by drilling down from the summarised data and as this is summarised based on insert date, the insert date is always specified when querying the detailed data in the partitioned table.
Can I create a column "Insert_date" in the table with a default of GETDATE() and then partition on it somehow?
OR
I could create a column "insert_date" in the table and put a hard-coded value of today's date in it.
What would the partition function look like?
Would separate tables and a partitioned view be better suited?
I have tried both, and even though I think partitioned tables are cooler, after trying to teach others how to maintain the code afterwards it just wasn't justified. In that scenario we used a hard-coded date field that was set in the insert statement.
Now I use separate tables (31 days / 31 tables) plus an aggregation table, and there is an ugly UNION ALL query that joins together the monthly data.
Advantage: super simple SQL and simple C# code for bcp, and nobody has complained about complexity.
But if you have the infrastructure and a gaggle of .NET/SQL gurus, I would choose the partitioning strategy.
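For completeness, a rough sketch of what the partition function/scheme approach could look like in SQL Server 2008 (all object names here are made up, and the boundary list would need to be maintained by a nightly job that splits in a new day and merges out days older than 50):
CREATE PARTITION FUNCTION pf_insert_date (date)
AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-01-02', '2015-01-03'); -- one boundary per day

CREATE PARTITION SCHEME ps_insert_date
AS PARTITION pf_insert_date ALL TO ([PRIMARY]);

CREATE TABLE dbo.StagingLoads
(
    insert_date date NOT NULL DEFAULT (CONVERT(date, GETDATE())),
    -- ... the bcp'ed data columns ...
    payload     varchar(100) NULL
) ON ps_insert_date (insert_date);

-- nightly cleanup: switch the oldest partition out to an empty archive table
-- with the same structure, then merge its boundary away, e.g.
-- ALTER TABLE dbo.StagingLoads SWITCH PARTITION 1 TO dbo.StagingLoads_old;
-- ALTER PARTITION FUNCTION pf_insert_date() MERGE RANGE ('2015-01-01');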
I have a database with a column that I want to query the amount of times it has changed over a period of time. For example, I have the username, user's level, and date. How do I query this database to see the number of times the user's level has changed over x amount of years?
(I've looked at other posts on Stack Overflow, and they're telling me to use triggers. But in my situation, I want to query the database for the number of changes that have been made. If my question can't be answered, please tell me what other columns I might need to look into to figure this out. Am I supposed to use LAG for this?)
A database will not inherently capture this information for you. One suggestion would be to store your data as a time series: instead of updating the value in place, you add a new row as the new current value and expire the old one. The other alternative would be to add a new column that tracks the number of updates to the column you care about. This could be done in application code or in a trigger.
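A rough sketch of the time-series variant, with made-up table and column names (the exact date functions depend on your database): on a level change you expire the current row and insert a new one, and counting changes is then just counting rows.
CREATE TABLE user_level_history (
    username    varchar(50),
    user_level  int,
    valid_from  date,
    valid_to    date            -- NULL means "current"
);

-- on a level change for one user
UPDATE user_level_history
   SET valid_to = CURRENT_DATE
 WHERE username = 'alice' AND valid_to IS NULL;

INSERT INTO user_level_history (username, user_level, valid_from, valid_to)
VALUES ('alice', 7, CURRENT_DATE, NULL);

-- number of level changes per user since some date
SELECT username, COUNT(*) - 1 AS level_changes
FROM user_level_history
WHERE valid_from >= '2011-01-01'
GROUP BY username;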
Have you ever heard of the term LOG?
You have to create a new table in which you will store the changes you want to track.
I can imagine this solution for the table:
id - int, primary key, auto increment
table - the name of the table where the info has been changed
table_id - the unique id of the row in that table where changes have been made
year - integer
month - integer
day - integer
Knowing this, you can count everything.
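A rough sketch of that log table and the counting query (MySQL-flavoured, since the schema above mentions auto increment; the table column is renamed to table_name here to avoid the reserved word, and all values are made up):
CREATE TABLE change_log (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    table_name  VARCHAR(64),
    table_id    INT,
    year        INT,
    month       INT,
    day         INT
);

-- number of changes logged for row 123 of the users table since 2011
SELECT COUNT(*) AS changes
FROM change_log
WHERE table_name = 'users'
  AND table_id = 123
  AND year >= 2011;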
In case you are already keeping track of the level history by adding a new row with a different level and date every time a user changes level:
SELECT username, COUNT(date) - 1 AS changes
FROM table_name
WHERE date >= '2011-01-01'
GROUP BY username
That will give you the number of changes since Jan 1, 2011. Note that I'm subtracting 1 from the COUNT: a user with a single row in your table has never changed levels, since that row represents the user's initial level.
I'm using one of my MySQL database tables as an actual table, with each column being a time of day, plus one column called day. You guessed it: day holds the day of the week, and the rest of the cells say what is happening at that time.
What I want to do is only show the cells that have a value in them. In my case, I'm always going to have all the rows and 2 columns full. The 2 columns are 'day' and '19:00'; however, in the future I might add values for '18:00' etc.
So, how can I only SELECT the columns and rows which have data in them? Some type of 'WHERE: there is data'?
Thanks!
EDIT: Picture
Having time or day as columns means that you have data in your field names. Data belongs inside the table, so you should normalise the database:
table Calendar
--------------
Day
TimeOfDay
Appointment
This way you don't get a lot of empty fields in the table, and you don't have to change the database design to add another time of day.
Now you can easily fetch only the times that exist:
select Day, TimeOfDay, Appointment from Calendar
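A small sketch of that normalised design (types and names are only an example):
CREATE TABLE Calendar (
    Day         VARCHAR(10)  NOT NULL,   -- e.g. 'Monday'
    TimeOfDay   TIME         NOT NULL,   -- e.g. '19:00'
    Appointment VARCHAR(255) NOT NULL,
    PRIMARY KEY (Day, TimeOfDay)
);

-- everything happening on Mondays; only slots that actually have data exist as rows
SELECT TimeOfDay, Appointment
FROM Calendar
WHERE Day = 'Monday'
ORDER BY TimeOfDay;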
From what I gather, you are looking for something along the lines of
WHERE col1 IS NOT NULL
But it would be helpful if you could elaborate more on your schema, especially if you could draw a sample table.
I have a table as below
dbo.UserLogs
-------------------------------------
Id | UserId | Date | Name | P1 | Dirty
-------------------------------------
There can be several records per UserId (even in the millions).
I have a clustered index on the Date column and query this table very frequently for time ranges.
The column 'Dirty' is non-nullable and can only take the value 0 or 1, so I have no index on 'Dirty'.
I have several million records in this table, and in one particular case my application needs to query it to get all UserIds that have at least one record marked dirty.
I tried this query: select distinct(UserId) from UserLogs where Dirty=1
I have 10 million records in total and this takes around 10 minutes to run; I want it to run much faster than this.
[I am able to query this table on the Date column in less than a minute.]
Any comments/suggestions are welcome.
My environment:
64-bit, Sybase 15.0.3, Linux
My suggestion would be to reduce the amount of data that needs to be queried by "archiving" log entries to an archive table at suitable intervals.
You can still access all entries if you provide a union view over the current and archived log data, but the amount of data scanned when accessing only the current logs would be much reduced.
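A rough sketch of such a union view, assuming a hypothetical UserLogs_archive table with the same structure as UserLogs:
CREATE VIEW AllUserLogs
AS
SELECT Id, UserId, Date, Name, P1, Dirty FROM dbo.UserLogs
UNION ALL
SELECT Id, UserId, Date, Name, P1, Dirty FROM dbo.UserLogs_archive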
Add an index containing both the UserId and Dirty fields. Put UserId before Dirty in the index as it has more unique values.
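For example (the index name is just a suggestion):
CREATE INDEX UserLogs_UserId_Dirty_idx
    ON dbo.UserLogs (UserId, Dirty)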