I have a table as below
dbo.UserLogs
-------------------------------------
Id | UserId | Date | Name | P1 | Dirty
-------------------------------------
There can be several records per UserId (even in the millions).
I have a clustered index on the Date column, and I query this table very frequently over time ranges.
The column 'Dirty' is non-nullable and can only take the values 0 or 1, so I have no index on 'Dirty'.
I have several million records in this table, and in one particular case in my application I need to query this table to get all UserIds that have at least one record marked dirty.
I tried this query - select distinct(UserId) from UserLogs where Dirty=1
I have 10 million records in total, this query takes about 10 minutes to run, and I want it to run much faster than that.
[I am able to query this table on the Date column in less than a minute.]
Any comments/suggestions are welcome.
My environment: 64-bit, Sybase 15.0.3, Linux.
My suggestion would be to reduce the amount of data that needs to be queried by "archiving" log entries to an archive table at suitable intervals.
You can still access all entries if you provide a union view over the current and archived log data, but queries that only touch the current logs would have much less data to read.
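A minimal sketch of such a view, assuming the archive table is called dbo.UserLogsArchive and has the same columns (both the view name and the archive table name are made up):
-- current plus archived log data behind one view
create view dbo.AllUserLogs
as
select Id, UserId, Date, Name, P1, Dirty from dbo.UserLogs
union all
select Id, UserId, Date, Name, P1, Dirty from dbo.UserLogsArchive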
Add an index containing both the UserId and Dirty fields. Put UserId before Dirty in the index as it has more unique values.
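For example, something along these lines (the index name is just an illustration):
-- nonclustered index on (UserId, Dirty)
create nonclustered index idx_userlogs_userid_dirty
on dbo.UserLogs (UserId, Dirty)
With this index in place, the select distinct(UserId) ... where Dirty=1 query can be answered from the index alone rather than by scanning the whole table.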
We have a table with 627 columns and approximately 850,000 records.
We are trying to retrieve only two columns and dump that data into a new table, but the query is taking forever and we are unable to get the result into the new table.
create table test_sample
as
select roll_no, date_of_birth from sample_1;
We have a unique index on the roll_no column (varchar), and the data type of date_of_birth is date.
Your query has no WHERE clause, so it scans the full table. It reads all the columns of every row into memory to extract the columns it needs to satisfy your query. This will take a long time because your table has 627 columns, and I'll bet some of them are pretty wide.
Additionally, a table with that many columns may give you problems with migrated rows or chaining. The impact of that will depend on the relative position of roll_no and date_of_birth in the table's projection.
In short, a table with 627 columns shows poor (practically non-existent) data modelling. That doesn't help you now; it's just a lesson to be learned.
If this is a one-off exercise you'll just need to let the query run. (Although you should check whether it is running at all: can you see active progress in V$SESSION_LONGOPS?)
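For example, from another session you could run something like this (the filter just hides operations that have already finished):
-- show long-running operations that are still in progress
select sid, opname, target, sofar, totalwork, time_remaining
from v$session_longops
where sofar < totalwork;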
I have a table which has an id and a date. (id, date) make up the composite key for the table.
What I am trying to do is delete all entries older than a specific date.
delete from my_table where date < '2018-12-12'
The query plan shows that it will do a sequential scan for the date condition.
I would somehow like to make use of the existing index, since the number of distinct ids is very small compared to the total number of rows in the table.
How do I do it? I have tried searching for an answer, but to no avail.
If your use case involves archiving data on a monthly basis (or some other time period), you can consider changing your database table to use partitions.
Let's say you collect data on a monthly basis and want to keep the data for the last 5 months. It would be really efficient to partition the table by month (see the sketch after this list).
This will:
optimise your READ queries (table scans reduce to partition scans)
optimise your DELETE requests (just drop the whole partition)
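A rough sketch of what that could look like with PostgreSQL declarative partitioning (this assumes PostgreSQL 11 or later and uses made-up table and partition names):
-- range-partitioned parent table; the primary key must include the partition key
create table my_table (
    id   integer not null,
    date date    not null,
    primary key (id, date)
) partition by range (date);
-- one partition per month
create table my_table_2018_11 partition of my_table
    for values from ('2018-11-01') to ('2018-12-01');
create table my_table_2018_12 partition of my_table
    for values from ('2018-12-01') to ('2019-01-01');
-- deleting a whole month of old data becomes a cheap metadata operation
drop table my_table_2018_11;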
You need an index on date for this query:
create index idx_mytable_date on mytable(date);
Alternatively, you can drop your existing index and add a new one on (date, id); date needs to be the first key column for this query.
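That would look something like this (the index name is arbitrary):
-- composite index with date leading, so the range predicate can use it
create index idx_mytable_date_id on mytable(date, id);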
How can I see the number of new rows added to each of my database's tables in the past day?
Example result:
table_name new_rows
---------- -----------
users 32
questions 150
answers 98
...
I don't see any table that stores this information in the Postgres statistics collector: http://www.postgresql.org/docs/9.1/static/monitoring-stats.html
The only solution I can think of is to create a database table that stores the row count of each table at midnight each day.
Edit: I need this to work with any table, regardless of whether it has a "created_at" or other timestamp column. Many of the tables I would like to see the growth rate for do not have timestamp columns and can't have one added.
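For what it's worth, a minimal sketch of the snapshot idea mentioned above (the snapshot table name is made up, and n_live_tup from pg_stat_user_tables is only an estimate):
-- one snapshot row per table per day
create table row_count_snapshots (
    snapshot_date date   not null default current_date,
    table_name    text   not null,
    row_count     bigint not null
);
-- run once a day, e.g. from cron
insert into row_count_snapshots (table_name, row_count)
select relname, n_live_tup
from pg_stat_user_tables;
The number of new rows per day is then the difference between two consecutive snapshots (ignoring deletes).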
The easiest way is to add a column to your table that keeps track of the insert/update date.
Then, to retrieve the rows, you can do a simple select for the last day.
To my knowledge (and I've done some research to make sure), there is no built-in functionality that lets you do this without adding a field.
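A minimal sketch of that approach, using a hypothetical users table:
-- track when each row was inserted
alter table users add column created_at timestamptz not null default now();
-- rows added in the past day
select count(*) from users where created_at >= now() - interval '1 day';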
I have a table MyTable with multiple int columns and one column containing a date. The date column has an index created as follows
CREATE INDEX some_index_name ON MyTable(my_date_column)
because the table will often be queried for its contents within a user-specified date range. The table has no foreign keys pointing to it, nor does it have any indexes other than the primary key, which is an auto-incrementing column filled by a sequence/trigger.
Now, the issue I have is that the data in this table is often replaced for a given time period because it is out of date. The way it is updated is by deleting all the entries within a given time period and inserting the new ones. The delete is performed using
DELETE FROM MyTable
WHERE my_date_column >= initialDate
AND my_date_column < endDate
However, because the number of rows deleted is massive (from 5 million to 12 million rows), the program pretty much blocks during the delete.
Is there something I can disable to make the operation faster? Or maybe an option I can specify on the index to speed it up? I read something about redo space having to do with this, but I don't know how to disable it during an operation.
EDIT: The process runs every day: it deletes the last 5 days of data, then fetches the data for those 5 days (which may have changed in the external source) and reinserts it.
The amount of data deleted is a tiny fraction of the whole table (< 1%), so copying the data I want to keep into another table and dropping/recreating the table may not be the best solution.
I can only think of two ways to speed this up.
If you do this on a regular basis, you should consider partitioning your table by month. Then you just drop the partition for the month you want to delete, which is basically as fast as dropping a table. Partitioning requires an Enterprise Edition license, if I'm not mistaken.
Create a new table with the data you want to keep (using create table new_table as select ...), drop the old table, and rename the interim table (see the sketch below). This will be much faster, but has the drawback that you need to re-create all indexes and (primary, foreign key) constraints on the new table.
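A sketch of the second option (Oracle syntax assumed, given the sequence/trigger setup; initialDate and endDate stand for the same values used in your delete):
-- keep only the rows outside the window that will be reloaded
create table new_table as
    select * from MyTable
    where my_date_column < initialDate
       or my_date_column >= endDate;
drop table MyTable;
rename new_table to MyTable;
-- then re-create the primary key, the date index, the sequence/trigger, and any grants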
My table has the following fields:
Date (integer)
State (integer)
ProductId (integer)
ProductName (integer)
Description (text, maximum length 3000 characters)
There will be more than 8 million rows. I need to decide whether I should put the product description in another table. My main goal is to have this statement run very fast:
SELECT Date,State,ProductId,ProductName FROM tablename ORDER BY DATE desc LIMIT 100
The SQL statement above does not fetch the Description field value. The user will see the description only when a row is selected in the application (a new query).
I would really like to have the product Description in the same table, but I'm not sure how SQLite scans the rows. If the Date value doesn't match, I would assume that SQLite can quickly skip to the next row. Or does it need to scan all fields of the row, up to the end of the Description value, just to know where the row ends? If it has to scan all fields to reach the next row, will a 3000-character Description value slow things down a lot?
EDIT: No indexing should be used since INSERT speed is important.
EDIT: The only reason for trying to keep it all in one table is that I want to do INSERTs and UPDATEs in one transaction of hundreds of items. The same item could be inserted and later updated in the same transaction, so I cannot know the last insert id per item.
When you use that query and do not have an index on the Date column, SQLite will read all records from the table, and use a temporary table to sort the result.
When you have an index on the Date column, SQLite will look up the last 100 records in the index, then read all the data of those records from the table.
When you have a covering index, i.e., one index with the four columns Date, State, ProductId, and ProductName, SQLite will just read the last 100 entries from the index.
Whenever SQLite reads from the database file, it does not read values or records, but entire pages (typically, 1 KB or 4 KB).
In case 1, SQLite will read all pages of the table.
In case 2, SQLite will read the last page of the index (because the 100 dates will fit into one page), and 100 pages of the table (one for each record, assuming that no two of these records happen to be in the same page).
In case 3, SQLite will read the last few pages of the index.
Case 2 will be much faster than case 1; case 3 will be faster still, but probably not enough to be noticeable.
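For illustration, case 3 corresponds to an index like this (the index name is made up):
-- covering index: the query above can be answered from the index alone
CREATE INDEX idx_date_covering ON tablename(Date, State, ProductId, ProductName);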
I would suggest relying on good old database normalization rules, in this case specifically 1NF. If that Description (and the same goes for ProductName) is going to be repeated, you have a database design issue, and whether it's in SQLite or something else has little to do with it. CL is right about the indexes, mind you; proper indexing will still matter.
Review your model: make a table for products and another for inventory.
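A rough sketch of such a split (column types taken from the question; the table and column names are just an illustration):
-- product master data: name and description stored once per product
CREATE TABLE Product (
    ProductId   INTEGER PRIMARY KEY,
    ProductName TEXT,
    Description TEXT
);
-- inventory/log rows reference the product instead of repeating its text
CREATE TABLE Inventory (
    Date      INTEGER,
    State     INTEGER,
    ProductId INTEGER REFERENCES Product(ProductId)
);
The 100-newest query then becomes a join between Inventory and Product, and the large Description text is only read when you actually ask for it.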