I want to find which tables/columns in Redshift remain unused in the database in order to do a clean-up.
I have been trying to parse the queries from the stl_query table, but it turns out this is quite a complex task, and I haven't found any library that I can use for it.
Does anyone know if this is somehow possible?
Thank you!
The column question is a tricky one. For table use information I'd look at stl_scan, which records info about every table scan step performed by the system. Each of these is date-stamped, so you will know when the table was "used". Just remember that system logging tables are pruned periodically and the data will go back only a few days, so you may need a process that records table use daily to build up an extended history.
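A rough sketch of that daily process, in Python. It assumes you already have some way to pull (table name, scan time) pairs out of stl_scan each day; the function name merge_last_scan and the sample table names are made up for illustration and are not part of Redshift.

```python
from datetime import datetime


def merge_last_scan(history, snapshot):
    """Return a new dict keeping the latest scan time seen for each table.

    history:  dict of table name -> datetime of the last known scan
    snapshot: iterable of (table name, scan datetime) pairs from today's pull
    """
    merged = dict(history)
    for table, scanned_at in snapshot:
        if table not in merged or scanned_at > merged[table]:
            merged[table] = scanned_at
    return merged


# Hypothetical data: yesterday's saved history plus today's pull.
history = {"orders": datetime(2024, 1, 1, 9, 0)}
today = [
    ("orders", datetime(2024, 1, 2, 8, 30)),
    ("customers", datetime(2024, 1, 2, 7, 15)),
    ("orders", datetime(2024, 1, 2, 6, 0)),
]
history = merge_last_scan(history, today)
```

Persist the merged history somewhere outside the system tables. After a long enough period, any table in your catalog that never shows up in it is a candidate for clean-up.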
I'll ponder the column question some more. One thought is that query ids are also provided in stl_scan, and this could help in identifying the columns used in the query text. For every query id that scans table_A, search the query text for each column name of the table. It wouldn't be perfect, but it's a start.
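The text search could look something like the Python sketch below. It assumes you have already collected the query texts for the queries that scanned the table; the function name and sample queries are hypothetical. It is only a heuristic: a column with the same name in another table counts as a hit, and SELECT * hides column use entirely.

```python
import re


def columns_mentioned(query_texts, columns):
    """Return the set of column names that appear as whole words in any query text."""
    patterns = {
        col: re.compile(r"\b" + re.escape(col) + r"\b", re.IGNORECASE)
        for col in columns
    }
    seen = set()
    for text in query_texts:
        for col, pattern in patterns.items():
            if pattern.search(text):
                seen.add(col)
    return seen


# Hypothetical inputs: texts of queries that scanned table_A, and its columns.
queries = [
    "select id, name from table_A where id > 5",
    "SELECT Created_At FROM table_A",
    "select user_id from other_table",
]
table_a_columns = ["id", "name", "created_at", "legacy_flag"]

used = columns_mentioned(queries, table_a_columns)
unused = set(table_a_columns) - used
```

The word-boundary match keeps "id" from matching inside "user_id", since the underscore counts as a word character.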
How can I reorder rows in an Oracle SQL database? I am using 3rd-party database-driven software and unfortunately I cannot change the call (or I would just add an ORDER BY), but can I change the order of the rows in the database?
Disclaimer: I know I should never depend on raw database order, and if it changes I understand, but can this be done?
Thank you!
You can't. If a query doesn't specify an ORDER BY, the order in which rows are returned is undefined.
If you are willing to accept a less-than-100% solution, you could try moving the data to a temporary table, truncate the table, and then insert the data back in the order you want it to appear. If the query is doing a table scan or some type of index scans and not doing anything complicated (like a join), it's likely that the rows would be returned in the order they are physically stored in the table. No guarantees, of course, but it might work most of the time.
In an MS Access 2010 application, I use this SQL statement:
SELECT myTable.field1, myTable.field2, ...
INTO temporaryTable
FROM myTable
ORDER BY myTable.field4, myTable.field3
The order of the records in temporaryTable often matches neither the ORDER BY clause nor the order in myTable.
For some time now, I have tried ordering and copying tables There and Back Again to have the order clear and fixed, but it doesn't help. It also seems to be a phantom, sometimes it works, sometimes not. So I'll have to write a transparent but slow VBA workaround.
Does anybody know something about this? Is it a bug, and what is the best workaround? Did I miss a parameter to set?
Thanks in advance :-)
The standard response to this type of question is:
You should never depend on the rows in a table being in any particular "natural" order. This is true for most (if not all) databases, not just Access. In other words SELECT * FROM something (or equivalent) without an ORDER BY clause means that the rows can be returned in any order. In fact, such a statement may not necessarily return the rows in the same order for each invocation if you execute it more than once (although Access does tend to be fairly consistent about it).
If you need to export records to Excel in a certain order (as mentioned in comments to the question) then you should create a saved query in Access that includes an ORDER BY clause and then export the query to Excel.
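As a minimal sketch of "sort when you read, not when you store", here is the same idea in Python with SQLite standing in for Access and a CSV string standing in for the Excel export (both are substitutions for illustration, and the sample rows are invented). The point is only that the ORDER BY belongs to the query that feeds the export.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (field1 TEXT, field2 TEXT, field3 INTEGER, field4 INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [("c", "x", 2, 1), ("a", "y", 1, 2), ("b", "z", 1, 1)],
)


def export_sorted(connection):
    """Run the ordered query at export time and return the rows as CSV text."""
    rows = connection.execute(
        "SELECT field1, field2 FROM myTable ORDER BY field4, field3"
    ).fetchall()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


exported = export_sorted(conn)
```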
If you really need a temp table and a specific order, try using DELETE + APPEND queries instead of a MAKE TABLE query. In your "not so temp" table, you will then be able to define a Primary Key and/or a default sort order.
I found that a table with 50 thousand records takes one minute to return when we fetch its data from SQL Server just by issuing a SQL statement. The table has a primary key, which means a clustered index is already there, so I do not understand why it takes one minute. Besides indexes, what other ways are there to optimize a table so the data comes back faster? What do I need to do in this situation to get a faster response? Also, please tell me how to always write optimized SQL, with all the optimization steps in detail.
thanks.
The fastest way to optimize the indexes on a table is to use SQL Server's Database Engine Tuning Advisor. Take a look here: http://www.youtube.com/watch?v=gjT8wL92mqE
Select only the columns you need, rather than select *. If your table has some large columns e.g. OLE types or other binary data (maybe used for storing images etc) then you may be transferring vastly more data off disk and over the network than you need.
As others have said, an index is no help to you when you are selecting all rows (no where clause). Using an index would be slower in such cases because of the index read and table lookup for each row, vs full table scan.
If you are running select * from employee (as per question comment) then no amount of indexing will help you. It's an "Every column for every row" query: there is no magic for this.
Adding a WHERE usually won't help a select * query either.
What you can check is index and statistics maintenance. Do you do any? Here's a Google search
Or change how you use the data...
Edit:
Why a WHERE clause usually won't help...
If you add a WHERE on a column that is not the PK:
you'll still need to scan the table unless you add an index on the searched column;
then you'll need a key/bookmark lookup unless you make the index covering;
with SELECT * you need to add all columns to the index to make it covering;
for many hits, the index will probably be ignored to avoid key/bookmark lookups.
Unless there is a network issue or such, the issue is reading all columns not lack of WHERE
If you did SELECT col13 FROM MyTable and had an index on col13, the index will probably be used.
For SELECT * FROM MyTable WHERE DateCol < '20090101', with an index on DateCol and a filter that matches 40% of the table, the index will probably be ignored, or you'd have expensive key/bookmark lookups.
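The covering vs. non-covering distinction can be seen in a small Python sketch. SQLite stands in for SQL Server here purely so the example runs anywhere; its planner and its plan wording differ from SQL Server's, and the notes column and index name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (id INTEGER PRIMARY KEY, col13 INTEGER, notes TEXT)"
)
conn.execute("CREATE INDEX idx_col13 ON MyTable (col13)")


def plan(sql):
    """Return SQLite's query plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Only the indexed column is requested: the index alone can answer the query.
narrow_plan = plan("SELECT col13 FROM MyTable WHERE col13 = 5")

# Every column is requested: the index finds the rows, but each hit
# still needs a lookup into the table for the remaining columns.
wide_plan = plan("SELECT * FROM MyTable WHERE col13 = 5")

print(narrow_plan)
print(wide_plan)
```

In SQLite the first plan reports a covering index and the second a plain index search, which is the per-row lookup cost the answer describes.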
Irrespective of the merits of returning the whole table to your application that does sound an unexpectedly long time to retrieve just 50000 rows of employee data.
Does your query have an ORDER BY or is it literally just select * from employee?
What is the definition of the employee table? Does it contain any particularly wide columns? Are you storing binary data such as their CVs or employee photo in it?
How are you issuing the SQL and retrieving the results?
What isolation level are your select statements running at? (You can use SQL Profiler to check this.)
Are you encountering blocking? Does adding NOLOCK to the query speed things up dramatically?
In a certain app I must constantly query data that are likely to be amongst the last inserted rows. Since this table is going to grow a lot, I wonder if there's a standard way of optimizing the queries by making them start the lookup at the table's end. I think I would get the same optimization if the database stored data for the table in a stack-like structure, so the last inserted rows would be searched first.
The SQL spec doesn't mention anything about maintaining the insertion order. In practice, most decent DBs don't maintain it either, so the idea stops there. Sorting the table first isn't going to make it faster. Just index the column(s) of interest (at least the ones which you use in the WHERE).
One of the "tenets" of a proper RDBMS is that these kinds of matters shouldn't concern you or anyone else using the DB.
The DB engine is "free" to use whatever method it wants to store/retrieve records, so if you want to enforce a "top" behaviour, do what others suggested: add a timestamp field to the table (or tables), add an index on it, and query using it as a sort and/or query criterion (e.g.: you poll the table each minute, and ask for records with timestamp>=systime-1 minute).
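A minimal sketch of that polling pattern in Python, using SQLite only so it is self-contained. The table, column and function names are assumptions, and timestamps are plain integers (seconds) to keep the example deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, created_at INTEGER)"
)
# The index lets the engine jump straight to the recent range
# instead of checking every row.
conn.execute("CREATE INDEX idx_events_created_at ON events (created_at)")


def add_event(payload, created_at):
    conn.execute(
        "INSERT INTO events (payload, created_at) VALUES (?, ?)",
        (payload, created_at),
    )


def recent_events(now, window_seconds=60):
    """Return payloads created within the last window, newest first."""
    cutoff = now - window_seconds
    return conn.execute(
        "SELECT payload FROM events WHERE created_at >= ? ORDER BY created_at DESC",
        (cutoff,),
    ).fetchall()


add_event("old", 1000)
add_event("newer", 1950)
add_event("newest", 1990)
latest = recent_events(now=2000)
```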
There is no standard way.
In some databases you can specify the sort order on an index.
SQL Server allows you to write ASC or DESC on an index:
[ ASC | DESC ]
Determines the ascending or descending sort direction for the particular index column. The default is ASC.
In MySQL you can also write ASC or DESC when you create the index. Versions before 8.0 parse this but ignore it; MySQL 8.0 and later support descending indexes.
Add a counter or a time field in your table, sort on it and get top rows.
In other words: You should forget the idea that SQL tables are accessed in any particular order by default. A seqscan does not mean the oldest rows will be searched first, only that all rows will be checked. If you want to optimize some search you add indexes on some fields. What you are looking for is probably indexes.
If your data is indexed, it won't matter. The index is doing a binary search, not a sequential scan.
Unless you're doing TOP 1 (or something like it), the SELECT will have to scan the whole table or index anyway.
According to Data Independence you shouldn't care. That said, a clustered index would probably suit your needs if you typically look for a date range. (Sorting asc/desc shouldn't matter, but you should try it out.)
If you find that you really need it you can also shard your database to increase perf on the most recently added data.
If you have enough rows that it's actually becoming a problem, and you know how many "the most recently inserted rows" should be, you could try a round-about method.
Note: Even for pretty big tables, this is less efficient, but once your main table gets big enough, I've seen this work wonders for user-facing performance.
Create a "staging" table that exactly mimics your table's structure. Whenever you insert into your main table, also insert into your "staging" area. Limit your "staging" area to n rows by using a trigger to delete the lowest id row in the table when a new row over your arbitrary maximum is reached (say, 10,000 or whatever your limit is).
Then, queries can hit that smaller table first looking for the information. Since the table is arbitrarily limited to the last n rows, it's only looking in the most recent data. Only if that fails to find a match would your query (actually, at this point a stored procedure because of the decision making) hit your main table.
Some Gotchas:
1) Make sure your trigger(s) is(are) set up properly to maintain the correct concurrency between your "main" and "staging" tables.
2) This can quickly become a maintenance nightmare if not handled properly, and depending on your scenario it may be a little finicky.
3) I cannot stress enough that this is only efficient/useful in very specific scenarios. If yours doesn't match it, use one of the other answers.
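Here is a small Python sketch of the staging-table idea, with SQLite used only to make it runnable. All table, trigger and function names are made up, the cap of 3 rows is just for the demo, and the trigger keeps the n highest ids rather than deleting one lowest row at a time. Trigger syntax varies a lot between databases, so treat this as an outline rather than something to copy.

```python
import sqlite3

MAX_RECENT = 3  # arbitrary cap for the demo; the answer suggests e.g. 10,000

conn = sqlite3.connect(":memory:")
conn.executescript(
    f"""
    CREATE TABLE main_items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE recent_items (id INTEGER PRIMARY KEY, name TEXT);

    -- Mirror every insert into the staging table, then trim it to the cap.
    CREATE TRIGGER copy_to_recent AFTER INSERT ON main_items
    BEGIN
        INSERT INTO recent_items (id, name) VALUES (NEW.id, NEW.name);
        DELETE FROM recent_items
        WHERE id NOT IN (
            SELECT id FROM recent_items ORDER BY id DESC LIMIT {MAX_RECENT}
        );
    END;
    """
)


def find_by_name(name):
    """Look in the small staging table first, then fall back to the main table."""
    for table in ("recent_items", "main_items"):
        row = conn.execute(
            f"SELECT id, name FROM {table} WHERE name = ?", (name,)
        ).fetchone()
        if row is not None:
            return table, row
    return None


for item_name in ["a", "b", "c", "d", "e"]:
    conn.execute("INSERT INTO main_items (name) VALUES (?)", (item_name,))
```

After the five inserts, find_by_name("e") is answered from the staging table, while find_by_name("a") has to fall through to the main table.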
ISO/ANSI Standard SQL does not consider optimization at all. For example the widely recognized CREATE INDEX SQL DDL does not appear in the Standard. This is because the Standard makes no assumptions about the underlying storage medium and nor should it. I regularly use SQL to query data in text files and Excel spreadsheets, neither of which have any concept of database indexes.
You can't do this.
However, there is a way to do something that might be even better. Depending on the design of your table, you should be able to create an index that keeps things in almost the order of entry. For example, if you adopt the common practice of creating an id field that autoincrements, then that index is just about in chronological order.
Some RDBMSes permit you to declare a backwards index, that is, one that descends instead of ascending. If you create a backwards index on the ID field, and if the optimizer uses that index, it will look at the most recent entries first. This will give you a rapid response for the first row.
The next step is to get the optimizer to use the index. You need to use explain plan to see if the index is being used. If you ask for the rows in order of id descending, the optimizer will almost certainly use the backwards index. If not you may be able to use hints to guide the optimizer.
If you still need to avoid reading all the rows in order to avoid wasting time, you may be able to use the LIMIT feature to declare that you only want, say 10 rows, and no more, or 1 row and no more. That should do it.
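The descending index plus LIMIT combination might look like this in Python. SQLite is used only so the sketch is runnable, the table and index names are invented, and whether a given optimizer actually picks the index is something to confirm with its own explain output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (entry_id INTEGER NOT NULL, body TEXT)")
# A "backwards" index: entries are kept in descending entry_id order.
conn.execute("CREATE INDEX idx_entries_desc ON entries (entry_id DESC)")

conn.executemany(
    "INSERT INTO entries (entry_id, body) VALUES (?, ?)",
    [(n, f"row {n}") for n in range(1, 6)],
)

LATEST_SQL = "SELECT entry_id, body FROM entries ORDER BY entry_id DESC LIMIT ?"


def latest(n):
    """Return the n most recently numbered entries, newest first."""
    return conn.execute(LATEST_SQL, (n,)).fetchall()


# Inspect the plan to see whether the index is used for the ordering.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + LATEST_SQL, (2,)):
    print(plan_row[-1])

newest_two = latest(2)
```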
Good luck.
If your table has a create date, then I'd reverse sort by that and take the top 1.