How to know which attributes are used in GROUP BY in Rails - ruby-on-rails-3

I am improving the database tables by adding indexes where appropriate.
I'd like to add indexes to attributes that are used in GROUP BY queries.
Is there a way to automagically know which fields are used in GROUP BY clauses?

I don't see that there's much correlation between GROUP BY performance and the presence of indexes. If you have a query that runs slowly, I'd first look at adding indexes on the columns you have a predicate or a join on.
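As a minimal sketch (the orders table and column names here are hypothetical), the column in the WHERE clause is usually the one worth indexing, not the column being grouped:

-- the filter on created_at is where an index can help,
-- not the status column used in the GROUP BY
EXPLAIN SELECT status, COUNT(*)
FROM orders
WHERE created_at > '2012-01-01'
GROUP BY status;

CREATE INDEX index_orders_on_created_at ON orders (created_at);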

Related

Is there a way to create indices on a Teradata SQL View?

Is it possible to add indices to a View in Teradata? Aiming to make querying of Views faster by adding indices.
Tried using SQL to check for indexes on existing Views by using SELECT * FROM DBC.Indices. Yet there are only results for Tables, none for Views.
I have also searched the internet, but have so far been unable to find anything for Teradata.
I would have expected to be able to find an index on some of the existing Views, if it was possible.
An index is always associated with a table (or with multiple tables in case of the Join index), never with a view.
But: the execution plan of a query (and consequently its performance) depends, among other things, on the indices defined on the tables involved in the query.
So while you can't create an index on a view, you can create indices on the underlying tables, and it will change how queries that refer to the view are executed.
However, before starting to create additional indices in the hope of fixing your performance problem, you should first inspect the execution plan of the problematic queries and determine what the correct plan ought to look like. The problem might not be the lack of indices, but rather lack of up-to-date statistics, poor queries, or bad table design (e.g. wrong PI or poor partitioning).
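As a rough sketch (the table, view, and column names below are made up), you would index the base table and then check the plan of the query that goes through the view:

-- secondary index on the underlying table, not on the view
CREATE INDEX (order_date) ON sales;

-- inspect the plan of a query that references the view
EXPLAIN SELECT * FROM sales_view WHERE order_date > DATE '2015-01-01';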

How to set the right indexes on a sql table?

How can I identify the indexes that are worth setting on a SQL table?
Take the following as an example:
select *
from products
where name = 'car'
and type = 'vehicle'
and availability > 3
and insertion_date > '2015-10-10'
order by price asc
limit 1
Imagine a database with a few million entries.
Would there be benefits if I set an index on the combination of all attributes that occur in the WHERE and ORDER BY clause?
For the example:
create index i_my_idx on products
(name, type, availability, insertion_date, price)
There are a few rules of thumb that can be useful when deciding which columns to index:
Make sure there's a unique index on the primary key - this is done automatically when you specify a PK in most RDBMSs including postgresql.
Add indexes for each foreign key. These are created automatically in some RDBMSs when you specify a FK but not in postgresql.
If a PK is a compound key, consider adding indexes on each FK making up the PK (except for the first, which is covered by the PK index). As in 2, some RDBMSs (e.g. MySQL with InnoDB) add these indexes automatically when the FKs are specified.
Usually, but not always, table joins in queries will be PK to FK, and by having indexes on both keys, the query optimizer of the RDBMS has flexibility in determining the optimum plan for maximum performance. This won't always be the best, though, and experienced programmers will often format the SQL for a database query to influence the execution plan for best performance, or decide to omit indexes they know are not needed. It's worth noting that an SQL query that is optimal on one RDBMS is not necessarily optimal on another, or on future versions of the DB server, or as the database grows. The latter is important because in some RDBMSs, such as postgres and Oracle, the query execution plans depend on the data in the tables (this is known as cost-based optimisation).
Once you've got these out of the way a lot comes down to experience and a knowledge of your data, and importantly, how the data is going to be accessed.
Generally you will be looking to index those columns which are best at filtering the data. In your query above, the obvious one is name. This might be enough to make that query run fast enough (unless all your products are cars).
Other than that it's worth making a list of the common ways the data is likely to be accessed e.g.
Get a list of products that are in a category - an index on category will probably help
However, get a list of products that are currently available - an index on availability will probably not help because a large proportion of products are likely to satisfy this condition.
Unless you are dealing with large amounts of data this can often be all you need to do, and it's not generally a good idea to add indexes "just in case" as there are overheads in maintaining them. But if your system does have performance issues, then it's worth considering how combinations of columns are being used in queries, reading up about the postgres query optimizer, etc.
And to answer your last question - possibly, but it's far from the first thing to consider.
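As a sketch of the single-column approach for the example query (assuming name is reasonably selective), this may be all that's needed, and EXPLAIN will show whether the index is actually used:

CREATE INDEX i_products_name ON products (name);

EXPLAIN SELECT *
FROM products
WHERE name = 'car'
  AND type = 'vehicle'
  AND availability > 3
  AND insertion_date > '2015-10-10'
ORDER BY price ASC
LIMIT 1;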
Well, the way you are setting indexes is absolutely correct. Indexes have nothing to do with the ORDER BY clause.
Some important points while designing a SQL query:
Always put the condition that filters out the most rows first in the WHERE clause; for example, in the above query name = 'car' will filter the most records in products.
Do not use ">="; use ">" only, because greater-or-equal-to will always check "greater" first and, if that fails, "equals" as well, which reduces query performance.
Create a single index whose columns are in the same order as they appear in your WHERE clause.
Try to minimize use of the IN clause; use ANY instead.
Thanks
Anant

Is it better to do multiple selects from multiple tables or 1 select of all your data from all the tables?

I have multiple tables that data can be queried from with joins.
In regards to database performance:
Should I run multiple selects from multiple tables for the required data?
or
Should I write 1 select that uses a bunch of Joins to select the required data from all the tables at once?
EDIT:
The where clause I will be using for the select contains Indexed fields of the tables. It sounds like because of this, it will be faster to use 1 select statement with many joins. I will however still test the performance difference between the 2.
Thanks for all the great answers.
Just write one query with joins. If you are concerned about performance there are a number of options including:
Creating indexes that will help the performance of your selects
Create a persisted denormalized form of the data you want so you can query one table. This would most likely be an indexed view or another table.
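A minimal sketch of the indexed-view option in SQL Server (table and column names are invented; the unique clustered index is what persists the view's data):

CREATE VIEW dbo.v_order_totals
WITH SCHEMABINDING
AS
SELECT customer_id,
       COUNT_BIG(*) AS order_count,   -- COUNT_BIG is required in an indexed view with GROUP BY
       SUM(total)   AS total_amount   -- total is assumed to be a non-nullable column
FROM dbo.orders
GROUP BY customer_id;
GO

CREATE UNIQUE CLUSTERED INDEX ix_v_order_totals
    ON dbo.v_order_totals (customer_id);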
This can be one of those well-gee-it-depends answers, but generally, if you're writing straight SQL, do one query, especially since the joins might limit some of the data you get back.
There is a good chance that if you do multiple point queries for one record in each table, even if you're using the primary key of the table for the lookup, the connection and round-trip cost of each query will be more costly than the actual query.
It depends on how the tables are joined. If you do a cross-product of all tables, then it would be better to do individual selects. However, if your tables are properly indexed and well thought out, one query with multiple joins would be more efficient.
If you have proper indexes on your tables, you might be better off with the JOINs, but they are often the cause of bottlenecks. Instead of multiple selects, you might look at ways to de-normalize your data. It is far less "expensive" to update a count or timestamp in multiple tables when a user performs an operation, which saves you from having to join those tables later.
The best tool I find for performance tuning of queries is EXPLAIN. You type EXPLAIN before the query and you can see how many rows are scanned. The lower the number, the better; it means your indexes are working. The other thing is, when creating indexes, use compound indexes on multiple fields and order them left to right in the order they appear in the WHERE clause.
For example you have 10,000 rows in sometable:
SELECT id, name, description, status FROM sometable WHERE name LIKE '%someName%' AND status = 'Active';
You could type EXPLAIN before the query and it might return 10,000 as the number of rows scanned to find a match. You then create a compound index:
ALTER TABLE sometable ADD INDEX idx_st_search (name, status);
You then run EXPLAIN on the query again and it might return 1 as the number of rows scanned, with performance significantly improved.
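For reference, checking the plan is just a matter of prefixing the same query with EXPLAIN (MySQL syntax shown here) and comparing the rows estimate before and after creating the index:

-- compare the "rows" estimate before and after adding the compound index
EXPLAIN SELECT id, name, description, status
FROM sometable
WHERE name LIKE '%someName%' AND status = 'Active';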
It depends on your table design.
Most of the time one large query is better, but be sure to:
Use primary keys in the WHERE clause as much as you can for joins.
Use indexed fields, or create indexes for the fields that are used in WHERE clauses.

Is it a good practice to make an index on every column that's used on a WHERE condition in a SQL database?

The question says it all. I've always put indexes on columns that I use in WHERE statements, for optimization and to help sites scale nicely. I'm talking with my co-worker, who says it's best not to add those indexes and to leave room for optimization when needed.
What do you think is the best practice here?
The answer, as always, is "it depends".
If the WHERE clause uses a column in such a way that indexing is broken, then why bother? You'll want to rewrite those, if possible.
Indexes have to be calculated when you INSERT, so there's a cost to be weighed against querying. If you read mostly, then indexing might make sense. If your database is heavily transactional, indexes will slow down INSERTs. Especially bulk uploads.
I think the best practice here is to put a few indexes in initially, as best-guesses for what indexes will be needed. But after that, you want to actually measure which queries are slow and index those. Maybe your where clauses, or even your entire queries will change as requirements change.
This could be as easy as using something that aggregates query time over the course of a day, like pgfouine.
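For example, a rough sketch of getting that data in PostgreSQL: log statements slower than some threshold and feed the log to a tool like pgfouine (the 250 ms threshold is just an illustration):

-- log every statement slower than 250 ms; on older versions set this
-- directly in postgresql.conf instead of using ALTER SYSTEM
ALTER SYSTEM SET log_min_duration_statement = 250;
SELECT pg_reload_conf();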
I'd have to say no: It's not a good practice to simply index every column because it happens to appear in a WHERE clause.
To start with, if you have two columns in a particular WHERE clause, you have a decision to make: whether to put both in the same index, and which one should be the first column. Even with a single column, the choice of ASCENDING or DESCENDING in an index can be important. When the same table participates in many queries, with lots of columns in WHERE clauses, do you want to have a multitude of indexes with all these columns in various combinations and orders just because the columns appear in a WHERE clause? No.
I would say that it is a good practice to design your indexes taking into account which columns are used in a WHERE clause, but ultimately, columns which may not appear in the WHERE clause but appear in a JOIN may be more significant to most of your indexes. You can certainly design some indexes with inspection, but in general, you are going to want to actually profile your processes and see which indexes are actually useful to the bulk of the workload.
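To make those choices concrete, here is a hypothetical example (table and column names invented) where both the column order and the sort direction are deliberate design decisions rather than something read straight off a WHERE clause:

-- customer_id first because it is the usual filter; order_date DESC because
-- the common query wants the most recent orders for a customer
CREATE INDEX ix_orders_customer_recent
    ON orders (customer_id, order_date DESC);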
No. It depends on the selectivity of the column. For example, there is no use indexing a column EMPLOYEE.GENDER. Probably not for COLLEGE_STUDENT.YEAR_IN_SCHOOL_STATUS (4 likely values) either.
If there are a few rare values scattered with one or two common values, you could have a partial index.
I would definitely index any field used in a query where no value is in more than 10% of the rows.
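A sketch of such a partial index in PostgreSQL syntax (the orders table and 'pending' status are hypothetical); it covers only the rare value, so it stays small and is consulted only when that value is queried:

-- only the rare rows are indexed; queries for the common values scan the table anyway
CREATE INDEX ix_orders_pending
    ON orders (status)
    WHERE status = 'pending';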

SQL Relationships and indexes

I have an MS SQL server application where I have defined my relationships and primary keys.
However, do I need to further define indexes on relationship fields that are sometimes not used in joins and appear only as part of a WHERE clause?
I am working on the assumption that defining a relationship creates an index which the SQL engine can reuse.
Some very thick books have been written on this subject!
Here are some rules of thumb:
Don't bother indexing (apart from the PK) any table with < 1000 rows.
Otherwise index all your FKs.
Examine your SQL and look for the WHERE clauses that will most reduce your result sets, and index those columns.
e.g. given:
SELECT OWNER FROM CARS WHERE COLOUR = 'RED' AND MANUFACTURER = 'BMW' AND ECAP = '2.0';
You may have 5,000 red cars out of 20,000, so indexing COLOUR won't help much.
However, you may only have 100 BMWs, so indexing MANUFACTURER will immediately reduce your result set to 100, and you can eliminate the blue and white cars by simply scanning through those hundred rows.
Generally the DBMS will pick one or two of the available indexes based on cardinality, so it pays to second-guess it and define only those indexes that are likely to be used.
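For the example above, the index worth defining would therefore be something like (standard SQL; syntax may vary slightly by RDBMS):

CREATE INDEX i_cars_manufacturer ON cars (manufacturer);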
No indexes are automatically created for foreign key constraints, but unique and primary key constraints will create theirs.
Creating indexes for the queries you actually run, be it on joins or on the WHERE clause, is the way to go.
Like everything in the programming world, it depends. You obviously want to create indexes and relationships to preserve normalization and speed up database lookups. But you also want to balance that: with too many indexes, SQL Server spends more time maintaining every index. Also, the more indexes you have, the more fragmentation can occur in your database.
So what I do is put in the obvious indexes and relationships, and then optimize the possible slow queries after the application is built.
Defining a relationship does not create the index.
Usually, where you have a WHERE clause against some field you want an index, but be careful not to just throw indexes out all over the place, because they can and do have an effect on insert/update performance.
I would start by making sure that every PK and FK has an index.
Further to that, I have found that using the Index Tuning Wizard in SSMS provides excellent recommendations when you feed it the right information.
Database Considerations
When you design an index, consider the following database guidelines:
Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements because all indexes must be adjusted appropriately as data in the table changes.
Avoid over-indexing heavily updated tables and keep indexes narrow, that is, with as few columns as possible.
Use many indexes to improve query performance on tables with low update requirements but large volumes of data. Large numbers of indexes can help the performance of queries that do not modify data, such as SELECT statements, because the query optimizer has more indexes to choose from to determine the fastest access method.
Indexing small tables may not be optimal because it can take the query optimizer longer to traverse the index searching for data than to perform a simple table scan. Therefore, indexes on small tables might never be used, but must still be maintained as data in the table changes.
Indexes on views can provide significant performance gains when the view contains aggregations, table joins, or a combination of aggregations and joins. The view does not have to be explicitly referenced in the query for the query optimizer to use it.
--Stay_Safe--
Indexes aren't very expensive, and speed up queries more than you realize. I would recommend adding indexes to all key and non-key fields that are often used in queries. You can even use the execution plan to recommend additional indexes that would speed up your queries.
The only point where indexes aren't in your favour is when you're inserting large amounts of data. Each insert requires every index on a table to be updated along with the table's data.
You can opt to wait until the application is running and you have some known queries against the database that you want to improve, or you could do it now, if you have a good idea.
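In SQL Server, for example, one way to get such recommendations programmatically is the missing-index DMVs; this is only a sketch, and the suggestions should be weighed rather than applied blindly:

-- columns SQL Server would have liked to seek on, with a rough impact estimate
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact DESC;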