I normally work with MySQL databases, and I am currently running into some issues with a query against a SQL Server database.
I'm trying to get the average of a column, grouped by day. This takes anywhere from 20 to 30 seconds, even though it only returns a few hundred rows. The table, however, contains a couple million entries. I'm sure this has something to do with indexing, but I just can't seem to figure out the correct solution here.
So the query goes like this:
select
    [unit_id],
    avg(weight) AS avg,
    max(timestamp) AS dateDay
from
    [measurements]
where
    timestamp BETWEEN '2017-06-01' AND '2017-10-04'
group by
    [unit_id], CAST(timestamp AS DATE)
order by
    [unit_id] asc, [dateDay] asc
I have set up a nonclustered index containing the unit_id, weight and timestamp fields.
This is your query:
select unit_id, avg(weight) AS avg, max(timestamp) AS dateDay
from measurements m
where timestamp BETWEEN '2017-06-01' AND '2017-10-04'
group by unit_id, CAST(timestamp AS DATE)
order by unit_id asc, dateDay asc;
Under reasonable assumptions about your data, it is going to have similar performance in either MySQL or SQL Server. Your WHERE clause is not highly selective. And because of the range inequality on timestamp, SQL Server cannot use an index to satisfy the GROUP BY: the rows matching the range filter are not already ordered by (unit_id, date).
An index on measurements(timestamp, unit_id, weight) might benefit the query on either database. There might be some fancy ways to get SQL Server to improve the performance. But both it and MySQL will need to take the rows matching the WHERE clause and aggregate them (in all likelihood using a hash-based algorithm in SQL Server and a filesort in MySQL).
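For concreteness, a sketch of that index in SQL Server syntax (the index name is illustrative):
-- Index name is illustrative; since weight is only aggregated, never
-- filtered or sorted on, it could also be an INCLUDE column instead.
CREATE NONCLUSTERED INDEX IX_measurements_ts_unit_weight
ON [measurements] ([timestamp], [unit_id], [weight]);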
The problem is likely the CAST in the GROUP BY. Though you don't say it explicitly, I'm assuming timestamp is a DATETIME value, which is why you CAST it to DATE in the GROUP BY clause. The issue is that the calculated value produced by the CAST isn't indexed.
If it's your system, and this query is run frequently, I'd add a new column of type DATE to store just the day, and index that. If you can't, select the values in the date range you're interested in, with the timestamp cast to DATE, into a temp table or CTE, then group by the date.
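For example, the new-column approach could look like this in SQL Server (a sketch; the column and index names are illustrative):
-- Persisted computed column holding just the day, plus an index on it.
ALTER TABLE [measurements]
ADD [measure_date] AS CAST([timestamp] AS DATE) PERSISTED;
CREATE NONCLUSTERED INDEX IX_measurements_date_unit
ON [measurements] ([measure_date], [unit_id])
INCLUDE ([weight]);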
Or even try this, just to pull the CAST out of the GROUP BY clause:
select
    x.[unit_id],
    avg(x.weight) AS avg,
    x.[dateDay]
from (
    select
        [unit_id],
        CAST(timestamp AS DATE) AS [dateDay],
        weight
    from [measurements]
    where
        timestamp BETWEEN '2017-06-01' AND '2017-10-04'
) x
group by
    x.[unit_id], x.[dateDay]
order by
    x.[unit_id] asc, x.[dateDay] asc
Related
I want to be able to count the number of rows inserted into a table per second using SQL. The count has to be over all the rows in the table. Sometimes there could be 100 rows and other times 10, etc., so this is just for stats. I managed to count rows per day, but I need more detail. Any advice or scripts would be appreciated.
Thanks
If you truncate the datetime column to the second, you can then aggregate on it to get totals per second.
For example:
SELECT
    CAST(dt AS DATE) AS [Date],
    MIN(Total) AS MinRecordsPerSec,
    MAX(Total) AS MaxRecordsPerSec,
    AVG(Total) AS AverageRecordsPerSec
FROM
(
    SELECT
        CONVERT(datetime, CONVERT(char(19), YourDatetimeColumn, 120), 120) AS dt,
        COUNT(*) AS Total
    FROM YourTable
    GROUP BY CONVERT(char(19), YourDatetimeColumn, 120)
) q
GROUP BY CAST(dt AS DATE)
ORDER BY 1;
Well, it depends on the language you are using. One way to do this would be to fetch the rows from your DB and convert the date column to a timestamp, then group by each stamp, since each distinct timestamp value represents one second.
OR
Alternatively, you can store timestamps in the DB instead of the actual date; then it will be easy to query.
OR
Use the UNIX_TIMESTAMP() function in MySQL to get the timestamp of the column; then you can do whatever comparison you want on it:
https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_unix-timestamp
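For instance, a minimal MySQL sketch along those lines (the table and column names are illustrative):
-- Counts rows per second; each distinct UNIX_TIMESTAMP value is one second.
SELECT UNIX_TIMESTAMP(created_at) AS ts,
       COUNT(*) AS total
FROM your_table
GROUP BY UNIX_TIMESTAMP(created_at)
ORDER BY ts;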
Hope this gives you an idea.
I am doing a GROUP BY on CARD_NO with MAX(EXPIRE_DATE), on a table that contains 11,910,317 records.
What is the best way to get this done?
Below is the query I used, but it takes quite a long time to return results.
SELECT CARD_NO,
MAX(expire_date) EXPIRE_DATE
FROM LCT_CARD_ISSUE11
GROUP BY CARD_NO
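One common approach (not from the original thread, so treat it as a hedged sketch): a covering index on (CARD_NO, expire_date) lets the engine compute each group's MAX from the index alone, without touching the table.
-- Index name is illustrative.
CREATE INDEX IX_card_issue_card_expire
ON LCT_CARD_ISSUE11 (CARD_NO, expire_date);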
I have a really complicated query:
select * from (
select * from tbl_user ...
where ...
and date_created between :date_from and :today
...
order by date_created desc
) where rownum <=50;
Currently the query is fast enough because of the WHERE clause (only 3 months before today; date_from = today - 90 days).
I have to remove this clause, but doing so causes performance degradation.
What if I first calculate date_from with
SELECT MIN(date_created) WHERE ...
and then insert this value into the main query? The set of data will be the same. Will it improve performance? Does it make sense?
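In other words, something like this (a sketch; the other filters from the original query are omitted for brevity):
select *
from (
    select *
    from tbl_user
    where date_created >= (select min(date_created) from tbl_user)
    order by date_created desc
)
where rownum <= 50;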
Does anyone have any suggestions about optimization?
Using an order by operation will of course cause the query to take a little longer to return. That being said, it is almost always faster to sort in the DB than it is to sort in your application logic.
It's hard to really optimize without the full query and schema information, but I'll take a stab at what seems like the most obvious to me.
Converting to Rank()
Your query could be a lot more efficient if you use a windowed rank() function. I've also converted it to use a common table expression (aka CTE). This doesn't improve performance, but does make it easier to read.
with cte as (
    select
        *,
        rank() over (
            partition by
                -- insert whatever fields differentiate your rows here;
                -- unlike a group by clause, this doesn't need to be
                -- every field
            order by
                date_created desc
        ) as rk
    from
        tbl_user
    ...
    where
        ...
        and date_created between :date_from and :today
)
select
    *
from
    cte
where
    rk <= 50
Indexing
If date_created is not indexed, it probably should be.
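A one-line sketch, assuming Oracle as in the question (the index name is illustrative):
CREATE INDEX idx_tbl_user_date_created ON tbl_user (date_created);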
Take a look at your autotrace results and figure out which filters have the highest cost. These are probably unindexed, and maybe should be.
If you post your schema, I'd be happy to make better suggestions.
I often have to write queries with a fairly complex, constructed column that I will aggregate by. For example:
SELECT
EXTRACT(week FROM to_timestamp("Date Created"/1000)) AS week
...
I know that you cannot use aliases in the GROUP BY clause (the question "Why doesn't Oracle SQL allow us to use column aliases in GROUP BY clauses?" explains the logic behind this), but is there anything else I can do other than repeating the column calculation, or am I stuck with this:
SELECT COUNT(*), EXTRACT(week FROM to_timestamp("Date Created"/1000)) AS week
FROM mytable
GROUP BY EXTRACT(week FROM to_timestamp("Date Created"/1000))
Often, I break complexity by using sub-queries.
SELECT COUNT(*), week
FROM
(
    SELECT EXTRACT(week FROM to_timestamp("Date Created"/1000)) AS week
    FROM mytable
) sel
GROUP BY week
The divide-and-conquer approach has paid off pretty well so far.
Update
Alternatives for solving this issue:
Computed columns (as @gbn stated in his answer).
Pros:
You can declare the column once and reuse it in most queries
Some RDBMSs allow you to create an index over a computed column (pretty important for performance)
Cons:
Not all RDBMSs provide computed columns
You might end up declaring a column that's used in one very specific query (out of the thousands of queries in your system). Someday that query will change, and the column will just sit there...
CTEs
I think that you can do this:
WITH MT AS (
    SELECT *, EXTRACT(week FROM to_timestamp("Date Created"/1000)) AS week
    FROM mytable
)
SELECT COUNT(*), week
FROM MT
GROUP BY week
Derived tables
SELECT foo FROM (SELECT 1+1 AS foo FROM ...) WHERE foo = ...
Computed columns (not all RDBMS)
ALTER TABLE someTable ADD WeekPart AS WEEK(SomeDate)
How do I show the max time in the first row and the min time in the second row, for Access using VB6?
What about:
SELECT time_value
FROM (SELECT MIN(time_column) AS time_value FROM SomeTable
UNION
SELECT MAX(time_column) AS time_value FROM SomeTable
)
ORDER BY time_value DESC;
That should do the job unless there are no rows in SomeTable (or your DBMS does not support the notation).
Simplifying per suggestion in comments - thanks!
SELECT MIN(time_column) AS time_value FROM SomeTable
UNION
SELECT MAX(time_column) AS time_value FROM SomeTable
ORDER BY time_value DESC;
If you can handle getting the two values from one row, you may improve the performance of the query using:
SELECT MIN(time_column) AS min_time,
MAX(time_column) AS max_time
FROM SomeTable;
A really good optimizer might be able to deal with both halves of the UNION version in one pass over the data (or index), but it is quite easy to imagine an optimizer tackling each half of the UNION separately and processing the data twice. If there is no index on the time column to speed things up, that could involve two table scans, which would be much slower than a single table scan for the two-value, one-row query (if the table is big enough for such things to matter).
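For example, a hedged sketch of such an index (the name is illustrative; SomeTable and time_column as above):
CREATE INDEX idx_sometable_time ON SomeTable (time_column);
With that index in place, a good optimizer can typically answer both MIN and MAX by probing the two ends of the index rather than scanning the table.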