Best way to use group by in Oracle - sql

I am running a GROUP BY on CARD_NO with MAX(EXPIRE_DATE) against a table that contains 11,910,317 records.
What is the best way to get this done?
Below is the query I used, but it takes quite a long time to return results.
SELECT CARD_NO,
MAX(expire_date) EXPIRE_DATE
FROM LCT_CARD_ISSUE11
GROUP BY CARD_NO

Related

SQL Server slow query to get average

I normally work with MySQL databases, and I am currently having trouble with a query against a SQL Server database.
I'm trying to get the average of a column, grouped by day. This takes anywhere from 20 to 30 seconds, even if it's just returning a few hundred rows.
The table, however, contains a couple of million entries. I'm sure this has something to do with indexing, but I just can't figure out the correct solution.
So the query goes like:
select
[unit_id],
avg(weight) AS avg,
max(timestamp) AS dateDay
from
[measurements]
where
timestamp BETWEEN '2017-06-01' AND '2017-10-04'
group by
[unit_id], CAST(timestamp AS DATE)
order by
[unit_id] asc, [dateDay] asc
I have set up a nonclustered index containing the unit_id, weight and timestamp fields.
This is your query:
select unit_id, avg(weight) AS avg, max(timestamp) AS dateDay
from measurements m
where timestamp BETWEEN '2017-06-01' AND '2017-10-04'
group by unit_id, CAST(timestamp AS DATE)
order by unit_id asc, dateDay asc;
Under reasonable assumptions about your data, it is going to have similar performance in either MySQL or SQL Server. Your WHERE is not highly selective. Because of the inequality, SQL Server cannot make use of an index for the GROUP BY.
An index on measurements(timestamp, unit_id, weight) might benefit the query on either database. There might be some fancy ways to get SQL Server to improve the performance. But both it and MySQL will need to take the rows matching the WHERE clause and aggregate them (using a hash-based algorithm in all likelihood in SQL Server and using a filesort in MySQL).
The problem is likely the CAST in the group by. Though you don't say it explicitly, I'm assuming Timestamp is a DateTime value, which is why you CAST to Date in the group by clause. The problem is that the calculated value produced by CAST isn't indexed.
If it's your system, and this query is something done frequently, I'd add a new column of type Date to store just the day, and index that. If you can't, select the rows in the date range you're interested in, with the timestamp cast to Date, into a temp table or CTE, then group by the date.
Or, even try this, just to pull the CAST out of the Group By clause:
select
[unit_id],
avg(weight) AS avg,
dateDay
from (
select [unit_id],
CAST(timestamp as Date) [dateDay],
weight
from [measurements]
where
timestamp BETWEEN '2017-06-01' AND '2017-10-04'
) x
group by
x.[unit_id], x.[dateDay]
order by
x.[unit_id] asc, x.[dateDay] asc
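The precomputed day column suggested in this answer can be sketched as follows. This is only an illustration, not a SQL Server implementation: it uses Python's built-in sqlite3 module as a stand-in, and the column name day, the index name ix_measurements_day, and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (
        unit_id   INTEGER,
        weight    REAL,
        timestamp TEXT,   -- ISO-8601 datetime string
        day       TEXT    -- hypothetical precomputed date column
    );
    -- Hypothetical index that leads with the precomputed day.
    CREATE INDEX ix_measurements_day ON measurements (day, unit_id, weight);
""")

# Made-up sample data, not taken from the question.
rows = [
    (1, 10.0, "2017-06-01 08:00:00"),
    (1, 20.0, "2017-06-01 17:30:00"),
    (1, 30.0, "2017-06-02 09:15:00"),
    (2, 5.0, "2017-06-01 12:00:00"),
]
# Fill the day column at insert time so no cast is needed at query time.
conn.executemany(
    "INSERT INTO measurements (unit_id, weight, timestamp, day) "
    "VALUES (?, ?, ?, date(?))",
    [(u, w, ts, ts) for u, w, ts in rows],
)

# Filter and group on the stored day instead of on an expression.
daily_avg = conn.execute("""
    SELECT unit_id, day, AVG(weight) AS avg_weight
    FROM measurements
    WHERE day BETWEEN '2017-06-01' AND '2017-10-04'
    GROUP BY unit_id, day
    ORDER BY unit_id, day
""").fetchall()
print(daily_avg)
```

Whether this actually helps on SQL Server depends on the execution plan, so it is worth comparing plans before and after adding such a column.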

SQL query to get single row value from an aggregate

I have an Oracle table with two columns, ID and START_DATE. I want to run a query to get the ID of the record with the most recent date. Initially I wrote this:
select id from (select * from mytable order by start_date desc) where rownum = 1
Is there a cleaner and more efficient way of doing this? I often run into this pattern in SQL and end up creating a nested query.
SELECT id FROM mytable WHERE start_date = (SELECT MAX(start_date) FROM mytable)
Still a nested query, but more straightforward and also, in my experience, more standard.
This looks to be a pretty clean and efficient solution to me - I don't think you can get any better than that, of course assuming that you've an index on start_date. If you want all ids for the latest start date then froadie's solution is better.
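The difference between the two approaches when several rows share the latest date can be sketched like this. It is a hedged illustration using Python's built-in sqlite3 module rather than Oracle, so LIMIT 1 stands in for the rownum = 1 filter; the sample rows and the index name ix_mytable_start_date are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, start_date TEXT)")
# Hypothetical index on the date column, as the answer above assumes.
conn.execute("CREATE INDEX ix_mytable_start_date ON mytable (start_date)")
conn.executemany(
    "INSERT INTO mytable (id, start_date) VALUES (?, ?)",
    [(1, "2010-01-15"), (2, "2010-03-01"), (3, "2010-03-01")],
)

# MAX subquery: returns every id that shares the latest start_date.
all_latest = [r[0] for r in conn.execute(
    "SELECT id FROM mytable "
    "WHERE start_date = (SELECT MAX(start_date) FROM mytable) "
    "ORDER BY id"
)]

# Sort and take one: returns exactly one id, even when dates tie.
one_latest = [r[0] for r in conn.execute(
    "SELECT id FROM mytable ORDER BY start_date DESC LIMIT 1"
)]
print(all_latest, one_latest)
```

Which of the tied ids the second query returns is not defined, so the MAX subquery is the safer choice when ties matter.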

Max and Min Time query

How can I show the max time in the first row and the min time in the second row, for Access using VB6?
What about:
SELECT time_value
FROM (SELECT MIN(time_column) AS time_value FROM SomeTable
UNION
SELECT MAX(time_column) AS time_value FROM SomeTable
)
ORDER BY time_value DESC;
That should do the job unless there are no rows in SomeTable (or your DBMS does not support the notation).
Simplifying per suggestion in comments - thanks!
SELECT MIN(time_column) AS time_value FROM SomeTable
UNION
SELECT MAX(time_column) AS time_value FROM SomeTable
ORDER BY time_value DESC;
If you can get two values from one query, you may improve the performance of the query using:
SELECT MIN(time_column) AS min_time,
MAX(time_column) AS max_time
FROM SomeTable;
A really good optimizer might be able to deal with both halves of the UNION version in one pass over the data (or index), but it is quite easy to imagine an optimizer tackling each half of the UNION separately and processing the data twice. If there is no index on the time column to speed things up, that could involve two table scans, which would be much slower than a single table scan for the two-value, one-row query (if the table is big enough for such things to matter).
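The two shapes of result discussed above can be compared in a small sketch. It uses Python's built-in sqlite3 module instead of Access, and the sample times are made up; the point is only that the one-row query carries the same information and can be reshaped into two rows in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (time_column TEXT)")
# Made-up sample values; zero-padded HH:MM:SS strings sort chronologically.
conn.executemany(
    "INSERT INTO SomeTable (time_column) VALUES (?)",
    [("09:30:00",), ("14:45:00",), ("11:05:00",)],
)

# Two rows, max first: the UNION form.
two_rows = [r[0] for r in conn.execute("""
    SELECT MIN(time_column) AS time_value FROM SomeTable
    UNION
    SELECT MAX(time_column) AS time_value FROM SomeTable
    ORDER BY time_value DESC
""")]

# One row, two columns: a single aggregate query.
min_time, max_time = conn.execute(
    "SELECT MIN(time_column), MAX(time_column) FROM SomeTable"
).fetchone()

# Reshaping the one-row result in application code gives the same two rows.
reshaped = [max_time, min_time]
print(two_rows, reshaped)
```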

Query to get the duration and details from a table

I have a scenario and not quite sure how to query it. As a sample, I have following table structure and want to get the history of the action for bus:
ID-----TIME---------BUSID----OPID----MOVING----STOPPED----PARKED----COUNT
1------10:10:10-----101------1101-----1---------0----------0---------15
2------10:10:11-----102------1102-----0---------1----------0---------5
3------10:11:10-----101------1101-----1---------0----------0---------15
4------10:12:10-----101------1101-----0---------1----------0---------15
5------10:13:10-----101------1101-----1---------0----------0---------19
6------10:14:10-----101------1101-----1---------0----------0---------19
7------10:15:10-----101------1101-----0---------1----------0---------19
8------10:16:10-----101------1101-----0---------0----------1---------0
9------10:17:10-----101------1101-----0---------0----------1---------0
I want to write a query to get the status of a bus like:
BUSID----OPID----STATUS-----TIME---------DURATION---COUNT
101------1101----MOVING-----10:10:10-----2-----------15
101------1101----STOPPED----10:12:10-----1-----------15
101------1101----MOVING-----10:13:10-----2-----------19
101------1101----STOPPED----10:15:10-----1-----------19
101------1101----PARKED-----10:16:10-----2-----------0
I am using SQL Server 2008.
Thanks for your help.
You can use Common Table Expressions to calculate the duration between the different rows.
WITH cte_log AS
(
-- Number the rows of one bus from newest (1) to oldest.
SELECT
ROW_NUMBER() OVER (ORDER BY [time] DESC) AS row_id,
[time], busid, opid, moving, stopped, parked, [count]
FROM
log_table
WHERE
busid = 101
)
SELECT
current_rows.busid,
current_rows.opid,
current_rows.[time],
-- Seconds elapsed since the previous (older) row; NULL for the oldest row.
DATEDIFF(second, previous_rows.[time], current_rows.[time]) AS duration,
current_rows.[count]
FROM
cte_log AS current_rows
LEFT OUTER JOIN
-- Join the CTE to itself: row_id + 1 is the next older row.
cte_log AS previous_rows ON ((current_rows.row_id + 1) = previous_rows.row_id)
ORDER BY
current_rows.[time] DESC;
The WITH clause defines a temporary result set that exists only within the execution scope of this query. We use it to pair each row with the previous one and to calculate the time difference between the current and the previous record.
This example was not tested, and it may not work perfectly, but I hope it gets you going in the correct direction. Feel free to leave feedback.
You may also want to check the following external links on how to use Common Table Expressions:
SQL Select Next Row and SQL Select Previous Row with Current Row using T-SQL CTE
Calculate Difference between current and previous rows... CTE and Row_Number() rocks!
4 Guys From Rolla: Common Table Expressions (CTE) in SQL Server 2005
MSDN: Using Common Table Expressions
Personally, I would denormalize the data so you have start_time and end_time in the same row. This will make the query much more efficient.
I don't have access to SQL Server at the moment, so there may be syntax errors in the following:
SELECT
BUSID,
OPID,
-- Turn the three flag columns into a single status label.
CASE WHEN MOVING = 1 THEN 'MOVING'
WHEN STOPPED = 1 THEN 'STOPPED'
ELSE 'PARKED' END AS STATUS,
[TIME],
[COUNT]
FROM BUS_DATA_TABLE
ORDER BY BUSID, [TIME]
You'll note that this does not include duration. Until you order your data, you don't know which is the previous entry. Once the data is ordered, you can calculate the duration as the difference between the times in consecutive records. You could do this by SELECTing into a new table and then running a second query.
Ordering by BUSID first, then by TIME, should give you the report for all buses.
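Once the rows are ordered, collapsing consecutive rows with the same status into one line each can be sketched in application code. This is a plain Python illustration rather than T-SQL, and it is not any answerer's method. It assumes that DURATION means the number of consecutive log rows in a run, which matches the sample output in the question; the helper names status_of and status_runs are hypothetical.

```python
from itertools import groupby

# (time, busid, opid, moving, stopped, parked, count), copied from the question.
log = [
    ("10:10:10", 101, 1101, 1, 0, 0, 15),
    ("10:10:11", 102, 1102, 0, 1, 0, 5),
    ("10:11:10", 101, 1101, 1, 0, 0, 15),
    ("10:12:10", 101, 1101, 0, 1, 0, 15),
    ("10:13:10", 101, 1101, 1, 0, 0, 19),
    ("10:14:10", 101, 1101, 1, 0, 0, 19),
    ("10:15:10", 101, 1101, 0, 1, 0, 19),
    ("10:16:10", 101, 1101, 0, 0, 1, 0),
    ("10:17:10", 101, 1101, 0, 0, 1, 0),
]


def status_of(row):
    """Map the three flag columns to a single status label."""
    _, _, _, moving, stopped, _, _ = row
    if moving == 1:
        return "MOVING"
    if stopped == 1:
        return "STOPPED"
    return "PARKED"


def status_runs(rows, busid):
    """Collapse consecutive rows with the same status into one run each."""
    # Zero-padded HH:MM:SS strings sort chronologically.
    bus_rows = sorted((r for r in rows if r[1] == busid), key=lambda r: r[0])
    runs = []
    for status, group in groupby(bus_rows, key=status_of):
        group = list(group)
        first = group[0]
        # Assumption: duration is the number of consecutive rows in the run.
        runs.append((first[1], first[2], status, first[0], len(group), first[6]))
    return runs


for run in status_runs(log, 101):
    print(run)
```

If duration should instead be the elapsed time until the next status change, the run boundaries stay the same and only the duration expression changes.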
Making certain assumptions about column type, etc:
SELECT
BUSID,
OPID,
STATUS,
TIME,
DURATION,
COUNT
FROM
TABLENAME
WHERE
BUSID = 101
ORDER BY
TIME
;

Aggregate functions in WHERE clause in SQLite

Simply put, I have a table with, among other things, a column for timestamps. I want to get the row with the most recent (i.e. greatest value) timestamp. Currently I'm doing this:
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1
But I'd much rather do something like this:
SELECT * FROM table WHERE timestamp=max(timestamp)
However, SQLite rejects this query:
SQL error: misuse of aggregate function max()
The documentation confirms this behavior (bottom of page):
Aggregate functions may only be used in a SELECT statement.
My question is: is it possible to write a query to get the row with the greatest timestamp without ordering the select and limiting the number of returned rows to 1? This seems like it should be possible, but I guess my SQL-fu isn't up to snuff.
SELECT * from foo where timestamp = (select max(timestamp) from foo)
or, if SQLite insists on treating subselects as sets,
SELECT * from foo where timestamp in (select max(timestamp) from foo)
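The rejected and accepted forms can be checked directly with Python's built-in sqlite3 module. This is a minimal sketch: the table name foo follows the answer above, while the columns and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, timestamp INTEGER)")
# Made-up sample rows; id 2 has the greatest timestamp.
conn.executemany(
    "INSERT INTO foo (id, timestamp) VALUES (?, ?)",
    [(1, 100), (2, 300), (3, 200)],
)

# The aggregate used directly in WHERE is rejected by SQLite.
try:
    conn.execute("SELECT * FROM foo WHERE timestamp = max(timestamp)")
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("rejected:", exc)

# Moving the aggregate into a scalar subquery is accepted.
latest = conn.execute(
    "SELECT * FROM foo WHERE timestamp = (SELECT max(timestamp) FROM foo)"
).fetchall()
print(latest)
```

If several rows share the greatest timestamp, the subquery form returns all of them, unlike ORDER BY ... LIMIT 1.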
There are many ways to skin a cat.
If the table has an auto-incrementing identity column, fetching the last record by ID will be faster because that column is already indexed (this assumes rows are inserted in timestamp order), unless of course you are willing to put an index on the timestamp column.
SELECT * FROM TABLE ORDER BY ID DESC LIMIT 1
I think I've answered this question 5 times in the past week now, but I'm too tired to find a link to one of those right now, so here it is again...
SELECT
*
FROM
table T1
LEFT OUTER JOIN table T2 ON
T2.timestamp > T1.timestamp
WHERE
T2.timestamp IS NULL
You're basically looking for the row where no other row matches that is later than it.
NOTE: As pointed out in the comments, this method will not perform as well in this kind of situation. It will usually work better (for SQL Server at least) in situations where you want the last row for each customer (as an example).
You can simply do:
SELECT *, max(timestamp) FROM table
Edit:
An aggregate function can't be used like this, so it gives an error. I guess what SquareCog suggested is the best thing to do:
SELECT * FROM table WHERE timestamp = (select max(timestamp) from table)