I have a fairly large table in which one of the columns is a date column. The query I execute is as follows.
select max(date) from tbl where date < to_date('10/01/2010','MM/DD/YYYY')
That is, I want to find the cell value closest to, and less than, a particular date value. This takes considerable time because of the MAX over the large table. Is there a faster way to do this? Maybe using LAST_VALUE?
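Put an index on the date column and the query should be plenty fast: with a B-tree index, the optimizer can usually find the largest date below the cutoff by reading one end of an index range instead of scanning the whole table.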
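A minimal sketch of the effect, using Python's built-in sqlite3 module as a stand-in for Oracle (the table `tbl`, column `d`, index name `tbl_d_ix`, and sample dates are all hypothetical). With an index on the date column, SQLite's EXPLAIN QUERY PLAN reports a search through the index for the `MAX(d) ... WHERE d < ?` query. Oracle's plan output looks different, so on the real table check it with EXPLAIN PLAN.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical stand-in for the large table: one row every third day.
# Dates are stored as ISO-8601 text, which sorts in chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (d TEXT NOT NULL)")
start = date(2010, 1, 1)
conn.executemany(
    "INSERT INTO tbl (d) VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(0, 400, 3)],
)

# Index on the date column, so MAX(d) below the cutoff can be found by
# searching the sorted index instead of reading every row.
conn.execute("CREATE INDEX tbl_d_ix ON tbl (d)")

query = "SELECT MAX(d) FROM tbl WHERE d < ?"
cutoff = "2010-10-01"
latest_before = conn.execute(query, (cutoff,)).fetchone()[0]

# Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
plan = " ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (cutoff,))
)

print(latest_before)  # 2010-09-28
print(plan)
```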
1) Add an index to the date column. Simply put, an index is a separate structure that (for the default B-tree kind) keeps the column's values in sorted order, so the database engine can find rows by value without reading the whole table; it speeds up most queries that filter or sort on that column. More information: http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm
2) Consider adding a second condition to the query. You have where date < to_date('10/01/2010','MM/DD/YYYY') now; why not change it to:
where date < to_date('10/01/2010','MM/DD/YYYY') and date >= to_date('09/30/2010', 'MM/DD/YYYY')
since this will reduce the number of scanned rows. This only works if you know a matching row exists in that window; if none does, MAX returns NULL and you have to widen the range.
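The trade-off of narrowing the range can be sketched in SQLite (hypothetical table `tbl`, column `d`, and data): the bounded query only returns a value when some row actually falls inside the window. Otherwise MAX returns NULL (None in Python), and a fallback to the unbounded query is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (d TEXT NOT NULL)")
conn.execute("CREATE INDEX tbl_d_ix ON tbl (d)")
# Hypothetical data: nothing on 2010-09-30, the day before the cutoff.
conn.executemany("INSERT INTO tbl (d) VALUES (?)", [("2010-09-25",), ("2010-09-28",)])


def latest_before(cutoff, window_start=None):
    """MAX(d) strictly below cutoff, optionally only among d >= window_start."""
    if window_start is None:
        sql, params = "SELECT MAX(d) FROM tbl WHERE d < ?", (cutoff,)
    else:
        sql = "SELECT MAX(d) FROM tbl WHERE d < ? AND d >= ?"
        params = (cutoff, window_start)
    return conn.execute(sql, params).fetchone()[0]


narrowed = latest_before("2010-10-01", window_start="2010-09-30")  # None
# Fall back to the unbounded query when the narrow window is empty.
result = narrowed if narrowed is not None else latest_before("2010-10-01")
print(result)  # 2010-09-28
```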
Try
select date from (
select date from tbl where date < to_date('10/01/2010','MM/DD/YYYY') order by date desc
) where rownum = 1
I have a table with some columns and a date column (which I used as the partitioning column).
For example:
Amount | Date
4      | 2020-4-1
3      | 2020-4-2
5      | 2020-4-4
I want to get the latest Amount based on the Date.
I thought about doing a LIMIT 1 with ORDER BY, but is that optimized by BigQuery, or will it scan my entire table?
I want to avoid costs as much as possible. I thought about querying for today's date and, if nothing is found, searching yesterday's, but I don't know how to do that in a single query.
Below is for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(amount ORDER BY `date` DESC LIMIT 1)[SAFE_OFFSET(0)]
FROM `project.dataset.table`
WHERE `date` >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
Note: above assumes your date field is of DATE data type.
If your date field is the partitioning column, you can use it in the WHERE clause to limit which partitions your query reads.
In your case, you could do something like:
SELECT Amount
FROM <your-table>
WHERE Date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
ORDER BY Date DESC
LIMIT 1
This query basically will:
Filter only today's and yesterday's partitions
Order the rows by your Date field, from the most recent to the oldest
Select the first element of the ordered list
If the table has a row with today's date, the query will return the data for today. If it doesn't, the query will return the data for yesterday.
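The filter-then-sort logic can be tried locally with a small SQLite sketch (hypothetical table `t`; the question's sample rows with dates zero-padded to ISO format; an explicit `today` argument in place of CURRENT_DATE() so the result is reproducible). SQLite has no partitions, so this only shows which row comes back, not how many bytes BigQuery would scan.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (Amount INTEGER, "Date" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(4, "2020-04-01"), (3, "2020-04-02"), (5, "2020-04-04")],
)


def latest_amount(today):
    """Amount of the most recent row dated yesterday or today, else None."""
    yesterday = today - timedelta(days=1)
    row = conn.execute(
        'SELECT Amount FROM t WHERE "Date" BETWEEN ? AND ? '
        'ORDER BY "Date" DESC LIMIT 1',
        (yesterday.isoformat(), today.isoformat()),
    ).fetchone()
    return None if row is None else row[0]


print(latest_amount(date(2020, 4, 4)))  # 5: today's row exists
print(latest_amount(date(2020, 4, 3)))  # 3: falls back to yesterday
print(latest_amount(date(2020, 4, 6)))  # None: nothing on either day
```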
Finally, here is a reference on querying partitioned tables.
I hope it helps.
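The LIMIT clause stops the query when it has the requested number of rows. (In BigQuery, however, LIMIT generally does not reduce the amount of data read or billed; filtering on the partition column does.)
I think the query should be something like this (I'm not sure what "today()-1" returns, so I compute yesterday from current_date()):
SELECT Amount
FROM <table> AS t
WHERE date(t.Date) IN (current_date(), current_date() - INTERVAL 1 DAY)
ORDER BY t.Date DESC
LIMIT 1;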
Edit: Sorry, my answer is for MariaDB. I now see you asked about Google BigQuery, which I didn't know, but it looks like SQL, so I hope it has functions like the ones I used.
I have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new IDs from today compared with yesterday (the IDs from today that were not present yesterday).
This needs to be done with just one query, with maximum efficiency, because the table will have 4-5 million records.
As a Java developer I am able to do this with two queries, but doing it with just one is beyond my knowledge, so any help would be much appreciated.
EDIT: the date format is dd/mm/yyyy, and each ID may appear 0 or 1 times per day.
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you have more than one such row per ID on the same day? In that case the time of day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
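A quick way to sanity-check the time-tolerant version is to run an SQLite equivalent on a few hypothetical rows. The table follows the question's layout with the column renamed to `dt`, timestamps are stored as ISO-8601 text, and the day boundaries are passed in explicitly instead of being derived from `sysdate`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (cod INTEGER PRIMARY KEY, id TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO your_table (id, dt) VALUES (?, ?)",
    [
        ("A", "2024-05-01 09:15:00"),  # yesterday
        ("B", "2024-05-01 17:40:00"),  # yesterday only
        ("A", "2024-05-02 08:00:00"),  # today, but also seen yesterday
        ("C", "2024-05-02 10:30:00"),  # today only
        ("D", "2024-05-02 23:59:59"),  # today only
    ],
)


def new_ids(yesterday, today, tomorrow):
    """IDs whose earliest row in [yesterday, tomorrow) falls on today (one pass)."""
    rows = conn.execute(
        "SELECT id FROM your_table "
        "WHERE dt >= ? AND dt < ? "
        "GROUP BY id HAVING MIN(dt) >= ? ORDER BY id",
        (yesterday, tomorrow, today),
    )
    return [r[0] for r in rows]


print(new_ids("2024-05-01", "2024-05-02", "2024-05-03"))  # ['C', 'D']
```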
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
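The NOT EXISTS pattern can be checked the same way in SQLite (hypothetical table `t` with the column named `dt`, timestamps as ISO-8601 text, day boundaries passed as named parameters). Because `t.*` returns one row per matching record, this sketch selects `DISTINCT t.id` to list each new ID once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cod INTEGER PRIMARY KEY, id TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO t (id, dt) VALUES (?, ?)",
    [
        ("A", "2024-05-01 09:15:00"),
        ("A", "2024-05-02 08:00:00"),
        ("C", "2024-05-02 10:30:00"),
        ("C", "2024-05-02 14:05:00"),  # same ID twice today
        ("D", "2024-05-02 23:59:59"),
    ],
)

# Today's rows for which no row with the same id exists yesterday.
NEW_TODAY = """
    SELECT DISTINCT t.id
    FROM t
    WHERE t.dt >= :today AND t.dt < :tomorrow
      AND NOT EXISTS (SELECT 1
                      FROM t t2
                      WHERE t2.id = t.id
                        AND t2.dt >= :yesterday AND t2.dt < :today)
    ORDER BY t.id
"""
bounds = {"yesterday": "2024-05-01", "today": "2024-05-02", "tomorrow": "2024-05-03"}
new_ids = [r[0] for r in conn.execute(NEW_TODAY, bounds)]
print(new_ids)  # ['C', 'D']
```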
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc("DATE") = trunc(sysdate)
minus
select id from mytable where trunc("DATE") = trunc(sysdate) - 1;
I suggest the following function-based index. Without it, the query would have to do a full table scan, which would probably be quite slow.
create index idx on mytable (trunc("DATE"), id);
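SQLite has no MINUS, but EXCEPT does the same job, and an index on the expression `date(dt)` plays the role of the function-based index on `trunc(...)`. A small sketch along those lines (table, column, and index names and the data are hypothetical; `dt` holds ISO-8601 timestamps; indexing a date function requires a current SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("A", "2024-05-01 09:15:00"),
        ("A", "2024-05-02 08:00:00"),
        ("C", "2024-05-02 10:30:00"),
        ("D", "2024-05-02 23:59:59"),
    ],
)
# Expression index: date(dt) drops the time part, like trunc() in Oracle.
conn.execute("CREATE INDEX mytable_day_ix ON mytable (date(dt), id)")

# EXCEPT is SQLite's counterpart of Oracle's MINUS (both return distinct rows).
query = (
    "SELECT id FROM mytable WHERE date(dt) = :today "
    "EXCEPT "
    "SELECT id FROM mytable WHERE date(dt) = :yesterday"
)
params = {"today": "2024-05-02", "yesterday": "2024-05-01"}
new_ids = sorted(r[0] for r in conn.execute(query, params))

# The WHERE expressions match the indexed expression exactly, so the
# planner can search the index instead of scanning the table.
plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query, params))
print(new_ids)  # ['C', 'D']
```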
I have a normal SQLite database table called table1 with 7 columns and, of course, a rowid. The first column is a custom_id number, the second is a date in YYYY-MM-DD format, and the other 5 are real-number data columns. There are about 10M rows in the table, and the custom_id and date columns have indexes.
What I want to do is to speed up the following query:
SELECT date,max(data1) AS maximum
FROM table1
WHERE custom_id = '1123' AND data1 <> 'NaN'
GROUP BY strftime('%Y-%m', date)
I want to find the maximum valid (non-NaN) data1 value for custom_id 1123 for each year-month combination. The code above actually works fine, but the first run of the query takes 10 seconds, while the second run takes under 1 second, which is OK for me. I run the query on my home PC's Apache server with PHP. I think Apache does some caching, which explains the difference.
But the question is: how can I speed up the first run? I have many other custom_ids to query, and not all of them can be cached! Do I need more indexes? A different kind of query?
We are going to create an index that will support the following operations:
Retrieve the records of a specific customer
aggregate by month
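In older SQLite versions, creating the following index is not possible, because strftime is treated as a non-deterministic function (newer versions accept date and time functions in index expressions as long as they don't use 'now' or the 'localtime' or 'utc' modifiers):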
create index table1_ix on table1 (custom_id,strftime('%Y-%m', date));
non-deterministic functions prohibited in index expressions
So instead of strftime('%Y-%m', date) we are going to use substr(date,1,7)
create index table1_ix on table1 (custom_id,substr(date,1,7));
The query should be changed accordingly
select substr(date,1,7), max(data1) as maximum
from table1
where custom_id = '1123'
and data1 <> 'NaN'
group by substr(date,1,7)
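A small SQLite sketch of the substr() index together with the rewritten query (hypothetical rows; 'NaN' is stored as a text marker, as the question's `data1 <> 'NaN'` filter implies). It also checks with EXPLAIN QUERY PLAN that the lookup goes through `table1_ix`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (custom_id INTEGER, date TEXT, data1 REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1123, "2020-01-05", 1.5),
        (1123, "2020-01-20", 3.0),
        (1123, "2020-01-25", "NaN"),  # text marker, excluded by the WHERE filter
        (1123, "2020-02-03", 2.0),
        (999, "2020-01-10", 9.0),     # different customer, must be ignored
    ],
)
conn.execute("CREATE INDEX table1_ix ON table1 (custom_id, substr(date, 1, 7))")

query = """
    SELECT substr(date, 1, 7) AS month, MAX(data1) AS maximum
    FROM table1
    WHERE custom_id = ? AND data1 <> 'NaN'
    GROUP BY substr(date, 1, 7)
    ORDER BY month
"""
monthly_max = conn.execute(query, (1123,)).fetchall()
plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query, (1123,)))
print(monthly_max)  # [('2020-01', 3.0), ('2020-02', 2.0)]
```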
I am guessing this is what you intend:
SELECT strftime('%Y-%m', date), max(data1) AS maximum
FROM table1
WHERE custom_id = 1123 AND data1 <> 'NaN'
GROUP BY strftime('%Y-%m', date)
Start with an index on table1(custom_id, date).
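The plain composite index can be tried the same way. In this SQLite sketch (hypothetical rows and index name), the plan should show a search on `table1_custom_date_ix` for the custom_id lookup, while the strftime() grouping is computed from the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (custom_id INTEGER, date TEXT, data1 REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1123, "2020-01-05", 1.5),
        (1123, "2020-01-20", 3.0),
        (1123, "2020-02-03", 2.0),
        (999, "2020-01-10", 9.0),  # different customer, must be ignored
    ],
)
conn.execute("CREATE INDEX table1_custom_date_ix ON table1 (custom_id, date)")

query = """
    SELECT strftime('%Y-%m', date) AS month, MAX(data1) AS maximum
    FROM table1
    WHERE custom_id = ? AND data1 <> 'NaN'
    GROUP BY strftime('%Y-%m', date)
    ORDER BY month
"""
monthly_max = conn.execute(query, (1123,)).fetchall()
plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query, (1123,)))
print(monthly_max)  # [('2020-01', 3.0), ('2020-02', 2.0)]
```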
SELECT FILE_SUB_RET_DATE_TIME
FROM
(SELECT Y.FILE_SUB_RET_DATE_TIME,
ROW_NUMBER() OVER (partition by Y.WR_FILE_TRANS_INFO_ID order by Y.FILE_SUB_RET_DATE_TIME DESC) rowByID
FROM DPDBA.WORK_REQUEST_FILE_TRANS_AUDIT Y
WHERE Y.FILE_EVENT_TYPE = 'SUBMISSION'
AND Y.FILE_SUBMT_RETRL_STATUS = 'LEVEL1 POSTED'
AND Y.FILE_SUB_RET_DATE_TIME BETWEEN '11-DEC-2015' AND '03-FEB-2017')
WHERE rowByID = 1;
I have a performance issue, and we need to add an index on this date column. I'm looking for help on whether it should be a plain index or something more than that.
You should not compare DATE values with strings, because the implicit conversion depends on the current session's NLS settings. Use DATE literals or TO_DATE() (respectively, TIMESTAMP literals and TO_TIMESTAMP()).
Whether Oracle will use an index on the FILE_SUB_RET_DATE_TIME column depends on your data; please post the execution plan.
I don't think a subquery is required in your case; this query should return the same result:
SELECT Max(FILE_SUB_RET_DATE_TIME)
FROM DPDBA.WORK_REQUEST_FILE_TRANS_AUDIT Y
WHERE Y.FILE_EVENT_TYPE = 'SUBMISSION'
AND Y.FILE_SUBMT_RETRL_STATUS = 'LEVEL1 POSTED'
AND Y.FILE_SUB_RET_DATE_TIME BETWEEN DATE '2015-12-11' AND DATE '2017-02-03'
GROUP BY WR_FILE_TRANS_INFO_ID;
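The claimed equivalence can be checked on a toy table in SQLite (hypothetical table `audit` with shortened column names and made-up rows; ROW_NUMBER() needs SQLite 3.25 or later). Both forms return the latest qualifying timestamp per ID; the ROW_NUMBER() form only becomes necessary when you also need other columns from that latest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (wr_id INTEGER, event_type TEXT, status TEXT, ts TEXT)"
)
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?)",
    [
        (1, "SUBMISSION", "LEVEL1 POSTED", "2016-03-01 10:00:00"),
        (1, "SUBMISSION", "LEVEL1 POSTED", "2016-07-15 09:30:00"),
        (2, "SUBMISSION", "LEVEL1 POSTED", "2016-01-02 12:00:00"),
        (2, "RETRIEVAL", "LEVEL1 POSTED", "2016-12-31 23:00:00"),  # other event type
        (3, "SUBMISSION", "REJECTED", "2016-05-05 08:00:00"),      # other status
    ],
)

FILTER = (
    "event_type = 'SUBMISSION' AND status = 'LEVEL1 POSTED' "
    "AND ts BETWEEN '2015-12-11' AND '2017-02-03'"
)

# Latest qualifying timestamp per wr_id via ROW_NUMBER().
row_number_version = sorted(
    r[0]
    for r in conn.execute(
        "SELECT ts FROM ("
        "  SELECT ts, ROW_NUMBER() OVER ("
        "    PARTITION BY wr_id ORDER BY ts DESC) AS rn"
        f"  FROM audit WHERE {FILTER}"
        ") WHERE rn = 1"
    )
)

# Same values via GROUP BY + MAX.
max_version = sorted(
    r[0]
    for r in conn.execute(f"SELECT MAX(ts) FROM audit WHERE {FILTER} GROUP BY wr_id")
)

print(row_number_version == max_version, max_version)
```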
I have a table (let's call it AAA) containing 3 columns: ID, DateFrom, DateTo.
I want to write a query that returns all records whose DateFrom-DateTo period overlaps a specific year (e.g. 2016) by at least one day.
I am using SQL Server 2005
Thank you
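Another way is this (the unseparated 'YYYYMMDD' format is interpreted the same way under every language and DATEFORMAT setting, and the half-open upper bound also catches DateFrom values later in the day on 31 December):
SELECT <columns list>
FROM AAA
WHERE DateFrom < '20170101' AND DateTo >= '20160101'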
If you have an index on DateFrom and DateTo, this query allows SQL Server to use that index, unlike the query in Max xaM's answer.
On a small table you will probably see no difference, but on a large one that query can be much slower, because SQL Server can't seek on an index when the column in the WHERE clause is wrapped in a function.
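The overlap condition can be checked with a tiny SQLite sketch (hypothetical rows; dates stored as ISO-8601 text rather than SQL Server datetime). Two periods overlap when each starts before the other ends, so a row qualifies if it starts before 1 January 2017 and ends on or after 1 January 2016. For comparison, testing only whether either endpoint falls in 2016 misses a period that spans the whole year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AAA (ID INTEGER PRIMARY KEY, DateFrom TEXT, DateTo TEXT)")
conn.executemany(
    "INSERT INTO AAA VALUES (?, ?, ?)",
    [
        (1, "2015-11-01", "2016-01-01"),           # touches 2016 on its first day
        (2, "2016-03-01", "2016-04-01"),           # inside 2016
        (3, "2015-06-01", "2017-06-01"),           # spans all of 2016
        (4, "2016-12-31 18:00:00", "2017-02-01"),  # starts on the evening of Dec 31
        (5, "2014-01-01", "2015-12-31"),           # entirely before 2016
        (6, "2017-01-01", "2017-03-01"),           # entirely after 2016
    ],
)

# Overlap test: the period starts before the year ends and ends after it starts.
overlap = sorted(
    r[0]
    for r in conn.execute(
        "SELECT ID FROM AAA WHERE DateFrom < ? AND DateTo >= ?",
        ("2017-01-01", "2016-01-01"),
    )
)

# Endpoint-only test: misses periods that start before and end after 2016.
endpoint_only = sorted(
    r[0]
    for r in conn.execute(
        "SELECT ID FROM AAA "
        "WHERE substr(DateFrom, 1, 4) = '2016' OR substr(DateTo, 1, 4) = '2016'"
    )
)

print(overlap)        # [1, 2, 3, 4]
print(endpoint_only)  # [1, 2, 4]
```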
Try this:
SELECT * FROM AAA
WHERE DATEPART(YEAR,DateFrom) <= 2016 AND DATEPART(YEAR,DateTo) >= 2016
Well you can use the following query
select * from Table1
WHERE DateDiff(day,DateFrom,DateTo)>0
AND YEAR(DateFrom) = YEAR(DateTo)
Enjoy :D !