How to modify my schema or query so it runs effectively? - sql

In an interview I was asked: "You have a table with a huge amount of data, but there is a requirement to view the rows that have been added in the last 15 minutes. How do you do this efficiently, without having to query the whole table, as that takes so long?"
I said that I would create a view containing the latest 1000 records (here I am assuming that fewer than 1000 records were created in the last 15 minutes) and query the view rather than the entire table. The interviewer was okay with this, but said there is a better approach, and I am not able to find it.

You just need to create an index on the created_at column; this avoids scanning the whole table and considerably improves performance.

You can try something like this:
SELECT columns
FROM table
-- keep the indexed column bare so an index on last_seen can be used
WHERE last_seen >= NOW() - INTERVAL 15 MINUTE;
Or:
select *
from my_table
where my_column > timestamp '2020-10-09 00:00:05' - numtodsinterval(15,'MINUTE')
I am not sure which DB you are using.
Both forms go back 15 minutes from the reference point in time.
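For completeness, here is a minimal sketch of the index itself, assuming MySQL and the hypothetical names my_table / created_at from the question:
-- index on the creation timestamp so the range predicate can use an index range scan
CREATE INDEX idx_my_table_created_at ON my_table (created_at);
-- with the index in place, the last-15-minutes query becomes an index range scan
SELECT *
FROM my_table
WHERE created_at >= NOW() - INTERVAL 15 MINUTE;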

Related

Oracle SQL - Subquery works fine, however CREATE TABLE with that subquery appears to hang

I have the following query structure
CREATE TABLE <Table Name> AS
(
SELECT .... FROM ...
)
When I run the SELECT statement on its own, it completes and returns the results within seconds. However, when I run it as part of the CREATE TABLE statement, it takes hours, to the point where I believe it has hung and will never complete.
What is the reason for this, and what could a workaround be?
Oracle Database 12c <12.1.0.2.0>
If you ran that SELECT in some GUI, note that most (if not all) of them return only a few hundred rows, not the whole result set. For example: if your query really returns 20 million rows, the GUI displays the first 50 (or 500, depending on the tool you use) rows, which is kind of confusing - just as it confused you.
If you used the current query as an inline view, e.g.
select count(*)
from
(select ... from ...) --> this is your current query
it would "force" Oracle to fetch all rows, so you'd see how long it actually takes.
Apart from that, see if the SELECT can be optimized, e.g.:
see whether columns used in WHERE clause are indexed
collect statistics for all involved tables (used in the FROM clause)
remove ORDER BY clause (if there's any; it is irrelevant in CTAS operation)
check explain plan
Performance tuning covers far more than what I've suggested; those are just a few things you might want to look at.
Have you tried a direct-load insert, by first creating the table using CTAS with WHERE 1 = 2 and then doing the insert, as sketched below? This will at least tell us whether something is wrong in the data (corrupt data) or whether it is a performance issue.
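A minimal sketch of that two-step approach, assuming Oracle and placeholder names (new_table, source_table):
-- step 1: CTAS with WHERE 1 = 2 copies only the structure, loading no rows
CREATE TABLE new_table AS
SELECT * FROM source_table WHERE 1 = 2;
-- step 2: the APPEND hint requests a direct-path load of the actual data
INSERT /*+ APPEND */ INTO new_table
SELECT * FROM source_table;
COMMIT;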
I had the same problem before, since the new data was very large (7 million rows), and it took me 3 hours to execute the code.
My best suggestion is to create a view instead of a new table, since it takes up less space.
So, the answer to this one:
CREATE TABLE <Table Name> AS
(
SELECT foo
FROM baa
LEFT JOIN
( SELECT foo FROM baa WHERE DATES BETWEEN SYSDATE - 100 AND SYSDATE ) -- low bound first: the last 100 days
WHERE DATES_1 BETWEEN SYSDATE - 100 AND SYSDATE - 10 -- low bound first: 100 to 10 days ago
)
The problem was that the BETWEEN conditions did not cover the same time period, and the subquery was looking at more data than the main query (I guess this was causing a full scan over the tables?).
The query below uses matching BETWEEN time periods, and this returned the results in less than 3 minutes.
CREATE TABLE <Table Name> AS
(
SELECT foo FROM baa
LEFT JOIN ( SELECT foo FROM baa WHERE DATES BETWEEN SYSDATE - 100 AND SYSDATE - 10 )
WHERE DATES_1 BETWEEN SYSDATE - 100 AND SYSDATE - 10
)

Improve the performance of a query on a view which references external tables

I have a view which looks like this:
CREATE VIEW My_View AS
SELECT * FROM My_Table UNION
SELECT * FROM My_External_Table
What I have found is that performance is very slow when ordering the data, which I need to do for pagination. For example, the following query takes almost 2 minutes despite returning only 20 rows:
SELECT * FROM My_View
ORDER BY My_Column
OFFSET 20 ROWS FETCH NEXT 20 ROWS ONLY
In contrast, the following (useless) query takes less than 2 seconds:
SELECT * FROM My_View
ORDER BY GETDATE()
OFFSET 20 ROWS FETCH NEXT 20 ROWS ONLY
I cannot add indexes to the view as it is not SCHEMABOUND, and I cannot make it SCHEMABOUND as it references an external table.
Is there any way I can improve the performance of the query or otherwise get the desired result? All the databases involved are Azure SQL.
If all rows are unique across My_Table and My_External_Table, using UNION ALL rather than UNION would help improve performance, since it avoids the duplicate-elimination step.
Adding an index to the table would also help your query run faster.
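As a rough sketch, assuming the two sources really never share rows, the view definition would change like this:
CREATE VIEW My_View AS
SELECT * FROM My_Table
UNION ALL -- unlike UNION, no implicit duplicate-removal step
SELECT * FROM My_External_Table;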
You can't really get around the ORDER BY, so I don't think there is anything you can do.
I'm a bit surprised the ORDER BY GETDATE() works, because ordering by a constant does not usually work. I imagine it is equivalent to ORDER BY (SELECT NULL) and no ordering takes place.
My recommendation? You probably need to replicate the external table on the local system and have a process to create a new local table. That sounds complicated, but you may be able to do it using a materialized view. How this works with the "external" table depends on what you mean by "external".
Note that you will also want an index on my_column to avoid the sort.
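If the data is replicated into a local table as suggested above, the supporting index might look like this (the local table name is hypothetical):
CREATE INDEX IX_My_Local_Copy_My_Column ON My_Local_Copy (My_Column);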

What is the fastest way to perform a date query in Oracle SQL?

We have a 6B row table that is giving us challenges when retrieving data.
Our query returns values instantly when doing a...
SELECT * WHERE Event_Code = 102225120
That type of instant result is exactly what we need. We now want to filter to receive values for just a particular year - but the moment we add...
AND EXTRACT(YEAR FROM PERFORMED_DATE_TIME) = 2017
...the query takes over 10 minutes to begin returning any values.
Another SO post mentions that indexes don't necessarily help date queries when pulling many rows as opposed to an individual row. There are other approaches like using TRUNC, or BETWEEN, or specifying the datetime in YYYY-MM-DD format for doing comparisons.
Of note, we do not have the option to add indexes to the database as it is a vendor's database.
What is the way to add a date filtering query and enable Oracle to begin streaming the results back in the fastest way possible?
Another SO post mentions that indexes don't necessarily help date queries when pulling many rows as opposed to an individual row
That question is quite different from yours. Firstly, your statement above applies to any data type, not only dates. Also, the word "many" is relative to the number of records in the table. If the optimizer decides that the query will return a large fraction of all the records in your table, then it may decide that a full scan of the table is faster than using the index. In your situation, this translates to: how many records are from 2017, out of all the records in the table? This calculation gives you the cardinality of your query, which then gives you an idea of whether an index will be faster or not.
Now, if you decide that an index will be faster, based on the above, the next step is to know how to build your index. In order for the optimizer to use the index, it must match the condition that you're using. You are not comparing dates in your query; you are only comparing the year part. So an index on the date column will not be used by this query. You need to create an index on the year part, so use the same condition to create the index.
we do not have the option to add indexes to the database as it is a vendor's database.
If you cannot modify the database, there is no way to optimize your query. You need to talk to the vendor and get access to modify the database or ask them to add the index for you.
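If the vendor does add one for you, a function-based index matching the year predicate might look roughly like this (the table name is an assumption):
-- matches the EXTRACT(YEAR FROM ...) condition in the query, so the optimizer can use it
CREATE INDEX idx_event_perf_year ON event_table (EXTRACT(YEAR FROM PERFORMED_DATE_TIME));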
A function applied to the column can also cause slowness, given the number of records involved. I'm not sure whether a function-based index can help you here, but you can try.
Have you tried adding a year column to the table? If not, try adding one and updating it with the code below.
UPDATE table
SET year = EXTRACT(YEAR FROM PERFORMED_DATE_TIME);
This will take time though.
But after this, you can run the query below.
SELECT *
FROM table
WHERE Event_Code = 102225120 AND year = 2017;
Also, consider table partitioning for data this big; a rough sketch follows. For starters, see the link below:
https://oracle-base.com/articles/8i/partitioned-tables-and-indexes
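As a rough illustration only (all names and data types here are assumptions), a year-based range-partitioned layout could look like this:
CREATE TABLE events_partitioned (
    event_code NUMBER,
    performed_date_time DATE
)
PARTITION BY RANGE (performed_date_time) (
    PARTITION p2016 VALUES LESS THAN (DATE '2017-01-01'),
    PARTITION p2017 VALUES LESS THAN (DATE '2018-01-01'),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
-- a query filtering on the date range then touches only the matching partition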
Your question is a bit ambiguous IMHO:
but the moment we add...
AND EXTRACT(YEAR FROM PERFORMED_DATE_TIME) = 2017
...the query takes over 10 minutes to begin returning any values.
Do you mean that
SELECT * WHERE Event_Code = 102225120
is fast, but
SELECT * WHERE Event_Code = 102225120 AND EXTRACT(YEAR FROM PERFORMED_DATE_TIME) = 2017
is slow?
For starters, I'll agree with Mitch Wheat that you should try to use PERFORMED_DATE_TIME between Jan 1, 2017 and Dec 31, 2017 instead of Year(field) = 2017. Even if you had an index on the field, the latter would hardly be able to make use of it, while the first method would benefit enormously.
I'm also hoping you want to be more specific than just 'give me all of 2017' because returning over 1B rows is NEVER going to be fast.
Next, if you can't make changes to the database, would you be able to maintain a 'shadow' in another database? This would require that you create a table with all date-values AND the PK of the original table in another database and query those to find the relevant PK values and then JOIN those back to your original table to find whatever you need. The biggest problem with this would be that you need to keep the shadow in sync with the original table. If you know the original table only changes overnight, you could merge the changes in the morning and query all day. If the application is 'real-time(ish)' then this probably won't work without some clever thinking... And yes, your initial load of 6B values will be rather heavy =)
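A very rough sketch of that 'shadow' idea, with all names invented for illustration:
-- shadow table in a database you control: just the PK and the date, kept in sync separately
CREATE TABLE shadow_events (
    event_id NUMBER PRIMARY KEY,
    performed_date_time DATE
);
CREATE INDEX idx_shadow_events_date ON shadow_events (performed_date_time);
-- find the relevant keys cheaply in the shadow, then join back to the vendor table
SELECT t.*
FROM vendor_table t
JOIN shadow_events s ON s.event_id = t.event_id
WHERE s.performed_date_time >= DATE '2017-01-01'
  AND s.performed_date_time < DATE '2018-01-01';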
Maybe this could be useful, because you avoid functions (a cause of context switching), and if you have an index on your date field, it could be used:
with
dt as
(
select
to_date('01/01/2017', 'DD/MM/YYYY') as d1,
to_date('31/01/2017', 'DD/MM/YYYY') as d2
from dual
),
dates as
(
select
dt.d1 + rownum -1 as d
from dt
connect by dt.d1 + rownum -1 <= dt.d2
)
select *
from your_table, dates
where dates.d = PERFORMED_DATE_TIME
Move the date literals to the right-hand side and leave the column bare:
AND PERFORMED_DATE_TIME >= date '2017-01-01'
AND PERFORMED_DATE_TIME < date '2018-01-01'
But without an (undisclosed) appropriate index on PERFORMED_DATE_TIME, the query is unlikely to be any faster.
One option for creating indexes in third-party databases is to script in the index and then, before any vendor upgrade, run a script to remove any indexes you've added. If the index is important, ask the vendor to add it to their database design.
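In practice that can be as simple as keeping a pair of scripts around (index and table names are placeholders):
-- applied after each vendor install or upgrade
CREATE INDEX idx_perf_date ON vendor_table (PERFORMED_DATE_TIME);
-- run before the next vendor upgrade
DROP INDEX idx_perf_date;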

How do I avoid a full table scan when converting date to timestamp in a POSTGRES query?

I am querying a Postgres database table that I only have read access to and cannot modify in any way. Long story short, I need to run a query daily to pull records spanning several days, and I have been going in each day and manually modifying the timestamp parameters. The table has about 40 million records, and I am running pass-through queries from a SQL Server to the linked server.
There is an index on the c_stime field (a timestamp) that I am trying to take advantage of, but when I apply a function to the field I lose that advantage. Here are some of my results:
select * from p_table where c_stime >= '2013-09-24 00:00:00.0000000'
select * from p_table where c_stime >= current_date::timestamp
select * from p_table where c_stime >= current_date+ interval '0 hour'
The first one runs in 10 seconds, the second in 74 seconds, and the third in 88 seconds. I want something dynamic like the second or third, with performance close to the first. Any help appreciated.
First, check your query plans. There may be surprises there.
Secondly, if the problem is as you say, I am surprised that this wouldn't have worked. However, you can use the pg_catalog.* functions to convert types, and these are immutable (allowing the function to be evaluated before the query is planned), so you could try:
select * from p_table where c_stime >= pg_catalog.timestamp(current_date);
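To follow the first suggestion, comparing the plans of the fast and slow variants is straightforward; the table and column names below are taken from the question:
-- shows whether the index on c_stime is used and how long each variant really takes
EXPLAIN ANALYZE SELECT * FROM p_table WHERE c_stime >= '2013-09-24 00:00:00';
EXPLAIN ANALYZE SELECT * FROM p_table WHERE c_stime >= current_date::timestamp;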

SQL - Optimize date calculation for large table

Can this query below be optimized?
select
max(date), sysdate - max(date)
from
table;
Query execution time ~5.7 seconds
I have another approach
select
date, sysdate - date
from
(select * from table order by date desc)
where
rownum = 1;
Query execution ~7.9 seconds
In this particular case, table has around 17,000,000 entries.
Is there a more optimal way to rewrite this?
Update: Well, I tried the hint a few of you suggested in a development database, although with a smaller subset than the original (approximately 1,000,000 records). Without the index the queries run slower than with the index.
The first query, without index: ~0.56 secs, with index: ~0.2 secs. The second query, without index: ~0.41 secs, with index: ~0.005 secs. (This surprised me; I thought the first query would run faster than the second - maybe it's more suitable for a smaller set of records.)
I suggested this solution to the DBA, and he will change the table structure to accommodate it; then I will test it with the actual data. Thanks.
Is there an index on the date column?
That query is simple enough that there's likely nothing that can be done to optimize it beyond adding an index on the date column. What database is this? And is sysdate another column of the table?
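Assuming Oracle (given SYSDATE) and placeholder names, the index and the resulting query would look like this; with the index in place, MAX(date) can typically be answered with an index min/max scan rather than a full table scan:
CREATE INDEX idx_my_table_date ON my_table (my_date_column);
SELECT MAX(my_date_column), SYSDATE - MAX(my_date_column)
FROM my_table;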