I have a table called COMPUTED_DATA. It contains 3 columns: data, cluster, and last_execution.
There is a job that runs every 2 weeks and inserts multiple rows of data for a cluster along with its last_execution time.
My requirement is to get the data for a cluster at its most recent last_execution time. Currently I have written queries like this in my code:
last_execution = SELECT DISTINCT last_execution FROM COMPUTED_DATA
WHERE cluster=1204 ORDER BY last_execution DESC LIMIT 1
The above query gets me the most recent last_execution.
data = SELECT data FROM COMPUTED_DATA WHERE cluster=1204 AND
last_execution={last_execution}
This query uses that last_execution to get the data.
My question is: can this be combined into just one query? I am running this on my Spark cluster, so each SQL query is very time-expensive.
Hence I want to combine this into one query. Please help.
EDIT: the second query, where I am getting the data, returns multiple rows. Hence I cannot use LIMIT on the second query, because that row count is unknown.
Yes, it can:
SELECT
    data
FROM
    COMPUTED_DATA
WHERE
    cluster = 1204
    AND last_execution = (SELECT DISTINCT last_execution
                          FROM COMPUTED_DATA
                          WHERE cluster = 1204
                          ORDER BY last_execution DESC
                          LIMIT 1)
This isn't the most beautiful way to write it, but it shows the idea of using a subquery in the WHERE clause.
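Since the subquery only needs the single most recent timestamp, an equivalent and arguably cleaner variant uses MAX instead of DISTINCT/ORDER BY/LIMIT (a sketch against the same COMPUTED_DATA table, not tested on your cluster):

-- MAX(last_execution) picks the most recent run for this cluster in one pass
SELECT data
FROM COMPUTED_DATA
WHERE cluster = 1204
  AND last_execution = (SELECT MAX(last_execution)
                        FROM COMPUTED_DATA
                        WHERE cluster = 1204)

Either way it is a single SQL statement, so Spark only plans and runs one query.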
In my SQLite database, in each table, there is a sync_id column. I regularly want to retrieve the maximum sync_id for each table. Here's what I tried first:
SELECT
MAX(answer.sync_id),
MAX(community.sync_id),
MAX(question.sync_id),
MAX(topic.sync_id)
FROM
answer,
community,
question,
topic;
This query took forever; I never actually saw it finish.
Here's what I tried next:
SELECT "answer" AS name, MAX(answer.sync_id) AS max_sync_id FROM answer
UNION SELECT "community" AS name, MAX(community.sync_id) AS max_sync_id FROM community
UNION SELECT "question" AS name, MAX(question.sync_id) AS max_sync_id FROM question
UNION SELECT "topic" AS name, MAX(topic.sync_id) AS max_sync_id FROM topic;
This one is blazingly fast and gives me the results I expected.
I have 2 questions about this:
Why are the 2 queries so different? I'm guessing there's some SQL semantics that I'm not getting, some kind of implicit JOIN...
The 1st query returns the maximums as one row, with columns named after the tables. The 2nd query returns 1 maximum per row, and I had to create a name column to keep the context. Is there a way I could get the result set of the 1st query, with the speed of the 2nd query?
1/ Why are the 2 queries so different?
Because the first one builds a big intermediate table as the Cartesian product of the 4 tables before running the aggregation against it, while the second one fires one query per table and then combines the results into 4 rows. The execution plans of both queries show this in detail.
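In SQLite you can see this with EXPLAIN QUERY PLAN (a quick check to run in the sqlite3 shell; the exact output wording varies by SQLite version):

EXPLAIN QUERY PLAN
SELECT MAX(answer.sync_id), MAX(community.sync_id),
       MAX(question.sync_id), MAX(topic.sync_id)
FROM answer, community, question, topic;

The plan should show a scan of each table nested inside the others, which is the Cartesian product at work.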
2/ Is there a way to get the result set of the 1st query with the speed of the 2nd query?
No. This is because of the nature of your data: it seems your 4 tables are not related in any way, so you can't hit them all with a single (fast) query. The best approach is probably to make 4 queries and combine the results in your application.
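Concretely, the four separate queries would just be (assuming each table carries its own sync_id column, as described in the question):

SELECT MAX(sync_id) FROM answer;
SELECT MAX(sync_id) FROM community;
SELECT MAX(sync_id) FROM question;
SELECT MAX(sync_id) FROM topic;

Each one is a cheap single-table aggregate, and your application can assemble the four values into whatever row shape it needs.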
I am using SQL Server 2012. I have one table that contains 1.7 million records. I am selecting all records using (SELECT * FROM table_name), but it takes 1 hour 2 minutes to fetch them.
What should I do to fetch the records quickly?
All you can do is limit your result, either by using
TOP (100) or by a WHERE clause that targets your desired data, instead of
SELECT * FROM table
This will get you only the relevant, limited data in less time.
Another thing you can do is fetch only the columns that give you your desired results, rather than every column.
This will significantly reduce the fetching time.
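Put together, a trimmed-down query might look like this (a sketch; table_name, col1, and col2 are placeholders for your actual table and columns):

-- fetch only the first 100 matching rows and only the needed columns
SELECT TOP (100) col1, col2
FROM table_name
WHERE col1 = 'some_value';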
I have a query that selects the last 5 ($new) items from my database.
SELECT OvenRunData.dataId AS id, OvenRunData.data AS data
FROM OvenRuns INNER JOIN OvenRunData ON OvenRuns.id = OvenRunData.ovenRunId
WHERE OvenRunData.ovenRunId = (SELECT MAX(id) FROM OvenRuns)
ORDER BY id DESC LIMIT $new
I want to execute this query every 5 seconds with an AJAX request so I can update my table.
I know this query selects the last 5 records, but I want to know whether the query runs through all records and then selects the last 5, or selects only the last 5 without checking all the data.
I'm really worried that it will lag.
You need two indexes to make it fast enough:
create index ix_OvenRuns_id on OvenRuns(id)
create index ix_OvenRunData_ovenRunId on OvenRunData(ovenRunId)
You can even add OvenRunData.dataId and OvenRunData.data to the second one, or create a clustered index; either way, these indexes definitely avoid a full data scan.
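The covering-index variant mentioned above would look something like this (a sketch; the index name is arbitrary):

-- with ovenRunId first, the WHERE filter and both selected columns
-- can be served straight from the index, without touching the table
create index ix_OvenRunData_covering on OvenRunData(ovenRunId, dataId, data)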
That depends on the indexes.
In your case, you should have one on OvenRuns(id).
More here: http://use-the-index-luke.com/sql/partial-results/top-n-queries
The LIMIT is applied after the ORDER BY, and the ORDER BY is applied to the entire result set. So the answer to your question is: yes, it must go through all of the records in the result set determined by your WHERE clause before applying the LIMIT.
When I run my BigQuery query with LIMIT 5, it returns more rows. This happens whenever I use a limit greater than 3; for every limit, it returns more rows than the limit.
This can happen if you are using BigQuery legacy SQL and querying a table with a repeated field, for example as below:
#legacySQL
SELECT request.parameters.*
FROM [yourProject:yourDataset.yourTable]
LIMIT 3
What does it do? It limits the number of output rows to 3, but then it outputs all the repeated values in those three rows, so you see more than 3 :o)
The way to check this is to run the query below and confirm that the real number of output rows is actually 3:
#legacySQL
SELECT COUNT(1) FROM (
SELECT request.parameters.*
FROM [yourProject:yourDataset.yourTable]
LIMIT 3
)
The best way to deal with this is to migrate to BigQuery standard SQL.
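In standard SQL a repeated field is an ARRAY, so LIMIT 3 returns exactly 3 rows, each carrying its whole array (a sketch against the same hypothetical table; note the backtick table reference replaces the legacy bracket syntax):

#standardSQL
SELECT request.parameters
FROM `yourProject.yourDataset.yourTable`
LIMIT 3

If you want one output row per repeated value instead, flatten explicitly with UNNEST(request.parameters).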
I have a big table in Oracle which contains more than 100K records. I want to get all of the records and save each row to a file with JDBC.
In order to make it faster, I want to create 100 threads to read the data from the table concurrently. I will get the total count of the records with a first SQL query, split it into 100 pages, and then fetch each page in its own thread over a new connection.
But I have a problem: there is no column that can be used for ordering. There is no sequence column and no accurate timestamp. I can't use a SQL query without an ORDER BY clause, since there is no guarantee it will return the data in the same order every time (per this question).
So is it possible to solve it?
Finally, I used rowid to order:
select * from mytable order by rowid
It seems to work well.
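With that ordering in place, each thread's page query could look something like this (a sketch assuming Oracle 12c or later for OFFSET/FETCH; :page_offset and :page_size are hypothetical bind variables computed from the total count):

-- fetch one page of the rowid-ordered result for a single worker thread
select *
from mytable
order by rowid
offset :page_offset rows fetch next :page_size rows only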