I am running some queries with Oracle SQL Developer, and want to export the output to an csv file. But the export takes way too long, it seems to be re-running the whole query again. Here are my steps, please let me know if I am doing anything wrong here.
Run the query with 'Run Statement'
Results returned after 10 minutes, results shown in 'query result' underneath.
Right click the results, click 'export' and select 'csv' in export wizard.
Click Next, and Next to save the results.
Takes 10-30 minutes to output 10,000 rows data.
I understand there is a difference between showing results in grid, vs full results. But something seems to be wrong here and it is wasting too much of my time exporting data.
Thanks too all your help.
It IS running the whole query again.
To avoid that, fetch all the results to the grid, and then EXPORT them.
Ctrl+End will do that.
I talk about this here
https://www.thatjeffsmith.com/archive/2012/03/how-to-export-sql-developer-query-results-without-re-running-the-query/
But WHY does SQL Developer force the 2nd query execution?
Good question.
If you do force all of the rows back into the grid, you are going to be consuming a decent amount of memory. Some query result sets are larger than others, and not all of us have 64 bit monsters to run our tools on. So, in order to conserve machine resources and memory, we just run the query again and write the data directly to the destination and bypass the data grid.
Related
We are performance tuning, both in terms of cost and speed, some of our queries and the results we get are a little bit weird. First, we had one query that did an overwrite on an existing table, we stopped that one after 4 hours. Running the same query to an entirely new table and it only took 5 minutes. I wonder if the 5 minute query maybe used a cached result from the first run, is that possible to check? Is it possible to force BigQuery to not use cache?
If you run query in UI - expand Options and make sure Use Cached Result properly set
Also, in UI, you can check Job Details to see if cached result was used
If you run your query programmatically - you should use respective attributes - configuration.query.useQueryCache and statistics.query.cacheHit
I am beginner with Oracle DB.
I want to know execution time for a query. This query returns around 20,000 records.
When I see the SQL Developer, it shows only 50 rows, maximum tweakable to 500. And using F5 upto 5000.
I would have done by making changes in the application, but application redeployment is not possible as it is running on production.
So, I am limited to using only SQL Developer. I am not sure how to get the seconds spent for execution of the query ?
Any ideas on this will help me.
Thank you.
Regards,
JE
If you scroll down past the 50 rows initially returned, it fetches more. When I want all of them, I just click on the first of the 50, then press CtrlEnd to scroll all the way to the bottom.
This will update the display of the time that was used (just above the results it will say something like "All Rows Fetched: 20000 in 3.606 seconds") giving you an accurate time for the complete query.
If your statement is part of an already deployed application and if you have rights to access the view V$SQLAREA, you could check for number of EXECUTIONS and CPU_TIME. You can search for the statement using SQL_TEXT:
SELECT CPU_TIME, EXECUTIONS
FROM V$SQLAREA
WHERE UPPER (SQL_TEXT) LIKE 'SELECT ... FROM ... %';
This is the most precise way to determine the actual run time. The view V$SESSION_LONGOPS might also be interesting for you.
If you don't have access to those views you could also use a cursor loop for running through all records, e.g.
CREATE OR REPLACE PROCEDURE speedtest AS
count number;
cursor c_cursor is
SELECT ...;
BEGIN
-- fetch start time stamp here
count := 0;
FOR rec in c_cursor
LOOP
count := count +1;
END LOOP;
-- fetch end time stamp here
END;
Depending on the architecture this might be more or less accurate, because data might need to be transmitted to the system where your SQL is running on.
You can change those limits; but you'll be using some time in the data transfer between the DB and the client, and possibly for the display; and that in turn would be affected by the number of rows pulled by each fetch. Those things affect your application as well though, so looking at the raw execution time might not tell you the whole story anyway.
To change the worksheet (F5) limit, go to Tools->Preferences->Database->Worksheet, and increase the 'Max rows to print in a script' value (and maybe 'Max lines in Script output'). To change the fetch size go to the Database->Advanced panel in the preferences; maybe to match your application's value.
This isn't perfect but if you don't want to see the actual data, just get the time it takes to run in the DB, you can wrap the query to get a single row:
select count(*) from (
<your original query
);
It will normally execute the entire original query and then count the results, which won't add anything significant to the time. (It's feasible it might rewrite the query internally I suppose, but I think that's unlikely, and you could use hints to avoid it if needed).
I have the following script:
SELECT
DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
DEPT_TAB DEPT
LEFT OUTER JOIN
SDP_TAB SDP ON SDP.F03 = DEPT.F03,
CAT_TAB CAT
ORDER BY
DEPT.F03
The tables are huge, when I execute the script in SQL Server directly it takes around 4 min to execute, but when I run it in the third party program (SMS LOC based on Delphi) it gives me the error
<msg> out of memory</msg> <sql> the code </sql>
Is there anyway I can lighten the script to be executed? or did anyone had the same problem and solved it somehow?
I remember having had to resort to the ROBUST PLAN query hint once on a query where the query-optimizer kind of lost track and tried to work it out in a way that the hardware couldn't handle.
=> http://technet.microsoft.com/en-us/library/ms181714.aspx
But I'm not sure I understand why it would work for one 'technology' and not another.
Then again, the error message might not be from SQL but rather from the 3rd-party program that gathers the output and does so in a 'less than ideal' way.
Consider adding paging to the user edit screen and the underlying data call. The point being you dont need to see all the rows at one time, but they are available to the user upon request.
This will alleviate much of your performance problem.
I had a project where I had to add over 7 million individual lines of T-SQL code via batch (couldn't figure out how to programatically leverage the new SEQUENCE command). The problem was that there was limited amount of memory available on my VM (I was allocated the max amount of memory for this VM). Because of the large amount lines of T-SQL code I had to first test how many lines it could take before the server crashed. For whatever reason, SQL (2012) doesn't release the memory it uses for large batch jobs such as mine (we're talking around 12 GB of memory) so I had to reboot the server every million or so lines. This is what you may have to do if resources are limited for your project.
Does postgres include the time taken for rendering the output on screen within \timing or explain analyze. From what I understand it does not. Am I correct?
Actually I am outputting a lot of rows on screen and I find that postgres does not take much time to display them on screen, whereas if I write a simple C program to output the results, then the C programs takes about 3000ms. Whereas postgres takes about 500ms to display the same data on screen.
"postgres" doesn't display anything at all. I think you mean the psql client.
If so: \timing displays time including the time to receive the data from the server. EXPLAIN ANALYZE doesn't, but adds the overhead of doing the detailed server-side timing. log_min_duration_statement just records statement timings server-side.
Good afternoon,
After computing a rather large vector (a bit shorter than 2^20 elements), I have to store the result in a database.
The script takes about 4 hours to execute with a simple code such as :
#Do the processing
myVector<-processData(myData)
#Sends every thing to the database
lapply(myVector,sendToDB)
What do you think is the most efficient way to do this?
I thought about using the same query to insert multiple records (multiple inserts) but it simply comes back to "chucking" the data.
Is there any vectorized function do send that into a database?
Interestingly, the code takes a huge amount of time before starting to process the first element of the vector. That is, if I place a browser() call inside sendToDB, it takes 20 minutes before it is reached for the first time (and I mean 20 minutes without taking into account the previous line processing the data). So I was wondering what R was doing during this time?
Is there another way to do such operation in R that I might have missed (parallel processing maybe?)
Thanks!
PS: here is a skelleton of the sendToDB function:
sendToDB<-function(id,data) {
channel<-odbcChannel(...)
query<-paste("INSERT INTO history VALUE(",id,",\"",data,"\")",sep="")
sqlQuery(channel,query)
odbcClose(channel)
}
That's the idea.
UPDATE
I am at the moment trying out the LOAD DATA INFILE command.
I still have no idea why it takes so long to reach the internal function of the lapply for the first time.
SOLUTION
LOAD DATA INFILE is indeed much quicker. Writing into a file line by line using write is affordable and write.table is even quicker.
The overhead I was experiencing for lapply was coming from the fact that I was looping over POSIXct objects. It is much quicker to use seq(along.with=myVector) and then process the data from within the loop.
What about writing it to some file and call LOAD DATA INFILE? This should at least give a benchmark. BTW: What kind of DBMS do you use?
Instead of your sendToDB-function, you could use sqlSave. Internally it uses a prepared insert-statement, which should be faster than individual inserts.
However, on a windows-platform using MS SQL, I use a separate function which first writes my dataframe to a csv-file and next calls the bcp bulk loader. In my case this is a lot faster than sqlSave.
There's a HUGE, relatively speaking, overhead in your sendToDB() function. That function has to negotiate an ODBC connection, send a single row of data, and then close the connection for each and every item in your list. If you are using rodbc it's more efficient to use sqlSave() to copy an entire data frame over as a table. In my experience I've found some databases (SQL Server, for example) to still be pretty slow with sqlSave() over latent networks. In those cases I export from R into a CSV and use a bulk loader to load the files into the DB. I have an external script set up that I call with a system() call to run the bulk loader. That way the load is happening outside of R but my R script is running the show.