I'm running a SQL SELECT query, and I want to know how much data I'm pulling back (how many kilobytes). Is there any way to do that?
Actually, "Show Client Statistics" within SSMS query Editor Window will return the resultset size, Bytes Received from Server, etc
SELECT <your query here>
INTO dbo.MyTempTable
FROM <query source>

EXEC sp_spaceused 'dbo.MyTempTable'

DROP TABLE dbo.MyTempTable
This will return the row count, reserved space, data space (in KB), index space, and unused space for that table.
You can include the actual execution plan of the query in the Results window of SSMS, which will display an estimated row size for the results. Multiply that by the number of rows to get your result. Not sure how accurate the estimated row size is, though.
You can use sp_spaceused to get the size of a table, but I am unaware of any way to get the size of a query result directly (though that doesn't mean much).
One way to get a quick estimate of the size would be to save the data as a text file. Obviously, there will be extra whitespace, but it would give you a general idea of how large the result is.
I'm working with PostgreSQL. I have a database named db_as with 25,000,000 rows of data. I wanted to free up some disk space, so I updated a whole column to NULL, thinking that this would decrease the database's size, but it didn't happen. In fact, I did the opposite: I increased the database's size, and I don't know why. It went from 700MB to 1425MB, which is a lot :(
I used this statement to get each column's size:
SELECT sum(pg_column_size(_column)) as size FROM _table
And this one to get each database's size:
SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database;
The original values will still be on disk, just dead.
Run a vacuum on the database to remove these.
VACUUM FULL
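Before running it, you can confirm that dead tuples are what is holding the space by checking PostgreSQL's statistics view (a minimal sketch; it assumes the statistics collector is enabled and the table is in the current database):

SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Note that a plain VACUUM only marks the dead space as reusable within the table; VACUUM FULL rewrites the table and returns the space to the operating system, but it takes an exclusive lock on the table while it runs.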
Documentation
https://www.postgresql.org/docs/12/sql-vacuum.html
I have a query that is taking a long time to execute, so I want to count the number of rows to see how large the output is. I plan to use the following format:
SELECT COUNT(*) FROM
(Original_Query) AS COUNT_QUERY
Original_Query is serving as a placeholder for an actual query. Does Original_Query have to run in its entirety before I can obtain the count?
The query definitely runs in such a case. One option to get an estimate without running it is to look at the row estimates in the compiled execution plan. Simply run this command before your query:
SET SHOWPLAN_ALL ON
If you then run your query, you get the query plan together with the estimates instead of your query result. It is even possible to use this programmatically.
See for example this excellent book for details.
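As a minimal sketch (Original_Query remains a placeholder for your actual query): with SHOWPLAN_ALL on, the batch is compiled but not executed, and each row of the plan output carries an EstimateRows column holding the optimizer's row estimate.

SET SHOWPLAN_ALL ON;
GO
SELECT COUNT(*) FROM (Original_Query) AS COUNT_QUERY;
GO
SET SHOWPLAN_ALL OFF;
GO

Keep in mind these are estimates derived from statistics, not exact counts.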
This is too long for a comment.
Yes. The entire query has to run. How else would you be able to get the exact count? After all, you have written a query. When you tell SQL Server to run it, you expect it to run.
You can look at the execution plan to get estimates of the number of rows. For a complicated query, this is probably not going to be very accurate.
I would like to know how to convert a number of rows into a size in MB or KB.
Is there a way to do that, or a formula?
The reason I'm doing this is that I would like to know how much space a given set of data uses, rather than the whole tablespace.
Thanks,
keith
If you want an estimate, you could multiply the row count by the information from user_tables.avg_row_len for that table.
If you want the real size of the table on disk, this is available in user_segments.bytes. Note that the smallest unit Oracle will use is a block, so even for an empty table you will see a value bigger than zero in that column. That is the actual size of the space reserved in the tablespace for that table.
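As a sketch of both approaches (the table name MY_TABLE is a placeholder, and num_rows/avg_row_len are only populated once optimizer statistics have been gathered):

-- Estimate: row count times average row length
SELECT num_rows * avg_row_len AS estimated_bytes
FROM user_tables
WHERE table_name = 'MY_TABLE';

-- Actual space reserved on disk for the table segment
SELECT bytes
FROM user_segments
WHERE segment_name = 'MY_TABLE';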
I am currently working with a large table (~105M records) in a C# application.
When I query the table with an ORDER BY or ORDER EACH BY clause, I get a "Resources exceeded during query execution" error.
If I remove the ORDER BY or ORDER EACH BY clause, I get a "Response too large to return" error.
Here are the sample queries for the two scenarios (I am using the Wikipedia public table):
SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title Order by Id, Title Desc
SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title
Here are the questions I have:
What is the maximum size of a BigQuery response?
How do we select all the records in a query request, not via the export method?
1. What is the maximum size of a BigQuery response?
As mentioned in the quota policy, the maximum response size for queries is 10 GB compressed (unlimited when returning large query results).
2. How do we select all the records in a query request, not via the export method?
If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration, as sketched below.
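As a sketch, a jobs.insert job configuration along these lines should work (the project, dataset, and table IDs are placeholders):

{
  "configuration": {
    "query": {
      "query": "SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title",
      "allowLargeResults": true,
      "destinationTable": {
        "projectId": "my-project",
        "datasetId": "my_dataset",
        "tableId": "wikipedia_counts"
      }
    }
  }
}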
Queries that return large results will take longer to execute, even if the result set is small, and are subject to additional limitations:
You must specify a destination table.
You can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
Read more about how to paginate through the results here, and also read the BigQuery Analytics book, the pages starting at page 200, where it is explained how Jobs::getQueryResults works together with the maxResults parameter and its blocking mode.
Update:
Query Result Size Limitations: When you run a normal query in BigQuery, the response size is limited to 10 GB of compressed data. Sometimes, it is hard to know what 10 GB of compressed data means. Does it get compressed 2x? 10x? The results are compressed within their respective columns, which means the compression ratio tends to be very good. For example, if you have one column that is the name of a country, there will likely be only a few different values. When you have only a few distinct values, this means that there isn't a lot of unique information, and the column will generally compress well. If you return encrypted blobs of data, they will likely not compress well because they will be mostly random. (This is explained in the book linked above, on page 220.)
As part of a data analysis project, I will be issuing some long-running queries on a MySQL database. My future course of action is contingent on the results I obtain along the way. It would be useful for me to be able to view partial results generated by a SELECT statement that is still running.
Is there a way to do this? Or am I stuck with waiting until the query completes to view results which were generated in the very first seconds it ran?
Thank you for any help : )
In the general case, a partial result cannot be produced. For example, if you have an aggregate function with a GROUP BY clause, then all the data must be analysed before the first row is returned. A LIMIT clause will not help you, because it is applied after the output is computed. Maybe you can share concrete data and the SQL query?
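A small illustration of why LIMIT does not help (the orders table and its columns are hypothetical):

-- Every row of orders must be scanned and grouped before even the
-- first of these 10 rows can be returned; LIMIT trims the output
-- only after the aggregation is complete.
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
LIMIT 10;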
One thing you may consider is sampling your tables down. This is good practice in data analysis in general to get your iteration speed up when you're writing code.
For example, suppose you have table creation privileges and some mega-huge table X with a key unique_id and a data column data_value.
If unique_id is numeric, then in nearly any database
create table sample_table as
select unique_id, data_value
from X
where mod(unique_id, <some_large_prime_number_like_1013>) = 1
will give you a random sample of data to work your queries out on, and you can inner join your sample_table against the other tables to improve the speed of testing. Thanks to the sampling, your query results should be roughly representative of what you will get on the full data. Note: the number you're modding with has to be prime, otherwise it won't give a correct sample. The example above will shrink your table down to about 0.1% of the original size (0.0987% to be exact).
Most databases also have better sampling and random number methods than just using mod. Check the documentation to see what's available for your version.
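For instance, since this question is about MySQL, a similar sample can be drawn with RAND(); this sketch reuses the hypothetical names from above:

-- ~0.1% random sample; unlike the mod trick, this does not
-- require unique_id to be numeric.
CREATE TABLE sample_table AS
SELECT unique_id, data_value
FROM X
WHERE RAND() < 0.001;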
Hope that helps,
McPeterson
It depends on what your query is doing. If it needs to have the whole result set before producing output (such as might happen for queries with GROUP BY, ORDER BY, or HAVING clauses), then there is nothing to be done.
If, however, the reason for the delay is client-side buffering (which is the default mode), then that can be adjusted by using "mysql_use_result" as an attribute of the database handle rather than the default "mysql_store_result". This is true for the Perl and Java interfaces; in the C interface, you call the unbuffered mysql_use_result() instead of mysql_store_result() after executing the query.
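For the C interface, a minimal sketch of the unbuffered approach (connection setup is omitted, and big_table and its columns are placeholders):

#include <stdio.h>
#include <mysql.h>

/* Stream rows as the server produces them instead of buffering
   the entire result set on the client first. */
int stream_query(MYSQL *conn)
{
    if (mysql_query(conn, "SELECT id, value FROM big_table"))
        return 1;

    /* mysql_use_result() starts returning rows immediately;
       mysql_store_result() would fetch everything up front. */
    MYSQL_RES *res = mysql_use_result(conn);
    if (res == NULL)
        return 1;

    MYSQL_ROW row;
    while ((row = mysql_fetch_row(res)) != NULL)
        printf("%s\t%s\n", row[0], row[1]);

    mysql_free_result(res);
    return 0;
}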