I am querying a Hive table that contains around 10 million rows. The query used to finish quickly on Hive 0.14. Then I moved to Hive 1.2.1, and now it does not start the MR job:
hive> select count(1) from nodes;
Query ID = lagvankarh_20160608221653_5dd82f87-3527-4eb6-9a59-f11ccaf0a125
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
After a long time I get a SocketTimeout error.
Related
I'm using pypyodbc and pandas.read_sql_query to query a cloud stored MS Access Database .accdb file.
from datetime import datetime

import pandas as pd
import pypyodbc

def query_data(group_id, dbname=r'\\cloudservername\myfile.accdb', table_names=['ContainerData']):
    start_time = datetime.now()
    print(start_time)
    pypyodbc.lowercase = False
    # Open the Access database through the ODBC driver
    conn = pypyodbc.connect(
        r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};" +
        r"DBQ=" + dbname + r";")
    connection_time = datetime.now() - start_time
    print("Connection Time: " + str(connection_time))
    # group_id is passed in as a string and concatenated into the SQL text
    querystring = ("SELECT TOP 10 Column1, Column2, Column3, Column4 FROM " +
                   table_names[0] + " WHERE Column0 = " + group_id)
    my_data = pd.read_sql_query(querystring, conn)
    print("Query Time: " + str(datetime.now() - start_time - connection_time))
    conn.close()
    return my_data
The database has about 30,000 rows. The group_id values are sequential numbers from 1 to 3000, with 10 rows assigned to each group. For example, rows 1-10 in the database (oldest data) all have group_id = 1. Rows 29,991-30,000 (newest data) all have group_id = 3000.
When I store the database locally on my PC and run query_data('1') the connection time is 0.1s and the query time is 0.01s. Similarly, running query_data('3000') the connection time is 0.2s and the query time is 0.08s.
When the database is stored on the cloud server, the connection time varies from 20-60s. When I run query_data('1') the query time is ~3 seconds. NOW THE BIG ISSUE: When I run query_data('3000') the query time is ~10 minutes!
I've tried using ORDER BY group_id DESC, but that causes both queries to take ~10 minutes.
I've also tried changing the "Order by" group_id to Descending in the accdb itself and setting "Order by on load" to yes. Neither of these seems to change how the SQL query locates the data.
The problem is that the code I'm using almost always needs to find the newest data (e.g. group_id = max), which takes the longest to find. Is there a way to have the SQL query reverse its search order, so that the newest entries are looked through first rather than the oldest? I wouldn't mind a 3 second (or even 1 minute) query time, but a 10 minute query time is too long. Or is there a setting I can change in the Access database to change the order in which the data is stored?
I've also watched the network monitor while running the script, and python.exe steadily sends about 2kb/s and receives about 25kb/s throughout the full 10 minute duration of the script.
My setup: 4 node cluster in Google Cloud Platform (1 master, 3 workers) running NixOS Linux.
I have been using the TPC-DS toolkit to generate both the data and the queries, which are standard. On smaller datasets or simpler queries they work just fine.
The queries I take from here: https://github.com/hortonworks/hive-testbench/tree/hdp3/sample-queries-tpcds
This is the first one, query1.sql:
WITH customer_total_return AS
(
SELECT sr_customer_sk AS ctr_customer_sk ,
sr_store_sk AS ctr_store_sk ,
Sum(sr_fee) AS ctr_total_return
FROM store_returns ,
date_dim
WHERE sr_returned_date_sk = d_date_sk
AND d_year =2000
GROUP BY sr_customer_sk ,
sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1 ,
store ,
customer
WHERE ctr1.ctr_total_return >
(
SELECT Avg(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
AND s_store_sk = ctr1.ctr_store_sk
AND s_state = 'NM'
AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id limit 100;
At first I could not get this to run successfully at all; it kept failing with java.lang.OutOfMemoryError: Java heap space.
What I did was:
Increased the power of the GCP nodes (up to 7.5 GB of RAM and dual-core CPUs)
Set these variables inside of the Hive CLI:
set mapreduce.map.memory.mb=2048;
set mapreduce.map.java.opts=-Xmx1024m;
set mapreduce.reduce.memory.mb=4096;
set mapreduce.reduce.java.opts=-Xmx3072m;
set mapred.child.java.opts=-Xmx1024m;
Restarted Hive
Then this query worked (along with other similar ones) on a 1 GB dataset. I monitored the situation with htop: memory usage does not exceed 2 GB, while both CPU cores are at 100% almost constantly.
Now the problem is that with more complex queries on a larger dataset, the error comes back:
The query runs just fine for an entire minute or so, but ends in a FAIL. Full stacktrace:
hive> with customer_total_return as
> (select sr_customer_sk as ctr_customer_sk
> ,sr_store_sk as ctr_store_sk
> ,sum(SR_FEE) as ctr_total_return
> from store_returns
> ,date_dim
> where sr_returned_date_sk = d_date_sk
> and d_year =2000
> group by sr_customer_sk
> ,sr_store_sk)
> select c_customer_id
> from customer_total_return ctr1
> ,store
> ,customer
> where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
> from customer_total_return ctr2
> where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
> and s_store_sk = ctr1.ctr_store_sk
> and s_state = 'TN'
> and ctr1.ctr_customer_sk = c_customer_sk
> order by c_customer_id
> limit 100;
No Stats for default#store_returns, Columns: sr_returned_date_sk, sr_fee, sr_store_sk, sr_customer_sk
No Stats for default#date_dim, Columns: d_date_sk, d_year
No Stats for default#store, Columns: s_state, s_store_sk
No Stats for default#customer, Columns: c_customer_sk, c_customer_id
Query ID = root_20190811164854_c253c67c-ef94-4351-b4d3-74ede4c5d990
Total jobs = 14
Stage-29 is selected by condition resolver.
Stage-1 is filtered out by condition resolver.
Stage-30 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
SLF4J: Found binding in [jar:file:/nix/store/jjm6636r99r0irqa03dc1za9gs2b4fx6-source/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/nix/store/q9jpwzbqbg8k8322q785xfavg0p0v18i-hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
Execution completed successfully
MapredLocal task succeeded
SLF4J: Found binding in [jar:file:/nix/store/jjm6636r99r0irqa03dc1za9gs2b4fx6-source/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/nix/store/q9jpwzbqbg8k8322q785xfavg0p0v18i-hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 14
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-08-11 16:49:19,415 Stage-20 map = 0%, reduce = 0%
2019-08-11 16:49:22,418 Stage-20 map = 100%, reduce = 0%
Ended Job = job_local404291246_0005
Launching Job 4 out of 14
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-08-11 16:49:24,718 Stage-22 map = 0%, reduce = 0%
2019-08-11 16:49:27,721 Stage-22 map = 100%, reduce = 0%
Ended Job = job_local566999875_0006
Launching Job 5 out of 14
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-08-11 16:49:29,958 Stage-2 map = 0%, reduce = 0%
2019-08-11 16:49:33,970 Stage-2 map = 100%, reduce = 0%
2019-08-11 16:49:35,974 Stage-2 map = 100%, reduce = 100%
Ended Job = job_local1440279093_0007
Launching Job 6 out of 14
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-08-11 16:49:37,235 Stage-11 map = 0%, reduce = 0%
2019-08-11 16:49:40,421 Stage-11 map = 100%, reduce = 0%
2019-08-11 16:49:42,424 Stage-11 map = 100%, reduce = 100%
Ended Job = job_local1508103541_0008
SLF4J: Found binding in [jar:file:/nix/store/jjm6636r99r0irqa03dc1za9gs2b4fx6-source/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/nix/store/q9jpwzbqbg8k8322q785xfavg0p0v18i-hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2019-08-11 16:49:51 Dump the side-table for tag: 1 with group count: 21 into file: file:/tmp/root/3ab30b3b-380d-40f5-9f72-68788d998013/hive_2019-08-11_16-48-54_393_105456265244058313-1/-local-10019/HashTable-Stage-19/MapJoin-mapfile71--.hashtable
Execution completed successfully
MapredLocal task succeeded
Launching Job 7 out of 14
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-08-11 16:49:53,956 Stage-19 map = 100%, reduce = 0%
Ended Job = job_local2121921517_0009
Stage-26 is filtered out by condition resolver.
Stage-27 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
2019-08-11 16:50:01 Dump the side-table for tag: 0 with group count: 99162 into file: file:/tmp/root/3ab30b3b-380d-40f5-9f72-68788d998013/hive_2019-08-11_16-48-54_393_105456265244058313-1/-local-10017/HashTable-Stage-17/MapJoin-mapfile60--.hashtable
2019-08-11 16:50:02 Uploaded 1 File to: file:/tmp/root/3ab30b3b-380d-40f5-9f72-68788d998013/hive_2019-08-11_16-48-54_393_105456265244058313-1/-local-10017/HashTable-Stage-17/MapJoin-mapfile60--.hashtable (2832042 bytes)
Execution completed successfully
MapredLocal task succeeded
Launching Job 9 out of 14
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-08-11 16:50:04,004 Stage-17 map = 0%, reduce = 0%
2019-08-11 16:50:05,005 Stage-17 map = 100%, reduce = 0%
Ended Job = job_local694362009_0010
Stage-24 is selected by condition resolver.
Stage-25 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
SLF4J: Found binding in [jar:file:/nix/store/q9jpwzbqbg8k8322q785xfavg0p0v18i-hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2019-08-11 16:50:12 Starting to launch local task to process map join; maximum memory = 239075328
Execution completed successfully
MapredLocal task succeeded
Launching Job 11 out of 14
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-08-11 16:50:14,254 Stage-13 map = 100%, reduce = 0%
Ended Job = job_local1812693452_0011
Launching Job 12 out of 14
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-08-11 16:50:15,481 Stage-6 map = 0%, reduce = 0%
Ended Job = job_local920309638_0012 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-20: HDFS Read: 8662606197 HDFS Write: 0 SUCCESS
Stage-Stage-22: HDFS Read: 9339349675 HDFS Write: 0 SUCCESS
Stage-Stage-2: HDFS Read: 9409277766 HDFS Write: 0 SUCCESS
Stage-Stage-11: HDFS Read: 9409277766 HDFS Write: 0 SUCCESS
Stage-Stage-19: HDFS Read: 4704638883 HDFS Write: 0 SUCCESS
Stage-Stage-17: HDFS Read: 4771516428 HDFS Write: 0 SUCCESS
Stage-Stage-13: HDFS Read: 4771516428 HDFS Write: 0 SUCCESS
Stage-Stage-6: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
The problem in the hive.log file is still the same:
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
I also realized that my worker nodes don't actually do anything (htop showed them idle while only the master node was working).
Even in the stack trace:
Job running in-process (local Hadoop)
How can I make Hive use HDFS not just Local Hadoop?
Running hdfs dfs -df -h hdfs:<redacted>:9000/ returns
Filesystem Size Used Available Use%
hdfs://<redacted>:9000 88.5 G 34.3 G 35.2 G 39%
Which is correct, I have 3 worker nodes, each with 30 GB disks.
java.lang.OutOfMemoryError: Java heap space happens when you try to push too much data onto a single machine.
Based on the query provided, there are a few things you can try:
Change your join conditions to explicit joins (remove the join conditions from the WHERE clause and use INNER/LEFT JOIN), e.g.
FROM customer_total_return ctr1
INNER JOIN store s
ON ctr1.ctr_store_sk = s.s_store_sk
AND s_state = 'NM'
INNER JOIN customer c
ON ctr1.ctr_customer_sk = c.c_customer_sk
Check if you have skewed data for one of the following fields:
store_returns -> sr_returned_date_sk
store_returns -> sr_store_sk
store_returns -> sr_customer_sk
customer -> c_customer_sk
store -> s_store_sk
It is possible that one of the keys accounts for a high percentage of the values, which might overload one of the nodes (when the data size is huge).
Basically, you are trying to eliminate the possible causes of node overloading.
Let me know if it helps.
It could be a resource issue. Hive queries are internally executed as MapReduce jobs. You could check the Job History logs for the failed Hive MapReduce jobs. Sometimes queries run faster from the shell than from the Hive query editor.
OOM issues are related to query performance most of the time.
There are two queries here:
Part 1:
WITH customer_total_return AS
(
SELECT sr_customer_sk AS ctr_customer_sk ,
sr_store_sk AS ctr_store_sk ,
Sum(sr_fee) AS ctr_total_return
FROM store_returns ,
date_dim
WHERE sr_returned_date_sk = d_date_sk
AND d_year =2000
GROUP BY sr_customer_sk ,
sr_store_sk)
Part 2:
SELECT c_customer_id
FROM customer_total_return ctr1 ,
store ,
customer
WHERE ctr1.ctr_total_return >
(
SELECT Avg(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
AND s_store_sk = ctr1.ctr_store_sk
AND s_state = 'NM'
AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id limit 100;
Try enabling JMX for the Hive cluster (link) and look at the memory usage of both parts of the query, as well as of the inner query in part 2.
A few Hive optimizations that can be tried for the above queries:
Use SORT BY instead of ORDER BY. SORT BY orders the data only within each reducer, so no single reducer has to sort the whole dataset (the output is then not globally ordered).
Partition the tables on the join keys, so that only the relevant data is read instead of scanning the whole table.
Cache the small Hive table in the distributed cache and use a map-side join to reduce shuffling.
For example:
select /*+MAPJOIN(b)*/ col1,col2,col3,col4
from table_A a
join
table_B b
on
a.account_number=b.account_number
If there is a possibility of skewed data in any of the tables, use the following parameters:
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=100000; -- number of rows per join key above which the key is treated as skewed
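The map-side join mentioned above boils down to loading the small table into an in-memory hash table and streaming the large table past it, so the large table never has to be shuffled. Below is a minimal Python sketch of that idea. It is not how Hive implements it internally; the helper name map_side_join and the sample rows are made up for illustration.

```python
def map_side_join(big_rows, small_rows, key):
    """Join two lists of dicts on `key` by hashing the small side in memory."""
    # Build phase: index the small table by join key (the part that gets cached).
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    # Probe phase: stream the big table once; no sorting or shuffling is needed.
    joined = []
    for row in big_rows:
        for match in lookup.get(row[key], []):
            merged = dict(match)
            merged.update(row)
            joined.append(merged)
    return joined


# Hypothetical sample data standing in for table_A (large) and table_B (small).
table_a = [
    {"account_number": 1, "col1": "a"},
    {"account_number": 2, "col1": "b"},
    {"account_number": 3, "col1": "c"},
]
table_b = [
    {"account_number": 1, "col3": "x"},
    {"account_number": 3, "col3": "z"},
]

result = map_side_join(table_a, table_b, "account_number")
print(result)
```

The approach only pays off when the small side really fits in memory on every task; otherwise the hash table itself becomes the source of heap pressure.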
We have an insert query that inserts data into a partitioned table by reading from a non-partitioned table.
Query -
insert into db1.fact_table PARTITION(part_col1, part_col2)
( col1,
col2,
col3,
col4,
col5,
col6,
.
.
.
.
.
.
.
col32,
LOAD_DT,
part_col1,
Part_col2 )
select
col1,
col2,
col3,
col4,
col5,
col6,
.
.
.
.
.
.
.
col32,
part_col1,
Part_col2
from db1.main_table WHERE col1=0;
The table has 34 columns. The number of records in the main table depends on the size of the input file we receive each day, and the number of partitions (part_col1, part_col2) we insert in each run varies from 4000 to 5000.
Sometimes this query fails with the error below.
2019-04-28 13:23:31,715 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 177220.23 sec
2019-04-28 13:24:25,989 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 163577.82 sec
MapReduce Total cumulative CPU time: 1 days 21 hours 26 minutes 17 seconds 820 msec
Ended Job = job_1556004136988_155295 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1556004136988_155295_m_000003 (and more) from job job_1556004136988_155295
Examining task ID: task_1556004136988_155295_m_000004 (and more) from job job_1556004136988_155295
Task with the most failures(4):
-----
Task ID: task_1556004136988_155295_m_000000
-----
Diagnostic Messages for this Task:
Exception from container-launch.
Container id: container_e81_1556004136988_155295_01_000015
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:563)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:748)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:305)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:356)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Shell output: main : command provided 1
main : user is bldadmin
main : requested yarn user is bldadmin
Container exited with a non-zero exit code 255
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 10 Cumulative CPU: 163577.82 sec MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 days 21 hours 26 minutes 17 seconds 820 msec
Current hive properties.
Using Tez Engine -
set hive.execution.engine=tez;
set hive.tez.container.size=3072;
set hive.tez.java.opts=-Xmx1640m;
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
set hive.enforce.bucketing=true;
set hive.exec.parallel=true;
set hive.auto.convert.join=false;
set hive.enforce.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.optimize.bucketmapjoin=true;
set hive.exec.tmp.maprfsvolume=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.stats.fetch.partition.stats=true;
set hive.support.concurrency=true;
set hive.exec.max.dynamic.partitions=999999999;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
Based on input from other teams, we changed the engine to mr, and the properties are now -
set hive.execution.engine=mr;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=16384;
set mapreduce.map.java.opts=-Xmx14745m;
set mapreduce.reduce.memory.mb=16384;
set mapreduce.reduce.java.opts=-Xmx14745m;
With these properties the query completed without any errors a few times.
How can I debug this issue, and are there any Hive properties we can set so that we don't get these issues in the future?
Add distribute by on the partition key. Each partition will then be processed by a single reducer, so a reducer no longer writes to every partition. This reduces memory consumption, because each reducer creates fewer files and keeps fewer buffers.
insert into db1.fact_table PARTITION(part_col1, part_col2)
select
col1,
...
col32,
part_col1,
Part_col2
from db1.main_table WHERE col1=0
distribute by part_col1, Part_col2; --add this
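To see why distributing by the partition key cuts down the number of open files, here is a small Python sketch that counts (task, partition) pairs with and without it. It is only a simulation under stated assumptions: the row layout, NUM_TASKS, and the round-robin assignment used for the "no distribute by" case are made up, and Python's hash() stands in for the engine's own hash function.

```python
from collections import defaultdict

NUM_TASKS = 4

# Hypothetical input: 1000 rows spread over 50 distinct (part_col1, part_col2) pairs.
rows = [(i % 10, (i // 10) % 5) for i in range(1000)]

def round_robin(i, part_key):
    # No distribute by: rows reach tasks in whatever order they are read.
    return i % NUM_TASKS

def by_partition_key(i, part_key):
    # distribute by part_col1, part_col2: the task is chosen from the key alone.
    return hash(part_key) % NUM_TASKS

def open_writers(assign):
    """Count (task, partition) pairs, i.e. output files that must be kept open."""
    seen = defaultdict(set)
    for i, part_key in enumerate(rows):
        seen[assign(i, part_key)].add(part_key)
    return sum(len(parts) for parts in seen.values())

writers_without = open_writers(round_robin)
writers_with = open_writers(by_partition_key)
print("writers without distribute by:", writers_without)
print("writers with distribute by:   ", writers_with)
```

With the key-based assignment the writer count equals the number of distinct partitions, because each partition is owned by exactly one task.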
Use predicate pushdown; it may help with filtering if the source files are ORC:
SET hive.optimize.ppd=true;
SET hive.optimize.ppd.storage=true;
SET hive.optimize.index.filter=true;
Tune mapper and reducer parallelism properly: https://stackoverflow.com/a/48487306/2700344
If your data is too big and is not evenly distributed across the partition keys, add a random value to distribute by in addition to the partition keys. This helps with skewed data:
distribute by part_col1, Part_col2, FLOOR(RAND()*100.0)%20;
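The effect of the extra random term can be sketched in Python as follows. This is a simulation, not Hive's actual partitioner: the row counts and NUM_REDUCERS are invented, zlib.crc32 is a stand-in for the engine's key hash, and combining fields as 31 * h + field is an assumption borrowed from the usual Java-style hash combination.

```python
import math
import random
import zlib
from collections import Counter

NUM_REDUCERS = 8
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical skewed input: one hot partition with 9000 rows, nine small ones with 100 rows each.
rows = ["hot"] * 9000 + [f"p{i}" for i in range(9) for _ in range(100)]

def key_hash(part):
    # Stable stand-in for the engine's hash of the partition key (assumption).
    return zlib.crc32(part.encode())

def salt():
    # Mirrors FLOOR(RAND()*100.0)%20: an integer between 0 and 19.
    return math.floor(rng.random() * 100.0) % 20

# distribute by partition key only: every row of a partition lands on one reducer.
plain = Counter(key_hash(part) % NUM_REDUCERS for part in rows)

# distribute by partition key plus salt (fields combined as 31 * h + field, an assumption).
salted = Counter((31 * key_hash(part) + salt()) % NUM_REDUCERS for part in rows)

print("largest reducer, key only:  ", max(plain.values()))
print("largest reducer, key + salt:", max(salted.values()))
```

The trade-off is that a partition may now be written by up to 20 different reducers, so the salt range is a balance between evening out the load and the number of files per partition.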
Read also https://stackoverflow.com/a/55375261/2700344
I have two tables in Apache Hive. The first is called traffic_violations and the second is called cars.
In traffic_violations I have a column called fatal with values "Yes" or "No". I defined this column as a STRING. I join the two tables on an id.
So, I have this query:
select gender, fatal, substr(date_of_stop,7,10) as year, make
from traffic_violations t
join cars c on t.id = c.id
where fatal = "Yes"
group by date_of_stop, make, gender, fatal
If I remove the WHERE clause, the query works, but with this clause it does not work.
Hive prints this message:
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 1809950422 HDFS Write: 0 SUCCESS
Stage-Stage-2: HDFS Read: 797286840 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 31.228 seconds
but Hive doesn't print the result of this query.
How do I resolve this problem?
Thanks to all!
I have a destination table (created as the output of another query), and a simple ORDER BY on one of its columns results in a "resource exceeded" error message.
The destination table has 8.5 million rows and 6 columns (approx. 567 MB). The query is:
select col1,col2.....col6 from desttable order by col5 desc
Remove ORDER BY and see if the error disappears!
ORDER BY moves the WHOLE data set onto one worker, hence the "resource exceeded" error.
If I add "LIMIT" and "OFFSET" clauses to the query after ORDER BY, it works, even though the LIMIT clause is the last to be evaluated. How does it work there?
When you add LIMIT N, the query runs on multiple workers. Each worker gets only part of the data to process and outputs only its own top N rows. Those N rows from each worker then get "delivered" to one worker, where the final ORDER BY and LIMIT occur, and the "winning" N rows become the output of the whole query.
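The per-worker top N followed by a final merge can be sketched in a few lines of Python. This only illustrates the idea described above and is not the engine's actual execution plan; the function name top_n_distributed, the sharding scheme, and the sample data are assumptions made for the example.

```python
import heapq

def top_n_distributed(rows, n, num_workers, key):
    """Emulate ORDER BY key DESC LIMIT n executed across several workers."""
    # Split the input so that each worker sees only part of the data.
    shards = [rows[i::num_workers] for i in range(num_workers)]
    # Each worker keeps only its local top n rows.
    local_tops = [heapq.nlargest(n, shard, key=key) for shard in shards]
    # One final worker sorts at most n * num_workers rows and applies the limit.
    candidates = [row for top in local_tops for row in top]
    return heapq.nlargest(n, candidates, key=key)

# Hypothetical data: (col1, col5) pairs with distinct col5 values.
rows = [(i, (i * 37) % 10007) for i in range(10_000)]

result = top_n_distributed(rows, n=5, num_workers=4, key=lambda r: r[1])
expected = sorted(rows, key=lambda r: r[1], reverse=True)[:5]
print(result == expected)
```

The final worker only ever handles n * num_workers rows, which is why the query with LIMIT fits in memory while the plain ORDER BY over all 8.5 million rows does not.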