Data ingest issues in Hive: java.lang.OutOfMemoryError: unable to create new native thread

I'm a Hive newbie and I'm having an odyssey of problems getting a large (1 TB) HDFS file into a partitioned, Hive-managed table. Can you please help me get around this? I feel like I have a bad config somewhere, because I'm not able to complete reducer jobs.
Here is my query:
DROP TABLE IF EXISTS ts_managed;
SET hive.enforce.sorting = true;
CREATE TABLE IF NOT EXISTS ts_managed (
svcpt_id VARCHAR(20),
usage_value FLOAT,
read_time SMALLINT)
PARTITIONED BY (read_date INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC
TBLPROPERTIES("orc.compress"="snappy","orc.create.index"="true","orc.bloom.filter.columns"="svcpt_id");
SET hive.vectorized.execution.enabled = true;
SET hive.vectorized.execution.reduce.enabled = true;
SET hive.cbo.enable=true;
SET hive.tez.auto.reducer.parallelism=true;
SET hive.exec.reducers.max=20000;
SET yarn.nodemanager.pmem-check-enabled = true;
SET optimize.sort.dynamic.partitioning=true;
SET hive.exec.max.dynamic.partitions=10000;
INSERT OVERWRITE TABLE ts_managed
PARTITION (read_date)
SELECT svcpt_id, usage, read_time, read_date
FROM ts_raw
DISTRIBUTE BY svcpt_id
SORT BY svcpt_id;
My cluster specs are:
VM cluster
4 total nodes
4 data nodes
32 cores
140 GB RAM
Hortonworks HDP 3.0
Apache Tez as default Hive engine
I am the only user of the cluster
My yarn configs are:
yarn.nodemanager.resource.memory-mb = 32GB
yarn.scheduler.minimum-allocation-mb = 512MB
yarn.scheduler.maximum-allocation-mb = 8192MB
yarn-heapsize = 1024MB
My Hive configs are:
hive.tez.container.size = 682MB
hive.heapsize = 4096MB
hive.metastore.heapsize = 1024MB
hive.exec.reducers.bytes.per.reducer = 1GB
hive.auto.convert.join.noconditionaltask.size = 2184.5MB
hive.tez.auto.reducer.parallelism = True
hive.tez.dynamic.partition.pruning = True
My tez configs:
tez.am.resource.memory.mb = 5120MB
tez.grouping.max-size = 1073741824 Bytes
tez.grouping.min-size = 16777216 Bytes
tez.grouping.split-waves = 1.7
tez.runtime.compress = True
tez.runtime.compress.codec = org.apache.hadoop.io.compress.SnappyCodec
I've tried countless configurations including:
Partition on date
Partition on date, cluster on svcpt_id with buckets
Partition on date, bloom filter on svcpt, sort by svcpt_id
Partition on date, bloom filter on svcpt, distribute by and sort by svcpt_id
I can get my mapping vertex to run, but I have not gotten my first reducer vertex to complete. Here is my most recent example from the above query:
----------------------------------------------------------------------------------------------
VERTICES          MODE       STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  container  SUCCEEDED    1043       1043        0        0       0       0
Reducer 2         container  RUNNING      9636          0        0     9636       1       0
Reducer 3         container  INITED       9636          0        0     9636       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/03  [=>>-------------------------] 4%   ELAPSED TIME: 6804.08 s
----------------------------------------------------------------------------------------------
The error was:
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1537061583429_0010_2_01, diagnostics=[Task failed, taskId=task_1537061583429_0010_2_01_000070, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: unable to create new native thread
I either get this OOM error, which I cannot seem to get around, or my datanodes go offline and the cluster cannot meet its replication factor requirements.
At this point I've been troubleshooting for over 2 weeks. Any contacts for professional consultants I can pay to solve this problem would also be appreciated.
Thanks in advance!

I ended up solving this after speaking with a Hortonworks support tech. It turns out I was over-partitioning my table. Instead of partitioning by day over about 4 years, I partitioned by month and it worked great.
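For anyone hitting the same wall, here is a minimal sketch of what the monthly-partitioned version could look like, assuming read_date is stored as a yyyymmdd integer (the read_month column and the CAST expression are my own illustration, not from the original cluster). Partitioning by day over roughly 4 years means about 1,460 partitions, each with its own ORC writers and reducer-side buffers; partitioning by month cuts that to about 48.
DROP TABLE IF EXISTS ts_managed;
CREATE TABLE IF NOT EXISTS ts_managed (
  svcpt_id VARCHAR(20),
  usage_value FLOAT,
  read_time SMALLINT,
  read_date INT)
PARTITIONED BY (read_month INT)   -- e.g. 201809 instead of one partition per day
STORED AS ORC
TBLPROPERTIES("orc.compress"="snappy","orc.create.index"="true","orc.bloom.filter.columns"="svcpt_id");
INSERT OVERWRITE TABLE ts_managed
PARTITION (read_month)
SELECT svcpt_id, usage, read_time, read_date,
       CAST(read_date / 100 AS INT) AS read_month   -- 20180915 -> 201809, assuming yyyymmdd
FROM ts_raw
DISTRIBUTE BY CAST(read_date / 100 AS INT)          -- keep each month's rows on the same reducers
SORT BY svcpt_id;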

Related

SQL Server does not perform job logic correctly

I have a maintenance job with the following structure:
Steps 1 and 3 just check where a specific database currently resides: on the primary or the secondary replica.
If sys.fn_hadr_is_primary_replica ( 'DATABASENAME') <> 1
BEGIN
RAISERROR ('Not PRIMARY REPLICA FOR DATABASE NAME', -- Message text.
16, -- Severity.
1 -- State.
);
END
Steps 2 and 4 just perform a backup of a specific database using Ola Hallengren's script:
EXECUTE [dbo].[DatabaseBackup]
    @Databases = 'SOMEDB',
    @Directory = 'SOMESHARE',
    @BackupType = 'FULL',
    @Verify = 'Y',
    @CleanupTime = 336,
    @CheckSum = 'Y',
    @LogToTable = 'Y',
    @Compress = 'Y'
Step 1 moves job execution to Step 3 if the database is not on the primary replica.
Step 3 quits with success in case of failure.
This works well on the secondary replica: the job quits with success even though the first and third steps fail.
But on the primary replica absolutely strange things happen, and SQL Agent mixes up steps or performs logic that should never happen.
Just rebooted the server and now everything is working fine. It probably happened because multiple Windows and SQL Server patches had been installed over the last 5 days without rebooting the whole server.
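For reference (the actual fix here was just the reboot), the intended step routing can also be pinned down explicitly when the job is scripted, so it does not depend on what the GUI saved. This is a hedged sketch using msdb.dbo.sp_add_jobstep; the job and step names are placeholders, not taken from the original job:
-- Sketch only: Step 1 jumps to Step 3 when the replica check fails;
-- Step 3 would be added the same way but with @on_fail_action = 1 (quit reporting success).
EXEC msdb.dbo.sp_add_jobstep
    @job_name          = N'SOMEDB backup',        -- placeholder
    @step_id           = 1,
    @step_name         = N'Check primary replica',
    @subsystem         = N'TSQL',
    @command           = N'IF sys.fn_hadr_is_primary_replica(''SOMEDB'') <> 1
                               RAISERROR (''Not PRIMARY REPLICA FOR DATABASE NAME'', 16, 1);',
    @on_success_action = 3,   -- go to the next step (the backup in Step 2)
    @on_fail_action    = 4,   -- go to the step named in @on_fail_step_id ...
    @on_fail_step_id   = 3;   -- ... i.e. Step 3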

ClickHouse cluster: data not replicated

I have a cluster with 2 nodes for a test:
1 shard and 2 replicas,
plus 3 nodes in the ZooKeeper cluster.
<remote_servers>
<ch_cluster>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>ch1</host>
<port>9000</port>
</replica>
<replica>
<host>ch2</host>
<port>9000</port>
</replica>
</shard>
</ch_cluster>
</remote_servers>
Macros on ch1:
<macros>
<shard>shard_01</shard>
<replica>replica-01</replica>
</macros>
Macros on ch2:
<macros>
<shard>shard_01</shard>
<replica>replica-02</replica>
</macros>
ZooKeeper configuration:
<zookeeper>
<node>
<host>zoo1</host>
<port>2181</port>
</node>
<node>
<host>zoo2</host>
<port>2181</port>
</node>
<node>
<host>zoo3</host>
<port>2181</port>
</node>
</zookeeper>
I create the first table:
CREATE TABLE IF NOT EXISTS test.hits_local ON CLUSTER ch_cluster
(
`date` Datetime,
`user_id` String,
`pageviews` Int32
)
ENGINE = ReplicatedMergeTree('/clickhouse/ch_cluster/tables/{shard}/hits_local', '{replica}')
PARTITION BY toStartOfHour(date)
ORDER BY (date)
Then I create a distributed table:
CREATE TABLE IF NOT EXISTS test.hits ON CLUSTER 'ch_cluster'
AS test.hits_local
(
`date` Datetime,
`user_id` String,
`pageviews` Int32
)
ENGINE = Distributed('ch_cluster', 'test', 'hits_local')
Then I insert data into the test.hits_local table on ch1.
When I select data from test.hits_local on ch2, there is no data.
Then I tried to select from the test.hits distributed table on ch2; the data appears after 5-6 minutes,
but there is still no data in test.hits_local on ch2.
My questions: when is the data replicated to ch2?
Who is responsible for replicating data to the other node? Is it ZooKeeper, or should I insert the data into the tables on both ch1 and ch2?
Should I change <internal_replication>true</internal_replication> to false?
Is it necessary for the data to be replicated to test.hits_local on ch2?
Thank you.
Should I change <internal_replication>true</internal_replication> to false?
No, you should not. If you use ReplicatedMergeTree, internal_replication MUST be true.
Replication is done internally by the ReplicatedMergeTree table engine.
Replicas communicate using their hostnames and port 9009.
Check the system.replication_queue table for errors.
Most probably the node ch1 announced its own hostname in ZooKeeper as "localhost",
so the second node ch2 is unable to access localhost:9009 or something similar.
You can find such issues in clickhouse-server.log or in system.replication_queue (it has a column with errors).
Usually replication lag is less than 2 seconds, even in very heavily loaded setups.
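If parts are not arriving on ch2, a quick way to see why is to look at the errors recorded in the replication queue and at the basic replica state on the lagging node. A minimal sketch using the standard system tables (run on ch2):
-- Replication tasks that keep failing, with the recorded exception.
SELECT database, table, type, new_part_name, num_tries, last_exception, postpone_reason
FROM system.replication_queue
WHERE last_exception != '' OR num_tries > 0
ORDER BY create_time;
-- Basic replica health: read-only state, delay, queue sizes.
SELECT database, table, is_readonly, absolute_delay, queue_size, inserts_in_queue
FROM system.replicas
WHERE table = 'hits_local';
If last_exception shows ch2 trying to fetch parts from localhost:9009, set <interserver_http_host> on ch1 to its real hostname in config.xml and restart it.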

How to iterate over many Hive scripts with Spark

I have many Hive scripts (somewhere around 20-25), each with multiple queries. I want to run these scripts using Spark so that the process runs faster; since MapReduce jobs launched by Hive take a long time, executing the queries through Spark should be much faster. Below is the code I have written, but it only works for 3-4 files; when given many files with multiple queries it fails.
Please help me optimize it if possible.
val spark = SparkSession.builder.master("yarn").appName("my app").enableHiveSupport().getOrCreate()

// Collect all .hql files from the script directory
val scriptFiles = new java.io.File("/mapr/tmp/validation_script/")
  .listFiles
  .filter(_.getName.endsWith(".hql"))
  .toList

// Run every non-empty line of every script as a separate query
for (scriptFile <- scriptFiles) {
  scala.io.Source.fromFile(scriptFile).getLines()
    .filterNot(_.isEmpty)
    .foreach(query => spark.sql(query))
}
Some of the errors I am getting are like:
ERROR SparkSubmit: Job aborted.
org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)
ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 12 (sql at validationtest.scala:67) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: failed to allocate 16777216 byte(s) of direct memory (used: 1023410176, max: 1029177344) at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:528)
I get many different types of errors when I run the same code multiple times.
Below is what one of the HQL files looks like. Its name is xyz.hql and it contains:
drop table pontis_analyst.daydiff_log_sms_distribution
create table pontis_analyst.daydiff_log_sms_distribution as select round(datediff(date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) ),cast(subscriberActivationDate as date))/7,4) as daydiff,subscriberkey as key from pontis_analytics.prepaidsubscriptionauditlog
drop table pontis_analyst.weekly_sms_usage_distribution
create table pontis_analyst.weekly_sms_usage_distribution as select sum(event_count_ge) as eventsum,subscriber_key from pontis_analytics.factadhprepaidsubscriptionsmsevent where effective_date_ge_prt < date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) - 1 ) and effective_date_ge_prt >= date_sub(date_sub(current_date(),cast(date_format(CURRENT_DATE ,'u') as int) ),84) group by subscriber_key;
drop table pontis_analyst.daydiff_sms_distribution
create table pontis_analyst.daydiff_sms_distribution as select day.daydiff,sms.subscriber_key,sms.eventsum from pontis_analyst.daydiff_log_sms_distribution day inner join pontis_analyst.weekly_sms_usage_distribution sms on day.key=sms.subscriber_key
drop table pontis_analyst.weekly_sms_usage_final_distribution
create table pontis_analyst.weekly_sms_usage_final_distribution as select spp.subscriberkey as key, case when spp.tenure < 3 then round((lb.eventsum )/dayDiff,4) when spp.tenure >= 3 then round(lb.eventsum/12,4)end as result from pontis_analyst.daydiff_sms_distribution lb inner join pontis_analytics.prepaidsubscriptionsubscriberprofilepanel spp on spp.subscriberkey = lb.subscriber_key
INSERT INTO TABLE pontis_analyst.validatedfinalResult select 'prepaidsubscriptionsubscriberprofilepanel' as fileName, 'average_weekly_sms_last_12_weeks' as attributeName, tbl1_1.isEqual as isEqual, tbl1_1.isEqualCount as isEqualCount, tbl1_2.countAll as countAll, (tbl1_1.isEqualCount/tbl1_2.countAll)* 100 as percentage from (select tbl1_0.isEqual as isEqual, count(isEqual) as isEqualCount from (select case when round(aal.result) = round(srctbl.average_weekly_sms_last_12_weeks) then 1 when aal.result is null then 1 when aal.result = 'NULL' and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result = '' and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result is null and srctbl.average_weekly_sms_last_12_weeks = '' then 1 when aal.result is null and srctbl.average_weekly_sms_last_12_weeks is null then 1 else 0 end as isEqual from pontis_analytics.prepaidsubscriptionsubscriberprofilepanel srctbl left join pontis_analyst.weekly_sms_usage_final_distribution aal on srctbl.subscriberkey = aal.key) tbl1_0 group by tbl1_0.isEqual) tbl1_1 inner join (select count(*) as countAll from pontis_analytics.prepaidsubscriptionsubscriberprofilepanel) tbl1_2 on 1=1
Your issue is that your code is running out of memory, as shown below:
failed to allocate 16777216 byte(s) of direct memory (used: 1023410176, max: 1029177344)
Although what you are trying to do is not the optimal way of doing things in Spark, I would recommend that you remove the memory serialization, as it will not help in any way. You should cache data only if it is going to be used in multiple transformations; if it is used only once, there is no reason to put the data in the cache.

DSE: Query Timeout/Slow

I am currently running a cluster of 3 nodes with 200 million records of data, and the specific vertex label I'm querying has a total of 25 million vertices and 30 million edges. I am running the following query:
g.V().hasLabel('people_node').has("age", inside(0,25)).filter(outE('posted_question').count().is(gt(1))).profile()
I have tried this query on a smaller set of ~100 vertices and edges, and the profiler showed that indexes were used for all parts of the query. However, I think the problem might be in my schema, which is shown below.
Schema
schema.propertyKey('id').Text().ifNotExists().create()
schema.propertyKey('name').Text().ifNotExists().create()
schema.propertyKey('age').Int().ifNotExists().create()
schema.propertyKey('location').Point().withGeoBounds().ifNotExists().create()
schema.propertyKey('gender').Text().ifNotExists().create()
schema.propertyKey('dob').Timestamp().ifNotExists().create()
schema.propertyKey('tags').Text().ifNotExists().create()
schema.propertyKey('date_posted').Timestamp().ifNotExists().create()
schema.vertexLabel('people_node').properties('id','name','location','gender','dob').create()
schema.vertexLabel('questions_node').properties('id','tags','date_posted').create()
schema.edgeLabel('posted_question').single().connection('people_node','questions_node').create()
Indexes Used
schema.vertexLabel("people_node").index("search").search().by("name").by("age").by("gender").by("location").by("dob").ifNotExists().add()
schema.vertexLabel("people_node").index("people_node_index").materialized().by("id").ifNotExists().add()
schema.vertexLabel("questions_node").index("search").search().by("date_posted").by("tags").ifNotExists().add()
schema.vertexLabel("questions_node").index("questions_node_index").materialized().by("id").ifNotExists().add()
I have also read about "OLAP" queries, and I believe I have activated them, but the query is still way too slow. Any advice or insight on what is slowing it down would be greatly appreciated.
Profile Statement (OLTP)
gremlin> g1.V().has("people_node","age", inside(0,25)).filter(outE('posted_question').count().is(gt(1))).profile()
==>Traversal Metrics
Step                                                               Count  Traversers  Time (ms)  % Dur
=======================================================================================================
DsegGraphStep(vertex,[],(age < 25 & age > 0 & l...                     1           1     38.310  25.54
  query-optimizer                                                                         0.219
    \_condition=((age < 25 & age > 0 & label = people_node) & (true))
  query-setup                                                                             0.001
    \_isFitted=true  \_isSorted=false  \_isScan=false
  index-query                                                                            26.581
    \_indexType=Search  \_usesCache=false
    \_statement=SELECT "community_id", "member_id" FROM "MiniGraph"."people_node_p" WHERE "solr_query" =
        '{"q":"*:*", "fq":["age:{0 TO 25}"]}' LIMIT ?; with params (java.lang.Integer) 50000
    \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty,
        fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional[cassandra],
        waitForSchemaAgreement=true, async=true}
TraversalFilterStep([DsegVertexStep(OUT,[posted...                                      111.471  74.32
  DsegVertexStep(OUT,[posted_question],edge,(di...                     1           1     42.814
    query-optimizer                                                                       0.227
      \_condition=((direction = OUT & label = posted_question) & (true))
    query-setup                                                                           0.036
      \_isFitted=true  \_isSorted=false  \_isScan=false
    vertex-query                                                                         29.908
      \_usesCache=false
      \_statement=SELECT * FROM "MiniGraph"."people_node_e" WHERE "community_id" = ? AND "member_id" = ?
          AND "~~edge_label_id" = ? LIMIT ? ALLOW FILTERING; with params (java.lang.Integer) 1300987392,
          (java.lang.Long) 1026, (java.lang.Integer) 65584, (java.lang.Integer) 2
      \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty,
          fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional[cassandra],
          waitForSchemaAgreement=true, async=true}
      \_usesIndex=false
  RangeGlobalStep(0,2)                                                  1           1      0.097
  CountGlobalStep                                                       1           1      0.050
  IsStep(gt(1))                                                                           68.209
DsegPropertyLoadStep                                                                       0.205   0.14
                                                            >TOTAL     -           -    149.986      -
Next, since the partial query is much faster, I assume the long execution time comes from the necessary graph traversals. Hence, is it possible to cache something or activate the indexes (_usesIndex=false) so that OLAP queries become much faster?
Will you please post the output of the .profile statement?
Semantically, it looks like you're trying to find all "people" under the age of 25 that have more than 1 posted question. Is that accurate?

Want to cause a deadlock using SQL queries

I want to demonstrate a deadlock situation:
In my first transaction:
UPDATE POSITION SET EXTRA = EXTRA || 'yes' WHERE NAME="JOHN"
UPDATE POSITION SET EXTRA = 'HI' WHERE EXTRA = 'EXTRA';
So second transaction:
UPDATE POSITION SET BONUS = BONUS * 1.05;
UPDATE POSITION SET BONUS = 0 IF BONUS IS NULL;
So is it not possible for a deadlock to occur here? I just want to try it and understand it for my own knowledge. As I understand it, a deadlock occurs when transactions update different rows (not just different columns) and block each other, but with these 4 updates I don't know how to produce a deadlock situation.
Deadlocks occur when two processes block each other by trying to obtain the same resources in a different order. I've seen Oracle deadlocks happen for three reasons; there are probably more:
Concurrent sessions update the same rows in different order because explain plans retrieve the rows differently. For example, one session uses an index and another uses a full table scan.
Un-indexed foreign keys cause table locks.
Bitmap indexes and any type of concurrent DML on a table.
The code below demonstrates the first case. It generates a deadlock by looping through two of your update statements. The index causes the first session to use an INDEX RANGE SCAN and the second session to use a FULL TABLE SCAN. The results are not deterministic, but it only took about a second for this to fail on my PC.
Sample schema and data
create table position(name varchar2(100), extra varchar2(4000), bonus number);
insert into position select 'JOHN', null, 1 from dual connect by level <= 100;
insert into position select level , null, 1 from dual connect by level <= 100000;
create index position_index on position(name);
Session 1 (run at the same time as session 2)
begin
for i in 1 .. 1000 loop
UPDATE POSITION SET EXTRA = EXTRA || 'yes' WHERE NAME='JOHN';
commit;
end loop;
end;
/
Session 2 (run at the same time as session 1)
begin
for i in 1 .. 1000 loop
UPDATE POSITION SET BONUS = BONUS * 1.05;
commit;
end loop;
end;
/
Error message
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at line 3
Find the location of the trace file generated for each deadlock:
select value from v$parameter where name like 'background_dump_dest';
Example of a trace:
...
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-0009000F-00004ACC-00000000-00000000 37 129 X 55 375 X
TX-0008001B-0000489C-00000000-00000000 55 375 X 37 129 X
session 129: DID 0001-0025-00000281 session 375: DID 0001-0037-00012A2C
session 375: DID 0001-0037-00012A2C session 129: DID 0001-0025-00000281
Rows waited on:
Session 129: obj - rowid = 0001AC1C - AAAawcAAGAAAudMAAQ
(dictionary objn - 109596, file - 6, block - 190284, slot - 16)
Session 375: obj - rowid = 0001AC1C - AAAawcAAGAAAudMAAL
(dictionary objn - 109596, file - 6, block - 190284, slot - 11)
----- Information for the OTHER waiting sessions -----
Session 375:
sid: 375 ser: 10033 audsid: 56764801 user: 111/xxxxxxxxxxxx
flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
pid: 55 O/S info: user: oracle, term: xxxxxxxxxx, ospid: 7820
image: ORACLE.EXE (SHAD)
client details:
O/S info: user: xxxxxxxxxx\xxxxxxxxxx, term: xxxxxxxxxx, ospid: 11848:10888
machine: xxxxxxxxxx\xxxxxxxxxx program: sqlplus.exe
application name: SQL*Plus, hash value=3669949024
current SQL:
UPDATE POSITION SET BONUS = BONUS * 1.05
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=cp515bpfsjd07) -----
UPDATE POSITION SET EXTRA = EXTRA || 'yes' WHERE NAME='JOHN'
...
The locked object is not always the table directly modified. Check which object caused the problem:
select * from dba_objects where object_id = 109596;
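The second cause in the list above (un-indexed foreign keys) can be reproduced in a similar two-session fashion. This is only a hedged sketch with illustrative table names, not taken from the question's POSITION table: when the child table's foreign key column has no index, DML against the parent key needs a table-level share lock on the child, so two sessions mixing child updates with parent deletes can deadlock.
-- Illustrative parent/child schema; EMP.DEPT_ID has no supporting index.
create table dept(dept_id number primary key);
create table emp(emp_id number primary key, ename varchar2(10),
                 dept_id number references dept(dept_id));
insert into dept values (10);
insert into dept values (20);
insert into emp values (1, 'A', 10);
insert into emp values (2, 'B', 20);
commit;
-- Session 1:
update emp set ename = 'A2' where emp_id = 1;   -- row-exclusive lock on EMP
-- Session 2:
update emp set ename = 'B2' where emp_id = 2;   -- row-exclusive lock on EMP
-- Session 1:
delete from dept where dept_id = 20;   -- needs a share lock on EMP (no FK index): waits on session 2
-- Session 2:
delete from dept where dept_id = 10;   -- also needs a share lock on EMP: ORA-00060 deadlock detected
-- Indexing the foreign key column removes the child-table lock and the deadlock:
create index emp_dept_fk_ix on emp(dept_id);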