We have a Scala Spark application, that reads something like 70K records from the DB to a data frame, each record has 2 fields.
After reading the data from the DB, we make minor mapping and load this as a broadcast for later usage.
Now, in local environment, there is an exception, timeout from the RetryingBlockFetcher while running the following code:
dataframe.select("id", "mapping_id")
.rdd.map(row => row.getString(0) -> row.getLong(1))
.collectAsMap().toMap
The exception is:
2022-06-06 10:08:13.077 task-result-getter-2 ERROR
org.apache.spark.network.shuffle.RetryingBlockFetcher Exception while
beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /1.1.1.1:62788
at
org.apache.spark.network.client.
TransportClientFactory.createClient(Transpor .tClientFactory.java:253)
at
org.apache.spark.network.client.
TransportClientFactory.createClient(TransportClientFactory.java:195)
at
org.apache.spark.network.netty.
NettyBlockTransferService$$anon$2.
createAndStart(NettyBlockTransferService.scala:122)
In the local environment, I simply create the spark session with local "spark.master"
When I limit the max of records to 20K, it works well.
Can you please help? maybe I need to configure something in my local environment in order that the original code will work properly?
Update:
I tried to change a lot of Spark-related configurations in my local environment, both memory, a number of executors, timeout-related settings, and more, but nothing helped! I just got the timeout after more time...
I realized that the data frame that I'm reading from the DB has 1 partition of 62K records, while trying to repartition with 2 or more partitions the process worked correctly and I managed to map and collect as needed.
Any idea why this solves the issue? Is there a configuration in the spark that can solve this instead of repartition?
Thanks!
I have a DAG Generator that takes a JSON input and creates a new dynamic DAG in the dags directory. The time it takes for that newly created DAG to be available to use (through the API) can range from 2 seconds to 5 minutes.
I ran the test 100 times:
Create a new DAG (with the same input JSON, so the dynamic dags are
identical)
Once the DAG is saved in the dags directory, start
sending API requests to see if the DAG can be triggered.
Track the seconds passed before I was able to successfully trigger the
DAG.
Results are as follows:
[14.81, 6.44, 6.38, 6.36, 2.21, 6.42, 18.96, 23.14, 23.11, 14.82, 6.39, 23.10, 18.93, 14.80, 23.20, 31.49, 48.29, 35.83, 27.20, 18.96, 14.80, 44.14, 35.66, 35.77, 39.92, 31.50, 69.15, 48.22, 69.29, 39.87, 10.53, 69.15, 27.37, 48.22, 77.51, 39.90, 27.35, 65.03, 69.16, 31.47, 65.06, 90.00, 2.19, 111.33, 69.19, 98.46, 90.16, 27.28, 60.89, 56.57, 110.96, 18.92, 140.55, 39.95, 94.22, 85.89, 44.29, 94.54, 69.21, 136.20, 35.72, 102.57, 102.63, 81.72, 98.58, 77.55, 148.83, 102.79, 136.38, 115.22, 94.38, 148.68, 119.43, 48.24, 178.09, 81.80, 127.64, 119.59, 44.22, 194.88, 23.17, 170.00, 211.47, 153.18, 249.55, 182.40, 152.98, 86.00, 157.02, 98.54, 270.02, 81.75, 153.04, 69.23, 265.92, 27.30, 278.64, 23.19, 269.98, 81.91]
Average Time: 79.35 seconds
You can see that as the number of files in the dags folder increased, the time it took for the DAG to be triggered also increased, but it's still somewhat random. Is there any way to keep this consistent (without restarting the Airflow server after each creation). Or speed it up?
Thank you!
my presto version is 0.240
my operation: i want to use ssl for use https in presto
so i change my config refer only by this url: https://trino.io/docs/current/security/internal-communication.html
but i can't Access to the presto address https://192.168.100.142:9999/
I don't know which step I did wrong.
What should I do to implement HTTPS for Presto?
this is my config:
A cluster of two machines
node 1 142 hostname:sbider-dev-01
/opt/presto-server-0.240/etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
query.max-memory=7.5GB
query.max-memory-per-node=3.5GB
query.max-total-memory-per-node=3.5GB
experimental.reserved-pool-enabled=false
memory.heap-headroom-per-node=0.5GB
#experimental.spill-enabled=true
#experimental.max-spill-per-node=8GB
#experimental.query-max-spill-per-node=8GB
query.low-memory-killer.policy=total-reservation-on-blocked-nodes
#http-server.http.port=9999
#discovery-server.enabled=true
#discovery.uri=http://192.168.100.142:9999
internal-communication.shared-secret="8HRJWX41DwtuYZcNw8uMbshA8wDLoLS78tT3UVL+Z+m0xG7KCygGurE9SXEbGy2bLtPLza1MhAnWJp2mJp/S+j9EFWWuztXz7cHJhSz9QFiVxYCs1Wzn+IVKgHD5z+iGbdKjwRtgUjwNvS4MIfqwqwKlVZiEtGgEDv7j/kAgpOYPvFCRJfb/U/+b7qPpwPNDA6kXu3Dj5p1Q81+kmbFO59WSh6c4QwqdbFHAaY8XFWo8tIogxpmwQQqV3BvICmesxlIhBH/pOGgoyl86QQ/TaAMaWjaddNcgO5keTGhhOj/juGZ/gbOL/PHGNs1ENSPRnjvIGLHFQPDrm36YenhfTH5L7X0Q9HwwnEpEoYkDJsmMEV+elPZK767nZXHryuvDvHGs0PhYSRO8ekOgC3CaE1tfiGh5M9H5C2fnyeGRQ0iwtgXh83kRDuPzVrRx5yj2cHQJOZu+CcXCJ3aa1Tijxq56RfdcEz9Frr8n8aXaNMtRlchcXn3+B4biByS9duq28VHHBDlyYQQ6VSKbLDt1GBi5oOQICtrGuOY+/MD+rnV5uxPUQcSIh9KmA1WjahJEz0ItDKpB66JgVkTrVDWEJPeozKTvHRLG9sBudRhQ5abJGEAhx9b78dUbTcEkRlPuvUN1WjwVlUzjyUDKd14ocuhpoOBzjV9kFhTqQZ4zgNo="
http-server.http.enabled=false
#node.internal-address-source=FQDN
node.internal-address=sbider-dev-01,sbider-dev-02
http-server.https.enabled=true
http-server.https.port=9999
# jks文件全路径
http-server.https.keystore.path=/ceshi/keystore.jks
http-server.https.keystore.key=123456
discovery.uri=https://192.168.100.142:9999
internal-communication.https.required=true
internal-communication.https.keystore.path=/ceshi/keystore.jks
internal-communication.https.keystore.key=123456
node 2 143 hostname cat /opt/presto-server-0.240/etc/config.properties
coordinator=flase
query.max-memory=7.5GB
query.max-memory-per-node=3.5GB
query.max-total-memory-per-node=3.5GB
experimental.reserved-pool-enabled=false
memory.heap-headroom-per-node=0.5GB
#experimental.spill-enabled=true
#experimental.max-spill-per-node=8GB
#experimental.query-max-spill-per-node=8GB
query.low-memory-killer.policy=total-reservation-on-blocked-nodes
#discovery.uri=http://192.168.100.142:9999
internal-communication.shared-secret="8HRJWX41DwtuYZcNw8uMbshA8wDLoLS78tT3UVL+Z+m0xG7KCygGurE9SXEbGy2bLtPLza1MhAnWJp2mJp/S+j9EFWWuztXz7cHJhSz9QFiVxYCs1Wzn+IVKgHD5z+iGbdKjwRtgUjwNvS4MIfqwqwKlVZiEtGgEDv7j/kAgpOYPvFCRJfb/U/+b7qPpwPNDA6kXu3Dj5p1Q81+kmbFO59WSh6c4QwqdbFHAaY8XFWo8tIogxpmwQQqV3BvICmesxlIhBH/pOGgoyl86QQ/TaAMaWjaddNcgO5keTGhhOj/juGZ/gbOL/PHGNs1ENSPRnjvIGLHFQPDrm36YenhfTH5L7X0Q9HwwnEpEoYkDJsmMEV+elPZK767nZXHryuvDvHGs0PhYSRO8ekOgC3CaE1tfiGh5M9H5C2fnyeGRQ0iwtgXh83kRDuPzVrRx5yj2cHQJOZu+CcXCJ3aa1Tijxq56RfdcEz9Frr8n8aXaNMtRlchcXn3+B4biByS9duq28VHHBDlyYQQ6VSKbLDt1GBi5oOQICtrGuOY+/MD+rnV5uxPUQcSIh9KmA1WjahJEz0ItDKpB66JgVkTrVDWEJPeozKTvHRLG9sBudRhQ5abJGEAhx9b78dUbTcEkRlPuvUN1WjwVlUzjyUDKd14ocuhpoOBzjV9kFhTqQZ4zgNo="
http-server.http.enabled=false
#node.internal-address-source=FQDN
node.internal-address=sbider-dev-01,sbider-dev-02
http-server.https.enabled=true
http-server.https.port=9999
http-server.https.keystore.path=/ceshi/keystore.jks
http-server.https.keystore.key=123456
discovery.uri=https://192.168.100.142:9999
internal-communication.https.required=true
internal-communication.https.keystore.path=/ceshi/keystore.jks
internal-communication.https.keystore.key=123456
server log in sbider-dev-01: cat /opt/presto-server-0.240/var/log/server.log
Companion catalogs: catalog_name1=catalog_name2,catalog_name3=catalog_name4,...
2021-01-12T12:41:09.766+0800 INFO main Bootstrap transaction.idle-check-interval 1.00m 1.00m Time interval between idle transactions checks
2021-01-12T12:41:09.766+0800 INFO main Bootstrap transaction.idle-timeout 5.00m 5.00m Amount of time before an inactive transaction is considered expired
2021-01-12T12:41:09.767+0800 INFO main Bootstrap transaction.max-finishing-concurrency 1 1 Maximum parallelism for committing or aborting a transaction
2021-01-12T12:41:09.767+0800 WARN main Bootstrap UNUSED PROPERTIES
2021-01-12T12:41:09.767+0800 WARN main Bootstrap internal-communication.shared-secret
2021-01-12T12:41:09.767+0800 WARN main Bootstrap
2021-01-12T12:41:11.037+0800 ERROR main com.facebook.presto.server.PrestoServer Unable to create injector, see the following errors:
1) Configuration property 'internal-communication.shared-secret' was not used
at com.facebook.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:238)
1 error
com.google.inject.CreationException: Unable to create injector, see the following errors:
1) Configuration property 'internal-communication.shared-secret' was not used
at com.facebook.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:238)
1 error
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:543)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:159)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
at com.google.inject.Guice.createInjector(Guice.java:87)
at com.facebook.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:245)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:131)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:77)
You're following Trino (fka Presto SQL) documentation for securing internal documentation, but got Presto binary from facebook's fork of the project (prestodb).
Go to https://trino.io/download.html to get latest Trino release.
The alternative solution (using prestodb's documentation and prestodb's binary) is NOT a safe, viable alternative, due to security issues known and not fixed in prestodb code base.
Maybe the problem should be solved in another way.
I am using a Grafana SingleStat pane with "Delta" activated and the dashboard shows me "today so far".
Prometheus metric: sum(request_duration_count{...})
Problem:
I have a metric counting requests. Between 03:00 and 06:00 an automated test triggers my service and the metric is incremented by value x. (I set a Grafana annotation at the starting point.)
I want to get a single stat request count without the test requests.
Advanced Problem:
Nagios checks every y minutes and also increments the counter.
How can I remove these "test-counts"?
Any ideas?
I'm maintaining an antedeluvian Notes application which connects to a SAP back-end via a manually done 'Webservice'
The server is running Domino Release 7.0.4FP2 HF97.
The Webservice is not the more recently Webservice Consumer, but a large Java agent which is using Apache soap.jar (org.apache.soap). Below an example of the calling code.
private Call setupSOAPCall() {
Call call = new Call();
SOAPHTTPConnection conn = new SOAPHTTPConnection();
call.setSOAPTransport(conn);
call.setEncodingStyleURI(Constants.NS_URI_SOAP_ENC);
There has been a change in the SAP system which is now taking 8 minutes to complete (verified by SAP Team).
I'm getting an error message as follows:
[SOAPException: faultCode=SOAP-ENV:Client; msg=For input string: "906 "; targetException=java.lang.NumberFormatException: For input string: "906 "]
I found a blog article describing the error message quite closely:
https://thejavablog.wordpress.com/category/jmeter/
and I've come to the hypothesis that it is a timeout message that is returning to my Call object and that this timeout message is being incorrectly parsed, hence the NumberFormat Exception.
Looking at my logs I can see that there is a time difference of 62 seconds between my call and the response.
I recommended that the server setting in the server document, tab Internet Protocols/HTTP/Timeouts/Request timeouts be changed from 60 seconds to 600 seconds, and the http task restarted with
tell http restart
I've re-run the tests and I am getting the same error, and the time difference is still slightly more than 60 seconds, which is not what I was expecting.
I read Michael Rulnau's blog entry
http://www.mruhnau.net/2014/06/how-to-overcome-domino-webservice.html
which points to this APR
http://www-01.ibm.com/support/docview.wss?uid=swg1LO48272
but I'm not convinced that this would apply in this case, since there is no way that IBM would know that my Java agent is in fact making a Soap call.
My current hypothesis is that I have to use either the setTimeout() method on
org.apache.axis.client.Call
https://axis.apache.org/axis/java/apiDocs/org/apache/axis/client/Call.html
or on the org.apache.soap.transport.http.SOAPHTTPConnection
https://docs.oracle.com/cd/B13789_01/appdev.101/b12024/org/apache/soap/transport/http/SOAPHTTPConnection.html
and that the timeout value is an apache default, not something that is controlled by the Domino server.
I'd be grateful for any help.
I understand your approach, and I hope this is the correct one to solve your problem.
Add a debug (console write would be fine) that display the default Timeout then try to increase it to 10 min.
SOAPHTTPConnection conn = new SOAPHTTPConnection();
System.out.println("time out is :" + conn.getTimeout());
conn.setTimeout(600000);//10 min in ms
System.out.println("after setting it, time out is :" + conn.getTimeout());
call.setSOAPTransport(conn);
Now keep in mind that Dommino has also a Max LotusScript/Java execution time, check this value and (at least for a try) change it: http://www.ibm.com/support/knowledgecenter/SSKTMJ_9.0.1/admin/othr_servertasksagentmanagertab_r.html (it's version 9 help but this part should be identical)
I've since discovered that it wasn't my code generating the error; the default timeout for the apache axis SOAPHTTPConnetion is 0, i.e. no timeout.