Table manager does not purge chunks in Loki 2.4

I want chunk data older than 31 days to be deleted, so I created the following config:
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: "/var/lib/loki/boltdb-shipper-active"
    cache_location: "/var/lib/loki/boltdb-shipper-cache"
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /var/lib/loki/chunks

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: true
  retention_period: 31d
but the deletion never happens.
Did I forget something in the configuration?
Loki version: 2.4.1

Retention in Grafana Loki is achieved either through the Table Manager or the Compactor. In your case, boltdb-shipper is used for both index and chunks, and the table manager isn't a great choice for local storage.
Compactor-based retention is set to become the default and will have long-term support. It also supports more granular retention policies, per tenant and per stream (see the example after the snippet below).
Please check the retention guide here:
compactor:
  retention_enabled: true
  retention_delete_delay: 2h          # how long to delay the actual chunk deletion
  retention_delete_worker_count: 150

limits_config:
  retention_period: 30d
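For example, per-stream policies can be layered on top of the global period in limits_config. A minimal sketch based on the retention guide (the selector and periods are illustrative, not from your setup):

limits_config:
  retention_period: 30d               # global default
  retention_stream:
  - selector: '{namespace="dev"}'     # hypothetical label selector
    priority: 1
    period: 24h                       # overrides the global period for matching streams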

I'm not sure it's your case, but starting with Loki 2.4.0 the single binary no longer runs a table-manager. This impacts anyone in the following scenarios:
Running a single binary Loki with any index type other than boltdb-shipper or boltdb
Relying on retention with the configs retention_deletes_enabled and retention_period
For more information and how to resolve this issue, see the Loki Upgrade Guide here.
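If you do want to keep table-manager-based retention on 2.4+, my understanding of the upgrade guide is that you can run the table-manager as a dedicated target, roughly like this (config path illustrative, check the guide for your setup):

loki -config.file=/etc/loki/config.yaml -target=table-manager

Otherwise, migrating to the compactor-based retention shown above is the recommended path.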

I had the same issue with Loki 2.4.1: the chunks in the /data/loki/chunks directory were not deleted by the compactor. I tried every config variable I could find, and the issue turned out to be a bug in that version of Loki. Upgrading to 2.5.0 resolved it for me; chunks are now being removed, and once the retention period is reached the volume no longer grows.

Related

Spinnaker - SQL backend for front50

I am trying to setup SQL backend for front50 using the document below.
https://www.spinnaker.io/setup/productionize/persistence/front50-sql/
I have front50-local.yaml for the MySQL config.
But I am not sure how to disable persistent storage in the halyard config. I cannot remove persistentStorage entirely, and persistentStoreType must be one of a3, azs, gcs, redis, s3, oracle.
There is no option to disable persistent storage here:
persistentStorage:
  persistentStoreType: s3
  azs: {}
  gcs:
    rootFolder: front50
  redis: {}
  s3:
    bucket: spinnaker
    rootFolder: front50
    maxKeys: 1000
  oracle: {}
Within your front50-local.yaml you will want to disable the storage service you previously used, e.g.:
spinnaker:
  gcs:
    enabled: false
  s3:
    enabled: false
You may also need or want to remove the section from your halconfig and run your apply with:
hal deploy apply --no-validate
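For completeness, the SQL side of front50-local.yaml typically looks something like the sketch below, based on the persistence docs linked in the question; the JDBC URLs, users, and passwords are placeholders for your environment:

sql:
  enabled: true
  connectionPools:
    default:
      default: true
      jdbcUrl: jdbc:mysql://localhost:3306/front50   # placeholder
      user: front50_service                          # placeholder
      password: <password>
  migration:
    jdbcUrl: jdbc:mysql://localhost:3306/front50     # placeholder
    user: front50_migrate                            # placeholder
    password: <password>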
There are a number of users dealing with these same issues and some more help might be found on the Slack: https://join.spinnaker.io/
I've noticed the same issue just recently. Maybe this is because, for example, Kayenta (an optional component) is still missing support for non-object persistent storage.
I've created a GitHub issue on this here: https://github.com/spinnaker/spinnaker/issues/5447

Yarn local-dirs - per node setup

I've had a series of devops issues on our production cluster from time to time. Every now and then, the / partition gets overwhelmed on a couple of nodes. Long story short, it turns out that these nodes had one data drive instead of two. This would not be an issue if we didn't have the following setup on our cluster:
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/hadoop/yarn/local,/data2/hadoop/yarn/local</value>
</property>
Some devops person, noticing there is no /data2 partition on the smaller nodes, came up with the idea of simply pointing it at the / partition. Since / is only 16 GB, the more data-demanding jobs quickly fill it up.
Now, my question: does yarn support per-node setup of yarn.nodemanager.local-dirs?
I worked around the problem by removing /data2/hadoop/yarn/local from the configuration, but it doesn't feel perfect.
We're using HDP 2.6.4.
Thx!
YARN allows this, since each NodeManager reads its own local yarn-site.xml. However, I don't know how you would manage that through Ambari.
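In practice that means shipping a different value in the local yarn-site.xml on the single-drive nodes, e.g.:

<!-- yarn-site.xml on the nodes that only have /data1 -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/hadoop/yarn/local</value>
</property>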

Apache Drill - Hive Integration: Drill Not listing Tables

I have been trying to integrate Apache Drill with Hive using the Hive storage plugin configuration. I configured the storage plugin with all the required properties. In the Drill shell, I can view the Hive databases using:
Show Databases;
But when I try to list tables using:
Show Tables;
I get no results (no list of tables).
Below are the steps I followed, from the Apache Drill documentation and other sources:
I created a Drill distributed cluster by updating drill-override.conf with the same cluster ID on all nodes, along with the ZooKeeper IP and port, and then invoking drillbit.sh on each node.
I started the Drill shell using drill-conf and ensured that the Hive metastore service is active as well.
Below is configuration made in Hive Storage Plugin for Drill (from its Web-UI):
{
  "type": "hive",
  "configProps": {
    "hive.metastore.uris": "thrift://node02.cluster.com:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://node02.cluster.com/hive",
    "hive.metastore.warehouse.dir": "/apps/hive/warehouse",
    "fs.default.name": "hdfs://node01.cluster.com:8020",
    "hive.metastore.sasl.enabled": "false"
  },
  "enabled": true
}
All the properties were set by referring to hive-site.xml.
So that's what others have done to integrate Drill with Hive. Am I missing something here?
Regarding versions:
Drill: 1.14, Hive: 1.2 (Hive metastore: MySQL)
We also have HiveServer2 on the same nodes; is that causing any issues?
I just want to integrate Drill with Hive 1.2. Am I doing it right?
Any pointers would be helpful; I have spent nearly two days trying to get this right.
Thanks for your time.
Starting from version 1.13, Drill leverages the Hive 2.3.2 client.
It is recommended to use Hive 2.3 to avoid unpredictable issues.
Regarding your setup, please remove all configProps except hive.metastore.uris.
The other configs can keep their defaults (set in HiveConf.java) or can be specified in your hive-site.xml.
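With that trimmed down, the plugin definition from the question reduces to:

{
  "type": "hive",
  "configProps": {
    "hive.metastore.uris": "thrift://node02.cluster.com:9083"
  },
  "enabled": true
}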
Also, if Show Tables; returns an empty result even after executing use hive, check Drill's log files for errors. If there is an error, you can create a Jira ticket to improve Drill's output to reflect that issue.
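In other words, select the hive schema first, then list its tables (this assumes your tables live in Hive's default database):

USE hive;
SHOW TABLES;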

How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters, one in a local virtual machine and another in a remote cloud. Both clusters are in standalone mode.
My Environment:
Scala: 2.10.4
Spark: 1.5.1
JDK: 1.8.40
OS: CentOS Linux release 7.1.1503 (Core)
The local cluster:
Spark Master: spark://local1:7077
The remote cluster:
Spark Master: spark://remote1:7077
I want to finish this:
Write code (just a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark master URL to spark://local1:7077 or spark://remote1:7077, then run the code from IntelliJ IDEA. That is, I don't want to use spark-submit to submit the job.
But I got some problem:
When I use the local cluster, everything goes well: both running the code from IntelliJ IDEA and using spark-submit submit the job to the cluster and complete it.
But when I use the remote cluster, I get a warning log:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
(Note it says sufficient resources, not sufficient memory!)
This log line keeps printing with no further progress. Both spark-submit and running the code from IntelliJ IDEA give the same result.
I want to know:
Is it possible to submit code from IntelliJ IDEA to the remote cluster?
If so, what configuration does it need?
What are the possible causes of my problem?
How can I resolve it?
Thanks a lot!
Update
There is a similar question here, but I think my scenario is different. When I run my code in IntelliJ IDEA with the Spark master set to the local virtual machine cluster, it works; with the remote cluster, I get the "Initial job has not accepted any resources..." warning instead.
I want to know whether a security policy or firewall could cause this.
Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the least, there is a variety of environment settings and considerations, handled by the spark-submit script, that are quite difficult to replicate within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads in the Spark developer community on the topic.
My answer here addresses one portion of your post, specifically the warning:
TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have
sufficient resources
The typical reason is a mismatch between the memory and/or number of cores your job requests and what is available on the cluster. Possibly, when submitting from IntelliJ, the settings in
$SPARK_HOME/conf/spark-defaults.conf
did not match the parameters required for your task on the existing cluster. You may need to update:
spark.driver.memory 4g
spark.executor.memory 8g
spark.executor.cores 8
You can check the Spark UI on port 8080 to verify that the resources you requested are actually available on the cluster.
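As a sketch of what that means when launching from the IDE: the spark-submit script is bypassed, so spark-defaults.conf is not read and resource requests should go on the SparkConf itself. The resource values and input path below are illustrative; the master URL is the one from the question:

import org.apache.spark.{SparkConf, SparkContext}

object RemoteWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("word-count")
      .setMaster("spark://remote1:7077")  // master URL from the question
      .set("spark.executor.memory", "1g") // must fit what each worker offers
      .set("spark.cores.max", "2")        // don't request more cores than exist
    val sc = new SparkContext(conf)
    val counts = sc.textFile("hdfs:///tmp/input.txt") // illustrative input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.take(10).foreach(println)
    sc.stop()
  }
}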

Amazon EC2 || RHEL || Connection refused on port 22 after reboot

I am aware that this question has been asked many times in forums, and I have tried all the solutions mentioned in them, but no luck.
Actually, I suspect that the last time I tried to replace /etc/sysconfig/iptables with my own iptables rules, I mistakenly replaced /etc/init.d/iptables instead and restarted the machine. As expected, it didn't start. I then detached the EBS volume from this instance, attached it to a new RHEL instance, and fixed the mess by copying back /etc/init.d/iptables from a backup (I used to take backups before replacements :) ), and did the same for /etc/sysconfig/iptables.
I had also put some custom startup scripts in the /etc/init.d folder so our application starts on instance reboot. I removed those too, to make sure none of my scripts is causing this. But the system still won't let me connect via SSH. The AWS console shows 2/2 checks passing, yet I am not able to connect on port 22.
Here are the last few lines of the system log, which indicate that something goes wrong during or after iptables startup, but not what. :(
blkfront: xvde1: barriers disabled
Changing capacity of (202, 65) to 62914560 sectors
xvde1: detected capacity change from 0 to 32212254720
EXT4-fs (xvde1): mounted filesystem with ordered data mode. Opts:
dracut: Mounted root filesystem /dev/xvde1
dracut: Loading SELinux policy
type=1404 audit(1398404320.826:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1398404321.795:3): policy loaded auid=4294967295 ses=4294967295
dracut:
dracut: Switching root
udev: starting version 147
Initialising Xen virtual ethernet driver.
microcode: CPU0 sig=0x306e4, pf=0x1, revision=0x415
platform microcode: firmware: requesting intel-ucode/06-3e-04
Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
ip6_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
ip_tables: (C) 2000-2006 Netfilter Core Team
Can anyone help me identify what is going wrong here?
Got it fixed.
Actually, it was not an iptables problem. It was again the known bug in RHEL 6.4 on EC2 that puts wrong entries in the sshd_config file. Although I had checked this file for wrong entries in my first attempt to resolve the issue, somehow they were being created again, maybe because every time I start a new machine from my AMI or a new RHEL 6.4 AMI, the AMI is still registered as 6.4 even though the OS on the disk has been updated to 6.5. That may be why the wrong entries kept appearing in sshd_config. I fixed the file again, created a new AMI based on RHEL 6.5, attached the EBS volume from the instance created with my RHEL 6.4 AMI, and it works fine now.
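For anyone else hitting this, the recovery path described above looks roughly like the following sketch; the device name and mount point are illustrative:

# On a healthy rescue instance, with the broken root volume attached as e.g. /dev/xvdf:
sudo mount /dev/xvdf1 /mnt/rescue
sudo vi /mnt/rescue/etc/ssh/sshd_config   # remove or fix the bad entries
sudo umount /mnt/rescue
# Then detach the volume and re-attach it to the original instance as its root device.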