How to clear unused space in Fuseki - sparql

I've observed that, even after dropping graphs from a Fuseki dataset (using the DROP GRAPH command), the actual size of the "run/databases" folder does not decrease. I recently read about the backup-and-restore mechanism for solving this issue, and wanted to know whether any alternative approach is available. Also, does this size issue happen in Fuseki 3.x versions? I've observed it in Fuseki 2.4.0.
Thanks in advance!

This answer relates to the current Apache Jena Fuseki, version 4.2.0.
TDB2 has a compaction tool, tdb2.tdbcompact (only run this when Fuseki is not running).
Alternatively, depending on your setup, curl -XPOST http://server:port/$/compact/<datasetname> will compact a database in a running server.
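A minimal sketch of both routes, assuming a dataset named "mydataset" served on the default port 3030 and stored under run/databases (adjust the name, port and path to your setup):

# Offline compaction: stop Fuseki first, then point the tool at the database directory
tdb2.tdbcompact --loc run/databases/mydataset

# Online compaction against a running server, via the admin protocol
curl -XPOST http://localhost:3030/$/compact/mydataset

Note that TDB2 compaction writes a new Data-NNNN generation next to the old one, so you may still need to delete the old generation directory afterwards before the folder size actually drops.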

Related

VMware: resize a disk using the vCenter API

I have been trying to solve this for the past week.
I'm using the vcenter API to add a new disk to an existing VM
https://vdc-repo.vmware.com/vmwb-repository/dcr-public/1cd28284-3b72-4885-9e31-d1c6d9e26686/71ef7304-a6c9-43b3-a3cd-868b2c236c81/doc/operations/com/vmware/vcenter/vm/hardware/disk.create-operation.html
and was able to do it successfully.
But I cannot figure out how to resize an existing VM disk.
https://vdc-repo.vmware.com/vmwb-repository/dcr-public/1cd28284-3b72-4885-9e31-d1c6d9e26686/71ef7304-a6c9-43b3-a3cd-868b2c236c81/doc/operations/com/vmware/vcenter/vm/hardware/disk.update-operation.html
This disk update operation does not allow updating the "capacity" attribute, so I'm not sure how to solve this unless I use an SDK.
Can someone please point me in the right direction?
I'm not 100% up to speed on the latest version, but there are several things that the REST API cannot do compared to the "old" SDK, which is based on SOAP/WSDL.
The documentation on the page also states that the call only: "Updates the configuration of a virtual disk. An update operation can be used to detach the existing VMDK file and attach another VMDK file to the virtual machine." So there's no mention of changing the size (which is pretty lame, I have to say...).
So unfortunately it seems like you can either:
Wait for a new version and hope this will be included, or
Use the good old SDK.
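If coding against the SOAP API directly is too heavy, one possible shortcut is govmomi's govc CLI, which talks to that same SOAP API. A sketch under the assumption of a VM named "my-vm" whose first disk is labelled "Hard disk 1"; verify the exact flags with govc vm.disk.change -h on your version:

# Assumes GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD point at your vCenter
# Grow the first virtual disk of "my-vm" to 100 GB (vSphere generally cannot shrink disks)
govc vm.disk.change -vm my-vm -disk.label "Hard disk 1" -size 100G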

Host Disk Usage: Warning message regarding disk usage

I've downloaded version HDF_3.0.2.0_vmware of the Hortonworks Sandbox. I am using VMware Player version 6.0.7 on my laptop. Shortly after starting up and logging into Ambari, I see this alert:
The message that is cut off reads: "Capacity Used: [60.11%, 32.3 GB], Capacity Total: [53.7 GB], path=/usr/hdp". I'd hoped I would be able to focus on NiFi/Storm development rather than administering the sandbox itself; however, it looks like the VM is undersized. Here are the VM settings I have for storage. How do I go about correcting the underlying issue prompting the alert?
I had a similar issue; it's about node partitioning and the directories mounted for data under HDFS -> Configs -> Settings -> DataNode.
You can check your node partitioning using the command below:
lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
Usually the HDFS NameNode or DataNode directories point to the root partition. You can change the alert threshold values as a temporary fix; for a permanent solution, add additional data directories.
The links below can be helpful for doing this; a sketch of the steps follows after them.
https://community.hortonworks.com/questions/21212/configure-storage-capacity-of-hadoop-cluster.html
From the first link: I think your partitioning is wrong, and you are not using "/" for the HDFS directory. If you want to use the full disk capacity, you can create a folder under "/" (for example /data/1) on every data node using "mkdir -p /data/1", add it to dfs.datanode.data.dir, and restart the HDFS service.
https://hadooptips.wordpress.com/2015/10/16/fixing-ambari-agent-disk-usage-alert-critical/
https://community.hortonworks.com/questions/21687/how-to-increase-the-capacity-of-hdfs.html
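A sketch of those steps on the command line, to be run on every data node; the hdfs:hadoop ownership is an assumption, so check which user runs your DataNode:

# Create the new data directory and hand it to the HDFS service user
mkdir -p /data/1
chown -R hdfs:hadoop /data/1

# Then, in Ambari: HDFS -> Configs -> Settings -> DataNode directories,
# append /data/1 to dfs.datanode.data.dir and restart the HDFS service.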
I am not currently able to replicate this, but based on the screenshots the warning just means that there is less space available than recommended. If this is the case, everything should still work.
Given that this is a sandbox that should never be used for production, feel free to ignore the warning.
If you want to get rid of the warning sign, a quick fix may be to change the warning threshold via the alert definition.
If this is still not sufficient, or you want to leverage more storage, please follow the steps outlined by @manohar.

How to specify the TGT Kerberos ticket cache in beeline

I have a scenario where I want to make Hive JDBC connections using multiple users/principals. I can get multiple Kerberos tickets and store them in different cache files; for example, one could be in /tmp/ticket1 and the other in /tmp/ticket2. However, when I execute beeline, how do I specify which ticket to use? I want to run queries as different users.
AFAIK you can't. The whole Hadoop ecosystem assumes that you use a ticket cache in the default location - even legitimate KRB5 environment variables are ignored (or just mess with some hardcoded defaults somewhere).
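For reference, this is the standard Kerberos way of selecting a cache, which the Hadoop stack reportedly ignores as described above; it may still be worth a quick test on your particular versions. The realm, host and principal are placeholders:

# Obtain a ticket into an explicit cache file
kinit -c /tmp/ticket1 user1@EXAMPLE.COM

# Point the session at that cache before launching beeline (often not honored by the JVM/GSS layer)
KRB5CCNAME=/tmp/ticket1 beeline -u "jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@EXAMPLE.COM"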
On my current assignment I had to develop a crude "Beeline emulator" in plain Java to handle that issue. It took me weeks to troubleshoot the KRB and the GSS configuration issues, including weird syntax inconsistencies between various versions of OpenJDK and Sun JRE (plus Linux vs. Windows), but finally I got it working.
And no, I will never share it with anyone outside of my Big Corp client with Big Lawyer staff...

Host Solr after creating a new collection

I tried the example provided with the Apache Solr package.
I was then trying to create a new data collection for my own schema and configuration.
How should I start running Solr in that case? When I was running the example, there was a start.jar in the example directory to start it. Will the same jar work for my case?
If not, how do I create an executable for it?
The first line on the Solr install page says: "Solr already includes a working demo server in the example directory that you may use as a template." http://wiki.apache.org/solr/SolrInstall#Setup
Even if the recommended server is Tomcat, I have a feeling Jetty will work just as well for you. Having the index production-ready is more about knowing your fields and query patterns really well, and optimizing the index through the schema and config for speed according to those patterns.
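A minimal sketch of reusing the example's Jetty launcher with your own configuration, assuming the Solr 4.x-era layout the wiki describes; the home directory path is a placeholder:

cd example
# Point the bundled Jetty at your own Solr home (the directory holding solr.xml and your collection's conf/)
java -Dsolr.solr.home=/path/to/my/solr/home -jar start.jar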

Errors adding users to MongoDB on Ubuntu Linux

I am trying to add admin users to a MongoDB instance running on Ubuntu Linux on AWS.
Working from the mongo shell, I first run 'use admin'; then, when I run db.addUser("admin", "password"),
the command fails with "Can't take a write lock while out of disk space".
I checked disk space and there is 1 GB remaining. Any help?
I have been working with EC2 instances for some years, and I have seen similar errors with software that has nothing in common with MongoDB, so I bet it's a problem related to EC2 disk volume management rather than a MongoDB issue.
In my opinion it should be one of the following errors:
You have started MongoDB with a user that cannot modify the Mongo data directory. Are you sure that you started MongoDB using a user with write permissions on the data directory?
The MongoDB data directory points to a full disk volume (this is common when software is installed with apt, yum, or another package manager on Amazon EC2 instances). Check your MongoDB data directory configuration and use 'df -h' on the command line to see how much disk space is available.
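A quick sketch of those two checks, assuming the stock Ubuntu package layout with data in /var/lib/mongodb (verify the dbpath in your /etc/mongodb.conf):

# How much space is left on the volume that holds the data directory?
df -h /var/lib/mongodb

# Can the user running mongod actually write there?
ls -ld /var/lib/mongodb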