LibGit2Sharp equivalent of git diff --stat

I'm looking for a way to capture how many lines have changed in each file in my working directory - like git diff --stat in git - is there a way to do this with LibGit2Sharp?
I know I can get the total LinesAdded/LinesDeleted from a Patch, but I'm wondering about a file-by-file breakdown.

The following will enumerate all the files that have been changed between two commits, along with the number of changes for each (total, line additions, and line removals).
var patch = repo.Diff.Compare<Patch>(fromCommit, untilCommit);
foreach (var pec in patch)
{
    Console.WriteLine("{0} = {1} ({2}+ and {3}-)",
        pec.Path,
        pec.LinesAdded + pec.LinesDeleted,
        pec.LinesAdded,
        pec.LinesDeleted);
}
Should you need to access a specific file within a Patch, the type exposes an indexer to make that easier:
PatchEntryChanges entryChanges = patch["path/to/my/file.txt"];

How to determine why process fails in kotlin script

I am developing a Kotlin script which executes commands on the platform it is running on. Platform commands are invoked from the script using this method:
fun exec(command: String, vararg arguments: String, runLive: Boolean = !isDebug): OutputStream {
    val allArgs = arguments.joinToString(" ")
    if (runLive) {
        val process = Runtime.getRuntime().exec(command, arguments)
        val exitCode = process.waitFor()
        if (exitCode != 0) {
            val platformError = String(BufferedInputStream(process.errorStream).readAllBytes(), Charset.defaultCharset())
            throw IllegalStateException("Execution of '$command $allArgs' failed with exit code $exitCode!\n$platformError")
        }
        return process.outputStream
    } else {
        println("$command $allArgs")
        return object : OutputStream() {
            override fun write(b: Int) {
                println("dummy $b")
            }
        }
    }
}
In the script, I try to get all the tags for a git repository using this call:
exec(command = "git", "tag", runLive = true)
When the command fails, how can process.errorStream be read? Right now the output is not readable, and the script failure only says:
java.lang.IllegalStateException: Execution of 'git tag' failed with exit code 1!
at Snap_tag_main.exec(snap-tag.main.kts:75)
at Snap_tag_main.<init>(snap-tag.main.kts:64)
Not all commands write their errors to the error stream. The solution is to read both the process's standard output (inputStream) and its error stream (errorStream), like this:
if (exitCode != 0) {
    val platformMessages = """
        ${String(BufferedInputStream(process.inputStream).readAllBytes(), Charset.defaultCharset())}
        ${String(BufferedInputStream(process.errorStream).readAllBytes(), Charset.defaultCharset())}
    """.trimIndent().trim()
    throw IllegalStateException("Execution of '$command $allArgs' failed with exit code $exitCode!\n$platformMessages\n---")
}
The failure message now looks like
java.lang.IllegalStateException: Execution of 'git tag' failed with exit code 1!
usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
[--super-prefix=<path>] [--config-env=<name>=<envvar>]
<command> [<args>]
These are common Git commands used in various situations:
start a working area (see also: git help tutorial)
clone Clone a repository into a new directory
init Create an empty Git repository or reinitialize an existing one
work on the current change (see also: git help everyday)
add Add file contents to the index
mv Move or rename a file, a directory, or a symlink
restore Restore working tree files
rm Remove files from the working tree and from the index
examine the history and state (see also: git help revisions)
bisect Use binary search to find the commit that introduced a bug
diff Show changes between commits, commit and working tree, etc
grep Print lines matching a pattern
log Show commit logs
show Show various types of objects
status Show the working tree status
grow, mark and tweak your common history
branch List, create, or delete branches
commit Record changes to the repository
merge Join two or more development histories together
rebase Reapply commits on top of another base tip
reset Reset current HEAD to the specified state
switch Switch branches
tag Create, list, delete or verify a tag object signed with GPG
collaborate (see also: git help workflows)
fetch Download objects and refs from another repository
pull Fetch from and integrate with another repository or a local branch
push Update remote refs along with associated objects
'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
See 'git help git' for an overview of the system.
---
at Snap_tag_main.exec(snap-tag.main.kts:82)
at Snap_tag_main.<init>(snap-tag.main.kts:66)
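As a side note (not part of the answer above), another way to avoid missing diagnostics is to merge the two streams at the process level and read everything from a single stream. A minimal Java sketch of that idea, using the same JDK process API the Kotlin script relies on (assumes Java 9+ for readAllBytes()):
import java.io.IOException;
import java.nio.charset.Charset;

public class ExecDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // redirectErrorStream(true) merges stderr into stdout, so commands that
        // report problems on either stream end up in one readable stream.
        ProcessBuilder pb = new ProcessBuilder("git", "tag");
        pb.redirectErrorStream(true);
        Process process = pb.start();

        // Drain the combined output before waitFor(); reading first also avoids
        // blocking when a command writes more than the pipe buffer can hold.
        String output = new String(process.getInputStream().readAllBytes(),
                Charset.defaultCharset());
        int exitCode = process.waitFor();

        if (exitCode != 0) {
            throw new IllegalStateException(
                    "Execution of 'git tag' failed with exit code " + exitCode + "!\n" + output);
        }
        System.out.print(output);
    }
}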

How to use the taildir source in Flume to append only newest lines of a .txt file?

I recently asked the question Apache Flume - send only new file contents
I am rephrasing the question in order to learn more and provide more benefit to future users of Flume.
Setup: Two servers, one with a .txt file that gets lines appended to it regularly.
Goal: Use the Flume TAILDIR source to append the most recently written line to a file on the other server.
Issue: Whenever the source file has a new line of data added, the current configuration appends everything in the file on server 1 to the file on server 2. This results in duplicate lines in the file on server 2 and does not properly recreate the file from server 1.
Configuration on server 1:
#configure the agent
agent.sources=r1
agent.channels=k1
agent.sinks=c1
#using memory channel to hold up to 1000 events
agent.channels.k1.type=memory
agent.channels.k1.capacity=1000
agent.channels.k1.transactionCapacity=100
#connect source, channel,sink
agent.sources.r1.channels=k1
agent.sinks.c1.channel=k1
#define source
agent.sources.r1.type=TAILDIR
agent.sources.r1.channels=k1
agent.sources.r1.filegroups=f1
agent.sources.r1.filegroups.f1=/home/tail_test_dir/test.txt
agent.sources.r1.maxBackoffSleep=1000
#connect to another box using avro and send the data
agent.sinks.c1.type=avro
agent.sinks.c1.hostname=10.10.10.4
agent.sinks.c1.port=4545
Configuration on server 2:
#configure the agent
agent.sources=r1
agent.channels=k1
agent.sinks=c1
#using memory channel to hold up to 1000 events
agent.channels.k1.type=memory
agent.channels.k1.capacity=1000
agent.channels.k1.transactionCapacity=100
#connect source, channel, sink
agent.sources.r1.channels=k1
agent.sinks.c1.channel=k1
#here source is listening at the specified port using AVRO for data
agent.sources.r1.type=avro
agent.sources.r1.bind=0.0.0.0
agent.sources.r1.port=4545
#use file_roll and write file at specified directory
agent.sinks.c1.type=file_roll
agent.sinks.c1.sink.directory=/home/Flume_dump
You have to set a position file. The source then records the position it has read up to in that JSON file and writes only newly added lines to the sink.
e.g. agent.sources.r1.positionFile = /var/log/flume/tail_position.json
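For illustration only (not part of the answer above), the server 1 source definition could look like this with a position file added; positionFile is the standard TAILDIR property, and the path shown is just an example:
#define source with a position file so only newly appended lines are shipped
agent.sources.r1.type=TAILDIR
agent.sources.r1.channels=k1
agent.sources.r1.filegroups=f1
agent.sources.r1.filegroups.f1=/home/tail_test_dir/test.txt
agent.sources.r1.positionFile=/var/log/flume/tail_position.json
agent.sources.r1.maxBackoffSleep=1000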

Remove artifacts from CI manually

I have a private repository at gitlab.com that uses the CI feature. Some of the CI jobs create artifact files that are stored. I just implemented automatic deletion of artifacts after one day by adding this to the CI configuration:
expire_in: 1 day
That works great - however, old artifacts won't be deleted (as expected). So my question is:
How can I delete old artifacts or artifacts that do not expire? (on gitlab.com, no direct access to the server)
You can use the GitLab REST API to delete the artifacts from the jobs if you don't have direct access to the server. Here's a sample curl script that uses the API:
#!/bin/bash
# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="3034900"
# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="Lifg_azxDyRp8eyNFRfg"
server="gitlab.com"
# go to https://gitlab.com/[organization name]/[repository name]/-/jobs
# then open JavaScript console
# copy/paste => copy(_.uniq($('.ci-status').map((x, e) => /([0-9]+)/.exec(e.href)).toArray()).join(' '))
# press enter, and then copy the result here :
# repeat for every page you want
job_ids=(48875658 48874137 48873496 48872419)
for job_id in "${job_ids[@]}"
do
    URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
    echo "$URL"
    curl --request POST --header "PRIVATE-TOKEN:${token}" "$URL"
    echo -e "\n"
done
An API call should be easier to script, with GitLab 14.7 (January 2022), which now offers:
Bulk delete artifacts with the API
While a good strategy for managing storage consumption is to set regular expiration policies for artifacts, sometimes you need to reduce items in storage right away.
Previously, you might have used a script to automate the tedious task of deleting artifacts one by one with API calls, but now you can use a new API endpoint to bulk delete job artifacts quickly and easily.
See Documentation, Issue 223793 and Merge Request 75488.
curl --request DELETE --header "PRIVATE-TOKEN: <your_access_token>" \
"https://gitlab.example.com/api/v4/projects/1/artifacts"
Building on top of @David's answer,
@Philipp pointed out that there is now an API endpoint to delete only the job artifacts instead of the entire job.
You can run this script directly in the browser's Dev Tools console, or use node-fetch to run it in node.js.
//Go to: https://gitlab.com/profile/personal_access_tokens
const API_KEY = "API_KEY";
//You can find project id inside the "General project settings" tab
const PROJECT_ID = 12345678;
const PROJECT_URL = "https://gitlab.com/api/v4/projects/" + PROJECT_ID + "/"
let jobs = [];
for (let i = 0, currentJobs = []; i == 0 || currentJobs.length > 0; i++) {
    currentJobs = await sendApiRequest(
        PROJECT_URL + "jobs/?per_page=100&page=" + (i + 1)
    ).then(e => e.json());
    jobs = jobs.concat(currentJobs);
}

//skip jobs without artifacts
jobs = jobs.filter(e => e.artifacts);
//keep the latest build
jobs.shift();

for (let job of jobs)
    await sendApiRequest(
        PROJECT_URL + "jobs/" + job.id + "/artifacts",
        {method: "DELETE"}
    );

async function sendApiRequest(url, options = {}) {
    if (!options.headers)
        options.headers = {};
    options.headers["PRIVATE-TOKEN"] = API_KEY;
    return fetch(url, options);
}
According to the documentation, deleting the entire job log (click on the trash can) will also delete the artifacts.
I am on GitLab 8.17 and am able to remove artifacts for a particular job by navigating to the storage directory on the server itself; the default path is:
/var/opt/gitlab/gitlab-rails/shared/artifacts/<year_month>/<project_id?>/<jobid>
Removing either the job's whole folder or just its contents makes the artifact view disappear from the GitLab pipeline page.
The storage path can be changed as described in docs:
https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/administration/job_artifacts.md#storing-job-artifacts
If you have deleted all the jobs by accident (thinking the artifacts would be gone, but they weren't), what is the alternative other than brute-forcing a range of job IDs?
I have this code, which brute-forces a range of job IDs. Since I use the gitlab.com public runners, it's a long range:
# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="xxxxxx" #
# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="yyyyy"
server="gitlab.com"
# Get a range from the oldest known job to the latest known one, then brute-force. Used in the case when you deleted pipelines and can't retrieve job IDs.
# https://stackoverflow.com/questions/52609966/for-loop-over-sequence-of-large-numbers-in-bash
for (( job_id = 59216999; job_id <= 190239535; job_id++ )); do
    echo "$job_id"
    echo "Job ID being deleted is $job_id"
    curl --request POST --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
    echo -en '\n'
    echo -en '\n'
done
This Python solution worked for me with GitLab 13.11.3.
#!/bin/python3
# delete_artifacts.py
import json
import requests
# adapt accordingly
base_url='https://gitlab.example.com'
project_id='1234'
access_token='123412341234'
#
# Get Version Tested with Version 13.11.3
# cf. https://docs.gitlab.com/ee/api/version.html#version-api
#
print('GET /version')
x = requests.get(f"{base_url}/api/v4/version", headers={"PRIVATE-TOKEN": access_token})
print(x)
data = json.loads(x.text)
print(f'Using GitLab version {data["version"]}. Tested with 13.11.3')
#
# List project jobs
# cf. https://docs.gitlab.com/ee/api/jobs.html#list-project-jobs
#
request_str = f'projects/{project_id}/jobs'
url = f'{base_url}/api/v4/{request_str}'
print(f'GET /{request_str}')
x = requests.get(url, headers={"PRIVATE-TOKEN": access_token})
print(x)
data = json.loads(x.text)
input('WARNING: This will delete all artifacts. Job logs will remain available. Press Enter to continue...')
#
# Delete job artifacts
# cf. https://docs.gitlab.com/ee/api/job_artifacts.html#delete-artifacts
#
for entry in data:
    request_str = f'projects/{project_id}/jobs/{entry["id"]}/artifacts'
    url = f'{base_url}/api/v4/{request_str}'
    print(f'DELETE /{request_str}')
    x = requests.delete(url, headers={"PRIVATE-TOKEN": access_token})
    print(x)
I'll keep an updated version here. Feel free to reach out and improve the code.

How can I inspect a Hadoop SequenceFile for which I lack full schema information?

I have a compressed Hadoop SequenceFile from a customer which I'd like to inspect. I do not have full schema information at this time (which I'm working on separately).
But in the interim (and in the hopes of a generic solution), what are my options for inspecting the file?
I found a tool called forqlift: http://www.exmachinatech.net/01/forqlift/
I have tried 'forqlift list' on the file. It complains that it can't load the classes for the custom Writable subclasses included, so I will need to track down those implementations.
But is there any other option available in the meantime? I understand that most likely I can't extract the data, but is there some tool for scanning how many key/value pairs there are and of what type?
From shell:
$ hdfs dfs -text /user/hive/warehouse/table_seq/000000_0
or directly from hive (which is much faster for small files, because it is running in an already started JVM)
hive> dfs -text /user/hive/warehouse/table_seq/000000_0
works for sequence files.
Check the SequenceFileReadDemo class in the 'Hadoop: The Definitive Guide' sample code. Sequence files have the key/value types embedded in them. Use SequenceFile.Reader.getKeyClass() and SequenceFile.Reader.getValueClass() to get the type information.
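For example, a minimal sketch of that idea (the path is a placeholder and the Hadoop client libraries are assumed to be on the classpath):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class PrintSeqFileTypes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Placeholder path; point it at the file you want to inspect.
        Path path = new Path("/user/hive/warehouse/table_seq/000000_0");

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            // The key and value classes are recorded in the file header.
            System.out.println("key class:   " + reader.getKeyClassName());
            System.out.println("value class: " + reader.getValueClassName());
            System.out.println("compressed:  " + reader.isCompressed()
                    + " (block compressed: " + reader.isBlockCompressed() + ")");
        } finally {
            reader.close();
        }
    }
}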
My first thought would be to use the Java API for sequence files to try to read them. Even if you don't know which Writable is used by the file, you can guess and check the error messages (there may be a better way that I don't know).
For example:
private void readSeqFile(Path pathToFile) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, pathToFile, conf);
    Text key = new Text(); // this could be the wrong type
    Text val = new Text(); // also could be wrong
    while (reader.next(key, val)) {
        System.out.println(key + ":" + val);
    }
}
This program would crash if those are the wrong types, but the Exception should say which Writable type the key and value actually are.
Edit:
Actually if you do less file.seq usually you can read some of the header and see what the Writable types are (at least for the first key/value). On one file, for example, I see:
SEQ^F^Yorg.apache.hadoop.io.Text"org.apache.hadoop.io.BytesWritable
I'm not a Java or Hadoop programmer, so my way of solving the problem may not be the best one, but here it is anyway.
I spent two days solving the problem of reading a SequenceFile locally (Linux Debian amd64) without installing Hadoop.
The provided sample
while (reader.next(key, val)) {
    System.out.println(key + ":" + val);
}
works well for Text, but didn't work for BytesWritable compressed input data.
What did I do? I downloaded this utility for creating (writing) Hadoop SequenceFile data:
github.com/shsdev/sequencefile-utility/archive/master.zip
got it working, and then modified it for reading input Hadoop SequenceFiles.
The instructions for running this utility from scratch on Debian:
sudo apt-get install maven2
sudo mvn install
sudo apt-get install openjdk-7-jdk
Edit /usr/bin/mvn (e.g. sudo vi /usr/bin/mvn) and change `which java` to `which /usr/lib/jvm/java-7-openjdk-amd64/bin/java`.
I've also added the following to ~/.bashrc (probably not required):
PATH="/home/mine/perl5/bin${PATH+:}${PATH};/usr/lib/jvm/java-7-openjdk-amd64/"; export PATH;
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
export JAVA_VERSION=1.7
Then usage:
sudo mvn install
~/hadoop_tools/sequencefile-utility/sequencefile-utility-master$ /usr/lib/jvm/java-7-openjdk-amd64/bin/java -jar ./target/sequencefile-utility-1.0-jar-with-dependencies.jar
-- and this doesn't break the default Java 1.6 installation that is required for Firefox etc.
To resolve a SequenceFile compatibility error (e.g. "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"), I used the libraries from the Hadoop master server as-is (a kind of hack):
scp root@10.15.150.223:/usr/lib/libhadoop.so.1.0.0 ~/
sudo cp ~/libhadoop.so.1.0.0 /usr/lib/
scp root@10.15.150.223:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64/server/libjvm.so ~/
sudo cp ~/libjvm.so /usr/lib/
sudo ln -s /usr/lib/libhadoop.so.1.0.0 /usr/lib/libhadoop.so.1
sudo ln -s /usr/lib/libhadoop.so.1.0.0 /usr/lib/libhadoop.so
One night over coffee I wrote this code for reading Hadoop SequenceFile input files (run with "/usr/lib/jvm/java-7-openjdk-amd64/bin/java -jar ./target/sequencefile-utility-1.3-jar-with-dependencies.jar -d test/ -c NONE"):
import org.apache.hadoop.io.*;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.ValueBytes;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;

// fs, conf, logger and reader come from the surrounding utility class
Path file = new Path("/home/mine/mycompany/task13/data/2015-08-30");
reader = new SequenceFile.Reader(fs, file, conf);
long pos = reader.getPosition();
logger.info("GO from pos " + pos);
DataOutputBuffer rawKey = new DataOutputBuffer();
ValueBytes rawValue = reader.createValueBytes();

int DEFAULT_BUFFER_SIZE = 1024 * 1024;
DataOutputBuffer kobuf = new DataOutputBuffer(DEFAULT_BUFFER_SIZE);
kobuf.reset();

int rl;
do {
    // nextRaw reads the raw (possibly compressed) key/value bytes
    rl = reader.nextRaw(kobuf, rawValue);
    logger.info("read len for current record: " + rl + " and in more details ");
    if (rl >= 0) {
        logger.info("read key " + new String(kobuf.getData()) + " (keylen " + kobuf.getLength() + ") and data " + rawValue.getSize());
        FileOutputStream fos = new FileOutputStream("/home/mine/outb");
        DataOutputStream dos = new DataOutputStream(fos);
        rawValue.writeUncompressedBytes(dos);
        kobuf.reset();
    }
} while (rl > 0);
I've just added this chunk of code to file src/main/java/eu/scape_project/tb/lsdr/seqfileutility/SequenceFileWriter.java just after the line
writer = SequenceFile.createWriter(fs, conf, path, keyClass,
valueClass, CompressionType.get(pc.getCompressionType()));
Thanks to these sources of info:
If using hadoop-core instead of mahout, you will have to download asm-3.1.jar manually:
search.maven.org/remotecontent?filepath=org/ow2/util/asm/asm/3.1/asm-3.1.jar
search.maven.org/#search|ga|1|asm-3.1
The list of available Mahout repos:
repo1.maven.org/maven2/org/apache/mahout/
Intro to Mahout:
mahout.apache.org/
Good resource for learning the interfaces and sources of Hadoop Java classes (I used it for writing my own code for reading SequenceFiles):
http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.action/0.2.7/org/apache/hadoop/io/BytesWritable.java
Sources of the tb-lsdr-seqfilecreator project that I used for creating my own SequenceFile reader:
www.javased.com/?source_dir=scape/tb-lsdr-seqfilecreator/src/main/java/eu/scape_project/tb/lsdr/seqfileutility/ProcessParameters.java
https://stackoverflow.com/questions/5096128/sequence-files-in-hadoop - the same example (reading key/value, which didn't work)
https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/RawSequenceFileRecordReader.java - this one helped me (I used reader.nextRaw, the same as in nextKeyValue() and other subs)
I also changed ./pom.xml to use native apache.hadoop instead of mahout.hadoop, but this is probably not required, because the bug with read->next(key, value) is the same for both, which is why I had to use read->nextRaw(keyRaw, valueRaw) instead:
diff ../../sequencefile-utility/sequencefile-utility-master/pom.xml ./pom.xml
9c9
< <version>1.0</version>
---
> <version>1.3</version>
63c63
< <version>2.0.1</version>
---
> <version>2.4</version>
85c85
< <groupId>org.apache.mahout.hadoop</groupId>
---
> <groupId>org.apache.hadoop</groupId>
87c87
< <version>0.20.1</version>
---
> <version>1.1.2</version>
93c93
< <version>1.1</version>
---
> <version>1.1.3</version>
I was just playing with Dumbo. When you run a Dumbo job on a Hadoop cluster, the output is a sequence file. I used the following to dump out an entire Dumbo-generated sequence file as plain text:
$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar \
-input totals/part-00000 \
-output unseq \
-inputformat SequenceFileAsTextInputFormat
$ bin/hadoop fs -cat unseq/part-00000
I got the idea from here.
Incidentally, Dumbo can also output plain text.
Following the answer of Praveen Sripati, here is a small example of SequenceFileReadDemo.java from Hadoop: The Definitive Guide by Tom White.
The data is in HDFS at this location: user/hduser/output-hashsort/ and the file is
part-r-00001
In Eclipse, in the Arguments field I've written this string:
and this is part of the output, with the debugger
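For reference, a minimal sketch along the lines of the book's SequenceFileReadDemo (the HDFS URI and path are placeholders; assumes the Hadoop client libraries are on the classpath):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileRead {
    public static void main(String[] args) throws Exception {
        // Placeholder URI; adjust host, port and path to your cluster and file.
        String uri = "hdfs://localhost:9000/user/hduser/output-hashsort/part-r-00001";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(fs, path, conf);
            // Instantiate whatever key/value types the file header declares.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            long position = reader.getPosition();
            while (reader.next(key, value)) {
                String syncSeen = reader.syncSeen() ? "*" : "";
                System.out.printf("[%s%s]\t%s\t%s%n", position, syncSeen, key, value);
                position = reader.getPosition(); // start of the next record
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}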

collective.xsendfile, ZODB blobs and UNIX file permissions

I am currently trying to configure collective.xsendfile, Apache mod_xsendfile and Plone 4.
Apparently the Apache process cannot see the blobstorage files on the file system because of their permissions:
ls -lh var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x18/0xd5/0x19/0x038ea09d0eddc611.blob
-r-------- 1 plone plone 1006K May 28 15:30 var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x18/0xd5/0x19/0x038ea09d0eddc611.blob
How do I configure blobstorage to give additional permissions, so that Apache could access these files?
The modes with which the blobstorage writes its directories and files are hardcoded in ZODB.blob. Specifically, the standard ZODB.blob.FilesystemHelper class creates secure directories (only readable and writable by the current user) by default.
You could provide your own implementation of FilesystemHelper that either makes this configurable or just sets the directory modes to 0750, and then patch ZODB.blob.BlobStorageMixin to use your class instead of the default:
import os

from ZODB import utils
from ZODB.blob import FilesystemHelper, BlobStorageMixin
from ZODB.blob import log, LAYOUT_MARKER


class GroupReadableFilesystemHelper(FilesystemHelper):

    def create(self):
        if not os.path.exists(self.base_dir):
            os.makedirs(self.base_dir, 0750)
            log("Blob directory '%s' does not exist. "
                "Created new directory." % self.base_dir)
        if not os.path.exists(self.temp_dir):
            os.makedirs(self.temp_dir, 0750)
            log("Blob temporary directory '%s' does not exist. "
                "Created new directory." % self.temp_dir)

        if not os.path.exists(os.path.join(self.base_dir, LAYOUT_MARKER)):
            layout_marker = open(
                os.path.join(self.base_dir, LAYOUT_MARKER), 'wb')
            layout_marker.write(self.layout_name)
        else:
            layout = open(os.path.join(self.base_dir, LAYOUT_MARKER), 'rb'
                          ).read().strip()
            if layout != self.layout_name:
                raise ValueError(
                    "Directory layout `%s` selected for blob directory %s, but "
                    "marker found for layout `%s`" %
                    (self.layout_name, self.base_dir, layout))

    def isSecure(self, path):
        """Ensure that (POSIX) path mode bits are 0750."""
        return (os.stat(path).st_mode & 027) == 0

    def getPathForOID(self, oid, create=False):
        """Given an OID, return the path on the filesystem where
        the blob data relating to that OID is stored.

        If the create flag is given, the path is also created if it didn't
        exist already.
        """
        # OIDs are numbers and sometimes passed around as integers. For our
        # computations we rely on the 64-bit packed string representation.
        if isinstance(oid, int):
            oid = utils.p64(oid)

        path = self.layout.oid_to_path(oid)
        path = os.path.join(self.base_dir, path)
        if create and not os.path.exists(path):
            try:
                os.makedirs(path, 0750)
            except OSError:
                # We might have lost a race. If so, the directory
                # must exist now
                assert os.path.exists(path)
        return path


def _blob_init_groupread(self, blob_dir, layout='automatic'):
    self.fshelper = GroupReadableFilesystemHelper(blob_dir, layout)
    self.fshelper.create()
    self.fshelper.checkSecure()
    self.dirty_oids = []

BlobStorageMixin._blob_init = _blob_init_groupread
Quite a handful; you may want to make this a feature request for ZODB3 :-)
While setting up a backup routine for a ZOPE/ZEO setup, I ran into the same problem with blob permissions.
After trying to apply the monkey patch that Mikko wrote (which is not that easy), I came up with a "real" patch to solve the problem.
The patch suggested by Martijn is not complete; it still does not set the right mode on blob files.
So here's my solution:
1.) Create a patch containing:
Index: ZODB/blob.py
===================================================================
--- ZODB/blob.py (Revision 121959)
+++ ZODB/blob.py (Arbeitskopie)
@@ -337,11 +337,11 @@
def create(self):
if not os.path.exists(self.base_dir):
- os.makedirs(self.base_dir, 0700)
+ os.makedirs(self.base_dir, 0750)
log("Blob directory '%s' does not exist. "
"Created new directory." % self.base_dir)
if not os.path.exists(self.temp_dir):
- os.makedirs(self.temp_dir, 0700)
+ os.makedirs(self.temp_dir, 0750)
log("Blob temporary directory '%s' does not exist. "
"Created new directory." % self.temp_dir)
@@ -359,8 +359,8 @@
(self.layout_name, self.base_dir, layout))
def isSecure(self, path):
- """Ensure that (POSIX) path mode bits are 0700."""
- return (os.stat(path).st_mode & 077) == 0
+ """Ensure that (POSIX) path mode bits are 0750."""
+ return (os.stat(path).st_mode & 027) == 0
def checkSecure(self):
if not self.isSecure(self.base_dir):
@@ -385,7 +385,7 @@
if create and not os.path.exists(path):
try:
- os.makedirs(path, 0700)
+ os.makedirs(path, 0750)
except OSError:
# We might have lost a race. If so, the directory
# must exist now
@@ -891,7 +891,7 @@
file2.close()
remove_committed(f1)
if chmod:
- os.chmod(f2, stat.S_IREAD)
+ os.chmod(f2, stat.S_IRUSR | stat.S_IRGRP)
if sys.platform == 'win32':
# On Windows, you can't remove read-only files, so make the
You can also take a look at the patch here -> http://pastebin.com/wNLYyXvw
2.) Store the patch under name 'blob.patch' in your buildout root directory
3.) Extend your buildout configuration:
parts +=
    patchblob
    postinstall

[patchblob]
recipe = collective.recipe.patch
egg = ZODB3
patches = blob.patch

[postinstall]
recipe = plone.recipe.command
command =
    chmod -R g+r ${buildout:directory}/var
    find ${buildout:directory}/var -type d | xargs chmod g+x
update-command = ${:command}
The postinstall section sets the desired group read permissions on already existing blobs. Note that execute permission must also be given to the blob folders so that the group can enter the directories.
I've tested this patch with ZODB 3.10.2 and 3.10.3.
As Martijn suggested, this should be configurable and part of the ZODB directly.