OS: Windows Server 2012 R2 Standard
FS: NTFS
=== perl5
e:\temporary>perl -v
This is perl 5, version 22, subversion 0 (v5.22.0) built for MSWin32-x64-multi-thread
e:\temporary>type ctime.pl
use File::stat;
use Time::Piece;
my $fn1 = 't:\temporary\tia\Энергия\print.pdf';
my $fn2 = 't:\temporary\tia\Энергия\kl_to_1c.txt';
for ($fn1,$fn2) {
my $fs = stat($_);
print "$_\n";
print 'changed ',gmtime($fs->ctime)->datetime,"\n";
print 'modified ',gmtime($fs->mtime)->datetime,"\n";
print 'accessed ',gmtime($fs->atime)->datetime,"\n";
}
e:\temporary>perl ctime.pl
t:\temporary\tia\Энергия\print.pdf
changed 2016-07-01T03:48:22 <== (1)
modified 2016-05-04T03:03:08
accessed 2016-07-01T03:48:22
t:\temporary\tia\Энергия\kl_to_1c.txt
changed 2016-07-01T03:48:22 <== (3)
modified 2016-07-01T03:11:00
accessed 2016-07-01T03:48:22
=== perl6
e:\temporary>perl6 -v
This is Rakudo version 2016.04 built on MoarVM version 2016.04
implementing Perl 6.c.
e:\temporary>type ctime.pl6
use v6;
my $fio1 = 't:\temporary\tia\Энергия\print.pdf'.IO;
my $fio2 = 't:\temporary\tia\Энергия\kl_to_1c.txt'.IO;
for $fio1,$fio2 {
say .path;
say 'changed ', .changed.DateTime.truncated-to('second');
say 'modified ', .modified.DateTime.truncated-to('second');
say 'accessed ', .accessed.DateTime.truncated-to('second');
}
e:\temporary>perl6 ctime.pl6
t:\temporary\tia\Энергия\print.pdf
changed 2016-05-04T03:03:08Z <== (2)
modified 2016-05-04T03:03:08Z
accessed 2016-07-01T03:48:22Z
t:\temporary\tia\Энергия\kl_to_1c.txt
changed 2016-07-01T05:46:12Z <== (4)
modified 2016-07-01T03:11:00Z
accessed 2016-07-01T03:48:22Z
Why do (1) and (2) differ, and likewise (3) and (4)?
Is this expected behavior?
Reproducing the (1)/(2) discrepancy:
1) Create a file with a text editor. The timestamps differ by a few seconds.
From perl5:
changed 2016-06-30T16:38:42
modified 2016-06-30T16:38:48
accessed 2016-06-30T16:38:42
From perl6:
changed 2016-06-30T16:38:48Z
modified 2016-06-30T16:38:48Z
accessed 2016-06-30T16:38:42Z
2) Edit the file several minutes later. The difference becomes more noticeable.
From perl5:
changed 2016-06-30T16:38:42 <==
modified 2016-06-30T16:49:17
accessed 2016-06-30T16:38:42
From perl6:
changed 2016-06-30T16:49:17Z <==
modified 2016-06-30T16:49:17Z
accessed 2016-06-30T16:38:42Z
'stat' from cygwin/babun:
{ ~ } » stat t:/temporary/tia/Энергия/print.pdf ~
File: ‘t:/temporary/tia/Энергия/print.pdf’
Size: 81595 Blocks: 80 IO Block: 65536 regular file
Device: dfe235h/14672437d Inode: 26458647810801926 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 500/Administrator) Gid: ( 513/Domain Users)
Access: 2016-07-01 09:48:22.578784100 +0600
Modify: 2016-05-04 09:03:08.602697600 +0600
Change: 2016-05-04 09:03:08.602697600 +0600
Birth: 2016-07-01 09:48:22.578784100 +0600
{ ~ } » stat t:/temporary/tia/Энергия/kl_to_1c.txt ~ 1
File: ‘t:/temporary/tia/Энергия/kl_to_1c.txt’
Size: 4596 Blocks: 8 IO Block: 65536 regular file
Device: dfe235h/14672437d Inode: 24769797950537989 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 500/Administrator) Gid: ( 513/Domain Users)
Access: 2016-07-01 09:48:22.563158800 +0600
Modify: 2016-07-01 09:11:00.585249200 +0600
Change: 2016-07-01 11:46:12.037712200 +0600
Birth: 2016-07-01 09:48:22.563158800 +0600
This may be a bug originating in libuv (see: https://irclog.perlgeek.de/perl6/2016-07-11#i_12818620). Even if it is, it should not drop through into Perl 6 code. Please file a bug report against Rakudo (see: http://rakudo.org/tickets/).
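For comparison, the same three fields can be read from a third runtime. Below is a minimal sketch, assuming Python 3 is available; stat_times is a hypothetical helper name, and the temporary file merely stands in for the files above. Python documents st_ctime as the creation time on Windows and as the last metadata change on Unix, so the meaning of "ctime" depends on the platform. In the outputs above, Perl 5's 'changed' value matches the Birth line of the cygwin stat output, while Rakudo's matches the Change line.

```python
import os
import tempfile
from datetime import datetime, timezone


def stat_times(path):
    """Return the three stat timestamps of `path` as UTC ISO-8601 strings."""
    st = os.stat(path)

    def iso(ts):
        # Truncate to whole seconds, like the Perl 5 and Perl 6 scripts do.
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        # Platform dependent: creation time on Windows, metadata change on Unix.
        "ctime": iso(st.st_ctime),
        "modified": iso(st.st_mtime),
        "accessed": iso(st.st_atime),
    }


if __name__ == "__main__":
    # A throwaway file stands in for the real paths from the question.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example")
    try:
        for label, value in stat_times(tmp.name).items():
            print(label, value)
    finally:
        os.unlink(tmp.name)
```

Running it on the same files from a native Windows Python and from a cygwin Python would show whether the two disagree in the same way as the two Perls.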
I'm trying to combine the outputs of two separate processes, A and B, each of which emits multiple files, into the input of process C. All file names share a chromosome identifier (for example "chr1"). Process A outputs genotype files: /path/chr1_qc.vcf.gz, /path/chr2_qc.vcf.gz, etc.
Process B outputs region files: /path/chr1.a.bcf, /path/chr1.b.bcf, /path/chr1.c.bcf, ..., /path/chr2.a.bcf, /path/chr2.b.bcf, etc. The number of files in both sets can vary from run to run.
Part of the code:
process A {
module "bcftools/1.16"
publishDir "${params.out_dir}", mode: 'copy', overwrite: true
input:
path vcf
path tbi
output:
path ("${(vcf =~ /chr\d{1,2}/)[0]}_qc.vcf.gz")
script:
"""
bcftools view -R ${params.sites_list} -Oz -o ${(vcf =~ /chr\d{1,2}/)[0]}_qc.vcf.gz ${vcf} # generates QC-ed genome files
tabix -f ${(vcf =~ /chr\d{1,2}/)[0]}_qc.vcf.gz # indexes the QC-ed genomes
"""
}
process B {
publishDir "${params.out_dir}", mode: 'copy', overwrite: true
input:
path(vcf)
output:
tuple path("${(vcf =~ /chr\d{1,2}/)[0]}.*.bed")
script:
"""
python split_chr.py ${params.chr_lims} ${vcf} # generates region files
"""
}
process C {
publishDir "${params.out_dir}", mode: 'copy', overwrite: true
input:
tuple path(vcf), path(bed)
output:
path "${bed.SimpleName}.vcf.gz"
script:
"""
bcftools view -R ${bed} -Oz -o ${bed.SimpleName}.vcf.gz ${vcf}
"""
}
workflow {
A(someprocess.out)
B(A.out)
C(combined_AB_files)
}
Process B output.view() output:
[/path/chr1.a.bed, /path/chr1.b.bed]
[/path/chr2.a.bed, /path/chr2.b.bed]
How can I get the process C to receive an input as a channel of tuples (A and B outputs combined by chromosome name) like this:
[ /path/chr1_qc.vcf.gz, /path/chr1.a.bcf ]
[ /path/chr1_qc.vcf.gz, /path/chr1.b.bcf ]
...
[ /path/chr2_qc.vcf.gz, /path/chr2.a.bcf ]
...
I think what you want is the second form of the combine operator, which combines items that share a matching key when you pass the by parameter. If one or more of your channels lack a shared key in the first element, you can use the map operator to produce one. To get the desired output, apply the transpose operator and specify the zero-based index of the element to transpose, again using the by parameter. For example:
workflow {
Channel
.fromPath( './data/*.bed' )
.map { tuple( it.simpleName, it ) }
.groupTuple()
.set { bed_files }
Channel
.fromPath( './data/*_qc.vcf.gz' )
.map { tuple( it.simpleName - ~/_qc$/, it ) }
.combine( bed_files, by: 0 )
.transpose( by: 2 )
.map { chrom, vcf, bed -> tuple( vcf, bed ) }
.view()
}
Results:
$ touch ./data/chr{1..3}.{a..c}.bed
$ touch ./data/chr{1..3}_qc.vcf.gz
$ nextflow run main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [pedantic_woese] DSL2 - revision: 9c5abfca90
[/data/chr1_qc.vcf.gz, /data/chr1.c.bed]
[/data/chr1_qc.vcf.gz, /data/chr1.a.bed]
[/data/chr1_qc.vcf.gz, /data/chr1.b.bed]
[/data/chr2_qc.vcf.gz, /data/chr2.b.bed]
[/data/chr2_qc.vcf.gz, /data/chr2.a.bed]
[/data/chr2_qc.vcf.gz, /data/chr2.c.bed]
[/data/chr3_qc.vcf.gz, /data/chr3.c.bed]
[/data/chr3_qc.vcf.gz, /data/chr3.b.bed]
[/data/chr3_qc.vcf.gz, /data/chr3.a.bed]
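To make the data flow easier to follow, here is a rough model of the three steps (group by key, combine by key, transpose) in plain Python. It is only a sketch of the logic and not how Nextflow implements these operators; combine_by_key and the file names are hypothetical.

```python
from collections import defaultdict


def combine_by_key(vcfs, beds):
    """Pair each VCF with every BED file that shares its chromosome key."""
    # Step 1 (like groupTuple): collect the BED files for each key.
    beds_by_key = defaultdict(list)
    for key, bed in beds:
        beds_by_key[key].append(bed)

    pairs = []
    for key, vcf in vcfs:
        # Step 2 (like combine by key) would give (key, vcf, [beds]);
        # step 3 (like transpose) emits one tuple per BED file instead.
        for bed in beds_by_key.get(key, []):
            pairs.append((vcf, bed))
    return pairs


if __name__ == "__main__":
    vcfs = [("chr1", "chr1_qc.vcf.gz"), ("chr2", "chr2_qc.vcf.gz")]
    beds = [("chr1", "chr1.a.bed"), ("chr1", "chr1.b.bed"), ("chr2", "chr2.a.bed")]
    for pair in combine_by_key(vcfs, beds):
        print(pair)
```

Unlike the channel version, this model keeps the input order, whereas the order of items emitted by a channel is not guaranteed.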
Note that when two or more queue channels are declared as process inputs (as in your process A), the process blocks until it receives a value from each input channel. Because upstream tasks run in parallel and asynchronously, there is no guarantee that items are emitted in the order in which they were received. This can result in mix-ups where, for example, you unexpectedly end up with an index file that belongs to another VCF. Most of the time, what you want is one queue channel and one or more value channels. The section in the docs on multiple input channels explains this quite well in my opinion, and is well worth reading if you haven't already.
Also, joining and combining channels becomes a lot easier when your processes define tuples in their input and output declarations, where the first element is a key, like a sample name or ID. I think you want something like the following:
params.vcf_files = './data/*.vcf.gz{,.tbi}'
params.sites_list = './data/sites.tsv'
params.chr_lims = './data/file.txt'
params.outdir = './results'
process proc_A {
tag "${sample}: ${indexed_vcf.first()}"
publishDir "${params.outdir}/proc_A", mode: 'copy', overwrite: true
module "bcftools/1.16"
input:
tuple val(sample), path(indexed_vcf)
path sites_list
output:
tuple val(sample), path("${sample}_qc.vcf.gz{,.tbi}")
script:
def vcf = indexed_vcf.first()
"""
bcftools view \\
-R "${sites_list}" \\
-Oz \\
-o "${sample}_qc.vcf.gz" \\
"${vcf}"
bcftools index \\
-t \\
"${sample}_qc.vcf.gz"
"""
}
process proc_B {
tag "${sample}: ${indexed_vcf.first()}"
publishDir "${params.outdir}/proc_B", mode: 'copy', overwrite: true
input:
tuple val(sample), path(indexed_vcf)
path chr_lims
output:
tuple val(sample), path("*.bed")
script:
def vcf = indexed_vcf.first()
"""
split_chr.py "${chr_lims}" "${vcf}"
"""
}
process proc_C {
tag "${sample}: ${indexed_vcf.first()}: ${bed.name}"
publishDir "${params.outdir}/proc_C", mode: 'copy', overwrite: true
input:
tuple val(sample), path(indexed_vcf), path(bed)
output:
tuple val(sample), path("${bed.simpleName}.vcf.gz")
script:
def vcf = indexed_vcf.first()
"""
bcftools view \\
-R "${bed}" \\
-Oz \\
-o "${bed.simpleName}.vcf.gz" \\
"${vcf}"
"""
}
workflow {
vcf_files = Channel.fromFilePairs( params.vcf_files )
sites_list = file( params.sites_list )
chr_lims = file( params.chr_lims )
proc_A( vcf_files, sites_list )
proc_B( proc_A.out, chr_lims )
proc_A.out \
| combine( proc_B.out, by: 0 ) \
| map { sample, indexed_vcf, bed_files ->
def bed_list = bed_files instanceof Path ? [bed_files] : bed_files
tuple( sample, indexed_vcf, bed_list )
} \
| transpose( by: 2 ) \
| proc_C \
| view()
}
The above should produce results like:
$ nextflow run main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [mighty_elion] DSL2 - revision: 5ea25ae72c
executor > local (15)
[b4/08df9d] process > proc_A (foo: foo.vcf.gz) [100%] 3 of 3 ✔
[93/55e467] process > proc_B (foo: foo_qc.vcf.gz) [100%] 3 of 3 ✔
[8b/cd7193] process > proc_C (foo: foo_qc.vcf.gz: b.bed) [100%] 9 of 9 ✔
[bar, ./work/90/53b9c6468ca54bb0f4eeb99ca82eda/a.vcf.gz]
[bar, ./work/24/cca839d5f63ee6988ead96dc9fbe1d/b.vcf.gz]
[bar, ./work/6f/61e1587134e68d2e358998f61f6459/c.vcf.gz]
[baz, ./work/f8/1484e94b9187ba6aae81d68f0a18cf/b.vcf.gz]
[baz, ./work/9c/20578262f5a2c13c6c3b566dc7b7d8/c.vcf.gz]
[baz, ./work/f5/3405b54f81f6f500a3ee4a78f5e6df/a.vcf.gz]
[foo, ./work/39/945fb0d3f375260e75afbc9caebc5d/a.vcf.gz]
[foo, ./work/de/cecd94ff39f204e799cb8e4c4ad46f/c.vcf.gz]
[foo, ./work/8b/cd7193107f6be5472d2e29982e3319/b.vcf.gz]
Also note that third-party scripts, like your Python script, can be moved to a folder called bin in the root of your project repository (i.e. the same directory as your main.nf). If you make the script executable, you will be able to invoke it as-is, i.e. without needing an absolute path to it.
This can be done with channel operators. Check the code below, with some comments:
workflow {
// Let's start by building channels similar to the ones you described
Channel
.of(file('/path/chr1_qc.vcf.gz'), file('/path/chr2_qc.vcf.gz'))
.set { pAoutput}
Channel
.of(file('/path/chr1.a.bcf'), file('/path/chr1.b.bcf'), file('/path/chr1.c.bcf'),
file('/path/chr2.a.bcf'), file('/path/chr2.b.bcf'), file('/path/chr2.c.bcf'))
.set { pBoutput }
// Now, let's create keys to relate the elements in the two channels
pAoutput
.map { filepath -> [filepath.name.tokenize('_')[0], filepath ] }
.set { pAoutput_tuple }
// The channel now looks like this:
// [chr1, /path/chr1_qc.vcf.gz]
// [chr2, /path/chr2_qc.vcf.gz]
pBoutput
.map { filepath -> [filepath.name.tokenize('.')[0], filepath ] }
.set { pBoutput_tuple }
// And:
// [chr1, /path/chr1.a.bcf]
// [chr1, /path/chr1.b.bcf]
// [chr1, /path/chr1.c.bcf]
// [chr2, /path/chr2.a.bcf]
// [chr2, /path/chr2.b.bcf]
// [chr2, /path/chr2.c.bcf]
// Combine the two channels and group by key
pAoutput_tuple
.mix(pBoutput_tuple)
.groupTuple()
.flatMap { chrom, path_list ->
path_list.split {
it.name.endsWith('.vcf.gz')
}.combinations()
}
.view()
}
You can check the output below:
N E X T F L O W ~ version 22.10.4
Launching `ex.nf` [maniac_pike] DSL2 - revision: f87873ef13
[/path/chr1_qc.vcf.gz, /path/chr1.a.bcf]
[/path/chr1_qc.vcf.gz, /path/chr1.b.bcf]
[/path/chr1_qc.vcf.gz, /path/chr1.c.bcf]
[/path/chr2_qc.vcf.gz, /path/chr2.a.bcf]
[/path/chr2_qc.vcf.gz, /path/chr2.b.bcf]
[/path/chr2_qc.vcf.gz, /path/chr2.c.bcf]
I've been struggling to identify why a nextflow (v20.10.00) process is not using all the items in a channel. I want the process to run for each sample bam file (10 in total) and for each chromosome (3 in total).
Here is the creation of the channels and the process:
ref_genome = file( params.RefGen, checkIfExists: true )
ref_dir = ref_genome.getParent()
ref_name = ref_genome.getBaseName()
ref_dict = file( "${ref_dir}/${ref_name}.dict", checkIfExists: true )
ref_index = file( "${ref_dir}/${ref_name}.*.fai", checkIfExists: true )
// Handles reading in data if the previous step is skipped
if( params.Skip_BP ){
Channel
.fromFilePairs("${params.ProcBamDir}/*{bam,bai}") { file -> file.name.replaceAll(/.bam|.bai$/,'') }
.ifEmpty { error "No bams found in ${params.ProcBamDir}" }
.map { ID, files -> tuple(ID, files[0], files[1]) }
.set { processed_bams }
}
// Setting up the chromosome channel
if( params.Chroms == "" ){
// Defaulting to using all chromosomes
chromosomes_ch = Channel
.from("AgamP4_2L", "AgamP4_2R", "AgamP4_3L", "AgamP4_3R", "AgamP4_X", "AgamP4_Y_unplaced", "AgamP4_UNKN")
println "No chromosomes specified, using all major chromosomes: AgamP4_2L, AgamP4_2R, AgamP4_3L, AgamP4_3R, AgamP4_X, AgamP4_Y_unplaced, AgamP4_UNKN"
} else {
// User option to choose which chromosome will be used
// This worked with the following syntax nextflow run testing.nf --profile imperial --Chroms "AgamP4_3R,AgamP4_2L"
chrs = params.Chroms.split(",")
chromosomes_ch = Channel
.from( chrs )
println "User defined chromosomes set: ${params.Chroms}"
}
process DNA_HCG {
errorStrategy { sleep(Math.pow(2, task.attempt) * 600 as long); return 'retry' }
maxRetries 3
maxForks params.HCG_Forks
tag { SampleID+"-"+chrom }
executor = 'pbspro'
clusterOptions = "-lselect=1:ncpus=${params.HCG_threads}:mem=${params.HCG_memory}gb:mpiprocs=1:ompthreads=${params.HCG_threads} -lwalltime=${params.HCG_walltime}:00:00"
publishDir(
path: "${params.HCDir}",
mode: 'copy',
)
input:
each chrom from chromosomes_ch
set SampleID, path(bam), path(bai) from processed_bams
path ref_genome
path ref_dict
path ref_index
output:
tuple chrom, path("${SampleID}-${chrom}.vcf") into HCG_ch
path("${SampleID}-${chrom}.vcf.idx") into idx_ch
beforeScript 'module load anaconda3/personal; source activate NF_GATK'
script:
"""
if [ ! -d tmp ]; then mkdir tmp; fi
taskset -c 0-${params.HCG_threads} gatk --java-options \"-Xmx${params.HCG_memory}G -XX:+UseParallelGC -XX:ParallelGCThreads=${params.HCG_threads}\" HaplotypeCaller \\
--tmp-dir tmp/ \\
--pair-hmm-implementation AVX_LOGLESS_CACHING_OMP \\
--native-pair-hmm-threads ${params.HCG_threads} \\
-ERC GVCF \\
-L ${chrom} \\
-R ${ref_genome} \\
-I ${bam} \\
-O ${SampleID}-${chrom}.vcf ${params.GVCF_args}
"""
}
But for reasons I cannot figure out, nextflow only creates 3 jobs: [d8/45499b] process > DNA_HCG (0_wt5_BP-CM029350.1) [ 0%] 0 of 3
I thought maybe it was because it only took the first sample and then one process for each chromosome. Though I doubted this since the code works for a different reference genome correctly. Regardless, I adjusted the input channels:
processed_bams
.combine(chromosomes_ch)
.set { HCG_in }
and
input:
set SampleID, path(bam), path(bai), chrom from HCG_in
But this resulted in only a single job being created: [6e/78b070] process > DNA_HCG (0_wt10_BP-CM029350.1) [ 0%] 0 of 1
Confusingly, when I use HCG_in.view() there are 30 items. To confuse me further, the correct number of jobs comes from the following code:
chrs = params.Chroms.split(",")
chromosomes_ch = Channel
.from(chrs)
Channel
.fromFilePairs("${params.ProcBamDir}/*{bam,bai}") { file -> file.name.replaceAll(/.bam|.bai$/,'') }
.ifEmpty { error "No bams found in ${params.ProcBamDir}" }
.map { ID, files -> tuple(ID, files[0], files[1]) }
.set { processed_bams }
process HCG {
executor 'local'
input:
each chrom from chromosomes_ch
set SampleID, path(bam), path(bai) from processed_bams
//set SampleID, path(bam), path(bai), chrom from HCG_in
script:
"""
echo "${SampleID} - ${chrom}"
"""
}
Output: [75/c1c25a] process > HCG (27) [100%] 30 of 30 ✔
I'm hoping I've just missed something obvious, but I cannot see it at the moment. Thanks in advance for the help.
Issues like this almost always involve the use of multiple input channels:
When two or more channels are declared as process inputs, the process
stops until there’s a complete input configuration ie. it receives an
input value from all the channels declared as input.
Your initial assessment was correct. However, the reason only three processes were run (i.e. one sample for each of the three chromosomes) is that this line (probably) returned a list (i.e. a Java LinkedList) containing a single element, and lists behave like queue channels:
ref_index = file( "${ref_dir}/${ref_name}.*.fai", checkIfExists: true )
You might have expected this to return a UnixPath. Ultimately, the solution is to ensure ref_index is a value channel.
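The job counts in the question can be reproduced with a small model. This is a simplified sketch in Python and not Nextflow's actual implementation: count_tasks is a hypothetical helper, queue inputs are consumed together like zip, an each input repeats every combination, and value inputs are left out because they are reused for every task. The sizes come from the question (10 BAMs, 3 chromosomes, a one-element ref_index list); the names are placeholders.

```python
def count_tasks(queue_inputs, each_values=None):
    """Estimate how many tasks a process runs under a simplified model."""
    # Queue inputs are read in lockstep, so the shortest one sets the limit.
    lockstep = list(zip(*queue_inputs))
    # An `each` input repeats every lockstep combination once per value.
    repeats = len(each_values) if each_values is not None else 1
    return len(lockstep) * repeats


if __name__ == "__main__":
    bams = [f"sample{i}" for i in range(10)]
    chroms = ["chrA", "chrB", "chrC"]
    ref_index = ["ref.fa.fai"]  # a one-element list acts like a queue channel

    # Original process: `each chrom`, the BAM channel, and the ref_index list.
    print(count_tasks([bams, ref_index], each_values=chroms))  # 3
    # Pre-combined channel of 30 items next to the ref_index list.
    combined = [(bam, chrom) for bam in bams for chrom in chroms]
    print(count_tasks([combined, ref_index]))  # 1
    # Test process that has no ref_index input at all.
    print(count_tasks([bams], each_values=chroms))  # 30
```

The three results (3, 1 and 30) line up with the job counts reported in the question, which is consistent with the one-element list being the limiting input.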
I have two sets of logs, each going to its own syslog server. The source of the logs is the same: a Palo Alto Prisma VPN.
For whatever reason, syslog server A (the older one) writes the logs like this (note the field after the timestamp):
Nov 22 15:08:03 34 456
But my newer syslog server, B, writes the logs like this (note the field after the timestamp):
Nov 22 15:08:03 34.0.0.1 456
This is a problem because each syslog-ng server runs a Splunk Universal Forwarder that forwards the logs to an index, and we use the Palo Tech Add-On to parse the data.
It appears the TA expects the incoming data to have this form:
Nov 22 15:08:03 34 456
Anything else breaks parsing. My syslog-ng configuration file for the new server (B) is written as follows:
#version:3.31
#include "scl.conf"
options {
flush_lines (0);
time_reopen (10);
log_fifo_size (1000);
chain_hostnames (off);
use_dns (no); #was yes
use_fqdn (no); #was yes
create_dirs (no);
keep_hostname (no); #was yes
};
source vpn_encrypted_log_traffic {
network(
ip(0.0.0.0)
port(6514)
transport("tls")
tls(
cert-file("/etc/syslog-ng/certs/prv.cer")
key-file("/etc/syslog-ng/certs/prv.key")
peer_verify(optional-untrusted)
)
);
};
destination prisma { file("/directory/log.log" create_dirs(yes)); };
log { source(vpn_encrypted_log_traffic); destination(prisma); };
And the old syslog server (A) just has this:
#version:3.5
#include "scl.conf"
options {
time-reap (30);
keep_hostname (no); #was yes
};
source vpn_encrypted_log_traffic {
network(
ip(0.0.0.0)
port(6514)
transport("tls")
tls(
cert-file("/etc/syslog-ng/certs/prv.cer")
key-file("/etc/syslog-ng/certs/prv.key")
peer_verify(optional-untrusted)
)
);
};
destination prisma { file("/directory/log.log" create_dirs(yes)); };
log { source(vpn_encrypted_log_traffic); destination(prisma); };
I can only think the problem exists in Prisma. But the configs look 1-to-1 to me.
34.0.0.1 is the hostname/IP address part of a BSD syslog message.
With use_dns(no) and keep_hostname(no), this part of the message is replaced with server B's IP address.
keep_hostname(yes) can be used to leave the hostname intact.
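To see which field is affected, the lines from the question can be split by position. This is a minimal sketch in Python, assuming the written lines follow the "Mmm dd hh:mm:ss HOST MESSAGE" layout with no priority prefix; split_bsd_syslog is a hypothetical helper and not part of syslog-ng or the Splunk add-on.

```python
def split_bsd_syslog(line):
    """Split 'Mmm dd hh:mm:ss HOST MESSAGE' into (timestamp, host, message)."""
    # The first three tokens form the timestamp, the fourth is the host
    # field, and everything after it is the message.
    month, day, clock, host, message = line.split(None, 4)
    return f"{month} {day} {clock}", host, message


if __name__ == "__main__":
    for line in ("Nov 22 15:08:03 34 456", "Nov 22 15:08:03 34.0.0.1 456"):
        timestamp, host, message = split_bsd_syslog(line)
        print(f"timestamp={timestamp!r} host={host!r} message={message!r}")
```

Only the host field differs between the two servers (34 versus 34.0.0.1); the timestamp and the message are the same.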
My excludes are being ignored.
.rubocop.yml
Rails:
  Enabled: true
  Exclude:
    - 'db/**/*'
    - 'config/**/*'
    - 'script/**/*'
    - 'bin/{rails,rake}'
    - 'vendor/**/*'
    - 'spec/fixtures/**/*'
    - 'tmp/**/*'
Rubocop Message:
config/environments/development.rb:3:1: C: Metrics/BlockLength: Block has too many lines. [32/25]
Rails.application.configure do ...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
config/environments/production.rb:3:1: C: Metrics/BlockLength: Block has too many lines. [29/25]
Rails.application.configure do ...
pre-commit GitHook
#!/usr/bin/env ruby
require 'english'
require 'rubocop'
ADDED_OR_MODIFIED = /A|AM|^M/.freeze
changed_files = `git status --porcelain`.split(/\n/).
select { |file_name_with_status|
file_name_with_status =~ ADDED_OR_MODIFIED
}.
map { |file_name_with_status|
file_name_with_status.split(' ')[1]
}.
select { |file_name|
File.extname(file_name) == '.rb'
}.join(' ')
system("rubocop --force-exclusion -a #{changed_files}") unless changed_files.empty?
status=$CHILD_STATUS.to_s[-1].to_i
if status == 0
system("echo -en '\\033[32mFormatting Passed, Committing...\\033[0;39m\n'")
exit 0
else
system("echo -en '\\033[1;31mCannot commit, formatting failed. Use --no-verify to force commit.\\033[0;39m\n'")
exit 1
end
Your configuration file only defines excludes for the Rails department of cops, so it is correct that BlockLength, which is in the Metrics department, still inspects the files.
If what you meant to do was to ignore these files for all cops, you can use:
AllCops:
  Exclude:
    - 'db/**/*'
    - 'config/**/*'
    - 'script/**/*'
    - 'bin/{rails,rake}'
    - 'vendor/**/*'
    - 'spec/fixtures/**/*'
    - 'tmp/**/*'
Rails:
  Enabled: true
Or, if you just want to exclude the files from the Metrics cops, replace AllCops with Metrics in the configuration above.
In Linux there is a limit on the maximum number of open files for each process of a logged-in user, as shown below:
$ ulimit -n
1024
While studying Java NIO, I wanted to check this value. Because a channel is also a file in Linux, I wrote client code that creates SocketChannel instances continuously until the exception below is thrown:
java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
at sun.nio.ch.Net.socket(Net.java:423)
at sun.nio.ch.Net.socket(Net.java:416)
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:104)
at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
at java.nio.channels.SocketChannel.open(SocketChannel.java:142)
But I found that the exception is only thrown after about 4085 SocketChannel instances have been created, which is more than 1024. Somebody told me that the JVM changes the value implicitly. I then wrote a Java program that executes the ulimit command and found that the JVM does change the value, as shown below:
String [] cmdArray = {"sh","-c","ulimit -n"};
Process p = Runtime.getRuntime().exec(cmdArray);
BufferedInputStream in = new BufferedInputStream(p.getInputStream());
byte[] buf = new byte[1024];
int len = in.read(buf);
System.out.println(new String(buf, 0, len)); //4096
Does anybody know when, where, and how the JVM changes this value? Is there a system log that records this change, or a system tool that can monitor it?
$ strace -f -o HelloWorld.strace java HelloWorld
Hello World!
$ vi HelloWorld.strace
...
16341 getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=4*1024}) = 0
16341 setrlimit(RLIMIT_NOFILE, {rlim_cur=4*1024, rlim_max=4*1024}) = 0
...
Download the OpenJDK source, then cd into the hotspot directory:
$ grep -r setrlimit
...
src/os/linux/vm/os_linux.cpp: status = setrlimit(RLIMIT_NOFILE, &nbr_files);
...
$ vi src/os/linux/vm/os_linux.cpp
...
if (MaxFDLimit) {
// set the number of file descriptors to max. print out error
// if getrlimit/setrlimit fails but continue regardless.
struct rlimit nbr_files;
int status = getrlimit(RLIMIT_NOFILE, &nbr_files);
if (status != 0) {
if (PrintMiscellaneous && (Verbose || WizardMode))
perror("os::init_2 getrlimit failed");
} else {
nbr_files.rlim_cur = nbr_files.rlim_max;
status = setrlimit(RLIMIT_NOFILE, &nbr_files);
if (status != 0) {
if (PrintMiscellaneous && (Verbose || WizardMode))
perror("os::init_2 setrlimit failed");
}
}
...
If you modify the above code, e.g.
//nbr_files.rlim_cur = nbr_files.rlim_max;
nbr_files.rlim_cur = 2048;
and then rebuild OpenJDK and run the above program with the new JDK, the output will be 2048.
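The same getrlimit/setrlimit sequence can be tried without rebuilding a JDK. Below is a minimal sketch using Python's standard resource module (Unix only); raise_nofile_soft_limit is a hypothetical helper that mirrors the logic of the HotSpot snippet above and is not part of any JDK tooling.

```python
import resource


def raise_nofile_soft_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit, like the HotSpot code.

    Returns (old_soft, new_soft, hard).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Equivalent to: nbr_files.rlim_cur = nbr_files.rlim_max; setrlimit(...)
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (ValueError, OSError):
        # Like HotSpot, carry on with the old limit if the call fails.
        pass
    new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return old_soft, new_soft, hard


if __name__ == "__main__":
    old_soft, new_soft, hard = raise_nofile_soft_limit()
    print(f"soft limit: {old_soft} -> {new_soft} (hard limit {hard})")
```

The change applies only to the calling process and the children it starts afterwards, which is why the ulimit -n launched from inside the JVM reported 4096 while the login shell still reported 1024.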