Apache Ignite: Persistent Storage

My understanding of Ignite Persistent Storage is that the data is not only kept in memory but also written to disk.
When the node is restarted, it should read the data from disk back into memory.
So I am using this example to test it out, but I updated it a little because I don't want to use XML.
This is my slightly updated code:
public class PersistentIgniteExpr {
    /**
     * Organizations cache name.
     */
    private static final String ORG_CACHE = "CacheQueryExample_Organizations";

    /** */
    private static final boolean UPDATE = true;

    public void test(String nodeId) {
        // Apache Ignite node configuration.
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Ignite persistence configuration.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Enabling the persistence.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Applying settings.
        cfg.setDataStorageConfiguration(storageCfg);

        List<String> addresses = new ArrayList<>();
        addresses.add("127.0.0.1:47500..47502");

        TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
        tcpDiscoverySpi.setIpFinder(new TcpDiscoveryMulticastIpFinder().setAddresses(addresses));
        cfg.setDiscoverySpi(tcpDiscoverySpi);

        try (Ignite ignite = Ignition.getOrStart(cfg.setIgniteInstanceName(nodeId))) {
            // Activate the cluster. Required to do if the persistent store is enabled because you might need
            // to wait while all the nodes, that store a subset of data on disk, join the cluster.
            ignite.active(true);

            CacheConfiguration<Long, Organization> cacheCfg = new CacheConfiguration<>(ORG_CACHE);

            cacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
            cacheCfg.setBackups(1);
            cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
            cacheCfg.setIndexedTypes(Long.class, Organization.class);

            IgniteCache<Long, Organization> cache = ignite.getOrCreateCache(cacheCfg);

            if (UPDATE) {
                System.out.println("Populating the cache...");

                try (IgniteDataStreamer<Long, Organization> streamer = ignite.dataStreamer(ORG_CACHE)) {
                    streamer.allowOverwrite(true);

                    for (long i = 0; i < 100_000; i++) {
                        streamer.addData(i, new Organization(i, "organization-" + i));

                        if (i > 0 && i % 10_000 == 0)
                            System.out.println("Done: " + i);
                    }
                }
            }

            // Run SQL without explicitly calling loadCache().
            QueryCursor<List<?>> cur = cache.query(
                new SqlFieldsQuery("select id, name from Organization where name like ?")
                    .setArgs("organization-54321"));

            System.out.println("SQL Result: " + cur.getAll());

            // Run get() without explicitly calling loadCache().
            Organization org = cache.get(54321L);

            System.out.println("GET Result: " + org);
        }
    }
}
When I run it the first time, it works as intended.
After running it once, I assume the data has been written to disk, since the code is about persistent storage.
When I run it the second time, I comment out this part:
if (UPDATE) {
    System.out.println("Populating the cache...");

    try (IgniteDataStreamer<Long, Organization> streamer = ignite.dataStreamer(ORG_CACHE)) {
        streamer.allowOverwrite(true);

        for (long i = 0; i < 100_000; i++) {
            streamer.addData(i, new Organization(i, "organization-" + i));

            if (i > 0 && i % 10_000 == 0)
                System.out.println("Done: " + i);
        }
    }
}
That is the part where the data is written. When the SQL query is executed, it returns null. Does that mean the data was not written to disk?
Another question: I am not very clear on TcpDiscoverySpi. Can someone explain it as well?
Thanks in advance.

Do you have any exceptions at node startup?
Most probably, you don't have the IGNITE_HOME environment variable configured, so the work directory for persistence is chosen differently each time you run a node.
You can either set the IGNITE_HOME environment variable or set the work directory explicitly in code: cfg.setWorkDirectory("C:\\workDirectory");
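For example, here is a minimal sketch of pinning the work directory on the configuration from the question (the path is only an illustration; imports are assumed as in the question's code):

IgniteConfiguration cfg = new IgniteConfiguration();

// Keep persistence files in a fixed, writable location so every restart finds the same data.
cfg.setWorkDirectory("C:\\ignite\\work"); // example path only

DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
cfg.setDataStorageConfiguration(storageCfg);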
TcpDiscoverySpi provides a way to discover remote nodes in a grid, so a starting node can join the cluster. It is better to use TcpDiscoveryVmIpFinder if you know the list of IPs. TcpDiscoveryMulticastIpFinder broadcasts UDP messages to the network to discover other nodes; it does not require an IP list at all.
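As an illustration, a minimal sketch of swapping in TcpDiscoveryVmIpFinder with a static address list (the addresses are placeholders and cfg is the IgniteConfiguration from the question):

// Static IP discovery instead of multicast.
TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();

TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47502")); // placeholder address list

discoverySpi.setIpFinder(ipFinder);
cfg.setDiscoverySpi(discoverySpi);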
Please see https://apacheignite.readme.io/docs/cluster-config for more details.

Related

StreamingFileSink doesn't work sometimes when trying to write to S3

I am trying to write to an S3 sink.
private static StreamingFileSink<String> createS3SinkFromStaticConfig(
        final Map<String, Properties> applicationProperties
) {
    Properties sinkProperties = applicationProperties.get(SINK_PROPERTIES);
    String s3SinkPath = sinkProperties.getProperty(SINK_S3_PATH_KEY);

    return StreamingFileSink
        .forRowFormat(
            new Path(s3SinkPath),
            new SimpleStringEncoder<String>(StandardCharsets.UTF_8.toString())
        )
        .build();
}
The following code works and I can see the results in S3:
input.map(value -> { // Parse the JSON
        JsonNode jsonNode = jsonParser.readValue(value, JsonNode.class);
        return new Tuple2<>(jsonNode.get("ticker").asText(), jsonNode.get("price").asDouble());
    }).returns(Types.TUPLE(Types.STRING, Types.DOUBLE))
    .keyBy(0) // Logically partition the stream per stock symbol
    .timeWindow(Time.seconds(10), Time.seconds(5)) // Sliding window definition
    .min(1) // Calculate minimum price per stock over the window
    .setParallelism(3) // Set parallelism for the min operator
    .map(value -> value.f0 + ": ----- " + value.f1.toString() + "\n")
    .addSink(createS3SinkFromStaticConfig(applicationProperties));
But the following doesn't write anything to S3.
KeyedStream<EnrichedMetric, EnrichedMetricKey> input = env.addSource(new EnrichedMetricSource())
    .assignTimestampsAndWatermarks(
        WatermarkStrategy.<EnrichedMetric>forMonotonousTimestamps()
            .withTimestampAssigner(((event, l) -> event.getEventTime()))
    ).keyBy(new EnrichedMetricKeySelector());

DataStream<String> statsStream = input
    .window(TumblingEventTimeWindows.of(Time.seconds(5)))
    .process(new PValueStatisticsWindowFunction());

statsStream.addSink(createS3SinkFromStaticConfig(applicationProperties));
PValueStatisticsWindowFunction is a ProcessWindowFunction as below.
@Override
public void process(EnrichedMetricKey enrichedMetricKey,
                    Context context,
                    Iterable<EnrichedMetric> in,
                    Collector<String> out) throws Exception {
    int count = 0;

    for (EnrichedMetric m : in) {
        count++;
    }

    out.collect("Count: " + count);
}
When I run the Flink app locally, statsStream.print() prints the results to log/flink-*-taskexecutor-*.out.
In the cluster, I can see that checkpointing is enabled and the checkpoint history in the Flink dashboard. I also made sure the S3 path is in the format s3a://<bucket>.
Not sure what I am missing here.

What is the correct use of consumer groups in Spring Cloud Stream / Data Flow and RabbitMQ?

A follow-up to this:
one SCDF source, 2 processors but only 1 processes each item
The 2 processors (del-1 and del-2) in the picture are receiving the same data within milliseconds of each other. I'm trying to rig this so that del-2 never receives the same item as del-1 and vice versa. So obviously I've got something configured incorrectly, but I'm not sure where.
My processor has the following application.properties:
spring.application.name=${vcap.application.name:sample-processor}
info.app.name=@project.artifactId@
info.app.description=@project.description@
info.app.version=@project.version@
management.endpoints.web.exposure.include=health,info,bindings
spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.security.servlet.SecurityAutoConfiguration
spring.cloud.stream.bindings.input.group=input
Is "spring.cloud.stream.bindings.input.group" specified correctly?
Here's the processor code:
@Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
public Object transform(String inputStr) throws InterruptedException {
    ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
    String message = " I AM [" + inputStr + "] AND I HAVE BEEN PROCESSED!!!!!!!";
    log.info("SampleProcessor.transform() incoming inputStr=" + inputStr);
    return message;
}
Is the @Transformer annotation the proper way to link this bit of code with "spring.cloud.stream.bindings.input.group" from application.properties? Are there any other annotations necessary?
Here's my source:
private String format = "EEEEE dd MMMMM yyyy HH:mm:ss.SSSZ";

@Bean
@InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller(fixedDelay = "1000", maxMessagesPerPoll = "1"))
public MessageSource<String> timerMessageSource() {
    ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
    String message = new SimpleDateFormat(format).format(new Date());
    log.info("SampleSource.timeMessageSource() message=[" + message + "]");
    return () -> new GenericMessage<>(new SimpleDateFormat(format).format(new Date()));
}
I'm confused about the "value = Source.OUTPUT". Does this mean my processor needs to be named differently?
Is the inclusion of @Poller causing me a problem somehow?
This is how I define the 2 processor streams (del-1 and del-2) in SCDF shell:
stream create del-1 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
stream create del-2 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
Do I need to do anything differently there?
All of this is running in Docker/K8s.
RabbitMQ is provided by bitnami/rabbitmq:3.7.2-r1 and is configured with the following properties:
RABBITMQ_USERNAME: user
RABBITMQ_PASSWORD: <redacted>
RABBITMQ_ERL_COOKIE: <redacted>
RABBITMQ_NODE_PORT_NUMBER: 5672
RABBITMQ_NODE_TYPE: stats
RABBITMQ_NODE_NAME: rabbit@localhost
RABBITMQ_CLUSTER_NODE_NAME:
RABBITMQ_DEFAULT_VHOST: /
RABBITMQ_MANAGER_PORT_NUMBER: 15672
RABBITMQ_DISK_FREE_LIMIT: "6GiB"
Are any other environment variables necessary?

Apache Ignite Near Cache - local cache metrics

I've been experimenting with Ignite near caches. I'm configuring a client node with two server nodes in the cluster. I instantiated a near cache and would like to see the associated cache hit/miss metrics. Functionally everything works fine, but I can't figure out where the near cache metrics are.
I've tried to extract the cache metrics via these calls:
NearCacheConfiguration<Integer, Integer> nearCfg = new NearCacheConfiguration<>();
nearCfg.setNearEvictionPolicyFactory(new LruEvictionPolicyFactory<>(100));
nearCfg.setNearStartSize(50);

IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(
    new CacheConfiguration<Integer, Integer>("myCache"), nearCfg);

// run some cache puts and gets
for (int i = 0; i < 10000; i++) { cache.put(i, i); }
for (int i = 0; i < 10000; i++) { cache.get(i); }

// then try to retrieve metrics
System.out.println(cache.localMetrics());
System.out.println(cache.metrics());
Output:
CacheMetricsSnapshot [reads=0, puts=0, hits=0, misses=0, txCommits=0, txRollbacks=0, evicts=0, removes=0, putAvgTimeNanos=0.0, getAvgTimeNanos=0.0, rmvAvgTimeNanos=0.0, commitAvgTimeNanos=0.0, rollbackAvgTimeNanos=0.0, cacheName=myCache, offHeapGets=0, offHeapPuts=0, offHeapRemoves=0, offHeapEvicts=0, offHeapHits=0, offHeapMisses=0, offHeapEntriesCnt=0, heapEntriesCnt=0, offHeapPrimaryEntriesCnt=0, offHeapBackupEntriesCnt=0, offHeapAllocatedSize=0, size=0, keySize=0, isEmpty=true, dhtEvictQueueCurrSize=0, txThreadMapSize=0, txXidMapSize=0, txCommitQueueSize=0, txPrepareQueueSize=0, txStartVerCountsSize=0, txCommittedVersionsSize=0, txRolledbackVersionsSize=0, txDhtThreadMapSize=0, txDhtXidMapSize=0, txDhtCommitQueueSize=0, txDhtPrepareQueueSize=0, txDhtStartVerCountsSize=0, txDhtCommittedVersionsSize=0, txDhtRolledbackVersionsSize=0, isWriteBehindEnabled=false, writeBehindFlushSize=-1, writeBehindFlushThreadCnt=-1, writeBehindFlushFreq=-1, writeBehindStoreBatchSize=-1, writeBehindTotalCriticalOverflowCnt=0, writeBehindCriticalOverflowCnt=0, writeBehindErrorRetryCnt=0, writeBehindBufSize=-1, totalPartitionsCnt=0, rebalancingPartitionsCnt=0, keysToRebalanceLeft=0, rebalancingKeysRate=0, rebalancingBytesRate=0, rebalanceStartTime=0, rebalanceFinishTime=0, keyType=java.lang.Object, valType=java.lang.Object, isStoreByVal=true, isStatisticsEnabled=false, isManagementEnabled=false, isReadThrough=false, isWriteThrough=false, isValidForReading=true, isValidForWriting=true]
CacheMetricsSnapshot [reads=0, puts=0, hits=0, misses=0, txCommits=0, txRollbacks=0, evicts=0, removes=0, putAvgTimeNanos=0.0, getAvgTimeNanos=0.0, rmvAvgTimeNanos=0.0, commitAvgTimeNanos=0.0, rollbackAvgTimeNanos=0.0, cacheName=myCache, offHeapGets=0, offHeapPuts=0, offHeapRemoves=0, offHeapEvicts=0, offHeapHits=0, offHeapMisses=0, offHeapEntriesCnt=0, heapEntriesCnt=100, offHeapPrimaryEntriesCnt=0, offHeapBackupEntriesCnt=0, offHeapAllocatedSize=0, size=0, keySize=0, isEmpty=true, dhtEvictQueueCurrSize=-1, txThreadMapSize=0, txXidMapSize=0, txCommitQueueSize=0, txPrepareQueueSize=0, txStartVerCountsSize=0, txCommittedVersionsSize=0, txRolledbackVersionsSize=0, txDhtThreadMapSize=0, txDhtXidMapSize=-1, txDhtCommitQueueSize=0, txDhtPrepareQueueSize=0, txDhtStartVerCountsSize=0, txDhtCommittedVersionsSize=-1, txDhtRolledbackVersionsSize=-1, isWriteBehindEnabled=false, writeBehindFlushSize=-1, writeBehindFlushThreadCnt=-1, writeBehindFlushFreq=-1, writeBehindStoreBatchSize=-1, writeBehindTotalCriticalOverflowCnt=-1, writeBehindCriticalOverflowCnt=-1, writeBehindErrorRetryCnt=-1, writeBehindBufSize=-1, totalPartitionsCnt=0, rebalancingPartitionsCnt=0, keysToRebalanceLeft=0, rebalancingKeysRate=0, rebalancingBytesRate=0, rebalanceStartTime=-1, rebalanceFinishTime=-1, keyType=java.lang.Object, valType=java.lang.Object, isStoreByVal=true, isStatisticsEnabled=false, isManagementEnabled=false, isReadThrough=false, isWriteThrough=false, isValidForReading=true, isValidForWriting=true]
It looks like there are no meaningful metrics. I figured that stats might be enabled as part of the NearCacheConfiguration, as is the case with CacheConfiguration, but no.
Any idea?
I figured it out. I missed enabling statistics on the CacheConfiguration passed into the IgniteCache object.
The code snippet is:
CacheConfiguration<Integer, Integer> cacheConfiguration = new CacheConfiguration<Integer, Integer>("myCache");
cacheConfiguration.setStatisticsEnabled(true);
IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(cacheConfiguration, nearCfg);
After all the cache operations are run, I see the statistics now.

Cluster sharding client not connecting with host

After recent investigation and a Stack Overflow question, I realise that cluster sharding is a better option than a cluster-consistent-hash-router. But I am having trouble getting a 2-process cluster going.
One process is the Seed and the other is the Client. The Seed node seems to continuously log dead-letter messages (see the end of this question).
This Seed HOCON follows:
akka {
    loglevel = "INFO"

    actor {
        provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"

        serializers {
            wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
        }

        serialization-bindings {
            "System.Object" = wire
        }
    }

    remote {
        dot-netty.tcp {
            hostname = "127.0.0.1"
            port = 5000
        }
    }

    persistence {
        journal {
            plugin = "akka.persistence.journal.sql-server"

            sql-server {
                class = "Akka.Persistence.SqlServer.Journal.SqlServerJournal, Akka.Persistence.SqlServer"
                schema-name = dbo
                auto-initialize = on
                connection-string = "Data Source=localhost;Integrated Security=True;MultipleActiveResultSets=True;Initial Catalog=ClusterExperiment01"
                plugin-dispatcher = "akka.actor.default-dispatcher"
                connection-timeout = 30s
                table-name = EventJournal
                timestamp-provider = "Akka.Persistence.Sql.Common.Journal.DefaultTimestampProvider, Akka.Persistence.Sql.Common"
                metadata-table-name = Metadata
            }
        }

        sharding {
            connection-string = "Data Source=localhost;Integrated Security=True;MultipleActiveResultSets=True;Initial Catalog=ClusterExperiment01"
            auto-initialize = on
            plugin-dispatcher = "akka.actor.default-dispatcher"
            class = "Akka.Persistence.SqlServer.Journal.SqlServerJournal, Akka.Persistence.SqlServer"
            connection-timeout = 30s
            schema-name = dbo
            table-name = ShardingJournal
            timestamp-provider = "Akka.Persistence.Sql.Common.Journal.DefaultTimestampProvider, Akka.Persistence.Sql.Common"
            metadata-table-name = ShardingMetadata
        }
    }

    snapshot-store {
        sharding {
            class = "Akka.Persistence.SqlServer.Snapshot.SqlServerSnapshotStore, Akka.Persistence.SqlServer"
            plugin-dispatcher = "akka.actor.default-dispatcher"
            connection-string = "Data Source=localhost;Integrated Security=True;MultipleActiveResultSets=True;Initial Catalog=ClusterExperiment01"
            connection-timeout = 30s
            schema-name = dbo
            table-name = ShardingSnapshotStore
            auto-initialize = on
        }
    }

    cluster {
        seed-nodes = ["akka.tcp://my-cluster-system@127.0.0.1:5000"]
        roles = ["Seed"]

        sharding {
            journal-plugin-id = "akka.persistence.sharding"
            snapshot-plugin-id = "akka.snapshot-store.sharding"
        }
    }
}
I have a method that essentially turns the above into a Config like so:
var config = NodeConfig.Create(/* HOCON above */).WithFallback(ClusterSingletonManager.DefaultConfig());
Without the "WithFallback" I get a null reference exception out of the config generation.
And then generates the system like so:
var system = ActorSystem.Create("my-cluster-system", config);
The client creates its system in the same manner and the HOCON is almost identical aside from:
{
    remote {
        dot-netty.tcp {
            hostname = "127.0.0.1"
            port = 5001
        }
    }

    cluster {
        seed-nodes = ["akka.tcp://my-cluster-system@127.0.0.1:5000"]
        roles = ["Client"]
        role.["Seed"].min-nr-of-members = 1

        sharding {
            journal-plugin-id = "akka.persistence.sharding"
            snapshot-plugin-id = "akka.snapshot-store.sharding"
        }
    }
}
The Seed node creates the sharding like so:
ClusterSharding.Get(system).Start(
    typeName: "company-router",
    entityProps: Props.Create(() => new CompanyDeliveryActor()),
    settings: ClusterShardingSettings.Create(system),
    messageExtractor: new RouteExtractor(100)
);
And the client creates a sharding proxy like so:
ClusterSharding.Get(system).StartProxy(
    typeName: "company-router",
    role: "Seed",
    messageExtractor: new RouteExtractor(100));
The RouteExtractor is:
public class RouteExtractor : HashCodeMessageExtractor
{
    public RouteExtractor(int maxNumberOfShards) : base(maxNumberOfShards)
    {
    }

    public override string EntityId(object message) => (message as IHasRouting)?.Company?.VolumeId.ToString();

    public override object EntityMessage(object message) => message;
}
In this scenario the VolumeId is always the same (just for the experiment's sake).
Both processes come to life but the Seed keeps throwing this error to the log:
[INFO][7/05/2017 9:00:58 AM][Thread 0003][akka://my-cluster-system/user/sharding/company-routerCoordinator/singleton/coordinator] Message Register from akka.tcp://my-cluster-system@127.0.0.1:5000/user/sharding/company-router to akka://my-cluster-system/user/sharding/company-routerCoordinator/singleton/coordinator was not delivered. 4 dead letters encountered.
P.S. I am not using Lighthouse.
From a quick look, you're starting a cluster sharding proxy on your client node and telling it that the sharded nodes are the ones with the Seed role. This doesn't match the cluster sharding definition on the seed node, where you haven't specified any role.
Since there is no role to limit it, cluster sharding on the seed node will treat all nodes in the cluster as perfectly capable of hosting sharded actors, including the client node, which doesn't have (non-proxy) cluster sharding instantiated on it.
This may not be the only issue, but you could either host cluster sharding on all of your nodes, or use ClusterShardingSettings.Create(system).WithRole("Seed") to limit the shards to a specific subset of nodes (those with the Seed role) in the cluster.
Thanks Horusiath, that's fixed it:
return sharding.Start(
    typeName: "company-router",
    entityProps: Props.Create(() => new CompanyDeliveryActor()),
    settings: ClusterShardingSettings.Create(system).WithRole("Seed"),
    messageExtractor: new RouteExtractor(100)
);
The clustered shard is now communicating between the 2 processes. Thanks very much for that bit.

Libgit2sharp Push Performance Degrading With Many Commits

The project I am working on uses Git in a weird way. Essentially it writes and pushes one commit at a time. The project could result in one branch having hundreds of thousands of commits. When testing, we found that after only about 500 commits the performance of the Git push started to degrade. Upon further investigation using a process monitor, we believe the degradation is due to a walk of the entire tree for the branch being pushed. Since we are only ever pushing one new commit at any given time, is there any way to optimize this?
Alternatively, is there a way to limit the commit history to something like 50 commits to reduce this overhead?
I am using LibGit2Sharp Version 0.20.1.0
Update 1
To test I wrote the following code:
void Main()
{
    string remotePath = @"E:\GIT Test\Remote";
    string localPath = @"E:\GIT Test\Local";
    string localFilePath = Path.Combine(localPath, "TestFile.txt");

    Repository.Init(remotePath, true);
    Repository.Clone(remotePath, localPath);
    Repository repo = new Repository(localPath);

    for (int i = 0; i < 2000; i++)
    {
        File.WriteAllText(localFilePath, RandomString((i % 2 + 1) * 10));
        repo.Stage(localFilePath);

        Commit commit = repo.Commit(
            string.Format("Commit number: {0}", i),
            new Signature("TestAuthor", "TestEmail@Test.com", System.DateTimeOffset.Now),
            new Signature("TestAuthor", "TestEmail@Test.com", System.DateTimeOffset.Now));

        Stopwatch pushWatch = Stopwatch.StartNew();

        Remote defaultRemote = repo.Network.Remotes["origin"];
        repo.Network.Push(defaultRemote, "refs/heads/master:refs/heads/master");

        pushWatch.Stop();
        Trace.WriteLine(string.Format("Push {0} took {1}ms", i, pushWatch.ElapsedMilliseconds));
    }
}

private const string Characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static readonly Random Random = new Random();

/// <summary>
/// Get a Random string of the specified length
/// </summary>
public static string RandomString(int size)
{
    char[] buffer = new char[size];

    for (int i = 0; i < size; i++)
    {
        buffer[i] = Characters[Random.Next(Characters.Length)];
    }

    return new string(buffer);
}
And ran the process monitor found here:
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
The time for each push was generally low, with large spikes that increased both in frequency and in duration. Looking at the output from the process monitor, I believe these spikes lined up with long stretches where objects in the .git\objects folder were being accessed. For some reason, occasionally on a push there are large reads of the objects, which on closer inspection appear to be a walk through the commits and objects.
The above flow is a condensed version of what we actually do in the project. In our actual flow we first create a new branch "Temp" from "Master", make a commit to "Temp", push "Temp", merge "Temp" into "Master" and then push "Master". When we timed each part of that flow, we found the push was by far the longest-running operation, and its elapsed time kept increasing as the commits piled up on "Master".
Update 2
I recently updated to use libgit2sharp version 0.20.1.0 and this problem still exists. Does anyone know why this occurs?
Update 3
We changed some of our code to create the temporary branch off the first commit on the "Master" branch, to reduce the commit-tree traversal overhead, but found the problem still exists. Below is an example that should be easy to compile and run. It shows that the tree traversal happens when you create a new branch, regardless of the commit position. To see the tree traversal, I used the process monitor tool above and command-line Git Bash to examine what each object it opened was. Does anyone know why this happens? Is it expected behavior, or am I just doing something wrong? It appears to be the push that causes the issue.
void Main()
{
    string remotePath = @"E:\GIT Test\Remote";
    string localPath = @"E:\GIT Test\Local";
    string localFilePath = Path.Combine(localPath, "TestFile.txt");

    Repository.Init(remotePath, true);
    Repository.Clone(remotePath, localPath);

    // Setup Initial Commit
    string newBranch;
    using (Repository repo = new Repository(localPath))
    {
        CommitRandomFile(repo, 0, localFilePath, "master");
        newBranch = CreateNewBranch(repo, "master");
        repo.Checkout(newBranch);
    }

    // Commit 1000 times to the new branch
    for (int i = 1; i < 1001; i++)
    {
        using (Repository repo = new Repository(localPath))
        {
            CommitRandomFile(repo, i, localFilePath, newBranch);
        }
    }

    // Create a single new branch from the first commit ever
    // For some reason seems to walk the entire commit tree
    using (Repository repo = new Repository(localPath))
    {
        CreateNewBranch(repo, "master");
    }
}

private const string Characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static readonly Random Random = new Random();

/// <summary>
/// Generate and commit a random file to the specified branch
/// </summary>
public static void CommitRandomFile(Repository repo, int seed, string rootPath, string branch)
{
    File.WriteAllText(rootPath, RandomString((seed % 2 + 1) * 10));
    repo.Stage(rootPath);

    Commit commit = repo.Commit(
        string.Format("Commit: {0}", seed),
        new Signature("TestAuthor", "TestEmail@Test.com", System.DateTimeOffset.Now),
        new Signature("TestAuthor", "TestEmail@Test.com", System.DateTimeOffset.Now));

    Stopwatch pushWatch = Stopwatch.StartNew();
    repo.Network.Push(repo.Network.Remotes["origin"], "refs/heads/" + branch + ":refs/heads/" + branch);
    pushWatch.Stop();

    Trace.WriteLine(string.Format("Push {0} took {1}ms", seed, pushWatch.ElapsedMilliseconds));
}

/// <summary>
/// Create a new branch from the specified source
/// </summary>
public static string CreateNewBranch(Repository repo, string sourceBranch)
{
    Branch source = repo.Branches[sourceBranch];
    string newBranch = Guid.NewGuid().ToString();
    repo.Branches.Add(newBranch, source.Tip);

    Stopwatch pushNewBranchWatch = Stopwatch.StartNew();
    repo.Network.Push(repo.Network.Remotes["origin"], "refs/heads/" + newBranch + ":refs/heads/" + newBranch);
    pushNewBranchWatch.Stop();

    Trace.WriteLine(string.Format("Push of new branch {0} took {1}ms", newBranch, pushNewBranchWatch.ElapsedMilliseconds));
    return newBranch;
}

/// <summary>
/// Get a Random string of the specified length
/// </summary>
public static string RandomString(int size)
{
    char[] buffer = new char[size];

    for (int i = 0; i < size; i++)
    {
        buffer[i] = Characters[Random.Next(Characters.Length)];
    }

    return new string(buffer);
}