If I have two streams(called Stream1, Stream2) of development and I want to merge these two streams into one stream.
This is currently how I do it :
Create a new Stream - Stream3. Stream 3 will contain a merge of Stream1 & Stream2
Create a new repositroy workspace(called workspace1) from Stream3
Set flow target of workspace1 to Stream1
Within 'Pending Changes' all of the change sets which were available in Stream1 but not in Stream2 should now appear as change sets which can be delivered to Stream3
Is this best practice for merging 2 or more streams or is there more elegant method?
Create a new stream??? No need.
When you open a repo workspace, you have a section called "Flow targets", which contains by your Stream (characterized as "Default").
Add to it the Stream source (the Stream from which you want to merge), set it as "current", and you will see in your "Pending changes" view an "Incoming" section with all the change sets or baselines coming from that source Stream.
The idea is for you to accept those change set, load them in your local workspace and test them (compilation and test), and then deliver them back to your default stream.
The "Accept" phase is where the merge occurs (automatically, or manually if conflicts).
As mentioned in this thread:
The merge algorithm in RTC is logically the same as is found in ClearCase, i.e., given a configuration (stream, workspace) that selects a different version of a given file, find the common ancestor of the two versions, and then do a 3-way merge.
Changing the "flow target" of a workspace is just RTC's way of letting you specify what branch (stream) you want to merge into your workspace (cleartool findmerge gives you the same flexibility).
Note that ClearCase and RTC use a different common ancestor algorithm.
The last step supposes you go back to your "Flow target" section, and set back as "current" the default Stream.
I prefer this workflow to this one:
Here Brent would set as current the destination stream in order to deliver the result of the merge. That is an alternative workflow, also described in "How to keep your streams flowing smoothly in Rational Team Concert 3.0.1".
Related
I am using the following code:
from datetime import datetime
import time, pandas as pd, os, pickle
df = pd.DataFrame(np.arange(1,200))
fn = r'C:\z1.p'
pickle.dump(df, open(fn, 'wb'))
print(datetime.fromtimestamp(os.stat(fn).st_ctime))
os.remove(fn)
time.sleep(5)
pickle.dump(df, open(fn, 'wb'))
print(datetime.fromtimestamp(os.stat(fn).st_ctime))
But I get the same create time from both print statements as:
2022-03-16 08:43:30.885011
2022-03-16 08:43:30.885011
How do I make sure that new time gets printed for second print statement?
This is a Windows feature, called "file system tunnelling".
The apocryphal history of file system tunnelling
One of the file system features you may find yourself surprised by is
tunneling, wherein the creation timestamp and short/long names of a
file are taken from a file that existed in the directory previously.
In other words, if you delete some file “File with long name.txt” and
then create a new file with the same name, that new file will have the
same short name and the same creation time as the original file. You
can read this KB article for details on what operations are sensitive
to tunnelling.
Why does tunneling exist at all?
When you use a program to edit an existing file, then save it, you
expect the original creation timestamp to be preserved, since you’re
editing a file, not creating a new one. But internally, many programs
save a file by performing a combination of save, delete, and rename
operations (such as the ones listed in the linked article), and
without tunneling, the creation time of the file would seem to change
even though from the end user’s point of view, no file got created.
...
See this archived copy of Windows NT Contains File System Tunneling Capabilities:
When a name is removed from a directory (rename or delete), its
short/long name pair and creation time are saved in a cache, keyed by
the name that was removed. When a name is added to a directory (rename
or create), the cache is searched to see if there is information to
restore. The cache is effective per instance of a directory. If a
directory is deleted, the cache for it is removed.
These paired operations can cause tunneling on "name."
delete(name)/create(name)
delete(name)/rename(source, name)
rename(name, newname)/create(name)
rename(name, newname)/rename(source, name)
The idea is to mimic the behavior MS-DOS programs expect when they use
the safe save method. They copy the modified data to a temporary file,
delete the original and rename the temporary to the original. This
should seem to be the original file when complete. Windows performs
tunneling on both FAT and NTFS file systems to ensure long/short file
names are retained when 16-bit applications perform this safe save
operation.
One Windows function related to file tunneling is FltGetTunneledName():
The FltGetTunneledName routine retrieves the tunneled name for a file, given the normalized name returned for the file by a previous call to FltGetFileNameInformation, FltGetFileNameInformationUnsafe, or FltGetDestinationFileNameInformation.
...
To disable tunnelling:
Open regedit
Navigate here:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
On the Edit menu, point to New and then click DWORD Value
Type MaximumTunnelEntries and then press Enter
On the Edit menu, click Modify
Type 0 and then click OK
Restart your computer
Done
When performing this sequence
Obtain a handle to a new file via window.showSaveFilePicker, say filename.ext
Obtain a writeable file stream from the handle
Write some content into the file using the stream
close the stream to signal completion
the File System API writes to filename.ext.crswap and on close copies filename.ext.crswap to filename.ext
Is there a reason that filename.ext.crswap is not rather renamed to filename.ext?
The reason for this behavior is to avoid partial writes:
"User agents try to ensure that no partial writes happen, i.e. the file represented by fileHandle will either contain its old contents or it will contain whatever data was written through stream up until the stream has been closed."—Spec.
for submissions in itertools.zip_longest(submission_stream, submission_stream2): #want to put all streams here
for submission in submissions:
# processing
The above code works for two streams that I have initialised. My goal is to combine streams based on username in a .csv file. If a username is there, run a stream for them. If it gets removed, or a new username is added, remove or start that stream respectively.
An example of a stream is:
submission_stream = reddit.redditor("username").stream.submissions(skip_existing=True, pause_after=-1)
I would really appreciate if someone would guide me.
You would probably have to start streaming over again every time your .csv file is changed, although you could get away with filtering (itertools.filterfalse)
for username removals. Code sketch, assuming functions to get a list of streams, determine if a submission belongs to deleted username, and determine if file was changed with an addition:
while True:
streams = get_list_of_streams_from_csv()
for submissions in itertools.zip_longest(streams):
for submission in itertools.filterfalse(not_deleted, submissions):
#processing
if csv_changed_to_add()
break
Adding in additional streams, capturing deletion with .filterfalse:
streams = get_list_of_streams_from_csv()
zip_iter = itertools.zip_longest(streams)
while True:
for submission in itertools.filterfalse(not_deleted, zip_iter):
#processing
if csv_changed_to_add()
break
zip_iter = itertools.zip_longest(zip_iter, get_list_of_new_streams())
Questions: How do I continue to process files that differ substantially from a base schema and that trigger tSchemaComplianceCheck errors?
Background
Suppose I have a folder with Customer xls files called file1,file2,....file1000. Assume I have imported the file schema into Talend repository and called it 6Columns and I have the talend job configured to iterate through each of the files and process them
1-tFileInput ->2-tSchemaCompliance-6Columns -> 3-tMap ->4-FurtherProcessing
Read each excel file
Compare it to the schema 6Columns
Format the output (rename columns)
Take the collection of Customer data and process it more
While processing I notice that the schema compliance is generating errors (errorCode 16) which points to a number of files (200) with a different schema 13Columns but there isn't a way to identify the files in advance to filter then into a subjob
How do I amend my processing to correctly integrate the files with 13Columnsschema into the process (whats the recommended way of handling) and designing incase other schema changes occur
1-tFileInput ->2-tSchemaCompliance-6Columns -> 3-tMap ->4-FurtherProcessing
|
|Reject Flow (ErrorCode 16)
|Schema-13Columns
|
|-> ??
Current Thinking When ErrorCode 16 detected
Option 1 Parallel. Take the file path for the current file and process it against 13Columns using a new FileInput before merging the 2 flows back into 1
Option 2 Serial. Collect the list of files that triggered the error and process them after I've finished with the compliance files?
You could try something like below :
tFileList - Read your input repository
tFileInput "schema6" - tSchemaComplianceCheck : read files as 6-columns schema
tMap_1 : further processing
In the reject part :
tMap after reject link : add a new column containing the filepath that has been rejected
tFlowToIterate : used to get an iterate link, acceptable input for tFileInputDelimited that follows.
tFileInput : read data as 13-columns schema. Following components are the same as in part 1.
After that, you can push your data to tHashOutput, in order to read them further in another subjob.
Allright I got myself in a deadlock with Mercurial and sub-repos... Here's what happenend:
I had a large mercurial repo that I server via apache and hgweb.cgi.
Due to the size of the repo I decided to move to sub-repositories and share these with hgwebdir.cgi.
Using the convert tool with the filemap option I created several sub-repositories:
/main/foo
/main/bar
Nicely created an entry for the sub-repositories in .hgsub:
foo = foo
bar = bar
And set hgwebdir.cgi up to show $/** as the root folder.
Now when I went to my site (foo.com/hg) I saw my sub-repositories with one empty reposory among them (no name, no content), but I could not download it (archive location unknown):
empty_repo http://img707.imageshack.us/img707/8237/emptysubrepo.png
That was allright until I added a new sub-repository.
I could not push the new .hgsub file to foo.com/hg, since that page is served by hgwebdir.
The only method I can work currently is switch from hgwebdir to hgweb, commit .hgsubste and switch back to hgwebdir.
Does someone have a good setup for such a mess?
On the webserver your main and its subrepos should appear as siblings -- not with the subrepos inside main.
Main
ASCII
AlignDistribute
And the URLs in your .hgsub should look like:
ASCII = ../ASCII
AlignDistribute = ../AlignDsitribute
Then you'll be able to push/pull to http://foo.com/hg/Main and when you clone it the clone/update will automatically attach and clone down the separate subrepos.
From what I've read on https://www.mercurial-scm.org/wiki/PublishingRepositories#multiple
The keys (on the left) and the values (on the right) are both filesystem paths
The keys should be prefixes of the values and are "subtracted" from the values in order to generate the URL paths to each repository
What I'm guessing happened is that in your hgweb(dir) configuration you're specifying the same value for a collection possibly as the key, so during subtraction it ends up with a blank name and no way to get to it.
When I use [collections] to set /a/full/path = /a/full/path directly to a repo, it'll end up blank too, because it's reading that folder as a repo because it is a repo, instead of each sub-directory being an individual repo, after I removed the .hg folder and .hgsubs and everything from the root of my collection entry, all the subfolders started showing up properly.
I originally used in [paths], /path/to/my/project = /path/to/my/project, and since that is a single referenced repository, it'll subtract the value from the key, leaving you once again with '', instead I used project = /path/to/my/project and it came out as 'project'.
Hopefully that URL or these descriptions will get you out of your pickle!