I am using the following code:
from datetime import datetime
import time, os, pickle
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(1, 200))
fn = r'C:\z1.p'
with open(fn, 'wb') as f:
    pickle.dump(df, f)
print(datetime.fromtimestamp(os.stat(fn).st_ctime))
os.remove(fn)
time.sleep(5)
with open(fn, 'wb') as f:
    pickle.dump(df, f)
print(datetime.fromtimestamp(os.stat(fn).st_ctime))
But both print statements show the same creation time:
2022-03-16 08:43:30.885011
2022-03-16 08:43:30.885011
How do I make sure that a new creation time is printed by the second print statement?
This is a Windows feature called "file system tunneling".
The apocryphal history of file system tunnelling
One of the file system features you may find yourself surprised by is
tunneling, wherein the creation timestamp and short/long names of a
file are taken from a file that existed in the directory previously.
In other words, if you delete some file “File with long name.txt” and
then create a new file with the same name, that new file will have the
same short name and the same creation time as the original file. You
can read this KB article for details on what operations are sensitive
to tunnelling.
Why does tunneling exist at all?
When you use a program to edit an existing file, then save it, you
expect the original creation timestamp to be preserved, since you’re
editing a file, not creating a new one. But internally, many programs
save a file by performing a combination of save, delete, and rename
operations (such as the ones listed in the linked article), and
without tunneling, the creation time of the file would seem to change
even though from the end user’s point of view, no file got created.
...
See this archived copy of Windows NT Contains File System Tunneling Capabilities:
When a name is removed from a directory (rename or delete), its
short/long name pair and creation time are saved in a cache, keyed by
the name that was removed. When a name is added to a directory (rename
or create), the cache is searched to see if there is information to
restore. The cache is effective per instance of a directory. If a
directory is deleted, the cache for it is removed.
These paired operations can cause tunneling on "name."
delete(name)/create(name)
delete(name)/rename(source, name)
rename(name, newname)/create(name)
rename(name, newname)/rename(source, name)
The idea is to mimic the behavior MS-DOS programs expect when they use
the safe save method. They copy the modified data to a temporary file,
delete the original and rename the temporary to the original. This
should seem to be the original file when complete. Windows performs
tunneling on both FAT and NTFS file systems to ensure long/short file
names are retained when 16-bit applications perform this safe save
operation.
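To make the "safe save" pattern concrete, here is a minimal Python sketch of the write/delete/rename dance described above (the path and contents are made up for illustration, and it assumes the target file already exists). Without tunneling, the file produced this way would get a brand-new creation time even though, from the user's point of view, it was only edited:

import os

def safe_save(path, data):
    # Classic "safe save": write the new contents to a temporary file,
    # remove the original, then rename the temporary file into place.
    # Tunneling is what lets the renamed file keep the original file's
    # creation time (and short name) on Windows.
    tmp = path + '.tmp'
    with open(tmp, 'wb') as f:
        f.write(data)
    os.remove(path)
    os.rename(tmp, path)

safe_save(r'C:\z1.p', b'new contents')  # hypothetical example path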
One Windows function related to file tunneling is FltGetTunneledName():
The FltGetTunneledName routine retrieves the tunneled name for a file, given the normalized name returned for the file by a previous call to FltGetFileNameInformation, FltGetFileNameInformationUnsafe, or FltGetDestinationFileNameInformation.
...
To disable tunneling:
Open regedit
Navigate here:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
On the Edit menu, point to New and then click DWORD Value
Type MaximumTunnelEntries and then press Enter
On the Edit menu, click Modify
Type 0 and then click OK
Restart your computer
Done
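If you would rather script this than click through regedit, here is a minimal sketch using Python's winreg module. It assumes the same key path as the steps above, must be run with administrative rights, and a reboot is still required afterwards:

import winreg

key_path = r'SYSTEM\CurrentControlSet\Control\FileSystem'
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                    winreg.KEY_SET_VALUE) as key:
    # MaximumTunnelEntries = 0 disables file system tunneling
    winreg.SetValueEx(key, 'MaximumTunnelEntries', 0, winreg.REG_DWORD, 0)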
Related
When performing this sequence
Obtain a handle to a new file via window.showSaveFilePicker, say filename.ext
Obtain a writeable file stream from the handle
Write some content into the file using the stream
Close the stream to signal completion
the File System Access API writes to filename.ext.crswap and, on close, copies filename.ext.crswap to filename.ext
Is there a reason that filename.ext.crswap is not rather renamed to filename.ext?
The reason for this behavior is to avoid partial writes:
"User agents try to ensure that no partial writes happen, i.e. the file represented by fileHandle will either contain its old contents or it will contain whatever data was written through stream up until the stream has been closed."—Spec.
We have a problem with Microsoft Access .mdb files. After some time working with an .mdb file in a multi-user environment, the file becomes corrupted and has to be repaired. After it is repaired, it takes less time to become corrupted again, and at some point, after multiple repairs, the file isn't usable at all anymore.
This problem started to appear after we changed from MS Access Runtime 2010 to MS Access Runtime 2013.
I've already spent some time looking into this problem and this is my theory:
The .mdb file apparently contains a "Database Header Page" (described in a Microsoft white paper called "Understanding Microsoft Jet Locking" from 1996), which stores information about the commit bytes of users (important: 0000 = writing to disk, 0100 = accessed a corrupted page). And there is a paragraph in the white paper about this Database Header Page which explains exactly what is happening in our case:
"[...]Therefore, if a value of 00 00 is present without corresponding user lock [this would be an entry in the corresponding .ldb-file I think], or a value of 0100 is present, users will not be allowed to connect to the database without first executing the repair utility."
So my guess is that after a certain number of lost or broken connections to the .mdb, this Database Header Page overflows and you have to repair the file. The repair, however, doesn't remove every entry in the Database Header Page, so the number of broken connections needed for the file to become corrupted again decreases until the file breaks completely.
Now I would like to know whether this theory is any good, and if it is, I would like to know:
How can I test this theory (how can I read this Database Header Page of the mdb-file)?
Can I modify the Database Header Page?
Can I modify the Database Header Page while someone is working with the mdb?
I know it's a very specific problem but I hope you guys can help me!
P.S. I can't find a link to the white paper, but the "LDBViewer" package includes it.
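As a hedged side note on the first sub-question (how to read the Database Header Page): a low-tech starting point is to dump the raw bytes of page 0 and compare them by eye with the layout in the white paper. The Python sketch below assumes a Jet 4 page size of 4096 bytes (older Jet 3.x files use 2048), and newer Jet versions reportedly obfuscate parts of the header, so not every field will be readable as plain values:

def dump_header_page(path, page_size=4096):
    # Read page 0 of the .mdb (the Database Header Page) and print it as hex,
    # so the per-user commit bytes described in the white paper can be inspected.
    with open(path, 'rb') as f:
        page = f.read(page_size)
    for offset in range(0, len(page), 16):
        chunk = page[offset:offset + 16]
        print(f'{offset:04x}  ' + ' '.join(f'{b:02x}' for b in chunk))

dump_header_page(r'\\server\share\TheDatabase.mdb')  # hypothetical path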
A few quick-and-dirty tricks that I often use rely on creating a small database which serves one purpose: controlling how the desired target database is opened.
This database is copied to the desktop of every user, so every user/session has its own.
So I would have
* 1 database on a server-location : [TheBigOne.mdb]
* 1 database copied to several desktops : [TheCaller.mdb]
--01-- :
Every morning, call a function which will perform the following steps:
* rename [TheCaller.mdb] to [TheCaller-xxx.mdb]
* create and open [TheCaller-NEW.mdb]
* for all tables, queries, forms, reports, macros and modules:
DoCmd.TransferDatabase acImport, "Microsoft Access", "TheCaller-xxx.mdb", _
    acTable, "y", "y", False
'repeat with acQuery, acForm, acReport, acMacro and acModule,
'where "y" stands for the name of each object to import
* rename [TheCaller-NEW.mdb] to [TheCaller.mdb]
--02--
Create a form in [TheCaller.mdb] with one button, with an OnClick event like:
If Dir(ServerLocation & "\TheBigOne.ldb") = "" Then
    'open [TheBigOne.mdb]
Else
    MsgBox "Database already in use!"
End If
This is prone to a lot of headaches, because it will happen that the [TheBigOne.ldb] file does exist while no one is using the database. But at least you will have less database corruption.
--03--
An alternative to procedure --02-- is to open the database with the command-line switch /excl. At any given time, only one user will be able to work in the database.
--04--
Create a second button on the form from point --02--, which will open [TheBigOne.mdb] with the command-line switch /ro. This opens the database in read-only mode for that user, avoiding corruption of the database.
--05--
Create a small backend database db_sessions.mdb on a server location, with a table in it like T_Sessions(id, whois, started_at_timestamp, ended_at_timestamp, excl_or_ro, troubles) to keep track of who opens and closes the database [TheBigOne.mdb], and when.
If a user wants more privileges than read-only, the following test has to be true: DCount("*","T_Sessions","ended_at_timestamp is null AND excl_or_ro = 'excl'") = 0. If the test is false, the field troubles can be used to dispatch messages to other users.
I have a rather complex requirement: I have to drop a very specifically named file in an FTP location, and the trick is that I often have to drop it into a new location and under a new file name each time (both directory name and file name depend on the year, month, day and time). For this purpose I chose to use a Dynamic Send Port, which I have configured using a MessageAssignment Shape.
A file will be generated each day. I need to drop it in a remote location in this form:
sample-servername-stage/default/file/ftp/PaymentReports/YYYY/MM_[MonthName]/PaymentReportYYYYMMDD_HHMISS
For example, for a file posted on March, 2 2016 at 6:45pm, we would have:
sample-servername-stage/default/file/ftp/PaymentReports/2016/03_March/PaymentReport20160302_184500
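This is not BizTalk code, but purely as an illustration of the naming scheme, here is a small Python sketch that derives the directory and file name from a timestamp, just to make the format explicit:

from datetime import datetime

def payment_report_path(ts):
    # Builds "YYYY/MM_MonthName/PaymentReportYYYYMMDD_HHMISS"
    # (the month name follows the current locale)
    return ts.strftime('%Y/%m_%B/PaymentReport%Y%m%d_%H%M%S')

print(payment_report_path(datetime(2016, 3, 2, 18, 45, 0)))
# prints: 2016/03_March/PaymentReport20160302_184500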
Here's the code I have in the MessageAssignment Shape:
FTPSendPort1(Microsoft.XLANGs.BaseTypes.Address) = "ftp://sample-servername-stage:721";
FTPSendPort1(Microsoft.XLANGs.BaseTypes.TransportType) = "FTP";
Output(FTP.CommandLogFileName) = "D:\\BiztalkLogs\\FTPLog\\DynamicFTPLog.txt";
Output(FTP.UserName) = "sampleUsername";
Output(FTP.Password) = "samplePassword";
Output(FTP.BeforePut) = "MKD " + Variable_1 + ";CWD " + Variable_1;
FTPSendPort1 - name of the Dynamic Send Port.
Output - name of the Output message.
Variable_1 - variable where I will store the directory name to be created.
Here are the biggest issues:
I need to check if a directory already exists - the year, then navigate in and check if the month already exists. If they exist I simply go in there and drop the file. If not, I create it and drop the file in there.
I need to name the file with the date-time specifics in the format shown above. In addition to the code shown above, I have tried a number of things, including setting the FILE.ReceivedFileName and FTP.ReceivedFileName properties. Nothing seems to work. This may be because I cannot use the macro %SourceFileName% anywhere. Because of this, it keeps dropping the file into the location with a GUID name instead of the one I set; it behaves as though it completely skips the statement where I set the file name.
I'm thoroughly confused at this point. I'm not sure how I can mix checking conditions (whether the folders already exist etc.) with FTP commands, and especially not sure how to do this within an orchestration.
The file naming is done in the Address property where you provide the FTP URL. In fact, you can even use macros in there. Try this:
FTPSendPort1(Microsoft.XLANGs.BaseTypes.Address) = "ftp://sample-servername-stage:721/SomeFolder/SomeFileName_%datetime%.xml"
For your other problem of checking whether folders exist on the FTP server and creating them, I think you'll have to write a custom pipeline component.
I'm seeing an issue with an SSIS (SQL Server 2005) job where I'm getting the following error:
The file name "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\UNC\FOLDERS\filename.xls;Extended Properties="EXCEL 8.0;HDR=YES";" specified in the connection was not valid.
My searching around this site and others indicates that the most common cause of this is a permissions error, but I don't believe that's the case in this situation, since a number of files have successfully been processed through this implementation.
Here's an overview of the setup:
Vendors FTP files to us on a daily basis; a Windows service picks them up, copies them to a temporary directory and then calls SSIS jobs on those files. There are two SSIS jobs for each vendor: one for a snapshot data feed and one for a transaction listing.
There are currently over 50 different SSIS jobs in the overall process. All of them work except for one specific transaction job, which fails with the above error in a script task step. Files come in at least daily with unique file names, so I pick up the file, determine the vendor based on the source directory and the file type based on indicators in the file name, and from that decide which SSIS job to call. Since file names change every day, when the service calls the SSIS job I pass in a series of parameters, including the vendor file name, so it can properly connect to the file.
Each job begins with a script task that sets the necessary variable values for the rest of the job. For example, since the vendor file name changes with each run, I pass the vendor file name in through the SSIS variables collection and then set the connection string of a data source, using that file name as the Data Source in the string. It is at that point of the script task that the above error occurs. Here's the script code where the error occurs:
Dts.Connections("Transactions File").ConnectionString = _
Dts.Variables("ConnectionString").Value.ToString().Replace("##FILE_PATH##", sourceFilePath)
The ConnectionString value is: Provider=Microsoft.Jet.OLEDB.4.0;Data Source=##FILE_PATH##;Extended Properties="EXCEL 8.0;HDR=YES";
The sourceFilePath is the full UNC path to the vendor file in the processing directory.
I don't believe it's a permissions error, since all the other files going through this process (using the same holding directory for processing) are working. It shouldn't be an issue of the file not existing since, again, it follows the same process as every other file, and I have verified that the file ends up in the correct directory. I also considered that the connection string might be too long, but the file path ends up at 109 characters, and even with a shorter (<90 character) full path, the same error occurs.
Is there anything else you can think of for me to look at? Thanks for any help.
Based on the information presented, you are doing everything correctly. If you're new to SSIS, one thing I'd suggest is that you get a copy of the excellent add-in BIDSHelper. It has great features that can really save you time, especially with regard to configurations and expressions.
I created a reference package that had an Excel Connection Manager pointing to C:\ssisdata\so_paulsmithjr.xls and wired everything up.
At this point, I knew things were working, so it was time to make the package move. I created the following variables and their values:
CurrentFile - C:\ssisdata\so_paulsmithjr.xls
PlaceHolder - ##FILE_PATH##
TemplateConnection - Provider=Microsoft.Jet.OLEDB.4.0;Data Source=##FILE_PATH##;Extended Properties="Excel 8.0;HDR=YES";
A fourth variable is set to be an expression (right-click the variable, open the Properties window, set EvaluateAsExpression = True, and use the expression below):
CurrentConnection - REPLACE(#[User::TemplateConnection], #[User::PlaceHolder], #[User::CurrentFile])
I compared the CurrentConnection value to the ReferenceConnection (which is the original value of the Excel Connection Manager's connection string) and things were a match. At this point, if I were to change the value of CurrentFile to C:\ssisdata\so_paulsmithjr - Copy.xls, that would automatically be reflected in the value of CurrentConnection.
The final trick is to use an expression on the Excel Connection Manager. Again, right-click the connection manager and, under Properties, there will be Expressions. It won't expand, as there is nothing under it yet. Instead, click the ellipsis, select the ConnectionString property, click the ellipsis again, and this time drag down the #[User::CurrentConnection] variable. Click OK twice, and now your connection manager is set to use whatever the CurrentConnection variable specifies.
Does that work any better?
Alright, I got myself into a deadlock with Mercurial and sub-repos... Here's what happened:
I had a large Mercurial repo that I serve via Apache and hgweb.cgi.
Due to the size of the repo I decided to move to sub-repositories and share these with hgwebdir.cgi.
Using the convert tool with the filemap option I created several sub-repositories:
/main/foo
/main/bar
I created an entry for each sub-repository in .hgsub:
foo = foo
bar = bar
And set hgwebdir.cgi up to show $/** as the root folder.
Now when I went to my site (foo.com/hg), I saw my sub-repositories with one empty repository among them (no name, no content), but I could not download it (archive location unknown):
empty_repo http://img707.imageshack.us/img707/8237/emptysubrepo.png
That was alright until I added a new sub-repository.
I could not push the new .hgsub file to foo.com/hg, since that page is served by hgwebdir.
The only workaround I currently have is to switch from hgwebdir to hgweb, commit .hgsubstate, and switch back to hgwebdir.
Does someone have a good setup for such a mess?
On the webserver your main and its subrepos should appear as siblings -- not with the subrepos inside main.
Main
ASCII
AlignDistribute
And the URLs in your .hgsub should look like:
ASCII = ../ASCII
AlignDistribute = ../AlignDistribute
Then you'll be able to push/pull to http://foo.com/hg/Main and when you clone it the clone/update will automatically attach and clone down the separate subrepos.
From what I've read on https://www.mercurial-scm.org/wiki/PublishingRepositories#multiple
The keys (on the left) and the values (on the right) are both filesystem paths
The keys should be prefixes of the values and are "subtracted" from the values in order to generate the URL paths to each repository
What I'm guessing happened is that in your hgweb(dir) configuration you're specifying the same value as the key for a collection, so during the subtraction it ends up with a blank name and no way to get to it.
When I use [collections] to point /a/full/path = /a/full/path directly at a repo, it ends up blank too, because that folder is read as a single repo (it is one) instead of each sub-directory being treated as an individual repo. After I removed the .hg folder, the .hgsub files and everything else from the root of my collection entry, all the subfolders started showing up properly.
In [paths] I originally used /path/to/my/project = /path/to/my/project, and since that references a single repository, the key is subtracted from the value, leaving you once again with ''. Instead I used project = /path/to/my/project, and it came out as 'project'.
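For reference, here is a small sketch of the two hgweb configurations described above (the paths are placeholders, not an actual setup):

# Ends up with a blank repository name: the key equals the value,
# so the subtraction leaves an empty URL path.
[paths]
/path/to/my/project = /path/to/my/project

# Shows up under the name 'project' instead.
[paths]
project = /path/to/my/project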
Hopefully that URL or these descriptions will get you out of your pickle!