How to update the "last update" date in text files automatically according to their modification date in the OS? - automation

There are several VHDL source files. Each file has a header that gives the file name, creation date, and description, among other things. One of these fields is the last update date. All files are version controlled in Git.
The files are often modified, committed, and pushed, but the last update date is rarely updated. This happens by mistake: with so many different files being worked on at different times, it is easy to forget to change the "last update" part of the header to the current date when a file has actually changed.
I want to automate this process and believe there are many different ways to do this.
A script of some sort must check the last update date in the text file header. If it differs from the actual last modified date reported by the file system, the last update date in the text must be set to the last modified date. What would be the best way to do this? A Python script, a Bash script, or something else?
Basically I want to do this when the files are being committed to Git. Ideally it should happen automatically, but running one line in a terminal to execute a script is not a big deal. The check is needed only on the files that are being committed and pushed.

I'm not a Python programmer, but I made a little script to hopefully help you out. Maybe this fits your needs.
What the script should do:
Get all files from the path (here c:\Python) which have the extension .vhdl
Loop over the files and extract the date from line 9 via regex
Get the last modified date of the file
If the last modified date is later than the date in the file, update the file
import os
import re
import glob
import datetime
path = r"c:\Python"
# Search inside `path` rather than the current working directory
mylist = glob.glob(os.path.join(path, "*.vhdl"))
print(mylist)
for filepath in mylist:
    with open(filepath, 'r+') as f:
        content = f.read()
        last_update = re.findall(r"Last\supdate:\s+(\d{4}-\d{2}-\d{2})", content)
        if not last_update:
            # No "Last update" line in this file, so there is nothing to compare
            print(filepath, 'NO DATE FOUND')
            continue
        modified = os.path.getmtime(filepath)
        modified_readable = str(datetime.datetime.fromtimestamp(modified))[:10]
        # Dates in YYYY-MM-DD form compare correctly as plain strings
        if modified_readable > last_update[0]:
            print(filepath, 'UPDATE')
            # Replace only the date after "Last update:", not every occurrence of that date
            text = re.sub(r"(Last\supdate:\s+)\d{4}-\d{2}-\d{2}", r"\g<1>" + modified_readable, content, count=1)
            f.seek(0)
            f.write(text)
            f.truncate()
        else:
            print(filepath, 'NO CHANGE')
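Since the question asks for this to happen at commit time, the same idea could be attached to a Git pre-commit hook. The following is only a sketch under assumptions: the header line looks like "Last update: YYYY-MM-DD" (taken from the regex above), today's date is used instead of the file system date because the file is being committed now, and the names update_header, staged_vhdl_files and main are made up for illustration.

```python
import datetime
import re
import subprocess

# Assumed header format, taken from the regex in the answer above:
#   "Last update: YYYY-MM-DD"
HEADER_RE = re.compile(r"(Last\supdate:\s+)\d{4}-\d{2}-\d{2}")


def update_header(text, new_date):
    """Return text with the first "Last update" date replaced by new_date."""
    return HEADER_RE.sub(lambda m: m.group(1) + new_date, text, count=1)


def staged_vhdl_files():
    """List staged (added, copied or modified) .vhdl files reported by git."""
    out = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        universal_newlines=True,
    )
    return [name for name in out.splitlines() if name.endswith(".vhdl")]


def main():
    today = datetime.date.today().isoformat()
    for name in staged_vhdl_files():
        # newline="" keeps the file's existing line endings untouched
        with open(name, newline="") as f:
            old = f.read()
        new = update_header(old, today)
        if new != old:
            with open(name, "w", newline="") as f:
                f.write(new)
            # Re-stage so the commit contains the refreshed header
            subprocess.check_call(["git", "add", name])
```

To try it, the script would be saved as .git/hooks/pre-commit, made executable, and end with a call to main(). One caveat: "git add" stages the whole file, so any changes that were deliberately left unstaged in that file would be committed too.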

Related

Pandas, Glob, use wildcard to stand for end of filename

I am programming (Pandas) around a problem where certain generated files are saved with a date attached to the file. For example: file-name_20220814.csv.
However, the date changes each time the files are generated, so the end of the filename is different every time. What is the best way to use a wildcard to stand for these date endings?
Glob? How would I do that in the following code:
df1 = pd.read_csv('files/file-name_20220816.csv')
Answer provided by @mitoRibo:
from glob import glob
pd.read_csv(glob('files/file-name_*csv')[0])
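glob returns its matches in no guaranteed order, so taking element [0] is only safe when exactly one file matches. If several dated files can sit in the folder at once, one option is to pick the greatest name. This is a sketch that assumes the suffix is always a zero-padded YYYYMMDD date, so alphabetical order equals chronological order; latest_dated_file is a made-up helper name.

```python
from glob import glob


def latest_dated_file(pattern):
    """Return the match with the greatest name, or None if nothing matches.

    Assumes the date suffix is zero-padded YYYYMMDD, so that the
    alphabetical order of the names equals their chronological order.
    """
    matches = glob(pattern)
    return max(matches) if matches else None
```

It could then be used as pd.read_csv(latest_dated_file('files/file-name_*.csv')), after checking that the result is not None.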

Export panda dataframe to CSV

I have code that exports a dataframe in CSV format at the end. However, each time I run it, it replaces the previous CSV, while I would like to accumulate my CSV files.
Do you know a method to do this?
dfind.to_csv(r'C:\Users\StageProject\Indicateurs\indStat.csv', index = True, header=True)
Thanks !
The question is really about how you want to name your files. The easiest way is just to attach a timestamp to each one:
import time
unix_time = round(time.time())
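This should be unique under most real-world conditions because time doesn't go backwards and time.time() returns seconds since the Unix epoch, regardless of the local time zone. Because the value is rounded to whole seconds, two runs within the same second would produce the same name. Then just save to the path: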
rf'C:\Users\StageProject\Indicateurs\indStat_{unix_time}.csv'
If you want to do a serial count, like what your browser does when you save multiple versions, you will need to iterate through the files in that folder and then keep adding one to your suffix until you get to a file path that does not conflict, then save thereto.
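The serial-count approach could look roughly like this. It is a minimal sketch: next_free_path is a made-up helper name, and the "_1", "_2" suffix style is just one possible convention.

```python
import os


def next_free_path(path):
    """Return path if unused, else path with _1, _2, ... before the extension."""
    if not os.path.exists(path):
        return path
    stem, ext = os.path.splitext(path)
    n = 1
    # Keep adding one to the suffix until the candidate does not exist yet
    while os.path.exists("{}_{}{}".format(stem, n, ext)):
        n += 1
    return "{}_{}{}".format(stem, n, ext)
```

It would be used as dfind.to_csv(next_free_path(r'C:\Users\StageProject\Indicateurs\indStat.csv'), index=True, header=True). Note that two processes running at the same moment could still pick the same name, since the check and the write are separate steps.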

Getting Error for Excel to Table Conversion

I just started learning Python and now I'm trying to integrate that with my GIS knowledge. As the title suggests, I'm attempting to convert an Excel sheet to a table but I keep getting errors: one is wholly undecipherable to me, and the other seems to suggest that my file does not exist, which I know is incorrect since I copied its location directly from its properties.
Here is a screenshot of my environment. Please help if you can and thanks in advance.
[Screenshot: Environment/Error]
Simply put, you put the workspace directory inside the filename variable, so when arcpy handles it, it tries to access a file that does not exist, in an unknown workspace.
Try this.
import arcpy
# Raw string so the backslashes are not treated as escapes; the trailing backslash is dropped because it would escape the closing quote
arcpy.env.workspace = r"J:\egis_work\dpcd\projects\SHARITA\Python"
arcpy.ExcelToTable_conversion("Exceltest.xlsx", "Bookstorestable", "Sheet1")
Arcpy can also convert geodatabase tables to Excel, and the syntax is straightforward.
Example
Excel files cannot be stored in a geodatabase. The most reasonable thing is to store them in the folder that contains the geodatabase holding the table. Say I want to convert the table below to Excel and save it in that folder.
I proceed as follows; the explanations are in the comments after the #.
import arcpy
import os
from datetime import datetime
# Input table inside the geodatabase
in_table = r"C:\working\Sunderwood\Network Analyst\MarchDistances\Centroid.gdb\SunderwoodFirstArcpyTable"
# os.path.basename(in_table) returns the last path component, here the table name.
# datetime.now().strftime('%Y%m%d') formats today's date as a YYYYMMDD string.
# + concatenates the two, giving a new file name: the table name plus today's date.
out_xls = os.path.basename(in_table) + datetime.now().strftime('%Y%m%d')
# os.path.dirname() returns the directory part of a path; here that is the geodatabase.
geodatabase = os.path.dirname(in_table)
# split('\\') breaks the geodatabase path into its components:
# ['C:', 'working', 'Sunderwood', 'Network Analyst', 'MarchDistances', 'Centroid.gdb'].
# [:-1] drops the last one ('Centroid.gdb') and "\\".join() glues the rest back together,
# leaving the folder that contains the geodatabase.
SaveInFolder = "\\".join(geodatabase.split('\\')[:-1])
# Before arcpy saves anything, make that folder the workspace.
arcpy.env.workspace = SaveInFolder
# os.path.join concatenates path components with exactly one directory separator between them.
# On 2 April 2020, for example, this gives
# "C:\working\Sunderwood\Network Analyst\MarchDistances\SunderwoodFirstArcpyTable20200402"
newtable = os.path.join(arcpy.env.workspace, out_xls)
# newtable has no Excel extension yet, so append ".xls" to the path.
table = newtable + ".xls"
# Finally, call the arcpy tool with the input table and the output path.
# Execute TableToExcel
arcpy.TableToExcel_conversion(in_table, table)
print(table + " is now available")

BIDS Import from changing file name [wildcard?]

I'm attempting to create a process to import data. I created the entire process and it works, but I'm having trouble creating the variable that automatically finds the name of the CSV file I want to import. Each new CSV uploaded to me has a timestamp in its name. I want to be able to grab that file no matter what its name is and process it.
So for example this week the file name would be
filename_4-14-2014.csv
And next week
filename_4_21_2014.csv
And so on into eternity...
Is there a way to create a variable that picks up the full file name even though it's changing?
After doing some poking around, I've discovered the following...
You can use a file system task to perform the copy operation I was referring to. You can set the input file and the output file as variables. This way you can always know that the file you use for import is always named the same, and has the right data.
You just need to add the variables and a File System Task to your package.
To accomplish what I wanted, I created a Foreach Loop Container and had it look for any files ending with .csv in my specified folder by using a wildcard (*.csv).
The Foreach Loop Container contains the following steps:
Step 1: File System Task - rename file.
Step 2: Data Flow Task - Import data to SQL
Step 3: File System Task - Copy the file to another folder, append datetime to filename
Step 4: File System Task - Delete source file.
I used variables to get all the file and folder names plus datetimes.

how to get the latest file in the folder

I have written code to retrieve each file and the time it was created, but I just want to get the name of the latest file created. Please suggest how I can do that in Jython.
import os
import glob
import time
folder='C:/xml'
for folder in glob.glob(folder):
    for file in glob.glob(folder+'/*.xml'):
        stats=os.stat(file)
        print file ,time.ctime(stats[8])
Thanks again for all your help.
I have modified the code as suggested, but I am not getting the right answer. Please suggest what mistake I am making.
import os
import glob
import time
folder='C:/xml'
for x in glob.glob(folder+"/*.xml"):
    (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime)=os.stat(x)
    time1=time.ctime(mtime)
    for z in glob.glob(folder+"/*.xml"):
        (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime)=os.stat(z)
        time2=time.ctime(mtime)
        if (time1>time2):
            new_file=x
            new_time=time1
        else:
            new_file=z
            new_time=time2
print new_file,new_time
Use two variables to keep track of the name and time of the latest file found so far. Whenever you find a later file, update both variables. When your loop is done, the variables will contain the name and time of the latest file.
I'm not quite sure why you have two nested loops in your example code; if you're looking for all *.xml files in the given directory, you only need one loop.
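The single-loop idea described above might be sketched like this. The function name latest_file is made up for illustration. It compares the numeric modification time directly, because the strings returned by time.ctime() start with the weekday name and therefore do not sort chronologically.

```python
import glob
import os


def latest_file(folder, pattern="*.xml"):
    """Return (path, mtime) of the most recently modified match, or (None, None)."""
    newest_path = None
    newest_time = None
    for path in glob.glob(os.path.join(folder, pattern)):
        mtime = os.stat(path).st_mtime  # numeric timestamp, safe to compare
        if newest_time is None or mtime > newest_time:
            # Found a later file: update both tracking variables together
            newest_path = path
            newest_time = mtime
    return newest_path, newest_time
```

The timestamp can be passed to time.ctime() afterwards if a readable date is wanted for printing.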
A Pythonic solution might be something like:
folder = "C:/xml"
print max((os.stat(x)[8], x) for x in glob.glob(folder+"/*.xml"))
If you choose the max() solution, be sure to consider the case where there are no *.xml files in your directory.
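One way to cover the empty-directory case is to check the list before calling max(), which raises ValueError on an empty sequence. This is a sketch with a made-up function name, newest_xml, and it returns None when nothing matches.

```python
import glob
import os


def newest_xml(folder):
    """Return the most recently modified *.xml path, or None if there is none."""
    paths = glob.glob(os.path.join(folder, "*.xml"))
    if not paths:
        # max() raises ValueError on an empty sequence, so bail out first
        return None
    return max(paths, key=os.path.getmtime)
```

Using key=os.path.getmtime keeps the result a plain path instead of a (time, path) tuple.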