BIDS Import from changing file name [wildcard?] - variables

I'm attempting to create a process to import data. I created the entire process and it works, but I'm having trouble creating the variable to find the file name of the CSV I want to import automatically. Each time a new CSV is uploaded to me it has a timestamp on it. I want to be able to grab that file no matter what the name is and do work on it.
So for example this week the file name would be
filename_4-14-2014.csv
And next week
filename_4_21_2014.csv
And so on into eternity. . .
Is there a way to create a variable that picks up the full file name even though it's changing?

After doing some poking around, I've discovered the following...
You can use a File System Task to perform the copy operation I was referring to. You can set the input file and the output file as variables. This way you always know that the file you use for the import has the same name and the right data.
You just need to add the variables and a File System Task to your package.

OK, so to accomplish what I wanted I created a Foreach Loop Container. Using the Foreach Loop Container I had it look for any files ending with .csv in my specified folder by using a wildcard (denoted by an asterisk: *.csv).
The steps within the Foreach Loop Container are as follows:
Step 1: File System Task - rename file.
Step 2: Data Flow Task - Import data to sql
Step 3: File System Task - Copy the file to another folder, append datetime to filename
Step 4: File System Task - Delete source file.
I used variables to get all the file and folder names plus datetimes.
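For reference, here is a minimal sketch of the same rename / import / archive / delete loop written in plain Python rather than SSIS. The folder paths and the load_csv_to_sql step are placeholders, not part of the actual package.
import glob
import os
import shutil
from datetime import datetime

SOURCE_DIR = r"C:\Import\Incoming"      # hypothetical folder the timestamped csv lands in
ARCHIVE_DIR = r"C:\Import\Archive"      # hypothetical archive folder
WORKING_NAME = os.path.join(SOURCE_DIR, "current.csv")  # fixed name the import step expects

for source_file in glob.glob(os.path.join(SOURCE_DIR, "*.csv")):
    # Step 1: rename the incoming file to a fixed, known name
    os.replace(source_file, WORKING_NAME)
    # Step 2: import the data (placeholder for the SSIS data flow task)
    # load_csv_to_sql(WORKING_NAME)
    # Step 3: copy the file to the archive folder with a datetime appended to the name
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    shutil.copy2(WORKING_NAME, os.path.join(ARCHIVE_DIR, "filename_" + stamp + ".csv"))
    # Step 4: delete the source copy so the next run starts clean
    os.remove(WORKING_NAME)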

Related

Archiving files using Pentaho PDI

I need to archive a txt file using Pentaho PDI by giving it a dynamic timestamp and appending the variable to the output filename. I used Get System Info, which automatically assigns the variable as well as its value. So my job was Start -> Get System Info -> Zip File. In the Zip File component, I tried calling the variable while giving the output filename along with ${Variable}, but the output filename is not coming out properly. It should be of the form filename__timestamp__variable. Can someone please help me with this?

Copy multiple files from multiple folder to single folder in Azure data factory

I have a folder structure like this as a source
Source/2021/01/01/*.xlsx files
Source/2021/03/02/*.xlsx files
Source/2021/04/03/*.xlsx files
Source/2021/05/04/*.xlsx files
I want to drop all these excel files into a different folder called Output.
Method 1:
When I tried this, I used a Copy activity and I was able to get the files with the folder structure (not a requirement) in the Output folder. I used the Binary file format.
Method 2:
Also, I was able to get the files, but named as some random id .xlsx, in my output folder. I used Flatten Hierarchy.
My requirement is to get files with the same name as source.
This is what I suggest; I have implemented something similar in the past and I am pretty confident this should work.
Steps
Use a Get Metadata activity and loop through all the folders inside Source/2021/.
Use a ForEach loop and check the item type for folder (so that you get folders only and no files; I know at this time you don't have files there).
Inside the If condition that checks the item type, add an Execute Pipeline activity; it should point to a new pipeline which takes a parameter like
Source/2021/01/01/
Source/2021/03/02/
The new pipeline should have a Get Metadata activity and a ForEach loop, and this time we will look for files only.
Inside the ForEach loop, add a Copy activity; now you will have to use the full file name as the source name (a rough sketch of the same traversal, outside ADF, is below).
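This is not ADF code, but a small local-filesystem sketch of the same two-level traversal: an outer loop over the date folders, an inner loop over the files in each one, copying while keeping the original names. The Source/2021 and Output paths are just the ones from the question.
import os
import shutil

SOURCE_ROOT = "Source/2021"   # root under which the month/day folders live
OUTPUT_DIR = "Output"         # flat destination folder

os.makedirs(OUTPUT_DIR, exist_ok=True)

# Outer traversal: every folder under Source/2021 (the outer Get Metadata + ForEach)
for dirpath, dirnames, filenames in os.walk(SOURCE_ROOT):
    # Inner traversal: the files inside each folder (the inner Get Metadata + ForEach)
    for name in filenames:
        if name.lower().endswith(".xlsx"):
            # Copy step: keep the original file name, drop the folder structure
            shutil.copy2(os.path.join(dirpath, name), os.path.join(OUTPUT_DIR, name))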

Getting Error for Excel to Table Conversion

I just started learning Python and now I'm trying to integrate that with my GIS knowledge. As the title suggests, I'm attempting to convert an Excel sheet to a table but I keep getting errors, one of which is wholly undecipherable to me and the other of which seems to be suggesting that my file does not exist, which I know is incorrect since I copied its location directly from its properties.
Here is a screenshot of my environment. Please help if you can and thanks in advance.
Environment/Error
Simply put, you put the workspace directory inside the filename variable, so when arcpy handles it, it tries to access a file that does not exist, in an unknown workspace.
Try this (note the raw string for the Windows path, which must not end in a trailing backslash):
arcpy.env.workspace = r"J:\egis_work\dpcd\projects\SHARITA\Python"
arcpy.ExcelToTable_conversion("Exceltest.xlsx", "Bookstorestable", "Sheet1")
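If you would rather not set the workspace at all, the same conversion can be done with full paths instead; a sketch, assuming the output should be a standalone .dbf table in that same folder (the names are the ones from the question).
import arcpy

# Full input path plus an explicit output table; no arcpy.env.workspace needed.
arcpy.ExcelToTable_conversion(
    r"J:\egis_work\dpcd\projects\SHARITA\Python\Exceltest.xlsx",
    r"J:\egis_work\dpcd\projects\SHARITA\Python\Bookstorestable.dbf",
    "Sheet1")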
Arcpy uses the following syntax to convert geodatabase tables to excel
It is straightforward.
Example
Excel files cannot be stored in the geodatabase. The most reasonable thing is to store them in the root folder that contains the geodatabase with the table. Say I want to convert the table below into Excel and save it in the root folder, i.e. the folder in which the geodatabase is.
I will go as follows: I have put the explanations after the #.
import arcpy
import os
from datetime import datetime, date, time
# Set environment settings
in_table= r"C:\working\Sunderwood\Network Analyst\MarchDistances\Centroid.gdb\SunderwoodFirstArcpyTable"
#os.path.basename(in_table)
out_xls = os.path.basename(in_table) + datetime.now().strftime('%Y%m%d')
# os.path.basename(in_table) gives the base name of the path. In this case, it returns the table name
# + is used in Python to concatenate
# datetime.now() gives today's date
# .strftime('%Y%m%d') converts today's date into a string in the format YYYYMMDD
# Put all the above together and you have a new file name, which is the input table name plus today's date
#os.path.dirname() method in Python is used to get the directory name from the specified path
geodatabase = os.path.dirname(in_table)
# In this case, os.path.dirname(in_table) gives us the geodatabase
# The join() method takes all items in an iterable and joins them into one string
SaveInFolder = "\\".join(geodatabase.split('\\')[:-1])
# Here I tell Python to split the geodatabase path on \ and rejoin everything except the last piece. I will explain the split below.
# The split() method splits a string into a list
# In the case above it splits into ['C:', 'working', 'Sunderwood', 'Network Analyst', 'MarchDistances', 'Centroid.gdb']. However, I do not want the last element; dropping it with [:-1] and rejoining on \ leaves the folder that contains the geodatabase, i.e. C:\working\Sunderwood\Network Analyst\MarchDistances
#Before I tell arcpy to save, I have to specify the workspace in which it will save. So I now make my environment the SaveInFolder
arcpy.env.workspace =SaveInFolder
# Now I have to tell arcpy what I will call my new table. I use os.path.join. This method concatenates path components with exactly one directory separator (os.sep) following each non-empty part except the last path component
newtable = os.path.join(arcpy.env.workspace, out_xls)
# In the above case it will give me "C:\working\Sunderwood\Network Analyst\MarchDistances\SunderwoodFirstArcpyTable20200402"
# You notice the new table does not have an Excel extension. I use + to concatenate ".xls" onto my path and make it "C:\working\Sunderwood\Network Analyst\MarchDistances\SunderwoodFirstArcpyTable20200402.xls"
table= newtable+".xls"
#Finally, I call the arcpy method and feed it with the required variables
# Execute TableToExcel
arcpy.TableToExcel_conversion(in_table, table)
print(table + " is now available")
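For comparison, the same construction is a little shorter with pathlib; a sketch, assuming the same in_table as above (adjust the path to your own geodatabase).
import arcpy
from datetime import datetime
from pathlib import Path

in_table = Path(r"C:\working\Sunderwood\Network Analyst\MarchDistances\Centroid.gdb\SunderwoodFirstArcpyTable")
# The folder two levels up (the one holding the .gdb) is where the .xls is saved, with today's date appended to the table name
out_xls = in_table.parent.parent / (in_table.name + datetime.now().strftime("%Y%m%d") + ".xls")
arcpy.TableToExcel_conversion(str(in_table), str(out_xls))
print(str(out_xls) + " is now available")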

load file with csv extension in ssis

I have to load a file with a csv extension from one particular folder to a database in SSIS. The file name is not known, but the folder and the extension are fixed.
To load the content of a file, the file name with its folder path is required; otherwise the connection manager cannot be validated and configured.
The easiest way to get the file name is to use a Foreach Loop container:
Select the option [Foreach File Enumerator]
Provide the Folder path and extension (like *.csv) you already have.
Get the File Name in a variable and use it within the Source of the data flow task within the For each Loop container.

load script from other file extension?

Is it possible to load a module from a file with an extension other than .lua?
require("grid.txt") results in:
module 'grid.txt' not found:
no field package.preload['grid.txt']
no file './grid/txt.lua'
no file '/usr/local/share/lua/5.1/grid/txt.lua'
no file '/usr/local/share/lua/5.1/grid/txt/init.lua'
no file '/usr/local/lib/lua/5.1/grid/txt.lua'
no file '/usr/local/lib/lua/5.1/grid/txt/init.lua'
no file './grid/txt.so'
no file '/usr/local/lib/lua/5.1/grid/txt.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
no file './grid.so'
no file '/usr/local/lib/lua/5.1/grid.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
I suspect that it's somehow possible to load the script into package.preload['grid.txt'] (whatever that is) before calling require?
It depends on what you mean by load.
If you want to execute the code in a file named grid.txt in the current directory, then just do dofile"grid.txt". If grid.txt is in a different directory, give a path to it.
If you want to use the path search that require performs, then add a template for .txt in package.path, with the correct path and then do require"grid". Note the absence of suffix: require loads modules identified by names, not by paths.
If you want require("grid.txt") to work should someone try that, then yes, you'll need to manually loadfile and run the script and put whatever it returns (or whatever require is documented to return when the module doesn't return anything) into package.loaded["grid.txt"].
Alternatively, you could write your own loader just for entries like this, set into package.preload["grid.txt"], which finds and loads/runs the file; or, more generically, you could write yourself a loader function, insert it into package.loaders, and then let it do its job whenever it sees a "*.txt" module come its way.