I want to rename image files using the SchoolId from the [School] table. Is there another approach to do this?
Currently, I am doing the following steps for each file:
1. Copy the image file name.
2. Use this query to get the SchoolId:
SELECT * FROM [School]
WHERE SchoolCDS = '01611926000962'
3. Rename the image file with the SchoolId.
What would be the best approach?
Have you stored the image file name in the database?
If yes:
Fetch the image file name from the database, then fetch the SchoolId from the database based on that file name, and finally update the database with the new file name.
If no (i.e. the images are stored physically on disk):
Get the file name as shown below:
// requires using System.IO;
string fileName = @"C:\mydir\myfile.ext";
string path = @"C:\mydir\";
string result;
result = Path.GetFileName(fileName);
The rest of the code (looking up the SchoolId and renaming the file) is the same as in the case above.
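If you want to script the whole rename in bulk, here is a minimal sketch of that loop in Python; pyodbc, the connection string, and the idea that each file is named after its SchoolCDS value are all assumptions, so adjust them to your setup:

import os
import pyodbc  # assumption: pyodbc and a SQL Server ODBC driver are installed

# Assumed connection string and image folder; replace with your own values.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;DATABASE=MyDb;Trusted_Connection=yes"
)
cursor = conn.cursor()

folder = r"C:\mydir"
for file_name in os.listdir(folder):
    cds, ext = os.path.splitext(file_name)  # assumes the base name is the SchoolCDS
    row = cursor.execute(
        "SELECT SchoolId FROM [School] WHERE SchoolCDS = ?", cds
    ).fetchone()
    if row:
        os.rename(os.path.join(folder, file_name),
                  os.path.join(folder, str(row.SchoolId) + ext))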
In Databricks I have several CSV files that I need to load. I would like to add a column to my table containing the file path, but I can't seem to find that option.
My data is structured as:
FileStore/subfolders/DATE01/filenameA.csv
FileStore/subfolders/DATE01/filenameB.csv
FileStore/subfolders/DATE02/filenameA.csv
FileStore/subfolders/DATE02/filenameB.csv
I'm using this SQL statement in Databricks, since it can loop through all the dates and load every filenameA into clevertablenameA, every filenameB into clevertablenameB, and so on:
DROP VIEW IF EXISTS clevertablenameA;
CREATE TEMPORARY VIEW clevertablenameA
USING csv
OPTIONS (path "dbfs:/FileStore/subfolders/*/filenameA.csv", header = true)
My desired outcome is something like this:
col1 | col2 | ... | path
data | data | ... | dbfs:/FileStore/subfolders/DATE02/filenameA.csv
data | data | ... | dbfs:/FileStore/subfolders/DATE02/filenameA.csv
data | data | ... | dbfs:/FileStore/subfolders/DATE02/filenameA.csv
Is there a clever option, or should I load my data another way?
The function input_file_name() could be used to retrieve the file name while reading.
SELECT *, input_file_name() as path FROM clevertablenameA
Note that this does not add a column to the view and merely returns the name of the file being read.
Refer to the link below for more information:
https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/functions/input_file_name
Alternatively, you could read the files in a PySpark/Scala cell, add the file path with .withColumn("path", input_file_name()), and then create the view on top of the resulting DataFrame.
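A minimal PySpark sketch of that approach (the path pattern and view name are taken from the question; adjust them to your own folders):

from pyspark.sql.functions import input_file_name

df = (spark.read
          .option("header", True)
          .csv("dbfs:/FileStore/subfolders/*/filenameA.csv")
          .withColumn("path", input_file_name()))

df.createOrReplaceTempView("clevertablenameA")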
I have to upload data into SQL Server from .dbf files through SSIS.
My output columns are fixed, but the input columns are not, because the files come from the client and each client may structure the data in their own style. There may also be some unused columns, or the input column names may differ from the output column names.
One idea I had was to map each file's input columns to the output columns in a SQL Server table and use only the columns present in the mapping rows for that file ID.
But I can't figure out how to do that. Any ideas?
Table example:

FileID | InputColumn   | OutputColumn | Active
1      | CustCd        | CustCode     | 1
1      | CName         | CustName     | 1
1      | Address       | CustAdd      | 1
2      | Cust_Code     | CustCode     | 1
2      | Customer Name | CustName     | 1
2      | Location      | CustAdd      | 1
If you create a similar table, you can use it in two approaches to map columns dynamically inside the SSIS package; otherwise you must build the whole package programmatically. In this answer I will try to give you some insights on how to do that.
(1) Building Source SQL command with aliases
Note: This approach will only work if all .dbf files have the same column count but different column names.
In this approach you generate the SQL command that will be used as the source, based on the FileID and the mapping table you created. The FileID and the .dbf file path must be stored inside variables. As an example:
Assuming that the Table name is inputoutputMapping
Add an Execute SQL Task with the following command:
DECLARE @strQuery as VARCHAR(4000)
SET @strQuery = 'SELECT '
SELECT @strQuery = @strQuery + '[' + InputColumn + '] as [' + OutputColumn + '],'
FROM inputoutputMapping
WHERE FileID = ?
SET @strQuery = SUBSTRING(@strQuery,1,LEN(@strQuery) - 1) + ' FROM ' + CAST(? as Varchar(500))
SELECT @strQuery
In the Parameter Mapping tab, map the variable that contains the FileID to parameter 0, and the variable that contains the .dbf file name (which stands in for the table name) to parameter 1.
Set the ResultSet type to Single row and store result set 0 in a string variable, for example @[User::SourceQuery].
The result set value will look like the following:
SELECT [CustCd] as [CustCode],[CNAME] as [CustName],[Address] as [CustAdd] FROM database1
In the OLE DB Source, set the data access mode to SQL command from variable and use the @[User::SourceQuery] variable as the source.
(2) Using a Script Component as Source
In this approach you have to use a Script Component as Source inside the Data Flow Task:
First of all, you need to pass the .dbf file path and SQL Server connection to the script component via variables if you don't want to hard code them.
Inside the script editor, you must add an output column for each column found in the destination table and map them to the destination.
Inside the script, you must read the .dbf file into a DataTable; see:
C# Read from .DBF files into a datatable
Load a DBF into a DataTable
After loading the data into a DataTable, also fill another DataTable with the data found in the mapping table you created in SQL Server.
After that, loop over the columns of the source DataTable and change each .ColumnName to the relevant output column name, for example:
foreach (DataColumn col in myTable.Columns)
{
    // look up the output name mapped to this input column (FileID = 1 here)
    col.ColumnName = MappingTable.AsEnumerable()
        .Where(x => x.Field<int>("FileID") == 1 && x.Field<string>("InputColumn") == col.ColumnName)
        .Select(y => y.Field<string>("OutputColumn"))
        .First();
}
After that, loop over each row in the DataTable and create a script output row.
Note that while assigning output rows you must check whether each column exists; you can first add all column names to a list of strings and then use that list for the check, for example:
var columnNames = myTable.Columns.Cast<DataColumn>()
                         .Select(x => x.ColumnName)
                         .ToList();

foreach (DataRow row in myTable.Rows)
{
    OutputBuffer0.AddRow();

    if (columnNames.Contains("CustCode"))
    {
        // convert as appropriate for the column's data type
        OutputBuffer0.CustCode = row["CustCode"].ToString();
    }
    else
    {
        OutputBuffer0.CustCode_IsNull = true;
    }
    // continue checking all other columns
}
If you need more details about using a Script Component as a source, then check one of the following links:
SSIS Script Component as Source
Creating a Source with the Script Component
Script Component as Source – SSIS
SSIS – USING A SCRIPT COMPONENT AS A SOURCE
(3) Building the package dynamically
I don't think there are other methods you can use to achieve this goal; apart from the approaches above, your remaining choice is to build the whole package dynamically, in which case you should go with one of:
BIML
Integration Services managed object model
EzApi library
(4) SchemaMapper: C# schema mapping class library
Recently I started a new project on GitHub: a class library developed in C#. You can use it to import tabular data from Excel, Word, PowerPoint, text, CSV, HTML, JSON, and XML into a SQL Server table with a different schema definition, using a schema mapping approach. Check it out at:
SchemaMapper: C# Schema mapping class library
You can follow this Wiki page for a step-by-step guide:
Import data from multiple files into one SQL table step by step guide
I have an input folder in ADLS in the format year/month/date, e.g. 2017/07/11. I want to pass this input folder as a parameter to my U-SQL script. I am not using ADF. I don't want to generate the current date from within the U-SQL script, as I am not sure the input folder is for the current date. How can I do this effectively?
One way I thought of was uploading a "done" file after the whole input folder has been uploaded to the ADLS account, with that "done" file containing the date. But I am not able to use that date to form my input data path. Please help.
Let's assume you have several CSV files in your folder structure (structured as yyyy/MM/dd) and you want to extract all the files in the folder of a specific date. You can do it in two ways (depending on whether you need exact datetime semantics or are fine with path concatenation).
First, the path concatenation example:
DECLARE EXTERNAL @folder string = "2017/07/11"; // Script parameter with default value.
// You can specify the value also with a constant-foldable expression on DateTime.Now.
DECLARE @path string = "/constantpath/" + @folder + "/{*.csv}";
@data = EXTRACT I int, s string // or whatever your schema is...
        FROM @path
        USING Extractors.Csv();
...
And here is the example with a file set virtual column:
DECLARE EXTERNAL @date string = "2017/07/11"; // Script parameter with default value.
// You can specify the value also with a constant-foldable expression on DateTime.Now and string serialization (I am not sure if the ADF parameter model supports DateTime values).
DECLARE @path string = "/constantpath/{date:yyyy}/{date:MM}/{date:dd}/{*.csv}";
@data = EXTRACT I int, s string // or whatever your schema is...
             , date DateTime // virtual column for the date pattern
        FROM @path
        USING Extractors.Csv();
// Now apply the requested filter to reduce the files to the requested set
@data = SELECT * FROM @data WHERE date == DateTime.Parse(@date);
...
In both cases, you pass the parameter via the ADF parameterization model and you can decide to wrap the code into a U-SQL stored procedure or TVF as suggested by Bob.
I'm currently creating a flat file export for one of our clients. I've managed to get the file in the format they want, and I'm trying to find the easiest way of creating a dynamic file name. I've got the date and the path etc. in variables, but they want a count in the file name. For example:
File name 1: TDY_11-02-2013_{1}_T1.txt, where {} is the count. So next week's file would be TDY_17-02-2013_{2}_T1.txt.
I can't see an easy way of doing this! Any ideas?
EDIT:
In my first answer, I thought you meant the count of values returned by a query. My bad!
There are two ways to achieve this: you could loop through the destination folder, select the last file by date, get its number and increase it by 1, which sounds like a lot of trouble. Why not keep a simple log table in the DB with the last execution date and ID, and then compose your file name based on the last row of that table?
Where exactly is your problem?
You can make a dynamic file name using an expression on the connection manager.
For the count, you can use a Row Count component inside your data flow to assign the result to a variable and use that variable in your expression.
Use a Script Task to read the number inside the curly braces of the existing file names and store it in a variable.
Create a variable (FileNo, of type Int32) which stores the number for the file.
Pseudo code:
string name = string.Empty;
string location = @"D:\";
/* Get the path from the connection manager, like the code below,
   instead of hard-coding D:\ as above:
   string flatFileConn = (string)Dts.Connections["Yourfile"].AcquireConnection(null);
*/
string pattern = @"\{([0-9]+)\}"; // matches the number inside the curly braces

foreach (string s in Directory.GetFiles(location, "*.txt"))
{
    name = Path.GetFileNameWithoutExtension(s);
    Match match = Regex.Match(name, pattern);
    if (match.Success)
    {
        Dts.Variables["User::FileNo"].Value = int.Parse(match.Groups[1].Value) + 1;
    }
}
Now, once you have the value, use it in your file name expression on the connection manager:
@[User::FilePath] + @[User::FileName]
+ "_{" + (DT_STR,10,1252) @[User::FileNo] + "}_T1.txt"
I have two SQLite .db files. I'd like to copy the contents of a column in a table from one db file to the other.
For example:
I have the model Information in a db file called new.db:
class Information(models.Model):
    info_id = models.AutoField(primary_key=True)
    info_name = models.CharField(max_length=50)
and the following Information model in a db file called old.db:
class Information(models.Model):
    info_id = models.AutoField(primary_key=True)
    info_type = models.CharField(max_length=50)
    info_name = models.CharField(max_length=50)
I'd like to copy all the data in column info_id and info_name from old.db to info_id and info_name in new.db.
I was thinking something like:
manage.py dbshell
then
INSERT INTO "new.Information" ("info_id", "info_name")
SELECT "info_id", "info_name"
FROM "old.Information";
This doesn't seem to be working. It says the new.Information table does not exist... any ideas?
You'd need to switch your database URL in your settings file to db2 and run syncdb to create the new tables. After that the easiest thing to do imo would be to switch back to db1 and run ./manage.py dumpdata myapp > data.json, followed by another switch to db2 where you can run ./manage.py loaddata data.json.
Afterwards, you can drop the data you don't need from db2.
Edit: Another approach would be to use the ATTACH function from sqlite. First, I recommend you do the first step above (change database settings and use syncdb to create the tables), then you can switch back and do this:
./manage.py dbshell
> ATTACH DATABASE 'new.db' AS newdb;
> INSERT INTO newdb.Information SELECT * FROM Information;
The file dumped from old.db contains the info_type field, which is not in the new Information model. This will make loaddata fail, since it checks every field loaded from the JSON file. You could comment out the info_type line in the old model before dumping.
The ATTACH way mentioned by Alex is easier and great, but note one tiny detail:
INSERT INTO newdb.Information SELECT * FROM Information;
there must be no parentheses around the SELECT; SQLite does not accept them. See http://sqlite.org/lang_insert.html
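For completeness, a minimal Python sketch of the same ATTACH approach using the sqlite3 module directly; it copies only the two shared columns, which also sidesteps the extra info_type column. The bare table name Information follows the statements above; with Django's default naming it would be something like appname_information, so adjust it to your actual schema:

import sqlite3

conn = sqlite3.connect("new.db")  # the target database
conn.execute("ATTACH DATABASE 'old.db' AS olddb")
conn.execute(
    "INSERT INTO Information (info_id, info_name) "
    "SELECT info_id, info_name FROM olddb.Information"
)
conn.commit()
conn.close()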
If you are performing a migration, have you tried South?