How to convert a TStringDynArray to a TStringList - vcl

I'm using TDirectory::GetFiles() to get a list of files (obviously).
The result is stored in a TStringDynArray and I want to transfer it to a TStringList, for the sole purpose of using the IndexOf() member to see if a string is present in the list or not.
Any solution that will let me know whether a certain string is present in the list of files returned from TDirectory::GetFiles() will do fine. Although, it would be interesting to know how to convert the TStringDynArray.
TStringDynArray DynFiles = TDirectory::GetFiles("Foo path");
System::Classes::TStringList *Files = new System::Classes::TStringList;
Files->Assign(DynFiles); // I know this is wrong, but it illustrates what I want to do.
if (Files->IndexOf("Bar") != -1) { // <---- This is my goal, to find "Bar" in the list of files.
}

TStringList and TStringDynArray do not know anything about each other, so you will have to copy the strings manually:
TStringDynArray DynFiles = TDirectory::GetFiles("Foo path");
System::Classes::TStringList *Files = new System::Classes::TStringList;
for (int I = DynFiles.Low; I <= DynFiles.High; ++I)
    Files->Add(DynFiles[I]);
if (Files->IndexOf("Bar") != -1)
{
    //...
}
delete Files;
Since you have to manually loop through the array anyway, you can get rid of the TStringList:
TStringDynArray DynFiles = TDirectory::GetFiles("Foo path");
for (int I = DynFiles.Low; I <= DynFiles.High; ++I)
{
    if (DynFiles[I] == "Bar")
    {
        //...
        break;
    }
}
But, if you are only interested in checking for the existence of a specific file, look at TFile::Exists() instead, or even Sysutils::FileExists().
if (TFile::Exists("Foo path\\Bar"))
{
//...
}
if (FileExists("Foo path\\Bar"))
{
//...
}
* personally, I hate that the IOUtils unit uses dynamic arrays for lists. They are slow, inefficient, and do not integrate well with the rest of the RTL. But that is just my opinion.

TStrings knows TStringDynArray well enough to provide a member AddStrings() that accepts it:
Files->AddStrings(TDirectory::GetFiles("Foo path"));
will do the job.
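Combined with the IndexOf() check from the question, a minimal sketch of that approach could look like this (note that GetFiles() returns full paths, so the string searched for below is illustrative only):
TStringList *Files = new TStringList;
Files->AddStrings(TDirectory::GetFiles("Foo path"));
// GetFiles() returns full paths, so match against the full path, not just the file name
if (Files->IndexOf("Foo path\\Bar") != -1)
{
    //...
}
delete Files;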

Related

mark processed JSON files in forkJoin

While there are certain pieces of code missing, the core logic is that a JSON file should be marked as 'processed' and not be processed again. If a call to assets/b.json is made, I need to cache that the file was visited before, so no second HTTP call is made to the same file. Right now, a file is called multiple times. Each JSON has a path identification:
private touchedJSONFiles = {};

private setJSONAsTouched(path) {
    this.touchedJSONFiles[path] = true;
}

processJSONFiles() {
    // do some logic to build the JSON files that need to be checked
    let files = []; // contains paths only, such as abc/abc.json, 34.json
    let urls = [];  // which urls to process, if any
    files.map(file => {
        if (this.touchedJSONFiles[file] == undefined) {
            urls.push(file);
        }
    });
    // do we have files to process?
    if (urls.length > 0) {
        forkJoin(urls).subscribe(results => {
            const len = urls.length;
            for (let k = 0; k < len; k++) {
                this.setJSONAsTouched(files[k]);
            }
        });
    }
}
Given the method could be called from anywhere, I want to make it as reusable as possible. The current issue I am having is that a JSON file is processed more than once, even though it does get touched and the setJSONAsTouched() method is called correctly. While I am not sure, it looks like I need to find a way for forkJoin to finish before the method resolves.
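A minimal sketch of one possible approach (this assumes Angular's HttpClient is injected as this.http, and that it is acceptable to mark a file as touched as soon as its request is queued rather than only after it completes):
import { forkJoin, of, Observable } from 'rxjs';
import { shareReplay } from 'rxjs/operators';

// Inside the same class as touchedJSONFiles / setJSONAsTouched().
// Files are marked as touched *before* their requests are fired, so a second
// call made while requests are still in flight will not queue them again.
processJSONFiles(files: string[]): Observable<any[]> {
    const urls = files.filter(file => !this.touchedJSONFiles[file]);
    urls.forEach(file => this.setJSONAsTouched(file));

    if (urls.length === 0) {
        return of([]);
    }
    // forkJoin waits for every request to complete; shareReplay lets several
    // subscribers reuse the same in-flight results instead of re-requesting.
    return forkJoin(urls.map(url => this.http.get(url))).pipe(shareReplay(1));
}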

Lodash functions are "type" sensitives

I've been using the latest version of lodash for quite some time and I like it. I have one question though.
I noticed lodash functions are "type" sensitive:
_.find(users, {'age': 1}); will not work well if 1 is "1"
_.filter(users, {'age': "36"}); will not work if "36" is 36
Question
Is there a way to make lodash able to filter or find objects without taking the type into account?
_.find(users, {'age': 1}) would then return all objects whose age is a string or a number equal to 1.
It's because its comparison uses === when you pass a condition object; however, you can always pass a callback for your kind of checking.
For your purpose:
_.filter(users, function(user){ return user.age == 36; });
It is a plain and simple way of finding, even in native JavaScript. However, if you really want the benefit of not writing callback code every time you have an object literal as query data, you can write a function which will convert an object to its corresponding callback.
function convertToFilterCallback(obj) {
    var keys = Object.keys(obj);
    return function(each) {
        for (var idx = 0; idx < keys.length; idx++) {
            var key = keys[idx];
            if (obj[key] != each[key]) {
                return false;
            }
        }
        return true;
    };
}
and then use it like,
_.filter(users, convertToFilterCallback({..<your object literal>..}));
However, if you are doing so, you could also just use the native find or filter methods, since this is not specifically an advantage of lodash.
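For reference, the native equivalent with loose equality (==), so that 36 and "36" both match, is just:
var result = users.filter(function(user) {
    return user.age == 36;
});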

How to sort through a dictionary (a real world dictionary) that is in a .csv file?

I haven't read enough theory or had enough practice in CS, but there must be a simpler, faster way to look up data from a file. I'm working with a literal, real-world dictionary .csv file, and I'm wondering how I can speed up looking up each word. No doubt going through the whole list for every word does not make sense; splitting the file into a-z order, and only looking there for each word, makes sense.
But what else? Should I learn SQL or something and try to convert the text database into an SQL database? Are there methods in SQL that would enable me to do what I wish? Please give me ideas!
SQLite sounds like a good fit for this task.
Create a table, import your csv file, create an index and you're done.
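For example, assuming a two-column word/definition csv, the whole setup in the sqlite3 shell could be roughly:
CREATE TABLE dictionary (word TEXT, definition TEXT);  -- one row per dictionary entry
.mode csv
.import dictionary.csv dictionary
CREATE INDEX idx_word ON dictionary (word);            -- makes the word lookup fast
SELECT definition FROM dictionary WHERE word = 'aardvark';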
I just did this using interop with a moderate-size .csv file given to me by a supply company. It worked well, but still incurs a considerable delay due to the cumbersome interop/COM layer.
// assumes: using excel = Microsoft.Office.Interop.Excel;
class Excel
{
    private excel.Application application;
    private excel.Workbook excelWorkBook;
    protected const string WORD_POSITION = "A"; // whichever column the word is located in when loaded on the Excel spreadsheet.
    protected const string DEFINITION_POSITION = "B"; // whichever column the definition is loaded into on the Excel spreadsheet.
    Dictionary<string, string> myDictionary = new Dictionary<string, string>();

    public Excel(string path) // where path is the file name
    {
        try
        {
            application = new excel.Application();
            excelWorkBook = application.Workbooks.Add(path);
            int row = 1;
            while (application.Cells[++row, WORD_POSITION].Value != null)
            {
                myDictionary[GetValue(row, WORD_POSITION)] = GetValue(row, DEFINITION_POSITION);
            }
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.ToString());
        }
        finally
        {
            excelWorkBook.Close();
            application.Quit();
        }
    }

    private string GetValue(int row, string columnName)
    {
        string returnValue = application.Cells[row, columnName].Value2;
        if (returnValue == null) return string.Empty;
        return returnValue;
    }
}
Create a new SQL database, import the csv into a new table, place an index on the column that stores the word values, then search against that table. That is the approach I would take.

Thingworx: Customize `GetImplementingThings` service

I am new to ThingWorx and I want to get some practical flavour of implementing services with this example.
I have the following data model:
Thing 'Car' has Thing 'Sensor' (InfoTable)
I want a service on CarTemplate that will return all implementing Cars, and instead of the Sensor object it will return the Sensor's 'name' property.
What I have now:
"Car1Name" | SensorObject
What I want:
"Car1Name" | "Accelerator1Name"
Please help me make this happen.
There are no "static" services on ThingTemplates, so if you want to retrieve all implementing Things of a ThingTemplate with property values, you should build a Thing Helper.
What's a Thing Helper? It's another Thing; call it whatever you want, let's say CarHelpers, which has a service called GetCarsWithSensors that does a ThingTemplates["ThingTemplateName"].GetImplementingThings(), or a GetImplementingThingsWithData, and returns the desired InfoTable.
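A minimal sketch of such a helper service's body (assuming the template is named CarTemplate and the service's result is declared as an InfoTable):
// Returns one row per implementing Thing, with its current property values included
var result = ThingTemplates["CarTemplate"].GetImplementingThingsWithData();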
Carles' answer is valid, but I would avoid using QueryImplementingThingsWithData.
The problem with QueryImplementingThingsWithData is that ThingWorx will check visibility, then security, for every single property on every single implementing Thing. This is fine if you are running as a user in the Administrators group, but once you have a lot of UserGroups and OrganizationalUnits this will slow down, A LOT.
Instead do something like this: (You'll need to create a DataShape and set that as your service return datashape)
// result infotable, built from your CarDataShape
var result = Resources["InfoTableFunctions"].CreateInfoTableFromDataShape({
    infoTableName: "InfoTable",
    dataShapeName: "CarDataShape"
});
var myThings = ThingTemplates["CarTemplate"].QueryImplementingThings();
for (var i = 0; i < myThings.getRowCount(); i++) {
    var myCar = Things[myThings.rows[i].name];
    for (var j = 0; j < myCar.sensorProperty.getRowCount(); j++) {
        var newRow = {};
        newRow.name = myCar.name;
        newRow.sensor = myCar.sensorProperty.rows[j].sensorName;
        result.AddRow(newRow);
    }
}

How do I read a large file from disk to database without running out of memory

I feel embarrassed to ask this question as I feel like I should already know. However, given that I don't... I want to know how to read large files from disk into a database without getting an OutOfMemory exception. Specifically, I need to load CSV (or really tab-delimited) files.
I am experimenting with CSVReader and specifically this code sample, but I'm sure I'm doing it wrong. Some of their other code samples show how you can read streaming files of any size, which is pretty much what I want (only I need to read from disk), but I don't know what type of IDataReader I could create to allow this.
I am reading directly from disk, and my attempt to ensure I don't ever run out of memory by reading too much data at once is below. I can't help thinking that I should be able to use a BufferedFileReader or something similar where I can point to the location of the file and specify a buffer size, and since CsvDataReader expects an IDataReader as its first parameter, it could just use that. Please show me the error of my ways, let me be rid of my GetData method with its arbitrary file chunking mechanism, and help me out with this basic problem.
private void button3_Click(object sender, EventArgs e)
{
    totalNumberOfLinesInFile = GetNumberOfRecordsInFile();
    totalNumberOfLinesProcessed = 0;
    while (totalNumberOfLinesProcessed < totalNumberOfLinesInFile)
    {
        TextReader tr = GetData();
        using (CsvDataReader csvData = new CsvDataReader(tr, '\t'))
        {
            csvData.Settings.HasHeaders = false;
            csvData.Settings.SkipEmptyRecords = true;
            csvData.Settings.TrimWhitespace = true;
            for (int i = 0; i < 30; i++) // known number of columns for testing purposes
            {
                csvData.Columns.Add("varchar");
            }
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(@"Data Source=XPDEVVM\XPDEV;Initial Catalog=MyTest;Integrated Security=SSPI;"))
            {
                bulkCopy.DestinationTableName = "work.test";
                for (int i = 0; i < 30; i++)
                {
                    bulkCopy.ColumnMappings.Add(i, i); // map First to first_name
                }
                bulkCopy.WriteToServer(csvData);
            }
        }
    }
}

private TextReader GetData()
{
    StringBuilder result = new StringBuilder();
    int totalDataLines = 0;
    using (FileStream fs = new FileStream(pathToFile, FileMode.Open, System.IO.FileAccess.Read, FileShare.ReadWrite))
    {
        using (StreamReader sr = new StreamReader(fs))
        {
            string line = string.Empty;
            while ((line = sr.ReadLine()) != null)
            {
                if (line.StartsWith("D\t"))
                {
                    totalDataLines++;
                    if (totalDataLines < 100000) // Arbitrary method of restricting how much data is read at once.
                    {
                        result.AppendLine(line);
                    }
                }
            }
        }
    }
    totalNumberOfLinesProcessed += totalDataLines;
    return new StringReader(result.ToString());
}
Actually, your code reads all the data from the file and keeps it in the TextReader (in memory). Then you read data from the TextReader to save to the server.
If the data is very big, the amount of data held in the TextReader causes the out of memory condition. Please try this way:
1) Read data (each line) from the file.
2) Then insert each line into the server.
The out of memory problem will be solved, because only one record is in memory while processing.
Pseudo code
begin tran
while (data = FileReader.ReadLine())
{
    insert into Table [col0, col1, etc] values (data[0], data[1], etc)
}
end tran
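A rough C# sketch of that line-by-line approach, assuming a tab-delimited file and using the table name from the question with placeholder column names:
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (SqlTransaction tran = connection.BeginTransaction())
    using (var reader = new StreamReader(pathToFile))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split('\t');
            using (var cmd = new SqlCommand(
                "INSERT INTO work.test (col0, col1) VALUES (@c0, @c1)", connection, tran))
            {
                cmd.Parameters.AddWithValue("@c0", fields[0]);
                cmd.Parameters.AddWithValue("@c1", fields[1]);
                cmd.ExecuteNonQuery(); // only the current line is held in memory
            }
        }
        tran.Commit();
    }
}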
Probably not the answer you're looking for but this is what BULK INSERT was designed for.
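A minimal T-SQL sketch (the file path is a placeholder; the destination table name comes from the question's code):
BULK INSERT work.test
FROM 'C:\data\input.txt' -- placeholder path to the tab-delimited file
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');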
I would just add using a BufferedFileReader with the readLine method, doing exactly what is described above.
Basically, understand the responsibilities here:
BufferedFileReader is the class reading data from the file (buffer-wise).
There should be a LineReader too.
CSVReader is a util class for reading the data, assuming that it is in the correct format.
SqlBulkCopy you are anyway using.
Second option:
You can go to the import facility of the database directly. If the format of the file is correct, and the whole point of the program is only this, that would be faster too.
I think you may have a red herring with the size of the data. Every time I come across this problem, it's not the size of the data but the amount of objects created when looping over the data.
Look in your while loop adding records to the db within the method button3_Click(object sender, EventArgs e):
TextReader tr = GetData();
using (CsvDataReader csvData = new CsvDataReader(tr, '\t'))
Here you declare and instantiate two objects each iteration - meaning for each chunk of file you read you will instantiate 200,000 objects; the garbage collector will not keep up.
Why not declare the objects outside of the while loop?
TextReader tr = null;
CsvDataReader csvData = null;
This way, the gc will stand half a chance. You could prove the difference by benchmarking the while loop; you will no doubt notice a huge performance degradation after you have created just a couple of thousand objects.
pseudo code:
while (!EOF) {
    while (chosenRecords.size() < WRITE_BUFFER_LIST_SIZE) {
        MyRecord record = chooseOrSkipRecord(file.readln());
        if (record != null) {
            chosenRecords.add(record)
        }
    }
    insertRecords(chosenRecords) // <== writes data and clears the list
}
WRITE_BUFFER_LIST_SIZE is just a constant that you set... bigger means bigger batches and smaller means smaller batches. A size of 1 is RBAR :).
If your operation is big enough that failing partway through is a realistic possibility, or if failing partway through could cost someone a non-trivial amount of money, you probably want to also write to a second table the total number of records processed so far from the file (including the ones you skipped) as part of the same transaction so that you can pick up where you left off in the event of partial completion.
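As a rough sketch of that idea (the load_progress table and its columns are hypothetical, invented only for illustration), the progress update sits inside the same transaction as the batch of records:
BEGIN TRANSACTION;

-- insert the current batch of records (columns are placeholders)
INSERT INTO work.test (col0, col1) VALUES ('D', 'value');

-- record how far into the file we got, in the same transaction, so a restart
-- can skip the lines that were already loaded (load_progress is hypothetical)
UPDATE load_progress
SET lines_processed = 100000
WHERE file_name = 'input.txt';

COMMIT TRANSACTION;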
Instead of reading csv rows one by one and inserting into the db one by one, I suggest reading a chunk and inserting it into the database. Repeat this process until the entire file has been read.
You can buffer in memory, say, 1000 csv rows at a time, then insert them into the database.
int MAX_BUFFERED = 1000;
int counter = 0;
List<List<String>> bufferedRows = new ArrayList<>();
while (scanner.hasNextLine()) {
    List<String> rowEntries = getData(scanner.nextLine());
    bufferedRows.add(rowEntries);
    counter++;
    if (counter == MAX_BUFFERED) {
        // INSERT INTO DATABASE:
        // append all buffered rows to a string buffer and create your SQL INSERT statement
        bufferedRows.clear(); // remove data so it can be GCed when GC kicks in
        counter = 0;
    }
}
// insert any remaining buffered rows (fewer than MAX_BUFFERED) after the loop