Select latest date from string array - VB.NET

Below is the array of strings, i.e. the names of the files and folders I will get in the array. From this array I need to select the newest macro file: among all the strings that end with xlsm, I will select the one containing 20200817_W.xlsm, which is the latest file kept.
Edit:
For a minimal reproduction of the problem, here is a string array like the one below:
{IOH Bot Files, Archive, IOH_AllPlants_BI_20200817_W.xlsm, IOH_AllPlants_BI_20200817_W.xlsm, ... }
From this array I need to choose IOH_AllPlants_BI_20200817_W.xlsm, because this string has a date component in it and it is the latest in the available list of strings.

You don't even need LINQ for this one, because of the regularity of the data:
Array.Sort(arr)
Dim latestOne = arr(arr.Length - 1)
Perhaps we should ensure only xlsm files of the right name are considered:
Dim arr2 = Array.FindAll(arr, Function(x) x.StartsWith("IOH_AllPlants_BI_") AndAlso x.EndsWith("xlsm"))
Array.Sort(arr2)
Dim latestOne = arr2(arr2.Length - 1)
We could use LINQ, and (keeping our "only matching names" logic) rather than using an expensive sort, just ask for the Max:
Dim onlyIOHXLSMFiles = arr.Where(Function(x) x.StartsWith("IOH_AllPlants_BI_") AndAlso x.EndsWith("xlsm"))
Dim latestOne = onlyIOHXLSMFiles.Max()
We don't need to parse this date because it's in yyyyMMdd format, so it sorts correctly as a string. Because the elements are plain strings, Max alone is enough, and it's more efficient than the typical "OrderBy/First" approach because it makes a single pass instead of sorting.
If the list were not of a simple type but of, e.g., Person, and you wanted the most recently born Person (rather than just their birthdate, which is what Max would give you), you could:
Dim lastOne = personArr.OrderBy(Function(p) p.Birthdate).Last()
I use OrderBy/Last rather than OrderByDescending/First because it's fewer characters to type for the same effect.
All these code samples (with the exception of the last one) make use of an array arr created like:
Dim arr = {"IOH Bot Files", "Archive", "IOH_AllPlants_BI_20200817_W.xlsm", "IOH_AllPlants_BI_20200817_W.xlsm", ... }
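For comparison, the same idea is sketched below in Java rather than VB.NET: filter by the assumed prefix and suffix, then take the maximum, which is valid only because the fixed-width yyyyMMdd date sorts correctly as a string. The `Person` type and its `birthdate` field are hypothetical, standing in for the object case where you want the whole element rather than just its key.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

class LatestMacroFile {
    // Assumed naming convention, taken from the file names in the question.
    static final String PREFIX = "IOH_AllPlants_BI_";
    static final String SUFFIX = ".xlsm";

    // Keep matching names only, then take the lexicographic maximum.
    // Works because the embedded date is fixed-width yyyyMMdd, so string order equals date order.
    static Optional<String> latest(String[] names) {
        return Arrays.stream(names)
                .filter(n -> n.startsWith(PREFIX) && n.endsWith(SUFFIX))
                .max(Comparator.naturalOrder());
    }

    // Hypothetical Person type for the "whole object with the largest key" case.
    static final class Person {
        final String name;
        final LocalDate birthdate;

        Person(String name, LocalDate birthdate) {
            this.name = name;
            this.birthdate = birthdate;
        }
    }

    // Returns the Person itself (not just the date), like OrderBy(...).Last().
    static Optional<Person> mostRecentlyBorn(Person[] people) {
        return Arrays.stream(people).max(Comparator.comparing((Person p) -> p.birthdate));
    }

    public static void main(String[] args) {
        String[] arr = {"IOH Bot Files", "Archive",
                "IOH_AllPlants_BI_20200810_W.xlsm", "IOH_AllPlants_BI_20200817_W.xlsm"};
        System.out.println(latest(arr).orElse("none")); // prints IOH_AllPlants_BI_20200817_W.xlsm
    }
}
```

`Stream.max` makes a single pass over the data, which is the same reason Max beats a full sort here.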

See the sample logic below, which returns what you need; you can adapt it to your requirements. Read the file names into an array and apply the logic below.
using System;
using System.Linq;

namespace SampleConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] data = new string[] {
                "IOH_AllPlants_BI_20200810_W.xlsm",
                "IOH_AllPlants_BI_20200803_W.xlsm",
                "IOH_AllPlants_BI_20200727_W.xlsm",
                "IOH_AllPlants_BI_20200720_W.xlsm",
                "IOH_AllPlants_BI_20200817_W.xlsm",
                "IOH_AllPlants_BI_20200713_W.xlsm",
                "IOH_AllPlants_BI_20200706_W.xlsm"
            };
            // Split each name on '_' and take the 4th token (the yyyyMMdd date),
            // then sort descending and take the first one, i.e. the latest date.
            var result = data.Select(s => s.Split('_')).Select(x => x[3]).OrderByDescending(x => x).First();
            Console.WriteLine(result); // prints 20200817
        }
    }
}
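If you would rather not rely on string ordering, a minimal sketch (in Java, assuming the date is always the fourth `_`-separated token, as in the sample data) parses that token with `DateTimeFormatter.BASIC_ISO_DATE` and returns the whole file name rather than only the date. Names that do not fit the pattern, such as folder names, are skipped instead of causing an index error.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

class LatestByParsedDate {
    // Parses the 4th '_'-separated token as yyyyMMdd; names that don't fit (e.g. "Archive") yield empty.
    static Optional<LocalDate> dateOf(String fileName) {
        String[] parts = fileName.split("_");
        if (parts.length < 4) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(parts[3], DateTimeFormatter.BASIC_ISO_DATE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Returns the full file name with the latest parsed date, not just the date string.
    static Optional<String> latest(String[] fileNames) {
        return Arrays.stream(fileNames)
                .filter(n -> dateOf(n).isPresent())
                .max(Comparator.comparing((String n) -> dateOf(n).get()));
    }
}
```

Parsing also rejects malformed dates (for example a 7-digit token), which plain string comparison would silently accept.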

Related

How to handle text file with multiple spaces as delimiter

I have a source data set which consists of text files where the columns are separated by one or more spaces, depending on the width of the column value. The data is right-aligned, i.e. the spaces are added before the actual data.
Can I use one of the built-in extractors or do I have to implement a custom extractor?
@wBob's solution works if your row fits into a string (128 kB). Otherwise, write a custom extractor that does fixed-width extraction. Depending on what information you have about the format, you can write it by using input.Split() to split the input into rows and then splitting each row according to your whitespace rules, as shown below (a full example of the extractor pattern is here), or you could write one similar to the one described in this blog post.
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow outputrow)
{
    foreach (Stream current in input.Split(this._row_delim))
    {
        using (StreamReader streamReader = new StreamReader(current, this._encoding))
        {
            // Split on the column delimiter and drop the empty entries left by runs of spaces
            // (Where/ToArray need using System.Linq).
            string[] array = streamReader.ReadToEnd()
                .Split(new string[] { this._col_delim }, StringSplitOptions.None)
                .Where(x => !String.IsNullOrWhiteSpace(x))
                .ToArray();
            for (int i = 0; i < array.Length; i++)
            {
                // Now write your code to convert array[i] into the extract schema
            }
        }
        yield return outputrow.AsReadOnly();
    }
}
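The core per-row logic of that extractor (split on the space delimiter, then drop the empty entries that runs of spaces leave behind) is sketched below in plain Java so it can run on its own; it is not the U-SQL extractor API, and the sample line in the usage note is made up.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceRowSplitter {
    // Split one right-aligned line into fields, treating any run of spaces as one delimiter.
    // Mirrors Split(" ") followed by discarding empty or whitespace-only entries.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.split(" ")) {
            if (!token.trim().isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }
}
```

For example, `fields("     1      John     Smith")` yields `[1, John, Smith]`. An equivalent one-liner is `line.trim().split("\\s+")`, except that it returns a single empty string for a blank line.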
You could create a custom extractor or, more simply, import each line as a single column, then split and clean it using the C# methods available to you within U-SQL, such as Split and IsNullOrWhiteSpace, something like this:
(Image: my right-aligned sample data)
// Import the row as one column to be split later; NB use a delimiter that will NOT be in the import file
@input =
    EXTRACT rawString string
    FROM "/input/input.txt"
    USING Extractors.Text(delimiter : '|');

// Add a row number to the line and remove white space elements
@working =
    SELECT ROW_NUMBER() OVER() AS rn,
           new SqlArray<string>(rawString.Split(' ').Where(x => !String.IsNullOrWhiteSpace(x))) AS columns
    FROM @input;

// Prepare the output, referencing the column's position in the array
@output =
    SELECT rn,
           columns[0] AS id,
           columns[1] AS firstName,
           columns[2] AS lastName
    FROM @working;

OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);
(Image: my results)
HTH
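The script above does three things per line: number it, split it on whitespace while discarding empty pieces, and pick fields by position. A rough Java equivalent is sketched below, with a hypothetical `Row` type mirroring the `(rn, id, firstName, lastName)` output; skipping blank lines and using `null` for missing trailing fields are assumptions of this sketch, not behaviour of the U-SQL script.

```java
import java.util.ArrayList;
import java.util.List;

class PositionalRows {
    // Hypothetical row type mirroring the U-SQL output columns (rn, id, firstName, lastName).
    static final class Row {
        final long rn;
        final String id;
        final String firstName;
        final String lastName;

        Row(long rn, String id, String firstName, String lastName) {
            this.rn = rn;
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }

        // Tab-separated without quoting, similar in spirit to Outputters.Tsv(quoting : false).
        @Override
        public String toString() {
            return rn + "\t" + id + "\t" + firstName + "\t" + lastName;
        }
    }

    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        long rn = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are skipped
            }
            String[] cols = trimmed.split("\\s+"); // runs of whitespace act as one delimiter
            rn++;
            rows.add(new Row(rn, cols[0],
                    cols.length > 1 ? cols[1] : null, // assumption: missing fields become null
                    cols.length > 2 ? cols[2] : null));
        }
        return rows;
    }
}
```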

Calling a function within Entity Framework Select

A property from my object (oJobs) is as follows:
Private _brandlist As List(Of DAL.Brand)
Public Property PostBrandList() As List(Of DAL.Brand)
Get
Return _brandlist
End Get
Set(ByVal value As List(Of DAL.Brand))
_brandlist = value
End Set
End Property
In the database, the brand list is stored as a comma-separated string, e.g. the column 'brands' can hold the string '3,45,2', where each number is the id of a brand stored in another table.
My select query is as follows:
Dim jobposts As List(Of oJobs) = From j In db.JobPostings
Select New oJobs With { 'hiding all others for code brevity
.PostBrandList = 'problem is here'
}
Since j.BrandList returns a string, I need to split that string and, for each number, run another query, to finally assign a List(Of DAL.Brand) to .PostBrandList.
For those who might ask "what have you tried?",
I ran the query, then used a For Each loop to add the list of brands afterwards - this succeeded but is not optimal.
I coded a function that takes the list as a parameter and returns a separate list of objects - very silly.
Also, I am not allowed to normalize the DB :(
Not tested and might need some tweaking, but here's one idea. You will also need to change your property to an IEnumerable rather than a List. Because the second LINQ query is embedded within the first, I believe it should all execute as one query, but you should check to make sure.
Dim jobposts As List(Of oJobs) = From j In db.JobPostings
Select New oJobs With { 'hiding all others for code brevity
.PostBrandList = From b In db.Brands Where j.Brands = b.ID Or j.Brands.StartsWith(b.ID & ",") Or j.Brands.EndsWith("," & b.ID) Or j.Brands.Contains("," & b.ID & ",") Select b
}
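The `Where` clause above matches an id inside the comma-separated string with four patterns (the whole value, the first item, the last item, or a middle item), so that, for example, id 4 does not match "45". The predicate is sketched in Java below alongside a split-based check for comparison; the pattern version assumes there are no spaces around the commas, which the split version tolerates.

```java
class CsvIdMatch {
    // The four-pattern test: whole value, first item, last item, or middle item.
    static boolean matchesByPattern(String csv, int id) {
        String s = Integer.toString(id);
        return csv.equals(s)
                || csv.startsWith(s + ",")
                || csv.endsWith("," + s)
                || csv.contains("," + s + ",");
    }

    // Reference check: split on commas and compare each trimmed token.
    static boolean matchesBySplit(String csv, int id) {
        String s = Integer.toString(id);
        for (String token : csv.split(",")) {
            if (token.trim().equals(s)) {
                return true;
            }
        }
        return false;
    }
}
```

A naive `csv.contains(s)` would wrongly report id 4 as present in "3,45,2"; the comma anchors are what prevent that.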
In C# you can use:
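var jobposts = db.JobPostings
    .Select(x => new { x.BrandList })
    .ToList() // Materialize first: String.Split cannot be translated to SQL
    .Select(x =>
    {
        // Parse the comma-separated ids in memory
        var ids = x.BrandList.Split(',').Select(y => int.Parse(y.Trim())).ToList();
        return new oJobs
        {
            // Contains on a local list is translated to an IN (...) clause
            PostBrandList = db.Brands.Where(z => ids.Contains(z.Id)).ToList()
        };
    })
    .ToList();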
Note that you must materialize the entities before calling String.Split, because LINQ to Entities cannot translate String.Split to SQL.
I don't know how to translate that to VB.NET.
Of course, this causes the SELECT N+1 problem (one extra query per job posting), because you can't use a join.
If you can't normalize the table, my other suggestion is to create an indexed view (SQL Server), so you can use a join and improve performance.
Indexed view https://msdn.microsoft.com/en-us/library/ms191432.aspx
You could try it with the Let statement:
' li in the LINQ statement represents one string value from the BrandList list;
' the field initialization of the class DAL.Brand goes inside the braces.
Dim jobposts As List(Of oJobs) = From j In db.JobPostings
    Let postBrandElemsList = j.BrandList.Split(","c).Select(Function(li) New DAL.Brand With { ... }).ToList()
    Select New oJobs With
    {
        .PostBrandList = postBrandElemsList
    }
Sorry for the probably imperfect VB.NET syntax; you should check it when implementing this in your code.
Maybe you would just want to use the Split function to turn the brands column into an array and iterate through the result, using the Find function to retrieve the brand objects?
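The split-and-look-up idea can be sketched as follows, in Java, with a hypothetical `Brand` class standing in for `DAL.Brand`. It swaps the linear `Find` for a hash map built once, so each id resolves in constant time, and it assumes the brands table is small enough to load into memory with one query, which also avoids a per-posting brand query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BrandLookup {
    // Hypothetical stand-in for DAL.Brand.
    static final class Brand {
        final int id;
        final String name;

        Brand(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Index all brands once by id.
    static Map<Integer, Brand> index(List<Brand> brands) {
        Map<Integer, Brand> byId = new HashMap<>();
        for (Brand b : brands) {
            byId.put(b.id, b);
        }
        return byId;
    }

    // Split the comma-separated column value and resolve each id; unknown ids are skipped.
    static List<Brand> resolve(String brandsColumn, Map<Integer, Brand> byId) {
        List<Brand> result = new ArrayList<>();
        if (brandsColumn == null || brandsColumn.trim().isEmpty()) {
            return result;
        }
        for (String token : brandsColumn.split(",")) {
            Brand b = byId.get(Integer.parseInt(token.trim()));
            if (b != null) {
                result.add(b);
            }
        }
        return result;
    }
}
```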

Spark JavaPairRDD iteration

How can I iterate over a JavaPairRDD? I have done a groupBy and got back an RDD as shown below: a JavaPairRDD whose key is a Tuple7 of Strings and whose value is a List of objects.
Now I have to iterate over this RDD and do some calculations, like FOREACH in Pig.
Basically, I would like to iterate over each key and its list of values, do some operations, and then return a JavaPairRDD.
JavaPairRDD<Tuple7<String, String, String, String, String, String, String>, List<Records>> sizes =
    piTagRecordData.groupBy(new Function<Records, Tuple7<String, String, String, String, String, String, String>>() {
        private static final long serialVersionUID = 2885738359644652208L;

        @Override
        public Tuple7<String, String, String, String, String, String, String> call(Records row) throws Exception {
            Tuple7<String, String, String, String, String, String, String> compositeKey =
                new Tuple7<String, String, String, String, String, String, String>(
                    row.getAsset_attribute_id(), row.getDate_time_value(), row.getOperation(),
                    row.getPi_tag_count(), row.getAsset_id(), row.getAttr_name(), row.getCalculation_type());
            return compositeKey;
        }
    });
After this, I want to perform an operation FOR EACH member of sizes (the JavaPairRDD), something like:
rejected_records = FOREACH sizes GENERATE FLATTEN(Java function on the List of Records based on the group key);
I am using Spark 0.9.0
Even though you are talking about "FOR EACH", it really sounds like you want the flatMap operation, since you want to produce new values and flatten them. This is available for Java RDDs, including a JavaPairRDD.
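To illustrate the flatMap shape described here, the sketch below uses `java.util.stream` rather than Spark, so it runs without a cluster: each group (a key plus its list of values) produces zero or more results, and flatMap concatenates them into one flat list, much like Pig's `FOREACH ... GENERATE FLATTEN(...)`. The `Map<String, List<Integer>>` input and the "negative values are rejected" rule are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlatMapSketch {
    // Each (key, values) group emits zero or more "key:value" strings for rejected values;
    // flatMap flattens the per-group results into one list.
    static List<String> rejected(Map<String, List<Integer>> groups) {
        return groups.entrySet().stream()
                .flatMap(e -> e.getValue().stream()
                        .filter(v -> v < 0) // hypothetical rejection rule
                        .map(v -> e.getKey() + ":" + v))
                .collect(Collectors.toList());
    }
}
```

The output order follows the map's iteration order, so a sorted map gives a predictable result.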
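You can use the void foreach(VoidFunction<T> f) method. Note that foreach returns nothing (it runs only for its side effects), so on its own it cannot produce a new JavaPairRDD. More info and methods: https://spark.apache.org/docs/1.1.0/api/java/org/apache/spark/api/java/JavaRDDLike.html#foreach(org.apache.spark.api.java.function.VoidFunction)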
If you want to view some values of a JavaPairRDD, I would do this:
for (Tuple2<String, String> test : pairRdd.take(10)) //or pairRdd.collect()
{
System.out.println(test._1);
System.out.println(test._2);
}
Note: this uses Tuple2<String, String> (assuming the JavaPairRDD holds strings); change the type parameters to match the data types stored in your JavaPairRDD. Also bear in mind that collect() brings the entire RDD back to the driver.

Get a value from an array based on the values of other arrays (VB.Net)

Supposed that I have two arrays:
Dim RoomName() As String = {"RoomA", "RoomB", "RoomC", "RoomD", "RoomE"}
Dim RoomType() As Integer = {1, 2, 2, 2, 1}
I want to get a value from the "RoomName" array based on a criterion on the "RoomType" array. For example, to get a "RoomName" with "RoomType = 2", the algorithm should pick a random index among those where "RoomType" is 2 (here, indexes 1 to 3 only) and return the single corresponding value.
Is there any possible ways to solve the problem using array, or is there any better ways to do this? Thank you very much for your time :)
Note: the code examples below use C#, but hopefully you can read the intent for VB.NET.
Well, a simpler way would be to have a structure/class that contained both name and type properties e.g.:
public class Room
{
public string Name { get; set; }
public int Type { get; set; }
public Room(string name, int type)
{
Name = name;
Type = type;
}
}
Then given a set of rooms you can find those of a given type using a simple linq expression:
var match = rooms.Where(r => r.Type == 2).Select(r => r.Name).ToList();
Then you can find a random entry from within the set of matching room names (see below)
However, assuming you want to stick with the parallel arrays, one way is to find the matching indexes in the type array, then look up the corresponding names, and finally pick one of them using a random function.
var matchingTypeIndexes = new List<int>();
int matchingTypeIndex = -1;
do
{
matchingTypeIndex = Array.IndexOf(roomType, 2, matchingTypeIndex + 1);
if (matchingTypeIndex > -1)
{
matchingTypeIndexes.Add(matchingTypeIndex);
}
} while (matchingTypeIndex > -1);
List<string> matchingRoomNames = matchingTypeIndexes.Select(typeIndex => roomName[typeIndex]).ToList();
Then to find a random entry of those that match (from one of the lists generated above):
var posn = new Random().Next(matchingRoomNames.Count);
Console.WriteLine(matchingRoomNames[posn]);
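Both approaches can be sketched in Java, with hypothetical names throughout: a `Room` class filtered with a stream, and the parallel-array version, which uses `IntStream.range` to collect the indexes whose type matches instead of repeated `Array.IndexOf` calls. The `Random` is passed in so it can be seeded for repeatable tests.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RandomRoom {
    // Class-based approach: name and type kept together.
    static final class Room {
        final String name;
        final int type;

        Room(String name, int type) {
            this.name = name;
            this.type = type;
        }
    }

    static List<String> namesOfType(List<Room> rooms, int type) {
        return rooms.stream()
                .filter(r -> r.type == type)
                .map(r -> r.name)
                .collect(Collectors.toList());
    }

    // Parallel-array approach: indexes where roomType matches, mapped to the names at those indexes.
    static List<String> namesOfType(String[] roomName, int[] roomType, int type) {
        return IntStream.range(0, roomType.length)
                .filter(i -> roomType[i] == type)
                .mapToObj(i -> roomName[i])
                .collect(Collectors.toList());
    }

    // Pick one candidate uniformly at random.
    static String pickRandom(List<String> candidates, Random rng) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no room of that type");
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```

With `{"RoomA", "RoomB", "RoomC", "RoomD", "RoomE"}` and `{1, 2, 2, 2, 1}`, asking for type 2 yields `[RoomB, RoomC, RoomD]`, and `pickRandom` returns one of those three.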

Partial replace on SQL image data column

This question is related to another one I posted earlier.
To recap, I need to fix an issue with an ancient legacy app where people messed up data storage by re-installing the software the wrong way.
The application stores data by saving a record in an SQL DB. Each record holds a reference to a file on disk of which the filename auto-increments.
By re-installing the app, the filename auto-increment was reset, so the DB now holds multiple unrelated records that reference the same filename, and I have two directories with files that I obviously cannot merge because of these identical filenames. The files hold no reference to the DB data, so the only course of action left is to filter the DB records on creation date and try to rename "EXED" to "IXED" or something like that.
The DB is relatively simple with one table containing a column that holds data of type "Image".
An example content of this image data is as follows:
0x3200001000000000000000200B0000000EFF00000300000031340000000070EC0100002C50000004000000C90000005D010000040000007955B63F4D01000004000000F879883E4F01000004000000BC95563E98010000040000009A99993F4A01000004000000000000004B01000004000000000000009101000004000000000000004E01000004000000721C83425101000004000000D841493F5E01000004000000898828414101000004000000F2D2BD3F4201000004000000FCA9B13F40010000040000007574204244010000040000000000204345010000040000007DD950414601000004000000000000004701000004000000000000009201000004000000000000008701000004000000D2DF13426A0100000400000000005C42740100000400000046B68F40500100000400000018E97A3F7901000004000000FB50CF3C7A01000004000000E645703F99010000040000000000E0404C010000040000008716593F8601000004000000000006439A0100000400000000008040700100000400000063D887449E01000004000000493CBA3E9C0100000400000069699D429B01000004000000DD60CA3F9D0100000400000035DE3C44B4010000040000008B5C744433000000040000003D0ABB4134000000040000000AFF7C44350000000400000093CB3942750400000400000054A69F41BA010000040000002635C64173040000040000008367C24100000080690100002B5000003101000032000010000000000000002009000000000000000100000000000000F00000000000000080080100000100000010000000540100000100000021F0AA42270000000200000010000000540100000200000021F0AA42280000000300000010000000540100000300000059C9E6432900000004000000100000005401000004000000637888442A00000005000000100000005401000005000000DFEF87442B00000006000000100000005401000006000000000000002C00000007000000100000005401000007000000000000002D00000008000000100000005401000008000000000000002D000000090000001000000054010000090000002F353D442D0000000A00000010000000540100000A00000035DE3C44340000000B00000010000000540100000B0000008B5C7444240000009D50000010000000CDCCCC3E2C513B41F65D5F3F2C51BB419E50000010000000CCBA2C3FE17C8C411553B13F83F32142000000403700000000FE0000090000004558454434386262002D50000008000000447973706E6F65008E5000000E00000056454C442052414D502033363000000000F000000000
The data is apparently binary (shown here as hex) which mostly encodes meaningless crap, but it also holds the name of the physical file (towards the end of the data field) in the filesystem that is linked to the SQL record:
??#7???????????EXED48bb?-P??????Dyspnoe??P??????VELD RAMP 360
I'm interested in the EXED part.
There is no clear regularity in the offset at which the filename appears and the filename is of variable length (so I do not know beforehand how long the substring will be).
I can call up all records with SQL like this:
SELECT COUNT(*) as "Number of EXED Files after critical date"
FROM [ZAN].[dbo].[zanu]
WHERE udata is not null
and SUBSTRING(udata, 1 , 2147483647) like '%EXED%'
and [udatum] > 0
and CONVERT(date,[udatum]) > CONVERT(date,'20100629')
What I would like to do now is replace this EXED substring with something else (e.g. IXED).
I'm unfamiliar with SQL and Googling so far has yielded very little information on my options here.
I also have no other info on the original code that generated this data/the data format/encoding/whatever...
It's a mess really.
Any help is welcome!
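Since the file name is stored as plain ASCII bytes inside the blob (`45 58 45 44` is "EXED" in the hex dump above), one option is to patch the bytes directly instead of decoding the whole blob as text. The Java sketch below replaces every occurrence of one byte pattern with another of the same length, which keeps the blob size and any internal offsets unchanged; reading the rows and writing them back to SQL Server is left out, and the sample bytes in `main` are made up.

```java
import java.nio.charset.StandardCharsets;

class BlobPatch {
    // Replace every occurrence of `from` with `to` in place; returns the number of replacements.
    // Equal lengths are required so the blob size never changes.
    static int replaceAll(byte[] data, byte[] from, byte[] to) {
        if (from.length == 0 || from.length != to.length) {
            throw new IllegalArgumentException("patterns must be non-empty and the same length");
        }
        int count = 0;
        int i = 0;
        while (i + from.length <= data.length) {
            if (regionMatches(data, i, from)) {
                System.arraycopy(to, 0, data, i, to.length);
                count++;
                i += from.length; // continue after the patched region
            } else {
                i++;
            }
        }
        return count;
    }

    private static boolean regionMatches(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] blob = {0x00, (byte) 0xFF, 'E', 'X', 'E', 'D', '4', '8', 'b', 'b'};
        int n = replaceAll(blob, "EXED".getBytes(StandardCharsets.US_ASCII),
                "IXED".getBytes(StandardCharsets.US_ASCII));
        System.out.println(n + " " + new String(blob, 2, 8, StandardCharsets.US_ASCII)); // prints 1 IXED48bb
    }
}
```

Because only the matched bytes are touched, non-text bytes such as `0xFF` elsewhere in the blob stay exactly as they were.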
An update on this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Data.Linq;
using System.Text;
using System.Data.SqlClient;
using System.Threading;

namespace ZANLinq
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                DataContext zanDB = new DataContext(@"Data Source=.\,1433;database=ZAN;Integrated Security=true");
                string strSQL = @"SELECT
                    Idnr,
                    Udatum,
                    Uzeit,
                    Unr,
                    Uart,
                    Ubediener,
                    Uzugriff,
                    Ugr,
                    Uflags,
                    Usize,
                    Udata
                FROM Zanu
                WHERE (Udata IS NOT null and SubString(Udata, 1 , 2147483647) LIKE '%EXED%')
                AND (Idnr = ' 2')";
                var zanQuery = zanDB.ExecuteQuery<Zanu>(strSQL);
                List<Zanu> list = zanQuery.ToList<Zanu>();
                // Latin-1 (code page 28591) maps every byte 0x00-0xFF to one char and back,
                // so the rest of the binary blob survives; ASCII would turn bytes above 0x7F into '?'.
                Encoding latin1 = Encoding.GetEncoding(28591);
                foreach (Zanu zanTofix in list)
                {
                    string strOriginal = latin1.GetString(zanTofix.Udata);
                    string strFixed = strOriginal.Replace("EXED", "IXED");
                    zanTofix.Udata = latin1.GetBytes(strFixed);
                }
                zanDB.SubmitChanges();
                //Console.WriteLine(zanResults.Count<Zanu>().ToString());
            }
            catch (SqlException e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }
}
It finds the records I'm interested in and I can easily manipulate the data, but the commit doesn't work. I'm stumped: there are no exceptions and no indication that the code is wrong.
Anybody have ideas?
UPDATE:
I think the above does not work because my table appears to have a composite PK (which I cannot change).
Since I could not debug this (no info anywhere, no exceptions, just a silent failure of SubmitChanges()), I decided to use another approach and abandon LINQ to SQL altogether:
// Requires: using System; using System.Data; using System.Data.SqlClient; using System.Text;
try
{
    SqlConnection thisConnection = new SqlConnection(@"Network Library=DBMSSOCN;Data Source=.\,1433;database=ZAN;Integrated Security=SSPI");
    DataSet zanDataSet = new DataSet();
    SqlDataAdapter zanDa;
    SqlCommandBuilder zanCmdBuilder;
    thisConnection.Open();
    //Initialize the SqlDataAdapter object by specifying a Select command
    //that retrieves data from the sample table.
    zanDa = new SqlDataAdapter(@"SELECT
        Idnr,
        Udatum,
        Uzeit,
        Unr,
        Uart,
        Ubediener,
        Uzugriff,
        Ugr,
        Uflags,
        Usize,
        Udata
    FROM Zanu
    WHERE (Udata IS NOT null and SubString(Udata, 1 , 2147483647) LIKE '%IXED%')
    AND (Idnr = ' 2')
    AND (Uzeit = '13:21')", thisConnection);
    //Initialize the SqlCommandBuilder object to automatically generate and initialize
    //the UpdateCommand, InsertCommand, and DeleteCommand properties of the SqlDataAdapter.
    zanCmdBuilder = new SqlCommandBuilder(zanDa);
    //Populate the DataSet by running the Fill method of the SqlDataAdapter.
    zanDa.Fill(zanDataSet, "Zanu");
    Console.WriteLine("Records that will be affected: " + zanDataSet.Tables["Zanu"].Rows.Count.ToString());
    // Latin-1 round-trips every byte value unchanged; ASCII would replace bytes above 0x7F with '?'.
    Encoding latin1 = Encoding.GetEncoding(28591);
    foreach (DataRow record in zanDataSet.Tables["Zanu"].Rows)
    {
        string strOriginal = latin1.GetString((byte[])record["Udata"]);
        string strFixed = strOriginal.Replace("IXED", "EXED");
        record["Udata"] = latin1.GetBytes(strFixed);
        //string strPostMod = latin1.GetString((byte[])record["Udata"]);
    }
    zanDa.Update(zanDataSet, "Zanu");
    thisConnection.Close();
    Console.ReadLine();
}
catch (SqlException e)
{
    Console.WriteLine(e.Message);
}
This seems to work, but any input on why the LINQ approach does not work, and on whether my second solution is efficient/optimal, is still very much appreciated.