How to add columns subsequently in a dataset using Spark within a for loop ( where for loop contains the column name) - apache-spark-sql

Here trying to add subsequently column to dataset Row, the issue coming up is last column is only visible. The columns added earlier do not persist
private static void populate(Dataset<Row> res, String[] args)
{
String[] propArr = args[0].split(","); // Eg: [abc, def, ghi]
// Dataset<Row> addColToMergedData = null;
/** Here each element is the name of the column to be inserted */
for(int i = 0; i < propArr.length; i++){
// addColToMergedData = res.withColumn(propArr[i], lit(null));
}
}

the logic in the for loop is flawed hence the issue.
you can modify the program as follows :
private static void populate(Dataset<Row> res, String[] args)
{
String[] propArr = args[0].split(","); // Eg: [abc, def, ghi]
Dataset<Row> addColToMergedData = null;
/** Here each element is the name of the column to be inserted */
for(int i = 0; i < propArr.length; i++)
{
res = res.withColumn(propArr[i], lit(null));
}
addColToMergedData = res
}

Sol:
// addColToMergedData = res.withColumn(colMap.get(propArr[i]), lit(null));
should be written as :
res = res.withColumn(colMap.get(propArr[i]), lit(null));

Here's the clean and conscious way to do this by using Scala:
val columnsList = Seq("one", "two", "three", "four", "five")
val res = columnsList.foldLeft(addColToMergedData) { case (df, colName) =>
df.withColumn(colu, lit(null))
}

Related

Automating Import of large csv files(~3gb) with C#

I am a bit new to this but my goal is to import the data from a csv file into a sql table and include additional values for each row being the file name and date. I was able to accomplish this using entity frame work and iterating through each row of the file but with the size of the files it will take too long too actually complete.
I am looking for a method to accomplish this import faster. I was looking into potentially using csvhelper with sqlbulkcopy to accomplish this but was not sure if there was a way to pass in the additional values needed for each row.
public void Process(string filePath)
{
InputFilePath = filePath;
DateTime fileDate = DateTime.Today;
string[] fPath = Directory.GetFiles(InputFilePath);
foreach (var file in fPath)
{
string fileName = Path.GetFileName(file);
char[] delimiter = new char[] { '\t' };
try
{
using (var db = new DatabaseName())
{
using (var reader = new StreamReader(file))
{
string line;
int count = 0;
int sCount = 0;
reader.ReadLine();
reader.ReadLine();
while ((line = reader.ReadLine()) != null)
{
count++;
string[] row = line.Split(delimiter);
var rowload = new ImportDestinationTable()
{
ImportCol0 = row[0],
ImportCol1 = row[1],
ImportCol2 = TryParseNullable(row[2]),
ImportCol3 = row[3],
ImportCol4 = row[4],
ImportCol5 = row[5],
IMPORT_FILE_NM = fileName,
IMPORT_DT = fileDate
};
db.ImportDestinationTable.Add(rowload);
if (count > 100)
{
db.SaveChanges();
count = 0;
}
}
db.SaveChanges();
//ReadLine();
}
}
}
static int? TryParseNullable(string val)
{
int outValue;
return int.TryParse(val, out outValue) ? (int?)outValue : null;
}
}

Creating listview using arraylist

I have 2 arrays as follows, I want to create a list view showing the details from both array together eg
Location:-------
Magnitude:-------
How can i do that?
String des = description.toString();
Matcher matcher = Pattern.compile("(?<=Location: ).*?(?= ;)").matcher(des);
List<String> list = new LinkedList<>();
while (matcher.find()) {
list.add(matcher.group());
}
String des1 = description.toString();
Matcher matcher1 = Pattern.compile("(?<=Magnitude: ).*?(?= ;)").matcher(des1);
List<String> listmagnitude = new ArrayList<>();
while (matcher1.find()) {
listmagnitude.add(matcher1.group());
}
Try this!
int i = 0;
for(String l: list) {
System.out.format("Location: %s \t Magnitude: %s\n", l, listMagnitude[i]);
i++;
}

Why doesn't my number sequence print from the 2d arraylist correctly?

I cannot get the loop to work in the buildDimArray method to store the number combinations "11+11", "11+12", "11+21", "11+22", "12+11", "12+12", "12+21", "12+22", "21+11", "21+12", "21+21", "21+22", "22+11", "22+12", "22+21", and "22+22" into the 2d arraylist with each expression going into one column of the index dimBase-1 row. The loop may work for other people, but for some reason mine isn't functioning correctly. The JVM sees the if dimBase==1 condition, but refuses to check the other conditions. The "WTF" not being printed as a result from the buildDimArray method. If dimBase=1, it prints successfully, but doesn't for the other integers. The dimBase==3 condition needs a loop eventually. The "WTF" is for illustrative purposes. I could get away with a 1d arraylist, but in the future I will likely need the 2d arraylist once the program is completed.
package jordanNumberApp;
import java.util.Scanner;
import java.util.ArrayList;
/*
* Dev Wills
* Purpose: This code contains some methods that aren't developed. This program is supposed to
* store all possible number combinations from numbers 1-dimBase for the math expression
* "##+##" into a 2d arraylist at index row dimBase-1 and the columns storing the
* individual combinations. After storing the values in the arraylist, the print method
* pours the contents in order from the arraylist as string values.
*/
public class JordanNumberSystem {
// a-d are digits, assembled as a math expression, stored in outcomeOutput, outcomeAnswer
public static int dimBase, outcomeAnswer, a, b, c, d;
public static String inputOutcome, outcomeOutput;
public static final int NUM_OF_DIMENSIONS = 9; //Eventually # combinations go up to 9
public static ArrayList<ArrayList<String>> dimBaseArray;
public static Scanner keyboard;
/*
* Constructor for JordanNumber System
* accepts no parameters
*/
public JordanNumberSystem() // Defunct constructor
{
// Declare and Initialize public variables
this.dimBase = dimBase;
this.outcomeOutput = outcomeOutput;
this.outcomeAnswer = outcomeAnswer;
}
// Set all values of variable values
public static void setAllValues()
{
// Initialize
dimBase = 1;
outcomeAnswer = 22; // variables not used for now
outcomeOutput = "1"; // variables not used for now
//a = 1;
//b = 1;
//c = 1;
//d = 1;
dimBaseArray = new ArrayList<ArrayList<String>>();
keyboard = new Scanner(System.in);
}
public static void buildDimArray(int dim)
{
dimBase = dim;
try
{
//create first row
dimBaseArray.add(dimBase-1, new ArrayList<String>());
if( dimBase == 1)
{
a = b = c = d = dimBase ;
dimBaseArray.get(0).add(a+""+b+"+"+c+""+d);
System.out.println("WTF"); // SHOWS
}
else if (dimBase == 2)
{ // dim = 2
a = b = c = d = 1 ;
System.out.println("WTF"); // doesn't show
// dimBaseArray.get(dimBase-1).add(a+""+b+"+"+c+""+d);
for( int i = 1 ; i <= dim ; i++)
a=i;
for( int j = 1 ; j <= dim ; j++)
b=j;
for( int k = 1 ; k <= dim ; k++)
c=k;
for( int l = 1 ; l <= dim ; l++)
{
d=l;
dimBaseArray.get(dim-1).add(a+""+b+"+"+c+""+d);
}
}
else if (dimBase == 3)
{
a = b = c = d = dimBase;
dimBaseArray.get(2).add(a+""+b+"+"+c+""+d);
System.out.println("WTF");
}
}catch (IndexOutOfBoundsException e)
{
System.out.println(e.getMessage());
}
}
public static void printArray(int num) // Prints the contents of the array
{ // Fixing the printing method
try
{
int i = num-1;
for( String string : dimBaseArray.get(i))
{
System.out.println(string);
System.out.println("");
}
} catch (IndexOutOfBoundsException e)
{
System.out.println(e.getMessage());
}
}
public static void main(String[] args) throws java.lang.IndexOutOfBoundsException
{
setAllValues(); // sets the initial a,b,c,d values and dimBase, initializes 2d arraylist
// Get the Dimension Base number
System.out.println("Enter Dimension Base Number. Input an integer: ");
int dimBaseInput = keyboard.nextInt(); // Receives integer
dimBase = dimBaseInput;
if( dimBase != 1 && dimBase != 2 && dimBase != 3)
{// Error checking
System.out.println("invalid Dimension Base Number should be 1 or 2 ");
System.exit(1);
}
// Build the arraylist, print, clear, exit
buildDimArray(dimBase);
printArray(dimBase);
dimBaseArray.clear();
System.exit(1);
}
}// End of class

Store cell values to Object

I'm currently new to TestNG using java. I'm trying to read the values from an excel using poi apache 4.0
public static void read2dRowExcelFile2(String filePath) throws IOException {
try {
FileInputStream fis = new FileInputStream(new File(filePath));
HSSFWorkbook wb = new HSSFWorkbook(fis);
HSSFSheet sheet = wb.getSheet("PerLocation");
Object[][] tableArr = new String[sheet.getLastRowNum() + 1][];
int arrNo1 = 0;
for (int i = 1; i <= sheet.getLastRowNum(); i++) {
Row row = sheet.getRow(i);
int arrNo2 = 0;
for (int j = 0; j < row.getLastCellNum(); j++) {
String cellValue = row.getCell(j).getStringCellValue();
System.out.println(acellValue);
//tableArr[arrNo1][arrNo2] = cellValue;
System.out.println("test");
arrNo2++;
}
arrNo1++;
}
wb.close();
} catch (Exception e) {
e.printStackTrace();
}
}
Code above displays the values on the console. My goal is to store those values to an Object. Something like [{London, Blue},{Tokyo,Yellow},{Manila,Red}] so I can pass them to a dataProvider
If I run the code above, it displays :
London
BLue
Tokyo
Yellow
Manila
Red
But If i uncomment this line :
//tableArr[arrNo1][arrNo2] = cellValue;
The output is only :
London
03-08-19 : After I enabled stacktrace, it says : java.lang.NullPointerException
which pertains to this code :
tableArr[arrNo1][arrNo2] = cellValue;
From your code,
Object[][] tableArr = new String[sheet.getLastRowNum() + 1][];
At the time of initialization of array, your are setting size of first dimension but not the second . You need to initialize array for second dimension before you access it.
Refer below example:
public static void main(String[] args) {
//read rows size
int numOfRows = 3;
String[][] tableArr = new String[numOfRows][];
for (int row = 0; row <3; row++) {
//read columns size
int numOfColsInRow = 3;
tableArr[row]=new String[numOfColsInRow];
for (int col = 0; col < 3; col++) {
String cellValue = "cell-" + row+""+col;//read cell value
tableArr[row][col] = cellValue;
}
}
for(String[] row: tableArr) {
System.out.println(Arrays.toString(row));
}
}
Running above code with generate expected output:
[cell-00, cell-01, cell-02]
[cell-10, cell-11, cell-12]
[cell-20, cell-21, cell-22]
To reproduce your problem you can try commenting line which initialize array for second dimension in the code and you will see Exception in thread "main" java.lang.NullPointerException
.
//tableArr[row]=new String[numOfColsInRow];
To avoid all such issues you also can check if any exiting TestNG data-provider extension satisfies your need.

How can I do this all in one query? Nhibernate

I have a list of Ids and I want to get all the rows back in one query. As a list of objects(So a List of Products or whatever).
I tried
public List<TableA> MyMethod(List<string> keys)
{
var query = "SELECT * FROM TableA WHERE Keys IN (:keys)";
var a = session.CreateQuery(query).SetParameter("keys", keys).List();
return a; // a is a IList but not of TableA. So what do I do now?
}
but I can't figure out how to return it as a list of objects. Is this the right way?
List<TableA> result = session.CreateQuery(query)
.SetParameterList("keys", keys)
.List<TableA>();
Howeever there could be a limitation in this query if number of ":keys" exceed more than 1000 (incase of oracle not sure with other dbs) so i would recommend to use ICriteria instead of CreateQuery- native sqls.
Do something like this,
[TestFixture]
public class ThousandIdsNHibernateQuery
{
[Test]
public void TestThousandIdsNHibernateQuery()
{
//Keys contains 1000 ids not included here.
var keys = new List<decimal>();
using (ISession session = new Session())
{
var tableCirt = session.CreateCriteria(typeof(TableA));
if (keys.Count > 1000)
{
var listsList = new List<List<decimal>>();
//Get first 1000.
var first1000List = keys.GetRange(0, 1000);
//Split next keys into 1000 chuncks.
for (int i = 1000; i < keys.Count; i++)
{
if ((i + 1)%1000 == 0)
{
var newList = new List<decimal>();
newList.AddRange(keys.GetRange(i - 999, 1000));
listsList.Add(newList);
}
}
ICriterion firstExp = Expression.In("Key", first1000List);
ICriterion postExp = null;
foreach (var list in listsList)
{
postExp = Expression.In("Key", list);
tableCirt.Add(Expression.Or(firstExp, postExp));
firstExp = postExp;
}
tableCirt.Add(postExp);
}
else
{
tableCirt.Add(Expression.In("key", keys));
}
var results = tableCirt.List<TableA>();
}
}
}