How do I pipe large result sets from a SQL query directly to CSV in LINQPad?

We are using LINQPad for our data stewards as a reporting-tool alternative to SSMS. Most of our stewards are still stuck using SQL (we're slowly transitioning some of them to LINQ, but one step at a time). However, we've come across a limitation in LINQPad that I'm not sure how to deal with: since LINQPad first pulls a query's results into memory before pushing them to the screen, we run out of memory on large result sets. Is there some way to push a SQL query in LINQPad directly to a CSV?

LINQPad has no built-in method for this (I should add it), but it's easy enough to write. Put the following into 'My Extensions':
public static class MyExtensions
{
    public static void WriteCsv<T> (this IEnumerable<T> elements, string filePath)
    {
        var fields = typeof (T).GetFields();
        var props = typeof (T).GetProperties()
            .Where (p => IsSimpleType (p.PropertyType))
            .ToArray();

        using (var writer = new StreamWriter (filePath))
        {
            string header = string.Join (",",
                fields.Select (f => f.Name).Concat (props.Select (p => p.Name)));

            // Work around a bug in Excel: a file whose first cell starts with
            // "ID" is misdetected as a SYLK file, so prefix a space.
            if (header.StartsWith ("ID")) header = " " + header;
            writer.WriteLine (header);

            foreach (var element in elements)
            {
                var values =
                    fields.Select (f => Format (f.GetValue (element))).Concat (
                    props.Select (p => Format (p.GetValue (element, null))));
                writer.WriteLine (string.Join (",", values));
            }
        }
    }

    static string Format (object value)
    {
        if (value == null) return "";

        // With DateTimes, it's safest to force a culture-insensitive format:
        if (value is DateTime) return ((DateTime)value).ToString ("s");
        if (value is DateTimeOffset) return ((DateTimeOffset)value).ToString ("s");

        string result = value.ToString();
        result = result.Replace ("\"", "\"\"");
        if (result.Contains (",") || result.Contains ("\"") || result.Any (c => c < 0x20))
            result = "\"" + result + "\"";
        return result;
    }

    public static bool IsSimpleType (Type t)
    {
        if (t.IsGenericType && t.GetGenericTypeDefinition () == typeof (Nullable<>))
            t = t.GetGenericArguments() [0];

        return t.IsPrimitive || Type.GetTypeCode (t) != TypeCode.Object ||
               typeof (IFormattable).IsAssignableFrom (t);
    }
}
Note that this formats values using the current culture (with the exception of dates). If you want to use the invariant culture, that's easy enough to change.
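For instance, a minimal sketch of that change (assuming it's enough to handle values that implement IFormattable):
static string Format (object value)
{
    if (value == null) return "";
    if (value is DateTime) return ((DateTime)value).ToString ("s");
    if (value is DateTimeOffset) return ((DateTimeOffset)value).ToString ("s");

    // IFormattable overloads accept an explicit culture; anything else
    // falls back to plain ToString().
    string result = value is IFormattable formattable
        ? formattable.ToString (null, System.Globalization.CultureInfo.InvariantCulture)
        : value.ToString();

    result = result.Replace ("\"", "\"\"");
    if (result.Contains (",") || result.Contains ("\"") || result.Any (c => c < 0x20))
        result = "\"" + result + "\"";
    return result;
}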
Before calling it, you must disable object tracking, so that LINQPad doesn't cache the objects in memory, by setting ObjectTrackingEnabled to false:
ObjectTrackingEnabled = false;
Customers.WriteCsv (@"c:\temp\customers.csv");
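If your stewards are writing raw SQL rather than LINQ, the same extension should work by streaming the text query through LINQPad's ExecuteQuery<T> in a C# Program query (OrderRow here is a hypothetical row type matching the SELECT list):
void Main()
{
    ObjectTrackingEnabled = false;

    // Rows are hydrated and written one at a time, so memory use stays flat.
    ExecuteQuery<OrderRow> ("SELECT OrderID, OrderDate, Total FROM Orders")
        .WriteCsv (@"c:\temp\orders.csv");
}

// Hypothetical row type matching the columns selected above.
class OrderRow
{
    public int OrderID;
    public DateTime OrderDate;
    public decimal Total;
}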

Related

Using Lucene's highlighting, getting too much highlighted, is there a workaround for this?

I am using the highlighting feature of Lucene to isolate matching terms for my query, but some of the matched terms are excessive.
I have some simple test cases which are delivered in an Ant project (download details below).
Materials
You can download the test case here: mydemo_with_libs.zip
That archive includes the Lucene 8.6.3 libraries which my test uses; if you prefer a copy without the JAR files you can download that from here: mydemo_without_libs.zip
The necessary libraries are: core, analyzers, queries, queryparser, highlighter, and memory.
You can run the test case by unzipping the archive into an empty directory and running the Ant command ant synsearch
Input
I have provided a short synonym list which is used for indexing and analysing in the highlighting methods:
cope,manage
jobs,tasks
simultaneously,at once
and there is one document being indexed:
Queues are a useful way of grouping jobs together in order to manage a number of them at once. You can:
hold or release multiple jobs at the same time;
group multiple tasks (for the same event);
control the priority of jobs in the queue;
Eventually log all events that take place in a queue.
Use either job.queue or task.queue in specifications.
Process
When building the index I am storing the text field, and using a custom analyzer. This is because (in the real world) the content I am indexing is technical documentation, so stripping out punctuation is inappropriate because so much of it may be significant in technical expressions. My analyzer uses a TechTokenFilter which breaks the stream up into tokens consisting of strings of words or digits, or individual characters which don't match the previous pattern.
Here's the relevant code for the analyzer:
public class MyAnalyzer extends Analyzer {

    // Fields implied by the constructors (not shown in the original excerpt):
    private String synlist = "";
    private boolean useSynonyms;

    public MyAnalyzer(String synlist) {
        if (!synlist.isEmpty()) {   // compare by value; != only compares references
            this.synlist = synlist;
            this.useSynonyms = true;
        }
    }

    public MyAnalyzer() {
        this.useSynonyms = false;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        WhitespaceTokenizer src = new WhitespaceTokenizer();
        TokenStream result = new TechTokenFilter(new LowerCaseFilter(src));
        if (useSynonyms) {
            result = new SynonymGraphFilter(result, getSynonyms(synlist), Boolean.TRUE);
            result = new FlattenGraphFilter(result);
        }
        return new TokenStreamComponents(src, result);
    }
}
and here's my filter:
public class TechTokenFilter extends TokenFilter {

    private final CharTermAttribute termAttr;
    private final PositionIncrementAttribute posIncAttr;
    private final ArrayList<String> termStack;
    private AttributeSource.State current;
    private final TypeAttribute typeAttr;

    public TechTokenFilter(TokenStream tokenStream) {
        super(tokenStream);
        termStack = new ArrayList<>();
        termAttr = addAttribute(CharTermAttribute.class);
        posIncAttr = addAttribute(PositionIncrementAttribute.class);
        typeAttr = addAttribute(TypeAttribute.class);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (this.termStack.isEmpty() && input.incrementToken()) {
            final String currentTerm = termAttr.toString();
            final int bufferLen = termAttr.length();
            if (bufferLen > 0) {
                if (termStack.isEmpty()) {
                    termStack.addAll(Arrays.asList(techTokens(currentTerm)));
                    current = captureState();
                }
            }
        }
        if (!this.termStack.isEmpty()) {
            String part = termStack.remove(0);
            restoreState(current);
            termAttr.setEmpty().append(part);
            posIncAttr.setPositionIncrement(1);
            return true;
        } else {
            return false;
        }
    }

    public static String[] techTokens(String t) {
        List<String> tokenlist = new ArrayList<String>();
        String[] tokens;
        StringBuilder next = new StringBuilder();
        String token;
        char minus = '-';
        char underscore = '_';
        char c, prec, subc;
        for (int i = 0; i < t.length(); i++) {
            prec = i > 0 ? t.charAt(i - 1) : 0;
            c = t.charAt(i);
            subc = i < (t.length() - 1) ? t.charAt(i + 1) : 0;
            if (Character.isLetterOrDigit(c) || c == underscore) {
                next.append(c);
            } else if (c == minus && Character.isLetterOrDigit(prec) && Character.isLetterOrDigit(subc)) {
                next.append(c);
            } else {
                if (next.length() > 0) {
                    token = next.toString();
                    tokenlist.add(token);
                    next.setLength(0);
                }
                if (Character.isWhitespace(c)) {
                    // shouldn't be possible because the input stream has been
                    // tokenized on whitespace
                } else {
                    tokenlist.add(String.valueOf(c));
                }
            }
        }
        if (next.length() > 0) {
            token = next.toString();
            tokenlist.add(token);
        }
        tokens = tokenlist.toArray(new String[0]);
        return tokens;
    }
}
Examining the index, I can see that it contains the separate terms I expect, including the synonym values. For example, the text at the end of the first line has produced the terms
of
them
at , simultaneously
once
.
You
can
:
and the text at the end of the third line has produced the terms
same
event
)
;
When the application performs a search it analyzes the query without using the synonym list (because the synonyms are already in the index), but I have discovered that I need to include the synonym list when analyzing the stored text to identify the matching fragments.
Searches match the correct documents, but the code I have added to identify the matching terms over-performs. I won't show the whole search method here, but will focus on the code which lists matched terms:
public static void doSearch(IndexReader reader, IndexSearcher searcher,
        Query query, int max, String synList) throws IOException {

    SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter("\001", "\002");
    Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
    Analyzer analyzer;
    if (synList != null) {
        analyzer = new MyAnalyzer(synList);
    } else {
        analyzer = new MyAnalyzer();
    }

    // Collect all the docs
    TopDocs results = searcher.search(query, max);
    ScoreDoc[] hits = results.scoreDocs;
    int numTotalHits = Math.toIntExact(results.totalHits.value);
    System.out.println("\nQuery: " + query.toString());
    System.out.println("Matches: " + numTotalHits);

    // Collect matching terms
    HashSet<String> matchedWords = new HashSet<String>();
    int start = 0;
    int end = Math.min(numTotalHits, max);
    for (int i = start; i < end; i++) {
        int id = hits[i].doc;
        float score = hits[i].score;
        Document doc = searcher.doc(id);
        String docpath = doc.get("path");
        String doctext = doc.get("text");
        try {
            TokenStream tokens = TokenSources.getTokenStream("text", null, doctext, analyzer, -1);
            TextFragment[] frag = highlighter.getBestTextFragments(tokens, doctext, false, 100);
            for (int j = 0; j < frag.length; j++) {
                if ((frag[j] != null) && (frag[j].getScore() > 0)) {
                    String match = frag[j].toString();
                    addMatchedWord(matchedWords, match);
                }
            }
        } catch (InvalidTokenOffsetsException e) {
            System.err.println(e.getMessage());
        }
        System.out.println("matched file: " + docpath);
    }
    if (matchedWords.size() > 0) {
        System.out.println("matched terms:");
        for (String word : matchedWords) {
            System.out.println(word);
        }
    }
}
Problem
While the correct documents are selected by these queries, and the fragments chosen for highlighting do contain the query terms, the highlighted pieces in some of the selected fragments extend over too much of the input.
For example, if the query is
+text:event +text:manage
(the first example in the test case) then I would expect to see 'event' and 'manage' in the highlighted list. But what I actually see is
event);
manage
Despite the highlighting process using an analyzer which breaks terms apart and treats punctuation characters as single terms, the highlight code is "hungry" and breaks on whitespace alone.
Similarly if the query is
+text:queeu~1
(my final test case) I would expect to only see 'queue' in the list. But I get
queue.
job.queue
task.queue
queue;
It is so nearly there... but I don't understand why the highlighted pieces are inconsistent with the index, and I don't think I should have to parse the list of matches through yet another filter to produce the correct list of matches.
I would really appreciate any pointers to what I am doing wrong or how I could improve my code to deliver exactly what I need.
Thanks for reading this far!
I managed to get this working by replacing the WhitespaceTokenizer and TechTokenFilter in my analyzer with a PatternTokenizer; the regular expression took a bit of work, but once I had it, all the matching terms were extracted with pinpoint accuracy.
The replacement analyzer:
public class MyAnalyzer extends Analyzer {

    // Fields implied by the constructors (not shown in the original excerpt):
    private String synlist = "";
    private boolean useSynonyms;

    public MyAnalyzer(String synlist) {
        if (!synlist.isEmpty()) {   // compare by value; != only compares references
            this.synlist = synlist;
            this.useSynonyms = true;
        }
    }

    public MyAnalyzer() {
        this.useSynonyms = false;
    }

    private static final String tokenRegex = "(([\\w]+-)*[\\w]+)|[^\\w\\s]";

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        PatternTokenizer src = new PatternTokenizer(Pattern.compile(tokenRegex), 0);
        TokenStream result = new LowerCaseFilter(src);
        if (useSynonyms) {
            result = new SynonymGraphFilter(result, getSynonyms(synlist), Boolean.TRUE);
            result = new FlattenGraphFilter(result);
        }
        return new TokenStreamComponents(src, result);
    }
}

Is there a way I can write to CSV faster? [duplicate]

Could somebody please tell me why the following code is not working? The data is saved into the CSV file; however, the data is not separated. It all exists within the first cell of each row.
StringBuilder sb = new StringBuilder();

foreach (DataColumn col in dt.Columns)
{
    sb.Append(col.ColumnName + ',');
}
sb.Remove(sb.Length - 1, 1);
sb.Append(Environment.NewLine);

foreach (DataRow row in dt.Rows)
{
    for (int i = 0; i < dt.Columns.Count; i++)
    {
        sb.Append(row[i].ToString() + ",");
    }
    sb.Append(Environment.NewLine);
}

File.WriteAllText("test.csv", sb.ToString());
Thanks.
The following shorter version opens fine in Excel; maybe your issue was the trailing comma.
.NET 3.5
StringBuilder sb = new StringBuilder();

string[] columnNames = dt.Columns.Cast<DataColumn>()
    .Select(column => column.ColumnName)
    .ToArray();
sb.AppendLine(string.Join(",", columnNames));

foreach (DataRow row in dt.Rows)
{
    string[] fields = row.ItemArray.Select(field => field.ToString())
        .ToArray();
    sb.AppendLine(string.Join(",", fields));
}

File.WriteAllText("test.csv", sb.ToString());
.NET >= 4.0
And as Tim pointed out, if you are on .NET 4 or later, you can make it even shorter:
StringBuilder sb = new StringBuilder();

IEnumerable<string> columnNames = dt.Columns.Cast<DataColumn>()
    .Select(column => column.ColumnName);
sb.AppendLine(string.Join(",", columnNames));

foreach (DataRow row in dt.Rows)
{
    IEnumerable<string> fields = row.ItemArray.Select(field => field.ToString());
    sb.AppendLine(string.Join(",", fields));
}

File.WriteAllText("test.csv", sb.ToString());
As suggested by Christian, if you want to handle the escaping of special characters in fields, replace the loop block with:
foreach (DataRow row in dt.Rows)
{
    IEnumerable<string> fields = row.ItemArray.Select(field =>
        string.Concat("\"", field.ToString().Replace("\"", "\"\""), "\""));
    sb.AppendLine(string.Join(",", fields));
}
One last suggestion: you could write the CSV content line by line instead of as one whole document, to avoid holding a big document in memory.
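A minimal sketch of that line-by-line approach (same quoting as above, but streamed through a StreamWriter so only one row is in memory at a time):
using (var writer = new StreamWriter("test.csv"))
{
    // Header row
    writer.WriteLine(string.Join(",",
        dt.Columns.Cast<DataColumn>().Select(c => c.ColumnName)));

    // One line per row; nothing accumulates in memory.
    foreach (DataRow row in dt.Rows)
    {
        var fields = row.ItemArray.Select(field =>
            string.Concat("\"", field.ToString().Replace("\"", "\"\""), "\""));
        writer.WriteLine(string.Join(",", fields));
    }
}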
I wrapped this up into an extension class, which allows you to call:
myDataTable.WriteToCsvFile("C:\\MyDataTable.csv");
on any DataTable.
public static class DataTableExtensions
{
    public static void WriteToCsvFile(this DataTable dataTable, string filePath)
    {
        StringBuilder fileContent = new StringBuilder();

        foreach (var col in dataTable.Columns)
        {
            fileContent.Append(col.ToString() + ",");
        }
        fileContent.Replace(",", System.Environment.NewLine, fileContent.Length - 1, 1);

        foreach (DataRow dr in dataTable.Rows)
        {
            foreach (var column in dr.ItemArray)
            {
                fileContent.Append("\"" + column.ToString() + "\",");
            }
            fileContent.Replace(",", System.Environment.NewLine, fileContent.Length - 1, 1);
        }

        System.IO.File.WriteAllText(filePath, fileContent.ToString());
    }
}
A new extension function based on Paul Grimshaw's answer. I cleaned it up and added the ability to handle unexpected data (empty data, embedded quotes, and commas in the headings).
It also returns a string, which is more flexible; it returns null if the table object does not contain any structure.
public static string ToCsv(this DataTable dataTable) {
    StringBuilder sbData = new StringBuilder();

    // Only return null if there is no structure.
    if (dataTable.Columns.Count == 0)
        return null;

    foreach (var col in dataTable.Columns) {
        if (col == null)
            sbData.Append(",");
        else
            sbData.Append("\"" + col.ToString().Replace("\"", "\"\"") + "\",");
    }
    sbData.Replace(",", System.Environment.NewLine, sbData.Length - 1, 1);

    foreach (DataRow dr in dataTable.Rows) {
        foreach (var column in dr.ItemArray) {
            if (column == null)
                sbData.Append(",");
            else
                sbData.Append("\"" + column.ToString().Replace("\"", "\"\"") + "\",");
        }
        sbData.Replace(",", System.Environment.NewLine, sbData.Length - 1, 1);
    }

    return sbData.ToString();
}
You call it as follows:
var csvData = dataTableObject.ToCsv();
If your calling code is referencing the System.Windows.Forms assembly, you may consider a radically different approach.
My strategy is to use the functions already provided by the framework to accomplish this in very few lines of code and without having to loop through columns and rows. What the code below does is programmatically create a DataGridView on the fly and set the DataGridView.DataSource to the DataTable. Next, I programmatically select all the cells (including the header) in the DataGridView and call DataGridView.GetClipboardContent(), placing the results into the Windows Clipboard. Then, I 'paste' the contents of the clipboard into a call to File.WriteAllText(), making sure to specify the formatting of the 'paste' as TextDataFormat.CommaSeparatedValue.
Here is the code:
public static void DataTableToCSV(DataTable Table, string Filename)
{
    using (DataGridView dataGrid = new DataGridView())
    {
        // Save the current state of the clipboard so we can restore it after we are done
        IDataObject objectSave = Clipboard.GetDataObject();

        // Set the DataSource
        dataGrid.DataSource = Table;

        // Choose whether to write header. Use EnableWithoutHeaderText instead to omit header.
        dataGrid.ClipboardCopyMode = DataGridViewClipboardCopyMode.EnableAlwaysIncludeHeaderText;

        // Select all the cells
        dataGrid.SelectAll();

        // Copy (set clipboard)
        Clipboard.SetDataObject(dataGrid.GetClipboardContent());

        // Paste (get the clipboard and serialize it to a file)
        File.WriteAllText(Filename, Clipboard.GetText(TextDataFormat.CommaSeparatedValue));

        // Restore the previous state of the clipboard so the effect is seamless
        if (objectSave != null) // Setting the clipboard to a null object would throw
        {
            Clipboard.SetDataObject(objectSave);
        }
    }
}
Notice I also make sure to preserve the contents of the clipboard before I begin, and restore it once I'm done, so the user does not get a bunch of unexpected garbage the next time they try to paste. The main caveats to this approach are: 1) your class has to reference System.Windows.Forms, which may not be the case in a data abstraction layer; 2) your assembly will have to be targeted for the .NET 4.5 framework, as DataGridView does not exist in 4.0; and 3) the method will fail if the clipboard is being used by another process.
Anyway, this approach may not be right for your situation, but it is interesting nonetheless, and can be another tool in your toolbox.
I did this recently but included double quotes around my values.
For example, change these two lines:
sb.Append("\"" + col.ColumnName + "\",");
...
sb.Append("\"" + row[i].ToString() + "\",");
Try changing sb.Append(Environment.NewLine); to sb.AppendLine();.
StringBuilder sb = new StringBuilder();

foreach (DataColumn col in dt.Columns)
{
    sb.Append(col.ColumnName + ',');
}
sb.Remove(sb.Length - 1, 1);
sb.AppendLine();

foreach (DataRow row in dt.Rows)
{
    for (int i = 0; i < dt.Columns.Count; i++)
    {
        sb.Append(row[i].ToString() + ",");
    }
    sb.AppendLine();
}

File.WriteAllText("test.csv", sb.ToString());
4 lines of code:
public static string ToCSV(DataTable tbl)
{
    StringBuilder strb = new StringBuilder();

    // Column headers
    strb.AppendLine(string.Join(",", tbl.Columns.Cast<DataColumn>()
        .Select(s => "\"" + s.ColumnName + "\"")));

    // Rows
    tbl.AsEnumerable().Select(s => strb.AppendLine(
        string.Join(",", s.ItemArray.Select(
            i => "\"" + i.ToString() + "\"")))).ToList();

    return strb.ToString();
}
Note that the ToList() at the end is important; I need something to force the expression to be evaluated. If I were code golfing, I could use Min() instead.
Also note that the result will have a newline at the end because of the last call to AppendLine(). You may not want this; you can simply call TrimEnd() to remove it.
Try putting ; instead of ,. Hope it helps.
The error is the list separator.
Instead of writing sb.Append(something... + ','), you should put something like sb.Append(something... + System.Globalization.CultureInfo.CurrentCulture.TextInfo.ListSeparator);
You must use the list-separator character configured in your operating system (as in the example above), or the list separator of the client machine where the file is going to be viewed. Another option would be to configure it in app.config or web.config as a parameter of your application.
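For example (a sketch; the separator is read from the regional settings of the machine running the code):
// Use the OS-configured list separator instead of a hard-coded comma.
string sep = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ListSeparator;
foreach (DataColumn col in dt.Columns)
{
    sb.Append(col.ColumnName + sep);
}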
To write to a file, I think the following method is the most efficient and straightforward (you can add quotes if you want):
public static void WriteCsv(DataTable dt, string path)
{
    using (var writer = new StreamWriter(path))
    {
        writer.WriteLine(string.Join(",", dt.Columns.Cast<DataColumn>().Select(dc => dc.ColumnName)));
        foreach (DataRow row in dt.Rows)
        {
            writer.WriteLine(string.Join(",", row.ItemArray));
        }
    }
}
Read this and this?
A better implementation would be
var result = new StringBuilder();

for (int i = 0; i < table.Columns.Count; i++)
{
    result.Append(table.Columns[i].ColumnName);
    result.Append(i == table.Columns.Count - 1 ? "\n" : ",");
}

foreach (DataRow row in table.Rows)
{
    for (int i = 0; i < table.Columns.Count; i++)
    {
        result.Append(row[i].ToString());
        result.Append(i == table.Columns.Count - 1 ? "\n" : ",");
    }
}

File.WriteAllText("test.csv", result.ToString());
To mimic Excel CSV:
public static string Convert(DataTable dt)
{
    StringBuilder sb = new StringBuilder();

    IEnumerable<string> columnNames = dt.Columns.Cast<DataColumn>()
        .Select(column => column.ColumnName);
    sb.AppendLine(string.Join(",", columnNames));

    foreach (DataRow row in dt.Rows)
    {
        IEnumerable<string> fields = row.ItemArray.Select(field =>
        {
            string s = field.ToString().Replace("\"", "\"\"");
            if (s.Contains(','))
                s = string.Concat("\"", s, "\"");
            return s;
        });
        sb.AppendLine(string.Join(",", fields));
    }

    return sb.ToString().Trim();
}
Here is an enhancement to vc-74's post that handles commas the same way Excel does. Excel puts quotes around data if the data has a comma but doesn't quote if the data doesn't have a comma.
public static string ToCsv(this DataTable inDataTable, bool inIncludeHeaders = true)
{
    var builder = new StringBuilder();

    var columnNames = inDataTable.Columns.Cast<DataColumn>().Select(column => column.ColumnName);
    if (inIncludeHeaders)
        builder.AppendLine(string.Join(",", columnNames));

    foreach (DataRow row in inDataTable.Rows)
    {
        var fields = row.ItemArray.Select(field => field.ToString().WrapInQuotesIfContains(","));
        builder.AppendLine(string.Join(",", fields));
    }

    return builder.ToString();
}

public static string WrapInQuotesIfContains(this string inString, string inSearchString)
{
    if (inString.Contains(inSearchString))
        return "\"" + inString + "\"";

    return inString;
}
Here is my solution, based on previous answers by Paul Grimshaw and Anthony VO.
I've submitted the code in a C# project on GitHub.
My main contribution is to eliminate explicitly creating and manipulating a StringBuilder and instead work only with IEnumerable. This avoids allocating a big buffer in memory.
public static class Util
{
    public static string EscapeQuotes(this string self) {
        return self?.Replace("\"", "\"\"") ?? "";
    }

    public static string Surround(this string self, string before, string after) {
        return $"{before}{self}{after}";
    }

    public static string Quoted(this string self, string quotes = "\"") {
        return self.Surround(quotes, quotes);
    }

    public static string QuotedCSVFieldIfNecessary(this string self) {
        return (self == null) ? "" : (self.Contains('"') || self.Contains('\r') || self.Contains('\n') || self.Contains(',')) ? self.Quoted() : self;
    }

    public static string ToCsvField(this string self) {
        return self.EscapeQuotes().QuotedCSVFieldIfNecessary();
    }

    public static string ToCsvRow(this IEnumerable<string> self) {
        return string.Join(",", self.Select(ToCsvField));
    }

    public static IEnumerable<string> ToCsvRows(this DataTable self) {
        yield return self.Columns.OfType<object>().Select(c => c.ToString()).ToCsvRow();
        foreach (var dr in self.Rows.OfType<DataRow>())
            yield return dr.ItemArray.Select(item => item.ToString()).ToCsvRow();
    }

    public static void ToCsvFile(this DataTable self, string path) {
        File.WriteAllLines(path, self.ToCsvRows());
    }
}
This approach combines nicely with converting IEnumerable to DataTable as asked here.
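For instance (a sketch; myEnumerable and the ToDataTable() extension are hypothetical, the latter along the lines of the one in that linked question):
// Hypothetical ToDataTable() from the linked question, combined with the
// Util class above.
myEnumerable.ToDataTable().ToCsvFile(@"C:\temp\output.csv");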
StringBuilder sb = new StringBuilder();
SaveFileDialog fileSave = new SaveFileDialog();

IEnumerable<string> columnNames = tbCifSil.Columns.Cast<DataColumn>()
    .Select(column => column.ColumnName);
sb.AppendLine(string.Join(",", columnNames));

foreach (DataRow row in tbCifSil.Rows)
{
    IEnumerable<string> fields = row.ItemArray.Select(field =>
        string.Concat("\"", field.ToString().Replace("\"", "\"\""), "\""));
    sb.AppendLine(string.Join(",", fields));
}

fileSave.ShowDialog();
File.WriteAllText(fileSave.FileName, sb.ToString());
public void ExportToCSV(DataTable dtDataTable, string strFilePath)
{
    StreamWriter sw = new StreamWriter(strFilePath, false);

    // Headers
    for (int i = 0; i < dtDataTable.Columns.Count; i++)
    {
        sw.Write(dtDataTable.Columns[i].ToString().Trim());
        if (i < dtDataTable.Columns.Count - 1)
        {
            sw.Write(",");
        }
    }
    sw.Write(sw.NewLine);

    foreach (DataRow dr in dtDataTable.Rows)
    {
        for (int i = 0; i < dtDataTable.Columns.Count; i++)
        {
            if (!Convert.IsDBNull(dr[i]))
            {
                string value = dr[i].ToString().Trim();
                if (value.Contains(','))
                {
                    value = String.Format("\"{0}\"", value);
                    sw.Write(value);
                }
                else
                {
                    sw.Write(dr[i].ToString().Trim());
                }
            }
            if (i < dtDataTable.Columns.Count - 1)
            {
                sw.Write(",");
            }
        }
        sw.Write(sw.NewLine);
    }

    sw.Close();
}
Possibly the easiest way is to use
https://github.com/ukushu/DataExporter
especially if your DataTable cells contain \r\n characters or the separator symbol. Almost all of the other answers will not work with such cells.
All you need is to write the following code:
Csv csv = new Csv("\t"); // needed delimiter

var columnNames = dt.Columns.Cast<DataColumn>()
    .Select(column => column.ColumnName).ToArray();
csv.AddRow(columnNames);

foreach (DataRow row in dt.Rows)
{
    var fields = row.ItemArray.Select(field => field.ToString()).ToArray();
    csv.AddRow(fields);
}

csv.Save();
Most of the existing answers can easily cause an OutOfMemoryException, so I decided to write my own answer.
DON'T DO THIS:
Using a DataSet + StringBuilder causes the data to occupy memory three times at once:
Load all data into the DataSet.
Copy all data into the StringBuilder.
Copy the data to a string using StringBuilder.ToString().
Instead you should write each row to a FileStream separately. There is no need to create the whole CSV in memory.
Even better, use a DataReader instead of a DataSet. That way you can read billions of records from the database one by one and write them to a file one by one.
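A minimal sketch of that DataReader pattern (assuming SQL Server and a connectionString variable; the quoting matches the earlier answers):
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("SELECT * FROM BigTable", connection))
using (var writer = new StreamWriter("test.csv"))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        // Header row from the reader's schema.
        var names = Enumerable.Range(0, reader.FieldCount).Select(reader.GetName);
        writer.WriteLine(string.Join(",", names));

        // Stream one row at a time; memory use stays flat regardless of row count.
        while (reader.Read())
        {
            var fields = Enumerable.Range(0, reader.FieldCount)
                .Select(i => "\"" + reader[i].ToString().Replace("\"", "\"\"") + "\"");
            writer.WriteLine(string.Join(",", fields));
        }
    }
}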
If you don't mind using an external library for CSV, I can recommend the most popular CsvHelper, which has no dependencies.
using (var writer = new StreamWriter("test.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    foreach (DataColumn dc in dt.Columns)
    {
        csv.WriteField(dc.ColumnName);
    }
    csv.NextRecord();

    foreach (DataRow dr in dt.Rows)
    {
        foreach (DataColumn dc in dt.Columns)
        {
            csv.WriteField(dr[dc]);
        }
        csv.NextRecord();
    }
}
In case anyone else stumbles on this: I was using File.ReadAllText to get CSV data, then modified it and wrote it back with File.WriteAllText. The \r\n CRLFs were fine, but the \t tabs were ignored when Excel opened the file. (All the solutions in this thread so far use a comma delimiter, but that doesn't matter.) Notepad showed the same format in the resulting file as in the source; a diff even showed the files as identical. But I got a clue when I opened the file in Visual Studio with a binary editor: the source file was Unicode but the target was ASCII. To fix it, I modified both ReadAllText and WriteAllText to pass a third argument of System.Text.Encoding.Unicode, and from there Excel was able to open the updated file.
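In code, the fix described above looks roughly like this:
// Read and write with an explicit encoding so the file doesn't silently
// change from Unicode to ASCII on the round trip.
string text = File.ReadAllText(path, System.Text.Encoding.Unicode);
// ... modify text ...
File.WriteAllText(path, text, System.Text.Encoding.Unicode);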

How to Error Handle a NullReferenceException

My website went down for a few days, so I am trying to add some error handling for when the MVC app doesn't have access to certain resources, so that if something becomes unavailable again the WHOLE THING doesn't have to go down.
At the moment a controller is trying to access ViewBag.moreNewProducts, which isn't available.
public ActionResult Index(string search)
{
    string[] newProductLines = this.getMoreNewProducts();
    string[] newNews = this.getMoreNews();
    string[] newPromotions = this.getMorePromotions();
    string[] fewerProductLines = this.getLessNewProducts(newProductLines);

    ViewBag.moreNewProducts = newProductLines;
    ViewBag.moreNews = newNews;
    ViewBag.morePromotions = newPromotions;
    ViewBag.lessNewProducts = fewerProductLines;

    bool disableShowMore = false;
This is where I run into an error: " foreach (string line in newProductLines)"
public string[] getLessNewProducts(string[] newProductLines)
{
    int charCount = 0;
    int arrayCount = 0;
    string[] displayProductLines = new string[6];
    bool continueWriting;

    if (newProductLines == null)
    {
        foreach (string line in newProductLines)
        {
            continueWriting = false;
            for (int i = 0; charCount < 250 && i < line.Length && arrayCount < 5; i++)
            {
                string index = newProductLines[arrayCount].Substring(i, 1);
                displayProductLines[arrayCount] += index;
                charCount++;
                continueWriting = true;
            }
            if (continueWriting == true)
            {
                arrayCount++;
            }
        }

        string[] LessNewProducts = new string[arrayCount];
        for (int d = 0; d < arrayCount; d++)
        {
            LessNewProducts[d] = displayProductLines[d];
        }
        return LessNewProducts;
    }
    else
    {
        return null;
    }
}
How do I get around this, with an if/else statement or otherwise, so the whole thing doesn't have to crash?
Two things.
Your if (newProductLines == null) statement has the wrong condition on it. I don't believe that you want to enter that block when newProductLines is null. You can invert the condition to get the desired result (if (newProductLines != null)).
If you run into another situation later where you need to catch an error, you can always use the try-catch block to catch exceptions that you are expecting.
try
{
    // code that could cause the error here
}
catch (NullReferenceException nullRefExcep)
{
    // what you want it to do if the null reference exception occurs
}
if (newProductLines == null) should be replaced with if (newProductLines != null), so that the loop never runs with newProductLines as null. Basically, with the condition as written you will always get the NullReferenceException unless you manage the exception with a try/catch block.
The real question to ask yourself is:
Why would newProductLines be null?
Presumably getMoreNewProducts() found a situation where it thought it would be appropriate to return a null value.
If this is happening because the system has an error that would make your page meaningless, then you may just want to change getMoreNewProducts() so that it throws an exception when that error state occurs. Typically it's safest and easiest to debug programs that fail as soon as they run into an unexpected situation.
If this is happening because there are no new products, then you should just return an empty collection, rather than null. All your code should work just fine after that, without the need for an if/else statement: it will return an empty array for LessNewProducts, which is probably correct.
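For example (a sketch; noNewProducts is a stand-in for whatever condition currently produces the null):
// Inside getMoreNewProducts(), prefer an empty array over null:
if (noNewProducts)
    return new string[0];   // callers can safely foreach over this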
However, let's assume that there's a situation that you're anticipating will occur from time to time, which will make it impossible for you to retrieve newProductLines at that time, but which you would like the system to handle gracefully otherwise. You could just use null to indicate that the value isn't there, but it's really hard to know which variables might be null and which never should be. It may be wiser to use an optional type to represent that getMoreNewProducts() might not return anything at all, so you can force any consuming code to recognize this possibility and figure out how to deal with it before the project will even compile:
public ActionResult Index(string search)
{
    Maybe<string[]> newProductLines = this.getMoreNewProducts();
    string[] newNews = this.getMoreNews();
    string[] newPromotions = this.getMorePromotions();
    Maybe<string[]> fewerProductLines = newProductLines.Select(this.getLessNewProducts);
Disclaimer: I am the author of the Maybe<> class referenced above.
Here are some additional improvements I'd suggest:
Don't use ViewBag. Instead, create a strongly-typed ViewModel so that you can catch errors in your code at compile-time more often:
var viewModel = new ReportModel {
    newProductLines = this.getMoreNewProducts(),
    newNews = this.getMoreNews(),
    ...
};
...
return View(viewModel);
Learn to use LINQ. It will simplify a lot of your very complicated code. For example, instead of:
string[] LessNewProducts = new string[arrayCount];
for (int d = 0; d < arrayCount; d++)
{
    LessNewProducts[d] = displayProductLines[d];
}
return LessNewProducts;
... you can say:
string[] LessNewProducts = displayProductLines.Take(arrayCount).ToArray();
In fact, I think your entire getLessNewProducts() method can be replaced with this:
return newProductLines
    .Where(line => line.Length > 0)
    .Select(line => line.Substring(0, Math.Min(line.Length, 250)))
    .Take(5);

WPF application LINQ to SQL getting data

I'm making a WPF application with a DataGrid that displays some SQL data.
Now I'm making a search field, but it doesn't seem to work:
Contactpersoon is an nvarchar
bedrijf is an nvarchar
but
LeverancierPK is an INT
How can I combine these in my search?
If I convert LeverancierPK to a string, then I can use Contains, but that gives me an error.
// Initialization
PRCEntities vPRCEntities = new PRCEntities();

var vFound = from a in vPRCEntities.tblLeveranciers
             where a.LeverancierPK.ToString().Contains(vWoord) ||
                   a.Contactpersoon.Contains(vWoord) ||
                   a.Bedrijf.Contains(vWoord)
             orderby a.LeverancierPK
             select a;

myDataGrid_Leveranciers.ItemsSource = vFound;
Thanks
If you don't care about pulling all the records back from the DB (your own answer pulls everything back anyway), you can just do a .ToList() before the Where clause.
var vFound = vPRCEntities.tblLeveranciers.ToList()
    .Where(a => a.LeverancierPK.ToString().Contains(vWoord) ||
                a.Contactpersoon.Contains(vWoord) ||
                a.Bedrijf.Contains(vWoord))
    .OrderBy(a => a.LeverancierPK);
This code does what I was looking for, but I think it could be a lot shorter.
PRCEntities vPRCEntities = new PRCEntities();
var vFound = from a in vPRCEntities.tblLeveranciers
             orderby a.LeverancierPK
             select a;

myDataGrid_Leveranciers.ItemsSource = null;
myDataGrid_Leveranciers.Items.Clear();

foreach (var item in vFound)
{
    if (item.Bedrijf.Contains(vWoord))
    {
        myDataGrid_Leveranciers.Items.Add(item);
    }
    else if (item.LeverancierPK.ToString().Contains(vWoord))
    {
        myDataGrid_Leveranciers.Items.Add(item);
    }
    else if (item.Contactpersoon != null && item.Contactpersoon.Contains(vWoord))
    {
        myDataGrid_Leveranciers.Items.Add(item);
    }
}
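For what it's worth, a shorter equivalent might look like this (a sketch; it keeps the Contactpersoon null check from the loop above and filters client-side, just like the ToList() approach):
PRCEntities vPRCEntities = new PRCEntities();

myDataGrid_Leveranciers.ItemsSource = vPRCEntities.tblLeveranciers
    .AsEnumerable()
    .Where(a => a.Bedrijf.Contains(vWoord) ||
                a.LeverancierPK.ToString().Contains(vWoord) ||
                (a.Contactpersoon != null && a.Contactpersoon.Contains(vWoord)))
    .OrderBy(a => a.LeverancierPK)
    .ToList();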

Performance and Linq in iterations

These 2 ways of working both work, but I'm wondering if there's a difference in performance:
Dim collection As ItemCollection = CType(CellCollection.Where(Function(i) i.IsPending = True), ItemCollection)
For Each item As Item In collection
    'Do something here
Next
and
For Each item As Item In CellCollection.Where(Function(i) i.IsPending = True)
    'Do something here
Next
I thought the second one was better, as you have one variable fewer and it looks cleaner, but on second thought I'm not quite sure what happens when you put a LINQ query in the iteration.
Does it have to be re-evaluated every time around the loop? And which one is the cleanest/most performant to use?
Thanks in advance.
I've created a simple test console app.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace LinqPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            var data = Enumerable.Range(1, 100000000);
            for (int x = 0; x < 10; x++)
            {
                ExecuteMethods(data);
            }
        }

        private static void ExecuteMethods(IEnumerable<int> data)
        {
            Method1("linq collection", () =>
            {
                var collection = data.Where(d => d % 2 == 0);
                double count = 0;
                foreach (var c in collection)
                {
                    count += c;
                }
            });

            Method1("list collection", () =>
            {
                var collection = data.Where(d => d % 2 == 0).ToList();
                double count = 0;
                foreach (var c in collection)
                {
                    count += c;
                }
            });

            Method1("iterable collection", () =>
            {
                double count = 0;
                foreach (var c in data.Where(d => d % 2 == 0))
                {
                    count += c;
                }
            });
        }

        private static void Method1(string name, Action body)
        {
            Stopwatch s = new Stopwatch();
            s.Start();
            body();
            s.Stop();
            Console.WriteLine(name + ": " + s.Elapsed);
        }
    }
}
After running this I can see that the ToList() version is the slowest. The other two approaches appear to be the same.
I suppose this is because the foreach is expanded to:
var enumerator = collection.GetEnumerator();
while (enumerator.MoveNext())
{
    var c = enumerator.Current;
    count += c;
}
Performance is the same whether you assign the LINQ query to a variable or call it directly in the For Each. In both cases the iterator is created once and the For Each loop goes through each item in the list once.
In the first code sample, the CType is not necessary though (actually, I don't think it would work). You can simply do:
Dim collection = CellCollection.Where(Function(i) i.IsPending = True)
For Each item As Item In collection
    'Do something here
Next
But as I mentioned, assigning to a variable is not necessary. Having the Where clause on the For Each line will yield the same performance, and the code will be shorter and more readable.
The performance of both is the same: For Each will obtain the IEnumerable(Of T) and then enumerate through it.
However, if you're concerned about performance, try:
Dim collection As IEnumerable(Of Item) _
    = CellCollection.Where(Function(i) i.IsPending)
For Each item As Item In collection
    'Do something here
Next
It's possible that casting the IEnumerable(Of Item) to ItemCollection would cause it to enumerate (like ToArray or ToList). This will cause the collection to enumerate twice. Keeping it as IEnumerable ensures that the i.IsPending check happens during the enumeration of the For Each and not the CType().
The fastest solution would be to forgo LINQ altogether (LINQ statements, although readable, add some overhead).
For Each item As Item In CellCollection
    If Not item.IsPending Then
        Continue For
    End If

    'Do something here
Next