Amazon RDS with SQL Server won't allow bulk insert from a CSV source

I've tried two methods and both fall flat...
BULK INSERT TEMPUSERIMPORT1357081926
FROM 'C:\uploads\19E0E1.csv'
WITH (FIELDTERMINATOR = ',',ROWTERMINATOR = '\n')
This fails with "You do not have permission to use the bulk load statement." That permission normally comes with the bulkadmin server role, but you cannot grant that role on Amazon RDS.
So I tried using OPENROWSET instead, but it requires Ad Hoc Distributed Queries to be enabled, which I don't have permission to do either!

I know this question is really old, but it was the first result when I searched for bulk inserting into an AWS SQL Server RDS instance. Things have changed: you can now do it after integrating the RDS instance with S3. I answered this in more detail on another question, but the overall gist is that you set up the instance with the proper role, put your file on S3, and then copy the file over to RDS with the following commands:
exec msdb.dbo.rds_download_from_s3
@s3_arn_of_file='arn:aws:s3:::bucket_name/bulk_data.csv',
@rds_file_path='D:\S3\seed_data\data.csv',
@overwrite_file=1;
Then BULK INSERT will work:
BULK INSERT TEMPUSERIMPORT1357081926
FROM 'D:\S3\seed_data\data.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
AWS doc
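One wrinkle: rds_download_from_s3 starts an asynchronous task on the instance, so the file may not be on disk the moment the call returns. From C# you could poll the task list with Dapper before running the BULK INSERT; a sketch (the connection string is illustrative, and the exact result columns are described in the AWS docs):
using System;
using System.Data.SqlClient;
using Dapper;

class S3DownloadCheck
{
    static void Main()
    {
        // Illustrative connection string; point it at your RDS instance
        using var connection = new SqlConnection("Server=my-rds-host;Database=msdb;User Id=...;Password=...");
        // Calling rds_task_status with no arguments lists recent S3 tasks;
        // wait until your download task reports success before bulk inserting.
        foreach (var task in connection.Query("exec msdb.dbo.rds_task_status"))
            Console.WriteLine(task);
    }
}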

You can enable ad hoc distributed queries by heading to your AWS Management Console, navigating to the RDS menu, creating a DB parameter group with ad hoc distributed queries set to 1, and then attaching this parameter group to your DB instance.
Don't forget to reboot your DB once you have made these changes.
Here is the source of my information:
http://blogs.lessthandot.com/index.php/datamgmt/dbadmin/turning-on-optimize-for-ad/
Hope this helps you.

2022
I'm adding this for anyone who, like me, wants to quickly insert data into RDS from C#.
While RDS allows CSV bulk uploads directly from S3, there are times when you just want to upload data straight from your program.
I've written a C# utility method that uses a StringBuilder to concatenate statements into batches of 2,000 inserts per call, which is far faster than an ORM like Dapper doing one insert per call.
The method should handle date, int, double, and varchar fields; beyond doubling embedded single quotes it does no character escaping.
// call as:
FastInsert.Insert(myDbConnection, new[] { new { someField = "someValue" } }, "my_table");
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;
using System.Text;
using Dapper; // supplies connection.Execute()

class FastInsert
{
    static int rowSize = 2000;

    internal static void Insert(IDbConnection connection, object[] data, string targetTable)
    {
        // Read the column names from the first row's properties via reflection
        var props = data[0].GetType().GetProperties();
        var names = props.Select(x => x.Name).ToList();

        foreach (var batch in data.Batch(rowSize))
        {
            var sb = new StringBuilder($"insert into {targetTable} ({string.Join(",", names)}) ");
            string lastLine = "";
            foreach (var row in batch)
            {
                sb.Append(lastLine);
                // CreateSQLString already quotes and escapes each value
                var values = props.Select(prop => CreateSQLString(row, prop));
                lastLine = $"select {string.Join(",", values)} union all ";
            }
            // Drop the trailing "union all" from the final select
            lastLine = lastLine.Substring(0, lastLine.Length - " union all ".Length);
            sb.Append(lastLine);
            connection.Execute(sb.ToString());
        }
    }

    private static string CreateSQLString(object row, PropertyInfo prop)
    {
        var value = prop.GetValue(row);
        if (value == null) return "null";
        if (value is DateTime dateTime)
        {
            return $"'{dateTime:yyyy-MM-dd HH:mm:ss}'";
        }
        // Everything else (int, double, string, ...): quote it and double up
        // any embedded single quotes
        return $"'{value.ToString().Replace("'", "''")}'";
    }
}
static class Extensions
{
    // Split an IEnumerable into fixed-size batches (the last batch may be smaller)
    public static IEnumerable<T[]> Batch<T>(this IEnumerable<T> source, int size)
    {
        T[] bucket = null;
        var count = 0;
        foreach (var item in source)
        {
            if (bucket == null)
                bucket = new T[size];
            bucket[count++] = item;
            if (count != size)
                continue;
            yield return bucket;
            bucket = null;
            count = 0;
        }
        // Return the last bucket with all remaining elements
        if (bucket != null && count > 0)
        {
            Array.Resize(ref bucket, count);
            yield return bucket;
        }
    }
}
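For completeness, here is a hypothetical end-to-end use of the helper (the connection string, table, and column names are all illustrative, and it assumes the Dapper and Microsoft.Data.SqlClient packages):
using System;
using System.Linq;
using Microsoft.Data.SqlClient;

class Demo
{
    static void Main()
    {
        // Illustrative connection string and table name; adjust to your environment
        using var connection = new SqlConnection("Server=my-rds-host;Database=mydb;User Id=...;Password=...");
        connection.Open();

        // Build 10,000 anonymous rows; property names must match the table's column names
        var rows = Enumerable.Range(0, 10_000)
            .Select(i => new { name = $"user{i}", created_at = DateTime.UtcNow })
            .ToArray();

        // Issues 5 multi-row insert statements (2000 rows per statement)
        FastInsert.Insert(connection, rows, "my_table");
    }
}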

Related

Update context in SQL Server from ASP.NET Core 2.2

_context.Update(v) ;
_context.SaveChanges();
When I use this code, SQL Server adds a new record instead of updating the existing one:
[HttpPost]
public IActionResult PageVote(List<string> Sar)
{
string name_voter = ViewBag.getValue = TempData["Namevalue"];
int count = 0;
foreach (var item in Sar)
{
count = count + 1;
}
if (count == 6)
{
Vote v = new Vote()
{
VoteSarparast1 = Sar[0],
VoteSarparast2 = Sar[1],
VoteSarparast3 = Sar[2],
VoteSarparast4 = Sar[3],
VoteSarparast5 = Sar[4],
VoteSarparast6 = Sar[5],
};
var voter = _context.Votes.FirstOrDefault(u => u.Voter == name_voter && u.IsVoted == true);
if (voter == null)
{
v.IsVoted = true;
v.Voter = name_voter;
_context.Add(v);
_context.SaveChanges();
ViewBag.Greeting = "رای شما با موفقیت ثبت شد"; // "Your vote was registered successfully"
return RedirectToAction(nameof(end));
}
v.IsVoted = true;
v.Voter = name_voter;
_context.Update(v);
_context.SaveChanges();
return RedirectToAction(nameof(end));
}
else
{
return View(_context.Applicants.ToList());
}
}
You need to tell the DbContext about your entity. If you do var vote = new Vote(), that vote has no Id. The DbContext sees this and thinks you want to Add a new entity, so it simply does that. The DbContext tracks all the entities that you load from it, but since this is just a new instance, it knows nothing about it.
To actually perform an update, you have two options:
1 - Load the Vote from the database in some way; if you receive an Id, use it to find the existing row.
// Loads the current vote by its id (or whatever other field..)
var existingVote = context.Votes.Single(p => p.Id == id_from_param);
// Perform the changes you want..
existingVote.SomeField = "NewValue";
// Then call save normally.
context.SaveChanges();
2 - Or if you don't want to load it from Db, you have to manually tell the DbContext what to do:
// create a new "vote"...
var vote = new Vote
{
// Since it's an update, you must have the Id somehow.. so you must set it manually
Id = id_from_param,
// do the changes you want. Be careful, because this can cause data loss!
SomeField = "NewValue"
};
// This is you telling the DbContext: Hey, I control this entity.
// I know it exists in the DB and it's modified
context.Entry(vote).State = EntityState.Modified;
// Then call save normally.
context.SaveChanges();
Either of those two approaches should fix your issue, but I suggest you read a little more about how Entity Framework works. This is crucial for the success (and performance) of your apps. Option 2 in particular can cause many, many issues. There's a reason the DbContext keeps track of entities for you, so you don't have to; it's very complicated, and things can go south fast.
Some links for you:
ChangeTracker in Entity Framework Core
Working with Disconnected Entity Graph in Entity Framework Core
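Applied to the PageVote action in the question, option 1 would look roughly like this (a sketch; it assumes the row found by FirstOrDefault is the vote you intend to overwrite):
var voter = _context.Votes.FirstOrDefault(u => u.Voter == name_voter && u.IsVoted == true);
if (voter == null)
{
    // No previous vote: insert a fresh entity, which EF will track and INSERT
    v.IsVoted = true;
    v.Voter = name_voter;
    _context.Add(v);
}
else
{
    // A previous vote exists: mutate the tracked instance, not the new `v`
    voter.VoteSarparast1 = Sar[0];
    voter.VoteSarparast2 = Sar[1];
    voter.VoteSarparast3 = Sar[2];
    voter.VoteSarparast4 = Sar[3];
    voter.VoteSarparast5 = Sar[4];
    voter.VoteSarparast6 = Sar[5];
}
_context.SaveChanges();
return RedirectToAction(nameof(end));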

How can I log something in USQL UDO?

I have a custom extractor, and I'm trying to log some messages from it.
I've tried obvious things like Console.WriteLine, but I cannot find where the output goes. However, I found some system logs in adl://<my_DLS>.azuredatalakestore.net/system/jobservice/jobs/Usql/.../<my_job_id>/.
How can I log something? Is it possible to specify log file somewhere on Data Lake Store or Blob Storage Account?
A recent release of U-SQL has added diagnostic logging for UDOs. See the release notes here.
// Enable the diagnostics preview feature
SET @@FeaturePreviews = "DIAGNOSTICS:ON";
// Extract as one column
@input =
EXTRACT col string
FROM "/input/input42.txt"
USING new Utilities.MyExtractor();
@output =
SELECT *
FROM @input;
// Output the file
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);
This was my diagnostic line from the UDO:
Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));
This is the whole UDO:
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;
namespace Utilities
{
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
public class MyExtractor : IExtractor
{
//Contains the row
private readonly Encoding _encoding;
private readonly byte[] _row_delim;
private readonly char _col_delim;
public MyExtractor()
{
_encoding = Encoding.UTF8;
_row_delim = _encoding.GetBytes("\n\n");
_col_delim = '|';
}
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
{
string s = string.Empty;
string x = string.Empty;
int i = 0;
foreach (var current in input.Split(_row_delim))
{
using (System.IO.StreamReader streamReader = new StreamReader(current, this._encoding))
{
while ((s = streamReader.ReadLine()) != null)
{
//Strip any line feeds
//s = s.Replace("/n", "");
// Concatenate the lines
x += s;
i += 1;
}
Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));
//Create the output
output.Set<string>(0, x);
yield return output.AsReadOnly();
// Reset
x = string.Empty;
}
}
}
}
}
And these were my results found in the following directory:
/system/jobservice/jobs/Usql/2017/10/20.../diagnosticstreams
Good question. I have been asking myself the same thing. This is theoretical, but I think it would work (I'll update this if I find otherwise).
One very hacky way is to insert rows into a table with your log messages as a string column. Then you can select those out and filter on some log_producer_id column. You also get logging for the parts of the script that succeed even when later parts fail, assuming the failure doesn't roll everything back. The table can be dumped to a file at the end as well.
For the error cases, you can use the Job Manager in ADLA to open the job graph and then view the job output. The errors often carry detailed information for data-related problems (e.g. the row number in the file and an octal/hex/ASCII dump of the offending row, with the issue marked with ###).
Hope this helps,
J
ps. This isn't a comment or an answer really, since I don't have working code. Please provide feedback if the above ideas are wrong.

Dapper.Net and the DataReader

I have a very strange error with Dapper:
there is already an open DataReader associated with this Command which must be closed first
But I don't use a DataReader! I just run a select query in my server application and take the first result:
// How I run the query:
public static T SelectVersion(IDbTransaction transaction = null)
{
return DbHelper.DataBase.Connection.Query<T>("SELECT * FROM [VersionLog] WHERE [Version] = (SELECT MAX([Version]) FROM [VersionLog])", null, transaction, commandTimeout: DbHelper.CommandTimeout).FirstOrDefault();
}
//And how I call this method:
public Response Upload(CommitRequest message) // called on the server from the client
{
//Preparing data from CommitRequest
using (var tr = DbHelper.DataBase.Connection.BeginTransaction(IsolationLevel.Serializable))
{
int v = SelectQueries<VersionLog>.SelectVersion(tr) != null ? SelectQueries<VersionLog>.SelectVersion(tr).Version : 0; //Call my query here
int newVersion = v + 1; //update version
//Saving changes from CommitRequest to db
//The updated version is saved to the database too; maybe that is the problem?
return new Response
{
Message = String.Empty,
ServerBaseVersion = versionLog.Version,
};
}
}
}
And, saddest of all, this exception appears at random times. I think the problem is concurrent access to the server from two clients.
Please help.
This sometimes happens if the model and database schema don't match and an exception is raised inside Dapper while the reader is still open.
If you really want to get to the bottom of it, the best way is to include the Dapper source in your project and debug.
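Separately from the schema-mismatch theory, one detail in the question's code is worth flagging: Upload calls SelectVersion twice inside the same transaction, which opens two readers on one shared connection. A sketch of materializing the result once (names are taken from the question):
// Run the query a single time and reuse the result, so only one
// reader is ever opened on the shared connection/transaction.
var current = SelectQueries<VersionLog>.SelectVersion(tr);
int v = current != null ? current.Version : 0;
int newVersion = v + 1; // update version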

Using boolean fields with Magento ORM

I am working on a backend edit page for my custom entity. I have almost everything working, including saving a bunch of different text fields. I have a problem, though, when trying to set the value of a boolean field.
I have tried:
$landingPage->setEnabled(1);
$landingPage->setEnabled(TRUE);
$landingPage->setEnabled(0);
$landingPage->setEnabled(FALSE);
None seem to persist a change to my database.
How are you supposed to set a boolean field using magento ORM?
edit
Looking at my database, mysql is storing the field as a tinyint(1), so magento may be seeing this as an int not a bool. Still can't get it to set though.
This topic piqued my curiosity. Although it has been answered, I'd like to share what I've found, though I didn't do intensive tracing.
It doesn't matter whether the cache is enabled or disabled: the table schema will be cached, and the caching happens during the save process.
Mage_Core_Model_Abstract -> save()
Mage_Core_Model_Resource_Db_Abstract -> save(Mage_Core_Model_Abstract $object)
Mage_Core_Model_Resource_Db_Abstract
public function save(Mage_Core_Model_Abstract $object)
{
...
//any conditional will eventually call for:
$this->_prepareDataForSave($object);
...
}
protected function _prepareDataForSave(Mage_Core_Model_Abstract $object)
{
return $this->_prepareDataForTable($object, $this->getMainTable());
}
Mage_Core_Model_Resource_Abstract
protected function _prepareDataForTable(Varien_Object $object, $table)
{
$data = array();
$fields = $this->_getWriteAdapter()->describeTable($table);
foreach (array_keys($fields) as $field) {
if ($object->hasData($field)) {
$fieldValue = $object->getData($field);
if ($fieldValue instanceof Zend_Db_Expr) {
$data[$field] = $fieldValue;
} else {
if (null !== $fieldValue) {
$fieldValue = $this->_prepareTableValueForSave($fieldValue, $fields[$field]['DATA_TYPE']);
$data[$field] = $this->_getWriteAdapter()->prepareColumnValue($fields[$field], $fieldValue);
} else if (!empty($fields[$field]['NULLABLE'])) {
$data[$field] = null;
}
}
}
}
return $data;
}
See the line: $fields = $this->_getWriteAdapter()->describeTable($table);
Varien_Db_Adapter_Pdo_Mysql
public function describeTable($tableName, $schemaName = null)
{
$cacheKey = $this->_getTableName($tableName, $schemaName);
$ddl = $this->loadDdlCache($cacheKey, self::DDL_DESCRIBE);
if ($ddl === false) {
$ddl = parent::describeTable($tableName, $schemaName);
/**
* Remove bug in some MySQL versions, when int-column without default value is described as:
* having default empty string value
*/
$affected = array('tinyint', 'smallint', 'mediumint', 'int', 'bigint');
foreach ($ddl as $key => $columnData) {
if (($columnData['DEFAULT'] === '') && (array_search($columnData['DATA_TYPE'], $affected) !== FALSE)) {
$ddl[$key]['DEFAULT'] = null;
}
}
$this->saveDdlCache($cacheKey, self::DDL_DESCRIBE, $ddl);
}
return $ddl;
}
As we can see:
$ddl = $this->loadDdlCache($cacheKey, self::DDL_DESCRIBE);
will try to load the schema from the cache.
If no cached value exists (if ($ddl === false)),
it will create one: $this->saveDdlCache($cacheKey, self::DDL_DESCRIBE, $ddl);
So the problem in this question occurs whenever we have saved a model whose table is later altered (a column added, etc.).
Because $model->save() was called at some point, the schema was cached.
After the new column is added and a save is performed, Magento loads the schema from cache (which does not contain the new column), and the data for the new column silently fails to be saved to the database.
Delete var/cache/* - your DB schema is cached by Magento even though the new column is already added to the MySQL table.

Export SQL query data to Excel

I have a query that returns a very large data set. I cannot copy and paste it into Excel which I usually do. I have been doing some research on how to export directly to an Excel sheet. I am running SQL SERVER 2008 on a server running Microsoft Server 2003. I am trying to use the Microsoft.Jet.OLEDB.4.0 data provider and Excel 2007. I've pieced together a small piece of code that looks like this from what I've seen in examples.
INSERT INTO OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\Working\Book1.xlsx;Extended Properties=EXCEL 12.0;HDR=YES')
SELECT productid, price FROM dbo.product
However, this is not working; I am getting an error message saying
"Incorrect syntax near the keyword 'SELECT'".
Does anyone have any ideas about how to do this or possibly a better approach?
I don't know if this is what you're looking for, but you can export the results to Excel like this:
In the results pane, click the top-left cell to highlight all the records, and then right-click the top-left cell and click "Save Results As". One of the export options is CSV.
You might give this a shot too (note the inner SELECT addresses the Excel sheet, here assumed to be Sheet1, while the outer SELECT supplies the data):
INSERT INTO OPENROWSET
('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=c:\Test.xls;',
'SELECT productid, price FROM [Sheet1$]')
SELECT productid, price FROM dbo.product
Lastly, you can look into using SSIS (replaced DTS) for data exports. Here is a link to a tutorial:
http://www.accelebrate.com/sql_training/ssis_2008_tutorial.htm
== Update #1 ==
To save the result as CSV file with column headers, one can follow the steps shown below:
Go to Tools->Options
Query Results->SQL Server->Results to Grid
Check “Include column headers when copying or saving results”
Click OK.
Note that the new settings won’t affect any existing Query tabs — you’ll need to open new ones and/or restart SSMS.
If you're just needing to export to excel, you can use the export data wizard.
Right click the database, Tasks->Export data.
I had a similar problem, but with a twist: the solutions above worked when the result set came from one query, but in my situation I had multiple individual SELECT queries whose results I needed exported to Excel. Below is just an example to illustrate, although I could have used a name IN clause...
select a,b from Table_A where name = 'x'
select a,b from Table_A where name = 'y'
select a,b from Table_A where name = 'z'
The wizard lets me export the result of one query to Excel, but not the results of several queries at once.
When I researched, I found that you can disable results-to-grid and enable results-to-text. So press Ctrl + T, then execute all the statements. This shows the results as a text file in the output window. You can manipulate the text into a tab-delimited format to import into Excel.
You could also press Ctrl + Shift + F to export the results to a file; it exports as a .rpt file that can be opened with a text editor and manipulated for Excel import.
Hope this helps any others having a similar issue.
For anyone coming here looking for how to do this in C#, I have tried the following method and had success with .NET Core 2.0.3 and Entity Framework Core 2.0.3.
First create your model class.
public class User
{
public string Name { get; set; }
public int Address { get; set; }
public int ZIP { get; set; }
public string Gender { get; set; }
}
Then install the EPPlus NuGet package. (I used version 4.0.5; it will probably work with other versions as well.)
Install-Package EPPlus -Version 4.0.5
Then create the ExcelExportHelper class, which will contain the logic to convert the dataset into Excel rows. This class has no dependencies on your model class or dataset.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using OfficeOpenXml;
using OfficeOpenXml.Style;
public class ExcelExportHelper
{
public static string ExcelContentType
{
get
{ return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"; }
}
public static DataTable ListToDataTable<T>(List<T> data)
{
PropertyDescriptorCollection properties = TypeDescriptor.GetProperties(typeof(T));
DataTable dataTable = new DataTable();
for (int i = 0; i < properties.Count; i++)
{
PropertyDescriptor property = properties[i];
dataTable.Columns.Add(property.Name, Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType);
}
object[] values = new object[properties.Count];
foreach (T item in data)
{
for (int i = 0; i < values.Length; i++)
{
values[i] = properties[i].GetValue(item);
}
dataTable.Rows.Add(values);
}
return dataTable;
}
public static byte[] ExportExcel(DataTable dataTable, string heading = "", bool showSrNo = false, params string[] columnsToTake)
{
byte[] result = null;
using (ExcelPackage package = new ExcelPackage())
{
ExcelWorksheet workSheet = package.Workbook.Worksheets.Add(String.Format("{0} Data", heading));
int startRowFrom = String.IsNullOrEmpty(heading) ? 1 : 3;
if (showSrNo)
{
DataColumn dataColumn = dataTable.Columns.Add("#", typeof(int));
dataColumn.SetOrdinal(0);
int index = 1;
foreach (DataRow item in dataTable.Rows)
{
item[0] = index;
index++;
}
}
// add the content into the Excel file
workSheet.Cells["A" + startRowFrom].LoadFromDataTable(dataTable, true);
// autofit width of cells with small content
int columnIndex = 1;
foreach (DataColumn column in dataTable.Columns)
{
int maxLength;
ExcelRange columnCells = workSheet.Cells[workSheet.Dimension.Start.Row, columnIndex, workSheet.Dimension.End.Row, columnIndex];
try
{
maxLength = columnCells.Max(cell => cell.Value.ToString().Count());
}
catch (Exception) // some cells may hold nulls; fall back to a null-safe length
{
maxLength = columnCells.Max(cell => (cell.Value +"").ToString().Length);
}
//workSheet.Column(columnIndex).AutoFit();
if (maxLength < 150)
{
//workSheet.Column(columnIndex).AutoFit();
}
columnIndex++;
}
// format header - bold, yellow on black
using (ExcelRange r = workSheet.Cells[startRowFrom, 1, startRowFrom, dataTable.Columns.Count])
{
r.Style.Font.Color.SetColor(System.Drawing.Color.White);
r.Style.Font.Bold = true;
r.Style.Fill.PatternType = OfficeOpenXml.Style.ExcelFillStyle.Solid;
r.Style.Fill.BackgroundColor.SetColor(Color.Brown);
}
// format cells - add borders
using (ExcelRange r = workSheet.Cells[startRowFrom + 1, 1, startRowFrom + dataTable.Rows.Count, dataTable.Columns.Count])
{
r.Style.Border.Top.Style = ExcelBorderStyle.Thin;
r.Style.Border.Bottom.Style = ExcelBorderStyle.Thin;
r.Style.Border.Left.Style = ExcelBorderStyle.Thin;
r.Style.Border.Right.Style = ExcelBorderStyle.Thin;
r.Style.Border.Top.Color.SetColor(System.Drawing.Color.Black);
r.Style.Border.Bottom.Color.SetColor(System.Drawing.Color.Black);
r.Style.Border.Left.Color.SetColor(System.Drawing.Color.Black);
r.Style.Border.Right.Color.SetColor(System.Drawing.Color.Black);
}
// removed ignored columns
for (int i = dataTable.Columns.Count - 1; i >= 0; i--)
{
if (i == 0 && showSrNo)
{
continue;
}
if (!columnsToTake.Contains(dataTable.Columns[i].ColumnName))
{
workSheet.DeleteColumn(i + 1);
}
}
if (!String.IsNullOrEmpty(heading))
{
workSheet.Cells["A1"].Value = heading;
// workSheet.Cells["A1"].Style.Font.Size = 20;
workSheet.InsertColumn(1, 1);
workSheet.InsertRow(1, 1);
workSheet.Column(1).Width = 10;
}
result = package.GetAsByteArray();
}
return result;
}
public static byte[] ExportExcel<T>(List<T> data, string Heading = "", bool showSlno = false, params string[] ColumnsToTake)
{
return ExportExcel(ListToDataTable<T>(data), Heading, showSlno, ColumnsToTake);
}
}
Now add this method wherever you want to generate the Excel file, probably as an action in a controller. You can pass parameters for your stored procedure as well. Note that the return type of the method is FileContentResult. Whatever query you execute, the important thing is that you have the results in a List.
[HttpPost]
public async Task<FileContentResult> Create([Bind("Id,StartDate,EndDate")] GetReport getReport)
{
DateTime startDate = getReport.StartDate;
DateTime endDate = getReport.EndDate;
// call the stored procedure and store dataset in a List.
List<User> users = _context.Reports.FromSql("exec dbo.SP_GetEmpReport @start={0}, @end={1}", startDate, endDate).ToList();
// set custom column names
string[] columns = { "Name", "Address", "ZIP", "Gender"};
byte[] filecontent = ExcelExportHelper.ExportExcel(users, "Users", true, columns);
// set file name.
return File(filecontent, ExcelExportHelper.ExcelContentType, "Report.xlsx");
}
More details can be found here
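The helper is not tied to MVC; from any C# program you can write the bytes straight to disk. A minimal sketch (the sample data and file name are illustrative):
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main()
    {
        // Illustrative data; in practice this comes from your query or stored procedure
        var users = new List<User>
        {
            new User { Name = "Jane", Address = 42, ZIP = 90210, Gender = "F" }
        };
        byte[] content = ExcelExportHelper.ExportExcel(users, "Users", true,
            "Name", "Address", "ZIP", "Gender");
        File.WriteAllBytes("Report.xlsx", content);
    }
}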
I see that you’re trying to export SQL data to Excel to avoid copy-pasting your very large data set into Excel.
You might be interested in learning how to export SQL data to Excel and update the export automatically (with any SQL database: MySQL, Microsoft SQL Server, PostgreSQL).
To export data from SQL to Excel, you need to follow 2 steps:
Step 1: Connect Excel to your SQL database (Microsoft SQL Server, MySQL, PostgreSQL...)
Step 2: Import your SQL data into Excel
The result will be a list of the tables in your SQL database that you can query into Excel.

Step 1: Connect Excel to an external data source: your SQL database
Install an ODBC manager
Install a driver
Avoid a common error
Create a DSN
Step 2: Import your SQL data into Excel
Click where you want your pivot table
Click Insert
Click Pivot Table
Click "Use an external data source", then "Choose Connection"
Click on the System DSN tab
Select the DSN created in the ODBC manager
Fill in the requested username and password
Avoid a common error
Access the Microsoft Query dialog box
Click on the arrow to see the list of tables in your database
Select the table you want to query into Excel
Click on Return Data when you're done with your selection
To update the export automatically, there are 2 additional steps:
Create a pivot table with an external SQL data source
Automate your SQL data update in Excel with the GETPIVOTDATA function
I’ve created a step-by-step tutorial about this whole process, from connecting Excel to SQL, up to having the whole thing automatically updated. You might find the detailed explanations and screenshots useful.