Getting long 'dirty' strings from a SQL Server database into a 'clean' Excel file

I have a table in which comments about clients are kept. This is an open field that can be very long and include line breaks.
When I try to export this to Excel, the data is misaligned. I'd like to return as much of the comment as possible in an Excel cell, without anything like a line break.
Is there a way I could do this in Excel? (Find and replace)
Is there a way to structure my SQL query to only return what I can fit?
Or is there a better way?

I found the best way to deal with this is to enclose all suspect string columns in double quotes ("") and then, in Excel's Text to Columns option, make sure to select the double quote as the text qualifier.
This always worked for me.
Just be sure to remove any double quotes already inside the string column in question; otherwise Excel will split it again.
Another method I used was an obscure delimiter such as a vertical bar (|), which was unlikely to appear in my data. Again using the Text to Columns option, I specified the vertical bar as the column separator, which did just what I needed.
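A minimal Python sketch of both approaches from the answer above, assuming the rows have already been fetched from SQL Server as lists of values (the function name `to_delimited_line` and the sample row are hypothetical). It follows the answer literally: double quotes already inside a value are removed, every value is wrapped in double quotes, and the delimiter can be switched to a vertical bar.

```python
def to_delimited_line(values, delimiter=",", qualifier='"'):
    """Build one line of delimited text for Excel's Text to Columns.

    Every value is wrapped in the text qualifier so that embedded
    delimiters stay inside one cell. Qualifier characters already in
    the value are removed first (as the answer suggests), because they
    would otherwise end the field early.
    """
    cleaned = (str(v).replace(qualifier, "") for v in values)
    return delimiter.join(f"{qualifier}{v}{qualifier}" for v in cleaned)


# Hypothetical sample row: an ID and a free-text comment.
row = ["42", 'Client said "call back", then hung up']

# Comma-delimited, with the double quote as the text qualifier.
print(to_delimited_line(row))
# Pipe-delimited: commas in the data no longer matter.
print(to_delimited_line(row, delimiter="|"))
```

This sketch leaves line breaks inside a value untouched; depending on how the file is loaded into Excel, they may still start a new row.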

Related

Can't export SQL results as an Excel file when commas are in the description column text

I am seeing an issue here. I have a SQL database with over 10,000 records. There is a description column that contains user input from our support website. Some users put commas into their descriptions for grammatical reasons. When I export my SQL results as an Excel file, the commas in the description text mess up the layout of the file. I need each value to end up in a single cell, as it is in SQL, rather than being split every time a comma appears. Please help?
I believe if you wrap each output field in double quotes, Excel should know to treat it as one field.
I hope this helps.
Thank you. I also did a replace within the database, replacing all the commas with a space, and then replacing all the tabs and line breaks with a space as well. The line break characters were making Excel think a new row had started. I opened the exported file in Notepad++ to see all of the LFs and CRLFs, and then searched for and replaced those characters (CR is CHAR(13), LF is CHAR(10)) in SQL with a space. LFs, commas, and tabs are not important characters to preserve in this data. Thanks again. -Chris
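The clean-up Chris describes can be sketched in Python as follows; this is a hedged illustration, not his actual code, and the name `clean_for_excel` is hypothetical. Collapsing repeated spaces and truncating to Excel's 32,767-character cell limit are extra assumptions added here to address the first question's "only return what I can fit". In SQL Server itself the same idea would be nested `REPLACE` calls with `CHAR(13)`, `CHAR(10)` and `CHAR(9)`, plus `LEFT(..., 32767)`.

```python
import re

EXCEL_MAX_CELL_CHARS = 32767  # maximum number of characters in one Excel cell


def clean_for_excel(text, max_len=EXCEL_MAX_CELL_CHARS):
    """Flatten a free-text comment so it fits in a single Excel cell.

    CR, LF, tab and comma characters are each replaced with a space,
    runs of spaces are collapsed to one (an assumption, not part of the
    original answer), and the result is cut to at most max_len characters.
    """
    flat = re.sub(r"[\r\n\t,]", " ", text)
    flat = re.sub(r" {2,}", " ", flat).strip()
    return flat[:max_len]


print(clean_for_excel("Called client,\r\nleft message.\tFollow up Monday"))
```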

SQL Parse NVARCHAR Field

I am loading data from Excel files into a database on SQL Server 2008. One column is of the nvarchar data type, and it contains data like this:
Text text text text text text text text text text.
(ABC-2010-4091, ABC-2011-0586, ABC-2011-0587, ABC-2011-0604)
Text text text text text text text text text text.
(ABC-2011-0562, ABC-2011-0570, ABC-2011-0575, ABC-2011-0588)
So it is text with many sentences of this kind.
For each row I need to get the ABC-####-#### values; specifically, I only need the last part. So, e.g., for ABC-2010-4091 I need to obtain 4091. I will need to join this number to another table. I guess it would be enough to get the last parts of the ABC-####-#### values; then I should be able to handle the request.
So for the example given above, the result in the row should be 4091, 0586, 0587, 0604, 0562, 0570, 0575, 0588 instead of the whole nvarchar value.
Is this possible somehow? The text in the nvarchar field differs, but the format I want to work with (ABC-####-####) is always the same. Only the number of characters in the last part may vary, so it is not always 4 digits; it could be 5 or more.
What is the best approach to get this data? Should I parse it in SSIS or on the SQL Server side with a SQL query? And how?
I am aware this is a tough task. I appreciate any help or advice on how to deal with it. I have not tried anything yet, as I do not know where to start. I have read articles about SQL parsing, but I want to ask for the best approach to this task.
Stack Overflow is about programming.
Sit down and start programming.
OK, seriously. That is string parsing, and the last part in brackets, with multiple values in one field, means no bulk import: it is not a standard CSV file.
Either you use SSIS in SQL Server and program the parsing there, or you write a program for it.
String manipulation in SQL is the worst part of the language and I would avoid it.
So, yes, sit down and program a routine. Probably the fastest way.
If I understand correctly, "ABC-####-####" will be the value coming through in the column, and the numeric part is variable in length.
If that is the case, maybe this will work.
Use a "Derived Column" transformation.
Let's say the column holding "ABC-####-####" is called Column1:
SUBSTRING(Column1, FINDSTRING(Column1, "-", 2) + 1, LEN(Column1) - FINDSTRING(Column1, "-", 2))
If I am not mistaken, that should give you the trailing digits in a new column, no matter how long that value is.
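To check the position arithmetic of the Derived Column expression, here is a Python translation of the same logic. It is a sketch that assumes the column holds exactly one code such as "ABC-2010-4091" with two hyphens; `last_part` is a hypothetical name. SSIS positions are 1-based while Python's `find()` is 0-based, so the code converts explicitly.

```python
def last_part(code):
    """Mirror SUBSTRING(Column1, FINDSTRING(Column1, "-", 2) + 1, LEN(Column1) - FINDSTRING(...)).

    Assumes the value contains at least two hyphens.
    """
    first = code.find("-")             # 0-based index of the first hyphen
    second = code.find("-", first + 1)  # 0-based index of the second hyphen
    pos = second + 1                    # FINDSTRING result: 1-based position
    start = pos + 1                     # SUBSTRING start, 1-based
    length = len(code) - pos            # everything after the second hyphen
    return code[start - 1 : start - 1 + length]


print(last_part("ABC-2010-4091"))
print(last_part("ABC-2011-058612"))
```

The length term `LEN(Column1) - FINDSTRING(...)` is what makes the expression work for a last part of any length.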
HTH
I have worked this problem out with the following guides:
Split Multi Value Column into Multiple Records &
Remove Multiple Spaces with Only One Space
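For the full multi-code text in the question, a regular expression is a compact way to pull out every trailing number. This is a Python sketch, not the asker's solution; it assumes the middle part is always four digits and the last part is one or more digits (the names `CODE_PATTERN` and `extract_last_parts` are hypothetical). Results stay strings so leading zeros such as `0586` survive; convert with `int()` if the join key is numeric.

```python
import re

# ABC-<four digits>-<one or more digits>; only the last group is captured.
CODE_PATTERN = re.compile(r"\bABC-\d{4}-(\d+)\b")


def extract_last_parts(text):
    """Return the trailing number of every ABC-####-#### code, in order."""
    return CODE_PATTERN.findall(text)


sample = (
    "Text text text.\n(ABC-2010-4091, ABC-2011-0586, ABC-2011-0587, ABC-2011-0604)\n"
    "Text text text.\n(ABC-2011-0562, ABC-2011-0570, ABC-2011-0575, ABC-2011-0588)"
)
print(", ".join(extract_last_parts(sample)))
```

The `\b` anchors stop the pattern from matching inside a longer token such as `XABC-2010-4091`.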

Replace all occurrences of a substring in a database text field

I have a database with around 10k records, and some of them contain HTML character references that I would like to replace.
For example, I can find all occurrences with:
SELECT * FROM TABLE
WHERE TEXTFIELD LIKE '%&#47%'
An example of the original string:
this is the cool mega string that contains &#47
How can I replace all &#47 with /?
The end result should be:
this is the cool mega string that contains /
If you want to replace a specific string with another string or a transformation of that string, you could use the replace function in PostgreSQL. For instance, to replace all occurrences of "cat" with "dog" in the column "myfield", you would do:
UPDATE tablename
SET myfield = replace(myfield, 'cat', 'dog');
You could add a WHERE clause or any other logic as you see fit.
Alternatively, if you are trying to convert HTML entities, ASCII characters, or between various encoding schemes, PostgreSQL has functions for that as well; see PostgreSQL String Functions.
The answer given by @davesnitty will work, but you need to think very carefully about whether the text pattern you're replacing could appear embedded in a longer pattern you don't want to modify. Otherwise, replacing "cat" with "dog" will also turn "concatenate" into "condogenate", and that's just weird.
If possible, use a dedicated tool suited to what you're un-escaping. Got URL-encoded text? Use a URL decoder. Got XML entities? Process them through an XSLT stylesheet with text output mode, etc. These are usually safer for your data than hacking at it with find-and-replace, which often has unfortunate side effects if not applied very carefully, as noted above.
It's possible you may want to use a regular expression. They are not a universal solution to all problems but are really handy for some jobs.
If you want to unconditionally replace all instances of "&#47" with "/", you don't need a regexp.
If you want to replace "&#47" but not "&#471", you might need a regexp, because you can do things like match only whole words, match various patterns, specify min/max runs of digits, etc.
In the PostgreSQL string functions and operators documentation you'll find the regexp_replace function, which will let you apply a regexp during an UPDATE statement.
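The difference between an unconditional replace and a regexp that refuses to touch "&#471" can be shown in a few lines of Python (a sketch; the function names are hypothetical). The negative lookahead `(?!\d)` only matches "&#47" when no further digit follows, and an optional trailing `;` is consumed too. PostgreSQL's advanced regular expressions also support `(?!...)`, so a similar pattern could be passed to `regexp_replace` with the `'g'` flag, though that is untested here.

```python
import re


def replace_plain(text):
    """Unconditional replacement, like replace(): also rewrites '&#471'."""
    return text.replace("&#47", "/")


def replace_whole_entity(text):
    """Replace '&#47' (optionally followed by ';') only when no digit follows."""
    return re.sub(r"&#47(?!\d);?", "/", text)


s = "this is the cool mega string that contains &#47 but not &#471"
print(replace_plain(s))
print(replace_whole_entity(s))
```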
To be able to say much more I'd need to know what your real data is and what you're really trying to do.
If you don't have PostgreSQL, you can export the whole database to a SQL file, replace your string with a text editor, delete the database on your host, and re-import the new one.
PS: be careful

Testing a CSV - how far should I go?

I'm generating a CSV which contains several rows and columns.
However, when testing the CSV, I feel like I am simply repeating the file-building code in the test, since I'm checking that each and every field is correct.
The question is: is this more sensible than it seems to me, or is there a better way?
A far simpler test is to just import the CSV into a spreadsheet or database and verify the data output is aligned to the proper fields. No extra columns or extra rows, data selected from the imported recordset is a perfect INTERSECT with the recordset from which the CSV was generated, etc.
More importantly, I recommend making sure your test data includes common CSV fail scenarios such as:
Field contains a comma (or whatever your separator character)
Field contains multiple commas (You might think it's the same thing, but I've seen one fail where the other succeeded)
Field contains the new-row character(s)
Field contains characters not in the code page of the CSV file
...to make sure your code is handling them properly.
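A minimal sketch of that idea in Python: instead of re-checking every field by hand, write tricky rows, read them back with a real CSV parser, and compare the whole recordset. Python's `csv` module stands in for the spreadsheet or database import here, and the sample rows and function names are hypothetical; the round trip happens in memory, so the code-page case is not really exercised.

```python
import csv
import io

# Hypothetical rows covering the failure cases listed above.
TRICKY_ROWS = [
    ["id", "description"],
    ["1", "contains, a comma"],
    ["2", "many, commas, here,,"],
    ["3", "line one\nline two\r\nline three"],
    ["4", "non-ASCII: café, naïve, 日本"],
    ["5", 'embedded "quotes"'],
]


def to_csv(rows):
    """Generate CSV text (replace with the code under test)."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def round_trip(rows):
    """Parse the generated CSV back into a list of rows."""
    # newline="" lets the reader keep line breaks inside quoted fields.
    return list(csv.reader(io.StringIO(to_csv(rows), newline="")))


def test_round_trip():
    parsed = round_trip(TRICKY_ROWS)
    assert parsed == TRICKY_ROWS               # no lost, extra or shifted data
    assert all(len(r) == 2 for r in parsed)    # no extra columns
```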

Converting Gridview into CSV

Is it possible to use a delimiter other than a comma when converting to a CSV file? In my scenario, my GridView cells contain data with commas.
Well, the 'C' in CSV does stand for "Comma".
That said, depending on the purpose or destination of your "CSV" output, I can see two options:
If your program is the only recipient, use whatever you like. Heck, something like the built-in serializers might be easier.
Otherwise, follow the CSV format and double quote your values.
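Following the CSV format means quoting only the fields that need it and doubling any embedded double quotes, as RFC 4180 describes. A GridView export would normally be written in C#, so treat this Python version as a language-neutral sketch; the names `csv_field` and `csv_line` are hypothetical, and the `delimiter` parameter goes beyond RFC 4180, which assumes a comma. In Python itself, `csv.writer` does this quoting for you.

```python
def csv_field(value, delimiter=","):
    """Quote one field RFC 4180 style.

    The field is wrapped in double quotes if it contains the delimiter,
    a double quote, CR or LF; embedded double quotes are doubled.
    """
    text = str(value)
    if any(ch in text for ch in (delimiter, '"', "\r", "\n")):
        return '"' + text.replace('"', '""') + '"'
    return text


def csv_line(values, delimiter=","):
    """Join quoted fields into one CSV record (no line terminator)."""
    return delimiter.join(csv_field(v, delimiter) for v in values)


print(csv_line(["1", "Smith, John", 'say "hi"']))
```

Unlike removing quotes from the data, doubling them keeps the original text intact when the file is read back.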
There is a lot more useful information in this question and this one.