Excel to SQL direct import error - sql-server-2005

While working for a considerable time on some sales data, I came across an error that really started to bug me and ate into my working hours. After much effort, I was fed up and nearly ready to give up on the un-importable records.
The scenario:
Bulk sales data arrives in txt/csv format and needs to be imported into a SQL database, then matched against Address History information held across a combination of tables by comparing strings directly, field to field.
If the codes match, a script runs to update a few tables with the data. If they don't match, a whole batch of data has to be inserted into different tables to create the ID required for the final sales import.
Most of the records matched, except for a few that gave trouble. I just needed to import those into the history tables. Then the problems started: even though I updated them, I couldn't match them.
After several frustrating hours, I asked my girlfriend to check whether there was any error in the string I was working with.
The string is "Bramhall Stockport", to be matched against "Bramhall   Stockport". To the SQL script, these two strings do not match.
I bet that if you copied and pasted this into your table it would match, because here it is plain text.
Then Ana figured out the error (she is not a computer geek; she has a Masters in Architecture) by simply copying and pasting into Microsoft Word 2007.
Screenshot: http://www.contentbcc.com/Anushka/sql_xls.png
Do you see the difference? The first is from the txt/csv file and the second from the SQL table.

In the first one, you have three regular spaces (ASCII 0x20). In the second one, you have a regular space followed by a non-breaking space (Unicode 0xA0). In Excel you can fix it with a search and replace, using ALT+0160 as the search term and a regular space character as the replacement.
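If the bad values have already landed in the SQL table, you can also normalize them there. A minimal T-SQL sketch (the SalesHistory table and Address column are hypothetical names for illustration):

-- Replace every non-breaking space (character 160) with a regular space
-- so that string comparisons against the txt/csv data succeed.
UPDATE SalesHistory
SET Address = REPLACE(Address, NCHAR(160), ' ')
WHERE Address LIKE '%' + NCHAR(160) + '%';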

Related

How to find Bad characters in the column

I am trying to pull the 'COURSE_TITLE' column value from the 'PS_TRAINING' table in PeopleSoft and write it into a UTF-8 text file to be loaded into the Workday system. The file is erroring out while loading because of bad characters (Ã, â and many more) present in the column. I have used a procedure that converts non-ASCII values into spaces. But because of this procedure, 'COURSE_TITLE' values written in non-English languages such as Chinese, Korean, and Spanish are also replaced with spaces.
I even tried using regular expressions (regexp_like(course_title, 'Ã')) to find just the bad characters, but since the table has hundreds of thousands of rows, it would be difficult to find all of them. Please suggest a way to solve this.
If you change your approach, this may work.
Define what you want, and retrieve it.
select *
from PS_TRAINING
where not regexp_like(course_title, '[0-9A-Za-z]')
If this pulls back too much data, just extend the character class in the regex.
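As a variation on the same idea (a sketch, assuming an Oracle database, which is what regexp_like suggests), you can narrow the review set to rows containing at least one non-ASCII character instead of scanning everything:

select course_title
from PS_TRAINING
where course_title <> asciistr(course_title)

Note this still flags legitimate Chinese, Korean, or accented Spanish titles, so it shortlists rows for review rather than deciding which characters are bad.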

Excel data type issues

I am using MS Query to pull data from SQL Server, and all is good.
The problem starts when the data comes down from the server: I am stuck with the General data type for everything, and there is no way to change the data type in Excel.
The main issue is numbers: in the database the data type is decimal, yet I can do no calculations on them in Excel. Any help would be appreciated.
I am using Excel to execute a stored procedure on the server,
which pulls the data into the following table.
Even though the price column in SQL Server is typed as decimal, it becomes the General data type after getting to Excel.
Changing it to Number/Currency etc. does not change anything.
No errors appear either. The data simply comes down, and no matter what format changes I apply in Excel, nothing helps; it is all treated as text.
You can try these steps:
Select Column
Click Data-> Text to Columns
Follow the wizard
Set the format
Use this official support ticket from Microsoft
The problem in this case was created by myself.
But I suppose it could easily happen to others who are just starting out with SQL and Excel.
Here is what happened, as I established after a few days of going in circles.
As there were loads of trailing spaces in the data coming down from the server, I decided to tidy things up.
Without considering the implications, I stuck an RTRIM() on everything.
This caused Excel to treat everything as strings, since RTRIM is a built-in string function and its output is a string.
What made things worse is that when using Power Query I was able to transform the data into the desired formats.
Unfortunately MS Query does not seem to be quite as clever as Power Query, hence the issues.
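The server-side fix is straightforward: trim only the character columns, or cast a trimmed value back to its numeric type. A minimal sketch, with hypothetical table and column names:

-- RTRIM implicitly converts numeric columns to strings;
-- either leave numbers alone or cast them back explicitly.
SELECT
    RTRIM(customer_name) AS customer_name,                  -- text: trimming is fine
    price,                                                  -- decimal: leave it alone
    CAST(RTRIM(unit_cost) AS decimal(18, 2)) AS unit_cost   -- or cast back to decimal
FROM dbo.Sales;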

Column size of Google Big Query

I am populating data from a server into Google BigQuery. One of the attributes in the table is a string with 150+ characters in it.
For example, "Had reseller test devices in a vehicle with known working device
Set to power cycle, never got green light Checked with cell provider and all SIMs were active all cases the modem appears to be dead,light in all but not green light".
The table in GBQ gets populated until it hits this specific attribute. When this attribute loads, it does not land in a single cell; it gets split across different cells and corrupts the table.
Is there any restriction on field size in GBQ? Any information regarding this would be appreciated.
My guess is that quote and comma characters in the CSV data are confusing the CSV parser. For example, if one of your fields is hello, world, this will look like two separate fields. The way around this is to quote the field, so you'd need "hello, world". This, of course, has problems if you have embedded quotes in the field. For instance, if you wanted a field that said She said, "Hello, world", you would need either to escape the quotes by doubling the internal quotes, as in "She said, ""Hello, world""", or to use a different field separator (for instance, |) and drop the quote character (using \0).
One final complication is if you have embedded newlines in your field. If you have Hello\nworld, you need to set the allow_quoted_newlines option on the load job configuration. The downside is that large files will be slower to import with this option, since they can't be processed in parallel.
These configuration options are all described here, and can be used via either the web UI or the bq command line shell.
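For example, loading with the bq tool might look like this (a sketch only; the dataset, table, and file names are hypothetical):

# Standard quoted CSV, with embedded newlines allowed inside quoted fields:
bq load --source_format=CSV --allow_quoted_newlines mydataset.sales ./sales.csv ./schema.json
# Alternative described above: pipe-delimited, with the quote character dropped:
bq load --source_format=CSV --field_delimiter='|' --quote='' mydataset.sales ./sales.csv ./schema.json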
I'm not sure there is a limit imposed, and certainly I have seen string fields with over 8,000 characters.
Can you please clarify 'it does not land in a single cell; it gets split across different cells and corrupts the table'? Does this happen every time? Could it be associated with certain punctuation?

Testing a CSV - how far should I go?

I'm generating a CSV which contains several rows and columns.
However, when testing said CSV I feel like I am simply repeating the code that builds the file, since the test checks that each and every field is correct.
The question is: is this more sensible than it seems to me, or is there a better way?
A far simpler test is to import the CSV into a spreadsheet or database and verify that the data lands in the proper fields: no extra columns or extra rows, and the data selected from the imported recordset is a perfect INTERSECT with the recordset from which the CSV was generated (see the sketch after the list below).
More importantly, I recommend making sure your test data includes common CSV fail scenarios such as:
Field contains a comma (or whatever your separator character)
Field contains multiple commas (You might think it's the same thing, but I've seen one fail where the other succeeded)
Field contains the new-row character(s)
Field contains characters not in the code page of the CSV file
...to make sure your code is handling them properly.
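A minimal sketch of that import-and-compare check in T-SQL, with hypothetical table, column, and file names (BULK INSERT into a staging table, then a set comparison against the source):

-- Load the generated CSV into an empty staging table.
BULK INSERT dbo.CsvStaging
FROM 'C:\exports\output.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- Rows in the source that did not survive the round trip;
-- run it with the two SELECTs swapped to catch extra or shifted rows too.
SELECT id, name, amount FROM dbo.SourceData
EXCEPT
SELECT id, name, amount FROM dbo.CsvStaging;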

Full Text Searching for single characters

I have a table with a TEXT column whose contents are just strings of CSV numbers, for example ",1,76,77,115,". Each string can contain an arbitrary quantity of numbers.
I am trying to set up Full Text Indexing so that I can search this column rapidly. This works great. Instead of running queries with
where MY_COL LIKE '%,77,%' and MY_COL LIKE '%,115,%'
I can do
where CONTAINS(MY_COL,'77 and 115')
However, when I try to search for a single character it doesn't work.
where CONTAINS(MY_COL,'1')
But I know that there should be records returned! I quickly found that I needed to edit the noise-word file and rebuild the index, but even after doing that it still doesn't work.
Working with relational databases that way is going to hurt.
Use a proper schema. Either store the values in different rows or use an array datatype for the column.
That will make solving the problem trivial.
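A minimal sketch of that normalized design (hypothetical names throughout), which replaces the CSV string with one number per row:

-- One (row, number) pair per row instead of ',1,76,77,115,'.
CREATE TABLE MY_COL_VALUES (
    parent_id INT NOT NULL,   -- key of the row that owned the CSV string
    num       INT NOT NULL,
    PRIMARY KEY (parent_id, num)
);

-- 'Contains 77 and 115' becomes an ordinary indexed query.
SELECT parent_id
FROM MY_COL_VALUES
WHERE num IN (77, 115)
GROUP BY parent_id
HAVING COUNT(*) = 2;

Searching for a single value, including 1, is then just WHERE num = 1, with no full-text index or noise-word file involved.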
I fixed my own problem, although I'm not exactly sure what fixed it.
I dropped my table and populated a new one (my program does batch processing) and created a new Full Text Index. Maybe I wasn't being patient enough to allow the indexing to fully rebuild.
Agreed. How does a full-text search for 1 avoid returning a record containing 12,15,33? Use an actual table schema to accomplish this.