BCP error if last column is empty - bcp

My bcp load fails when the last column in the data file is empty. This row causes an error:
XX,YY,42,0,2,201501,652,,
This doesn't:
XX,YY,42,0,2,201501,652,,0
Unfortunately I can't substitute a zero for the null. Every column in the destination table is nullable, and the data type is float (the last three columns are all floats, in fact). Here's the format file:
8.0
9
1 SQLCHAR 0 10 "," 1 NOT SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 10 "," 2 VIOLATING SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 10 "," 3 COMPANY SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 2 "," 4 POLICY SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 2 "," 5 ON SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 6 "," 6 INFORMATION SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 25 "," 7 SECURITY ""
8 SQLCHAR 0 25 "," 8 QTY2 ""
9 SQLCHAR 0 25 "\n" 9 QTY1 ""
The error:
Row 1, Column 9: Invalid character value for cast specification
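For what it's worth, bcp turns an empty character field into an empty string, which can't be cast to float. A common workaround is to load into an all-varchar staging table and convert empty strings to NULL on the way into the real table. A sketch only — the table and column names below are hypothetical stand-ins, not from the question:

```sql
-- Hypothetical staging table: the three float columns loaded as varchar,
-- so an empty last field no longer triggers a cast error during bcp.
CREATE TABLE dbo.StagingTable (
    security varchar(25) NULL,
    qty2     varchar(25) NULL,
    qty1     varchar(25) NULL
    -- ... plus the six leading columns from the format file ...
);

-- After bcp into dbo.StagingTable, move rows into the real table,
-- turning '' into NULL before the float conversion.
-- (TRY_CAST requires SQL Server 2012 or later.)
INSERT INTO dbo.TargetTable (security, qty2, qty1)
SELECT TRY_CAST(NULLIF(security, '') AS float),
       TRY_CAST(NULLIF(qty2, '') AS float),
       TRY_CAST(NULLIF(qty1, '') AS float)
FROM dbo.StagingTable;
```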

SQL: Does it make sense to re-order the database instead of using ORDER BY to increase performance?

I have a database with around 120,000 entries, and I need to do prefix comparisons (where ... like 'test%') for an autocomplete function. The database won't change.
I have a column called "relevance", and I want my search results ordered by relevance DESC. I noticed that as soon as I add ORDER BY relevance DESC to my queries, the execution time increases by about 100%; since my queries already take around 100ms on average, this causes significant lag.
Does it make sense to re-order the whole database by relevance once so I can remove the ORDER BY? Can I be certain that when searching through the table with SQL it will always go through the database in the order that I added the rows?
This is what my query looks like right now:
select *
from hao2_dict
where definitions like 'ba%'
or searchable_pinyin like 'ba%'
ORDER BY relevance DESC
LIMIT 100
UPDATE: For context, here is my DB structure [schema image not reproduced here], and some time measurements:
Using an index on (relevance DESC), the search term 'b%' takes 50ms, which is faster than without the index. But the search term 'banana%' takes over 1700ms, which is much slower than without the index. These are the results from EXPLAIN:
b%:
0 Init 0 27 0 0
1 Noop 1 11 0 0
2 Integer 100 1 0 0
3 OpenRead 0 5 0 9 0
4 OpenRead 2 4223 0 k(2,-,) 0
5 Rewind 2 26 2 0 0
6 DeferredSeek 2 0 0 0
7 Column 0 6 4 0
8 Function 1 3 2 like(2) 0
9 If 2 13 0 0
10 Column 0 4 6 0
11 Function 1 5 2 like(2) 0
12 IfNot 2 25 1 0
13 IdxRowid 2 7 0 0
14 Column 0 1 8 0
15 Column 0 2 9 0
16 Column 0 3 10 0
17 Column 0 4 11 0
18 Column 0 5 12 0
19 Column 0 6 13 0
20 Column 0 7 14 0
21 Column 2 0 15 0
22 RealAffinity 15 0 0 0
23 ResultRow 7 9 0 0
24 DecrJumpZero 1 26 0 0
25 Next 2 6 0 1
26 Halt 0 0 0 0
27 Transaction 0 0 10 0 1
28 String8 0 3 0 b% 0
29 String8 0 5 0 b% 0
30 Goto 0 1 0 0
banana%:
0 Init 0 27 0 0
1 Noop 1 11 0 0
2 Integer 100 1 0 0
3 OpenRead 0 5 0 9 0
4 OpenRead 2 4223 0 k(2,-,) 0
5 Rewind 2 26 2 0 0
6 DeferredSeek 2 0 0 0
7 Column 0 6 4 0
8 Function 1 3 2 like(2) 0
9 If 2 13 0 0
10 Column 0 4 6 0
11 Function 1 5 2 like(2) 0
12 IfNot 2 25 1 0
13 IdxRowid 2 7 0 0
14 Column 0 1 8 0
15 Column 0 2 9 0
16 Column 0 3 10 0
17 Column 0 4 11 0
18 Column 0 5 12 0
19 Column 0 6 13 0
20 Column 0 7 14 0
21 Column 2 0 15 0
22 RealAffinity 15 0 0 0
23 ResultRow 7 9 0 0
24 DecrJumpZero 1 26 0 0
25 Next 2 6 0 1
26 Halt 0 0 0 0
27 Transaction 0 0 10 0 1
28 String8 0 3 0 banana% 0
29 String8 0 5 0 banana% 0
30 Goto 0 1 0 0
Can I be certain, that when searching through the table with SQL it will always go through the database in the order that I added the rows?
No. SQL results have no inherent order. They might come out in the order you inserted them, but there is no guarantee.
Instead, put an index on the column. Indexes keep their values in order.
However, this only deals with the sorting. The query above still has to search the whole table for rows with matching definitions and searchable_pinyin values. In general, SQL will use only one index per table at a time; trying to use two is usually inefficient. So you need one multi-column index to let this query avoid a full scan and return results in sorted order. Make sure relevance comes first: the index columns must be in the same order as your ORDER BY.
An index on (relevance, definitions, searchable_pinyin) will let that query use only the index for searching and sorting. Adding (relevance, searchable_pinyin) as well will handle searching by definitions, searchable_pinyin, or both.
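A minimal runnable sketch of the suggested index, using Python's sqlite3 module — the table and column names are taken from the question, but the sample rows are invented for the demo:

```python
import sqlite3

# In-memory stand-in for hao2_dict; the three rows are invented demo data.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE hao2_dict (
    definitions TEXT, searchable_pinyin TEXT, relevance REAL)""")
con.executemany("INSERT INTO hao2_dict VALUES (?,?,?)", [
    ("banana", "ba1na4",    0.9),
    ("bamboo", "zhu2",      0.5),
    ("apple",  "ping2guo3", 0.7),
])

# The multi-column index suggested above: relevance first, so the
# ORDER BY relevance DESC can be satisfied by index order.
con.execute("""CREATE INDEX idx_rel_def_pin
    ON hao2_dict (relevance DESC, definitions, searchable_pinyin)""")

rows = con.execute("""
    SELECT definitions, relevance FROM hao2_dict
    WHERE definitions LIKE 'ba%' OR searchable_pinyin LIKE 'ba%'
    ORDER BY relevance DESC LIMIT 100""").fetchall()
print(rows)  # matching rows, highest relevance first
```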

Invalid column number in format file (SQL)

I looked at this one (Bulk inserting a csv in SQL using a formatfile to remove double quotes), but my situation is just different enough.
First, how would I upload this very lengthy format file? I looked on GitHub, but it just wasn't clear enough.
This is the command I run and the error I get:
bulk insert equi2022a
From 'C:\Users\someone\Desktop\equi.txt'
WITH (FORMATFILE = 'C:\Users\someone\Desktop\formatfileequi-2.txt'
);
Msg 4823, Level 16, State 1, Line 1
Cannot bulk load. Invalid column number in the format file
"C:\Users\someone\Desktop\formatfileequi-2.txt".
I created this manually, and here is a small snippet of it. I painstakingly went through every row to make sure the numbering was in perfect order (1, 2, 3, 4, ... and so on up to 122) in both of the columns designated for this.
11.0
122
1 SQLCHAR 0 01 "" 1 transcode ""
2 SQLCHAR 0 02 "" 2 stfips ""
3 SQLCHAR 0 04 "" 3 year ""
4 SQLCHAR 0 01 "" 4 qtr ""
5 SQLCHAR 0 10 "" 5 uiacct ""
6 SQLCHAR 0 05 "" 6 run ""
7 SQLCHAR 0 09 "" 7 ein ""
8 SQLCHAR 0 10 "" 8 presesaid ""
9 SQLCHAR 0 05 "" 9 predrun ""
10 SQLCHAR 0 10 "" 10 succuiacct ""
11 SQLCHAR 0 05 "" 11 succrun ""
12 SQLCHAR 0 35 "" 12 legalname ""
and then the ending
115 SQLCHAR 0 10 "" 115 wrlargestcontribsucc ""
116 SQLCHAR 0 06 "" 116 wrcountlargestcontrib ""
117 SQLCHAR 0 06 "" 117 wrhires ""
118 SQLCHAR 0 06 "" 118 wrseparations ""
119 SQLCHAR 0 06 "" 119 wrnewentrants ""
120 SQLCHAR 0 60 "" 120 wrexits ""
121 SQLCHAR 0 06 "" 121 wrcontrecords ""
122 SQLCHAR 0 78 "" 122 Blank7 ""
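Msg 4823 is often raised when a server column number in the format file exceeds the number of columns in the target table, so one quick check is to compare the table's column count against the 122 the format file maps. A diagnostic sketch (equi2022a is the table from the question; the dbo schema is an assumption):

```sql
-- The format file maps server columns 1..122; this should return 122
-- (or more) for the bulk load to have a chance of succeeding.
SELECT COUNT(*) AS column_count
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.equi2022a');
```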

How to delete repeated rows if column 2 and column 3 matches using awk?

I have a file with 4 columns:
ifile.txt
3 5 2 2
1 4 2 1
4 5 7 2
5 5 7 1
0 0 1 1
I would like to delete the repeated rows whose columns 2 & 3 are the same. For instance, rows 3 & 4 have the same values in columns 2 & 3, so I want to keep the 3rd row and delete the 4th row. My output is:
ofile.txt
3 5 2 2
1 4 2 1
4 5 7 2
0 0 1 1
awk '!seen[$2,$3]++' ifile.txt
3 5 2 2
1 4 2 1
4 5 7 2
0 0 1 1
seen[$2,$3]++ is zero (false) only the first time a ($2,$3) pair appears, so the first occurrence is printed and later repeats are skipped.
GNU awk
awk '{a[NR]=$2 SUBSEP $3} a[NR]!=a[NR-1]' file
Save $2 and $3 (joined with SUBSEP so that, e.g., fields "1","23" and "12","3" don't collide) into array a indexed by NR; if the key for the current line differs from the previous line's, print the line, otherwise skip it. Note this only removes consecutive duplicates, which is enough for the sample input.
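A runnable shell sketch of the keep-first-occurrence approach, using the file contents from the question:

```shell
# Recreate ifile.txt from the question.
cat > ifile.txt <<'EOF'
3 5 2 2
1 4 2 1
4 5 7 2
5 5 7 1
0 0 1 1
EOF

# !seen[$2,$3]++ is true only the first time a ($2,$3) pair is seen,
# so the first occurrence is kept and later repeats are dropped.
awk '!seen[$2,$3]++' ifile.txt > ofile.txt
cat ofile.txt
```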

Pandas Dynamic Index Referencing during Calculation

I have the following data frame
val sum
0 1 0
1 2 0
2 3 0
3 4 0
4 5 0
5 6 0
6 7 0
I would like to calculate the sum of the next three rows' values (including the current row). I need to do this for very big files. What is the most efficient way? The expected result is
val sum
0 1 6
1 2 9
2 3 12
3 4 15
4 5 18
5 6 13
6 7 7
In general, how can I dynamically reference other rows (via boolean operations) while making assignments?
> df['val'].rolling(3).sum().shift(-2)
0 6.0
1 9.0
2 12.0
3 15.0
4 18.0
5 NaN
6 NaN
(The original answer used pd.rolling_sum(df['val'], window=3).shift(-2); the top-level rolling_sum function was removed from pandas, and Series.rolling is the current API.)
If you want the last values to be "filled in" then you'll need to tack on NaN's to the end of your dataframe.
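Alternatively, on current pandas, rolling over a reversed Series with min_periods=1 fills in the short windows at the end and reproduces the expected output exactly. A sketch using the frame from the question:

```python
import pandas as pd

df = pd.DataFrame({"val": [1, 2, 3, 4, 5, 6, 7]})

# Reverse, take a 3-row rolling sum (min_periods=1 allows the short
# windows at the edge), then reverse back: each row now holds its own
# value plus the next two rows' values.
df["sum"] = df["val"][::-1].rolling(3, min_periods=1).sum()[::-1]
print(df["sum"].tolist())  # [6.0, 9.0, 12.0, 15.0, 18.0, 13.0, 7.0]
```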

BCP file format for SQL bulk insert of CSV file

I'm trying to bulk insert a CSV file into a SQL table using BCP, but I can't fix this error: "The column is too long in the data file for row 1, column 2. Verify that the field terminator and row terminator are specified correctly." Can anyone help, please?
Here's my SQL code:
BULK INSERT UKPostCodeStaging
FROM 'C:\Users\user\Desktop\Data\TestFileOf2Records.csv'
WITH (
DATAFILETYPE='char',
FIRSTROW = 1,
FORMATFILE = 'C:\Users\User\UKPostCodeStaging.fmt');
Here's my test data contained in TestFileOf2Records.csv:
"HS1 2AA",10,14,93,"S923","","S814","","S1213","S132605"
"HS1 2AD",10,14,93,"S923","","S814","","S1213","S132605"
And here's my BCP file that I have attempted to edit appropriately:
10.0
11
1 SQLCHAR 0 0 "\"" 0 FIRST_QUOTE SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 8 0 "\"," 1 PostCode SQL_Latin1_General_CP1_CI_AS
3 SQLINT 1 0 "," 2 PositionalQualityIndicator ""
4 SQLINT 1 0 "," 3 MetresEastOfOrigin ""
5 SQLINT 1 0 ",\"" 4 MetresNorthOfOrigin ""
6 SQLCHAR 8 0 "\",\"" 5 CountryCode SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 8 0 "\",\"" 6 NHSRegionalHACode SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 8 0 "\",\"" 7 NHSHACode SQL_Latin1_General_CP1_CI_AS
9 SQLCHAR 8 0 "\",\"" 8 AdminCountyCode SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 8 0 "\",\"" 9 AdminDistrictCode SQL_Latin1_General_CP1_CI_AS
11 SQLCHAR 8 0 "\"\r\n" 10 AdminWardCode SQL_Latin1_General_CP1_CI_AS
Any ideas where I am going wrong?
Thanks
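If the server is SQL Server 2017 or later, one way to sidestep quote handling in the format file entirely is the built-in CSV parser. A sketch, assuming the staging table's columns line up one-to-one with the CSV fields (table and paths are from the question):

```sql
-- Requires SQL Server 2017+: FORMAT = 'CSV' gives RFC 4180 parsing,
-- and FIELDQUOTE strips the double quotes, so no format file is needed.
BULK INSERT UKPostCodeStaging
FROM 'C:\Users\user\Desktop\Data\TestFileOf2Records.csv'
WITH (
    FORMAT = 'CSV',
    FIELDQUOTE = '"',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\r\n',
    FIRSTROW = 1
);
```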