What does this error mean: Required column value for column index: 8 is missing in row starting at position: 0 - google-bigquery

I'm attempting to upload a CSV file (the output of a BCP command) to BigQuery using the bq load command from the Cloud SDK. I'm supplying a custom schema file, because autodetect was giving me major issues.
One resource suggested this could be a data type mismatch. However, the SQL database lists the column as a decimal, so in my schema file I declared it as FLOAT, since decimal is not a supported data type.
I couldn't find any documentation explaining what the error means or how to resolve it.

What does this error mean? In this context, it means a value is REQUIRED for the given column index and none was found. (Note that columns are usually zero-indexed, so a failure at column index 8 most likely refers to the ninth column.)
This can be caused by any number of issues, of which I ran into two.
Incorrectly categorizing NULL columns as NOT NULL. After exporting the schema from SSMS as JSON, I needed to clean it up for BigQuery, and in doing so I mapped IS_NULLABLE: NO to MODE: NULLABLE and IS_NULLABLE: YES to MODE: REQUIRED. Those mappings should have been reversed. This caused the error, because there were NULL values in columns where BigQuery expected a REQUIRED value.
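As a rough illustration of the corrected mapping (a sketch, not the script actually used; the SQLSERVER_TO_BQ table and the column dictionaries are made-up placeholders), in Python:

import json

# Hypothetical mapping from SQL Server types to BigQuery types; adjust to the real source schema.
SQLSERVER_TO_BQ = {
    "int": "INTEGER",
    "varchar": "STRING",
    "decimal": "FLOAT",
    "bit": "BOOLEAN",
}

def to_bq_field(column):
    """Convert one exported column description into a BigQuery schema entry.

    IS_NULLABLE == "YES" must become MODE: NULLABLE,
    IS_NULLABLE == "NO" must become MODE: REQUIRED (not the other way around).
    """
    return {
        "name": column["COLUMN_NAME"],
        "type": SQLSERVER_TO_BQ[column["DATA_TYPE"]],
        "mode": "NULLABLE" if column["IS_NULLABLE"] == "YES" else "REQUIRED",
    }

# Example export rows (placeholder data).
exported = [
    {"COLUMN_NAME": "id", "DATA_TYPE": "int", "IS_NULLABLE": "NO"},
    {"COLUMN_NAME": "amount", "DATA_TYPE": "decimal", "IS_NULLABLE": "YES"},
]

with open("schema.json", "w") as f:
    json.dump([to_bq_field(c) for c in exported], f, indent=2)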
Using the wrong delimiter. The file I was outputting contained tab characters inside the data in addition to the commas separating the fields. I was only able to confirm this by importing the data with the Get Data tool in Excel, which exposed the stray tabs inside the cells.
After re-exporting with a pipe (|) delimiter, I was finally able to load the file into BigQuery without any errors.
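For reference, the same pipe-delimited load can also be driven from Python with the google-cloud-bigquery client instead of bq load; a minimal sketch, with the table ID and file paths as placeholders:

import json
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder identifiers; replace with your own.
table_id = "my-project.my_dataset.my_table"
schema_path = "schema.json"
data_path = "export.psv"

with open(schema_path) as f:
    schema = [
        bigquery.SchemaField(col["name"], col["type"], mode=col["mode"])
        for col in json.load(f)
    ]

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,  # the CSV loader accepts any single-character delimiter
    field_delimiter="|",
    skip_leading_rows=1,
    schema=schema,
)

with open(data_path, "rb") as f:
    load_job = client.load_table_from_file(f, table_id, job_config=job_config)

load_job.result()  # raises if the load fails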

Related

Getting "unexpected error" message when trying to create table on BigQuery

I get an "unexpected error" message every time I try to create a table on BigQuery. I've tried inputting and omitting the schema and made sure that the file was compatible with csv format.
I was having the same issue.
I did a couple of things to clean up the CSV file and was then able to upload it successfully (see the sketch after this list):
Remove commas from any numerical data.
If using auto-detect for the schema, remove any spaces between headings.
Or remove the header data completely and define the schema yourself.
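A rough sketch of that kind of cleanup in Python; the file names are placeholders, and it assumes the only commas inside numeric fields are thousands separators:

import csv

def clean(value):
    """Strip thousands separators from values that are otherwise numeric."""
    stripped = value.replace(",", "")
    try:
        float(stripped)
        return stripped
    except ValueError:
        return value

with open("raw.csv", newline="") as src, open("clean.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)

    header = next(reader)
    # Either normalize headings (no spaces) for auto-detect...
    writer.writerow(h.replace(" ", "_") for h in header)
    # ...or drop the header entirely and supply an explicit schema at load time.

    for row in reader:
        writer.writerow(clean(v) for v in row)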
I had the same problem; writing the data types in capitals made the "unexpected error" go away.
For example, the type should be STRING, not string, and INTEGER, not integer, and so on.

How to set a default value for empty column data in a Copy activity from a CSV file using Azure Data Factory v2

I have multiple CSV files and multiple tables.
The table name is the file name, and the column names come from the first row of the CSV file.
Now I want a default value to be applied to the sink table when a field is empty.
Consider my scenario:
employee:
id int, name varchar, is_active bit NULL
employee.csv:
id|name|is_active
1|raja|
Now when I try to copy the CSV data to the PostgreSQL table, it throws an error.
The expected result is that a default value is used when the field is empty.
You can use NULLIF in PostgreSQL:
NULLIF(argument_1, argument_2);
The NULLIF function returns a null value if argument_1 equals argument_2; otherwise it returns argument_1.
So NULLIF(is_active, '') maps the empty value to NULL, and conversely COALESCE can replace a NULL with some other default value.
If your error is related to a type mismatch, consider casting the column first.
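If the copy were scripted outside ADF, the same idea looks like this in Python with psycopg2; a sketch only, with placeholder connection settings, doing the NULLIF-style substitution on the Python side:

import csv
import psycopg2

# Placeholder connection settings and file name.
conn = psycopg2.connect("dbname=mydb user=myuser password=secret host=localhost")
cur = conn.cursor()

with open("employee.csv", newline="") as f:
    reader = csv.DictReader(f, delimiter="|")
    for row in reader:
        # Python-side equivalent of NULLIF(col, ''): empty strings become None,
        # which psycopg2 sends to PostgreSQL as NULL.
        cur.execute(
            "INSERT INTO employee (id, name, is_active) VALUES (%s, %s, %s)",
            (int(row["id"]), row["name"] or None, row["is_active"] or None),
        )

conn.commit()
cur.close()
conn.close()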
Thanks!
I tried to reproduce the scenario, and the copy completed successfully with the following setup:
Source dataset: employee.csv from Azure Blob Storage.
Sink dataset: here I used Azure SQL DB as the sink because of some limitations on my side, but with PostgreSQL it is almost the same.
Copy activity settings:
Under the mapping settings there is a type conversion option, where you have to import the schema, or else you can add the mappings dynamically.
An alternative is to use a Data Flow: if you have multiple data fields, you can use the derived column transformation to generate new columns in your data flow or to modify existing fields.
For more details, refer to Derived column transformation in mapping data flow.
You can also refer to this Microsoft Q&A post for more insights: Copy Task failure because of conversion failure.

How to resolve SSIS Package Truncation error

SSIS Error: 0xC02020A1
Trying to import data into SQL Server 2008 from a CSV file, I am getting the error below.
> Error: 0xC02020A1 at Data Flow Task, Source – Distribution by xyz
> table from CSV [1]: Data conversion failed.
> The data conversion for column "ID" returned status value 4 and
> status text "Text was truncated or one or more characters had no match
> in the target code page.".
Previously I have used varchar and never had a problem. I have tried converting the data to int and even increased the size, but I still get this error. I have also tried the Advanced Editor and changed the data type to almost anything I could think of that would cover that column, and I still get the error. Thanks for the advice.
Most likely you have "bad" records in your raw file.
For "bad", if could be one of these two: 1) implicitly conversion cannot be done to the string value; 2) string is too large (exceed 8000).
For debugging this, change the destination column to VARCHAR(MAX).
Then load the raw file (do not forget to increase the external column length to 8000 in the Advanced page in flag file connection manager).
if:
1) it is loaded successfully, query the table where ISNUMERIC([that numeric column]) =0, if anything returned, that is the bad record that cannot be converted when loading.
2) it is not loaded correctly, try to see if you have any value from that field has more than 8000 characters. (could use C# script if it is impossible to manually check)
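The suggestion above mentions a C# script; purely as an illustration of the same check, here is a standalone Python sketch (the file name and column index are placeholders):

import csv

DATA_FILE = "raw.csv"   # placeholder path to the raw file
COLUMN = 0              # zero-based index of the "ID" column
MAX_LEN = 8000

with open(DATA_FILE, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    for line_no, row in enumerate(reader, start=2):
        if COLUMN >= len(row):
            print(f"Line {line_no}: too few columns")
            continue
        value = row[COLUMN]
        if len(value) > MAX_LEN:
            print(f"Line {line_no}: value longer than {MAX_LEN} characters")
        try:
            int(value)
        except ValueError:
            print(f"Line {line_no}: not convertible to int: {value!r}")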

SSIS pipe-delimited file not failing when a row has more pipes than the number of columns?

My source file is a pipe (|) delimited text file (.txt). I am trying to load the file into SQL Server 2012 using SSIS (SQL Server Data Tools 2012). I have three columns. Below is an example of how the data in the file looks.
I expected my package to fail because the file is pipe (|) delimited, but instead the package succeeds, and in the last row all of the extra pipes end up in the third (last) column.
My question is: why isn't the package failing? I believe the data is corrupt because, going by the delimiter, it has more columns than defined.
If I want the package to fail when the number of delimiters is greater than the number of columns, what are my options?
You can tell what is happening if you look at the advanced page of the flat file connection manager. For all but the last field the delimiter is '|', for the last field it is CRLF.
So by design all data after the last defined pipe and the end of the line (CRLF) is imported into your last field.
What I would do is add another column to the connection manager and your staging table. Map the new 'TestColumn' in the destination. When the import is complete you want to ensure that this column is null in every row. If not then throw an error.
You could use a script task instead, but this way you will not need to write C# and you will not have to process the file twice. If you are comfortable writing a script task and/or cannot use a staging table with an extra column, then that would be the only other route I can think of.
A suggestion for checking for NULLs would be to use an Execute SQL Task with a single-row result set mapped to an integer variable. If the value is > 0, fail the package.
The query would be Select Count(*) NotNullCount From Table Where TestColumn Is Not Null.
You can write a script task that reads the file, counts the pipes, and raises an error if the number of pipes is not what you want.
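An SSIS script task would be written in C# or VB, but the check itself is tiny; as an illustration only, a Python sketch with a placeholder file name and column count:

EXPECTED_COLUMNS = 3           # columns defined in the connection manager
DATA_FILE = "source.txt"       # placeholder path to the pipe-delimited file

bad_rows = []
with open(DATA_FILE) as f:
    for line_no, line in enumerate(f, start=1):
        # A row with N columns should contain exactly N - 1 pipes.
        if line.rstrip("\r\n").count("|") != EXPECTED_COLUMNS - 1:
            bad_rows.append(line_no)

if bad_rows:
    raise ValueError(f"Rows with an unexpected number of pipes: {bad_rows}")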

Want to use SSIS to get csv file with Null values into SQL Server DB table while preserving Nulls and maintaining floating point casting

Here is an example csv file:
Col1,Col2,Col3,Col4
1.0E+4,2.0E+3,3.1E-2,4.1E+4
NULL,1.0E-2,2.0E+1,3.2E-2
Using SSIS in Visual Studio, I want to get this file from CSV format into a SQL Server DB table. I have a Data Flow Task which contains a Flat File Source and an ADO NET Destination. The SQL table has already been created with all columns typed as float, and in the Flat File Source I cast all columns as (DT_R4). An error is raised when I execute the package: [Flat File Source [21]], data conversion failure for Col1. It occurs because I have a "Null" in the file. If instead of a Null I have an empty space, the SQL table ends up containing a "0" rather than a NULL. Is there anything I can put in place of "Null" in the CSV file that SQL Server will interpret as NULL and that won't cause errors in SSIS? Please keep in mind that I actually have 100+ data files, each 500 MB in size and each with 600+ columns.
Use a Derived Column component: create a DerivedCol1 as [Col1] == "Null" ? NULL(DT_R4) : [Col1] and map it to the destination column. Hope this helps.
Did you try ISNULL(col) ? " " : col in a derived column?
If you look at the technical error when you click OK, you can see that it needs a cast:
"null" == LOWER(myCol) ? (DT_STR, 50, 1252) NULL(DT_STR, 50, 1252) : myCol
It's weird, because NULL(DT_STR, 50, 1252) should already return a null of that type.
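As an alternative to handling it inside SSIS, the files can be pre-processed once so the literal "Null" text never reaches the Flat File Source; a hedged Python sketch (the directory names are placeholders), processed line by line so the 500 MB files are never held in memory:

import csv
import glob
import os

SRC_DIR = "raw"        # placeholder: folder with the original CSV files
DST_DIR = "prepared"   # placeholder: folder for the cleaned copies
os.makedirs(DST_DIR, exist_ok=True)

for path in glob.glob(os.path.join(SRC_DIR, "*.csv")):
    out_path = os.path.join(DST_DIR, os.path.basename(path))
    with open(path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            # Literal NULL/Null/null tokens become truly empty fields, which
            # SSIS can treat as NULL when the source's retain-nulls option is enabled.
            writer.writerow("" if v.strip().lower() == "null" else v for v in row)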