Google BigQuery: Import CSV Using Console - Use First Row as Header

I have a CSV file with one column which I want to import into my BigQuery environment. When I use the Console to import the data, it always takes my first row as a data row rather than as a column name. Is there a way in the Console to ensure the first row is always treated as the column name?
E.g.
Tk Number
Tk - 0001
Tk - 0002

In CSV format, if the first row is a string and the others are integers, BigQuery automatically takes the first row as the header names, provided you have checked the auto-detect schema option while creating the table.
But since you have strings in the header as well as in the body, you will need to supply the schema manually when creating the table in BigQuery. Then, under Advanced options, you can specify the number of rows to skip in the 'Header rows to skip' option.
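That auto-detect behaviour can be sketched in a few lines of plain Python (purely illustrative, not BigQuery's actual implementation): the first row is only recognisable as a header when its values fail to parse as the type seen in the rest of the column.

```python
import csv
import io

def looks_like_header(rows):
    """Heuristic: the first row is a header only if at least one of its
    values does NOT parse as the type seen in the rest of that column."""
    first, rest = rows[0], rows[1:]

    def is_int(value):
        try:
            int(value)
            return True
        except ValueError:
            return False

    for col in range(len(first)):
        body_is_int = all(is_int(r[col]) for r in rest)
        # Column body is all integers but the first row isn't -> header.
        if body_is_int and not is_int(first[col]):
            return True
    return False

# Integer body: "id" is clearly a header.
mixed = list(csv.reader(io.StringIO("id\n1\n2\n")))
print(looks_like_header(mixed))        # True

# All-string data ("Tk - 0001" etc.): the header is indistinguishable.
strings = list(csv.reader(io.StringIO("Tk Number\nTk - 0001\nTk - 0002\n")))
print(looks_like_header(strings))      # False
```

This is why the manual schema is needed here: with "Tk - 0001"-style values, every column is a string, so no heuristic can separate the header from the data.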

Related

BigQuery doesn't take the first row as header when importing from sheet

I'm trying to create a table from a Google Sheet. I set "Header rows to skip" to 1, however when I import the data I find:
the column names are: string_field_0, string_field_1, ...
the header row values exist as data in the table
I checked that in the sheet the first row (row 1) is the header
This error could be caused by three possible problems:
1. You need to set "Header rows to skip", which is the number of rows above your first data row, including the header row and any blank rows. You can do this in the BigQuery UI.
2. All of your header names may be string type; for BigQuery to distinguish the header, at least one column needs something other than a string. In this case I will use an integer, for example:
column1,column2,column3
foo,bar,1
cat,dog,2
fizz,buzz,3
You need to have something other than just strings.
3. You need to explicitly specify the schema yourself.
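For option 3, the explicit schema can be written as a JSON file and passed to the bq command-line tool with --schema, or pasted into the console's "Edit as text" schema box. A minimal generator (the column names here are the illustrative ones from the example above):

```python
import json

def make_schema(columns):
    """Build a BigQuery JSON schema from (name, type) pairs, in the format
    accepted by `bq load --schema=schema.json ...`."""
    return [{"name": name, "type": typ, "mode": "NULLABLE"}
            for name, typ in columns]

schema = make_schema([("column1", "STRING"),
                      ("column2", "STRING"),
                      ("column3", "INTEGER")])
print(json.dumps(schema, indent=2))
```

With the schema given explicitly, auto-detection is bypassed entirely and "Header rows to skip" = 1 cleanly discards the header line.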

Issues Exporting from T-SQL to Excel including column names & sorting on a column that's not exported

I have a SQL table with ERP data containing over 1 million records for seven different companies. This data needs to be exported to Excel files breaking the records by company, and for each company by a date range. Based on the companies and date ranges it will produce > 100 separate files.
The results are being imported into a second ERP system which requires the import file to be Excel format and must include the column names as the first row. The records need to be sorted by a [PostDate] column which must NOT be exported to the final record set.
The client's SQL server does not have the MS Jet Engine components installed in order for me to export directly to Excel. Client will not allow install of the MS Jet Engine on this server.
I already had code from a similar project that exported to CSV files so have been attempting that. My T-SQL script loops through the table, parses out the records by company and by date range for each iteration, and uses BCP to export to CSV files. The BCP command includes the column names as the first row by using an initial SELECT 'Col_1_Name', 'Col_2_Name', etc. with a UNION SELECT [Col_1], [Col_2],...
This UNION works AS LONG AS I INCLUDE THE [PostDate] column needed for the ORDER BY in the SELECT. This exports the records I need in proper order, with column names, but also includes the [PostDate] column which the import routine will not accept. If I remove [PostDate] from the SELECT, then the UNION with ORDER BY [PostDate] fails.
We don't want to have the consultant spend the time to delete the unwanted column from each file for 100+ files.
Furthermore, one of the VarChar columns being exported ([Department]) contains rows that have a leading zero, "0999999" for example.
The user opens each CSV file by double-clicking it in Windows File Explorer to review the data, and notes that the [Department] values display correctly with the leading zero. The user then saves the file as Excel and closes it to launch the import into the second ERP system. This save causes the leading zeros to be dropped from [Department], resulting in import failure.
How can I (1) export directly to Excel, (2) including column names in row 1, (3) sorting the rows by [PostDate] column, (4) excluding [PostDate] column from the exported results, and (5) preserving the leading zeros in [Department] column?
You could expand my answer to the question "SSMS: Automatically save multiple result sets from same SQL script into separate tabs in Excel?" by adding the sort functionality you require.
Alternatively, a better approach would be to use SSIS.
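As an illustration of the transformation each exported file needs (a sketch only, not the SSIS package itself; [PostDate] and [Department] are from the question, while the "Company" and "Amount" columns stand in for the real export columns): sort on [PostDate], drop it from the output, and write [Department] in Excel's ="..." text notation so the leading zeros survive a double-click open and re-save.

```python
import csv
import io

def export_rows(rows, out):
    """rows: dicts with Company, Department, Amount, PostDate.
    Writes a CSV sorted by PostDate, WITHOUT the PostDate column,
    with Department wrapped as ="0999999" so Excel keeps leading zeros."""
    writer = csv.writer(out)
    writer.writerow(["Company", "Department", "Amount"])   # header in row 1
    for r in sorted(rows, key=lambda r: r["PostDate"]):
        writer.writerow([r["Company"],
                         '="%s"' % r["Department"],        # text-formula trick
                         r["Amount"]])

rows = [
    {"Company": "A", "Department": "0999999", "Amount": 10, "PostDate": "2020-02-01"},
    {"Company": "A", "Department": "0123456", "Amount": 20, "PostDate": "2020-01-15"},
]
buf = io.StringIO()
export_rows(rows, buf)
print(buf.getvalue())
```

The sort key simply never reaches the output file, which sidesteps the UNION/ORDER BY conflict; the SSIS route above remains the more robust way to produce 100+ true .xlsx files.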

Create table schema and load data in bigquery table using source google drive

I am creating a table using Google Drive as the source and a Google Sheet as the format.
I selected "Drive" as the value for "Create table from". For File format, I selected Google Sheet.
I also selected Auto Detect Schema and the input parameters.
It creates the table, but the first row of the sheet is loaded as data instead of as the table fields.
Kindly tell me what I need to do to get the first row of the sheet treated as the table column names and not as data.
It would have been helpful if you could include a screenshot of the top few rows of the file you're trying to upload, at least to see the data types you have in there. BigQuery, at least as of when this response was composed, cannot differentiate between column names and data rows if both have similar data types while schema auto-detection is used. For instance, if your data looks like this:
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
BigQuery would not be able to detect the column names (at least automatically using the UI options alone) since all the headers and row data are Strings. The "Header rows to skip" option would not help with this.
Schema auto detection should be able to detect and differentiate column names from data rows when you have different data types for different columns though.
You have the option to skip the header row under Advanced options. Simply put 1 as the number of rows to skip (your first row is where your header is). BigQuery will then skip that first row as data and use its values as the column names.

How to combine two rows into one row with respective column values

I have a CSV file which contains new lines within single rows, i.e. one row's data comes on two lines, and I want the second line's data inserted into the respective columns. I've loaded the data into SQL, but now I want to merge the second line's data into the first row, under the respective columns.
I wouldn't recommend fixing this in SQL, because the issue is with the CSV file itself: it contains embedded new lines, which cause rows to split.
I strongly encourage fixing the CSV files, if possible. Fixing this in SQL is going to be difficult, given there will be more cases like that.
If you're doing the import with SSIS (or if you have the option of doing it with SSIS if you are not currently), the package can be configured to manage embedded carriage returns.
Define your file import connection manager with the columns you're expecting.
In the connection manager's Properties window, set the AlwaysCheckForRowDelimiters property to False. The default value is True.
By setting the property to False, SSIS will ignore mid-row carriage return/line feeds and will parse your data into the required number of columns.
Credit to Martin Smith for helping me out when I had a very similar problem some time ago.
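If SSIS is not an option, the same repair can be sketched in plain Python as a pre-processing step before the load (assumptions: the stray line breaks are unquoted, every logical row has a fixed number of comma-separated columns, there are no quoted commas, and joining the split halves with a space is acceptable):

```python
def repair_lines(lines, ncols):
    """Join physical lines until each logical row has ncols fields.
    Assumes a plain comma delimiter with no quoted commas."""
    fixed, buffer = [], ""
    for line in lines:
        buffer = buffer + " " + line if buffer else line
        if buffer.count(",") >= ncols - 1:   # enough delimiters -> row complete
            fixed.append(buffer)
            buffer = ""
    if buffer:                               # trailing partial row, keep as-is
        fixed.append(buffer)
    return fixed

# The second field of row 1 was split across two physical lines.
broken = ["1,John", "Smith,2020-01-01", "2,Jane Doe,2020-02-01"]
print(repair_lines(broken, 3))
# ['1,John Smith,2020-01-01', '2,Jane Doe,2020-02-01']
```

Running the cleaned file through the normal import then yields one row per logical record, with each value in its proper column.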

VBA Access Transfertext Import Errors

I would like to import a spreadsheet into an Access database. One column has ages 1-89, plus entries that say 90+, which in turn creates an import error. Using DoCmd.TransferText, is it possible to import everything as it is, including the 90+ values in an otherwise numeric column?
If you import data into a table that doesn't already exist, Access will create one for you, automatically determining the data type of each column based on the first few rows imported. If your source data contains a mixture of data types in one column, you may see this error.
There are 2 solutions:
Build the import table to be of the correct data type for your data (i.e. specify that the age column is Short Text). Then import to the pre-defined import table.
Ensure that the CSV or Excel file stores each age as a string i.e. "20" instead of 20 (in Excel, format the cell as Text, so it is left aligned or start cell contents with an apostrophe '20 or use formulaic notation ="20").
You could do either of those things, but it would be better to do both of them if possible.
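The inference Access performs can be sketched as follows (plain Python, purely illustrative): a column typed from its first rows later rejects a value such as "90+", while a column pre-declared as text accepts everything.

```python
def infer_type(sample):
    """Guess a column's type from its first few values, roughly the way
    Access scans the top rows of an imported file."""
    try:
        for v in sample:
            int(v)
        return int
    except ValueError:
        return str

ages = [str(n) for n in range(1, 26)] + ["90+"]

# Access sees only numeric values in the first rows -> column typed as number.
col_type = infer_type(ages[:10])
bad = [v for v in ages if col_type is int and not v.isdigit()]
print(bad)        # ['90+'] -> the rows that raise the import error

# Fix: pre-define the import column as Short Text (the first solution above),
# and nothing is rejected.
text_type = str
bad_after = [v for v in ages if text_type is int and not v.isdigit()]
print(bad_after)  # []
```

The second solution attacks the same mismatch from the other side, by making the source values unambiguously text before Access ever scans them.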