Problem appending CSV upload to existing BigQuery table - google-bigquery

I'm used to quickly uploading a CSV file to append data to an existing table in BigQuery.
I've made the new table name the same as the existing table, and I've then had options to overwrite or append data to the existing table.
This seems to have changed in the past few days and there is a new BigQuery console UI.
When I try and create a new table from a CSV file upload, under the table name field it currently says:
Unicode letters, marks, numbers, connectors, dashes or spaces allowed.
The job will create the specified destination table if needed, or the
table must be empty if it already exists.
However, when I try and create a table with the same name as an existing table (even though the existing table is empty), I get a red warning saying:
Table already exists
Does anyone know if this feature has now been removed or how to easily append data?
The long way round is to upload the CSV to a new table, then query the new table and set the destination to append to or overwrite the existing table. Not ideal, particularly having to define a new table schema.

In order to append a CSV file to an existing BigQuery table when using the Console, please follow the instructions below:
In the Explorer panel, expand your project and select a dataset.
Expand the Actions option and click Open.
In the details panel, click Create table.
On the Create table page, in the Source section:
For Create table from, select Upload.
Browse to and select the file from your system.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
In the Schema section, for Auto detect, check Schema and input parameters to enable schema auto-detection. Alternatively, you can manually enter the schema definition.
Click Advanced options.
For Write preference, choose Append to table.
Please review this document that expands on the same topic.
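If the CSV can be staged in Cloud Storage first, the same append can also be done with a LOAD DATA statement (where available) instead of the upload dialog. A minimal sketch, with placeholder project, dataset, table and bucket names:

-- mydataset.mytable and gs://mybucket/mydata.csv are placeholders.
LOAD DATA INTO mydataset.mytable
FROM FILES (
  format = 'CSV',
  uris = ['gs://mybucket/mydata.csv'],
  skip_leading_rows = 1  -- drop the header row, if the file has one
);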

Related

How to drop columns from a partitioned table in BigQuery

We cannot use a CREATE OR REPLACE TABLE statement for partitioned tables in BigQuery. I can export the table to GCS, but BigQuery then generates multiple JSON files that cannot be imported into a table at once. Is there a safe way to drop a column from a partitioned table? I use BigQuery's web interface.
Renaming a column is not supported by the Cloud Console, the classic BigQuery web UI, the bq command-line tool, or the API. If you attempt to update a table schema using a renamed column, the following error is returned: BigQuery error in update operation: Provided Schema does not match Table project_id:dataset.table.
There are two ways to manually rename a column:
Using a SQL query: choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
Recreating the table: choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.
If you want to drop a column you can either:
Use a SELECT * EXCEPT query that excludes the column (or columns) you want to remove, and use the query result to overwrite the table or to create a new destination table (see the sketch after this list).
You can also remove a column by exporting your table data to Cloud Storage, deleting the data corresponding to the column (or columns) you want to remove, and then loading the data into a new table with a schema definition that does not include the removed column(s). You can also use the load job to overwrite the existing table.
There is a guide published for Manually Changing Table Schemas.
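As a sketch of the first option, a query along these lines (the column and table names are placeholders) selects everything except the column you want to drop; run it with a destination table set in the query settings, either overwriting the original table or writing to a new one:

-- unwanted_column and the table path are illustrative placeholders.
SELECT * EXCEPT (unwanted_column)
FROM `project.dataset.table`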
EDIT:
To change a partitioned table into a non-partitioned table, you can use the Console to query your data and either overwrite your current table or copy the result to a new one. As an example, I have a table in BigQuery partitioned by _PARTITIONTIME. I used the following query to create a non-partitioned table:
SELECT *, _PARTITIONTIME as pt FROM `project.dataset.table`
This query reads the data from all of the table's partitions and adds an extra column showing which partition each row came from. Before executing it, you have two options: save the result in a new non-partitioned table, or overwrite the current table:
Creating a new table: More (under the query editor) > Query settings > check the box "Set a destination table for query results" > choose your project and dataset and enter your new table's name > under Destination table write preference select Write if empty.
Overwriting the current table: More (under the query editor) > Query settings > check the box "Set a destination table for query results" > choose the same project and dataset as your current table > enter the same table name as the one you want to overwrite > under Destination table write preference select Overwrite table.
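If you prefer DDL over the query settings dialog, the same result can be sketched as a CREATE TABLE ... AS SELECT into a new, non-partitioned table (the destination table name below is a placeholder):

-- project.dataset.table_nonpartitioned is a placeholder destination name.
CREATE TABLE `project.dataset.table_nonpartitioned` AS
SELECT *, _PARTITIONTIME AS pt
FROM `project.dataset.table`;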

Using an SSIS package, how to validate the source records for duplicates before inserting?

SQL Server 2012: using an SSIS package, how can we validate the source records for duplicates before inserting?
Our source file is a .csv. We are seeing duplicate records loaded into the staging table.
At present, we follow a manual process for loading the data.
How can we validate the source file data against the destination table before loading, and load only the valid records? Duplicates can be loaded not only because the source file itself contains duplicate records, but also because the same file may be reloaded into the staging table.
We do not truncate the staging table; we keep the records as they are.
Second question: how do we pick up the name of the source file and pass it into the load? Possibly by having a derived column "FileName" that gets loaded along with the raw data into the staging table.
The typical load pattern I use in this case is:
Prepare a staging table that matches the source file
In SSIS run a SQL Task with TRUNCATE StagingTable; (which clears it out)
Then, run a data flow task that loads the entire data file into the staging table
Lastly, merge the staging table into the final table.
I prefer to do this last step in a SQL Task also:
INSERT INTO FinalTable
(PrimaryKey,Column1,Column2,Column3)
SELECT
PrimaryKey,Column1,Column2,Column3
FROM StagingTable SRC
WHERE NOT EXISTS (
SELECT * FROM FinalTable TGT WHERE TGT.PrimaryKey=SRC.PrimaryKey
);
If you prefer a graphical UI, and you don't mind the extra network traffic and slower processing time, you can do the same type of merge operation using lookups. You can even use the SCD component, but I strongly discourage its use.
Whether you do it in T-SQL or the UI, you need a key that can be used to uniquely identify the records (referred to as PrimaryKey in my example). If you don't have this key, there is no way to 'deduplicate'.
Note that in this example you have a 'real' staging table whose only purpose is to get the data file into the database. Then you have a final table that contains the final, consistent result.
Also note that this pattern only adds new rows - it will not update existing rows if they change in the data file.
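If you also need to update rows that changed in the data file, a MERGE along the same lines (reusing the table and column names from the example above) would handle both inserts and updates:

MERGE FinalTable AS TGT
USING StagingTable AS SRC
    ON TGT.PrimaryKey = SRC.PrimaryKey
WHEN MATCHED THEN
    UPDATE SET Column1 = SRC.Column1,
               Column2 = SRC.Column2,
               Column3 = SRC.Column3
WHEN NOT MATCHED THEN
    INSERT (PrimaryKey, Column1, Column2, Column3)
    VALUES (SRC.PrimaryKey, SRC.Column1, SRC.Column2, SRC.Column3);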
Given your exact scenario (loading the same file again), I would first check whether that file has already been loaded into the staging table. If you do that, you don't have to worry about checking for duplicates at the record level.
How are you setting the connection to the file? For most of the data loads I have dealt with, I designed a Foreach Loop Container where the file name/path is populated into a user variable. As you said, you could just use a Derived Column transform to add a new column that gets its value from that variable. If you don't have the file name in a user variable, you could use an Expression Task in the control flow to populate it.
To cover your exact requirement, I would use the above step to populate the file name in the table. You could even normalize it into a separate table instead of storing the long file name for every data record. Once you have all the file names in the database, you could just run an Execute SQL Task at the beginning to see whether that file name is already in the database (see the sketch below).
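A minimal sketch of that check, assuming a hypothetical dbo.LoadedFiles table that records each processed file, with the Execute SQL Task passing the file-name variable as the ? parameter (OLE DB connection) and mapping the result to a package variable:

-- Returns a non-zero count if this file name was already loaded.
SELECT COUNT(*) AS AlreadyLoaded
FROM dbo.LoadedFiles          -- hypothetical log table of processed files
WHERE FileName = ?;           -- ? = SSIS parameter holding the file-name variable

A precedence constraint on the resulting variable can then skip the data flow when the count is non-zero.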
Two years back I faced the same problem when importing TSV files.
I tried many other solutions, but the best I could design was a C# script for this kind of validation.
What I did as a solution:
Create one C# DataTable object in memory with a primary key constraint,
like:
DataColumn[] keyColumn = new DataColumn[1];         // size the array for however many key columns you need
keyColumn[0] = dtFilterdPK.Columns["Column name"];  // the column(s) that must be unique
dtFilterdPK.PrimaryKey = keyColumn;                 // duplicates now raise a ConstraintException on Rows.Add
Then try to add the rows from your CSV to this DataTable one by one.
Whenever a row duplicates an existing primary key, an error is raised.
Handle that error in a try..catch block and log the duplication according to your logging requirements.
Skip those error records so they are not imported into the DataTable object.
At last, bulk-import the de-duplicated DataTable into your table,
like:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(myConnection))
{
    bulkCopy.DestinationTableName = "Your DB Table Name"; // assign the destination table name
    bulkCopy.WriteToServer(dtToBeImport);                 // write the de-duplicated DataTable into the actual table
}
Hope this will help you.

Create SQL trigger if data exists in table

I am new to SQL.
What is the best way to create a TXT file, if a table has records > 0?
The code already exists to remove or add records to this table.
I am looking for ways to create a trigger file (with no content in the file) at a specific network folder.
Preferably, I would want this TXT file to be removed at the end of the day, so the process could repeat itself every morning.
On an after-insert (or after-delete) trigger, do a SELECT COUNT(*) on the table or query one of the system catalog views. If it's greater than zero, call a stored proc that writes a file onto your share drive.
To get the file onto the share, you could create a small package, call PowerShell or bcp (after enabling xp_cmdshell, though), or create a CLR function (after enabling CLR). Since the latter two require changing a server setting, you could just create a package.
And since there is no data you actually need to export, you just create a blank file!
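A minimal sketch of that idea, assuming a hypothetical table dbo.MyTable and a hypothetical stored procedure dbo.usp_CreateTriggerFile that actually writes the empty file (via a package, PowerShell, bcp, or CLR, as discussed above):

-- dbo.MyTable and dbo.usp_CreateTriggerFile are placeholder names.
CREATE TRIGGER trg_MyTable_FlagFile
ON dbo.MyTable
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Only create the marker file while the table still contains rows.
    IF (SELECT COUNT(*) FROM dbo.MyTable) > 0
        EXEC dbo.usp_CreateTriggerFile;
END;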

Populating tables in SQL Developer using data from a file

I have a txt file containing a table with two columns, student ID and GPA. I want to create a similar table in Oracle SQL Developer. Is there some way to copy this data directly into SQL Developer?
If you are looking for a simple GUI method, you can connect to a database, right-click on "Tables", choose "Import Data..." and use the Data Import Wizard.
Select the correct options (csv/delimited, table name, columns to import, ...) and click "Finish".
Sorry, I can't post pics yet; see the example screenshot here: http://i.stack.imgur.com/S7HFx.png
You can create an external table over that file, then copy the external table into a new internal (regular) table. Here is a full example:
External Tables Concepts
Then go to:
Example: Creating and Loading an External Table Using ORACLE_LOADER
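A minimal sketch of that approach, assuming the txt file is comma-delimited and an Oracle directory object (data_dir below, hypothetical) already points at the folder holding it:

-- data_dir, students_ext, students and the column layout are illustrative assumptions.
CREATE TABLE students_ext (
  student_id NUMBER,
  gpa        NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('students.txt')
);

-- Copy the external table into a normal (internal) table.
CREATE TABLE students AS SELECT * FROM students_ext;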

phpMyAdmin CSV Upload Replace Data Not Working

I have created a database and table in phpMyAdmin.
I am importing the data from a csv file.
This works fine and adds the data correctly.
However each time I upload I want to replace the existing data.
I have ticked the box "Replace table data with file"; it uploads fine, but it doesn't replace the existing rows, it simply adds the new data as new rows below the old data.
Any ideas why this is happening?
This appears to be misleading text; it adds the "ON DUPLICATE KEY UPDATE" directive rather than truncating the table prior to the insert. See the bug report at: https://sourceforge.net/p/phpmyadmin/bugs/4891/
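In other words, rows are only replaced when the imported data collides with an existing PRIMARY KEY or UNIQUE key. A minimal sketch of the difference (my_table, id and name are hypothetical):

-- Roughly what the import option produces, per the bug report: an upsert, not a reset.
INSERT INTO my_table (id, name)
VALUES (1, 'new value')
ON DUPLICATE KEY UPDATE name = VALUES(name);

-- To genuinely replace all existing data, empty the table before importing.
TRUNCATE TABLE my_table;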