I have created a BigQuery table with a certain number of columns. I have now added a few more columns, and when I upload the SQL file to run the Airflow pipeline, the values are copied into the wrong columns. How are the columns mapped to the column values contained in the SQL?
Related
I have a large .csv table that I want to insert into a Postgres DB. I don't need all rows from the table. Is it possible to somehow filter it using SQL before it is uploaded to the database, or is the only option to delete the rows I don't need afterward?
I've created a SQL dedicated pool table in Synapse, and am now trying to copy data from multiple XML files into this database.
I've mapped all fields from the XML file that I need to each specific column in the destination table, but the following error is blocking the copy data activity:
Message=Column count in target table does not match column count specified in input. If BCP command, ensure format file column count matches destination table. If SSIS data import, check column mappings are consistent with target.,Source=.Net SqlClient Data Provider,SqlErrorNumber=107098,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=107098,State=1,Message=Column count in target table does not match column count specified in input. If BCP command, ensure format file column count matches destination table. If SSIS data import, check column mappings are consistent with target.,},],'
Any idea what I am doing wrong?
This is because the column count in the XML files and the column count in Synapse are not equal. You need to check the number of columns coming from your XML files and make it match the number of columns in your sink table.
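If it helps to double-check the sink side, you can count the columns of the destination table directly in the dedicated SQL pool and compare that number with the number of fields you mapped in the Copy Data activity. A minimal sketch, where dbo.MyTargetTable is a placeholder for your destination table:

-- Count the columns of the sink table in the dedicated SQL pool.
-- Compare this with the number of mapped columns in the Copy Data activity.
SELECT COUNT(*) AS column_count
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'           -- placeholder schema
  AND TABLE_NAME = 'MyTargetTable';  -- placeholder table name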
I have a table in Athena with the following columns.
Describe my_table
row_id
icd9_code
linksto
The column icd9_code is empty and has the int data type. I want to insert some integer values into the icd9_code column of my table named my_table.
Those integer values are stored in an Excel sheet on my local PC. Does AWS Athena provide some way to do this?
Amazon Athena is primarily designed to run SQL queries across data stored in Amazon S3. It is not able to access data stored in Microsoft Excel files, nor is it able to access files stored on your computer.
To update a particular column for existing rows, you would need to modify the files in Amazon S3 that contain those rows.
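As a rough illustration of that workflow (not part of the original answer): export the Excel sheet to CSV, upload it to S3, expose it to Athena as an external table, and write a new copy of my_table with the joined values, since Athena cannot update rows in place. The bucket path and the icd9_updates / my_table_updated names are placeholders.

-- 1. Export the Excel sheet to CSV and upload it to S3,
--    e.g. s3://my-bucket/icd9_updates/ (placeholder location).

-- 2. Expose the CSV to Athena as an external table.
CREATE EXTERNAL TABLE icd9_updates (
  row_id    INT,
  icd9_code INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/icd9_updates/'
TBLPROPERTIES ('skip.header.line.count' = '1');

-- 3. Write a new table (CTAS) that joins the update values onto the existing rows.
CREATE TABLE my_table_updated
WITH (format = 'PARQUET') AS
SELECT t.row_id,
       u.icd9_code,
       t.linksto
FROM my_table t
LEFT JOIN icd9_updates u
  ON t.row_id = u.row_id;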
We have a table that contains 50 rows of data. The table includes BLOB data types, and we are trying to see if we can use SSIS to copy the data from table1 to table2, including the BLOB columns, as we have tried other methods without success.
The BLOB columns contain Excel documents.
Is this possible? Is there an easier way to do it in SSIS?
If not, is there an easier way to do it in Oracle?
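(For reference, and not from the original thread: in plain Oracle SQL a straight table-to-table copy carries BLOB columns along with everything else, assuming table2 already exists with the same column structure as table1.)

-- Assumes table2 already exists with the same columns as table1;
-- BLOB values are copied like any other column.
INSERT INTO table2
SELECT * FROM table1;
COMMIT;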
How can I programmatically find all Impala tables that need an INVALIDATE METADATA statement (because they were created in Hive but are not yet known to Impala) or a REFRESH (because a column was added, a data file was added, etc.)?
Invalidate Metadata:
As a workaround, create a shell script to perform the steps below; a sketch of the SQL statements involved follows the notes.
1. Using beeline, connect to a particular database, run a SHOW TABLES statement, and save the output to a file.
2. Using impala-shell, connect to the same database, run a SHOW TABLES statement, and save the output to another file.
3. Now compare both files to remove the duplicates; the unique table names left from the first file are the tables that exist only in Hive and not yet in Impala.
Note:
a. Instead of handling one database at a time in steps 1 and 2, you can loop over all databases and save the output to a file. Inside the loop you can redirect and append the output to a single final output file, using a format like database.table or database_table, so that the tables of all databases end up in one file. Then follow step 3.
b. The unique tables left from the second output file after removing duplicates are tables that have been deleted in Hive; INVALIDATE METADATA needs to be run in Impala to remove them from Impala's list.
c. A table renamed in Impala is recognized by Hive, but the reverse is not possible; after a rename in Hive, INVALIDATE METADATA should be run for both the old and the new table name, to remove the old one and add the new one in Impala. This applies to most operations, not just renaming a table.
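A sketch of the statements the script would run; the actual comparison of the two output files happens in the shell, and my_db and the table names are placeholders.

-- In beeline (Hive): list the tables the metastore knows about.
SHOW TABLES IN my_db;

-- In impala-shell: list the tables Impala currently knows about.
SHOW TABLES IN my_db;

-- For each table that appears only in the beeline output
-- (created in Hive, unknown to Impala), make it visible to Impala:
INVALIDATE METADATA my_db.new_table;

-- For each table that appears only in the impala-shell output
-- (dropped or renamed in Hive), the same statement removes the stale entry:
INVALIDATE METADATA my_db.dropped_table;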
Refresh:
Consider a text-format table with 2 columns and 1 row of data.
Now suppose a third column is added to that table in beeline.
select * from table; -- gives 3 columns in beeline and 2 columns in impala, since REFRESH has not been run in Impala for this table.
If we run COMPUTE STATS in Impala before running REFRESH in this case, then the newly added column from beeline will be removed from the table schema in Hive as well.
select * from table; -- gives 2 columns in beeline and 2 columns in impala, since COMPUTE STATS from Impala deleted the extra column's metadata even though the data for that column still resides in HDFS. This might cause parsing issues in Impala if the column was added somewhere in the middle or at the front instead of at the end.
So it is advised to run REFRESH table_name in Impala right after adding a new column or making any other modification in beeline to an existing table, so that the table schema is not lost as explained in the scenario above.
refresh table; -- right after the modification in Hive, run REFRESH in Impala.
select * from table; -- gives 3 columns in beeline and 3 columns in impala, since REFRESH was run before COMPUTE STATS in Impala.
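Putting the recommended order together as one sequence; the table and column names below are placeholders rather than the ones from the example.

-- In beeline (Hive): add the new column to the existing table.
ALTER TABLE my_table ADD COLUMNS (col3 STRING);

-- In impala-shell: pick up the schema change first.
REFRESH my_table;

-- Only now is it safe to recompute statistics in Impala.
COMPUTE STATS my_table;

-- Both beeline and impala-shell now report all 3 columns.
SELECT * FROM my_table;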