Why can't Pentaho Data Integration read a new field on a table?

I am trying to copy records from a few tables into a new one (report_table). But after I had created the transformation in Kettle, I needed to add a new field to report_table. After I added the field, Kettle won't show it. When I try "Enter field mapping", it does not appear under 'Target field'. Why can't Kettle read the field?
There's nothing special about the setup. I just add a "Table Input" step and give it a query to select from my source table, then I add a "Table Output" step with a hop between the input and the output. But when I choose "Enter field mapping", Kettle can't read all the fields from the target table.
Any ideas?

Clear the database cache. PDI caches the database structure, and also the hop metadata.
Also, I've seen bugs in 5.0.x where it gets the metadata structure into its head and won't change it until you restart Spoon, so try that too! (Note: this only happens occasionally in my experience, and I work with PDI all day, every day.)
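To make the failure mode concrete: after a change to the target table like the one below (the column name and type are made up for illustration, and the exact DDL depends on your database), Spoon keeps showing the old column list for report_table in "Enter field mapping" until the cached structure is cleared or Spoon is restarted.
-- hypothetical new field added to the target table after the transformation was built
ALTER TABLE report_table ADD COLUMN extra_info VARCHAR(255);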

Related

Run TRUNCATE in BigQuery with Apache NiFi

I have a process that uses the PutBigQueryBatch processor, and I would like it to truncate the table before inserting the data. I defined an AVRO schema and created the table in BigQuery beforehand, specifying how I wanted the fields. I am aware that if I change the "Write Disposition" property to "WRITE_TRUNCATE", it will truncate the table. However, when I use this option, the schema of the table in BigQuery ends up being deleted, which I would not like to happen, and a new schema is created to record the data. I understand that the "Create Disposition" property exists, and that if the "CREATE_NEVER" option is selected, the schema should be respected and not deleted.
When I run this processor with the "Write Disposition" property set to "WRITE_APPEND", the schema I created in BigQuery is respected, but with "WRITE_TRUNCATE" it is not.
Is there any way to use the "WRITE_TRUNCATE" option without the table schema being deleted?
Am I doing something wrong?
Below is the configuration I am using in the PutBigQueryBatch processor:
[Screenshot: PutBigQueryBatch processor configuration]
It sounds like what you want is to run a TRUNCATE TABLE query before starting your process: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#truncate_table_statement
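For reference, the statement itself is plain BigQuery DML and can be issued before the load step; the dataset and table names below are placeholders.
-- removes all rows but keeps the table and its schema
TRUNCATE TABLE my_dataset.my_table;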

Problem appending CSV upload to existing BigQuery table

I'm used to quickly uploading a CSV file to append data to an existing table in BigQuery.
I would give the new table the same name as the existing table, and I would then get the option to overwrite or append data to the existing table.
This seems to have changed in the past few days and there is a new BigQuery console UI.
When I try and create a new table from a CSV file upload, under the table name field it currently says:
"Unicode letters, marks, numbers, connectors, dashes or spaces allowed. The job will create the specified destination table if needed, or the table must be empty if it already exists."
However, when I try and create a table with the same name as an existing table (even though the existing table is empty), I get a red warning saying:
Table already exists
Does anyone know if this feature has now been removed or how to easily append data?
The long way round is to upload the CSV to a new table, then query the new table with the destination set to append to or overwrite the existing table. Not ideal, particularly having to define a new table schema.
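For reference, that query-and-append step amounts to a plain INSERT ... SELECT; the dataset and table names below are placeholders and assume the staging table's columns line up with the existing table's.
INSERT INTO my_dataset.existing_table
SELECT * FROM my_dataset.staging_table;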
In order to append a CSV file to an existing BigQuery table when using the Console, please follow the instructions below:
- In the Explorer panel, expand your project and select a dataset.
- Expand the Actions option and click Open.
- In the details panel, click Create table.
- On the Create table page, in the Source section:
  - For Create table from, select Upload.
  - Browse to the file on your system.
- On the Create table page, in the Destination section:
  - For Dataset name, choose the appropriate dataset.
- In the Schema section, for Auto detect, check Schema and input parameters to enable schema auto-detection. Alternatively, you can enter the schema definition manually.
- Click Advanced options.
- For Write preference, choose Append to table.
Please review this document that expands on the same topic.
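If you would rather avoid the Console for this, the bq command-line tool is another option; as far as I recall, bq load appends to an existing table by default (pass --replace to overwrite instead). The dataset, table, and file names below are placeholders.
# appends the CSV to my_dataset.existing_table; names and file path are placeholders
bq load --source_format=CSV --skip_leading_rows=1 my_dataset.existing_table ./data.csv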

Question about adding column to data source and destination in SSIS package

I have a data source which is something like
select
patient_id
from patient_table
The destination is a CSV file.
Now I want to add patient_name to both the source and the destination.
I go to the source and I change the query to
select
patient_id,
patient_name
from patient_table
After I add this, when I click on Columns, the patient_name column is not there.
The same thing happens for my destination. I have a flat file destination with the patient_id column, so I add the patient_name column to the actual .csv file, but that column is not reflected in the flat file connection manager.
The only way I've been able to get these new columns to show up is to delete the data flow task, connection managers, sources, and destinations, and create everything from scratch.
Is there any other way to do this?
I just created a simple data flow with an OLE DB source and a flat file destination. After adding a second column to the OLE DB source, I double-clicked my flat file destination, which opened the Flat File Destination Editor. Clicking UPDATE added the second column to the flat file connection.
Are you using the latest tooling available to modify SSIS packages?
I don't have an SSDT installation handy right now, so I'll do the best I can without screenshots (and working from memory).
In the Source object, after you add the column to the text of your query, click on Columns, which you already know. Your new column doesn't show up in the list at the bottom yet, which you also know. At the top of that window there's a grid representation of the result set from the query. Find your new column in that grid and check the box to tell the connector you want that column to enter the data flow.
Now go to the connection manager for the .csv file. Add the new column there.
Once it's in the connection manager, now you should be able to map it in the destination object.
There's a possibility that you'll have to click on the arrow or arrows in your data flow task and map the new column in those, too, but it doesn't always happen like that. I haven't taken the time to figure out why that's necessary sometimes and not others, but you'll know right away because the arrows will have red Xs on them.
And that should get you there.

How do I transfer a schema and all of its tables to a new database?

I have database a with schema foo, which contains 20 tables. I want to move all of the contents of schema foo into database b without overwriting the current content of database b.
Is there also a way to do it in pgAdmin?
I found this link and perhaps it is quite similar, but that particular link is about transferring a single table:
Copy a table from one database to another in Postgres
You can script the first database with all its data. Once scripted, you can run the script within the other database. It should work as long as you don't have tables in the second database with the same name.
So, in pgAdmin, follow these steps to script the database:
- Right click on the database and click Backup.
- Select a file path and file name where you want to save your script.
- Select Plain as the format in the format dropdown.
- Go to Options and check "schema and data" in tab #1.
- Then click on Backup.
- Then click Done.
- Then right click on your 2nd database and create a new query.
- Find where you saved the script and copy it into the query window.
- Run the query and you should be all good.
If you are unsure about this, just create two practice databases and practice on those before you do it on the main one.
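For what it's worth, the pgAdmin Backup dialog is a front end for pg_dump, so the command-line equivalent is roughly the following, assuming databases named a and b and schema foo as in the question (add --no-owner if role names differ between servers):
# dump only schema foo (structure and data) from database a, then replay it into database b
pg_dump -n foo -F p -f foo_schema.sql a
psql -d b -f foo_schema.sql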

Padding SSIS input source columns to avoid truncation errors?

First post. In SSIS I am using an ODBC Source, and the database (or ODBC driver) doesn't appear to report column metadata correctly for any of the tables in the database for varchar type columns. Therefore, each time I import a table, I get truncation errors on all the varchar fields. Is there any way to set the size of these fields besides doing it ONE AT A TIME in the advanced editor? When importing a flat file source it lets you select a padding % for string fields. Does something like this exist for OLE or ODBC sources? If not, is there any way I can override the column length to, say, force them all to be VARCHAR(1000)?
I have never experienced SQL Server providing the wrong metadata for an ODBC connection, and it is unlikely you have a ghost in the machine (deus ex machina). The column metadata can be set in the ODBC source via the Advanced Editor. I am willing to bet that is where the difference is. To confirm this:
- Right click the ODBC connection and select the Advanced Editor.
- Click on the Input/Output Properties tab.
- Expand OLE DB Source Output.
- Expand both External Columns and Output Columns.
- Inspect each column pair and verify that the metadata matches.
- Correct any mismatches in the metadata.
Let me know if that works. If it does not, please provide the data and the SQL query you are using.
The VARCHAR field width must be set to the maximum incoming field width. I know the default field width is 50. Regardless, each field must be set. I previously worked on a project with a large number of columns in the input files. My solution was to store the metadata for the columns in a database table, and then I built a C# application to read in the metadata, modify the *.dtsx file, and set the metadata on all columns. This is the best solution that I am aware of to automate the task.
Unfortunately, I don't have much experience with pulling data through ODBC. Are you pulling from an Access database? Or, what are you pulling from?