How do I load an entire file's content as text into one column of an AzureSQLDW table? - azure-data-factory-2

I have some files in an Azure Data Lake Gen2 store and I want to load each one as a single nvarchar(max) column value in AzureSQLDW. The table in AzureSQLDW is a heap. I couldn't find any way to do it; every option I see treats the file as column-delimited and loads it into multiple rows and columns instead of one row in a single column. How do I achieve this?

I don't guarantee this will work, but try using COPY INTO and setting the row and column delimiters to characters that never appear in the file. Make your target a single-column table.
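A minimal sketch of that approach, assuming a hypothetical single-column heap table and placeholder storage URL and credential; the 0x01/0x02 terminators stand in for bytes assumed never to occur in your files, and the hex notation is assumed to be accepted:

-- Hypothetical single-column heap target in the dedicated SQL pool
CREATE TABLE dbo.FileContents
(
    FileContent nvarchar(max) NOT NULL
)
WITH (HEAP, DISTRIBUTION = ROUND_ROBIN);

-- Load the whole file as one value by using delimiters that never occur in the data
COPY INTO dbo.FileContents (FileContent)
FROM 'https://<account>.dfs.core.windows.net/<container>/<path>/file.txt'
WITH
(
    FILE_TYPE       = 'CSV',
    FIELDTERMINATOR = '0x01',  -- assumed absent from the file
    ROWTERMINATOR   = '0x02',  -- assumed absent from the file
    CREDENTIAL      = (IDENTITY = 'Managed Identity')
);

Be aware of any per-row size limits on loads if the files are very large.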

I would create a Source Dataset with a single column. You do this by specifying "No delimiter" as the column delimiter.
Next, go to the "Schema" tab and import the schema, which should create a single column called "Prop_0".
Now the data should come through as a single string instead of delimited columns.

Related

How to split values in Azure Data Factory: CSV to SQL

Is it possible to split column values in Azure Data Factory? I want to split a delimited value from a CSV column while loading it into a SQL table. I want to keep the second value, "Training Programmes Manager", in the same column, delete the 1st and 3rd values, and move the 4th value, "Education", into a column that already exists in SQL.
The values in the CSV are separated by |.
Thanks James
Since you need to work with a particular column value, you'll need to use a Data Flow.
Source: Create a DataSet for your CSV file.
In the Data Flow, use Derived Column to parse the | delimited column into new columns.
Sink to SQL, referencing the new column names.

Is there any way to exclude columns from a source file/table in Pentaho using "like" or any other function?

I have a CSV file with more than 700 columns, and I only want 175 of them inserted into an RDBMS table or a flat file using Pentaho (PDI). The source CSV file has a variable set of columns, i.e. columns can be added or removed over time, but the column names contain specific keywords that remain constant throughout. I have the list of keywords present in the column names that have to be excluded, e.g. starts_with("avgbal_"), starts_with("emi_"), starts_with("delinq_prin_"), starts_with("total_utilization_"), starts_with("min_overdue_"), starts_with("payment_received_").
Any column containing the above keywords has to be excluded and should not pass through to my RDBMS table or flat file. Is there any way to remove these columns by writing a SQL query in PDI? Selecting the 175 specific columns is not possible, as they are variable in nature.
I think your case is a good fit for metadata injection; you can refer to the example shared below:
https://help.pentaho.com/Documentation/7.1/0L0/0Y0/0K0/ETL_Metadata_Injection
Two things you need to be careful about:
Maintain the list of columns you need to push in.
Since the column names change, you may also run into issues with the valid columns that you do want to import or work with. To avoid this, generate the metadata file every time, so you are sure about the column names you want to push out from the flat file.

Create table schema and load data into a BigQuery table using Google Drive as a source

I am creating a table using Google Drive as the source and Google Sheets as the format.
I have selected "Drive" as the value for "Create table from". For "File format", I selected "Google Sheet".
I also selected "Auto detect" for the schema and input parameters.
It creates the table, but the first row of the sheet is also loaded as data instead of becoming the table fields.
Kindly tell me what I need to do to get the first row of the sheet treated as the table's column names and not as data.
It would have been helpful if you could include a screenshot of the top few rows of the file you're trying to upload, at least so the data types in it could be seen. BigQuery, at least as of when this response was composed, cannot differentiate between column names and data rows if both have similar datatypes while schema auto-detection is used. For instance, if your data looks like this:
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
BigQuery would not be able to detect the column names (at least automatically using the UI options alone) since all the headers and row data are Strings. The "Header rows to skip" option would not help with this.
Schema auto detection should be able to detect and differentiate column names from data rows when you have different data types for different columns though.
You have an option to skip header rows under Advanced options. Simply put 1 as the number of rows to skip (your first row is where your header is). It will skip the first row and use its values as your header.
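If you prefer DDL to the console, a rough sketch of the equivalent external-table definition, where the dataset, table, and spreadsheet URL are hypothetical placeholders and the schema is omitted on the assumption that auto-detection is in use:

-- Sketch: external table over a Google Sheet, skipping the header row
-- so its values are used as column names rather than data
CREATE EXTERNAL TABLE mydataset.mytable
OPTIONS (
  format = 'GOOGLE_SHEETS',
  uris = ['https://docs.google.com/spreadsheets/d/<file-id>'],
  skip_leading_rows = 1
);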

How to alter data types of multiple columns in table? (postgresql)

I uploaded a CSV file to a PostgreSQL database and am trying to change the columns that contain numbers from the default text type to a numeric datatype.
I understand that you can manually alter each individual column's datatype using ALTER and separating multiple columns with commas. However, the CSV file contains 50 columns, so I'm trying to find out whether there's a more DRY approach to changing the datatypes.
I was thinking about looping through certain sections of the table that need to be changed from text to numeric. For example, if columns 10 through 25 (out of 50) need to be changed, how would I select a range starting at column 10 and ending at column 25?
If there's a way to loop through all columns and change the data type depending on the values in each column, that would be good to know. I was thinking this might be a problem because all values are currently set to the default "text" type, so determining the datatype based on values could be problematic. If that is not the case, I would like to know how to approach the problem this way. Thanks!
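A minimal sketch of the looping idea described above, assuming a hypothetical table named my_table and that columns 10 through 25 (by ordinal position) should become numeric; adjust the range and target type as needed:

-- Explicit form: alter several columns in one statement
ALTER TABLE my_table
    ALTER COLUMN col10 TYPE numeric USING col10::numeric,
    ALTER COLUMN col11 TYPE numeric USING col11::numeric;

-- Looping form: alter every column whose ordinal position falls in a range
DO $$
DECLARE
    col record;
BEGIN
    FOR col IN
        SELECT column_name
        FROM information_schema.columns
        WHERE table_name = 'my_table'
          AND ordinal_position BETWEEN 10 AND 25
    LOOP
        EXECUTE format(
            'ALTER TABLE my_table ALTER COLUMN %I TYPE numeric USING %I::numeric',
            col.column_name, col.column_name);
    END LOOP;
END $$;

The USING clause performs the text-to-numeric cast, so the statement fails if a column still contains values that cannot be cast.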

How to map input to output fields from Excel to CSV in Pentaho?

How do I map input to output fields from Excel to CSV in Pentaho?
How do I transform this in Pentaho? Where do I map input values to output columns, given that the positions and names differ between input and output?
You can rename the fields right in your MS-Excel-Input step, and you can reorder the fields in Text-File-Output. Also, a Select-Values step allows you to rename and reorder fields in one sweep on the Select & Alter tab.
The Select Values step allows you to change the column names and position (as well as type).
Note that the column names in the Excel Input step are arbitrary and do not need to match the actual names in the Excel file, so you can rename them as you wish. You can even copy/paste the names into the Fields list.
Note also that the order of the columns in the CSV output file is defined on the Fields tab of the Text file output step. You can change it with the Ctrl+Arrow keys.
If you need to industrialize the process and keep the new column names and order in, for example, a set of files or a database table, then you need metadata injection. Have a look at Diethard's or Jens' examples.