Changing the length of Text fields in an Access linked table

I am exporting a file from a system as .csv. My aim is to link to this file as a table (which matches the output field for field) and then run the queries and export.
The problem I am having is that, upon import, all the fields are 255 bytes wide rather than what they need to be.
Here's what I've tried so far:
I've looked at ALTER TABLE but I cannot run multiple ALTER TABLE statements in one macro.
I've also tried appending the table into another table with the correct structure but it seems to overwrite the structure.
I've also tried using the Left function with the appropriate field length, but when I try to export, I pretty much just see 5 bytes per column.
What I would like is a suggestion as to what is the best path to take given my situation. I am not able to amend the initial .csv export, and I would like to avoid VBA if possible, as I am not at all familiar with it.

You don't really need to worry about the size of Text fields in an Access linked table that is connected to a CSV file. Access simply assigns each Text field the largest possible maximum size: 255. It does not mean that every value is actually 255 characters long, it just means that any values in those fields can be at most 255 characters long.
Even if you could change the structure of the linked table (which you can't), it wouldn't make any real difference except to possibly truncate longer Text values, and you could easily do that with a String function. For example, if a particular field had to be restricted to 15 characters then you could simply use Left([fieldName], 15) as a query column or as the control source in a report.
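For example, a query over the linked table could trim every field to its intended width before export. The table and field names below are placeholders, so this is only a sketch of the idea rather than anything from the original thread:
SELECT Left([CustomerName], 30) AS CustomerName30,
       Left([Town], 15) AS Town15
FROM LinkedCsvTable;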

In the end, as the data set is not that large, I have set this up to append from my source data into a table with the correct structure. I can now run my processes against this table as per normal.
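For anyone taking the same route, the append can be a single INSERT ... SELECT from the linked CSV table into the correctly sized destination table. Every name here is made up for illustration, and Left() is only needed where the destination field is shorter than 255 characters:
INSERT INTO ImportStaging (CustomerName, Town)
SELECT Left([CustomerName], 30), Left([Town], 15)
FROM LinkedCsvTable;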

Related

How to get the column index number of a specific field name in a staged file on Snowflake?

I need to get the column number of a staged file on Snowflake.
The main idea behind it is that I need to automate getting this field in other queries rather than hard-coding t.$3, where 3 is the position of the field. That position might change because our surveys are expandable (more or fewer questions depending on the situation).
So what I need is something like that:
SELECT COL_NUMBER FROM #my_stage/myfile.csv WHERE value = 'my_column_name'
-- Without any file format to read the header
And then this COL_NUMBER could be used as t.$"+COL_NUMBER+" inside merge queries.
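No answer is recorded here, but one possible direction is sketched below. It is untested, and it does assume you are allowed to define a minimal named file format (my_header_format, with FIELD_DELIMITER = NONE so $1 holds the whole line), which the question hoped to avoid; my_stage and my_column_name are placeholders:
SELECT ARRAY_POSITION('my_column_name'::VARIANT, SPLIT(t.$1, ',')) + 1 AS col_number
FROM @my_stage/myfile.csv (FILE_FORMAT => 'my_header_format') t
WHERE t.METADATA$FILE_ROW_NUMBER = 1;
ARRAY_POSITION is zero-based, hence the + 1, and the resulting number would still have to be concatenated into the t.$ reference by whatever builds the merge statement.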

Is there any way to exclude columns from a source file/table in Pentaho using "like" or any other function?

I have a CSV file with more than 700 columns, and I only want 175 of them to be inserted into an RDBMS table or a flat file using Pentaho (PDI). The source CSV file has a variable set of columns, i.e. columns can keep being added or deleted, but the column names contain some specific keywords that remain constant throughout. I have the list of keywords present in the column names that have to be excluded, e.g. starts_with("avgbal_"), starts_with("emi_"), starts_with("delinq_prin_"), starts_with("total_utilization_"), starts_with("min_overdue_"), starts_with("payment_received_").
Any column containing the above keywords has to be excluded and should not pass into my RDBMS table or flat file. Is there any way to remove those columns by writing some SQL query in PDI? Selecting the specific 175 columns is not possible, as they are variable in nature.
I think your case is a good fit for metadata injection; you can refer to the example shared below:
https://help.pentaho.com/Documentation/7.1/0L0/0Y0/0K0/ETL_Metadata_Injection
Two things you need to be careful about:
Maintain the list of columns you need to push in.
Since the column names change, you may also run into issues with valid columns that you do want to import or work with. To avoid that, generate the metadata file every time, so you are sure about the column names you want to push out from the flat file.
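As a complement to the metadata-injection approach, the exclusion rule itself is easy to express with LIKE against column metadata. The sketch below assumes the columns are visible through information_schema (for example after landing the file in a staging table; csv_staging is a made-up name) and is not PDI syntax. Note also that the underscore is itself a LIKE wildcard, so escape it if the prefixes must match exactly:
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'csv_staging'
  AND column_name NOT LIKE 'avgbal_%'
  AND column_name NOT LIKE 'emi_%'
  AND column_name NOT LIKE 'delinq_prin_%'
  AND column_name NOT LIKE 'total_utilization_%'
  AND column_name NOT LIKE 'min_overdue_%'
  AND column_name NOT LIKE 'payment_received_%';
The surviving column names can then be fed to the metadata injection step as the list of fields to keep.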

PDI /Kettle - Passing data from previous hop to database query

I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle is selecting only the vendor name from the 8 fields in the CSV.
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, nor how to pass the desired value to that variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
The best practice is a Stream lookup: for each record in the main flow (VendorRating), look up the vendor details (the lookup fields) in the reference file (the CSV), based on its identifier (possibly its number, or name, or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.
Given that there is at most one VendorRating per vendor, and that you have to do something if there is no match, I suggest the following flow:
Read the CSV and, for each row, look up in the table (i.e. the lookup table is the SQL table rather than the CSV file), putting a default on the lookup fields when there is no match. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, a filter step redirects the flow to the alternative action (here: an insert into the SQL table), and the two flows are then merged back into the downstream flow.
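For anyone who finds it easier to reason about the lookup in SQL terms, the lookup-with-default described above behaves roughly like the query below; csv_staging is a made-up name for the rows read from the CSV, and this is only an illustration of the logic, not something PDI requires you to write:
SELECT c.vendor,
       COALESCE(v.vendor, '--- NO MATCH ---') AS matched_vendor
FROM csv_staging AS c
LEFT JOIN VendorRatings AS v
  ON v.vendor = c.vendor;
Rows whose matched_vendor comes back as the default are the ones the filter step routes to the insert branch before the two flows are merged again.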

Access randomly drops decimals when inserting from txt file

I have a number of .csv files in the following format with example data listed:
ID,Lat,Long
1,-43.76120167,158.0299917
2,-43.76119833,158.03
3,-43.7612,158.0299983
4,-43.76120167,158.0299967
The values change from file to file, but they're always the same format and similar amounts. What you see above is exactly how it shows up in the .csv file (when opened with TextPad/Notepad, not in Excel, so we can eliminate any Excel problems now).
However, when I run the following INSERT statement as an Access SQL query:
INSERT INTO Table1 SELECT * FROM [Text;Database=C:\;Hdr=Yes].[ImportFile.csv];
This is what shows up in my Access database table:
ID,Lat,Long
1,-43,158
2,-43,158
3,-43,158
4,-43,158
Now, I know what you're thinking. Let me just say that my table design in Access is set up such that ID is a Number/Long Integer, and both of my Lat and Long fields are set up as Number/Double with 4 Decimal Places. I've double checked this a million times and it can be confirmed because not all input files share this problem.
This is what is troubling me... where are all the digits after my decimal point going? I need to have them.
What confuses me even more is that some files read just fine and the decimal points stay in there just fine... same table, same insert query. Every file is generated from the same source and formatted the same for what it's worth.
However, if I fire up Access itself and run the import from text file wizard, the values end up just fine. Access automatically makes the field a double with auto decimals (I have also tried using auto decimals in my desired table, to no avail).
Anyone have any idea what is going on?
Thanks!
This is a well-known problem.
The Jet database engine determines the data types from the data source. One solution is to use/create a Schema.ini file.
Also, keep in mind that, in order to determine the data type of a column, only the first few rows are scanned.
For more info, please see here
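To make the suggestion concrete: a Schema.ini file placed in the same folder as the CSV (C:\ in the INSERT above) lets you declare the column types yourself instead of letting Jet guess them from the first rows. The entries below are only a sketch based on the sample data, and since one column is literally named Long, that name may need quoting if the text driver objects:
[ImportFile.csv]
ColNameHeader=True
Format=CSVDelimited
MaxScanRows=0
Col1=ID Long
Col2=Lat Double
Col3=Long Double
MaxScanRows=0 tells the driver to scan the whole file whenever it does have to infer a type, and the explicit Col entries make that inference unnecessary for these three columns.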

Find or Strip Invalid characters from Database

We are using a database where the front end software has allowed the input of invalid characters. (I have no control or re-writing of the software.)
The types of characters are carriage returns, line breaks, �, ¶, basically anything that is not 0-9, a-z or standard punctuation causes us issues with the database and how we use the data.
I'm looking for a way to scan the entire database to identify these invalid codes and either display them as results or strip them out?
I had been looking at This site, wondering if there was a way of searching for a certain range, but I might be barking up the wrong tree.
I'm fairly new to SQL so be gentle with me, thanks.
The only way I could think to do this would be to write a stored procedure which uses the system tables to get a list of all fields in the database/schema in question. Have it exclude system tables (or only include those that are user defined), then dynamically write out SQL UPDATE statements based on the columns/tables found in those system table queries, using regular expressions or character removal like in this article.
The relevant system view is:
SELECT table_name, column_name
FROM information_schema.columns
Pseudocode would be:
Get list of tables we want to do this for
For each table in list
    Get list of columns for the table that have string data
    For each column in table
        Generate an UPDATE statement to strip unwanted characters
        -- Consider writing out table, column, key, and before/after values to a history table, in case this has to be undone
        -- Consider a counter so I have an idea of what was updated
        Execute the UPDATE statement
    Next column
Next table
Write out counter
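A hedged T-SQL sketch of that pseudocode follows. It assumes SQL Server (the question never names the engine), every object name in it is a placeholder, and it generates the UPDATE statements for review instead of executing them, which also gives you a chance to capture before/after values for a history table first:
-- Scalar helper that keeps only 0-9, a-z, A-Z, space and basic punctuation,
-- dropping carriage returns, line breaks, ¶ and similar characters.
CREATE FUNCTION dbo.StripInvalidChars (@input NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @clean NVARCHAR(MAX) = N'';
    DECLARE @i INT = 1;
    DECLARE @c NCHAR(1);
    WHILE @i <= LEN(@input)
    BEGIN
        SET @c = SUBSTRING(@input, @i, 1);
        IF @c LIKE '[0-9a-zA-Z ,.;:''!?()-]'   -- whitelist; extend as needed
            SET @clean = @clean + @c;
        SET @i = @i + 1;
    END;
    RETURN @clean;
END;
GO
-- Generate one UPDATE per user table / string column, using the same
-- information_schema view mentioned above.
SELECT 'UPDATE ' + QUOTENAME(c.table_schema) + '.' + QUOTENAME(c.table_name)
     + ' SET '  + QUOTENAME(c.column_name)
     + ' = dbo.StripInvalidChars(' + QUOTENAME(c.column_name) + ');' AS update_statement
FROM information_schema.columns AS c
JOIN information_schema.tables  AS t
  ON t.table_schema = c.table_schema AND t.table_name = c.table_name
WHERE t.table_type = 'BASE TABLE'                                  -- user tables only
  AND c.data_type IN ('varchar', 'nvarchar', 'char', 'nchar');     -- string columns only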
Since you say "the data then moves to a second program that cannot handle these characters and this causes the process to fail", I'm wondering if you can leave the unreadable data where it is and create a new column for changed data that's only populated if/when the 2nd process fails. You'll still have to test every character of the data in the failed cell, but you wouldn't have to test every character of every row. After you determine the updated text to process, you can call the 2nd process again with the updated value.