I am using IICS to do ETL. My source is a JSON file. I am using Hierarchy parser to parse that JSON and load the output into database. Everything is going fine except one thing. Within that source JSON file there is one field which is array like this:
label:[Red, Green, Blue, Yellow].
Now in table which I am loading data into, there is a label column of varchar2(510 byte). When data is loaded into target by running the job, label column in table gets only one value which is at last index in source array, in above case it gets only 'Yellow' value while other 3 values are lost.
Can anybody tell me how to store entire array into the table column?
PS: We have no flexibility to use any programming language in IICS, all that we need to do is using IICS transformations.
Please let me know if additional information is needed.
Thanks in advance!
Related
I found that BigQuery's schema autodetection doesn't recognize a field if that doesn't appear in the beginning of an input JSON file.
I have this field named "details" which is a record type. In the first 2K rows of the JSON input file, this field doesn't have any sub-fields. But then in 2,698 rows of the input file, this field has "report" sub-field for the first time. If I move the line to the top of the JSON file, it works fine.
How can I solve this issue? Explicitly specifying the schema is one way but I am wondering if there is a way to make the auto detection scan more rows or something like that.
I’m new to Pentaho and I’m currently having an issue with mapping specific row values to an ID.
I have a data file with around 30 columns, one of which is for currencies (USD, GBP, AUD, etc).
The main objective is to have the user select up to 8 (minimum of 1) currencies and map them to a corresponding ID 1-8. All other currencies not in the specified 8 will be mapped with an ID of 9.
The final step is to then output the original data set, along with the IDs.
I’m pretty sure I’m making this way harder than it should, but here is what I have at the moment.
I have created a job where the first step is to set the variables for my 8 currencies, selectionOne -> AUD, selectionTwo -> GBP, …, selectionEight -> JPY.
I then have a transformation to read the data from the file and use the copy rows to result step.
Following that I have a second job called for-each which is my loop for checking the current currency in the row.
Within this job I have two transformations, one called set-current, one called map-currencies.
set-current simply uses the get rows from result step (to grab the data from the first transformation). I then use the set variable step to set the current currency to the value in currency field. This works fine, as each pass through in the loop changes the current variable to the correct value.
Map-currencies is where I’m having the most issues.
The goal is to use the filter row step to compare the current currency against the original 8 selected currencies, and then using the value mapper step to map it to an ID, before outputting the csv file.
The main issue here, is that I can’t use my original variables in the filter or value mapper.
So, what I’ve done here is use the get variables step to retrieve the variables and named them: one, two, three, …, eight. This allows me to bypass the filtering issue, but they don’t seem to work for the value mapper, which is the all important step.
The second issue is that when the file is output it only outputs one value (because of the loop), selecting the append option works, but this could be a problem if the job is run more than once.
However, the priority here is the mapping issue.
I understand that this is rather long, and perhaps a tad confusing, but I will greatly appreciate any help on this, even if it’s an entirely new approach 😊.
Like I said, I’m probably making it harder than it should be.
Thanks for your time.
Edit for AlainD
Input example
Output example
This should be doable in a single transformation using the Stream Lookup step.
Text File Input is your main file, Property input reads your property file into Key and Value columns. You could use a normal text file with two columns instead of the property input.
Below are the settings of the Stream lookup. Note the default value "9" for records that are not found in the lookup stream.
I am a new user of Parse.com and do like what I see so far. I am helping a friend out. They have a locations table that houses lat and long values. However, the developer that created it used an Object data type instead of a GeoPoint data type to store lat/long info. The developer wrote an iOS app that pulls records from parse and stores them locally to display on the map. What I am trying to do is figure out how that developer pulls only the records within X number of miles... or even based on a bounding box. Anyways, my question is if I have an object column with data like what I have shown below is there any way to filter the results by lat/long values (gt or lt etc)?
{"Latitude":33.51882,"Longitude":-93.97484}
It seems like an Object data type is a loose way of storing key value pairs without having to create an entities table etc. However, I cannot find a way to query this subset.
thanks for any help!
We get weekly data files (flat files) from our vendor to import into SQL, and at times the column names change or new columns are added.
What we have currently is an SSIS package to import columns that have been defined. Since we've assigned the mapping, SSIS only throws up an error when a column is absent. However when a new column is added (apart from the existing ones), it doesn't get imported at all, as it is not named. This is a concern for us.
What we'd like is to get the list of all the columns present in the flat file so that we can check whether any new columns are present before we import the file.
I am relatively new to SSIS, so a detailed help would be much appreciated.
Thanks!
Exactly how to code this will depend on the rules for the flat file layout, but I would approach this by writing a script task that reads the flat file using the file system object and a StreamReader object, and looks at the columns, which are hopefully named in the first line of the file.
However, about all you can do if the columns have changed is send an alert. I know of no way to dynamically change your data transformation task to accomodate new columns. It will have to be edited to handle them. And frankly, if all you're going to do is send an alert, you might as well just use the error handler to do it, and save yourself the trouble of pre-reading the column list.
I agree with the answer provided by #TabAlleman. SSIS can't natively handle dynamic columns (and niether can your SQL destination).
May I propose an alternative? You can detect a change in headers without using a C# Script Tasks. One way to do this would be to create a flafile connection that reads the entire row as a single column. Use a Conditional Split to discard anything other than the header row. Save that row to a RecordSet object. Any change? Send Email.
The "Get Header Row" DataFlow would look like this. Row Number if needed.
The Control Flow level would look like this. Use a ForEach ADO RecordSet object to assign the header row value to an SSIS variable CurrentHeader..
Above, the precedent constraints (fx icons ) of
[#ExpectedHeader] == [#CurrentHeader]
[#ExpectedHeader] != [#CurrentHeader]
determine whether you load data or send email.
Hope this helps!
i have worked for banking clients. And for banks to randomly add columns to a db is not possible due to fed requirements and rules. That said I get your not fed regulated bizz. So here are some steps
This is not a code issue but more of soft skills and working with other teams(yours and your vendors).
Steps you can take are:
(1) reach a solid columns structure that you always require. Because for newer columns older data rows will carry NULL.
(2) if a new column is going to be sent by the vendor. You or your team needs to make the DDL/DML changes to the table were data will be inserted. Ofcouse of correct data type.
(3) document this change in data dictanary as over time you or another member will do analysis on this data and would like to know what is the use of each attribute or column.
(4) long-term you do not wish to keep changing table structure monthly because one of your many vendors decided to change the style the send you data. Some clients push back very aggresively other not so much.
If a third-party tool is an option for you, check out CozyRoc's Data Flow Task Plus. It handles variable columns in sources.
SSIS cannot make the columns dynamic,
one thing, i always do, is use a script task to read the first and last lines of a file.
if it is not an expected list of csv columns i mark file as errored and continue/fail as required.
Headers are obviously important, but so are footers. Files can through any unknown issue be partially built. Requesting the header be placed at the rear of the file it is a double check.
I also do not know if SSIS can do this dynamically, but it never ceases to amaze me how people add/change order of columns and assume things will still work.
1-SSIS Does not provide dynamic source and destination mapping.But some third party component such as Data flow task plus , supporting this feature
2-We can achieve this using ssis script task.
3-If the Header is correct process further for migration else fail the package before DFT execute.
4-Read the line from the header using script task and store in array or list object
5-Then compare those array values to user defined variables declare earlier contained default value as column name.
6-If values are matching exactly then progress further else fail it.
A real beginner here,
I am looking to have a table of static data with about 300 cells in it. (There will be 12 distinct tables in all)
The user would input two values, the first would indicate the row, and the second would point to the cell within that row, and I want my app to be able to read back the column heading for that row.
What is the best way to have this data stored in my app? Currently the data is in a spreadsheet.
The data looks like:
Index 0,Index 1,Index 2,Index 3 ,Index 4,Index 5,Index 6,Index 7,Index 8,Index 9
10,156,326,614,1261,1890,3639,5800,10253,20914
20,107,224,422,867,1299,2501,3986,7047,14374 ...etc.
Where the number at index zero is the name of the row (entered by user) and the numbers after that are the values also entered by the user.
I want the code to take the two numbers (row and value) and then return a string based on the column heading (shown here as index 0 - 9)
the last tricky bit is if the user enters a value that is in between the values give I want it to use the next highest value from the data. E.g. if in row "10" the user inputs 700 I want the code to return the index heading for 1261.
Does that make sense?
Possibilities are endless...
In code as a static 2D array
XML
JSON
Tab Delimited Text File
Comma Delimited Text File
PList
etc.
All depends on your needs and wants.
On the CONs for each:
Static 2D array may consume some memory every time the app runs...
A file will involve some disk IO or processing requirements to read the values out of the file stored in the Bundle.
On the PROs for each:
Data from the static array would be FAST...
Updating data in a file could be done on-the-fly over the web.
You could write a simple routine to dump your spreadsheet into any of the above listed options, so I don't think that's a real serious consideration. It's mostly about what works best for you in terms of size of data and updatability/maintainability.