How to update existing data in an Apache Druid datasource

The problem was to add a new field to an existing datasource and fill it with some default value.
I tried to do so via this article.
But the actual result is that the new column was added, but it is filled with null values.
Where did I go wrong, and can I fix it the same way?

It would be hard to tell without looking at how you have added the new column in your ingestion spec.
I suggest using the Druid unified console data loader UI: load and parse your input data, then define the additional column under the Transform section. The advantage of the data loader UI is that you can preview the transformed result immediately, and once the workflow is complete you get an ingestion spec that you can submit from there.
E.g., a transformation example:
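A rough sketch of what such a transform might look like inside the ingestion spec's dataSchema; the field name and constant value here are hypothetical, not taken from the question:

    "transformSpec": {
      "transforms": [
        { "type": "expression", "name": "new_field", "expression": "'default_value'" }
      ]
    }

Note that a transform defined this way applies at ingestion time only; it does not rewrite segments that were already ingested.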

From the documentation here: https://druid.apache.org/docs/latest/design/segments.html#different-schemas-among-segments
Maybe the result is related to the fact that existing segments don't have the new field and therefore show null for it; only segments created after the change will actually contain the new column.

Related

SSIS Mapping and Transformation

I'm new to building SSIS packages; in fact, this is my first package. I need to pull data from a DB view on an Azure managed instance to SQL on-prem. I have built out the data flow and all. I'm moving data from a database view into another database table, but the destination table has a column that the source doesn't have, hence my destination mapping view looks like the attached image. How do I fix this, or what are my options?
If this column needs to stay empty and you don't have it in the source, your best and only option is to leave it like this: the mapping basically ignores it, so no information will be fed in. That will work.
In case you need the current date in that column, you can add a Derived Column transformation between your source and destination in your Data Flow, where you can add the current date or additional columns that come, for example, from variables.
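As a sketch, the Derived Column editor entry for a current-date column might look like this; the column name is hypothetical, while GETDATE() is a built-in SSIS expression function:

    Derived Column Name:  LoadDate             (hypothetical)
    Derived Column:       <add as new column>
    Expression:           GETDATE()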
It's self-explanatory that "ignore (optional)" means the mapping for those columns can be ignored, and if you want columns to be mapped to a calculated value, you can do it by using the Derived Column SSIS component (reference).
As per your use case, try to use the OLE DB component instead of the ADO.NET component to optimize performance for a relatively large data set.

On adding a row in an Interactive Grid (APEX 5), the sequence doesn't work and an error displays that the field can't be empty

I imported an application from workspace 1 to workspace 2 together with its table data and definitions, but the problem is that, on adding a record in the interactive grid of the imported application in workspace 2, it displays an error that the PK can't be NULL. I leave the PK field empty because I expect the sequence to do that job, the way it does in the same application in workspace 1.
What is the reason that the same imported application works a bit differently, in the sense that the sequence doesn't populate values by itself? What should be done to make the sequence work in the imported application in workspace 2?
A sequence isn't just "known" to the Interactive Grid; you have to specify which sequence to use. In your Interactive Grid, go to the column definition. Under the "Default" header, change the Type to "Sequence" and put the name of your sequence in the Sequence field.
If the schema in workspace 1 is different from the schema in workspace 2, there could be a whole lot of reasons it isn't behaving the same. Check for differences between the schemas. Does the sequence exist in your new schema? If you are using the old schema, is it as simple as needing to prefix your sequence with the schema name? There are lots of possible reasons; rule out the simple stuff first (a quick check is sketched below).
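For instance, two hypothetical sanity checks you could run from SQL Workshop; the schema and sequence names are placeholders:

    -- Does the sequence exist in the schema backing workspace 2?
    SELECT sequence_name
      FROM all_sequences
     WHERE sequence_owner = 'WS2_SCHEMA';   -- hypothetical schema name

    -- If it only exists in the old schema, reference it with the prefix:
    SELECT ws1_schema.my_table_seq.NEXTVAL FROM dual;   -- hypothetical names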

How to remove column in Pentaho Data Integration?

I am using PDI/Kettle. I know it is possible to add new columns by specifying them in fields. Is it possible to remove deprecated input columns coming from the previous step in a Modified Java Script step with Spoon?
You can use the Select / Rename values step to remove any field from the record stream.
Do it in the second tab, "Remove", where you define the fields to remove.
@Hello-lad,
Reading your question, it looks like you wanted to know specifically whether you can discard an input column coming from a previous step inside a Modified Java Script step. But the real use of that step is to create columns derived from values coming in the Pentaho stream, not to eliminate unwanted items from it; for that you use the Select / Rename values step (as indicated by mzy).
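As a small illustration of what the step is actually meant for, a derived column in the Modified Java Script step might look like this; the field names are hypothetical, and the new field must also be declared in the step's Fields grid so it enters the stream:

    // Build a new stream field from two incoming fields (hypothetical names).
    var full_name = first_name + ' ' + last_name;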

Adding two extra columns to input data - Pentaho Kettle

I am working on a transformation step for Pentaho Kettle. It selects several input columns and, based on those, adds two new columns during the transformation. I am unable to understand (based on code from other plugins) how I can add the two new columns so that (1) steps downstream are aware of these columns and (2) I can push the transformed data into these columns.
Thanks in advance.
You might need to override meta.getStepFields() to add new ValueMetaInterface objects to the RowMetaInterface passed in. This is the standard way to add columns at runtime; however, the row's metadata (i.e., the list of ValueMetaInterface objects) must be the same from row to row, or else the next step in your transformation will complain.
Often when writing data-driven custom plugins, you consume as many rows as you need (using getRow()) in order to figure out what the outgoing row format/metadata will be; then you can construct a RowMetaInterface (usually using meta.getStepFields()) that will be passed into the putRow() call. If you intend to pass through the incoming fields, do something like:
    RowMetaInterface outputRowMeta = getInputRowMeta().clone();

If you're creating new rows, use this:

    RowMetaInterface outputRowMeta = new RowMeta();

Either way, when you call meta.getStepFields(outputRowMeta, ...), it should populate outputRowMeta with the appropriate fields by adding/changing/removing ValueMetaInterface objects from outputRowMeta.
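To make that concrete, here is a minimal sketch against the PDI (Kettle) plugin API; the column names and types are hypothetical, and since the exact getStepFields()/getFields() signature varies between PDI versions, the metadata logic is shown as plain helper methods:

    import org.pentaho.di.core.row.RowDataUtil;
    import org.pentaho.di.core.row.RowMetaInterface;
    import org.pentaho.di.core.row.ValueMetaInterface;
    import org.pentaho.di.core.row.value.ValueMetaNumber;

    // Illustrative fragments: in a real plugin the first helper belongs in the
    // StepMetaInterface implementation, the second in the step class itself.
    public class TwoColumnSketch {

        // Advertise the two new columns so downstream steps are aware of them
        // (hypothetical names "scoreA" and "scoreB").
        public static void addFields(RowMetaInterface rowMeta, String stepName) {
            ValueMetaInterface a = new ValueMetaNumber("scoreA");
            a.setOrigin(stepName);
            rowMeta.addValueMeta(a);
            ValueMetaInterface b = new ValueMetaNumber("scoreB");
            b.setOrigin(stepName);
            rowMeta.addValueMeta(b);
        }

        // Widen each data row to match the advertised metadata before putRow().
        public static Object[] widenRow(Object[] row, RowMetaInterface inputRowMeta,
                                        double scoreA, double scoreB) {
            Object[] out = RowDataUtil.addValueData(row, inputRowMeta.size(), scoreA);
            return RowDataUtil.addValueData(out, inputRowMeta.size() + 1, scoreB);
        }
    }

The invariant to preserve is the one described above: the Object[] handed to putRow() must line up index-for-index with the ValueMetaInterface list in outputRowMeta.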
I've got a blog post using Groovy to add/replace fields in the incoming rows here:
http://funpdi.blogspot.com/2014/10/flatten-json-to-key-value-pairs-in-pdi.html
Not sure if that is similar to your use case or not. If you have more questions, feel free to find me on IRC at ##pentaho (my nick is usually mburgess_pdi)
If I have understood your question correctly, I think you are trying to create an output file with dynamic columns. You can do this by checking the "Fast data dump" option in the Text File Output step. While doing so, do not define any column names in the "Fields" tab.
Hope it helps :)

Get list of columns of source flat file in SSIS

We get weekly data files (flat files) from our vendor to import into SQL, and at times the column names change or new columns are added.
What we have currently is an SSIS package that imports the columns that have been defined. Since we've assigned the mapping, SSIS only throws an error when a column is absent. However, when a new column is added (apart from the existing ones), it doesn't get imported at all, as it is not in the mapping. This is a concern for us.
What we'd like is to get the list of all the columns present in the flat file so that we can check whether any new columns are present before we import the file.
I am relatively new to SSIS, so detailed help would be much appreciated.
Thanks!
Exactly how to code this will depend on the rules for the flat file layout, but I would approach it by writing a Script Task that reads the flat file using the file system and a StreamReader object and looks at the columns, which are hopefully named in the first line of the file.
However, about all you can do if the columns have changed is send an alert. I know of no way to dynamically change your data transformation task to accommodate new columns; it will have to be edited to handle them. And frankly, if all you're going to do is send an alert, you might as well just use the error handler to do it and save yourself the trouble of pre-reading the column list.
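An SSIS Script Task would normally be written in C#; purely to illustrate the logic, here is a sketch in Java of the header comparison such a task would perform (the file path and expected header are hypothetical):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class HeaderCheck {
        public static void main(String[] args) throws IOException {
            // Hypothetical expected header; in SSIS this would sit in a package variable.
            String expected = "CustomerId,Name,Region,Balance";

            // Read only the first line of the flat file (hypothetical path).
            try (BufferedReader reader = new BufferedReader(new FileReader("weekly_feed.csv"))) {
                String actual = reader.readLine();
                if (expected.equals(actual)) {
                    System.out.println("Header matches; safe to import.");
                } else {
                    // In SSIS you would fail the task or raise an alert here.
                    System.out.println("Header changed: " + actual);
                }
            }
        }
    }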
I agree with the answer provided by @TabAlleman: SSIS can't natively handle dynamic columns (and neither can your SQL destination).
May I propose an alternative? You can detect a change in headers without using a C# Script Task. One way to do this is to create a flat file connection that reads the entire row as a single column, use a Conditional Split to discard anything other than the header row, and save that row to a Recordset object. Any change? Send an email.
The "Get Header Row" DataFlow would look like this. Row Number if needed.
The Control Flow level would look like this. Use a ForEach ADO RecordSet object to assign the header row value to an SSIS variable CurrentHeader..
Above, the precedent constraints (fx icons ) of
[#ExpectedHeader] == [#CurrentHeader]
[#ExpectedHeader] != [#CurrentHeader]
determine whether you load data or send email.
Hope this helps!
I have worked for banking clients, and for banks, randomly adding columns to a DB is not possible due to federal requirements and rules. That said, I get that you're not a federally regulated business. So here are some steps.
This is not a code issue but more one of soft skills and working with other teams (yours and your vendor's).
Steps you can take are:
(1) Reach a solid column structure that you always require, because for newer columns the older data rows will carry NULL.
(2) If a new column is going to be sent by the vendor, you or your team needs to make the DDL/DML changes to the table where the data will be inserted, of course with the correct data type (a sketch follows this list).
(3) Document this change in the data dictionary, as over time you or another team member will analyze this data and will want to know the use of each attribute or column.
(4) Long-term, you do not want to keep changing the table structure monthly because one of your many vendors decided to change the style in which they send you data. Some clients push back very aggressively, others not so much.
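For step (2), the DDL change is typically a single statement; the table and column names below are hypothetical:

    -- Add the vendor's new column as nullable, so older rows remain valid
    -- (they will simply carry NULL for it, as noted in step 1).
    ALTER TABLE vendor_feed
        ADD new_attribute VARCHAR(50) NULL;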
If a third-party tool is an option for you, check out CozyRoc's Data Flow Task Plus. It handles variable columns in sources.
SSIS cannot make the columns dynamic.
One thing I always do is use a Script Task to read the first and last lines of a file; if they do not match the expected list of CSV columns, I mark the file as errored and continue/fail as required.
Headers are obviously important, but so are footers: a file can, through any unknown issue, be only partially built, so requesting that the header also be placed at the rear of the file gives you a double check.
I also do not know if SSIS can do this dynamically, but it never ceases to amaze me how people add or change the order of columns and assume things will still work.
1. SSIS does not provide dynamic source-to-destination mapping, but some third-party components, such as Data Flow Task Plus, support this feature.
2. Alternatively, we can achieve the header check using an SSIS Script Task.
3. If the header is correct, continue with the migration; otherwise fail the package before the DFT executes.
4. Read the header line with the Script Task and store it in an array or list object.
5. Then compare those array values to user-defined variables, declared earlier, that contain the default column names.
6. If the values match exactly, progress further; otherwise fail it.