Keeping dValIds For auto-generated dimensions consistent - endeca

I am working with Endeca 6.4.1 and have many auto-generated dimensions present in my pipeline (mapped using Dev-studio), the application's indexing is CAS-less. So only FCM is creating Dimensions and assigning dValIds. I am using Endeca SEO, so the dVal Id directly reflects in my URL, and if an auto-gen dimension's value's Id changes, a link to that navigation State is lost.
I have a flat file as the dimension's source, for example
product.feature|neon finish
What I want is that, if the value some day changes to Neon-finish or Neon color, the dValId that was assigned to neon finish should be transferred to the new value. I can keep a custom mapping of the change to track that neon finish has been changed to a new value.
Is there any way to achieve this, may be by using some manipulators?
Please share your thoughts.

There are two basic ways to do this:
1) Update the state files when you change a dimension value (APPDIR/data/state/autogen_dimensions.xml ). This would most likely be a manual process.
2) A more robust but complex solution is to change the dimension values to be some ID number and use a synonym for the display name. Then the display name can change without a change to the id number. This may require some serious changes to your pipeline.
Good luck

Related

How to set KOFAX KTM Server global variable value which will be initialized in Batch open, updated in SeparateCurrentPage & used in BatchClose?

I am trying to count a specific barcode value from Project.Document_SeparateCurrentPage and use it in BatchClose to compare if the count is greater than 1 and if it is >1 then send the batch to a specific queue with specific priority. I used a global variable in KTM Project Script to hold the count value which was initialized to 0 in Batch open. It worked fine until unit testing. But our automation team found that out of 20 similar batches, few batches were sent to the queue where the batch should go only if the count satisfies the greater than one condition, though they used only one barcode.
I googled and found that KTM Server script events do not allow to use shared information in different processes(https://docshield.kofax.com/KTM/en_US/6.4.0-uuxag78yhr/help/SCRIPT/ScriptDocumentation/c_ServerScriptEvents.html). Then I tried to use a batch field to hold the barcode count but unable to update its value from Project.Document_SeparateCurrentPage function using pXRootFolder.Fields.ItemByName("BatchFieldName").Text = "GreaterThanOne". The logs show that the batch reads the first page three times and then errors out.
Any links would help. Thanks in advance.
As you mentioned, the different phases of batch/document processing can execute in different processes, so global variables initialized in one event won’t necessarily be available in others. Ideally you should only use global variables if their content can be set from Application_InitializeScript or Application_InitializeBatch, because these events occur in each separate process. As you’ve found out, you shouldn’t use a global variable for your use case, because Document_SeparateCurrentPage and Batch_Close for one batch may occur in different processes, just as the same process will likely execute those events for multiple batches.
Also, you cannot set batch fields from document level events for a related reason: any number of separate processes could be processing documents of a batch in parallel, so batch level data is read-only to document events. It is a bit unintuitive, but separation is a document level event even though it seems like it is acting on the whole batch. (The three times you saw is just an error retry mechanism.)
If it meets your needs, the simplest answer might be to use a barcode locator as part of normal extraction (not just separation), and assign to a field if needed. While you cannot set batch fields from document events, you can read document data from batch events. So instead of trying to track something like a count over the course of document events, just make sure whatever data you need is saved at a document level. Then in a Batch_Close you can iterate the documents and count/calculate whatever you need. (In your case maybe the number of locator alternatives for the barcode locator, across each document.)

How to manage additional processed data in MarkLogic

MarkLogic 9.0.8.2
We have around 20M records in MarkLogic.
For one of the business requirement, we need to generate additional data for each xml and then need user will search this data.
As we can't change original document, so need input on what is best way to manage additional data. Following are the few which we have thought of
Create separate collection and store additional data in separate xml with same unique number i.e. same as original xml. So when user search for it, search in this collection and then retrieved original documents and send response back.
Store additional data in original document properties
We also need to create element range index to make sure it works when end user provide data in range operators.
<abc>
<xyz>
<quan>qty1</quan>
<value1>1.01325E+05</value1>
<unit>Pa</unit>
</xyz>
<xyz>
<quan>qty2</quan>
<value1>9.73E+02</value1>
<value2>1.373E+03</value2>
<unit>K</unit>
</xyz>
<xyz>
<quan>qty3</quan>
<value1>1.8E+03</value1>
<unit>s</unit>
</xyz>
<xyz>
<quan>qty4</quan>
<value1>3.6E+03</value1>
<unit>s</unit>
</xyz>
</abc>
We need to process data from value1 element. User will then search for something like
qty1 >= minvalue AND qty1<=maxvalue
qty2 >= minvalue AND qty2<=maxvalue
qty3 >= minvalue AND qty3<=maxvalue
So when user will search for qty1 then it should only get data from element where value is qty1 and so on.
So would like to know
What is best approach to store data like this
What kind of index i should create to implement this
I would recommend wrapping the original data in an envelope, which allows adding extra data in the header. It could also allow creating a canonical view on the relevant pieces of the data, and either store that as instance, and original as 'attachment' (sub-property, not an attached binary), or keep the instance as-is, and put canonical values for indexing in the header.
There is a lengthy blog article about the topic, that discusses pros and cons in high detail: https://www.marklogic.com/blog/envelope-design-pattern/
HTH!
Grtjn's answer would be the recommended solution, as it is more performant to keep all the information inside the document itself, versus having to query across both the document with the properties, but it would require changes to the document.
Option 1 & 2 could both work.
Properties documents already exist, so it doesn't add fragments, but the properties must conform to the schema.
Creating a sidecar document provides more flexibility, because you are creating new documents, it will increase number of fragments.

Binary Sankey Diagram in Tableau - Not All Activities Match The Corresponding Number of KPIs

How do I link my activities variable to only the corresponding KPIs variable?
Using guidance from a number of sources, but primarily the genius of Jeffery Shafer articulated through the SuperDataScience video, I built a Sankey Diagram for my work. For the most part it works, however, I have been trying to figure out how to adjust my Sankey Diagram model to line up each activity with ONLY the corresponding KPIs, but am having no luck.
The data structure looks like this:
You'll note I changed the binary value to "", 2 instead of 0, 1 as it makes visual calculations easier. For the "Viz" variable, I have "Activity" for the raw data set, then I copy/paste/replicate the data to mirror the data (required for the model) but with "KPI" for the mirrored data.
In the following image, you'll see my main issue is that the smallest represented activity still shows as corresponding to all KPIs when in fact it does not. I want activity to line up only with the corresponding KPIs as some activities don't correspond with all, or even any, KPIs.
Finally, here is the model very similar to what the above video link shows:
Can someone help provide insight into how I can adjust the model to fit activities linking only to corresponding KPIs? I appreciate any insight. Thanks!
I have a solution to the issue, thanks to a helpful Tableau support member named Anthony. It was in the data structure. The data was not structured to only associate "Activities" with their "KPI" values within Tableau's requirements, but every "Activities" value with every "KPI" value. As a result, to achieve the desired result, the data needs to be restructured to only contain a row for every valid "Activities" and "KPI" combination. See the visual below where data is removed to format properly:
-------------------------------------->
Once the table is restructured, the desired visual result should configure with the model. It works like a charm!
Good luck out there!

Rally Lookback, Snapshot with empty custom field

I am trying to get a snapshot of deleted userstory to get value for a custom field(c_Dep). I get the snapshot but the custom field is empty. It had value in it. Does lookback not save value for cutomer created cutom field?
findConfig: {
_TypeHierarchy: 'HierarchicalRequirement',
"ObjectID": 12345,
"_ValidFrom": {
"$lte": "2017-01-25T19:00:57.475Z"
}
Sarita, It is hard to tell from the information you have given what is going on precisely. However, I can give you some pointers
The Lookback API will store changes in values for custom fields. The selection you have shown is valid from 24thJan to 25thJan. During this period was the custom field set? Probably not, because the array is only one long and I think it is showing the creation event.
Was the custom field updated to contain something after this time period?
The reason for asking is that a common misunderstanding is that the records stored in the lookback database will hold the current value of fields - it doesn't. It holds the changes in fields. If c_Dependencies didn't change during that time period, you may not see an entry returned in the array. The next entry in the database might be the record where the c_Dependencies field was set (changed from null to something) and that might be 'after' your time period filter.
It looks like your query is requesting snapshots earlier than 2017/1/25 ($lte). Since there's only one, it's probably the creation snapshot. If you get all snapshots for the ObjectID by removing the _ValidFrom parameter, you should see the changes made to c_Dep after artifact creation.
As I am not allowed to comment, I have to post a new answer.
I think William Scott meant remove the ValidTo filter. The one you have is the creation change. The update will be afterwards.

SSAS: How to handle date dimension when date is null

I'm trying to add a new column to my SSAS cube. The column is a date field, and links to my DimDate table (a Date dimension). This date represents the project completion date.
However.... not all of the projects have a project completion date due to old projects not ever being assigned this value. And this is expected. We don't want to put bogus dates into the field just to get SSAS to work.
When processing the cube, it crashes with:
Errors in the OLAP storage engine: The attribute key cannot be found when
processing: Table: 'dbo_FactMyTable', Column: 'MyDate_id', Value: '0'.
The attribute is 'Date Id'.
I can't disable "missing values" for the entire project because in most cases, this really is an error. How can I disable missing values for this dimension?
Or is there a better way to handle missing dates/values like this?
Small correction - based on your question, you need to change Processing error handling for special Measure Group, not Dimension. You can do it for all dimensions linked to some measure group, but not to specific dimension.
You can process individual measure group with _Table: 'dbo_FactMyTable'_ first with necessary missing value settings, and then - process rest of your cube with default settings.
Main problem here - how to process rest of the cube. You might have sophisticated system which creates processing XMLA scripts dynamically based on data update knowledge (I do it with SSIS); in this case you would not ask this question. Suppose your environment is simpler - you update cube and would like to process it as a whole completely. In such scenario I would sudgest the following workflow:
Process Default all Dimensions (will do initial processing or in structure changes)
Process Update all Dimensions
Process Cube with Unprocess - invalidating it
Process your special measure group
Process Cube with Process Default
This will first update Dimensions, then - clear processing status flag from all measure groups in the Cube. After that you process your measure group with special flags; this set processing status for this MG. And then during Process Default on Cube - only unprocessed MGs will be covered, which excludes your special MG from processing scope.
The answer is a bit complicated, but this article did a great job of explaining it, including screen shots for the SSAS-challenged like me.
http://msbusinessintelligence.blogspot.com/2015/06/handling-null-dates-in-sql-server.html?m=1