How to manage additional processed data in MarkLogic - indexing

MarkLogic 9.0.8.2
We have around 20M records in MarkLogic.
For one of the business requirement, we need to generate additional data for each xml and then need user will search this data.
As we can't change original document, so need input on what is best way to manage additional data. Following are the few which we have thought of
Create separate collection and store additional data in separate xml with same unique number i.e. same as original xml. So when user search for it, search in this collection and then retrieved original documents and send response back.
Store additional data in original document properties
We also need to create element range index to make sure it works when end user provide data in range operators.
<abc>
<xyz>
<quan>qty1</quan>
<value1>1.01325E+05</value1>
<unit>Pa</unit>
</xyz>
<xyz>
<quan>qty2</quan>
<value1>9.73E+02</value1>
<value2>1.373E+03</value2>
<unit>K</unit>
</xyz>
<xyz>
<quan>qty3</quan>
<value1>1.8E+03</value1>
<unit>s</unit>
</xyz>
<xyz>
<quan>qty4</quan>
<value1>3.6E+03</value1>
<unit>s</unit>
</xyz>
</abc>
We need to process data from value1 element. User will then search for something like
qty1 >= minvalue AND qty1<=maxvalue
qty2 >= minvalue AND qty2<=maxvalue
qty3 >= minvalue AND qty3<=maxvalue
So when user will search for qty1 then it should only get data from element where value is qty1 and so on.
So would like to know
What is best approach to store data like this
What kind of index i should create to implement this

I would recommend wrapping the original data in an envelope, which allows adding extra data in the header. It could also allow creating a canonical view on the relevant pieces of the data, and either store that as instance, and original as 'attachment' (sub-property, not an attached binary), or keep the instance as-is, and put canonical values for indexing in the header.
There is a lengthy blog article about the topic, that discusses pros and cons in high detail: https://www.marklogic.com/blog/envelope-design-pattern/
HTH!

Grtjn's answer would be the recommended solution, as it is more performant to keep all the information inside the document itself, versus having to query across both the document with the properties, but it would require changes to the document.
Option 1 & 2 could both work.
Properties documents already exist, so it doesn't add fragments, but the properties must conform to the schema.
Creating a sidecar document provides more flexibility, because you are creating new documents, it will increase number of fragments.

Related

How to get actual allowed values instead of reference to it with Attributes api in Rally

https://rally1.rallydev.com/slm/webservice/v2.0/typedefinition/<defect_id>/Attributes
After hitting the specified url we get the fields for the specified defect id but in order to fetch the allowed values for dropdown fields we have to hit another api.
Is there any other way through which we can get all the fields with the allowed values instead of the reference to the allowed values in a single api call?
Unfortunately with the introduction of WSAPI version 2.0, the ability to load sub collections of data within the initial request was removed. This was done in order to improve performance as it was previously possible to request too large a set of data and have a significant impact on the performance of the system.
So the only way to fetch the lists of allowed values for fields is to loop through the list of defect attributes, grab the necessary endpoint url from the _ref value and load it from there.
It could be worth saving the references to these attribute IDs as they shouldn't change as long as the fields aren't removed from the object model.

SQL Server big tables or store data in a xml field

I have a .net solution with a big form with many data that the customer need to fill, like a form with many steps to fill all data we need to get.
So i was wondering if it's better (from a performance and design approach) a traditional big table with many fields, o store the data only on one field of XML type.
Example of one "TraditionalTable":
RecordId
CustomerId
Data 1
Data 2....
to Data N
1
120
01/01/1980
abcd ....
123
2
20
04/02/2004
fgh ....
230
3
10
05/01/1995
xyz ....
135
Example of one "DataWithXMLField":
RecordId
CustomerId
FormData
1
120
< data>< customerdetails>< borndate>01/01/1980< /borndate>< /customerdetails >< financialinfo >...."
I've done many systems like this and prefer to keep the data as XML (often it's a serialized object). I find this to be efficient at runtime and at design time. (See item below about binary attachments).
The following are some suggestions based on what I've done in the past. Obviously it's not a one-sized hammer...
Often data is "collected" by a user and "approved" by an administrator. While collecting the data, it's stored as XML. When approved, the XML is shred and placed into "normal" relational tables/fields.
Often this data has been collected through multiple pages. Storing as XML allows collecting data in a way that is logical to the user but doesn't fit the final data structure very well.
If a form is abandoned (not completed or canceled) it's easy to delete a single row.
Things to keep in mind:
Some data is related to workflow and is separate from the data being collected. For example, and field for "Form Status" may go from "In Progress", to "Submitted" to "Approved". This type of data should be kept as regular columns.
Store Binary Data separately. If your form includes submitting binary data (like uploading a PDF) I like to generate a GUID on the front end. Store that GUID in the XML and then save the binary data separately using the GUID. Possibly on disk or in a separate "attachments" table.
Define a column for a "version number" of the XML. This way you can programmatically identify what is in the XML. This will help in the future when you need to make changes to the XML.
Define a column for a "Summary" that is short human-friendly version of the XML. For example, if your XML contains information for registering for summer camps, your "XML Summary" might contain the text: "SMITH,JOHN, Camp White Pine 2021". This text us calculated on the front end. It can then be used for displaying rows of data without having to poke into the XML. For example, an administrative page may exist that lists applications that require approval.
Define a column to indicate if the XML meets all your requirements. You don't want to validate XML in the database (it's often hard, and likely repetitive of the UI). Your business layer can apply business rules (Validation) to the XML (or classes) and store in the database an indicator that all business rules are met.

What kind of dynamic content is available in Eloqua?

In Eloqua, can you send out an email to a contact list but version the "hero" image headline for each segment using dynamic content blocks?
And then can you do the reverse, have the main image remain the same, and dynamically populate products below that they've purchased in the past?
For scenario 1, yes that is possible out of the box.
Scenario 2 however is a bit more complicated and would generally require a 3rd party tool to provide this type of dynamic code generation based upon a lookup table (in this case a line item inventory or purchases). Because a contact could have zero or more products (commonly as individual records in a CDO), you would generally need to aggregate or count the number of related records, and then generate your HTML table and formatting around those record values, and be contextually aware if it is the first or last record (to begin and close the table). Dynamic content does not have mathematical functions and would not be able to count those related records - this is something usually provided by a B2C system like SFMC using ampscript or dynamically generated through custom code and sent through a transactional SMTP service. You could have multiple dynamic content on top of each other, but your biggest limitation becomes the field merge, with only lets you select a record based upon earliest/last creation date, or last modified. This is not suitable if you have more than 2 records. A third party service that provides a cloud content module for your email is your best bet.

How to design RESTful URL with many input parameters

I am working to create a Java based RESTful API that uses Spring MVC.
Now for some of the API endpoints-- multiple different parameters are required... I am not talking about a list of values-- more like parameter1, parameter2, parameter3, parameter4 and so on-- where all the 4 (or more) parameters are of different data types as well.
How do I design the API endpoint URL for the above scenario, eg for 4 separate input parameters? Is there any recommended way/best practice for doing this? Or do I simply concatenate the 4 values, with ach pair of values separated by a delimiter like "/"?
EDIT from user comment:
Example: I have to retrieve a custom object(a 'file') based on 4 input parameters--(Integer) userid, (Integer) fileid, (String) type, and (String) usertype. Should I simply create a REST Endpoint like "getfile/{userid}/{fileid}/{type}/{usertype}-- or is there a better (or recommended way) to construct such REST endpoints?
In REST start by thinking about the resource and coming up with immutable permalinks (doesn't change)to identify that resource.
So, in your example (in comment), you said you want to retrieve a file resource for a user and type (file type or user type?)
So, start with just enough information to identify the resource. If the id is unique, then this is enough to identify the resource regardless of the user who owns the file:
/files/{fileId}
That's also important as the url if a file could change owners - remember we want to identify the resource with just the components needed so it can be a permalink.
You could also list the files for a specific user:
/users/{userId}/files/
The response would contain a list of files and each of those items in the list would contain links to the files (/files/{fileId})
If for some reason the file id is not unique but is unique only in the context of a user (files don't change owners and id increments within a user - wierd) then you would need these components to identify the resource:
/users/{userId}/files/{fileId}
Also note the order based on the description. In that wierd case, we said the files are logically contained and IDed by the user and that's also the containment in the url structure.
Hope that helps.
A GET request to file/{usertype}/{user}/{type}/{fileid} sounds good

What is the best way to fake a SQL array or list?

I'm building a chatroom application, and I want to keep track of which users are currently in the chatroom. However, I can't just store this array of users (or maybe a list would be better) in a field in one of my records in the Chatroom table.
Obviously one of the SQL data types is not an array, which leads me to this issue: what is the best way to fake/mock array functionality in a SQL database?
It seems there are 3 options:
1: Store the list/array of users as a string separated by commas, and just do some parsing when I want to get it back to an array
2: Since the max amount of users is allowed to be 10, just have 10 extra fields on each Chatroom record representing the users who are currently there
3: Have a new table Userchats, which has two fields, a reference to the chatroom, and a user name
I dunno, which is the best? I'm also open to other options. I'm also using Rails, which seems irrelevant here, but may be of interest.
Option 3 is the best. This is how you do it, in a relational schema. It is also the most flexible and future-proof option.
It can grow easier in width (extra columns say, a date joined, a channel status, a timestamp last talked) and length (extra rows when you decide there now can be 15 users in a room instead of 10).
The proper way to do this is to add an extra table representing an instance of a user being in a chatroom. In most cases, this is probably what you will want to do, since it gives you more flexibility in the types of queries you can do (for instance: list all chatrooms a particular user is in, find the average number of people in each chatroom, etc.) You would just need to add a new table - something like chat_room_users, with a chat_room_id, and a user_id.
If you're deadset on not adding an extra table, then Rails (or more specifically ActiveRecord), does have some functionality to store data structures like arrays in a SQL column. Just set up your column as a string or text type in a Rails migration, and add:
serialize :users
You can then use this column as a normal Ruby array / object, and ActiveRecord will automatically serialize / deserialize this object as you work with it. Keep in mind that's there are a lot of tradeoffs with this approach - you will never be able to query what users are in a particular room using SQL and will instead need to pull all data down to Ruby before working with it.