How to set up custom fields in Superset?

Our database has a field with JSON data that we'd like to use in reports. E.g.
{
  "owner_type": "USER",
  "updated_at": 1641996749092389600,
  "version_no": 1,
  "entity_type": "INDIVIDUAL",
  "country": "ES"
}
How can one create dynamic fields in Superset, e.g. to expose owner_type as its own field?
I'm coming from tools like Snowflake and Zoho Analytics where you could build Views, Dynamic Tables and Formula Fields based on aggregated raw data.

You can add calculated columns to your table in Superset. Hover over 'Sources' in the header and select 'Tables', then choose the option to edit your table's record. There you can add a calculated/custom column.
To expose owner_type, name the custom column owner_type, set its data type to VARCHAR(100), and choose the table from the dropdown. In the expression, put json_column->"$.owner_type" and hit save. This expression is for a MySQL database; look up the equivalent JSON-parsing expression for your particular database.
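For reference, hedged equivalents of that expression in a few other databases (assuming the raw JSON column is named json_column; adjust to your schema):

json_column->>"$.owner_type"                      -- MySQL (->> also unquotes the value)
json_column->>'owner_type'                        -- PostgreSQL (returns the field as text)
json_column:owner_type::VARCHAR                   -- Snowflake (VARIANT column)
JSON_EXTRACT_SCALAR(json_column, '$.owner_type')  -- BigQuery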

Related

BigQuery extending field metadata

Is there a way to add an additional column when specifying a BQ table schema?
What I mean is: apart from the field type, optionality and description, I would also like to add another column of additional metadata, for example a semantic column name. I don't want to use the Description one, as that is for the actual field definition.
Thank you!

Is it possible to insert rows with different fields into a BigQuery table?

Using the BigQuery UI I've created a new table free_schema_table without setting any schema, then I tried to execute:
insert into my_dataset.free_schema_table (chatSessionId, chatRequestId, senderType, senderFriendlyName)
values ("123", "1234", "CUSTOMER", "Player")
But the BigQuery UI showed me a popup that said:
Column chatSessionId is not present in table my_dataset.free_schema_table at [1:43]
I expected that BigQuery was a NoSQL store and that I should be able to insert rows with different columns.
How could I achieve this?
BigQuery requires a schema with strongly typed columns.
If you need a free schema, the closest equivalent in BigQuery is to define a single STRING column and store JSON inside it.
JSON functions will help you extract fields from the JSON string later, but you lose the optimizations BigQuery applies when you predefine your schema and store the data in separate columns.
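A minimal sketch of that pattern (dataset, table and column names are illustrative):

CREATE TABLE my_dataset.free_schema_table (payload STRING);

INSERT INTO my_dataset.free_schema_table (payload)
VALUES ('{"chatSessionId":"123","senderType":"CUSTOMER","senderFriendlyName":"Player"}');

-- extract individual fields later with JSON functions
SELECT JSON_EXTRACT_SCALAR(payload, '$.chatSessionId') AS chatSessionId
FROM my_dataset.free_schema_table;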

How to use a lookup table column in a where/in clause in a Splunk search query

I want the search to match a field against any of the values in a lookup table.
For now, I have used the where ... in query below, but I want to query against the lookup table instead of manually putting all those values in double quotes in the in clause.
| where in(search, "abcd", "bcda", "efsg", "zyca")
First, you need to create a lookup definition in the Splunk Lookup manager. Here you can specify a CSV file or KMZ file as the lookup, and you name the lookup definition here too. Be sure to share this lookup definition with the applications that will use it.
Once you have a lookup definition created, you can use it in a query with the lookup command. Say you named your lookup definition "my_lookup_csv", the lookup column in your search is "event_column", and your CSV column names are "column1", "column2", etc. Your search query will now end in:
| lookup my_lookup_csv column1 as event_column
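As a hedged follow-up sketch (the index name is assumed), you can then keep only the events that actually matched by asking the lookup to OUTPUT one of its columns and filtering on it:

index=my_index
| lookup my_lookup_csv column1 as event_column OUTPUT column2
| where isnotnull(column2)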

PDI /Kettle - Passing data from previous hop to database query

I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle is selecting only the vendor name from the 8 fields in the CSV.
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, or how to pass the desired value to that variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
The best practice is a Stream lookup: for each record in the main flow (VendorRatings), look up the vendor details (the lookup fields) in the reference file (the CSV), based on its identifier (possibly its number or name, or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input.
The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.
Given that:
there is at most one VendorRatings record per vendor, and
you have to do something when there is no match,
I suggest the following flow:
Read the CSV and, for each row, look up in the table (i.e. the lookup source is the SQL table rather than the CSV file), with a default value when there is no match. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, a filter redirects the flow to the alternative action (here: insert into the SQL table), and the two flows are merged back into the downstream flow.
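On the original parameterized query: a hedged sketch, assuming a Table input step with "Insert data from step" pointing at the CSV input and "Execute for each row" checked. Each ? in the SQL is then bound, in order, to the fields of the incoming row, so keep only the vendor field (e.g. with a Select values step) before this step:

-- the single ? receives the vendor field of each incoming row
SELECT * FROM VendorRatings WHERE vendor = ?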

Talend - Read the schema of an LDAP input from an XML file

What am I looking for?
I want to read the schema of a tLDAPInput component from an XML file.
Info:
The user will define the attributes that he wants in the XML file.
The job will retrieve from the LDAP directory only those attributes that are defined in the XML. How can I do that?
I am new to Talend and I can't find any question on this on SO.
Honestly, this is very painful to do properly, and I'd seriously reconsider why you need to limit the columns coming back from the LDAP service rather than just ignoring the extraneous columns.
First of all, you need to parse your XML input to get the requested columns, drop them into a list, and then lob that into the globalMap.
Then you'll need to read in the entire output, with all the columns, from a correctly configured tLDAPInput component, but with the component's schema set to a single dynamic column.
From there, use a tJavaRow/tJavaFlex component to loop through the list of expected columns from your XML input, retrieve each column's name from the dynamic column's metadata, and, if the column name matches one of the values from your XML input, output the value into an output column.
The output schema for your tJavaRow/tJavaFlex will need to contain as many columns as could possibly be returned (so every LDAP column for your service), populated only as needed. Alternatively, you could output another dynamic schema column, which means you don't need fixed schema columns, but you'd have to add a meta column (a column inside the dynamic column) for each match of column names.
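A hedged sketch of that tJavaRow logic, assuming the requested column names were stored in the globalMap under the key "requestedColumns" and the input row carries a single dynamic column named dyn (all names are illustrative):

// retrieve the requested column names parsed from the XML earlier in the job
java.util.List<String> requested =
    (java.util.List<String>) globalMap.get("requestedColumns");

// walk the dynamic column's metadata and copy matching values across
routines.system.Dynamic dyn = input_row.dyn;
for (int i = 0; i < dyn.getColumnCount(); i++) {
    String name = dyn.getColumnMetadata(i).getName();
    if (requested.contains(name)) {
        Object value = dyn.getColumnValue(i);
        // copy into the matching fixed output column, e.g.:
        // if ("mail".equals(name)) output_row.mail = (String) value;
    }
}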