How to use a MarkLogic DataHub importFlow from inside another database - marklogic-9

I have a use case that needs to query some data in one database and then use that data as new input into the MarkLogic DataHub pipeline.
I created a working import and harmonization flow.
Now I want to run the import flow from another database to insert data into the staging database of the DHF.
'use strict';
let id = "/ClueyTest/track/cluey/a2c5c32c-6e99-47c9-8b4d-5b97897509f7.json";
let options = {
  "dhf.projectName": "ClueyTest",
  "entity": "Track",
  "flow": "ImportClueyTracks",
  "flowType": "input",
  "dataFormat": "json"
};
let rawContent = {
  "trackId": "a2c5c32c-6e99-47c9-8b4d-5b97897509f7",
  "type": "Feature",
  "geometry": {
    "type": "LineString",
    "coordinates": [
      [5.4701967, 51.8190698],
      [5.470028, 51.8193624],
      [5.470038, 51.8193624],
      [5.470048, 51.8193624],
      [5.470028, 51.8193634]
    ]
  },
  "properties": {
    "timestamps": [
      "2019-02-14T16:52:06+0100",
      "2019-02-14T16:51:07+0100",
      "2019-02-14T16:43:24+0100",
      "2019-02-14T16:43:24+0100",
      "2019-02-14T16:43:24+0100"
    ]
  },
  "tracktype": "on",
  "endTimestamp": "2019-02-14T16:51:07+0100",
  "startTimestamp": "2019-02-14T14:46:50+0100"
};
// the main module of the import flow
const clt = require('/entities/clueyTrack/input/ImportClueyTracks/main.sjs');
clt.main(id, rawContent, options);
Obviously you need a working import flow inside your DataHub to run this code, but the question is about the general use case: how do you run an import flow not from Gradle, but from inside a MarkLogic database?
All DHF code is SJS (server-side JavaScript).

I think using the Server-Side Library would be most elegant, but it does require DataHub v4+:
https://marklogic.github.io/marklogic-data-hub/refs/server-side-library/
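If you are on an older DHF and just need the write to land in the staging database while your code runs against a different database, a plain xdmp.invokeFunction wrapper around the call from the question is another option. This is only a rough sketch, not the DHF library itself, and "data-hub-STAGING" is an assumed database name:
'use strict';
// Sketch only: evaluate the existing import-flow call against the staging database.
// id, rawContent and options are the variables from the snippet in the question;
// "data-hub-STAGING" is an assumption -- use your own DHF staging database name.
const runImport = function () {
  const clt = require('/entities/clueyTrack/input/ImportClueyTracks/main.sjs');
  return clt.main(id, rawContent, options);
};
xdmp.invokeFunction(runImport, {
  database: xdmp.database('data-hub-STAGING'),
  update: 'true'
});
It is not as clean as the Server-Side Library, but it keeps everything inside MarkLogic.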
HTH!

Related

How to add mock data containing authorized owner to Amplify (either through GraphiQL or directly to AppSync)?

I'm working on a React app and testing some CRUD functionality by mocking the backend, creating some data through GraphiQL, and running the app (amplify mock, then yarn start).
I want to be able to create mock data tied to my user as the owner because most types in the schema are set up with owner authorization:
type XYZ
  @auth(rules: [{ allow: owner, operations: [update, delete, create] }]) {
  id: ID!
  ...more types...etc
}
Right now, I:
1. Run amplify mock
2. Go to the GraphiQL local endpoint (192.etc....)
3. Run some createXYZ mutations to create data
4. Run my app with yarn start
5. Log in with testUser & password
6. Test the deleteXYZ button, which should ideally remove a particular XYZ from the mocked data (this is what doesn't work)
I suspect what's happening is that I didn't run the createXYZ mutation as testUser, just as a generic GraphiQL user, so the owner property isn't tied to "myUserId". Is that the problem here?
How would I specify owner on my create mutations in GraphiQL?
This is the error I'm getting; I'm pretty sure it means the XYZ object's owner is different from the testUser submitting the deleteXYZ request:
Error while executing Local DynamoDB
{
"version": "2018-05-29",
"operation": "DeleteItem",
"key": {
"id": {
"S": "18b152a6-c98d-4336-be74-1e122191"
}
},
"condition": {
"expression": "( #owner0 = :identity0) AND attribute_exists(#id)",
"expressionNames": {
"#owner0": "owner",
"#id": "id"
},
"expressionValues": {
":identity0": {
"S": "fd2a7758-f7ba-4d57-bdb0-e5346492"
}
}
}
}
Do I have to add the owner id in Amplify's GraphiQL Auth options popup?
I just ran into this issue. I was able to work around it by putting my Cognito User Sub in the username field.
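For illustration, if your Cognito user's sub is the :identity0 value from the error above, the mock auth settings would need to carry claims along these lines (a sketch; the exact field names offered by the popup may differ):
{
  "username": "fd2a7758-f7ba-4d57-bdb0-e5346492",
  "sub": "fd2a7758-f7ba-4d57-bdb0-e5346492",
  "cognito:groups": []
}
The goal is that the owner stored on items you create through GraphiQL matches the identity your app sends later, so the #owner0 = :identity0 condition on DeleteItem passes.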

How to run a SQL query in Cloud Formation template to enable Delayed_Durability in AWS RDS

I have a CloudFormation template to create a SQL Server DB in RDS and want to enable the DELAYED_DURABILITY feature by default by running this query:
ALTER DATABASE dbname SET DELAYED_DURABILITY = FORCED;
Is there a way to run this query right after db instance is created through CF template?
My CF template looks like this:
"Type":"AWS::RDS::DBInstance",
"Properties":{
"AllocatedStorage":"200",
"AutoMinorVersionUpgrade":"false",
"BackupRetentionPeriod":"1",
"DBInstanceClass":"db.m4.large",
"DBInstanceIdentifier":"mydb",
"DBParameterGroupName": {
"Ref": "MyDBParameterGroup"
},
"DBSubnetGroupName":{
"Ref":"dbSubnetGroup"
},
"Engine":"sqlserver-web",
"EngineVersion":"13.00.4422.0.v1",
"LicenseModel":"license-included",
"MasterUsername":"prod_user",
"MasterUserPassword":{ "Ref" : "dbpass" },
"MonitoringInterval":"60",
"MonitoringRoleArn": {
"Fn::GetAtt": [
"RdsMontioringRole",
"Arn"
]
},
"PreferredBackupWindow":"09:39-10:09",
"PreferredMaintenanceWindow":"Sun:08:58-Sun:09:28",
"PubliclyAccessible": false,
"StorageType":"gp2",
"StorageEncrypted": true,
"VPCSecurityGroups":[
{
"Fn::ImportValue":{
"Fn::Sub":"${NetworkStackName}-RDSSecGrp"
}
}
],
"Tags":[
{
"Key":"Name",
"Value":"my-db"
}
]
}
}
Is there a way to run this query right after db instance is created through CF template?
Depends. If you want to do it from within CloudFormation (CFN) then sadly, you can't do this using plain CFN. To do it from CFN, you would have to develop a custom resource. The resource would be in the form of a Lambda function. You would pass the DB details to the function in your CFN, and it could run and execute your query. It could also return any results you want to your CFN for further use.
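A minimal sketch of such a custom-resource Lambda in Node.js, assuming the mssql package is bundled with the function and that the DB endpoint, credentials and database name are passed in as custom resource properties (cfn-response is only bundled automatically for inline ZipFile code, so package a copy of it alongside mssql):
// Sketch of a CloudFormation custom-resource handler that runs the ALTER DATABASE
// statement once the RDS instance exists. The property names (DbHost, DbUser,
// DbPassword, DbName) are assumptions defined by your own custom resource.
const sql = require('mssql');
const response = require('cfn-response');

exports.handler = (event, context) => {
  // Only run the statement on stack creation; acknowledge Update/Delete as-is.
  if (event.RequestType !== 'Create') {
    return response.send(event, context, response.SUCCESS, {});
  }
  const p = event.ResourceProperties;
  sql.connect({
    server: p.DbHost,
    user: p.DbUser,
    password: p.DbPassword,
    options: { encrypt: true }
  })
    .then(() => sql.query(`ALTER DATABASE [${p.DbName}] SET DELAYED_DURABILITY = FORCED`))
    .then(() => response.send(event, context, response.SUCCESS, {}))
    .catch(err => response.send(event, context, response.FAILED, { Error: err.message }));
};
In the template, the custom resource would DependsOn the AWS::RDS::DBInstance so the query only runs after the instance is available.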
In contrast, if you create your CFN stack using the AWS CLI or SDK, then once the create-stack call completes, you can run your query from bash or whatever programming language you use to deploy your stack.
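For the CLI route, a rough sketch (stack name, template path and credentials are placeholders; the instance identifier, user and query come from the question):
aws cloudformation create-stack --stack-name my-db-stack --template-body file://rds.json
aws cloudformation wait stack-create-complete --stack-name my-db-stack

# Look up the instance endpoint, then run the statement with sqlcmd
ENDPOINT=$(aws rds describe-db-instances --db-instance-identifier mydb \
  --query 'DBInstances[0].Endpoint.Address' --output text)
sqlcmd -S "$ENDPOINT" -U prod_user -P "$DB_PASSWORD" \
  -Q "ALTER DATABASE dbname SET DELAYED_DURABILITY = FORCED"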

Is it possible to read google sheets *metadata* only with API key?

It is possible to read data from a sheet only with API key (without OAuth 2.0), but it seems that reading the developer metadata requires OAuth 2.0.
Is there some way to read the metadata from an app without asking the user to connect his google account?
You want to retrieve the developer metadata of the Spreadsheet using the API key.
You have already been able to get values from Spreadsheet using the API key.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Issue and workaround:
Unfortunately, "REST Resource: spreadsheets.developerMetadata" in Sheets API cannot be used with the API key. In this case, OAuth2 is required as mentioned in your question. The developer metadata can be also retrieved by the method of spreadsheets.get in Sheets API. The developer metadata can be retrieved by the API key. And in this method, all developer metadata is retrieved. So when you want to search the developer metadata, please search it from the retrieved all developer metadata.
IMPORTANT POINTS:
In this case, please set the visibility of the developer metadata to DOCUMENT. With this, the developer metadata can be retrieved with an API key. If the visibility is PROJECT, it cannot be retrieved with an API key. Please be careful about this.
When you want to retrieve the developer metadata with an API key, please share the Spreadsheet publicly. With this, it can be retrieved with an API key. Please be careful about this.
Sample situation 1:
As a sample situation, suppose you create a new Spreadsheet and add new developer metadata to the Spreadsheet with the key "sampleKey" and the value "sampleValue".
In this case, the sample request body of spreadsheets.batchUpdate is as follows.
{
"requests": [
{
"createDeveloperMetadata": {
"developerMetadata": {
"location": {
"spreadsheet": true
},
"metadataKey": "sampleKey",
"metadataValue": "sampleValue",
"visibility": "DOCUMENT"
}
}
}
]
}
Sample curl command:
When you retrieve the developer metadata from above sample Spreadsheet, please use the following curl command.
curl "https://sheets.googleapis.com/v4/spreadsheets/### spreadsheetId ###?key=### your API key ###&fields=developerMetadata"
In this case, fields=developerMetadata is used to make it easier to see the response value. Of course, you can also use * as fields.
In this case, because this is a GET request, you can also see the retrieved value by putting the above endpoint into a browser.
Result:
{
"developerMetadata": [
{
"metadataId": 123456789,
"metadataKey": "sampleKey",
"metadataValue": "sampleValue",
"location": {
"locationType": "SPREADSHEET",
"spreadsheet": true
},
"visibility": "DOCUMENT"
}
]
}
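Because spreadsheets.get returns all document-level developer metadata, picking out one key happens on the client side. A small sketch in JavaScript (Node 18+ fetch; the spreadsheet ID and API key are placeholders):
// Sketch: fetch the spreadsheet with an API key and pick out one metadata key.
// The Spreadsheet must be publicly shared and the metadata visibility must be
// DOCUMENT, as described above. SPREADSHEET_ID and API_KEY are placeholders.
const SPREADSHEET_ID = '### spreadsheetId ###';
const API_KEY = '### your API key ###';

async function getMetadataValue(key) {
  const url = 'https://sheets.googleapis.com/v4/spreadsheets/' + SPREADSHEET_ID +
    '?key=' + API_KEY + '&fields=developerMetadata';
  const res = await fetch(url);
  const body = await res.json();
  const hit = (body.developerMetadata || []).find(md => md.metadataKey === key);
  return hit ? hit.metadataValue : null;
}

getMetadataValue('sampleKey').then(value => console.log(value)); // "sampleValue"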
Sample situation 2:
As another situation, suppose you create a new Spreadsheet and add new developer metadata to the 1st column (column "A") with the key "sampleKey" and the value "sampleValue".
In this case, the sample request body is as follows.
{
"requests": [
{
"createDeveloperMetadata": {
"developerMetadata": {
"location": {
"dimensionRange": {
"sheetId": 0,
"startIndex": 0,
"endIndex": 1,
"dimension": "COLUMNS"
}
},
"metadataKey": "sampleKey",
"metadataValue": "sampleValue",
"visibility": "DOCUMENT"
}
}
}
]
}
Sample curl command:
When you retrieve the developer metadata from above sample Spreadsheet, please use the following curl command.
curl "https://sheets.googleapis.com/v4/spreadsheets/### spreadsheetId ###?key=### your API key ###&fields=sheets(data(columnMetadata(developerMetadata)))"
In this case, sheets(data(columnMetadata(developerMetadata))) is used to make it easier to see the response value. Of course, you can also use * as fields.
Result:
{
"sheets": [
{
"data": [
{
"columnMetadata": [
{
"developerMetadata": [
{
"metadataId": 123456789,
"metadataKey": "sampleKey",
"metadataValue": "sampleValue",
"location": {
"locationType": "COLUMN",
"dimensionRange": {
"dimension": "COLUMNS",
"startIndex": 0,
"endIndex": 1
}
},
"visibility": "DOCUMENT"
}
]
},
{},
,
,
]
}
]
}
]
}
References:
Method: spreadsheets.developerMetadata.get
DeveloperMetadataVisibility
If I misunderstood your question and this was not the direction you want, I apologize.

Google BigQuery connector (Connect Data Studio to BigQuery tables) - I would like to modify this connector to customize for my special requirements

I need to modify the Google Data Studio - Google BigQuery connector for my customized requirements.
https://support.google.com/datastudio/answer/6370296
First Question: How could I find the source code for this data connector?
Second question:
According to the guide, https://developers.google.com/datastudio/connector/reference, getData(),
Returns the tabular data for the given request.
And the response is in this format
{
"schema":[
{
"name":"OpportunityName",
"dataType":"STRING"
},
{
"name":"IsVerified",
"dataType":"BOOLEAN"
},
{
"name":"Created",
"dataType":"STRING"
},
{
"name":"Amount",
"dataType":"NUMBER"
}
],
"rows":[
{
"values":[
"Interesting",
true,
"2017-05-23",
"120453.65"
]
},
{
"values":[
"SF",
false,
"2017-03-03",
"362705286.92"
]
},
{
"values":[
"Spring Sale",
true,
"2017-04-21",
"870.12"
]
}
],
"cachedData":true
}
But BigQuery could have 100 million records in the table. Even if it is 100 million records, do we just give the response in this format anyway?
Thanks!
The existing DS-BQ connector is not open source, hence you won't be able to modify its behavior.
With that said:
The DS-BQ connector has a "smarter" API contract than the open one - queries and filters will be passed down.
Feel free to create your own DS-BQ connector with whatever logic you might require! Community connectors would love your contributions.
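As a rough sketch of what a custom community connector's getData() could look like when it pushes the requested fields down to BigQuery (Apps Script with the BigQuery advanced service enabled; the project, dataset and table names are placeholders, and getSchema() is assumed to be your own schema function):
// Sketch of getData() for a custom community connector (Apps Script).
// Assumes the BigQuery advanced service is enabled; PROJECT_ID, dataset and
// table are placeholders for your own resources.
var PROJECT_ID = 'my-gcp-project';

function getData(request) {
  // Query only the fields Data Studio actually asked for, with a limit pushed down.
  var requestedFields = request.fields.map(function (f) { return f.name; });
  var query = 'SELECT ' + requestedFields.join(', ') +
              ' FROM `my-dataset.my-table` LIMIT 10000';
  var result = BigQuery.Jobs.query({ query: query, useLegacySql: false }, PROJECT_ID);

  // Return the schema/rows structure shown in the question.
  var schema = getSchema(request).schema.filter(function (field) {
    return requestedFields.indexOf(field.name) !== -1;
  });
  var rows = (result.rows || []).map(function (row) {
    return { values: row.f.map(function (cell) { return cell.v; }) };
  });
  return { schema: schema, rows: rows };
}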

New-StreamAnalyticsJob cannot create Operations Monitoring Input for an IOT Hub

We have a Stream Analytics job that has an Input mapping to an IoT Hub Operations Monitoring endpoint. We originally defined our job in the Azure Portal, and it works fine when created / updated that way.
We use the job logic in multiple "Azure environments" and are now keeping it in source control. We used the Visual Studio Stream Analytics Project type to manage the source code.
We are using the New-StreamAnalyticsJob Powershell command to deploy our job into different environments.
Each time we deploy, however, the resulting Stream Analytics Job's Input points to the Messaging endpoint of our IOT Hub instead of the Operations Monitoring endpoint.
Is there something we can enter into the input's JSON file to express the endpoint type? Here is the Input content of our JSON input to the cmdlet:
"Inputs": [{
"Name": "IOT-Hub-Monitoring-By-Consumer-Group",
"Properties": {
"DataSource": {
"Properties": {
"ConsumerGroupName": "theConsumerGroup",
"IotHubNamespace": "theIotNamespace",
"SharedAccessPolicyKey": null,
"SharedAccessPolicyName": "iothubowner"
},
"Type": "Microsoft.Devices/IotHubs"
},
"Serialization": {
"Properties": {
"Encoding": "UTF8",
"Format": "LineSeparated"
},
"Type": "Json"
},
"Type": "Stream"
}
},
{
"Name": "IOT-Hub-Messaging-By-Consumer-Group",
"Properties": {
"DataSource": {
"Properties": {
"ConsumerGroupName": "anotherConsumerGroup",
"IotHubNamespace": "theIotNamespace",
"SharedAccessPolicyKey": null,
"SharedAccessPolicyName": "iothubowner"
},
"Type": "Microsoft.Devices/IotHubs"
},
"Serialization": {
"Properties": {
"Encoding": "UTF8",
"Format": "LineSeparated"
},
"Type": "Json"
},
"Type": "Stream"
}
}
]
Is there an endpoint element within the IotHubProperties that we're not expressing? Is it documented somewhere?
I notice that the Azure Portal calls a different endpoint than is indicated here: https://learn.microsoft.com/en-us/rest/api/streamanalytics/stream-analytics-definition
It uses endpoints under https://main.streamanalytics.ext.azure.com/api. e.g.
GET /api/Jobs/GetStreamingJob?subscriptionId={guid}&resourceGroupName=MyRG&jobName=MyJobName
You'll notice in the results JSON:
{
"properties": {
"inputs": {
{
"properties": {
"datasource": {
"inputIotHubSource": {
"iotHubNamespace":"HeliosIOTHubDev",
"sharedAccessPolicyName":"iothubowner",
"sharedAccessPolicyKey":null,
---> "endpoint":"messages/events", <---
"consumerGroupName":"devicehealthmonitoring"
}
For operations monitoring you will see "endpoint":"messages/operationsMonitoringEvents"
They seem to implement Save for Inputs as PATCH /api/Inputs/PatchInput?... which takes a similarly constructed JSON with the same 2 values for endpoint.
Are you able to use that endpoint somehow? i.e. call New-AzureRmStreamAnalyticsJob as you normally would then Invoke-WebRequest -Method Patch -Uri ...
--Edit--
The Invoke-WebRequest was a no-go -- far too much authentication to try to replicate/emulate.
A better option is to go through this tutorial to create a console application and set the endpoint after deploying using the Powershell scripts.
Something like this should work (albeit with absolutely no error/null checks):
string tenantId = "..."; //Tenant Id Guid
string subscriptionId = "..."; //Subscription Id Guid
string rgName = "..."; //Name of Resource Group
string jobName = "..."; //Name of Stream Analytics Job
string inputName = "..."; //Name-of-Input-requiring-operations-monitoring
string accesskey = "..."; //Shared Access Key for the IoT Hub
var login = new ServicePrincipalLoginInformation();
login.ClientId = "..."; //Client / Application Id for AD Service Principal (from tutorial)
login.ClientSecret = "..."; //Password for AD Service Principal (from tutorial)
var environment = new AzureEnvironment
{
AuthenticationEndpoint = "https://login.windows.net/",
GraphEndpoint = "https://graph.windows.net/",
ManagementEndpoint = "https://management.core.windows.net/",
ResourceManagerEndpoint = "https://management.azure.com/",
};
var credentials = new AzureCredentials(login, tenantId, environment)
.WithDefaultSubscription(subscriptionId);
var azure = Azure
.Configure()
.WithLogLevel(HttpLoggingDelegatingHandler.Level.Basic)
.Authenticate(credentials)
.WithDefaultSubscription();
var client = new StreamAnalyticsManagementClient(credentials);
client.SubscriptionId = azure.SubscriptionId;
var job = client.StreamingJobs.List(expand: "inputs").Where(j => j.Name == jobName).FirstOrDefault();
var input = job.Inputs.Where(i => i.Name == inputName).FirstOrDefault();
var props = input.Properties as StreamInputProperties;
var ds = props.Datasource as IoTHubStreamInputDataSource;
ds.Endpoint = "messages/operationsMonitoringEvents";
ds.SharedAccessPolicyKey = accesskey;
client.Inputs.CreateOrReplace(input, rgName, jobName, inputName);
The suggestion from @DaveMontgomery was a good one but turned out not to be needed.
A simple cmdlet upgrade addressed the issue.
The root issue turned out to be that the Azure PowerShell cmdlets, up to and including version 4.1.x, were using an older version of the Microsoft.Azure.Management.StreamAnalytics assembly, namely 1.0. Version 2.0 of Microsoft.Azure.Management.StreamAnalytics came out some months ago, and that release included, as I understand it, adding an endpoint element to the Inputs JSON structure.
The new cmdlets release is documented here: https://github.com/Azure/azure-powershell/releases/tag/v4.2.0-July2017. The commits for the release include https://github.com/Azure/azure-powershell/commit/0c00632aa8f767e58077e966c04bb6fc505da1ef, which upgrades to Microsoft.Azure.Management.StreamAnalytics v2.0.
Note that this was a breaking change, in that the JSON changed from PascalCase to camelCase.
With this change in hand we can add an endpoint element to the Properties / DataSource / Properties of the IoT input, and the as-deployed Stream Analytics job contains an IoT input properly wired to the operationsMonitoring endpoint.
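For reference, a sketch of how the monitoring input from the original JSON might look with the newer cmdlets, using the camelCase names from the breaking change. The exact casing and placement of the endpoint element is an assumption pieced together from the portal response shown above, so verify it against the v2.0 schema:
"inputs": [{
    "name": "IOT-Hub-Monitoring-By-Consumer-Group",
    "properties": {
        "type": "Stream",
        "datasource": {
            "type": "Microsoft.Devices/IotHubs",
            "properties": {
                "iotHubNamespace": "theIotNamespace",
                "sharedAccessPolicyName": "iothubowner",
                "sharedAccessPolicyKey": null,
                "consumerGroupName": "theConsumerGroup",
                "endpoint": "messages/operationsMonitoringEvents"
            }
        },
        "serialization": {
            "type": "Json",
            "properties": { "encoding": "UTF8", "format": "LineSeparated" }
        }
    }
}]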