Databricks Job API: create job with single node cluster

I am trying to figure out why I get the following error when I use the Databricks Job API.
{
  "error_code": "INVALID_PARAMETER_VALUE",
  "message": "Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size"
}
What I did:
I created a Job running on a single node cluster using the Databricks UI.
I copied and pasted the job config JSON from the UI.
I deleted my job and tried to recreate it by sending a POST to the Job API with the copied JSON, which looks like this:
{
  "new_cluster": {
    "spark_version": "7.5.x-scala2.12",
    "spark_conf": {
      "spark.master": "local[*]",
      "spark.databricks.cluster.profile": "singleNode"
    },
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "driver_node_type_id": "Standard_DS3_v2",
    "custom_tags": {
      "ResourceClass": "SingleNode"
    },
    "enable_elastic_disk": true
  },
  "libraries": [
    {
      "pypi": {
        "package": "koalas==1.5.0"
      }
    }
  ],
  "notebook_task": {
    "notebook_path": "/pathtoNotebook/TheNotebook",
    "base_parameters": {
      "param1": "test"
    }
  },
  "email_notifications": {},
  "name": " jobName",
  "max_concurrent_runs": 1
}
The API documentation does not help (I can't find anything about settings.cluster_spec.new_cluster.size). The JSON is copied from the UI, so I would expect it to be correct.
Thanks for your help.

Source: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#--create
To create a Single Node cluster, include the spark_conf and custom_tags entries shown in the example and set num_workers to 0.
{
  "cluster_name": "single-node-cluster",
  "spark_version": "7.6.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 0,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": {
    "ResourceClass": "SingleNode"
  }
}
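Applied to the job payload from the question, the fix is to add "num_workers": 0 inside new_cluster. Below is a minimal sketch of recreating the job through the jobs/create endpoint; the workspace URL and token are placeholders, and the azure_attributes, driver_node_type_id and enable_elastic_disk fields from the original payload were left out for brevity but can be kept.
import requests

# Sketch: the original payload with "num_workers": 0 added to new_cluster,
# which is what the Single Node validation requires.
payload = {
    "name": "jobName",
    "max_concurrent_runs": 1,
    "new_cluster": {
        "spark_version": "7.5.x-scala2.12",
        "num_workers": 0,  # required for a Single Node cluster
        "spark_conf": {
            "spark.master": "local[*]",
            "spark.databricks.cluster.profile": "singleNode",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
        "node_type_id": "Standard_DS3_v2",
    },
    "notebook_task": {
        "notebook_path": "/pathtoNotebook/TheNotebook",
        "base_parameters": {"param1": "test"},
    },
    "libraries": [{"pypi": {"package": "koalas==1.5.0"}}],
}

resp = requests.post(
    "https://<databricks-instance>/api/2.0/jobs/create",  # placeholder workspace URL
    headers={"Authorization": "Bearer <personal-access-token>"},  # placeholder token
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"job_id": ...}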

Related

"INVALID_CURSOR_ARGUMENTS" from Github graphql API

I am using the following query:
query myOrgRepos {
  organization(login: "COMPANY_NAME") {
    repositories(first: 100) {
      edges {
        node {
          name
          defaultBranchRef {
            target {
              ... on Commit {
                history(after: "2021-01-01T23:59:00Z", before: "2023-02-06T23:59:00Z", author: { emails: "USER_EMAIL" }) {
                  edges {
                    node {
                      oid
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
But even with accurate names for the organization and emails, I am persistently getting the following error for every repo.
{
  "type": "INVALID_CURSOR_ARGUMENTS",
  "path": [
    "organization",
    "repositories",
    "edges",
    20,
    "node",
    "defaultBranchRef",
    "target",
    "history"
  ],
  "locations": [
    {
      "line": 10,
      "column": 29
    }
  ],
  "message": "`2021-01-01T23:59:00Z` does not appear to be a valid cursor."
},
If I remove the after field, it works just fine. However, I kind of need it. According to all the docs I have read, both after and before take the same timestamp. I can't tell where I am going wrong here.
I have tried:
narrowing the gap between before and after
returning only a single repository
removing after (it works fine without it)
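For what it's worth, the error text above hints that after expects a pagination cursor rather than a timestamp, while GitHub's commit history connection also accepts since and until arguments that do take timestamps. A hedged Python sketch of that variant (org, email, and token are placeholders, and the requests library is assumed):
import requests

# Sketch: date filtering via since/until instead of passing a timestamp to
# the `after` cursor argument. Placeholders: COMPANY_NAME, USER_EMAIL, token.
QUERY = """
query myOrgRepos($org: String!, $email: String!) {
  organization(login: $org) {
    repositories(first: 100) {
      edges {
        node {
          name
          defaultBranchRef {
            target {
              ... on Commit {
                history(since: "2021-01-01T23:59:00Z", until: "2023-02-06T23:59:00Z",
                        author: { emails: [$email] }) {
                  edges { node { oid } }
                }
              }
            }
          }
        }
      }
    }
  }
}
"""

resp = requests.post(
    "https://api.github.com/graphql",
    json={"query": QUERY, "variables": {"org": "COMPANY_NAME", "email": "USER_EMAIL"}},
    headers={"Authorization": "Bearer <github-token>"},  # placeholder token
)
resp.raise_for_status()
print(resp.json())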

How to fail the pipeline if the Artifactory AQL response status is not 200

I have an AQL query for cleaning up Artifactory artifacts. Sometimes my query gets a response status other than 200, but no matter what, the pipeline still ends successfully. My question: if I get a response status other than 200, how do I make the pipeline fail? The pipeline should succeed only with output like the sample below.
Output sample:
{
  "status": "success",
  "totals": {
    "success": 18,
    "failure": 0
  }
}
My AQL query:
{
  "files": [{
    "aql": {
      "items.find": {
        "$or": [
          {
            "$and": [
              {"repo": "generic-temp"},
              {"created": {"$before": "25days"}}
            ]
          },
          {
            "$and": [
              {"repo": "generic-temp"},
              {
                "$and": [
                  {"name": {"$nmatch": "*-test"}},
                  {"name": {"$nmatch": "*-sample.*"}}
                ]
              }
            ]
          }
        ]
      }
    }
  }]
}
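One way to enforce that, assuming the cleanup is executed through the JFrog CLI and prints the summary shown in the output sample above (the command, flags and spec path below are placeholders): wrap the call in a small script that exits non-zero unless the summary is a clean success, which most CI systems treat as a failed step.
import json
import subprocess
import sys

# Hedged sketch: run the cleanup via the JFrog CLI (adjust command and spec
# path to your setup) and parse the summary it prints to stdout.
result = subprocess.run(
    ["jfrog", "rt", "del", "--spec=cleanup-spec.json", "--quiet"],
    capture_output=True,
    text=True,
)

try:
    summary = json.loads(result.stdout)
except json.JSONDecodeError:
    print("Could not parse CLI output:", result.stdout, result.stderr)
    sys.exit(1)

# Break the pipeline unless the run is a clean success.
failures = summary.get("totals", {}).get("failure", 0)
if result.returncode != 0 or summary.get("status") != "success" or failures > 0:
    print("Artifactory cleanup failed:", summary)
    sys.exit(1)

print("Artifactory cleanup succeeded:", summary)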

Rename a field within an array in DocumentDB via pymongo

I am using DocumentDB (version 3.6) in AWS. I use a Python Lambda in AWS for this task. The task is to rename a field that is inside an array. Here is the sample JSON document I have; I need to rename 'version' to 'label'.
{
  "_id": "93ee62b2-4a1f-478f-a716-b4e2f435f27d",
  "egrouping": [
    {
      "type": "Video",
      "language_code": "eng",
      "hierarchy": {
        "pype": {
          "parent": "episode",
          "version": "1",
          "uuid": "933433-4a1f-478f-a716-b4e2f435f27d"
        }
      }
    },
    {
      "type": "Captions",
      "language_code": "eng",
      "hierarchy": {
        "pype": {
          "parent": "episode",
          "version": "1",
          "uuid": "943454-4a1f-478f-a716-b4e2f435f27d"
        }
      }
    }
  ]
}
This is the code snippet I tried for renaming the 'version' field, but it ends in an error:
collection.aggregate([
    {
        "$project": {
            "egrouping": {
                "$map": {
                    "input": "$egrouping",
                    "as": "egroup",
                    "in": {
                        "hierarchy.pype.label": "$$egroup.hierarchy.pype.version"
                    }
                }
            }
        }
    }
])
But I end up with this error:
"errorMessage": "Aggregation project operator not supported: '$map'",
Amazon DocumentDB does not support $map. For the complete list of APIs that DocumentDB supports, refer to https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html.
We are constantly working backwards from the APIs our customers are looking to use. You can keep an eye on our future launches here: https://aws.amazon.com/documentdb/resources/
The $map operator is now supported in both Amazon DocumentDB versions, 3.6 and 4.0.
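As an alternative that does not depend on which aggregation operators your DocumentDB version supports, the rename can also be done client-side from the Lambda with plain pymongo. A sketch following the field names of the sample document above (collection is assumed to be the pymongo collection handle):
# Client-side rename sketch: copy pype.version to pype.label, drop the old
# key, and write each document back.
for doc in collection.find({"egrouping.hierarchy.pype.version": {"$exists": True}}):
    for entry in doc.get("egrouping", []):
        pype = entry.get("hierarchy", {}).get("pype", {})
        if "version" in pype:
            pype["label"] = pype.pop("version")
    collection.replace_one({"_id": doc["_id"]}, doc)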

dotnet-monitor and OpenTelemetry?

I'm learning OpenTelemetry and I wonder how dotnet-monitor relates to OpenTelemetry (Meter). Are those things somehow connected, or is dotnet-monitor just a custom MS tool that does not use the OpenTelemetry standards (API, SDK and exporters)?
If you run dotnet-monitor on your machine, it exposes the .NET metrics in Prometheus format, which means you can set the OpenTelemetry Collector to scrape those metrics.
For example, in the opentelemetry-collector-contrib configuration:
receivers:
  prometheus_exec:
    exec: dotnet monitor collect
    port: 52325
Please note that for dotnet-monitor to run, you need to create a settings.json file in this path:
$XDG_CONFIG_HOME/dotnet-monitor/settings.json
If $XDG_CONFIG_HOME is not defined, create the file in this path:
$HOME/.config/dotnet-monitor/settings.json
If you want to identify the process by its PID, write this into settings.json (change Value to your PID):
{
  "DefaultProcess": {
    "Filters": [{
      "Key": "ProcessId",
      "Value": "1"
    }]
  }
}
If you want to identify the process by its name, write this into settings.json (change Value to your process name):
{
  "DefaultProcess": {
    "Filters": [{
      "Key": "ProcessName",
      "Value": "iisexpress"
    }]
  }
}
In my example I used this configuration:
{
  "DefaultProcess": {
    "Filters": [{
      "Key": "ProcessId",
      "Value": "1"
    }]
  },
  "Metrics": {
    "Providers": [
      {
        "ProviderName": "System.Net.Http"
      },
      {
        "ProviderName": "Microsoft-AspNetCore-Server-Kestrel"
      }
    ]
  }
}
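To sanity-check that part, you can hit the dotnet-monitor metrics endpoint yourself. The sketch below assumes the default metrics address (port 52325) and simply prints the first lines of the Prometheus-format output that the collector would scrape:
import urllib.request

# Assumption: dotnet-monitor is running locally with its default metrics URL.
with urllib.request.urlopen("http://localhost:52325/metrics") as resp:
    body = resp.read().decode("utf-8")

# The response is plain-text Prometheus exposition format.
for line in body.splitlines()[:20]:
    print(line)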

Kafka Connect: How to extract a field

I'm using Debezium SQL Server Connector to stream a table into a topic. Thanks to Debezium's ExtractNewRecordState SMT, I'm getting the following message in my topic.
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int64",
        "optional": false,
        "field": "id"
      },
      {
        "type": "string",
        "optional": false,
        "field": "customer_code"
      },
      {
        "type": "string",
        "optional": false,
        "field": "topic_name"
      },
      {
        "type": "string",
        "optional": true,
        "field": "payload_key"
      },
      {
        "type": "boolean",
        "optional": false,
        "field": "is_ordered"
      },
      {
        "type": "string",
        "optional": true,
        "field": "headers"
      },
      {
        "type": "string",
        "optional": false,
        "field": "payload"
      },
      {
        "type": "int64",
        "optional": false,
        "name": "io.debezium.time.Timestamp",
        "version": 1,
        "field": "created_on"
      }
    ],
    "optional": false,
    "name": "test_server.dbo.kafka_event.Value"
  },
  "payload": {
    "id": 129,
    "customer_code": "DVTPRDFT411",
    "topic_name": "DVTPRDFT411",
    "payload_key": null,
    "is_ordered": false,
    "headers": "{\"kafka_timestamp\":1594566354199}",
    "payload": "MSG 18",
    "created_on": 1594595154267
  }
}
After adding value.converter.schemas.enable=false, I could get rid of the schema portion and only the payload part is left as shown below.
{
  "id": 130,
  "customer_code": "DVTPRDFT411",
  "topic_name": "DVTPRDFT411",
  "payload_key": null,
  "is_ordered": false,
  "headers": "{\"kafka_timestamp\":1594566354199}",
  "payload": "MSG 19",
  "created_on": 1594595154280
}
I'd like to go one step further and extract only the customer_code field. I tried the ExtractField$Value SMT, but I keep getting the exception IllegalArgumentException: Unknown field: customer_code.
My configuration is as follows:
transforms=unwrap,extract
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.drop.tombstones=true
transforms.unwrap.delete.handling.mode=drop
transforms.extract.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extract.field=customer_code
I tried a bunch of other SMTs, including ExtractField$Key and ValueToKey, but I couldn't make it work. I'd be very grateful if you could show me what I've done wrong. According to this tutorial from Confluent, it should work, but it didn't.
** UPDATE **
I'm running Kafka Connect using connect-standalone worker.properties sqlserver.properties.
worker.properties
offset.storage.file.filename=C:/development/kafka_2.12-2.5.0/data/kafka/connect/connect.offsets
plugin.path=C:/development/kafka_2.12-2.5.0/plugins
bootstrap.servers=127.0.0.1:9092
offset.flush.interval.ms=10000
rest.port=10082
rest.host.name=127.0.0.1
rest.advertised.port=10082
rest.advertised.host.name=127.0.0.1
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
sqlserver.properties
name=sql-server-connector
connector.class=io.debezium.connector.sqlserver.SqlServerConnector
database.hostname=127.0.0.1
database.port=1433
database.user=sa
database.password=dummypassword
database.dbname=STGCTR
database.history.kafka.bootstrap.servers=127.0.0.1:9092
database.server.name=wfo
table.whitelist=dbo.kafka_event
database.history.kafka.topic=db_schema_history
transforms=unwrap,extract
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.drop.tombstones=true
transforms.unwrap.delete.handling.mode=drop
transforms.extract.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.extract.field=customer_code
The schema and payload fields suggest you're working with data that was serialized by a JsonConverter with schemas enabled.
You can just set value.converter.schemas.enable=false to achieve your goal.