Solr indexing custom JSON creates empty documents

I am quite new to Solr. I currently have it running in cloud mode using Docker Compose (my configuration can be seen at the end of the question).
I created a collection called audittrail using the default configuration. The idea is that I'll send event logging info from another app to Solr. It has a convenient-looking schema full of dynamic fields by default. (I know I shouldn't just use default settings in production; right now I'm looking for a proof of concept.)
Now I'm following this document in an attempt to index some of my data: https://lucene.apache.org/solr/guide/7_2/transforming-and-indexing-custom-json.html#mapping-parameters
curl 'http://0.0.0.0:8983/api/collections/audittrail/update/json'\
'?split=/events&'\
'f=action_kind_s:/action_kind_s&'\
'f=time_dt:/events/time_dt'\
'&echo=true' \ ########## NOTE this means we're running in debug mode: solr returns the documents it should be creating
-H 'Content-type:application/json' -d '{
"action_kind_s": "task_exec",
"events": [
{
"event_kind_s": "start",
"in_transaction_b": false,
"time_dt": "2018-03-09T12:57:07Z"
},
{
"event_kind_s": "start_txn",
"in_transaction_b": true,
"time_dt": "2018-03-09T12:57:07Z"
},
{
"event_kind_s": "diff",
"in_transaction_b": true,
"key_s": "('MerchantWorkerProcess', 5819715045818368L)",
"property_s": "claim_time",
"time_dt": "2018-03-09T12:57:07Z",
"value_dt": "2018-03-09T12:57:07Z"
},
],
"final_status_s": "COMPLETE",
"request_s": "1dfda9955dac6f3cfd76fbedee98b15f6edc0db",
"task_name_s": "0p5k20100CcnMVxaxoWl32WlfPixjV1OFKgv0k1KZ0m_acc_work"
}'
# response:
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "docs":[{},
    {},
    {}]}
That's three empty documents...
So I thought maybe it was because I wasn't specifying an id, so I gave each event a unique id and tried again with &f=id:/events/id added. Same result.
Originally I tried using wildcards (&f=/**) with the same effect.
There is obviously something missing in my understanding.
So my question is:
What should I do to get my documents populated correctly?
EDIT
Also, my Solr node logs aren't turning up any errors. Here's a sample:
2018-03-09 14:30:50.770 INFO (qtp257895351-21) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.u.p.LogUpdateProcessorFactory [audittrail_shard2_replica_n2] webapp=null path=/update/json params={split=/events}{add=[78953602-6b02-4948-8443-fd1ebc340921 (1594470800573857792)]} 0 3
2018-03-09 14:31:05.770 INFO (commitScheduler-14-thread-1) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1594470816305643520,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2018-03-09 14:31:05.770 INFO (commitScheduler-14-thread-1) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.u.SolrIndexWriter Calling setCommitData with IW:org.apache.solr.update.SolrIndexWriter#13d117d6 commitCommandVersion:1594470816305643520
2018-03-09 14:31:05.918 INFO (commitScheduler-14-thread-1) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.s.SolrIndexSearcher Opening [Searcher#4edc35b0[audittrail_shard2_replica_n2] realtime]
2018-03-09 14:31:05.921 INFO (commitScheduler-14-thread-1) [c:audittrail s:shard2 r:core_node4 x:audittrail_shard2_replica_n2] o.a.s.u.DirectUpdateHandler2 end_commit_flush
docker-compose.yml
version: '3'
services:
  zookeeper:
    image: zookeeper:3.4.11
    ports:
      - "2181:2181"
    hostname: "zookeeper"
    container_name: "zookeeper"
  solr1:
    image: solr:7.2.1
    ports:
      - "8983:8983"
    container_name: solr1
    links:
      - zookeeper:ZK
    command: /opt/solr/bin/solr start -f -z zookeeper:2181
  solr2:
    image: solr:7.2.1
    ports:
      - "8984:8983"
    container_name: solr2
    links:
      - zookeeper:ZK
    command: /opt/solr/bin/solr start -f -z zookeeper:2181
Here are the exact steps I go through to index some data.
This does not actually index anything, and I want to know why.
docker-compose up
Create the collection:
curl -X POST 'http://0.0.0.0:8983/solr/admin/collections?action=CREATE&name=audittrail&numShards=2'
{
  "responseHeader":{
    "status":0,
    "QTime":6178},
  "success":{
    "172.24.0.3:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":3993},
      "core":"audittrail_shard1_replica_n1"},
    "172.24.0.4:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":4399},
      "core":"audittrail_shard2_replica_n2"}},
  "warning":"Using _default configset. Data driven schema functionality is enabled by default, which is NOT RECOMMENDED for production use. To turn it off: curl http://{host:port}/solr/audittrail/config -d '{\"set-user-property\": {\"update.autoCreateFields\":\"false\"}}'"}
Curl to create some data (this is the same curl as in the main question, but not in debug mode):
curl 'http://0.0.0.0:8983/api/collections/audittrail/update/json?split=/events&f=action_kind_s:/action_kind_s&f=time_dt:/events/time_dt' -H 'Content-type:application/json' -d '{ "action_kind_s": "task_exec", "events": [{"event_kind_s": "start","in_transaction_b": false, "time_dt": "2018-03-09T12:57:07Z"},{"event_kind_s": "start_txn", "in_transaction_b": true,"time_dt": "2018-03-09T12:57:07Z"},{"event_kind_s": "diff", "in_transaction_b": true,"key_s": "('MerchantWorkerProcess', 5819715045818368L)","property_s": "claim_time","time_dt": "2018-03-09T12:57:07Z","value_dt": "2018-03-09T12:57:07Z"},], "final_status_s": "COMPLETE", "request_s": "xxx", "task_name_s": "xxx"}'
{
  "responseHeader":{
    "status":0,
    "QTime":126}}
Do the query:
curl 'http://0.0.0.0:8983/solr/audittrail/select?q=*:*'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":12,
    "params":{
      "q":"*:*"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  }}

It seems that it's only the echo parameter that doesn't do what you expect it to do. Remove it, and add commit=true to your URL so that Solr commits the documents to the index before returning. You can then find the documents (by searching for *:* in the admin interface under collection -> query), with your fields present in the index:
{
  "action_kind_s":"task_exec",
  "time_dt":"2018-03-09T12:57:07Z",
  "id":"b56100f5-ff61-45e7-8d6b-8072bac6c952",
  "_version_":1594486636806144000},
{
  "action_kind_s":"task_exec",
  "time_dt":"2018-03-09T12:57:07Z",
  "id":"f49fc3cb-eac6-4d02-bcdf-b7c1a34782e3",
  "_version_":1594486636807192576}

Related

Unable to invoke another service with Dapr

I'm having major problems getting Dapr up and running with my microservices. Every time I try to invoke another service, it returns a 500 error with the message
client error: the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection
The services and dapr sidecars are currently running in docker-compose on our dev machines but will run in Kubernetes when it is deployed properly.
When I look at the logs for the dapr containers in Docker for Windows, I can see the application being discovered on port 443 and a few initialisation messages but nothing else ever gets logged after that, even when I make my invoke request.
I have a container called clients, in which I'm calling an API called test, and this then tries to call Microsoft's example weather forecast API in another container called simpleapi.
I'm using Swagger UI to call the APIs. The test API returns 200, but when I put a breakpoint on the invoke, I can see the response is 500.
If I call the weatherforecast API directly using Swagger UI, it returns a 200 with the expected payload.
I have the Dapr dashboard running in a container and it doesn't show any applications.
Docker-Compose.yml
version: '3.4'
services:
  clients:
    image: ${DOCKER_REGISTRY-}clients
    container_name: "Clients"
    build:
      context: .
      dockerfile: Clients/Dockerfile
    ports:
      - "50002:50002"
    depends_on:
      - placement
      - database
    networks:
      - platform
  clients-dapr:
    image: "daprio/daprd:edge"
    container_name: clients-dapr
    command: [
      "./daprd",
      "-app-id", "clients",
      "-app-port", "443",
      "-placement-host-address", "placement:50006",
      "-dapr-grpc-port", "50002"
    ]
    depends_on:
      - clients
    network_mode: "service:clients"
  simpleapi:
    image: ${DOCKER_REGISTRY-}simpleapi
    build:
      context: .
      dockerfile: SimpleAPI/Dockerfile
    ports:
      - "50003:50003"
    depends_on:
      - placement
    networks:
      - platform
  simpleapi-dapr:
    image: "daprio/daprd:edge"
    container_name: simpleapi-dapr
    command: [
      "./daprd",
      "-app-id", "simpleapi",
      "-app-port", "443",
      "-placement-host-address", "placement:50006",
      "-dapr-grpc-port", "50003"
    ]
    depends_on:
      - simpleapi
    network_mode: "service:simpleapi"
  placement:
    image: "daprio/dapr"
    container_name: placement
    command: ["./placement", "-port", "50006"]
    ports:
      - "50006:50006"
    networks:
      - platform
  dashboard:
    image: "daprio/dashboard"
    container_name: dashboard
    ports:
      - "8080:8080"
    networks:
      - platform
networks:
  platform:
Test controller from the Clients API.
[Route("api/[controller]")]
[ApiController]
public class TestController : ControllerBase
{
[HttpGet]
public async Task<ActionResult> Get()
{
var httpClient = DaprClient.CreateInvokeHttpClient();
var response = await httpClient.GetAsync("https://simpleapi/weatherforecast");
return Ok();
}
}
This is a major new project for my company and it's looking like we're going to have to abandon Dapr and implement everything ourselves if we can't get this working soon.
I'm hoping there's some glaringly obvious problem here.
Actually it turned out to be quite simple.
I needed to tell Dapr to use SSL.
clients-dapr needed the -app-ssl parameter, so clients-dapr should have been as follows (simpleapi-dapr needs the same parameter added too):
clients-dapr:
  image: "daprio/daprd:edge"
  container_name: clients-dapr
  command: [
    "./daprd",
    "-app-id", "clients",
    "-app-port", "443",
    "-app-ssl",
    "-placement-host-address", "placement:50006",
    "-dapr-grpc-port", "50002"
  ]
  depends_on:
    - clients
  network_mode: "service:clients"
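After adding -app-ssl, a quick way to check that the fix took effect is to recreate the affected services and tail the sidecar logs (a sketch using the service and container names from the compose file above):
docker-compose up -d --force-recreate clients clients-dapr
docker logs -f clients-dapr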
You can run your service on its specific port without Docker and check that Dapr works as expected. You can specify the HTTP port and gRPC port:
dapr run `
--app-id serviceName `
--app-port 5139 `
--dapr-http-port 3500 `
--dapr-grpc-port 50001 `
--components-path ./dapr-components
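If the sidecar starts cleanly, you can also hit Dapr's service invocation endpoint directly to confirm the app is reachable through it (a sketch, assuming the HTTP port and app id from the dapr run command above, and the weatherforecast route from the question):
curl http://localhost:3500/v1.0/invoke/serviceName/method/weatherforecast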
If the above setup works, then you can set it up with Docker; check the solution above.

Selenium isn't able to reach a docker container with docker-compose run

I have the following docker-compose.yml which starts a chrome-standalone container and a nodejs application:
version: '3.7'
networks:
  selenium:
services:
  selenium:
    image: selenium/standalone-chrome-debug:3
    networks:
      - selenium
    ports:
      - '4444:4444'
      - '5900:5900'
    volumes:
      - /dev/shm:/dev/shm
    user: '7777:7777'
  node:
    image: node_temp:latest
    build:
      context: .
      target: development
      args:
        UID: '${USER_UID}'
        GID: '${USER_GID}'
    networks:
      - selenium
    env_file:
      - .env
    ports:
      - '8090:8090'
    volumes:
      - .:/home/node
    depends_on:
      - selenium
    command: >
      sh -c 'yarn install &&
      yarn dev'
I'm running the containers as follows:
docker-compose up -d selenium
docker-compose run --service-ports node sh
and starting the e2e from within the shell.
When running the e2e tests, selenium can be reached from the node container (through: http://selenium:4444), but node isn't reachable from the selenium container.
I have tested this by VNC'ing into the selenium container and pointing the browser to: http://node:8090. (The node container is reachable on the host however, through: http://localhost:8090).
I first thought that docker-compose run doesn't add the running container to the proper network; however, running docker network inspect test_app gives the following:
[
    {
        "Name": "test_app_selenium",
        "Id": "df6517cc7b6446d1712b30ee7482c83bb7c3a9d26caf1104921abd6bbe2caf68",
        "Created": "2019-06-30T16:08:50.724889157+02:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.31.0.0/16",
                    "Gateway": "172.31.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "8a76298b237790c62f80ef612debb021549439286ce33e3e89d4ee2f84de3aec": {
                "Name": "test_app_node_run_78427bac2fd1",
                "EndpointID": "04310bc4e564f831e5d08a0e07891d323a5953fa936e099d20e5e384a6053da8",
                "MacAddress": "02:42:ac:1f:00:03",
                "IPv4Address": "172.31.0.3/16",
                "IPv6Address": ""
            },
            "ef087732aacf0d293a2cf956855a163a081fc3748ffdaa01c240bde452eee0fa": {
                "Name": "test_app_selenium_1",
                "EndpointID": "24a597e30a3b0b671c8b19fd61b9254bea9e5fcbd18693383d93d3df789ed895",
                "MacAddress": "02:42:ac:1f:00:02",
                "IPv4Address": "172.31.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "selenium",
            "com.docker.compose.project": "test_app",
            "com.docker.compose.version": "1.24.1"
        }
    }
]
This shows both containers running on the "selenium" network. I'm not sure, however, whether the node container is properly aliased on the network and whether this is proper behaviour.
Am I missing some config here?
It seems that docker-compose run names the container differently to avoid colliding with the service name noted in docker-compose.yml, so http://node:8090 was not reachable.
I solved this by adding a --name flag as follows:
docker-compose run --service-ports --name node node sh
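To confirm the alias is now resolvable, you can check name resolution from inside the selenium container (a quick sanity check; getent is used because curl may not be installed in that image, and the container name is taken from the network inspect output above):
docker exec test_app_selenium_1 getent hosts node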
EDIT:
It took me a while to notice, but I was overcomplicating the implementation by a lot. The above docker-compose.yml can be simplified by adding host networking. This simply exposes all running containers on localhost and makes them reachable on localhost by their specified ports. Considering that I don't need any encapsulation (it's meant for dev), the following docker-compose.yml sufficed:
version: '3.7'
services:
  selenium:
    image: selenium/standalone-chrome:3
    # NOTE: port definition is useless with network_mode: host
    network_mode: host
    user: '7777:7777'
  node:
    image: node_temp:latest
    build:
      context: .
      target: development
      args:
        UID: '${USER_UID}'
        GID: '${USER_GID}'
    network_mode: host
    env_file:
      - .env
    volumes:
      - .:/home/node
    command: >
      sh -c 'yarn install &&
      yarn dev'
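With host networking, both services bind directly to the host, so a quick sanity check from the host itself (assuming the ports used earlier) could be:
curl http://localhost:4444/wd/hub/status   # selenium standalone status endpoint
curl -I http://localhost:8090              # the node app, now on the host network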

Ansible dynamic inventory refresh

I have created a simple dynamic inventory Python script which prints the JSON shown below to standard output, but the Ansible inventory does not refresh.
Command: ansible-playbook playbooks/deploy.yaml -i playbook/inventory_test.py
Inventory JSON:
{
    'python_hosts': {
        'hosts': ['10.220.21.122', '10.220.21.278'],
        'vars': {
            'ansible_ssh_user': 'projectuser',
        }
    },
    '_meta': {
        'hostvars': {
            '10.220.21.122': {
                'host_specific_var': 'testhost'
            },
            '10.220.21.278': {
                'host_specific_var': 'towerhost'
            }
        }
    }
}
I also tried this:
- hosts: localhost
  tasks:
    - name: test
      script: ./inventory_test.py
    - name: Refresh inventory
      meta: refresh_inventory
    - name: print new inventory
      debug:
        var: groups
But inventory still does not refresh automatically.
Ansible version is 2.6.4
Any help on this is really appreciated.
This is because of the cache...
Go to ~/.ansible/tmp and delete the inventory_test.cache file.
Also, this is a helpful answer if you are able to edit the playbook...
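In shell terms, what this answer suggests is roughly the following (a sketch, assuming the cache file is named after the inventory script as described above):
rm -f ~/.ansible/tmp/inventory_test.cache
ansible-playbook playbooks/deploy.yaml -i playbook/inventory_test.py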

kafka connect transforms RegExRouter exiting with unrecoverable exception

I have made a Kafka pipeline to copy a SQL Server table to S3.
During the sink, I'm trying to transform topic names by dropping the prefix with the RegexRouter transform:
"transforms":"dropPrefix",
"transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex":"SQLSERVER-TEST-(.*)",
"transforms.dropPrefix.replacement":"$1"
The sink fails with the message:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:188)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:564)
... 10 more
If I remove the transform, the pipeline works fine.
The problem can be reproduced with this docker-compose:
version: '2'
services:
  smtproblem-zookeeper:
    image: zookeeper
    container_name: smtproblem-zookeeper
    ports:
      - "2181:2181"
  smtproblem-kafka:
    image: confluentinc/cp-kafka:5.0.0
    container_name: smtproblem-kafka
    ports:
      - "9092:9092"
    links:
      - smtproblem-zookeeper
      - smtproblem-minio
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ZOOKEEPER_CONNECT: smtproblem-zookeeper:2181/kafka
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://smtproblem-kafka:9092
      KAFKA_CREATE_TOPICS: "_schemas:3:1:compact"
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  smtproblem-schema_registry:
    image: confluentinc/cp-schema-registry:5.0.0
    container_name: smtproblem-schema-registry
    ports:
      - "8081:8081"
    links:
      - smtproblem-kafka
      - smtproblem-zookeeper
    environment:
      SCHEMA_REGISTRY_HOST_NAME: http://smtproblem-schema_registry:8081
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://smtproblem-kafka:9092
      SCHEMA_REGISTRY_GROUP_ID: schema_group
  smtproblem-kafka-connect:
    image: confluentinc/cp-kafka-connect:5.0.0
    container_name: smtproblem-kafka-connect
    command: bash -c "wget -P /usr/share/java/kafka-connect-jdbc http://central.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/6.4.0.jre8/mssql-jdbc-6.4.0.jre8.jar && /etc/confluent/docker/run"
    ports:
      - "8083:8083"
    links:
      - smtproblem-zookeeper
      - smtproblem-kafka
      - smtproblem-schema_registry
      - smtproblem-minio
    environment:
      CONNECT_BOOTSTRAP_SERVERS: smtproblem-kafka:9092
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: "connect_group"
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 1000
      CONNECT_CONFIG_STORAGE_TOPIC: "connect_config"
      CONNECT_OFFSET_STORAGE_TOPIC: "connect_offsets"
      CONNECT_STATUS_STORAGE_TOPIC: "connect_status"
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: "io.confluent.connect.avro.AvroConverter"
      CONNECT_VALUE_CONVERTER: "io.confluent.connect.avro.AvroConverter"
      CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: "http://smtproblem-schema_registry:8081"
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: "http://smtproblem-schema_registry:8081"
      CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_REST_ADVERTISED_HOST_NAME: "smtproblem-kafka_connect"
      CONNECT_LOG4J_ROOT_LOGLEVEL: INFO
      CONNECT_LOG4J_LOGGERS: org.reflections=ERROR
      CONNECT_PLUGIN_PATH: "/usr/share/java"
      AWS_ACCESS_KEY_ID: localKey
      AWS_SECRET_ACCESS_KEY: localSecret
  smtproblem-minio:
    image: minio/minio:edge
    container_name: smtproblem-minio
    ports:
      - "9000:9000"
    entrypoint: sh
    command: -c 'mkdir -p /data/datalake && minio server /data'
    environment:
      MINIO_ACCESS_KEY: localKey
      MINIO_SECRET_KEY: localSecret
    volumes:
      - "./minioData:/data"
  smtproblem-sqlserver:
    image: microsoft/mssql-server-linux:2017-GA
    container_name: smtproblem-sqlserver
    environment:
      ACCEPT_EULA: "Y"
      SA_PASSWORD: "Azertyu&"
    ports:
      - "1433:1433"
Create a database in the sqlserver container:
$ sudo docker exec -it smtproblem-sqlserver bash
# /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P 'Azertyu&'
Create a test database:
create database TEST
GO
use TEST
GO
CREATE TABLE TABLE_TEST (id INT, name NVARCHAR(50), quantity INT, cbMarq INT NOT NULL IDENTITY(1,1), cbModification smalldatetime DEFAULT (getdate()))
GO
INSERT INTO TABLE_TEST VALUES (1, 'banana', 150, 1); INSERT INTO TABLE_TEST VALUES (2, 'orange', 154, 2);
GO
exit
exit
Create a source connector:
curl -X PUT http://localhost:8083/connectors/sqlserver-TEST-source-bulk/config -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.password": "Azertyu&",
"validate.non.null": "false",
"tasks.max": "3",
"table.whitelist": "TABLE_TEST",
"mode": "bulk",
"topic.prefix": "SQLSERVER-TEST-",
"connection.user": "SA",
"connection.url": "jdbc:sqlserver://smtproblem-sqlserver:1433;database=TEST"
}'
Create the sink connector:
curl -X PUT http://localhost:8083/connectors/sqlserver-TEST-sink/config -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{
"topics": "SQLSERVER-TEST-TABLE_TEST",
"topics.dir": "TABLE_TEST",
"s3.part.size": 5242880,
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"tasks.max": 3,
"schema.compatibility": "NONE",
"s3.region": "us-east-1",
"schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
"format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
"s3.bucket.name": "datalake",
"store.url": "http://smtproblem-minio:9000",
"flush.size": 1,
"transforms":"dropPrefix",
"transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex":"SQLSERVER-TEST-(.*)",
"transforms.dropPrefix.replacement":"$1"
}'
The error can be seen in the Kafka Connect UI, or with the curl status command:
curl -X GET http://localhost:8083/connectors/sqlserver-TEST-sink/status
Thanks for your help
So, if we debug, we can see what it is trying to do...
There is a HashMap with the original topic name (SQLSERVER_TEST_TABLE_TEST-0), and the transform has already been applied (TABLE-TEST-0), so if we look up the "new" topic name, it cannot find the S3 writer for the TopicPartition.
Therefore, the map returns null, and the subsequent .buffer(record) throws an NPE.
I had a similar use case for this before -- writing more than one topic into a single S3 path, and I ended up having to write a custom partitioner, e.g. class MyPartitioner extends DefaultPartitioner.
If you build a JAR using some custom code like that, put it under usr/share/java/kafka-connect-storage-common, then edit the connector config for partitioner.class, it should work as expected.
I'm not really sure if this is a "bug", per se, because back up the call stack, there is no way to get a reference to the regex transform at the time the topicPartitionWriters are declared with the source topic name(s).
If anything, the storage connector configurations should allow a separate regex transform that can edit the encodedPartition (the path where it writes the files)
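For the workaround described above, the deployment step might look roughly like this (a sketch: the JAR name is hypothetical, while the path and container name come from the answer and the compose file):
# hypothetical JAR containing a class such as MyPartitioner extends DefaultPartitioner
docker cp my-custom-partitioner.jar smtproblem-kafka-connect:/usr/share/java/kafka-connect-storage-common/
docker restart smtproblem-kafka-connect
# then update the sink connector config so "partitioner.class" points at the custom class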

Why isn't salt-api working with this configuration?

I'm trying to set up a really basic salt-api configuration just to test it out. I'm using salt-master and salt-minion 2016.3.0 Boron on Ubuntu 14.04
I'm using this tutorial, and my configuration is below.
/srv/salt/top.sls
base:
  '*':
    - reactor
/etc/salt/master.d/reactor.conf
reactor:
  - 'salt/netapi/hook/restart':
    - /srv/reactor/test.sls
/srv/reactor/test.sls
{% set postdata = data.get('post', {}) %}
{% if postdata.secretkey == "replacethiswithsomethingbetter" %}
test:
  local.cmd.run:
    - tgt: '{{ postdata.tgt }}'
    -arg:
      - touch /home/username/test.txt
{% endif %}
I have restarted the master, and if I run salt '*' state.sls reactor then everything in the state works fine. All it does is touch /home/username/test.txt, and that file is created when I run the state.
The command I'm running to use the API is
curl -H "Accept: application/json" -d tgt='*' -d secretkey="replacethiswithsomethingbetter" -k https://192.168.1.1:8080/hook/services/restart
and that command returns {"success": true}
Then I check on the minion, and the file hasn't been created.
The output of salt-run state.event pretty=True is
salt/netapi/hook/services/restart {
    "_stamp": "2016-06-29T19:30:04.193832",
    "body": "",
    "headers": {
        "Accept": "application/json",
        "Content-Length": "46",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "192.168.1.1:8080",
        "Remote-Addr": "192.168.1.3",
        "User-Agent": "curl/7.35.0"
    },
    "post": {
        "secretkey": "replacethiswithsomethingbetter",
        "tgt": "*"
    }
}
I did go through all of the self-signed cert steps. I'm not 100% sure why I need those, but it's done, and they are listed in the rest_cherrypy configs.
Any help is appreciated.
EDIT 1
I edited the reactor stuff, because there's an example of using cmd.run, or local.cmd.run, located in the Salt docs here.
It's still returning true, and not working.
The URL is wrong:
https://192.168.1.1:8080/hook/services/restart
should change to:
https://192.168.1.1:8080/hook/restart
because what you defined is:
reactor:
  - 'salt/netapi/hook/restart':
    - /srv/reactor/test.sls
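So the corrected call from the question, with only the path changed, would be:
curl -H "Accept: application/json" -d tgt='*' -d secretkey="replacethiswithsomethingbetter" -k https://192.168.1.1:8080/hook/restart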
You can view the debug log by running the master as salt-master -l debug.