Where does ModeShape store binary values? - jcr

First I configured my ModeShape configuration file like this:
"storage" : {
"persistence" : {
"type" : "db",
"connectionUrl": "${database.url}",
"driver": "${database.driver}",
"username": "${database.user}",
"password": "${database.password}",
"tableName": "GOVERNANCE_MODESHAPE",
"poolSize" : 5,
"createOnStart" : true,
"dropOnExit" : false
}
}
After I create a node, set a property on it, and save it in my local environment, I can still find the node and the property locally, but it cannot be found in my colleague's local environment.
Then I changed the configuration like this:
"storage" : {
"persistence" : {
"type" : "db",
"connectionUrl": "${database.url}",
"driver": "${database.driver}",
"username": "${database.user}",
"password": "${database.password}",
"tableName": "GOVERNANCE_MODESHAPE",
"poolSize" : 5,
"createOnStart" : true,
"dropOnExit" : false
},
"binaryStorage" : {
"type" : "file",
"directory": "/var/thinkbig/modeshape",
"minimumBinarySizeInBytes" : 5000000
}
}
Now I can find the node and property created in my local environment, and my colleague can also find them in his local environment. But I can't find the directory /var/thinkbig/modeshape.
So I want to know: where does ModeShape store binaries? And why, once I add the "binaryStorage" section to the configuration file, can everybody find the node and property? Thanks in advance!

Per the documentation, minimumBinarySizeInBytes is "the minimum size (in bytes) above which binary values will be stored in the store. Any binary value lower in size will be stored together with the other node information."
This means that binaries smaller than the specified size are stored in the database rather than the file system. You could change this to a value of 1 byte if you want to ensure that all binaries get stored in the file system.
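For example, a binaryStorage block like the following (a sketch based on the configuration above; the directory is just the one from the question) would push effectively every binary value out to the file system:
"binaryStorage" : {
    "type" : "file",
    "directory" : "/var/thinkbig/modeshape",
    "minimumBinarySizeInBytes" : 1
}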

Related

Use variables in Azure Stream Analytics properties

I want to reduce the number of overrides during the deployment of my ASA job by using environment variables in my properties.
Expectations
Have variables defined in the asaproj.json or JobConfig.json file, or in a .env file:
{
    ...
    "variables": [
        "environment": "dev"
    ]
}
Then call those variables in a properties file, such as an SQL reference data input properties file:
{
    "Name": "sql-query",
    "Type": "Reference data",
    "DataSourceType": "SQL Database",
    "SqlReferenceProperties": {
        "Database": "${environment}-sql-bdd",
        "Server": "${environment}-sql",
        "User": "user",
        "Password": null,
        "FullSnapshotPath": "sql-query.snapshot.sql",
        "RefreshType": "Execute periodically",
        "RefreshRate": "06:00:00",
        "DeltaSnapshotPath": null
    },
    "DataSourceCredentialDomain": null,
    "ScriptType": "Input"
}
Attempt
I could use a PowerShell script to override values in the ARM variables file generated by the npm package azure-streamanalytics-cicd, but it's not clean at all.
Problem
I can't find any resources about environment variables in Azure Stream Analytics online. Does such a thing exist? If so, can you point me to some documentation?

GCP BigQuery: can't query Stackdriver access logs exported to Cloud Storage because of invalid JSON field "#type"

I store the access logs of a pixel image in a Cloud Storage bucket dev-access-log-bucket using the standard "sink",
so the files look like this: requests/2019/05/08/15:00:00_15:59:59_S1.json,
and one line looks like this (I formatted the JSON, but it's normally on one line):
{
    "httpRequest": {
        "cacheLookup": true,
        "remoteIp": "93.24.25.190",
        "requestMethod": "GET",
        "requestSize": "224",
        "requestUrl": "https://dev-snowplow.legalstart.fr/one_pixel_image.png?user_id=0&action=purchase&product_id=0&money=10",
        "responseSize": "779",
        "status": 200,
        "userAgent": "python-requests/2.21.0"
    },
    "insertId": "w6wyz1g2jckjn6",
    "jsonPayload": {
        "#type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry",
        "statusDetails": "response_sent_by_backend"
    },
    "logName": "projects/tracking-pixel-239909/logs/requests",
    "receiveTimestamp": "2019-05-08T15:34:24.126095758Z",
    "resource": {
        "labels": {
            "backend_service_name": "",
            "forwarding_rule_name": "dev-yolaw-pixel-forwarding-rule",
            "project_id": "tracking-pixel-239909",
            "target_proxy_name": "dev-yolaw-pixel-proxy",
            "url_map_name": "dev-urlmap",
            "zone": "global"
        },
        "type": "http_load_balancer"
    },
    "severity": "INFO",
    "spanId": "7d8823509c2dc94f",
    "timestamp": "2019-05-08T15:34:23.140747307Z",
    "trace": "projects/tracking-pixel-239909/traces/bb55577eedd5797db2867931f8de9162"
}
All of this, once again, is standard GCP output; I did not customize anything here.
Now I want to run some queries on it from BigQuery, so I create a dataset and an external table configured like this:
External Data Configuration
Source URI(s) gs://dev-access-log-bucket/requests/*
Auto-detect schema true (note: I don't know why it shows true even though I've manually defined the schema)
Ignore unknown values true
Source format NEWLINE_DELIMITED_JSON
Max bad records 0
and the following manual schema:
timestamp DATETIME REQUIRED
httpRequest RECORD REQUIRED
httpRequest.requestUrl STRING REQUIRED
and when I run a query
SELECT
timestamp
FROM
`path.to.my.table`
LIMIT
1000
I got
Invalid field name "#type". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.
How can I work around this without having to pre-process the logs to remove the "#type" field?

Substitute parts of a typed array in ASP.NET core appsettings.json from secrets/environment variables?

We have an ASP.NET Core web app with this appsettings.json:
{
    "Subscriptions": [
        {
            "Name": "Production",
            "PublishSettings": "<PublishData>SECRET</PublishData>",
            "Environments": [
                {
                    "Name": "Prod",
                    "DeploymentServiceNames": [
                        "api1",
                        "api2",
                        "api3"
                    ]
                }
            ]
        },
        {
            "Name": "Test",
            "PublishSettings": "<PublishData>SECRET</PublishData>",
            "Environments": [
                {
                    "Name": "Test1",
                    "DeploymentServiceNames": [
                        "api1",
                        "api2"
                    ]
                },
                {
                    "Name": "Test2",
                    "DeploymentServiceNames": [
                        "api1",
                        "api2"
                    ]
                }
            ]
        }
    ]
}
The PublishSettings values are secret, so I want these in my local user secrets file, and in environment variables for my deployments. But because Subscriptions is an array, I'm not sure how. I don't particularly want to swap in the entire Subscriptions section. Is there a way to swap in a single property for each item in such an array, perhaps by defining a key property on the strongly typed subscription model?
When you load configuration in .NET Core, under the hood it's represented as a set of key-value pairs (both keys and values are strings) supplied by the added configuration providers.
For example, appsettings.json will be represented by JsonConfigurationProvider as the following settings list:
{Subscriptions:0:Environments:0:DeploymentServiceNames:0, api1}
{Subscriptions:0:Environments:0:DeploymentServiceNames:1, api2}
{Subscriptions:0:Environments:0:DeploymentServiceNames:2, api3}
{Subscriptions:0:Environments:0:Name, Prod}
{Subscriptions:0:Name, Production}
{Subscriptions:0:PublishSettings, <PublishData>SECRET</PublishData>}
{Subscriptions:1:Environments:0:DeploymentServiceNames:0, api1}
{Subscriptions:1:Environments:0:DeploymentServiceNames:1, api2}
{Subscriptions:1:Environments:0:Name, Test1}
{Subscriptions:1:Environments:1:DeploymentServiceNames:0, api1}
{Subscriptions:1:Environments:1:DeploymentServiceNames:1, api2}
{Subscriptions:1:Environments:1:Name, Test2}
{Subscriptions:1:Name, Test}
{Subscriptions:1:PublishSettings, <PublishData>SECRET</PublishData>}
As you can see, the JSON structure was flattened and the keys are built by joining nested section names with a colon. Array elements are added with their index as the name.
If you add another configuration source, e.g. environment variables or another secrets JSON file, that has settings with the same keys, it will override those settings.
So if you want to add or override PublishSettings, you could either add another JSON file as a configuration source:
{
    "Subscriptions": [
        {
            "PublishSettings": "<PublishData>SECRET</PublishData>"
        },
        {
            "PublishSettings": "<PublishData>SECRET</PublishData>"
        }
    ]
}
Or add it as environment variables with the following keys:
Subscriptions:0:PublishSettings
Subscriptions:1:PublishSettings
Such an override (or addition) is transparent to the .NET Core configuration binder. The settings POCO will contain the value of PublishSettings from the last configuration source that provides it.
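As a minimal sketch of how these sources could be stacked and bound (the class and file names here are illustrative, not from the question): note that shells which don't allow colons in variable names accept the double-underscore form, e.g. Subscriptions__0__PublishSettings.
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

public class Subscription                 // simplified POCO matching the appsettings.json above
{
    public string Name { get; set; }
    public string PublishSettings { get; set; }
}

public class Program
{
    public static void Main()
    {
        var configuration = new ConfigurationBuilder()
            .AddJsonFile("appsettings.json")
            .AddJsonFile("secrets.json", optional: true)   // e.g. the override file shown above
            .AddEnvironmentVariables()                      // e.g. Subscriptions__0__PublishSettings
            .Build();

        // The binder merges all sources; the last source that provides a key wins.
        var subscriptions = configuration.GetSection("Subscriptions").Get<List<Subscription>>();
        Console.WriteLine(subscriptions[0].PublishSettings);
    }
}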

Druid RabbitMQ Firehose

I'm trying to set up Druid to work with the RabbitMQ firehose, but I'm getting the following error from Tranquility:
java.lang.IllegalArgumentException: Could not resolve type id 'rabbitmq' into a subtype of [simple type, class io.druid.data.input.FirehoseFactory]
I did the following
1. Installed Druid
2. Downloaded extension druid-rabbitmq
3. Copied druid-rabbitmq into druid extensions
4. Copied amqp-client jar to druid lib
5. Added druid-rabbitmq into druid.extensions.loadList in common.runtime.properties
6. Added the firehose config to the Tranquility server.json configuration:
"ioConfig" : {
"type" : "realtime",
"firehose" : {
"type" : "rabbitmq",
"connection" : {
"host": "localhost",
"port": "5672",
"username": "blackbox",
"password": "blackbox",
"virtualHost": "blackbox-vhost",
"uri": "amqp://localhost:5672/blackbox-vhost"
},
"config" : {
"exchange": "test-exchange",
"queue" : "test-q",
"routingKey": "#",
"durable": "true",
"exclusive": "false",
"autoDelete": "false",
"maxRetries": "10",
"retryIntervalSeconds": "1",
"maxDurationSeconds": "300"
}
}
}
I'm using Imply 1.3.0, but I think Tranquility is for stream pushing while a firehose is used for stream pulling, so I think that was the problem. I have now created a realtime node and it's running fine. I also had to copy the lyra jar file into the Druid lib directory. Now I can publish data from RabbitMQ, it gets inserted into Druid, and I can query the data, but the problem is that in RabbitMQ the message still shows as unacked. Any idea?

Making a storage plugin for HDFS in Apache Drill

I'm trying to make a storage plugin for Hadoop (HDFS) in Apache Drill.
Actually I'm confused: I don't know what port to use for the hdfs:// connection, and what to set for the location.
This is my plugin:
{
    "type": "file",
    "enabled": true,
    "connection": "hdfs://localhost:54310",
    "workspaces": {
        "root": {
            "location": "/",
            "writable": false,
            "defaultInputFormat": null
        },
        "tmp": {
            "location": "/tmp",
            "writable": true,
            "defaultInputFormat": null
        }
    },
    "formats": {
        "psv": {
            "type": "text",
            "extensions": [
                "tbl"
            ],
            "delimiter": "|"
        },
        "csv": {
            "type": "text",
            "extensions": [
                "csv"
            ],
            "delimiter": ","
        },
        "tsv": {
            "type": "text",
            "extensions": [
                "tsv"
            ],
            "delimiter": "\t"
        },
        "parquet": {
            "type": "parquet"
        },
        "json": {
            "type": "json"
        },
        "avro": {
            "type": "avro"
        }
    }
}
So, is it correct to set localhost:54310, given that I got that from the command:
hdfs -getconf -nnRpcAddresses
or should it be :8020?
Second question: what do I need to set for location? My Hadoop folder is in:
/usr/local/hadoop
and there you can find /etc /bin /lib /log ... So, do I need to set the location to my datanode, or what?
Third question: when connecting to Drill, I go through sqlline and then connect to my ZooKeeper like this:
!connect jdbc:drill:zk=localhost:2181
My question here is: after I make the storage plugin and connect to Drill with zk, can I query HDFS files?
I'm very sorry if this is a noob question, but I haven't found anything useful on the internet, or at least it hasn't helped me.
If you can explain some of this to me, I'll be very grateful.
As per the Drill docs:
{
    "type" : "file",
    "enabled" : true,
    "connection" : "hdfs://10.10.30.156:8020/",
    "workspaces" : {
        "root" : {
            "location" : "/user/root/drill",
            "writable" : true,
            "defaultInputFormat" : null
        }
    },
    "formats" : {
        "json" : {
            "type" : "json"
        }
    }
}
In "connection",
put namenode server address.
If you are not sure about this address.
Check fs.default.name or fs.defaultFS properties in core-site.xml.
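For reference, the relevant entry in core-site.xml usually looks something like this (the host and port below are illustrative; use whatever your cluster actually defines):
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
</property>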
Coming to "workspaces",
you can save workspaces in this. In the above example, there is a workspace with name root and location /user/root/drill.
This is your HDFS location.
If you have files under /user/root/drill hdfs directory, you can query them using this workspace name.
Example: abc is under this directory.
select * from dfs.root.`abc.csv`
After successfully creating the plugin, you can start Drill and start querying.
You can query any directory irrespective of workspaces.
Say you want to query employee.json in the /tmp/data HDFS directory. The query is:
select * from dfs.`/tmp/data/employee.json`
I had a similar problem: Drill could not read the dfs server. In the end, the problem was caused by the namenode port.
The default address of the namenode web UI is http://localhost:50070/.
The default address of the namenode server is hdfs://localhost:8020/.