How does RavenDB SQL Replication handle saving arrays to SQL?

I am currently setting up SQL Replication to replicate our RavenDB documents into SQL for reporting purposes. So far everything has been working great. However, I am now trying to save a document that contains an array of days of the week.
This is how the document looks in Raven:
{
    "ClientId": "clients/385",
    "Description": "Test",
    "IsOneOff": false,
    "RecursEveryWeeks": 1,
    "StartDate": "2013-03-19T00:00:00.0000000",
    "TaskStartTime": "12:00:00",
    "TaskDuration": 120,
    "TaskEndTime": "14:00:00",
    "AdditionalResources": false,
    "AdditionalVisitType": "TestType",
    "BillableTo": "Private",
    "RecurrenceEndDate": "2013-04-30T00:00:00.0000000",
    "DaysOfWeek": [
        "Monday",
        "Tuesday",
        "Wednesday",
        "Friday",
        "Saturday"
    ]
}
In SQL Replication I have done the following:
sqlReplicate("AdditionalVisit", "AdditionalVisitId", {
ClientId: this.ClientId,
Description: this.Description,
IsOneOff: this.IsOneOff,
RecursEveryWeeks: this.RecursEveryWeeks,
StartDate: this.StartDate,
TaskStartTime: this.TaskStartTime,
TaskDuration: this.TaskDuration,
TaskEndTime: this.TaskEndTime,
AdditionalResources: this.AdditionalResources,
AdditionalVisitType: this.AdditionalVisitType,
BillableTo: this.BillableTo,
RecurrenceEndDate: this.RecurrenceEndDate,
DaysOfWeek: this.DaysOfWeek
});
All of this works fine when I leave DaysOfWeek out of the SQL Replication script, but it causes the server to crash when I leave it in.
How should this be done in SQL Replication so everything in the array is saved to a DaysOfWeek column in SQL?

I've not tested this, but it's along the lines of what you want. Just add this to the end of your current script:
for (var i = 0; i < this.DaysOfWeek.length; i++) {
    var day = this.DaysOfWeek[i];
    // write one row per day into a separate child table, keyed back to the parent document
    sqlReplicate('AdditionalVisit_DaysOfWeek', 'AdditionalVisitId', {
        AdditionalVisitId: documentId,
        DayOfWeek: day
    });
}
By the way, there is currently a bug in SQL Replication for RavenDB 2.1 where deletes won't be pushed through to SQL Replication. It's supposed to be fixed in the 2.5 branch, but there are still some other issues that need to be worked on before it becomes usable.


Proper way to convert the data type of a field in MongoDB

Possible duplicate of: How to change the type of a field?
I am new to MongoDB and I am facing a problem while converting the data type of a field's value to another data type.
Below is an example of my documents:
[
    {
        "Name of Restaurant": "Briyani Center",
        "Address": " 336 & 338, Main Road",
        "Location": "XYZQWE",
        "PriceFor2": "500.0",
        "Dining Rating": "4.3",
        "Dining Rating Count": "1500"
    },
    {
        "Name of Restaurant": "Veggie Conner",
        "Address": " New 14, Old 11/3Q, Railway Station Road",
        "Location": "ABCDEF",
        "PriceFor2": "1000.0",
        "Dining Rating": "4.4"
    }
]
Like the above, I have 12k documents. Notice that the data type of PriceFor2 is a string; I would like to convert it to an integer data type.
I have referred to many of the great answers given in the above link, but when I try to run the query, I get a .save() is not a function error. Please advise what the problem is.
Below is the code I used:
db.chennaiData.find().forEach(function(x) {
    x.priceFor2 = new NumberInt(x.priceFor2);
    db.chennaiData.save(x);
});
This is the error I am getting:
TypeError: db.chennaiData.save is not a function
From MongoDB's save documentation:
Starting in MongoDB 4.2, the db.collection.save() method is deprecated. Use db.collection.insertOne() or db.collection.replaceOne() instead.
Likely you are running MongoDB 4.2+, so the save function is no longer available. Consider migrating to insertOne and replaceOne as suggested.
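For example, here is a minimal, untested sketch of the same per-document loop using replaceOne instead of the removed save (note that your sample documents spell the field PriceFor2, with a capital P):
db.chennaiData.find().forEach(function(x) {
    // parseFloat handles string values like "500.0"; NumberInt then wraps the result as a 32-bit integer
    x.PriceFor2 = NumberInt(parseFloat(x.PriceFor2));
    // replace the whole document by _id, since save() is no longer available
    db.chennaiData.replaceOne({ _id: x._id }, x);
});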
For your specific scenario, it is actually preferable to do this with a single update, as mentioned in another SO answer. That makes only one database call, while your approach fetches every document in the collection up to the application level and then performs n database calls to save them back.
db.collection.update({},
    [
        {
            $set: {
                PriceFor2: {
                    $toDouble: "$PriceFor2"
                }
            }
        }
    ],
    {
        multi: true
    })
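Since the goal is an integer column rather than a double, a hedged, untested variant of the same pipeline is to convert via $toDouble first ($toInt cannot parse a string like "500.0" directly) and then truncate with $toInt:
db.chennaiData.updateMany({}, [
    {
        $set: {
            // "500.0" -> 500.0 -> 500, stored as an integer
            PriceFor2: { $toInt: { $toDouble: "$PriceFor2" } }
        }
    }
])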

Apache Superset with MongoDB (NoSQL database)

I am using MongoDB. My task is to build dashboard charts for the data, so I am using Apache Superset. I connected MongoDB to Apache Drill, as it won't connect directly to Superset, and then connected Apache Drill to Apache Superset. My collection is nested. How can I process this nested data to use it for dashboard charts? My data looks as below:
{
    "_id": {
        "$oid": "6229d3cfdbfc81a8777e4821"
    },
    "jobs": [
        {
            "job_ID": {
                "$oid": "62289ded8079821eb24760e0"
            },
            "New": false,
            "Expired": false
        },
        {
            "job_ID": {
                "$oid": "6228a252fb4554dd5c48202a"
            },
            "New": true,
            "Expired": true
        },
        {
            "job_ID": {
                "$oid": "622af1c391b290d34701af9f"
            },
            "New": true,
            "Expired": false
        }
    ],
    "email": "mani2090996#ail.com"
}
I am querying in Apache Drill as follows:
SELECT flat.fill FROM (SELECT FLATTEN(t.jobs) AS fill FROM mongo.recruitingdb.flatten.`Vendorjobs` t) flat WHERE flat.fill.New = flase;
And I am getting a parsing error:
org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: Encountered "." at line 1, column 123.
Superset doesn't really handle nested data very well. Drill does, however, so you'll have to craft queries that produce columns which can be visualized.
Take a look here: https://drill.apache.org/docs/json-data-model/
and here: https://drill.apache.org/docs/querying-complex-data-introduction/.
UPDATE:
Try the query below. The FROM clause may not be exactly right, but you should get the idea from this.
Note that you can access maps in Drill in two ways:
tablename.mapname.field OR
mapname['field']
You can do this for any level of nesting.
SELECT mongoTable.jobs.job_ID.`$oid` AS job_ID,
mongoTable.jobs.`New` AS new,
mongoTable.jobs.`Expired` AS expired
FROM
(
SELECT flatten(jobs) AS jobs
FROM mongo.recruitingdb.flatten.`Vendorjobs` AS t1
WHERE t1.jobs.New = false
) AS mongoTable

Kafka Connect S3 sink - how to use the timestamp from the message itself [timestamp extractor]

I've been struggling with a problem using Kafka Connect and the S3 sink.
First the structure:
{
    Partition: number
    Offset: number
    Key: string
    Message: json string
    Timestamp: timestamp
}
Normally, when posting to Kafka, the timestamp should be set by the producer. Unfortunately there seem to be cases where this didn't happen, which means that the Timestamp might sometimes be null.
To extract this timestamp the connector was set to the following value:
"timestamp.extractor":"Record".
Now, it is certain that the Message field itself always contains a timestamp as well.
Message:
{
    timestamp: "2019-04-02T06:27:02.667Z"
    metadata: {
        creationTimestamp: "1554186422667"
    }
}
The question, however, is that I would now like to use that field for the timestamp.extractor.
I was thinking that this would suffice, but it doesn't seem to work:
"timestamp.extractor":"RecordField",
"timestamp.field":"message.timestamp",
This results in a NullPointer as well.
Any ideas as to how to use the timestamp from the Kafka message payload itself, instead of the default timestamp field that is set for Kafka v0.10+?
EDIT:
Full config:
{ "name": "<name>",
"config": {
"connector.class":"io.confluent.connect.s3.S3SinkConnector",
"tasks.max":"4",
"topics":"<topic>",
"flush.size":"100",
"s3.bucket.name":"<bucket name>",
"s3.region": "<region>",
"s3.part.size":"<partition size>",
"rotate.schedule.interval.ms":"86400000",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"storage.class":"io.confluent.connect.s3.storage.S3Storage",
"format.class":"io.confluent.connect.s3.format.json.JsonFormat",
"locale":"ENGLISH",
"timezone":"UTC",
"schema.generator.class":"io.confluent.connect.storage.hive.schema.TimeBasedSchemaGenerator",
"partitioner.class":"io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"partition.duration.ms": "3600000",
"path.format": "'year'=YYYY/'month'=MM/'day'=dd",
"timestamp.extractor":"RecordField",
"timestamp.field":"message.timestamp",
"max.poll.interval.ms": "600000",
"request.timeout.ms": "610000",
"heartbeat.interval.ms": "6000",
"session.timeout.ms": "20000",
"s3.acl.canned":"bucket-owner-full-control"
}
}
EDIT 2:
Kafka message payload structure:
{
    "reference": "",
    "clientId": "",
    "gid": "",
    "timestamp": "2019-03-19T15:27:55.526Z"
}
EDIT 3:
{
    "transforms": "convert_op_creationDateTime",
    "transforms.convert_op_creationDateTime.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.convert_op_creationDateTime.target.type": "Timestamp",
    "transforms.convert_op_creationDateTime.field": "timestamp",
    "transforms.convert_op_creationDateTime.format": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
}
So I tried doing a transform on the object, but it seems like I'm stuck again on this. The pattern seems to be invalid. Looking around the internet, it does seem like this is a valid SimpleDateFormat pattern, but it seems to be complaining about the 'T'. I updated the message schema as well.
Based on the schema you've shared, you should be setting:
"timestamp.extractor":"RecordField",
"timestamp.field":"timestamp",
i.e. no message prefix to the timestamp field name.
If the data is a string, then Connect will try to parse it as milliseconds - source code here.
In any case, message.timestamp assumes the data looks like { "message" : { "timestamp": ... } }, so just timestamp would be correct. And having nested fields didn't use to be possible anyway, so you might want to clarify which version of Connect you have.
I'm not entirely sure how you would get instanceof Date to evaluate to true when using the JSON converter, and even if you had set schema.enable = true, you can see in the code that there are only conditions for schema types of numbers and strings, and it still assumes the value is milliseconds.
You can try using the TimestampConverter transformation to convert your date string.

Incorrect query run in BigQuery by Apps Script code

We have a number of .gs files in an Apps Script project which schedule a number of BigQuery SQL queries into tables.
Everything was fine until a few days ago, when one table stopped updating correctly. We looked into the query history and saw that one of our tables hasn't been updated in quite a while. When we run the Apps Script responsible for that table and check the BigQuery query history, it is actually running a different query, even though the script is valid and references different source and destination tables.
Our scripts mostly look like the one below:
function table_load_1() {
    var configuration = {
        "query": {
            "useQueryCache": false,
            "destinationTable": {
                "projectId": "project",
                "datasetId": "schema",
                "tableId": "destination_table"
            },
            "writeDisposition": "WRITE_TRUNCATE",
            "createDisposition": "CREATE_IF_NEEDED",
            "allowLargeResults": true,
            "useLegacySql": false,
            "query": "select * from `project.schema.source_table` "
        }
    };
    var job = {
        "configuration": configuration
    };
    var jobResult = BigQuery.Jobs.insert(job, "project");
    Logger.log(jobResult);
}
Any idea why this would be happening?

RavenDB 2.5: query giving 0 results while it should be 1

I've got a problem with a query that I can't figure out. I'm doing this query in code:
var userList = (from user in this.documentSession.Query<User>()
where user.FederatedUserIds[authenticatedClient.ProviderName] == authenticatedClient.UserInformation.Id
select user).ToList();
In this case the provider name is facebook and the id is 100001103765630. FederatedUserIds is a Dictionary.
Which results in this query to the server:
http://localhost:8080/indexes/dynamic/Users?&query=FederatedUserIds.facebook%3A100001103765630&pageSize=128
This gives zero results, also when querying from the web browser:
{
    "Results": [],
    "Includes": [],
    "IsStale": false,
    "IndexTimestamp": "2013-08-24T14:52:44.0511623Z",
    "TotalResults": 0,
    "SkippedResults": 0,
    "IndexName": "Auto/Users/ByFederatedUserIds_facebook",
    "IndexEtag": "01000000-0000-0064-0000-000000000001",
    "ResultEtag": "2BD9AA1E-935A-FEDF-3636-FAB0F155ED9E",
    "Highlightings": {},
    "NonAuthoritativeInformation": false,
    "LastQueryTime": "2013-08-24T15:00:30.1200358Z",
    "DurationMilliseconds": 1
}
Meanwhile, I have a document like this in the database, so I expect 1 result instead of 0:
{
    "DisplayName": "neographikal",
    "RealName": "x",
    "Email": "x",
    "PictureUri": "x",
    "Roles": [
        "User"
    ],
    "ProfileImages": [],
    "FederatedUserIds": {
        "google": "x",
        "twitter": "x",
        "windowslive": "x",
        "linkedin": "x",
        "facebook": "100001103765630"
    }
}
The strange thing is, this has never bothered me before in this piece of code. Can somebody see what I'm doing wrong?
I was going to say that you might have stale results, but I see that "IsStale": false.
The only other thing I see is that the query on the URL comes through as FederatedUserIds.facebook while the field name is going to be FederatedUserIds_facebook in the index. However, I tested this and it worked, so it appears that the . is translated to _ before the query executes. I'm not sure when this was added or if it's always been that way.
Note if you try to query with . in Raven Studio, it doesn't work there, but _ does.
What build version are you running? I tested on 2.5.2666 and it worked for me.
Somehow, this was related to the indexes. Although it was an auto-indexed query, deleting all the indexes on the server and restarting the application solved the problem. I don't think this should have happened, but reproducing it seemed very difficult. If I can reproduce it in the future, I'll try to investigate this further.
edit 01-09:
Found the problem and created a bug report: http://issues.hibernatingrhinos.com/issue/RavenDB-1334
Worked around it by creating a decent index:
public class UserByFederatedLoginIndex : AbstractIndexCreationTask<Core.Domain.User>
{
    public UserByFederatedLoginIndex()
    {
        Map = users => from u in users
                       select new
                       {
                           u.DisplayName,
                           _ = u.FederatedUserIds.Select(x => CreateField("FederatedUserIds_" + x.Key, x.Value))
                       };
    }
}