Migration didn't properly handle: appointment(com.example.spaappointment.data.Appointment) - kotlin

I get this error. My Appointment class is:
package com.example.spaappointment.data
import androidx.annotation.NonNull
import androidx.room.ColumnInfo
import androidx.room.Entity
import androidx.room.PrimaryKey
@Entity(tableName = "appointment")
data class Appointment(
    @NonNull @PrimaryKey
    val date: String,
    val name: String,
    val phonenumber: Int,
    val email: String,
    val note: String
)
and the exported JSON schema file looks like this:
{ "formatVersion": 1, "database": { "version": 3, "identityHash": "939188bd3f343aa9fe8319733e41e36a", "entities": [ { "tableName": "service", "createSql": "CREATE TABLE IF NOT EXISTS `${TABLE_NAME}` (`price`TEXT NOT NULL,`name`TEXT NOT NULL,`description`TEXT NOT NULL,`id` INTEGER NOT NULL, PRIMARY KEY(`id`))", "fields": [ { "fieldPath": "price", "columnName": "price", "affinity": "TEXT", "notNull": true }, { "fieldPath": "name", "columnName": "name", "affinity": "TEXT", "notNull": true }, { "fieldPath": "description", "columnName": "description", "affinity": "TEXT", "notNull": true }, { "fieldPath": "id", "columnName": "id", "affinity": "INTEGER", "notNull": true } ], "primaryKey": { "columnNames": [ "id" ], "autoGenerate": false }, "indices": [], "foreignKeys": [] }, { "tableName": "appointment", "createSql": "CREATE TABLE IF NOT EXISTS `${TABLE_NAME}` (`date`TEXT NOT NULL,`name`TEXT NOT NULL,`phonenumber`INTEGER NOT NULL,`email`TEXT NOT NULL,`note` TEXT NOT NULL, PRIMARY KEY(`date`))", "fields": [ { "fieldPath": "date", "columnName": "date", "affinity": "TEXT", "notNull": true }, { "fieldPath": "name", "columnName": "name", "affinity": "TEXT", "notNull": true }, { "fieldPath": "phonenumber", "columnName": "phonenumber", "affinity": "INTEGER", "notNull": true }, { "fieldPath": "email", "columnName": "email", "affinity": "TEXT", "notNull": true }, { "fieldPath": "note", "columnName": "note", "affinity": "TEXT", "notNull": true } ], "primaryKey": { "columnNames": [ "date" ], "autoGenerate": false }, "indices": [], "foreignKeys": [] } ], "views": [], "setupQueries": [ "CREATE TABLE IF NOT EXISTS room_master_table (id INTEGER PRIMARY KEY,identity_hash TEXT)", "INSERT OR REPLACE INTO room_master_table (id,identity_hash) VALUES(42, '939188bd3f343aa9fe8319733e41e36a')" ] } }
and this is the error:
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.example.spaappointment, PID: 9952
java.lang.IllegalStateException: Migration didn't properly handle: appointment(com.example.spaappointment.data.Appointment).
TableInfo{name='appointment', columns={date=Column{name='date', type='TEXT', affinity='2', notNull=true, primaryKeyPosition=1, defaultValue='null'}, name=Column{name='name', type='TEXT', affinity='2', notNull=true, primaryKeyPosition=0, defaultValue='null'}, phonenumber=Column{name='phonenumber', type='INTEGER', affinity='3', notNull=true, primaryKeyPosition=0, defaultValue='null'}, note=Column{name='note', type='TEXT', affinity='2', notNull=true, primaryKeyPosition=0, defaultValue='null'}, email=Column{name='email', type='TEXT', affinity='2', notNull=true, primaryKeyPosition=0, defaultValue='null'}}, foreignKeys=[], indices=[]}
TableInfo{name='appointment', columns={}, foreignKeys=[], indices=[]}
I don't know how to fix it.

Room is strict about the schema it expects, and when a Migration fails it is not easy to spot the difference between the expected and found schemas (the CREATE TABLE SQL) in the log.
However, you don't need to, because Room can tell you exactly what it expects.
What you do is:
1. Create the new or changed @Entity annotated classes.
2. Include them, if not already included, in the entities parameter of the @Database annotation.
3. Compile the project (Ctrl + F9).
4. Locate the java (generated) folder (visible in the Android view).
5. Find the class that has the same name as the @Database annotated class but is suffixed with _Impl.
6. Find the createAllTables method; it contains the SQL that Room would use to create the tables (ignore room_master_table, which holds the hash).
7. Base the migration on the respective SQL.
That said, the found schema (i.e. what is actually in the database after the Migration has run) is
Found: TableInfo{name='appointment', columns={}, foreignKeys=[], indices=[]}
This indicates that there are no columns, foreign keys or indexes for the appointment table. Are you sure that the migration creates the appointment table? If not, it needs to, and you can use the SQL from the generated Java as described above.
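As a rough illustration of what an empty "Found" TableInfo means, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Room itself. The CREATE TABLE text is the one from the exported schema in the question, with ${TABLE_NAME} replaced by the table name; table_columns is a hypothetical helper that reads PRAGMA table_info, which reports the same kind of column details that show up in the log.

```python
import sqlite3

# CREATE TABLE statement from the exported schema JSON in the question,
# with ${TABLE_NAME} replaced by the real table name.
CREATE_APPOINTMENT = (
    "CREATE TABLE IF NOT EXISTS `appointment` ("
    "`date` TEXT NOT NULL, `name` TEXT NOT NULL, "
    "`phonenumber` INTEGER NOT NULL, `email` TEXT NOT NULL, "
    "`note` TEXT NOT NULL, PRIMARY KEY(`date`))"
)


def table_columns(conn, table):
    """Hypothetical helper: {column: (declared type, not-null flag, pk position)}."""
    rows = conn.execute(f"PRAGMA table_info(`{table}`)").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {r[1]: (r[2], bool(r[3]), r[5]) for r in rows}


conn = sqlite3.connect(":memory:")
before = table_columns(conn, "appointment")  # table was never created
conn.execute(CREATE_APPOINTMENT)             # what the migration should execute
after = table_columns(conn, "appointment")

print(before)  # no columns, like the "Found" TableInfo in the log
print(after)   # five NOT NULL columns, with date as the primary key
```

If the migration never runs a statement like this one, the table simply does not exist after the upgrade, which is consistent with the empty column list in the error.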


Query Druid SQL inner join with a dataSource name that has a dash

How do I write an INNER JOIN query between two data sources when one of them has a dash in its name?
Executing the following query with the Druid SQL binary results in a query error:
SELECT *
FROM first
INNER JOIN "second-schema" ON first.device_id = "second-schema".device_id;
org.apache.druid.java.util.common.ISE: Cannot build plan for query
Is this the correct syntax for referencing a data source that has a dash in its name?
"dataSchema": {
"dataSource": "second-schema",
"parser": {
"type": "string",
"parseSpec": {
"format": "json",
"timestampSpec": {
"column": "ts_start"
"dimensionsSpec": {
"dimensions": [
"dimensionExclusions": [],
"spatialDimensions": []
"metricsSpec": [
{ "type": "hyperUnique", "name": "conn_id_hll", "fieldName": "conn_id"},
"type": "count",
"name": "event_count"
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": "minute"
"ioConfig": {
"type": "realtime",
"firehose": {
"type": "kafka-0.8",
"consumerProps": {
"zookeeper.connect": "localhost:2181",
"zookeeper.connectiontimeout.ms": "15000",
"zookeeper.sessiontimeout.ms": "15000",
"zookeeper.synctime.ms": "5000",
"group.id": "flow-info",
"fetch.size": "1048586",
"autooffset.reset": "largest",
"autocommit.enable": "false"
"feed": "flow-info"
"plumber": {
"type": "realtime"
"tuningConfig": {
"type": "realtime",
"maxRowsInMemory": 50000,
"basePersistDirectory": "\/opt\/druid-data\/realtime\/basePersist",
"intermediatePersistPeriod": "PT10m",
"windowPeriod": "PT15m",
"rejectionPolicy": {
"type": "serverTime"
"dataSchema": {
"dataSource": "first",
"parser": {
"type": "string",
"parseSpec": {
"format": "json",
"timestampSpec": {
"column": "ts_start"
"dimensionsSpec": {
"dimensions": [
"dimensionExclusions": [],
"spatialDimensions": []
"metricsSpec": [
{ "type": "doubleSum", "name": "val_num", "fieldName": "val_num" },
{ "type": "doubleMin", "name": "val_num_min", "fieldName": "val_num" },
{ "type": "doubleMax", "name": "val_num_max", "fieldName": "val_num" },
{ "type": "doubleSum", "name": "size", "fieldName": "size" },
{ "type": "doubleMin", "name": "size_min", "fieldName": "size" },
{ "type": "doubleMax", "name": "size_max", "fieldName": "size" },
{ "type": "count", "name": "first_count" }
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": "minute"
"ioConfig": {
"type": "realtime",
"firehose": {
"type": "kafka-0.8",
"consumerProps": {
"zookeeper.connect": "localhost:2181",
"zookeeper.connectiontimeout.ms": "15000",
"zookeeper.sessiontimeout.ms": "15000",
"zookeeper.synctime.ms": "5000",
"group.id": "first",
"fetch.size": "1048586",
"autooffset.reset": "largest",
"autocommit.enable": "false"
"feed": "first"
"plumber": {
"type": "realtime"
"tuningConfig": {
"type": "realtime",
"maxRowsInMemory": 50000,
"basePersistDirectory": "\/opt\/druid-data\/realtime\/basePersist",
"intermediatePersistPeriod": "PT10m",
"windowPeriod": "PT15m",
"rejectionPolicy": {
"type": "serverTime"
Based on your schema definitions there are a few observations I'll make.
When doing a join you usually have to list out columns explicitly (not use a *) otherwise you get collisions from duplicate columns. In your join, for example, you have a device_id in both "first" and "second-schema", not to mention all the other columns that are the same across both.
When quoting identifiers, I don't mix styles: I either quote all of them or none of them.
So I think your query will work better in a form more like this:
SELECT
    "second-schema"."etid" as "ss_etid",
    "second-schema"."device_id" as "ss_device_id",
    "second-schema"."device_name" as "ss_device_name",
    "second-schema"."x_1" as "ss_x_1",
    "second-schema"."x_2" as "ss_x_2",
    "second-schema"."x_3" as "ss_x_3",
    "second-schema"."vlan" as "ss_vlan",
    "second-schema"."s_x" as "ss_s_x",
    "second-schema"."d_x" as "ss_d_x",
    "second-schema"."d_p" as "ss_d_p"
FROM "first"
INNER JOIN "second-schema" ON "first"."device_id" = "second-schema"."device_id";
Obviously feel free to name columns as you see fit, or to include or exclude columns as needed. SELECT * will only work when all column names across both tables are unique.
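The quoting and aliasing pattern can be tried out locally. The sketch below is only an analogy: it runs against SQLite through Python's sqlite3 module, not against Druid, so it says nothing about Druid's query planner. The table contents and the f_ alias prefix are made up for illustration; device_id, val_num and vlan are names taken from the question and the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Double quotes delimit identifiers, so a dash in the table name is accepted.
conn.execute('CREATE TABLE "first" (device_id TEXT, val_num REAL)')
conn.execute('CREATE TABLE "second-schema" (device_id TEXT, vlan TEXT)')
# Hypothetical sample rows.
conn.executemany('INSERT INTO "first" VALUES (?, ?)', [("d1", 1.5), ("d2", 2.5)])
conn.executemany('INSERT INTO "second-schema" VALUES (?, ?)', [("d1", "v10")])

# Every column is listed explicitly and aliased, so the two device_id
# columns do not collide in the result.
query = '''
SELECT
    "first"."device_id"         AS "f_device_id",
    "first"."val_num"           AS "f_val_num",
    "second-schema"."device_id" AS "ss_device_id",
    "second-schema"."vlan"      AS "ss_vlan"
FROM "first"
INNER JOIN "second-schema"
    ON "first"."device_id" = "second-schema"."device_id"
'''
cursor = conn.execute(query)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
```

Only the device that appears in both tables comes back, and each output column has a unique name.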

How to make preConditions for two columns in liquibase?

I use Liquibase, and I don't know how to check for two columns in a table migration.
I want to do something like this:
"preConditions": [
    {
        "onFail": "MARK_RAN",
        "not": {
            "columnExists": {
                "columnName": "first_column",
                "tableName": "my_table"
            },
            "columnExists": {
                "columnName": "second_column",
                "tableName": "my_table"
            }
        }
    }
]
The not element is supposed to be an array of objects.
You can add further logic inside it by using the “and” (the default) or “or” operators.
I’d go with the following:
"preConditions": [{
    "onFail": "MARK_RAN",
    "not": [{
        "and": [{
            "columnExists": {
                "columnName": "first_column",
                "tableName": "my_table"
            }
        }, {
            "columnExists": {
                "columnName": "second_column",
                "tableName": "my_table"
            }
        }]
    }]
}]

Why does Azure Data Factory seemingly insist on inserting DateTimes as string?

I'm trying to set up an Azure Data Factory data flow to copy and denormalize my data from one Azure SQL database to another for reporting/BI purposes, but I ran into a problem with inserting dates.
This is the definition of my dataflow.
"name": "dataflow1",
"properties": {
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
"dataset": {
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
"name": "source1"
"sinks": [
"dataset": {
"referenceName": "AzureSqlTable2",
"type": "DatasetReference"
"name": "sink1"
"script": "\n\nsource(output(\n\t\tBucketId as string,\n\t\tStreamId as string,\n\t\tStreamIdOriginal as string,\n\t\tStreamRevision as integer,\n\t\tItems as integer,\n\t\tCommitId as string,\n\t\tCommitSequence as integer,\n\t\tCommitStamp as timestamp,\n\t\tCheckpointNumber as long,\n\t\tDispatched as boolean,\n\t\tHeaders as binary,\n\t\tPayload as binary\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tformat: 'table') ~> source1\nsource1 sink(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tformat: 'table',\n\tdeletable:false,\n\tinsertable:true,\n\tupdateable:false,\n\tupsertable:false,\n\tmapColumn(\n\t\tBucketId,\n\t\tCommitStamp\n\t)) ~> sink1"
and these are the definitions of my source
"name": "AzureSqlTable1",
"properties": {
"linkedServiceName": {
"referenceName": "Source_Test",
"type": "LinkedServiceReference"
"annotations": [],
"type": "AzureSqlTable",
"schema": [
"name": "BucketId",
"type": "varchar"
"name": "StreamId",
"type": "char"
"name": "StreamIdOriginal",
"type": "nvarchar"
"name": "StreamRevision",
"type": "int",
"precision": 10
"name": "Items",
"type": "tinyint",
"precision": 3
"name": "CommitId",
"type": "uniqueidentifier"
"name": "CommitSequence",
"type": "int",
"precision": 10
"name": "CommitStamp",
"type": "datetime2",
"scale": 7
"name": "CheckpointNumber",
"type": "bigint",
"precision": 19
"name": "Dispatched",
"type": "bit"
"name": "Headers",
"type": "varbinary"
"name": "Payload",
"type": "varbinary"
"typeProperties": {
"tableName": "[dbo].[Commits]"
and sink data sets
"name": "AzureSqlTable2",
"properties": {
"linkedServiceName": {
"referenceName": "Dest_Test",
"type": "LinkedServiceReference"
"annotations": [],
"type": "AzureSqlTable",
"schema": [],
"typeProperties": {
"tableName": "dbo.Test2"
When running my pipeline with the data flow I get the following error:
Activity dataflow1 failed: DF-EXEC-1 Conversion failed when converting date and/or time from character string.
com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:256)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:108)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:28)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1611)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$200(SQLServerBulkCopy.java:58)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:709)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:739)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1684)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:669)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions.com$microsoft$azure$sqldb$spark$connect$DataFrameFunctions$$bulkCopy(DataFrameFunctions.scala:127)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions$$anonfun$bulkCopyToSqlDB$1.apply(DataFrameFunctions.scala:72)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions$$anonfun$bulkCopyToSqlDB$1.apply(DataFrameFunctions.scala:72)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:948)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:948)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2226)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2226)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:124)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:459)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1401)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
My Azure SQL audit log shows the following statement that failed (which is not a huge surprise considering that it uses VARCHAR(50) as the type for [CommitStamp]):
INSERT BULK dbo.T_301fcb5e4a4148d4a48f2943011b2f04 (
[CommitStamp] VARCHAR(50),
[StreamIdOriginal] NVARCHAR(MAX),
[StreamRevision] INT,
[Items] INT,
[CommitSequence] INT,
[CheckpointNumber] BIGINT,
[Dispatched] BIT,
[r8e440f7252bb401b9ead107597de6293] INT)
I have absolutely no idea why this occurs. It looks like the schema information is correct but somehow it seems the data factory/data flow wants to insert the CommitStamp as a string type.
As requested, the output from the data flow/code/plan view:
BucketId as string,
StreamId as string,
StreamIdOriginal as string,
StreamRevision as integer,
Items as integer,
CommitId as string,
CommitSequence as integer,
CommitStamp as timestamp,
CheckpointNumber as long,
Dispatched as boolean,
Headers as binary,
Payload as binary
allowSchemaDrift: true,
validateSchema: false,
isolationLevel: 'READ_UNCOMMITTED',
format: 'table',
schemaName: '[dbo]',
tableName: '[Commits]',
store: 'sqlserver',
server: 'sign2025-sqldata.database.windows.net',
database: 'SignPath.Application',
user: 'Sign2025Admin',
password: '**********') ~> source1
source1 sink(allowSchemaDrift: true,
validateSchema: false,
format: 'table',
schemaName: 'dbo',
tableName: 'Test2',
store: 'sqlserver',
server: 'sign2025-sqldata.database.windows.net',
database: 'SignPath.Reporting',
user: 'Sign2025Admin',
password: '**********') ~> sink1
I created a data flow to copy data from one Azure SQL database to another. It successfully converted datetime2 to VARCHAR(50).
This is the definition of my dataflow:
"name": "dataflow1",
"properties": {
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
"dataset": {
"referenceName": "DestinationDataset_sto",
"type": "DatasetReference"
"name": "source1"
"sinks": [
"dataset": {
"referenceName": "DestinationDataset_mex",
"type": "DatasetReference"
"name": "sink1"
"script": "\n\nsource(output(\n\t\tID as integer,\n\t\ttName as string,\n\t\tmyTime as timestamp\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tformat: 'table') ~> source1\nsource1 sink(input(\n\t\tID as integer,\n\t\ttName as string,\n\t\tmyTime as string\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tformat: 'table',\n\tdeletable:false,\n\tinsertable:true,\n\tupdateable:false,\n\tupsertable:false) ~> sink1"
The definitions of my source:
"name": "DestinationDataset_sto",
"properties": {
"linkedServiceName": {
"referenceName": "AzureSqlDatabase1",
"type": "LinkedServiceReference"
"annotations": [],
"type": "AzureSqlTable",
"schema": [
"name": "ID",
"type": "int",
"precision": 10
"name": "tName",
"type": "varchar"
"name": "myTime",
"type": "datetime2",
"scale": 7
"typeProperties": {
"tableName": "[dbo].[demo]"
"type": "Microsoft.DataFactory/factories/datasets"
My sink settings:
"name": "DestinationDataset_mex",
"properties": {
"linkedServiceName": {
"referenceName": "AzureSqlDatabase1",
"type": "LinkedServiceReference"
"annotations": [],
"type": "AzureSqlTable",
"schema": [
"name": "ID",
"type": "int",
"precision": 10
"name": "tName",
"type": "varchar"
"name": "myTime",
"type": "varchar"
"typeProperties": {
"tableName": "[dbo].[demo1]"
"type": "Microsoft.DataFactory/factories/datasets"
Here are my data flow steps.
Step 1: Source settings [screenshot]
Step 2: Sink settings [screenshot]
The run succeeded [screenshot].
The tables demo and demo1 have almost the same schema; only the type of myTime differs.
[Screenshot: my source table and its data]
[Screenshot: my sink table and the data copied from demo]
Data Flow plan:
ID as integer,
tName as string,
myTime as timestamp
allowSchemaDrift: true,
validateSchema: true,
isolationLevel: 'SERIALIZABLE',
format: 'table',
schemaName: '[dbo]',
tableName: '[demo]',
store: 'sqlserver',
server: '****.database.windows.net',
database: '****',
user: 'ServerAdmin',
password: '**********') ~> source1
source1 sink(input(
ID as integer,
tName as string,
myTime as string
allowSchemaDrift: true,
validateSchema: false,
format: 'table',
schemaName: '[dbo]',
tableName: '[demo1]',
store: 'sqlserver',
server: '****.database.windows.net',
database: '****',
user: 'ServerAdmin',
password: '**********') ~> sink1
I created the sink table manually and found that:
Data Flow can convert datetime2 to VARCHAR() (and maybe NVARCHAR()), date, and datetimeoffset.
When I try the sink column types time, datetime, datetime2, or smalldatetime, Data Flow always gives the error:
"message": "DF-EXEC-1 Conversion failed when converting date and/or time from character
Update 2019-7-11:
I asked Azure Support for help and they replied that this is a bug in Data Flow and that there is no solution for now.
Update 2019-7-12:
I tested with Azure Support and they confirmed this is a bug. Here is the new email: [screenshot]
They also told me that the fix has already been made and will go out with the next deployment train, which could be at the end of next week.
Hope this helps.
Looks like your Sink dataset defines myTime as a String:
ID as integer,
tName as string,
myTime as string
Can you change that to timestamp or Date, whichever you'd like to land it as?
Alternatively, you can land the data in a temporary staging table in SQL by setting "Recreate table" on the Sink and let ADF generate a new table definition on the fly using the data types of your mapped fields in the data flow.

How to find match elements in between two collections in mongodb?

I am working with a MongoDB database and I am stuck on one piece of logic: how do I find matching elements between two collections in MongoDB?
Users Collection
{
    "_id": "57cd539d168df87ae2695543",
    "userid": "3658975589",
    "name": "John Doe",
    "email": "johndoe@gmail.com",
    "number": "123654789"
}, {
    "_id": "57cd53e6168df87ae2695544",
    "userid": "789456123",
    "name": "William Rust",
    "email": "williamrust@gmail.com",
    "number": "963258741"
}
Contacts Collection
{
    "_id": "57cd2f6c3966037787ce9550",
    "contact": [{
        "id": "457899979",
        "fullname": "Abcd Hello",
        "phonenumber": "123575784565",
        "currentUserid": "123456789"
    }, {
        "id": "7994949849",
        "fullname": "Keyboard Mouse",
        "phonenumber": "23658974262",
        "currentUserid": "123456789"
    }, {
        "id": "7848848885",
        "fullname": "John Doe",
        "phonenumber": "852147852",
        "currentUserid": "123456789"
    }]
}
So I want to find the elements whose phone numbers match between these two collections and list them with their name and email.
Please go through my post and suggest a solution.
I'm guessing that what you want is "aggregate + lookup". Something like this:
db.users.aggregate([
    {
        $lookup: {
            from: "contacts",
            localField: "number",
            foreignField: "phonenumber",
            as: "same"
        }
    },
    { $match: { "same": { $ne: [] } } }
])
As a result you get:
{
    "_id" : "57cd539d168df87ae2695543",
    "userid" : "3658975589",
    "name" : "Anshuman Pattnaik",
    "email" : "anshuman@gmail.com",
    "number" : "7022650603",
    "same" : [
        {
            "_id" : ObjectId("5b361b864aa5144b974c9733"),
            "id" : "7848848885",
            "fullname" : "Anshuman Pattnaik",
            "phonenumber" : "7022650603",
            "currentUserid" : "123456789"
        }
    ]
}
If you want to show only the name and the email, you have to add { $project: { name: 1, email: 1, _id: 0 } }:
db.users.aggregate([
    {
        $lookup: {
            from: "contacts",
            localField: "number",
            foreignField: "phonenumber",
            as: "same"
        }
    },
    { $match: { "same": { $ne: [] } } },
    { $project: { name: 1, email: 1, _id: 0 } }
])
Then you'll get:
{ "name" : "Anshuman Pattnaik", "email" : "anshuman@gmail.com" }
For this to work, each contact has to be a separate top-level document rather than an element of a nested contact array, so you have to correct the insert of your contacts like this:
{
    "id": "457899979",
    "fullname": "Abcd Hello",
    "phonenumber": "123575784565",
    "currentUserid": "123456789"
}, {
    "id": "7994949849",
    "fullname": "Keyboard Mouse",
    "phonenumber": "23658974262",
    "currentUserid": "123456789"
}, {
    "id": "7848848885",
    "fullname": "Anshuman Pattnaik",
    "phonenumber": "7022650603",
    "currentUserid": "123456789"
}
Hope it works!
For more information https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
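To make the three pipeline stages easier to follow, here is a plain-Python sketch of what $lookup, $match and $project do to the data. It does not talk to MongoDB; lookup, match_non_empty and project are hypothetical helpers written for this illustration, and the last contact's phone number has been changed (an assumption, not the question's data) so that exactly one user matches.

```python
users = [
    {"name": "John Doe", "email": "johndoe@gmail.com", "number": "123654789"},
    {"name": "William Rust", "email": "williamrust@gmail.com", "number": "963258741"},
]
# Contacts as separate top-level documents.
contacts = [
    {"fullname": "Abcd Hello", "phonenumber": "123575784565"},
    {"fullname": "Keyboard Mouse", "phonenumber": "23658974262"},
    {"fullname": "John Doe", "phonenumber": "123654789"},  # changed so it matches a user
]


def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Like $lookup: attach all foreign docs whose field equals the local field."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


def match_non_empty(docs, field):
    """Like { $match: { field: { $ne: [] } } }: keep docs with at least one match."""
    return [d for d in docs if d[field] != []]


def project(docs, fields):
    """Like $project with an inclusion list: keep only the named fields."""
    return [{k: d[k] for k in fields if k in d} for d in docs]


joined = lookup(users, contacts, "number", "phonenumber", "same")
result = project(match_non_empty(joined, "same"), ["name", "email"])
print(result)  # [{'name': 'John Doe', 'email': 'johndoe@gmail.com'}]
```

William Rust is dropped by the match step because no contact carries his number, which is the same filtering the { $ne: [] } condition performs in the pipeline.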
This is not a complete answer, but it may help you solve your problem.
You can compare documents from the two collections using the function below. For more details see this answer.
var compareCollections = function () {
    db.users.find().forEach(function (obj1) {
        db.contacts.find({ /* if you know some properties, you can put them here; if you don't, leave this empty */ }).forEach(function (obj2) {
            var equals = function (o1, o2) {
                // some code.
            };
            if (equals(obj1, obj2)) {
                // Do what you want to do
            }
        });
    });
};

how to add data to the repeatable fields in avro schema?

I'm trying to test Avro serialization and deserialization without code generation (I completed this task using code generation). The schema is as follows:
{
    "type": "record",
    "name" : "person",
    "namespace" : "avro",
    "fields": [
        { "name" : "personname", "type": ["null","string"] },
        { "name" : "personId", "type": ["null","string"] },
        { "name" : "Addresses", "type": {
            "type": "array",
            "items": [ {
                "type" : "record",
                "name" : "Address",
                "fields" : [
                    { "name" : "addressLine1", "type": ["null", "string"] },
                    { "name" : "addressLine2", "type": ["null", "string"] },
                    { "name" : "city", "type": ["null", "string"] },
                    { "name" : "state", "type": ["null", "string"] },
                    { "name" : "zipcode", "type": ["null", "string"] }
                ]
            } ]
        } },
        { "name" : "contact", "type" : ["null", "string"]}
    ]
}
I understand this is how data is added to the schema.
Schema schema = new Schema.Parser().parse(new File("src/person.avsc.txt"));
GenericRecord person1 = new GenericData.Record(schema);
person1.put("personname", "goud");
But how do I add city, state etc to address and then add it to addresses?
GenericRecord address1 = new GenericData.Record(schema);
address1.put("city", "SanJose");
The above snippet doesn't work. I tried to look into GenericArray, but I couldn't get my head around it.
You need to describe the inner complex type ("type" : "record", "name" : "Address") in a separate schema, like this:
{
    "type" : "record",
    "name" : "Address",
    "fields" : [
        { "name" : "addressLine1", "type": ["null", "string"] },
        { "name" : "addressLine2", "type": ["null", "string"] },
        { "name" : "city", "type": ["null", "string"] },
        { "name" : "state", "type": ["null", "string"] },
        { "name" : "zipcode", "type": ["null", "string"] }
    ]
}
Then you may create an inner object:
Schema innerSchema = new Schema.Parser().parse(new File("person_address.avsc"));
GenericRecord address = new GenericData.Record(innerSchema);
address.put("addressLine1", "adr_1");
address.put("addressLine2", "adr_2");
address.put("city", "test_city");
address.put("state", "test_state");
address.put("zipcode", "zipcode_00000");
Then add the inner object you created to an ArrayList.
Finally, create the main object and add all of this to it.
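Here is a full example in Java:
import java.io.File;
import java.util.ArrayList;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Build the inner Address record from its own schema file.
Schema innerSchema = new Schema.Parser().parse(new File("person_address.avsc"));
GenericRecord address = new GenericData.Record(innerSchema);
address.put("addressLine1", "adr_1");
address.put("addressLine2", "adr_2");
address.put("city", "test_city");
address.put("state", "test_state");
address.put("zipcode", "zipcode_00000");
// Collect the addresses; without this add() the list stays empty.
ArrayList<GenericRecord> addresses = new ArrayList<>();
addresses.add(address);
// Build the outer person record and attach the list to the array field.
Schema mainSchema = new Schema.Parser().parse(new File("person.avsc"));
GenericRecord person1 = new GenericData.Record(mainSchema);
person1.put("personname", "goud");
person1.put("personId", "123_id");
person1.put("Addresses", addresses);
person1.put("contact", "test_contact");
The resulting record:
{
    "personname": "goud",
    "personId": "123_id",
    "Addresses": [
        {
            "addressLine1": "adr_1",
            "addressLine2": "adr_2",
            "city": "test_city",
            "state": "test_state",
            "zipcode": "zipcode_00000"
        }
    ],
    "contact": "test_contact"
}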