I created a UDF that expects a List<MAP<STRING,STRING>> argument. It works fine in a unit test. However, when I use it in HQL, I cannot get any value from the map by key; the result is always null.
The log, however, shows that every key has a non-null value, and I don't know why.
error message:
Caused by: java.lang.RuntimeException: service_type cannot be
source_locale=en, source_updated_at=1501377418000, content_type=PLAIN,
// UDF code:
public class MusselContentUDF extends GenericUDF {
private static final Log LOG = LogFactory.getLog(MusselContentUDF.class);
private ListObjectInspector listObjectInspector;
private MapObjectInspector mapObjectInspector;
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
ObjectInspector a = arguments[0];
if (!(a instanceof ListObjectInspector)) {
throw new UDFArgumentException("first argument must be a list / array");
this.listObjectInspector = (ListObjectInspector) a;
if (!(listObjectInspector.getListElementObjectInspector() instanceof MapObjectInspector)) {
throw new UDFArgumentException("element must be map type");
this.mapObjectInspector =
(MapObjectInspector) (listObjectInspector.getListElementObjectInspector());
return PrimitiveObjectInspectorFactory.javaByteArrayObjectInspector;
public byte[] evaluate(DeferredObject[] arguments) throws HiveException {
List<MusselContentUnit> contentUnits =
e -> {
Map<?, ?> map = mapObjectInspector.getMap(e);
if (map.get("service_type") == null) {
LOG.error("service_type cannot be null:" + map);
throw new RuntimeException("service_type cannot be null:" + map);
if (map.get("value") == null) {
LOG.error("value cannot be null:" + map);
throw new RuntimeException("value cannot be null:" + map);
if (map.get("source_hash") == null) {
LOG.error("source_hash cannot be null:" + map);
throw new RuntimeException("source_hash cannot be null:" + map);
if (map.get("source_locale") == null) {
LOG.error("source_locale cannot be null:" + map);
throw new RuntimeException("source_locale cannot be null:" + map);
if (map.get("source_updated_at") == null) {
LOG.error("source_updated_at cannot be null:" + map);
throw new RuntimeException("source_updated_at cannot be null:" + map);
return MusselContentUnit.builder()
.serviceType((String) map.get("service_type"))
.value((String) map.get("value"))
.sourceHash((String) map.get("source_hash"))
.sourceLocale((String) map.get("source_locale"))
.sourceUpdatedAt(Long.valueOf((String) map.get("source_updated_at")))
.contentType((String) map.get("content_type"))
try {
return ThriftCodec.serialize(MusselContent.builder().units(contentUnits).build(), true);
} catch (TException e) {
throw new HiveException("Cannot parse idl request content");
public String getDisplayString(String[] children) {
return "MusselContentUDF(" + children[0] + ")";
hql code:
q4 AS (
SELECT MUSSEL_PRIMARY_KEY(publisher_name, model, field_name, shard_num) AS primary_key
,CONCAT(field_name, '.', locale) AS secondary_key
'service_type', service_type,
'value', value,
'source_hash', source_hash,
'source_locale', source_locale,
'source_updated_at', source_updated_at,
'content_type', content_type
) AS content_unit
q5 AS (
SELECT primary_key
,COLLECT_LIST(content_unit) AS content_units
GROUP BY primary_key, secondary_key

The map keys may not be Java strings when you get the map from mapObjectInspector. I currently see two possible ways to solve the problem:
Use the getMapValueElement(Object data, Object key) method to get the map value.
Work out the key and value ObjectInspectors, e.g.,
protected transient PrimitiveObjectInspector keyOI;
protected transient PrimitiveObjectInspector valueOI;
keyOI = (PrimitiveObjectInspector) this.mapObjectInspector.getMapKeyObjectInspector();
valueOI = (PrimitiveObjectInspector) this.mapObjectInspector.getMapValueObjectInspector();
Then use getPrimitiveJavaObject to get the Java object and cast it to String. This way you can convert the Map<?, ?> map into a Map<String, String> and then call new_map.get("service_type").
Unit tests are not really helpful here; I would suggest using HiveRunner instead.
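A minimal, self-contained sketch of the second approach, with no Hive classes involved. TextLike is a hypothetical stand-in for whatever non-String key type Hive hands over (for example Hadoop's Text), "listing" is a made-up sample value, and toString() stands in for the getPrimitiveJavaObject call you would make in the real UDF:

```java
import java.util.HashMap;
import java.util.Map;

class MapKeyNormalizer {
    /** Hypothetical stand-in for a non-String key type such as Hadoop's Text. */
    static final class TextLike {
        private final String s;
        TextLike(String s) { this.s = s; }
        @Override public boolean equals(Object o) {
            return o instanceof TextLike && ((TextLike) o).s.equals(s);
        }
        @Override public int hashCode() { return s.hashCode(); }
        @Override public String toString() { return s; }
    }

    /** Copies a map with arbitrary key/value objects into a Map<String, String>. */
    static Map<String, String> toStringMap(Map<?, ?> raw) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<?, ?> e : raw.entrySet()) {
            // In a real UDF, keyOI/valueOI.getPrimitiveJavaObject(...) would be used here.
            String key = e.getKey() == null ? null : e.getKey().toString();
            String value = e.getValue() == null ? null : e.getValue().toString();
            out.put(key, value);
        }
        return out;
    }

    /** Builds a sample map whose keys are not java.lang.String instances. */
    static Map<Object, Object> sampleRawMap() {
        Map<Object, Object> raw = new HashMap<>();
        raw.put(new TextLike("service_type"), new TextLike("listing"));
        return raw;
    }
}
```

Looking up "service_type" in the raw map misses because a String never equals a TextLike, even though printing the map shows the key. After the copy, the same lookup succeeds.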


Spring R2DBC: Is there a way to get a constant stream from a PostgreSQL database and process it?

I want to fetch newly created records from a table in PostgreSQL as a live/continuous stream. Is this possible using Spring R2DBC? If so, what options do I have?
You need to use pg_notify and start listening on it. Any change that you want to see should be wrapped in a simple trigger that sends a message via pg_notify.
I have an example of this on my GitHub, but long story short:
Prepare the function and trigger:
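CREATE OR REPLACE FUNCTION notify_member_saved()
    RETURNS TRIGGER
AS $$
BEGIN
    PERFORM pg_notify('MEMBER_SAVED', row_to_json(NEW)::text);
    RETURN NEW; -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER member_saved_trigger
    AFTER INSERT OR UPDATE -- assumed events; the original timing clause is missing
    ON members
    FOR EACH ROW -- required because the function reads NEW
EXECUTE PROCEDURE notify_member_saved();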
In the Java code, prepare a listener:
class NotificationService {
private final ConnectionFactory connectionFactory;
private final Set<NotificationTopic> watchedTopics = Collections.synchronizedSet(new HashSet<>());
private final ObjectMapper objectMapper;
private PostgresqlConnection connection;
private void preDestroy() {
private PostgresqlConnection getConnection() {
if(connection == null) {
synchronized(NotificationService.class) {
if(connection == null) {
try {
connection = Mono.from(connectionFactory.create())
} catch(InterruptedException e) {
throw new RuntimeException(e);
} catch(ExecutionException e) {
throw new RuntimeException(e);
return this.connection;
public <T> Flux<T> listen(final NotificationTopic topic, final Class<T> clazz) {
if(!watchedTopics.contains(topic)) {
return getConnection().getNotifications()
.filter(notification -> && notification.getParameter() != null)
.handle((notification, sink) -> {
final String json = notification.getParameter();
if(!StringUtils.isBlank(json)) {
try {, clazz));
} catch(JsonProcessingException e) {
log.error(String.format("Problem deserializing an instance of [%s] " +
"with the following json: %s ", clazz.getSimpleName(), json), e);
Mono.error(new DeserializationException(topic, e));
private void executeListenStatement(final NotificationTopic topic) {
getConnection().createStatement(String.format("LISTEN \"%s\"", topic)).execute()
.doOnComplete(() -> watchedTopics.add(topic))
public void unlisten(final NotificationTopic topic) {
if(watchedTopics.contains(topic)) {
private void executeUnlistenStatement(final NotificationTopic topic) {
getConnection().createStatement(String.format("UNLISTEN \"%s\"", topic)).execute()
.doOnComplete(() -> watchedTopics.remove(topic))
Start listening from the controller:
public Flux<ServerSentEvent<Object>> listenToEvents() {
return Flux.merge(listenToDeletedItems(), listenToSavedItems())
.map(o -> ServerSentEvent.builder()
public Mono<ResponseEntity<Void>> unlistenToEvents() {
return Mono.just(
private Flux<Member> listenToSavedItems() {
return this.notificationService.listen(MEMBER_SAVED, Member.class);
private void unlistenToSavedItems() {
But remember that if something breaks, you lose the pg_notify events sent during that time, so this approach is only suitable for non-mission-critical solutions.

Oracle Coherence index not working with ContainsFilter query

I've added an index to a cache. The index uses a custom extractor that extends AbstractExtractor and overrides only the extract method to return a List of Strings. I also have a ContainsFilter that uses the same custom extractor and looks for the occurrence of a single String in that List of Strings. Judging by the time my test takes to execute, the index does not seem to be used. What am I doing wrong? Also, is there some debugging I can switch on to see which indices are used?
public class DependencyIdExtractor extends AbstractExtractor {
private static final long serialVersionUID = 1L;
public Object extract(Object oTarget) {
if (oTarget == null) {
return null;
if (oTarget instanceof CacheValue) {
CacheValue cacheValue = (CacheValue)oTarget;
// returns a List of String objects
return cacheValue.getDependencyIds();
throw new UnsupportedOperationException();
Adding the index:
mCache = CacheFactory.getCache(pCacheName);
mCache.addIndex(new DependencyIdExtractor(), false, null);
Performing the ContainsFilter query:
public void invalidateByDependencyId(String pDependencyId) {
ContainsFilter vContainsFilter = new ContainsFilter(new DependencyIdExtractor(), pDependencyId);
Set setKeys = mCache.keySet(vContainsFilter);
I solved this by adding a hashCode and equals method implementation to the DependencyIdExtractor class. It is important that you use exactly the same value extractor when adding an index and creating your filter.
public class DependencyIdExtractor extends AbstractExtractor {
private static final long serialVersionUID = 1L;
public Object extract(Object oTarget) {
if (oTarget == null) {
return null;
if (oTarget instanceof CacheValue) {
CacheValue cacheValue = (CacheValue)oTarget;
return cacheValue.getDependencyIds();
throw new UnsupportedOperationException();
public int hashCode() {
return 1;
public boolean equals(Object obj) {
if (obj == null) {
return false;
if (obj instanceof DependencyIdExtractor) {
return true;
return false;
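Why equals and hashCode matter can be illustrated with a small, self-contained sketch. It assumes the index lookup behaves like a hash map keyed by the extractor instance; PlainExtractor, EqualExtractor and the indexRegistry map are hypothetical stand-ins, not Coherence classes:

```java
import java.util.HashMap;
import java.util.Map;

class ExtractorEqualityDemo {
    /** Hypothetical extractor without equals/hashCode: two instances are never equal. */
    static class PlainExtractor { }

    /** Hypothetical extractor where every instance is considered equal to any other. */
    static class EqualExtractor {
        @Override public boolean equals(Object o) { return o instanceof EqualExtractor; }
        @Override public int hashCode() { return EqualExtractor.class.hashCode(); }
    }

    /** Registers an "index" under one extractor and looks it up with another instance. */
    static boolean indexFound(Object registered, Object lookedUp) {
        Map<Object, String> indexRegistry = new HashMap<>();
        indexRegistry.put(registered, "index");
        return indexRegistry.containsKey(lookedUp);
    }
}
```

With PlainExtractor, the instance created for the filter does not match the one used in addIndex, so the lookup misses and the query falls back to a full scan. With EqualExtractor the lookup hits.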
To debug Coherence indices/queries, you can generate an explain plan similar to database query explain plans.
public void invalidateByDependencyId(String pDependencyId) {
ContainsFilter vContainsFilter = new ContainsFilter(new DependencyIdExtractor(), pDependencyId);
if (mLog.isTraceEnabled()) {
QueryRecorder agent = new QueryRecorder(RecordType.EXPLAIN);
Object resultsExplain = mCache.aggregate(vContainsFilter, agent);
mLog.trace("resultsExplain = \n" + resultsExplain + "\n");
Set setKeys = mCache.keySet(vContainsFilter);

Entity Framework 5 change log: how to implement it?

I am creating an application with MVC4 and Entity Framework 5. How can I implement this?
I have looked around and found that I need to override SaveChanges.
Does anyone have any sample code for this? I am using the code-first approach.
As an example, the way I am saving data is as follows:
public class AuditZoneRepository : IAuditZoneRepository
private AISDbContext context = new AISDbContext();
public int Save(AuditZone model, ModelStateDictionary modelState)
if (model.Id == 0)
var recordToUpdate = context.AuditZones.FirstOrDefault(x => x.Id == model.Id);
if (recordToUpdate != null)
recordToUpdate.Description = model.Description;
recordToUpdate.Valid = model.Valid;
recordToUpdate.ModifiedDate = DateTime.Now;
return 1;
catch (Exception ex)
modelState.AddModelError("", "Database error has occured. Please try again later");
return -1;
There is no need to override SaveChanges.
You can:
Call Context.ChangeTracker.DetectChanges(); // may be necessary depending on your proxy approach
Then analyze the context BEFORE saving.
You can then add the change log to the CURRENT unit of work, so the log is saved in the same COMMIT transaction as the changes themselves.
Or process it as you see fit.
Analyzing the context sample:
I have a simple tool that dumps the context content to the debug output, so that in the debugger I can use the Immediate window to check the content.
You can use it as a starting point for building your change log.
The FullDump method lives on my context class. A sample Immediate window call: UoW.Context.FullDump();
public void FullDump()
Debug.WriteLine("=====Begin of Context Dump=======");
var dbsetList = this.ChangeTracker.Entries();
foreach (var dbEntityEntry in dbsetList)
Debug.WriteLine(dbEntityEntry.Entity.GetType().Name + " => " + dbEntityEntry.State);
switch (dbEntityEntry.State)
case EntityState.Detached:
case EntityState.Unchanged:
case EntityState.Added:
case EntityState.Modified:
case EntityState.Deleted:
throw new ArgumentOutOfRangeException();
Debug.WriteLine("==========End of Entity======");
Debug.WriteLine("==========End of Context======");
private static void WriteCurrentValues(DbEntityEntry dbEntityEntry)
foreach (var cv in dbEntityEntry.CurrentValues.PropertyNames)
Debug.WriteLine(cv + "=" + dbEntityEntry.CurrentValues[cv]);
private static void WriteOriginalValues(DbEntityEntry dbEntityEntry)
foreach (var cv in dbEntityEntry.OriginalValues.PropertyNames)
Debug.WriteLine(cv + "=" + dbEntityEntry.OriginalValues[cv]);
EDIT: Get the changes
I use this routine to get the changes:
public class ObjectPair {
public string Key { get; set; }
public object Original { get; set; }
public object Current { get; set; }
public virtual IList<ObjectPair> GetChanges(object poco) {
var changes = new List<ObjectPair>();
var thePoco = (TPoco) poco;
foreach (var propName in Entry(thePoco).CurrentValues.PropertyNames) {
var curr = Entry(thePoco).CurrentValues[propName];
var orig = Entry(thePoco).OriginalValues[propName];
if (curr != null && orig != null) {
if (curr.Equals(orig)) {
if (curr == null && orig == null) {
var aChangePair = new ObjectPair {Key = propName, Current = curr, Original = orig};
return changes;
EDIT 2: If you must use the internal object tracking:
var context = ???// YOUR DBCONTEXT class
// get objectcontext from dbcontext...
var objectContext = ((IObjectContextAdapter) context).ObjectContext;
// for each tracked entry
foreach (var dbEntityEntry in context.ChangeTracker.Entries()) {
//get the state entry from the statemanager per changed object
var stateEntry = objectContext.ObjectStateManager.GetObjectStateEntry(dbEntityEntry.Entity);
var modProps = stateEntry.GetModifiedProperties();
I decompiled EF6. GetModifiedProperties does indeed use a private bit array to track the fields that have been changed.
// EF decompiled source..... _modifiedFields is a bitarray
public override IEnumerable<string> GetModifiedProperties()
if (EntityState.Modified == this.State && this._modifiedFields != null)
for (int i = 0; i < this._modifiedFields.Length; ++i)
if (this._modifiedFields[i])
yield return this.GetCLayerName(i, this._cacheTypeMetadata);

Developing a Hive UDAF: hit a ClassCastException and have no idea why

public class GenericUdafMemberLevel implements GenericUDAFResolver2 {
private static final Log LOG = LogFactory
public GenericUDAFEvaluator getEvaluator(GenericUDAFParameterInfo paramInfo)
throws SemanticException {
return new GenericUdafMeberLevelEvaluator();
public GenericUDAFEvaluator getEvaluator(TypeInfo[] parameters)
throws SemanticException {
if (parameters.length != 2) { // number of parameters
throw new UDFArgumentTypeException(parameters.length - 1,
"Exactly two arguments are expected.");
if (parameters[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
throw new UDFArgumentTypeException(0,
"Only primitive type arguments are accepted but "
+ parameters[0].getTypeName() + " is passed.");
if (parameters[1].getCategory() != ObjectInspector.Category.PRIMITIVE) {
throw new UDFArgumentTypeException(1,
"Only primitive type arguments are accepted but "
+ parameters[1].getTypeName() + " is passed.");
return new GenericUdafMeberLevelEvaluator();
public static class GenericUdafMeberLevelEvaluator extends GenericUDAFEvaluator {
private PrimitiveObjectInspector inputOI;
private PrimitiveObjectInspector inputOI2;
private DoubleWritable result;
public ObjectInspector init(Mode m, ObjectInspector[] parameters)
throws HiveException {
super.init(m, parameters);
if (m == Mode.PARTIAL1 || m == Mode.COMPLETE){
inputOI = (PrimitiveObjectInspector) parameters[0];
inputOI2 = (PrimitiveObjectInspector) parameters[1];
result = new DoubleWritable(0);
return PrimitiveObjectInspectorFactory.writableLongObjectInspector;
/** class for storing count value. */
static class SumAgg implements AggregationBuffer {
boolean empty;
double value;
public AggregationBuffer getNewAggregationBuffer() throws HiveException {
SumAgg buffer = new SumAgg();
return buffer;
public void reset(AggregationBuffer agg) throws HiveException {
((SumAgg) agg).value = 0.0;
((SumAgg) agg).empty = true;
private boolean warned = false;
public void iterate(AggregationBuffer agg, Object[] parameters)
throws HiveException {
// parameters == null means the input table/split is empty
if (parameters == null) {
try {
double flag = PrimitiveObjectInspectorUtils.getDouble(parameters[1], inputOI2);
if(flag > 1.0) // parameter condition
merge(agg, parameters[0]); // hand the post-map operation to the combiner for merging
} catch (NumberFormatException e) {
if (!warned) {
warned = true;
LOG.warn(getClass().getSimpleName() + " "
+ StringUtils.stringifyException(e));
public void merge(AggregationBuffer agg, Object partial)
throws HiveException {
if (partial != null) {
double p = PrimitiveObjectInspectorUtils.getDouble(partial, inputOI);
((SumAgg) agg).value += p;
public Object terminatePartial(AggregationBuffer agg)
throws HiveException {
return terminate(agg);
public Object terminate(AggregationBuffer agg) throws HiveException {
result.set(((SumAgg) agg).value);
return result;
I have added a few comments to the code to explain the idea.
The idea of the UDAF is as follows:
select test_sum(col1,col2) from tbl ;
If col2 satisfies some condition, then col1's value is added to the sum.
Most of the code is copied from the official avg() UDAF.
I ran into a weird exception:
java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(
at org.apache.hadoop.mapred.MapTask.runOldMapper(
at org.apache.hadoop.mapred.Child$
at Method)
at org.apache.hadoop.mapred.Child.main(
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: cannot be cast to
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(
at org.apache.hadoop.hive.ql.exec.Operator.close(
at org.apache.hadoop.hive.ql.exec.Operator.close(
at org.apache.hadoop.hive.ql.exec.Operator.close(
at org.apache.hadoop.hive.ql.exec.Operator.close(
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(
... 8 more
Caused by: java.lang.ClassCastException: cannot be cast to
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(
at org.apache.hadoop.hive.ql.exec.Operator.process(
at org.apache.hadoop.hive.ql.exec.Operator.forward(
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(
... 13 more
Is there something wrong with my UDAF?
Please kindly point it out.
Thanks a lot.
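Replace PrimitiveObjectInspectorFactory.writableLongObjectInspector in the init method with PrimitiveObjectInspectorFactory.writableDoubleObjectInspector. Your terminate() and terminatePartial() methods return a DoubleWritable, so the ObjectInspector returned by init has to describe a double. Otherwise the serializer tries to read the result through WritableLongObjectInspector, as the stack trace shows, and the cast fails.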
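The mismatch can be reproduced without Hive. In this rough sketch, readAsLong and readAsDouble are hypothetical stand-ins for what a long inspector and a double inspector do with the object returned by terminate(); they are not Hive APIs:

```java
class InspectorMismatchDemo {
    /** Stand-in for a long inspector: assumes the value is a Long and casts it. */
    static long readAsLong(Object o) {
        return ((Long) o).longValue();
    }

    /** Stand-in for a double inspector: assumes the value is a Double and casts it. */
    static double readAsDouble(Object o) {
        return ((Double) o).doubleValue();
    }

    /** Returns true if reading the value through the long stand-in throws ClassCastException. */
    static boolean longReadFails(Object o) {
        try {
            readAsLong(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Passing a Double to longReadFails returns true, which mirrors declaring a long inspector while returning a double result. Reading the same value with readAsDouble works.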

How does hive achieve count(distinct ...)?

In the
@Description(name = "count",
value = "_FUNC_(*) - Returns the total number of retrieved rows, including "
+ "rows containing NULL values.\n"
+ "_FUNC_(expr) - Returns the number of rows for which the supplied "
+ "expression is non-NULL.\n"
+ "_FUNC_(DISTINCT expr[, expr...]) - Returns the number of rows for "
+ "which the supplied expression(s) are unique and non-NULL.")
but I don't see any code that deals with the 'distinct' expression.
public static class GenericUDAFCountEvaluator extends GenericUDAFEvaluator {
private boolean countAllColumns = false;
private LongObjectInspector partialCountAggOI;
private LongWritable result;
public ObjectInspector init(Mode m, ObjectInspector[] parameters)
throws HiveException {
super.init(m, parameters);
partialCountAggOI =
result = new LongWritable(0);
return PrimitiveObjectInspectorFactory.writableLongObjectInspector;
private GenericUDAFCountEvaluator setCountAllColumns(boolean countAllCols) {
countAllColumns = countAllCols;
return this;
/** class for storing count value. */
static class CountAgg implements AggregationBuffer {
long value;
public AggregationBuffer getNewAggregationBuffer() throws HiveException {
CountAgg buffer = new CountAgg();
return buffer;
public void reset(AggregationBuffer agg) throws HiveException {
((CountAgg) agg).value = 0;
public void iterate(AggregationBuffer agg, Object[] parameters)
throws HiveException {
// parameters == null means the input table/split is empty
if (parameters == null) {
if (countAllColumns) {
assert parameters.length == 0;
((CountAgg) agg).value++;
} else {
assert parameters.length > 0;
boolean countThisRow = true;
for (Object nextParam : parameters) {
if (nextParam == null) {
countThisRow = false;
if (countThisRow) {
((CountAgg) agg).value++;
public void merge(AggregationBuffer agg, Object partial)
throws HiveException {
if (partial != null) {
long p = partialCountAggOI.get(partial);
((CountAgg) agg).value += p;
public Object terminate(AggregationBuffer agg) throws HiveException {
result.set(((CountAgg) agg).value);
return result;
public Object terminatePartial(AggregationBuffer agg) throws HiveException {
return terminate(agg);
How does Hive implement count(distinct ...)? When the task runs, it takes a really long time.
Where is it in the source code?
Since you can just run SELECT DISTINCT column1 FROM table1, the DISTINCT expression isn't a flag or option of COUNT; it is evaluated independently.
This page says:
The actual filtering of data bound to parameter types for DISTINCT
implementation is handled by the framework and not the COUNT UDAF
If you want to drill down into the source details, have a look at the Hive git repository.
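To illustrate the division of labour described in that quote, here is a simplified sketch. It is not Hive's actual implementation: countDistinct plays the role of a hypothetical "framework" that filters duplicates with a HashSet, while CountAgg is a plain counter that knows nothing about DISTINCT, much like the COUNT evaluator in the question:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DistinctCountSketch {
    /** Plain counter, analogous to the COUNT buffer: it only counts non-null inputs. */
    static final class CountAgg {
        long value;
        void iterate(Object param) {
            if (param != null) {
                value++;
            }
        }
    }

    /** Hypothetical framework step: forwards each distinct value to the counter once. */
    static long countDistinct(List<?> rows) {
        Set<Object> seen = new HashSet<>();
        CountAgg agg = new CountAgg();
        for (Object row : rows) {
            // Set.add returns false for values that were already seen.
            if (row != null && seen.add(row)) {
                agg.iterate(row);
            }
        }
        return agg.value;
    }
}
```

The counter is unchanged between COUNT(expr) and COUNT(DISTINCT expr); only the rows that reach it differ. Tracking every distinct value is also the expensive part, which is one plausible reason such queries run slowly.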