JPQL query to extract array value - SQL

My database contains a column of type varchar.
In that column I am storing a value in the form [1,2].
I need to get the first value, '1'.
I need to write a JPQL query using @Query of Spring Data JPA.
I have decided to remove the brackets from the value using the substring function; then I need to convert it to an array.
So I have tried this for the substring extraction:
SUBSTRING(u.output,1,LENGTH(u.output-2))
Is this syntax correct for substring extraction? How do I convert it to an array?

If you cannot normalise the database then I would normalise the domain model via a JPA Converter.
You can then just load the entity and navigate the array as normal.
https://www.baeldung.com/jpa-attribute-converters
@Entity
public class MyEntity {
    @Convert(converter = MyArrayConverter.class)
    Integer[] values;
}
Create a JPA Converter:
import java.util.Arrays;

@Converter
public class MyArrayConverter implements AttributeConverter<Integer[], String> {

    @Override
    public String convertToDatabaseColumn(Integer[] values) {
        // convert Integer[] to a string such as "[1,2,3]";
        // one possible implementation (assumes values is non-null):
        return Arrays.toString(values).replace(" ", "");
    }

    @Override
    public Integer[] convertToEntityAttribute(String dbValue) {
        // convert a string such as "[1,2,3]" back to Integer[];
        // one possible implementation (assumes the stored format above):
        String body = dbValue.substring(1, dbValue.length() - 1);
        return Arrays.stream(body.split(","))
                .map(Integer::valueOf)
                .toArray(Integer[]::new);
    }
}
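With the converter in place, answering the original question (get the first value of a stored "[1,2]") is plain array access. A minimal sketch, assuming a no-arg entity constructor and a Spring Data repository named myEntityRepository (neither is part of the original post):
MyEntity entity = myEntityRepository.findById(id).orElseThrow(); // hypothetical repository
Integer first = entity.values[0]; // 1 for a stored "[1,2]"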

Related

How to check a collection for null in a Spring Data JPA @Query with an IN predicate

I have this query in my spring data jpa repository:
#Query("SELECT table1 FROM Table1 table1 "
+ "INNER JOIN FETCH table1.error error"
+ "WHERE table1.date = ?1 "
+ "AND (COALESCE(?2) IS NULL OR (table1.code IN ?2)) "
+ "AND (COALESCE(?3) IS NULL OR (error.errorCode IN ?3)) ")
List<Table1> findByFilter(Date date, List<String> codes, List<String> errorCodes);
When I run this query, it shows me this error in the console:
org.postgresql.util.PSQLException: ERROR: operator does not exist: character varying = bytea
Hint: No operator matches the given name and argument types. You might need to add explicit type casts.
Position: 1642
However, if I run the query without the COALESCE(?2) IS NULL OR part, using just table1.code IN ?2, it does work.
Does anyone know what this error could be due to?
COALESCE with one parameter does not make sense: COALESCE is an abbreviated CASE expression that returns its first non-null operand, so COALESCE(?2) is effectively just ?2. (See this)
I would suggest you use named parameters instead of position-based ones. As stated in the documentation, position-based binding makes query methods error-prone when refactoring.
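For illustration, the date/code part of the original query rewritten with named parameters might look like the sketch below (also switching to LocalDateTime, as suggested next; the method name is made up, and the empty-list problem discussed below still applies):
@Query("SELECT t FROM Table1 t WHERE t.date = :date AND t.code IN :codes")
List<Table1> findByDateAndCodes(@Param("date") LocalDateTime date,
                                @Param("codes") List<String> codes);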
As stated in the documentation of the IN predicate:
The list of values can come from a number of different sources. In the constructor_expression and collection_valued_input_parameter, the list of values must not be empty; it must contain at least one value.
I would also suggest you avoid the outdated Date and use the Java 8 Date/Time API instead.
So, taking all of the above into account, you should use a dynamic query, as @SimonMartinelli also suggested in the comments. In particular, you can have a look at Specifications.
Assuming that you have the following mapping:
@Entity
public class Error
{
    @Id
    private Long id;
    private String errorCode;
    // ...
}

@Entity
public class Table1
{
    @Id
    private Long id;
    private LocalDateTime date;
    private String code;
    @ManyToOne
    private Error error;
    // ...
}
you can write the following specification:
import javax.persistence.criteria.JoinType;
import javax.persistence.criteria.Predicate;
import org.springframework.data.jpa.domain.Specification;
import org.springframework.util.CollectionUtils;

public class TableSpecs
{
    public static Specification<Table1> findByFilter(LocalDateTime date, List<String> codes, List<String> errorCodes)
    {
        return (root, query, builder) -> {
            root.fetch("error", JoinType.LEFT);
            Predicate result = builder.equal(root.get("date"), date);
            if (!CollectionUtils.isEmpty(codes)) {
                result = builder.and(result, root.get("code").in(codes));
            }
            if (!CollectionUtils.isEmpty(errorCodes)) {
                result = builder.and(result, root.get("error").get("errorCode").in(errorCodes));
            }
            return result;
        };
    }
}
public interface TableRepository extends CrudRepository<Table1, Long>, JpaSpecificationExecutor<Table1>
{
    default List<Table1> findByFilter(LocalDateTime date, List<String> codes, List<String> errorCodes)
    {
        return findAll(TableSpecs.findByFilter(date, codes, errorCodes));
    }
}
and then use it:
List<Table1> results = tableRepository.findByFilter(date, Arrays.asList("TBL1"), Arrays.asList("ERCODE2"));

Java 8 Streams - Compare two lists' object values and add a value to a sub-object of the first list?

I have two classes:
public class ClassOne {
    private String id;
    private String name;
    private String school;
    private String score; // Default score is null
    ..getters and setters..
}

public class ClassTwo {
    private String id;
    private String marks;
    ..getters and setters..
}
And, I have two Lists of the above classes,
List<ClassOne> listOne;
List<ClassTwo> listTwo;
How can I compare the two lists and assign marks from listTwo to score in listOne when the IDs are equal? I know we can use two for loops to do it, but I want to implement it using Java 8 streams.
List<ClassOne> result = new ArrayList<>();
for (ClassOne one : listOne) {
    for (ClassTwo two : listTwo) {
        if (one.getId().equals(two.getId())) {
            one.setScore(two.getMarks());
            result.add(one);
        }
    }
}
return result;
How can I implement this using Java8 lambda and streams?
Let listOne.size() be N and listTwo.size() be M.
Then the 2-for-loops solution has complexity O(M*N).
We can reduce it to O(M+N) by indexing listTwo by id.
Case 1 - assuming listTwo has no objects with the same id
// pair each id with its marks
Map<String, String> marksIndex = listTwo.stream()
        .collect(Collectors.toMap(ClassTwo::getId, ClassTwo::getMarks));
// go through the list of `ClassOne`s and look up marks in the index
listOne.forEach(o1 -> o1.setScore(marksIndex.get(o1.getId())));
Case 2 - assuming listTwo can contain objects with the same id (Collectors.toMap above would throw on duplicate keys)
final Map<String, List<ClassTwo>> marksIndex = listTwo.stream()
        .collect(Collectors.groupingBy(ClassTwo::getId, Collectors.toList()));
final List<ClassOne> result = listOne.stream()
        .flatMap(o1 -> marksIndex.get(o1.getId()).stream().map(o2 -> {
            // make a copy of the ClassOne instance to avoid overwriting scores
            ClassOne copy = copy(o1);
            copy.setScore(o2.getMarks());
            return copy;
        }))
        .collect(Collectors.toList());
To implement the copy method you need to create a new object and copy the fields one by one; in such cases I prefer to follow the Builder pattern. It also results in more "functional" code.
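For reference, a minimal sketch of that copy helper written with plain setters against the ClassOne from the question, assuming a no-arg constructor (a builder, if you add one, would replace the setter calls):
private static ClassOne copy(ClassOne source) {
    ClassOne target = new ClassOne();
    target.setId(source.getId());
    target.setName(source.getName());
    target.setSchool(source.getSchool());
    target.setScore(source.getScore());
    return target;
}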
The following code copies marks from ClassTwo to score in ClassOne if both ids are equal, without building an intermediate List<ClassOne> result:
listOne.forEach(one -> listTwo.stream()
        .filter(two -> two.getId().equals(one.getId()))
        .limit(1)
        .forEach(two -> one.setScore(two.getMarks())));
This should work:
Map<String, String> idToMarks = listTwo.stream()
        .collect(Collectors.toMap(ClassTwo::getId, ClassTwo::getMarks));
listOne.stream()
        .filter(item -> idToMarks.containsKey(item.getId()))
        .forEach(item -> item.setScore(idToMarks.get(item.getId())));

Spark JavaPairRDD iteration

How can I iterate over a JavaPairRDD? I have done a groupBy and got back the RDD below: a JavaPairRDD whose key is a Tuple7 of Strings and whose value is a List of objects.
Now I have to iterate over this RDD and do some calculations, like FOREACH in Pig.
Basically I would like to iterate over the key and the list of values, do some operations, and then return a JavaPairRDD.
JavaPairRDD<Tuple7<String, String, String, String, String, String, String>, List<Records>> sizes =
    piTagRecordData.groupBy(new Function<Records, Tuple7<String, String, String, String, String, String, String>>() {
        private static final long serialVersionUID = 2885738359644652208L;

        @Override
        public Tuple7<String, String, String, String, String, String, String> call(Records row) throws Exception {
            Tuple7<String, String, String, String, String, String, String> compositeKey =
                new Tuple7<String, String, String, String, String, String, String>(
                    row.getAsset_attribute_id(), row.getDate_time_value(), row.getOperation(),
                    row.getPi_tag_count(), row.getAsset_id(), row.getAttr_name(), row.getCalculation_type());
            return compositeKey;
        }
    });
After this I want to iterate over each member of sizes (the JavaPairRDD) and perform an operation, something like
rejected_records = FOREACH sizes GENERATE FLATTEN(Java function on the List of Records based on the group key
I am using Spark 0.9.0
Even though you are talking about "FOR EACH", it really sounds like you want the flatMap operation, since you want to produce new values and flatten them. This is available for Java RDDs, including a JavaPairRDD.
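A minimal sketch of what that could look like on the grouped RDD, written against the pre-2.0 Java API (where FlatMapFunction.call returns an Iterable; newer versions return an Iterator). rejectRecords is a hypothetical per-group function standing in for the Pig expression above:
JavaRDD<Records> rejectedRecords = sizes.flatMap(
    new FlatMapFunction<Tuple2<Tuple7<String, String, String, String, String, String, String>, List<Records>>, Records>() {
        @Override
        public Iterable<Records> call(
                Tuple2<Tuple7<String, String, String, String, String, String, String>, List<Records>> group) {
            // group._1() is the composite key, group._2() the grouped records;
            // rejectRecords is a hypothetical helper returning Iterable<Records>
            return rejectRecords(group._1(), group._2());
        }
    });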
You can use the void foreach(VoidFunction<T> f) method. More info and methods: https://spark.apache.org/docs/1.1.0/api/java/org/apache/spark/api/java/JavaRDDLike.html#foreach(org.apache.spark.api.java.function.VoidFunction)
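For example, a sketch on the grouped RDD from the question (keep in mind the function runs on the executors, so any printing lands in the executor logs, not on the driver):
sizes.foreach(new VoidFunction<Tuple2<Tuple7<String, String, String, String, String, String, String>, List<Records>>>() {
    @Override
    public void call(Tuple2<Tuple7<String, String, String, String, String, String, String>, List<Records>> pair) {
        // pair._1() is the composite key, pair._2() the list of grouped records
        System.out.println(pair._1() + " -> " + pair._2().size() + " records");
    }
});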
If you want to view some values of a JavaPairRDD, I would do it like this:
for (Tuple2<String, String> test : pairRdd.take(10)) // or pairRdd.collect()
{
    System.out.println(test._1());
    System.out.println(test._2());
}
Note: Tuple2<String, String> assumes you have strings inside the JavaPairRDD; change the type parameters according to the data types stored in the JavaPairRDD.

Reading Hadoop SequenceFiles with Hive

I have some mapred data from the Common Crawl that I have stored in a SequenceFile format. I have tried repeatedly to use this data "as is" with Hive so I can query and sample it at various stages. But I always get the following error in my job output:
LazySimpleSerDe: expects either BytesWritable or Text object!
I have even constructed a simpler (and smaller) dataset of [Text, LongWritable] records, but that fails as well. If I output the data to text format and then create a table on that, it works fine:
hive> create external table page_urls_1346823845675
> (pageurl string, xcount bigint)
> location 's3://mybucket/text-parse/1346823845675/';
OK
Time taken: 0.434 seconds
hive> select * from page_urls_1346823845675 limit 10;
OK
http://0-italy.com/tag/package-deals 643 NULL
http://011.hebiichigo.com/d63e83abff92df5f5913827798251276/d1ca3aaf52b41acd68ebb3bf69079bd1.html 9 NULL
http://01fishing.com/fly-fishing-knots/ 3437 NULL
http://01fishing.com/flyin-slab-creek/ 1005 NULL
...
I tried using a custom InputFormat:
// My custom input class--very simple
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
public class UrlXCountDataInputFormat extends
SequenceFileInputFormat<Text, LongWritable> { }
I then create the table with:
create external table page_urls_1346823845675_seq
(pageurl string, xcount bigint)
stored as inputformat 'my.package.io.UrlXCountDataInputFormat'
outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
location 's3://mybucket/seq-parse/1346823845675/';
But I still get the same SerDe error.
I'm sure there's something really basic I'm missing here, but I can't seem to get it right. Additionally, I have to be able to parse the SequenceFiles in place (i.e. I can't convert my data to text). So I need to figure out the SequenceFile approach for future portions of my project.
Solution:
As @mark-grover pointed out below, the issue is that Hive ignores the key by default. With only one column (i.e. just the value), the SerDe was unable to map my second column.
The solution was to use a custom InputFormat that is a great deal more complex than what I had used originally. I tracked down one answer (at a link to a Git repository) about using the keys instead of the values, and then modified it to suit my needs: take the key and value from an internal SequenceFile.Reader and combine them into the final BytesWritable. I.e. something like this (from the custom Reader, as that's where all the hard work happens):
// I used generics so I can use this with other output
// files with just a small amount of additional code ...
public abstract class HiveKeyValueSequenceFileReader<K, V> implements RecordReader<K, BytesWritable> {

    public synchronized boolean next(K key, BytesWritable value) throws IOException {
        if (!more) return false;
        long pos = in.getPosition();
        V trueValue = (V) ReflectionUtils.newInstance(in.getValueClass(), conf);
        boolean remaining = in.next((Writable) key, (Writable) trueValue);
        if (remaining) combineKeyValue(key, trueValue, value);
        if (pos >= end && in.syncSeen()) {
            more = false;
        } else {
            more = remaining;
        }
        return more;
    }

    protected abstract void combineKeyValue(K key, V trueValue, BytesWritable newValue);
}
// from my final implementation
public class UrlXCountDataReader extends HiveKeyValueSequenceFileReader<Text, LongWritable> {

    @Override
    protected void combineKeyValue(Text key, LongWritable trueValue, BytesWritable newValue) {
        // TODO I think we need to use straight bytes--I'm not sure this works?
        StringBuilder builder = new StringBuilder();
        builder.append(key);
        builder.append('\001');
        builder.append(trueValue.get());
        newValue.set(new BytesWritable(builder.toString().getBytes()));
    }
}
With that, I get all my columns!
http://0-italy.com/tag/package-deals 643
http://011.hebiichigo.com/d63e83abff92df5f5913827798251276/d1ca3aaf52b41acd68ebb3bf69079bd1.html 9
http://01fishing.com/fly-fishing-knots/ 3437
http://01fishing.com/flyin-slab-creek/ 1005
http://01fishing.com/pflueger-1195x-automatic-fly-reels/ 1999
Not sure if this is impacting you, but Hive ignores keys when reading SequenceFiles. You may need to create a custom InputFormat (unless you can find one online :-))
Reference: http://mail-archives.apache.org/mod_mbox/hive-user/200910.mbox/%3C5573211B-634D-4BB0-9123-E389D90A786C@metaweb.com%3E

Get a value from an array based on the values of other arrays (VB.NET)

Supposed that I have two arrays:
Dim RoomName() As String = {"RoomA", "RoomB", "RoomC", "RoomD", "RoomE"}
Dim RoomType() As Integer = {1, 2, 2, 2, 1}
I want to get a value from the RoomName array based on a criterion on the RoomType array. For example, I want a RoomName whose RoomType is 2, so the algorithm should randomize over the indexes where RoomType is 2 and return a single value from indexes 1-3 only.
Is there any possible way to solve the problem using arrays, or is there a better way to do this? Thank you very much for your time :)
Note: the code examples below use C#, but hopefully you can read the intent for VB.NET.
Well, a simpler way would be to have a structure/class that contains both name and type properties, e.g.:
public class Room
{
    public string Name { get; set; }
    public int Type { get; set; }

    public Room(string name, int type)
    {
        Name = name;
        Type = type;
    }
}
Then, given a set of rooms, you can find those of a given type using a simple LINQ expression:
var match = rooms.Where(r => r.Type == 2).Select(r => r.Name).ToList();
Then you can find a random entry from within the set of matching room names (see below)
However, assuming you want to stick with the parallel arrays, one way is to find the matching index values from the type array, then find the matching names, and then pick one of the matching values using a random function.
var matchingTypeIndexes = new List<int>();
int matchingTypeIndex = -1;
do
{
    matchingTypeIndex = Array.IndexOf(roomType, 2, matchingTypeIndex + 1);
    if (matchingTypeIndex > -1)
    {
        matchingTypeIndexes.Add(matchingTypeIndex);
    }
} while (matchingTypeIndex > -1);

List<string> matchingRoomNames = matchingTypeIndexes.Select(typeIndex => roomName[typeIndex]).ToList();
Then to find a random entry of those that match (from one of the lists generated above):
var posn = new Random().Next(matchingRoomNames.Count);
Console.WriteLine(matchingRoomNames[posn]);