How to handle a sequence file in Spark SQL

I have the following data sample:
file name: sample.txt
+----------------+--------------+------------+-----------+---------+
| TRANSACTION_ID | ITEM_ID      | AUC_END_DT | BD_ID     | BD_SITE |
+----------------+--------------+------------+-----------+---------+
| 320562466      | 7322548247   | 5/22/2005  | 32148826  | 77      |
| 569643695009   | 190558793670 | 7/31/2011  | 112644812 | 0       |
+----------------+--------------+------------+-----------+---------+
Here is the query that I'm running:
select * from table_name where item_id = '$item_id';
I need to convert this sample.txt file into a sequence file and then create a DataFrame from that sequence file for further analysis.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class db_col(transaction_id: Double,
                  item_id: Long,
                  auc_end_dt: String,
                  bd_id: Long,
                  bd_site: Int)

object V_bd {
  def main(args: Array[String]) {
    val item_id_args = args(0)
    val conf = new SparkConf().setAppName("POC_Naren").setMaster("local")
    val sc = new SparkContext(conf)
    val ssc = new SQLContext(sc)
    import ssc.implicits._
    val dw_bid_base_rdd = sc.textFile("C:/Users/Downloads/sqlscript/reference/data/sample.txt")
    val bd_trans_rdd = dw_bid_base_rdd.map(row => row.split("\\|"))
    val bd_col_rdd = bd_trans_rdd.map(p => db_col(p(0).trim.toDouble, p(1).trim.toLong, p(2).trim, p(3).trim.toLong, p(4).trim.toInt))
    val bd_df_rdd = bd_col_rdd.toDF()
    bd_df_rdd.registerTempTable("bd_table")
    val bd_table_query = ssc.sql(s"select * from bd_table where item_id = '$item_id_args'")
    bd_table_query.show()
  }
}

You'll need to convert your DataFrame into an RDD[(K,V)]. Example:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, DataFrame}
val bd_table_query: DataFrame = ???
val rdd: RDD[(Int, String)] = bd_table_query.rdd.map {
  case r: Row => (r.getAs[Int](0), r.getAs[String](1)) // I'll let you choose your keys and convert into the right format
}
Then you can save the RDD:
rdd.saveAsSequenceFile("output.seq")
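To go back the other way (sequence file to DataFrame), here is a minimal sketch. It assumes the same sc and ssc from the question's code, the (Int, String) key/value choice from the example above, and placeholder column names that you would rename to match your schema:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Read the sequence file back as an RDD of (key, value) pairs.
// The type parameters must match the types used when the file was written.
val seqRdd: RDD[(Int, String)] = sc.sequenceFile[Int, String]("output.seq")

// Turn the pairs back into a DataFrame for further analysis.
// The column names here are placeholders.
import ssc.implicits._
val seqDf: DataFrame = seqRdd.toDF("item_id", "value")
seqDf.registerTempTable("bd_seq_table")
From there you can run the same ssc.sql queries against bd_seq_table as against bd_table.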

Related

Get the index of each node level-wise in a tree data structure

Hey, I am working on a tree data structure. I want to know whether we can get the index of each node level-wise. The diagram below represents how I want the value + index. Level A or B represents the node value, and the index row represents the index value.
Node
| | |
Level A -> 1 2 3
index value-> 0 1 2
| | | | | |
| | | | | |
Level B-> 4 5 6 7 8 9
index value-> 0 1 2 3 4 5
....// more level
How can we achieve the index level-wise? I am adding my logic for how I add values at each level. Could someone suggest how I can achieve this?
var baseNode: LevelIndex = LevelIndex()
var defaultId = "1234"
fun main() {
val list = getUnSortedDataListForLevel()
val tempHashMap: MutableMap<String, LevelIndex> = mutableMapOf()
list.forEach { levelClass ->
levelClass.levelA?.let { levelA ->
val levelOneTempHashMapNode = tempHashMap["level_a${levelA}"]
if (levelOneTempHashMapNode != null) {
if (defaultId == levelClass.id && levelOneTempHashMapNode is LevelOne) {
levelOneTempHashMapNode.defaultValue = true
}
return@let
}
val tempNode = LevelOne().apply {
value = levelA
if (defaultId == levelClass.id) {
defaultValue = true
}
}
baseNode.children.add(tempNode)
tempHashMap["level_a${levelA}"] = tempNode
}
levelClass.levelB?.let { levelB ->
val levelTwoTempHashMapNode = tempHashMap["level_a${levelClass.levelA}_level_b${levelB}"]
if (levelTwoTempHashMapNode != null) {
if (defaultId == levelClass.id && levelTwoTempHashMapNode is LevelTwo) {
levelTwoTempHashMapNode.defaultValue = true
}
return@let
}
val tempNode = LevelTwo().apply {
value = levelB
if (defaultId == levelClass.id) {
defaultValue = true
}
}
val parent =
tempHashMap["level_a${levelClass.levelA}"] ?: baseNode
parent.children.add(tempNode)
tempHashMap["level_a${levelClass.levelA}_level_b${levelB}"] =
tempNode
}
levelClass.levelC?.let { levelC ->
val tempNode = LevelThree().apply {
value = levelC
if (defaultId == levelClass.id) {
defaultValue = true
}
}
val parent =
tempHashMap["level_a${levelClass.levelA}_level_b${levelClass.levelB}"]
?: baseNode
parent.children.add(tempNode)
}
}
}
open class LevelIndex(
var value: String? = null,
var children: MutableList<LevelIndex> = arrayListOf()
)
class LevelOne : LevelIndex() {
var defaultValue: Boolean? = false
}
class LevelTwo : LevelIndex() {
var defaultValue: Boolean? = false
}
class LevelThree : LevelIndex() {
var defaultValue: Boolean = false
}
UPDATE
I want the index value by level because I have one id and I want to match that combination with that id. If that value is present, then I store that value as true, and I need to find its index value.
Node
| | |
Level A -> 1 2 3
index value-> 0 1 2
default value-> false true false
| | | | | |
| | | | | |
Level B-> 4 5 6 7 8 9
index value-> 0 1 2 3 4 5
default value->false false true false false false
....// more level
So, for Level A I'll get index 1.
For Level B I'll get index 2.
I'd create a list to put the nodes at each level in order. You can recursively collect them from your tree.
val nodesByLevel = List(3) { mutableListOf<LevelIndex>() }
fun collectNodes(parent: LevelIndex) {
for (child in parent.children) {
val listIndex = when (child) {
is LevelOne -> 0
is LevelTwo -> 1
is LevelThree -> 2
// I made LevelIndex a sealed class. Otherwise you would need an else branch here.
}
nodesByLevel[listIndex] += child
collectNodes(child)
}
}
collectNodes(baseNode)
Now nodesByLevel contains three lists containing all the nodes in each layer in order.
If you just need the String values, you could change that mutableList to use a String type and use += child.value ?: "" instead, although I would make value non-nullable (so you don't need ?: ""), because what use is a node with no value?
Edit
I would move defaultValue up into the parent class so you don't have to cast the nodes to be able to read it. And I'm going to treat it as non-nullable.
sealed class LevelIndex(
var value: String = "",
val children: MutableList<LevelIndex> = arrayListOf(),
var isDefault: Boolean = false
)
Then if you want to do something with the items based on their indices:
for ((layerNumber, layerList) in nodesByLevel.withIndex()) {
for ((nodeIndexInLayer, node) in layerList.withIndex()) {
val selectedIndexForThisLayer = TODO() //with layerNumber
node.isDefault = nodeIndexInLayer == selectedIndexForThisLayer
}
}

Compare column values in DataFrames and return the differing column names with values in Spark (Java 8)

I have two data frames with 230 columns each. I would like to compare them on one key column and, wherever a column value differs, get the column name with the values from both data frames, using Java 8 with Spark.
First data frame (DF1):
id  Col_1  Col_2  Col_3  Col_4  Col_5
1   A      B      C      D      E
2   X      Y      Z      P      Q
Second data frame (DF2):
id  Col_1  Col_2  Col_3  Col_4  Col_5
1   A      B6     C      D      E
2   X      Y      Z8     P      Q3
Output:
id  Col_1  Col_2    Col_3    Col_4  Col_5
1   null   [B,B6]   null     null   null
2   null   null     [Z,Z8]   null   [Q,Q3]
Using Spark and Java 8:
DF1.except(DF2);
StructType one = DF1.schema();
JavaPairRDD<String, Row> pair1 = DF1.toJavaRDD()
.mapToPair(new PairFunction<Row, String, Row>() {
public Tuple2<String, Row> call(Row row) {
return new Tuple2<String, Row>(row.getString(0), row);
}
});
JavaPairRDD<String, Row> pair2 = DF2.toJavaRDD()
.mapToPair(new PairFunction<Row, String, Row>() {
public Tuple2<String, Row> call(Row row) {
return new Tuple2<String, Row>(row.getString(0), row);
}
});
JavaPairRDD<String, Row> subs = pair1.subtractByKey(pair2);
JavaRDD<Row> rdd = subs.values();
Dataset<Row> diff = spark.createDataFrame(rdd, one);
diff.show();
Please help.
Please find the solution below.
I have tried to solve the problem by keeping the DataFrames as DataFrames, and you can find inline comments explaining the code.
The actual solution starts after the line //Below is the solution
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import static org.apache.spark.sql.functions.*;
public class CompareDfs {
public static void main(String[] args) {
SparkSession spark = Constant.getSparkSess();
List<String> list1 = new ArrayList<>();
list1.add("1,A,B,C,D,E");
list1.add("2,X,Y,Z,P,Q");
List<String> list2 = new ArrayList<>();
list2.add("1,A,B6,C,D,E");
list2.add("2,X,Y,Z8,P,Q3");
Dataset<Row> df = spark.createDataset(list1, Encoders.STRING()).toDF().selectExpr("split(value, ',')[0] as id",
"split(value, ',')[1] as Col_1",
"split(value, ',')[2] as Col_2",
"split(value, ',')[3] as Col_3",
"split(value, ',')[4] as Col_4",
"split(value, ',')[5] as Col_5");
// df.printSchema();
// df.show();
// Convert
Dataset<Row> df1 = spark.createDataset(list2, Encoders.STRING()).toDF().selectExpr("split(value, ',')[0] as id",
"split(value, ',')[1] as Col_1",
"split(value, ',')[2] as Col_2",
"split(value, ',')[3] as Col_3",
"split(value, ',')[4] as Col_4",
"split(value, ',')[5] as Col_5");
// df1.printSchema();
// df1.show();
//Below is the solution
List<String> columns = Arrays.asList("Col_1", "Col_2", "Col_3", "Col_4", "Col_5"); // List of columns to merge
// inner join the 2 dataframes
Dataset<Row> joinedDf = df.join(df1).where(df.col("id").equalTo(df1.col("id")));
// Iterate through the columns
for (String column : columns) {
joinedDf = joinedDf
.withColumn(column + "_temp",
when(df.col(column).equalTo(df1.col(column)), null) // when/otherwise clause for the column-to-array/null transformation
.otherwise(split(concat_ws(",", df.col(column), df1.col(column)), ",")))
.drop(df.col(column)) // Drop column from 1st dataframe
.drop(df1.col(column)) // Drop column from 2nd dataframe
.withColumnRenamed(column + "_temp", column); // Rename column to the result column name
}
// .withColumn("Col_2_t",when(df.col("Col_2").equalTo(df1.col("Col_2")), null ).otherwise(split(concat_ws(",",df.col("Col_2"),df1.col("Col_2")),",")))
joinedDf.show();
}
}
I tried to solve this using a DataFrame approach:
List<Column> cols = Arrays.stream(df1.columns())
.map(c -> {
if (c.equalsIgnoreCase("id"))
return col("a.id");
else
return array(toScalaSeq(Arrays.asList(col("a."+c), col("b."+c))).toBuffer()).as(c);
}).collect(Collectors.toList());
Dataset<Row> processedDf = df1.as("a").join(df2.as("b"), df1.col("id").equalTo(df2.col("id")))
.select(toScalaSeq(cols).toBuffer());
List<Column> cols1 = Arrays.stream(df1.columns())
.map(f -> {
if (f.equalsIgnoreCase("id"))
return expr(f);
else
return expr("if(size(array_distinct(" + f + "))==1, NULL, " + f + " ) as " + f);
}).collect(Collectors.toList());
processedDf.select(toScalaSeq(cols1).toBuffer())
.show(false);
/**
* +---+-----+-------+-------+-----+-------+
* |id |Col_1|Col_2 |Col_3 |Col_4|Col_5 |
* +---+-----+-------+-------+-----+-------+
* |1 |null |[B, B6]|null |null |null |
* |2 |null |null |[Z, Z8]|null |[Q, Q3]|
* +---+-----+-------+-------+-----+-------+
*/
Please refer to the full code here: gist
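For reference, the core per-column when/otherwise idea used by both answers can be sketched more compactly in Scala. This is only a sketch under my own assumptions (a key column named id and two DataFrames with identical schemas); it is not the author's code:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, when}

// Join on the key and, for every other column, emit null when the two values
// match or an array of both values when they differ.
def diffColumns(df1: DataFrame, df2: DataFrame, key: String): DataFrame = {
  val a = df1.as("a")
  val b = df2.as("b")
  val compared = df1.columns.filterNot(_ == key).map { c =>
    when(col(s"a.$c") === col(s"b.$c"), null)
      .otherwise(array(col(s"a.$c"), col(s"b.$c")))
      .as(c)
  }
  a.join(b, col(s"a.$key") === col(s"b.$key"))
    .select(col(s"a.$key").as(key) +: compared: _*)
}
A call like diffColumns(df, df1, "id").show(false) should then print arrays only where the values differ, matching the expected output above.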

Merge several time-series datasets into one

Given several sets of time-stamped data, how can one merge them into one?
Suppose, I have a dataset represented by the following data structure (Kotlin):
data class Data(
val ax: Double?,
val ay: Double?,
val az: Double?,
val timestamp: Long
)
ax, ay, az - accelerations over the respective axes
timestamp - unix timestamp
Now, I got three datasets: Ax, Ay, Az. Each dataset has two non-null fields: the timestamp and the acceleration over its own axis.
Ax:
+-----+------+------+-----------+
| ax | ay | az | timestamp |
+-----+------+------+-----------+
| 0.0 | null | null | 0 |
| 0.1 | null | null | 50 |
| 0.2 | null | null | 100 |
+-----+------+------+-----------+
Ay:
+------+-----+------+-----------+
| ax | ay | az | timestamp |
+------+-----+------+-----------+
| null | 1.0 | null | 10 |
| null | 1.1 | null | 20 |
| null | 1.2 | null | 30 |
+------+-----+------+-----------+
Az:
+------+------+-----+-----------+
| ax | ay | az | timestamp |
+------+------+-----+-----------+
| null | null | 2.0 | 20 |
| null | null | 2.1 | 40 |
| null | null | 2.2 | 60 |
+------+------+-----+-----------+
What the algorithm would produce is:
+------+------+------+-----------+
| ax | ay | az | timestamp |
+------+------+------+-----------+
| 0.0 | null | null | 0 |
| 0.0 | 1.0 | null | 10 |
| 0.0 | 1.1 | 2.0 | 20 |
| 0.0 | 1.2 | 2.0 | 30 |
| 0.0 | 1.2 | 2.1 | 40 |
| 0.1 | 1.2 | 2.1 | 50 |
| 0.1 | 1.2 | 2.2 | 60 |
| 0.2 | 1.2 | 2.2 | 100 |
+------+------+------+-----------+
So in order to merge three datasets into one, I:
Put Ax, Ay and Az into one list:
val united = mutableListOf<Data>()
united.addAll(Ax)
united.addAll(Ay)
united.addAll(Az)
Sort the resulting list by timestamp:
united.sortBy { it.timestamp }
Copy unchanged values down the stream:
var tempAx: Double? = null
var tempAy: Double? = null
var tempAz: Double? = null
for (i in 1 until united.size) {
val curr = united[i]
val prev = united[i-1]
if (curr.ax == null) {
if (prev.ax != null) {
curr.ax = prev.ax
tempAx = prev.ax
}
else curr.ax = tempAx
}
if (curr.ay == null) {
if (prev.ay != null) {
curr.ay = prev.ay
tempAy = prev.ay
}
else curr.ay = tempAy
}
if (curr.az == null) {
if (prev.az != null) {
curr.az = prev.az
tempAz = prev.az
}
else curr.az = tempAz
}
}
Remove duplicated rows (with the same timestamp):
return united.distinctBy { it.timestamp }
The above method could be improved by merging two lists at a time; I could perhaps create a function for that.
Is there a more elegant solution to this problem? Any thoughts? Thanks.
I assume that your Data actually contains vars instead of vals (otherwise your code wouldn't work). The following is a rewrite of your function using grouped timestamps and a method that either extracts the property of interest or otherwise returns the last known value for that property.
// your tempdata containing the default (starting) values:
val tempData = Data(0.0, 0.0, 0.0, 0L)
fun extract(dataList: List<Data>, prop: KMutableProperty1<Data, Double?>) =
// find the first non null value for the given property
dataList.firstOrNull { prop(it) != null }
// extract that property
?.let(prop)
// set the extracted value in our tempData so that it can reused if a null value is retrieved in future
?.also { prop.set(tempData, it) }
// if the above didn't return a value, use the last one set into tempData
?: prop(tempData)
val mergedData = /* your united.addAll */ (Ax + Ay + Az)
.groupBy { it.timestamp }
// your sort by timestamp
.toSortedMap()
.map {(timestamp, dataList) ->
Data(extract(dataList, Data::ax),
extract(dataList, Data::ay),
extract(dataList, Data::az),
timestamp
)
}
It's rather hard to come up with a better approach as your main condition (defaulting to the last resolved value) will actually force you to have your dataset sorted and to hold a (or several) temporary variable(s).
However, the benefits of this version in contrast to yours are the following:
don't bother about the indices
less duplicated code
no need to remove any duplicates from the returned list (no need to distinctBy)
while the extract-method itself might be complex, the usage of it is more readable
Maybe by refactoring extract, the whole thing gets more readable too.
As you also said that you want it to be easily portable to Java, here is a possible Java rewrite:
Map<Long, List<Data>> unitedList = Stream.concat(Stream.concat(Ax.stream(), Ay.stream()), Az.stream())
.collect(Collectors.groupingBy(Data::getTimestamp));
List<Data> mergedData = unitedList.keySet().stream().sorted()
.map(key -> {
List<Data> dataList = unitedList.get(key);
return new Data(extract(dataList, Data::getAx, Data::setAx),
extract(dataList, Data::getAy, Data::setAy),
extract(dataList, Data::getAz, Data::setAz),
key);
}).collect(Collectors.toList());
and the extract could then look like:
Double extract(List<Data> dataList, Function<Data, Double> getter, BiConsumer<Data, Double> setter) {
Optional<Double> relevantProperty = dataList.stream()
.map(getter)
.filter(Objects::nonNull)
.findFirst();
if (relevantProperty.isPresent()) {
setter.accept(tempData, relevantProperty.get());
return relevantProperty.get();
} else {
return getter.apply(tempData);
}
}
Basically the same mechanism.
So at the moment I am using this solution:
data class Data(
var ax: Double?,
var ay: Double?,
var az: Double?,
val timestamp: Long
)
fun mergeDatasets(Ax: List<Data>, Ay: List<Data>, Az: List<Data>): List<Data> {
val united = mutableListOf<Data>()
united.addAll(Ax)
united.addAll(Ay)
united.addAll(Az)
united.sortBy { it.timestamp }
var tempAx: Double? = null
var tempAy: Double? = null
var tempAz: Double? = null
for (i in 1 until united.size) {
val curr = united[i]
val prev = united[i-1]
if (curr.ax == null) {
if (prev.ax != null) {
curr.ax = prev.ax
tempAx = prev.ax
}
else curr.ax = tempAx
}
if (curr.ay == null) {
if (prev.ay != null) {
curr.ay = prev.ay
tempAy = prev.ay
}
else curr.ay = tempAy
}
if (curr.az == null) {
if (prev.az != null) {
curr.az = prev.az
tempAz = prev.az
}
else curr.az = tempAz
}
if (curr.timestamp == prev.timestamp) {
prev.ax = curr.ax
prev.ay = curr.ay
prev.az = curr.az
}
}
return united.distinctBy { it.timestamp }
}

Reverse Cartesian Product

Given the data set below:
a | b | c | d
1 | 3 | 7 | 11
1 | 5 | 7 | 11
1 | 3 | 8 | 11
1 | 5 | 8 | 11
1 | 6 | 8 | 11
Perform a reverse Cartesian product to get:
a | b | c | d
1 | 3,5 | 7,8 | 11
1 | 6 | 8 | 11
I am currently working with scala, and my input/output data type is currently:
ListBuffer[Array[Array[Int]]]
I have come up with a solution (seen below), but feel it could be optimized. I am open to optimizations of my approach, and completely new approaches. Solutions in scala and c# are preferred.
I am also curious if this could be done in MS SQL.
My current solution:
import scala.collection.mutable.ListBuffer
import scala.util.Sorting.quickSort

def main(args: Array[String]): Unit = {
// Input
val data = ListBuffer(Array(Array(1), Array(3), Array(7), Array(11)),
Array(Array(1), Array(5), Array(7), Array(11)),
Array(Array(1), Array(3), Array(8), Array(11)),
Array(Array(1), Array(5), Array(8), Array(11)),
Array(Array(1), Array(6), Array(8), Array(11)))
reverseCartesianProduct(data)
}
def reverseCartesianProduct(input: ListBuffer[Array[Array[Int]]]): ListBuffer[Array[Array[Int]]] = {
val startIndex = input(0).size - 1
var results:ListBuffer[Array[Array[Int]]] = input
for (i <- startIndex to 0 by -1) {
results = groupForward(results, i, startIndex)
}
results
}
def groupForward(input: ListBuffer[Array[Array[Int]]], groupingIndex: Int, startIndex: Int): ListBuffer[Array[Array[Int]]] = {
if (startIndex < 0) {
val reduced = input.reduce((a, b) => {
mergeRows(a, b)
})
return ListBuffer(reduced)
}
val grouped = if (startIndex == groupingIndex) {
Map(0 -> input)
}
else {
groupOnIndex(input, startIndex)
}
val results = grouped.flatMap{
case (index, values: ListBuffer[Array[Array[Int]]]) =>
groupForward(values, groupingIndex, startIndex - 1)
}
results.to[ListBuffer]
}
def groupOnIndex(list: ListBuffer[Array[Array[Int]]], index: Int): Map[Int, ListBuffer[Array[Array[Int]]]] = {
var results = Map[Int, ListBuffer[Array[Array[Int]]]]()
list.foreach(a => {
val key = a(index).toList.hashCode()
if (!results.contains(key)) {
results += (key -> ListBuffer[Array[Array[Int]]]())
}
results(key) += a
})
results
}
def mergeRows(a: Array[Array[Int]], b: Array[Array[Int]]): Array[Array[Int]] = {
val zipped = a.zip(b)
val merged = zipped.map{ case (array1: Array[Int], array2: Array[Int]) =>
val m = array1 ++ array2
quickSort(m)
m.distinct
.array
}
merged
}
The way this works is:
Loop over columns, from right to left (the groupingIndex specifies which column to run on. This column is the only one which does not have to have values equal to each other in order to merge the rows.)
Recursively group the data on all other columns (not groupingIndex).
After grouping all columns, it is assumed that the data in each group have equivalent values in every column except for the grouping column.
Merge the rows with the matching columns. Take the distinct values for each column and sort each one.
I apologize if some of this does not make sense, my brain is not functioning today.
Here is my take on this. Code is in Java but could easily be converted into Scala or C#.
I run groupingBy on all combinations of n-1 columns and go with the one that has the lowest count, meaning the largest merge depth, so this is kind of a greedy approach. However, it is not guaranteed that you will find the optimal solution, i.e. minimize the number of output rows, which is NP-hard to do (see the link here for an explanation), but you will find a solution that is valid, and do it rather fast.
Full example here: https://github.com/jbilander/ReverseCartesianProduct/tree/master/src
Main.java
import java.util.*;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
List<List<Integer>> data = List.of(List.of(1, 3, 7, 11), List.of(1, 5, 7, 11), List.of(1, 3, 8, 11), List.of(1, 5, 8, 11), List.of(1, 6, 8, 11));
boolean done = false;
int rowLength = data.get(0).size(); //4
List<Table> tables = new ArrayList<>();
// load data into table
for (List<Integer> integerList : data) {
Table table = new Table(rowLength);
tables.add(table);
for (int i = 0; i < integerList.size(); i++) {
table.getMap().get(i + 1).add(integerList.get(i));
}
}
// keep track of count, needed so we know when to stop iterating
int numberOfRecords = tables.size();
// start algorithm
while (!done) {
Collection<List<Table>> result = getMinimumGroupByResult(tables, rowLength);
if (result.size() < numberOfRecords) {
tables.clear();
for (List<Table> tableList : result) {
Table t = new Table(rowLength);
tables.add(t);
for (Table table : tableList) {
for (int i = 1; i <= rowLength; i++) {
t.getMap().get(i).addAll(table.getMap().get(i));
}
}
}
numberOfRecords = tables.size();
} else {
done = true;
}
}
tables.forEach(System.out::println);
}
private static Collection<List<Table>> getMinimumGroupByResult(List<Table> tables, int rowLength) {
Collection<List<Table>> result = null;
int min = Integer.MAX_VALUE;
for (List<Integer> keyCombination : getKeyCombinations(rowLength)) {
switch (rowLength) {
case 4: {
Map<Tuple3<TreeSet<Integer>, TreeSet<Integer>, TreeSet<Integer>>, List<Table>> map =
tables.stream().collect(Collectors.groupingBy(t -> new Tuple3<>(
t.getMap().get(keyCombination.get(0)),
t.getMap().get(keyCombination.get(1)),
t.getMap().get(keyCombination.get(2))
)));
if (map.size() < min) {
min = map.size();
result = map.values();
}
}
break;
case 5: {
//TODO: Handle n = 5
}
break;
case 6: {
//TODO: Handle n = 6
}
break;
}
}
return result;
}
private static List<List<Integer>> getKeyCombinations(int rowLength) {
switch (rowLength) {
case 4:
return List.of(List.of(1, 2, 3), List.of(1, 2, 4), List.of(2, 3, 4), List.of(1, 3, 4));
//TODO: handle n = 5, n = 6, etc...
}
return List.of(List.of());
}
}
Output of tables.forEach(System.out::println)
Table{1=[1], 2=[3, 5, 6], 3=[8], 4=[11]}
Table{1=[1], 2=[3, 5], 3=[7], 4=[11]}
or rewritten for readability:
a | b | c | d
--|-------|---|---
1 | 3,5,6 | 8 | 11
1 | 3,5 | 7 | 11
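Since the question prefers Scala, here is a hedged Scala sketch of the same greedy idea, generalized over the number of columns instead of hard-coding n = 4. The row representation (a Vector of value sets) and the function name are my own choices, not taken from the Java code above:
// Each row is a vector of value sets, e.g. Vector(Set(1), Set(3), Set(7), Set(11)).
def reverseCartesian(rows: List[Vector[Set[Int]]]): List[Vector[Set[Int]]] = {
  val n = rows.head.size
  var current = rows
  var improved = true
  while (improved) {
    // For every column, group the rows by all *other* columns and keep the
    // grouping that produces the fewest groups (the deepest merge).
    val (bestCol, bestGroups) = (0 until n)
      .map(c => c -> current.groupBy(row => row.patch(c, Nil, 1)))
      .minBy(_._2.size)
    if (bestGroups.size < current.size) {
      // Collapse each group into one row, uniting the values of the chosen column.
      current = bestGroups.values.toList.map { group =>
        group.head.updated(bestCol, group.map(_(bestCol)).reduce(_ union _))
      }
    } else improved = false
  }
  current
}

// The sample data from the question:
val data = List(
  Vector(Set(1), Set(3), Set(7), Set(11)),
  Vector(Set(1), Set(5), Set(7), Set(11)),
  Vector(Set(1), Set(3), Set(8), Set(11)),
  Vector(Set(1), Set(5), Set(8), Set(11)),
  Vector(Set(1), Set(6), Set(8), Set(11)))
reverseCartesian(data).foreach(println)
Like the Java version, this is greedy: it picks the locally best merge at each step and stops when no grouping reduces the row count, so it is fast but not guaranteed to produce the minimum possible number of rows.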
If you were to do all this in SQL (MySQL) you could possibly use group_concat(). I think MS SQL has something similar here: simulating-group-concat, or STRING_AGG if SQL Server 2017, but I think you would have to work with text columns, which is a bit nasty in this case:
e.g.
create table my_table (A varchar(50) not null, B varchar(50) not null,
C varchar(50) not null, D varchar(50) not null);
insert into my_table values ('1','3,5','4,15','11'), ('1','3,5','3,10','11');
select A, B, group_concat(C order by C) as C, D from my_table group by A, B, D;
Would give the result below, so you would have to parse, sort, and update the comma-separated result for any next merge iteration (group by) to be correct.
['1', '3,5', '3,10,4,15', '11']

How to update a record that has another record within it?

I have a record:
type alias Point = {x : Int, y : Int}
I have another record like this:
type alias SuperPoint = {p : Point, z : Int}
p = Point 5 10
sp = SuperPoint p 15
Now if I need to update SuperPoint.z I can do this:
{sp | z = 20}
How do I update SuperPoint.Point?
sp2 =
let
p2 = { p | x = 42 }
in
{ sp | p = p2 }
Right now you have four ways to update:
Two of them you can find below in code.
The third is using Monocle.
The fourth: re-structure your model with Dictionaries and handle the update in a proper, generic way.
At the end there is also code which doesn't work, but could be useful if it worked that way.
Example in elm-0.18
import Html exposing (..)
model =
{ left = { x = 1 }
}
updatedModel =
let
left =
model.left
newLeft =
{ left | x = 10 }
in
{ model | left = newLeft }
updateLeftX x ({ left } as model) =
{ model | left = { left | x = x } }
updatedModel2 =
updateLeftX 11 model
main =
div []
[ div [] [ text <| toString model ]
, div [] [ text <| toString updatedModel ]
, div [] [ text <| toString updatedModel2 ]
]
Examples from https://groups.google.com/forum/#!topic/elm-discuss/CH77QbLmSTk
-- nested record updates are not allowed
-- https://github.com/elm-lang/error-message-catalog/issues/159
updatedModel3 =
{ model | left = { model.left | x = 12 } }
-- The .field shorthand does not work for the update syntax
-- https://lexi-lambda.github.io/blog/2015/11/06/functionally-updating-record-types-in-elm/
updateInt : Int -> (data -> Int) -> data -> data
updateInt val accessor data =
{ data | accessor = val }
updatedModel4 =
{ model | left = updateInt 13 .x model.left }