How to partially update a PureScript record

Hi everyone!
I'm a PureScript beginner and I'm having trouble working with records.
I have one record type:
type Employee =
  { firstName :: String
  , lastName :: String
  , address :: String
  , height :: Number
  , weight :: Number
  ...
  }
And I want to update just a portion of this record.
Let's say I only want to update height, like in the following TypeScript code.
const updated: Employee = {
  ...a,
  height: 180
};
How can I achieve this in Purescript?
Thank you.

The syntax for a record update in PureScript is the following:
r2 = r1 { x = 42, y = "foo" }
Where:
r1 is the original record
r2 is the new, updated record
x and y are record fields (not necessarily ALL fields)
The above snippet is equivalent to the following JavaScript code:
r2 = { ...r1, x: 42, y: "foo" }
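Applied to the Employee record from the question, a minimal sketch (updateHeight is just an illustrative name, not something from the original answer):
-- illustrative helper; any subset of fields can be updated the same way
updateHeight :: Employee -> Employee
updateHeight e = e { height = 180.0 }
PureScript also supports an anonymous update form, _ { height = 180.0 }, which is handy when mapping over a collection of records.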

Related

How to remove all JSON attributes with a certain value in PostgreSQL

Given this table:
parent | payload
-------+----------------------
1      | { a: 7, b: 3 }
2      | { a: 7, c: 3 }
1      | { d: 3, e: 1, f: 3 }
I want to update the children of parent 1 and remove any attribute X where payload->X is 3.
After executing the query, the records should look like this:
parent | payload
-------+----------------
1      | { a: 7 }
2      | { a: 7, c: 3 }
1      | { e: 1 }
update records set payload=?? where parent = 1 and ??
There is no built-in function for this, but you can write your own:
create function remove_keys_by_value(p_input jsonb, p_value jsonb)
returns jsonb
as
$$
select jsonb_object_agg(t.key, t.value)
from jsonb_each(p_input) as t(key, value)
where value <> p_value;
$$
language sql
immutable;
Then you can do:
update records
set payload = remove_keys_by_value(payload, to_jsonb(3))
where parent = 1;
This assumes that payload is defined as jsonb (which it should be). If it's not, you have to cast it: payload::jsonb
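As a quick sanity check, an ad-hoc query of this shape (mine, not part of the original answer) should confirm the behaviour on the third sample row:
-- not from the original answer; checks the function against a sample payload
select remove_keys_by_value('{"d": 3, "e": 1, "f": 3}'::jsonb, to_jsonb(3));
-- expected result: {"e": 1}
Note that jsonb_object_agg returns NULL when it receives no rows, so a payload consisting only of 3s would become NULL rather than an empty object.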
Try this:
update records
set payload = payload - 'x'
where parent = 1 and (payload->>'x')::int = 3

Sum up the elements of a record

type my_type = {a : int; b : int};;
let test = [{a=5;b=10}; {a=10;b=100}; {a=200; b=500}; {a=100; b=2}];;
I would like to create a function that will sum up a and b in such a way that it will display:
Sum of a : 315
Sum of b : 612
I think I have to use a recursive function. Here is my attempt:
let my_func record =
let adds = [] in
let rec add_func = match adds with
| [] -> record.a::adds; record.b::adds
| _ -> s + add_func(e::r)
add_func test;;
This function doesn't seem to work at all. Is there a right way to write such a function?
OK, you know exactly what you're doing, you just need one suggestion I think.
Assume your function is named sumab and returns a pair of ints. Then the key expression in your recursive function will be something like this:
| { a = ha; b = hb; } :: t ->
let (a, b) = sumab t in (ha + a, hb + b)
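Putting the suggestion together, a minimal sketch of the whole function could look like this (the name sumab and the tuple result follow the answer's assumption; the printing part is mine):
(* sketch: sum the a and b fields of a list of records *)
let rec sumab = function
  | [] -> (0, 0)
  | { a = ha; b = hb } :: t ->
      let (a, b) = sumab t in (ha + a, hb + b);;

let (sum_a, sum_b) = sumab test;;
Printf.printf "Sum of a : %d\nSum of b : %d\n" sum_a sum_b;;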

VarcharType mismatch Spark dataframe

I'm trying to change the schema of a DataFrame. Every time I have a column of string type, I want to change its type to VarcharType(max), where max is the maximum length of a string in that column. I wrote the following code. (I want to export the DataFrame to SQL Server later, and I don't want to have nvarchar in SQL Server, so I'm trying to limit it on the Spark side.)
val df = spark.sql(s"SELECT * FROM $tableName")
var l : List [StructField] = List()
val schema = df.schema
schema.fields.foreach(x => {
if (x.dataType == StringType) {
val dataColName = x.name
val maxLength = df.select(dataColName).reduce((x, y) => {
if (x.getString(0).length >= y.getString(0).length) {
x
} else {
y
}
}).getString(0).length
val dataType = VarcharType(maxLength)
l = l :+ StructField(dataColName, dataType)
} else {
l = l :+ x
}
})
val newSchema = StructType(l)
val newDf = spark.createDataFrame(df.rdd, newSchema)
However, when running it I get this error:
20/01/22 15:29:44 ERROR ApplicationMaster: User class threw exception: scala.MatchError:
VarcharType(9) (of class org.apache.spark.sql.types.VarcharType)
scala.MatchError: VarcharType(9) (of class org.apache.spark.sql.types.VarcharType)
Can a DataFrame column be of type VarcharType(n)?
The datatype mapping between a DataFrame and a database happens in the dialect class. For MS SQL Server the class is org.apache.spark.sql.jdbc.MsSqlServerDialect. You can inherit from this and override getJDBCType to influence the datatype mapping from a DataFrame to a table. Then register your dialect for it to take effect.
I have done this for Oracle (not SQL Server), but it can be done similarly.
//Change this
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))
case StringType => Some(JdbcType("NVARCHAR(MAX)", java.sql.Types.NVARCHAR))
case BooleanType => Some(JdbcType("BIT", java.sql.Types.BIT))
case _ => None
}
You can't use VarcharType because it is not a DataType. Also, you can't check the length of the actual data because it is not exposed. You only have access to dt: DataType, so you can set a default size for NVARCHAR if MAX is not acceptable.
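For SQL Server specifically, the registration step could look roughly like this. This is a sketch built on the public JdbcDialect API; SqlServerVarcharDialect and the VARCHAR(4000) default are my own choices, not part of the original answer:
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Hypothetical dialect: map Spark StringType to a bounded VARCHAR instead of NVARCHAR(MAX).
object SqlServerVarcharDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("VARCHAR(4000)", java.sql.Types.VARCHAR)) // default size, adjust to your data
    case _ => None // fall back to Spark's built-in mapping for other types
  }
}

// Register the dialect before writing the DataFrame over JDBC.
JdbcDialects.registerDialect(SqlServerVarcharDialect)
Once registered, any df.write.jdbc(...) call whose URL starts with jdbc:sqlserver will pick up this mapping for string columns.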

Pentaho to convert tree structure data

I have a stream of data from a CSV. It is a flat structured database.
E.g.:
a,b,c,d
a,b,c,e
a,b,f
This essentially needs to be transformed into:
Node id,Nodename,parent id,level
100, a , 0 , 1
200, b , 100 , 2
300, c , 200 , 3
400, d , 300 , 4
500, e , 300 , 4
600, f , 200 , 3
Can this be done using Pentaho? I have gone through the transformation steps. But nothing strikes me as usable for this purpose. Please let me know if there is any step that I may have missed.
Your CSV file contains a graph or tree definition. The output format is rich (node_id needs to be generated, parent_id needs to be resolved, level needs to be set). There are a few issues you will face when processing this kind of CSV file in Pentaho Data Integration:
Data loading & processing:
Rows do not have the same length (sometimes 4 nodes, sometimes 3 nodes).
Load whole rows, then split each row into nodes and process one node per record-stream item.
You can calculate the output values in the same step where the nodes are split.
Solution Steps:
CSV file input: Load data from CSV. Settings: No header row; Delimiter = ';' (a character that does not occur in the data, so each whole line lands in a single column); one output column named rowData
Modified Java Script Value: Split rowData to nodes and calculate output values: nodeId, nodeName, parentId, nodeLevel [See the code below]
Sort rows: Sort rows by nodeName. [a,b,c,d,a,b,c,e,a,b,f >> a,a,a,b,b,c,c,d,e,f]
Unique rows: Delete duplicate rows by nodeName. [a,a,a,b,b,c,c,d,e,f >> a,b,c,d,e,f]
Text file output: Write out results.
Modified Java Script Value Code:
function writeRow(nodeId, nodeName, parentId, nodeLevel){
newRow = createRowCopy(getOutputRowMeta().size());
var rowIndex = getInputRowMeta().size();
newRow[rowIndex++] = nodeId;
newRow[rowIndex++] = nodeName;
newRow[rowIndex++] = parentId;
newRow[rowIndex++] = nodeLevel;
putRow(newRow);
}
var nodeIdsMap = {
a: "100",
b: "200",
c: "300",
d: "400",
e: "500",
f: "600",
g: "700",
h: "800",
}
// rowData from record stream (CSV input step)
var nodes = rowData.split(",");
for (i = 0; i < nodes.length; i++){
var nodeId = nodeIdsMap[nodes[i]];
var parentNodeId = (i == 0) ? "0" : nodeIdsMap[nodes[i-1]];
var level = i + 1;
writeRow(nodeId, nodes[i], parentNodeId, level);
}
trans_Status = SKIP_TRANSFORMATION;
Modified Java Script Value Field Settings:
Fieldname; Type; Replace value 'Fieldname' or 'Rename to'
nodeId; String; N
nodeName; String; N
parentId; String; N
nodeLevel; String; N
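One caveat: nodeIdsMap in the script above is hard-coded, so any node name outside a..h gets an undefined id. If the names are not known up front, a counter can assign ids on first appearance instead. A sketch (the idFor helper is mine; the map and counter must persist across rows, e.g. by declaring them in a script tab marked as the start script):
// start script (runs once, before the first row):
var nodeIdsMap = {};
var nextNodeId = 100;

// transform script: replace the direct map lookups with this helper
function idFor(nodeName){
  if (!(nodeName in nodeIdsMap)){
    nodeIdsMap[nodeName] = "" + nextNodeId; // 100, 200, 300, ... in order of first appearance
    nextNodeId += 100;
  }
  return nodeIdsMap[nodeName];
}
// then: var nodeId = idFor(nodes[i]);
//       var parentNodeId = (i == 0) ? "0" : idFor(nodes[i-1]);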

Hive combine column values based upon condition

I was wondering if it is possible to combine column values based upon a condition. Let me explain...
Let's say my data looks like this:
Id name offset
1 Jan 100
2 Janssen 104
3 Klaas 150
4 Jan 160
5 Janssen 164
And my output should be this:
Id fullname offsets
1 Jan Janssen [ 100, 160 ]
I would like to combine the name values from two rows where the offsets of the two rows are no more than 1 character apart.
My question is whether this type of data manipulation is possible with Hive, and if it is, could someone share some code and an explanation?
Please be gentle, but this little piece of code returns somewhat what I want...
ArrayList<String> persons = new ArrayList<String>();
// write your code here
String _previous = "";
//Sample output form entities.txt
//USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
//USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
File file = new File("entities.txt");
try {
//
// Create a new Scanner object which will read the data
// from the file passed in. To check if there are more
// line to read from it we check by calling the
// scanner.hasNextLine() method. We then read line one
// by one till all line is read.
//
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
if(_previous == "" || _previous == null)
_previous = scanner.nextLine();
String _current = scanner.nextLine();
//Compare the lines, if there offset is = 1
int x = Integer.parseInt(_previous.split(",")[3]) + Integer.parseInt(_previous.split(",")[4]);
int y = Integer.parseInt(_current.split(",")[4]);
if(y-x == 1){
persons.add(_previous.split(",")[1] + " " + _current.split(",")[1]);
if(scanner.hasNextLine()){
_current = scanner.nextLine();
}
}else{
persons.add(_previous.split(",")[1]);
}
_previous = _current;
}
} catch (Exception e) {
e.printStackTrace();
}
for(String person : persons){
System.out.println(person);
}
Working off this sample data:
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Richard,PERSON,7,2732
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2740
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2756
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3093
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3195
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,3220
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,10858
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,11063
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Ken,PERSON,3,11186
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,11234
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,17073
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,17095
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Stephanie,PERSON,9,17330
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Putt,PERSON,4,17340
Which produces this output
Richard Marottoli
Marottoli
Marottoli
Marottoli
Berkowitz
Berkowitz
Marottoli
Lea
Lea
Ken
Marottoli
Berkowitz
Lea
Stephanie Putt
Kind regards
Load the table using the create table statement below:
drop table if exists default.stack;
create external table default.stack
(junk string,
name string,
cat string,
len int,
off int
)
ROW FORMAT DELIMITED
FIELDS terminated by ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://nameservice1/....';
Use the query below to get your desired output.
select max(name), off from (
select CASE when b.name is not null then
concat(b.name," ",a.name)
else
a.name
end as name
,Case WHEN b.off1 is not null
then b.off1
else a.off
end as off
from default.stack a
left outer join (select name
,len+off+ 1 as off
,off as off1
from default.stack) b
on a.off = b.off ) a
group by off
order by off;
I have tested this and it generates your desired result.
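To see what the join is doing, here is a small inspection query over the derived table b for the first two entities.txt rows (the offsets in the WHERE clause come from the sample data; the query itself is mine, not part of the original answer):
-- inspect the derived table b for the Richard/Marottoli pair
select name, len + off + 1 as off, off as off1
from default.stack
where off in (2732, 2740);
-- Richard    2740  2732
-- Marottoli  2750  2740
-- The outer row Marottoli (a.off = 2740) matches b.off = 2740, so it becomes
-- concat('Richard', ' ', 'Marottoli') with off = 2732; max(name) grouped by off then
-- prefers the combined string over the lone 'Richard' at the same offset.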