deleteDocuments(Term) doesn't delete documents - lucene

Deletion does not work when using a term:
val term = Term(INDEX_PROJEKT_ID, aggregat.projekt.id)
it.deleteDocuments(term)
It does work, however, when using a query:
val query = QueryParser(INDEX_PROJEKT_ID, StandardAnalyzer()).parse("\"${aggregat.projekt.id}\"")
it.deleteDocuments(query)
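One plausible explanation (not confirmed by the question): deleteDocuments(Term) only matches a term exactly as it was stored in the index, while the QueryParser variant runs the value through the StandardAnalyzer first, so it matches the analyzed (lowercased/tokenized) form. If INDEX_PROJEKT_ID was indexed as an analyzed TextField, the raw Term won't match anything; indexing the id as an untokenized StringField should make the term-based delete work. A minimal Kotlin sketch under that assumption (the field name and writer are hypothetical):

import org.apache.lucene.document.Document
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField
import org.apache.lucene.index.IndexWriter
import org.apache.lucene.index.Term

fun indexProjekt(writer: IndexWriter, projektId: String) {
    val doc = Document()
    // StringField keeps the whole value as a single, untokenized term,
    // so an exact Term(field, value) lookup can match it later.
    doc.add(StringField("projektId", projektId, Field.Store.YES))
    writer.addDocument(doc)
}

fun deleteProjekt(writer: IndexWriter, projektId: String) {
    // With an untokenized field, the raw term matches exactly what was indexed.
    writer.deleteDocuments(Term("projektId", projektId))
    writer.commit()
}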

Related

Push a SQL query to a server from JDBC connection which reads from multiple databases within that server

I'm pushing a query down to a server to read data into Databricks as below:
val jdbcUsername = dbutils.secrets.get(scope = "", key = "")
val jdbcPassword = dbutils.secrets.get(scope = "", key = "")
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
val jdbcHostname = ""
val jdbcPort = ...
val jdbcDatabase = ""
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)
// define a query to be passed to database to display the tables available for a given DB
val query_results = "(SELECT * FROM INFORMATION_SCHEMA.TABLES) as tables"
// push the query down to the server to retrieve the list of available tables
val table_names = spark.read.jdbc(jdbcUrl, query_results, connectionProperties)
table_names.createOrReplaceTempView("table_names")
Running display(table_names) provides a list of tables for the given database. That part works fine; however, when trying to read and join tables from multiple databases on the same server, I haven't yet found a solution that works.
An example would be:
// define a query to be passed to database to display a result across many tables
val report1_query = "(SELECT a.Field1, b.Field2 FROM database_1 as a left join database_2 as b on a.Field4 == b.Field8) as report1"
// push the query down to the server to retrieve the query result
val report1_results = spark.read.jdbc(jdbcUrl, report1_query, connectionProperties)
report1_results.createOrReplaceTempView("report1_results")
Any pointers on restructuring this code would be appreciated (an equivalent in Python would also be super helpful).
SQL Server uses 3-part naming like database.schema.table. This example comes from the SQL Server information_schema docs:
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLUMN_DEFAULT
FROM AdventureWorks2012.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = N'Product';
To query across databases you need to specify all 3 parts in the query being pushed down to SQL Server.
SELECT a.Field1, b.Field2
FROM database_1.schema_1.table_1 as a
LEFT JOIN database_2.schema_2.table_2 as b
ON a.Field4 = b.Field8
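Putting that back into the original structure: only the pushdown string changes, the spark.read.jdbc call stays as it was. A rough Kotlin sketch against Spark's Java-friendly API, assuming a SparkSession named spark and placeholder table/field names (the Scala form is the same apart from syntax):

import java.util.Properties
import org.apache.spark.sql.SparkSession

fun readReport1(spark: SparkSession, jdbcUrl: String, connectionProperties: Properties) {
    // Three-part names (database.schema.table) let one JDBC connection join
    // tables that live in different databases on the same SQL Server instance.
    val report1Query = """
        (SELECT a.Field1, b.Field2
         FROM database_1.schema_1.table_1 AS a
         LEFT JOIN database_2.schema_2.table_2 AS b
           ON a.Field4 = b.Field8) AS report1
    """.trimIndent()

    // The whole statement is pushed down to SQL Server; Spark only receives the result set.
    val report1Results = spark.read().jdbc(jdbcUrl, report1Query, connectionProperties)
    report1Results.createOrReplaceTempView("report1_results")
}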

JOOQ - Select Distinct with Join - Fetch mapper

This is the SQL I am trying to create with JOOQ -
select distinct(kmp.*) from office_all_company_kmp kmp
inner join company_kmp companykmp on kmp.id=companykmp.kmp_id
where companykmp.company_id=?1
I am writing code in Kotlin. I had 2 issues doing this -
In the select clause, I couldn't get it to compile unless I added .asList() to the fields array.
The fetch mapper had to be hand-coded. Is there a way I can do this without writing all that code? I can map records fetched back from a single table without writing any mapping.
Here's what I am talking about:
fun OfficeAllCompanyKmpDao.findByCompany(companyId: UUID): List<OfficeAllCompanyKmp> =
    this.ctx()
        .selectDistinct(OFFICE_ALL_COMPANY_KMP.fields().asList()) // without the asList() it wouldn't compile
        .from(OFFICE_ALL_COMPANY_KMP)
        .join(COMPANY_KMP).on(OFFICE_ALL_COMPANY_KMP.ID.eq(COMPANY_KMP.KMP_ID))
        .where(COMPANY_KMP.COMPANY_ID.eq(companyId))
        .fetch { // how do I write the mapper without manually writing code like the below?
            OfficeAllCompanyKmp(
                id = it[OFFICE_ALL_COMPANY_KMP.ID],
                officeId = it[OFFICE_ALL_COMPANY_KMP.OFFICE_ID],
                din = it[OFFICE_ALL_COMPANY_KMP.DIN],
                pan = it[OFFICE_ALL_COMPANY_KMP.PAN],
                name = it[OFFICE_ALL_COMPANY_KMP.NAME],
                dateOfBirth = it[OFFICE_ALL_COMPANY_KMP.DATE_OF_BIRTH],
                address = it[OFFICE_ALL_COMPANY_KMP.ADDRESS],
                email = it[OFFICE_ALL_COMPANY_KMP.EMAIL],
                kmpDetails = it[OFFICE_ALL_COMPANY_KMP.KMP_DETAILS],
                createdTimestamp = it[OFFICE_ALL_COMPANY_KMP.CREATED_TIMESTAMP],
                updatedTimestamp = it[OFFICE_ALL_COMPANY_KMP.UPDATED_TIMESTAMP],
                versionNo = it[OFFICE_ALL_COMPANY_KMP.VERSION_NO],
                createdUserId = it[OFFICE_ALL_COMPANY_KMP.CREATED_USER_ID],
                updatedUserId = it[OFFICE_ALL_COMPANY_KMP.UPDATED_USER_ID]
            )
        }
A better approach than inner joining and then removing duplicates again would be to semi join your other table using IN or EXISTS:
this.ctx()
    .selectFrom(OFFICE_ALL_COMPANY_KMP)
    .where(OFFICE_ALL_COMPANY_KMP.ID.`in`(
        select(COMPANY_KMP.KMP_ID)
        .from(COMPANY_KMP)
        .where(COMPANY_KMP.COMPANY_ID.eq(companyId))
    ))
    .fetchInto(OfficeAllCompanyKmp::class.java)
Or, alternatively, use jOOQ's synthetic LEFT SEMI JOIN syntax (see also this blog post for an explanation of the syntax, or this one for joins in general, or Wikipedia's nice explanation of semi joins):
this.ctx()
    .select()
    .from(OFFICE_ALL_COMPANY_KMP)
    .leftSemiJoin(COMPANY_KMP)
    .on(OFFICE_ALL_COMPANY_KMP.ID.eq(COMPANY_KMP.KMP_ID))
    .and(COMPANY_KMP.COMPANY_ID.eq(companyId))
    .fetchInto(OfficeAllCompanyKmp::class.java)
Your problem 1) went away by using a different jOOQ API, where you don't have to list all columns explicitly in the SELECT clause. Your problem 2) is fixed easily by calling fetchInto() instead.

Alias for an aggregate column

I would like to get the average of a column using Kotlin Exposed.
object MyTable : IntIdTable("MyTable") {
    val score = integer("score")
}

val result = MyTable.slice(
    MyTable.score.avg().alias("avg_points")
).first()
How do I get the result?
For normal columns I would use
result[MyTable.score]
But now it is an aggregate with an alias. I've tried
result["avg_points"]
But that fails. I don't see many public methods on ResultRow.
Try this.
First save the average to a variable
val avgColumn = MyTable.score.avg().alias("avg_points")
Then get the results as such
val result = MyTable.slice(
    avgColumn
).selectAll().first()

val avg = result[avgColumn]

Does Apache Spark SQL support MERGE clause?

Does Apache Spark SQL support MERGE clause that's similar to Oracle's MERGE SQL clause?
MERGE INTO <table>
USING (SELECT * FROM <table1>) ...
WHEN MATCHED THEN UPDATE ...
    DELETE WHERE ...
WHEN NOT MATCHED THEN INSERT ...
Spark does support the MERGE operation when Delta Lake is used as the storage format. The first thing to do is to save the table in the delta format, which provides transactional capabilities and support for DELETE/UPDATE/MERGE operations with Spark.
Python/Scala:
df.write.format("delta").save("/data/events")
SQL: CREATE TABLE events (eventId long, ...) USING delta
Once the table exists, you can run your usual SQL Merge command:
MERGE INTO events
USING updates
ON events.eventId = updates.eventId
WHEN MATCHED THEN
UPDATE SET events.data = updates.data
WHEN NOT MATCHED
THEN INSERT (date, eventId, data) VALUES (date, eventId, data)
The command is also available in Python/Scala:
DeltaTable.forPath(spark, "/data/events/")
  .as("events")
  .merge(
    updatesDF.as("updates"),
    "events.eventId = updates.eventId")
  .whenMatched
  .updateExpr(
    Map("data" -> "updates.data"))
  .whenNotMatched
  .insertExpr(
    Map(
      "date" -> "updates.date",
      "eventId" -> "updates.eventId",
      "data" -> "updates.data"))
  .execute()
To support the Delta Lake format, you also need the delta package as a dependency in your Spark job:
<dependency>
    <groupId>io.delta</groupId>
    <artifactId>delta-core_x.xx</artifactId>
    <version>xxxx</version>
</dependency>
See https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge for more details
As of Spark 3.0, Spark offers a very clean way of doing the merge operation using a Delta table.
https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge
It does not. As of now (it might change in the future) Spark doesn't support UPDATES, DELETES or any other variant of record modification.
It can only overwrite existing storage (with different implementation depending on the source) or append with plain INSERT.
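For reference, a minimal sketch of those two modes with plain Spark (shown in Kotlin via Spark's Java API; the dataset and output path are hypothetical):

import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.SaveMode

fun overwriteEvents(events: Dataset<Row>) {
    // Replace whatever is currently stored at the target path.
    events.write().mode(SaveMode.Overwrite).parquet("/data/events")
}

fun appendEvents(events: Dataset<Row>) {
    // Add new rows only; existing records are never updated or deleted.
    events.write().mode(SaveMode.Append).parquet("/data/events")
}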
You can write your own custom code: the code below can be edited to do a merge instead of an insert. Be aware that this is a computation-heavy operation.
import java.sql.{Connection, DriverManager, PreparedStatement}

df.rdd.coalesce(2).foreachPartition(partition => {
  // brConnect is a broadcast variable holding the JDBC connection properties.
  val connectionProperties = brConnect.value
  val jdbcUrl = connectionProperties.getProperty("jdbcurl")
  val user = connectionProperties.getProperty("user")
  val password = connectionProperties.getProperty("password")
  val driver = connectionProperties.getProperty("Driver")

  Class.forName(driver)
  val dbc: Connection = DriverManager.getConnection(jdbcUrl, user, password)
  dbc.setAutoCommit(false) // commit once per batch below

  val db_batchsize = 1000

  // Swap this INSERT for your database's MERGE/upsert statement as needed.
  val sqlString = "INSERT INTO employee (id, fname, lname, userid) VALUES (?, ?, ?, ?)"
  val pstmt: PreparedStatement = dbc.prepareStatement(sqlString)

  partition.grouped(db_batchsize).foreach(batch => {
    batch.foreach { row =>
      val id = row.getAs[Long]("id")
      val fname = row.getAs[String]("fname")
      val lname = row.getAs[String]("lname")
      val userid = row.getAs[String]("userid")
      println(id, fname)

      pstmt.setLong(1, id)
      pstmt.setString(2, fname)
      pstmt.setString(3, lname)
      pstmt.setString(4, userid)
      pstmt.addBatch()
    }
    // Send the whole batch in one round trip, then commit it.
    pstmt.executeBatch()
    dbc.commit()
  })

  pstmt.close()
  dbc.close()
})
If you are working with Spark, maybe this answer could help you deal with the merge issue using DataFrames.
Anyway, according to some Hortonworks documentation, the MERGE statement is supported in Apache Hive 0.14 and later.
There is an Apache project - Apache Iceberg - which provides a table format with editing capabilities, including MERGE:
https://iceberg.apache.org/docs/latest/spark-writes/

CodeIgniter - Grouping where clause

I have this following query for CodeIgniter:
$q = $this->db->where('(message_from="'.$user_id.'" AND message_to="'.$this->auth_model->userdata['user_id'].'")')
->or_where('(message_from="'.$this->auth_model->userdata['user_id'].'" AND message_to="'.$user_id.'")')
->get('messages');
I want to write this query entirely with Active Record.
I have tried something like this:
$from_where = array('message_from'=>$user_id, 'message_to'=>$this->auth_model->userdata['user_id']);
$to_where = array('message_from'=>$this->auth_model->userdata['user_id'],'message_to'=>$user_id);
$q = $this->db->where($from_where)
->or_where($to_where)
->get('messages');
die($this->db->last_query());
The above code produces this query:
SELECT * FROM (`messages`) WHERE `message_from` = '2' AND `message_to` = '1' OR `message_from` = '1' OR `message_to` = '2'
But this is what I want to produce:
SELECT * FROM (`messages`) WHERE (message_from="2" AND message_to="1") OR (message_from="1" AND message_to="2")
There are similar questions here and here, but those did not provide a real solution for me.
How is this possible? If not via the core libraries, is there an extension which allows writing such queries?
Thanks,
You can use CodeIgniter's subquery approach to do this; for this purpose you will have to hack CodeIgniter, like this:
Go to system/database/DB_active_rec.php and remove the public or protected keyword from these functions:
public function _compile_select($select_override = FALSE)
public function _reset_select()
Now subquery writing is available. And now here is your query with Active Record:
$this->db->where('message_from','2');
$this->db->where('message_to','1');
$subQuery1 = $this->db->_compile_select();
$this->db->_reset_select();
$this->db->where('message_from','1');
$this->db->where('message_to','2');
$subQuery2 = $this->db->_compile_select();
$this->db->_reset_select();
$this->db->select('*');
$this->db->where("$subQuery1");
$this->db->or_where("$subQuery2");
$this->db->get('messages');
Look at this answer of mine; it shows how to use subqueries and should help:
Using Mysql WHERE IN clause in codeigniter
EDIT
Yes, I have done it.
Rewrite the query this way to get exactly what you want:
$this->db->where('message_from','2');
$this->db->where('message_to','1');
$subQuery1 = $this->db->_compile_select(TRUE);
$this->db->_reset_select();
$this->db->where('message_from','1');
$this->db->where('message_to','2');
$subQuery2 = $this->db->_compile_select(TRUE);
$this->db->_reset_select();
$this->db->select('*');
$this->db->where("($subQuery1)");
$this->db->or_where("($subQuery2)");
$this->db->get('messages');
_compile_select() is called with the TRUE parameter so that it will not produce the SELECT clause. This will produce:
SELECT * FROM (`messages`) WHERE (`message_from` = '2' AND `message_to` = '1') OR (`message_from` = '1' AND `message_to` = '2')