BigQuery streaming insert with a function applied to another column - google-bigquery

When using client libraries I can pass a list of objects to be inserted into BigQuery, like this one in Go https://cloud.google.com/bigquery/docs/samples/bigquery-table-insert-rows#bigquery_table_insert_rows-go
But what if I want to do something like this:
INSERT INTO table_name (col1, col2)
VALUES
("a", FARM_FINGERPRINT("a")),
("bcd", FARM_FINGERPRINT("bcd")),
i.e. providing only values "a", "bcd" insert into both columns where one is just a function of another.
How can I do this with a streaming insert in the Go library, for example? Something like this pseudo-code:
...
inserter := client.Dataset(datasetID).Table(tableID).Inserter()
items := []*Item{
	// Item implements the ValueSaver interface.
	{Name: "Phred Phlyntstone", Age: 32, SomeColumn: `CALL_ME("Phred Phlyntstone")`},
	{Name: "Wylma Phlyntstone", Age: 29, SomeColumn: `CALL_ME("Wylma Phlyntstone")`},
}
...
One possibility is to re-implement the function in Go code and insert the result explicitly, but that's not ideal. And with a plain INSERT INTO I can hit DML limits. Is there a better solution?

Based on the documentation, it does not appear to be possible to apply SQL functions when using Inserter(). Inserter() accepts a struct, a StructSaver, or a ValueSaver as its src.
A struct is just a set of key-value pairs, so whatever values are assigned to its fields are what gets written. The StructSaver type has Schema, InsertID, and Struct fields, so it is basically the same as passing a struct, just with a few extra parameters. The ValuesSaver type has Schema, InsertID, and Row fields; Row is a slice of Value, so it can hold values of any data type. In other words, everything Inserter() accepts boils down to key-value pairs, and whatever those pairs contain is exactly the data that gets loaded.
Unfortunately, the best available workaround is the one you already suggested: re-implement the function in your own code, which avoids the DML limits. You could also open a feature request against the Go client library, since this would be a useful feature to have.
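A minimal sketch of that client-side workaround, assuming a table with columns col1 (STRING) and col2 (INT64). It uses the third-party github.com/dgryski/go-farm package, which implements FarmHash's Fingerprint64 (the hash FARM_FINGERPRINT is documented to use); the project, dataset, and table IDs are placeholders.

package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
	farm "github.com/dgryski/go-farm"
)

// Row is a hypothetical struct matching a table with columns col1 (STRING) and col2 (INT64).
type Row struct {
	Col1 string `bigquery:"col1"`
	Col2 int64  `bigquery:"col2"`
}

// streamWithFingerprint computes col2 client-side and streams both columns.
func streamWithFingerprint(ctx context.Context, client *bigquery.Client, datasetID, tableID string, values []string) error {
	rows := make([]*Row, 0, len(values))
	for _, v := range values {
		rows = append(rows, &Row{
			Col1: v,
			Col2: int64(farm.Fingerprint64([]byte(v))), // re-implemented FARM_FINGERPRINT (assumed equivalent)
		})
	}
	inserter := client.Dataset(datasetID).Table(tableID).Inserter()
	return inserter.Put(ctx, rows)
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	if err := streamWithFingerprint(ctx, client, "my_dataset", "table_name", []string{"a", "bcd"}); err != nil {
		log.Fatal(err)
	}
}

The cast from uint64 to int64 mirrors the fact that FARM_FINGERPRINT returns INT64; it is worth verifying a few values against the SQL function before relying on this.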

Both the legacy streaming API (e.g. the Go inserter you cited) and the newer Storage Write API allow insertion of values directly into a table.
This doesn't involve the query engine, so doing things like calling functions/UDFs/etc is not a possibility.
Another possibility: you could do something like construct a logical or materialized view that uses the table you're streaming data into as the base table.
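For instance, the table being streamed into could carry only the raw value, with a logical view deriving the fingerprint at query time. A sketch using the Go client to issue the DDL (all names are placeholders; for a materialized view you would also need to check that the function is supported):

package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// my_dataset.table_name receives the streamed col1 values;
	// readers query the view to get col2 as well.
	q := client.Query(`
		CREATE VIEW IF NOT EXISTS my_dataset.table_name_with_fp AS
		SELECT col1, FARM_FINGERPRINT(col1) AS col2
		FROM my_dataset.table_name`)
	job, err := q.Run(ctx)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := job.Wait(ctx); err != nil {
		log.Fatal(err)
	}
}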

Related

How can I pass a collection/array of data from Java to a procedure call? HANA

Let's say I have a list of IDs (111, 112, 113) that I fetched by executing the following query from Java:
SELECT "id" FROM User WHERE (email, name) IN (("", ""), ("", ""));
The list length will vary.
From Java I need to pass this list/collection/array of IDs to a stored procedure. How can I do that?
CREATE PROCEDURE "PROCEDUREEXAMPLE" (IN userIds ??COLLECTION??) LANGUAGE SQLSCRIPT SQL SECURITY DEFINER
AS
BEGIN
//Do the rest
END
I wanted to use another procedure to pass the result of the first query to the second procedure, but as you can see, the first SQL is dynamic and the values will vary.
One way to do it is to store those IDs in a temporary table that the procedure call then reads, but I wanted to know whether it is possible to pass a collection of data directly to the procedure call.
Feel free to suggest other ways of doing this.
Thanks
This has been asked & answered here several times before.
No, it’s not possible to pass a Java collection or Java array of values into a SAP HANA SQL statement and get the corresponding IN LIST.
There is also no mapping of java arrays to SAP HANA SQL arrays.
To deal with that, two main approaches are available:
Create the IN list from the collection elements yourself. This can of course lead to issues with prepared-statement reuse, due to the changing number of elements. One way to handle this is to prepare a statement with a larger, fixed number of placeholders and only bind those for which you actually have elements in the collection/array (see the sketch after this list).
Create a temporary table, fill it with the elements of the collection (one element = one row), and use an INNER JOIN to filter based on this set of elements.
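A sketch of the first approach (the padded IN list), shown here in Go with database/sql; the same pattern applies from JDBC. The ? placeholder style and the quoted column name are assumptions that depend on the driver and schema.

package inlist

import (
	"database/sql"
	"fmt"
	"strings"
)

const maxInList = 16 // fixed upper bound, so the statement text never changes

// fetchUsers binds the real IDs and pads the remaining slots with NULL,
// which can never match an IN-list entry, so padding filters nothing in.
func fetchUsers(db *sql.DB, ids []int64) (*sql.Rows, error) {
	if len(ids) > maxInList {
		return nil, fmt.Errorf("too many ids: %d > %d", len(ids), maxInList)
	}
	query := `SELECT "id" FROM User WHERE "id" IN (` +
		strings.TrimSuffix(strings.Repeat("?,", maxInList), ",") + ")"
	args := make([]interface{}, maxInList)
	for i := range args {
		if i < len(ids) {
			args[i] = ids[i]
		} else {
			args[i] = nil // bound as NULL
		}
	}
	return db.Query(query, args...)
}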

Access column from composite type array in Postgres C API

I access an array of composite values like this:
PG_GETARG_ARRAYTYPE_P(0)
/* Then I deconstruct it into C array */
deconstruct_array()
/* Later I iterate thru values and attempt to access columns of my composite type */
GetAttributeByName(input_data1[i], "keyColumnName", &isnull[0])
This is how it looks in SQL:
SELECT * FROM my_c_function(array[(44, 1)::comp_type, (43, 0)::comp_type], array[(42, 1)::comp_type, (43, 1)::comp_type]);
Expected result:
array[(44, 1)::comp_type, (42, 1)::comp_type, (43, 1)::comp_type] /*order doesn't matter*/
But this does not work, because GetAttributeByName() works only with HeapTupleHeader, and sadly I have an array of Datum.
Normally you get a HeapTupleHeader by accessing a function argument like so: PG_GETARG_HEAPTUPLEHEADER(0), but that is not meant for arrays (or am I wrong?).
So is there some function/macro to get the columns from a Datum that is a composite type, or to convert a composite-type Datum into a HeapTuple? I have gone as deep as heap_getattr(), but can't really find anything useful. Maybe there is already some function that accesses a composite array in a similar fashion and would show me how to do it.
For the context:
I have two arrays of a composite type and I want to write a C function to concatenate them quickly. However, I cannot simply append the right argument to the left one, because they could share a "key" column, and in that case I want the result to contain only the values from the right side.
This is a simple task in PL/pgSQL (unnest, full join, array_agg) but it is very slow. I have tested the same task with hstore and json and both are much faster than unnest+array_agg, but I cannot use those data types without extensive changes to the database structure, so I was looking for a different solution.
I guess all you need is the DatumGetHeapTupleHeader macro defined in fmgr.h.

Using bind variables in large insert statements

I am inheriting an application which has to read data from various types of files and use the OCI interface to move the data into an Oracle database. Most of the tables in question have about 40-50 columns, so the SQL insert statements become pretty lengthy.
When I inherited this code, it basically built up the insert statements via a series of strcats as a C string, then passed it to the appropriate OCI functions to set up and execute the statement. However, since much of the data is read directly from files into the column values, this leaves the application open to easy SQL injection. So I am trying to use bind variables to solve this problem.
In every example OCI application I can find, each variable is statically allocated and bound individually. This would lead to quite a bit of boilerplate, however, and I'd like to reduce it to some sort of looping construct. So my solution is, for each table, to create a static array of strings containing the names of the table columns:
const char *const TABLE_NAME[N_COLS] = {
"COL_1",
"COL_2",
"COL_3",
...
"COL_N"
};
along with a short function that makes a placeholder out of a column name:
void makePlaceholder(char *buf, const char *col);
// "COLUMN_NAME" -> ":column_name"
So I then loop through each array and bind my values to each column, generating the placeholders as I go. One potential problem here is that, because the types of each column vary, I bind everything as SQLT_STR (strings) and thus expect Oracle to convert to the proper datatype on insertion.
So, my question(s) are:
What is the proper/idiomatic way (if such a thing exists for SQL/OCI) to use bind variables for SQL insert statements with a large number of columns/params? More generally, what is the best way to use OCI to build this kind of large insert statement?
Do large numbers of bind calls have a significant cost in efficiency compared to building and using vanilla C strings?
Is there any risk in binding all variables as strings and allowing Oracle to make the proper type conversion?
Thanks in advance!
Not sure about the C aspects of this. My answer will be from a DBA perspective.
Question 2:
Always use bind variables. They prevent SQL injection and enhance performance.
The performance aspect is often overlooked by programmers. When Oracle receives a SQL statement, it hashes the entire SQL text and looks in its internal repository of execution plans to see whether it already has one for it. If bind variables were used, the SQL text is the same every time you run the query, no matter what the values of the variables are. However, if you concatenate the string yourself, Oracle hashes the SQL text including the contents of what ought to have been variables, getting a unique hash every time. So if you run a query one million times, Oracle builds one execution plan if you used bind variables; if you did not, it builds one million execution plans and wastes loads of resources doing so.
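To make that concrete, here is a sketch of the two styles (in Go with database/sql rather than OCI, purely for brevity; the table name and the :1 placeholder style are assumptions that depend on the driver in use):

package bindexample

import "database/sql"

// insertConcat builds new SQL text for every value: Oracle hashes a different
// statement each time (one plan per distinct value) and the value is injectable.
func insertConcat(db *sql.DB, name string) error {
	_, err := db.Exec("INSERT INTO people (name) VALUES ('" + name + "')")
	return err
}

// insertBound keeps the SQL text constant: one hash, one cached execution plan,
// and no injection risk, no matter how many times it runs.
func insertBound(db *sql.DB, name string) error {
	_, err := db.Exec("INSERT INTO people (name) VALUES (:1)", name)
	return err
}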

Reference an arbitrary row and field in another table

Is there any way (data type, inheritance, ...) to implement something like this in PostgreSQL:
CREATE TABLE log (
datareferenced table_row_column_reference,
logged boolean
);
The referenced data may be any row's field anywhere in the database. My objective is to implement something like this without using a procedural language or implementing it in a higher layer, using only a relational approach and without modifying the rest of the tables. Another desired feature is referential integrity, for example:
-- Table foo (id, field1, field2, fieldn)
-- ('bar', '2014-01-01', 4.33, Null)
-- Table log (datareferenced, logged)
-- ({table foo -> id:'bar' -> field2 } <=> 4.33, True)
DELETE FROM foo where id='bar';
-- as result, on cascade, deleted both rows.
I have an application built on an MVC pattern. The logic is written in Python. The application is a management tool and is very data intensive. My goal is to implement a module that can store additional information for every piece of data present in the database. For example, a client has a series of attributes (name, address, phone, email, ...) across multiple tables, and I want the app to be able to store metadata for every record in the whole database. A piece of metadata could be the last modification, a user flag, etc.
I have implemented the metadata model (in Postgres), its mapping to objects, and a partial API. But the part that is left is the most important one: the glue. My plan B is to create that glue in the data mapping layer as a module. Something like this:
address = person.addresses[0]
address.saveMetadata('foo', 'bar')

# in the superclass of Address
def saveMetadata(self, code, value):
    self.mapper.metadata_adapter.save(self, code, value)

# in the metadata adapter class:
def save(self, entity, code, value):
    sql = """update metadata_values set value=%s
             where code=%s and idmetadata=
               (select id from metadata_rels mr
                where mr.schema=%s and mr.table=%s and
                      mr.field=%s and mr.recordpk=%s)""" % (
        value, code,
        self.class2data[entity.__class__]["schema"],
        self.class2data[entity.__class__]["table"],
        self.class2data[entity.__class__]["field"],
        entity.id)
    self.mapper.execute(sql)

def read(self, entity, code):
    sql = """select mv.value
             from metadata_values mv
             join metadata_rels mr on mv.idmetadata=mr.id
             where mv.code=%s and mr.schema=%s and mr.table=%s and
                   mr.field=%s and mr.recordpk=%s""" % (
        code,
        self.class2data[entity.__class__]["schema"],
        self.class2data[entity.__class__]["table"],
        self.class2data[entity.__class__]["field"],
        entity.id)
    return self.mapper.execute(sql)
But it would add overhead between Python and PostgreSQL and complicate the Python logic, and using PL and triggers could be very laborious and bug-prone. That is why I'm looking at doing the same thing at the database level.
No, there's nothing like that in PostgreSQL.
You could build triggers yourself to do it, probably using a composite type. But you've said (for some reason) that you don't want to use PL/pgSQL, so you've ruled that out. Getting RI triggers right is quite hard, though, and you must apply a trigger to both the referencing and the referenced ends.
Frankly, this seems like a square peg, round hole kind of problem. Are you sure PostgreSQL is the right choice for this application?
Describe your needs and goal in context. Why do you want this? What problem are you trying to solve? Maybe there's a better way to approach the same problem one step back...

Best Way to Handle SQL Parameters?

I essentially have a database layer that is totally isolated from any business logic. This means that whenever I get ready to commit some business data to a database, I have to pass all of the business properties into the data method's parameter list. For example:
Public Function Commit(foo As Object) As Boolean
This works fine, but when I get into commits and updates that take dozens of parameters, it can be a lot of typing. Not to mention that two of my methods, update and create, take the same parameters, since they essentially do the same thing. What I'm wondering is: what would be an optimal solution for passing these parameters, so that I don't have to change them in both methods every time something changes, and so I can reduce my typing :) I've thought of a few possible solutions. One would be to move all the SQL parameters to the class level of the data class and then store them in some sort of array that I set in the business layer. Any help would be useful!
So essentially you want to pass in a List of Parameters?
Why not redo your Commit function and have it accept a List of Parameter objects?
If you're on SQL Server 2008 you can use MERGE to replace the insert/update juggling. This is sometimes called an upsert.
You could create a struct to hold the parameter values.
Thanks for the responses, but I think I've figured out a better way to do what I'm doing. It's similar to using an upsert: I have one method called Commit that looks up the given primary key. If the record is found in the database, I execute an update command; if not, I do an insert command. Since the parameters are the same, you don't have to worry about keeping them in sync.
For your problem I think the Iterator design pattern is the best solution. Pass in an interface implementation, say ICommitableValues, so that you can hand over a key-value enumeration: the keys are the column names and the values are the column values to commit. A dedicated property can even return the table name to insert these values into (and/or the stored procedures to use, etc.).
To save typing, you can use declarative programming syntax (attributes) to mark the committable properties, and a class in the middleware can use reflection to extract the values of those properties and prepare an ICommitableEnumeration implementation from them.
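A sketch of that idea using struct tags and reflection (in Go rather than .NET attributes; the db tag name, the Person type, and the Params helper are all made up for illustration):

package commitexample

import "reflect"

// Person marks its committable fields with a "db" tag naming the column.
type Person struct {
	Name  string `db:"NAME"`
	Email string `db:"EMAIL"`
	notes string // untagged: ignored by Params
}

// Params walks a struct (or pointer to struct) with reflection and returns
// column-name -> value pairs, ready to be turned into bind parameters by a
// single generic Commit routine.
func Params(v interface{}) map[string]interface{} {
	out := map[string]interface{}{}
	rv := reflect.ValueOf(v)
	if rv.Kind() == reflect.Ptr {
		rv = rv.Elem()
	}
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		col, ok := rt.Field(i).Tag.Lookup("db")
		if !ok || !rv.Field(i).CanInterface() { // skip untagged or unexported fields
			continue
		}
		out[col] = rv.Field(i).Interface()
	}
	return out
}

With this in place, both Commit and Update can take the map produced by Params instead of dozens of individual parameters.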