How to stream repeated fields into bigquery? - google-bigquery

I'm using tabledata().insertAll(). Here is some test data I'm trying to insert:
row = {
    'insertId': str(i * o),
    'json': {
        'meterId': i * o,
        'erfno': str(i),
        'latitude': '123123',
        'longitude': '123123',
        'address': str(random.randint(1, 100)) + 'foobar street',
        'readings': [
            {
                'read_at': time.time(),
                'usage': random.randrange(50, 500),
                'account': 'acc' + str(i * o)
            }
        ]
    }
}
It gives me the error:
array specified for non-repeated field
I wish to stream one record at a time into the 'readings' repeated field every minute (and thus append to the repeated field).

You cannot update an existing row. You cannot add to an existing row. You need to rethink this. Don't forget that BigQuery is append-only.
You can have repeated fields in rows, but they must be declared as repeated in your schema (see the sketch below).
In your situation, you need to create a new row with every reading. A reading can be a record if you want to structure your data like that.
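For reference, the repeated record has to be flagged in the table schema itself. A sketch of what the readings field might look like (field names taken from the question; the types are assumptions):
{
    "name": "readings",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
        {"name": "read_at", "type": "TIMESTAMP"},
        {"name": "usage", "type": "INTEGER"},
        {"name": "account", "type": "STRING"}
    ]
}
The "array specified for non-repeated field" error is what you get when readings exists in the schema without "mode": "REPEATED".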

Correct! You should consider flattening your table, inserting a new row for every new reading, as sketched below.
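A minimal sketch of the flattened approach, using the same tabledata().insertAll() call; service is assumed to be an authorized BigQuery v2 client from googleapiclient, and the project/dataset/table IDs and per-reading values are placeholders:
import time

# Placeholder values for one incoming reading (assumed).
meter_id, erfno = 42, '7'
usage, account = 123, 'acc42'

# One flat row per reading; the meter columns simply repeat on every row.
row = {
    'insertId': 'meter%s-%s' % (meter_id, int(time.time())),
    'json': {
        'meterId': meter_id,
        'erfno': erfno,
        'read_at': time.time(),
        'usage': usage,
        'account': account,
    },
}

service.tabledata().insertAll(
    projectId='my-project',
    datasetId='my_dataset',
    tableId='readings',
    body={'rows': [row]},
).execute()
Streaming one such row every minute appends naturally, because each reading becomes its own row.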

Related

How to update a large (1 million+ rows) postgres column of jsonb type values

Trying to update a specific array inside a jsonb value in a column called params, and having issues with how long it's taking. For example, there is a table with a row that contains an array owners:
{
    "hidden": false,
    "org_id": "34692",
    "owners": [
        "tim@facebuk.com"
    ],
    "deleted": false
}
And another example
{
    "hidden": false,
    "org_id": "34692",
    "owners": [
        "tim@google.com"
    ],
    "deleted": false
}
And there are essentially a million of these rows (all with different email domains as owners).
I have this query which I want to execute across all of these rows:
UPDATE table
SET params = CASE
    WHEN params->>'owners' NOT LIKE '%google.com%'
        THEN jsonb_set(params, '{owners}', concat('"', substr(md5(random()::text), 0, 25), '@googlefake.com"')::jsonb)
    ELSE params
END
I've tested with a dataset of 100, and it executes perfectly fine, but at a 1000x multiple the query runs forever, and I've no clue if it will actually complete. I'm not entirely sure how to speed up this process or approach it in a better fashion. I did try indexing, e.g. CREATE INDEX ON table((params->>'owners')); to no avail. The query has run for over an hour, and there are many more rows like this.
Am I indexing incorrectly? Also, I've looked into a GIN index and the @> operator, but that won't help since each owners field differs.
Update:
UPDATE table AS "updatetarget"
SET params = jsonb_set(params, '{owners}', concat('"', substr(md5(random()::text), 0, 25), '@googlefake.com"')::jsonb)
The query was updated to this and is still taking a while. Is there some way to index the key so I can make the second query faster?
Avoid unnecessary updates with a WHERE clause that filters out the rows that don't need to be modified.
UPDATE table
SET params = jsonb_set(
        params,
        '{owners}',
        concat(
            '"',
            substr(md5(random()::text), 0, 25),
            '@googlefake.com"'
        )::jsonb
    )
WHERE params->>'owners' NOT LIKE '%google.com%';
If a lot of rows are affected, you may want to run VACUUM (FULL) once the update is done.
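If you are driving this from application code, here is a minimal psycopg2 sketch of the same statement; the DSN and the table name orgs are placeholders, and VACUUM has to run outside a transaction block, hence the autocommit switch:
import psycopg2

conn = psycopg2.connect('dbname=mydb')  # placeholder DSN

with conn.cursor() as cur:
    # Only rewrite the rows that actually need it.
    cur.execute("""
        UPDATE orgs
        SET params = jsonb_set(
            params,
            '{owners}',
            concat('"', substr(md5(random()::text), 0, 25),
                   '@googlefake.com"')::jsonb)
        WHERE params->>'owners' NOT LIKE '%google.com%'
    """)
conn.commit()

# VACUUM cannot run inside a transaction block, hence autocommit.
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute('VACUUM (FULL) orgs')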

Get the difference between data in Swift and data in a database

A table in the database has two columns: ID and Value.
In my project there is other data in a dictionary whose keys are IDs and values are Values.
I want to get the difference: data that is in the dictionary and not in the database. If both sets of data were in the database, I could use the SQL commands "EXCEPT" or "NOT EXISTS" to get the difference.
What is the best way to do this?
I use SQLiteDB, where the result of a query is an array of dictionaries like this:
[["ID":"id1", "Value": "val1"], ["ID":"id2", "Value": "val2"],...]
Also notice that both columns should be considered when comparing these two data sets (the dictionary and the data in the db).
// here we get the intersection of the new data and the data we have
let intersection = dataSet1.filter { data in dataSet2.contains(where: { $0.id == data.id }) }
// delete elements from dataSet1 which belong to the intersection
let dataSet1MinusDataSet2 = dataSet1.filter { data in !intersection.contains(where: { data.id == $0.id }) }
I've written this code without any Xcode, so syntax errors are possible, but I think you will get the idea.
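Since both columns must take part in the comparison, a Set of hashable pairs avoids the nested contains scans entirely. A minimal sketch, assuming dbRows is the [[String: String]] array returned by SQLiteDB and dict is the in-memory [String: String] dictionary:
// A hashable pair so both columns take part in the comparison.
struct Entry: Hashable {
    let id: String
    let value: String
}

let dbSet = Set(dbRows.compactMap { row -> Entry? in
    guard let id = row["ID"], let value = row["Value"] else { return nil }
    return Entry(id: id, value: value)
})
let dictSet = Set(dict.map { Entry(id: $0.key, value: $0.value) })

// Entries present in the dictionary but not in the database.
let difference = dictSet.subtracting(dbSet)
subtracting gives exactly the EXCEPT semantics from the question and runs in roughly linear time instead of quadratic.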

sql.eachRow only adds the last record into a list

Good day. I'm trying to add all the users from my db to a list and print it out in a frame, but the problem is that I am only retrieving the LAST record of the users table. The others are being ignored. Here's my code:
table(selectionMode: ListSelectionModel.SINGLE_SELECTION) {
    sql.eachRow("select * from users") { row ->
        println row
        def staffList = []
        staffList.add(uname: row.uname, pwd: row.pwd)
        tableModel(list: staffList) {
            closureColumn(header: 'Username', read: { row1 -> return row1.uname })
            closureColumn(header: 'Password', read: { row1 -> return row1.pwd })
        }
    }
}
I think the problem is that you have defined the staffList list within the loop, so it is recreated (and the tableModel rebuilt) for every row. Move it to before the loop and you may have better results, as in the sketch below.
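A sketch of that rearrangement, assuming the same SwingBuilder context as in the question:
// Build the complete list first, then hand it to the table model once.
def staffList = []
sql.eachRow("select * from users") { row ->
    staffList << [uname: row.uname, pwd: row.pwd]
}

table(selectionMode: ListSelectionModel.SINGLE_SELECTION) {
    tableModel(list: staffList) {
        closureColumn(header: 'Username', read: { it.uname })
        closureColumn(header: 'Password', read: { it.pwd })
    }
}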

CHtml::listData findall returns last element

I have a dropdown list. Within the dropDownList, I use listData to retrieve the data from another table. But strangely, it only gets the last item in the table.
$form->dropDownList($model,'status_id',CHtml::listData(OrderStatus::model()->findAll(),'status_id', 'status'))
CHtml::listData strangely only shows this: array(1) { [""]=> string(9) "Delivered" }, while in the table there are 7 rows/ids, where Delivered is the last entry. What happened to the others?
Another odd thing is that $model->status_id is actually id 1, so it shouldn't display 'Delivered'; it should be showing 'New'.
Take a look at this:
Example 1: Generating list data for categories
// you can use here any find method you think proper to return your data from the db
$models = categories::model()->findAll();
// format the resulting models using listData
$list = CHtml::listData($models, 'category_id', 'category_name');
print_r($list);
Output (example):
array("1" => "Arts", "2" => "Science", "3" => "Culture");
See if you have by any chance a default scope.
Just do a debug of OrderStatus::model()->findAll() and see if it returns 7 records or just 1.
Your CHtml::listData is strangely showing
array(1) { [""]=> string(9) "Delivered" }
because all the rows in OrderStatus must have the same blank/null entry in the status_id column, as the array above suggests.
In the call
CHtml::listData(OrderStatus::model()->findAll(), 'status_id', 'status')
status_id is the key (the index of the array) for your generated list, and it gets overwritten by the same value every time; that's why it shows only one value, the last one, as illustrated below.
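Roughly what listData does internally, which shows why duplicate keys collapse (a simplified sketch, not the actual Yii source):
$list = array();
foreach (OrderStatus::model()->findAll() as $model) {
    // If status_id is blank/null on every row, the same '' key is
    // reassigned on each iteration and only the last status survives.
    $list[$model->status_id] = $model->status;
}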

Best way to compare two hash of hashes?

Right now I have two hashes of hashes: one that I created by parsing a log file, and one that I grab from SQL. I need to compare them to find out if the record from the log file already exists in the database. Right now I am iterating through each element to compare them:
foreach my $i (@record)
{
    foreach my $a (@{$data})
    {
        if ($i->{port} eq $a->{port} and $i->{name} eq $a->{name})
        {
            print "match found $i->{name}, updating record in table\n";
        }
        else
        {
            print "no match found for $tableDate $i->{port} $i->{owner} $i->{name} adding record to table\n";
            executeStatement("INSERT INTO client_usage (date, port, owner, name, emailed) VALUES (\'$tableDate\', \'$i->{port}\', \'$i->{owner}\', \'$i->{name}\', '0')");
        }
    }
}
Naturally, this takes a long time to run through as the database gets bigger. Is there a more efficient way of doing this? Can I compare the keys directly?
You have more than a hash of hashes. You have two lists, and each element in each list contains a hash. Thus, you have to compare each item in one list with each item in the other list. Your algorithm's efficiency is O(n×m), not because it's a hash of hashes, but because you're comparing each row in one list with each row in the other list.
Is it possible to go through your lists and turn them into a hash that is keyed by the port and name? That way, you go through each list once to create the indexing hash, then go through the hash once to do the comparison.
For example, to create the hash from the record:
my %record_hash;
foreach my $record_item (@record) {
    my $port = $record_item->{port};
    my $name = $record_item->{name};
    $record_hash{"$port:$name"} = $record_item;   # or something like this...
}
Next, you'd do the same for your data list:
my %data_hash;
foreach my $data_item (@{$data}) {
    my $port = $data_item->{port};
    my $name = $data_item->{name};
    $data_hash{"$port:$name"} = $data_item;   # or something like this...
}
Now you can go through your newly created hash just once:
foreach my $key (keys %record_hash) {
    my $i = $record_hash{$key};
    if (exists $data_hash{$key}) {
        print "match found $i->{name}, updating record in table\n";
    }
    else {
        print "no match found for $tableDate $i->{port} $i->{owner} $i->{name} adding record to table\n";
        executeStatement("INSERT INTO client_usage (date, port, owner, name, emailed) VALUES (\'$tableDate\', \'$i->{port}\', \'$i->{owner}\', \'$i->{name}\', '0')");
    }
}
Let's say you have 1000 elements in one list and 500 elements in the other. Your original algorithm would have to loop 500 * 1000 times (half a million times). By creating the index hashes, you loop through 2 * (500 + 1000) times (about 3000 times).
Another possibility: since you're already using a SQL database, why not do the whole thing as a SQL query? That is, don't fetch all the records. Instead, go through your data, and for each data item, fetch the matching record. If the record exists, you update it. If not, you create a new one. That may be even faster, because you're not turning the whole table into a list in order to turn it into a hash. A sketch follows below.
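A minimal DBI sketch of that per-item approach, assuming $dbh is an open DBI handle and reusing the client_usage columns from the question:
my $check = $dbh->prepare(
    'SELECT 1 FROM client_usage WHERE port = ? AND name = ?');
my $insert = $dbh->prepare(
    'INSERT INTO client_usage (date, port, owner, name, emailed)
     VALUES (?, ?, ?, ?, 0)');

foreach my $i (@record) {
    $check->execute($i->{port}, $i->{name});
    if ($check->fetchrow_array) {
        print "match found $i->{name}, updating record in table\n";
    }
    else {
        $insert->execute($tableDate, $i->{port}, $i->{owner}, $i->{name});
    }
}
As a side benefit, the placeholders stop interpolating values directly into the SQL string, which the original INSERT was doing.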
There's a way to tie SQL databases directly to hashes. That might be a good way to go too.
Are you using Perl-DBI?
How about using Data::Difference:
use Data::Difference qw(data_diff);
my @diff = data_diff(\%hash_a, \%hash_b);
# @diff = (
#     { 'a' => 'value', 'path' => [ 'data' ] },   # exists in 'a' but not in 'b'
#     { 'b' => 'value', 'path' => [ 'data' ] },   # exists in 'b' but not in 'a'
# );