BigQuery: add new key:val in JSON - SQL

How can I add a new key/value pair to an already existing JSON column in BigQuery using SQL (BigQuery flavor)?
For example, turning {"key1":"value1"} into something like {"key1":"value1","key2":"value2"}.

BigQuery provides Data Manipulation Language (DML) statements such as the SQL Update statement. See:
https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#update_statement
What you will have to do is retrieve the original value of your structured column and then perform a SQL UPDATE statement to set the new value of the column to be the absolute new value that you want.
Take care to realize that BigQuery is an OLAP database and is optimized for queries rather than updates or deletes. Make sure you read the information on using DML statements in BigQuery found here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-manipulation-language
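A minimal sketch of that read-then-rewrite approach (my_project.my_dataset.your_table, json_col, and id are placeholder names, not from the question):

-- splice the new pair onto the end of the existing JSON string
UPDATE `my_project.my_dataset.your_table`
SET json_col = rtrim(json_col, '}') || ',"key3":"value3"}'
WHERE id = 1;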

I feel like this question is less about how to update the table and more about how to adjust existing JSON with an extra/new key:value pair (and then either update the table or simply select the result out).
So, I assume you have a table (your_table in the query below) with an id and a json_col holding the existing JSON, and you might have another table with those new key:value pairs to use. In case you don't really have a second table, you can just use a CTE like this:
with new_key_val as (
  select 1 id, '{"key3":"value3"}' add_json union all
  select 2 id, '{"key14":"value14"}'
)
So, having the above, you can use the approach below:
select *,
  ( select '{' || string_agg(trim(kv)) || ',' || trim(add_json, '{}') || '}'
    from unnest(split(trim(json_col, '{}'), ',')) kv
  ) adjusted_json
from your_table
left join new_key_val
using(id)
The output keeps the original columns and adds adjusted_json, which holds the original pairs merged with the new ones.
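Since the screenshots of the sample tables did not survive here, below is a self-contained sketch you can paste into the BigQuery console, with made-up rows (my assumption) standing in for your_table:

with your_table as (
  select 1 id, '{"key1":"value1","key2":"value2"}' json_col union all
  select 2 id, '{"keyA":"valueA"}'
),
new_key_val as (
  select 1 id, '{"key3":"value3"}' add_json union all
  select 2 id, '{"key14":"value14"}'
)
select *,
  ( select '{' || string_agg(trim(kv)) || ',' || trim(add_json, '{}') || '}'
    from unnest(split(trim(json_col, '{}'), ',')) kv
  ) adjusted_json
from your_table
left join new_key_val
using(id)
-- id=1 yields '{"key1":"value1","key2":"value2","key3":"value3"}'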

BigQuery supports JSON as a native data type but offers only a limited set of JSON functions. Unless your JSON data has a pre-defined, simple schema with known keys, you probably want to go the string-manipulation way.
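That said, if your column is the native JSON type and your project has the newer JSON mutator functions (an availability assumption worth checking against the current docs), JSON_SET can add a key directly:

-- a sketch, assuming JSON_SET is available in your BigQuery release
select json_set(JSON '{"key1":"value1"}', '$.key3', 'value3') as adjusted_json
-- {"key1":"value1","key3":"value3"}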

Related

Using Regex to determine what kind of SQL statement a row is from a list?

I have a large list of SQL commands such as
SELECT * FROM TEST_TABLE
INSERT .....
UPDATE .....
SELECT * FROM ....
etc. My goal is to parse this list into a set of results so that I can easily determine a good count of how many of these statements are SELECT statements, how many are UPDATES, etc.
so I would be looking at a result set such as
SELECT 2
INSERT 1
UPDATE 1
...
I figured I could do this with regex, but I'm a bit lost beyond simply looking at every string and comparing against 'SELECT' as a prefix, and that can run into multiple issues. Is there any other way to do this using regex?
You can add the SQL statements to a table and run them through a SQL query. If the SQL text is in a column called SQL_TEXT, you can get the SQL command type using this:
upper(regexp_substr(trim(regexp_replace(SQL_TEXT, '\\s', ' ')),
                    '^([\\w\\-]+)')) as COMMAND_TYPE
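Putting that together with a simple aggregation (sql_log is a hypothetical table name here, and REGEXP_SUBSTR plus grouping by an alias assumes a dialect such as Snowflake):

select upper(regexp_substr(trim(regexp_replace(SQL_TEXT, '\\s', ' ')),
                           '^([\\w\\-]+)')) as COMMAND_TYPE,
       count(*) as statement_count
from sql_log
group by COMMAND_TYPE
order by statement_count desc;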
You'll need to do some clean-up to create a column that indicates the type of statement you have. The rest is just basic aggregation:
with cte as (
  select *, trim(lower(split_part(regexp_replace(col, '\\s', ' '), ' ', 1))) as statement
  from t
)
select statement, count(*) as freq
from cte
group by statement;
SQL is a language and needs a parser to turn it from text into a structure. Regular expressions can only do part of the work (such as lexing).
Regular Expression Vs. String Parsing
You will have to limit your ambition if you want to restrict yourself to using regular expressions.
Still, you can get some distance if you want. A quick search found this example of tokenizing MySQL SQL statements using regex: https://swanhart.livejournal.com/130191.html

How can I create a calculated column when creating a table in PostgreSQL, like SQL Server's LineTotal AS Price * Quantity? [duplicate]

Does PostgreSQL support computed / calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs I thought I might be missing something.
Eg: http://msdn.microsoft.com/en-us/library/ms191250.aspx
Postgres 12 or newer
STORED generated columns are introduced with Postgres 12 - as defined in the SQL standard and implemented by some RDBMS including DB2, MySQL, and Oracle. Or the similar "computed columns" of SQL Server.
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
Can also be supported with a matching expression index - provided the function is IMMUTABLE. Like:
CREATE FUNCTION col(tbl) ... AS ... -- your computed expression here
CREATE INDEX ON tbl(col(tbl));
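As a concrete sketch (tbl11 and the function body are my own illustration, mirroring the int1/int2 example above):

CREATE TABLE tbl11 (int1 int, int2 int);  -- a Postgres-11-era table, no generated column

CREATE FUNCTION product(t tbl11)
  RETURNS bigint
  LANGUAGE sql IMMUTABLE AS
'SELECT t.int1::bigint * t.int2';

SELECT tbl11.product FROM tbl11;         -- attribute notation, reads like a column
CREATE INDEX ON tbl11 (product(tbl11));  -- matching expression index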
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.
YES you can!! The solution should be easy, safe, and performant...
I'm new to PostgreSQL, but it seems you can create computed columns by using an expression index, paired with a view (the view is optional, but makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with explain.
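For example:

EXPLAIN SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
-- look for the index being used instead of a per-row MD5() computation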
However at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented can use some_string_field_md5 without worrying about how it works; they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note, however, that you can't update/insert into a view, only into the source table; but if you really want, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point as I've never tried it myself).
Edit: it seems that if the query involves competing indexes, the planner may sometimes not use the expression index at all. The choice appears to be data-dependent.
One way to do this is with a trigger!
CREATE TABLE computed(
  one SERIAL,
  two INT NOT NULL
);

CREATE OR REPLACE FUNCTION computed_two_trg()
  RETURNS trigger
  LANGUAGE plpgsql
  SECURITY DEFINER
AS $BODY$
BEGIN
  NEW.two = NEW.one * 2;
  RETURN NEW;
END
$BODY$;

CREATE TRIGGER computed_500
BEFORE INSERT OR UPDATE
ON computed
FOR EACH ROW
EXECUTE PROCEDURE computed_two_trg();
The trigger fires before the row is inserted or updated. It sets the field we want to compute on the NEW record and then returns that record.
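A quick check (the 0 inserted into two is a placeholder; the trigger overwrites it):

INSERT INTO computed (two) VALUES (0);
SELECT one, two FROM computed;  -- two = one * 2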
PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
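A quick sanity check of the stored generated column (dropping the elided columns from the example table):

INSERT INTO people (height_cm) VALUES (180);
SELECT height_cm, height_in FROM people;  -- height_in is 180 / 2.54, about 70.87
-- writing to height_in directly raises an error: it is always computed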
Well, I'm not sure if this is what you mean, but Postgres normally supports this "dummy" ETL syntax.
I created one empty column in the table and then needed to fill it with calculated records depending on values in the row.
UPDATE table01
SET column03 = column01 * column02; /* e.g. for multiplication of 2 values */
It is so dummy that I suspect it is not what you are looking for. Obviously it is not dynamic; you run it once. But there is no obstacle to putting it into a trigger.
Example of creating an empty virtual column (a fragment to append to an existing SELECT list):
, (SELECT *
   FROM (VALUES ('')) A("virtual_col"))
Example of creating two virtual columns with values:
SELECT *
FROM (VALUES (45, 'Completed')
           , (1, 'In Progress')
           , (1, 'Waiting')
           , (1, 'Loading')
     ) A("Count", "Status")
ORDER BY "Count" DESC
I have code that works and uses the term calculated. I'm not on pure PostgreSQL though; we run on PADB. Here is how it's used:
create table some_table as
select category,
       txn_type,
       indiv_id,
       accum_trip_flag,
       max(first_true_origin) as true_origin,
       max(first_true_dest) as true_destination,
       max(id) as id,
       count(id) as tkts_cnt,
       (case when calculated tkts_cnt = 1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4;
A lightweight solution with a CHECK constraint:
CREATE TABLE example (
  discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);

UPDATE a column in a Postgres table

I'm trying to update a column in a Postgres table but I don't know how to do this...
I have a table with 3 columns: nickname, color, and json, which is a string column holding an object like
{"value1":"answer1", "value2":"answer2" }
To the json column I want to add the nickname and color values, like this:
{"value1":"answer1", "value2":"answer2", "nickname":"name1", "color":"red" }
How can I do this update?
If you really want to do this from Postgres, then you may try using REGEXP_REPLACE:
UPDATE your_table  -- your_table/json_col stand in for your actual table/column names
SET json_col = REGEXP_REPLACE(json_col::text,
                              '\}$',
                              ', "nickname":"name1", "color":"red"}')::json;
-- result: {"value1":"answer1", "value2":"answer2", "nickname":"name1", "color":"red"}
This approach uses direct string manipulation to find the end of the JSON string, and then inserts the new key/value pairs.
If you were using Postgres 9.5 or later, you might be able to use the jsonb || merge operator to add the new JSON content. Postgres' support for JSON manipulation has gotten better in recent versions, but version 9.4 does not offer much.
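A sketch of that on 9.5+ (same stand-in table/column names as above; the casts are needed because || merges jsonb, not json):

UPDATE your_table
SET json_col = (json_col::jsonb
                || '{"nickname":"name1", "color":"red"}'::jsonb)::json;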
Also, if you could handle this from your Hibernate/JPA layer, it might make more sense.
In case you want to use the existing columns of the table, you could use json_build_object.
UPDATE t
SET json_col = ( replace(json_col::text, '}', ',')
                 || replace(json_build_object('nickname', nickname,
                                              'color', color)::text, '{', '') )::json;
The replace() calls help shape the JSON string. Note that I have assumed your string is a simple JSON object (without nested arrays/objects).
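As a self-contained demo (the row values come from the question; t and json_col are the names used above):

WITH t(nickname, color, json_col) AS (
  VALUES ('name1', 'red', '{"value1":"answer1", "value2":"answer2"}'::json)
)
SELECT ( replace(json_col::text, '}', ',')
         || replace(json_build_object('nickname', nickname,
                                      'color', color)::text, '{', '') )::json AS adjusted
FROM t;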

BigQuery query creation without variables?

Coming from SQL Server and a little bit of MySQL, I'm not sure how to proceed on google's BigQuery web browser query tool.
There doesn't appear to be any way to create, use or Set/Declare variables. How are folks working around this? Or perhaps I have missed something obvious in the instructions or the nature of BigQuery? Java API?
It is now possible to declare and set variables using SQL. For more information, see the documentation, but here is an example:
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data`.usa_names.usa_1910_current
WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data`.samples.shakespeare
);
There was no way to set/declare variables in BigQuery when this answer was written (the scripting support shown above arrived later). If you need variables, you'll need to cut and paste them where you need them. Feel free to file this as a feature request here.
It's not elegant, and it's a pain, but...
The way we handle it is using a Python script that replaces a "variable placeholder" in our query and then sends the amended query via the API.
I have opened a feature request asking for "Dynamic SQL" capabilities.
If you want to avoid BQ scripting, you can sometimes use an idiom which utilizes WITH and CROSS JOIN.
In the example below:
the events table contains some timestamped events
the reports table contains occasional aggregate values of the events
the goal is to write a query that only generates incremental (non-duplicate) aggregate rows
This is achieved by:
introducing a state temp table that looks at the target table's existing aggregate results to determine parameters (params) for the actual query;
CROSS JOINing the params with the actual query, so the param row's columns can constrain it.
The query will repeatably return the same results until those results themselves are appended to the reports table.
WITH state AS (
  SELECT
    -- what was the newest report's ending time?
    COALESCE(
      (SELECT MAX(report_end_ts) FROM `x.y.reports`),
      TIMESTAMP("2019-01-01")
    ) AS latest_report_ts,
    ...
),
params AS (
  SELECT
    -- look for events since end of last report
    latest_report_ts AS event_after_ts,
    -- and go until now
    CURRENT_TIMESTAMP() AS event_before_ts
  FROM state
)
SELECT
  MIN(event_ts) AS report_begin_ts,
  MAX(event_ts) AS report_end_ts,
  COUNT(1) AS event_count,
  SUM(errors) AS error_total
FROM `x.y.events`
CROSS JOIN params
WHERE event_ts > event_after_ts
  AND event_ts < event_before_ts
Because appending the results to `x.y.reports` advances latest_report_ts, each run picks up where the previous one left off, which makes this approach useful for BigQuery scheduled queries.

Can SQL determine which values from a set of possible column values do not exist?

I have a unique column. I also have a known set of elements that are possible values for the column. I need to know which of the possible values are not already in the table, and as such, are suitable for insertion.
Is this possible with SQL or is post processing required?
Currently, I am using the "in" operator to select all rows where the column value equals an element in my set. Then I remove all matched elements from my set via post processing.
Stick the allowed values in a temporary table allowed, then use a subquery with NOT IN:
SELECT *
FROM allowed
WHERE allowed.val NOT IN (
  SELECT maintable.val FROM maintable
)
Some DBs will allow you to build up a table "in-place", instead of having to create a separate table. E.g. in PostgreSQL (any version):
SELECT *
FROM (
  SELECT 'foo' AS val
  UNION ALL SELECT 'bar'
  UNION ALL SELECT 'baz' -- etc.
) inplace_allowed
WHERE inplace_allowed.val NOT IN (
  SELECT maintable.val FROM maintable
)
More modern versions of PostgreSQL (and perhaps other DBs) will let you use the slightly nicer VALUES syntax to do the same thing.
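A sketch of the VALUES form (maintable and val are the stand-in names from above):

SELECT v.val
FROM (VALUES ('foo'), ('bar'), ('baz')) AS v(val)
WHERE v.val NOT IN (SELECT maintable.val FROM maintable);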
To do this entirely in SQL you will need to create a separate table with one column. Each row holds one value from the known set of elements. Assuming the table is called ElementList and the other table is called Existing:
SELECT * FROM ElementList
WHERE Element NOT IN (SELECT DISTINCT Element FROM Existing)
Depending on what database engine you're using you may be able to use a temporary table to create and hold the list without saving it permanently in the database. However, storing the list of allowed elements is valuable for constraining the Element column in the Existing table (and for presenting the user with allowed Elements in the user interface).