Creation of a temporary table in postgres - sql

I'm trying to create a temporary table in Postgres (to speed up joining, as there will be a lot of similar queries throughout a session). The SQL that will be called at the beginning of a session is the following:
CREATE TEMPORARY TABLE extended_point AS
SELECT (
point.id,
local_location,
relative_location,
long_lat,
region,
dataset,
region.name,
region.sub_name,
color,
type)
FROM point, region, dataset
WHERE point.region = region.id AND region.dataset = dataset.id;
The table point has the columns id::int, region::int, local_location::point, relative_location::point, long_lat::point (longitude, latitude).
Region has the columns id::int, color::int, dataset::int, name::varchar, sub_name::varchar.
Dataset has the columns id::int, name::varchar, type::varchar.
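In DDL form, the three tables look roughly like this (the key and reference constraints are an assumption, inferred from the join conditions above rather than stated):
CREATE TABLE dataset (
    id   int PRIMARY KEY,
    name varchar,
    type varchar
);
CREATE TABLE region (
    id       int PRIMARY KEY,
    color    int,
    dataset  int REFERENCES dataset(id),
    name     varchar,
    sub_name varchar
);
CREATE TABLE point (
    id                int PRIMARY KEY,
    region            int REFERENCES region(id),
    local_location    point,
    relative_location point,
    long_lat          point  -- (longitude, latitude)
);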
When this is run, I get the error message: [25P02] ERROR: current transaction is aborted, commands ignored until end of transaction block.
As an aside, the commands are executed in PyCharm as part of a Python project.
Any suggestions?
Thanks in advance :)

There is an important difference between these two queries:
select 1, 'abc';
select (1, 'abc');
The first query returns one row with two columns with values 1 and 'abc'. The second one returns a row with one column of pseudo-type record with value (1, 'abc').
Your query tries to create a table with one column of pseudo-type record. This is impossible and should end with
ERROR: column "row" has pseudo-type record
SQL state: 42P16
Just remove the parentheses from your query.
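With the parentheses removed, the statement from the question becomes:
CREATE TEMPORARY TABLE extended_point AS
SELECT point.id,
       local_location,
       relative_location,
       long_lat,
       region,
       dataset,
       region.name,
       region.sub_name,
       color,
       type
FROM point, region, dataset
WHERE point.region = region.id AND region.dataset = dataset.id;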
As a_horse stated, the [25P02] error does not apply to the query in question: it means an earlier statement in the same transaction already failed, so Postgres ignores further commands until the transaction is rolled back.
Btw, my advice: never use keywords as table/column names.

Related

BigQuery insert values AS, assume nulls for missing columns

Imagine there is a table with 1000 columns.
I want to add a row with values for 20 columns and assume NULLs for the rest.
INSERT VALUES syntax can be used for that:
INSERT INTO `tbl` (
date,
p,
... # 18 more names
)
VALUES(
DATE('2020-02-01'),
'p3',
... # 18 more values
)
The problem with it is that it is hard to tell which value corresponds to which column. And if you need to change/comment out some value then you have to make edits in two places.
INSERT SELECT syntax also can be used:
INSERT INTO `tbl`
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
... # 980 more NULL AS column
Then if I need to comment out some column just one line has to be commented out.
But obviously having to set 980 NULLs is an inconvenience.
What is the way to combine both approaches? To achieve something like:
INSERT INTO `tbl`
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
The query above doesn't work; the error is Inserted row has wrong column count; Has 20, expected 1000.
Your first version is really the only one you should ever be using for SQL inserts. It ensures that every target column is explicitly mentioned, and is unambiguous with regard to where the literals in the VALUES clause should go. You can use the version which does not explicitly mention column names. At first, it might seem that you are saving yourself some code. But realize that there is a column list which will be used, and it is the list of all the table's columns, in whatever their positions from definition are. Your code might work, but appreciate that any addition/removal of a column, or changing of column order, can totally break your insert script. For this reason, most will strongly advocate for the first version.
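As a rough illustration of why the positional form is fragile (a simplified, hypothetical case where `tbl` has only the two columns date and p, in that order):
INSERT INTO `tbl` VALUES (DATE('2020-02-01'), 'p3')            # relies purely on column position
# If a column is later added or the column order changes, this statement
# silently targets the wrong columns or fails with a column-count error.
INSERT INTO `tbl` (date, p) VALUES (DATE('2020-02-01'), 'p3')  # explicit list keeps working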
You can try the following solution; it is a combination of the two approaches you highlighted in your question:
INSERT INTO `tbl` (date, p, ... # 18 more column names)
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
A couple of things you should consider here:
The other 980 columns must be nullable, i.e. they must be able to hold NULL values.
The columns in the INSERT list and in the SELECT must be in the same order so that the data is inserted into the correct columns.
To avoid any confusion, use aliases in the SELECT query that match the column names in the INSERT list; that removes any ambiguity.
Hopefully it will work for you.
In BigQuery, the best way to do what you're describing is to first load to a staging table. I'll assume you can get the values you want to insert into JSON format with keys that correspond to the target column names.
values.json
{"date": "2020-01-01", "p": "p3", "column": "value", ... }
Then generate a schema file for the target table and save it locally
bq show --schema project:dataset.tbl > schema.json
Load the new data to the staging table using the target schema. This gives you "named" null values for each column present in the target schema but missing from your json, bypassing the need to write them out.
bq load --replace --source_format=NEWLINE_DELIMITED_JSON \
project:dataset.stg_tbl values.json schema.json
Now the insert select statement works every time
insert into `project:dataset.tbl`
select * from `project:dataset.stg_tbl`
Not a pure SQL solution but I managed this by loading my staging table with data then running something like:
from google.cloud import bigquery

client = bigquery.Client()

# Map column name -> schema field for both tables
table1 = client.get_table(f"{project_id}.{dataset_name}.table1")
table1_col_map = {field.name: field for field in table1.schema}
table2 = client.get_table(f"{project_id}.{dataset_name}.table2")
table2_col_map = {field.name: field for field in table2.schema}

# Merge; table1's definition wins for columns present in both
combined_schema = {**table2_col_map, **table1_col_map}

# Assign the merged schema back to table1 and push only the schema change
table1.schema = list(combined_schema.values())
client.update_table(table1, ["schema"])
Explanation:
This retrieves both schemas and converts each into a dictionary keyed by column name, with the actual field info from the SDK as the value. The two are then combined with dictionary unpacking (the order of unpacking determines which table's fields take precedence when a column exists in both). Finally, the combined schema is assigned back to table1 and used to update the table, adding the missing columns; existing rows get NULL in those new columns.

Get Id from a conditional INSERT

For a table like this one:
CREATE TABLE Users(
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
What would be the correct one-query insert for the following operation:
Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.
I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.
It seems that RETURNING id and ON CONFLICT do not belong together.
The UPSERT implementation is hugely complex in order to be safe against concurrent write access. Take a look at this Postgres Wiki that served as a log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for a later release.
This is the crucial statement in the manual to explain your situation:
The syntax of the RETURNING list is identical to that of the output
list of SELECT. Only rows that were successfully inserted or updated
will be returned. For example, if a row was locked but not updated
because an ON CONFLICT DO UPDATE ... WHERE clause condition was not
satisfied, the row will not be returned.
Bold emphasis mine.
For a single row to insert:
Without concurrent write load on the same table
WITH ins AS (
INSERT INTO users(name)
VALUES ('new_usr_name') -- input value
ON CONFLICT(name) DO NOTHING
RETURNING users.id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM users -- 2nd SELECT never executed if INSERT successful
WHERE name = 'new_usr_name' -- input value a 2nd time
LIMIT 1;
With possible concurrent write load on the table
Consider this instead (for a single-row INSERT); a rough sketch of the pattern appears after the links below:
Is SELECT or INSERT in a function prone to race conditions?
To insert a set of rows:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
All three with very detailed explanation.
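For the single-row case under concurrent writes, a common pattern (a minimal sketch assuming the users table above; see the linked answers for the full reasoning and caveats) is a plpgsql function that loops over SELECT and INSERT ... ON CONFLICT DO NOTHING until one of them returns an id:
CREATE OR REPLACE FUNCTION f_user_id(_name text, OUT _id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
   SELECT id INTO _id
   FROM   users
   WHERE  name = _name;

   EXIT WHEN FOUND;                     -- name already there: return its id

   INSERT INTO users (name)
   VALUES (_name)
   ON     CONFLICT (name) DO NOTHING    -- a concurrent insert may have won the race
   RETURNING id INTO _id;

   EXIT WHEN FOUND;                     -- our insert succeeded: return the new id
END LOOP;
END
$func$;

-- usage: SELECT f_user_id('new_usr_name');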
For a single row insert and no update:
with i as (
insert into users (name)
select 'the name'
where not exists (
select 1
from users
where name = 'the name'
)
returning id
)
select id
from users
where name = 'the name'
union all
select id from i
The manual, about the primary query and the WITH sub-queries:
The primary query and the WITH queries are all (notionally) executed at the same time
Although that sounds to me like "same snapshot", I'm not sure, since I don't know what notionally means in that context.
But there is also:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot
If I understand correctly, that same snapshot bit prevents a race condition. But again I'm not sure whether all the statements refers only to the statements in the WITH sub-queries, excluding the main query. To avoid any doubt, move the SELECT in the previous query into a WITH sub-query:
with s as (
select id
from users
where name = 'the name'
), i as (
insert into users (name)
select 'the name'
where not exists (select 1 from s)
returning id
)
select id from s
union all
select id from i

Postgres serial values insertion

According to the Postgres documentation on INSERT, the DEFAULT keyword should auto-increment columns declared as serial. But when I combine it with a SELECT statement, it throws an error:
syntax error at or near "DEFAULT"
Here is the insert statement
insert into abc (id,date,serialnumber) (DEFAULT,select (data.date,data.serialnumber) from data)
DEFAULT can only be used as a "literal" in the VALUES clause of an INSERT statement. It cannot be used inside the column list of a SELECT statement, even if that SELECT is used for an INSERT.
To apply the default value, simply leave out the column:
insert into abc (date,serialnumber)
select date, serialnumber
from data
For an example see here: http://sqlfiddle.com/#!12/d291a/1
Also: do not put a column list into parentheses. (a,b) is something different from a,b in Postgres: the first is a single record with two attributes, the second is two different columns.
See this SQLFiddle demo here: http://sqlfiddle.com/#!12/3a890/1 and note the difference between the two results.
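If you can't open the fiddle, the difference boils down to this (using the data table from the question):
select date, serialnumber from data;    -- two columns: date and serialnumber
select (date, serialnumber) from data;  -- one column: a single record holding both values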

SQL Column Swap Behavior

I'm swapping column values in a table using the following statement:
UPDATE SwapTable
SET ValueA=ValueB
,ValueB=ValueA
This works and the values do get swapped, as can be verified by this SQL Fiddle.
However, if we did such a thing in (almost any) other language, we would end up with both ValueA and ValueB having identical values.
So my question is why/how this works in SQL.
You can just look at the execution plan. Conceptually, the engine does this:
Select all the rows from the table and materialise them as a row set.
Open a transaction.
Update the referenced table (SwapTable), row by row, assigning each target column from the old values read into that row set.
Commit -- done updating.
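The key point is that every expression on the right-hand side of SET is evaluated against the row's old values, so both assignments read the pre-update data. A minimal sketch to reproduce it (the integer column types are my assumption):
CREATE TABLE SwapTable (ValueA int, ValueB int);
INSERT INTO SwapTable VALUES (1, 2), (3, 4);

UPDATE SwapTable
SET ValueA = ValueB
   ,ValueB = ValueA;

SELECT * FROM SwapTable;
-- returns (2, 1) and (4, 3): swapped, not duplicated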

SQL: I need to take two fields I get as a result of a SELECT COUNT statement and populate a temp table with them

So I have a table which has a bunch of information and a bunch of records. But there will be one field in particular I care about, in this case #BegAttField# where only a subset of records have it populated. Many of them have the same value as one another as well.
What I need to do is get a count (minus 1) of all duplicates, then populate the first record in the bunch with that count value in a new field. I have another field I call BegProd that will match #BegAttField# for each "first" record.
I'm just stuck as to how to make this happen. I may have been on the right path, but who knows. The SELECT statement gets me two fields and as many records as there are unique #BegAttField#'s. But once I have them, I haven't been able to work with them.
Here's my whole set of code, trying to use a temporary table and SELECT INTO to try and populate it. (Note: the fields with # around the names are variables for this 3rd party app)
CREATE TABLE #temp (AttCount int, BegProd varchar(255))
SELECT COUNT(d.[#BegAttField#])-1 AS AttCount, d.[#BegAttField#] AS BegProd
INTO [#temp] FROM [Document] d
WHERE d.[#BegAttField#] IS NOT NULL GROUP BY [#BegAttField#]
UPDATE [Document] d SET d.[#NumAttach#] =
SELECT t.[AttCount] FROM [#temp] t INNER JOIN [Document] d1
WHERE t.[BegProd] = d1.[#BegAttField#]
DROP TABLE #temp
Unfortunately I'm running this script through a 3rd party database application that uses SQL as its back-end. So the errors I get are simply: "There is already an object named '#temp' in the database. Incorrect syntax near the keyword 'WHERE'. "
Comment out the CREATE TABLE statement. The SELECT INTO creates that #temp table.
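Applying that, the script would look roughly as below. Note the UPDATE is also rewritten using SQL Server's UPDATE ... FROM ... JOIN form; the original UPDATE has no ON clause, which is my guess at the source of the "Incorrect syntax near the keyword 'WHERE'" error:
SELECT COUNT(d.[#BegAttField#]) - 1 AS AttCount,
       d.[#BegAttField#] AS BegProd
INTO   [#temp]
FROM   [Document] d
WHERE  d.[#BegAttField#] IS NOT NULL
GROUP  BY d.[#BegAttField#]

UPDATE d
SET    d.[#NumAttach#] = t.[AttCount]
FROM   [Document] d
       INNER JOIN [#temp] t ON t.[BegProd] = d.[#BegAttField#]

DROP TABLE [#temp]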