PostgreSQL INSERTs NULL values from SELECT COS(another field) query

If I run
SELECT
(cos(radians(spa.spa_zenithangle)))
FROM generic.spa;
I get a sensible range of results from -1 to 1. But if I run this INSERT statement, all the resulting values in the spa.spa_cos_zenith field are NULL:
INSERT INTO generic.spa
(spa_cos_zenith)
SELECT
(cos(radians(spa.spa_zenithangle)))
FROM generic.spa;
The table definition is:
CREATE TABLE generic.spa (
spaid INTEGER DEFAULT nextval('generic.spa_id_seq'::regclass) NOT NULL,
measurementdatetime TIMESTAMP WITHOUT TIME ZONE,
spa_zenithangle NUMERIC(7,3),
spa_cos_zenith DOUBLE PRECISION,
CONSTRAINT spa_pk PRIMARY KEY(spaid)
)
WITH (oids = false);
Does anyone know why the COS function returns results fine, but they can't be inserted into another field?

I suspect you want update, not insert:
UPDATE generic.spa
SET spa_cos_zenith = cos(radians(spa.spa_zenithangle));
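INSERT adds new rows, so you are doubling the number of rows rather than changing the existing ones. In each new row only spa_cos_zenith (plus the defaulted spaid) is populated; every other column is NULL. Nothing changes in the old rows, so their spa_cos_zenith stays NULL.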
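To see the difference in isolation, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption, made only so the example runs anywhere), and the cos_deg helper is hypothetical: it replaces cos(radians(...)) because SQLite builds do not always include math functions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helper standing in for cos(radians(x)) from the question.
conn.create_function(
    "cos_deg", 1, lambda d: None if d is None else math.cos(math.radians(d))
)
conn.execute(
    "CREATE TABLE spa (spaid INTEGER PRIMARY KEY, "
    "spa_zenithangle REAL, spa_cos_zenith REAL)"
)
conn.executemany(
    "INSERT INTO spa (spa_zenithangle) VALUES (?)", [(0.0,), (60.0,), (90.0,)]
)

# INSERT ... SELECT appends three new rows; the original rows are untouched.
conn.execute(
    "INSERT INTO spa (spa_cos_zenith) SELECT cos_deg(spa_zenithangle) FROM spa"
)
rows_after_insert = conn.execute("SELECT COUNT(*) FROM spa").fetchone()[0]
null_originals = conn.execute(
    "SELECT COUNT(*) FROM spa "
    "WHERE spa_zenithangle IS NOT NULL AND spa_cos_zenith IS NULL"
).fetchone()[0]

# Remove the appended rows, then UPDATE the existing rows in place.
conn.execute("DELETE FROM spa WHERE spa_zenithangle IS NULL")
conn.execute("UPDATE spa SET spa_cos_zenith = cos_deg(spa_zenithangle)")
updated = conn.execute(
    "SELECT spa_zenithangle, spa_cos_zenith FROM spa ORDER BY spaid"
).fetchall()
```

After the INSERT the table holds twice as many rows and the originals still show NULL; after the UPDATE each original row carries its own cosine.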

Related

SQL - Multiple fields are updated instead of one

I have four columns: ID, STARTTIME, ENDINGTIME and DURATION.
The table is created with:
CREATE TABLE tableName (
ID INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
STARTTIME TIMESTAMP,
ENDINGTIME TIMESTAMP,
DURATION TIME);
The ID is an auto_increment column. Then I have the code for inserting a new STARTTIME:
INSERT INTO tableName(STARTTIME) VALUES(CURRENT_TIMESTAMP);
Second, I have the code for updating the row with the biggest ID to set the ENDINGTIME:
SET @latestInsertID = (SELECT MAX(ID) FROM tableName);
UPDATE tableName SET ENDINGTIME=(CURRENT_TIMESTAMP) WHERE ID=@latestInsertID;
All three statements execute without an exception, and the first one works as expected. But the UPDATE changes STARTTIME as well as ENDINGTIME in the row I wanted to update. Why doesn't it update only ENDINGTIME?
Thank you for every solution!
Use DATETIME instead of TIMESTAMP (MWE)
Here's why:
The timestamp field is generally used to define at which moment in time a row was added or updated and by default will automatically be assigned the current datetime when a record is inserted or updated. The automatic properties only apply to the first TIMESTAMP in the record; subsequent TIMESTAMP columns will not be changed.
An educated guess: the column is defined as:
CREATE TABLE tablename(
-- ...
STARTTIME TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
Or there is an underlying trigger that performs the same logic.

UPDATE two columns with new value under large size table

We have a table like this:
mytable (pid, string_value, int_value)
The table has more than 20M rows in total. We now have a feature that needs to mark all rows in this table as invalid, which means setting string_value = NULL and int_value = 0 on every row (we still want to keep the pid, as it is important to us).
So what is the best way?
I use the following SQL:
UPDATE Mytable
SET string_value = NULL,
int_value = 0;
but this query takes more than 4 minutes in my test env. Is there any better way we can improve it?
Updating all the rows can be quite expensive. Often, it is faster to empty the table and reload it.
In generic SQL this looks like:
create table mytable_temp as
select pid
from mytable;
truncate table mytable; -- back it up first!
insert into mytable (pid, string_value, int_value)
select pid, null, 0
from mytable_temp;
The creation of the temporary table may use different syntax, depending on your database.
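The empty-and-reload idea can be sketched end to end in Python with an in-memory SQLite database. This is only an illustration under assumptions: SQLite has no TRUNCATE, so an unqualified DELETE takes its place, and the three sample rows are made up. Whether this beats a plain UPDATE on 20M rows depends on the database and has to be measured there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (pid INTEGER PRIMARY KEY, "
    "string_value TEXT, int_value INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable (pid, string_value, int_value) VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "b", 20), (3, "c", 30)],  # made-up sample data
)
conn.commit()

with conn:  # commit on success, roll back if any step raises
    # Keep only the column that must survive.
    conn.execute("CREATE TEMP TABLE mytable_temp AS SELECT pid FROM mytable")
    # Stand-in for TRUNCATE TABLE: delete every row.
    conn.execute("DELETE FROM mytable")
    # Reload with the "invalid" marker values.
    conn.execute(
        "INSERT INTO mytable (pid, string_value, int_value) "
        "SELECT pid, NULL, 0 FROM mytable_temp"
    )

reloaded = conn.execute(
    "SELECT pid, string_value, int_value FROM mytable ORDER BY pid"
).fetchall()
```

Running the delete and the reload in one transaction means a failure halfway through does not leave the table empty, provided the database allows these statements inside a transaction.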
Updates can take time to complete. Another way of achieving this is the following sequence of steps:
Add new columns with the values you need set as the default value.
Drop the original columns.
Rename the new columns to the names of the original columns.
You can then drop the default values on the new columns.
This needs to be tested first, as different DBMSs allow different levels of table alteration (i.e. not all DBMSs allow dropping a default or dropping a column).

Bulk Inserting data to table which have default current timestamp column

I have a table on redshift with following structure
CREATE TABLE schemaName.tableName (
some_id INTEGER,
current_time TIMESTAMP DEFAULT GETDATE()
);
If I bulk insert data from other table for example
INSERT INTO schemaName.tableName (some_id) SELECT id FROM otherSchema.otherTable;
Will the value of the current_time column be the same for all bulk-inserted rows, or will it depend on the insertion time of each record, given that the column data type is TIMESTAMP?
I am considering this for Amazon Redshift only.
So far I have tested this by changing the default of the current_time column to SYSDATE and bulk inserting 10 rows into the target table. The current_time values look like 2016-11-16 06:38:52.339208 and are the same for every row, whereas GETDATE() yields values like 2016-11-16 06:43:56. I haven't found any documentation on this and would like confirmation.
To be precise, all rows get the same timestamp value after executing the following statement:
INSERT INTO schemaName.tableName (some_id) SELECT id FROM otherSchema.otherTable;
But if I change the table structure to the following:
CREATE TABLE schemaName.tableName (
some_id INTEGER,
current_time DOUBLE PRECISION DEFAULT RANDOM()
);
the rows get different random values for current_time.
Yes. Redshift uses the same default value for all rows in a bulk insert. The Redshift documentation says:
the evaluated DEFAULT expression for a given column is the same for
all loaded rows, a DEFAULT expression that uses a RANDOM() function
will assign to same value to all the rows.

Insert row to database based on form values not currently in database

I am using Access 2013 and I am trying to insert rows into a table without creating duplicates: if the record does not exist in the table, insert it. I have tried 'Not Exists' and 'Not in', but the row is still not inserted. If I remove the WHERE condition the insert works, but entering the same record again creates a duplicate. Here is my code:
INSERT INTO [UB-04s] ( consumer_id, prov_id, total_charges, [non-covered_chrgs], patient_name )
VALUES ([Forms]![frmHospitalEOR]![client_ID], [Forms]![frmHospitalEOR]![ID], Forms![frmHospitalEOR].[frmItemizedStmtTotals].Form.[TOTAL BILLED], Forms![frmHospitalEOR].[frmItemizedStmtTotals].Form.[TOTAL BILLED], [Forms]![frmHospitalEOR]![patient_name])
WHERE [Forms]![frmHospitalEOR]![ID]
NOT IN (SELECT DISTINCT prov_id FROM [UB-04s]);
You cannot use WHERE in this kind of SQL:
INSERT INTO tablename (fieldname) VALUES ('value');
You can add a constraint to the database, such as a unique index; the insert will then fail with an error message when the value already exists. Several rows may still hold NULL in the indexed column; the unique index only makes sure that rows with values are unique.
To avoid this kind of error message, you can build a procedure or use code to check the data first and then perform some action, such as doing the insert or cancelling.
This select could be used to check the data:
SELECT COUNT(*) FROM [UB-04s] WHERE prov_id = [Forms]![frmHospitalEOR]![ID]
It returns the number of rows with the specific value; if it is 0, you are ready to run the insert.
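The check-then-insert flow, backed by a unique index, might look like the sketch below. It is written in Python against an in-memory SQLite database rather than Access (an assumption, so that it runs standalone). The table name ub04s, the reduced column list, the index name, and the insert_if_absent helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ub04s (consumer_id INTEGER, prov_id INTEGER, patient_name TEXT)"
)
# Safety net: duplicates of a non-NULL prov_id are rejected by the database.
conn.execute("CREATE UNIQUE INDEX ub04s_prov_id_uq ON ub04s (prov_id)")


def insert_if_absent(conn, consumer_id, prov_id, patient_name):
    """Insert the row unless prov_id already exists; return True if inserted."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM ub04s WHERE prov_id = ?", (prov_id,)
    ).fetchone()
    if count:
        return False  # already present, cancel the insert
    try:
        conn.execute(
            "INSERT INTO ub04s (consumer_id, prov_id, patient_name) "
            "VALUES (?, ?, ?)",
            (consumer_id, prov_id, patient_name),
        )
    except sqlite3.IntegrityError:
        # Someone else inserted the same prov_id between the check and the insert.
        return False
    return True


first = insert_if_absent(conn, 1, 100, "A. Patient")
second = insert_if_absent(conn, 1, 100, "A. Patient")
total = conn.execute("SELECT COUNT(*) FROM ub04s").fetchone()[0]
```

The COUNT check keeps the common case free of error messages, while the unique index still guards against duplicates if two users submit the same form at the same moment.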

INSERT new row if value does not exist and get id either way

I would like to insert a record into a table and if the record is already present get its id, otherwise run the insert and get the new record's id.
I will be inserting millions of records and have no idea how to do this in an efficient manner. What I am doing now is to run a select to check if the record is already present, and if not, insert it and get the inserted record's id. As the table is growing I imagine that SELECT is going to kill me.
What I am doing now in python with psycopg2 looks like this:
select = ("SELECT id FROM ... WHERE ...", [...])
cur.execute(*select)
if not cur.rowcount:
    # no existing row: insert it and have the new id returned
    insert = ("INSERT INTO ... VALUES ... RETURNING id", [...])
    cur.execute(*insert)
rid = cur.fetchone()[0]
Is it maybe possible to do something in a stored procedure like this:
BEGIN
EXECUTE sql_insert;
RETURN id;
EXCEPTION WHEN unique_violation THEN
-- return id of already existing record
-- from the exception info ?
END;
Any ideas on how to optimize a case like this?
First off, this is obviously not an UPSERT as UPDATE was never mentioned. Similar concurrency issues apply, though.
There will always be a race condition for this kind of task, but you can minimize it to an extremely tiny time slot, while at the same time querying for the ID only once with a data-modifying CTE (introduced with PostgreSQL 9.1):
Given a table tbl:
CREATE TABLE tbl(tbl_id serial PRIMARY KEY, some_col text UNIQUE);
Use this query:
WITH x AS (SELECT 'baz'::text AS some_col) -- enter value(s) once
, y AS (
   SELECT x.some_col
        , (SELECT t.tbl_id FROM tbl t WHERE t.some_col = x.some_col) AS tbl_id
   FROM   x
   )
, z AS (
   INSERT INTO tbl(some_col)
   SELECT y.some_col
   FROM   y
   WHERE  y.tbl_id IS NULL
   RETURNING tbl_id
   )
SELECT COALESCE(
   (SELECT tbl_id FROM z)
  ,(SELECT tbl_id FROM y)
   );
CTE x is only for convenience: enter values once.
CTE y retrieves tbl_id - if it already exists.
CTE z inserts the new row - if it doesn't.
The final SELECT avoids running another query on the table with the COALESCE construct.
Now, this can still fail if a concurrent transaction commits a new row with some_col = 'baz' exactly between CTE y and z, but that's extremely unlikely. If it happens you get a duplicate key violation and have to retry. Nothing lost. If you don't face concurrent writes, you can just forget about this.
You can put this into a plpgsql function and rerun the query on duplicate key error automatically.
Goes without saying that you need two indexes in this setup (like displayed in my CREATE TABLE statement above):
a UNIQUE or PRIMARY KEY constraint on tbl_id (which is of serial type!)
another UNIQUE or PRIMARY KEY constraint on some_col
Both implement an index automatically.
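The "retry on duplicate key" idea can also live on the client side. The sketch below is an assumption-laden illustration, not the CTE above: it uses Python with an in-memory SQLite database instead of PostgreSQL, runs two statements instead of one, and reads the new id from lastrowid where PostgreSQL would use RETURNING. The get_or_create_id helper and its retries parameter are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (tbl_id INTEGER PRIMARY KEY, some_col TEXT UNIQUE)"
)


def get_or_create_id(conn, value, retries=3):
    """Return tbl_id for value, inserting the row first if it is missing."""
    for _ in range(retries):
        row = conn.execute(
            "SELECT tbl_id FROM tbl WHERE some_col = ?", (value,)
        ).fetchone()
        if row is not None:
            return row[0]  # already present
        try:
            cur = conn.execute(
                "INSERT INTO tbl (some_col) VALUES (?)", (value,)
            )
            return cur.lastrowid  # id of the row just inserted
        except sqlite3.IntegrityError:
            # A concurrent writer won the race; loop and read its row instead.
            continue
    raise RuntimeError("could not get or create a row for %r" % (value,))


first_id = get_or_create_id(conn, "baz")
same_id = get_or_create_id(conn, "baz")
other_id = get_or_create_id(conn, "foo")
row_count = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
```

The UNIQUE constraint on some_col is what makes the retry safe: a lost race surfaces as an integrity error instead of a silent duplicate.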