In my exercise sheet, my teacher uses a Python interface with SQLite, but I preferred to work with SQLite in DataGrip.
I'm asked to write a function that takes a table as argument and returns its primary key column. I found that writing functions in SQLite isn't possible and that I would need some PHP interface or similar.
Anyway, since this is just training to build some SQL knowledge, I tried to write a query that returns the primary key columns of a given table.
Using the PRAGMA table_info(Evenements) query I get a table describing each column of Evenements (name, type, pk flag, and so on).
So I copied the result into a Test table and wrote this query, which works:
SELECT name from Test where pk;
But I can't replace Test with PRAGMA table_info(Evenements) or table_info(Evenements).
Do you know if I can use the table_info result directly, without having to copy it into a new table?
The function that you should use is pragma_table_info():
SELECT *
FROM pragma_table_info('Evenements')
You will get info about all the columns of the table Evenements.
If you want info only for the primary key columns, add a WHERE clause:
WHERE pk
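Putting the two together, the whole query becomes the following (note that pragma_table_info() requires SQLite 3.16.0 or later, and that pk is nonzero exactly for primary key columns, which is why it works as a bare condition):
SELECT name
FROM pragma_table_info('Evenements')
WHERE pk;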
I am not able to get the generated key from a column-organized table from an INSERT SQL statement run against IBM Db2 Warehouse. I am using Java and the JDBC driver. Everything works fine: I am able to connect to the DB, create tables, and insert data; I am just not able to get a generated key if it is generated in a column-organized table. Note that row-organized tables work fine and return the key properly.
Consider a table:
CREATE TABLE users (
    id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
    username VARCHAR(16),
    PRIMARY KEY (id)
);
If this is a row-organized table, I am able to get the generated key by using:
PreparedStatement pr = connection.prepareStatement("INSERT INTO users(username) VALUES(?)", PreparedStatement.RETURN_GENERATED_KEYS);
However, if this is a column-organized table, the PreparedStatement creation fails with an error:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-1667, SQLSTATE=42858, SQLERRMC=BLUADMIN.USERS;ORGANIZE BY COLUMN;FINAL|NEW|OLD TABLE, DRIVER=4.25.13
Even if I specify the columns I want returned, like so:
PreparedStatement pr = connection.prepareStatement("INSERT INTO users(username) VALUES(?)", new String[]{"id","username"});
pr.setString(1, "test");
pr.executeUpdate();
I get
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-1667, SQLSTATE=42858, SQLERRMC=BLUADMIN.USERS;ORGANIZE BY COLUMN;FINAL|NEW|OLD TABLE, DRIVER=4.25.13
on the pr.executeUpdate(); line.
Does this mean that it is not possible to get a generated key from a column-organized table from an INSERT statement in Db2 Warehouse?
Currently shipping versions (v11.1.x and v11.5.x) will throw SQL1667N when the statement sent to Db2 uses the FINAL TABLE, OLD TABLE, or NEW TABLE clauses against a column-organized table.
When you use the JDBC option PreparedStatement.RETURN_GENERATED_KEYS, this syntax may be used under the covers.
Currently those clauses are not supported for ORGANIZE BY COLUMN tables (i.e. they will cause the exception to be thrown). There are other restrictions on column-organized tables that you should be aware of before using them.
You can work around this by creating your tables explicitly with the ORGANIZE BY ROW clause.
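For example, the table from the question could be declared row-organized explicitly; this is just the DDL from above with the clause appended:
CREATE TABLE users (
    id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
    username VARCHAR(16),
    PRIMARY KEY (id)
) ORGANIZE BY ROW;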
Have you tried actually selecting the generated ID? Try something like this:
SELECT ID FROM FINAL TABLE
(INSERT INTO users(username) VALUES(?))
See "Retrieval of result sets from an SQL data change statement" in the IBM Db2 documentation.
I searched but could only find partial answers to this question.
The goal here is to create a new ID column on an existing table.
This new column would be the primary key for the table, and I simply want it to be filled with integer values from 1 to the number of rows.
What would be the query for that?
I know I first have to alter the table to create the new column:
ALTER TABLE <MYTABLE> ADD (ID INTEGER);
Then I could use the series generator:
INSERT INTO <MYTABLE.ID> SELECT SERIES_GENERATE_INTEGER(1,1,(number of rows));
Once the column is filled I could use this line:
ALTER TABLE <MYTABLE> ADD PRIMARY KEY ("ID");
I am sure there is an easier way to do this.
You wrote that you want to add a "new ID column to an existing table" and fill it with unique values.
That's not a "standard" operation in any DBMS, as the usual assumption is that records are created with a primary key, not retrofitted with one.
Thus, the "ease" of this operation is relative to whatever else you want to do.
For example, if you want to keep using this ID as a primary key for further operations, then a once-off generator function like SERIES_GENERATE_INTEGER, or a query, won't be very helpful, since you would have to avoid duplicating already existing values on later inserts.
Two relatively easy options come to mind:
Using a sequence:
create sequence myid;
update <table> set ID = myid.nextval;
And for succeeding inserts:
insert into <table> (id, ..., ...) VALUES (myid.nextval, ..., ...) ;
Note that this generates a value for every existing record and not a predefined set of size X.
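Putting the sequence option together with the ALTER statements from the question, a minimal SAP HANA sketch might look like this (MYTABLE stands in for your table name):
-- add the new column (from the question)
ALTER TABLE MYTABLE ADD (ID INTEGER);
-- create a sequence and number the existing rows
CREATE SEQUENCE myid;
UPDATE MYTABLE SET ID = myid.NEXTVAL;
-- HANA may require the column to be NOT NULL before it can be a key
ALTER TABLE MYTABLE ALTER (ID INTEGER NOT NULL);
-- promote the filled column to primary key (from the question)
ALTER TABLE MYTABLE ADD PRIMARY KEY ("ID");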
Using a GUID
By using a GUID you generate a unique value every time you call the SYSUUID function in SAP HANA; see the SAP HANA documentation for details.
Something like
update <table> set ID = SYSUUID;
should do the trick here.
Subsequent inserts would simply call the function for values of ID.
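For example, keeping the <table> placeholder from above (other_col is a hypothetical column), such an insert might look like:
insert into <table> (id, other_col) values (SYSUUID, 'some value');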
I have a 30 GB tab-separated text file with more than 100 million rows. When I import this text file into a PostgreSQL table using the \copy command, some rows cause errors. How can I ignore those rows, and also keep a record of the ignored rows, while importing into PostgreSQL?
I connect to my machine by SSH, so I cannot use pgAdmin!
It's very hard to edit the text file before importing because so many different rows have different problems. If there were a way to check the rows one by one before importing, and then run the \copy command for individual rows, that would be helpful.
Below is the code which generates the table:
CREATE TABLE Papers(
    Paper_ID CHARACTER(8) PRIMARY KEY,
    Original_paper_title TEXT,
    Normalized_paper_title TEXT,
    Paper_publish_year INTEGER,
    Paper_publish_date DATE,
    Paper_Document_Object_Identifier TEXT,
    Original_venue_name TEXT,
    Normalized_venue_name TEXT,
    Journal_ID_mapped_to_venue_name CHARACTER(8),
    Conference_ID_mapped_to_venue_name CHARACTER(8),
    Paper_rank BIGINT,
    FOREIGN KEY(Journal_ID_mapped_to_venue_name) REFERENCES Journals(Journal_ID),
    FOREIGN KEY(Conference_ID_mapped_to_venue_name) REFERENCES Conferences(Conference_ID));
Don't load directly into your destination table, but into a single-column staging table:
create table Papers_stg (rec text);
Once you have all the data loaded, you can then do verifications on the data using SQL.
Find records with the wrong number of fields:
select rec
from Papers_stg
where cardinality(string_to_array(rec, E'\t')) <> 11  -- the file is tab-separated
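To keep a record of the ignored rows, as the question asks, you could materialize them into their own table; a small sketch (Papers_bad is a hypothetical name):
create table Papers_bad as
select rec
from Papers_stg
where cardinality(string_to_array(rec, E'\t')) <> 11;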
Then create a table with all text fields:
create table Papers_fields_text
as
select fields[1] as Paper_ID
,fields[2] as Original_paper_title
,fields[3] as Normalized_paper_title
,fields[4] as Paper_publish_year
,fields[5] as Paper_publish_date
,fields[6] as Paper_Document_Object_Identifier
,fields[7] as Original_venue_name
,fields[8] as Normalized_venue_name
,fields[9] as Journal_ID_mapped_to_venue_name
,fields[10] as Conference_ID_mapped_to_venue_name
,fields[11] as Paper_rank
from (select string_to_array(rec, E'\t') as fields
from Papers_stg
) t
where cardinality(fields) = 11
For field conversion checks you might want to use the concept described here.
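Once the rows in Papers_fields_text pass your checks, the final load into the typed Papers table is mostly a matter of casting; a minimal sketch, assuming the staging tables above:
insert into Papers
select Paper_ID
      ,Original_paper_title
      ,Normalized_paper_title
      ,Paper_publish_year::integer
      ,Paper_publish_date::date
      ,Paper_Document_Object_Identifier
      ,Original_venue_name
      ,Normalized_venue_name
      ,Journal_ID_mapped_to_venue_name
      ,Conference_ID_mapped_to_venue_name
      ,Paper_rank::bigint
from Papers_fields_text;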
Your only option is row-by-row processing. Write a shell script (for example) that loops through the input file, sends each row to \copy, checks the execution result, and writes failed rows to some err_input.txt file.
More complicated logic can increase processing speed: load in batches instead of row by row, and fall back to row-by-row logic only for the failed batches.
Consider using pgloader
Check its "Batches and Retry Behaviour" documentation.
You could use a BEFORE INSERT trigger and check your criteria. If the record fails the check, write a log entry (or a row into a separate table) and return NULL. You could even correct some values, if possible and feasible.
Of course, if checking the criteria requires other queries (like finding duplicate keys etc.), you might get a performance issue. But I'm not sure which kind of "different problems in different rows" you mean...
See also this answer on the Database Administrators Stack Exchange, and the following example taken from Bartosz Dmytrak on the PostgreSQL forum:
CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
RETURNS TRIGGER
AS
$BODY$
BEGIN
    IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
    THEN
        RETURN NULL;
    ELSE
        RETURN NEW;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;
and trigger:
CREATE TRIGGER "checkTrigger"
BEFORE INSERT
ON "myschema".mytable
FOR EACH ROW
EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();
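If you want the "entry into a separate table" variant mentioned above, here is a hedged sketch (rejected_rows and the function name are hypothetical; attach it with a CREATE TRIGGER statement like the one above):
CREATE TABLE "myschema".rejected_rows (LIKE "myschema".mytable);

CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunctionLogging" ()
RETURNS TRIGGER
AS
$BODY$
BEGIN
    IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
    THEN
        -- keep a record of the rejected row before discarding it
        INSERT INTO "myschema".rejected_rows VALUES (NEW.*);
        RETURN NULL;
    ELSE
        RETURN NEW;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;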
The application has different versions, and each version has its own set of values in each table. I need to provide functionality to copy data from one version to another. The problem:
When inserting data I end up trying to insert IDs that are already in use in the table. So I need to change the IDs of the components I want to insert, but I must preserve the relationships between those components. How can I do that?
Create a master table that has a surrogate key as its primary key. A numeric value of type NUMBER(9) works well. You can create a sequence and a trigger to fill it automatically.
The rest of the table consists of the columns of your current table, plus a column to indicate which version each row is for.
For simplicity you may wish to create views on top of the table, along the lines of:
select * from master_table where version_id = ####;
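A minimal, Oracle-style sketch of this setup; the component columns are hypothetical stand-ins for the columns of your real table:
CREATE SEQUENCE seq_master_table;

CREATE TABLE master_table (
    master_id    NUMBER(9) PRIMARY KEY,  -- surrogate key, filled from the sequence
    version_id   NUMBER(9) NOT NULL,     -- which application version the row belongs to
    component_id NUMBER(9),              -- hypothetical column from the original table
    parent_id    NUMBER(9)               -- hypothetical relationship between components
);

CREATE VIEW master_table_v1 AS
    SELECT * FROM master_table WHERE version_id = 1;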
To copy the data from one version to another, something like this will work:
INSERT INTO master_table
SELECT seq_master_table.NEXTVAL, <new version_id>, ...
FROM master_table
WHERE version_id = ####;
I'm currently working on a project that uses a Redshift table with 51 columns. However, the person who made the table forgot to add a sortkey on our time column, which will hurt performance for our use case if we don't add it.
How can I make a version of the table with our time column as the sortkey? I'm aware that you can't turn a column of an existing table into a sortkey, but I was hoping there's a way to do it that doesn't involve writing out the CREATE TABLE syntax by hand; for example, something like this would be nice:
timecube=# CREATE TABLE foo (like bar) sortkey(time);
ERROR: CREATE TABLE LIKE is not supported with DISTSTYLE, DISTKEY(), or SORTKEY() clauses
but as you can see it's not supported. Is there another way? As we're still developing, we don't need any of the existing data.
Using traditional tools like pg_dump didn't work well, because they don't include any of the Redshift extras like encodings.
Redshift supports specifying the DIST and SORT keys as part of CREATE TABLE AS statements, as per the docs.
CREATE TABLE table_name
DISTSTYLE KEY
DISTKEY ( column )
SORTKEY ( column )
AS
(SELECT *
FROM source_table)
;
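Applied to the question, a hedged sketch using the foo/bar names from the attempted CREATE TABLE LIKE (the DISTSTYLE and DISTKEY clauses can be dropped if you only need the sort key):
CREATE TABLE foo
SORTKEY ("time")
AS
(SELECT *
FROM bar)
;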
First, get the CREATE TABLE statement for the existing table. Then create the new table, this time adding the sort key.
Check the encodings of the old table (when you load data using the COPY command, Redshift automatically adds compression encodings):
select "column", type, encoding
from pg_table_def where tablename = 'old_table'
When creating the new table, add the encoding type for each column, and create the table with the sort key.
Once the new table is created, use the command below:
insert into new_table (select * from old_table order by time asc);
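For illustration, a hedged sketch of what the new DDL could look like, showing just two of the 51 columns (column names and types are hypothetical; the sort key column is often left as ENCODE raw):
CREATE TABLE new_table (
    "time"  TIMESTAMP ENCODE raw,   -- sort column, kept unencoded
    user_id BIGINT    ENCODE az64   -- example of a compressed column
)
SORTKEY ("time");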