I used Sqoop to export a table from MySQL to Hive.
When the job completed, I checked the record count and found that the number of records in Hive is much higher than in MySQL.
I then checked the primary key of the data in Hive and found many null values.
Can anyone help explain why this happened? Is it possible that these null values are caused by delete operations in MySQL?
The id column in MySQL is the primary key.
A lot of null primary keys are found in the corresponding Hive table.
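For reference, the null-key check described above is just a count over the imported table; a minimal sketch in Hive (my_table and id are placeholders for the actual table and primary key column):

SELECT COUNT(*)
FROM my_table   -- the Hive table produced by the Sqoop job (name assumed)
WHERE id IS NULL;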
I'm running Postgres 10 with pgAdmin 4.
To see if my database is out of sync, I'm checking my primary key and the primary key's sequence.
I'm running this query to check the primary key:
SELECT MAX(sid) FROM schema_name.table_name; -- returns 1032
sid is the primary key;
schema_name is the schema where my table is located;
table_name is the name of the table where the unique constraint is being violated.
I'm running this query to check the primary key's sequence:
SELECT nextval(pg_get_serial_sequence('schema_name.table_name', 'sid')); -- returns 1042 (current value is 1041)
I'm referencing this Stack Overflow post: postgresql duplicate key violates unique constraint.
But the post is 9 years old, and its solution only checks whether the primary key's max is greater than the sequence's next value (which it isn't in my case).
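For context, the fix in that older post boils down to resetting the sequence from the table's maximum key, roughly like this (a sketch reusing the placeholder names above); it only helps when MAX(sid) has run ahead of the sequence:

SELECT setval(
    pg_get_serial_sequence('schema_name.table_name', 'sid'),
    (SELECT MAX(sid) FROM schema_name.table_name)
);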
I have run into issues with a key being out of sync with its sequence when a custom program inserts records into a table using last_key_value + 1 while another program uses the sequence's nextval to insert into the same table.
This can create duplicate key problems.
I would check to make sure you don't have programs with this conflict.
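A sketch of how that conflict plays out, using a hypothetical table t with a serial column sid:

-- Program A computes the key by hand, bypassing the sequence:
INSERT INTO t (sid, val) VALUES ((SELECT MAX(sid) + 1 FROM t), 'from A');
-- Program B relies on the sequence default; if the sequence has not yet
-- advanced past A's hand-picked id, this fails with a duplicate key error:
INSERT INTO t (val) VALUES ('from B');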
A better way to fully circumvent this type of problem is to use an IDENTITY column, though I don't know whether Postgres supports that data type.
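For what it's worth, PostgreSQL does support identity columns as of version 10, the version used in the question; a minimal sketch (table and column names are illustrative):

CREATE TABLE example_table (
    sid     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    payload text
);
-- GENERATED ALWAYS rejects explicit values for sid unless OVERRIDING SYSTEM
-- VALUE is specified, which blocks the "MAX + 1 by hand" pattern above.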
I am trying to insert into a table in a Postgres database from two other Postgres databases using Foreign Data Wrappers. The objective is to have an autogenerated primary key, independent of the source, as there will eventually be more than two sources.
I first defined the tables like so:
Target database:
create table dummy (
    dummy_pk bigserial primary key
    -- other fields
);
Source databases:
create foreign table dummy (
    dummy_pk bigserial
    -- other fields
) server ... ;
This solution worked fine as long as I inserted from only one source. When I tried to insert from the other one, without specifying dummy_pk, I got this message:
Duplicate key (dummy_pk)=(1)
Since Postgres tries to insert an id of 1, I believe each source's foreign table uses its own sequence. I changed the source tables a bit, in an attempt to let the target table's sequence do the job for the id:
create foreign table dummy (
    dummy_pk bigint
    -- other fields
) server ... ;
This time I got a different error:
NULL value violates NOT NULL constraint on column « dummy_pk »
Therefore I believe the source server sends the target a query in which dummy_pk is null, and the target does not replace it with the default value.
So, is there a way I can force the use of the target's sequence in a query executed on the source? Maybe I have to share that sequence; can I create a foreign sequence? I cannot remove the column from the foreign tables, as I need read access to them.
Thanks!
Remove dummy_pk from the foreign tables, so that the destination table receives neither NULL nor an explicit value and therefore falls back to its DEFAULT, or to NULL if no DEFAULT is specified. If you attempt to pass DEFAULT to a foreign table, it will try to use the DEFAULT value of the foreign table instead.
create foreign table dummy (
    /* dummy_pk bigserial, -- deliberately omitted */
    column1 text,
    column2 int2
    -- other fields
) server ... ;
Another way would be to grab sequence values from the destination server using dblink, but I think the approach above is better (if you can afford to have this column removed from the foreign tables).
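A rough sketch of that dblink alternative, executed from a source database; the connection string and the default serial sequence name dummy_dummy_pk_seq are assumptions:

INSERT INTO dummy (dummy_pk, column1, column2)
SELECT t.next_id, 'some text', 42
FROM dblink('host=target-host dbname=target_db',
            'SELECT nextval(''dummy_dummy_pk_seq'')')
     AS t(next_id bigint);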
This question has been asked by several people, but my problem seems to be different.
I have to merge identically structured tables from different PostgreSQL databases into a new DB. I connect to the remote DB using dblink, read the table there, and insert its rows into the table in the current DB, like below:
INSERT INTO t_types_of_dementia
SELECT *
FROM dblink('host=localhost port=5432 dbname=snap-cadence password=xxxxxx',
            'SELECT dementia_type, snapid FROM t_types_of_dementia')
     AS x(dementia_type VARCHAR(256), snapid integer);
The first time, this query runs fine, but when I run it again, or try to run it with some other remote database's table, it gives me this error:
ERROR: duplicate key value violates unique constraint
"t_types_of_dementia_pkey"
I want this new table to be populated with the entries of the other tables from the other DBs.
Some of the proposed solutions talk about sequences, but I am not using any.
The structure of the table in the current DB is:
CREATE TABLE t_types_of_dementia (
    dementia_type VARCHAR(256),
    snapid integer NOT NULL,
    PRIMARY KEY (dementia_type, snapid)
);
P.S. There is a specific reason why both columns are used as the primary key; it may not be relevant to this discussion, because the same issue happens in other tables where this is not the case.
As the error message tells you, you cannot have two rows with the same values in the columns (dementia_type, snapid), since they need to be unique.
You have to make sure that the two databases don't contain the same values for (dementia_type, snapid).
A workaround would be to add a surrogate key column to your table, e.g. ALTER TABLE t_types_of_dementia ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY, and use that as the primary key instead of your current one.
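If the goal is simply to skip rows that already exist in the target, another option (not from the original answer) is to add an ON CONFLICT clause to the insert from the question; this is available since PostgreSQL 9.5:

INSERT INTO t_types_of_dementia
SELECT *
FROM dblink('host=localhost port=5432 dbname=snap-cadence password=xxxxxx',
            'SELECT dementia_type, snapid FROM t_types_of_dementia')
     AS x(dementia_type VARCHAR(256), snapid integer)
ON CONFLICT (dementia_type, snapid) DO NOTHING;  -- silently skip duplicates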
I am a newbie to MS SQL Server and don't have any knowledge about it.
I have the question below.
I have added nine records with the same values, as shown in the image below, in SQL Server 2005.
I have not given the table any primary key.
Now, when I select one or more records and hit the delete key, it does not delete the records from the table; instead it gives me an error.
You need to add a primary key to uniquely identify each record; otherwise SQL Server has no way of distinguishing the records, and therefore no way of knowing which one to delete, causing an error.
That's because you don't have a primary key and the server doesn't know which row to remove. Clear the table (DELETE FROM dbo.Patient) and create a new Id column as a primary key.
In MS SQL Server you need to have a primary key for the table. This uniquely identifies each row of that particular table.
In Oracle, for example, you don't need this, as there you can use ROWID (meaning every row of every table has a unique ID in the database); once you know this ID, Oracle knows for sure which table the row comes from.
So add a primary key to the table; you can make it auto-increment, ensuring uniqueness.
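A sketch of that fix in T-SQL, assuming the dbo.Patient table named in the other answer:

ALTER TABLE dbo.Patient
ADD Id INT IDENTITY(1,1) NOT NULL
    CONSTRAINT PK_Patient PRIMARY KEY;  -- existing rows are numbered automatically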
I create the following table in H2:
CREATE TABLE TEST
(ID BIGINT NOT NULL PRIMARY KEY)
Then I look into INFORMATION_SCHEMA.TABLES table:
SELECT SQL
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'TEST'
Result:
CREATE CACHED TABLE TEST(
ID BIGINT NOT NULL
)
Then I look into INFORMATION_SCHEMA.CONSTRAINTS table:
SELECT SQL
FROM INFORMATION_SCHEMA.CONSTRAINTS
WHERE TABLE_NAME = 'TEST'
Result:
ALTER TABLE TEST
ADD CONSTRAINT CONSTRAINT_4C
PRIMARY KEY(ID)
INDEX PRIMARY_KEY_4C
These statements are not the ones I executed; therefore, the question is:
Does the information in TABLES and CONSTRAINTS reflect the real SQL that was executed in the database?
In the original CREATE TABLE statement there was no CACHED keyword (not a problem), and I have never executed any ALTER TABLE .. ADD CONSTRAINT statement.
The actual reason I am asking is that I am not sure which statement I should execute in order to guarantee that the primary key is used in a clustered index.
If you look at my previous question, H2 database: clustered index support, you may find the following statement in Thomas Mueller's answer:
If a primary key is created after the table has been created then the primary key is stored in a new index b-tree.
Therefore, if the statements were executed exactly as they are shown in INFORMATION_SCHEMA, then the primary key was created after the table was created, and hence ID is not used in a clustered index (basically as the key of the data b-tree).
Is there a way to guarantee that the primary key is used in a clustered index in H2?
Does the information in TABLES and CONSTRAINTS reflect the real SQL that was executed in the database?
Yes. Basically, those are the statements that are run when opening the database.
If you look at my previous question
The answer "If a primary key is created after the table has been created..." was incorrect, I fixed it now to "If a primary key is created after data has been inserted...".
Is there a way to guarantee that the primary key is used as a clustered index in H2?
This is now better described in the H2 documentation at "How Data is Stored Internally": "If a single column primary key of type BIGINT, INT, SMALLINT, TINYINT is specified when creating the table (or just after creating the table, but before inserting any rows), then this column is used as the key of the data b-tree."
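A sketch of the two orderings that documentation allows:

-- Declared with the table: ID becomes the key of the data b-tree:
CREATE TABLE TEST1 (ID BIGINT NOT NULL PRIMARY KEY);

-- Added just after creation, before any rows are inserted: same effect:
CREATE TABLE TEST2 (ID BIGINT NOT NULL);
ALTER TABLE TEST2 ADD CONSTRAINT PK_TEST2 PRIMARY KEY (ID);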