How to ignore some rows while importing from a tab separated text file in PostgreSQL?

I have a 30 GB tab-separated text file with more than 100 million rows. When I import this text file into a PostgreSQL table using the \copy command, some rows cause errors. How can I ignore those rows, and also keep a record of them, while importing into PostgreSQL?
I connect to my machine over SSH, so I cannot use pgAdmin!
It's very hard to edit the text file before importing because so many different rows have different problems. If there were a way to check the rows one by one before importing, and then run the \copy command for individual rows, that would be helpful.
Below is the code which generates the table:
CREATE TABLE Papers(
Paper_ID CHARACTER(8) PRIMARY KEY,
Original_paper_title TEXT,
Normalized_paper_title TEXT,
Paper_publish_year INTEGER,
Paper_publish_date DATE,
Paper_Document_Object_Identifier TEXT,
Original_venue_name TEXT,
Normalized_venue_name TEXT,
Journal_ID_mapped_to_venue_name CHARACTER(8),
Conference_ID_mapped_to_venue_name CHARACTER(8),
Paper_rank BIGINT,
FOREIGN KEY(Journal_ID_mapped_to_venue_name) REFERENCES Journals(Journal_ID),
FOREIGN KEY(Conference_ID_mapped_to_venue_name) REFERENCES Conferences(Conference_ID));

Don't load directly into your destination table, but into a single-column staging table.
create table Papers_stg (rec text);
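A hedged sketch of the load step: pick a delimiter byte that cannot occur in the data (here \x01, an assumption about your file) so that each whole line lands in rec. Note that COPY's text format still treats backslashes specially, so rows containing backslash sequences may need extra care:
\copy Papers_stg FROM '/path/papers.txt' WITH (FORMAT text, DELIMITER E'\x01')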
Once you have all the data loaded you can then do verifications on the data using SQL.
Find records with the wrong number of fields:
select rec
from Papers_stg
where cardinality(string_to_array(rec, E'\t')) <> 11;
Create a table with all text fields
create table Papers_fields_text
as
select fields[1] as Paper_ID
,fields[2] as Original_paper_title
,fields[3] as Normalized_paper_title
,fields[4] as Paper_publish_year
,fields[5] as Paper_publish_date
,fields[6] as Paper_Document_Object_Identifier
,fields[7] as Original_venue_name
,fields[8] as Normalized_venue_name
,fields[9] as Journal_ID_mapped_to_venue_name
,fields[10] as Conference_ID_mapped_to_venue_name
,fields[11] as Paper_rank
from (select string_to_array(rec, E'\t') as fields
from Papers_stg
) t
where cardinality(fields) = 11
For field conversion checks you might want to use the concept described here.
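One way to implement those conversion checks is a small plpgsql helper that attempts the cast and returns NULL on failure; try_cast_date is a hypothetical name, not a built-in, and you would need a variant per target type:
CREATE OR REPLACE FUNCTION try_cast_date(p text) RETURNS date AS
$$
BEGIN
  RETURN p::date;
EXCEPTION WHEN others THEN
  RETURN NULL;  -- conversion failed
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- rows whose publish date would not convert:
SELECT *
FROM Papers_fields_text
WHERE Paper_publish_date <> ''
  AND try_cast_date(Paper_publish_date) IS NULL;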

Your only option is row-by-row processing. Write a shell script (for example) that loops through the input file, sends each row to "copy", checks the execution result, and writes failed rows to some "err_input.txt" (see the sketch below).
More complicated logic can increase processing speed: load the file in "portions" and fall back to row-by-row logic only on the portions that fail.
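A minimal sketch of such a script, assuming bash and placeholder names mydb, papers and papers.txt; each line gets its own \copy, so one bad row only fails its own statement:
#!/bin/bash
# send each line to its own \copy; collect rows that fail
while IFS= read -r line; do
  if ! printf '%s\n' "$line" | \
       psql -q -d mydb -c '\copy papers FROM STDIN' 2>>copy_errors.log
  then
    printf '%s\n' "$line" >> err_input.txt
  fi
done < papers.txt
Expect this to be very slow over 100 million rows, which is why the batched variant above is worth the extra logic.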

Consider using pgloader.
Check the BATCHES AND RETRY BEHAVIOUR section of its documentation.

You could use a BEFORE INSERT trigger and check your criteria. If the record fails the check, write a log entry (or a row into a separate table) and return NULL. You could even correct some values, if possible and feasible.
Of course, if checking the criteria requires other queries (like finding duplicate keys etc.), you might run into performance issues. But I'm not sure which kind of "different problems in different rows" you mean...
See also an answer on StackExchange Database Administrators, and the following example taken from Bartosz Dmytrak at the PostgreSQL forum:
CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
RETURNS TRIGGER
AS
$BODY$
BEGIN
IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
THEN
RETURN NULL;
ELSE
RETURN NEW;
END IF;
END;
$BODY$
LANGUAGE plpgsql;
and trigger:
CREATE TRIGGER "checkTrigger"
BEFORE INSERT
ON "myschema".mytable
FOR EACH ROW
EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();

Related

If record exists INSERT from the next row (Sql)

I have imported a .csv file into my database table.
It has three columns, for example
(numbers, first_name, last_name)
The numbers column's values run from 1 to 200.
Now I want to import a new .csv file into that same table, and I will do this every day. Each time, the rows start again at 1; the new file's "numbers" values start at 1 and go to 500.
When I import the new .csv file, the numbering needs to continue from 201. It shouldn't update or delete old rows; it should simply continue from 201.
How can I do it? Please help me.
As you never want to use the value provided from the CSV file, define numbers as an identity column:
create table the_table
(
numbers integer not null generated always as identity,
first_name text,
last_name text
);
Then when you import the CSV, only import first_name and last_name. How you do that depends on the tool you use to import the file. Unfortunately, the built-in COPY command and psql's \copy cannot import just some columns of a file. You can also apply this approach to an existing table that already contains data.
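One hedged workaround, assuming the three-column layout above and that no field contains an embedded comma (cut is not CSV-aware): strip the first column in the shell and feed the rest to \copy with an explicit column list; input.csv and mydb are placeholder names.
# drop the "numbers" column, then load only the remaining two
cut -d, -f2,3 input.csv | \
  psql -d mydb -c '\copy the_table (first_name, last_name) FROM STDIN WITH (FORMAT csv)'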
If you can't import only some columns, you can create a sequence that will always be used through a trigger. This will effectively ignore any value provided for the numbers column:
create sequence numbers_seq
  start with 201; -- or whatever starting value you need
create table the_table
(
numbers integer not null,
first_name text,
last_name text
);
create function set_number()
  returns trigger
as
$$
begin
  new.numbers := nextval('numbers_seq');
  return new;
end;
$$
language plpgsql;

create trigger populate_numbers_trigger
  before insert on the_table
  for each row
  execute procedure set_number();
During an import (any INSERT, actually) the values will always be taken from the sequence, regardless of what was specified in the INSERT statement. This works independently of any tool. The trigger is, however, slightly slower than the identity column.
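A quick sanity check of that behaviour, using the table above (the supplied value 1 is ignored):
INSERT INTO the_table (numbers, first_name, last_name)
VALUES (1, 'John', 'Doe');

SELECT * FROM the_table;
-- numbers is 201, taken from numbers_seq, not the 1 in the INSERT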

SQLiteException: database disk image is malformed

I have a weird error with an SQLite database (you can download it here).
Every time I try to insert something into the table "CurrencyTransactions" it fails, because a new column called 7 appeared for no reason.
I tried to drop the table, but that fails too.
I ran PRAGMA integrity_check, but it reports an error.
Then I tried to export a .sql file and import it again into a fresh new database, but:
1) If I import the structure only, it works fine and I don't have the 7 column anymore.
2) If I import the entries then, it fails with an error meaning roughly "Error in process #74: not an error".
To finish, I also tried this solution, but the new database it created is empty.
What can I do? I really need to save the entries.
What I suggest is, in DB Browser:
1. File/Export/Database to SQL file.
2. Select All (for all tables).
3. Other options are up to you, other than Export Everything.
4. Save the file.
5. Close the database.
6. Open a new database, e.g. nadekobotfix.db (could be the same name but a different location).
Note that steps 1-6 take a minute or so (just under 60k).
Do the hard work according to the following:
You may need to remove/ignore the first and last lines (BEGIN TRANSACTION; and the subsequent COMMIT;)
You would probably not be able to run the generated SQL directly due to constraints (I tried this and failed because of the constraints).
You need to copy sections from the file and run according to the hierarchy as imposed by the constraints (foreign keys). If you have CHECK constraints these may need to be considered. (no Triggers to worry about).
Running SELECT * FROM sqlite_master WHERE type = 'table' AND instr(sql,'CHECK'); returns nothing so there are no CHECK constraints.
Indexes could/should be left till last (as they are in the generated SQL).
A section would consist of a table's create statement along with the insert statements.
You may wish to create a spreadsheet of the tables(sections) marking them off when they have been done.
The following query could assist, as the tables marked NA could be done first:
SELECT CASE WHEN instr(sql,'FOREIGN KEY') THEN 'FK' ELSE 'NA' END AS fkey, name,sql
FROM sqlite_master
WHERE type = 'table' AND name NOT LIKE 'sqlite%' ORDER BY instr(sql,'FOREIGN KEY')
You could also export individual tables from DB Browser for SQLite, marking them off when done.
You may wish to do an integrity_check at regular intervals.
If this works (you might have to make adjustments to the SQL) then you can rename the old db and then rename the new (or move the old and the copy the new if using the same database name).
Note you may still have to determine how the corruption occurred.
You may wish to backup the database regularly.
You may wish to have a look at How To Corrupt An SQLite Database File
You may wish to heed :-
With few exceptions, analysis of a corrupt database does not normally help to determine what went wrong. A better approach to avoiding "danger", we have found, is to read and understand https://www.sqlite.org/howtocorrupt.html
* in database main *
Page 10628: btreeInitPage() returns error code 11
This indicates that the page header is so badly corrupted that SQLite cannot interpret this page at all. One possible reason: page 10628 has been zeroed. Can you look at a hex dump of that page? (Remember that SQLite numbers pages beginning with 1, so the start of the page is pgsz*10627 where pgsz is the page size.)
-- D. Richard Hipp
“btreeInitPage() returns error code 11”
Sample adjustment required
The Reminders table has a column called When; this is an SQL keyword (an inadvisable column name IMO), so the generated SQL for the INSERT doesn't wrap the column name and you will get an error.
i.e. :-
CREATE TABLE IF NOT EXISTS `Reminders` (
`Id` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
`ChannelId` INTEGER NOT NULL,
`IsPrivate` INTEGER NOT NULL,
`Message` TEXT,
`ServerId` INTEGER NOT NULL,
`UserId` INTEGER NOT NULL,
`When` TEXT NOT NULL,
`DateAdded` TEXT
);
INSERT INTO `Reminders` (Id,ChannelId,IsPrivate,Message,ServerId,UserId,When,DateAdded) VALUES (1270,367886754973351936,1,'Birthday Day',318127386367623170,367886754973351936,'2018-05-03 22:07:48.1860996','2018-03-18 22:07:48.186101'),
(1271,248278722656993281,1,'to remind Chanmi to remind Jayos to DeagleMomoka',318127386367623170,248278722656993281,'2018-05-05 22:08:58.4915565','2018-03-18 22:08:58.4915582'),
(1376,170240129414201344,1,'timely',318127386367623170,170240129414201344,'2018-03-29 09:00:29.4476776','2018-03-28 09:00:29.447679'),
(1377,373301201158144000,1,'timely',318127386367623170,373301201158144000,'2018-03-29 09:50:14.1631563','2018-03-28 09:50:14.1631577'),
(1378,248278722656993281,1,'timely',318127386367623170,248278722656993281,'2018-03-29 11:24:27.0250275','2018-03-28 11:24:27.025029'),
(1379,421433212716318721,1,'to timely',318127386367623170,421433212716318721,'2018-03-29 19:21:17.7465563','2018-03-28 19:21:17.7465584'),
(1380,346513954966863872,1,'t',318127386367623170,346513954966863872,'2018-03-29 19:42:23.4758798','2018-03-28 19:42:23.4758816'),
(1381,272735316002209792,1,'t!daily',318127386367623170,272735316002209792,'2018-03-29 21:01:47.5616218','2018-03-28 21:01:47.5616236'),
(1382,298272937243312132,1,'timely',318127386367623170,298272937243312132,'2018-03-29 23:18:02.8826873','2018-03-28 23:18:02.8826891'),
(1383,332340162774302720,1,'t',318127386367623170,332340162774302720,'2018-03-30 01:55:21.4704139','2018-03-29 01:55:21.4704156'),
(1384,367165474246754314,1,'tatyahaksodoeo',318127386367623170,367165474246754314,'2018-03-30 03:46:18.8805182','2018-03-29 03:46:18.8805196'),
(1385,290086674761908225,1,'timely',318127386367623170,290086674761908225,'2018-03-30 07:02:33.4115303','2018-03-29 07:02:33.4115321'),
(1386,168064128500367360,1,'timely',318127386367623170,168064128500367360,'2018-03-30 07:19:09.1915867','2018-03-29 07:19:09.1915885');
would have to be changed to wrap the offending keyword (square brackets, single or double quotes, or grave accents can be used to enclose/wrap/quote it):-
.......INSERT INTO `Reminders` (Id,ChannelId,IsPrivate,Message,ServerId,UserId,[When],DateAdded) ......
Likewise table SelfAssignableRoles has the GROUP keyword as a column name.
Likewise table Permissionv2 and table StartupCommand have the INDEX keyword as a column name.
Potential Issue
As an exercise I've tried doing the above and have managed to get 67 out of the 71 tables (66 out of 70 of your tables as sqlite_sequence is automatically created).
However, there appears to be an issue between the Clubs table and the DiscordUser table: I believe there is a circular reference between them. WaifuInfo and WaifuUpdates rely on the DiscordUser table, and WaifuItem relies on WaifuInfo, so those tables have not been successfully copied either.
A word of warning: if you attempt to create Clubs and/or DiscordUser using the existing constraints, you may end up in a situation where one always has to exist.
e.g. if DiscordUser exists but Clubs doesn't then
DROP TABLE IF EXISTS `DiscordUser`;
results in :-
no such table: main.Clubs: DROP TABLE IF EXISTS `DiscordUser`;
If you then create Clubs and try the DROP with a very basic version of the table (no constraints) using :-
CREATE TABLE IF NOT EXISTS `Clubs` (ID INTEGER PRIMARY KEY);
DROP TABLE IF EXISTS `DiscordUser`;
The result is good as per :-
Query executed successfully: DROP TABLE IF EXISTS `DiscordUser`; (took 1ms)
Now try to DROP Clubs using :-
--CREATE TABLE IF NOT EXISTS `Clubs` (ID INTEGER PRIMARY KEY);
--DROP TABLE IF EXISTS `DiscordUser`;
DROP TABLE IF EXISTS `Clubs`;
and you can't as DiscordUser doesn't exist as per :-
no such table: main.DiscordUser: DROP TABLE IF EXISTS `Clubs`;
I've tried closing the database in case it was a caching issue but the behaviour remains.
As such, I'd strongly suggest having a good look at the constraint usage and being sure to correct the issues before trying to copy all of the tables (I guess there is a chance that this could be part of the cause of the corruption, but why/how is way beyond me).
P.S. The method I used for steps 1-6 was as described above. Then, for step 7 (the hard work), the procedure was:
1. Run the sqlite_master query from above, select all cells and copy, then drop the results into a spreadsheet (you could drop the sql column, as the create statement gets truncated unless you try to fiddle with the delimiters).
2. Open the exported file in your editor (I used Notepad++).
3. Open a new DB in DB Browser for SQLite (referred to as DBB from now on).
4. In DBB, in the EXEC SQL tab, input PRAGMA integrity_check and run it to check.
5. Create a new tab (for the next SQL).
6. Switch to the spreadsheet and copy the first table name that isn't marked as done.
7. Switch to the editor and do a find on EXISTS copied_table_name.
8. Select the section (i.e. the CREATE statement along to, and including, the last row to be inserted; note this can be a pain for the larger tables, so it might be easier to create separate exports for those tables). Copy the selection to the clipboard.
9. Paste into the empty tab and run.
10. If OK then:
- in DBB click to create a new tab for the next table
- switch to the spreadsheet and mark the table as DONE
- goto 5.
11. If not OK then:
- if you can fix the issue by altering the SQL (e.g. a column name needs enclosing/wrapping/quoting), fix it and go to 9.
- if the issue is due to constraints, go to 5 but select the table causing the constraint.
OK, the issue with the DiscordUser/Clubs tables is that Clubs.OwnerId requires a DiscordUser, so clubs cannot be added without the relevant Discord users (ids 1, 2, 7, 14 and 32). Some DiscordUsers are club members, so they require a club to exist.
What I have done is to load the DiscordUser rows for the club owners, changing their ClubId to NULL; load the Clubs; update the ClubIds of those DiscordUsers so they are members of the club they were in before (i.e. undo the NULL); and then load the rest of the nearly 600 DiscordUser rows (excluding those already loaded).
Here's the SQL I used for that part (note that except for DiscordUser, Clubs and the 3 waifu tables, all other tables had already been successfully created and loaded).
INSERT INTO `DiscordUser` (Id,AvatarId,Discriminator,UserId,DateAdded,Username,ClubId,LastLevelUp,NotifyOnLevelUp,LastXpGain,TotalXp,IsClubAdmin,CurrencyAmount) VALUES
-- ClubId was 6 changed to null
(1,'6d5212a0f5e862d57c8ffc6f254a2e85','1458',299779864045682689,'2017-10-07 18:02:04.8287878','Anubis',NULL,'2018-03-27 02:22:26.362966',0,'2017-11-17 01:19:14.0313957',7056,1,280),
-- Owns a club but not in a club
(2,'3b37e0f635706f81fdde2b6de9889283','9810',181200115539640321,'2017-10-07 18:04:39.767728','AnnaHime',NULL,'2018-01-02 02:27:38.8011863',0,'2017-11-16 01:29:49.0371488',429,0,360),
-- ClubId null was 3
(7,'612c67b6eb57d8806dcc92ed45b3a6d0','0396',177502331582021639,'2017-10-07 18:11:09.7830603','Tsuchimursu',NULL,'2018-03-28 17:45:53.7399883',0,'2017-11-17 15:53:59.084885',18156,1,4725),
-- ClubId null was 4
(14,'b2dd362171277337294de325bf92ad6a','3267',215597863441268737,'2017-10-07 18:45:54.8092675','LaLa☆Star',NULL,'2018-01-14 20:52:15.7531274',0,'2017-11-08 19:00:22.7778305',2061,1,286),
-- ClubId null was 5
(32,'667f4d802b977c4d4be974e35ae63c55','2593',251689019929395200,'2017-10-08 00:58:16.6089546','username',NULL,'2018-03-28 07:27:34.9348084',0,'2017-11-17 20:02:14.0283998',4704,1,1188),
-- ClubId was 2 changed to NULL
(91,'0adb399c9f2cd94370038e2452ab8c8d','6790',346513954966863872,'2017-10-13 05:48:51.7788964','mayoi',NULL,'2018-03-24 02:50:06.8970518',0,'2017-11-17 20:01:29.0692552',7635,1,515)
;
INSERT INTO `Clubs` (Id,DateAdded,Discrim,ImageUrl,MinimumLevelReq,Name,OwnerId,Xp,Description) VALUES
(2,'2017-11-14 07:39:57.5091592',1,'https://lh3.googleusercontent.com/_7WKFouxTx1fdFpnmmuykDAd5SoiiJOPzHdRmXKOmRRZhV5Ba4V_kZct5ooVjQ9BuzU=w300',5,'We ⤠waifus',91,40137,'Love your waifus short & tall, big & small, cute as dolls, we love ''em all!'),
(3,'2017-12-11 07:00:59.3762914',1,'',30,'Den of Faes',7,11607,NULL),
(4,'2017-12-11 07:03:59.093402',1,'',5,'Skeleton Enthusiasts',14,657,NULL),
(5,'2017-12-11 07:05:56.9111719',1,'',5,'Saki''s Juice',32,2610,NULL),
(6,'2017-12-22 04:46:24.7271709',1,'',5,'nap pile',1,24870,'For the sleeping beauties and the wandering insomniacs who enjoy a good night sleep.')
;
UPDATE `DiscordUser` SET ClubId = 6 WHERE Id=1;
UPDATE `DiscordUser` SET ClubId = 3 WHERE Id=7;
UPDATE `DiscordUser` SET ClubId = 4 WHERE Id=14;
UPDATE `DiscordUser` SET ClubId = 5 WHERE Id=32;
UPDATE `DiscordUser` SET ClubId = 2 WHERE Id=91;
-- LOAD Remaining DiscordUser rows (note incomplete)
INSERT INTO `DiscordUser` (Id,AvatarId,Discriminator,UserId,DateAdded,Username,ClubId,LastLevelUp,NotifyOnLevelUp,LastXpGain,TotalXp,IsClubAdmin,CurrencyAmount) VALUES
--(1,'6d5212a0f5e862d57c8ffc6f254a2e85','1458',299779864045682689,'2017-10-07 18:02:04.8287878','Anubis',6,'2018-03-27 02:22:26.362966',0,'2017-11-17 01:19:14.0313957',7056,1,280),
--(2,'3b37e0f635706f81fdde2b6de9889283','9810',181200115539640321,'2017-10-07 18:04:39.767728','AnnaHime',NULL,'2018-01-02 02:27:38.8011863',0,'2017-11-16 01:29:49.0371488',429,0,360),
(3,'a3cd92d397ad357834d0e6c9f10bfc59','0429',145356302347010048,'2017-10-07 18:04:49.786657','Rebel Lucy',NULL,'2018-03-26 12:55:21.1149964',0,'2017-11-17 22:21:24.0263741',6876,0,3600),
(4,'7225dccaab1c93896657a61e18595378','5286',84689434536050688,'2017-10-07 18:05:44.765554','scarletflame234',NULL,'2018-03-28 22:56:28.7427437',0,'2017-11-17 23:21:41.4446535',13368,0,288),
(5,'c1316bc0673f4a2709b3ce550ed54395','0760',303279191116480514,'2017-10-07 18:06:39.7664015','zachary',NULL,'2018-03-02 03:48:43.4817755',0,'2017-11-17 18:44:14.1082867',210,0,50),
(6,'2ed95eae7c3088c46b23e71578dacc42','8801',161369834314137601,'2017-10-07 18:07:04.7672808','Kou',NULL,'2018-03-07 06:24:32.3405246',0,'2017-11-17 23:20:00.0648699',2640,0,55),
--(7,'612c67b6eb57d8806dcc92ed45b3a6d0','0396',177502331582021639,'2017-10-07 18:11:09.7830603','Tsuchimursu',3,'2018-03-28 17:45:53.7399883',0,'2017-11-17 15:53:59.084885',18156,1,4725),
(8,'5b1d239935ab4dd6d3eee98954601d52','9859',179093512610906113,'2017-10-07 18:13:54.7939334','TheCorty',NULL,'2017-11-12 12:07:36.4752178',0,'2017-11-12 23:47:26.4744132',2460,0,205), ...........
NOTE: the SQL from -- LOAD Remaining DiscordUser rows onward will not work as-is; it is only intended to show how ids 1, 2 and 7 have been commented out. Rows 14, 32 and 91 should be commented out as well, as they have already been loaded, and the other close to 600 rows should be included.
Note I've now also loaded the outstanding 3 waifu tables, so all data can be retrieved (assuming none was lost due to the corruption). PRAGMA integrity_check; returns OK.

SQL constraint to prevent updating a column based on its prior value

Can a CHECK constraint (or some other technique) be used to prevent a value from being set that contradicts its prior value when its record is updated?
One example would be a NULL timestamp indicating something happened, like "file_exported". Once a file has been exported and has a non-NULL value, it should never be set to NULL again.
Another example would be a hit counter, where an integer is only permitted to increase, but can never decrease.
If it helps, I'm using PostgreSQL, but I'd like to see solutions that fit any SQL implementation.
Use a trigger. This is a perfect job for a simple PL/PgSQL ON UPDATE ... FOR EACH ROW trigger, which can see both the NEW and OLD values.
See trigger procedures.
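A minimal sketch for the file_exported case; the table name files is an assumption, the rule itself is from the question:
CREATE OR REPLACE FUNCTION forbid_unexport()
RETURNS trigger AS
$$
BEGIN
  IF OLD.file_exported IS NOT NULL AND NEW.file_exported IS NULL THEN
    RAISE EXCEPTION 'file_exported may not be reset to NULL';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER files_forbid_unexport
BEFORE UPDATE ON files
FOR EACH ROW
EXECUTE PROCEDURE forbid_unexport();
The hit-counter rule is the same shape: raise an exception when NEW.hits < OLD.hits.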
lfLoop has the best approach to the question. But to continue Craig Ringer's approach using triggers, here is an example. Essentially, you set the value of the column back to the original (old) value before the update is applied.
CREATE OR REPLACE FUNCTION example_trigger()
RETURNS trigger AS
$BODY$
BEGIN
  new.valuenottochange := old.valuenottochange;
  new.valuenottochange2 := old.valuenottochange2;
  RETURN new;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

DROP TRIGGER IF EXISTS trigger_name ON tablename;
CREATE TRIGGER trigger_name BEFORE UPDATE ON tablename
FOR EACH ROW EXECUTE PROCEDURE example_trigger();
One example would be a NULL timestamp indicating something happened, like "file_exported". Once a file has been exported and has a non-NULL value, it should never be set to NULL again.
Another example would be a hit counter, where an integer is only permitted to increase, but can never decrease.
In both of these cases, I simply wouldn't record these changes as attributes on the annotated table; the 'exported' flag or 'hit count' is a distinct idea, representing a related but orthogonal real-world notion from the objects it relates to. So they would simply be different relations. Since we only want "file_exported" to occur once:
CREATE TABLE thing_file_exported(
  thing_id INTEGER PRIMARY KEY REFERENCES thing(id),
  file_name VARCHAR NOT NULL
);
The hit counter is similarly a different table:
CREATE TABLE thing_hits(
  thing_id INTEGER NOT NULL REFERENCES thing(id),
  hit_date TIMESTAMP NOT NULL,
  PRIMARY KEY (thing_id, hit_date)
);
And you might query with
SELECT thing.col1, thing.col2, tfe.file_name, count(th.thing_id)
FROM thing
LEFT OUTER JOIN thing_file_exported tfe
ON (thing.id = tfe.thing_id)
LEFT OUTER JOIN thing_hits th
ON (thing.id = th.thing_id)
GROUP BY thing.col1, thing.col2, tfe.file_name
Stored procedures and functions in PostgreSQL have access to both old and new values, and that code can access arbitrary tables and columns. It's not hard to build simple (crude?) finite state machines in stored procedures. You can even build table-driven state machines that way.
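For illustration, a hedged sketch of such a table-driven state machine (all names hypothetical): the legal transitions live in a table, and a trigger rejects everything else.
CREATE TABLE order_state_transitions (
  from_state text NOT NULL,
  to_state   text NOT NULL,
  PRIMARY KEY (from_state, to_state)
);

INSERT INTO order_state_transitions VALUES
  ('new', 'paid'), ('paid', 'shipped'), ('shipped', 'delivered');

CREATE OR REPLACE FUNCTION enforce_order_state()
RETURNS trigger AS
$$
BEGIN
  IF NEW.state IS DISTINCT FROM OLD.state
     AND NOT EXISTS (SELECT 1
                     FROM order_state_transitions t
                     WHERE t.from_state = OLD.state
                       AND t.to_state = NEW.state) THEN
    RAISE EXCEPTION 'illegal state change: % -> %', OLD.state, NEW.state;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_state_check
BEFORE UPDATE ON orders
FOR EACH ROW
EXECUTE PROCEDURE enforce_order_state();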

A trigger to find the sum of one field in a different table and error if it's over a certain value in Oracle

I have two tables
moduleprogress which contains fields:
studentid
modulecode
moduleyear
modules which contains fields:
modulecode
credits
I need a trigger to run when the user is attempting to insert or update data in the moduleprogress table.
The trigger needs to:
1. look at the studentid that the user has input and find all modules that they have taken in moduleyear "1";
2. take the modulecode the user input, look at the modules table, and find the sum of the credits field for all those modules (each module is worth 10 or 20 credits);
3. if the value is above 120 (the yearly credit limit), raise an error; otherwise the input is OK.
Does this make sense? Is this possible?
@a_horse_with_no_name
This looks like it will work, but I will only be using the database to input data manually, so it needs to fail on input. I'm trying to get a trigger similar to this to solve the problem (the trigger below doesn't work); ignore the "UOS_" prefix on everything, it just helps me with my database and other functions.
CREATE OR REPLACE TRIGGER "UOS_TESTINGS"
BEFORE UPDATE OR INSERT ON UOS_MODULE_PROGRESS
REFERENCING NEW AS NEW OLD AS OLD
DECLARE
  MODULECREDITS INTEGER;
BEGIN
  SELECT m.UOS_CREDITS,
         mp.UOS_MODULE_YEAR,
         SUM(m.UOS_CREDITS)
    INTO MODULECREDITS
    FROM UOS_MODULE_PROGRESS mp
    JOIN UOS_MODULES m ON m.UOS_MODULE_CODE = mp.UOS_MODULE_CODE
   WHERE mp.UOS_MODULE_YEAR = 1;
  IF MODULECREDITS >= 120 THEN
    RAISE_APPLICATION_ERROR(-20000, 'Students are only allowed to take upto 120 credits per year');
  END IF;
END;
I get the error message :
8 23 PL/SQL: ORA-00947: not enough values
4 1 PL/SQL: SQL Statement ignored
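The ORA-00947 comes from selecting three expressions into a single variable. For reference, a hedged correction: select just the aggregate, restrict it to the affected student (UOS_STUDENT_ID is an assumed column name), and make the trigger row-level so :NEW is available. Note that a row trigger querying its own table can still fail with ORA-04091 (mutating table) for multi-row DML, which is one argument for the materialized-view approach below.
CREATE OR REPLACE TRIGGER uos_testings
BEFORE INSERT OR UPDATE ON uos_module_progress
FOR EACH ROW
DECLARE
  modulecredits INTEGER;
BEGIN
  SELECT SUM(m.uos_credits)
    INTO modulecredits
    FROM uos_module_progress mp
    JOIN uos_modules m ON m.uos_module_code = mp.uos_module_code
   WHERE mp.uos_module_year = 1
     AND mp.uos_student_id = :NEW.uos_student_id;

  IF modulecredits >= 120 THEN
    RAISE_APPLICATION_ERROR(-20000,
      'Students are only allowed to take up to 120 credits per year');
  END IF;
END;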
I'm not sure I understand your description, but the way I understand it, this can be solved using a materialized view, which might give better transactional behaviour than the trigger:
CREATE MATERIALIZED VIEW LOG
ON moduleprogress WITH ROWID (modulecode, studentid, moduleyear)
INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG
ON modules with rowid (modulecode, credits)
INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW mv_module_credits
REFRESH FAST ON COMMIT WITH ROWID
AS
SELECT pr.studentid,
SUM(m.credits) AS total_credits
FROM moduleprogress pr
JOIN modules m ON pr.modulecode = m.modulecode
WHERE pr.moduleyear = 1
GROUP BY pr.studentid;
ALTER TABLE mv_module_credits
  ADD CONSTRAINT check_total_credits CHECK (total_credits <= 120);
But: depending on the size of the tables, this might be slower than a pure trigger-based solution.
The only drawback of this solution is that the error will be thrown at commit time, not when the insert happens (because the MV is only refreshed on commit, and the check constraint is evaluated then).
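A quick way to see that commit-time behaviour (hypothetical data):
INSERT INTO moduleprogress (studentid, modulecode, moduleyear)
VALUES (1, 'MOD999', 1);  -- succeeds even if the student is now over 120 credits

COMMIT;  -- fails here instead: the fast refresh runs and check_total_credits is violated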

MySQL duplicates with LOAD DATA INFILE

When using LOAD DATA INFILE, is there a way to either flag a duplicate row, or dump any/all duplicates into a separate table?
From the LOAD DATA INFILE documentation:
The REPLACE and IGNORE keywords control handling of input rows that duplicate existing rows on unique key values:
If you specify REPLACE, input rows replace existing rows (in other words, rows that have the same value for a primary key or unique index as an existing row). See Section 12.2.7, "REPLACE Syntax".
If you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped. If you do not specify either option, the behavior depends on whether the LOCAL keyword is specified. Without LOCAL, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. With LOCAL, the default behavior is the same as if IGNORE is specified; this is because the server has no way to stop transmission of the file in the middle of the operation.
Effectively, there's no way to redirect the duplicate records to a different table. You'd have to load them all in and then create another table to hold the non-duplicated records, as sketched below.
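A sketch of that load-then-separate idea, using a staging table without the unique key so the duplicates survive the load (test_stg is a made-up name; the test table layout matches the answer below):
CREATE TABLE test_stg LIKE test;
ALTER TABLE test_stg DROP PRIMARY KEY;

LOAD DATA INFILE '/home/user/test.csv'
INTO TABLE test_stg
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' (id, text);

-- keys that occur more than once in the file
SELECT *
FROM test_stg
WHERE id IN (SELECT id FROM test_stg GROUP BY id HAVING COUNT(*) > 1);

-- keep the first row per key in the real table
INSERT IGNORE INTO test SELECT * FROM test_stg;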
It looks as if there actually is something you can do about duplicate rows for LOAD DATA calls. However, the approach I've found isn't perfect: it acts as a log for all deletes on a table, rather than just those caused by LOAD DATA calls. Here's my approach:
Table test:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
text VARCHAR(255) DEFAULT NULL
);
Table test_log:
CREATE TABLE test_log (
id INTEGER, -- not primary key, we want to accept duplicate rows
text VARCHAR(255) DEFAULT NULL,
time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Trigger del_chk:
delimiter //
drop trigger if exists del_chk;
CREATE TRIGGER del_chk AFTER DELETE ON test
FOR EACH ROW
BEGIN
INSERT INTO test_log(id,text) values(OLD.id,OLD.text);
END;//
delimiter ;
Test import (/home/user/test.csv):
1,asdf
2,jkl
3,qwer
1,tyui
1,zxcv
2,bnm
Query:
LOAD DATA INFILE '/home/user/test.csv'
REPLACE INTO TABLE test
FIELDS
TERMINATED BY ','
LINES
TERMINATED BY '\n' (id,text);
Running the above query will result in 1,asdf, 1,tyui, and 2,jkl being added to the log table. Based on a timestamp, it could be possible to associate the rows with a particular LOAD DATA statement.
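For example, noting the time just before the load lets you pull back the rows that load displaced (the literal timestamp is a placeholder):
SELECT id, text
FROM test_log
WHERE time >= '2018-01-01 12:00:00';  -- timestamp recorded just before LOAD DATA ran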