I would like to import data into my PostgreSQL table.
I have a .csv file that is formatted like this:
1; John Blake
2; Roberto Young
3;Mark Palmer
Is there any way to strip the leading whitespace where it exists?
I used the following command:
\copy users from 'users.csv' using delimiters E';'
but it keeps the whitespace.
COPY to a temporary staging table and INSERT into the target table from there, trimming the text column.
CREATE TEMP TABLE tmp_x AS
SELECT * FROM users LIMIT 0; -- empty temp table with structure of target
\copy tmp_x FROM '/absolute/path/to/file' WITH (DELIMITER ';'); -- psql meta-command (!)
INSERT INTO users
(usr_id, usr, ...) -- list columns
SELECT usr_id, ltrim(usr), ...
FROM tmp_x;
DROP TABLE tmp_x; -- optional; is destroyed at end of session automatically
ltrim() trims spaces only from the left end of the string.
This approach performs better than updating rows in the target table after the COPY, which takes longer and produces dead rows. It also means only the newly imported rows are touched.
Related answer:
Delete rows of a table specified in a text file in Postgres
You won't be able to use COPY alone to do that.
You can use an UPDATE coupled with trim:
UPDATE table SET column = trim(from column)
Or use a script to clean the data before bulk-inserting it into the DB.
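For example, a minimal sketch using the users table from the question and the usr column name from the answer above (your column name may differ); trim(leading ...) removes only leading spaces:
-- strip leading spaces left over from the import
UPDATE users
SET usr = trim(leading ' ' FROM usr)
WHERE usr LIKE ' %';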
I'm new to PostgreSQL and am looking for some guidance and best practices.
I have created a table by importing data from a csv file. I then altered the table by creating multiple generated columns like this:
ALTER TABLE master
ADD office VARCHAR(50)
GENERATED ALWAYS AS (
    CASE
        WHEN LEFT(location, 4) = 'Chic' THEN 'CHI'
        ELSE LEFT(location, strpos(location, '_') - 1)
    END
) STORED;
But when I try to import new data into the table I get the following error:
ERROR: column "office" is a generated column
DETAIL: Generated columns cannot be used in COPY.
My goal is to be able to import new data each day to the table and have the generated columns automatically populate in order to transform the data as I would like. How can I do so?
CREATE TEMP TABLE master (location VARCHAR);
ALTER TABLE master
ADD office VARCHAR
GENERATED ALWAYS AS (
CASE
WHEN LEFT(location, 4) = 'Chic' THEN 'CHI'
ELSE LEFT(location, strpos(location, '_') - 1)
END
) STORED;
--INSERT INTO master (location) VALUES ('Chicago');
--INSERT INTO master (location) VALUES ('New_York');
COPY master (location) FROM $$d:\cities.csv$$ CSV;
SELECT * FROM master;
Is this the structure and the behaviour you are expecting? If not, please provide more details regarding your table structure, your importable data and your importing commands.
Also, when you import the csv file, maybe the columns are not mapped properly, or maybe the delimiter is not set correctly. Try specifying each column in the exact order in which it appears in your csv file.
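A minimal sketch of an explicit column list plus delimiter (my_table, col_a and col_b are hypothetical names used only to illustrate the ordering):
COPY my_table (col_a, col_b)
FROM $$d:\data.csv$$
WITH (FORMAT csv, DELIMITER ';');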
https://www.postgresql.org/docs/12/sql-copy.html
Note: d:\cities.csv contains:
Chicago
New_York
EDIT:
If the column positions differ between the table and the csv, the following approach may come in handy:
1. create temporary table tmp (csv_column1 <data_type>, csv_column_2 <data_type>, ...); (including ALL csv columns)
2. copy tmp from '/path/to/file.csv';
3. insert into master (location, other_info, ...) select csv_column_3 as location, csv_column_7 as other_info, ... from tmp;
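A concrete sketch of those steps, reusing the master table above and assuming a hypothetical two-column, semicolon-delimited file:
-- 1. staging table mirroring the csv layout (column names are hypothetical)
create temporary table tmp (csv_location text, csv_extra text);
-- 2. load the raw csv into the staging table
copy tmp from '/path/to/file.csv' with (format csv, delimiter ';');
-- 3. insert only the columns the target table needs, in the right order
insert into master (location) select csv_location from tmp;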
Importing data using an intermediate table may slow things down a little, but gives you a lot of flexibility.
I was getting the same error when importing to PG from a csv. I found that even though my column was generated, I still had to have it in the imported data; I just left it empty. It worked fine once the column name was in the file and mapped to my DB column name.
I have a 30 GB tab-separated text file with more than 100 million rows. When I import this text file into a PostgreSQL table using the \copy command, some rows cause errors. How can I skip those rows, and also keep a record of the skipped rows, while importing into PostgreSQL?
I connect to my machine over SSH, so I cannot use pgAdmin.
It's very hard to edit the text file before importing because many different rows have different problems. If there were a way to check the rows one by one before importing and then run the \copy command for individual rows, that would be helpful.
Below is the code which generates the table:
CREATE TABLE Papers(
Paper_ID CHARACTER(8) PRIMARY KEY,
Original_paper_title TEXT,
Normalized_paper_title TEXT,
Paper_publish_year INTEGER,
Paper_publish_date DATE,
Paper_Document_Object_Identifier TEXT,
Original_venue_name TEXT,
Normalized_venue_name TEXT,
Journal_ID_mapped_to_venue_name CHARACTER(8),
Conference_ID_mapped_to_venue_name CHARACTER(8),
Paper_rank BIGINT,
FOREIGN KEY(Journal_ID_mapped_to_venue_name) REFERENCES Journals(Journal_ID),
FOREIGN KEY(Conference_ID_mapped_to_venue_name) REFERENCES Conferences(Conference_ID));
Don't load directly into your destination table; load into a single-column staging table instead.
create table Papers_stg (rec text);
Once you have all the data loaded, you can run verifications on it using SQL.
Find records with the wrong number of fields:
select rec
from Papers_stg
where cardinality(string_to_array(rec, E'\t')) <> 11
Create a table with all text fields
create table Papers_fields_text
as
select fields[1] as Paper_ID
,fields[2] as Original_paper_title
,fields[3] as Normalized_paper_title
,fields[4] as Paper_publish_year
,fields[5] as Paper_publish_date
,fields[6] as Paper_Document_Object_Identifier
,fields[7] as Original_venue_name
,fields[8] as Normalized_venue_name
,fields[9] as Journal_ID_mapped_to_venue_name
,fields[10] as Conference_ID_mapped_to_venue_name
,fields[11] as Paper_rank
from (select string_to_array(rec, E'\t') as fields
from Papers_stg
) t
where cardinality(fields) = 11
For field conversion checks, you might want to use the concept described here.
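The linked concept is not reproduced here, but a common pattern (a sketch, not the exact code from that link) is a helper function that attempts the cast and returns NULL instead of raising an error, so offending rows can be listed rather than aborting the load:
-- hypothetical helper: returns NULL when the text cannot be cast to date
CREATE OR REPLACE FUNCTION try_cast_date(p_text text)
RETURNS date
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN p_text::date;
EXCEPTION WHEN others THEN
    RETURN NULL;
END;
$$;
-- rows whose publish date does not convert cleanly
SELECT Paper_ID, Paper_publish_date
FROM Papers_fields_text
WHERE Paper_publish_date IS NOT NULL
  AND try_cast_date(Paper_publish_date) IS NULL;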
Your only option is to use row-by-row processing. Write a shell script (for example) that loops through the input file, sends each row to COPY, checks the result, and writes failed rows to an err_input.txt file.
More complicated logic can speed this up: load the file in batches instead of row by row, and fall back to row-by-row processing only for the batches that fail.
Consider using pgloader
Check its BATCHES AND RETRY BEHAVIOUR documentation.
You could use a BEFORE INSERT trigger and check your criteria there. If a record fails the check, write a log entry (or a row into a separate table) and return NULL. You could even correct some values where that is possible and feasible.
Of course, if checking the criteria requires other queries (like finding duplicate keys etc.), you might run into performance issues. But I'm not sure which kind of "different problems in different rows" you mean...
See also an answer on StackExchange Database Administrators, and the following example taken from Bartosz Dmytrak on the PostgreSQL forum:
CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
RETURNS TRIGGER
AS
$BODY$
BEGIN
    IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
    THEN
        RETURN NULL;
    ELSE
        RETURN NEW;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;
and trigger:
CREATE TRIGGER "checkTrigger"
BEFORE INSERT
ON "myschema".mytable
FOR EACH ROW
EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();
I have an Oracle table with 2 columns, both using the NUMBER data type. When I enter any number starting with 0, the 0 is removed, so the solution is to change the data type to VARCHAR2. I have a script that:
creates a temp table with VARCHAR2 and a primary key
copies the old table
drops the old table
renames the temp table to the old table
However, I'm facing an issue: when copying the table, any data that was truncated before remains that way. Is there any way I can add a 0 at the start of the old data? Below is the script I have created.
/* create a new table named temp */
CREATE TABLE TEMP_TABLE
(
IMEISV_PREFIX VARCHAR2(8),
IMEI_FLAG NUMBER(2),
CONSTRAINT IMEIV_PK PRIMARY KEY (IMEISV_PREFIX)
);
/* copy everything from the old table to the new temp table */
INSERT INTO TEMP_TABLE
SELECT * FROM REF_IMEISV_PREFIX;
/* Delete the original table */
DROP TABLE REF_IMEISV_PREFIX;
/* Rename the temp table to the original table */
RENAME TEMP_TABLE TO REF_IMEISV_PREFIX;
No, there is not. When Oracle saves data to the database, it saves it in the column's format at that time; all other information is discarded, and there is no way to restore the historic data.
In fact, when you stored the data to the database before, let's say you did this:
insert into tableX (anumber) values ('01');
In fact it does:
insert into tableX (anumber) values (to_number('01'));
So the leading zero is lost from the very beginning. (Note that the example is actually a bad habit: you should never rely on implicit casting in the database; always hand over the data in the right data type.)
If you need to show that leading zero, your problem is an interface problem, not a database problem. You can format your output to show as many leading zeros as you want. If the data is a number, leave it as a number.
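For example, a minimal sketch using the tableX/anumber names from above (the 8-digit width is only an illustration):
-- display the NUMBER column padded with leading zeros, without changing the stored value
SELECT LPAD(TO_CHAR(anumber), 8, '0') AS anumber_display
FROM tableX;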
I have 12 columns with roughly 2000 rows in an SQLite DB.
Now I want to add a 13th column with the same number of rows.
If I import the text from a csv file, it is appended after the existing rows (so I end up with a 4000-row table).
How can I avoid adding it underneath the existing rows?
Do I need to create a script that runs through each row of the table and adds the text from the csv file for each row?
If you have the code that imported the original data, and if the data has not changed in the meantime, you could just drop the table and reimport it.
Otherwise, you indeed have to create a script that looks up the corresponding record in the table and updates it.
You could also import the new data into a temporary table, and then copy the values over with a command like this:
UPDATE MyTable
SET NewColumn = (SELECT NewColumn
FROM TempTable
WHERE ID = MyTable.ID)
I ended up using RazorSQL, a great program:
http://www.razorsql.com/
When extracting data from a table (schema and data), I can do this by right-clicking on the database and going to Tasks -> Generate Scripts, and it gives me all the data from the table, including the create script, which is good.
This, though, gives me all the data from the table. Can this be changed to give me only some of the data, e.g. only rows after a certain dtmTimeStamp?
Thanks,
I would recommend extracting your data into a separate table using a query and then running Generate Scripts on that table. Alternatively, you can extract the data into a flat file using the Export Data wizard (include your column headers and use comma separators with double-quote field delimiters).
To make a copy of your table:
SELECT Col1, Col2
INTO CloneTable
FROM MyTable
WHERE Col3 = #Condition
(Thanks to @MarkD for adding that)
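Applied to the timestamp filter from the question, a minimal sketch (the cutoff value is only an illustration):
SELECT *
INTO CloneTable
FROM MyTable
WHERE dtmTimeStamp >= '2020-01-01';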