Exporting data from spreadsheet to Pgsql database - sql

I have one huge spreadsheet file (1000+ lines) and one PostgreSQL table. I need to compare the spreadsheet data with the PostgreSQL table data, fill the blank fields in the table with data from the spreadsheet, and add entries not present in the db table.
Yes, I could convert (via CSV) the whole spreadsheet into a database table. But there are unique values in both documents, so I would lose data doing this. Or is it possible to compare 2 tables and fill the missing fields in table A with data from table B?
Thanks in advance!

It's easy in SQL to compare two tables and insert the rows missing from one of them. For example:
INSERT INTO TableA (col1, col2, col3)
SELECT col1, col2, col3
FROM SpreadSheetTable
WHERE NOT EXISTS (
    SELECT *
    FROM TableA
    WHERE TableA.col1 = SpreadSheetTable.col1
);
This query inserts all rows from SpreadSheetTable into TableA, except those rows for which TableA already contains a row with the same "col1" value.
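For the other half of the question, filling blank fields in rows that already exist, an UPDATE ... FROM join works in PostgreSQL. A minimal sketch, assuming "col2" is a column with blanks (adapt to your real columns):
UPDATE TableA
SET col2 = SpreadSheetTable.col2
FROM SpreadSheetTable
WHERE TableA.col1 = SpreadSheetTable.col1
  AND TableA.col2 IS NULL;
The IS NULL filter means only blank fields get filled in; values already present in TableA are left untouched.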

Related

How to add NULL to a union of multiple tables that don't have the same number of columns?

I am trying to do a union of ten tables, all of which I want in a single flat file for data visualization. They share a lot of columns in common, but there is a decent number of columns unique to each table that won't exist in the others.
If it were only a few columns, I would do the following (see below), but this would take way too long for all of the columns and tables I am working with. Is there a way to do this without going through each column individually with NULL AS 'missingColumn'?
SELECT COL1, COL2, COL3 FROM TABLE1
UNION ALL
SELECT COL1, COL2, NULL AS COL3 FROM TABLE2
I am using SQL Server.
Thank you!
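One way to avoid writing the NULL AS padding by hand is to generate it from the catalog. A hedged sketch for SQL Server, assuming hypothetical table names Table1 and Table2: this lists every (table, column) pair that needs a NULL AS placeholder, which you can paste into the union by hand or feed into dynamic SQL.
SELECT t.table_name, c.column_name
FROM (VALUES ('Table1'), ('Table2')) AS t(table_name)
CROSS JOIN (
    -- every column that appears in at least one of the tables
    SELECT DISTINCT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME IN ('Table1', 'Table2')
) AS c(column_name)
WHERE NOT EXISTS (
    -- keep only the (table, column) pairs the table does NOT have
    SELECT 1
    FROM INFORMATION_SCHEMA.COLUMNS ic
    WHERE ic.TABLE_NAME = t.table_name
      AND ic.COLUMN_NAME = c.column_name
);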

Merging 2 SQL tables with same table convention design without using update command

I have 2 tables in my SQL database:
And I want to merge them in a way that the result will be:
This is just an example of 2 tables which need to be merged into one new table (the tables contain example data; the statement should work for any amount of data inside the tables).
An ID that has a different value in the CSV table should take the CSV value in the new table, for example:
ID 3's value in CSV is 'KKK' and in table T it is 'CCC'; the value from the CSV table is the one that should end up in the result.
You seem to want a left join, taking the value from the second table if available:
select t.id, coalesce(csv.value, t.value) as value
from t
left join csv on t.id = csv.id;
If you want this in a new table, use the appropriate construct for your database, or use INSERT to load the result into an existing table.
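A minimal sketch of the new-table variant, assuming PostgreSQL and a hypothetical table name "merged":
create table merged as
select t.id, coalesce(csv.value, t.value) as value
from t
left join csv on t.id = csv.id;
In SQL Server the equivalent would be SELECT ... INTO merged rather than CREATE TABLE ... AS.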

Transfer only rows that have different values from existing table data

I am using Postgres and have 2 tables, Transaction and Backup.
I would like to transfer rows of data from Transaction to Backup.
There will be new rows of data in the Transaction table.
How do I transfer only the rows whose values differ from the existing data in the Transaction table, as I do not want to have duplicate rows of data?
As long as the data in one of the columns is different, I will transfer the row from Transaction to Backup.
e.g.
Day 1: Transaction (20 rows), Backup (20 rows) [the whole Transaction table is backed up to Backup at night]
Day 2: Transaction (40 rows), Backup (20 rows) [the additional 20 rows in Transaction may contain duplicates of the previous 20 rows in Transaction. I only want to transfer the non-duplicate rows to Backup]
Reading between the lines, I think this is a harder question than you know.
The real issue here is that you don't really know what has changed. If information is append-only and we can assume all rows were visible when the last backup was made, then we can just select the rows inserted after a point in time. If these are not good assumptions, then you are going to have a long-term issue. a_horse_with_no_name has an OK solution above assuming your data is append-only (all inserts, no updates), but it is not going to perform very well as these tables get bigger.
A few options you might consider instead:
A table audit trigger would allow you to specify columns, values, etc., as well as when they are changed, and it could do this in real time. That would be my first solution (a sketch follows below).
Even if it is insert-only, you may want to store information in the backup table regarding max IDs or the like and go back only one backup when checking. Then you could use a_horse_with_no_name's solution as a template.
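A minimal sketch of such an audit trigger in PostgreSQL (11+), with hypothetical names (transactions, transaction_audit, log_transaction_change); adapt the columns to your real schema:
create table transaction_audit (
    audit_id   bigserial primary key,
    changed_at timestamptz not null default now(),
    operation  text not null,   -- INSERT, UPDATE, or DELETE
    row_data   jsonb not null   -- snapshot of the affected row
);

create or replace function log_transaction_change() returns trigger as $$
begin
    if tg_op = 'DELETE' then
        insert into transaction_audit (operation, row_data)
        values (tg_op, to_jsonb(old));
        return old;
    else
        insert into transaction_audit (operation, row_data)
        values (tg_op, to_jsonb(new));
        return new;
    end if;
end;
$$ language plpgsql;

create trigger transactions_audit
after insert or update or delete on transactions
for each row execute function log_transaction_change();
The nightly backup job can then read transaction_audit instead of diffing two full tables.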
Suppose you want to transfer rows from table Source to table Result. From your question, I understand that they have the same columns.
As you mentioned, you need the values from Source that differ from the ones already in Result.
SELECT * FROM [Source]
WHERE column NOT IN (SELECT column FROM Result)
It will return the "new" records. Now you need to insert them:
INSERT INTO Result
SELECT * FROM [Source]
WHERE column NOT IN (SELECT column FROM Result)
Try this
insert into Backup (fields1, fields2, ......)
select fields1, fields2
from Transaction t
where your_date_condition_here
  and not exists (
    select *
    from Backup b
    where t.fields1 = b.fields1
      and t.fields2 = b.fields2
      .....................
  )
This will insert a row whenever anything changes in the Transaction table; if you change an existing row in the Transaction table (NULLs included), it will be inserted into the Backup table. But you shouldn't have a primary key on your Backup table, because otherwise you won't be able to insert such a row:
duplicate key value violates unique constraint "tb_backup_pkey"
will come out.
This should work for the entire row:
insert into backup (col1, col2, col3, col4)
select t.*
from transactions t
except
select *
from backup b;
If I understand you correctly, you want to copy data from the transactions table to the backup table.
Something like this should do it. As you haven't shown us the actual table definitions, I had to use dummy names for the columns. pk_col is the primary key column of the table.
insert into backup (pk_col, col1, col2, col3, col4)
select t.*
from transactions t
full outer join backup b on t.pk_col = b.pk_col
where t is distinct from b;
This assumes that the target table does not have a unique key. If it does, you need to use an ON CONFLICT clause.
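A minimal sketch of that variant, reusing the same dummy column names and assuming the unique key is on pk_col:
insert into backup (pk_col, col1, col2, col3, col4)
select t.*
from transactions t
on conflict (pk_col) do update
set col1 = excluded.col1,
    col2 = excluded.col2,
    col3 = excluded.col3,
    col4 = excluded.col4;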

Multiple values into one cell SQL Server

I have an item A that has a one-to-many relationship with Table T1 (Col1, Col2, Col3, Col4).
Currently I am retrieving the data from T1 for item A as XML in a single column. Is there any more efficient way to do so rather than parsing into XML?
The thing is, in my query I have 1 column for Table T1's value.
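On SQL Server 2017 and later, a common alternative to packing values through XML is STRING_AGG. A hedged sketch, with hypothetical table and column names (ItemA, ItemId; Col1 as the value being combined):
SELECT a.ItemId,
       STRING_AGG(t1.Col1, ', ') AS CombinedValues  -- all related T1 values in one cell
FROM ItemA a
JOIN T1 t1 ON t1.ItemId = a.ItemId
GROUP BY a.ItemId;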

How to combine identical tables into one table?

I have 100s of millions of unique rows spread across 12 tables in the same database. They all have the same schema/columns. Is there a relatively easy way to combine all of the separate tables into 1 table?
I've tried importing the tables into a single table, but given the HUGE number of rows involved, SQL Server is making me wait a long time, as if I were importing from a flat file. There has to be an easier/faster way, no?
You haven't given much info about your table structure, but you can probably just do a plain old insert from a select, like below. The example takes all records from Table2 and Table3 that don't already exist in Table1 and inserts them into Table1. You could do this to merge everything from all 12 of your tables into a single table.
INSERT INTO Table1
SELECT * FROM Table2
WHERE SomeUniqueKey NOT IN (SELECT SomeUniqueKey FROM Table1)
UNION
SELECT * FROM Table3
WHERE SomeUniqueKey NOT IN (SELECT SomeUniqueKey FROM Table1)
--...
Do what Jim says, but first:
1) Drop (or disable) all indices on the destination table (a sketch of the disable/rebuild pattern follows below).
2) Insert the rows from each table, one table at a time.
3) Commit the transaction after each table is appended; otherwise much disk space will be taken up in case of a possible rollback.
4) Re-enable or recreate the indices after you are done.
If there is a possibility of duplicate keys, you may need to retain an index on the key field and use a NOT EXISTS clause to keep the duplicate records from being added.
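A minimal sketch of the disable/rebuild pattern in SQL Server, assuming a hypothetical nonclustered index name IX_Table1_SomeKey. Only nonclustered indexes should be disabled; disabling the clustered index makes the table inaccessible:
-- disable the nonclustered index before the bulk inserts
ALTER INDEX IX_Table1_SomeKey ON Table1 DISABLE;

-- insert from each source table, one at a time, committing between tables
INSERT INTO Table1 SELECT * FROM Table2;
INSERT INTO Table1 SELECT * FROM Table3;
-- ...

-- REBUILD re-enables the index once all tables are loaded
ALTER INDEX IX_Table1_SomeKey ON Table1 REBUILD;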