I have a table ("Posts") that tracks social media data. I want to import a .csv file that contains the new data.
The .csv file, from an outside service, maintains a rolling three months of data.
I'd like to (1) open the .csv file, (2) identify each line, based on date, that doesn't exist in my "Posts" table, and then (3) import the data for the new line without changing data already in the table.
I've been digging through the forums but am not finding what I need.
For the import step, I'm trying:
DoCmd.TransferText TransferType:=acImportDelim, TableName:="tblPosts", FileName:="C:\Users\[myname]\Desktop\Historical Reports\Posts.csv", HasFieldNames:=True
Unfortunately, my table's field names differ somewhat from the .csv's column names. Do I need to skip the first line of the .csv and then build a custom SQL INSERT statement?
Consider doing it with an SQL query: the MS Access database engine can query CSV files directly, so you can run one of the classic duplicate-avoidance patterns (LEFT JOIN / IS NULL, NOT IN, NOT EXISTS) straight against the file. Adjust the column names below, including Date, to match yours:
INSERT INTO Posts (col1, col2, col3)
SELECT t.col1, t.col2, t.col3
FROM [text;database=C:\Users\[myname]\Desktop\Historical Reports].[Posts.csv] AS t
LEFT JOIN Posts AS p
ON t.[Date] = p.[Date]
WHERE p.[Date] IS NULL;
INSERT INTO Posts (col1, col2, col3)
SELECT t.col1, t.col2, t.col3
FROM [text;database=C:\Users\[myname]\Desktop\Historical Reports].[Posts.csv] AS t
WHERE t.[Date] NOT IN
(SELECT p.[Date] FROM Posts AS p);
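One caveat with the NOT IN form: if Posts ever has a NULL in its date column, NOT IN matches nothing and the insert adds no rows at all, so it is safest to exclude NULLs in the subquery:
INSERT INTO Posts (col1, col2, col3)
SELECT t.col1, t.col2, t.col3
FROM [text;database=C:\Users\[myname]\Desktop\Historical Reports].[Posts.csv] AS t
WHERE t.[Date] NOT IN
(SELECT p.[Date] FROM Posts AS p WHERE p.[Date] IS NOT NULL);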
INSERT INTO Posts (col1, col2, col3)
SELECT t.col1, t.col2, t.col3
FROM [text;database=C:\Users\[myname]\Desktop\Historical Reports].[Posts.csv] AS t
WHERE NOT EXISTS
(SELECT 1 FROM Posts AS p
WHERE p.[Date] = t.[Date]);
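As for the differing names: with this approach you don't need to skip the first line or hand-build each row. Assuming the first row is read as headers (which is what HasFieldNames:=True was doing in TransferText), you simply alias the CSV's columns onto your own field names in the SELECT list. A sketch with hypothetical names, CSV columns Post Date and Like Count feeding table fields PostDate and Likes:
INSERT INTO Posts (PostDate, Likes)
SELECT t.[Post Date], t.[Like Count]
FROM [text;database=C:\Users\[myname]\Desktop\Historical Reports].[Posts.csv] AS t
LEFT JOIN Posts AS p ON t.[Post Date] = p.PostDate
WHERE p.PostDate IS NULL;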
I have been busting my head over this for some time now, without any result. Honestly, I think I need fresh eyes on this query.
I have written a query that deletes data from one table and puts it into another table. What I can't figure out is how to update one column for the rows I am moving, within the same query.
Here is how the query looks:
INSERT table1_archive
SELECT * FROM (
    DELETE table1
    OUTPUT DELETED.*
    WHERE <condition1>
) AS RowsToMove;
What I want is to also run
UPDATE table1 SET <my_column> = '' WHERE <condition1>
Since the delete and the update use the same condition and table, I figured it makes no sense to run two separate queries against exactly the same rows.
What I want is to clear the data out of <my_column> either before moving the rows to table1_archive, or after doing so.
I guess my question is: How would I apply this update statement to the selected rows I am about to insert into the table1_archive?
ANSWER
This question becomes a little redundant, as the UPDATE statement was not necessary to achieve what I wanted. I could just list all my columns in the SELECT statement and replace <my_column> with NULL or ''.
You can simply manipulate the column to be updated in the select statement.
INSERT INTO table1_archive
SELECT Col1, Col2, ..., '' AS <my_column> FROM (
    DELETE table1
    OUTPUT DELETED.*
    WHERE <condition1>
) AS RowsToMove;
You can do this in a single statement - but it requires that you enumerate the columns.
Assuming that your tables have columns (col1, col2, col3, mycol), where mycol should be set to null when copied to the archive, you would write this as:
insert into table1_archive (col1, col2, col3, mycol)
select col1, col2, col3, null
from (
    delete table1
    output deleted.*
    where <condition1>
) as del;
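Alternatively, when the column lists line up, OUTPUT ... INTO can write the deleted rows straight into the archive in a single statement, with no derived table. A sketch under the same (col1, col2, col3, mycol) assumption (note that the target of OUTPUT ... INTO must not have triggers or take part in foreign keys):
delete table1
output deleted.col1, deleted.col2, deleted.col3, null
into table1_archive (col1, col2, col3, mycol)
where <condition1>;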
You can also try an UPDATE with a join, if the goal is just to update the moved rows in place. The first form below is MySQL's multi-table UPDATE, the second is SQL Server's UPDATE ... FROM:
UPDATE example_table1 AS table1,
       example_table2 AS table2
SET
    table1.my_column1 = table2.my_column1,
    table1.my_column2 = table2.my_column2
WHERE table1.ID = table2.ID
  AND table1.my_column3 = <condition1>
UPDATE table1
SET table1.my_column1 = table2.my_column1
FROM example_table1 AS table1
INNER JOIN example_table2 AS table2
    ON table1.ID = table2.ID
WHERE table1.my_column2 = <condition1>
I have searched but can't find an answer for this (maybe I'm using the wrong keywords...).
I ran into this problem today when I needed to create a procedure that calculates data and saves it to two report tables in two different schemas. Let's say the two tables have the same structure.
The query that calculates the data may take more than 60 seconds (and the data may or may not change the result of the SELECT statement if it is run again).
I have two ways to insert the data into those two tables:
Just run the INSERT twice, with the same SELECT query each time.
Use a GTT (global temporary table) to save the calculated data from the SELECT query, then INSERT into both tables from that GTT.
I wonder whether Oracle keeps a cache of the result of the SELECT query, so that the first way would perform better than the second (at the cost of longer, duplicated code that can fall out of sync).
So could anyone confirm and explain the right way to solve this for me? Or a better way of doing it?
Thank you,
Appendix 1:
INSERT INTO report_table (col1, col2, ....)
SELECT .....
FROM .....
--(long query)
;
INSERT INTO center_schema.report_table (col1, col2, ....)
SELECT .....
FROM .....
--same select query as above
;
And 2:
INSERT INTO temp_report_table(col1, col2, ...)
SELECT .....
FROM .....
--(long query)
;
INSERT INTO report_table (col1, col2, ....)
SELECT col1, col2, ....
FROM temp_report_table
;
INSERT INTO center_schema.report_table (col1, col2, ....)
SELECT col1, col2, ....
FROM temp_report_table
;
No, you have a third option - the wonderful multi-insert...
INSERT ALL
INTO report_table (col1, col2, ...)
VALUES (col1, col2, ...)
INTO center_schema.report_table (col1, col2, ...)
VALUES (col1, col2, ...)
SELECT col1, col2, ...
FROM your_table
--(long query)
;
For detailed info on this nice way of loading multiple tables at once, please refer to the multitable INSERT section of the Oracle documentation.
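If you do go the GTT route instead, remember the temporary table is created once, ahead of time, not inside the procedure. A sketch with hypothetical column types, using ON COMMIT DELETE ROWS so the staged rows disappear automatically at commit:
CREATE GLOBAL TEMPORARY TABLE temp_report_table (
    col1 NUMBER,
    col2 VARCHAR2(100)
) ON COMMIT DELETE ROWS;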
I'm trying to append two tables in MS Access at the moment. This is the SQL view of my query so far:
INSERT INTO MainTable
SELECT
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;
Where "University" is the only field name that would have similarities between the two tables. When I try and run the query, I get this error:
Query must have at least one destination field.
I assumed that the INSERT INTO MainTable portion of my SQL was defining the destination, but apparently I am wrong. How can I specify my destination?
You must actually select something in your SELECT statement.
INSERT INTO MainTable
SELECT col1, col2
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;
Besides Luke Ford's answer (which is correct), there's another gotcha to consider:
MS Access (at least Access 2000, where I just tested it) seems to match the columns by name.
In other words, when you execute the query from Luke's answer:
INSERT INTO MainTable
SELECT col1, col2
FROM ...
...MS Access assumes that MainTable has two columns named col1 and col2, and tries to insert col1 from your query into col1 in MainTable, and so on.
If the column names in MainTable are different, you need to specify them in the INSERT clause.
Let's say the columns in MainTable are named foo and bar, then the query needs to look like this:
INSERT INTO MainTable (foo, bar)
SELECT col1, col2
FROM ...
As other users have mentioned, your SELECT statement is empty. If you'd like to select more than just col1 and col2, however, that is possible. If you want to select all columns from the two tables being appended, you can use SELECT *, which selects everything in the tables.
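For example (field names hypothetical; note that with SELECT * over this join, both tables contribute a University column, so it is usually cleaner to list the fields explicitly):
INSERT INTO MainTable
SELECT Table1.*, Table2.SomeField
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;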
I'm working with some code. There are several queries whose effect is, if the row exists with some of the data filled in, then that row is updated with the rest of the data, and if the row does not exist, a new one is created. They look like this:
INSERT INTO table_name (col1, col2, col3)
SELECT %s AS COL1, %s AS COL2, %s AS COL3
FROM ( SELECT %s AS COL1, %s AS COL2, %s AS COL3 ) A
LEFT JOIN table_name B
ON B.COL1 = %s
AND B.COL2 = %s --note: doesn't mention all columns here
WHERE B.id IS NULL
LIMIT 1
I can mimic this pattern and it seems to work, but I'm confused as to what is actually going on behind the scenes. Can anyone elucidate how this actually works? I'm using PostgreSQL.
Are you sure that it is updating using only that piece of code?
What is happening is that you are doing a LEFT JOIN with table_name (the table the new records are inserted into) and filtering for only those rows that don't already exist in that table (WHERE B.id IS NULL).
It is like doing a "not exists", just written in a different way.
I hope my answer helps you.
Regards.
The LEFT JOIN/IS NULL means that the query is INSERTing records that don't already exist. That's assuming the table defined on the INSERT clause is the same as the one in the LEFT JOIN clause - careful about abstracting...
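Worth adding since the question mentions PostgreSQL: on 9.5 and later, the same insert-only-if-absent intent can be written natively with ON CONFLICT, assuming a unique constraint or index on the key columns (hypothetical here):
INSERT INTO table_name (col1, col2, col3)
VALUES (%s, %s, %s)
ON CONFLICT (col1, col2) DO NOTHING;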
I'm interested to know what %s is
How do I append only distinct records from a master table to another table, when the master may have duplicates? Example - I only want the distinct records in the smaller table, but I need to insert/append records to what I already have in the smaller table.
Ignoring any concurrency issues:
insert into smaller (field, ... )
select distinct field, ... from bigger
except
select field, ... from smaller;
You can also rephrase it as a join:
insert into smaller (field, ... )
select distinct b.field, ...
from bigger b
left join smaller s on s.key = b.key
where s.key is NULL
If you don't like NOT EXISTS and EXCEPT/MINUS (cute, Remus!), you have also LEFT JOIN solution:
INSERT INTO smaller(a,b)
SELECT DISTINCT master.a, master.b FROM master
LEFT JOIN smaller ON smaller.a=master.a AND smaller.b=master.b
WHERE smaller.pkey IS NULL
You don't say the scale of the problem so I'll mention something I recently helped a friend with.
He works for an insurance company that provides supplemental dental and vision benefits management for other insurance companies. When they get a new client they also get a new database that can have tens of millions of records. They wanted to identify all possible dupes against the data they already had in a master database of hundreds of millions of records.
The solution we came up with was to identify two distinct combinations of field values (normalized in various ways) that would indicate a high probability of a dupe. We then created a new table containing MD5 hashes of the combos plus the id of the master record they applied to. The MD5 columns were indexed. All new records would have their combo hashes computed and if either of them had a collision with the master the new record would be kicked out to an exceptions file for some human to deal with it.
The speed of this surprised the hell out of us (in a nice way) and it has had a very acceptable false-positive rate.
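A minimal sketch of that layout (all names hypothetical; the MD5s stored as 32-character hex strings):
CREATE TABLE master_hashes (
    master_id  INT      NOT NULL,
    combo1_md5 CHAR(32) NOT NULL,
    combo2_md5 CHAR(32) NOT NULL
);
CREATE INDEX ix_hashes_combo1 ON master_hashes (combo1_md5);
CREATE INDEX ix_hashes_combo2 ON master_hashes (combo2_md5);

-- an incoming record is kicked out for human review if either combo collides
SELECT master_id
FROM master_hashes
WHERE combo1_md5 = '<new combo1 hash>'
   OR combo2_md5 = '<new combo2 hash>';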
You could use the distinct keyword to filter out duplicates:
insert into AnotherTable
(col1, col2, col3)
select distinct col1, col2, col3
from MasterTable
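Note that DISTINCT alone only removes duplicates within MasterTable; it does not skip rows that are already in AnotherTable. To get both, combine it with one of the patterns above, e.g. (key columns hypothetical):
insert into AnotherTable (col1, col2, col3)
select distinct m.col1, m.col2, m.col3
from MasterTable m
left join AnotherTable a
    on a.col1 = m.col1 and a.col2 = m.col2 and a.col3 = m.col3
where a.col1 is null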
Based on Microsoft SQL Server and its Transact-SQL. Untested as always, and this assumes target_table has the same columns as source_table (otherwise, list the column names between INSERT INTO and SELECT). Note that the NOT EXISTS must be correlated to the outer row; an uncorrelated subquery would block all inserts as soon as target_table contains any rows:
INSERT INTO target_table
SELECT DISTINCT s.row1, s.row2
FROM source_table s
WHERE NOT EXISTS (
    SELECT 1
    FROM target_table t
    WHERE t.row1 = s.row1
      AND t.row2 = s.row2
)
Something like this would work for SQL Server (you don't mention what RDBMS you're using):
INSERT INTO table (col1, col2, col3)
SELECT DISTINCT t2.a, t2.b, t2.c
FROM table2 AS t2
WHERE NOT EXISTS (
SELECT 1
FROM table
WHERE table.col1 = t2.a AND table.col2 = t2.b AND table.col3 = t2.c
)
Tune where appropriate, depending on exactly what defines "distinctness" for your table.