Eliminate duplicates automatically from table - sql

The table receives new data every day from a source system, and I want duplicates to be deleted automatically as soon as new data is loaded into the table.
Is this possible in BigQuery?
I tried creating a view named sites_view in BigQuery with the query below:
SELECT DISTINCT * FROM prd.sites
but the duplicates are not deleted automatically.

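The following applies to BigQuery:
Duplicates will not be deleted automatically; BigQuery has no such functionality.
You need a process that removes them as often as you require, or you can query through a view like the one in the question, which hides duplicates at read time but leaves them in the underlying table.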

BigQuery follows an append-only kind of design, so it accepts all the data it is given.
This is one reason it does not enforce primary or unique key constraints, so you can't prevent duplicates from entering the table.
So, you need a process like this:
1.) Create a new table without duplicates from your original table.
(You can use DISTINCT or ROW_NUMBER() for this.)
2.) Drop the original table.
3.) Rename the new table to the original table name.
Let me know if this information helps.
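The three steps above can be sketched as follows. This is only an illustration that uses Python's built-in sqlite3 module as a stand-in, not BigQuery itself; the table name sites comes from the question, while the columns site_id and name, the helper rebuild_without_duplicates, and the _dedup suffix are assumptions made up for the example.

```python
import sqlite3


def rebuild_without_duplicates(conn, table):
    """Replace `table` with a copy that keeps each distinct row once."""
    tmp = f"{table}_dedup"  # hypothetical name for the intermediate table
    # Step 1: create a new table without duplicates.
    conn.execute(f"CREATE TABLE {tmp} AS SELECT DISTINCT * FROM {table}")
    # Step 2: drop the original table.
    conn.execute(f"DROP TABLE {table}")
    # Step 3: rename the new table to the original name.
    conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (site_id INTEGER, name TEXT)")  # assumed columns
conn.executemany(
    "INSERT INTO sites VALUES (?, ?)",
    [(1, "north"), (1, "north"), (2, "south"), (2, "south"), (3, "east")],
)

rebuild_without_duplicates(conn, "sites")
rows = conn.execute("SELECT site_id, name FROM sites ORDER BY site_id").fetchall()
print(rows)
```

Nothing here runs by itself after a load; as the answers say, the rebuild has to be triggered by whatever process or schedule loads the data.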

Related

SQL query change column name

I'm working on some current and archived SQLite databases.
In current versions, a column is named message, but in the archived versions it's named message_id.
The query I'm running is pretty lengthy/complex, and it's just this one column that's changed. Is there any way I can do some kind of CASE EXISTS style query to do this, or am I just going to have to write a separate query?
I would suggest writing a view to access the historical data:
create view v_message_history as
select message_id as message, . . .
from message_archive;
Then you can use the view and the two columns have the same name.
You could also use alter table to rename the column in either the history or current table. I am guessing, though, that you don't want to do that because it might break existing code.
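A minimal sketch of the view approach, run against an in-memory SQLite database. The names v_message_history and message_archive come from the answer; the table message_current, the id column, and the one-line QUERY (standing in for the lengthy query) are assumptions for illustration. The view exposes the archived message_id column under the current name message, so the same query text works against both sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Current layout: the column is called message.
    CREATE TABLE message_current (id INTEGER PRIMARY KEY, message TEXT);
    -- Archived layout: the same data lives in a column called message_id.
    CREATE TABLE message_archive (id INTEGER PRIMARY KEY, message_id TEXT);
    INSERT INTO message_current VALUES (1, 'new');
    INSERT INTO message_archive VALUES (1, 'old');
    -- The view exposes the archived column under the current name.
    CREATE VIEW v_message_history AS
        SELECT id, message_id AS message FROM message_archive;
""")

# Stand-in for the long query; only the source name changes between runs.
QUERY = "SELECT message FROM {source} WHERE id = 1"
current = conn.execute(QUERY.format(source="message_current")).fetchone()[0]
archived = conn.execute(QUERY.format(source="v_message_history")).fetchone()[0]
print(current, archived)
```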

Looking to have a copy of another database that automatically updates and has a few additional columns

I'm very new to database/server management. I'm working with a database that I can't add any columns to, since it interfaces directly with another piece of software and therefore must stay in a very specific format. However, I'd like to be able to add DateCreated and CreatedBy columns to the tables in this database to set up some automatic email updates when new entries are made. To do this, I thought I might be able to keep a copy of the original database that automatically updates when changes are made to the original, and simply add the additional columns to the copy. I'm working in Microsoft SQL Server 2017. If anyone could provide any guidance on the best way to accomplish this, your help would be much appreciated.
Create an extension table that holds the additional columns plus the key value from the original table. Each row in Table 1 should have one or zero rows in Table 2. Use a trigger on Table 1 to insert a row into Table 2 on insert or update.
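The extension-table pattern might look roughly like this. The sketch uses SQLite through Python's sqlite3 module purely to show the shape of the idea; SQL Server's trigger syntax is different, and the names entries, entries_audit, payload, and entries_after_insert are invented for the example. Only the insert case is shown, and CreatedBy is left out because SQLite has no notion of a logged-in user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table 1: stands in for the table the other software owns; left untouched.
    CREATE TABLE entries (entry_id INTEGER PRIMARY KEY, payload TEXT);

    -- Table 2: the extension table, keyed by the original table's key.
    CREATE TABLE entries_audit (
        entry_id     INTEGER PRIMARY KEY REFERENCES entries(entry_id),
        date_created TEXT NOT NULL
    );

    -- Each insert into Table 1 adds the matching row to Table 2.
    CREATE TRIGGER entries_after_insert AFTER INSERT ON entries
    BEGIN
        INSERT INTO entries_audit (entry_id, date_created)
        VALUES (NEW.entry_id, CURRENT_TIMESTAMP);
    END;
""")

conn.execute("INSERT INTO entries (payload) VALUES ('first')")
conn.execute("INSERT INTO entries (payload) VALUES ('second')")
audit = conn.execute(
    "SELECT entry_id, date_created FROM entries_audit ORDER BY entry_id"
).fetchall()
print(audit)
```

The original table's definition never changes, so the other software keeps seeing the format it expects.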

Having placeholder columns when creating new database table

Is it good practice to add some placeholder columns when creating a database table with millions of rows, in case the schema changes later? Is it more efficient to rename a column than to insert a new one?
There are many problems with adding "placeholder" columns to a table.
These columns may take up useless space, and appear "sloppy".
You may create too many columns now, and have columns that will never be used.
You may not create enough columns now, and will end up having to create more anyway.
You don't know what the column data types will be at this time.
Always remember that if a column needs to be added at a later date and will not be used for any of the current rows in the table, you can still keep the table normalized by creating a smaller table that holds this information, then linking the two by the primary key.
Let me know if you have any questions about this. I hope this helps!
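As a rough sketch of the side-table alternative, assuming hypothetical tables customers and customer_loyalty (none of these names appear in the question), the new attribute lives in a small table that shares the primary key, and a LEFT JOIN brings it back when needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The large original table, created without placeholder columns.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');

    -- Added later: holds the new attribute only for rows that have one.
    CREATE TABLE customer_loyalty (
        customer_id INTEGER PRIMARY KEY REFERENCES customers(customer_id),
        tier        TEXT NOT NULL
    );
    INSERT INTO customer_loyalty VALUES (2, 'gold');
""")

rows = conn.execute("""
    SELECT c.customer_id, c.name, l.tier
    FROM customers AS c
    LEFT JOIN customer_loyalty AS l ON l.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)
```

Rows without the new attribute simply have no entry in the side table, so the original table is never rewritten.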

How to create query to update records only when changes occur and to add new records that do not already exist

I am running a query that fetches data (records) from a linked table in another database.
The linked table is populated by users remotely through a form, such as a web form.
I created this piece of code that copies the data from the linked table into a new table, like this:
`INSERT INTO NEW_TBL(ENT_CUS_NUM, ENT_FIRST_NAME, ENT_LAST_NAME, ENT_ADDRESS1, ENT_CITY, ENT_STATE, ENT_ZIP, ENT_PHONE)
SELECT LINK_TBL.CUS_NUM, LINK_TBL.FIRST_NAME, LINK_TBL.LAST_NAME, LINK_TBL.ADDRESS1, LINK_TBL.CITY, LINK_TBL.STATE, LINK_TBL.ZIP, LINK_TBL.PHONE
FROM LINK_TBL`
Is it possible to modify this query so that it inserts new records from the linked table if the record has not already been added, or updates existing records
that have been modified? Example: Let's say a person changes their address. Can I update or bring over only their address without re-inserting their entire record because of an address change?
This is what confuses me: I could write an update statement, but modifying this query so that it brings over new records or updates records with changes is way over my head.
I would appreciate your input and help.
Guy
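If you can designate a field as a unique key, you can use REPLACE INTO instead of INSERT INTO, at least with MySQL. Note that REPLACE deletes the conflicting row and inserts the new one in its place rather than updating individual columns.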
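A small sketch of the REPLACE INTO idea. It runs on SQLite through Python's sqlite3 module, which also accepts REPLACE INTO, as a stand-in for MySQL. The table and column names follow the question, but the column list is trimmed to three columns and the sample rows are made up. It assumes ENT_CUS_NUM can serve as the unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LINK_TBL (CUS_NUM INTEGER, FIRST_NAME TEXT, ADDRESS1 TEXT);
    -- The unique key on ENT_CUS_NUM is what lets REPLACE find the existing row.
    CREATE TABLE NEW_TBL (
        ENT_CUS_NUM    INTEGER PRIMARY KEY,
        ENT_FIRST_NAME TEXT,
        ENT_ADDRESS1   TEXT
    );
    INSERT INTO LINK_TBL VALUES (1, 'Ann', '1 Old Rd'), (2, 'Bob', '2 Elm St');
""")

SYNC = """
    REPLACE INTO NEW_TBL (ENT_CUS_NUM, ENT_FIRST_NAME, ENT_ADDRESS1)
    SELECT CUS_NUM, FIRST_NAME, ADDRESS1 FROM LINK_TBL
"""

conn.execute(SYNC)  # first run: copies both customers

# Later, one customer changes address and a new customer appears.
conn.execute("UPDATE LINK_TBL SET ADDRESS1 = '9 New Ave' WHERE CUS_NUM = 1")
conn.execute("INSERT INTO LINK_TBL VALUES (3, 'Cy', '3 Oak Ln')")

conn.execute(SYNC)  # second run: rewrites rows 1 and 2, adds row 3
rows = conn.execute("SELECT * FROM NEW_TBL ORDER BY ENT_CUS_NUM").fetchall()
print(rows)
```

Running the same statement repeatedly keeps NEW_TBL in step with LINK_TBL without creating duplicate customer rows, although every matched row is rewritten whether or not it changed.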

Is there any way to fake an ID column in NHibernate?

Say I'm mapping a simple object to a table that contains duplicate records and I want to allow duplicates in my code. I don't need to update/insert/delete on this table, only display the records.
Is there a way that I can put a fake (generated) ID column in my mapping file to trick NHibernate into thinking the rows are unique? Creating a composite key won't work because there could be duplicates across all of the columns.
If this isn't possible, what is the best way to get around this issue?
Thanks!
Edit: Query seemed to be the way to go
The NHibernate mapping makes the assumption that you're going to want to save changes, hence the requirement for an ID of some kind.
If you're allowed to modify the table, you could add an identity column (SQL Server naming - your database may differ) to autogenerate unique Ids - existing code should be unaffected.
If you're allowed to add to the database, but not to the table, you could try defining a view that includes a synthetic (calculated) RowNumber column, and using that as the data source to load from. Depending on your database vendor (and the product's handling of views and indexes), this may cause some performance issues.
The other alternative, which I've not tried, would be to map your class to a SQL query instead of a table. IIRC, NHibernate supports having named SQL queries in the mapping file, and you can use those as the "data source" instead of a table or view.
If your data is read-only, one simple way we found was to wrap the query in a view, add a generated GUID column (NEWID() in SQL Server), and build the entity off the view. The result is something like
SELECT NEWID() AS ID, * FROM TABLE
ID then becomes your unique primary key. As stated above, this is only useful for read-only views, since the ID has no meaning after the query.
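The view-based suggestions above (a row-number column or a generated GUID) can be sketched like this. The example uses SQLite through Python's sqlite3 module and needs SQLite 3.25 or later for window functions; the table readings, its columns, and the view readings_with_id are invented for illustration, and the NHibernate mapping itself is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A table with no key and fully duplicated rows.
    CREATE TABLE readings (sensor TEXT, value INTEGER);
    INSERT INTO readings VALUES ('a', 1), ('a', 1), ('b', 2);

    -- The view adds a synthetic id so every row looks unique to the mapper.
    CREATE VIEW readings_with_id AS
        SELECT ROW_NUMBER() OVER (ORDER BY sensor, value) AS id, sensor, value
        FROM readings;
""")

rows = conn.execute(
    "SELECT id, sensor, value FROM readings_with_id ORDER BY id"
).fetchall()
print(rows)
```

The ids are assigned at query time, so they can change when the underlying data changes and should only be used for read-only display.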