I have two databases (say 'A' and 'B'). Changes are made frequently in database 'A', and after a certain period I need to make database 'B' identical to 'A'. Currently I delete and recreate 'B', which seems the easiest option to me. But I'm curious whether there is a way to simply update 'B' from 'A'. I also think that for a large database the delete-and-recreate approach takes too much time. Any advice or solution would be very helpful. Thanks in advance.
I think you are looking for database replication.
Hope this link would be useful.
Long story short, I realized that Hue (Hive/Impala) is not like Microsoft SQL Server, where you can run the following to look for the table of interest.
Select * from information_schema.columns where column_name like '%The_Table_of_Interest%'
1st scenario: Imagine that I know which database I need, and I narrow down to the right table by searching through the tables to find the column of interest.
2nd scenario: I don't even know which database to look in for the right table and, as a result, the column of interest.
I realized that in Hue there is no option to search for a column. All I can see is Table Search!
Having said that, for the two scenarios above there should be a way to find the column of interest.
Scenario 2 is of course difficult to approach; however, the 1st one looks a bit easier.
Now, I did my research and came up with the idea that running some code on the shell command line might help find the target column. However, that requires further investigation in a layer that I am not quite familiar with (the Metastore, etc.).
Therefore, here is my question.
Assume we are discussing the 1st scenario: how can I search for and find a column when I have no knowledge of the tables at all? I cannot guess and try every table to find the one containing the column I am looking for. What would you suggest, and what is your strategy? Thank you in advance. :)
Good Day H2019
Here are some commands that should help you out to explore the different tables that you have access to:
Find a table or a database
show tables like 'ben*';
Look at the table definition
show create table <table>;
Get table information
describe my_table_01;
Get even more information
describe extended table_name;
Get more information in a pretty format
describe formatted table_name;
If you have access to Apache Ranger, I also find it useful to look into table permissions (and see who's using what).
Apache Atlas, if you use it, is helpful for seeing where data comes from. (It keeps data lineage information and may help you understand how things work.)
Don't forget you can look at HDFS to find databases and tables if they're in /hive/warehouse/. This can also be helpful for understanding when things were created.
I was wondering what happens if all data is added to the same table without using any other tables or foreign keys. What effect does that have on performance if the data is very large?
This is purely out of curiosity. Please help me find an answer.
You could put a large amount of data in a single table, but without proper indexing and partitioning (if required), fetching data will be very slow.
If you are using a binary column in the table, it will be even worse.
As a best practice you should follow normalization; if that is not possible, you can use a single table, but you should prepare yourself for the performance cost.
Proper indexing and partitioning will help in such cases.
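To make the indexing point concrete, here is a minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for whatever database is actually in use, and the table and index names are hypothetical. It asks the query planner how it would look up rows in a single large table before and after an index exists on the filtered column.

```python
import sqlite3

# Hypothetical single "everything" table; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE everything (id INTEGER PRIMARY KEY, category TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO everything (category, payload) VALUES (?, ?)",
    [(f"cat{i % 100}", "x" * 50) for i in range(10_000)],
)


def query_plan(connection):
    """Return the planner's description of a lookup by category."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT payload FROM everything WHERE category = ?",
        ("cat7",),
    ).fetchall()
    # The last column of each row is the human-readable plan step.
    return " ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_everything_category ON everything (category)")
plan_with_index = query_plan(conn)     # the planner can now use the index

print(plan_without_index)
print(plan_with_index)
```

The exact plan wording differs between database engines and versions, so treat the printed text as illustrative only; the point is that the second plan mentions the index while the first has to scan the table.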
This question strongly depends on the amount of data (what is 'huge'? 1 GB? 1 TB? 1 PB? ...) and the kind of data you have (unstructured? images? movies? ...)
Obviously, your data would not be stored efficiently; the only "efficiency" is having just one table, if that is something one would want to call efficiency.
To see what would happen, you could simply test it yourself. You might use the AdventureWorks or WideWorldImporters database and then do something like this with a SELECT INTO:
SELECT * INTO TABLETARGET FROM TABLESOURCE
I'm trying to select column names in our database and I'm unsure how. I tried some solutions I found on Stack Social but I haven't been able to make them work.
Is it even possible given the schema? I feel like it should be, but my limited understanding of how to change the FROM clause is preventing me from doing this on my own. Below is a picture to help elaborate on the difficulty I'm having.
Picture of schema and attempts to return table and column names
EDIT: The problem is unique in that the overall layout of our database seems largely different from the norm. There seem to be nested databases and I wasn't sure how to use a specific DB.
The USE statement worked well for this, but it's not intuitive from other answers. At least the picture can help someone in the future if they have a similar problem.
Use RatManreport
GO
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%CD%'
I recently started working at a company with an enormous "enterprisey" application. At my last job, I designed the database, but here we have a whole Database Architecture department that I'm not part of.
One of the stranger things in their database is that they have a bunch of views which, instead of having the user provide the date ranges they want to see, join with a (global temporary) table "TMP_PARM_RANG" with a start and end date. Every time the main app starts processing a request, the first thing it does is "DELETE FROM TMP_PARM_RANG;" followed by an insert into it.
This seems like a bizarre way of doing things, and not very safe, but everybody else here seems ok with it. Is this normal, or is my uneasiness valid?
Update: I should mention that they use transactions and per-client locks, so it is guarded against most concurrency problems. Also, there are literally dozens if not hundreds of views that all depend on TMP_PARM_RANG.
Do I understand this correctly?
There is a view like this:
SELECT * FROM some_table, tmp_parm_rang
WHERE some_table.date_column BETWEEN tmp_parm_rang.start_date AND tmp_parm_rang.end_date;
Then in some frontend a user inputs a date range, and the application does the following:
Deletes all existing rows from TMP_PARM_RANG
Inserts a new row into TMP_PARM_RANG with the user's values
Selects all rows from the view
I wonder if the changes to TMP_PARM_RANG are committed or rolled back, and if so when? Is it a temporary table or a normal table? Basically, depending on the answers to these questions, the process may not be safe for multiple users to execute in parallel. One hopes that if this were the case they would have already discovered that and addressed it, but who knows?
Even if it is done in a thread-safe way, making changes to the database for simple query operations doesn't make a lot of sense. These DELETEs and INSERTs are generating redo/undo (or whatever the equivalent is in a non-Oracle database) which is completely unnecessary.
A simple and more normal way of accomplishing the same goal would be to execute this query, binding the user's inputs to the query parameters:
SELECT * FROM some_table WHERE some_table.date_column BETWEEN ? AND ?;
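As a minimal sketch of the bound-parameter approach, the snippet below runs that query from application code. It assumes SQLite through Python's sqlite3 module purely so the example is runnable; the sample rows, the text date format, and the rows_in_range helper are made up for illustration and are not the application's real code.

```python
import sqlite3

# SQLite stands in for the real database; dates are ISO-8601 text so that
# BETWEEN compares them correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, date_column TEXT)")
conn.executemany(
    "INSERT INTO some_table (date_column) VALUES (?)",
    [("2008-01-15",), ("2008-06-30",), ("2009-03-01",)],
)


def rows_in_range(connection, start_date, end_date):
    """Bind the user's date range directly; nothing is written to the database."""
    return connection.execute(
        "SELECT id, date_column FROM some_table "
        "WHERE date_column BETWEEN ? AND ? ORDER BY date_column",
        (start_date, end_date),
    ).fetchall()


result = rows_in_range(conn, "2008-01-01", "2008-12-31")
print(result)
```

Because the range travels with the query itself, there is no shared parameter table to clear and refill on every request.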
If the database is Oracle, it's possibly a global temporary table; every session sees its own version of the table and inserts/deletes won't affect other users.
There must be some business reason for this table. I've seen views with hardcoded dates that were actually partitioned views, using the dates as the partitioning field. I've also seen joins on a table like this when dealing with daylight saving time; imagine a view that returns all activity which occurred during DST. But none of these would ever delete from and insert into the table... that's just odd.
So either there is a deeper reason for this that needs to be dug out, or it's just something that at the time seemed like a good idea but why it was done that way has been lost as tribal knowledge.
Personally, I'm guessing that it would be a pretty strange occurrence. And from what you are saying, two methods calling the process at the same time could be very interesting.
Typically date ranges are done as filters on a view, and not driven by outside values stored in other tables.
The only justification I could see for this is if there was a multi-step process, that was only executed once at a time and the dates are needed for multiple operations, across multiple stored procedures.
I suppose it would let them support multiple ranges. For example, they can return all dates between 1/1/2008 and 1/1/2009 AND 1/1/2006 and 1/1/2007 to compare 2006 data to 2008 data. You couldn't do that with a single pair of bound parameters. Also, I don't know how Oracle does its query plan caching for views, but perhaps it has something to do with that? With the date columns being checked as part of the view, the server could cache a plan that always assumes the dates will be checked.
Just throwing out some guesses here :)
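The multiple-ranges guess can be illustrated with a small sketch. It assumes SQLite through Python's sqlite3 module as a stand-in database, an ordinary table in place of the global temporary table, and a hypothetical view name (some_view); the real views are not shown in the question.

```python
import sqlite3

# SQLite stands in for the real database; tmp_parm_rang is an ordinary
# table here, and some_view is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, date_column TEXT)")
conn.execute("CREATE TABLE tmp_parm_rang (start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO some_table (date_column) VALUES (?)",
    [("2006-05-01",), ("2007-05-01",), ("2008-05-01",), ("2009-05-01",)],
)
# Two rows in the parameter table mean two date ranges at once.
conn.executemany(
    "INSERT INTO tmp_parm_rang (start_date, end_date) VALUES (?, ?)",
    [("2006-01-01", "2007-01-01"), ("2008-01-01", "2009-01-01")],
)
conn.execute(
    """
    CREATE VIEW some_view AS
    SELECT s.id, s.date_column
    FROM some_table s
    JOIN tmp_parm_rang r
      ON s.date_column BETWEEN r.start_date AND r.end_date
    """
)

matched = [
    row[0]
    for row in conn.execute("SELECT date_column FROM some_view ORDER BY date_column")
]
print(matched)
```

With one row per range in the parameter table, the same view returns the 2006 and 2008 rows together, which a single BETWEEN ? AND ? cannot express.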
Also, you wrote:
"I should mention that they use transactions and per-client locks, so it is guarded against most concurrency problems."
While that may guard against data consistency problems due to concurrency, it hurts when it comes to performance problems due to concurrency.
Do they also add one -in the application- to generate the next unique value for the primary key?
It seems that the concept of shared state eludes these folks, or the reason for the shared state eludes us.
That sounds like a pretty weird algorithm to me. I wonder how it handles concurrency - is it wrapped in a transaction?
Sounds to me like someone just wasn't sure how to write their WHERE clause.
The views are probably used as temp tables. In SQL Server we can use a table variable or a temp table (# / ##) for this purpose. Although creating views is not recommended by experts, I have created lots of them for my SSRS projects because the tables I am working on do not reference one another (NO FK's, seriously!). I have to work around deficiencies in the database design; that's why I am using views a lot.
With the global temporary table (GTT) approach that you say is being used here, the method is certainly safe with regard to a multiuser system, so no problem there. If this is Oracle then I'd want to check that the system either is using an appropriate level of dynamic sampling so that the GTT is joined appropriately, or that a call to DBMS_STATS is made to supply statistics on the GTT.
I'm working with a SQL Server 2000 database that likely has a few dozen tables that are no longer accessed. I'd like to clear out the data that we no longer need to be maintaining, but I'm not sure how to identify which tables to remove.
The database is shared by several different applications, so I can't be 100% confident that reviewing these will give me a complete list of the objects that are used.
What I'd like to do, if it's possible, is to get a list of tables that haven't been accessed at all for some period of time. No reads, no writes. How should I approach this?
MSSQL2000 won't give you that kind of information. But a way you can identify what tables ARE used (and then deduce which ones are not) is to use the SQL Profiler, to save all the queries that go to a certain database. Configure the profiler to record the results to a new table, and then check the queries saved there to find all the tables (and views, sps, etc) that are used by your applications.
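Once the captured queries are available as text, the "deduce which ones are not used" step can be sketched roughly as below. This is only an illustration in Python: the unused_tables helper, the table names, and the sample queries are all hypothetical, and reading the trace table out of the database is left out.

```python
import re


def unused_tables(all_tables, captured_queries):
    """Return tables whose names never appear as a whole word in any captured query."""
    used = set()
    for table in all_tables:
        # Whole-word, case-insensitive match so "Orders" does not match "Orders_Old".
        pattern = re.compile(r"\b" + re.escape(table) + r"\b", re.IGNORECASE)
        if any(pattern.search(query) for query in captured_queries):
            used.add(table)
    return sorted(set(all_tables) - used)


# Hypothetical table list and captured statements.
all_tables = ["Orders", "Customers", "OldAudit", "Order_Archive"]
captured = [
    "SELECT * FROM orders WHERE id = 1",
    "UPDATE [Customers] SET name = 'x'",
    "exec usp_report",
]

candidates = unused_tables(all_tables, captured)
print(candidates)
```

A plain text match like this cannot see tables touched inside stored procedures (the "exec usp_report" line above hides whatever that procedure reads), so the resulting list is a set of candidates to investigate, not a list that is safe to drop.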
Another way I think you might check if there's any "writes" is to add a new timestamp column to every table, and a trigger that updates that column every time there's an update or an insert. But keep in mind that if your apps do queries of the type
select * from ...
then they will receive a new column and that might cause you some problems.
Another suggestion for tracking tables that have been written to is to use Red Gate SQL Log Rescue (free). This tool dives into the log of the database and will show you all inserts, updates and deletes. The list is fully searchable, too.
It doesn't meet your criteria for researching reads into the database, but I think the SQL Profiler technique will get you a fair idea as far as that goes.
If you have lastupdate columns you can check for the writes, but there is really no easy way to check for reads. You could run Profiler, save the trace to a table and check in there.
What I usually do is rename the table by prefixing it with an underscore; when people start to scream, I just rename it back.
If by not used you mean your application has no more references to the tables in question, and you are using dynamic SQL, you could search for the table names in your app. If they don't appear, blow them away.
I've also outputted all sprocs, functions, etc. to a text file and done a search for the table names. If not found, or found in procedures that will need to be deleted too, blow them away.
It looks like using the Profiler is going to work. Once I've let it run for a while, I should have a good list of used tables. Anyone who doesn't use their tables every day can probably wait for them to be restored from backup. Thanks, folks.
Probably too late to help mogrify, but for anybody doing a search: I would search my code for all objects using this object, then search in SQL Server by running this:
select distinct '[' + object_name(id) + ']'
from syscomments
where text like '%MY_TABLE_NAME%'