What happens if I delete the Hive metastore from MySQL? - hive

What happens if I delete the Hive metastore from MySQL? Please provide details for both managed and external tables.

You lose only the metadata; the data itself is unaffected, because deleting the metastore database directly does not touch the files in HDFS (unlike a DROP TABLE issued through Hive, which would also delete a managed table's data). You can then simply re-run the scripts that define your tables. This is true for both managed and external tables.
So deleting (and re-creating) the metastore may be a good way for you to clean up a potentially corrupted metastore.
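As a rough sketch of what "re-running the scripts" could look like, assuming a table called web_logs whose files still sit under /data/web_logs in HDFS (names and paths are illustrative):

    -- External table: point the definition back at the surviving files
    CREATE EXTERNAL TABLE web_logs (
      ts      TIMESTAMP,
      user_id STRING,
      url     STRING
    )
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET
    LOCATION '/data/web_logs';

    -- Re-register the partitions that already exist on HDFS
    MSCK REPAIR TABLE web_logs;

For a managed table the statement is the same minus the EXTERNAL keyword and the LOCATION clause: as long as the files are still in the warehouse directory (e.g. /user/hive/warehouse/web_logs), Hive picks them up, and MSCK REPAIR TABLE re-attaches the partitions.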

Related

Hive Managed vs External table for Updates and Inserts

What are the Pros and Cons of hive external and managed tables?
We want to do updates and inserts in Hive tables, but wonder which approach to take: managed tables, or a workaround that refreshes external tables after manual file updates, especially as many files are added over time. Will one approach or the other become too slow (e.g. too many files or too many updates to track via the metastore, so that the master node becomes slow)?
Thanks.
There are a number of limitations on doing DML in Hive. Please read the documentation for more details: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions. It is generally recommended not to use DML on Hive managed tables, especially if the data volume is huge or the table grows in size over time, because these operations become too slow. They are considerably faster when run against a partition or bucket instead of the full table. Nevertheless, it is better to handle the edits in the files and do a full refresh via an external table, and to use DML on managed tables only as a last resort.
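To make the two options concrete, here is a rough sketch (table and column names are made up, and the ACID route assumes the cluster's transaction settings, e.g. hive.txn.manager, are enabled):

    -- Option 1: transactional (ACID) managed table, required for UPDATE/DELETE
    CREATE TABLE customer_managed (
      id     BIGINT,
      name   STRING,
      status STRING
    )
    CLUSTERED BY (id) INTO 8 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true');

    UPDATE customer_managed SET status = 'inactive' WHERE id = 42;

    -- Option 2: external table; edit or replace the files outside Hive,
    -- then re-sync the partition metadata
    CREATE EXTERNAL TABLE customer_ext (
      id     BIGINT,
      name   STRING,
      status STRING
    )
    PARTITIONED BY (dt STRING)
    STORED AS ORC
    LOCATION '/data/customer';

    MSCK REPAIR TABLE customer_ext;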

Best way to set up a new database on a new server which periodically refreshes tables from a live SQL Server?

I need to create a database solely for analytical purposes. The idea is for it to start off as a 1:1 replica of a current SQL Server database, to which we will then add additional tables. The goal is to have read-write access to a database without inadvertently dropping anything in production.
We would ideally like to set a daily refresh schedule to update all tables in the new database to match the tables in the live environment.
In terms of the DBMS for the new database, I am flexible: MySQL, SQL Server or PostgreSQL would all be fine. I am not hugely familiar with the Google Cloud Storage/BigQuery stack, but if that is an easy option, I'm open to it.
You could use a standard HA/DR solution with a readable secondary (Availability Groups, mirroring or log shipping), and then have a second database on the new server for your additional tables.
Cloud Storage and BigQuery are not RDBMS services themselves, but could be used in this case to store the backups/exports/dumps from the replica, and then have the analytical work performed on those backups.
Here is an example workflow:
Perform a backup and restore in a different database
Add the new tables in the new database
Export the database as a CSV file on your local machine
Either load the CSV file directly into BigQuery, or upload it to a Cloud Storage bucket created beforehand
Query the data
I suggest taking a look at the various methods for loading data into BigQuery, as well as the methods for querying external data sources, which may help you determine which database replication/export method is best for your use case.
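As a sketch of the BigQuery side (dataset, table and bucket names are illustrative), once the CSV export has been uploaded to a Cloud Storage bucket it can be loaded with BigQuery's LOAD DATA statement and then queried like any other table:

    -- Load a CSV exported from the replica into a BigQuery table
    LOAD DATA OVERWRITE analytics.orders
    FROM FILES (
      format = 'CSV',
      skip_leading_rows = 1,
      uris = ['gs://my-analytics-bucket/exports/orders_*.csv']
    );

    -- Query it like any other table
    SELECT customer_id, SUM(total) AS revenue
    FROM analytics.orders
    GROUP BY customer_id;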

Understanding Azure SQL Server External Tables

We are trying to create a cross-database query using Azure's preview Elastic Query. So we will be creating an External Table to make these queries happen.
Unfortunately, I have some apprehension about how the queries will be executed. I don't want a query or stored procedure to fail at run-time because the database connection fails. I just don't understand how the External Tables work.
Azure's External Table docs have good information on how to query and create the table. I just can't find information that specifically spells out how the data exists.
Oracle's version of external tables is just flat files that are referenced. SQL*Loader loads data from external files into tables of an Oracle database. I couldn't find any documentation about Azure doing the same. (Is it implied that they are the same? Is that a stupid question?)
If it is this way (external flat files), when the external table gets updated, does SQL Server update the flat files so our external table stays up to date? Or will I have to delete/create the link again every time I want to run the query for up to date information?
Per Microsoft Support:
Elastic queries basically work as remote queries, which means the data is not stored locally but is pulled from the source database every time you run a query. When you execute a query on an external table, it makes a connection to the source database and gets the data.
With that being said, you do not have to delete/create the links. Once you have performed these steps, you can access the horizontally partitioned table “mytable” as though it were a local table. Azure SQL Database automatically opens multiple parallel connections to the remote databases where the tables are physically stored, processes the requests on the remote databases, and returns the results.
There is no specific risk associated with using this feature; it simply opens connections to the source database so it can pull data. Besides this, you can expect some slowness when executing a remote query, but nothing that will cause other issues with the database.
If any of the databases becomes unavailable, queries that use the affected database as source or target will experience cancellations or timeouts.
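For reference, a minimal sketch of the cross-database (vertical partitioning) setup; server, database, credential and table names are illustrative, and the horizontally partitioned case mentioned above uses a shard map data source instead. The external table stores only this definition, not the data:

    -- In the database that will run the cross-database queries
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

    CREATE DATABASE SCOPED CREDENTIAL ElasticQueryCred
    WITH IDENTITY = 'remote_login', SECRET = '<password>';

    CREATE EXTERNAL DATA SOURCE RemoteCustomerDb
    WITH (
      TYPE = RDBMS,
      LOCATION = 'myserver.database.windows.net',
      DATABASE_NAME = 'CustomerDb',
      CREDENTIAL = ElasticQueryCred
    );

    CREATE EXTERNAL TABLE dbo.Customers (
      CustomerId INT,
      Name NVARCHAR(100)
    )
    WITH (DATA_SOURCE = RemoteCustomerDb);

    -- Each query below connects to CustomerDb at run time and pulls the rows
    SELECT * FROM dbo.Customers WHERE CustomerId = 1;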

SQL Server - Archiving Data (Strategy/Stored Procedure)

Currently we have 100+ databases, some around 10 GB in size with millions of records, and they are growing at an alarming rate. We need to evaluate our archiving strategy.
Does anyone have suggestions and sample scripts that go through all the tables and archive the data into an ARCHIVE database, with everything audited (number of records moved, etc.) and everything rolled back in case of failure?
Regards
Partitioning can help a lot to archive within a single database; the sliding-window scenario is a particular tool for this.
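A rough sketch of the sliding-window idea (table, function and partition names are made up): switching a partition out to an archive table is a metadata-only operation, so even millions of rows move almost instantly:

    -- dbo.Orders_Archive must have an identical schema and a compatible
    -- partition layout; SWITCH moves the whole partition without copying rows
    ALTER TABLE dbo.Orders
    SWITCH PARTITION 1 TO dbo.Orders_Archive PARTITION 1;

    -- Slide the window forward: remove the now-empty boundary, add a new one
    ALTER PARTITION FUNCTION pf_OrdersByMonth()
    MERGE RANGE ('2023-01-01');

    ALTER PARTITION SCHEME ps_OrdersByMonth NEXT USED [PRIMARY];

    ALTER PARTITION FUNCTION pf_OrdersByMonth()
    SPLIT RANGE ('2024-01-01');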
Let me suggest setting up an Admin database. It will hold all the settings and information about archiving.
There may be two SQL Server instances: a Current Server and an Archive Server. They will have the same structure.
The process copies data from the current server to the archive server using settings from the Admin DB. You may need to write dynamic SQL; check sp_MSForEachDB.
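A simplified sketch of one archiving pass for a single table (database, table and log names are made up; the audit is represented by an ArchiveLog table in the Admin DB, and both databases are shown on the same instance - a second instance would need a linked server and a distributed transaction):

    BEGIN TRY
      BEGIN TRANSACTION;

      DECLARE @cutoff DATETIME = DATEADD(YEAR, -2, GETDATE());
      DECLARE @rows INT;

      -- Copy rows older than the cutoff into the archive database
      INSERT INTO ArchiveDb.dbo.Orders (OrderId, CustomerId, OrderDate, Total)
      SELECT OrderId, CustomerId, OrderDate, Total
      FROM   LiveDb.dbo.Orders
      WHERE  OrderDate < @cutoff;

      SET @rows = @@ROWCOUNT;

      -- Remove them from the live table only after the copy succeeded
      DELETE FROM LiveDb.dbo.Orders
      WHERE OrderDate < @cutoff;

      -- Audit the batch in the Admin database
      INSERT INTO AdminDb.dbo.ArchiveLog (TableName, RowsMoved, ArchivedAt)
      VALUES ('LiveDb.dbo.Orders', @rows, GETDATE());

      COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
      -- Any failure rolls back both the copy and the delete
      IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
      THROW;
    END CATCH;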
+1 to the partitioning idea. To add: I think you can also use it if you have Developer edition.

Add Indexes to columns in remote table - Oracle

I am querying a remote database using a DB link. To speed up the query, I am wondering how I can add indexes to a few columns in the remote table.
I would appreciate any recommendations on this.
You could use the DBMS_JOB or DBMS_SCHEDULER package on the remote database to schedule a job that executes the DDL.
But consider this: if Oracle throws an exception for DDL over database links, there must be a good reason for it, right? You don't want anyone messing with your schema remotely over a database link. So instead, talk to the remote DBA and try to figure out a solution with him/her.
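For illustration, a sketch of the DBMS_SCHEDULER route, run on the remote database itself rather than over the DB link (job, index and table names are made up):

    -- Executed on the remote database, e.g. by its DBA
    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name   => 'create_orders_idx_job',
        job_type   => 'PLSQL_BLOCK',
        job_action => 'BEGIN EXECUTE IMMEDIATE
                         ''CREATE INDEX orders_cust_ix ON orders (customer_id)''; END;',
        start_date => SYSTIMESTAMP,
        enabled    => TRUE,
        auto_drop  => TRUE);
    END;
    /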
It can't be done over the DB link (even if your DB link connects as the owning schema); you will see
ORA-02021: DDL operations are not allowed on a remote database
You could create a materialized view in the remote database based on your query, add your preferred indexes to it, and then, if you need to, create a synonym for that materialized view.
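A brief sketch of that approach, with made-up object names; the materialized view and its index live on the remote side, and the local database reads it over the DB link:

    -- On the remote database (needs DDL privileges there)
    CREATE MATERIALIZED VIEW orders_summary_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
    AS
      SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS revenue
      FROM   orders
      GROUP BY customer_id;

    CREATE INDEX orders_summary_mv_ix ON orders_summary_mv (customer_id);

    CREATE SYNONYM orders_summary FOR orders_summary_mv;  -- optional

    -- On the local database, query it through the DB link
    SELECT * FROM orders_summary@remote_db WHERE customer_id = 42;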
John,
A good place to start would be the following Oracle documentation on "Tuning Distributed Queries".
http://download.oracle.com/docs/cd/B28359_01/server.111/b28310/ds_appdev004.htm
You could create the indexes in the remote database and build up your query as a view (in the remote database, of course).
That way the remote database will complete the query using all the access paths it has (like indexes) and bring back only the wanted results.
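A short sketch of that idea (object and DB-link names are made up): the index and the filtering view sit on the remote side, so only the wanted rows cross the link:

    -- On the remote database
    CREATE INDEX orders_cust_ix ON orders (customer_id);

    CREATE OR REPLACE VIEW recent_orders_v AS
      SELECT order_id, customer_id, order_date, total
      FROM   orders
      WHERE  order_date > ADD_MONTHS(SYSDATE, -1);

    -- On the local database: only the filtered rows come back over the link
    SELECT *
    FROM   recent_orders_v@remote_db
    WHERE  customer_id = 42;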