What do the various tables in Hive Metastore contain? - hive

I went through the Hive Metastore database and found that it contains a lot of tables (for example TBLS, DBS, etc.).
I want to know what data these tables store.
I tried to find documentation on the meaning of these tables and their columns, but couldn't find any.

The Hive metastore contains many tables, each with a specific purpose. Hive uses the metastore to store its metadata (database names, table names, columns, data types, etc.). For example, the TBLS table contains data about Hive tables such as table name, table owner, creation time, database ID, etc.
These tables are related to each other through foreign keys, and useful information can be retrieved by querying them with joins.
A sample query to find all database names and their corresponding tables, with column names and types, is shown below.
SELECT DBS.NAME, TBLS.TBL_NAME, COLUMNS_V2.COLUMN_NAME, COLUMNS_V2.TYPE_NAME
FROM TBLS, COLUMNS_V2, SDS, DBS
WHERE TBLS.SD_ID = SDS.SD_ID
  AND COLUMNS_V2.CD_ID = SDS.CD_ID
  AND TBLS.DB_ID = DBS.DB_ID;
ER diagram of the metastore - https://datacadamia.com/_media/db/hive/hive_metastore_er_diagram.png
Some useful queries can be found here - https://analyticsanvil.wordpress.com/2016/08/21/useful-queries-for-the-hive-metastore
I didn't find any specific documentation on this either, but I think this could help.
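To see how the join in this answer walks the foreign keys, here is a minimal sketch that runs the same query against a toy copy of the schema. It assumes an in-memory SQLite database in place of the real metastore backend and creates only the columns the query touches (plus INTEGER_IDX for ordering); the sample rows ('sales', 'orders') are made up, and a real metastore has many more columns and tables.

```python
import sqlite3

# Toy stand-in for the metastore: only the columns the join needs.
# A real metastore lives in an external RDBMS and has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                   SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT,
                         TYPE_NAME TEXT, INTEGER_IDX INTEGER);

-- Made-up sample data: one database, one table, two columns.
INSERT INTO DBS VALUES (1, 'sales');
INSERT INTO SDS VALUES (10, 100);
INSERT INTO TBLS VALUES (1000, 1, 10, 'orders');
INSERT INTO COLUMNS_V2 VALUES (100, 'order_id', 'bigint', 0);
INSERT INTO COLUMNS_V2 VALUES (100, 'amount', 'double', 1);
""")


def list_columns(connection):
    """Return (database, table, column, type) rows, in column order."""
    return connection.execute("""
        SELECT d.NAME, t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
        FROM TBLS t
        JOIN DBS d ON t.DB_ID = d.DB_ID          -- table -> database
        JOIN SDS s ON t.SD_ID = s.SD_ID          -- table -> storage descriptor
        JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID   -- descriptor -> columns
        ORDER BY d.NAME, t.TBL_NAME, c.INTEGER_IDX
    """).fetchall()


rows = list_columns(conn)
for row in rows:
    print(row)
```

The explicit JOIN ... ON form is equivalent to the comma-separated FROM list with WHERE conditions; it just makes each foreign-key hop visible.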

Related

How to assign the IDs to the referring table and how to display this correctly? (SSMS)

I am creating an audit plan using an ERD. As the image below shows, there is a permissions table with four FK columns referring to the PK columns of four other tables. I am confused about how the IDs will relate to the other tables and how they will show up correctly in the permissions table.
For the Users table, I imported the data from master.sys.server_principals.
For the Instance table, I imported the data by using @@SERVERNAME.
For the Databases table, I imported the data from master.sys.databases.
For the Object Types table, I imported the data from master.sys.objects.
I am now on the permissions table and stuck, because I am wondering how the IDs from the four other tables (mentioned above and shown in the image link below) will match up in this permissions table. I know I need to query master.sys.database_permissions to get the information for the columns 'Permissions_Permission_Name' and 'Permissions_Object_Name'; it's just the other four ID columns that I am confused about (you can ignore the column Permissions_ID).
I'm going to use the Answer field, because there is not enough space in the Comment editor. This answers only part of your question: I can relate two of the four tables (Databases and Users) to system tables.
First and foremost: when filling in IDs, you would generate the other table records first, keep the identity IDs that were generated, and finally create a new Permission record and fill in the correct IDs there, in each ID field. That holds for any such change when a table contains references to other tables. I suppose you know that.
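The "parents first, then the referring row" order can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database rather than SQL Server, the table names and the four FK column names are hypothetical (only Permissions_ID, Permissions_Permission_Name and Permissions_Object_Name come from the question), and the sample values are made up. In SQL Server the generated key would typically be captured with SCOPE_IDENTITY() or an OUTPUT clause instead of lastrowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
CREATE TABLE Users       (User_ID INTEGER PRIMARY KEY, User_Name TEXT);
CREATE TABLE Instances   (Instance_ID INTEGER PRIMARY KEY, Instance_Name TEXT);
CREATE TABLE Databases   (Database_ID INTEGER PRIMARY KEY, Database_Name TEXT);
CREATE TABLE ObjectTypes (Object_Type_ID INTEGER PRIMARY KEY, Object_Type_Name TEXT);
-- The four FK column names below are hypothetical.
CREATE TABLE Permissions (
    Permissions_ID INTEGER PRIMARY KEY,
    User_ID INTEGER NOT NULL REFERENCES Users (User_ID),
    Instance_ID INTEGER NOT NULL REFERENCES Instances (Instance_ID),
    Database_ID INTEGER NOT NULL REFERENCES Databases (Database_ID),
    Object_Type_ID INTEGER NOT NULL REFERENCES ObjectTypes (Object_Type_ID),
    Permissions_Permission_Name TEXT,
    Permissions_Object_Name TEXT
);
""")


def insert_parent(table, name_column, name):
    """Insert one parent row and return the ID the database generated."""
    # table/name_column are fixed literals in this sketch, never user input.
    cur = conn.execute(f"INSERT INTO {table} ({name_column}) VALUES (?)", (name,))
    return cur.lastrowid


# Step 1: create the parent rows and keep their generated IDs.
user_id = insert_parent("Users", "User_Name", "alice")
instance_id = insert_parent("Instances", "Instance_Name", "SQL01")
database_id = insert_parent("Databases", "Database_Name", "SalesDB")
object_type_id = insert_parent("ObjectTypes", "Object_Type_Name", "USER_TABLE")

# Step 2: create the Permissions row, filling each FK with the kept ID.
conn.execute(
    """INSERT INTO Permissions (User_ID, Instance_ID, Database_ID, Object_Type_ID,
                                Permissions_Permission_Name, Permissions_Object_Name)
       VALUES (?, ?, ?, ?, ?, ?)""",
    (user_id, instance_id, database_id, object_type_id, "SELECT", "dbo.Orders"),
)

# Reading it back: the IDs resolve to names through joins.
resolved = conn.execute("""
    SELECT u.User_Name, i.Instance_Name, d.Database_Name,
           o.Object_Type_Name, p.Permissions_Permission_Name
    FROM Permissions p
    JOIN Users u       ON u.User_ID = p.User_ID
    JOIN Instances i   ON i.Instance_ID = p.Instance_ID
    JOIN Databases d   ON d.Database_ID = p.Database_ID
    JOIN ObjectTypes o ON o.Object_Type_ID = p.Object_Type_ID
""").fetchone()
print(resolved)
```

When the parent tables are already populated, the same IDs would be looked up by name instead of generated, for example with a SELECT on Users before inserting the permission row.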
The issue is that your structure differs from the system tables. You will need more "permission" records than master.sys.database_permissions holds, because SQL Server registers these as permissions per principal (role), not permissions per user.
I solved two of the four:
The user is connected to a principal role via master.sys.database_role_members. The ID of the user's role can be found in your source as master.sys.database_permissions.grantee_principal_id, and the users that have this principal_id are listed in master.sys.database_role_members.
Your permission on a database (ONE database) is defined in your Permission record. The database name in this record should map to a database on your server. In that database, you will find database_permissions.sys.server_principals. The users that have the permissions are (again) found in master.sys.database_role_members.
I'm not sure what you intend to do with the other two tables, Instances and Object Types.
See the Microsoft docs on the subject at https://learn.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-database-permissions-transact-sql?view=sql-server-ver15

ETL with PLSQL procedures - How to combine two similar schemas?

Consider two similar database schemas, A and B.
A has tables like
Table_Employee: NAME, PHONE_NUMBER, etc.
and B has similar tables like
Table_Worker: NAME, PHONE, etc.
There are still minor differences though, such as different formats of phone numbers or maybe other data that is missing in one schema but could be generated from entries that already exist.
Now I would like to combine the entries of both schemas and create a new one (C for example). Basically, I would like to find out how to do this with PLSQL procedures, just combining both schemas, while invoking minimal transformations on certain entries so that everything is homogeneous again and all entries of schema A and B exist in C.
I could also transform the data on the spot and then merge everything together or use two extra schemas but I don't think that is how it's done in practice and my minimal PLSQL knowledge is not helping.
So how could I do this kind of data mapping in PLSQL?

Azure Table Sharing Partition Key Across Tables

I'm hoping to keep common data from the same user partitioned together. Normally I'd just use the same partition key to accomplish that, but in this case the data is in different tables, e.g. users, photos, friends, etc.
I haven't seen it explicitly stated, but I assume that even if I use the same partition key across tables I won't be able to accomplish this. Can anyone confirm or disprove?
Data with the same partition key but in different tables has no guarantee of being on the same server. If you check out the Storage Table Design Guide, particularly the section titled 'Table Partitions', you'll find 'The account name, table name and PartitionKey together identify the partition within the storage service where the table service stores the entity.' That guide may help you clarify this question and anything related.

User defined data types in T-sql

I had a look at the tables I am using in my database via Rapid SQL. In one of the tables I was checking the columns used. I ran sp_help table_name to find the details of the columns, data types, and other things. I found some data types such as T_gender_domain, T_name_domain, T_phone_domain. Are these user-defined data types?

How to Merge Multiple Database files in SQLite?

I have multiple database files in multiple locations with identical structure. I understand the ATTACH command can be used to connect multiple files to one database connection; however, this treats them as separate databases. I want to do something like:
SELECT uid, name FROM ALL_DATABASES.Users;
Also,
SELECT uid, name FROM DB1.Users UNION SELECT uid, name FROM DB2.Users ;
is NOT a valid answer because I have an arbitrary number of database files that I need to merge. Lastly, the database files must stay separate. Does anyone know how to accomplish this?
EDIT: an answer gave me the idea: would it be possible to create a view which is a combination of all the different tables? Is it possible to query for all database files and which databases they 'mount' and then use that inside the view query to create the 'master table'?
Because SQLite imposes a limit on the number of databases that can be attached at one time, there is no way to do what you want in a single query.
If the number can be guaranteed to be within SQLite's limit (which violates the definition of "arbitrary"), there's nothing that prevents you from generating a query with the right set of UNIONs at the time you need to execute it.
To support truly arbitrary numbers of tables, your only real option is to create a table in an unrelated database and repeatedly INSERT rows from each candidate:
ATTACH DATABASE '/path/to/candidate/database' AS candidate;
INSERT INTO some_table (uid, name) SELECT uid, name FROM candidate.Users;
DETACH DATABASE candidate;
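The attach / insert / detach cycle above can be driven from a loop, so the number of source files does not matter. Below is a minimal sketch using Python's sqlite3 module; the function name merge_users, the target table all_users, and its source_file column are hypothetical, and the demo files are created in a temporary directory purely for illustration. The connection is opened in autocommit mode (isolation_level=None) so that no transaction is open when DETACH runs.

```python
import os
import sqlite3
import tempfile


def merge_users(target_path, source_paths):
    """Copy Users rows from each source file into one table in the target DB."""
    # Autocommit mode: DETACH fails if a transaction is still open.
    conn = sqlite3.connect(target_path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS all_users "
        "(source_file TEXT, uid INTEGER, name TEXT)"
    )
    for path in source_paths:
        # Only one extra database is attached at a time, so the
        # attached-database limit is never approached.
        conn.execute("ATTACH DATABASE ? AS candidate", (path,))
        try:
            conn.execute(
                "INSERT INTO all_users (source_file, uid, name) "
                "SELECT ?, uid, name FROM candidate.Users",
                (path,),
            )
        finally:
            conn.execute("DETACH DATABASE candidate")
    return conn


# Demo with three throwaway database files.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"db{i}.sqlite")
        src = sqlite3.connect(path)
        src.execute("CREATE TABLE Users (uid INTEGER, name TEXT)")
        src.execute("INSERT INTO Users VALUES (?, ?)", (i, f"user{i}"))
        src.commit()
        src.close()
        paths.append(path)

    merged = merge_users(":memory:", paths)
    merged_rows = merged.execute(
        "SELECT uid, name FROM all_users ORDER BY uid"
    ).fetchall()
    merged.close()

print(merged_rows)
```

The source files are only read, so they stay separate; the merged table is a copy and has to be rebuilt or refreshed when the sources change.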
Some cleverness in the schema would take care of this.
You will generally have 2 types of tables: reference tables, and dynamic tables.
Reference tables have the same content across all databases, for example country codes, department codes, etc.
Dynamic data is data that will be unique to each DB, for example time series, sales statistics, etc.
The reference data should be maintained in a master DB, and replicated to the dynamic databases after changes.
The dynamic tables should all have a column for DB_ID, which would be part of a compound primary key; for example, your time series might use db_id, measurement_id, time_stamp. You could also use a hash on DB_ID to generate primary keys, using the same PK generator for all tables in a DB. When merging these from different DBs, the data will be unique.
So you will have 3 types of databases:
Reference master -> replicated to all others
individual dynamic -> replicated to full dynamic
full dynamic -> replicated from reference master and all individual dynamic.
Then it is up to you how you do this replication: pseudo-realtime, or brute force (truncate and rebuild the full dynamic database every day or as needed).
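To illustrate the DB_ID idea, here is a small sketch of a "full dynamic" table whose compound primary key includes db_id. The table name time_series, the value column, and the sample rows are assumptions for illustration; only the key columns db_id, measurement_id, time_stamp come from the answer. Two source databases reuse the same local measurement_id and time_stamp, and the rows still merge without a key collision.

```python
import sqlite3

full = sqlite3.connect(":memory:")
full.execute("""
    CREATE TABLE time_series (
        db_id          INTEGER NOT NULL,  -- which individual DB the row came from
        measurement_id INTEGER NOT NULL,
        time_stamp     TEXT    NOT NULL,
        value          REAL,
        PRIMARY KEY (db_id, measurement_id, time_stamp)
    )
""")

# Rows as they might arrive from two individual databases. Both use
# measurement_id 7 at the same time stamp; only db_id differs.
from_db_1 = [(1, 7, "2024-01-01T00:00", 20.5)]
from_db_2 = [(2, 7, "2024-01-01T00:00", 18.0)]

insert_sql = "INSERT INTO time_series VALUES (?, ?, ?, ?)"
full.executemany(insert_sql, from_db_1 + from_db_2)
merged_count = full.execute("SELECT COUNT(*) FROM time_series").fetchone()[0]

# Re-inserting a row that is already present violates the compound key,
# which is what guards against loading the same source twice.
try:
    full.execute(insert_sql, from_db_1[0])
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(merged_count, duplicate_rejected)
```

With the brute-force approach (truncate and rebuild), the duplicate check never triggers; with incremental replication it is the safety net.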