How do I retrieve all databases' tables and columns in Hive - hive

I want to fetch all databases along with their related tables and column information. Obviously I could do it from the metastore, but I don't have access to it. So is there any other way, instead of querying each database one by one?

You'd need Python, but I did it with this:
# run_hive_query is my own helper; it runs a HiveQL statement and returns the
# result as a pandas DataFrame.
databases = run_hive_query('show schemas')
databases = list(databases.database_name)

schema = {'DB': [],
          'Table': [],
          'Column': [],
          'DataType': []}

for db in databases:
    tables = run_hive_query('show tables from ' + db)
    tables = list(tables.tab_name)
    for tb in tables:
        try:
            columns = run_hive_query('desc ' + db + '.' + tb)
            print(db + ' ' + tb)
        except Exception:
            print('failed ' + db + ' ' + tb)
            continue  # skip tables we cannot describe
        try:
            # each row of the DESCRIBE output is (column name, data type, comment)
            for x in range(columns.shape[0]):
                schema['DB'].append(db)
                schema['Table'].append(tb)
                schema['Column'].append(columns.iloc[x][0])
                schema['DataType'].append(columns.iloc[x][1])
        except Exception:
            print('failed ' + db + ' ' + tb)

You should be able to run the following commands. You could script them up to run across all databases and all tables.
SHOW DATABASES;
SHOW TABLES;
DESCRIBE <table_name>;
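If you are on Hive 3.0 or later, the information_schema database may also be available, which would avoid looping over databases entirely. A sketch, assuming that schema is enabled on your cluster:
-- One query covering all databases, tables, and columns (Hive 3.0+ information_schema)
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns;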

Related

Newly created column not registering in Case statement - 'Object not found'

I am combining a group of columns to create the specs for each line item. I will then compare these specs to the specs from an Excel sheet that have an ID with them, so in the end the ID will be associated with the correct items in the database. To do this, I create a new column called SpecKey and then write a CASE statement to compare. Below is some simplified code to show what I am doing. Is there a reason why I cannot use SpecKey in the CASE statement? And is there a workaround or better way to do this?
When I swap SpecKey with an already existing column in the database, the code compiles. But using the newly created column, SpecKey, seems to be the problem. Here is the error I receive in DBeaver: 'SQL Error [42501]: UCAExc:::5.0.1 user lacks privilege or object not found: FULL.SPECKEY'
SQL Code:
SELECT Flat_Size + '|' + Finish_Size + '|' + STD_PK + '|' + STD_PK_QUANTITY as SpecKey,
CASE WHEN FULL.SpecKey IS '4x8|4x4|Each|1' THEN '12345ID'
ELSE NULL
END AS TemplateID
FROM `Table`
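One likely explanation, assuming standard SQL scoping rules: a column alias defined in the SELECT list is not visible to other expressions in that same SELECT, so SpecKey does not exist yet when the CASE expression is evaluated and the engine looks for a real column by that name. A minimal sketch of the usual workaround, wrapping the expression in a derived table (table and column names taken from the question, and using = rather than IS for the comparison):
SELECT t.SpecKey,
       CASE WHEN t.SpecKey = '4x8|4x4|Each|1' THEN '12345ID'
            ELSE NULL
       END AS TemplateID
FROM (
    SELECT Flat_Size + '|' + Finish_Size + '|' + STD_PK + '|' + STD_PK_QUANTITY AS SpecKey
    FROM `Table`
) AS t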

RDSdataService execute_statement returns (BadRequestException)

I am using the boto3 library with executeStatement to get data from an RDS cluster using the Data API.
The query works fine if I select one or two columns, but as soon as I select another column, it returns an error: (BadRequestException) permission denied for relation table_name.
I have checked using pgAdmin that the permissions are intact to query the whole database for the user I am using.
The function used in the call:
def execute_query(self, sql_query, sql_parameters=[]):
    """
    Aurora DataAPI execute query. Generally used for select statements.
    :param sql_query: Query
    :param sql_parameters: parameters in sql query
    :return: DataApi response
    """
    client = self.api_access()
    response = client.execute_statement(
        resourceArn=RESOURCE_ARN,
        secretArn=SECRET_ARN,
        database='db_name',
        sql=sql_query,
        includeResultMetadata=True,
        parameters=sql_parameters)
    return response
Function call that works with no errors:
query = '''
SELECT id
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Function call that fails with the above error:
query = '''
SELECT id,name,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Is there a horizontal limit on what we can get from the Data API using boto3? I know there is a 1 MB limit, but according to the documentation it should still return something if the result exceeds that limit.
The backend is Postgres RDS.
UPDATE:
I can select the same column several times and it's not a problem:
query = '''
SELECT id,event,event,event,event,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
So this means there are some columns that I cannot select.
I didn't know there was column-level security on some of the tables. If column-level security is set in Postgres for the user you are using, it's obvious that you cannot select those columns.
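If column-level grants are the suspect, they can be checked directly in Postgres. A small sketch, assuming the Data API connects as a user named data_api_user (placeholder name):
-- List the column-level grants for the Data API user
SELECT column_name, privilege_type
FROM information_schema.column_privileges
WHERE grantee = 'data_api_user'
  AND table_schema = 'schema_name'
  AND table_name = 'table_name';

-- If appropriate, grant access to the missing columns
GRANT SELECT (id, name, event) ON schema_name.table_name TO data_api_user;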

Create SQL table from parquet files

I am using R to handle large datasets (the largest data frame is 30,000,000 x 120). These are stored in Azure Data Lake Storage as parquet files, and we need to query them daily and restore them in a local SQL database. Parquet files can be read without loading the data into memory, which is handy. However, creating SQL tables from parquet files is more challenging, as I'd prefer not to load the data into memory.
Here is the code I used. Unfortunately, this is not a perfect reprex, as the SQL database needs to exist for this to work.
# load packages
library(tidyverse)
library(arrow)
library(sparklyr)
library(DBI)

# Create test data
test <- data.frame(matrix(rnorm(20), nrow = 10))

# Save as parquet file
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(test, parquet_path)

# Load main table without reading it into memory (memory = FALSE)
sc <- spark_connect(master = "local", spark_home = spark_home_dir())
test <- spark_read_parquet(sc, name = "test_main", path = parquet_path,
                           memory = FALSE, overwrite = TRUE)

# Save into SQL table
DBI::dbWriteTable(conn = connection,
                  name = DBI::Id(schema = "schema", table = "table"),
                  value = test)
Is it possible to write a SQL table without loading parquet files into memory?
I lack experience with T-SQL bulk import and export, but this is likely where you'll find your answer.
library(arrow)
library(DBI)

test <- data.frame(matrix(rnorm(20), nrow = 10))
f <- tempfile(fileext = '.parquet')
write_parquet(test, f)

# Upload the table using bulk insert
dbExecute(connection,
          paste0("
            BULK INSERT [database].[schema].[table]
            FROM '", gsub('\\\\', '/', f), "' WITH (FORMAT = 'PARQUET');
          "))
Here I use T-SQL's own BULK INSERT command.
Disclaimer: I have not yet used this command in T-SQL, so it may be riddled with errors. For example, I can't see a place to specify snappy compression in the documentation, although it can be specified if one instead defines a custom file format with CREATE EXTERNAL FILE FORMAT.
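For reference, a rough sketch of such a custom file format (the name is a placeholder; this assumes the target supports PolyBase external file formats):
CREATE EXTERNAL FILE FORMAT parquet_snappy
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);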
Now, the above only inserts into an existing table. For your specific case, where you'd like to create a new table from the file, you would likely be looking for OPENROWSET combined with CREATE TABLE AS [select statement].
column_definition <- paste(names(column_defs), column_defs, collapse = ', ')

dbExecute(connection,
          paste0("CREATE TABLE MySqlTable
                  AS
                  SELECT *
                  FROM OPENROWSET(
                      BULK '", f, "', FORMAT = 'PARQUET'
                  ) WITH (
                      ", column_definition, "
                  ) AS parquet_rows;"))
where column_defs would be a named list or vector giving the SQL data-type definition for each column. A (more or less) complete translation from R data types to T-SQL data types is available on the T-SQL documentation page (note two very necessary translations: Date and POSIXlt are not present). Once again, a disclaimer: my time in T-SQL did not get as far as BULK INSERT or similar.

Pyodbc and Access with query parameter that contains a period

I recently found a bug with some Access SQL queries that I can't seem to track down. I have a fairly straightforward SQL query that I use to retrieve data from an Access database that's "managed" in an older application (i.e. the data is already in the database and I have no real control over what's in there).
import pyodbc

MDB = '******.MDB'
DRV = '{Microsoft Access Driver (*.mdb)}'
PWD = ''
con = pyodbc.connect('DRIVER={};DBQ={};PWD={}'.format(DRV, MDB, PWD))

sql = ('SELECT Estim.PartNo, Estim.Descrip, Estim.CustCode, Estim.User_Text1, Estim.Revision, ' +
       'Estim.Comments, Routing.PartNo AS RPartNo, Routing.StepNo, Routing.WorkCntr, Routing.VendCode, ' +
       'Routing.Descrip AS StepDescrip, Routing.SetupTime, Routing.CycleTime, ' +
       'Routing.WorkOrVend, ' +
       'Materials.PartNo as MatPartNo, Materials.SubPartNo, Materials.Qty, ' +
       'Materials.Unit, Materials.TotalQty, Materials.ItemNo, Materials.Vendor ' +
       'FROM (( Estim ' +
       'INNER JOIN Routing ON Estim.PartNo = Routing.PartNo ) ' +
       'INNER JOIN Materials ON Estim.PartNo = Materials.PartNo )')

# kwargs comes from the surrounding Flask-RESTful method (see below)
if 'PartNo' in kwargs:
    key = kwargs['PartNo']
    sql = sql + ' WHERE Estim.PartNo=?'
    cursor = con.cursor().execute(sql, key)

# use this for debugging only
num = 0
for row in cursor.fetchall():
    num += 1
return num
This works fine for all PartNo except when PartNo contains a decimal point. Curiously, when PartNo contains a decimal point AND a hyphen, I get the appropriate record(s).
kwargs['PartNo'] = "100.100-2" # returns 1 record
kwargs['PartNo'] = "200.100" # returns 0 records
Both PartNos exist when viewed in the other application, so I know there should be records returned for both queries.
My first thought was to ensure kwargs['PartNo'] is a string, key = str(kwargs['PartNo']), with no change.
I also tried to place quotes around the 'PartNo' value, with no success: key = '\'' + kwargs['PartNo'] + '\''
Finally, I tried to escape the . with no success (I realize this would break most queries, but I'm just trying to track down the issue with a single period): key = str(kwargs['partNo']).replace('.', '"."')
I know using query parameters should handle all the escaping for me, but at this point, I'm just trying to figure out what's going on. Any thoughts on this?
So the issue isn't with the query parameters - everything works as it should. The problem is with the SQL statement. I incorrectly assumed - and never checked - that there was a record in the Materials table that matched PartNo.
INNER JOIN Materials ON Estim.PartNo = Materials.PartNo
will only return a record if PartNo is found in both tables, which in this particular case it is not.
Changing it to
LEFT OUTER JOIN Materials ON Estim.PartNo = Materials.PartNo
produces the expected results. See this for info on JOINS. https://msdn.microsoft.com/en-us/library/bb243855(v=office.12).aspx
As for print(repr(key)): Flask handles the kwarg type upstream properly,
api.add_resource(PartAPI, '/api/v1.0/part/<string:PartNo>')
so when I ran this in the browser, I got the "full length" strings. When run on the command line using python -c ....... I was not handling the argument type properly, as Gord pointed out, so it was truncating the trailing zeros. I didn't think the Flask portion was relevant, so I never added it to the original question.

ORACLE - comment all columns of all tables

I want to add a comment to every column of every table where the column has a foreign key to one specific table.
I know how to comment them one by one, but there are many fields.
SELECT
'COMMENT ON COLUMN ' as command1,
SYS.ALL_TAB_COLUMNS.OWNER,
'.' as command2,
SYS.ALL_TAB_COLUMNS.TABLE_NAME,
'.' as command3,
SYS.ALL_TAB_COLUMNS.COLUMN_NAME,
' is ''#Enumeration=boleano' as coment_to_add,
SYS.ALL_COL_COMMENTS.COMMENTS,
''';' as command5
FROM
SYS.ALL_TAB_COLUMNS
INNER JOIN SYS.ALL_COL_COMMENTS ON SYS.ALL_TAB_COLUMNS.COLUMN_NAME = SYS.ALL_COL_COMMENTS.COLUMN_NAME AND SYS.ALL_TAB_COLUMNS.TABLE_NAME = SYS.ALL_COL_COMMENTS.TABLE_NAME AND SYS.ALL_TAB_COLUMNS.OWNER = SYS.ALL_COL_COMMENTS.OWNER
WHERE
SYS.ALL_TAB_COLUMNS.OWNER LIKE '$MY_OWNER'
The result, exported to a txt file, is the script.
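If the goal is only the columns that participate in a foreign key referencing one specific table, the constraint views can be joined in. A sketch, with MY_OWNER and MY_TARGET_TABLE as placeholders:
SELECT 'COMMENT ON COLUMN ' || acc.owner || '.' || acc.table_name || '.' || acc.column_name ||
       ' IS ''#Enumeration=boleano'';' AS ddl
FROM all_constraints fk
JOIN all_constraints pk
  ON pk.owner = fk.r_owner
 AND pk.constraint_name = fk.r_constraint_name
JOIN all_cons_columns acc
  ON acc.owner = fk.owner
 AND acc.constraint_name = fk.constraint_name
WHERE fk.constraint_type = 'R'       -- 'R' = referential (foreign key) constraints
  AND fk.owner = 'MY_OWNER'
  AND pk.table_name = 'MY_TARGET_TABLE';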