How to get the partition column names for a table?

I have a table that is partitioned on one or more columns. I can do ...
SHOW PARTITIONS table_db.table_1
which gives a list of all partitions like this,
year=2007
year=2015
year=1999
year=1993
but I am only interested in finding which columns the table is partitioned on, in this case, year. And I would like to be able to do this for multiple tables at once, giving me a list of their names and partition columns, somewhat like this:
table_name  partition_col
table_1     year
table_2     year, month
I tried the solutions here...
https://docs.aws.amazon.com/athena/latest/ug/querying-glue-catalog.html#querying-glue-catalog-listing-partitions
SELECT * FROM table_db."table_1$partitions"
which does give me results with one column for each partition key...
# year
1 2007
2 2015
3 1999
4 1993
...but I couldn't extract the column names from this query.

Try this.
SELECT table_name,
array_join(array_agg(column_name), ', ') as partition_col
FROM information_schema.columns
WHERE extra_info = 'partition key'
GROUP BY 1
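If you have several databases, the same query can be scoped by schema and table; a sketch, assuming the database from the question is named table_db:
SELECT table_name,
array_join(array_agg(column_name), ', ') as partition_col
FROM information_schema.columns
WHERE table_schema = 'table_db' -- assumed database name
AND extra_info = 'partition key'
AND table_name IN ('table_1', 'table_2')
GROUP BY 1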

Get the table metadata with the AWS client for your language, e.g. boto3 (Athena) for Python:
import boto3

client = boto3.client("athena")

# placeholders -- substitute your own catalog, database, and table
catalog, database, name = "AwsDataCatalog", "table_db", "table_1"

table = client.get_table_metadata(
    CatalogName=catalog,
    DatabaseName=database,
    TableName=name,
)["TableMetadata"]

# list of {"Name": ..., "Type": ...} dicts; the Names are the partition columns
partition_keys = [key["Name"] for key in table["PartitionKeys"]]

It seems this solution is for MySQL, not SQL Server.

Related

Run SQL for other tables if one does not exist (ignore missing tables, do not fail)

I have many tables in BigQuery. I need to create a table with the names of the people in each table who are older than 20.
I created something like this, but it fails if one of the tables does not exist.
(I am running it for different projects, and their tables differ slightly; for example, one of the projects does not have tableA.)
WITH
a AS (
  SELECT name
  FROM `tableA`
  WHERE age > 20
),
b AS (
  SELECT name
  FROM `tableB`
  WHERE age > 20
)
SELECT name FROM a
UNION ALL
SELECT name FROM b
How can I prevent the failure, i.e. if a table exists, find the people older than 20, and otherwise ignore it and carry on with the other tables?
(This is an Airflow task which fails)
As I understand it, you have one Composer environment and want to use the BigQueryOperator() to query data in 3 different projects.
I am assuming you have already created your Composer environment in your project. Then you can follow the steps below:
1) Create 3 different connections between your Composer environment and each project you will query against, as described here.
2) Create a specific query for each project, where you filter age > 20 and append all the tables together, so that the tables are addressed properly for each project.
3) Create one DAG file with 3 BigQueryOperators, each referencing a particular connection and using the appropriate query created in 2). DAG creation is described here, and the operator would look as follows:
task_custom = bigquery_operator.BigQueryOperator(
    task_id='task_custom_connection_school1',
    # placeholder query -- substitute the per-project query from step 2
    sql='SELECT name FROM `project.dataset.tableA` WHERE age > 20',
    use_legacy_sql=False,
    # Set a connection ID to use a connection that you created in step 1.
    bigquery_conn_id='my_gcp_connection')
As an alternative, you can create multiple DAG files, one for each connection. Note that the connection name is specified with bigquery_conn_id.
Following the above steps, each of your queries is tailored to its project, so they will execute properly.
A BigQuery-only solution would be to use wildcard tables.
There are several options here:
If all the tables in the dataset are being queried:
SELECT name
FROM `project.dataset.*`
WHERE age > 20
If the table names in the dataset are known:
SELECT name
FROM `project.dataset.*`
WHERE age > 20
AND _TABLE_SUFFIX IN ('tableA', 'tableB', ..., 'tableN')
If the tables in the dataset conform to a specific naming pattern:
SELECT name
FROM `project.dataset.*`
WHERE age > 20
AND _TABLE_SUFFIX LIKE 'table%'
The LIKE operator (combined with logical operators) on the _TABLE_SUFFIX field gives a lot of freedom to match table-name patterns without explicitly listing every table name, as the IN operator requires. If no table matches the specified _TABLE_SUFFIX (i.e. it is not listed in the IN operator's array), the query returns 0 results instead of failing.
More details about querying wildcard tables in the BigQuery documentation.
Note that non-matching schemas could cause some issues, so you might want to verify that the matched tables have the right schema with a query against the INFORMATION_SCHEMA.COLUMNS view:
WITH correct_schema_tables AS (
  SELECT table_name
  FROM (
    SELECT * FROM project.dataset.INFORMATION_SCHEMA.COLUMNS
    WHERE column_name = 'name'
      AND data_type = 'STRING')
  JOIN (
    SELECT * FROM project.dataset.INFORMATION_SCHEMA.COLUMNS
    WHERE column_name = 'age'
      AND data_type = 'INT64')
  USING (table_name)
)
SELECT name
FROM `project.dataset.*`
WHERE age > 20
AND _TABLE_SUFFIX IN (SELECT table_name FROM correct_schema_tables)
AND _TABLE_SUFFIX LIKE 'table%'

IF Field Exists in StandardSQL

I have a table with these columns:
Apples
Bananas
Peaches - however, this column may or may not
appear. The table is dropped and loaded every 5 hours and I need to
be ready for situation where column "Peaches" is not available.
I have found a couple of similar questions here on Stack Overflow, but they were all using Legacy SQL to solve the problem.
I was trying something like this:
SELECT *
FROM project.dataset.fruits
WHERE EXISTS(
SELECT peaches
FROM project.dataset.fruits
)
The code gives me an error that "peaches" is an unrecognized name whenever the fruits table does not currently have the column, and the entire query fails.
Any ideas how to get around this?
Below is for BigQuery Standard SQL
#standardSQL
SELECT * FROM `project.dataset.fruits`
WHERE EXISTS (
SELECT 1 FROM `project.dataset.fruits` t
WHERE REGEXP_CONTAINS(TO_JSON_STRING(t), '[{,]"peaches":')
LIMIT 1
)
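This works because TO_JSON_STRING(t) serializes each row into a JSON object keyed by column name, so the key "peaches" only appears when the column actually exists. A quick way to see it (a sketch against the same table):
SELECT TO_JSON_STRING(t) AS row_json
FROM `project.dataset.fruits` t
LIMIT 1
-- returns e.g. {"apples":1,"bananas":2,"peaches":3} when the column is present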
You may use INFORMATION_SCHEMA
SELECT 1
FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = "fruits"
  AND column_name = "peaches"

Extract Oracle Pivot table to CSV file

I have created a pivot query in Oracle in order to extract data from a rather large and complex table, to try to use the data in a merge with some letters.
Ideally, what I need to do is then export the data to a CSV file for transfer to the hybrid mail solution we have here.
I am struggling with the syntax of how the export to the CSV file might work in this case, since some of the columns are created as part of the pivot process.
I had also thought of creating a temporary table using "create global temporary table ... as select * from" followed by the pivot query. However, trying this and then trying to select from the table created by the pivot leads to another complication: I am unable to query the dynamically created columns, as they are surrounded by single quotes.
Any ideas / comments / suggestions gratefully received.
The query to create the PIVOT table looks like this:
select *
from
(
  select
    BAA_QSA_ID,
    BAA_ASM_ID,
    BAA_SUBJECT_ID,
    PER_FIRST_NAMES || ' ' || per_surname as per_name,
    case
      when BAA_QST_DESC = 'Name of Person or Organisation of contact and their contact details'
        then 'PER_ORG_OF_CONTACT'
      else BAA_QST_CODE
    end as BAA_QST_CODE,
    case
      when BAA_QST_TYPE = 'QUE' then BAA_QUE_VALUE
      when BAA_QST_TYPE = 'RVA' then (select RVA_DESC
                                      from o_Ref_values
                                      where BAA_RVA_VALUE = RVA_CODE
                                        and RVA_DOMAIN = 'CONTACT_OUTCOME')
    end as answer
  from
    BO_ASSESSMENT_ANSWERS left join o_persons on BAA_SUBJECT_ID = PER_ID
  where
    BAA_QSA_ID = 'A1457'
    and
    to_date(BAA_ASM_END_DATE, 'DD/MM/YYYY') = to_date(SYSDATE, 'DD/MM/YYYY')
    and
    (BAA_QST_CODE = 'CONTACT_OUTCOME' or BAA_QST_DESC = 'Name of Person or Organisation of contact and their contact details')
  order by 3
)
PIVOT
(
  MAX(ANSWER)
  for BAA_QST_CODE in ('CONTACT_OUTCOME', 'PER_ORG_OF_CONTACT')
)
The table that comes out of it looks quite normal, except that the last two columns, the ones created by the pivot, have names surrounded by single quotes, as in the SQL that creates them.
I think I may now be on track to solving this, based on a post I saw just before I shut down last night.
The problem with querying the temporary table came from the fact that I was creating the pivot columns incorrectly.
The PIVOT clause should read:
PIVOT
(
MAX(ANSWER)
for BAA_QST_CODE in ('CONTACT_OUTCOME' as CONTACT_OUTCOME, 'PER_ORG_OF_CONTACT' as PER_ORG_OF_CONTACT)
)
The column names then become queryable, and I presume / hope I should be able to extract from the temporary table into a CSV file.
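For the CSV export itself, a minimal sketch using the spooling facility of SQL*Plus 12.2+ or SQLcl (the temporary table name contact_pivot is an assumption):
SET MARKUP CSV ON
SPOOL /tmp/contact_letters.csv
SELECT * FROM contact_pivot;
SPOOL OFF
On older SQL*Plus versions that lack SET MARKUP CSV, the usual workaround is to select one concatenated column with ',' between the fields and spool that instead.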

Vertica Dynamic Max Timestamp from all Tables in a Schema

System is HP VERTICA 7.1
I am trying to create a SQL query which will dynamically find, from the system tables, all tables in a specific schema that have a timestamp column named DWH_CREATE_TIMESTAMP. (I have completed this part successfully.)
Then, pass this list of tables to an outer query or some kind of looping statement which will select the MAX(DWH_CREATE_TIMESTAMP) and TABLE_NAME from all the tables in the list (200+) and union all the results together into one list.
The expected output is a 2 column table with all said tables with that TS field and the max of each value. Tables are constantly being created and dropped, so the point is to make everything totally dynamic where no TABLE_NAME values are ever hard-coded.
Any idea of Vertica specific ways to accomplish this without UDF's would be greatly appreciated.
Inner Query (working):
select distinct(table_name)
from columns
where column_name = 'DWH_CREATE_TIMESTAMP'
and table_name in (select DISTINCT(table_name) from all_tables where schema_name = 'PTG_DWH')
Outer Query (attempted - not working):
SELECT Max(DWH_CREATE_DATE) from
WITH table_name AS (
select distinct(table_name)
from columns
where column_name = 'DWH_CREATE_DATE' and table_name in (select DISTINCT(table_name) from all_tables where schema_name = 'PTG_DWH'))
SELECT MAX(DWH_CREATE_DATE)
FROM table_name
Thanks!!!
There is no way to do that in one SQL statement.
You can use the method below to get the per-node max values of timestamp columns from the ROS metadata:
select projections.anchor_table_name, vs_ros.colname, max(max_value)
from vs_ros, vs_ros_min_max_values, storage_containers, projections
where vs_ros.colname ilike 'timestamp'
  and vs_ros.salstorageid = storage_containers.sal_storage_id
  and vs_ros_min_max_values.rosid = vs_ros.rosid
  and storage_containers.projection_name = projections.projection_name
group by projections.anchor_table_name, vs_ros.colname
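If you would rather not rely on the ROS internals, the usual two-step workaround is to let SQL generate the UNION ALL statement and then run the generated text; a sketch built on the inner query above, with PTG_DWH as the schema:
SELECT 'SELECT ''' || table_name || ''' AS table_name, '
    || 'MAX(DWH_CREATE_TIMESTAMP) AS max_ts '
    || 'FROM PTG_DWH.' || table_name || ' UNION ALL'
FROM columns
WHERE column_name = 'DWH_CREATE_TIMESTAMP'
  AND table_schema = 'PTG_DWH';
Copy the generated rows, drop the trailing UNION ALL, and execute the result to get the 2-column list of table names and max timestamps.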

SQL Server : select columns not by name

Is it possible in SQL Server 2008 to select columns not by their names, but in the order in which they appear in the table?
The reason is that I want to select the first 5 or 6 columns of a table, no matter what their content is, because their names may change or the columns themselves may be moved.
For the first 5 columns you can try this:
select column_name,ordinal_position
from information_schema.columns
where table_schema = ...
and table_name = ...
and ordinal_position <= 5
Hope this works now.
Solution found here.
Edit: Updated answer - old one only selected first 5 rows, not columns.
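Note that the query above only returns the column names. To actually select the data in those first 5 columns you need dynamic SQL, since T-SQL has no positional column reference; a sketch that works on SQL Server 2008 (dbo.my_table is a placeholder, and FOR XML PATH is used because STRING_AGG does not exist on 2008):
DECLARE @cols nvarchar(max), @sql nvarchar(max);
-- build a comma-separated list of the first 5 column names, in table order
SELECT @cols = STUFF((
    SELECT ', ' + QUOTENAME(column_name)
    FROM information_schema.columns
    WHERE table_schema = 'dbo'      -- placeholder schema
      AND table_name = 'my_table'   -- placeholder table
      AND ordinal_position <= 5
    ORDER BY ordinal_position
    FOR XML PATH('')), 1, 2, '');
SET @sql = N'SELECT ' + @cols + N' FROM dbo.my_table';
EXEC sp_executesql @sql;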