What is the difference between a SQL view and a Hive external table?

The question is valid.
Although Koushik Roy's comment is also true, it is possible to compare a Hive external table to a SQL view.
Koushik Roy's comment:
a view is a select statement; an external table is a table (it requires physical storage). So, fundamentally, they are two different objects.
How I see those two objects:
SQL view -> a SELECT statement on a physical table
Hive external table -> a query on a physical file
Therefore they are very similar.
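A minimal sketch of that parallel, assuming a hypothetical physical table sales and an S3 path (all names are placeholders):
-- A SQL view: a stored SELECT over a physical table
CREATE VIEW v_sales_2020 AS
SELECT customer_id, amount
FROM sales
WHERE sale_year = 2020;

-- A Hive external table: a schema laid over files that already exist
CREATE EXTERNAL TABLE ext_sales (
  customer_id INT,
  amount      DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/sales/';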

Related

Add new partition scheme to existing table in Athena with SQL code

Is it even possible to add a partition to an existing table in Athena that currently has no partitions? If so, please also write the syntax for doing so in the answer.
For example:
ALTER TABLE table1 ADD PARTITION (ourDateStringCol = '2021-01-01')
The above command will give the following error:
FAILED: SemanticException table is not partitioned but partition spec exists
Note: I have done a web-search, and variants exist for SQL server, or adding a partition to an already partitioned table. However, I personally could not find a case where one could successfully add a partition to an existing non-partitioned table.
This is extremely similar to:
SemanticException adding partiton Hive table
However, the answer given there requires re-creating the table.
I want to do so without re-creating the table.
Partitions in Athena are based on the folder structure in S3. Unlike a standard RDBMS, which loads the data onto its own disks or into memory, Athena is based on scanning data in S3. This is how you enjoy the scale and low cost of the service.
What this means is that you have to keep your data in different folders in a meaningful structure, such as year=2019, year=2020, and make sure that the data for each year is all, and only, in that folder.
The simple solution is to run a CREATE TABLE AS SELECT (CTAS) query that will copy the data and create a new table that can be optimized for your analytical queries. You can choose the table format (Parquet, for example), the compression (SNAPPY, for example), and also the partition schema (per year, for example).
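A hedged sketch of such a CTAS statement, assuming a hypothetical source table logs_raw with a year column (the bucket path and all names are placeholders; note that partition columns must come last in the SELECT):
CREATE TABLE logs_partitioned
WITH (
  format = 'PARQUET',
  parquet_compression = 'SNAPPY',
  external_location = 's3://my-bucket/logs_partitioned/',
  partitioned_by = ARRAY['year']
) AS
SELECT request_id, status, year
FROM logs_raw;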

Azure SQL: Is it possible to use the view from another database?

Being new to Azure SQL, I am already successfully JOINing my table with the table from the other database (by defining the CREATE EXTERNAL TABLE and the related things). However, some queries contain views from the other database, not the tables. How can I join my table with that external view? Is it possible at all? Or do I have to copy the code from the remote view and work directly with the external tables?
With CREATE EXTERNAL TABLE you can access either a table or a view.
Take a look at
https://medium.com/@fbeltrao/access-another-database-in-azure-sql-1afc526b7ad4
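A hedged sketch of the elastic query setup (all names, credentials, and the remote view are placeholders; the linked article covers the details):
-- One-time setup in the local database
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE DATABASE SCOPED CREDENTIAL RemoteCred
    WITH IDENTITY = '<remote user>', SECRET = '<remote password>';
CREATE EXTERNAL DATA SOURCE RemoteDb WITH (
    TYPE = RDBMS,
    LOCATION = 'yourserver.database.windows.net',
    DATABASE_NAME = 'OtherDb',
    CREDENTIAL = RemoteCred
);

-- The external table can point at a view in the remote database
CREATE EXTERNAL TABLE dbo.RemoteCustomers (
    CustomerId INT,
    CustomerName NVARCHAR(100)
) WITH (
    DATA_SOURCE = RemoteDb,
    SCHEMA_NAME = 'dbo',
    OBJECT_NAME = 'vw_Customers'   -- name of the view on the remote side
);

-- Join it like any local table
SELECT t.CustomerId, r.CustomerName
FROM dbo.MyLocalTable AS t
JOIN dbo.RemoteCustomers AS r ON r.CustomerId = t.CustomerId;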

Hive - where is table information stored

I am creating and inserting into tables in Hive, and the files are created on HDFS, with some on external storage (S3).
Assuming I created 10 tables, is there any system table in Hive where I can find the table info created by the user? (For example, in Teradata we have DBC.tablesv, which holds information on all the user-defined tables.)
You can find where your metastore is configured in the hive-site.xml file.
Its usual location is under /etc/hive/{$hadoop_version}/ or /etc/hive/conf/.
Grep for "hive.metastore.uris" or "javax.jdo.option.ConnectionURL" to see which database you are using for the metastore. The credentials should also be there.
If, for example, your metastore is on a MySQL server, you can run queries like
SELECT * FROM TBLS;
SELECT * FROM PARTITIONS;
etc
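For instance, a hedged sketch of joining the metastore tables to list tables per database (table and column names can differ slightly between Hive versions):
SELECT d.NAME AS db_name,
       t.TBL_NAME,
       t.TBL_TYPE   -- MANAGED_TABLE vs. EXTERNAL_TABLE
FROM TBLS t
JOIN DBS d ON d.DB_ID = t.DB_ID;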
You can't query (as in SELECT ... FROM...) the metadata from within Hive.
You do, however, have commands that display that information, e.g. show databases, show tables, desc MyTable, etc.
I'm not sure I understood your question 100%. If you mean the information about the creation of the table, like the query itself, the location on HDFS, table properties, etc., you can try:
SHOW CREATE TABLE <table>;
If you need to retrieve a list of the column names and data types, try:
DESCRIBE <table>;

PXF Hive Plugin, to select only the columns selected in the query

Is there a way for PXF to select only the columns used in the query, apart from Hive partition filtering?
I have data stored in Hive ORC format and am using a PXF external table to execute queries in HAWQ. The biggest tables are stored in Hive, and we cannot make another copy of the data in HAWQ.
Thanks--
P.S - Does the query optimizer collect stats on external tables in HAWQ 2.0?
You can always run a SELECT foo FROM bar type query on external tables in HAWQ. However, if your question is whether PXF actually does column projection to avoid reading all the columns, then the answer is no. Currently PXF will read all columns from an ORC file and return the records to HAWQ, which then does the projection filtering on its end. However, https://issues.apache.org/jira/browse/HAWQ-583 is actively being worked on and should be released in an upcoming version of HAWQ, which will push column projection down to ORC to improve the read performance of ORC files.
Yes, the query optimizer does collect statistics on external tables; this is also handled by PXF. However, this only works for some data sources: https://issues.apache.org/jira/browse/HAWQ-44
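A hedged sketch of what this looks like in practice, with placeholder host, port, and table names (the exact PXF URI format depends on your HAWQ/PXF version):
CREATE EXTERNAL TABLE ext_orders (
    order_id  INT,
    amount    NUMERIC
)
LOCATION ('pxf://namenode:51200/default.orders?PROFILE=Hive')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

-- Collect statistics so the optimizer can use them when planning queries
ANALYZE ext_orders;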

Create Partition table in Big Query

Can anyone please suggest how to create a partitioned table in BigQuery?
Example: Suppose I have log data in Google Cloud Storage for the year 2016. I stored all the data in one bucket, partitioned by year, month, and date. Here I want to create a table partitioned by date.
Thanks in Advance
Documentation for partitioned tables is here:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables
In this case, you'd create a partitioned table and populate the partitions with the data. You can run a query job that reads from GCS (and filters data for the specific date) and writes to the corresponding partition of a table. For example, to load data for May 1st, 2016 -- you'd specify the destination_table as table$20160501.
Currently, you'll have to run several query jobs to achieve this process. Please note that you'll be charged for each query job based on bytes processed.
Please see this post for some more details:
Migrating from non-partitioned to Partitioned tables
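As a hedged sketch in today's Standard SQL DDL (dataset, table, and column names are placeholders; the same result can be achieved with the per-partition query jobs described above):
CREATE TABLE mydataset.logs_partitioned
PARTITION BY DATE(event_time) AS
SELECT event_time, message
FROM mydataset.logs_raw
WHERE EXTRACT(YEAR FROM event_time) = 2016;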
There are two options:
Option 1
You can load each daily file into a separate table named YourLogs_YYYYMMDD
See details on how to Load Data from Cloud Storage
After the tables are created, you can access them using either Table wildcard functions (Legacy SQL) or a Wildcard Table (Standard SQL). See also Querying Multiple Tables Using a Wildcard Table for more examples.
Option 2
You can create a Date-Partitioned Table (just one table - YourLogs), but you will still need to load each daily file into the respective partition - see Creating and Updating Date-Partitioned Tables
After the table is loaded you can easily Query Date-Partitioned Tables; both query patterns are sketched below
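Hedged sketches of the two query patterns, assuming a placeholder dataset mydataset:
-- Option 1: sharded tables queried through a wildcard (Standard SQL)
SELECT *
FROM `mydataset.YourLogs_*`
WHERE _TABLE_SUFFIX BETWEEN '20160501' AND '20160531';

-- Option 2: one date-partitioned table filtered on the partition pseudo-column
SELECT *
FROM mydataset.YourLogs
WHERE _PARTITIONTIME = TIMESTAMP('2016-05-01');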
Having partitions for an external table is not allowed for now. There is a feature request for it:
https://issuetracker.google.com/issues/62993684
(please vote for it if you're interested in it!)
Google says that they are considering it.