In BigQuery if every query creates a temporary table, is it possible to find out the name of that temporary table in a subsequent SQL statement? - google-bigquery

If every query creates a temporary table, is it possible to find out the name of that temporary table in a subsequent SQL statement?
Per https://cloud.google.com/bigquery/querying-data :
All query results are saved to a table, which can be either persistent
or temporary:
A temporary table is a randomly named table saved in a special
dataset; The table has a lifetime of approximately 24 hours. Temporary
tables are not available for sharing, and are not visible using any of
the standard list or other table manipulation methods.
For example:
/* query 1 */
select * from whatever;
/* query 2 */
select * from [temporary_table_that_just_got_created];

You cannot determine that name from within the BigQuery Web UI using BigQuery SQL.
But, yes, you can find it by looking in the Query History.
Find your query in the history list and expand it by clicking on it.
The name you are looking for is under "Destination Table".
It is also possible to do this with the BigQuery API, which is very useful in client applications.
Use the Jobs: get API and look at configuration.query.destinationTable (https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource).

Related

Hive retrieving meta data too slow

Situation:
I am aiming to retrieve the location info for a list of Hive external tables. For a single table the easiest way is just show create table table_name, but I have quite a number of tables, so I am looking for an alternative way to achieve this. I found that there is a sys db in Hive.
This db seems to store meta info for all tables, and I found the table sds, which stores the location info.
However, when I query this sds table with the simplest where clause, select * from sds where sd_id = a_sd_id, searching for the info of only one table, it takes more than 50 seconds to return the result.
On the other hand, what is weird is that if I retrieve the same info using the show create table the_table_name command, all the table info, including the location info, is returned in 0.05 seconds.
So my question is: when I trigger show create table, where does Hive retrieve this info from? Is it the same source as when I query the sys.sds table? If the two are the same source, the huge time gap between the two approaches cannot be explained.
Could anyone shed some light on why this happens, and how I can retrieve the location info as expected, i.e. from the MySQL metastore, which can return it as fast as the show create table command? I suppose show create table accesses MySQL directly. But if the sys db is a mapping of the MySQL db, why do queries on these tables return 100 times slower than show create?
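For reference, the kind of direct metastore query being described would be run against the MySQL database that backs the Hive metastore, not the sys db. A rough sketch using the standard metastore schema tables (the database and table names in the filter are placeholders):
-- run directly against the MySQL database backing the Hive metastore
SELECT t.TBL_NAME, s.LOCATION
FROM TBLS t
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN DBS d ON d.DB_ID = t.DB_ID
WHERE d.NAME = 'your_db' AND t.TBL_NAME = 'your_table';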

How to drop columns from a partitioned table in BigQuery

We cannot use a create or replace table statement for partitioned tables in BigQuery. I can export the table to GCS, but BigQuery then generates multiple JSON files that cannot be imported into a table at once. Is there a safe way to drop a column from a partitioned table? I use BigQuery's web interface.
Renaming a column is not supported by the Cloud Console, the classic BigQuery web UI, the bq command-line tool, or the API. If you attempt to update a table schema using a renamed column, the following error is returned: BigQuery error in update operation: Provided Schema does not match Table project_id:dataset.table.
There are two ways to manually rename a column:
Using a SQL query: choose this option if you are more concerned about simplicity and ease of use, and you are less concerned about costs.
Recreating the table: choose this option if you are more concerned about costs, and you are less concerned about simplicity and ease of use.
If you want to drop a column you can either:
Use a SELECT * EXCEPT query that excludes the column (or columns) you want to remove, and use the query result to overwrite the table or to create a new destination table (see the example query below).
Remove the column by exporting your table data to Cloud Storage, deleting the data corresponding to the column (or columns) you want to remove, and then loading the data into a new table with a schema definition that does not include the removed column(s). You can also use the load job to overwrite the existing table.
There is a guide published for Manually Changing Table Schemas.
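As a sketch of the first option, assuming a table project.dataset.table with an unwanted column named column_to_drop (both names are placeholders), the query would simply be:
SELECT * EXCEPT (column_to_drop)
FROM `project.dataset.table`
You would then write the result back over the existing table, or into a new destination table, using the "Set a destination table for query results" option in the query settings.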
edit
In order to change a partitioned table to a non-partitioned table, you can use the Console to query your data and overwrite your current table or copy it to a new one. As an example, I have a table in BigQuery partitioned by _PARTITIONTIME. I used the following query to create a non-partitioned table:
SELECT *, _PARTITIONTIME as pt FROM `project.dataset.table`
With the above query, you read the data across all the table's partitions and add an extra column showing which partition each row came from. Then, before executing it, there are two options: save the result in a new non-partitioned table or overwrite the current table.
Creating a new table: go to More (under the query editor) > Query Settings > check the box "Set a destination table for query results" > choose your project and dataset and write your new table's name > under "Destination table write preference" check "Write if empty".
Overwriting the current table: go to More (under the query editor) > Query Settings > check the box "Set a destination table for query results" > choose the same project and dataset as your current table > write the same table name as the one you want to overwrite > under "Destination table write preference" check "Overwrite table".

Migrating data from one table of one database to other table of other database

I am trying to migrate data from table A of database DB1 to table B of database DB2 using Java and Oracle.
I am using Java 1.8; my source database is Oracle 11g and the destination database is Oracle 12c.
I created the structure (schema, tables) of the destination database inside the source database, and I am migrating by using an insert into dest select * from source query in Java. But since the number of records in the source table is in the millions, it is consuming time, and later I want to export this migrated data into my actual destination, so that will take time too.
As far as I know, I can't use a prepared statement across two connections, and my table consists of 400 to 500 columns, so binding that many columns with a prepared statement is not a good idea anyway. Also, the structures of my source and destination tables are different. I made the field mapping in a properties file, mapping each old field to its new field for the insert into ... select * from tbl query. For example, my source table has a column col0001 and the corresponding column in the destination is ref_no. This too prevents me from using a prepared statement. But by using a plain Statement in Java I can only migrate data within a single DB.
I also tried a dblink, but I am not able to migrate data for CLOB datatypes.
Kindly provide a solution if anyone has done something like this previously.
For a one-off copy, you can do a direct-path insert over a database link:
insert /*+ APPEND */ into local_table select * from table@database_link;
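If the source and destination column names differ, as with col0001 and ref_no in the question, the same pattern works with an explicit column list. A minimal sketch, assuming a database link named db1_link pointing at the source and showing just two mapped columns (the second pair is a made-up placeholder):
insert /*+ APPEND */ into b (ref_no, new_col2)  -- destination columns in DB2
select col0001, col0002                         -- corresponding source columns in DB1
from a@db1_link;
commit;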

Create Partition table in Big Query

Can anyone please suggest how to create a partitioned table in BigQuery?
Example: suppose I have log data in Google Storage for the year 2016, stored in one bucket and organized by year, month, and date. I want to create a table partitioned by date.
Thanks in advance.
Documentation for partitioned tables is here:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables
In this case, you'd create a partitioned table and populate the partitions with the data. You can run a query job that reads from GCS (and filters data for the specific date) and writes to the corresponding partition of a table. For example, to load data for May 1st, 2016 -- you'd specify the destination_table as table$20160501.
Currently, you'll have to run several query jobs to achieve this process. Please note that you'll be charged for each query job based on bytes processed.
Please see this post for some more details:
Migrating from non-partitioned to Partitioned tables
There are two options:
Option 1
You can load each daily file into a separate table named YourLogs_YYYYMMDD.
See details on how to Load Data from Cloud Storage
After the tables are created, you can access them either using Table wildcard functions (Legacy SQL) or a Wildcard Table (Standard SQL). See also Querying Multiple Tables Using a Wildcard Table for more examples.
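For illustration, a Standard SQL wildcard query over such daily tables might look like this (the project and dataset names are placeholders):
SELECT *
FROM `project.dataset.YourLogs_*`
WHERE _TABLE_SUFFIX BETWEEN '20160501' AND '20160531'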
Option 2
You can create a Date-Partitioned Table (just one table, YourLogs), but you will still need to load each daily file into its respective partition; see Creating and Updating Date-Partitioned Tables.
After the table is loaded you can easily Query Date-Partitioned Tables.
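For example, a query that prunes down to a single day's partition could look roughly like this (again, the project and dataset names are placeholders):
SELECT *
FROM `project.dataset.YourLogs`
WHERE _PARTITIONTIME = TIMESTAMP('2016-05-01')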
Having partitions for an External Table is not allowed as of now. There is a feature request for it:
https://issuetracker.google.com/issues/62993684
(please vote for it if you're interested in it!)
Google says that they are considering it.

How to transfer data between databases with SELECT statement in Teradata

So I am stuck on this Teradata problem and I am looking to the community for advice as I am new to the TD platform. I am currently working with a Teradata Data Warehouse and have an interesting task to solve. Currently we store our information in a live production database but want to stage tables in another database before using FastExport to export the files. Basically we want to move our tables into a database to take a quick snapshot.
I have been exploring different solutions and am unsure how to proceed. I need to be able to automate a create table process from one DB in Teradata to another. The tricky part is I would like to create many tables off of the source table using a WHERE clause. For example, I have a transaction table and want to take a snapshot of the transaction table for a certain date range month by month. Meaning that the original table Transaction would be split into many tables such as Transaction_May2001, Transaction_June2001, Transaction_July2001 and so on and so forth.
Thanks
This is assuming that by two databases you are referring to two databases within the same physical installation of Teradata.
You can use the CREATE TABLE AS construct to accomplish this:
CREATE TABLE {MyDB}.Transaction_May2001
AS (
SELECT *
FROM Transaction
WHERE Transaction_Date BETWEEN DATE '2001-05-01' AND DATE '2001-05-31'
)
{UNIQUE} PRIMARY INDEX ({Same PI definition as Transaction Table})
WITH DATA AND STATS;
If you neglect to specify an explicit PI in the CREATE TABLE AS statement, Teradata will take the first column of the SELECT clause and use it as the PI of the new table.
Otherwise, you would be looking to use a Teradata utility as suggested by ryanbwork in the comment to your question.