How to stop partition column appearing last in a SELECT * output, in Hive?

In Apache Hive, I'm trying to copy specific rows from one table to a second table that's identical apart from an additional string column (which I'm calling "report-type") at the end of the second table. Both tables are partitioned by a string field called 'dt' which holds a date, e.g. "2022-08-04". When I try to copy a row from table 1 to table 2, the data is inserted into table 2 with report-type and dt swapped, because the partition column seems to be forcibly listed last.
E.g.:
INSERT INTO table2 SELECT *, 'some_report_type' FROM table1 WHERE <some criteria>;
gives all the data in table2 in the correct columns, except that report-type ends up as e.g. "2022-08-04" and dt ends up as "some_report_type".
Is there any way around this?
Two solutions I can see are: recreate the table without the partitioning (which I'd ideally avoid) and keep dt as a regular non-partition column; or specify each of the columns in an explicit column list in the query. I'm not sure whether the latter would stop "dt" being forced to the last position, and the main issue with it is that I have 830 columns to specify individually.
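For illustration, I believe the explicit version would look something like the sketch below, with col1 and col2 standing in for the real 830 columns (with dynamic partitioning, dt goes in the PARTITION clause and must still come last in the select):
SET hive.exec.dynamic.partition.mode=nonstrict;  -- may be required for dynamic partitioning
INSERT INTO TABLE table2 PARTITION (dt)
SELECT col1, col2, 'some_report_type', dt  -- report-type before dt, dt last
FROM table1 WHERE <some criteria>;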
Thanks

Related

Copy specific columns from one table to another table, and include the source tablename

I have this newly created table in SQL Server with 3 columns: ID, Name, Source.
Basically this table will be populated with data from various other tables, taking in their record IDs and record Names. I believe this can be easily achieved with an INSERT INTO SELECT statement.
I would like to find out how to populate the Source column. This column is supposed to indicate which table the data came from. For example, if table A has 3 records, I copy the ID and Name columns from that table into my destination table.
At the same time, the 3 new records should have their Source column set to indicate that they came from Table A. Then I will proceed to do the same for the other tables.
You can use a constant string as follows:
INSERT INTO your_table
SELECT id, name, 'TableA' as source
FROM tableA
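Doing the same for the next source table is just a matter of swapping the constant and the table name, e.g. (assuming a tableB with matching id and name columns):
INSERT INTO your_table
SELECT id, name, 'TableB' as source
FROM tableB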

Hive partition column

We have an Avro partitioned table in Hive. When we query the table, the partition column is displayed at the end. Is there any way to display the partition column first?
E.g.: select * from tablea
Output:
Col1 col2 partition_column
Expected output:
Partition_column col1 col2
The partition column is not stored in the data files, so Avro or not Avro, it does not matter in this context. The partition column corresponds to a partition sub-folder within the table folder and is stored in the metadata.
Historically the partition column is the last one. Dynamic partitioning using INSERT OVERWRITE TABLE ... PARTITION (partition_column) SELECT * FROM ... is a rather common scenario, and Hive expects the partition column to come last:
The dynamic partition columns must be specified last among the columns
in the SELECT statement and in the same order in which they appear in
the PARTITION() clause.
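A minimal sketch of that pattern, reusing the column names from this question (source_tablea is a placeholder for wherever the data comes from):
set hive.exec.dynamic.partition.mode=nonstrict;  -- often required before a fully dynamic insert
INSERT OVERWRITE TABLE tablea PARTITION (partition_column)
SELECT col1, col2, partition_column  -- partition column last, as required
FROM source_tablea;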
You can change the order of columns displayed when running SELECT * only by creating a view in which you list all the columns in the required order, or by selecting the columns explicitly in your select.
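For example, a view like this (a minimal sketch reusing the column names above) would put the partition column first:
CREATE VIEW tablea_v AS
SELECT partition_column, col1, col2
FROM tablea;
SELECT * FROM tablea_v then returns partition_column first.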
Also, according to Codd's relational theory, column and row order are immaterial: you should always specify the desired column order explicitly in the select, and the desired row order with ORDER BY, instead of relying on the column and row order in the table or view. But in Hive the partitioning column is always the last one in the table.
Consider also this: you may not even know whether you are selecting from a table or a view, and you may not be notified when an upstream system eventually decides to change that table or view. A view or table can change the order of its columns. Treat a view the same as a table when doing selects; it is just an abstraction level. Use an explicit column list so your program always works reliably and has no hard dependency on the column order in the underlying table/view, which is immaterial.

Swapping columns in a table to match formatting of another table prior to row insertion

I want to swap columns within Visual FoxPro 9 in table_1 before inserting its rows into table_2, so as to avoid data losses caused by datatype variations. I tried these two options based on other solutions on Stack Overflow, but I get syntax error messages for both command inputs. The name field is of datatype character(5) and it needs to be after the subdir field.
ALTER table "f:\csp" modify COLUMN name character(5) after subdir
ALTER table "f:\csp" change COLUMN name name character(5) after subdir
I attempted these commands based on solutions here:
How to move columns in a MySQL table?
You never need to change the column order, and you should never rely on column order to do something.
For inserting into another table from this one, you can simply select the columns in the order you desire (and their column names do not even need to be the same in the case of "insert ... select ..."), i.e.:
insert into table_2 (subdir, name) ;
select subdir, name from table_1
Another way is to use the xBase commands like:
select table_2
append from table_1
In the case of the latter, VFP does the match on column names.
All in all, relying on column ordering is dangerous. If you really want to do it, you still can, in a number of ways. One of them is to select all the data into a temp table, recreate the table with the columns in the order you want, and fill it back from the temp table; this might not be as easy as it sounds if there are existing dependencies such as referential integrity, and you also need to recreate the indexes.
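A rough sketch of that temp-table round trip (the subdir width of C(50) is an assumption, as is the free-table layout):
* copy the data out to a temp table
SELECT * FROM table_1 INTO TABLE temp_copy
CLOSE TABLES ALL
* recreate the table with the columns in the desired order
CREATE TABLE table_1 FREE (subdir C(50), name C(5))
* fill it back in; APPEND FROM matches on column names
APPEND FROM temp_copy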

Insert all data from an already sorted table into another

I have two tables in an Amazon Redshift cluster that both use a timestamp as the sort key. The first table is sorted and contains only data from timepoint 1 to timepoint 2. The second table is only temporary but also sorted, and contains data from timepoint 3 to timepoint 4. Is there any way to insert all the data from the first table into the second without having to run VACUUM afterwards? A normal INSERT from one table to another always needs a VACUUM afterwards, as far as I know.
I know it would be possible if I used COPY on a pre-sorted flat file. But is there also a solution for two pre-sorted tables that does not need a VACUUM?
Option 1:
Create a new table, say final_table, with the same schema as table 2, since you wish to copy the content of table 1 into table 2.
Please check:
select "column", type, encoding
from pg_table_def where tablename='table2'
This will give the encoding used for each column of table 2. Create the new final table with the same encoding for each column.
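One way to get a matching table without spelling out every column encoding is CREATE TABLE ... LIKE, which copies the column encodings and the sort key (final_table is a placeholder name):
create table final_table (like table2);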
Use a query to load the data into the final table in sorted order:
insert into final_table (select * from table1 order by timepoint asc);
then fire:
insert into final_table (select * from table2 order by timepoint asc);
Option 2:
Create the final table and load the data for timepoint 1, then load for timepoint 2. Continue until all the time points are loaded in sorted order.
Option 3:
You can check the Redshift deep copy option as well.
Here is the link: http://docs.aws.amazon.com/redshift/latest/dg/performing-a-deep-copy.html
While doing the deep copy, copy the data for table 1 first, then load table 2.
I have tried this query in SQL Server:
SELECT * INTO table_name FROM old_table_name

Get fields from one column to another in Access

Below I have a table where I need to get fields from one column into three columns.
This is how I would like the data to end up:
Column1: Music
Column2: com.sec.android.app.music
Column3: com.sec.android.app.music.MusicActionTabActivity
Give the table a numeric autonumber id.
Remove the rows with no data, with a select filtering out blank or null values.
Find the records with no point (dot) in the content with a select.
Use the previous query as a source, and use the id to find id + 1 for the next record, and id + 2 for the second row.
Build a table to hold the new structure and use the query as a source to insert the newly created data into the new table with the 3-column structure.
This is an example using SQL Server. (The original answer illustrated it with screenshots of the test table design, the data in the table, and the query.)
Read the query from the inside out. The innermost query cleans out the null records. The next query finds the records without a dot; these records are the reference for finding the two related records. The id of each record without a dot is then used in a select that adds 1 to find the next record, and another that adds 2 to find the second record. Now you only need to create a table to insert this data into, using this query as the source.
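A sketch of that nested query in T-SQL, assuming the source table is raw_data(id, txt) with the autonumber id assigned in file order (all names here are placeholders):
WITH cleaned AS (
    -- drop the blank/null rows and renumber what is left
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, txt
    FROM raw_data
    WHERE txt IS NOT NULL AND LTRIM(RTRIM(txt)) <> ''
)
SELECT c.txt  AS Column1,   -- the record with no dot
       c1.txt AS Column2,   -- rn + 1
       c2.txt AS Column3    -- rn + 2
INTO   new_table            -- the new 3-column structure
FROM cleaned c
JOIN cleaned c1 ON c1.rn = c.rn + 1
JOIN cleaned c2 ON c2.rn = c.rn + 2
WHERE c.txt NOT LIKE '%.%';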