I have a pipe-delimited text file with 400 values per line. Out of these, I need to load only the 40 values at positions [1, 2, 4, 5, 7, 8, 9, 15, 17, 18, 20...] into my Hive table. How can this be achieved?
By the book: create an EXTERNAL table to map your text file, with 400 columns; create a managed table with 40 columns; then use SQL such as INSERT INTO TABLE target SELECT col1, col2, col4, ..., col72 FROM wide_source.
Actually, you don't need to map all 400 columns -- stop at the last column that you want to use in SQL and ignore the rest.
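A minimal sketch of that approach, assuming hypothetical names wide_source, target and an HDFS path /data/input (none of these come from the question):

CREATE EXTERNAL TABLE wide_source (
  col1 STRING, col2 STRING, col3 STRING, col4 STRING
  -- ...continue only up to the last position you actually need
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/data/input';

CREATE TABLE target (
  col1 STRING, col2 STRING, col4 STRING
  -- ...the 40 columns you want to keep
);

INSERT INTO TABLE target
SELECT col1, col2, col4  -- list the 40 positions you need
FROM wide_source;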
I have encountered a problem while trying to write a script for one version of an insert-and-update incremental load.
Example: To simplify the example, I have made an illustration of how I want the data set to be updated (I'll leave the code for the discussion).
Illustration:
In the example above, you can see that I want both to insert new records and to update existing records. The condition is that I only want to update a record if the new value is greater than the existing one.
For instance, the existing record for ID 2 equals 0 (Table 1), and since the new record for ID 2 equals 100 (Table 2), I want to update that record so the final record for ID 2 equals the higher of the two values (Updated Table). If Table 2 contains a new record, I just want to add that record to the final data set.
Description:
Insert new records
Update records if the value is higher than the existing record
What do you guys think is the best solution for this kind of problem?
I'm not sure it's the best solution, but something like this works:
// Existing records (QVD)
Table:
LOAD * INLINE [
ID, Value, Source
1, 500, 'QVD'
2, 0, 'QVD'
3, 100, 'QVD'
4, 300, 'QVD'
5, 0, 'QVD'
];

// New records (ODBC), appended to the same table
Concatenate(Table)
LOAD * INLINE [
ID, Value, Source
2, 100, 'ODBC'
3, 700, 'ODBC'
4, 300, 'ODBC'
6, 500, 'ODBC'
7, 0, 'ODBC'
];

// Keep only the highest Value per ID
NewTable:
LOAD
    ID,
    max(Value) as Value
Resident Table
Group by ID
;

drop Table Table;
I have a script which returns rows from a table, which I use to migrate data and create new tables.
For one of the tables, because it has null values, it returns something like:
insert into table1(column 1, column 2,column 3, column 4) values (abc,,,cdf);
Will this insert query work as-is, or do I need to take care of the null values?
Edit: my script does this:
runs a SELECT query and picks columns from an old, separate database
prints an output file in which the INSERT query's column values come from the result set of the above SELECT statement
later, I use this file to directly populate the new tables in the new database.
I am not sure about your exact requirement, but you can write a script like this:
insert into table2 (column 1, column 2, column 3, column 4)
select column 1, column 2, column 3, column 4
from table1
where isnull(column 1, '') <> ''
  and isnull(column 2, '') <> ''
  and isnull(column 3, '') <> ''
  and isnull(column 4, '') <> ''
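If the goal is instead to keep those rows and insert NULLs explicitly, the generated INSERT statement itself has to spell out NULL for the missing values rather than leaving them empty. A sketch, assuming the values are strings and using column1 ... column4 as placeholder column names:

insert into table1 (column1, column2, column3, column4)
values ('abc', NULL, NULL, 'cdf');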
I have a lot of stores in my database, and I have some similar data that has to be in all of the stores. Here is my example:
INSERT INTO [dbo].[stores]
([identifiers],
[sales_price],
[discount],
[store])
VALUES ('9788276911',
99,
20,
'store121')
Is there any way I can insert this data into all stores and not only 'store121'? Just looking for an easy way out here, really :)
First, if you don't have your store names in a table, you should create one and populate it with the names (copy/paste from your Excel file).
If we assume your names are in a table StoreNames, column Store, you can use a query like this to insert the same data into the stores table for all your stores:
INSERT INTO [dbo].[stores]
([identifiers],
[sales_price],
[discount],
[store])
SELECT '9788276911',
99,
20,
[store]
FROM StoreNames
SQLFiddle DEMO
update stores
set identifiers = '9788276911',
sales_price = 99,
discount = 20
will update all records.
Using the solution proposed by @Nenad Zivkovic, you can also enumerate your 15 stores in a row constructor instead of reading them from the StoreNames table, if that makes it easier for you:
INSERT INTO [dbo].[stores]
([identifiers],
[sales_price],
[discount],
[store])
SELECT '9788276911',
99,
20,
[store]
FROM (values
('store1'),('store2'),('store3'),('store4')) x(store)
I am new to Hive. I just want to know how I can insert data into a Hive table directly:
CREATE TABLE t1 (name STRING);
and I want to insert a value, e.g. name = 'John'.
But in all the documentation I have seen, there isn't any example that inserts data directly into the table. Either I need to create a file (internally or externally), add the value 'John' to it, and load that data into the table, or I can load data from another table.
My goal is to add data directly into the Hive table by providing the values directly. Here is an Oracle example of the SQL query I want to achieve:
INSERT INTO t1 (name)
values ('John')
Is there an equivalent statement to the above in Hive?
You can use Hive's table-generating functions, like explode() or stack().
Example
Table structure: (name STRING, age INT)

INSERT INTO TABLE target_table
SELECT STACK(
  2,           -- number of records
  'John', 80,  -- record 1
  'Bill', 61   -- record 2
)
FROM dual      -- any table that already exists
LIMIT 2;       -- number of records; this line has to be added!
That will add 2 records to your target_table.
As of the latest version of Hive at the time of writing, INSERT INTO ... VALUES (...) is not supported. The enhancement to the insert/update/delete syntax is under development; please look at 'Implement insert, update, and delete in Hive with full ACID support'.
Inserting values into a table is now supported by Hive, starting with version 0.14:
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
More can be found at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL
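Applied to the table from the original question, a minimal sketch (assuming Hive 0.14 or later) would be:

CREATE TABLE t1 (name STRING);

INSERT INTO TABLE t1 VALUES ('John');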
I'm curious about the error I just got after trying to insert this data, like so:
insert into bi_sessions
values (295377778, '04dzzzf7-e66c-4e6d-9c42-465a16546e34', 1, 43223810, 48, 1043, 'BELGIUM BEER Survey (QA)', 54, 'Synovate Panel', -1 , 2.5, 6, 3, 2.5 , 2.5, '2010-04-01 00:00:00.000', '2010-04-01 00:00:30.000', -1 ,3, 1, '000708c8507696c06f777', '68.200.93.212', 20, '04dea8f7-e66c-4e6d-9c42-465a16546777' , -1, NULL, 55743 ,9 , 'Untargeted', 3, 2, 2016, 'General', 1966, '2010-04-01 00:00:22.000', 1966, '2010-04-01 00:00:32.000', 1, 9, 'English - United States', 'Federated Sample', 1)
The error message reads:
An explicit value for the identity column in table 'sessions' can only be specified when a column list is used and IDENTITY_INSERT is ON.
It's confusing - what is an identity column?
Your table has an identity column, meaning that you have a column that automatically increments its value when a row is inserted. Since this is, well, automatic, you can't explicitly insert a value into that column unless you use SET IDENTITY_INSERT bi_sessions ON. So, in your case, you should do:
INSERT INTO bi_sessions(<list of all non-identity column>)
VALUES (<your values>)
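If you really do need to supply your own value for the identity column, the pattern is to switch IDENTITY_INSERT on around the statement. A sketch, with a placeholder column list since the actual schema of bi_sessions isn't shown:

SET IDENTITY_INSERT dbo.bi_sessions ON;

INSERT INTO dbo.bi_sessions (session_id, survey_name, start_date)  -- placeholder columns; session_id assumed to be the identity column
VALUES (295377778, 'BELGIUM BEER Survey (QA)', '2010-04-01');

SET IDENTITY_INSERT dbo.bi_sessions OFF;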
An IDENTITY column is a column whose value SQL Server itself determines. It's typically used to generate unique identity values as primary keys for a table (like your sessions table) - other RDBMSs call this an auto-increment or auto-numbering column.
Since SQL Server determines those values itself, you're not allowed to insert your own values into that column.
In order to do that, you need to explicitly specify the list of columns in your INSERT statement:
INSERT INTO dbo.bi_sessions(Col1, Col2, ...., ColN)
VALUES (...., ..., .......)