Creating a pipe-delimited Hive table: duplicate ids

I'm trying to create a pipe-delimited Hive table using these commands:
CREATE TABLE IF NOT EXISTS tableA (
id string,
col1 double,
col2 double,
col3 double,
col4 double,
col5 double,
col6 double,
col7 double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
tblproperties ("skip.header.line.count"="1");
INSERT INTO TABLE TABLEA
select a.id,
b.col1,
b.col2,
b.col3,
b.col4,
b.col5,
b.col6,
b.col7
FROM customerTable as a left join factTable as b on a.id = b.id;
I get duplicate records in the new table, tableA. I checked using
select count(distinct id) as cnt from tableA ;
Whereas if I create a normal Hive table with CREATE TABLE AS SELECT, like this, I don't get any duplicate ids:
Create table if not exists tableA as
select a.id,
b.col1,
b.col2,
b.col3,
b.col4,
b.col5,
b.col6,
b.col7
FROM customerTable as a left join factTable as b on a.id = b.id;
The table created has on the order of 80 million rows, but the difference in the number of records (the duplicates) is only 58.
I'm not sure what's going on. I suspect the problem is in how I'm creating the pipe-delimited Hive table. Any help would be appreciated.

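Remove the tblproperties ("skip.header.line.count"="1") property from your CREATE TABLE statement, recreate the table, and run the INSERT statement again. That property makes Hive skip the first line of each data file when reading the table; it is meant for tables defined over text files that carry a header row, and the files written by INSERT ... SELECT have no header line.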
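To check whether the recreated table still has duplicated ids, one option is to compare COUNT(*) with COUNT(DISTINCT id) and then list the offending ids with GROUP BY ... HAVING. The sketch below runs that SQL against an in-memory SQLite database from Python purely so it can be executed anywhere; in Hive you would run only the two queries. The table contents are made-up sample data, not the asker's.

```python
import sqlite3

# Sketch only: SQLite stands in for Hive, and these rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id TEXT, col1 REAL)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [("c1", 1.0), ("c2", 2.0), ("c2", 2.0), ("c3", None)],
)

# If COUNT(*) exceeds COUNT(DISTINCT id), some ids occur more than once.
total, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT id) FROM tableA"
).fetchone()

# List each duplicated id with the number of times it occurs.
duplicates = conn.execute(
    "SELECT id, COUNT(*) AS n FROM tableA GROUP BY id HAVING COUNT(*) > 1"
).fetchall()

print(total, distinct_ids, duplicates)  # 4 3 [('c2', 2)]
```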

Related

Create a view in Hive to merge three tables

How can I create a view to merge three tables?
Initially, one table was created in MySQL; that table has now been divided into 3 tables and stored in Hive,
so I need to create a view that combines them again.
In MySQL the original table is called, for example, Initialtable.
Initialtable consists of col1, col2, col3, col4, col5.
This table has been divided into 3 tables in Hive, and I need to merge them using a view:
1)table1
2)table2
3)table3
table1 consists of col1, col3, col5
table2 consists of col1, col2, col3
table3 consists of col1, col5
Now I have to create a view that merges table1, table2 and table3.
For that, I plan to fill the columns that each table lacks with NULL, something like:
create view v1 select col1,col2 as null,col3,col4,col5 from table1 union select col1,col2,col3,col4 as null,col5 as null from table 2 union col1,col2 as null,col3 as null, col4 as null,col5 from table 3
Can someone provide the proper syntax to get this output in Hive?
Assuming table1, table2 and table3 are the three tables the original was split into, with the columns below:
table1: col1,col3,col5
table2: col1,col2,col3
table3: col1,col4,col3
and that col1 is the primary key in all three tables, you can create a view as below:
CREATE OR REPLACE VIEW initialtable AS
SELECT DISTINCT a.col1,
b.col2,
a.col3,
c.col4,
a.col5
FROM table1 AS a
JOIN table2 AS b
ON ( a.col1 = b.col1 )
JOIN table3 AS c
ON ( c.col1 = a.col1 );
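The join-based view can be tried out locally with the sketch below, which uses Python's sqlite3 module in place of Hive (an assumption for portability) and the column layout assumed in the answer. The sample rows are hypothetical. SQLite has no CREATE OR REPLACE VIEW, so the sketch drops and recreates the view instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; columns follow the answer's assumption
# (table1: col1,col3,col5; table2: col1,col2,col3; table3: col1,col4,col3).
conn.executescript("""
CREATE TABLE table1 (col1 INTEGER, col3 TEXT, col5 TEXT);
CREATE TABLE table2 (col1 INTEGER, col2 TEXT, col3 TEXT);
CREATE TABLE table3 (col1 INTEGER, col4 TEXT, col3 TEXT);
INSERT INTO table1 VALUES (1, 'c3', 'c5');
INSERT INTO table2 VALUES (1, 'c2', 'c3');
INSERT INTO table3 VALUES (1, 'c4', 'c3');

-- SQLite has no CREATE OR REPLACE VIEW, so drop and recreate instead.
DROP VIEW IF EXISTS initialtable;
CREATE VIEW initialtable AS
SELECT DISTINCT a.col1, b.col2, a.col3, c.col4, a.col5
FROM table1 AS a
JOIN table2 AS b ON a.col1 = b.col1
JOIN table3 AS c ON c.col1 = a.col1;
""")

# Each key comes back as one row with all five original columns.
rows = conn.execute("SELECT * FROM initialtable").fetchall()
print(rows)  # [(1, 'c2', 'c3', 'c4', 'c5')]
```

Because the view uses inner joins, a key that is missing from any one of the three tables disappears from the view; if table1 is known to hold every key, LEFT JOINs from table1 would keep those rows with NULLs instead.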

How to update a table without using the UPDATE keyword

I have tables a and b with the same columns. I want to replace the values in table b with the values from table a without using the UPDATE keyword.
The question could use a bit more detail on the table structure, what exactly you're trying to accomplish, and what precludes you from using UPDATE, but here goes:
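-- Stage the merged rows; SELECT ... INTO creates #tempTable
-- without having to spell out the column types
SELECT
b.col1
, b.col2
, a.col3
, ...
INTO #tempTable
FROM a
INNER JOIN b
ON a.col1 = b.col1
-- Remove the rows of b that have a match in a
DELETE FROM b
WHERE col1 IN (SELECT col1 FROM a)
-- Re-insert the staged rows in their place
INSERT INTO b
SELECT
col1
, col2
, col3
, ...
FROM #tempTable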
This, of course, makes the bold assumption that tables a and b share a primary key, and that table b has no constraint that would prevent deleting the matched rows. Please provide some more detail and I'll update my answer accordingly.
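The stage, delete and re-insert sequence can be checked end to end with the sketch below. It uses Python's sqlite3 module rather than SQL Server (an assumption so the example runs anywhere), so the #tempTable becomes a SQLite TEMP table named staged; the tables a and b and their rows are hypothetical, with col1 as the shared key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables a and b sharing the key col1.
conn.executescript("""
CREATE TABLE a (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT);
CREATE TABLE b (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT);
INSERT INTO a VALUES (1, 'a-x', 'a-new'), (3, 'a-z', 'a-only');
INSERT INTO b VALUES (1, 'b-x', 'b-old'), (2, 'b-y', 'b-keep');

-- 1. Stage the merged rows: key and col2 from b, col3 from a.
CREATE TEMP TABLE staged AS
SELECT b.col1, b.col2, a.col3
FROM a INNER JOIN b ON a.col1 = b.col1;

-- 2. Remove the rows of b that have a match in a.
DELETE FROM b WHERE col1 IN (SELECT col1 FROM a);

-- 3. Re-insert the staged rows in their place.
INSERT INTO b SELECT col1, col2, col3 FROM staged;
""")

rows = conn.execute("SELECT * FROM b ORDER BY col1").fetchall()
print(rows)  # [(1, 'b-x', 'a-new'), (2, 'b-y', 'b-keep')]
```

Row 2 of b has no match in a and is left alone, and row 3 of a has no match in b and is not added, which mirrors the INNER JOIN in the answer.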

Selecting rowset when value exists in one of 5 tables with different amounts of columns

Using SQL Server, I need to return the entire row from whichever table contains 'value' in the Filename column (a column that each of the tables contains). However, the tables do not have the same number of columns, and each table has unique columns with their own specific data types; the only column name/type they have in common is the Filename column that I need to check for 'value'.
Ideally, I would be able to do something along the lines of:
SELECT * FROM Table1, Table2, Table3, Table4, Table5
WHERE Filename = 'someValue'
This seems natural, since all the tables share the Filename column.
I have tried using UNION, but it fails because the number of columns and the data types of the tables do not align.
I have also tried every combination of JOIN I could find.
I'm sure this could be accomplished with IF EXISTS, but that would be many, many lines of what seems like unnecessary code. I'm hoping there is a more elegant solution.
Thanks in advance!
You can try joining the tables together. First create a temporary table that stores the input, then left join each table to it to get all the records you want. When a table has no record for that filename, you get NULL values for its columns.
create table Table1 (id int, value int);
insert into Table1 values (1, 10);
create table Table2 (id int, value int);
insert into Table2 values (1, 20);
create table Table3 (id int, value int);
insert into Table3 values (2, 30);
Here is the query itself:
create table #tmp (id int)
insert into #tmp
values (1)
select t.id, t1.value, t2.value, t3.value from #tmp as t
left join Table1 as t1
on t.id = t1.id
left join Table2 as t2
on t.id = t2.id
left join Table3 as t3
on t.id = t3.id
And this is what you get:
id  value  value  value
1   10     20     NULL
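The same example can be reproduced with the sketch below, which runs the answer's tables and query through Python's sqlite3 module instead of SQL Server (an assumption for portability). SQLite has no #-prefixed temporary tables, so #tmp becomes a TEMP table named tmp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER, value INTEGER);
INSERT INTO Table1 VALUES (1, 10);
CREATE TABLE Table2 (id INTEGER, value INTEGER);
INSERT INTO Table2 VALUES (1, 20);
CREATE TABLE Table3 (id INTEGER, value INTEGER);
INSERT INTO Table3 VALUES (2, 30);

-- The value being searched for goes into a one-row temp table.
CREATE TEMP TABLE tmp (id INTEGER);
INSERT INTO tmp VALUES (1);
""")

# LEFT JOIN keeps the input row even when a table has no match.
row = conn.execute("""
SELECT t.id, t1.value, t2.value, t3.value
FROM tmp AS t
LEFT JOIN Table1 AS t1 ON t.id = t1.id
LEFT JOIN Table2 AS t2 ON t.id = t2.id
LEFT JOIN Table3 AS t3 ON t.id = t3.id
""").fetchone()

print(row)  # (1, 10, 20, None): Table3 has no row with id 1
```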
This should work too (note that sp_MSforeachtable is an undocumented system procedure):
EXEC sp_MSforeachtable
@command1='SELECT * FROM ? where filename = ''someValue''',
@whereand='AND o.id in (select object_id from sys.tables where name in (''Table1'',''Table2'',''Table3''))'

SQL Server: add values of small tables to the values of a big table without losing the dimensions of the big table?

I have 3 tables. I want to add the corresponding values from the second and third tables to the first table in the picture below. Each table has an ID by which the rows can be matched, ... field in the pictures. The first table has 1531 rows with an ID column and 8 other columns. This table, the top table in the pictures, is almost full of zeroes.
I have tried to join the tables in different ways, but the problem is that each table has a different number of rows and hence a different number of unique IDs. The top table has all the IDs.
Is there some convenient way to add the second table to the first table and then the third table to that result?
Result of the LEFT JOIN suggested by Suzena: why are the numbers not summed together?
Method 1: Joins
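-- COALESCE turns the NULLs produced by unmatched LEFT JOIN rows into 0;
-- without it, a single NULL makes the whole sum NULL
select a.id,
(a.col1 + coalesce(b.col1, 0) + coalesce(c.col1, 0)) as col1,
(a.col2 + coalesce(b.col2, 0) + coalesce(c.col2, 0)) as col2,
(a.col3 + coalesce(b.col3, 0) + coalesce(c.col3, 0)) as col3
from
table1 a
left join
table2 b
on a.id = b.id
left join
table3 c
on a.id = c.id;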
Method 2: Unions
select id,sum(col1) col1, sum(col2) col2, sum(col3) col3
from
(
select id,col1,col2,col3
from table1
union all
select id,col1,col2,col3
from table2
union all
select id,col1,col2,col3
from table3
) t
group by id
Let me know if you have any different criteria.
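The "numbers not summed" symptom comes from NULL arithmetic: an ID missing from table2 or table3 yields NULL on that side of the LEFT JOIN, and NULL plus anything is NULL. The sketch below reproduces this with Python's sqlite3 module (an assumption; SQL Server treats NULL in + the same way) on hypothetical single-column tables, and shows COALESCE substituting 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: table1 holds every id, table2 and table3 only some.
conn.executescript("""
CREATE TABLE table1 (id INTEGER, col1 INTEGER);
CREATE TABLE table2 (id INTEGER, col1 INTEGER);
CREATE TABLE table3 (id INTEGER, col1 INTEGER);
INSERT INTO table1 VALUES (1, 0), (2, 0), (3, 0);
INSERT INTO table2 VALUES (1, 5), (2, 7);
INSERT INTO table3 VALUES (1, 1);
""")

JOIN_SQL = """
SELECT a.id, {expr} AS col1
FROM table1 a
LEFT JOIN table2 b ON a.id = b.id
LEFT JOIN table3 c ON a.id = c.id
ORDER BY a.id
"""

# Plain '+': an unmatched side contributes NULL, so the whole sum is NULL.
naive = conn.execute(JOIN_SQL.format(expr="a.col1 + b.col1 + c.col1")).fetchall()

# COALESCE replaces the missing side with 0 before adding.
fixed = conn.execute(
    JOIN_SQL.format(expr="a.col1 + COALESCE(b.col1, 0) + COALESCE(c.col1, 0)")
).fetchall()

print(naive)  # [(1, 6), (2, None), (3, None)]
print(fixed)  # [(1, 6), (2, 7), (3, 0)]
```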
Method 3: the tables have different numbers of fields, so pad the missing ones with NULL or 0
SELECT
[MID],
SUM([KEVAT 201501-04]) AS 'KEVAT 201501-04',
SUM([KESA 201504-06]) AS 'KESA 201504-06',
SUM([SYKSY 201507-09]) AS 'SYKSY 201507-09',
SUM([TALVI 201510-12]) AS 'TALVI 201510-12',
SUM([KEVAT 201601-04]) AS 'KEVAT 201601-04',
SUM([KESA 201604-06]) AS 'KESA 201604-06',
SUM([SYKSY 201607-09]) AS 'SYKSY 201607-09',
SUM([TALVI 201610-12]) AS 'TALVI 201610-12'
FROM
(
SELECT * FROM TABLE1
UNION ALL
SELECT [MID]
,0 AS 'KEVAT 201501-04'
,0 AS 'KESA 201504-06'
,0 AS 'SYKSY 201507-09'
,0 AS 'TALVI 201510-12'
,[KEVAT 201601-04]
,[KESA 201604-06]
,[SYKSY 201607-09]
,[TALVI 201610-12]
FROM TABLE2
UNION ALL
SELECT [MID]
,[KEVAT 201501-04]
,[KESA 201504-06]
,[SYKSY 201507-09]
,[TALVI 201510-12]
,0 AS 'KEVAT 201601-04'
,0 AS 'KESA 201604-06'
,0 AS 'SYKSY 201607-09'
,0 AS 'TALVI 201610-12'
FROM TABLE3
) a
GROUP BY [MID]
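The padding idea in Method 3 can be tried out on a scaled-down schema with the sketch below. It uses Python's sqlite3 module instead of SQL Server (an assumption), and the tables big, small2 and small3 with two period columns p2015 and p2016 are hypothetical stand-ins for TABLE1, TABLE2, TABLE3 and their eight season columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: big has both periods, small2 only p2016, small3 only p2015.
conn.executescript("""
CREATE TABLE big    (mid INTEGER, p2015 INTEGER, p2016 INTEGER);
CREATE TABLE small2 (mid INTEGER, p2016 INTEGER);
CREATE TABLE small3 (mid INTEGER, p2015 INTEGER);
INSERT INTO big VALUES (1, 0, 0), (2, 0, 0), (3, 0, 0);
INSERT INTO small2 VALUES (1, 4), (3, 9);
INSERT INTO small3 VALUES (1, 2);
""")

# Every branch of the UNION ALL must have the same columns in the same
# order, so the columns a small table lacks are filled with 0.
rows = conn.execute("""
SELECT mid, SUM(p2015) AS p2015, SUM(p2016) AS p2016
FROM (
  SELECT mid, p2015, p2016 FROM big
  UNION ALL
  SELECT mid, 0, p2016 FROM small2   -- pad the column small2 lacks
  UNION ALL
  SELECT mid, p2015, 0 FROM small3   -- pad the column small3 lacks
) t
GROUP BY mid
ORDER BY mid
""").fetchall()

print(rows)  # [(1, 2, 4), (2, 0, 0), (3, 0, 9)]
```

Since big contains every mid, the result keeps all of its rows, which is the "without losing the dimensions of the big table" requirement.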
If I understand your question, you could use a UNION. Something like:
insert into table1(col1,col2,col3,col4)
(select col1,col2,col3,col4 from table2 union
select col1,col2,col3,col4 from table3)
The columns selected from table2 and table3 must match in number and order; use aliases if their names differ.
Try using MERGE:
--Get data from table 2 and merge into table 1
MERGE Table_1 AS TARGET
USING (SELECT [ID]
,[KEVAT 201501-04]
,[KESA 201504-06]
,[SYKSY 201507-09]
,[TALVI 201510-12] FROM Table_2) AS SOURCE
ON (TARGET.ID = SOURCE.ID)
WHEN MATCHED
THEN UPDATE SET
TARGET.[KEVAT 201501-04] = SOURCE.[KEVAT 201501-04],
TARGET.[KESA 201504-06] = SOURCE.[KESA 201504-06],
TARGET.[SYKSY 201507-09] = SOURCE.[SYKSY 201507-09],
TARGET.[TALVI 201510-12] = SOURCE.[TALVI 201510-12];
GO
--Get data from table 3 and merge into table 1
MERGE Table_1 AS TARGET
USING (SELECT [ID]
,[KEVAT 201601-04]
,[KESA 201604-06]
,[SYKSY 201607-09]
,[TALVI 201610-12] FROM Table_3) AS SOURCE
ON (TARGET.ID = SOURCE.ID)
WHEN MATCHED
THEN UPDATE SET
TARGET.[KEVAT 201601-04] = SOURCE.[KEVAT 201601-04],
TARGET.[KESA 201604-06] = SOURCE.[KESA 201604-06],
TARGET.[SYKSY 201607-09] = SOURCE.[SYKSY 201607-09],
TARGET.[TALVI 201610-12] = SOURCE.[TALVI 201610-12];
GO

Using SQL, create a new table with 2 fields: field1 from tableA and field1 from tableB

I'm new to SQL and am stuck on a very simple-looking query.
I have 2 tables with exactly the same structure (i.e. the same number of columns and the same number of rows) except for the actual contents. For example, tableA has 2 columns called col1 and col2, and tableB also has 2 columns called col1 and col2. Now I want to create a third, new table whose 1st column is tableA's col1 and whose 2nd column is tableB's col1. Preferably, the 1st column should be named fromTableA and the 2nd fromTableB. How do I achieve this? I tried all of the following ways, but I always get the same error: "number of query values and destination fields are not the same."
variation 1:
insert into newTable(fromTable1,fromTable2)
select col1 from table1
select col1 from table2
variation 2:
insert into newTable(fromTable1,fromTable2)
select col1 from table1,col1 from table2
variation 3:
insert into newTable(fromTable1,fromTable2)
select col1 from table1, table2
Presumably the two tables have fields they can be joined on, so try this:
insert into newtable (fromTable1, fromTable2)
select a.col1, b.col1
from table1 a
inner join table2 b on a.col1 = b.col1;
a and b are table aliases that distinguish the identically named columns of the two tables. If you don't have fields to join on, then whatever you're trying to do probably needs a rethink.
You can try the following SQL query:
with OrderedTableA as (
select row_number() over (order by Col1) RowNum, *
from TableA (nolock)
),
OrderedTableB as (
select row_number() over (order by Col1) RowNum, *
from TableB (nolock)
)
select T1.Col1 as fromTableA, T2.Col1 as fromTableB into TableC
from OrderedTableA T1
full outer join OrderedTableB T2 on T1.RowNum = T2.RowNum
The query above creates a new table, TableC, with col1 from TableA and col1 from TableB, pairing the rows of the two tables by their position when each is sorted by col1. You can adapt the queries to your needs.
Give it a try.
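The ROW_NUMBER pairing can be tried out with the sketch below, which runs it through Python's sqlite3 module instead of SQL Server (an assumption; it needs SQLite 3.25 or newer for window functions). Two substitutions are made and named plainly: CREATE TABLE ... AS replaces SELECT ... INTO, and an inner JOIN replaces the FULL OUTER JOIN, since older SQLite builds lack FULL OUTER JOIN and, with equal row counts, every row finds a partner anyway. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample tables with the same number of rows.
conn.executescript("""
CREATE TABLE TableA (col1 TEXT, col2 TEXT);
CREATE TABLE TableB (col1 TEXT, col2 TEXT);
INSERT INTO TableA VALUES ('a1', 'x'), ('a2', 'y');
INSERT INTO TableB VALUES ('b1', 'p'), ('b2', 'q');

-- Number each table's rows by col1, then pair rows with equal numbers.
CREATE TABLE TableC AS
WITH OrderedA AS (
  SELECT ROW_NUMBER() OVER (ORDER BY col1) AS RowNum, col1 FROM TableA
),
OrderedB AS (
  SELECT ROW_NUMBER() OVER (ORDER BY col1) AS RowNum, col1 FROM TableB
)
SELECT A.col1 AS fromTableA, B.col1 AS fromTableB
FROM OrderedA AS A
JOIN OrderedB AS B ON A.RowNum = B.RowNum;
""")

rows = conn.execute("SELECT * FROM TableC ORDER BY fromTableA").fetchall()
print(rows)  # [('a1', 'b1'), ('a2', 'b2')]
```

Note that rows are paired by their rank in col1 order, not by insertion order; tables have no inherent row order, so some ORDER BY has to define which rows belong together.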