Hive table output shows a nonexistent NULL column when queried

I know similar questions have been asked, but my SELECT query on the Hive table returns all of the table's columns plus one extra NULL column.
I have created a Hive table and am now trying to query it with SELECT.
Table DDL :
CREATE TABLE IF NOT EXISTS family (name STRING COMMENT 'Member Name',
Age INT COMMENT 'Age of the Member',
floor INT COMMENT 'Residence floor',
salary float COMMENT 'per month salary',
birthplace STRING COMMENT 'State of birth',
education STRING,
gender STRING )
COMMENT 'basic details of fmaily members'
LOCATION " /user/hive/warehouse/family/MANJREKAR"
TBLPROPERTIES ('creator'='Sarang', 'created_at'='2019-10-06 14:00:00') ;
DML :
LOAD DATA LOCAL INPATH '/Users/tcssig/Documents/Hive/warehouse/Imports' OVERWRITE INTO TABLE family;
Data to load
A 50 3 90000 Maharshtra UG M
B 46 3 40000 Maharshtra UG F
C 15 3 0 MP HS F
D 24 3 10000 Maharshtra PG F
E 85 3 7000 Maharshtra HS F
F 28 2 60000 MP UG M
G 59 2 60000 Maharshtra UG M
H 21 2 0 MP HS F
I 28 2 25000 Maharshtra PHD F
J 32 1 30000 Maharshtra PG M
K 26 1 0 MP UG F
L 58 1 55000 Maharshtra UG F
M 63 1 25000 Maharshtra UG M
SELECT name, salary from family;
Output :
"A",50,3,90000,"Maharshtra","UG","M" NULL
"B",46,3,40000,"Maharshtra","UG","F" NULL
"C",15,3,0,"MP","HS","F" NULL
"D",24,3,10000,"Maharshtra","PG","F" NULL
"E",85,3,7000,"Maharshtra","HS","F" NULL
"F",28,2,60000,"MP","UG","M" NULL
"G",59,2,60000,"Maharshtra","UG","M" NULL
"H",21,2,0,"MP","HS","F" NULL
"I",28,2,25000,"Maharshtra","PHD","F" NULL
"J",32,1,30000,"Maharshtra","PG","M" NULL
"K",26,1,0,"MP","UG","F" NULL
"L",58,1,55000,"Maharshtra","UG","F" NULL
"M",63,1,25000,"Maharshtra","UG","M" NULL
How do I get just the selected columns for my query ?

It appears your whole row lands in the name field, with all other fields being NULL.
Since name is the first field, I bet the separator in your file is not the one the table expects.
You seem to have comma-separated data, which is not the default for Hive tables.
Look here for an example on the right syntax for declaring the table: https://stackoverflow.com/a/48616635
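To make the failure mode concrete, here is a small Python sketch. It only imitates the behaviour and is not how Hive is implemented: parse_row and to_float are hypothetical helpers, and the only Hive fact relied on is that a text table without a ROW FORMAT clause splits fields on the Ctrl-A character ('\x01'). The line contains no Ctrl-A, so the whole line becomes the first column and every other column is NULL. Declaring the table with ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' corresponds to the second call (the double quotes would still be part of the values).

```python
# Illustration only: mimic how a delimited text row is split into columns.
def parse_row(line, delimiter, n_cols=7):
    """Split a line on the delimiter and pad missing columns with None (NULL)."""
    parts = line.split(delimiter)
    parts += [None] * (n_cols - len(parts))
    return parts[:n_cols]

def to_float(value):
    """Convert a field to float, returning None (NULL) when it is missing or invalid."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

line = '"A",50,3,90000,"Maharshtra","UG","M"'

# Default delimiter (Ctrl-A): the whole line ends up in column 0 (name).
wrong = parse_row(line, "\x01")
# Comma delimiter: each value ends up in its own column.
right = parse_row(line, ",")

# Equivalent of SELECT name, salary (salary is the 4th column).
name_salary_wrong = (wrong[0], to_float(wrong[3]))
name_salary_right = (right[0], to_float(right[3]))

print(name_salary_wrong)
print(name_salary_right)
```

The first tuple reproduces the symptom from the question: the full line in name, and NULL for salary.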

Related

Using a Selected table for two sets of joins

I'm rationalising some old SQL tables that exist at a lot of remote sites, so I need to build a query that will make new good tables out of the bad old ones. So for this, we have table1 which has the columns DataGroup and Case1 as nvarchar, but these are enums in the application, so I've made new tables to store the enums, but I need to get the IDs. Unfortunately, we need to store all of the enums for this table in a single table, so the ExData table contains 4 columns: id, name, ExGroupId and DataGroupId.
As DataGroup in table1 is text, we need to look that up for the int id as well from a kvp table DataGroupTable
This is the query I have so far:
SELECT
<Other Columns>,
t1.ExDataId AS Case1
FROM
table1
LEFT JOIN (
SELECT
DataGroupTable.name AS dataGroup,
ExData.id AS ExDataId,
ExData.name AS ExDataName,
ExGroup.name AS ExGroupName
FROM
ExData
LEFT JOIN DataGroupTable ON DataGroupTable.id = ExData.dataGroupId
LEFT JOIN ExGroup ON ExGroup.id = ExData.ExGroupId
) t1 ON t1.dataGroup = table1.DataGroup
AND t1.ExGroupName = 'case1'
AND t1.ExDataName = table1.Case1
GO
This works to retrieve Case1, but how would I go about getting Case2?
I have 7 cases to handle, and whilst I could solve this with liberal copy-pasting, that is far from elegant.
Additionally, this is all going into an INSERT statement, so ideally it should return Case1, Case2, etc. as ExData ids.
Please help.
Sample Data as requested, All id's will start from 0, but I have made all of the below unique for clarity.
table1:
DataGroup Case1 Case2 Case3 <Other Columns>
ABCD bob bob chris 1
ABCD pete gary chris 2
EFGH bob mike rod 3
DataGroupTable:
id name
11 ABCD
12 EFGH
ExGroup:
id name
21 case1
22 case2
23 case3
ExData:
id name ExGroupId dataGroupId
31 bob 21 11
32 pete 21 11
33 bob 21 12
34 bob 22 11
35 gary 22 11
36 mike 22 12
37 chris 23 11
38 rod 23 12
Ideal Result:
<Other Columns> Case1 Case2 Case3
1 31 34 37
2 32 35 37
3 33 36 38

How about a Common Table Expression? You define the lookup once and join to it once per case:
WITH ExDataCTE AS (
SELECT
DataGroupTable.name AS dataGroup,
ExData.id AS ExDataId,
ExData.name AS ExDataName,
ExGroup.name AS ExGroupName
FROM
ExData
LEFT JOIN DataGroupTable ON DataGroupTable.id = ExData.dataGroupId
LEFT JOIN ExGroup ON ExGroup.id = ExData.ExGroupId)
SELECT
<Other Columns>,
t1.ExDataId AS Case1,
t2.ExDataId AS Case2,
t3.ExDataId AS Case3
FROM
table1
LEFT JOIN ExDataCTE t1 ON (t1.dataGroup = table1.DataGroup
AND t1.ExGroupName = 'case1'
AND t1.ExDataName = table1.Case1)
LEFT JOIN ExDataCTE t2 ON (t2.dataGroup = table1.DataGroup
AND t2.ExGroupName = 'case2'
AND t2.ExDataName = table1.Case2)
LEFT JOIN ExDataCTE t3 ON (t3.dataGroup = table1.DataGroup
AND t3.ExGroupName = 'case3'
AND t3.ExDataName = table1.Case3)
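As a quick check, the CTE approach can be run against the sample data from the question. This is a sketch using Python's built-in sqlite3 module rather than SQL Server; the OtherId column is a stand-in I made up for <Other Columns>.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question; OtherId stands in for <Other Columns>.
conn.executescript("""
CREATE TABLE table1 (DataGroup TEXT, Case1 TEXT, Case2 TEXT, Case3 TEXT, OtherId INTEGER);
INSERT INTO table1 VALUES
  ('ABCD', 'bob',  'bob',  'chris', 1),
  ('ABCD', 'pete', 'gary', 'chris', 2),
  ('EFGH', 'bob',  'mike', 'rod',   3);
CREATE TABLE DataGroupTable (id INTEGER, name TEXT);
INSERT INTO DataGroupTable VALUES (11, 'ABCD'), (12, 'EFGH');
CREATE TABLE ExGroup (id INTEGER, name TEXT);
INSERT INTO ExGroup VALUES (21, 'case1'), (22, 'case2'), (23, 'case3');
CREATE TABLE ExData (id INTEGER, name TEXT, ExGroupId INTEGER, dataGroupId INTEGER);
INSERT INTO ExData VALUES
  (31, 'bob', 21, 11), (32, 'pete', 21, 11), (33, 'bob', 21, 12),
  (34, 'bob', 22, 11), (35, 'gary', 22, 11), (36, 'mike', 22, 12),
  (37, 'chris', 23, 11), (38, 'rod', 23, 12);
""")

# The lookup is defined once in the CTE and joined once per case column.
query = """
WITH ExDataCTE AS (
    SELECT DataGroupTable.name AS dataGroup,
           ExData.id           AS ExDataId,
           ExData.name         AS ExDataName,
           ExGroup.name        AS ExGroupName
    FROM ExData
    LEFT JOIN DataGroupTable ON DataGroupTable.id = ExData.dataGroupId
    LEFT JOIN ExGroup ON ExGroup.id = ExData.ExGroupId)
SELECT table1.OtherId, t1.ExDataId AS Case1, t2.ExDataId AS Case2, t3.ExDataId AS Case3
FROM table1
LEFT JOIN ExDataCTE t1 ON (t1.dataGroup = table1.DataGroup
    AND t1.ExGroupName = 'case1' AND t1.ExDataName = table1.Case1)
LEFT JOIN ExDataCTE t2 ON (t2.dataGroup = table1.DataGroup
    AND t2.ExGroupName = 'case2' AND t2.ExDataName = table1.Case2)
LEFT JOIN ExDataCTE t3 ON (t3.dataGroup = table1.DataGroup
    AND t3.ExGroupName = 'case3' AND t3.ExDataName = table1.Case3)
ORDER BY table1.OtherId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For seven cases you would still write seven joins, but the lookup logic itself lives in one place.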

Query to join two tables using two different columns from the first table

I have two tables.
Table A:
Table A ID Table Name owner1ID owner2ID
1 Work1 85 91
2 Work2 86 92
3 Work3 87 93
4 Work4 88 94
5 Work5 89 95
6 Work6 90 96
Table B:
OwnerID 0WNERFIRSTNAME 0WNERlASTNAME
85 A M
86 B N
87 C O
88 D P
90 E Q
91 F R
89 G S
92 H T
86 I U
94 J V
93 K W
95 L X
Can you please help me with a query that returns the table ID together with the first and last names of both owners?
Expected output:
TableAID 0WNER1FIRSTNAME 0WNER1LASTNAME 0WNER2FIRSTNAME 0WNER2LASTNAME
1 A M F R

You need to join on to TableB twice.
That means you need to give each instance of the table an alias, so you can differentiate which instance you're referring to...
SELECT
TableA.TableAID,
TableB1.0WNERFIRSTNAME AS 0WNER1FIRSTNAME,
TableB1.0WNERlASTNAME AS 0WNER1LASTNAME,
TableB2.0WNERFIRSTNAME AS 0WNER2FIRSTNAME,
TableB2.0WNERlASTNAME AS 0WNER2LASTNAME
FROM
TableA
INNER JOIN
TableB TableB1
ON TableB1.OwnerID = TableA.owner1ID
INNER JOIN
TableB TableB2
ON TableB2.OwnerID = TableA.owner2ID
P.S. Don't Spell 0WNERFIRSTNAME with a ZERO, Spell it OWNERFIRSTNAME!

While MatBaile's answer is the most common practice, your own example shows some problems. The first is that we lose the information about table 6, whose second owner is not found in the second table. This is easily corrected with a left join:
select a.id, a.table_name,
b1.OwnerFirstName O1FN, b1.OwnerLastName O1LN,
b2.OwnerFirstName O2FN, b2.OwnerLastName O2LN
from a
left join b b1 on b1.OwnerId = a.Owner1Id
left join b b2 on b2.OwnerId = a.Owner2Id
Which gives us:
ID TABLE_NAME O1FN O1LN O2FN O2LN
---------- ---------- ---- ---- ---- ----
1 Work1 A M F R
2 Work2 I U H T <-- two first owners
2 Work2 B N H T <-- two first owners
4 Work4 D P J V
3 Work3 C O K W
5 Work5 G S L X
6 Work6 E Q <-- null second owner
The second problem: for table 2 we get two rows, because in your example there are two owners with id = 86. I suspect this is a typo, but it can happen in similar cases. You can leave it as is, take only the last row (if the owner changed and you have a date column recording this), list all owners using listagg(), or take the max value. Things get worse when several rows match both the first and the second owner: the output is multiplied.
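Both effects can be reproduced with the data from the question. The sketch below uses Python's built-in sqlite3 module instead of the asker's database, and the column names (spelled with the letter O) and the owners helper are my own choices for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Data from the question; owner 96 is missing from b and owner 86 appears twice.
conn.executescript("""
CREATE TABLE a (id INTEGER, table_name TEXT, owner1id INTEGER, owner2id INTEGER);
INSERT INTO a VALUES
  (1, 'Work1', 85, 91), (2, 'Work2', 86, 92), (3, 'Work3', 87, 93),
  (4, 'Work4', 88, 94), (5, 'Work5', 89, 95), (6, 'Work6', 90, 96);
CREATE TABLE b (ownerid INTEGER, firstname TEXT, lastname TEXT);
INSERT INTO b VALUES
  (85, 'A', 'M'), (86, 'B', 'N'), (87, 'C', 'O'), (88, 'D', 'P'),
  (90, 'E', 'Q'), (91, 'F', 'R'), (89, 'G', 'S'), (92, 'H', 'T'),
  (86, 'I', 'U'), (94, 'J', 'V'), (93, 'K', 'W'), (95, 'L', 'X');
""")

def owners(join_type):
    """Join b twice (once per owner column) using the given join type."""
    sql = f"""
        SELECT a.id, b1.firstname, b1.lastname, b2.firstname, b2.lastname
        FROM a
        {join_type} JOIN b AS b1 ON b1.ownerid = a.owner1id
        {join_type} JOIN b AS b2 ON b2.ownerid = a.owner2id
        ORDER BY a.id, b1.firstname
    """
    return conn.execute(sql).fetchall()

inner_rows = owners("INNER")  # table 6 disappears: owner 96 has no match
left_rows = owners("LEFT")    # table 6 is kept with NULLs for the second owner

print(len(inner_rows), len(left_rows))
print([row for row in left_rows if row[0] in (2, 6)])
```

Table 2 shows up twice in both variants because of the duplicated owner id 86.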
As a curiosity, here is an unpivot-pivot solution. In this case the query looks more complicated, but with 10 owner columns you would need 10 joins, whereas in this query only the lists of columns would need to change.
select *
from (
select id, table_name, type, ownerfirstname, ownerlastname
from (select * from a unpivot (ownerId for type in (owner1ID as 1, owner2ID as 2))) a
join b using (ownerId))
pivot (listagg(ownerfirstname||' '||ownerlastname, ', ') within group (order by null) owner
for type in (1, 2))
Result:
ID TABLE_NAME 1_OWNER 2_OWNER
---------- ---------- ---------- ----------
1 Work1 A M F R
2 Work2 B N, I U H T <-- listagg() used to aggregate data
3 Work3 C O K W
4 Work4 D P J V
5 Work5 G S L X
6 Work6 E Q

To derive a table out of an existing one

Table:
A | B | C | D
1 q 123 23
2 w 22 32
3 e 23 21
New table:
A | B | C | D
1 q 123 C
1 q 23 D
2 w 22 C
2 w 32 D
3 e 23 C
3 e 21 D
I want to derive a new table or view from an existing table, splitting each record of the first table by column name.
C and D are months in the original table. In the new table I want the months to appear as records.
The values stored under the month columns in the original table (123 and 23 for A = 1) should go into one column of the new table, with the matching month name in another column.
Please let me know if this is not clear.

Do a UNION ALL, with one select for the c's and one select for the d's.
select a, b, c, 'c' from tablename
union all
select a, b, d, 'd' from tablename
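Here is a runnable sketch of the same UNION ALL idea on the sample data, using Python's built-in sqlite3 module. The table name tablename comes from the answer; the value and month column aliases, the upper-case month labels, and the ORDER BY are additions for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Original wide table from the question.
conn.executescript("""
CREATE TABLE tablename (a INTEGER, b TEXT, c INTEGER, d INTEGER);
INSERT INTO tablename VALUES (1, 'q', 123, 23), (2, 'w', 22, 32), (3, 'e', 23, 21);
""")

# One SELECT per month column, stacked with UNION ALL.
query = """
SELECT a, b, c AS value, 'C' AS month FROM tablename
UNION ALL
SELECT a, b, d AS value, 'D' AS month FROM tablename
ORDER BY a, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Wrapping the query in CREATE VIEW would give the derived view the question asks for.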

accumulated calculation in sql server

I want to write this query in SQL Server.
I can do it with a loop, but I would like to know whether there is an easier way with better performance.
I have a table of referrals.
Scenario 1: if A has 5 points and refers B, who has 7, the query should show 12 points for A (A's points + B's points).
Scenario 2: if A has 5 points and refers B, who has 7, and A also refers C, who has 3 points, and B refers D, who has 4 points, and so on,
then A gets the points of everyone: A + B + C + D.
My table looks like this:
Refs
sID bigint
sName varchar(50)
sPoints int
sRefID bigint

You can do this using recursive SQL. Try this:
With CTETable (sID, sRefID, sName, sPoints)
AS
(
SELECT Refs.sID, Refs.sRefID, Refs.sName, Refs.sPoints FROM Refs
UNION ALL
SELECT Refs.sID, Refs.sRefID, CTETable.sName, Refs.sPoints
FROM Refs INNER JOIN CTETable ON CTETable.sID = Refs.sRefID
)
Select sName, Sum(sPoints)
From CTETable
Group By CTETable.sName
This will yield:
sName TotalPoints
A 360
B 210
C 130
D 80
E 90
F 90
G 60
H 40
I 20
J 50
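The figures above come from sample data that is not shown, so here is a smaller sketch built on scenario 2 from the question, run with Python's built-in sqlite3 module. It assumes, as the query above does, that sRefID holds the sID of the person who made the referral (NULL for nobody); the four rows are made up to match the scenario. SQLite spells the recursive CTE WITH RECURSIVE, whereas SQL Server uses plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Scenario 2: A refers B and C, B refers D. sRefID = sID of the referrer (assumption).
conn.executescript("""
CREATE TABLE Refs (sID INTEGER, sName TEXT, sPoints INTEGER, sRefID INTEGER);
INSERT INTO Refs VALUES
  (1, 'A', 5, NULL),
  (2, 'B', 7, 1),
  (3, 'C', 3, 1),
  (4, 'D', 4, 2);
""")

# Anchor: every person counts their own points.
# Recursive step: each referred person's points are also credited to the
# name carried down from the referrer, at every level of the chain.
query = """
WITH RECURSIVE CTETable (sID, sRefID, sName, sPoints) AS (
    SELECT Refs.sID, Refs.sRefID, Refs.sName, Refs.sPoints FROM Refs
    UNION ALL
    SELECT Refs.sID, Refs.sRefID, CTETable.sName, Refs.sPoints
    FROM Refs INNER JOIN CTETable ON CTETable.sID = Refs.sRefID
)
SELECT sName, SUM(sPoints) AS TotalPoints
FROM CTETable
GROUP BY sName
ORDER BY sName
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

A ends up with 5 + 7 + 3 + 4 = 19 points and B with 7 + 4 = 11, which matches the rule described in the question.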

Show COUNT of each possible grade for an employee, showing zero when there are no grade entries

I have only one table available. I want to show the grade and the count of the number of times an employee has that grade recorded, but it must show a 0 for the grade if there are no records for that employee. I know how to do this using left join when two tables are present, but I only have 1 table.
How is this possible?
For example:
TABLE
empID | dept | grade
1 | 11 | a
2 | 11 | a
3 | 11 | b
1 | 22 | c
2 | 22 | f
3 | 22 | d
1 | 33 | a
2 | 33 | a
3 | 33 | a
If I run SELECT grade, count(grade) from table where empID = 1 Group by grade;, for example, it ends up printing out only grades the employee got and the count. Now I want to also print out the 0s for grades the employee did not have.

I think you're asking for this? Join the distinct list of grades back to the same table:
SQL> select e.grade, count(e2.empid)
2 from (select distinct grade from e) e
3 left outer join e e2
4 on e2.grade = e.grade
5 and e2.empid = 1
6 group by e.grade
7 order by grade;
G COUNT(E2.EMPID)
- ---------------
a 2
b 0
c 1
d 0
f 0
Or, since you have no rows with grade "e", you can drive the query from a lookup table called grade, if you have one:
SQL> select * from grade;
G
-
a
b
c
d
e
f
SQL> select e.grade, count(e2.empid)
2 from grade e
3 left outer join e e2
4 on e2.grade = e.grade
5 and e2.empid = 1
6 group by e.grade
7 order by grade;
G COUNT(E2.EMPID)
- ---------------
a 2
b 0
c 1
d 0
e 0
f 0
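The single-table variant can be checked against the data from the question. This sketch uses Python's built-in sqlite3 module rather than Oracle, with the table named e as in the first query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question, in a table named e.
conn.executescript("""
CREATE TABLE e (empid INTEGER, dept INTEGER, grade TEXT);
INSERT INTO e VALUES
  (1, 11, 'a'), (2, 11, 'a'), (3, 11, 'b'),
  (1, 22, 'c'), (2, 22, 'f'), (3, 22, 'd'),
  (1, 33, 'a'), (2, 33, 'a'), (3, 33, 'a');
""")

# The distinct grades act as the "left" table; the employee filter must sit
# in the ON clause so that grades without a match are kept with a count of 0.
query = """
SELECT g.grade, COUNT(e2.empid) AS cnt
FROM (SELECT DISTINCT grade FROM e) AS g
LEFT OUTER JOIN e AS e2
  ON e2.grade = g.grade
 AND e2.empid = ?
GROUP BY g.grade
ORDER BY g.grade
"""
counts = conn.execute(query, (1,)).fetchall()
for row in counts:
    print(row)
```

Moving the empid condition into a WHERE clause would filter the unmatched grades out again, which is why it belongs in the join condition.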

Let's say your query to select a value is:
select value from tbl;
You can ensure a 0 is returned if there are no rows in tbl t:
select nvl(t.value, 0) value
from dual d
left join tbl t on 1=1;
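The same trick can be tried outside Oracle. The sketch below uses Python's built-in sqlite3 module, which has neither dual nor NVL, so a one-row subquery and COALESCE stand in for them; that substitution is mine, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (value INTEGER)")

# A one-row source on the left guarantees exactly one row when tbl is empty.
query = """
SELECT COALESCE(t.value, 0) AS value
FROM (SELECT 1) AS d
LEFT JOIN tbl AS t ON 1 = 1
"""

empty_result = conn.execute(query).fetchall()   # tbl has no rows yet
conn.execute("INSERT INTO tbl VALUES (5)")
filled_result = conn.execute(query).fetchall()  # tbl now has one row

print(empty_result, filled_result)
```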

Sounds like you want the NVL function. With NVL, you can conditionally return an alternate value if the value is null. See the documentation.
So, if you had the following...
SELECT fooName, fooNumber FROM foo
and these were your results
fooName, fooNumber
Blah, 1
Asdf, null
Qwer, 3
poiu, null
you could rewrite the query like this...
SELECT fooName, NVL(fooNumber, 0) FROM foo
and your results would now be...
fooName, fooNumber
Blah, 1
Asdf, 0
Qwer, 3
poiu, 0
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions105.htm
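For a quick experiment without an Oracle instance, here is a sketch using Python's built-in sqlite3 module. SQLite has no NVL, so the standard COALESCE function is used in its place (my substitution); the table and values are the foo example from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The foo example from above; None is stored as NULL.
conn.execute("CREATE TABLE foo (fooName TEXT, fooNumber INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [("Blah", 1), ("Asdf", None), ("Qwer", 3), ("poiu", None)],
)

# COALESCE returns its first non-NULL argument, like NVL with two arguments.
rows = conn.execute(
    "SELECT fooName, COALESCE(fooNumber, 0) FROM foo ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
```

Note that this replaces NULL values in rows that exist; it does not create rows for grades that are missing altogether.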
