Hive parse string to columns - hive

In a Hive table, I have columns (all have string datatype)
CustomerId, Name, Gender
Name datatype is string of format
{'firstname': 'XXXXXX', 'middlename': 'YYYYYY', 'lastname': 'ZZZZZZ'}
Also, some of the rows have missing middlename -
CustomerId, {'firstname': 'AAAAAA', 'lastname': 'BBBBB'}, Gender
I wanted to create a new table and populate the below columns -
CustomerId, firstname, middlename, lastname , Gender.
Middlename will be null/spaces if value not present. Could you please help?

Those are not valid JSON strings. Use double quotes instead of single quotes; then you can use JSON_TUPLE() to extract the fields.
CREATE TABLE yournewtable AS
SELECT customerid,
firstname,
middlename,
lastname,
gender
FROM yourtable
LATERAL VIEW json_tuple(Name, 'firstname', 'middlename', 'lastname') j
AS firstname, middlename, lastname;
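To illustrate what happens to each row, here is a small Python sketch (Python is used only for illustration; `parse_name` is a hypothetical helper, not something Hive provides). It assumes the names never contain quote characters themselves, so swapping single quotes for double quotes yields valid JSON. A missing key comes back as None, which mirrors the NULL one would expect from json_tuple for an absent middlename.

```python
import json


def parse_name(raw):
    """Turn a single-quoted pseudo-JSON string into (first, middle, last)."""
    # Assumption: the names contain no quote characters, so a blind swap is safe.
    doc = json.loads(raw.replace("'", '"'))
    # dict.get returns None when the key is missing.
    return tuple(doc.get(key) for key in ("firstname", "middlename", "lastname"))


full = parse_name("{'firstname': 'XXXXXX', 'middlename': 'YYYYYY', 'lastname': 'ZZZZZZ'}")
partial = parse_name("{'firstname': 'AAAAAA', 'lastname': 'BBBBB'}")
print(full)
print(partial)
```

In Hive itself the same quote swap could presumably be done with regexp_replace on the Name column before it is passed to json_tuple; that part is untested here.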

Related

PostgreSQL: `VIEW` returns no rows with `SPLIT_PART()` columns

Problem Description:
Hi everyone, I'm having some trouble querying a VIEW whose columns are, in part, the result of the SPLIT_PART() function applied to a column of the original table. I created the VIEW as follows:
CREATE VIEW ClientsAddressList(Client_ID, FirstName, LastName, ResidenceAddress, City, PostalCode, Province) AS
SELECT Client_ID,
FirstName,
LastName,
SPLIT_PART(Address, '-', 1) AS ResidenceAddress,
SPLIT_PART(Address, '-', 2) AS City,
SPLIT_PART(Address, '-', 3) AS PostalCode,
SPLIT_PART(Address, '-', 4) AS Province
FROM Clients;
My intention was to divide the structured attribute (Clients.Address, defined as VARCHAR(255)), which contains all the information related to the client's domicile, into several columns that can be queried separately (e.g. SELECT FirstName, LastName FROM ClientsAddressList WHERE City LIKE 'N%'; or SELECT Client_ID FROM ClientsAddressList WHERE PostalCode = '82305';).
What I experience:
The Clients table contains one test row:
Client_ID | FirstName | LastName | ResidenceAddress | City | PostalCode | Province
00451 | Ezio | Auditore | Via dei Banchi 45 - Florence - 50123 - Florence | Florence | 50123 | Florence
So my VIEW has this row:
Client_ID | FirstName | LastName | ResidenceAddress | City | PostalCode | Province
00451 | Ezio | Auditore | Via dei Banchi 45 | Florence | 50123 | Florence
I've tried:
SELECT Client_ID, FirstName, LastName
FROM ClientsAddressList
WHERE City = 'Florence'
And it returns no result:
Client_ID | FirstName | LastName | ResidenceAddress | City | PostalCode | Province
(no rows)
But if I query on columns that are not the result of SPLIT_PART() it works:
SELECT Client_ID, FirstName, LastName, City
FROM ClientsAddressList
WHERE Client_ID = '00451'
Client_ID | FirstName | LastName | City
00451 | Ezio | Auditore | Florence
What I expect:
I would expect the WHERE clause to work and return values even on columns that are the result of SPLIT_PART():
SELECT Client_ID
FROM ClientsAddressList
WHERE PostalCode LIKE '%123'
Client_ID
00451
Can someone explain to me what the problem could be, please? Thank you so much!
As sticky bit wrote: there are spaces around the values. There are two ways to deal with this. One way is to just slap a trim() around the expressions in the view:
trim(SPLIT_PART(Address, '-', 2)) AS City,
The other option is to use an appropriate regex to split the information to remove the whitespace during splitting:
select client_id,
firstname,
lastname,
address[1] as residenceaddress,
address[2] as city,
address[3] as postalcode,
address[4] as province
from (
select client_id, firstname, lastname,
regexp_split_to_array(residenceaddress, '\s*-\s*') as address
from clients
) t
In the long run you should fix your data model by properly normalizing it and storing those values in separate columns. I don't know how many city names contain dashes in Italy, but in Germany, this pattern would break quickly with city names like "Garmisch-Partenkirchen" or "Leinfelden-Echterdingen"
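The effect of the surrounding spaces can be reproduced outside the database. The following Python sketch (Python is used only for illustration, with the test address from the question) compares a plain split on the dash, a split followed by trimming, and a regex split that swallows the whitespace, which roughly correspond to SPLIT_PART(), trim(SPLIT_PART()) and regexp_split_to_array() respectively.

```python
import re

address = "Via dei Banchi 45 - Florence - 50123 - Florence"

# A plain split on '-' keeps the spaces around each value,
# which is why a comparison with 'Florence' finds nothing.
raw_parts = address.split("-")

# Trimming each part afterwards removes the padding.
trimmed_parts = [part.strip() for part in raw_parts]

# Splitting on the dash together with optional whitespace avoids the padding entirely.
regex_parts = re.split(r"\s*-\s*", address)

print(repr(raw_parts[1]))
print(trimmed_parts)
print(regex_parts)
```

Both cleaned variants give the same list, so the choice between them is mostly a matter of taste.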

SQL Query to replace record of table with another

The following code produces an error:
select lastname, firstname, workphone, homephone
from members if (workphone is null) then workphone = homephone;
I am trying to select the last names, first names and the phone numbers from a table named members. If a member's workphone is null, I need to replace it with the homephone.
I would be more than happy to clarify if need be.
select lastname, firstname,
case when (workphone is null) then homephone else workphone end as workphone
, homephone
from members;
You can use the COALESCE function, which returns the first non-null argument it is given:
SELECT lastname, firstname, homephone, COALESCE(workphone, homephone) AS workphone
FROM members
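A quick way to check the COALESCE behaviour is an in-memory SQLite database driven from Python. This is only a sketch: the two member rows and their phone numbers are made-up sample data, and SQLite stands in for whatever database the question actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (lastname TEXT, firstname TEXT, workphone TEXT, homephone TEXT)"
)
# Hypothetical sample rows: the second member has no work phone.
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    [
        ("Smith", "John", "555-0100", "555-0199"),
        ("Lee", "David", None, "555-0142"),
    ],
)

# COALESCE falls back to homephone only when workphone is NULL.
rows = conn.execute(
    "SELECT lastname, firstname, COALESCE(workphone, homephone) AS workphone, homephone "
    "FROM members ORDER BY lastname"
).fetchall()
for row in rows:
    print(row)
```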

SQL returns a column with "FirstName LastName" as one string; how to order by last name?

I'm using Oracle. My sql returns a column like this
Name:
John Smith
David Lee
...
If I do Order by Name, it will order by first name. How do I order by last name? If I do Order by Lastname, Firstname, Oracle reports invalid identifiers. I tried substr and instr, but it doesn't work. I know the SQL is tedious, but I just want the data so I can quickly fix this issue.
Full SQL:
http://pastebin.com/hYkdHBDM
You say your SQL "returns a column" in that format. Do you mean the column is stored that way, or that it's stored as two fields and composed into one in the SQL statement?
If stored that way it's difficult to create an algorithm that will reliably determine what part of a multi-part name is the last name part (indeed, this is sometimes down to personal preference of the person owning the name).
If stored in two separate fields you should be able to ORDER BY LastName, FirstName depending on how the SQL is constructed and whether there are any intermediate views between you and the table. Please post the SQL and table structure.
First, in order to sort by LastName, it needs to be one of the columns you return in each of the queries in your Union All. Second, you can greatly simplify your query by using a common-table expression. Third, do not use the comma delimited syntax for Joins (e.g. From TableA, TableB, TableC...). Instead use the ISO Join syntax.
With RootQuery As
(
Select MeetingID
, FirstName || ' ' || LastName AS Name
, LastName
, CASE WHEN RSVP = 1 THEN 1 ELSE NULL END AS Yes
, CASE WHEN RSVP = 0 THEN 1 ELSE NULL END AS No
, CASE WHEN RSVP = 2 THEN 1 ELSE NULL END AS Phone
, CASE WHEN RSVP = -1 THEN 1 ELSE NULL END AS No_Reply
, MysteryTable0.Response1
, MysteryTable1.Response2
, Note
, groupname
From Attendance A
Join Allusers B
On B.MemberId = A.PersonId
Join MembershipGroups M
On M.MemberId = B.MemberID
Join (
SELECT TD.MEMBERID AS MEM0
, Response AS Response1
FROM TRACKINGDETAILS TD, ALLUSERS U
Where TD.MEMBERID = U.MEMBERID
And TD.TRACKINGID = 64
) MysteryTable0
On MysteryTable0.Mem0 = B.MemberId
Join (
SELECT TD.MEMBERID AS MEM1
, Response AS Response2
FROM TRACKINGDETAILS TD, ALLUSERS U
Where TD.MEMBERID = U.MEMBERID
And TD.TRACKINGID = 65
) MysteryTable1
On MysteryTable1.Mem1 = B.MemberId
Where Meetingid = :1
)
Select MeetingId, Name, LastName, Yes, No, Phone, No_Reply
, Response1, Response2
, Note, GroupName
From RootQuery
Union All
Select Null, 'Total', LastName, SUM(Yes), SUM(No), SUM(Phone), SUM(No_Reply)
, TO_CHAR(SUM(Response1))
, TO_CHAR(SUM(Response2))
, NULL, Groupname
From RootQuery
Group By GroupName
Union All
Select Null, 'Grand Total', LastName, SUM(Yes), SUM(No), SUM(Phone), SUM(No_Reply)
, TO_CHAR(SUM(Response1))
, TO_CHAR(SUM(Response2))
,NULL, ' '
From RootQuery
Group By ???
Order By GroupName Desc, LastName Asc, Name Asc
Btw, the last query will probably have a problem in that it did not have a Group By (which I denoted with Group By ???) but you are using aggregate functions.
What Matthew PK said is correct; however, he failed to mention that INSTR can search backwards, in which case his "fail" scenario would be resolved.
Here try this:
create table test_name (f_name varchar2(20), l_name varchar2(20), full_name varchar2(20));
insert into test_name (f_name, l_name, full_name) values ('John', 'Mellencamp', 'John 2Mellen');
insert into test_name (f_name, l_name, full_name) values ('John', 'Mellencamp', 'John C. 1Mellen');
select f_name, l_name, substr(full_name,instr(full_name,' ',-1,1)) as substr, full_name from test_name order by substr(full_name,instr(full_name,' ',-1,1));
The key expression is: substr(full_name,instr(full_name,' ',-1,1))
If you know the field will always be "FirstName LastName" separated by a single space, you could:
ORDER BY SUBSTR(Name, INSTR(Name, ' ') + 1)
This sorts by the part of the string that follows the first space.
This will fail if any other names are separated by a space like "John Cougar Mellencamp"
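The difference between searching for the first space and searching backwards for the last space shows up as soon as a name has more than two parts. Here is a small Python sketch of both ideas (Python is used only for illustration; `last_word` and `after_first_space` are hypothetical helper names, and the sample names are taken from this thread).

```python
names = ["John Smith", "David Lee", "John Cougar Mellencamp"]


def last_word(full_name):
    # rpartition splits at the LAST space, like INSTR(name, ' ', -1).
    return full_name.rpartition(" ")[2]


def after_first_space(full_name):
    # partition splits at the FIRST space, like INSTR(name, ' ').
    return full_name.partition(" ")[2]


by_last_word = sorted(names, key=last_word)
by_first_space = sorted(names, key=after_first_space)
print(by_last_word)
print(by_first_space)
```

With the first-space variant, "John Cougar Mellencamp" is sorted under "Cougar Mellencamp", whereas the backward search sorts it under "Mellencamp". Neither can know which part a person actually regards as the last name.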

How to add table column headings to sql select statement

I have a SQL select statement like this:
select FirstName, LastName, Age from People
This will return me something like a table:
Peter Smith 34
John Walker 46
Pat Benetar 57
What I want is to insert the column headings into the first row like:
First Name Last Name Age
=========== ========== ====
Peter Smith 34
John Walker 46
Pat Benetar 57
Can someone suggest how this could be achieved?
Could you maybe create a temporary table with the headings and append the data to it?
Neither of the answers above will work, unless all your names come after "first" in sort order.
Select FirstName, LastName
from (
select Sorter = 1, FirstName, LastName from People
union all
select 0, 'FirstName', 'LastName') X
order by Sorter, FirstName -- or whatever ordering you need
If you want to do this to all non-varchar columns as well, the CONS are (at least):
ALL your data will become VARCHAR. If you use Visual Studio for example, you will NO LONGER be able to recognize or use date values. Or int values. Or any other for that matter.
You need to explicitly provide a format to datetime values like DOB. DOB values in Varchar in the format dd-mm-yyyy (if that is what you choose to turn them into) won't sort properly.
The SQL to achieve this, although not recommended, is:
Select FirstName, LastName, Age, DOB
from (
select Sorter = 1,
Convert(Varchar(max), FirstName) as FirstName,
Convert(Varchar(max), LastName) as LastName,
Convert(Varchar(max), Age) as Age,
Convert(Varchar(max), DOB, 126) as DOB
from People
union all
select 0, 'FirstName', 'LastName', 'Age', 'DOB') X
order by Sorter, FirstName -- or whatever ordering you need
The lightest-weight way to do this is probably to do a UNION:
SELECT 'FirstName' AS FirstName, 'LastName' AS LastName
UNION ALL
SELECT FirstName, LastName
FROM People
No need to create temporary tables.
UNION ALL is the solution, but it should be pointed out that:
Adding a header to a non-char column requires converting that column in the data part of the query.
If the converted column is used as part of the sort order, the ORDER BY must refer to the name of the column in the query, not the column in the table.
Example:
Select Convert(varchar(25), b.integerfield) AS [Converted Field]...
... Order by [Converted Field]
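The header-row technique can be tried end to end with an in-memory SQLite database driven from Python. This is a sketch under a few assumptions: SQLite stands in for SQL Server, so CAST replaces Convert and the sorter column is written as `1 AS Sorter`; the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (FirstName TEXT, LastName TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [("Peter", "Smith", 34), ("John", "Walker", 46), ("Pat", "Benetar", 57)],
)

# The Sorter column forces the header row (0) ahead of the data rows (1).
# Age is cast to text so that both branches of the UNION ALL have matching types.
rows = conn.execute(
    """
    SELECT FirstName, LastName, Age
    FROM (
        SELECT 1 AS Sorter, FirstName, LastName, CAST(Age AS TEXT) AS Age FROM People
        UNION ALL
        SELECT 0, 'First Name', 'Last Name', 'Age'
    ) AS X
    ORDER BY Sorter, FirstName
    """
).fetchall()
for row in rows:
    print(row)
```

Note that Age comes back as text, which is exactly the drawback described above.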

faster way to do this select query

Say Employee has three columns
FirstName, LastName and ID.
The query should first search for a name in FirstName, and only if it is not found there should it search LastName.
So select * from Employee where FirstName like 'test%' or LastName like 'test%' won't work.
The query below will work.
select FirstName, LastName, ID
from EMPLOYEE
WHERE
(LastName LIKE 'Test%'
AND
(SELECT COUNT(*) FROM EMPLOYEE WHERE FirstName LIKE 'Test%') = 0)
OR
FirstName LIKE 'Test%'
I need to map this back to NHibernate, is there a faster efficient way of doing this instead of making two database calls?
Give this one a try:
SELECT FirstName, LastName, ID FROM EMPLOYEE WHERE FirstName LIKE 'Test%'
OR (FirstName NOT LIKE 'Test%' AND LastName LIKE 'Test%')
SELECT FirstName, LastName, ID
FROM Employee
WHERE (FirstName LIKE 'Test%' AND LastName NOT LIKE 'Test%')
OR (LastName LIKE 'Test%' AND FirstName NOT LIKE 'Test%')
This works provided you don't care what order they come back in. If the records where the first name matches must come first, followed by the records where the last name matches, then this won't work.
"Premature optimization is the root of all evil".
Why would you care of how the search is done as the optimizer sees it, instead of declaring what you want?
It all boils down to a boolean truth table: when FirstName matches, you want the record (whatever is in LastName, match or nomatch) and if FirstName does not match, you want the record when LastName matches:
F match, L match => yes
F match, L nomatch => yes
F nomatch, L match => yes
F nomatch, L nomatch => no
That is exactly the OR condition: (FirstName matching) OR (LastName matching); the only discarded records are when both FirstName and LastName do not match.
The optimizer is free to skip the 2nd condition when the 1st one is already true, although SQL does not guarantee any particular evaluation order.
So your query is:
SELECT FirstName, LastName, ID
FROM Employee
WHERE (FirstName LIKE 'Test%')
OR (LastName LIKE 'Test%')
UPDATE: I may have misunderstood the goal if it is indeed not to return any LastName match if records were found with only the FirstName...
Anyway, the stance on premature optimization is still valid...
You need somehow a 2 pass query as you cannot tell if LastName is to be considered until you're sure you don't have any match on FirstName. But with proper indexes and statistics, the optimizer will make it very efficient.
The query:
SELECT FirstName, LastName, ID
FROM Employee
WHERE (FirstName LIKE 'Test%')
OR (LastName LIKE 'Test%'
AND (SELECT Count(ID)
FROM Employee
WHERE FirstName LIKE 'Test%') = 0)
is only marginally more expensive than the previous.
Select *
From Employee
Where FirstName Like @Text + '%'
Union
Select *
From Employee
Where LastName Like @Text + '%';
I don't think any of these queries answers the question. My understanding is that a search on 'John%' should return employees with last name John, Johnson, etc. ONLY if there are no employees with first name John, Johnny, etc. All the queries shown will return both John Adams and Lyndon Johnson if the table contains both, but only John Adams should appear, because last names should be matched ONLY if no first names matched.
Here's a proposal using SQL Server syntax. It should be possible to write this in other dialects of SQL:
select top (1) with ties
FirstName, LastName, ID
from (
select
0 as SearchLastNames,
FirstName, LastName, ID
from EMPLOYEE
where FirstName like 'Test%'
union all
select
1 as SearchLastNames,
FirstName, LastName, ID
from EMPLOYEE
where LastName like 'Test%'
) as T
order by SearchLastNames;
If there are any matching first names, the smallest SearchLastNames value will be 0, and the TOP (1) with ties .. order by SearchLastNames will return information only for the first name matches (where SearchLastNames is 0).
If there are no matching first names, the only SearchLastNames value will be 1. In that case, TOP will return information for all last name matches (where SearchLastNames is 1), if there are any.
A more clumsy, but more portable solution is this:
select
FirstName, LastName, ID
from EMPLOYEE
where FirstName like 'Test%'
union all
select
FirstName, LastName, ID
from EMPLOYEE
where LastName like 'Test%'
and not exists (
select
FirstName, LastName, ID
from EMPLOYEE
where FirstName like 'Test%'
);
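The portable NOT EXISTS variant can be checked against an in-memory SQLite database from Python. This is a sketch under stated assumptions: `search` is a hypothetical helper, SQLite stands in for the real database, the ORDER BY is added only to make the output deterministic, and "Andrew Jackson" is an extra made-up row next to the two names mentioned above.

```python
import sqlite3

# First-name matches are always returned; last-name matches are returned
# only when no first name matches the same pattern.
QUERY = """
    SELECT FirstName, LastName, ID FROM Employee WHERE FirstName LIKE :p
    UNION ALL
    SELECT FirstName, LastName, ID FROM Employee
    WHERE LastName LIKE :p
      AND NOT EXISTS (SELECT 1 FROM Employee WHERE FirstName LIKE :p)
    ORDER BY ID
"""


def search(conn, prefix):
    """Run the fallback search for names starting with the given prefix."""
    return conn.execute(QUERY, {"p": prefix + "%"}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Employee (ID, FirstName, LastName) VALUES (?, ?, ?)",
    [(1, "John", "Adams"), (2, "Lyndon", "Johnson"), (3, "Andrew", "Jackson")],
)

john = search(conn, "John")  # a first name matches, so Johnson is suppressed
jack = search(conn, "Jack")  # no first name matches, so last names are used
print(john)
print(jack)
```

Everything happens in a single statement, so it needs only one database call, which is what the question was after.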