Split function in HIVE


I want to split an address into two columns, streetno and streetname. For example, given
select address1 from customer
where an address looks like
2719 STONE CREEK DR
I want streetno to be 2719 and streetname to be STONE CREEK DR.
Using split(address1 ,'[\ ]')[0] returns only the street number.
This is just a select statement used to view the data.
Sample data:
address1
100 HORACE GREELEY RD
55 School Street
2893 MASHIE CIR
1200 JEWEL DR
201 W RIVER RD
Expected output
+--------------+---------------------+
| streetnumber | streetname |
+--------------+---------------------+
| 100 | HORACE GREELEY RD |
| 55 | School Street |
| 2893 | MASHIE CIR |
| 1200 | JEWEL DR |
| 201 | W RIVER RD |
+--------------+---------------------+

select regexp_extract(address1,'(.*?)\\s',1) as streetnumber
,regexp_extract(address1,'\\s(.*)' ,1) as streetname
from mytable
;
+----------------+--------------------+
| streetnumber | streetname |
+----------------+--------------------+
| 100 | HORACE GREELEY RD |
| 55 | School Street |
| 2893 | MASHIE CIR |
| 1200 | JEWEL DR |
| 201 | W RIVER RD |
+----------------+--------------------+
or
select regexp_extract(address1,'.*?(?=\\s)',0) as streetnumber
,regexp_extract(address1,'(?<=\\s).*',0) as streetname
from mytable
;
+----------------+--------------------+
| streetnumber | streetname |
+----------------+--------------------+
| 100 | HORACE GREELEY RD |
| 55 | School Street |
| 2893 | MASHIE CIR |
| 1200 | JEWEL DR |
| 201 | W RIVER RD |
+----------------+--------------------+
or
select split(address1,'\\s')[0] as streetnumber
,split(address1,'^.*?\\s')[1] as streetname
from mytable
;
+----------------+--------------------+
| streetnumber | streetname |
+----------------+--------------------+
| 100 | HORACE GREELEY RD |
| 55 | School Street |
| 2893 | MASHIE CIR |
| 1200 | JEWEL DR |
| 201 | W RIVER RD |
+----------------+--------------------+
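The two regexp_extract patterns from the first query can be checked outside Hive. The following is a minimal sketch in Python rather than HiveQL, assuming that Python's re module and Hive's Java regex engine treat these simple patterns the same way. The function name split_address is hypothetical, and returning an empty string when nothing matches is a choice made for this sketch.

```python
import re

def split_address(address):
    """Return (street_number, street_name), split at the first whitespace."""
    # Same pattern as regexp_extract(address1, '(.*?)\\s', 1)
    number = re.search(r"(.*?)\s", address)
    # Same pattern as regexp_extract(address1, '\\s(.*)', 1)
    name = re.search(r"\s(.*)", address)
    # Sketch assumption: fall back to an empty string when there is no whitespace
    return (number.group(1) if number else "", name.group(1) if name else "")

for addr in ["100 HORACE GREELEY RD", "55 School Street", "201 W RIVER RD"]:
    print(split_address(addr))
```

The lazy (.*?) stops at the first whitespace, while the greedy (.*) after the first \s takes the rest of the line, which is why multi-word street names stay intact.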

Related

Get hierarchy of all different level of managers

I'm using pgAdmin4 and I have a SQL table with employee/manager HR data that looks like this:
| employee_id | email_address | full_name | band_lvl | manager_id |
| 5592 | jillr#ex.org | Jill Rhode | 20 | 6521 |
| 6421 | racheln#ex.org | Rachel Nam | 40 | 4251 |
| 2818 | todda#ex.org | Todd Alex | 25 | 6421 |
| 4251 | jalens#ex.org | Jalen Smith | 60 | 2199 |
| 6521 | tolun#ex.org | Tolu Nagoye | 30 | 2199 |
| 7831 | jina#ex.org | Ji Na | 80 | NULL |
| 2199 | zaynm#ex.org | Zayn Mate | 70 | 7831 |
Based on the first manager_id and employee_id, I'm seeking to return the following columns: Level1 Manager Name, Level1 Manager Email, Level1 Manager Band Lvl, Level1 Manager Manager's Id. I then want to do that for each manager that's a step above, until there are no higher managers.
The desired output should look like this:
| employee_id | email_address | full_name | band_lvl | manager_id | Lvl1 Mng Nm | Lvl1 Mng Email | Lvl1 Mng Band Lvl | Lvl1 Mng Mngs Id | Lvl2 Mng Nm | Lvl2 Mng Email | Lvl2 Mng Band Lvl | Lvl2 Mng Mngs Id |
| 5592 | jillr#ex.org | Jill Rhode | 20 | 6521 | Tolu Nagoye | tolun#ex.org | 30 | 2199 | Zayn Mate | zaynm#ex.org | 70 | 7831 |
| 6421 | racheln#ex.org | Rachel Nam | 40 | 4251 | Jalen Smith | jalens#ex.org | 60 | 2199 | Zayn Mate | zaynm#ex.org | 70 | 7831 |
| 2818 | todda#ex.org | Todd Alex | 25 | 6421 | Rachel Nam | racheln#ex.org | 40 | 4251 | Jalen Smith | jalens#ex.org | 60 | 2199 |
| 4251 | jalens#ex.org | Jalen Smith | 60 | 2199 | Zayn Mate | zaynm#ex.org | 70 | 7831 | Ji Na | jina#ex.org | 80 | NULL |
| 6521 | tolun#ex.org | Tolu Nagoye | 30 | 2199 | Zayn Mate | zaynm#ex.org | 70 | 7831 | Ji Na | jina#ex.org | 80 | NULL |
| 7831 | jina#ex.org | Ji Na | 80 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 2199 | zaynm#ex.org | Zayn Mate | 70 | 7831 | Ji Na | jina#ex.org | 80 | NULL | NULL | NULL | NULL | NULL |
So far, this is what I've come up with, to get the first columns for the Level 1 Manager; however, I don't know where to go from here, as I'm very new to SQL:
SELECT B.employee_id,
B.email_address,
B.full_name,
B.band_lvl,
B.manager_id,
B1.full_name AS L1_mng_nm,
B1.email_address AS L1_mng_email,
B1.band_lvl AS L1_mng_band_lvl,
B1.manager_id AS L1_mgr_mgrs_id
FROM hrdata B
INNER JOIN hrdata B1 ON
B.manager_id = B1.employee_id;

Your query is close, but it needs a few changes to produce your desired output. To begin, use a LEFT JOIN instead of an INNER JOIN: an INNER JOIN drops any row that has no match on the other side of the join (here, any employee whose manager_id matches no employee_id in the second use of hrdata, such as an employee with a NULL manager_id), whereas a LEFT JOIN keeps the row and fills the manager columns with NULL.
After that, your query should look similar to what you have already done, just with another self-join to get the second-level manager data:
SELECT B.employee_id,
B.email_address,
B.full_name,
B.band_lvl,
B.manager_id,
B1.full_name AS L1_mng_nm,
B1.email_address AS L1_mng_email,
B1.band_lvl AS L1_mng_band_lvl,
B1.manager_id AS L1_mgr_mgrs_id,
B2.full_name AS L2_mng_nm,
B2.email_address AS L2_mng_email,
B2.band_lvl AS L2_mng_band_lvl,
B2.manager_id AS L2_mgr_mgrs_id
FROM hrdata B
LEFT JOIN hrdata B1
ON B1.employee_id = B.manager_id
LEFT JOIN hrdata B2
ON B2.employee_id = B1.manager_id
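The self-joins above cover a fixed number of levels. If the depth of the hierarchy is not known in advance, a recursive common table expression is one alternative. The sketch below is not the answer's method: it runs the query through Python's sqlite3 module as a stand-in for PostgreSQL (which also supports WITH RECURSIVE), uses a reduced three-column hrdata table, and returns one row per employee and manager level rather than the wide layout asked for. The names chain, lvl and manager_chains are hypothetical.

```python
import sqlite3

# Reduced copy of the question's data: (employee_id, full_name, manager_id)
ROWS = [
    (5592, "Jill Rhode", 6521),
    (6421, "Rachel Nam", 4251),
    (2818, "Todd Alex", 6421),
    (4251, "Jalen Smith", 2199),
    (6521, "Tolu Nagoye", 2199),
    (7831, "Ji Na", None),
    (2199, "Zayn Mate", 7831),
]

QUERY = """
WITH RECURSIVE chain(employee_id, lvl, mgr_name, next_mgr_id) AS (
    -- Level 1: each employee's direct manager
    SELECT e.employee_id, 1, m.full_name, m.manager_id
    FROM hrdata e
    JOIN hrdata m ON m.employee_id = e.manager_id
    UNION ALL
    -- Level n + 1: the manager of the manager found at level n
    SELECT c.employee_id, c.lvl + 1, m.full_name, m.manager_id
    FROM chain c
    JOIN hrdata m ON m.employee_id = c.next_mgr_id
)
SELECT employee_id, lvl, mgr_name
FROM chain
ORDER BY employee_id, lvl
"""

def manager_chains():
    """Return (employee_id, level, manager_name) rows for every level."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hrdata ("
        "employee_id INTEGER PRIMARY KEY, full_name TEXT, manager_id INTEGER)"
    )
    conn.executemany("INSERT INTO hrdata VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

for row in manager_chains():
    print(row)
```

The recursion stops on its own when next_mgr_id is NULL, because the join finds no matching employee_id. Turning these rows into Lvl1/Lvl2 columns would still need a pivot step or the fixed self-joins shown above.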

Compare very different tables

I have two tables that are read from separate files (.xlsx and .csv) and imported into MS Access. They are not in the same format, which is why I'm having such a difficult time with it.
Here is xlsxTable:
+--------------------------------------------------------------------------------------+
| ID | Name | SSN | SSN2 | Address |
+--------------------------------------------------------------------------------------+
| 00012345 | Robert Robin | ThisIsSSN | ThisIsSSN2 | 12345 StreetName St. CityName, KS |
| 00013245 | Pete Peters | ThisIsSSN | ThisIsSSN2 | 54321 StreetName St. CityName, MO |
| 00012358 | Mike Michaels| ThisIsSSN | ThisIsSSN2 | 69874 StreetName St. CityName, NY |
| 00098755 | Tim Timpson | ThisIsSSN | ThisIsSSN2 | 15987 StreetName St. CityName, KY |
| 00035784 | Tom Thompson | ThisIsSSN | ThisIsSSN2 | 95123 StreetName St. CityName, CA |
| 00012584 | Will Willers | ThisIsSSN | ThisIsSSN2 | 35789 StreetName St. CityName, WA |
| ........ | ........... | ......... | .......... | ................................. |
Here is my csvTable:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tracking_number | last_name | first_name | middle_name | suffix | alias_last_name | alias_first_name | alias_middle_name | alias_suffix | number | number_type | dob | street | city | state | zip | country | phone |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 135247 | Keeves | Michael | | Jr | | | | | ThisIsSSN | SSN/ITIN | 1/1/1990 | StreetName | CityName | NJ | | US | |
| 135248 | Jackson | Sue | Master | | | | | | ThisIsSSN | SSN/ITIN | 10/29/1980 | StreetName | CityName | NY | zip | US | |
| 135248 | Thomspon | Dolf | Laundry | | | | | | DriverNum | Driver'sLicense | 11/15/1962 | StreetName | CityName | KS | | US | |
| 135249 | Peters | Pete | | | Peters | Petey | | | ThisIsSSN | SSN/ITIN | 5/6/1975 | StreetName | CityName | PA | zip | US | |
| 135250 | Rogers | Steve | | | | | | | ThisIsSSN | SSN/ITIN | 12/25/1990 | StreetName | CityName | CT | zip | US | |
| 135250 | Nikolson | Jack | | Jr | | | | | DriverNum | Driver'sLicense | 8/5/1975 | StreetName | CityName | CA | zip | US | |
| 135251 | Keeves | Keanu | Neo | | | | | | ThisIsSSN | SSN/ITIN | 10/30/2000 | StreetName | CityName | TX | zip | US | |
| 135252 | Starch | Tony | | | | | | | ThisIsSSN | SSN/ITIN | 9/10/1975 | StreetName | CityName | NJ | | US | |
|...................|...............|................|...............|..........|...................|......................|........................|.................|.............|....................|............|.............|..............|.........|......|.........|.......|
| dba_name | number | number_type | incorporated | street | city | state | zip | country | phone | | | | | | | | |
| Mini Mart | 92585487 | EIN | | Street | CityName | state | zipNum | GT | | | | | | | | | |
| | 15987548 | EIN | | street | CityName | KS | zipNum | US | | | | | | | | | |
| Check Systems | 35854855 | EIN | | street | CityName | CA | zipNum | US | | | | | | | | | |
|...................|...............|................|...............|..........|...................|......................|........................|.................|.............|....................|............|.............|..............|.........|......|.........|.......|
The row beginning with dba_name in the table above is an actual row in the file. For some reason, another portion of the file starts a new list with its own header row.
I have to query these tables and if a name along with SSN match, then I must take the name, address and SSN, and do something with them (most likely put into another table for export). I have loaded both tables from the files necessary.
I'm now needing to iterate through and find the matches. For the sake of the sample data, Pete Peters should match here since the data is in both tables. My expected output should look a lot like the first table:
| ID | Name | SSN | SSN2 | Address |
I currently have an MS Access database that contains these tables. Though, with how the data is parsed, I'm not sure where to even start with the SQL. Performance-wise, this may be extensive. I'm just looking for a way to get it working first.
How can I query these two very different tables and only pull the data that matches?

Access has a Find Duplicates Query Wizard. The fastest way to handle the problem is to combine the tables, either manually or with one or more queries, and then run the wizard on the combined data. Breaking that down into steps:
You might get the data from csvTable with a query like:
SELECT csvTable.First_Name AS First_Name, csvTable.Last_Name AS Last_Name, csvTable.Number AS [Number]
FROM csvTable
GROUP BY csvTable.First_Name, csvTable.Last_Name, csvTable.Number
HAVING (((Count(csvTable.Number))>1));
Then create a query with the same structure from the xlsx table (FullName here stands for the name column, which the question's sample shows as Name):
SELECT Left([xlsxTable]![FullName],InStr([xlsxTable]![FullName]," ")) AS First_Name, Right([xlsxTable].[FullName],Len([xlsxTable].[FullName])-InStr([xlsxTable]![FullName]," ")) AS Last_Name, xlsxTable.SSN AS [Number]
FROM xlsxTable
GROUP BY Left([xlsxTable]![FullName],InStr([xlsxTable]![FullName]," ")), Right([xlsxTable].[FullName],Len([xlsxTable].[FullName])-InStr([xlsxTable]![FullName]," ")), xlsxTable.SSN
HAVING (((Count(xlsxTable.SSN))>1));
The HAVING Count > 1 clause does the work of finding the duplicates. Most of the rest is string manipulation that turns the full name into first and last name directly in the SQL. Then combine the queries with a UNION ALL statement so you can run them at the same time in the SQL pane:
SELECT csvTable.First_Name AS First_Name, csvTable.Last_Name AS Last_Name, csvTable.Number AS [Number]
FROM csvTable
GROUP BY csvTable.First_Name, csvTable.Last_Name, csvTable.Number
UNION ALL
SELECT Left([xlsxTable]![FullName],InStr([xlsxTable]![FullName]," ")) AS First_Name, Right([xlsxTable].[FullName],Len([xlsxTable].[FullName])-InStr([xlsxTable]![FullName]," ")) AS Last_Name, xlsxTable.SSN AS [Number]
FROM xlsxTable
GROUP BY Left([xlsxTable]![FullName],InStr([xlsxTable]![FullName]," ")), Right([xlsxTable].[FullName],Len([xlsxTable].[FullName])-InStr([xlsxTable]![FullName]," ")), xlsxTable.SSN;
UNION ALL keeps duplicates, while UNION omits them. I have removed the HAVING clauses from the union, as I find it works better. Next, save the combined query (here as [combine tables]) and use the Find Duplicates wizard on it, which gives a query like:
SELECT [combine tables].First_Name, [combine tables].Last_Name, [combine tables].Number
FROM [combine tables]
GROUP BY [combine tables].First_Name, [combine tables].Last_Name, [combine tables].Number
HAVING (((Count([combine tables].Number))>1));
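As an alternative to the wizard, the match can also be expressed as a direct join on name and SSN. The sketch below is not the answer's method, and it uses Python's sqlite3 module rather than Access so that it can be run as written. The SSN values are invented placeholders (the question masks the real ones), the tables are cut down to the columns needed, and the || concatenation operator would be & in Access SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xlsxTable (ID TEXT, Name TEXT, SSN TEXT, Address TEXT);
CREATE TABLE csvTable (first_name TEXT, last_name TEXT,
                       number TEXT, number_type TEXT);
-- Invented placeholder values, for illustration only
INSERT INTO xlsxTable VALUES
  ('00012345', 'Robert Robin', '111-11-1111', '12345 StreetName St. CityName, KS'),
  ('00013245', 'Pete Peters',  '222-22-2222', '54321 StreetName St. CityName, MO');
INSERT INTO csvTable VALUES
  ('Pete',  'Peters', '222-22-2222', 'SSN/ITIN'),
  ('Steve', 'Rogers', '333-33-3333', 'SSN/ITIN');
""")

# Keep only xlsx rows whose SSN and full name both appear in the csv data
MATCH_QUERY = """
SELECT x.ID, x.Name, x.SSN, x.Address
FROM xlsxTable AS x
INNER JOIN csvTable AS c
        ON c.number = x.SSN
       AND c.first_name || ' ' || c.last_name = x.Name
WHERE c.number_type = 'SSN/ITIN'
"""

matches = conn.execute(MATCH_QUERY).fetchall()
print(matches)
```

Building the full name from first_name and last_name avoids having to split the xlsx name column, at the cost of missing rows where the two files format names differently (middle names, suffixes, extra spaces).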

SQL query based on a column in parent table - Parent child relationship

I have the following three tables (ACCOUNTS, CUSTOMER, EMPLOYEE), and I would like to join them on the columns AGENT_CODE and AGENT_TYPE to produce the result shown below.
What is the best way to join these tables when AGENT_CODE can be the same in the CUSTOMER and EMPLOYEE tables?
I have this query, which is giving me wrong results:
SELECT ac.AGENT_CODE,
ac.WORKING_AREA,
ac.AGENT_TYPE,
CONCAT(c.FIRST_NAME,c.LASTNAME_NAME),
e.EMP_NAME
FROM ACCOUNTS ac,
CUSTOMER c,
EMPLOYEE e
WHERE ac.AGENT_CODE = e.AGENT_CODE
OR ac.AGENT_CODE = c.AGENT_CODE
Wrong results from the above query:
+------------+--------------------+------------+--------------+--------------+
| AGENT_CODE | WORKING_AREA | AGENT_TYPE | CUSTOMER_NAME| EMP_NAME |
+------------+--------------------+------------+--------------+--------------+
| A007 | Bangalore | CUSTOMER |Walter Holmes |Walter Holmes |
| A007 | London | EMPLOYEE |Walter Holmes |Peter Sam |
| A008 | New York | CUSTOMER |Micheal Junior|Micheal Junior|
| A007 | Bangalore | EMPLOYEE |Walter Holmes |John Tyler |
| A010 | Chennai | CUSTOMER |Micheal |Micheal |
| A007 | San Jose | EMPLOYEE |Walter Holmes |Albert |
+------------+--------------------+------------+--------------+--------------+
Expected result:
+------------+--------------------+------------+--------------+
| AGENT_CODE | WORKING_AREA | AGENT_TYPE | AGENT_NAME |
+------------+--------------------+------------+--------------+
| A007 | Bangalore | CUSTOMER |Walter Holmes |
| A003 | London | EMPLOYEE |Peter Sam |
| A008 | New York | CUSTOMER |Micheal Junior|
| A011 | Bangalore | EMPLOYEE |John Tyler |
| A010 | Chennai | CUSTOMER |Micheal |
| A012 | San Jose | EMPLOYEE |Albert |
+------------+--------------------+------------+--------------+
ACCOUNTS(AGENT_CODE -PrimaryKey)
+------------+--------------------+------------+
| AGENT_CODE | WORKING_AREA | AGENT_TYPE |
+------------+--------------------+------------+
| A007 | Bangalore | CUSTOMER |
| A003 | London | EMPLOYEE |
| A008 | New York | CUSTOMER |
| A011 | Bangalore | EMPLOYEE |
| A010 | Chennai | CUSTOMER |
| A012 | San Jose | EMPLOYEE |
| A005 | Brisban | EMPLOYEE |
+------------+--------------------+------------+
CUSTOMER(AGENT_CODE -ForeignKey)
+-----------+-------------+-------------+------------+
|CUST_CODE | FIRST_NAME | LAST_NAME | AGENT_CODE |
+-----------+-------------+-------------+------------+
| C00013 | Walter | Holmes | A007 |
| C00001 | Micheal | Junior | A008 |
| C00020 | Albert | Skyler | A010 |
+-----------+-------------+-------------+------------+
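EMPLOYEE(AGENT_CODE -ForeignKey)
+------------+----------+------------+
| EMP_NAME   | EMP_CODE | AGENT_CODE |
+------------+----------+------------+
| Peter Sam  | C00054   | A003       |
| John Tyler | C00023   | A011       |
| White Bolt | C00043   | A012       |
+------------+----------+------------+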

If you want the customer and employee names combined into a single column, you can UNION two joins, one against each table:
SELECT a.AGENT_CODE, a.WORKING_AREA, a.AGENT_TYPE, c.FIRST_NAME || ' ' || c.LAST_NAME AS AGENT_NAME
FROM ACCOUNTS a
JOIN CUSTOMER c ON c.AGENT_CODE = a.AGENT_CODE
UNION
SELECT a.AGENT_CODE, a.WORKING_AREA, a.AGENT_TYPE, e.EMP_NAME
FROM ACCOUNTS a
JOIN EMPLOYEE e ON e.AGENT_CODE = a.AGENT_CODE
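Because the question says an AGENT_CODE can appear in both CUSTOMER and EMPLOYEE, each branch of the union can also be restricted by AGENT_TYPE. The sketch below adds that filter as an assumption about the intended behaviour, joins the second branch to EMPLOYEE, and runs against the question's tables in SQLite through Python's sqlite3 module, so the exact syntax may need adjusting for another database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACCOUNTS (AGENT_CODE TEXT PRIMARY KEY, WORKING_AREA TEXT, AGENT_TYPE TEXT);
CREATE TABLE CUSTOMER (CUST_CODE TEXT, FIRST_NAME TEXT, LAST_NAME TEXT, AGENT_CODE TEXT);
CREATE TABLE EMPLOYEE (EMP_NAME TEXT, EMP_CODE TEXT, AGENT_CODE TEXT);
INSERT INTO ACCOUNTS VALUES
  ('A007', 'Bangalore', 'CUSTOMER'), ('A003', 'London',    'EMPLOYEE'),
  ('A008', 'New York',  'CUSTOMER'), ('A011', 'Bangalore', 'EMPLOYEE'),
  ('A010', 'Chennai',   'CUSTOMER'), ('A012', 'San Jose',  'EMPLOYEE'),
  ('A005', 'Brisban',   'EMPLOYEE');
INSERT INTO CUSTOMER VALUES
  ('C00013', 'Walter',  'Holmes', 'A007'),
  ('C00001', 'Micheal', 'Junior', 'A008'),
  ('C00020', 'Albert',  'Skyler', 'A010');
INSERT INTO EMPLOYEE VALUES
  ('Peter Sam',  'C00054', 'A003'),
  ('John Tyler', 'C00023', 'A011'),
  ('White Bolt', 'C00043', 'A012');
""")

# Each branch only names accounts of its own AGENT_TYPE
QUERY = """
SELECT a.AGENT_CODE, a.WORKING_AREA, a.AGENT_TYPE,
       c.FIRST_NAME || ' ' || c.LAST_NAME AS AGENT_NAME
FROM ACCOUNTS a
JOIN CUSTOMER c ON c.AGENT_CODE = a.AGENT_CODE
               AND a.AGENT_TYPE = 'CUSTOMER'
UNION ALL
SELECT a.AGENT_CODE, a.WORKING_AREA, a.AGENT_TYPE, e.EMP_NAME
FROM ACCOUNTS a
JOIN EMPLOYEE e ON e.AGENT_CODE = a.AGENT_CODE
               AND a.AGENT_TYPE = 'EMPLOYEE'
ORDER BY 1
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Account A005 has no row in EMPLOYEE, so the inner join leaves it out, in line with the expected result. The names for A010 and A012 come from the sample tables, which differ from the names shown in the question's expected result.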

Unusual two tables join

I have table Persons:
----------------------------------------
id | name | phone | house_id |
----------------------------------------
1 | Sarah | 1234567 | 101 |
2 | Joseph | 7654321 | 102 |
3 | David | 1231231 | null |
And a second table, Houses:
----------------------------------------
id | street | number |
----------------------------------------
101 | Evergreen Terrace | 742 |
102 | Baker Street | 223B |
103 | Oxford Street | 23A |
I need an output table like the following:
--------------------------------------------------------------------------------
id(person)| name | phone | house_id | id(house) | street | number |
--------------------------------------------------------------------------------
1 | Sarah | 1234567 | 101 | 101 | Evergreen T...| 742 |
2 | Joseph | 7654321 | 102 | 102 | Baker Street | 223B |
3 | David | 1231231 | null | null | null | null |
4 | null | null | null | 103 | Oxford Street | 23A |
What kind of join do I need to use to achieve such result?

SELECT
A.id AS 'Person',
A.name,
A.phone,
A.house_id,
B.id AS 'House',
B.street,
B.number
FROM
Persons AS A
FULL OUTER JOIN Houses AS B
ON A.house_id = B.id
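Not every database supports FULL OUTER JOIN (MySQL, for example, does not). A common workaround is a LEFT JOIN combined, via UNION ALL, with the rows from the second table that have no match. The sketch below shows that workaround on the question's data, run in SQLite through Python's sqlite3 module. The column aliases person_id and house are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (id INTEGER, name TEXT, phone TEXT, house_id INTEGER);
CREATE TABLE Houses (id INTEGER, street TEXT, number TEXT);
INSERT INTO Persons VALUES
  (1, 'Sarah',  '1234567', 101),
  (2, 'Joseph', '7654321', 102),
  (3, 'David',  '1231231', NULL);
INSERT INTO Houses VALUES
  (101, 'Evergreen Terrace', '742'),
  (102, 'Baker Street',      '223B'),
  (103, 'Oxford Street',     '23A');
""")

QUERY = """
-- Every person, with house columns when a house matches
SELECT p.id AS person_id, p.name, p.phone, p.house_id,
       h.id AS house, h.street, h.number
FROM Persons p
LEFT JOIN Houses h ON p.house_id = h.id
UNION ALL
-- Plus every house that no person refers to
SELECT NULL, NULL, NULL, NULL, h.id, h.street, h.number
FROM Houses h
WHERE NOT EXISTS (SELECT 1 FROM Persons p WHERE p.house_id = h.id)
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In the house-only row the person columns are all NULL, including the person id, which is also what a FULL OUTER JOIN returns for an unmatched row.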

Creating a pivot table in T-SQL?

I could really use some help creating a pivot table. I have data in some rows that instead needs to appear in columns, juxtaposed next to values in other records.
The data is currently in the following format:
Region | Location | Customer | CustomerKey |Status
North | New York | John | 111 |Active
North | New York | Mary | 112 |Active
North | Delaware | Bob | 113 |Idle
North | New Jersey| Bob | 113 |Active
West | California| Bob | 113 |Inactive
West | Washington| Greg | 114 |Inactive
West | Utah | Tim | 115 |Active
North | All States | Bob | 113 |VIP Customer
North | All States | Mary | 112 |Regular Customer
West | All States | Bob | 113 |Regular Customer
West | All States | Tim | 115 |Regular Customer
West | All States | Greg | 114 |VIP Customer
North | All States | John | 111 |Regular Customer
The issue is with the 'Status' column, which can have one group of values (Inactive/Active/Idle) and another (VIP Customer and Regular Customer). When the 'Location' column is 'All States', it uses the VIP/Regular values. I would like to add a column, to have the data appear along the lines of:
Region | Location | Customer | CustomerKey |Status | VIPStatus
North | New York | John | 111 |Active | No
North | New York | Mary | 112 |Active | No
North | Delaware | Bob | 113 |Idle | Yes
North | New Jersey| Bob | 113 |Active | Yes
West | California| Bob | 113 |Inactive | No
West | Washington| Greg | 114 |Inactive | Yes
West | Utah | Tim | 115 |Active | No
Basically, if the Customer has a record with a Status of 'VIP Customer' under a given Region with a Location value of 'All States', then every record of that customer under that Region should show a 'VIPStatus' of 'Yes' (regardless of the Location state); otherwise it should show 'No'. Is there a simple solution for this? Any help on rearranging this data in T-SQL would be greatly appreciated.

You should be able to join on the table multiple times to get the result that you need:
select t1.region,
t1.location,
t1.customer,
t1.customerkey,
t1.status,
case when t2.status is not null then 'Yes' else 'No' end VIPStatus
from yourtable t1
left join yourtable t2
on t1.CustomerKey = t2.CustomerKey
and t2.Location = 'All States'
and t2.status = 'VIP Customer'
where t1.Location <> 'All States'
The result is:
| REGION | LOCATION | CUSTOMER | CUSTOMERKEY | STATUS | VIPSTATUS |
-----------------------------------------------------------------------
| North | New York | John | 111 | Active | No |
| North | New York | Mary | 112 | Active | No |
| North | Delaware | Bob | 113 | Idle | Yes |
| North | New Jersey | Bob | 113 | Active | Yes |
| West | California | Bob | 113 | Inactive | Yes |
| West | Washington | Greg | 114 | Inactive | Yes |
| West | Utah | Tim | 115 | Active | No |
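The join above matches on CustomerKey only, so Bob's West/California row shows 'Yes', whereas the question's expected output has 'No' there because Bob is a VIP only in the North region. A variant that also matches on Region is sketched below. It is a stand-in run in SQLite through Python's sqlite3 module rather than T-SQL, with the table name yourtable taken from the answer.

```python
import sqlite3

DATA = [
    ("North", "New York", "John", 111, "Active"),
    ("North", "New York", "Mary", 112, "Active"),
    ("North", "Delaware", "Bob", 113, "Idle"),
    ("North", "New Jersey", "Bob", 113, "Active"),
    ("West", "California", "Bob", 113, "Inactive"),
    ("West", "Washington", "Greg", 114, "Inactive"),
    ("West", "Utah", "Tim", 115, "Active"),
    ("North", "All States", "Bob", 113, "VIP Customer"),
    ("North", "All States", "Mary", 112, "Regular Customer"),
    ("West", "All States", "Bob", 113, "Regular Customer"),
    ("West", "All States", "Tim", 115, "Regular Customer"),
    ("West", "All States", "Greg", 114, "VIP Customer"),
    ("North", "All States", "John", 111, "Regular Customer"),
]

QUERY = """
SELECT t1.Region, t1.Location, t1.Customer, t1.CustomerKey, t1.Status,
       CASE WHEN t2.Status IS NOT NULL THEN 'Yes' ELSE 'No' END AS VIPStatus
FROM yourtable t1
LEFT JOIN yourtable t2
       ON t2.CustomerKey = t1.CustomerKey
      AND t2.Region = t1.Region          -- VIP status is per region
      AND t2.Location = 'All States'
      AND t2.Status = 'VIP Customer'
WHERE t1.Location <> 'All States'
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable ("
    "Region TEXT, Location TEXT, Customer TEXT, CustomerKey INTEGER, Status TEXT)"
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)", DATA)

# Index the result by (Region, Location, Customer) for easy lookup
vip = {(r[0], r[1], r[2]): r[5] for r in conn.execute(QUERY)}
for key, flag in vip.items():
    print(key, flag)
```

With the extra Region condition, the seven rows match the VIPStatus column in the question's desired output.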
