Increase every character in a particular column to the next characters - hive

I am working on Hive. I want to decode all records in the name column by shifting every character forward by 3. For example, if name is abc I want def. How can I do this using Hive?

EDIT: A much simpler solution uses the TRANSLATE function.
SELECT TRANSLATE (name,
'abcdefghijklmnopqrstuvwxyz',
'defghijklmnopqrstuvwxyzabc')
FROM yourtable;
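To see what the TRANSLATE call does, here is a minimal Python sketch of the same character mapping. The name shift_name is hypothetical, and Python's str.translate only stands in for Hive's TRANSLATE. Like the query above, it maps lowercase a-z only and leaves every other character unchanged.

```python
# Model of TRANSLATE(name, from, to): each character found in FROM_CHARS
# is replaced by the character at the same position in TO_CHARS.
FROM_CHARS = "abcdefghijklmnopqrstuvwxyz"
TO_CHARS = "defghijklmnopqrstuvwxyzabc"  # the alphabet rotated by 3

SHIFT_TABLE = str.maketrans(FROM_CHARS, TO_CHARS)


def shift_name(name: str) -> str:
    """Shift lowercase letters forward by 3, wrapping x, y, z to a, b, c."""
    return name.translate(SHIFT_TABLE)


print(shift_name("abc"))  # def
print(shift_name("xyz"))  # abc
```

Because the mapping is a lookup table, the wrap-around at the end of the alphabet is handled by the table itself rather than by arithmetic.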
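OLD approach (don't use):
You could use a combination of splitting the string into characters, converting each to its ASCII code, adding 3, converting back, and joining the characters into a string again. Note that collect_set removes duplicate values, so a name with repeated letters would lose characters.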
WITH t AS
( SELECT 'abc' AS name
UNION ALL
SELECT 'pqr' AS name
) ,
ASC AS
(SELECT name,
ASCII(t1.letter) ascii_number
FROM t LATERAL VIEW explode(split( REGEXP_REPLACE(name,'(.)' , '$1|') , '\\|') ) t1 AS letter
)
SELECT name,
concat_ws('',collect_set( DECODE(unhex(hex(ascii_number+3)), 'US-ASCII') ) ) as name_plus_3
FROM ASC
GROUP BY name;
Output:
name name_plus_3
---- ----------
abc def
pqr stu
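For comparison, the ASCII-based approach can be sketched in Python as below (shift_ascii is a hypothetical helper, not a Hive function). It shows one difference from the TRANSLATE version: adding 3 to the character code does not wrap around at the end of the alphabet.

```python
def shift_ascii(name: str, offset: int = 3) -> str:
    """Add `offset` to every character code, with no wrap-around."""
    return "".join(chr(ord(ch) + offset) for ch in name)


print(shift_ascii("abc"))  # def
print(shift_ascii("pqr"))  # stu
print(shift_ascii("xyz"))  # {|}  (runs past 'z' into punctuation)
```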

Related

Adding a Varchar auto-increment column to an select query

I'm trying to design a database and I have grouped a certain data as follows
select Name, prod_cat, prod_country, count(*)
from table1
group by Name, prod_cat, prod_country
union
select Name, prod_cat, prod_country, count(*)
from table2
group by Name, prod_cat, prod_country
Now, for the data returned by the query above, I would like to add an auto-increment ID as a VARCHAR column that starts at 'U000001', 'U000002', 'U000003', 'U000004', ... and continues all the way to the last row.
For a better database design, is it better to put the query result into a temporary table or view, or is it better to use a stored procedure, given that more data could be coming in?
How can I add an auto-incrementing VARCHAR ID column to the select query above? Should I use DECLARE and increment a variable?
Expected result:
Prod_ID | Name | Prod_cat | Prod_country
----------------------------------------
U000001 | abc  | 12       | USA
U000002 | efg  | 1        | IND
U000003 | def  | 3        | MEX
U000004 | ijk  | 21       | CHN
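The requested ID format is a fixed prefix followed by a zero-padded sequence number. As a minimal sketch of just that formatting step, written in Python rather than SQL, each row's 1-based position is padded to six digits. The helper format_prod_id is hypothetical, and the rows are the ones from the expected result. In SQL the position would typically come from something like a ROW_NUMBER() window function, which is an assumption here because the question does not name the database.

```python
def format_prod_id(position: int, prefix: str = "U", width: int = 6) -> str:
    """Build an ID such as 'U000001' from a 1-based row position."""
    return f"{prefix}{position:0{width}d}"


# Rows taken from the expected result in the question.
rows = [("abc", 12, "USA"), ("efg", 1, "IND"), ("def", 3, "MEX"), ("ijk", 21, "CHN")]

# Prepend the generated ID to each row, numbering from 1.
with_ids = [(format_prod_id(i), *row) for i, row in enumerate(rows, start=1)]

for row in with_ids:
    print(row)
```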

how to remove duplicated records depend on column not able to sort in hive

I have:
a table test that contains:
unique_id string , file_name string , mount bigint
Sample of data:
unique_id , file_name , mount
1 , test.txt , 15
1 , test_R_file.txt , 50
3 , test_567.txt , 30
3 , test_567_R_file.txt , 100
What I want to do:
I need a query to insert overwrite the table so that, for each duplicated unique_id, only one record is kept, and that record should be the one that has R in the file_name column.
The issue:
The test table is an external table in Hive (which means it does not support update and delete operations), so I want to use insert overwrite to remove the duplicated records for each unique_id. If there are 2 records for the same unique_id, only the record that has R in the file name should stay. I was thinking of using ranking, but I do not have a column to order by that tells me which record to keep and which to remove. I only have the file_name column, which I need to check whenever 2 records share the same unique_id.
You can sort by a boolean expression that tests whether R exists in the file name. You can also use a CASE expression to convert the boolean to an int, add more conditions to the CASE, and add more ORDER BY expressions, comma separated. Sorting by a boolean works because True is greater than False.
Demo:
with mytable as (--demo dataset, use your table instead of this CTE
select 1 unique_id , 'test.txt' file_name , 15 mount union all
select 1 , 'test_R_file.txt' , 50 union all
select 3 , 'test_567.txt' , 30 union all
select 3 , 'test_567_R_file.txt' , 100
)
select unique_id, file_name, mount
from
(
select unique_id, file_name, mount,
row_number() over(partition by unique_id
order by file_name rlike '_R_' desc --True is greater than False
--order by something else if necessary
) rn
from mytable t
) s
where rn=1
Result:
unique_id file_name mount
1 test_R_file.txt 50
3 test_567_R_file.txt 100
Use rank instead of row_number if there can be multiple records with R per unique_id and you want to keep all of them. Rank will assign 1 to all records with R; row_number will assign 1 to only one such record per unique_id.
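The same "sort so the preferred record comes first, then keep the first record per group" idea can be sketched in Python. This is only an illustration of the logic, not Hive code; keep_preferred is a hypothetical name and the rows are the demo data from the question.

```python
from itertools import groupby

rows = [
    (1, "test.txt", 15),
    (1, "test_R_file.txt", 50),
    (3, "test_567.txt", 30),
    (3, "test_567_R_file.txt", 100),
]


def keep_preferred(rows):
    """Keep one row per unique_id, preferring file names that contain '_R_'."""
    # Sort by unique_id, then by a flag that is False when '_R_' is present.
    # False sorts before True, so rows containing '_R_' come first in each group.
    ordered = sorted(rows, key=lambda r: (r[0], "_R_" not in r[1]))
    # The first row of each unique_id group plays the role of rn = 1.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


for row in keep_preferred(rows):
    print(row)
```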

SQL Query to select a specific part of the values in a column

I have a table in a database and one of the columns of the table is of the format AAA-BBBBBBB-CCCCCCC(in the table below column Id) where A, B and C are all numbers (0-9). I want to write a SELECT query such that for this column I only want the values in the format BBBBBBB-CCCCCCC. I am new to SQL so not sure how to do this. I tried using SPLIT_PART on - but not sure how to join the second and third parts.
Table -
Id                  | Name       | Age
--------------------------------------
123-4567890-1234567 | First Name | 199
456-7890123-4567890 | Hulkamania | 200
So when the query is written the output should be like
Output
4567890-1234567
7890123-4567890
As mentioned in the request comments, you should not store a combined number, when you are interested in its parts. Store the parts in separate columns instead.
However, as the format is fixed 'AAA-BBBBBBB-CCCCCCC', it is very easy to get the substring you are interested in. Just take the string from the fifth position on:
select substr(col, 5) from mytable;
You can select the right part of the column, starting at the 5th character (that is, dropping the first 4 characters):
SELECT RIGHT(Id, LEN(Id)-4) AS TrimmedId FROM mytable;
Another option using regexp_substr
with x ( c1,c2,c3 ) as
(
select '123-4567890-1234567', 'First Name' , 199 from dual union all
select '456-7890123-4567890', 'Hulkamania' , 200 from dual
)
select regexp_substr(c1,'[^-]+',1,2)||'-'||regexp_substr(c1,'[^-]+',1,3) as result from x ;
Demo
SQL> with x ( c1,c2,c3 ) as
(
select '123-4567890-1234567', 'First Name' , 199 from dual union all
select '456-7890123-4567890', 'Hulkamania' , 200 from dual
)
select regexp_substr(c1,'[^-]+',1,2)||'-'||regexp_substr(c1,'[^-]+',1,3) as result from x ;
RESULT
--------------------------------------------------------------------------------
4567890-1234567
7890123-4567890
SQL>
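The answers above use two different ideas: a fixed-position substring and a split on the delimiter. A small Python sketch of both (the function names are hypothetical) shows that they agree on the sample values. The fixed-position version depends on the first part always being exactly three characters, while the split version does not.

```python
def drop_first_part(value: str) -> str:
    """Fixed-width approach: skip 'AAA-' (4 characters), like substr(col, 5)."""
    return value[4:]


def join_last_two_parts(value: str) -> str:
    """Delimiter approach: split on '-' and re-join the 2nd and 3rd parts."""
    parts = value.split("-")
    return "-".join(parts[1:3])


for sample in ("123-4567890-1234567", "456-7890123-4567890"):
    print(drop_first_part(sample), join_last_two_parts(sample))
```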

Combine Query Result into single row

I have sql query that result data like this one
Name | City
-------------------
Frank | London
Sebastian | New York
I want to merge that result into a single row and column like this one
Frank;London;Sebastian;New York
How do I resolve this query problem? Thanks before
By default SSMS prints results in grid format. One of the options is to print the results to text: use the "Results to Text" button on the toolbar or press the shortcut "CTRL+T". This prints tab-delimited results by default, instead of the semicolon-delimited results that you want. The delimiter can be changed from the Query->Query Options->Results->Text->Output Format menu, or by using "CTRL+H" in a text editor (like Notepad) to find and replace all tabs with semicolons.
You can do :
select stuff ( (select distinct ';'+t1.col2
from table t cross apply
( values (name), (city) ) t1 (col2)
for xml path ('')
), 1, 1, ''
) ;
Maybe this one for Oracle?
WITH tmp AS
(
SELECT 'Frank' Name, 'London' City FROM dual
UNION
SELECT 'Sebastian', 'New York' FROM dual
)
SELECT LISTAGG(name||';'||city, ';') WITHIN GROUP(ORDER BY null) FROM tmp
In SQL Server that should do the work:
SELECT ';'+rtrim(Name)+';'+rtrim(City)
FROM Table
FOR XML PATH('')
In Oracle you don't have the FOR XML PATH('') syntax, but you can concatenate a field like this, for example:
SELECT ';' || WM_CONCAT(name||';'||City) AS Result
FROM table
Note that WM_CONCAT is undocumented and unsupported (it is no longer available in Oracle 12c), but you can use LISTAGG:
SELECT LISTAGG(Name||';'||City,';') WITHIN GROUP(ORDER BY aColumn DESC) FROM TABLE
cheers
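Whatever the database, the target result is every field of every row, in order, joined with a single separator. A minimal Python sketch of that flattening, using the two rows from the question (combine_rows is a hypothetical name):

```python
rows = [("Frank", "London"), ("Sebastian", "New York")]


def combine_rows(rows, sep=";"):
    """Flatten all fields of all rows, in order, into one delimited string."""
    return sep.join(field for row in rows for field in row)


print(combine_rows(rows))  # Frank;London;Sebastian;New York
```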

Is it possible to concatenate column values into a string using CTE?

Say I have the following table:
id|myId|Name
-------------
1 | 3 |Bob
2 | 3 |Chet
3 | 3 |Dave
4 | 4 |Jim
5 | 4 |Jose
-------------
Is it possible to use a recursive CTE to generate the following output:
3 | Bob, Chet, Dave
4 | Jim, Jose
I've played around with it a bit but haven't been able to get it working. Would I do better using a different technique?
I do not recommend this, but I managed to work it out.
Table:
CREATE TABLE [dbo].[names](
[id] [int] NULL,
[myId] [int] NULL,
[name] [char](25) NULL
) ON [PRIMARY]
Data:
INSERT INTO names values (1,3,'Bob')
INSERT INTO names values (2,3,'Chet')
INSERT INTO names values (3,3,'Dave')
INSERT INTO names values (4,4,'Jim')
INSERT INTO names values (5,4,'Jose')
INSERT INTO names values (6,5,'Nick')
Query:
WITH CTE (id, myId, Name, NameCount)
AS (SELECT id,
myId,
Cast(Name AS VARCHAR(225)) Name,
1 NameCount
FROM (SELECT Row_number() OVER (PARTITION BY myId ORDER BY myId) AS id,
myId,
Name
FROM names) e
WHERE id = 1
UNION ALL
SELECT e1.id,
e1.myId,
Cast(Rtrim(CTE.Name) + ',' + e1.Name AS VARCHAR(225)) AS Name,
CTE.NameCount + 1 NameCount
FROM CTE
INNER JOIN (SELECT Row_number() OVER (PARTITION BY myId ORDER BY myId) AS id,
myId,
Name
FROM names) e1
ON e1.id = CTE.id + 1
AND e1.myId = CTE.myId)
SELECT myID,
Name
FROM (SELECT myID,
Name,
(Row_number() OVER (PARTITION BY myId ORDER BY namecount DESC)) AS id
FROM CTE) AS p
WHERE id = 1
As requested, here is the XML method:
SELECT myId,
STUFF((SELECT ',' + rtrim(convert(char(50),Name))
FROM namestable b
WHERE a.myId = b.myId
FOR XML PATH('')),1,1,'') Names
FROM namestable a
GROUP BY myId
A CTE is just a glorified derived table with some extra features (like recursion). The question is, can you use recursion to do this? Probably, but it's using a screwdriver to pound in a nail. The nice part about doing the XML path (seen in the first answer) is it will combine grouping the MyId column with string concatenation.
How would you concatenate a list of strings using a CTE? I don't think that's its purpose.
A CTE is just a temporarily-created relation (tables and views are both relations) which only exists for the "life" of the current query.
I've played with the CTE names and the field names. I really don't like reusing field names like id in multiple places; I tend to think those get confusing. And since the only use for names.id is as an ORDER BY in the first ROW_NUMBER() statement, I don't reuse it going forward.
WITH namesNumbered as (
select myId, Name,
ROW_NUMBER() OVER (
PARTITION BY myId
ORDER BY id
) as nameNum
FROM names
)
, namesJoined(myId, Name, nameCount) as (
SELECT myId,
Cast(Name AS VARCHAR(225)),
1
FROM namesNumbered nn1
WHERE nameNum = 1
UNION ALL
SELECT nn2.myId,
Cast(
Rtrim(nj.Name) + ',' + nn2.Name
AS VARCHAR(225)
),
nn2.nameNum
FROM namesJoined nj
INNER JOIN namesNumbered nn2 ON nn2.myId = nj.myId
and nn2.nameNum = nj.nameCount + 1
)
SELECT myId, Name
FROM (
SELECT myID, Name,
ROW_NUMBER() OVER (
PARTITION BY myId
ORDER BY nameCount DESC
) AS finalSort
FROM namesJoined
) AS tmp
WHERE finalSort = 1
The first CTE, namesNumbered, returns two fields we care about and a sorting value. We can't just use names.id for this because, for each myId value, we need values of 1, 2, .... names.id will have 1, 2, ... for the first myId value, but it will have a higher starting value for subsequent myId values.
The second CTE, namesJoined, has to have the field names specified in the CTE signature because it will be recursive. The base case (part before UNION ALL) gives us records where nameNum = 1. We have to CAST() the Name field because it will grow with subsequent passes; we need to ensure that we CAST() it large enough to handle any of the outputs; we can always TRIM() it later, if needed. We don't have to specify aliases for the fields because the CTE signature provides those. The recursive case (after the UNION ALL) joins the current CTE with the prior one, ensuring that subsequent passes use ever-higher nameNum values. We need to TRIM() the prior iterations of Name, then add the comma and the new Name. The result will be, implicitly, CAST()ed to a larger field.
The final query grabs only the fields we care about (myId, Name) and, within the subquery, pointedly re-sorts the records so that the highest namesJoined.nameCount value will get a 1 as the finalSort value. Then, we tell the WHERE clause to only give us this one record (for each myId value).
Yes, I aliased the subquery as tmp, which is about as generic as you can get. Most SQL engines require that you give a subquery an alias, even if it's the only relation visible at that point.
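To make the three steps easier to follow, here is a rough Python model of the same logic: number the names within each myId, extend the joined string one name at a time, and keep the longest chain. It is a sketch of the idea rather than a translation of the SQL; concatenate_names and extend are hypothetical names, and the rows are the sample data from the question.

```python
from collections import defaultdict

# (id, myId, Name) rows from the question.
names = [(1, 3, "Bob"), (2, 3, "Chet"), (3, 3, "Dave"), (4, 4, "Jim"), (5, 4, "Jose")]


def concatenate_names(rows):
    # Step 1 (namesNumbered): number names 1, 2, ... within each myId, ordered by id.
    numbered = defaultdict(dict)
    for _id, my_id, name in sorted(rows):
        numbered[my_id][len(numbered[my_id]) + 1] = name

    def extend(my_id, joined, count):
        # Step 2 (namesJoined, recursive case): append the name numbered count + 1.
        next_name = numbered[my_id].get(count + 1)
        if next_name is None:
            # Step 3 (final query): no further name, so this is the longest chain.
            return joined
        return extend(my_id, joined + "," + next_name, count + 1)

    # Base case: start every chain from the name numbered 1.
    return {my_id: extend(my_id, by_num[1], 1) for my_id, by_num in numbered.items()}


print(concatenate_names(names))  # {3: 'Bob,Chet,Dave', 4: 'Jim,Jose'}
```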