Hive: specifying multiple collection-item delimiters for arrays

I have a dataset that contains two arrays, each separated by a different delimiter.
Ex: 14-20-50-60 is the 1st array, separated by -
12#2#333#4 is the 2nd array, separated by #
While creating the table, how do we specify the delimiters in
collection items terminated by '' ?
Input
14-20-50-60,12#2#333#4
create table test(first array<string>, second array<string>)
row format delimited
fields terminated by ','
collection items terminated by '-' (how can I specify two delimiters for the collection items?)

You cannot use multiple delimiters for the collection items. You can achieve what you are trying to do as shown below, though: store both columns as plain strings, then use the SPLIT function to build each array with its own delimiter.
Data
14-20-50-60,12#2#333#4
SQL - CREATE TABLE
create external table test1(first string, second string)
row format delimited
fields terminated by ','
LOCATION '/user/cloudera/ramesh/test1';
SQL - SELECT
WITH v_test_array AS
(SELECT split(first, "-") AS first_array,
split(second, "#") AS second_array
FROM test1)
SELECT first_array[0], second_array[0]
FROM v_test_array;
OUTPUT
14 12
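If you want to query the arrays without repeating the SPLIT calls, one option (a sketch; the view name v_test_arrays is mine, not from the original answer) is to wrap them in a view:
CREATE VIEW v_test_arrays AS
SELECT split(first, '-') AS first_array,
split(second, '#') AS second_array
FROM test1;
SELECT first_array[1], second_array[1]
FROM v_test_arrays; -- returns 20 and 2 for the sample row (Hive arrays are 0-based)
Downstream queries can then treat first_array and second_array as ordinary array<string> columns.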
Hope this helps.

Related

Decimal input gets rounded in CREATE EXTERNAL TABLE in Hive from CSV

I am creating a table in Hive from a CSV (comma-separated) file that I have in HDFS. I have three columns: two strings and a decimal one (with at most 18 digits after the decimal point and one before). Below is what I do:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table(
col1 STRING, col2 STRING, col_decimal DECIMAL(19,18))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hdfs_path/folder/';
My col_decimal is being rounded to 0 when the value is below 1, and to 1 when the value is not actually a decimal (I only have numbers between 0 and 1).
Any idea what is wrong? By the way, I have also tried DECIMAL(38,37).
Thank you.
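One way to narrow down where the rounding happens (a sketch of my own, not from the original thread; my_table_raw is a hypothetical name) is to load the column as STRING first, so you can compare the raw text with the CAST result:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table_raw(
col1 STRING, col2 STRING, col_decimal STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hdfs_path/folder/';
SELECT col_decimal, CAST(col_decimal AS DECIMAL(19,18)) FROM my_table_raw LIMIT 10;
If the STRING column shows the full values but the CAST rounds them, the problem is in the type conversion rather than in the file.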

SQL Server: search for multiple instances of the same text in a column

I have a SQL Server table that contains an nvarchar(max) column (MyText) holding sentences. I need to identify all instances of a particular phrase in all rows of the MyText column. Once identified, I want to replace all instances with different text.
Thanks,
Brad
select cust_div, cust_seral
from [dbo].[lveIntake_closing_scripts]
where close_script like '%LMLSUnit%LMLSUnit.com%'
To count how many instances of the search string are contained within each row, replace each instance with a string that is one character shorter, then subtract the length of the resulting string from the length of the original string. Each occurrence shortens the string by exactly one character, so the difference is the number of occurrences. Like this:
select
cust_div
, cust_seral
, len(close_script) - len(replace(close_script, 'LMLSUnit.com', 'LMLSUnit.co')) as instance_count
from [dbo].[lveIntake_closing_scripts]
where close_script like '%LMLSUnit%LMLSUnit.com%'
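The replacement half of the question is then a plain REPLACE; a minimal sketch (the replacement text 'NewUnit.com' is a placeholder of mine, not from the original question):
update [dbo].[lveIntake_closing_scripts]
set close_script = replace(close_script, 'LMLSUnit.com', 'NewUnit.com')
where close_script like '%LMLSUnit.com%';
Because REPLACE substitutes every occurrence within the value, a single UPDATE handles rows that contain the phrase multiple times.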

regexp_replace to append to end of line?

I have a Postgres table whose rows each hold multiple lines of text (split by newlines), for example...
The table name is formats and the column is called format; an example format (1 table row) would look like the following:
list1=text1;
list2=text2;
list3=text3;
etc etc
I would like a way to identify the list2 string and then append additional text to the end of the same line.
So the outcome would be:
list1=text1;
list2=text2;additionaltext
list3=text3;
I have tried the below to pull the captured string into the replacement string, but have been unsuccessful so far.
regexp_replace(format, 'list2=.*', '\1 additionaltext','n');
To capture a pattern, you must enclose it in parentheses; without a capturing group, the \1 backreference in the replacement string has nothing to refer to.
regexp_replace(format, '(list2=.*)', '\1additionaltext', 'n')
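Applied to the table from the question, that would look something like this (a sketch; it assumes every row should get the suffix):
UPDATE formats
SET format = regexp_replace(format, '(list2=.*)', '\1additionaltext', 'n');
The 'n' flag switches on newline-sensitive matching, so .* stops at the end of the list2 line instead of consuming the rest of the value.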

Parse a PIPE delimited field into separate fields (MS ACCESS 2007-2013)

I have a database in MS Access 2007-2013 where the fields are stored as a pipe-delimited string. Now I want a way to parse these fields into separate fields in a SELECT query.
Here is an example of the pipe-delimited field: it contains 9 pipes, so I want to split it into 10 output fields.
It would be great if someone could help me.
Thanks,
Peter
You need a VBA function in a standard module to implement the splitting, e.g.
Public Function SplitPipes(str As String, i As Long) As String
Dim arSplit As Variant
arSplit = Split(str, "|")
' Check that arSplit has enough elements
If i - 1 <= UBound(arSplit) Then
' Split creates a 0-based array, but it is easier to start with index 1 in the query
SplitPipes = arSplit(i - 1)
Else
' out of elements -> return empty string
SplitPipes = ""
End If
End Function
Then you can use this function for every single field like this:
SELECT SplitPipes([Strategic Group],1) AS Destination,
SplitPipes([Strategic Group],2) AS SE,
...
FROM yourTable;
Note that beyond the bounds check, the function has no error handling; for example, it will fail if the field is Null.

Import Data Into Hive Containing Whitespace

I am importing data from a csv file into Hive. My table contains both strings and ints. However, in my input file, the ints have whitespace around them, so it kind of looks like this:
some string, 2 ,another string , 7 , yet another string
Unfortunately I cannot control the formatting of the program providing the file.
When I import the data using (e.g.):
CREATE TABLE MYTABLE(string1 STRING, alpha INT, string2 STRING, beta INT, string3 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Then all my integers get set to NULL. I am assuming this is because the extra whitespace makes the parsing fail. Is there a way around this?
You can perform a multi-stage import. In the first stage, save all of your data as STRING; in the second stage, use trim() to remove the whitespace and then save the data as INT. You could also look into using Pig to read the data from your source files as raw text and then write it to Hive with the correct data types.
Edit
You can also do this in one pass if you can point to your source file as an external table (assuming myOtherTable already exists with INT columns for alpha and beta):
CREATE EXTERNAL TABLE myTable(
string1 STRING, alpha STRING, string2 STRING, beta STRING, string3 STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/path/to/source/folder'; -- LOCATION must be a directory containing the file, not the file itself
INSERT INTO TABLE myOtherTable
SELECT string1,
CAST(TRIM(alpha) AS INT),
string2,
CAST(TRIM(beta) AS INT),
string3
FROM myTable;