Break a row of words into word groups in Hive

I have some text that I would like to break down into two, three, or even four words at a time. I'm trying to pull meaningful phrases.
I have used split and explode to retrieve what I need, but I would like to have the row broken into two or three words at a time. This is what I have so far, which only breaks the row into one word at a time.
select explode(a.text) text
from (select split(text," ") text
from abc
where id = 123
and date = '2019-08-16'
) a
The Output I get:
text
----
thank
you
for
calling
your
tv
is
not
working
?
I would like an output like this:
text
----
Thank you
for calling
your tv
is not
working?
or something like this:
text
----
thank you for calling
your
tv is not working
?

CREATE TABLE IF NOT EXISTS db.test_string
(
text string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS orc
;
INSERT INTO TABLE db.test_string VALUES
('thank you for calling your tv is not working ?');
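Below is the query. It explodes the words twice with their positions and pairs each even-positioned word with the word that follows it: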
select k,s from db.test_string
lateral view posexplode(split(text,' ')) pe as i,s
lateral view posexplode(split(text,' ')) ne as j,k
where ne.j=pe.i-1
and ne.j%2==0
;
thank you
for calling
your tv
is not
working ?
Time taken: 0.248 seconds, Fetched: 5 row(s)
Apply the logic above to your actual table, with your where clause, and let me know how it goes.
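The pairing logic can also be sketched outside Hive to make the grouping rule explicit. The snippet below is a minimal Python illustration, not part of the original answer; the function name `word_groups` and its `size` parameter are hypothetical. Unlike the self-join above, which drops the last word when the row has an odd number of words, this sketch keeps a shorter final group.

```python
def word_groups(text, size=2):
    """Split text on whitespace and join consecutive words into groups of `size`.

    The final group may be shorter when the word count is not a multiple of `size`.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


if __name__ == "__main__":
    for group in word_groups("thank you for calling your tv is not working ?"):
        print(group)
```

Calling it with `size=3` or `size=4` gives the three- and four-word groupings the question asks about.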

Related

Single hive query to remove certain text in data

I have a column data like this in 2 formats
1)"/abc/testapp/v1?FirstName=username&Lastname=test123"
2)"/abc/testapp/v1?FirstName=username"
I want to retrieve the output as "/abc/testapp/v1?FirstName=username", stripping out everything from "&Lastname" through the end of its value. The idea is to remove the Lastname parameter together with its value.
If the data doesn't contain "&Lastname", the query should still work and return the string unchanged, as in the second scenario.
The value for Lastname shown in the example is "test123", but in general it will be dynamic.
I started with regexp_replace, but I can only replace "&Lastname", not its value:
select regexp_replace("/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US","&Lastname","");
Can someone please help me handle both cases with a single Hive query?
Use split function:
with your_data as (--Use your table instead of this example
select stack (2,
"/abc/testapp/v1?FirstName=username&Lastname=test123",
"/abc/testapp/v1?FirstName=username"
) as str
)
select split(str,'&')[0] from your_data;
Result:
_c0
/abc/testapp/v1?FirstName=username
/abc/testapp/v1?FirstName=username
Or use '&Lastname' pattern for split:
select split(str,'&Lastname')[0] from your_data;
This keeps any other &-separated parameters that come before &Lastname; everything from &Lastname onward is dropped.
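Both split variants also discard anything that follows the Lastname value, such as the &type=en_US in the question's regexp_replace attempt. If those trailing parameters should survive, one alternative is a pattern that consumes only the Lastname pair. The sketch below shows the idea in Python, with a hypothetical helper name `strip_lastname`; the same pattern could presumably be passed to Hive's regexp_replace, but that is an assumption and has not been run against Hive here.

```python
import re

# Matches "&Lastname=" followed by its value, i.e. everything up to the next "&" or end of string.
LASTNAME_PAIR = re.compile(r"&Lastname=[^&]*")


def strip_lastname(url):
    """Remove the Lastname parameter and its value; leave other parameters untouched."""
    return LASTNAME_PAIR.sub("", url)


if __name__ == "__main__":
    print(strip_lastname("/abc/testapp/v1?FirstName=username&Lastname=test123"))
    print(strip_lastname("/abc/testapp/v1?FirstName=username"))
```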
Using split works for both cases, with or without the last name. In Hive you do not need a table for this; you can execute the function directly in a select:
select
split("/abc/testapp/v1?FirstName=username&Lastname=test123",'&')[0];
select
split("/abc/testapp/v1?FirstName=username",'&')[0];
Result (the same for both):
_c0
/abc/testapp/v1?FirstName=username
You can also combine them into a single query:
select
split("/abc/testapp/v1?FirstName=username&Lastname=test123",'&')[0],
split("/abc/testapp/v1?FirstName=username",'&')[0];
_c0 _c1
/abc/testapp/v1?FirstName=username /abc/testapp/v1?FirstName=username

Big Query Record split into multiple columns

I have a table that looks like:
text | STRING
concepts | RECORD
concepts.name | STRING
[...]
So one row could look like this:
"This is a text about BigQuery and how to split records into columns. "
SQL
BigQuery
Questions
I would like to transform that into:
text, concepts_1, concepts_2, concepts_3 // The names are not important
"This is a text about BigQuery and how to split records into columns. ",SQL,BigQuery,Questions
The number of concepts in each row vary.
EDIT:
This would also work:
text, concepts
"This is a text about BigQuery and how to split records into columns. ","SQL,BigQuery,Questions"
Below is for BigQuery Standard SQL.
If a comma-separated list is fine with you, consider the shortcut versions below:
#standardSQL
SELECT identifier,
(SELECT STRING_AGG(name, ', ') FROM UNNEST(concepts)) AS conceptName
FROM `project.dataset.articles`
and
#standardSQL
SELECT identifier,
(SELECT STRING_AGG(name, ', ') FROM articles.concepts) AS conceptName
FROM `project.dataset.articles` articles
Both above versions return output like below
Row identifier conceptName
1 1 SQL, BigQuery, Questions
2 2 xxx, yyy, zzz
As you can see, the versions above are brief and compact. They avoid first grouping the names into an array and then converting that array into a string, because the aggregation can be done in one step.
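To make the per-row aggregation concrete, here is a rough Python analogue of what the correlated STRING_AGG subquery does for one row. The dictionary layout and the function name `concept_names` are assumptions made for illustration; they mirror the `concepts.name` field from the question but are not BigQuery API calls.

```python
def concept_names(row, sep=", "):
    """Join the `name` of every record in the row's repeated `concepts` field."""
    return sep.join(concept["name"] for concept in row["concepts"])


if __name__ == "__main__":
    row = {
        "text": "This is a text about BigQuery and how to split records into columns. ",
        "concepts": [{"name": "SQL"}, {"name": "BigQuery"}, {"name": "Questions"}],
    }
    print(concept_names(row))
```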
This was the solution for me. But it only creates a comma-separated string. However, in my case, this was fine.
SELECT articles.identifier, ARRAY_TO_STRING(ARRAY_AGG(concepts.name), ",") AS conceptName
FROM `` articles, UNNEST(concepts) concepts
GROUP BY articles.identifier
Try using this:
SELECT
text,
c.*
FROM
`your_project.your_dataset.your_table`,
UNNEST(
concepts
) c
This will get the text column along with the unnested values from your RECORD column.
Hope it helps.

How to return results that have String A with String B, but not any values that have String B without String A

I'm working in Redshift with a field that tracks the position of an engagement -- the field can contain the following values:
FT, LC, PostLC, OC, Form, CW
and can have any combination of those really. I specifically only want to see ones that are LC touchpoints.
This means they must contain the LC string. If I use
ilike '%LC%', I also get results that don't contain the LC position but do contain the PostLC position. I need to return only the values that contain the LC position, possibly alongside other values.
For example, 'FT,LC,PostLC,OC' would be fine, but 'PostLC, OC, CW' would not be acceptable.
How can I translate this into a SQL condition in the where clause?
Thanks
You can use like:
where ',' || field || ',' like '%,LC,%'
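The delimiter-wrapping trick above assumes the values are separated by bare commas. Since some of the sample values in the question have a space after each comma, one hedged variant strips the spaces before wrapping. The sketch below runs that condition against an in-memory SQLite table from Python as a stand-in for Redshift; the table name `t` and helper `lc_rows` are hypothetical, and SQLite's LIKE is case-insensitive for ASCII by default, which Redshift's LIKE is not.

```python
import sqlite3


def lc_rows(values):
    """Return the values that contain LC as a whole comma-separated item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (engagements TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    # Remove spaces, wrap the list in commas, then look for the exact item ',LC,'.
    sql = """
        SELECT engagements
        FROM t
        WHERE ',' || REPLACE(engagements, ' ', '') || ',' LIKE '%,LC,%'
        ORDER BY rowid
    """
    rows = [r[0] for r in conn.execute(sql)]
    conn.close()
    return rows


if __name__ == "__main__":
    print(lc_rows(["FT,LC,PostLC,OC", "PostLC, OC, CW", "LC", "FT, LC"]))
```

Wrapping the whole list in commas means a leading, trailing, or lone LC is matched by the same pattern, so no extra OR branches are needed.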
The quick answer is that you need to include the commas and spaces in the pattern, so that (for example) like '%, LC%' differentiates LC from PostLC.
demo
CREATE TABLE Table1
(engagements varchar(100))
;
INSERT INTO Table1
(engagements)
VALUES
('FT, LC, PostLC, OC, Form, CW')
, ('PostLC, OC, CW')
;
Query 1:
select *
from table1
where engagements like '%, LC%'
Results:
| engagements |
|------------------------------|
| FT, LC, PostLC, OC, Form, CW |
However, the harder (to accept) answer is that you should change your data model so that the comma-separated string becomes a new table with one row per value.
edit
Also note that the first item in the list has no preceding comma, so the query needs extra conditions for a value that starts with LC or consists of LC alone:
select *
from table1
where engagements like '%, LC%'
or engagements like 'LC,%'
or engagements = 'LC'

split up sql column into queryable results set

Here is my issue:
I have a column with the following data in an sql column:
Answers
=======
1:2:5: <--- notice my delimiter
I need to be able to break up the digits into a result set that i can join against a lookup table such as
Answers_Expanded
=======
1 apple
2 pear
3 cherry
4 mango
5 grape
and return
Answers
=======
apple pear grape
Any such way?
Thanks!
This is a bit of a hack (the LIKE, the XML PATH, and the STUFF), and it assumes that you want the answers ordered by their ID as opposed to matching up with the original order in the multivalued column...
But this gives the results you're looking for:
SELECT STUFF((
SELECT ' ' + ae.Answer
FROM
Answers_Expanded ae
JOIN Answers a ON ':' + a.Answers LIKE '%:' + CAST(ae.ID AS VARCHAR) + ':%'
ORDER BY ae.ID
FOR XML PATH(''))
, 1, 1, '') AS Answers
This works because:
Joining with LIKE finds any Answers_Expanded rows whose ID appears in the multivalued column.
XML PATH simulates a group concatenation... and allows for ' ' to be specified as the delimiter
STUFF removes the leading delimiter.
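For comparison, the same lookup can be sketched in application code, where the original order of the IDs in the delimited string is preserved rather than sorted by ID. This is only an illustrative Python sketch; `expand_answers` and the `lookup` dictionary are hypothetical stand-ins for the Answers_Expanded table.

```python
def expand_answers(answers, lookup, sep=":"):
    """Translate a delimited ID string such as '1:2:5:' into space-separated names."""
    # The trailing delimiter produces an empty last piece, so empty pieces are skipped.
    ids = [part for part in answers.split(sep) if part]
    return " ".join(lookup[int(part)] for part in ids)


if __name__ == "__main__":
    lookup = {1: "apple", 2: "pear", 3: "cherry", 4: "mango", 5: "grape"}
    print(expand_answers("1:2:5:", lookup))
```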
This blog post has a good example of a user defined function that will return a table with the values from your delimited string in a column. You can then join that table to your Answers_Expanded table to get your value.
This works fine if you are parsing reasonably short strings, and if you are doing it as a one time thing, but if you have a table with your answers stored in a column like that, you don't want to be running this on the whole table as it will be a large performance hit. Ideally you'd want to avoid getting delimited strings like this in SQL.
I would suggest that you store your answers so that each cell holds only one number, not multiple values in one cell (which violates first normal form).
Otherwise, you are better off using a procedural SQL dialect such as T-SQL.

SQL Server Output to Text File Each Column Different Length Space Delimited

Let's say I have a SQL Server table that looks like the following:
ID NAME DESCRIPTION
1 ANDREW COOL
2 MATT NOT COOL
All I need to do is output the data to a space-delimited text file. However, I want to ensure that the 'NAME' column is exactly 10 characters wide. So, for example, in the first row 'ANDREW' is 6 characters, so I'd want 4 spaces after it.
Same thing for second row. 'MATT' is 4 characters, so I would want 6 spaces after it. This way as you move to each column the data is lined up, worst case it gets truncated but I'm not concerned with that.
Use this select query, then export the result to your text file. Casting to char(10) pads shorter names with trailing spaces and truncates longer ones:
select ID,cast(NAME as char(10)) as NAME,DESCRIPTION from yourtable
You can also use the CONVERT function:
select CONVERT(char(10),'ANDREW')
Applied to the table:
select ID,
CONVERT(char(10),NAME) as NAME,
DESCRIPTION
from <table>
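If the padding ends up being done in the program that writes the file rather than in the query, the same pad-or-truncate behaviour can be sketched as follows. This is an illustrative Python snippet under that assumption; the function name `fixed_width_line` and its parameters are hypothetical.

```python
def fixed_width_line(row_id, name, description, name_width=10):
    """Build one space-delimited line with NAME padded or truncated to `name_width`."""
    # Slice first so long names are truncated, then left-justify so short names are padded.
    padded_name = name[:name_width].ljust(name_width)
    return f"{row_id} {padded_name} {description}"


if __name__ == "__main__":
    rows = [(1, "ANDREW", "COOL"), (2, "MATT", "NOT COOL")]
    for row in rows:
        print(fixed_width_line(*row))
```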