HQL query to extract a certain number from a large string - Hive

I am trying to extract a certain number from a large string field in a Hive table using HQL. Sample strings and the desired output are given below. Please help.
Sample strings:
'.....1HGFE......421FSDFS459....5050......645859....0089......PHX0200480002.....300fsdafsd418000006315418 '
'.....1ZXXS......421FDSFS8459....5050.......0089...... . . . .PHfdsfasfsaX0200480002....30000fsdfafdsa6315418 '
'..... 0089......4FSDFS00878459....5050............. . . . .PGFSASADFWRER200480002....3000006DFASF315418FSDF000006315418 '
Desired output:
0089
0089
0089
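The question doesn't state the rule that identifies 0089, so the sketch below assumes one inferred from the three samples: the wanted value is a standalone four-digit token that begins with 00 and is not glued to any letter or digit (5050 appears too, so "first four-digit token" would not work). It is written in Python so the pattern is easy to test. Hive's regexp_extract(subject, pattern, index) uses Java regular expressions, which support the same lookbehind/lookahead syntax, and the pattern contains no backslashes, so it should carry over to HQL, though that is untested here.

```python
import re
from typing import Optional

# ASSUMPTION (inferred from the samples, not stated in the question): the target
# is a standalone 4-digit token starting with "00", i.e. no letter or digit
# immediately before or after it. No backslashes, so it can go into an HQL string.
CODE_PATTERN = re.compile(r"(?<![0-9A-Za-z])(00[0-9]{2})(?![0-9A-Za-z])")


def extract_code(text: str) -> Optional[str]:
    """Return the first matching token (capture group 1), or None if absent."""
    match = CODE_PATTERN.search(text)
    return match.group(1) if match else None


samples = [
    ".....1HGFE......421FSDFS459....5050......645859....0089......PHX0200480002.....300fsdafsd418000006315418 ",
    ".....1ZXXS......421FDSFS8459....5050.......0089...... . . . .PHfdsfasfsaX0200480002....30000fsdfafdsa6315418 ",
    "..... 0089......4FSDFS00878459....5050............. . . . .PGFSASADFWRER200480002....3000006DFASF315418FSDF000006315418 ",
]

for s in samples:
    print(extract_code(s))  # prints 0089 for each sample
```

The lookarounds are what reject embedded runs such as the "00" inside PHX0200480002 or 4FSDFS00878459. If the real rule is simply "the literal 0089", the pattern reduces to 0089 with the same lookarounds.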

Related

Update SQL column based on contains search from list of values

I have a list of varieties, each with its own set of codes, like this:
CLADIR(E611, E613, E614, E615, F120, F121, F122, F123, F188, F1D9, F1DA, F108)
VITIN(E620, F10A, F10E, F16B, F16C, F16D, F16E, F17D)
SOLO(E612, E617, E618, E619, F124, F125, F126, F127, F128, F1DB, F1DC)
JIMNA(E61E, F180)
I have data in an existing database with different variety names and codes (some codes match one of the codes from these varieties and some do not). I want to update the variety name based on which variety's list contains its code. If the code is not in any of these lists, I want to remove that row.
Example data:
Variety Code
SNANA F108
FLATO E612
JAITI X111
For the above data:
SNANA will be updated to CLADIR, as code F108 is one of CLADIR's codes.
FLATO will be updated to SOLO, as code E612 is one of SOLO's codes.
The JAITI row (code X111) should be removed, as X111 does not match any of the above varieties' codes.
Is it possible to do this in plain SQL? Can someone help me?
Are you looking for case expressions?
update t
set variety = (case when code in ('E611', . . . ) then 'Cladir'
when code in ('E620', . . . ) then 'Vitin'
. . .
end);
-- The above sets the variety for non-matching codes to `NULL`, so delete those
delete from t
where variety is null;
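To check the CASE-then-DELETE approach end to end, here is a small sketch using Python's built-in sqlite3 module and an in-memory table. The table name t and the columns variety/code follow the answer, and the code lists are the ones from the question. Building the CASE expression from a dictionary with bound parameters, instead of typing every code literal, is a choice made for this sketch and not part of the original answer.

```python
import sqlite3

# Code lists copied from the question; dict order = order of the WHEN branches.
VARIETIES = {
    "CLADIR": ["E611", "E613", "E614", "E615", "F120", "F121", "F122", "F123",
               "F188", "F1D9", "F1DA", "F108"],
    "VITIN": ["E620", "F10A", "F10E", "F16B", "F16C", "F16D", "F16E", "F17D"],
    "SOLO": ["E612", "E617", "E618", "E619", "F124", "F125", "F126", "F127",
             "F128", "F1DB", "F1DC"],
    "JIMNA": ["E61E", "F180"],
}


def reclassify(conn: sqlite3.Connection) -> None:
    whens, params = [], []
    for variety, codes in VARIETIES.items():
        placeholders = ", ".join("?" * len(codes))  # "?, ?, ..., ?"
        whens.append(f"WHEN code IN ({placeholders}) THEN ?")
        params.extend(codes)     # values for the IN list
        params.append(variety)   # value for THEN
    # No ELSE branch: unmatched codes become NULL, as in the answer.
    conn.execute(f"UPDATE t SET variety = CASE {' '.join(whens)} END", params)
    conn.execute("DELETE FROM t WHERE variety IS NULL")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (variety TEXT, code TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("SNANA", "F108"), ("FLATO", "E612"), ("JAITI", "X111")])
reclassify(conn)
print(conn.execute("SELECT variety, code FROM t ORDER BY code").fetchall())
# [('SOLO', 'E612'), ('CLADIR', 'F108')]
```

If the variety column is declared NOT NULL, the UPDATE will fail for X111; in that case delete the non-matching rows first (WHERE code NOT IN the combined list) and then run the UPDATE.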

Apache Pig: STORE based on a condition

I'm reading from a CSV file, grouping the data, and then doing a count. Is there any way to store the data into a folder named bad if the count is 0, and into good if the count is > 0? I tried the code below, but it doesn't work.
Code:
STORE countVal INTO '/user/cloudera/good' IF countVal > 0 ;
Use the SPLIT operator. See:
https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#SPLIT
SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);
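There are a couple of ways to do this:
1) Use SPLIT to route the records into separate relations based on the condition, then STORE each relation:
SPLIT data INTO good IF count > 0, bad IF count == 0;
STORE good INTO '/user/cloudera/good';
STORE bad INTO '/user/cloudera/bad';
2) Use FOREACH with the bincond operator (condition ? value1 : value2) to tag each record with its destination. Note that Pig string constants use single quotes:
X = FOREACH A GENERATE data, (count > 0 ? 'good' : 'bad');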
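Conceptually, SPLIT evaluates every condition against every record and sends the record to each relation whose condition holds, so a record can land in more than one relation or in none. The Python sketch below mimics that routing for (group, count) pairs; the record shape and sample values are hypothetical, and in Pig you still need one STORE per output relation.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]  # hypothetical (group key, count) pair


def split(records: List[Record],
          conditions: Dict[str, Callable[[Record], bool]]) -> Dict[str, List[Record]]:
    """Mimic Pig's SPLIT: a record goes to every relation whose condition is true."""
    out: Dict[str, List[Record]] = {name: [] for name in conditions}
    for rec in records:
        for name, cond in conditions.items():
            if cond(rec):
                out[name].append(rec)
    return out


count_val = [("a", 3), ("b", 0), ("c", 1)]  # made-up counts
routed = split(count_val, {
    "good": lambda r: r[1] > 0,
    "bad": lambda r: r[1] == 0,
})
print(routed["good"])  # [('a', 3), ('c', 1)]
print(routed["bad"])   # [('b', 0)]
```

Because the two conditions here are mutually exclusive and cover every non-negative count, each record ends up in exactly one output.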

How to make multiple rows from a single row in Pig for the MovieLens dataset

I want to split a single row into multiple rows based on one field in Pig.
Example:
Consider this row from the movie dataset:
(31807, Dot the I (2003), Drama|Film-Noir|Thriller)
Each field is separated by ','.
The desired output is these 3 rows:
31807,Dot the I (2003),Drama
31807,Dot the I (2003),Film-Noir
31807,Dot the I (2003),Thriller
Can anyone help me get the desired output in Pig?
The following script should do it:
/* Input
(31807,Dot the I (2003),Drama|Film-Noir|Thriller)
*/
list = LOAD '/user/cloudera/movies.txt' USING PigStorage(',') AS(id:int,name:chararray,generes:chararray);
list_each = FOREACH list GENERATE id,name, flatten(TOKENIZE(generes,'|'));
dump list_each;
/* Output
(31807,Dot the I (2003),Drama)
(31807,Dot the I (2003),Film-Noir)
(31807,Dot the I (2003),Thriller)
*/
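The FLATTEN(TOKENIZE(...)) step turns one tuple holding a bag of genres into one output row per genre. Here is the same transformation as a Python sketch. One assumption belongs to the sketch, not to the Pig answer: it takes the id from before the first comma and the genres from after the last comma, so a title that itself contains a comma stays intact, whereas PigStorage(',') would split such a title into extra fields.

```python
from typing import List, Tuple


def explode_genres(line: str) -> List[Tuple[int, str, str]]:
    """Turn 'id,title,G1|G2|...' into one (id, title, genre) row per genre."""
    movie_id, rest = line.split(",", 1)   # id: everything before the first comma
    title, genres = rest.rsplit(",", 1)   # genres: everything after the last comma
    return [(int(movie_id), title.strip(), g.strip())
            for g in genres.split("|") if g.strip()]


for row in explode_genres("31807,Dot the I (2003),Drama|Film-Noir|Thriller"):
    print(row)
# (31807, 'Dot the I (2003)', 'Drama')
# (31807, 'Dot the I (2003)', 'Film-Noir')
# (31807, 'Dot the I (2003)', 'Thriller')
```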

SQL statement search for all characters inside field

I have an address field, and addresses are stored like this:
1111 address info, county
$stmt = $db->prepare('SELECT * FROM ' . Config::USER_TABLE . ' WHERE ' . Config::SEARCH_COLUMN . ' LIKE :query ORDER BY '.Config::SEARCH_COLUMN.' LIMIT ' . $start . ', ' . $items_per_page);
$search_query = $query.'%';
$stmt->bindParam(':query', $search_query, PDO::PARAM_STR);
At the moment I can only search the field by its number. I want to be able to type any part of the address and find matches. Any ideas? Thanks
The % sign you attach to your parameter acts as an SQL wildcard:
$search_query = $query.'%';
Since you only attach it to the end of your parameter, it only matches column values that begin with the search string.
Instead, add one at the beginning of your parameter as well:
$search_query = '%'.$query.'%';
This way, it will look for the string in any position.
Use %search_criteria% so the search has wildcards on both sides.
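Wrapping the term in %...% also means any % or _ the user types acts as a wildcard. The Python sqlite3 sketch below escapes those characters, binds the pattern as a parameter, and binds LIMIT/OFFSET too instead of concatenating them into the SQL. The table users and column address are hypothetical stand-ins for Config::USER_TABLE and Config::SEARCH_COLUMN. Also keep in mind that a pattern starting with % generally cannot use an ordinary index on the column, so large tables get scanned in full.

```python
import sqlite3

ESC = "!"  # escape character used in the LIKE ... ESCAPE clause


def contains_pattern(term: str) -> str:
    """LIKE pattern matching `term` anywhere; literal %, _ and ESC are escaped."""
    for ch in (ESC, "%", "_"):  # escape the escape character first
        term = term.replace(ch, ESC + ch)
    return f"%{term}%"


def search(conn: sqlite3.Connection, term: str, start: int = 0, per_page: int = 10) -> list:
    # Hypothetical table/column names standing in for the Config:: constants.
    sql = (f"SELECT address FROM users WHERE address LIKE ? ESCAPE '{ESC}' "
           "ORDER BY address LIMIT ? OFFSET ?")
    params = (contains_pattern(term), per_page, start)  # everything is bound
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (address TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [
    ("1111 address info, county",),
    ("22 Main St, Elm County",),  # made-up rows for the demo
    ("50% Off Plaza, Town",),
])
print(search(conn, "county"))  # ['1111 address info, county', '22 Main St, Elm County']
print(search(conn, "%"))       # ['50% Off Plaza, Town']
```

SQLite's LIKE is case-insensitive for ASCII by default, which is why "county" also matches "Elm County"; MySQL's behavior depends on the column collation.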

SAS table string length (limit)

I am creating a SAS table in which one of the fields has to hold a huge string.
Here is my table (table name = MKPLOGS4):
OBS RNID DESCTEXT
--------- -----------
1 123 This is some text which is part of the record. I want this to appear
2 123 concatenated kinda like concat_group() from MYSQL but for some reason
3 123 SAS does not have such functionality. Now I am having trouble with
4 123 String concatenation.
5 124 Hi there old friend of mine, hope you are enjoying the weather
6 124 Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
I need to produce a table like this (table name = MKPLOGSA):
OBS RNID DESCTEXT
--------- -----------
1 123 This is some text which is part of the record. I want this to appear concatenated kinda like concat_group() from MYSQL but for some reason SAS does not have such functionality. Now I am having trouble with String concatenation.
2 124 Hi there old friend of mine, hope you are enjoying the weather Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
So, after trying unsuccessfully with SQL, I came up with the following SAS code (please note I am very new at SAS):
DATA MKPLOGSA (DROP = DTEMP DTEXT);
SET MKPLOGS4;
BY RNID;
RETAIN DTEXT;
IF FIRST.RNID THEN
DO;
DTEXT = DESCTEXT;
DELETE;
END;
ELSE IF LAST.RNID THEN
DO;
DTEMP = CATX(' ',DTEXT,DESCTEXT);
DESCTEXT = DTEMP;
END;
ELSE
DO;
DTEMP = CATX(' ',DTEXT,DESCTEXT);
DTEXT = DTEMP;
DELETE;
END;
The SAS log is producing this warning message:
WARNING: IN A CALL TO THE CATX FUNCTION, THE BUFFER ALLOCATED
FOR THE RESULT WAS NOT LONG ENOUGH TO CONTAIN THE CONCATENATION
OF ALL THE ARGUMENTS. THE CORRECT RESULT WOULD CONTAIN 229 CHARACTERS,
BUT THE ACTUAL RESULT MAY EITHER BE TRUNCATED TO 200 CHARACTER(S) OR
BE COMPLETELY BLANK, DEPENDING ON THE CALLING ENVIRONMENT. THE
FOLLOWING NOTE INDICATES THE LEFT-MOST ARGUMENT THAT CAUSED TRUNCATION.
Followed by the message (for the SAS data step I posted here):
NOTE: ARGUMENT 3 TO FUNCTION CATX AT LINE 100 COLUMN 15 IS INVALID.
Note that in my sample data table (MKPLOGS4), each line of DESCTEXT can be up to 116 characters, and there is no limit on the number of description lines per record ID.
The output I am getting has only the last line of description:
OBS RNID DESCTEXT
---- --------
1 123 String concatenation.
2 124 Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
I have the following questions:
- Is there something wrong with my code?
- Is there a limit to SAS string concatenation? Can I override it? If so, please provide code.
If you have a suggestion, I would really appreciate it if you could post your version of the code. This is not school work/homework.
Since SAS stores character data as blank-padded fixed-length strings, it is usually not a good idea to store a large amount of text in a dataset. However, if you must, you can create a character variable with a length of up to 32767 characters. If you don't mind doing some extra I/O, here is an easy way.
/* test data -- same id repeated over multiple observations i.e., in a "long-format" */
data one;
input rnid desctext & :$200.;
cards;
123 This is some text which is part of the record. I want this to appear
123 concatenated kinda like concat_group() from MYSQL but for some reason
123 SAS does not have such functionality. Now I am having trouble with
123 String concatenation.
124 Hi there old friend of mine, hope you are enjoying the weather
124 Are you sure this is not your jacket, okay then. Will give charity.
;
run;
/* re-shape from the long to the wide-format. assumes that data are sorted by rnid. */
proc transpose data=one out=two;
by rnid;
var desctext;
run;
/* concatenate col1, col2, ... vars into single desctext */
data two;
length rnid 8 desctext $1000;
keep rnid desctext;
set two;
desctext = catx(' ', of col:);
run;
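What these two steps do can be sketched in Python: PROC TRANSPOSE with BY rnid yields one row per id with columns COL1, COL2, ... COLn, where n is the size of the largest group and shorter groups are left blank; CATX(' ', OF COL:) then trims each value, skips the blank ones, and joins the rest with single spaces. The data structures below are illustrative only, not a model of SAS internals.

```python
from typing import Dict, List, Tuple


def transpose(rows: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Long -> wide, like PROC TRANSPOSE BY rnid (input assumed sorted by rnid)."""
    wide: Dict[int, List[str]] = {}
    for rnid, text in rows:
        wide.setdefault(rnid, []).append(text)
    width = max((len(cols) for cols in wide.values()), default=0)  # number of COLn
    # pad shorter groups with blanks, as the missing COLn values would be
    return {k: v + [""] * (width - len(v)) for k, v in wide.items()}


def catx(sep: str, *args: str) -> str:
    """Like SAS CATX: strip each argument, skip blank ones, join with sep."""
    return sep.join(a.strip() for a in args if a.strip())


rows = [(123, "This is some text"), (123, "  more text "), (124, "Hi there")]
for rnid, cols in transpose(rows).items():
    print(rnid, repr(catx(" ", *cols)))
# 123 'This is some text more text'
# 124 'Hi there'
```

The padding explains why CATX is a good fit here: the blank trailing columns of short groups simply drop out of the result.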
The documentation for the CATX function states that, by default, it returns at most 200 characters unless you have already specified a length for the variable you store the result in.
All you need to do is add either a length or an attrib statement somewhere in your datastep.
Here is how I would have coded it (untested):
data mkplogsa (rename=(dtext=desctext));
length dtext $32767 ; /* set the length before CATX writes to dtext */
set mkplogs4;
by rnid;
retain dtext;
if first.rnid then do;
dtext = "";
end;
dtext = catx(' ',dtext,desctext);
if last.rnid then do;
output;
end;
keep rnid dtext;
run;
Note that 32767 is the largest string size for a character value in a SAS dataset. If your string is larger than that you're out of luck.
Cheers
Rob
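The BY-group pattern in Rob's step (reset the retained variable at FIRST.rnid, append with CATX on every row, OUTPUT at LAST.rnid) is the same accumulator you would write in any language. Below is a Python sketch using itertools.groupby, which, like a SAS BY statement, assumes the input is already sorted by the key. Unlike the asker's original step, which executes DELETE when FIRST.RNID is true, a group consisting of a single row still produces output here.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple


def concat_by_id(rows: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Group-concatenate desctext per rnid; rows must already be sorted by rnid."""
    out = []
    for rnid, group in groupby(rows, key=itemgetter(0)):  # adjacent rows only, like BY
        dtext = ""                                          # reset at FIRST.rnid
        for _, desctext in group:
            # CATX-style append: trim the piece, skip blanks, separate with one space
            dtext = " ".join(p for p in (dtext, desctext.strip()) if p)
        out.append((rnid, dtext))                           # OUTPUT at LAST.rnid
    return out


rows = [
    (123, "SAS does not have such functionality. Now I am having trouble with"),
    (123, "String concatenation."),
    (124, "Are you sure this is not your jacket, okay then. Will give charity."),
]
for rnid, text in concat_by_id(rows):
    print(rnid, text)
```

Python strings have no 32767-character cap, so this sketch does not reproduce the truncation that SAS applies.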
Thanks guys, I was able to solve this problem by using PROC TRANSPOSE and then using concatenation. Here is the code:
/*
THIS TRANSPOSE STEP TAKES THE MKPLOGS4 TABLE AND
CREATES A NEW TEMPORARY TABLE CALLED MKPLOGSA. SINCE
THE DESCRIPTION TEXT IS STORED IN MULTIPLE LINES (OBSERVATIONS)
IN THE ITEXT FILE, IN ORDER TO COMBINE THEM TO A SINGLE ROW,
WE USE TRANSPOSE. HOWEVER, AFTER THIS STEP, THE DESCRIPTION TEXT
SPREAD OVER MULTIPLE LINES ALTHOUGH ON SAME ROW (OBSERVATION)
ARE STILL SEPARATED INTO MULTIPLE COLUMNS (ON THE SAME ROW)
ALL PREFIXED IN THIS CASE BY 'DESCTEXT'. WE DROP THE AUTO-CREATED
COLUMN _NAME_
*/
PROC TRANSPOSE DATA = MKPLOGS4 OUT = MKPLOGSA (DROP = _NAME_)
PREFIX = DESCTEXT;
VAR DESCTEXT;
BY PLOG;
RUN;
/*
THIS DATA STEP CREATES A NEW TABLE CALLED MKPLOGSB WHICH
TAKES ALL THE SEPARATED DESCRIPTION TEXT COLUMNS AND
CONCATENATES THEM INTO A SINGLE COLUMN - LONG_DESCRIPTION.
*/
DATA MKPLOGSB (DROP = DESCTEXT:);
SET MKPLOGSA;
/* CONCATENATED DESC. TEXT SET TO MAX. 27000 CHARS. */
LENGTH LONG_DESCRIPTION $27000;
LONG_DESCRIPTION = CATX(' ',OF DESCTEXT:);
RUN;