Regular Expression and Hive - sql

I'm trying to create a Hive external table using org.apache.hadoop.hive.serde2.RegexSerDe for analysing comments. The sample rows are:
0 #chef/maintain fix the problem 2017-05-25 20:34:45 1 2017-05-25 20:34:27 0
6 ^ trailing comma is trolling you 2017-05-23 23:08:46 0 2017-05-24 04:40:42 1
This is my regex:
("input.regex" = "(d{1,5}\\s\\w+\\s\\.{19}\\.{1}\\s\\.{1}");
I am getting a table full of NULLs and can't figure out the regex.
Table definition:
order        1, 2, 3, 4, ...
comment      #chef/maintain fix the problem
comment_time 2017-05-25 20:34:45
merged       1 or 0
merged_time  2017-05-25 20:34:27
resolved     1 or 0
Can anyone help with this?

Try this regex
(\\d)\\s+([^\\d{4}]*)\\s(\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2})\\s+(\\d)\\s+(\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2})\\s(\\d)
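If it helps, you can sanity-check a regex like this outside Hive before wiring it into the SerDe. The sketch below uses Python's re (which agrees with Hive's Java regex for these constructs) and a slightly adapted pattern: `(\d+)` so multi-digit order values also match, and a lazy `(.*?)` for the comment instead of the unusual `[^\d{4}]` class. Remember to double the backslashes again when pasting into "input.regex".

```python
import re

# Sketch only: same capture groups as the suggested answer, but with (\d+)
# for the order column and a lazy (.*?) for the free-text comment.
pattern = re.compile(
    r"(\d+)\s+(.*?)\s+"                                   # order, comment
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\d)\s+"    # comment_time, merged
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\d)$"      # merged_time, resolved
)

row = "0 #chef/maintain fix the problem 2017-05-25 20:34:45 1 2017-05-25 20:34:27 0"
m = pattern.match(row)
print(m.groups())
# ('0', '#chef/maintain fix the problem', '2017-05-25 20:34:45', '1', '2017-05-25 20:34:27', '0')
```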


Cannot inner join two simple CTEs on a field with mixed data types in Snowflake

Problem
I cannot inner join two simple CTEs in Snowflake. The terse error message does not show the way.
What is wrong with my query?
Bonus points: Why does my query work in SQL Server but not Snowflake?
Background
I am new to Snowflake. I am used to SQL Server. I want to query an inner join of 2 tables in Snowflake. Tables 1 and 2 show the tables that I want to join. Table 3 is the result that I want.
There are some rows in the first table that I want to remove inside a CTE with a simple where clause. When I run my query (see below), I get a very terse error message:
Numeric value 'HAHA! IM CAUSING TROUBLE' is not recognized
But I thought I "removed" this value with my first CTE.
What is wrong with my query? Bonus points: Why does my query work in SQL Server but not Snowflake?
Table 1: Field History

id | date       | field_id | field_value
---|------------|----------|-------------------------
1  | 2020-01-01 | unwanted | HAHA! IM CAUSING TROUBLE
2  | 2020-01-02 | thing    | 100
3  | 2020-01-03 | thing    | 101
4  | 2020-01-04 | thing    | 102
5  | 2020-01-05 | thing    | null
6  | 2020-01-06 | thing    | 103
Table 2: Things I Want to Join

thing_id | thing_start_date | thing_end_date | something_i_care_about
---------|------------------|----------------|----------------------------
100      | 2020-01-01       | 2020-02-01     | secret alien intelligence
101      | 2020-02-01       | 2020-03-01     | blueprints for shark lazers
102      | 2020-03-01       | 2020-04-01     | non-YA biz-NAZZ
103      | 2020-04-01       | 2020-05-01     | who will win bachelorette
Table 3: Final Table of My Dreams

id | date       | thing_id | thing_start_date | thing_end_date | something_i_care_about
---|------------|----------|------------------|----------------|----------------------------
2  | 2020-01-02 | 100      | 2020-01-01       | 2020-02-01     | secret alien intelligence
3  | 2020-01-03 | 101      | 2020-02-01       | 2020-03-01     | blueprints for shark lazers
4  | 2020-01-04 | 102      | 2020-03-01       | 2020-04-01     | non-YA biz-NAZZ
6  | 2020-01-06 | 103      | 2020-04-01       | 2020-05-01     | who will win bachelorette
What I have Tried
with field_history as ( -- CTE with simple where clause
    select
        id
        , date
        , to_number(field_value, 38, 0) as thing_id -- SQL Server equivalent would be cast() or convert()
    from db.schema.history
    where field_id = 'thing' and field_value is not null
),
things_i_want as (
    select *
    from db.schema.things
),
final as (
    select
        field_history.id
        , field_history.date
        , things_i_want.*
    from field_history
    inner join things_i_want on field_history.thing_id = things_i_want.thing_id
)
select * from final
Super Helpful Error Message Blocking me from My Dreams
Numeric value 'HAHA! IM CAUSING TROUBLE' is not recognized
Numeric value 'HAHA! IM CAUSING TROUBLE' is not recognized
Your error message is a type conversion problem and would appear to be here:
to_number(field_value, 38, 0) as thing_id
You may think that the where clause filters out the bad values. However, SQL engines can -- and do -- rearrange operations. I would suggest using a case expression to handle this:
(case when field_value regexp '^[0-9]+$'
then to_number(field_value, 38, 0)
end) as thing_id
The case expression is guaranteed to run the expressions sequentially.
The above idea (but not the regexp part) works in SQL Server and Snowflake.
In Snowflake only, you can use try_ functions:
try_to_number(field_value, 38, 0) as thing_id
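To see the guard-the-conversion idea in action, here is a small sketch using Python's built-in sqlite3 as a stand-in for Snowflake. SQLite has no to_number or regexp by default, so the numeric check below uses glob; the principle is the same: validate inside the expression rather than relying on the WHERE clause having run first.

```python
import sqlite3

# Hypothetical stand-in data, including one deliberately bad 'thing' row.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table history (id integer, date text, field_id text, field_value text);
    insert into history values
        (1, '2020-01-01', 'unwanted', 'HAHA! IM CAUSING TROUBLE'),
        (2, '2020-01-02', 'thing',    '100'),
        (5, '2020-01-05', 'thing',    null),
        (7, '2020-01-07', 'thing',    'oops');
""")
# The CASE guards the cast itself: non-numeric strings yield NULL instead of
# an error, so it is safe even if the engine reorders the filtering.
rows = con.execute("""
    select id,
           case when field_value not glob '*[^0-9]*' and field_value != ''
                then cast(field_value as integer)
           end as thing_id
    from history
    where field_id = 'thing' and field_value is not null
""").fetchall()
print(rows)  # [(2, 100), (7, None)]
```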
It is a simple error: you are trying to convert the string 'HAHA! IM CAUSING TROUBLE' to a number. Even if you remove it, you may hit future errors because of the nulls in the same column of the Field_History table. You need to handle that in the query. I'm not sure exactly what query you wrote in SQL Server, so I can't say why it worked there.

Hive line break issue

I have a hive table over an accumulo table (because we need cell level security):
CREATE TABLE testtable(rowid string, value string)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,c:value')
TBLPROPERTIES ('accumulo.table.name' = 'testtable');
If a value contains "\n", it conflicts with Hive's default line delimiter, which is also "\n".
For example:
accumulo insert: insert 1 c value line\x0Abreak
hive select: select rowid, value, row_number() over (order by null) as rank from testtable;
You will get back two rows instead of one:
+-------+-------+------+
| rowid | value | rank |
+-------+-------+------+
| 2     | line  | NULL |
| break | 1     | NULL |
+-------+-------+------+
Does anyone have an idea how I can avoid this? Thank you for all the help!
As the author of the AccumuloStorageHandler, that seems very unexpected to me, but maybe I just don't know something about what Hive is trying to do?
I'd file a JIRA issue for Hive over at https://issues.apache.org/jira/secure/CreateIssue!default.jspa. Feel free to mention me and I can try to help write a test and get to the bottom of it.
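As a stopgap until such an issue is resolved, one workaround (my own suggestion, not a Hive feature) is to escape embedded newlines before writing values into Accumulo and unescape them after reading from Hive. A minimal sketch:

```python
def escape_newlines(value: str) -> str:
    # Escape backslashes first so the transformation is reversible.
    return value.replace("\\", "\\\\").replace("\n", "\\n")

def unescape_newlines(value: str) -> str:
    # A small scanner: a naive chain of str.replace calls would mis-handle
    # inputs like "a\\nb" (a literal backslash followed by 'n').
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append("\n" if nxt == "n" else nxt)
        else:
            out.append(ch)
    return "".join(out)

stored = escape_newlines("line\nbreak")
print(stored)                      # line\nbreak  (one physical line)
print(unescape_newlines(stored))   # back to the original two-line value
```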

Query Update where higher data in same table Oracle

I have a table, DATA, with data like this:
DATA
---------------------------
NIK      TIME      ACTION
---------------------------
1500671  07:30:00  0
1500671  15:37:00  0
1600005  07:25:00  0
1600005  16:29:00  0
1600006  07:16:00  0
1600006  17:15:00  0
In that table I want to update ACTION = 1 for the row with the highest TIME within each NIK. Can anyone help me?
Please describe the problem fully when asking for assistance in technical forums, e.g. DDL, test data, etc. Anyway, I hope the below helps you.
UPDATE DATA
SET ACTION = 1
WHERE (NIK, TIME) IN (
    SELECT NIK, MAX(TIME) FROM DATA GROUP BY NIK)
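Note that matching on TIME alone can flag the wrong row if two NIKs happen to share a time value; pairing NIK with its own MAX(TIME) in the IN list avoids that. A quick sanity check of that shape using Python's sqlite3 (row-value IN needs SQLite 3.15+; the same predicate is valid Oracle syntax):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table data (nik integer, time text, action integer);
    insert into data values
        (1500671, '07:30:00', 0), (1500671, '15:37:00', 0),
        (1600005, '07:25:00', 0), (1600005, '16:29:00', 0),
        (1600006, '07:16:00', 0), (1600006, '17:15:00', 0);
""")
# Pair NIK with its own MAX(TIME) so one person's latest time
# cannot accidentally match another person's row.
con.execute("""
    update data set action = 1
    where (nik, time) in (select nik, max(time) from data group by nik)
""")
flagged = con.execute(
    "select nik, time from data where action = 1 order by nik"
).fetchall()
print(flagged)  # the latest row per NIK
```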

select a number of strings inside a clob

Sorry, I know you expect code examples, but I have absolutely no idea how to start with this issue.
I have a database with about 100000 entries of that structure:
ID | LONGARG
0 ECLONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS
ECLONG_TEXT_INSIDE_THIS2|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS
1 ECLONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS
2 ECLONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS
ECLONG_TEXT_INSIDE_THIS2|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS
ECLONG_TEXT_INSIDE_THIS3|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS
3 ECLONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS
ECLONG_TEXT_INSIDE_THIS2|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS
Longarg is of type CLOB.
My question: is it possible, for all rows, to select the text between "EC" and the first "|" to get a result like the one below, without using a stored procedure?
Result:
LONG_TEXT_INSIDE_THIS
LONG_TEXT_INSIDE_THIS2
LONG_TEXT_INSIDE_THIS
LONG_TEXT_INSIDE_THIS
LONG_TEXT_INSIDE_THIS2
LONG_TEXT_INSIDE_THIS3
LONG_TEXT_INSIDE_THIS
LONG_TEXT_INSIDE_THIS2
Thanks in advance for your help
Stefan
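No Oracle at hand here, so take this as a sketch of the extraction pattern rather than a tested answer: the rule is "capture everything between a leading EC and the first |", shown below with Python's re. In Oracle itself you would likely reach for REGEXP_SUBSTR (which also accepts occurrence and subexpression arguments) applied to the CLOB, but verify that against your Oracle version.

```python
import re

# One CLOB value containing several EC...| entries on separate lines
# (sample shapes taken from the question).
longarg = (
    "ECLONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS\n"
    "ECLONG_TEXT_INSIDE_THIS2|LONG_TEXT_INSIDE_THIS|LONG_TEXT_INSIDE_THIS"
)
# Capture everything between a leading "EC" and the first "|" of each entry.
matches = re.findall(r"^EC([^|]*)\|", longarg, flags=re.MULTILINE)
print(matches)  # ['LONG_TEXT_INSIDE_THIS', 'LONG_TEXT_INSIDE_THIS2']
```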

SQL - convert data to a date/source/value "grid"

I have data in a MySQL database that looks like this:
Project  Date        Time
A        2009-01-01  15
A        2009-01-02  10
B        2009-01-02  30
A        2009-01-09  15
C        2009-01-07  5
I would like to produce output from this data like this:
Date        Project A Time  Project B Time  Project C Time
2009-01-01  15              0               0
2009-01-02  10              30              0
2009-01-07  15              0               5
Can this be done with an SQL query, or do I need to write an external script to iterate through the DB and organize the output?
(Also, if someone has a better suggestion for a subject line let me know and I'll edit the question; I'm not sure of the proper terms to describe the current and desired formats, which makes searching for this information difficult)
You're looking for pivot / crosstab support. Here is a good link.
http://en.wikibooks.org/wiki/MySQL/Pivot_table
I believe this is called a pivot table. Just google for it.
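The usual way to build such a pivot in plain SQL (MySQL included) is conditional aggregation: one SUM(CASE ...) column per project, grouped by date. A runnable sketch using Python's sqlite3; the SQL shape is the same in MySQL, and the table/column names here are my own:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table work (project text, date text, time integer);
    insert into work values
        ('A', '2009-01-01', 15), ('A', '2009-01-02', 10),
        ('B', '2009-01-02', 30), ('A', '2009-01-09', 15),
        ('C', '2009-01-07', 5);
""")
# One SUM(CASE ...) per project turns rows into columns; dates without an
# entry for a project get 0 from the ELSE branch.
rows = con.execute("""
    select date,
           sum(case when project = 'A' then time else 0 end) as project_a_time,
           sum(case when project = 'B' then time else 0 end) as project_b_time,
           sum(case when project = 'C' then time else 0 end) as project_c_time
    from work
    group by date
    order by date
""").fetchall()
for r in rows:
    print(r)
# ('2009-01-01', 15, 0, 0)
# ('2009-01-02', 10, 30, 0)
# ('2009-01-07', 0, 0, 5)
# ('2009-01-09', 15, 0, 0)
```

The downside is that the project columns must be known in advance; for a dynamic set of projects you would generate this SQL from a query over distinct project names, which is what the stored-procedure approach below automates.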
I've been looking for something like this and found a MySQL stored procedure that works perfectly:
http://forums.mysql.com/read.php?98,7000,250306#msg-250306
The result you're looking for could be obtained from the following simple call:
call pivotwizard('date','project','time','from_table','where_clause')