Hive table property to consider consecutive delimiters as one delimiter

jan 18 "value1 is null"
feb 4 "value1 is null"
In the above dataset, there are consecutive delimiters between the 1st and 2nd columns in the second row. How can I handle consecutive delimiters as one delimiter?

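-- RegexSerDe maps each capture group to one column. Each group matches either a
-- double-quoted string or a bare token, and \s+ (written \\s+ inside the Hive
-- string literal) matches one or more whitespace characters, so a run of
-- consecutive delimiters is treated as a single delimiter.
create external table mydata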
(
c1 string
,c2 string
,c3 string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '(".*?"|.*?)\\s+(".*?"|.*?)\\s+(".*?"|.*?)')
location '/user/hive/warehouse/mydata'
;
select * from mydata;
+-----------+-----------+------------------+
| mydata.c1 | mydata.c2 | mydata.c3        |
+-----------+-----------+------------------+
| jan       | 18        | "value1 is null" |
| feb       | 4         | "value1 is null" |
+-----------+-----------+------------------+
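As a rough offline check of the regex above, here is a minimal Python sketch. It applies the same pattern with `re.fullmatch`, on the assumption that RegexSerDe (like Java's `Matcher.matches()`) requires the pattern to match the whole line, and on the further assumption that Python's `re` behaves like Java's regex engine for this simple pattern. `parse_row` is a hypothetical helper, not part of Hive.

```python
import re

# Same pattern as input.regex above; the Hive string literal '\\s+' is the regex \s+.
ROW_PATTERN = re.compile(r'(".*?"|.*?)\s+(".*?"|.*?)\s+(".*?"|.*?)')


def parse_row(line):
    """Hypothetical helper: split one line into three columns, or return None if it does not match."""
    m = ROW_PATTERN.fullmatch(line)  # whole-line match, as assumed for RegexSerDe
    return m.groups() if m else None


if __name__ == "__main__":
    # Several spaces between "feb" and "4" still yield exactly three columns.
    print(parse_row('jan 18 "value1 is null"'))     # ('jan', '18', '"value1 is null"')
    print(parse_row('feb    4 "value1 is null"'))   # ('feb', '4', '"value1 is null"')
```

Because `\s+` also matches tabs, a mix of tabs and spaces between columns collapses into a single delimiter in the same way.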

Related

PostgreSQL csv import not working for only integer

I have the following problem using PostgreSQL 14 on Windows 10 with the latest updates.
I need to insert values into the following table.
CREATE TABLE StateList (
ID int GENERATED ALWAYS AS IDENTITY,
State_Number int NOT NULL,
ElectionGroup_ID INT NOT NULL,
Election_Number int NOT NULL,
UNIQUE (State_Number, ElectionGroup_ID, Election_Number),
PRIMARY KEY (ID)
);
I want to run the following command:
COPY StateList(Election_Number, State_Number, ElectionGroup_ID )
FROM '...\csvFileStateLists19.csv'
WITH (
FORMAT CSV,
DELIMITER ','
);
The file "csvFileStateLists19" contains:
"19","9","4"
"19","5","238"
"19","5","21"
"19","15","1"
"19","5","10"
It worked fine for another table that used strings and integers.
But here I always get:
ERROR: invalid input syntax for type integer: »19«
CONTEXT: COPY statelist, line 1, column election_number: »19«
SQL state: 22P02
This is usually a sign that the value is an empty string or not really a number. But it's not! It's a 19, so why doesn't it work?
I generated the file in Java; it's UTF-8 encoded.
The database locale is "German_Germany.1252".
show client_encoding; => UNICODE
show server_encoding; => UTF8
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'database1'; => UTF8
select pg_encoding_to_char(encoding), datcollate, datctype from pg_database where datname = 'database1';
Returns
"UTF8" "German_Germany.1252" "German_Germany.1252"
Thank you for your help!

Well, with your input I get the same error message (just in English, not German). I tried it in Vertica, Stonebraker's successor to PostgreSQL, whose CSV parser works very much the same way:
COPY statelist FROM LOCAL 'st.csv' DELIMITER ',' EXCEPTIONS 'st.log';
-- error messages in "st.log"
-- COPY: Input record 1 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
-- COPY: Input record 2 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
-- COPY: Input record 3 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
-- COPY: Input record 4 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
-- COPY: Input record 5 has been rejected (Invalid integer format '"19"' for column 1 (State_Number)).
Well, that's really no wonder: "9" is a string literal, not an INTEGER literal. It's a VARCHAR(1) consisting of the digit "9", not an INTEGER.
Try adding the ENCLOSED BY '"' clause. It worked for me:
COPY statelist FROM LOCAL 'st.csv' DELIMITER ',' ENCLOSED BY '"' EXCEPTIONS 'st.log';
-- out Rows Loaded
-- out -------------
-- out 5
SELECT * FROM statelist;
-- out State_Number | ElectionGroup_ID | Election_Number
-- out --------------+------------------+-----------------
-- out 19 | 5 | 10
-- out 19 | 5 | 21
-- out 19 | 5 | 238
-- out 19 | 9 | 4
-- out 19 | 15 | 1
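To illustrate the point about enclosing quotes outside any database, here is a Python analogy (not Vertica's or PostgreSQL's actual parser). Python's `csv` module treats `"` as the quote character by default, so the quotes are removed before `int()` sees each value. A naive split on commas keeps them, and the conversion fails. Both helper names are hypothetical.

```python
import csv
import io

SAMPLE = '"19","9","4"\n"19","5","238"\n'  # first two lines of the question's file


def parse_quoted(text):
    """Honour the enclosing quotes (similar in spirit to ENCLOSED BY '"'), then convert to int."""
    return [[int(field) for field in row] for row in csv.reader(io.StringIO(text))]


def parse_naive(text):
    """Split on the delimiter only; the quotes stay part of each field."""
    return [[int(field) for field in line.split(",")] for line in text.splitlines()]


if __name__ == "__main__":
    print(parse_quoted(SAMPLE))  # [[19, 9, 4], [19, 5, 238]]
    try:
        parse_naive(SAMPLE)
    except ValueError as exc:
        # int('"19"') is rejected because the quote characters are still present
        print("naive split failed:", exc)
```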

Not an answer, just proof that double-quoted numeric values in a CSV are not the problem:
cat csv_test.csv
"19","9"
"19","5"
"19","5"
"19","15"
"19","5"
test(5432)=# \d csv_test
Table "public.csv_test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
col1 | integer | | |
col2 | integer | | |
select * from csv_test;
col1 | col2
------+------
(0 rows)
\copy csv_test from 'csv_test.csv' with csv;
COPY 5
select * from csv_test;
col1 | col2
------+------
19 | 9
19 | 5
19 | 5
19 | 15
19 | 5
So now maybe we can get on with answers that solve the issue.
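Because the quoting is shown not to be the cause, one thing worth ruling out is an invisible character at the very start of the file, such as a UTF-8 byte order mark (EF BB BF). Such a character does not show up when the value is printed. This is an assumption that nobody in the thread confirms. The minimal Python sketch below inspects the raw bytes of the first line. `inspect_first_line` is a hypothetical helper.

```python
import codecs
import sys


def inspect_first_line(raw_line: bytes):
    """Report whether a raw first line starts with a UTF-8 BOM and return the
    first field's bytes, so that invisible characters become visible."""
    has_bom = raw_line.startswith(codecs.BOM_UTF8)
    first_field = raw_line.split(b",", 1)[0]
    return has_bom, first_field


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_csv.py <path-to-csv>  (script name is hypothetical)
    with open(sys.argv[1], "rb") as f:
        has_bom, first_field = inspect_first_line(f.readline())
    print("starts with UTF-8 BOM:", has_bom, "| first field bytes:", first_field)
```

If the first field shows extra bytes before `"19"`, rewriting the file without them would be the next thing to try.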

sql-remove dashes from string column

In a stored procedure, I have this field:
LTRIM(ISNULL(O.Column1, ''))
If there is a dash (-) at the end of the value, I want to remove it. I only want to remove a dash when it is at the start or end of the value.
Any suggestions?
EDIT:
Microsoft SQL Server 2014 12.0.5546.0
Expected output:
1) input: "abc-abc" // output: "abc-abc"
2) input: "abc-" // output: "abc"
3) input: "abc" // output: "abc"

I think you might be stuck with string manipulation here.
The CASE expression here takes the LTRIM/RTRIM result from your column and checks both ends for a dash, and then each end for a dash. If dashes exist, it strips them out. It's not pretty, and won't perform well on a mountain of data, but will do what you need.
Data setup:
create table trim (col1 varchar(10));
insert trim (col1)
values
('abc'),
(' abc-'),
('abc- '),
('abc-abc '),
(' -abc'),
('-abc '),
(NULL),
(''),
(' -abc- ');
The query:
select
case
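when right(ltrim(rtrim(isnull(col1,''))),1) = '-'
and left(ltrim(rtrim(isnull(col1,''))),1) = '-'
and len(ltrim(rtrim(isnull(col1,'')))) > 1 -- a lone '-' would otherwise give SUBSTRING a negative length
then substring(ltrim(rtrim(isnull(col1,''))),2,len(ltrim(rtrim(isnull(col1,''))))-2)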
when right(ltrim(rtrim(isnull(col1,''))),1) = '-'
then left(ltrim(rtrim(isnull(col1,''))), len(ltrim(rtrim(isnull(col1,''))))-1)
when left(ltrim(rtrim(isnull(col1,''))),1) = '-'
then right(ltrim(rtrim(isnull(col1,''))), len(ltrim(rtrim(isnull(col1,''))))-1)
else ltrim(rtrim(isnull(col1,'')))
end as trimmed
from trim;
Results:
+---------+
| trimmed |
+---------+
| abc |
| abc |
| abc |
| abc-abc |
| abc |
| abc |
| |
| |
| abc |
+---------+
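The CASE logic can be checked without a database. The Python sketch below mirrors it under one assumption: like LTRIM/RTRIM in SQL Server 2014, it trims only spaces. `trim_dashes` is a hypothetical name. Like the CASE branches, it removes at most one dash from each end.

```python
from typing import Optional


def trim_dashes(value: Optional[str]) -> str:
    # ISNULL(col1, '') followed by LTRIM/RTRIM (spaces only)
    s = (value if value is not None else "").strip(" ")
    # Drop a single leading and/or trailing dash, as the CASE branches do
    if s.startswith("-"):
        s = s[1:]
    if s.endswith("-"):
        s = s[:-1]
    return s


if __name__ == "__main__":
    samples = ["abc", " abc-", "abc- ", "abc-abc ", " -abc", "-abc ", None, "", " -abc- "]
    # Same inputs as the data setup above; prints the values in the results table
    print([trim_dashes(v) for v in samples])
```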

Since the database is not mentioned, here is how to do it (or rather, where to find it):
SQL Server
Remove the last character in a string in T-SQL?
Oracle
Remove last character from string in sql plus
Postgresql
Postgresql: Remove last char in text-field if the column ends with minus sign
MySQL
Strip last two characters of a column in MySQL

You can use the RIGHT function along with SUBSTRING to achieve the result.
SELECT CASE WHEN RIGHT(stringVal,1)= '-' THEN SUBSTRING(stringVal,1,LEN(stringVal)-1)
ELSE stringVal END AS ModifiedString
from
( VALUES ('abc-abc'), ('abc-'),('abc')) as t(stringVal)
+----------------+
| ModifiedString |
+----------------+
| abc-abc |
| abc |
| abc |
+----------------+

How to use Regex in SQL for extracting values after repetitive numbers

I have the following table (table1):
+---+---------------------------------------+
|   | att1                                  |
+---+---------------------------------------+
| 1 | 10.2.5.4 4.3.2.1.in-addr.arpa         |
| 2 | asd 100.99.98.97 97.3.2.1.a.b.c fsdf  |
| 3 | fd 95.94.93.92 92.5.7.1.a.b.c         |
| 4 | a 11.4.99.75 75.77.52.41.in-addr.arpa |
+---+---------------------------------------+
I would like to get the following values (that are located after the repetitive numbers): in-addr.arpa, a.b.c, a.b.c, in-addr.arpa.
I tried to use the following format with no success:
SELECT att1
FROM table1
WHERE REGEXP_LIKE(att1 , '^(\d+?)\1$')
I would like it to run in Impala and Oracle.

Use REGEXP_SUBSTR (assuming you are using an Oracle DB).
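select regexp_substr(att1,'[0-9]\.([^0-9 ]+)',1,1,null,1)
from table1
[0-9]\. matches a digit followed by a period.
[^0-9 ]+ matches one or more characters that are neither digits nor spaces, so the match stops at the next digit, the next space, or the end of the string. Without the space in the class, row 2 would return "a.b.c fsdf". The parentheses around it form the first capture group, and the last argument (1) makes REGEXP_SUBSTR return only that group.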
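The pattern can also be tried outside the database. This Python sketch compares the character class `[^0-9]+` with a variant `[^0-9 ]+` that also excludes the space. It assumes Python's `re` finds the same leftmost match as Oracle's REGEXP_SUBSTR for this simple pattern. On row 2, `[^0-9]+` also captures the trailing ` fsdf`. `extract` is a hypothetical helper.

```python
import re

# att1 values from the question's table1
ROWS = [
    "10.2.5.4 4.3.2.1.in-addr.arpa",
    "asd 100.99.98.97 97.3.2.1.a.b.c fsdf",
    "fd 95.94.93.92 92.5.7.1.a.b.c",
    "a 11.4.99.75 75.77.52.41.in-addr.arpa",
]


def extract(rows, pattern):
    """Return capture group 1 of the first match in each row (None if there is no match)."""
    results = []
    for row in rows:
        m = re.search(pattern, row)
        results.append(m.group(1) if m else None)
    return results


if __name__ == "__main__":
    print(extract(ROWS, r"[0-9]\.([^0-9]+)"))   # row 2 comes back as 'a.b.c fsdf'
    print(extract(ROWS, r"[0-9]\.([^0-9 ]+)"))  # ['in-addr.arpa', 'a.b.c', 'a.b.c', 'in-addr.arpa']
```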

Get rows where value is not a substring in another row

I'm writing recursive SQL against a table that contains circular references.
No problem! I read that you can build a unique path to prevent infinite loops. Now I need to filter the list down to only the last record in the chain. I must be doing something wrong, though.
Edit: I'm adding more records to this sample to make it clearer why just selecting the longest record doesn't work.
This is an example table:
create table strings (id int, string varchar(200));
insert into strings values (1, '1');
insert into strings values (2, '1,2');
insert into strings values (3, '1,2,3');
insert into strings values (4, '1,2,3,4');
insert into strings values (5, '5');
And my query:
select * from strings str1 where not exists
(
select * from strings str2
where str2.id <> str1.id
and str1.string || '%' like str2.string
)
I'd expect to get only the last records:
| id | string |
|----|---------|
| 4 | 1,2,3,4 |
| 5 | 5 |
Instead, I get them all:
| id | string |
|----|---------|
| 1 | 1 |
| 2 | 1,2 |
| 3 | 1,2,3 |
| 4 | 1,2,3,4 |
| 5 | 5 |
Link to sql fiddle: http://sqlfiddle.com/#!15/7a974/1

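My problem was all about the LIKE comparison: I had the operands reversed. In str1.string || '%' like str2.string, the pattern is str2.string, which contains no wildcard, so the condition never matches and every row is returned. The pattern belongs on the right-hand side, so the query has to ask whether another row's string starts with this row's string: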
select * from strings str1
where not exists
(
select
*
from
strings str2
where
str2.id <> str1.id
and str2.string like str1.string || '%'
)
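The corrected query can be tried with Python's built-in `sqlite3` module, which supports the same `||` and `LIKE` syntax. The sketch also adds an optional separator before the `%`. This is a variant that is not in the original answer: without it, a path such as `1` would be treated as a prefix of `10`. The function name `last_in_chain` is hypothetical.

```python
import sqlite3


def last_in_chain(rows, separator=","):
    """Return (id, string) rows whose path is not extended by any other row."""
    con = sqlite3.connect(":memory:")
    con.execute("create table strings (id int, string varchar(200))")
    con.executemany("insert into strings values (?, ?)", rows)
    # Keep str1 only if no other row's string starts with str1.string + separator.
    # Passing separator='' reproduces the answer's plain "str1.string || '%'" pattern.
    query = """
        select id, string from strings str1
        where not exists (
            select 1 from strings str2
            where str2.id <> str1.id
              and str2.string like str1.string || ? || '%'
        )
        order by id
    """
    result = con.execute(query, (separator,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [(1, "1"), (2, "1,2"), (3, "1,2,3"), (4, "1,2,3,4"), (5, "5")]
    print(last_in_chain(sample))  # [(4, '1,2,3,4'), (5, '5')]
```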

db2 sql pattern matching

I have a table in DB2 with the following fields:
int xyz;
string myId;
string myName;
Example dataset
xyz | myid | myname
--------------------------------
1 | ABC.123.456 | ABC
2 | PQRS.12.34 | PQRS
3 | ZZZ.3.2.2 | blah
I want to extract the rows where myName matches the characters up to the "." in the myId field. So from the above 3 rows, I want the first 2 rows, since myName is present in myId before the ".".
How can I do this in the query, can I do some kind of pattern matching inside the query?

LEFT and LOCATE work in the DB2 instance I can connect to (which may not help of course!)
So hopefully something like this...
SELECT
*
FROM
MyTable Z
WHERE
LEFT(myid, LOCATE('.', myid)) = myname || '.' -- DB2 concatenates strings with || (or CONCAT), not +
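Here is a sketch of the same comparison using Python's `sqlite3` module. SQLite has no LEFT or LOCATE, so the sketch swaps in `substr` and `instr`, which play the same roles here. The rows are illustrative and modelled on the question's dataset, and `matching_rows` is a hypothetical helper.

```python
import sqlite3


def matching_rows(rows):
    """Return the xyz values whose myname equals the part of myid before the first '.'."""
    con = sqlite3.connect(":memory:")
    con.execute("create table MyTable (xyz int, myid text, myname text)")
    con.executemany("insert into MyTable values (?, ?, ?)", rows)
    # substr(myid, 1, instr(myid, '.')) keeps everything up to and including the first '.';
    # when there is no '.', instr returns 0 and substr returns '' (never equal to myname || '.').
    result = con.execute(
        "select xyz from MyTable "
        "where substr(myid, 1, instr(myid, '.')) = myname || '.' "
        "order by xyz"
    ).fetchall()
    con.close()
    return [xyz for (xyz,) in result]


if __name__ == "__main__":
    rows = [(1, "ABC.123.456", "ABC"), (2, "PQRS.12.34", "PQRS"), (3, "ZZZ.3.2.2", "blah")]
    print(matching_rows(rows))  # [1, 2]
```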
