Compare two rows of SQL Hive array

Yesterday I was given this code for Postgres:
CREATE TABLE logs (
  word1 VARCHAR[3],
  word2 VARCHAR[3]
);
INSERT INTO logs (word1, word2) VALUES
  ('{"location","title","value"}', '{"moskow","title353","34"}'),
  ('{"title","value","location"}', '{"title5653","574584","sidney"}');
SELECT *,
       array_position(word1, 'location') AS location_position,
       word2[array_position(word1, 'location')] AS location_value
FROM logs;
The task was to find the ordinal position of the value 'location' in the first one-dimensional array, and to output the element at the same position from the second array. The question is: how do I rewrite the last part, the one using array_position, in Hive SQL?
I didn't find a Hive function similar to array_position in Postgres. Is this even possible in Hive, given that the order and values in the arrays can change, while matching positions are preserved across both arrays?
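One common workaround, sketched below under the assumption that word1 and word2 are array<string> columns of a Hive table named logs: posexplode() emits each element of word1 together with its 0-based index, and that index can then subscript word2.
SELECT l.word1,
       l.word2,
       p.pos + 1      AS location_position,  -- 1-based, to mirror array_position()
       l.word2[p.pos] AS location_value      -- Hive array subscripts are 0-based
FROM   logs l
LATERAL VIEW posexplode(l.word1) p AS pos, val
WHERE  p.val = 'location';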

Related

Insert data into an Impala table with struct type

I have a table with detail rows (many rows per id). For this reason I have created a table with struct types, in order to reduce the rows so that each id has exactly one row.
How can I insert data into an Impala table with struct types? Also, how can I aggregate values from a struct type afterwards?
Instead of using struct to store multiple values, you can use group_concat(col, separator).
For example, if a customer has 3 account numbers and you want to store them in one row separated by commas, you can use the code below:
select cust_id, name, group_concat(cust_acc, ',') as concat_account
from cust_details
group by 1, 2
You can use a pipe as the separator if your data contains commas.
Another benefit of this approach is that you can then use split_part(concat_account, ',', 1) to get the first account number.
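Putting both together, a quick sketch against the same cust_details table (group_concat() and split_part() as documented in Impala; split_part() uses 1-based indexes):
select cust_id,
       name,
       group_concat(cust_acc, '|') as concat_account,
       split_part(group_concat(cust_acc, '|'), '|', 1) as first_account
from cust_details
group by 1, 2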
Now, if your data is very complex and you can't use group_concat, you can use struct. It's a little tricky because you have to prepare the data in struct format first and then load it. Please refer to this link: How to insert Array<Struct> values in Impala?

Oracle SQL: select query: search if a CLOB contains a string with pattern matching

I have an Oracle table with one CLOB column which contains JSON data. I need a query that will search within the CLOB data.
I have used the condition where DBMS_LOB.instr(colName, 'apple:') > 0, which gives the records containing apple:. However, I need the query to return records where apple has any number as its value, not blank; in other words, the JSON apple key should have a value.
I am thinking of something like where DBMS_LOB.instr(colName, 'apple:X') > 0, where X can be any number, not null. I tried regexp_instr, but it did not seem to work on a CLOB.
Are there any alternatives to solve this?
Generic string functions for parsing JSON inputs are dangerous - you will get false positives, for example, when something that looks like a JSON object is in fact embedded in a string value. (Illustrated by ID = 101 in my example below.)
The ideal scenario is that you are using Oracle 19 or higher; in that case you can use a simple call to json_exists as illustrated below. In the sample table I create, the first JSON string does not contain a member named apple. In the second row, the string does contain a member apple but the value is null. The first query I show (looking for all JSON with an apple member) will include this row in the output. The last query is what you need: it adds a filter so that a JSON string must include at least one apple member with non-null value (regardless of whether it also includes other members named apple, possibly with null value).
create table sample_data
( id number primary key
, colname clob check (colname is json)
);
insert into sample_data
values (101, '{name:"Chen", age:83, values:["{apple:6}", "street"]}');
insert into sample_data
values (102, '{data: {fruits: [{orange:33}, {apple:null}, {plum:44}]}}');
insert into sample_data
values (103, '[{po:3, "prods":[{"apple":4}, {"banana":null}]},
{po:4, "prods":null}]');
Note that I intentionally mixed together quoted and unquoted member names, to verify that the queries below work correctly in all cases. (Remember also that member names in JSON are case sensitive, even in Oracle!)
select id
from sample_data
where json_exists(colname, '$..apple')
;
ID
---
102
103
This is the query you need. Notice the .. in the path (meaning - find an object member named apple anywhere in the JSON) and the filter at the end.
select id
from sample_data
where json_exists(colname, '$..apple?(# != null)')
;
ID
---
103
You can use the regexp_like function for this:
where regexp_like(colName, 'apple:[0-9]')
Note that against the sample_data above this pattern only matches row 101, where apple:6 is embedded inside a string value (a false positive), and misses row 103, where the member name is quoted ("apple":4); this illustrates the danger of generic string functions that the json_exists approach avoids.

Hive: Fetch a row and split the values

I am running the following Hive query to fetch a row:
select * from hive_table where row_id='x'
It returns output like
10 15 hello world
(one row with four column values).
I am trying to split these values in Java so that I can get the individual column values into an array. I tried splitting on the ^A delimiter character (the default field delimiter when creating a Hive table):
hive_result.split("\u0001")
But it still returns the same result (no splits; an array of length 1). I want to know how to split the column values of a single row fetched from a Hive query.
Note: I am running this Hive query through a command-line utility; with JDBC I could use resultSet.next() to get each column separately.
It looks like I need to split the row results on tabs instead of Control-A. This works fine:
hive_result.split("\t")

Extract alphanumeric value from varchar column

I have a table which contains a column of alphanumeric values stored as strings, with values such as F4737, 00Y778, PP0098, XXYYYZ etc.
I want to extract the values that start with F and also contain digits.
The alphanumeric column is the unique column holding unique values; the rest of the columns in my table contain duplicates.
Furthermore, once these values are extracted, I would like to pick the max value among the duplicate rows. For example:
Suppose F4737 and F4700 are the unique alphanumeric values; then F4737 must be extracted.
I have written a query like this, but the numeric values are not getting matched:
select max(Alplanumeric)
from Customers
where Alplanumeric like '%[F0-9]%'
or
select max(Alplanumeric)
from Customers
where Alplanumeric like '%[0-9]%'
and Alplanumeric like 'F%'
When I run the above I only get the F series if I remove the numeric part of the query. How do I match both conditions: the series starting with F as well as the numeric values in the same row?
Going out on a limb, you might be looking for a query like this:
SELECT *, substring(alphanumeric, '^F(\d+)')::int AS nr
FROM customers
WHERE alphanumeric ~ '^F\d+'
ORDER BY nr DESC NULLS LAST
, alphanumeric
LIMIT 1;
The WHERE condition is a regular expression match; the expression is anchored to the start, so it can use an index. Ideally:
CREATE INDEX customers_alphanumeric_pattern_ops_idx ON customers
(alphanumeric text_pattern_ops);
This returns the one row with the highest (extracted) numeric value in alphanumeric among rows starting with 'F' followed by one or more digits.
About the index:
PostgreSQL LIKE query performance variations
About pattern matching:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
Ideally, you should store the leading text and the following numeric value in separate columns to make this more efficient. You don't necessarily need more tables, as has been suggested.
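For example, a sketch of that split (the column names prefix and nr are made up for illustration, and the patterns assume the form "leading letters, then digits"):
ALTER TABLE customers ADD COLUMN prefix text, ADD COLUMN nr int;

UPDATE customers
SET    prefix = substring(alphanumeric, '^[A-Z]+')    -- leading text, NULL if none
     , nr     = substring(alphanumeric, '\d+$')::int; -- trailing number, NULL if none

-- the max per prefix then becomes a plain aggregate:
SELECT max(nr) FROM customers WHERE prefix = 'F';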

PostgreSQL - best way to return an array of key-value pairs

I'm trying to select a number of fields, one of which needs to be an array with each element of the array containing two values. Each array item needs to contain a name (character varying) and an ID (numeric). I know how to return an array of single values (using the ARRAY keyword) but I'm unsure of how to return an array of an object which in itself contains two values.
The query is something like
SELECT
t.field1,
t.field2,
ARRAY(--with each element containing two values i.e. {'TheName', 1 })
FROM MyTable t
I read that one way to do this is by selecting the values into a type and then creating an array of that type. Problem is, the rest of the function is already returning a type (which means I would then have nested types - is that OK? If so, how would you read this data back in application code - i.e. with a .Net data provider like NPGSQL?)
Any help is much appreciated.
ARRAYs can only hold elements of the same type
Your example displays a text and an integer value (no single quotes around 1). It is generally impossible to mix types in an array. To get those values into an array you have to create a composite type and then form an ARRAY of that composite type, as you already mentioned yourself.
Alternatively you can use the data types json in Postgres 9.2+, jsonb in Postgres 9.4+ or hstore for key-value pairs.
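For illustration, a minimal hstore sketch (assuming the hstore extension is installed; hstore(text, text) and hstore(text[], text[]) are both documented constructors):
CREATE EXTENSION IF NOT EXISTS hstore;
SELECT hstore('TheName', '1');                             -- "TheName"=>"1"
SELECT hstore(ARRAY['TheName', 'Other'], ARRAY['1', '2']); -- keys and values from two arrays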
Of course, you can cast the integer to text and work with a two-dimensional text array. Consider the two syntax variants for array input in the demo below, and consult the manual on array input.
There is a limitation to overcome. If you try to aggregate an ARRAY (built from key and value) into a two-dimensional array, the aggregate function array_agg() or the ARRAY constructor errors out:
ERROR: could not find array type for data type text[]
There are ways around it, though.
Aggregate key-value pairs into a 2-dimensional array
PostgreSQL 9.1, with standard_conforming_strings = on:
CREATE TEMP TABLE tbl(
id int
,txt text
,txtarr text[]
);
The column txtarr is just there to demonstrate syntax variants in the INSERT command. The third row is spiked with meta-characters:
INSERT INTO tbl VALUES
(1, 'foo', '{{1,foo1},{2,bar1},{3,baz1}}')
,(2, 'bar', ARRAY[['1','foo2'],['2','bar2'],['3','baz2']])
,(3, '}b",a{r''', '{{1,foo3},{2,bar3},{3,baz3}}'); -- txt has meta-characters
SELECT * FROM tbl;
Simple case: aggregate two integers (I use the same column twice) into a two-dimensional int array.
Update: better with a custom aggregate function. With the polymorphic type anyarray it works for all base types:
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[id,id]]) AS x -- for int
,array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS y -- or text
FROM tbl;
Note the additional ARRAY[] layer to make it a multidimensional array.
Update for Postgres 9.5+
Postgres now ships a variant of array_agg() accepting array input, so you can replace the custom aggregate function above with the built-in. The manual:
array_agg(expression)
...
input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or NULL)
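A minimal call with the built-in variant (a sketch, assuming Postgres 9.5+ and the tbl from above; it should produce the same two-dimensional array as the array_agg_mult() query for y):
SELECT array_agg(ARRAY[id::text, txt]) AS y
FROM   tbl;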
I suspect that without having more knowledge of your application I'm not going to be able to get you all the way to the result you need. But we can get pretty far. For starters, there is the ROW function:
# SELECT 'foo', ROW(3, 'Bob');
?column? | row
----------+---------
foo | (3,Bob)
(1 row)
So that right there lets you bundle a whole row into a cell. You could also make things more explicit by making a type for it:
# CREATE TYPE person(id INTEGER, name VARCHAR);
CREATE TYPE
# SELECT now(), row(3, 'Bob')::person;
now | row
-------------------------------+---------
2012-02-03 10:46:13.279512-07 | (3,Bob)
(1 row)
Incidentally, whenever you make a table, PostgreSQL makes a type of the same name, so if you already have a table like this you also have a type. For example:
# DROP TYPE person;
DROP TYPE
# CREATE TABLE people (id SERIAL, name VARCHAR);
NOTICE: CREATE TABLE will create implicit sequence "people_id_seq" for serial column "people.id"
CREATE TABLE
# SELECT 'foo', row(3, 'Bob')::people;
?column? | row
----------+---------
foo | (3,Bob)
(1 row)
See in the third query there I used people just like a type.
Now this is not likely to be as much help as you'd think, for two reasons:
1. I can't find any convenient syntax for pulling data out of the nested row.
I may be missing something, but I just don't see many people using this syntax. The only example I see in the documentation is a function taking a row value as an argument and doing something with it. I don't see an example of pulling the row out of the cell and querying against parts of it. It seems like you can package the data up this way, but it's hard to deconstruct after that. You'll wind up having to make a lot of stored procedures.
2. Your language's PostgreSQL driver may not be able to handle row-valued data nested in a row.
I can't speak for NPGSQL, but since this is a very PostgreSQL-specific feature, you're not going to find support for it in libraries that support other databases. For example, Hibernate isn't going to be able to handle fetching an object stored as a cell value in a row. I'm not even sure JDBC would be able to give Hibernate the information usefully, so the problem could go quite deep.
So, what you're doing here is feasible provided you can live without a lot of the niceties. I would recommend against pursuing it though, because it's going to be an uphill battle the whole way, unless I'm really misinformed.
A simple way without hstore
SELECT jsonb_agg(to_jsonb(t))
FROM (
  SELECT unnest(ARRAY['foo', 'bar', 'baz']) AS table_name
) t;
>>> [{"table_name": "foo"}, {"table_name": "bar"}, {"table_name": "baz"}]
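Applied to the question's name/ID pairs, a sketch with made-up sample values (jsonb_build_object() requires Postgres 9.5+):
SELECT jsonb_agg(jsonb_build_object('name', x.name, 'id', x.id)) AS pairs
FROM  (VALUES ('TheName', 1), ('Other', 2)) AS x(name, id);
>>> [{"id": 1, "name": "TheName"}, {"id": 2, "name": "Other"}]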