I have a database in which one of the columns is a map containing several pieces of information, for example:
{"2":"KJH78","4":"CL","10":"Sell"}
The key 2 stands for the ID.
I would like to hash the ID and directly replace the ID value in the map, using a Presto SELECT statement.
I have not found any Presto functions that do the job.
What I have:
acc_id | acc_id_hash      | flds
KJH78  | 2bjkse879sdf2kk7 | {"2":"KJH78", ......}
What I want:
acc_id | acc_id_hash      | flds
KJH78  | 2bjkse879sdf2kk7 | {"2":"2bjkse879sdf2kk7", ......}
Failed attempts
(Ignore the actual hash values below; they were manually changed to protect the data.)
1. Using the replace function
SELECT element_at(flds, 2) AS acc_id,
       xxhash64(to_utf8(element_at(flds, 2))) AS acc_id_hash,
       replace(flds[2], cast(from_utf8(xxhash64(to_utf8(element_at(flds, 2)))) AS varchar)),
       flds
FROM mscs_feed
WHERE date_string = '2021-05-26'
  AND element_at(flds, 2) IS NOT NULL
LIMIT 10
Result:
acc_id | acc_id_hash      | _col2 | flds
KJH78  | 2bjkse879sdf2kk7 | KJH78 | {"2":"KJH78", ......}
Problem: flds[2] is varchar, while the hash is varbinary. The replace function needs both parameters to be of the same type, and after converting the types the hash becomes meaningless.
2. Trim the map, then add the hash into it
SELECT element_at(flds, 2) AS acc_id,
       xxhash64(to_utf8(element_at(flds, 2))) AS acc_id_hash,
       slice(flds, 1, cardinality(flds) - 1),
       flds
FROM mscs_feed
WHERE date_string = '2021-05-26'
  AND element_at(flds, 2) IS NOT NULL
LIMIT 10
Problem: Error line 3: Unexpected parameters (map(integer,varchar), integer, bigint) for function slice. Expected: slice(array(E), bigint, bigint) E
Please let me know if the question is unclear, or more information is needed.
Thanks in advance!
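For what it's worth, one possible direction (a minimal, unverified sketch, not a confirmed answer): the slice error shows that flds is a map(integer, varchar), so a single entry can be overwritten by merging a one-entry map over it with map_concat, which keeps the value from the last map for duplicate keys; to_hex turns the varbinary hash into a readable varchar:
SELECT element_at(flds, 2) AS acc_id,
       to_hex(xxhash64(to_utf8(element_at(flds, 2)))) AS acc_id_hash,
       map_concat(flds,
                  map(ARRAY[2],
                      ARRAY[to_hex(xxhash64(to_utf8(element_at(flds, 2))))])) AS flds_hashed
FROM mscs_feed
WHERE date_string = '2021-05-26'
  AND element_at(flds, 2) IS NOT NULL
LIMIT 10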
Related
I have a procedure with the parameter IT_ATINN:
IMPORTING
REFERENCE(IT_ATINN) TYPE STRING_TABLE
IT_ATINN contains a list of characteristics.
I have the following code:
LOOP AT values_tab INTO DATA(value).
SELECT ( @value-INSTANCE ) AS CUOBJ
  FROM IBSYMBOL
  WHERE SYMBOL_ID = @value-SYMBOL_ID
  AND ATINN ??? "<======== HERE ???
  APPENDING TABLE @DATA(ibsymbol_tab).
ENDLOOP.
How can I check if ATINN (in the WHERE clause) is equal to any entry in IT_ATINN?
To achieve what you want (and I assume you want dynamic SELECT fields), you cannot use inline declarations here, either in the LOOP or in the SELECT:
The structure of the results set must be statically identifiable. The SELECT list and the FROM clause must be specified statically and host variables in the SELECT list must not be generic.
So you either use inline declarations or dynamics, not both.
Here is a snippet that illustrates Sandra's good suggestion:
TYPES: BEGIN OF ty_value_tab,
         instance TYPE char18,
         symbol_id TYPE id,
       END OF ty_value_tab.
DATA: it_atinn TYPE string_table.
DATA: rt_atinn TYPE RANGE OF atinn,
      value TYPE ty_value_tab,
      values_tab TYPE STANDARD TABLE OF ty_value_tab,
      ibsymbol_tab TYPE TABLE OF ibsymbol.
rt_atinn = VALUE #( FOR value_atinn IN it_atinn
                    ( sign = 'I' option = 'EQ' low = value_atinn ) ).
APPEND VALUE ty_value_tab( instance = 'ATWRT' ) TO values_tab.
LOOP AT values_tab INTO value.
  SELECT (value-instance)
    FROM ibsymbol
    WHERE symbol_id = @value-symbol_id
    AND atinn IN @rt_atinn
    APPENDING CORRESPONDING FIELDS OF TABLE @ibsymbol_tab.
ENDLOOP.
Overall, it makes little sense to SELECT from ibsymbol in a loop, because it has only 8 fields, so you can easily collect all the necessary fields from values_tab and pass them as a dynamic field string, as sketched below.
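For illustration, a rough sketch of that idea (hypothetical variable names; it assumes values_tab holds the field names in instance, and the WHERE clause is shortened for the sketch):
" Build one comma-separated field list from values_tab:
DATA(lv_fields) = REDUCE string( INIT s = ``
                                 FOR v IN values_tab
                                 NEXT s = COND string( WHEN s IS INITIAL
                                                       THEN |{ v-instance }|
                                                       ELSE |{ s }, { v-instance }| ) ).
" One dynamic SELECT instead of one SELECT per loop pass:
SELECT (lv_fields)
  FROM ibsymbol
  WHERE atinn IN @rt_atinn
  INTO CORRESPONDING FIELDS OF TABLE @ibsymbol_tab.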
If you want to use the alias CUOBJ for your dynamic field, you should add it like this:
LOOP AT values_tab INTO value.
DATA(aliased_value) = value-instance && ` AS cuobj `.
SELECT (aliased_value)
...
Remember that your alias must exist among the ibsymbol fields; otherwise, with a static ibsymbol_tab declaration, this statement will throw a short dump.
I have a table
CREATE TABLE table (
id Int32,
values Array(Tuple(LowCardinality(String), Int32)),
date Date
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (id, date)
but when executing the query
SELECT count(*)
FROM table
WHERE (arrayExists(x -> ((x.1) = toLowCardinality('pattern')), values) = 1)
I get an error
Code: 49. DB::Exception: Received from clickhouse:9000. DB::Exception: Cannot capture column 3 because it has incompatible type: got String, but LowCardinality(String) is expected..
If I change the column values to
values Array(Tuple(String, Int32))
then the query executes without errors.
What could be the problem when using Array(Tuple(LowCardinality(String), Int32))?
Until it is fixed (see bug 7815), this workaround can be used:
SELECT uniqExact((id, date)) AS count
FROM table
ARRAY JOIN values
WHERE values.1 = 'pattern'
For the case when there is more than one Array column, this approach can be used:
SELECT uniqExact((id, date)) AS count
FROM
(
SELECT
id,
date,
arrayJoin(values) AS v,
arrayJoin(values2) AS v2
FROM table
WHERE v.1 = 'pattern' AND v2.1 = 'pattern2'
)
values Array(Tuple(LowCardinality(String), Int32)),
Do not use Tuple. It brings only cons:
It is still two files on the disk.
It gives a twofold slowdown when you extract only one tuple element:
https://gist.github.com/den-crane/f20a2dce94a2926a1e7cfec7cdd12f6d
valuesS Array(LowCardinality(String)),
valuesI Array(Int32)
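A minimal sketch of that layout (hypothetical table name table2; the two parallel arrays stay aligned by position), where the original check becomes a plain has() lookup:
CREATE TABLE table2 (
    id Int32,
    valuesS Array(LowCardinality(String)),
    valuesI Array(Int32),
    date Date
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (id, date);

SELECT count(*)
FROM table2
WHERE has(valuesS, 'pattern');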
BigQuery and SQL noob here. I was going through the data types BigQuery supports here. I have a column in Bigtable that is of type bytes; its original data type is a Scala Long, which was converted to bytes and stored in Bigtable by my application code. I am trying to do CAST(itemId AS integer) (where itemId is the column name) in the BigQuery UI, but the output is 0 instead of the actual value. I have no idea how to do this. If someone could point me in the right direction, I would greatly appreciate it.
EDIT: Adding more details
Sample itemId is 190007788462
Following is the code that writes itemId to Bigtable (only the relevant method is included; it uses the HBase client to write to Bigtable):
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.util.Bytes

def toPut(key: String, itemId: Long): Put = {
  val TrxColumnFamily = Bytes.toBytes("trx")
  val ItemIdColumn = Bytes.toBytes("itemId")
  new Put(Bytes.toBytes(key))
    .addColumn(TrxColumnFamily,
               ItemIdColumn,
               Bytes.toBytes(itemId))
}
Following is the entry in Bigtable based on the above code:
ROW COLUMN+CELL
foo column=trx:itemId, value=\x00\x00\x00\xAFP]F\xAA
Following is the relevant Scala code that reads the entry back from Bigtable. This works correctly; row is an org.apache.hadoop.hbase.client.Result:
private def getItemId(row: Result): Long = {
  val key = Bytes.toString(row.getRow)
  val TrxColumnFamily = Bytes.toBytes("trx")
  val ItemIdColumn = Bytes.toBytes("itemId")
  val itemId =
    Bytes.toLong(row.getValue(TrxColumnFamily, ItemIdColumn))
  itemId
}
The getItemId function above correctly returns itemId. That's because Bytes.toLong is part of org.apache.hadoop.hbase.util.Bytes, which correctly converts the byte string to a Long.
I am using the BigQuery UI similar to this one, and I use CAST(itemId AS integer) because BigQuery doesn't have a Long data type. This incorrectly casts the itemId byte string to an integer, and the resulting value is 0.
Is there a Bytes.toLong equivalent from hbase-client in the BigQuery UI? If not, is there any other way I can go about this issue?
Try this:
SELECT CAST(CONCAT('0x', TO_HEX(itemId)) AS INT64) AS itemId
FROM YourTable;
It converts the bytes into a hex string, then casts that string into an INT64. Note that the query uses standard SQL, as opposed to legacy SQL. If you want to try it with some sample data, you can run this query:
WITH `YourTable` AS (
SELECT b'\x00\x00\x00\xAFP]F\xAA' AS itemId UNION ALL
SELECT b'\xFA\x45\x99\x61'
)
SELECT CAST(CONCAT('0x', TO_HEX(itemId)) AS INT64) AS itemId
FROM YourTable;
I have the name of a table (DATA lv_tablename TYPE tabname VALUE 'xxxxx') and a generic FIELD-SYMBOLS: <lt_table> TYPE ANY TABLE. that contains entries selected from that table.
I've defined my line structure FIELD-SYMBOLS: <ls_line> TYPE ANY. which I'd use for reading from the table.
Is there a way to create a READ statement on <lt_table> fully specifying the key fields?
I am aware of the statement/addition READ TABLE xxxx WITH KEY (lv_field_name) = 'asdf'., but as far as I know this wouldn't work for a dynamic number of key fields, and I wouldn't like to create a large number of READ TABLE statements with an increasing number of key field specifications.
Can this be done?
Actually I found this to work:
DATA lt_bseg TYPE TABLE OF bseg.
DATA ls_bseg TYPE bseg.
DATA lv_string1 TYPE string.
DATA lv_string2 TYPE string.
lv_string1 = ` `.
lv_string2 = lv_string1.
SELECT whatever FROM wherever INTO TABLE lt_bseg.
READ TABLE lt_bseg INTO ls_bseg
  WITH KEY ('MANDT') = 800
           (' ') = ''
           ('BUKRS') = '0005'
           ('BELNR') = '0100000000'
           ('GJAHR') = 2005
           ('BUZEI') = '002'
           ('') = ''
           (' ') = ''
           (' ') = ' '
           (lv_string1) = '1'
           (lv_string2) = ''.
By using this syntax one can specify as many key fields as required. If some field names are empty, they are simply ignored, even if values are specified for them.
One must pay attention that with this exact syntax (static definitions), two fields with the exact same name (even blank names) are not allowed.
As shown with the variables lv_string1 and lv_string2, at run time this is no problem.
And lastly, one can specify the fields in any order (I don't know what performance benefits or penalties one might get using this syntax).
There also seems to be the possibility of something like a dynamic SELECT statement with binding and lt_dynwhere.
Please refer to this post, where someone asked about the same requirement:
http://scn.sap.com/thread/1789520
3 ways:
READ TABLE itab WITH [TABLE] KEY (comp1) = value1 (comp2) = value2 ...
You can handle a dynamic number of key fields by statically writing the maximum number of key fields in the code and passing empty key field names at runtime when fewer key fields are to be used.
LOOP AT itab WHERE (where) (see Addition 4 "WHERE (cond_syntax)")
Available since ABAP 7.02.
SELECT ... FROM @itab WHERE (where) ...
Available since ABAP 7.52. It may be slow if the condition is complex and cannot be handled by the ABAP kernel, i.e. it needs to be executed by the database; in that case only a few databases are supported (I think only HANA currently).
Examples (ASSERT statements are used here to prove that the conditions are true, otherwise the program would fail):
TYPES: BEGIN OF ty_table_line,
key_name_1 TYPE i,
key_name_2 TYPE i,
attr TYPE c LENGTH 1,
END OF ty_table_line,
ty_internal_table TYPE SORTED TABLE OF ty_table_line WITH UNIQUE KEY key_name_1 key_name_2.
DATA(itab) = VALUE ty_internal_table( ( key_name_1 = 1 key_name_2 = 1 attr = 'A' )
( key_name_1 = 1 key_name_2 = 2 attr = 'B' ) ).
"------------------ READ TABLE
DATA(key_name_1) = 'KEY_NAME_1'.
DATA(key_name_2) = 'KEY_NAME_2'.
READ TABLE itab WITH TABLE KEY
(key_name_1) = 1
(key_name_2) = 2
ASSIGNING FIELD-SYMBOL(<line>).
ASSERT <line> = VALUE ty_table_line( key_name_1 = 1 key_name_2 = 2 attr = 'B' ).
key_name_2 = ''. " ignore this key field
READ TABLE itab WITH TABLE KEY
(key_name_1) = 1
(key_name_2) = 2 "<=== will be ignored
ASSIGNING FIELD-SYMBOL(<line_2>).
ASSERT <line_2> = VALUE ty_table_line( key_name_1 = 1 key_name_2 = 1 attr = 'A' ).
"------------------ LOOP AT
DATA(where) = 'key_name_1 = 1 and key_name_2 = 1'.
LOOP AT itab ASSIGNING FIELD-SYMBOL(<line_3>)
WHERE (where).
EXIT.
ENDLOOP.
ASSERT <line_3> = VALUE ty_table_line( key_name_1 = 1 key_name_2 = 1 attr = 'A' ).
"---------------- SELECT ... FROM #itab
SELECT SINGLE * FROM @itab WHERE (where) INTO @DATA(line_3).
ASSERT line_3 = VALUE ty_table_line( key_name_1 = 1 key_name_2 = 1 attr = 'A' ).
I'm using mksqlite to create and access an SQLite database from MATLAB, and I want to get the number of rows in a table. I've tried this:
num = mksqlite('SELECT COUNT(*) FROM myTable');
, but the returned value isn't very helpful. If I put a breakpoint in my script and examine the variable, I find that it's a struct with a single field called 'COUNT(_)', which is actually an invalid name for a field, so I can't access it:
K>> class(num)
ans =
struct
K>> num
num =
COUNT(_): 0
K>> num.COUNT(_)
??? num.COUNT(_)
|
Error: The input character is not valid in MATLAB statements or expressions.
K>> num.COUNT()
??? Reference to non-existent field 'COUNT'.
K>> num.COUNT
??? Reference to non-existent field 'COUNT'.
Even the MATLAB IDE can't access it. If I try to double-click the field in the variable editor, this gets spat out:
??? openvar('num.COUNT(_)', num.COUNT(_));
|
Error: The input character is not valid in MATLAB statements or expressions.
So how can I access this field?
You are correct that the problem is that mksqlite somehow manages to create an invalid field name that can't be read. The simplest solution is to add an AS clause to your SQL so that the field has a sensible name:
>> num = mksqlite('SELECT COUNT(*) AS cnt FROM myTable')
num =
cnt: 0
Then to remove the extra layer of indirection you can do:
>> num = num.cnt;
>> num
num =
0
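If changing the query is not an option, a possible fallback (an untested sketch; struct2cell copies field values without parsing their names, so it should sidestep the invalid name) is to read the value positionally:
num  = mksqlite('SELECT COUNT(*) FROM myTable');
vals = struct2cell(num);  % cell array of field values, in field order
cnt  = vals{1};           % the single COUNT(*) result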