OrientDB embeddedmap query - sql

Let's say I have a Vertex class Data in OrientDB. Data has a property data which is of the type EMBEDDEDMAP.
I can create a new Vertex of this type and assign an object to the property data with the following command:
CREATE VERTEX Data SET data = {'key1':'val1', 'key2':'val2'}
Let's say now that I want to query the database and get records that hold exactly this structure in the data property, i.e. something along the lines of:
SELECT FROM Data WHERE data = {"key1":"val1","key2":"val2"}
This doesn't work, however. (Note also that the structure in data is arbitrary and can have nested structures: {"key2":{"key2":"val2"}}, etc.)
I know that this query is possible for an embeddedmap type:
SELECT FROM Data WHERE "val1" IN data.key1 AND "val2" IN data.key2
But for arbitrary data structures it would be bothersome to build such a query. This also led me to notice another thing:
Let's say I create two vertices:
CREATE VERTEX Data SET data = {"key1":["one", "two"]}
CREATE VERTEX Data SET data = {"key1":["one"]}
I now want to select only the first of them, for instance with:
SELECT FROM Data WHERE ["one", "two"] IN data.key1
This query however returns both records:
#rid    #version  #class  data
#13:7   1         Data    {"key1":["one","two"]}
#13:8   1         Data    {"key1":["one"]}
I'm guessing I have to do:
SELECT FROM Data WHERE "one" IN data.key1 AND "two" IN data.key1
However, this also seems quite cumbersome for nested lists.
Question: How could I query on a known, arbitrary data structure (embeddedmap)?
NOTE: I'm not asking about specific values in the structure, but the whole structure.
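For reference, the containment-style query can at least be generated mechanically from a known structure. Below is a minimal sketch in Python, assuming the structure is available as a dict; it only automates the IN-based approach above using dotted key paths (which may need adjusting for your OrientDB version), and it does not enforce an exact match of the whole structure:

import json

def flatten(prefix, value, out):
    # Walk the nested structure and collect (dotted path, scalar value) pairs.
    if isinstance(value, dict):
        for k, v in value.items():
            flatten(f"{prefix}.{k}", v, out)
    elif isinstance(value, list):
        for v in value:
            flatten(prefix, v, out)
    else:
        out.append((prefix, value))

def build_query(structure, cls="Data", field="data"):
    pairs = []
    flatten(field, structure, pairs)
    conditions = [f"{json.dumps(val)} IN {path}" for path, val in pairs]
    return f"SELECT FROM {cls} WHERE " + " AND ".join(conditions)

print(build_query({"key1": ["one", "two"], "key2": {"key2": "val2"}}))
# SELECT FROM Data WHERE "one" IN data.key1 AND "two" IN data.key1 AND "val2" IN data.key2.key2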


Dynamic list of variables in process in Azure Data Factory

I have a lookup config table that stores 1) the source table and 2) the list of variables to process, for example:
SQL Lookup Table:
tableA, variableX,variableY,variableZ <-- tableA has more than these 3 variables, i.e. it has other variables such as variableV, variableW, but they do not need to be processed
tableB, variableA,variableB <-- tableB has more than these 2 variables
Hence, I will need to dynamically connect to each table and process the specific variables in each table. The processing step is to convert the Julian date (in integer format) to a standard date (date format). Example SQL query:
select dateadd(dd, (variableX - ((variableX/1000) * 1000)) - 1, dateadd(yy, variableX/1000, 0)) FROM [dbo].[tableA]
The problem is that after setting up the Lookup and ForEach activities in ADF, I am unsure how to loop through the variable array (or string, since the SQL DB does not allow me to store array results) and convert all these variables into the standard date format.
The return result should be a processed dataset to be exported to a sink.
Hence, I would like to check what the best way to achieve this in ADF would be.
Thank you!
I have reproduced this in my local environment. Please see the steps below.
Using a Lookup activity, first get the list of tables from the control table.
Pass the Lookup output to a ForEach activity.
Inside the ForEach activity, add a Lookup activity to get the list of variables from the control table where the table name is the current item of the ForEach activity.
@concat('select table_variables from control_tb where table_name = ''',item().table_name,'''')
Convert the Lookup2 activity output value to an array using a Set Variable activity.
@split(activity('Lookup2').output.firstRow.table_variables,',')
Create another pipeline (pipeline2) with two parameters, table name (string) and variables (array), and add a ForEach activity in pipeline2.
Pass the array parameter to the ForEach activity in pipeline2 and use a Copy activity to copy data from source to sink (a sketch of the per-variable source query follows below).
Finally, inside the ForEach activity of pipeline 1, add an Execute Pipeline activity that invokes pipeline2, passing the table name and variables array.
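The remaining piece is the dynamic source query inside the Copy activity, which needs one dateadd() conversion per variable. Purely as an illustration of the SQL string that the query has to resolve to, here is a sketch in Python (in ADF itself this would be assembled with expression-language functions rather than Python, and build_source_query is just an illustrative name):

# Illustrative sketch only: shows the SQL text the Copy activity's dynamic
# source query should resolve to for a given table and variable list.
def build_source_query(table_name, variables):
    # One Julian-to-standard-date conversion per variable, mirroring the
    # dateadd() expression from the question.
    converted = [
        f"dateadd(dd, ({v} - (({v}/1000) * 1000)) - 1, dateadd(yy, {v}/1000, 0)) AS {v}"
        for v in variables
    ]
    return f"select {', '.join(converted)} from [dbo].[{table_name}]"

# Example with the tableA entry from the lookup config table
print(build_source_query("tableA", ["variableX", "variableY", "variableZ"]))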

Querying Redshift Spectrum array of string columns

I have an external (S3) table in my Redshift cluster with an array-of-strings column. It's literally just a list of strings. I can query and select only the array column with no problems, and I can query all 3 of the array columns with no problems, but as soon as I try to query other columns that are not arrays I get the following:
error: Spectrum Scan Error
I have tried the following, as I saw it in some other Stack Overflow questions:
select id_col, b
from test.test_table as a, a.array_col as b
but when I run the above I get: navigation on array_col is not allowed as it is not a struct/tuple type
Of course, this error message makes sense as it isn't a struct or tuple type but I am lost as to how on earth I can query a simple array of strings and have found no documentation on how to do this. Any help or advice would be greatly appreciated!
Since you aliased the external table, you have to use that alias for all the fields that you want to retrieve, in the same way you did with the array column. Your query will be:
select a.id_col, b
from test.test_table as a, a.array_col as b

How can Data Studio read a repeated column as values of a single record?

I'm moving a Mongo collection into BigQuery to do analysis and visualizations in Google Data Studio. I'm specifically trying to map all results of a locations collection, which has multiple records, one for each location. Each record stores the lat long as an array of 2 numbers.
In Data Studio, when I try to map the locations.coordinates value, it croaks, because it only pulls in the first value of the array. If, instead of mapping it, I output the result as a table, I see 2 rows for each record, with the _id being the same and locations.coordinates differing between a row that has the latitude (locations.coordinates[0]) and another row that has the longitude (locations.coordinates[1]).
I think I have to do this as a scheduled query in BigQuery that runs after every sync of the data. But I'm hoping there is a way to do this as a calculated field or a blended data set in Google Data Studio.
Data as it exists in Mongo
Data as it exists in BigQuery
Data as it exists in Data Studio
Additional:
BigQuery Record Types
You can address values in arrays directly and transform your data accordingly using struct etc.:
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT('a' AS company, STRUCT([-71.2, 42.0] as coordinates, 'Point' as type) AS location),
    ('b', ([-71.0, 42.2], 'Point')),
    ('c', ([-71.4, 42.4], 'Point'))
  ])
)
--show source structure of example data
--SELECT * FROM t
SELECT * except(location),
  STRUCT(
    location.coordinates[safe_offset(0)] as long,
    location.coordinates[safe_offset(1)] as lat,
    location.type
  ) as location
FROM t
There's offset() for 0-based access and ordinal() for 1-based access, and with the safe_ prefix you don't trigger errors in case the index doesn't exist in the array. If you need to know that values are missing, use the version without safe_.
Anyway, this structure is flat because specific values are chosen from the array. It should work with Data Studio or any other visualization tool, since there are no repeated rows anymore.

Using Rails, what's wrong with this query? It does not return a valid id

store_id=Store.select(:id).where(user_id:current_user.id).to_a.first
It returns an id like this: Store:0x00007f8717546c30
Store.select(:id).where(user_id:current_user.id).to_a.first
select does not return an array of strings or integers for the given column(s), but rather an ActiveRecord relation containing objects with just the given field:
https://apidock.com/rails/ActiveRecord/QueryMethods/select
Your code is then converting that relation to an array, and taking the first object in that array, which is an instance of the Store class. If you want the ID, then try:
Store.select(:id).where(user_id:current_user.id).to_a.first.id
However, I think you're misunderstanding how to structure the queries. Put the where part first, and then find the ID of the first result:
Store.where(user_id: current_user.id).first.id
And if there is only 1 store, then:
Store.find_by(user_id: current_user.id).id
Or...
Store.find_by(user: current_user).id
or.....
current_user.store.id
(or current_user.stores.first.id if there are many)

Put Array into Table Column

I'm trying to store information in a PyTables subclass. I have my class Record and subclass Data. Data will have many rows for every row of Record. I don't want to use a loop with row.append() because it seems like it would be horribly slow. Can I just create an array and drop it into the Data.v column? How?
import tables as tbs
import numpy as np

class Record(tbs.IsDescription):
    filename = tbs.StringCol(255)
    timestamp = tbs.Time32Col()

class Data(tbs.IsDescription):
    v = tbs.Int32Col(dflt=None)

...
row = table.row
for each in importdata:
    row['filename'] = each['filename']
    row['timestamp'] = each['timestamp']
    # ???? I want to do something like this
    row.Data = tbs.Array('v', each['v'], shape=np.shape(each['v']))
    row.append()
OK, when I read about nested tables I was thinking about relational data in a one-to-many situation. This isn't possible with nested tables. Instead, I just created a separate table and stored row references, using
table.nrows
to get the current row of my Data table. This works for me because for every entry in Record I can calculate the number of rows that will be stored in Data; I just need to know the starting row. I'm not going to modify/insert/remove any rows in the future, so my references don't change. Anyone considering this technique should understand the significant limitations it brings.
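A rough sketch of that layout, under the assumption that the two descriptions are materialized as two separate tables (record_table and data_table are illustrative names, start_row is an extra column added here just to hold the reference, and importdata is the same iterable as in the question):

import numpy as np
import tables as tbs

class Record(tbs.IsDescription):
    filename = tbs.StringCol(255)
    timestamp = tbs.Time32Col()
    start_row = tbs.Int64Col()   # illustrative: first row of this record's values in the data table

class Data(tbs.IsDescription):
    v = tbs.Int32Col()

h5 = tbs.open_file('example.h5', mode='w')
record_table = h5.create_table('/', 'records', Record)
data_table = h5.create_table('/', 'data', Data)

for each in importdata:                      # importdata as in the question
    rec = record_table.row
    rec['filename'] = each['filename']
    rec['timestamp'] = each['timestamp']
    rec['start_row'] = data_table.nrows      # reference to where this record's values start
    rec.append()
    # bulk-append the whole array for this record instead of looping row by row
    data_table.append([(int(v),) for v in each['v']])

record_table.flush()
data_table.flush()
h5.close()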
Nested columns use the '/' separator in the column key. So I think that you simply need to change the line:
row.Data = tbs.Array('v', each['v'], shape=np.shape(each['v']))
to the following:
row['Data/v'] = each['v']