SSAS: Create dynamic measures from values in a dimension

I have a cube where I need to create dynamic measures using the script command.
The data looks like the below:
Dim_Market  Dim_Target  value
Market1     target1     val1
Market1     target2     val1
Market1     target3     val1
Market2     target2     val1
Market2     target1     val1
Market2     target3     val1
The newly created measures should look like the below:
Market1_Target1
Market1_Target2
Market1_Target3
...
Is there any way to do this? The values in the dimensions are not static, so we can't create the columns in the data source view.

Do you mean:
[Dim_Market].[Market].CurrentMember.Name + "_" + [Dim_Target].[Target].CurrentMember.Name
?
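If the goal is a separate measure per (market, target) pair, the usual route is CREATE MEMBER statements in the cube's MDX script. A minimal sketch of one such member, assuming a base measure [Measures].[Value] and attribute hierarchies [Market] and [Target] (all names here are illustrative):

CREATE MEMBER CURRENTCUBE.[Measures].[Market1_Target1] AS
    (
        [Measures].[Value],
        [Dim_Market].[Market].&[Market1],
        [Dim_Target].[Target].&[Target1]
    ),
VISIBLE = 1;

Since the members are not static, the script itself cannot loop over them; the CREATE MEMBER statements would have to be generated outside the cube (e.g., from the dimension tables) and injected into the script at deployment time.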


How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tool imports my data into BigQuery and automatically generates/extends the schema for dynamic nested keys (in the schema below, under properties).
[schema screenshot omitted]
How can I get the list of nested keys of a repeated record? So that, for example, I can group by properties when those items have said property non-null?
I have tried
select column_name
from my_schema.INFORMATION_SCHEMA.COLUMNS
where table_name = 'my_table'
But it will only list first-level keys.
From the schema above, I want, as a first step, a SQL query that returns
message
user_id
seeker
liker_id
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt
...
My real goal, though, is to group/count my data based on a non-null value of 2nd-level nested properties, properties.filters.[filter_type].
The schema may evolve when our application adds more filters, so this needs to be generated dynamically; I can't just hard-code the list of nested keys.
Note: this is very similar to this question: How to extract all the keys in a JSON object with BigQuery, but in my case my data already has a schema and is not a JSON object.
EDIT:
Suppose I have a list of such records with nested properties. How do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for which said property is not null?
Example input (properties.x are dynamic and not known by the programmer):

search_id | properties.filters.school | properties.filters.type
----------|---------------------------|-------------------------
1         | MIT                       | master
2         | Princetown                | null
3         | null                      | master

Example output:

search_id | enabled_filters
----------|--------------------
1         | ["school", "type"]
2         | ["school"]
3         | ["type"]
Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns.
select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
[https://cloud.google.com/bigquery/docs/information-schema-column-field-paths]
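A follow-up sketch: since only the keys under properties.filters are wanted, the same view can be filtered on field_path (table name is illustrative):

select field_path
from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
where table_name = 'my_table'
  and field_path like 'properties.filters.%'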
The field properties is not nested by arrays, only by structures, so a UDF in JavaScript to parse this field should work fast enough.
CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL, fullname BOOL)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
function test(input, old) {
  var out = [];
  for (let x in input) {
    let te = input[x];
    // null leaf: optionally report "key==null"; object: recurse; other leaf: emit the key
    out = out.concat(te == null ? (shownull ? [x + '==null'] : [])
                   : typeof te == 'object' ? test(te, old + x + '.')
                   : [fullname ? old + x : x]);
  }
  return out;
}
return test(JSON.parse(input), "");
""";
with tbl as (
  select struct(1 as alpha, struct(2 as x, 3 as y, [1,2,3] as z) as B) A
  from unnest(generate_array(1, 10*1))
  union all
  select struct(null, struct(null, 1, [999]))
)
select *,
  TO_JSON_STRING(A) as string_output,
  jsonObjectKeys(TO_JSON_STRING(A), true, false) as output1,
  jsonObjectKeys(TO_JSON_STRING(A), false, true) as output2,
  concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING(A), false, true), '","'), '"]') as output_string,
  jsonObjectKeys(TO_JSON_STRING(A.B), false, true) as output3
from tbl
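Pointing the UDF at the filters struct then yields the asker's enabled_filters; a minimal sketch, assuming a table searches with columns search_id and properties (names are illustrative):

select
  search_id,
  jsonObjectKeys(TO_JSON_STRING(properties.filters), false, false) as enabled_filters
from my_schema.searches

With shownull = false, only keys whose values are non-null come back, which matches the expected output above.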

Syncing Qubole Hive table to Snowflake with Struct field

I have a table like the following in Qubole:
use dm;
CREATE EXTERNAL TABLE IF NOT EXISTS fact (
  id string,
  fact_attr struct<
    attr1 : String,
    attr2 : String
  >
)
STORED AS PARQUET
LOCATION 's3://my-bucket/DM/fact'
I have created a parallel table in Snowflake like the following:
CREATE TABLE IF NOT EXISTS dm.fact (
id string,
fact_attr variant
)
My ETL process loads the data into the Qubole table like:
+------------+--------------------------------+
| id | fact_attr |
+------------+--------------------------------+
| 1 | {"attr1": "a1", "attr2": "a2"} |
| 2 | {"attr1": "a3", "attr2": null} |
+------------+--------------------------------+
I am trying to sync this data to Snowflake using the MERGE command, like:
MERGE INTO DM.FACT dst USING %s src
ON dst.id = src.id
WHEN MATCHED THEN UPDATE SET
fact_attr = parse_json(src.fact_attr)
WHEN NOT MATCHED THEN INSERT (
id,
fact_attr
) VALUES (
src.id,
parse_json(src.fact_attr)
);
I am using PySpark to sync the data:
df.write \
    .option("sfWarehouse", sf_warehouse) \
    .option("sfDatabase", sf_database) \
    .option("sfSchema", sf_schema) \
    .option("postactions", query) \
    .mode("overwrite") \
    .snowflake("snowflake", sf_warehouse, sf_temp_table)
With above command I am getting following error:
pyspark.sql.utils.IllegalArgumentException: u"Don't know how to save StructField(fact_attr,StructType(StructField(attr1,StringType,true), StructField(attr2,StringType,true)),true) of type attributes to Snowflake"
I have read through the following links but with no success:
Semi-structured Data Types
Querying Semi-structured Data
Question:
How can I insert/sync data from Qubole Hive table which has STRUCT field to snowflake?
The version of the Spark Connector for Snowflake in use at the time lacked support for variant data types.
Support was introduced in connector version 2.4.4 (released July 2018), from which StructType fields are auto-mapped to a VARIANT data type that works with your MERGE command.
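A minimal sketch of the write after upgrading, assuming the standard Snowflake connector and an sf_options dict holding sfURL, sfUser, sfPassword, sfDatabase, sfSchema and sfWarehouse (all illustrative):

# With connector >= 2.4.4 the StructType column is auto-mapped to VARIANT,
# so the DataFrame is written as-is and the postactions MERGE still runs.
df.write \
    .format("net.snowflake.spark.snowflake") \
    .options(**sf_options) \
    .option("dbtable", sf_temp_table) \
    .option("postactions", query) \
    .mode("overwrite") \
    .save()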

ARRAY_CONTAINS multiple values in Hive

Is there a convenient way to use the ARRAY_CONTAINS function in Hive to search for multiple entries in an array column rather than just one? So rather than:
WHERE ARRAY_CONTAINS(array, val1) OR ARRAY_CONTAINS(array, val2)
I would like to write:
WHERE ARRAY_CONTAINS(array, val1, val2)
The full problem is that I need to read val1 and val2 dynamically from the command-line arguments when I run the script, and I generally don't know how many values will be conditioned on. So you can think of vals as a comma-separated list (or array) containing values val1, val2, ..., and I want to write:
WHERE ARRAY_CONTAINS(array, vals)
Thanks in advance!
There is a UDF here that will let you take the intersection of two arrays. Assuming your values have the structure
values_array = [val1, val2, ..., valn]
You could then do
where array_intersection(array, values_array)[0] is not null
If they don't have any elements in common, [] will be returned and therefore [][0] will be null
CREATE TABLE tmp_cars AS
SELECT make, COLLECT_LIST(TRIM(model)) model_list
FROM default.cars
GROUP BY make;

SELECT array_contains(model_list, CAST('Rainier' AS varchar(40)))
FROM default.tmp_cars t
WHERE make = 'Buick';
Data
[" Rainier"," Rendezvous CX"," Century Custom 4dr"," LeSabre Custom 4dr"," Regal LS 4dr"," Regal GS 4dr"," LeSabre Limited 4dr"," Park Avenue 4dr"," Park Avenue Ultra 4dr"]
Return
True
select *
from table
lateral view explode(array) a as arr
where arr in (vals)
;
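To wire the values in from the command line, Hive variable substitution can be combined with the explode approach above; a hedged sketch (table and column names are illustrative), with DISTINCT collapsing rows that match more than one value:

-- invoked as: hive -hivevar vals="'val1','val2'" -f script.hql
SELECT DISTINCT t.*
FROM my_table t
LATERAL VIEW explode(t.array_col) a AS arr
WHERE arr IN (${hivevar:vals});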

Perl - From Database to data structure

I'm querying a table in a database with SQL like this:
Select col1, col2 from table_name
For reference, col2 will be an integer value, and col1 will be the name of an element, e.g.:
FOO, 3
BAR, 10
I want a data structure where the values can be addressed like $vars->{value_of_col1}, which should return the value of col2.
So
$vars->{FOO}
would return 3.
Basically, I don't know how to get the SQL results into a data structure I can address like this.
You need to fetch each row and build that hashref yourself.
my $vars; # declare the variable for the hash ref outside the loop
my $sth = $dbh->prepare(q{select col1, col2 from table_name});
$sth->execute;
while ( my $res = $sth->fetchrow_hashref ) { # fetch row by row
    $vars->{ $res->{col1} } = $res->{col2}; # build up the data structure
}
print $vars->{FOO};

__END__
3
You may want to read up on DBI, especially how to fetch stuff.
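A shorter alternative under the same assumptions, using DBI's documented Columns attribute: selectcol_arrayref then returns a flat col1, col2, col1, col2, ... list that assigns straight into a hash.

# builds %vars as (FOO => 3, BAR => 10) in one call
my %vars = @{ $dbh->selectcol_arrayref(
    q{select col1, col2 from table_name},
    { Columns => [1, 2] },
) };
print $vars{FOO}; # 3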

Reading json files in pig

I have three data sets:
1) Base data
2) data_dict_1
3) data_dict_2
Base data is well-formatted JSON.
For example:
{"id1":"foo", "id2":"bar" ,type:"type1"}
{"id1":"foo", "id2":"bar" ,type:"type2"}
data_dict_1
1 foo
2 bar
3 foobar
....
data_dict_2
-1 foo
-2 bar
-3 foobar
... and so on
Now, what I want is: if the data is of type1, then read id1 from data_dict_1 and id2 from data_dict_2 and assign the corresponding integer IDs.
If the data is of type2, then read id1 from data_dict_2 and id2 from data_dict_1, and assign the corresponding IDs.
For example:
{"id1":1, "id2":2 ,type:"type1"}
{"id1":-1, "id2":-2 ,type:"type2"}
And so on..
How do I do this in Pig?
Note: what you have in the example above is not valid JSON; the type key is not quoted.
Assuming Pig 0.10 and up, there's the built-in JsonLoader, which you can pass a schema to and load the data with:
data = LOAD 'loljson' USING JsonLoader('id1:chararray,id2:chararray,type:chararray');
and load the dicts
dict_1 = LOAD 'data_dict_1' USING PigStorage(' ') AS (id:int, key:chararray);
dict_2 = LOAD 'data_dict_2' USING PigStorage(' ') AS (id:int, key:chararray);
Then split that based on the type value
SPLIT data INTO type1 IF type == 'type1', type2 IF type == 'type2';
JOIN them appropriately
type1_joined = JOIN type1 BY id1, dict_1 BY key;
type1_joined = FOREACH type1_joined GENERATE type1::id1 AS id1, type1::id2 AS id2, type1::type AS type, dict_1::id AS id;
type2_joined = JOIN type2 BY id2, dict_2 BY key;
type2_joined = FOREACH type2_joined GENERATE type2::id1 AS id1, type2::id2 AS id2, type2::type AS type, dict_2::id AS id;
and since the schemas are equal, UNION them together
final_data = UNION type1_joined, type2_joined;
this produces
DUMP final_data;
(foo,bar,type2,-2)
(foo,bar,type1,1)
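A hedged follow-on sketch: to reach the asker's desired shape (integer IDs substituted into the records), project the joined dictionary id over the original field before the UNION; shown for the type1 branch only:

type1_final = FOREACH type1_joined GENERATE id AS id1, id2, type;
-- analogous FOREACH for type2_joined, then UNION as before;
-- the first dumped record above would then read (1,bar,type1)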