split column into rows (explode) in dbt with SQL

I am very new to dbt, and I am trying to explode a column value into rows using SQL in dbt.
I have a table sample_data:
| id | content_toSplit |
| -------- | --------------- |
| 1 | [a, b, c, d] |
| 2 | [a, v] |
| 3 | [m, n, a] |
I want to split content_toSplit into one row per element, like this:
|id |output_column|
|-----|-------------|
|1 | a |
|1 | b |
|1 | c |
|1 | d |
|2 | a |
|2 | v |
I tried using unnest in a macro, and used the macro in my model:
Macro
{% macro split_column_to_row(column_name) %}
cross join unnest{{ column_name }} as output_column
{% endmacro %}
The Macro used in my model
select id, content_toSplit from {{ref('source')}}
{{split_column_to_row(content_toSplit)}}
But I am getting an "SQL compilation error: Object 'UNNEST' does not exist or not authorized." error. Also, I would like to get the position of each value.
I have also tried the below using SQL:
with unnest_column as (
select
id,
content_toSplit
from my_table, unnest(content_toSplit) as exploded_value
)
select *
from unnest_column
But I am getting "unexpected '('. syntax error line".

Perhaps your macro is interpreting unnest as a table name since it follows cross join, but that is just a guess without the actual generated SQL. Your SQL statement is using unnest improperly: it is used as an old-style (pre SQL-92) cross join. However, no join clause is necessary; just put unnest(content_toSplit) as exploded_value in the select column list. (see demo)
select id
, unnest(content_toSplit) as exploded_value
from sample_data
where id < 3; -- needed to get your specific output
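The question also asks for the position of each value. Assuming PostgreSQL (which the select-list unnest above implies), unnest supports WITH ORDINALITY for exactly that, though only in the FROM clause; a sketch:
select id
     , t.exploded_value
     , t.pos
from sample_data
cross join unnest(content_toSplit) with ordinality as t(exploded_value, pos);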
Caution: please be consistent. You initially state "I have a table sample_data:" yet your SQL references my_table. This is a small question, but had it been larger, the inconsistency could actually keep you from getting an answer. Details are important.
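If you still want this wrapped in a dbt macro, note two likely problems in the original: unnest is missing the parentheses around its argument, and the macro is called with an unquoted name, so Jinja treats content_toSplit as an (undefined) variable rather than a string. A corrected sketch, assuming a warehouse that supports unnest (the quoted error message looks like Snowflake's, whose equivalent is lateral flatten(input => ...)):
{% macro split_column_to_row(column_name) %}
cross join unnest({{ column_name }}) as t(output_column)
{% endmacro %}
and in the model:
select id, output_column
from {{ ref('source') }}
{{ split_column_to_row('content_toSplit') }}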

How to get a value inside of a JSON that is inside a column in a table in Oracle sql?

Suppose that I have a table named agents_timesheet that has a structure like this:
ID | name | health_check_record | date | clock_in | clock_out
---------------------------------------------------------------------------------------------------------
1 | AAA | {"mental":{"stress":"no", "depression":"no"}, | 6-Dec-2021 | 08:25:07 |
| | "physical":{"other_symptoms":"headache", "flu":"no"}} | | |
---------------------------------------------------------------------------------------------------------
2 | BBB | {"mental":{"stress":"no", "depression":"no"}, | 6-Dec-2021 | 08:26:12 |
| | "physical":{"other_symptoms":"no", "flu":"yes"}} | | |
---------------------------------------------------------------------------------------------------------
3 | CCC | {"mental":{"stress":"no", "depression":"severe"}, | 6-Dec-2021 | 08:27:12 |
| | "physical":{"other_symptoms":"cancer", "flu":"yes"}} | | |
Now I need to get all agents having flu on that day. As for getting the flu value from a single JSON document in Oracle SQL, I can already get it with this SQL statement:
SELECT * FROM JSON_TABLE(
'{"mental":{"stress":"no", "depression":"no"}, "physical":{"fever":"no", "flu":"yes"}}', '$'
COLUMNS (flu VARCHAR2(3) PATH '$.physical.flu')
);
As for getting the values from the column health_check_record, I can get it by utilizing the SELECT statement.
But how do I get the values of flu in the JSON in the health_check_record column of that table?
Additional question
Based on the table, how can I retrieve the full list of other_symptoms, so that it gives me this kind of output:
ID | name | other_symptoms
-------------------------------
1 | AAA | headache
2 | BBB | no
3 | CCC | cancer
You can use the JSON_EXISTS() function.
SELECT *
FROM agents_timesheet
WHERE JSON_EXISTS(health_check_record, '$.physical.flu == "yes"');
There is also a "plain old way" without JSON parsing, treating the column like a standard VARCHAR one. This will not work in 100% of cases, but if your data always has the shape you described it might be sufficient.
SELECT *
FROM agents_timesheet
WHERE health_check_record LIKE '%"flu":"yes"%';
How to get the values of flu in the JSON in the health_check_record of that table?
From Oracle 12, to get the values you can use JSON_TABLE with a correlated CROSS JOIN to the table:
SELECT a.id,
a.name,
j.*,
a."DATE",
a.clock_in,
a.clock_out
FROM agents_timesheet a
CROSS JOIN JSON_TABLE(
a.health_check_record,
'$'
COLUMNS (
mental_stress VARCHAR2(3) PATH '$.mental.stress',
mental_depression VARCHAR2(3) PATH '$.mental.depression',
physical_fever VARCHAR2(3) PATH '$.physical.fever',
physical_flu VARCHAR2(3) PATH '$.physical.flu'
)
) j
WHERE physical_flu = 'yes';
db<>fiddle here
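For the additional question, the other_symptoms value can be pulled per row as a scalar with JSON_VALUE rather than JSON_TABLE (a sketch, assuming Oracle 12c or later):
SELECT id,
       name,
       JSON_VALUE(health_check_record, '$.physical.other_symptoms') AS other_symptoms
FROM agents_timesheet;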
You can use "dot notation" to access data from a JSON column. Like this:
select "DATE", id, name
from agents_timesheet t
where t.health_check_record.physical.flu = 'yes'
;
DATE ID NAME
----------- --- ----
06-DEC-2021 2 BBB
Note that this approach requires that you use an alias for the table name (so you can use it in accessing the JSON data).
For testing I used the data posted by MT0 on dbfiddle. I am not a big fan of double-quoted column names; use something else for "DATE", such as dt or date_.

PostgreSQL: Efficiently split JSON array into rows

I have a table (Table A) that includes a text column containing JSON-encoded data.
The JSON data is always an array of between one and a few thousand plain objects.
I have another table (Table B) with a few columns, including a column with a datatype of JSON.
I want to select all the rows from Table A, split the JSON array into its elements and insert each element into Table B.
Bonus objective: each object (almost) always has a key, x. I want to pull the value of x out into a column, and delete x from the original object (if it exists).
E.g.: Table A
| id | json_array (text) |
+----+--------------------------------+
| 1 | '[{"x": 1}, {"y": 8}]' |
| 2 | '[{"x": 2, "y": 3}, {"x": 1}]' |
| 3 | '[{"x": 8, "z": 2}, {"z": 3}]' |
| 4 | '[{"x": 5, "y": 2, "z": 3}]' |
...would become: Table B
| id | a_id | x | json (json) |
+----+------+------+--------------------+
| 0 | 1 | 1 | '{}' |
| 1 | 1 | NULL | '{"y": 8}' |
| 2 | 2 | 2 | '{"y": 3}' |
| 3 | 2 | 1 | '{}' |
| 4 | 3 | 8 | '{"z": 2}' |
| 5 | 3 | NULL | '{"z": 3}' |
| 6 | 4 | 5 | '{"y": 2, "z": 3}' |
This initially has to work on a few million rows, and would then need to be run at regular intervals, so making it efficient would be a priority.
Is it possible to do this without using a loop and PL/PgSQL? I haven't been making much progress.
The json data type is not particularly suitable (or intended) for modification at the database level. Extracting "x" objects from the JSON object is therefore cumbersome, although it can be done.
You should create your table B (with hopefully a more creative column name than "json"; I am using item here) and make the id column a serial that starts at 0.
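A possible definition, as a sketch (b_id_seq is the name PostgreSQL gives the sequence behind a serial column id on table b; serial sequences start at 1 by default, so allow 0 and restart there to match the example):
CREATE TABLE b (
    id   serial PRIMARY KEY,
    a_id integer,
    x    integer,
    item json
);
-- let the sequence produce 0 as its first value
ALTER SEQUENCE b_id_seq MINVALUE 0 RESTART WITH 0;
A pure json solution then looks like this: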
INSERT INTO b (a_id, x, item)
SELECT sub.a_id, sub.x,
('{' ||
string_agg(
CASE WHEN i.k IS NULL THEN '' ELSE '"' || i.k || '":' || i.v END,
', ') ||
'}')::json
FROM (
SELECT a.id AS a_id, (j.items->>'x')::integer AS x, j.items
FROM a, json_array_elements(json_array) j(items) ) sub
LEFT JOIN json_each(sub.items) i(k,v) ON i.k <> 'x'
GROUP BY sub.a_id, sub.x
ORDER BY sub.a_id;
In the sub-query this extracts the a_id and x values, as well as the JSON object. In the outer query the JSON object is broken into its individual pieces and the objects with key x are thrown out (the LEFT JOIN ON i.k <> 'x'). In the select list the pieces are put back together again with string concatenation and grouped into compound objects.
This necessarily has to be like this because json has no built-in manipulation functions of any consequence. This works on PG versions 9.3+, i.e. since time immemorial insofar as JSON support is concerned.
If you are using PG9.5+, the solution is much simpler through a cast to jsonb:
INSERT INTO b (a_id, x, item)
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM a, jsonb_array_elements(json_array::jsonb) j(items);
The #- operator on the jsonb data type does all the dirty work here. Obviously, there is a lot of work going on behind the scenes, converting json to jsonb, so if you find that you need to manipulate your JSON objects more frequently then you are better off using the jsonb type to begin with. In your case I suggest you do some benchmarking with EXPLAIN ANALYZE SELECT ... (you can safely forget about the INSERT while testing) on perhaps 10,000 rows to see which works best for your setup.
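For instance, the benchmark of the jsonb variant is the same SELECT with the INSERT line left off (a sketch):
EXPLAIN ANALYZE
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM a, jsonb_array_elements(json_array::jsonb) j(items);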

SQL join two tables using value from one as column name for other

I'm a bit stumped on a query I need to write for work. I have the following two tables:
|===============Patterns==============|
|type | bucket_id | description |
|-----------------------|-------------|
|pattern a | 1 | Email |
|pattern b | 2 | Phone |
|==========Results============|
|id | buc_1 | buc_2 |
|-----------------------------|
|123 | pass | |
|124 | pass |fail |
In the results table, I can see that entity 124 failed a validation check in buc_2. Looking at the patterns table, I can see bucket 2 belongs to pattern b (bucket_id corresponds to the column name in the results table), so entity 124 failed phone validation. But how do I write a query that joins these two tables on the value of one of the columns? Limitations to how this query is going to be called will most likely prevent me from using any cursors.
Some crude solutions:
SELECT "id", "description" FROM
Results JOIN Patterns
ON "buc_1" = 'fail' AND "bucket_id" = 1
union all
SELECT "id", "description" FROM
Results JOIN Patterns
ON "buc_2" = 'fail' AND "bucket_id" = 2
Or, with very probably a better execution plan:
SELECT "id", "description" FROM
Results JOIN Patterns
ON "buc_1" = 'fail' AND "bucket_id" = 1
OR "buc_2" = 'fail' AND "bucket_id" = 2;
This will report all failure descriptions for each id having a fail case in bucket 1 or 2.
See http://sqlfiddle.com/#!4/a3eae/8 for a live example
That being said, the right solution would probably be to change your schema to something more manageable, say by using an association table to store each failed test (as sketched below) -- since what you in fact have here is a many-to-many relationship.
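For illustration, a sketch of such an association table (assuming Oracle; the names are hypothetical, not from the question):
-- one row per failed test
CREATE TABLE failed_checks (
    result_id NUMBER NOT NULL,
    bucket_id NUMBER NOT NULL,
    PRIMARY KEY (result_id, bucket_id)
);
-- failure descriptions then come from a plain join, no per-column logic
SELECT fc.result_id, p."description"
FROM failed_checks fc
JOIN Patterns p ON p."bucket_id" = fc.bucket_id;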
Another approach, if you are using Oracle ≥ 11g, would be to use the UNPIVOT operation. This translates columns to rows at query execution:
select * from Results
unpivot ("result" for "bucket_id" in ("buc_1" as 1, "buc_2" as 2))
join Patterns
using("bucket_id")
where "result" = 'fail';
Unfortunately, you still have to hard-code the various column names.
See http://sqlfiddle.com/#!4/a3eae/17
It looks to me that what you really want to know is the description (in your example, Phone) of a Patterns entry given the condition that the bucket failed. Regardless of the specific example you have, you want a solution that fulfills that condition, not just your particular example.
I agree with the comment above. Your bucket entries should be tuples (rows) and not columns, and the two tables should share ids so you can actually join them. For example, consider adding a bucket_id column and indexing it, then just add ONE result column to store the state. Like this:
|===============Patterns==============|
|type | bucket_id | description |
|-----------------------|-------------|
|pattern a | 1 | Email |
|pattern b | 2 | Phone |
|==========Results====================|
|entity_id | bucket_id |status |
|-------------------------------------|
|123 | 1 |pass |
|124 | 1 |pass |
|123 | 2 | |
|124 | 2 |fail |
1. Use an inner join (http://www.w3schools.com/sql/sql_join_inner.asp) and the WHERE clause to filter only those buckets that failed.
2. Would this example help?
SELECT Patterns.type, Patterns.description, Results.entity_id, Results.status
FROM Patterns
INNER JOIN Results
ON
Patterns.bucket_id=Results.bucket_id
WHERE
Results.status='fail'
Lastly, I would also add a primary key column to each table so that lookups on each unique combination are indexed and faster.
Thanks!

SQL Query converting to Rails Active Record Query Interface

I have been using SQL queries in my Rails code which need to be transitioned to Active Record queries. I haven't used Active Record before, so I went through http://guides.rubyonrails.org/active_record_querying.html to learn the syntax needed to switch to this method of getting the data. I am able to convert the simple queries into this format, but there are other, more complex queries like
SELECT b.owner,
Sum(a.idle_total),
Sum(a.idle_monthly_usage)
FROM market_place_idle_hosts_summaries a,
(SELECT DISTINCT owner,
hostclass,
week_number
FROM market_place_idle_hosts_details
WHERE week_number = '#{week_num}'
AND Year(updated_at) = '#{year_num}') b
WHERE a.hostclass = b.hostclass
AND a.week_number = b.week_number
AND Year(updated_at) = '#{year_num}'
GROUP BY b.owner
ORDER BY Sum(a.idle_monthly_usage) DESC
which I need in Active Record format, but because of the complexity I am stuck on how to proceed with the conversion.
The output of the query is something like this
+----------+-------------------+---------------------------+
| owner | sum(a.idle_total) | sum(a.idle_monthly_usage) |
+----------+-------------------+---------------------------+
| abc | 485 | 90387.13690185547 |
| xyz | 815 | 66242.01857376099 |
| qwe | 122 | 11730.609939575195 |
| asd | 80 | 9543.170425415039 |
| zxc | 87 | 8027.090087890625 |
| dfg | 67 | 7303.070011138916 |
| wqer | 76 | 5234.969814300537 |
Since your query is a bit complex, instead of converting it to Active Record you can use the find_by_sql method.
You can also use ActiveRecord::Base.connection directly to fetch the records, like this:
ActiveRecord::Base.connection.execute("your query")
You can create the subquery apart with ActiveRecord and convert it to SQL using to_sql.
Then use joins to join your table a with the b one, which is the subquery. Note also the use of the Active Record clauses select, where, group and order, which are basically what you need to build this complex SQL query in ActiveRecord.
Something similar to the following will work:
subquery = SubModel.select("DISTINCT ... ").where(" ... ").to_sql
Model.select("b.owner, ... ")
.joins("JOIN (#{subquery}) b ON a.hostclass = b.hostclass")
.where(" ... ")
.group("b.owner")
.order("Sum(a.idle_monthly_usage) DESC")

How to get numbers arranged right to left in sql server SELECT statements

When performing SELECT statements including number columns (prices, for example), the result is always left-aligned, which reduces readability. Therefore I am searching for a method to right-align the output of number columns.
I already tried to use something like
SELECT ... SPACE(15-LEN(A.Nummer))+A.Nummer ...
FROM Artikel AS A ...
which gives close results, but depending on the font not quite. An alternative would be to replace SPACE() with REPLICATE('_',...), but I don't really like the underscores in the output.
Besides that, this formula will break on numbers with more than 15 digits, therefore I searched for a way to find the maximum length of entries to make it safer, like
SELECT ... SPACE(MAX(LEN(A.Nummer))-LEN(A.Nummer))+A.Nummer ...
FROM Artikel AS A ...
but this does not work due to the aggregate character of the MAX-function.
So, what's the best way to achieve the right-justified order for the number-columns?
Thanks,
Rainer
To get your problem with the list box solved, have a look at this link: http://www.lebans.com/List_Combo.htm
I strongly believe that this type of adjustment should be made in the UI layer and not mixed in with data retrieval.
But to answer your original question, I have created a SQL Fiddle:
MS SQL Server 2008 Schema Setup:
CREATE TABLE dbo.some_numbers(n INT);
Create some example data:
INSERT INTO dbo.some_numbers
SELECT CHECKSUM(NEWID())
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))X(x);
The following query is using the OVER() clause to specify that the MAX() is to be applied over all rows. The > and < that the result is wrapped in are just for illustration purposes and not required for the solution.
Query 1:
SELECT '>'+
SPACE(MAX(LEN(CAST(n AS VARCHAR(MAX))))OVER()-LEN(CAST(n AS VARCHAR(MAX))))+
CAST(n AS VARCHAR(MAX))+
'<'
FROM dbo.some_numbers SN;
Results:
| COLUMN_0 |
|---------------|
| >-1486993739< |
| > 1620287540< |
| >-1451542215< |
| >-1257364471< |
| > -819471559< |
| >-1364318127< |
| >-1190313739< |
| > 1682890896< |
| >-1050938840< |
| > 484064148< |
This query does a straight cast to show the difference:
Query 2:
SELECT '>'+CAST(n AS VARCHAR(MAX))+'<'
FROM dbo.some_numbers SN;
Results:
| COLUMN_0 |
|---------------|
| >-1486993739< |
| >1620287540< |
| >-1451542215< |
| >-1257364471< |
| >-819471559< |
| >-1364318127< |
| >-1190313739< |
| >1682890896< |
| >-1050938840< |
| >484064148< |
With this query you still need to change the display font to a monospaced font like COURIER NEW. Otherwise, as you have noticed, the result is still misaligned.
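A side note beyond the answers above: T-SQL also has a built-in STR() function that returns the number as a right-justified string of a given width, which may be worth a look (a sketch; values too wide for the given length come back as asterisks):
SELECT '>' + STR(n, 15) + '<'
FROM dbo.some_numbers SN;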