How to write a view in CouchDB to represent "not in" or "group by having count(*) = 1"?

Using a relational DB as an example: given the two tables below, rows in tableA and tableB with the same Id represent the same "thing" in different "states". So the thing with ID=1 has gone through stages 1 and 2, while the thing with ID=2 has only gone through stage 1.
tableA (Id, columnA, columnB)
1, "a", "b"
2, "x", "y"
3, "z", "a"
tableB (Id, columnA, columnB)
1, "e", "f"
I want to find all the rows from tableA that don't exist in tableB.
select a.*
from tableA a
left join tableB b
    on a.Id = b.Id
where b.Id is null
The SQL above returns rows 2 and 3 from tableA.
How can I do similar things in CouchDB? Say I have 4 docs that look like below.
{ "_id":"a-1", "type":"A", "correlation_id": "1" }
{ "_id":"a-2", "type":"A", "correlation_id": "2" }
{ "_id":"a-3", "type":"A", "correlation_id": "3" }
{ "_id":"b-1", "type":"B", "correlation_id": "1" }
How can I create a view that shows only docs a-2 and a-3? I don't want to filter on the client; I just want to show all docs whose correlation_id never got a type B document. I can roughly build a GROUP BY with COUNT(*) equivalent using a view, but I can't express the equivalent of GROUP BY ... HAVING COUNT(*) = 1.
I'm using CouchDB 3.0.

You could write a view:
function (doc) {
    emit(doc.type, 1);
}
and then query the view using the key "A" and include_docs=true if you wanted the whole content of those documents.
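For example (the design-document and view names here are placeholders):
GET /mydb/_design/app/_view/by_type?key="A"&include_docs=true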
If you want not just "A" but "everything except B", you can query the key range from the start up to "B" and then from just after "B" to the end, and get all the documents that way.
Depending on your setup it might be easier to query the view with group_level=1 so you get all the keys and then loop through them excluding B to get the rest of the info you're interested in.
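To get an actual GROUP BY ... HAVING COUNT(*) = 1 equivalent, here is a sketch of a second view keyed on correlation_id (design-doc and view names are again placeholders):
// map: key each document by its correlation_id
function (doc) {
    emit(doc.correlation_id, null);
}
// reduce: the built-in _count
Querying it grouped, e.g. GET /mydb/_design/app/_view/by_correlation?group=true, returns one row per correlation_id with its document count, such as {"key":"1","value":2} and {"key":"2","value":1}. Keys with a value of 1 are the things that never got a type B doc; the HAVING-style filter on value == 1 still happens client-side, after which a reduce=false&keys=[...]&include_docs=true request fetches those documents.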

Related

How to groupby in subquery SQL

I have the following query:
SELECT "IMPORTACIONCOLUMN3", "role", COUNT("IMPORTACIONCOLUMN1")
FROM MYTABLE
WHERE ID = 9
GROUP BY "IMPORTACIONCOLUMN3", "role"
This gives me the following result:
I would like to achieve the following:
The "unique" values are grouped together (i.e. Instead of having 4 values of "Robot 1" these are grouped togehter in just 1 cell summing the count values.
The second group by or subquery has to be the same count, but with role instead of "IMPORTACIONCOLUM3"
Is it possible (for the second picture) to "link" the values either by index or adding an extra column to reference them (i.e. There's two "Solicitante" with a count value of "52" but it refers to "Robot 1" and other to "Solicitante" with count value of "58" links to "Robot 2"
The second image represent visually what I'm trying to explain.
I have been trying on my own but only have reached the following:
select "IMPORTACIONCOLUMN3", count("IMPORTACIONCOLUMN1")
from
(
select "IMPORTACIONCOLUMN1", count("role"), "IMPORTACIONCOLUMN3"
from MYTABLE
WHERE ID = 9
group by "IMPORTACIONCOLUMN1", "IMPORTACIONCOLUMN3"
) as tmp
group by "IMPORTACIONCOLUMN3"
But it is not yet the result I am looking for.
Thanks in advance for your help and tips!
EDIT:
Explaining my desired output in detail
Each one of "Robot 1, 2, 3" have roles such as "Solicitante", "Gerente", etc. with different values.
i.e. The first row "Humano" value "243" is the sum of "Agente de Compras - 95", "Gerente Financiero - 37", "Gerente Solicitante - 45", "Proovedor - 31", "Solicitante - 60".
I am linking these by the column "GRAFICOCOLUMNARECURSIVOID" with contains the index of whatever "Robot" these "roles" are from.
I would like to achieve a query containing subquerys that allows me to have this output.
Try this for question number 1:
SELECT "IMPORTACIONCOLUMN3", COUNT("IMPORTACIONCOLUMN1")
FROM MYTABLE
WHERE ID = 9
GROUP BY "IMPORTACIONCOLUMN3"
The problem is the "role" column: "Robot 1" has 4 roles.
and this for question 2:
SELECT "role", COUNT("IMPORTACIONCOLUMN1")
FROM MYTABLE
WHERE ID = 9
GROUP BY "role"
For question 3, I don't understand what you are asking for. Please provide an example.

How to extract a json value inside an array inside a json in SQL

I have a JSON being returned in my query called MetaDataJSON, inside which is an array of JSONs called Results. Inside each JSON in Results are two values, Chronic and Probability. There are a couple other tables that have been joined too. Is there a way to get Chronic in a column by itself? Right now I have gotten this far (table and variable names have been made generic):
SELECT DISTINCT
    JSON_QUERY(mdj.value, '$.Results[0]') [Results]
FROM table1 t1
JOIN table2 t2 ON t2.parameter1 = t1.parameter1
    AND t2.parameter2 = 'ASDF'
JOIN table3 t3 ON t3.parameter3 = t2.parameter3
    AND t3.parameter4 = 11
CROSS APPLY OPENJSON(t3.MetaDataJSON) AS mdj
This gets me a column called Results where each entry looks like:
{"Chronic": 0, "Probability": 0.0016}
Is there an efficient way to get Chronic in a column by itself? Thanks!
You can do it like this:
WITH jsons (x)
AS
(
    -- replace this part with your query
    SELECT a.x
    FROM (SELECT '{"Chronic": 0, "Probability": 0.0016}' x) AS a,
         (SELECT 1 AS y UNION SELECT 2 AS y) AS b
)
SELECT
    (SELECT value FROM OPENJSON(x, '$') WHERE [key] = 'Chronic') AS "Chronic",
    (SELECT value FROM OPENJSON(x, '$') WHERE [key] = 'Probability') AS "Probability"
FROM jsons;
I think you can swap the body of the jsons CTE for your query, but I cannot try it since I don't have your tables...
BTW, I assume you are using MSSQL...
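Assuming MetaDataJSON has the shape {"Results": [{"Chronic": ..., "Probability": ...}, ...]}, a more direct sketch for the original query (SQL Server 2016+; table names follow the genericized query above) iterates Results with OPENJSON and pulls the scalars out with JSON_VALUE:
-- one output row per element of Results, with Chronic in its own column
SELECT JSON_VALUE(r.value, '$.Chronic')     AS Chronic,
       JSON_VALUE(r.value, '$.Probability') AS Probability
FROM table3 t3
CROSS APPLY OPENJSON(t3.MetaDataJSON, '$.Results') AS r;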

How to find if A (1d array) is part of B (2d array)?

I can honestly say I'm way out of my depth here.
The scope: I have two databases, each containing tables. I would like to compare the tables to see whether they match.
My approach is:
get data from db1 table1
get data from db2 table1
for each row in db2 table 1 check if that exact row is present in db1 table1
Example:
db1 table1
id, column1, column2
1, aa, bb
2, cc, dd
3, ee, ff
db2 table1
id, column1, column2
1, aa, bb
3, ee, aa
Row with id=1 will generate no output since it is exactly the same in db1 table1 as in db2 table1.
Row with id=2 will generate no output since it is missing completely from db2 table1.
Row with id=3 will generate error since value in column2 is different between the two db’s.
I have managed to extract data from the two db’s and tables and I now have them in two different variables (2d arrays?). Now comes the problem.
When I write the content to host it looks like this for db1 table1:
#{id=1; column1=aa; column2=bb} #{id=2; column1=cc; column2=dd} #{id=3; column1=ee; column2=ff}
… and like this for db2 table1:
#{id=1; column1=aa; column2=bb} #{id=3; column1=ee; column2=aa}
My idea was to do something like this:
foreach ($row in $db2table1) {
    if ($db1table1.Contains($row)) {
        # all is good
    }
    else {
        # error
    }
}
But it doesn't work. Ideas?
Compare-Object makes this task easier. You can perform a check like so:
$check = Compare-Object -ref $db1table1 -dif $db2table1
if ($check) {
    "Error"
} else {
    "All Good"
}
By default, Compare-Object outputs only the differences, indicating which side each difference came from (the SideIndicator property). If no difference is found, nothing is output. You can change the default behavior with the -IncludeEqual and -ExcludeDifferent parameters.
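Here is a sketch of a per-row report matching the rules in the question, assuming each row is an object with id/column1/column2 properties (e.g. from Invoke-Sqlcmd); comparing on -Property avoids Compare-Object falling back to object identity:
$diff = Compare-Object -ReferenceObject $db1table1 -DifferenceObject $db2table1 `
    -Property id, column1, column2

foreach ($row in $diff) {
    # '=>' marks rows present only in db2 table1: a new id, or an id whose
    # column values differ from db1 table1 (like id=3 in the example)
    if ($row.SideIndicator -eq '=>') {
        Write-Output "Error: row id=$($row.id) differs from db1 table1"
    }
    # '<=' rows exist only in db1 table1 (like id=2); per the spec, no output
}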

How to query all entries with a value in a nested bigquery table

I generated a BigQuery table using an existing BigTable table, and the result is a multi-nested dataset that I'm struggling to query from. Here's the format of an entry from that BigQuery table just doing a simple select * from my_table limit 1:
[
{
"rowkey": "XA_1234_0",
"info": {
"column": [],
"somename": {
"cell": [
{
"timestamp": "1514357827.321",
"value": "1234"
}
]
},
...
}
},
...
]
What I need is to be able to get all entries from my_table where the value of somename is X, for instance. There will be multiple rowkeys where the value of somename will be X and I need all the data from each of those rowkey entries.
OR
If I could have a query where rowkey contains X, to get "XA_1234_0", "XA_1234_1"... The "XA" and the "0" can change, but the middle number stays the same. I've tried where rowkey like "$_1234_$", but the query runs for over a minute, which is way too long for some reason.
I am using standard SQL.
EDIT: Here's an example of a query I tried that didn't work (with error: Cannot access field value on a value with type ARRAY<STRUCT<timestamp TIMESTAMP, value STRING>>), but best describes what I'm trying to achieve:
SELECT * FROM `my_dataset.mytable` where info.field_name.cell.value=12345
I want to get all records whose value in field_name equals some value.
From the sample Firebase Analytics dataset:
#standardSQL
SELECT *
FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
WHERE EXISTS(
SELECT * FROM UNNEST(user_dim.user_properties)
WHERE key='powers' AND value.value.string_value='20'
)
LIMIT 1000
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.*
FROM `my_dataset.mytable` t,
UNNEST(info.somename.cell) c
WHERE c.value = '1234'
The above assumes the specific value can appear in each record just once - hope this is true for your case.
If that is not the case, the query below should handle it:
#standardSQL
SELECT *
FROM `yourproject.yourdataset.yourtable`
WHERE EXISTS(
SELECT *
FROM UNNEST(info.somename.cell)
WHERE value = '1234'
)
which, I just realised, is pretty much the same as Felipe's version - just using your table / schema
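For the second half of the question (matching rowkeys by their middle number), note that "_" is itself a single-character wildcard in LIKE, so a pattern like "$_1234_$" won't do what it looks like. A sketch using a regular expression instead, assuming keys shaped like "XA_1234_0":
#standardSQL
SELECT *
FROM `my_dataset.mytable`
WHERE REGEXP_CONTAINS(rowkey, r'^[A-Z]+_1234_[0-9]+$')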

snowflake json lateral subquery

I have the following in snowflake:
create or replace table json_tmp as
select column1 as id, parse_json(column2) as c
from VALUES (1,
    '{"id": "0x1",
      "custom_vars": [
        { "key": "a", "value": "foo" },
        { "key": "b", "value": "bar" }
      ] }') v;
Based on the FLATTEN docs, I hoped to turn these into a table looking like this:
+-------+---------+-----+-----+
| db_id | json_id | a   | b   |
+-------+---------+-----+-----+
| 1     | 0x1     | foo | bar |
+-------+---------+-----+-----+
Here is the query I tried; it resulted in a SQL compilation error: "Object 'CUSTOM_VARS' does not exist."
select json_tmp.id as dbid,
    f.value:id as json_id,
    a.v,
    b.v
from json_tmp,
    lateral flatten(input => json_tmp.c) as f,
    lateral flatten(input => f.value:custom_vars) as custom_vars,
    lateral (select value:value as v from custom_vars where value:key = 'a') as a,
    lateral (select value:value as v from custom_vars where value:key = 'b') as b;
What exactly is the error here? Is there a better way to do this transformation?
Note - your solution doesn't actually perform any joins - flatten is a "streaming" operation, it "explodes" the input, and then selects the rows it wants. If you only have 2 attributes in the data, it should be reasonably fast. However, if not, it can lead to an unnecessary data explosion (e.g. if you have 1000s of attributes).
The fastest solution depends on how your data is structured exactly, and what you can assume about the input. For example, if you know that 'a' and 'b' are always in that order, you can obviously use
select
id as db_id,
c:id,
c:custom_vars[0].value,
c:custom_vars[1].value
from json_tmp;
If you know that custom_vars always has 2 elements but the order is not known, you could do e.g.
select
id as db_id,
c:id,
iff(c:custom_vars[0].key = 'a', c:custom_vars[0].value, c:custom_vars[1].value),
iff(c:custom_vars[0].key = 'b', c:custom_vars[0].value, c:custom_vars[1].value)
from json_tmp;
If the size of custom_vars is unknown, you could create a JavaScript function like extract_key(custom_vars, key) that would iterate over custom_vars and return value for the found key (or e.g. null or <empty_string> if not found).
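A sketch of such a UDF (the name and null handling are illustrative; note that Snowflake uppercases JavaScript argument names):
create or replace function extract_key(custom_vars variant, k varchar)
returns varchar
language javascript
as $$
  // iterate over the array of {key, value} objects, return the first match
  for (var i = 0; i < CUSTOM_VARS.length; i++) {
    if (CUSTOM_VARS[i].key === K) {
      return CUSTOM_VARS[i].value;
    }
  }
  return null;
$$;
-- usage: select extract_key(c:custom_vars, 'a') as a from json_tmp;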
Hope this helps. If not, please provide more details about your problem (data, etc).
Update Nov 2019
There seems to be a function that does this sort of thing:
select json_tmp.id as dbid,
    json_tmp.c:id as json_id,
    object_agg(custom_vars.value:key, custom_vars.value:value):a as a,
    object_agg(custom_vars.value:key, custom_vars.value:value):b as b
from json_tmp,
    lateral flatten(input => json_tmp.c, path => 'custom_vars') custom_vars
group by json_tmp.id, json_tmp.c:id
Original answer Sept 2017
The following query seems to work:
select json_tmp.id as dbid,
    json_tmp.c:id as json_id,
    a.value:value a,
    b.value:value b
from json_tmp,
    lateral flatten(input => json_tmp.c, path => 'custom_vars') a,
    lateral flatten(input => json_tmp.c, path => 'custom_vars') b
where a.value:key = 'a' and b.value:key = 'b';
I'd rather filter in a subquery than on the join, so I'm still interested in seeing other answers.
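One such alternative, as a sketch: flatten once and pivot with conditional aggregation, so the filtering happens inside the aggregates rather than in the join (the ::string casts are illustrative):
select
    json_tmp.id as db_id,
    json_tmp.c:id::string as json_id,
    max(iff(cv.value:key = 'a', cv.value:value::string, null)) as a,
    max(iff(cv.value:key = 'b', cv.value:value::string, null)) as b
from json_tmp,
    lateral flatten(input => json_tmp.c, path => 'custom_vars') cv
group by 1, 2;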