Access array elements with ANSI SQL

I'm using Drill to query MongoDB with ANSI SQL. I have a field that contains an array of values, and I want to be able to access those elements to join them with other documents, something like this:
select name from table where table.id = array.element;
But other than FLATTEN, which splits the array into multiple rows, I haven't found a way to access the array's elements.
Any help, please?

I added some sample data in MongoDB:
db.col.insert({"id":1,name:"dev","arr":[1,2,3,4]});
Working query from Drill:
select name from col where id = arr[0];
Output:
+-------+
| name  |
+-------+
| dev   |
+-------+
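The index syntax above covers a known position. For joining on any element, one approach (a sketch, not from the original answer) is to FLATTEN the array in a subquery and join on the flattened value; the second collection orders and its id field are hypothetical here:

select t.name, o.id
from (select name, flatten(arr) as elem from col) t
join orders o on o.id = t.elem;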

Related

Need Column data to be the ROW header for my query

I am trying to use a LATERAL JOIN on a particular data set, but I cannot seem to get the syntax for the query correct.
What I am trying to achieve:
Take the first column in the dataset (see picture) and use its values as the table's column headers, populating the rows with the data from the StringValue column. A sketch of one way to do this follows the examples below.
Currently it appears like this:
cfname              | stringvalue
--------------------+------------------
customerrequesttype | newformsubmission
Assignmentgroup     | ITDEPT
and I would like it to appear like this:
customerrequesttype | Assignmentgroup
--------------------+----------------
newformsubmission   | ITDEPT
As mentioned, I am very new to SQL and know only limited basics.
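No answer was recorded for this question. As a sketch: when the cfname values are known up front, the pivot can be done with conditional aggregation rather than a LATERAL JOIN. The table name customfields here is hypothetical:

SELECT
  MAX(CASE WHEN cfname = 'customerrequesttype' THEN stringvalue END) AS customerrequesttype,
  MAX(CASE WHEN cfname = 'Assignmentgroup'     THEN stringvalue END) AS Assignmentgroup
FROM customfields;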

How to fix: cannot retrieve all fields from a MongoDB collection with an Apache Drill SQL expression query

I'm trying to retrieve all (*) columns from a MongoDB object with an Apache Drill SQL expression:
`_id`.`$oid`
Background: I'm using Apache Drill to query MongoDB collections. By default, Drill retrieves ObjectId values in a different format than the one stored in the database. For example:
Mongo: ObjectId("59f2c3eba83a576fe07c735c")
Drill query result: [B@3149a…]
In order to get the data in String format (59f2c3eba83a576fe07c735c) from the object, I changed the Drill option "store.mongo.bson.record.reader" to "false":
ALTER SESSION SET `store.mongo.bson.record.reader` = false;
Drill query result after the option is set to false:
select * from calc;
+-------------------------------------+--------+
| _id                                 | name   |
+-------------------------------------+--------+
| {"$oid":"5cb0e161f0849231dfe16d99"} | thiago |
+-------------------------------------+--------+
Running a query by _id:
select `o`.`_id`.`$oid` , `o`.`name` from mongo.od_teste.calc o where `o`.`_id`.`$oid`='5cb0e161f0849231dfe16d99';
Result:
+--------------------------+--------+
| EXPR$0                   | name   |
+--------------------------+--------+
| 5cb0e161f0849231dfe16d99 | thiago |
+--------------------------+--------+
For an object with a few columns like the one above (_id, name), it's fine to specify all the columns in the select query. However, in my production database, the objects have hundreds of columns.
If I try to query all (*) columns from the collection, this is the result:
select `o`.* from mongo.od_teste.calc o where `o`.`_id`.`$oid`='5cb0e161f0849231dfe16d99';
or
select * from mongo.od_teste.calc o where `o`.`_id`.`$oid`='5cb0e161f0849231dfe16d99';
+-----+
| ** |
+-----+
+-----+
No rows selected (6.112 seconds)
Expected result: retrieve all columns from the MongoDB collection without having to declare each of them in the SQL query.
I have no suggestions here, because it is a bug in the MongoDB storage plugin.
I have created a Jira ticket for it; please take a look and feel free to add any related info there: DRILL-7176

Querying an array of text in Postgres

I have an array-typed column I want to store in Postgres. One of the major use cases I have is to check whether any record's array contains a given string.
eg.
| A | ["NY", "Paris", "Milan"] |
| B | ["Paris", "NY"]          |
| C | []                       |
| D | ["Milan"]                |
Does there exist a row with Paris in the array? Which rows have Milan in the array? and so on.
I have two options for storing the column: I can either make it of type text[], or convert it into JSON such as {"cities": ["NY", "Paris", "Milan"]} and store it as a JSONB field.
However, I am not sure which would allow the fastest querying for my use case. Is one way obviously better than the other? Am I tying myself down in any way by choosing one over the other? And once I choose, how do I query the DB?
As you seem to be storing simple lists of values, I would recommend using the Array datatype over JSON, which better fits more complex cases (nested data structures, associative arrays, ...).
To check for a value at any position in the array, you can use the ANY() construct.
Here is a query that will return all records where the array stored in column cities contains 'Paris':
SELECT t.* FROM mytable t WHERE 'Paris' = ANY(t.cities);
This yields:
id   cities
---------------------------
A    ["NY","Paris","Milan"]
B    ["Paris","NY"]
Demo on DB Fiddle
For more information:
Postgres Arrays Documentation
Postgres Arrays Tutorial
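One point not covered above: if these lookups need to be fast on a large table, text[] supports GIN indexing, and the containment operator @> can use such an index, while = ANY() on a column generally cannot. A sketch, reusing the mytable/cities names from the answer:

CREATE INDEX mytable_cities_gin ON mytable USING GIN (cities);

-- Equivalent containment check that can use the GIN index
SELECT t.* FROM mytable t WHERE t.cities @> ARRAY['Paris'];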
I've noticed it is better to use JSONB if it is a simple key-value store, as in, for instance, when you want to store arbitrary info on a row and you're not sure what the columns (keys) would be:
info = {"a":"apple", "b":"ball"}
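For completeness, a sketch of the containment check if the JSONB route were chosen; the table mytable and its jsonb column data are hypothetical:

-- True when the "cities" array inside data contains 'Paris';
-- jsonb @> can also be served by a GIN index on the data column
SELECT * FROM mytable WHERE data @> '{"cities": ["Paris"]}'::jsonb;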
For use cases like yours, it would be better if you could design the DB with simple tables, so you could use JOINs and indexes to your advantage.
You could restructure the tables like this:
Location
id | name
---+-------
1  | Paris
2  | NY
3  | Milan
Other table (with a foreign key to the Location table)
user | location_id
-----+-------------
A    | 1
A    | 3
B    | 2
Using this set of tables, it would be easy to query all users with location Paris using JOINs.
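A sketch of that join, assuming the second table is named user_location:

SELECT ul."user"            -- quoted because USER is a reserved word in Postgres
FROM user_location ul
JOIN location l ON l.id = ul.location_id
WHERE l.name = 'Paris';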

In Postgres: Select columns from a set of arrays of columns and check a condition on all of them

I have a table like this:
+----+----+----+----+----+----+----+----+----+
| id | x1 | x2 | x3 | x4 | y1 | y2 | y3 | y4 |
+----+----+----+----+----+----+----+----+----+
| a  | 1  | 0  | 0  | 1  | 1  | 0  | 0  | 0  |
| b  | 1  | 1  | 0  | 1  | 0  | 0  | 1  | 0  |
| c  | 1  | 1  | 1  | 1  | 0  | 1  | 1  | 1  |
+----+----+----+----+----+----+----+----+----+
I want to perform a count over different sets of columns (all subsets that contain at least one element from X and one element from Y). How can I do that in Postgres?
For example, I may have {x1,x2,y3}, {x4,y1,y2,y3}, etc. I want to count the number of "id"s having 1 in every column of a set. So for the first set:
SELECT COUNT(id) FROM tbl WHERE x1 = 1 AND x2 = 1 AND y3 = 1;
and for the second set the same:
SELECT COUNT(id) FROM tbl WHERE x4 = 1 AND y1 = 1 AND y2 = 1 AND y3 = 1;
Is it possible to write a loop that goes over all these sets and queries the table accordingly? The array will have more than 10,000 sets, so it cannot be done manually.
You should be able to convert the table columns to an array using ARRAY[col1, col2, ...], then use the array_positions function, setting the second parameter to the value you're checking for. So, given your example above, this query:
SELECT id, array_positions(array[x1,x2,x3,x4,y1,y2,y3,y4], 1)
FROM tbl
ORDER BY id;
will yield this result:
+----+-----------------+
| id | array_positions |
+----+-----------------+
| a  | {1,4,5}         |
| b  | {1,2,4,7}       |
| c  | {1,2,3,4,6,7,8} |
+----+-----------------+
Here's a SQL Fiddle.
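The answer stops at computing positions; for the actual per-set counts, one sketch (not part of the original answer) is to load the 10,000+ sets into their own table and use the array containment operator @>. The table sets(set_id, members) is hypothetical, with members holding column positions, e.g. {1,2,7} for {x1,x2,y3}:

WITH pos AS (
  SELECT id, array_positions(ARRAY[x1,x2,x3,x4,y1,y2,y3,y4], 1) AS p
  FROM tbl
)
SELECT s.set_id, COUNT(pos.id) AS matching_ids
FROM sets s
LEFT JOIN pos ON pos.p @> s.members  -- match when the row's 1-positions cover the whole set
GROUP BY s.set_id;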

SSRS report filter with no duplicates used in query

I am having an issue and I'm not sure how to solve it.
I have an SSRS report that pulls from a table. I want a parameter filter to show de-duplicated values based on available options in one of the columns.
So my dataset has a query like:
SELECT * FROM table1 WITH (NOLOCK) WHERE col1 IN (@param)
Then I want a parameter called param that gets its available and default values from col1 in the above dataset, and I want those values de-duplicated.
From reading online, I learned I have to create a dummy parameter and use custom VB code to de-duplicate the list.
So I have these params:
param_dummy, which gets its available and default values from col1 in the above dataset
param, which gets a de-duplicated list from param_dummy using Code.RemoveDuplicates
But I'm having an issue with circular logic: param gets its values from param_dummy, which gets its values from the dataset/query, which in turn uses param.
How can I solve this?
One thought is to remove the WHERE col1 IN (@param) and instead use a filter on the tablix in the SSRS report. This works, but I am wondering how efficient it is.
And/or if anyone has any other suggestions, I am all ears.
Updated to add more details...
So let us say I have a table in my DB like so:
| id | col1 | col2 |
|----|------|--------|
| 1 | a | hello |
| 2 | b | how |
| 3 | a | are |
| 4 | c | you |
| 5 | d | on |
| 6 | a | this |
| 7 | b | lovely |
| 8 | c | day |
What I want is:
a tablix showing all the fields from the table
a filter where the user can select among the available values in col1 (de-duplicated)
a text filter, allowing nulls, where a user can filter on col2
default values for the parameters so the table loads on page load
So I have a dataset with a query like so:
SELECT
*
FROM dbo.table1
WHERE col1 IN (@col1options) AND (@col2value IS NULL OR col2 = @col2value)
Then for col1options I would set the available and default values to "Get values from a query", using the above dataset and col1.
But this won't work, since the query/dataset depends on col1options, which gets its default values from the same query/dataset.
I can use a second dataset, but that means making multiple calls to SQL Server, and I want to avoid that.
I'm not sure I understand your issue, so this is a guess...
If you mean you want to filter your data by choosing one or more entries from a specific column in the table, but the column has duplicates and you want your parameter list to not show duplicates, then this is what to do:
Create a new report
Add dataset dsMain as SELECT * FROM myTable WHERE myColumn IN (@myParam)
Add dataset dsParamValues as SELECT DISTINCT myColumn FROM myTable ORDER BY myColumn
Edit the @myParam parameter properties and set the available and default values to a query, then choose dsParamValues
Add your table/matrix control and set its dataset property to dsMain
Found an easier solution.
Follow this link to build the "dummy" hidden parameter, the visible parameter, and the de-dupe VB code.
Add a tablix properties filter where param is in the visible/non-hidden parameter from the VB code above (FYI: double-click to add the parameter).
Adding it via double-click will append a (0) at the end; remove the (0).
It should work as expected at that point! You should be able to select one, some, or all parameter values, and your report should update accordingly.