SQL select query for Impala column with map data type

For a table, say details, with this schema:
Column    Type
name      string
desc      map<int, string>
How do I form a select query (to be run by a Java program) that returns the result set in this structure?
name      desc
Bob       {1,"home"}
Alice     {2,"office"}
Keeping in mind Impala's limitations with regard to complex types (from the documentation):
The result set of an Impala query always contains all scalar types; the elements and fields within any complex type columns must be "unpacked" using join queries.
i.e. select * from details; would only return results without the map-typed (complex) column.
The closest I've come up with is select name, map_col.key, map_col.value from details, details.desc map_col;, but the result set is obviously not in the expected format.
Thanks in advance.
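One workaround (a rough sketch, not a definitive answer) is to keep the join-based unpacking from the attempt above and re-aggregate the entries into a single string per name with group_concat. The {key,"value"} formatting below only approximates the structure shown in the question, and the desc_entries alias is illustrative:
-- Unpack the map, then stitch the entries back into one row per name.
-- Note `desc` is a reserved word in Impala, hence the backticks.
SELECT d.name,
       group_concat(concat('{', CAST(m.key AS STRING), ',"', m.value, '"}'), ', ') AS desc_entries
FROM details d, d.`desc` m
GROUP BY d.name;
Since group_concat returns a string, the Java program would receive one concatenated desc column per name rather than a real map value.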

Related

How can you filter Snowflake EXPLAIN USING TABULAR syntax when it's embedded in the TABLE function? Can you filter it with anything?

I have a table named Posts that I would like to count and profile in Snowflake using the current Snowsight UI.
When I return the results via EXPLAIN USING TABULAR, I am able to return the set with the combination of the TABLE, RESULT_SCAN, and LAST_QUERY_ID functions, but any predicate, filter, or column reference seems to fail.
Is there a valid way to do this in Snowflake with the TABLE function, or is there another way to query the output of EXPLAIN USING TABULAR?
-- Works
EXPLAIN using TABULAR SELECT COUNT(*) from Posts;
-- Works
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t;
-- Does not work
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t where operation = 'GlobalStats';
-- invalid identifier 'OPERATION', the column does not seem recognized.
I tried the third example and expected the predicate to apply to the function output. I don't understand why the filter works on some TABLE() results and not others.
You need to double-quote the column name:
where "operation" = 'GlobalStats'
From the documentation:
Note that because the output column names from the DESC USER command were generated in lowercase, the commands use delimited identifier notation (double quotes) around the column names in the query to ensure that the column names in the query match the column names in the output that was scanned.
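Applied to the failing query above, that looks like:
-- Works once the lowercase column name is double quoted
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t where t."operation" = 'GlobalStats';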

PostgreSQL - How to cast dynamically?

I have a column that has the type of the dataset in text.
So I want to do something like this:
SELECT CAST ('100' AS %INTEGER%);
SELECT CAST (100 AS %TEXT%);
SELECT CAST ('100' AS (SELECT type FROM dataset_types WHERE id = 2));
Is that possible with PostgreSQL?
SQL is strongly typed and static. Postgres demands to know the number of columns and their data types at the time of the call. So you need dynamic SQL in one of the procedural language extensions for this. And then you still face the obstacle that functions (necessarily) have a fixed return type. Related:
Dynamically define returning row types based on a passed given table in plpgsql?
Function to return dynamic set of columns for given table
Or you go with a two-step flow. First concatenate the query string (with another SELECT query). Then execute the generated query string. Two round trips to the server.
SELECT 'SELECT 100::' || type FROM dataset_types WHERE id = 2; -- produces the query string
Execute the result. (And make sure you didn't open any vectors for SQL injection!)
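For completeness, a rough single-round-trip sketch in PL/pgSQL, assuming the dataset_types table from the question (EXECUTE without INTO discards the result; a real function would again need a fixed return type, which is exactly the obstacle described above):
DO $$
DECLARE
   _type text;
BEGIN
   SELECT type INTO _type FROM dataset_types WHERE id = 2;
   -- %L quotes the literal safely; %s splices the type name verbatim,
   -- so dataset_types.type must be trusted (not user input)
   EXECUTE format('SELECT CAST(%L AS %s)', '100', _type);
END
$$;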
About the short cast syntax:
Postgres data type cast
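For reference, the short form looks like this:
SELECT '100'::integer; -- shorthand for CAST('100' AS integer)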

Dynamic SQL queries as parameter

I need a report where the user has to choose 2 parameters. The first parameter contains the years (2017, 2016, ...), and the second one contains the process ID. Depending on the process the user chooses, a different SQL statement is used. The year parameter is part of the WHERE clause of the SQL contained in the second parameter.
So I have this report with 2 parameters (param_year, Indicador). The query parameter is defined using a table datasource, where the IDs column contains the SQL statements and the Values column contains the text the user must select.
Next, I set ${Indicador} as the SQL statement in the JDBC connection that I have made to the database. This reports an SQL error:
"Failed at query: ${Indicador}".
Any suggestions will be appreciated. Thanks in advance.
Another option is to create multiple datasources in your Master/sub report, then select the appropriate datasource using a PRD expression on the Master/sub Report -> Attributes -> query -> name attribute.
More detailed explanation:
Create a query (meaning a query as a PRD object, which uses the PRD datasource) for every SQL string you need, and move the SQL strings from the parameter table into Report Designer query definitions.
Replace the SQL strings in your parameter table with the names of the corresponding queries.
Use the value of your parameter (which should be equal to the PRD query name) as the value for the Master/sub Report -> Attributes -> query -> name attribute.
You need Pentaho Data Integration to do this kind of dynamic query.
If the table structure (output columns) for both queries is the same, you could put them together into one big SQL statement with UNION ALL and put in each query "WHERE ${Indicador} = ValueToRunThisQuery".
The optimizer should be smart enough to know the not-selected subquery is going to return zero rows and not even run it. You can supply a few null columns if one query has fewer columns, but the data types have to be the same for filled columns.
If the output table structure is different between the two queries they should be in different data sources, or even reports.
SELECT ID, BLA, BLA, BLA, ONLY_IN_A
FROM TABLE_A
WHERE ${Indicador} = 'S010'
UNION ALL
SELECT ID, BLA, BLA, BLA, NULL
FROM TABLE_B
WHERE ${Indicador} = 'S020'

SqlQuery and SqlFieldsQuery

It looks like SqlQuery only supports SQL that starts with select *? Doesn't it support other SQL that selects only some columns, like
select id, name from person, and map the columns to the corresponding POJO?
If I use SqlFieldsQuery to run SQL, the result is a QueryCursor of Lists (each List contains one record of the result). But if the SQL starts with select *, then the List's contents differ from those of a fields query like:
select id, name, age from person
For select *, each List is constructed from 3 parts:
the first element is the key of the cache
the second element is the POJO object that contains the data
the trailing elements are the values of each column
Why was it designed this way? If I don't know what SQL SqlFieldsQuery runs, then I need additional effort to figure out what the List contains.
SqlQuery returns key and value objects, while SqlFieldsQuery allows you to select specific fields. Which one to use depends on your use case.
Currently select * does include the predefined _key and _val fields; this will be improved in the future. However, it is generally good practice to list the fields you want to fetch when running SQL queries (this is true for any SQL database, not only Ignite). This way your code is protected from unexpected behavior if the schema changes, for example.

Typecheck SQL query

Is there any relational database that can output the return type of a query before running it? As an example, a query like GIVE_TYPES SELECT name, age FROM person would give a result like VARCHAR(255), INTEGER without actually executing the query. If this is not a possibility, why is that the case?
EDIT
The first comment made me realize that I need to give a slightly more complicated use case. Imagine if the query were something like this:
SELECT parent_name, COUNT(name) FROM person GROUP BY parent_name;
To select the names of all parents and the number of children they have. I would expect something like VARCHAR(255), INTEGER as the result for this as well, but a column inspection would not let me know about COUNT's return type.
COUNT's return type is always int (per the SQL Server documentation):
http://msdn.microsoft.com/en-us/library/ms175997.aspx
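Note that this is SQL Server's behavior; the type varies by engine. PostgreSQL's count(), for example, returns bigint, and there you can inspect an expression's type directly. A small sketch using the question's table:
-- pg_typeof resolves an expression's type; here it returns "bigint"
SELECT pg_typeof(COUNT(name)) FROM person;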
If you are running on top of persistent and esqueleto, then I don't think you're going to have simple access to this sort of information. In particular, I assume any unrecognized PostgreSQL types just get mapped to a Haskell String or Text.
About the best you can do is:
CREATE TEMP TABLE t1 AS
SELECT parent_name, COUNT(name) AS children FROM person GROUP BY parent_name
LIMIT 0; -- LIMIT 0 keeps the column types but copies no rows
SELECT * FROM information_schema.columns WHERE table_name = 't1' AND table_schema = ...;
DROP TABLE t1;
You'll want to identify the temporary schema-name for your table t1 (in the format pg_temp_xxx).
This (plus perhaps some follow-up queries on the information-schema for type details) should give you details on all columns of your result-set.