Can SQL CASE return a different number of columns in WHEN clauses?

Is there a way to SELECT a different number of columns in different WHEN clauses of the same CASE statement?
For example:
SELECT
   CASE x
      WHEN is y THEN show me 1 column
      WHEN is z THEN show me 3 columns
   END
FROM i;

The restriction is that all branches of the CASE expression must resolve to the same data type. The manual:
The data types of all the result expressions must be convertible to a
single output type. See Section 10.5 for more details.
If all your output columns have a compatible data type, you could use an ARRAY to include a variable number of columns (resulting in the same array type). Like:
SELECT CASE x
          WHEN 1 THEN ARRAY[y]
          WHEN 2 THEN ARRAY[x, y, z]
          -- no ELSE defaults to NULL
       END AS my_result_array
FROM tbl;
If not, you could cast to a common element type of your choice (text would be the safe default):
SELECT CASE x
          WHEN 1 THEN ARRAY[x::text]
          WHEN 2 THEN ARRAY[x::text, y::text, z::text]
       END AS my_result_array
FROM tbl;
Or, to make it work for heterogeneous data types, you can use a composite type (row type) and pad with NULL values. Name, number and type of output columns are fixed and have to cover all possible result combinations.
CREATE TYPE foo AS (a int, b text, c date);

SELECT CASE x
          WHEN 1 THEN (x, NULL, NULL)::foo
          WHEN 2 THEN (x, y, z)::foo
       END AS my_result_row
FROM tbl;
Or you could use a document type like json, hstore or xml to contain a variable number of columns.
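A minimal sketch with jsonb (assuming Postgres 9.5+ for jsonb_build_object, and the same hypothetical columns as above):
SELECT CASE x
          WHEN 1 THEN jsonb_build_object('x', x)
          WHEN 2 THEN jsonb_build_object('x', x, 'y', y, 'z', z)
       END AS my_result_doc
FROM tbl;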
Note that you get one result column either way; the workarounds just contain a variable payload, which can be decomposed in the next step.
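For the composite-type variant, that next step could look like this (a sketch reusing the hypothetical type foo from above):
SELECT (my_result_row).*
FROM (
   SELECT CASE x
             WHEN 1 THEN (x, NULL, NULL)::foo
             WHEN 2 THEN (x, y, z)::foo
          END AS my_result_row
   FROM tbl
) sub;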

No, you can't, since the column selection is static. If you really want to choose columns dynamically based on some condition, consider building a dynamic query for that purpose.
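A minimal sketch of that idea in PL/pgSQL (function name and queries hypothetical); note that a SETOF record function needs a column definition list when called:
CREATE OR REPLACE FUNCTION pick_cols(cond int)
  RETURNS SETOF record
  LANGUAGE plpgsql AS
$$
BEGIN
   IF cond = 1 THEN
      RETURN QUERY EXECUTE 'SELECT y FROM tbl';
   ELSE
      RETURN QUERY EXECUTE 'SELECT x, y, z FROM tbl';
   END IF;
END
$$;

-- the caller supplies the matching column list:
-- SELECT * FROM pick_cols(1) AS t(y int);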

Store int, float and boolean in same database column

Is there a sane way of storing int, float and boolean values in the same column in Postgres?
I have something like this:
rid                                  |               time                | value
-------------------------------------+-----------------------------------+-------
2d9c5bdc-dfc5-4ce5-888f-59d06b5065d0 | 2021-01-01 00:00:10.000000 +00:00 | true
039264ad-af42-43a0-806b-294c878827fe | 2020-01-03 10:00:00.000000 +00:00 | 2
b3b1f808-d3c3-4b6a-8fe6-c9f5af61d517 | 2021-01-01 00:00:10.000000 +00:00 | 43.2
Currently I'm using jsonb to store it; the problem now, however, is that I can't filter the table with, for instance, the greater-than operator.
The query
SELECT *
FROM points
WHERE value > 0;
gives back the error:
ERROR: operator does not exist: jsonb > integer: No operator matches the given name and argument types. You might need to add explicit type casts.
For me it's okay to handle boolean as 1 or 0 in case of true or false. Is there any possibility to achieve that with jsonb, or is there maybe another supertype that lets a single column hold all three types?
Performance is not so much of a concern here, as I'm going to have very few records inside of that table, max 5k I guess.
If you were just storing integers and floats, normally you'd use a float or numeric column.
But there's that pesky true.
You could cast the JSON...
select *
from test
where value::float > 1;
...but there's that pesky true.
You have to convert the boolean to a number to make it work.
select *
from test
where (case when value = 'true' then 1.0
            when value = 'false' then 0.0
            else value::float end) >= 1;
Or ignore it.
This having to work around the type system suggests that value is actually two or even three different fields crammed into one. Consider separating them into multiple columns.
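A sketch of that separation (a hypothetical layout; num_nonnulls requires Postgres 9.6+):
CREATE TABLE points (
   rid        uuid PRIMARY KEY,
   "time"     timestamptz,
   bool_value boolean,
   num_value  numeric,
   -- exactly one payload column must be set per row
   CHECK (num_nonnulls(bool_value, num_value) = 1)
);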
You should skip the rows where value is not a number and cast the value to numeric, e.g.:
with points(id, value) as (
values
(1, 'true'::jsonb),
(2, '2'),
(3, '43.2')
)
select *
from points
where jsonb_typeof(value) = 'number'
and value::text::numeric > 0;
id | value
----+-------
2 | 2
3 | 43.2
(2 rows)
I actually found out that, regardless of the jsonb field's value, you can compare it to other jsonb values in Postgres. That means I can, for instance, do the following:
SELECT *
FROM points
WHERE value > '5';
This gives me back the third row. Note, though, that the documented btree ordering for jsonb is Object > Array > Boolean > Number > String > Null, so a boolean actually sorts above every number rather than being ignored; filtering on jsonb_typeof first, as in the previous answer, is safer. To filter for a certain bool I can use the following query:
SELECT *
FROM points
WHERE value = 'true';
This is good enough for me. I could even hold timestamps in the json column and compare them using this methodology.
Another way of solving the problem, after all your comments, seems to be to make the column numeric. This would work as well, but requires more client-side conversion, as I would have to keep a second type column remembering what the actual type is. That type would then be used on the client side to convert the value back into its original form. For integers it's trivial; for booleans, as @Schwern suggested, one can use 1 and 0; for dates, one could use the Unix timestamp representation.
When I then want to search for a certain value, the type has to be included in the WHERE clause as well.
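A sketch of that numeric-plus-type-tag layout (all names hypothetical; booleans stored as 1/0, timestamps as epoch seconds):
CREATE TABLE points (
   rid    uuid PRIMARY KEY,
   "time" timestamptz,
   value  numeric,
   type   text CHECK (type IN ('int', 'float', 'bool', 'timestamp'))
);

-- the type tag then has to appear in the filter as well:
SELECT * FROM points WHERE type = 'float' AND value > 5;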

Check subset using either string or array in Impala

I have a table like this
col
-----
A,B
The col could be a comma-separated string or an array; I have flexibility on the storage.
How do I check whether col is a subset of another string or array variable? For example:
B,A --> TRUE (order doesn't matter)
A,D,B --> TRUE (other item in between)
A,D,C --> FALSE (missing B)
I have flexibility on the type. The variable is something I cannot store in a table.
Please let me know if you have any suggestion for Impala only (no Hive).
Thanks
A not pretty method, but perhaps a starting point...
Assuming a table with a unique identifier column id and an array<string> column col, and a string variable with ',' as a separator (and no occurrences of escaped '\,')...
SELECT
   yourTable.id
FROM
   yourTable,
   yourTable.col
GROUP BY
   yourTable.id
HAVING
   COUNT(DISTINCT CASE WHEN find_in_set(col.item, '${VAR:yourString}') > 0 THEN col.item END)
   =
   LENGTH(regexp_replace('${VAR:yourString}', '[^,]', '')) + 1
Basically...
Expand the arrays in your table, to one row per array item.
Check if each item exists in your string.
Aggregate back up to count how many of the items were found in the string.
Check that the number of items found is the same as the number of items in the string.
The COUNT(DISTINCT <CASE>) copes with arrays like {'a', 'a', 'b', 'b'}.
Without expanding the string to an array or table (which I don't know how to do) you're dependent on the items in the string being unique. (Because I'm just counting commas in the string to find out how many items there are...)

SQL Server: How to select rows which contain a value comprising only one digit

I am trying to write a SQL query that only returns rows where a specific column (let's say the amount column) contains numbers made up of only one repeated digit, e.g. only 1s (1111111...) or only 2s (2222222...), etc.
In addition, the amount column contains numbers with decimal points as well, and these kinds of values should also be returned, e.g. 1111.11, 2222.22, etc.
If you want to make the query generic, so that you don't have to specify each possible digit, you could change the WHERE to the following:
WHERE LEN(REPLACE(REPLACE(amount, LEFT(amount, 1), ''), '.', '')) = 0
This will always use the first digit as the comparison for the rest of the string.
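Put together with the cast-to-string trick from the next answer (table name hypothetical), the whole generic query might look like this:
SELECT *
FROM (
    SELECT CAST(amount AS VARCHAR(30)) AS amount
    FROM TableName
) t
WHERE LEN(REPLACE(REPLACE(amount, LEFT(amount, 1), ''), '.', '')) = 0;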
If you are using SQL Server, then you can try this script:
SELECT *
FROM (
    SELECT CAST(amount AS VARCHAR(30)) AS amount
    FROM TableName
) t
WHERE LEN(REPLACE(REPLACE(amount, '1', ''), '.', '')) = 0 OR
      LEN(REPLACE(REPLACE(amount, '2', ''), '.', '')) = 0
I tried it like this (in place of 1111111, substitute the column name):
SELECT REPLACE(STR(1111111, 12, 2), '0', LEFT(1111111, 1))

from string to map object in Hive

My input is a string that can contain any characters from A to Z (no duplicates, so it may have a maximum of 26 characters).
For example:
set Input='ATK';
The characters within the string can appear in any order.
Now I want to create a map object out of this which will have fixed keys from A to Z. The value for a key is 1 if its corresponding character appears in the input string. So in the case of this example (ATK), the map object should look like:
{"A":1, "B":0, "C":0, ..., "K":1, ..., "T":1, ..., "Z":0}
So what is the best way to do this?
So the code should look like:-
set Input='ATK';
select <some logic>;
It should return a map object (Map<string,int>) with 26 key-value pairs in it. What is the best way to do that without creating any user-defined functions in Hive? I know the function str_to_map easily comes to mind, but it only works if key-value pairs exist in the source string, and it will only consider the key-value pairs actually present in the input.
Maybe not efficient but works:
select str_to_map(
         concat_ws('&', collect_list(
           concat_ws(':', a.dict, case when b.character is null then '0' else '1' end))),
         '&', ':')
from
(
  select explode(split("A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z", ',')) as dict
) a
left join
(
  select explode(split(${hiveconf:Input}, '')) as character
) b
on a.dict = b.character
The result:
{"A":"1","B":"0","C":"0","D":"0","E":"0","F":"0","G":"0","H":"0","I":"0","J":"0","K":"1","L":"0","M":"0","N":"0","O":"0","P":"0","Q":"0","R":"0","S":"0","T":"1","U":"0","V":"0","W":"0","X":"0","Y":"0","Z":"0"}

How to get malformed or string-type data from a numeric column in Hive?

I have a column id (data type integer) containing the following records:
1
2
NULL
x
y
As Hive automatically converts x and y into NULL, I'm first casting the id column to a string. Now I want count(id) where id is not made of [0-9] and is also not NULL. In my case the count should be 2, but it is not working with x and y; I am also getting the NULLs counted, giving 3 in my example.
I have tried using LIKE, RLIKE and also with regexp_extract(id,'\&q=([^\&]+).
Can someone suggest how to achieve this?
I tried something similar and it is working for me. I created an external table with your data:
CREATE EXTERNAL TABLE temp_count (count STRING) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' LOCATION 'user/$username/data'
Now I am running a query like this:
select count(*)
from (select (count - count) as value from temp_count where count != 'NULL') q1
where value is null;
and I am getting 2 as the output.
Let me know if I am missing something here
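A more direct alternative (a sketch assuming, as above, that the missing values arrive in the raw file as the literal string 'NULL'):
select count(*)
from temp_count
where count is not null
  and count != 'NULL'
  and not count rlike '^[0-9]+$';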