Hive table with dynamic number of columns - sql

TestTable
inputsCOLUMN
3-300-150-150-R
3-200-100-100-A
5-500-00-500-A
output
3_open   3_spent   3_closing   3_type   5_open   5_spent   5_closing   5_type
-------- --------- ----------- -------- -------- --------- ----------- --------
300      150       150         R        500      00        500         A
200      100       100         A
Above is the input table, called TestTable. It has two columns that contain rows of data (strings).
There is also a desired output table whose column names are derived from the input strings.
Each column name is the first number in the string plus a fixed suffix, like CONCAT(split(inputsCOLUMN,'\\-')[0],'-','type').
The output above is what I want. The query below does not work as desired, I think because building an alias with CONCAT is not allowed. Is there a way to get the desired output?
SELECT split(inputsCOLUMN,'\\-')[1] as CONCAT(split(inputsCOLUMN,'\\-')[0],'-','open'),
split(inputsCOLUMN,'\\-')[2] as CONCAT(split(inputsCOLUMN,'\\-')[0],'-','spent'),
split(inputsCOLUMN,'\\-')[3] as CONCAT(split(inputsCOLUMN,'\\-')[0],'-','closing'),
split(inputsCOLUMN,'\\-')[4] as CONCAT(split(inputsCOLUMN,'\\-')[0],'-','type')

Hive cannot have a dynamic number of columns, and it cannot have dynamic column names. It must be able to determine the entire schema (column count, types, and names) at query planning time, without looking at any data.
It's also not clear to me how exactly you're matching up input records into a single row. For example, how do you know which "3" record corresponds to which "5" record?
If you knew that, for example, there would always be a "3" record and a "5" record and you could commit to those being the only column names, and if you had a consistent way of matching up records to "flatten" this data, then it is possible, but difficult. I've done almost this exact operation before, and it involved a custom UDTF and a custom UDAF, and some code to auto-generate the actual query, which ended up being hundreds of lines long in some cases. I would re-evaluate why you want to do this in the first place and see if you can come up with another approach.
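To illustrate the "commit to fixed column names and auto-generate the query" idea, here is a minimal sketch in Python. It is not the code I used, and it has not been run against a Hive cluster. It assumes the only prefixes are "3" and "5", that backtick-quoted identifiers are enabled in Hive, and the helper name build_query is made up for this example. It does not solve the record-matching problem: each input row still yields one output row, with NULLs in the columns of the other prefix.

```python
# Hypothetical generator for a Hive query with a FIXED, known set of prefixes.
KNOWN_PREFIXES = ["3", "5"]                    # assumption: the only leading numbers
FIELDS = ["open", "spent", "closing", "type"]  # split positions 1..4


def build_query(table="TestTable", column="inputsCOLUMN"):
    # Same split expression as in the question: split(inputsCOLUMN,'\\-')
    parts = r"split({col}, '\\-')".format(col=column)
    select_items = []
    for prefix in KNOWN_PREFIXES:
        for position, field in enumerate(FIELDS, start=1):
            # The alias is a literal here, so the schema is known at planning time.
            select_items.append(
                f"CASE WHEN {parts}[0] = '{prefix}' "
                f"THEN {parts}[{position}] END AS `{prefix}_{field}`"
            )
    return "SELECT\n  " + ",\n  ".join(select_items) + f"\nFROM {table}"


query = build_query()
print(query)
```

The point is only that the column names are fixed when the query text is generated, never computed from the data.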

Related

What is the best way to represent similar inputs for rules in a relational database?

I am designing a rules database, which takes a number of inputs and determines the appropriate output.
There is a set of ~20 input types that are very similar (can be represented with Id, Value) which all drive a set of ~20 output types. Each input type has a number of possible values. Each output type has a number of possible values as well.
An example rule would be InputTypeA + InputTypeB determines OutputTypeA or InputTypeA determines OutputTypeB. n input types can drive an output, but currently the maximum number is 3.
What is the best way to store the input values? I am considering either:
Storing in one table, with an InputType table FK'd to determine the type of the input value (they will all be strings from a db type perspective).
Storing each input type in a separate table.
How can I represent the relationships? I could have a separate table for each relationship, but this system should be as flexible as possible. I'm concerned that will be a large number of tables, and make it difficult to add new rules.
I would think of this as 3 major concepts.
Condition: what needs to be true for a rule to fire?
Rule: when a rule fires, which results are true?
Result: which outputs fire, and with which values?
Conditions are modelled as one or more inputs with their values (your example uses InputTypeA + InputTypeB so I assume a rule has one or more qualifying inputs).
Results are modelled as one or more outputs, with their values.
Specifically:
Condition
-----------
ID
RuleID
InputType
InputValue
Rule
----
ID
Result
-------
ID
RuleID
OutputType
OutputValue
If there is more logic to input types and output types, you can create a lookup table with a foreign key relationship.
Example might be:
Condition
ID  RuleID  InputType        InputValue
1   1       DrivingIn30zone  1
2   1       CurrentSpeed     40
Rule
ID  Name
1   Don't break the speed limit
Result
ID  RuleID  OutputType  OutputValue
1   1       SetSpeed    29
You would then search for conditions which match your current input values to see what the output states are:
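One possible form of that search, as a minimal sketch. It uses Python's sqlite3 module purely as a runnable stand-in for whatever database you use. The CurrentInput table and the fired_outputs helper are assumptions made for this example, and matching is by exact string equality on InputValue. A rule fires only when none of its conditions is left unmatched by the current inputs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rule (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Condition (ID INTEGER PRIMARY KEY, RuleID INTEGER,
                        InputType TEXT, InputValue TEXT);
CREATE TABLE Result (ID INTEGER PRIMARY KEY, RuleID INTEGER,
                     OutputType TEXT, OutputValue TEXT);
-- Hypothetical scratch table holding the inputs being evaluated.
CREATE TABLE CurrentInput (InputType TEXT, InputValue TEXT);

INSERT INTO Rule VALUES (1, 'Don''t break the speed limit');
INSERT INTO Condition VALUES (1, 1, 'DrivingIn30zone', '1'),
                             (2, 1, 'CurrentSpeed', '40');
INSERT INTO Result VALUES (1, 1, 'SetSpeed', '29');
""")

# Outputs of every rule that has no condition left unmatched.
MATCH_SQL = """
SELECT r.OutputType, r.OutputValue
FROM Result r
WHERE NOT EXISTS (
    SELECT 1 FROM Condition c
    WHERE c.RuleID = r.RuleID
      AND NOT EXISTS (
          SELECT 1 FROM CurrentInput i
          WHERE i.InputType = c.InputType
            AND i.InputValue = c.InputValue))
"""


def fired_outputs(current_inputs):
    """Return (OutputType, OutputValue) pairs for a dict of input values."""
    conn.execute("DELETE FROM CurrentInput")
    conn.executemany("INSERT INTO CurrentInput VALUES (?, ?)",
                     list(current_inputs.items()))
    return conn.execute(MATCH_SQL).fetchall()


print(fired_outputs({"DrivingIn30zone": "1", "CurrentSpeed": "40"}))
```

Note that a rule with no conditions at all would always fire under this query, so you may want to exclude such rules explicitly.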

select row based on what a substring in a column might contain

I'm looking to select the primary key of a row and I've only got a column that contains info (in a substring) that I need to select the row.
E.g. MyTable
ID | Label
------------
11 | 1593:#:#:RE: test
12 | 1239#:#:#some more random text
13 | 12415#:#:#some more random text about the weather
14 | 369#:#:#some more random text about the StackOverflow
The label column always has a delimiter of :#:#:
So I guess I'd need to split the label by the delimiter and grab the first part (i.e. the number I'm looking for) to get the ID I want.
So, if I wanted the row with ID 14, I'd write:
Select ID from MyTable
where *something* = '369'
Any ideas on how to construct *something*, or how best to go about this? :)
I'm completely stumped and haven't been able to find how to do this.
Thanks,
How about:
WHERE label LIKE '369#%'?
No reason to get fancy.
Although, if you are going to do this search often, then maybe pre-split that value out to another column as part of your ETL process and index it.
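Both suggestions can be tried out in a small sketch, using Python's sqlite3 module as a stand-in for your actual database. The LabelKey column, the index name, and the rule "the key is the leading run of digits" are assumptions made for this example, not something taken from your schema.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, Label TEXT, LabelKey TEXT)"
)

rows = [
    (11, "1593:#:#:RE: test"),
    (12, "1239#:#:#some more random text"),
    (13, "12415#:#:#some more random text about the weather"),
    (14, "369#:#:#some more random text about the StackOverflow"),
]

# "ETL" step: pre-split the leading number into its own column (assumed rule).
for row_id, label in rows:
    key = re.match(r"\d+", label).group()
    conn.execute("INSERT INTO MyTable VALUES (?, ?, ?)", (row_id, label, key))

# Index the pre-split column so equality lookups do not scan every label.
conn.execute("CREATE INDEX idx_mytable_labelkey ON MyTable (LabelKey)")

# Option 1: plain prefix match on the original label.
like_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM MyTable WHERE Label LIKE '369#%'")]

# Option 2: equality match on the pre-split, indexed column.
key_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM MyTable WHERE LabelKey = ?", ("369",))]

print(like_ids, key_ids)
```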

How to Pivot a single column source data in SQL?

Below are the input and output details. Answers for any database (Oracle, SQL Server, or MySQL) are fine. I am not able to derive the logic to rank the data, which would help me pivot it.
My source is a flat file which contains data like below. I have loaded that file into one of the tables in Oracle.
Source Input:
**Flatfile1**
**Column1**
Kamesh
65
5000
123456789
Nanu
45
3000
321654789
Expected Output:
Name Age Salary Mobilenumber
Kamesh 65 5000 123456789
Nanu 45 3000 321654789
After loading into one of the tables I am applying the logic to number this data which will eventually look like below:
Column1 Datavalue
Kamesh 1
65 1
5000 1
123456789 1
Nanu 2
45 2
3000 2
321654789 2
However, I am not able to derive logic (I tried with Rank) that gives me a sequence number like this without any key field. I hope this explains the situation.
Thanks!!
Oracle doesn't store rows in any guaranteed order. If you run select * from table1 multiple times, you could get the rows in different orders depending on db operations and caching.
Therefore, if you have a table like that with no other column, it's impossible to "pivot" the data reliably.
I strongly suggest saving the data in a normalized form. If you can't, consider adding a column with an automatically populated row ID (an identity column in Oracle 12, a trigger plus a sequence in previous versions).
Once your rows have a reliable order, it will be easy to organize your data.
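As a rough sketch of that last step, the example below uses Python's sqlite3 module in place of Oracle. It assumes every record is exactly four lines, that the rows were loaded in file order, and that the RowID values are contiguous starting at 1; the RowID column name is made up for this example. Integer division of the row ID gives the record number and the remainder gives the field position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# RowID is populated automatically at load time, preserving file order.
conn.execute(
    "CREATE TABLE Flatfile1 (RowID INTEGER PRIMARY KEY AUTOINCREMENT, Column1 TEXT)"
)

lines = ["Kamesh", "65", "5000", "123456789",
         "Nanu", "45", "3000", "321654789"]
conn.executemany("INSERT INTO Flatfile1 (Column1) VALUES (?)",
                 [(value,) for value in lines])

# (RowID - 1) / 4 is the record number, (RowID - 1) % 4 the field position.
PIVOT_SQL = """
SELECT
    MAX(CASE WHEN (RowID - 1) % 4 = 0 THEN Column1 END) AS Name,
    MAX(CASE WHEN (RowID - 1) % 4 = 1 THEN Column1 END) AS Age,
    MAX(CASE WHEN (RowID - 1) % 4 = 2 THEN Column1 END) AS Salary,
    MAX(CASE WHEN (RowID - 1) % 4 = 3 THEN Column1 END) AS Mobilenumber
FROM Flatfile1
GROUP BY (RowID - 1) / 4
ORDER BY (RowID - 1) / 4
"""

pivoted = conn.execute(PIVOT_SQL).fetchall()
print(pivoted)
```

The same conditional-aggregation pattern should carry over to Oracle with its own integer-division and modulo functions, but that translation is not shown or tested here.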

SQL add column value based on another column ACCESS

What I'm trying to do is add another column to an existing table whose value will depend on an already existing column in the table. For example say I have this table:
Table1
|Letter|
A
C
R
A
I want to create another column (for example, numbers) that is chosen based on the letters. So let's say A corresponds with 10, C with 3 and R with 32 (this was chosen at random). My resulting table should be like this:
|Letter| Number |
A | 10
C | 3
R | 32
A | 10
Can anyone help me write a query that does this? I have over 20 different cases, so the simpler it looks the better.
Thanks in advance!
Options:
Build a table that associates [Letter] with the numeric value. Include this table in query by joining on the common [Letter] fields.
A very long Switch() expression. However, query design grid cell has a limit of 1024 characters.
Better to provide example with your real data and criteria.
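The first option (a lookup table joined on [Letter]) might look like the sketch below. It runs on Python's sqlite3 module rather than Access, and the LetterNumber table name is an assumption made for this example. The join itself is ordinary SQL, so it should translate to an Access query with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Letter TEXT);
-- Hypothetical lookup table: one row per letter, 20+ cases are just more rows.
CREATE TABLE LetterNumber (Letter TEXT PRIMARY KEY, Number INTEGER);

INSERT INTO Table1 VALUES ('A'), ('C'), ('R'), ('A');
INSERT INTO LetterNumber VALUES ('A', 10), ('C', 3), ('R', 32);
""")

# Join on the common Letter field to attach the number to each row.
mapped = conn.execute("""
    SELECT t.Letter, m.Number
    FROM Table1 AS t
    INNER JOIN LetterNumber AS m ON t.Letter = m.Letter
    ORDER BY t.rowid
""").fetchall()

print(mapped)
```

Adding a new case then means inserting one row into the lookup table instead of editing a long expression.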

How can I "dynamically" split a varchar column by specific characters?

I have a column that stores 2 values. Example below:
| Column 1 |
|some title1 =ExtractThis ; Source Title12 = ExtractThis2|
I want to extract 'ExtractThis' into one column and 'ExtractThis2' into another column. I've tried using a substring, but it doesn't work because the data in column 1 varies in length, so it doesn't always carve out my intended values. SQL below:
SELECT substring(d.Column1,13,24) FROM dbo.Table d
This returns 'Extract This', but for other rows it takes either too much or too little. Is there a function or combination of functions that will allow me to split consistently on the characters? The delimiter characters are consistent in my column, unlike the lengths.
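-- Val1: text between the first '=' and the ';'
-- Val2: text after the first '=' that follows the ';'
SELECT SUBSTRING(col1, CHARINDEX('=', col1) + 1, CHARINDEX(';', col1) - CHARINDEX('=', col1) - 1) AS Val1,
       SUBSTRING(col1, CHARINDEX('=', col1, CHARINDEX(';', col1)) + 1, LEN(col1)) AS Val2
FROM #data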
There are duplicate calculations here: the five CHARINDEX calls could be reduced to three.
But I want to believe SQL Server performs this simple optimization itself.
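To show the same idea with each position computed only once, here is a small sketch that runs on Python's sqlite3 module instead of SQL Server. SQLite's instr and substr stand in for CHARINDEX and SUBSTRING, the trim() calls are an addition to drop the spaces around the values, and the data table name is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 TEXT)")
conn.execute(
    "INSERT INTO data VALUES (?)",
    ("some title1 =ExtractThis ; Source Title12 = ExtractThis2",),
)

# The inner query computes each delimiter position once; the outer query
# reuses them. 'rest' is everything after the ';'.
SPLIT_SQL = """
SELECT
    trim(substr(col1, eq_pos + 1, semi_pos - eq_pos - 1)) AS Val1,
    trim(substr(rest, instr(rest, '=') + 1))              AS Val2
FROM (
    SELECT col1,
           instr(col1, '=') AS eq_pos,
           instr(col1, ';') AS semi_pos,
           substr(col1, instr(col1, ';') + 1) AS rest
    FROM data
)
"""

values = conn.execute(SPLIT_SQL).fetchall()
print(values)
```

Rows that lack either delimiter would need separate handling, which this sketch leaves out.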