How to query data in dynamic table and column in HQL? - hive

In our product, we have a weird scenario.
The backend service writes output to HDFS in the following form:
|table_name| table_column| dimension|
|----|-----|----|
| A | A1 | dimension1|
| B | B1 | dimension2|
| C | C2 | dimension3 |
It means that we need to create a table named dimension with the columns dimension1, dimension2 and dimension3.
The dimension1 value comes from the A1 column of table A, dimension2 from the B1 column of table B, and dimension3 from the C2 column of table C.
Does anyone know how to achieve that?
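
One possible direction, sketched here rather than taken from an answer, is to read the mapping rows first and generate the query text outside Hive. The sketch below builds a CREATE TABLE ... AS SELECT statement from the three mapping rows shown above. The join column `id` is purely an assumption: the question does not say how rows of A, B and C relate to each other, so replace it with whatever key the tables actually share.

```python
# Mapping rows as they appear in the HDFS output:
# (table_name, table_column, dimension)
mapping = [
    ("A", "A1", "dimension1"),
    ("B", "B1", "dimension2"),
    ("C", "C2", "dimension3"),
]


def build_dimension_query(rows, key="id"):
    """Build a CREATE TABLE ... AS SELECT statement from mapping rows.

    `key` is a hypothetical join column assumed to exist in every
    source table; it is not given in the question.
    """
    if not rows:
        raise ValueError("mapping must contain at least one row")
    # One "table.column AS dimension" entry per mapping row.
    select_list = ", ".join(f"{t}.{c} AS {d}" for t, c, d in rows)
    # Source tables in first-seen order, without duplicates.
    tables = list(dict.fromkeys(t for t, _, _ in rows))
    first = tables[0]
    joins = "".join(
        f" JOIN {t} ON {t}.{key} = {first}.{key}" for t in tables[1:]
    )
    return f"CREATE TABLE dimension AS SELECT {select_list} FROM {first}{joins}"


query = build_dimension_query(mapping)
print(query)
```

The generated string can then be submitted to Hive by whatever client the product already uses.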

Related

Replace cell reference in =formulatext output with variable name

I am working on an Excel sheet where I need to show the formula that is used in another cell. I have 2 tables. Table one contains:
+-----------+-------+-------+------+
| Parameter | Short | Value | Unit |
| Name | | | |
+-----------+-------+-------+------+
| Diameter | D | 50 | mm |
+-----------+-------+-------+------+
| Wanddikte | T | 5 | mm |
+-----------+-------+-------+------+
| Lengte | L | 200 | mm |
+-----------+-------+-------+------+
And the second table:
+----------------------+-------+-------------+------+-----------------+
| Name | Short | output | Unit | Formula |
+----------------------+-------+-------------+------+-----------------+
| Doorsnede oppervlakt | A1 | 1963,495408 | mm | =0,25*PI()*C3^2 |
+----------------------+-------+-------------+------+-----------------+
| Binnendiameter | ID | 40 | mm | =C3-2*C4 |
+----------------------+-------+-------------+------+-----------------+
| Verfoppervlakt | Averf | 31415,92654 | mm2 | =PI()*C3*C5 |
+----------------------+-------+-------------+------+-----------------+
Now I want to change the last column of the second table. There you see the cell references: C3, C4 and C5.
Those refer to cells in the first table (Value column). But instead of showing C3 (value= 50 in table1) I want to show D (Short in table 1).
The last column in table 2 contains the Excel formula =FORMULATEXT(...), which refers to the output calculation in table 2.
How do I replace the cell references in the last column of the second table with the values from the Short column of the first table?
1) You could use named ranges.
For example: C3 would be a named range called D. Then in your formula you would write =0,25*PI()*D^2 and you would have the FORMULATEXT as requested.
C4 would be a named range called T and C5 a named range called L.
To create the named range, click on the cell you want to name (e.g. C3), then go to the Name Box at the top left and enter the name (e.g. D).
See here: Named ranges
2) Consider having a helper column where you put the following:
'=0,25*PI()*D^2 . Hide the column where you have the FORMULATEXT result and leave the helper column visible. The ' at the start means Excel will not try to evaluate the cell contents.
I think this might appear confusing if you use a simple letter such as D. This is not descriptive of what D actually is and can be confused as a partial cell reference.
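
If the formula text is processed outside Excel, the same substitution can be sketched as a plain text replacement. This is not an Excel feature, only an illustration of the mapping: the dictionary below is built by hand from the first table (C3, C4, C5 map to D, T, L), and the function name `replace_refs` is made up for this example.

```python
import re

# Cell reference -> short name, taken from the first table
# (the Value column is column C, rows 3 to 5).
SHORT_NAMES = {"C3": "D", "C4": "T", "C5": "L"}

# Matches simple A1-style references such as C3 or AB12.
# Function names like PI() are not matched because they carry no digits.
CELL_REF = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")


def replace_refs(formula, names=SHORT_NAMES):
    """Replace known cell references with their short names.

    References that are not in `names` are left unchanged.
    """
    return CELL_REF.sub(lambda m: names.get(m.group(0), m.group(0)), formula)


for text in ["=0,25*PI()*C3^2", "=C3-2*C4", "=PI()*C3*C5"]:
    print(replace_refs(text))
```

The sketch does not handle absolute references such as $C$3 or references to other sheets.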

Add columns from table 2 to columns at table1 based on Id matches (SQL)

I have a stored procedure and, after performing certain calculations, I select the columns of the temp table to display in the UI.
Here is the end part of that stored procedure
SELECT Id, Data, Value from #preopt
The data which returns when we run this select statement is as follows.
Id | Data | Value
1 | xyz | 232
2 | abc | 222
3 | 3232 | www
Now I have one more table. This is not a temporary table. It has following data in it.
SELECT Id, List1, List2 from dbo.IdLists
Id | List1 | List2
1 | g23 | h323
45 | g21 | h44
2 | g455 | g45
3 | g32 | h48
I want the final result of the stored procedure to look like the table below. Basically, it should compare the Id column in #preopt with the Id column in dbo.IdLists and, for each matching Id, add the corresponding List1 and List2 values to the temp table #preopt.
Id | Data | Value | List1 | List2
1 | xyz | 232 | g23 | h323
2 | abc | 222 | g455 | g45
3 | 3232 | www | g32 | h48
Can someone please let me know if this is achievable?
This query should do the trick. Update your List1 and List2 in the temp table using the values from the join on IDLists.
UPDATE p
SET p.List1 = l.List1, p.List2 = l.List2
FROM #preopt p
INNER JOIN dbo.IdLists l
ON p.Id = l.Id
Looks like you want to do a join.
SELECT po.Id, po.Data, po.Value, il.List1, il.List2
FROM #preopt po
INNER JOIN dbo.IdLists il ON po.Id = il.Id
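
The join can be tried out on the sample rows with a small in-memory database. This sketch uses Python's sqlite3 module instead of SQL Server, so the temp table is modelled as an ordinary table named `preopt` and the `dbo.` schema prefix is dropped. Those names are adaptations for the sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE preopt  (Id INTEGER, Data TEXT, Value TEXT);
CREATE TABLE IdLists (Id INTEGER, List1 TEXT, List2 TEXT);
INSERT INTO preopt VALUES
    (1, 'xyz', '232'), (2, 'abc', '222'), (3, '3232', 'www');
INSERT INTO IdLists VALUES
    (1, 'g23', 'h323'), (45, 'g21', 'h44'),
    (2, 'g455', 'g45'), (3, 'g32', 'h48');
""")

# The inner join keeps only Ids present in both tables,
# so Id 45 from IdLists does not appear in the result.
rows = conn.execute("""
    SELECT po.Id, po.Data, po.Value, il.List1, il.List2
    FROM preopt po
    INNER JOIN IdLists il ON po.Id = il.Id
    ORDER BY po.Id
""").fetchall()

for row in rows:
    print(row)
```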

SQL: Join to different table based on column data

My scenario is as follows:
Table: t_audit
row | colname | oldvalue | newvalue
===================================
1 | locid | 001 | 002
2 | typeid | 010 | 011
Table: t_ref_audit
colname | desc | link_table | link_key | link_desc
===========================================================
locid | Location | t_ref_loc | locid | loc_desc
typeid | Type | t_ref_type | typeid | type_desc
Table: t_ref_loc
locid | type_desc
==================
001 | LOCATION A
002 | LOCATION B
Table: t_ref_type
typeid | loc_desc
==================
010 | TYPE C
011 | TYPE D
As you can see above, the first table is my audit log table and the second is its reference table. The third and fourth tables are lookup (reference) tables. Using the simple SQL below, I can get the proper description for the column name from the t_ref_audit table.
SELECT t_ref_audit.desc, t_audit.oldvalue, t_audit.newvalue
FROM t_audit, t_ref_audit
WHERE t_audit.colname = t_ref_audit.colname
My problem now is that the t_audit.oldvalue and t_audit.newvalue columns contain reference code IDs from other reference tables (t_ref_loc and t_ref_type). I want to show the proper description, taken from the column named in t_ref_audit.link_desc, instead of just the ID, as below:
coldesc | oldvalue | newvalue
==================================
Location | LOCATION A | LOCATION B
Type | TYPE C | TYPE D
Hope someone can enlighten me on this. Thank you.
Maybe something like this? (and the same logic for newvalues...)
SELECT
t_ref_audit.desc,
t_audit.oldvalue,
t_audit.newvalue,
case
when link_table='t_ref_loc' then t_ref_loc.loc_desc
when link_table='t_ref_type' then t_ref_type.type_desc
else '???'
end oldvalue_desc
FROM
t_audit
join t_ref_audit ON t_audit.colname = t_ref_audit.colname
left join t_ref_loc on link_table='t_ref_loc' and oldvalue=locid
left join t_ref_type on link_table='t_ref_type' and oldvalue=typeid
This logic only works for a static mapping...
I think you mixed up the column titles of the t_ref_loc and t_ref_type tables.
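
The static-mapping idea above, extended to newvalue as well, can be sketched on the sample data with Python's sqlite3 module. Two things are assumptions here: the description columns are placed as the answer expects (loc_desc in t_ref_loc, type_desc in t_ref_type), and the audit table's `row` column is renamed `row_no` to stay clear of SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_audit (row_no INTEGER, colname TEXT, oldvalue TEXT, newvalue TEXT);
CREATE TABLE t_ref_audit (colname TEXT, "desc" TEXT, link_table TEXT,
                          link_key TEXT, link_desc TEXT);
CREATE TABLE t_ref_loc  (locid TEXT, loc_desc TEXT);
CREATE TABLE t_ref_type (typeid TEXT, type_desc TEXT);
INSERT INTO t_audit VALUES (1, 'locid', '001', '002'), (2, 'typeid', '010', '011');
INSERT INTO t_ref_audit VALUES
    ('locid',  'Location', 't_ref_loc',  'locid',  'loc_desc'),
    ('typeid', 'Type',     't_ref_type', 'typeid', 'type_desc');
INSERT INTO t_ref_loc  VALUES ('001', 'LOCATION A'), ('002', 'LOCATION B');
INSERT INTO t_ref_type VALUES ('010', 'TYPE C'), ('011', 'TYPE D');
""")

# Each lookup table is joined twice: once for oldvalue, once for newvalue.
# The link_table check makes sure only the relevant lookup can match.
rows = conn.execute("""
    SELECT ra."desc" AS coldesc,
           COALESCE(lo_old.loc_desc, ty_old.type_desc, '???') AS oldvalue,
           COALESCE(lo_new.loc_desc, ty_new.type_desc, '???') AS newvalue
    FROM t_audit a
    JOIN t_ref_audit ra ON a.colname = ra.colname
    LEFT JOIN t_ref_loc lo_old
           ON ra.link_table = 't_ref_loc' AND a.oldvalue = lo_old.locid
    LEFT JOIN t_ref_loc lo_new
           ON ra.link_table = 't_ref_loc' AND a.newvalue = lo_new.locid
    LEFT JOIN t_ref_type ty_old
           ON ra.link_table = 't_ref_type' AND a.oldvalue = ty_old.typeid
    LEFT JOIN t_ref_type ty_new
           ON ra.link_table = 't_ref_type' AND a.newvalue = ty_new.typeid
    ORDER BY a.row_no
""").fetchall()

for row in rows:
    print(row)
```

Every additional link_table value needs its own pair of joins, which is why this only suits a fixed, known set of lookup tables.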

How would I use SELECT statements to transform this data from two columns into three columns?

I inherited a horribly-designed table where data is stored like this:
Period | Identifier | Value
----------------------------------
1 | AB1 | some number
1 | AB2 | some number
1 | AB3 | some number
1 | AB4 | some number
1 | AB5 | some number
1 | A1 | some number
1 | A2 | some number
1 | A3 | some number
1 | A4 | some number
1 | A5 | some number
2 | AB1 | some number
2 | AB2 | some number
2 | AB3 | some number
2 | AB4 | some number
2 | AB5 | some number
2 | A1 | some number
2 | A2 | some number
2 | A3 | some number
2 | A4 | some number
2 | A5 | some number
I'm trying to use SELECT statements that will get data into this format:
Row # | First value | Second value
1 | A1's number | AB1's number // The next 5 rows are data from period 1
2 | A2's number | AB2's number
3 | A3's number | AB3's number
4 | A4's number | AB4's number
5 | A5's number | AB5's number
6 | A1's number | AB1's number // These 5 rows are from period 2
7 | A2's number | AB2's number
8 | A3's number | AB3's number
9 | A4's number | AB4's number
10 | A5's number | AB5's number
AB% and A% are two separate ID's of that format, which mildly frustrates WHERE LIKE ... clauses, I think. I'm not entirely sure the data can be forced into the desired format, but my supervisor asked me to look into it.
My initial idea, which I don't know how to write in SQL, would be to look at the row number itself and work with that, but as I said, I'm unsure how to proceed down that route.
Right now, the data is in SQL Server, but it will be accessed from SAS using proc sql. I think those standards conform to SQL Server for the most part, even though DECLARE isn't supported.
And no, I don't know whose idea it was to store the data in this fashion...
If you're using SAS, then I'd just use PROC TRANSPOSE. Get the data to include a label variable, which determines which variable the data will be moved to:
data datatable;
infile datalines dlm='|';
input
Period Identifier $ Value $;
datalines;
1 | AB1 | some number
1 | AB2 | some number
1 | AB3 | some number
1 | AB4 | some number
1 | AB5 | some number
1 | A1 | some number
1 | A2 | some number
1 | A3 | some number
1 | A4 | some number
1 | A5 | some number
2 | AB1 | some number
2 | AB2 | some number
2 | AB3 | some number
2 | AB4 | some number
2 | AB5 | some number
2 | A1 | some number
2 | A2 | some number
2 | A3 | some number
2 | A4 | some number
2 | A5 | some number
;;;
run;
data have;
set datatable;
idlabel = compress(identifier, ,'d');
byval = compress(identifier,,'kd');
run;
proc sort data=have;
by period byval;
run;
proc transpose data=have out=want;
by period byval;
id idlabel;
var value;
run;
If for some reason you HAVE to do it in SQL, you are best off doing it as a join to itself. You want to join the row where period=1 and compress(identifier,,'kd')=1 for both AB and A, so you can do that:
proc sql;
create table want as
select A.period, AB.value as AB, A.value as A
from (select * from have where compress(identifier,,'d')='AB') AB,
(select * from have where compress(identifier,,'d')='A') A
where AB.period=A.period
and compress(AB.identifier,,'kd') = compress(A.identifier,,'kd');
quit;
But the PROC TRANSPOSE option is likely to be more efficient than the self join, I'd think (and more flexible, if your data isn't quite as pretty as you show).
If the "B" in the identifier is only used to differentiate between type A and type AB identifiers then you can simply remove that letter and join on the result:
SELECT ROW_NUMBER() OVER(ORDER BY AData.Period, AData.[Identifier]) AS [Row #]
, AData.[Value] AS [First Value]
, ABData.[Value] AS [Second Value]
FROM YourTable AData
-- Change to a LEFT JOIN if not all A's have AB's.
JOIN YourTable ABData
-- NOTE: Assumes that 'B' is the only differentiator between
-- AData and ABData's Identifier column and that it is
-- not repeated as part of the common identifier.
ON AData.[Identifier] = REPLACE(ABData.[Identifier], 'B', '')
-- Pair rows within the same period only.
AND AData.Period = ABData.Period
-- Without this, every A row would also match itself.
AND ABData.[Identifier] <> AData.[Identifier]
You are absolutely correct - it is not a terribly great schema - this will probably need a full table scan.
Ignoring the trickiness of relating A to AB across their specific periods for a second, if the data were able to be related somehow, I would select the format you are looking for by doing an inner join on the table to itself, thus:
SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier),
a.Value,
b.Value
FROM TableName a
INNER JOIN TableName b ON join_mechanism
ORDER BY a.Period, a.Identifier, b.Identifier
Now, to fill in the join mechanism, the obvious part would be to have a.Period = b.Period. The questionable part is an idea that you might try a string replace if this text is static. So REPLACE(a.Identifier, 'A', 'AB') = b.Identifier.
Thus, all told, you would have:
SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier),
a.Value,
b.Value
FROM TableName a
INNER JOIN TableName b ON a.Period = b.Period AND REPLACE(a.Identifier, 'A', 'AB') = b.Identifier
ORDER BY a.Period, a.Identifier, b.Identifier
Note: The SELECT statements have not been tested. I'm assuming you are using a relatively new version of MSSQL that supports row_number.
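
The self-join approach can be checked on made-up numbers with Python's sqlite3 module. The table name `readings` and the values are hypothetical, since the question only says "some number". The row number is added in Python with enumerate so the sketch does not depend on window-function support, and the ordering by identifier assumes single-digit suffixes as in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (Period INTEGER, Identifier TEXT, Value INTEGER)")

# Hypothetical values: A{n} -> period*100 + n, AB{n} -> period*1000 + n.
data = []
for period in (1, 2):
    for n in range(1, 6):
        data.append((period, f"AB{n}", period * 1000 + n))
        data.append((period, f"A{n}", period * 100 + n))
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", data)

# Pair each A row with the AB row of the same period and the same numeric suffix.
pairs = conn.execute("""
    SELECT a.Value, ab.Value
    FROM readings a
    JOIN readings ab
      ON ab.Period = a.Period
     AND ab.Identifier = 'AB' || substr(a.Identifier, 2)
    WHERE a.Identifier LIKE 'A%' AND a.Identifier NOT LIKE 'AB%'
    ORDER BY a.Period, a.Identifier
""").fetchall()

# Prepend a running row number: (Row #, First value, Second value).
result = [(i, first, second) for i, (first, second) in enumerate(pairs, start=1)]

for row in result:
    print(row)
```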

sql - How do I implement a table where few attributes are just references to another tuple?

I have a SQL table, shown below:
> select * from table1;
|--------------------------------------------------|
| ID | A1 | A2 | B1 | B2 | C1 | C2 | REF_B | REF_C |
|--------------------------------------------------|
| 1 | a1 | a1 | b1 | b1| c1 | c1 | 1 | 1 |
| 2 | a2 | a2 | b2 | b2| c1 | c1 | 2 | 1 |
| 3 | a3 | a3 | b1 | b1| c1 | c1 | 1 | 1 |
|--------------------------------------------------|
ID is Primary key.
A1 and A2 are unique to each tuple.
B1 and B2 are the values of tuple pointed to by REF_B attribute of the current row.
C1 and C2 are the values of tuple pointed to by REF_C attribute of the current row.
REF_B refers to the ID of another tuple in this same table from where we should get the values of Bx.
REF_C refers to the ID of another tuple in this same table from where we should get the values of Cx.
The obvious problem with the above approach is propagating changes made in tuple 1 to tuples 2 and 3. Right now we use a programmatic approach (Java code) to achieve this.
This is both difficult and inelegant.
Proposed change
Divide table1 into three tables.
> select * from table1_a;
|------------------------------|
| ID | A1 | A2 | REF_B | REF_C |
|------------------------------|
| 1 | a1 | a1 | 1 | 1 |
| 2 | a2 | a2 | 2 | 1 |
| 3 | a3 | a3 | 1 | 1 |
|------------------------------|
> select * from table1_b;
|--------------|
| ID | B1 | B2 |
|--------------|
| 1 | b1 | b1 |
| 2 | b2 | b2 |
|--------------|
> select * from table1_c;
|--------------|
| ID | C1 | C2 |
|--------------|
| 1 | c1 | c1 |
|--------------|
table1 will be an updatable view over the join of these three tables.
Do you see any possible flaw in this approach?
Is there an easier solution?
What restrictions might we face with the new table1? table1 maps directly to an ADF Entity Object.
Use a trigger:
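CREATE OR REPLACE TRIGGER upd_table1
BEFORE UPDATE OF a1
OR UPDATE OF a2
ON TABLE1
REFERENCING new AS new
FOR EACH ROW
BEGIN
-- Note: in Oracle, a row-level trigger that updates its own table
-- can raise ORA-04091 (mutating table).
UPDATE table1
SET b1 = :new.a1, b2 = :new.a2
WHERE ref_b = :new.id;
UPDATE table1
SET c1 = :new.a1, c2 = :new.a2
WHERE ref_c = :new.id;
END;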
It sounds like your proposed solution is a normalization of your original table, assuming you make REF_B & REF_C (though I'd name these B_ID and C_ID, myself) foreign keys to table1_b and table1_c. Is that what you have in mind?
One thing that's not clear to me is why you need two columns here (A1 & A2) if they contain the same data. Couldn't you consolidate that into a single column and then simply select twice if you need two copies in the result? ie, assuming you had only one "A" column instead of A1/A2:
select A, A from table1....
But, I might be missing the intended use case here.
I've never used ADF, but the Oracle documentation seems to imply you can reference a view:
"Entity objects map to single objects in the datasource. In the vast majority of cases, these are tables, views, synonyms, or snapshots in a database."
If this isn't very helpful, perhaps add some detail concerning the underlying purpose of this table.
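
The normalized layout proposed in the question can be sketched with Python's sqlite3 module to show why it removes the propagation problem: a change to one row of table1_b is visible in every table1 row that references it. Making the view updatable is database-specific and is not shown here. The value 'b1x' is a made-up test value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1_b (ID INTEGER PRIMARY KEY, B1 TEXT, B2 TEXT);
CREATE TABLE table1_c (ID INTEGER PRIMARY KEY, C1 TEXT, C2 TEXT);
CREATE TABLE table1_a (
    ID INTEGER PRIMARY KEY, A1 TEXT, A2 TEXT,
    REF_B INTEGER REFERENCES table1_b(ID),
    REF_C INTEGER REFERENCES table1_c(ID)
);
INSERT INTO table1_b VALUES (1, 'b1', 'b1'), (2, 'b2', 'b2');
INSERT INTO table1_c VALUES (1, 'c1', 'c1');
INSERT INTO table1_a VALUES
    (1, 'a1', 'a1', 1, 1), (2, 'a2', 'a2', 2, 1), (3, 'a3', 'a3', 1, 1);

-- table1 becomes a view with the same columns as the original table.
CREATE VIEW table1 AS
SELECT a.ID, a.A1, a.A2, b.B1, b.B2, c.C1, c.C2, a.REF_B, a.REF_C
FROM table1_a a
JOIN table1_b b ON b.ID = a.REF_B
JOIN table1_c c ON c.ID = a.REF_C;
""")

# A single update on table1_b is seen by rows 1 and 3, which both reference it.
conn.execute("UPDATE table1_b SET B1 = 'b1x' WHERE ID = 1")
rows = conn.execute("SELECT ID, B1 FROM table1 ORDER BY ID").fetchall()

for row in rows:
    print(row)
```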