In Hive, I'd like to dynamically extract a value from a table, save it in a variable, and use it later. Consider the following example, where I retrieve the maximum of column var and want to use it as a condition in the subsequent query.
set maximo=select max(var) from table;
select
*
from
table
where
var=${hiveconf:maximo}
It does not work, although
set maximo=select max(var) from table;
${hiveconf:maximo}
shows me the intended result.
Doing:
select '${hiveconf:maximo}'
gives
"select max(var) from table"
though.
Hive substitutes variables as is and does not execute them. Use a shell wrapper script to get the result into a variable and pass it to your Hive script.
maximo=$(hive -e "set hive.cli.print.header=false; select max(var) from table;")
hive -hiveconf "maximo"="$maximo" -f your_hive_script.hql
And after this inside your script you can use select '${hiveconf:maximo}'
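The same wrapper idea can be sketched in Python instead of shell. This is only a sketch: the helper names fetch_scalar and script_command are made up for illustration, it assumes the hive CLI is on the PATH, and it reuses the -e, -hiveconf and -f options shown above.

```python
import subprocess


def fetch_scalar(query, run=subprocess.run):
    """Run one query with `hive -e` and return its trimmed standard output."""
    result = run(
        ["hive", "-e", "set hive.cli.print.header=false; " + query],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def script_command(script_path, **hiveconf):
    """Build the `hive -f` command line, passing each value via -hiveconf."""
    command = ["hive"]
    for name, value in hiveconf.items():
        command += ["-hiveconf", f"{name}={value}"]
    return command + ["-f", script_path]


# Intended use (needs a working Hive installation):
#   maximo = fetch_scalar("select max(var) from table;")
#   subprocess.run(script_command("your_hive_script.hql", maximo=maximo), check=True)
```

The run parameter exists only so the helper can be exercised without Hive installed; in normal use it stays at its default.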
@Hein du Plessis
Whilst it's not possible to do exactly what you want from Hue -- a constant source of frustration for me -- if you are restricted to Hue, and can't use a shell wrapper as suggested above, there are workarounds depending on the scenario.
When I once wanted to set a variable by selecting the max of a column in a table to use in a query, I got round it like this:
I first put the result into a table comprising two columns, with the (arbitrary word) 'MAX_KEY' in one column and the result of the max calculation in the other, like this:
drop table if exists tam_seg.tbl_stg_temp_max_id;
create table tam_seg.tbl_stg_temp_max_id as
select
'MAX_KEY' as max_key
, max(pvw_id) as max_id
from
tam_seg.tbl_dim_cc_phone_vs_web;
I then added the word 'MAX_KEY' to a sub-query then joined in the above table so I could use the result in the main query:
select
-- *** here is the joined in value from the table being used ***
cast(mxi.max_id + qry.temp_id as string) as pvw_id
, qry.cc_phone_vs_web
from
(
select
snp.cc_phone_vs_web
, row_number() over(order by snp.cc_phone_vs_web) as temp_id
-- *** here is the key being added to the sub-query ***
, 'MAX_KEY' as max_key
from
(
select distinct cc_phone_vs_web from tam_seg.tbl_stg_base_snapshots
) as snp
left outer join
tam_seg.tbl_dim_cc_phone_vs_web as pvw
on snp.cc_phone_vs_web = pvw.cc_phone_vs_web
where
pvw.cc_phone_vs_web is null
) as qry
-- *** here is the table with the select result in being joined in ***
left outer join
tam_seg.tbl_stg_temp_max_id as mxi
on qry.max_key = mxi.max_key
;
Not sure if this is your scenario but maybe it can be adapted. I'm 99% sure you can't just put a select statement directly into a variable in Hue though.
If I were doing something in just Hue, I would probably use the temporary table and join method. But if I were using a shell wrapper anyway, I would definitely do it there.
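Applied to the original question (rows where var equals its maximum), the temporary table and join method might look like the sketch below. It runs against an in-memory SQLite database rather than Hive, and the names facts, tmp_max and max_var are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table facts (var integer);
    insert into facts values (3), (7), (7), (1);
    -- one-row table holding the aggregate next to a constant join key
    create table tmp_max as
        select 'MAX_KEY' as max_key, max(var) as max_var from facts;
""")

# Add the same constant key to the main query, join, and filter on the stored value.
rows = conn.execute("""
    select f.var
    from (select var, 'MAX_KEY' as max_key from facts) as f
    join tmp_max as m on f.max_key = m.max_key
    where f.var = m.max_var
""").fetchall()
print(rows)  # -> [(7,), (7,)]
```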
I hope this helps.
Related
Suppose there is a long list of values, which happen to be attribute values of records in a Postgres database.
I would like to create a query which finds out which of these values can not be found in the database.
I have no right to execute DDL-Statements and I would like to avoid procedural code.
Example:
the table might be
CREATE TABLE Test (
ID Integer,
attr varchar(30)
)
The list might be something like (but longer, about 240000 values)
ATTR
TestValue0
TestValue1
TestValue2
TestValue3
Using sed I can create and execute a statement
select count(*) from Test where attr in ('TestValue0',
'TestValue1','TestValue2','TestValue3')
This statement shows me, that not all of these values can be found in Test.
How can I formulate a query which tells me which of these unique values cannot be found in the Postgres database?
For what you want to do, you can use LEFT JOIN, NOT IN, or NOT EXISTS. But the key is that you need a derived table with the values you care about:
select v.attr
from (values ('TestValue0'), ('TestValue1'), ('TestValue2'), ('TestValue3')
) v(attr)
where not exists (select 1 from test t where t.attr = v.attr);
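A small runnable sketch of the same pattern, assuming an in-memory SQLite database in place of Postgres. SQLite does not take the v(attr) alias on a derived table, so the value list goes into a CTE here; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table test (id integer, attr varchar(30));
    insert into test values (1, 'TestValue0'), (2, 'TestValue2');
""")

candidates = ["TestValue0", "TestValue1", "TestValue2", "TestValue3"]

# One "(?)" row per candidate, bound as parameters rather than pasted into the SQL.
placeholders = ", ".join("(?)" for _ in candidates)
query = f"""
    with v(attr) as (values {placeholders})
    select v.attr
    from v
    where not exists (select 1 from test t where t.attr = v.attr)
    order by v.attr
"""
missing = [row[0] for row in conn.execute(query, candidates)]
print(missing)  # -> ['TestValue1', 'TestValue3']
```

For a list as long as the one in the question, the values would probably have to be sent in batches, since drivers and databases limit the number of bound parameters.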
I am working on a query that checks the temp table for records that do not exist in the main table. My query looks like this:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (SELECT [StartDateTime] FROM [Telemarketing].[dbo].PDCampaignBatch GROUP BY [StartDateTime])
but the problem is it does not display this row
even if that data does not exist in my main table. What seems to be the problem?
NOT IN has strange semantics. If any values in the subquery are NULL, then the query returns no rows at all. For this reason, I strongly recommend using NOT EXISTS instead:
SELECT t.*
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp] t
WHERE NOT EXISTS (SELECT 1
FROM [Telemarketing].[dbo].PDCampaignBatch cb
WHERE t.StartDateTime = cb.StartDateTime
);
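The difference is easy to reproduce. The sketch below uses an in-memory SQLite database as a stand-in for SQL Server, with invented table names and rows; the NULL behaviour of NOT IN shown here is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table batch_temp (start_dt text);
    create table batch (start_dt text);
    insert into batch_temp values ('2014-10-14'), ('2014-10-15');
    insert into batch values ('2014-10-14'), (null);
""")

# The NULL in batch makes "x NOT IN (...)" evaluate to unknown for every unmatched row.
not_in = conn.execute("""
    select start_dt from batch_temp
    where start_dt not in (select start_dt from batch)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = conn.execute("""
    select t.start_dt from batch_temp t
    where not exists (select 1 from batch b where b.start_dt = t.start_dt)
""").fetchall()

print(not_in)      # -> []
print(not_exists)  # -> [('2014-10-15',)]
```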
If the set evaluated by the NOT IN condition contains any NULL values, the outer query returns an empty set, even if there are many [StartDateTime]s in PDCampaignBatch_temp that have no match in the PDCampaignBatch table.
To avoid this issue, exclude NULLs in the subquery:
SELECT *
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (
SELECT DISTINCT [StartDateTime]
FROM [Telemarketing].[dbo].PDCampaignBatch
WHERE [StartDateTime] IS NOT NULL
);
Let's say PDCampaignBatch_temp and PDCampaignBatch happen to have the same structure (same columns in the same order) and you're tasked with getting the set of all rows in PDCampaignBatch_temp that aren't in PDCampaignBatch. The most effective way to do that is to make use of the EXCEPT operator, which will deal with NULL in the expected way as well:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
EXCEPT
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch]
In production code that is not a one-off, don't use SELECT *, write out the column names instead.
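A quick way to see how EXCEPT treats NULL, sketched with an in-memory SQLite database instead of SQL Server (the tables and rows are invented). Both rows with a NULL start date are considered equal, so only the genuinely new row comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table batch_temp (id integer, start_dt text);
    create table batch (id integer, start_dt text);
    insert into batch_temp values (1, '2014-10-14'), (2, null), (3, '2014-10-16');
    insert into batch values (1, '2014-10-14'), (2, null);
""")

# Column names are written out instead of SELECT *, as recommended above.
only_in_temp = conn.execute("""
    select id, start_dt from batch_temp
    except
    select id, start_dt from batch
""").fetchall()
print(only_in_temp)  # -> [(3, '2014-10-16')]
```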
Most likely your issue is with the datetime. You may be displaying only a certain degree of precision, such as year/month/day, while the data is stored as year/month/day/hour/minute/second/millisecond. If so, you have to match down to the most granular measurement of the data. If one field is a date and the other is a datetime, they will also likely never match up. In that case the comparison will not give the results you expect.
I would like to know if it is possible to use the "with as" clause with a variable and/or in a begin/end block.
My code is
WITH EDGE_TMP
AS
(select edge.node_beg_id,edge.node_end_id,prg_massif.longueur,prg_massif.lgvideoupartage,prg_massif.lgsanscable from prg_massif
INNER JOIN edge on prg_massif.asset_id=edge.asset_id
where prg_massif.lgvideoupartage LIKE '1' OR prg_massif.lgsanscable LIKE '1')
,
journey (TO_TOWN, STEPS,DISTANCE,WAY)
AS
(SELECT DISTINCT node_beg_id, 0, 0, CAST(&&node_begin AS VARCHAR2(2000))
FROM EDGE_TMP
WHERE node_beg_id = &&node_begin
UNION ALL
SELECT node_end_id, journey.STEPS + 1
, journey.DISTANCE + EDGE_TMP.longueur,
CONCAT(CONCAT(journey.WAY,';'), EDGE_TMP.node_end_id
)
It creates a string as output, separated by ';', but I need to get it back as a variable or a table. Do you know how? I used CONCAT to collect the data in one big string. Can I use a table to insert the data instead?
I need to use the result for further processing.
Thank you,
mat
No, WITH is part of an SQL statement only. But if you describe why you need it in PL/SQL, we can advise you.
Edit: if you have an SQL statement which produces the result you need, you can assign its value to a PL/SQL variable. There are several methods to do this; the simplest is to use a SELECT INTO statement (add an INTO variable clause to your select).
You can use WITH clause as a part of SELECT INTO statement (at least in not-too-very-old Oracle versions).
I am fairly new to SQL, and I am hoping someone can help me with a problem I'm having. I haven't been able to find any answers helping me figure out this exact problem.
I have two tables in two SQL Server databases on two different servers that I want to compare using the column ItemID. I want to find records from Table1 that have an ItemID that does not exist in Table2 and insert those into a table variable. I have the following code:
--Create table variable to hold query results
DECLARE @ItemIDTable TABLE
(
[itemid][NVARCHAR](20) NULL
);
--Query data and insert results into table variable
INSERT INTO @ItemIDTable
([itemid])
SELECT a.[itemid]
FROM database1.dbo.table1 a
WHERE NOT EXISTS (SELECT 1
FROM [Database2].[dbo].[table2]
WHERE a.itemid = [Database2].[dbo].[table2].[itemid])
ORDER BY itemid
This works on a test server where the two databases are on the same server, but not in real life where they are on different servers. I tried the following using OPENQUERY, but I know I haven't got it quite right.
--Create table variable to hold query results
DECLARE @ItemIDTable TABLE
(
[ItemID][nvarchar](20) NULL
);
--Query data and insert results into table variable
INSERT INTO @ItemIDTable
([ItemID])
SELECT a.[ItemID]
FROM Database1.dbo.Table1 a
WHERE NOT EXISTS (SELECT 1
FROM OPENQUERY([Server2], SELECT * FROM [Database2].[dbo].[Table2]')
WHERE a.ItemID = [Database2].[dbo].[Table2].[ItemID])
ORDER BY ItemID
I'm pretty sure I need to do something in the WHERE clause, where I have the two databases on two servers, I'm just not quite sure how to structure it. Could anyone help?
You can't create an OPENQUERY that is correlated to an outer query. You could populate a temp table with the results of an OPENQUERY and do your WHERE NOT EXISTS against the temp table, or you might want to look into Synonyms.
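The temp table route can be sketched as three steps. In this sketch two in-memory SQLite databases stand in for the two servers, and the staging table remote_items is an invented name; on SQL Server the first step would be the uncorrelated OPENQUERY.

```python
import sqlite3

# Two in-memory databases stand in for the two servers (an assumption of this sketch).
local = sqlite3.connect(":memory:")
remote = sqlite3.connect(":memory:")
local.executescript("""
    create table table1 (itemid text);
    insert into table1 values ('A1'), ('B2'), ('C3');
""")
remote.executescript("""
    create table table2 (itemid text);
    insert into table2 values ('B2');
""")

# Step 1: pull the remote keys once, with an uncorrelated query.
remote_ids = remote.execute("select itemid from table2").fetchall()

# Step 2: stage them in a local temp table.
local.execute("create temp table remote_items (itemid text)")
local.executemany("insert into remote_items values (?)", remote_ids)

# Step 3: the correlated NOT EXISTS now runs entirely on the local side.
missing = local.execute("""
    select a.itemid
    from table1 a
    where not exists (select 1 from remote_items r where r.itemid = a.itemid)
    order by a.itemid
""").fetchall()
print(missing)  # -> [('A1',), ('C3',)]
```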
Openquery works like this:
select *
from openquery
(LINKED_SERVER_NAME,
'select query goes here'
)
Note that the sql portion is single quoted. That means you might have to quote the quotes if necessary. For example:
select *
from openquery
(LINKED_SERVER_NAME,
'
select SomeTextField
from SomeTable
where SomeDateField = ''20141014''
'
)
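If the statement is assembled in application code, the quote doubling can be done mechanically. The helper below is hypothetical (openquery_statement is not part of any library) and only builds the text of the statement; it should only be fed trusted SQL, since it concatenates strings.

```python
def openquery_statement(linked_server, inner_sql):
    """Wrap inner_sql in OPENQUERY, doubling single quotes so the literal stays intact."""
    escaped = inner_sql.replace("'", "''")
    return f"select * from openquery({linked_server}, '{escaped}')"


statement = openquery_statement(
    "LINKED_SERVER_NAME",
    "select SomeTextField from SomeTable where SomeDateField = '20141014'",
)
print(statement)
```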
I have to compare comma separated values with a column in the table and find out which values are not in database. [kind of master data validation]. Please have a look at the sample data below:
table data in database:
id name
1 abc
2 def
3 ghi
SQL part :
Here I am getting a comma-separated list like ('abc','def','ghi','xyz').
Now xyz is an invalid value, so I want to take that value and return it as output saying "invalid value".
It is possible if I split those values, put them in a temp table, loop through each value and compare them one by one.
But is there a more efficient way to do this?
I'm not sure if I got the question right; however, I would personally be trying to get to something like this:
SELECT
D.id,
CASE
WHEN B.Name IS NULL THEN D.name
ELSE 'invalid value'
END
FROM
data AS D
LEFT JOIN badNames B ON B.Name = D.Name
-- with a case-insensitive collation, the equals sign is enough
There is one table with bad names, or invalid values if you prefer. This can be a temporary table as well, depending on usage (black-listed words should be a table, ad hoc invalid values provided by a service should be a temp table, etc.).
NOTE: The select above can be nested in a view, so the data remain as they were, yet you gain the correctness information. Otherwise I would create a cursor inside a function that would go through the select like the one above and alter the original data, if that is the goal...
It sounds like you just need a NOT EXISTS / LEFT JOIN, as in:
SELECT tmp.InvalidValue
FROM dbo.HopeThisIsNotAWhileBasedSplit(@CSVlist) tmp
WHERE NOT EXISTS (
SELECT *
FROM dbo.Table tbl
WHERE tbl.Field = tmp.InvalidValue
);
Of course, depending on the size of the CSV list coming in, the number of rows in the table you are checking, and the style of splitter you are using, it might be better to dump the CSV to a temp table first (as you mentioned doing in the question).
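The temp table variant can be sketched end to end as follows. SQLite in memory stands in for the real database, the split happens in the calling code rather than in a SQL splitter function, and the table names master and incoming are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table master (id integer, name text);
    insert into master values (1, 'abc'), (2, 'def'), (3, 'ghi');
    create temp table incoming (name text);
""")

# Split the incoming list once and load it set-based, with no per-value loop in SQL.
csv_list = "abc,def,ghi,xyz"
conn.executemany(
    "insert into incoming values (?)",
    [(value.strip(),) for value in csv_list.split(",")],
)

invalid = [row[0] for row in conn.execute("""
    select i.name
    from incoming i
    where not exists (select 1 from master m where m.name = i.name)
""")]
print(invalid)  # -> ['xyz']
```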
Try the following query:
SELECT SplitedValues.name,
CASE WHEN YourTable.Id IS NULL THEN 'invalid value' ELSE NULL END AS Result
FROM SplitedValues
LEFT JOIN yourTable ON SplitedValues.name = YourTable.name