Select column from Hive if exists - hive

Is there a way to conditionally select each column only if the column exists in Hive?
Here is my pseudo-hql:
SELECT attr1 IF EXISTS, attr2 IF EXISTS, attr3 IF EXISTS
FROM some_table;
If attr1 & attr3 exist in the table, but attr2 does not exist, this should return to me all the rows from attr1 & attr3 without complaining about the absence of attr2. This syntax does NOT work, and Hive is very restrictive about inner queries too so I don't want to go that route unless necessary.

There is no direct way to do this in a single query.
You can, however, fetch the table's column list in one of the following ways and build your logic on the result:
1) Use the Hive metastore client (HiveMetaStoreClient.getFields) to get the fields.
2) Run DESCRIBE on the table and read the column names from its output.
Then iterate over the result and check whether each column in your query is present. If all of them are, execute the query as is; otherwise drop the missing columns from the SELECT list before executing it.
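The filtering step described above can be sketched in a few lines. This is a minimal illustration, not a Hive client: the helper name build_select and the hard-coded available_columns list are assumptions, standing in for whatever column names the metastore call or DESCRIBE actually returned.

```python
def build_select(table, wanted, available):
    """Return a SELECT over only the wanted columns that exist, or None."""
    # Hive column names are case-insensitive, so compare in lower case.
    existing = {name.lower() for name in available}
    present = [col for col in wanted if col.lower() in existing]
    if not present:
        return None  # none of the requested columns exist; caller decides
    return "SELECT {} FROM {}".format(", ".join(present), table)


# Hypothetical metadata: pretend DESCRIBE some_table reported these columns.
available_columns = ["attr1", "attr3", "load_date"]
query = build_select("some_table", ["attr1", "attr2", "attr3"], available_columns)
print(query)  # SELECT attr1, attr3 FROM some_table
```

The generated string would then be submitted to Hive through whichever client you already use.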

Related

Update on join in Hive

I'm trying to understand how to update a column in a Hive table, based on an id match with a different table.
That is, I have a table 'users', with columns 'UID' (string), 'isVerified' (boolean) and a lot more columns. Then I have a second table 'users_verified', with just 1 column 'UID' (string). I'm trying to do something to the effect of
UPDATE users SET isVerified = 1
WHERE UID in (SELECT UID from users_verified);
However, neither this nor UPDATE ON JOIN queries seem to be supported by Hive, and it seems I need to use an INSERT OVERWRITE statement instead. Can anyone give me an example of how that might work?

Get Id from a conditional INSERT

For a table like this one:
CREATE TABLE Users(
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
What would be the correct one-query insert for the following operation:
Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.
I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.
It seems that RETURNING id and ON CONFLICT do not belong together.
The UPSERT implementation is hugely complex to be safe against concurrent write access. Take a look at this Postgres Wiki that served as a log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for the next release.
This is the crucial statement in the manual to explain your situation:
The syntax of the RETURNING list is identical to that of the output
list of SELECT. Only rows that were successfully inserted or updated
will be returned. For example, if a row was locked but not updated
because an ON CONFLICT DO UPDATE ... WHERE clause condition was not
satisfied, the row will not be returned.
Bold emphasis mine.
For a single row to insert:
Without concurrent write load on the same table
WITH ins AS (
INSERT INTO users(name)
VALUES ('new_usr_name') -- input value
ON CONFLICT(name) DO NOTHING
RETURNING users.id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM users -- 2nd SELECT never executed if INSERT successful
WHERE name = 'new_usr_name' -- input value a 2nd time
LIMIT 1;
With possible concurrent write load on the table
Consider this instead (for single row INSERT):
Is SELECT or INSERT in a function prone to race conditions?
To insert a set of rows:
How to use RETURNING with ON CONFLICT in PostgreSQL?
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
All three with very detailed explanation.
For a single row insert and no update:
with i as (
insert into users (name)
select 'the name'
where not exists (
select 1
from users
where name = 'the name'
)
returning id
)
select id
from users
where name = 'the name'
union all
select id from i
The manual says this about the primary query and the WITH subqueries:
The primary query and the WITH queries are all (notionally) executed at the same time
That sounds like "same snapshot" to me, but I'm not sure, since I don't know what "notionally" means in that context.
But there is also:
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot
If I understand correctly, the "same snapshot" part prevents a race condition. But again, I'm not sure whether "all the statements" refers only to the statements in the WITH subqueries, excluding the main query. To avoid any doubt, move the select from the previous query into a WITH subquery:
with s as (
select id
from users
where name = 'the name'
), i as (
insert into users (name)
select 'the name'
where not exists (select 1 from s)
returning id
)
select id from s
union all
select id from i
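For comparison, the same "insert if new, then return the id" behaviour can be tried out locally with Python's built-in sqlite3 module. This is only a sketch of the idea, not the single-statement PostgreSQL solution above: it uses SQLite's INSERT OR IGNORE followed by a SELECT inside one transaction, and the function name get_or_create_user_id is an assumption made for the example.

```python
import sqlite3


def get_or_create_user_id(conn, name):
    """Insert name if it is new, then return the id of its row."""
    with conn:  # commits on success, rolls back on an exception
        # INSERT OR IGNORE skips the row when UNIQUE(name) would be violated.
        conn.execute("INSERT OR IGNORE INTO users(name) VALUES (?)", (name,))
        row = conn.execute(
            "SELECT id FROM users WHERE name = ?", (name,)
        ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

first = get_or_create_user_id(conn, "alice")  # inserts a new row
again = get_or_create_user_id(conn, "alice")  # finds the existing row
other = get_or_create_user_id(conn, "bob")    # inserts a second row
print(first == again, first != other)  # True True
```

Because this takes two statements, it says nothing about how PostgreSQL behaves under concurrent writers; it only shows the intended result for a single connection.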

Push deleted items into temporary table after MERGE

A simple question about the INTO clause. When I run the statement below, the rows get pushed into CustomersBackup2013 whether the table exists or not.
SELECT *
INTO CustomersBackup2013
FROM Customers;
However, when I try using the INTO clause in a MERGE like
MERGE TargetTable tt
USING SyncTable st on <condition>
.
.
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT deleted.* INTO #Sometemptable;
I get an error saying invalid object name '#Sometemptable'.
Isn't it supposed to create the table if it does not exist? Is there something I am doing wrong?
Is there any way I can modify the clause to push items into #Sometemptable?
No, the table has to exist for the OUTPUT clause to work.
First create the #temp table and then you can output deleted values in it. Columns in the temp table must match columns from output by ordinal and type.
SELECT ... INTO creates the target table as part of the statement; it is much like Oracle's CREATE TABLE AS SELECT.
MERGE only inserts, updates, or deletes rows; it never creates a table.

Validate Data in SQL Server Table

I am trying to validate the data present in a SQL Server table using a stored procedure.
In one of the validation rules, I have to check whether the value of a particular column is present in another table.
Suppose I have a staging table with the following columns: Cat_ID, Amount, SRC_CDE
I have a 'maintable' with the following columns: CatID, Cat_Name
I have to validate, for each row, whether the Cat_ID present in the staging table exists in the 'maintable'.
I am using the following statement to validate:
if ((select count(*) from maintable where CatID = @Cat_id) > 0)
-- Do something if data present
I want to know if there is a better way of doing this than running a select query for every row.
Can I use some sort of array into which I fetch all the CatID values from maintable and then check against it, instead of using a select query?
Thanks
Use a left join to list all the invalid rows in one set-based query:
select
staging.*
from
staging
left join maintable
on staging.Cat_ID = maintable.CatID
where maintable.CatID is null
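To see what the left join returns, here is a small self-contained run using Python's sqlite3 module as a stand-in for SQL Server. The table and column names come from the question; the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE maintable (CatID INTEGER PRIMARY KEY, Cat_Name TEXT);
    CREATE TABLE staging (Cat_ID INTEGER, Amount REAL, SRC_CDE TEXT);
    -- Invented sample data: categories 3 and 9 are missing from maintable.
    INSERT INTO maintable VALUES (1, 'Food'), (2, 'Fuel');
    INSERT INTO staging VALUES
        (1, 10.0, 'A'), (3, 5.5, 'B'), (2, 7.25, 'A'), (9, 1.0, 'C');
""")

# Staging rows with no matching maintable row get NULL for m.CatID,
# so the WHERE clause keeps exactly the invalid rows.
invalid_rows = conn.execute("""
    SELECT s.Cat_ID, s.Amount, s.SRC_CDE
    FROM staging AS s
    LEFT JOIN maintable AS m ON s.Cat_ID = m.CatID
    WHERE m.CatID IS NULL
    ORDER BY s.Cat_ID
""").fetchall()

print(invalid_rows)  # [(3, 5.5, 'B'), (9, 1.0, 'C')]
```

One query validates the whole staging table, which avoids issuing a separate lookup per row.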

Query for first occurrence of null value in SQLite

I have a table which I dynamically fill with some data I want to create some statistics for. I have one value which has some values following a certain pattern, so I created an additional column where I map the values to other values so I can group them.
Now before I run my statistics, I need to check if I have to remap these values which means that I have to check if there are null values in that column.
I can do a select like this:
select distinct 1
from my_table t
where t.status_rd is not null
;
The disadvantage is that although this returns exactly one row, it has to perform a full select. Is there some way to stop the select at the first match? I'm not interested in the exact row, because when there is at least one row I have to run an update on all of them anyway, but I would like to avoid running the update unnecessarily every time.
In Oracle I would do it with rownum, but this doesn't exist in SQLite:
select 1
from my_table t
where t.status_rd is not null
and rownum <= 1
;
Use LIMIT 1 to select the first row returned:
SELECT 1
FROM my_table t
WHERE t.status_rd IS NULL
LIMIT 1
Note: I changed the where clause from IS NOT NULL to IS NULL based on your problem description. This may or may not be correct.
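As a rough sketch of how the LIMIT 1 check could gate the update, here is a version using Python's sqlite3 module. The helper name has_null_status, the extra status column, and the sample rows are assumptions for the example; only my_table and status_rd come from the question.

```python
import sqlite3


def has_null_status(conn):
    """Return True if at least one row still has a NULL status_rd."""
    row = conn.execute(
        "SELECT 1 FROM my_table WHERE status_rd IS NULL LIMIT 1"
    ).fetchone()
    return row is not None  # fetchone() gives None when no row matched


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (status TEXT, status_rd TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a1", "A"), ("b7", None)],  # invented rows; the second is unmapped
)

needs_remap = has_null_status(conn)
if needs_remap:
    # Placeholder for the real remapping update.
    conn.execute("UPDATE my_table SET status_rd = 'B' WHERE status_rd IS NULL")
after_update = has_null_status(conn)
print(needs_remap, after_update)  # True False
```

With LIMIT 1 the query can stop at the first matching row instead of collecting every match.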