How to retrieve ambiguous columns from a hive table using subquery?

Main table:
CREATE EXTERNAL TABLE user(language STRING,snapshot_time STRING,products STRUCT<id:STRING,name:STRING>,item STRUCT<quantity:ARRAY<STRUCT<name:STRING>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/user/input/sample';
I'm trying to insert two columns with the same name, "products.name" and "A.name", into the user_prod_info table. Because the column names are identical, I get the error "Ambiguous column reference text in q".
Insert command:
INSERT OVERWRITE TABLE user_prod_info
SELECT q.* FROM (
SELECT row_number() OVER (PARTITION BY products.id ORDER BY snapshot_time DESC) AS temp_row_num,
language,
snapshot_time,
products.id,
products.name,
A.name
FROM user as raw
LATERAL VIEW EXPLODE(item.quantity) quantity as A
) q WHERE temp_row_num == 1;
This command cannot resolve the field because there are two "name" fields: one in "products" and the other in "A".
I tried aliasing the second one with "A.name as name1", and the insert then runs without errors. However, a single record ends up stored as 3 rows, some of which contain nulls.
I'm stuck at this point. Can anyone help?

Related

Max source dates from all joined tables during data warehouse incremental build

I have a data warehouse query that builds a fact table by joining 14 source tables. Each source table has a source_timestamp field to indicate the time the record was inserted or updated. I need to pull the max source_timestamp for each row of the query result from each of the 14 source tables. This will allow me to know the max update date for each row of the fact table.
I wanted to do something like this for the last field in the query...
(
SELECT MAX(Source_Timestamp)
FROM (
VALUES a.source_timestamp, b.source_timestamp, c.source_timestamp, ...
) AS UpdateDate(Source_Timestamp)
) AS LastUpdateDate
However, I get an incorrect syntax error because the subquery doesn't know a., b., or c. in the query context. I was hoping the VALUES clause would help me out but apparently not.
Any ideas on how to accomplish this?

It was my fault for not being more careful with the coding. I should've guessed from the fact that it was a syntax error. I needed to enclose each of the items in the VALUES clause in () like:
(
SELECT MAX(Source_Timestamp)
FROM (
VALUES (a.source_timestamp), (b.source_timestamp), (c.source_timestamp),(...)
) AS UpdateDate(Source_Timestamp)
) AS LastUpdateDate

Get actual target table insert count

I'm inserting data into a Hive external table in append mode. Every time I insert records, I want to get the count of rows that were actually inserted into the table. Is there any way to find this information in a Hive log file?

I'm not aware of a Hive property that reports this, but there is a workaround:
Add a timestamp column to your table.
Self-join the table on the timestamp column.
Count the latest records inserted into the table. A sample query:
SELECT count(1) from (
SELECT tbl_alias.* FROM test_table tbl_alias JOIN
( select max(timestamp_date) as max_timestamp_date FROM test_table) max_timestamp_date_table ON
tbl_alias.timestamp_date=max_timestamp_date_table.max_timestamp_date ) outer_table;
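The workaround above can be sketched end to end. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Hive so that it can be run anywhere, and it assumes every row of one insert batch carries the same timestamp value. test_table and timestamp_date come from the query above; the payload column and the count_latest_batch helper are hypothetical names.

```python
import sqlite3


def count_latest_batch(conn):
    """Count the rows whose timestamp_date equals the table-wide maximum."""
    row = conn.execute(
        """
        SELECT count(1) FROM (
            SELECT tbl_alias.*
            FROM test_table tbl_alias
            JOIN (SELECT max(timestamp_date) AS max_timestamp_date
                  FROM test_table) max_timestamp_date_table
              ON tbl_alias.timestamp_date = max_timestamp_date_table.max_timestamp_date
        ) outer_table
        """
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# payload is a hypothetical data column; only timestamp_date matters here.
conn.execute("CREATE TABLE test_table (payload TEXT, timestamp_date TEXT)")

# First append: two rows sharing one batch timestamp.
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [("a", "2020-01-01 10:00:00"), ("b", "2020-01-01 10:00:00")],
)
# Second append: three rows sharing a later batch timestamp.
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [("c", "2020-01-02 09:30:00"),
     ("d", "2020-01-02 09:30:00"),
     ("e", "2020-01-02 09:30:00")],
)

latest_count = count_latest_batch(conn)
print(latest_count)
```

If rows within one insert get different timestamps, this count only covers the rows with the very latest value, so the batch timestamp should be fixed once per insert.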

SQL how to create and import only specific columns from table A to a new table, table B

Table A has many fields, such as referenceid, amount, timestamp, remarks, status, balancebefore, balanceafter, frmsisdn, tomsisdn, id, and so on.
I want to create a new table, Table B, based on Table A (same column names, data types, and so on), but I only need specific columns from Table A.
I tried select * into TableB from TableA where 1 = 2, but it fails with ORA-00905: missing keyword. I am using TOAD.
Thank you.

In Oracle, the correct syntax is create table as. SELECT INTO is used primarily in SQL Server and Sybase.
create table tableb as
select . . .
from tableA;
Only include the where clause if you don't actually want to insert any rows.

In MySQL the syntax is the same as Oracle's.
Notice that the new table does not contain any constraints from the original table (indexes, keys, etc.)
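As a rough, runnable illustration of the create table as pattern described in these answers, the sketch below uses Python's built-in sqlite3 module, which accepts the same CREATE TABLE ... AS SELECT form. SQLite is an assumption made for portability, not the asker's Oracle setup; the column subset and the tableb_empty name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A cut-down stand-in for Table A from the question.
conn.execute(
    "CREATE TABLE tablea (referenceid INTEGER, amount REAL, remarks TEXT, status TEXT)"
)
conn.execute("INSERT INTO tablea VALUES (1, 9.5, 'first', 'OK')")

# Copy only the chosen columns, together with their rows.
conn.execute("CREATE TABLE tableb AS SELECT referenceid, amount FROM tablea")

# Copy only the structure: WHERE 1 = 2 matches no rows.
conn.execute(
    "CREATE TABLE tableb_empty AS SELECT referenceid, amount FROM tablea WHERE 1 = 2"
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
tableb_columns = [row[1] for row in conn.execute("PRAGMA table_info(tableb)")]
tableb_rows = conn.execute("SELECT count(*) FROM tableb").fetchone()[0]
empty_rows = conn.execute("SELECT count(*) FROM tableb_empty").fetchone()[0]
print(tableb_columns, tableb_rows, empty_rows)
```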

Validate Data in SQL Server Table

I am trying to validate the data present in SQL Server table using a stored procedure.
In one of the validation rules, I have to check whether the value of a particular column is present in another table.
Suppose I have a staging table with the following columns: Cat_ID, Amount, SRC_CDE
I have a 'maintable' with the following columns: CatID, Cat_Name
For each row, I have to validate whether the Cat_ID in the staging table exists in the 'maintable'.
I am using the following statement to validate
if((Select count(*) from maintable where CatID = @Cat_id) > 0)
-- Do something if data present
I want to know whether there is a better way to do this than running a select query for every row.
Can I use some sort of array, fetching all the CatID values from maintable once and checking against it instead of running a select query each time?
Thanks

Use a left join to list all the invalid rows in a single query:
select
    staging.*
from
    staging
    left join maintable
        on staging.Cat_ID = maintable.CatID
where maintable.CatID is null
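To see the left-join check at work on sample data, here is a small sketch using Python's built-in sqlite3 module in place of SQL Server (an assumption made so the example is self-contained). The table and column names follow the question; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE maintable (CatID INTEGER, Cat_Name TEXT);
    CREATE TABLE staging (Cat_ID INTEGER, Amount REAL, SRC_CDE TEXT);
    INSERT INTO maintable VALUES (1, 'Books'), (2, 'Music');
    INSERT INTO staging VALUES
        (1, 10.0, 'A'), (3, 5.0, 'B'), (2, 7.5, 'C'), (4, 1.0, 'D');
    """
)

# Staging rows with no match in maintable come back with maintable.CatID
# set to NULL by the left join, so filtering on NULL keeps only invalid rows.
invalid_rows = conn.execute(
    """
    SELECT staging.*
    FROM staging
    LEFT JOIN maintable
        ON staging.Cat_ID = maintable.CatID
    WHERE maintable.CatID IS NULL
    ORDER BY staging.Cat_ID
    """
).fetchall()
print(invalid_rows)
```

The whole staging table is validated in one set-based query, which avoids issuing a separate select for each row.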

SQL: I need to take two fields I get as a result of a SELECT COUNT statement and populate a temp table with them

So I have a table which has a bunch of information and a bunch of records. But there will be one field in particular I care about, in this case #BegAttField# where only a subset of records have it populated. Many of them have the same value as one another as well.
What I need to do is get a count (minus 1) of all duplicates, then populate the first record in the bunch with that count value in a new field. I have another field I call BegProd that will match #BegAttField# for each "first" record.
I'm just stuck as to how to make this happen. I may have been on the right path, but who knows. The SELECT statement gets me two fields and as many records as there are unique #BegAttField# values. But once I have them, I haven't been able to work with them.
Here's my whole set of code, trying to use a temporary table and SELECT INTO to try and populate it. (Note: the fields with # around the names are variables for this 3rd party app)
CREATE TABLE #temp (AttCount int, BegProd varchar(255))
SELECT COUNT(d.[#BegAttField#])-1 AS AttCount, d.[#BegAttField#] AS BegProd
INTO [#temp] FROM [Document] d
WHERE d.[#BegAttField#] IS NOT NULL GROUP BY [#BegAttField#]
UPDATE [Document] d SET d.[#NumAttach#] =
SELECT t.[AttCount] FROM [#temp] t INNER JOIN [Document] d1
WHERE t.[BegProd] = d1.[#BegAttField#]
DROP TABLE #temp
Unfortunately I'm running this script through a 3rd party database application that uses SQL as its back-end. So the errors I get are simply: "There is already an object named '#temp' in the database. Incorrect syntax near the keyword 'WHERE'. "

Comment out the CREATE TABLE statement. The SELECT INTO creates that #temp table.
