I have the two tables below, X and Y:
X table
column : Address
values : "ABCDHongKidoJapan589068".
There will be 100 records like this.
Y table
column :Name
Values :Hong
Timm
Agent
Lane.
I want to check whether any of the Y values occurs inside an X value. If one does, the matching Y value should be stored in a new column named Addr_nm in the X table.
Can this be achieved in Hive?
I tried it in Python, but I am required to do it in Hive.
A numeric column should be extended to hold multiple values, i.e. reference some different entity. SQL only (Postgres specifically if no standard solution available).
Schema now:
Table X with columns ID, VAL, STUFF
Table Y with columns ID, VAL1, VAL2
What I want to achieve:
Table X with columns ID, YID, STUFF
Table Y won't be altered (nor will its existing data be touched)
Table Y gets inserts for all rows of table X where X.VAL should be inserted as Y.VAL1. Y.ID auto-incremented, Y.VAL2 may remain NULL. Table X should then be updated to hold Y's ID as foreign key X.YID instead of the actual value X.VAL that is now stored in Y.VAL1.
Somehow I think it has to be possible to achieve that with a clean SQL-only solution. What I've found so far:
create a PL/pgSQL script: loop over table X, insert each row into table Y, return the generated ID, and update table X with it
plain SQL: get the number of entries in table Y, INSERT INTO Y with SELECT FROM X ... ORDER BY ID, INSERT INTO X with SELECT FROM Y ... skipping the number of entries that have been there before so the order should remain stable. I really don't like that solution. Sounds dirty to me.
Any suggestions? Or is it better to go with PL/pgSQL?
TIA
There is a third option: a single SQL statement. Postgres allows DML within a CTE. So create a CTE that performs the insert and returns the generated id. Use the returned id in the main query which updates the original table. This then does what you are looking for in a single SQL statement.
with cte as (
    insert into y (val1)
    select val
    from x
    returning y.id, y.val1
)
update x
set val = cte.id
from cte
where x.val = cte.val1;
Then assuming you want to maintain referential integrity:
alter table x
    add constraint x2y_fk
    foreign key (val)
    references y(id);
See the demo. Note: the demo copies both val and stuff from table x into table y. This was strictly for demonstration purposes and is not necessary.
I am using SQL Server 2008. Assume I have a table structured like this (the below denotes the column names / types):
ID - Int (Primary Key) [This is set to autoincrement]
X - varchar(MAX)
Y - varchar(MAX)
Z - varchar(MAX)
I am trying to build a generic query that, based on the value of one of the columns, copies the values of existing records into new records, changes some column values, and naturally increments the ID. Consider the following pseudocode (I couldn't think of a way to distill this otherwise :/):
1)
SELECT *
FROM TABLE
WHERE X = "Hello"
INTO TEMPTABLE
2)
COPY RESULTS
FROM TEMPTABLE
INTO TABLE
SET X = "HI HI HI" AND SET Y = "HI!"
The expected result from the above would be the result set from 1) feeding the record creation in 2). The value of Z should be naturally copied over since we aren't setting it otherwise.
INSERT INTO foobar (x, y, z)
SELECT 'HI HI HI', 'HI!', z FROM foobar WHERE x = 'Hello'
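To see the statement above in action, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for SQL Server 2008 here, and the sample rows are made up for illustration; only the foobar table name and the INSERT ... SELECT itself come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema mirroring the question: auto-incrementing id plus x, y, z.
conn.execute(
    "CREATE TABLE foobar ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, x TEXT, y TEXT, z TEXT)"
)

# Hypothetical sample data: two rows match x = 'Hello', one does not.
conn.executemany(
    "INSERT INTO foobar (x, y, z) VALUES (?, ?, ?)",
    [("Hello", "a", "z1"), ("Other", "b", "z2"), ("Hello", "c", "z3")],
)

# The answer's statement: copy z from matching rows, override x and y,
# and let the database assign new ids.
conn.execute(
    "INSERT INTO foobar (x, y, z) "
    "SELECT 'HI HI HI', 'HI!', z FROM foobar WHERE x = 'Hello'"
)

copied = conn.execute(
    "SELECT id, x, y, z FROM foobar WHERE x = 'HI HI HI' ORDER BY id"
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM foobar").fetchone()[0]
print(copied, total)
```

The original 'Hello' rows are left in place; the copies get fresh ids because id is not listed in the INSERT column list.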
What would the script be if I wanted to update column B with a different value depending on the current value of column A? Column A currently holds various values, and the final result needs to look like the table below.
E.g.
Column(A) Column(B)
0440 A
04470 A
045102 A
030532 B
03580 B
03240 C
etc.
Thank you.
I have a database full of log entries that looks like this:
CREATE TABLE event_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
message VARCHAR(1024),
sent INTEGER DEFAULT 0
);
How would you go about limiting the size of this table, as in the pseudocode below, where x = the number of rows to insert and y = an arbitrary maximum number of rows for the table?
On inserting x number of rows:
{
if (total rows in table + x > y)
{
remove x number of rows from the start of the table (i.e. they have lowest id numbers)
}
insert the new rows at the end of the table
}
i.e. limiting the table to y number of rows max.
Many thanks!
I believe you can achieve this with a trigger. How you write it depends on the database you are using. Have a look at the Wikipedia page on database triggers.
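As a rough illustration of the trigger idea, here is a sketch that assumes the database is SQLite (the AUTOINCREMENT syntax in the question suggests it, but that is a guess) and drives it from Python's sqlite3 module. The trigger name event_log_cap and the limit MAX_ROWS are made up for the example. Instead of deleting before the insert as in the pseudocode, the trigger runs after each inserted row and deletes everything except the newest MAX_ROWS rows.

```python
import sqlite3

MAX_ROWS = 5  # hypothetical value for "y", the maximum number of rows to keep

conn = sqlite3.connect(":memory:")

# Trigger bodies cannot take bound parameters, so the (integer) limit is
# formatted into the DDL text.
conn.executescript(f"""
CREATE TABLE event_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    message VARCHAR(1024),
    sent INTEGER DEFAULT 0
);

-- After every inserted row, drop all rows that are not among the
-- MAX_ROWS highest ids (i.e. the oldest entries).
CREATE TRIGGER event_log_cap AFTER INSERT ON event_log
BEGIN
    DELETE FROM event_log
    WHERE id NOT IN (
        SELECT id FROM event_log ORDER BY id DESC LIMIT {MAX_ROWS}
    );
END;
""")

# Insert 8 rows; the trigger fires once per row.
conn.executemany(
    "INSERT INTO event_log (message) VALUES (?)",
    [(f"event {n}",) for n in range(1, 9)],
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM event_log ORDER BY id")]
print(remaining_ids)
```

Because the cleanup happens per row, the table never holds more than MAX_ROWS rows once a statement finishes, regardless of how many rows that statement inserted.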
Suppose you have two tables in PostgreSQL. Table A has field x, which is of type character varying and has a lot of duplicates. Table B has fields y, z, and w. y is a serial column, z has the same type as x, and w is an integer.
If I issue this query:
INSERT INTO B
SELECT DISTINCT ______, A.x, COUNT(A.x)
FROM A
WHERE x IS NOT NULL
GROUP BY x;
I get an error regardless of what I have in ______. I've even gotten as exotic as CAST(NULL as INTEGER), but that just gives me this error:
a null value in column "id" violates not-null constraint
Is there a simple solution?
You are allowed and even encouraged to specify your columns when using INSERT (and you really should always specify the columns):
insert into b (z, w)
select x, count(x)
from a
where x is not null
group by x
And I don't see the point of distinct when you're already grouping by x so I dropped that; I also dropped the column prefixes since they aren't needed and just add noise to the SQL.
If you leave a column out of the INSERT column list, it gets its default value, and for a serial column that default is the next sequence value, which is what you're looking for.
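The effect of leaving the key column out can be tried quickly with the sketch below. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL, so INTEGER PRIMARY KEY AUTOINCREMENT plays the role of the serial column y; the sample values in table a are made up. One difference to keep in mind: SQLite also generates a key when NULL is inserted explicitly, whereas PostgreSQL rejects it with the not-null error quoted in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Stand-ins for the question's tables; y auto-generates like a serial column.
CREATE TABLE a (x TEXT);
CREATE TABLE b (y INTEGER PRIMARY KEY AUTOINCREMENT, z TEXT, w INTEGER);

-- Hypothetical data with duplicates and one NULL.
INSERT INTO a (x) VALUES ('red'), ('red'), ('blue'), (NULL), ('red');

-- Only z and w are listed, so y falls back to its generated default.
INSERT INTO b (z, w)
SELECT x, COUNT(x)
FROM a
WHERE x IS NOT NULL
GROUP BY x;
""")

rows = conn.execute("SELECT y, z, w FROM b ORDER BY z").fetchall()
print(rows)
```

Each distinct non-null x ends up as one row in b with its count in w and a generated key in y.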