Case-insensitive comparison in Hive

I have a requirement to do case-insensitive joins across the system, and I don't want to apply upper/lower functions in every query.
I tried setting TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci') at the table level, but the comparison is still case-sensitive. Please see below:
drop table test.caseI;
create table test.caseI
(name string, id int)
TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci');
insert into test.caseI values ('hj',1);
drop table test.caseI_2;
create table test.caseI_2
(name string, id int)
TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci');
insert into test.caseI_2 values ('HJ',1);
select * from test.caseI i
inner join test.caseI_2 i2 on i.name=i2.name;
--No Result
I also tried the encoding 'SQL_Latin1_General_CP1_CI_AI' but got the same result as above.
Any help would be appreciated, thanks!
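One possible workaround, sketched under assumptions: as far as I know, Hive has no table-level or column-level collation setting, and 'utf8mb4_unicode_ci' and 'SQL_Latin1_General_CP1_CI_AI' are collation names from MySQL and SQL Server rather than character encodings. If that holds, the normalization can at least be moved out of the queries and into views that expose a lowercased join key. The view names (caseI_v, caseI_2_v) and the column name (name_key) are hypothetical.

```sql
-- Hypothetical views: each exposes a lowercased copy of name as name_key,
-- so queries join on the key and never call lower() themselves.
CREATE VIEW test.caseI_v AS
SELECT name, lower(name) AS name_key, id
FROM test.caseI;

CREATE VIEW test.caseI_2_v AS
SELECT name, lower(name) AS name_key, id
FROM test.caseI_2;

-- 'hj' and 'HJ' both normalize to 'hj', so this join should return the pair.
SELECT i.name, i.id, i2.name AS name_2, i2.id AS id_2
FROM test.caseI_v i
INNER JOIN test.caseI_2_v i2 ON i.name_key = i2.name_key;
```

The function call still happens, but only in one place per table. Whether that meets the "no upper/lower" requirement depends on why the functions were ruled out.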

Related

Matrix table index SQL Server 2008

I have a table with two columns built from another table of names, one identity and one a name like this:
ID---Name
1----Mike
2----Jeff
3----Robert
...down to however many
Could be 10 rows, could be 100. This will vary depending on input from other tables that are always changing but never be over 160 or so.
Now, pairings of names will have some meaning and thus a decimal data type score will be associated with said pairing (how at this point doesn’t matter, just need to build it for now...numbers just illustrative). I envision a matrix kind of like this:
ID------Name------Mike-------Jeff--------Robert-------- ...out to however many
1 -------Mike-------NULL------100.1------5.4-------- ...out to however many
2 -------Jeff---------100.1------NULL-----21.23--------- ...out to however many
3 ------Robert-------5.4--------21.23-----NULL---------...out to however many
…down to however many happen to be in the first table…
Maybe this isn’t the most optimal way to go (yes, I know there are duplicates in the table, but I plan to structure the queries so that the duplicates are ignored), but at this point I am not aware of many viable options. After searching around, I thought maybe I wanted a pivot, but that doesn’t seem to fit what I have here because I’m leaving the names in the column and also using them as column heads for a paired score. Then I thought maybe I wanted to store a variable as the value of each row and then add them as the columns. That was no help.
My latest iteration was creating a temp table as an exact copy with an identity column, then trying to select a specific name by its identity and looping through them. But I can’t even seem to grab the first name and make it a column name in addition to a row value under the Name column. See below:
--create a table of names with an identity column
CREATE TABLE myTable2
(
ID INT IDENTITY(1,1),
Name VARCHAR(5),
);
--add names to the table from a different table
INSERT INTO myTable2 (Name)
SELECT Name
FROM myTable1
--create a temp table with the same values
SELECT ID, Name
INTO #new
FROM myTable2
GROUP BY ID, Name
--insert name from first row as a column head
INSERT INTO myTable2 (SELECT Number FROM #new WHERE ID =1)
So, in the last bit there, “INSERT INTO”, I want to copy the names, in this instance “Mike”, and make each one ALSO a column head in the same table where it is a row (like in my second table). I get an error message that the syntax is not correct for the statement. Why isn’t this allowed? How can I get it to do what I want?
It has also been suggested, by someone who knows way more about this stuff than me, that instead of building the table as a matrix I build it as below. It is possible to get rid of the duplicates this way, and I would, except I have no idea where to even begin doing this…
Name1-----------Name2-----------Calculated Value
Mike--------------Mike-------------NULL
Jeff---------------Mike-------------100.1
Robert-------------Mike-------------5.4
Mike--------------Jeff-------------100.1
Jeff----------------Jeff-------------NULL
Robert------------Jeff-------------21.23
Mike--------------Robert-----------5.4
Jeff---------------Robert-----------21.23
Robert------------Robert-----------NULL
...etc
Any help, suggestions, or pointers in the right and most appropriate direction would be greatly appreciated!
EDIT: Here's how I solved my problem. It looks like the Cartesian product was the way to go. Thanks @Alex Kudryashev.
--create a table of cross joined names
CREATE TABLE cartNames
(
Name1 VARCHAR(5),
Name2 VARCHAR(5),
);
--create two temporary tables from a source table of names
SELECT Name AS Name1
INTO #name1
FROM names
GROUP BY Name
SELECT Name AS Name2
INTO #Name2
FROM names
GROUP BY Name
--populate the Cartesian table
INSERT INTO cartNames
SELECT * FROM #name1 CROSS JOIN #name2
--get rid of the temp tables
DROP TABLE #Name1
DROP TABLE #Name2
--add columns and populate calculated scores
---
It looks like you want to create a Cartesian product. There is a very easy way to do so.
-- some_udf stands for your own scalar function that computes the score
declare @tbl table(name varchar(10))
insert @tbl(name) values('Mike'),('Jeff'),('Robert')
select t1.name name1,t2.name name2, dbo.some_udf(t1.name,t2.name) calc_value
from @tbl t1 cross join @tbl t2
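The question also mentions wanting to drop the duplicate pairings (Mike/Jeff and Jeff/Mike carry the same score). A minimal sketch of one way to do that, assuming the names are unique: replace the cross join with a self-join that keeps a pair only when the first name sorts before the second. The table variable @names is hypothetical.

```sql
-- Hypothetical sample data matching the names used in the question.
DECLARE @names TABLE (Name VARCHAR(10));
INSERT INTO @names (Name) VALUES ('Mike'), ('Jeff'), ('Robert');

-- n1.Name < n2.Name keeps exactly one row per unordered pair
-- and also removes the self-pairings (Mike/Mike and so on).
SELECT n1.Name AS Name1, n2.Name AS Name2
FROM @names AS n1
INNER JOIN @names AS n2
    ON n1.Name < n2.Name;
```

For three names this leaves three pairs instead of the nine rows a full Cartesian product returns. A score column can then be added to the result in the same way as calc_value above.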

SQL Server - Select INTO statement stored in sys.tables

I know how to find the CREATE statement for a table in SQL Server but is there any place that stores the actual SQL code if I use SELECT INTO ... to create a table and if so how do I access it?
I see two ways of creating a table from a SELECT.
First: if you know the schema, you can declare a @Table variable and fill it with INSERT ... SELECT.
Second: you can create a temp table with SELECT ... INTO:
SELECT * INTO #TempTable FROM Customer
There are some limitations on the second choice:
- You need to drop the temp table afterwards.
- If a VARCHAR column in the temp table ends up with a maximum length of, say, 123 characters, and you later try to insert a longer value into the temp table, it will throw an error.
My recommendation is to always declare the table explicitly before using it; it makes the intention clear and improves readability.

cannot insert value NULL into column error shows wrong column name

I've added a new column(NewValue) to my table which holds an int and allows nulls. Now I want to update the column but my insert statement only attempts to update the first column in the table not the one I specified.
I basically start with a temp table that I put my initial data into and it has two columns like this:
create table #tempTable
(
OldValue int,
NewValue int
)
I then do an insert into that table and based on the information NewValue can be null.
Example data in #tempTable:
OldValue NewValue
-------- --------
34556 8765432
34557 7654321
34558 null
Once that's complete I planned to insert NewValue into the primary table like so:
insert into myPrimaryTable(NewValue)
select tt.NewValue from #tempTable tt
left join myPrimaryTable mpt on mpt.Id = tt.OldValue
where tt.NewValue is not null
I only want the NewValue to insert into rows in myPrimaryTable where the Id matches the OldValue. However when I try to execute this code I get the following error:
Cannot insert the value NULL into column 'myCode', table 'myPrimaryTable'; column does not allow nulls. INSERT fails.
But I'm not trying to insert into 'myCode', I specified 'NewValue' as the column but it doesn't seem to see it. I've checked NewValue and it is set to allow int and is set to allow null and it does exist on the right table in the right database. The column 'myCode' is actually the second column in the table. Could someone please point me in the right direction with this error?
Thanks in advance.
INSERT always creates new rows; it never modifies existing rows. If you skip specifying a value for a column in an INSERT, and that column has no DEFAULT bound to it and is not an identity column, that column will be NULL in the new row--thus your error. I believe you might be looking for an UPDATE instead of an INSERT.
Here's a potential query that might work for you:
UPDATE mpt
SET
mpt.NewValue = tt.NewValue
FROM
myPrimaryTable mpt
INNER JOIN #tempTable tt
ON mpt.Id = tt.OldValue -- really?
WHERE
tt.NewValue IS NOT NULL;
Note that I changed it to an INNER JOIN. A LEFT JOIN is clearly incorrect since you are filtering #tempTable for only rows with values, and don't want to update mpt where there is no match to tt--so LEFT JOIN expresses the wrong logical join type.
I put "really?" as a comment on the ON clause since I was wondering if OldValue is really an Id. It probably is--you know your table best. It just raised a mild red flag in my mind to see an Id column being compared to a column that does not have Id in its name (so if it is correct, I would suggest OldId as a better column choice than OldValue).
Also, I recommend that you never name a column just Id again--a column that holds the same thing should have the same name in every table in the database. When it comes to join time, you are also more likely to make mistakes when columns from different tables share a generic name. It is much better to follow the format of SomethingId in the Something table, instead of just Id. Correspondingly, the suggested old column name would be OldSomethingId.

Compare data between two tables with in single database

My requirement is to compare the data in two tables within one database and store the rows that do not match in a separate table, named relation data, in the same database.
How can I compare the data in these tables?
Are there any tools for the comparison, and can a tool store the non-matching data in a separate table?
I forgot to mention one thing: the two tables hold the same data but under different column names. For example, the first table has 20 columns and the second table has 50 columns, but only 4 columns hold matching data, with a different number of rows and different column names in each table. Based on matching the data in these columns, I need to find the rows and store them in another table.
As an alternative to writing a SQL script, you could export the full results from both tables to .csv files and then use WinMerge to compare the two:
http://winmerge.org/downloads/
I have used this technique in the past when comparing mass amounts of data and it has worked quite well.
This can be accomplished in T-SQL without much effort. However, in your question you were asking for a tool to accomplish this. If you are simply looking to purchase a tool, at my job we use the Redgate tools for deploying code from test to production, and I believe that with a little creativity you could get the SQL Data Compare tool to do what you are asking for.
If you select and compare these two tables, it will generate a change script from one to the other. If you only take the changes from one, save off the script, then come back, click on the arrow and take only the changes from the source the other way, you should have the uncommon attributes.
Try this query; I think it will work:
insert into relational(r1,r2,r3,....rn)
(select s1,s2,s3,...sn from
information info where info.informationcity not in (select customercity from customer)
and info.informationstate not in (select customerstate from customer) )
Assuming both tables have the same structure:
Quick and dirty?
;WITH cte AS (
SELECT 1 AS OriginTable, *
FROM OriginTable1
UNION SELECT 2 AS OriginTable, *
FROM OriginTable2
)
SELECT {put here the list of all your columns}
INTO [YourDeltaTable]
FROM cte
GROUP BY {put here the list of all your columns}
HAVING COUNT(*) = 1
You can use the following query to insert data into the target table by retrieving it from multiple tables:
insert into TargetTable(list_of_columns)
(select list of columns from
Table1 t1 join Table2 t2
on (t1.common_column != t2.common_column))
The source and target column lists must contain the same number of columns.
Here's a simple example that assumes your table structures are the same
DECLARE @a table (
val char(1)
);
DECLARE @b table (
val char(1)
);
INSERT INTO @a (val)
VALUES ('A'), ('B'), ('C');
INSERT INTO @b (val)
VALUES ('B'), ('C'), ('D'), ('E');
DECLARE @mismatches table (
val char(1)
);
INSERT INTO @mismatches (val)
SELECT val -- All those from @a
FROM @a
EXCEPT -- Where not in @b
SELECT val
FROM @b;
INSERT INTO @mismatches (val)
SELECT val -- All those from @b
FROM @b
EXCEPT -- Where not in @a
SELECT val
FROM @a;
SELECT *
FROM @mismatches
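The question says the two tables use different column names and only a few columns overlap. EXCEPT matches columns by position, not by name, so the same approach should still work if each side lists only the columns to compare, in the same order. A minimal sketch, assuming two comparable columns: the table variables and sample rows are hypothetical, and the column names are borrowed from the earlier answer.

```sql
-- Hypothetical stand-ins for the two source tables (only the comparable columns).
DECLARE @info TABLE (informationcity VARCHAR(20), informationstate CHAR(2));
DECLARE @customer TABLE (customercity VARCHAR(20), customerstate CHAR(2));

INSERT INTO @info VALUES ('Austin', 'TX'), ('Boston', 'MA');
INSERT INTO @customer VALUES ('Boston', 'MA'), ('Denver', 'CO');

-- Hypothetical target table for the rows that have no match.
DECLARE @relation TABLE (city VARCHAR(20), state_code CHAR(2));

-- Rows in @info with no matching (city, state) pair in @customer.
INSERT INTO @relation (city, state_code)
SELECT informationcity, informationstate FROM @info
EXCEPT
SELECT customercity, customerstate FROM @customer;

-- Rows in @customer with no matching (city, state) pair in @info.
INSERT INTO @relation (city, state_code)
SELECT customercity, customerstate FROM @customer
EXCEPT
SELECT informationcity, informationstate FROM @info;

SELECT city, state_code FROM @relation;
```

With this sample data, the Austin and Denver rows end up in @relation, while Boston appears on both sides and is left out. The paired columns need compatible data types for EXCEPT to compare them.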

Changing Column Datatype After Data Insertion

I'm currently trying to localize a database, and my strategy involves taking all localizable strings out of my various tables, and putting them into another table containing a StringID, a CultureID and the LocalizedString, which is then referenced within the original table by the StringID. The problem is that I need to change the datatype of the column containing the string from a varchar to an int and replace the string with its reference to the LocalizedStrings table.
I've already taken all my strings from the table and created entries in the LocalizedStrings table at this point using an INSERT INTO query. And my current efforts to solve my problem look like this:
SELECT column1, column2, ...
INTO TempTable
FROM OriginalTable
INNER JOIN LocalizedStrings
ON OriginalTable.StringColumn = LocalizedStrings.LocalizedString
ALTER TABLE OriginalTable
DROP COLUMN StringColumn
ALTER TABLE OriginalTable
ADD NameStringID int
INSERT INTO OriginalTable (NameStringID)
SELECT StringID FROM TempTable
DROP TABLE TempTable
However due to various nightmarish dependencies, I'm getting all kinds of exceptions trying to do this.
My question is, is there an easier way? I'd also considered just adding the new column and leaving the old one as a temporary workaround, but that's pretty messy.
ALTER TABLE OriginalTable
ADD NameStringID int
-- start a new batch so the UPDATE can see the new column
GO
update OT
set NameStringID = LS.StringID
from OriginalTable OT
join LocalizedStrings LS on LS.LocalizedString = OT.StringColumn
You will need to repeat this process for every child table if they also used the StringColumn.
You will also need to adjust all stored procedures, queries, and ORM mappings to use the new column.
Then when all have been changed, run
ALTER TABLE OriginalTable
DROP COLUMN StringColumn
And of course drop the column on the child tables too, if need be.
If you know that all of the values in your column are integers, you can cast the column to an integer and create another column on the fly. I am not sure I understand you correctly, but something similar to the following:
declare @test table(id varchar(50),name varchar(50))
insert into @test
select '1','Test 1'
insert into @test
select '2','Test 2'
select *, cast(id as int) as ConvertedToInt into #Result from @test
select * from #Result
drop table #Result