Compare one value of column A with all the values of column B in Hive HQL

Compare one value of column A with all the values of column B in Hive HQL - hive

I have two columns in one table say Column A and Column B. I need to search each value of Column A with All the values of column B each and every time and return true if the column A value is found in any of the rows of column B. How can i get this?
I have tried using the below command:
select column _A, column_B,(if (column_A =column_B), True, False) as test from sample;
If i use the above command, it is checking for that particular row alone. But I need true value, if a value of column A is found in any of the rows of column B.
How can i can check one value of column A with all the all the values of column B?
Or Is there any possibility to iterate and compare each value between two columns?

Solution
create temporary table t as select rand() as id, column_A, column_B from sample; --> Refer 1
select distinct t3.id,t3.column_A,t3.column_B,t3.match from ( --> Refer 3
select t1.id as id, t1.column_A as column_A, t1.column_B as column_B,--> Refer 2
if(t2.column_B is null, False, True) as match from t t1 LEFT OUTER JOIN
t t2 ON t1.column_A = t2.column_B
) t3;
Explanation
Create an identifier column to keep track of the rows in original table. I am using rand() here. We will take advantage of this to get the original rows in Step 3. Creating a temporary table t here for simplicity in next steps.
Use a LEFT OUTER JOIN with self to do your test that requires matching each column with another across all rows, yielding the match column. Note that here multiple duplicate rows may get created than in Sample table, but we have got a handle on the duplicates, since the id column for them will be same.
In this step, we apply distinct to get the original rows as in Sample table. You can then ditch the id column.
Notes
Self joins are costly in terms of performance, but this is unavoidable for solution to the question.
The distinct used in Step 3, is costly too. A more performant approach would be to use Window functions where we can partition by the id and pick the first row in the window. You can explore that.

You can do a left join to itself and check if the column key is null. If it is null, then that value is not found in the other table. Use if or "case when" function to check if it is null or not.
Select t1.column_A,
t1.column_B,
IF(t2.column_B is null, 'False', 'True') as test
from Sample t1
Left Join Sample t2
On t1.column_A = t2.column_B;

Related

Store only 1 values and remove the rest for same duplicated values in bigquery

I have duplicated values in my data. However, from the duplicated values, i only want to store 1 values and remove the rest of same duplicated values.
So far, I have found the solution where they remove ALL the duplicated values like this.
Code:
SELECT ID, a.date as date.A, b.date as date.B,
CASE WHEN a.date <> b.date THEN NULL END AS b.date
except(date.A)
FROM
table1 a LEFT JOIN table2 b
USING (ID)
WHERE date.A = 1
Sample input:
Sample output (Store only 1 values from the duplicated values and remove the rest):
NOTE: query might wrong as it remove all duplicated values.

Considering your screenshot's sample data and your explanation. I understand that you want to remove duplicates from your table retaining only one row of unique data. Thus, I was able to create a query to select only one row of data ignoring the duplicates.
In order to select the rows without any duplicates, you can use SELECT DISTINCT. According to the documentation, it discards any duplicate rows. In addition to this method, CREATE TABLE statement will also be used to create a new table (or replace the previous one) with the new data without duplicates. The syntax is as follows:
CREATE OR REPLACE TABLE project_id.dataset.table AS
SELECT DISTINCT ID, a.date as date.A, b.date as date.B,
CASE WHEN a.date <> b.date THEN NULL END AS b.date
except(date.A)
FROM
table1 a LEFT JOIN table2 b
USING (ID)
WHERE date.A = 1
And the output will be exactly the same as you shared in your question.
Notice that I used CREATE OR REPLACE, which means if you set project_id.dataset.table to the same path as the table within your select, it will replace your current table (in case you have the data coming from one unique table). Otherwise, it will create a new table with the specified new table's name.

You can use aggregation. Something like this:
SELECT ANY_VALUE(a).*, ANY_VALUE(b).*
FROM table1 a LEFT JOIN
table2 b
USING (ID)
WHERE date.A = 1
GROUP BY id, a.date;
For each id/datecombination, this returns an arbitrary matching row froma/b`.

SQL Server ISNULL multiple columns

I have the following query which works great but how do I add multiple columns in its select statement? Following is the query:
SELECT ISNULL(
(SELECT DISTINCT a.DatasourceID
FROM [Table1] a
WHERE a.DatasourceID = 5 AND a.AgencyID = 4 AND a.AccountingMonth = 201907), NULL) TEST
So currently I only get one column (TEST) but would like to add other columns such as DataSourceID, AgencyID and AccountingMonth.

If you want to output a row for some condition (or requested values ) and output a row when it does not meet condition,
you can set a pseudo table for your requested values in the FROM clause and make a left outer join with your Table1.
SELECT ISNULL(Table1.DatasourceId, 999999),
Table1.AgencyId,
Table1.AccountingMonth,
COUNT(*) as count
FROM ( VALUES (5, 4, 201907 ),
(6, 4, 201907 ))
AS requested(DatasourceId, AgencyId, AccountingMonth)
LEFT OUTER JOIN Table1 ON requested.agencyid=Table1.AgencyId
AND requested.datasourceid = Table1.DatasourceId
AND requested.AccountingMonth = Table1.AccountingMonth
GROUP BY Table1.DatasourceId, Table1.AgencyId, Table1.AccountingMonth
Note that:
I have put a ISNULL for the first column like you did to output a particular value (9999) when no value is found.
I did not put the ISNULL(...,NULL) like your query in the other columns since IMHO it is not necessary: if there is no value, a null will be output anyway.
I added a COUNT(*) column to illustrate an aggregate, you could use another (SUM, MIN, MAX) or none if you do not need it.
The set of requested values is provided as a constant table values (see https://learn.microsoft.com/en-us/sql/t-sql/queries/table-value-constructor-transact-sql?view=sql-server-2017)
I have added multiple rows for requested conditions : you can request for multiple datasources, agencies or months in one query with one line for each in the output.
If you want only one row, put only one row in "requested" pseudo table values.
There must be a GROUP BY, even if you do not want to use an aggregate (count, sum or other) in order to have the same behavior as your distinct clause , it restricts the output to single lines for requested values.

To me it seems that you want to see does data exists, i guess that your's AgencyID is foreign key to agency table, DataSourceID also to DataSource, and that you have AccountingMonth table which has all accounting periods:
SELECT ds.ID as DataSourceID , ag.ID as AgencyID , am.ID as AccountingMonth ,
ISNULL(COUNT(a.*),0) as Count
FROM [Table1] a
RIGHT JOIN [Datasource] ds ON ds.ID = a.DataSourceID
RIGHT JOIN [Agency] ag ON ag.ID = a.AgencyID
RIGHT JOIN [AccountingMonth] am on am.ID = a.AccountingMonth
GROUP BY ds.ID, ag.ID, am.ID
In this way you can see count of records per group by criteria. Notice RIGHT join, you must use RIGHT JOIN if you want to include all record from "Right" table.
In yours query you have DISTINCT a.DatasourceID and WHERE a.DatasourceID = 5 and it returns 5 if in table exists rows that match yours WHERE criteria, and returns null if there is no data. If you remove WHERE a.DatasourceID = 5 your query would break with error: subquery returned multiple rows.

the way you are doing only allows for one column and one record and giving it the name of test. It does not look like you really need to test for null. because you are returning null so that does nothing to help you. Remove all the null testing and return a full recordset distinct will also limit your returns to 1 record. When working with a single table you don't need an alias, if there are no spaces or keywords braced identifiers not required. if you need to see if you have an empty record set, test for it in the calling program.
SELECT DatasourceID, AgencyID,AccountingMonth
FROM Table1
WHERE DatasourceID = 5 AND AgencyID = 4 AND AccountingMonth = 201907

SQL: How to update an empty column with pre-defined set of values

I have a table with, let's say, 100 records. The table has two columns. The first column (A) has unique values. The second column (B) has NULL values
For 4 elements from column A I'd like to associate some earlier defined values, and they are unique as well.
I don't care about which value from column B will be associated with the value from column A. I'd like to associate 4 unique values with another 4 unique values. Basically, like I'd cut and paste a block of values from one column to another in excel.
How can I do it without using cursors?
I'd like to use one Update statement for ALL rows instead one Update statement for EVERY row as I do now.

Try this:
UPDATE t
SET ColumnB = BValue
FROM Table t
INNER JOIN
(
SELECT 1 AValue, 'Mouse' BValue UNION
SELECT 2, 'Cat' UNION
SELECT 3, 'Dog' UNION
SELECT 4, 'Wolf'
) PreDefined ON(t.ColumnA = PreDefined.AValue)
Use any number you want in the 'PreDefined' table, as long as they are unique and within the range of values in columnA of your original table.

If you are only trying to fill a table for testing purposes, I guess you could:
A) Use the value from Column A itself (as it is already unique).
B) If they are to be different, use some function on the column A's value to obtain a column B value (something simple, like (ColumnA * 10), and this would give youA)
C) Create a temp table with a "dictionary" setting a B value for each possible A value, and then update the rows desired on your table looking up from values on this dictionary table.
Anyway, if you explain a little further your purpose it will be easier to try suggesting you a solution.

if your animal data is already in a database table, then you can use a single update statement like this:
update target_table t4
set columnb = (
select animal_name
from (select columna, animal_name
from (select rownum rowNumber, animal_name from animal_table) t1
join (select rownum rowNumber, columna from target_table t1 where columnb is null) t2
on t1.rowNumber = t2.rowNumber
) t3
where t4.columna = t3.columna
)
;
this works by selecting a sequence number and animal name from the source table, then selecting a sequence number and columna value from your target table. by joining those records on the sequence number you guarantee you get exactly 1 animal name for each columna value. you can then join those columna-to-animal records to your target table to do an update of columnb.
for more background on updating one table from values in another, you might consider the solutions presented here: Update rows in one table with data from another table based on one column in each being equal. the only difference is that in your example, you do not have any column that matches between your target table and your animal names table, so you need to use the rownum to create an arbitrary 1-to-1 matching of records.
if your unique options are in a text file or spreadsheet, then you can format them into a fixed-width space-padded string and pick the one you want using the rownum index like so:
update table_name
set columnb = trim(substr('mouse cat dog wolf ', rownum*6-6, 6))
where columnb is null;

Oracle Compare data between two different table

I have two table one is having all field VARCHAR2 but other having different type for different data.
For Example :
Table One
==========================
Col 1 VARCHAR2 UNIQUE KEY
Col 2 VARCHAR2
Col 3 VARCHAR2
===========================
Table Two
==========================
Col One VARCHAR2 UNIQUE KEY
Col Two TIMESTAMP
Col Three NUMBER
==========================
we are having one mapping table. it denotes which column of Table One has to compare with which column of Table Two.
For Example
Mapping Table
==============================
Table One Table Two
==============================
Col 1 Col One
Col 2 Col Three
Col 3 Col Two
==============================
Now with the help of UNIQUE KEY of TABLE ONE we have to find same row in TABLE TWO and compare rows column by column and get changes in data.
Currently we are using java program for comparing data row by row and column by column and getting changes between data in rows with same UNIQUE KEY. it is working fine but taking too much time as we are having 100000 records in DB.
Now my question is : is there any way i can compare data at SQL level and get changes in data?

You can do it 'manually' with a query like this: It's a lot of work, but there are only three different types of checks you need to do, so it's not very complex:
select
*
from
Table1 t1
full outer join Table2 t2 on t2.ID = t1.ID
where
-- Check ID, either record does not exist in either table.
t1.ID is null or
t2.ID = null or
-- Not nullable field can be easily compared.
t1.NotNullableField1 <> t2.NotNUllableField1 or
-- Nullable field is slightly more work.
t1.NullableField1 <> t2.NullableField1 or
(t1.NullableField1 is null and t2.NullableField1 is not null) or
(t1.NullableField1 is not null and t2.NullableField1 is null)
Another solution is to use MINUS, which is a bit like UNION, only it returns a dataset minus the records in a second dataset:
select * from Table1 t1
MINUS
select * from Table2 t2
This works only one way (which might be fine for your purpose), but you can also combine it with UNION to make it bidirectional.
select
*
from
( select * from Table1
MINUS
select * from Table2)
UNION ALL
( select * from Table2
MINUS
select * from Table1)
The output of both solutions is a bit different.
In the FULL OUTER JOIN query, the IDs will be joined and the values of the matching rows will be displayed next to each other as a single row.
In the MINUS query, the result will be presented as a single dataset. If a record does not exist in either one table, it will be displayed. If a record (ID) exists in both tables, but other fields are different, you will get both rows. So it's a bit harder to compare them.
See: http://www.techonthenet.com/oracle/minus.php

SQL statement to return data from a table in an other sight

How would the SQL statement look like to return the bottom result from the upper table?
The last letter from the key should be removed. It stands for the language. EXP column should be split into 5 columns with the language prefix and the right value.
I'm weak at writing more or less difficult SQL statements so any help would be appreciated!

The Microsoft Access equivalent of a PIVOT in SQL Server is known as a CROSSTAB. The following query will work for Microsoft Access 2010.
TRANSFORM First(table1.Exp) AS FirstOfEXP
SELECT Left([KEY],Len([KEY])-2) AS [XKEY]
FROM table1
GROUP BY Left([KEY],Len([KEY])-2)
PIVOT Right([KEY],1);
Access will throw a circular field reference error if you try to name the row heading with KEY since that is also the name of the original table field that you are deriving it from. If you do not want XKEY as the field name, then you would need to break apart the above query into two separate queries as shown below:
qsel_table1:
SELECT Left([KEY],Len([KEY])-2) AS XKEY
, Right([KEY],1) AS [Language]
, Table1.Exp
FROM Table1
ORDER BY Left([KEY],Len([KEY])-2), Right([KEY],1);
qsel_table1_Crosstab:
TRANSFORM First(qsel_table1.Exp) AS FirstOfEXP
SELECT qsel_table1.XKEY AS [KEY]
FROM qsel_table1
GROUP BY qsel_table1.XKEY
PIVOT qsel_table1.Language;
In order to always output all language columns regardless of whether there is a value or not, you need to spike of those values into a separate table. That table will then supply the row and column values for the crosstab and the original table will supply the value expression. Using the two query solution above we would instead need to do the following:
table2:
This is a new table with a BASE_KEY TEXT*255 column and a LANG TEXT*1 column. Together these two columns will define the primary key. Populate this table with the following rows:
"AbstractItemNumberReportController.SelectPositionen", "D"
"AbstractItemNumberReportController.SelectPositionen", "E"
"AbstractItemNumberReportController.SelectPositionen", "F"
"AbstractItemNumberReportController.SelectPositionen", "I"
"AbstractItemNumberReportController.SelectPositionen", "X"
qsel_table1:
This query remains unchanged.
qsel_table1_crosstab:
The new table2 is added to this query with an outer join with the original table1. The outer join will allow all rows to be returned from table2 regardless of whether there is a matching row in the table1. Table2 now supplies the values for the row and column headings.
TRANSFORM First(qsel_table1.Exp) AS FirstOfEXP
SELECT Table2.Base_KEY AS [KEY]
FROM Table2 LEFT JOIN qsel_table1 ON (Table2.BASE_KEY = qsel_table1.XKEY)
AND (Table2.LANG = qsel_table1.Language)
GROUP BY Table2.Base_KEY
PIVOT Table2.LANG;

Try something like this:
select *
from
(
select 'abcd' as [key], right([key], 1) as id, expression
from table1
) x
pivot
(
max(expression)
for id in ([D], [E])
) p
Demo Fiddle

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas