Store only 1 values and remove the rest for same duplicated values in bigquery - sql

I have duplicated values in my data. However, from the duplicated values, i only want to store 1 values and remove the rest of same duplicated values.
So far, I have found the solution where they remove ALL the duplicated values like this.
Code:
SELECT ID, a.date as date.A, b.date as date.B,
CASE WHEN a.date <> b.date THEN NULL END AS b.date
except(date.A)
FROM
table1 a LEFT JOIN table2 b
USING (ID)
WHERE date.A = 1
Sample input:
Sample output (Store only 1 values from the duplicated values and remove the rest):
NOTE: query might wrong as it remove all duplicated values.

Considering your screenshot's sample data and your explanation. I understand that you want to remove duplicates from your table retaining only one row of unique data. Thus, I was able to create a query to select only one row of data ignoring the duplicates.
In order to select the rows without any duplicates, you can use SELECT DISTINCT. According to the documentation, it discards any duplicate rows. In addition to this method, CREATE TABLE statement will also be used to create a new table (or replace the previous one) with the new data without duplicates. The syntax is as follows:
CREATE OR REPLACE TABLE project_id.dataset.table AS
SELECT DISTINCT ID, a.date as date.A, b.date as date.B,
CASE WHEN a.date <> b.date THEN NULL END AS b.date
except(date.A)
FROM
table1 a LEFT JOIN table2 b
USING (ID)
WHERE date.A = 1
And the output will be exactly the same as you shared in your question.
Notice that I used CREATE OR REPLACE, which means if you set project_id.dataset.table to the same path as the table within your select, it will replace your current table (in case you have the data coming from one unique table). Otherwise, it will create a new table with the specified new table's name.

You can use aggregation. Something like this:
SELECT ANY_VALUE(a).*, ANY_VALUE(b).*
FROM table1 a LEFT JOIN
table2 b
USING (ID)
WHERE date.A = 1
GROUP BY id, a.date;
For each id/datecombination, this returns an arbitrary matching row froma/b`.

Related

I need to create a VIEW from an existing TABLE and MAP an additional COLUMN to that VIEW

I am fairly new to SQL. What I am trying to do is create a view from an existing table. I also need to add a new column to the view which maps to the values of an existing column in the table.
So within the view, if the value in a field for Col_1 = A, then the value in the corresponding row for New_Col = C etc
Does this even make sense? Would I use the CASE clause? Is mapping in this way even possible?
Thanks
The best way to do this is to create a mapping or lookup table
For example consider the following LOOKUP table.
COL_A NEW_VALUE
---- -----
A C
B D
Then you can have a query like this:
SELECT A.*, LOOK.NEW_VALUE
FROM TABLEA AS A
JOIN LOOKUP AS LOOK ON A.COL_A = LOOK.COL_A
This is what DimaSUN is doing in his query too -- but in his case he is creating the table dynamically in the body of the query.
Also note, I'm using a JOIN (which is an inner join) so only results in the lookup table will be returned. This could filter the results. A LEFT JOIN there would return all data from A but some of the new columns might be null.
Generally, a view is an instance of a table/a replica provided that there is no alteration to the original table. So, as per your query you can manipulate the data and columns in a view by using case.
Create View viewname as
Select *,
case when column=a.value then 'C'
....
ELSE
END
FROM ( Select * from table) a
If You have restricted list of replaced values You may hardcode that list in query
select T.*,map.New_Col
from ExistingTable T
left join (
values
('A','C')
,('B','D')
) map (Col_1,New_Col) on map.Col_1 = T.Col_1
In this sample You hardcode 'A' -> 'C' and 'B' -> 'D'
In general case You better may to use additional table ( see Hogan answer )

Compare one value of column A with all the values of column B in Hive HQL

I have two columns in one table say Column A and Column B. I need to search each value of Column A with All the values of column B each and every time and return true if the column A value is found in any of the rows of column B. How can i get this?
I have tried using the below command:
select column _A, column_B,(if (column_A =column_B), True, False) as test from sample;
If i use the above command, it is checking for that particular row alone. But I need true value, if a value of column A is found in any of the rows of column B.
How can i can check one value of column A with all the all the values of column B?
Or Is there any possibility to iterate and compare each value between two columns?
Solution
create temporary table t as select rand() as id, column_A, column_B from sample; --> Refer 1
select distinct t3.id,t3.column_A,t3.column_B,t3.match from ( --> Refer 3
select t1.id as id, t1.column_A as column_A, t1.column_B as column_B,--> Refer 2
if(t2.column_B is null, False, True) as match from t t1 LEFT OUTER JOIN
t t2 ON t1.column_A = t2.column_B
) t3;
Explanation
Create an identifier column to keep track of the rows in original table. I am using rand() here. We will take advantage of this to get the original rows in Step 3. Creating a temporary table t here for simplicity in next steps.
Use a LEFT OUTER JOIN with self to do your test that requires matching each column with another across all rows, yielding the match column. Note that here multiple duplicate rows may get created than in Sample table, but we have got a handle on the duplicates, since the id column for them will be same.
In this step, we apply distinct to get the original rows as in Sample table. You can then ditch the id column.
Notes
Self joins are costly in terms of performance, but this is unavoidable for solution to the question.
The distinct used in Step 3, is costly too. A more performant approach would be to use Window functions where we can partition by the id and pick the first row in the window. You can explore that.
You can do a left join to itself and check if the column key is null. If it is null, then that value is not found in the other table. Use if or "case when" function to check if it is null or not.
Select t1.column_A,
t1.column_B,
IF(t2.column_B is null, 'False', 'True') as test
from Sample t1
Left Join Sample t2
On t1.column_A = t2.column_B;

Oracle SQL Update one table column with the value of another table

I have a table A, where there is a column D_DATE with value in the form YYYYMMDD (I am not bothered about the date format). I also happen to have another table B, where there is a column name V_TILL. Now, I want to update the V_TILL column value of table B with the value of D_DATE column in table A which happens to have duplicates as well. Meaning, the inner query can return multiple records from where I form a query to update the table.
I currently have this query written but it throws the error:
ORA-01427: single-row subquery returns more than one row
UPDATE TAB_A t1
SET (V_TILL) = (SELECT TO_DATE(t2.D_DATE,'YYYYMMDD')
FROM B t2
WHERE t1.BR_CODE = t2.BR_CODE
AND t1.BK_CODE = t2.BK_CODE||t2.BR_CODE)
WHERE EXISTS (
SELECT 1
FROM TAB_B t2
WHERE t1.BR_CODE = t2.BR_CODE
AND t1.BK_CODE = t2.BK_CODE||t2.BR_CODE)
PS: BK_CODE IS THE CONCATENATION OF BK_CODE and BR_CODE
Kindly help me as I am stuck in this quagmire! Any help would be appreciated.
If the subquery returns many values which one do you want to use ?
If any you can use rownum <=1;
If you know that there is only one value use distinct
SET (V_TILL) = (SELECT TO_DATE(t2.D_DATE,'YYYYMMDD')
FROM B t2
WHERE t1.BR_CODE = t2.BR_CODE
AND t1.BK_CODE = t2.BK_CODE||t2.BR_CODE AND ROWNUM <=1)
or
SET (V_TILL) = (SELECT DISTINCT TO_DATE(t2.D_DATE,'YYYYMMDD')
FROM B t2
WHERE t1.BR_CODE = t2.BR_CODE
AND t1.BK_CODE = t2.BK_CODE||t2.BR_CODE)
above are workarounds. To do it right you have to analyze why you are getting more than one value. Maybe more sophisticated logic is needed to select the right value.
I got it working with this command:
MERGE INTO TAB_A A
USING TAB_B B
ON (A.BK_CODE = B.BK_CODE || B.BR_CODE
AND A.BR_CODE = B.BR_CODE AND B.BR_DISP_TYPE <> '0'
AND ((B.BK_CODE, B.BR_SUFFIX) IN (SELECT BK_CODE,
MIN(BR_SUFFIX)
FROM TAB_B
GROUP BY BK_CODE)))
As mentioned earlier by many, I was missing an extra condition and got it working, otherwise the above mentioned techniques work very well.
Thanks to all!

SQL: How to update an empty column with pre-defined set of values

I have a table with, let's say, 100 records. The table has two columns. The first column (A) has unique values. The second column (B) has NULL values
For 4 elements from column A I'd like to associate some earlier defined values, and they are unique as well.
I don't care about which value from column B will be associated with the value from column A. I'd like to associate 4 unique values with another 4 unique values. Basically, like I'd cut and paste a block of values from one column to another in excel.
How can I do it without using cursors?
I'd like to use one Update statement for ALL rows instead one Update statement for EVERY row as I do now.
Try this:
UPDATE t
SET ColumnB = BValue
FROM Table t
INNER JOIN
(
SELECT 1 AValue, 'Mouse' BValue UNION
SELECT 2, 'Cat' UNION
SELECT 3, 'Dog' UNION
SELECT 4, 'Wolf'
) PreDefined ON(t.ColumnA = PreDefined.AValue)
Use any number you want in the 'PreDefined' table, as long as they are unique and within the range of values in columnA of your original table.
If you are only trying to fill a table for testing purposes, I guess you could:
A) Use the value from Column A itself (as it is already unique).
B) If they are to be different, use some function on the column A's value to obtain a column B value (something simple, like (ColumnA * 10), and this would give youA)
C) Create a temp table with a "dictionary" setting a B value for each possible A value, and then update the rows desired on your table looking up from values on this dictionary table.
Anyway, if you explain a little further your purpose it will be easier to try suggesting you a solution.
if your animal data is already in a database table, then you can use a single update statement like this:
update target_table t4
set columnb = (
select animal_name
from (select columna, animal_name
from (select rownum rowNumber, animal_name from animal_table) t1
join (select rownum rowNumber, columna from target_table t1 where columnb is null) t2
on t1.rowNumber = t2.rowNumber
) t3
where t4.columna = t3.columna
)
;
this works by selecting a sequence number and animal name from the source table, then selecting a sequence number and columna value from your target table. by joining those records on the sequence number you guarantee you get exactly 1 animal name for each columna value. you can then join those columna-to-animal records to your target table to do an update of columnb.
for more background on updating one table from values in another, you might consider the solutions presented here: Update rows in one table with data from another table based on one column in each being equal. the only difference is that in your example, you do not have any column that matches between your target table and your animal names table, so you need to use the rownum to create an arbitrary 1-to-1 matching of records.
if your unique options are in a text file or spreadsheet, then you can format them into a fixed-width space-padded string and pick the one you want using the rownum index like so:
update table_name
set columnb = trim(substr('mouse cat dog wolf ', rownum*6-6, 6))
where columnb is null;

SQL JOIN and COUNT COMMANDS

I have two different tables, we will call table 1 and table 2. Within table 1, I have a column known as featureid which is a numerical value that corresponds to a numerical value in table 2 known as the termid. Also, within this table 2, each of the different termid corresponds to a plain text description of that termid.
What I am attempting to do is join the featureid in table 1 to the termid in table 2, but have the output be a two column display of the plain text description and occurrence of each within table 1.
I know I need to use the JOIN and COUNT syntax within SQL, but not sure how to correctly write the command.
Thanks!
You actually only need one column in the GROUP BY. It's also good practice to specify which values are coming from which table, like so:
SELECT
table2.textDesc,
COUNT(table1.featureid) AS OccurrenceCount
FROM
table1
INNER JOIN
table2 ON
table1.featureid = table2.termid
GROUP BY table2.textDesc
I think this is what you are looking for:
SELECT textDesc, COUNT(featureid)
FROM table1, table2
WHERE featureid=termid
GROUP BY featureid, textDesc
Alternatively, you can use a different syntax (with the same end result) like so:
SELECT textDesc, COUNT(featureid)
FROM table1 INNER JOIN table2
ON featureid=termid
GROUP BY featureid, textDesc