Concatenate IDs only IF not unique in table - sql

I am a bit stuck writing a query that might be easy for some of you. I am working with Redshift in Coginiti. This is the problem I want to solve:
I have a big table, but for this particular query I will only use 3 columns: ID, X, Y.
The requirement is: if ID is unique, I should leave it as is (ID). If ID is not unique, I want to concatenate ID, X, Y. I am not looking to overwrite the column but rather to create a new column, which I would call NEW_ID:
if ID is unique in table T --> ID
else concatenate(ID,X,Y) using '_' as delimiter
I did have a kind of solution, where I write a subquery to get the count of ID and then write an if statement saying if count(ID)=1 then ID, else the concatenated value, but I am blanking on how to actually implement it in SQL.
Thanks, I appreciate your help in advance :)
SELECT *, CONCAT(ID,X,Y)
from table
left join ....got stuck here on how to tie it to the next part
SELECT ID, COUNT(ID)
FROM table
group by id
having count(ID)<>1 ...or perhaps =1. I need to work with all values anyway

This should be straightforward. On Redshift I like the decode() function for cases like this, but a CASE expression works just as well.
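select id, x, y,
       -- cast id in both branches so both results have the same type
       decode(id_count, 1, id::text, id::text || '_' || x || '_' || y) as new_id
from (
    -- count how many rows share each id
    select id, x, y, count(*) over (partition by id) as id_count
    from <table>
) as counted;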
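As a rough, runnable illustration of the same idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Redshift. The table name t, the sample rows, and the alias counted are assumptions made up for the demo, and a CASE expression replaces decode() because SQLite has no decode(). It needs a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: id 1 appears twice, id 2 appears once.
rows = [(1, "a", "b"), (1, "c", "d"), (2, "e", "f")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, x TEXT, y TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# The inner query attaches the per-id row count to every row;
# the outer CASE keeps the bare id when that count is 1.
query = """
SELECT id, x, y,
       CASE WHEN id_count = 1
            THEN CAST(id AS TEXT)
            ELSE CAST(id AS TEXT) || '_' || x || '_' || y
       END AS new_id
FROM (
    SELECT id, x, y, COUNT(*) OVER (PARTITION BY id) AS id_count
    FROM t
) AS counted
ORDER BY id, x
"""
new_ids = [row[3] for row in conn.execute(query)]
print(new_ids)
```

The window function avoids the self-join from the question: every row keeps its own X and Y while still seeing how many rows share its ID.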

Related

Query specific tables in Bigtable from BigQuery

I have some data in Google Bigtable over which I have built a BigQuery external table (as per Querying Cloud Bigtable data) so that I can query the Bigtable table using conventional SQL (which I'm very familiar with).
When I issue a select * I get this:
Now I would like to know the syntax for querying specific values in this nested data. For example, to get a list of accountIds I can do this:
SELECT ARRAY(SELECT timestamp FROM UNNEST(attributes.column[OFFSET(0)].cell)) AS timestamp,
ARRAY(SELECT SAFE_CONVERT_BYTES_TO_STRING(value) FROM UNNEST(attributes.column[OFFSET(0)].cell)) AS values
FROM `table`
where SAFE_CONVERT_BYTES_TO_STRING(rowkey) = 'XXXX'
which returns:
which is, well, kinda handy.
Similarly I can get car#le11mcr#policyStartDate by changing the OFFSET like so:
SELECT ARRAY(SELECT timestamp FROM UNNEST(attributes.column[OFFSET(6)].cell)) AS timestamp,
ARRAY(SELECT SAFE_CONVERT_BYTES_TO_STRING(value) FROM UNNEST(attributes.column[OFFSET(6)].cell)) AS values
FROM `table`
where SAFE_CONVERT_BYTES_TO_STRING(rowkey) = 'XXXX'
However, both of these queries require me to know what value to pass to OFFSET(), and that value appears to depend on the alphabetical order of the Bigtable columns. Hence, if another column whose name starts with (say) 'b' appears in the future, my queries would no longer return the same thing.
I need a better way of querying the table than using OFFSET(). Essentially I want to be able to say:
select the cell values and timestamp values for the cell whose name is accountId
or
select the cell values and timestamp values for the cell whose name is car#le11mcr#policyStartDate
Is there a way to do that? I'm not too familiar with BigQuery syntax for doing this.
OK, I've made a tiny bit of progress.
This:
SELECT
(
select array(select timestamp from unnest(cell))
from unnest(attributes.column) where name in ('accountId')
) accountIdTimestamp,
(
select array(select value from unnest(cell))
from unnest(attributes.column) where name in ('accountId')
) accountIdValue
FROM `table`
where SAFE_CONVERT_BYTES_TO_STRING(rowkey) = 'XXXX'
limit 3
returns:
which is better, but notice it didn't return anything for the first two rows. That's because those two rows don't have a cell called accountId, a problem I can get around by introducing a WHERE clause:
SELECT
(
select array(select timestamp from unnest(cell))
from unnest(attributes.column) where name in ('accountId')
) accountIdTimestamp,
(
select array(select value from unnest(cell))
from unnest(attributes.column) where name in ('accountId')
) accountIdValue
FROM `table`
where ARRAY_LENGTH(ARRAY(
select name from unnest(attributes.column) where name in ('accountId')
)) > 0
limit 3
which returns:
That does what I want, I guess, but I'd like to think there's a better way of achieving this that doesn't require quite so much typing and so much complicated logic (the WHERE clause in particular feels like a very complicated way of saying only give me rows if there's an accountId).
Any advice to make this more efficient or readable would be appreciated.
My next challenge is to return the accountIdValue for the max(accountIdTimestamp).

Sorting concatenated strings after grouping in Netezza

I'm using the code on this page to create a concatenated list of strings on a GROUP BY aggregation basis.
https://dwgeek.com/netezza-group_concat-alternative-working-example.html/
I'm trying to get the concatenated string in sorted order, so that, for example, for DB1 I'd get data1,data2,data5,data9
I tried modifying the original code to select from a pre-sorted table, but it doesn't seem to make any difference.
select Col1
, count(*) as NUM_OF_ROWS
, trim(trailing ',' from SETNZ..replace(SETNZ..replace (SETNZ..XMLserialize(SETNZ..XMLagg(SETNZ..XMLElement('X',col2))), '<X>','' ),'</X>' ,',' )) AS NZ_CONCAT_STRING
from
(select * from tbl_concat_demo order by 1,2) AS A
group by Col1
order by 1;
Is there a way to sort the strings before they get aggregated?
BTW - I'm aware there is a GROUP_CONCAT UDF function for Netezza, but I won't have access to it.
This is notoriously difficult to accomplish in SQL, since sorting is usually done while returning the data, and you want to do it on the 'input' set.
Try this:
1) Create a temp table:
Create temp table X as select * from tbl_concat_demo Order by col2
Partition by (col1)
2) In your original code above, select from X instead of tbl_concat_demo.
Let me know if it works.

Query with Oracle partition

I would like to know if it is possible to build a query that reads the same table from 2 different partitions.
In my scenario I have two partitions (partition_A and partition_B) and a table "consume". Currently I use this statement:
Select id, item
From partition_A(consume)
union all
Select id, item
From partition_B(consume);
but I would like to obtain the same result when I use this statement :
Select id, item from consume;
Is it possible?
Thanks.
Do you want something like this?
Select id, item
From consume PARTITION (partition_A) -- partition_A is the name of the partition
union all
Select id, item
From consume PARTITION (partition_B);
Maybe I didn't understand your question.

Merge 2 Tables from different Databases

Hypothetically I want to merge 2 tables from different databases into one table, which includes all the data from the 2 tables:
The result would look like something like this:
Aren't the entries in the result table redundant, because there are 2 entries with Porsche and VW? Or can I just add the values in the column 'stock' because the column 'Mark' is explicit?
You need to create a database link to the other database. Here is an example of how to create a database link: http://psoug.org/definition/create_database_link.htm
After creating it, your select statement against the other database should look like this: select * from tableA@database_link_name
Then you need to use a MERGE statement to pull in the data from the other database, so the merge statement should look something like the one below.
You can read about the MERGE statement here: https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm#SQLRF01606
merge into result_table res
using (select mark, stock, some_unique_id
from result_table res2
union all
select mark, stock, some_unique_id
from tableA@database_link_name) diff
on (res.some_unique_id = diff.some_unique_id )
when matched then
update set res.mark = diff.mark,
res.stock = diff.stock
when not matched then
insert
(res.mark,
res.stock,
res.some_unique_id)
values
(diff.mark,
diff.stock,
diff.some_unique_id);
I hope this will help you
SELECT ROW_NUMBER() OVER (ORDER BY Mark) AS new_ID, Mark, SUM(Stock) AS Stock
FROM
(
SELECT Mark,Stock FROM Database1.dbo.table1
UNION ALL
SELECT Mark,Stock FROM Database2.dbo.table2
) RESULT
GROUP BY Mark
Try this:
Select Mark, Stock, row_number() over(order by Mark desc) from table1
union all
Select Mark, Stock, row_number() over(order by Mark desc) from table2
Regardless of the data redundancy, you could use a UNION ALL clause to achieve this. Like:
Select * From tableA
UNION ALL
Select * From tableB
Make sure the number of columns and their data types match between the two queries.
Don't forget to use fully qualified table names as the tables are in different databases
SELECT
Mark
,Stock
FROM Database1.dbo.table1
UNION ALL
SELECT
Mark
,Stock
FROM Database2.dbo.table2
If these are 2 live databases and you need to continuously include rows from both of them in your new database, consider defining the table in your 3rd database as a view instead.
This way you can also add a column specifying which system each row comes from. Summing the values is an option; however, if you ever get a question about an incorrect summed value, how would you know which system is the culprit?
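The view idea can be sketched as follows, using Python's sqlite3 module with both tables in one in-memory database instead of two real databases. The view name combined, the column source_system, its values 'db1' and 'db2', and all sample rows are assumptions invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (Mark TEXT, Stock INTEGER);
CREATE TABLE table2 (Mark TEXT, Stock INTEGER);
-- Hypothetical sample rows; Porsche exists in both sources.
INSERT INTO table1 VALUES ('Porsche', 3), ('VW', 5);
INSERT INTO table2 VALUES ('Porsche', 2), ('Audi', 4);

-- The view keeps every row and tags it with the system it came from.
CREATE VIEW combined AS
SELECT 'db1' AS source_system, Mark, Stock FROM table1
UNION ALL
SELECT 'db2' AS source_system, Mark, Stock FROM table2;
""")

# Row-level detail, with the origin still visible.
detail = conn.execute(
    "SELECT source_system, Mark, Stock FROM combined ORDER BY Mark, source_system"
).fetchall()

# Totals per Mark can still be derived from the view when needed.
totals = conn.execute(
    "SELECT Mark, SUM(Stock) FROM combined GROUP BY Mark ORDER BY Mark"
).fetchall()

print(detail)
print(totals)
```

Keeping the rows separate in the view and summing only at query time means a suspicious total can always be traced back to its source rows.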

GROUP BY in Informix (11.5)

Given the following example table structure:
NR1 | NR2 | PRENAME | LASTNAME
If I query all 4 fields of this table and group by its first 2 fields (NR1, NR2) in MySQL, I can do something like this:
SELECT NR1,NR2,PRENAME,LASTNAME FROM tbl GROUP BY NR1,NR2
But this won't work in Informix.
INFORMIX ERROR: the column (PRENAME) must be in the group by list
After reading some topics on Google, it seems to be an "Informix feature" that all selected columns have to be in the GROUP BY list.
But if I do that, the result is not the one I want.
If I use DISTINCT instead of GROUP BY, the result is similarly wrong, because I cannot apply DISTINCT to only columns 1 and 2.
So: how can I emulate the MySQL GROUP BY behavior?
Your original syntax is suitable in one database -- MySQL. And, that database says that the results of the non-aggregated columns come from indeterminate rows. So, an equivalent query is just to use MIN() or MAX():
SELECT NR1, NR2, MIN(PRENAME), MIN(LASTNAME)
FROM tbl
GROUP BY NR1, NR2;
My guess is that you want an arbitrary value from just one row. I'd be inclined to concatenate them:
SELECT NR1, NR2, MIN(PRENAME || ' ' || LASTNAME)
FROM tbl
GROUP BY NR1, NR2;
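To see why concatenating can matter, here is a small sketch using Python's sqlite3 module rather than Informix. The two sample rows are made up for the demo. With separate MIN() calls, the first name and the last name can come from different rows, whereas MIN() over the concatenated string always returns a pair that exists in a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (NR1 INTEGER, NR2 INTEGER, PRENAME TEXT, LASTNAME TEXT);
-- Hypothetical rows sharing the same (NR1, NR2) group.
INSERT INTO tbl VALUES (1, 1, 'Anna', 'Zeller'), (1, 1, 'Bernd', 'Adams');
""")

# Each MIN() is evaluated independently, so the names may be mixed across rows.
separate = conn.execute("""
    SELECT NR1, NR2, MIN(PRENAME), MIN(LASTNAME)
    FROM tbl
    GROUP BY NR1, NR2
""").fetchall()

# MIN() over the concatenation picks one full name from one actual row.
combined = conn.execute("""
    SELECT NR1, NR2, MIN(PRENAME || ' ' || LASTNAME)
    FROM tbl
    GROUP BY NR1, NR2
""").fetchall()

print(separate)
print(combined)
```

In this sample the first query pairs 'Anna' with 'Adams', a person who is not in the table, while the second returns a name that really occurs.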