Split non-atomic values into multiple rows with PostgreSQL - sql

I have some non-atomic data in a database like this:
| ID | Component ID List |
| --- | ----------------- |
| 1 | 123, 456 |
| 2 | 123, 345 |
I need to transform this table into a view that provides the "Component ID List" in a way that I can use in joins. Expected result:
| ID | Component ID List |
| --- | ----------------- |
| 1 | 123 |
| 1 | 456 |
| 2 | 123 |
| 2 | 345 |
Because I have this case in quite a few tables, I am looking for a reusable way to perform this transformation, e.g. with a SQL function. The tables have different column names, so the function would need a parameter, like this:
SELECT *, split_values("Component ID List") FROM xyz
I know the best way would be to fix the problem in the raw data, but that is not possible in this case.
Any suggestions on how to solve this in the best way possible?

You can use unnest(string_to_array(Component_ID_List, ', ')):
SELECT ID,
unnest(string_to_array(Component_ID_List, ', ')) as Component_ID_List
FROM table_name;


SQL: extract the last word

I have a table that looks like the following:
id | cars
1 | John's Honda
2 | Andrew's red lexus
3 | James has a bmw
I need to get just the last word of the "cars" column, which is the actual car name.
I have tried the following, but I don't get the desired output:
select substr(cars, -1)
from t
The code above just shows me the last character of the column. Later, I tried the following:
select split(cars, ' ')[offset(1)]
from t
However, I got the "Array index 1 is out of bounds (overflow)" error. Can anyone help with how this can be achieved in BigQuery?
Consider the simple approach below:
select *,
array_reverse(split(cars, ' '))[offset(0)] as brand
from your_table
If applied to the sample data in your question, the output is:

| id | cars | brand |
| --- | ------------------ | ----- |
| 1 | John's Honda | Honda |
| 2 | Andrew's red lexus | lexus |
| 3 | James has a bmw | bmw |
Note: there are really many ways to accomplish your case - another one would be regexp_extract(cars, r'\b(\w+)$') as brand
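Written out as a full query, that regexp alternative would look like this (just a sketch, reusing the your_table name from above):

select *,
  -- regexp_extract returns the single capturing group: the last word in the string
  regexp_extract(cars, r'\b(\w+)$') as brand
from your_table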

Hive sql expand multiple columns to rows

I have a Hive table which has the following format:

| user | item | like | comment |
| ---- | ---- | ---- | ------- |
| Joe | 5 | 1 | 0 |
| Lan | 3 | 0 | 1 |
| Mack | 5 | 1 | 1 |
and I want to use Hive SQL to convert the like and comment columns into a behavior column, keeping one row per user, item, and the number of times for each behavior:

| user | item | behavior | times |
| ---- | ---- | -------- | ----- |
| Joe | 5 | like | 1 |
| Joe | 5 | comment | 0 |
| Lan | 3 | like | 0 |
| Lan | 3 | comment | 1 |
| Mack | 5 | like | 1 |
| Mack | 5 | comment | 1 |

Could you please give any advice?
Using map and explode.
select user,item,behavior,times
from tbl
lateral view explode(map('like',like,'comment',comment)) t as behavior,times
As a side note, you should avoid using reserved keywords like user, like, and comment as column names.
Great answers by Prabhala and Linoff. Here I'm offering yet another way, the built-in UDTF stack, which is both intuitive and native.
select
    stack(2, user, item, 'like', like,
             user, item, 'comment', comment)
    as (user, item, behavior, times)
from tbl
One method uses union all:
select user, item, 'like' as behavior, like as times
from t
union all
select user, item, 'comment' as behavior, comment as times
from t;

How do I select a SQL dataset where values in the first row are the column names?

I have data that looks like this:
ID  RowType  Col_1      Col_2     Col_3       ...  Col_n
1   HDR      FirstName  LastName  Birthdate
2   DTL      Steve      Bramblet  1989-01-01
3   DTL      Bob        Marley    1967-03-12
4   DTL      Mickey     Mouse     1921-04-25
And I want to return a table or dataset that looks like this:
ID  FirstName  LastName  Birthdate
2   Steve      Bramblet  1989-01-01
3   Bob        Marley    1967-03-12
4   Mickey     Mouse     1921-04-25
where n = 255 (so there's a limit of 255 Col_ fields)
***EDIT: The data in the HDR row is arbitrary, so I'm just using FirstName, LastName, Birthdate as examples. This is why I thought it would need to be dynamic SQL, since the column names I want to end up with will change based on the values in the HDR row. THX! ***
If there's a purely SQL solution that is what I'm after. It's going into an ETL process (SSIS) so I could use a Script task if all else fails.
Even if I could return a single row, that would be a solution. I was thinking there might be a dynamic SQL solution for something like this:
select Col_1 as FirstName, Col_2 as LastName, Col_3 as Birthdate
Not sure if your first data snippet is already in an Oracle table or not, but if it is in a CSV file then you have the option to skip headers during loading.
If the data is already in a table, then you can use UNION to get the desired result:
select * from table_name where rowtype = 'HDR'
union
select * from table_name where rowtype = 'DTL'
If you need FirstName etc. as column headers, then you do not need to do anything; design the destination table's columns as per your requirements.
Sorry, posted an answer but I completely misread that you had your desired column headers as data in the source table.
One trivial solution (though it requires more IO) would be to dump the table data to a flat file without headers, then read it back in, but this time tell SSIS that the first row has headers, and ignore the RowType column. Make sure you sort the data correctly before writing it out to the intermediate file!
To dump to a file without headers, you have to set ColumnNamesInFirstDataRow to false. Set this in the properties window, not by editing the connection. More info in this thread
If you have a lot of data, this is obviously very inefficient.
Try the following using row_number. Here is the demo.
with cte as
(
select
*,
row_number() over (order by id) as rn
from myTable
)
select
ID,
Col_1 as FirstName,
Col_2 as LastName,
Col_3 as Birthdate
from cte
where rn > 1
output:
| id | firstname | lastname | birthdate |
| --- | --------- | -------- | ---------- |
| 2 | Steve | Bramblet | 1989-01-01 |
| 3 | Bob | Marley | 1967-03-12 |
| 4 | Mickey | Mouse | 1921-04-25 |
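If the header values really are arbitrary (as per the edit), the same idea can be pushed into dynamic SQL. A minimal sketch, assuming SQL Server and only the three example columns (the concatenation would have to be extended to all 255 Col_ fields):

-- Build a SELECT from the HDR row, aliasing each Col_N with the header value it holds,
-- then run it against the DTL rows.
DECLARE @sql nvarchar(max);

SELECT @sql =
    N'SELECT ID, '
    + N'Col_1 AS ' + QUOTENAME(Col_1) + N', '
    + N'Col_2 AS ' + QUOTENAME(Col_2) + N', '
    + N'Col_3 AS ' + QUOTENAME(Col_3)
    + N' FROM myTable WHERE RowType = ''DTL'';'
FROM myTable
WHERE RowType = 'HDR';

EXEC sp_executesql @sql;

QUOTENAME guards against header values that are not valid identifiers; the usual dynamic-SQL caveat applies in SSIS, namely that the column names are not known at design time.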
Oh well. There is a pure SSIS approach, assuming the source is a SQL table. Here it is, rather sketchy:
1. Create a variable oColSet of type Object, and 255 variables of type String named sColName_1, sColName_2 ... sColName_255.
2. Create a SQL Task with a query like select top(1) Col_1, Col_2, ... Col_255 from Src where RowType = 'HDR'. Set the task property ResultSet = Full Result Set, and on the Result Set tab set Result Name to 0 and Variable Name to oColSet.
3. Add a ForEach Loop, set it as a ForEach ADO Enumerator, set ADO object source variable to oColSet and Enumeration mode = Rows in the first table. Then, on the Variable Mappings tab, define the mappings (Variable - Index) like this: sColName_1 - 0, sColName_2 - 1, ... sColName_255 - 254.
4. Create a variable sSQLQuery of type String with a variable expression like
   "SELECT Col_1 AS ["+@[User::sColName_1]+"],
           Col_2 AS ["+@[User::sColName_2]+"],
           ...
           Col_255 AS ["+@[User::sColName_255]+"]
    FROM Src WHERE RowType='DTL'"
5. In the ForEach Loop, add your Data Flow; in the OLE DB Source, set Data access mode to "SQL command from variable" and provide the variable name User::sSQLQuery. On the Data Flow itself, set DelayValidation=true.
The main idea of this design: retrieve all the column names and store them in a temporary object variable (step 2). Step 3 then parses the result set and places the values into the corresponding variables, the first (0th) column into sColName_1, etc. Step 4 defines the SQL command as an expression, which is evaluated every time the variable is read. Finally, in the ForEach Loop (where the parsing is done), you run your Data Flow.
A limitation of SSIS: data types and column names have to be the same at runtime as at design time. If you also need to store the resulting dataset in SQL, let me know so I can adjust the proposed solution.

Data field - search and write value in new data field (Oracle)

Sorry, I don't know how to describe that as a title.
With a query (example: SELECT PKEY, TRUNC(CREATEDFORMAT), STATISTICS FROM BUSINESS_DATA WHERE STATISTICS LIKE '% business_%'), I can display all data that contains the value "business_xxxxxx".
For example, the data field can have the following content: c01_ad; concierge_beendet; business_start; or also skill_my; pre_initial_markt; business_request; topIntMaster; concierge_start; c01_start;
Is it now possible to output just the corresponding value in an additional column (in the query output only, without changing the data)?
So that the output looks like this, for example:
PKEY | TRUNC(CREATEDFORMAT) | NEW_STATISTICS
1 | 13.06.2020 | business_start
2 | 14.06.2020 | business_request
So that means removing everything that does not start with business_xxx? Is this possible in an SQL query? RegEx would not be the right tool, I think.
I think you want:
select
pkey,
trunc(createdformat) createddate,
regexp_substr(statistics, 'business_\S*') new_statistics
from business_data
where statistics like '% business_%'
You can also use the following regexp_substr:
SQL> select regexp_substr(str,'business_[^;]+') as result
2 from
3 --sample data
4 (select 'skill_my; pre_initial_markt; business_request; topIntMaster; concierge_start; c01_start;' as str from dual
5 union all
6 select 'c01_ad; concierge_beendet; business_start;' from dual);
RESULT
--------------------------------------------------------------------------------
business_request
business_start
SQL>

SELECT from 50 columns

I have a table with many columns, around 50, that hold datetime data representing the steps a user takes when performing a procedure:
SELECT UserID, Intro_Req_DateTime, Intro_Onset_DateTime, Intro_Comp_DateTime, Info_Req_DateTime, Info_Onset_DateTime, Info_Comp_DateTime,
Start_Req_DateTime, Start_Onset_DateTime, Start_Comp_DateTime,
Check_Req_DateTime, Check_Onset_DateTime, Check_Comp_DateTime,
Validate_Req_DateTime, Validate_Onset_DateTime, Validate_Comp_DateTime,
....
FROM MyTable
I want to find the step the user did after a certain datetime.
For example, for user ABC I want to find the first step they did after 2 May 2019 17:25:36.
I cannot use CASE to check this, it would take ages to code.
Is there an easier way to do that?
P.S. Thanks to everyone who suggested redesigning the database, but not all databases can be redesigned. This database belongs to one of the big systems we have and has been used for more than 20 years; redesigning is out of the question.
You can use CROSS APPLY to unpivot the values. The syntax for UNPIVOT is rather cumbersome.
The actual query text should be rather manageable. No need for complicated CASE statements. Yes, you will have to explicitly list all 50 column names in the query text, you can't avoid that, but it will be only once.
SELECT TOP(1)
A.StepName
,A.dt
FROM
MyTable
CROSS APPLY
(
VALUES
('Intro_Req', Intro_Req_DateTime)
,('Intro_Onset', Intro_Onset_DateTime)
,('Intro_Comp', Intro_Comp_DateTime)
.........
) AS A (StepName, dt)
WHERE
MyTable.UserID = 'ABC'
AND A.dt > '2019-05-02T17:25:36'
ORDER BY A.dt;
See also How to unpivot columns using CROSS APPLY in SQL Server 2012
The best way is to design your table with the action type and the datetime the action was done. Then you can use a simple WHERE clause to find what you want. The table should look like the one below:
ID ActionType ActionDatetime
----------- ----------- -------------------
1492 1 2019-05-13 10:10:10
1494 2 2019-05-13 11:10:10
1496 3 2019-05-13 12:10:10
1498 4 2019-05-13 13:10:10
1500 5 2019-05-13 14:10:10
But with your current design, you should use UNPIVOT to get what you want. You can find more information at this link.
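For reference, a minimal UNPIVOT sketch of the same query (SQL Server assumed; only three of the ~50 step columns are listed for brevity):

SELECT TOP (1)
    u.StepName,
    u.dt
FROM MyTable
UNPIVOT
(
    -- each listed column becomes one row: its name goes into StepName, its value into dt
    dt FOR StepName IN (Intro_Req_DateTime, Intro_Onset_DateTime, Intro_Comp_DateTime)
) AS u
WHERE u.UserID = 'ABC'
  AND u.dt > '2019-05-02T17:25:36'
ORDER BY u.dt;

Note that UNPIVOT silently drops rows where the step column is NULL, which is usually what you want here.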