Applying Cross Apply & String_split to column - SQL - sql

I have data in the following format unfortunately I'm powerless to how the data comes in, the data looks like;
ID
Column X
1
I (a)Some Text (b)Some more Text (c) Text
2
1a Some Text b Some more text c
Is it possible to use Cross Apply and String split so data looks something like;
ID
Column X
1
(a) Some text
1
(b) Some more text
1
(c) Text
2
a Some text
2
b Some more Text
Ideally, the end goal once split will be to remove the a,b,c etc.
I'm using SQL Version; 2019 Dev

Related

Datastage Column Mapping logic Based on column Value (Or any SQL function to do the same)

I'm trying to find a solution in datastage (Or in SQL) - without having to use a bunch of if/else conditions - where I can map value of one column based on value of another column.
Example -
Source File -
ID
Header1
Value1
Header2
Value2
1
Length
10
Height
15
2
Weight
200
Length
20
Target Output -
ID
Length
Height
Weight
1
10
15
2
20
200
I can do this using Index/Match function of excel. Was wondering if datastage or Snowflake can look into all these fields similarly and automatically populate the value column to the corresponding header column!
I think the best solution in DataStage would be a Pivot stage followed by a Transformer stage to strip out the hard-coded source column names.

How to transpose columns when they encode multiple "records"?

I have a spreadsheet I have imported into OpenRefine. The creator encoded groups of information (records) in columns. I need to bring each of those groups of columns into its own row, along with all the relevant columns.
Using a simplified example, how would I go from this:
id foo1 foo2 foo3 bar1 bar2 bar3
1 4 6 a 7 9 b
2 5 5 a 8 8 b
3 6 4 a 9 7 b
To this:
id foobar1 foobar2 foobar3
1 4 6 a
1 7 9 b
2 5 5 a
2 8 8 b
3 6 4 a
3 9 7 b
I've been trying to think of a way forward with intermediate columns, but there are are 6 groups of 5 columns and I'm currently stuck.
I found a solution. The steps are:
Concat each group of columns into a single column (FOO_CONCAT, BAR_CONCAT)
Delete the now unneeded columns (foo1..3, bar1..3)
Transpose your CONCAT columns into a single column, no prefix, ignoring blanks, filling down other columns
Now FOO_CONCATs and BAR_CONCATs are all in the same column
Split that column into several columns...(using the separator you used in step 1)
Rename columns
Strip out prefixes (I had foo1:4, bar2:8, etc for clarity)
Transform to numbers (Edit cells -> Common Transforms -> toNumber)
Now you're ready to transpose,facet, etc
I think this is essentially the same has the solution you describe, but possibly with some shortcuts to avoid all the steps.
Given the example data you post I would:
On "Id" column select Edit column->Add column based on this column
from menu
Make new column name "foobar"
Use the GREL forEach(row.columnNames,cn,if(cn.startsWith("foo"),cells[cn].value,null)).join("|")+"~"+forEach(row.columnNames,cn,if(cn.startsWith("bar"),cells[cn].value,null)).join("|")
Once new "foobar" column exists, on this column use menu option Edit cells->Split multi-valued cells using the "~" character (as used in the GREL above)
The also on the "foobar" column use menu option Edit columns->Split into several columns, using the "|" character as in the GREL above
Finally on ID column use menu Edit cells->Fill down
This should result in the output you describe - if you don't need the original columns at this point you can either remove them, or (sometimes quicker) export the first X columns that have the reconfigured data using the custom tabular exporter, and then import that data into a new project.
You can modify the GREL to deal with the exact column groupings you have. In my example I've used the column naming to group the values, but if that isn't the reality of the data you are dealing with you can use GREL like:
forEach(row.columnNames.slice(1,4),cn,cells[cn].value).join("|")+"~"+forEach(row.columnNames.slice(4,8),cn,cells[cn].value).join("|")
Which uses the 'slice' function to select certain columns rather than using some aspect of the column name to select them.

Power BI - Is there a way to conditionally format a text column if it does NOT meet a criteria?

as the title states, is there a way to conditionally format a text column if it does NOT meet a criteria - In Power BI?
i.e: Format a column's font colour if it is NOT "x".
e.g:
I want to format the following data (using the letter column) and highlight a, b and c (as they are not equal to x)
ID | code | letter
------------------
1 | 123 | a
2 | 345 | b
3 | 567 | x
4 | 789 | c
------------------
You can always create a measure to determine the desired highlight color for example, and use it, e.g. like this:
Measure = If(FIRSTNONBLANK('Table'[letter]; [letter]) = "x"; BLANK(); "#AABBCC")
See Color formatting by field value:
You can use a measure or a column that specifics a color, either using a text value or a hex code, to apply that color to the background of font color of a table or matrix visual. You can also create custom logic for a given field, and have that logic apply the desired color to the font or background.
You didn't mention what do you want to highlight and how. If the data will not be aggregated and show all rows in a table, you can calculate the highlighting color in conditional column in Power Query Editor, or computed column (DAX). If the highlighting rules will be applied on aggregated data, then you should use DAX measure.
If you want to highlight a cell, return the desired color. Otherwise return null in Power Query Editor, or BLANK() in DAX.

Extracting a word from string from n rows and append that word as a new col in SQL Server

I have got a data set that contains 3 columns and has 15565 observations. one of the columns has got several words in the same row.
What I am looking to do is to extract a particular word from each row and append it to a new column (i will have 4 cols in total)
The problem is that the word that i am looking for are not the same and they are not always on the same position.
Here is an extract of my DS:
x y z
-----------------------------------------------------------------------
1 T 3C00652722 (T558799A)
2 T NA >> MSP: T0578836A & 3C03024632
3 T T0579010A, 3C03051500, EAET03051496
4 U T0023231A > MSP: T0577506A & 3C02808556
8 U (T561041A C72/59460)>POPMigr.T576447A,C72/221816*3C00721502
I am looking to extract all the words that start with 3Cand are 10 characters long and then append the to a new col so it looks like this:
x y z Ref
----------------------------------------------------------------
1 T 3C00652722 (T558799A) 3C00652722
2 T NA >> MSP: T0578836A & 3C03024632 3C03024632
3 T T0579010A, 3C03051500, EAET03051496 3C03051500
4 U T0023231A > MSP: T0577506A & 3C02808556 3C02808556
8 U >POPMigr.T576447A,C72/221816*3C00721502 3C00721502
I have tried using the Contains, Like and substring methods but it does not give me the results i am looking for as it basically finds the rows that have the 3C number but does not extract it, it just copies the whole cell and pastes is on the Ref column.
SQL Server doesn't have good string functions, but this should suffice if you only want to extract one value per row:
select t.*,
left(stuff(col,
1,
patindex('%3C[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', col),
''
), 10)
from t ;

Excel: one column has duplicates of each value, I need to take averages of the corresponding two values from the other columns

Example:
column A column B
A 1
A 2
B 2
B 2
C 1
C 1
I would somehow like to get the following result:
column A column B
A 1.5
B 2
C 1
(which are averages of 1 and 2, 2 and 2 and 1 and 1)
How do I achieve that?
Thanks
If you're using Excel 2007 or above, you can also use the shorter AVERAGEIF function:
=AVERAGEIF($A$1:$A:$6,D1,$B$1:$B$6)
Less typing, easier to read..
In D1:D3, type A, B, C. Then in E1, put this formula
=SUMIF($A$1:$A$6,D1,$B$1:$B$6)/COUNTIF($A$1:$A$6,D1)
and fill down to E3. If you want to replace the existing data, copy E1:E3 and paste-special-values over itself. Then delete A:C.
Alternatively, you can add headers to your data, say "Letter" and "Number". Then create a Pivot Table from your data. Put Letter in the rows section and Number in the Data section. Change your Data section from SUM to AVERAGE and you'll get the same result.