SQL Array within a cell - sql

I have a cell that contains an array of characters separated by commas, e.g. "1,2,3,4,5". My question is: is it possible to remove a particular element of the array? For example, if I wanted to remove "1", the cell would become "2,3,4,5", and if I removed "3", it would become "1,2,4,5". I want to perform this task within SQL, either as a function or a stored procedure. Any help is much appreciated.

Sure, it'd just be some basic string REPLACE() calls: http://msdn.microsoft.com/en-us/library/ms186862.aspx
However, since you have to manipulate individual bits of this data field separately from the rest of the field, it's a good candidate for getting normalized into its own child table.
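One caution with a bare REPLACE(col, '1', ''): it would also strip the 1 out of values like "10" and leave stray commas behind. A common workaround is to wrap the list in delimiters, replace the delimited element, and trim the wrapping delimiters off again. The sketch below runs that idea through Python's built-in sqlite3 module purely so it is self-contained. REPLACE, TRIM(X, Y) and || are SQLite's spellings (other dialects concatenate and trim differently), and the table t and column csv_col are hypothetical.

```python
import sqlite3

def remove_element(conn: sqlite3.Connection, value: str) -> None:
    """Remove one element from every comma-separated list in t.csv_col (hypothetical names)."""
    # Wrap the list in commas so ',3,' can only match a whole element,
    # replace it with a single comma, then trim the wrapping commas off again.
    conn.execute(
        "UPDATE t SET csv_col = TRIM(REPLACE(',' || csv_col || ',', ',' || ? || ',', ','), ',')",
        (value,),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, csv_col TEXT)")
conn.executemany("INSERT INTO t (csv_col) VALUES (?)", [("1,2,3,4,5",), ("10,1,11",)])
remove_element(conn, "1")
print([row[0] for row in conn.execute("SELECT csv_col FROM t ORDER BY id")])
```

Note that "10,1,11" keeps its 10 and 11 because only ",1," is matched. Back-to-back duplicates such as "1,1,2" lose only one occurrence per pass, because REPLACE does not rescan the comma it just consumed. That is one more symptom of the normalization problem the answer points out.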

Related

Data Factory expression substring? Is there a function similar to right?

Please help,
How could I extract 2019-04-02 out of the following string with an Azure Data Flow expression?
ABC_DATASET-2019-04-02T02:10:03.5249248Z.parquet
The first part of the string, received as a ChildItem from a GetMetaData activity, is dynamic. So in this case, it is ABC_DATASET that is dynamic.
Kind regards,
D
There are several ways to approach this problem, and they are really dependent on the format of the string value. Each of these approaches uses Derived Column to either create a new column or replace the existing column's value in the Data Flow.
Static format
If the format is always the same, meaning the length of the sections is always the same, then substring is simplest:
This will parse the string like so:
Useful reminder: substring and array indexes in Data Flow are 1-based.
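To make the 1-based indexing concrete, here is a small Python emulation of a 1-based substring(string, start, length) applied to the sample file name. It assumes the prefix is always exactly 11 characters long (as ABC_DATASET is), so the date starts at position 13. The helper name substring_1based is hypothetical and only mimics the indexing convention.

```python
def substring_1based(s: str, start: int, length: int) -> str:
    """Mimic a 1-based substring(s, start, length): position 1 is the first character."""
    return s[start - 1:start - 1 + length]

name = "ABC_DATASET-2019-04-02T02:10:03.5249248Z.parquet"
# "ABC_DATASET" is 11 characters and the hyphen is the 12th, so the date begins at 13.
print(substring_1based(name, 13, 10))  # 2019-04-02
```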
Dynamic format
If the format of the base string is dynamic, things get a tad trickier. For this answer, I will assume that the basic format of {variabledata}-{timestamp}.parquet is consistent, so we can use the hyphen as a base delimiter.
Derived Column has support for local variables, which is really useful when solving problems like this one. Let's start by creating a local variable to convert the string into an array based on the hyphen. This will lead to some other problems later since the string includes multiple hyphens thanks to the timestamp data, but we'll deal with that later. Inside the Derived Column Expression Builder, select "Locals":
On the right side, click "New" to create a local variable. We'll name it and define it using a split expression:
Press "OK" to save the local and go back to the Derived Column. Next, create another local variable for the yyyy portion of the date:
The cool part of this is I am now referencing the local variable array that I created in the previous step. I'll follow this pattern to create a local variable for MM too:
I'll do this one more time for the dd portion, but this time I have to do a bit more to get rid of all the extraneous data at the end of the string. Substring again turns out to be a good solution:
Now that I have the components I need isolated as variables, we just reconstruct them using string interpolation in the Derived Column:
Back in our data preview, we can see the results:
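The locals described above can be mirrored step by step in Python to check the index arithmetic before building them in the Expression Builder. This is only an emulation under the same assumption as the answer (the variable prefix contains no hyphen). The function name extract_date_split is hypothetical, and Python's 0-based list indexes are shifted by one relative to Data Flow's 1-based arrays.

```python
def extract_date_split(file_name: str) -> str:
    # First local: split the whole name on the hyphen.
    parts = file_name.split("-")
    # For the sample: ['ABC_DATASET', '2019', '04', '02T02:10:03.5249248Z.parquet']
    # Data Flow arrays are 1-based, so [2], [3], [4] there are [1], [2], [3] here.
    yyyy = parts[1]
    mm = parts[2]
    # The day element still carries the time and extension; keep its first two characters.
    dd = parts[3][:2]
    # Reassemble, like the string interpolation in the final Derived Column.
    return f"{yyyy}-{mm}-{dd}"

print(extract_date_split("ABC_DATASET-2019-04-02T02:10:03.5249248Z.parquet"))
```

If the prefix ever gains a hyphen, every index shifts by one. Counting from the end of the array instead is one way around that.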
Where else to go from here
If these solutions don't address your problem, then you have to get creative. Here are some other functions that may help:
regexSplit
left
right
dropLeft
dropRight
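If neither positional approach holds up (for example, if the dynamic prefix can itself contain hyphens), matching the shape of the date with a regular expression is another route. The Python sketch below only illustrates the pattern. Carrying it over to a Data Flow regex function is an assumption to verify against the expression reference, and extract_date_regex is a hypothetical name.

```python
import re
from typing import Optional

# Four digits, hyphen, two digits, hyphen, two digits, immediately followed by the 'T'
# that starts the time portion. Requiring 'T' plus a digit keeps the match off the prefix.
DATE_BEFORE_TIME = re.compile(r"(\d{4}-\d{2}-\d{2})T\d")

def extract_date_regex(file_name: str) -> Optional[str]:
    match = DATE_BEFORE_TIME.search(file_name)
    return match.group(1) if match else None
```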

Conditional Split in SSIS - SQL

I'm quite new to SQL Databases, but I'm trying to add a Conditional Split in my Data Flow between my Flat File Source and OLE DB Database to exclude records containing some special characters such as ø and ¿ and ¡ on the [title] column. Those are causing errors when creating a table and therefore I want those records to be split from my table. How can I create a conditional split for this?
As a bonus: Is there a way to only filter in a conditional split the rows that contain numbers from 0-9 and letters from a-zA-Z so that all rows with "special" symbols are filtered out automatically?
A conditional split works by determining whether a condition is true or false. So, if you can write a rule that evaluates to true or false, and you can write multiple rules to address assorted business needs, then you can properly shunt rows into different pathways.
How do I do that?
I always advocate that people add new columns to their data flows to handle this stuff. It's the only way you're going to be able to debug it when a condition comes up that you think should have been handled but wasn't.
Whether you create a column called IsTitleOnlyAlphaNumeric or IsTitleInternational is really up to you. The general programming rule is that you go for the common/probable case. Since ASCII has only 128 characters (256 for extended ASCII), I'd advocate the former. Otherwise, you're going to play whack-a-mole as the next file has umlauts or a thorn in it.
Typically, we would add a new column through a Derived Column Transformation which means you're working with the SSIS expression language. However, in this case the expression does not have the ability to gracefully* identify whether the string is good or not. Instead, you'll want to use the .NET library for this heavy lifting. That's the Script Component and you'll have it operate in the Transformation mode (default).
Add a new column of type boolean, IsTitleOnlyAlphaNumeric, and crib the regular expression from "check alphanumeric characters in string in c#".
The relevant bit of the row-processing method (Input0_ProcessInputRow with the default input name) would look like:
Row.IsTitleOnlyAlphaNumeric = isAlphaNumeric(Row.Title);
As rows flow through, that will be evaluated for each one and you'll see whether it meets the criteria or not. Depending on your data, you might need a check for NULL before you call that method.
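A typical pattern for this check is ^[a-zA-Z0-9]*$. As a language-neutral sketch of the same rule, including the NULL check, here it is in Python. is_title_only_alphanumeric is a hypothetical stand-in for the isAlphaNumeric helper, and treating a NULL title as failing is an assumption you may want to flip.

```python
import re
from typing import Optional

# Only ASCII letters and digits; fullmatch below makes it apply to the whole string.
ALPHANUMERIC = re.compile(r"[A-Za-z0-9]*")

def is_title_only_alphanumeric(title: Optional[str]) -> bool:
    # Mirror the NULL check mentioned above: treat a missing title as "not clean".
    if title is None:
        return False
    return ALPHANUMERIC.fullmatch(title) is not None
```

Note that spaces fail this check too. If titles legitimately contain spaces, add a space to the character class.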
How I shouldn't do that
*You could abuse the daylights out of the REPLACE function (and test the maximum allowable length of an expression) by creating a new column called StrippedTitle in which you replace every allowable character with an empty string. If the length of the trimmed final string is not zero, then there's something bad in there.
REPLACE(REPLACE(REPLACE([Title], "A", ""), "B", ""), "C", "") ..., "a", ""), "b", "") ..., "9", "")
where ... implies you've continued the pattern. Yes, you'll have to replace upper and lower cased characters. ASCIITable.com or similar will be your friend.
That will be a new column. Then add a second Derived Column component to calculate whether it's empty (again, easier to debug), as a new column IsTitleOnlyAlphaNumeric:
LEN(RTRIM(StrippedTitle)) == 0
Terrible approach, but the number of questions I answer where people later clarify "I cannot use script" is decidedly non-zero.
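Typing 62 nested REPLACE calls by hand is error-prone, so one option is to generate the expression text. The Python sketch below builds it for A-Z, a-z and 0-9 and also emulates the strip-then-check logic. build_ssis_replace_expression and title_is_clean are hypothetical helpers. Paste the generated string into the Derived Column and check it against whatever length limit your SSIS version places on expressions.

```python
import string

ALLOWED = string.ascii_uppercase + string.ascii_lowercase + string.digits

def build_ssis_replace_expression(column: str = "[Title]", allowed: str = ALLOWED) -> str:
    """Build REPLACE(REPLACE(...([Title], "A", "")...), "9", "") as SSIS expression text."""
    expr = column
    for ch in allowed:
        # Each pass wraps the previous expression in one more REPLACE call.
        expr = f'REPLACE({expr}, "{ch}", "")'
    return expr

def title_is_clean(title: str, allowed: str = ALLOWED) -> bool:
    """Emulate StrippedTitle followed by LEN(RTRIM(StrippedTitle)) == 0."""
    stripped = title
    for ch in allowed:
        stripped = stripped.replace(ch, "")
    # rstrip(" ") stands in for RTRIM, which removes trailing spaces.
    return len(stripped.rstrip(" ")) == 0

expression = build_ssis_replace_expression()
```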

Split multiple points in text format and switch coordinates in postgres column

I have a PostgreSQL column of type text that contains data like the following:
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text, like the following:
(-117.25624, 32.85563)(-117.25648000000001,32.855470000000004 )(-117.25710000000001,32.85567 )(-117.2556,32.85544 )
(-121.44142000000001,37.75363 )(-121.4414,37.75292 )
As you can see, the values inside the parentheses have switched around. Also note that I have shown two records here to indicate that not all fields have same number of parenthesized figures.
What I've tried
I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I will run out of memory. I also cannot do this in batches due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL 9.4 with pgAdmin III as the client.
This is the type of problem that should not be solved in SQL, but you are lucky to be using Postgres.
I suggest the following steps in defining your algorithm.
The first part will turn your strings into structured data; the second will transform the structured data back into a string in the format you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with the string_to_array function.
Now you can turn this array into rows with the unnest function, which will return one row per bracketed value.
Finally, you need to split the values in each row into two fields.
From data to string
You then need to group the results of the first query, wrapping them in the string_agg function, which will combine the values from all rows into a single string.
You will need to experiment with brackets to achieve exactly what you want.
PS. I am not providing query here. Once you have some code that you tried, let me know.
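For readers who want to see the two phases end to end before writing the SQL, here is a Python sketch of the same algorithm: find the bracketed pairs, swap each pair, and join them back together. It only illustrates the steps. The regular expression and the output format (no space after the comma) are assumptions.

```python
import re

# One "(x, y)" pair: two signed decimal numbers separated by a comma and optional spaces.
PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")

def swap_pairs(text: str) -> str:
    # From string to data: one (first, second) tuple per bracketed value.
    pairs = PAIR.findall(text)
    # From data to string: swap each pair and concatenate, like string_agg with ''.
    return "".join(f"({second},{first})" for first, second in pairs)
```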
Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
FROM my_table) sub
GROUP BY id; -- assuming id is PK or no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed
As I put in the question, I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I will run out of memory. I also cannot do this in batches due to time constraints.
I ran out of memory here because I was putting everything in a HashMap of <my_primary_key, the_newly_formatted_text>. As the text was sometimes very long, and given the sheer number of records I had, it wasn't surprising that I got an OOM.
Solution that I used:
As suggested by many folks here, this problem was better solved with code. I wrote a small script that formatted the text as per my liking and wrote the primary key and the newly formatted text to a file in TSV format. Then I imported the TSV into a new table and updated the original table from the new one.
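The key difference from the HashMap version is that each row is written out as soon as it is transformed, so memory use stays flat regardless of row count. Here is a minimal Python sketch of that streaming pattern. The rows iterable stands in for a database cursor, and reformat is a hypothetical placeholder for the actual text transformation.

```python
import csv
import io
from typing import Callable, Iterable, TextIO, Tuple

def write_tsv(rows: Iterable[Tuple[int, str]], reformat: Callable[[str], str], out: TextIO) -> int:
    """Stream (primary_key, text) rows to TSV without collecting them in memory."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    for pk, text in rows:
        # Transform and write one row at a time; nothing accumulates between iterations.
        writer.writerow([pk, reformat(text)])
        count += 1
    return count

buffer = io.StringIO()
written = write_tsv([(1, "abc"), (2, "def")], str.upper, buffer)
```

If the text can contain tabs or newlines, the csv module will quote those fields. Loading the file with PostgreSQL's COPY in CSV format with a tab delimiter should then read them back correctly.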

Microsoft Access 2010 SQL Split String at "X" and Multiply

I have a table with a package size column of data type text that I need to convert to an integer for mathematical reasons. The values in this column typically look something like "100ML", "20GM", "UD 20", or "13OZ". Here is where it gets tricky: there are occasionally values like "6X12ML" and "UD 5X6ML". For the ones with an "X" in them, I need to remove the "ML" and multiply the numbers on either side of the "X". I'm currently removing the "ML" with
Replace([TABLE_NAME].[COLUMN_NAME],"ML","")
in an expression column in a query, and I can use nested Replace functions to remove "ML", "GM", "OZ", and "UD ". All of my attempts to split and multiply the "X" values have failed; I figured the end solution would be something like
IIf([TABLE_NAME].[COLUMN_NAME] Like "X", (CInt(Left([TABLE_NAME].[COLUMN_NAME],InStr(1,[TABLE_NAME].[COLUMN_NAME],"X")-1))*CInt(Right([TABLE_NAME].[COLUMN_NAME],InStr(1,[TABLE_NAME].[COLUMN_NAME],"X")+1))),[TABLE_NAME].[COLUMN_NAME])
I have tried using a variation of the code above to no avail. All suggestions are appreciated. I would prefer to get this knocked out in one query, but I do realize I can use an expression to split the text before and after the "X" into two different expression columns, and then use another query to multiply the values.
QTY_ORDERED: IIf(InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X")>1,[CRX_HISTORIC_PO].[QUANTITY]/Left(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X")-1)*Right(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),Len(Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""))-InStr(1,Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ",""),"X"))*-1,[CRX_HISTORIC_PO].[QUANTITY]/Replace(Replace(Replace(Replace([STANDARD_PRICING].[PACKAGE_AMOUNT],"GM",""),"ML",""),"UD","")," ","")*-1)
The code above is what I used to complete the task at hand.
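The long expression above strips the unit text, splits at "X" with InStr, Left and Right, and multiplies the two sides. The same parsing logic is easier to follow in Python. This sketch also strips "OZ" (which the final query does not) and leaves out the QUANTITY division and the *-1. Two details explain why the first attempt failed. Like "X" without wildcards only matches the exact string "X" (Like "*X*" matches any value containing an X). Also, Right counts characters from the end of the string, so its length argument has to be Len(...) - InStr(...), as the final query does. package_units is a hypothetical name.

```python
def package_units(package: str) -> int:
    """Turn values like "100ML", "UD 20" or "UD 5X6ML" into a total unit count."""
    cleaned = package.upper()
    # Same idea as the nested Replace calls: drop unit labels, the UD prefix and spaces.
    for token in ("ML", "GM", "OZ", "UD", " "):
        cleaned = cleaned.replace(token, "")
    if "X" in cleaned:
        # Split at the X and multiply both sides, e.g. "6X12" -> 6 * 12.
        left, right = cleaned.split("X", 1)
        return int(left) * int(right)
    return int(cleaned)
```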

SQL Remove Substring From Query Results

I have a query that is returning data from a database. In a single field there is a rather long text comment with a segment, which is clearly defined with marking tags like !markerstart! and !markerend!. I would like to have a query return with the string segment between the two markers removed (and the markers removed too).
I would normally do this client-side after I get the data back. However, the problem is that the query is an INSERT query that gets its data from a SELECT statement. I don't want the text segment to be stored in the archival/reporting table (working with an OLTP application here), so I need to find a way to get the SELECT statement to return exactly what is to be inserted. In this case, that means getting the SELECT statement to strip out the unwanted phrase instead of doing it in post-processing client-side.
My only thought is to use some convoluted combination of SUBSTRING, CHARINDEX, and CONCAT. I'm hoping there is a better way, but, based on this, I don't see how. Anyone have ideas?
Sample:
This is a long string of text in some field in a database that has a segment that needs to be removed. !markerstart! This is the segment that is to be removed. It's length is unknown and variable. !markerend! The part of this field that appears after the marker should remain.
Result:
This is a long string of text in some field in a database that has a segment that needs to be removed. The part of this field that appears after the marker should remain.
SOLUTION USING STUFF:
I really don't like how verbose this is, but I can put it in a function if I really need to. It isn't ideal, but it is easier and faster than a CLR routine.
SELECT STUFF(CAST(Description AS varchar(MAX)), CHARINDEX('!markerstart!', Description), CHARINDEX('!markerend!', Description) + 11 - CHARINDEX('!markerstart!', Description), '') AS Description
FROM MyTable
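To see why the length argument is CHARINDEX(end) + 11 - CHARINDEX(start) (11 is the length of '!markerend!'), here is a Python emulation of the same 1-based arithmetic. One caveat is worth checking against your data. When '!markerstart!' is missing, CHARINDEX returns 0, and STUFF with a start position of 0 returns NULL, so rows without markers would be blanked unless the expression is guarded, e.g. with a CASE. The sketch returns the text unchanged in that situation. The spaces on both sides of the segment also survive, leaving a double space where it was. charindex, stuff and strip_marked_segment are hypothetical helper names.

```python
START = "!markerstart!"
END = "!markerend!"

def charindex(needle: str, haystack: str) -> int:
    """Like T-SQL CHARINDEX: 1-based position of needle, or 0 if not found."""
    return haystack.find(needle) + 1

def stuff(text: str, start: int, length: int, replacement: str) -> str:
    """Like T-SQL STUFF for valid arguments: delete `length` chars at 1-based `start`."""
    return text[:start - 1] + replacement + text[start - 1 + length:]

def strip_marked_segment(text: str) -> str:
    start = charindex(START, text)
    end = charindex(END, text)
    # Guard the cases where STUFF would return NULL (start 0 or a negative length).
    if start == 0 or end < start:
        return text
    # len(END) == 11, matching the hard-coded 11 in the query.
    return stuff(text, start, end + len(END) - start, "")
```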
You may want to consider implementing a CLR user-defined function that returns the parsed data.
The following link demonstrates how to use a CLR UDF RegEx function for pattern matching and data extraction.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Regards,
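Whatever shape the CLR function takes, the pattern it would apply is short. Here is the equivalent regular-expression replacement as a Python sketch. The non-greedy .*? stops at the first end marker, and DOTALL lets the segment span line breaks. Consuming the whitespace after the end marker is an assumption that reproduces the single space in the expected result. remove_segment is a hypothetical name.

```python
import re

# Non-greedy match from the start marker to the first end marker, plus trailing whitespace.
SEGMENT = re.compile(r"!markerstart!.*?!markerend!\s*", re.DOTALL)

def remove_segment(text: str) -> str:
    return SEGMENT.sub("", text)
```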
You can use the STUFF function or the REPLACE function and replace your unwanted symbols with ''.
STUFF(character_expression, start, length, replace_with_expression)