Split multiple points in text format and switch coordinates in postgres column - sql

I have a PostgreSQL column of type text that contains data like shown below
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text like shown below
(-117.25624, 32.85563)(-117.25648000000001,32.855470000000004 )(-117.25710000000001,32.85567 )(-117.2556,32.85544 )
(-121.44142000000001,37.75363 )(-121.4414,37.75292 )
As you can see, the values inside the parentheses have switched around. Also note that I have shown two records here to indicate that not all fields have same number of parenthesized figures.
What I've tried
I tried extracting the column to Java and performing my operations there. But due to sheer amount of records I have, I will run out of memory. I also cannot do this method in batched due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL9.4 with PGAdmin III as the client

this is a type of problem that should not be solved by sql, but you are lucky to use Postgres.
I suggest the following steps in defining your algorithm.
First part will be turning your strings into a structured data, second will transform structured data back to string in a format that you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with string_to_array function.
Now you can turn this array into rows with unnest function, which will return a row per bracketed value.
Finally you need to slit values in each row into two fields.
From data to string
You need to group results of the first query with results wrapped in string_agg function that will combine all numbers in rows into string.
You will need to experiment with brackets to achieve exactly what you want.
PS. I am not providing query here. Once you have some code that you tried, let me know.

Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
FROM my_table) sub
GROUP BY id; -- assuming id is PK or no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed

As I put in the question, I tried extracting the column to Java and performing my operations there. But due to sheer amount of records I have, I will run out of memory. I also cannot do this method in batched due to time constraints.
I ran out of memory here as I was putting everything in a Hashmap of
< my_primary_key,the_newly_formatted_text >. As the text was very long sometimes and due to the sheer number of records that I had, it wasnt surprising that I got an OOM.
Solution that I used:
As suggested my many folks here, this solution was better solved with a code. I wrote a small script that formatted the text as per my liking and wrote the primary key and the newly formatted text to a file in tsv format. Then I imported the tsv in a new table and updated the original table from the new one.

Related

"Numeric value '' is not recognized" - what column?

I am trying to insert data from a staging table into the master table. The table has nearly 300 columns, and is a mix of data-typed Varchars, Integers, Decimals, Dates, etc.
Snowflake gives the unhelpful error message of "Numeric value '' is not recognized"
I have gone through and cut out various parts of the query to try and isolate where it is coming from. After several hours and cutting every column, it is still happening.
Does anyone know of a Snowflake diagnostic query (like Redshift has) which can tell me a specific column where the issue is occurring?
Unfortunately not at the point you're at. If you went back to the COPY INTO that loaded the data, you'd be able to use VALIDATE() function to get better information to the record and byte-offset level.
I would query your staging table for just the numeric fields and look for blanks, or you can wrap all of your fields destined for numeric fields with try_to_number() functions. A bit tedious, but might not be too bad if you don't have a lot of numbers.
https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html
As a note, when you stage, you should try and use the NULL_IF options to get rid of bad characters and/or try to load them into stage using the actual datatypes in your stage table, so you can leverage the VALIDATE() function to make sure the data types are correct before loading into Snowflake.
Query your staging using try_to_number() and/or try_to_decimal() for number and decimal fields of the table and the use the minus to get the difference
Select $1,$2,...$300 from #stage
minus
Select $1,try_to_number($2)...$300 from#stage
If any number field has a string that cannot be converted then it will be null and then minus should return those rows which have a problem..Once you get the rows then try to analyze the columns in the result set for errors.

SQL Remove Substring From Query Results

I have a query that is returning data from a database. In a single field there is a rather long text comment with a segment, which is clearly defined with marking tags like !markerstart! and !markerend!. I would like to have a query return with the string segment between the two markers removed (and the markers removed too).
I would normally do this client-side after I get the data back, however, the problem is that the query is an INSERT query that gets it's data from a SELECT statement. I don't want the text segment to be stored in the archival/reporting table (working with an OLTP application here), so I need to find a way to get the SELECT statement to return exactly what is to be inserted, which, in this case, means getting the SELECT statement to strip out the unwanted phrase instead of doing it in post-processing client-side.
My only thought is to use some convoluted combination of SUBSTRING, CHARINDEX, and CONCAT, but I'm hoping there is a better way, but, based on this, I don't see how. Anyone have ideas?
Sample:
This is a long string of text in some field in a database that has a segment that needs to be removed. !markerstart! This is the segment that is to be removed. It's length is unknown and variable. !markerend! The part of this field that appears after the marker should remain.
Result:
This is a long string of text in some field in a database that has a segment that needs to be removed. The part of this field that appears after the marker should remain.
SOLUTION USING STUFF:
I really don't like how verbose this is, but I can put it in a function if I really need to. It isn't ideal, but it is easier and faster than a CLR routine.
SELECT STUFF(CAST(Description AS varchar(MAX)), CHARINDEX('!markerstart!', Description), CHARINDEX('!markerend!', Description) + 11 - CHARINDEX('!markerstart!', Description), '') AS Description
FROM MyTable
You may want to consider implementing a CLR user-defined function that returns the parsed data.
The following link demonstrates how to use a CLR UDF RegEx function for pattern matching and data extraction.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Regards,
You can use Stuff function or Replace function and replace your unwanted symbols with ''.
STUFF('EXP',START_POS,'NUMBER_OF_CHARS','REPLACE_EXP')

RIGHT function, not returning whats expected?

Query:
SELECT StartDate, EndDate, RIGHT(Sector, 1 )
FROM Table1
ORDER BY Right(Sector, 1), StartDate
By looking at this, the query should order everything by sector, followed by the start date. This query has worked for quiet awhile until yesterday where it did not order it properly, for some reason, Sector 2 came before Sector 1.
The data type for Sector is of type int, not null. After inserting a TRIM function into Sector it seems to work fine afterwards.
New Query:
SELECT StartDate, EndDate, RIGHT(Sector, 1 )
FROM Table1
ORDER BY Right(TRIM(Sector), 1), StartDate
Which I found really weird since it's suppose to only pick out one character, so why is there leading spaces?
Is there an issue with using RIGHT function on a int before converting the type? Or is it something else?
Thanks for the help everyone!
-Edit- The RIGHT function should return either 1,2,3 or 4 however when ordering it, 2 comes before 1.
To clarify, the column Sector contains an int value, we can determine it's location by obtaining the last digit (Which is why the previous coder did)
MS Access 2003 has a curious little feature (I can't speak for the other versions):
Make a simple query. Sort by Column A Ascending. Save the query.
Run the query. When you see the output, sort by Column A Descending using the toolbar option (see pic below). Save & close.
Run the query again. Your new sort will have overridden the sort that you saved in the query.
I think you or someone else probably just opened the query out of curiosity, sorted by Sector Descending, and when prompted to save Design Changes, you chose Yes (even though technically you didn't make any). The easiest way I found to restore the original sort is to edit the query and save it.
You've got your data stored wrong if you need to sort on a subcharacter of a numeric field.
That said, in certain context, VBA functions reserve a space in string representations of numbers for the sign. A nonsensical example of this would be:
?Len("12345")
5
Notice the space at the beginning (where the - would be if the number returned by Len() could be negative). I thought this was a result of coercing a number to a string value, but that's not it, and I couldn't replicate the problem. But that would likely be the source of the problem, and, of course, trimming off the leading space would take care of the issue.
But that's two function calls for each line, and then you're sorting by it, and that means no use of indexes, so it's going to be slow relative to a SORT BY that can use indexes. So, I'd conclude you have a schema error, in that you're giving meaning to a subpart of the data stored in the field.
It seems pretty obvious that you have a blank space at the end of the Sector field that the trim is removing.

Full Text Searching for single characters

I have a table with a TEXT column where the contents is just strings of CSV numbers. Example ",1,76,77,115," Each string can have an arbitrary number of numbers.
I am trying to set up Full Text Indexing so that I can search this column rapidly. This works great. Instead of running queries with
where MY_COL LIKE '%,77,%' and MY_COL LIKE '%,115,%'
I can do
where CONTAINS(MY_COL,'77 and 115')
However, when I try to search for a single character it doesn't work.
where CONTAINS(MY_COL,'1')
But I know that there should be records returned! I quickly found that I need to edit the Noise file and rebuild the index. But even after doing that it still doesn't work.
Working with relational databases that way is going to hurt.
Use a proper schema. Either store the values in different rows or use an array datatype for the column.
That will make solving the problem trivial.
I fixed my own problem, although I'm not exactly sure what fixed it.
I dropped my table and populated a new one (my program does batch processing) and created a new Full Text Index. Maybe I wasn't being patient enough to allow the indexing to fully rebuild.
Agreed. How does 12,15,33 not return that record for a search for 1 with fulltext? Use an actual table schema to accomplish this.

Force numerical order on a SQL Server 2005 varchar column, containing letters and numbers?

I have a column containing the strings 'Operator (1)' and so on until 'Operator (600)' so far.
I want to get them numerically ordered and I've come up with
select colname from table order by
cast(replace(replace(colname,'Operator (',''),')','') as int)
which is very very ugly.
Better suggestions?
It's that, InStr()/SubString(), changing Operator(1) to Operator(001), storing the n in Operator(n) separately, or creating a computed column that hides the ugly string manipulation. What you have seems fine.
If you really have to leave the data in the format you have - and adding a numeric sort order column is the better solution - then consider wrapping the text manipulation up in a user defined function.
select colname from table order by dbo.udfSortOperator(colname)
It's less ugly and gives you some abstraction. There's an additional overhead of the function call but on a table containing low thousands of rows in a not-too-heavily hit database server it's not a major concern. Make notes in the function to optomise later as required.
My answer would be to change the problem. I would add an operatorNumber field to the table if that is possible. Change the update/insert routines to extract the number and store it. That way the string conversion hit is only once per record.
The ordering logic would require the string conversion every time the query is run.
Well, first define the meaning of that column. Is operator a name so you can justify using chars? Or is it a number?
If the field is a name then you will use chars, and then you would want to determine the fixed length. Pad all operator names with zeros on the left. Define naming rules for operators (I.E. No leters. Or the codes you would use in a series like "A001")
An index will sort the physical data in the server. And a properly define text naming will sort them on a query. You would want both.
If the operator is a number, then you got the data type for that column wrong and needs to be changed.
Indexed computed column
If you find yourself ordering on or otherwise querying operator column often, consider creating a computed column for its numeric value and adding an index for it. This will give you a computed/persistent column (which sounds like oxymoron, but isn't).