How to change mileage representation forms in SQL

I would like to change the way mileage is represented in the database. Right now the mileage is stored as 080+0.348, which means that this particular feature is at mile point 80.348 along the roadway corridor. I would like the data to be represented in the database in the latter form, 80.348 and so on. This would save me from having to export the dataset to Excel for the conversion. Is this even possible? The name of the column is NRLG_MILEPOINT.
Much appreciated.

One thing you could try is to pick the string value apart into its component pieces and then recombine them as a number. If your data is in a table called TEST you might do something like the following:
select miles, fraction,
       nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
             regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
      from test);
SQLFiddle here.
Share and enjoy.

Using the answer provided above, I was able to expand it to get exactly the answer I needed. Thanks a ton to everyone who helped! Here is the query I ended up with:
select distinct nrlg_dept_route, corridor_code_rb, nrlg_county, next_county,
       nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
             nrlg_milepoint as nrlg_mile_point,
             nrlg_dept_route as nrlg_dept_route,
             nrlg_county as nrlg_county,
             next_county as next_county,
             corridor_code_rb as corridor_code_rb,
             corridor_code as corridor_code,
             regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
      from corridor_county_intersect, south_van_data_view)
where nrlg_dept_route = corridor_code
order by 1, 5;

There are several ways to do this. Which one fits depends on your situation: how the data needs to be stored and how it is being used. Some of the options:
1. Change the datatype.
This option may require you to change how the data is stored today. Whatever currently writes the data to the schema would have to do the conversion.
2. Create another column that stores the data in the desired format.
If changing the datatype of NRLG_MILEPOINT would break an existing process, or you are required to keep the data in that format, you could add another column, say NRLG_MILEAGE_DISPLAY, with a numeric datatype and store the converted value there. A trigger could then insert or update NRLG_MILEAGE_DISPLAY based on the data in NRLG_MILEPOINT.
3. Convert the value in the query.
If you just want the data displayed differently in your select statement, you can convert the datatype in the SQL statement itself. Exactly how depends on the current datatype of NRLG_MILEPOINT.
Assuming that varchar2 is the type, based on the comments, here is an SQLFiddle link displaying a crude example of option 3. Your usage may vary depending on the actual datatype of NRLG_MILEPOINT, but whatever the datatype, there should be a way to convert how it is displayed in your query. You could take this further and create a view if you needed to. Whether it is an inline view or a stored view, you can then use the converted value in your join later.
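As a rough sketch of options 2 and 3 in Oracle, reusing the regular expressions from the answer above. The table name TEST, the view name TEST_MILEPOINT_V and the trigger name are placeholders, and the sketch assumes NRLG_MILEPOINT is a VARCHAR2 holding values like 080+0.348 and that the session uses a period as the decimal separator:

```sql
-- Option 3 as a stored view: the original column stays untouched,
-- and queries read the converted value from MILEPOINT.
CREATE OR REPLACE VIEW test_milepoint_v AS
SELECT t.*,
       NVL(TO_NUMBER(REGEXP_SUBSTR(t.nrlg_milepoint, '[0-9]*')), 0)
     + NVL(TO_NUMBER(REGEXP_SUBSTR(t.nrlg_milepoint, '[+-][0-9.]*')), 0) AS milepoint
FROM test t;

-- Option 2: add a numeric column and backfill it once.
ALTER TABLE test ADD (nrlg_mileage_display NUMBER);

UPDATE test
SET nrlg_mileage_display =
      NVL(TO_NUMBER(REGEXP_SUBSTR(nrlg_milepoint, '[0-9]*')), 0)
    + NVL(TO_NUMBER(REGEXP_SUBSTR(nrlg_milepoint, '[+-][0-9.]*')), 0);

-- Keep the new column in sync when NRLG_MILEPOINT is inserted or changed.
CREATE OR REPLACE TRIGGER test_milepoint_biu
BEFORE INSERT OR UPDATE OF nrlg_milepoint ON test
FOR EACH ROW
BEGIN
  :NEW.nrlg_mileage_display :=
      NVL(TO_NUMBER(REGEXP_SUBSTR(:NEW.nrlg_milepoint, '[0-9]*')), 0)
    + NVL(TO_NUMBER(REGEXP_SUBSTR(:NEW.nrlg_milepoint, '[+-][0-9.]*')), 0);
END;
/
```

The view is the least invasive choice, since nothing that writes NRLG_MILEPOINT has to change. The extra column costs storage and a trigger but lets you index the numeric value.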

Related

How can I change a date field from String to Date or DateTime?

I am using Google BigQuery and I have a field named 'AsOfDate' which is set as a string datatype. I have a bunch of data in this field, which I really want to set as DateTime or just Date. Either is fine. I Googled for a solution, and I thought this would be pretty easy to do, but I can't seem to get the data type updated. I don't want to run a simple select statement; I want to permanently change the schema. Has anyone run into this and figured out how to do this kind of thing? If so, please share your insights. Thanks!
To quote directly from the official documentation: 'Changing a column's data type is not supported by the BigQuery web UI, the command-line tool, or the API.'
https://cloud.google.com/bigquery/docs/manually-changing-schemas#changing_a_columns_data_type
There are two ways to manually change a column's data type:
Using a SQL query — Choose this option if you are more concerned about
simplicity and ease of use, and you are less concerned about costs.
Recreating the table — Choose this option if you are more concerned
about costs, and you are less concerned about simplicity and ease of
use.
You could use either of the approaches above along with the PARSE_DATE() function to transform your string into a date field.
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#parse_date
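A minimal sketch of the "SQL query" approach in BigQuery standard SQL. The dataset and table names (mydataset.mytable) are placeholders, and the format string '%Y-%m-%d' is an assumption about how the strings in AsOfDate are written; adjust it to match the real data:

```sql
-- Rewrites the table in place with AsOfDate converted from STRING to DATE.
-- SELECT * REPLACE keeps every other column as it is.
CREATE OR REPLACE TABLE mydataset.mytable AS
SELECT
  * REPLACE (PARSE_DATE('%Y-%m-%d', AsOfDate) AS AsOfDate)
FROM mydataset.mytable;
```

PARSE_DATE raises an error if any row does not match the format. Writing SAFE.PARSE_DATE instead returns NULL for those rows, which can be handy for a first pass to find bad values. Because the statement scans and rewrites the whole table, it is worth running it against a copy first.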

SQL data type - recommendation for 'unknown' number

I'm pulling some external data into my MSSQL server. Several columns of incoming data are marked as 'number' (it's a JSON file). It's millions of rows in size and many of the columns appear to be decimal (18,2), like 23.33. But I can't be sure that it will always be like that. In fact, a few have been 23.333, or longer numbers like 23.35555555, which will mess up my import.
So my question is: given that a column is going to have some kind of number imported into it, but I can't be sure how big it will be or how many decimal places it will have, do I have to resort to making my column a varchar, or is there a very generic number type I'm not thinking of?
Is there a max-size decimal, sort of like using VARCHAR(8000) or VARCHAR(MAX)?
update
This is the 'data type' of number that I'm pulling in:
https://dev.socrata.com/docs/datatypes/number.html#
Looks like it can be pretty much any number, as per their writing:
"Numbers are arbitrary precision, arbitrary scale numbers."
The way I handle things like this is to import the raw data into a staging table in a varchar(max) column.
Then I use TRY_PARSE() or TRY_CONVERT() when moving it to the desired datatype in my final destination table.
The point here is that the shape of the incoming data shouldn't determine the datatype you use. The datatype should be determined by the usage of the data once it's in your table. And if the incoming data doesn't fit, there are ways of making it fit.
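A sketch of that staging pattern in T-SQL. The table and column names are hypothetical, and decimal(18, 4) is only an example target type; TRY_CONVERT requires SQL Server 2012 or later:

```sql
-- Staging table: accept whatever arrives, as text.
CREATE TABLE dbo.StagingReadings (RawValue varchar(max) NULL);

-- Destination table: typed for how the data will be used.
CREATE TABLE dbo.Readings (Value decimal(18, 4) NULL);

-- ... load the raw JSON values into dbo.StagingReadings here ...

-- Move the data across; values that cannot be converted become NULL
-- instead of failing the whole statement.
INSERT INTO dbo.Readings (Value)
SELECT TRY_CONVERT(decimal(18, 4), RawValue)
FROM dbo.StagingReadings;

-- List the raw values that did not convert, for manual review.
SELECT RawValue
FROM dbo.StagingReadings
WHERE RawValue IS NOT NULL
  AND TRY_CONVERT(decimal(18, 4), RawValue) IS NULL;
```

Keeping the raw text around means a bad row never blocks the import, and the conversion rule can be changed later without re-fetching the source file.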
What do those numbers represent? If they are just values to display, you could set float as the datatype and you're good to go.
But if they are coordinates, currencies, or anything you need for exact calculations, float can sometimes cause rounding problems. In that case, pick the minimum precision you need with decimal and simply truncate any digits beyond it.
For instance, if most of the numbers have two decimals, you could go with 3 or 4 decimal places to be safe, and anything beyond that will be cut off.

(Oracle/SQL) Merge all data types into a single column

Let me explain why I want to do this... I have built a Tableau dashboard that allows a user to browse/search all of the tables & columns in our warehouse by schema, object type (table,view,materialized view), etc. I want to add a column that pulls a sample of the data from each column in each table - this is also done, but with this problem...:
The resulting column is made up of data of different types (varchar2, LONG, etc.). I can get basically every type of data to conform to a single data type except for LONG: it will not let me convert it to anything compatible with everything else (if that makes sense...). I simply need all data types to coexist in a single column. I've tried many different things and have been reading up on the subject for about a week now. It sounds like it just can't be done, but in my experience there is always a way... I figured I'd check with the gurus here before admitting defeat.
One of the things I've tried:
--Here, from two different tables, I'm pulling a single piece of data from a single column and attempting to merge into a single column called SAMPLE_DATA
--OTHER is LONG data type
--ORGN_NME is VARCHAR2 data type
select 'PLAN','OTHER', cast(substr(OTHER,1,2) as varchar2(4000)) as SAMPLE_DATA from sde.PLAN union all
select 'BUS_ORGN','ORGN_NME', cast(substr(ORGN_NME,1,2) as varchar2(4000)) as SAMPLE_DATA from sde.BUS_ORGN;
Resulting error:
Lookup Error
ORA-00932: inconsistent datatypes: expected CHAR got LONG
How can I achieve this?
Thanks in advance
Long datatypes are basically unusable by most applications. I made something similar where I wanted to search the contents of packages. The solution is to convert the LONG into CLOB using a pipelined function. Adrian Billington's source code can be found here:
https://github.com/oracle-developer/dla
You end up with a view that you can query. I did not see any performance hit even when looking at large packages so it should work for you.

Split multiple points in text format and switch coordinates in postgres column

I have a PostgreSQL column of type text that contains data like shown below
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text like shown below
(-117.25624, 32.85563)(-117.25648000000001,32.855470000000004 )(-117.25710000000001,32.85567 )(-117.2556,32.85544 )
(-121.44142000000001,37.75363 )(-121.4414,37.75292 )
As you can see, the values inside the parentheses have switched around. Also note that I have shown two records here to indicate that not all fields have same number of parenthesized figures.
What I've tried
I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I run out of memory. I also cannot do this in batches due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL 9.4 with pgAdmin III as the client.
This is the type of problem that should not be solved with SQL, but you are lucky to be using Postgres.
I suggest the following steps for defining your algorithm.
The first part turns your strings into structured data; the second transforms the structured data back into a string in the format that you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with the string_to_array function.
Now you can turn this array into rows with the unnest function, which will return one row per bracketed value.
Finally, you need to split the values in each row into two fields.
From data to string
Group the results of the first query and wrap the values in the string_agg function, which combines the numbers from all rows into one string.
You will need to experiment with the brackets to achieve exactly what you want.
PS. I am not providing query here. Once you have some code that you tried, let me know.
Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
FROM my_table) sub
GROUP BY id; -- assuming id is PK or no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed
As I put in the question, I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I ran out of memory. I also could not do this in batches due to time constraints.
I ran out of memory because I was putting everything in a HashMap of <my_primary_key, the_newly_formatted_text>. As the text was sometimes very long and I had a huge number of records, it wasn't surprising that I got an OOM.
Solution that I used:
As suggested by many folks here, this problem was better solved with code. I wrote a small script that formatted the text to my liking and wrote the primary key and the newly formatted text to a file in TSV format. Then I imported the TSV into a new table and updated the original table from the new one.
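The import-and-update step could look roughly like this in PostgreSQL. All names here (my_table, my_table_fixed, id, col, col_reversed) and the file path are placeholders, the key is assumed to be an integer, and the sketch overwrites the original column, which may not be what was actually done:

```sql
-- Holding table for the script's output: primary key plus reformatted text.
CREATE TABLE my_table_fixed (
    id           integer PRIMARY KEY,
    col_reversed text
);

-- COPY's default text format is tab-delimited, which matches a plain TSV file.
-- The path is read by the server; from psql, \copy reads a client-side file instead.
COPY my_table_fixed (id, col_reversed) FROM '/path/to/reversed.tsv';

-- Update the original table from the imported one, matching on the key.
UPDATE my_table AS t
SET    col = f.col_reversed
FROM   my_table_fixed AS f
WHERE  f.id = t.id;
```

Writing to a file and bulk-loading it avoids holding every reformatted string in memory at once, which was the cause of the OOM described above.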

Force numerical order on a SQL Server 2005 varchar column, containing letters and numbers?

I have a column containing the strings 'Operator (1)' and so on until 'Operator (600)' so far.
I want to get them numerically ordered and I've come up with
select colname from table order by
cast(replace(replace(colname,'Operator (',''),')','') as int)
which is very very ugly.
Better suggestions?
It's that, InStr()/SubString(), changing Operator(1) to Operator(001), storing the n in Operator(n) separately, or creating a computed column that hides the ugly string manipulation. What you have seems fine.
If you really have to leave the data in the format you have - and adding a numeric sort order column is the better solution - then consider wrapping the text manipulation up in a user defined function.
select colname from table order by dbo.udfSortOperator(colname)
It's less ugly and gives you some abstraction. There's some additional overhead from the function call, but on a table containing low thousands of rows in a not-too-heavily hit database server it's not a major concern. Make a note in the function to optimise it later as required.
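A possible definition of such a function, as a sketch only: it wraps the same REPLACE/CAST expression from the question, and the parameter name and length are assumptions:

```sql
-- Scalar function that extracts n from 'Operator (n)' as an int.
CREATE FUNCTION dbo.udfSortOperator (@name varchar(100))
RETURNS int
AS
BEGIN
    RETURN CAST(REPLACE(REPLACE(@name, 'Operator (', ''), ')', '') AS int);
END
```

If the column can contain values that do not follow the 'Operator (n)' pattern, the CAST will fail, so the function is also a convenient single place to add handling for those cases.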
My answer would be to change the problem. I would add an operatorNumber field to the table if that is possible. Change the update/insert routines to extract the number and store it. That way the string conversion hit is taken only once per record.
With the current approach, the ordering logic requires the string conversion every time the query is run.
Well, first define the meaning of that column. Is operator a name so you can justify using chars? Or is it a number?
If the field is a name, then you will use chars, and you will want to settle on a fixed length. Pad all operator names with zeros on the left. Define naming rules for operators (e.g. no letters, or the codes you would use in a series, like "A001").
An index will sort the physical data on the server, and a properly defined text naming scheme will sort the values in a query. You would want both.
If the operator is a number, then you got the data type for that column wrong and it needs to be changed.
Indexed computed column
If you find yourself ordering on or otherwise querying the operator column often, consider creating a computed column for its numeric value and adding an index on it. This will give you a computed/persisted column (which sounds like an oxymoron, but isn't).
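A sketch of that idea for SQL Server 2005 and later. The table name dbo.Operators, the column name OperatorNumber and the index name are hypothetical; colname is the column from the question:

```sql
-- Computed column holding the numeric part of 'Operator (n)'.
-- PERSISTED stores the value on disk instead of recomputing it on every read.
ALTER TABLE dbo.Operators
ADD OperatorNumber AS
    CAST(REPLACE(REPLACE(colname, 'Operator (', ''), ')', '') AS int) PERSISTED;

-- Index the computed value so ORDER BY can use it.
CREATE INDEX IX_Operators_OperatorNumber
    ON dbo.Operators (OperatorNumber);

-- The query then orders on the plain int column.
SELECT colname
FROM dbo.Operators
ORDER BY OperatorNumber;
```

The string manipulation still exists, but it runs when a row is written rather than each time the query sorts.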