(Oracle/SQL) Merge all data types into a single column - sql

Let me explain why I want to do this... I have built a Tableau dashboard that allows a user to browse/search all of the tables and columns in our warehouse by schema, object type (table, view, materialized view), etc. I want to add a column that pulls a sample of the data from each column in each table. That part is also done, but with this problem:
The resulting column is composed of data of different types (VARCHAR2, LONG, etc.). I can get every type of data to conform to a single data type except for LONG - it will not let me convert it to anything compatible with the rest. I simply need all data types to coexist in a single column. I've tried many different things and have been reading up on the subject for about a week now, and it sounds like it just can't be done, but in my experience there is always a way... I figured I'd check with the gurus here before admitting defeat.
One of the things I've tried:
--Here, from two different tables, I'm pulling a single piece of data from a single column and attempting to merge into a single column called SAMPLE_DATA
--OTHER is LONG data type
--ORGN_NME is VARCHAR2 data type
select 'PLAN','OTHER', cast(substr(OTHER,1,2) as varchar2(4000)) as SAMPLE_DATA from sde.PLAN union all
select 'BUS_ORGN','ORGN_NME', cast(substr(ORGN_NME,1,2) as varchar2(4000)) as SAMPLE_DATA from sde.BUS_ORGN;
Resulting error:
Lookup Error
ORA-00932: inconsistent datatypes: expected CHAR got LONG
How can I achieve this?
Thanks in advance

LONG datatypes are basically unusable by most applications. I made something similar where I wanted to search the contents of packages. The solution is to convert the LONG into a CLOB using a pipelined function. Adrian Billington's source code can be found here:
https://github.com/oracle-developer/dla
You end up with a view that you can query. I did not see any performance hit even when looking at large packages so it should work for you.
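The linked project wraps this idea in a pipelined function and a view. As a rough standalone sketch of the same idea (not the linked code - the function name long_to_clob and its parameters are invented here for illustration), you can read the LONG piecewise with DBMS_SQL and append the chunks to a temporary CLOB:
create or replace function long_to_clob(
  p_table  in varchar2,   -- e.g. 'SDE.PLAN' (illustrative; validate/whitelist in real code)
  p_column in varchar2,   -- e.g. 'OTHER'
  p_rowid  in varchar2    -- rowid of the row to sample, passed as a string
) return clob
as
  l_cur    integer := dbms_sql.open_cursor;
  l_buf    varchar2(32767);
  l_len    integer;
  l_offset integer := 0;
  l_rows   integer;
  l_clob   clob;
begin
  -- build the single-row query dynamically; DEFINE_COLUMN_LONG lets us fetch the LONG in chunks
  dbms_sql.parse(l_cur,
    'select ' || p_column || ' from ' || p_table || ' where rowid = chartorowid(:rid)',
    dbms_sql.native);
  dbms_sql.bind_variable(l_cur, ':rid', p_rowid);
  dbms_sql.define_column_long(l_cur, 1);
  l_rows := dbms_sql.execute_and_fetch(l_cur);
  dbms_lob.createtemporary(l_clob, true);
  if l_rows > 0 then
    loop
      -- read up to 32K of the LONG per call and append it to the CLOB
      dbms_sql.column_value_long(l_cur, 1, 32767, l_offset, l_buf, l_len);
      exit when l_len = 0;
      dbms_lob.writeappend(l_clob, l_len, l_buf);
      l_offset := l_offset + l_len;
    end loop;
  end if;
  dbms_sql.close_cursor(l_cur);
  return l_clob;
end;
/
A query could then use something like dbms_lob.substr(long_to_clob('SDE.PLAN', 'OTHER', rowidtochar(rowid)), 2, 1) in place of the failing substr(OTHER, 1, 2), although the view generated by the linked project is the cleaner way to do this across many tables.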

Related

"Numeric value '' is not recognized" - what column?

I am trying to insert data from a staging table into the master table. The table has nearly 300 columns and is a mix of data types: varchars, integers, decimals, dates, etc.
Snowflake gives the unhelpful error message of "Numeric value '' is not recognized"
I have gone through and cut out various parts of the query to try and isolate where it is coming from. After several hours and cutting every column, it is still happening.
Does anyone know of a Snowflake diagnostic query (like Redshift has) which can tell me a specific column where the issue is occurring?
Unfortunately not at the point you're at. If you went back to the COPY INTO that loaded the data, you'd be able to use the VALIDATE() function to get better information down to the record and byte-offset level.
I would query your staging table for just the numeric fields and look for blanks, or you can wrap every field destined for a numeric column in try_to_number(). A bit tedious, but it might not be too bad if you don't have a lot of numeric columns.
https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html
As a note, when you stage, you should try to use the NULL_IF options to get rid of bad characters, and/or try to load into the stage table using its actual datatypes, so you can leverage the VALIDATE() function to make sure the data types are correct before loading into Snowflake.
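For the blank check, a quick filter on one suspect column might look like this (the table and column names here are placeholders):
-- rows whose raw value is present but will not convert cleanly to a number
select *
from   my_staging_table
where  amount_raw is not null
  and  try_to_number(amount_raw) is null;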
Query your staging table using try_to_number() and/or try_to_decimal() for the number and decimal fields, and then use MINUS to get the difference:
Select $1,$2,...$300 from #stage
minus
Select $1,try_to_number($2),...$300 from #stage
If any number field has a string that cannot be converted, it will come back as NULL, and the MINUS should then return the rows which have a problem. Once you have those rows, analyze the columns in the result set to find the errors.

Import PostgreSQL dump into SQL Server - data type errors

I have some data which was dumped from a PostgreSQL database (allegedly, using pg_dump) which needs to get imported into SQL Server.
While the data types are ok, I am running into an issue where there seems to be a placeholder for a NULL. I see a backslash followed by an uppercase N in many fields. Below is a snippet of the data, as viewed from within Excel. The left column has a Boolean data type, and the right one has an integer data type.
Some of these are supposed to be of the Boolean datatype, and having two characters in there is most certainly not going to fly.
Here's what I tried so far:
Import via dirty read - keeping whatever datatypes SSIS decided each field had - to no avail. There were error messages about truncation on all of the boolean fields.
Creating a table for the data based on the correct data types, though this was more fun... I needed to do the same as in the dirty read, as the source would otherwise not load properly. There was also a need to transform the data into the correct data type for insertion into the destination; yet I am still getting truncation issues, when there most certainly shouldn't be any.
Here is a sample expression in my derived column transformation editor:
(DT_BOOL)REPLACE(observation,"\\N","")
The data type should be Boolean.
Any suggestion would be really helpful!
Thanks!
Since I was unable to circumvent the SSIS rules in order to get my data into my tables without an error, I took the quick-and-dirty approach.
The solution which worked for me was to have the source read each column as if it were a string, and to give every field in the destination table the datatype VARCHAR. That destination table is used as a staging table; once the data is in SQL Server, I can manipulate it as needed.
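For the later cleanup step, something along these lines can convert the staged text into typed columns (a sketch only; the table and column names are made up, and it assumes pg_dump wrote booleans as 't'/'f' and NULLs as \N):
-- convert staged VARCHAR data into typed values, treating '\N' as NULL
SELECT
    CASE observation
        WHEN '\N' THEN NULL
        WHEN 't'  THEN 1
        ELSE 0
    END AS observation_bool,
    TRY_CAST(NULLIF(some_count, '\N') AS INT) AS some_count_int
FROM dbo.staging_table;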
Thank you #cha for your input.

Split multiple points in text format and switch coordinates in postgres column

I have a PostgreSQL column of type text that contains data like shown below
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text like shown below
(-117.25624, 32.85563)(-117.25648000000001,32.855470000000004 )(-117.25710000000001,32.85567 )(-117.2556,32.85544 )
(-121.44142000000001,37.75363 )(-121.4414,37.75292 )
As you can see, the values inside the parentheses have been switched around. Also note that I have shown two records here to indicate that not all fields have the same number of parenthesized figures.
What I've tried
I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I will run out of memory. I also cannot do this in batches due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL9.4 with PGAdmin III as the client
This is the type of problem that would not usually be solved in SQL, but you are lucky to be using Postgres.
I suggest the following steps in defining your algorithm.
The first part will be turning your strings into structured data; the second will transform the structured data back into a string in the format that you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with string_to_array function.
Now you can turn this array into rows with unnest function, which will return a row per bracketed value.
Finally you need to split the values in each row into two fields.
From data to string
You need to group the results of the first query, wrapping them in the string_agg function, which will combine the numbers from the rows back into a string.
You will need to experiment with brackets to achieve exactly what you want.
PS. I am not providing query here. Once you have some code that you tried, let me know.
Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
  SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
  FROM my_table) sub
GROUP BY id; -- assuming id is the PK and there are no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed
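As a quick sanity check with one of the sample strings from the question (standalone, no table needed):
SELECT string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
  SELECT unnest(string_to_array(
           replace('(32.85563, -117.25624)(32.85544, -117.2556)', ')(', ');('),
           ';'))::point AS pt
) sub;
-- yields the points with coordinates swapped, e.g. (-117.25624,32.85563)(-117.2556,32.85544)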
As I put in the question, I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I ran out of memory, and I could not do it in batches due to time constraints.
I ran out of memory because I was putting everything in a HashMap of
< my_primary_key, the_newly_formatted_text >. As the text was sometimes very long, and given the sheer number of records I had, it wasn't surprising that I got an OOM.
Solution that I used:
As suggested by many folks here, this problem was better solved with code. I wrote a small script that formatted the text to my liking and wrote the primary key and the newly formatted text to a file in TSV format. Then I imported the TSV into a new table and updated the original table from the new one.
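The import-and-update step can be done along these lines (a sketch, assuming psql's \copy is available and using made-up table/column names; PgAdmin's import tool works just as well for the load):
CREATE TABLE formatted_points (id integer PRIMARY KEY, col_reversed text);
-- load the TSV produced by the script (psql meta-command; default text format is tab-delimited)
\copy formatted_points FROM '/path/to/formatted_points.tsv'
-- then update the original table from the freshly loaded one
UPDATE my_table t
SET    col = f.col_reversed
FROM   formatted_points f
WHERE  t.id = f.id;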

How to change mileage representation forms in sql

I would like to change the manner in which the mileage is represented in the database. For example, right now the mileage is represented as 080+0.348; this would mean that this particular feature is at mileage point 80.348 along the roadway corridor. I would like to have the data represented in the database in the latter form, 80.348 and so on. This would save me from having to export the dataset to excel for the conversion. Is this even possible? The name of the column is NRLG_MILEPOINT.
Much appreciated.
One thing you could try is to pick the string value apart into its component pieces and then recombine them as a number. If your data is in a table called TEST you might do something like the following:
select miles, fraction,
nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
from test);
SQLFiddle here.
Share and enjoy.
Using the answer provided above, I was able to expand it to get exactly the answer I needed. Thanks a ton to everyone who helped! Here is the query I ended up with:
select distinct nrlg_dept_route, corridor_code_rb, nrlg_county, next_county,
       nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
             nrlg_milepoint as nrlg_mile_point,
             nrlg_dept_route as nrlg_dept_route,
             nrlg_county as nrlg_county,
             next_county as next_county,
             corridor_code_rb as corridor_code_rb,
             corridor_code as corridor_code,
             regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
      from corridor_county_intersect, south_van_data_view)
where nrlg_dept_route = corridor_code
order by 1, 5;
There are a variety of ways to do this. Which one depends on your situation, how the data needs to be stored, and how it is being interacted with. Some of these options include:
Changing the datatype.
This option would potentially require you to change how the data is being stored currently. The conversion of the data would have to be done by whatever is writing the data to the schema currently.
Creating another column that stores the data in the correct format.
If you have an existing process that would be broken by changing the datatype of NRLG_MILEPOINT, and/or you have a requirement to store the data in that format, you could optionally add another column... say NRLG_MILEAGE_DISPLAY, of datatype NUMBER, and store the converted value there. You could create a trigger that inserts/updates NRLG_MILEAGE_DISPLAY appropriately, based on the data in NRLG_MILEPOINT (see the sketch after this list).
If you are just wanting the data to be displayed differently in your select statement, you can convert the datatype in your SQL statement. Specifically how you would do this depends on the current datatype of NRLG_MILEPOINT.
Assuming that varchar2 is the type, based on the comments, here is an SQLFIDDLE link displaying a crude example of option 3. Your usage of this may vary depending on the actual datatype of NRLG_MILEPOINT. Regardless of its datatype... I am sure there is a means of converting how it is displayed in your query. You could take this further and create a view if you needed to. As an inline view or as a stored view, you can then use the converted value for doing your join later.
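Here is a rough sketch of option 2 above (the table name SOUTH_VAN_DATA and the trigger name are made up; the parsing reuses the regexp approach from the accepted answer):
alter table south_van_data add (nrlg_mileage_display number);

create or replace trigger trg_nrlg_mileage_display
before insert or update of nrlg_milepoint on south_van_data
for each row
begin
  -- split '080+0.348' into its whole-mile and fractional parts and add them
  :new.nrlg_mileage_display :=
        nvl(to_number(regexp_substr(:new.nrlg_milepoint, '[0-9]*')), 0)
      + nvl(to_number(regexp_substr(:new.nrlg_milepoint, '[+-][0-9.]*')), 0);
end;
/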

Why am I getting a "[SQL0802] Data conversion of data mapping error" exception?

I am not very familiar with iseries/DB2. However, I work on a website that uses it as its primary database.
A new column was recently added to an existing table. When I view it via AS400, I see the following data type:
Type: S
Length: 9
Dec: 2
This tells me it's a numeric field with 7 digits before the decimal point, and 2 digits after the decimal point.
When I query the data with a simple SELECT (SELECT MYCOL FROM MYTABLE), I get back all the records without a problem. However, when I try using a DISTINCT, GROUP BY, or ORDER BY on that same column I get the following exception:
[SQL0802] Data conversion of data mapping error
I've deduced that at least one record has invalid data - what my DBA calls "blanks" or "4 O". How is this possible though? Shouldn't the database throw an exception when invalid data is attempted to be added to that column?
Is there any way I can get around this, such as filtering out those bad records in my query?
"4 O" means 0x40 which is the EBCDIC code for a space or blank character and is the default value placed into any new space in a record.
Legacy programs / operations can introduce the decimal data error. For example if the new file was created and filled using the CPYF command with the FMTOPT(*NOCHK) option.
The easiest way to fix it is to write an HLL program (RPG) to read the file and correct the records.
The only solution I could find was to write a script that checks for blank values in the column and then updates them to zero when they are found.
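Something along these lines can do that check (a sketch with placeholder table/column names; it assumes a 9-digit zoned-decimal column, so an all-blank value shows up as nine x'40' bytes, and you should test against a copy first):
-- count the damaged records; HEX looks at the raw bytes, so it avoids the mapping error
SELECT COUNT(*) FROM MYTABLE WHERE HEX(MYCOL) = '404040404040404040';
-- then zero them out
UPDATE MYTABLE SET MYCOL = 0 WHERE HEX(MYCOL) = '404040404040404040';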
If the file has record format level checking turned off [i.e. LVLCHK(*NO)], or is overridden to that, then an HLL program (e.g. RPG, COBOL, etc.) that was not recompiled with the new record format might write out records with invalid data in this column, especially if the new column is not at the end of the record.
Make sure that all programs that use native I/O to write or update records on this file are recompiled.
I was able to solve this error by force-casting the key columns to integer. I changed the join from this...
FROM DAILYV INNER JOIN BXV ON DAILYV.DAITEM=BXV.BXPACK
...to this...
FROM DAILYV INNER JOIN BXV ON CAST(DAILYV.DAITEM AS INT)=CAST(BXV.BXPACK AS INT)
...and I didn't have to make any corrections to the tables. This is a very old, very messy database with lots of junk in it. I've made many corrections, but it's a work in progress.