I have a table with a column of type "binary". When I select this column I see the data automatically converted and printed as a string, for example " ¢ZêZ". I want to write the select statement in a way that it is printed as actual zeros and ones e.g. "01001010". Note, I am running this query through a python script and dumping the results to a csv file.
If anybody has an idea how to do this your help would be really appreciate it.
I have found a way to do it at least on my vertica DB.
TO_BITSTRING(expression)
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/TO_BITSTRING.htm
Related
I am trying to insert data from a staging table into the master table. The table has nearly 300 columns, and is a mix of data-typed Varchars, Integers, Decimals, Dates, etc.
Snowflake gives the unhelpful error message of "Numeric value '' is not recognized"
I have gone through and cut out various parts of the query to try and isolate where it is coming from. After several hours and cutting every column, it is still happening.
Does anyone know of a Snowflake diagnostic query (like Redshift has) which can tell me a specific column where the issue is occurring?
Unfortunately not at the point you're at. If you went back to the COPY INTO that loaded the data, you'd be able to use VALIDATE() function to get better information to the record and byte-offset level.
I would query your staging table for just the numeric fields and look for blanks, or you can wrap all of your fields destined for numeric fields with try_to_number() functions. A bit tedious, but might not be too bad if you don't have a lot of numbers.
https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html
As a note, when you stage, you should try and use the NULL_IF options to get rid of bad characters and/or try to load them into stage using the actual datatypes in your stage table, so you can leverage the VALIDATE() function to make sure the data types are correct before loading into Snowflake.
Query your staging using try_to_number() and/or try_to_decimal() for number and decimal fields of the table and the use the minus to get the difference
Select $1,$2,...$300 from #stage
minus
Select $1,try_to_number($2)...$300 from#stage
If any number field has a string that cannot be converted then it will be null and then minus should return those rows which have a problem..Once you get the rows then try to analyze the columns in the result set for errors.
I have a two tables in a database in AWS Athena that I want to join.
I want to join them by several columns, one of them being date.
However in one data set the date string is encoded for single value months is encoded as
"08/31/2018"
While the other would have it encoded as
"8/31/2018"
Is there a way to make them the same format?
I am unsure if it is easier to add the extra 0 to strings which have lack the extra 0 or to concatenate strings which have the extra 0.
Based on what I have researched I think I will have to use the CASE and CONCAT functions.
Both of the tables were loaded into the database from a CSV file, and the variables are in the string format.
I have tried changing the values manually in the CSV file, tried running an R script on one of the tables to format the date in the same way, and have also tried re-loading the tables into the database as the same date format.
However no matter what I do whenever it is loaded into the database, even when they have the same date type, it always loads them with different formats.
One with the the extra 0 and the other without it.
The last avenue I haven't tried is through a SQL query.
However I am not well versed in Athena and am having a hard time formatting this query.
I know this is rather vague, so please ask me for more information if you need.
If someone could help me start this query I would be grateful.
Thank you for the help.
Here is the query for changing dates in Athena.
date_parse(table.date_variable,'%m/%d/%Y')
Though Athena tables are immutable once created.
You can convert the value to date using date_parse(). So, this should work:
date_parse(t1.datecol, '%m/%d/%Y') = str_to_date(t2.datecol, '%m/%d/%Y')
Having said that, you should fix the data model. Store dates as dates not as strings! Then you can use an equality join and that is just better all around.
I have a PostgreSQL column of type text that contains data like shown below
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text like shown below
(-117.25624, 32.85563)(-117.25648000000001,32.855470000000004 )(-117.25710000000001,32.85567 )(-117.2556,32.85544 )
(-121.44142000000001,37.75363 )(-121.4414,37.75292 )
As you can see, the values inside the parentheses have switched around. Also note that I have shown two records here to indicate that not all fields have same number of parenthesized figures.
What I've tried
I tried extracting the column to Java and performing my operations there. But due to sheer amount of records I have, I will run out of memory. I also cannot do this method in batched due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL9.4 with PGAdmin III as the client
this is a type of problem that should not be solved by sql, but you are lucky to use Postgres.
I suggest the following steps in defining your algorithm.
First part will be turning your strings into a structured data, second will transform structured data back to string in a format that you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with string_to_array function.
Now you can turn this array into rows with unnest function, which will return a row per bracketed value.
Finally you need to slit values in each row into two fields.
From data to string
You need to group results of the first query with results wrapped in string_agg function that will combine all numbers in rows into string.
You will need to experiment with brackets to achieve exactly what you want.
PS. I am not providing query here. Once you have some code that you tried, let me know.
Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
FROM my_table) sub
GROUP BY id; -- assuming id is PK or no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed
As I put in the question, I tried extracting the column to Java and performing my operations there. But due to sheer amount of records I have, I will run out of memory. I also cannot do this method in batched due to time constraints.
I ran out of memory here as I was putting everything in a Hashmap of
< my_primary_key,the_newly_formatted_text >. As the text was very long sometimes and due to the sheer number of records that I had, it wasnt surprising that I got an OOM.
Solution that I used:
As suggested my many folks here, this solution was better solved with a code. I wrote a small script that formatted the text as per my liking and wrote the primary key and the newly formatted text to a file in tsv format. Then I imported the tsv in a new table and updated the original table from the new one.
I am currently outputting the results of an sql query to a CSV file with a header using the copy command in Postgresql. What I want to do is add a line break between the header and the start of the CSV data so that it is more readable.
Is there an easy way to do this by just altering my query? Thanks for your help!
No, there isn't. Do it in a script that post-processes the output. PostgreSQL's COPY isn't supposed to produce pretty-printed readable output, it's there to produce reliable, consistent output.
Actually, it is possible, although it is a bit of a hack:
Rename the final column of the output using an AS clause in the query to add a newline character to the end of the column. You have to be sure to double-quote the column name in order to preserve the non-standard character, and you can't use that column in the FORCE QUOTE clause of COPY.
I was able to get it to work as you requested with this query:
copy (select typname, typlen, typinput, typoutput as "typoutput^L" from pg_type limit 10) to stdout with csv header;
The first three lines of output on my system look like this:
typname,typlen,typinput,typoutput
bool,1,boolin,boolout
Please note that the "^L" in this output is my actually using the key combination in a Linux console in the PostgreSQL 9.2 psql client. Trying to use the string "^L" doesn't work. Also, this doesn't work either, although it gets you closer (note the line break):
copy (select typname, typlen, typinput, typoutput as "typoutput
" from pg_type limit 10) to stdout with csv header;
This ends up with the first two lines looking like this:
typname,typlen,typinput,"typoutput
"
Which isn't quite correct.
You'll just need to tinker in your specific environment on how to get that newline character added but hopefully that's enough info here to start.
I am not very familiar with iseries/DB2. However, I work on a website that uses it as its primary database.
A new column was recently added to an existing table. When I view it via AS400, I see the following data type:
Type: S
Length: 9
Dec: 2
This tells me it's a numeric field with 6 digits before the decimal point, and 2 digits after the decimal point.
When I query the data with a simple SELECT (SELECT MYCOL FROM MYTABLE), I get back all the records without a problem. However, when I try using a DISTINCT, GROUP BY, or ORDER BY on that same column I get the following exception:
[SQL0802] Data conversion of data mapping error
I've deduced that at least one record has invalid data - what my DBA calls "blanks" or "4 O". How is this possible though? Shouldn't the database throw an exception when invalid data is attempted to be added to that column?
Is there any way I can get around this, such as filtering out those bad records in my query?
"4 O" means 0x40 which is the EBCDIC code for a space or blank character and is the default value placed into any new space in a record.
Legacy programs / operations can introduce the decimal data error. For example if the new file was created and filled using the CPYF command with the FMTOPT(*NOCHK) option.
The easiest way to fix it is to write an HLL program (RPG) to read the file and correct the records.
The only solution I could find was to write a script that checks for blank values in the column and then updates them to zero when they are found.
If the file has record format level checking turned off [ie. LVLCHK(*NO)] or is overridden to that, then an HLL program. (ex. RPG, COBOL, etc) that was not recompiled with the new record might write out records with invalid data in this column, especially if the new column is not at the end of the record.
Make sure that all programs that use native I/O to write or update records on this file are recompiled.
I was able to solve this error by force-casting the key columns to integer. I changed the join from this...
FROM DAILYV INNER JOIN BXV ON DAILYV.DAITEM=BXV.BXPACK
...to this...
FROM DAILYV INNER JOIN BXV ON CAST(DAILYV.DAITEM AS INT)=CAST(BXV.BXPACK AS INT)
...and I didn't have to make any corrections to the tables. This is a very old, very messy database with lots of junk in it. I've made many corrections, but it's a work in progress.