So when I was trying to hash 25 columns using the ORA_HASH function, I was getting an error: too many parameters.
Is there any way we can hash all 25 columns, and quickly, because we have around 60M rows and no update-date column :(
select ORA_HASH(id, name, c..., ...) from table_name
Use concatenation with some special string as a delimiter, e.g. chr(10) here, assuming this character doesn't appear in your data:
col1||chr(10)||col2||...
Be careful with numeric and date columns.
Either convert them explicitly to character values, e.g.
...||to_char(col_date,'yyyy-mm-dd hh24:mi:ss')||...
or temporarily override the session settings to have constant values:
ALTER SESSION SET NLS_NUMERIC_CHARACTERS = ',.';
ALTER SESSION SET NLS_DATE_FORMAT = 'DD.MM.YYYY HH24:MI:SS';
The problem with NLS settings is that when they change and you perform a default conversion to a character string, you get a different hash code.
Note also that ORA_HASH can lead to duplicates; consider e.g. an MD5 hash code to recognise changes in the table data.
A final note: Oracle has a (not well known) function DBMS_SQLHASH.GETHASH, which may or may not be what you are looking for.
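Putting the earlier pieces together, a minimal sketch of the concatenation approach (the table name my_table and the columns id, name, amount, created_at are illustrative, not from the original question):
-- Hash one row by concatenating explicitly converted columns,
-- delimited by chr(10) so adjacent values cannot run together
SELECT id,
       ORA_HASH(TO_CHAR(id)
                || CHR(10) || name
                || CHR(10) || TO_CHAR(amount)
                || CHR(10) || TO_CHAR(created_at, 'yyyy-mm-dd hh24:mi:ss')) AS row_hash
FROM my_table;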
Surely your ultimate goal is not to get a hash? What is the hash for? It may very well not be the right way to achieve your goal.
Second, ORA_HASH is a weak, 32-bit hash that will produce a hash collision about every 25,000 rows! I wrote a whole blog post about this, see:
https://stewashton.wordpress.com/2014/02/15/compare-and-sync-tables-dbms_comparison/
Third, starting with version 12c there is a STANDARD_HASH function that seems to perform quite well and that goes up to 512 bits! (not bytes as I said before editing this answer...)
Finally, the right way to hash several things together is "hash chaining", not concatenating the values. ORA_HASH appears to support hash chaining (or something of similar effect) using the third parameter:
ora_hash(column1, 4294967295, ora_hash(column2))
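For more than two columns the chaining simply nests; a sketch with three illustrative columns:
ora_hash(column1, 4294967295, ora_hash(column2, 4294967295, ora_hash(column3)))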
With STANDARD_HASH, I would first use it on each column individually, then use UTL_RAW.CONCAT to concatenate the results, then either use STANDARD_HASH on the concatenated result or just use the concatenated value as if it were a big hash.
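As a hedged sketch of that approach, assuming an illustrative table my_table with columns id, name and created_at (SHA256 and the NVL placeholder for NULLs are choices, not requirements):
-- 12c+: hash each column separately, then hash the concatenated hashes
SELECT STANDARD_HASH(
         UTL_RAW.CONCAT(
           STANDARD_HASH(TO_CHAR(id), 'SHA256'),
           STANDARD_HASH(NVL(name, CHR(0)), 'SHA256'),
           STANDARD_HASH(TO_CHAR(created_at, 'yyyy-mm-dd hh24:mi:ss'), 'SHA256')
         ),
         'SHA256') AS row_hash
FROM my_table;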
Related
I currently have a column in a DB2 table which is being passed through web calls and procedure by a character-encrypted value. It is type CHARACTER(13) with a CSSID for encryption.
This has become a huge pain to accommodate through multiple APIs but was initially intended to allow us a unique ID to use in calls that wasn't the primary key.
In DB2-400, what would be the next best thing as far as a 13 or more character string that is unique and randomly created upon insert, but doesn't require decryption (just a plain string)?
Is there a commonly-gravitated-to method for this? We aren't passing secure data, so there's no need for encryption; we just want a randomly created and unique character string.
Try hex(generate_unique()). It's a unique CHAR(26) string.
Or to_char(timestamp(generate_unique()), 'YYYYMMDDHH24MISSFF6'). You may play with the format of the to_char function as well. It may be useful to use, say, a reversed format like FF6SSMIHH24DDMMYYYY to avoid unique-index page contention under heavy insert activity.
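Both expressions in one statement, as a sketch (the syntax is DB2 for LUW; DB2-400 may differ slightly):
-- GENERATE_UNIQUE() yields CHAR(13) FOR BIT DATA; HEX() makes it readable
SELECT HEX(GENERATE_UNIQUE()) AS uid_hex,
       TO_CHAR(TIMESTAMP(GENERATE_UNIQUE()), 'YYYYMMDDHH24MISSFF6') AS uid_ts
FROM sysibm.sysdummy1;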
This is a comment that doesn't fit in the comments section.
I don't have access to a DB2-400 (anymore), but I tested the code below in DB2 10.5 for Linux.
create sequence seq1;
select concat('A', varchar_format(next value for seq1, '000000000000')) as my_id
from sysibm.sysdummy1;
Result, if you run it 4 times in a row:
A0000000000001
A0000000000002
A0000000000003
A0000000000004
Maybe there's something equivalent in DB2-400.
Sounds like you might be using GENERATE_UNIQUE()
GENERATE_UNIQUE function returns a bit data character string 13 bytes
long (CHAR(13) FOR BIT DATA)
Doesn't really have anything to do with encryption...
And it's pretty much the ideal solution, in my opinion, for generating a unique value other than a simple numeric identity. So what is the problem you are having?
I have a PostgreSQL column of type text that contains data like shown below
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text like shown below
(-117.25624, 32.85563)(-117.25648000000001, 32.855470000000004)(-117.25710000000001, 32.85567)(-117.2556, 32.85544)
(-121.44142000000001, 37.75363)(-121.4414, 37.75292)
As you can see, the values inside the parentheses have switched around. Also note that I have shown two records here to indicate that not all fields have the same number of parenthesized pairs.
What I've tried
I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I will run out of memory. I also cannot do this method in batches due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL 9.4 with pgAdmin III as the client.
This is the type of problem that should not usually be solved in SQL, but you are lucky to be using Postgres.
I suggest the following steps in defining your algorithm.
The first part will be turning your strings into structured data; the second will transform the structured data back into a string in the format that you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with the string_to_array function.
Now you can turn this array into rows with the unnest function, which will return a row per bracketed value.
Finally, you need to split the values in each row into two fields.
From data to string
You need to group the results of the first query, with the results wrapped in the string_agg function, which will combine all the numbers in the rows back into a string.
You will need to experiment with brackets to achieve exactly what you want.
PS. I am not providing query here. Once you have some code that you tried, let me know.
Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
  SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
  FROM my_table) sub
GROUP BY id; -- assuming id is PK or no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed
As I put in the question, I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I would run out of memory. I also could not do this in batches due to time constraints.
I ran out of memory here because I was putting everything in a HashMap of <my_primary_key, the_newly_formatted_text>. As the text was sometimes very long, and due to the sheer number of records that I had, it wasn't surprising that I got an OOM.
Solution that I used:
As suggested by many folks here, this problem was better solved with code. I wrote a small script that formatted the text to my liking and wrote the primary key and the newly formatted text to a file in TSV format. Then I imported the TSV into a new table and updated the original table from the new one.
I'm importing data from one system to another. The former keys off an alphanumeric field whereas the latter requires a numeric integer field. I'd like to find or write a function that I can feed the alphanumeric value to and have it return a number that would be unique to the value passed in.
My first thought was to do a hash, but of course the result of any built-in hash is going to contain letters, and it's also technically possible (however unlikely) that a hash may not be unique.
My first question is whether there is anything built into SQL that I'm overlooking; short of that, I'd like to hear suggestions on the easiest way to implement such a function.
Here is a function which will probably convert from base 10 (integer) to base 36 (alphanumeric) and back again:
https://www.simple-talk.com/sql/t-sql-programming/numeral-systems-and-numbers-conversion-in-sql/
You might find the resultant number is too big to be held in an integer though.
You could concatenate the ascii values of each character of your string and cast the result as a bigint.
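A minimal T-SQL sketch of that idea (variable names are illustrative; note that the concatenated codes overflow BIGINT after roughly six characters, so this only works for short strings):
DECLARE @s VARCHAR(10) = 'AB12';
DECLARE @i INT = 1, @digits VARCHAR(40) = '';
WHILE @i <= LEN(@s)
BEGIN
    -- append the 2-3 digit ASCII code of each character
    SET @digits = @digits + CAST(ASCII(SUBSTRING(@s, @i, 1)) AS VARCHAR(3));
    SET @i = @i + 1;
END;
SELECT CAST(@digits AS BIGINT) AS numeric_id;  -- 65664950 for 'AB12'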
If the original data is known to be integers you can use cast:
SELECT CAST(varcharcol AS INT) FROM Table
This is with PostgreSQL.
A column in a table contains string values with punctuation. The values are "aac", ".aaa", "aa_b", etc. When this column is specified in the order by clause, the order of the results is almost random. The strings starting with a period should appear at the top, which doesn't happen; they appear somewhere in the middle.
Surprisingly, this behavior is seen on only one database. The same query works fine on a database on another host.
What could be the possible reason for this?
The "order by" (string comparison) behaviour depends on the cluster's locale.
First, check the EXPLAIN and see how it's doing the sort.
If it's calling a user-defined comparison function, look at that function.
If it's walking an index, see if that index is using an incorrect sorting function (one that's not transitive or some such).
If EXPLAIN doesn't show anything odd, check the cluster's locale - perhaps it's doing the comparison using a locale that ignores certain characters.
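For example, to see the collation in play and to force byte-wise comparison regardless of locale (the table and column names are illustrative):
-- Check which collation the database was created with
SHOW lc_collate;
-- Byte-wise (C locale) ordering sorts '.'-prefixed strings first
SELECT val FROM my_table ORDER BY val COLLATE "C";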
I have a column containing the strings 'Operator (1)' and so on, up to 'Operator (600)' so far.
I want to get them numerically ordered and I've come up with
select colname from table order by
cast(replace(replace(colname,'Operator (',''),')','') as int)
which is very very ugly.
Better suggestions?
It's a choice between that, InStr()/Substring(), changing Operator(1) to Operator(001), storing the n in Operator(n) separately, or creating a computed column that hides the ugly string manipulation. What you have seems fine.
If you really have to leave the data in the format you have (adding a numeric sort-order column is the better solution), then consider wrapping the text manipulation up in a user-defined function, as sketched below:
select colname from table order by dbo.udfSortOperator(colname)
It's less ugly and gives you some abstraction. There's the additional overhead of the function call, but on a table containing low thousands of rows in a not-too-heavily-hit database server it's not a major concern. Make notes in the function so you can optimise later as required.
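A sketch of such a wrapper in SQL Server syntax (udfSortOperator matches the hypothetical name used above):
CREATE FUNCTION dbo.udfSortOperator (@colname VARCHAR(50))
RETURNS INT
AS
BEGIN
    -- strip 'Operator (' and ')' and return the remaining digits as a number
    RETURN CAST(REPLACE(REPLACE(@colname, 'Operator (', ''), ')', '') AS INT);
END;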
My answer would be to change the problem. I would add an operatorNumber field to the table if that is possible. Change the update/insert routines to extract the number and store it. That way the string conversion hit is only once per record.
The ordering logic would require the string conversion every time the query is run.
Well, first define the meaning of that column. Is the operator a name, so that you can justify using chars? Or is it a number?
If the field is a name then you will use chars, and you would want to determine a fixed length. Pad all operator names with zeros on the left, and define naming rules for operators (i.e. no letters, or codes used in a series like "A001").
An index will sort the physical data in the server, and properly defined text naming will sort it in a query. You would want both.
If the operator is a number, then you got the data type for that column wrong and it needs to be changed.
Indexed computed column
If you find yourself ordering on or otherwise querying the operator column often, consider creating a computed column for its numeric value and adding an index for it. This will give you a computed, persisted column (which sounds like an oxymoron, but isn't).
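A hedged sketch in SQL Server syntax (the table name dbo.Operators and the new column name are illustrative):
-- Persisted computed column holding the numeric part, plus an index
ALTER TABLE dbo.Operators
  ADD OperatorNumber AS
      CAST(REPLACE(REPLACE(colname, 'Operator (', ''), ')', '') AS INT)
      PERSISTED;
CREATE INDEX IX_Operators_OperatorNumber ON dbo.Operators (OperatorNumber);
-- ORDER BY can now use the index instead of re-parsing the string
SELECT colname FROM dbo.Operators ORDER BY OperatorNumber;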