Pixel issue of characters in concatenation. Cols should be fixed as per the width for all the cols concatenated - sql

These values are stored in a single column. They are actually 3 columns concatenated in Oracle SQL. How can I make these values look aligned as 3 separate columns?
Currently, the values are 3 characters wide for the 1st column, 10 for the 2nd and 7 for the 3rd. For some reason I had to bring them into 1 column to use in my rtf template. I need a solution in Oracle SQL to set the width of each concatenated column. I am using cast here, but this is the output I am getting:
CAST(TO_CHAR(G_WEIGHTAGE)||'%' as char(3))||' '
||CAST(TO_CHAR(G_ACHIEVE , '990.00')||'%' as char(13)) ||' '
||CAST( TO_CHAR(ROUND((G_ACHIEVE_FACTOR*(G_WEIGHTAGE/100)),2)*(TARGET_BONUS/100), '990.00')||'%' as char(8)) AS G_WAIT
This is for 1st line in this output. I have written for other lines as well same way.
This is how it is coming
40% 96.79% 9.60%
10% 99.89% 2.70%
20% NA/ 51.42% 0.00%
10% 62.90% 0.00%
10% 112.77% 4.80%
Output I need :
40% 96.79% 9.60%
10% 99.89% 2.70%
20% NA/51.42% 0.00%
10% 62.90% 0.00%
10% 112.77% 4.80%
I believe this is due to different characters having different width. Like N has different width in comparison to 1.

Yeah, you want a PAD function. They add spaces to make a string a specific width. Try this:
lpad(TO_CHAR(G_WEIGHTAGE)||'%',3) ||' '
||lpad(TO_CHAR(G_ACHIEVE , '990.00')||'%',13) ||' '
||lpad(TO_CHAR(ROUND((G_ACHIEVE_FACTOR*(G_WEIGHTAGE/100)),2)*(TARGET_BONUS/100), '990.00')||'%',7) AS G_WAIT
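To see what the padding does to each row, here is a minimal Python sketch of the same idea: every field is right-aligned into a fixed-width slot (3, 13 and 7 characters, as in the lpad calls above). The function name format_row and the sample rows are illustrative, taken from the question's output, and are not part of the original query.

```python
def format_row(weightage: str, achieved: str, bonus: str) -> str:
    """Right-align each field in a fixed-width slot, similar to LPAD with spaces."""
    # Widths 3, 13 and 7 mirror the lpad() calls in the answer above.
    return f"{weightage:>3} {achieved:>13} {bonus:>7}"


# Sample values copied from the question's output.
rows = [
    ("40%", "96.79%", "9.60%"),
    ("20%", "NA/51.42%", "0.00%"),
    ("10%", "112.77%", "4.80%"),
]

lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

Every line comes out with the same number of characters, so the columns line up in a monospaced font. In a proportional font the columns can still drift, which may be what the question is observing in the rtf output. Unlike LPAD, Python's format spec does not truncate values longer than the slot.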

Related

string representation of DataFrame with max width and ellipsized columns without terminal

I have a pandas.DataFrame which I would like to represent as a string (not in Jupyter, not in IPython) with limited width (for later terminal output), without line wrapping (one value per output line) and with ellipses for excess columns in the middle. This is similar to what Pandas does when printing to a terminal. Is there a function for that? DataFrame.to_string only lets me wrap excess lines (with line_width), but I don't see a way to insert the ellipsis automatically.
If I understand you correctly, you could just do:
print(str(df))
But if you would like to specify n rows and n columns, pd.DataFrame.to_string has arguments for that:
print(df.to_string(max_rows=10, max_cols=10))
This would only display 10 columns (5 columns and ellipsis then another 5 columns), and 10 rows (5 rows and ellipsis then another 5 rows).

Efficient way to store FilePath

Currently I have a table with the following format/Desc:
ColumnName ColID PK IndexPos Null DataType
ID 1 1 N VARCHAR2 (1 Byte)
FILEPATH 2 N VARCHAR2 (127 Byte)
As you can see, the length of the ID column is only 1 byte, so we can store only 35 different file paths. I have more than 35 different file paths that have to be stored and retrieved. I know increasing the length of ID solves the issue, but I would also like suggestions on whether there is a more efficient way to handle this.
Thanks!
The assertion that you can store only 35 different values in the table is incorrect, because varchar2 characters are not limited to letters and digits (even if they were you'd have 26 letters + 10 digits + 1 empty string = 37, not 35 possibilities).
If you need to store a few more paths, say, 40 or 50, you could make your keys mixed case, so 'a' and 'A' would reference different paths. This would instantly give you 26 extra possibilities.
Expanding past the limit of 63 is a little harder, because you need to bring special characters into the mix. However, the theoretical maximum for a single character is 256 plus one combination for an empty string.
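The counts quoted above can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are made up for illustration, and the counts exclude the empty string that the answer adds on top.

```python
import string

# Single-character keys drawn from upper-case letters and digits.
uppercase_keys = len(string.ascii_uppercase + string.digits)

# Adding lower-case letters gives 26 more keys.
mixed_case_keys = len(string.ascii_letters + string.digits)

# Every possible value of a single byte.
single_byte_keys = 2 ** 8

print(uppercase_keys, mixed_case_keys, single_byte_keys)
```

Adding one for the empty string gives the 37, 63 and 257 mentioned in the answer. Note that Oracle treats a zero-length VARCHAR2 as NULL, so for a column marked as not nullable the empty-string key is probably not usable in practice.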

What is the "star" measurement in Expression Blend?

I am currently working on a Windows 8 Metro/Modern UI application. Right now, I'm working on the interface in Expression Blend for Visual Studio.
My question is this: When sizing UI elements such as grid columns, I can use either pixels, auto, or stars. What is a star in this context? A Google search turns up nothing and I haven't found anything in the Windows 8 developer documentation.
Thank you.
In a grid a * means that it will equally share available space with other * columns (or rows). There are some good WPF examples of how this works here.
From the documentation here:
starSizing
A convention by which you can size rows or columns to take
the remaining available space in a Grid. A star sizing always includes
the asterisk character (*), and optionally precedes the asterisk with
an integer value that specifies a weighted factor versus other
possible star sizings (for example, 3*). For more information about
star sizing, see Grid.
In a grid with multiple columns, the * size columns divide up the remaining space. For example assume a 300px wide grid with 3 columns (150px, 120px and 1*).
The calculation is:
remainder = (300 - 150 - 120)
Since the remainder is 30px the 1* column is 30px wide
Now add some columns and modify the widths to (35px, 85px, 2*, 1*, 3*)
Redoing the calculation:
remainder = (300 - 35 - 85)
In this scenario the remainder is 180px, so each * column splits the remaining pixels according to their weighting number.
factor = (180/ (2 + 1 + 3))
factor = 30px
Therefore the 2* column is 60px, the 1* column is 30px and the 3* column is 90px
300 == 35 + 85 + 60 + 30 + 90
Of course the same principles apply for Row sizing.
When the grid is resized the * columns divvy up the new remainder size. But they keep the same size ratio between other * size items. In the example the 3* column will always be 3 times as wide as the 1* column.
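The calculation above can be sketched in a few lines of Python. This is not how the XAML layout engine is implemented; it only reproduces the arithmetic from this answer, and it ignores Auto columns and min/max constraints. The function name star_widths and the ("px" / "star") tuple format are assumptions made for the example.

```python
def star_widths(total_width, columns):
    """Resolve column widths for a grid of fixed ("px") and weighted ("star") columns.

    columns is a list of (kind, value) pairs, e.g. ("px", 35) or ("star", 2).
    """
    fixed = sum(value for kind, value in columns if kind == "px")
    weights = sum(value for kind, value in columns if kind == "star")
    # Star columns only get what is left after the fixed columns are placed.
    remainder = max(total_width - fixed, 0)
    unit = remainder / weights if weights else 0
    return [value if kind == "px" else value * unit for kind, value in columns]


layout = [("px", 35), ("px", 85), ("star", 2), ("star", 1), ("star", 3)]
print(star_widths(300, layout))
print(star_widths(420, layout))  # after a resize, the star columns keep a 2:1:3 ratio
```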

SQL - Create Unique AlphaNumeric based on a 10-digit integer stored as VARCHAR

I'm trying to emulate a function in SQL that a client has produced in Excel. In effect, they have a unique, 10-digit numeric value (VARCHAR) as the primary key in one of their enterprise database systems. Within another database, they require a unique, 5-digit alphanumeric identifier. They want that 5-digit alphanumeric value to be a representation of the 10-digit number. So what they did in excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together.
The EXCEL equation is:
=IF(VALUE(MID(A2,1,4))>0,DEC2HEX(VALUE(MID(A2,3,2)))&DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX(VALUE(MID(A2,9,2))),DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX((VALUE(MID(A2,9,2)))))
I need the SQL equivalent of this. Of course, should someone out there know a better way to accomplish their goal of "a 5-digit alphanumeric identifier" based off the 10-digit number, I'm all ears.
ADDED 8/2/2011
First of all, thank you to everyone for the replies. Nice to see folks willing to help and even enjoying it! Based on all the responses, I'm apt to tell my client their intent is sound, only their method is off kilter. I'd also like to recommend a solution. So the challenge remains, just modified slightly:
CHALLENGE: Within SQL, take a 10 digit, unique NUMERIC string and represent it ALPHANUMERICALLY in as few characters as possible. The resulting string must also be unique.
Note that the first 3-4 characters in the 10-digit string are likely to be zeros, and that they could be stripped to shorten the resulting alphanumeric string. Not required, but perhaps helpful.
This problem is inherently impossible. You have a 10 digit numeric value that you want to convert to a 5 digit alphanumeric value. Since there are 10 numeric characters, this means that there are 10^10 = 10 000 000 000 unique values for your 10 digit number. Since there are 36 alphanumeric characters (26 letters + 10 numbers), there are 36^5 = 60 466 176 unique values for your 5 digit number. You cannot map a set of 10 billion elements into a set with around 60 million.
Now, let's take a closer look at what your client's code is doing:
So what they did in excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together.
This isn't 100% accurate. The excel code never uses the first 2 digits, but performs this operation on the remaining 8. There are two main problems with this algorithm which may not be intuitively obvious:
Two 10 digit numbers can map to the same 5 digit number. Consider the numbers 1000000117 and 1000001701. The last four digits of 1000000117 get mapped to 1 11, where the last four digits of 1000001701 get mapped to 11 1. This causes both to map to 00111.
The 5 digit number may not even end up being 5 digits! For example, 1000001616 gets mapped to 001010.
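Both problems are easy to reproduce. The following Python sketch mirrors the Excel formula from the question (MID is 1-based, DEC2HEX drops leading zeros); the function name excel_short_id is made up for this example.

```python
def excel_short_id(number: str) -> str:
    """Mimic the Excel formula: hex-encode two-digit pairs and concatenate them."""

    def pair_hex(start: int) -> str:
        # start is 1-based, like Excel's MID(A2, start, 2).
        return format(int(number[start - 1:start + 1]), "X")

    # If the first four digits are non-zero, digits 3-10 are used; otherwise digits 5-10.
    starts = (3, 5, 7, 9) if int(number[:4]) > 0 else (5, 7, 9)
    return "".join(pair_hex(start) for start in starts)


print(excel_short_id("1000000117"))  # 00111
print(excel_short_id("1000001701"))  # 00111, same as the line above
print(excel_short_id("1000001616"))  # 001010, six characters rather than five
```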
So, what is a possible solution? Well, if you don't care if that 5 digit number is unique or not, in MySQL you can use something like:
hex(<NUMERIC VALUE> % 0xFFFFF)
The log of 10^10 base 2 is 33.219280948874
> return math.log(10 ^ 10) / math.log(2)
33.219280948874
> = 2 ^ 33.21928
9999993422.9114
So, it takes 34 bits to represent this number. In hex this will take 34/4 = 8.5 characters, much more than 5.
> return math.log(10 ^ 10) / math.log(16)
8.3048202372184
The Excel macro is ignoring the first 2 (or 4) characters of the 10 character string.
You could try encoding in base 36 instead of 16. This will get you to 7 characters or less.
> return math.log(10 ^ 10) / math.log(36)
6.4254860446923
The popular base 64 encoding will get you to 6 characters
> return math.log(10 ^ 10) / math.log(64)
5.5365468248123
Even Ascii85 encoding won't get you down to 5.
> return math.log(10 ^ 10) / math.log(85)
5.1829075929158
You need base 100 to get to 5 characters
> return math.log(10 ^ 10) / math.log(100)
5
There aren't 100 printable ASCII characters, so this is not going to work, as zkhr explained as well, unless you're willing to go beyond ASCII.
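The same figures can be obtained with integer arithmetic, which avoids rounding the logarithms by hand. Below is a small Python sketch; min_length and to_base36 are names chosen for this example, and the base 36 alphabet (digits followed by upper-case letters) is an assumption, since any fixed ordering of 36 symbols would work.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def min_length(value_count: int, alphabet_size: int) -> int:
    """Smallest string length whose capacity covers value_count distinct values."""
    length, capacity = 0, 1
    while capacity < value_count:
        capacity *= alphabet_size
        length += 1
    return length


def to_base36(n: int) -> str:
    """Encode a non-negative integer using the 36-symbol alphabet above."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


for base in (16, 36, 64, 85, 100):
    print(base, min_length(10 ** 10, base))

encoded = to_base36(9999999999)
print(encoded, int(encoded, 36))  # int(..., 36) decodes the string again
```

Because the encoding is a plain change of base, it is reversible, so distinct 10-digit numbers always produce distinct strings.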
I found your question interesting (although I don't claim to know the answer) - I googled a bit for you out of interest and found this which may help you http://dpatrickcaldwell.blogspot.com/2009/05/converting-decimal-to-hexadecimal-with.html

Power-law distribution in T-SQL

I basically need the answer to this SO question that provides a power-law distribution, translated to T-SQL for me.
I want to pull a last name, one at a time, from a census provided table of names. I want to get roughly the same distribution as occurs in the population. The table has 88,799 names ranked by frequency. "Smith" is rank 1 with 1.006% frequency, "Alderink" is rank 88,799 with frequency of 1.7 x 10^-6. "Sanders" is rank 75 with a frequency of 0.100%.
The curve doesn't have to fit precisely at all. Just give me about 1% "Smith" and about 1 in a million "Alderink"
Here's what I have so far.
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank] = ROUND(88799 * RAND(), 0)
But this of course yields a uniform distribution.
I promise I'll still be trying to figure this out myself by the time a smarter person responds.
Why settle for the power-law distribution when you can draw from the actual distribution ?
I suggest you alter the LastNames table to include a numeric column which would contain a numeric value representing the actual number of individuals with a name that is more common. You'll probably want a number on a smaller but proportional scale, say, maybe 10,000 for each percent of representation.
The list would then look something like:
(other than the 3 names mentioned in the question, I'm guessing about White, Johnson et al)
Smith 0
White 10,060
Johnson 19,123
Williams 28,456
...
Sanders 200,987
..
Alderink 999,997
And the name selection would be
SELECT TOP 1 [LastName]
FROM [LastNames] as LN
WHERE LN.[number_described_above] < ROUND(1000000 * RAND(), 0)
ORDER BY [number_described_above] DESC
That's picking the first name whose number does not exceed the [uniform distribution] random number. Note how the query uses less than and orders in descending order; this guarantees that the very first entry (Smith) gets picked. The alternative would be to start the series with Smith at 10,060 rather than zero and to discard the random draws smaller than this value.
Aside from the matter of boundary management (starting at zero rather than 10,060) mentioned above, this solution, along with the two other responses so far, is the same as the one suggested in dmckee's answer to the question referenced in this question. Essentially the idea is to use the CDF (cumulative distribution function).
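For readers who want to try the CDF idea outside the database, here is a minimal Python sketch using a binary search over the cumulative column. The four names and thresholds are the guessed values from the list above, and TOTAL is a hypothetical cut-off chosen only to keep the example small. The sketch treats a draw equal to a threshold as belonging to that name, which differs slightly from the strict less-than in the query.

```python
import bisect
import random

# Cumulative count of individuals with a MORE common name (guessed values from the answer).
names = ["Smith", "White", "Johnson", "Williams"]
thresholds = [0, 10_060, 19_123, 28_456]
TOTAL = 37_000  # hypothetical upper bound for this four-name sample


def pick(draw: int) -> str:
    """Return the name whose threshold is the largest one not exceeding draw."""
    return names[bisect.bisect_right(thresholds, draw) - 1]


def random_name(rng: random.Random) -> str:
    return pick(rng.randrange(TOTAL))


rng = random.Random(0)
print([random_name(rng) for _ in range(5)])
```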
Edit:
If you insist on using a mathematical function rather than the actual distribution, the following should provide a power law function which would somehow convey the "long tail" shape of the real distribution. You may want to tweak the @PwrCoef value (which BTW needn't be an integer); essentially, the bigger the coefficient, the more skewed to the beginning of the list the function is.
DECLARE @PwrCoef INT
SET @PwrCoef = 2
SELECT 88799 - ROUND(POWER(POWER(88799.0, @PwrCoef) * RAND(), 1.0/@PwrCoef), 0)
Notes:
- the extra ".0" in the function above is important to force SQL to perform float operations rather than integer operations.
- the reason why we subtract the power calculation from 88799 is that the calculation's distribution is such that the closer a number is to the end of our scale, the more likely it is to be drawn. Since the list of family names is sorted in the reverse order (most likely names first), we need this subtraction.
Assuming a power of, say, 3 the query would then look something like
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank]
= 88799 - ROUND(POWER(POWER(88799.0, 3) * RAND(), 1.0/3), 0)
Which is the query from the question except for the last line.
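The effect of the coefficient is easier to see with the formula pulled out of the query. The Python sketch below takes the uniform draw as a parameter u (standing in for RAND()) so that the behaviour can be inspected deterministically. The function name power_law_rank is made up, and the clamp to rank 1 is an addition of this sketch: the SQL formula can yield rank 0 when the draw is very close to 1.

```python
import random


def power_law_rank(n: int, coef: float, u: float) -> int:
    """Map a uniform draw u in [0, 1) to a rank in 1..n, favouring low ranks."""
    # Same expression as the SQL: n - ROUND(POWER(POWER(n, coef) * u, 1.0 / coef), 0)
    rank = n - round((n ** coef * u) ** (1.0 / coef))
    return max(1, rank)  # clamp added here; not part of the original formula


# A larger coefficient pushes the median draw (u = 0.5) towards the top of the list.
for coef in (1, 2, 3, 12):
    print(coef, power_law_rank(88799, coef, 0.5))

rng = random.Random(42)
print([power_law_rank(88799, 3, rng.random()) for _ in range(5)])
```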
Re-Edit:
In looking at the actual distribution, as apparent in the Census data, the curve is extremely steep and would require a very big power coefficient, which in turn would cause overflows and/or extreme rounding errors in the naive formula shown above.
A more sensible approach may be to operate in several tiers, i.e. to perform an equal number of draws in each of the, say, three thirds (or four quarters or...) of the cumulative distribution; within each of these partial lists, we would draw using a power law function, possibly with the same coefficient, but with different ranges.
For example
Assuming thirds, the list divides as follow:
First third = 425 names, from Smith to Alvarado
Second third = 6,277 names, from to Gainer
Last third = 82,097 names, from Frisby to the end
If we were to need, say, 1,000 names, we'd draw 334 from the top third of the list, 333 from the second third and 333 from the last third.
For each of the thirds we'd use a similar formula, maybe with a bigger power coefficient for the first third (where we are really interested in favoring the earlier names in the list, and also where the relative frequencies are more statistically relevant). The three selection queries could look like the following:
-- Random Drawing of a single Name in top third
-- Power Coef = 12
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank]
= 425 - ROUND(POWER(POWER(425.0, 12) * RAND(), 1.0/12), 0)
-- Second third; Power Coef = 7
...
WHERE LN.[Rank]
= (425 + 6277) - ROUND(POWER(POWER(6277.0, 7) * RAND(), 1.0/7), 0)
-- Bottom third; Power Coef = 4
...
WHERE LN.[Rank]
= (425 + 6277 + 82097) - ROUND(POWER(POWER(82097.0, 4) * RAND(), 1.0/4), 0)
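The three queries share one pattern: the offset is the last rank of the tier, and the power-law term is scaled to the tier's size. Here is a rough Python sketch of that pattern, using the tier sizes and coefficients from the queries above. The names TIERS, tier_rank and draws_per_tier are invented for the example, and u stands in for RAND().

```python
# (tier size, power coefficient) for the top, middle and bottom thirds.
TIERS = [(425, 12), (6277, 7), (82097, 4)]


def tier_rank(tier_index: int, u: float) -> int:
    """Rank drawn inside one tier for a uniform draw u in [0, 1)."""
    size, coef = TIERS[tier_index]
    # Last rank of this tier, e.g. 425 + 6277 for the second third.
    last_rank = sum(s for s, _ in TIERS[:tier_index + 1])
    return last_rank - round((size ** coef * u) ** (1.0 / coef))


def draws_per_tier(total_draws: int) -> list:
    """Split the draws as evenly as possible, giving any extras to the top tiers."""
    base, extra = divmod(total_draws, len(TIERS))
    return [base + (1 if i < extra else 0) for i in range(len(TIERS))]


print(draws_per_tier(1000))
print([tier_rank(i, 0.5) for i in range(len(TIERS))])
```

As with the single-tier formula, a draw very close to 1 lands on the last rank of the previous tier (or rank 0 for the top tier), so a real implementation would probably want to guard that boundary.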
Instead of storing the pdf as rank, store the CDF (the sum of all frequencies until that name, starting from Alderink).
Then modify your select to retrieve the first LN with rank greater than your formula result.
I read the question as "I need to get a stream of names which will mirror the frequency of last names from the 1990 US Census"
I might have read the question a bit differently than the other suggestions, and although an answer has been accepted, and a very thorough answer it is, I will contribute my experience with the Census last names.
I had downloaded the same data from the 1990 census. My goal was to produce a large number of names to be submitted for search testing during performance testing of a medical record app. I inserted the last names and the percentage of frequency into a table. I added a column and filled it with an integer which was the product of the "total names required * frequency". The frequency data from the census did not add up to exactly 100%, so my total number of names was also a bit short of the requirement. I was able to correct the number by selecting random names from the list and increasing their count until I had exactly the required number; the randomly added count never amounted to more than .05% of the total of 10 million.
I generated 10 million random numbers in the range of 1 to 88799. With each random number I would pick that name from the list and decrement the counter for that name. My approach was to simulate dealing a deck of cards, except my deck had many more distinct cards and a varying number of each card.
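A compact way to experiment with this "deck of cards" idea is sketched below in Python. It is not the procedure described above: instead of drawing random ranks and decrementing counters, it builds the whole deck and shuffles it, which deals the same multiset of names in random order. The function name deal_names and the sample frequencies are made up for illustration, and the frequencies are assumed to sum to at most 1.

```python
import random


def deal_names(frequencies: dict, total: int, rng: random.Random) -> list:
    """Build a deck with about total * frequency copies of each name, then shuffle it."""
    counts = {name: int(total * freq) for name, freq in frequencies.items()}
    # Truncation (and frequencies summing to less than 1) leaves the deck short;
    # top it up by bumping randomly chosen names, as the answer describes.
    shortfall = total - sum(counts.values())
    for name in rng.choices(list(counts), k=shortfall):
        counts[name] += 1
    deck = [name for name, count in counts.items() for _ in range(count)]
    rng.shuffle(deck)
    return deck


# Hypothetical frequencies, not census figures.
sample = {"Smith": 0.5, "Johnson": 0.3, "Williams": 0.19}
deck = deal_names(sample, 100, random.Random(7))
print(len(deck), deck[:5])
```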
Do you store the actual frequencies with the ranks?
Converting the algebra from that accepted answer to T-SQL is no bother, if you know what values to use for n. y would be what you currently have, ROUND(88799 * RAND(), 0), and x0,x1 = 1,88799 I think, though I might misunderstand it. The only non-standard maths operator involved from a T-SQL perspective is ^, which is just POWER(x,y) == x^y.