How can I save data (which I get from the user) to a data structure?

I have to make a polynomial multiplication program. The variables of a polynomial can vary, such as 'x', 'y', 'xzy', etc. I also have to handle the precedence of the grouping operators '()' and '{}'.
For example, if the user enters "(3y^7+4x^4-1){{(5x^4-3y^2+7)(3x^3-4x^1+2)}*(2y^7-3z^1+9)}", then the program should print the result of the multiplication.
For this program, I tried to use a data structure. I have to make a data structure whose type is a double pointer to save the polynomials, but I don't know how to define it, and I don't know how to save each polynomial that the user enters.
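One common way to store a polynomial is as a collection of terms, where each term holds a coefficient plus a map from variable name to exponent; multiplication is then a cross product of the two term lists with like terms combined. Here is a rough sketch of that idea in C#, purely for illustration (the class and method names are made up; in C the same layout would be a struct plus a dynamically allocated array of terms, which is where the double pointer comes in). It does not cover parsing the input string or the '()' / '{}' precedence.

using System.Collections.Generic;
using System.Linq;

// Sketch only: a term is a coefficient plus a map of variable name -> exponent,
// and a polynomial is a set of terms keyed by their exponent signature so that
// like terms can be combined.
class Polynomial
{
    private readonly Dictionary<string, (double Coeff, Dictionary<string, int> Exps)> terms = new();

    public void AddTerm(double coeff, Dictionary<string, int> exps)
    {
        // Canonical key such as "x^4" or "x^1*y^2" (empty for a constant term).
        string key = string.Join("*", exps.OrderBy(e => e.Key).Select(e => $"{e.Key}^{e.Value}"));
        if (terms.TryGetValue(key, out var existing))
            terms[key] = (existing.Coeff + coeff, existing.Exps);   // combine like terms
        else
            terms[key] = (coeff, new Dictionary<string, int>(exps));
    }

    public Polynomial Multiply(Polynomial other)
    {
        var result = new Polynomial();
        foreach (var a in terms.Values)
            foreach (var b in other.terms.Values)
            {
                // Multiply coefficients, add exponents of matching variables.
                var exps = new Dictionary<string, int>(a.Exps);
                foreach (var e in b.Exps)
                    exps[e.Key] = exps.TryGetValue(e.Key, out int cur) ? cur + e.Value : e.Value;
                result.AddTerm(a.Coeff * b.Coeff, exps);
            }
        return result;
    }
}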

Related

Non-cryptographic algorithms to protect data

I was able to find a few, but I was wondering whether there are more algorithms that are based on data encoding/modification instead of complete encryption. Examples that I found:
Steganography. The method is based on hiding a message within a message;
Tokenization. Data is mapped in the tokenization server to a random token that represents the real data outside of the server;
Data perturbation. As far as I know it works mostly with databases. It adds noise to the sensitive records yet still allows general and public fields to be read, like the sum of the records on a specific day.
Are there any other methods like this?
If your purpose is to publish this data, there are other methods similar to data perturbation; collectively they are called Data Anonymization [source] (a short code sketch of two of these techniques follows the list):
Data masking—hiding data with altered values. You can create a mirror
version of a database and apply modification techniques such as
character shuffling, encryption, and word or character substitution.
For example, you can replace a value character with a symbol such as
“*” or “x”. Data masking makes reverse engineering or detection
impossible.
Pseudonymization—a data management and de-identification method that
replaces private identifiers with fake identifiers or pseudonyms, for
example replacing the identifier “John Smith” with “Mark Spencer”.
Pseudonymization preserves statistical accuracy and data integrity,
allowing the modified data to be used for training, development,
testing, and analytics while protecting data privacy.
Generalization—deliberately removes some of the data to make it less
identifiable. Data can be modified into a set of ranges or a broad
area with appropriate boundaries. You can remove the house number in
an address, but make sure you don’t remove the road name. The purpose
is to eliminate some of the identifiers while retaining a measure of
data accuracy.
Data swapping—also known as shuffling and permutation, a technique
used to rearrange the dataset attribute values so they don’t
correspond with the original records. Swapping attributes (columns)
that contain identifiers values such as date of birth, for example,
may have more impact on anonymization than membership type values.
Data perturbation—modifies the original dataset slightly by applying techniques that round numbers and add random noise. The range
of values needs to be in proportion to the perturbation. A small base
may lead to weak anonymization while a large base can reduce the
utility of the dataset. For example, you can use a base of 5 for
rounding values like age or house number because it’s proportional to
the original value. You can multiply a house number by 15 and the
value may retain its credence. However, using higher bases like 15 can
make the age values seem fake.
Synthetic data—algorithmically manufactured information that has no
connection to real events. Synthetic data is used to create artificial
datasets instead of altering the original dataset or using it as is
and risking privacy and security. The process involves creating
statistical models based on patterns found in the original dataset.
You can use standard deviations, medians, linear regression or other
statistical techniques to generate the synthetic data.
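As a rough illustration of two of these techniques, here is a small sketch in C# (the helper names MaskEmail and PerturbAge, and the rounding base of 5, are made up for the example):

using System;

static class AnonymizationSketch
{
    // Data masking: keep the first character and the domain, replace the rest with '*'.
    public static string MaskEmail(string email)
    {
        int at = email.IndexOf('@');
        if (at <= 1) return new string('*', email.Length);
        return email[0] + new string('*', at - 1) + email.Substring(at);
    }

    // Data perturbation: round to a base and add bounded random noise, so individual
    // values change while aggregates stay roughly usable.
    public static int PerturbAge(int age, int baseStep = 5, Random rng = null)
    {
        rng ??= new Random();
        int rounded = (int)(Math.Round(age / (double)baseStep) * baseStep);
        return rounded + rng.Next(-baseStep / 2, baseStep / 2 + 1);
    }
}

// e.g. MaskEmail("john.smith@example.com") -> "j*********@example.com"
//      PerturbAge(37)                      -> a value near 35 (between 33 and 37)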
Is this what you are looking for?
EDIT: added link to the source and quotation.

Abstract data type (ADT) for truth table

Just as there is a nice, common/generic ADT for graphs, is there something similar for "truth tables"? I'm trying to find 'class' definitions from projects that already implement a truth table.
If not, how would you go about designing a (generic) truth table ADT?
UPDATE: As suggested in the comments, here is what I came up with (a rough code sketch follows the list):
Add or delete a row: TruthTable.add(input, output) and TruthTable.delete(input). (The first row that gets added is used to extract the length (in bits) of the inputs and the output; all subsequent row additions are validated against this.)
Get the output (image) for a given input: TruthTable.output(input) or TruthTable.image(input)
Get all the inputs: TruthTable.inputs (Odometer ordering.)
Get all outputs: TruthTable.outputs (Order according to the ordering of inputs, or odometer ordering?)
See if a table is completely specified, i.e., for all 2^n possible inputs, is an output specified: TruthTable.completely_specified?
Other specialized operations can be:
Check if a truth table is invertible: TruthTable.invertible?
Check if two tables are equivalent: TT1 ==? TT2
(I wish sometimes that programming languages allowed method names to end in a '?', for names of those methods which return a Bool)
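For what it's worth, here is a rough sketch in C# of what such an interface could look like, following the list above (the names are invented, and rows are kept as fixed-width bit strings purely to keep the sketch short):

using System;
using System.Collections.Generic;
using System.Linq;

// Rough sketch of a truth-table ADT. Rows map an input bit string to an output
// bit string; the first row added fixes the input/output widths.
class TruthTable
{
    private readonly Dictionary<string, string> rows = new();
    private int inputBits = -1, outputBits = -1;

    public void Add(string input, string output)
    {
        if (inputBits < 0) { inputBits = input.Length; outputBits = output.Length; }
        if (input.Length != inputBits || output.Length != outputBits)
            throw new ArgumentException("Row width does not match the table.");
        rows[input] = output;
    }

    public void Delete(string input) => rows.Remove(input);

    // Output ("image") for a given input.
    public string Output(string input) => rows[input];

    // All inputs in odometer (binary counting) order.
    public IEnumerable<string> Inputs => rows.Keys.OrderBy(k => k, StringComparer.Ordinal);

    // Outputs listed in the same order as Inputs.
    public IEnumerable<string> Outputs => Inputs.Select(i => rows[i]);

    // True if all 2^n possible inputs have an output specified.
    public bool IsCompletelySpecified => inputBits >= 0 && rows.Count == (1L << inputBits);

    // Invertible (as a function) if no two inputs map to the same output.
    public bool IsInvertible => rows.Values.Distinct().Count() == rows.Count;

    // Two tables are equivalent if they contain exactly the same rows.
    public bool EquivalentTo(TruthTable other) =>
        rows.Count == other.rows.Count &&
        rows.All(r => other.rows.TryGetValue(r.Key, out var o) && o == r.Value);
}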

VB.NET Comparing files with Levenshtein algorithm

I'd like to use the Levenshtein algorithm to compare two files in VB.NET. I know I can use an MD5 hash to determine if they're different, but I want to know HOW MUCH different the two files are. The files I'm working with are both around 250 MB. I've experimented with different ways of doing this and I've realized I really can't load both files into memory (all kinds of string-related issues). So I figured I'd just stream the bytes I need as I go. Fine. But the implementations of the Levenshtein algorithm that I've found all dimension a matrix that's length1 * length2 in size, which in this case is impossible to work with. I've heard there's a way to do this with just two vectors instead of the whole matrix.
How can I compute Levenshtein distance of two large files without declaring a matrix that's the product of their file sizes?
Note that the values in each row of the Levenshtein matrix depend only on the values in the row above it (and on the entries already computed to their left in the same row). This means that you only need two one-dimensional arrays: one contains the values of the current row; the other is populated with the new values that you compute from it. Then you swap their roles (the "new" row becomes the "current" row and vice versa) and continue.
Note that this approach only lets you compute the Levenshtein distance (which seems to be what you want); it cannot tell you which operations must be done in order to transform one string into the other. There is a very clever modification of the algorithm (Hirschberg's algorithm) that lets you reconstruct the edit operations without using n*m memory, but I've forgotten how it works.
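Here is a minimal sketch of that two-row approach in C# (the same idea ports directly to VB.NET). It keeps memory proportional to the length of one input, but the running time is still proportional to length1 * length2, so for two ~250 MB files it will be extremely slow even though it no longer needs the full matrix:

using System;
using System.IO;

static class LevenshteinSketch
{
    // Two-row Levenshtein distance. Memory is O(len(b)); time is still O(len(a) * len(b)).
    public static long Distance(byte[] a, byte[] b)
    {
        long[] prev = new long[b.Length + 1];
        long[] curr = new long[b.Length + 1];

        for (int j = 0; j <= b.Length; j++) prev[j] = j;    // distance from the empty prefix of a

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;                                    // distance from the empty prefix of b
            for (int j = 1; j <= b.Length; j++)
            {
                long cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1,    // insertion
                                            prev[j] + 1),       // deletion
                                   prev[j - 1] + cost);         // substitution / match
            }
            (prev, curr) = (curr, prev);                    // swap the two rows
        }
        return prev[b.Length];
    }
}

// Usage sketch: for files this size you would keep one file in the 'b' buffer
// and stream 'a' from disk chunk by chunk instead of File.ReadAllBytes.
// long d = LevenshteinSketch.Distance(File.ReadAllBytes("old.bin"), File.ReadAllBytes("new.bin"));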

How to distinguish between master data and calculated/interpolated data?

I'm getting a bunch of vectors with data points for a fixed set of time points. The example below shows a vector with a value per time point:
1D:2
2D:
7D:5
1M:6
6M:6.5
But alas, a value is not available for every time point. All vectors are stored in a database, and with a trigger we calculate the missing values by interpolation, or possibly a more advanced algorithm. Somehow I want to be able to tell which data points have been calculated and which were originally delivered to us. Of course I can add a flag column to the table indicating whether the value is a master value or a calculated one, but I'm wondering whether there is a more sophisticated way. We probably don't need to determine this on a regular basis, so CPU cycles are not an issue for either determination or insertion.
The example above shows some nice-looking numbers, but in reality they would look more like 3.1415966533.
The database used for storage is Oracle 10.
cheers.
Could you deactivate the trigger temporarily?

Using integers and requiring multiplication vs. using decimals as a data type - what are your thoughts?

What are your thoughts on this? I'm working on integrating some new data that's in a tab-delimited text format, and all of the decimal columns are kept as single integers; in order to determine the decimal amount you need to multiply the number by .01. It does this for things like percentages, weight and pricing information. For example, an item's price is expressed as 3259 in the data files, and when I want to display it I would need to multiply it in order to get the "real" amount of 32.59.
Do you think this is a good or bad idea? Should I be keeping my data structure identical to the one provided by the vendor, or should I make the database columns true decimals and use SSIS or some sort of ETL process to automatically multiply the integer columns into their decimal equivalent? At this point I haven't decided if I am going to use an ORM or Stored Procedures or what to retrieve the data, so I'm trying to think long term and decide which approach to use. I could also easily just handle this in code from a DTO or similar, something along the lines of:
public class Product
{
    // ...
    private int _price;   // stored as the vendor sends it: an integer count of hundredths

    public decimal Price
    {
        get
        {
            // 3259 -> 32.59m; a decimal literal avoids floating-point rounding
            return this._price * 0.01m;
        }
        set
        {
            // 32.59m -> 3259
            this._price = (int)(value * 100m);
        }
    }
}
But that seems like extra and unnecessary work on the part of the class. How would you approach this, keeping in mind that the data is provided in integer format by a vendor from which you will regularly need to get updates?
"Do you think this is a good or bad idea?"
Bad.
"Should I be keeping my data structure identical to the one provided by the vendor?"
No.
"Should I make the database columns true decimals?"
Yes.
It's so much simpler to do what's right. Currently, the data is transmitted with no "." to separate the whole part from the decimal part; that has no real significance.
The data is decimal. Decimal math works. Use the decimal math provided by your language and database. Don't invent your own version of Decimal arithmetic.
Personally I would much prefer to have the data stored correctly in my database and just do a simple conversion every time an update comes in.
Pedantically: they aren't kept as ints either. They are strings that require parsing.
Philosophically: you have information in the file and you should write data into the database. That means transforming the information in any way necessary to make it meaningful/useful. If you don't do this transform up front, then you'll be doomed to repeat the transform across all consumers of the database.
There are some scenarios where you aren't allowed to transform the data, such as being able to answer the question: "What was in the file?". Those scenarios would require the data to be written as string - if the parse failed, you wouldn't have an accurate representation of the file.
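For example, the up-front transform can be as small as this sketch (the column layout, the file name, and the scale factor of 100 are assumptions based on the question):

using System.Globalization;
using System.IO;

static class VendorImportSketch
{
    // Parse one tab-delimited vendor line and convert the scaled-integer price
    // (e.g. "3259") into a true decimal (32.59m) before it ever reaches the database.
    public static (string Sku, decimal Price) ParseLine(string line)
    {
        string[] fields = line.Split('\t');          // assumed layout: SKU <tab> price-in-hundredths
        int rawPrice = int.Parse(fields[1], CultureInfo.InvariantCulture);
        return (fields[0], rawPrice / 100m);
    }
}

// Usage sketch:
// foreach (string line in File.ReadLines("vendor_feed.txt"))
// {
//     var (sku, price) = VendorImportSketch.ParseLine(line);
//     // INSERT the decimal 'price' into a DECIMAL(10,2) column here
// }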
In my mind the most important facet of using Decimal over Int in this scenario is maintainability.
Data stored in tables should be clearly meaningful without the need for arbitrary manipulation. If manipulation is required, it should be clearly evident that it is (such as from the field name).
I recently dealt with data where days of the week were stored as the values 2-8. You cannot imagine the fallout this caused (testing didn't show the problem for a variety of reasons, but live use did cause political explosions).
If you do ever run in to such a situation, I would be absolutely certain to ensure data can not be written to or read from the table without use of stored procedures or views. This enables you to ensure the necessary manipulation is both enforced and documented. If you don't have both of these, some poor sod who follows you in the future will curse your very name.