Fields to be covered for compliance data masking

I am building a compliance checklist document for data masking.
Compliance standard --> Fields to be masked --> Type of masking
e.g.
PCI DSS --> PAN --> "Mask PAN when displayed (the first six and last four digits are the maximum number of digits to be displayed)."
Where can I get the list of attributes for all the major compliance standards, such as PCI, PII, HIPAA, etc.?
I will use it as a source of reference.
Thanks in advance.
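As a sketch of the PCI DSS display rule quoted in the question, the function below keeps at most the first six and last four digits of a PAN and masks the rest. The name mask_pan, the "*" mask character, and the choice to mask short inputs entirely are assumptions for illustration. The rule sets a maximum, so a stricter policy (for example, showing only the last four digits) also satisfies it.

```python
def mask_pan(pan: str, mask_char: str = "*") -> str:
    """Mask a primary account number (PAN) for display.

    Shows at most the first six and last four digits; everything in
    between is replaced with mask_char.
    """
    digits = "".join(ch for ch in pan if ch.isdigit())  # drop spaces and dashes
    if len(digits) <= 10:
        # Assumption: too short to hide anything while showing 6 + 4, so mask all.
        return mask_char * len(digits)
    return digits[:6] + mask_char * (len(digits) - 10) + digits[-4:]
```

For example, mask_pan("4111 1111 1111 1111") gives "411111******1111".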

Where can I get the list of attributes for all the major compliance standards like PCI, PII, HIPAA, etc.?
Are you asking what type of information needs to be masked?
It depends on the transaction types you are receiving, and I don't know which side you are on (provider, payer, or third-party broker). It varies from company to company. If it is HIPAA, you will certainly deal with the X12 820, 834, 835, and 837 transactions.
Please provide more information.


What is meaning of different fields returned by get login form call?

I am looking for the specific meaning of the following fields:
valueIdentifier
valueMask
fieldType
FieldInfoMultiFixed
AutoRegFieldInfoSingle
FieldInfoMultiVariable
Also, in most cases we are getting a numerical value for helpText. How do we identify whether helpText is present or not?
A lot of things like FieldInfoMultiFixed/Variable are discussed in the Yodlee SDK Developer Guide; search for either one. They're basically just combinations where a single value is broken up into multiple fields (like a phone number or SSN split across three text boxes).
As for helpText, every time I've seen a Yodlee tech respond, they say it isn't supported. The number corresponds to an internal resource identifier that apparently isn't exposed through the API. I seem to recall someone saying it might be available for things like forum signup/registration (where it would be more useful). The SDK documentation describes it as if it works as you would expect, but that is an error.
Currently Yodlee does not populate helpText for any field, so a numerical value is associated with it instead. If help text is added in the future, the field will contain text instead of a numerical value.
So if you receive a numerical value, treat it as helpText not being present.
Shreyans
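Following the rule in the answer above, a client could treat a purely numeric helpText as "not present". This is a minimal Python sketch; the name has_help_text is hypothetical, and it assumes the value arrives as a string or an integer and that the internal identifiers are non-negative integers.

```python
def has_help_text(help_text: object) -> bool:
    """Return True only when helpText holds actual text, not a numeric ID."""
    if help_text is None:
        return False
    value = str(help_text).strip()
    # An empty value, or one made only of digits (an internal resource
    # identifier, per the answer above), means "no help text".
    return bool(value) and not value.isdigit()
```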

US phone numbers

I'm building an app that uses phone numbers to perform different tasks, and recently I've had quite a few requests to implement it for the US market. Unfortunately as I live in the UK I don't have much knowledge of US phone number formats, and with so many USA users on here I was hoping some of you would be able to help.
I'm looking to obtain a list of sample phone numbers as they appear in the call logs on your mobile phones. I'm trying to determine whether they come through in the format +1234567, +001234567, 001234567, 01234567, 1234567, 234567, etc., or whether the format can vary.
Hopefully you're hesitant about giving out phone numbers on the web, so feel free to change a few digits (I'm mainly interested in the first few digits and the format of the numbers).
The more numbers you can provide the better, thanks!
The following formats are common:
+12312322334
2312322334
(231) 232-2334
2322334
232-2334
The last two forms are unusual, though they may be encountered; the area code is implied to be local to the phone.
Note that some entries are invalid: numbers never start with a "1" (that's the long-distance dialing prefix, optional on cell phones), and the "555" exchange is reserved (which is why it's so commonly used in movies).
U.S. phone numbers have three parts: a three-digit area code, a three-digit exchange, and a four-digit line number. They are generally written in the format (234) 555-1234. If you are calling from the same area code as the person you are calling, you can omit the area code (the (234) part). For landlines, you often need to dial a 1 first if you include the area code, but most cell phones don't require this.
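A minimal Python sketch that normalizes the formats listed above into one display form. The function name is hypothetical, it does not handle international prefixes such as 011, and the 0/1 check encodes the North American Numbering Plan rule that area codes and exchanges start with 2 through 9 (slightly broader than the "never start with a 1" note).

```python
import re
from typing import Optional


def normalize_us_number(raw: str) -> Optional[str]:
    """Normalize to '(NPA) NXX-XXXX', or 'NXX-XXXX' for a local number.

    Returns None when the digits don't fit either shape.
    """
    digits = re.sub(r"\D", "", raw)         # drop +, spaces, hyphens, parens
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                 # strip the leading 1 / +1
    if len(digits) == 10:
        area, exchange, line = digits[:3], digits[3:6], digits[6:]
        # Area codes and exchanges never start with 0 or 1.
        if area[0] in "01" or exchange[0] in "01":
            return None
        return f"({area}) {exchange}-{line}"
    if len(digits) == 7:
        exchange, line = digits[:3], digits[3:]
        if exchange[0] in "01":
            return None
        return f"{exchange}-{line}"         # area code implied by the phone
    return None
```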
As I said, it's an interesting question. Have you searched for something like "dirty north american phone format" or "how are north american phone numbers typically formatted"? It struck me as something that must be done often.
Google brings up this as an example: Phone number format provider. It has a) some example formats and b) some code that deals with dirty or non-standard formats and reformats them.
So, following my comment, I'd start by stripping spaces (and hyphens), and from then on assume that you've got the right-most part of the number and that any missing left-most parts represent increasingly wider geographic areas.
In reverse -- if the assumption works, you can create your own sample numbers by taking a standard format number and chopping groups from the left hand side -- I think.
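The "chop groups from the left" idea can be sketched like this, assuming a 10-digit North American number and treating the country code 1, the area code, and the seven-digit local number as the groups. The function name is illustrative only.

```python
def local_forms(number: str) -> list[str]:
    """Generate shorter forms of a 10-digit number by dropping the
    left-most (widest-area) groups one at a time."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number")
    # Groups from widest to narrowest: country code, area code, local number.
    groups = ["1", digits[:3], digits[3:]]
    return ["".join(groups[i:]) for i in range(len(groups))]
```

For example, local_forms("231-232-2334") yields the 11-, 10-, and 7-digit forms of the same number.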

Is there common street addresses database design for all addresses of the world? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
I am a programmer and need a practical approach to storing the street address structures of the world in a database. What is the best, most common database design for storing street addresses? It should be simple to use, fast to query, and flexible enough to store all street addresses in the world.
It is possible to represent addresses from lots of different countries in a standard set of fields. The basic idea of a named access route (thoroughfare) on which named or numbered buildings are located is fairly standard, except sometimes in China. Other near-universal concepts include naming the settlement (city/town/village), which can be generically referred to as a locality, naming the region, and assigning an alphanumeric postcode. Note that postcodes, also known as zip codes, are purely numeric only in some countries. You will need lots of fields if you really want to be generic.
The Universal Postal Union (UPU) provides address data for lots of countries in a standard format. Note that the UPU format holds all addresses (down to the available field precision) for a whole country, so it is relational. If you are storing customer addresses, where only a small fraction of all possible addresses will be stored, it's better to use a single table (or flat format) containing all fields, with one address per row.
A reasonable format for storing addresses would be as follows:
Address Lines 1-4
Locality
Region
Postcode (or zipcode)
Country
Address lines 1-4 can hold components such as:
Building
Sub-Building
Premise number (house number)
Premise Range
Thoroughfare
Sub-Thoroughfare
Double-Dependent Locality
Sub-Locality
Frequently only 3 address lines are used, but this is often insufficient. It is of course possible to require more lines to represent all addresses in the official format, but commas can always be used as line separators, meaning the information can still be captured.
Analysis of the data is usually performed by locality, region, postcode, and country, and these elements are fairly easy for users to understand when entering data. This is why these elements should be stored as separate fields. However, don't force users to supply a postcode or region; they may not be used locally.
Locality can be unclear, particularly the distinction between map locality and postal locality. The postal locality is the one designated by a postal authority, which may sometimes be a nearby large town. However, the postcode will usually resolve any problems or discrepancies, allowing correct delivery even if the official postal locality is not used.
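A minimal Python sketch of the flat, one-address-per-row format described above, with region and postcode optional. The class and field names are assumptions, and storing the country as an ISO 3166-1 alpha-2 code is a choice made here, not part of the answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    """One address per row: lines 1-4 plus locality, region, postcode, country."""
    line1: str
    locality: str
    country: str                      # assumed: ISO 3166-1 alpha-2, e.g. "IT"
    line2: Optional[str] = None
    line3: Optional[str] = None
    line4: Optional[str] = None
    region: Optional[str] = None      # not used everywhere, so not required
    postcode: Optional[str] = None    # stored as text, and also not required

    def address_lines(self) -> list[str]:
        """Return the non-empty address lines in order."""
        lines = (self.line1, self.line2, self.line3, self.line4)
        return [line for line in lines if line]
```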
Have a look at Database Answers. Specifically, this covers many cases:
(All variable length character datatype)
AddressId
Line1
Line2
Line3
City
ZipOrPostcode
StateProvinceCounty
CountryId
OtherAddressDetails
Ask yourself what is the main purpose of storing this data? Do you intend to actually send mail to the person at the address? Track demographics, populations? Be able to ask callers for their correct address as part of some basic authentication/verification? All of the above? None of the above?
Depending on your actual need, you will determine either a) it doesn't really matter, and you can go for a free-text approach, or b) structured/specific fields for all countries, or c) country specific architecture.
Sometimes the closest you can get to a street address is the city.
I once had a project to put all the Secondary Schools in India in Google Maps. I wrote a spiffy program using the Google API and thought it would be quite easy.
Then I got the data from the client. Some school addresses were things like "Across from the market, next to the barber" or "Near old bus stand".
It made my task much harder since, unfortunately, the Google API does not support that format.
For international addresses, it is remarkably hard to find a way to format the information once it has been broken down into fields. For instance, an Italian address uses:
<street address>
<zip> <town> <region>
<country>
Such as
Via Eroi della Repubblica
89861 Tropea VV
Italy
That differs from the US order on the second line, where the town comes first and the zip code last.
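The order difference can be captured with per-country line templates applied to the same stored fields. This Python sketch is illustrative only: TEMPLATES, format_address, and the two layouts are assumptions (the US line mirrors the "Springfield OH 12345" style), not UPU data. Missing fields render as empty and the leftover spaces are collapsed.

```python
# Hypothetical layout templates keyed by ISO 3166-1 alpha-2 country code.
TEMPLATES = {
    "IT": ["{street}", "{postcode} {locality} {region}", "{country}"],
    "US": ["{street}", "{locality} {region} {postcode}", "{country}"],
}


class _BlankMissing(dict):
    """Mapping that renders absent fields as empty strings."""

    def __missing__(self, key: str) -> str:
        return ""


def format_address(country_code: str, fields: dict[str, str]) -> str:
    """Render stored fields using the given country's line layout."""
    lines = []
    for template in TEMPLATES[country_code]:
        # Fill the template, then collapse gaps left by missing fields.
        line = " ".join(template.format_map(_BlankMissing(fields)).split())
        if line:
            lines.append(line)
    return "\n".join(lines)
```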
See also the SO questions:
How many address fields would you use for a UK database?
Do you break up addresses into street / city / state / zip?
How do you deal with duplicate street suffixes?
Best practices for storing postal addresses in a database (RDBMS)?
Also check out tag 'postal-code'.
Edit: reversed the order of region and town, per the UPU.
Maybe this is useful:
https://gist.github.com/259744
For a project I collected a table of information about all the countries of the world, including ISO codes, top-level domain, phone code, international vehicle registration code, and the length and regex of the postal code.
Unfortunately, the country names and comments are only in German.
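A table like that can drive postal-code validation. The Python sketch below uses a few simplified patterns written for illustration; they are assumptions, not vetted per-country rules (the UK pattern in particular is looser than the real format). Countries without a pattern are accepted rather than rejected. Note that the ISO 3166-1 code for the UK is GB.

```python
import re

# Simplified, illustrative patterns only; a real table should be vetted.
POSTCODE_PATTERNS = {
    "US": r"\d{5}(-\d{4})?",
    "CH": r"\d{4}",
    "IT": r"\d{5}",
    "GB": r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}",
}


def is_valid_postcode(country_code: str, postcode: str) -> bool:
    """Check a postcode against its country's pattern, if one is known."""
    pattern = POSTCODE_PATTERNS.get(country_code)
    if pattern is None:
        return True  # no rule for this country: don't reject what we can't check
    return re.fullmatch(pattern, postcode.strip().upper()) is not None
```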
Unlike other answers here, I believe it's possible to have a structured address database.
Off the top of my head, I can think of the following structure:
Country
Region (State / Province)
Locality (City / Municipality)
Sub-Locality (County / other sub-division of a locality)
Street
But how to query it fast enough?
One way I think it can be accomplished is to ask for the ZIP code (or postal code), which varies from country to country but is consistent within a country.
This way you can structure your data around the information provided by the postal offices around the world.
Depends on how free-form you are prepared to go with the fields. One free-form address field will obviously always do, but be of relatively little help narrowing down geography.
The problem you'll have is that there is too much variation in the level of geographical hierarchy across countries. Heck, some countries do not even have 'street addresses' everywhere.
I recommend you don't try to make it too clever.
Len Silverston of Universal Data Model fame recommends a separate hierarchy of GEOGRAPHIC BOUNDARIES and depending on how much free-formed-ness you're willing to accept either simple STREET ADDRESS LINEs or per-country derivatives.
No, absolutely not. If you compare the way US and Japanese addresses work, you'll see that it's not possible.
UPDATE:
On second thought, anything can be done, but there's a trade-off.
One approach is to model the problem with address and address_attribute tables in a 1:m relationship; with that, anything can be modeled. The address_attribute table would have a pk, a name, a value, and an fk that points back to its parent address's pk. It's almost like using a Map of name/value pairs.
The trade-off is having to do a JOIN every time you want an address. You also have to interrogate the names of the address_attributes to figure out what you're dealing with each time.
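A minimal sketch of the address/address_attribute model using Python's built-in sqlite3 module. Table and column names follow the answer; the attribute names in the usage are made up, and the SELECT shows the JOIN that the trade-off refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address (
        id INTEGER PRIMARY KEY
    );
    CREATE TABLE address_attribute (
        id         INTEGER PRIMARY KEY,
        address_id INTEGER NOT NULL REFERENCES address(id),
        name       TEXT NOT NULL,
        value      TEXT NOT NULL
    );
    """
)


def save_address(attributes: dict[str, str]) -> int:
    """Insert a parent address row plus one name/value row per attribute."""
    address_id = conn.execute("INSERT INTO address DEFAULT VALUES").lastrowid
    conn.executemany(
        "INSERT INTO address_attribute (address_id, name, value) VALUES (?, ?, ?)",
        [(address_id, name, value) for name, value in attributes.items()],
    )
    return address_id


def load_address(address_id: int) -> dict[str, str]:
    """Rebuild an address by joining back to its attribute rows."""
    rows = conn.execute(
        "SELECT attr.name, attr.value FROM address "
        "JOIN address_attribute AS attr ON attr.address_id = address.id "
        "WHERE address.id = ?",
        (address_id,),
    ).fetchall()
    return dict(rows)
```

Note that every read has to inspect the attribute names to know what kind of address it is holding, which is the second cost mentioned above.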
Another approach would be to do more comprehensive research on how addresses are modeled around the world. In an object-oriented world you might have the western Address class (street1/street2/city/state/zip) and others for Japan, China, as many as needed to tile the address space. Then you'd have a master Address table and child tables to the other types with a 1:1 relationship between them.
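The object-oriented variant might look like the following Python sketch: a master record holding what every address shares, plus one subclass per address style (each would map to a child table in a 1:1 relationship). The Japanese field names are illustrative assumptions, not a researched model.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Master record: what every address has, whatever its style."""
    country: str


@dataclass
class WesternAddress(Address):
    street1: str = ""
    street2: str = ""
    city: str = ""
    state: str = ""
    zip_code: str = ""


@dataclass
class JapaneseAddress(Address):
    # Hypothetical fields; research the real format before relying on them.
    prefecture: str = ""
    municipality: str = ""
    block_number: str = ""
```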
How does Amazon or eBay do it? They ship internationally. Do they have locale-specific UI features? I've only used the US locale.
No, there is no standard addressing scheme. It usually varies from country to country.
Even the Universal Postal Union, in "Addressing the world, an address for everyone", says that there is none. The best solution is to use the two- or three-letter country codes from the ISO 3166 standard and treat everything else according to each country's standards.
However, if you really are desperate for easily accessible tools for your project, you can try the Google Places API.
Your design should depend strongly on your purpose. Some people have posted how to structure the data, and if you simply want to send snail mail to someone, that will do. Things get complicated if you want to use this data for navigation. Car navigation requires additional structures to hold traffic information (e.g. one-way roads), while foot navigation requires a lot of additional data. Here is a small example: in my city, my neighborhood is near a park. Next to the park is a former airfield (in fact, one of the oldest in Europe) turned into an aviation museum. Next to the aviation museum is a business park. The street number for the museum is 39, while the business park numbers start at 39A. So it may seem that 39 and 39A are close, but it takes about a mile to walk from one to the other (and even longer by car).
This is just a small example taken from my city, I think you can probably find a lot of exceptions (especially in rural or wilder parts of every country).

Which parts of an address should be required?

Say I am storing addresses in a DB table, using this fairly common breakdown:
address_street_line_1,
address_street_line_2,
address_city,
address_state,
address_zip,
address_country_id
(Note: I have read the questions on splitting down further, street type, house number, etc. and for this application I think it would unnecessarily complicate things.)
To work best with international users, which of these fields should NOT be required?
I'm thinking this:
address_street_line_1 REQUIRED
address_city REQUIRED
address_country_id REQUIRED
Should I require state or zip?
Thanks!
Xavier
You can probably only require one field: country.
But what you should really be doing is making the logic dependent on country. Take a look at Address Formats by Country for a comprehensive list. That isn't just about required fields either. It's also about correct formatting. A US address might be:
8031 Main Street
Springfield OH 12345
USA
whereas in Switzerland:
Bodenstr. 173
8043 Zürich
Schweiz
Note: the street numbers and post codes are in the "reverse" order for Switzerland (compared to what English speaking countries use).
Also, your data types need to be broad enough to cover data used in other countries. Zip/post code should absolutely not be a numeric type. For example, "EC2R 8AH" is a valid UK postcode.
That goes back to this principle: if you don't perform arithmetic on it, it's not a numeric type. It's text.
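A quick Python check makes the point: the helper below (a name made up for this illustration) reports whether a postcode would survive a round trip through an integer column. A ZIP code with a leading zero, such as 02134, loses it, and a UK postcode cannot be converted at all.

```python
def survives_as_number(postcode: str) -> bool:
    """True if storing the postcode as an integer would round-trip unchanged."""
    try:
        return str(int(postcode)) == postcode
    except ValueError:
        return False  # not numeric at all, e.g. a UK postcode
```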
Also, try not to call it Zip Code to end users. That's a US-only term; pretty much everywhere else it's called a Postcode, Post code, or Postal Code. Also note that UK postal codes are alphanumeric and include a space.
Not all countries even use postal codes; for example, they were rarely used in New Zealand before about 2006, and Ireland had no national postcode system until it introduced Eircode in 2015.
If you're truly international, city-states such as Singapore don't actually need a City field.
In the user interface, you can (and perhaps should) make the postcode required for countries where you already know it's required, since that isn't likely to change. And, if you make the UI dynamic enough, you can call it "Zip code" if the selected country is the United States, "Postal code" for Canada, "Postcode" for the UK, etc.
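A sketch of that dynamic behaviour in Python. The rules table is hypothetical and lists only the three countries named above (treating their postcodes as required is an assumption); unknown countries fall back to an optional "Postal code" field.

```python
# Hypothetical per-country rules: (label shown to the user, required?).
POSTCODE_RULES = {
    "US": ("Zip code", True),
    "CA": ("Postal code", True),
    "GB": ("Postcode", True),
}


def postcode_field(country_code: str) -> dict:
    """Describe how the postcode input should render for the chosen country."""
    label, required = POSTCODE_RULES.get(country_code, ("Postal code", False))
    return {"label": label, "required": required}
```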
How about making none required? If the user wants to be contacted they'll enter enough information. Or, enter a single text field and let them enter free form information. They know better than you what fields are required for postal deliveries to make it to their door.
I would say everything except street_line_2 and state, and think of 'zip' as a postal code rather than a zip code. As you can tell from the variety of formats across countries, it should have a fairly open format.
Even in the U.S., most of the address is not required. Many U.S. zip codes are allocated to individual businesses and organizations, and any mail to one of those zips will be delivered the same way regardless of the rest of the address. For instance:
General Electric
Schenectady, NY 12345
Internal Revenue Service
Ogden, UT 84201-0027
The city and state are nice, but the mail will probably get delivered without.
The best way I have found to solve this problem is to put the logic in your application layer rather than the persistence layer. One of the cleanest and simplest ways I've seen this done is to pass the user's data in a value object (a common interface that's easy to validate against) to a validator along with the current country code, which makes sure all the required attributes for that locale are set properly in the value object. If it passes validation, pass the value object along to the persistence side of your application for storage.
The key here is the value object - you're creating a common interface that multiple pieces of your application can talk to, validate, and read/write from. You can then also use that same value object when displaying the address: have your persistence layer get the information, put it in the value object, pass it to a factory with the current locale which returns the desired address format, and send that output to the front end.
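The value-object approach might be sketched in Python like this. AddressValue, the REQUIRED rules, and the display layouts are assumptions for illustration (the NZ and SG rules echo points made in other answers here), and sending every non-US country through a postcode-then-city line is a deliberate simplification, not a complete locale table.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AddressValue:
    """Value object passed between validation, persistence and display."""
    country: str
    street: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    postcode: Optional[str] = None


# Hypothetical required attributes per country code.
REQUIRED = {
    "US": ("street", "city", "state", "postcode"),
    "NZ": ("street", "city"),        # no states in New Zealand
    "SG": ("street", "postcode"),    # city-state: no separate city
}


def missing_fields(address: AddressValue) -> list[str]:
    """Validator: required attributes that are empty for this country."""
    values = asdict(address)
    required = REQUIRED.get(address.country, ("street",))
    return [name for name in required if not values[name]]


def format_for_locale(address: AddressValue) -> str:
    """Factory-style formatter: choose the line layout from the country."""
    if address.country == "US":
        second = [address.city, address.state, address.postcode]
    else:
        second = [address.postcode, address.city]  # e.g. "8043 Zürich"
    lines = [address.street, " ".join(part for part in second if part)]
    return "\n".join(line for line in lines if line)
```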
There are no states in New Zealand, so state should definitely be optional. I think you have the right answer in your question.
If you are not going to do any specific lookups, like searching by postal code or by city, I'd say to combine the whole address into a single field. This way you will support the different address formats of different countries.
You will also support address oddities.
If you fear that the requirements are going to change, you could store the address in an XML field. Modern databases such as SQL Server 2005 and 2008 can index a node inside an XML column as long as you are using a schema.
It all comes down to requirements. If the client needs to group the data inside a grid by country, then you need a country column.
Making fields required is always a tradeoff. If the person doesn't want to fill in the info then they won't -- they'll put in a period, or garbage to get past the "required field" nanny.
I only require street_address_1 in my apps. Also, for the US and many countries, you can buy the mapping between the postal/zip code and the canonical city/state. It's not expensive. (The mapping between individual street addresses and zip is much more expensive.)
For the US, see http://www.usps.com/ncsc/addressinfo/citystate.htm
If you're including an Ajax web interface, ask for the country first, then the post code. If in the US, then use Ajax to fetch and fill in the city/state for the user from the zip.
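The server side of that zip-to-city/state autofill could look like this Python sketch. The two-entry table is a stand-in built from examples elsewhere on this page; a real one would be loaded from the purchased city/state file, and the function name is hypothetical.

```python
# Stand-in lookup table; in practice, load it from the USPS city/state file.
ZIP_TO_CITY_STATE = {
    "12345": ("Schenectady", "NY"),
    "84201": ("Ogden", "UT"),
}


def autofill(country: str, postcode: str) -> dict:
    """Return city/state suggestions for a US ZIP, or an empty dict."""
    if country != "US":
        return {}
    zip5 = postcode.strip()[:5]  # also accepts ZIP+4 such as 84201-0027
    match = ZIP_TO_CITY_STATE.get(zip5)
    return {"city": match[0], "state": match[1]} if match else {}
```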
Some non-US countries, e.g. the UK, can have three lines of street address if you're asking people to fill in their "preferred address". E.g.:
Mirassou
High Street
Old Town
City, Bucks postal_code
(You can register a building's name with the post office as an alternative to its street number.)
Larry
Actually, city isn't even required in the US.
Many people have rural addresses on state and county roads. Publication 28 at the postal service web site has details. Different companies end up using the "city" field to store other information. This also applies to military base addresses.
Publication 28 link

Should we put units of measurements in attribute names? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
I think most of us agree that it's a good idea to use a descriptive name for variables, object attributes, and database columns. If you want to store something's name, you may as well call the attribute Name so people know what to put in it.
Where the unit of measurement isn't immediately apparent, I think you should go a step further and include the unit of measurement in the name. Length_mm, for example, should help remind developers that they'd better convert the length to mm if the user just entered it in inches.
My database administrator, however, just told me that including units of measurement in database column names is “frowned upon”. I think that's just nuts, but perhaps there's some risk DBAs know about that I don't.
Throw me a line, here: should we embed units of measurement in our attribute names? Why? Why not?
If you have a consistent UOM for things, then your DBA's policy is OK.
For example, if timespans are ALWAYS in minutes, etc.
If the UOM could change, then you should store it in another column, alongside the qty.
That said, I tend to side with you on this. Clarity trumps most things, including this. I'd rather see DurationMinutes than Duration and have to guess what the UOM is.
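When the UOM can vary, the quantity and its unit travel together, as in this Python sketch of a row with a separate uom column. The class name and the seconds-based conversion table are assumptions; the point is that callers never have to guess the unit.

```python
from dataclasses import dataclass

# Assumed conversion table: how many seconds one of each unit represents.
SECONDS_PER_UNIT = {"s": 1, "min": 60, "h": 3600}


@dataclass
class Duration:
    qty: float
    uom: str  # stored in its own column, alongside qty

    def in_minutes(self) -> float:
        """Convert to minutes regardless of the stored unit."""
        return self.qty * SECONDS_PER_UNIT[self.uom] / 60
```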
Yes. You should.
The key, as Charles Bretana pointed out, is legibility, and making sure that other users of your table, or the developers who follow you, know what you're using.
I would absolutely involve the units/measurement in a field name - in my business you can't guess what you'll find from the context or name: a field entitled MarketValue - is that in millions, thousands or units? US Dollars, Euros, pounds, $CURRENCY? Is that value a percentage, a ratio? Absolute or relative? Daily, monthly, calendar year, financial year? That timestamp, what time zone is it?
Your first, last and only task when providing data is to ensure that it isn't used incorrectly because the consumer wasn't able to find out enough about it. As developers, throwing "Metre", "USD", "GMT", "Percent" or whatever into a field name isn't the least bit smelly.
There are enormous smells that need resolving before the tiny whiff of field naming needs standardising.
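This kind of mix-up is why the Mars Climate Orbiter was lost in 1999: ground software reported thruster impulse in pound-force seconds while the navigation software expected newton-seconds, so the spacecraft's trajectory took it far too low into the Martian atmosphere.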
Although "Never say 'Never' or 'Always'" is, in general, a good rule of thumb, here I will bend my rule and say I think you should "always" make it clear what units a numeric value is in.
Naming all my columns in the format
{name}_in_{unit}
helped on one project: since I was using SI units, it let me infer each column's data type and generally simplified my writing style.
length_in_m
speed_in_ms-1
color_in_nm
There were a few exceptions, which I handled with either at_time or number_of:
started_at_time
updated_at_time
number_of_rotations
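Because the names follow a fixed pattern, a tool can read them back. This Python sketch parses {name}_in_{unit} and guesses a column type from the naming rules above; the type names it returns are an assumption, not something the answer specifies.

```python
import re
from typing import Optional

UNIT_COLUMN = re.compile(r"(?P<name>.+)_in_(?P<unit>.+)")


def split_unit(column: str) -> tuple[str, Optional[str]]:
    """Return (name, unit) for '{name}_in_{unit}' columns, else (column, None)."""
    match = UNIT_COLUMN.fullmatch(column)
    return (match["name"], match["unit"]) if match else (column, None)


def guess_type(column: str) -> str:
    """Guess a column type from the naming convention (assumed mapping)."""
    if column.endswith("_at_time"):
        return "TIMESTAMP"
    if column.startswith("number_of_"):
        return "INTEGER"
    if split_unit(column)[1] is not None:
        return "REAL"
    return "TEXT"
```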
I think this is a good idea anywhere since there is always room for ambiguity.
For example, with the high-performance timer class we use, I keep having to check whether the GetElapsed() method returns seconds, milliseconds, or something else. If it were called GetElapsedMilliseconds(), that would avoid the confusion.
The only downside is if you want to change your mind, but in that case any clients would need to know about the change anyway.
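A Python analogue of the renaming. This is not the timer class from the answer, just an illustration built on time.perf_counter(), which returns seconds: each method name says which unit it returns.

```python
import time


class Stopwatch:
    """Timer whose method names state the unit they return."""

    def __init__(self) -> None:
        self._start = time.perf_counter()  # perf_counter() is in seconds

    def elapsed_seconds(self) -> float:
        return time.perf_counter() - self._start

    def elapsed_milliseconds(self) -> float:
        return self.elapsed_seconds() * 1000.0
```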
F# has an interesting twist on this, allowing units of measurement to be specified in the type system. See this blog post and another Stack Overflow question, "Are units of measurement unique to F#?"
I've done a lot of database work, and I would not frown upon that at all, nor have I heard of frowning on it.
It's better than extended properties, which aren't apparent to the casual developer. It's better than a separate document, because many developers won't read it, and certainly not in great detail. If the units are fixed, then putting them in the name sounds like a good idea. If that changes, then when the unit column is added, rename the measurement column.
Where the unit of measurement isn't immediately apparent, I think you should go a step further and include the unit of measurement in the name. Length_mm, for example, should help remind developers that they'd better convert the length to mm if the user just entered it in inches.
You could go even a step further (in your code, not in the database) and have a Length type, which takes care of the measurement unit and of possible conversions. This is the approach of the "Quantity" pattern in Martin Fowler's "Analysis Patterns" book.
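A minimal Python sketch in the spirit of the Quantity pattern: a Length carries its amount together with its unit and converts on demand. The class shape and unit table are assumptions for illustration, not Fowler's code; the inch and foot factors are the exact definitions (25.4 mm and 304.8 mm).

```python
from __future__ import annotations

from dataclasses import dataclass

_MM_PER_UNIT = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4, "ft": 304.8}


@dataclass(frozen=True)
class Length:
    """A quantity: an amount paired with its unit, converting on demand."""
    amount: float
    unit: str

    def to(self, unit: str) -> Length:
        """Return the same length expressed in another unit."""
        mm = self.amount * _MM_PER_UNIT[self.unit]
        return Length(mm / _MM_PER_UNIT[unit], unit)

    def __add__(self, other: Length) -> Length:
        # Convert the right-hand side into our unit before adding.
        return Length(self.amount + other.to(self.unit).amount, self.unit)
```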
Do not put units of measurement (or column type) in your database column names.
Many databases have a way to document or comment columns (in SQL Server it is sp_addextendedproperty); I would suggest that is a more appropriate place.
For Python datetimes, consider using objects from the datetime module. Doing so captures the unit implicitly, to microsecond resolution, so there is no reason to include the unit in the variable name.
If you must use an int or float instead, it is strongly recommended to suffix the variable name with the unit abbreviation. For example, instead of the variable name diff, use diff_secs for seconds, diff_ms for milliseconds, diff_µs for microseconds, or diff_ns for nanoseconds.
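To illustrate both suggestions: subtracting two datetime objects yields a timedelta that carries its own unit, and converting it to a bare number is the point where a unit suffix earns its place. The timestamps are arbitrary.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1, 12, 0, 0)
end = datetime(2024, 1, 1, 12, 0, 1, 500000)

diff = end - start                          # timedelta: unit is implicit
diff_secs = diff.total_seconds()            # float: unit lives in the name
diff_ms = diff / timedelta(milliseconds=1)  # timedelta / timedelta -> float
```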
We don't put units of measurement in column names in our database. We do, however, have a data dictionary document where all of the columns and relationships are described.
The ideal approach is, if possible, to use a type that leaves no ambiguity as to the measurement. For example in .NET rather than saying int periodInSeconds you'd be much better off using TimeSpan period.
The F# language actually has units of measurement as part of the type system so you can declare types in units such as 10<m/s> and 5<s> and even perform calculations on them so something like 10<m/s> * 5<s> would result in 50<m>. See here for more info.
So I'd say if possible use a type that conveys your intention, but if that isn't possible then you should probably encode the measurement into the name. It's better and more obvious than a comment.
You definitely want units of measurement somewhere. I don't know whether the column names or the schema is the better place. Ask your database administrator:
Where is the information about units of measure stored?
How can I get access to the units programmatically?
If the answers are "it isn't" or "you can't", complain bitterly; they have no right to deny you your naming convention. Otherwise, everyone may be happier if you work within the system.
P.S. I really like the support for units of measure that they've put into F#.
I have to say, I hate "descriptive" variable names becoming "incredibly verbose" variable names.
My preferred alternative is to use nothing but the unit-of-measure names in short functions, e.g.:
function velocity(m, s) {
return m/s;
}
You don't need to say "length_m" because in this context, it's obvious that only lengths are measurable in metres.
Having said that, if I were writing a system where unit-of-measure errors were really dangerous, I'd probably use the type system and define a Length class that always converted itself into a standard unit for any calculation, maybe even with different subclasses for Feet, Metres, etc.
NO, the name of an attribute is separate from its unit of measurement.
If you call a variable length_mm, then you are tied to mm.
What if you use a 32-bit int to store length_mm? Eventually the length in mm may exceed 2,147,483,647, the limit of a signed 32-bit int. You can't switch over to m, because you tied your length variable to length_mm.
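For scale, the worked arithmetic below (plain Python) shows how long a length in millimetres can get before it overflows a signed 32-bit integer: roughly 2,147 km.

```python
INT32_MAX = 2**31 - 1                  # largest signed 32-bit integer
MM_PER_KM = 1_000_000

max_length_km = INT32_MAX / MM_PER_KM  # about 2147.48 km
```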
I think putting units in your identifiers is a huge design smell. It almost surely means that you chose the wrong language: if units are so important to the project, you'd better be using a language whose type system is capable of representing them.