What does the word "semantic" mean in a Computer Science context?

I keep coming across the use of this word and I never understand its use or the meaning being conveyed.
Phrases like...
"add semantics for those who read"
"HTML5 semantics"
"semantic web"
"semantically correctly way to..."
... confuse me and I'm not just referring to the web. Is the word just another way to say "grammar" or "syntax"?
Thanks!

Semantics are the meaning of various elements in the program (or whatever).
For example, let's look at this code:
int width, numberOfChildren;
Both of these variables are integers. From the compiler's point of view, they are exactly the same. However, judging by the names, one is the width of something, while the other is a count of some other things.
numberOfChildren = width;
Syntactically, this is 100% okay, since you can assign integers to each other. However, semantically, this is totally wrong, since the width and the number of children (probably) don't have any relationship. In this case, we'd say that this is semantically incorrect, even if the compiler permits it.
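
To see the same distinction end to end, here is a minimal, hypothetical Java sketch (the names are made up): the compiler catches the syntax error, but happily accepts the semantically wrong assignment.

public class SyntaxVsSemantics {
    public static void main(String[] args) {
        int width = 640;
        int numberOfChildren = 3;
        // int 3children = 0;        // syntax error: an identifier cannot start with a digit,
        //                           // so the compiler rejects this line outright
        numberOfChildren = width;    // compiles fine, yet semantically wrong:
                                     // a width is not a count of children
        System.out.println(numberOfChildren);
    }
}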

Syntax is structure. Semantics is meaning. Each different context will give a different shade of meaning to the term.
HTML5, for example, has new tags that are meant to give meaning to the data wrapped in them. The <aside> tag conveys that the content it contains is tangentially related to the information around it. See, it is meaning, not markup.
Take a look at this list of HTML5's new semantic tags. Contrast them with the older and more familiar HTML tags like <b>, <em>, <pre>, <h1>. Each of those affects the appearance of HTML content as rendered in a browser, but they can't tell us why. They convey no information about meaning.

The word 'semantic' as an adjective simply means 'meaningful', which is closely related to the term 'high level' in computer science.
For instance:
Semantic data model:
a data model that is semantic, that is, meaningful and understood by anyone regardless of their background or expertise.
C++ is less semantic than Java, because Java uses meaningful words for its classes, methods and fields.
HTML5 semantics: refers to the tags that describe themselves, such as <header>, <footer>, <article>, and so on.

It means "meaning", what you've got left when you've already accounted for syntax and grammar. For example, in C++ i++; is defined by the grammar as a valid statement, but says nothing about what it does. "Increment i by one" is semantics.
HTML5 semantics is what a well-formed HTML5 description is supposed to put on the page. "Semantic web" is, generally, a web where links and searches are on meaning, not words. The semantically correct way to do something is how to do it so it means the right thing.

It is not just Computer Science terminology, and if you ask,
What is the meaning behind this Computer Science lingo?
then I'm afraid we'll get into a recursive loop just like this.

In the HTML world, "semantic" is used to talk about the meaning of tags, rather than just considering how the output looks. For example, it's common to italicize foreign words, and it's also common to italicize emphasized words. You could simply wrap all foreign or emphasized words in <i>..</i> tags, but that only describes how they look; it doesn't describe why they look that way.
A better tag to use for an emphasized word is <em>..</em>, because it conveys the semantics of emphasis. The browser (or your stylesheet) can then render it in italics, and other consumers of the page will know the word is emphasized. For example, a screen reader could properly read it as an emphasized word.

From my view, it's almost like looking at syntax in a grammatical way. I can't speak to semantics in broad terms, but when people talk about semantics on the web, they are normally referring to the idea that if you stripped away all of the CSS and JavaScript etc., what was left (the bare-bones HTML) would still make sense when read.
It also means using the correct tags for the content. This stems from the old table-based layouts (tables should only be used for tabular data), and extends to using lists to present list-like content.
You wouldn't use an h1 for something that was less important than an h2. That wouldn't make sense.

The snippets below are syntactically different but semantically the same:
C, C++, C#, Java, JavaScript, Python, Ruby, etc.
x += y
Perl, PHP
$x += $y

Related

Are there conventional synonyms used to replace keywords reserved in programming languages?

The main examples of the words I mean are "object", "value", etc. In many cases (well, not really many, but on some occasions at least) you may find yourself wanting to name a variable this way.
Another example I have stumbled upon in my practice is "try", which is both a keyword used in exception handling (in many C-like and other languages) and the currency code of Turkey. But this example is just for fun; I doubt there are any common practices for this particular case (though I feel there may be for the previous one).
What do people do in such cases? What are some synonyms for "object", "value", etc. that are reasonable in a programming and data-modelling context?
For example, imagine you are developing an object database, where manipulating objects, properties and values (rather than documents, fields and... eh... values) is, for some reason, among the key ideas of its philosophy, and you really don't want to use words semantically too distant from these. What words would you use to replace the reserved ones while keeping the sense very close to theirs?
The easiest solution to come to my mind so far is to use misspelled (or differently spelled, borrowing another language's orthography) varieties of the same words, like "objekt", "walue", etc., but although this can do the job, it disgusts me so much that I really don't want to go this way.
UPDATE: Indeed, in some specific cases (particular languages) using a different case (which may sometimes go against the community's and/or the company's naming conventions) and/or namespaces (which were introduced almost exactly for this) may solve the problem at least partially, but I am still interested in alternatives, as I believe duplicating a system keyword is something one should at least consider avoiding (if there is a way to do so easily without compromises considered too serious) in every case.
I am even considering writing a script that would scrape GitHub to analyse the common naming vocabulary of code elements, but I think it is always a good idea to ask first rather than "reinvent the wheel"; perhaps somebody has done something like this already.
UPDATE 2: Please do me a favour and consider the following with an applicable degree of objectivity before voting to close. With all due respect, I would like to emphasise that the actual degree of subjectivity of this question is excusably low (though, I admit, somewhat above zero). Its only real flaw is that it might fit the English site better, but I believe the Stack Overflow audience is much more relevant (informed in a much more relevant way) to the context. The goal of publishing this question is to highlight a problem that is easy enough to understand and whose existence cannot be denied (though its importance may be questionable so far) but which is spoken of too little (as the importance of code clarity and semantic relevance grows, code as a medium is, IMHO, quickly moving towards greater cultural importance, in the broad sense of the word, than books), and to let people share the practical ways of addressing it that they know of.
Capitalization: Often, a different capitalization instead of a synonym does the trick, as most language keywords are case-sensitive. E.g. object = new Object();
Prefix / Postfix: Another often-encountered solution is to write myObject = new Object(). Which one you choose really depends on the naming conventions you follow. For private class fields, some developers use an underscore, e.g. this._object, to mark the field as private.
Specification: In most cases however, you can find a more specific word describing the role - such as instance, parent, child or argument - or the subclass - such as integer or n instead of a generic number datatype - of your object.
In addition to the above, many language communities follow de-facto conventions such as cls for Class, obj for Object, me or self for this etc.
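A small Java-flavored sketch of the three workarounds above (names are hypothetical; in Java the collision is with the class name Object rather than a keyword, which is exactly what makes the capitalization trick work):

public class NamingWorkarounds {
    private Object _object;                      // prefix: underscore marks the private field

    void capitalization() {
        Object object = new Object();            // same word as the type, different case
        _object = object;
    }

    void prefixing() {
        Object myObject = new Object();          // "my" prefix sidesteps the bare name
        _object = myObject;
    }

    void specification(Object parent, Object child) {
        Object argument = (child != null) ? child : parent;   // role-specific names instead of "object"
        _object = argument;
    }
}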

Is the <wbr> element semantic HTML? What about in a microdata context?

In short, this is bad web development and UX:
But solving it by using CSS3 word breaking (code & demo) can lead to an 'awkward whitespace' situation, and strange cut-offs — here's an example of both:
Maybe it's not such a big deal, and the UX perspective of it is here, but let's look at the semantics of one of the solutions:
You could ... use the <wbr> element to indicate an optional word
break opportunity. This will tell the browser to insert a line break
as necessary to flow onto a new line inside the container.
The first question: is using <wbr> semantic HTML? (Does it at least degrade gracefully?)
In either case, it seems that being un-semantic in the general sense is a small price to pay for good UX functionality.
However, the second question is about the big picture:
Are there any schema.org (microdata/RFDa) ramifications to consider when using <wbr> to split up an email address? Will it still be valid there?
The wbr element is defined in the HTML5 spec, so it's fine to use it. If it's used right (= according to the definition in the spec), you may also call it "semantic use".
I don't think there would be any problems in combination with microdata/RDFa. Usually you'd provide the URL in an attribute anyway, which of course can't contain wbr elements: foo<wbr>@example<wbr>.com.
For element content, I'd guess (didn't check, though) that microdata/RDFa parsers use the text content without markup, i.e. they understand what is markup and what is text; otherwise, e.g., a FOAF name would be <abbr>Dr.</abbr> Foo instead of Dr. Foo.
So you can bet that microdata/RDFa parsers know HTML ;), and therefore it shouldn't be a problem to use its elements.

Better identifier names? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
How can I train myself to give better variable and function names (any user-defined name in a program)?
Practice.
Always be thinking about it whenever you write or read code, including other people's code. Try to figure out what you would do differently in their code and talk to them about it, when possible (but don't harp on it, that would be a nuisance). Ask them questions about why they picked a certain name.
See how well your code reads without comments. If you ever need to comment on the basic purpose of something you named, consider whether it could have a better name.
The biggest thing is active mental participation: practice.
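As a made-up before/after illustrating the comment test above: if the comment is carrying the meaning, the name can usually absorb it.

public class NamingByComment {
    // A name that needs a comment to explain its basic purpose:
    int d;                     // elapsed time in days

    // The same information moved into the name; no comment needed:
    int elapsedTimeInDays;
}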
Thinking of names seems to be something that some people are extraordinarily bad at, and I'm not sure what the cure is. Back when I was an instructor working in commercial training, I'd often see situations like this:
Me: OK, now you need to create an integer variable to contain the value returned by getchar().
[Trainees start typing, and I wander round the training room. Most are doing fine, but one is sitting like a deer frozen in the headlights]
Me: What's the problem?
Him: I can't think of a name for the variable!
So, I'd give them a name for it, but I have a feeling that people with this problem are not going to go far in programming. Or perhaps the problem is they go too far...
This is a subjective question.
I strive to make my code align with the libraries (or, at least, the standard ones) so that the code has a uniform feel. I'd suggest: see how the standard library functions are named. Look for patterns. Learn what different naming conventions exist. See which one makes the most sense. E.g.: most Java code uses really long identifier names, camel casing, etc. C uses terse/short names. Then there is Hungarian notation, which was worth the trouble when editors weren't smart enough to provide you with type information. Nowadays, you probably don't need it.
Joel Spolsky wrote a helpful article on Hungarian notation a few years back. His key insight is this:
Let's try to come up with a coding convention that will ensure that if you ever make this mistake, the code will just look wrong. If wrong code, at least, looks wrong, then it has a fighting chance of getting caught by someone working on that code or reviewing that code.
He goes on to show how naming variables in a rigorous fashion can improve our code. The point being that avoiding bugs has a quicker and more obvious ROI than making our code more "maintainable".
Read some good code and imitate it. That's the universal way of learning anything; just replace "read" and "code" with appropriate words.
A good way to find expressive names is to start with a textual description of what your piece of software actually does.
Good candidates for function (method) names are verbs, for classes nouns.
If you do design first, one method is textual analysis.
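For example (hypothetical names), a class named with a noun and methods named with verb phrases:

class Invoice {                        // class name: a noun for the thing being modeled
    void calculateTotal() { }          // method names: verb phrases for the actions
    void sendToCustomer() { }
}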
(Even if you are only a team of 1) agree on a coding standard with your colleagues so you all use the same conventions for naming. (e.g. it is common to use properties for values that are returned quickly, but a GetXXX or CalculateXXX method for values that will take time to calculate. This convention gives the caller a much better idea about whether they need to cache the results, etc.) Try to use the same names for equivalent concepts (e.g. don't mix Array.Length and List.Count as Microsoft did in .NET!)
Try to read your code as if somebody else wrote it (i.e. forget everything you know and just read it). Does it make sense? Does it explain everything they need to know to understand it? (Probably not, because we all forget to describe the stuff we "know" or which is "obvious". Go back and clarify the naming and documentation so that someone else can pick up your code file and easily understand it)
Keep names short but descriptive. There is no point writing a whole sentence, but with auto-completion in most IDEs, there is also little point in abbreviating anything unless it's a very standard abbreviation.
Don't waste characters telling somebody that this string is a string (the common interpretation of hungarian notation). Use names that describe what something does, and how it is used. e.g. I use prefixes to indicate the usage (m=member, i=iterator/index, p=pointer, v=volatile, s=static, etc). This is important information when accessing the variable so it's a useful addition to the name. It also allows you to copy a line of code into an email and the receiver can understand exactly what all the variables are meant to do - the difference in usage between a static volatile and a parameter is usually very important.
Describe the contents of a variable or the purpose of a method in its name, avoiding technical terms unless you know that all the readers of your code will know what those terms mean. Use the simplest description you can think of - complex words and technical terms sound intelligent and impressive, but are much more open to misinterpretation (e.g. off the top of my head: Collation or SortOrder, Serialise or Save - though these are well known words in programming so are not very good cases).
Avoid vague and near-meaningless terms like "value" or "type". This is especially true of base class properties, because you end up with a "Type" in a derived class and you have no idea what kind of type it is. Use "JoystickType" or "VehicleType" and the meaning is immediately much clearer.
If you use a value with units, tell people what they are in the name (angleDegrees rather than angle). This simple trick will stop your spacecraft smashing into Mars.
For C#, C++, and C in Visual Studio, try using AtomineerUtils to add documentation comments to methods, classes, etc. This tool derives automatic documentation from your names, so the better your names are, the better the documentation is, and the less effort you need to put in to finish the documentation off.
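A small, hypothetical Java illustration of the units-in-the-name advice above:

class Spacecraft {
    private double headingAngleDegrees = 45.0;       // the unit is part of the name

    double headingAngleRadians() {
        return Math.toRadians(headingAngleDegrees);   // the conversion is explicit and hard to miss
    }
}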
Read "Code Complete" book, more specifically Chapter 11 about Naming. This is the checklist (from here, free registration required):
General Naming Considerations
Does the name fully and accurately describe what the variable represents?
Does the name refer to the real-world problem rather than to the programming-language solution?
Is the name long enough that you don't have to puzzle it out?
Are computed-value qualifiers, if any, at the end of the name?
Does the name use Count or Index instead of Num?
Naming Specific Kinds Of Data
Are loop index names meaningful (something other than i, j, or k if the loop is more than one or two lines long or is nested)?
Have all "temporary" variables been renamed to something more meaningful?
Are boolean variables named so that their meanings when they're True are clear?
Do enumerated-type names include a prefix or suffix that indicates the category (for example, Color_ for Color_Red, Color_Green, Color_Blue, and so on)?
Are named constants named for the abstract entities they represent rather than the numbers they refer to?
Naming Conventions
Does the convention distinguish among local, class, and global data?
Does the convention distinguish among type names, named constants, enumerated types, and variables?
Does the convention identify input-only parameters to routines in languages that don't enforce them?
Is the convention as compatible as possible with standard conventions for the language?
Are names formatted for readability?
Short Names
Does the code use long names (unless it's necessary to use short ones)?
Does the code avoid abbreviations that save only one character?
Are all words abbreviated consistently?
Are the names pronounceable?
Are names that could be mispronounced avoided?
Are short names documented in translation tables?
Common Naming Problems: Have You Avoided...
...names that are misleading?
...names with similar meanings?
...names that are different by only one or two characters?
...names that sound similar?
...names that use numerals?
...names intentionally misspelled to make them shorter?
...names that are commonly misspelled in English?
...names that conflict with standard library-routine names or with predefined variable names?
...totally arbitrary names?
...hard-to-read characters?
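A few of those checklist items applied in a short, hypothetical Java fragment:

class NamingChecklistExamples {
    static final int MAX_RETRY_ATTEMPTS = 3;   // named constant for the abstract entity, not "THREE"
    int customerCount;                         // "Count" rather than "Num"
    boolean isReportReady;                     // boolean whose meaning is clear when true

    void printRows(String[][] table) {
        for (int rowIndex = 0; rowIndex < table.length; rowIndex++) {   // meaningful loop index
            System.out.println(String.join(",", table[rowIndex]));
        }
    }
}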

Should primitive datatypes be capitalized?

If you were to invent a new language, do you think primitive datatypes should be capitalized, like Int, Float, Double, String to be consistent with standard class naming conventions? Why or why not?
By "primitive" I don't mean that they can't be (or behave like) objects. I guess I should have said "basic" datatypes.
If I were to invent a new language, it wouldn't have primitive data types, just wrapper objects. I've done enough wrapper-to-primitive-to-wrapper conversions in Java to last me the rest of my life.
As for capitalization? I'd go with case-sensitive first letter capitalized, partly because it's a convention that's ingrained in my brain, and partly to convey the fact that hey, these are objects too.
Case insensitivity leads to some crazy internationalization issues; think umlauts, tildes, etc. It makes the compiler harder to write and allows the programmer freedoms that don't result in better code. Seriously, if you think there are enough arguments over where to put braces in C... just watch.
As far as primitives looking like classes... only if you can subclass primitives. Don't assume everyone capitalizes class names; the C++ standard libraries do not.
Personally, I'd like a language that has, for example, two integer types:
int: Whatever integer type is fastest on the platform, and
int(bits): An integer with the given number of bits.
You can typedef whatever you need from that. Then maybe I could get a fixed(w,f) type (number of bits to left and right of decimal, respectively) and a float(m,e). And uint and ufixed for unsigned. (Anyone who wants an unsigned float can beg.) And standardize how bit fields are packed into structures. If the compiler can't handle a particular number of bits, it should say so and abort.
Why, yes, I program embedded systems and got sick of int and long changing size every couple years, how could you tell? ^_-
(Warning: MASSIVE post. If you want my final answer to this question, skip to the bottom section, where I answer it. If you do, and you think I'm spouting a load of bull, please read the rest before trying to argue with my "bull.")
If I were to make a programming language, here are a few caveats:
The type system would be more or less Perl 6 (but I totally came up with the idea first :P) - dynamically and weakly typed, with a stronger (I'm thinking Haskellian) type system that can be imposed on top of it.
There would be a minimal number of language keywords. Everything else would be reassignable first-class objects (types, functions, so on).
It will be a very high level language, like Perl / Python / Ruby / Haskell / Lisp / whatever is fashionable today. It will probably be interpreted, but I won't rule out compilation.
If any of those (rather important) design decisions don't apply to your ideal language (and they may very well not), then my following (apparently controversial) decision won't work for you. If you're not me, it may not work for you either. I think it fits my language, because it's my language. You should think about your language and how you want your language to be so that you, like Dennis Ritchie or Guido van Rossum or Larry Wall, can grow up to make bad design decisions and defend them in retrospect with good arguments.
Now then, I would still maintain that, in my language, identifiers would be case insensitive, and this would include variables, functions (which would be variables), types (which would also be variables, both built-in/primitive (which would be subclass-able) and user-defined), you name it.
To address issues as they come:
Naming consistency is the best argument I've seen, but I disagree. First off, allowing two different types called int and Int is ridiculous. The fact that Java has int and Integer is almost as ridiculous as the fact that neither of them allow arbitrary-precision. (Disclaimer: I've become a big fan of the word "ridiculous" lately.)
Normally I would be a fan of allowing people to shoot themselves in the foot with things like two different objects called int and Int if they want to, but here it's an issue of laziness, and of the old multiple-word-variable-name argument.
My personal take on the issue of underscore_case vs. MixedCase vs. camelCase is that they're all ugly and less readable, and if at all possible you should use only a single word. In an ideal world, all code would be stored in your source control in an agreed-upon format (the style that most of the team uses) and the team's dissenters would have hooks in their VCS to convert all checked-out code from that style to their style, and vice versa when checking back in, but we don't live in that world.
It bothers me for some reason when I have to continually write MixedCaseVariableOrClassNames a lot more than it bothers me to write underscore_separated_variable_or_class_names. Even TimeOfDay and time_of_day might be the same identifier because they're conceptually the same thing, but I'm a bit hesitant to make that leap, if only because it's an unusual rule (internal underscores are removed in variable names). On one hand, it could end the debate between the two styles, but on the other hand it could just annoy people.
So my final decision is based on two parts, which are both highly subjective:
If I make a name others must use that's likely to be exported to another namespace, I'll probably name it as simply and clearly as I can. I usually won't use many words, and I'll use as much lowercase as I can get away with. sizedint doesn't strike me as much better or worse than sized_int or SizedInt (which, as far as examples of camelCase go, looks particularly bad because of the dI IMHO), so I'd go with that. If you like camelCase (and many people do), you can use it. If you like underscores, you're out of luck, but if you really need to you can write sized_int = sizedint and go on with life.
If someone else wrote it, and wanted to use sized_int, I can live with that. If they wrote it and used SizedInt, I don't have to stick with their annoying-to-type camelCase and, in my code, can freely write it as sizedint.
Saying that consistency helps us remember what things mean is silly. Do you speak english or English? Both, because they're the same word, and you recognize them as the same word. I think e.e. cummings was on to something, and we probably shouldn't have different cases at all, but I can't exactly rewrite most human and computer languages out there on a whim. All I can do is say, "Why are you making such a fuss about case when it says the same thing either way?" and implement this attitude in my own language.
Throwaway variables in functions (i.e. Person person = /* something */) are a pretty good argument, but I disagree that people would do Person thePerson (or Person aPerson). I personally tend to just write Person p anyway.
I'm not much fond of capitalizing type names (or much of anything) in the first place, and if it's enough of a throwaway variable to declare it undescriptively as Person person, then you won't lose much information with Person p. And anyone who says "non-descriptive one-letter variable names are bad" shouldn't be using non-descriptive many-letter variable names either, like Person person.
Variables should follow sane scoping rules (like C and Perl, unlike Python - flame war starts here guys!), so conflicts in simple names used locally (like p) should never arise.
As to making the implementation barf if you use two variables with the same names differing only in case, that's a good idea, but no. If someone makes library X that defines the type XMLparser and someone else makes library Y that defines the type XMLParser, and I want to write an abstraction layer that provides the same interface for many XML parsers including the two types, I'm pretty boned. Even with namespaces, this still becomes prohibitively annoying to pull off.
Internationalization issues have been brought up. Distinguishing between capital and lowercase umlautted U's will be no easier in my interpreter/compiler (probably the former) than in my source code.
If a language has a string type (i.e. the language isn't C) and the string type supports Unicode (i.e. the language isn't Ruby - it's only a joke, don't crucify me), then the language already provides a way to convert Unicode strings to and from lowercase, like Perl's lc() function (sometimes) and Python's unicode.lower() method. This function must be built into the language somewhere and can handle Unicode.
Calling this function during an interpreter's compile-time rather than its runtime is simple. For a compiler it's only marginally harder, because you'll still have to implement this kind of functionality anyway, so including it in the compiler is no harder than including it in the runtime library. If you're writing the compiler in the language itself (and you should be), and the functionality is built into the language, you'll have no problems.
To answer your question, no. I don't think we should be capitalizing anything, period. It's annoying to type (to me) and allowing case differences creates (or allows) unnecessary confusion between capitalized and lowercased things, or camelCased and under_scored things, or other sets of semantically-distinct-but-conceptually-identical things. If the distinction is entirely semantic, let's not bother with it at all.

When to join name and when not to?

Most languages give guidelines to separate different words of a name by underscores (python, C etc.) or by camel-casing (Java). However the problem is when to consider the names as separate. The options are:
1) Do it at every instance when separate words from the English dictionary occur e.g. create_gui(), recv_msg(), createGui(), recvMsg() etc.
2) Use some intuition to decide when to do this and when not to, e.g. recvmsg() is OK, but it's better to have create_gui().
What is this intuition?
The question looks trivial, but it presents a problem which is common and costs at least 5 seconds every time it comes up.
I always do your option 1, and as far as I can tell, all modern frameworks do.
One thing that comes to mind that just sticks names together is the standard C library. But its function names are often pretty cryptic anyway.
I'm probably biased as an Objective-C programmer, where things tend to be quite spelled out, but I'd never have a method like recvMsg. It would be receiveMessage (and the first parameter should be of type Message; if it's a string, then it should be receiveString or possibly receiveMessageString depending on context). When you spell things out this way, I think the question tends to go away. You would never say receivemessage.
The only time I abbreviate is when the abbreviation is more clear than the full version. createGUI is good because "GUI" (gooey) is the common way we say it in English. createGraphicalUserInterface is actually more confusing, so should be avoided.
So to the original question, I believe #1 is best, but coupled with an opposition to unclear abbreviations.
One of the most foolish naming choices ever made in Unix was creat(), making a nonsense word to save one keystroke. Code is written once and read many times, so it should be biased towards ease of reading rather than writing.
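To put the two options side by side, a quick hypothetical Java sketch; the only abbreviation kept is one that is clearer than its expansion:

class MessageHandler {
    void receiveMessage(String message) { }   // option 1: spelled out, reads aloud naturally
    void recvMsg(String msg) { }              // option 2: abbreviated, saves little and reads worse
    void createGUI() { }                      // "GUI" is the rare abbreviation clearer than its expansion
}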
For me, and this is just me, I prefer to follow whatever is conventional for the language, thus camelCase for Java and C++, underscore for C and SQL.
But whatever you do, be consistent within any source file or project. The reader of your code will thank you; seeing an identifier that is inconsistent with most others makes the reader pause and ask "is something different going on with this identifier? Is there something here I should be noticing?"
Or in other words, follow the Principle of Least Surprise.
Edit: This got downmodded why??
Just follow the coding style for your language or project; such conventions are usually well described.
For example:
ClassNamesInCamelNotationWithFirstLetterCapitalized
classMethod()
classMember
CONSTANTS_IN_UPPERCASE_WITH_UNDERSCORE
local_variables_in_lowercase_with_underscores