i want to copy a data from a website which sells courses like ITIL, Prince2 and PMP and many other IT sector courses now there are 20,000 different courses's description is there.
However, i want to use selenium to scrape all the data but description is still subject to copyright.
Kindly let me know how i can manipulate all of that description to data to same meaning but different words.
Is there any API which can give me an access to build an code which will be helping these description data by using it's synonymous or which can change it's grammer to completely new sentennces but same meaning.
Kindly let me know where to start this.
Thanks,
The task you are referring to is called paraphrasing.
There is a lot of research on the field. In arXiv you fill find research papers on the topic. However, since you are asking for an API, I am assuming you don't want to implement these models by your self. Luckily, some authors have published their models online on GitHub. (Note: some are a re-implementation by someone else.)
When you use some of these implementations, note that most offer a pre-trained model. Do read which data set was used for training and try to pick the one that is the most similar to the data that you are facing. By doing so, more words in the domain of your descriptions will be available and more synonyms can be used.
I am trying to put a report from two different universes. is it possible to add fields from both universes into a single report?
Yes, this is a commonly used functionality. However there are rules on how to combine data sets (be it from the same or a different universe) and what you can do with them.
As this is a quite extensive topic, I suggest you have a look at the Web Intelligence (I'm assuming this is the application you're using, as you haven't specified it) manuals, corresponding to the version of BusinessObjects you're using. The manuals are available at help.sap.com.
You might also find the interactive tutorials available on sap.com/LearnBI helpful.
I can barely count the number of times I've created a "users" table, similar for "computers" and "customers". I've tried looking around, but haven't ever seen a resource for modeling these schema that we see over and over again. It seems like some of these objects should be some-kind-of-solved by now. Is there anything like this?
I have never seen anything like this either and I'm not sure it's necessary. Yes, there are a lot of similarities but every application is different. At one point I had built an internal library of some of my more "standard" tables (user is a good example) to use as a jumping point, but I have yet to create two identical tables for different systems.
Thus, I have yet to ever use the library I built because I can write the new table quicker and more error free than I can modify another existing example to work for the current project.
You could look at the source code of some popular open-source CRM/ERPs, such as OpenERP, though some of them are not great.
These are the top books on data modelling patterns:
Analysis Patterns, Fowler
Data Model Resource Book, vol. 1,2,3, Silverston
Enterprise Model Patterns, Hay
Patterns of Data Modeling, Blaha
I was searching the net for something like a wiki database, just like wikipedia but instead stores structured content, editable by users. What I was looking for was an online database accessible by everyone where people can design the schema and data with proper versioning of both schema and data. I couldn't find any such site. I am not sure if it is my search skills or if there really is no wiki database as of now. Does anyone out there know anything like this?
I think there is a great potential for something like this. A possible example will be a website with a GUI for querying a MySQL DB where any website visitor can create DB objects and populate data.
UPDATE: I had registered the domain wikidatabase.org to get started on a tool but I didn't find enough time yet. If anyone is interested in spending some time and coding on this, please let me know at wikidatabase.org
It's not quite what you're looking for, but Semantic Mediawiki adds database-like features to MediaWiki:
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
It's still fundamentally a Wiki, but you can add semantic tags to pages ([[foo::bar]] [[baz::1000]]) and then do database-type queries across them: SELECT baz FROM pages WHERE foo=bar would be {{#ask: [[foo::bar]] | ?baz}}. There is even an embryonic SPARQL implementation for pseudo-SQL queries.
OK this question is old, but Google led me here, so for anyone else out there looking for a wiki for structured data: Take a look at Foswiki.
This might be like what you're looking for: dbpedia.org. They're working on extracting data from Wikipedia, and encoding it in a structured format using RDF, so that it can be queried using SPARQL.
Linkeddata.org has a big list of RDF data sets.
Do you mean something like http://www.freebase.com?
You should check out https://www.wikidata.org/wiki/Wikidata:Main_Page which is a bit different but still may be of interest.
Something that might come close to your requirements is Google Docs.
What's offered is document editing roughly similar to MS Word, and spreadsheets roughly similar to Excel. I'm thinking of the latter, of course.
In Google Docs, You can create spreadsheets for free; being spreadsheets, they naturally have a row-and-column structure similar to a database, and which you can define flexibly. You can also share these sheets with other people. This seems to be a by-invite-only process rather than open-to-all, but there may be other possibilities I'm not aware of, or that level of sharing might be enough for you in any case.
mindtouch should be able to do it. It's rather easy to get data in / out. (for example: it's trivial to aggregate all the IP's for servers into one table).
I pretty much use it as a DB in the wiki itself (pages have tables, key/value..inheritance, templates, etc...) but you can also interface with the API, write dekiscript, grab the XML...
I like this idea. I have heard of some sites that are trying to pull together large datasets for various things for open consumption, but none that would allow a wiki feel.
You could start with something as simple as an installation of phpMyAdmin with a known password that would allow people to log in, create a database, edit data and query from any other site on the web.
It might suffer from more accuracy problems than wikipedia though.
OpenRecord, development of which seems to have halted in 2008, seems to approach this. It is a structured wiki in which pages are views on the data. Unlike RDBMSes it is loosely typed - the system tries to make a best guess about what data you entered, but defaults to text when it cannot guess. Schemas appear to have been implied.
http://openrecord.org
An example of the typing that is given is that of a date. If you enter '2008' in a record, the system interprets this as a date. If you enter 'unknown' however, the system allows that as well.
Perhaps you might be interested in Couch DB:
Apache CouchDB is a document-oriented
database that can be queried and
indexed in a MapReduce fashion using
JavaScript. CouchDB also offers
incremental replication with
bi-directional conflict detection and
resolution.
I'm working on an Open Source PHP / Symfony / PostgreSQL app that does this.
It allows multiple projects, each project can have multiple directories, each directory has a defined field structure. Admins set all this up.
Then members of the public can suggest new records, edit or report existing ones. All this is moderated and versioned.
It's early days yet but it basically works and is already in real world use in several projects.
Future plans already in progress include tools to help keep the data up to date, better searching/querying and field types that allow translations of content between languages.
There is more at http://www.directoki.org/
I'm surprised that nobody has mentioned Wikibase yet, which is the software that powers Wikidata.
At my work everyone has sql snippets that they use to answer questions. Some are specific to a customer, while some are generic for a given database. I want to consolidate those queries into a library/repository that can be accessed by anyone on the team. The requirements would be:
Accessible
Searchable
Tagable (multiple tags allowed per sql)
Exportable (create a document containing all queries with certain tags)
I'm interested in what has been found to work in other team environments.
You could use a wiki.
You could get started with something as simple as Tiddly wiki.
A wiki is a great approach.
For database specific or project specific snippets it's also very useful to have links to where a similar construct occurs in the code. We use trac's wiki which gives nice integration with out SVN for this.
Rather than pasting SQL snippets, I would consider graduating to an ORM (Object-Relational Mapper) or some other library to make representing and manipulating the data easier. It provides a layer of encapsulation to guard against schema changes and a layer of abstraction so you can think of the data in terms of business logic (ie. a user) rather than a collection of tables (ie. a user table, a password table, an access table...).
In Perl this would be something like DBIx::Class.
Another approach you may want to look at is creating views in your database. 'select * from some_view' can hide quite a bit of SQL. You'll still want to use a wiki to document them, but if its a view you don't have to worry about people keeping outdated copies.