OLAP datasets other than Adventureworks? - ssas

I'm looking to play around with various cube data to better understand it. I've also created a few of my own to play around with, but it's quite tedious to do, especially with saving the tables/views into the cube.
Other than Adventure Works, are there any data sets that I can download to play around with? I've been having a hard time finding any of these via Google, MSDN, etc. I'm open to whatever!

In this github link you will find 2 sample databases other than Adventure Works wich are
wide-world-importers.
contoso-data-warehouse.

Related

Best way to save SQL versions while working on Tableau?

I am working using Tableau and have to write down multiple different SQL each time, while making new data sources.
I have to save all changes on SQL for every data source.
Currently I would paste the SQL on notepad and save them on separate folder in my computer, along with description of the changes.
Is there any better way to do this?
Assuming you have permission to create objects in the database, begin by creating database views, As #Nick.McDermaid commented.
Then, instead of using Custom SQL data source in Tableau, just connect to the View as if it were a table.
If you need to track the changes to these SQL views of your data, you will need to learn how to use source control for the .sql files that can be scripted from within SQL Server Management Studio:
Your company or school may have a preferred source control system already in use, in which case you should use that. If they don't, or if you are learning at home, then Git and Subversion are popular open source choices.
There are many courses available on learning platforms like Coursera that will teach you how to learn how to use those systems.
I had similar problem as you.
We ended up writing the queries in SQL Editor SQL Work bench (https://www.sql-workbench.eu/), then managed the code history and performed code peer-review (logic, error check, etc) in team shared space (like confluence).
The reasons we did that is
1) SQL queries are much easy to write on Work Bench
2) Code review is a must! You will find through implementing a review process more mistakes than you could ever think about
3) The shared space is just really convenient as it is accessible by everyone, and all errors are documented. After sometimes you get a lot of visible knowledge accumulated.
I also totally agree with Nick as this is one step to a reporting solution. But developing a whole reporting server is heavy, costly and takes time. Unless management are really convinced of the importance of developing a reporting solution, you may have to get a workaround with queries and Tableau (at least that was the case for us)
A little late to the party, but I would suggest you simply version the tableau workbook. The contents of the workbook are XML, so perfect for versioning using file based tools (Dropbox, One Drive, etc.) or source control (git, etc.). The workbooks themselves are usually quite small, so just make sure to keep the extract data separate if you use it.

How to turn MongoDB collection into a 'Table'

I've been given access to a cloud MongoDB (MongoLab) and need to extract some data into Excel so I can analyse it. The data isn't particularly complicated or large and is well suited to a 'normal' relational structure.
My research suggests things are trickier because the data has 'nested' aspects although conceptually its pretty clear how this would become a table. Here is what a document in the collection looks like, essntinaly the stuff highlighted blue would be columns in the table while the yellow would create a row for each "marketing_event" with the specifics of each event also being in a column:
Ideally I would use Power Query to get the data into Power Pivot but at this point anything will do!
I've tried a bunch of things all of which haven't got me much closer to end result that I'm looking for:
I downloaded MongoVue which I used to successfully connect to the database and while it enabled me to see the data in a basic table form, it does nothing with the nested stuff and the documentation is minimal in terms of how it could be of more use.
I also tried Pentaho PDI based on this article:http://sqlmag.com/blog/integrating-mongodb-and-open-source-data-stores-power-pivot but the steps aren't detailed and although I can see the collection, trying to replicate some sample queries I found on the web were totally unsuccesful.
I've tried to get a trial of Simba's ODBC connector but as yet the download doesn't seem to be working. I have contacted them but without response just yet.
I've even installed Mongo locally and tried to use the command prompt to connect which I was unable to do. Even if I pursued this I wouldn't be confident about knowing where to start in terms of creating the end product.
Happy to hear any suggestions or recommendations.
TIA
Jacob
Here's a solid ODBC driver that helps maintain the fidelity of your mongoDB data by exposing the nested MongoDB data model as a set of relational tables to Excel and other ODBC apps. in the sample document above, this driver will do exactly what you're looking for. The embedded documents and arrays can be extracted as separate related tables from the fields at the root level of the document.
https://www.progress.com/odbc/mongodb
I don't know if you already found the solution - but Simba ODBC is providing support for nested arrays.
Have a look here:
https://www.simba.com/resources/webinars/connect-tableau-big-data-source. This is an example how to connect Tableau BI to MongoDB. You might find it helpful.
And some more information on handling no-sql data in BI tools is provided in this whitepaper: http://info.mongodb.com/rs/mongodb/images/MongoDB_BI_Analytics.pdf

Schema-agnostic OLAP-like tool?

Do there exist any (ideally free or open-source) tools for performing OLAP analyses on arbitrary tables in a relational database, without requiring any advance specification of dimensional hierarchies, cardinalities, or any other meta-information about the table beyond what can be extracted automatically from the table itself?
My inability to Google for anything like what I'm describing makes me suspect I'm using incorrect terminology and what I'm searching for isn't properly considered to be OLAP. If this is the case, what I specifically want is anything that would let technically unsophisticated users create cross-tab or contingency table aggregations using tables in a relational DB without needing to write elaborate SQL queries.
Or, in other words, I'd like something that mimics Excel's PivotTables on a larger scale. I appreciate that Excel does indeed generate extensive caches behind the scenes when you make a PivotTable, but it does this without the user having to explain to it which caches need creating. This is the functionality I'm trying to find elsewhere, if it exists.
The best options I know of are Excel and Access, but of course they are not open source. This space kinda got trampled in the explosion of interest in what is now called Business Intelligence and a lot of companies got bought by MS and others. It's pretty thin now as far as I can tell. I'll watch this thread though.
The most useful paradigm to attach to is I think spreadsheets and there's not much competition there any more. Google Docs spreadsheets can import csv etc. exported from databases, and there's a pivot chart available, but not much more.
The other place I've seen OLAP capabilities is in the Adobe Flex libraries to build on with ActionScript if you have any inclination in that direction. As usual, Adobe manages to get it about 90% right but doesn't quite provide a whole product.
icCube aims to setup an OLAP cube as simply as possible. It is not schema-agnostic, but I guess this is quite simple to define dimensions and facts from existing DB tables. Nevertheless, this could be not so "simple" depending on your tables - difficult to say without knowledge about them. I guess there's no generic easy solution ;-)
Then you can use Excel pivot table (amongst others) to access the cubes. Note as far as I know Excel does not do any caching neither aggregation when connecting to a cube. Indeed, it is generating all the required MDX requests to the cube.
Hope that helps.

Is This a “Large” GraphViz Chart, and How to Fix it?

I started using GraphViz yesterday in order to visualize the relationships between some things—a project I’ve been wanting to tackle for quite a while now.
So far, I’ve gotten it pretty well done, but there’s a few things I’m struggling with, specifically getting the chart to look good and how long it takes to process the DOT file—thank goodness I decided to use a computer ot do it instead of doing it by hand on paper!
It seems to me that the reason my graph looks so cluttered and why it takes so long is because there are so many entities and so many relationships in it (including many cyclical ones).
I’ve read of “large” graphs several times in the documentation and other resources on the Internet, but nothing that specifically indicates what is considered large, so I’m wondering if mine counts as large.
Simply put, I’ve got two groups of nodes (A and B), with relationships between items in A to items in B. Currently there are:
  123 entries in group A
  55 entries in group B
  278 relationships
  ? cyclical relationships
Plus, I’m also considering adding relationships between items in group A.
So does this count as “large”? I was under the impression that “large” was thousands of items/links. (Depending on some minor changes, the resulting image is about 8444x7463 pixels. I’ve included a—relatively huge—thumbnail.)
If it is large, are there any tips on making a messy, convoluted graph look good?
If it is not large, then why am I having trouble with it (I am currently using non-overlapped and neato)?
Thanks a lot.
alt text http://img340.imageshack.us/img340/5493/mygraph.png
I suggest you give Gephi a try, as I have found it easier to use than Graphviz.

Wiki Database, is there one?

I was searching the net for something like a wiki database, just like wikipedia but instead stores structured content, editable by users. What I was looking for was an online database accessible by everyone where people can design the schema and data with proper versioning of both schema and data. I couldn't find any such site. I am not sure if it is my search skills or if there really is no wiki database as of now. Does anyone out there know anything like this?
I think there is a great potential for something like this. A possible example will be a website with a GUI for querying a MySQL DB where any website visitor can create DB objects and populate data.
UPDATE: I had registered the domain wikidatabase.org to get started on a tool but I didn't find enough time yet. If anyone is interested in spending some time and coding on this, please let me know at wikidatabase.org
It's not quite what you're looking for, but Semantic Mediawiki adds database-like features to MediaWiki:
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
It's still fundamentally a Wiki, but you can add semantic tags to pages ([[foo::bar]] [[baz::1000]]) and then do database-type queries across them: SELECT baz FROM pages WHERE foo=bar would be {{#ask: [[foo::bar]] | ?baz}}. There is even an embryonic SPARQL implementation for pseudo-SQL queries.
OK this question is old, but Google led me here, so for anyone else out there looking for a wiki for structured data: Take a look at Foswiki.
This might be like what you're looking for: dbpedia.org. They're working on extracting data from Wikipedia, and encoding it in a structured format using RDF, so that it can be queried using SPARQL.
Linkeddata.org has a big list of RDF data sets.
Do you mean something like http://www.freebase.com?
You should check out https://www.wikidata.org/wiki/Wikidata:Main_Page which is a bit different but still may be of interest.
Something that might come close to your requirements is Google Docs.
What's offered is document editing roughly similar to MS Word, and spreadsheets roughly similar to Excel. I'm thinking of the latter, of course.
In Google Docs, You can create spreadsheets for free; being spreadsheets, they naturally have a row-and-column structure similar to a database, and which you can define flexibly. You can also share these sheets with other people. This seems to be a by-invite-only process rather than open-to-all, but there may be other possibilities I'm not aware of, or that level of sharing might be enough for you in any case.
mindtouch should be able to do it. It's rather easy to get data in / out. (for example: it's trivial to aggregate all the IP's for servers into one table).
I pretty much use it as a DB in the wiki itself (pages have tables, key/value..inheritance, templates, etc...) but you can also interface with the API, write dekiscript, grab the XML...
I like this idea. I have heard of some sites that are trying to pull together large datasets for various things for open consumption, but none that would allow a wiki feel.
You could start with something as simple as an installation of phpMyAdmin with a known password that would allow people to log in, create a database, edit data and query from any other site on the web.
It might suffer from more accuracy problems than wikipedia though.
OpenRecord, development of which seems to have halted in 2008, seems to approach this. It is a structured wiki in which pages are views on the data. Unlike RDBMSes it is loosely typed - the system tries to make a best guess about what data you entered, but defaults to text when it cannot guess. Schemas appear to have been implied.
http://openrecord.org
An example of the typing that is given is that of a date. If you enter '2008' in a record, the system interprets this as a date. If you enter 'unknown' however, the system allows that as well.
Perhaps you might be interested in Couch DB:
Apache CouchDB is a document-oriented
database that can be queried and
indexed in a MapReduce fashion using
JavaScript. CouchDB also offers
incremental replication with
bi-directional conflict detection and
resolution.
I'm working on an Open Source PHP / Symfony / PostgreSQL app that does this.
It allows multiple projects, each project can have multiple directories, each directory has a defined field structure. Admins set all this up.
Then members of the public can suggest new records, edit or report existing ones. All this is moderated and versioned.
It's early days yet but it basically works and is already in real world use in several projects.
Future plans already in progress include tools to help keep the data up to date, better searching/querying and field types that allow translations of content between languages.
There is more at http://www.directoki.org/
I'm surprised that nobody has mentioned Wikibase yet, which is the software that powers Wikidata.