Would server traffic software (something like Piwik or Google Analytics) make a good case for using NoSQL?

We are trying to develop company-specific tracking software and are not interested in Google or Piwik. Essentially we would also have a JavaScript tracking code. Would the data it captures be best suited to a traditional RDBMS, or could we use a NoSQL solution?
Any thoughts or ideas welcome.

Creating XML files could do the trick for a NoSQL solution, but web analytics can encompass a very large amount of data, depending on your tracking software. You'll need some sort of relational data solution if you want to properly analyse the data and see trends, such as how many unique visitors are using a specific browser.
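To make that concrete, here is a minimal sketch (Python with SQLite, purely for illustration; the `hits` table and its columns are made up) of the kind of aggregate that is awkward to compute over a pile of flat XML files but trivial in a relational store:

```python
import sqlite3

# Hypothetical schema, purely for illustration: one row per tracked hit.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hits (
        visitor_id TEXT,
        browser    TEXT,
        url        TEXT,
        hit_time   TEXT
    )
""")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", [
    ("v1", "Firefox", "/home",  "2015-01-01T10:00"),
    ("v1", "Firefox", "/about", "2015-01-01T10:05"),
    ("v2", "Chrome",  "/home",  "2015-01-01T11:00"),
    ("v3", "Firefox", "/home",  "2015-01-01T12:00"),
])

# "How many unique visitors use each browser?" -- a one-statement
# GROUP BY in SQL, versus re-parsing every file in a flat-file setup.
rows = conn.execute("""
    SELECT browser, COUNT(DISTINCT visitor_id) AS unique_visitors
    FROM hits
    GROUP BY browser
    ORDER BY browser
""").fetchall()
print(rows)  # [('Chrome', 1), ('Firefox', 2)]
```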


Better test reporting

I'm looking for some help designing a better summary report. Right now we publish and send everything (execution % by module, defects, etc.) in an Excel file, and I was hoping we could use that Excel data to generate a live dashboard accessible via a URL.
To add, the execution data comes from QTest and the defects from JIRA. At this point we are even OK with filling in the data in Excel manually and using that as the source for any reporting tool.
If a free tool is available, even better.
Any leads, help, or feedback is appreciated.
Thanks,
MD
Sounds like you need Microsoft's Power BI. We've done a lot of reporting from JIRA using this free tool (Desktop). If you need to share it with others in "real time", you'll prefer the online experience at about $10/user/month. But if you're looking to stay free, you can simply share the Power BI file with your stakeholders.
I recommend AGAINST using the built-in JIRA app; it seems to want to pull back all your issues. Instead, use a REST API call like this:
https://domain/rest/api/2/search?jql=filter=22605&fields=id,key,summary,description
If you get more issues back than your Issue Search is configured for, the pagination can be a little tricky. Also, multiple values in a custom field need special handling.
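As a sketch of how that pagination can be handled (Python, standard library only; the host is a placeholder and the filter id is the one from the URL above), the loop just advances `startAt` until all `total` issues have been collected:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical JIRA host; the filter id comes from the URL above.
BASE = "https://jira.example.com/rest/api/2/search"

def fetch_page(start_at, max_results=100):
    """Fetch one page of the saved filter's issues from the JIRA search API."""
    query = urllib.parse.urlencode({
        "jql": "filter=22605",
        "fields": "id,key,summary,description",
        "startAt": start_at,
        "maxResults": max_results,
    })
    with urllib.request.urlopen(BASE + "?" + query) as resp:
        return json.load(resp)

def collect_issues(get_page=fetch_page):
    """JIRA caps each response at maxResults, so keep advancing
    startAt until all `total` issues have been seen."""
    issues, start_at = [], 0
    while True:
        page = get_page(start_at)
        issues.extend(page["issues"])
        start_at += len(page["issues"])
        if not page["issues"] or start_at >= page["total"]:
            break
    return issues
```

Keeping the paging loop separate from the HTTP call also makes it easy to test against canned responses.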
Or, if you're on premise and know your JIRA DB, direct SQL is an efficient way to go.
We use both mechanisms (REST and SQL). SQL lets us add logic in the view of the data that JIRA itself doesn't report on easily (parent-child-subchild relationships, roll-ups of effort, story points, etc.).
The best part of the Power BI solution is that you should be able to integrate the data from JIRA and your test tool. (We pull from JIRA and our time-tracking system.)

Solution to host 200GB of data and provide JSON API with aggregates?

I am looking for a solution that will host a nearly-static 200GB, structured, clean dataset, and provide a JSON API onto the data, for querying in a web app.
Each row of my data looks like this, and I have about 700 million rows:
parent_org,org,spend,count,product_code,product_name,date
A31,A81001,1003223.2,14,QX0081,Rosiflora,2014-01-01
The data is almost completely static - it updates once a month. I would like to support straightforward aggregate queries like:
get total spending on product codes starting QX, by organisation, by month
get total spending by parent org A31, by month
And I would like these queries to be available over a RESTful JSON API, so that I can use the data in a web application.
I don't need to do joins, I only have one table.
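For illustration, the first aggregate above is a single GROUP BY. This Python/SQLite sketch uses the sample row from the question plus two made-up rows (the extra product names and amounts are invented) just to show the shape of the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE spending (
        parent_org TEXT, org TEXT, spend REAL, count INTEGER,
        product_code TEXT, product_name TEXT, date TEXT
    )
""")
# First row is the sample from the question; the other two are invented.
conn.executemany("INSERT INTO spending VALUES (?,?,?,?,?,?,?)", [
    ("A31", "A81001", 1003223.2, 14, "QX0081", "Rosiflora", "2014-01-01"),
    ("A31", "A81001", 1500.0,     2, "QX0090", "Examplin",  "2014-01-15"),
    ("A31", "A81002", 2000.0,     3, "ZZ0001", "Otherine",  "2014-01-20"),
])

# Total spending on product codes starting "QX", by organisation, by month.
rows = conn.execute("""
    SELECT org, substr(date, 1, 7) AS month, SUM(spend) AS total
    FROM spending
    WHERE product_code LIKE 'QX%'
    GROUP BY org, month
    ORDER BY org, month
""").fetchall()
print(rows)
```

The same statement runs essentially unchanged on Postgres, Cloud SQL, or BigQuery; the question is which backend executes it fast enough at 700 million rows.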
Solutions I have investigated:
To date I have been using Postgres (with a web app to provide the API), but I am starting to reach the limits of what I can do with indexing and materialized views without dedicated hardware and more skills than I have.
Google Cloud Datastore: suitable for structured data of about this size and has a baked-in JSON API, but doesn't do aggregates (so I couldn't support my "total spending" queries above).
Google BigTable: can definitely handle data of this size and can do aggregates; I could build my own API using App Engine. I might need to convert the data to HBase to import it.
Google BigQuery: fast at aggregating; I would need to roll my own API, as with BigTable, but importing data is easy.
I'm wondering if there's a generic solution for my needs above. If not, I'd also be grateful for any advice on the best setup for hosting this data and providing a JSON API.
Update: It seems that BigQuery and Cloud SQL support SQL-like queries, but Cloud SQL may not be big enough (see comments), and BigQuery gets expensive very quickly because you pay per query, so it isn't ideal for a public web app. Datastore is good value but doesn't do aggregates, so I'd have to pre-aggregate and keep multiple tables.
Cloud SQL is likely sufficient for your needs. It is certainly capable of handling 200 GB, especially if you use Cloud SQL Second Generation.
The only reason a conventional database like MySQL (the database Cloud SQL uses) might not be sufficient is if your queries are very complex and not indexed. I recommend you try Cloud SQL, and if the performance isn't sufficient, ensure you have the right indexes (hint: use the EXPLAIN statement to see how the queries are being executed).
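To illustrate the EXPLAIN idea (using SQLite's `EXPLAIN QUERY PLAN` from Python as a stand-in for MySQL's `EXPLAIN`; the table and index names are made up), note how the reported plan changes once an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (org TEXT, product_code TEXT, spend REAL)")

# Without an index, the planner falls back to a full table scan.
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(spend) FROM spending WHERE org = 'A81001'"
).fetchall()

conn.execute("CREATE INDEX idx_org ON spending (org)")

# With the index, the planner searches it instead of scanning the table.
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(spend) FROM spending WHERE org = 'A81001'"
).fetchall()

print(before[0][3])  # e.g. "SCAN spending"
print(after[0][3])   # e.g. "SEARCH spending USING INDEX idx_org (org=?)"
```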
If your queries cannot be indexed in a useful way, or your queries are so CPU-intensive that they are slow regardless of indexing, you might want to graduate up to BigQuery. BigQuery is parallelised so it can handle pretty much as much data as you throw at it; however, it isn't optimized for real-time use and isn't as convenient as Cloud SQL's "MySQL in a box".
Take a look at ElasticSearch. It's JSON, REST, cloud, distributed, quick on aggregate queries and so on. It may or may not be what you're looking for.

Xcode iOS phone directory app: Core Data, SQLite, or…

As part of an application I am trying to create, I am looking into data storage solutions. However, I have found many solutions that I cannot quite apply directly to my situation.
Basically, I want to display in my app a directory of the staff of my organization, about 100 or so individuals, with generic attributes such as name, email, office number, etc.
However, my goal is not to end up with a static representation of the staff (people come and go, switch offices, etc.)!
I am looking for the best way (if possible) to maintain a small database that I can administer, and if perhaps, something were to change to someone here, I can make the change and the change will be reflected accordingly.
Please help! I tried submitting my first app but got rejected because I relied on a webview to accomplish this task. This is an internship opportunity and my first real chance at development. Any help will be GREATLY appreciated.
Thanks!!!!!
The app's documents directory can be used to store data in any format you want (XML, JSON, or a proprietary format), because all you do is save a file. But if you choose to store the data as a plain file, you have to write code to read it (very simple to do) and parse the information (not so simple, because the difficulty scales with the complexity of the information).
SQLite is a tool for storing structured data, and it provides a set of tools to access and use the information. You don't need to parse the information yourself, because SQLite does it for you via SQL queries.
For now, because you have a list of individuals, and these people are related to offices, I think you should use SQLite.
Core Data is an object graph management framework; it gives you more options for data manipulation and can make your life very easy if you have a lot of data and very complex data models. I don't think you need it for your particular problem, but you should learn it at some point.
UPDATE 1
Your application will have something like:
A core database (SQL Server, Oracle, MySQL, etc.) that holds your individuals' information (your cloud database).
A web page (PHP, ASP.NET, etc.) that exposes the core database information in JSON or XML format (your API).
An iPhone app that downloads the information from the web page and stores it in a local SQLite database. You have to decide when to update the local copy: when the app is opened, once a week, once a month, twice a day, etc. (your local storage method).
Display the local SQLite individuals' information in the app.
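A rough sketch of the download-and-cache steps in Python (the API URL and the staff schema are hypothetical; on iOS this logic would live in the app itself, but the flow is the same):

```python
import json
import sqlite3
import urllib.request

API_URL = "https://example.org/staff.json"  # hypothetical company endpoint

def download_staff(url=API_URL):
    """Fetch the staff list published by the web API as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def refresh_local_copy(conn, staff):
    """Replace the cached directory with the freshly downloaded one,
    so edits made in the cloud database show up after the next sync."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS staff (
            email TEXT PRIMARY KEY, name TEXT, office TEXT
        )
    """)
    conn.execute("DELETE FROM staff")
    conn.executemany(
        "INSERT INTO staff (email, name, office) VALUES (:email, :name, :office)",
        staff,
    )
    conn.commit()
```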

google-bigquery

I am using BigQuery for SEO reasons. I am a search TC and I am a little confused about why you are not using the Google Forum, as I thought that was standard. What I want to use BigQuery for is to find out when my competitors change data on their website and which pages were changed. So I need the URL that was changed and the date it was changed, so I can also pull the page title and description to see what they are doing differently than I am.
Is there anyone that knows how to use BigQuery to pull:
Date the page was changed
URL
Title
Description
We've switched to using Stack Overflow for support for many of our developer products, such as BigQuery. There's a great community here on StackOverflow, and the interface for formatting technical questions and interacting with the community is fantastic.
BigQuery does not collect the data for you-- it's a cloud service for performing ad hoc queries on massive datasets. Before performing queries, you need to upload the data to the service (in CSV format).
So, if you have a job that collects this data (URL, title, description, date, and perhaps a hash of the webpage), you could ingest a CSV file of it into BigQuery and use it to understand when webpages have changed.
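A minimal sketch of such a collection job in Python (the column layout and the SHA-256 choice are assumptions, not anything BigQuery requires); comparing hashes across crawl dates is what reveals a changed page:

```python
import csv
import hashlib
import io
from datetime import date

def page_row(url, title, description, html, when=None):
    """One CSV row per crawled page; the content hash lets a later
    query spot pages whose hash differs between crawl dates."""
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    return [url, title, description, str(when or date.today()), digest]

def to_csv(rows):
    """Render the collected rows as CSV text ready for ingestion."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "title", "description", "crawl_date", "content_hash"])
    writer.writerows(rows)
    return buf.getvalue()
```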
Of course, there are also 3rd-party services (such as Changedetection.com) which may be easier to use for your purposes.

What is the fastest way for me to take a query and turn it into a refreshable graph of the results set?

I often find myself writing one-off queries to either answer someone's question or troubleshoot something, and I would like to be able to quickly expose the on-demand, refreshable results of the query graphically, so that I can share them with others without having to go through the process of creating an SSRS report and publishing it to a Reporting Services server.
I have thought about using Excel to do this, or maybe running a local SSRS server, but both of these options are labor-intensive, and I cannot justify the time they would take since no one has officially requested that I turn this data into a report.
The way I see it, the business I work for has invested money in my writing these queries, which often return potentially useful data that other people in the organization might want. But since the results aren't exposed in any way, and people may not even realize they want this data, the potential value of the queries is not realized. I want to increase the company's return on investment on all these one-off queries that I and other developers write by exposing their results graphically, so that they can be browsed by others and potentially turned into more formalized SSRS reports if they provide enough value to justify the development.
What is the fastest way for me to take a query and turn it into a refreshable graph of the results set?
Why don't you simply use what you may already have: Excel. You can import data via an ODBC / Oracle / SQL connection ("Get Data"), and bam, you can run the query and format it right in the spreadsheet, with sorting and so on. All you need to supply is the database name and the username and password to connect to the DB.
JonH is right regarding Excel's built-in ODBC support, but I have had tons of trouble with it. In my case, the ODBC connection required the client software to be installed so that it could use the encryption methods, etc. Also, even if that were not the case, the user (I believe) would still have to manually install and set up an ODBC connection.
Now, if you just want something on your machine to run the queries and refresh them, JonH's solution is great and my caveats are probably irrelevant. But if you want other users to have access, you should consider a middle-man app (basically a PHP script, assuming a web server is an option for you) that runs a query, transforms the results into XML, and outputs it as "report-xyz.xml". You can then point anybody running a newer version of Excel to that address, and they can very easily import the data into Excel with no overhead (basically a kind of web service).
Keep in mind, I don't think you should have a web script that allows users to run arbitrary queries against your database server! You would have an admin page where you pass the query in, and a new XML file with the results gets generated. So my idea assumes you want to run the same queries over and over without any parameters passed in. (If that is not the case, I'd look into finding a pre-built web-services bridge for your database that already has security features built in. Then you could let users make the limited changes allowed.)
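As a sketch of that middle-man idea in Python rather than PHP (the query, table, and tag names are invented, and an in-memory SQLite database stands in for the real server), the script runs a fixed query and renders the result set as XML:

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn, sql, root_tag="report", row_tag="row"):
    """Run a fixed, admin-chosen query and render the result set as
    XML that Excel can import directly as external data."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    root = ET.Element(root_tag)
    for values in cursor.fetchall():
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, values):
            ET.SubElement(row, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Example with an in-memory table standing in for the real database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("East", 10.0), ("West", 20.0)])
xml_text = query_to_xml(
    conn, "SELECT region, SUM(total) AS total FROM orders GROUP BY region")
print(xml_text)
```

The output file would then be written to a web-served path (e.g. "report-xyz.xml") on a schedule, so users never touch the database directly.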