How to access event data for LHCb calorimeters?

I want to do a project using machine learning on the calorimeter event data of the LHCb. How can I access this data? Is it very difficult to navigate your way through the source code on your own?

As far as gaining access to the LHCb calorimeter data goes, I am not sure whether it is publicly available.
However, I can tell you that once you have the dataset, you will need some data science tooling to prepare it for machine learning. Hadoop is a common stack for big data, but how to set up and use Hadoop is WAY outside the scope of this post.
http://hadoop.apache.org/

Related

What is a good stack for building a web application with analytics? Currently using MongoDB but thinking about changing

As my question states, I am building an application that needs to display analytics for the user: how many units sold, product price breakdowns, etc.
My initial thought was to use MongoDB to test it out, with Mongoose as the ORM. I knew I would need some relational functionality, which MongoDB handles well enough, so from a dev perspective I'm still happy with MongoDB. But whenever I start thinking about building charts, data queries, and analytics, things get a little muddy, since I'm not entirely sure how to go about it.
Right now, whenever I need to build a chart, I run a bunch of loops over the data to format the relevant information for the front end. However, this is probably inefficient, and it is additional server logic that could be handled differently with, say, a SQL database.
Given that context, my question is: is MongoDB not a good fit for what I want to do? If so, what would you recommend? Or is my inexperience with Mongo making me consider changing technologies without looking at alternatives? If so, what should I be educating myself on?
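One common way to avoid looping over documents in application code is to push the chart calculations into MongoDB itself via its aggregation framework. The sketch below is a minimal, hypothetical example using pymongo; the database, collection, and field names (shop, orders, product, quantity, price) are placeholders invented for illustration, not taken from the question.

```python
from pymongo import MongoClient

# Hypothetical connection and collection names, for illustration only.
client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Group order documents by product and compute units sold and revenue
# server-side, instead of looping over raw documents in the app layer.
pipeline = [
    {"$group": {
        "_id": "$product",
        "units_sold": {"$sum": "$quantity"},
        "revenue": {"$sum": {"$multiply": ["$quantity", "$price"]}},
    }},
    {"$sort": {"units_sold": -1}},
]

for row in orders.aggregate(pipeline):
    # Each row is already shaped for the front-end chart.
    print(row["_id"], row["units_sold"], row["revenue"])
```

Whether this is enough depends on the queries you need; if most of your analytics are naturally relational joins, a SQL database may still be the simpler fit.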

Steady Flow of Data to SQLite

I need to write a piece of software which collects signals from different bus systems (e.g. CAN bus, Ethernet) and then saves the data in a data store.
Another piece of software then uses the data in the database to operate.
So basically I need constant write access on one side and constant read access on the other side.
I thought about using SQLite because I have had good experience with it and because of its ease of use. But I have also run into problems with two processes using the database file simultaneously.
So my question is: is SQLite still the right tool for a use case like this, or are there alternatives, like a full SQL server? Or some completely different approach? I can't think of a practical one.
Thanks in advance!
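For what it's worth, SQLite's write-ahead logging (WAL) mode is designed for exactly this situation: one process writing while other processes read the same file concurrently. The sketch below is a minimal illustration using Python's built-in sqlite3 module; the file, table, and column names (signals.db, signals, bus, value) are made up for the example.

```python
import sqlite3

DB_PATH = "signals.db"  # hypothetical file name

# --- Writer process ---------------------------------------------------
writer = sqlite3.connect(DB_PATH)
writer.execute("PRAGMA journal_mode=WAL")   # allow readers while writing
writer.execute(
    "CREATE TABLE IF NOT EXISTS signals (ts REAL, bus TEXT, value REAL)"
)
writer.execute(
    "INSERT INTO signals (ts, bus, value) VALUES (?, ?, ?)",
    (1700000000.0, "CAN", 42.0),
)
writer.commit()

# --- Reader process (would normally run in a separate program) --------
reader = sqlite3.connect(DB_PATH)
reader.execute("PRAGMA journal_mode=WAL")
for ts, bus, value in reader.execute(
    "SELECT ts, bus, value FROM signals ORDER BY ts DESC LIMIT 10"
):
    print(ts, bus, value)
```

With WAL enabled, readers do not block the writer and the writer does not block readers, though there is still only one writer at a time. If you ever need many concurrent writers, a client/server database such as PostgreSQL is the more common choice.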

Google Dataflow Sharing Resources Between Windows

I am currently building a Google Dataflow pipeline that writes to multiple BigQuery tables at run time. The problem I am facing is that I need to reuse resources like the BigQuery service instance, table info, etc. (I do not want to re-create them every time), but I am not able to cache them efficiently.
Currently I am using a simple factory to cache them (a static concurrent hash map). The pipeline does not seem to pick them up from the cache (it actually does a couple of times, but most of them are re-created).
I saw some workarounds using fixed-size session windows, but I need a simpler solution if one exists.
So, are there any best practices or solutions to the problem I am facing?
Is there any way to share resources between windows?
Update: I had actually misplaced the logging statements, which inverted the result (my bad). The solution with a static factory kept separate from the pipeline job does seem to resolve the resource-sharing issue. Hope this helps anyone having a similar issue. :)
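As a sketch of the "create once, reuse per worker" idea, here is a hedged example using the Apache Beam Python SDK (rather than the Java SDK the question implies). A DoFn's setup() method runs once per DoFn instance on a worker, so an expensive client can be created there and reused across bundles and windows. The table id and row handling below are hypothetical.

```python
import apache_beam as beam
from google.cloud import bigquery


class WriteWithCachedClient(beam.DoFn):
    """Creates the BigQuery client once per worker instance, not per element."""

    def setup(self):
        # Called once per DoFn instance, before any elements are processed.
        self._client = bigquery.Client()

    def process(self, element):
        # Reuse the cached client for every element, regardless of window.
        table = "my_project.my_dataset.my_table"  # hypothetical table id
        errors = self._client.insert_rows_json(table, [element])
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")
        yield element


# Usage sketch (requires GCP credentials to actually run):
# with beam.Pipeline() as p:
#     (p | beam.Create([{"id": 1}]) | beam.ParDo(WriteWithCachedClient()))
```

The Java SDK has the equivalent @Setup lifecycle method, which plays the same role as the static factory mentioned in the update.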

Do semantic tools like Anzo create a copy of the data?

I'm new to semantic technologies. I understand what RDF, OWL, ontologies, and the other basic terminology are, and how semantic search uses them. When we create a semantic search module using Anzo with enterprise search capabilities, it connects to various data sources and creates relationships between them. Now I'm interested in knowing what a semantic tool like Anzo does internally.
1. Does it create a copy of the data on the local machine, or does it hit the data sources every time we execute a SPARQL query?
2. If it stores data, is the data stored in its raw format, or is it stored after cleaning and creating semantic relationships between sources?
3. What happens to the data after a query is executed? How does it get current data every time?
Any thoughts on this would be valuable to me.
Thanks a lot in advance!
Based on your comments, it appears you're using the AnzoGraph query engine? If so, then the answers to your questions are:
1. A copy of the data is held in memory.
2. Not clear from any of the published information.
3. It doesn't. You need to load the data using the SPARQL 'LOAD' command.
A bit more on 3: you would be responsible for implementing a mechanism to keep the data in there up to date with the underlying data source (which might be as simple as rebuilding the graph from a nightly dump, or as involved as change data capture against the underlying store that replicates CRUD operations onto the graph).
My answers are based on the marketing and support information available on the CambridgeSemantics site.
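As a rough illustration of point 3 (this is generic SPARQL 1.1, not taken from Anzo documentation): against a SPARQL update endpoint you can issue a LOAD to pull an RDF dump into a named graph, and re-run it on a schedule to refresh the in-memory copy. The endpoint URL, graph IRI, and dump URL below are all hypothetical placeholders.

```python
from SPARQLWrapper import SPARQLWrapper, POST

# Hypothetical SPARQL 1.1 update endpoint and resources.
endpoint = SPARQLWrapper("http://example.org/sparql")
endpoint.setMethod(POST)

# Drop and reload a named graph from a nightly RDF dump.
endpoint.setQuery("""
    DROP SILENT GRAPH <http://example.org/graphs/sales> ;
    LOAD <http://example.org/dumps/sales.ttl>
        INTO GRAPH <http://example.org/graphs/sales>
""")
endpoint.query()
```

Running something like this from a scheduler is the "nightly rebuild" option mentioned above; change data capture against the source system would be the more involved alternative.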

How to migrate data from MongoDB to SQL-Server? [closed]

I searched around and found that there are ways to transfer/sync data from SQL Server to MongoDB.
I also know that MongoDB contains collections instead of tables, and that the data is stored differently.
I want to know whether it is possible to move data from MongoDB to SQL Server. If yes, then how, and what tools/topics should I use?
Of course it's possible, but you will need to find a way to force the flexibility of a document database like MongoDB into an RDBMS like SQL Server.
That means you need to define how you want to handle missing fields (will it be a NULL in the database column, or a default value?) and other things that usually don't fit well in a relational database.
That said, you can use an ETL tool able to connect to both databases. SSIS is an example if you want to stay in the Microsoft world (you can check Importing MongoDB Data Using SSIS 2012 to get an idea), or you can go for an open-source tool like Talend Big Data Integration, which has a connector to MongoDB (and of course to SQL Server).
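To make the "decide how you handle missing fields" point concrete, here is a minimal hand-rolled sketch (not SSIS or Talend) using pymongo and pyodbc. The connection strings, collection, and target table are hypothetical, and missing document fields are mapped to NULL via .get().

```python
import pymongo
import pyodbc

# Hypothetical source and target connections.
mongo = pymongo.MongoClient("mongodb://localhost:27017")
customers = mongo["shop"]["customers"]

sql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=shop;Trusted_Connection=yes;"
)
cursor = sql.cursor()

for doc in customers.find():
    # Flatten each document into the target relational schema.
    # .get() returns None for missing fields, which pyodbc writes as NULL.
    cursor.execute(
        "INSERT INTO dbo.Customers (MongoId, Name, Email, Age) "
        "VALUES (?, ?, ?, ?)",
        str(doc["_id"]),
        doc.get("name"),
        doc.get("email"),
        doc.get("age"),
    )

sql.commit()
```

An ETL tool does essentially the same thing, but with better error handling, batching, and scheduling, which is why it is usually the better choice beyond a one-off migration.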
There is no way to directly move data from MongoDB to SQL Server. Because MongoDB data is non-relational, any such movement must involve defining a target relational data model in SQL Server, and then developing a transformation that can take the data in MongoDB and transform it into the target data model.
Most ETL tools such as Kettle or Talend can help you with this process, or if you're a glutton for punishment, you can just write gobs of code.
Keep in mind that if you need this transformation process to be online, or applied more than once, you may need to tweak it for any small changes in the structure or types of the data stored in MongoDB. As an example, if a developer adds a new field to a document inside a collection, your ETL process will need rethinking (possibly new data model, new transformation process, etc.).
If you are not sold on SQL Server, I'd suggest you consider Postgres, because there is a widely-used open source tool called MoSQL that has been developed expressly for the purpose of syncing a Postgres database with a MongoDB database. It's primarily used for reporting purposes (getting data out of MongoDB and into an RDBMS so one can layer analytical or reporting tools on top).
MoSQL enjoys wide adoption and is well supported, and for badly tortured data, you always have the option of using the Postgres JSON data type, which is not supported by any analytics or reporting tools, but at least allows you to directly query the data in Postgres. Also, and now my own personal bias is showing through, Postgres is 100% open source, while SQL Server is 100% closed source. :-)
Finally, if you are only extracting the data from MongoDB to make analytics or reporting easier, you should consider SlamData, an open source project I started last year that makes it possible to execute ANSI SQL on MongoDB, using 100% in-database execution (it's basically a SQL-to-MongoDB API compiler). Most people using the project seem to be using it for analytics or reporting use cases. The advantage is that it works with the data as it is, so you don't have to perform ETL, and of course it's always up to date because it runs directly on MongoDB. A disadvantage is that no one has yet built an ODBC / JDBC driver for it, so you can't directly connect BI tools to SlamData.
Good luck!
There is a tool provided by MongoDB called mongoexport that is capable of exporting CSV files. These CSV files can then be imported into SQL Server (for example with BULK INSERT or the bcp utility). Good luck!
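For reference, here is a minimal sketch of driving mongoexport from Python; the database, collection, and field names are placeholders, and the resulting CSV can then be bulk-loaded into SQL Server.

```python
import subprocess

# Export a collection to CSV using the mongoexport CLI (placeholder names).
subprocess.run(
    [
        "mongoexport",
        "--db", "shop",
        "--collection", "customers",
        "--type=csv",
        "--fields", "name,email,age",
        "--out", "customers.csv",
    ],
    check=True,
)

# The CSV can then be loaded into SQL Server, e.g.:
#   BULK INSERT dbo.Customers FROM 'customers.csv'
#   WITH (FORMAT = 'CSV', FIRSTROW = 2);
```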