Twitter query ideas - sql

I don't know if this is the right place to ask this question, but if you had a huge database of tweets, what kinds of informative queries would you run on it?
I'm looking for more ideas that I can illustrate in my study.

It depends on what you are storing. If it is just the tweets themselves, you could run reports on the keywords being used, the hashtags used within the tweets, etc. If you are storing other information about the users, then you open it up to the world of #marketing.
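As a minimal sketch of the keyword and hashtag reports, assuming only the tweet text is stored, the example below uses Python's built-in sqlite3 module. The schema and function names are hypothetical. Hashtags are extracted once at insert time into their own table, so the report becomes a plain GROUP BY instead of re-parsing every tweet on each query.

```python
import re
import sqlite3

# Matches "#tag" and captures "tag" (letters, digits, underscore).
HASHTAG = re.compile(r"#(\w+)")


def make_db():
    # Hypothetical schema: raw tweets plus a hashtag table filled at ingestion time.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
        CREATE TABLE hashtags (
            tweet_id INTEGER NOT NULL REFERENCES tweets(id),
            tag      TEXT NOT NULL
        );
    """)
    return conn


def add_tweet(conn, text):
    cur = conn.execute("INSERT INTO tweets (text) VALUES (?)", (text,))
    # Lower-case and de-duplicate so "#SQL" and "#sql" count as one tag per tweet.
    tags = {t.lower() for t in HASHTAG.findall(text)}
    conn.executemany(
        "INSERT INTO hashtags (tweet_id, tag) VALUES (?, ?)",
        [(cur.lastrowid, t) for t in tags],
    )


def top_hashtags(conn, limit=10):
    # Report: most-used hashtags, ties broken alphabetically.
    return conn.execute(
        "SELECT tag, COUNT(*) AS n FROM hashtags "
        "GROUP BY tag ORDER BY n DESC, tag LIMIT ?",
        (limit,),
    ).fetchall()


def keyword_count(conn, keyword):
    # Report: tweets mentioning a keyword (SQLite's LIKE ignores ASCII case).
    return conn.execute(
        "SELECT COUNT(*) FROM tweets WHERE text LIKE ?", (f"%{keyword}%",)
    ).fetchone()[0]


conn = make_db()
for text in ["Learning #sql today", "#SQL and #python", "No tags here"]:
    add_tweet(conn, text)
print(top_hashtags(conn))          # [('sql', 2), ('python', 1)]
print(keyword_count(conn, "sql"))  # 2
```

The same pattern extends to other reports, such as tweets per day or per user, once those columns are stored.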

Related

What is the best practice for storing user data?

I know this is a really basic question, but I couldn't find much information on it. Suppose I want to develop a basic multi-user note-taking application. In my database I have a table where I store user IDs, usernames and passwords.
I now want to store the users' notes, but each user should only be able to see their own notes. What is the best practice for this? The two possibilities that come to mind are:
1. Create a table for each user where their notes are stored (probably scales horribly)
2. Have one big notes table and store the user ID in each row as a foreign key (it just feels a bit "off" to have everything stored in one big table)
Is either of these ideas used in exactly this way in large-scale, real-world projects? If so, is there anything else to pay attention to?
In general, you want the second option.
My advice: don't write your own authentication functions, because getting them right is very hard for a beginner. For this type of application (such as notes), a serverless architecture is a much better fit.
E.g. Firebase, Supabase, and so on.
These give you a database, authentication, record-level security, file storage, etc.
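A minimal sketch of the single-table approach recommended above, using Python's built-in sqlite3 module rather than a hosted service. The table, column and function names are hypothetical. Each note row carries its owner's `user_id` as a foreign key, every read filters on it, and an index on `user_id` keeps per-user lookups fast as the table grows, which is why "one big table" is not a problem in practice.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE users (
            id       INTEGER PRIMARY KEY,
            username TEXT UNIQUE NOT NULL
        );
        CREATE TABLE notes (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            body    TEXT NOT NULL
        );
        -- Index so "all notes of one user" stays fast as the table grows.
        CREATE INDEX idx_notes_user ON notes(user_id);
    """)
    return conn


def add_note(conn, user_id, body):
    conn.execute("INSERT INTO notes (user_id, body) VALUES (?, ?)", (user_id, body))


def notes_for(conn, user_id):
    # Every read is scoped to the logged-in user's id, so users see only their own notes.
    rows = conn.execute(
        "SELECT body FROM notes WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [body for (body,) in rows]


conn = make_db()
conn.executemany(
    "INSERT INTO users (id, username) VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)
add_note(conn, 1, "buy milk")
add_note(conn, 2, "call mom")
add_note(conn, 1, "read book")
print(notes_for(conn, 1))  # ['buy milk', 'read book']
```

In a hosted backend such as Supabase (which runs on PostgreSQL), the same "only my rows" rule is usually enforced with a row-level security policy, so it does not depend on every query remembering the filter.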

Accessing Dashboard usage data in Pentaho

I have an idea about creating a dashboard that shows the usage statistics of my Pentaho dashboards. It would show which dashboards are used, how often, and by whom.
I know this information is in some logs somewhere, but I would appreciate pointers on where to look. If anyone has implemented anything like this, I'd love some ideas.
There are audit tables in the Hibernate database.
Look for the PRO_AUDIT table; it includes things like the username, the action requested, a timestamp, etc. It still requires a lot of parsing, but it's easier than going through the log files.
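As a sketch of the reporting side only, the query below turns audit rows into per-dashboard, per-user usage counts. It runs against a hypothetical, simplified table in an in-memory SQLite database with just the three fields the answer mentions (username, action requested, timestamp). The real PRO_AUDIT table has different column names and needs the parsing the answer warns about, so treat every column name here as a placeholder.

```python
import sqlite3

# Hypothetical, simplified stand-in for an audit table with the fields the
# answer mentions. The real PRO_AUDIT column names differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit (
        username   TEXT NOT NULL,
        action     TEXT NOT NULL,  -- here: name of the dashboard requested
        audit_time TEXT NOT NULL   -- ISO-8601, so text order = time order
    )
""")
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?)",
    [
        ("alice", "sales_dashboard", "2024-01-02T09:00:00"),
        ("bob",   "sales_dashboard", "2024-01-02T10:00:00"),
        ("alice", "sales_dashboard", "2024-01-03T09:30:00"),
        ("alice", "hr_dashboard",    "2024-01-03T11:00:00"),
    ],
)


def usage_by_dashboard_and_user(conn):
    """Views per (dashboard, user), most-viewed first, with the latest view time."""
    return conn.execute("""
        SELECT action AS dashboard, username,
               COUNT(*) AS views, MAX(audit_time) AS last_seen
        FROM audit
        GROUP BY action, username
        ORDER BY views DESC, dashboard, username
    """).fetchall()


for row in usage_by_dashboard_and_user(conn):
    print(row)
```

Dropping `username` from the GROUP BY gives a simpler "which dashboards are used most" view.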

What is the most effective way to compare, find and return a large set of mobile numbers while identifying contacts?

Say I have data for 1000 users in my DB and someone new signs up. I want them to have an easy way to find contacts who are already registered, via their phone numbers. This is very similar to what WhatsApp, Allo, Instagram, Twitter, etc. do: they let you see which of your contacts are already using their services.
The DB stores usernames and contact information (name, number, etc.). If X signs up with 200 contacts, do I compare each of the 200 with each of the 1000 existing users?
Surely there's a better way than taking my new user's 200 contacts and comparing each one to the existing 1000 records. How do other services manage this? Is there a specific data structure I should be maintaining for searching?
Will a tree or graph structure be a more efficient approach in this scenario? If so, how should I be implementing it?
I'm using DRF (Django REST Framework) for the back end.
I've searched around, but I don't seem to find a good answer for this problem.
Your content is in a database. You aren't required to work out the most efficient data structures to use to store the information; your database engine already does that. And it does it way better than you ever could. (That's not a personal insult; databases store and retrieve information way better than ANYONE could.)
Conceptually, yes: each of the 200 new numbers has to be checked against the 1,000 numbers in your database. You don't have to work out how to do that efficiently, though. Just ask the database which of those numbers exist, ideally in a single query, and as long as the phone-number column is indexed, the database will look each number up through the index instead of comparing it against every row.
For a database with many millions of records, the answer to the question "Does this value already exist?" should still come back in a tiny fraction of a second when the column is indexed.
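To illustrate "just ask the database", here is a minimal sketch with Python's built-in sqlite3 module; the schema and function names are hypothetical. The UNIQUE constraint on `phone` gives the column an index, and all of the new user's contacts are checked in one `IN` query. It assumes numbers are stored and submitted in one normalized format (for example E.164, such as `+15550001234`), since `+15550001234` and `555-000-1234` would otherwise not match. In Django, the equivalent would be a single `.filter(phone__in=numbers)` call on whichever model holds the numbers.

```python
import sqlite3


def make_db(numbers):
    conn = sqlite3.connect(":memory:")
    # UNIQUE makes SQLite build an index on phone, so lookups use the index
    # instead of scanning the whole table.
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT UNIQUE NOT NULL)"
    )
    conn.executemany("INSERT INTO users (phone) VALUES (?)", [(n,) for n in numbers])
    return conn


def registered_contacts(conn, contact_numbers):
    """Return the contact numbers that already belong to users, using one query."""
    numbers = list(set(contact_numbers))  # drop duplicates from the address book
    if not numbers:
        return set()
    placeholders = ",".join("?" * len(numbers))
    rows = conn.execute(
        f"SELECT phone FROM users WHERE phone IN ({placeholders})", numbers
    )
    return {phone for (phone,) in rows}


# 1000 existing users with numbers +15550000000 .. +15550000999 (assumed E.164).
conn = make_db([f"+1555000{i:04d}" for i in range(1000)])
contacts = ["+15550000001", "+15550000999", "+19999999999"]
print(sorted(registered_contacts(conn, contacts)))
# ['+15550000001', '+15550000999']
```

Very large contact lists may need to be split into chunks, because SQLite limits the number of bound parameters per statement.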

Data handling performance

I'm not sure how best to handle the data I'm working with, so I wanted to ask what you would suggest. I'm no expert, so please keep it as simple as possible.
I'm writing an IRC bot that maintains a massive list of users (hundreds of thousands). It gives each user points based on the time they spend in IRC, so the data I must manage consists of users and their points.
I already tried storing each user in an individual text file, but that was a bad idea, mostly because it wasted clusters on the HDD. Now I'm considering storing all the users in a single file, but I'm concerned about the efficiency of processing all of them.
Should I load the whole file into an array or just load the users who are online?
I hope you can understand what I am asking. I will try to clarify if needed.
It sounds like this program calls for storing the information in a SQL-style database. This load is more than I'd want to put into a single text file, but a database can handle it easily.
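A minimal sketch of that database approach for the IRC bot, using Python's built-in sqlite3 module; the table and function names are hypothetical. It also answers the "whole file or only online users?" question: with a keyed table, each award touches only the rows of users who are online, and nothing else is loaded into memory. `INSERT OR IGNORE` followed by `UPDATE` is used instead of an upsert so the sketch also works on older SQLite versions.

```python
import sqlite3


def make_db(path=":memory:"):
    # Use a file path such as "points.db" to keep data between restarts.
    conn = sqlite3.connect(path)
    # One row per user; the PRIMARY KEY indexes nicknames for fast lookups.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "nick TEXT PRIMARY KEY, points INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def award_points(conn, online_nicks, points):
    """Add `points` to every online user, creating rows for first-time users."""
    nicks = list(online_nicks)
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO users (nick) VALUES (?)", [(n,) for n in nicks]
        )
        conn.executemany(
            "UPDATE users SET points = points + ? WHERE nick = ?",
            [(points, n) for n in nicks],
        )


def get_points(conn, nick):
    row = conn.execute("SELECT points FROM users WHERE nick = ?", (nick,)).fetchone()
    return row[0] if row else 0


conn = make_db()
award_points(conn, ["alice", "bob"], 5)
award_points(conn, ["alice"], 5)
print(get_points(conn, "alice"), get_points(conn, "bob"), get_points(conn, "carol"))
# 10 5 0
```

The bot would call something like `award_points` on a timer with the current channel member list.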

What is the best way to allow website users to edit already existing database records?

I am building a web application that will give authenticated users access to large amounts of data, but I don't want that access to be read-only. If records are missing fields and a user has found information to fill them in, or to correct already-populated data, I would like the user to be able to do so.
However, I'm worried about mean-spirited people coming in and clearing out records out of sheer boredom, and I'm wondering what the best way to prevent this would be.
My first thought is to have users submit edits, and to have a page devoted to batch-approving these edits after I or other trusted individuals skim over them. Of course, this would be time-consuming (especially as the database grows), and I'm curious whether there are better ways to give users editing privileges.
Since you are using Rails, there are a number of plugins that provide auditing and versioning of records:
http://github.com/andersondias/acts_as_auditable
http://github.com/laserlemon/vestal_versions
These should let you build something that allows edits but still supports reverting them in the worst-case scenario.
Support rollbacks, as wikis do, so that malicious edits can be undone.
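The Ruby plugins above are the natural route in Rails. Purely to illustrate the underlying idea, here is a Python/sqlite3 sketch of wiki-style versioning; the schema and function names are hypothetical and are not the API of either plugin. Every edit first copies the previous value into a history table, so a malicious edit can be rolled back.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT);
        -- Every edit appends the previous value here, wiki-style.
        CREATE TABLE record_versions (
            version_id INTEGER PRIMARY KEY,
            record_id  INTEGER NOT NULL REFERENCES records(id),
            old_data   TEXT,
            edited_by  TEXT NOT NULL
        );
    """)
    return conn


def edit_record(conn, record_id, new_data, user):
    # Assumes the record exists; saves the old value before overwriting it.
    with conn:
        (old,) = conn.execute(
            "SELECT data FROM records WHERE id = ?", (record_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO record_versions (record_id, old_data, edited_by) "
            "VALUES (?, ?, ?)",
            (record_id, old, user),
        )
        conn.execute("UPDATE records SET data = ? WHERE id = ?", (new_data, record_id))


def revert_last_edit(conn, record_id):
    """Roll the record back to the value it had before its most recent edit."""
    with conn:
        row = conn.execute(
            "SELECT version_id, old_data FROM record_versions "
            "WHERE record_id = ? ORDER BY version_id DESC LIMIT 1",
            (record_id,),
        ).fetchone()
        if row is None:
            return False  # nothing to undo
        version_id, old_data = row
        conn.execute("UPDATE records SET data = ? WHERE id = ?", (old_data, record_id))
        conn.execute("DELETE FROM record_versions WHERE version_id = ?", (version_id,))
        return True


def current(conn, record_id):
    return conn.execute("SELECT data FROM records WHERE id = ?", (record_id,)).fetchone()[0]


conn = make_db()
with conn:
    conn.execute("INSERT INTO records (id, data) VALUES (?, ?)", (1, "Population: 5000"))
edit_record(conn, 1, "Population: 5200", "alice")
edit_record(conn, 1, "", "vandal")  # a malicious edit clears the field
revert_last_edit(conn, 1)
print(current(conn, 1))  # Population: 5200
```

This sketch deletes the history row on revert; a real system would more likely keep it and record who reverted what, so the audit trail survives.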