What's the maximum number of users that can join a public chat group - QuickBlox

Is there a limit on the number of users that can join a public chat group at a certain time?
Does it make a difference whether you are on the Starter, Advanced, Pro or Enterprise plan?
Thanks.

Related

Google BigQuery Query exceeded resource limits

I'm setting up a crude data warehouse for my company, and I've successfully pulled contact, company, deal and association data from our CRM into BigQuery. But when I join these together into a master table for analysis via our BI platform, I continually get the error:
Query exceeded resource limits. This query used 22602 CPU seconds but would charge only 40M Analysis bytes. This exceeds the ratio supported by the on-demand pricing model. Please consider moving this workload to the flat-rate reservation pricing model, which does not have this limit. 22602 CPU seconds were used, and this query must use less than 10200 CPU seconds.
As such, I'm looking to optimise my query. I've already removed all GROUP BY and ORDER BY clauses, and have tried using WHERE clauses to do additional filtering, but this seems illogical to me as it would add processing demands.
My current query is:
SELECT
coy.company_id,
cont.contact_id,
deals.deal_id,
{another 52 fields}
FROM `{contacts}` AS cont
LEFT JOIN `{assoc-contact}` AS ac
ON cont.contact_id = ac.to_id
LEFT JOIN `{companies}` AS coy
ON CAST(ac.from_id AS int64) = coy.company_id
LEFT JOIN `{assoc-deal}` AS ad
ON coy.company_id = CAST(ad.from_id AS int64)
LEFT JOIN `{deals}` AS deals
ON ad.to_id = deals.deal_id;
FYI, {assoc-contact} and {assoc-deal} are separate views I created from the associations table to make it easier to associate those tables with the companies table.
It should also be noted that this query has occasionally run successfully, so I know it does work; it just fails about 90% of the time because the query is so big.
TLDR;
Check your join keys. 99% of the time the cause of the problem is a combinatoric explosion.
I can't know for sure since I don't have access to the underlying data, but I will give a general resolution method which, in my experience, has worked every time to find the root cause.
Long Answer
Investigation method
Say you are joining two tables
SELECT
cols
FROM L
JOIN R ON L.c1 = R.c1 AND L.c2 = R.c2
and you run into this error. The first thing you should do is check for duplicates in both tables.
SELECT
c1, c2, COUNT(1) AS nb
FROM L
GROUP BY c1, c2
ORDER BY nb DESC
And the same thing for each table involved in a join.
I bet you will find that your join keys are duplicated. BigQuery is very scalable, so in my experience this error happens when you have a join key that repeats more than 100,000 times in both tables. It means that after your join, you will have 100,000^2 = 10 billion rows!
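If that turns out to be the case, one common fix (a sketch only, not part of the original answer, reusing the question's placeholder table names) is to deduplicate the association view on its join key before joining, for example:
-- Sketch: keep one row per (from_id, to_id) pair in the association view
-- so each join key appears at most once on that side of the join.
WITH assoc_contact_dedup AS (
  SELECT DISTINCT from_id, to_id
  FROM `{assoc-contact}`
)
SELECT
  cont.contact_id,
  coy.company_id
FROM `{contacts}` AS cont
LEFT JOIN assoc_contact_dedup AS ac
  ON cont.contact_id = ac.to_id
LEFT JOIN `{companies}` AS coy
  ON CAST(ac.from_id AS INT64) = coy.company_id;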
Why BigQuery gives this error
In my experience, this error message means that your query does too much computation relative to the size of its inputs.
No wonder you're getting this if you end up with 10 billion rows after joining tables with a few million rows each.
BigQuery's on-demand pricing model is based on the amount of data read from your tables. This means that people could try to abuse it by, say, running CPU-intensive computations while reading small datasets. To give an extreme example, imagine someone writes a JavaScript UDF to mine Bitcoin and runs it on BigQuery:
SELECT MINE_BITCOIN_UDF()
The query will be billed $0 because it doesn't read anything, but will consume hours of Google's CPU. Of course they had to do something about this.
So this ratio exists to make sure that users don't do anything sketchy by using hours of CPU while processing a few MB of input.
Other MPP platforms with a different pricing model (e.g. Azure Synapse, which charges based on the amount of bytes processed, not read like BigQuery) would perhaps have run the query without complaining, and then billed you 10 TB for processing that 40 MB table.
P.S.: Sorry for the late and long answer, it's probably too late for the person who asked, but hopefully it will help whoever runs into that error.

Does BigQuery BI engine support left join or not?

As per the documentation, BI Engine is supposed to accelerate LEFT JOIN:
https://cloud.google.com/bi-engine/docs/optimized-sql#unsupported-features
I tried this dummy query as a view and connected it to Data Studio:
SELECT xx.country_region, yy._1_22_20
FROM `bigquery-public-data.covid19_jhu_csse.deaths` xx
LEFT JOIN `bigquery-public-data.covid19_jhu_csse.deaths` yy
  ON xx.country_region = yy.country_region
My question is: is LEFT JOIN supported or not?
Bug report here: https://issuetracker.google.com/issues/154786936
Data Studio report: https://datastudio.google.com/reporting/25710c42-acda-40a3-a3bf-68571c314650
Edit: it seems BI Engine is still under heavy development and needs more time to be feature-complete. I just materialized my view, but it has a cost: 4 small tables (< 10 MB each) that change every 5 minutes cost 11 GB/day. I guess it is worth it; Data Studio is substantially faster now. You can check it here (public report):
https://nemtracker.github.io/
Don't try JOINs, don't try sub SELECTs, don't do queries for BI Engine.
The best practice is to CREATE OR REPLACE a table dedicated to the dashboard you're building. Make sure to not have nested/repeated data there either. Then BI Engine will make your reports shine.
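As a minimal sketch of that practice (the destination dataset and table name are hypothetical), you could materialize the question's example join into a flat table and point the Data Studio report at that instead of the view:
-- Sketch: materialize a flat, join-free table for the dashboard.
-- `my_dataset.deaths_flat` is a hypothetical name; re-run this (e.g. as a
-- scheduled query) whenever the underlying data changes.
CREATE OR REPLACE TABLE `my_dataset.deaths_flat` AS
SELECT
  xx.country_region,
  yy._1_22_20
FROM `bigquery-public-data.covid19_jhu_csse.deaths` xx
LEFT JOIN `bigquery-public-data.covid19_jhu_csse.deaths` yy
  ON xx.country_region = yy.country_region;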
Related, check out this video I just made with the best practices for BI Engine with BigQuery:
https://youtu.be/zsm8FYrOfGs?t=307

What is the best approach to fetch a flag if objects are connected?

Suppose we have two entities/tables - Users and Games (could be anything instead) - and a user can mark multiple games as a favourite. So we also have a user_favourite_game (user_id, game_id) table.
Then suppose a user is fetching a list of all available games, and some of them should have the "favourite" flag = true (pagination is used, so we'll assume 20 games are fetched each time). I see two approaches here:
We can make one request populating the "favourite" field, e.g.
SELECT
g.*,
ufg.game_id IS NOT NULL AS favourite
FROM
games g LEFT JOIN
user_favourite_game ufg ON ufg.user_id = :userId AND g.id = ufg.game_id
ORDER BY
g.id;
We can select the games and then perform 20 requests to check whether each game is among the user's favourites.
Which approach is better to use and why? Any other ideas?
On the last project, we used the second approach because of the complexity of the computations required for each entity; it was a lot more complicated than the example above and close to impossible to calculate inside a single query.
But in general, it seems to me that in such simple cases a single query with a JOIN should run faster than 20 simple queries. Although I'm not sure how it will behave when there is a lot of data in the user_favourite_game table.
Use the database for what it's designed to do and have it give you the results as part of your original query.
The time your DB will spend performing the outer join on the user favorite game table will likely be less than the network overhead of 20 separate requests for the favorite flag.
Make sure the tables are indexed appropriately as they grow and have accurate statistics.
This isn't a hard and fast rule, and actual performance testing should guide, but I have observed plenty of applications that were harmed by network chattiness. If your round-trip cost for each request is 250ms, your 20 calls will be very expensive. If your round-trip cost is 1ms, people might never notice.
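For example, a sketch of the kind of index that supports this join (assuming the user_favourite_game table from the question, MySQL/PostgreSQL-style syntax, and that no primary key already covers these columns):
-- Sketch: a composite index on the link table's filter/join columns.
-- With (user_id, game_id) the LEFT JOIN lookup for one user becomes an
-- index seek; skip this if the primary key already covers these columns.
CREATE INDEX idx_user_favourite_game
    ON user_favourite_game (user_id, game_id);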
Firing 20 queries (irrespective of how simple they are) will always slow your application; factors include network cost, query execution, etc.
You should fire one query to get the page of available games and then another query to get the list of that user's "favourite" games, passing the ids of the games present in that page. Then set/unset the flag by looping over the result. This way you make only 2 DB calls, and it will improve performance significantly.
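A rough sketch of that two-query flow (placeholder parameters; LIMIT/OFFSET syntax varies by database):
-- Query 1: fetch one page of 20 games.
SELECT g.*
FROM games g
ORDER BY g.id
LIMIT 20 OFFSET :offset;

-- Query 2: of those 20 ids, which are favourites of this user?
SELECT ufg.game_id
FROM user_favourite_game ufg
WHERE ufg.user_id = :userId
  AND ufg.game_id IN (:id1, :id2, /* ... */ :id20);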

Sharing a table row between users in SqlServer Azure

Context: a mobile note-taking application that is connected to Windows Azure Mobile Services (SQL Server Azure).
Currently I have 2 tables: Users & Notes.
A user downloads their notes by querying the Notes table and asking for all notes whose userID matches their own.
Example:
SELECT * FROM Notes WHERE userID = myID;
But now I want my users to be able to share notes between them, so...
I'm thinking of adding "SharedList" & "SharedListMember" tables, where each note will have a shared list, with the sharing members in the SharedListMember child table.
Example:
SELECT DISTINCT n.*
FROM Notes n
LEFT OUTER JOIN SharedList l ON n.list = l.id
INNER JOIN SharedListMember lm ON l.id = lm.list
WHERE (lm.memberID = myID OR n.userID = myID)
I have added a LEFT OUTER JOIN because not all notes will be shared.
I would be adding indexes on Notes.list, SharedList.id (primary key), SharedListMember.memberID and SharedListMember.list.
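A sketch of the proposed schema (column types, foreign keys and index names are assumptions, not part of the question):
-- Sketch only: types and foreign keys are assumptions.
CREATE TABLE SharedList (
    id INT IDENTITY(1,1) PRIMARY KEY
);

CREATE TABLE SharedListMember (
    list     INT NOT NULL REFERENCES SharedList(id),
    memberID INT NOT NULL,
    PRIMARY KEY (list, memberID)   -- covers lookups by list
);

-- Notes.list stays NULL for notes that are not shared.
CREATE INDEX IX_Notes_list ON Notes (list);
CREATE INDEX IX_SharedListMember_memberID ON SharedListMember (memberID);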
Questions:
How much performance impact can I expect with this setup? Is there a faster way?
I currently query about 1,000 notes in less than a second. What would happen if I had 10 million notes?
You will likely notice no impact with this SQL query at 10 million notes.
Your bottlenecks will be bandwidth back to your app (if your notes ever contain attachments) and the latency of the SQL call to the database, so cache locally if you can and make async calls where practical in your application.
This is a case of not trying to over-optimize a solution that isn't causing a problem. SQL Azure is highly optimized; I have millions of rows in some of my tables, and queries far more complicated than the one you have shown above return in less than a second.

Best way to store "views" of a topic

I use this code to update the views of a topic:
UPDATE topics
SET views = views + 1
WHERE id = $id
The problem is that users like to spam F5 to get ridiculous numbers of views.
What should I do to get unique hits? Make a new table where I store the IP?
I don't want to store it in cookies; it's too easy to clear your cookies.
I would create a separate table for storing this information. You can then capture a larger amount of data without having to update the table that is likely to be read the most.
You would always use INSERT INTO tblTopicViews...
And you would want to capture as much information as you can: IP address, date and time of the hit, perhaps some information on browser version, operating system, etc. - whatever you can get your hands on. That way, you can fine-tune how you filter out refresh requests over time.
It's worth bearing in mind that many users can share an IP - for example, an entire office might go via the same router.
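A sketch of what such a logging table and insert might look like (aside from tblTopicViews, all names and column types are assumptions; adjust to your database):
-- Sketch only: column names and types are assumptions.
CREATE TABLE tblTopicViews (
    topic_id   INT          NOT NULL,
    ip_address VARCHAR(45)  NOT NULL,   -- 45 chars also fits IPv6
    user_agent VARCHAR(500) NULL,
    viewed_at  TIMESTAMP    NOT NULL
);

-- One row per hit; refresh filtering happens later, at query time.
INSERT INTO tblTopicViews (topic_id, ip_address, user_agent, viewed_at)
VALUES (:topic_id, :ip_address, :user_agent, CURRENT_TIMESTAMP);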
I would create a table which stores unique views:
CREATE TABLE unique_views(
page_id number,
user_agent varchar2(500),
ip_address varchar2(16),
access_time date,
PRIMARY KEY (page_id, user_agent, ip_address, access_time)
)
Now if someone accesses the page and you want to allow one view per user per day, you could do
INSERT INTO unique_views (page_id, user_agent, ip_address, access_time) VALUES (:page_id, :user_agent, :ip_address, TRUNC(SYSDATE))
which won't allow duplicate views for the same user during one day. You could then count the views for each page with a simple GROUP BY (example for today's views):
SELECT page_id, count(*) page_views
FROM unique_views
WHERE access_time = TRUNC(SYSDATE)
GROUP BY page_id
Well, you could write the individual page hits to a log table, including identifying information like cookies or IP address. You can analyze that table at leisure.
But the web server probably has a facility for this. I know both IIS and Apache can create detailed usage logs. And for both, there's a variety of graphing and analysis tools that take things like IP addresses into account.
So instead of rolling your own logging, you could use the web server's.
You could use session_id() to discriminate between different users; obviously you need a separate table to track each visit.
UPDATE: I just noticed you don't want to depend on cookies, so this may not be suitable for you.
Note that due to various problems (e.g. the unknown behavior of cache servers) this kind of thing is always going to be inaccurate and a balance between various factors. However, for a rough, vaguely secure counter, using a separate table as Karl Bartel and others suggest is a decent solution.
However, depending on how seriously you take this problem, you may want to leave out "user_agent" - it's far too easy to fake, so if I really wanted to inflate my hit counter I could rack up the hits with a script that called my page with user-agent="bot1", then again from the same IP with "bot2", etc.
But then, 2 users behind one IP will only be counted as 1 hit, so you lose accuracy - see what I mean about a balance between various factors?