In what instances would a GA4 session ID have 2 different user pseudo ids? - google-bigquery

So I've been sorting out tracking and reporting on a new website using GA4 with a BigQuery export.
As I've started building a report I've found 5 session IDs (out of a few hundred) that have 2 different user pseudo ids attached to them.
Any ideas why/when this would happen?
While I would expect one user pseudo id to have more than one session id, I was expecting each session id to have only one user pseudo id.
The only cause I could think of was cookies being deleted during a session, but I've tested this: the same user pseudo id persists if I change page (my page changes are just history changes), and I get a new user pseudo id AND a new session id if I hard refresh.

ga_session_id values are only unique per user_pseudo_id/user_id. That means that to identify a unique session in your property, you need a composite key of ga_session_id and user_pseudo_id. You can see the official/standard method of identifying and counting sessions here. (Disclaimer: I wrote the linked article.)

A ga_session_id is basically a timestamp of when the session started, attached to an event, so it's possible for multiple visits to start at the same moment.
It's the user_pseudo_id that distinguishes events from different users, so combining the two will give you the correct number of sessions:
count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id'))) as sessions
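For context, the same count written as a complete query might look like this - a minimal sketch against the standard GA4 export schema, where the project/dataset path and date range are placeholders, not taken from the original answer:

select
  count(distinct concat(
    user_pseudo_id,
    cast((select value.int_value from unnest(event_params) where key = 'ga_session_id') as string)
  )) as sessions
from `my-project.analytics_123456.events_*`          -- placeholder project/dataset
where _table_suffix between '20240101' and '20240131'

Counting distinct ga_session_id on its own would under-count whenever two different user_pseudo_ids happen to share a session id, which is exactly the situation described in the question.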

Related

Adding ASP.NET Identity to an existing project with already managed users table

I have an existing Web API project with a Users table. In general, User is involved in some key business queries in the system (other tables reference its 'UserId' as a foreign key).
These days I'm interested in adding ASP.NET (Core) Identity. I've already performed the required steps: adding a separate Identity table, managing an additional DB context (deriving from IdentityDbContext), and adding a JWT token service. Everything seems to work fine. However, I am now wondering how I should "link" the authenticated user (who logged in through the Identity module) to the user found in the original, business-related DB.
What I was thinking is that upon login, I retrieve the userId from the original Users table based on the email (which is used as the username and exists in both the original Users table and the new Identity table), and then keep it as a Claim on the authenticated user. This way, each time the user calls the API (for an Authorize-marked action on the relevant controller), assuming they are authenticated, I will have the relevant userId on hand and will be able to query whatever is needed from the existing business tables.
I guess this can work, but I'm not sure about this approach and I was wondering if there are any other options.
Regarding the option mentioned above, the main drawback I see is that creating a new user has to be performed against 2 different tables in 2 different DBs. In order to keep this in one unit of work, is it possible to create a transaction scope that spans 2 different db contexts?
You're on the right track.
I faced a similar problem.
Imagine two different microservices:
Identity microservice (stores identity information: username, password, etc.)
Employees microservice (stores employee information: name, surname, etc.)
So how do you establish a relationship between these two services?
Use queues (RabbitMQ, Kafka, etc.).
An event is published after user registration (UserCreatedEvent { Id, Name, etc. }).
The Employees microservice listens for this event and records it in the corresponding table.
This is the final state
Identity
Id = 1, UserName = ExampleUserName, Email = Example@Email Etc...
Employee
Id = 1, Name = ExampleName, Surname = ExampleSurname Etc...
Now both services are associated with each other.
Example
If I want to get the information of the employee who is currently logged in:
var currentEmployeeId = User.Identity.GetById();   // read the current user's id from the identity
var employee = _db.Employee.GetById(currentEmployeeId);   // look up the matching employee record

Model 'friends' table in a relational db for social network

I'm rebuilding a project which I did a while ago, and instead of using MongoDB I'll be using PostgreSQL.
The project is a social network - pretty much like Facebook, where users can post, send friend requests, etc.
Now, suppose I want to model a "friends" table, here is what I thought I can do:
There is going to be a buddy_requests table.
The idea is that the sender is the one who sends the buddy request, and the receiver is the one who should approve it.
Now, when the receiver accepts the request, another table called "buddies" is updated.
Of course, if user A is friends with user B, then user B is friends with user A
Now, there are 3 ways of updating this buddies table that I thought about, and I would like to know which is the right way to do it (if any of them is right at all...):
Whenever a new request is approved - insert 2 new rows: user_id for user A and buddy_id for user B, and vice versa, user_id for user B and buddy_id for user A (see the sketch after this list).
The why: whenever I want to query for all the friends of a specific user, I can search in only 1 column (user_id) and I am guaranteed to retrieve all of their friends.
Whenever a new request is approved - insert 1 new row: user_id for user A and buddy_id for user B.
The why: less space used by the table, but querying for all the friends requires going over 2 columns and taking the distinct values (because we only added the sender to user_id).
Instead of having only 1 value in buddy_id, keep a JSON array of all the friend ids for that user_id.
The why: querying only 1 column, but storing a lot of data in the other one.
Use MongoDB to store this kind of data and not have a 'buddies' table at all.
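To make option 1 concrete, here is a minimal sketch in PostgreSQL; the column names and the referenced users table are assumptions for illustration, not part of the original post:

-- the sender sends the request, the receiver approves it
create table buddy_requests (
    sender_id   integer not null references users(id),
    receiver_id integer not null references users(id),
    created_at  timestamptz not null default now(),
    primary key (sender_id, receiver_id)
);

-- option 1: store both directions of an approved friendship
create table buddies (
    user_id  integer not null references users(id),
    buddy_id integer not null references users(id),
    primary key (user_id, buddy_id)
);

-- on approval, insert both rows in one transaction
insert into buddies (user_id, buddy_id) values (1, 2), (2, 1);

-- all friends of user 1 can then be read from a single column
select buddy_id from buddies where user_id = 1;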
I would like your honest opinions on this take, and if you have any other recommendations that offer a better solution I would gladly read them.
Thank you!

Table design in Azure Table Storage

I need to organize a REST service for messaging using Azure. Right now I have a problem with the DB design. I have 3 tables: users, chats, messages of chats.
Users contains user data like login, password hash and salt.
Chats contains PartitionKey = userLogin, RowKey = chatId, and nowInChat - whether the user is currently in the chat.
Messages of chats contains PartitionKey = userLogin_chatId_datetimeTicks
(e.g. zevis_8a70ff8d-c363-4eb4-8a51-f853fa113fa8_634292263478068039),
RowKey = messageId, plus message and sender = userLogin.
I see a disadvantage in this design: if two users were actively communicating a year ago and no longer talk, and one of them wants to look at the history, I'll have to send a large number of requests to the server, each asking for data in a time interval such as a week. Sending one request with a time filter of "less than today" would be ineffective, because we would get the whole history.
How should I change the design of the table?
Because Azure Storage Tables do not support secondary indexes, and storage is very inexpensive, your best option is to store the data twice, using different partition and/or row keys. From the Azure Storage Table Design Guide:
To work around the lack of secondary indexes, you can store multiple
copies of each entity with each copy using different PartitionKey and
RowKey values
https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/#table-design-patterns
Thank you for your post; you have two options here. The easiest answer, with the least amount of design change, would be to include a StartTime and EndTime in the Chat table. Although these properties would not be indexed, I'm guessing there will not be many rows to scan once you filter on the UserID.
The second option, which requires a bit more work but is cleaner, would be to create an additional table with Partition Key = UserID and Row Key = DateTimeTicks, whose entity properties contain the ChatID. This would enable you to quickly filter by user on a given date/date range. (This is the denormalization answer provided above.)
Hopefully this helps your design progress.
I would create a separate table with these PK and RK values:
Partition Key = UserID, Row Key = DateTime.MaxValue.Ticks - DateTimeTicks
Optionally you can also append ChatId to the end of the Row Key above.
This way the most recent communication made by the user will always be on top. You can then simply query the table, passing in only the UserId and a take count (i.e. take count = 1 if you want the latest chat entry from the user). The query will also be very fast: since you use inverted ticks for your row keys, the Azure Table storage service sorts the entries for the same user id in increasing lexicographical order of row keys, always keeping the latest chat at the top of the partition, as it will have the minimum inverted tick value.
Even if you add the ChatId at the end of the RowKey (i.e. InvertedTicks_ChatId), the sort order will not change and the latest conversation will stay on top regardless of chat id.
Once you read the entity back, you subtract the inverted ticks from DateTime.MaxValue.Ticks to find the actual date.

3 Level authorization structure

I am working on a banking application, and I want to add a maker / checker / authorizer feature for every record in a table. I explain the details below.
Suppose I have a table called invmast. There are 3 users: one is the maker, the 2nd one is the checker and the last one is the authorizer. When the maker creates a transaction in the database, the record is not live (meaning it is not yet available in the invmast table). Once the checker has checked the record and the authorizer has authorized it, the record goes live (meaning it is inserted into the invmast table). The same thing applies to updates and deletes. I want a table structure that achieves this in real time. Please advise.
I am using VB.NET and SQL Server 2008.
Reads like a homework assignment.....
Lots of ways to solve this, here's a common design pattern:
Have an invmast_draft table that is identical to invmast but has an additional status column. Apps need to be aware of this table, the status column and what its values mean. In your case, it can have at least 3 values - draft, checked, authorized. Makers first create a transaction in this table. Once the maker is done, the row is committed with the value "draft" in the status column. The checker then knows there's a new row to check and does his job. When done, the row is updated with status set to "checked". The authorizer does her thing. When the authorizer updates the status to "authorized" you can copy or move the row to the final invmast table right away. Alternatively, you can have a process that wakes up periodically to copy/move batches of rows. It all depends on your business requirements. All kinds of optimizations can be performed here, but you get the general idea.
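To make that concrete, here is a minimal sketch of the draft table and the promotion step in SQL Server syntax; the business columns (inv_id, inv_date, amount) are illustrative assumptions, not from the original post:

create table invmast_draft (
    inv_id    int           not null primary key,   -- illustrative business columns
    inv_date  date          not null,
    amount    decimal(18,2) not null,
    status    varchar(10)   not null default 'draft'
              check (status in ('draft', 'checked', 'authorized'))   -- workflow state
);

-- once the authorizer has set status = 'authorized', promote the row to the live table
insert into invmast (inv_id, inv_date, amount)
select inv_id, inv_date, amount
from   invmast_draft
where  status = 'authorized';

delete from invmast_draft
where  status = 'authorized';

Whether you promote rows immediately or in periodic batches, running the insert and delete in one transaction keeps the draft and live tables consistent.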

Optimizing SQL to determine unique page views per user

I need to determine if user has already visited a page, for tracking unique page views.
I have already implemented some HTTP header cache, but now I need to optimize the SQL queries.
The visit is unique when:
pair: page_id + user_id is found in the visit table
or pair: page_id + session_id is found
or: page_id + [ip + useragent] - (this is a topic for another discussion, whether it should be only ip or ip+useragent)
So I have a table tracking user visits:
visit:
page_id
user_id
session_id
useragent
ip
created_at
updated_at
Now, on each user visit (which does not hit the cache) I will update the row if it exists. If there are no affected rows, I will insert a new visit into the table.
That is one or two queries (assuming the cache works, mostly two queries), and the number of rows stays somewhat limited. Maybe it would be better to store all the visits and then clean up the database after e.g. a month?
The questions are:
how should the visit table be constructed (keys, indexes, relations to the user and page_views tables)? Some of the important fields may be null (e.g. user_id) - what about indexes then? Do I need a multi-column primary key?
which would be the fastest SQL query to find whether the visit is unique?
is this a sane approach?
I use PostgreSQL and PDO (Doctrine ORM).
All my sessions are stored in the same DB.
Personally I would not put this in the request-response path. I would log the raw data to a table (or push it onto a queue) and let a background task/thread/cron job deal with it.
The queue (or the message-passing table) should then just contain page_id, user_id, session_id, useragent, ip.
Absolute timings are less important now, as long as the background task can keep up. Since a single thread will now do the heavy lifting, it will not create conflicting locks when updating the unique page views table.
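As a sketch of that background job in PostgreSQL: assuming the raw events land in a visit_log table (integer page_id and user_id, the rest as text) and an aggregated unique_page_view table has a unique constraint on (page_id, visitor_key), a single statement can drain the log and record the unique views - all table and column names here are hypothetical:

with batch as (
    delete from visit_log
    returning page_id, user_id, session_id, ip, useragent
)
insert into unique_page_view (page_id, visitor_key)
select distinct
       page_id,
       coalesce(user_id::text, session_id, ip || ':' || useragent) as visitor_key
from   batch
on conflict (page_id, visitor_key) do nothing;

The coalesce mirrors the priority in the question: a logged-in user_id first, then the session_id, then ip + useragent as a last resort.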
Just some random thoughts:
Can I verify that the thinking behind the unique visit types is:
pageid + userid = user has logged in
pageid + sessionid = user not identified but has cookies enabled
pageid + ip / useragent = user not identified and no cookies enabled
For raw performance, you might consider #2 to be redundant, since #3 will probably cover #2 in most conditions (or is #2 important, e.g. so that if the user then registers, #2 can be mapped to a #1?). (Meaning that the session id might still be logged, but not used in any visit determination.)
IMHO the IP will always be present (even if spoofed) and is a good candidate for an index. The user agent can be hidden and will only have a limited range (not very selective).
I would use a surrogate primary key in this instance, due to the nullable fields and since none of the fields is unique by itself.
IMHO your idea of storing ALL the visits and then trimming the duplicates via a batch job is a good one to weigh up (rather than checking whether a row exists to decide between update and insert).
So PK = Surrogate
Clustering = Not sure - another query / requirement might drive this better.
NonClustered Index = IP Address, Page Id (assuming more distinct IP addresses than page ids)
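Following that suggestion, a minimal PostgreSQL sketch of the visit table might look like this; the column types are assumptions, and in PostgreSQL the "non-clustered" index is simply a regular b-tree index:

create table visit (
    visit_id   bigserial primary key,              -- surrogate key, since user_id/session_id can be null
    page_id    integer     not null,
    user_id    integer,                            -- null for anonymous visitors
    session_id text,
    useragent  text,
    ip         inet        not null,
    created_at timestamptz not null default now(),
    updated_at timestamptz
);

-- ip first, on the assumption that it is more selective than page_id
create index visit_ip_page_idx on visit (ip, page_id);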