How to use Squid access logs to find frequency of web requests

I am trying to build a model of how frequently users make web requests. I am interested in the time between each new page they visit, and I want to build a load simulator that uses this model.
To do this, I've been analyzing Squid access logs and looking at the timing between HTTP requests per user IP. Squid captures every request associated with loading a page, and I am only interested in the top-level page requests. A top-level page can have many kinds of URL, e.g. not just *.html, so it seems challenging to capture only the starting page for each session.
Is there a way to capture only the initial request for the top-level page, for example when a user visits a page on Amazon and then jumps to another page, and so on?

You can use the Squid Analysis Report Generator (SARG). It reads the log files and generates reports in HTML format with detailed information such as accessed and denied websites, plus daily and weekly reports.
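If a report generator is too coarse for building a timing model, the logs can also be processed directly. The following is only a minimal sketch, assuming Squid's default native access.log format (timestamp, elapsed, client, result code, size, method, URL, user, hierarchy, MIME type) and log lines in chronological order. The heuristic is an assumption, not something Squid provides: a GET with a text/html response counts as a top-level page, and HTML responses arriving within `min_gap` seconds of the previous page for the same client are treated as part of that page (iframes, redirects). The names `parse_line`, `page_request_gaps` and `min_gap` are made up for this example.

```python
from collections import defaultdict

# Assumed heuristic: only responses with these MIME types count as page loads.
PAGE_MIME_TYPES = {"text/html"}


def parse_line(line):
    """Parse one line of Squid's native access.log format.

    Returns (timestamp, client_ip, method, url, mime_type),
    or None if the line does not have the expected shape.
    """
    fields = line.split()
    if len(fields) < 10:
        return None
    try:
        timestamp = float(fields[0])
    except ValueError:
        return None
    return timestamp, fields[2], fields[5], fields[6], fields[9]


def page_request_gaps(lines, min_gap=2.0):
    """Return {client_ip: [seconds between successive page requests]}."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, client, method, _url, mime = parsed
        if method != "GET" or mime.split(";")[0].lower() not in PAGE_MIME_TYPES:
            continue  # images, CSS, scripts, POSTs, etc.
        previous = last_seen.get(client)
        if previous is not None:
            gap = timestamp - previous
            if gap < min_gap:
                # Probably an iframe or redirect belonging to the same page view.
                continue
            gaps[client].append(gap)
        last_seen[client] = timestamp
    return dict(gaps)


# Made-up sample lines in the native format, for illustration only.
SAMPLE_LOG = """\
1286536309.450 120 10.0.0.5 TCP_MISS/200 5120 GET http://example.com/ - DIRECT/93.184.216.34 text/html
1286536309.900 15 10.0.0.5 TCP_MISS/200 2048 GET http://example.com/style.css - DIRECT/93.184.216.34 text/css
1286536310.100 20 10.0.0.5 TCP_MISS/200 900 GET http://example.com/ad.html - DIRECT/93.184.216.34 text/html
1286536339.450 110 10.0.0.5 TCP_MISS/200 4096 GET http://example.com/products - DIRECT/93.184.216.34 text/html
"""

sample_gaps = page_request_gaps(SAMPLE_LOG.splitlines())
print(sample_gaps)
```

In the sample, the style sheet and the embedded ad.html are skipped, so the client ends up with a single gap of roughly 30 seconds. Grouping by IP alone will merge users behind a shared NAT, and HTTPS traffic that Squid only sees as CONNECT tunnels will not match this filter at all.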


Cloudflare Dashboard and Cloudflare Web Analytics show very different number of visits

I have a static website on Cloudflare Pages, for which Cloudflare Web Analytics is enabled.
This is the only thing I am hosting on Cloudflare, and I set it up less than 24 hours ago. On the Cloudflare Dashboard, I see 403.96k visitors in the last 7 days, whereas in Cloudflare Web Analytics it is 152.08k, a ratio of about 2.66. (In both cases, the number of page views is very close to the number of visits.) What could be the reason for this?
The Cloudflare Dashboard shows server-side analytics, meaning it records every request to your domain (bots, utilities, users, etc.).
On the free plan, their Web Analytics solution is client-side and relies on JavaScript to run and report data. This leaves it susceptible to being blocked by browser extensions. It will also not record all bot requests, particularly if they are just requesting a specific page or resource and are not running in a browser.
More info - https://developers.cloudflare.com/analytics/faq/web-analytics#the-analytics-beacon-is-blocked-by-ad-blockers-including-adblockplus-brave-duckduckgo-extension-etc-why-is-that

How to add dynamic meta tags to website with no middleware or SSR

I have a relatively large app with a lot of user profile pages. I want to make it so that if you share one of the user profile pages, it will preview the user's name and picture on social media such as Facebook and Twitter (think sharing a Twitch streamer's page on Twitter). I used create-react-app to start the project, so I don't have server-side rendering or any middleware for pre-rendering tools. Is there another way I can accomplish this?
There are two ways you can get this to work.
The first is to serve your files via an Express server and check who made the request by looking at the user-agent header. If it's a bot, instead of sending the usual response you can fetch the required user profile data, use it to populate the Open Graph meta tags, and return the HTML with those meta tags.
The second is to use a network interceptor on the CDN you're using to identify who is requesting the page (a bot or a person). If it's a bot, make a request to your backend to fetch the related data and send back the HTML with populated meta tags.
Explained approach
Every request that reaches our server carries a user-agent header, which tells the server who is requesting the resource (a human's browser, or a bot such as Facebook's crawler building a link preview). We can tell the two apart by comparing that header against a list of known bot user agents, so it won't catch every bot, but it will work for all the well-known platforms and roughly 90% of the others.
Let's say we have something.com, where we want link previews, and a request comes in for something.com/john. We check the user-agent header of the incoming request. If it's a human, they get our normal site. If it's a bot, all it wants is HTML for the link preview, and since it's our server we can fetch john's data, set the proper meta tags inside our HTML, and send that back as the response.
So whenever a human visits something.com/john, they are sent to our normal landing page, because what matters to them is what they see in the browser. When a bot comes in, we send it an HTML response with the proper meta tags, because the link preview is all the bot cares about.
This can be done on our Express server, but it can also be done at the infrastructure level.
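The user-agent check described above can be sketched independently of any web framework. The sketch below is written in Python rather than as Express middleware purely to show the logic; it is not the original answer's code. The bot list is a short, incomplete sample of crawler tokens, and `handle_request`, `render_preview_html`, the `PROFILES` dictionary and the `picture_url` field are hypothetical names standing in for your own routing and backend lookup.

```python
import html
import re

# Illustrative, incomplete list of link-preview crawler user-agent tokens.
BOT_PATTERN = re.compile(
    r"facebookexternalhit|twitterbot|linkedinbot|slackbot|whatsapp|discordbot",
    re.IGNORECASE,
)


def is_preview_bot(user_agent):
    """True if the User-Agent header looks like a known link-preview crawler."""
    return bool(BOT_PATTERN.search(user_agent or ""))


def render_preview_html(profile, page_url):
    """Build a minimal HTML document carrying Open Graph tags for one profile."""
    title = html.escape(profile["name"], quote=True)
    image = html.escape(profile["picture_url"], quote=True)
    url = html.escape(page_url, quote=True)
    return (
        "<!doctype html>\n"
        "<html><head>\n"
        '<meta property="og:type" content="profile">\n'
        f'<meta property="og:title" content="{title}">\n'
        f'<meta property="og:image" content="{image}">\n'
        f'<meta property="og:url" content="{url}">\n'
        '<meta name="twitter:card" content="summary">\n'
        f"<title>{title}</title>\n"
        "</head><body></body></html>\n"
    )


def handle_request(path, user_agent, profiles, spa_html):
    """Return meta-tag HTML for preview bots, the normal app shell otherwise."""
    username = path.strip("/")
    profile = profiles.get(username)  # stand-in for a backend lookup
    if profile is not None and is_preview_bot(user_agent):
        return render_preview_html(profile, f"https://something.com/{username}")
    return spa_html


# Hypothetical data for illustration.
PROFILES = {"john": {"name": "John", "picture_url": "https://something.com/img/john.png"}}
SPA_HTML = "<!doctype html><html><body><div id='root'></div></body></html>"

bot_response = handle_request("/john", "facebookexternalhit/1.1", PROFILES, SPA_HTML)
human_response = handle_request("/john", "Mozilla/5.0 (X11; Linux x86_64)", PROFILES, SPA_HTML)
print(bot_response)
```

The same branching could live in Express middleware or in a CDN edge function; only the place where the header is read changes. Escaping the profile fields matters because user-supplied names end up inside HTML attributes.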

Yii Flash Messages not showing - possible HTTP Proxy browsing?

I'm investigating a problem a user is having with a web application that is built using Yii.
The user is not seeing the Yii 'flash' session-based user-feedback messages. These messages are shown once to a user and then destroyed (so they're not shown on subsequent page loads).
I took a look at the server access logs and I noticed something weird.
When this user requests a page, there is a second identical request, but from a different IP and with a different User Agent string. The second request often arrives at the same time, or sometimes (at most) a couple of minutes later. A bit of googling leads me to the conclusion that the user is browsing the web through an HTTP proxy.
So, is this likely to be an HTTP proxy? Or could it be something more suspicious? And if it is an HTTP proxy, does this explain why they're not seeing the flash session messages? Could it be that the messages are being 'shown' to the proxy and then destroyed?

Who knows which files should be included in a website?

When the browser requests a website, any website, from an HTTP server, which of the two parses the site's content in order to know which other files need to be included on the webpage?
What I mean is this:
the browser asks for the HTML file, then observes that it needs to import some external CSS files, and it is the one that requests them.
OR
the HTTP server, when faced with a request for a website, parses (or already knows) which files need to be linked to a certain webpage and sends them alongside the HTML page?
I'm guessing the first case is the correct one, but if someone can confirm and maybe clarify it, I'd appreciate it.
It's all done by the client (which is usually a browser). When it sees <script>, <iframe>, <img>, <link>, etc. tags that reference other documents, it downloads them if necessary.
According to Wikipedia -
The primary function of a web server is to cater web page to the
request of clients using the Hypertext Transfer Protocol (HTTP). This
means delivery of HTML documents and any additional content that may
be included by a document, such as images, style sheets and scripts.
and
The primary purpose of a web browser is to bring information resources
to the user ("retrieval" or "fetching"), allowing them to view the
information ("display", "rendering"), and then access other
information ("navigation", "following links").
It is the browser that parses the HTML and requests the associated content.
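To make the client-side discovery concrete, here is a rough sketch of what a client does after receiving the HTML: it scans the markup for tags that reference other documents and resolves each reference against the page URL. It uses Python's standard `html.parser` and covers only a handful of tags. A real browser handles many more cases (CSS `url()` references, `srcset`, the `rel` value on `<link>`, caching), so treat this as an illustration of the principle only. `ResourceCollector` and `find_subresources` are names invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute that points at another resource (a small illustrative subset).
RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collects the absolute URLs of resources referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return  # e.g. <a href>: only followed when the user navigates
        for name, value in attrs:
            if name == attr_name and value:
                # Resolve relative references against the page's own URL.
                self.resources.append(urljoin(self.base_url, value))


def find_subresources(html_text, base_url):
    """Return the URLs a client would go on to request for this page."""
    collector = ResourceCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.resources


PAGE = (
    '<html><head><link rel="stylesheet" href="/css/site.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="https://cdn.example.com/logo.png">'
    '<a href="/about">About</a></body></html>'
)

for url in find_subresources(PAGE, "https://example.com/shop/index.html"):
    print(url)
```

Each URL printed here would become a separate HTTP request issued by the client. The server never has to parse the page to decide what else to send.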

How to preprocess steps on Apache before sending a page to the User

I have an Apache web server and I need to do some processes outside the web server when a user requests a certain page.
I will try to be more clear: when a user requests page X, I have to start an external program, passing it some session parameters, wait for response, and then send the requested page to the user.
Is this possible?
I used Apache's mod_ext_filter: http://httpd.apache.org/docs/2.0/mod/mod_ext_filter.html
Performance is not great, but for my purposes it is OK.
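For anyone trying the same route, here is a rough sketch of what such a filter program might look like. mod_ext_filter hands the response body to an external command on standard input and sends whatever the command writes to standard output on to the client, so the filter can run the extra step first and then pass the page through unchanged. The script path, the filter name `sessionhook`, the `/page-x` location and the `/usr/local/bin/prepare-session` command are all hypothetical, and reading the request path from a `DOCUMENT_URI` environment variable is my understanding of what the module exports, so check it against the documentation linked above.

```python
import os
import subprocess
import sys

# Hypothetical httpd.conf wiring (filter name and paths are assumptions):
#   ExtFilterDefine sessionhook mode=output cmd="/usr/bin/python3 /usr/local/bin/session_hook.py"
#   <Location "/page-x">
#       SetOutputFilter sessionhook
#   </Location>


def run_external_step(command, uri):
    """Run the external program, wait for it, and return its exit status.

    The child inherits this process's environment, so it can read any
    request variables Apache exported. Its stdout is discarded so that it
    cannot leak into the page being sent to the user.
    """
    completed = subprocess.run(command + [uri], stdout=subprocess.DEVNULL, check=False)
    return completed.returncode


def filter_response(stdin, stdout, command, environ):
    """Run the external step, then copy the page from stdin to stdout unchanged."""
    status = run_external_step(command, environ.get("DOCUMENT_URI", ""))
    for chunk in iter(lambda: stdin.read(65536), b""):
        stdout.write(chunk)
    stdout.flush()
    return status


def main():
    # Binary streams, so the page bytes pass through untouched.
    command = ["/usr/local/bin/prepare-session"]  # hypothetical external program
    filter_response(sys.stdin.buffer, sys.stdout.buffer, command, os.environ)
```

A deployed script would end with a call to `main()`. Because one process is spawned for the filter and another for the external program on every matching request, the modest performance mentioned above is to be expected.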