SQL Web Traffic Query

I came across this question a couple of days ago and couldn't find an optimal solution. We have the following table structure for storing basic web traffic logs of users who visit a particular website.
Table name: [tblWebtraffic]
Columns: Id,IPAddress,PageName,Date
I want a single query (i.e. a single SELECT statement) that returns the total visits, the total unique visitors (based on IPAddress) and the total unique pages visited over the last 60 days.
PS: This is my first question on this site, so forgive me if some details are missing. :)
EDIT: I am using a SQL Server Database.

SELECT pageName, count(*) AS pageHits FROM tblWebtraffic GROUP BY pageName
will give you hits per page; a slight alteration will give unique page hits:
SELECT pageName, count(DISTINCT ipAddress) AS uniquePageHits FROM tblWebtraffic GROUP BY pageName
Of course, removing the GROUP BY on pageName will give you the hits for the entire site:
SELECT count(*) AS siteHits FROM tblWebtraffic
SELECT count(DISTINCT ipAddress) uniqueSiteHits FROM tblWebtraffic
PS: my first attempt at an answer, so if anything is missing please let me know :)
edit: these work in Microsoft SQL Server (Transact-SQL). I'm less familiar with MySQL, but I just tried SELECT count(DISTINCT fieldname) there and it worked.
edit2: thanks for the edits - the code formatting looks great
edit3: answering the question :)
SELECT count(*) AS siteHits, count(DISTINCT ipAddress) AS uniqueSiteHits, count(DISTINCT pageName) AS uniquePages FROM tblWebtraffic WHERE DATEDIFF(d, [Date], getDate()) <= 60
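A minimal, runnable sketch of the final combined query, using Python's built-in sqlite3 module as a stand-in for SQL Server (the sample rows and the fixed reference date are made up for illustration). SQLite has no DATEDIFF, so the 60-day window is written as a comparison against a precomputed cutoff with date(..., '-60 days'); in T-SQL a comparable filter would be something like [Date] >= DATEADD(day, -60, CAST(GETDATE() AS date)).

```python
import sqlite3

# Hypothetical sample data; dates are stored as ISO-8601 text ('YYYY-MM-DD')
# so that string comparison matches date order.
rows = [
    (1, "10.0.0.1", "home", "2024-03-01"),
    (2, "10.0.0.1", "about", "2024-03-02"),
    (3, "10.0.0.2", "home", "2024-03-02"),
    (4, "10.0.0.3", "contact", "2023-11-01"),  # older than 60 days, filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblWebtraffic (Id INTEGER, IPAddress TEXT, PageName TEXT, Date TEXT)")
conn.executemany("INSERT INTO tblWebtraffic VALUES (?, ?, ?, ?)", rows)

# One SELECT returns all three totals. The cutoff is derived from a fixed
# reference date (instead of "today") so the example is reproducible;
# comparing the column against a constant also lets an index on Date be used.
query = """
SELECT COUNT(*)                  AS siteHits,
       COUNT(DISTINCT IPAddress) AS uniqueSiteHits,
       COUNT(DISTINCT PageName)  AS uniquePages
FROM tblWebtraffic
WHERE Date >= date(:today, '-60 days')
"""
site_hits, unique_visitors, unique_pages = conn.execute(query, {"today": "2024-03-10"}).fetchone()
print(site_hits, unique_visitors, unique_pages)
```

With these rows only the three March visits fall inside the window, so the totals are 3 visits, 2 unique IP addresses and 2 unique pages.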

Related

Disagreement between BigQuery and Google Analytics 4 for Pageviews - why?

I have a large table of Google Analytics 4 (GA4) events in BigQuery for a number of websites I look after. The table has the following schema:
||field name||type||
|event_date|date|
|event_timestamp|integer|
|event_name|string|
|event_key|string|
|event_string_value|string|
|event_int_value|integer|
|event_float_value|float|
|event_double_value|float|
|user_pseudo_id|string|
|user_first_touch_timestamp|integer|
|device_category|string|
|device_model_name|string|
|device_host_name|string|
|device_web_hostname|string|
|geo_country|string|
|geo_city|string|
|traffic_source_name|string|
I query the table to get the total number of pageviews for a specific site using the following query:
with date_range as (
select
'20220601' as start_date,
'20220630' as end_date)
select
count(distinct case when event_name = 'page_view' then concat(user_pseudo_id, cast(event_timestamp as string)) end) as pageviews
from
`project_name.datset_name.table_name`,
date_range
WHERE
event_date BETWEEN PARSE_DATE('%Y%m%d',date_range.start_date) AND PARSE_DATE('%Y%m%d',date_range.end_date)
AND device_web_hostname in ("www.website_name.com")
What puzzles me is that for some sites the pageview figure is out by several hundred pageviews; the BigQuery figure is higher. What is interesting is that:
If I try other events, such as sessions, there are no issues
As stated, it is only for some sites and not all
I know enough to know that:
These numbers are never going to agree, but they shouldn't be out by several hundred either
BigQuery holds the unprocessed GA4 data, so the way I am querying it differs from how it is processed in the GA4 interface
I have tried:
Looking at the GA4 documentation to see how pageviews are used/processed; I can't see anything that enlightens me
Debugging each site to make sure tags are firing correctly; they are
I've hit a bit of a wall with this and I'd be grateful if anyone has any insight to point me in another possible direction. Thanks in advance!
The issue lies in the following part of the code:
select
count(distinct case when event_name = 'page_view' then concat(user_pseudo_id, cast(event_timestamp as string)) end) as pageviews
You are counting distinct values of the concatenation of user_pseudo_id and event_timestamp, which is not unique. You also need to include the session ID to identify each hit uniquely.
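A minimal sketch of the answer's point, using Python's built-in sqlite3 as a stand-in for BigQuery. The table, its session_id column and the values are hypothetical: in the flattened schema from the question the session ID would first have to be extracted from the event_key / event_int_value rows, which this sketch does not attempt. It shows how the choice of key changes what COUNT(DISTINCT ...) treats as one pageview; the '|' separator between parts is there so that, for example, 'u1' + '23' and 'u12' + '3' cannot collide.

```python
import sqlite3

# Hypothetical, simplified events table with the session ID as its own column.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE events (
    event_name TEXT, user_pseudo_id TEXT, session_id INTEGER, event_timestamp INTEGER)""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("page_view", "u1", 111, 1654041600000000),
        ("page_view", "u1", 222, 1654041600000000),  # same user and timestamp, other session
        ("page_view", "u2", 333, 1654041700000000),
        ("session_start", "u2", 333, 1654041700000000),  # not a page_view, ignored
    ],
)

# Key built from user + timestamp only, as in the question's query.
by_user_ts = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN event_name = 'page_view'
                               THEN user_pseudo_id || '|' || event_timestamp END)
    FROM events""").fetchone()[0]

# Key that also includes the session ID, as the answer suggests.
by_user_session_ts = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN event_name = 'page_view'
                               THEN user_pseudo_id || '|' || session_id || '|' || event_timestamp END)
    FROM events""").fetchone()[0]

print(by_user_ts, by_user_session_ts)
```

Here the two u1 page_view rows collapse into one under the user + timestamp key (2 in total), but stay separate once the session ID is part of the key (3 in total).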

how to get transactions on a database for a specific time

I want to get the users from a PostgreSQL database whose activity has not been seen for a specific period of time. (Basically, I am trying to find the users who are not using the application at all.)
For example, the following SQL query is meant to find users who have not used it in the last 30 days:
SELECT distinct on (username) username, started_at
FROM projects_user JOIN projects_synclog
ON projects_user.id = projects_synclog.user_id
WHERE started_at BETWEEN '2019-08-15' AND '2019-09-15'
ORDER BY username, started_at DESC
This query shows all the users. For example, a user may have logged in once a month ago and again 2 days ago; that user is still active, so I don't want them listed.
I have been trying this for countless hours and have searched a lot for solutions here and on other forums found through Google.
I would highly appreciate any help.
Thanks a lot.
I think you want aggregation and HAVING. This answers the question in the title:
SELECT pu.username, max(sl.started_at)
FROM projects_user pu LEFT JOIN
projects_synclog sl
ON pu.id = sl.user_id
GROUP BY pu.username
HAVING MAX(sl.started_at) IS NULL OR
MAX(sl.started_at) < CURRENT_DATE - INTERVAL '7 DAY'
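A minimal sketch of the aggregation-plus-HAVING approach, run through Python's built-in sqlite3 instead of PostgreSQL. The sample users and log rows are made up, a fixed reference date replaces CURRENT_DATE so the result is reproducible, and the window is 30 days (as in the question) rather than the 7 days used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects_user (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE projects_synclog (user_id INTEGER, started_at TEXT);
INSERT INTO projects_user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
-- alice: an old sync plus a recent one (active); bob: only an old sync; carol: never synced
INSERT INTO projects_synclog VALUES (1, '2019-08-01'), (1, '2019-09-13'), (2, '2019-08-01');
""")

# LEFT JOIN keeps users with no log rows at all (MAX is then NULL);
# HAVING filters on each user's most recent activity, not on individual rows.
inactive = conn.execute("""
SELECT pu.username, MAX(sl.started_at) AS last_seen
FROM projects_user pu
LEFT JOIN projects_synclog sl ON pu.id = sl.user_id
GROUP BY pu.username
HAVING MAX(sl.started_at) IS NULL
    OR MAX(sl.started_at) < date(:today, '-30 days')
ORDER BY pu.username
""", {"today": "2019-09-15"}).fetchall()
print(inactive)
```

Because the filter is applied to MAX(started_at), alice's old sync no longer makes her appear: only bob (last seen 2019-08-01) and carol (never seen) are returned.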

How to select the correct date in the same column when data in different rows are equal

I have the following problem with my data on a DB2 database. I want to create an overview when a machine was used for a project with a begin and end date.
The following data is available:
||Machine name||Description||Project||Test||Start date||
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-03-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|16-03-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|24-04-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_2|13-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|22-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|12-06-2017|
The result that I'm looking for is:
||Machine name||Description||Project||Test||Start date||Last date||
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|07-03-2017|07-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_2|13-05-2017|13-05-2017|
|Mach1|DB2_AIX|Team_1_PERS|TEST_1|22-05-2017|12-06-2017|
Does anybody have an idea how to create this result with a single statement?
This is a classic gaps-and-islands problem, and the standard solutions will work just fine:
WITH Grouped_Run AS (SELECT name, description, project, test, executedOn,
ROW_NUMBER() OVER(ORDER BY executedOn) -
ROW_NUMBER() OVER(PARTITION BY name, description, project, test ORDER BY executedOn) AS groupingId
FROM Machine)
SELECT name, description, project, test, MIN(executedOn) AS testStart, MAX(executedOn) AS testEnd
FROM Grouped_Run
GROUP BY name, description, project, test, groupingId
ORDER BY testStart
(It's a little unclear whether the group should be the whole row, but that's adjustable.)
This will produce the results you're looking for.
Note that depending on what specific version you're on, there may be other/faster ways to achieve these results.
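A minimal sketch of the gaps-and-islands query, run through Python's built-in sqlite3 instead of DB2 (window functions need SQLite 3.25 or later). It reuses the answer's column names (name, description, project, test, executedOn), includes MAX(executedOn) to get the last date the question asks for, and assumes the TEST_2 row is dated 13-05-2017, since it sits between the May 2017 rows and its expected last date is in 2017.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Machine (name TEXT, description TEXT, project TEXT, test TEXT, executedOn TEXT)")
rows = [
    ("TEST_1", "2017-03-07"), ("TEST_1", "2017-03-16"), ("TEST_1", "2017-04-24"),
    ("TEST_1", "2017-05-07"),
    ("TEST_2", "2017-05-13"),  # year assumed to be 2017
    ("TEST_1", "2017-05-22"), ("TEST_1", "2017-06-12"),
]
conn.executemany("INSERT INTO Machine VALUES ('Mach1', 'DB2_AIX', 'Team_1_PERS', ?, ?)", rows)

# The difference of the two ROW_NUMBERs stays constant within each unbroken
# run of the same test, so grouping by it yields one row per run ("island").
islands = conn.execute("""
WITH Grouped_Run AS (
  SELECT name, description, project, test, executedOn,
         ROW_NUMBER() OVER (ORDER BY executedOn)
       - ROW_NUMBER() OVER (PARTITION BY name, description, project, test
                            ORDER BY executedOn) AS groupingId
  FROM Machine)
SELECT test, MIN(executedOn) AS testStart, MAX(executedOn) AS testEnd
FROM Grouped_Run
GROUP BY name, description, project, test, groupingId
ORDER BY testStart
""").fetchall()
for row in islands:
    print(row)
```

The two TEST_1 runs get different groupingId values (0 and 1) because the TEST_2 row interrupts them, which is what keeps them as separate result rows.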
It seems like you're trying to get the first and last of "Start date". Write a GROUP BY query with MIN(Start date) and another with MAX(Start date) then union the results. You'll have to select DISTINCT or do another GROUP BY to eliminate the duplicates that will occur when there's only one date.

sql count all items that day until start of database isn't working because of time

I am trying to count the items (deployments) in a database table. I have counted the total number of items, 3879, like this:
use Bamboo_Version6
go
SELECT Count(STARTED_DATE)
FROM [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
But I have been struggling to get the number of items per day, going back to the start. I have tried some of the answers to similar questions, like:
select STARTED_Date, count(deploymentID)
from [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
WHERE STARTED_Date>=dateadd(day,datediff(day,0,STARTED_Date)- 7,0)
GROUP BY STARTED_Date
But this returns every ID with a 1 beside it, because the dates include times, which makes each value unique. I tried CONVERT(varchar(12),STARTED_DATE,110) to fix the problem, but it still happens. How can I count per day instead of getting every ID with a count of 1?
Remove the time component:
select cast(STARTED_Date as date) as dte, count(deploymentID)
from [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]
group by cast(STARTED_Date as date)
order by dte;
I'm not sure what the WHERE clause is supposed to be doing, so I just removed it. If it is useful, add it back in.
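A minimal sketch of the "remove the time component" idea, using Python's built-in sqlite3 as a stand-in for SQL Server. SQLite has no CAST(... AS date), so its date() function, which keeps only the YYYY-MM-DD part of a timestamp string, plays that role here; the sample deployments are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEPLOYMENT_RESULT (deploymentID INTEGER, STARTED_DATE TEXT)")
conn.executemany(
    "INSERT INTO DEPLOYMENT_RESULT VALUES (?, ?)",
    [(1, "2019-01-01 09:15:00"), (2, "2019-01-01 17:40:12"),
     (3, "2019-01-02 08:00:00"), (4, "2019-01-04 23:59:59")],
)

# Group by the date-only expression (not the raw timestamp), so all
# deployments on the same calendar day fall into one group.
per_day = conn.execute("""
SELECT date(STARTED_DATE) AS dte, COUNT(deploymentID) AS deployments
FROM DEPLOYMENT_RESULT
GROUP BY date(STARTED_DATE)
ORDER BY dte
""").fetchall()
print(per_day)
```

The key detail is that the same expression appears in both SELECT and GROUP BY; converting only in the SELECT list would still group by the full timestamp.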
Another way is to use a window function with an OVER clause; SELECT DISTINCT collapses the per-row results to one row per day:
SELECT DISTINCT cast(STARTED_DATE AS DATE) AS Deployment_date,
COUNT(deploymentID) OVER (PARTITION BY cast(STARTED_DATE AS DATE)) AS NumberofDeployments
FROM [Bamboo_Version6].[dbo].[DEPLOYMENT_RESULT]

SQL Query on find individuals that age is over 60 as of a specific date

I'm new here, so please go easy on me. I've looked all over the web and can't find anything that really helps.
So I need to provide details of all regular academics working in the Computing Department who were over 60 years old as of 31/12/2014.
My trouble is how to approach showing data for someone aged 60+. Could you subtract one date from another, or is there an SQL command that I am missing?
my attempt:
SELECT *
FROM staff, department
WHERE DOB <= '31/12/1964'
AND staff.department_ID = department.department_ID
There are functions to calculate the difference between dates, but the most efficient approach is to first calculate the latest birth date at which a person would be 60 on 2014-12-31. That way you compare the column directly to a value, so the database can use an index if there is one.
Example for Oracle:
select
PersonId, FirstName, LastName
from
Person
where
Born <= add_months(date '2014-12-31', -60 * 12)
Example for MySQL (even though you removed the MySQL tag):
select
PersonId, FirstName, LastName
from
Person
where
Born <= date_sub('2014-12-31', interval 60 year)
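A minimal sketch of the "compute the cutoff once, then compare" approach, using Python's built-in sqlite3 (SQLite's date() modifier '-60 years' plays the role of add_months or date_sub). The Person rows are made up; note that, like the examples above, it uses <=, so someone turning exactly 60 on 2014-12-31 is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PersonId INTEGER, FirstName TEXT, LastName TEXT, Born TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Lee", "1954-12-31"),   # turns 60 exactly on the reference date
     (2, "Bob", "Ray", "1955-01-01"),   # still 59 on 2014-12-31
     (3, "Cy", "Fox", "1950-06-01")],
)

# Compute the latest qualifying birth date once...
cutoff = conn.execute("SELECT date('2014-12-31', '-60 years')").fetchone()[0]
# ...then compare the column directly against that constant.
ids = [r[0] for r in conn.execute(
    "SELECT PersonId FROM Person WHERE Born <= ? ORDER BY PersonId", (cutoff,))]
print(cutoff, ids)
```

The cutoff comes out as 1954-12-31, so persons 1 and 3 qualify and person 2 does not.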
I think in SQL Server:
SELECT DATEDIFF(DAY, '05-19-2015', '05-21-2015')
In MySQL:
SELECT TIMESTAMPDIFF(HOUR, start_time, end_time)
as difference FROM timeattendance WHERE timeattendance_id = '1484'
The Oracle add_months function will help you:
where add_months(yourfield, 60*12) <= date '2014-12-31'