Summarize browsing records in specific URL for specific user into one record regarding date in SQL Server - sql

I'm facing a problem in my report. The report calculates how many times a specific (company's applications) url's were viewed/opened ignoring user data.
What I need is for the query to count the times viewed regarding that every user might have not only opened the application but also browse in it (i.e. filter something, but it is still the same application), then the data shows that the same user in minutes or seconds difference opened the same application - every click/filtering/recompiling the page etc. makes a new entry record, which is misleading, because the report shows how many times the application was opened as an individual record. The applications (which are the same in every country) are used in different countries therefore the log data is in different servers.
There are 4 tables from different servers, which have log entry data of the applications (url's), and they have to be inserted into one with already summarizes log entry data.
A small piece of the one table with the data:
A small piece of the second table just to see that the only difference between tables is litintranet, wokintranet:
There you can see that for the LogApp IFP the same user browsed with difference of seconds. But it should have only one record (just for opening the app), but has 3 records because the user probably filtered something or refreshed the page etc.
I need a query that summarizes this information and enters the new summarized / reduced records into a new table. The new table will be used for reports as the correct data of records.
The output should look like this:
How can the summarizing be done?
Thank you for your help

Related

report scheduler system design using database as master

Problem
we have ~50k scheduled financial reports that we periodically deliver to clients via email
reports have their own delivery frequency (date&time format - as configured by clients)
weekly
daily
hourly
weekdays only
etc.
Current architecture
we have a table called report_metadata that holds report information
report_id
report_name
report_type
report_details
next_run_time
last_run_time
etc...
every week, all 6 instances of our scheduler service poll the report_metadata database, extract metadata for all reports that are to be delivered in the following week, and puts them in a timed-queue in-memory.
Only in the master/leader instance (which is one of the 6 instances):
data in the timed-queue is popped at the appropriate time
processed
a few API calls are made to get a fully-complete and current/up-to-date report
and the report is emailed to clients
the other 5 instances do nothing - they simply exist for redundancy
Proposed architecture
Numbers:
db can handle up to 1000 concurrent connections - which is good enough
total existing report number (~50k) is unlikely to get much larger in the near/distant future
Solution:
instead of polling the report_metadata db every week and storing data in a timed-queue in-memory, all 6 instances will poll the report_metadata db every 60 seconds (with a 10 s offset for each instance)
on average the scheduler will attempt to pick up work every 10 seconds
data for any single report whose next_run_time is in the past is extracted, the table row is locked, and the report is processed/delivered to clients by that specific instance
after the report is successfully processed, table row is unlocked and the next_run_time, last_run_time, etc for the report is updated
In general, the database serves as the master, individual instances of the process can work independently and the database ensures they do not overlap.
It would help if you could let me know if the proposed architecture is:
a good/correct solution
which table columns can/should be indexed
any other considerations
I have worked on a differt kind of sceduler for a program that reported analyses on a specific moment of the month/week and what I did was combining the reports to so called business cycle based time moments. these moments are on the "start of a new week", "start of the month", "start/end of a D/W/M/Q/Y'. So I standardised the moments of sending the reports and added the id's to a table that would carry the details of the report. - now you add thinks to the cycle of you remove it when needed, you could do this by adding a tag like(EOD(end of day)/EOM (End of month) SOW (Start of week) ect, ect, ect,).
So you could index the moments of when the clients want to receive the reports and build on that track. Hope that this comment can help you with your challenge.
It seems good to simply query that metadata table by all 6 instances to check which is the next report to process as you are suggesting.
It seems odd though to have a staggered approach with a check once every 60 seconds offset by 10 seconds for your servers. You have 6 servers now but that may change. Also I don't understand the "locking" you are suggesting, why now simply set a flag on the row such as [State] = "processing", then the next scheduler knows to skip that row and move on to the next available one. Once a run is processed, you can simply update a [Date_last_processed] column, or maybe something like [last_cycle_complete] = 'YES'.
Alternatively you could have one server-process to go through the table, and for each available row, sends it off to one of the instances, in a round-robin fashion (or keep track of who is busy and who isn't).

Value only showing the first item in SSRS report

So my problem here is that I have a Part number which lives in two warehouses hence it has two bin locations. If I just use =Fields!PrimBin.Value it only ever returns the first location. I need to display the PrimBin if the location is from a specific warehouse. To get the warehouse I use =Fields!WarehouseCode.value
What I need to do is only show the PrimBin.Value of MAINWHSE and not CELLWHSE
Thanks in advance.
Ok so the database it quite vast. However, for the information required I am using two tables. Part and PimWhse.
Part shares the Product ID to PrimWhse. In PrimWhse each partID has two locations "MAINWHSE", "CELLWHSE "and 1 bin to pick in each warehouse giving to possible locations.
So WarehouseCode.Value will have the information for which warehouse the part is located. and PrimBin.Value will have the warehouse position ID stored in it.
This is all setup via report style within the Epicor system. When I create a query in business activity to look in MAINWHSE it shows the correct information.
However, in the report data builder I'm not able to set this query so I assume SSRS will be able to see of both theses possible values for PrimBin.Value!? If not I guess I need to work out how to add a query to report data builder, which at the moment does no seem possible?
Thanks again.

Use domain of one table for criteria in another in ms Access query?

I am trying to create a report that displays 3 different numbers for each of my projects.
Contract Hours - Stored in projects table, 1 to 1 relationship
Worked Hours - Stored in linked table that will be updated using an external website reporting feature that will contain only data for the dates that are to be displayed in the report, one to many relationship needs to be a sum
Allocated Hours - Stored in a table in my database called allocations and contains data for all dates, one to many relationship needs to be summed.
Right now i have it set up in a way that the user has to type the data range for the report every time it is run, however the date range only actually applies to the Allocation data because the worked hours data comes filtered and the contract data is one to one.
What I would like to do is set up a query that can see the domain of the worked hours and apply it as a date criteria for the allocated hours.
I have attempted to use max and min values of the Worked hours and tried to get creative but I'm actually not even sure if this is possible because I cannot see any simple solution (although I know it should be possible and fairly simple)
Any help, suggestions, or recommendations are appreciated.

How to fetch data for a news feed like system?

I have few tables as shown below
Polls
PollId Question Option
1 What 1
2 Why 4
Updates
UpdateId Text
1 Sleep
2 Play
Polls and updates are just two sample tables (In reality there are more tables like ,photos, videos,links etc). But when a user visit his home (like facebook new feed) he must be displayed with data relevant to him (no such data included in this example). ie I want to select data from all tables with less number of query executions. (ie, I want to present a mixture of datas, ie polls, photos, videos etc )
Currently, I'm fetching only ids and type (ie which table) from all of the tables and gather further data while iterating through this resultset. (ie from c# calling another SqlQuery) .
Is there a way to query the data from whole tables at once? (OUTER JOIN?, UNION?)
Or simply,
How can I select different type of entities at once in a single sql Query?
You could write your query so that you have one long select list for everything you want and it all comes back in one result set but I suspect that wouldn't work too well because you might have varying numbers of different types of items per user.
If you really must have it all in one hit then you can issue multiple queries in one go and get multiple result sets back. To handle this you can use an ADO.Net DataSet. See this SO example (but not the accepted answer - see Vikram Dibyal's answer as that gives a very basic overview of what I think you're asking for).
I won't copy and paste the stuff from the linked thread, just head over and take a look.

SSRS Data-Driven Subscription [based on static Subscription table] Not Picking Up Changes Made to Subscription Table

I have a .RDL report which I designed in BIDS and have deployed to my report server. The report asks for three parameters before viewing report: Year, Month and Customer ID. The report works great and does exactly what it is supposed to.
While I used to run each report individually because there were 2-3 customers, now there are 30+ customers who receive the report, so I wanted to switch to a more automated fulfillment method to get the reports generated. After doing some research it appears that a using Report Manager to create a "Data Driven Subscription" (DDS) using the "Windows File Share" option gives me the capabilities I need.
As part of creating the DDS, I created a table called [Subscription] which is a table containing one row for each customer receiving the report and has the following columns:
Year
Month
CustomerID
FileName
FileLocation
Overwrite
Format
...so through using the DDS Wizard in Report Manager, I was able to successfully set up a Data Driven Subscription (which is linked to various columns in the [Subscription] table) which creates a new report for each customer in the [Subscription] table, saves [and overwrites, if necessary] it in a location of my choosing as a PDF (specified in [Subscription].[FileLocation], or the FileLocation column of my table for each row), and runs every minute (I plan on changing frequency to once a week, eventually).
This works flawlessly, giving me a new set of 30 reports in the directory of my choosing, with each report having a name I assigned in the FileName column of my table. Exactly what I was looking for.
HERE'S THE PROBLEM: When I update the FileLocation or FileName (or anything, really) in the [Subscription] table - it doesn't pick up the changes right away. Sometimes it doesn't even pick it up at all (for example I updated the [ReportName] column for one customer from Report_711622 to SpecialReport_711622, so that the output file for that customer should be named SpecialReport_711622 while all of the other reports should be called Report_XXXXX [no Special prefix]. But the file name of report for Customer 711622 remains the same!
It's almost like the job only see's what it needs to do once a day, and then does not go back and reference the [Subscription] table until I leave for the night, then when I come back in the morning it picks up the change.
Since I am about to scale this process out to a large customer-base using a different report, I need to be able to make edits to the [Subscription] table and have them get picked up by the Data Driven Subscription immediately (and if not immediately, at least a fixed interval of time that I can adjust, so that I can know 100% when the change will get picked up).
Does anyone know what's causing my lag? How do I change it so that updates to the Subscription table get picked up regularly? I'm also having issues with creating new DDS on other reports (following the exact process outlined above) - I've created the subscriptions, for every minute, and it says they are running and the number of outputs match the number of customers with 0 errors, but there are no files in the drive I specified (or anywhere else I've looked, for that matter).
Any help would be greatly appreciated!
I think the answer lies in the mechanism SSRS uses. There are a few places "lag" can occur.
The subscription is in fact an SQL Agent job which creates a record in the Event table. This table is a queue that SSRS checks to do scheduled tasks.
There is a small amount of time between the moment the subscription creates the Event record and the moment SQL reads it and starts creating the dataset for your DDS. The creation of the DDS dataset takes some time, too. In this time, the subscription will be in the Pending state. If you change anything in the data during this time, The subscription will still use the old data as report parameters. So obviously you will not notice your change until the next scheduled run.
Which brings me to the following: if a subscription is still being run and the next schedule kicks in (chances are, because yours runs every minute), the engine will not execute it, but wait for the next subscription schedule, and so on. So that's another possibility of lag - and cause of missing reports for a certain schedule minute. The subscription processes reports sequentially, one row from your DDS recordset at a time. Again, this takes some time. You can also see that in the subscription window when it says: # of # processed.
I suggest you look at the Event table in the database ReportServer during an execution. Also the ExecutionHistory views (there are 3) may be interesting. A scheduled run shows up as a RequestType = 1 and generates one record for each report. You can see the exact timing and parameters of each report that is run in the subscription. You may be able to extract the data you need to resolve your other issues.
EDIT: Here is a more elaborate guide to DDS data and events
http://blogs.msdn.com/b/deanka/archive/2009/01/13/diagnosing-and-troubleshooting-subscriptions.aspx
http://blogs.msdn.com/b/deanka/archive/2010/02/16/troubleshooting-subscriptions-part-ii-using-the-report-services-trace-log-file.aspx
Could this "Double-Hop" problem be the source of my issues? I'm so stuck on this one!
The Double-Hop Problem - MSDN Knowledgecast