Where in the pdf text did the modification likely take place? - pdf

I have a pdf file that was created on a certain date and from the meta-data it was last modified on a date after its creation.
The PDF is almost entirely text, and one sentence in it has likely been extended and had a word deleted. Can I find out whether this particular sentence was in fact (likely) modified between the creation date and the last-modification date, or rule that out?
I wasn't sure whether I could convert the PDF to a more elementary format (similar to .tex), or view it in a lower-level application (like CosEdit), to identify whether this sentence was extended and words deleted between those two dates.
Don't worry about anyone attempting to conceal the modifications in any way. That's not applicable in this instance.
Link to document: https://drive.google.com/file/d/1OFXRCw2U1mo7BjHUSGs_1fVjDsQLRo0V/view?usp=drivesdk
The relevant line is on page 5. It's the first bullet point under the title Criteria for Addressing a Property.

There is not much value or certainty in analysing a reasonably well constructed PDF, and the sample provided is of unknown pedigree. I personally would not trust a PDF history comparison over a conventional paper trail. Your query concerns the changes made to a newer copy of a public document.
We can see that the original was reported as produced by the technician using Word 2013 on 6/12/2017, potentially after drafts had been corrected by management. The source document reports 2 prior changes, which are not of concern here, since the document as it stood at that time would then (as if printed) have gone forward for final approval, master sign-off, and publication.
You provided a secondary, amended copy of the same policy document. An initial check shows that it was changed at some later point in time, but the file contains no incremental saves that could be pared back to recover earlier states, so we have to use a comparison tool to check for differences against the original.
A first look suggests 5 of 8 pages were changed (updated per annual review):
The first change is on page 3: the admin charge for 2021 is now £86 (was £75 in 2017).
The second change is on page 5; more on that later.
The third change is on page 6, where "premise" has been changed to "primary".
The fourth is on page 7, where the example "Numbered ... 1" is changed to "Lettered ... A".
Finally, on page 8 the technician has been promoted over the years and the department has been renamed.
ALL these changes would have been made in the source Word document, which in turn may have changed many more times than we can know without a paper trail showing which day the technician was formally appointed, the department changed name, or the annual charges were increased. The PDF is simply generated from whatever the source contained at the time, so all it can show is that there is a difference from the original.
Your query is whether we can tell how many times, when, or by whom page 5 was changed. As you may have gathered from the above, the short answer is usually no (not from a PDF).
The changes over time of a policy document are driven by many factors such as inflation, spell checking and proof reading changes, or changes in managerial policies.
Page 5 was changed in two places:
the semantically unnecessary word "new" was replaced with "a",
and a concession was added to the end of the paragraph:
"unless justification can be supplied"
There is no way of knowing who penned those changes; we can only guess, with some certainty, that the technician was directed to make those corrections between 2017 and 2021. Whether that direction was verbal, by email, or on paper we do not know; those are other documents. What we do know is that the final document must have been approved for PDF printing, unless your copy is unofficial.
If you wish to know more see https://www.whatdotheyknow.com/request/street_naming_information
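As a rough first check of the two points made above (whether a file carries incremental saves, and which words differ between two versions of a sentence), here is a minimal Python sketch. It assumes the page text has already been extracted with some other tool. The function names are illustrative, and the two sample sentences are made-up placeholders, not the wording of the actual policy document.

```python
import difflib


def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; an incremental save normally appends one more.

    This is only a hint: a count of 1 suggests the file was written in one go.
    """
    return pdf_bytes.count(b"%%EOF")


def word_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_words, new_words) for each differing run of words."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sentences, for illustration only.
OLD = "never allocate new number to an existing property"
NEW = ("never allocate a number to an existing property "
       "unless justification can be supplied")

for change in word_changes(OLD, NEW):
    print(change)
```

A diff like this shows what changed between the two copies, but, as said above, not when or by whom.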

What are the steps to convert a Scenario to BPMN?

I have an exam tomorrow and, to be honest, I still don't know what steps I should go through to model a given scenario.
For example, when you see a scenario like this:
Every weekday morning, the database is backed up and then it is checked to see whether the “Account Defaulter” table has new records. If no new records are found, then the process should check the CRM system to see whether new returns have been filed. If new returns exist, then register all defaulting accounts and customers. If the defaulting client codes have not been previously advised, produce another table of defaulting accounts and send to account management. All of this must be completed by 2:30 pm, if it is not, then an alert should be sent to the supervisor. Once the new defaulting account report has been completed, check the CRM system to see whether new returns have been filed. If new returns have been filed, reconcile with the existing account defaulters table. This must be completed by 4:00 pm otherwise a supervisor should be sent a message.
What is your approach to modelling this? I am not asking for the answer to this particular scenario; I am asking for the method. Do you design sentence by sentence, or do you try to figure out the big picture first and then find the sub-processes?
There are no exact steps. Use your imagination, Luke!
You can take these funny instructions as a starting point, but they were made by dummies for dummies.
Generally, you should sketch the process steps and process participants on a sheet of paper and try to build your model from that. There is no other way: just brainstorm.
When BPMN comes to mind, one thinks of people together in a conference room discussing how the business does things (creating what you call scenarios and translating to business processes) and drawing boxes and lines on a whiteboard.
Since 2012, when BPMN 2.0 appeared as an Object Management Group (OMG) specification, we have had the very comprehensive 532-page PDF file with pretty much all the information one needs to create process diagrams.
Still, in addition to reading that file, one can also find many BPMN examples of common modelling problems, patterns, books and research papers, which help to understand how certain scenarios come to life.
Generally speaking, we first identify who takes part in the process, to understand who the actors are. Next, we work out where they get their input (if they get any), what they do with it (if they do anything) and where they forward it once they have completed their work (if they forward it). This makes it possible to see that each actor has specific tasks that follow a specific flow of work, which makes the process easier to draw.
Then, once a clean and simple diagram is built, one can validate it by visualizing (IRL or not) the users / actors executing the activities.

Lotus Notes: Replication conflict caused by agent and user running on the document at same time

In one of our Lotus Notes databases, replication/save conflicts occur far too frequently because a scheduled agent and a user work on the same document at the same time.
Is there any way to avoid this?
Thanks,
H.P.
Several options in addition to merging conflicts:
Change the schedule: The best way to avoid it is to have your scheduled agents running at times when users are not likely to be accessing the system. If the LastContact field on a Client document is updated by an agent every hour as it checks all Contact documents, maybe the agent should run overnight instead.
Run the agent on user action: It may also be the case that the agent shouldn't be running on a schedule, but should be running when the user takes some action. For example, run the agent to update the Client document when the user saves the supporting Contact document.
Break the form into smaller bits: A third thing to consider is redesigning your form so that not every piece of data is on a main form. For example, if comments on recent contacts with a client are currently held in a field on the Client document, you might change the design to have a separate ClientMeeting form from which the comments on the meeting are displayed in an embedded view or computed text (or designed using XPages).
Despite the fact that I am a developer, I think rep/saves are far more often the result of design decisions than anything else.
You can use the Conflict Handling option on the form(s) in question and select either Merge Conflicts or Merge/No Conflicts in order to have Notes handle merging of edit conflicts.
From the Help database:
At the "Conflict Handling" section of the Form Info tab, choose one of the following options for the form:
Create Conflicts -- Creates conflicts so that a replication conflict appears as a response document to the main document. The main document is selected based on the time and date of the changes and an internal document sequence number.
Merge Conflicts -- If a replication conflict occurs, saves the edits to each field in a single document. However, if two users edit the same field in the same document, Notes saves one document as a main document and the other document as a response document marked as a replication conflict. The main document is selected based on the criteria listed in the bullet above.
Merge/No Conflicts -- If replication occurs, saves the edits to each field in a single document. If two users edit the same field in the same document, Notes selects the field from the main document, based on time and date, and an internal document sequence number. No conflict document is generated, instead conflicting documents are merged into a single document at the field level.
Do Not Create Conflicts -- No merge occurs. IBM® Lotus® Domino(TM) simply chooses one document over another. The other document is lost.
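The field-level behaviour described for Merge Conflicts can be pictured with a small Python model. This only illustrates the rule quoted above and is not how Domino implements it. Plain dictionaries stand in for documents, and the function and field names are hypothetical.

```python
def merge_edits(base: dict, edit_a: dict, edit_b: dict):
    """Merge two edited copies of `base` field by field.

    Returns (merged_document, conflicting_fields). A field conflicts only
    when both copies changed it, and changed it to different values.
    """
    merged = dict(base)
    conflicts = []
    for field in sorted(set(base) | set(edit_a) | set(edit_b)):
        original = base.get(field)
        a_val, b_val = edit_a.get(field), edit_b.get(field)
        a_changed, b_changed = a_val != original, b_val != original
        if a_changed and b_changed and a_val != b_val:
            conflicts.append(field)  # same field edited twice: a real conflict
        elif a_changed:
            merged[field] = a_val
        elif b_changed:
            merged[field] = b_val
    return merged, conflicts


base = {"Status": "Open", "LastContact": "2020-01-01", "Comments": ""}
user_copy = dict(base, Comments="Called client")      # user edits one field
agent_copy = dict(base, LastContact="2020-02-01")     # agent edits another
print(merge_edits(base, user_copy, agent_copy))
```

In this model, a user and an agent that touch different fields merge cleanly. A conflict remains only when both write the same field.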
In later versions of Notes there is the concept of document locking, and used properly that can prevent conflicts (but also add complexity).
Usually most conflicts can be avoided by planning to run the agents late at night when users aren't on the system. If that's not an option, then locking may be the best solution. If the conflicts aren't too many, you might benefit from adding a view filtered to show only conflicts, which would make finding and resolving them easier.
IMHO, the best answer to conflicts between users and agents is to make sure that they are operating on different documents. I.e., there are two documents with a common key. They can be parent and child if it would be convenient to show them that way in a view, but they don't have to be. Just call them DocA and DocB for the purposes of this discussion.
DocA is read and updated by users. When a user is viewing DocA, computed field formulas can pull information from DocB via @DbLookup or @GetDocField. Users never update DocB.
DocB, on the other hand, is read and updated by agents, but agents only read DocA. They never update it.
If you design your application any other way, then you either have to use locking (which creates the possibility of not being able to update something when you need to) or accept the fact that conflicts can happen occasionally and will need to be resolved.
Note that even with this strategy, you can still have conflicts if you have multiple replicas of the database. You can use the 'Conflict Handling' section of the Form properties to help minimize replication conflicts, as per @Per Henrik Lausten's answer, but since you are talking about an existing application, please also see my comment on his answer for additional info about what you would have to do in order to use this feature.
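The DocA/DocB split can be sketched as a tiny in-memory model in Python. It is an illustration of the ownership rule only. The class, method and field names are hypothetical, and a real Notes application would use forms, a shared key field and lookups instead.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    doc_a: dict = field(default_factory=dict)  # key -> fields owned by users
    doc_b: dict = field(default_factory=dict)  # key -> fields owned by agents

    def user_save(self, key, **fields):
        """Users write only to DocA."""
        self.doc_a.setdefault(key, {}).update(fields)

    def agent_update(self, key, **fields):
        """Agents may read DocA, but they write only to DocB."""
        if key not in self.doc_a:
            raise KeyError(key)
        self.doc_b.setdefault(key, {}).update(fields)

    def view(self, key):
        """What a user sees: DocA plus the values looked up from DocB."""
        return {**self.doc_a[key], **self.doc_b.get(key, {})}


store = Store()
store.user_save("C001", Name="Acme")
store.agent_update("C001", LastContact="2020-02-01")
print(store.view("C001"))
```

Because each document has exactly one kind of writer, a user save and an agent run can never collide on the same document.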
If this is a mission critical application, consider creating a database with lock-documents. That means, every time a user opens a document, a separate lock-document is created.
Then code the agent to check, for every document it wants to modify, whether a lock-document exists. If one does, skip that document.
Closing the document should remove the lock-document.
The lock-document should be created whenever the document is opened, not only when it is opened for editing. That way the agent cannot modify a document that a user has open in read mode, which guards against the user switching to edit mode afterwards and making changes.
If the agent takes a long time to modify a document, it should create lock-documents as well.
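The lock-document idea can be sketched in Python as follows. This is a minimal in-memory model under stated assumptions: `LockRegistry` stands in for the separate lock database, all names are hypothetical, and the delay with which lock-documents would reach other replicas is ignored.

```python
class LockRegistry:
    """In-memory stand-in for the separate lock-document database."""

    def __init__(self):
        self._locks = {}  # document id -> name of the lock holder

    def acquire(self, doc_id, holder):
        """Create a lock-document; return False if one already exists."""
        if doc_id in self._locks:
            return False
        self._locks[doc_id] = holder
        return True

    def release(self, doc_id, holder):
        """Remove the lock-document, but only for the holder that created it."""
        if self._locks.get(doc_id) == holder:
            del self._locks[doc_id]


def run_agent(doc_ids, locks, update):
    """Apply `update` to each unlocked document; return the ids that were skipped."""
    skipped = []
    for doc_id in doc_ids:
        if not locks.acquire(doc_id, "agent"):
            skipped.append(doc_id)  # a user has it open: leave it alone
            continue
        try:
            update(doc_id)
        finally:
            locks.release(doc_id, "agent")  # the agent always cleans up its lock
    return skipped
```

For example, after `locks.acquire("D1", "alice")`, a call to `run_agent(["D1", "D2"], locks, update)` would update only D2 and report D1 as skipped, so the agent can retry it on its next run.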

How to find the document visitor count?

I need to count the visitors to a particular document.
I can do it by adding a field and incrementing its value.
But the problem is the following:
I have 10 replicas in different locations, replicated on a schedule. Replication conflicts occur because the counter update edits the same document in different locations.
I would use an external solution for this. Just search for "visitor count" in your favorite search engine and choose a third party tool. You can then display the count on the page if that is important.
If you need to store the value in the database for some reason, perhaps you could store it as a new doc type that gets added each time (and cleaned up later) to avoid the replication issues.
Otherwise if storing it isn't required consider Google Analytics too.
I faced this problem too. I can't say that it has an easy solution. Document locking is the only solution I found, but with it the visitor count is not possible.
It is possible, but not by updating the document. Instead have an AJAX call to an agent or form with parameters on the URL identifying the document being read. This call writes a document into a tracking DB with one or two views and then determines from those views how many reads you have had. The number of reads is the return value of the AJAX form.
This can be written in LotusScript, Java or @Formulas. I would try to do it 100% in @Formulas to make it as efficient as possible.
You can also add logic to exclude reads from the same user or same source IP address.
The tracking database then replicates using the same schedule as the other database.
Daily or hourly agents can run to create summary documents and delete the detail documents so that you do not exceed the limits for @DbLookup.
If you do not need very nearly real-time counts (and that is the best you can get with a replicated system like this), you could use the web logs that Domino generates, by finding the reads in the logs and building the counts in a document per server.
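The tracking-document approach (one small record per read, summarised later by a scheduled agent) can be modelled in a few lines of Python. This is a sketch of the idea only. A list stands in for the tracking database, a dict for the summary documents, and all names are hypothetical.

```python
from collections import Counter


def record_read(tracking_log, doc_id, user):
    """Append one small tracking record instead of editing the counted document."""
    tracking_log.append({"doc_id": doc_id, "user": user})


def read_counts(tracking_log, summaries=None, unique_users=False):
    """Count reads per document: earlier summaries plus the detail records."""
    counts = Counter(summaries or {})
    seen = set()
    for rec in tracking_log:
        key = (rec["doc_id"], rec["user"])
        if unique_users and key in seen:
            continue  # optionally ignore repeat reads by the same user
        seen.add(key)
        counts[rec["doc_id"]] += 1
    return counts


def summarise(tracking_log, summaries):
    """Scheduled clean-up: fold detail records into summaries, then delete them."""
    for doc_id, n in read_counts(tracking_log).items():
        summaries[doc_id] = summaries.get(doc_id, 0) + n
    tracking_log.clear()
```

Since every read only ever creates a new record, two replicas never edit the same document, which is what removes the replication conflicts.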
/Newbs
Back in the 90s, we had a client that needed to know that each person had read a document without them clicking to sign or anything.
The initial solution was to add each name to a text field on a separate tracking document. This ran into problems when it got over 32k real fast. Then, one of my colleagues realized you could just have it create a document for each user to record that they'd read it.
Heck, you could have one database used to track all reads for all users of all documents, since one user can only open one document at a time -- each time they open a new document, either add that value to a field or create a field named after the document they've read on their own "reader tracker" document.
Or you could make that a mail-in database, so no worries about replication. Each time they open a document for which you want to track reads, it creates a tiny document that has only their name and which document they read, and this gets mailed into the "read counter database". If you don't care who read it, you have an agent that runs on a schedule, updates the count and deletes the mailed-in documents.
There really are a lot of ways to skin this cat.

Building a ColdFusion Application with Version Control

We have a CMS built entirely in house. I'm the new web developer guy with literally 4 weeks of ColdFusion experience. What I want to do is add version control to our dynamic pages, something like what WordPress does. When you modify a page in WordPress it makes some database entries and keeps a copy of each page when you save it. So if you create a page and modify it 6 times, all in one day, you have 7 different versions to roll back to if necessary. Is there an easy way to do something similar in ColdFusion?
Please note I'm not talking about source control or version control of actual CFM files, all pages are done on the backend dynamically using SQL.
Sure you can. Just stash the page content in another database table. You can do that with ColdFusion or via a trigger in the database.
One way (there are many) to do this is to add a column called "version" and a column called "live" in the table where you're storing all of your cms pages.
The column called "live" is optional but might make things easier for you in some ways when starting out.
The column "version" tells you which revision of a document in the CMS you have. By a process of elimination you could say the newest one (highest version #) is the latest and live one. However, you may need to override this some time and make an old page live, which is what the "live" column is for.
So when you click "edit" on a page, you would take the version that was clicked and copy it into a new, higher version number. It stays a draft until you click publish (at which time it's marked as 'live').
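The "version" and "live" columns can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than ColdFusion, purely to keep the example self-contained. The table layout, column names and function names are assumptions, not the schema of the CMS in question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE pages (
           page_id INTEGER NOT NULL,
           version INTEGER NOT NULL,
           live    INTEGER NOT NULL DEFAULT 0,
           content TEXT    NOT NULL,
           PRIMARY KEY (page_id, version)
       )"""
)


def save_draft(conn, page_id, content):
    """Store an edit as a new, higher version; it stays a draft (live = 0)."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM pages WHERE page_id = ?",
        (page_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO pages (page_id, version, content) VALUES (?, ?, ?)",
        (page_id, latest + 1, content),
    )
    return latest + 1


def publish(conn, page_id, version):
    """Mark exactly one version live; publishing an old version is a rollback."""
    with conn:  # commit both the draft rows and the flag change together
        conn.execute(
            "UPDATE pages SET live = (version = ?) WHERE page_id = ?",
            (version, page_id),
        )


def live_content(conn, page_id):
    """Return the content of the live version, or None if nothing is published."""
    row = conn.execute(
        "SELECT content FROM pages WHERE page_id = ? AND live = 1", (page_id,)
    ).fetchone()
    return row[0] if row else None
```

Rolling back is then just `publish(conn, page_id, older_version)`, and no history is lost because old rows are never overwritten.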
I hope that helps. This kind of an approach should work okay with most schema designs but I can't say for sure either without seeing it.
Jas' solution works well if most of the changes are to one field, for example the full text of a page of content.
However, if you have many fields, and people only tend to change one or two at a time, a new entry in the table for each version can quickly get out of hand, with many almost identical versions in the history.
In this case what I like to do is store the changes on a per-field basis in a table called ChangeHistory. I include the table name, row ID, field name, previous value, new value, and who made the change and when.
This acts as a complete change history for any field in any table. I'm also able to view changes by record, by user, or by field.
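A per-field ChangeHistory table could look roughly like this. The sketch again uses Python's sqlite3 module so that it is self-contained. The column names follow the list above, while the `pages` table and the `update_page` helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute(
    """CREATE TABLE ChangeHistory (
           table_name TEXT, row_id INTEGER, field_name TEXT,
           old_value TEXT, new_value TEXT, changed_by TEXT, changed_at TEXT
       )"""
)

# Whitelist of editable columns: column names cannot be bound as SQL parameters.
PAGE_FIELDS = ("title", "body")


def update_page(conn, page_id, changed_by, **new_values):
    """Update a page and log one ChangeHistory row per field that really changed."""
    unknown = set(new_values) - set(PAGE_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    current = conn.execute(
        "SELECT title, body FROM pages WHERE id = ?", (page_id,)
    ).fetchone()
    if current is None:
        raise LookupError(f"no page with id {page_id}")
    old = dict(zip(PAGE_FIELDS, current))
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # update and history rows are committed together
        for name, value in new_values.items():
            if old[name] == value:
                continue  # unchanged field: nothing to record
            conn.execute(f"UPDATE pages SET {name} = ? WHERE id = ?", (value, page_id))
            conn.execute(
                "INSERT INTO ChangeHistory VALUES (?, ?, ?, ?, ?, ?, ?)",
                ("pages", page_id, name, old[name], value, changed_by, now),
            )
```

Filtering ChangeHistory by `row_id`, `changed_by` or `field_name` then gives the per-record, per-user and per-field views mentioned above.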
For real-time page generation from the database, your best bet is separate "live" and "versioned" tables, because keeping all data, live and versioned, in one table will negatively impact performance. So if page generation relies on a single SELECT query from the live table, you can easily version the result set using ColdFusion's Web Distributed Data eXchange format (WDDX) via the <cfwddx> tag. WDDX is a serialized data format that works particularly well with ColdFusion data (sorta like Python's pickle, albeit without the ability to deal with objects).
The versioned table could be as such:
PageID
Created
Data
Here Data is the column storing the WDDX.
Note that you could also use the built-in JSON support for version serialization (serializeJSON and deserializeJSON), but cfwddx tends to be more stable.

VB 2008 - Extracting data from website question

I'm having more problems coming up with some code for my homework. All I've been able to do is create the form. We have to get information off a website and load certain information back into the textboxes on our form. I need a push in the right direction if someone could help.
Assignment 6 – Text Parsing
Many applications require you to extract information from a block of text. We will be using this heavily in the project. This project is designed to give you some practice extracting information out of text files. You will need to open up a file, search the file for some specific content, and then copy that content and place it on screen.
Go to Jobs.com and select a state that starts with the same letter as your last name (if there is no state that starts with the same letter, use the second letter, then the third letter, etc.). For example if my name was ‘John Byway’, since there was no B or Y, I would pick a state starting with W – either Washington, West Virginia, Wisconsin, or Wyoming.
The idea is you want to extract all the jobs information out of the page. We want to put the job names in a combo box, so the user can pick any of the jobs. Ideally, this would also show the information about each job. You don't need to do that. However, you do need to extract the information and put it on the screen. (When you load the next job, the old job information will be lost, and you will be left with the information about the last job loaded.)
Tasks
1. Go to the above address, view the source in the browser, copy and paste it into a text file. You do not need to access the Source of the web page within your application.(in IE go to View / Source; In FireFox go to View / Page Source)
3. Extract each job title and place the name in the Combo box.
4. Find and extract the following fields. Note, some jobs may not list all of these. In that case, get as many as are presented.
a. Date
b. Title of job
c. Company
d. Location
e. Description of job
f. URL associated with the “More” for each description
g. Experience level
h. Career level
i. Education level
Indicate on screen how many jobs you found. Note, you only need to look on the first page of jobs
RegEx is a great way to do the kind of text parsing you need. Here are a couple of links:
VB Dot Net Heaven Intro to RegEx
Discussion about parsing HTML with VB.Net
Note: A well-defined RegEx pattern will get the heavy lifting done for you in this assignment in a dozen lines of code.
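To show the shape of that approach, here is a minimal sketch in Python; the same patterns would carry over to the .NET Regex class in System.Text.RegularExpressions. The HTML fragment, tag names and class names are made up for illustration. The real page source will look different, so the patterns would need adapting.

```python
import re

# Hypothetical markup: the real page source will use different tags and classes.
SAMPLE = """
<div class="job"><a class="title" href="/job/101">Welder</a>
  <span class="company">Acme Corp</span><span class="location">Casper, WY</span></div>
<div class="job"><a class="title" href="/job/102">Nurse</a>
  <span class="location">Laramie, WY</span></div>
"""

# One pattern to split the page into job blocks, one pattern per field.
JOB_BLOCK = re.compile(r'<div class="job">(.*?)</div>', re.DOTALL)
FIELDS = {
    "title": re.compile(r'<a class="title"[^>]*>(.*?)</a>'),
    "url": re.compile(r'<a class="title" href="([^"]*)"'),
    "company": re.compile(r'<span class="company">(.*?)</span>'),
    "location": re.compile(r'<span class="location">(.*?)</span>'),
}


def extract_jobs(html):
    """Return one dict per job block; fields missing from a block map to None."""
    jobs = []
    for block in JOB_BLOCK.findall(html):
        job = {}
        for name, pattern in FIELDS.items():
            match = pattern.search(block)
            job[name] = match.group(1).strip() if match else None
        jobs.append(job)
    return jobs


jobs = extract_jobs(SAMPLE)
print(len(jobs), "jobs found:", [job["title"] for job in jobs])
```

Splitting into blocks first and then searching each block keeps a field that is missing from one job from being filled with the next job's value, which matters because the assignment says not every job lists every field.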
P.S. For future reference and perhaps for now... cut the nonsense out of your question. That whole bit about choosing a state with a letter from your name is ridiculous. Please take it out. It's just confusing.