I have a large document library (at the moment ~6000 documents) and I need to find a document based on a custom field value (custom column on the library).
Is there a way of getting this document back without iterating through all 6000 documents?
I understand that an iteration must occur at some point, but I would prefer it to happen on the SharePoint server side, rather than transfer them all to the client side then cherry pick the document.
Thanks
You can query Sharepoint. You issue a CAML query which is executed on the server and brings back only items that match the criteria that you specified. You specify the name of the custom column to search on and you specify the value to find. For efficiency , you can ask only for a few fields back (document url for example). So, you do not need to iterate over documents in the list to find the item.
You can find some discussion here:
http://msdn.microsoft.com/en-us/library/ee956524.aspx and you can also find examples how to do it from javascript or silvelight.
Example CAML:
CamlQuery camlQuery = new CamlQuery();
camlQuery.ViewXml =
#"<View>
<Query>
<Where>
<Eq>
<FieldRef Name='FileLeafRef'/>
<Value Type='Text'>Test.docx</Value>
</Eq>
</Where>
<RowLimit>1</RowLimit>
</Query>
</View>";
Related
I've poured through various articles but haven't found an answer to my exact scenario. I've a SharePoint 2010 list with some Query Parameters for filter purposes. My CAML Query works well for filtering except in one circumstance, I'd like a General Display All criteria in my query for when the listview is first hit (i.e. my client will actively be able to see/page/sort the data without first having to search the list).
If I was just filtering the list fields I'd be set, but since I'm referencing Query Parameters in my CAML I'm receiving SOAP errors in SharePoint Designer.
The SQL equivalent would be: Where (#Parameter1 is null and #Parameter2 is null and #Parameter3 is null...)
I've tried this structure:
<Or Group="true">
<And>
<And>
<IsNull>
<FieldRef Name ="Title"/>
<Value Type="Text">{RollNum}</Value>
</IsNull>
<Gt>
<FieldRef Name="ID"/>
<Value Type="Counter">
<IfEqual>
<Expr1><![CDATA[{Param1}]]></Expr1>
<Expr2/>
<Then>0</Then>
<Else>2147483647</Else>
</IfEqual>
</Value>
</Gt>
</And>
<IsNull>
<FieldRef Name ="RefNumber"/>
<Value Type="Text">{RefNum}</Value>
</IsNull>
</And>
</Or>...the rest of the query, which works fine.
I have a feeling my structure is incorrect.
Thanks in advance,
Brian H.
You can use this syntax to determine if a field is null:
<Where><IsNull><FieldRef Name='YourFieldName' /></IsNull></Where>
You have an extra Value element within IsNull that you should remove.
I decided to use an HTML Form Web Part to achieve my goal (instead of a content editor). The HTML form web part keeps all filter values entered into any boxes, and can be wired up to filter my list using parameters and a custom caml query.
I have an application which creates datarequests which can be quite complex. These need to be stored in the database as tables. An outline of a datarequest (as XML) would be...
<datarequest>
<datatask view="vw_ContractData" db="reporting" index="1">
<datefilter modifier="w0">
<filter index="1" datatype="d" column="Contract Date" param1="2009-10-19 12:00:00" param2="2012-09-27 12:00:00" daterange="" operation="Between" />
</datefilter>
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Department" param1="Stock" param2="" operation="Equals" />
</alternation>
<alternation index="2">
<filter index="1" datatype="t" column="Department" param1="HR" param2="" operation="Equals" />
</alternation>
</filters>
<series column="Turnaround" aggregate="avg" split="0" splitfield="" index="1">
<filters />
</series>
<series column="Requested 3" aggregate="avg" split="0" splitfield="" index="2">
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Worker" param1="Malcom" param2="" operation="Equals" />
</alternation>
</filters>
</series>
<series column="Requested 2" aggregate="avg" split="0" splitfield="" index="3">
<filters />
</series>
<series column="Reqested" aggregate="avg" split="0" splitfield="" index="4">
<filters />
</series>
</datatask>
</datarequest>
This encodes a datarequest comprising a daterange, main filters, series and series filters. Basically any element which has the index attribute can occur multiple times within its parent element - the exception to this being the filter within datefilter.
But the structure of this is kind of academic, the problem is more fundamental:
When a request comes through, XML like this is sent to SQLServer as a parameter to a stored proc. This XML is shredded into a de-normalised table and then written iteratively to normalised tables such as tblDataRequest (DataRequestID PK), tblDataTask, tblFilter, tblSeries. This is fine.
The problem occurs when I want to match a given XML defintion with one already held in the DB. I currently do this by...
Shredding the XML into a de-normalised table
Using a CTE to pull all the existing data in the database into that same de-normalised form
Matching using a huge WHERE condition (34 lines long)
..This will return me any DataRequestID which exactly matches the XML given. I fear that this method will end up being painfully slow - partly because I don't believe the CTE will do any clever filtering, it will pull all the data every single time before applying the huge WHERE.
I have thought there must be better solutions to this eg
When storing a datarequest, also store a hash of the datarequest somehow and simply match on that. In the case of collision, use the current method. I wanted however to do this using set-logic. And also, I'm concerned about irrelevant small differences in the XML changing the hash - spurious spaces etc.
Somehow perform the matching iteratively from the bottom up. Eg produce a list of filters which match on the lowest level. Use this as part of an IN to match Series. Use this as part of an IN to match DataTasks etc etc. The trouble is, I start to black-out when I think about this for too long.
Basically - Has anyone ever encountered this kind of problem before (they must have). And what would be the recommended route for tackling it? example (pseudo)code would be great :)
To get rid of the possibility of minor variances, I'd run the request through an XML transform (XSLT).
Alternatively, since you've already got the code to parse this out into a denormalized staging table that's fine too. I would then simply using FOR XML to create a new XML doc.
Your goal here is to create a standardized XML document that respects ordering where appropriate and removes inconsistencies where it is not.
Once that is done, store this in a new table. Now you can run a direct comparison of the "standardized" request XML against existing data.
To do the actual comparison, you can use a hash, store the XML as a string and do a direct string comparison, or do a full XML comparison like this: http://beyondrelational.com/modules/2/blogs/28/posts/10317/xquery-lab-36-writing-a-tsql-function-to-compare-two-xml-values-part-2.aspx
My preference, as long as the XML is never over 8000bytes, would be to create a unique string (either VARCHAR(8000) or NVARCHAR(4000) if you have special character support) and create a unique index on the column.
Example, I have some groups and some items belong to these group. Now, I want to list items of arbitrary group by url like http://sp2010/Lists/items.aspx?groupid=1.
I maked a SPView like
SPView view = SPList.DefaultView;
view.Query = "<Where>
<Eq>
<FieldRef Name=\"Group\" LookupId=\"TRUE\">
<Value Type=\"Lookup\"><GetVar Name=\"GroupId\"></Value>
</Eq>
</Where>";
view.Update();
It work on sharepoint 2007, but unfortunately it don't work on sharepoint 2010.
Try FilterField and FilterValue:
http://sp2010/Lists/MyList/AllItems.aspx?FilterField1=group&FilterValue1=Group%20Name
Note that the value of FilterField is the internal name of the field, not the display name.
I'm doing a research project for the summer and I've got to use get some data from Wikipedia, store it and then do some analysis on it. I'm using the Wikipedia API to gather the data and I've got that down pretty well.
What my questions is in regards to the links-alllinks option in the API doc here
After reading the description, both there and in the API itself (it's down and bit and I can't link directly to the section), I think I understand what it's supposed to return. However when I ran a query it gave me back something I didn't expect.
Here's the query I ran:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=google&rvprop=ids|timestamp|user|comment|content&rvlimit=1&list=alllinks&alunique&allimit=40&format=xml
Which in essence says: Get the last revision of the Google page, include the id, timestamp, user, comment and content of each revision, and return it in XML format.
The allinks (I thought) should give me back a list of wikipedia pages which point to the google page (In this case the first 40 unique ones).
I'm not sure what the policy is on swears, but this is the result I got back exactly:
<?xml version="1.0"?>
<api>
<query><normalized>
<n from="google" to="Google" />
</normalized>
<pages>
<page pageid="1092923" ns="0" title="Google">
<revisions>
<rev revid="366826294" parentid="366673948" user="Citation bot" timestamp="2010-06-08T17:18:31Z" comment="Citations: [161]Tweaked: url. [[User:Mono|Mono]]" xml:space="preserve">
<!-- The page content, I've replaced this cos its not of interest -->
</rev>
</revisions>
</page>
</pages>
<alllinks>
<!-- offensive content removed -->
</alllinks>
</query>
<query-continue>
<revisions rvstartid="366673948" />
<alllinks alfrom="!2009" />
</query-continue>
</api>
The <alllinks> part, its just a load of random gobbledy-gook and offensive comments. No nearly what I thought I'd get. I've done a fair bit of searching but I can't seem to find a direct answer to my question.
What should the list=alllinks option return?
Why am I getting this crap in there?
You don't want a list; a list is something that iterates over all pages. In your case you simply "enumerate all links that point to a given namespace".
You want a property associated with the Google page, so you need prop=links instead of the alllinks crap.
So your query becomes:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions|links&titles=google&rvprop=ids|timestamp|user|comment|content&rvlimit=1&format=xml
During document processing I want to extract all dates from html meta data and then identify the latest date which will be used to populate a date field (dtgeneric1).
<meta name="OriginalPublicationDate" content="2010/04/21 12:06:36" />
<meta name="LastModificationDate" content="2010/04/22 14:10:16" />
+ other non-date meta data
Inspection using spy stages shows that our pipeline already adds meta_* attributes but the meta data names will be different across documents from different sources.
#### ATTRIBUTE meta_originalpublicationdate <class 'docproc.DocumentAttributes.TextChunks'>: 2010/04/21 12:06:36
#### ATTRIBUTE meta_lastmodificationdate <class 'docproc.DocumentAttributes.TextChunks'>: 2010/04/22 14:10:16
+ other non-date meta attributes
Ideally we would like to pass all the meta_* attributes to a Python stage and use that to work out which are dates and which is the largest but there seems to be no way of specifying "all meta attributes" as input.
Has anyone done something similar and can offer any advice on the best way to do this.
Thanks
Neil
I suppose that a custom stage that takes all the needed date attributes as an input, processes a comparison between all them (to find the newest date), and outputs the most up-to-date field will do the job.