Title and description aren't indexed with collective.dexteritytextindexer - indexing

I have lots of Dexterity content types, some of them are just containers and are left with just the Title and Description (from plone.app.dexterity.behaviors.metadata.IBasic behavior).
I can find them by searching the text inside their title or description.
But for some complex content types I'm using collective.dexteritytextindexer to index some more fields and it works fine, I can find the text on the fields I marked to be indexed.
However the Title and Description are no longer available for searching. I tried something like:
class IMyContent(form.Schema):
    """My content type description
    """

    dexteritytextindexer.searchable('title')
    dexteritytextindexer.searchable('description')
    dexteritytextindexer.searchable('long_desc')

    form.widget(long_desc=WysiwygFieldWidget)
    long_desc = schema.Text(
        title=_(u"Rich description"),
        description=_(u"Complete description"),
        required=False,
    )
    ...
But I can't see the contents of title and description in the SearchableText column of portal_catalog, and thus the search results don't include them.
Any idea what I'm missing?
Cheers,

Got pretty much the same issue. Following the documentation on http://pypi.python.org/pypi/collective.dexteritytextindexer I used
from collective import dexteritytextindexer
from plone.autoform.interfaces import IFormFieldProvider
from plone.directives import form
from zope import schema
from zope.interface import alsoProvides

class IMyBehavior(form.Schema):

    dexteritytextindexer.searchable('specialfield')
    specialfield = schema.TextLine(title=u'Special field')

alsoProvides(IMyBehavior, IFormFieldProvider)
to get my own fields indexed. However, the code
from plone.app.dexterity.interfaces import IBasic
from collective.dexteritytextindexer.utils import searchable
searchable(IBasic, 'title')
searchable(IBasic, 'description')
didn't work: the import of IBasic fails. This is easily solved by importing it from the module where it actually lives:
from plone.app.dexterity.behaviors.metadata import IBasic

The problem is probably that the field is coming from the IBasic or IDublinCore behaviour and not from your schema. I don't know enough about collective.dexteritytextindexer to know how to work around this, though.
Another option may be to just use plone.indexer and create your own SearchableText indexer that returns "%s %s %s" % (context.title, context.description, context.long_desc,). See the Dexterity docs for details.

As a reference, this is the code I ended up writing:

from five import grok
from plone.indexer import indexer
from Products.CMFCore.utils import getToolByName

@indexer(IMyDexterityType)
def searchableIndexer(context):
    transforms = getToolByName(context, 'portal_transforms')
    long_desc = context.long_desc  # long_desc is a rich text field
    if long_desc is not None:
        long_desc = transforms.convert('html_to_text', long_desc).getData()
    contacts = context.contacts  # contacts is also a rich text field
    if contacts is not None:
        contacts = transforms.convert('html_to_text', contacts).getData()
    return "%s %s %s %s" % (context.title, context.description, long_desc, contacts,)

grok.global_adapter(searchableIndexer, name="SearchableText")

Related

Correct use of append and Alignat from pylatex while creating a pdf (python)

I want to save some LaTeX formulas in a pdf:
from pylatex import Document, Section, Subsection, Command, Package, Alignat

doc = Document(default_filepath='basic.tex', documentclass='article')
doc.append('Solve the equation:')
doc.append(r'$$\frac{x}{10} = 0 \\$$', Alignat(numbering=False, escape=False))
doc.generate_pdf("test", clean_tex=True)
But I get an error:
doc.append(r'$$\frac{x}{10} = 0 \\$$',Alignat(numbering=False, escape=False))
TypeError: append() takes 2 positional arguments but 3 were given
How should I solve my problem?
This answer comes late, but I guess there is no harm: the Alignat environment cannot be passed to append like that; instead, the appended formula is enclosed in it. Also, Alignat is a math environment, so the $$ are not necessary.
from pylatex import Document, Alignat

doc = Document(default_filepath='basic.tex', documentclass='article')
doc.append('Solve the equation:')
with doc.create(Alignat(numbering=False, escape=False)) as agn:
    agn.append(r'\frac{x}{10} = 0')
doc.generate_pdf("test", clean_tex=True)
Output: the generated pdf shows the unnumbered equation \frac{x}{10} = 0.

How do I find a specific tag's value (which could be anything) with beautifulsoup?

I am trying to get the job IDs from the div tags of Indeed listings. So far, I have taken Indeed search results and put each job into its own "bs4.element.Tag" object, but I don't know how to extract the value of the attribute (or is it a class?) "data-jk". Here is what I have so far:
import requests
import bs4
import re

# 1: scrape (5?) pages of search results for listing IDs
results = []
results.append(requests.get("https://www.indeed.com/jobs?q=data+analyst&l=United+States&start=0"))
results.append(requests.get("https://www.indeed.com/jobs?q=data+analyst&l=United+States&start=10"))
results.append(requests.get("https://www.indeed.com/jobs?q=data+analyst&l=United+States&start=20"))
results.append(requests.get("https://www.indeed.com/jobs?q=data+analyst&l=United+States&start=30"))
results.append(requests.get("https://www.indeed.com/jobs?q=data+analyst&l=United+States&start=40"))
# each search page has a query "q", location "l", and a "start" = 10*int
# the search results are contained in a "td" with ID = "resultsCol"

justjobs = []
for eachResult in results:
    soup_jobs = bs4.BeautifulSoup(eachResult.text, "lxml")  # this is for IDs
    justjobs.extend(soup_jobs.find_all(attrs={"data-jk": True}))  # re.compile("data-jk")

# each "card" is a div object
# each has the class "jobsearch-SerpJobCard unifiedRow row result clickcard"
# as well as a specific tag "data-jk"
# "data-jk" seems to be the actual IDs used in each listing's URL
# Now, each div element has a data-jk. I will try to get data-jk from each one:
jobIDs = []
print(type(justjobs[0]))  # DEBUG
for eachJob in justjobs:
    jobIDs.append(eachJob.find("data-jk"))

print("Length: " + str(len(jobIDs)))  # DEBUG
print("Example JobID: " + str(jobIDs[1]))  # DEBUG
The examples I've seen online generally try to get the information contained between an opening and a closing tag, but I am not sure how to get the info from inside the (opening) tag itself. I've tried doing it by parsing it as a string instead:
print(justjobs[0])
for eachJob in justjobs:
    jobIDs.append(str(eachJob)[115:131])
print(jobIDs)
but the website is also inconsistent with how the tags operate, and I think that using beautifulsoup would be more flexible than multiple cases and substrings.
Any pointers would be greatly appreciated!
Looks like you can regex them out from a script tag:

import requests, re

html = requests.get('https://www.indeed.com/jobs?q=data+analyst&l=United+States&start=0').text
p = re.compile(r"jk:'(.*?)'")
ids = p.findall(html)
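For the record, once you have the Tag objects, an attribute's value can be read with dictionary-style access rather than find(). A minimal sketch on made-up markup (the HTML here is illustrative, not Indeed's actual page):

```python
import bs4

# Illustrative markup only -- the real search page is more complex
sample = '''
<td id="resultsCol">
  <div class="result" data-jk="abc123"></div>
  <div class="result" data-jk="def456"></div>
</td>
'''
soup = bs4.BeautifulSoup(sample, "html.parser")
cards = soup.find_all(attrs={"data-jk": True})
# A Tag behaves like a dict for its attributes: card["data-jk"] or card.get("data-jk")
job_ids = [card["data-jk"] for card in cards]
print(job_ids)  # ['abc123', 'def456']
```

Using card.get("data-jk") instead of card["data-jk"] returns None rather than raising KeyError when a card lacks the attribute, which helps with inconsistent markup.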

Marshmallow: How can I code for my schema to accept fields not in the table but used to filter?

I want to make an API that accepts two date parameters and have that return all results in a database table. I don't want these dates stored anywhere, so I can't define them as columns. My table has three columns: id (int), answers (json), datestamp (DateTime). I want to pass in "start_date" and "end_date" variables in the API request, and get back all answers and ids from that period. My date conversion from strings is fine, but I keep getting hit with the {"_schema":["Invalid input type."]} response. Is it possible to pass in these fields even though they don't exist in the table? Do I need to define them like I have done the columns in my model? If so, instead of Column(...), what would it be?
I've tried "additional_properties = fields.Raw()" but I still got the invalid input type error. I've played with the schema but seem to be missing something.
schema
from ma import ma  # marshmallow
from models.c_bulk import CBulkModel
from flask_restful import fields

class CBulkSchema(ma.ModelSchema):
    class Meta:
        additional_properties = fields.Raw()
        model = CBulkModel
        dump_only = ("id", "answers", "datestamp")
model
from typing import List
from datetime import timedelta
from db import db

class CBulkModel(db.Model):
    __tablename__ = "test_post_date"

    datestamp = db.Column(db.DateTime, nullable=False)
    id = db.Column(db.BigInteger, nullable=True, unique=True, primary_key=True)
    answers = db.Column(db.JSON, nullable=True)

    @classmethod
    def find_all(cls, start_date, end_date):  # -> List["CBulkModel"]:
        return cls.query.filter(db.and_(start_date >= cls.datestamp, end_date < cls.datestamp))
I get {"_schema":["Invalid input type."]} instead of the surveys from that period.
Well, I guess I figured it out. There were more code changes needed to make the final solution work, but the input issue was resolved by not passing in the JSON. If you don't pass the JSON in through your schema, you can load any attribute and use it as a filter.
Try adding unknown = INCLUDE; hope this helps:

from ma import ma  # marshmallow
from marshmallow import INCLUDE
from models.c_bulk import CBulkModel
from flask_restful import fields

class CBulkSchema(ma.ModelSchema):
    class Meta:
        additional_properties = fields.Raw()
        model = CBulkModel
        dump_only = ("id", "answers", "datestamp")
        unknown = INCLUDE

Pymongo: insert_many() gives "TypeError: document must be instance of dict" for list of dicts

I haven't been able to find any relevant solutions to my problem when googling, so I thought I'd try here.
I have a program where I parse though folders for a certain kind of trace files, and then save these in a MongoDB database. Like so:
posts = function(source_path)
client = pymongo.MongoClient()
db = client.database
collection = db.collection
insert = collection.insert_many(posts)

def function(...):
    ....
    post = parse(trace)
    posts.append(post)
    return posts

def parse(...):
    ....
    post = {'Thing1': thing,
            'Thing2': other_thing,
            etc}
    return post
However, when I get to "insert = collection.insert_many(posts)", it returns an error:
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
According to the debugger, "posts" is a list of about 1000 dicts, which should be valid input according to all of my research. If I construct a smaller list of dicts and insert_many(), it works flawlessly.
Does anyone know what the issue may be?
Some more debugging revealed the issue to be that the "parse" function sometimes returned None rather than a dict. Easily fixed.
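A minimal guard for that failure mode (pure Python, no pymongo needed): drop the None entries before calling insert_many. The sample data below is made up for illustration:

```python
# Simulated parse() results: some traces failed to parse and came back as None
posts = [{"Thing1": 1}, None, {"Thing1": 2}, None, {"Thing1": 3}]

# Keep only real documents; insert_many() rejects anything that is not a mapping
clean_posts = [p for p in posts if p is not None]
print(len(clean_posts))  # 3
```

Alternatively, have parse() raise (or log and skip) on bad traces so the None never reaches the list in the first place.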

Flask button to save table from query as csv

I have a flask app that runs a query and returns a table. I would like to provide a button on the page so the user can export the data as a csv.
The problem is that the query is generated dynamically based on form input.
@app.route('/report/<int:account_id>', methods=['GET'])
def report(account_id):
    if request.method == 'GET':
        c = g.db.cursor()
        c.execute('SELECT * FROM TABLE WHERE account_id = :account_id', account_id=account_id)
        entries = [dict(title=row[0], text=row[1]) for row in c.fetchall()]
        return render_template('show_results.html', entries=entries)
On the html side it's just a simple table, looping over the rows and rendering them. I'm using bootstrap for styling, and included a tablesorter jquery plugin. None of this is really consequential. I did try one javascript exporter I found, but since my content is rendered dynamically, it saves a blank CSV.
Do I need to do some ajax-style trickery to grab a csv object from the route?
I solved this myself. For anyone who comes across this, I find it valuable for the specific use case within flask. Here's what I did:
import cx_Oracle  # We are an Oracle shop, and this changes some things
import csv
import StringIO  # allows you to store the response object in memory instead of on disk
from flask import Flask, make_response  # Necessary imports, should be obvious

@app.route('/export/<int:identifier>', methods=['GET'])
def export(identifier):
    si = StringIO.StringIO()
    cw = csv.writer(si)
    c = g.db.cursor()
    c.execute('SELECT * FROM TABLE WHERE column_val = :identifier', identifier=identifier)
    rows = c.fetchall()
    cw.writerow([i[0] for i in c.description])  # header row from the cursor description
    cw.writerows(rows)
    response = make_response(si.getvalue())
    response.headers['Content-Disposition'] = 'attachment; filename=report.csv'
    response.headers["Content-type"] = "text/csv"
    return response
For anyone using flask with sqlalchemy, here's an adjustment to tadamhicks' answer, also with a library update:

import csv
from io import StringIO
from flask import make_response

@app.route('/export', methods=['GET'])  # route name is illustrative
def export():
    si = StringIO()
    cw = csv.writer(si)
    records = myTable.query.all()  # or a filtered set, of course
    # any table method that extracts an iterable will work
    cw.writerows([(r.fielda, r.fieldb, r.fieldc) for r in records])
    response = make_response(si.getvalue())
    response.headers['Content-Disposition'] = 'attachment; filename=report.csv'
    response.headers["Content-type"] = "text/csv"
    return response
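The in-memory CSV pattern both answers rely on can be exercised without Flask at all. A stdlib-only sketch with made-up rows:

```python
import csv
from io import StringIO

si = StringIO()
cw = csv.writer(si)
cw.writerow(["title", "text"])           # header row
cw.writerows([("a", "1"), ("b", "2")])   # data rows, e.g. from a query
payload = si.getvalue()                  # this string becomes the response body
print(payload)
```

Note that the csv module terminates rows with \r\n by default, which is what the CSV format (RFC 4180) expects in a download.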