Can't upload and check an Excel file on a Unix system - openpyxl

I'm trying to upload a file and check whether it is an Excel file. My code is:
For the router:
from fastapi import APIRouter, UploadFile, File, HTTPException, status, Depends
from fastapi.responses import HTMLResponse
from app.api.deps import get_curret_admin
from app.schemas.admin import Admins
from app.core.verify_file import check_excel

router = APIRouter()

# Upload Excel file
@router.post(
    path="/api/uploadfile/",
    status_code=status.HTTP_200_OK,
    summary="Upload an Excel file",
    tags=["Files"]
)
async def upload_file(
    file: UploadFile,
    admin: Admins = Depends(get_curret_admin)
):
    if not check_excel(file.file):
        raise HTTPException(status_code=400, detail="The file is not an Excel file")
    return HTMLResponse("The file has been uploaded successfully")
And the check:
from openpyxl import load_workbook

def check_excel(file):
    try:
        wb = load_workbook(file)
        return True
    except Exception:
        return False
The problem is that this works fine on Windows, but not on Linux or macOS. On Windows the check correctly recognizes an uploaded Excel file, but on Unix systems every file is reported as not being Excel, so no file is accepted.

OK, I found the answer on my own. The problem was the Python version: on my Mac and Linux machines it is below 3.11, and there is an issue with SpooledTemporaryFile (which FastAPI's UploadFile uses) in Python versions before 3.11. It can be solved with io.BytesIO, as on this page:
https://asiones.hashnode.dev/fastapi-receive-a-xlsx-file-and-read-it
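A minimal sketch of that workaround, assuming the same check_excel helper as above: read the upload's raw bytes and hand openpyxl an in-memory buffer instead of the temporary file.

import io
from openpyxl import load_workbook

def check_excel(file) -> bool:
    try:
        data = file.read()  # raw bytes of the uploaded file
        load_workbook(io.BytesIO(data))  # openpyxl gets a fully seekable in-memory buffer
        return True
    except Exception:
        return False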

Related

unable to open database file on a hosting service pythonanywhere

I want to deploy my project on PythonAnywhere. error.log says that the server is unable to open my database. Everything works fine on my local machine. I followed a video by Pretty Printed on YouTube.
This is how I initialize it in app.py; this is what I got from error.log:
db_session.global_init("db/data.sqlite")
This is in db_session:
import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy.orm import Session

__factory = None

def global_init(db_file):
    global __factory
    if __factory:
        return
    if not db_file or not db_file.strip():
        raise Exception("A database file must be specified.")
    conn_str = f'sqlite:///{db_file.strip()}?check_same_thread=False'
    print(f"Connecting to the database at {conn_str}")
    engine = sa.create_engine(conn_str, echo=False)
    __factory = orm.sessionmaker(bind=engine)
    from . import __all_models  # SqlAlchemyBase and __all_models are defined elsewhere in the package
    SqlAlchemyBase.metadata.create_all(engine)

def create_session() -> Session:
    global __factory
    return __factory()
The last thing is my wsgi.py:
import sys

path = '/home/r1chter/Chicken-beta'
if path not in sys.path:
    sys.path.append(path)

import os
from dotenv import load_dotenv

project_folder = os.path.expanduser(path)
load_dotenv(os.path.join(project_folder, '.env'))

import app  # noqa

application = app.app()
Usually errors like this on PythonAnywhere are due to providing a relative path instead of an absolute path.
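For example, a minimal sketch along those lines, assuming app.py sits in the project root next to the db/ folder:

import os

# build an absolute path so it works regardless of the working directory PythonAnywhere uses
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
db_session.global_init(os.path.join(BASE_DIR, "db", "data.sqlite"))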

How to fill JavaScript form using Python?

I want to use Python to fill this form.
I tried using Mechanize but this is a Microsoft Form which uses JavaScript and has no form tag and no GET/POST URL. Maybe BeautifulSoup/Selenium can do this, but I do not have any experience in scraping JS forms. Can anyone help me out and suggest how to go about this?
Here's what I've tried; Mechanize is unable to recognize any form on the page:
import mechanize

def main():
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.set_handle_refresh(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    response = br.open("https://forms.office.com/Pages/ResponsePage.aspx?id=8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u")
    for form in br.forms():
        print("Form name:", form.name)  # prints nothing
        print(form)  # prints nothing

if __name__ == '__main__':
    main()
Selenium works fine.
You'll need to install the components:
Install Selenium: pip install selenium
Make sure you download the correct chromedriver (or other driver) for your browser and OS version and add it to your PATH.
Then this runs:
from selenium import webdriver

driver = webdriver.Chrome()
url = "https://forms.office.com/Pages/ResponsePage.aspx?id=8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u"
driver.get(url)

name = driver.find_element_by_xpath("//div[@class='question-title-box'][.//span[text()='NAME']]/following-sibling::*//input")
name.send_keys("hello, World")

sectionSelection = "F"
section = driver.find_element_by_xpath("//div[@class='question-title-box'][.//span[text()='Section']]/following-sibling::*//input[@value='" + sectionSelection + "']")
section.click()

date = driver.find_element_by_xpath("//input[contains(@placeholder, 'Please input date')]")
date.send_keys("01/12/2020")

submit = driver.find_element_by_xpath("//div[text()='Submit']")
submit.click()
The XPaths are a little long, but they're based on the question text, so they're potentially stable.
For an alternative approach: when you say there is no POST URL, did you check devtools? That exposes the destination of the form:
Request URL: https://forms.office.com/formapi/api/aebbf9f0-23da-49e3-98bf-32171abbc9bc/users/f70e502c-96b2-4239-aca3-764dea371071/forms('8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u')/responses
Request Method: POST
it also exposes the payload... This is the first submit:
{startDate: "2020-08-17T10:40:18.504Z", submitDate: "2020-08-17T10:40:18.507Z",…}
answers: "[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"Hello, World"},{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-28"}]"
startDate: "2020-08-17T10:40:18.504Z"
submitDate: "2020-08-17T10:40:18.507Z"
Those POST URL UUIDs/GUIDs and question IDs seem to be static for this form; they don't change each time I run it. This is the second run:
{startDate: "2020-08-17T10:43:48.544Z", submitDate: "2020-08-17T10:43:48.546Z",…}
answers: "[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"test me"},{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-12"}]"
startDate: "2020-08-17T10:43:48.544Z"
submitDate: "2020-08-17T10:43:48.546Z"
Once you've captured this, you'll probably be able to do it through the API without a GUI.
... Just to make sure, I tried it and it succeeds:
import requests
url = "https://forms.office.com/formapi/api/aebbf9f0-23da-49e3-98bf-32171abbc9bc/users/f70e502c-96b2-4239-aca3-764dea371071/forms('8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u')/responses"
myobj = {"startDate":"2020-08-17T10:48:40.118Z","submitDate":"2020-08-17T10:48:40.121Z","answers":"[{\"questionId\":\"r8f09d63e6f6f42feb2f8f4f8ed3f9389\",\"answer1\":\"Hello again, World\"},{\"questionId\":\"r28fe12073dfa47399f8ce95ae679dccf\",\"answer1\":\"F\"},{\"questionId\":\"r8f9e9fedcc2e410c80bfa1e0e3ef9750\",\"answer1\":\"2020-08-26\"}]"}
x = requests.post(url, data=myobj)
My answers are just hard-coded into the data object, but it seems to work.
Remember to pip install requests if you don't already have it.

Why Scrapy logs differently in the console and in the external log file

I'm new to Scrapy. I once managed to run my script well on Scrapy 0.24, but when I switched to the newly released 1.0 I ran into a logging problem: I want to set both the file and the console log level to INFO, but no matter how I set LOG_LEVEL or call configure_logging() (using the Python standard logging package instead of scrapy.log), Scrapy always logs DEBUG-level information to the console, which prints the whole item object as a dict. In fact, the LOG_LEVEL option only works for the external file. I suspect it has something to do with Python's logging setup, but I have no idea how to configure it. Could anyone help me out?
This is how I configure my logging in run_my_spider.py:
from crawler.settings import LOG_FILE, LOG_FORMAT
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging
from crawler.spiders.MySpiders import MySpider
import logging

def run_spider(spider):
    settings = get_project_settings()
    # configure file logging
    # It ONLY works for the file
    configure_logging({'LOG_FORMAT': LOG_FORMAT,
                       'LOG_ENABLED': True,
                       'LOG_FILE': LOG_FILE,
                       'LOG_LEVEL': 'INFO',
                       'LOG_STDOUT': True})
    # instantiate the spider
    process = CrawlerProcess(settings)
    process.crawl(MySpider)
    logging.info('Running Crawler: ' + spider.name)
    process.start()  # the script will block here until the spider_closed signal is sent
    logging.info('Crawler ' + spider.name + ' stopped.\n')
......
This is the console output:
DEBUG:scrapy.core.engine:Crawled (200) <GET http://mil.news.sina.com.cn/2014-10-09/0450804543.html>(referer: http://rss.sina.com.cn/rollnews/jczs/20141009.js)
{'item_name': 'item_sina_news_reply',
'news_id': u'jc:27-1-804530',
'reply_id': u'jc:27-1-804530:1',
'reply_lastcrawl': '1438605374.41',
'reply_table': 'news_reply_20141009'}
Many Thanks!
It may be that what you are seeing in the console is the Twisted logs, which print DEBUG-level messages to the console.
You can redirect them to your log file using:
from twisted.python import log
observer = log.PythonLoggingObserver(loggerName='logname')
observer.start()
(As given in How to make Twisted use Python logging?)
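Alternatively, in Scrapy 1.0+ you can skip Scrapy's root log handler and attach your own console and file handlers so both use the INFO level. A sketch (the crawl.log name and format string are just placeholders):

import logging
from scrapy.utils.log import configure_logging

# let Scrapy wire up Twisted/warnings logging, but don't install its root handler
configure_logging(install_root_handler=False)

root = logging.getLogger()
root.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s [%(name)s] %(levelname)s: %(message)s')
for handler in (logging.StreamHandler(), logging.FileHandler('crawl.log')):
    handler.setLevel(logging.INFO)
    handler.setFormatter(formatter)
    root.addHandler(handler)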

ScrapyDeprecationWarning: Command's default `crawler` is deprecated and will be removed. Use `create_crawler` method to instantiate crawlers

Scrapy version 0.19
I am using the code from this page (Run multiple scrapy spiders at once using scrapyd). When I run scrapy allcrawl, I get:
ScrapyDeprecationWarning: Command's default `crawler` is deprecated and will be removed. Use `create_crawler` method to instantiate crawlers
Here is the code:
from scrapy.command import ScrapyCommand
import urllib
import urllib2
from scrapy import log

class AllCrawlCommand(ScrapyCommand):
    requires_project = True
    default_settings = {'LOG_ENABLED': False}

    def short_desc(self):
        return "Schedule a run for all available spiders"

    def run(self, args, opts):
        url = 'http://localhost:6800/schedule.json'
        for s in self.crawler.spiders.list():  # this line raises the warning
            values = {'project': 'YOUR_PROJECT_NAME', 'spider': s}
            data = urllib.urlencode(values)
            req = urllib2.Request(url, data)
            response = urllib2.urlopen(req)
            log.msg(response)
How do I fix the DeprecationWarning?
Thanks
Use:
crawler = self.crawler_process.create_crawler()
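A sketch of how that slots into the run() method from the question (everything else in the command stays the same):

def run(self, args, opts):
    url = 'http://localhost:6800/schedule.json'
    crawler = self.crawler_process.create_crawler()  # replaces the deprecated self.crawler default
    for s in crawler.spiders.list():
        values = {'project': 'YOUR_PROJECT_NAME', 'spider': s}
        data = urllib.urlencode(values)
        req = urllib2.Request(url, data)
        response = urllib2.urlopen(req)
        log.msg(response)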

ImproperlyConfigured: Requested setting MIDDLEWARE_CLASSES, but settings are not configured

I am configuring Apache mod_wsgi for my Django project.
Here is my djangotest.wsgi file:
import os
import sys
sys.path = ['/home/pavan/djangoproject'] + sys.path
os.environ['DJANGO_SETTINS_MODULE'] = 'djangoproject.settings'
import django.core.handlers.wsgi
_application = django.core.handlers.wsgi.WSGIHandler()

def application(environ, start_response):
    environ['PATH_INFO'] = environ['SCRIPT_NAME'] + environ['PATH_INFO']
    return _application(environ, start_response)
And I added the WSGIScriptAlias for my virtual directory.
When I try to open the homepage of the project, it gives the following error:
ImproperlyConfigured: Requested setting MIDDLEWARE_CLASSES, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
Looks like you have a typo on line 4. 'DJANGO_SETTINS_MODULE' should be 'DJANGO_SETTINGS_MODULE'.
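The corrected line in djangotest.wsgi would then read:

os.environ['DJANGO_SETTINGS_MODULE'] = 'djangoproject.settings'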