I'm trying to extract certain values from the Google Patents public dataset: https://console.cloud.google.com/marketplace/product/google_patents_public_datasets/google-patents-public-data?project=pivotal-life-sciences
The cpc column (the Cooperative Patent Classification codes) in the table patents-public-data.patents.publications has the data type RECORD.
If I only want to pull rows where at least one of the codes is like "C01%", how would I do that?
What I'm trying right now is:
SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
FROM patents-public-data.patents.publications
WHERE filing_date >= 20200101 and cpc.code like 'C01%'
LIMIT 2
but this returns the error:
google.api_core.exceptions.BadRequest: 400 Cannot access field code on a value with type ARRAY<STRUCT<code STRING, inventive BOOL, first BOOL, ...>> at [4:43]
I understand that this means my syntax for filtering on cpc is wrong, but I don't know what the correct syntax is.
If it helps, here is the result of the query:
SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
FROM patents-public-data.patents.publications
WHERE filing_date >= 20200101
LIMIT 2
output:
Row((['MEIJER, JAN', 'STAACKE, Robert', 'BURCHARD, BERND', 'MEIJER, Nils'], ['Jan Meijer', 'Staacke Robert', 'Bernd Burchard'], [{'code': 'C01B32/26', 'inventive': True, 'first': False, 'tree': []}, {'code': 'G01R33/032', 'inventive': True, 'first': True, 'tree': []}, {'code': 'G01N24/006', 'inventive': False, 'first': False, 'tree': []}, {'code': 'G01R33/26', 'inventive': False, 'first': False, 'tree': []}, {'code': 'C01B32/28', 'inventive': True, 'first': False, 'tree': []}], [{'text': 'Nv-zentrum basierender mikrowellenfreier quantensensor und dessen anwendungen und ausprägungen', 'language': 'de', 'truncated': False}, {'text': 'Nv-centre-based microwave-free quantum sensor and uses and characteristics thereof', 'language': 'en', 'truncated': False}, {'text': 'Capteur quantique sans micro-ondes fondé sur un centre nv et applications et formes dudit capteur', 'language': 'fr', 'truncated': False}], [{'text': 'Die Erfindung betrifft ein Sensorsystem auf Basis von Diamanten mit einer hohen Dichte an NV-Zentren. Die Beschreibung umfasst a) Methoden zur Herstellung der notwendigen Diamanten hoher NV-Zentrendichte, b) Merkmale solcher Diamanten, c) Sensorelemente für die Nutzung der Fluoreszenzstrahlung der solcher Diamanten, d) Sensorelemente für die Nutzung des Fotostromes solcher Diamanten, e) Systeme zur Auswertung dieser Größen, f) Systeme mit verringertem Rauschen zur Auswertung dieser Systeme, g) Gehäuse zur Verwendung solcher Systeme in automatischen Bestückungsanlagen, g) Verfahren zum Test diese Systeme und h) ein Musikinstrument als Beispiel einer letztendlichen Anwendung all dieser Vorrichtungen und Verfahren.', 'language': 'de', 'truncated': False}, {'text': 'Die Erfindung betrifft ein Sensorsystem auf Basis von Diamanten mit einer hohen Dichte an NV-Zentren. 
Die Beschreibung umfasst a) Methoden zur Herstellung der notwendigen Diamanten hoher NV-Zentrendichte, b) Merkmale solcher Diamanten, c) Sensorelemente für die Nutzung der Fluoreszenzstrahlung der solcher Diamanten, d) Sensorelemente für die Nutzung des Fotostromes solcher Diamanten, e) Systeme zur Auswertung dieser Größen, f) Systeme mit verringertem Rauschen zur Auswertung dieser Systeme, g) Gehäuse zur Verwendung solcher Systeme in automatischen Bestückungsanlagen, g) Verfahren zum Test diese Systeme und h) ein Musikinstrument als Beispiel einer letztendlichen Anwendung all dieser Vorrichtungen und Verfahren.', 'language': 'de', 'truncated': False}, {'text': 'The invention relates to a sensor system on the basis of diamonds with a high density of NV centres. The description comprises a) methods for producing the necessary diamonds with high NV centre density, b) features of such diamonds, c) sensor elements for the use of the fluorescence radiation of such diamonds, d) sensor elements for the use of the photocurrent of such diamonds, e) systems for evaluating these variables, f) systems with reduced noise for evaluating these systems, g) housing for the use of such systems in automatic placement systems, g) method for testing these systems, and h) a musical instrument as an example of a final use of all of these devices and methods.', 'language': 'en', 'truncated': False}, {'text': 'The invention relates to a sensor system on the basis of diamonds with a high density of NV centres. 
The description comprises a) methods for producing the necessary diamonds with high NV centre density, b) features of such diamonds, c) sensor elements for the use of the fluorescence radiation of such diamonds, d) sensor elements for the use of the photocurrent of such diamonds, e) systems for evaluating these variables, f) systems with reduced noise for evaluating these systems, g) housing for the use of such systems in automatic placement systems, g) method for testing these systems, and h) a musical instrument as an example of a final use of all of these devices and methods.', 'language': 'en', 'truncated': False}, {'text': 'L'invention concerne un système capteur à base de diamants dont la densité en centres NV est élevée. La description comprend a)\u202fdes méthodes de production des diamants nécessaires dont la densité des centres NV est élevée, b)\u202fles caractéristiques de tels diamants, c)\u202fdes éléments capteurs pour l'exploitation du rayonnement fluorescent de tels diamants, d)\u202fdes éléments capteurs pour l'exploitation du photocourant de tels diamants, e)\u202fdes systèmes d'évaluation de ces grandeurs, f)\u202fdes systèmes à bruit réduit pour l'évaluation de ces systèmes, g)\u202fun carter permettant d'utiliser de tels systèmes dans des installations de montage automatique, g) un procédé pour tester ces systèmes, et h) un instrument de musique comme exemple d'une application finale de tous ces dispositifs et du procédé.', 'language': 'fr', 'truncated': False}, {'text': 'L'invention concerne un système capteur à base de diamants dont la densité en centres NV est élevée. 
La description comprend a)\u202fdes méthodes de production des diamants nécessaires dont la densité des centres NV est élevée, b)\u202fles caractéristiques de tels diamants, c)\u202fdes éléments capteurs pour l'exploitation du rayonnement fluorescent de tels diamants, d)\u202fdes éléments capteurs pour l'exploitation du photocourant de tels diamants, e)\u202fdes systèmes d'évaluation de ces grandeurs, f)\u202fdes systèmes à bruit réduit pour l'évaluation de ces systèmes, g)\u202fun carter permettant d'utiliser de tels systèmes dans des installations de montage automatique, g) un procédé pour tester ces systèmes, et h) un instrument de musique comme exemple d'une application finale de tous ces dispositifs et du procédé.', 'language': 'fr', 'truncated': False}], [], [], 20200722), {'inventor': 0, 'assignee': 1, 'cpc': 2, 'title_localized': 3, 'abstract_localized': 4, 'claims_localized': 5, 'description_localized': 6, 'filing_date': 7})
So, TL;DR: how do I filter on RECORD objects in SQL with the LIKE operator?
OpenAI's GPT-3 gave me the answer. Damn language models taking over StackOverflow jobs...
Looks like it's:
SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
FROM `patents-public-data.patents.publications`
WHERE filing_date >= 20200101 and
(SELECT count(*)
FROM UNNEST(cpc) c
WHERE c.code LIKE 'C01%'
) > 0
LIMIT 2
If anyone is curious, I just copied that Stack Overflow question into the playground and this was the raw output:
A:
I think you want to unnest the cpc column:
<code>SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
FROM patents-public-data.patents.publications
WHERE filing_date >= 20200101 and
(SELECT count(*)
FROM UNNEST(cpc) c
WHERE c.code LIKE 'C01%'
) > 0
LIMIT 2;
</code>
Another option is to use EXISTS over the array instead of counting the matches:
SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
FROM `patents-public-data.patents.publications` t
WHERE filing_date >= 20200101 and
EXISTS (SELECT 1 FROM t.cpc WHERE code LIKE 'C01%')
LIMIT 2
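For anyone unsure what the UNNEST / EXISTS filter is doing, here is a rough plain-Python analogy (not BigQuery code). Each row's cpc value is a list of structs, and a row is kept if at least one struct has a code starting with C01. The rows sample and the has_cpc_prefix helper are made up for illustration; only the CPC codes are taken from the output in the question.

```python
# Plain-Python analogy of:
#   EXISTS (SELECT 1 FROM UNNEST(cpc) c WHERE c.code LIKE 'C01%')
# `rows` and `has_cpc_prefix` are illustrative names, not part of any BigQuery API.

def has_cpc_prefix(cpc, prefix):
    """Return True if at least one struct in the cpc array has a code starting with prefix."""
    return any(entry["code"].startswith(prefix) for entry in cpc)

# Made-up sample rows shaped like the query output (only the relevant fields).
rows = [
    {"filing_date": 20200722,
     "cpc": [{"code": "C01B32/26"}, {"code": "G01R33/032"}]},
    {"filing_date": 20200301,
     "cpc": [{"code": "G01N24/006"}]},
    {"filing_date": 20200115, "cpc": []},  # empty array: nothing can match
]

matching = [r for r in rows
            if r["filing_date"] >= 20200101 and has_cpc_prefix(r["cpc"], "C01")]
print(len(matching))
```

The point of the analogy is that the condition is evaluated per row over the elements of the array, which is why cpc.code cannot be accessed directly on the array itself.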
I'm trying to extract a name from a column that returns a big CLOB like this one:
{
"idStatus":6,
"atrasoSLA":0.0,
"atrasoSLAStr":"00:00",
"baseReports":false,
"idItemTrabalho":10019,
"portfolio":"Segurança",
"servicoRelacionado":"ATIVOS (SEGURANÇA)",
"idStatusControleSla":8,
"dataFinalUltimoControleSla":"Sep 3, 2018 11:55:12 AM",
"grupoExecutor":"Segurança",
"idGrupoExecutor":71,
"contrato":"Central de Serviços - SEFAZ-MA",
"dataHoraCaptura":"Sep 3, 2018 11:42:32 AM",
"dataHoraFim":"Sep 3, 2018 11:55:12 AM",
"dataHoraInicio":"Sep 3, 2018 11:42:19 AM",
"dataHoraInicioSLA":"Sep 3, 2018 11:42:19 AM",
"dataHoraInicioSLAStr":"09/03/2018 11:42 AM",
"dataHoraLimite":"Sep 4, 2018 7:42:00 AM",
"dataHoraLimiteStr":"09/04/2018 07:42 AM",
"dataHoraSolicitacao":"Sep 3, 2018 11:42:19 AM",
"dataHoraSolicitacaoStr":"09/03/2018 11:42 AM",
"demanda":"Requisição",
"descricao":"\u003cp\u003eRafael reportou que o CITSMART homologa\u0026ccedil;\u0026atilde;o\u0026nbsp;esta\u0026nbsp;offline.\u0026nbsp;\u003c/p\u003e\n",
"descricaoSemFormatacao":"Rafael reportou que o CITSMART homologação esta offline. ",
"descrSituacao":"citcorpore.comum.fechada",
"detalhamentoCausa":"\u003cp\u003eO incidente foi causado devido o servidor de aplica\u0026ccedil;\u0026otilde;es do CITSMART estar em DHCP, com isso o IP 10.1.1.247 foi alterado para outro IP causando a falha na comunica\u0026ccedil;\u0026atilde;o com o APACHE.\u003c/p\u003e\n",
"emailcontato":"rafael.feitosa#sefaz.ma.gov.br",
"emailResponsavel":"nilson#sefaz.ma.gov.br",
"enviaEmailAcoes":"S",
"enviaEmailCriacao":"S",
"enviaEmailFinalizacao":"S",
"faseAtual":"Execução",
"grupoNivel1":"SDNIVEL1",
"idAcordoNivelServico":8,
"idCalendario":2,
"idContatoSolicitacaoServico":1844,
"idContrato":2,
"idFaseAtual":2,
"idOrigem":1,
"idPrioridade":5,
"idServico":70,
"idServicoContrato":61,
"idSolicitacaoServico":1559,
"idSolicitante":2220,
"idTipoDemandaServico":1,
"idUnidade":104,
"idTarefaEncerramento":10019,
"impacto":"B",
"nomecontato":"Rafael Brito Feitosa",
"nomeServico":"ATIVOS (SEGURANÇA) - Análise LOGs/Desempenho/Capacidade/Disponibilidade",
"nomeTarefa":"Atender solicitacao",
"nomeUnidadeResponsavel":"COTEC",
"observacao":" ",
"origem":"Central de Serviços",
"prazoCapturaHH":0,
"prazoCapturaMM":0,
"prazoHH":8,
"prazoMM":0,
"prioridade":"5",
"responsavel":"Nilson Roniery da Silva Vieira (COTEC)",
"resposta":"\u003cp\u003ePara solucionar este foi inserido um IP fixo da DMZ EXTERNA e mais o seguintes passos:\u003c/p\u003e\n\n\u003col\u003e\n\t\u003cli\u003eAdicionado IP fixo na m\u0026aacute;quina, sendo ele: 172.20.1.55;\u003c/li\u003e\n\t\u003cli\u003eForam criadas regras no Firewall:\n\t\u003cul\u003e\n\t\t\u003cli\u003e172.20.1.55 --\u0026gt; 10.1.1.56 (BD) porta 1521\u003c/li\u003e\n\t\t\u003cli\u003e172.20.1.55 -\u0026gt;\u0026gt;\u0026nbsp; INTERNET portas 80 e 443\u003c/li\u003e\n\t\t\u003cli\u003eVPN CIT --\u0026gt;\u0026nbsp;172.20.1.55\u003c/li\u003e\n\t\u003c/ul\u003e\n\t\u003c/li\u003e\n\t\u003cli\u003eForam alterados os proxies no apache conforme imagem abaixo\u003cimg src\u003d\"/citsmart/galeriaImagens/1/2/439.png\" style\u003d\"height:105px; width:884px\" /\u003e\u003c/li\u003e\n\t\u003cli\u003eFoi restartado o servi\u0026ccedil;o do citsmart no AS.\u0026nbsp;\n\t\u003cul\u003e\n\t\t\u003cli\u003e\u003cem\u003e#\u0026nbsp;/etc/init.d/citsmart_itsm stop\u003c/em\u003e\u003c/li\u003e\n\t\t\u003cli\u003e\u003cem\u003e# /etc/init.d/citsmart_itsm start\u003c/em\u003e\u003c/li\u003e\n\t\u003c/ul\u003e\n\t\u003c/li\u003e\n\u003c/ol\u003e\n",
"seqReabertura":0,
"servico":"ATIVOS (SEGURANÇA) - Análise LOGs/Desempenho/Capacidade/Disponibilidade",
"situacaoSLA":"A",
"slaACombinar":"N",
"solicitante":"Rafael Brito Feitosa",
"solicitanteUnidade":"Rafael Brito Feitosa",
"solucaoTemporaria":"N",
"telefonecontato":"Não disponível.",
"tempoAtendimentoHH":0,
"tempoAtendimentoMM":12,
"tempoAtrasoHH":0,
"tempoAtrasoMM":0,
"tempoCapturaHH":0,
"tempoCapturaMM":0,
"tempoCapturaSS":13,
"tempoDecorridoHH":0,
"tempoDecorridoMM":0,
"urgencia":"B",
"ordernacao":0,
"usuarioDto":{
"idUsuario":635,
"idEmpregado":635,
"idPerfilAcessoUsuario":6,
"idEmpresa":1,
"login":"sefaz.ma.gov.br\\034013",
"nomeUsuario":"Nilson Roniery da Silva Vieira",
"senha":"f04534e4998415904454ae1ceb2040fa05bf548e",
"status":"A",
"ldap":"S",
"email":"nilson#sefaz.ma.gov.br",
"ldapGroupId":1,
"fromToken":false
},
"idResponsavel":635,
"idGrupoAtual":71,
"idGrupoNivel1":2,
"idTarefa":10019,
"grupoAtual":"Segurança"
}
In this case the name I want is "Nilson Roniery da Silva".
The only thing I know is that the name always comes after "nomeUsuario":" and ends with a ".
So, how can I write a SELECT that returns only the name between "nomeUsuario":" and "?
You should be able to use JSON_VALUE to parse this CLOB:
SELECT JSON_VALUE(mycolumn, '$.usuarioDto.nomeUsuario') FROM mytable
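If the CLOB ever has to be parsed on the client side instead (or just to check what the path '$.usuarioDto.nomeUsuario' points at), the same lookup can be sketched in Python with the standard json module. The clob_text value below is a trimmed-down copy of the document in the question, and extract_user_name is a hypothetical helper name.

```python
import json

# Trimmed copy of the CLOB from the question; only the keys needed here are kept.
clob_text = """
{
  "idStatus": 6,
  "usuarioDto": {
    "idUsuario": 635,
    "nomeUsuario": "Nilson Roniery da Silva Vieira"
  },
  "idResponsavel": 635
}
"""

def extract_user_name(text):
    """Follow the path $.usuarioDto.nomeUsuario; return None if a key is missing."""
    doc = json.loads(text)
    return doc.get("usuarioDto", {}).get("nomeUsuario")

print(extract_user_name(clob_text))
```

Parsing the document this way avoids relying on the text that happens to surround the name, which is the weak point of a substring or regex approach.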
I have created a keywords.csv file in my resources folder so that the data for all the products listed in it is fetched automatically. This works perfectly. The first column of the .csv file is named "keyword". Each keyword is a unique number representing a product category (e.g. high pressure washer, floor cleaner, ...). I use this number to build the API URL that I fetch the data from.
I would like to output the product category with each product. Because the API I am fetching from does not define a product category, I would like to use the second column of my .csv file, "keywordtype". Each product should get the "keywordtype" that belongs to the "keyword" it was fetched with.
Secondly, I've been googling around but couldn't find a solution for replacing multiple characters in one string. Is there a way of making some sort of dictionary so that "\u00e9" gets replaced with "é", "\u00b2" with "²", etc.? I tried using .replace('text','') and .split(), but for multiple characters this isn't sufficient.
What I tried
I tried reading the "keywordtype" inside def parse(self, response):, but I don't know how to make it check which keyword is being used, so it keeps printing the same "keywordtype" over and over again.
...
    def parse(self, response):
        #Calling the "category" from the keywords.csv file -- THIS IS NOT WORKING
        with open(os.path.join(os.path.dirname(__file__), "../resources/keywords.csv")) as search_keywords:
            for keyword in csv.DictReader(search_keywords):
                category=keyword["keywordtype"]
...
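One possible way around this, sketched under assumptions: read the CSV once into a dictionary keyed by "keyword", then look up the category for whichever keyword a request was made with. The load_categories helper is a hypothetical name, and the CSV text is inlined with io.StringIO so the snippet runs on its own; in the spider the file would be opened the same way as in start_requests.

```python
import csv
import io

# Inlined copy of keywords.csv so the sketch is self-contained.
KEYWORDS_CSV = """keyword,keywordtype
20035386,Hogedrukreiniger
20035424,Window Vacs
"""

def load_categories(csv_file):
    """Map each keyword to its keywordtype."""
    return {row["keyword"]: row["keywordtype"] for row in csv.DictReader(csv_file)}

categories = load_categories(io.StringIO(KEYWORDS_CSV))

# Inside parse() the keyword would come from the request,
# e.g. response.meta["search_text"]; here it is hard-coded.
search_text = "20035424"
print(categories[search_text])
```

The loop in the snippet above overwrites category on every row, so it always ends up with the last row of the file; a lookup keyed by the keyword avoids that.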
FULL CODE #1
Spider.py
# -*- coding: utf-8 -*-
"""
Created on Mon Aug 12 22:44:35 2019
#author: bergs
"""
# -*- coding: utf-8 -*-
import scrapy
from krc.items import KrcItem
import json
import os
import csv
import time
import datetime
class KRCSpider(scrapy.Spider):
    name = "krc_spider"
    allowed_domains = ["kaercher.com"]
    start_urls = ['https://www.kaercher.com/api/v1/products/search/shoppableproducts/partial/20035386?page=1&size=8&isocode=nl-NL']

    def start_requests(self):
        """Read keywords from keywords file and construct the search URL"""
        with open(os.path.join(os.path.dirname(__file__), "../resources/keywords.csv")) as search_keywords:
            for keyword in csv.DictReader(search_keywords):
                search_text=keyword["keyword"]
                category = keyword["keywordtype"]
                url="https://www.kaercher.com/api/v1/products/search/shoppableproducts/partial/{0}?page=1&size=8&isocode=nl-NL".format(
                    search_text)
                # The meta is used to send our search text into the parser as metadata
                yield scrapy.Request(url, callback = self.parse, meta = {"search_text": search_text, "category": category})

    def parse(self, response):
        category = response.meta["category"]
        current_page = response.meta.get("page", 1)
        next_page = current_page + 1
        #Printing the timestamp when fetching the data, using default timezone from the requesting machine
        ts = time.time()
        timestamp = datetime.datetime.fromtimestamp(ts).strftime('%d-%m-%Y %H:%M:%S')
        #Defining the items
        item = KrcItem()
        data = json.loads(response.text)
        for company in data.get('products', []):
            item["productid"] = company["id"]
            item["category"] = category
            item["name"] = company["name"]
            item["description"] = company["description"]
            item["price"] = company["priceFormatted"].replace("\u20ac","").strip()
            item["timestamp"] = timestamp
            yield item
        #Checking whether "isTruncated" is true (boolean), if so, next page will be triggered
        if data["isTruncated"]:
            yield scrapy.Request(
                url="https://www.kaercher.com/api/v1/products/search/shoppableproducts/partial/20035386?page={page}&size=8&isocode=nl-NL".format(page=next_page),
                callback=self.parse,
                meta={'page': next_page},
            )
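Note that in the code above the follow-up request only passes meta={'page': next_page} and always requests keyword 20035386, so as far as I can tell response.meta["category"] would be missing once page 2 is parsed. A minimal sketch of one way to carry the values along, written without Scrapy so it runs on its own: next_page_request_args is a hypothetical helper, and its result would be passed to scrapy.Request together with callback=self.parse.

```python
# URL pattern copied from the spider, with the keyword and page made variable.
URL_TEMPLATE = ("https://www.kaercher.com/api/v1/products/search/shoppableproducts/"
                "partial/{keyword}?page={page}&size=8&isocode=nl-NL")

def next_page_request_args(meta):
    """Build url and meta for the next page, keeping search_text and category."""
    next_page = meta.get("page", 1) + 1
    new_meta = {
        "search_text": meta["search_text"],
        "category": meta["category"],
        "page": next_page,
    }
    url = URL_TEMPLATE.format(keyword=meta["search_text"], page=next_page)
    return {"url": url, "meta": new_meta}

# Meta as set in start_requests for the first page of one keyword.
first_meta = {"search_text": "20035424", "category": "Window Vacs"}
args = next_page_request_args(first_meta)
print(args["url"])
print(args["meta"])
```

Only the keys the parser needs are copied, so anything else that ends up in the response's meta is not forwarded by accident.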
FULL CODE #2
# -*- coding: utf-8 -*-
"""
Created on Mon Aug 12 22:44:35 2019
#author: bergs
"""
# -*- coding: utf-8 -*-
import scrapy
from krc.items import KrcItem
import json
import os
import csv
import time
import datetime
class KRCSpider(scrapy.Spider):
    name = "krc_spider"
    allowed_domains = ["kaercher.com"]
    start_urls = ['https://www.kaercher.com/api/v1/products/search/shoppableproducts/partial/20035386?page=1&size=8&isocode=nl-NL']

    def start_requests(self):
        """Read keywords from keywords file and construct the search URL"""
        with open(os.path.join(os.path.dirname(__file__), "../resources/keywords.csv")) as search_keywords:
            for keyword in csv.DictReader(search_keywords):
                search_text = keyword["keyword"]
                url="https://www.kaercher.com/api/v1/products/search/shoppableproducts/partial/{0}?page=1&size=8&isocode=nl-NL".format(
                    search_text)
                # The meta is used to send our search text into the parser as metadata
                yield scrapy.Request(url, callback = self.parse, meta = {"search_text": search_text})

    def parse(self, response):
        current_page = response.meta.get("page", 1)
        next_page = current_page + 1
        #Printing the timestamp when fetching the data, using default timezone from the requesting machine
        ts = time.time()
        timestamp = datetime.datetime.fromtimestamp(ts).strftime('%d-%m-%Y %H:%M:%S')
        #Defining the items
        item = KrcItem()
        data = json.loads(response.text)
        for company in data.get('products', []):
            item["productid"] = company["id"]
            item["category"] = company["url"].split('/')[-2].replace('-',' ')
            item["name"] = company["name"]
            item["description"] = company["description"]
            item["price"] = company["priceFormatted"].replace("\u20ac","").strip()
            item["timestamp"] = timestamp
            yield item
        #Checking whether "isTruncated" is true (boolean), if so, next page will be triggered
        if data["isTruncated"]:
            yield scrapy.Request(
                url="https://www.kaercher.com/api/v1/products/search/shoppableproducts/partial/20035386?page={page}&size=8&isocode=nl-NL".format(page=next_page),
                callback=self.parse,
                meta={'page': next_page},
            )
Keywords.csv
keyword,keywordtype
20035386,Hogedrukreiniger
20035424,Window Vacs
items.py
import scrapy
class KrcItem(scrapy.Item):
    productid=scrapy.Field()
    name=scrapy.Field()
    description=scrapy.Field()
    price=scrapy.Field()
    producttype=scrapy.Field()
    timestamp=scrapy.Field()
    category=scrapy.Field()
    pass
Results.json #1
[
{"productid": 10491477, "category": "Window Vacs", "name": "WV 6 + KV 4(wit)", "description": "De vibrerende accu-wisser KV 4 inclusief elektrische bevochtiging, vibratie, tweede wisdoek en meer accessoires maakt het vuil los - en de Window Vac WV 6 zuigt het achterblijvende water af.", "price": "169,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10491478, "category": "Window Vacs", "name": "WV 6 + KV 4", "description": "Schone vensters in een handomdraai: met de vibrerende Window Vac KV 4 inclusief elektrische watervoorziening en vibratie maakt u het vuil eerst los en daarna zuigt u met de Window Vac WV 6 het vuile water af.", "price": "159,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10491479, "category": "Window Vacs", "name": "WV 6 Plus (wit)", "description": "In een set met 2 wisdoeken voor binnen en buiten, sproeiflacon, reinigingsmiddel en vuilkrabber: de Window Vac WV 6 Premium met innovatieve zuigstrip is nog flexibeler in gebruik.", "price": "109,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10491480, "category": "Window Vacs", "name": "WV 6 Plus", "description": "Met innovatieve striptechnologie en nog flexibeler in gebruik: de Window Vac WV 6 Plus voor streepvrij schone vensters in een mum van tijd.", "price": "99,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10491481, "category": "Window Vacs", "name": "WV 2 + KV4 (wit)", "description": "Sterk duo voor streepvrij schone vensters: de vibrerende accu-wisser KV 4 inclusief elektrische bevochtiging, vibratie, tweede wisdoek en meer extra's, en de Window Vac WV 2.", "price": "139,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10491482, "category": "Window Vacs", "name": "WV 2 + KV 4", "description": "Voor moeiteloos ramen lappen: de vibrerende accu-wisser KV 4 inclusief elektrische watervoorziening en trilfunctie maakt het vuil los - en de Window Vac WV 2 zuigt het vuile water af.", "price": "129,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10475755, "category": "Window Vacs", "name": "KV 4 Premium", "description": "Inclusief tweede wisdoek: de Window Vac KV 4 met elektrische watervoorziening en ondersteunende trilfunctie maakt het vuil moeiteloos los van alle gladde oppervlakken.", "price": "89,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10475756, "category": "Window Vacs", "name": "KV 4", "description": "De vibrerende accu-wisser KV 4 maakt vuil moeiteloos los van gladde oppervlakken. Door het elektrisch aangebrachte water en de trillingen wordt handmatig schrobben overbodig.", "price": "79,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10461927, "category": "Hogedrukreiniger", "name": "K 7 Premium Full Control Plus Home", "description": "Inclusief slanghaspel en Home Kit: de K 7 Premium Full Control Plus Home. U kunt de juiste druk instellen met de +/- knoppen en de LCD-scherm op het hogedrukpistool van de hogedrukreiniger.", "price": "699,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10453128, "category": "Hogedrukreiniger", "name": "K 7 Premium Full Control Plus", "description": "Inclusief slanghaspel: de K 7 Premium Full Control Plus. U kunt de juiste druk instellen met de +/- knoppen en het LCD-scherm op het hogedrukpistool van de hogedrukreiniger.", "price": "639,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10461929, "category": "Hogedrukreiniger", "name": "K 7 Full Control Plus Home", "description": "De ideale hogedrukreiniger voor regelmatig gebruik op hardnekkig vuil. K 7 Full Control Plus Home met Home Kit en hogedrukpistool inclusief +/- knoppen voor drukregeling en LCD-scherm.", "price": "639,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10442692, "category": "Hogedrukreiniger", "name": "K 7 Full Control Plus", "description": "De K 7 Full Control Plus hogedrukreiniger van K\u00e4rcher inclusief hogedrukpistool met drukregeling en dosering van reinigingsmiddel met een druk op de knop, en een LCD-scherm waarop de druk wordt weergegeven.", "price": "579,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10462452, "category": "Hogedrukreiniger", "name": "K 5 Premium Full Control Plus Home", "description": "K 5 Premium Full Control Plus Home hogedrukreiniger met +/- knoppen voor drukregeling en LCD-scherm op het hogedrukpistool. Inclusief Home Kit, 3-in-1 Multi Jet spuitlans en slanghaspel.", "price": "499,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10442700, "category": "Hogedrukreiniger", "name": "K 5 Premium Full Control Plus", "description": "Voor regelmatig verwijderen van matige verontreiniging op auto's, stenen muren en fietsen. K 5 Premium Full Control Plus hogedrukreiniger inclusief hogedrukpistool met +/- knoppen voor drukregeling en LCD-scherm.", "price": "449,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10430946, "category": "Hogedrukreiniger", "name": "K 5 Full Control", "description": "Hogedrukreiniger met drukindicator op het hogedrukpistool - voor de juiste druk op elk oppervlak. Ideaal voor regelmatig gebruik op matige verontreiniging. Oppervlakteprestatie van 40 m\u00b2/u.", "price": "379,95", "timestamp": "18-08-2019 15:53:30"},
{"productid": 10441978, "category": "Hogedrukreiniger", "name": "K 4 Premium Full Control Home", "description": "Hogedrukreiniger met drukindicator op het hogedrukpistool \u2013 voor de juiste druk op elk oppervlak. Inclusief slanghaspel en Home Kit. Ideaal voor een gemiddelde vervuilingsgraad. Oppervlakteprestatie van 30 m\u00b2/u.", "price": "369,95", "timestamp": "18-08-2019 15:53:30"}
]
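About the \u00e4 and \u00b2 sequences in the results above: those are ordinary JSON escapes rather than broken text, and any JSON parser turns them back into ä and ² when the file is read. They appear because the output was serialised as ASCII only, so a replace dictionary should not be needed. The sketch below shows the effect with Python's json module on a made-up item. For Scrapy's feed exports, setting FEED_EXPORT_ENCODING = 'utf-8' in settings.py is the usual way to get unescaped output, though I have not run that against this project.

```python
import json

# Made-up item using the same characters that show up escaped in the results.
item = {"name": "K 5 Full Control",
        "description": "Hogedrukreiniger van K\u00e4rcher, 40 m\u00b2/u."}

escaped = json.dumps(item)                       # default: non-ASCII is written as \uXXXX
readable = json.dumps(item, ensure_ascii=False)  # keeps the characters as they are

print(escaped)
print(readable)

# Both strings decode back to exactly the same data.
print(json.loads(escaped) == json.loads(readable))
```

Either form is valid JSON; the difference only matters when a person reads the file directly.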
Results.json #2
[
{"productid": 10461927, "category": "hogedrukreinigers", "name": "K 7 Premium Full Control Plus Home", "description": "Inclusief slanghaspel en Home Kit: de K 7 Premium Full Control Plus Home. U kunt de juiste druk instellen met de +/- knoppen en de LCD-scherm op het hogedrukpistool van de hogedrukreiniger.", "price": "699,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10453128, "category": "hogedrukreinigers", "name": "K 7 Premium Full Control Plus", "description": "Inclusief slanghaspel: de K 7 Premium Full Control Plus. U kunt de juiste druk instellen met de +/- knoppen en het LCD-scherm op het hogedrukpistool van de hogedrukreiniger.", "price": "639,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10461929, "category": "hogedrukreinigers", "name": "K 7 Full Control Plus Home", "description": "De ideale hogedrukreiniger voor regelmatig gebruik op hardnekkig vuil. K 7 Full Control Plus Home met Home Kit en hogedrukpistool inclusief +/- knoppen voor drukregeling en LCD-scherm.", "price": "639,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10442692, "category": "hogedrukreinigers", "name": "K 7 Full Control Plus", "description": "De K 7 Full Control Plus hogedrukreiniger van K\u00e4rcher inclusief hogedrukpistool met drukregeling en dosering van reinigingsmiddel met een druk op de knop, en een LCD-scherm waarop de druk wordt weergegeven.", "price": "579,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10462452, "category": "hogedrukreinigers", "name": "K 5 Premium Full Control Plus Home", "description": "K 5 Premium Full Control Plus Home hogedrukreiniger met +/- knoppen voor drukregeling en LCD-scherm op het hogedrukpistool. Inclusief Home Kit, 3-in-1 Multi Jet spuitlans en slanghaspel.", "price": "499,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10442700, "category": "hogedrukreinigers", "name": "K 5 Premium Full Control Plus", "description": "Voor regelmatig verwijderen van matige verontreiniging op auto's, stenen muren en fietsen. K 5 Premium Full Control Plus hogedrukreiniger inclusief hogedrukpistool met +/- knoppen voor drukregeling en LCD-scherm.", "price": "449,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10430946, "category": "hogedrukreinigers", "name": "K 5 Full Control", "description": "Hogedrukreiniger met drukindicator op het hogedrukpistool - voor de juiste druk op elk oppervlak. Ideaal voor regelmatig gebruik op matige verontreiniging. Oppervlakteprestatie van 40 m\u00b2/u.", "price": "379,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10441978, "category": "hogedrukreinigers", "name": "K 4 Premium Full Control Home", "description": "Hogedrukreiniger met drukindicator op het hogedrukpistool \u2013 voor de juiste druk op elk oppervlak. Inclusief slanghaspel en Home Kit. Ideaal voor een gemiddelde vervuilingsgraad. Oppervlakteprestatie van 30 m\u00b2/u.", "price": "369,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10491477, "category": "window vacs", "name": "WV 6 + KV 4(wit)", "description": "De vibrerende accu-wisser KV 4 inclusief elektrische bevochtiging, vibratie, tweede wisdoek en meer accessoires maakt het vuil los - en de Window Vac WV 6 zuigt het achterblijvende water af.", "price": "169,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10491478, "category": "window vacs", "name": "WV 6 + KV 4", "description": "Schone vensters in een handomdraai: met de vibrerende Window Vac KV 4 inclusief elektrische watervoorziening en vibratie maakt u het vuil eerst los en daarna zuigt u met de Window Vac WV 6 het vuile water af.", "price": "159,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10491479, "category": "window vacs", "name": "WV 6 Plus (wit)", "description": "In een set met 2 wisdoeken voor binnen en buiten, sproeiflacon, reinigingsmiddel en vuilkrabber: de Window Vac WV 6 Premium met innovatieve zuigstrip is nog flexibeler in gebruik.", "price": "109,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10491480, "category": "window vacs", "name": "WV 6 Plus", "description": "Met innovatieve striptechnologie en nog flexibeler in gebruik: de Window Vac WV 6 Plus voor streepvrij schone vensters in een mum van tijd.", "price": "99,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10491481, "category": "window vacs", "name": "WV 2 + KV4 (wit)", "description": "Sterk duo voor streepvrij schone vensters: de vibrerende accu-wisser KV 4 inclusief elektrische bevochtiging, vibratie, tweede wisdoek en meer extra's, en de Window Vac WV 2.", "price": "139,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10491482, "category": "window vacs", "name": "WV 2 + KV 4", "description": "Voor moeiteloos ramen lappen: de vibrerende accu-wisser KV 4 inclusief elektrische watervoorziening en trilfunctie maakt het vuil los - en de Window Vac WV 2 zuigt het vuile water af.", "price": "129,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10475755, "category": "window vacs", "name": "KV 4 Premium", "description": "Inclusief tweede wisdoek: de Window Vac KV 4 met elektrische watervoorziening en ondersteunende trilfunctie maakt het vuil moeiteloos los van alle gladde oppervlakken.", "price": "89,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10475756, "category": "window vacs", "name": "KV 4", "description": "De vibrerende accu-wisser KV 4 maakt vuil moeiteloos los van gladde oppervlakken. Door het elektrisch aangebrachte water en de trillingen wordt handmatig schrobben overbodig.", "price": "79,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10430942, "category": "hogedrukreinigers", "name": "K 4 Premium Full Control", "description": "Hogedrukreiniger met drukindicator op het hogedrukpistool - voor de juiste druk op elk oppervlak. Inclusief slanghaspel. Ideaal voor matige verontreiniging. Oppervlakteprestatie van 30 m\u00b2/u.", "price": "319,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10441943, "category": "hogedrukreinigers", "name": "K 4 Full Control Home", "description": "Hogedrukreiniger met drukindicator op het hogedrukpistool \u2013 voor de juiste druk op elk oppervlak. Ideaal voor de incidentele reiniging van een gemiddelde vervuiling. Inclusief Home Kit. Oppervlakteprestatie van 30 m\u00b2/u.", "price": "339,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10430938, "category": "hogedrukreinigers", "name": "K 4 Full Control", "description": "Hogedrukreiniger met drukindicator op het hogedrukpistool \u2013 voor de juiste druk op elk oppervlak. Ideaal voor de incidentele reiniging van een gemiddelde vervuiling. Oppervlakteprestatie van 30 m\u00b2/u.", "price": "289,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10501457, "category": "hogedrukreinigers", "name": "K 3 Full Control Home T150", "description": "Perfect schoon rondom huis dankzij de Home Kit inclusief oppervlaktereiniger en reinigingsmiddelen. Het pistool van de K 3 Full Control Home hogedrukreiniger geeft het drukniveau aan.", "price": "209,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10501459, "category": "hogedrukreinigers", "name": "K 3 Full Control", "description": "Ideale hogedrukreiniger voor incidenteel gebruik bij lichte vervuiling. De K 3 Full Control geeft de druk aan op het pistool. Zo kunt u voor elk oppervlak de juiste druk instellen.", "price": "179,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10406424, "category": "hogedrukreinigers", "name": "K 3 Home", "description": "De K 3 Home hogedrukreiniger met Home Kit is ideaal voor incidenteel gebruik en voor de verwijdering van lichte vervuiling op bijvoorbeeld fietsen, tuinhekken en motorfietsen.", "price": "219,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10461899, "category": "hogedrukreinigers", "name": "K 2 Premium Full Control Home", "description": "De K 2 Premium Full Control Home hogedrukreiniger van K\u00e4rcher inclusief Home Kit en met gerichte drukregeling. Ideaal voor oppervlakken rondom het huis en op lichtere vervuiling.", "price": "179,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10464555, "category": "hogedrukreinigers", "name": "K 2 HOME T150", "description": "De K 2 Home op wieltjes is speciaal ontworpen voor incidenteel gebruik en lichte vervuiling. Dankzij de Home Kit houdt u met de hogedrukreiniger ook grotere oppervlakken rondom het huis eenvoudig schoon.", "price": "129,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10479573, "category": "hogedrukreinigers", "name": "K 2 Basic", "description": "De 'K2 Basic' hogedrukreiniger is ideaal voor incidenteel gebruik en verwijdering van normale vervuiling rondom de woning (bijv. fietsen, tuingereedschap, tuinmeubilair).", "price": "79,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10497774, "category": "hogedrukreinigers", "name": "K 7 Compact Home", "description": "Compact en handzaam: de hogedrukreiniger K 7 Compact met watergekoelde motor. Voor frequent gebruik en sterke vervuiling, bijvoorbeeld op straten, in zwembaden, op fietsen of grote auto's.", "price": "589,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10497770, "category": "hogedrukreinigers", "name": "K 5 Compact Home", "description": "De K5 Compact Home met watergekoelde motor neemt weinig ruimte in en kan comfortabel worden vervoerd. Hogedrukreiniger inclusief Home Kit met oppervlaktereiniger T 350 en steen- en gevelreiniger.", "price": "379,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10497773, "category": "hogedrukreinigers", "name": "K 5 Compact", "description": "Inclusief innovatieve slangopberging: de eenvoudig te vervoeren en op te bergen hogedrukreiniger K 5 Compact voor regelmatig gebruik bij middelzware vervuiling. Oppervlakteprestatie 40 m\u00b2/u.", "price": "329,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10497765, "category": "hogedrukreinigers", "name": "K 4 Compact Home", "description": "Eenvoudig te vervoeren en snel op te bergen: de K 4 Compact voor incidenteel gebruik bij middelzware vervuiling. Inclusief telescoopgreep en watergekoelde motor. Oppervlakteprestatie 30 m\u00b2/u.", "price": "289,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10497768, "category": "hogedrukreinigers", "name": "K 4 Compact", "description": "Eenvoudig te vervoeren en snel op te bergen: de K 4 Compact voor incidenteel gebruik bij middelzware vervuiling. Inclusief telescoopgreep en watergekoelde motor. Oppervlakteprestatie 30 m\u00b2/u.", "price": "249,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10472949, "category": "hogedrukreinigers", "name": "K 2 Battery Set", "description": "Kan overal worden ingezet waar er geen stopcontact beschikbaar is: De K2 hogedrukreiniger met accuaandrijving, inclusief 36 V li-ionaccu en oplader. Ideaal voor uiteenlopende toepassingen.", "price": "399,95", "timestamp": "18-08-2019 16:43:57"},
{"productid": 10472950, "category": "hogedrukreinigers", "name": "K 2 Battery", "description": "De hogedrukreiniger met accuaandrijving voor flexibel schoonmaken zonder stopcontact. Ideaal voor uiteenlopende toepassingen rondom het huis. Accu en oplader worden niet meegeleverd.", "price": "199,95", "timestamp": "18-08-2019 16:43:58"}
]
You need to read the category from the CSV file and pass it to the parse callback through the request's meta:
def start_requests(self):
    """Read keywords from the keywords file and construct the search URL."""
    # Requires `import csv`, `import os` and `import scrapy` at module level.
    with open(os.path.join(os.path.dirname(__file__), "../resources/keywords.csv")) as search_keywords:
        for keyword in csv.DictReader(search_keywords):
            search_text = keyword["keyword"]
            category = keyword["keywordtype"]
            url = "https://www.kaercher.com/api/v1/products/search/shoppableproducts/partial/{0}?page=1&size=8&isocode=nl-NL".format(
                search_text)
            # The meta dict carries the search text and category into the parser
            yield scrapy.Request(url, callback=self.parse, meta={"search_text": search_text, "category": category})

def parse(self, response):
    category = response.meta["category"]
    ...
UPDATE
If you also want the category to be available when parsing the next page, you need to pass it through meta again:
# Check whether "isTruncated" is true (boolean); if so, request the next page
if data["isTruncated"]:
    yield scrapy.Request(
        url="https://www.kaercher.com/api/v1/products/search/shoppableproducts/partial/20035386?page={page}&size=8&isocode=nl-NL".format(page=next_page),
        callback=self.parse,
        meta={'page': next_page, "category": category},
    )
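The way the category travels from the CSV row to every later page can be checked without running a spider. What follows is only a minimal sketch, assuming the CSV has the keyword and keywordtype columns used above. The helper names build_search_requests and next_page_meta are hypothetical, the sample CSV row is made up, and plain (url, meta) tuples stand in for scrapy.Request objects:

```python
import csv
import io

SEARCH_URL = (
    "https://www.kaercher.com/api/v1/products/search/shoppableproducts"
    "/partial/{0}?page=1&size=8&isocode=nl-NL"
)


def build_search_requests(csv_file):
    """Return one (url, meta) pair per CSV row; meta carries the category."""
    pairs = []
    for row in csv.DictReader(csv_file):
        meta = {"search_text": row["keyword"], "category": row["keywordtype"]}
        pairs.append((SEARCH_URL.format(row["keyword"]), meta))
    return pairs


def next_page_meta(meta, next_page):
    """Copy the current meta forward so the category survives pagination."""
    return {**meta, "page": next_page}


# Hypothetical CSV content, for illustration only.
sample = io.StringIO("keyword,keywordtype\n20035386,hogedrukreinigers\n")
requests = build_search_requests(sample)
url, meta = requests[0]
print(url)
print(next_page_meta(meta, 2))
```

Copying the whole meta dict forward, rather than listing keys one by one, means search_text reaches the next page as well, so it would not need to be hard-coded in the next-page URL.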