PIG: how to filter a field with special characters - apache-pig

I need to filter a list of accented words, because the text is in Portuguese.
The load is working:
arq = LOAD '/user/cloudera/file1.5.txt' USING PigStorage(';') as
(time:chararray,
cd_rastreio:chararray,
hora:chararray,
detalhe:chararray,
local:chararray,
destino:chararray);
I need to make a filter like this:
[...]
detalhe IN (
'A entrega não pode ser efetuada - Carteiro não atendido',
'A entrega não pode ser efetuada - Cliente desconhecido no local',
'A entrega não pode ser efetuada - Cliente mudou-se')
But it returns no lines. I believe it is because of "ã".
What could I do?

I am able to use the filter you described above (in Pig version 0.16), for example:
B = FILTER arq BY detalhe IN (
'A entrega não pode ser efetuada - Carteiro não atendido',
'A entrega não pode ser efetuada - Cliente desconhecido no local',
'A entrega não pode ser efetuada - Cliente mudou-se');
DUMP B;
(A entrega não pode ser efetuada - Carteiro não atendido)
(A entrega não pode ser efetuada - Cliente desconhecido no local)
(A entrega não pode ser efetuada - Cliente mudou-se)
Can you inspect your file in HDFS to make sure it still has the character ã and hasn't been scrubbed by a previous process?
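As a rough way to follow up on that check, the Python sketch below classifies how "ã" appears in a file's raw bytes. It assumes you first copy the file locally (for example with hdfs dfs -get), and the helper name inspect_encoding is hypothetical. It separates a clean UTF-8 "ã" from a file that is not valid UTF-8 (e.g. Latin-1), a double-encoded "Ã£", and a decomposed "a" plus combining tilde. Each of the non-"ok" cases would likely make an exact IN comparison against the literals above fail.

```python
import unicodedata

A_TILDE = "\u00e3"            # precomposed 'ã', as typed in the IN list
MOJIBAKE = "\u00c3\u00a3"     # 'Ã£': UTF-8 bytes of 'ã' re-read as Latin-1


def inspect_encoding(raw: bytes) -> str:
    """Report how the character 'ã' shows up in raw file bytes (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"        # e.g. Latin-1 / Windows-1252 bytes
    if MOJIBAKE in text:
        return "double-encoded"   # the file was converted to UTF-8 twice
    if A_TILDE in text:
        return "ok"               # matches the literals used in the filter
    if A_TILDE in unicodedata.normalize("NFC", text):
        return "decomposed"       # 'a' followed by combining tilde U+0303
    return "missing"              # no 'ã' at all: possibly scrubbed upstream
```

For example, after copying the file locally you could call inspect_encoding(open("file1.5.txt", "rb").read()).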
Alternatively, you can write a regular expression and use MATCHES to filter without typing the ã character, for example with . (any single character) in place of ã:
B = FILTER arq BY detalhe MATCHES
'A entrega n.o pode ser efetuada - (Carteiro n.o atendido|Cliente desconhecido no local|Cliente mudou-se)';
DUMP B;
(A entrega não pode ser efetuada - Carteiro não atendido)
(A entrega não pode ser efetuada - Cliente desconhecido no local)
(A entrega não pode ser efetuada - Cliente mudou-se)
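The MATCHES pattern can be sanity-checked outside Pig. This Python sketch uses re.fullmatch, on the assumption that, as with Java's String.matches, the whole field must match rather than a substring. The function name keep is illustrative only. Note that . matches exactly one character, so a double-encoded value such as "nÃ£o" still would not match.

```python
import re

# The pattern from the answer; "." stands in for the accented "ã".
PATTERN = re.compile(
    r"A entrega n.o pode ser efetuada - "
    r"(Carteiro n.o atendido|Cliente desconhecido no local|Cliente mudou-se)"
)


def keep(detalhe: str) -> bool:
    """Return True if the whole detalhe value matches the pattern."""
    # fullmatch requires the entire string to match, like a whole-field MATCHES
    return PATTERN.fullmatch(detalhe) is not None
```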


I am extracting data from JSON in SQL and the data comes back as lists. How do I get it out?

Thanks for reading!
I am extracting data from JSON using the JSON operators in PostgreSQL. This is the code:
select lc.name as company,
lr.name as retailer,
ls.name as tienda,
sm.created_on as date,
sm.task_id as task_id,
ss.submission_data::json-> '¿Qué ''bancos'' se encuentran cerca de la tienda? De ser necesario, pregunta al gerente de tienda' AS banco,
ss.submission_data::json->> 'Indica la distancia aproximada entre la tienda y el ''banco'' más cercano' AS banco_distancia,
ss.submission_data::json->> '¿En la zona hay disponibilidad de ''transporte público''? Indica los tipos de transporte' AS transporte,
ss.submission_data::json->> 'Indica el precio estimado de un viaje en ''transporte público''' AS transporte_precio,
ss.submission_data::json->> '¿En la zona hay servicio de ''aplicaciones de envíos''?' AS app_envio
from submission_submissionmetadata sm
left join submission_submission ss on sm.submission_id = ss.id
left join location_store ls on ls.id = sm.store_id
left join location_retailer lr on lr.id = ls.retailer_id
left join location_company lc on lc.id = lr.company_id
where sm.brand_id = 293
order by date desc
In the results, the columns banco, transporte and app_envio come back as lists (JSON arrays). I have tried using functions to flatten them, but without success. Do you know what I can do?
https://www.postgresql.org/docs/current/functions-json.html
Waiting on your response to the comment above, but you can see all the ways to extract JSON in the doc above. For instance, if you want only the first element of the app_envio array, it would be:
ss.submission_data::json #>> '{"¿En la zona hay servicio de ''aplicaciones de envíos''?",0}' AS app_envio
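To illustrate what the path given to #>> does, here is a rough Python emulation. It is a sketch, not PostgreSQL's implementation, and the helper names path_text and flatten as well as the sample values are made up. Each path element selects an object key or, for arrays, a zero-based index, and the final value comes back as text. flatten shows the other option of turning the whole array into one column, which in PostgreSQL would typically be done with string_agg over json_array_elements_text. The doubled '' in the SQL literal is a single quote in the actual key.

```python
import json


def path_text(doc, path):
    """Rough emulation of  doc #>> path  (sketch, not PostgreSQL's code)."""
    node = doc
    for step in path:
        if isinstance(node, list):
            try:
                node = node[int(step)]          # array step: index from 0
            except (ValueError, IndexError):
                return None                     # bad or out-of-range index -> NULL
        elif isinstance(node, dict) and step in node:
            node = node[step]                   # object step: key lookup
        else:
            return None                         # missing key or scalar -> NULL
    if node is None or isinstance(node, str):
        return node                             # strings come back unquoted
    return json.dumps(node, ensure_ascii=False)  # other values as JSON text


def flatten(doc, key, sep=", "):
    """Join a JSON array into one text value (cf. string_agg over json_array_elements_text)."""
    return sep.join(str(item) for item in doc.get(key) or [])


# Sample document with made-up values.
KEY = "¿En la zona hay servicio de 'aplicaciones de envíos'?"
submission = {KEY: ["app A", "app B"]}
print(path_text(submission, [KEY, "0"]))  # app A
print(flatten(submission, KEY))           # app A, app B
```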

Determining the number of turtle-clusters

I have been trying to calculate the number of turtle clusters in my world, but I keep getting the error "You can't use GROUPE in a patch context, because GROUPE is turtle-only". I am using parts of Schelling's Segregation model and adapting the Patch Clusters Example from the Models Library, but I have not been able to get the number of clusters with the command show max [amas] of personnes after running the model.
The line causing problems is towards the bottom of the block:
(groupe = [groupe] of myself)]
Here is the full code:
extensions
[
GIS ; GIS extension
]
globals
[
v
;vc
routes.data ; roads shapefile
Usol.data ; land-use shapefile
average-happy ; percentage of happy people
amas
]
patches-own
[
routes ; roads
routes? ; road boolean
Usol
Usol?
state ; state of a patch
val ; value
cluster
]
turtles-own
[
groupe
unhappy?
counter
similar-nearby
other-nearby
total-nearby
happy
]
breed ; Creates a breed (class)
[
personnes
personne
]
to setup ; Declaration of the setup procedure
ca ; Clear-all
reset-ticks ; Resets the tick counter to zero
set v 2
initialiserGIS ; Runs the initialiserGIS procedure
creerHumains
do-plots
end ; End of the setup procedure
to initialiserGIS ; Declaration of the initialiserGIS procedure
set Usol.data gis:load-dataset "utilisationDuSol.shp"
gis:apply-coverage Usol.data "LANDUSE" Usol
ask patches
[
set Usol? FALSE
]
ask patches gis:intersecting Usol.data
[
set Usol? TRUE
ifelse Usol = "residential"
[
set state 1
set pcolor green
]
[
set state 0
set pcolor grey
]
]
set routes.data gis:load-dataset "routesMtl.shp"
ask patches
[
set routes? FALSE
]
ask patches gis:intersecting routes.data
[
set routes? TRUE
set state 0 ; Prevents agents from living on roads
set pcolor red ; Colors roads red
]
end ; End of the initialiserGIS procedure
to creerHumains ; Declaration of the creerHumains procedure
ask patches with [state = 1] ; Asks the residential cells
[
set val random-float 1 ; Gives the variable val a random decimal value between 0 and 1
]
let vide 0.1 ; Initializes the local variable vide (empty share) to 0.1
let limite1 (1 - vide) / v ; Initializes the local variable limite
let residents patches with [state = 1 and val > vide] ; Initializes the local variable residents
ask residents ; Asks the residents
[
sprout-personnes 1 ; Primitive that lets a patch create an agent, here on every residential patch
] ; Creates the same number of people in each group
let limiteList (list (pmin * (1 - vide )) ((1 - pmin) * (1 - vide)))
let i 0
let minVal vide
let maxVal 0
while [i <= v - 1]
[
set maxVal minVal + item i limiteList
ask personnes with [val <= maxVal and val > minVal]
[
set groupe i + 1
]
set minVal maxVal
set i i + 1
]
ask personnes ; this only serves to give each group a different color
[ ; Assigns the appropriate color and shape according to the group
ifelse groupe = 1
[
set color red
set shape "house"
]
[
ifelse groupe = 2
[
set color blue
set shape "house"
]
[
ifelse groupe = 3
[
set color green
set shape "house"
]
[
ifelse groupe = 4
[
set color orange
set shape "house"
]
[
ifelse groupe = 5
[
set color black
set shape "house"
]
[
ifelse groupe = 6
[
set color brown
set shape "house"
]
[
ifelse groupe = 7
[
set color pink
set shape "house"
]
[
ifelse groupe = 8
[
set color white
set shape "house"
]
[
set color magenta
set shape "house"
]
]
]
]
]
]
]
]
]
end ; End of the creerHumains procedure
to move ; Declaration of the move procedure
move-to one-of patches with [Usol = "residential"]
if any? other turtles-here
[
move ;; Keeps going until a free patch is found
]
end
to update-variables ; Declaration of the update-variables procedure ; updates the values
update-turtles
update-globals
end ; End of the update-variables procedure
to update-turtles ; Declaration of the update-turtles procedure
ask turtles ; asks the turtles
[ ; Tests the neighboring patches
set counter 0 ; resets counter
set similar-nearby count (turtles-on neighbors) with [groupe = [groupe] of myself]
set other-nearby count (turtles-on neighbors) with [groupe != [groupe] of myself]
set total-nearby similar-nearby + other-nearby
ifelse groupe = 1
[
ifelse (Tmin >= similar-nearby) ; checks whether vc is greater than or equal to the number of similar neighbors
[ ; person unhappy + move
set unhappy? TRUE
set happy 0
move
]
[ ; person happy
set unhappy? FALSE
set happy 1
]
]
[
ifelse (Tmaj <= similar-nearby) ; checks whether vc is greater than or equal to the number of similar neighbors
[ ; person unhappy + move
set unhappy? TRUE
set happy 0
move
]
[ ; person happy
set unhappy? FALSE
set happy 1
]
]
]
end ; End of the update-turtles procedure
to update-globals ; Declaration of the update-globals procedure
set average-happy mean [happy] of turtles ; average-happy = mean of happy over all turtles
end ; End of the update-globals procedure
to go ; Declaration of the go procedure
update-variables ; Checks the state of each person and moves those that need to move
tick ; Increments the tick counter
if ticks >= 100 ; If the number of ticks is greater than or equal to 100
[
stop ; Stops the model
]
if c-unhappy = 0
[
stop ; Stops the model if nobody is unhappy
]
end ; End of the go procedure
to do-plots ; Declaration of the do-plots procedure
plot average-happy * 100 ; Plots the share of happy residents
end ; End of the do-plots procedure
to-report c-happy ; Declaration of the c-happy reporter
report sum [happy] of turtles ; Reports the number of happy turtles
end ; End of the c-happy reporter
to-report c-unhappy ; Declaration of the c-unhappy reporter
report ((count turtles) - (sum [happy] of turtles)) ; Reports the number of unhappy turtles
end ; End of the c-unhappy reporter
;;; Q6
to find-clusters
loop
[
;; pick a random patch that isn't in a cluster yet
let seed one-of personnes with [cluster = nobody]
;; if we can't find one, then we're done!
if seed = nobody
[
show-clusters
stop
]
;; otherwise, make the patch the "leader" of a new cluster
;; by assigning itself to its own cluster, then call
;; grow-cluster to find the rest of the cluster
ask seed
[
set cluster self
grow-cluster
]
]
display
end
to grow-cluster ;; patch procedure
ask neighbors4 with [(cluster = nobody) and
(groupe = [groupe] of myself)]
[
set cluster [cluster] of myself
grow-cluster
]
end
;; once all the clusters have been found, this is called
;; to put numeric labels on them so the user can see
;; that the clusters were identified correctly
to show-clusters
let counter2 0
loop
[ ;; pick a random patch we haven't labeled yet
let p one-of patches with [plabel = ""]
if p = nobody
[
stop
]
;; give all patches in the chosen patch's cluster
;; the same label
ask p
[
ask personnes with [cluster = [cluster] of myself]
[
set amas counter2
]
]
set counter2 counter2 + 1
]
end
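A likely source of the error, assuming standard NetLogo semantics: neighbors4 reports patches, so the reporter inside ask neighbors4 with [...] runs in patch context and cannot read the turtle-only variable groupe; asking (personnes-on neighbors4) with [...] keeps it in turtle context, just as update-turtles already does with (turtles-on neighbors). Separately, cluster starts at 0 rather than nobody unless it is set first, as the Patch Clusters Example does in its setup. The flood fill that find-clusters and grow-cluster perform can be sketched in Python as below. count_clusters and the dictionary-based world are hypothetical stand-ins, the world is assumed not to wrap, and the recursion is replaced by an explicit queue (breadth-first search).

```python
from collections import deque


def count_clusters(cells):
    """Flood-fill clusters of same-group 4-neighbours.

    cells maps (x, y) of each occupied patch to the groupe of its turtle
    (hypothetical stand-in for the NetLogo world; no wrapping assumed).
    """
    cluster_of = {}          # (x, y) -> cluster id, like the cluster variable
    n_clusters = 0
    for seed in cells:       # pick a cell that isn't in a cluster yet
        if seed in cluster_of:
            continue
        n_clusters += 1      # the seed becomes the leader of a new cluster
        cluster_of[seed] = n_clusters
        queue = deque([seed])
        while queue:         # iterative version of grow-cluster
            x, y = queue.popleft()
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # neighbors4
                nb = (x + dx, y + dy)
                if (nb in cells and nb not in cluster_of
                        and cells[nb] == cells[(x, y)]):
                    cluster_of[nb] = n_clusters
                    queue.append(nb)
    return n_clusters, cluster_of


world = {(0, 0): 1, (1, 0): 1, (3, 0): 1, (0, 1): 2, (1, 1): 2}
n, labels = count_clusters(world)
print(n)  # 3
```

The returned count plays the role the question wants from amas; an explicit queue avoids deep recursion on large clusters.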

Recursive query to give a route in PostgreSQL

I have to write a recursive query in PostgreSQL for the following:
I have a table called tb_route with from_city_name and to_city_name columns,
and another column, km_distance_num, with the distance in km between the two cities.
The table is built recursively. I have to write a recursive CTE that shows the route between two cities (e.g., from 'Santiago de Compostela' to 'Saint Jean Pied de Port'), with the total km of the route and the cities it goes through.
The output has to be something like this:
This is what I've tried:
WITH RECURSIVE cities AS (
SELECT *
FROM textil.tb_route
WHERE to_city_name = 'Santigo de Compostela'
UNION ALL
SELECT e.from_city, e.to_city, e.route, e.km
FROM textil.tb_route e
INNER JOIN cities tb_route ON tb_route.from_city_name = e.from_city
)
SELECT *
FROM cities;
And I got an error like this:
ERROR: column e.from_city does not exist
LINE 8: ...JOIN cities tb_route ON tb_route.from_city_name = e.from_cit...
Table:
CREATE TABLE textil.tb_route
(
from_city_name CHARACTER VARYING(120) NOT NULL ,
to_city_name CHARACTER VARYING(120) NOT NULL ,
km_distance_num NUMERIC(5,2) NOT NULL ,
CONSTRAINT pk_route PRIMARY KEY (from_city_name, to_city_name)
);
Data:
INSERT INTO textil.tb_route VALUES
('Saint Jean Pied de Port','Roncesvalles',25.7),
('Somport','Jaca',30.5),
('Roncesvalles','Zubiri',21.5),
('Jaca','Arrés',25),
('Zubiri','Pamplona/Iruña',20.4),
('Arrés','Ruesta',28.7),
('Pamplona/Iruña','Puente la Reina/Gares',24),
('Ruesta','Sangüesa',21.8),
('Puente la Reina/Gares','Estella/Lizarra',22),
('Sangüesa','Monreal',27.25),
('Estella/Lizarra','Torres del Río',29),
('Monreal','Puente la Reina/Gares',31.1),
('Torres del Río','Logroño',20),
('Logroño','Nájera',29.6),
('Nájera','Santo Domingo de la Calzada',21),
('Santo Domingo de la Calzada','Belorado',22.7),
('Belorado','Agés',27.4),
('Agés','Burgos',23),
('Burgos','Hontanas',31.1),
('Hontanas','Boadilla del Camino',28.5),
('Boadilla del Camino','Carrión de los Condes',24.6),
('Carrión de los Condes','Terradillos de los Templarios',26.6),
('Terradillos de los Templarios','El Burgo Ranero',30.6),
('El Burgo Ranero','León',37.1),
('León','San Martín del Camino',25.9),
('San Martín del Camino','Astorga',24.2),
('Astorga','Foncebadón',25.9),
('Foncebadón','Ponferrada',27.3),
('Ponferrada','Villafranca del Bierzo',24.1),
('Villafranca del Bierzo','O Cebreiro',28.4),
('O Cebreiro','Triacastela',21.1),
('Triacastela','Sarria',18.3),
('Sarria','Portomarín',22.4),
('Portomarín','Palas de Rei',25),
('Palas de Rei','Arzúa',28.8),
('Arzúa','Pedrouzo',19.1),
('Pedrouzo','Santiago de Compostela',20),
('Bayona','Ustaritz',14.3),
('Ustaritz','Urdax',21.2),
('Urdax','Elizondo',18.8),
('Elizondo','Berroeta',9.7),
('Berroeta','Olagüe',20.4),
('Olagüe','Pamplona/Iruña',25),
('Irún','Hernani',26.6),
('Hernani','Tolosa',18.9),
('Tolosa','Zerain',33),
('Zerain','Salvatierra/Agurain',28),
('Salvatierra/Agurain','Vitoria/Gasteiz',27.4),
('Vitoria/Gasteiz','La Puebla de Arganzón',18.5),
('La Puebla de Arganzón','Haro',31),
('Haro','Santo Domingo de la Calzada',20),
('Bayona','Irún',33.8),
('Tolosa','Zegama',37.9),
('Zegama','Salvatierra/Agurain',20.1),
('La Puebla de Arganzón','Miranda del Ebro',22.3),
('Miranda del Ebro','Pancorbo',16.7),
('Pancorbo','Briviesca',23.4),
('Briviesca','Monasterio de Rodilla',19.8),
('Monasterio de Rodilla','Burgos',28.5);
Here is the solution I finally arrived at:
with recursive caminos(from_city_name, to_city_name, path, total_distance, terminar, ultima_ciudad) as (
-- Base query
select to_city_name
,'Saint Jean Pied de Port' -- Change destination
,concat(to_city_name, concat(' -> ', from_city_name))
,cast(km_distance_num as numeric(8,2))
,0 -- Do not stop yet
,from_city_name
from textil.tb_route
where to_city_name = 'Santiago de Compostela' -- Change origin
union all
-- Recursive query
select caminos.from_city_name
,caminos.to_city_name
,concat(caminos.path, concat( ' -> ', tr.from_city_name))
,cast(caminos.total_distance + tr.km_distance_num as numeric(8,2))
,case when tr.from_city_name = caminos.to_city_name then 1 else 0 end
,tr.from_city_name
from caminos inner join textil.tb_route tr on tr.to_city_name = caminos.ultima_ciudad and caminos.terminar != 1
)
select from_city_name, to_city_name, path, total_distance
from caminos
where 1 = 1
and from_city_name = 'Santiago de Compostela' -- Change origin
and ultima_ciudad = 'Saint Jean Pied de Port' -- Change destination
;
I understand your question as a graph-walking problem. As described in your question, edges are directed (meaning that you can travel from from_city_name to to_city_name, but not the other way around).
Here is an approach using a recursive CTE. The idea is to start from a given city and then follow all possible routes, while keeping track of the overall travel path in an array. The recursion stops either when a cycle is detected or when the target city is reached. Then, the outer query filters on the successful paths (there may be none, one or several).
with recursive cte as (
select
from_city_name,
to_city_name,
km_distance_num,
array[from_city_name::text, to_city_name::text] path
from tb_route
where from_city_name = 'Saint Jean Pied de Port'
union all
select
r.from_city_name,
r.to_city_name,
c.km_distance_num + r.km_distance_num,
c.path || r.to_city_name::text
from tb_route r
inner join cte c on c.to_city_name = r.from_city_name
where
not r.to_city_name = any(c.path)
and c.to_city_name <> 'Santiago de Compostela'
)
select
path[1] from_city_name,
to_city_name,
km_distance_num,
array_to_string(path, ' > ') path
from cte
where to_city_name = 'Santiago de Compostela';
Demo on DB Fiddle:
from_city_name | to_city_name | km_distance_num | path
:---------------------- | :--------------------- | --------------: | :---
Saint Jean Pied de Port | Santiago de Compostela | 775.3 | Saint Jean Pied de Port > Roncesvalles > Zubiri > Pamplona/Iruña > Puente la Reina/Gares > Estella/Lizarra > Torres del Río > Logroño > Nájera > Santo Domingo de la Calzada > Belorado > Agés > Burgos > Hontanas > Boadilla del Camino > Carrión de los Condes > Terradillos de los Templarios > El Burgo Ranero > León > San Martín del Camino > Astorga > Foncebadón > Ponferrada > Villafranca del Bierzo > O Cebreiro > Triacastela > Sarria > Portomarín > Palas de Rei > Arzúa > Pedrouzo > Santiago de Compostela
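The same walk can be sketched in Python to see how the recursive CTE behaves. The anchor member corresponds to the edges leaving the start city, each loop pass corresponds to one recursive step, and "to not in path" plays the role of not r.to_city_name = any(c.path). The helper find_routes and the chosen edge subset are only for illustration; the rows are copied from the INSERT above, with Decimal standing in for NUMERIC. This subset has two routes from Irún to Vitoria/Gasteiz (via Zerain or via Zegama), so both come back, just as the CTE can return several successful paths.

```python
from decimal import Decimal

# A subset of the tb_route rows: (from_city_name, to_city_name, km_distance_num).
EDGES = [
    ("Irún", "Hernani", Decimal("26.6")),
    ("Hernani", "Tolosa", Decimal("18.9")),
    ("Tolosa", "Zerain", Decimal("33")),
    ("Zerain", "Salvatierra/Agurain", Decimal("28")),
    ("Tolosa", "Zegama", Decimal("37.9")),
    ("Zegama", "Salvatierra/Agurain", Decimal("20.1")),
    ("Salvatierra/Agurain", "Vitoria/Gasteiz", Decimal("27.4")),
]


def find_routes(edges, start, target):
    """Enumerate all directed, cycle-free routes from start to target."""
    out = {}
    for frm, to, km in edges:
        out.setdefault(frm, []).append((to, km))
    # Anchor member: every edge leaving the start city.
    frontier = [([start, to], km) for to, km in out.get(start, [])]
    routes = []
    while frontier:                      # one pass = one recursive step
        next_frontier = []
        for path, total in frontier:
            last = path[-1]
            if last == target:           # stop extending once the target is reached
                routes.append((" > ".join(path), total))
                continue
            for to, km in out.get(last, []):
                if to not in path:       # cycle check
                    next_frontier.append((path + [to], total + km))
        frontier = next_frontier
    return routes


for path, km in find_routes(EDGES, "Irún", "Vitoria/Gasteiz"):
    print(km, path)
```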

My first CLIPS code doesn't work

I wrote my first CLIPS code for a school project, but I am not familiar with CLIPS (I use C# and Python as my main languages).
This is my code and the errors I am getting:
(defrule determine-closing-date
(not (day-to-close ?))
(billing-size ?)
(unpaid-invoices-number ?)
=>
(if
(or
(< billing-size 1000000)
(< unpaid-invoices-number 1000000)
)
then (assert (day-to-close MtTh))
else (assert (day-to-close friday))
)
(defrule determine-billing-size ""
(not (billing-size ?))
(not (day-to-close ?))
=>
(printout t "¿Cuál es el tamaño de la facturacion?")
(assert (billing-size ?size (read))))
(defrule determine-unpaid-invoices-number ""
(not (unpaid-invoices-number ?))
(not (day-to-close ?))
=>
(printout t "¿Cuál es la cantidad de facturas no pagadas")
(assert (unpaid-invoices-number ?size (read))))
(defrule determine-friday-load ""
(day-to-close friday)
(not (friday-load ?))
=>
(printout t "¿Cuál es la carga de cierres para el viernes?")
(assert (friday-load ?load (read))))
(defrule determine-saturday-closing ""
(day-to-close friday)
(not(< friday-load 1000000))
=>
(assert (day-to-close saturday)))
(defrule day-to-close-conclulssion ""
(day-to-close ?)
=>
(if (eq day-to-close MtTh)
then (printout t "Se puede cerrar de Lunes a Jueves")
else (
if (eq day-to-close friday)
then (printout t "Se debe cerrar viernes.")
else (printout t "Se debe cerrar sabado.")
)
))
The errors are:
[ARGACCES5] Function < expected argument #1 to be of type integer or float
[PRCCODE3] Undefined variable size referenced in RHS of defrule.
[CSTRCPSR1] WARNING: Redefining defrule: determine-saturday-closing +j+j+j
[CSTRCPSR1] WARNING: Redefining defrule: day-to-close-conclulssion +j+j
Some suggested revisions:
(defrule determine-closing-date
(not (day-to-close ?))
(billing-size ?billing-size)
(unpaid-invoices-number ?unpaid-invoices-number)
=>
(if (or (< ?billing-size 1000000)
(< ?unpaid-invoices-number 1000000))
then (assert (day-to-close MtTh))
else (assert (day-to-close friday))))
(defrule determine-billing-size ""
(not (billing-size ?))
(not (day-to-close ?))
=>
; What is the size of the billing?
(printout t "¿Cuál es el tamaño de la facturacion? ")
(bind ?size (read))
(assert (billing-size ?size)))
(defrule determine-unpaid-invoices-number ""
(not (unpaid-invoices-number ?))
(not (day-to-close ?))
=>
; What is the amount of unpaid bills?
(printout t "¿Cuál es la cantidad de facturas no pagadas? ")
(bind ?size (read))
(assert (unpaid-invoices-number ?size)))
(defrule determine-friday-load ""
(day-to-close friday)
(not (friday-load ?))
=>
; What is the burden of closures for Friday?
(printout t "¿Cuál es la carga de cierres para el viernes? ")
(bind ?load (read))
(assert (friday-load ?load)))
(defrule determine-saturday-closing ""
?dtc <- (day-to-close friday)
(friday-load ?load&:(< ?load 1000000))
=>
(retract ?dtc)
(assert (day-to-close saturday)))
(defrule day-to-close-conclusion ""
(declare (salience -10))
(day-to-close ?day-to-close)
=>
(switch ?day-to-close
(case MtTh
; Can be closed from Monday to Thursday.
then (printout t "Se puede cerrar de Lunes a Jueves." crlf))
(case friday
; Must be closed Fridays.
then (printout t "Se debe cerrar viernes." crlf))
(default
; Must be closed Saturday.
(printout t "Se debe cerrar sabado." crlf))))
And the output it produces:
CLIPS> Loading Buffer...
******
CLIPS> (reset)
CLIPS> (run)
¿Cuál es el tamaño de la facturacion? 10
¿Cuál es la cantidad de facturas no pagadas? 10
Se puede cerrar de Lunes a Jueves.
CLIPS> (reset)
CLIPS> (run)
¿Cuál es el tamaño de la facturacion? 3000000
¿Cuál es la cantidad de facturas no pagadas? 3000000
¿Cuál es la carga de cierres para el viernes? 10
Se debe cerrar sabado.
CLIPS> (reset)
CLIPS> (run)
¿Cuál es el tamaño de la facturacion? 3000000
¿Cuál es la cantidad de facturas no pagadas? 3000000
¿Cuál es la carga de cierres para el viernes? 3000000
Se debe cerrar viernes.
CLIPS>
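To check the expected outputs above independently of CLIPS, the decision that the revised rules encode can be written as a plain Python function; closing_day and THRESHOLD are hypothetical names. Note that in the revised determine-saturday-closing, a Friday load below 1000000 moves the closing to Saturday, which is the reverse of the question's (not (< friday-load 1000000)). The sketch follows the revised rules, so flip that comparison if the question's version is the intended one.

```python
THRESHOLD = 1_000_000  # the limit used throughout the rules


def closing_day(billing_size, unpaid_invoices, friday_load=None):
    """Mirror of the revised rule chain (a sketch, not CLIPS itself)."""
    # determine-closing-date: either value below the limit -> Monday to Thursday
    if billing_size < THRESHOLD or unpaid_invoices < THRESHOLD:
        return "MtTh"
    # day-to-close is friday, so determine-friday-load asks for the load
    if friday_load is None:
        raise ValueError("friday_load is needed when closing falls on Friday")
    # determine-saturday-closing retracts friday and asserts saturday
    if friday_load < THRESHOLD:
        return "saturday"
    return "friday"
```

The three runs in the transcript correspond to closing_day(10, 10), closing_day(3000000, 3000000, 10) and closing_day(3000000, 3000000, 3000000).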

Statement Error ORA-00923

I have a database that I have populated.
I am now trying to write a SELECT statement:
SELECT + FULL ( fst) full(fs) COUNT(DISTINCT r1.CLIENT_ID || e.CODE || fst.TAB_ID) NB_TAB FROM RATTACHEMENT_DART rd, FLUX_SORTANT fs, FS_TABLEAU fst, equipement e, REFERENTIEL r1, referentiel r2, referentiel pere;
When I try to run the above statement, I receive the following error:
Error starting at line 4 of the command: SELECT + FULL ( fst)
full(fs) COUNT(DISTINCT r1.CLIENT_ID || e.CODE || fst.TAB_ID) NB_TAB
FROM RATTACHEMENT_DART rd, FLUX_SORTANT fs, FS_TABLEAU fst, equipement
e, REFERENTIEL r1, referentiel r2, referentiel pere
Error at command line: 4, column: 26
Error report: SQL Error:
ORA-00923: FROM keyword not found where expected
00923. 00000 - "FROM keyword not found where expected"
*Cause:
*Action:
If you want to use hints, you must write them inside a comment of the form /*+ ... */ placed directly after the SELECT keyword, as below:
SELECT /*+ FULL( fst) full(fs)*/ COUNT(DISTINCT r1.CLIENT_ID || e.CODE || fst.TAB_ID) NB_TAB
FROM RATTACHEMENT_DART rd, FLUX_SORTANT fs,
FS_TABLEAU fst, equipement e,
REFERENTIEL r1, referentiel r2, referentiel pere;
See documentation