With Construct package: RepeatUntil total length of subconstruct - python-3.8

Let's say I have:
numResults (Int16ul)
resultItems[numResults]
where the resultItems is constructed like:
ID (does not always increase)
strLen
my_str[strLen]
Now I understand that I have to use RepeatUntil, but how the repeat_handler is supposed to work swooshes right over my head.
Edit:
I now added debug print statements.
What I have (not working):
import construct as ct
from pprint import pprint
def repeat_handler(x, lst, ctx):
    pprint(f'{x=}, {lst=}, {ctx=}')
    pprint(f'{ctx.numResults=}, {x["ID"]=}')
    return ctx.numResults == x["ID"]
format = ct.Struct(
    'numResults' / ct.Int16ul,
    'resultItem' / ct.RepeatUntil(repeat_handler,
        ct.Struct(
            'ID' / ct.Int16ul,
            'strLen' / ct.Int16ul,
            'my_str' / ct.PascalString(ct.Computed(ct.this.strLen), 'ISO-8859-1'),
        ),
    ),
)
Please can you explain to me, how the repeat_handler is supposed to work, so that it iterates over all the resultItems?
Edit2: got it working for an increasing ID, but how do I do it without an increasing ID?
d = dict(numResults=2, resultItem=[dict(ID=0, strLen=3, my_str='abc'),
                                   dict(ID=1, strLen=3, my_str='abc'),
                                   dict(ID=2, strLen=3, my_str='abc')])
f = format.build(d)
pprint(f)
pprint(format.parse(f))
Thank you kind stranger for your time!

The repeat_handler should return this:
return ctx._index == (ctx.numResults - 1)
This counts items instead of relying on the ID values, so it works whether or not the IDs increase. Also for the PascalString I had to multiply the length by two and fix the encoding:
ct.PascalString(ct.Computed(ct.this.strLen * 2), 'utf-16-le')
Remember kids, a well-rested brain is your best ally :)

Regexp_Replace in pyspark not working properly

I am reading a csv file which is something like:
"ZEN","123"
"TEN","567"
Now if I replace the character E with regexp_replace, it's not giving correct results:
from pyspark.sql.functions import row_number, col, desc, date_format, to_date, to_timestamp, regexp_replace
inputDirPath="/FileStore/tables/test.csv"
from pyspark.sql.types import StructType, StringType  # imports needed for the schema

schema = StructType()
for field in fields:  # 'fields' holds the column names, defined elsewhere
    colType = StringType()
    schema.add(field.strip(), colType, True)
incr_df = (spark.read.format("csv")
           .option("header", "false").schema(schema)
           .option("delimiter", "\u002c")
           .option("nullValue", "").option("emptyValue", "")
           .option("multiline", True)
           .csv(inputDirPath))
for column in incr_df.columns:
    inc_new = incr_df.withColumn(column, regexp_replace(column, "E", ""))
inc_new.show()
This is not giving correct results; it appears to do nothing.
Note: I have 100+ columns, so I need to use a loop.
Can someone help in spotting my error?
The loop rebinds inc_new from the original incr_df on every iteration, so each pass throws away the previous replacement and only the last column ends up changed. A list comprehension over a single select is neater and easier. Let's try:
inc_new = incr_df.select(*[regexp_replace(x, 'E', '').alias(x) for x in incr_df.columns])
inc_new.show()
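The underlying bug is independent of Spark: DataFrames are immutable, and restarting from the original frame on every pass discards earlier work. A plain-Python stand-in (with_column here is a hypothetical helper over a dict of lists, not a Spark API) makes the difference visible:

```python
# Minimal stand-in for an immutable DataFrame: a dict mapping column -> list.
# with_column is a hypothetical helper, not a Spark API.
def with_column(df, col, fn):
    new = dict(df)                      # shallow copy: like withColumn, returns a NEW frame
    new[col] = [fn(v) for v in df[col]]
    return new

df = {"a": ["ZEN"], "b": ["TEN"]}

# Buggy pattern from the question: every pass restarts from the original df,
# so only the column touched in the LAST iteration survives.
buggy = None
for c in df:
    buggy = with_column(df, c, lambda s: s.replace("E", ""))

# Correct pattern: chain each step on the previous result.
fixed = df
for c in df:
    fixed = with_column(fixed, c, lambda s: s.replace("E", ""))
```

The chained loop (or, equivalently, a single select with one expression per column) is what actually touches every column.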

SUM function in PIG

I am starting to learn Pig Latin scripting and am stuck on the issue below. I have gone through similar questions on the same topic without any luck. I want to find the SUM of all the age fields.
grunt> DUMP X;
(22)
(19)
grunt> DESCRIBE X;
X: {age: int}
I tried several options such as:
Y = FOREACH ( group X all ) GENERATE SUM(X.age);
But I am getting the exception below.
Invalid field projection. Projected field [age] does not exist in schema: group:chararray,X:bag{:tuple(age:int)}.
Thanks for your time and help.
I think the Y projection should work as you wrote it. Here's my small example code for the same, and it works fine for me.
X = LOAD 'SO/sum_age.txt' USING PigStorage('\t') AS (age:int);
DESCRIBE X;
Y = FOREACH ( group X all ) GENERATE SUM(X.age);
DESCRIBE Y;
DUMP Y;
So your problem looks strange. I used the following input data:
-bash-4.1$ cat sum_age.txt
22
19
Can you make a try on the same data with script I inserted here?

Limit number of characters imported from SQL in R

I am using the sqlQuery function in R to connect the DB with R. I am using the following lines:
for (i in 1:length(Counter)) {
  if (Counter[i] %in% str_sub(dir(), 1, 29) == FALSE) {
    DT <- data.table(sqlQuery(con, paste0(
      "select a.* from edp_data.sme_loan a where a.edcode IN (",
      print(paste0("\'", EDCode, "\'"), quote = FALSE),
      ") and a.poolcutoffdate in (",
      print(paste0("\'", str_sub(PoolCutoffDate, 1, 4), "-", str_sub(PoolCutoffDate, 5, 6), "-",
                   str_sub(PoolCutoffDate, 7, 8), "\'"), quote = FALSE),
      ")")))
  }
}
Thus I am importing subsets of the DB by EDCode and PoolCutoffDate. This works perfectly; however, one variable in edp_data.sme_loan for one particular EDCode produces an undesired result.
If I take the unique of this "as3" variable for a particular EDCode I get:
unique(DT$as3)
[1] 30003000000000019876240886000 30003000000000028672000424000
In reality there should be more unique IDs in this DB. The problem is that the actual string in as3 is much longer than the one that is imported.
nchar(unique(DT$as3))
[1] 29 29
How can I import more characters from this string? Ideally I do not want to list each variable instead of select a.*, but only make sure that the full string of as3 is imported.
Any help is appreciated!

Setting group_by in specialized query

I need to perform data smoothing using averaging, with a non-standard group_by variable that is created on-the-fly. My model consists of two tables:
class WthrStn(models.Model):
    name = models.CharField(max_length=64, error_messages=MOD_ERR_MSGS)
    owner_email = models.EmailField('Contact email')
    location_city = models.CharField(max_length=32, blank=True)
    location_state = models.CharField(max_length=32, blank=True)
    ...

class WthrData(models.Model):
    stn = models.ForeignKey(WthrStn)
    date = models.DateField()
    time = models.TimeField()
    temptr_out = models.DecimalField(max_digits=5, decimal_places=2)
    temptr_in = models.DecimalField(max_digits=5, decimal_places=2)

    class Meta:
        ordering = ['-date', '-time']
        unique_together = (("date", "time", "stn"),)
The data in the WthrData table are entered from an xml file in variable time increments, currently 15 or 30 minutes, but that could vary and change over time. There are >20000 records in that table. I want to provide an option to display the data smoothed to variable time units, e.g. 30 minutes, or 1, 2 or N hours (60, 120, 180, etc. minutes).
I am using SQLite3 as the DB engine. I tested the following SQL, which proved quite adequate to perform the smoothing in 'bins' of N-minutes duration:
select id, date, time, 24*60*julianday(datetime(date || time))/N jsec,
       avg(temptr_out) as temptr_out, avg(temptr_in) as temptr_in,
       avg(barom_mmhg) as barom_mmhg, avg(wind_mph) as wind_mph,
       avg(wind_dir) as wind_dir, avg(humid_pct) as humid_pct,
       avg(rain_in) as rain_in, avg(rain_rate) as rain_rate,
       datetime(avg(julianday(datetime(date || time)))) as avg_date
from wthr_wthrdata
where stn_id=19
group by round(jsec,0)
order by stn_id, date, time;
Note I create an output variable 'jsec' using the SQLite3 function 'julianday', which returns the number of days in the integer part and the fraction of a day in the decimal part. So multiplying by 24*60 gives me the number of minutes. Dividing by the N-minute resolution gives me a nice group-by variable, compensating for the varying time increments of the raw data.
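The binning arithmetic can be checked outside Django with the standard library's sqlite3 module (the table and column names below are illustrative, not the question's real schema; the time column carries a leading space so date || time parses as a datetime):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE wthr (id INTEGER PRIMARY KEY, date TEXT, time TEXT, temptr_out REAL)")
con.executemany(
    "INSERT INTO wthr (date, time, temptr_out) VALUES (?,?,?)",
    [
        ("2024-01-01", " 00:05:00", 10.0),  # first 30-minute bin
        ("2024-01-01", " 00:10:00", 20.0),  # same bin: averaged with the row above
        ("2024-01-01", " 00:40:00", 40.0),  # next bin
    ],
)

N = 30  # bin width in minutes
rows = con.execute(
    "SELECT round(24*60*julianday(datetime(date || time))/?, 0) AS jsec, "
    "avg(temptr_out) AS temptr_out "
    "FROM wthr GROUP BY jsec ORDER BY jsec",
    (N,),
).fetchall()
```

Rows 5 and 10 minutes past the hour fall into one jsec bin and get averaged; the 40-minute row lands in the next bin, exactly the smoothing the query above performs.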
How can I implement this in Django? I have tried objects.raw(), but that returns a RawQuerySet, not a QuerySet, to the view, so I get error messages from the html template:
</p>
Number of data entries: {{ valid_form|length }}
</p>
I have tried using a standard Query, with code like this:
wthrdta = WthrData.objects.all()
wthrdta.extra(select={'jsec': '24*60*julianday(datetime(date || time))/{}'.format(n)})
wthrdta.extra(select={'temptr_out': 'avg(temptr_out)',
                      'temptr_in': 'avg(temptr_in)',
                      'barom_mmhg': 'avg(barom_mmhg)',
                      'wind_mph': 'avg(wind_mph)',
                      'wind_dir': 'avg(wind_dir)',
                      'humid_pct': 'avg(humid_pct)',
                      'rain_in': 'avg(rain_in)',
                      'rain_sum_in': 'sum(rain_in)',
                      'rain_rate': 'avg(rain_rate)',
                      'avg_date': 'datetime(avg(julianday(datetime(date || time))))'})
Note that here I use the SQL avg functions instead of Django's aggregate() or annotate(). This seems to generate correct SQL, but I can't seem to get the group by set properly to the jsec variable created at the top.
Any suggestions for how to approach this? All I really need is to have the raw() method return a QuerySet, or something that can be converted to a QuerySet instead of a RawQuerySet. I cannot find an easy way to do that.
The answer to this turns out to be really simple, using a hint I found at https://gist.github.com/carymrobbins/8477219, though I modified his code slightly. To return a QuerySet from a RawQuerySet, all I did was add to my models.py file, right above the WthrData class definition:
from django.db import connection  # import needed for the raw cursor

class MyManager(models.Manager):
    def raw_as_qs(self, raw_query, params=()):
        """Execute a raw query and return a QuerySet. The first column in the
        result set must be the id field for the model.

        :type raw_query: str | unicode
        :type params: tuple[T] | dict[str | unicode, T]
        :rtype: django.db.models.query.QuerySet
        """
        cursor = connection.cursor()
        try:
            cursor.execute(raw_query, params)
            return self.filter(id__in=(x[0] for x in cursor))
        finally:
            cursor.close()
Then in my class definition for WthrData:
class WthrData(models.Model):
    objects = MyManager()
    ......
and later in the WthrData class:
def get_smoothWthrData(stn_id, n):
    sqlcode = ('select id, date, time, 24*60*julianday(datetime(date || time))/%s jsec, '
               'avg(temptr_out) as temptr_out, avg(temptr_in) as temptr_in, '
               'avg(barom_mmhg) as barom_mmhg, avg(wind_mph) as wind_mph, '
               'avg(wind_dir) as wind_dir, avg(humid_pct) as humid_pct, '
               'avg(rain_in) as rain_in, avg(rain_rate) as rain_rate, '
               'datetime(avg(julianday(datetime(date || time)))) as avg_date '
               'from wthr_wthrdata where stn_id=%s '
               'group by round(jsec,0) order by stn_id,date,time;')
    return WthrData.objects.raw_as_qs(sqlcode, [n, stn_id])
This allows me to grab results from the heavily populated WthrData table smoothed over time increments, and the results come back as a QuerySet instead of a RawQuerySet.
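The core trick in raw_as_qs (run arbitrary SQL, keep only the first column of ids, then re-enter the ORM through a normal id__in filter) can be illustrated with the standard library's sqlite3 in place of Django; the item table here is hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, val INTEGER)")
con.executemany("INSERT INTO item VALUES (?,?)", [(1, 10), (2, 20), (3, 30)])

# Step 1: any raw query whose first column is the id.
cur = con.execute("SELECT id FROM item WHERE val > ?", (15,))
ids = [row[0] for row in cur]

# Step 2: feed those ids back into a regular parameterised query,
# the way raw_as_qs feeds them into self.filter(id__in=...).
qs = con.execute(
    "SELECT id, val FROM item WHERE id IN (%s)" % ",".join("?" * len(ids)),
    ids,
).fetchall()
```

The second query is what Django generates for filter(id__in=...), which is why the manager method hands back a genuine QuerySet.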

'numpy.float64' object is not callable

I get the error in the title of my post. I have seen it come up in other questions, but I am interested in understanding what it means, since the other answers were in a specific context that does not apply to me.
Secondly, I would like to understand how this applies to my code, shown below. Note that this all works OK if Zindx = 0, but not in any other case.
Zindx = list(E).index(0)
for m in range(0, N):
    if m != Zindx:
        for n in range(0, N):
            if n != Zindx:
                if n != m:
                    x[m,m] = x[m,m] (
                        - (E[n]-E[m] + E[n])*x[m,n]*x[n,Zindx]
                        /x[m,Zindx]/E[m]
                    )
This:
x[m,m] (
    - (E[n]-E[m] + E[n])*x[m,n]*x[n,Zindx]
    /x[m,Zindx]/E[m]
)
is trying to call x[m,m] as a function, with the expression inside the parentheses as its argument. Since x is a NumPy array, x[m,m] is a numpy.float64, which is not callable; that is exactly the error you are seeing.
Do you mean to multiply x[m,m] by the term in the parentheses? If so, add the *.
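A two-line reproduction, assuming x is a NumPy array as in the question:

```python
import numpy as np

x = np.ones((2, 2))
try:
    x[0, 0](5)          # x[0, 0] is a numpy.float64; "calling" it raises TypeError
    err = None
except TypeError as e:
    err = str(e)        # the message from the question's title

y = x[0, 0] * (5)       # the intended operation: an explicit * before the parenthesis
```

Python never treats adjacency as multiplication, so `value (expr)` is always a call; the explicit `*` is required.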