How to select by elements in a UniData multivalued field - unidata

I'm trying to do an ad hoc search of records that contain duplicate values in the first and second elements of a multivalued UniData field. I was hoping something like this would work but I'm not having any luck.
LIST PERSON WITH EVAL "STATUS[1] = STATUS[2]"
After some testing it looks like I stumbled across a way of reading the field right to left that many characters. Interesting but not useful for what I need.
LIST PERSON NAME EVAL "NAME[3]" COL.HDG 'Last3'
PERSON Name Last3
0001 Smith ith
Any ideas on how to correctly select on specific field elements?
Apparently the EXTRACT function will let me specify an element but I still can't get a selection on it to work properly.
LIST PERSON STATUS EVAL "EXTRACT(STATUS,1,2,0)" COL.HDG 'Status2'
PERSON STATUS Status2
0001 Added Processed
Processed

I would use eval with #RECORD placeholder with the dynamic array notation as such (assuming that STATUS is in Attribute 11.
Edit:
Previous answer was how I would do this in UniVerse
SELECT PERSON WITH EVAL "#RECORD<11,1>" EQ EVAL "#RECORD<11,2>"
Script Wolf's more better way that works in UniVerse and UniData.
SELECT PERSON WITH EVAL "EXTRACT(#RECORD,11,1,0)" EQ EVAL "EXTRACT(#RECORD,11,2,0)"
Good Luck.

Related

How to make pie chart of these values in Splunk

Have the following query index=app (splunk_server_group=bex OR splunk_server_group=default) sourcetype=rpm-web* host=rpm-web* "CACHE_NAME=RATE_SHOPPER" method = GET | stats count(eval(searchmatch("true))) as Hit, count(eval(searchmatch("found=false"))) as Miss
Need to make a pie chart of two values "Hit and Miss rates"
The field where it is possible to distinguish the values is Message=[CACHE_NAME=RATE_SHOPPER some_other_strings method=GET found=false]. or found can be true
With out knowing the structure of your data it's harder to say what exactly you need todo but,
Pie charts is a single data series so you need to use a transforming command to generate a single series. PieChart Doc
if you have a field that denotes a hit or miss (You could use an Eval statement to create one if you don't already have this) you can use it to create the single series like this.
Lets say this field is called result.
|stats count by result
Here is a link to the documentation for the Eval Command
Good luck, hope you can get the results your looking for
Since you seem to be concerned only about whether "found" equals either "hit" or "miss", try this:
index=app (splunk_server_group=bex OR splunk_server_group=default) sourcetype=rpm-web* host=rpm-web* "CACHE_NAME=RATE_SHOPPER" method=GET found IN("hit","miss")
| stats count by found
Pie charts require a single field so it's not possible to graph the Hit and Miss fields in a pie. However, if the two fields are combined into one field with two possible values, then it will work.
index=app (splunk_server_group=bex OR splunk_server_group=default) sourcetype=rpm-web* host=rpm-web* "CACHE_NAME=RATE_SHOPPER" method = GET
| eval result=if(searchmatch("found=true"), "Hit", "Miss")
| stats count by result

Django Q&Q versus filter.filter

While hunting a bug, I found out that the following 2 statements do different things:
Query 1
Order.objects \
.filter(items__name__icontains="Foo") \
.filter(items__name__icontains="Bar") \
.distinct()
Query 2
Order.objects \
.filter(
Q(items__name__icontains="Foo") &
Q(items__name__icontains="Bar")
) \
.distinct()
The result is as follows:
Query 1 does include orders that have items which either contain "Foo" or "Bar". For example one item's name is "Foo" while another item's name is "Bar".
Query 2 however only includes orders that have at least one item that contains all keywords, for example an item with a name of "Foo Bar".
Looking at the queries, I can see that the filter() method adds another INNER JOIN to the query while the other doesn't.
I can see the reasoning behind this, but I really wonder if that's the intended behavior.
The difference is that the first query has two filter() calls, and the second query only has one.
The first query tries to find an object with a related item containing 'Foo' and a related item containing 'Bar'. The second query tries to find an item with a single related item that contains both 'Foo' and 'Bar'
The fact that one uses Q() objects is not significant - you could change the first query to:
Order.object.filter(
Q(items__name__icontains="Foo"
).filter(
Q(items__name__icontains="Bar")
)
However the Q() is required in your second Query 2m since it would be invalid Python to repeat the keyword argument in .filter(items__name__icontains="Foo", items__name__icontains="Bar")
See the docs on spanning multi-values relationships for more info.

Hadoop Pig - Replace strings in a relation with their corresponding values in a map

I have a relation called conversations_grouped made up of bags of tuples of varying sizes, like so:
DUMP conversations_grouped:
...
({(L194),(L195),(L196),(L197)})
({(L198),(L199)})
({(L200),(L201),(L202),(L203)})
({(L204),(L205),(L206)})
({(L207),(L208)})
({(L271),(L272),(L273),(L274),(L275)})
({(L276),(L277)})
({(L280),(L281)})
({(L363),(L364)})
({(L365),(L366)})
({(L666256),(L666257)})
({(L666369),(L666370),(L666371),(L666372)})
({(L666520),(L666521),(L666522)})
Each L[0-9]+ is a tag corresponding to a string. For example, L194 might be "Hello, how are you doing?" and L195 might be "fine, how are you?". This correspondence is maintained by a map called line_map. Here's a sample:
DUMP line_map;
...
([L666324#Do you think she might be interested in someone?])
([L666264#Well that's typical of Her Majesty's army. Appoint an engineer to do a soldier's work.])
([L666263#Um. There are rumours that my Lord Chelmsford intends to make Durnford Second in Command.])
([L666262#Lighting COGHILL' 5 cigar: Our good Colonel Dumford scored quite a coup with the Sikali Horse.])
([L666522#So far only their scouts. But we have had reports of a small Impi farther north, over there. ])
([L666521#And I assure you, you do not In fact I'd be obliged for your best advice. What have your scouts seen?])
([L666520#Well I assure you, Sir, I have no desire to create difficulties. 45])
([L666372#I think Chelmsford wants a good man on the border Why he fears a flanking attack and requires a steady Commander in reserve.])
([L666371#Lord Chelmsford seems to want me to stay back with my Basutos.])
([L666370#I'm to take the Sikali with the main column to the river])
([L666369#Your orders, Mr Vereker?])
([L666257#Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot])
([L666256#Colonel Durnford... William Vereker. I hear you 've been seeking Officers?])
What I'm trying to do now is parse through each line and replace the L[0-9]+ tags with their corresponding text from line_map. Is it possible to make references to line_map from within a Pig FOREACH statement, or is there something else I have to do?
The first issue with this is that in a map the key must be a quoted string. So you can't use a schema value to access the map. E.G. This will not work.
C: {foo: chararray, M: [value:chararray]}
D = FOREACH C GENERATE M#foo ;
The solution that comes to mind is to FLATTEN conversations_grouped. Then do a join between conversations_grouped and line_map on the L[0-9]+ tag. You'll probably want to project out some of the extra fields (like the L[0-9]+ tag after the join) to make the next step faster. After that you'll have to regroup the data, and massage it into the correct format.
This won't work unless each bag has it's own unique ID for the regrouping, but if each of the L[0-9]+ tags appear in only one bag (conversation) you can use this to create a unique id.
-- A is dumped conversations_grouped
B = FOREACH A {
-- Pulls out an element from the bag to use as the id
id = LIMIT tags 1 ;
-- Flattens B into id, tag form. Each group of tags will have the same id.
GENERATE FLATTEN(id), FLATTEN(tags) ;
}
The schema and output for B is:
B: {id: chararray,tags::tag: chararray}
(L194,L194)
(L194,L195)
(L194,L196)
(L194,L197)
(L198,L198)
(L198,L199)
(L200,L200)
(L200,L201)
(L200,L202)
(L200,L203)
(L204,L204)
(L204,L205)
(L204,L206)
(L207,L207)
(L207,L208)
(L271,L271)
(L271,L272)
(L271,L273)
(L271,L274)
(L271,L275)
(L276,L276)
(L276,L277)
(L280,L280)
(L280,L281)
(L363,L363)
(L363,L364)
(L365,L365)
(L365,L366)
(L666256,L666256)
(L666256,L666257)
(L666369,L666369)
(L666369,L666370)
(L666369,L666371)
(L666369,L666372)
(L666520,L666520)
(L666520,L666521)
(L666520,L666522)
Assuming that the tags are unique, the rest is done like:
-- A2 is line_map, loaded in tag/message pairs instead of a map
-- Joins conversations_grouped and line_map on tag
C = FOREACH (JOIN B by tags::tag, A2 by tag)
-- This generate removes the tag
GENERATE id, message ;
-- Regroups C on the id created in B
D = FOREACH (GROUP C BY id)
-- This step limits the output to just messages
GENERATE C.(message) AS messages ;
Schema and output from D:
D: {messages: {(A2::message: chararray)}}
({(Colonel Durnford... William Vereker. I hear you 've been seeking Officers?),(Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot)})
({(Your orders, Mr Vereker?),(I'm to take the Sikali with the main column to the river),(Lord Chelmsford seems to want me to stay back with my Basutos.),(I think Chelmsford wants a good man on the border Why he fears a flanking attack and requires a steady Commander in reserve.)})
({(Well I assure you, Sir, I have no desire to create difficulties. 45),(And I assure you, you do not In fact I'd be obliged for your best advice. What have your scouts seen?),(So far only their scouts. But we have had reports of a small Impi farther north, over there. )})
NOTE: If at worst, (the L[0-9]+ tags aren't unique) you can give each line of the input file(s) a sequential, integer id before you load it into pig.
UPDATE: If you are using pig 0.11, then you can also use the RANK operator.

Is there a way to do an LDAP query to get records where a particular attribute is the same?

I am trying to find an example LDAP query where I can find records where a particular attribute matches one or more other records. For instance, a user object where the userid is different, but the employee ids are the same. Is this even possible?
From a single LDAP query no. Unless you know the emplyeeID value you are looking for.
We created an LDAP tool, Duplicate Attribute Value Locater Tool, that will do this.
-jim
It's not possible to do sub queries within the filter itself. In this case, as long as I understand correctly, you'd like to find users that match :
objectClass of User
match on the value of employeeID
Out of the above subset, find all with a DISTINCT 'userid'
If you knew what userid to look for, or NOT look for, you could expand the inital AND clause to include finding, or not finding, that attribute :
userid not equal to 12345 :
(&(objectClass=person)(employeeID=JSmith)(!(userid=12345)))
userid equal to 12345 :
(&(objectClass=person)(employeeID=JSmith)(userid=12345)
I found this example for 'myattribute'. Needs some polish, and depending on the size of your directory, it could take a while to run. If that's the case, I'd break it up by attribute sections {attr=aa*, attr=ab*, attr=ac*, etc.}.
ldapsearch -x -h ldapserver.domain.com -b ou=myldap,o=mydomain.com "(&(myattribute=aa*))" myattribute | grep '^myattribute:' | sort | uniq -c| sort -n|awk '$1 > 1 { print }'

How to change values in facet to same in Google Refine?

I'm trying to clean this data: https://dl.dropbox.com/u/820037/local_council_election_data_w_occupation.gz
It's all the candidates for a local councils' election in Finland. In the column "Ammatti" there is the occupation of a candidate as reported by him/her.
I want to find all the students, but the problem is, that they can be "opiskelija" (student) or "yliopisto-opiskelija" (university student) and things like that.
I clicked the column title "Ammatti" and Filtered it with "opiskelija", then I created a "text facet" from the menu in column title.
That gives me the following facet:
agrol. opiskelija AMK 1
agrologiopiskelija 9
agronomiopiskelija 1
...and so on.
I'd want to change the value of "Ammatti" (occupation) to "opiskelija" (student) in everyone of these occasions.
To make thngs a bit more complicated the facet has also some occupations (mature students and administrative staff) I don't want to change to "opiskelija":
aikuisopiskelija 10
opiskelijakunnan hallituksen varapuheenjohtaja 1
opiskelijapalvelun päällikkö 1
opiskelijapalvelupäällikkö 1
I did this by hand clicking through the whole list in the facet and changing the occupations one by one.
I suppose there is a better way to do this, but could someone please tell me how I should've done it?
Using the 'include' option in the facet, select all the rows that you want to transform from the column "Ammatti". Then in for this column invoke the Transform function and replace "value" by "opiskelija"
This will replace all the value you have selected by "opiskelija".
Hope this help (and it doesn't come too late).