I have 2 tables contractPoint and contractPointHistory
ContractPointHistory
ContractPoint
I would like to get contractPoint where point will be subtracted by pointChange. For example: ContractPoint -> id: 3, point: 5
ContractPointHistory has contractPointId: 3 and pointChange: -5. So after manipulating point in contractPoint should be 0
I wrote this code, but it works just for getRawMany(), not for getMany()
const contractPoints = await getRepository(ContractPoint).createQueryBuilder('contractPoint')
.addSelect('"contractPoint".point + COALESCE((SELECT SUM(cpHistory.point_change) FROM contract_point_history AS cpHistory WHERE cpHistory.contract_point_id = contractPoint.id), 0) AS points')
.andWhere('EXTRACT(YEAR FROM contractPoint.validFrom) = :year', { year })
.andWhere('contractPoint.contractId = :contractId', { contractId })
.orderBy('contractPoint.grantedAt', OrderByDirection.Desc)
.getMany();
The method getMany can be used to select all attributes of an entity. However, if one wants to select some specific attributes of an entity then one needs to use getRawMany.
As per the documentation -
There are two types of results you can get using select query builder:
entities or raw results. Most of the time, you need to select real
entities from your database, for example, users. For this purpose, you
use getOne and getMany. But sometimes you need to select some specific
data, let's say the sum of all user photos. This data is not an
entity, it's called raw data. To get raw data, you use getRawOne and
getRawMany
From this, we can conclude that the query which you want to generate can not be made using getMany method.
In my back-end I'm using KnexJS with PostgreSQL and I have to build a Knex function using its SQL builder without the raw SQL.
It is the first time for me to use KnexJS and I got few issues.
What I have to build is as shown in the SQL example
UPDATE
feed
SET
status = 'SOME_STATUS'
WHERE
created_at <= 'SOME_TIMESTAMP'
AND conversation_id = 'ID';
This SQL is updating the table feed all columns with status and where two conditions are met.
In Knex what I tried as example code of my idea
answerPendingMessages(feeds) {
return this.tx(tableName).where({
conversationId,
createdAt <= timestamp // This not idea how to do in Knex ???
}).update({
status: 'ANSWERED'
})
}
In this above function my concern is how to actually convert the where part as one of the consitions is createdAt <= 'TIMESTAMP'
I understood that I can use where with an object but cannot understand how to include the <=
Also I should update the updatedAt column with the new timestamp but also that blocked me at the moment.
Te result in the end should that all columns which met the conditions are updated status to some some status and also a new updatedAt timestamp.
I'm not sure if there is a specific way with Knex to do so
This answer is to provide my goal for my previous question.
The script I'm showing is not doing anything and I don't know know to make it to work.
answerPendingMessages(feeds) {
if (isEmpty(feeds)) return Promise.resolve([]);
const { conversationId, createdAt } = feeds;
return this.tx(tableName)
.where(
columns.conversationId === conversationId,
columns.createdAt <= createdAt
)
.update({
[columns.status]: 'ANSWERED',
[columns.updatedAt]: new Date(),
});
}
What should happen here is that to update a table where I have two conditions but one of then is 'smaller than or equals' relation.
As the output of this should be that all rows where the 2 conditions are met are update status column.
Right now the script is not failing either success piratically nothing is happening.
I have these tables:
Users
Skills
Has_skills (user_id, skill_id)
I'm passing an array of skill IDs to the function which should fetch users who have at least one of those skills. The query should be efficient in a way so it fetches a number of users (limit, range or in other words pagination functionality) but not doing the range starting from user with ID 0 and then going upwards, but the range starting from the user with most matched skills to the least.
So how can the query sort the records by the number of matched skills from most matched skills to least so I can add pagination based on those results? I assume I should additionaly tweak the modifyEager on has_skills and count it, and then implement the range for pagination but I am not entirely sure how to do that. So ultimately, this is what I need to add:
The query should first do the ordering / sorting of the records by the number of matched skills
The above condition should be limited by a number of users or range for pagination purpose and better performance
This is my function:
async function getUsersWithPassedSkillIds({ skillIds }) {
const users = await User.query()
.select('users.id', 'users.name')
.joinEager('has_skills')
.modifyEager('has_skills', builder => builder.select('id', 'name'))
.whereIn('has_skills.id', skillIds)
return users
}
The author of Objection.js helped me make this work!
// Fetch the users who have at least one matching skill
const hasSkillsSubquery = User.relatedQuery("skills").whereIn(
"has_skills.skill_id",
skillIds
);
const users = await User.query()
.select("users.id", "users.name")
// Use .eager instead of .joinEager as pagination doesn't work with it due to joins.
.eager("skills")
// Optional: Populating the matched skills
.modifyEager("skills", builder =>
builder.select("skills.id", "skills.name").whereIn("skills.id", skillIds)
)
// Only taking into account users who have at least 1 matched skill
.whereExists(hasSkillsSubquery.clone())
// Sorting users by matched skills
.orderByRaw("(?) DESC", hasSkillsSubquery.clone().count())
// This would return the user with the most matched skills.
// If you want to fetch 10 users ordered by number of matching skills: .range(0, 9)
.range(0, 0);
i'm searching for a long time now to solve my problem but nearly found nothing helpful.
Hopefully some of you can give me a tip.
I have a relation A with the following format: username, timestamp, ip
For example:
Harald 2014-02-18T16:14:49.503Z 123.123.123.123
Harald 2014-02-18T16:14:51.503Z 123.123.123.123
Harald 2014-02-18T16:14:55.503Z 321.321.321.321
And i want to find out, who changed his ip adress in less then 5 seconds. So the second and the third row should be interesting.
I want do group the relation by username und want to compare the timestamp of the actuall row with the next row. if the ip adress isnt the same and the timestamp is less then 5 seconds bigger, this should be at the output.
could someone help me with that issue?
regards.
first i want to thank you for your time.
but i actually stuck at the Sessionize part.
this is my data comming in:
aoebcu 2014-02-19T14:23:17.503Z 220.61.65.25
aoebcu 2014-02-19T14:23:14.503Z 222.117.144.19
aoebcu 2014-02-19T14:23:14.503Z 222.117.144.19
jekgru 2014-02-19T14:23:14.503Z 213.56.157.109
zmembx 2014-02-19T14:23:12.503Z 199.188.198.91
qhixcg 2014-02-19T14:23:11.503Z 203.40.104.119
and my code till now looks like this:
hijack_Reduced = FOREACH finalLogs GENERATE ClientUserName, timestamp, OriginalClientIP;
hijack_Filtered = FILTER hijack_Reduced BY OriginalClientIP != '-';
hijack_Sessionized = FOREACH (GROUP hijack_Filtered BY ClientUserName) {
views = ORDER hijack_Filtered BY timestamp;
GENERATE FLATTEN(Sessionize(views)) AS (ClientUserName,timestamp,OriginalClientIP,session_id);
}
but when i run this script, i got the following error Message:
15:36:22 ERROR -
org.apache.pig.tools.pigstats.SimplePigStats.setBackendException(542)
| ERROR 0: Exception while executing [POUserFunc (Name:
POUserFunc(datafu.pig.sessions.Sessionize)[bag] - scope-199 Operator
Key: scope-199) children: null at []]:
java.lang.IllegalArgumentException: Invalid format: "aoebcu"
i already tried a lot, but nothing worked.
do you got an idea?
Regards
While you could write a UDF for this, you can actually make use of the UDFs already available in Apache DataFu to solve this.
My solution involves applying sessionization to the data. Basically you look at consecutive events and assign each event a session ID. If the time elapsed between two events exceeds a specified amount of time, in your case 5 seconds, then the next event gets a new session ID. Otherwise consecutive events get the same session ID. Once each event is assigned its session ID the rest is easy. We group by session ID and look for sessions that have more than one distinct IP address.
I'll walk through my solution.
Suppose you have the following input data. Both Harold and Kumar change their IP addresses. But Harold does it within 5 seconds, while Kumar does not. So the output of our script should just be simply "Harold".
Harold,2014-02-18T16:14:49.503Z,123.123.123.123
Harold,2014-02-18T16:14:51.503Z,123.123.123.123
Harold,2014-02-18T16:14:55.503Z,321.321.321.321
Kumar,2014-02-18T16:14:49.503Z,123.123.123.123
Kumar,2014-02-18T16:14:55.503Z,123.123.123.123
Kumar,2014-02-18T16:15:05.503Z,321.321.321.321
Load the data
data = LOAD 'input' using PigStorage(',')
AS (user:chararray,time:chararray,ip:chararray);
Now define a couple UDFs from DataFu. The Sessionize UDF performs sessionization as I described earlier. The DistinctBy UDF will be used to find the distinct IP addresses within each session.
define Sessionize datafu.pig.sessions.Sessionize('5s');
define DistinctBy datafu.pig.bags.DistinctBy('1');
Group the data by user, sort by time, and apply the Sessonize UDF. Note that the timestamp must be the first field, as this is what Sessionize expects. This UDF appends a session ID to each tuple.
data = FOREACH data GENERATE time,user,ip;
data_sessionized = FOREACH (GROUP data BY user) {
views = ORDER data BY time;
GENERATE flatten(Sessionize(views)) as (time,user,ip,session_id);
}
Now that the data is sessionized, we can group by the user and session. I group by user too because I want to spit this value back out. We pass the bag of events into the DistinctBy UDF. Check the documentation of this UDF for a more detailed description. But essentially we will get as many tuples as there are distinct IP addresses per session. Note that I have removed the time from the relation below. This is because 1) it isn't needed, and 2) the DistinctBy in 1.2.0 of DataFu has a bug when handling fields containing dashes, as the time field does.
data_sessionized = FOREACH data_sessionized GENERATE user,ip,session_id;
data_sessionized = FOREACH (GROUP data_sessionized BY (user, session_id)) GENERATE
group.user as user,
SIZE(DistinctBy(data_sessionized)) as distinctIpCount;
Now select all the sessions that had more than one distinct IP address and return the distinct users for these sessions.
data_sessionized = FILTER data_sessionized BY distinctIpCount > 1;
data_sessionized = FOREACH data_sessionized GENERATE user;
data_sessionized = DISTINCT data_sessionized;
This produces simply:
Harold
Here is the full source code, which you should be able to paste directly into the DataFu unit tests and run:
/**
define Sessionize datafu.pig.sessions.Sessionize('5s');
define DistinctBy datafu.pig.bags.DistinctBy('1'); -- distinct by ip
data = LOAD 'input' using PigStorage(',') AS (user:chararray,time:chararray,ip:chararray);
data = FOREACH data GENERATE time,user,ip;
data_sessionized = FOREACH (GROUP data BY user) {
views = ORDER data BY time;
GENERATE flatten(Sessionize(views)) as (time,user,ip,session_id);
}
data_sessionized = FOREACH data_sessionized GENERATE user,ip,session_id;
data_sessionized = FOREACH (GROUP data_sessionized BY (user, session_id)) GENERATE
group.user as user,
SIZE(DistinctBy(data_sessionized)) as distinctIpCount;
data_sessionized = FILTER data_sessionized BY distinctIpCount > 1;
data_sessionized = FOREACH data_sessionized GENERATE user;
data_sessionized = DISTINCT data_sessionized;
STORE data_sessionized INTO 'output';
*/
#Multiline private String sessionizeUserIpTest;
private String[] sessionizeUserIpTestData = new String[] {
"Harold,2014-02-18T16:14:49.503Z,123.123.123.123",
"Harold,2014-02-18T16:14:51.503Z,123.123.123.123",
"Harold,2014-02-18T16:14:55.503Z,321.321.321.321",
"Kumar,2014-02-18T16:14:49.503Z,123.123.123.123",
"Kumar,2014-02-18T16:14:55.503Z,123.123.123.123",
"Kumar,2014-02-18T16:15:05.503Z,321.321.321.321"
};
#Test
public void sessionizeUserIpTest() throws Exception
{
PigTest test = createPigTestFromString(sessionizeUserIpTest);
this.writeLinesToFile("input",
sessionizeUserIpTestData);
List<Tuple> result = this.getLinesForAlias(test, "data_sessionized");
assertEquals(result.size(),1);
assertEquals(result.get(0).get(0),"Harold");
}
I have a need to get a Count of Documents in a particular collection :
There is an existing index Raven/DocumentCollections that stores the Count and Name of the collection paired with the actual documents belonging to the collection. I'd like to pick up the count from this index if possible.
Here is the Map-Reduce of the Raven/DocumentCollections index :
from doc in docs
let Name = doc["#metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name , Count = 1}
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x=>x.Count) }
On a side note, var Count = DocumentSession.Query<Post>().Count(); always returns 0 as the result for me, even though clearly there are 500 odd documents in my DB atleast 50 of them have in their metadata "Raven-Entity-Name" as "Posts". I have absolutely no idea why this Count query keeps returning 0 as the answer - Raven logs show this when Count is done
Request # 106: GET - 0 ms - TestStore - 200 - /indexes/dynamic/Posts?query=&start=0&pageSize=1&aggregation=None
For anyone still looking for the answer (this question was posted in 2011), the appropriate way to do this now is:
var numPosts = session.Query<Post>().Count();
To get the results from the index, you can use:
session.Query<Collection>("Raven/DocumentCollections")
.Where(x=>x.Name == "Posts")
.FirstOrDefault();
That will give you the result you want.