SQL & LINQ - Distinct by 2 columns and "reversed values"


I'm working on a chat system and need to get the inbox data for user 1119. This query should return the last message from each conversation.
A Message entity consists of a sender id (CreateUserId), a receiver id (UserId), a date (CreateDate) and the message text (Desc).
EDIT: A "conversation" is a group of messages sent between two users. For example if the users are 1119 and 1120, the messages in the conversation are the ones with CreateUserId=1119, UserId=1120 OR CreateUserId=1120, UserId=1119.
The current query looks like this:
SELECT MAX(Id) Id, CreateUserId Sender,
       UserId Receiver, MAX(CreateDate) Date, MAX([Desc]) Message
FROM [CarSharing].[dbo].[Message]
WHERE CreateUserId = 1119 OR UserId = 1119
GROUP BY CreateUserId, UserId
THE ISSUE: The result contains two messages from the same conversation, because the sender and receiver ids are switched. The row with the id 124 shouldn't be there.
Ultimately, I'd like to implement this with LINQ, so solutions using that are also very welcome!

I am going to assume that a "conversation" is simply a message between two particular people. That seems to approximate your query.
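You can't really do what you want with aggregation, because each MAX is computed independently: MAX([Desc]) is the alphabetically last message text in the group, not the text of the message with the latest date/time. Instead, use window functions:
SELECT m.*
FROM (SELECT m.*,
             ROW_NUMBER() OVER (PARTITION BY (CASE WHEN m.CreateUserId = 1119 THEN m.UserId ELSE m.CreateUserId END)
                                ORDER BY m.CreateDate DESC, m.Id DESC) AS seqnum
      FROM [CarSharing].[dbo].[Message] m
      WHERE 1119 IN (m.CreateUserId, m.UserId)
     ) m
WHERE seqnum = 1;
The PARTITION BY expression is the other participant of each message, so both directions of a conversation land in the same partition and are numbered together. Without it, the query returns only the single most recent message overall.
The ORDER BY in the windowing clause should be however you define the ordering. This is guessing that the creation date is first and then the id.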
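To try the idea without a SQL Server instance, here is a minimal sketch that runs a partitioned ROW_NUMBER query against an in-memory SQLite database (window functions need SQLite 3.25 or newer). The sample rows, the inbox helper and the use of SQLite are assumptions for illustration only; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample rows: (Id, CreateUserId, UserId, CreateDate, Desc).
ROWS = [
    (121, 1119, 1120, "2020-01-01 10:00", "hi"),
    (124, 1120, 1119, "2020-01-01 10:05", "hello back"),
    (125, 1119, 1120, "2020-01-01 10:10", "how are you?"),
    (126, 1121, 1119, "2020-01-02 09:00", "ping"),
    (127, 1120, 1121, "2020-01-03 09:00", "unrelated"),
]


def inbox(user_id, rows=ROWS):
    """Return the latest message of every conversation that involves user_id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Message (Id INTEGER PRIMARY KEY, CreateUserId INT, "
        'UserId INT, CreateDate TEXT, "Desc" TEXT)'
    )
    con.executemany("INSERT INTO Message VALUES (?, ?, ?, ?, ?)", rows)
    sql = """
        SELECT Id, CreateUserId, UserId, CreateDate, "Desc"
        FROM (SELECT m.*,
                     ROW_NUMBER() OVER (
                         -- the other participant identifies the conversation
                         PARTITION BY CASE WHEN m.CreateUserId = :u
                                           THEN m.UserId ELSE m.CreateUserId END
                         ORDER BY m.CreateDate DESC, m.Id DESC) AS seqnum
              FROM Message m
              WHERE :u IN (m.CreateUserId, m.UserId)) AS t
        WHERE seqnum = 1
        ORDER BY Id
    """
    result = con.execute(sql, {"u": user_id}).fetchall()
    con.close()
    return result
```

With the sample data, inbox(1119) keeps one row per conversation partner (1120 and 1121) and ignores the message exchanged between 1120 and 1121.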

Related

SELECT DISTINCT to return at most one row

Given the following db structure:
Regions:
id|name
--|----
1 |EU
2 |US
3 |SEA
Customers:
id|name |region
--|-----|------
1 |peter|1
2 |henry|1
3 |john |2
There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region, and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that their region IDs are identical. We will also need the region ID itself for further processing down the line.
So maybe something like this:
SELECT DISTINCT region FROM customers WHERE id IN (?, ?)
The problem with this is that the result will be either an array (if the customers are not within the same region) or a single value.
Is there a more elegant way of solving this constraint? I was thinking of using SELECT INTO with a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.

This query should work:
WITH q AS (
    SELECT
        COUNT( * ) AS CountCustomers,
        COUNT( DISTINCT c.region ) AS CountDistinctRegions,
        MIN( c.region ) AS MinRegion
    FROM
        customers AS c
    WHERE
        c.id = $senderCustomerId
        OR
        c.id = $receiverCustomerId
)
SELECT
    CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 1 THEN 'OK' ELSE 'BAD' END AS "Status",
    CASE WHEN q.CountDistinctRegions = 1 THEN q.MinRegion END AS SingleRegion
FROM
    q
The above query will always return a single row with 2 columns: Status and SingleRegion.
SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single distinct value), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived table as an equivalent operation.
Alternatively, window functions could be used, but annoyingly they don't mix with aggregate queries despite being so similar, argh: a window function's argument must itself be grouped or aggregated, so FIRST_VALUE( c.region ) OVER ( ORDER BY c.region ) cannot simply replace MIN( c.region ) in the CTE above.
Once again, this is the ISO SQL committee's fault, not PostgreSQL's.
If your region column is a UUID and MIN is not available for that type on your PostgreSQL version, aggregating over a text cast, e.g. MIN( c.region::text ), is one possible workaround.
As for the columns:
The Status column is either 'OK' or 'BAD' based on those business constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
The SingleRegion column will be NOT NULL (with a valid region) if CountDistinctRegions = 1 regardless of CountCustomers, but feel free to change that, just in case you still want that info.
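The single-row aggregate check can be tried with a minimal sketch against an in-memory SQLite database loaded with the customers from the question. It treats "same region" as exactly one distinct region among the two customers and uses MIN to pick that region. The check_regions helper, the named parameters and the use of SQLite in place of PostgreSQL are assumptions for illustration.

```python
import sqlite3

CUSTOMERS = [(1, "peter", 1), (2, "henry", 1), (3, "john", 2)]  # rows from the question


def check_regions(sender_id, receiver_id):
    """Return (status, single_region) for a sender/receiver pair."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region INT)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)", CUSTOMERS)
    row = con.execute(
        """
        WITH q AS (
            SELECT COUNT(*) AS CountCustomers,
                   COUNT(DISTINCT c.region) AS CountDistinctRegions,
                   MIN(c.region) AS MinRegion
            FROM customers AS c
            WHERE c.id = :sender OR c.id = :receiver
        )
        SELECT CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 1
                    THEN 'OK' ELSE 'BAD' END AS Status,
               CASE WHEN q.CountDistinctRegions = 1 THEN q.MinRegion END AS SingleRegion
        FROM q
        """,
        {"sender": sender_id, "receiver": receiver_id},
    ).fetchone()  # an aggregate without GROUP BY always yields exactly one row
    con.close()
    return row
```

Note that a missing customer still yields a row: the status is 'BAD', while SingleRegion reports the region of whichever customer was found.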

For anybody else who's interested in a simple solution, I finally came up with the (kind of obvious) way to do it:
SELECT
r.region
FROM
customers s
INNER JOIN customers r ON
s.region = r.region
WHERE s.id = 'sender_id' and r.id = 'receiver_id';
Huge credit to the author of the other answer, who helped me out a lot on this and also posted a viable solution.
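For comparison, the self-join can be exercised the same way. In this minimal sketch the join returns one row holding the shared region when both customers are in the same region and no row otherwise, so the caller only has to check whether a row came back. The shared_region helper and the use of SQLite instead of PL/pgSQL are assumptions for illustration; the data is taken from the question.

```python
import sqlite3


def shared_region(sender_id, receiver_id):
    """Return the common region id, or None if the regions differ
    or either customer does not exist."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region INT)")
    con.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "peter", 1), (2, "henry", 1), (3, "john", 2)],  # rows from the question
    )
    row = con.execute(
        """
        SELECT r.region
        FROM customers s
        INNER JOIN customers r ON s.region = r.region
        WHERE s.id = ? AND r.id = ?
        """,
        (sender_id, receiver_id),
    ).fetchone()  # id is the primary key, so at most one row can match
    con.close()
    return row[0] if row else None
```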

SQL BigQuery - Error that variable is not grouped by even though it is

SQL Code:
SELECT community_table.community_name,
community_table.id,
DATE(timestamp) as date,
ifnull(COUNT(distinct app_opened.user_id), 0) as num_opened_DAU,
lag(COUNT(distinct app_opened.user_id)) OVER
(ORDER BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
FROM *** app_opened
LEFT JOIN (
SELECT DISTINCT id, community_id_2, context_traits_first_name, context_traits_last_name
FROM (
SELECT *
FROM ***,
UNNEST (JSON_EXTRACT_ARRAY(context_traits_community_ids, "$")) as community_id_2
)
GROUP by community_id_2, id, context_traits_first_name, context_traits_last_name) as community_id_table
ON community_id_table.id = app_opened.user_id
LEFT JOIN (
SELECT DISTINCT id, name as community_name
FROM ***) as community_table
ON TO_JSON_STRING(community_table.id) = community_id_table.community_id_2
WHERE app_opened.user_id is not null AND
EXTRACT(DAYOFWEEK FROM DATE(timestamp)) = 2 AND
community_table.community_name is not null
GROUP BY community_table.community_name, community_table.id, DATE(timestamp)
Error Message:
I am quite confused on what could be going wrong here, as the error says that timestamp is not grouped, even though I have grouped it at the bottom. I tried including just timestamp rather than Date(timestamp), but that ruins the table data that I am trying to create, where I find the number of users on a single day. Does anyone have any other ideas? My goal is for a single row, get the previous row's data, but because I am grouping by specific metrics, I need to make sure they are ordered by them as well. Thank you so much!

I think you simply need to modify OVER part as:
OVER (PARTITION BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
UPDATE: It seems the problem was caused by using the DATE() function within OVER, so it can be solved by computing DATE(timestamp) inside a subquery and passing its alias to OVER.
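The subquery approach can be sketched as follows: aggregate per community and day first, then apply LAG to the already grouped rows using the alias of the date column. This is only an illustration on an in-memory SQLite database (3.25 or newer), not BigQuery; the flattened app_opened table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical events: (community_name, community_id, user_id, ts).
EVENTS = [
    ("alpha", 1, "u1", "2021-03-01 08:00:00"),
    ("alpha", 1, "u2", "2021-03-01 09:00:00"),
    ("alpha", 1, "u1", "2021-03-01 10:00:00"),  # same user, same day
    ("alpha", 1, "u1", "2021-03-08 08:00:00"),
    ("beta", 2, "u3", "2021-03-01 08:00:00"),
]


def daily_users_with_previous(events=EVENTS):
    """Distinct users per community and day, plus the previous row's count."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE app_opened "
        "(community_name TEXT, community_id INT, user_id TEXT, ts TEXT)"
    )
    con.executemany("INSERT INTO app_opened VALUES (?, ?, ?, ?)", events)
    sql = """
        SELECT community_name, community_id, day, num_opened,
               -- the window only sees the grouped rows and the alias `day`
               LAG(num_opened) OVER (
                   ORDER BY community_name, community_id, day) AS pre_value
        FROM (SELECT community_name, community_id,
                     DATE(ts) AS day,
                     COUNT(DISTINCT user_id) AS num_opened
              FROM app_opened
              GROUP BY community_name, community_id, DATE(ts)) AS daily
        ORDER BY community_name, community_id, day
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

The window keeps the ordering from the question, so the first row of one community sees the last row of the previous one. Adding PARTITION BY community_name, community_id to the OVER clause would restart the lag for every community.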

How to get "session duration" group by "operating system" in Firebase Bigquery SQL?

I am trying to get the "average session duration" grouped by "operating system" (device.operating_system) and "date" (event_date).
In the Firebase blog, they give us this query to get the average session duration:
SELECT SUM(engagement_time) AS total_user_engagement
FROM (
SELECT user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key =
"engagement_time_msec") AS engagement_time
FROM `FIREBASE_PROJECT`
)
WHERE engagement_time > 0
GROUP BY user_pseudo_id
This query gives me the total user engagement by user ID (each row is a different user):
row|total_user_engagement
---|------------------
1 |989646
2 |225655
3 |125489
4 | 58496
...|......
But I have no idea where to add the "operating system" and "event_date" variables to get this information by OS and date. I tried different queries with no result. For example, to get this result by operating system I tried the following:
SELECT SUM(engagement_time) AS total_user_engagement
FROM (
SELECT device.operating_system,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key =
"engagement_time_msec") AS engagement_time
FROM `FIREBASE_PROJECT`
)
WHERE engagement_time > 0
GROUP BY device.operating_system
But it gives me an error message (Error: Unrecognized name: device at [9:10] ). In other queries device.operating_system is recognized.
For example in that one :
SELECT
event_date,
device.operating_system as os_type,
device.operating_system_version as os_version,
device.mobile_brand_name as device_brand,
device.mobile_model_name as device_model,
count(distinct user_pseudo_id) as all_users
FROM `FIREBASE Project`
GROUP BY 1,2,3,4,5
What I would like to have as a result is something like this :
row|event_date|OS |total_user_engagement
---|----------------------------------------
1 |20191212 |ios |989646
2 |20191212 |android|225655
3 |20191212 |ios |125489
4 |20191212 |android| 58496
...
Thank you

The error is probably because you are referencing the variable device in the outer query, while this variable is only visible from the inner query (subquery). I believe the issue will be fixed by changing the last row of the query from GROUP BY device.operating_system
to
GROUP BY operating_system.
Hopefully this will make clearer what is happening here: the inner query is accessing the table FIREBASE_PROJECT and returning the field operating_system from the nested column device. The outer query accesses the results of the inner query, so it only sees the returned field operating_system, without information about its original context within the nested variable device. That is why trying to reference device at this level will fail.
In the other example you posted this issue does not appear, since there is only a simple query.
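The scoping rule can be reproduced in miniature. SQLite has no nested record columns, so in this sketch the table alias d stands in for the device record: it is visible inside the subquery but not in the outer query, which can only use the column names the subquery returns. The flattened events table, its sample rows and both helper functions are hypothetical.

```python
import sqlite3

# Hypothetical flattened events: (event_date, operating_system, engagement_time).
EVENTS = [
    ("20191212", "ios", 1000),
    ("20191212", "ios", 500),
    ("20191212", "android", 700),
    ("20191213", "android", 0),  # dropped by engagement_time > 0
    ("20191213", "ios", 300),
]


def _connect():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE events (event_date TEXT, operating_system TEXT, engagement_time INT)"
    )
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)
    return con


def outer_reference_fails():
    """True if naming the inner qualifier `d` in the outer query is rejected."""
    con = _connect()
    try:
        con.execute("""
            SELECT SUM(engagement_time)
            FROM (SELECT d.operating_system AS operating_system,
                         d.engagement_time AS engagement_time
                  FROM events d)
            GROUP BY d.operating_system
        """)
        return False
    except sqlite3.OperationalError:
        return True
    finally:
        con.close()


def engagement_by_date_and_os():
    """Group by the names the subquery exposes, not by the inner qualifier."""
    con = _connect()
    rows = con.execute("""
        SELECT event_date, operating_system,
               SUM(engagement_time) AS total_user_engagement
        FROM (SELECT d.event_date AS event_date,
                     d.operating_system AS operating_system,
                     d.engagement_time AS engagement_time
              FROM events d)
        WHERE engagement_time > 0
        GROUP BY event_date, operating_system
        ORDER BY event_date, operating_system
    """).fetchall()
    con.close()
    return rows
```

The second function also shows where event_date goes to get the per-date, per-OS layout from the question: it has to be returned by the inner query, selected in the outer one, and listed in GROUP BY.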

Need to convert SQL Query to LINQ

I have the following SQL query that I need to convert into LINQ with VB.NET
SELECT *
FROM (SELECT Id
,LocationCode
,LocationName
,ContactName
,ContactEmail
,Comments
,SBUName
,CreatedBy
,CreatedDtm
,ModifiedBy
,ModifiedDtm
,ROW_NUMBER() OVER (PARTITION BY LocationCode ORDER BY ID) AS RowNumber
FROM testDB ) as rows
WHERE ROWNUMBER = 1
There are many duplicates of location code so I only want to display one record of each and the user will be able to edit the information. Once they edit I will save the info for all records that are for that specific location code.
I couldn't use DISTINCT here, it would still bring back all of the data since the CreatedBy/ModifiedBy are different.
By using the following LINQ query to select all of the data, is there a way I can get the DISTINCT records for LocationCode out of it?
queryLocMaint = From MR In objcontextGSC.TestDB
Select MR.Id,
MR.LocationCode,
MR.LocationName,
MR.SBUName,
MR.ContactName,
MR.ContactEmail,
MR.Comments,
MR.CreatedBy,
MR.CreatedDtm,
MR.ModifiedBy,
MR.ModifiedDtm()

ROW_NUMBER is not supported in LINQ, maybe you can use this GROUP BY approach:
Dim q = From mr In objcontextGSC.TestDB
Group mr By mr.LocationCode Into LocationCodeGroup = Group
Select LocationCodeGroup.OrderBy(Function(mr) mr.Id).First()
This takes the first row of each LocationCode-group ordered by id.
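For readers who want to check the grouping logic outside of .NET, the same "first row of each group, ordered by id" selection can be sketched in plain Python. The sample rows and the first_per_location helper are hypothetical; only the LocationCode and Id names come from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows with duplicate location codes.
ROWS = [
    {"Id": 3, "LocationCode": "A1", "CreatedBy": "bob"},
    {"Id": 1, "LocationCode": "A1", "CreatedBy": "amy"},
    {"Id": 2, "LocationCode": "B7", "CreatedBy": "cat"},
    {"Id": 5, "LocationCode": "B7", "CreatedBy": "dan"},
]


def first_per_location(rows):
    """Keep, for each LocationCode, the row with the smallest Id."""
    # Sorting by (LocationCode, Id) makes each group contiguous and puts
    # its lowest Id first, which mirrors OrderBy(...).First() per group.
    ordered = sorted(rows, key=itemgetter("LocationCode", "Id"))
    return [next(group) for _, group in groupby(ordered, key=itemgetter("LocationCode"))]
```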

Trouble with oracle sql query

I am trying to write a query for "What are the names of the producers with at least 2 properties with areas less than 10?"
I have made the following query that seems to work:
select Producers.name
from Producers
where (
select count(Properties.prop_id)
from Properties
where Properties.area < 10 and Properties.owner = Producers.nif
) >= 2;
Yet my lecturer was not very happy about it. He even thought (or at least gave me the impression) that this kind of query wouldn't be valid in Oracle.
How should one make this query, then? (I have at the moment no way of getting to speak with him btw).
Here are the tables:
Producer (nif (pk), name, ...)
Property (area, owner (fk to producer), area, ... )

The having clause is typically used to filter on aggregate data (like counts, sums, max, etc).
select
producers.name,
count(*)
from
producers,
property
where
producers.nif = property.owner and
property.area < 10
group by
producers.name
having
count(*) >= 2
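The WHERE-then-HAVING pattern can be checked with a minimal sketch on an in-memory SQLite database. WHERE filters individual properties before grouping, and HAVING filters the groups afterwards. The sample producers and properties, the prop_id key and the small_property_owners helper are hypothetical, and SQLite stands in for Oracle. Grouping by nif as well as name is a deliberate choice here so that two producers who share a name are not merged.

```python
import sqlite3


def small_property_owners(min_count=2, max_area=10):
    """Producers owning at least min_count properties with area below max_area."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE producers (nif INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE property (prop_id INTEGER PRIMARY KEY,
                               owner INT REFERENCES producers(nif),
                               area REAL);
        INSERT INTO producers VALUES (1, 'Ana'), (2, 'Bruno'), (3, 'Carla');
        INSERT INTO property VALUES
            (10, 1, 5), (11, 1, 8), (12, 1, 50),
            (13, 2, 9), (14, 2, 12),
            (15, 3, 3);
    """)
    rows = con.execute(
        """
        SELECT p.name, COUNT(*) AS small_props
        FROM producers p
        JOIN property pr ON pr.owner = p.nif
        WHERE pr.area < ?          -- row filter, applied before grouping
        GROUP BY p.nif, p.name
        HAVING COUNT(*) >= ?       -- group filter, applied after counting
        ORDER BY p.name
        """,
        (max_area, min_count),
    ).fetchall()
    con.close()
    return rows
```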

select p.name
from Producers p, Properties pr
where p.nif = pr.owner
AND pr.area < 10
GROUP BY p.name
having Count(*) >= 2

We Keep Coding
