SQL GROUP BY is duplicating values?

I have a table with speaker, session, conference, and email. My goal is to write a query that combines the conference and session into one field so that we can apply some HTML to it and format it when we preview it elsewhere.
The issue is that when a speaker is attending two different conferences and presenting different sessions, this query somehow duplicates the sessions from one conference and applies them to the second conference:
SELECT speaker AS 'speakername', email AS 'email',
    CAST(
        (SELECT conference AS 'strong',
            (SELECT session AS 'session'
             FROM speakersessions AS ds
             WHERE ds.speaker = dd.speaker
             GROUP BY session
             FOR XML PATH(''), TYPE) AS 'sessions'
         FROM speakersessions AS ds
         WHERE ds.speaker = dd.speaker
         GROUP BY conference
         FOR XML PATH(''), TYPE)
    AS NVARCHAR(MAX)) AS 'conferences'
FROM speakersessions AS dd
GROUP BY speaker, email
The results that show for speaker 'greg' are:
<strong>Business Planning </strong>
<sessions><session>
10 tips to fast-track
</session><session>
Hybrid planning
</session><session>
Planning on the cloud
</session><session>
The Boardroom
</session></sessions>
<strong>Reporting Analytics</strong>
<sessions><session>
10 tips to fast-track
</session><session>
Hybrid planning
</session><session>
Planning on the cloud
</session><session>
The Boardroom
</session></sessions> <br/>
(I added line breaks)
But as you can see, this is not what the speakersessions table shows:
Conference | Session
Business Planning | 10 tips to fast-track
Business Planning | Hybrid planning
Reporting Analytics | Planning on the cloud
Reporting Analytics | The Boardroom
So every conference is being given all of the speaker's sessions instead of only its own. What's going on here?

SELECT speaker AS 'speakername', email,
    CAST(
        (SELECT conference AS 'strong',
            (SELECT session AS 'session'
             FROM speakersessions AS part3
             WHERE part3.conference = part2.conference AND part3.email = part1.email
             GROUP BY session, conference
             FOR XML PATH(''), TYPE) AS 'sessions'
         FROM speakersessions AS part2
         WHERE part1.email = part2.email
         GROUP BY part2.conference
         FOR XML PATH(''), TYPE)
    AS NVARCHAR(MAX)) AS 'conferences'
FROM speakersessions AS part1
GROUP BY email, speaker
I had to make sure that conference was added to the correlation, and that it was joined to the proper outer subquery (part3.conference = part2.conference), so the innermost query only returns the sessions for the conference currently being built.
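As a side note, on SQL Server 2017 and later the same HTML-ish string can be built without nested FOR XML by aggregating twice with STRING_AGG. This is only a sketch against the same speakersessions table (and, unlike FOR XML, it does not XML-escape the values):

WITH per_conference AS (
    SELECT speaker, email, conference,
           -- one <session> element per session within a conference
           STRING_AGG('<session>' + session + '</session>', '')
               WITHIN GROUP (ORDER BY session) AS sessions_html
    FROM speakersessions
    GROUP BY speaker, email, conference
)
SELECT speaker AS 'speakername', email,
       -- one <strong>/<sessions> block per conference for the speaker
       STRING_AGG('<strong>' + conference + '</strong><sessions>' + sessions_html + '</sessions>', '') AS 'conferences'
FROM per_conference
GROUP BY speaker, email;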

Related

REGEX in Snowflake/SQL for Serialized Ruby Hash values

This is a tough one (for me at least). I need to obtain the words after "host_goal_to_have" and "host_goal_to_be_able", and there's not really a strong pattern to signify the end.
For example, the result for "host_goal_to_have" should be "established a thriving community of actors supporting each other and turning passion into income!", and for "host_goal_to_be_able" it should be "reach people and unlock the artist inside them".
Sample:
--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
signup_recaptcha:
hide_welcome_checklist: true
seen_spaces_migration_welcome_intro: true
host_goal_to_have: established a thriving community of actors supporting each other
and turning passion into income!
host_goals_updated_at: '2023-02-11T23:00:52.441Z'
host_goal_to_be_able: reach people and unlock the artist inside them
seen_event_form: true
So we can start with the same data:
with data as (
select * from values
($$--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
signup_recaptcha:
hide_welcome_checklist: true
seen_spaces_migration_welcome_intro: true
host_goal_to_have: established a thriving community of actors supporting each other
and turning passion into income!
host_goals_updated_at: '2023-02-11T23:00:52.441Z'
host_goal_to_be_able: reach people and unlock the artist inside them
seen_event_form: true$$)
)
and then we can split it into lines, find the lines that start with a token matching [a-z_]+: (assuming such tokens never appear at the start of a wrapped value), and separate the token from the value on those token: value lines:
select d.column1
,s.*
,RLIKE(s.value, '^[a-z_]+:.*$') as r1
,regexp_substr(s.value, '^[a-z_]+:(.*)$', 1,1,'c',1) as r2
,iff(r1, r2, s.value) as ss
from data as d,
table(split_to_table(d.column1, '\n')) as s
After this we can chain the values back together:
with data as (
select * from values
($$--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess
signup_recaptcha:
hide_welcome_checklist: true
seen_spaces_migration_welcome_intro: true
host_goal_to_have: established a thriving community of actors supporting each other
and turning passion into income!
host_goals_updated_at: '2023-02-11T23:00:52.441Z'
host_goal_to_be_able: reach people and unlock the artist inside them
seen_event_form: true$$)
), pass_1 as (
select d.column1
,s.seq
,s.index
,regexp_substr(s.value, '^[a-z_]+:') as t
,nvl(t, lag(t)ignore nulls over(partition by s.seq order by s.index)) as token
,iff(RLIKE(s.value, '^[a-z_]+:.*$'), regexp_substr(s.value, '^[a-z_]+:(.*)$', 1,1,'c',1), s.value) as ss
from data as d,
table(split_to_table(d.column1, '\n')) as s
)
select
seq,
token,
listagg(ss) within group (order by index) as val
from pass_1
where token is not null
group by 1,2
;
which gives:
SEQ | TOKEN | VAL
1 | seen_spaces_migration_welcome_intro: | true
1 | host_goals_updated_at: | '2023-02-11T23:00:52.441Z'
1 | signup_recaptcha: |
1 | host_goal_to_have: | established a thriving community of actors supporting each other and turning passion into income!
1 | host_goal_to_be_able: | reach people and unlock the artist inside them
1 | hide_welcome_checklist: | true
1 | seen_event_form: | true
which can be filtered via HAVING:
select
seq,
token,
trim(listagg(ss) within group (order by index)) as val
from pass_1
where token is not null
group by 1,2
having token in ('host_goal_to_have:', 'host_goal_to_be_able:')
SEQ | TOKEN | VAL
1 | host_goal_to_have: | established a thriving community of actors supporting each other and turning passion into income!
1 | host_goal_to_be_able: | reach people and unlock the artist inside them
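If you would rather have the two goals as columns on one row instead of two rows, a conditional aggregation on top of the same pass_1 CTE is one option (just a sketch, reusing the grouped query above as a derived table):

select
    p.seq,
    max(iff(p.token = 'host_goal_to_have:', p.val, null)) as host_goal_to_have,
    max(iff(p.token = 'host_goal_to_be_able:', p.val, null)) as host_goal_to_be_able
from (
    select
        seq,
        token,
        trim(listagg(ss) within group (order by index)) as val
    from pass_1
    where token is not null
    group by 1, 2
) as p
group by p.seq;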

How can I identify english language text using BigQuery?

I have some data on YouTube channel descriptions, which are quite messy as you'd imagine. I'd like to filter for channels whose description is in English, but I'm not sure how to go about it. Here's a sample of what the data looks like:
WITH
foo AS (
SELECT ".olá sejam muito bem vindos. este canal foi criado" AS x
UNION ALL SELECT "Hello, I am Abhy and welcome to my channel." AS x
UNION ALL SELECT "Channels I love: Labrant Fam, Norris Nuts, La Familia Diamond, Piper Rockelle" AS x
UNION ALL SELECT "हेलो दोस्तो रमेश और सागर और सुखदेव आपका स्वागत करते हैं इस चैनल के ऊपर" AS x
UNION ALL SELECT "Hi, I'm K-POP RANDOM👩🇲🇨 === 🌈KPOP RANDOM DANCE🌈 === 🌻I hope you can enjoy" AS x
UNION ALL SELECT 'Public TV Kannada news channel. The slogan is "Yaara Aasthiyoo Alla, Idu Nimma TV"' AS x
UNION ALL SELECT "Instagram: www.instagram.com/whatsfordinner5291/" AS x
UNION ALL SELECT "Welcome to RunningBoy12, a gaming channel brought to you by RO!" as x
)
select * from foo
My idea is to hand-label some records, measure the frequency of foreign characters and words, and then fit a logistic regression model to the data using BigQuery ML. Is there a better way?
You can detect the language with the Cloud Translation API. Before inserting records, you need to run this API; you may want to use Cloud Functions to call it, or, if you need more complicated ETL, Cloud Dataflow.
When a text is categorized as English, you can insert the record into whichever DB you want.
This way, you don't have to store non-English text in your DB, and you save money on storage and querying. Instead of BigQuery, Cloud Firestore could be an option; it depends on the service you want to build.
Here is the Cloud Translation API documentation:
https://cloud.google.com/translate/docs/advanced/detecting-language-v3#before_you_begin
Comparison of databases:
https://db-engines.com/en/system/Amazon+DocumentDB%3BGoogle+BigQuery%3BGoogle+Cloud+Firestore
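If you want to stay entirely inside BigQuery, a rough first pass along the lines the question suggests is to measure how much of each description is plain ASCII. This is only a sketch (it happily passes Portuguese or other romanized text, so treat it as a pre-filter rather than real language detection); foo is the sample CTE from the question:

WITH scored AS (
  SELECT
    x,
    -- share of characters that are plain ASCII; non-Latin scripts score low
    SAFE_DIVIDE(
      LENGTH(REGEXP_REPLACE(x, r'[^\x00-\x7F]', '')),
      LENGTH(x)
    ) AS ascii_ratio
  FROM foo
)
SELECT * FROM scored WHERE ascii_ratio > 0.9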

How to sort friends by last message time like whatsapp

I'm working on a chat app and I want a query that pulls out the list of friends and sorts them by last message time, just the way WhatsApp does.
Three tables in the database are important.
Table name: Users
Purpose: It stores the list of all registered users in the chat app.
Columns: sn, matricno, fullname, password, faculty, department, level, year, study_centre, gender, email, phoneno and picture
Table name: Friends
Purpose: It stores the list of friends and friend requests.
Columns: sn, user1, user2, date_initiated, status (1 = request sent, 2 = they are friends, 3 = they are no longer friends), date_accepted, date_unfriend
Table name: Messages
Purpose: It stores all the messages that have been sent between friends.
Columns: sn, sender, recipient, content, date, mread (to indicate if the recipient has read the message)
So far, this query pulls the list of friends just the way I want; what is left is to bring in the messages table and sort by the date column:
SELECT *
FROM users
WHERE matricno IN (SELECT user2
FROM friends
WHERE user1 = 'NOU1213131415'
AND STATUS = '2'
UNION
SELECT user1
FROM friends
WHERE user2 = 'NOU1213131415'
AND STATUS = '2')
The picture below is an example of the chat list it pulls out
I don't know which SQL dialect you use and I haven't tested it, but maybe you can do something like this:
SELECT
    u.*,
    -- last message exchanged between this friend and the current user
    (SELECT MAX(m.date)
     FROM messages m
     WHERE (m.sender = u.matricno AND m.recipient = 'NOU1213131415')
        OR (m.recipient = u.matricno AND m.sender = 'NOU1213131415')) AS max_date
FROM users u
JOIN friends f
    ON ((f.user1 = 'NOU1213131415' AND f.user2 = u.matricno)
     OR (f.user2 = 'NOU1213131415' AND f.user1 = u.matricno))
   AND f.status = 2
ORDER BY max_date DESC
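A join-based variant of the same idea (again a sketch, not tested against your schema) avoids the correlated subquery by grouping over the messages exchanged with each friend; the LEFT JOIN keeps friends you have never messaged:

SELECT
    u.matricno,
    u.fullname,
    u.picture,
    MAX(m.date) AS last_message_time
FROM users u
JOIN friends f
    ON ((f.user1 = 'NOU1213131415' AND f.user2 = u.matricno)
     OR (f.user2 = 'NOU1213131415' AND f.user1 = u.matricno))
   AND f.status = 2
LEFT JOIN messages m
    ON (m.sender = u.matricno AND m.recipient = 'NOU1213131415')
    OR (m.recipient = u.matricno AND m.sender = 'NOU1213131415')
GROUP BY u.matricno, u.fullname, u.picture
ORDER BY last_message_time DESC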

Get all Activities to one Account - CRM 2016

My question is: is it possible to get all Activities for one Account in CRM with a SQL query in an acceptable amount of time?
What is surprising is that all the Activities appear in the Account overview in CRM, and that page loads instantly.
I've built a query just for Email activities. It runs for roughly 25 minutes, which is not surprising to me. But I can't find a clear relationship between the two tables.
Some data:
~460000 Email-Activities
~28000 Contacts
~37000 Accounts
Here's the SQL query:
SELECT account.Name, MAX(email.CreatedOn)
FROM Email AS email
JOIN Contact AS contact
    ON email.DirectionCode = 1
    AND DATEDIFF(wk, email.CreatedOn, GETDATE()) > 12
    AND (email.ToRecipients LIKE '%' + contact.EMailAddress1 + '%'
      OR email.ToRecipients LIKE '%' + contact.EMailAddress2 + '%'
      OR email.ToRecipients LIKE '%' + contact.EMailAddress3 + '%')
JOIN Account AS account ON account.AccountId = contact.AccountId
GROUP BY account.Name
The problem is in your query. When you use LIKE with a leading '%...' wildcard, it will never use an index.
The best approach in your case is to capture the query CRM itself runs (with SQL Server Profiler: in SSMS -> Tools -> SQL Server Profiler). It will show you the data set being read, and you can work out the relationship between the tables by reverse analysis.
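For what it's worth, in Dynamics CRM databases the usual link between an activity and the record it belongs to is the RegardingObjectId column on the ActivityPointer view (which covers emails, phone calls, tasks, and so on). Assuming your organization database follows that standard schema, an equality join like the sketch below avoids the LIKE '%...%' scan entirely:

-- Sketch only: assumes the standard ActivityPointer view and that activities
-- are linked to the account via RegardingObjectId; check your org's schema.
SELECT a.Name, MAX(ap.CreatedOn) AS LastActivity
FROM ActivityPointer AS ap
JOIN Account AS a ON a.AccountId = ap.RegardingObjectId
GROUP BY a.Name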

PL SQL Pivot Table VS Custom Json solution

I'm at a point in one of my Oracle APEX projects where I need to implement different levels of security for specific individuals in specific applications.
To start, I created a Cartesian product containing the information from the user table, the app table, and the role table.
It looks like this:
SELECT
A.user_id, B.app_id, C.role_id
FROM user A, app B, role C
ORDER BY A.user_id ASC, B.app_id ASC, C.role_id ASC
This allows me to return EVERY combination of user, app, and role. Without a WHERE clause it returns over 303k rows (currently almost 500 users, 6 roles, and over 100 apps).
When I select from this view for a specific user, it returns in approximately 10 ms, which is acceptable.
Now, I also have a view that stores each user's app/role assignments. I've joined it to the Cartesian view in the following fashion:
SELECT
A.*,
DECODE(B.app_right_id, null, 0, 1) AS user_access
FROM
vw_user_app_role A -- My cartesian view
LEFT JOIN vw_tbl_user_app_role B
ON A.user_id = B.user_id
AND A.app_id = B.app_id
AND A.role_id = B.role_id
This returns a very usable set of data that resembles
user_id app_id role_id user_access
50 5 1 0
50 10 2 1
50 15 3 1
75 5 1 1
75 10 2 0
75 15 3 0
I'm considering what my next step should be: whether I should create a pivot of the data where app_id would be the rows, role_id the columns, and user_access the "data". The "data" would ultimately be rendered as a check box on a website with the appropriate row/column headings.
I'm also considering a pure AJAX/JSON solution, where I would build the JSON string in PL/SQL and return the entire string to the client to be processed via jQuery.
I'm concerned about the difficulty of the first option (I'm very new to PL/SQL, and I'm unsure how to generate a pivot table in this version of Oracle (v10)), and I'm concerned about the expense of creating an entire JSON string that contains so much data.
Any suggestions would be greatly appreciated.
EDIT
I've achieved the pivot table I wanted via the following SQL:
SELECT
    B.application_nm,
    A.user_id,
    MAX(DECODE(C.role_name, 'role 1', A.user_access, NULL)) "role 1",
    MAX(DECODE(C.role_name, 'role 2', A.user_access, NULL)) "role 2",
    MAX(DECODE(C.role_name, 'role 3', A.user_access, NULL)) "role 3",
    MAX(DECODE(C.role_name, 'role 4', A.user_access, NULL)) "role 4",
    MAX(DECODE(C.role_name, 'role 5', A.user_access, NULL)) "role 5",
    MAX(DECODE(C.role_name, 'role 6', A.user_access, NULL)) "role 6"
FROM
    vw_user_app_access A
    LEFT JOIN vw_tbl_app B ON A.app_id = B.app_id
    LEFT JOIN vw_tbl_roles C ON A.role_id = C.role_id
GROUP BY B.application_nm, A.user_id
ORDER BY A.user_id DESC
The only problem is that when we have to add 'role 7' in the future, I'll have to go back into this query and add the line MAX(DECODE(C.role_name, 'role 7', A.user_access, NULL)) "role 7".
Thinking ahead, this may be an inconvenience, but considering APEX's framework, I believe I would have to go into the report anyway to update the number of columns manually.
I'm thinking this may be the "best" solution for now, unless anyone has any other suggestions...
It is possible for an Apex report region based on a dynamic SQL query to return a different number of columns as the query changes. I have set up a simple demo on apex.oracle.com. Type a new column name into the Columns tabular form and press "Add Row", and the Matrix report is re-drawn with an extra column of that name.
You have to:
Base the report on a function that returns the SQL to be run as a string
Select the region attribute "Use Generic Column Names (parse query at runtime only)"
Set the report Headings Type to PL/SQL and then use a function to dynamically return the required column headings as a colon-separated list. Note that this can be different from the column name, although my example uses the same text for both.
If my example isn't clear enough I'll add more info later - I'm out of time now.
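As an illustration of the first point, a function that builds the pivot query dynamically from the roles view might look roughly like this (the function name is made up, and the view/column names are taken from the query above; adjust to your schema):

CREATE OR REPLACE FUNCTION get_access_matrix_sql RETURN VARCHAR2 IS
    l_sql VARCHAR2(32767);
BEGIN
    l_sql := 'SELECT B.application_nm, A.user_id';
    -- one MAX(DECODE(...)) column per role, so adding 'role 7' needs no code change
    FOR r IN (SELECT role_name FROM vw_tbl_roles ORDER BY role_id) LOOP
        l_sql := l_sql
              || ', MAX(DECODE(C.role_name, ''' || r.role_name
              || ''', A.user_access, NULL)) "' || r.role_name || '"';
    END LOOP;
    l_sql := l_sql
          || ' FROM vw_user_app_access A'
          || ' LEFT JOIN vw_tbl_app B ON A.app_id = B.app_id'
          || ' LEFT JOIN vw_tbl_roles C ON A.role_id = C.role_id'
          || ' GROUP BY B.application_nm, A.user_id'
          || ' ORDER BY A.user_id DESC';
    RETURN l_sql;
END;
/

The report region would then use this function as its source (a PL/SQL function body returning a SQL query) with "Use Generic Column Names" enabled, and a similar function can return the matching colon-separated headings.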