Want unique records - what is the SQL? - sql

I have this table
User | File     | SubmittedDate
Joe  | 1223.txt | 2011-11-12
Joe  | 3321.txt | 2011-11-13
Jack | 4332.txt | 2012-11-22
Jane | 2344.txt | 2012-11-10
I want to select so I only get one record of Joe's, one of Jack's, and one of Jane's.
e.g.
Joe  | 1223.txt | 2011-11-12
Jack | 4332.txt | 2012-11-22
Jane | 2344.txt | 2012-11-10
In other words, I want a result set of rows that has a unique user field. What's the SQL to get this?

A very quick Google search would show you there are several possible options:
https://www.google.com/search?client=safari&rls=en&q=select+unique+values+from+sql&ie=UTF-8&oe=UTF-8
One of which is to use the DISTINCT keyword:
SELECT DISTINCT User FROM ....
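Note that SELECT DISTINCT User on its own returns only the user names, not the full rows. One common way to keep one whole row per user is a window function. A runnable sketch using SQLite from Python (the table name `submissions` and the choice of "earliest submission wins" are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (User TEXT, File TEXT, SubmittedDate TEXT)")
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?, ?)",
    [("Joe", "1223.txt", "2011-11-12"),
     ("Joe", "3321.txt", "2011-11-13"),
     ("Jack", "4332.txt", "2012-11-22"),
     ("Jane", "2344.txt", "2012-11-10")],
)

# One full row per user: number each user's rows by date, keep the first
rows = conn.execute("""
    SELECT User, File, SubmittedDate
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY User
                                    ORDER BY SubmittedDate) AS rn
          FROM submissions)
    WHERE rn = 1
""").fetchall()
print(sorted(rows))
```

ROW_NUMBER() needs SQLite 3.25+ (bundled with Python 3.7 and later); on older databases, a GROUP BY User with MIN(SubmittedDate) joined back to the table achieves the same result.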

Related

SQL Db2 - How to unify two rows in one using datetime

I've got a table with records of employees and the places where they have worked. Each row has the employee's start date at that place. It's something like this:
Employee ID | Name     | Branch | Start Date
1           | John Doe | 234    | 2018-01-20
1           | John Doe | 300    | 2019-03-20
1           | John Doe | 250    | 2022-01-19
2           | Jane Doe | 200    | 2019-02-15
2           | Jane Doe | 234    | 2020-05-20
I need a query that looks ahead to the next row, making the start date at the next branch the end date of the current one. E.g.:
Employee ID | Name     | Branch | Start Date | End Date
1           | John Doe | 234    | 2018-01-20 | 2019-03-20
1           | John Doe | 300    | 2019-03-20 | 2022-01-19
1           | John Doe | 250    | 2022-01-19 | ---
2           | Jane Doe | 200    | 2019-02-15 | 2020-05-20
2           | Jane Doe | 234    | 2020-05-20 | ---
When there is no later record, we assume the employee is still working at that branch, so we can leave the end date blank or use a default value like "9999-01-01".
Is there any way to achieve a result like this using only SQL?
Another approach to my problem would be a query that returns only the row covering a given date range. For example, if I look up which branch John Doe worked at on 2020-12-01, the query should return the row for branch 300.
You can use LEAD() to peek at the next row, within a partition and ordering you specify.
For example:
select t.*,
       lead(start_date) over (partition by employee_id order by start_date) as end_date
from t
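Run against the sample data, this is what the pattern looks like (sketched here with SQLite from Python, whose LEAD() behaves the same way as DB2's; COALESCE supplies the "9999-01-01" default asked about):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (employee_id INTEGER, name TEXT, branch INTEGER, start_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "John Doe", 234, "2018-01-20"),
     (1, "John Doe", 300, "2019-03-20"),
     (1, "John Doe", 250, "2022-01-19"),
     (2, "Jane Doe", 200, "2019-02-15"),
     (2, "Jane Doe", 234, "2020-05-20")],
)

# The next start date within each employee becomes this row's end date;
# the employee's latest branch gets the '9999-01-01' default.
rows = conn.execute("""
    SELECT employee_id, name, branch, start_date,
           COALESCE(LEAD(start_date) OVER (PARTITION BY employee_id
                                           ORDER BY start_date),
                    '9999-01-01') AS end_date
    FROM t
    ORDER BY employee_id, start_date
""").fetchall()
for row in rows:
    print(row)
```

For the second approach (which branch on 2020-12-01), the same query can be wrapped in a subselect and filtered with `WHERE date BETWEEN start_date AND end_date`.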

How do I join or concat 2 dataframes where I get a new column for each row where the left_on/right_on key is the same?

Given 2 dataframes:
DF1
ID  | Name
123 | Jim
456 | Bob

DF2
record_id | model_year | make_desc | model_desc | vin
123       | 2008       | Chevy     | Tahoe      | cvin
456       | 2020       | Hyundai   | Elantra    | hvin
456       | 2018       | Ford      | F-150      | fvin
I want to merge/join/groupby (not sure which, really) such that the result is:
ID  | Name | model_year1 | make_desc1 | model_desc1 | vin1 | model_year2 | make_desc2 | model_desc2 | vin2
123 | Jim  | 2008        | Chevy      | Tahoe       | cvin | 2008        | Chevy      | Tahoe       | cvin
456 | Bob  | 2020        | Hyundai    | Elantra     | hvin | 2018        | Ford       | F150        | fvin
So I need something kind of like a join, matching rows on a key value,
but adding columns instead of rows when there are multiple matches,
and the number of matches can't be known up front, so it could need to add 10 columns.
I tried a horizontal concat, but it doesn't seem to match on value.
I have also read up a bunch on groupby, but I can't get it to work.
Any help would be appreciated.
I didn't find a straightforward way. Please try as explained and coded below:
df3 = pd.merge(df1, df2, how='left', on='ID')  # merge the two dfs
df3 = df3.groupby(['ID', 'Name'])['JobCode'].unique().reset_index()  # collect each ID's JobCode values into a list
df3[['JobCode', 'JobCode_x']] = pd.DataFrame(df3['JobCode'].tolist(), index=df3.index)  # split the list into columns
   ID Name JobCode JobCode_x
0 123  Jim     H1B      None
1 456  Bob     H1B       H2B
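The snippet above uses a JobCode column from the answerer's own test data. Applied to the question's actual frames (note DF2's key is record_id, so left_on/right_on is needed), one pandas approach is to number each ID's matches with cumcount and then pivot to wide columns. A sketch, assuming pandas is available:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [123, 456], "Name": ["Jim", "Bob"]})
df2 = pd.DataFrame({
    "record_id": [123, 456, 456],
    "model_year": [2008, 2020, 2018],
    "make_desc": ["Chevy", "Hyundai", "Ford"],
    "model_desc": ["Tahoe", "Elantra", "F-150"],
    "vin": ["cvin", "hvin", "fvin"],
})

# Join on the key, then number each ID's matches 1, 2, ...
merged = df1.merge(df2, left_on="ID", right_on="record_id", how="left")
merged["match"] = merged.groupby("ID").cumcount() + 1

# Pivot: one row per (ID, Name), one column group per match number
wide = merged.pivot(index=["ID", "Name"], columns="match",
                    values=["model_year", "make_desc", "model_desc", "vin"])
# Flatten the MultiIndex columns into model_year1, model_year2, vin1, ...
wide.columns = [f"{col}{n}" for col, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

IDs with fewer matches get NaN in the extra columns, and the number of column groups grows automatically with the largest match count.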

Recording earliest login time for each day

I need to return the earliest login time per day for a single username. However, some returned rows show a login time that does not match that day's date. Query below:
index=app_redacted_int_* sourcetype="redacted" SessionState="Active" UserName=ABCDE123
| rex field=UserRealName "(?<IDNUM>\d+$)"
| bucket _time span=1d as day
| eval day=strftime(_time,"%F")
| stats earliest(SessionStateChangeTime) as SessionStateChangeTime by day IDNUM UserRealName UserName
Results:
day IDNUM UserRealName UserName SessionStateChangeTime
2020-07-23 123 John Smith ABCDE123 7/22/2020 09:48:52
2020-07-24 123 John Smith ABCDE123 7/23/2020 12:47:13
2020-07-25 123 John Smith ABCDE123 7/24/2020 07:23:01
2020-07-27 123 John Smith ABCDE123 7/27/2020 07:54:34
2020-07-28 123 John Smith ABCDE123 7/27/2020 07:54:34
2020-07-29 123 John Smith ABCDE123 7/28/2020 07:32:04
As you can see, some days are returning an earliest login that is actually from the previous day. I need the dates on the left side and the right side to match, and I need it all in one query (I already know how to do it one query at a time). Thanks for taking the time to help! It is greatly appreciated!
It would appear that on those dates you've binned, the earliest login time really was from an earlier day. You've conflated distinct dates in the data by expecting them to be "the same". I would strongly suspect that SessionStateChangeTime is not the field you want to bucket on - at least, not in the manner you're trying now.
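One way to think about the fix: derive the day from the login timestamp itself (SessionStateChangeTime) rather than from a separately bucketed _time, so each day can only ever contain logins from its own date. A small Python sketch of that grouping logic (illustration only, not SPL; the timestamps come from the posted results):

```python
from datetime import datetime

# SessionStateChangeTime values from the posted results
logins = [
    "7/22/2020 09:48:52",
    "7/23/2020 12:47:13",
    "7/24/2020 07:23:01",
    "7/27/2020 07:54:34",
    "7/28/2020 07:32:04",
]

# Group by the day of the login timestamp itself, then take the
# earliest login per day -- day and login date now always agree.
earliest = {}
for s in logins:
    ts = datetime.strptime(s, "%m/%d/%Y %H:%M:%S")
    day = ts.strftime("%Y-%m-%d")
    if day not in earliest or ts < earliest[day]:
        earliest[day] = ts

for day, ts in sorted(earliest.items()):
    print(day, ts)
```

In SPL terms the equivalent move would be an eval that formats SessionStateChangeTime into the day field before the stats, instead of bucketing _time.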

PowerBI Report or SQL Query Grouping Data Spanning Columns

I'm wracking my brain trying to figure this out. I have a dataset / table that looks like this:
ID | Person1 | Person2 | Person3 | EffortPerPerson
01 | Bob | Ann | Frank | 2
02 | Frank | Bob | Joe | 3
03 | Ann | Joe | Beth | 1
I'm trying to add up "Effort" for each person. For example, Bob is 2+3, Joe is 3+1, etc. My goal is to produce a Power BI scatter plot showing total Effort for each person.
In a perfect world, the query shouldn't care how many "Person" fields there are; it should just sum the Effort value for every row in which the individual's name appears.
I thought GROUP BY would work, but obviously that's only for one column, and I can't wrap my head around how to make nested queries work here.
Anyone have any ideas? Thanks in advance!
As Nick suggested, you should go with the Unpivot transformation. Go to Edit Queries and open the Transform tab. Select the columns you want to transform into rows, then open the dropdown menu under Unpivot Columns and choose "Unpivot Only Selected Columns".
And that's it: Power BI will aggregate the values for you.
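The same unpivot-then-aggregate idea can be sketched outside Power BI; here in pandas, where melt plays the role of Unpivot (column names follow the question's table):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["01", "02", "03"],
    "Person1": ["Bob", "Frank", "Ann"],
    "Person2": ["Ann", "Bob", "Joe"],
    "Person3": ["Frank", "Joe", "Beth"],
    "EffortPerPerson": [2, 3, 1],
})

# Unpivot: one row per (ID, person), regardless of how many Person columns exist
long = df.melt(id_vars=["ID", "EffortPerPerson"],
               value_vars=[c for c in df.columns if c.startswith("Person")],
               value_name="Person")

# Aggregate: total effort per person
totals = long.groupby("Person")["EffortPerPerson"].sum()
print(totals)
```

Because the Person columns are discovered by name, adding a Person4 column would be picked up without changing the logic, matching the "shouldn't care how many Person fields there are" requirement.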

appropriate method for text match in one column to other column in oracle

I have to write a query in Oracle. I have a table called 'Entity' with two columns, 'Pref_mail_name' and 'spouse_name'.
Now I want a list of all spouse_name values where the last name of the spouse_name is not populated from pref_mail_name.
For example, my table has the following data:
Pref_mail_name         | spouse_name
Kunio Tanaka           | Lorraine
Mrs. Betty H. Williams | Chester Williams
Mr. John Baranger      | Mrs. Cathy Baranger
William kane Gallio    | Karen F. Gallio
Sangon Kim             | Jungja
I need the 1st and 5th rows only as output. I did some analysis and came up with an Oracle built-in function:
SELECT pref_mail_name, spouse_name,
       UTL_MATCH.JARO_WINKLER_SIMILARITY(pref_mail_name, spouse_name) AS similarity
FROM entity
ORDER BY similarity;
But the above query doesn't look right: even when the spouse's last name is not populated from pref_mail_name, it gives a similarity value above 80.
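Since the requirement is really a last-name comparison rather than overall string similarity, comparing just the final word of each name may be more reliable than Jaro-Winkler. A Python sketch of that logic on the sample rows (in Oracle, REGEXP_SUBSTR with a pattern like '\S+$' could extract the final word the same way):

```python
rows = [
    ("Kunio Tanaka", "Lorraine"),
    ("Mrs. Betty H. Williams", "Chester Williams"),
    ("Mr. John Baranger", "Mrs. Cathy Baranger"),
    ("William kane Gallio", "Karen F. Gallio"),
    ("Sangon Kim", "Jungja"),
]

def last_word(name):
    # Final whitespace-delimited token, compared case-insensitively
    return name.strip().split()[-1].lower()

# Keep spouses whose last name does NOT come from pref_mail_name
mismatched = [spouse for pref, spouse in rows
              if last_word(pref) != last_word(spouse)]
print(mismatched)  # ['Lorraine', 'Jungja'] -- the 1st and 5th rows
```

This exact-token comparison avoids the false positives the similarity score produces, though it would still need care with suffixes like "Jr." or hyphenated surnames.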