Replace values in last column - awk

I have a file, file.txt, containing lines like these:
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I 30
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 40
2293 2019-12-21T17:41:37.000+02:00 James Davis bissness 30
1931 2019-12-15T12:16:48.000+02:00 James Davis IL44DEAA 30
2124 2019-10-12T15:23:46.000+03:00 James Davis AA4074S21 40
2035 2019-12-09T15:33:28.000+02:00 James Davis bissness 30
4843 2022-03-02T12:48:34.000+02:00 Wilson Robert JR autotesit 20
5361 2022-03-02T12:44:55.000+02:00 Wilson Robert JR autotesit 40
2135 2019-10-12T21:06:30.000+03:00 James Davis FR4SA21 40
2122 2019-12-23T20:10:06.000+02:00 Administrator QQ2366I 10
2123 2019-10-12T15:40:16.000+03:00 James Davis LS1d0784EW 40
5075 2022-03-02T12:49:10.000+02:00 Lee Patricia JR autotesit 40
2224 2019-12-20T16:26:36.000+02:00 James Davis G1bissness 30
2582 2021-06-20T15:07:19.000+03:00 Jame E2bissness 30
2121 2019-10-12T17:12:38.000+03:00 James Davis AZ1878S 40
4694 2022-06-20T16:00:48.000+03:00 Oliver A autotest 50
2076 2019-12-02T18:32:42.000+02:00 James Davis bissness 40
2694 2021-04-23T11:42:58.000+03:00 Scott Harper JR AZ0410MAN 40
1721 2019-07-13T15:30:56.000+03:00 Hall Braylon AZ14089D 10
1863 2019-07-25T15:45:02.000+03:00 Diaz Thomas AZ141IJ 40
10 Minimal acces, 20 Guest, 30 View, 40 Reporter, 50 Owner,
I tried sed 's/\b30\b/View/g' file.txt, but that changed matches throughout the whole file, and I only need to change the last column.
I need the output to look like this:
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I View
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 Reporter
2293 2019-12-21T17:41:37.000+02:00 James Davis bissness View
1931 2019-12-15T12:16:48.000+02:00 James Davis IL44DEAA View
2124 2019-10-12T15:23:46.000+03:00 James Davis AA4074S21 Reporter
2035 2019-12-09T15:33:28.000+02:00 James Davis bissness View
4843 2022-03-02T12:48:34.000+02:00 Wilson Robert JR autotesit Guest
5361 2022-03-02T12:44:55.000+02:00 Wilson Robert JR autotesit Reporter
2135 2019-10-12T21:06:30.000+03:00 James Davis FR4SA21 Reporter
2122 2019-12-23T20:10:06.000+02:00 Administrator QQ2366I Minimal acces
2123 2019-10-12T15:40:16.000+03:00 James Davis LS1d0784EW Reporter
5075 2022-03-02T12:49:10.000+02:00 Lee Patricia JR autotesit Reporter
2224 2019-12-20T16:26:36.000+02:00 James Davis G1bissness View
2582 2021-06-20T15:07:19.000+03:00 Jame E2bissness View
2121 2019-10-12T17:12:38.000+03:00 James Davis AZ1878S Reporter
4694 2022-06-20T16:00:48.000+03:00 Oliver A autotest Owner
2076 2019-12-02T18:32:42.000+02:00 James Davis bissness Reporter
2694 2021-04-23T11:42:58.000+03:00 Scott Harper JR AZ0410MAN Reporter
1721 2019-07-13T15:30:56.000+03:00 Hall Braylon AZ14089D Minimal acces
1863 2019-07-25T15:45:02.000+03:00 Diaz Thomas AZ141IJ Reporter

I would use GNU AWK for this task in the following way. Let file.txt content be
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I 30
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 40
2293 2019-12-21T17:41:37.000+02:00 James Davis bissness 30
1931 2019-12-15T12:16:48.000+02:00 James Davis IL44DEAA 30
2124 2019-10-12T15:23:46.000+03:00 James Davis AA4074S21 40
2035 2019-12-09T15:33:28.000+02:00 James Davis bissness 30
4843 2022-03-02T12:48:34.000+02:00 Wilson Robert JR autotesit 20
5361 2022-03-02T12:44:55.000+02:00 Wilson Robert JR autotesit 40
2135 2019-10-12T21:06:30.000+03:00 James Davis FR4SA21 40
2122 2019-12-23T20:10:06.000+02:00 Administrator QQ2366I 10
2123 2019-10-12T15:40:16.000+03:00 James Davis LS1d0784EW 40
5075 2022-03-02T12:49:10.000+02:00 Lee Patricia JR autotesit 40
2224 2019-12-20T16:26:36.000+02:00 James Davis G1bissness 30
2582 2021-06-20T15:07:19.000+03:00 Jame E2bissness 30
2121 2019-10-12T17:12:38.000+03:00 James Davis AZ1878S 40
4694 2022-06-20T16:00:48.000+03:00 Oliver A autotest 50
2076 2019-12-02T18:32:42.000+02:00 James Davis bissness 40
2694 2021-04-23T11:42:58.000+03:00 Scott Harper JR AZ0410MAN 40
1721 2019-07-13T15:30:56.000+03:00 Hall Braylon AZ14089D 10
1863 2019-07-25T15:45:02.000+03:00 Diaz Thomas AZ141IJ 40
then
awk 'BEGIN{a[10]="Minimal acces";a[20]="Guest";a[30]="View";a[40]="Reporter";a[50]="Owner"}{$NF=a[$NF];print}' file.txt
gives output
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I View
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 Reporter
2293 2019-12-21T17:41:37.000+02:00 James Davis bissness View
1931 2019-12-15T12:16:48.000+02:00 James Davis IL44DEAA View
2124 2019-10-12T15:23:46.000+03:00 James Davis AA4074S21 Reporter
2035 2019-12-09T15:33:28.000+02:00 James Davis bissness View
4843 2022-03-02T12:48:34.000+02:00 Wilson Robert JR autotesit Guest
5361 2022-03-02T12:44:55.000+02:00 Wilson Robert JR autotesit Reporter
2135 2019-10-12T21:06:30.000+03:00 James Davis FR4SA21 Reporter
2122 2019-12-23T20:10:06.000+02:00 Administrator QQ2366I Minimal acces
2123 2019-10-12T15:40:16.000+03:00 James Davis LS1d0784EW Reporter
5075 2022-03-02T12:49:10.000+02:00 Lee Patricia JR autotesit Reporter
2224 2019-12-20T16:26:36.000+02:00 James Davis G1bissness View
2582 2021-06-20T15:07:19.000+03:00 Jame E2bissness View
2121 2019-10-12T17:12:38.000+03:00 James Davis AZ1878S Reporter
4694 2022-06-20T16:00:48.000+03:00 Oliver A autotest Owner
2076 2019-12-02T18:32:42.000+02:00 James Davis bissness Reporter
2694 2021-04-23T11:42:58.000+03:00 Scott Harper JR AZ0410MAN Reporter
1721 2019-07-13T15:30:56.000+03:00 Hall Braylon AZ14089D Minimal acces
1863 2019-07-25T15:45:02.000+03:00 Diaz Thomas AZ141IJ Reporter
Explanation: in BEGIN I create array a with the replacements described in the requirements; then, for each line, I replace the value of the last field ($NF) with the corresponding value from a and print the changed line. Disclaimer: this solution assumes the value of the last field is always one of the keys present in a; an unknown code would set $NF to the empty string.
(tested in gawk 4.2.1)
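If the last field might contain a code that is not in a, a slightly more defensive variant of the same one-liner (a sketch in the same spirit) rewrites only known codes and leaves other lines untouched:
awk 'BEGIN{a[10]="Minimal acces";a[20]="Guest";a[30]="View";a[40]="Reporter";a[50]="Owner"}{if($NF in a)$NF=a[$NF];print}' file.txt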

Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN {
    # split the legend on commas (with optional surrounding spaces)
    n = split("10 Minimal acces, 20 Guest, 30 View, 40 Reporter, 50 Owner",tmp,/ *, */)
    for (i in tmp) {
        old = new = tmp[i]
        sub(/ .*/,"",old)       # keep only the leading numeric code
        sub(/[^ ]* */,"",new)   # keep everything after the code
        map[old] = new
    }
}
$NF in map {
    $NF = map[$NF]
}
{ print }
$ awk -f tst.awk file
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I View
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 Reporter
2293 2019-12-21T17:41:37.000+02:00 James Davis bissness View
1931 2019-12-15T12:16:48.000+02:00 James Davis IL44DEAA View
2124 2019-10-12T15:23:46.000+03:00 James Davis AA4074S21 Reporter
2035 2019-12-09T15:33:28.000+02:00 James Davis bissness View
4843 2022-03-02T12:48:34.000+02:00 Wilson Robert JR autotesit Guest
5361 2022-03-02T12:44:55.000+02:00 Wilson Robert JR autotesit Reporter
2135 2019-10-12T21:06:30.000+03:00 James Davis FR4SA21 Reporter
2122 2019-12-23T20:10:06.000+02:00 Administrator QQ2366I Minimal acces
2123 2019-10-12T15:40:16.000+03:00 James Davis LS1d0784EW Reporter
5075 2022-03-02T12:49:10.000+02:00 Lee Patricia JR autotesit Reporter
2224 2019-12-20T16:26:36.000+02:00 James Davis G1bissness View
2582 2021-06-20T15:07:19.000+03:00 Jame E2bissness View
2121 2019-10-12T17:12:38.000+03:00 James Davis AZ1878S Reporter
4694 2022-06-20T16:00:48.000+03:00 Oliver A autotest Owner
2076 2019-12-02T18:32:42.000+02:00 James Davis bissness Reporter
2694 2021-04-23T11:42:58.000+03:00 Scott Harper JR AZ0410MAN Reporter
1721 2019-07-13T15:30:56.000+03:00 Hall Braylon AZ14089D Minimal acces
1863 2019-07-25T15:45:02.000+03:00 Diaz Thomas AZ141IJ Reporter
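If the legend were kept in its own file rather than hard-coded (say a hypothetical map.txt holding one "10 Minimal acces"-style entry per line), the same idea extends to a two-file read that works in any POSIX awk:
$ cat map2last.awk
NR==FNR {               # first file: build the map from the legend
    code = $1
    sub(/^[^ ]+ +/,"")  # drop the code, keep the label
    map[code] = $0
    next
}
$NF in map { $NF = map[$NF] }
{ print }
$ awk -f map2last.awk map.txt file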

This might work for you (GNU sed):
lookup=" 10 Minimal acces, 20 Guest, 30 View, 40 Reporter, 50 Owner,"
sed -E 's/$/\n'"${lookup}"'/;s/( \S+)\n.*\1( [^,]+).*/\2/;P;d' file
Append a lookup table to each line and, using regexp back references, replace the last field of the line with its match.
N.B. If no match is found, the line is printed as-is.
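For example, for the first sample line the pattern space after the first substitution is (with \n standing for the inserted newline):
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I 30\n 10 Minimal acces, 20 Guest, 30 View, 40 Reporter, 50 Owner,
Group \1 captures the last field " 30", the back reference finds " 30" again inside the lookup, \2 captures " View" (everything up to the next comma), and the second substitution reduces the line to:
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I View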

mapping="10 Minimal acces, 20 Guest, 30 View, 40 Reporter, 50 Owner, "
awk -v map="$mapping" '
BEGIN {
    split(map, a, ",")
    for (i in a) {
        # gensub is GNU awk; pick out the numeric code and its description
        num  = gensub(/^([ ]*)?([^ ]*)([ ]*)?(.*)$/, "\\2", "g", a[i])
        desc = gensub(/^([ ]*)?([^ ]*)([ ]*)?(.*)$/, "\\4", "g", a[i])
        newnf[num] = desc
    }
}
{ $NF = newnf[$NF] } 1
' input_file
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I View
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 Reporter
2293 2019-12-21T17:41:37.000+02:00 James Davis bissness View
1931 2019-12-15T12:16:48.000+02:00 James Davis IL44DEAA View
2124 2019-10-12T15:23:46.000+03:00 James Davis AA4074S21 Reporter
2035 2019-12-09T15:33:28.000+02:00 James Davis bissness View
4843 2022-03-02T12:48:34.000+02:00 Wilson Robert JR autotesit Guest
5361 2022-03-02T12:44:55.000+02:00 Wilson Robert JR autotesit Reporter
2135 2019-10-12T21:06:30.000+03:00 James Davis FR4SA21 Reporter
2122 2019-12-23T20:10:06.000+02:00 Administrator QQ2366I Minimal acces
2123 2019-10-12T15:40:16.000+03:00 James Davis LS1d0784EW Reporter
5075 2022-03-02T12:49:10.000+02:00 Lee Patricia JR autotesit Reporter
2224 2019-12-20T16:26:36.000+02:00 James Davis G1bissness View
2582 2021-06-20T15:07:19.000+03:00 Jame E2bissness View
2121 2019-10-12T17:12:38.000+03:00 James Davis AZ1878S Reporter
4694 2022-06-20T16:00:48.000+03:00 Oliver A autotest Owner
2076 2019-12-02T18:32:42.000+02:00 James Davis bissness Reporter
2694 2021-04-23T11:42:58.000+03:00 Scott Harper JR AZ0410MAN Reporter
1721 2019-07-13T15:30:56.000+03:00 Hall Braylon AZ14089D Minimal acces
1863 2019-07-25T15:45:02.000+03:00 Diaz Thomas AZ141IJ Reporter
Another solution, building the same map by first splitting the mapping string into one entry per line:
mapping="10 Minimal acces, 20 Guest, 30 View, 40 Reporter, 50 Owner, "
awk '
NR==FNR { n=$1; $1=""; gsub(/^ /,"",$0); a[n]=$0; next }  # first input: build the map
{ $NF=a[$NF] } 1
' <(tr ',' '\n' <<<"$mapping") input_file
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I View
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 Reporter
2293 2019-12-21T17:41:37.000+02:00 James Davis bissness View
1931 2019-12-15T12:16:48.000+02:00 James Davis IL44DEAA View
2124 2019-10-12T15:23:46.000+03:00 James Davis AA4074S21 Reporter
2035 2019-12-09T15:33:28.000+02:00 James Davis bissness View
4843 2022-03-02T12:48:34.000+02:00 Wilson Robert JR autotesit Guest
5361 2022-03-02T12:44:55.000+02:00 Wilson Robert JR autotesit Reporter
2135 2019-10-12T21:06:30.000+03:00 James Davis FR4SA21 Reporter
2122 2019-12-23T20:10:06.000+02:00 Administrator QQ2366I Minimal acces
2123 2019-10-12T15:40:16.000+03:00 James Davis LS1d0784EW Reporter
5075 2022-03-02T12:49:10.000+02:00 Lee Patricia JR autotesit Reporter
2224 2019-12-20T16:26:36.000+02:00 James Davis G1bissness View
2582 2021-06-20T15:07:19.000+03:00 Jame E2bissness View
2121 2019-10-12T17:12:38.000+03:00 James Davis AZ1878S Reporter
4694 2022-06-20T16:00:48.000+03:00 Oliver A autotest Owner
2076 2019-12-02T18:32:42.000+02:00 James Davis bissness Reporter
2694 2021-04-23T11:42:58.000+03:00 Scott Harper JR AZ0410MAN Reporter
1721 2019-07-13T15:30:56.000+03:00 Hall Braylon AZ14089D Minimal acces
1863 2019-07-25T15:45:02.000+03:00 Diaz Thomas AZ141IJ Reporter

You can use
sed 's/\([[:space:]]\)30[[:space:]]*$/\1View/' file.txt > newfile.txt
Here, the number 30 is matched only at the end of the line ($) and only when preceded by a whitespace character.
Details:
\([[:space:]]\) - Capturing group 1 (\1 in the replacement refers to this group's value): a whitespace character
30 - a fixed string
[[:space:]]* - zero or more (trailing) whitespace characters
$ - end of line.
See an online demo:
#!/bin/bash
s='4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I 30
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 40
10 Minimal acces, 20 Guest, 30 View, 40 Reporter, 50 Owner,'
sed 's/\([[:space:]]\)30[[:space:]]*$/\1View/' <<< "$s"
Output:
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I View
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 40
10 Minimal acces, 20 Guest, 30 View, 40 Reporter, 50 Owner,
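To translate all five codes rather than just 30, the same expression can be repeated, one -e per code (a straightforward extension of the command above):
sed -e 's/\([[:space:]]\)10[[:space:]]*$/\1Minimal acces/' \
    -e 's/\([[:space:]]\)20[[:space:]]*$/\1Guest/' \
    -e 's/\([[:space:]]\)30[[:space:]]*$/\1View/' \
    -e 's/\([[:space:]]\)40[[:space:]]*$/\1Reporter/' \
    -e 's/\([[:space:]]\)50[[:space:]]*$/\1Owner/' file.txt > newfile.txt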

As long as you're not concerned about existing extra padded spaces being squeezed away (reassigning a field makes awk rebuild the line with single-space separators):
{m,g}awk '
BEGIN {
    __[10] = "Minimal access"
    __[20] = "Guest"
    __[30] = "View"
    __[40] = "Reporter"
    __[50] = "Owner"
} $NF = __[$NF]'
Here the assignment $NF = __[$NF] doubles as the pattern, so a line is printed only when its code maps to a non-empty label. Output:
4141 2019-01-08T14:42:55.000+02:00 JonhSmith LS08EE0I View
2128 2019-11-02T13:47:34.000+02:00 James Davis RT84SO1 Reporter
2293 2019-12-21T17:41:37.000+02:00 James Davis bissness View
1931 2019-12-15T12:16:48.000+02:00 James Davis IL44DEAA View
2124 2019-10-12T15:23:46.000+03:00 James Davis AA4074S21 Reporter
2035 2019-12-09T15:33:28.000+02:00 James Davis bissness View
4843 2022-03-02T12:48:34.000+02:00 Wilson Robert JR autotesit Guest
5361 2022-03-02T12:44:55.000+02:00 Wilson Robert JR autotesit Reporter
2135 2019-10-12T21:06:30.000+03:00 James Davis FR4SA21 Reporter
2122 2019-12-23T20:10:06.000+02:00 Administrator QQ2366I Minimal access
2123 2019-10-12T15:40:16.000+03:00 James Davis LS1d0784EW Reporter
5075 2022-03-02T12:49:10.000+02:00 Lee Patricia JR autotesit Reporter
2224 2019-12-20T16:26:36.000+02:00 James Davis G1bissness View
2582 2021-06-20T15:07:19.000+03:00 Jame E2bissness View
2121 2019-10-12T17:12:38.000+03:00 James Davis AZ1878S Reporter
4694 2022-06-20T16:00:48.000+03:00 Oliver A autotest Owner
2076 2019-12-02T18:32:42.000+02:00 James Davis bissness Reporter
2694 2021-04-23T11:42:58.000+03:00 Scott Harper JR AZ0410MAN Reporter
1721 2019-07-13T15:30:56.000+03:00 Hall Braylon AZ14089D Minimal access
1863 2019-07-25T15:45:02.000+03:00 Diaz Thomas AZ141IJ Reporter

Related

finding duplicate values with join

ITEMS
ITEM_ID NAME_ID ITEM_NAME
1001 2001 Office chair
1002 2002 Writing Desk
1003 2003 Filing cabinet
1004 2004 Bookshelf bookcase
1005 2005 Table lamp
1006 2001 Office chair
1007 2002 Writing Desk
1008 2003 Filing cabinet
1009 2004 Bookshelf bookcase
1010 2005 Table lamp
1011 2001 Office chair
1012 2002 Writing Desk
1013 2003 Filing cabinet
1014 2004 Bookshelf bookcase
1015 2005 Table lamp
1016 2016 Triangle window
1017 2017 Screen
1018 2018 Cradle
1019 2017 Screen
1020 2018 Cradle
1021 2017 Screen
1022 2018 Cradle
1023 2023 Futon
1024 2024 Single bed
1025 2025 Bunk beds
1026 2026 Sofa bed
1027 2027 Camp bed cot sleeping bag
1028 2028 Airbed air mattress
1029 2029 Hammock
1030 2030 Loveseat
1031 2031 Sleeper sofa
1032 2032 Settee
1032 2032 Settee
1033 2001 Office chair
1034 2002 Writing Desk
1035 2003 Filing cabinet
1036 2004 Bookshelf/bookcase
1037 2005 Table lamp
1038 2001 Office chair
1039 2002 Writing Desk
1040 2003 Filing cabinet
1041 2004 Bookshelf/bookcase
1042 2005 Table lamp
1043 2017 Screen
1044 2018 Cradle
1045 2017 Screen
1046 2018 Cradle
1047 2017 Screen
1048 2018 Cradle
1049 2017 Screen
1050 2018 Cradle
ITEMS_DETAILS:
CITY ITEM_ID SHOP_ID
NEW YORK 1001 4001
NEW YORK 1002 4002
NEW YORK 1003 4003
NEW YORK 1004 4004
NEW YORK 1005 4005
DALLAS 1006 4006
DALLAS 1007 4007
DALLAS 1008 4008
DALLAS 1009 4001
DALLAS 1010 4002
DALLAS 1011 4003
DALLAS 1012 4004
WASHINGTON 1013 4005
WASHINGTON 1014 4006
WASHINGTON 1015 4007
WASHINGTON 1016 4008
WASHINGTON 1017 4009
WASHINGTON 1018 4010
WASHINGTON 1019 4011
SANFRANSISCO 1020 4012
SANFRANSISCO 1021 4013
CHICAGO 1022 4014
CHICAGO 1023 4015
CHICAGO 1024 4016
CHICAGO 1025 4017
BOSTON 1026 4018
BOSTON 1027 4019
BOSTON 1028 4020
BOSTON 1029 4021
BOSTON 1030 4022
SANFRANSISCO 1031 4023
SANFRANSISCO 1032 4024
SANFRANSISCO 1032 4025
SANFRANSISCO 1033 4026
Las Vegas 1034 4027
Austin 1035 4028
Houston 1036 4029
Los Angeles 1037 4030
Seattle 1038 4031
Atlanta 1039 4032
McKinney 1040 4033
Vancouver 1041 4034
Las Vegas 1042 4035
Austin 1043 4036
Houston 1044 4037
Los Angeles 1045 4038
Seattle 1046 4034
Atlanta 1047 4035
McKinney 1048 4036
Vancouver 1049 4037
Las Vegas 1050 4043
Austin 1051 4044
Houston 1052 4045
Los Angeles 1053 4046
Seattle 1054 4047
Atlanta 1055 4048
McKinney 1056 4049
Vancouver 1057 4050
Las Vegas 1058 4051
Austin 1059 4052
Houston 1060 4053
Hi All,
I am trying to find the duplicate values of the columns in the result of joining ITEMS and ITEMS_DETAILS.
I know the SQL for finding duplicate values of a column in a single table, but I'm a bit confused when a join is involved.
Logic: if ITEM_NAME is the same but SHOP_ID is different, it should show as a duplicate; if SHOP_ID is the same, it should show as unique.
Please help me.
I tried this:
select * from (
    select a.NAME_ID
    from ITEMS a
    inner join ITEMS_DETAILS b on b.ITEM_ID = a.ITEM_ID
) x
inner join ITEMS y on y.NAME_ID = x.NAME_ID
inner join ITEMS_DETAILS z on z.ITEM_ID = y.ITEM_ID
If you are interested in grouping and counting the duplicates, then try the query below:
SELECT
    COUNT(*) AS DupCount,
    y.ITEM_ID
FROM
    ITEMS y
    INNER JOIN ITEMS_DETAILS z ON z.ITEM_ID = y.ITEM_ID
GROUP BY
    y.ITEM_ID
HAVING
    COUNT(*) > 1
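If instead you want the rule stated in the question (same ITEM_NAME but different SHOP_ID counts as a duplicate), one way to sketch it with the column names from the tables above is to group by name and count distinct shops:
SELECT
    y.ITEM_NAME,
    COUNT(DISTINCT z.SHOP_ID) AS ShopCount
FROM
    ITEMS y
    INNER JOIN ITEMS_DETAILS z ON z.ITEM_ID = y.ITEM_ID
GROUP BY
    y.ITEM_NAME
HAVING
    COUNT(DISTINCT z.SHOP_ID) > 1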

SQL number of first names belonging to two families

Say I have a table with three columns: the first column is the id, the second contains first names, and the third contains last names. There can be rows with the same first name but different last names; however, the same first name cannot occur twice with the same last name.
ID  First_Name  Last_Name
--  ----------  ---------
 0  John        SMITH
 1  John        BROWN
 2  John        JONES
 3  John        WILLIAMS
 4  John        MILLER
 5  John        DAVIS
 6  John        WILSON
 7  John        TAYLOR
 8  John        WHITE
 9  John        CLARK
10  Michael     SMITH
11  Michael     BROWN
12  James       JONES
13  James       WILLIAMS
14  Robert      MILLER
15  Robert      DAVIS
16  Robert      WILSON
17  Robert      BROWN
18  Robert      JONES
19  Robert      WILLIAMS
20  Jennifer    MILLER
21  Jennifer    DAVIS
22  Jennifer    SMITH
23  Jennifer    BROWN
24  Jennifer    JONES
25  Jennifer    WILLIAMS
26  Jennifer    WILSON
27  Jennifer    TAYLOR
28  Jennifer    WHITE
How do I get a matrix M whose rows and columns are all possible last-name values, where M(f1,f2) is the number of first names that occur with both f1 and f2 in the table?
Thank you for your help
Getting a (dynamic) matrix as the result of pure SQL is not doable.
The best you can do is to get the two last names in two columns, like this:
SELECT
    t1.last_n name_a, t2.last_n name_b,
    count(case when t1.first_n = t2.first_n then 1 end) count_match
FROM data_table t1, data_table t2
GROUP BY 1, 2
ORDER BY 1, 2
The results would look like this (these are the expected results, just not in matrix form):
Results
name_a name_b count_match
BROWN BROWN 4
BROWN CLARK 1
BROWN DAVIS 3
BROWN JONES 3
BROWN MILLER 3
BROWN SMITH 3
BROWN TAYLOR 2
BROWN WHITE 2
BROWN WILLIAMS 3
BROWN WILSON 3
CLARK BROWN 1
CLARK CLARK 1
CLARK DAVIS 1
CLARK JONES 1
CLARK MILLER 1
CLARK SMITH 1
CLARK TAYLOR 1
CLARK WHITE 1
CLARK WILLIAMS 1
CLARK WILSON 1
DAVIS BROWN 3
DAVIS CLARK 1
DAVIS DAVIS 3
DAVIS JONES 3
DAVIS MILLER 3
DAVIS SMITH 2
DAVIS TAYLOR 2
DAVIS WHITE 2
DAVIS WILLIAMS 3
DAVIS WILSON 3
JONES BROWN 3
JONES CLARK 1
JONES DAVIS 3
....
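If the set of last names is known in advance, the matrix itself can be approximated with a static pivot built by conditional aggregation; a sketch with only the first three columns written out (one count(case ...) per remaining last name would complete it):
SELECT
    t1.last_n name_a,
    count(case when t2.last_n = 'BROWN' and t1.first_n = t2.first_n then 1 end) brown,
    count(case when t2.last_n = 'CLARK' and t1.first_n = t2.first_n then 1 end) clark,
    count(case when t2.last_n = 'DAVIS' and t1.first_n = t2.first_n then 1 end) davis
FROM data_table t1, data_table t2
GROUP BY 1
ORDER BY 1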

How to select the rows that have more than one record in a table?

"Select the beers that have been drank by more than one person". Basically, I'm trying to retrieve a query result that shows me the list of the beers that has been drank by more than one person and then the name of the persons who have drank that beer. I tried to do a group and having clause but realized that I cant have a group clause since I'm interested in duplicate values from both the name and beer columns. What should I do?
table bpb
sysnr beer name
---------- ---------------- ----------
1260 Guinness Draught Agneta
11226 Gigantic IPA Alan
11410 Alesmith Decaden Alan
11581 Trashy Blonde Alan
1260 Guinness Draught Alan
1403 Tuborg Alan
1416 Lowenbrau Alan
1506 Jever Alan
1515 Punk IPA Alan
1523 Armageddon IPA Alan
1540 Westmalle Double Alan
1548 Brooklyn Lager Alan
1553 Chang Beer Alan
1559 Coors Light Alan
1565 Bitburger Alan
1565 Bitburger Alan
1566 Pilsner Urquell Alan
1574 Pabst Blue Ribbo Alan
1585 San Miguel Alan
1594 Lapin Kulta Alan
1625 Sierra Nevada Pa Alan
1642 Fullers London P Alan
1649 Samuel Adams Bos Alan
1650 Orval Alan
1654 Duvel Alan
1657 Chimay vit Alan
1659 Leffe Blond Alan
1664 Kwak Alan
1670 DAB Alan
1670 DAB Alan
1675 Anchor Steam Bee Alan
89607 Lagunitas IPA Alan
89793 Maredsous Tripel Alan
11410 Alesmith Decaden Dick
1553 Chang Beer Dick
1642 Fullers London P Dick
1222 Sofiero Dina
1574 Pabst Blue Ribbo Dina
1650 Orval Dina
11451 Pripps Bla Fredrik
1403 Tuborg Fredrik
1559 Coors Light Fredrik
30611 Dugges High Five Fredrik
11489 Gambrinus Henrik
1353 Budvar Henrik
1544 Litovel Classic Henrik
1566 Pilsner Urquell Henrik
1611 Breznak Henrik
89301 Bernard Henrik
11410 Alesmith Decaden Janne
1260 Guinness Draught Janne
1506 Jever Janne
1559 Coors Light Janne
1559 Coors Light Janne
1649 Samuel Adams Bos Janne
11410 Alesmith Decaden Johan
1515 Punk IPA Johan
1548 Brooklyn Lager Johan
1559 Coors Light Johan
1670 DAB Johan
1403 Tuborg Jonas
1403 Tuborg Juha
1403 Tuborg Juha
1522 Karhu Juha
1523 Armageddon IPA Juha
1566 Pilsner Urquell Juha
1574 Pabst Blue Ribbo Juha
1594 Lapin Kulta Juha
30023 US Red Ale Juha
30658 Stigbergets Sais Juha
11433 Falcon Export Kalle
1519 Saxon Kalle
1522 Karhu Kalle
1551 Citra Pale Ale Kalle
1594 Lapin Kulta Kalle
1675 Anchor Steam Bee Kalle
30023 US Red Ale Kalle
11433 Falcon Export Kjell
1515 Punk IPA Kjell
1548 Brooklyn Lager Kjell
1559 Coors Light Kjell
11226 Gigantic IPA Lennart
11451 Pripps Bla Lennart
11489 Gambrinus Lennart
11581 Trashy Blonde Lennart
1344 Amstel Lennart
1403 Tuborg Lennart
1407 Backyard Brew Lennart
1523 Armageddon IPA Lennart
1540 Westmalle Double Lennart
1565 Bitburger Lennart
1566 Pilsner Urquell Lennart
1574 Pabst Blue Ribbo Lennart
1594 Lapin Kulta Lennart
1642 Fullers London P Lennart
1650 Orval Lennart
1659 Leffe Blond Lennart
1664 Kwak Lennart
1670 DAB Lennart
89793 Maredsous Tripel Lennart
1403 Tuborg Lisen
1407 Backyard Brew Lisen
1548 Brooklyn Lager Lisen
1553 Chang Beer Lisen
1565 Bitburger Lisen
1594 Lapin Kulta Lisen
1657 Chimay vit Lisen
30611 Dugges High Five Lisen
30658 Stigbergets Sais Lisen
11410 Alesmith Decaden Magnus
1260 Guinness Draught Magnus
1407 Backyard Brew Maria
11451 Pripps Bla Marie
11489 Gambrinus Rikard
1353 Budvar Rikard
1540 Westmalle Double Rikard
1544 Litovel Classic Rikard
1611 Breznak Rikard
1650 Orval Rikard
1654 Duvel Rikard
1657 Chimay vit Rikard
1659 Leffe Blond Rikard
1664 Kwak Rikard
1670 DAB Rikard
89793 Maredsous Tripel Rikard
11410 Alesmith Decaden Urban
1416 Lowenbrau Urban
1506 Jever Urban
1565 Bitburger Urban
1642 Fullers London P Urban
1670 DAB Urban
You need a condition in the HAVING clause:
select beer
from bpb
group by beer
having count(distinct name) > 1
With the DISTINCT keyword in COUNT(), only distinct names are counted.
If you also want the names, then:
select *
from bpb
where beer in (
    select beer
    from bpb
    group by beer
    having count(distinct name) > 1
)
or with EXISTS:
select b.*
from bpb b
where exists (
    select 1
    from bpb
    where beer = b.beer and name <> b.name
)
You can use COUNT(DISTINCT name) with GROUP BY and HAVING:
select beer
from my_table
group by beer
having count(distinct name) > 1
You might find it convenient to get all the names in a single row. I might suggest:
select beer, group_concat(distinct name)
from bpb
group by beer
having count(distinct name) > 1
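group_concat is MySQL-specific; PostgreSQL, for example, expresses the same idea with string_agg (a sketch in PostgreSQL syntax):
select beer, string_agg(distinct name, ', ')
from bpb
group by beer
having count(distinct name) > 1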

How to select the highest number of occurrences for a certain text value?

I'm trying to find the person who has drunk the most beer types from the USA. The result should be just the name of that person, without a count column. How should I write the SELECT statement?
The result should look like this:
name
Alan
The result above should be derived from the table below:
sysnr beer country name
---------- ---------------- ---------- ----------
1260 Guinness Draught Irland Agneta
11226 Gigantic IPA USA Alan
11410 Alesmith Decaden USA Alan
11581 Trashy Blonde Storbritan Alan
1260 Guinness Draught Irland Alan
1403 Tuborg Danmark Alan
1416 Lowenbrau Tyskland Alan
1506 Jever Tyskland Alan
1515 Punk IPA Storbritan Alan
1523 Armageddon IPA Nya Zeelan Alan
1540 Westmalle Double Belgien Alan
1548 Brooklyn Lager USA Alan
1553 Chang Beer Thailand Alan
1559 Coors Light USA Alan
1565 Bitburger Tyskland Alan
1565 Bitburger Tyskland Alan
1566 Pilsner Urquell Tjeckien Alan
1574 Pabst Blue Ribbo USA Alan
1585 San Miguel Spanien Alan
1594 Lapin Kulta Finland Alan
1625 Sierra Nevada Pa USA Alan
1642 Fullers London P Storbritan Alan
1649 Samuel Adams Bos USA Alan
1650 Orval Belgien Alan
1654 Duvel Belgien Alan
1657 Chimay vit Belgien Alan
1659 Leffe Blond Belgien Alan
1664 Kwak Belgien Alan
1670 DAB Tyskland Alan
1670 DAB Tyskland Alan
1675 Anchor Steam Bee USA Alan
89607 Lagunitas IPA USA Alan
89793 Maredsous Tripel Belgien Alan
11410 Alesmith Decaden USA Dick
1553 Chang Beer Thailand Dick
1642 Fullers London P Storbritan Dick
1222 Sofiero Sverige Dina
1574 Pabst Blue Ribbo USA Dina
1650 Orval Belgien Dina
11451 Pripps Bla Sverige Fredrik
1403 Tuborg Danmark Fredrik
1559 Coors Light USA Fredrik
30611 Dugges High Five Sverige Fredrik
11489 Gambrinus Tjeckien Henrik
1353 Budvar Tjeckien Henrik
1544 Litovel Classic Tjeckien Henrik
1566 Pilsner Urquell Tjeckien Henrik
1611 Breznak Tjeckien Henrik
89301 Bernard Tjeckien Henrik
11410 Alesmith Decaden USA Janne
1260 Guinness Draught Irland Janne
1506 Jever Tyskland Janne
1559 Coors Light USA Janne
1559 Coors Light USA Janne
1649 Samuel Adams Bos USA Janne
11410 Alesmith Decaden USA Johan
1515 Punk IPA Storbritan Johan
1548 Brooklyn Lager USA Johan
1559 Coors Light USA Johan
1670 DAB Tyskland Johan
1403 Tuborg Danmark Jonas
1403 Tuborg Danmark Juha
1403 Tuborg Danmark Juha
1522 Karhu Finland Juha
1523 Armageddon IPA Nya Zeelan Juha
1566 Pilsner Urquell Tjeckien Juha
1574 Pabst Blue Ribbo USA Juha
1594 Lapin Kulta Finland Juha
30023 US Red Ale Finland Juha
30658 Stigbergets Sais Sverige Juha
11433 Falcon Export Sverige Kalle
1519 Saxon Finland Kalle
1522 Karhu Finland Kalle
1551 Citra Pale Ale Holland Kalle
1594 Lapin Kulta Finland Kalle
1675 Anchor Steam Bee USA Kalle
30023 US Red Ale Finland Kalle
11433 Falcon Export Sverige Kjell
1515 Punk IPA Storbritan Kjell
1548 Brooklyn Lager USA Kjell
1559 Coors Light USA Kjell
11226 Gigantic IPA USA Lennart
11451 Pripps Bla Sverige Lennart
11489 Gambrinus Tjeckien Lennart
11581 Trashy Blonde Storbritan Lennart
1344 Amstel Holland Lennart
1403 Tuborg Danmark Lennart
1407 Backyard Brew Danmark Lennart
1523 Armageddon IPA Nya Zeelan Lennart
1540 Westmalle Double Belgien Lennart
1565 Bitburger Tyskland Lennart
1566 Pilsner Urquell Tjeckien Lennart
1574 Pabst Blue Ribbo USA Lennart
1594 Lapin Kulta Finland Lennart
1642 Fullers London P Storbritan Lennart
1650 Orval Belgien Lennart
1659 Leffe Blond Belgien Lennart
1664 Kwak Belgien Lennart
1670 DAB Tyskland Lennart
89793 Maredsous Tripel Belgien Lennart
1403 Tuborg Danmark Lisen
1407 Backyard Brew Danmark Lisen
1548 Brooklyn Lager USA Lisen
1553 Chang Beer Thailand Lisen
1565 Bitburger Tyskland Lisen
1594 Lapin Kulta Finland Lisen
1657 Chimay vit Belgien Lisen
30611 Dugges High Five Sverige Lisen
30658 Stigbergets Sais Sverige Lisen
11410 Alesmith Decaden USA Magnus
1260 Guinness Draught Irland Magnus
1407 Backyard Brew Danmark Maria
11451 Pripps Bla Sverige Marie
11489 Gambrinus Tjeckien Rikard
1353 Budvar Tjeckien Rikard
1540 Westmalle Double Belgien Rikard
1544 Litovel Classic Tjeckien Rikard
1611 Breznak Tjeckien Rikard
1650 Orval Belgien Rikard
1654 Duvel Belgien Rikard
1657 Chimay vit Belgien Rikard
1659 Leffe Blond Belgien Rikard
1664 Kwak Belgien Rikard
1670 DAB Tyskland Rikard
89793 Maredsous Tripel Belgien Rikard
11410 Alesmith Decaden USA Urban
1416 Lowenbrau Tyskland Urban
1506 Jever Tyskland Urban
1565 Bitburger Tyskland Urban
1642 Fullers London P Storbritan Urban
1670 DAB Tyskland Urban
Appreciate the help.
Use GROUP BY, ORDER BY and LIMIT if you want one result (even when there are ties):
select name  -- optionally add ", count(*)"; it isn't required in the output, but I would keep it
from t
where country = 'USA'
group by name
order by count(*) desc
limit 1;
If you want all rows when there are ties, then use window functions:
select name
from (
    select name, count(*) as cnt,
           rank() over (order by count(*) desc) as seqnum  -- the WHERE already pins the country, so no partition is needed
    from t
    where country = 'USA'
    group by name
) t
where seqnum = 1;
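If neither LIMIT nor window functions are available, a portable alternative is to compare each person's count with the maximum count (same table t and country filter as above):
select name
from t
where country = 'USA'
group by name
having count(*) = (
    select max(cnt)
    from (
        select count(*) as cnt
        from t
        where country = 'USA'
        group by name
    ) x
);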

SQL Queries (Difference between tables)

I'm trying to find a difference between two tables. The tables are
Sample Data
PERSON_PHOTO
ID USERID FNAME
801 uid01 George
801 uid05 George
803 uid01 George
901 uid01 Alice
201 uid01 Alice
330 uid01 Alice
802 uid05 Alice
803 uid05 Alice
804 uid05 Alice
901 uid05 Alice
701 uid05 Alice
201 uid05 Alice
101 uid05 Alice
330 uid05 Alice
501 uid05 Alice
501 uid12 Jane
330 uid12 Jane
101 uid12 Jane
201 uid12 Jane
701 uid12 Jane
801 uid12 Jane
901 uid12 Jane
101 uid07 Mary
101 uid03 Mary
201 uid03 Mary
801 uid03 Mary
901 uid03 Mary
201 uid15 Tom
801 uid15 Tom
Table VALID_FRIEND
FNAME USERID
Bill uid02
George uid01
Mary uid07
Jane uid12
Tom uid15
Alice uid05
Mary uid03
SAMPLE OUTPUT
USERID PHOTOS NOT IN
uid02 0
uid01 5
uid07 9
uid12 3
uid15 8
uid05 8
uid03 6
The query I'm trying to write should find the number of photos each person is not in, output as the USERID together with that count. I know I need to find the count of distinct photo IDs (the ID column) in PERSON_PHOTO and take the difference with the count of photos each USERID appears in. Thanks for any help.
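A sketch of that counting idea (assuming the photo id column is PERSON_PHOTO.ID and that every user in VALID_FRIEND should be listed, including those who appear in no photos):
SELECT
    vf.USERID,
    (SELECT COUNT(DISTINCT ID) FROM PERSON_PHOTO) - COUNT(DISTINCT pp.ID) AS PHOTOS_NOT_IN
FROM VALID_FRIEND vf
LEFT JOIN PERSON_PHOTO pp ON pp.USERID = vf.USERID
GROUP BY vf.USERID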