I have a datatable called dtDealer with 2 columns called Customer and Year. I am trying to make a new datatable called dtmulti which has an extra column called multi that shows how many times each Customer/Year row occurs:
dtDealer
Customer | Year
AAA 2012
BBB 2011
AAA 2012
BBB 2011
BBB 2011
BBB 2012
dtmulti
Customer | Year | multi
AAA 2012 2
BBB 2011 3
BBB 2012 1
I tried using AsEnumerable but could not get it to work. Please help.
AsEnumerable does work. I suspect you had an issue with the grouping. I grouped by a tuple of (Customer, Year):
Dim dealers = {(Customer:= "AAA", Year:= 2012),
(Customer:= "BBB", Year:= 2011),
(Customer:= "AAA", Year:= 2012),
(Customer:= "BBB", Year:= 2011),
(Customer:= "BBB", Year:= 2011),
(Customer:= "BBB", Year:= 2012)}
Dim dtDealer As New DataTable()
dtDealer.Columns.Add("Customer")
dtDealer.Columns.Add("Year")
For Each dealer In dealers
Dim row = dtDealer.NewRow()
row("Customer") = dealer.Customer
row("Year") = dealer.Year
dtDealer.Rows.Add(row)
Next
Console.WriteLine($"Customer | Year")
For Each row In dtDealer.AsEnumerable
Console.WriteLine($"{row("Customer")}{vbTab}{row("Year")}")
Next
Dim dtMulti As New DataTable()
dtMulti.Columns.Add("Customer")
dtMulti.Columns.Add("Year")
dtMulti.Columns.Add("Multi")
dtMulti = dtDealer.AsEnumerable().
GroupBy(Function(dealer) (dealer("Customer"), dealer("Year"))).
Select(Function(g)
Dim newRow = dtMulti.NewRow()
newRow("Customer") = g.First.Item("Customer")
newRow("Year") = g.First.Item("Year")
newRow("Multi") = g.Count()
Return newRow
End Function).
CopyToDataTable()
Console.WriteLine($"Customer | Year | Multi")
For Each row In dtMulti.AsEnumerable
Console.WriteLine($"{row("Customer")}{vbTab}{row("Year")}{vbTab}{row("Multi")}")
Next
Customer | Year
AAA 2012
BBB 2011
AAA 2012
BBB 2011
BBB 2011
BBB 2012
Customer | Year | Multi
AAA 2012 2
BBB 2011 3
BBB 2012 1
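For comparison, here is a minimal Python sketch of the same group-and-count idea, using only the standard library. It is not a translation of the DataTable API: the list `dealers` and the name `dt_multi` are stand-ins chosen for illustration, and plain tuples play the role of rows.

```python
from collections import Counter

# Sample rows from the question: (Customer, Year)
dealers = [
    ("AAA", 2012),
    ("BBB", 2011),
    ("AAA", 2012),
    ("BBB", 2011),
    ("BBB", 2011),
    ("BBB", 2012),
]

# Counter keys on the whole (Customer, Year) tuple, which mirrors the
# tuple key passed to GroupBy, and keeps keys in first-seen order.
counts = Counter(dealers)

# One output row per distinct pair, with the count as the extra "Multi" column
dt_multi = [(customer, year, multi) for (customer, year), multi in counts.items()]

print("Customer | Year | Multi")
for row in dt_multi:
    print(*row, sep="\t")
```

The key point is the same in both languages: the grouping key has to be the combination of both columns, not the row object itself.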
I have this dataset on an R/SQL Server:
name year
1 john 2010
2 john 2011
3 john 2013
4 jack 2015
5 jack 2018
6 henry 2010
7 henry 2011
8 henry 2012
I am trying to add two columns that:
Column 1: Looks at the "number of missing years between successive rows" for each person.
Column 2: Sum the cumulative "number of missing years" for each person
For example - the first instance of each person will be 0, and then:
# note: in this specific example that I have created, "missing_years" is the same as "cumulative_missing_years"
name year missing_years cumulative_missing_years
1 john 2010 0 0
2 john 2011 0 0
3 john 2013 1 1
4 jack 2015 0 0
5 jack 2018 3 3
6 henry 2010 0 0
7 henry 2011 0 0
8 henry 2012 0 0
I think this can be done with a "grouped cumulative difference" and "grouped cumulative sums":
library(dplyr)
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# https://stackoverflow.com/questions/30606360/subtract-value-from-previous-row-by-group
final = my_data %>%
group_by(name) %>%
arrange(year) %>%
mutate(missing_year) = year- lag(year, default = first(year)) %>%
mutate(cumulative_missing_years) = mutate( cumulative_missing_years = cumsum(cs))
But I am not sure if I am doing this correctly.
Ideally, I am looking for an SQL approach or an R approach (e.g. via DBPLYR) that can be used to interact with the dataset.
Can someone please suggest an approach for doing this?
Thank you!
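Using the data in the Note at the end, perform a left self-join to get the next year for the same name, subtract, and then take the cumulative sum. Note that this reports each gap on the row that precedes it, not on the row that follows it as in the question.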
library(sqldf)
sqldf("select a.*,
coalesce(min(b.year) - a.year - 1, 0) as missing,
sum(coalesce(min(b.year) - a.year - 1, 0)) over
(partition by a.name order by a.year) as sum
from DF a
left join DF b on a.name = b.name and a.year < b.year
group by a.name, a.year
order by a.name, a.year")
giving:
name year missing sum
1 henry 2010 0 0
2 henry 2011 0 0
3 henry 2012 0 0
4 jack 2015 2 2
5 jack 2018 0 2
6 john 2010 0 0
7 john 2011 1 1
8 john 2013 0 1
Note
Lines <- "name year
1 john 2010
2 john 2011
3 john 2013
4 jack 2015
5 jack 2018
6 henry 2010
7 henry 2011
8 henry 2012
"
DF <- read.table(text = Lines)
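If the gap should instead appear on the later row, as in the question's expected output, a window-function variant is one option. The sketch below runs it through Python's built-in sqlite3 module; it assumes the bundled SQLite is version 3.25 or newer (needed for window functions), and the table name DF and the output column names are simply carried over from this thread.

```python
import sqlite3

rows = [
    ("john", 2010), ("john", 2011), ("john", 2013),
    ("jack", 2015), ("jack", 2018),
    ("henry", 2010), ("henry", 2011), ("henry", 2012),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE DF (name TEXT, year INTEGER)")
con.executemany("INSERT INTO DF VALUES (?, ?)", rows)

query = """
WITH gaps AS (
    SELECT name,
           year,
           -- LAG is NULL on each person's first row, so COALESCE turns that into 0
           COALESCE(year - LAG(year) OVER (PARTITION BY name ORDER BY year) - 1, 0)
               AS missing_years
    FROM DF
)
SELECT name,
       year,
       missing_years,
       SUM(missing_years) OVER (PARTITION BY name ORDER BY year)
           AS cumulative_missing_years
FROM gaps
ORDER BY name, year
"""

result = con.execute(query).fetchall()
for r in result:
    print(r)
```

With this data the only non-zero rows are jack 2018 (2 missing years: 2016 and 2017) and john 2013 (1 missing year: 2012).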
I hope this helps
library(dplyr)
name <- c(rep("John", 3), rep("jack", 2), rep("henry", 3) )
year <- c(2010, 2011, 2013, 2015, 2018, 2010, 2011, 2012)
dt <- data.frame(name = name, year = year)
# first group the data by name then order by year then mutate
dt <- dt %>%
group_by(name) %>%
arrange(year, .by_group = TRUE) %>%
mutate( mis_yr = if_else(is.na(year - lag(year, n = 1L) -1), 0,
year - lag(year, n = 1L) -1) ,
cum_yr = cumsum(mis_yr)
) %>%
ungroup()
Here is the outcome:
name year mis_yr cum_yr
<chr> <dbl> <dbl> <dbl>
1 henry 2010 0 0
2 henry 2011 0 0
3 henry 2012 0 0
4 jack 2015 0 0
5 jack 2018 2 2
6 John 2010 0 0
7 John 2011 0 0
8 John 2013 1 1
I am working with python and want to cut a dataset before the following string occurs: "Customer Information". The goal is that everything before this string is inside my new dataset and the part after the string is cut out.
I have tried some things (2 ways) but it did not work. See Code attached below.
df = dataset.copy()
df.Description = df.Description.str.split('Customer Information').str[0].str.strip()
df['Description'] = [x.lstrip('').rstrip('Customer Information') for x in df['Description']]
The expected result for this string ("test Customer Information: Many lines of customer information") should be:
"test"
my actual result is: "test Customer Information: Many lines of customer information"
Why not:
df.Description=df.Description.str.split('Customer Information').str.get(0)
Here is how I tested:
Item Description
0 1 abc Customer Information abc
1 1 aaa Customer Information abc
2 1 bbb Customer Information abc
3 1 ccc Customer Information abc
Item Description
0 1 abc
1 1 aaa
2 1 bbb
3 1 ccc
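To see why the split-based approach works and the rstrip attempt from the question does not, here is a small plain-Python sketch without pandas. The helper name `cut_before` is made up for illustration; the key fact is that `str.rstrip` treats its argument as a set of characters to remove, not as a suffix.

```python
MARKER = "Customer Information"


def cut_before(text, marker=MARKER):
    """Return the part of text before the first occurrence of marker, trimmed.

    If the marker is absent, the whole (trimmed) text is returned.
    """
    before, _found, _after = text.partition(marker)
    return before.strip()


sample = "test Customer Information: Many lines of customer information"
print(repr(cut_before(sample)))

# rstrip removes any trailing characters that appear in the argument string,
# so it stops at the first trailing character not in that set and never
# cuts at the marker itself.
stripped = sample.rstrip(MARKER)
print(repr(stripped))
```

The same per-string logic is what `.str.split('Customer Information').str[0].str.strip()` applies to every row of the column.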
I have a datatable with the following fields:
Day
Date
Room No.
Room Rate
No. of Person
Amount
The data is as follows:
Day  Date       Room No.  Room Rate  No. of Person  Amount
1    4/9/2018   101       900.00     2              1,800.00
2    4/10/2018  101       900.00     2              1,800.00
3    4/10/2018  101       900.00     2              1,800.00
1    4/9/2018   102       1000.00    3              3,000.00
2    4/10/2018  102       1000.00    3              3,000.00
3    4/10/2018  102       1000.00    3              3,000.00
I would like to get the total amount as the sum of Amount, but the last day for each room should not be included. With the above example, the total amount would be 9,600.00, since day 3 of Room 101 and Room 102 is not included.
I tried to use the datatable compute function, but this will not be effective:
Convert.ToInt32(DataSet.Tables("dt_Lodging").Compute("SUM(Amount)", "Day = 3"))
Day will not be limited to 3. If we have days 1 to 5, day 5 is the one which will not be included in Total.
Try this line:
Dim Amount As Decimal = T.Rows.OfType(Of DataRow).GroupBy(Function(X) CStr(X("RoomNo"))).Sum(Function(Room) Room.Take(Room.Count - 1).Sum(Function(X) CDec(X("Amount"))))
Your question does not make the name of the room number column clear, so I have assumed RoomNo. The query also makes some assumptions based on your question:
It will not work when the same room number repeats in different periods, or when the rows are not sorted by date.
This solution is not optimized in any way. It just calculates the value.
Does the room rate vary by date? If not, room rate * number of nights would be a better solution. You should work with nights of stay instead of days anyway.
EDIT:
Full code version
Public Sub Test()
Dim R As DataRow, i As Integer
Using T As New DataTable
T.Columns.Add("RoomNo", GetType(String))
T.Columns.Add("Amount", GetType(Decimal))
For i = 1 To 3
R = T.NewRow
R("RoomNo") = "101"
R("Amount") = 1800
T.Rows.Add(R)
R = T.NewRow
R("RoomNo") = "102"
R("Amount") = 3000
T.Rows.Add(R)
Next
Dim Amount As Decimal = T.Rows.OfType(Of DataRow).GroupBy(Function(X) CStr(X("RoomNo"))).Sum(Function(Room) Room.Take(Room.Count - 1).Sum(Function(X) CDec(X("Amount"))))
Debugger.Break()
End Using
End Sub
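The sorting caveat above can be avoided by finding each room's last day explicitly instead of dropping the last row by position. Below is a minimal Python sketch of that variant, using the rows from the question as (day, room, amount) tuples. The function name `total_excluding_last_day` is hypothetical, and the sketch still assumes each room appears in only one stay.

```python
from decimal import Decimal

# (Day, Room No., Amount) rows from the question
lodging = [
    (1, "101", Decimal("1800.00")),
    (2, "101", Decimal("1800.00")),
    (3, "101", Decimal("1800.00")),
    (1, "102", Decimal("3000.00")),
    (2, "102", Decimal("3000.00")),
    (3, "102", Decimal("3000.00")),
]


def total_excluding_last_day(rows):
    """Sum Amount over all rows except each room's highest-numbered day."""
    # First pass: find the last day per room, independent of row order
    last_day = {}
    for day, room, _amount in rows:
        last_day[room] = max(day, last_day.get(room, day))

    # Second pass: add up every row that is not its room's last day
    return sum(
        (amount for day, room, amount in rows if day != last_day[room]),
        Decimal("0"),
    )


print(total_excluding_last_day(lodging))
```

Because the last day is looked up per room, the result is the same if the rows arrive in any order, and it also works when rooms have different numbers of days.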
I have a datatable called dtstore with 4 columns called section, department, palletnumber and batchnumber. I am trying to make a new datatable called dtmulti which has an extra column called multi that shows how many times each duplicated row occurs:
dtstore
section | department | palletnumber | batchnumber
---------------------------------------------------
pipes 2012 1234 21
taps 2011 5678 345
pipes 2012 1234 21
taps 2011 5678 345
taps 2011 5678 345
plugs 2009 7643 63
dtmulti
section | department | palletnumber | batchnumber | multi
----------------------------------------------------------
pipes 2012 1234 21 2
taps 2011 5678 345 3
I have tried lots of approaches but my code always feels clumsy and bloated, is there an efficient way to do this?
Here is the code I am using:
Dim dupmulti = dataTable.AsEnumerable().GroupBy(Function(i) i).Where(Function(g) g.Count() = 2).Select(Function(g) g.Key)
For Each row In dupmulti
multirow("Section") = dup("Section")
multirow("Department") = dup("Department")
multirow("PalletNumber") = dup("PalletNumber")
multirow("BatchNumber") = dup("BatchNumber")
multirow("Multi") = 2
Next
The code below makes these assumptions: the DataTable containing the original data is called dup, it may contain any number of duplicates, and duplicates can be identified by looking at the first column alone.
'Creating final table from the columns in the original table
Dim multirow As DataTable = New DataTable
For Each col As DataColumn In dup.Columns
multirow.Columns.Add(col.ColumnName, col.DataType)
Next
multirow.Columns.Add("multi", GetType(Integer))
'Looping through the grouped rows (= no duplicates), keyed on the first column
For Each groups In dup.AsEnumerable().GroupBy(Function(x) x(0))
multirow.Rows.Add()
'Adding all the cells in the corresponding row except the last one
For c As Integer = 0 To dup.Columns.Count - 1
multirow(multirow.Rows.Count - 1)(c) = groups(0)(c)
Next
'Adding the last cell (duplicates count)
multirow(multirow.Rows.Count - 1)(multirow.Columns.Count - 1) = groups.Count
Next
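If the first column alone is not enough to identify a duplicate, the grouping key can be the entire row. Here is a short Python sketch of that variant with the question's data. The names `rows` and `dt_multi` are placeholders, and the filter on counts greater than 1 is an assumption taken from the question's expected output, which omits the single "plugs" row.

```python
# (section, department, palletnumber, batchnumber) rows from the question
rows = [
    ("pipes", 2012, 1234, 21),
    ("taps", 2011, 5678, 345),
    ("pipes", 2012, 1234, 21),
    ("taps", 2011, 5678, 345),
    ("taps", 2011, 5678, 345),
    ("plugs", 2009, 7643, 63),
]

# Count occurrences keyed on the whole row; dicts keep first-seen order
counts = {}
for row in rows:
    counts[row] = counts.get(row, 0) + 1

# Keep only rows that occur more than once and append the count as "multi"
dt_multi = [row + (n,) for row, n in counts.items() if n > 1]

for row in dt_multi:
    print(*row, sep="\t")
```

Dropping the `if n > 1` condition would keep rows that occur only once, with a multi value of 1.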
With this data:
id | month | 2015 | 2014 | 2013
1 | january | 2 | 4 | 6
2 | february | 10 | 12 | 14
3 | march | 16 | 18 | 20
I have vb.net code here
Dim asd As Double = 2015
Dim msg As Double
sql = "select " & asd & " from tbl_coll_penalty where month = 'february'"
sda = New NpgsqlDataAdapter(sql, pgConnection)
sda.Fill(DS, "t")
msg = DS.Tables("t").Rows(0)(0).ToString()
MessageBox.Show(msg)
This code gives the wrong result: it returns "2015", but I expect "10". Can someone help me with the proper code?
In PostgreSQL, an unquoted 2015 is parsed as a numeric literal, not as your column name "2015". So when you submit the query:
SELECT 2015 FROM tbl_coll_penalty ...
you simply get the value 2015 back.
In the SQL standard, an unquoted identifier cannot start with a digit (0..9); see the PostgreSQL documentation on identifiers. To have PostgreSQL interpret the string "2015" as a column name, you should double-quote it:
sql = "select """ & asd & """ from tbl_coll_penalty where month = 'february'"
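The difference between the literal and the quoted identifier can be reproduced in a few lines of Python. This sketch uses the built-in sqlite3 module as a stand-in for PostgreSQL, which is an assumption made for convenience: SQLite also treats a double-quoted name as an identifier when a matching column exists. The helper `penalty_for` and the `ALLOWED_YEARS` whitelist are hypothetical additions; they are there because a column name cannot be passed as a bound parameter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE tbl_coll_penalty '
    '(id INTEGER, month TEXT, "2015" INTEGER, "2014" INTEGER, "2013" INTEGER)'
)
con.executemany(
    "INSERT INTO tbl_coll_penalty VALUES (?, ?, ?, ?, ?)",
    [(1, "january", 2, 4, 6), (2, "february", 10, 12, 14), (3, "march", 16, 18, 20)],
)

# Unquoted: 2015 is a numeric literal, so the literal itself comes back
unquoted = con.execute(
    "SELECT 2015 FROM tbl_coll_penalty WHERE month = 'february'"
).fetchone()[0]

# Double-quoted: "2015" is an identifier, so the column value comes back
quoted = con.execute(
    'SELECT "2015" FROM tbl_coll_penalty WHERE month = \'february\''
).fetchone()[0]

print(unquoted, quoted)

ALLOWED_YEARS = {"2015", "2014", "2013"}  # hypothetical whitelist of year columns


def penalty_for(year, month):
    """Look up one cell; the column name is validated, the month is a bound parameter."""
    if year not in ALLOWED_YEARS:
        raise ValueError(f"unknown year column: {year!r}")
    sql = f'SELECT "{year}" FROM tbl_coll_penalty WHERE month = ?'
    return con.execute(sql, (month,)).fetchone()[0]


print(penalty_for("2014", "march"))
```

Checking the column name against a fixed list before splicing it into the SQL string keeps the dynamic part safe, while ordinary values such as the month still go through parameters.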