I have a table "Master" with fields called "DFM" and "Target". I ideally need 1 UPDATE query that will populate the "Target" field based on the value of DFM as below:
DFM Target
50001 85
50009 255
50011 233
50012 290
50062 183
50063 150
50064 159.5
50142 187
50143 174
50179 284.25
50180 195.75
50286 157.25
50287 231.25
For example if the DFM value is 50142 it should UPDATE the field for that row with 187.
So can this be done with 1 query, or do I need 13?
I only know the long-winded way, i.e.
UPDATE Master SET Target = 85 WHERE DFM = 50001
I don't really want 13 queries though.
You can use a switch:
update master
set target = switch(dfm = 50001, 85,
dfm = 50009, 255,
. . .
)
where dfm in (50001, 50009, . . .);
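For completeness, here is the full query with all 13 pairs from the table in the question (Access SQL, where Switch() is available):
update master
set target = switch(dfm = 50001, 85,     dfm = 50009, 255,    dfm = 50011, 233,
                    dfm = 50012, 290,    dfm = 50062, 183,    dfm = 50063, 150,
                    dfm = 50064, 159.5,  dfm = 50142, 187,    dfm = 50143, 174,
                    dfm = 50179, 284.25, dfm = 50180, 195.75, dfm = 50286, 157.25,
                    dfm = 50287, 231.25)
where dfm in (50001, 50009, 50011, 50012, 50062, 50063, 50064,
              50142, 50143, 50179, 50180, 50286, 50287);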
I scraped a real estate website and produced a CSV output with data that needs to be cleaned and structured. So far, my code has properly organized and reformatted the data to work with stats software.
However, every now and then, my 'Gross area' column has the wrong value in m2. The correct value appears in another column ('Furbished').
Gross_area        Furbished
170 #erroneous    190 m2
170 #erroneous    190 m2
160 #correct      Yes
155 #correct      No
I tried using the np.where function. However, I could not specify the condition based on string length, which would allow me to target all '_ _ _ m2' values in column 'Furbished' and reinsert them in 'Gross_area'. It just doesn't work.
df['Gross area'] = np.where(len(df['Furbished']) == 6, df['Furbished'], df['Gross area'])
As an alternative, I tried setting cumulative conditions to precisely target my '_ _ _ m2' values and insert them in my 'Gross area' column. It does not work:
df['Gross area'] = np.where(df['Furbished'] != 'Yes' or 'No', df['Furbished'], df['Gross area'])
The outcome I seek is:
Gross_area    Furbished
190 m2        190 m2
190 m2        190 m2
160           Yes
155           No
Any suggestions? A criterion based on the string length of the 'Furbished' column would be the best option, as I have other instances that would require the same treatment :)
Thanks in advance for your help!
There is probably a better way to do this, but you could get the intended effect with a simple df.apply() call.
df['Gross area'] = df.apply(lambda row: row['Furbished'] if len(row['Furbished']) == 6 else row['Gross area'], axis=1)
With a simple change, you can also keep the 'Gross area' column in the right type.
df['Gross area'] = df.apply(lambda row: float(row['Furbished'][:-2]) if len(row['Furbished']) == 6 else row['Gross area'], axis=1)
You can use pandas' where() method:
df['Gross_area'] = df['Furbished'].where(df['Furbished'].str.len() == 6, df['Gross_area'])
This uses the value in the Furbished column if its length is 6, and otherwise keeps the value in the Gross_area column.
Result:
Gross_area Furbished
0 190 m2 190 m2
1 190 m2 190 m2
2 160 #correct Yes
3 155 #correct No
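For reference, here is a minimal, self-contained sketch of that approach on the sample data above (the '#erroneous'/'#correct' annotations are dropped for simplicity):
import pandas as pd

df = pd.DataFrame({
    'Gross_area': ['170', '170', '160', '155'],
    'Furbished':  ['190 m2', '190 m2', 'Yes', 'No'],
})

# Take Furbished where it looks like '___ m2' (length 6), else keep Gross_area
df['Gross_area'] = df['Furbished'].where(df['Furbished'].str.len() == 6,
                                         df['Gross_area'])
print(df)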
Thanks a lot for your help! Derek's suggestion was the simplest to implement in my program:
df['Gross area']=df['Furbished'].where(df['Furbished'].str.len()==6,df['Gross area'])
I could create a set of rules to replace or delete all the misreferenced data :)
To update data from a given column A if column B equals a given string:
df['Energy_Class']=np.where(df['Energy_Class']=='Usado',df['Bathrooms'],df['Energy_Class'])
To replace a string segment found within column rows:
net = []
for row in net_col:          # net_col is the scraped source column
    net.append(row)
net_in = [s for s in net if 'm²' in s]   # sanity check: which values carry the unit
print(net_in)
net_1 = [s.replace('m²', '') for s in net]
net_2 = [s.replace(',', '.') for s in net_1]
net_3 = [s.replace('Sim', '') for s in net_2]
df['Net area'] = np.array(net_3)
To create a new column and append a standard value B if value A is found in the existing target column's rows:
Terrace_list = []
carac_0 = df['Caracs/0']
for row in carac_0:
    if row == 'Terraço':
        Terrace_list.append('Yes')
    else:
        Terrace_list.append('No')
df['Terraces'] = np.array(Terrace_list)
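For what it's worth, the same yes/no mapping can be done without the loop, reusing the np.where pattern from the first tip:
df['Terraces'] = np.where(df['Caracs/0'] == 'Terraço', 'Yes', 'No')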
To append pre-set value B in existing column X if value A found in existing column Y.
df.loc[df['Caracs/1']=='Terraço','Terraces']='Yes'
Hope this helps someone out.
There are some values in my dataset (df) that need to be replaced with correct values, e.g.:
Height    Disease    Weight>90kg
1.58      1          0
1.64      0          1
1.67      1          0
52        0          1
67        0          0
I want to replace the first three values with '158', '164' and '167', and the next two with 152 and 167 (adding a 1 at the beginning).
I tried the following code but it doesn't work:
data_clean <- function(df) {
  df[height == 1.58] <- 158
  df
}
data_clean(df)
Please help!
Using recode you can explicitly recode the values:
df <- mutate(df, height = recode(height,
  `1.58` = 158,
  `1.64` = 164,
  `1.67` = 167,
  `52` = 152,
  `67` = 167))
However, this obviously is a manual process and not ideal for a case with many values that need recoding.
Alternatively, you could do something like:
df <- mutate(df, height = case_when(
  height < 2.5 ~ height * 100,
  height < 100 ~ height + 100
))
This really depends on the makeup of your data, but for the example given it would work. Just be careful about what your assumptions are. You could also have used `is.double` and `is.integer`.
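As a self-contained illustration on the sample heights (the TRUE ~ height line is an added catch-all so any value matching neither condition passes through unchanged):
library(dplyr)

df <- data.frame(height = c(1.58, 1.64, 1.67, 52, 67))

df <- mutate(df, height = case_when(
  height < 2.5 ~ height * 100,  # metres recorded as decimals, e.g. 1.58 -> 158
  height < 100 ~ height + 100,  # truncated values, e.g. 52 -> 152
  TRUE         ~ height         # leave everything else as-is
))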
I have a set of data in this format:
# Input Data for items in the format (i,l,w,h)
# i for item, l for length, w for width, h for height
set itemData :=
271440 290 214 361
1504858 394 194 114
4003733 400 200 287
4012512 396 277 250
4013886 273 221 166;
I am trying to get the length of each item using the following code:
set IL = setof {i in items, (i,l,w,h) in itemData} (i,l); #length of item i
However, this method does not allow me to access an individual item's length.
What I am trying to do is to have
display IL[271440] = 290;
How can I go about doing this?
Careful with terminology. In AMPL terms, that table isn't a "set". You have a table of parameters. Your sets are the row and column indices for that table: {"l","w","h"} for the columns, and item ID numbers for the rows.
In AMPL it would be handled something like this:
(.mod part)
set items;
set attributes := {"l","w","h"};
param itemData{items, attributes};
(.dat part)
set items :=
271440
1504858
4003733
4012512
4013886
;
param itemData: l w h :=
271440 290 214 361
1504858 394 194 114
4003733 400 200 287
4012512 396 277 250
4013886 273 221 166
;
You can then do:
ampl: display itemData[271440,"l"];
itemData[271440,'l'] = 290
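If you specifically want the single-index lookup from the question (display IL[271440]), you could also add a derived parameter to the model; the name item_len is just illustrative:
(.mod part)
param item_len{i in items} := itemData[i,"l"];
You can then do:
ampl: display item_len[271440];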
I think it's possible to define set "items" at the same time as itemData and avoid the need to duplicate the ID numbers. Section 9.2 of the AMPL Book shows how to do this for a parameter that has a single index set, but I'm not sure of the syntax for doing this when you have two index sets as above. (If anybody does know, please add it!)
I'm trying to select accesses for patients where d11.xblood is a minimum value grouped by d11.xpid - and where d11.xcaccess_type is not 288, 289, or 292. (d11.xblood is a chronological index of accesses.)
d11.xpid: Patient ID (int)
d11.xblood: Unique chronological index of patients' accesses (int)
d11.xcaccess_type: Unique identifier for accesses (int)
I want to report one row for each d11.xpid where d11.xblood is the minimum (initial access) for its respective d11.xpid . Moreover, I want to exclude the row if the initial access for a d11.xpid has a d11.xcaccess_type value of 288, 289 or 292.
I have tried several variations of this in the Select Expert:
{d11.xblood} = Minimum({d11.xblood},{d11.xpid}) and
not ({d11.xcaccess_type} in [288, 289, 292])
This correctly selects each patient's initial access, but when the initial access has one of the excluded types, that row alone is dropped and the next-earliest access is selected instead. I want to exclude an xpid entirely when its initial access is in the array. How can I accomplish this?
Sample table:
xpid xblood xcaccess_type
---- ------ -------------
1 98 400
1 49 300
1 152 288
2 33 288
2 155 300
2 70 400
3 40 300
3 45 400
Sample desired output:
xpid xblood xcaccess_type
---- ------ -------------
1 49 300
3 40 300
See that xpid = 2 is not in the output because its minimum value of xblood had an xcaccess_type = 288 which is excluded. Also see that even though xpid = 1 has an xcaccess_type = 288, because there is a lower value of xblood for xpid = 1 where xcaccess_type not in (288,289,292) it is still included.
If you don't want to write a stored procedure or custom SQL to handle this, you could add another group. Assuming your deepest group (the one closest to the Details section) sorts on xpid, you could add a group inside that one which sorts xblood from lowest to highest.
Suppress the header and footer for the new group, then add this suppression formula to the Details section:
{d11.xpid} = PREVIOUS({d11.xpid})
OR
{d11.xcaccess_type} in [288, 289, 292]
This should modify your report to only ever display the records with the lowest access value per person. And if the lowest access value is one of the three forbidden values, no records will show for that xpid.
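For reference, the custom-SQL route mentioned above might look something like this (a sketch, assuming d11 is the underlying table):
SELECT t.xpid, t.xblood, t.xcaccess_type
FROM d11 AS t
INNER JOIN (
    SELECT xpid, MIN(xblood) AS min_xblood
    FROM d11
    GROUP BY xpid
) AS m
    ON m.xpid = t.xpid AND m.min_xblood = t.xblood
WHERE t.xcaccess_type NOT IN (288, 289, 292);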
I calculate the Excel table below in VBA and leave the results as values because of the volume of data. But then I have to multiply this range by 1 or 0 depending on a column.
The problem is that I don't want to multiply by 0, because I would lose my data and have to recalculate it (which I don't want).
So after my macro I get the following table, for example:
var.1 var.2 var.3
0 0 0
167 92 549
159 87 621
143 95 594
124 61 463
0 0 0
5 12 75
in Range("A2:C9").
In Range("A1:C1") I am going to have 1 or 0 values that will change, so I need my Range("A2:C9") to be like:
var.1 var.2 var.3
=0*A$1 =0*B$1 =0*C$1
=167*A$1 =92*B$1 =549*C$1
...
Is it possible to do this with a macro? Thanks!
Okay, so what I would do here is first copy the original data to another sheet or set of columns so that it is always preserved. Then use this formula:
=IF(A$1 = 0, 0, E3)
Instead of E3, reference the data that you copied; the column-relative A$1 picks up the matching 0/1 flag when the formula is filled across.
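If you want the macro itself to write those formulas, here is a minimal sketch (the backup range E2:G9 and the sheet name are assumptions, not from the question):
Sub ApplyFlags()
    ' Assumes the calculated values were first copied to E2:G9 as a backup,
    ' and that A1:C1 hold the changing 0/1 flags.
    Dim c As Range
    For Each c In Worksheets("Sheet1").Range("A2:C9")
        ' Builds e.g. =IF(A$1=0,0,E2) for cell A2
        c.Formula = "=IF(" & _
            c.EntireColumn.Cells(1).Address(RowAbsolute:=True, ColumnAbsolute:=False) & _
            "=0,0," & c.Offset(0, 4).Address(False, False) & ")"
    Next c
End Sub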