I want to load data from two different Excel files, and use them in the same table in QlikView.
I have two files (DIVISION.xls and REGION.xls), and I'm using the following code:
let tt = 'DIVISION$' and 'REGION$';
FOR Each db_schema in 'DIVISION.xls','REGION.xls'
FOR Each v_db in $(tt)
div_reg_table:
LOAD *
FROM $(db_schema)
(biff, embedded labels, table is $(v_db));
NEXT
NEXT
This code works fine, and does not show any error, but I don't get any data, and I don't see my new table (div_reg_table).
Can you help me?
The main reason that your code does not load any data is due to the form of the tt variable.
When your inner loop executes, it evaluates tt (denoted by $(tt)), which then results in the evaluation of:
'DIVISION$' AND 'REGION$'
Which results in null, since these are just two strings.
If you change your statement slightly, from LET to SET and remove the AND, then the inner loop will work. For example:
SET tt = 'DIVISION$', 'REGION$';
However, this also now means that your inner loop will be executed for each value in tt for both workbooks. This means that unless you have a REGION sheet in your DIVISION workbook, then the load will fail.
To avoid this, you may wish to restructure your script slightly so that you have a control table which details the files and tables to load. An example I prepared is shown below:
FileList:
LOAD * INLINE [
FileName, TableName
REGION.XLS, REGION$
DIVISION.XLS, DIVISION$
];
FOR i = 0 TO NoOfRows('FileList') - 1
LET FileName = peek('FileName', i, 'FileList');
LET TableName = peek('TableName', i, 'FileList');
div_reg_table:
LOAD
*
FROM '$(FileName)'
(biff, embedded labels, table is '$(TableName)');
NEXT
If you have just two files DIVISION.XLS and REGION.XLS then it may be worth just using two separate loads one after the other, and removing the for loop entirely. For example:
div_reg_table:
LOAD
*
FROM [DIVISION.XLS]
(biff, embedded labels, table is DIVISION$)
WHERE DivisionName = 'AA';
LOAD
*
FROM [REGION.XLS]
(biff, embedded labels, table is REGION$)
WHERE RegionName = 'BB';
Don't you need to make the noofrows('Filelist') test be 1 less than the answer.
noofrows('filelist') will evaluate to 2 so you will get a loop step for 0,1 and 2. So 2 will return a null which will cause it to fail.
I did it like this:
FileList:
LOAD * INLINE [
FileName, TableName
REGION.XLS, REGION
DIVISION.XLS, DIVISION
];
let vNo= NoOfRows('FileList')-1;
FOR i = 0 TO $(vNo)
LET FileName = peek('FileName', i, 'FileList');
LET TableName = peek('TableName', i, 'FileList');
div_reg_table:
LOAD
*,
$(i) as I,
FileBaseName() as SOURCE
FROM '$(FileName)'
(biff, embedded labels, table is '$(TableName)');
NEXT
Related
I'm just trying to get the data from this table:
https://www.listcorp.com/asx/sectors/materials
and put all the values (the TEXT) into a list of lists.
I've tried so many different methods using--> xpath, getByClassName, By.tag
------------
rws = driver.find_elements_by_xpath("//table/tbody/tr/td")
---------------
table = driver.find_element_by_class_name("v-datatable v-table theme--light")
--------------
findElements(By.tagName("table"))
--------------
# to identify the table rows
l = driver.find_elements_by_xpath ("//*[#class= 'v-datatable.v-
table.theme--light']/tbody/tr")
# to get the row count len method
print (len(l))
# THIS RETURNS '1' which cant be right because theres hundreds of rows
And nothing seems to work to get the values in an easy way to understand the manner.
(EDIT SOLVED)
before doing the SOLVED solution below.
First do: time.sleep(10) this will allow the page to load so that the table can actually be retrieved. then just append all the cells to a new list. YOU WILL NEED MULTIPLE LISTS to fit all the rows.
So basically you can use find_elements_by_tag_name
and use this code
row = driver.find_elements_by_tag_name("tr")
data = driver.find_elements_by_tag_name("td")
print('Rows --> {}'.format(len(row)))
print('Data --> {}'.format(len(data)))
for value in row:
print(value.text)
Have proper wait to populate the data.
I want to combine the CSV files from the Johns Hopkins Covid Data (e.g. https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/05-10-2020.csv & https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-23-2020.csv).
I already managed to load the files into a DataFrame as well as sanitizing the header (_ vs. / in some names). Now I want to pick one column (e.g. Confirmed), rename it to the day of the file and then combine those CSV files to get a progress over time.
This merge needs to be done by state_province. In both frames, the key may not be present. How can I do this? I experimented with rightjoin and outerjoin, but didn't have any success. Can someone point me the right way please?
I initially didn't want to share the code that I have so far because I didn't want to guide to a specific solution - but here it is. It is copied together from several Jupyter cells.
using Dates
start = Dates.Date(2020,1,22) #begin of recording
now = Dates.Date(Dates.now())- Dates.Day(1) #today
date_range = collect(start:Dates.Day(1):now) #create a date range with 1 element per day
prefix = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/"
suffix = ".csv"
function create_url(date)
return prefix * Dates.format(date, "mm-dd-YYYY") * suffix
end
function cleanup_column_names(name)
if name == "Country/Region" || name == "Country_Region"
return "country"
elseif name == "Province/State" || name == "Province_State"
return "state"
else
return name
end
end
using CSV
using HTTP
using DataFrames
selected_data = "Confirmed"
date = date_range[1]
data = DataFrame(CSV.File(HTTP.get(create_url(date)).body))
DataFrames.rename!(cleanup_column_names, data)
DataFrames.select!(data,["state", "country", selected_data])
DataFrames.rename!(data, 3 => Dates.format(date, "YYYY-mm-dd"))
Regards
Tobias
I am relatively new to Julia, so take my answer with a bit of scepticism:
First, we wrap the DataFrame creation into a function:
function prepare_date_df(date)
data = DataFrame(CSV.File(HTTP.get(create_url(date)).body))
DataFrames.rename!(cleanup_column_names, data)
DataFrames.select!(data,["state", "country", selected_data])
DataFrames.rename!(data, 3 => Dates.format(date, "YYYY-mm-dd"))
return data
end
Let's create our first Dataframe:
df = prepare_date_df(date_range[1])
Now, let's iterate over all the other dates, create a dataframe for each date and merge this with our first dataframe:
for date in date_range[2:end]
df_new = prepare_date_df(date)
df = outerjoin(df, df_new, on = [:state, :country])
end
This works fine for the first two months, but with the growing Dataframes, it suddenly gets very slow (and even hangs?). So I would be very interested in a more performative answer!
I just started learning pig and I have a problem with re-naming an alias. Want I want to do is read a file, filter it, and then join it by itself. What I did is this:
register s3n://uw-cse-344-oregon.aws.amazon.com/myudfs.jar
raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/cse344-test-file' USING TextLoader as (line:chararray);
ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);
ntriples2 = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject2:chararray,predicate2:chararray,object2:chararray);
X = FILTER ntriples BY (subject matches '.*business.*');
X2 = FILTER ntriples2 BY (subject2 matches '.*business.*');
joined= join X by subject, X2 by subject2;
joined = DISTINCT joined;
store joined into '/user/hadoop/join-results' using PigStorage();
as you can see I read and filter the file twice in order two have two different alias for each column. How can I simply copy the filtered collection and assign it new aliases? This operation was supposed to take 18 mins, but took 1.5 hour.
I found the answer:
X = FILTER ntriples BY (subject matches '.*rdfabout\\.com.*') PARALLEL 50;
y = foreach X generate subject as subject2, predicate as predicate2, object as object2 PARALLEL 50;
This is how you make copy of x and changing the aliases.
I am using a live data API that is returning the next arriving trains. I plan on giving the user the next 5 trains arriving. If there are less than 5 trains arriving, how you handle that? I am having trouble thinking of a way, I was thinking a way with if statements but don't know how I would set them up.
time1Depart = dataReturnedFromLiveAPI{1,1}.orig_departure_time;
time2Depart = dataReturnedFromLiveAPI{1,2}.orig_departure_time;
time3Depart = dataReturnedFromLiveAPI{1,3}.orig_departure_time;
time4Depart = dataReturnedFromLiveAPI{1,4}.orig_departure_time;
time5Depart = dataReturnedFromLiveAPI{1,5}.orig_departure_time;
time1Arrival = dataReturnedFromLiveAPI{1,1}.orig_arrival_time;
time2Arrival = dataReturnedFromLiveAPI{1,2}.orig_arrival_time;
time3Arrival = dataReturnedFromLiveAPI{1,3}.orig_arrival_time;
time4Arrival = dataReturnedFromLiveAPI{1,4}.orig_arrival_time;
time5Arrival = dataReturnedFromLiveAPI{1,5}.orig_arrival_time;
The code right now uses a matrix that goes from 1:numoftrains but I am using just the first five.
It's a bad practice to assign individual value to a separate variable. Better if you pass all related values to a vector or cell array depending on class of orig_departure_time and orig_arrival_time.
It looks like dataReturnedFromLiveAPI is a cell array of structures. Then you can do:
timeDepart = cellfun(#(x), x.orig_departure_time, ...
dataReturnedFromLiveAPI(1,1:min(5,size(dataReturnedFromLiveAPI,2))), ...
'UniformOutput',0 );
timeArrival = cellfun(#(x), x.orig_arrival_time, ...
dataReturnedFromLiveAPI(1,1:min(5,size(dataReturnedFromLiveAPI,2))), ...
'UniformOutput',0 );
Then you how to access the values one by one as
time1Depart = timeDepart{1};
If orig_departure_time and orig_arrival_time are numeric scalars, you can use ...'UniformOutput',1.... You will get output as a vector and can get single values with timeDepart(1).
Basically:
I am making a game in the LÖVE Engine where you click to create blocks
Every time you click, a block gets created at your Mouse X and Mouse Y
But, I can only get one block to appear, because I have to name that block (or table) 'object1'
Is there any way to create table after table with increasing values? (like object1{}, object2{}, object3{}, etc... But within the main table, 'created_objects')
But only when clicked, which I suppose rules out the looping part (but if it doesn't please tell me)
Here's my code so far, but it doesn't compile.
function object_create(x, y, id) **--Agruments telling the function where the box should spawn and what the ID of the box is (variable 'obj_count' which increases every time a box is spawned)**
currobj = "obj" .. id **--Gives my currently created object a name**
crob.[currobj] = {} **--Is supposed to create a table for the current object, which holds all of its properties. **
crob.[currobj].body = love.physics.newBody(world, x, y, "dynamic")
crob.[currobj].shape = love.physics.newRectangleShape(30, 30)
crob.[currobj].fixture = love.physics.newFixture(crob.[currobj].body, crob.[currobj].shape, 1) **--The properties**
crob.[currobj].fixture:setRestitution(0.3)
But what should I replace [currobj] with?
Solved
Found what I was looking for. Here's the code if people are wondering:
function block_create(x, y, id) --(the mouse x and y, and the variable that increases)
blocks[id] = {}
blocks[id][1] = love.physics.newBody(world, x, y, "dynamic")
blocks[id][2] = love.physics.newRectangleShape(45, 45)
blocks[id][3] = love.physics.newFixture(blocks[id][1], blocks[id][2])
blocks[id][3]:setRestitution(0.2)
blocks[id][4] = math.random(0, 255) --The Color
blocks[id][5] = math.random(0, 255)
blocks[id][6] = math.random(0, 255)
blockcount = blockcount + 1
i would probably do something like this.
local blocks = {} -- equivalent of your crob table
function create_block(x, y)
local block = funcToCreateBlock() -- whatever code to create the block
table.insert(blocks, block)
return block
end
if you wanted to get a reference to the block you just created with the function, just capture it.
-- gives you the new block, automatically adds it to the list of created blocks
local new_block = create_block(0, 10)
that sticks block objects in your block table and automatically gives each one a numeric index in the table. so if you called create_block() 3 times for 3 different mouse clicks, the blocks table would look like this:
blocks = {
[1] = blockobj1,
[2] = blockobj2,
[3] = blockobj3
}
you could get the second block obj from the blocks table by doing
local block2 = blocks[2]
or you could loop over all the blocks in the table using pairs or ipairs
for idx, block in pairs(blocks) do
-- do something with each block
end
sorry if this doesn't exactly answer your question. but from what you wrote, there didn't seem to be a real reason why you'd need to name the blocks anything specific in the crob table.
If you want those tables to be global, then you can do something like:
sNameOfTable = "NAME"
_G[sNameOfTable] = {1,2}
and then you will have a table variable NAME as depicted here (Codepad).
Otherwise, if you want it to be a child to some other table, something like this would also do:
tTbl = {}
for i = 1, 20 do
local sName = string.format( "NAME%02d", i )
tTbl[sName] = {1,2}
end
for i, v in pairs(tTbl) do
print( i, v )
end
Don't worry about the unsorted output here(Codepad). Lua tables with indexes need not be sorted to be used.